
Database Performance Troubleshoot Notebook

Intro
Introducing a powerful Python script designed to ease performance analysis tasks for database management. This script:
Loads query information from a CSV file exposed on a public URL
Executes the SQL queries against the selected database
Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index, into the Stage area
Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API
Simplifying complex tasks, this script is essential for streamlining workflows for administrators and developers alike.
What you will learn in this notebook:
How to download DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv from a URL and load it into a pandas DataFrame [Python]
How to execute queries and export the results into HTML files [Python]
How to use the SingleStore Python client for database operations and the Stage area [Python]
What benefits you get out of using this notebook:
Run the most commonly used performance checks
Results are exported to HTML for easier viewing
Along with analysis of known scenarios, the script also provides background and possible actions to take
Questions?
Reach out to us through our forum.
Pre-requisites
We will need the following parameters to proceed.
SingleStore Management API key. Follow this link for an API key.
Directory path of the Stage area (the target location to upload the archive)
URL to download the CSV file
URL of the result template directory
Note: You may use DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv as a template for adding your own queries, and the files under templates as templates for the results.
For simplicity of the demo we are using a publicly accessible URL here; you will have to adapt the access pattern to suit your needs.
CSV File structure
QueryID | QueryName | QueryTxt
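For illustration, a row in the CSV might look like the following (the query shown is a hypothetical example; the actual file ships with its own set of checks):

QueryID,QueryName,QueryTxt
1,LONG_RUNNING_QUERIES,"SELECT * FROM information_schema.MV_PROCESSLIST WHERE TIME > 60"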
Install Libraries
In [1]:
!pip install openpyxl jsonpath_ng sql_metadata
Imports
In [2]:
import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv

import pandas as pd
import singlestoredb as s2

from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook

from IPython.display import display, HTML
Variables
In [3]:
query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"

stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY", "SUGGESTION", "CMP_EXP",
           "LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
           "RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT"]

s2_workgroup_stage = None
s2_workspace_name = None

row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}

MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
     AS (SELECT plan_id,
                database_name,
                table_col AS mis_cmp
         FROM information_schema.plancache
              CROSS JOIN TABLE(JSON_TO_ARRAY(
                  plan_info :: type_mismatched_comparisons)
              ) HAVING plan_warnings LIKE '%%Comparisons between mismatched datatypes%%' ),
     mismatche_cmp_2
     AS (SELECT plan_id,
                database_name,
                mis_cmp ::$ comparison_expression AS cmp_exp,
                mis_cmp ::$ left_type AS left_type,
                mis_cmp ::$ right_type AS right_type
         FROM mismatched_comp),
     plan_tbl_row_counts
     AS (SELECT plan_id,
                database_name,
                optimizer_notes ::$ table_row_counts AS tbl_row_counts,
                query_text
         FROM information_schema.plancache)
SELECT m2.plan_id AS PLAN_ID,
       m2.database_name AS DATABASE_NAME,
       m2.cmp_exp AS CMP_EXP,
       m2.left_type AS LEFT_TYPE,
       m2.right_type AS RIGHT_TYPE,
       p2.tbl_row_counts AS TBL_ROW_COUNTS,
       p2.query_text AS SQL_QUERY
FROM mismatche_cmp_2 m2,
     plan_tbl_row_counts p2
WHERE m2.plan_id = p2.plan_id
  AND m2.database_name = p2.database_name;
"""
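The report query above leans on SingleStore's JSON operators: plan_info :: type_mismatched_comparisons is shorthand for JSON_EXTRACT_JSON, and ::$ extracts a value as text via JSON_EXTRACT_STRING. A minimal sketch of the same extraction against a JSON literal (assumes the connection this notebook provides):

with s2.connect(results_type='dict').cursor() as cur:
    # JSON_EXTRACT_STRING is what the ::$ shorthand expands to
    cur.execute("""SELECT JSON_EXTRACT_STRING('{"t1": {"rowcount": 42}}', 't1', 'rowcount') AS rc""")
    print(cur.fetchone())  # expected: {'rc': '42'}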
Log Control
In [4]:
def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)
Note: To enable logs, change set_logging_enabled(False) to set_logging_enabled(True) in the code cell above.
Functions to display various alerts
In [5]:
def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))


def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))


def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))
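As a quick usage example (the message is hypothetical), each helper renders its alert inline in the notebook output:

show_warn('Statistics look stale for database demo_db; consider refreshing them before comparing plans.')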
Utility functions handling db connection and archiving
In [6]:
def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')


def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')
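A minimal smoke test of the query helper, as a sketch (it opens its own dict-result connection, the same way the main cell at the end of this notebook does):

conn = s2.connect(results_type='dict')
print(execute_query(conn, 'SELECT 1 AS ok'))  # expected: [{'ok': 1}]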
Utility functions handling HTML generation
In [7]:
def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page generated')


def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''

    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text to show before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style="text-align:center;margin-top:5px; margin-bottom:5px;">
    {title}
    STAGE Link <a href='{url}'> {curr_file_path} </a>
    </div>"""
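For example, generate_html_list turns a comma-separated string of links into an ordered list of anchors:

print(generate_html_list('1.html,2.html'))
# <ol><li><a href="1.html">1.html</a></li><li><a href="2.html">2.html</a></li></ol>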
Functions to fetch the CSV and load query data
In [8]:
def fetch_url_content(url):
    """
    Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')


def load_query_data(url):
    """
    Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df
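To preview what was loaded, a quick sketch using the query_data_url variable defined earlier:

queries_df = load_query_data(url=query_data_url)
print(queries_df[['QueryID', 'QueryName']].head())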
Verify Stage path and create it if it does not exist
In [9]:
def verify_stage_area():
    """
    Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage, s2_workspace_name
        my_workspace_mngr = s2.manage_workspaces()
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(
            f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error('Provided path is either not a directory or not writable.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False
Functions to analyze data type mismatch
In [10]:
def parse_table_row_counts(plan_id, json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        plan_id: The plan id, used in log messages.
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        if not json_data:
            logging.warning(f'Plan id: {plan_id}, optimizer notes do not contain a row count element')
            return None

        # Extract rowcounts for each table
        matches = row_count_parser.find(json.loads(json_data))

        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount

        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def fetch_table_row_counts(database_name, table_name):
    """
    Fetch the row count for a given table from the database.

    Args:
        database_name (str): The name of the database.
        table_name (str): The name of the table.

    Returns:
        int: The row count of the table, or -1 if the row count cannot be determined.
    """
    global table_row_count_cache

    if database_name:
        lookup_key = f'{database_name}.{table_name}'
        # Check if the row count is already cached
        if lookup_key in table_row_count_cache.keys():
            return table_row_count_cache[lookup_key]
        else:
            logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')
            result = None
            with s2.connect(results_type='dict').cursor() as cursor:
                # Fetch the row count from the database
                cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "
                               f"where database_name = '{database_name}' and table_name = '{table_name}'")
                result = cursor.fetchone()
                # Check if the result is None
                if result is None:
                    logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')
                    return -1
                else:
                    # Cache the row count and return it
                    table_row_count_cache[lookup_key] = result['total_rows']
                    logging.info(f"fetched rowcount: {result['total_rows']}")
                    return result['total_rows']
    else:
        logging.warning(f"database field empty, so returning -1 for table:{table_name}")
        return -1


def fetch_rowcount(json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        # Parse the JSON string
        data = json.loads(json_data)

        # Define the JSONPath expression to extract rowcounts
        expr = parse("$.*.rowcount")

        # Extract rowcounts for each table
        matches = expr.find(data)

        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount

        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def extract_table_columns(comparison_expression, table_aliases):
    """
    Extract left and right table columns from a comparison expression.

    Args:
        comparison_expression (str): The comparison expression.
        table_aliases (dict): A mapping of table aliases to actual table names.

    Returns:
        tuple: A tuple containing the left and right tables and columns if they exist,
               or (None, None, None, None) if the expression matches no known pattern.
    """
    try:
        # exp like "(`mq`.`keyword` = `mt`.`year`)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)
            right_col = (match.group(4) if match.group(4) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > 0)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > '0')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( 0 < `mq`.`keyword`)"
        match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = ""
            left_col = ""
            right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            right_col = (match.group(3) if match.group(3) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = NULL)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = 'NULL')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)"
        match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",
                         comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            left_col = (match.group(3) if match.group(3) else "")
            right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)
            right_col = (match.group(6) if match.group(6) else "")
            return left_tab, left_col, right_tab, right_col

        return None, None, None, None
    except Exception as ce:
        logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")
        raise


def resolve_table_names(table_alias, table_alias_dict):
    """
    Resolve the actual table name from a given table alias.

    Args:
        table_alias (str): The table alias to resolve.
        table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        str: The resolved table name.
    """
    try:
        if table_alias in table_alias_dict:
            return table_alias_dict[table_alias]
        elif re.match(r'.*_[0-9]+$', table_alias):
            return table_alias[:-2]
        else:
            return table_alias
    except Exception as e:
        logging.error(f"Error resolving table name for alias '{table_alias}': {e}")
        raise


def analyze_comparison(row):
    """
    Analyze a comparison expression in a row and generate a suggestion based on table row counts.

    Args:
        row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',
                    'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

    Returns:
        dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.
    """
    try:
        db_name = row['DATABASE_NAME']
        cmp_element = row['CMP_EXP']
        left_table = row['LEFT_TABLE']
        right_table = row['RIGHT_TABLE']
        #rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])
        rowcount_dict = parse_table_row_counts(row['PLAN_ID'], row['TBL_ROW_COUNTS'])

        if rowcount_dict is None:
            rowcount_dict = {}
            if left_table:
                rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)
            if right_table:
                rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)

        suggestion = io.StringIO()

        suggestion.write(f'For Expression: {cmp_element}. ')
        left_row_count = None
        right_row_count = None

        if left_table is not None:
            left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)
            left_row_count = rowcount_dict.get(left_lookup_key, -1)

        if right_table is not None:
            right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)
            right_row_count = rowcount_dict.get(right_lookup_key, -1)

        if left_row_count is not None and right_row_count is not None:
            if left_row_count < right_row_count:
                suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")
            elif left_row_count > right_row_count:
                suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")
            else:
                suggestion.write("The number of records is equal on both sides of the expression.")
        else:
            suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")

        row['SUGGESTION'] = suggestion.getvalue()
        row['LEFT_TABLE_ROW_COUNT'] = left_row_count
        row['RIGHT_TABLE_ROW_COUNT'] = right_row_count
        return row
    except Exception as e:
        logging.error(f"Error analyzing comparison for row '{row}': {e}")
        raise


def generate_datatype_mismatch_rep(conn, output_file, batch_size):
    """
    Process database records fetched using a SQL query and generate an Excel report.

    Args:
        conn: The database connection object.
        output_file (str): filepath to write.
        batch_size (int): database query fetch size.

    Returns:
        bool: True if a report was written, False if there were no records to report.
    """
    try:
        logging.info("Starting database record processing...")
        lines = []
        total_records_processed = 0
        data_prep_start = time.perf_counter()
        with conn.cursor() as cursor:
            cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)
            while True:
                database_records = cursor.fetchmany(batch_size)
                logging.info(f"fetched {len(database_records)} rows")
                if not database_records:
                    break
                for row in database_records:
                    try:
                        parser = Parser(row['SQL_QUERY'])
                        row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[
                            'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)
                        row = analyze_comparison(row)
                        lines.append(row)

                        total_records_processed += 1
                        if total_records_processed % 1000 == 0:
                            logging.info(f"Processed {total_records_processed} records...")
                    except Exception as ex:
                        logging.debug(f"while processing record: {row}")
                        logging.error(ex)

        logging.debug(f"total Processed {total_records_processed} records")
        logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')
        if total_records_processed > 0:
            report_write_start = time.perf_counter()

            wb = Workbook()
            ws = wb.active
            ws.append(list(HEADERS))

            for line in lines:
                ws.append(list(line.values()))

            wb.save(output_file)

            logging.info("Writing to report completed")
            logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')
            return True
        else:
            logging.info('No records to write, skipping report')
            return False

    except Exception as e:
        logging.error(f"An error occurred during database record processing: {e}")
        raise
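A small, self-contained check of the comparison parser (the aliases and expression here are hypothetical; in the report flow the alias map comes from sql_metadata's Parser.tables_aliases):

aliases = {'mq': 'my_queries', 'mt': 'my_totals'}
print(extract_table_columns("(`mq`.`keyword` = `mt`.`year`)", aliases))
# expected: ('my_queries', 'keyword', 'my_totals', 'year')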
Process the DataFrame and generate reports for each query in the CSV
In [11]:
def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:

        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']

            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])

            try:
                # Execute the query
                result = execute_query(conn, query)

                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)

                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)

                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)

            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")

            logging.info(f'process completed for ID:{query_id} Name:{query_name}')

    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')
Function to clean up generated directories
In [12]:
def clean_dir(p_dir_path, p_archive_file_path):
    """
    Clean the local directory by removing all HTML files and the archive file.

    Parameters
    ----------
    p_dir_path : str
        The path to the local directory.
    p_archive_file_path : str
        The path to the archive file.
    """
    # Remove the archive file, then the generated local files
    try:
        os.remove(p_archive_file_path)
        logging.info('Local archive file removed')
        logging.info('about to clean previously generated files in local dir')
        shutil.rmtree(p_dir_path)
    except OSError as e:
        logging.error('Clean up failed')
        print("Error: %s : %s" % (p_dir_path, e.strerror))
        raise Exception(f'Failed to clean up {str(e)}')
In [13]:
if connection_url.endswith('/'):
    show_warn('Database not selected. Please select a database from the dropdown at the top of the web page')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, use dict as results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from a csv file into a pandas DataFrame
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)

            process_dataframe(queries_df)

            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)

            copy_index_file(local_dir_path, template_url_base)

            # Create a tar.gz archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'

            zip_success = make_tarfile(final_file_path, local_dir_path)

            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()} ')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process {e}')
                    error_msg = 'File Upload failed'

                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')

            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'

        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'

    # Display a success or error message based on the execution success
    if execution_success:
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)

    logging.info(f'Script execution completed successfully: {execution_success}')
Important NOTE
The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying changes to production environments.
To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input, as sketched below.
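For instance, a minimal sketch of table-driven configuration might look like this (the perf_notebook_config table and its columns are hypothetical names):

# Hypothetical config table: CREATE TABLE perf_notebook_config (param_name TEXT, param_value TEXT);
with s2.connect(results_type='dict').cursor() as cur:
    cur.execute('SELECT param_name, param_value FROM perf_notebook_config')
    config = {r['param_name']: r['param_value'] for r in cur.fetchall()}
# Fall back to the interactively defined values when a parameter is absent
query_data_url = config.get('query_data_url', query_data_url)
stage_folder_path = config.get('stage_folder_path', stage_folder_path)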

Details
About this Template
This notebook helps you troubleshoot database performance.
This Notebook can be run in Standard and Enterprise deployments.
License
This Notebook has been released under the Apache 2.0 open source license.
See Notebook in action
Launch this notebook in SingleStore and start executing queries instantly.