
Database Performance Troubleshoot Notebook

Intro
This notebook provides a Python script that simplifies performance analysis tasks for database management. The script:
Loads query information from a CSV file exposed on a public URL
Executes the SQL queries against the selected database
Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index page, into the Stage area
Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API
By simplifying these complex tasks, the script helps streamline workflows for administrators and developers alike.
What you will learn in this notebook:
How to download the DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv file from a URL and load it into a pandas DataFrame [Python]
How to execute queries and export the results into HTML files [Python]
How to use the SingleStore client for database and Stage area operations [Python]
What benefits do you get out of using this notebook?
You will be able to run the most commonly used performance checks
Results are exported to HTML for easier viewing
Along with analysis of known scenarios, the script also provides background and possible actions to take
Questions?
Reach out to us through our forum.
Pre-requisites
You will need the following parameters to proceed:
SingleStore Management API key. Follow this link for an API key.
Directory path of the Stage area (target location to upload the archive)
URL to download the CSV file
URL of the result template directory
Note: You may use DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv as a template to add your own queries, and the files under templates as templates for the results.
For simplicity of the demo, we use a publicly accessible URL here; you will have to adapt the access pattern to suit your needs.
CSV File structure
QueryID | QueryName | QueryTxt
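For illustration, a data row might look like the following (the query shown is hypothetical, not one of the shipped checks):

QueryID,QueryName,QueryTxt
1,ActiveProcesses,"SELECT * FROM information_schema.processlist"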
Install Libraries
In [1]:
pip install openpyxl jsonpath_ng sql_metadata
Imports
In [2]:
import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv

import pandas as pd
import singlestoredb as s2

from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook

from IPython.display import display, HTML
Variables
In [3]:
query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"

stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY", "SUGGESTION", "CMP_EXP",
           "LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
           "RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT"]

s2_workgroup_stage = None
s2_workspace_name = None

row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}

MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
     AS (SELECT plan_id,
                database_name,
                table_col AS mis_cmp
         FROM   information_schema.plancache
                CROSS JOIN TABLE(JSON_TO_ARRAY(
                    plan_info::type_mismatched_comparisons))
         HAVING plan_warnings LIKE '%%Comparisons between mismatched datatypes%%'),
     mismatched_cmp_2
     AS (SELECT plan_id,
                database_name,
                mis_cmp::$comparison_expression AS cmp_exp,
                mis_cmp::$left_type AS left_type,
                mis_cmp::$right_type AS right_type
         FROM   mismatched_comp),
     plan_tbl_row_counts
     AS (SELECT plan_id,
                database_name,
                optimizer_notes::$table_row_counts AS tbl_row_counts,
                query_text
         FROM   information_schema.plancache)
SELECT m2.plan_id AS PLAN_ID,
       m2.database_name AS DATABASE_NAME,
       m2.cmp_exp AS CMP_EXP,
       m2.left_type AS LEFT_TYPE,
       m2.right_type AS RIGHT_TYPE,
       p2.tbl_row_counts AS TBL_ROW_COUNTS,
       p2.query_text AS SQL_QUERY
FROM   mismatched_cmp_2 m2,
       plan_tbl_row_counts p2
WHERE  m2.plan_id = p2.plan_id
       AND m2.database_name = p2.database_name;
"""
Log Control
In [4]:
def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)
Note: To enable logs
Change set_logging_enabled(False) to set_logging_enabled(True) in the cell above.
Functions to display various alerts
In [5]:
1
def show_warn(warn_msg):2
"""3
Display a warning message in a formatted HTML alert box.4
5
Parameters6
----------7
warn_msg : str8
The warning message to display.9
"""10
display(HTML(f'''<div class="alert alert-block alert-warning">11
<b class="fa fa-solid fa-exclamation-circle"></b>12
<div>13
<p><b>Action Required</b></p>14
<p>{warn_msg}</p>15
</div>16
</div>'''))17
18
19
def show_error(error_msg):20
"""21
Display an error message in a formatted HTML alert box.22
23
Parameters24
----------25
error_msg : str26
The error message to display.27
"""28
display(HTML(f'''<div class="alert alert-block alert-danger">29
<b class="fa fa-solid fa-exclamation-triangle"></b>30
<div>31
<p><b>Error</b></p>32
<p>{error_msg}</p>33
</div>34
</div>'''))35
36
37
def show_success(success_msg):38
"""39
Display a success message in a formatted HTML alert box.40
41
Parameters42
----------43
success_msg : str44
The success message to display.45
"""46
display(HTML(f'''<div class="alert alert-block alert-success">47
<b class="fa fa-solid fa-check-circle"></b>48
<div>49
<p><b>Success</b></p>50
<p>{success_msg}</p>51
</div>52
</div>'''))
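These helpers can be exercised directly in a cell, for example:

show_success('Helpers are loaded.')
show_warn('Select a database before running the report cells.')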
Utility functions handling db connection and archiving
In [6]:
def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')


def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')
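A minimal usage sketch, assuming a selected database and a hypothetical ./reports directory to archive:

conn = s2.connect(results_type='dict')        # connection settings come from the notebook environment
rows = execute_query(conn, 'SELECT 1 AS ok')  # returns a list of dict rows
make_tarfile('reports.tar.gz', './reports')   # packs the directory into a gzipped tarball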
Utility functions handling HTML generation
In [7]:
def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page is generated')


def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''

    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text displayed before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style="text-align:center;margin-top:5px; margin-bottom:5px;">
    {title}
    STAGE Link <a href='{url}'> {curr_file_path} </a>
</div>"""
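For instance, calling generate_html_list with two file names yields an ordered list of links:

print(generate_html_list('1.html,2.html'))
# <ol><li><a href="1.html">1.html</a></li><li><a href="2.html">2.html</a></li></ol>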
Function loading query data in CSV
In [8]:
def fetch_url_content(url):
    """
    Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')


def load_query_data(url):
    """
    Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df
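To sanity-check the download step on its own, you can preview the query catalog (this just reuses query_data_url from the Variables cell):

queries_preview = load_query_data(url=query_data_url)  # DataFrame sorted by QueryID
queries_preview[['QueryID', 'QueryName']].head()       # first few catalogued checks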
Verify Stage Path and Create if not exists
In [9]:
def verify_stage_area():
    """
    Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage, s2_workspace_name
        my_workspace_mngr = s2.manage_workspaces()
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(
            f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error('Provided stage path is not a writable directory.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False
Functions to analyze data type mismatch
In [10]:
def parse_table_row_counts(plan_id, json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.

    Args:
        plan_id: The plan ID, used for logging context.
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        if not json_data:
            logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')
            return None

        # Extract rowcounts for each table
        matches = row_count_parser.find(json.loads(json_data))

        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount

        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def fetch_table_row_counts(database_name, table_name):
    """
    Fetch the row count for a given table from the database.

    Args:
        database_name (str): The name of the database.
        table_name (str): The name of the table.

    Returns:
        int: The row count of the table, or -1 if the row count cannot be determined.
    """
    global table_row_count_cache

    if database_name:
        lookup_key = f'{database_name}.{table_name}'
        # Check if the row count is already cached
        if lookup_key in table_row_count_cache:
            return table_row_count_cache[lookup_key]
        else:
            logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')
            result = None
            with s2.connect(results_type='dict').cursor() as cursor:
                # Fetch the row count from the database
                cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "
                               f"where database_name = '{database_name}' and table_name = '{table_name}'")
                result = cursor.fetchone()
            # Check if the result is None
            if result is None:
                logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')
                return -1
            else:
                # Cache the row count and return it
                table_row_count_cache[lookup_key] = result['total_rows']
                logging.info(f"fetched rowcount: {result['total_rows']}")
                return result['total_rows']
    else:
        logging.warning(f"database field empty, so returning -1 for table:{table_name}")
        return -1


def fetch_rowcount(json_data):
    """
    Extract table names and their corresponding rowcounts from a JSON string.
    (Standalone variant of parse_table_row_counts; kept for reference but not
    used by the report below.)

    Args:
        json_data (str): The JSON string containing table names and rowcounts.

    Returns:
        dict: A dictionary containing table names as keys and their rowcounts as values.
    """
    try:
        # Parse the JSON string
        data = json.loads(json_data)

        # Define the JSONPath expression to extract rowcounts
        expr = parse("$.*.rowcount")

        # Extract rowcounts for each table
        matches = expr.find(data)

        # Create a dictionary to store table names and rowcounts
        table_rowcounts = {}
        for match in matches:
            # Extract the table name from the JSONPath match
            table_name = match.full_path.left.fields[0]
            rowcount = match.value
            table_rowcounts[table_name] = rowcount

        return table_rowcounts
    except json.JSONDecodeError as e:
        # Raise an error if JSON parsing fails
        logging.error("Invalid JSON data: " + str(e))
        raise ValueError("Invalid JSON data: " + str(e))


def extract_table_columns(comparison_expression, table_aliases):
    """
    Extract left and right table columns from a comparison expression.

    Args:
        comparison_expression (str): The comparison expression.
        table_aliases (dict): A mapping of table aliases to actual table names.

    Returns:
        tuple: A tuple of (left_table, left_column, right_table, right_column);
               elements are empty strings when one side is a literal, and all
               None when the expression does not match any known pattern.
    """
    try:
        # exp like "(`a`.`col1` = `b`.`col2`)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)
            right_col = (match.group(4) if match.group(4) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > 0)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` > '0')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "( 0 < `mq`.`keyword`)"
        match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)
        if match:
            left_tab = ""
            left_col = ""
            right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            right_col = (match.group(3) if match.group(3) else "")
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = NULL)"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col
        # exp like "(`mq`.`keyword` = 'NULL')"
        match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)
            left_col = (match.group(2) if match.group(2) else "")
            right_tab = ""
            right_col = ""
            return left_tab, left_col, right_tab, right_col

        # exp like "( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)"
        match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",
                         comparison_expression)
        if match:
            left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)
            left_col = (match.group(3) if match.group(3) else "")
            right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)
            right_col = (match.group(6) if match.group(6) else "")
            return left_tab, left_col, right_tab, right_col

        return None, None, None, None
    except Exception as ce:
        logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")
        raise


def resolve_table_names(table_alias, table_alias_dict):
    """
    Resolve the actual table name from a given table alias.

    Args:
        table_alias (str): The table alias to resolve.
        table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

    Returns:
        str: The resolved table name.
    """
    try:
        if table_alias in table_alias_dict:
            return table_alias_dict[table_alias]
        elif re.match(r'.*_[0-9]+$', table_alias):
            return table_alias[:-2]
        else:
            return table_alias
    except Exception as e:
        logging.error(f"Error resolving table name for alias '{table_alias}': {e}")
        raise


def analyze_comparison(row):
    """
    Analyze a comparison expression in a row and generate a suggestion based on table row counts.

    Args:
        row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',
                    'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

    Returns:
        dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.
    """
    try:
        db_name = row['DATABASE_NAME']
        cmp_element = row['CMP_EXP']
        left_table = row['LEFT_TABLE']
        right_table = row['RIGHT_TABLE']
        # rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])
        rowcount_dict = parse_table_row_counts(row['PLAN_ID'], row['TBL_ROW_COUNTS'])

        # Fall back to querying table statistics when the plan lacks row counts
        if rowcount_dict is None:
            rowcount_dict = {}
            if left_table:
                rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)
            if right_table:
                rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)

        suggestion = io.StringIO()

        suggestion.write(f'For Expression: {cmp_element}. ')
        left_row_count = None
        right_row_count = None

        if left_table is not None:
            left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)
            left_row_count = rowcount_dict.get(left_lookup_key, -1)

        if right_table is not None:
            right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)
            right_row_count = rowcount_dict.get(right_lookup_key, -1)

        if left_row_count is not None and right_row_count is not None:
            if left_row_count < right_row_count:
                suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")
            elif left_row_count > right_row_count:
                suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")
            else:
                suggestion.write("The number of records is equal on both sides of the expression.")
        else:
            suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")

        row['SUGGESTION'] = suggestion.getvalue()
        row['LEFT_TABLE_ROW_COUNT'] = left_row_count
        row['RIGHT_TABLE_ROW_COUNT'] = right_row_count
        return row
    except Exception as e:
        logging.error(f"Error analyzing comparison for row '{row}': {e}")
        raise


def generate_datatype_mismatch_rep(conn, output_file, batch_size):
    """
    Process database records fetched using a SQL query and generate an Excel report.

    Args:
        conn: The database connection object.
        output_file (str): The file path to write.
        batch_size (int): The database query fetch size.

    Returns:
        bool: True if a report was written, False if there were no records.
    """
    try:
        logging.info("Starting database record processing...")
        lines = []
        total_records_processed = 0
        data_prep_start = time.perf_counter()
        with conn.cursor() as cursor:
            cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)
            while True:
                database_records = cursor.fetchmany(batch_size)
                logging.info(f"fetched {len(database_records)} rows")
                if not database_records:
                    break
                for row in database_records:
                    try:
                        parser = Parser(row['SQL_QUERY'])
                        row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[
                            'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)
                        row = analyze_comparison(row)
                        lines.append(row)

                        total_records_processed += 1
                        if total_records_processed % 1000 == 0:
                            logging.info(f"Processed {total_records_processed} records...")
                    except Exception as ex:
                        logging.debug(f"while processing record: {row}")
                        logging.error(ex)

        logging.debug(f"total Processed {total_records_processed} records")
        logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')
        if total_records_processed > 0:
            report_write_start = time.perf_counter()

            wb = Workbook()
            ws = wb.active
            ws.append(list(HEADERS))

            for line in lines:
                ws.append(list(line.values()))

            wb.save(output_file)

            logging.info("Writing to report completed")
            logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')
            return True
        else:
            logging.info('No records to write, skipping report')
            return False

    except Exception as e:
        logging.error(f"An error occurred during database record processing: {e}")
        raise
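As a quick standalone check of the helpers above, with hypothetical inputs (not part of the report run):

aliases = {'mq': 'my_queries'}   # hypothetical alias map, as sql_metadata's Parser would produce
print(extract_table_columns("(`mq`.`keyword` > 0)", aliases))
# ('my_queries', 'keyword', '', '')
print(parse_table_row_counts('plan-1', '{"db1.my_queries": {"rowcount": 42}}'))
# {'db1.my_queries': 42}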
Process dataframe and generate reports for each query in csv
In [11]:
def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:

        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']

            logging.debug(f'about to execute {query_name}')
            # Excel sheet names are limited to 31 characters, so use a truncated name
            xlwriter.book.create_sheet(query_name[:30])

            try:
                # Execute the query
                result = execute_query(conn, query)

                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)

                    final_content = template.replace('rstable', result_table_content)
                    # Reuse the truncated sheet name created above
                    result_df.to_excel(xlwriter, sheet_name=query_name[:30], index=False)

                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)

            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")

            logging.info(f'process completed for ID:{query_id} Name:{query_name}')

    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')
Function to clean up generated directories
In [12]:
def clean_dir(p_dir_path, p_archive_file_path):
    """
    Clean up by removing the generated local directory and the archive file.

    Parameters
    ----------
    p_dir_path : str
        The path to the local directory.
    p_archive_file_path : str
        The path to the archive file.
    """
    try:
        # Remove the archive file
        os.remove(p_archive_file_path)
        logging.info('Local archive file removed')
        # Remove previously generated files in the local directory
        logging.info('about to clean previously generated files in local dir')
        shutil.rmtree(p_dir_path)
    except OSError as e:
        logging.error('Clean up failed')
        print("Error: %s : %s" % (p_dir_path, e.strerror))
        raise Exception(f'Failed to clean up {str(e)}')
In [13]:
# connection_url is injected by the notebook environment; it ends with '/' when no database is selected
if connection_url.endswith('/'):
    show_warn('Database not selected. Please select one from the dropdown at the top of the web page')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, use dict as results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from a csv file into a pandas DataFrame
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)

            process_dataframe(queries_df)

            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)

            copy_index_file(local_dir_path, template_url_base)

            # Create an archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'

            zip_success = make_tarfile(final_file_path, local_dir_path)

            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()}')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process: {e}')
                    error_msg = 'File Upload failed'

                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')

            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'

        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'

    # Display a success or error message based on the execution success
    if execution_success:
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)

    logging.info(f'Script execution completed successfully: {execution_success}')
Important NOTE
The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying them to production environments.
To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input, as sketched below.
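A minimal sketch of that change, assuming a hypothetical perf_notebook_config table (table and column names here are illustrative, not part of this notebook):

# Hypothetical: read notebook parameters from a config table instead of user input
with s2.connect(results_type='dict').cursor() as cur:
    cur.execute("SELECT param_name, param_value FROM perf_notebook_config")
    config = {r['param_name']: r['param_value'] for r in cur.fetchall()}

query_data_url = config.get('query_data_url', query_data_url)
stage_folder_path = config.get('stage_folder_path', stage_folder_path)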

Details
About this Template
This notebook helps you troubleshoot database performance.
This Notebook can be run in Standard and Enterprise deployments.
License
This Notebook has been released under the Apache 2.0 open source license.
See Notebook in action
Launch this notebook in SingleStore and start executing queries instantly.