SingleStore Notebooks

Database Performance Troubleshoot Notebook

Intro

This notebook introduces a Python script designed to ease performance-analysis tasks for database management. The script:

  1. Loads query information from a CSV file exposed at a public URL

  2. Executes the SQL queries against the selected database

  3. Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index page, to the Stage area

  4. Handles Stage area operations with the SingleStore Python client, which uses the SingleStore Management API

  5. Streamlines these otherwise repetitive checks into a single workflow for administrators and developers alike (see the short sketch after this list)
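The end-to-end pattern is: run a query, render the result as an HTML table, and pack everything into a tar.gz archive that is then uploaded to Stage. Below is a minimal, self-contained sketch of that pattern; it is illustrative only (a tiny DataFrame stands in for a live query result), while the real work is done by the helper functions defined later in this notebook.

import tarfile

import pandas as pd

# 1. A "query result" (faked here with a tiny DataFrame instead of a live SQL query)
result_df = pd.DataFrame({'METRIC': ['memory_usage'], 'VALUE': [42]})

# 2. Export the result to an HTML table
with open('example_report.html', 'w') as f:
    f.write(result_df.to_html(index=False))

# 3. Pack the generated files into a tar.gz archive (the notebook then uploads this archive to Stage)
with tarfile.open('example_report.tar.gz', 'w:gz') as tar:
    tar.add('example_report.html')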

What you will learn in this notebook:

  1. How to read a CSV file (DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv, downloaded from a URL) and load its data into a pandas DataFrame [Python]

  2. How to execute queries and export the results into HTML files [Python]

  3. How to use the SingleStore Python client for database operations and the Stage area [Python]

What benefits do you get out of using this notebook?

  1. Run the most commonly used performance checks

  2. Review results exported to HTML for easier reading

  3. Along with the analysis of known scenarios, the script also provides background and possible actions to take

Questions?

Reach out to us through our forum.

Pre-requisites

You will need the following parameters to proceed:

  1. A SingleStore Management API key (see the SingleStore documentation for creating an API key)

  2. The directory path of the Stage area (the target location for the uploaded archive)

  3. The URL of the CSV file to download

  4. The URL of the result-template directory

Note: For simplicity of the demo, publicly accessible URLs are used here; adapt the access pattern to suit your needs.

CSV File structure

The CSV file contains three columns: QueryID, QueryName, QueryTxt.
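For illustration, the rows below show what the file could look like; the two queries are hypothetical examples, not necessarily the checks shipped in DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv. The snippet mirrors how load_query_data (defined later) parses the file.

import io

import pandas as pd

# Hypothetical sample rows with the same three columns; the real file is downloaded from query_data_url.
sample_csv = (
    "QueryID,QueryName,QueryTxt\n"
    "1,Running Queries,SELECT * FROM information_schema.processlist\n"
    "2,Plancache Summary,SELECT COUNT(*) AS plans FROM information_schema.plancache\n"
)
sample_df = pd.read_csv(io.StringIO(sample_csv), dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str})
print(sample_df)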

Install Libraries

In [1]:

pip install openpyxl jsonpath_ng sql_metadata

Imports

In [2]:

import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv

import pandas as pd
import singlestoredb as s2

from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook

from IPython.display import display, HTML

Variables

In [3]:

query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"

stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY", "SUGGESTION", "CMP_EXP",
           "LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
           "RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT"]

s2_workgroup_stage = None
s2_workspace_name = None

row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}

MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
     AS (SELECT plan_id,
                database_name,
                table_col AS mis_cmp
           FROM information_schema.plancache
                CROSS JOIN TABLE(JSON_TO_ARRAY(
                                plan_info :: type_mismatched_comparisons)
                ) HAVING plan_warnings LIKE '%%Comparisons between mismatched datatypes%%'),
     mismatche_cmp_2
     AS (SELECT plan_id,
                database_name,
                mis_cmp ::$ comparison_expression AS cmp_exp,
                mis_cmp ::$ left_type AS left_type,
                mis_cmp ::$ right_type AS right_type
           FROM mismatched_comp),
     plan_tbl_row_counts
     AS (SELECT plan_id,
                database_name,
                optimizer_notes ::$ table_row_counts AS tbl_row_counts,
                query_text
           FROM information_schema.plancache)
SELECT m2.plan_id AS PLAN_ID,
       m2.database_name AS DATABASE_NAME,
       m2.cmp_exp AS CMP_EXP,
       m2.left_type AS LEFT_TYPE,
       m2.right_type AS RIGHT_TYPE,
       p2.tbl_row_counts AS TBL_ROW_COUNTS,
       p2.query_text AS SQL_QUERY
  FROM mismatche_cmp_2 m2,
       plan_tbl_row_counts p2
 WHERE m2.plan_id = p2.plan_id
   AND m2.database_name = p2.database_name;
"""

Log Control

In [4]:

def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)

Note: To enable logs

  • Change set_logging_enabled(False) to set_logging_enabled(True) in the code cell above

Functions to display various alerts

In [5]:

def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))


def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))


def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))
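A quick usage example of the helpers defined above; the messages are placeholders.

# Example calls for the alert helpers defined above.
show_success('Stage area verified and report archive uploaded.')
show_warn('Query returned no rows; the corresponding report section will be empty.')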

Utility functions handling db connection and archiving

In [6]:

def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')


def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')

Utility functions handling HTML generation

In [7]:

def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page generated')


def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''

    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text to display before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style="text-align:center;margin-top:5px; margin-bottom:5px;">
        {title}
        STAGE Link &nbsp;&nbsp;&nbsp;&nbsp; <a href='{url}'> {curr_file_path} </a>
    </div>"""

Functions to fetch URL content and load the query data CSV

In [8]:

1

def fetch_url_content(url):

2

"""

3

Fetch the content of a URL.

4

5

Parameters

6

----------

7

url : str

8

The URL to fetch.

9

10

Returns

11

-------

12

str

13

The content of the URL.

14

"""

15

try:

16

with urlopen(url) as response:

17

if response.status == 200:

18

my_bytes = response.read()

19

file_content = my_bytes.decode("utf8")

20

return file_content

21

except HTTPError as e:

22

logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

23

raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

24

25

26

def load_query_data(url):

27

"""

28

Load CSV data from a URL into a pandas DataFrame.

29

30

Parameters

31

----------

32

url : str

33

The URL of the CSV file.

34

35

Returns

36

-------

37

pandas.DataFrame

38

The loaded DataFrame.

39

"""

40

csv_file_content = fetch_url_content(url)

41

csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",

42

dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})

43

csv_df.sort_values(by=['QueryID'], inplace=True)

44

return csv_df
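For example, you can preview the query catalog before running the full report (query_data_url is defined in the Variables cell above):

# Preview the query catalog loaded from query_data_url.
queries_preview = load_query_data(query_data_url)
display(queries_preview[['QueryID', 'QueryName']].head())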

Verify Stage Path and Create if not exists

In [9]:

1

def verify_stage_area():

2

"""

3

Verify the existence and writability of a stage area.

4

5

Returns

6

-------

7

bool

8

True if the stage area is valid, False otherwise.

9

"""

10

try:

11

global s2_workgroup_stage, s2_workspace_name

12

my_workspace_mngr = s2.manage_workspaces()

13

workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)

14

s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name

15

stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)

16

logging.info(

17

f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')

18

if stage_obj.is_dir() and stage_obj.writable:

19

s2_workgroup_stage = workspace_group.stage

20

logging.info(f'stage is valid: {s2_workgroup_stage is not None}')

21

return True

22

else:

23

logging.error(f'As provided path is neither directory nor writable.')

24

return False

25

except Exception as stage_ex:

26

logging.error(f'Stage Path Verification Failed. {stage_ex}')

27

return False

Functions to analyze data type mismatch

In [10]:

1

def parse_table_row_counts(plan_id,json_data):

2

"""

3

Extract table names and their corresponding rowcounts from a JSON string.

4

5

Args:

6

json_data (str): The JSON string containing table names and rowcounts.

7

8

Returns:

9

dict: A dictionary containing table names as keys and their rowcounts as values.

10

"""

11

try:

12

if not json_data:

13

logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')

14

return None

15

16

# Extract rowcounts for each table

17

matches = row_count_parser.find(json.loads(json_data))

18

19

# Create a dictionary to store table names and rowcounts

20

table_rowcounts = {}

21

for match in matches:

22

# Extract the table name from the JSONPath match

23

table_name = match.full_path.left.fields[0]

24

rowcount = match.value

25

table_rowcounts[table_name] = rowcount

26

27

return table_rowcounts

28

except json.JSONDecodeError as e:

29

# Raise an error if JSON parsing fails

30

logging.error("Invalid JSON data: " + str(e))

31

raise ValueError("Invalid JSON data: " + str(e))

32

33

34

def fetch_table_row_counts(database_name, table_name):

35

"""

36

Fetch the row count for a given table from the database, using a cursor.

37

38

Args:

39

cursor: The database cursor object.

40

database_name (str): The name of the database.

41

table_name (str): The name of the table.

42

43

Returns:

44

int: The row count of the table, or -1 if the row count cannot be determined.

45

"""

46

global table_row_count_cache

47

48

if database_name:

49

lookup_key = f'{database_name}.{table_name}'

50

# Check if the row count is already cached

51

if lookup_key in table_row_count_cache.keys():

52

return table_row_count_cache[lookup_key]

53

else:

54

logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')

55

result = None

56

with s2.connect(results_type='dict').cursor() as cursor:

57

# Fetch the row count from the database

58

cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "

59

f"where database_name = '{database_name}' and table_name = '{table_name}'")

60

result = cursor.fetchone()

61

# Check if the result is None

62

if result is None:

63

logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')

64

return -1

65

else:

66

# Cache the row count and return it

67

table_row_count_cache[lookup_key] = result['total_rows']

68

logging.info(f"fetched rowcount: {result['total_rows']}")

69

return result['total_rows']

70

else:

71

logging.warning(f"database field empty, so returning -1 for table:{table_name}")

72

return -1

73

74

75

76

77

def fetch_rowcount(json_data):

78

"""

79

Extract table names and their corresponding rowcounts from a JSON string.

80

81

Args:

82

json_data (str): The JSON string containing table names and rowcounts.

83

84

Returns:

85

dict: A dictionary containing table names as keys and their rowcounts as values.

86

"""

87

try:

88

# Parse the JSON string

89

data = json.loads(json_data)

90

91

# Define the JSONPath expression to extract rowcounts

92

expr = parse("$.*.rowcount")

93

94

# Extract rowcounts for each table

95

matches = expr.find(data)

96

97

# Create a dictionary to store table names and rowcounts

98

table_rowcounts = {}

99

for match in matches:

100

# Extract the table name from the JSONPath match

101

table_name = match.full_path.left.fields[0]

102

rowcount = match.value

103

table_rowcounts[table_name] = rowcount

104

105

return table_rowcounts

106

except json.JSONDecodeError as e:

107

# Raise an error if JSON parsing fails

108

logging.error("Invalid JSON data: " + str(e))

109

raise ValueError("Invalid JSON data: " + str(e))

110

111

112

def extract_table_columns(comparison_expression, table_aliases):

113

"""

114

Extract left and right table columns from a comparison expression.

115

116

Args:

117

comparison_expression (str): The comparison expression.

118

119

Returns:

120

tuple: A tuple containing the left and right table columns if they exist,

121

or None if only one side is present in the expression.

122

"""

123

try:

124

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)

125

if match:

126

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

127

left_col = (match.group(2) if match.group(2) else "")

128

right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)

129

right_col = (match.group(4) if match.group(2) else "")

130

return left_tab, left_col, right_tab, right_col

131

# exp like "(`mq`.`keyword` > 0)"

132

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)

133

if match:

134

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

135

left_col = (match.group(2) if match.group(2) else "")

136

right_tab = ""

137

right_col = ""

138

return left_tab, left_col, right_tab, right_col

139

# exp like "(`mq`.`keyword` > '0')"

140

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)

141

if match:

142

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

143

left_col = (match.group(2) if match.group(2) else "")

144

right_tab = ""

145

right_col = ""

146

return left_tab, left_col, right_tab, right_col

147

# exp like "( 0 < `mq`.`keyword`)"

148

match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)

149

if match:

150

left_tab = ""

151

left_col = ""

152

right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)

153

right_col = (match.group(3) if match.group(3) else "")

154

return left_tab, left_col, right_tab, right_col

155

# exp like "(`mq`.`keyword` = NULL)"

156

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)

157

if match:

158

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

159

left_col = (match.group(2) if match.group(2) else "")

160

right_tab = ""

161

right_col = ""

162

return left_tab, left_col, right_tab, right_col

163

# exp like "(`mq`.`keyword` = 'NULL')"

164

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)

165

if match:

166

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

167

left_col = (match.group(2) if match.group(2) else "")

168

right_tab = ""

169

right_col = ""

170

return left_tab, left_col, right_tab, right_col

171

172

# exp like ( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)

173

match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",

174

comparison_expression)

175

if match:

176

left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)

177

left_col = (match.group(3) if match.group(3) else "")

178

right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)

179

right_col = (match.group(6) if match.group(6) else "")

180

return left_tab, left_col, right_tab, right_col

181

182

return None, None, None, None

183

except Exception as ce:

184

logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")

185

raise

186

187

188

def resolve_table_names(table_alias, table_alias_dict):

189

"""

190

Resolve the actual table name from a given table alias.

191

192

Args:

193

table_alias (str): The table alias to resolve.

194

table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

195

196

Returns:

197

str: The resolved table name.

198

"""

199

try:

200

if table_alias in table_alias_dict:

201

return table_alias_dict[table_alias]

202

elif re.match(r'.*_[0-9]+$', table_alias):

203

return table_alias[:-2]

204

else:

205

return table_alias

206

except Exception as e:

207

logging.error(f"Error resolving table name for alias '{table_alias}': {e}")

208

raise

209

210

211

def analyze_comparison(row):

212

"""

213

Analyze a comparison expression in a row and generate a suggestion based on table row counts.

214

215

Args:

216

row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',

217

'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

218

219

Returns:

220

dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.

221

"""

222

try:

223

db_name = row['DATABASE_NAME']

224

cmp_element = row['CMP_EXP']

225

left_table = row['LEFT_TABLE']

226

right_table = row['RIGHT_TABLE']

227

#rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])

228

rowcount_dict = parse_table_row_counts(row['PLAN_ID'],row['TBL_ROW_COUNTS'])

229

230

if rowcount_dict is None:

231

rowcount_dict = {}

232

if left_table:

233

rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)

234

if right_table:

235

rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)

236

237

suggestion = io.StringIO()

238

239

suggestion.write(f'For Expression: {cmp_element}. ')

240

left_row_count = None

241

right_row_count = None

242

243

if left_table is not None:

244

left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)

245

left_row_count = rowcount_dict.get(left_lookup_key, -1)

246

247

if right_table is not None:

248

right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)

249

right_row_count = rowcount_dict.get(right_lookup_key, -1)

250

251

if left_row_count is not None and right_row_count is not None:

252

if left_row_count < right_row_count:

253

suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")

254

elif left_row_count > right_row_count:

255

suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")

256

else:

257

suggestion.write(f"The number of records is equal on both sides of the expression.")

258

else:

259

suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")

260

261

row['SUGGESTION'] = suggestion.getvalue()

262

row['LEFT_TABLE_ROW_COUNT'] = left_row_count

263

row['RIGHT_TABLE_ROW_COUNT'] = right_row_count

264

return row

265

except Exception as e:

266

logging.error(f"Error analyzing comparison for row '{row}': {e}")

267

raise

268

269

270

def generate_datatype_mismatch_rep(conn, output_file, batch_size):

271

"""

272

Process database records fetched using a SQL query and generate a excel report.

273

274

Args:

275

conn: The database connection object.

276

output_file (str): filepath to write.

277

batch_size (int): database query fetch size.

278

279

Returns:

280

None

281

"""

282

try:

283

logging.info("Starting database record processing...")

284

lines = []

285

total_records_processed = 0

286

data_prep_start = time.perf_counter()

287

with conn.cursor() as cursor:

288

cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)

289

while True:

290

database_records = cursor.fetchmany(batch_size)

291

logging.info(f"fetched {len(database_records)} rows")

292

if not database_records:

293

break

294

for row in database_records:

295

try:

296

parser = Parser(row['SQL_QUERY'])

297

row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[

298

'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)

299

row = analyze_comparison(row)

300

lines.append(row)

301

302

total_records_processed += 1

303

if total_records_processed % 1000 == 0:

304

logging.info(f"Processed {total_records_processed} records...")

305

except Exception as ex:

306

logging.debug(f"while processing record: {row}")

307

logging.error(ex)

308

309

logging.debug(f"total Processed {total_records_processed} records")

310

logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')

311

if total_records_processed > 0 :

312

report_write_start = time.perf_counter()

313

314

wb = Workbook()

315

ws = wb.active

316

ws.append(list(HEADERS))

317

318

for line in lines:

319

ws.append(list(line.values()))

320

321

wb.save(output_file)

322

323

logging.info("Writing to report completed")

324

logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')

325

return True

326

else:

327

logging.info('No records to write, skipping report')

328

return False

329

330

#logging.info("Database record processing completed.")

331

332

except Exception as e:

333

logging.error(f"An error occurred during database record processing: {e}")

334

raise
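To make the row-count parsing above concrete, here is a small, self-contained example of what parse("$.*.rowcount") extracts. The JSON shape is an assumption based on how parse_table_row_counts and analyze_comparison read it (top-level keys of the form database.table, each carrying a rowcount field).

import json

from jsonpath_ng import parse

# Hypothetical optimizer-notes fragment shaped the way parse_table_row_counts expects.
sample_notes = json.dumps({
    "db1.orders": {"rowcount": 1200000},
    "db1.customers": {"rowcount": 35000},
})

matches = parse("$.*.rowcount").find(json.loads(sample_notes))
rowcounts = {m.full_path.left.fields[0]: m.value for m in matches}
print(rowcounts)  # {'db1.orders': 1200000, 'db1.customers': 35000}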

Process the DataFrame and generate a report for each query in the CSV

In [11]:

def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:

        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']

            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])

            try:
                # Execute the query
                result = execute_query(conn, query)

                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)

                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)

                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)

            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")

            logging.info(f'process completed for ID:{query_id} Name:{query_name}')

    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')

Function to clean up generated directories

In [12]:

1

def clean_dir(p_dir_path, p_archive_file_path):

2

"""

3

Clean the local directory by removing all HTML files and the archive file.

4

5

Parameters

6

----------

7

p_dir_path : str

8

The path to the local directory.

9

p_archive_file_path : str

10

The path to the archive file.

11

12

"""

13

# Remove the archive file

14

try:

15

os.remove(p_archive_file_path)

16

logging.info('Local archive file removed')

17

logging.info('about to clean previous generated files in local dir')

18

shutil.rmtree(p_dir_path)

19

except OSError as e:

20

logging.error('Clean up failed')

21

execution_success = False

22

error_msg = 'clean up failed'

23

print("Error: %s : %s" % (dir_path, e.strerror))

24

raise Exception(f'Failed to clean up {str(e)}')

In [13]:

if connection_url.endswith('/'):
    show_warn('Database not selected. Please select a database from the dropdown at the top of the web page.')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, using dict as the results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from the CSV file into a pandas DataFrame
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)

            process_dataframe(queries_df)

            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)

            copy_index_file(local_dir_path, template_url_base)

            # Create a tar.gz archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'

            zip_success = make_tarfile(final_file_path, local_dir_path)

            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()}')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process: {e}')
                    error_msg = 'File Upload failed'

                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')

            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'

        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'

    # Display a success or error message based on the execution success
    if execution_success:
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)

    logging.info(f'Script execution completed successfully: {execution_success}')

Important NOTE

  • The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying them to production environments

  • To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input (see the sketch below)
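For example, a scheduled variant might read its parameters from a small configuration table rather than from the Variables cell. The table name (perf_notebook_config) and its columns below are purely hypothetical and shown only as a sketch.

# Hypothetical example: read notebook parameters from a config table instead of user input.
config_conn = s2.connect(results_type='dict')
with config_conn.cursor() as cur:
    cur.execute("SELECT param_name, param_value FROM perf_notebook_config")
    config = {rec['param_name']: rec['param_value'] for rec in cur.fetchall()}
config_conn.close()

# Fall back to the values already defined in the Variables cell when a parameter is absent.
stage_folder_path = config.get('stage_folder_path', stage_folder_path)
query_data_url = config.get('query_data_url', query_data_url)
template_url_base = config.get('template_url_base', template_url_base)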

Details

About this Template

This notebook helps you troubleshoot database performance.

This notebook can be run in Standard and Enterprise deployments.

Tags

troubleshoot, performance, tuning, starter

License

This notebook has been released under the Apache 2.0 open source license.
