SingleStore Notebooks

Database Performance Troubleshoot Notebook

Intro

This notebook introduces a Python script designed to ease performance-analysis tasks for database management. The script:

  1. Loads query information from a CSV file exposed at a public URL

  2. Executes the SQL queries against the selected database

  3. Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, with an index page, to the Stage area

  4. Handles Stage area operations with the SingleStore Python client, which uses the SingleStore Management API

  5. Streamlines these otherwise repetitive checks into a single workflow for administrators and developers alike (see the short sketch after this list)
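The end-to-end pattern is: run a query, render the result as an HTML table, and pack everything into a tar.gz archive that is then uploaded to Stage. Below is a minimal, self-contained sketch of that pattern; it is illustrative only (a tiny DataFrame stands in for a live query result), while the real work is done by the helper functions defined later in this notebook.

import tarfile

import pandas as pd

# 1. A "query result" (faked here with a tiny DataFrame instead of a live SQL query)
result_df = pd.DataFrame({'METRIC': ['memory_usage'], 'VALUE': [42]})

# 2. Export the result to an HTML table
with open('example_report.html', 'w') as f:
    f.write(result_df.to_html(index=False))

# 3. Pack the generated files into a tar.gz archive (the notebook then uploads this archive to Stage)
with tarfile.open('example_report.tar.gz', 'w:gz') as tar:
    tar.add('example_report.html')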

What you will learn in this notebook:

  1. How to read a CSV file (DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv, downloaded from a URL) and load its data into a pandas DataFrame [Python]

  2. How to execute queries and export the results into HTML files [Python]

  3. How to use the SingleStore Python client for database operations and the Stage area [Python]

What benefits do you get out of using this notebook?

  1. Run the most commonly used performance checks

  2. Review results exported to HTML for easier reading

  3. Along with the analysis of known scenarios, the script also provides background and possible actions to take

Questions?

Reach out to us through our forum.

Pre-requisites

You will need the following parameters to proceed:

  1. A SingleStore Management API key (see the SingleStore documentation for creating an API key)

  2. The directory path of the Stage area (the target location for the uploaded archive)

  3. The URL of the CSV file to download

  4. The URL of the result-template directory

Note: For simplicity of the demo, publicly accessible URLs are used here; adapt the access pattern to suit your needs.

CSV File structure

The CSV file contains three columns: QueryID, QueryName, QueryTxt.
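For illustration, the rows below show what the file could look like; the two queries are hypothetical examples, not necessarily the checks shipped in DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv. The snippet mirrors how load_query_data (defined later) parses the file.

import io

import pandas as pd

# Hypothetical sample rows with the same three columns; the real file is downloaded from query_data_url.
sample_csv = (
    "QueryID,QueryName,QueryTxt\n"
    "1,Running Queries,SELECT * FROM information_schema.processlist\n"
    "2,Plancache Summary,SELECT COUNT(*) AS plans FROM information_schema.plancache\n"
)
sample_df = pd.read_csv(io.StringIO(sample_csv), dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str})
print(sample_df)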

Install Libraries

In [1]:

pip install openpyxl jsonpath_ng sql_metadata

Imports

In [2]:

import io
import logging
import os
import shutil
import tarfile
import time
import json
import re
import csv

import pandas as pd
import singlestoredb as s2

from pathlib import Path
from urllib.request import urlopen
from jsonpath_ng import parse
from sql_metadata import Parser
from urllib.error import HTTPError
from datetime import datetime
from openpyxl import Workbook

from IPython.display import display, HTML

Variables

In [3]:

query_data_url = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = "https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/notebooks/performance-troubleshooting/assets/templates/"

stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
local_output_dir_suffix = '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

HEADERS = ["PLAN_ID", "DATABASE_NAME", "SQL_QUERY", "SUGGESTION", "CMP_EXP",
           "LEFT_TABLE", "LEFT_COLUMN", "LEFT_TYPE", "LEFT_TABLE_ROW_COUNT",
           "RIGHT_TABLE", "RIGHT_COLUMN", "RIGHT_TYPE", "RIGHT_TABLE_ROW_COUNT"]

s2_workgroup_stage = None
s2_workspace_name = None

row_count_parser = parse("$.*.rowcount")
table_row_count_cache = {}

MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY = """
WITH mismatched_comp
     AS (SELECT plan_id,
                database_name,
                table_col AS mis_cmp
           FROM information_schema.plancache
                CROSS JOIN TABLE(JSON_TO_ARRAY(
                                plan_info :: type_mismatched_comparisons)
                ) HAVING plan_warnings LIKE '%%Comparisons between mismatched datatypes%%'),
     mismatche_cmp_2
     AS (SELECT plan_id,
                database_name,
                mis_cmp ::$ comparison_expression AS cmp_exp,
                mis_cmp ::$ left_type AS left_type,
                mis_cmp ::$ right_type AS right_type
           FROM mismatched_comp),
     plan_tbl_row_counts
     AS (SELECT plan_id,
                database_name,
                optimizer_notes ::$ table_row_counts AS tbl_row_counts,
                query_text
           FROM information_schema.plancache)
SELECT m2.plan_id AS PLAN_ID,
       m2.database_name AS DATABASE_NAME,
       m2.cmp_exp AS CMP_EXP,
       m2.left_type AS LEFT_TYPE,
       m2.right_type AS RIGHT_TYPE,
       p2.tbl_row_counts AS TBL_ROW_COUNTS,
       p2.query_text AS SQL_QUERY
  FROM mismatche_cmp_2 m2,
       plan_tbl_row_counts p2
 WHERE m2.plan_id = p2.plan_id
   AND m2.database_name = p2.database_name;
"""

Log Control

In [4]:

def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)

set_logging_enabled(False)

Note: To enable logs

  • Change set_logging_enabled(False) to set_logging_enabled(True) in the code cell above

Functions to display various alerts

In [5]:

def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))


def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))


def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))
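A quick usage example of the helpers defined above; the messages are placeholders.

# Example calls for the alert helpers defined above.
show_success('Stage area verified and report archive uploaded.')
show_warn('Query returned no rows; the corresponding report section will be empty.')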

Utility functions handling db connection and archiving

In [6]:

def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query')


def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully, False otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        file_stats = os.stat(output_filename)
        logging.info(f'{output_filename} has size {(file_stats.st_size / (1024 * 1024))} mb')
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')

Utility functions handling HTML generation

In [7]:

def copy_index_file(p_local_dir_path, p_templ_base_url):
    """
    Copy the index file to the local directory.

    Parameters
    ----------
    p_local_dir_path : str
        The local directory path.
    p_templ_base_url : str
        The base URL for the template.
    """
    index_file = f'{p_local_dir_path}/index.html'
    # Fetch the content of the index template from a URL
    index_file_content = fetch_url_content(p_templ_base_url + 'index.template.html')
    # Write the index file content to a local file
    with open(index_file, 'w') as writer:
        writer.write(str(index_file_content))
    logging.info('Index page generated')


def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''

    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def generate_stage_link(title, stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    title : str
        The text to display before the link.
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style="text-align:center;margin-top:5px; margin-bottom:5px;">
        {title}
        STAGE Link &nbsp;&nbsp;&nbsp;&nbsp; <a href='{url}'> {curr_file_path} </a>
    </div>"""

Functions to fetch URL content and load the query data CSV

In [8]:

1

def fetch_url_content(url):

2

"""

3

Fetch the content of a URL.

4

5

Parameters

6

----------

7

url : str

8

The URL to fetch.

9

10

Returns

11

-------

12

str

13

The content of the URL.

14

"""

15

try:

16

with urlopen(url) as response:

17

if response.status == 200:

18

my_bytes = response.read()

19

file_content = my_bytes.decode("utf8")

20

return file_content

21

except HTTPError as e:

22

logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

23

raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')

24

25

26

def load_query_data(url):

27

"""

28

Load CSV data from a URL into a pandas DataFrame.

29

30

Parameters

31

----------

32

url : str

33

The URL of the CSV file.

34

35

Returns

36

-------

37

pandas.DataFrame

38

The loaded DataFrame.

39

"""

40

csv_file_content = fetch_url_content(url)

41

csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",

42

dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})

43

csv_df.sort_values(by=['QueryID'], inplace=True)

44

return csv_df
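For example, you can preview the query catalog before running the full report (query_data_url is defined in the Variables cell above):

# Preview the query catalog loaded from query_data_url.
queries_preview = load_query_data(query_data_url)
display(queries_preview[['QueryID', 'QueryName']].head())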

Verify Stage Path and Create if not exists

In [9]:

1

def verify_stage_area():

2

"""

3

Verify the existence and writability of a stage area.

4

5

Returns

6

-------

7

bool

8

True if the stage area is valid, False otherwise.

9

"""

10

try:

11

global s2_workgroup_stage, s2_workspace_name

12

my_workspace_mngr = s2.manage_workspaces()

13

workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)

14

s2_workspace_name = my_workspace_mngr.get_workspace(os.environ['SINGLESTOREDB_WORKSPACE']).name

15

stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)

16

logging.info(

17

f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')

18

if stage_obj.is_dir() and stage_obj.writable:

19

s2_workgroup_stage = workspace_group.stage

20

logging.info(f'stage is valid: {s2_workgroup_stage is not None}')

21

return True

22

else:

23

logging.error(f'As provided path is neither directory nor writable.')

24

return False

25

except Exception as stage_ex:

26

logging.error(f'Stage Path Verification Failed. {stage_ex}')

27

return False

Functions to analyze data type mismatch

In [10]:

1

def parse_table_row_counts(plan_id,json_data):

2

"""

3

Extract table names and their corresponding rowcounts from a JSON string.

4

5

Args:

6

json_data (str): The JSON string containing table names and rowcounts.

7

8

Returns:

9

dict: A dictionary containing table names as keys and their rowcounts as values.

10

"""

11

try:

12

if not json_data:

13

logging.warning(f'Plan id: {plan_id}, Optimizer notes does not contain row count element')

14

return None

15

16

# Extract rowcounts for each table

17

matches = row_count_parser.find(json.loads(json_data))

18

19

# Create a dictionary to store table names and rowcounts

20

table_rowcounts = {}

21

for match in matches:

22

# Extract the table name from the JSONPath match

23

table_name = match.full_path.left.fields[0]

24

rowcount = match.value

25

table_rowcounts[table_name] = rowcount

26

27

return table_rowcounts

28

except json.JSONDecodeError as e:

29

# Raise an error if JSON parsing fails

30

logging.error("Invalid JSON data: " + str(e))

31

raise ValueError("Invalid JSON data: " + str(e))

32

33

34

def fetch_table_row_counts(database_name, table_name):

35

"""

36

Fetch the row count for a given table from the database, using a cursor.

37

38

Args:

39

cursor: The database cursor object.

40

database_name (str): The name of the database.

41

table_name (str): The name of the table.

42

43

Returns:

44

int: The row count of the table, or -1 if the row count cannot be determined.

45

"""

46

global table_row_count_cache

47

48

if database_name:

49

lookup_key = f'{database_name}.{table_name}'

50

# Check if the row count is already cached

51

if lookup_key in table_row_count_cache.keys():

52

return table_row_count_cache[lookup_key]

53

else:

54

logging.warning(f'{lookup_key} is missing in cache, will fetch and update cache')

55

result = None

56

with s2.connect(results_type='dict').cursor() as cursor:

57

# Fetch the row count from the database

58

cursor.execute(f"select sum(rows) as total_rows from information_schema.table_statistics "

59

f"where database_name = '{database_name}' and table_name = '{table_name}'")

60

result = cursor.fetchone()

61

# Check if the result is None

62

if result is None:

63

logging.warning(f'RowCounts missing for database:{database_name}, table: {table_name}')

64

return -1

65

else:

66

# Cache the row count and return it

67

table_row_count_cache[lookup_key] = result['total_rows']

68

logging.info(f"fetched rowcount: {result['total_rows']}")

69

return result['total_rows']

70

else:

71

logging.warning(f"database field empty, so returning -1 for table:{table_name}")

72

return -1

73

74

75

76

77

def fetch_rowcount(json_data):

78

"""

79

Extract table names and their corresponding rowcounts from a JSON string.

80

81

Args:

82

json_data (str): The JSON string containing table names and rowcounts.

83

84

Returns:

85

dict: A dictionary containing table names as keys and their rowcounts as values.

86

"""

87

try:

88

# Parse the JSON string

89

data = json.loads(json_data)

90

91

# Define the JSONPath expression to extract rowcounts

92

expr = parse("$.*.rowcount")

93

94

# Extract rowcounts for each table

95

matches = expr.find(data)

96

97

# Create a dictionary to store table names and rowcounts

98

table_rowcounts = {}

99

for match in matches:

100

# Extract the table name from the JSONPath match

101

table_name = match.full_path.left.fields[0]

102

rowcount = match.value

103

table_rowcounts[table_name] = rowcount

104

105

return table_rowcounts

106

except json.JSONDecodeError as e:

107

# Raise an error if JSON parsing fails

108

logging.error("Invalid JSON data: " + str(e))

109

raise ValueError("Invalid JSON data: " + str(e))

110

111

112

def extract_table_columns(comparison_expression, table_aliases):

113

"""

114

Extract left and right table columns from a comparison expression.

115

116

Args:

117

comparison_expression (str): The comparison expression.

118

119

Returns:

120

tuple: A tuple containing the left and right table columns if they exist,

121

or None if only one side is present in the expression.

122

"""

123

try:

124

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)

125

if match:

126

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

127

left_col = (match.group(2) if match.group(2) else "")

128

right_tab = resolve_table_names((match.group(3) if match.group(3) else ""), table_aliases)

129

right_col = (match.group(4) if match.group(2) else "")

130

return left_tab, left_col, right_tab, right_col

131

# exp like "(`mq`.`keyword` > 0)"

132

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ([\d.]+)\)", comparison_expression)

133

if match:

134

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

135

left_col = (match.group(2) if match.group(2) else "")

136

right_tab = ""

137

right_col = ""

138

return left_tab, left_col, right_tab, right_col

139

# exp like "(`mq`.`keyword` > '0')"

140

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.+')\)", comparison_expression)

141

if match:

142

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

143

left_col = (match.group(2) if match.group(2) else "")

144

right_tab = ""

145

right_col = ""

146

return left_tab, left_col, right_tab, right_col

147

# exp like "( 0 < `mq`.`keyword`)"

148

match = re.match(r"\(([\d.]+) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)", comparison_expression)

149

if match:

150

left_tab = ""

151

left_col = ""

152

right_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)

153

right_col = (match.group(3) if match.group(3) else "")

154

return left_tab, left_col, right_tab, right_col

155

# exp like "(`mq`.`keyword` = NULL)"

156

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ (.*?)\)", comparison_expression)

157

if match:

158

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

159

left_col = (match.group(2) if match.group(2) else "")

160

right_tab = ""

161

right_col = ""

162

return left_tab, left_col, right_tab, right_col

163

# exp like "(`mq`.`keyword` = 'NULL')"

164

match = re.match(r"\(`([^`]+)`\.`([^`]+)` (?:=|>|<|>=|<=)+ ('.*?')\)", comparison_expression)

165

if match:

166

left_tab = resolve_table_names((match.group(1) if match.group(1) else ""), table_aliases)

167

left_col = (match.group(2) if match.group(2) else "")

168

right_tab = ""

169

right_col = ""

170

return left_tab, left_col, right_tab, right_col

171

172

# exp like ( DATE_FORMAT(`mq`.`record_date`,'%Y') = `mt`.`year`)

173

match = re.match(r"\( ([A-Za-z_]+)\(`([^`]+)`\.`([^`]+)`\,('.+')\) (?:=|>|<|>=|<=)+ `([^`]+)`\.`([^`]+)`\)",

174

comparison_expression)

175

if match:

176

left_tab = resolve_table_names((match.group(2) if match.group(2) else ""), table_aliases)

177

left_col = (match.group(3) if match.group(3) else "")

178

right_tab = resolve_table_names((match.group(5) if match.group(5) else ""), table_aliases)

179

right_col = (match.group(6) if match.group(6) else "")

180

return left_tab, left_col, right_tab, right_col

181

182

return None, None, None, None

183

except Exception as ce:

184

logging.error(f"Error extracting table columns from '{comparison_expression}': {ce}")

185

raise

186

187

188

def resolve_table_names(table_alias, table_alias_dict):

189

"""

190

Resolve the actual table name from a given table alias.

191

192

Args:

193

table_alias (str): The table alias to resolve.

194

table_alias_dict (dict): A dictionary mapping table aliases to actual table names.

195

196

Returns:

197

str: The resolved table name.

198

"""

199

try:

200

if table_alias in table_alias_dict:

201

return table_alias_dict[table_alias]

202

elif re.match(r'.*_[0-9]+$', table_alias):

203

return table_alias[:-2]

204

else:

205

return table_alias

206

except Exception as e:

207

logging.error(f"Error resolving table name for alias '{table_alias}': {e}")

208

raise

209

210

211

def analyze_comparison(row):

212

"""

213

Analyze a comparison expression in a row and generate a suggestion based on table row counts.

214

215

Args:

216

row (dict): A dictionary representing a row of data with keys 'DATABASE_NAME', 'CMP_EXP', 'LEFT_TABLE',

217

'RIGHT_TABLE', and 'TBL_ROW_COUNTS'.

218

219

Returns:

220

dict: The input row dictionary updated with 'SUGGESTION', 'LEFT_TABLE_ROW_COUNT', and 'RIGHT_TABLE_ROW_COUNT'.

221

"""

222

try:

223

db_name = row['DATABASE_NAME']

224

cmp_element = row['CMP_EXP']

225

left_table = row['LEFT_TABLE']

226

right_table = row['RIGHT_TABLE']

227

#rowcount_dict = fetch_rowcount(row['TBL_ROW_COUNTS'])

228

rowcount_dict = parse_table_row_counts(row['PLAN_ID'],row['TBL_ROW_COUNTS'])

229

230

if rowcount_dict is None:

231

rowcount_dict = {}

232

if left_table:

233

rowcount_dict[f'{db_name}.{left_table}'] = fetch_table_row_counts(db_name, left_table)

234

if right_table:

235

rowcount_dict[f'{db_name}.{right_table}'] = fetch_table_row_counts(db_name, right_table)

236

237

suggestion = io.StringIO()

238

239

suggestion.write(f'For Expression: {cmp_element}. ')

240

left_row_count = None

241

right_row_count = None

242

243

if left_table is not None:

244

left_lookup_key = (left_table if db_name in left_table else db_name + '.' + left_table)

245

left_row_count = rowcount_dict.get(left_lookup_key, -1)

246

247

if right_table is not None:

248

right_lookup_key = (right_table if db_name in right_table else db_name + '.' + right_table)

249

right_row_count = rowcount_dict.get(right_lookup_key, -1)

250

251

if left_row_count is not None and right_row_count is not None:

252

if left_row_count < right_row_count:

253

suggestion.write(f"{left_table} has fewer records, consider table size while optimizing.")

254

elif left_row_count > right_row_count:

255

suggestion.write(f"{right_table} has fewer records, consider table size while optimizing.")

256

else:

257

suggestion.write(f"The number of records is equal on both sides of the expression.")

258

else:

259

suggestion.write(f"Unable to determine row counts for comparison: {cmp_element}.")

260

261

row['SUGGESTION'] = suggestion.getvalue()

262

row['LEFT_TABLE_ROW_COUNT'] = left_row_count

263

row['RIGHT_TABLE_ROW_COUNT'] = right_row_count

264

return row

265

except Exception as e:

266

logging.error(f"Error analyzing comparison for row '{row}': {e}")

267

raise

268

269

270

def generate_datatype_mismatch_rep(conn, output_file, batch_size):

271

"""

272

Process database records fetched using a SQL query and generate a excel report.

273

274

Args:

275

conn: The database connection object.

276

output_file (str): filepath to write.

277

batch_size (int): database query fetch size.

278

279

Returns:

280

None

281

"""

282

try:

283

logging.info("Starting database record processing...")

284

lines = []

285

total_records_processed = 0

286

data_prep_start = time.perf_counter()

287

with conn.cursor() as cursor:

288

cursor.execute(MISMATCH_DATATYPE_ROWCOUNT_COMP_QUERY)

289

while True:

290

database_records = cursor.fetchmany(batch_size)

291

logging.info(f"fetched {len(database_records)} rows")

292

if not database_records:

293

break

294

for row in database_records:

295

try:

296

parser = Parser(row['SQL_QUERY'])

297

row['LEFT_TABLE'], row['LEFT_COLUMN'], row['RIGHT_TABLE'], row[

298

'RIGHT_COLUMN'] = extract_table_columns(row['CMP_EXP'], parser.tables_aliases)

299

row = analyze_comparison(row)

300

lines.append(row)

301

302

total_records_processed += 1

303

if total_records_processed % 1000 == 0:

304

logging.info(f"Processed {total_records_processed} records...")

305

except Exception as ex:

306

logging.debug(f"while processing record: {row}")

307

logging.error(ex)

308

309

logging.debug(f"total Processed {total_records_processed} records")

310

logging.info(f'Data Preparation took {(time.perf_counter() - data_prep_start):.4f} seconds')

311

if total_records_processed > 0 :

312

report_write_start = time.perf_counter()

313

314

wb = Workbook()

315

ws = wb.active

316

ws.append(list(HEADERS))

317

318

for line in lines:

319

ws.append(list(line.values()))

320

321

wb.save(output_file)

322

323

logging.info("Writing to report completed")

324

logging.info(f'Report Writing took {(time.perf_counter() - report_write_start):.4f} seconds')

325

return True

326

else:

327

logging.info('No records to write, skipping report')

328

return False

329

330

#logging.info("Database record processing completed.")

331

332

except Exception as e:

333

logging.error(f"An error occurred during database record processing: {e}")

334

raise
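To make the row-count parsing above concrete, here is a small, self-contained example of what parse("$.*.rowcount") extracts. The JSON shape is an assumption based on how parse_table_row_counts and analyze_comparison read it (top-level keys of the form database.table, each carrying a rowcount field).

import json

from jsonpath_ng import parse

# Hypothetical optimizer-notes fragment shaped the way parse_table_row_counts expects.
sample_notes = json.dumps({
    "db1.orders": {"rowcount": 1200000},
    "db1.customers": {"rowcount": 35000},
})

matches = parse("$.*.rowcount").find(json.loads(sample_notes))
rowcounts = {m.full_path.left.fields[0]: m.value for m in matches}
print(rowcounts)  # {'db1.orders': 1200000, 'db1.customers': 35000}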

Process the DataFrame and generate a report for each query in the CSV

In [11]:

def process_dataframe(query_csv_dataframe):
    """
    Process a DataFrame containing query data.

    Parameters
    ----------
    query_csv_dataframe : pandas.DataFrame
        The DataFrame containing query data.
    """
    excel_report_file = f'{local_dir_path}/perf_troubleshoot_report.xlsx'
    with pd.ExcelWriter(excel_report_file, engine="openpyxl") as xlwriter:

        for idx, row in query_csv_dataframe.astype(str).iterrows():
            query_id = row['QueryID']
            query_name = row['QueryName']
            query = row['QueryTxt']

            logging.debug(f'about to execute {query_name}')
            xlwriter.book.create_sheet(query_name[:30])

            try:
                # Execute the query
                result = execute_query(conn, query)

                logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                # Fetch the template for the result page
                template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                if not result:
                    # If the result is empty, replace the template placeholder with an empty table
                    final_content = template.replace('rstable', empty_result_table)
                else:
                    # If the result is not empty, format it as an HTML table and replace the template placeholder
                    result_df = pd.DataFrame(result)
                    result_df.columns = map(str.upper, result_df.columns)
                    result_table_id = 'rstbl'
                    result_table_content = result_df.to_html(table_id=result_table_id,
                                                             index=False,
                                                             classes=result_table_html_classes)

                    final_content = template.replace('rstable', result_table_content)
                    result_df.to_excel(xlwriter, sheet_name=query_name, index=False)

                # Write the final content to an HTML file
                report_file = f'{local_dir_path}/{query_id}.html'
                with open(report_file, 'w') as writer:
                    writer.write(final_content)

            except Exception as curr_iter_err:
                # If an exception occurs during query execution, log the error and show a warning message
                logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {curr_iter_err}")
                logging.exception("Exception details")
                show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")

            logging.info(f'process completed for ID:{query_id} Name:{query_name}')

    logging.info('Result Pages are generated')
    logging.info('Excel Report perf_troubleshoot_report.xlsx is generated')

Function to clean up generated directories

In [12]:

1

def clean_dir(p_dir_path, p_archive_file_path):

2

"""

3

Clean the local directory by removing all HTML files and the archive file.

4

5

Parameters

6

----------

7

p_dir_path : str

8

The path to the local directory.

9

p_archive_file_path : str

10

The path to the archive file.

11

12

"""

13

# Remove the archive file

14

try:

15

os.remove(p_archive_file_path)

16

logging.info('Local archive file removed')

17

logging.info('about to clean previous generated files in local dir')

18

shutil.rmtree(p_dir_path)

19

except OSError as e:

20

logging.error('Clean up failed')

21

execution_success = False

22

error_msg = 'clean up failed'

23

print("Error: %s : %s" % (dir_path, e.strerror))

24

raise Exception(f'Failed to clean up {str(e)}')

In [13]:

if connection_url.endswith('/'):
    show_warn('Database not selected. Please select a database from the dropdown at the top of the web page.')
else:
    execution_success = True
    final_file_path = None
    error_msg = None
    try:
        if verify_stage_area():
            # Establish a database connection, using dict as the results_type
            conn = s2.connect(results_type='dict')
            logging.info('Database connection established')
            # Load query data from the CSV file into a pandas DataFrame
            queries_df = load_query_data(url=query_data_url)
            logging.info('Query data loaded')
            # Create a local directory for storing result files
            local_dir_path = (s2_workspace_name + local_output_dir_suffix)
            path = Path(local_dir_path)
            path.mkdir(exist_ok=True)

            process_dataframe(queries_df)

            mismatch_report_file_path = f"{local_dir_path}/datatype-mismatch-comparison-report.xlsx"
            generate_datatype_mismatch_rep(conn, output_file=mismatch_report_file_path, batch_size=1000)

            copy_index_file(local_dir_path, template_url_base)

            # Create a tar.gz archive of the result files
            final_file_path = s2_workspace_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'

            zip_success = make_tarfile(final_file_path, local_dir_path)

            logging.info('archive created')
            # Upload the archive to the stage area
            if zip_success:
                try:
                    uploaded_obj = s2_workgroup_stage.upload_file(local_path=final_file_path,
                                                                  stage_path=f'{stage_folder_path}/{final_file_path}')
                    logging.info(f'Upload success. Path: {uploaded_obj.abspath()}')
                    print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                    logging.info('Upload success')
                except Exception as e:
                    # If an exception occurs during the upload process, log the error
                    execution_success = False
                    logging.error(f'Failed during upload process: {e}')
                    error_msg = 'File Upload failed'

                clean_dir(local_dir_path, final_file_path)
                logging.info('Local files cleaned')

            else:
                # If creating the archive fails, set execution_success to False and log the error
                logging.error('Failed to create archive')
                execution_success = False
                error_msg = 'Failed to create archive'

        else:
            # If verifying the stage area fails, set execution_success to False and log the error
            logging.info("Stage Area Verification Failed. Exiting.")
            print('Script execution Failed')
            execution_success = False
            error_msg = 'Failed to create missing stage area path or it is not writeable'
    except Exception as e:
        execution_success = False
        logging.error(f"An error occurred: {e}")
        logging.exception("Exception details")
        error_msg = f'Exception occurred. {str(e)}'

    # Display a success or error message based on the execution success
    if execution_success:
        show_success(generate_stage_link('Upload to stage success, File: ', stage_folder_path, final_file_path))
    else:
        show_error(error_msg)

    logging.info(f'Script execution completed successfully: {execution_success}')

Important NOTE

  • The suggested actions suit most performance-improvement scenarios; still, we encourage you to test and verify them before applying them to production environments

  • To run this notebook on a schedule, modify the Python code to read its configuration from a table instead of user input (see the sketch below)
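For example, a scheduled variant might read its parameters from a small configuration table rather than from the Variables cell. The table name (perf_notebook_config) and its columns below are purely hypothetical and shown only as a sketch.

# Hypothetical example: read notebook parameters from a config table instead of user input.
config_conn = s2.connect(results_type='dict')
with config_conn.cursor() as cur:
    cur.execute("SELECT param_name, param_value FROM perf_notebook_config")
    config = {rec['param_name']: rec['param_value'] for rec in cur.fetchall()}
config_conn.close()

# Fall back to the values already defined in the Variables cell when a parameter is absent.
stage_folder_path = config.get('stage_folder_path', stage_folder_path)
query_data_url = config.get('query_data_url', query_data_url)
template_url_base = config.get('template_url_base', template_url_base)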

Details

About this Template

This notebook helps you troubleshoot database performance.

This notebook can be run in Standard and Enterprise deployments.

Tags

troubleshoot, performance, tuning, starter

License

This notebook has been released under the Apache 2.0 open source license.
