In my last blog post, I investigated a Linux performance issue affecting a specific customer workload. In this post, I will introduce the tool I created to drive that investigation.
Recently, a customer was running a test where data was loaded into SingleStore via `LOAD DATA`. The customer’s third-party benchmarking tool found that SingleStore took twice as long to load the same amount of data as a competing database; however, the numbers reported by this tool did not make sense. Local tests had shown SingleStore could process ~200 MB/s of data on a single machine¹, but the third-party tool reported SingleStore was only processing ~60 MB/s of data. Looking at raw TCP socket data on both sides of the connection, I realized that the benchmarking tool was inadvertently throttling itself when running against SingleStore. When I used the standard `mysql` client to run the `LOAD DATA` command directly, performance doubled as SingleStore saturated the network card and matched competitor performance at 128 MB/s².
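For example, a quick way to see this kind of self-throttling from the shell is to watch the kernel’s per-connection TCP state on both hosts while the load runs. One option is `ss` (the port filter below assumes the default MySQL-protocol port, 3306):

```
# Show TCP internals (cwnd, rtt, send/receive queues) for the connection.
ss -ti '( dport = :3306 or sport = :3306 )'
```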
The problem here was that the customer was passively running their tool and only looking at the final result. I follow a methodology known as active benchmarking, a technique for validating the results of a benchmark. The original description summarizes the active benchmarking process quite well:
> While the benchmark is running, analyze the performance of all components involved using other tools, to identify the true limiter of the benchmark.
Every active benchmarking session is a fully-fledged performance investigation that takes more time and effort. But it turns up the actual bottleneck³ and reveals the maximum possible performance.
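In practice, that means running the usual Linux observability tools alongside the benchmark and checking each layer for saturation. A minimal starting point (standard procps/sysstat tools, not specific to any database):

```
vmstat 1        # CPU saturation, run-queue length, memory pressure
iostat -x 1     # per-disk utilization and I/O queue depth
sar -n DEV 1    # NIC throughput: are we already at line rate?
```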
## dbbench
When I first joined SingleStore, I found myself frequently writing one-off scripts to simulate complicated workloads for these active benchmarking investigations. Having previously used `fio` to test filesystems, I wanted a similar tool for defining a custom workload and running it efficiently. I found the existing tools to be fairly restrictive:

- `mysqlslap` is only really designed to execute one type of query at a time, and `sysbench` only supports a very small set of workloads as first-class citizens. I wanted to test more complicated workloads.
- Both `mysqlslap` and `sysbench` cap the number of concurrent connections at the configured number of threads if the database cannot keep up with the workload. I want to test and observe queueing in the database (not in my benchmarking tool) as queries pile up.
- Both `mysqlslap` and `sysbench` are designed first and foremost to be stand-alone tests: they want to create their own managed tables with their own data.
- HammerDB is a graphical tool that does not appear to have an easy way to be configured otherwise. I wanted to use simple text files for defining workloads (à la `fio`; see the example job file below) so I could share them via email or check them into a git repository.
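For reference, this is roughly what an `fio` job file looks like (a made-up example, not from any particular investigation); the INI style it uses is the same one `dbbench` adopts below:

```ini
[global]
ioengine=psync
directory=/tmp
size=1g

[seq-read]
rw=read
bs=128k
```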
In addition, I had concerns about the performance overhead of running these thread-based workload drivers that generate queries or data⁴. I commonly run the workload generator on the database server itself, where it is unacceptable for the generator to compete with the database for resources. I needed the tool to be as lightweight as possible.
Ultimately, these concerns motivated me to write a new tool, `dbbench`. `dbbench` is a fast, lightweight database workload generator that executes a workload defined in a flexible configuration file. `dbbench` can behave like the other stand-alone testing tools, but it really shines as a tool for actively benchmarking complicated workloads. The following configuration file describes a common analytics pattern: queries running against a table that receives streaming writes and is periodically cleaned of old rows:
```ini
[streaming inserts]
; Insert 100 rows per second.
query=insert into t (value) select rand() * 1000
rate=100

[clean old rows from table]
; Every 100s, clean old rows from the table.
query=delete from t where insert_date < now() - interval 1 hour
rate=0.01

[read nearest values]
; 8 clients reading as fast as possible.
query=select count(*) from t where value between 900 and 1000
queue-depth=8
```
Given this configuration file, `dbbench` will run forever and log some simple statistics about each job. While it is running, I can very easily observe interesting performance behaviors, such as how the performance of read queries changes dramatically after the table is cleaned up. In the snippet from the logs below, the average latency of the “read nearest values” job drops from 225ms to 56ms after the table is cleaned:
2016/01/20 02:56:06 **streaming inserts**: latency 3.078086ms±1.173286ms; 100 transactions (100.177 TPS); 100 rows (100.177 RPS)
2016/01/20 02:56:06 **read nearest values**: latency _225.831701ms_±63.85481ms; 39 transactions (28.282 TPS); 39 rows (28.282 RPS)
2016/01/20 02:56:06 **clean old rows from table**: latency 7.824990134s±0; 2 transactions (0.155 TPS); 3259108 rows (253306.618 RPS)
2016/01/20 02:56:07 **streaming inserts**: latency 2.785273ms±931.039µs; 100 transactions (100.870 TPS); 100 rows (100.870 RPS)
2016/01/20 02:56:07 **read nearest values**: latency _56.268517ms_±5.004933ms; 143 transactions (131.732 TPS); 143 rows (131.732 RPS)
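To make the `rate` and `queue-depth` semantics above concrete, here is a minimal sketch in Go (the language `dbbench` is written in) of how these two kinds of job drivers can behave. This is an illustration under my own assumptions rather than `dbbench`’s actual implementation, and the DSN and queries are placeholders:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// runAtRate issues the query `rate` times per second. Each query runs in
// its own goroutine, so a slow database accumulates in-flight queries
// instead of throttling the generator.
func runAtRate(db *sql.DB, query string, rate float64) {
	ticker := time.NewTicker(time.Duration(float64(time.Second) / rate))
	defer ticker.Stop()
	for range ticker.C {
		go func() {
			if _, err := db.Exec(query); err != nil {
				log.Print(err)
			}
		}()
	}
}

// runAtDepth keeps exactly `depth` queries in flight, issuing them as
// fast as the database will answer.
func runAtDepth(db *sql.DB, query string, depth int) {
	for i := 0; i < depth; i++ {
		go func() {
			for {
				if _, err := db.Exec(query); err != nil {
					log.Print(err)
				}
			}
		}()
	}
}

func main() {
	// Hypothetical DSN; adjust for your server.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	go runAtRate(db, "insert into t (value) select rand() * 1000", 100)
	go runAtDepth(db, "select count(*) from t where value between 900 and 1000", 8)
	select {} // run forever, like dbbench
}
```

The key property of the rate-driven job is that it never slows down to match the database: queries keep being issued on schedule, so any queueing builds up in the database itself, where it can be observed.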
I recently used a very similar configuration to test a customer workload and noticed that write query throughput dropped precipitously after the table cleanup was executed. It turned out that the background log merger thread was misconfigured to be overly aggressive on the SingleStore instance. When the table cleanup executed, the background thread kicked in and used up so many disk IOPS that the transaction log was unable to keep up with incoming writes.
`dbbench` was instrumental in this investigation because it provided a simple way to describe the workload and a reliable way to execute it. By monitoring and questioning the data throughout the execution of the benchmark, I was able to observe and fix a complicated performance anomaly.
`dbbench` version 0.1 is publicly available under an Apache license. I’m actively improving and stabilizing `dbbench`, and SingleStore is already using it for many customer proofs of concept and performance testing. Try out or contribute to `dbbench` on GitHub today.
1. During the investigation, I found an easy way to improve this to 1 GB/s.
2. We were bottlenecked on the 1 Gigabit Ethernet card.
3. Including in the benchmarking tool itself.
4. For example, the new sysbench 0.5 Lua interface has a noticeable CPU overhead.