Over the last several months, we have been developing our new Shared Tier for the SingleStore Helios cloud service, and I would like to share the rationale behind some of our design decisions. I'd also like to explain how we built our own custom shared MySQL proxy, and the challenges that came with it.
Some of the high-level design goals for this project were:
- It needs to be cost-efficient to keep hundreds of users in the same underlying SingleStore cluster
- Ship fast and learn from the community's usage. This project can't take years to ship; we need to get it out quickly
- It needs to work on top of the existing foundations of SingleStore Helios cloud and still be performant
- It should support the existing APIs that SingleStore already supports (MySQL, Mongo, HTTP, etc.)
- Improve our free trials and developer experience
- Increase SingleStore's exposure to those for whom our normal tier is too expensive
SingleStore was not initially built as a multi-tenant or serverless database, so we had to get creative to build an efficient shared multi-tenancy system on top of it in a cost-efficient manner, without major refactors.
Fortunately, we are building this new tier on top of a system with very solid foundations. SingleStore is built in a way that storage and compute can be completely decoupled, so we can aggressively detach any idle database and reattach it very quickly (~2-5 seconds). We believed that by combining this built-in attach/detach mechanism with a proxy that automatically reattaches idle databases, we could make this new tier cost-efficient and still very useful to users with lower storage requirements.
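As a rough sketch (not our actual implementation), the attach-on-demand path could look something like this in Go, assuming an admin connection to the cluster; the ensureAttached helper is hypothetical, and the error handling is deliberately simplified:

import (
	"context"
	"database/sql"
	"time"
)

// ensureAttached reattaches a detached database before the proxy routes a
// new connection to it. The database identifier must be validated before
// being spliced into SQL like this; we skip that here for brevity.
func ensureAttached(ctx context.Context, admin *sql.DB, database string) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	// Reattaching an idle database typically takes ~2-5 seconds. A real
	// implementation would distinguish "already attached" from genuine
	// failures; we gloss over that here.
	_, err := admin.ExecContext(ctx, "ATTACH DATABASE "+database)
	return err
}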
Regarding isolation and noisy neighbors, everything works on top of existing database features (database-level RBAC, resource pools, etc). This is something we will keep improving and will cover in a different post.
We also tweaked some other database configurations to be more lightweight. By default, SingleStore databases are created to support huge amounts of storage and an insanely high ingest rate, which make the overhead per individual database higher than it needs to be for most use cases.
Custom MySQL proxy
This is the main component we want to cover in this post. At a very high level, our architecture looks something like this:
We explored existing open source MySQL proxies, but considering our custom requirements, it made more sense to build something ourselves. It turned out not to be that hard, and this way we have more control. At the end of the day, this proxy only needs to understand MySQL packets until the connection is established. After that, it can work as a dumb proxy that blindly forwards packets back and forth.
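For illustration, once the connection is established, the forwarding phase can be as simple as two io.Copy loops. This is a minimal sketch rather than our exact code:

import (
	"io"
	"net"
)

// pipe blindly forwards bytes in both directions once the handshake phase
// is over. When either side closes, we tear down both connections.
func pipe(client, server net.Conn) {
	done := make(chan struct{}, 2)
	go func() { io.Copy(server, client); done <- struct{}{} }()
	go func() { io.Copy(client, server); done <- struct{}{} }()
	<-done // one side closed or errored
	client.Close()
	server.Close()
}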
The most important packet is the handshake response packet. We parse this packet in the proxy so that we can extract the connection's username and database, and make sure they exist in the compute session chosen for that connection. If they don't yet exist, the proxy seamlessly creates those resources on demand, typically adding 2-4 seconds of overhead for idle databases. The username/database pair can also be used as routing information for the connection.
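To make the on-demand creation concrete, here is a rough sketch of what that provisioning step could look like. The ensureUserAndDatabase helper and the exact statements are illustrative assumptions, not our actual implementation:

import (
	"context"
	"database/sql"
)

// ensureUserAndDatabase provisions the resources for a new username/database
// pair on first use. Identifiers must be validated before being spliced into
// SQL like this; we skip that here for brevity.
func ensureUserAndDatabase(ctx context.Context, admin *sql.DB, user, database string) error {
	stmts := []string{
		"CREATE DATABASE IF NOT EXISTS " + database,
		"CREATE USER IF NOT EXISTS '" + user + "'",
		"GRANT ALL ON " + database + ".* TO '" + user + "'",
	}
	for _, stmt := range stmts {
		if _, err := admin.ExecContext(ctx, stmt); err != nil {
			return err
		}
	}
	return nil
}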
The proxy basically sits in the middle of the connection handshake and simulates both sides of the exchange, so that it can read the handshake response packet.
// Proxy code in Go (error handling omitted for brevity)
// 1. forward the server's initial handshake packet to the client
serverInitialHandshakePacket, err := dbConn.ReadMessage()
clientConn.WriteMessage(serverInitialHandshakePacket)
// 2. read the packet where the client forces the upgrade to TLS,
//    and forward it to the server
sslRequestPacket, err := clientConn.ReadMessage()
dbConn.WriteMessage(sslRequestPacket)
// 3. upgrade both connections to TLS (the SSL handshakes happen here);
//    the proxy acts as a TLS client towards the database and as a TLS
//    server towards the client
tlsDbConn := tls.Client(dbConn, tlsConfig)
tlsClientConn := tls.Server(clientConn, tlsConfig)
// 4. read the packet with the username and database
sslHandshakeResponsePacket, err := tlsClientConn.ReadMessage()
userName, database, err :=
    mysql.ExtractUsernameDatabase(sslHandshakeResponsePacket)
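For reference, here is a simplified sketch of what an ExtractUsernameDatabase helper could look like, based on the MySQL HandshakeResponse41 layout. It assumes the 4-byte packet header has already been stripped and that the auth response is prefixed with a 1-byte length (the CLIENT_SECURE_CONNECTION case); a production parser must branch on the client's capability flags:

import (
	"bytes"
	"encoding/binary"
	"errors"
)

const clientConnectWithDB = 0x00000008 // CLIENT_CONNECT_WITH_DB capability flag

func ExtractUsernameDatabase(payload []byte) (user, database string, err error) {
	// capability flags (4) + max packet size (4) + charset (1) + filler (23)
	if len(payload) < 32 {
		return "", "", errors.New("handshake response too short")
	}
	flags := binary.LittleEndian.Uint32(payload[:4])
	rest := payload[32:]

	// the username is a NUL-terminated string
	i := bytes.IndexByte(rest, 0)
	if i < 0 {
		return "", "", errors.New("malformed username")
	}
	user = string(rest[:i])
	rest = rest[i+1:]

	// skip the length-prefixed auth response
	if len(rest) == 0 || len(rest) < 1+int(rest[0]) {
		return "", "", errors.New("malformed auth response")
	}
	rest = rest[1+int(rest[0]):]

	// the database name follows, NUL-terminated, only if the client set
	// CLIENT_CONNECT_WITH_DB
	if flags&clientConnectWithDB != 0 {
		if j := bytes.IndexByte(rest, 0); j >= 0 {
			database = string(rest[:j])
		} else {
			database = string(rest)
		}
	}
	return user, database, nil
}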
We faced some interesting bugs while building this proxy:
- When doing stress tests, we realized that a very small percentage of connections were just hanging
- This wasn't happening in every environment, nor with every driver
- It seemed to happen more often with clients using OpenSSL v1.1
This bug took some days of investigation, mainly because we initially thought it was unrelated to our proxy code. One thing that really helps with tricky bugs like this is finding a consistently reproducible example. Once we had one, some Wireshark captures and some debug logging added inside the Go TLS implementation finally led us to the root cause.
This problem is very specific to how the MySQL protocol works: the first packets are always sent unencrypted, and the SSL handshake only happens later in the process. The bug was a race in the proxy between reading the last unencrypted packet sent by the client (the SSLRequest) and the first ClientHello packet of the SSL handshake.
These two packets arrive back to back, and if we are not careful, we can easily read too much from the buffer while handling the first packet, swallowing bytes that belong to the ClientHello. Reading that initial packet with a fixed-size buffer, before letting Go upgrade the connection to TLS, fixed the problem. This works because the SSLRequest packet has a fixed size.
// read the fixed-size (36-byte) SSLRequest packet from the still-unencrypted
// client connection, without buffering past the end of the packet
sslRequestPacketBuffer := make([]byte, 36)
sslRequestPacket, err :=
    clientConn.ReadWithCustomBuffer(sslRequestPacketBuffer)
// only now do we upgrade the connections to TLS
tlsDbConn := tls.Client(dbConn, tlsConfig)
tlsClientConn := tls.Server(clientConn, tlsConfig)
We have already started rolling out a preview of this new offering, and we are looking for design partners and feedback as we mature this new feature! If you don't have it enabled yet and would like to give it a try, feel free to reach out to me directly, or through our Forums.