Data architecture advice needed:
I am trying to architect a SaaS database for a multi-tenant application (i.e., all customers will share the same database).
One of the key requirements for a SaaS application is the ability for customers to add custom fields of various types (int, float, string, long, etc.) to extend the application to their specific use case. I also need to record the history of changes to each custom field as they happen (for design/scale purposes, expect about 1 change per week per field).
How would you design a table for this? A few options I'm considering are:
1. Have a pre-defined custom fields table (say with 24 or 48 columns, which would be the limit on custom attributes a customer can define), and only let them set the label for each pre-defined column (custom_field_1, custom_field_2, ...). Q: Will I need a table for each data type? I have no way of knowing what types they will use.
2. Have a unique custom table for each customer (that would be 1,000 tables for 1,000 customers, so this table would not be multi-tenant). Q: Will this put a lot of load on my database, since every customer will be using their table extensively?
3. EAV (Entity-Attribute-Value) approach: store a row every time a value is entered (see the sketch below). Q: I like this, but how do I represent data that has not been provided (null)? Do I make an entry for null when a custom attribute is created by the customer?
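To make option 3 concrete, here is a rough sketch of what I'm imagining in plain SQL; the table and column names (custom_field_def, custom_field_value, custom_field_history, entity_id, etc.) are just placeholders, not an actual design:

```sql
-- Field definitions per tenant: the customer names the field and picks a type.
CREATE TABLE custom_field_def (
  tenant_id   BIGINT NOT NULL,
  field_id    BIGINT NOT NULL,
  label       VARCHAR(255) NOT NULL,
  value_type  VARCHAR(16) NOT NULL,     -- 'int', 'float', 'string', 'long', ...
  PRIMARY KEY (tenant_id, field_id)
);

-- One row per (entity, field) value; only the column matching value_type is set.
CREATE TABLE custom_field_value (
  tenant_id    BIGINT NOT NULL,
  entity_id    BIGINT NOT NULL,         -- the record being extended
  field_id     BIGINT NOT NULL,
  value_int    BIGINT,
  value_float  DOUBLE,
  value_string TEXT,
  updated_at   DATETIME NOT NULL,
  PRIMARY KEY (tenant_id, entity_id, field_id)
  -- One possible convention: the absence of a row means "not provided",
  -- so no explicit null entry would be needed when a field is defined.
);

-- Change history, roughly 1 row per field per week at the expected rate.
CREATE TABLE custom_field_history (
  tenant_id  BIGINT NOT NULL,
  entity_id  BIGINT NOT NULL,
  field_id   BIGINT NOT NULL,
  old_value  TEXT,                      -- serialized; NULL for the first write
  new_value  TEXT,
  changed_at DATETIME NOT NULL,
  KEY (tenant_id, entity_id, field_id, changed_at)
);
```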
This is really orthogonal to SingleStore and applies to relational databases in general. There are some good YouTube videos on this, for example, and I'm sure there are articles and books on it as well. One thing I would consider if I were you is to have the tables that may need extra fields just use one JSON column that holds any of the extra, non-null fields. A lot of our customers do that.
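As a rough sketch of what that could look like (the orders table and the custom field keys here are made up, not from your schema), using SingleStore's JSON column type and JSON_EXTRACT_STRING:

```sql
-- Fixed core columns plus one JSON column holding only the non-null custom fields.
CREATE TABLE orders (
  tenant_id     BIGINT NOT NULL,
  order_id      BIGINT NOT NULL,
  status        VARCHAR(32) NOT NULL,
  custom_fields JSON,                   -- e.g. {"warehouse_code": "B7", "priority": 3}
  PRIMARY KEY (tenant_id, order_id)
);

-- Only store the keys the customer actually provided; omitted keys act as "null".
INSERT INTO orders (tenant_id, order_id, status, custom_fields)
VALUES (42, 1001, 'open', '{"warehouse_code": "B7", "priority": 3}');

-- Reading a custom field back.
SELECT order_id,
       JSON_EXTRACT_STRING(custom_fields, 'warehouse_code') AS warehouse_code
FROM orders
WHERE tenant_id = 42;
```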
Also, there are multiple approaches to multi-tenancy, but having one DB per tenant is popular. You can create hundreds of DBs in one cluster. An advantage of this is that it's not too hard to move a DB to another cluster if it gets too big. But of course you then can't easily query across multiple tenants' data in one query.
We support cross-DB queries, so if you have data that needs to span tenants, you can keep that in one global DB.
Then if a tenant leaves, you can drop the database and its backups, and all their data is gone.
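Roughly like this (the database and table names here are just illustrative):

```sql
-- One database per tenant, plus a shared "global" database.
CREATE DATABASE tenant_0001;
CREATE DATABASE tenant_0002;
CREATE DATABASE app_global;

-- Cross-DB query: join a tenant's table against shared data in the global DB.
SELECT o.order_id, p.plan_name
FROM tenant_0001.orders AS o
JOIN app_global.plans   AS p ON p.plan_id = o.plan_id;

-- When a tenant leaves, their data goes with the database.
DROP DATABASE tenant_0001;
```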
@hanson - if you have hundreds of DBs, won't that create a big overhead in terms of local disk storage and allocated RAM, since each database statically allocates storage + memory?
There is a fixed overhead per DB but it’s not too bad. I don’t know the exact numbers, but we reduced the per-DB overhead dramatically about 2 or 3 years ago.