Consistent Sharding Across Databases

erica.anderson · June 8, 2022, 8:32pm

Hi!

I understand that across databases in the same cluster, the sharding strategy can be different. That is, if two tables that exist in different databases have the same shard key, they won’t necessarily be distributed the same way. Thus, joins of these tables across databases may require repartitioning.
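
To make the concern concrete, here is a minimal toy model of why equal shard keys do not imply co-located rows. It is only a sketch under assumptions: the hash-modulo placement rule, the partition counts, the leaf names, and the helper names (`partition_for`, `node_for`, `colocated`) are all hypothetical and are not SingleStore's actual hashing or placement logic.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Toy placement rule (assumption): stable hash of the shard key modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def node_for(key: str, num_partitions: int, partition_to_node: list) -> str:
    """Look up which node holds the partition that the key hashes to."""
    return partition_to_node[partition_for(key, num_partitions)]

# Hypothetical layouts: each database owns its own partitions, and the
# partition-to-node mapping is decided independently per database.
METADATA_DB = {
    "num_partitions": 4,
    "partition_to_node": ["leaf-1", "leaf-2", "leaf-1", "leaf-2"],
}
FACT_DB = {
    "num_partitions": 4,
    "partition_to_node": ["leaf-2", "leaf-1", "leaf-2", "leaf-1"],
}

def colocated(key: str, db_a: dict, db_b: dict) -> bool:
    """True if the row for this shard key lives on the same node in both databases."""
    return node_for(key, **db_a) == node_for(key, **db_b)

if __name__ == "__main__":
    for key in ["user_1", "user_2", "user_3"]:
        print(key, node_for(key, **METADATA_DB), node_for(key, **FACT_DB))
```

In this toy layout, every key hashes to the same partition number in both databases, yet the matching rows always sit on different leaves, so a join on the shard key would still have to move data between nodes.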

I’m sure there’s a good reason for this, but I was wondering if in the future you might add a configuration option to force different DBs in the same cluster to use the same sharding strategy?

The use case here is that after migrating from Vertica, my org had to arrange our data into separate DBs for backwards compatibility with existing APIs. Vertica has “schemas,” which gave us a convenient way to organize data by a particular category within the same database, denoted by a prefix, e.g.:

my_metadata_store.table_1
my_fact_store.table_2

Where my_metadata_store and my_fact_store were in the same database, but different “schemas.” Since switching to SingleStore, we have organized it such that my_metadata_store and my_fact_store are separate databases to achieve the same organization style. In hindsight, knowing what we know now about sharding, we might have approached this differently, but reorganizing this data otherwise would be a pretty big endeavor.

At query time we have many instances where cross-database joins occur. So, if we could shard those tables the same way, it would be very helpful performance-wise.

Excuse the long post, the context felt relevant. Thanks for reading!

MariaSilverhardt · June 28, 2022, 4:26pm

Thank you so much for the feedback, Erica! It means a lot to us when our community shares their thoughts. It allows us to improve and become more efficient. Your feedback has been passed to the team, and we just want you to know we appreciate your perspective. Thanks again, and we will update you about developments as they become available.