I have a table with three fields: Id, Data, Updated. The Data field is JSON.
The table has 1.5M rows.
I am a bit confused about the query performance.
I created an index on some of the properties of the Json document.
Performing a SELECT COUNT(*) that filters on these indexed properties is very fast. For example, I have a query that returns a count of 20.
But if I run the same query as SELECT * with the same WHERE clause, it takes a couple of seconds to retrieve the rows. The JSON documents are not very large, so I am a bit confused why retrieving the whole rows is so much slower.
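Roughly, the setup and the two queries look like this (a sketch only; table, key, and value names are placeholders, JSON_EXTRACT_STRING is just for illustration, and the real WHERE clause filters on the indexed JSON properties):

```sql
-- Placeholder schema matching the description: Id, Data (JSON), Updated.
CREATE TABLE items (
    Id      BIGINT NOT NULL,
    Data    JSON NOT NULL,
    Updated DATETIME NOT NULL
);

-- Fast: returns the count (20) almost immediately.
SELECT COUNT(*)
FROM items
WHERE JSON_EXTRACT_STRING(Data, 'someKey') = 'someValue';

-- Slow: same WHERE clause, but takes a couple of seconds for the 20 rows.
SELECT *
FROM items
WHERE JSON_EXTRACT_STRING(Data, 'someKey') = 'someValue';
```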
Most of the encodings we use support subsegment access, which limits decoding at query time to only the rows the query actually needs, rather than decoding the entire segment. However, the default encoding for JSON does not support this today. You can override it by declaring the column as column_name JSON COLLATE utf8_bin NOT NULL OPTION 'SeekableLZ4'. Note that although this improves the performance of selective queries that read the entire JSON document, it decreases the compression ratio and regresses performance in cases where only a couple of keys are extracted from the document.
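As a sketch, applied to the table from the question (column names as described there; the rest of the definition is unchanged):

```sql
-- Sketch: declaring the JSON column with the SeekableLZ4 encoding option.
-- Table and column names follow the question; adjust to your schema.
CREATE TABLE items (
    Id      BIGINT NOT NULL,
    Data    JSON COLLATE utf8_bin NOT NULL OPTION 'SeekableLZ4',
    Updated DATETIME NOT NULL
);
```

So whether this helps depends on whether your workload mostly reads whole documents (as in your SELECT *) or mostly extracts individual keys from them.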