In the MemSQL 7.0 general availability (GA) release, we fixed a bug related to the collation of JSON objects, which affects the results of the JSON_EXTRACT_STRING
function.
You can use JSON_EXTRACT_STRING
as a function, and also using the ::$
notation, e.g. json_field::$x
is the same as JSON_EXTRACT_STRING(json_field, 'x')
. Prior to 7.0, the collation of a JSON_EXTRACT_STRING
result could be different depending on whether it was called on the field of a row, or on the result of a JSON expression. This inconsistency was fixed in 7.0 GA so that the result of JSON_EXTRACT_STRING
always has collation utf8_bin
, the standard collation for JSON data.
The fix changed behavior in a way that was noticeable to some application developers, who may have relied on the previous behavior in some cases.
In MemSQL 7.1 GA, and simultaneously in the 7.0.18 patch release, we added a global variable (json_extract_string_collation
) to give additional control to application developers over the collation of their JSON_EXTRACT_STRING
results.
Default Behavior and New Global Variable to Change Collation Behavior
The default behavior for JSON_EXTRACT_STRING
output collation in MemSQL 7.0 and 7.1 is described below. The default behavior in 7.0 and 7.1 differ. You will see a change in behavior when you upgrade from 7.0 to 7.1.
-
In MemSQL version 7.1, the
JSON_EXTRACT_STRING
output will by default match the collation of the server (i.e., thecollation_server
variable value), which is typicallyutf8_general_ci
. -
In MemSQL version 7.0, the
JSON_EXTRACT_STRING
output will by default be the collation used by JSON, which is binary (in MemSQL this is calledutf8_bin
).
If your application requires different behavior than the defaults in these respective versions, you can use a new global variable json_extract_string_collation
to control this behavior. The variable is a global sync variable and the allowed settings for it are json
, server
, or auto
. By default, MemSQL will use the auto
value, which will be interpreted as json
in 7.0 and server
in 7.1.
The server
setting means the result of JSON_EXTRACT_STRING
will be the value of collation_server
.
The json
setting means the result of JSON_EXTRACT_STRING
will be the standard collation used for JSON (binary), which in MemSQL is called utf8_bin
.
The collation that matches the server collation is the most consistent with what developers will expect when developing SQL applications since other string expressions have this collation. That’s why the default behavior in 7.1 was chosen to be the same as the default server collation.
The 7.0 GA release uses the JSON collation (binary) for the output of JSON_EXTRACT_STRING
. We chose not to change this behavior by default in a patch release, to avoid potential breaking changes.
Changing The Default Collation Behavior
If you need to change the value of this variable from the default of auto
, you can update the variable via SET GLOBAL
. For example, the below command updates the variable to use a collation of json
:
SET GLOBAL json_extract_string_collation = "json";
Controlling JSON_EXTRACT_STRING
collation at the expression level
If your application requires finer-grain control than the variable allows, you can use a cast to change the collation of JSON_EXTRACT_STRING
expression results. For example, suppose that in a particular query, you wanted to use a binary collation for one JSON_EXTRACT_STRING
expression but a case-insensitive collation for another JSON_EXTRACT_STRING
expression. That can be done like so:
select …
from t
where t.json_field::$x :> text collate utf8_bin = "string1"
and t.json_field::$y :> text collate utf8_general_ci = "string2";
The syntax
:> text collate CollationName
is a type cast that includes a collation. This allows you to cast any JSON_EXTRACT_STRING
result to any desired collation.