Altering pipeline avro schema

i_will · November 14, 2019, 2:12pm

Hi,

We are creating a pipeline that uses an Avro schema to ingest data into tables from Kafka. Since Avro gives us the flexibility to evolve our schema, we are looking for a similar feature in pipelines too.

But with ALTER PIPELINE we noticed that there is no option to update the schema; we can only set offsets or transforms. So the only solution left to achieve this is:
1) Stop the old pipeline
2) Fetch the offsets of all partitions
3) Create a new pipeline with the updated schema
4) Alter the new pipeline to use the previous pipeline's offsets
5) Drop the old pipeline
6) Start the new pipeline
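
In SQL, the six steps above might look roughly like this. It is only a sketch: the pipeline, table, topic and field names (`events_v1`, `events_v2`, `events`, `kafka-host:9092/events`) and the offset values are made up for illustration, and the column names of `information_schema.PIPELINES_CURSORS` should be checked against your MemSQL version.

```sql
-- 1) Stop the old pipeline (hypothetical name: events_v1).
STOP PIPELINE events_v1;

-- 2) Read the current offset of every Kafka partition.
SELECT SOURCE_PARTITION_ID, CURSOR_OFFSET
FROM information_schema.PIPELINES_CURSORS
WHERE PIPELINE_NAME = 'events_v1';

-- 3) Create a new pipeline with the updated Avro schema
--    (here a hypothetical "email" field with a default was added).
CREATE PIPELINE events_v2 AS
LOAD DATA KAFKA 'kafka-host:9092/events'
INTO TABLE events
FORMAT AVRO
(id <- id, name <- name, email <- email)
SCHEMA '{"type":"record","name":"Event","fields":[
  {"name":"id","type":"long"},
  {"name":"name","type":"string"},
  {"name":"email","type":"string","default":""}]}';

-- 4) Carry over the offsets from step 2 (placeholder values shown;
--    verify whether the stored offset is the last one read or the next one).
ALTER PIPELINE events_v2 SET OFFSETS '{"0": 1200, "1": 1187}';

-- 5) Drop the old pipeline.
DROP PIPELINE events_v1;

-- 6) Start the new pipeline.
START PIPELINE events_v2;
```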

This looks tedious, especially maintaining the offsets. Does MemSQL have any plans to support schema evolution, or is there a better way to achieve this use case?

JoYo · November 14, 2019, 11:52pm

create or replace pipeline <name> <new pipeline definition>
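
As a rough illustration of this suggestion, assuming a hypothetical pipeline `events_pipeline` that loads the hypothetical topic `events` into a table `events`, replacing the definition in place with a new Avro schema might look like the sketch below. As far as I understand the documentation, `CREATE OR REPLACE PIPELINE` keeps the existing pipeline's state, including its offsets, but please verify that on your version.

```sql
-- Replace the definition of an existing pipeline in place.
-- All names and the schema are placeholders, not taken from this thread.
CREATE OR REPLACE PIPELINE events_pipeline AS
LOAD DATA KAFKA 'kafka-host:9092/events'
INTO TABLE events
FORMAT AVRO
(id <- id, name <- name, email <- email)
SCHEMA '{"type":"record","name":"Event","fields":[
  {"name":"id","type":"long"},
  {"name":"name","type":"string"},
  {"name":"email","type":"string","default":""}]}';
```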

JoYo · November 14, 2019, 11:53pm

Also, show create pipeline <name> extended will give you the exact ALTER PIPELINE statement you need to restore the offsets, so you don’t have to generate it yourself.
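
For example, with a hypothetical pipeline named `events_pipeline`, the statement would be run as below. The output is not reproduced here because it depends on the pipeline and the version.

```sql
-- Prints the CREATE PIPELINE statement and, with EXTENDED,
-- the ALTER PIPELINE statement(s) that reproduce the current offsets.
SHOW CREATE PIPELINE events_pipeline EXTENDED;
```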

hari.nair · November 20, 2019, 10:48pm

We have exactly the same issue. But the drop/replace solution doesn’t quite work for us, because it would mean we would have to replace the pipeline at exactly the point where the old schema version has been drained and before the new one starts.
The scheme we have settled on now is to create a new topic and a new pipeline for each new version, and alter the table to match it. We let the old version's pipeline drain, and then drop it. (Of course all this assumes that schema changes are backward compatible.)
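
A minimal sketch of this topic-per-version scheme follows. Every name in it (`events`, `events_v1`, `events_v2`, the `email` column, the Kafka address) is a placeholder, and the drain check assumes that `information_schema.PIPELINES_CURSORS` exposes `CURSOR_OFFSET` and `LATEST_OFFSET` per partition on your version.

```sql
-- Widen the table so it accepts rows from both schema versions
-- (hypothetical new column with a default for old rows).
ALTER TABLE events ADD COLUMN email VARCHAR(255) DEFAULT '';

-- New pipeline reading the new topic with the new Avro schema.
CREATE PIPELINE events_v2 AS
LOAD DATA KAFKA 'kafka-host:9092/events_v2'
INTO TABLE events
FORMAT AVRO
(id <- id, name <- name, email <- email)
SCHEMA '{"type":"record","name":"Event","fields":[
  {"name":"id","type":"long"},
  {"name":"name","type":"string"},
  {"name":"email","type":"string","default":""}]}';

START PIPELINE events_v2;

-- Check how far the old pipeline is from the end of its topic.
SELECT SOURCE_PARTITION_ID, CURSOR_OFFSET, LATEST_OFFSET
FROM information_schema.PIPELINES_CURSORS
WHERE PIPELINE_NAME = 'events_v1';

-- Once producers have stopped writing to the old topic and
-- the old pipeline has caught up, retire it.
STOP PIPELINE events_v1;
DROP PIPELINE events_v1;
```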

i_will · November 25, 2019, 9:21am

@JoYo based on @hari.nair’s experience it seems create or replace doesn’t work for this. Do we have any concrete solution?

sasha · November 26, 2019, 4:06am

Agreed, there’s no good way to replace the pipeline at the exact point where the schema changes. For now, working with a new topic is in fact likely to be the best workaround. It’s possible to use a transform to rewrite records of older schemas as instances of the pipeline’s expected schema, replacing the pipeline with a new schema and transform as appropriate, but it’s not simple and carries a performance cost.

We are in fact working on schema registry integration for exactly these reasons. It’ll be ready for release soon after 7.0, though I’m not qualified to give a more precise date.

himawari · August 8, 2023, 7:00am

Hi @i_will, were you able to ingest Avro data from Kafka into SingleStore? I am trying to do the same and am facing many challenges.