Can I use a right outer join between 2 queries in a procedure?

FredV · April 9, 2020, 11:28am

Hello,

I am creating some procedures I would like to call in a pipeline and I am doing some testing about joining queries in procedures but it is not working as expected.

The setResultTypeId procedure will receive the data from the pipeline and check if the data is already present into the table and do some extra computing on the records

delimiter //
CREATE OR REPLACE PROCEDURE setResultTypeId(result_name_query Query(result_name Text)) AS
DECLARE
result_tbl QUERY(id BIGINT(19), name Text) =
SELECT rt.id, rnq.NAME as name
FROM RESULT_TYPES rt
right outer join result_name_query rnq on rt.name = rnq.name
where isnull(id);
records ARRAY(RECORD(id BIGINT(19), name Text)) = collect(result_tbl);
BEGIN
echo select * from result_tbl;
FOR rec IN records LOOP
/* extra code will be set here to create missing records , using an insert is not enough.
INSERT IGNORE INTO RESULT_TYPES(NAME, CREATED_AT) VALUES(rec.name, now());
id = last_insert_id();
call GenerateGram(rec.name,id,1);
*/
END LOOP;
END //

The test procedure is to simulate the pipeline call

delimiter //
CREATE OR REPLACE PROCEDURE test() AS
DECLARE
result_name_query Query(result_name Text) = select name from temp_r_name;
BEGIN
call setResultTypeId(result_name_query);
END //

call test();

The echo returns no row and no error

id name

but when I am doing it manually

select name from temp_r_name

name

abcd
efgh

select id, name from RESULT_TYPES
id name

1004 abcd
1006 pIC50

SELECT rt.id, rnq.NAME as name
FROM RESULT_TYPES rt
right outer join temp_r_name rnq on rt.name = rnq.name
where isnull(id);

id name

efgh

Maybe I am doing something wrong, but I can’t find some insight from the documentation

Thank you

FredV · April 10, 2020, 7:49am

I did more testing by changing the query. When I remove the right outer join and replace by a union + a group by, it is working but I have some doubts about the performance. Then I guess it is a bug or a feature that needs to be implemented.

delimiter //
CREATE OR REPLACE PROCEDURE setResultTypeId(result_name_query Query(result_name Text)) AS
DECLARE
result_tbl QUERY(name Text) =
select name from (
select max(id) as id , name from (
SELECT rt.ID as id , rt.NAME as name
FROM RESULT_TYPES rt
UNION
select null as id, rnq.result_name as name
from result_name_query rnq) group by name) where isnull(id);
records ARRAY(RECORD(name Text)) = collect(result_tbl);
BEGIN
#echo select * from result_name_query;
echo select * from result_tbl;
FOR rec IN records LOOP
/* extra code will be set here to create missing records , using an insert is not enough.
INSERT IGNORE INTO RESULT_TYPES(NAME, CREATED_AT) VALUES(rec.name, now());
id = last_insert_id();
call GenerateGram(rec.name,id,1);
*/
END LOOP;
END //

hanson · April 13, 2020, 6:13pm

It looks like you found a bug. I’ll file an internal ticket for it. The isnull() applied to the fields filled in as NULL by the right outer join is kind of a corner case. Consider a query rewrite, like a NOT EXISTS or NOT IN to get a similar result. Sounds like you already found a different one.

FredV · April 14, 2020, 2:02pm

Hi Hanson,

Thank you for the feedback.

Best Regards

Saksham · April 14, 2020, 4:02pm

The original issue with the query

    result_tbl QUERY(id BIGINT(19), name Text) =
        SELECT rt.id, rnq.NAME as name
        FROM RESULT_TYPES rt
        right outer join result_name_query rnq on rt.name = rnq.name
        where isnull(id);

was that rnq.NAME and rnq.name should probably be rnq.result_name
Ideally, MemSQL should had thrown an error here saying result_name_query has no field named name, but it didn’t and this is a bug we are fixing. You can proceed with your work by correcting the field name and I think the query then should work fine.

FredV · April 17, 2020, 4:37pm

Hello Saksham

Indeed you are right, I have made the correction and it is working perfectly.

Thank you very much !