Re: plpgsql variable assignment not supporting distinct anymore - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: plpgsql variable assignment not supporting distinct anymore
Date
Msg-id CAFj8pRDCz-Pz-oZ1bROJaRNzYA+7=8SN_BBYONaikyCCihZ0pA@mail.gmail.com
In response to plpgsql variable assignment not supporting distinct anymore  (easteregg@verfriemelt.org)
Responses Re: plpgsql variable assignment not supporting distinct anymore
List pgsql-hackers


On Fri, 22 Jan 2021 at 14:41, <easteregg@verfriemelt.org> wrote:
the code provided is just a small PoC to reproduce the error (which i did not include in my first mail, sorry):

   ERROR:  syntax error at or near "DISTINCT"
   LINE 8:     _test := DISTINCT a FROM ( VALUES ( (true), ( true ) ) )...


the code in production looked like this:


    _resource_id :=
        DISTINCT ti_resource_id
           FROM tabk.resource_timeline
          WHERE ti_a2_id = _ab2_id
            AND ti_type = 'task'
    ;

this is backed by a trigger function which ensures that, for all rows with the same ti_a2_id, only one ti_resource_id exists, hence the query can never fail due to more than one row being returned. but this syntax is not supported anymore, which will break BC. up until PG 13, the assignment statement was just an implicit SELECT <expression> query.
Since Tom Lane didn't mention this change in the other thread, i figured the dev team might not be aware of it.

i can refactor this line into

    _resource_id :=
        ti_resource_id
       FROM tabk.resource_timeline
      WHERE ti_a2_id = _ab2_id
        AND ti_type = 'task'
      GROUP BY ti_resource_id
    ;

but concerns about BC were already raised, although with UNION there might be far fewer people affected.
with kind regards, richard

Probably the fix is not hard, but it is almost the same situation as the UNION case. The result of your code is not deterministic.

If there is more than one distinct ti_resource_id, then some values can be ignored at random when a hash aggregate is used.

The safe fix should be

_resource_id := (SELECT ti_resource_id
                   FROM tabk.resource_timeline
                  WHERE ti_a2_id = _ab2_id
                    AND ti_type = 'task');

and you get an exception if the subquery returns more than one row, instead of some values being silently ignored. Or, if you do want to ignore some values, then you can write

_resource_id := (SELECT MIN(ti_resource_id) -- or MAX
                   FROM tabk.resource_timeline
                  WHERE ti_a2_id = _ab2_id
                    AND ti_type = 'task');

Using DISTINCT is not a good solution.
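
A minimal sketch of the difference between the two forms above, assuming a temporary table that stands in for tabk.resource_timeline (the table definition, column types and sample rows are hypothetical, chosen only so that two different ti_resource_id values match):

```sql
-- Hypothetical stand-in for tabk.resource_timeline; types and data are assumptions.
CREATE TEMP TABLE resource_timeline (
    ti_a2_id       int,
    ti_resource_id int,
    ti_type        text
);

INSERT INTO resource_timeline VALUES
    (1, 10, 'task'),
    (1, 20, 'task');  -- a second, different resource id for the same ti_a2_id

DO $$
DECLARE
    _ab2_id      int := 1;
    _resource_id int;
BEGIN
    -- Scalar subquery: two rows match, so the assignment raises an error
    -- rather than silently picking one of the values.
    BEGIN
        _resource_id := (SELECT ti_resource_id
                           FROM resource_timeline
                          WHERE ti_a2_id = _ab2_id
                            AND ti_type = 'task');
    EXCEPTION WHEN cardinality_violation THEN
        RAISE NOTICE 'scalar subquery returned more than one row';
    END;

    -- Aggregate: always produces exactly one row, and the choice is explicit.
    _resource_id := (SELECT min(ti_resource_id)
                       FROM resource_timeline
                      WHERE ti_a2_id = _ab2_id
                        AND ti_type = 'task');
    RAISE NOTICE 'min(ti_resource_id) = %', _resource_id;
END
$$;
```

With the sample rows, the first assignment should fall into the exception handler and the second should assign the smaller id. If the trigger really guarantees a single ti_resource_id per ti_a2_id, both forms return the same value and the scalar subquery simply documents that expectation.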



