Thread: lead() with arrays - strange behaviour

lead() with arrays - strange behaviour

From
Thomas Kellerer
Date:
Consider the following dummy table (this is a simplified example from a bigger query):

    create table sample_data (id int, id_list int[]);
    insert into sample_data (id, id_list)
    values 
       (1, array[1,2,3]),
       (2, array[2,3,4]),
       (3, array[4,5,6]);

The following statement tries to find the overlapping values in id_list between the current row and the next row:

    select id, 
           id_list,
           lead(id_list) over (order by id) as next_list,
           array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
    from sample_data;       

The above returns: 

    id | id_list | next_list | common_ids
    ---+---------+-----------+-----------
     1 | {1,2,3} | {2,3,4}   | {}        
     2 | {2,3,4} | {4,5,6}   | {}        
     3 | {4,5,6} |           | {}        

The empty array for "common_ids" is obviously incorrect. 

However, when the evaluation of the "next_list" is put into a derived table, then this works as expected:

    select id, id_list, next_list, 
           array(select unnest(id_list) intersect select unnest(next_list)) as common_ids
    from (
      select id, 
             id_list,
             lead(id_list) over (order by id) as next_list
      from sample_data
    ) t

returns:

    id | id_list | next_list | common_ids
    ---+---------+-----------+-----------
     1 | {1,2,3} | {2,3,4}   | {2,3}     
     2 | {2,3,4} | {4,5,6}   | {4}       
     3 | {4,5,6} |           | {}        

This is with Postgres 11.4

Is this a bug or simply not supported?

It does work correctly with intarray's "intersect" operator:

    select id, 
           id_list,
           id_list & lead(id_list) over (order by id) as next_list
    from sample_data;       


However, the actual data uses a bigint, so intarray isn't an option.

Thomas


    
              




Re: lead() with arrays - strange behaviour

From
David Rowley
Date:
On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer <spam_eater@gmx.net> wrote:
> The following statement tries to find the overlapping values in id_list between the current row and the next row:
>
>     select id,
>            id_list,
>            lead(id_list) over (order by id) as next_list,
>            array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
>     from sample_data;
>
> The above returns:
>
>     id | id_list | next_list | common_ids
>     ---+---------+-----------+-----------
>      1 | {1,2,3} | {2,3,4}   | {}
>      2 | {2,3,4} | {4,5,6}   | {}
>      3 | {4,5,6} |           | {}
>
> The empty array for "common_ids" is obviously incorrect.

I think you're confused with what the SELECT with the empty FROM
clause does here.  In your subquery "id_list" is just a parameter from
the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
return anything since those are both just effectively scalar values,
to which there is no "next" value.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: lead() with arrays - strange behaviour

From
Thomas Kellerer
Date:
David Rowley schrieb am 08.08.2019 um 13:03:
>> The following statement tries to find the overlapping values in id_list between the current row and the next row:
>>
>>     select id,
>>            id_list,
>>            lead(id_list) over (order by id) as next_list,
>>            array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids
>>     from sample_data;
>>
>> The above returns:
>>
>>     id | id_list | next_list | common_ids
>>     ---+---------+-----------+-----------
>>      1 | {1,2,3} | {2,3,4}   | {}
>>      2 | {2,3,4} | {4,5,6}   | {}
>>      3 | {4,5,6} |           | {}
>>
>> The empty array for "common_ids" is obviously incorrect.
> 
> I think you're confused with what the SELECT with the empty FROM
> clause does here.  In your subquery "id_list" is just a parameter from
> the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
> return anything since those are both just effectively scalar values,
> to which there is no "next" value.

id_list is a column in the table and as you can see in the output 
lead(id_list) most definitely returns the array from the next row. 

and "select unnest(some_array)" works just fine as you can see 
when "next_list" is taken from the derived table. 

Thomas





Re: lead() with arrays - strange behaviour

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> David Rowley schrieb am 08.08.2019 um 13:03:
>> I think you're confused with what the SELECT with the empty FROM
>> clause does here.  In your subquery "id_list" is just a parameter from
>> the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to
>> return anything since those are both just effectively scalar values,
>> to which there is no "next" value.

> id_list is a column in the table and as you can see in the output
> lead(id_list) most definitely returns the array from the next row.
> and "select unnest(some_array)" works just fine as you can see
> when "next_list" is taken from the derived table.

David's point is that the two occurrences of lead() don't mean the
same thing.  A window function is directly tied to the SELECT that
it is in the select-list of, and its notion of next and previous
rows is concerned with the set of rows that that SELECT's FROM-clause
generates.  In this example, the inner SELECT has an empty FROM that
returns one row, so the lead() in that SELECT doesn't do anything
useful.

You could probably get where you want to go with something along
the lines of

    select id,
           id_list,
           next_list,
           array(select unnest(id_list) intersect select unnest(next_list)) as common_ids
    from (
    select id,
           id_list,
           lead(id_list) over (order by id) as next_list
    from sample_data
) ss;

            regards, tom lane



Re: lead() with arrays - strange behaviour

From
Thomas Kellerer
Date:
Tom Lane schrieb am 08.08.2019 um 16:10:
> David's point is that the two occurrences of lead() don't mean the
> same thing.  A window function is directly tied to the SELECT that
> it is in the select-list of, and its notion of next and previous
> rows is concerned with the set of rows that that SELECT's FROM-clause
> generates.  In this example, the inner SELECT has an empty FROM that
> returns one row, so the lead() in that SELECT doesn't do anything
> useful.

Ah! Now I get it ;) 

Thanks for clearing that up.

> You could probably get where you want to go with something along
> the lines of

Yes, that's what I did in the end (see my initial post)