Thread: lead() with arrays - strange behaviour
Consider the following dummy table (this is a simplified example from a bigger query): create table sample_data (id int, id_list int[]); insert into sample_data (id, id_list) values (1, array[1,2,3]), (2, array[2,3,4]), (3, array[4,5,6]); The following statement tries to find the overlapping values in id_list between the current row and the next row: select id, id_list, lead(id_list) over (order by id) as next_list, array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids from sample_data; The above returns: id | id_list | next_list | common_ids ---+---------+-----------+----------- 1 | {1,2,3} | {2,3,4} | {} 2 | {2,3,4} | {4,5,6} | {} 3 | {4,5,6} | | {} The empty array for "common_ids" is obviously incorrect. However, when the evaluation of the "next_list" is put into a derived table, then this works as expected: select id, id_list, next_list, array(select unnest(id_list) intersect select unnest(next_list)) as common_ids from ( select id, id_list, lead(id_list) over (order by id) as next_list from sample_data ) t returns: id | id_list | next_list | common_ids ---+---------+-----------+----------- 1 | {1,2,3} | {2,3,4} | {2,3} 2 | {2,3,4} | {4,5,6} | {4} 3 | {4,5,6} | | {} This is with Postgres 11.4 Is this a bug or simply not supported? It does work correctly with intarray's "intersect" operator: select id, id_list, id_list & lead(id_list) over (order by id) as next_list from sample_data; However, the actual data uses a bigint, so intarray isn't an option. Thomas
On Thu, 8 Aug 2019 at 21:06, Thomas Kellerer <spam_eater@gmx.net> wrote: > The following statement tries to find the overlapping values in id_list between the current row and the next row: > > select id, > id_list, > lead(id_list) over (order by id) as next_list, > array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids > from sample_data; > > The above returns: > > id | id_list | next_list | common_ids > ---+---------+-----------+----------- > 1 | {1,2,3} | {2,3,4} | {} > 2 | {2,3,4} | {4,5,6} | {} > 3 | {4,5,6} | | {} > > The empty array for "common_ids" is obviously incorrect. I think you're confused with what the SELECT with the empty FROM clause does here. In your subquery "id_list" is just a parameter from the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to return anything since those are both just effectively scalar values, to which there is no "next" value. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley schrieb am 08.08.2019 um 13:03: >> The following statement tries to find the overlapping values in id_list between the current row and the next row: >> >> select id, >> id_list, >> lead(id_list) over (order by id) as next_list, >> array(select unnest(id_list) intersect select unnest(lead(id_list) over (order by id))) as common_ids >> from sample_data; >> >> The above returns: >> >> id | id_list | next_list | common_ids >> ---+---------+-----------+----------- >> 1 | {1,2,3} | {2,3,4} | {} >> 2 | {2,3,4} | {4,5,6} | {} >> 3 | {4,5,6} | | {} >> >> The empty array for "common_ids" is obviously incorrect. > > I think you're confused with what the SELECT with the empty FROM > clause does here. In your subquery "id_list" is just a parameter from > the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to > return anything since those are both just effectively scalar values, > to which there is no "next" value. id_list is a column in the table and as you can see in the output lead(id_list) most definitely returns the array from the next row. and "select unnest(some_array)" works just fine as you can see when "next_list" is taken from the derived table. Thomas
Thomas Kellerer <spam_eater@gmx.net> writes: > David Rowley schrieb am 08.08.2019 um 13:03: >> I think you're confused with what the SELECT with the empty FROM >> clause does here. In your subquery "id_list" is just a parameter from >> the outer query. LEAD(id_list) OVER (ORDER BY id) is never going to >> return anything since those are both just effectively scalar values, >> to which there is no "next" value. > id_list is a column in the table and as you can see in the output > lead(id_list) most definitely returns the array from the next row. > and "select unnest(some_array)" works just fine as you can see > when "next_list" is taken from the derived table. David's point is that the two occurrences of lead() don't mean the same thing. A window function is directly tied to the SELECT that it is in the select-list of, and its notion of next and previous rows is concerned with the set of rows that that SELECT's FROM-clause generates. In this example, the inner SELECT has an empty FROM that returns one row, so the lead() in that SELECT doesn't do anything useful. You could probably get where you want to go with something along the lines of select id, id_list, next_list, array(select unnest(id_list) intersect select unnest(next_list)) as common_ids from ( select id, id_list, lead(id_list) over (order by id) as next_list from sample_data ) ss; regards, tom lane
Tom Lane schrieb am 08.08.2019 um 16:10: > David's point is that the two occurrences of lead() don't mean the > same thing. A window function is directly tied to the SELECT that > it is in the select-list of, and its notion of next and previous > rows is concerned with the set of rows that that SELECT's FROM-clause > generates. In this example, the inner SELECT has an empty FROM that > returns one row, so the lead() in that SELECT doesn't do anything > useful. Ah! Now I get it ;) Thanks for clearing that up. > You could probably get where you want to go with something along > the lines of Yes, that's what I did in the end (see my initial post)