Thread: Understanding behavior of SELECT with multiple unnested columns

Understanding behavior of SELECT with multiple unnested columns

From
Ken Tanzer
Date:
I've been working on some queries involving multiple unnested columns.  At first, I expected the number of rows returned would be the product of the array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

 unnest2 | unnest2 
---------+---------
 a       | 1
 b       | 2

Which is all well and good.  (Better, in fact, for my purposes.)  But then this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
 unnest2 | unnest2 
---------+---------
 a       | 1
 b       | 2
 c       | 1
 a       | 2
 b       | 1
 c       | 2

Throw an unnested null column in and you get zero rows, which I also didn't expect:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
 unnest2 | unnest2 | unnest 
---------+---------+--------
(0 rows)


After some head scratching, I think I understand what to expect from these unnests, but I'm unclear of the logic behind what is going on.  I'm hoping someone can explain it a bit.  Also, on a practical level, would anyone know how to query so that SELECT unnest2(array['a','b','c']),unnest2(array['1','2']) would return three rows instead of six, like so:

a 1
b 2
c (NULL)

As that would be perfect for my purposes.  Thanks in advance!

Ken




--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Understanding behavior of SELECT with multiple unnested columns

From
Ian Lawrence Barwick
Date:
2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>
>
> I've been working on some queries involving multiple unnested columns.  At first, I expected the number of rows
returnedwould be the product of the array lengths, so that this query would return 4 rows: 
>
> SELECT unnest2(array['a','b']),unnest2(array['1','2']);
>
> when in fact it returns 2:
>
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>
> Which is all well and good.  (Better, in fact, for my purposes.)  But then this query returns 6 rows:
>
> SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>  c       | 1
>  a       | 2
>  b       | 1
>  c       | 2
>
> Throw an unnested null column in and you get zero rows, which I also didn't expect:
>
> SELECT unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
>  unnest2 | unnest2 | unnest
> ---------+---------+--------
> (0 rows)
>
>
> After some head scratching, I think I understand what to expect from these unnests, but I'm unclear of the logic
behindwhat is going on.  I'm hoping someone can explain it a bit. 

Basically you are getting Cartesian joins on the row output of
unnest() (and presumably
unnest2() - I guess this is a function you defined yourself?)

Effectively you are doing this:

CREATE TABLE t1 (val INT);
INSERT INTO t1 VALUES (1),(2);

CREATE TABLE t2 (val CHAR(1));
INSERT INTO t2 VALUES ('a'),('b'),('c');

CREATE TABLE t3 (val INT);

testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   1 | c
   2 | a
   2 | b
   2 | c
(6 rows)


testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   2 | a
   2 | b
(4 rows)

testdb=# SELECT * from t1, t2, t3;
 val | val | val
-----+-----+-----
(0 rows)


HTH

Ian Barwick


Re: Understanding behavior of SELECT with multiple unnested columns

From
Ken Tanzer
Date:
Basically you are getting Cartesian joins on the row output of
unnest()

Well that's what I expected too.  Except look at this example, after you delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   2 | a
   2 | b
(4 rows)
 
And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
 unnest | unnest 
--------+--------
      1 | a
      2 | b
(2 rows)

You can see they are not the same!  Or this, which does not return the 12 rows we might both expect:

SELECT unnest(array[1,2]),unnest(array['a','b']),unnest(array[4,5,6]);
 unnest | unnest | unnest 
--------+--------+--------
      1 | a      |      4
      2 | b      |      5
      1 | a      |      6
      2 | b      |      4
      1 | a      |      5
      2 | b      |      6
(6 rows)

Add another element onto the third array, so they "match up" better, and you get only 4 rows:

SELECT unnest(array[1,2]),unnest(array['a','b']),unnest(array[4,5,6,7]);
 unnest | unnest | unnest 
--------+--------+--------
      1 | a      |      4
      2 | b      |      5
      1 | a      |      6
      2 | b      |      7
(4 rows)

(and presumably
unnest2() - I guess this is a function you defined yourself?)

Sorry for causing confusion--I meant to remove the unnest2.  There was source code for the unnest function for earlier versions, which I defined as unnest2 to try to understand what was going on.  It should yield the same behavior as unnest itself.

Cheers,
Ken

On Tue, Mar 26, 2013 at 11:55 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>
>
> I've been working on some queries involving multiple unnested columns.  At first, I expected the number of rows returned would be the product of the array lengths, so that this query would return 4 rows:
>
> SELECT unnest2(array['a','b']),unnest2(array['1','2']);
>
> when in fact it returns 2:
>
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>
> Which is all well and good.  (Better, in fact, for my purposes.)  But then this query returns 6 rows:
>
> SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>  c       | 1
>  a       | 2
>  b       | 1
>  c       | 2
>
> Throw an unnested null column in and you get zero rows, which I also didn't expect:
>
> SELECT unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
>  unnest2 | unnest2 | unnest
> ---------+---------+--------
> (0 rows)
>
>
> After some head scratching, I think I understand what to expect from these unnests, but I'm unclear of the logic behind what is going on.  I'm hoping someone can explain it a bit.

Basically you are getting Cartesian joins on the row output of
unnest() (and presumably
unnest2() - I guess this is a function you defined yourself?)

Effectively you are doing this:

CREATE TABLE t1 (val INT);
INSERT INTO t1 VALUES (1),(2);

CREATE TABLE t2 (val CHAR(1));
INSERT INTO t2 VALUES ('a'),('b'),('c');

CREATE TABLE t3 (val INT);

testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   1 | c
   2 | a
   2 | b
   2 | c
(6 rows)


testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   2 | a
   2 | b
(4 rows)

testdb=# SELECT * from t1, t2, t3;
 val | val | val
-----+-----+-----
(0 rows)


HTH

Ian Barwick



--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Understanding behavior of SELECT with multiple unnested columns

From
Ian Lawrence Barwick
Date:

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>
Basically you are getting Cartesian joins on the row output of
unnest()

Well that's what I expected too.  Except look at this example, after you delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   2 | a
   2 | b
(4 rows)
 
And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
 unnest | unnest 
--------+--------
      1 | a
      2 | b
(2 rows)

You can see they are not the same! 

Ah yes, what I suggested is actually the equivalent to 
SELECT * FROM unnest(array[1,2]) u1,unnest(array['a','b']) u2;

I seem to recall seeing the explanation for this before, although I'll be
darned if I can remember what it is.

FWIW this happens with other functions returning SETOF:

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,2) y;
 x | y
---+---
 1 | 1
 2 | 2
(2 rows)

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,3) y;
 x | y
---+---
 1 | 1
 2 | 2
 1 | 3
 2 | 1
 1 | 2
 2 | 3
(6 rows)

Regards


Ian Barwick

Re: Understanding behavior of SELECT with multiple unnested columns

From
Gavin Flower
Date:
On 27/03/13 20:36, Ian Lawrence Barwick wrote:

2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>
Basically you are getting Cartesian joins on the row output of
unnest()

Well that's what I expected too.  Except look at this example, after you delete c:

testdb=# DELETE FROM t2 where val='c';
DELETE 1
testdb=# SELECT * from t1, t2;
 val | val
-----+-----
   1 | a
   1 | b
   2 | a
   2 | b
(4 rows)
 
And compare to:

SELECT unnest(array[1,2]),unnest(array['a','b']);
 unnest | unnest 
--------+--------
      1 | a
      2 | b
(2 rows)

You can see they are not the same! 

Ah yes, what I suggested is actually the equivalent to 
SELECT * FROM unnest(array[1,2]) u1,unnest(array['a','b']) u2;

I seem to recall seeing the explanation for this before, although I'll be
darned if I can remember what it is.

FWIW this happens with other functions returning SETOF:

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,2) y;
 x | y
---+---
 1 | 1
 2 | 2
(2 rows)

testdb=# SELECT
testdb-# generate_series(1,2) x,
testdb-# generate_series(1,3) y;
 x | y
---+---
 1 | 1
 2 | 2
 1 | 3
 2 | 1
 1 | 2
 2 | 3
(6 rows)

Regards


Ian Barwick



The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.


Cheers,
Gavin

Re: Understanding behavior of SELECT with multiple unnested columns

From
Misa Simic
Date:
Hi,

You can try:

SELECT c1, c2 FROM
(
WITH a AS
(
SELECT row_number() OVER(),* FROM unnest(array['a','b', 'c', 'd']) c1
),
b AS
(
SELECT row_number() OVER(),* FROM unnest(array['1','2', '3']) c2
)
SELECT * FROM a LEFT JOIN b USING (row_number)
UNION
SELECT * FROM a RIGHT JOIN b USING (row_number)
ORDER BY row_number
) t

To simplify this you can wrap it in function what accepts two array parameters...

Kind Regards,

Misa


2013/3/27 Ken Tanzer <ken.tanzer@gmail.com>
I've been working on some queries involving multiple unnested columns.  At first, I expected the number of rows returned would be the product of the array lengths, so that this query would return 4 rows:

SELECT unnest2(array['a','b']),unnest2(array['1','2']);

when in fact it returns 2:

 unnest2 | unnest2 
---------+---------
 a       | 1
 b       | 2

Which is all well and good.  (Better, in fact, for my purposes.)  But then this query returns 6 rows:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
 unnest2 | unnest2 
---------+---------
 a       | 1
 b       | 2
 c       | 1
 a       | 2
 b       | 1
 c       | 2

Throw an unnested null column in and you get zero rows, which I also didn't expect:

SELECT unnest2(array['a','b','c']),unnest2(array['1','2']),unnest(NULL::varchar[]);
 unnest2 | unnest2 | unnest 
---------+---------+--------
(0 rows)


After some head scratching, I think I understand what to expect from these unnests, but I'm unclear of the logic behind what is going on.  I'm hoping someone can explain it a bit.  Also, on a practical level, would anyone know how to query so that SELECT unnest2(array['a','b','c']),unnest2(array['1','2']) would return three rows instead of six, like so:

a 1
b 2
c (NULL)

As that would be perfect for my purposes.  Thanks in advance!

Ken




--
AGENCY Software  
A data system that puts you in control
100% Free Software

learn more about AGENCY or
follow the discussion.

Re: Understanding behavior of SELECT with multiple unnested columns

From
Tom Lane
Date:
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
> The rule appears to be,
> where N_x & N_y are the number of entries returned for x & y:
> N_result = is the smallest positive integer that has N_x & N_y as factors.

Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods.  (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.)  I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre.  It's been like that since Berkeley days though,
so I doubt we'll consider changing it now.  Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).

            regards, tom lane


Re: Understanding behavior of SELECT with multiple unnested columns

From
Ian Lawrence Barwick
Date:
2013/3/27 Tom Lane <tgl@sss.pgh.pa.us>:
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>> The rule appears to be,
>> where N_x & N_y are the number of entries returned for x & y:
>> N_result = is the smallest positive integer that has N_x & N_y as factors.
>
> Right: if there are multiple set-returning functions in a SELECT list,
> the number of rows you get is the least common multiple of their
> periods.  (See the logic in ExecTargetList that cycles the SRFs until
> they all report "done" at the same time.)  I guess there's some value
> in this for the case where they all have the same period, but otherwise
> it's kind of bizarre.  It's been like that since Berkeley days though,
> so I doubt we'll consider changing it now.  Rather, it'll just be
> quietly deprecated in favor of putting SRFs into FROM (with LATERAL
> where needed).

Thanks for the clarification, I was half-worried there was some fundamental
set theory or something which had passed me by.

Regards

Ian Barwick


Re: Understanding behavior of SELECT with multiple unnested columns

From
Gavin Flower
Date:
On 28/03/13 03:03, Tom Lane wrote:
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
The rule appears to be,
where N_x & N_y are the number of entries returned for x & y:
N_result = is the smallest positive integer that has N_x & N_y as factors.
Right: if there are multiple set-returning functions in a SELECT list,
the number of rows you get is the least common multiple of their
periods.  (See the logic in ExecTargetList that cycles the SRFs until
they all report "done" at the same time.)  I guess there's some value
in this for the case where they all have the same period, but otherwise
it's kind of bizarre.  It's been like that since Berkeley days though,
so I doubt we'll consider changing it now.  Rather, it'll just be
quietly deprecated in favor of putting SRFs into FROM (with LATERAL
where needed).
		regards, tom lane
It surprised me when I first came across the behaviour, but I can't say it is wrong!

I have used the current behaviour to create test data.

It might be useful to have a means of supplying the starting indices list and the number of rows to generate (which would not only allow the number to be less than the natural period, but also greater). I think this would be very useful to create test data.  Best if the facility could take any source of values, including SELECT statements.  Probably not 'Politically Correct', but making it an SQL construct might be less confusing than a function.  Also, either, but not both of the indices list, or the limit, could be omitted to get their default values.


Cheers,
Gavin

Re: Understanding behavior of SELECT with multiple unnested columns

From
Jasen Betts
Date:
On 2013-03-27, Ken Tanzer <ken.tanzer@gmail.com> wrote:
> --047d7b5da657ecd54004d8e23a90
> Content-Type: text/plain; charset=ISO-8859-1
>
> I've been working on some queries involving multiple unnested columns.  At
> first, I expected the number of rows returned would be the product of the
> array lengths, so that this query would return 4 rows:
>
> SELECT unnest2(array['a','b']),unnest2(array['1','2']);
>
> when in fact it returns 2:
>
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>
> Which is all well and good.  (Better, in fact, for my purposes.)  But then
> this query returns 6 rows:
>
> SELECT unnest2(array['a','b','c']),unnest2(array['1','2']);
>  unnest2 | unnest2
> ---------+---------
>  a       | 1
>  b       | 2
>  c       | 1
>  a       | 2
>  b       | 1
>  c       | 2
>

check out the documentation for generate_series, it behaves similarly.

I think basically the problem is caused by using SRFs between SELECT
and FROM

to see that it's not the cartesion product try it with sets of length
4 and 6

SELECT unnest(array['a','b','c','d']),unnest(array['1','2','3','4','5','6']);


what you want is possible, but it's not pretty - you have to number
the results and join explicitly.

with
  a as ( select u,row_number() over ()
           from (select unnest(array['a','b']) as u ) as x),
  b as ( select u,row_number() over ()
           from (select unnest(array['1','2','3']) as u ) as x)
select a.u as a, b.u as b
  from a full outer join b on a.row_number=b.row_number;


⚂⚃ 100% natural

Re: Understanding behavior of SELECT with multiple unnested columns

From
Merlin Moncure
Date:
On Wed, Mar 27, 2013 at 9:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>> The rule appears to be,
>> where N_x & N_y are the number of entries returned for x & y:
>> N_result = is the smallest positive integer that has N_x & N_y as factors.
>
> Right: if there are multiple set-returning functions in a SELECT list,
> the number of rows you get is the least common multiple of their
> periods.  (See the logic in ExecTargetList that cycles the SRFs until
> they all report "done" at the same time.)  I guess there's some value
> in this for the case where they all have the same period, but otherwise
> it's kind of bizarre.  It's been like that since Berkeley days though,
> so I doubt we'll consider changing it now.  Rather, it'll just be
> quietly deprecated in favor of putting SRFs into FROM (with LATERAL
> where needed).

It's a neat way to make a query that doesn't terminate (which AFAIK is
impossible in vanilla SQL):

create sequence s;
select generate_series(1,nextval('s')), generate_series(1,nextval('s'));

merlin