Set-Returning functions in a select list - Mailing list pgsql-general
From | J Lumby |
---|---|
Subject | Set-Returning functions in a select list |
Date | |
Msg-id | DM5PR06MB3082AEC288F546F7AF5393E3A3E69@DM5PR06MB3082.namprd06.prod.outlook.com Whole thread Raw |
Responses |
Re: Set-Returning functions in a select list
|
List | pgsql-general |
I am unclear about the rules which postgresql uses in building a result set when the select list ( see note ) contains several different Set-Returning functions, possibly with sets of different sizes. At a very high level, what I see happening is that each Set-Returning expression causes the result set to be expanded with additional rows, one per member of the set. When there is only one such expression, this is clear - the size of the final result set is N times the size of the output of the final operator where N is the size of the set (i.e. N times the size if that expression had been a scalar). But it becomes unclear where there is more than one such expression. I can't find much on this point in the doc (I am looking in PostgreSQL 13.2 Documentation). There is one small example in "9.25. Set Returning Functions" but in this example there is only one Set-Returning expression in the select list. Also the description of the SELECT command mentions something called a "row group" : "The actual output rows are computed using the SELECT output expressions for each selected row or row group. (See SELECT List below.)" but no mention of Set-Returning expressions in the referenced section. I tried an experiment : <code> echo -e "DROP SCHEMA testing cascade; CREATE SCHEMA testing; CREATE TABLE testing.set_returning_in_select_list (id int4 , bitmask_a int , bitmask_b int ); insert into testing.set_returning_in_select_list values ( 1 , x'1010'::int , x'0101'::int ) , ( 2 , x'2020'::int , x'0202'::int ) , ( 3 , x'3030'::int , x'0303'::int ); select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a & 16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a , to_hex(bitmask_b) as hex_b from testing.set_returning_in_select_list order by bits_of_a; select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a & 16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a , to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b & 1 ) , ( bitmask_b & 2 ) , ( bitmask_b & 3 ) ] ) as bits_of_b from testing.set_returning_in_select_list order by bits_of_a; select id , to_hex(bitmask_a) as hex_a , unnest( array [ ( bitmask_a & 16 ) , ( bitmask_a & 32 ) , ( bitmask_a & 48 ) ] ) as bits_of_a , to_hex(bitmask_b) as hex_b , unnest( array [ ( bitmask_b & 3 ) , ( bitmask_b & 2 ) , ( bitmask_b & 1 ) ] ) as bits_of_b , unnest( array [ ( bitmask_b & 3 ) , ( bitmask_b & 2 ) , ( bitmask_b & 1 ) , ( bitmask_b & 17 ) ] ) as extraneous_bits_of_b from testing.set_returning_in_select_list order by bits_of_a;" | psql -e </code> The output of the last query is, firstly, different on postgresql-9.6 than on 13.3, and secondly, hard to explain in either case. my outputs appended below. I *think* : . postgresql-9.6 applies a kind of triple cartesian join on the elements of each set, joining one set with the other set and then again with itself, ending up with many duplicates . postgresql-13.3 applies a kind of null-padding extension of the elements of the smaller set and then an arbitrary linear pairing of the two sets, ending up with one duplicate. Which one, if any, is correct? And why? (note : I am not asking about the case of Set-Returning functions in the FROM clause, where things seem clear(er).) Cheers, John Lumby My outputs of the final query : . postgresql-9.6 <code> id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b ----+-------+-----------+-------+-----------+---------------------- 1 | 1010 | 0 | 101 | 0 | 0 1 | 1010 | 0 | 101 | 0 | 1 1 | 1010 | 0 | 101 | 0 | 1 1 | 1010 | 0 | 101 | 0 | 1 2 | 2020 | 0 | 202 | 2 | 2 2 | 2020 | 0 | 202 | 2 | 2 2 | 2020 | 0 | 202 | 2 | 0 2 | 2020 | 0 | 202 | 2 | 0 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 0 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 0 1 | 1010 | 16 | 101 | 1 | 1 3 | 3030 | 16 | 303 | 3 | 3 3 | 3030 | 16 | 303 | 3 | 1 3 | 3030 | 16 | 303 | 3 | 1 3 | 3030 | 16 | 303 | 3 | 2 2 | 2020 | 32 | 202 | 2 | 0 2 | 2020 | 32 | 202 | 2 | 2 2 | 2020 | 32 | 202 | 2 | 0 2 | 2020 | 32 | 202 | 0 | 0 2 | 2020 | 32 | 202 | 2 | 2 2 | 2020 | 32 | 202 | 0 | 2 2 | 2020 | 32 | 202 | 0 | 0 2 | 2020 | 32 | 202 | 0 | 2 3 | 3030 | 32 | 303 | 2 | 2 3 | 3030 | 32 | 303 | 2 | 3 3 | 3030 | 32 | 303 | 2 | 1 3 | 3030 | 32 | 303 | 2 | 1 3 | 3030 | 48 | 303 | 1 | 3 3 | 3030 | 48 | 303 | 1 | 2 3 | 3030 | 48 | 303 | 1 | 1 3 | 3030 | 48 | 303 | 1 | 1 (36 rows) </code> . postgresql-13.3 <code> id | hex_a | bits_of_a | hex_b | bits_of_b | extraneous_bits_of_b ----+-------+-----------+-------+-----------+---------------------- 1 | 1010 | 0 | 101 | 0 | 0 2 | 2020 | 0 | 202 | 2 | 2 1 | 1010 | 16 | 101 | 1 | 1 1 | 1010 | 16 | 101 | 1 | 1 3 | 3030 | 16 | 303 | 3 | 3 2 | 2020 | 32 | 202 | 0 | 0 2 | 2020 | 32 | 202 | 2 | 2 3 | 3030 | 32 | 303 | 2 | 2 3 | 3030 | 48 | 303 | 1 | 1 1 | 1010 | | 101 | | 1 3 | 3030 | | 303 | | 1 2 | 2020 | | 202 | | 0 (12 rows) </code>
pgsql-general by date: