Thread: WHERE col = ANY($1) extended to 2 or more columns?

WHERE col = ANY($1) extended to 2 or more columns?

From

Dominique Devienne

Date:

09 February 2023, 15:41:41

Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement that efficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that works great, but only if the key is a scalar one.

Now we'd like to do the same for composite keys, and I don't know how to do that.

Is it possible? Could someone please help out or demo such a thing?

We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think).

Thanks, --DD

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

"David G. Johnston"

Date:

09 February 2023, 15:51:10

On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne <ddevienne@gmail.com> wrote:

Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement that efficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that works great, but only if the key is a scalar one.

Now we'd like to do the same for composite keys, and I don't know how to do that.
Is it possible? Could someone please help out or demo such a thing?
We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think).

It's trivial in pl/pgsql since I don't have to deal with serialization of the data.

An array-of-composites is simply:

SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[];

Not sure about the binary part but there are rules for how to serialize to text both composites and arrays, and quite probably libpq provides functions for this already though i've never used it raw.

David J.

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

Dominique Devienne

Date:

09 February 2023, 16:03:54

On Thu, Feb 9, 2023 at 4:51 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne <ddevienne@gmail.com> wrote:
Now we'd like to do the same for composite keys, and I don't know how to do that.

An array-of-composites is simply:
SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[];

Thanks. I don't consider that "simple" myself :). But I'm definitely not an advanced PostgreSQL user!

Would still appreciate a more fleshed out demo, if anyone is kind enough to provide it. Thanks, --DD

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

Dominique Devienne

Date:

09 February 2023, 16:09:10

On Thu, Feb 9, 2023 at 5:03 PM Dominique Devienne <ddevienne@gmail.com> wrote:

On Thu, Feb 9, 2023 at 4:51 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne <ddevienne@gmail.com> wrote:
Now we'd like to do the same for composite keys, and I don't know how to do that.

An array-of-composites is simply:
SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[];

Thanks. I don't consider that "simple" myself :). But I'm definitely not an advanced PostgreSQL user!
Would still appreciate a more fleshed out demo, if anyone is kind enough to provide it. Thanks, --DD

Also, I'm still not sure how to write that WHERE clause, with the $1 being an array of a composite type.

And since this is a binary bind, what kind of OIDs to use. Your example seems to generate a type on-the-fly for example David.

Will we need to create custom types just so we have OIDs we can use to assemble the bytes of the array-of-composite bind?

I believe there's an implicit ROW type per table created. Are there also implicit types for composite PKs and/or UNIQUE constraints?

Lots of questions...

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

"David G. Johnston"

Date:

09 February 2023, 16:17:17

On Thu, Feb 9, 2023 at 9:09 AM Dominique Devienne <ddevienne@gmail.com> wrote:

On Thu, Feb 9, 2023 at 5:03 PM Dominique Devienne <ddevienne@gmail.com> wrote:
On Thu, Feb 9, 2023 at 4:51 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, Feb 9, 2023 at 8:41 AM Dominique Devienne <ddevienne@gmail.com> wrote:
Now we'd like to do the same for composite keys, and I don't know how to do that.

An array-of-composites is simply:
SELECT ARRAY[ ROW(1, 'one'), ROW(2, 'two') ]::composite_type[];

Thanks. I don't consider that "simple" myself :). But I'm definitely not an advanced PostgreSQL user!
Would still appreciate a more fleshed out demo, if anyone is kind enough to provide it. Thanks, --DD

Also, I'm still not sure how to write that WHERE clause, with the $1 being an array of a composite type.
And since this is a binary bind, what kind of OIDs to use. Your example seems to generate a type on-the-fly for example David.
Will we need to create custom types just so we have OIDs we can use to assemble the bytes of the array-of-composite bind?
I believe there's an implicit ROW type per table created. Are there also implicit types for composite PKs and/or UNIQUE constraints?
Lots of questions...

postgres=# select (1, 'one'::text) = any(array[(1, 'one'::text)::record]::record[]);

?column?
----------
t
(1 row)

Not sure how much that helps but there it is.

If you wanted to use an actual explicit composite type with an OID it would need to be created.

There where clause is the easy part, its the code side for setting the parameter that I don't know. Ideally the library lets you pass around language-specific objects and it does it for you.

You could consider writing out a JSONB object and writing your condition in terms of json operators/expressions.

David J.

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

Alban Hertroys

Date:

09 February 2023, 16:28:37

> On 9 Feb 2023, at 16:41, Dominique Devienne <ddevienne@gmail.com> wrote:
>
> Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement
thatefficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that
worksgreat, but only if the key is a scalar one. 
>
> Now we'd like to do the same for composite keys, and I don't know how to do that.
> Is it possible? Could someone please help out or demo such a thing?
> We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think).

This works:

=> select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
 ?column?
----------
 t
(1 row)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

"David G. Johnston"

Date:

09 February 2023, 16:37:29

On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys <haramrae@gmail.com> wrote:

> On 9 Feb 2023, at 16:41, Dominique Devienne <ddevienne@gmail.com> wrote:
>
> Hi. We are implementing an API which takes a list of row keys, and must return info about those rows. To implement that efficiently, in as few round-trips as possible, we bind a (binary) array of keys (ints, uuids, or strings) and that works great, but only if the key is a scalar one.
>
> Now we'd like to do the same for composite keys, and I don't know how to do that.
> Is it possible? Could someone please help out or demo such a thing?
> We are doing it in C++ using libpq, but a pure SQL or PL/pgSQL demo would still help (I think).

This works:

=> select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
?column?
----------
t
(1 row)

But you cannot write the right-side of the IN as a single parameter which seems to be the primary constraint trying to be conformed to.

David J.

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

Dominique Devienne

Date:

09 February 2023, 17:35:42

On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys <haramrae@gmail.com> wrote:
> On 9 Feb 2023, at 16:41, Dominique Devienne <ddevienne@gmail.com> wrote:
> Now we'd like to do the same for composite keys, and I don't know how to do that.

This works:
=> select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
But you cannot write the right-side of the IN as a single parameter which seems to be the primary constraint trying to be conformed to.

Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind the RHS (binary) array

and do a single exec (single round-trip) to get the matching rows. AFAIK, this is the fastest way.

If there's a better/faster way, I'm interested. --DD

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

Alban Hertroys

Date:

10 February 2023, 08:14:19

> On 9 Feb 2023, at 18:35, Dominique Devienne <ddevienne@gmail.com> wrote:
>
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys <haramrae@gmail.com> wrote:
> > On 9 Feb 2023, at 16:41, Dominique Devienne <ddevienne@gmail.com> wrote:
> > Now we'd like to do the same for composite keys, and I don't know how to do that.
>
> This works:
> => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text));
> But you cannot write the right-side of the IN as a single parameter which seems to be the primary constraint trying
tobe conformed to. 
>
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind the RHS (binary) array
> and do a single exec (single round-trip) to get the matching rows. AFAIK, this is the fastest way.
> If there's a better/faster way, I'm interested. --DD

How would an ORM like that push a list of tuples into a single query parameter though? Is that feasible?

Perhaps this is easier to use with an ORM then? It would need a list of placeholders for each item, but I suspect you
wouldneed that anyway… 

=> with v(col1, col2) as (
    values (1, 'two'::text), (2, 'one'::text), (1, 'one'::text), (2, 'two'::text)
)
select * from v where (col1, col2) = (1, 'one'::text);
 col1 | col2
------+------
    1 | one
(1 row)

This could be written as a join to a table with fixed values in the OP’s case.


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

"Peter J. Holzer"

Date:

11 February 2023, 15:21:49

On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> On Thu, Feb 9, 2023 at 5:37 PM David G. Johnston <david.g.johnston@gmail.com>
> wrote:
>
>     On Thu, Feb 9, 2023 at 9:28 AM Alban Hertroys <haramrae@gmail.com> wrote:
>
>         > On 9 Feb 2023, at 16:41, Dominique Devienne <ddevienne@gmail.com>
>         wrote:
>         > Now we'd like to do the same for composite keys, and I don't know how
>         to do that.
>
>         This works:
>         => select (1, 'one'::text) in ((1, 'two'::text), (2, 'one'::text), (1,
>         'one'::text), (2, 'two'::text));
>
>     But you cannot write the right-side of the IN as a single parameter which
>     seems to be the primary constraint trying to be conformed to.
>
>
> Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind
> the RHS (binary) array
> and do a single exec (single round-trip) to get the matching rows. AFAIK, this
> is the fastest way.

Planning time is often much less than execution time, so minimizing it
may not give you the fastest results.

For example I tried to fetch two data points from a medium sized (3 mio
rows) table ((macrobondtimeseries, date) is the primary key here):

wds=> explain (analyze) select macrobondtimeseries, date, value
from facttable_oecd_naq_2018
where (macrobondtimeseries, date) = any(
    array [
        ( 'naq_mex_b1_gi_cqr_a'     , '2013-01-01' ),
        ( 'naq_lux_p3s13_gpsa_a'    , '1961-01-01' )
    ]::tsd[]
);

╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                   QUERY PLAN
                        ║ 

╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Gather  (cost=1000.00..334945.37 rows=35242 width=34) (actual time=6.194..1618.968 rows=2 loops=1)
                        ║ 
║   Workers Planned: 2
                        ║ 
║   Workers Launched: 2
                        ║ 
║   ->  Parallel Seq Scan on facttable_oecd_naq_2018  (cost=0.00..330421.17 rows=14684 width=34) (actual
time=1054.739..1589.818rows=1 loops=3) ║ 
║         Filter: (ROW(macrobondtimeseries, date) = ANY
('{"(naq_mex_b1_gi_cqr_a,2013-01-01)","(naq_lux_p3s13_gpsa_a,1961-01-01)"}'::tsd[]))    ║ 
║         Rows Removed by Filter: 1178191
                        ║ 
║ Planning Time: 1.833 ms
                        ║ 
║ JIT:
                        ║ 
║   Functions: 12
                        ║ 
║   Options: Inlining false, Optimization false, Expressions true, Deforming true
                        ║ 
║   Timing: Generation 1.026 ms, Inlining 0.000 ms, Optimization 0.948 ms, Emission 12.613 ms, Total 14.586 ms
                        ║ 
║ Execution Time: 1619.729 ms
                        ║ 

╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(12 rows)

wds=> explain (analyze) select macrobondtimeseries, date, value
from facttable_oecd_naq_2018
where (macrobondtimeseries, date) in
(
    ( 'naq_mex_b1_gi_cqr_a'     , '2013-01-01' ),
    ( 'naq_lux_p3s13_gpsa_a'    , '1961-01-01' )
);

╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                                                    QUERY PLAN
                                                                                           ║ 

╟───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Bitmap Heap Scan on facttable_oecd_naq_2018  (cost=4.13..7.17 rows=2 width=34) (actual time=0.259..0.263 rows=2
loops=1)                                                                                         ║ 
║   Recheck Cond: ((((macrobondtimeseries)::text = 'naq_mex_b1_gi_cqr_a'::text) AND (date = '2013-01-01'::date)) OR
(((macrobondtimeseries)::text= 'naq_lux_p3s13_gpsa_a'::text) AND (date = '1961-01-01'::date))) ║ 
║   Heap Blocks: exact=1
                                                                                           ║ 
║   ->  BitmapOr  (cost=4.13..4.13 rows=2 width=0) (actual time=0.184..0.185 rows=0 loops=1)
                                                                                           ║ 
║         ->  Bitmap Index Scan on facttable_oecd_naq_2018_pkey  (cost=0.00..2.06 rows=1 width=0) (actual
time=0.124..0.125rows=1 loops=1)                                                                         ║ 
║               Index Cond: (((macrobondtimeseries)::text = 'naq_mex_b1_gi_cqr_a'::text) AND (date =
'2013-01-01'::date))                                                                                          ║ 
║         ->  Bitmap Index Scan on facttable_oecd_naq_2018_pkey  (cost=0.00..2.06 rows=1 width=0) (actual
time=0.058..0.058rows=1 loops=1)                                                                         ║ 
║               Index Cond: (((macrobondtimeseries)::text = 'naq_lux_p3s13_gpsa_a'::text) AND (date =
'1961-01-01'::date))                                                                                         ║ 
║ Planning Time: 1.414 ms
                                                                                           ║ 
║ Execution Time: 0.330 ms
                                                                                           ║ 

╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(10 rows)

The latter is almost 1000 times faster. Saving 1.8 ms on planning time
doesn't help you if you then waste 1.6 s on execution.

So sometimes it pays off to give the planner a little bit of extra
information to work on.

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: WHERE col = ANY($1) extended to 2 or more columns?

From

"Peter J. Holzer"

Date:

11 February 2023, 22:00:31

On 2023-02-11 16:21:49 +0100, Peter J. Holzer wrote:
> On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> > Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind
> > the RHS (binary) array
> > and do a single exec (single round-trip) to get the matching rows. AFAIK, this
> > is the fastest way.
>
> Planning time is often much less than execution time, so minimizing it
> may not give you the fastest results.
>
> For example I tried to fetch two data points from a medium sized (3 mio
> rows) table ((macrobondtimeseries, date) is the primary key here):
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) = any(
>     array [
>         ( 'naq_mex_b1_gi_cqr_a'     , '2013-01-01' ),
>         ( 'naq_lux_p3s13_gpsa_a'    , '1961-01-01' )
>     ]::tsd[]
> );
[Sequential scan]
> ║ Planning Time: 1.833 ms
> ║ Execution Time: 1619.729 ms
> ╚════════════════════════════════════════════════════════════════════
> (12 rows)
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) in
> (
>     ( 'naq_mex_b1_gi_cqr_a'     , '2013-01-01' ),
>     ( 'naq_lux_p3s13_gpsa_a'    , '1961-01-01' )
> );
[Bitmap index scan]
> ║ Planning Time: 1.414 ms
> ║ Execution Time: 0.330 ms
> ╚════════════════════════════════════════════════════════════════════
> (10 rows)
>
> The latter is almost 1000 times faster. Saving 1.8 ms on planning time
> doesn't help you if you then waste 1.6 s on execution.
>
> So sometimes it pays off to give the planner a little bit of extra
> information to work on.

And here is a variant which does achieve a good execution
plan with an array:

wds=> explain(analyze) with a as (
    select * from unnest (
        array [
            ( 'naq_mex_b1_gi_cqr_a'     , '2013-01-01' ),
            ( 'naq_lux_p3s13_gpsa_a'    , '1961-01-01' )
        ]::tsd[]
    )
)
select macrobondtimeseries, date, value
from a natural join facttable_oecd_naq_2018;

╔═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                         QUERY PLAN
                                     ║ 

╟─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Nested Loop  (cost=0.56..7.17 rows=1 width=44) (actual time=0.075..0.115 rows=2 loops=1)
                                     ║ 
║   ->  Function Scan on unnest  (cost=0.00..0.02 rows=2 width=36) (actual time=0.011..0.012 rows=2 loops=1)
                                     ║ 
║   ->  Index Scan using facttable_oecd_naq_2018_pkey on facttable_oecd_naq_2018  (cost=0.56..3.57 rows=1 width=34)
(actualtime=0.045..0.045 rows=1 loops=2) ║ 
║         Index Cond: (((macrobondtimeseries)::text = (unnest.macrobondtimeseries)::text) AND (date = unnest.date))
                                     ║ 
║ Planning Time: 1.061 ms
                                     ║ 
║ Execution Time: 0.172 ms
                                     ║ 

╚═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(6 rows)

(At least with these parameters on this table with this version of
PostgreSQL. Your mileage may vary, of course.)

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc