How bad is using queries with thousands of values for operators IN or ANY? - Mailing list pgsql-general

From Thorsten Schöning
Subject How bad is using queries with thousands of values for operators IN or ANY?
Msg-id 1697696276.20200831100433@am-soft.de
Responses Re: How bad is using queries with thousands of values for operators IN or ANY?  (Tim Cross <theophilusx@gmail.com>)
Re: How bad is using queries with thousands of values for operators IN or ANY?  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-general
Hi all,

I have lots of queries in which I need to restrict access to rows
using some numeric row ID, and I mostly do so with the operator IN
in WHERE clauses. Additionally, I mostly embed the IDs directly into
the query as a comma-separated list, e.g. because I already hit a
limitation of ~32k bind parameters in the JDBC driver[1] for
Postgres.
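For integer IDs, embedding them into the query text is only safe because the values are numeric, never user-supplied strings. A minimal sketch of what such query building might look like (class, method, and column names are my own illustration, not from my actual code):

```java
import java.util.List;
import java.util.stream.Collectors;

public class InListQuery {
    // Build "SELECT ... WHERE id IN (1,2,3)" by embedding numeric IDs directly.
    // Injection-safe only because the values are Integers, not raw strings.
    static String buildInQuery(List<Integer> ids) {
        String joined = ids.stream()
                           .map(String::valueOf)
                           .collect(Collectors.joining(","));
        return "SELECT id, name FROM meter WHERE id IN (" + joined + ")";
    }
}
```

As an alternative that avoids both the string building and the bind-parameter limit, pgjdbc also allows binding the whole list as a single parameter: `Connection.createArrayOf("integer", ...)` passed via `PreparedStatement.setArray` to a `WHERE id = ANY(?)` condition.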

I always thought that in most cases simply sending a large number of
IDs embedded in the query is better than looping, because it saves
round trips to the DB and gives the planner all the information it
needs to choose the best strategy. OTOH, with recent tests and a
number of IDs increased by about a factor of 100, I now see
additional load in Tomcat before the query is even sent, and of
course in the DB itself as well. I've attached an example query and
plan.

> ->  Hash  (cost=242592.66..242592.66 rows=6825 width=39) (actual time=91.117..91.117 rows=40044 loops=3)
>       Buckets: 65536 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 3016kB
>       ->  Hash Join  (cost=137.57..242592.66 rows=6825 width=39) (actual time=10.194..82.412 rows=40044 loops=3)
>             Hash Cond: (meter.meter_bcd = meter_bcd.id)
>             ->  Index Scan using pk_meter on meter  (cost=0.42..242237.10 rows=40044 width=25) (actual time=9.350..71.276 rows=40044 loops=3)
>                   Index Cond: (id = ANY ('{[...]}'::integer[]))
>             ->  Hash  (cost=135.73..135.73 rows=113 width=22) (actual time=0.830..0.830 rows=113 loops=3)

Do you know of any obvious limitations in the JDBC driver when
handling such large queries? In the end, the query is mostly large
text with only very few bind parameters.

Do you know of any obvious problems in Postgres itself with such a
query, e.g. when parsing it? Do things simply take as long as they
take and are mostly comparable to looping, or is there some
additional overhead the larger the query gets? From my naive
expectation, comparing IDs shouldn't depend on whether they arrive
looped or all at once.
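The looping variant I'm comparing against amounts to splitting the ID list into fixed-size chunks and issuing one query per chunk. A minimal sketch of that partitioning step (names are hypothetical, chunk size would need tuning):

```java
import java.util.ArrayList;
import java.util.List;

public class IdChunker {
    // Split a large ID list into fixed-size chunks; each chunk would
    // become one IN/ANY query in the looping approach.
    static List<List<Integer>> chunk(List<Integer> ids, int size) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return out;
    }
}
```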

I'm just trying to collect some input on where to look to optimize
things in the future. Thanks!

[1]: https://github.com/pgjdbc/pgjdbc/issues/90

Kind regards,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow
