Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions - Mailing list pgsql-general

From Jeff Janes
Subject Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Date
Msg-id CAMkU=1xjPzenssKktcX3pqTt9EpJd6nECAHLxRg8_fgGc9VT_Q@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wr

regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN
:values_clause;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245006.16..245006.17 rows=1 width=8) (actual time=3550.581..3550.581 rows=1 loops=1)
 Execution time: 3550.700 ms

 
 

regression=# set enable_hashagg TO 0;
regression=# set enable_sort TO 0;
SET
regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN
:values_clause;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=320003.90..320003.91 rows=1 width=8) (actual time=3548.364..3548.364 rows=1 loops=1)
 Execution time: 3548.463 ms


 
At least in this example, the actual runtimes are basically identical
regardless, so there is no great point in sweating over it.


Since The run times are equal, but one is estimated to be 30% more expensive, I think there is at least some little reason to sweat over it.

Incidentally, I accidentally ran this against a server running with your patch from https://www.postgresql.org/message-id/10078.1471955305@sss.pgh.pa.us.  On that server, it did choose the semi-join.  But I have no idea why, as it seems like the effect of that patch would have been to change the distinct estimate from the magic hard-coded 200, to the natural 200 coming from the query itself.  Why would that affect the cost?

Cheers,

Jeff

pgsql-general by date:

Previous
From: Ed Behn
Date:
Subject: [GENERAL] Partitioned TEMP tables
Next
From: Jeff Janes
Date:
Subject: Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions