Re: Have I b0rked something? Slow comparisons on "where x in (...)" - Mailing list pgsql-general

From Listmail
Subject Re: Have I b0rked something? Slow comparisons on "where x in (...)"
Date
Msg-id op.trrgen13zcizji@apollo13
Whole thread Raw
In response to Re: Have I b0rked something? Slow comparisons on "where x in (...)"  (Richard Huxton <dev@archonet.com>)
List pgsql-general
> Try creating a temporary table, populating with the list and joining
> against it. That's probably your best bet for a long list of target
> values.

    Check :

forum_bench=> CREATE TABLE test (value INTEGER NOT NULL);
CREATE TABLE
forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 1000000 );
INSERT 0 1000000
forum_bench=> ANALYZE test;
forum_bench=> EXPLAIN ANALYZE SELECT * FROM test;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Seq Scan on test  (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.030..349.699 rows=1000000 loops=1)
  Total runtime: 542.914 ms
(2 lignes)


    OK : 542 ms to grab the data.
    IN() :

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to
999000 in steps of 1000 ):
  Seq Scan on test  (cost=0.00..1264310.24 rows=1000 width=4) (actual
time=17.649..17977.085 rows=999 loops=1)
    Filter: (value = ANY ('{0,1000..........99000}'::integer[]))
  Total runtime: 17978.061 ms

    Ouch.

forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES
(0),(1000),(2000),....................(998000),(999000));
                                                            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=19.50..18176.45 rows=200 width=4) (actual
time=2.823..736.960 rows=999 loops=1)
    Hash Cond: (test.value = "*VALUES*".column1)
    ->  Seq Scan on test  (cost=0.00..14405.24 rows=999924 width=4) (actual
time=0.032..335.680 rows=1000000 loops=1)
    ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual
time=2.108..2.108 rows=1000 loops=1)
          ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual
time=1.165..1.542 rows=1000 loops=1)
                ->  Values Scan on "*VALUES*"  (cost=0.00..12.50 rows=1000
width=4) (actual time=0.004..0.478 rows=1000 loops=1)
  Total runtime: 737.362 ms

    Removing the 542 ms to read the table, we see checking if the values are
in the hash is really rally fast.

    So, obvious truth : hash is faster than dumb compare. Much faster.
    Now, postgres should do this on its own, I think.

    PS : if the 1000 values are all the same (1000 times 1), IN() doesn't
detect it, so the runtime does not change. Hash join doesn't care, so the
runtime doesn't change either.

pgsql-general by date:

Previous
From: "Islam Hegazy"
Date:
Subject: Re: C functions under windows
Next
From: Listmail
Date:
Subject: Re: Have I b0rked something? Slow comparisons on "where x in (...)"