Re: SQL WHERE: many sql or large IN() - Mailing list pgsql-general

From tom
Subject Re: SQL WHERE: many sql or large IN()
Date
Msg-id 6140957A-5C19-4880-828B-6CB8391433C8@tacocat.net
Whole thread Raw
In response to Re: SQL WHERE: many sql or large IN()  (Listmail <lists@peufeu.com>)
Responses Re: SQL WHERE: many sql or large IN()  (Chris <dmagick@gmail.com>)
List pgsql-general
This is some good stuff and I can use the explain analyze going forward.
But I can't get these VALUES queries to work.

I checked and I am on version 8.1.  but I think from the docs that I
should still be able to do this.

queue=> select t.* from test t, (values(4, 23,84884,1,324234)) as v
where t.value = v.column1;
ERROR:  syntax error at or near ")" at character 54
LINE 1: ...ect t.* from test t, (values(4, 23,84884,1,324234)) as v
whe...

                                ^
Not sure if the carat is in the right place on this email, but it's
the second ')' that's the problem.


On Apr 6, 2007, at 2:26 PM, Listmail wrote:

> On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht
> <markus@bluegap.ch> wrote:
>
>> Hi,
>>
>> tom wrote:
>>> Initially it seems that the WHERE IN (...) approach takes a turn
>>> for the worse when the list gets very large.
>>
>
>     Since I use this a lot on webpages, I thought maybe a little
> benchmark is in order ?
>
> CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL );
> INSERT INTO test (value) SELECT * FROM generate_series( 1, 1000000 );
> CREATE INDEX test_value ON test( value );
> ANALYZE test;
>
>     My script runs EXPLAIN ANALYZE 10 times and keeps the fastest
> timing, then displays it.
>     Then it executes the query enough times so that 1 second is
> elapsed (also fetching the results), then prints the real runtime.
>
>     Well, the table is clustered (since it was created from
> generate_series) so of course, bitmap index scan rules.
>     Get the first 300 rows of the table using column 'value' which is
> an indexed TEXT column.
>     (using 'id' which is an integer is a bit faster)
>
> SELECT * FROM test WHERE value IN <300 values>
> Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13)
> (actual time=3.736..3.807 rows=300 loops=1)
>   Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
>   ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300
> width=0) (actual time=3.717..3.717 rows=300 loops=1)
>         Index Cond: (value = ANY ('{<300 values>}'::text[]))
> Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms
> Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE
> t.value=v.column1
> Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual
> time=0.035..4.724 rows=300 loops=1)
>   ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300
> width=32) (actual time=0.002..0.219 rows=300 loops=1)
>   ->  Index Scan using test_value on test t  (cost=0.00..8.13
> rows=1 width=13) (actual time=0.013..0.014 rows=1 loops=300)
>         Index Cond: (t.value = "*VALUES*".column1)
> Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms
> Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations)
> ----------------------------------------
> SELECT * FROM test WHERE value='1'
> Index Scan using test_value on test  (cost=0.00..8.40 rows=1
> width=13) (actual time=0.014..0.015 rows=1 loops=1)
>   Index Cond: (value = '1'::text)
> Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms
> Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations)
>
>     Now if we ask for 300 random rows out of the million in the table,
> which is a lot more likely situation...
>
> SELECT * FROM test WHERE value IN <300 values>
> Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13)
> (actual time=4.516..4.945 rows=300 loops=1)
>   Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
>   ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300
> width=0) (actual time=4.451..4.451 rows=300 loops=1)
>         Index Cond: (value = ANY ('{<300 values>}'::text[]))
> Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms
> Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE
> t.value=v.column1
> Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual
> time=0.046..5.503 rows=300 loops=1)
>   ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300
> width=32) (actual time=0.001..0.200 rows=300 loops=1)
>   ->  Index Scan using test_value on test t  (cost=0.00..8.13
> rows=1 width=13) (actual time=0.016..0.016 rows=1 loops=300)
>         Index Cond: (t.value = "*VALUES*".column1)
> Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms
> Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations)
>
>     Doing a query per row is a lot slower than it appears in EXPLAIN
> ANALYZE when you add all the overhead. In fact it sucks. That was
> ecpected of course. I could tell you about when I got this php/
> mysql website where the previous developer's idea of a join was a
> select in a PHP for() loop because "joins are slow". It was hell.
>
>     I tried to query 1, 2, 10, 1000, 10k and 100k random rows (thats
> abuse) and :
>     - one query per row is (obviously) the slowest unless you select
> only one row
>     - IN reverts to a single index scan if it contains one value, else
> it always uses a bitmap scan
>     - VALUES join uses nested loop until about 100k rows where it
> switches to hash join
>
>     I also tried on a real table with many columns. In that case,
> there are less tuples per page, and bitmap scans are more efficient
> than nested loops, so IN wins.
>     So, IN() does not turn to crap anymore like it used to !
>
>     However, if you are asking for this question because you don't
> want to use temporary tables, it seems that temp tables have been
> upgraded so they are now usable even in a webpage (well, not ALL
> webpages, but that big nasty slow webpage, you know which one I'm
> talking about).
>
>
>
>
>
>
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>


pgsql-general by date:

Previous
From: "John D. Burger"
Date:
Subject: New to concurrency
Next
From: Marcus Engene
Date:
Subject: Re: performance; disk bad or something?