Re: SQL WHERE: many sql or large IN() - Mailing list pgsql-general
From:           tom
Subject:        Re: SQL WHERE: many sql or large IN()
Date:
Msg-id:         6140957A-5C19-4880-828B-6CB8391433C8@tacocat.net
In response to: Re: SQL WHERE: many sql or large IN() (Listmail <lists@peufeu.com>)
Responses:      Re: SQL WHERE: many sql or large IN() (Chris <dmagick@gmail.com>)
List:           pgsql-general
This is some good stuff, and I can use the EXPLAIN ANALYZE approach going forward. But I can't get these VALUES queries to work. I checked and I am on version 8.1, but I think from the docs that I should still be able to do this:

queue=> select t.* from test t, (values(4, 23,84884,1,324234)) as v where t.value = v.column1;
ERROR:  syntax error at or near ")" at character 54
LINE 1: ...ect t.* from test t, (values(4, 23,84884,1,324234)) as v whe...
                                                              ^

Not sure if the caret is in the right place in this email, but it's the second ')' that's the problem.
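It looks like FROM-clause VALUES lists are an 8.2 addition, which would explain the syntax error on 8.1. If that is the case, a rough 8.1-friendly workaround is to spell out the same derived table with UNION ALL. A minimal sketch, reusing the table and column names from the query above and putting one lookup value per row (quote the literals if the column is text):

-- 8.2+ form: each value is its own row in the VALUES list
SELECT t.*
FROM test t,
     (VALUES (4), (23), (84884), (1), (324234)) AS v(column1)
WHERE t.value = v.column1;

-- Rough 8.1 equivalent: build the same derived table with UNION ALL
SELECT t.*
FROM test t,
     (SELECT 4 AS column1
      UNION ALL SELECT 23
      UNION ALL SELECT 84884
      UNION ALL SELECT 1
      UNION ALL SELECT 324234) AS v
WHERE t.value = v.column1;

Whether the planner handles the UNION ALL form as well as a real VALUES scan is another question, but at least it parses on 8.1.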
On Apr 6, 2007, at 2:26 PM, Listmail wrote:

> On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht <markus@bluegap.ch> wrote:
>
>> Hi,
>>
>> tom wrote:
>>> Initially it seems that the WHERE IN (...) approach takes a turn
>>> for the worse when the list gets very large.
>
> Since I use this a lot on webpages, I thought maybe a little benchmark is in order?
>
> CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL);
> INSERT INTO test (value) SELECT * FROM generate_series(1, 1000000);
> CREATE INDEX test_value ON test(value);
> ANALYZE test;
>
> My script runs EXPLAIN ANALYZE 10 times and keeps the fastest timing, then displays it.
> Then it executes the query enough times that 1 second elapses (also fetching the results), then prints the real runtime.
>
> Well, the table is clustered (since it was created from generate_series), so of course bitmap index scan rules.
> Get the first 300 rows of the table using column 'value', which is an indexed TEXT column.
> (Using 'id', which is an integer, is a bit faster.)
>
> SELECT * FROM test WHERE value IN <300 values>
>
> Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13) (actual time=3.736..3.807 rows=300 loops=1)
>   Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
>   ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300 width=0) (actual time=3.717..3.717 rows=300 loops=1)
>         Index Cond: (value = ANY ('{<300 values>}'::text[]))
>
> Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms
> Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1
>
> Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual time=0.035..4.724 rows=300 loops=1)
>   ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32) (actual time=0.002..0.219 rows=300 loops=1)
>   ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1 width=13) (actual time=0.013..0.014 rows=1 loops=300)
>         Index Cond: (t.value = "*VALUES*".column1)
>
> Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms
> Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations)
> ----------------------------------------
> SELECT * FROM test WHERE value='1'
>
> Index Scan using test_value on test  (cost=0.00..8.40 rows=1 width=13) (actual time=0.014..0.015 rows=1 loops=1)
>   Index Cond: (value = '1'::text)
>
> Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms
> Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations)
>
> Now if we ask for 300 random rows out of the million in the table, which is a much more likely situation...
>
> SELECT * FROM test WHERE value IN <300 values>
>
> Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13) (actual time=4.516..4.945 rows=300 loops=1)
>   Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
>   ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300 width=0) (actual time=4.451..4.451 rows=300 loops=1)
>         Index Cond: (value = ANY ('{<300 values>}'::text[]))
>
> Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms
> Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations)
> ----------------------------------------
> SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1
>
> Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual time=0.046..5.503 rows=300 loops=1)
>   ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32) (actual time=0.001..0.200 rows=300 loops=1)
>   ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1 width=13) (actual time=0.016..0.016 rows=1 loops=300)
>         Index Cond: (t.value = "*VALUES*".column1)
>
> Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms
> Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations)
>
> Doing a query per row is a lot slower than it appears in EXPLAIN ANALYZE when you add all the overhead. In fact it sucks. That was expected, of course. I could tell you about the time I got this PHP/MySQL website where the previous developer's idea of a join was a SELECT in a PHP for() loop, because "joins are slow". It was hell.
>
> I tried to query 1, 2, 10, 1000, 10k and 100k random rows (that's abuse) and:
> - one query per row is (obviously) the slowest unless you select only one row
> - IN reverts to a single index scan if it contains one value, otherwise it always uses a bitmap scan
> - the VALUES join uses a nested loop until about 100k rows, where it switches to a hash join
>
> I also tried on a real table with many columns. In that case there are fewer tuples per page, and bitmap scans are more efficient than nested loops, so IN wins.
> So IN() does not turn to crap anymore like it used to!
>
> However, if you are asking this question because you don't want to use temporary tables, it seems that temp tables have been upgraded so they are now usable even in a webpage (well, not ALL webpages, but that big nasty slow webpage, you know which one I'm talking about).
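For the temp-table route mentioned at the end, a minimal sketch of what that could look like (the "wanted" table name is just illustrative, and the multi-row INSERT syntax needs 8.2 -- on 8.1 it would be one INSERT per value, or a COPY):

BEGIN;
-- collect the keys to fetch; dropped automatically at COMMIT
CREATE TEMPORARY TABLE wanted (value TEXT PRIMARY KEY) ON COMMIT DROP;
INSERT INTO wanted VALUES ('4'), ('23'), ('84884');  -- or COPY wanted FROM STDIN for big lists
ANALYZE wanted;  -- give the planner real row counts before the join
SELECT t.* FROM test t JOIN wanted w ON w.value = t.value;
COMMIT;

For very large key lists this can give the planner the option of a hash or merge join instead of expanding one enormous IN() array.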