Thread: SQL WHERE: many sql or large IN()

SQL WHERE: many sql or large IN()

From: tom
I'm wondering what the differences are between running two different
types of SQL statements.

Given ~300 tokens/words, I can either run one SQL statement with a large
list in a "WHERE foo IN (...300 tokens...)",
or I can run ~300 statements, one for each token.
In the first case, the SQL is not prepared, just executed.
In the second case, the SQL is prepared and run with a cached execution
plan (I think).
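
For concreteness, a minimal sketch of the two shapes (the table and column
names here are illustrative, and PREPARE/EXECUTE is one way to get the
cached-plan behavior):

-- Approach 1: one statement carrying a large IN list (imagine ~300 literals)
SELECT bar FROM tokens WHERE foo IN ('apple', 'orange', 'biscuit');

-- Approach 2: one prepared statement, executed once per token
PREPARE fetch_bar(varchar) AS
    SELECT bar FROM tokens WHERE foo = $1;
EXECUTE fetch_bar('apple');
EXECUTE fetch_bar('orange');
EXECUTE fetch_bar('biscuit');
-- ...and so on, up to ~300 EXECUTEs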

Now, it would seem that the second approach would be painfully
slow, but I'm not sure that I'm seeing this.
Currently I have <5 users.  As always, this might change...

Before I start coding and testing lots of stuff, I thought I would
ask for historical experiences someone might have had comparing
these two approaches, and whether there are inflection points in
performance in terms of the number of tokens or simultaneous users.

I should add that the tokens are either indexed or primary-keyed; in
both cases they are unique, and a given token is not guaranteed to exist.

Initially it seems that the WHERE IN (...) approach takes a turn for
the worse when the list gets very large.
It also seems to do comparatively worse when the number of tokens is
very small.
But I can't claim any scientifically sound basis for making this
distinction.

Any experiences someone would like to share?

Re: SQL WHERE: many sql or large IN()

From: Oleg Bartunov
Tom,

have you seen contrib/intarray ?
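
A minimal sketch of what it enables (names invented; intarray works on
integer[] columns and ships its own GiST operator classes):

CREATE TABLE docs (id integer, token_ids integer[]);
CREATE INDEX docs_token_ids_idx ON docs USING gist (token_ids gist__int_ops);
-- one indexed query matching any of the listed ids:
SELECT id FROM docs WHERE token_ids && '{1,2,3}'::integer[];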

Oleg

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: SQL WHERE: many sql or large IN()

From: tom
If I read this right, intarray is for querying values in an array
data type.

I don't have that.

I have a varchar() field that is indexed (unique); call it 'foo'.

I have a choice of running:

SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit'....)
for up to ~300 words

OR

SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times, as
prepared/cached SQL statements.

Re: SQL WHERE: many sql or large IN()

From: Oleg Bartunov
Aha,

then why not use a GIN index on text[]?

See, for example, my testing:
http://www.sai.msu.su/~megera/wiki/GinTest
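
A minimal sketch of the idea (names invented; GIN indexing of arrays is new
in 8.2):

CREATE TABLE docs (id serial PRIMARY KEY, words text[]);
CREATE INDEX docs_words_gin ON docs USING gin (words);
-- one indexed query for rows whose array overlaps the listed words:
SELECT id FROM docs WHERE words && ARRAY['apple','orange','biscuit'];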

oleg

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: SQL WHERE: many sql or large IN()

From: Listmail

> I have a choice of running:
>
> SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit'....) for
> up to ~300 words
>
> OR
>
> SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times as a
> prepared/cached SQL statements.

    With new PG versions you can also use VALUES, which will save you a hash
if you know your keys are unique.
    The example uses integers, but you can use anything. It works just like a
normal join from a table.
    Putting 300 values in VALUES is certainly a LOT faster than doing 300
individual SELECTs!

test=> EXPLAIN ANALYZE SELECT t.* FROM test t, ( VALUES (1),(2) ) AS v WHERE t.id=v.column1;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16.76 rows=2 width=8) (actual time=0.029..0.039 rows=2 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
   ->  Index Scan using test_pkey on test t  (cost=0.00..8.36 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=2)
         Index Cond: (t.id = "*VALUES*".column1)
 Total runtime: 0.085 ms

Re: SQL WHERE: many sql or large IN()

From: Markus Schiltknecht
Hi,

tom wrote:
> Initially it seems that the WHERE IN (...) approach takes a turn for the
> worse when the list gets very large.

Which version are you using? PostgreSQL 8.2 brought great improvements for
that specific issue. Did you try EXPLAIN?
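
For example (a sketch, using the table from upthread):

EXPLAIN ANALYZE
SELECT bar FROM tokens WHERE foo IN ('apple', 'orange', 'biscuit');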

Regards

Markus


Re: SQL WHERE: many sql or large IN()

From: Listmail
On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht
<markus@bluegap.ch> wrote:

> tom wrote:
>> Initially it seems that the WHERE IN (...) approach takes a turn for
>> the worse when the list gets very large.

    Since I use this a lot on webpages, I thought a little benchmark was
in order.

CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL );
INSERT INTO test (value) SELECT * FROM generate_series( 1, 1000000 );
CREATE INDEX test_value ON test( value );
ANALYZE test;

    My script runs EXPLAIN ANALYZE 10 times and keeps the fastest timing,
then displays it.
    Then it executes the query enough times that 1 second elapses (also
fetching the results), and prints the real runtime.

    Well, the table is clustered (since it was created from generate_series),
so of course the bitmap index scan rules.
    First, get the first 300 rows of the table using column 'value', which is
an indexed TEXT column.
    (Using 'id', which is an integer, is a bit faster.)

SELECT * FROM test WHERE value IN <300 values>
Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13) (actual time=3.736..3.807 rows=300 loops=1)
  Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
  ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300 width=0) (actual time=3.717..3.717 rows=300 loops=1)
        Index Cond: (value = ANY ('{<300 values>}'::text[]))
Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms
Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations)
----------------------------------------
SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1
Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual time=0.035..4.724 rows=300 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32) (actual time=0.002..0.219 rows=300 loops=1)
  ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1 width=13) (actual time=0.013..0.014 rows=1 loops=300)
        Index Cond: (t.value = "*VALUES*".column1)
Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms
Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations)
----------------------------------------
SELECT * FROM test WHERE value='1'
Index Scan using test_value on test  (cost=0.00..8.40 rows=1 width=13) (actual time=0.014..0.015 rows=1 loops=1)
  Index Cond: (value = '1'::text)
Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms
Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations)

    Now, if we ask for 300 random rows out of the million in the table, which
is a much more likely situation...

SELECT * FROM test WHERE value IN <300 values>
Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13) (actual time=4.516..4.945 rows=300 loops=1)
  Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
  ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300 width=0) (actual time=4.451..4.451 rows=300 loops=1)
        Index Cond: (value = ANY ('{<300 values>}'::text[]))
Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms
Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations)
----------------------------------------
SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1
Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual time=0.046..5.503 rows=300 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32) (actual time=0.001..0.200 rows=300 loops=1)
  ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1 width=13) (actual time=0.016..0.016 rows=1 loops=300)
        Index Cond: (t.value = "*VALUES*".column1)
Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms
Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations)

    Doing a query per row is a lot slower than it appears in EXPLAIN ANALYZE
once you add all the overhead. In fact it sucks. That was expected, of
course. I could tell you about the time I inherited a PHP/MySQL website
where the previous developer's idea of a join was a SELECT inside a PHP
for() loop, because "joins are slow". It was hell.

    I tried querying 1, 2, 10, 1000, 10k and 100k random rows (that's abuse)
and:
    - one query per row is (obviously) the slowest, unless you select only
one row
    - IN reverts to a single index scan if it contains one value; otherwise
it always uses a bitmap scan
    - the VALUES join uses a nested loop until about 100k rows, where it
switches to a hash join

    I also tried on a real table with many columns. In that case there are
fewer tuples per page, bitmap scans are more efficient than nested loops,
and IN wins.
    So IN() no longer turns to crap like it used to!

    However, if you are asking this question because you don't want to use
temporary tables: temp tables have been improved, so they are now usable
even in a webpage (well, not ALL webpages, but that big nasty slow webpage,
you know which one I'm talking about).

Re: SQL WHERE: many sql or large IN()

From: tom
I've never seen this before.
Is this PG-specific, or generic SQL that I've never been exposed to?


Re: SQL WHERE: many sql or large IN()

From: Listmail
> I've never seen this before.
> Is this PG specific or generic SQL that I've never been exposed to?

    http://www.postgresql.org/docs/8.2/interactive/sql-values.html
    VALUES conforms to the SQL standard, except that LIMIT and OFFSET are
PostgreSQL extensions.

    It doesn't seem like much at first glance, but it's really useful. It's
your kleenex temp table. It can select the rows you want from a table in
the order you want, update a table (replacing several values with several
others, via a before=>after correspondence table), and do many other things.
It is more powerful than IN() because you can use the columns of the
VALUES() in any part of your query: updating, ordering, etc.
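
    For instance, a sketch of the before=>after trick (names invented):

UPDATE tokens
SET    bar = v.new_bar
FROM   (VALUES ('apple', 1), ('orange', 2)) AS v(foo, new_bar)
WHERE  tokens.foo = v.foo;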


Re: SQL WHERE: many sql or large IN()

From: Tom Lane
tom <tom@tacocat.net> writes:
> I've never seen this before.
> Is this PG specific or generic SQL that I've never been exposed to?

It's in the SQL standard.  SQL92 saith

         <table value constructor> ::=
              VALUES <table value constructor list>

         <table value constructor list> ::=
              <row value constructor> [ { <comma> <row value constructor> }... ]

and lists this as an alternative to <query specification> (ie, a SELECT)
in the query grammar.  So you can write VALUES anywhere you could write
SELECT.  However, the spec also allows "entry level SQL" implementations
to dumb it down pretty far:

         1) The following restrictions apply for Intermediate SQL:

            a) A <table value constructor> shall contain exactly one <row
              value constructor> that shall be of the form "(<row value
              constructor list>)".

            b) A <table value constructor> shall be the <query expression>
              of an <insert statement>.

         2) The following restrictions apply for Entry SQL in addition to
            any Intermediate SQL restrictions:

              None.

What we had before 8.2 was the entry-level definition.  I couldn't tell
you which other DBMSes support the full definition...
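
For illustration, on 8.2 a bare VALUES list runs anywhere a SELECT could:

VALUES (1, 'one'), (2, 'two'), (3, 'three') ORDER BY column1 DESC;
SELECT * FROM (VALUES (1, 'one'), (2, 'two')) AS v(num, name) WHERE num > 1;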

            regards, tom lane

Re: SQL WHERE: many sql or large IN()

From: tom
This is good stuff, and I can use EXPLAIN ANALYZE going forward.
But I can't get these VALUES queries to work.

I checked and I am on version 8.1, but I thought from the docs that I
should still be able to do this.

queue=> select t.* from test t, (values(4, 23,84884,1,324234)) as v where t.value = v.column1;
ERROR:  syntax error at or near ")" at character 54
LINE 1: ...ect t.* from test t, (values(4, 23,84884,1,324234)) as v whe...
                                                             ^
Not sure if the caret is in the right place in this email, but it's the
second ')' that it points at.


Re: SQL WHERE: many sql or large IN()

From: Chris
tom wrote:
> This is some good stuff and I can use the explain analyze going forward.
> But I can't get these VALUES queries to work.
>
> I checked and I am on version 8.1.  but I think from the docs that I
> should still be able to do this.

Multiple-row VALUES lists were introduced in 8.2, as others have
mentioned, so you can't use them in 8.1.
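
On 8.2 and later, note that each value would also need its own parenthesized
row (a sketch, with text literals to match the TEXT column from the
benchmark upthread):

SELECT t.*
FROM test t, (VALUES ('4'), ('23'), ('84884'), ('1'), ('324234')) AS v(value)
WHERE t.value = v.value;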

--
Postgresql & php tutorials
http://www.designmagick.com/