Thread: Re: slower merge join on sorted data chosen over

Re: slower merge join on sorted data chosen over

From: "Kevin Grittner"
Thanks, Tom.
I spent a few hours trying different searches in the archives, and
found three very interesting threads on the topic.  All were from
2003.  Should I keep digging for more recent threads, or would
these probably represent the current state of the issue?
These left me somewhat concerned, since people were reporting
queries which ran orders of magnitude slower using merge joins
than when they forced them to nested loop index scans.  In our
first brush with the issue it caused our query to run only two to
three times slower than it would if the planner could more
accurately cost the nested index scans, and it was in an area with
minimal impact to the organization, so this one is relatively benign.
We are looking at doing much more with PostgreSQL over the
next two years, and it seems likely that this issue will come up
again where it is more of a problem.  It sounded like there was
some agreement on HOW this was to be fixed, yet I don't see
any mention of doing it in the TODO list.  Is there any sort of
estimate for how much programming work would be involved?
Any chance of an incremental improvement, such as only
counting leaf access in a nested select?  (While that's not
perfect, it seems likely to be closer, and therefore beneficial
overall.)
Thanks,
-Kevin
>>> Tom Lane <tgl@sss.pgh.pa.us> 10/06/05 9:28 PM >>>

There's a known issue that the planner tends to overestimate the cost of
inner-index-scan nestloops, because it doesn't allow for the strong
caching effects associated with repeated scans of the same index (in
particular, that all the upper index levels tend to stay in cache).
See the archives for past inconclusive discussions about how to fix
this.



Re: slower merge join on sorted data chosen over

From: Simon Riggs
On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> We are looking at doing much more with PostgreSQL over the
> next two years, and it seems likely that this issue will come up
> again where it is more of a problem.  It sounded like there was
> some agreement on HOW this was to be fixed, yet I don't see
> any mention of doing it in the TODO list.  

> Is there any sort of
> estimate for how much programming work would be involved?

The main work here is actually performance testing, not programming. The
cost model is built around an understanding of the timings and costs
involved in the execution.

Once we have timings to cover a sufficiently large range of cases, we
can derive the cost model. Once derived, we can program it. Discussing
improvements to the cost model without test results is never likely to
convince people. Everybody knows the cost models can be improved, the
only question is in what cases? and in what ways?

So deriving the cost model needs lots of trustworthy test results that
can be assessed and discussed, so we know how to improve things. [...and
I don't mean 5 minutes with pg_bench...]

Detailed analysis such as that is time consuming and also needs to be
done in a sufficiently reproducible manner that we can rely on it.

Your help would be greatly appreciated in that area. You and your team
clearly have an eye for the fine detail of these issues.

...IIRC there is a TODO item relating to that.

Perhaps we should put a more general call out on the TODO list towards
detailed, complete, accurate and reproducible performance test results?

Best Regards, Simon Riggs





Re: slow IN() clause for many cases

From: "Ilia Kantor"
When the IN clause becomes large enough (>20-30 cases),
it is much better to use the "join" way of processing.

I mean,
"SELECT * FROM table WHERE field IN (1,2...30)" will be slower than
"SELECT * FROM table JOIN (SRF returning 1...30) USING(field)"

I'm not quite sure where the difference starts, but sometimes I need to
make selects with 30 or more items by primary key, and I get a significant
speedup from this transform:

CREATE OR REPLACE FUNCTION array2table(arr int[]) RETURNS SETOF int

select * from persons join (select array2table as id from
array2table(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])) a using(id);

I'm sure the backend could do that in a much faster and more elegant fashion.
Bitmap-OR is nice, but for many IN arguments it is still much slower than a
join.
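
For reference, the body of array2table is not shown above; a minimal
definition matching that signature could look like the following sketch
(an assumption, not necessarily the original poster's code):

CREATE OR REPLACE FUNCTION array2table(arr int[]) RETURNS SETOF int AS $$
BEGIN
    -- return each element of the input array as a separate row
    FOR i IN array_lower(arr, 1) .. array_upper(arr, 1) LOOP
        RETURN NEXT arr[i];
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;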



Re: slow IN() clause for many cases

From: "Jonah H. Harris"
Please post an explain analyze on your query with a 20-30 item IN clause so that we can see what plan is being generated.


On 10/11/05, Ilia Kantor <ilia@obnovlenie.ru> wrote:

When the IN clause becomes large enough (>20-30 cases),
it is much better to use the "join" way of processing.

I mean,
"SELECT * FROM table WHERE field IN (1,2...30)" will be slower than
"SELECT * FROM table JOIN (SRF returning 1...30) USING(field)"

I'm not quite sure where the difference starts, but sometimes I need to
make selects with 30 or more items by primary key, and I get a significant
speedup from this transform:

CREATE OR REPLACE FUNCTION array2table(arr int[]) RETURNS SETOF int

select * from persons join (select array2table as id from
array2table(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])) a using(id);

I'm sure the backend could do that in a much faster and more elegant fashion.
Bitmap-OR is nice, but for many IN arguments it is still much slower than a
join.





--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: slow IN() clause for many cases

From: "Ilia Kantor"
>Please post an explain analyze on your query with a 20-30 item IN clause so
that we can see what plan is being generated.

It is bitmap-OR on multiple index(PK) lookups.



Re: slow IN() clause for many cases

From: Josh Berkus
Ilia,

> It is bitmap-OR on multiple index(PK) lookups.

Describing it doesn't help.  We need an *actual* EXPLAIN ANALYZE.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: slow IN() clause for many cases

From: "Ilia Kantor"
>> It is bitmap-OR on multiple index(PK) lookups.

> Describing it doesn't help.  We need an *actual* EXPLAIN ANALYZE.
Sure, why not..

6ms for

Bitmap Heap Scan on objects_hier  (cost=60.29..179.57 rows=80 width=600) (actual time=0.835..1.115 rows=138 loops=1)
  Recheck Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5) OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11) OR (id = 12) OR (id = 13) OR (id = 14) OR (id = 15) OR (id = 16) OR (id = 17) OR (id = 18) OR (id = 19) OR (id = 20) OR (id = 21) OR (id = 22) OR (id = 23) OR (id = 24) OR (id = 25) OR (id = 26) OR (id = 27) OR (id = 28) OR (id = 29) OR (id = 30))
  ->  BitmapOr  (cost=60.29..60.29 rows=82 width=0) (actual time=0.553..0.553 rows=0 loops=1)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.036..0.036 rows=6 loops=1)
              Index Cond: (id = 1)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.044..0.044 rows=6 loops=1)
              Index Cond: (id = 2)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 3)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 4)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 5)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 6)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 7)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 8)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 9)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 10)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 11)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 12)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 13)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 14)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 15)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 16)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 17)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 18)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 19)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 20)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 21)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 22)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
              Index Cond: (id = 23)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 24)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 25)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 26)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 27)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 28)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 29)
        ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Index Cond: (id = 30)




4ms for

explain analyze select * from objects_hier join (select array2table as id from
array2table(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])) a using(id);

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=62.33..576.80 rows=1117 width=600) (actual time=0.542..2.898 rows=138 loops=1)
   Merge Cond: ("outer".id = "inner".array2table)
   ->  Index Scan using lkjk on objects_hier  (cost=0.00..493.52 rows=1680 width=600) (actual time=0.025..1.248 rows=139 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.510..0.799 rows=145 loops=1)
         Sort Key: array2table.array2table
         ->  Function Scan on array2table  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.081..0.141 rows=30 loops=1)



Re: slow IN() clause for many cases

From: Tom Lane
"Ilia Kantor" <ilia@obnovlenie.ru> writes:
> Bitmap Heap Scan on objects_hier  (cost=60.29..179.57 rows=80 width=600)
> (actual time=0.835..1.115 rows=138 loops=1)

vs

>  Merge Join  (cost=62.33..576.80 rows=1117 width=600) (actual
> time=0.542..2.898 rows=138 loops=1)

Hmm, sure looks from here like the bitmap plan is faster.
        regards, tom lane


Re: slow IN() clause for many cases

From: "Jonah H. Harris"
true dat :)


On 10/12/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Ilia Kantor" <ilia@obnovlenie.ru> writes:
> Bitmap Heap Scan on objects_hier  (cost=60.29..179.57 rows=80 width=600)
> (actual time=0.835..1.115 rows=138 loops=1)

vs

>  Merge Join  (cost=62.33..576.80 rows=1117 width=600) (actual
> time=0.542..2.898 rows=138 loops=1)

Hmm, sure looks from here like the bitmap plan is faster.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: slow IN() clause for many cases

From: Andrew - Supernews
On 2005-10-11, "Ilia Kantor" <ilia@obnovlenie.ru> wrote:
> When the IN clause becomes large enough (>20-30 cases),
> it is much better to use the "join" way of processing.

or even a different way of writing the IN clause.

This one is one I've used after considerable research:

select * from table
where field in (select (some_array_of_N_items)[i]
                   from generate_series(1,N) as s(i));

This generally plans as a nestloop, with a HashAggregate of the function
scan (of generate_series) on the outer path, and index lookups on the inner
path.

It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when
comparing queries of this kind. The IN (1,2,...30) form is much slower to
plan, and usually can't usefully be used in prepared form (you'd need to
prepare it separately for every different number of items); in contrast,
the array version can be prepared once and reused.
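
For instance, a form like this can be prepared once with an array parameter
and then reused for any number of keys (statement and table names here are
only illustrative, a sketch of the pattern):

test=# prepare getids(int[]) as select * from test where id in
  (select ($1)[i] from generate_series(1, array_upper($1, 1)) s(i));
test=# execute getids(ARRAY[1,2,3]);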

As the number of items in the IN clause increases, the planning time grows
rather radically. For example with 1000 items (here stashed in a psql
convenience variable for brevity), using 8.1beta2:

test=# prepare tstplan1 as select * from test where id in (:v);
Time: 4881.702 ms

compare:

test=# prepare tstplan2 as select * from test where id in
  (select (ARRAY[:v])[i] from generate_series(1,1000) s(i));
Time: 10.889 ms

(on my machine the break-even point for these two is less than 20 items,
or even less if the array is passed in as a literal or a single parameter
rather than constructed with ARRAY[].)

The actual execution time of these two is very close, with the second
being about 10% slower on my system (31ms vs 34ms, based on \timing values
from psql and averaged over several goes). However, the timings returned
from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
"total runtime" line. So not only is the planning time different, but also
the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
the two forms.

What this means is that unless you're going to prepare in advance every
possible number of parameters to IN that your app is ever going to use,
the only way to get useful performance for IN queries with more than a
handful of literal values is to use an array method, in spite of the fact
that the bitmap-OR execution plan is actually at least as fast.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: slow IN() clause for many cases

From: Tom Lane
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> As the number of items in the IN clause increases, the planning time grows
> rather radically.

I was looking at this yesterday.  There is some O(N^2) behavior in
create_bitmap_subplan, stemming from trying to remove duplicated qual
conditions.  That strikes me as a relatively useless activity, and I was
thinking the easiest answer might be to just delete that "optimization".

> The actual execution time of these two is very close, with the second
> being about 10% slower on my system (31ms vs 34ms, based on \timing values
> from psql and averaged over several goes). However, the timings returned
> from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
> "total runtime" line. So not only is the planning time different, but also
> the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
> the two forms.

Yeah, this isn't all that surprising because of the larger number of
plan nodes involved in the bitmap plan.
        regards, tom lane


Re: slow IN() clause for many cases

From: "Ilia Kantor"
1) merge join is faster for me than hash join (6 vs 3-4ms):

explain analyze select * from objects_hier where id in (select
(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])[i] from generate_series(1,30) s(i));

Hash Join  (cost=17.50..162.93 rows=223 width=600) (actual time=1.148..32.654 rows=138 loops=1)
   Hash Cond: ("outer".id = ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30}'::integer[])["inner".i])
   ->  Seq Scan on objects_hier  (cost=0.00..134.80 rows=1680 width=600) (actual time=0.235..12.271 rows=1680 loops=1)
   ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual time=0.893..0.893 rows=30 loops=1)
         ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual time=0.161..0.830 rows=30 loops=1)
               ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.037..0.097 rows=30 loops=1)

2) Is it possible to make the planner handle that better, e.g. use Bitmap-OR
for <20 numbers and some other method for >20 numbers (the actual number
depending on costs)?

Maybe that should be added to the TODO?




-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andrew - Supernews
Sent: Wednesday, October 12, 2005 1:41 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] slow IN() clause for many cases

On 2005-10-11, "Ilia Kantor" <ilia@obnovlenie.ru> wrote:
> When the IN clause becomes large enough (>20-30 cases),
> it is much better to use the "join" way of processing.

or even a different way of writing the IN clause.

This one is one I've used after considerable research:

select * from table
where field in (select (some_array_of_N_items)[i]
                   from generate_series(1,N) as s(i));

This generally plans as a nestloop, with a HashAggregate of the function
scan (of generate_series) on the outer path, and index lookups on the inner
path.

It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when
comparing queries of this kind. The IN (1,2,...30) form is much slower to
plan, and usually can't usefully be used in prepared form (you'd need to
prepare it separately for every different number of items); in contrast,
the array version can be prepared once and reused.

As the number of items in the IN clause increases, the planning time grows
rather radically. For example with 1000 items (here stashed in a psql
convenience variable for brevity), using 8.1beta2:

test=# prepare tstplan1 as select * from test where id in (:v);
Time: 4881.702 ms

compare:

test=# prepare tstplan2 as select * from test where id in
  (select (ARRAY[:v])[i] from generate_series(1,1000) s(i));
Time: 10.889 ms

(on my machine the break-even point for these two is less than 20 items,
or even less if the array is passed in as a literal or a single parameter
rather than constructed with ARRAY[].)

The actual execution time of these two is very close, with the second
being about 10% slower on my system (31ms vs 34ms, based on \timing values
from psql and averaged over several goes). However, the timings returned
from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
"total runtime" line. So not only is the planning time different, but also
the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
the two forms.

What this means is that unless you're going to prepare in advance every
possible number of parameters to IN that your app is ever going to use,
the only way to get useful performance for IN queries with more than a
handful of literal values is to use an array method, in spite of the fact
that the bitmap-OR execution plan is actually at least as fast.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services




Re: slow IN() clause for many cases

From: "Jonah H. Harris"
IMHO, we should first look at whether the O(N^2) behavior is needed.

On 10/12/05, Ilia Kantor <ilia@obnovlenie.ru> wrote:

1) merge join is faster for me than hash join (6 vs 3-4ms):

explain analyze select * from objects_hier where id in (select
(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])[i] from generate_series(1,30) s(i));

Hash Join  (cost=17.50..162.93 rows=223 width=600) (actual time=1.148..32.654 rows=138 loops=1)
   Hash Cond: ("outer".id = ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30}'::integer[])["inner".i])
   ->  Seq Scan on objects_hier  (cost=0.00..134.80 rows=1680 width=600) (actual time=0.235..12.271 rows=1680 loops=1)
   ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual time=0.893..0.893 rows=30 loops=1)
         ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual time=0.161..0.830 rows=30 loops=1)
               ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.037..0.097 rows=30 loops=1)

2) Is it possible to make the planner handle that better, e.g. use Bitmap-OR
for <20 numbers and some other method for >20 numbers (the actual number
depending on costs)?

Maybe that should be added to the TODO?




-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andrew - Supernews
Sent: Wednesday, October 12, 2005 1:41 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] slow IN() clause for many cases

On 2005-10-11, "Ilia Kantor" <ilia@obnovlenie.ru> wrote:
> When the IN clause becomes large enough (>20-30 cases),
> it is much better to use the "join" way of processing.

or even a different way of writing the IN clause.

This one is one I've used after considerable research:

select * from table
where field in (select (some_array_of_N_items)[i]
                   from generate_series(1,N) as s(i));

This generally plans as a nestloop, with a HashAggregate of the function
scan (of generate_series) on the outer path, and index lookups on the inner
path.

It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when
comparing queries of this kind. The IN (1,2,...30) form is much slower to
plan, and usually can't usefully be used in prepared form (you'd need to
prepare it separately for every different number of items); in contrast,
the array version can be prepared once and reused.

As the number of items in the IN clause increases, the planning time grows
rather radically. For example with 1000 items (here stashed in a psql
convenience variable for brevity), using 8.1beta2:

test=# prepare tstplan1 as select * from test where id in (:v);
Time: 4881.702 ms

compare:

test=# prepare tstplan2 as select * from test where id in
  (select (ARRAY[:v])[i] from generate_series(1,1000) s(i));
Time: 10.889 ms

(on my machine the break-even point for these two is less than 20 items,
or even less if the array is passed in as a literal or a single parameter
rather than constructed with ARRAY[].)

The actual execution time of these two is very close, with the second
being about 10% slower on my system (31ms vs 34ms, based on \timing values
from psql and averaged over several goes). However, the timings returned
from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
"total runtime" line. So not only is the planning time different, but also
the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
the two forms.

What this means is that unless you're going to prepare in advance every
possible number of parameters to IN that your app is ever going to use,
the only way to get useful performance for IN queries with more than a
handful of literal values is to use an array method, in spite of the fact
that the bitmap-OR execution plan is actually at least as fast.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services






--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/

Re: slow IN() clause for many cases

From: Andrew - Supernews
On 2005-10-12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> As the number of items in the IN clause increases, the planning time grows
>> rather radically.
>
> I was looking at this yesterday.  There is some O(N^2) behavior in
> create_bitmap_subplan, stemming from trying to remove duplicated qual
> conditions.  That strikes me as a relatively useless activity, and I was
> thinking the easiest answer might be to just delete that "optimization".

Well, the behaviour is definitely bad.

For comparison, on 8.0 the IN (list of 1000 items) version plans only about
four times slower than IN (select array...), and again the execution times
are comparable. But for the case of a real-world app that uses IN () a lot
(dspam), I've had reports that the performance improvements from switching
to the array method are even more substantial than my raw timings suggest.

>> The actual execution time of these two is very close, with the second
>> being about 10% slower on my system (31ms vs 34ms, based on \timing values
>> from psql and averaged over several goes). However, the timings returned
>> from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the
>> "total runtime" line. So not only is the planning time different, but also
>> the instrumentation overhead of EXPLAIN ANALYZE is wildly different between
>> the two forms.
>
> Yeah, this isn't all that surprising because of the larger number of
> plan nodes involved in the bitmap plan.

I think you have this backwards - the instrumentation overhead is _lower_
for the bitmap-OR plan than for the nestloop. This was also true in 8.0;
the instrumentation overhead of an index OR scan plan is lower than the
nestloop plan, though not by nearly as much.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: slow IN() clause for many cases

From: Andrew - Supernews
On 2005-10-12, Andrew - Supernews <andrew+nonews@supernews.com> wrote:
> On 2005-10-12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>>> As the number of items in the IN clause increases, the planning time grows
>>> rather radically.
>>
>> I was looking at this yesterday.  There is some O(N^2) behavior in
>> create_bitmap_subplan, stemming from trying to remove duplicated qual
>> conditions.  That strikes me as a relatively useless activity, and I was
>> thinking the easiest answer might be to just delete that "optimization".
>
> Well, the behaviour is definitely bad.
>
> For comparison, on 8.0 the IN (list of 1000 items) version plans only about
> four times slower than IN (select array...), and again the execution times
> are comparable. But for the case of a real-world app that uses IN () a lot
> (dspam), I've had reports that the performance improvements from switching
> to the array method are even more substantial than my raw timings suggest.

Trying this again, on the same data, with the latest planner change shows
that the plan time for IN (list of 1000 items) is now much better, 47ms
rather than nearly five seconds, but that's still substantially longer than
the execution time in the case where the data is in cache. I'm not sure why,
but the execution time for that query has improved too; it's now about 20%
faster for the bitmap-OR than for the array method.

With everything in cache, selecting 1000 random rows from a 200k row table,
I get:

for IN (list):  planning time 47ms, execution time 27ms
array/nestloop: planning time 8ms, execution time 34ms

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: slow IN() clause for many cases

From: Tom Lane
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> With everything in cache, selecting 1000 random rows from a 200k row table,
> I get:
>
> for IN (list):  planning time 47ms, execution time 27ms
> array/nestloop: planning time 8ms, execution time 34ms

The reason for the slow planning time is that IN (list) is currently
converted (by the grammar) into an OR structure:
(x = val1) OR (x = val2) OR (x = val3) OR (x = val4) OR ...

which means that the planner has to consider each subclause separately:
discover that it can be matched to an index, build the indexscan plan,
etc.  Even just looking up all the "=" operators during parsing takes a
fair amount of time :-( The large number of plan nodes also implies
relatively slow executor startup.

It strikes me that now that we have the bitmap indexscan mechanism,
it wouldn't be hard to do better.  I'm thinking that IN should be
converted to a ScalarArrayOpExpr, ie
x = ANY (ARRAY[val1,val2,val3,val4,...])

and then we could treat both OpExpr and ScalarArrayOpExpr as matching
an index when the LHS is the index key and the operator is in the
index's opclass.  This wouldn't fit comfortably into a plain indexscan,
but a bitmap indexscan has already got the mechanism for ORing together
the results of several primitive indexscans (one per array element).
You just do the scans and insert all the results into your output
bitmap.

This approach would reduce the planner and executor-startup costs of
even a long IN list to be pretty comparable to a single index key
comparison.  The actual runtime of the plan wouldn't change much,
I think, but it's the overhead that's hurting us here.

This also solves the problem mentioned earlier in the thread that
you can't make a prepared plan for varying lengths of IN-lists:
instead of using IN, you do something like
    x = ANY($1::int[])
and then ship your lookup keys as a single array parameter instead of
multiple scalars.
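
Concretely, the proposed usage would look something like this (a sketch
with hypothetical table and column names, contingent on the change
actually being made):

PREPARE lookup(int[]) AS SELECT * FROM tab WHERE x = ANY ($1);
EXECUTE lookup('{1,2,3,4}');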

The main objection I can see is that this technique requires the
elements of the IN list to be converted to a common datatype (ie, the
element datatype of the ARRAY construct).  Historically our code has
made no such assumption.  However, I believe that the SQL standard
expects that conversion to happen: "x IN (list)" is defined as
"x = ANY (VALUES list)" and <table value constructor> expects all its
rows to be converted to a common rowtype.  So we could point to the
standard if anyone complains that the behavior changed.

Obviously this is not happening for 8.1, but it seems very do-able for
8.2.

Comments?
        regards, tom lane


Re: slow IN() clause for many cases

From: Tom Lane
I wrote:
> I'm thinking that IN should be
> converted to a ScalarArrayOpExpr, ie

>     x = ANY (ARRAY[val1,val2,val3,val4,...])

Actually, there is one little thing in the way of doing this: it'll
fail if any of the IN-list elements are NULL, because we have not got
support for arrays with null elements.  So we'd have to fix that first.
        regards, tom lane


Re: slow IN() clause for many cases

From: David Fetter
On Fri, Oct 14, 2005 at 07:09:17PM -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking that IN should be converted to a ScalarArrayOpExpr,
> > ie
> 
> >     x = ANY (ARRAY[val1,val2,val3,val4,...])
> 
> Actually, there is one little thing in the way of doing this: it'll
> fail if any of the IN-list elements are NULL, because we have not
> got support for arrays with null elements.   So we'd have to fix
> that first.

+1 on fixing that. :)

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: slow IN() clause for many cases

From: mark@mark.mielke.cc
On Fri, Oct 14, 2005 at 07:09:17PM -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking that IN should be
> > converted to a ScalarArrayOpExpr, ie
> >     x = ANY (ARRAY[val1,val2,val3,val4,...])
> Actually, there is one little thing in the way of doing this: it'll
> fail if any of the IN-list elements are NULL, because we have not got
> support for arrays with null elements.  So we'd have to fix that first.

Hey Tom.

Obviously your area of expertise, so please tell me where I'm wrong -

But doesn't "x IN (NULL)" already fail to ever match, similar to "x
= NULL"? (Assuming that compatibility switch isn't enabled)

So, I'd hope people weren't using such an expression? :-) Or is that
not good enough? What does NULL::text[] turn into? An empty string? Is
that the danger? It might match against an empty string?

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                      and in the darkness bind them...
                          http://mark.mielke.cc/



Re: slow IN() clause for many cases

From: Tom Lane
mark@mark.mielke.cc writes:
> But doesn't "x IN (NULL)" already fail to ever match, similar to "x
> = NULL"? (Assuming that compatibility switch isn't enabled)

The case I'm worried about is "x IN (1,2,NULL)".  This *can* match.

Also, it's supposed to return NULL (not FALSE) if it doesn't match.
So simply discarding NULLs before we build the array wouldn't give
the right answer.
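
To spell the semantics out (standard SQL three-valued logic):

SELECT 2 IN (1, 2, NULL);   -- true: a match is found
SELECT 3 IN (1, 2, NULL);   -- null, not false: the NULL might have been equal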

Considering that allowing null array entries has been a big wishlist
item for many years, I don't think that anyone will object to fixing
that in 8.2, anyway ...
        regards, tom lane


Re: slow IN() clause for many cases

From: Greg Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> It strikes me that now that we have the bitmap indexscan mechanism,
> it wouldn't be hard to do better.  I'm thinking that IN should be
> converted to a ScalarArrayOpExpr, ie
> 
>     x = ANY (ARRAY[val1,val2,val3,val4,...])
> 
> and then we could treat both OpExpr and ScalarArrayOpExpr as matching
> an index when the LHS is the index key and the operator is in the
> index's opclass.  This wouldn't fit comfortably into a plain indexscan,
> but a bitmap indexscan has already got the mechanism for ORing together
> the results of several primitive indexscans (one per array element).
> You just do the scans and insert all the results into your output
> bitmap.

Would this mean it would be impossible to get a fast-start plan for an IN
expression though?

I would fear queries like
SELECT * from tab WHERE x IN (1,2,3) LIMIT 1

Which ought to be instantaneous would become potentially slow.

(Actually I'm more interested in cases where instead of LIMIT 1 it's the
client that will stop as soon as it finds the record it's really looking for.)


-- 
greg



Re: slow IN() clause for many cases

From: Tom Lane
Greg Stark <gsstark@mit.edu> writes:
> I would fear queries like

>  SELECT * from tab WHERE x IN (1,2,3) LIMIT 1

> Which ought to be instantaneous would become potentially slow.

Why?  They certainly wouldn't be any slower than they are now.
        regards, tom lane


Re: slow IN() clause for many cases

From: Greg Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I would fear queries like
> 
> >  SELECT * from tab WHERE x IN (1,2,3) LIMIT 1
> 
> > Which ought to be instantaneous would become potentially slow.
> 
> Why?  They certainly wouldn't be any slower than they are now.

Well currently they get rewritten to use OR, which does a single index scan
that I assumed returned rows as soon as it finds them, like it does for
regular range-lookup index scans. Is that assumption wrong?

The bitmap scan has to traverse all the index entries for matching records
before it can fetch the first record. So it wouldn't be a fast-start plan. Not
as bad as performing a sort step or anything like that of course.

-- 
greg



Re: slow IN() clause for many cases

From: Tom Lane
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Why?  They certainly wouldn't be any slower than they are now.

> Well currently they get rewritten to use OR which does a single index scan

Not in 8.1 it doesn't; that code is long gone.

2005-04-24 21:30  tgl
Remove support for OR'd indexscans internal to a single IndexScan plan
node, as this behavior is now better done as a bitmapOR indexscan.  This
allows considerable simplification in nodeIndexscan.c itself as well as
several planner modules concerned with indexscan plan generation.  Also
we can improve the sharing of code between regular and bitmap indexscans,
since they are now working with nigh-identical Plan nodes.

> The bitmap scan has to traverse all the index entries for matching records
> before it can fetch the first record. So it wouldn't be a fast-start
> plan.

If the fraction of records matching the IN-list is so large as to make
that an issue, I'd think the planner would prefer a seqscan anyway.
Besides which, it's a bit silly to worry about whether a plan is
fast-start without taking into account the amount of planner time needed
to create it.

Another point here is that LIMIT without any ORDER BY isn't an amazingly
useful case.  Neither the old implementation of OR indexscans nor the
new can produce ordered output, which means you're not going to get
fast-start behavior anyway for real queries.
        regards, tom lane


Re: slow IN() clause for many cases

From: Greg Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> If the fraction of records matching the IN-list is so large as to make
> that an issue, I'd think the planner would prefer a seqscan anyway.
> Besides which, it's a bit silly to worry about whether a plan is
> fast-start without taking into account the amount of planner time needed
> to create it.

Well I'm thinking about the cases where there's a short IN-list but many
records match those clauses. Something like:

WHERE user_status IN ('active','pending')

Where there could be thousands of matching records.

> Another point here is that LIMIT without any ORDER BY isn't an amazingly
> useful case.  Neither the old implementation of OR indexscans nor the
> new can produce ordered output, which means you're not going to get
> fast-start behavior anyway for real queries.

That's true. That's why I was wondering more about cases where the client end
was going to read all the records until it found the record it's looking for
or found enough records for its purposes.

There are other uses of fast-start as well. If the records are going to be put
through a slow process it can be more efficient to pipeline them through while
the later records are still coming from the database.

But I guess this is all moot if the option isn't there, at least not without a
lot more programming.


The example above raises another idea though. Would it be possible for the
optimizer to recognize when a clause is so expansive that it would be faster
to read the complement than the actual clause as written?

That could be useful with bitmap operations if it meant less time reading the
index to build the bitmap while the eventual result after the bitmap operations
is still restrictive enough to justify an index scan.

eg a case where 90% of users are active, and 10% have pending set and there's
an index on each of these:

WHERE user_status = 'active' AND pending = true

If the optimizer tries to do a bitmap index scan now it has to read in a huge
index; it's probably better off just using the pending index alone. But if it
realizes that it can use the index to find the tuples that *aren't* active and
then take the complement of that bitmap, it could build a bitmap by reading in
only 10% of that index.


-- 
greg



Re: slow IN() clause for many cases

From: Martijn van Oosterhout
On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Another point here is that LIMIT without any ORDER BY isn't an amazingly
> > useful case.  Neither the old implementation of OR indexscans nor the
> > new can produce ordered output, which means you're not going to get
> > fast-start behavior anyway for real queries.
>
> That's true. That's why I was wondering more about cases where the client end
> was going to read all the records until it found the record it's looking for
> or found enough records for its purposes.

I would argue that the client should simply ask for what it wants
rather than filtering on the client end. Then PostgreSQL has the info
to optimise appropriately.

> There are other uses of fast-start as well. If the records are going to be put
> through a slow process it can be more efficient to pipeline them through while
> the later records are still coming from the database.

Well, IIRC if you create a cursor or use LIMIT, PostgreSQL will prefer
fast-start plans if that gets the requested result faster.

> The example above raises another idea though. Would it be possible for the
> optimizer to recognize when a clause is so expansive that it would be faster
> to read the complement than the actual clause as written?
>
> That could be useful with bitmap operations if it meant less time reading the
> index to build the bitmap but the eventual result after the bitmap operations
> is restrictive enough to justify an index scan.

The problem is, the bitmaps are lossy: when many matches fall within the
same area of the table, they are coalesced into a single block. Hence,
you cannot meaningfully ask for a complement.

However, if you knew from the beginning that you wanted the complement
you may be able to arrange something. OTOH, there's no way to tell from
the index whether it refers to *all* the tuples in a page, so there is no
way to exclude a page from consideration.

> eg a case where 90% of users are active, and 10% have pending set and there's
> an index on each of these:
>
> WHERE user_status = 'active' AND pending = true
>
> If the optimizer tries to do a bitmap index scan now it has to read in a huge
> index; it's probably better off just using the pending index alone. But if it
> realizes that it can use the index to find the tuples that *aren't* active and
> then take the complement of that bitmap, it could build a bitmap by reading in
> only 10% of that index.

As I pointed out above, I don't think you can take the complement of a
bitmap. Besides, I haven't looked at the costings for bitmap scans, but
ISTM that if the stats indicate a 90% coverage for user_status and the
index is not highly correlated, it should exclude that index from
consideration altogether. And whether it uses a normal index scan or a
bitmap scan for pending also depends on the stats. It's the comparison
between scanning the whole index and then the pages in sequence versus
just grabbing the tuples from the index.

Besides, the case you provide is the perfect candidate for a partial
index. Attributes that only apply to a small fraction of the table are
better off as predicates if you're going to be searching for them a
lot.
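
For example, with Greg's illustrative columns (the table name here is
hypothetical, a sketch of the suggestion):

CREATE INDEX users_pending_idx ON users (user_status) WHERE pending = true;

-- the query is then answered from an index covering only the ~10% of
-- rows that have pending set:
SELECT * FROM users WHERE user_status = 'active' AND pending = true;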

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: slow IN() clause for many cases

From: Tom Lane
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
>> That's true. That's why I was wondering more about cases where the client end
>> was going to read all the records until it found the record it's looking for
>> or found enough records for its purposes.

> I would argue that the client should simply ask for what it wants
> rather than filtering on the client end. Then PostgreSQL has the info
> to optimise appropriately.

Certainly, if you do not supply a LIMIT, there is no justification
at all for expecting the planner to prefer fast-start over
minimum-total-cost.
        regards, tom lane


Re: slow IN() clause for many cases

From: Tom Lane
Greg Stark <gsstark@mit.edu> writes:
> The example above raises another idea though. Would it be possible for the
> optimizer to recognize when a clause is so expansive that it would be faster
> to read the complement than the actual clause as written?

Being able to compute the complement, much less do so with an indexable
clause, is usually difficult in SQL (think about NULLs).  In any case
I think this is the situation where you are happy to fall back to a
seqscan.
        regards, tom lane


Re: slow IN() clause for many cases

From: Greg Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
> >> That's true. That's why I was wondering more about cases where the client end
> >> was going to read all the records until it found the record it's looking for
> >> or found enough records for its purposes.
> 
> > I would argue that the client should simply ask for what it wants
> > rather than filtering on the client end. Then PostgreSQL has the info
> > to optimise appropriately.
> 
> Certainly, if you do not supply a LIMIT, there is no justification
> at all for expecting the planner to prefer fast-start over
> minimum-total-cost.

Well, figuring out when to prefer one or the other is a hard problem.
Fundamentally the server simply does not have available the information
it needs to determine that.

(I think there really ought to be a bit in the protocol that the client sends
with the query to indicate which is needed. That would be cleaner than
Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.)

But having it as an option is a separate question. Even if the server needs
some cajoling to actually choose the right one it's always a good thing if
it's at least possible.

-- 
greg



Re: slow IN() clause for many cases

From: Martijn van Oosterhout
On Sun, Oct 16, 2005 at 05:09:57PM -0400, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Certainly, if you do not supply a LIMIT, there is no justification
> > at all for expecting the planner to prefer fast-start over
> > minimum-total-cost.
>
> Well figuring out when to prefer one or the other is a hard problem.
> Fundamentally the server simply does not have the information it needs to
> determine that available.

Umm, not really. Notice how EXPLAIN has two numbers: time to first row,
time to last row. If you add limit 1 it will favour plans that return
the first row quickly. If you don't it'll favour plans that have the
lowest total execution time, even if the first tuple takes longer.

> (I think there really ought to be a bit in the protocol that the client sends
> with the query to indicate which is needed. That would be cleaner than
> Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.)

It's called LIMIT and has been supported for a long time.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: slow IN() clause for many cases

From: Greg Stark
Martijn van Oosterhout <kleptog@svana.org> writes:

> > (I think there really ought to be a bit in the protocol that the client sends
> > with the query to indicate which is needed. That would be cleaner than
> > Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.)
> 
> It's called LIMIT and has been supported for a long time.

And if I *don't* want to limit the number of rows I retrieve?

-- 
greg



Re: slow IN() clause for many cases

From: Tom Lane
Greg Stark <gsstark@mit.edu> writes:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> It's called LIMIT and has been supported for a long time.

> And if I *don't* want to limit the number of rows I retrieve?

You still need to do something proactive to inform the system that you
want a fast-start plan.  What's more, you really really do not want to
give it an unlimited license to prefer zero-start-cost plans, else you
might still be waiting for the third row when hell freezes over.

In the current system structure, the only very reasonable way to set up
incremental client processing of a query result is to use a cursor and
FETCH a few rows at a time from it.  The planner already has a hack to
give some preference to fast-start plans when planning DECLARE CURSOR.
My recollection is that it optimizes on the assumption that 10% of the
rows will be retrieved, which gives some balance between start speed and
not blowing out the total runtime unreasonably.  We've already had some
discussion about exposing that 10% figure as a GUC variable, so that you
could put a finger on the scale depending on how much of the result you
expected to fetch.  But nobody got excited enough to make it happen...
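
A minimal sketch of that cursor pattern (the table name is illustrative):

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM tab WHERE x IN (1, 2, 3);
FETCH 10 FROM c;    -- process a batch, then FETCH again as needed
CLOSE c;
COMMIT;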
        regards, tom lane


Re: slow IN() clause for many cases

From: Simon Riggs
On Fri, 2005-10-14 at 19:09 -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking that IN should be
> > converted to a ScalarArrayOpExpr, ie
> 
> >     x = ANY (ARRAY[val1,val2,val3,val4,...])
> 
> Actually, there is one little thing in the way of doing this: it'll
> fail if any of the IN-list elements are NULL, because we have not got
> support for arrays with null elements.  So we'd have to fix that first.

You'd also need to consider how this affects partial indexes and
constraint exclusion. Not much of an issue, but an extra case to handle
in the predicate proving code.

= = =

Just had a case where using an IN list was quicker than using a join
because it allowed an index lookup to occur. There is also some clear
mileage in transforming this type of query to a more plannable form:

select * from bigtable where word IN (
select word from customer_word where customer = 6)

i.e. where the values for the IN clause are evaluated at run time,
rather than at plan time.

Best Regards, Simon Riggs



Re: slower merge join on sorted data chosen over

From: "Jim C. Nasby"
On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote:
> On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> > We are looking at doing much more with PostgreSQL over the
> > next two years, and it seems likely that this issue will come up
> > again where it is more of a problem.  It sounded like there was
> > some agreement on HOW this was to be fixed, yet I don't see
> > any mention of doing it in the TODO list.  
> 
> > Is there any sort of
> > estimate for how much programming work would be involved?
> 
> The main work here is actually performance testing, not programming. The
> cost model is built around an understanding of the timings and costs
> involved in the execution.
> 
> Once we have timings to cover a sufficiently large range of cases, we
> can derive the cost model. Once derived, we can program it. Discussing
> improvements to the cost model without test results is never likely to
> convince people. Everybody knows the cost models can be improved, the
> only question is in what cases? and in what ways?
> 
> So deriving the cost model needs lots of trustworthy test results that
> can be assessed and discussed, so we know how to improve things. [...and
> I don't mean 5 minutes with pg_bench...]
> 
> Detailed analysis such as that is time consuming and also needs to be
> done in a sufficiently reproducible manner that we can rely on it.
> 
> Your help would be greatly appreciated in that area. You and your team
> clearly have an eye for the fine detail of these issues.
> 
> ...IIRC there is a TODO item relating to that.
> 
> Perhaps we should put a more general call out on the TODO list towards
> detailed, complete, accurate and reproducible performance test results?

I touched on some of this in
http://archives.postgresql.org/pgsql-performance/2005-05/msg00336.php:

"In terms of a testing system, here's what I'm thinking of. For each cost
estimate, there will be a number of input variables we want to vary, and
then check to see how changes in them affect run time. Using index scan
as a simple example, 1st order variables will likely be index and table
size (especially in relation to cache size), and correlation. So, we
need some kind of a test harness that can vary these variables
(preferably one at a time), and run a test case after each change. It
would then need to store the timing info, possibly along with other
information (such as explain output). If I'm the one to write this it'll
end up in perl, since that's the only language I know that would be able
to accomplish this. DBT seems to be a reasonable test database to do
this testing with, especially since it would provide a ready means to
provide external load."

Of course, this work can be done by hand, but it's a very slow, tedious
process. It's also rather error-prone.

There's been some discussion on the osdl-dbt mailing lists about
providing a front-end that would allow for scheduling tests and storing
results in a database where you could data-mine more easily than you
currently can (currently everything's just stored as files on a disk
somewhere). ISTM that having that would make this kind of testing much
easier to do. But I've also been working with dbt quite a bit recently,
so it's my hammer that makes everything performance related look like a
nail...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: slower merge join on sorted data chosen over

From: Simon Riggs
On Mon, 2005-10-17 at 14:55 -0500, Jim C. Nasby wrote:
> On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote:
> > On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> > > We are looking at doing much more with PostgreSQL over the
> > > next two years, and it seems likely that this issue will come up
> > > again where it is more of a problem.  It sounded like there was
> > > some agreement on HOW this was to be fixed, yet I don't see
> > > any mention of doing it in the TODO list.  
> > 
> > > Is there any sort of
> > > estimate for how much programming work would be involved?
> > 
> > The main work here is actually performance testing, not programming. The
> > cost model is built around an understanding of the timings and costs
> > involved in the execution.
> > 
> > Once we have timings to cover a sufficiently large range of cases, we
> > can derive the cost model. Once derived, we can program it. Discussing
> > improvements to the cost model without test results is never likely to
> > convince people. Everybody knows the cost models can be improved, the
> > only question is in what cases? and in what ways?
> > 
> > So deriving the cost model needs lots of trustworthy test results that
> > can be assessed and discussed, so we know how to improve things. [...and
> > I don't mean 5 minutes with pg_bench...]

...

> DBT seems to be a reasonable test database 

I was discussing finding the cost equations to use within the optimizer
based upon a series of exploratory tests using varying data. That is
different to using the same database with varying parameters. Both sound
interesting, but it is the former that, IMHO, would be the more
important.

Best Regards, Simon Riggs



Re: slower merge join on sorted data chosen over

From
"Jim C. Nasby"
Date:
On Mon, Oct 17, 2005 at 09:30:24PM +0100, Simon Riggs wrote:
> On Mon, 2005-10-17 at 14:55 -0500, Jim C. Nasby wrote:
> > On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote:
> > > On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> > > > We are looking at doing much more with PostgreSQL over the
> > > > next two years, and it seems likely that this issue will come up
> > > > again where it is more of a problem.  It sounded like there was
> > > > some agreement on HOW this was to be fixed, yet I don't see
> > > > any mention of doing it in the TODO list.  
> > > 
> > > > Is there any sort of
> > > > estimate for how much programming work would be involved?
> > > 
> > > The main work here is actually performance testing, not programming. The
> > > cost model is built around an understanding of the timings and costs
> > > involved in the execution.
> > > 
> > > Once we have timings to cover a sufficiently large range of cases, we
> > > can derive the cost model. Once derived, we can program it. Discussing
> > > improvements to the cost model without test results is never likely to
> > > convince people. Everybody knows the cost models can be improved, the
> > > only question is in what cases? and in what ways?
> > > 
> > > So deriving the cost model needs lots of trustworthy test results that
> > > can be assessed and discussed, so we know how to improve things. [...and
> > > I don't mean 5 minutes with pg_bench...]
> 
> ...
> 
> > DBT seems to be a reasonable test database 
> 
> I was discussing finding the cost equations to use within the optimizer
> based upon a series of exploratory tests using varying data. That is
> different to using the same database with varying parameters. Both sound
> interesting, but it is the former that, IMHO, would be the more
> important.

True, although that doesn't necessarily mean you can't use the same data
generation. For the testing I was doing before, I was just varying
correlation using CLUSTER (or selecting from different fields with
different correlations).
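
For example (reusing the hypothetical probe table sketched earlier in the
thread), one step of a correlation sweep looks something like:

-- After clustering, probe.k is almost perfectly correlated with heap order.
CLUSTER probe_k ON probe;
ANALYZE probe;
-- Confirm what the planner now believes about correlation...
SELECT attname, correlation FROM pg_stats WHERE tablename = 'probe';
-- ...then re-time the same query at the new correlation.
EXPLAIN ANALYZE SELECT * FROM probe WHERE k BETWEEN 1000 AND 2000;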
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: slow IN() clause for many cases

From
Simon Riggs
Date:
On Mon, 2005-10-17 at 12:49 +0100, Simon Riggs wrote:
> On Fri, 2005-10-14 at 19:09 -0400, Tom Lane wrote:
> > I wrote:
> > > I'm thinking that IN should be
> > > converted to a ScalarArrayOpExpr, ie
> > 
> > >     x = ANY (ARRAY[val1,val2,val3,val4,...])
> > 
> > Actually, there is one little thing in the way of doing this: it'll
> > fail if any of the IN-list elements are NULL, because we have not got
> > support for arrays with null elements.  So we'd have to fix that first.
> 
> You'd also need to consider how this affects partial indexes and
> constraint exclusion. Not much of an issue, but an extra case to handle
> in the predicate proving code.
> 
> = = =
> 
> Just had a case where using an IN list was quicker than using a join
> because it allowed an index lookup to occur. There is also some clear
> mileage in transforming this type of query to a more plannable form:
> 
> select * from bigtable where word IN (
> select word from customer_word where customer = 6)
> 
> i.e. where the values for the IN clause are evaluated at run time,
> rather than at plan time.

Do you think we'll be able to generate a single ScalarArrayOpExpr from a
small subselect and pass it through as an indexable expression?

I'm guessing it's not lost on you that this would give a star-join-like
capability when joining multiple dimension tables to a large fact
table.

e.g.

Select * From Sales where month IN (
select month from time_dimension where FinYear = 2005 and Quarter = 3)
...

Teaching predtest.c about ScalarArrayOpExpr would then allow this to
work with constraint exclusion.

So that solves the how-to-join-AND-partition problem I've been
struggling with: don't join, transform. Very cool.
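
Concretely, the hoped-for transform would rewrite the query above as
something like:

-- Hypothetical result of the transform: the subselect's output becomes an
-- array of constants probed via a single ScalarArrayOpExpr.
Select * From Sales where month = ANY (ARRAY(
    select month from time_dimension where FinYear = 2005 and Quarter = 3));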

Best Regards, Simon Riggs





Re: slow IN() clause for many cases

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Do you think we'll be able to generate a single ScalarArrayOpExpr from a
> small subselect and pass it through as an indexable expression?

If you don't mind spelling it with the ARRAY(sub-select) syntax, which
I think is a Postgres-ism (though it's possible Joe got it from
SQL2003).

regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl));
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=3.09..37.86 rows=10 width=244)
   Recheck Cond: (unique1 = ANY ($0))
   InitPlan
     ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.04 rows=10 width=0)
         Index Cond: (unique1 = ANY ($0))
(6 rows)

Of course the planner is just guessing about how many rows this will
produce.

> e.g.
> Select * From Sales where month IN (
> select month from time_dimension where FinYear = 2005 and Quarter = 3)

> Teaching predtest.c about ScalarArrayOpExpr would then allow this to
> work with constraint exclusion.

Not hardly, unless you want to play fast and loose with semantics by
evaluating subselects at plan time instead of run time.  You could
persuade that to happen by wrapping the ARRAY(sub-select) into a
function mis-declared as IMMUTABLE, but I'd be pretty resistant to
having the planner assume any such thing by default.
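
Sketched concretely (the wrapper name is hypothetical, and month is assumed
to be an integer column), that dodge would look something like:

-- Deliberately mis-declared as IMMUTABLE so the planner folds the
-- sub-select to a constant at plan time; wrong answers will follow if
-- time_dimension changes after the plan is made.
CREATE FUNCTION q3_months() RETURNS integer[] AS $$
    SELECT ARRAY(SELECT month FROM time_dimension
                 WHERE FinYear = 2005 AND Quarter = 3)
$$ LANGUAGE sql IMMUTABLE;

SELECT * FROM Sales WHERE month = ANY (q3_months());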
        regards, tom lane



Re: slow IN() clause for many cases

From
Joe Conway
Date:
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> 
>>Do you think we'll be able to generate a single ScalarArrayOpExpr from a
>>small subselect and pass it through as an indexable expression?
> 
> If you don't mind spelling it with the ARRAY(sub-select) syntax, which
> I think is a Postgres-ism (though it's possible Joe got it from
> SQL2003).

It's in SQL 2003. See excerpt below.

Joe

6.36 <array value constructor>

Function
  Specify construction of an array.

Format
  <array value constructor> ::=
      <array value constructor by enumeration>
    | <array value constructor by query>

  <array value constructor by enumeration> ::=
    ARRAY <left bracket or trigraph>
      <array element list>
    <right bracket or trigraph>

  <array value constructor by query> ::=
    ARRAY <left paren>
      <query expression> [ <order by clause> ]
    <right paren>


Re: slow IN() clause for many cases

From
Simon Riggs
Date:
On Tue, 2005-11-29 at 17:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Do you think we'll be able to generate a single ScalarArrayOpExpr from a
> > small subselect and pass it through as an indexable expression?
> 
> If you don't mind spelling it with the ARRAY(sub-select) syntax, which
> I think is a Postgres-ism (though it's possible Joe got it from
> SQL2003).
> 
> regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl));
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Bitmap Heap Scan on tenk1  (cost=3.09..37.86 rows=10 width=244)
>    Recheck Cond: (unique1 = ANY ($0))
>    InitPlan
>      ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
>    ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.04 rows=10 width=0)
>          Index Cond: (unique1 = ANY ($0))
> (6 rows)
> 
> Of course the planner is just guessing about how many rows this will
> produce.

So we could teach the planner to transform:

IN (subselect) 

into

= ANY(array(subselect))

if we had the planner think the subselect had, say, < 1000 rows?
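
Spelled out against the regression tables from the example above, the
rewrite would be something like:

-- as written by the user:
SELECT * FROM tenk1 WHERE unique1 IN (SELECT f1 FROM int4_tbl);
-- what the planner might emit instead, when it estimates the subselect
-- to be small enough:
SELECT * FROM tenk1 WHERE unique1 = ANY (ARRAY(SELECT f1 FROM int4_tbl));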

> > e.g.
> > Select * From Sales where month IN (
> > select month from time_dimension where FinYear = 2005 and Quarter = 3)
> 
> > Teaching predtest.c about ScalarArrayOpExpr would then allow this to
> > work with constraint exclusion.
> 
> Not hardly, unless you want to play fast and loose with semantics by
> evaluating subselects at plan time instead of run time.  You could
> persuade that to happen by wrapping the ARRAY(sub-select) into a
> function mis-declared as IMMUTABLE, but I'd be pretty resistant to
> having the planner assume any such thing by default.

Man, that's a horrible thought. I must be dragging you down :-)

IMHO the only way to do joins that access partitions is to do the
constraint exclusion at run time, but I can see that's a longer
conversation than I can start right now.

Best Regards, Simon Riggs




Re: slow IN() clause for many cases

From
Martijn van Oosterhout
Date:
On Tue, Nov 29, 2005 at 10:53:38PM +0000, Simon Riggs wrote:
> On Tue, 2005-11-29 at 17:21 -0500, Tom Lane wrote:
> > regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl));

<snip>

> So we could teach the planner to transform:
>
> IN (subselect)
>
> into
>
> = ANY(array(subselect))
>
> if we had the planner think the subselect had, say, < 1000 rows?

Do these constructs have the same semantics w.r.t. NULL? Currently
arrays can't have nulls but that is changing. Also, won't they behave
differently in the case where the subselect returns duplicate values?

And finally, why can't:

> > > Select * From Sales where month IN (
> > > select month from time_dimension where FinYear = 2005 and Quarter = 3)

Be written as:

Select sales.* From Sales, time_dimension
where Sales.month = time_dimension.month
  and time_dimension.FinYear = 2005 and time_dimension.Quarter = 3;

As long as there are no NULLs it returns the same as the IN() version
and PostgreSQL can optimise it just fine.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: slow IN() clause for many cases

From
Greg Stark
Date:
Simon Riggs <simon@2ndquadrant.com> writes:

> IMHO the only way to do joins that access partitions is to do the
> constraint exclusion at run time, but I can see that's a longer
> conversation than I can start right now.

My experience in Oracle was that you can see three different types of
partition plans:

1) Plans that the planner can statically deduce will access specific
   partitions. This is basically what Postgres has now.

2) Plans that the planner can statically deduce will only access a limited
   number of partitions, but which partitions will be determined at
   run-time. This is sort of like how Postgres handles index scans in
   subqueries where the value being looked up is a parameter.

3) Plans that expect to span all the partitions, because it can't find any
   constraint in the query from which it can deduce which partitions to
   look at.

Option 2 happens both for joins and for simple lookups where the value
being looked up is a parameter in a prepared query. My personal inclination
is that prepared queries are the way of the future, so I think this is an
important case.

-- 
greg



Re: slow IN() clause for many cases

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Do these constructs have the same semantics w.r.t. NULL?

I think so, though it'd be good to read the spec closely.

> Currently arrays can't have nulls

That's so last week ;-)
        regards, tom lane
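
A quick sanity check suggests the two spellings agree on the awkward cases
(this assumes a server new enough to allow NULL array elements, which per
the above is only just true):

SELECT 1 IN (2, NULL);            -- null, not false
SELECT 1 = ANY (ARRAY[2, NULL]);  -- null, not false
SELECT 1 IN (1, NULL);            -- true
SELECT 1 = ANY (ARRAY[1, NULL]);  -- true
SELECT 1 IN (1, 1, 2);            -- duplicates cannot change the result
SELECT 1 = ANY (ARRAY[1, 1, 2]);  -- true here as well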


Re: slow IN() clause for many cases

From
Simon Riggs
Date:
On Wed, 2005-11-30 at 07:18 +0100, Martijn van Oosterhout wrote:

> And finally, why can't:
> 
> > > > Select * From Sales where month IN (
> > > > select month from time_dimension where FinYear = 2005 and Quarter = 3)
> 
> Be written as: 
> 
> Select sales.* From Sales, time_dimension
> where Sales.month = time_dimension.month
>   and time_dimension.FinYear = 2005 and time_dimension.Quarter = 3;
> 
> As long as there are no NULLs it returns the same as the IN() version
> and PostgreSQL can optimise it just fine.

It can, of course, but there must be value in that optimization.

If you consider how IN () would be transformed into
=ANY(ARRAY(subselect)) you'll see that the subselect values would be
treated as constants that could result in a bitmap index lookup.

Transforming IN () into straight joins would not take the same approach
when more than one join (i.e. 3 or more tables) was requested.

Best Regards, Simon Riggs



Re: slow IN() clause for many cases

From
Martijn van Oosterhout
Date:
On Fri, Dec 02, 2005 at 08:18:44AM +0000, Simon Riggs wrote:
> It can, of course, but there must be value in that optimization.
>
> If you consider how IN () would be transformed into
> =ANY(ARRAY(subselect)) you'll see that the subselect values would be
> treated as constants that could result in a bitmap index lookup.
>
> Transforming IN () into straight joins would not take the same approach
> when more than one join (i.e. 3 or more tables) was requested.

Are you sure? If you have one table joined to many others, that is the
single most obvious case for bitmap indexes. And joins are converted to
bitmap index scans all the time, so I'm unsure why this case would be
any different. If the results are the same, the optimiser should
optimise both the same, no?

Anyway, maybe I'm just old-fashioned in thinking that joins are by far
the easiest to optimise because they are closest to relational algebra.
IN() can also be converted to a join, except for the NULL effect.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.