Thread: Re: slower merge join on sorted data chosen over
Thanks, Tom. I spent a few hours trying different searches in the archives, and found three very interesting threads on the topic. All were from 2003. Should I keep digging for more recent threads, or would these probably represent the current state of the issue?

These left me somewhat concerned, since people were reporting queries which ran orders of magnitude slower using merge joins than when they forced them to nested loop index scans. In our first brush with the issue it caused our query to run only two to three times slower than it would if the planner could more accurately cost the nested index scans, and it was in an area with minimal impact to the organization, so this one is relatively benign. We are looking at doing much more with PostgreSQL over the next two years, and it seems likely that this issue will come up again where it is more of a problem.

It sounded like there was some agreement on HOW this was to be fixed, yet I don't see any mention of doing it in the TODO list. Is there any sort of estimate for how much programming work would be involved? Any chance of an incremental improvement, such as only counting leaf access in a nested select? (While that's not perfect, it seems likely to be closer, and therefore beneficial overall.)

Thanks,

-Kevin

>>> Tom Lane <tgl@sss.pgh.pa.us> 10/06/05 9:28 PM >>>
There's a known issue that the planner tends to overestimate the cost of inner-index-scan nestloops, because it doesn't allow for the strong caching effects associated with repeated scans of the same index (in particular, that all the upper index levels tend to stay in cache). See the archives for past inconclusive discussions about how to fix this.
On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> We are looking at doing much more with PostgreSQL over the
> next two years, and it seems likely that this issue will come up
> again where it is more of a problem. It sounded like there was
> some agreement on HOW this was to be fixed, yet I don't see
> any mention of doing it in the TODO list.
>
> Is there any sort of
> estimate for how much programming work would be involved?

The main work here is actually performance testing, not programming. The cost model is built around an understanding of the timings and costs involved in the execution.

Once we have timings to cover a sufficiently large range of cases, we can derive the cost model. Once derived, we can program it. Discussing improvements to the cost model without test results is never likely to convince people. Everybody knows the cost models can be improved, the only question is in what cases? and in what ways?

So deriving the cost model needs lots of trustworthy test results that can be assessed and discussed, so we know how to improve things. [...and I don't mean 5 minutes with pg_bench...]

Detailed analysis such as that is time consuming and also needs to be done in a sufficiently reproducible manner that we can rely on it.

Your help would be greatly appreciated in that area. You and your team clearly have an eye for the fine detail of these issues.

...IIRC there is a TODO item relating to that. Perhaps we should put a more general call out on the TODO list towards detailed, complete, accurate and reproducible performance test results?

Best Regards, Simon Riggs
When an IN clause becomes large enough (>20-30 items), it is much better to use the "join" way of processing. I mean,

"SELECT * FROM table WHERE field IN (1,2...30)"

will be slower than

"SELECT * FROM table JOIN (SRF returning 1...30) USING(field)"

I'm not quite sure where the difference starts, but sometimes I need to make selects with 30 or more items by primary key and I get a significant speedup by this transform:

CREATE OR REPLACE FUNCTION array2table(arr int[]) RETURNS SETOF int

select * from persons join (select array2table as id from
array2table(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])) a using(id);

I'm sure the backend could do that in a much faster and more elegant fashion. Bitmap-OR is nice, but for many IN arguments it is still much slower than a join.
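[The body of array2table was not included in the message. A minimal sketch of what such a set-returning function might look like on 8.0/8.1, assuming a plain SQL implementation, could be:]

CREATE OR REPLACE FUNCTION array2table(int[]) RETURNS SETOF int AS '
    -- emit one row per array element; array_lower/array_upper handle arbitrary bounds
    SELECT ($1)[i]
      FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS s(i)
' LANGUAGE sql STRICT IMMUTABLE;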
Please post an explain analyze on your query with a 20-30 item IN clause so that we can see what plan is being generated.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
> Please post an explain analyze on your query with a 20-30 item IN clause
> so that we can see what plan is being generated.

It is bitmap-OR on multiple index(PK) lookups.
Ilia,

> It is bitmap-OR on multiple index(PK) lookups.

Describing it doesn't help. We need an *actual* EXPLAIN ANALYZE.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
>> It is bitmap-OR on multiple index(PK) lookups.
> Describing it doesn't help. We need an *actual* EXPLAIN ANALYZE.

Sure, why not..

6ms for:

 Bitmap Heap Scan on objects_hier  (cost=60.29..179.57 rows=80 width=600) (actual time=0.835..1.115 rows=138 loops=1)
   Recheck Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5) OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11) OR (id = 12) OR (id = 13) OR (id = 14) OR (id = 15) OR (id = 16) OR (id = 17) OR (id = 18) OR (id = 19) OR (id = 20) OR (id = 21) OR (id = 22) OR (id = 23) OR (id = 24) OR (id = 25) OR (id = 26) OR (id = 27) OR (id = 28) OR (id = 29) OR (id = 30))
   ->  BitmapOr  (cost=60.29..60.29 rows=82 width=0) (actual time=0.553..0.553 rows=0 loops=1)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.036..0.036 rows=6 loops=1)
               Index Cond: (id = 1)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.044..0.044 rows=6 loops=1)
               Index Cond: (id = 2)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 3)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 4)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 5)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 6)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 7)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 8)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 9)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.02 rows=6 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 10)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 11)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 12)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 13)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 14)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 15)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 16)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 17)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 18)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 19)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 20)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 21)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 22)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=6 loops=1)
               Index Cond: (id = 23)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 24)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 25)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 26)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 27)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 28)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 29)
         ->  Bitmap Index Scan on lkjk  (cost=0.00..2.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id = 30)

4ms for:

explain analyze select * from objects_hier join (select array2table as id from
array2table(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])) a using(id);

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=62.33..576.80 rows=1117 width=600) (actual time=0.542..2.898 rows=138 loops=1)
   Merge Cond: ("outer".id = "inner".array2table)
   ->  Index Scan using lkjk on objects_hier  (cost=0.00..493.52 rows=1680 width=600) (actual time=0.025..1.248 rows=139 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.510..0.799 rows=145 loops=1)
         Sort Key: array2table.array2table
         ->  Function Scan on array2table  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.081..0.141 rows=30 loops=1)
"Ilia Kantor" <ilia@obnovlenie.ru> writes: > Bitmap Heap Scan on objects_hier (cost=60.29..179.57 rows=80 width=600) > (actual time=0.835..1.115 rows=138 loops=1) vs > Merge Join (cost=62.33..576.80 rows=1117 width=600) (actual > time=0.542..2.898 rows=138 loops=1) Hmm, sure looks from here like the bitmap plan is faster. regards, tom lane
true dat :)
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On 2005-10-11, "Ilia Kantor" <ilia@obnovlenie.ru> wrote: > When in clause becomes large enough (>20-30 cases), > It is much better to use "join" way of processing.. or even a different way of writing the IN clause. This one is one I've used after considerable research: select * from tablewhere field in (select (some_array_of_N_items)[i] from generate_series(1,N) as s(i)); This generally plans as a nestloop, with a HashAggregate of the function scan (of generate_series) on the outer path, and index lookups on the inner path. It's worth noting that EXPLAIN ANALYZE doesn't tell the whole story when comparing queries of this kind. The IN (1,2,...30) form is much slower to plan, and usually can't usefully be used in prepared form (you'd need to prepare it separately for every different number of items); in contrast, the array version can be prepared once and reused. As the number of items in the IN clause increases, the planning time grows rather radically. For example with 1000 items (here stashed in a psql convenience variable for brevity), using 8.1beta2: test=# prepare tstplan1 as select * from test where id in (:v); Time: 4881.702 ms compare: test=# prepare tstplan2 as select * from test where id in (select (ARRAY[:v])[i] from generate_series(1,1000) s(i)); Time: 10.889 ms (on my machine the break-even point for these two is less than 20 items, or even less if the array is passed in as a literal or a single parameter rather than constructed with ARRAY[].) The actual execution time of these two is very close, with the second being about 10% slower on my system (31ms vs 34ms, based on \timing values from psql and averaged over several goes). However, the timings returned from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the "total runtime" line. So not only is the planning time different, but also the instrumentation overhead of EXPLAIN ANALYZE is wildly different between the two forms. What this means is that unless you're going to prepare in advance every possible number of parameters to IN that your app is ever going to use, the only way to get useful performance for IN queries with more than a handful of literal values is to use an array method, in spite of the fact that the bitmap-OR execution plan is actually at least as fast. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > As the number of items in the IN clause increases, the planning time grows > rather radically. I was looking at this yesterday. There is some O(N^2) behavior in create_bitmap_subplan, stemming from trying to remove duplicated qual conditions. That strikes me as a relatively useless activity, and I was thinking the easiest answer might be to just delete that "optimization". > The actual execution time of these two is very close, with the second > being about 10% slower on my system (31ms vs 34ms, based on \timing values > from psql and averaged over several goes). However, the timings returned > from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the > "total runtime" line. So not only is the planning time different, but also > the instrumentation overhead of EXPLAIN ANALYZE is wildly different between > the two forms. Yeah, this isn't all that surprising because of the larger number of plan nodes involved in the bitmap plan. regards, tom lane
1) merge join is faster for me than hash join (6 vs 3-4ms):

explain analyze select * from objects_hier where id in (select
(array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])[i]
from generate_series(1,30) s(i));

 Hash Join  (cost=17.50..162.93 rows=223 width=600) (actual time=1.148..32.654 rows=138 loops=1)
   Hash Cond: ("outer".id = ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30}'::integer[])["inner".i])
   ->  Seq Scan on objects_hier  (cost=0.00..134.80 rows=1680 width=600) (actual time=0.235..12.271 rows=1680 loops=1)
   ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual time=0.893..0.893 rows=30 loops=1)
         ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual time=0.161..0.830 rows=30 loops=1)
               ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.037..0.097 rows=30 loops=1)

2) Is it possible to make the planner plan that better, e.g. use Bitmap-OR for <20 numbers and some other method for >20 numbers (the actual number depends on costs)?

Maybe that should be added to TODO?
IMHO, we should first look at whether the O(N^2) behavior is needed.
--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
On 2005-10-12, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: >> As the number of items in the IN clause increases, the planning time grows >> rather radically. > > I was looking at this yesterday. There is some O(N^2) behavior in > create_bitmap_subplan, stemming from trying to remove duplicated qual > conditions. That strikes me as a relatively useless activity, and I was > thinking the easiest answer might be to just delete that "optimization". Well, the behaviour is definitely bad. For comparison, on 8.0 the IN (list of 1000 items) version plans only about four times slower than IN (select array...), and again the execution times are comparable. But for the case of a real-world app that uses IN () a lot (dspam), I've had reports that the performance improvements from switching to the array method are even more substantial than my raw timings suggest. >> The actual execution time of these two is very close, with the second >> being about 10% slower on my system (31ms vs 34ms, based on \timing values >> from psql and averaged over several goes). However, the timings returned >> from EXPLAIN ANALYZE are much more skewed: 42ms vs 66ms as reported in the >> "total runtime" line. So not only is the planning time different, but also >> the instrumentation overhead of EXPLAIN ANALYZE is wildly different between >> the two forms. > > Yeah, this isn't all that surprising because of the larger number of > plan nodes involved in the bitmap plan. I think you have this backwards - the instrumentation overhead is _lower_ for the bitmap-OR plan than for the nestloop. This was also true in 8.0; the instrumentation overhead of an index OR scan plan is lower than the nestloop plan, though not by nearly as much. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On 2005-10-12, Andrew - Supernews <andrew+nonews@supernews.com> wrote: > On 2005-10-12, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Andrew - Supernews <andrew+nonews@supernews.com> writes: >>> As the number of items in the IN clause increases, the planning time grows >>> rather radically. >> >> I was looking at this yesterday. There is some O(N^2) behavior in >> create_bitmap_subplan, stemming from trying to remove duplicated qual >> conditions. That strikes me as a relatively useless activity, and I was >> thinking the easiest answer might be to just delete that "optimization". > > Well, the behaviour is definitely bad. > > For comparison, on 8.0 the IN (list of 1000 items) version plans only about > four times slower than IN (select array...), and again the execution times > are comparable. But for the case of a real-world app that uses IN () a lot > (dspam), I've had reports that the performance improvements from switching > to the array method are even more substantial than my raw timings suggest. Trying this again, on the same data, with the latest planner change shows that the plan time for IN (list of 1000 items) is now much better, 47ms rather than nearly five seconds, but that's still substantially longer than the execution time in the case where the data is in cache. I'm not sure why, but the execution time for that query has improved too, it's now about 20% faster for the bitmap-OR than for the array method. With everything in cache, selecting 1000 random rows from a 200k row table, I get: for IN (list): planning time 47ms, execution time 27ms array/nestloop: planning time 8ms, execution time 34ms -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> With everything in cache, selecting 1000 random rows from a 200k row table,
> I get:
>
> for IN (list): planning time 47ms, execution time 27ms
> array/nestloop: planning time 8ms, execution time 34ms

The reason for the slow planning time is that IN (list) is currently converted (by the grammar) into an OR structure:

    (x = val1) OR (x = val2) OR (x = val3) OR (x = val4) OR ...

which means that the planner has to consider each subclause separately: discover that it can be matched to an index, build the indexscan plan, etc. Even just looking up all the "=" operators during parsing takes a fair amount of time :-( The large number of plan nodes also imply relatively slow executor startup.

It strikes me that now that we have the bitmap indexscan mechanism, it wouldn't be hard to do better. I'm thinking that IN should be converted to a ScalarArrayOpExpr, ie

    x = ANY (ARRAY[val1,val2,val3,val4,...])

and then we could treat both OpExpr and ScalarArrayOpExpr as matching an index when the LHS is the index key and the operator is in the index's opclass. This wouldn't fit comfortably into a plain indexscan, but a bitmap indexscan has already got the mechanism for ORing together the results of several primitive indexscans (one per array element). You just do the scans and insert all the results into your output bitmap.

This approach would reduce the planner and executor-startup costs of even a long IN list to be pretty comparable to a single index key comparison. The actual runtime of the plan wouldn't change much, I think, but it's the overhead that's hurting us here.

This also solves the problem mentioned earlier in the thread that you can't make a prepared plan for varying lengths of IN-lists: instead of using IN, you do something like

    x = ANY($1::int[])

and then ship your lookup keys as a single array parameter instead of multiple scalars.

The main objection I can see is that this technique requires the elements of the IN list to be converted to a common datatype (ie, the element datatype of the ARRAY construct). Historically our code has made no such assumption. However, I believe that the SQL standard expects that conversion to happen: "x IN (list)" is defined as "x = ANY (VALUES list)" and <table value constructor> expects all its rows to be converted to a common rowtype. So we could point to the standard if anyone complains that the behavior changed.

Obviously this is not happening for 8.1, but it seems very do-able for 8.2. Comments?

            regards, tom lane
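[For illustration, the prepared-statement usage Tom describes would look something like the following; the table and column names are placeholders, and under the proposal the statement would be planned once no matter how many keys are shipped at execute time:]

PREPARE fetch_rows(int[]) AS
    SELECT * FROM tab WHERE x = ANY ($1);

-- the whole key list travels as one array parameter
EXECUTE fetch_rows('{1,2,3,4,5,6,7,8,9,10}');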
I wrote: > I'm thinking that IN should be > converted to a ScalarArrayOpExpr, ie > x = ANY (ARRAY[val1,val2,val3,val4,...]) Actually, there is one little thing in the way of doing this: it'll fail if any of the IN-list elements are NULL, because we have not got support for arrays with null elements. So we'd have to fix that first. regards, tom lane
On Fri, Oct 14, 2005 at 07:09:17PM -0400, Tom Lane wrote: > I wrote: > > I'm thinking that IN should be converted to a ScalarArrayOpExpr, > > ie > > > x = ANY (ARRAY[val1,val2,val3,val4,...]) > > Actually, there is one little thing in the way of doing this: it'll > fail if any of the IN-list elements are NULL, because we have not > got support for arrays with null elements. So we'd have to fix > that first. +1 on fixing that. :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
On Fri, Oct 14, 2005 at 07:09:17PM -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking that IN should be
> > converted to a ScalarArrayOpExpr, ie
> > x = ANY (ARRAY[val1,val2,val3,val4,...])
>
> Actually, there is one little thing in the way of doing this: it'll
> fail if any of the IN-list elements are NULL, because we have not got
> support for arrays with null elements. So we'd have to fix that first.

Hey Tom. Obviously your area of expertise, so please tell me where I'm wrong - But doesn't "x IN (NULL)" already fail to ever match, similar to "x = NULL"? (Assuming that compatibility switch isn't enabled) So, I'd hope people weren't using such an expression? :-)

Or is that not good enough? What does NULL::text[] turn into? An empty string? Is that the danger? It might match against an empty string?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
mark@mark.mielke.cc writes: > But doesn't "x IN (NULL)" already fail to ever match, similar to "x > = NULL"? (Assuming that compatibility switch isn't enabled) The case I'm worried about is "x IN (1,2,NULL)". This *can* match. Also, it's supposed to return NULL (not FALSE) if it doesn't match. So simply discarding NULLs before we build the array wouldn't give the right answer. Considering that allowing null array entries has been a big wishlist item for many years, I don't think that anyone will object to fixing that in 8.2, anyway ... regards, tom lane
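[A quick illustration, not from the thread, of the IN-with-NULL semantics Tom is describing:]

SELECT 2 IN (1, 2, NULL);       -- true: a match is found
SELECT 3 IN (1, 2, NULL);       -- null, not false: the NULL element might have been a match
SELECT 3 NOT IN (1, 2, NULL);   -- also null, which is why simply discarding NULLs would be wrong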
Tom Lane <tgl@sss.pgh.pa.us> writes: > It strikes me that now that we have the bitmap indexscan mechanism, > it wouldn't be hard to do better. I'm thinking that IN should be > converted to a ScalarArrayOpExpr, ie > > x = ANY (ARRAY[val1,val2,val3,val4,...]) > > and then we could treat both OpExpr and ScalarArrayOpExpr as matching > an index when the LHS is the index key and the operator is in the > index's opclass. This wouldn't fit comfortably into a plain indexscan, > but a bitmap indexscan has already got the mechanism for ORing together > the results of several primitive indexscans (one per array element). > You just do the scans and insert all the results into your output > bitmap. Would this mean it would be impossible to get a fast-start plan for an IN expression though? I would fear queries like SELECT * from tab WHERE x IN (1,2,3) LIMIT 1 Which ought to be instantaneous would become potentially slow. (Actually I'm more interested in cases where instead of LIMIT 1 it's the client that will stop as soon as it finds the record it's really looking for.) -- greg
Greg Stark <gsstark@mit.edu> writes: > I would fear queries like > SELECT * from tab WHERE x IN (1,2,3) LIMIT 1 > Which ought to be instantaneous would become potentially slow. Why? They certainly wouldn't be any slower than they are now. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > I would fear queries like > > > SELECT * from tab WHERE x IN (1,2,3) LIMIT 1 > > > Which ought to be instantaneous would become potentially slow. > > Why? They certainly wouldn't be any slower than they are now. Well currently they get rewritten to use OR which does a single index scan which I assumed returned rows as soon as it finds them like it does for regular range lookup index scans. Is that assumption wrong? The bitmap scan has to traverse all the index entries for matching records before it can fetch the first record. So it wouldn't be a fast-start plan. Not as bad as performing a sort step or anything like that of course. -- greg
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Why? They certainly wouldn't be any slower than they are now.

> Well currently they get rewritten to use OR which does a single index scan

Not in 8.1 it doesn't; that code is long gone.

2005-04-24 21:30  tgl

    Remove support for OR'd indexscans internal to a single IndexScan
    plan node, as this behavior is now better done as a bitmap OR
    indexscan.  This allows considerable simplification in
    nodeIndexscan.c itself as well as several planner modules concerned
    with indexscan plan generation.  Also we can improve the sharing of
    code between regular and bitmap indexscans, since they are now
    working with nigh-identical Plan nodes.

> The bitmap scan has to traverse all the index entries for matching records
> before it can fetch the first record. So it wouldn't be a fast-start
> plan.

If the fraction of records matching the IN-list is so large as to make that an issue, I'd think the planner would prefer a seqscan anyway. Besides which, it's a bit silly to worry about whether a plan is fast-start without taking into account the amount of planner time needed to create it.

Another point here is that LIMIT without any ORDER BY isn't an amazingly useful case. Neither the old implementation of OR indexscans nor the new can produce ordered output, which means you're not going to get fast-start behavior anyway for real queries.

            regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> If the fraction of records matching the IN-list is so large as to make
> that an issue, I'd think the planner would prefer a seqscan anyway.
> Besides which, it's a bit silly to worry about whether a plan is
> fast-start without taking into account the amount of planner time needed
> to create it.

Well I'm thinking about the cases where there's a short IN-list but many records match those clauses. Something like:

  WHERE user_status IN ('active','pending')

where there could be thousands of matching records.

> Another point here is that LIMIT without any ORDER BY isn't an amazingly
> useful case. Neither the old implementation of OR indexscans nor the
> new can produce ordered output, which means you're not going to get
> fast-start behavior anyway for real queries.

That's true. That's why I was wondering more about cases where the client end was going to read all the records until it found the record it's looking for or found enough records for its purposes.

There are other uses of fast-start as well. If the records are going to be put through a slow process it can be more efficient to pipeline them through while the later records are still coming from the database.

But I guess this is all moot if the option isn't there, at least not without a lot more programming.

The example above raises another idea though. Would it be possible for the optimizer to recognize when a clause is so expansive that it would be faster to read the complement than the actual clause as written?

That could be useful with bitmap operations if it meant less time reading the index to build the bitmap but the eventual result after the bitmap operations is restrictive enough to justify an index scan.

eg a case where 90% of users are active, and 10% have pending set and there's an index on each of these:

  WHERE user_status = 'active' AND pending = true

If the optimizer tries to do a bitmap index scan now it has to read in a huge index; it's probably better off just using the pending index alone. But if it realizes that it can use the index to find the tuples that *aren't* active and then take the complement of that bitmap it could build a bitmap reading in only 10% of that index.

--
greg
On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Another point here is that LIMIT without any ORDER BY isn't an amazingly
> > useful case. Neither the old implementation of OR indexscans nor the
> > new can produce ordered output, which means you're not going to get
> > fast-start behavior anyway for real queries.
>
> That's true. That's why I was wondering more about cases where the client end
> was going to read all the records until it found the record it's looking for
> or found enough records for its purposes.

I would argue that the client should simply ask for what it wants rather than filtering on the client end. Then PostgreSQL has the info to optimise appropriately.

> There are other uses of fast-start as well. If the records are going to be put
> through a slow process it can be more efficient to pipeline them through while
> the later records are still coming from the database.

Well, IIRC if you create a cursor or use LIMIT, PostgreSQL will prefer fast-start plans if that gets the requested result faster.

> The example above raises another idea though. Would it be possible for the
> optimizer to recognize when a clause is so expansive that it would be faster
> to read the complement than the actual clause as written?
>
> That could be useful with bitmap operations if it meant less time reading the
> index to build the bitmap but the eventual result after the bitmap operations
> is restrictive enough to justify an index scan.

The problem is, the bitmaps are lossy. If so many matches indicate about the same area in the table, they are coalesced into a single block. Hence, you cannot meaningfully ask for a complement. However, if you knew from the beginning that you wanted the complement you may be able to arrange something. OTOH, there's no way to tell from the index if it's referred to *all* the tuples in a page, so there is no way to exclude a page from consideration.

> eg a case where 90% of users are active, and 10% have pending set and there's
> an index on each of these:
>
> WHERE user_status = 'active' AND pending = true
>
> If the optimizer tries to do a bitmap index scan now it has to read in a huge
> index; it's probably better off just using the pending index alone. But if it
> realizes that it can use the index to find the tuples that *aren't* active and
> then take the complement of that bitmap it could build a bitmap reading in
> only 10% of that index.

As I pointed out above, I don't think you can take the complement of a bitmap. Besides, I haven't looked at the costings for bitmap scans, but ISTM that if the stats indicate a 90% coverage for user_status and the index is not highly correlated, it should exclude that index from consideration altogether. And whether it uses a normal index scan or a bitmap scan for pending also depends on the stats. It's the comparison between scanning the whole index and then the pages in sequence versus just grabbing the tuples from the index.

Besides, the case you provide is the perfect candidate for a partial index. Attributes that only apply to a small fraction of the table are better off as predicates if you're going to be searching for them a lot.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
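[A sketch of the partial-index suggestion, using the user_status/pending columns from Greg's example; the table and index names are invented:]

-- index only the ~10% of rows that have pending set
CREATE INDEX users_pending_idx ON users (user_status) WHERE pending = true;

-- the query from the example can then be answered from the small partial index
SELECT * FROM users WHERE user_status = 'active' AND pending = true;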
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote: >> That's true. That's why I was wondering more about cases where the client end >> was going to read all the records until it found the record it's looking for >> or found enough records for its purposes. > I would argue that the client should simply ask for what it wants > rather than filtering on the client end. Then PostgreSQL has the info > to optimise appropriately. Certainly, if you do not supply a LIMIT, there is no justification at all for expecting the planner to prefer fast-start over minimum-total-cost. regards, tom lane
Greg Stark <gsstark@mit.edu> writes: > The example above raises another idea though. Would it be possible for the > optimizer to recognize when a clause is so expansive that it would be faster > to read the complement than the actual clause as written? Being able to compute the complement, much less do so with an indexable clause, is usually difficult in SQL (think about NULLs). In any case I think this is the situation where you are happy to fall back to a seqscan. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Sun, Oct 16, 2005 at 12:03:33PM -0400, Greg Stark wrote:
> >> That's true. That's why I was wondering more about cases where the client end
> >> was going to read all the records until it found the record it's looking for
> >> or found enough records for its purposes.
>
> > I would argue that the client should simply ask for what it wants
> > rather than filtering on the client end. Then PostgreSQL has the info
> > to optimise appropriately.
>
> Certainly, if you do not supply a LIMIT, there is no justification
> at all for expecting the planner to prefer fast-start over
> minimum-total-cost.

Well, figuring out when to prefer one or the other is a hard problem. Fundamentally the server simply does not have available the information it needs to determine that.

(I think there really ought to be a bit in the protocol that the client sends with the query to indicate which is needed. That would be cleaner than Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.)

But having it as an option is a separate question. Even if the server needs some cajoling to actually choose the right one, it's always a good thing if it's at least possible.

--
greg
On Sun, Oct 16, 2005 at 05:09:57PM -0400, Greg Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > Certainly, if you do not supply a LIMIT, there is no justification > > at all for expecting the planner to prefer fast-start over > > minimum-total-cost. > > Well figuring out when to prefer one or the other is a hard problem. > Fundamentally the server simply does not have the information it needs to > determine that available. Umm, not really. Notice how EXPLAIN has two numbers: time to first row, time to last row. If you add limit 1 it will favour plans that return the first row quickly. If you don't it'll favour plans that have the lowest total execution time, even if the first tuple takes longer. > (I think there really ought to be a bit in the protocol that the client sends > with the query to indicate which is needed. That would be cleaner than > Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.) It's called LIMIT and has been supported for a long time. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes:
> > (I think there really ought to be a bit in the protocol that the client sends
> > with the query to indicate which is needed. That would be cleaner than
> > Oracle's /*+ FIRST_ROW */ and /*+ ALL_ROWS */ hints.)
>
> It's called LIMIT and has been supported for a long time.

And if I *don't* want to limit the number of rows I retrieve?

--
greg
Greg Stark <gsstark@mit.edu> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: >> It's called LIMIT and has been supported for a long time. > And if I *don't* want to limit the number of rows I retriev? You still need to do something proactive to inform the system that you want a fast-start plan. What's more, you really really do not want to give it an unlimited license to prefer zero-start-cost plans, else you might still be waiting for the third row when hell freezes over. In the current system structure, the only very reasonable way to set up incremental client processing of a query result is to use a cursor and FETCH a few rows at a time from it. The planner already has a hack to give some preference to fast-start plans when planning DECLARE CURSOR. My recollection is that it optimizes on the assumption that 10% of the rows will be retrieved, which gives some balance between start speed and not blowing out the total runtime unreasonably. We've already had some discussion about exposing that 10% figure as a GUC variable, so that you could put a finger on the scale depending on how much of the result you expected to fetch. But nobody got excited enough to make it happen... regards, tom lane
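[For reference, the cursor-based pattern Tom refers to would look something like this; the table and cursor names are placeholders:]

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM tab WHERE x IN (1, 2, 3);
FETCH 50 FROM c;    -- process these rows, then FETCH again only if more are needed
CLOSE c;
COMMIT;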
On Fri, 2005-10-14 at 19:09 -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking that IN should be
> > converted to a ScalarArrayOpExpr, ie
> > x = ANY (ARRAY[val1,val2,val3,val4,...])
>
> Actually, there is one little thing in the way of doing this: it'll
> fail if any of the IN-list elements are NULL, because we have not got
> support for arrays with null elements. So we'd have to fix that first.

You'd also need to consider how this affects partial indexes and constraint exclusion. Not much of an issue, but an extra case to handle in the predicate proving code.

= = =

Just had a case where using an IN list was quicker than using a join because it allowed an index lookup to occur. There is also some clear mileage in transforming this type of query to a more plannable form:

select * from bigtable where word IN (
    select word from customer_word where customer = 6)

i.e. where the values for the IN clause are evaluated at run time, rather than at plan time.

Best Regards, Simon Riggs
On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote:
> On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote:
> > We are looking at doing much more with PostgreSQL over the
> > next two years, and it seems likely that this issue will come up
> > again where it is more of a problem. It sounded like there was
> > some agreement on HOW this was to be fixed, yet I don't see
> > any mention of doing it in the TODO list.
>
> > Is there any sort of
> > estimate for how much programming work would be involved?
>
> The main work here is actually performance testing, not programming. The
> cost model is built around an understanding of the timings and costs
> involved in the execution.
>
> Once we have timings to cover a sufficiently large range of cases, we
> can derive the cost model. Once derived, we can program it. Discussing
> improvements to the cost model without test results is never likely to
> convince people. Everybody knows the cost models can be improved, the
> only question is in what cases? and in what ways?
>
> So deriving the cost model needs lots of trustworthy test results that
> can be assessed and discussed, so we know how to improve things. [...and
> I don't mean 5 minutes with pg_bench...]
>
> Detailed analysis such as that is time consuming and also needs to be
> done in a sufficiently reproducible manner that we can rely on it.
>
> Your help would be greatly appreciated in that area. You and your team
> clearly have an eye for the fine detail of these issues.
>
> ...IIRC there is a TODO item relating to that.
>
> Perhaps we should put a more general call out on the TODO list towards
> detailed, complete, accurate and reproducible performance test results?

I touched on some of this in
http://archives.postgresql.org/pgsql-performance/2005-05/msg00336.php:

"In terms of a testing system, here's what I'm thinking of. For each cost estimate, there will be a number of input variables we want to vary, and then check to see how changes in them affect run time. Using index scan as a simple example, 1st order variables will likely be index and table size (especially in relation to cache size), and correlation. So, we need some kind of a test harness that can vary these variables (preferably one at a time), and run a test case after each change. It would then need to store the timing info, possibly along with other information (such as explain output). If I'm the one to write this it'll end up in perl, since that's the only language I know that would be able to accomplish this. DBT seems to be a reasonable test database to do this testing with, especially since it would provide a ready means to provide external load."

Of course, this work can be done by hand, but it's a very slow, tedious process. It's also rather error-prone.

There's been some discussion on the osdl-dbt mailing lists about providing a front-end that would allow for scheduling tests and storing results in a database where you could data-mine more easily than you currently can (currently everything's just stored as files on a disk somewhere). ISTM that having that would make this kind of testing much easier to do. But I've also been working with dbt quite a bit recently, so it's my hammer that makes everything performance related look like a nail...

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software        http://pervasive.com        work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, 2005-10-17 at 14:55 -0500, Jim C. Nasby wrote: > On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote: > > On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote: > > > We are looking at doing much more with PostgreSQL over the > > > next two years, and it seems likely that this issue will come up > > > again where it is more of a problem. It sounded like there was > > > some agreement on HOW this was to be fixed, yet I don't see > > > any mention of doing it in the TODO list. > > > > > Is there any sort of > > > estimate for how much programming work would be involved? > > > > The main work here is actually performance testing, not programming. The > > cost model is built around an understanding of the timings and costs > > involved in the execution. > > > > Once we have timings to cover a sufficiently large range of cases, we > > can derive the cost model. Once derived, we can program it. Discussing > > improvements to the cost model without test results is never likely to > > convince people. Everybody knows the cost models can be improved, the > > only question is in what cases? and in what ways? > > > > So deriving the cost model needs lots of trustworthy test results that > > can be assessed and discussed, so we know how to improve things. [...and > > I don't mean 5 minutes with pg_bench...] ... > DBT seems to be a reasonable test database I was discussing finding the cost equations to use within the optimizer based upon a series of exploratory tests using varying data. That is different to using the same database with varying parameters. Both sound interesting, but it is the former that, IMHO, would be the more important. Best Regards, Simon Riggs
On Mon, Oct 17, 2005 at 09:30:24PM +0100, Simon Riggs wrote: > On Mon, 2005-10-17 at 14:55 -0500, Jim C. Nasby wrote: > > On Tue, Oct 11, 2005 at 10:58:58AM +0100, Simon Riggs wrote: > > > On Mon, 2005-10-10 at 15:14 -0500, Kevin Grittner wrote: > > > > We are looking at doing much more with PostgreSQL over the > > > > next two years, and it seems likely that this issue will come up > > > > again where it is more of a problem. It sounded like there was > > > > some agreement on HOW this was to be fixed, yet I don't see > > > > any mention of doing it in the TODO list. > > > > > > > Is there any sort of > > > > estimate for how much programming work would be involved? > > > > > > The main work here is actually performance testing, not programming. The > > > cost model is built around an understanding of the timings and costs > > > involved in the execution. > > > > > > Once we have timings to cover a sufficiently large range of cases, we > > > can derive the cost model. Once derived, we can program it. Discussing > > > improvements to the cost model without test results is never likely to > > > convince people. Everybody knows the cost models can be improved, the > > > only question is in what cases? and in what ways? > > > > > > So deriving the cost model needs lots of trustworthy test results that > > > can be assessed and discussed, so we know how to improve things. [...and > > > I don't mean 5 minutes with pg_bench...] > > ... > > > DBT seems to be a reasonable test database > > I was discussing finding the cost equations to use within the optimizer > based upon a series of exploratory tests using varying data. That is > different to using the same database with varying parameters. Both sound > interesting, but it is the former that, IMHO, would be the more > important. True, although that doesn't necessarily mean you can't use the same data generation. For the testing I was doing before I was just varying correlation using cluster (or selecting from different fields with different correlations). -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
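[As a concrete illustration of varying correlation with CLUSTER, as Jim describes; the table and index names here are invented:]

-- force high physical correlation on the indexed column (8.1 syntax: CLUSTER index ON table)
CLUSTER test_id_idx ON test;
ANALYZE test;   -- refresh pg_stats.correlation so the planner sees the new ordering

-- compare the same range query before and after re-loading the table in random order
EXPLAIN ANALYZE SELECT * FROM test WHERE id BETWEEN 1000 AND 2000;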
On Mon, 2005-10-17 at 12:49 +0100, Simon Riggs wrote:
> On Fri, 2005-10-14 at 19:09 -0400, Tom Lane wrote:
> > I wrote:
> > > I'm thinking that IN should be
> > > converted to a ScalarArrayOpExpr, ie
> > > x = ANY (ARRAY[val1,val2,val3,val4,...])
> >
> > Actually, there is one little thing in the way of doing this: it'll
> > fail if any of the IN-list elements are NULL, because we have not got
> > support for arrays with null elements. So we'd have to fix that first.
>
> You'd also need to consider how this affects partial indexes and
> constraint exclusion. Not much of an issue, but an extra case to handle
> in the predicate proving code.
>
> = = =
>
> Just had a case where using an IN list was quicker than using a join
> because it allowed an index lookup to occur. There is also some clear
> mileage in transforming this type of query to a more plannable form:
>
> select * from bigtable where word IN (
>     select word from customer_word where customer = 6)
>
> i.e. where the values for the IN clause are evaluated at run time,
> rather than at plan time.

Do you think we'll be able to generate a single ScalarArrayOpExpr from a small subselect and pass it through as an indexable expression?

I'm guessing it's not lost on you that this would give a Star-join-like capability when joining multiple dimension tables to a large Fact table, e.g.

Select * From Sales where month IN (
    select month from time_dimension where FinYear = 2005 and Quarter = 3)

...

Having taught predtest.c about ScalarArrayOpExpr means that would allow this to work with constraint exclusion. So that solves the how-to-join-AND-partition problem I've been struggling with: don't join, transform. Very cool.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > Do you think we'll be able to generate a single ScalarArrayOpExpr from a > small subselect and pass it through as an indexable expression? If you don't mind spelling it with the ARRAY(sub-select) syntax, which I think is a Postgres-ism (though it's possible Joe got it from SQL2003).

regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl));
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=3.09..37.86 rows=10 width=244)
   Recheck Cond: (unique1 = ANY ($0))
   InitPlan
     ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.04 rows=10 width=0)
         Index Cond: (unique1 = ANY ($0))
(6 rows)

Of course the planner is just guessing about how many rows this will produce. > e.g. > Select * From Sales where month IN ( > select month from time_dimension where FinYear = 2005 and Quarter = 3) > Having taught predtest.c about ScalarArrayOpExpr means that would allow > this to work with constraint exclusion. Not hardly, unless you want to play fast and loose with semantics by evaluating subselects at plan time instead of run time. You could persuade that to happen by wrapping the ARRAY(sub-select) into a function mis-declared as IMMUTABLE, but I'd be pretty resistant to having the planner assume any such thing by default. regards, tom lane
Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > >>Do you think we'll be able to generate a single ScalarArrayOpExpr from a >>small subselect and pass it through as an indexable expression? > > If you don't mind spelling it with the ARRAY(sub-select) syntax, which > I think is a Postgres-ism (though it's possible Joe got it from > SQL2003). It's in SQL 2003. See excerpt below. Joe

6.36 <array value constructor>

Function
Specify construction of an array.

Format
<array value constructor> ::=
    <array value constructor by enumeration>
  | <array value constructor by query>

<array value constructor by enumeration> ::=
    ARRAY <left bracket or trigraph> <array element list> <right bracket or trigraph>

<array value constructor by query> ::=
    ARRAY <left paren> <query expression> [ <order by clause> ] <right paren>
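For concreteness, the two constructor forms from that excerpt look like this in PostgreSQL (reusing the regression-test table int4_tbl from the EXPLAIN example above):

-- <array value constructor by enumeration>
SELECT ARRAY[1, 2, 3, 4];

-- <array value constructor by query>, the form used in the EXPLAIN example
SELECT ARRAY(SELECT f1 FROM int4_tbl ORDER BY f1);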
On Tue, 2005-11-29 at 17:21 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Do you think we'll be able to generate a single ScalarArrayOpExpr from a > > small subselect and pass it through as an indexable expression? > > If you don't mind spelling it with the ARRAY(sub-select) syntax, which > I think is a Postgres-ism (though it's possible Joe got it from > SQL2003). > > regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl)); > QUERY PLAN > ----------------------------------------------------------------------------- > Bitmap Heap Scan on tenk1 (cost=3.09..37.86 rows=10 width=244) > Recheck Cond: (unique1 = ANY ($0)) > InitPlan > -> Seq Scan on int4_tbl (cost=0.00..1.05 rows=5 width=4) > -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..2.04 rows=10 width=0) > Index Cond: (unique1 = ANY ($0)) > (6 rows) > > Of course the planner is just guessing about how many rows this will > produce. So we could teach the planner to transform: IN (subselect) into = ANY(array(subselect)) if we had the planner think the subselect had say < 1000 rows? > > e.g. > > Select * From Sales where month IN ( > > select month from time_dimension where FinYear = 2005 and Quarter = 3) > > > Having taught predtest.c about ScalarArrayOpExpr means that would allow > > this to work with constraint exclusion. > > Not hardly, unless you want to play fast and loose with semantics by > evaluating subselects at plan time instead of run time. You could > persuade that to happen by wrapping the ARRAY(sub-select) into a > function mis-declared as IMMUTABLE, but I'd be pretty resistant to > having the planner assume any such thing by default. Man, that's a horrible thought. I must be dragging you down :-) IMHO the only way to do joins that access partitions is to do the constraint exclusion at run time, but I can see that's a longer conversation than I can start right now. Best Regards, Simon Riggs
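Until the planner performs that transform itself, it can be applied by hand; a sketch using the Sales example from the thread (the tables are of course only illustrative):

-- Original spelling: the subselect is handled as a sub-plan.
SELECT * FROM Sales
WHERE month IN (SELECT month FROM time_dimension
                WHERE FinYear = 2005 AND Quarter = 3);

-- Hand-rewritten spelling: the subselect becomes an InitPlan producing an
-- array, and "month = ANY (...)" is a ScalarArrayOpExpr that can be used as
-- an index (or bitmap index) condition, as in Tom's EXPLAIN output above.
SELECT * FROM Sales
WHERE month = ANY (ARRAY(SELECT month FROM time_dimension
                         WHERE FinYear = 2005 AND Quarter = 3));

The NULL and duplicate-handling caveats raised next apply to any such rewrite.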
On Tue, Nov 29, 2005 at 10:53:38PM +0000, Simon Riggs wrote: > On Tue, 2005-11-29 at 17:21 -0500, Tom Lane wrote: > > regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl)); <snip> > So we could teach the planner to transform: > > IN (subselect) > > into > > = ANY(array(subselect)) > > if we had the planner think the subselect had say < 1000 rows? Do these constructs have the same semantics w.r.t. NULL? Currently arrays can't have nulls but that is changing. Also, won't they behave differently in the case where the subselect returns duplicate values? And finally, why can't: > > > Select * From Sales where month IN ( > > > select month from time_dimension where FinYear = 2005 and Quarter = 3) Be written as: Select sales.* From Sales, time_dimension where Sales.month = time_dimension.month and time_dimension.FinYear = 2005 and time_dimension.Quarter = 3; As long as there are no NULLs it returns the same as the IN() version and PostgreSQL can optimise it just fine. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
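To make the semantic questions concrete, here is a small self-contained sketch of where the spellings can diverge; the tables are invented, and the = ANY(ARRAY(...)) form assumes arrays that may contain NULL elements, which as noted is only just arriving:

CREATE TEMP TABLE outer_t (id int);
INSERT INTO outer_t VALUES (1);
INSERT INTO outer_t VALUES (2);

CREATE TEMP TABLE inner_t (id int);
INSERT INTO inner_t VALUES (1);
INSERT INTO inner_t VALUES (1);
INSERT INTO inner_t VALUES (NULL);

-- IN (subselect): returns the id = 1 row once; duplicates in the subselect
-- do not multiply the output.
SELECT * FROM outer_t WHERE id IN (SELECT id FROM inner_t);

-- = ANY(ARRAY(subselect)): same visible result here, provided the array can
-- carry the NULL element at all.
SELECT * FROM outer_t WHERE id = ANY (ARRAY(SELECT id FROM inner_t));

-- Plain join: the duplicate in inner_t multiplies the output, so id = 1
-- comes back twice, which is one way the join rewrite is not equivalent.
SELECT outer_t.* FROM outer_t JOIN inner_t USING (id);

-- Where the NULL really bites is NOT IN: one NULL in the subselect makes the
-- predicate non-true for every row, so this returns no rows.
SELECT * FROM outer_t WHERE id NOT IN (SELECT id FROM inner_t);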
Simon Riggs <simon@2ndquadrant.com> writes: > IMHO the only way to do joins that access partitions is to do the > constraint exclusion at run time, but I can see that's a longer > conversation than I can start right now. My experience in Oracle was that you can see three different types of partition plans:

1) Plans that the planner can statically deduce will access specific partitions. This is basically what Postgres has now.

2) Plans that the planner can statically deduce will only access a limited number of partitions, but which partitions will be determined at run-time. This is sort of like how Postgres handles index scans in subqueries where the value being looked up is a parameter.

3) Plans that expect to span all the partitions because it can't find any constraint in the query from which it can deduce which partitions to look at.

Option 2 happens both for joins, and also for simple lookups where the value being looked up is a parameter in a prepared query. My personal inclination is that prepared queries are the way of the future, so I think this is an important case. -- greg
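Greg's second category can be seen with a prepared statement against the partitioned sales sketch from earlier (names still hypothetical): because the partition key arrives as a parameter, plan-time constraint exclusion has nothing to compare against the CHECK constraints, which is exactly the run-time gap being described.

-- The generic plan must cover every partition, because $1 is unknown when
-- the plan is built; choosing the partition at run time is what's missing.
PREPARE sales_by_month (int) AS
    SELECT * FROM sales WHERE month = $1;

EXPLAIN EXECUTE sales_by_month(7);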
Martijn van Oosterhout <kleptog@svana.org> writes: > Do these constructs have the same semantics w.r.t. NULL? I think so, though it'd be good to read the spec closely. > Currently arrays can't have nulls That's so last week ;-) regards, tom lane
On Wed, 2005-11-30 at 07:18 +0100, Martijn van Oosterhout wrote: > And finally, why can't: > > > > > Select * From Sales where month IN ( > > > > select month from time_dimension where FinYear = 2005 and Quarter = 3) > > Be written as: > > Select sales.* From Sales, time_dimension > where Sales.month = time_dimension.month and time_dimension.FinYear = 2005 and time_dimension.Quarter = 3; > > As long as there are no NULLs it returns the same as the IN() version > and PostgreSQL can optimise it just fine. It can, of course, but there must be value in that optimization. If you consider how IN () would be transformed into =ANY(ARRAY(subselect)) you'll see that the subselect values would be treated as constants that could result in a bitmap index lookup. Transforming IN () into straight joins would not take the same approach when more than one join (i.e. 3 or more tables) was requested. Best Regards, Simon Riggs
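A sketch of the multi-dimension case being contrasted here, with invented dimension tables: once each subselect is folded into an array of constants, each dimension's condition can drive its own bitmap index scan on the fact table, and the bitmaps can then be combined, which a chain of ordinary joins would not be planned as.

SELECT *
FROM Sales
WHERE month = ANY (ARRAY(SELECT month FROM time_dimension
                         WHERE FinYear = 2005 AND Quarter = 3))
  AND store = ANY (ARRAY(SELECT store_id FROM store_dimension
                         WHERE region = 'West'));

-- One plausible plan shape (not guaranteed): a BitmapAnd of two bitmap index
-- scans on Sales, one per dimension condition, followed by a single heap scan.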
On Fri, Dec 02, 2005 at 08:18:44AM +0000, Simon Riggs wrote: > It can, of course, but there must be value in that optimization. > > If you consider how IN () would be transformed into > =ANY(ARRAY(subselect)) you'll see that the subselect values would be > treated as constants that could result in a bitmap index lookup. > > Transforming IN () into straight joins would not take the same approach > when more than one join (i.e. 3 or more tables) was requested. Are you sure? If you have one table joined to many others, that is the single most obvious case for bitmap indexes. And joins are converted to bitmap index scans all the time so I'm unsure why this case would be any different. If the results are the same, the optimiser should optimise both the same, no? Anyway, maybe I'm just old fashioned in thinking that joins are by far the easiest to optimise because they are closest to relational algebra. IN() can also be converted to a join, except for the NULL effect. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.