Thread: strange query behavior
Hi,

I have a query that uses an IN clause, and it seems to perform great when there are more than two values in it, but if there is only one it is really slow. Also, if I change the query to use an = instead of IN in the single-value case, it is still slow. Possibly I need to reindex this particular index?

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
On mið, 2006-12-13 at 13:42 -0500, Tim Jones wrote:
> I have a query that uses an IN clause and it seems to perform great
> when there are more than two values in it but if there is only one it
> is really slow. Also if I change the query to use an = instead of IN
> in the case of only one value it is still slow. Possibly I need to
> reindex this particular index?

can you provide us with an EXPLAIN ANALYZE for these 2 cases?

what version pg is this?

does this happen only for a particular single value, or for any values?

I assume you have ANALYZEd the table in question.

gnari
"Tim Jones" <TJones@optio.com> writes:
> I have a query that uses an IN clause and it seems to perform great
> when there are more than two values in it but if there is only one it
> is really slow. Also if I change the query to use an = instead of IN
> in the case of only one value it is still slow.

Please provide EXPLAIN ANALYZE output for both cases.

> Possibly I need to reindex this particular index?

More likely you need to ANALYZE the table so that the planner has
up-to-date stats ...

			regards, tom lane
The tables for these queries are vacuumed and analyzed regularly. I just did an analyze to be sure, and here are the results:

explain analyze
select *
  from battery
  join observationresults
    on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment
    on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment
    on battery.batteryidentifier = batterycomment.batteryidentifier
 where battery.batteryidentifier in (1177470, 1177469);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=5.03..12553.00 rows=13 width=248) (actual time=0.362..1.345 rows=30 loops=1)
   ->  Nested Loop Left Join  (cost=4.01..12424.13 rows=13 width=208) (actual time=0.307..0.927 rows=30 loops=1)
         ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
               ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
                     Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                     ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177470)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177469)
               ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
                     Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..227.73 rows=327 width=63) (actual time=0.013..0.013 rows=0 loops=30)
               Index Cond: ("outer".observationidentifier = observationcomment.observationidentifier)
   ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.84 rows=6 width=40) (actual time=0.007..0.007 rows=0 loops=30)
         Recheck Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
         ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
 Total runtime: 1.585 ms

explain analyze
select *
  from battery
  join observationresults
    on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment
    on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment
    on battery.batteryidentifier = batterycomment.batteryidentifier
 where battery.batteryidentifier = 1177470;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Hash Left Join  (cost=4733.62..269304.43 rows=1348 width=248) (actual time=19275.506..19275.568 rows=9 loops=1)
   Hash Cond: ("outer".batteryidentifier = "inner".batteryidentifier)
   ->  Merge Right Join  (cost=4723.75..269287.81 rows=1348 width=208) (actual time=19275.432..19275.473 rows=9 loops=1)
         Merge Cond: ("outer".observationidentifier = "inner".observationidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..245841.14 rows=7484253 width=63) (actual time=0.094..13403.300 rows=4361601 loops=1)
         ->  Sort  (cost=4723.75..4727.12 rows=1348 width=145) (actual time=0.270..0.278 rows=9 loops=1)
               Sort Key: observationresults.observationidentifier
               ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                     ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
                           Index Cond: (1177470 = batteryidentifier)
   ->  Hash  (cost=9.85..9.85 rows=6 width=40) (actual time=0.039..0.039 rows=0 loops=1)
         ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.85 rows=6 width=40) (actual time=0.037..0.037 rows=0 loops=1)
               Recheck Cond: (batteryidentifier = 1177470)
               ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.032..0.032 rows=0 loops=1)
                     Index Cond: (batteryidentifier = 1177470)
 Total runtime: 19275.838 ms
(18 rows)

Tim Jones
"Tim Jones" <TJones@optio.com> writes:
> The tables for these queries are vacuumed and analyzed regularly. I
> just did an analyze to be sure and here are the results
> ...

There's something pretty wacko about the choice of plan in the slow
case --- I don't see why it'd not have used the same plan structure as
for the IN case. It's coming up with a cost a lot higher than for the
other, so it certainly knows this isn't a great plan ...

Which PG version is this exactly? Are you running with any nondefault
planner parameters?

			regards, tom lane
Version 8.1.

Here are the planner cost constants; I believe we changed effective_cache_size and random_page_cost. BTW this is AIX 5.2.

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 10000	# typically 8KB each
effective_cache_size = 400000
random_page_cost = 3.8		# units are one sequential page fetch cost
#cpu_tuple_cost = 0.01		# (same)
#cpu_index_tuple_cost = 0.001	# (same)
#cpu_operator_cost = 0.0025	# (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5		# range 1-10
#geqo_pool_size = 0		# selects default based on effort
#geqo_generations = 0		# selects default based on effort
#geqo_selection_bias = 2.0	# range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10	# range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8	# 1 disables collapsing of explicit JOINs

Thanks

Tim Jones
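The two non-default settings above can be double-checked from a psql session; a minimal sketch (SHOW is a standard command, and the expected values are the ones in the config excerpt):

```sql
-- Confirm the two non-default planner cost settings from the config above.
SHOW effective_cache_size;  -- expected: 400000 (in 8KB pages)
SHOW random_page_cost;      -- expected: 3.8
```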
"Tim Jones" <TJones@optio.com> writes:
>> Which PG version is this exactly? Are you running with any nondefault
>> planner parameters?

> Version 8.1

8.1.what?

> Here are the planner cost constants I believe we changed
> effective_cache_size and random_page_cost

Those look reasonably harmless.

My best bet at the moment is that you've got a pretty early 8.1.x
release and are hitting one of the planner bugs that we fixed earlier
this year. Not enough info to say for sure though.

			regards, tom lane
Looks like 8.1.2

Tim Jones
That's what I did, and got 8.1.2 ... do you want the gcc version etc.? gcc 3.3.2, powerpc, aix 5.2

Tim Jones

-----Original Message-----
From: Matthew O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, December 13, 2006 5:51 PM
To: Tim Jones
Subject: Re: [PERFORM] strange query behavior

From psql perform:

	select version();

and send us that output.
"Tim Jones" <TJones@optio.com> writes:
> [ explain results ]

As best I can see, the problem is with the estimate of the size of the
inner join: for two keys we have

   ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
         ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
               Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
               ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                     ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177469)
         ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
               Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)

but for one key we have

   ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
         ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
               Index Cond: (batteryidentifier = 1177470)
         ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
               Index Cond: (1177470 = batteryidentifier)

The large rowcount estimate makes it back off to a non-nestloop plan
for the outer joins, and in this situation that's a loser. I'm actually
not sure why they're not both too high --- with the rowcount estimate
of 1362 for the inner scan in the first example, you'd expect about
twice that for the join result.
But the immediate problem is that in the case where it knows exactly what batteryidentifier is being probed for, it's still off by more than a factor of 100 on the rowcount estimate for observationresults. How many rows in observationresults, and may we see the pg_stats entry for observationresults.batteryidentifier? It's likely that the answer for you will be "raise the statistics target for observationresults and re-ANALYZE", but I'd like to gather more info about what's going wrong first. regards, tom lane
18,273,008 rows in observationresults

pg_stats:

select * from pg_stats
 where tablename='observationresults' and attname='batteryidentifier';

schemaname        | public
tablename         | observationresults
attname           | batteryidentifier
null_frac         | 0
avg_width         | 4
n_distinct        | 12942
most_common_vals  | {437255,1588952,120420,293685,356599,504069,589910,693683,834990,854693}
most_common_freqs | {0.00133333,0.00133333,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001}
histogram_bounds  | {3561,271263,556929,839038,1125682,1406538,1697589,1970463,2226781,2539241,2810844}
correlation       | 0.31779

thanks

Tim Jones
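The ~1348-row scan estimate lines up with these stats, assuming the planner's fallback selectivity for a value outside the most-common-values list is roughly 1/n_distinct:

```sql
-- Back-of-envelope: reported row count divided by the (too-small)
-- n_distinct from pg_stats. The planner works from its own reltuples
-- estimate, so it lands in the same ballpark (1348) rather than
-- exactly here.
SELECT 18273008 / 12942.0 AS approx_rows_per_batteryidentifier;  -- about 1412
```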
"Tim Jones" <TJones@optio.com> writes:
> 18,273,008 rows in observationresults
> [ and n_distinct = 12942 ]

OK, so the estimated rowcounts are coming from those two numbers. It's
notoriously hard to get a decent n_distinct estimate from a small
sample :-(, and I would imagine the number of batteryidentifiers is
really a lot more than 12942?

What you need to do is increase the statistics target for
observationresults.batteryidentifier (see ALTER TABLE) and re-ANALYZE,
and see if you get a saner n_distinct in pg_stats. I'd try 100 and then
1000 as target. Or you could just increase the global default target
(see postgresql.conf) but that might be overkill.

It's still a bit odd that the case with two batteryidentifiers was
estimated fairly accurately when the other wasn't; I'll go look into
that. But in any case you need better stats if you want good plans.

			regards, tom lane
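Tom's suggestion, sketched as SQL (the statistics targets 100 and 1000 are the values he proposes; table and column names are from the thread):

```sql
-- Raise the per-column statistics target, re-gather stats, then check
-- whether n_distinct comes out saner.
ALTER TABLE observationresults
    ALTER COLUMN batteryidentifier SET STATISTICS 100;  -- then try 1000
ANALYZE observationresults;

SELECT n_distinct
  FROM pg_stats
 WHERE tablename = 'observationresults'
   AND attname = 'batteryidentifier';
```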
I wrote:
> It's still a bit odd that the case with two batteryidentifiers was
> estimated fairly accurately when the other wasn't; I'll go look into
> that.

For the sake of the archives: I looked into this, and it seems there's
not anything going wrong other than the bogusly small n_distinct for
observationresults.

I'm assuming that battery.batteryidentifier is unique (stop me here,
Tim, if not). That means that (a) there won't be any most-common-values
statistics list for it, and (b) the n_distinct estimate should be
pretty accurate.

What happens in the multiple-batteryidentifier case is that eqjoinsel()
doesn't have two MCV lists to work with, and so it bases its
selectivity estimate on the larger n_distinct, which in this case is
the accurate value from the battery table. So we come out with a decent
estimate even though the other n_distinct is all wrong.

What happens in the single-batteryidentifier case is that transitive
equality deduction removes the battery.batteryidentifier =
observationresults.batteryidentifier join condition altogether,
replacing it with two restriction conditions batteryidentifier =
1177470. So eqjoinsel() is never called, the join size estimate is just
the product of the indexscan size estimates, and the scan estimate for
observationresults is too high because its n_distinct is too small.

So the bottom line is that eqjoinsel() is actually a bit more robust
than one might have thought ...

			regards, tom lane
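The transitive-equality rewrite Tom describes can be sketched as follows (illustrative only; the second query shows what the planner effectively estimates, not something you would run):

```sql
-- As written (single-value case): one restriction plus a join qual.
SELECT *
  FROM battery
  JOIN observationresults
    ON battery.batteryidentifier = observationresults.batteryidentifier
 WHERE battery.batteryidentifier = 1177470;

-- After transitive equality deduction the join qual is gone and each
-- relation carries its own restriction, so eqjoinsel() is never called;
-- the join size estimate is just the product of the two scan estimates.
SELECT *
  FROM battery, observationresults
 WHERE battery.batteryidentifier = 1177470
   AND observationresults.batteryidentifier = 1177470;
```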
ok thanks Tom, I will alter the statistics and re-analyze the table.

Tim Jones