Thread: strange query behavior

From: "Tim Jones"
Hi,

  I have a query that uses an IN clause, and it seems to perform great
when there are two or more values in it, but if there is only one it is
really slow. Also, if I change the query to use an = instead of IN in
the case of only one value, it is still slow. Possibly I need to reindex
this particular index?

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


Re: strange query behavior

From: Ragnar
On Wed, 2006-12-13 at 13:42 -0500, Tim Jones wrote:

>   I have a query that uses an IN clause, and it seems to perform great
> when there are two or more values in it, but if there is only one it is
> really slow. Also, if I change the query to use an = instead of IN in
> the case of only one value, it is still slow. Possibly I need to
> reindex this particular index?

can you provide us with an EXPLAIN ANALYZE for these 2 cases?

what version of pg is this?

does this happen only for a particular single value, or for any values?

I assume you have ANALYZEd the table in question.
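
for example, something along these lines (table and column names here are
just placeholders, substitute your real ones):

    ANALYZE yourtable;
    EXPLAIN ANALYZE SELECT * FROM yourtable WHERE id IN (1, 2);
    EXPLAIN ANALYZE SELECT * FROM yourtable WHERE id = 1;
    SELECT version();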

gnari



Re: strange query behavior

From: Tom Lane
"Tim Jones" <TJones@optio.com> writes:
>   I have a query that uses an IN clause, and it seems to perform great
> when there are two or more values in it, but if there is only one it is
> really slow. Also, if I change the query to use an = instead of IN in
> the case of only one value, it is still slow.

Please provide EXPLAIN ANALYZE output for both cases.

> Possibly I need to reindex this
> particular index?

More likely you need to ANALYZE the table so that the planner has
up-to-date stats ...

            regards, tom lane

Re: strange query behavior

From: "Tim Jones"
The tables for these queries are vacuumed and analyzed regularly. I
just did an analyze to be sure, and here are the results:


explain analyze
select *
  from battery
  join observationresults
    on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment
    on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment
    on battery.batteryidentifier = batterycomment.batteryidentifier
 where battery.batteryidentifier in (1177470, 1177469);

                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=5.03..12553.00 rows=13 width=248) (actual time=0.362..1.345 rows=30 loops=1)
   ->  Nested Loop Left Join  (cost=4.01..12424.13 rows=13 width=208) (actual time=0.307..0.927 rows=30 loops=1)
         ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
               ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
                     Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                     ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177470)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177469)
               ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
                     Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..227.73 rows=327 width=63) (actual time=0.013..0.013 rows=0 loops=30)
               Index Cond: ("outer".observationidentifier = observationcomment.observationidentifier)
   ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.84 rows=6 width=40) (actual time=0.007..0.007 rows=0 loops=30)
         Recheck Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
         ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
 Total runtime: 1.585 ms


explain analyze
select *
  from battery
  join observationresults
    on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment
    on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment
    on battery.batteryidentifier = batterycomment.batteryidentifier
 where battery.batteryidentifier = 1177470;

                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=4733.62..269304.43 rows=1348 width=248) (actual time=19275.506..19275.568 rows=9 loops=1)
   Hash Cond: ("outer".batteryidentifier = "inner".batteryidentifier)
   ->  Merge Right Join  (cost=4723.75..269287.81 rows=1348 width=208) (actual time=19275.432..19275.473 rows=9 loops=1)
         Merge Cond: ("outer".observationidentifier = "inner".observationidentifier)
         ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..245841.14 rows=7484253 width=63) (actual time=0.094..13403.300 rows=4361601 loops=1)
         ->  Sort  (cost=4723.75..4727.12 rows=1348 width=145) (actual time=0.270..0.278 rows=9 loops=1)
               Sort Key: observationresults.observationidentifier
               ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                     ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
                           Index Cond: (1177470 = batteryidentifier)
   ->  Hash  (cost=9.85..9.85 rows=6 width=40) (actual time=0.039..0.039 rows=0 loops=1)
         ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.85 rows=6 width=40) (actual time=0.037..0.037 rows=0 loops=1)
               Recheck Cond: (batteryidentifier = 1177470)
               ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.032..0.032 rows=0 loops=1)
                     Index Cond: (batteryidentifier = 1177470)
 Total runtime: 19275.838 ms
(18 rows)


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

Re: strange query behavior

From: Tom Lane
"Tim Jones" <TJones@optio.com> writes:
> The tables for these queries are vacuumed and analyzed regularly. I
> just did an analyze to be sure, and here are the results:
> ...

There's something pretty wacko about the choice of plan in the slow case
--- I don't see why it'd not have used the same plan structure as for
the IN case.  It's coming up with a cost a lot higher than for the
other, so it certainly knows this isn't a great plan ...

Which PG version is this exactly?  Are you running with any nondefault
planner parameters?
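
For instance, from psql (just the obvious checks):

    SELECT version();
    SHOW effective_cache_size;
    SHOW random_page_cost;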

            regards, tom lane

Re: strange query behavior

From: "Tim Jones"
Version 8.1

Here are the planner constraints I believe we changed:
effective_cache_size and random_page_cost.
BTW, this is on AIX 5.2.

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 10000       # typically 8KB each
effective_cache_size = 400000
random_page_cost = 3.8          # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001       # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0         # selects default based on effort
#geqo_generations = 0           # selects default based on effort
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10     # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs


Thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


Re: strange query behavior

From: Tom Lane
"Tim Jones" <TJones@optio.com> writes:
>> Which PG version is this exactly?  Are you running with any nondefault
>> planner parameters?

> Version 8.1

8.1.what?

> Here are the planner constraints I believe we changed
> effective_cache_size and random_page_cost

Those look reasonably harmless.

My best bet at the moment is that you've got a pretty early 8.1.x
release and are hitting one of the planner bugs that we fixed earlier
this year.  Not enough info to say for sure though.

            regards, tom lane

Re: strange query behavior

From: "Tim Jones"
Looks like 8.1.2

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


Re: strange query behavior

From: "Tim Jones"
That's what I did, and I got 8.1.2 ... do you want the gcc version etc.?
It's gcc 3.3.2, powerpc, AIX 5.2.

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-----Original Message-----
From: Matthew O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, December 13, 2006 5:51 PM
To: Tim Jones
Subject: Re: [PERFORM] strange query behavior

From psql, run: select version();
and send us that output.


Re: strange query behavior

From: Tom Lane
"Tim Jones" <TJones@optio.com> writes:
> [ explain results ]

As best I can see, the problem is with the estimate of the size of the
inner join: for two keys we have

         ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
               ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
                     Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                     ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177470)
                           ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                 Index Cond: (batteryidentifier = 1177469)
               ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
                     Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)

but for one key we have

               ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                     ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
                           Index Cond: (1177470 = batteryidentifier)

The large rowcount estimate makes it back off to a non-nestloop
plan for the outer joins, and in this situation that's a loser.

I'm actually not sure why they're not both too high --- with the
rowcount estimate of 1362 for the inner scan in the first example, you'd
expect about twice that for the join result.  But the immediate problem
is that in the case where it knows exactly what batteryidentifier is
being probed for, it's still off by more than a factor of 100 on the
rowcount estimate for observationresults.  How many rows in
observationresults, and may we see the pg_stats entry for
observationresults.batteryidentifier?

It's likely that the answer for you will be "raise the statistics target
for observationresults and re-ANALYZE", but I'd like to gather more info
about what's going wrong first.

            regards, tom lane

Re: strange query behavior

From: "Tim Jones"
18,273,008 rows in observationresults

pg_stats (shown in expanded form for readability):

select * from pg_stats where tablename='observationresults' and
attname='batteryidentifier';

schemaname        | public
tablename         | observationresults
attname           | batteryidentifier
null_frac         | 0
avg_width         | 4
n_distinct        | 12942
most_common_vals  | {437255,1588952,120420,293685,356599,504069,589910,693683,834990,854693}
most_common_freqs | {0.00133333,0.00133333,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001}
histogram_bounds  | {3561,271263,556929,839038,1125682,1406538,1697589,1970463,2226781,2539241,2810844}
correlation       | 0.31779

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


Re: strange query behavior

From: Tom Lane
"Tim Jones" <TJones@optio.com> writes:
> 18,273,008 rows in observationresults
> [ and n_distinct = 12942 ]

OK, so the estimated rowcounts are coming from those two numbers.
It's notoriously hard to get a decent n_distinct estimate from a small
sample :-(, and I would imagine the number of batteryidentifiers is
really a lot more than 12942?
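
(For what it's worth, the arithmetic lines up: roughly speaking the planner's
per-key estimate is total rows divided by n_distinct, so

    18,273,008 / 12,942  =  about 1,412 rows per batteryidentifier

which is right in the neighborhood of the 1,348 and 1,362 row estimates in
your plans.)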

What you need to do is increase the statistics target for
observationresults.batteryidentifier (see ALTER TABLE) and re-ANALYZE
and see if you get a saner n_distinct in pg_stats.  I'd try 100 and
then 1000 as target.  Or you could just increase the global default
target (see postgresql.conf) but that might be overkill.
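
Concretely, something along these lines (untested, and 100 is just a starting
point):

    ALTER TABLE observationresults
        ALTER COLUMN batteryidentifier SET STATISTICS 100;
    ANALYZE observationresults;
    -- then see whether n_distinct looks more believable:
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'observationresults' AND attname = 'batteryidentifier';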

It's still a bit odd that the case with two batteryidentifiers was
estimated fairly accurately when the other wasn't; I'll go look into
that.  But in any case you need better stats if you want good plans.

            regards, tom lane

Re: strange query behavior

From: Tom Lane
I wrote:
> It's still a bit odd that the case with two batteryidentifiers was
> estimated fairly accurately when the other wasn't; I'll go look into
> that.

For the sake of the archives: I looked into this, and it seems there's
not anything going wrong other than the bogusly small n_distinct for
observationresults.

I'm assuming that battery.batteryidentifier is unique (stop me here,
Tim, if not).  That means that (a) there won't be any most-common-values
statistics list for it, and (b) the n_distinct estimate should be pretty
accurate.

What happens in the multiple-batteryidentifier case is that eqjoinsel()
doesn't have two MCV lists to work with, and so it bases its selectivity
estimate on the larger n_distinct, which in this case is the accurate
value from the battery table.  So we come out with a decent estimate
even though the other n_distinct is all wrong.
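
(A rough sketch of that arithmetic, writing nd_battery for whatever n_distinct
ANALYZE has for battery.batteryidentifier:

    selectivity  ~  1 / max(nd_battery, 12942)  =  1 / nd_battery
    join rows    ~  2 * 18,273,008 / nd_battery  ~  13, the plan's estimate

so the bogus 12942 never enters into it.)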

What happens in the single-batteryidentifier case is that transitive
equality deduction removes the battery.batteryidentifier =
observationresults.batteryidentifier join condition altogether,
replacing it with two restriction conditions batteryidentifier = 1177470.
So eqjoinsel() is never called, and the join size estimate is just the
product of the indexscan size estimates, and the scan estimate for
observationresults is too high because its n_distinct is too small.
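
In other words, the planner effectively ends up planning something like this
(just a sketch of the rewritten conditions, not literal output from anywhere):

    ... FROM battery, observationresults, ...
    WHERE battery.batteryidentifier = 1177470
      AND observationresults.batteryidentifier = 1177470
    -- the join clause battery.batteryidentifier =
    -- observationresults.batteryidentifier has been deduced away,
    -- so eqjoinsel() never gets a look at it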

So the bottom line is that eqjoinsel() is actually a bit more robust
than one might have thought ...

            regards, tom lane

Re: strange query behavior

From: "Tim Jones"
OK, thanks Tom. I will alter the statistics and re-analyze the table.

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
