Re: strange query behavior - Mailing list pgsql-performance

From Tom Lane
Subject Re: strange query behavior
Date
Msg-id 12988.1166052295@sss.pgh.pa.us
Whole thread Raw
In response to Re: strange query behavior  ("Tim Jones" <TJones@optio.com>)
Responses Re: strange query behavior
List pgsql-performance
"Tim Jones" <TJones@optio.com> writes:
> [ explain results ]

As best I can see, the problem is with the estimate of the size of the
inner join: for two keys we have

         ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
               ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2
loops=1)
                     Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                     ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                           ->  Bitmap Index Scan on ix_battery_id (cost=0.00..2.00 rows=1 width=0) (actual
time=0.091..0.091rows=1 loops=1) 
                                 Index Cond: (batteryidentifier = 1177470)
                           ->  Bitmap Index Scan on ix_battery_id (cost=0.00..2.00 rows=1 width=0) (actual
time=0.011..0.011rows=1 loops=1) 
                                 Index Cond: (batteryidentifier = 1177469)
               ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76)
(actualtime=0.047..0.091 rows=15 loops=2) 
                     Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)

but for one key we have

               ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                     ->  Index Scan using ix_battery_id on battery (cost=0.00..5.81 rows=1 width=69) (actual
time=0.079..0.082rows=1 loops=1) 
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348
width=76)(actual time=0.079..0.102 rows=9 loops=1) 
                           Index Cond: (1177470 = batteryidentifier)

The large rowcount estimate makes it back off to a non-nestloop
plan for the outer joins, and in this situation that's a loser.

I'm actually not sure why they're not both too high --- with the
rowcount estimate of 1362 for the inner scan in the first example, you'd
expect about twice that for the join result.  But the immediate problem
is that in the case where it knows exactly what batteryidentifier is
being probed for, it's still off by more than a factor of 100 on the
rowcount estimate for observationresults.  How many rows in
observationresults, and may we see the pg_stats entry for
observationresults.batteryidentifier?

It's likely that the answer for you will be "raise the statistics target
for observationresults and re-ANALYZE", but I'd like to gather more info
about what's going wrong first.

            regards, tom lane

pgsql-performance by date:

Previous
From: "Steven Flatt"
Date:
Subject: Re: Insertion to temp table deteriorating over time
Next
From: Tom Lane
Date:
Subject: Re: Insertion to temp table deteriorating over time