Thread: Slow query - possible bug?

Slow query - possible bug?

From
Gavin Hamill
Date:
laterooms=# explain analyze select allocation0_."ID" as y1_,
allocation0_."RoomID" as y2_, allocation0_."StatusID" as y4_,
allocation0_."Price" as y3_, allocation0_."Number" as y5_,
allocation0_."Date" as y6_ from "Allocation" allocation0_ where
(allocation0_."Date" between '2006-06-09 00:00:00.000000' and
'2006-06-09 00:00:00.000000')and(allocation0_."RoomID" in(4300591));
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77 rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
   Index Cond: (("Date" >= '2006-06-09'::date) AND ("Date" <= '2006-06-09'::date))
   Filter: ("RoomID" = 4300591)
 Total runtime: 1689.917 ms
(4 rows)

Yep, the two dates are identical - and yep, I would change the client
software to do where "Date" = '2006-06-09 00:00:00.000000' if I could...

However, it's easy to see why this simple query is taking so long - the
plan is selecting /all/ dates on or after 2006-06-09 and /all/ dates on
or before it, and only returning the union of the two - a large waste of
effort, surely?
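
For reference, the between is only shorthand for the pair of comparisons
that show up in the plan's Index Cond line, so with identical bounds the
planner effectively sees something like this (same query, just spelled out):

select allocation0_."ID" as y1_, allocation0_."RoomID" as y2_,
       allocation0_."StatusID" as y4_, allocation0_."Price" as y3_,
       allocation0_."Number" as y5_, allocation0_."Date" as y6_
from "Allocation" allocation0_
where allocation0_."Date" >= '2006-06-09'::date   -- lower bound of the between
  and allocation0_."Date" <= '2006-06-09'::date   -- upper bound, identical
  and allocation0_."RoomID" in (4300591);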

VACUUM ANALYZE hasn't improved matters... the schema for the table is

 "ID" int8 NOT NULL DEFAULT
nextval(('public."allocation_id_seq"'::text)::regclass),
  "RoomID" int4,
  "Price" numeric(10,2),
  "StatusID" int4,
  "Number" int4,
  "Date" date,

and there are indexes on 'RoomID' and 'Date' in this 4.3-million-row
table.
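
For reference, the two indexes that show up in the plans in this thread
look roughly like this (a sketch reconstructed from the index names and
Index Cond lines; the actual DDL may well differ):

-- guessed from "Index Scan using ix_date" / "ix_dateroom" in the plans
CREATE INDEX ix_date     ON "Allocation" ("Date");
CREATE INDEX ix_dateroom ON "Allocation" ("RoomID", "Date");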

Is this a bug or a hidden feature in pg 8.1.3 ? :)

Cheers,
Gavin.


Re: Slow query - possible bug?

From
"chris smith"
Date:
On 4/13/06, Gavin Hamill <gdh@laterooms.com> wrote:
> laterooms=# explain analyze select allocation0_."ID" as y1_,
> allocation0_."RoomID" as y2_, allocation0_."StatusID" as y4_,
> allocation0_."Price" as y3_, allocation0_."Number" as y5_,
> allocation0_."Date" as y6_ from "Allocation" allocation0_ where
> (allocation0_."Date" between '2006-06-09 00:00:00.000000' and
> '2006-06-09 00:00:00.000000')and(allocation0_."RoomID" in(4300591));
>                                                                 QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77
> rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
>    Index Cond: (("Date" >= '2006-06-09'::date) AND ("Date" <=
> '2006-06-09'::date))
>    Filter: ("RoomID" = 4300591)
>  Total runtime: 1689.917 ms
> (4 rows)

1.6secs isn't too bad on 4.3mill rows...

How many entries are there for that date range?

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Slow query - possible bug?

From
Gavin Hamill
Date:
chris smith wrote:

>1.6secs isn't too bad on 4.3mill rows...
>
>How many entries are there for that date range?
>
>
1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so
good. My question is 'why does the planner choose such a bizarre range
request when both elements of the 'between' are identical? :)'

If I replace the
(allocation0_."Date" between '2006-06-09 00:00:00.000000' and
'2006-06-09 00:00:00.000000')

with

allocation0_."Date" ='2006-04-09 00:00:00.000000'

then the query comes back in a few milliseconds (as I'd expect :) - and
yup I've been using different dates for each test to avoid the query
being cached.
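
Spelled out in full (keeping the RoomID value from the original query
purely for illustration), the fast form is roughly:

explain analyse select allocation0_."ID" as y1_, allocation0_."RoomID" as y2_,
  allocation0_."StatusID" as y4_, allocation0_."Price" as y3_,
  allocation0_."Number" as y5_, allocation0_."Date" as y6_
from "Allocation" allocation0_
where allocation0_."Date" = '2006-04-09 00:00:00.000000'   -- plain equality
  and allocation0_."RoomID" in (4300591);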

For ref, there are typically 35000 rows per date :)

Cheers,
Gavin.


Re: Slow query - possible bug?

From
Richard Huxton
Date:
Gavin Hamill wrote:
> chris smith wrote:
>
>> 1.6secs isn't too bad on 4.3mill rows...
>>
>> How many entries are there for that date range?
>>
>>
> 1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so
> good. My question is 'why does the planner choose such a bizarre range
> request when both elements of the 'between' are identical? :)'

What's bizarre about the range request, and are you sure it's evaluating
both conditions separately and taking the union? It looks to me like it's
doing a standard range search. If it were trying to fetch 4.3 million
rows via that index, I'd expect it to use a different index instead.

If you've got stats turned on, look in pg_stat_user_indexes/tables
before and after the query to see. Here's an example of a similar query
against one of my log tables. It's small, but the clause is the same,
and I don't see any evidence of the whole table being selected.

lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
   relid  | indexrelid | schemaname | relname |  indexrelname  | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
  6124993 |    7519044 | public     | act_log | act_log_ts_idx |       23 |           18 |            18
  6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)

lamp=> EXPLAIN ANALYSE SELECT * FROM act_log WHERE al_ts BETWEEN '2006-04-05 14:10:23+00'::timestamptz AND '2006-04-05 14:10:23+00'::timestamptz;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
  Index Scan using act_log_ts_idx on act_log  (cost=0.00..3.02 rows=1 width=102) (actual time=0.116..0.131 rows=1 loops=1)
    Index Cond: ((al_ts >= '2006-04-05 15:10:23+01'::timestamp with time zone) AND (al_ts <= '2006-04-05 15:10:23+01'::timestamp with time zone))
  Total runtime: 0.443 ms
(3 rows)

lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
   relid  | indexrelid | schemaname | relname |  indexrelname  | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
  6124993 |    7519044 | public     | act_log | act_log_ts_idx |       24 |           19 |            19
  6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)
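
The same before-and-after check against the table in this thread would be
something along these lines (a sketch; it assumes the stats collector is
enabled):

-- run before and after the query; a large jump in idx_tup_read/idx_tup_fetch
-- on ix_date would show the index pulling in far more rows than the one row
-- that finally comes back
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
  FROM pg_stat_user_indexes WHERE relname = 'Allocation';
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
  FROM pg_stat_user_tables  WHERE relname = 'Allocation';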


1. vacuum full verbose your table (and post the output please)
2. perhaps reindex?
3. Try the explain analyse again and see what happens (a sketch of these steps follows).
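
For the table in question that would be something along these lines
(index and table names taken from the plans above):

-- 1. rewrite the table and report what gets reclaimed
VACUUM FULL VERBOSE "Allocation";
-- 2. rebuild the indexes on it
REINDEX TABLE "Allocation";
-- 3. then re-run the original EXPLAIN ANALYSE and compare plans
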
--
   Richard Huxton
   Archonet Ltd

Re: Slow query - possible bug?

From
Tom Lane
Date:
Gavin Hamill <gdh@laterooms.com> writes:
> If I replace the
> (allocation0_."Date" between '2006-06-09 00:00:00.000000' and
> '2006-06-09 00:00:00.000000')
> with
> allocation0_."Date" ='2006-04-09 00:00:00.000000'
> then the query comes back in a few milliseconds (as I'd expect :)

Could we see EXPLAIN ANALYZE for
* both forms of the date condition, with the roomid condition;
* both forms of the date condition, WITHOUT the roomid condition;
* just the roomid condition

I'm thinking the planner is misestimating something, but it's hard
to tell what without breaking it down.

            regards, tom lane

Re: Slow query - possible bug?

From
Gavin Hamill
Date:
Tom Lane wrote:

>Gavin Hamill <gdh@laterooms.com> writes:
>
>
>>If I replace the
>>(allocation0_."Date" between '2006-06-09 00:00:00.000000' and
>>'2006-06-09 00:00:00.000000')
>>with
>>allocation0_."Date" ='2006-04-09 00:00:00.000000'
>>then the query comes back in a few milliseconds (as I'd expect :)
>>
>>
>
>Could we see EXPLAIN ANALYZE for
>* both forms of the date condition, with the roomid condition;
>* both forms of the date condition, WITHOUT the roomid condition;
>* just the roomid condition
>
>I'm thinking the planner is misestimating something, but it's hard
>to tell what without breaking it down.
>
>

Of course. In each case, I have changed the date by two weeks to try and
minimise the effect of any query caching.

The base query is "explain analyse select allocation0_."ID" as y1_,
allocation0_."RoomID" as y2_, allocation0_."StatusID" as y4_,
allocation0_."Price" as y3_, allocation0_."Number" as y5_,
allocation0_."Date" as y6_ from "Allocation" allocation0_ where"

now both forms of the Date condition

a)

(allocation0_."Date" between '2006-04-25 00:00:00.000000' and
'2006-04-25 00:00:00.000000')and(allocation0_."RoomID" in(211800));
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77 rows=1 width=34) (actual time=3253.340..48040.396 rows=1 loops=1)
   Index Cond: (("Date" >= '2006-04-25'::date) AND ("Date" <= '2006-04-25'::date))
   Filter: ("RoomID" = 211800)
 Total runtime: 48040.451 ms (ouch!)


b)

(allocation0_."Date"= '2006-05-10 00:00:00.000000'::date)and(allocation0_."RoomID" in(211800));
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_dateroom on "Allocation" allocation0_  (cost=0.00..5.01 rows=1 width=34) (actual time=0.033..0.035 rows=1 loops=1)
   Index Cond: (("RoomID" = 211800) AND ("Date" = '2006-05-10'::date))
 Total runtime: 0.075 ms (whoosh!)

And now without the RoomID condition:

a)
(allocation0_."Date" between '2006-06-10 00:00:00.000000' and
'2006-06-10 00:00:00.000000');
                                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77 rows=1 width=34) (actual time=0.035..6706.467 rows=34220 loops=1)
   Index Cond: (("Date" >= '2006-06-10'::date) AND ("Date" <= '2006-06-10'::date))
 Total runtime: 6728.743 ms

b)
(allocation0_."Date"= '2006-05-25 00:00:00.000000'::date);
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on "Allocation" allocation0_  (cost=87.46..25017.67 rows=13845 width=34) (actual time=207.674..9702.656 rows=34241 loops=1)
   Recheck Cond: ("Date" = '2006-05-25'::date)
   ->  Bitmap Index Scan on ix_date  (cost=0.00..87.46 rows=13845 width=0) (actual time=185.086..185.086 rows=42705 loops=1)
         Index Cond: ("Date" = '2006-05-25'::date)
 Total runtime: 9725.470 ms


Wow, I'm not really sure what that tells me...

Cheers,
Gavin.


Re: Slow query - possible bug?

From
Tom Lane
Date:
Gavin Hamill <gdh@laterooms.com> writes:
> Tom Lane wrote:
>> I'm thinking the planner is misestimating something, but it's hard
>> to tell what without breaking it down.

> (allocation0_."Date" between '2006-06-10 00:00:00.000000' and
> '2006-06-10 00:00:00.000000');
>                                                                  QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77
> rows=1 width=34) (actual time=0.035..6706.467 rows=34220 loops=1)
>    Index Cond: (("Date" >= '2006-06-10'::date) AND ("Date" <=
> '2006-06-10'::date))
>  Total runtime: 6728.743 ms

Bingo, there's our misestimation: estimated 1 row, actual 34220 :-(

That's why it's choosing the wrong index: it thinks the condition on
RoomID isn't going to reduce the number of rows fetched any further,
and so the smaller index ought to be marginally cheaper to use.
In reality, it works way better when using the two-column index.

I think this is the same problem recently discussed about how the
degenerate case for a range comparison is making an unreasonably small
estimate, where it probably ought to fall back to some equality estimate
instead.  With the simple-equality form of the date condition, it does
get a reasonable estimate, and so it picks the right index.
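
A quick way to see the two estimates side by side is plain EXPLAIN
(without ANALYZE) on both forms of the date condition; the numbers in the
comments are the estimates and actuals from the plans quoted earlier in
the thread:

-- degenerate range: estimated 1 row, actual 34220
EXPLAIN SELECT * FROM "Allocation"
 WHERE "Date" BETWEEN '2006-06-10'::date AND '2006-06-10'::date;

-- plain equality: estimated 13845 rows, actual 34241 - in the right
-- ballpark, so the planner still sees value in restricting on RoomID
EXPLAIN SELECT * FROM "Allocation"
 WHERE "Date" = '2006-05-25'::date;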

There should be a fix for this by the time PG 8.2 comes out, but in the
meantime you might find that it helps to write the range check in a way
that doesn't have identical bounds, eg
    date >= '2006-06-10'::date AND date < '2006-06-11'::date
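
Applied to the query from the start of the thread, that half-open rewrite
would look roughly like this (same select list and RoomID, only the date
condition changed):

explain analyse select allocation0_."ID" as y1_, allocation0_."RoomID" as y2_,
  allocation0_."StatusID" as y4_, allocation0_."Price" as y3_,
  allocation0_."Number" as y5_, allocation0_."Date" as y6_
from "Allocation" allocation0_
where allocation0_."Date" >= '2006-06-10'::date
  and allocation0_."Date" <  '2006-06-11'::date    -- exclusive upper bound
  and allocation0_."RoomID" in (4300591);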

            regards, tom lane

Re: Slow query - possible bug?

From
Gavin Hamill
Date:
On Tue, 18 Apr 2006 13:31:48 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> There should be a fix for this by the time PG 8.2 comes out, but in
> the meantime you might find that it helps to write the range check in
> a way that doesn't have identical bounds, eg
>     date >= '2006-06-10'::date AND date < '2006-06-11'::date

OK coolies - we already had a code release for this (and other stuff)
planned for tomorrow morning: the client side now checks whether a single
date has been chosen and, if so, does an equality test on that...
otherwise it leaves the between in place. Seems to work like a charm, and
hopefully it'll mean we don't have a loadavg of 15 on our main pg
server tomorrow (!) :))
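
In SQL terms the client now emits one of two shapes, roughly like this
(the dates and RoomID below are just placeholder values):

-- a single date was chosen: plain equality
select allocation0_.* from "Allocation" allocation0_
 where allocation0_."Date" = '2006-06-10'::date
   and allocation0_."RoomID" in (211800);

-- a genuine range was chosen: keep the between
select allocation0_.* from "Allocation" allocation0_
 where allocation0_."Date" between '2006-06-10'::date and '2006-06-17'::date
   and allocation0_."RoomID" in (211800);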

Basically, as long as I know it's a pg issue rather than something daft
I've done (or not done) then I'm happy enough.

Cheers,
Gavin.

Re: Slow query - possible bug?

From
Tom Lane
Date:
Gavin Hamill <gdh@laterooms.com> writes:
> On Tue, 18 Apr 2006 13:31:48 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There should be a fix for this by the time PG 8.2 comes out, but in
>> the meantime you might find that it helps to write the range check in
>> a way that doesn't have identical bounds, eg
>> date >= '2006-06-10'::date AND date < '2006-06-11'::date

> OK coolies - we've already had a code release for this (and other
> stuff) planned for tomorrow morning checking on the client side
> if a single date has been chosen, then do an equality test on that...

Fair enough, no reason to replace one workaround with another.  But
would you try it on your test case, just to verify the diagnosis?

            regards, tom lane

Re: Slow query - possible bug?

From
Gavin Hamill
Date:
On Tue, 18 Apr 2006 15:51:44 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Fair enough, no reason to replace one workaround with another.  But
> would you try it on your test case, just to verify the diagnosis?

Yup, I can confirm it from testing earlier today - as soon as the two
dates are non-equal, the two-column ix_dateroom index is correctly
chosen and the query returns in a couple of dozen milliseconds:

laterooms=# explain analyse select allocation0_."ID" as y1_,
allocation0_."RoomID" as y2_, allocation0_."StatusID" as y4_,
allocation0_."Price" as y3_, allocation0_."Number" as y5_,
allocation0_."Date" as y6_ from "Allocation" allocation0_ where
(allocation0_."Date" between '2006-04-25 00:00:00.000000' and
'2006-04-26 00:00:00.000000')and(allocation0_."RoomID" in(211800));
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_dateroom on "Allocation" allocation0_  (cost=0.00..14.02 rows=4 width=34) (actual time=16.799..21.804 rows=2 loops=1)
   Index Cond: (("RoomID" = 211800) AND ("Date" >= '2006-04-25'::date) AND ("Date" <= '2006-04-26'::date))
 Total runtime: 21.910 ms

which I ran first, versus the identical-date equivalent which turned
in a whopping...

 Index Scan using ix_date on "Allocation" allocation0_  (cost=0.00..4.77 rows=1 width=34) (actual time=6874.272..69541.064 rows=1 loops=1)
   Index Cond: (("Date" >= '2006-04-25'::date) AND ("Date" <= '2006-04-25'::date))
   Filter: ("RoomID" = 211800)
 Total runtime: 69541.113 ms
(4 rows)

Cheers,
Gavin.