Re: strange query plan with LIMIT - Mailing list pgsql-performance

From Pavel Stehule
Subject Re: strange query plan with LIMIT
Date
Msg-id BANLkTikqL1yKaaLSc20uoSS2zO-GPG56vg@mail.gmail.com
Whole thread Raw
In response to strange query plan with LIMIT  (anthony.shipman@symstream.com)
List pgsql-performance
Hello

did you run a ANALYZE statement on table tdiag? A statistics are
absolutelly out.

Regards

Pavel Stehule

2011/6/7  <anthony.shipman@symstream.com>:
> Version: PostgreSQL 8.3.5 (mammoth replicator)
>
> Schema:
>
> CREATE TABLE tdiag (
>    diag_id             integer DEFAULT nextval('diag_id_seq'::text),
>    create_time         timestamp with time zone default now(), /* time this record
> was created */
>    diag_time           timestamp with time zone not null,
>    device_id           integer,                /* optional */
>    fleet_id            integer,                /* optional */
>    customer_id         integer,                /* optional */
>    module              character varying,
>    node_kind           smallint,
>    diag_level          smallint,
>    tag                 character varying not null default '',
>    message             character varying not null default '',
>    options             text,
>
>    PRIMARY KEY (diag_id)
> );
>
> create index tdiag_create_time   ON tdiag(create_time);
>
> The number of rows is around 33 million with time stamps over the past two
> weeks.
> A VACUUM ANALYZE has been done recently on the table.
>
> The create_time order is almost identical to the id order.  What I want
> to find is the first or last entry by id in a given time range. The
> query I am having a problem with is:
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id limit 1;
>
>
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358
> rows=1 loops=1)
>   ->  Index Scan using tdiag_pkey on tdiag  (cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=69425.352..69425.352 rows=1 loops=1)
>         Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
>  Total runtime: 69425.400 ms
>
> PG seems to decide it must scan the diag_id column and filter each row by the
> create_time.
>
>
>
> If I leave out the limit I get
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id;
>
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=957632.43..960484.98 rows=1141019 width=114) (actual
> time=552.795..656.319 rows=86530
> loops=1)
>   Sort Key: diag_id
>   Sort Method:  external merge  Disk: 9872kB
>   ->  Bitmap Heap Scan on tdiag  (cost=25763.48..638085.13 rows=1141019
> width=114) (actual
> time=43.232..322.441 rows=86530 loops=1)
>         Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp
> with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
>         ->  Bitmap Index Scan on tdiag_create_time  (cost=0.00..25478.23
> rows=1141019 width=0)
> (actual time=42.574..42.574 rows=86530 loops=1)
>               Index Cond: ((create_time >= '2011-06-03
> 19:49:04+10'::timestamp with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
>  Total runtime: 736.440 ms
> (8 rows)
>
>
>
>
> I can be explicit about the query order:
>
> select * into tt from tdiag where (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time <
> '2011-06-06 09:59:04.000000+0');
>
> symstream2=> explain analyze select * from tt order by diag_id limit 1;
>                                                       QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166
> rows=1 loops=1)
>   ->  Sort  (cost=2731.95..2948.28 rows=86530 width=101) (actual
> time=440.161..440.161 rows=1
> loops=1)
>         Sort Key: diag_id
>         Sort Method:  top-N heapsort  Memory: 17kB
>         ->  Seq Scan on tt  (cost=0.00..2299.30 rows=86530 width=101) (actual
> time=19.602..330.873
> rows=86530 loops=1)
>  Total runtime: 440.209 ms
> (6 rows)
>
>
>
> But if I try using a subquery I get
>
> symstream2=> explain analyze select * from (select * from tdiag where
> (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub
> order by diag_id limit
> 1;
>
>
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385
> rows=1 loops=1)
>   ->  Index Scan using tdiag_pkey on tdiag  (cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=90344.380..90344.380 rows=1 loops=1)
>         Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
>  Total runtime: 90344.431 ms
>
>
> How do I make this both fast and simple?
> --
> Anthony Shipman                 | flailover systems: When one goes down it
> Anthony.Shipman@symstream.com   | flails about until the other goes down too.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

pgsql-performance by date:

Previous
From: Craig Ringer
Date:
Subject: Re: i want to ask monitory peformance memory postgresql with automatically
Next
From: anthony.shipman@symstream.com
Date:
Subject: strange query plan with LIMIT