Thread: overzealous sorting?

overzealous sorting?

From

anthony.shipman@symstream.com

Date:

26 September 2011, 03:53:24

In Mammoth Replicator (PG 8.3) I have a table described as

                                       Table "public.tevent_cdr"
     Column     |           Type           |                         Modifiers
----------------+--------------------------+------------------------------------------------------------
 event_id       | integer                  | not null default
nextval(('event_id_seq'::text)::regclass)
 timestamp      | timestamp with time zone | not null
 classification | character varying        | not null
 area           | character varying        | not null
 kind           | character varying        |
 device_id      | integer                  |
 device_name    | character varying        |
 fleet_id       | integer                  |
 fleet_name     | character varying        |
 customer_id    | integer                  |
 customer_name  | character varying        |
 event          | text                     |
Indexes:
    "tevent_cdr_event_id" UNIQUE, btree (event_id)
    "tevent_cdr_timestamp" btree ("timestamp")
Check constraints:
    "tevent_cdr_classification_check" CHECK (classification::text
= 'cdr'::text)
Inherits: tevent


This simple query puzzles me. Why does it need to sort the records? Don't they
come from the index in order?

 "explain analyze select * from tevent_cdr where timestamp >=
'2011-09-09 12:00:00.000000+0' and timestamp < '2011-09-09
13:00:00.000000+0' and classification = 'cdr' order by timestamp;"

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=9270.93..9277.12 rows=2477 width=588) (actual
time=9.219..11.489 rows=2480 loops=1)
    Sort Key: "timestamp"
    Sort Method:  quicksort  Memory: 2564kB
    ->  Bitmap Heap Scan on tevent_cdr  (cost=57.93..9131.30 rows=2477
width=588) (actual time=0.440..3.923 rows=2480 loops=1)
          Recheck Cond: (("timestamp" >= '2011-09-09
22:00:00+10'::timestamp with time zone) AND ("timestamp" < '2011-09-09
23:00:00+10'::timestamp with time zone))
          Filter: ((classification)::text = 'cdr'::text)
          ->  Bitmap Index Scan on tevent_cdr_timestamp
(cost=0.00..57.31 rows=2477 width=0) (actual time=0.404..0.404 rows=2480
loops=1)
                Index Cond: (("timestamp" >= '2011-09-09
22:00:00+10'::timestamp with time zone) AND ("timestamp" < '2011-09-09
23:00:00+10'::timestamp with time zone))
  Total runtime: 13.847 ms
(9 rows)
--
Anthony Shipman                 | flailover systems: When one goes down it
Anthony.Shipman@symstream.com   | flails about until the other goes down too.

Re: overzealous sorting?

From

Marc Cousin

Date:

26 September 2011, 06:44:42

Le Mon, 26 Sep 2011 16:28:15 +1000,
anthony.shipman@symstream.com a écrit :

> In Mammoth Replicator (PG 8.3) I have a table described as
>
>                                        Table "public.tevent_cdr"
>      Column     |           Type           |
> Modifiers
> ----------------+--------------------------+------------------------------------------------------------
> event_id       | integer                  | not null default
> nextval(('event_id_seq'::text)::regclass) timestamp      | timestamp
> with time zone | not null classification | character varying        |
> not null area           | character varying        | not null
>  kind           | character varying        |
>  device_id      | integer                  |
>  device_name    | character varying        |
>  fleet_id       | integer                  |
>  fleet_name     | character varying        |
>  customer_id    | integer                  |
>  customer_name  | character varying        |
>  event          | text                     |
> Indexes:
>     "tevent_cdr_event_id" UNIQUE, btree (event_id)
>     "tevent_cdr_timestamp" btree ("timestamp")
> Check constraints:
>     "tevent_cdr_classification_check" CHECK (classification::text
> = 'cdr'::text)
> Inherits: tevent
>
>
> This simple query puzzles me. Why does it need to sort the records?
> Don't they come from the index in order?
>
>  "explain analyze select * from tevent_cdr where timestamp >=
> '2011-09-09 12:00:00.000000+0' and timestamp < '2011-09-09
> 13:00:00.000000+0' and classification = 'cdr' order by timestamp;"
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Sort  (cost=9270.93..9277.12 rows=2477 width=588) (actual
> time=9.219..11.489 rows=2480 loops=1)
>     Sort Key: "timestamp"
>     Sort Method:  quicksort  Memory: 2564kB
>     ->  Bitmap Heap Scan on tevent_cdr  (cost=57.93..9131.30
> rows=2477 width=588) (actual time=0.440..3.923 rows=2480 loops=1)
>           Recheck Cond: (("timestamp" >= '2011-09-09
> 22:00:00+10'::timestamp with time zone) AND ("timestamp" <
> '2011-09-09 23:00:00+10'::timestamp with time zone))
>           Filter: ((classification)::text = 'cdr'::text)
>           ->  Bitmap Index Scan on tevent_cdr_timestamp
> (cost=0.00..57.31 rows=2477 width=0) (actual time=0.404..0.404
> rows=2480 loops=1)
>                 Index Cond: (("timestamp" >= '2011-09-09
> 22:00:00+10'::timestamp with time zone) AND ("timestamp" <
> '2011-09-09 23:00:00+10'::timestamp with time zone))
>   Total runtime: 13.847 ms
> (9 rows)

Because Index Scans are sorted, not Bitmap Index Scans, which builds a
list of pages to visit, to be then visited by the Bitmap Heap Scan step.

Marc.

Re: overzealous sorting?

From

anthony.shipman@symstream.com

Date:

26 September 2011, 23:45:14

On Monday 26 September 2011 19:39, Marc Cousin wrote:
> Because Index Scans are sorted, not Bitmap Index Scans, which builds a
> list of pages to visit, to be then visited by the Bitmap Heap Scan step.
>
> Marc.

Where does this bitmap index scan come from? It seems to negate the advantages
of b-tree indexes described in the section "Indexes and ORDER BY" of the
manual. If I do "set enable_bitmapscan = off;" the query runs a bit faster
although with a larger time range it reverts to a sequential scan.

--
Anthony Shipman                 | Consider the set of blacklists that
Anthony.Shipman@symstream.com   | do not blacklist themselves...

Re: overzealous sorting?

From

Marc Cousin

Date:

27 September 2011, 05:54:47

Le Tue, 27 Sep 2011 12:45:00 +1000,
anthony.shipman@symstream.com a écrit :

> On Monday 26 September 2011 19:39, Marc Cousin wrote:
> > Because Index Scans are sorted, not Bitmap Index Scans, which
> > builds a list of pages to visit, to be then visited by the Bitmap
> > Heap Scan step.
> >
> > Marc.
>
> Where does this bitmap index scan come from? It seems to negate the
> advantages of b-tree indexes described in the section "Indexes and
> ORDER BY" of the manual. If I do "set enable_bitmapscan = off;" the
> query runs a bit faster although with a larger time range it reverts
> to a sequential scan.
>

Bitmap Index Scan is just another way to use a btree index. It is often
used when a bigger part of a table is required, as it costs more than
plain index scan to retrieve a few records, but less when a lot of
records are needed.

Your tests show that index scans are a bit faster on this query. But it
is probably true only when most needed data is cached, which is probably
your case, as you are doing tests using the same query all the time.
The bitmap index scan is probably cheaper when data isn't in cache. You
could also see the bitmap index scan as safer, as it won't perform as
bad when data is not cached (less random IO) :)

The thing is, the optimizer doesn't know if your data will be in cache
when you will run your query… if you are sure most of your data is in
the cache most of the time, you could try to tune random_page_cost
(lower it) to reflect that data is cached. But if the win is small on
this query, it may not be worth it.

Re: overzealous sorting?

From

anthony.shipman@symstream.com

Date:

27 September 2011, 06:05:27

On Tuesday 27 September 2011 18:54, Marc Cousin wrote:
> The thing is, the optimizer doesn't know if your data will be in cache
> when you will run your query… if you are sure most of your data is in
> the cache most of the time, you could try to tune random_page_cost
> (lower it) to reflect that data is cached. But if the win is small on
> this query, it may not be worth it.

What I really want is to just read a sequence of records in timestamp order
between two timestamps. The number of records to be read may be in the
millions totalling more than 1GB of data so I'm trying to read them a slice
at a time but I can't get PG to do just this.

If I use offset and limit to grab a slice of the records from a large
timestamp range then PG will grab all of the records in the range, sort them
on disk and return just the slice I want. This is absurdly slow.

The query that I've shown is one of a sequence of queries with the timestamp
range progressing in steps of 1 hour through the timestamp range. All I want
PG to do is find the range in the index, find the matching records in the
table and return them. All of the planner's cleverness just seems to get in
the way.

--
Anthony Shipman                 | Consider the set of blacklists that
Anthony.Shipman@symstream.com   | do not blacklist themselves...

Re: overzealous sorting?

From

Mark Kirkwood

Date:

27 September 2011, 06:29:10

On 27/09/11 22:05, anthony.shipman@symstream.com wrote:
>
> What I really want is to just read a sequence of records in timestamp order
> between two timestamps. The number of records to be read may be in the
> millions totalling more than 1GB of data so I'm trying to read them a slice
> at a time but I can't get PG to do just this.
>
> If I use offset and limit to grab a slice of the records from a large
> timestamp range then PG will grab all of the records in the range, sort them
> on disk and return just the slice I want. This is absurdly slow.
>
> The query that I've shown is one of a sequence of queries with the timestamp
> range progressing in steps of 1 hour through the timestamp range. All I want
> PG to do is find the range in the index, find the matching records in the
> table and return them. All of the planner's cleverness just seems to get in
> the way.
>

It is not immediately clear that the planner is making the wrong choices
here. Index scans are not always the best choice, it depends heavily on
the correlation of the column concerned to the physical order of the
table's heap file. I suspect the reason for the planner choosing the
bitmap scan is that said correlation is low (consult pg_stats to see).
Now if you think that the table's heap data is cached anyway, then this
is not such an issue - but you have to tell the planner that by reducing
random_page_cost (as advised previously). Give it a try and report back!

regards

Mark

Re: overzealous sorting?

From

Marc Cousin

Date:

27 September 2011, 12:00:16

Le Tue, 27 Sep 2011 19:05:09 +1000,
anthony.shipman@symstream.com a écrit :

> On Tuesday 27 September 2011 18:54, Marc Cousin wrote:
> > The thing is, the optimizer doesn't know if your data will be in
> > cache when you will run your query… if you are sure most of your
> > data is in the cache most of the time, you could try to tune
> > random_page_cost (lower it) to reflect that data is cached. But if
> > the win is small on this query, it may not be worth it.
>
> What I really want is to just read a sequence of records in timestamp
> order between two timestamps. The number of records to be read may be
> in the millions totalling more than 1GB of data so I'm trying to read
> them a slice at a time but I can't get PG to do just this.
>
> If I use offset and limit to grab a slice of the records from a large
> timestamp range then PG will grab all of the records in the range,
> sort them on disk and return just the slice I want. This is absurdly
> slow.
>
> The query that I've shown is one of a sequence of queries with the
> timestamp range progressing in steps of 1 hour through the timestamp
> range. All I want PG to do is find the range in the index, find the
> matching records in the table and return them. All of the planner's
> cleverness just seems to get in the way.
>

Maybe you should try using a cursor, if you don't know where you'll
stop. This associated with a very low cursor_tuple_fraction will
probably give you what you want (a fast start plan).

Re: overzealous sorting?

From

anthony.shipman@symstream.com

Date:

28 September 2011, 02:13:20

On Tuesday 27 September 2011 19:22, Mark Kirkwood wrote:
> > The query that I've shown is one of a sequence of queries with the
> > timestamp range progressing in steps of 1 hour through the timestamp
> > range. All I want PG to do is find the range in the index, find the
> > matching records in the table and return them. All of the planner's
> > cleverness just seems to get in the way.
>
> It is not immediately clear that the planner is making the wrong choices
> here. Index scans are not always the best choice, it depends heavily on
> the correlation of the column concerned to the physical order of the
> table's heap file. I suspect the reason for the planner choosing the
> bitmap scan is that said correlation is low (consult pg_stats to see).
> Now if you think that the table's heap data is cached anyway, then this
> is not such an issue - but you have to tell the planner that by reducing
> random_page_cost (as advised previously). Give it a try and report back!
>
> regards
>
> Mark

I don't expect that any of it is cached. It is supposed to be a once-a-day
linear scan of a slice of the table. The correlation on the timestamp is
reported as 0.0348395. I can't use cluster since it would lock the table for
too long.

I would try a cursor but my framework for this case doesn't support cursors.
In a later version of the framework I've tried cursors and haven't found them
to be faster than reading in slices, in the tests I've done.

Anyway at the moment it is fast enough.

Thanks
--
Anthony Shipman                 | It's caches all the way
Anthony.Shipman@symstream.com   | down.