Re: Performance on large, append-only tables - Mailing list pgsql-performance
| From | David Yeu |
|---|---|
| Subject | Re: Performance on large, append-only tables |
| Date | |
| Msg-id | 4205A216-5AD0-4F23-826F-542537B27E62@skype.net Whole thread Raw |
| In response to | Performance on large, append-only tables (David Yeu <david.yeu@skype.net>) |
| Responses |
Re: Performance on large, append-only tables
|
| List | pgsql-performance |
Yeah, Reply-All...
Begin forwarded message:
> From: David Yeu <david.yeu@skype.net>
> Subject: Re: [PERFORM] Performance on large, append-only tables
> Date: February 10, 2012 10:59:04 AM EST
> To: Merlin Moncure <mmoncure@gmail.com>
>
> On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote:
>
>> You can probably significantly optimize this. But first, can we see
>> some explain analyze for the affected queries?
>
> Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id
<?) ORDER BY id DESC LIMIT 20 OFFSET 0;
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1)
> -> Sort (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1)
> Sort Key: id
> Sort Method: top-N heapsort Memory: 29kB
> -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9242.73 rows=4643 width=1321) (actual
time=6.131..319.835rows=23038 loops=1)
> Index Cond: (group_id = ?)
> Filter: ((deleted_at IS NULL) AND (id < ?))
> Total runtime: 348.987 ms
>
>
> A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we
endedup with:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id
<?) ORDER BY id DESC LIMIT 20 OFFSET 0;
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1)
> -> Index Scan Backward using index_lines_on_group_id_and_id on lines (cost=0.00..6416.04 rows=4443 width=1321)
(actualtime=17.207..109.867 rows=20 loops=1)
> Index Cond: ((group_id = ?) AND (id < ?))
> Filter: (deleted_at IS NULL)
> Total runtime: 110.039 ms
>
>
> The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole
bunchof EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead
of`group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be
preferablenearly always.
>
> And for reference, here's the EXPLAIN for the LIMIT, OFFSET query:
>
>
> => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER
BYid DESC LIMIT 20 OFFSET 60;
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1)
> -> Sort (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1)
> Sort Key: id
> Sort Method: top-N heapsort Memory: 43kB
> -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9240.40 rows=4646 width=1321) (actual
time=0.117..98.905rows=7999 loops=1)
> Index Cond: (group_id = ?)
> Filter: (deleted_at IS NULL)
> Total runtime: 109.753 ms
>
>
> - Dave
>
pgsql-performance by date: