Re: Performance on large, append-only tables - Mailing list pgsql-performance

From David Yeu
Subject Re: Performance on large, append-only tables
Date
Msg-id 4205A216-5AD0-4F23-826F-542537B27E62@skype.net
Whole thread Raw
In response to Performance on large, append-only tables  (David Yeu <david.yeu@skype.net>)
Responses Re: Performance on large, append-only tables  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
Yeah, Reply-All...

Begin forwarded message:

> From: David Yeu <david.yeu@skype.net>
> Subject: Re: [PERFORM] Performance on large, append-only tables
> Date: February 10, 2012 10:59:04 AM EST
> To: Merlin Moncure <mmoncure@gmail.com>
>
> On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote:
>
>> You can probably significantly optimize this.  But first, can we see
>> some explain analyze for the affected queries?
>
> Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query:
>
>
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id
<?) ORDER BY id DESC LIMIT 20 OFFSET 0; 
>                                                                       QUERY PLAN
                                  
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1)
>   ->  Sort  (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1)
>         Sort Key: id
>         Sort Method:  top-N heapsort  Memory: 29kB
>         ->  Index Scan using index_lines_on_group_id on lines  (cost=0.00..9242.73 rows=4643 width=1321) (actual
time=6.131..319.835rows=23038 loops=1) 
>               Index Cond: (group_id = ?)
>               Filter: ((deleted_at IS NULL) AND (id < ?))
> Total runtime: 348.987 ms
>
>
> A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we
endedup with: 
>
>
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id
<?) ORDER BY id DESC LIMIT 20 OFFSET 0; 
>                                                                            QUERY PLAN
                                           
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1)
>   ->  Index Scan Backward using index_lines_on_group_id_and_id on lines  (cost=0.00..6416.04 rows=4443 width=1321)
(actualtime=17.207..109.867 rows=20 loops=1) 
>         Index Cond: ((group_id = ?) AND (id < ?))
>         Filter: (deleted_at IS NULL)
> Total runtime: 110.039 ms
>
>
> The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole
bunchof EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead
of`group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be
preferablenearly always. 
>
> And for reference, here's the EXPLAIN for the LIMIT, OFFSET query:
>
>
> => EXPLAIN ANALYZE SELECT  "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER
BYid DESC LIMIT 20 OFFSET 60; 
>                                                                      QUERY PLAN
                                
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1)
>   ->  Sort  (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1)
>         Sort Key: id
>         Sort Method:  top-N heapsort  Memory: 43kB
>         ->  Index Scan using index_lines_on_group_id on lines  (cost=0.00..9240.40 rows=4646 width=1321) (actual
time=0.117..98.905rows=7999 loops=1) 
>               Index Cond: (group_id = ?)
>               Filter: (deleted_at IS NULL)
> Total runtime: 109.753 ms
>
>
> - Dave
>



pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Performance on large, append-only tables
Next
From: Claudio Freire
Date:
Subject: Re: Performance on large, append-only tables