Thread: Slow first query despite LIMIT and OFFSET clause

Slow first query despite LIMIT and OFFSET clause

From: Phoenix Kiula

Hope someone can shed light on this.

I have a well-performing query that uses the index. Without OFFSET
etc. it returns about 11,000 records.

I only need about 20 of these records at any given time, which is why
I use LIMIT and OFFSET to pull only those 20 or so records.

My queries are fast in general *except* the first time. The first time
I pull my 20 records, it takes quite a lot of time -- about 8-10
seconds or so, which is unacceptable in our case.

My guesses:

1. This first-query slowness may be because the query's data is being
read into memory the first time?

2. Because the query uses an ORDER BY ... DESC on another indexed
(date) column, the sort may well be what slows it down the first
time?

Appreciate any thoughts.

My query is:


explain analyze SELECT
            testimonials.url
            ,testimonials.alias
            ,testimonials.aliasEntered
            ,testimonials.title
            ,testimonials.modify_date
            ,testimonials.id
            ,visitcount.visit_count
            ,visitcount.unique_count
            ,visitcount.modify_date
            ,coalesce(  extract(epoch from now()) -  extract(epoch
from visitcount.modify_date), 0)
            ,(select count(id) from testimonials WHERE
testimonials.user_id = 'superman' and testimonials.user_known = 1 and
testimonials.status = 'Y' ) AS total
    FROM testimonials
    LEFT JOIN visitcount ON testimonials.id = visitcount.id
    WHERE
             testimonials.user_id = 'superman'
         and testimonials.user_known = 1
         and testimonials.status = 'Y'
    ORDER BY testimonials.modify_date desc
    OFFSET 0 LIMIT 10
;




                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=224.68..224.71 rows=10 width=187) (actual
time=453.429..453.539 rows=10 loops=1)
   InitPlan
     ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
time=89.268..89.271 rows=1 loops=1)
           ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
rows=10149 loops=1)
                 Index Cond: ((user_id)::text = 'superman'::text)
                 Filter: (status = 'Y'::bpchar)
   ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
time=453.420..453.464 rows=10 loops=1)
         Sort Key: testimonials.modify_date
         ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
width=187) (actual time=89.384..395.008 rows=10149 loops=1)
               ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
rows=10149 loops=1)
                     Index Cond: ((user_id)::text = 'superman'::text)
                     Filter: (status = 'Y'::bpchar)
               ->  Index Scan using visitcount_pkey1 on visitcount
(cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
loops=10149)
                     Index Cond: (testimonials.id = visitcount.id)
 Total runtime: 461.682 ms
(15 rows)



It's using the following indexes on the "testimonials" table.


"new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
"new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE user_known = 1



THANKS!

Re: Slow first query despite LIMIT and OFFSET clause

From: Scott Marlowe

On Sun, Jan 25, 2009 at 8:41 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> My query is:
>
>
> explain analyze SELECT
>            testimonials.url
>            ,testimonials.alias
>            ,testimonials.aliasEntered
>            ,testimonials.title
>            ,testimonials.modify_date
>            ,testimonials.id
>            ,visitcount.visit_count
>            ,visitcount.unique_count
>            ,visitcount.modify_date
>            ,coalesce(  extract(epoch from now()) -  extract(epoch
> from visitcount.modify_date), 0)
>            ,(select count(id) from testimonials WHERE
> testimonials.user_id = 'superman' and testimonials.user_known = 1 and
> testimonials.status = 'Y' ) AS total
>    FROM testimonials
>    LEFT JOIN visitcount ON testimonials.id = visitcount.id
>    WHERE
>             testimonials.user_id = 'superman'
>         and testimonials.user_known = 1
>         and testimonials.status = 'Y'
>    ORDER BY testimonials.modify_date desc
>    OFFSET 0 LIMIT 10
>
>
>                                                     QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=224.68..224.71 rows=10 width=187) (actual
> time=453.429..453.539 rows=10 loops=1)
>   InitPlan
>     ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
> time=89.268..89.271 rows=1 loops=1)
>           ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
> rows=10149 loops=1)
>                 Index Cond: ((user_id)::text = 'superman'::text)
>                 Filter: (status = 'Y'::bpchar)
>   ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
> time=453.420..453.464 rows=10 loops=1)
>         Sort Key: testimonials.modify_date
>         ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
> width=187) (actual time=89.384..395.008 rows=10149 loops=1)
>               ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
> rows=10149 loops=1)
>                     Index Cond: ((user_id)::text = 'superman'::text)
>                     Filter: (status = 'Y'::bpchar)
>               ->  Index Scan using visitcount_pkey1 on visitcount
> (cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
> loops=10149)

Have you analyzed these tables?  The estimates and real row counts are
quite different.

Re: Slow first query despite LIMIT and OFFSET clause

From: Phoenix Kiula

On Mon, Jan 26, 2009 at 2:26 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Jan 25, 2009 at 8:41 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>
>> My query is:
>>
>>
>> explain analyze SELECT
>>            testimonials.url
>>            ,testimonials.alias
>>            ,testimonials.aliasEntered
>>            ,testimonials.title
>>            ,testimonials.modify_date
>>            ,testimonials.id
>>            ,visitcount.visit_count
>>            ,visitcount.unique_count
>>            ,visitcount.modify_date
>>            ,coalesce(  extract(epoch from now()) -  extract(epoch
>> from visitcount.modify_date), 0)
>>            ,(select count(id) from testimonials WHERE
>> testimonials.user_id = 'superman' and testimonials.user_known = 1 and
>> testimonials.status = 'Y' ) AS total
>>    FROM testimonials
>>    LEFT JOIN visitcount ON testimonials.id = visitcount.id
>>    WHERE
>>             testimonials.user_id = 'superman'
>>         and testimonials.user_known = 1
>>         and testimonials.status = 'Y'
>>    ORDER BY testimonials.modify_date desc
>>    OFFSET 0 LIMIT 10
>>
>>
>>                                                     QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=224.68..224.71 rows=10 width=187) (actual
>> time=453.429..453.539 rows=10 loops=1)
>>   InitPlan
>>     ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
>> time=89.268..89.271 rows=1 loops=1)
>>           ->  Index Scan using new_idx_userknown on testimonials
>> (cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
>> rows=10149 loops=1)
>>                 Index Cond: ((user_id)::text = 'superman'::text)
>>                 Filter: (status = 'Y'::bpchar)
>>   ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
>> time=453.420..453.464 rows=10 loops=1)
>>         Sort Key: testimonials.modify_date
>>         ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
>> width=187) (actual time=89.384..395.008 rows=10149 loops=1)
>>               ->  Index Scan using new_idx_userknown on testimonials
>> (cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
>> rows=10149 loops=1)
>>                     Index Cond: ((user_id)::text = 'superman'::text)
>>                     Filter: (status = 'Y'::bpchar)
>>               ->  Index Scan using visitcount_pkey1 on visitcount
>> (cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
>> loops=10149)
>
> Have you analyzed these tables?  The estimates and real row counts are
> quite different.
>



Hi Scott. Yes, there is an autovacuum on both the tables. Should I
additionally do a manual vacuum too?

Re: Slow first query despite LIMIT and OFFSET clause

From: Scott Marlowe

On Sun, Jan 25, 2009 at 11:58 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi Scott. Yes, there is an autovacuum on both the tables. Should I
> additionally do a manual vacuum too?

Nah, just an analyze.  What version of pgsql is this, btw?
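
Something like this is all it takes (a minimal sketch):

    -- refresh the row-count statistics the planner bases its estimates on
    ANALYZE testimonials;
    ANALYZE visitcount;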

Re: Slow first query despite LIMIT and OFFSET clause

From: Phoenix Kiula

On Mon, Jan 26, 2009 at 3:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Jan 25, 2009 at 11:58 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Hi Scott. Yes, there is an autovacuum on both the tables. Should I
>> additionally do a manual vacuum too?
>
> Nah, just an analyze.  What version of pgsql is this, btw?
>


Actually both vacuum and analyze happen at regular intervals. Are they
not doing their job?

PG 8.2.9

Re: Slow first query despite LIMIT and OFFSET clause

From: Scott Marlowe

On Mon, Jan 26, 2009 at 12:06 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On Mon, Jan 26, 2009 at 3:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Sun, Jan 25, 2009 at 11:58 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>>> Hi Scott. Yes, there is an autovacuum on both the tables. Should I
>>> additionally do a manual vacuum too?
>>
>> Nah, just an analyze.  What version of pgsql is this, btw?
>>
>
>
> Actually both vacuum and analyze happen at regular intervals. Are they
> not doing their job?

Hard to say.  You could try increasing your stats target on the fields
where the approximation is way off.
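
For example (a hedged sketch; 1000 is the maximum target in 8.2 and
just an illustrative choice):

    -- sample the badly estimated columns more thoroughly
    ALTER TABLE testimonials ALTER COLUMN user_id SET STATISTICS 1000;
    ALTER TABLE testimonials ALTER COLUMN status SET STATISTICS 1000;
    -- re-analyze so the new targets take effect
    ANALYZE testimonials;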

Re: Slow first query despite LIMIT and OFFSET clause

From: Alban Hertroys

On Jan 26, 2009, at 4:41 AM, Phoenix Kiula wrote:

> Appreciate any thoughts.
>
> My query is:
>
>
> explain analyze SELECT
>            testimonials.url
>            ,testimonials.alias
>            ,testimonials.aliasEntered
>            ,testimonials.title
>            ,testimonials.modify_date
>            ,testimonials.id
>            ,visitcount.visit_count
>            ,visitcount.unique_count
>            ,visitcount.modify_date
>            ,coalesce(  extract(epoch from now()) -  extract(epoch
> from visitcount.modify_date), 0)
>            ,(select count(id) from testimonials WHERE
> testimonials.user_id = 'superman' and testimonials.user_known = 1 and
> testimonials.status = 'Y' ) AS total
>    FROM testimonials
>    LEFT JOIN visitcount ON testimonials.id = visitcount.id
>    WHERE
>             testimonials.user_id = 'superman'
>         and testimonials.user_known = 1
>         and testimonials.status = 'Y'
>    ORDER BY testimonials.modify_date desc
>    OFFSET 0 LIMIT 10
> ;
>
>
>
>
>                                                     QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=224.68..224.71 rows=10 width=187) (actual
> time=453.429..453.539 rows=10 loops=1)
>   InitPlan
>     ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
> time=89.268..89.271 rows=1 loops=1)
>           ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
> rows=10149 loops=1)
>                 Index Cond: ((user_id)::text = 'superman'::text)
>                 Filter: (status = 'Y'::bpchar)
>   ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
> time=453.420..453.464 rows=10 loops=1)
>         Sort Key: testimonials.modify_date
>         ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
> width=187) (actual time=89.384..395.008 rows=10149 loops=1)
>               ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
> rows=10149 loops=1)
>                     Index Cond: ((user_id)::text = 'superman'::text)
>                     Filter: (status = 'Y'::bpchar)
>               ->  Index Scan using visitcount_pkey1 on visitcount
> (cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
> loops=10149)
>                     Index Cond: (testimonials.id = visitcount.id)
> Total runtime: 461.682 ms
> (15 rows)


Does that query plan look any better without the select count(id) from
testimonials?

If so, you may be better off keeping track of those counts in a
separate table updated by triggers on the testimonials table. Whether
that really helps depends on how variable the selectors you use to
determine those counts are. If those counts are generally very low,
the benefit will probably be minimal.
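
For illustration, here is a minimal sketch of such a counter table and
its maintaining trigger. All object names here are hypothetical, and
UPDATEs that change user_known or status would need handling too:

    -- per-user count of live testimonials, kept current by a trigger
    CREATE TABLE user_testimonial_counts (
        user_id  varchar PRIMARY KEY,
        total    bigint NOT NULL DEFAULT 0
    );

    CREATE OR REPLACE FUNCTION maintain_testimonial_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' AND NEW.user_known = 1 AND NEW.status = 'Y' THEN
            UPDATE user_testimonial_counts
               SET total = total + 1
             WHERE user_id = NEW.user_id;
            IF NOT FOUND THEN  -- first testimonial for this user
                INSERT INTO user_testimonial_counts (user_id, total)
                VALUES (NEW.user_id, 1);
            END IF;
        ELSIF TG_OP = 'DELETE' AND OLD.user_known = 1 AND OLD.status = 'Y' THEN
            UPDATE user_testimonial_counts
               SET total = total - 1
             WHERE user_id = OLD.user_id;
        END IF;
        RETURN NULL;  -- the return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER testimonials_count_trg
        AFTER INSERT OR DELETE ON testimonials
        FOR EACH ROW EXECUTE PROCEDURE maintain_testimonial_count();

The query could then read the count with a cheap primary-key lookup
instead of scanning ten thousand-odd rows.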

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: Slow first query despite LIMIT and OFFSET clause

From: Phoenix Kiula

On Wed, Jan 28, 2009 at 2:37 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
>
> Does that query plan look any better without the select count(id) from
> testimonials?
>
> If so you may be better off keeping track of those counts in a separate
> table updated by triggers on the testimonials table. Whether that really
> helps depends on how variable your selectors are to determine those counts.
> If those counts are generally very low the benefit will probably be minimal.
>



Thanks Alban. We have now made all the triggers and such. That part is
working. I suppose dropping the count(id) only saves a few seconds;
the query is still taking about 15 seconds in some cases.

Here are the query and its exec plan again fyi. Any other ideas for tweaking?




explain analyze SELECT
           testimonials.url
           ,testimonials.alias
           ,testimonials.aliasEntered
           ,testimonials.title
           ,testimonials.modify_date
           ,testimonials.id
           ,visitcount.visit_count
           ,visitcount.unique_count
           ,visitcount.modify_date
           ,coalesce(  extract(epoch from now()) -  extract(epoch
from visitcount.modify_date), 0)
           ,(select count(id) from testimonials WHERE
testimonials.user_id = 'superman' and testimonials.user_known = 1 and
testimonials.status = 'Y' ) AS total
   FROM testimonials
   LEFT JOIN visitcount ON testimonials.id = visitcount.id
   WHERE
            testimonials.user_id = 'superman'
        and testimonials.user_known = 1
        and testimonials.status = 'Y'
   ORDER BY testimonials.modify_date desc
   OFFSET 0 LIMIT 10
;




                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=224.68..224.71 rows=10 width=187) (actual
time=453.429..453.539 rows=10 loops=1)
  InitPlan
    ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
time=89.268..89.271 rows=1 loops=1)
          ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
rows=10149 loops=1)
                Index Cond: ((user_id)::text = 'superman'::text)
                Filter: (status = 'Y'::bpchar)
  ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
time=453.420..453.464 rows=10 loops=1)
        Sort Key: testimonials.modify_date
        ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
width=187) (actual time=89.384..395.008 rows=10149 loops=1)
              ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
rows=10149 loops=1)
                    Index Cond: ((user_id)::text = 'superman'::text)
                    Filter: (status = 'Y'::bpchar)
              ->  Index Scan using visitcount_pkey1 on visitcount
(cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
loops=10149)
                    Index Cond: (testimonials.id = visitcount.id)
 Total runtime: 461.682 ms
(15 rows)

Re: Slow first query despite LIMIT and OFFSET clause

From: Alban Hertroys

On Jan 28, 2009, at 1:38 PM, Phoenix Kiula wrote:

> Thanks Alban. We have now made all the triggers and such. That part is
> working. I suppose not having the count(id) is helping just with a few
> seconds, but the query is still taking about 15 seconds in some cases.
>
> Here are the query and its exec plan again fyi. Any other ideas for
> tweaking?

Ah I see, that's the original query and its plan again, not the one
after implementing those triggers! You had me scratching my head for a
bit there, wondering why the count() subquery was still there.

A few things in this query appear to take a relatively long time:

- The index scans on new_idx_userknown. What's worrying there is that
the planner expects only a few rows (42) while there are actually
quite a lot (10149). This scan is performed twice, too! It seems that
the statistics for that index are off. That may mean changing the
statistics targets on the columns involved or increasing the frequency
with which autovacuum visits them.

- The nested loop left join is expensive; that's probably also due to
the incorrect assumptions the planner makes about the index scans I
mentioned above. It expects to have to loop 42 times, but ends up
doing so 10149 times instead! I believe loops aren't particularly
efficient; they'll only beat other methods if there are few rows to
loop through.
The loop is taking 395 - 89 = 306 ms for 10149 rows, while the planner
expected it to take 306 * (42/10149) = 1.3 ms. Quite a difference!

You probably need to do something about new_idx_userknown. A partial
index (as suggested elsewhere) may help make it smaller (easier to fit
in RAM, fewer branches required to find a node), but the bad
statistics are likely to be the real problem here. Without knowing
anything about that particular index and the tables it's indexing,
it's hard to tell how to improve it.
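
One more thing worth testing, though this is an assumption on my part
since I only know the parts of your schema shown here: a partial index
that also contains the ORDER BY column. The planner can scan a btree
on (user_id, modify_date) backwards, so it could read the ten newest
matching rows straight off the index instead of sorting all 10149 of
them:

    -- hypothetical composite partial index; the name is illustrative
    CREATE INDEX idx_userknown_moddate
        ON testimonials (user_id, modify_date)
        WHERE user_known = 1 AND status = 'Y';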

> explain analyze SELECT
>           testimonials.url
>           ,testimonials.alias
>           ,testimonials.aliasEntered
>           ,testimonials.title
>           ,testimonials.modify_date
>           ,testimonials.id
>           ,visitcount.visit_count
>           ,visitcount.unique_count
>           ,visitcount.modify_date
>           ,coalesce(  extract(epoch from now()) -  extract(epoch
> from visitcount.modify_date), 0)
>           ,(select count(id) from testimonials WHERE
> testimonials.user_id = 'superman' and testimonials.user_known = 1 and
> testimonials.status = 'Y' ) AS total
>   FROM testimonials
>   LEFT JOIN visitcount ON testimonials.id = visitcount.id
>   WHERE
>            testimonials.user_id = 'superman'
>        and testimonials.user_known = 1
>        and testimonials.status = 'Y'
>   ORDER BY testimonials.modify_date desc
>   OFFSET 0 LIMIT 10
> ;
>
>
>
>
>                                                    QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=224.68..224.71 rows=10 width=187) (actual
> time=453.429..453.539 rows=10 loops=1)
>  InitPlan
>    ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual
> time=89.268..89.271 rows=1 loops=1)
>          ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968
> rows=10149 loops=1)
>                Index Cond: ((user_id)::text = 'superman'::text)
>                Filter: (status = 'Y'::bpchar)
>  ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual
> time=453.420..453.464 rows=10 loops=1)
>        Sort Key: testimonials.modify_date
>        ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42
> width=187) (actual time=89.384..395.008 rows=10149 loops=1)
>              ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990
> rows=10149 loops=1)
>                    Index Cond: ((user_id)::text = 'superman'::text)
>                    Filter: (status = 'Y'::bpchar)
>              ->  Index Scan using visitcount_pkey1 on visitcount
> (cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1
> loops=10149)
>                    Index Cond: (testimonials.id = visitcount.id)
> Total runtime: 461.682 ms
> (15 rows)
>

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: Slow first query despite LIMIT and OFFSET clause

From: Phoenix Kiula

On Thu, Jan 29, 2009 at 2:25 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:


> Ah I see, that's the original query and its plan again, not the one after
> implementing those triggers! You had me scratching my head for a bit there,
> wondering why the count() subquery was still there.



Yes, that was just for info. Here is the new query without the count()
in there:


explain analyze SELECT
          testimonials.url
          ,testimonials.alias
          ,testimonials.aliasEntered
          ,testimonials.title
          ,testimonials.modify_date
          ,testimonials.id
          ,visitcount.visit_count
          ,visitcount.unique_count
          ,visitcount.modify_date
          ,coalesce(  extract(epoch from now()) -  extract(epoch from
visitcount.modify_date), 0)
  FROM testimonials
  LEFT OUTER JOIN visitcount USING (id)
  WHERE
               testimonials.user_id = 'superman'
       and testimonials.user_known = 1
       and testimonials.status = 'Y'
  ORDER BY testimonials.modify_date desc
  OFFSET 0 LIMIT 10
;




QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=61.42..61.44 rows=10 width=162) (actual
time=105.400..105.499 rows=10 loops=1)
   ->  Sort  (cost=61.42..61.46 rows=16 width=162) (actual
time=105.392..105.425 rows=10 loops=1)
         Sort Key: testimonials.modify_date
         ->  Nested Loop Left Join  (cost=0.00..61.10 rows=16
width=162) (actual time=0.092..94.516 rows=2027 loops=1)
               ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..24.29 rows=16 width=146) (actual time=0.058..10.983
rows=2027 loops=1)
                     Index Cond: ((user_id)::text = 'superman'::text)
                     Filter: (status = 'Y'::bpchar)
               ->  Index Scan using visitcount_pkey1 on visitcount
(cost=0.00..2.28 rows=1 width=24) (actual time=0.024..0.026 rows=1
loops=2027)
                     Index Cond: (testimonials.id = visitcount.id)
 Total runtime: 105.652 ms
(10 rows)




Note that I have an index on user_id, but because this is a website,
there are several user_ids for which we only have the IP. The above
query is only ever needed for registered users, so just for the
registered users we created another partial index:
     "new_idx_userknown" btree (user_id) WHERE user_known = 1

Of course for unregistered users we use user_known = 0, so they are
excluded from this index. Is this not a useful partial index? I think
in this SQL the user_id is always "superman" and the user_known is
always 1, which may be why the guesstimate from the planner is off?

Love to hear thoughts.

THANKS!

Re: Slow first query despite LIMIT and OFFSET clause

From: Alban Hertroys

On Jan 29, 2009, at 1:35 AM, Phoenix Kiula wrote:

> On Thu, Jan 29, 2009 at 2:25 AM, Alban Hertroys
> <dalroi@solfertje.student.utwente.nl> wrote:
>
>> Ah I see, that's the original query and its plan again, not the one
>> after
>> implementing those triggers! You had me scratching my head for a
>> bit there,
>> wondering why the count() subquery was still there.
>
> Yes, that was just for info. Here is the new query without the count()
> in there:
>
>
> explain analyze SELECT
>          testimonials.url
>          ,testimonials.alias
>          ,testimonials.aliasEntered
>          ,testimonials.title
>          ,testimonials.modify_date
>          ,testimonials.id
>          ,visitcount.visit_count
>          ,visitcount.unique_count
>          ,visitcount.modify_date
>          ,coalesce(  extract(epoch from now()) -  extract(epoch from
> visitcount.modify_date), 0)
>  FROM testimonials
>  LEFT OUTER JOIN visitcount USING (id)
>  WHERE
>               testimonials.user_id = 'superman'
>       and testimonials.user_known = 1
>       and testimonials.status = 'Y'
>  ORDER BY testimonials.modify_date desc
>  OFFSET 0 LIMIT 10
> ;
>
>
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=61.42..61.44 rows=10 width=162) (actual
> time=105.400..105.499 rows=10 loops=1)
>   ->  Sort  (cost=61.42..61.46 rows=16 width=162) (actual
> time=105.392..105.425 rows=10 loops=1)
>         Sort Key: testimonials.modify_date
>         ->  Nested Loop Left Join  (cost=0.00..61.10 rows=16
> width=162) (actual time=0.092..94.516 rows=2027 loops=1)
>               ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..24.29 rows=16 width=146) (actual time=0.058..10.983
> rows=2027 loops=1)
>                     Index Cond: ((user_id)::text = 'superman'::text)
>                     Filter: (status = 'Y'::bpchar)
>               ->  Index Scan using visitcount_pkey1 on visitcount
> (cost=0.00..2.28 rows=1 width=24) (actual time=0.024..0.026 rows=1
> loops=2027)
>                     Index Cond: (testimonials.id = visitcount.id)
> Total runtime: 105.652 ms
> (10 rows)
>
> Note that I have an index on user_id, but because this is a website,
> there are several user_ids where we only have the IP. The above query
> is only ever needed for registered users, so for just the registered
> users we created another partial index called
>
>     "new_idx_userknown" btree (user_id) WHERE user_known = 1
>
> Of course for unregistered users we use user_known = 0, so they are
> excluded from this index. Is this not a useful partial index? I think
> in this SQL, the user_id is always "superman" and the user_known
> always 1 which is why the guesstimate from the planner may be off?
>
> Love to hear thoughts.


Well, that seems to have gotten rid of the somewhat expensive index
scans on new_idx_userknown for you as well (the duplicate entry for
the scan being due to the subquery, of course).

What's remaining is the left join.
If I understand correctly, you have a PK on visitcount.id and that
table only contains records for people who have a visitcount > 0? That
table gets updated a lot, I'd think? The query plan still shows a bad
estimate on that join; it has improved, but not enough.

Does the plan look better right after you ANALYSE visitcount? I'm
suspecting you either need to autovacuum visitcount more frequently or
you need to increase the statistics size on visitcount.id. You're
updating that table a lot, I think, which creates one new dead row for
every update. Letting vacuum mark the dead ones as reusable more
frequently should also help keep that table and its indexes cleaner,
although the records the indexes are pointing to will be all over the
place.
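
As a rough sketch of that maintenance (the statistics value is an
arbitrary example):

    -- reclaim dead rows and refresh statistics in one pass
    VACUUM ANALYZE visitcount;
    -- and/or sample the join column more thoroughly
    ALTER TABLE visitcount ALTER COLUMN id SET STATISTICS 500;
    ANALYZE visitcount;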



I'm wondering... In highly updated tables it's probably more efficient
to leave the dead rows alone (just marking them dead) and only append
the updated ones at the end of the table? The dead rows will
accumulate at the start of the table while the new ones go to the end.
After a while a large section of the start of the table could just be
removed, as it'd only contain dead rows... This may already be in
place, of course; I don't have time now to look into the design
specifics, and it seems kind of an obvious thing to do!

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: Slow first query despite LIMIT and OFFSET clause

From: Gregory Stark

Phoenix Kiula <phoenix.kiula@gmail.com> writes:

>                      Index Cond: ((user_id)::text = 'superman'::text)
>                      Filter: (status = 'Y'::bpchar)

> Of course for unregistered users we use user_known = 0, so they are
> excluded from this index. Is this not a useful partial index? I think
> in this SQL, the user_id is always "superman" and the user_known
> always 1 which is why the guesstimate from the planner may be off?

Well, the histograms are kept for each column separately, so the
planner will take the selectivity estimates for user_id = 'superman'
and status = 'Y' and multiply them.

If the "status" of 'superman' records are very different from the status
records as a whole then this will give poor results.

If that's not the case then raising the statistics target for those two
columns might help. And of course if the table hasn't been analyzed recently
then analyzing it more often is always good.

There isn't really a good solution for cross-column stats. You could perhaps
create a functional index (could still be partial too) on an expression like

  CASE WHEN status = 'Y' THEN user_id END

That will make Postgres build stats for the result of that expression
specifically. Then, if you use that expression exactly as-is in the
query, the planner should use those statistics. I think. I haven't
tried this... Tell us how it goes :)
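
To make that concrete, here's a hedged sketch of the idea (untested,
as said; the index name is made up):

    -- partial expression index; ANALYZE then gathers statistics
    -- for the CASE expression itself
    CREATE INDEX idx_status_userid ON testimonials
        ((CASE WHEN status = 'Y' THEN user_id END))
        WHERE user_known = 1;

    -- the query must repeat the expression verbatim for the planner
    -- to match it against those statistics
    SELECT id
      FROM testimonials
     WHERE (CASE WHEN status = 'Y' THEN user_id END) = 'superman'
       AND user_known = 1;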

I wonder if we should look at building "partial" histograms for the columns in
partial indexes effectively equivalent to this... hm...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!