Re: Slow first query despite LIMIT and OFFSET clause - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Slow first query despite LIMIT and OFFSET clause
Date
Msg-id BBB69B27-0815-4335-8A4D-2CA8D3F0C40C@solfertje.student.utwente.nl
In response to Re: Slow first query despite LIMIT and OFFSET clause  (Phoenix Kiula <phoenix.kiula@gmail.com>)
List pgsql-general
On Jan 29, 2009, at 1:35 AM, Phoenix Kiula wrote:

> On Thu, Jan 29, 2009 at 2:25 AM, Alban Hertroys
> <dalroi@solfertje.student.utwente.nl> wrote:
>
>> Ah I see, that's the original query and its plan again, not the one
>> after
>> implementing those triggers! You had me scratching my head for a
>> bit there,
>> wondering why the count() subquery was still there.
>
> Yes, that was just for info. Here is the new query without the count()
> in there:
>
>
> explain analyze SELECT
>          testimonials.url
>          ,testimonials.alias
>          ,testimonials.aliasEntered
>          ,testimonials.title
>          ,testimonials.modify_date
>          ,testimonials.id
>          ,visitcount.visit_count
>          ,visitcount.unique_count
>          ,visitcount.modify_date
>          ,coalesce(  extract(epoch from now()) -  extract(epoch from
> visitcount.modify_date), 0)
>  FROM testimonials
>  LEFT OUTER JOIN visitcount USING (id)
>  WHERE
>               testimonials.user_id = 'superman'
>       and testimonials.user_known = 1
>       and testimonials.status = 'Y'
>  ORDER BY testimonials.modify_date desc
>  OFFSET 0 LIMIT 10
> ;
>
>
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=61.42..61.44 rows=10 width=162) (actual
> time=105.400..105.499 rows=10 loops=1)
>   ->  Sort  (cost=61.42..61.46 rows=16 width=162) (actual
> time=105.392..105.425 rows=10 loops=1)
>         Sort Key: testimonials.modify_date
>         ->  Nested Loop Left Join  (cost=0.00..61.10 rows=16
> width=162) (actual time=0.092..94.516 rows=2027 loops=1)
>               ->  Index Scan using new_idx_userknown on testimonials
> (cost=0.00..24.29 rows=16 width=146) (actual time=0.058..10.983
> rows=2027 loops=1)
>                     Index Cond: ((user_id)::text = 'superman'::text)
>                     Filter: (status = 'Y'::bpchar)
>               ->  Index Scan using visitcount_pkey1 on visitcount
> (cost=0.00..2.28 rows=1 width=24) (actual time=0.024..0.026 rows=1
> loops=2027)
>                     Index Cond: (testimonials.id = visitcount.id)
> Total runtime: 105.652 ms
> (10 rows)
>
> Note that I have an index on user_id, but because this is a website,
> there are several user_ids where we only have the IP. The above query
> is only ever needed for registered users, so for just the registered
> users we created another partial index called
>
>     "new_idx_userknown" btree (user_id) WHERE user_known = 1
>
> Of course for unregistered users we use user_known = 0, so they are
> excluded from this index. Is this not a useful partial index? I think
> in this SQL, the user_id is always "superman" and the user_known
> always 1 which is why the guesstimate from the planner may be off?
>
> Love to hear thoughts.
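
For reference, a minimal sketch of that partial index and a query shape that can use it; the column types are not shown in the thread, so treat this as illustrative only:

    -- Index only the rows for registered users; rows with
    -- user_known = 0 (unregistered visitors) are left out entirely.
    CREATE INDEX new_idx_userknown
        ON testimonials (user_id)
        WHERE user_known = 1;

The planner will only consider this index when the query's WHERE clause implies user_known = 1, as the query above does.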


Well, that seems to have gotten rid of the somewhat expensive duplicate
index scan on new_idx_userknown as well (the duplicate entry for that
scan was, of course, due to the count() subquery).

What remains is the left join.
If I understand correctly, you have a primary key on visitcount.id and
that table only contains records with a visit count > 0? I'd expect
that table to get updated a lot. The query plan still shows a bad row
estimate on that join; it has improved, but not enough.
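
The misestimate is visible above: the nested loop (and the index scan feeding it) is estimated at 16 rows but actually returns 2027. A quick sketch for checking that estimate in isolation, reusing the predicates from the query above:

    -- Run just the outer scan; compare the planner's estimated row
    -- count against the actual rows in the resulting plan.
    EXPLAIN ANALYZE
    SELECT 1
      FROM testimonials
     WHERE user_id = 'superman'
       AND user_known = 1
       AND status = 'Y';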

Does the plan look better right after you ANALYSE visitcount? I
suspect you either need to autovacuum visitcount more frequently or
you need to increase the statistics target on visitcount.id. You're
updating that table a lot, I think, and every update creates a new
dead row. Letting vacuum mark the dead rows as reusable more
frequently should also help keep that table and its indexes cleaner,
although the records the indexes point to will be scattered all over
the place.
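
A minimal sketch of both suggestions; the statistics target of 1000 is an arbitrary example value, not a recommendation from the thread:

    -- Sample visitcount.id more thoroughly, then refresh the
    -- statistics so the planner sees the new sample:
    ALTER TABLE visitcount ALTER COLUMN id SET STATISTICS 1000;
    ANALYZE visitcount;

    -- Mark dead rows as reusable between autovacuum runs:
    VACUUM visitcount;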



I'm wondering... In heavily updated tables, wouldn't it be more
efficient to leave the dead rows alone (just marking them dead) and
only append the updated versions at the end of the table? The dead
rows would accumulate at the start of the table while the new ones go
to the end. After a while, a large section at the start of the table
could simply be removed, as it would only contain dead rows... This
may already be in place, of course; I don't have time right now to
look into the design specifics, and it seems like an obvious thing to
do!
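
As an aside, dead-row buildup between vacuum runs can be watched via the statistics collector (assuming PostgreSQL 8.3 or later, where these columns exist):

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'visitcount';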

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

