Re: why is the LIMIT clause slowing down this SELECT? - Mailing list pgsql-general

From Mason Hale
Subject Re: why is the LIMIT clause slowing down this SELECT?
Date
Msg-id 8bca3aa10708011942w57b2922an470b704f40b99d85@mail.gmail.com
Whole thread Raw
In response to Re: why is the LIMIT clause slowing down this SELECT?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: why is the LIMIT clause slowing down this SELECT?  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-general
> Let's call those plan 1 and plan 2.
>
> In plan 1, the planner thinks that it will find 25 tuples matching that
> topic_id quickly during the backwards index scan on
> topic_feed_score_index. Instead, it looks like it has to go through a
> lot of tuples before it finds the necessary 25.
>
> In plan 2, it does it backward: first it finds the tuples matching the
> topic_id, then it sorts those and returns the first 25.
>
> If I'm reading the plans correctly (plan 2, in particular), there are
> 2460 tuples matching the topic_id, but only 492 are visible. I don't
> know why there are so many invisible tuples that still exist in the
> index after your VACUUM.
>
> Are there a lot of UPDATEs/DELETEs on tuples with that topic_id between
> when you run the VACUUM and when you run the SELECT? Do you have big
> transactions open when running the VACUUM?

we have autovacuum running, but do update this table in particular
frequently. Every row in this table probably gets updated at least
once per day.

>
> Also, how many tuples are in the table overall?
>

147,207

> The stats look fairly accurate: it's only off on the estimate of
> matching rows by about a factor of two (which isn't terribly bad), 814
> estimated versus 492 that are actually visible and match the topic_id.
>
> If I had to guess, I'd say you either:
>
> (a) have too many invisible tuples - you can correct this with more
> frequent VACUUMing
> (b) the distribution of tuples with matching topic_id is not even, and
> many of the tuples with a matching topic_id happen to have a low score -
> this is harder to fix, because the stats collector can't detect it.
> You'd probably have to coerce it to use plan 2.

The score in this case are definitely not a normal distribution. They
follow a power law pattern, with a few with very high scores and a
long tail.

I ended up coercing it to use plan 2 by dropping the index on topic_feed(score).

Which raises another question -- if the planner has already used an
index on topic_id to select the rows, would it ever us another index
on score to order the rows? Or is a compound topic_feed(topic_id,
score) index the way to go there?

thanks,
Mason

pgsql-general by date:

Previous
From: "Josh Tolley"
Date:
Subject: What do people like to monitor (or in other words, what might be nice in pgsnmpd)?
Next
From: "Andrej Ricnik-Bay"
Date:
Subject: Re: Linux distro