Re: The usual sequential scan, but with LIMIT ! - Mailing list pgsql-performance

From Pierre-Frédéric Caillaud
Subject Re: The usual sequential scan, but with LIMIT !
Date
Msg-id opsdx2os1ncq72hf@musicbox
Whole thread Raw
In response to Re: The usual sequential scan, but with LIMIT !  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: The usual sequential scan, but with LIMIT !
List pgsql-performance
    OK, thanks a lot for your explanations. Knowing how the planner "thinks",
makes it pretty logical. Thank you.

    Now another question...

    I have a table of records representing forum posts with a primary key
(id), a topic_id, a timestamp, and other fields which I won't detail. I
want to split them into pages (like forums usually do), with N posts per
page. In that case :

    SELECT * FROM table WHERE topic_id=... ORDER BY post_timestamp asc LIMIT
N OFFSET N*page;

    Also it's almost the same to order by id rather than post_timestamp (id
being a serial).

    SELECT * FROM table WHERE topic_id=... ORDER BY id asc LIMIT N OFFSET
N*page;

    This query runs slower and slower as the OFFSET grows, which is a problem
because the most accessed page in a forum is the last one.

    So, for the last page, I tried :
    SELECT * FROM table WHERE topic_id=... ORDER BY id desc LIMIT N;
    But this does not use the index at all (seq scan + sort + limit).

    My solution is simple : build an index on (-id), or on (some
date)-post_timestamp, then :
    SELECT * FROM table WHERE topic_id=... ORDER BY (-id) desc LIMIT N;

    Then the last page is the fastest one, but it always shows N posts.
That's not a problem, so I guess I'll use that. I don't like forums which
show 1 post on the last page because the number of posts modulo N is 1.
    I may store the number of posts in a forum (updated by a trigger) to
avoid costly COUNT queries to count the pages, so I could use ORDER BY id
for the first half of the pages, and ORDER BY (-id) for the rest, so it
will always be fastest.

    I could even create a pages table to store the id of the first post on
that page and then :
    SELECT * FROM table WHERE topic_id=... AND id>id_of_first_post_in_page
ORDER BY id asc LIMIT N;
    then all pages would be aqually fast.

    Or, I could cache the query results for all pages but the last one.

    Finally, the question : having a multiple field btree, it is not harder
to scan it in "desc order" than in "asc order". So why does not Postgres
do it ? Here is a btree example :

    topic_id    id
    1        1
    1        10
    2        2
    2        5
    2        17
    3        4
    3        6

    suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id ASC.
    Postgres simply finds the first row with topic_id=2 and goes from there.

    suppose I SELECT WHERE topic_id=2 ORDER BY topic_id ASC,id DESC.
    Postgres does a seq scan, but it could think a bit more and start at
"first index node which has topic_id>2" (simple to find in a btree) then
go backwards in the index. This can ge beneralized to any combination of
(asc,desc).

    I made some more experiments, and saw Postgres does an 'Index Scan' when
ORDER BY clauses are all ASC, and an 'Index Scan Backwards' when all ORDER
BY are DESC. However, it does not handle a combination of ASC and DESC?

    What do you think of this ?


On Mon, 06 Sep 2004 12:40:41 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> =?iso-8859-15?Q?Pierre-Fr=E9d=E9ric_Caillaud?=
> <lists@boutiquenumerique.com> writes:
>> Now, if I LIMIT the query to 10 rows, the index should be used all the
>> time, because it will always return few rows... well, it doesn't !
>
> Not at all.  From the planner's point of view, the LIMIT is going to
> reduce the cost by about a factor of 10/1403, since the underlying plan
> step will only be run partway through.  That's not going to change the
> decision about which underlying plan step is cheapest: 10/1403 of a
> cheaper plan is still always less than 10/1403 of a more expensive plan.
>
> Later, you note that LIMIT with ORDER BY does affect the plan choice
> --- that's because in that situation one plan alternative has a much
> higher startup cost than the other (namely the cost of a sort step).
> A small LIMIT can allow the fast-startup plan to be chosen even though
> it would be estimated to be the loser if run to completion.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: The usual sequential scan, but with LIMIT !
Next
From: G u i d o B a r o s i o
Date:
Subject: TOAST tables, cannot truncate