Re: Sorting. When? - Mailing list pgsql-hackers

From Nicolas Barbier
Subject Re: Sorting. When?
Date
Msg-id AANLkTimd-aUg-Zhu5kcxBRdSn7vi5KOaYOtMOmZwRd4p@mail.gmail.com
Whole thread Raw
In response to Re: Sorting. When?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Sorting. When?  ("mac_man2008@yahoo.it" <mac_man2008@yahoo.it>)
List pgsql-hackers
2011/2/11 Kevin Grittner <Kevin.Grittner@wicourts.gov>:

> "mac_man2008@yahoo.it" <mac_man2008@yahoo.it> wrote:
>
>> I need to know, from an algorithmic point of view, in which cases
>> sorting is invoked.

[..]

> Are your really looking to categorize the types of queries where
> sorting is *invoked*, or the ones where it is *considered*?  Or
> perhaps only those where it is *required*, since there are no
> possible plans without sorting?

Or, if you are seeking the exact rules that are used by the planner to
determine all possible plans from which the one with minimum cost is
chosen (and hence all ways in which sorts can be used), I think that
the source code is the only complete reference. A non-complete
introduction is:

<URL:http://www.postgresql.org/docs/9.0/static/planner-optimizer.html>

Basically, physical operators (seq scan, index scan, hash join, merge
join, nested loop, filter, set operation, etc) may require their input
to satisfy certain sort constraints (for example, both inputs of a
merge join need to be sorted on the join attribute(s)). If it happens
to be of lowest cost to explicitly sort the inputs right before
consuming them, that will be done. If there is a way to get the same
input in an already-ordered way (for example an index scan, or the
output of a merge join), so that the cost is less than the non-ordered
way + an explicit sort, then that already-ordered way will be chosen.

Super-basic example:

SELECT * FROM t ORDER BY a;

This may either perform a seq scan of table t and then do an explicit
sort, or perform a full index scan of an index on attribute a
(provided that such an index exists), in which case the explicit sort
is not needed because the index scan will deliver the rows in
already-sorted order. Which option is chosen depends on the cost: The
costs of both plans are calculated and the least costly plan is
chosen. See the (non-exhaustive) list of things that influence the
costs provided by Kevin to get a feeling for how many variables there
are that influence this choice.

Nicolas


pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Range Types: empty ranges
Next
From: Robert Haas
Date:
Subject: Re: Debian readline/libedit breakage