Thread: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
tgl@postgresql.org (Tom Lane)
Date:
Log Message:
-----------
Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned.  We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input.  For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it.  Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries.  Greg Stark, with some editorialization by moi.

Modified Files:
--------------
    pgsql/src/backend/executor:
        nodeLimit.c (r1.29 -> r1.30)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeLimit.c.diff?r1=1.29&r2=1.30)
        nodeSort.c (r1.60 -> r1.61)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeSort.c.diff?r1=1.60&r2=1.61)
    pgsql/src/backend/optimizer/path:
        costsize.c (r1.181 -> r1.182)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c.diff?r1=1.181&r2=1.182)
    pgsql/src/backend/optimizer/plan:
        createplan.c (r1.229 -> r1.230)

(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/createplan.c.diff?r1=1.229&r2=1.230)
        planmain.c (r1.100 -> r1.101)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planmain.c.diff?r1=1.100&r2=1.101)
        planner.c (r1.218 -> r1.219)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planner.c.diff?r1=1.218&r2=1.219)
    pgsql/src/backend/optimizer/util:
        pathnode.c (r1.139 -> r1.140)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/util/pathnode.c.diff?r1=1.139&r2=1.140)
    pgsql/src/backend/utils/misc:
        guc.c (r1.389 -> r1.390)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc.c.diff?r1=1.389&r2=1.390)
    pgsql/src/backend/utils/sort:
        tuplesort.c (r1.74 -> r1.75)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/sort/tuplesort.c.diff?r1=1.74&r2=1.75)
    pgsql/src/include/nodes:
        execnodes.h (r1.172 -> r1.173)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/execnodes.h.diff?r1=1.172&r2=1.173)
    pgsql/src/include/optimizer:
        cost.h (r1.85 -> r1.86)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/cost.h.diff?r1=1.85&r2=1.86)
        planmain.h (r1.100 -> r1.101)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/planmain.h.diff?r1=1.100&r2=1.101)
    pgsql/src/include/utils:
        tuplesort.h (r1.25 -> r1.26)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/tuplesort.h.diff?r1=1.25&r2=1.26)

Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Magnus Hagander
Date:
Is there some way to see in the generated query plan if this optimisation
is used?

//Magnus


On Thu, May 03, 2007 at 10:13:45PM -0300, Tom Lane wrote:
> Log Message:
> -----------
> Teach tuplesort.c about "top N" sorting, in which only the first N tuples
> need be returned.  We keep a heap of the current best N tuples and sift-up
> new tuples into it as we scan the input.  For M input tuples this means
> only about M*log(N) comparisons instead of M*log(M), not to mention a lot
> less workspace when N is small --- avoiding spill-to-disk for large M
> is actually the most attractive thing about it.  Patch includes planner
> and executor support for invoking this facility in ORDER BY ... LIMIT
> queries.  Greg Stark, with some editorialization by moi.
>
> Modified Files:
> --------------
>     pgsql/src/backend/executor:
>         nodeLimit.c (r1.29 -> r1.30)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeLimit.c.diff?r1=1.29&r2=1.30)
>         nodeSort.c (r1.60 -> r1.61)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeSort.c.diff?r1=1.60&r2=1.61)
>     pgsql/src/backend/optimizer/path:
>         costsize.c (r1.181 -> r1.182)
>
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c.diff?r1=1.181&r2=1.182)
>     pgsql/src/backend/optimizer/plan:
>         createplan.c (r1.229 -> r1.230)
>
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/createplan.c.diff?r1=1.229&r2=1.230)
>         planmain.c (r1.100 -> r1.101)
>
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planmain.c.diff?r1=1.100&r2=1.101)
>         planner.c (r1.218 -> r1.219)
>
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planner.c.diff?r1=1.218&r2=1.219)
>     pgsql/src/backend/optimizer/util:
>         pathnode.c (r1.139 -> r1.140)
>
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/util/pathnode.c.diff?r1=1.139&r2=1.140)
>     pgsql/src/backend/utils/misc:
>         guc.c (r1.389 -> r1.390)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc.c.diff?r1=1.389&r2=1.390)
>     pgsql/src/backend/utils/sort:
>         tuplesort.c (r1.74 -> r1.75)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/sort/tuplesort.c.diff?r1=1.74&r2=1.75)
>     pgsql/src/include/nodes:
>         execnodes.h (r1.172 -> r1.173)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/execnodes.h.diff?r1=1.172&r2=1.173)
>     pgsql/src/include/optimizer:
>         cost.h (r1.85 -> r1.86)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/cost.h.diff?r1=1.85&r2=1.86)
>         planmain.h (r1.100 -> r1.101)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/planmain.h.diff?r1=1.100&r2=1.101)
>     pgsql/src/include/utils:
>         tuplesort.h (r1.25 -> r1.26)
>         (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/tuplesort.h.diff?r1=1.25&r2=1.26)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Magnus Hagander <magnus@hagander.net> writes:
> Is there some way to see in the generated query plan if this optimisation
> is used?

If there's a SORT just below a LIMIT (that has a limit, ie it's not just
an OFFSET), then it's potentially used.  Whether it's actually used
depends on actual row counts and widths at runtime --- you'd have to
turn on trace_sort and look at the log output to determine that.

Also, if you want to experiment, you can compile with -DDEBUG_BOUNDED_SORT
to have a GUC variable optimize_bounded_sort that disables the new code
for comparison purposes.

            regards, tom lane

Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Magnus Hagander
Date:
On Fri, May 04, 2007 at 10:04:08AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Is there some way to see in the generated query plan if this optimisation
> > is used?
>
> If there's a SORT just below a LIMIT (that has a limit, ie it's not just
> an OFFSET), then it's potentially used.  Whether it's actually used
> depends on actual row counts and widths at runtime --- you'd have to
> turn on trace_sort and look at the log output to determine that.

Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
to see at runtime (for example as a hint that if you put in a bit more
work_mem it might get used)

//Magnus


Magnus Hagander <magnus@hagander.net> writes:
> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> to see at runtime (for example as a hint that if you put in a bit more
> work_mem it might get used)

I don't see that this is any more interesting than whether the sort
spilled to disk or not; which is something we don't show in EXPLAIN
ANALYZE either.  trace_sort is the agreed API for examining that.
It's not exactly easy to do, because (a) none of this information
is exposed outside tuplesort.c, and (b) the tuplesortstate object
is probably gone by the time EXPLAIN ANALYZE runs, anyway.

            regards, tom lane

Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Magnus Hagander <magnus@hagander.net> writes:
>> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
>> to see at runtime (for example as a hint that if you put in a bit more
>> work_mem it might get used)
>
> I don't see that this is any more interesting than whether the sort
> spilled to disk or not; which is something we don't show in EXPLAIN
> ANALYZE either.  trace_sort is the agreed API for examining that.
> It's not exactly easy to do, because (a) none of this information
> is exposed outside tuplesort.c, and (b) the tuplesortstate object
> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

It would be positively wonderful to see whether the sort spilled to disk in
the explain analyze. Could we make putting more feedback about sorts in
EXPLAIN ANALYZE output a TODO?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Magnus Hagander
Date:
On Fri, May 04, 2007 at 12:38:18PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> > to see at runtime (for example as a hint that if you put in a bit more
> > work_mem it might get used)
>
> I don't see that this is any more interesting than whether the sort
> spilled to disk or not; which is something we don't show in EXPLAIN
> ANALYZE either.  trace_sort is the agreed API for examining that.

Now that you mention it, that'd be nice to have as well - the point being
making it available without recompile.

> It's not exactly easy to do, because (a) none of this information
> is exposed outside tuplesort.c, and (b) the tuplesortstate object
> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

Hmm. Ok. Don't know enough about those parts of the code to comment on
that, but I'll certainly take your word for it :-)

//Magnus


Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Magnus Hagander <magnus@hagander.net> writes:
> >> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> >> to see at runtime (for example as a hint that if you put in a bit more
> >> work_mem it might get used)
> >
> > I don't see that this is any more interesting than whether the sort
> > spilled to disk or not; which is something we don't show in EXPLAIN
> > ANALYZE either.  trace_sort is the agreed API for examining that.
> > It's not exactly easy to do, because (a) none of this information
> > is exposed outside tuplesort.c, and (b) the tuplesortstate object
> > is probably gone by the time EXPLAIN ANALYZE runs, anyway.
>
> It would be positively wonderful to see whether the sort spilled to disk in
> the explain analyze. Could we make putting more feedback about sorts in
> EXPLAIN ANALYZE output a TODO?
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Bruce Momjian <bruce@momjian.us> writes:
> This has been saved for the 8.4 release:
>     http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Too late ;-)

            regards, tom lane

Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > This has been saved for the 8.4 release:
> >     http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> Too late ;-)

Too late, already removed.  ;-)

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +