Thread: Patch for circular buffer in tuplestore to optimize merge joins (v1)

This patch implements a circular buffer in tuplestore which drops old tuples
as they're no longer needed. It uses this for merge joins to avoid having to
spill the tuplestore if no single value exceeds work_mem. It also is what's
needed for both recursive query support and OLAP window functions (hence why
it implements the more complex circular buffer rather than just moving the
single tuple up to the head of the buffer).

This was mostly already done by Simon, I just finished the logic in tuplesort.c.

This is actually not quite polished so I guess it's still a WIP but it's
certainly ready to be reviewed. All that remains is polishing. If there's
anything in there people object to now I would like to know.



--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Patch for circular buffer in tuplestore to optimizemerge joins (v1)

From
"Simon Riggs"
Date:
On Tue, 2007-03-27 at 13:32 +0100, stark wrote:

> This patch implements a circular buffer in tuplestore which drops old tuples
> as they're no longer needed. It uses this for merge joins to avoid having to
> spill the tuplestore if no single value exceeds work_mem. It also is what's
> needed for both recursive query support and OLAP window functions (hence why
> it implements the more complex circular buffer rather than just moving the
> single tuple up to the head of the buffer).
>
> This was mostly already done by Simon, I just finished the logic in tuplesort.c.

Cool.

This item was previously discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00096.php

(this changes tuplestore.c not tuplesort.c, though its clear in the
patch)

> This is actually not quite polished so I guess it's still a WIP but it's
> certainly ready to be reviewed. All that remains is polishing. If there's
> anything in there people object to now I would like to know.

I guess a performance test would be a great eyecatcher here, but it does
seem otherwise complete. Any eager merge join testers out there?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: Patch for circular buffer in tuplestore to optimize merge joins (v1)

From
Bruce Momjian
Date:
Via IM, author says it is ready.

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


stark wrote:
>
> This patch implements a circular buffer in tuplestore which drops old tuples
> as they're no longer needed. It uses this for merge joins to avoid having to
> spill the tuplestore if no single value exceeds work_mem. It also is what's
> needed for both recursive query support and OLAP window functions (hence why
> it implements the more complex circular buffer rather than just moving the
> single tuple up to the head of the buffer).
>
> This was mostly already done by Simon, I just finished the logic in tuplesort.c.
>
> This is actually not quite polished so I guess it's still a WIP but it's
> certainly ready to be reviewed. All that remains is polishing. If there's
> anything in there people object to now I would like to know.
>

[ Attachment, skipping... ]

>
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

stark <stark@enterprisedb.com> writes:
> This patch implements a circular buffer in tuplestore which drops old tuples
> as they're no longer needed. It uses this for merge joins to avoid having to
> spill the tuplestore if no single value exceeds work_mem. It also is what's
> needed for both recursive query support and OLAP window functions (hence why
> it implements the more complex circular buffer rather than just moving the
> single tuple up to the head of the buffer).

Applied with revisions; mostly, that I took out the "circular buffer"
logic because I had no confidence in it.  It's a lot simpler to move the
remaining tuples down to the head of the array during tuplestore_trim.
And at least for the existing use-case, it's demonstrably useless to
have all the extra logic: there'll only be one tuple to move each time,
so it's cheaper to just do that.  When and if we have callers with
different usage patterns, we can measure whether it's worth trying to
avoid the memmove().  I'm dubious that it will be a win even then,
since the only way to have a lot of stuff to move is if you don't
trim very often.

            regards, tom lane