Re: Ordered Append Node - Mailing list pgsql-hackers

From Markus Schiltknecht
Subject Re: Ordered Append Node
Date
Msg-id 4745DA70.9060209@bluegap.ch
Whole thread Raw
In response to Ordered Append Node  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Ordered Append Node  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Hello Gregory,

Gregory Stark wrote:
> I've been hacking on the idea of an Append node which maintains the ordering
> of its subtables merging their records in order. This is important for
> partitioned tables since otherwise a lot of plans are not available such as
> merge joins.

Cool!

Some time ago, I've been trying something very similar, but didn't get 
very far. I'd like to share my thoughts anyway.

First of, I envisioned that append node to be applicable also to 
unordered reading from multiple partitions, simply interleaving between 
the partitions.

> The logic I've followed is to do as follows:
> 
> 1) Go through all subrels asking for any interesting pathkey lists. Gather up
>    the union of all of these.

I also tried to modify the Append node first, then figured that it might 
be better to base on the merge join node instead. While this seems 
farther away, I had the hope that a binary tree of such 'plain merge' 
nodes would require less comparisons in total.

Plus, I found it a lot simpler to cope with exactly two input relations 
instead of n, as with the append node. :-)

> 2) Go back through all the subrels and for each accumulated pathkey list ask
>    for the best path for that subrel for that pathkey list. 
> 
> 3) Generate an two append paths for each of these pathkey lists, one using the
>    best startup_cost subpath and one with the best total_cost subpath
>    available for the pathkey list for each child rel. If there is none
>    available take the best unordered path and put a sort node above it.
> 
> 4) Additionally advertise the traditional unordered append node which our
>    parent could choose to put a sort node above same as ever.
> 
> 5) Append plan nodes look a lot like Sort plan nodes glued onto an Append plan
>    node, with sort function oids and so on.
> 
> 6) Append exec state nodes look a lot like a (a few fields from)
>    tuplesortstate glued onto an Append node. They have the ScanKey array and
>    the fmgr sort functions. They also have an array of TupleTableSlot and a
>    heap of indexes into that array.
> 
> 8) ExecAppend does something similar to tuplesort's bounded sort (I fear I'm
>    going to get typecasted) or more to the point, similar to the final merge
>    of a merge sort. It directly returns the slot the child rel last returned
>    to it.

Uh.. well, yeah. I guess you have a better understanding of the planner 
and executor that I do.

> Open questions:
> 
> 1) I still haven't completely figured out what to do with equivalence classes.
>    My hack of just stuffing all the append subrel vars into there seems to
>    work fine. I need to understand what's going on to see if there's really a
>    problem with it or not.
> 
> 2) I'm not sure this code will work when the append rel is a target (ie UPDATE
>    and DELETE stmts).
> 
> 3) It does seem to work when the columns in the subrels don't line up but I
>    didn't do anything special to handle this case.

Uh.. is there any use case for that? WRT partitioning certainly not, is 
there?

I'll have a look at your patch.

Regards

Markus



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Autovacuum and OldestXmin
Next
From: Gregory Stark
Date:
Subject: Re: Ordered Append Node