Re: Adding pipelining support to set returning functions - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Adding pipelining support to set returning functions
Date
Msg-id 1207908004.6865.14.camel@huvostro
In response to Adding pipelining support to set returning functions  (Hannu Krosing <hannu@krosing.net>)
Responses Re: Adding pipelining support to set returning functions  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Fri, 2008-04-11 at 10:57 +0200, Hans-Juergen Schoenig wrote:
> Hannu Krosing wrote:
> > A question to all pg hackers
> >
> > Is anybody working on adding pipelining to set returning functions.
> >
> > How much effort would it take ?
> >
> > Where should I start digging ?
> >   
> 
> i asked myself basically the same question some time ago.
> pipelining seems fairly impossible unless we ban joins on those 
> "plugins" completely.

Not really; they just have to be "materialized" before joins, or the
streaming node has to be on the driving side of the join, so you can
fetch one tuple and then join it via an index or hash lookup.
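The driving-side case can be sketched in plain Python (illustrative only; the names `streaming_srf` and `hash_join` are made up for the example and are not PostgreSQL internals):

```python
# Sketch: a streaming SRF drives the join, one tuple at a time,
# probing a pre-built hash table on the (materialized) inner side.

def streaming_srf():
    """Stands in for a set returning function yielding one row per call."""
    for i in range(1, 6):
        yield (i, "row%d" % i)

def hash_join(outer_rows, inner_rows, outer_key, inner_key):
    """Build a hash table on the inner side, then stream the outer
    side lazily -- the outer input is never materialized."""
    table = {}
    for row in inner_rows:
        table.setdefault(row[inner_key], []).append(row)
    for row in outer_rows:          # one fetch per iteration
        for match in table.get(row[outer_key], ()):
            yield row + match

inner = [(2, "two"), (4, "four")]
print(list(hash_join(streaming_srf(), inner, 0, 0)))
# -> [(2, 'row2', 2, 'two'), (4, 'row4', 4, 'four')]
```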

> i think this should be fine for your case (no need to join PL/proxy 
> partitions) - what we want here is to re-unify data and sent it through 
> centralized BI.

In the PL/Proxy context I was aiming at sorting data at the nodes and
then being able to merge several partitions while preserving order,
without having to store N*partition_size rows in the result set.
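The order-preserving merge of pre-sorted partition streams is exactly what a lazy k-way merge does; a minimal sketch (the `partition` generator just stands in for a sorted stream from one node):

```python
# heapq.merge is lazy: it keeps only one pending tuple per input
# stream, so the full N*partition_size result set is never stored.
import heapq

def partition(rows):
    """Stands in for a sorted result stream from one PL/Proxy node."""
    for r in rows:
        yield r

parts = [partition([1, 4, 7]), partition([2, 5, 8]), partition([3, 6, 9])]
merged = heapq.merge(*parts)      # a generator, produced incrementally
print(list(merged))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```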

> >
...
> >   
> 
> currently things like nodeSeqscan do SeqNext and so on - one records is 
> passed on to the next level.
> why not have a nodePlugin or so doing the same?
> or maybe some additional calling convention for streaming functions...
> 
> e.g.:
> CREATE STREAMING FUNCTION xy() RETURNS NEXT RECORD AS $$
>     return exactly one record to keep doing
>     return NULL to mark "end of table"
> $$ LANGUAGE 'any';
> 
> so - for those function no ...
>     WHILE ...
>        RETURN NEXT
> 
> but just one tuple per call ...
> this would pretty much do it for this case.
> i would not even call this a special case - whenever there is a LOT of 
> data, this could make sense.

In Python (and also JavaScript, starting with version 1.7) you do it by
returning a generator from a function, which you get by using yield
instead of return.

>>> def numgen(i):
...     while 1:
...         yield i
...         i += 1
>>> ng = numgen(1)
>>> ng
<generator object at 0xb7ce3bcc>
>>> ng.next()
1
>>> ng.next()
2

In fact any PL/Python SRF puts its result set into the return buffer
using the generator mechanism, even when you return the result from the
function as a list or an array.

What would be nice is to wire the Python generator directly to
PostgreSQL's FuncNext call.

At the C function level this should probably be a mirror image of
AGGREGATE functions, where you have an init() function that prepares
some opaque data structure, a next() for getting records with some
special value marking the end, and preferably also a finalize() to
clean up in case PostgreSQL stops before next() has indicated
end-of-data.
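That calling convention can be modeled in a few lines of Python (purely hypothetical; `init`/`next`/`finalize` here are the names proposed above, not an existing PostgreSQL API):

```python
# Sketch of the proposed init()/next()/finalize() protocol for a
# streaming SRF, mirroring the aggregate-function calling convention.

class StreamingSRF:
    def init(self, start, stop):
        """Prepare the opaque per-call state."""
        self.cur, self.stop = start, stop

    def next(self):
        """Return one record, or None as the end-of-data marker."""
        if self.cur >= self.stop:
            return None
        row = (self.cur,)
        self.cur += 1
        return row

    def finalize(self):
        """Clean up if the executor stops before next() signalled EOD."""
        self.cur = self.stop

# The executor loop a node like nodeSeqscan would run:
srf = StreamingSRF()
srf.init(1, 4)
rows = []
while True:
    row = srf.next()
    if row is None:       # end-of-data marker
        break
    rows.append(row)
srf.finalize()
print(rows)
# -> [(1,), (2,), (3,)]
```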

Maybe some extra info would be nice for the optimizer, like the expected
rowcount, or that the data is returned sorted on some field. This would
be good for the current return mechanisms as well.

-----------------
Hannu

