Re: Adding pipelining support to set returning functions - Mailing list pgsql-hackers
From: Hannu Krosing
Subject: Re: Adding pipelining support to set returning functions
Msg-id: 1207908004.6865.14.camel@huvostro
In response to: Adding pipelining support to set returning functions (Hannu Krosing <hannu@krosing.net>)
Responses: Re: Adding pipelining support to set returning functions
List: pgsql-hackers
On Fri, 2008-04-11 at 10:57 +0200, Hans-Juergen Schoenig wrote:
> Hannu Krosing wrote:
> > A question to all pg hackers
> >
> > Is anybody working on adding pipelining to set returning functions?
> >
> > How much effort would it take?
> >
> > Where should I start digging?
> >
> i asked myself basically the same question some time ago.
> pipelining seems fairly impossible unless we ban joins on those
> "plugins" completely.

Not really. They just have to be materialized before joins, or the
streaming node has to be on the driving side of the join, so you can
fetch one tuple and then join it against an index or hash lookup.

> i think this should be fine for your case (no need to join PL/proxy
> partitions) - what we want here is to re-unify data and send it through
> centralized BI.

In the PL/Proxy context I was aiming at sorting data at the nodes and
then being able to merge several partitions while preserving order, and
doing this without having to store N*partition_size rows in the result
set.

> ...
>
> currently things like nodeSeqscan do SeqNext and so on - one record is
> passed on to the next level.
> why not have a nodePlugin or so doing the same?
> or maybe some additional calling convention for streaming functions...
>
> e.g.:
> CREATE STREAMING FUNCTION xy() RETURNS NEXT RECORD AS $$
>     return exactly one record to keep going
>     return NULL to mark "end of table"
> $$ LANGUAGE 'any';
>
> so - for those functions no ...
>     WHILE ...
>         RETURN NEXT
> but just one tuple per call ...
> this would pretty much do it for this case.
> i would not even call this a special case - whenever there is a LOT of
> data, this could make sense.

In Python (and also JavaScript, starting at version 1.7) you do it by
returning a generator from a function, which is done by using yield
instead of return:

>>> def numgen(i):
...     while 1:
...         yield i
...         i += 1
>>> ng = numgen(1)
>>> ng
<generator object at 0xb7ce3bcc>
>>> ng.next()
1
>>> ng.next()
2

In fact any PL/Python SRF puts its result set into the return buffer
using the generator mechanism, even when the function returns its
result as a list or an array.

What would be nice is to wire the Python generator directly to
PostgreSQL's FuncNext call.

At the C function level this should probably be a mirror image of
AGGREGATE functions, where you have an init() function that prepares
some opaque data structure, a next() for getting records with some
special value for end-of-data, and preferably also a finalize() to
clean up in case PostgreSQL stops before next() has indicated EOD.

Maybe some extra info would be nice for the optimizer, like the
expected row count, or the fact that data is returned sorted on some
field. This would be good for the current return mechanisms as well.

-----------------
Hannu
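The order-preserving merge of sorted partitions described above is a lazy k-way merge: only one row per partition is buffered at a time, never N*partition_size rows. A minimal Python sketch, where `partition_stream` is a hypothetical stand-in for fetching one partition's pre-sorted result stream (not a real PL/Proxy API):

```python
import heapq

def partition_stream(rows):
    # Stand-in for one partition's sorted result stream; a real
    # implementation would fetch rows lazily from a remote node.
    for row in rows:
        yield row

# Three partitions, each already sorted on the merge key.
p1 = partition_stream([1, 4, 7])
p2 = partition_stream([2, 5, 8])
p3 = partition_stream([3, 6, 9])

# heapq.merge consumes the input generators lazily, emitting one
# globally ordered row at a time without materializing any stream.
merged = heapq.merge(p1, p2, p3)
print(list(merged))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because `heapq.merge` itself returns a generator, the merged stream could in turn be consumed one tuple per call, which is exactly the pipelining behaviour discussed here.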
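The init()/next()/finalize() convention proposed above, mirroring aggregate transition functions, can be sketched in Python. All names and shapes here are illustrative assumptions, not an actual PostgreSQL API; a C implementation would keep the opaque state off fcinfo the way aggregates keep transition state:

```python
END_OF_DATA = object()  # sentinel playing the "special value for end" role

class StreamingFunction:
    """Sketch of the proposed SRF calling convention: init() builds
    opaque state, next() returns exactly one tuple per call, and
    finalize() cleans up even if the executor stops early."""

    def init(self, limit):
        # Opaque per-call state prepared once before scanning starts.
        self.state = {"current": 0, "limit": limit}

    def next(self):
        # One tuple per call; END_OF_DATA marks "end of table".
        if self.state["current"] >= self.state["limit"]:
            return END_OF_DATA
        self.state["current"] += 1
        return (self.state["current"],)

    def finalize(self):
        # Release resources; must also run if scanning stops early,
        # e.g. under a LIMIT above the function scan.
        self.state = None

# Driving loop, as an executor node might call it:
f = StreamingFunction()
f.init(limit=3)
rows = []
while True:
    row = f.next()
    if row is END_OF_DATA:
        break
    rows.append(row)
f.finalize()
print(rows)  # [(1,), (2,), (3,)]
```

The point of the separate finalize() is visible in the early-stop case: the caller can abandon the scan after any next() and still leave the function a hook to release its state.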