Thread: modifying the table function
Hi there
I am trying to modify the execution of the table function to work in
iterator fashion instead of materializing the output. I have been digging
through the PostgreSQL source code for about a month now and I can't
figure out where the execution of the table function happens. I would be
very grateful if anyone could tell me where to begin, as my project is
due in only 10 days.
Regards
Islam Hegazy

On Sun, Mar 18, 2007 at 01:54:55PM -0600, Islam Hegazy wrote:
> I am trying to modify the execution of the table function to work in
> iterator fashion instead of materializing the output. I have been
> digging through the PostgreSQL source code for about a month now and
> I can't figure out where the execution of the table function happens.
> I would be very grateful if anyone could tell me where to begin, as
> my project is due in only 10 days.

I've been thinking recently about why this is so difficult. It occurs
to me that the problem is that the language interpreters don't lend
themselves to being an iterator.

What you want is that when you call a Perl table function, as soon as
the Perl function returns a row, that row is returned to the caller.
That means the Perl interpreter has to be able to save all its state,
return to the caller, and when next called resume where it left off. I
don't know if it can do that, but it would have to be implemented for
each language (or use threads).

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.

Martijn van Oosterhout wrote:
> What you want is that when you call a Perl table function, as soon as
> the Perl function returns a row, that row is returned to the caller.
> That means the Perl interpreter has to be able to save all its state,
> return to the caller, and when next called resume where it left off.
> I don't know if it can do that, but it would have to be implemented
> for each language (or use threads).

We haven't even worked out how to do that cleanly for plpgsql, which we
control, let alone for any third-party interpreter.

I'm not convinced it would be a huge gain anyway. Switching madly in
and out of the Perl interpreter, at least, is a known performance
problem, IIRC - Perl XML objects have, or used to have, problems with
that (and they still don't perform terribly well).

cheers

andrew

Andrew Dunstan wrote:
> I'm not convinced it would be a huge gain anyway. Switching madly in
> and out of the Perl interpreter, at least, is a known performance
> problem, IIRC

Returning control to the backend for every row returned would likely be
excessive, but you could return once every k rows and get most of the
benefits of both approaches (k might be on the order of 1000). The
problem with the current approach is that it makes returning large
result sets from PL functions very expensive, since they need to be
spooled to disk.

As for using threads, that's pretty much a non-starter: we can't safely
allow calls into the backend from multiple concurrent threads, and I
doubt that will change any time soon.

-Neil

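[For intuition, here is a minimal, self-contained sketch in plain C of
the batching idea Neil describes - all names are invented for
illustration. The producer hands control back once per batch of k rows
rather than once per row or once at the very end.]

    #include <stdio.h>

    #define BATCH 4   /* "k" -- Neil suggests ~1000 in practice */

    /* Fill a small buffer; the caller regains control once per batch
     * instead of per row. Returns rows produced; 0 means done. */
    static int fill_batch(int *buf, int *next, int limit)
    {
        int n = 0;

        while (n < BATCH && *next <= limit)
            buf[n++] = (*next)++;
        return n;
    }

    int main(void)
    {
        int buf[BATCH], next = 1, n, i;

        while ((n = fill_batch(buf, &next, 10)) > 0)
        {
            /* the "executor" side consumes one batch per resume */
            for (i = 0; i < n; i++)
                printf("row %d\n", buf[i]);
        }
        return 0;
    }

[The tradeoff is the usual one: larger k amortizes the cost of each
control transfer, while smaller k reduces the latency to the first row
and the buffering requirement.]
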
Returning every k rows would be a reasonable solution, but which
functions need to be modified to achieve this?

----- Original Message -----
From: "Neil Conway" <neilc@samurai.com>
To: "Andrew Dunstan" <andrew@dunslane.net>
Cc: "Martijn van Oosterhout" <kleptog@svana.org>; "Islam Hegazy"
<islheg@hotmail.com>; <pgsql-hackers@postgresql.org>
Sent: Sunday, March 18, 2007 4:57 PM
Subject: Re: [HACKERS] modifying the table function

> Andrew Dunstan wrote:
>> I'm not convinced it would be a huge gain anyway. Switching madly in
>> and out of the Perl interpreter, at least, is a known performance
>> problem, IIRC
>
> Returning control to the backend for every row returned would likely
> be excessive, but you could return once every k rows and get most of
> the benefits of both approaches (k might be on the order of 1000).
> The problem with the current approach is that it makes returning
> large result sets from PL functions very expensive, since they need
> to be spooled to disk.
>
> As for using threads, that's pretty much a non-starter: we can't
> safely allow calls into the backend from multiple concurrent threads,
> and I doubt that will change any time soon.
>
> -Neil

Neil Conway <neilc@samurai.com> writes:
> Returning control to the backend for every row returned would likely
> be excessive, but you could return once every k rows and get most of
> the benefits of both approaches (k might be on the order of 1000).

However, this still leaves us with no idea how to persuade Perl, Tcl,
Python, et al to cooperate. I think you are underestimating the cost of
suspending/resuming any of those interpreters, and overestimating the
cost of a tuplestore, which on a per-tuple basis is really pretty
cheap. It's quite likely that the proposed project would produce
piddling or negative gains, after expending a huge amount of work. (A
tenth of the effort on optimizing tuplestore some more would probably
be a better investment.)

A cross-check on this theory could be made without a lot of effort:
hack SQL functions to use a tuplestore (fed via the tuplestore
DestReceiver, so as not to exit the executor) instead of
return-after-every-tuple. Compare performance. I kinda suspect you'll
find it a loss even there.

regards, tom lane

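[For concreteness, the materialize path under discussion looks roughly
like this in a C set-returning function: rows are spooled into a
tuplestore and the whole store is handed back at once. This is a hedged
sketch against the 8.2-era API; header locations and a couple of
signatures have drifted in later releases, as noted in the comments,
and error checks such as testing rsinfo->allowedModes are omitted.]

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "miscadmin.h"            /* work_mem */
    #include "access/htup_details.h"  /* heap_form_tuple; declared
                                         elsewhere in older releases */
    #include "catalog/pg_type.h"      /* INT4OID */
    #include "utils/tuplestore.h"

    PG_MODULE_MAGIC;
    PG_FUNCTION_INFO_V1(three_ints);

    /* Return the integers 1..3 by materializing into a tuplestore. */
    Datum
    three_ints(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        TupleDesc        tupdesc;
        Tuplestorestate *store;
        MemoryContext    oldcxt;
        int              i;

        /* build the result in per-query memory so it outlives this call */
        oldcxt = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);

        tupdesc = CreateTemplateTupleDesc(1, false);  /* one-arg since PG 12 */
        TupleDescInitEntry(tupdesc, (AttrNumber) 1, "n", INT4OID, -1, 0);

        store = tuplestore_begin_heap(true, false, work_mem);

        for (i = 1; i <= 3; i++)
        {
            Datum   values[1] = {Int32GetDatum(i)};
            bool    nulls[1] = {false};

            /* every row is spooled before the caller sees anything */
            tuplestore_puttuple(store, heap_form_tuple(tupdesc, values, nulls));
        }

        rsinfo->returnMode = SFRM_Materialize;
        rsinfo->setResult = store;
        rsinfo->setDesc = tupdesc;

        MemoryContextSwitchTo(oldcxt);
        return (Datum) 0;
    }
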
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
>> Returning control to the backend for every row returned would likely
>> be excessive, but you could return once every k rows and get most of
>> the benefits of both approaches (k might be on the order of 1000).
>
> However, this still leaves us with no idea how to persuade Perl, Tcl,
> Python, et al to cooperate.

It seems like a useful optimization for C functions, though. I was
caught by surprise a while ago when I realized that the way I've been
using to create simple test data quickly:

INSERT INTO foo SELECT key FROM generate_series(1, <large number>) key

materializes the generate_series result set first. I'd like to have
that changed, even if we leave the behavior as it is for PLs.

Another affected use case is using dblink to copy large tables.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Neil Conway <neilc@samurai.com> writes:
>>> Returning control to the backend for every row returned would
>>> likely be excessive, but you could return once every k rows and get
>>> most of the benefits of both approaches (k might be on the order of
>>> 1000).
>>
>> However, this still leaves us with no idea how to persuade Perl,
>> Tcl, Python, et al to cooperate.
>
> It seems like a useful optimization for C functions, though. I was
> caught by surprise a while ago when I realized that the way I've been
> using to create simple test data quickly:

Actually, I think we could teach the PLs to do it - just not
transparently, so we'd need to mark which functions used the new
protocol. Such functions would get a state object as an implied first
argument, so in plperl it might work like this (for a
generate_series-like function):

    my $state = shift;
    my $low   = shift;
    my $high  = shift;

    if ($state->{callstatus} eq 'firstcall') {
        $state->{counter} = $low;
    }
    elsif ($state->{callstatus} eq 'cleanup') {
        # do cleanup here
        $state->{return_status} = 'cleaned';
        return;
    }

    my $next = $state->{counter}++;
    $state->{return_status} = $next < $high ? 'result' : 'last_result';
    return $next;

To support this I think we'd need to do something like:

    create function mygs(int, int)
    returns setof int
    language plperl
    with srfstate
    as $$ ... $$;

cheers

andrew

Andrew Dunstan wrote: > > Actually, I think we could teach the PLs to do it - just not > transparently, so we'd need to mark which functions used the new > protocol. Such functions would get a state object as an implied first > argument, so in plperl it might work like this (for a > generate_series-like function): > To support this I think we'd need to do something like: > > create function mygs(int, int) > returns setof int > language plperl > with srfstate > as $$ ... $$; Is this not what we do with aggregate functions at present? -- Richard Huxton Archonet Ltd
Richard Huxton wrote:
> Andrew Dunstan wrote:
>> Actually, I think we could teach the PLs to do it - just not
>> transparently, so we'd need to mark which functions used the new
>> protocol. Such functions would get a state object as an implied
>> first argument, so in plperl it might work like this (for a
>> generate_series-like function):
>>
>> To support this I think we'd need to do something like:
>>
>>     create function mygs(int, int)
>>     returns setof int
>>     language plperl
>>     with srfstate
>>     as $$ ... $$;
>
> Is this not what we do with aggregate functions at present?

Yes, more or less. That's what made me think of it.

OTOH, before we rush out and do it, someone needs to show that it's a
net win. I agree with Tom that making tuplestore faster would probably
be a much better investment of time.

cheers

andrew

Andrew Dunstan <andrew@dunslane.net> writes:
> Richard Huxton wrote:
>> Is this not what we do with aggregate functions at present?

> Yes, more or less. That's what made me think of it.
> OTOH, before we rush out and do it, someone needs to show that it's a
> net win.

Yeah, because this isn't doing anything to address the problem of
entry/exit overhead from calling a PL function many times. I kinda
dislike shoving the problem onto the heads of PL programmers anyway...

regards, tom lane

"Andrew Dunstan" <andrew@dunslane.net> writes: > Yes, more or less. That's what made me think of it. > > OTOH, before we rush out and do it someone needs to show that it's a net win. I > agree with Tom that making tuplestore faster would probably be a much better > investment of time. I don't think the problem with the tuplestore is a matter of speed. It's a matter of scalability and flexibility. It limits the types of applications that can use SRFs and the amount of data they can manipulate before it becomes impractical. Consider applications like dblink that have SRFs that read data from a slow network sources. Or that generate more data than the server can actually store at any one time. Or that overflow work_mem but are used in queries that could return quickly based on the first few records. Unfortunately, I don't think there's a simple fix that'll work for all PLs using the current interface. Even languages with iterators themselves (python, I think) probably don't expect to be called externally while an iterator is in progress. It seems to me the way to fix it is to abandon the iterator style interface and an interface that allows you to implement a SRF by providing a function that returns just the "next" record. It would have to save enough state for the next iteration explicitly in a data structure rather than being able to depend on the entire program state being restored. You could argue you could already do this using a non-SRF but there are two problems: 1) there's no convenient way to stash the state anywhere and 2) it wouldn't be convenient to use in SQL in FROM clauses the way SRFs are. IIRC there is already a hack in at least one of the PLs to stash data in a place where you can access it in repeated invocations. It doesn't work correctly if you call your function from two different places from a query. It would take executor support for such state data structures to fix that problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>> Yes, more or less. That's what made me think of it.
>>
>> OTOH, before we rush out and do it, someone needs to show that it's
>> a net win. I agree with Tom that making tuplestore faster would
>> probably be a much better investment of time.
>
> I don't think the problem with the tuplestore is a matter of speed.
> It's a matter of scalability and flexibility. It limits the types of
> applications that can use SRFs and the amount of data they can
> manipulate before it becomes impractical.
>
> Consider applications like dblink that have SRFs that read data from
> slow network sources. Or that generate more data than the server can
> actually store at any one time. Or that overflow work_mem but are
> used in queries that could return quickly based on the first few
> records.
>
> Unfortunately, I don't think there's a simple fix that'll work for
> all PLs using the current interface. Even languages with iterators
> themselves (Python, I think) probably don't expect to be called
> externally while an iterator is in progress.

Just a thought - I believe that there are portable user-space thread
implementations that contain little or no machine-specific code. What
if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SRF?

What would be needed is basically some enhanced version of
setjmp/longjmp that actually saves the stack, and not one that just
resets the stack pointer.

Since context switching would occur only at two well-defined places
(some return_next_row function that PLs call when an SRF returns a row,
and in the executor if no more previously returned rows from that SRF
are available), this wouldn't introduce the usual multithreading
headache, but would still allow switching in and out of the PL
interpreter.

greetings, Florian Pflug

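[The POSIX ucontext primitives (getcontext/makecontext/swapcontext,
which come up later in this thread) are one way to build exactly that
save-the-whole-stack switch. A minimal, self-contained illustration of
the idea, outside Postgres, with all names invented and the
stack-overflow and portability caveats raised below deliberately
ignored:]

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, srf_ctx;
    static char srf_stack[64 * 1024];  /* dedicated stack for the "PL" side */
    static int current_row;

    /* The "PL" side: produces rows, yielding to the "executor" after each. */
    static void
    producer(void)
    {
        int i;

        for (i = 1; i <= 3; i++)
        {
            current_row = i;                  /* hand one row to the caller */
            swapcontext(&srf_ctx, &main_ctx); /* suspend; resumes right here */
        }
        current_row = -1;                     /* signal end of rows */
    }

    int
    main(void)
    {
        getcontext(&srf_ctx);
        srf_ctx.uc_stack.ss_sp = srf_stack;
        srf_ctx.uc_stack.ss_size = sizeof(srf_stack);
        srf_ctx.uc_link = &main_ctx;          /* resume here when producer ends */
        makecontext(&srf_ctx, producer, 0);

        /* The "executor" side: pull one row per resume. */
        for (;;)
        {
            swapcontext(&main_ctx, &srf_ctx);
            if (current_row < 0)
                break;
            printf("got row %d\n", current_row);
        }
        return 0;
    }

[A real implementation would presumably yield once per k rows rather
than per row, and would have to face the problems Tom raises below:
sizing the alternate stack safely under an arbitrary interpreter, and
the portability of ucontext itself.]
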
Florian G. Pflug wrote:
> Just a thought - I believe that there are portable user-space thread
> implementations that contain little or no machine-specific code. What
> if postgres used one of those to switch from the PL into the executor
> and back after, say, 1000 rows were returned by the SRF?
>
> What would be needed is basically some enhanced version of
> setjmp/longjmp that actually saves the stack, and not one that just
> resets the stack pointer.
>
> Since context switching would occur only at two well-defined places
> (some return_next_row function that PLs call when an SRF returns a
> row, and in the executor if no more previously returned rows from
> that SRF are available), this wouldn't introduce the usual
> multithreading headache, but would still allow switching in and out
> of the PL interpreter.

This just sounds horribly fragile.

Are we really sure that this isn't a solution in search of a problem?

cheers

andrew

So, I understand from all these opinions that much of the work is to be
done in the PL's language interpreter, not in the PostgreSQL code
itself. Am I right, or have I missed something?

Regards
Islam Hegazy

----- Original Message -----
From: "Andrew Dunstan" <andrew@dunslane.net>
To: "Florian G. Pflug" <fgp@phlo.org>
Cc: "Gregory Stark" <stark@enterprisedb.com>; "Richard Huxton"
<dev@archonet.com>; "Heikki Linnakangas" <heikki@enterprisedb.com>;
"Tom Lane" <tgl@sss.pgh.pa.us>; "Neil Conway" <neilc@samurai.com>;
"Martijn van Oosterhout" <kleptog@svana.org>; "Islam Hegazy"
<islheg@hotmail.com>; <pgsql-hackers@postgresql.org>
Sent: Monday, March 19, 2007 12:18 PM
Subject: Re: [HACKERS] modifying the table function

> Florian G. Pflug wrote:
>> Just a thought - I believe that there are portable user-space thread
>> implementations that contain little or no machine-specific code.
>> What if postgres used one of those to switch from the PL into the
>> executor and back after, say, 1000 rows were returned by the SRF?
>>
>> What would be needed is basically some enhanced version of
>> setjmp/longjmp that actually saves the stack, and not one that just
>> resets the stack pointer.
>>
>> Since context switching would occur only at two well-defined places
>> (some return_next_row function that PLs call when an SRF returns a
>> row, and in the executor if no more previously returned rows from
>> that SRF are available), this wouldn't introduce the usual
>> multithreading headache, but would still allow switching in and out
>> of the PL interpreter.
>
> This just sounds horribly fragile.
>
> Are we really sure that this isn't a solution in search of a problem?
>
> cheers
>
> andrew

Andrew Dunstan wrote:
> Are we really sure that this isn't a solution in search of a problem?

The need for value-per-call is real (examples mentioned down-thread)
and was anticipated from day one of the SRF implementation (in fact the
first patch I wrote was value-per-call, not materialize). But when we
realized that value-per-call was not going to work very well for any PL
*except* C functions, we switched to SFRM_Materialize as the only
supported mode, with SFRM_ValuePerCall left as a to-be-coded-later
option (see SetFunctionReturnMode in execnodes.h).

Personally I think it is worth having SFRM_ValuePerCall even if only C
functions can make use of it.

Joe

Joe Conway wrote:
> Andrew Dunstan wrote:
>> Are we really sure that this isn't a solution in search of a
>> problem?
>
> The need for value-per-call is real (examples mentioned down-thread)
> and was anticipated from day one of the SRF implementation (in fact
> the first patch I wrote was value-per-call, not materialize). But
> when we realized that value-per-call was not going to work very well
> for any PL *except* C functions, we switched to SFRM_Materialize as
> the only supported mode, with SFRM_ValuePerCall left as a
> to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).
>
> Personally I think it is worth having SFRM_ValuePerCall even if only
> C functions can make use of it.

Yeah, makes plenty of sense for C funcs. I don't think there's an
argument about that. But for that we don't need any threading
infrastructure.

cheers

andrew

Andrew Dunstan wrote:
> Joe Conway wrote:
>> Andrew Dunstan wrote:
>>> Are we really sure that this isn't a solution in search of a
>>> problem?
>> The need for value-per-call is real (examples mentioned down-thread)
>> and was anticipated from day one of the SRF implementation (in fact
>> the first patch I wrote was value-per-call, not materialize). But
>> when we realized that value-per-call was not going to work very well
>> for any PL *except* C functions, we switched to SFRM_Materialize as
>> the only supported mode, with SFRM_ValuePerCall left as a
>> to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).
>>
>> Personally I think it is worth having SFRM_ValuePerCall even if only
>> C functions can make use of it.
>
> Yeah, makes plenty of sense for C funcs. I don't think there's an
> argument about that. But for that we don't need any threading
> infrastructure.

Oh sure -- sorry I wasn't clear. I wasn't trying to support the idea of
threading so much as the idea that value-per-call itself has merit for
a number of use cases.

Joe

"Florian G. Pflug" <fgp@phlo.org> writes: > Since context switching would occur only at two well-defined places > (Some return_next_row function that PLs call when a SFR returns a row, > and in the executor if no more previously returned rows from that SFR > are available), this wouldn't introduce the usual multithreading > headache... Yes it would. Consider what happens if the PL function calls into SPI to execute a query.... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>> Just a thought - I believe that there are portable user-space thread
>> implementations that contain little or no machine-specific code.
>> What if postgres used one of those to switch from the PL into the
>> executor and back after, say, 1000 rows were returned by the SRF?
>>
>> What would be needed is basically some enhanced version of
>> setjmp/longjmp that actually saves the stack, and not one that just
>> resets the stack pointer.
>>
>> Since context switching would occur only at two well-defined places
>> (some return_next_row function that PLs call when an SRF returns a
>> row, and in the executor if no more previously returned rows from
>> that SRF are available), this wouldn't introduce the usual
>> multithreading headache, but would still allow switching in and out
>> of the PL interpreter.
>
> This just sounds horribly fragile.

Why would it be? It's about the same as running PostgreSQL in one
thread and some PL in another. This should only cause trouble if both
use some non-reentrant libc functions. But even that wouldn't matter,
because of the well-defined context-switching points.

Here is a paper about portable userspace threads that I just googled:
http://www.gnu.org/software/pth/rse-pmt.ps

> Are we really sure that this isn't a solution in search of a problem?

I think this really depends on how you define "problem". Some people
might think that "select * from myfunc(...) limit 1" should stop and
return a result after myfunc(...) has returned one row. Others will say
"well, just use a different software design that doesn't depend on this
optimization".

greetings, Florian Pflug

Gregory Stark wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Since context switching would occur only at two well-defined places
>> (some return_next_row function that PLs call when an SRF returns a
>> row, and in the executor if no more previously returned rows from
>> that SRF are available), this wouldn't introduce the usual
>> multithreading headache...
>
> Yes it would. Consider what happens if the PL function calls into SPI
> to execute a query....

I don't think that would cause trouble. Postgres and the PL would still
not run concurrently - in fact, from a control-flow point of view, the
PL would run at _exactly_ the same place as it does now: precisely when
someone tries to pull rows out of some executor node. The only
difference is that its execution would be stopped and restarted
multiple times by some sort of stack switching.

There might be trouble if a second function has to be executed with the
same PL as an already running (but currently "stopped") function. This
would only work for a PL that is thread-safe in some way.

greetings, Florian Pflug

"Florian G. Pflug" <fgp@phlo.org> writes: > Here is a paper about portable userspace threads that I just googled. > http://www.gnu.org/software/pth/rse-pmt.ps I'm not impressed --- the list of unsolved problems is a bit too long. One that seems a showstopper for our purposes is lack of ability to deal reliably with stack overflow on alternate stacks. If we're going to be doing anything as loosely defined as running a third-party language interpreter in these stacks, that just won't do. Another little problem is it's LGPL. regards, tom lane
"Florian G. Pflug" <fgp@phlo.org> writes: > There might be trouble if a second function has to be executed with > the same PL as an already running (but currently "stopped") > function. This would only work for PL that is thread-safe in some way. Seems a bit iffy. It strikes me that at least for plpgsql, it might be possible to support value-per-call mode without any thread support. What you'd need to do is get rid of the current arrangement whereby the control structure of the plpgsql code is modeled on-the-fly by the call stack of the C code, and instead have block nesting, loops, etc represented by explicit data structures that're manipulated by C code with a flat call stack. If the function wants to do a RETURN NEXT, you just return, leaving its current state all nicely tucked in a data structure. This would be a little tedious but is in principle a straightforward change. I'm not sure if there'd be any meaningful performance impact. The tricky part is what about exception handling? If the function does RETURN NEXT inside a BEGIN/EXCEPTION block, what do you do ... what does that even mean? There be equally nasty dragons lurking behind that question for a threaded implementation, of course. It might be that we could get away with decreeing that RETURN NEXT inside EXCEPTION isn't legal. regards, tom lane
Florian G. Pflug wrote:
>> Are we really sure that this isn't a solution in search of a
>> problem?
>
> I think this really depends on how you define "problem". Some people
> might think that "select * from myfunc(...) limit 1" should stop and
> return a result after myfunc(...) has returned one row. Others will
> say "well, just use a different software design that doesn't depend
> on this optimization".

I think that's a very thin use case to justify all the scaffolding you
propose, so put me in with the others, I guess.

If we can provide a per-call mechanism for C functions, that should be
adequate, I think. If you need heavy optimization then you need C
functions anyway.

cheers

andrew

On Tue, 2007-03-20 at 09:35, Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>>> Are we really sure that this isn't a solution in search of a
>>> problem?
>> I think this really depends on how you define "problem". Some people
>> might think that "select * from myfunc(...) limit 1" should stop and
>> return a result after myfunc(...) has returned one row. Others will
>> say "well, just use a different software design that doesn't depend
>> on this optimization".

The same argument could be used to justify leaving out any
optimisations.

> I think that's a very thin use case to justify all the scaffolding
> you propose, so put me in with the others, I guess.
>
> If we can provide a per-call mechanism for C functions, that should
> be adequate, I think.

Sure, having it at least at the C level would make the job of PL
implementors wanting to do per-call returns a little easier.

> If you need heavy optimization then you need C functions anyway.

Check out pl/proxy (http://pgfoundry.org/projects/plproxy), which is a
dblink replacement and database partitioning tool in one. A short
summary (which may be a bit outdated) is here:
https://developer.skype.com/SkypeGarage/DbProjects/PlProxy

I'm sure there would be use cases (like moving huge tables) where
per-call (or per-kilocall :) ) mechanisms would come in handy.

One example would be doing a huge sort split between N partitions, so
that each partition sorts its part of the data and the results are then
merged in order from all partitions in-line, without re-sorting on the
master. With a function returning the full recordset, the sorting can't
be distributed very effectively, as the whole query result needs to be
saved on the master node before returning.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Here is a paper about portable userspace threads that I just
>> googled:
>> http://www.gnu.org/software/pth/rse-pmt.ps
>
> I'm not impressed --- the list of unsolved problems is a bit too
> long. One that seems a showstopper for our purposes is the lack of
> ability to deal reliably with stack overflow on alternate stacks. If
> we're going to be doing anything as loosely defined as running a
> third-party language interpreter in these stacks, that just won't do.
>
> Another little problem is that it's LGPL.

I don't think using the GnuPT library the paper mentions is a viable
approach for postgres. I just posted that link because it shows that
this is not impossible to do.

What would actually be interesting is to find out whether the ucontext
functions getcontext/makecontext/swapcontext/setcontext mentioned in
the paper are available on the platforms where postgres is used in
production for larger datasets. I'd assume that people who need this PL
optimization will run a quite up-to-date version of their particular
OS, so maybe the portability problems of those functions wouldn't be a
problem for postgres - the PL optimization would just be disabled at
configure time if they are not available.

The main question is: IF the stability problems like stack overflow can
be addressed, would this in principle be considered a feature that
people would like to have? Or is it considered not worth the effort?

greetings, Florian Pflug