Thread: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
Hello, At the moment my pg client application (running 8.4) transfers data from several remote DBs (that run 8.3) via dblink, using cursors where appropriate, and reporting back progress to users using a progress bar and brief messages. I thought it would be great to change my code to have all this done within a single plpgsql function, and have all the benefits that that brings. It's easy enough to do all this, by writing a function that RETURNS TABLE(progress integer, message text), and RETURNing NEXT when time comes to display a new message or increment the progress bar. However, that approach has the considerable drawback of not actually returning any rows until it finally returns all of them. I'm not willing to give up giving the user those messages and having their progress bar updated in real-time though. I would like to have the function behave as a cursor, and return one row at a time when control reaches each RETURN NEXT statement. Is it possible to somehow achieve what I've described, perhaps by doing something with a function that returns refcursor? Thanks, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Alban Hertroys
Date:
On 16 Feb 2010, at 1:04, Peter Geoghegan wrote: > Hello, > > At the moment my pg client application (running 8.4) transfers data > from several remote DBs (that run 8.3) via dblink, using cursors where > appropriate, and reporting back progress to users using a progress bar > and brief messages. > > I thought it would be great to change my code to have all this done > within a single plpgsql function, and have all the benefits that that > brings. > > It's easy enough to do all this, by writing a function that RETURNS > TABLE(progress integer, message text), and RETURNing NEXT when time > comes to display a new message or increment the progress bar. However, > that approach has the considerable drawback of not actually returning > any rows until it finally returns all of them. I'm not willing to give > up giving the user those messages and having their progress bar > updated in real-time though. I would like to have the function behave > as a cursor, and return one row at a time when control reaches each > RETURN NEXT statement. I'd think RETURN NEXT would behave the way you want it to. There's probably something in your function causing your functionto behave like it does now. I suspect the problem lies in the way you determine how far you've progressed, but youdidn't tell us anything about your function, so I'm just guessing. I don't know much about dblink, so it is possible your problem is related to that. I imagine it may batch "small" resultsets and send them over all at once to reduce traffic. Do you see this problem with larger result sets (say >10k rows)? Alban Hertroys -- Screwing up is the best way to attach something to the ceiling. !DSPAM:737,4b7a5d9710441627593049!
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Pavel Stehule
Date:
2010/2/16 Alban Hertroys <dalroi@solfertje.student.utwente.nl>: > On 16 Feb 2010, at 1:04, Peter Geoghegan wrote: > >> Hello, >> >> At the moment my pg client application (running 8.4) transfers data >> from several remote DBs (that run 8.3) via dblink, using cursors where >> appropriate, and reporting back progress to users using a progress bar >> and brief messages. >> >> I thought it would be great to change my code to have all this done >> within a single plpgsql function, and have all the benefits that that >> brings. >> >> It's easy enough to do all this, by writing a function that RETURNS >> TABLE(progress integer, message text), and RETURNing NEXT when time >> comes to display a new message or increment the progress bar. However, >> that approach has the considerable drawback of not actually returning >> any rows until it finally returns all of them. I'm not willing to give >> up giving the user those messages and having their progress bar >> updated in real-time though. I would like to have the function behave >> as a cursor, and return one row at a time when control reaches each >> RETURN NEXT statement. > > > I'd think RETURN NEXT would behave the way you want it to. There's probably something in your function causing your functionto behave like it does now. I suspect the problem lies in the way you determine how far you've progressed, but youdidn't tell us anything about your function, so I'm just guessing. > > I don't know much about dblink, so it is possible your problem is related to that. I imagine it may batch "small" resultsets and send them over all at once to reduce traffic. Do you see this problem with larger result sets (say >10k rows)? if you have a large dataset (more than 1 M rows), then better is C coding. PLpgSQL (return next, return query) push result to memory, and it can be a problem. Regards Pavel Stehule > > Alban Hertroys > > -- > Screwing up is the best way to attach something to the ceiling. > > > !DSPAM:737,4b7a5d9710441627593049! > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> I'd think RETURN NEXT would behave the way you want it to. There's probably something in your function causing your functionto behave like it does now. I suspect the problem lies in the way you determine how far you've progressed, but youdidn't tell us anything about your function, so I'm just guessing. > Why do you think that? At the moment, the function merely sends two messages along the lines of "beginning downloading...", "connecting to first db...", before connecting to the first remote DB, which, in my perfunctory testing was unavailable. The connection blocks, throws an exception, is handled in an EXECPTION block (by once again sending a message in the usual way). However, I get all 3 messages at once, only when the dblink function finishes blocking and throws its exception, which takes about 7 seconds because I like to use a timeout in my connection string. > I don't know much about dblink, so it is possible your problem is related to that. I imagine it may batch "small" resultsets and send them over all at once to reduce traffic. Do you see this problem with larger result sets (say >10k rows)? > I'll attempt to devise a useful test, but I'm a little doubtful that I'll get anywhere with my current approach, given that my initial, very simple test failed. Regards, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Alban Hertroys
Date:
On 16 Feb 2010, at 10:34, Peter Geoghegan wrote: >> I'd think RETURN NEXT would behave the way you want it to. There's probably something in your function causing your functionto behave like it does now. I suspect the problem lies in the way you determine how far you've progressed, but youdidn't tell us anything about your function, so I'm just guessing. >> > > Why do you think that? At the moment, the function merely sends two > messages along the lines of "beginning downloading...", "connecting > to first db...", before connecting to the first remote DB, which, in > my perfunctory testing was unavailable. The connection blocks, throws > an exception, is handled in an EXECPTION block (by once again sending > a message in the usual way). However, I get all 3 messages at once, > only when the dblink function finishes blocking and throws its > exception, which takes about 7 seconds because I like to use a timeout > in my connection string. I assumed you were generating the progress indicator from query results in a remote DB. It turns out that's entirely notwhat you're doing, but how were we supposed to know that? You say you want your function to behave like a cursor. A function can't behave like a cursor, just as a select can't; both result in a result set. What RETURN NEXT and RETURN QUERYdo is make it possible to access the function results row by row. You still need a cursor to get the behaviour of acursor out of that. If that doesn't answer your question, I'm pretty sure that to help you with this problem people will need quite a bit moreinformation than you're giving us. What are you doing? What do you see vs. what do you expect? Some examples would help. >> I don't know much about dblink, so it is possible your problem is related to that. I imagine it may batch "small" resultsets and send them over all at once to reduce traffic. Do you see this problem with larger result sets (say >10k rows)? >> > > I'll attempt to devise a useful test, but I'm a little doubtful that > I'll get anywhere with my current approach, given that my initial, > very simple test failed. From your description it turns out dblink isn't involved yet, so it can't cause the issue you were asking about. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b7a71b410441921015512!
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> I assumed you were generating the progress indicator from query results in a remote DB. It turns out that's entirely notwhat you're doing, but how were we supposed to know that? > Well, it made sense to leave dblink mostly out of things (it's a useful way to get a function to block though) until I'd figured out if it was feasible to do this with a function that RETURNS TABLE(...). Occam's razor and all that. It wasn't. I was trying to give background information, an indication of my intent - there may have been an entirely different approach that had not yet occurred to me, so that information may have been pertinent (although probably not - the fact that dblink is involved in incrementing the function probably doesn't matter at all. It probably might as well be anything else for our purposes). > You say you want your function to behave like a cursor. > A function can't behave like a cursor, just as a select can't; both result in a result set. What RETURN NEXT and RETURNQUERY do is make it possible to access the function results row by row. You still need a cursor to get the behaviourof a cursor out of that. I understand the distinction (after all, I suggested that the solution was likely to involve returning a refcursor in my original post) - let's not get bogged down in semantics though. I guess my question boils down to: can I return a cursor, but not to return the result of a single select, but of multiple different selects in succession, to report progress as described, or, alternatively, do something that will produce similar results? Here's my admittedly very rough stab at this, using a function that RETURNS TABLE, an approach that evidently doesn't work (which is not to say that I ever had a reasonable expectation of this working. In fact, I was almost certain it wouldn't work, but I needed to start somewhere): CREATE OR REPLACE FUNCTION download_sales(download_date_arg date, rem_arg integer[]) RETURNS TABLE (progress integer, message text) AS $BODY$ DECLARE cur_progress integer DEFAULT 0; tup rems%rowtype; BEGIN progress := cur_progress; cur_progress := cur_progress + 1; message := 'Beginning downloading sales...'; RETURN NEXT; FOR tup IN SELECT * FROM rems WHERE is_active AND id = ANY(rem_arg) LOOP DECLARE conn_str text; query_str text; BEGIN -- connection will timeout after 7 seconds. conn_str = 'hostaddr=' || tup.ip || ' port=' || tup.port || 'dbname=remote_db user=' || tup.username || ' password=' || tup.password || ' connect_timeout=7'; -- open persistent connection to rem DB progress := -1; cur_progress := 0; message := 'Connecting to rem ''' || tup.description || ''''; RETURN NEXT; SELECT dblink_connect(conn_str); -- TODO: Actually transfer data. INSERT it into local tables, and indicate progress to user as above -- The remote DB stores how many tuples there are for the day, so I don't have to do a count(*) first -- we send a magic number (say -1), which indicates that number of tuples follows, then we send number of tuples -- then we send which tuple we're currently on, for each and every remote DB EXCEPTION WHEN CONNECTION_EXCEPTION THEN message := 'Could not connect to rem ''' || tup.description || ''''; RETURN NEXT; RETURN; END; END LOOP; RETURN; END; $BODY$ LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER; As I've already said, the problem with this approach is that I see all 3 messages at once, when the CONNECTION_EXCEPTION is thrown and we finally RETURN, after about 7 seconds (which is undoubtedly how RETURNS TABLE is documented to behave). I want (although, as I've said, don't expect) to see the first two messages immediately, and only the third when the connection fails, so I know what's happening in real-time. Regards, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Alban Hertroys
Date:
On 16 Feb 2010, at 12:35, Peter Geoghegan wrote: > As I've already said, the problem with this approach is that I see all > 3 messages at once, when the CONNECTION_EXCEPTION is thrown and we > finally RETURN, after about 7 seconds (which is undoubtedly how > RETURNS TABLE is documented to behave). I want (although, as I've > said, don't expect) to see the first two messages immediately, and > only the third when the connection fails, so I know what's happening > in real-time. It seems you're right, I built a simple test-case (see attachment) using timeofday(). The numbers from fetching from a cursorover the set-returning function run away from the selects that directly call timeofday() in between. In my case I pause the _client_ between calls, but the results are the same. Peculiar... I'm running PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (AppleInc. build 5646), 64-bit !DSPAM:737,4b7a925f10448503891907! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b7a925f10448503891907!
Attachment
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> It seems you're right, I built a simple test-case (see attachment) using timeofday(). The numbers from fetching from acursor over the set-returning function run away from the selects that directly call timeofday() in between. > In my case I pause the _client_ between calls, but the results are the same. > Peculiar... > Cursors simply address the problem of "impedance mismatch" (the use of that term in this way probably pre-dates ORM, so please don't tell me that that term refers exclusively to an ORM problem within an RDBMS context). You don't have to fetch the result set all at once where that is impractical. However, the entire result set is available on the server from the first fetch. What I want to do is to fetch the beginning of the result set before the end has even been determined, with the fetching potentially blocking in my app to let the plpgsql function catch up and generate a row that didn't even exist on the server before, and for a time after the most recent fetch. That is fundamentally different to how cursors are typically used, and I hope that there is a way to do it. Thanks, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Tom Lane
Date:
Peter Geoghegan <peter.geoghegan86@gmail.com> writes: > Cursors simply address the problem of "impedance mismatch" (the use of > that term in this way probably pre-dates ORM, so please don't tell me > that that term refers exclusively to an ORM problem within an RDBMS > context). You don't have to fetch the result set all at once where > that is impractical. However, the entire result set is available on > the server from the first fetch. The above statements are not true as far as the cursor mechanism is concerned (at least not in Postgres). However, they are true as far as the result of a plpgsql function returning set is concerned. The function runs to completion, dumping RETURN NEXT outputs into a "tuplestore", and then returns the tuplestore as its result. It is possible to code set-returning functions in a suspend-and-resume style, but none of the available PLs do that; you have to get down to the C level. regards, tom lane
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> It is possible to code set-returning functions in a suspend-and-resume > style, but none of the available PLs do that; you have to get down to > the C level. Aren't my requirements sufficiently common to justify developing a mechanism to report progress back to client applications during batch operations and the like? This seems like a nice thing to have, that would be broadly useful, perhaps as a contrib module. Obviously I have no idea of the amount of effort developing such a feature would entail. Regards, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Alvaro Herrera
Date:
Peter Geoghegan escribió: > > It is possible to code set-returning functions in a suspend-and-resume > > style, but none of the available PLs do that; you have to get down to > > the C level. > > Aren't my requirements sufficiently common to justify developing a > mechanism to report progress back to client applications during batch > operations and the like? This seems like a nice thing to have, that > would be broadly useful, perhaps as a contrib module. Obviously I have > no idea of the amount of effort developing such a feature would > entail. We've had several requests for progress reporting thingies of all kinds. The amount of effort is not insignificant, which is probably why nothing has gotten done yet ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> We've had several requests for progress reporting thingies of all kinds. > The amount of effort is not insignificant, which is probably why nothing > has gotten done yet ... What do you mean by "reporting thingies of all kinds"? It isn't as if I'm asking for some esoteric feature, some regional variation of reporting on batch operations. I'd like to be able to report things back to client apps. That would perhaps take the form of returning a message string, and an integer at each RETURN NEXT (or whatever mechanism is decided upon). It would probably still be very useful to a lot of people if the feature was completely inflexible, not allowing you to vary the RETURN NEXT datatype, for example. Can't someone think of a way of doing this as a neat adjunct to Postgres? Thanks, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Richard Huxton
Date:
On 16/02/10 20:40, Peter Geoghegan wrote: > > Can't someone think of a way of doing this as a neat adjunct to Postgres? Lots of people can think of ways of doing this. Not so many are prepared to devote the time and effort to making it happen. In fact, by definition, so far there have been zero such people. Don't suppose you're volunteering? :-) -- Richard Huxton Archonet Ltd
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
> Don't suppose you're volunteering? :-) I've already looked at the SRF documentation. It doesn't look all that intimidating, and I might be willing to have a go if we can first build some consensus on what this ought to look like. Can we? Regards, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Tom Lane
Date:
Peter Geoghegan <peter.geoghegan86@gmail.com> writes: > Aren't my requirements sufficiently common to justify developing a > mechanism to report progress back to client applications during batch > operations and the like? Have you experimented with RAISE NOTICE? Using it this way is a bit of a hack maybe, but I think you are entirely unaware of what would be involved in producing something that's less of a hack. regards, tom lane
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
>> Aren't my requirements sufficiently common to justify developing a >> mechanism to report progress back to client applications during batch >> operations and the like? > > Have you experimented with RAISE NOTICE? Using it this way is a bit of > a hack maybe, but I think you are entirely unaware of what would be > involved in producing something that's less of a hack. That assessment is accurate. Should I take it that you aren't particularly interested in pursuing something like this at present? I'll see what I can come up with by playing with RAISE NOTICE Thanks, Peter Geoghegan
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Peter Geoghegan <peter.geoghegan86@gmail.com> writes: >> Aren't my requirements sufficiently common to justify developing a >> mechanism to report progress back to client applications during batch >> operations and the like? > > Have you experimented with RAISE NOTICE? Using it this way is a bit of > a hack maybe, but I think you are entirely unaware of what would be > involved in producing something that's less of a hack. Would returning a refcursor then using fetch in the application be another solution? As far as hacking is concerned, I think it boils down to materialise against value-per-call implementation, right? Not saying it's easy to implement value-per-call support in plpgsql, but should the OP think about what's involved, is that the track to follow? http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/utils/fmgr/README;hb=HEAD#l380 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=d9a319525591bc437e4770b4e796a7517844a784 The first link is the fmgr/README explaining the concepts, and the second one is a recent enough patch dealing with materialise and value-per-call in the context of SQL functions. Regards, -- dim
Re: Having a plpgsql function return multiple rows that indicate its progress in a cursor like fashion
From
Peter Geoghegan
Date:
Hi dim, > Would returning a refcursor then using fetch in the application be > another solution? I assume not, since nobody stepped forward and offered a way, even though I suggested that returning a refcursor may be the way to go (you'll recall that you suggested that to me in IRC - I'm sternocera there. I believe we met in Paris too). As I said, "I guess my question boils down to: can I return a cursor, but not to return the result of a single select, but of multiple different selects in succession, to report progress as described, or, alternatively, do something that will produce similar results?" I also said: "Cursors simply address the problem of "impedance mismatch"...You don't have to fetch the result set all at once where that is impractical. However, the entire result set is available on the server from the first fetch." Tom contradicted this, but I believe he just meant that my statement was technically inaccurate, and not that it was conceptually inaccurate. My (perhaps incorrect) understanding is that once you open a cursor, you cannot change that which will later be fetched from the same cursor - What rows will be returned when everything is fetched is determined when the cursor is opened. Also, I cannot very well open a cursor twice, because, as the docs say, "the cursor cannot be open already" when opening a cursor. So, unless I'm mistaken, I don't think returning a refcursor helps me here. > As far as hacking is concerned, I think it boils down to materialise > against value-per-call implementation, right? Not saying it's easy to > implement value-per-call support in plpgsql, but should the OP think > about what's involved, is that the track to follow? > > http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/utils/fmgr/README;hb=HEAD#l380 > http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=d9a319525591bc437e4770b4e796a7517844a784 > > The first link is the fmgr/README explaining the concepts, and the > second one is a recent enough patch dealing with materialise and > value-per-call in the context of SQL functions. I'll investigate. To be perfectly frank, I am probably overextending myself in doing so, because a) I am not familiar with the PostgreSQL codebase and b) Tom's admonition about the likely difficulty of doing this indicates that it is probably quite an involved task. I think it would be useful to articulate, in broad strokes, what this feature should look like, if not for my benefit, then for the benefit of whoever will eventually implement it (because, given the aspirations and momentum of the postgres community, and the obvious utility of what I've described, I think it's inevitable that *someone* will). Regards, Peter Geoghegan