Thread: Writing results while loop ongoing?
Hi there,

Having some problems today. Hopefully someone can help. The situation is...

I have a query that takes data from one of my tables, sends it to an API, gets some data back, and then writes it to a results table.

The data comes from a table called 'stage_cleaned'
The function is called 'create_route_geometry_mapquest'
The results table is called 'route_geom'

The problem that I have been having is that if I send say 50 requests to the API, and all but the 49th complete fine, none of my results are written to the results table. It's all or nothing. To fix this I've tried to put the function within a loop, identified when the errors occur, and handled them as exceptions. It now works fine and the function always goes through the entire data. However it only writes the data to the results table at the end. Is this normal? Could it not write the data to the results table after it's made each request? Thoughts? Query below:

------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
DECLARE
  record_number RECORD;
BEGIN
  FOR record_number IN
    SELECT id FROM stage_cleaned
    WHERE google_mode = 'walking' AND route_geom IS NULL
    ORDER BY id LIMIT 5
  LOOP
    LOOP
      BEGIN
        PERFORM create_route_geometry_mapquest(
                  google_origin::text,
                  google_destination::text,
                  google_mode::text,
                  id::text
                ),
                Notice('did stage cleaned id number ' || id),
                pg_sleep(1)
        FROM stage_cleaned
        WHERE route_geom IS NULL
          AND google_mode = 'walking'
          AND id = record_number.id
        ORDER BY id;

        EXIT;

      EXCEPTION
        WHEN SQLSTATE 'XX000' THEN
          NULL;  -- swallow the error and let the inner loop retry
      END;
    END LOOP;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
----------------------------------------------------------------------------

Thanks

James
On Wed, Sep 4, 2013 at 6:40 AM, James David Smith
<james.david.smith@gmail.com> wrote:
> The problem that I have been having is that if I send say 50 requests
> to the API, and all but the 49th complete fine, none of my results
> are written to the results table. It's all or nothing. [...]
> It now works fine and the function always goes through the entire
> data. However it only writes the data to the results table at the
> end. Is this normal? Could it not write the data to the results
> table after it's made each request?

This is a fundamental principle of SQL: transaction completion is 'all at once'. This allows for a lot of simplifications in terms of data management that I think you'll come to appreciate over time. But occasionally it can be annoying if you're trying to move as much coding into the server as possible (which is sensible).

"Stored Procedures" (especially) and "Autonomous Transactions" are two hypothetical features that would allow for server-side coding while being able to manipulate transaction state independently of function call execution. These features are much discussed and sadly not likely to show up soon. In the meantime, you have two options:

1. Move the controlling code out of the server and into a client-side language. If you're like me, you'll find this distasteful and silly.

2. Abuse dblink. With dblink, you can connect the database to itself and issue transactions.
Those transactions can commit before the controlling transaction resolves. This method has certain issues and annoyances but can generally be made to work. Autonomous Transactions are basically a formalization of this technique.

http://www.postgresql.org/docs/9.2/static/dblink.html

merlin
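A minimal sketch of the dblink approach Merlin describes, adapted to James's table. This is illustrative only: it assumes the dblink extension is installed, that a loopback connection to the current database authenticates without a password, and that `routing_loop_dblink` is a made-up name. Each statement sent over the dblink connection runs and commits in its own transaction, so each result row becomes visible even if a later iteration fails:

------------------------------------------------------------------------
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION routing_loop_dblink() RETURNS void AS $$
DECLARE
  rec RECORD;
BEGIN
  -- connect the database back to itself
  PERFORM dblink_connect('loopback', 'dbname=' || current_database());

  FOR rec IN
    SELECT id FROM stage_cleaned
    WHERE google_mode = 'walking' AND route_geom IS NULL
    ORDER BY id LIMIT 5
  LOOP
    BEGIN
      -- runs in its own remote transaction, which autocommits
      PERFORM * FROM dblink('loopback',
        format('SELECT create_route_geometry_mapquest(
                  google_origin::text, google_destination::text,
                  google_mode::text, id::text)
                FROM stage_cleaned WHERE id = %s', rec.id)
      ) AS t(result text);
    EXCEPTION WHEN OTHERS THEN
      RAISE WARNING 'id % failed: %', rec.id, SQLERRM;
    END;
    PERFORM pg_sleep(1);
  END LOOP;

  PERFORM dblink_disconnect('loopback');
END;
$$ LANGUAGE plpgsql;
------------------------------------------------------------------------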
James David Smith <james.david.smith@gmail.com> wrote:

> the function always goes through the entire data. However it
> only writes the data to the results table at the end. Is this
> normal?

It is normal that the work of a transaction is not visible until and unless that transaction commits. Execution of a function is always part of a single transaction.

> Could it not write the data to the results table after it's made
> each request?

It does; but the data does not become visible outside the transaction writing the data unless the transaction commits.

http://en.wikipedia.org/wiki/ACID#Atomicity

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
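Kevin's point can be demonstrated with a minimal example (table and function names made up for illustration): rows inserted inside a function vanish if the function later raises, because the whole thing is one transaction:

------------------------------------------------------------------------
CREATE TABLE demo_results (n int);

CREATE OR REPLACE FUNCTION insert_then_fail() RETURNS void AS $$
BEGIN
  INSERT INTO demo_results VALUES (1), (2);
  RAISE EXCEPTION 'boom';  -- aborts the whole transaction
END;
$$ LANGUAGE plpgsql;

SELECT insert_then_fail();        -- errors out
SELECT count(*) FROM demo_results; -- 0: both inserts were rolled back
------------------------------------------------------------------------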
On 4 September 2013 14:35, Kevin Grittner <kgrittn@ymail.com> wrote:
> It is normal that the work of a transaction is not visible until
> and unless that transaction commits. Execution of a function is
> always part of a single transaction.
>
> It does; but the data does not become visible outside the
> transaction writing the data unless the transaction commits.
>
> http://en.wikipedia.org/wiki/ACID#Atomicity

Thanks for the help Merlin and Kevin. I think I understand. dblink looks like it might be the way to go, but to be honest it looks quite complicated! I think what I might do instead is try to catch the errors in EXCEPTION clauses. For example, the main reason my queries fail is:

ERROR: R interpreter expression evaluation error
SQL state: 22000

So if I change my query to the below, then it should just push on anyway, I think? Have I constructed it correctly?
James

------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
DECLARE
  record_number RECORD;
BEGIN
  FOR record_number IN
    SELECT id FROM stage_cleaned
    WHERE google_mode = 'walking' AND route_geom IS NULL
    ORDER BY id LIMIT 5
  LOOP
    LOOP
      BEGIN
        PERFORM create_route_geometry_mapquest(
                  google_origin::text,
                  google_destination::text,
                  google_mode::text,
                  id::text
                ),
                Notice('did stage cleaned id number ' || id),
                pg_sleep(1)
        FROM stage_cleaned
        WHERE route_geom IS NULL
          AND google_mode = 'walking'
          AND id = record_number.id
        ORDER BY id;

        EXIT;

      EXCEPTION
        WHEN SQLSTATE '22000' THEN
          RAISE NOTICE 'There is an error, but hopefully I will continue anyway';
          PERFORM pg_sleep(60);
      END;
    END LOOP;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
On Wed, Sep 4, 2013 at 12:48 PM, James David Smith
<james.david.smith@gmail.com> wrote:
> Thanks for the help Merlin and Kevin. I think I understand. dblink
> looks like it might be the way to go, but to be honest it looks quite
> complicated! I think what I might do instead is try to catch the
> errors in EXCEPTION clauses. For example, the main reason my queries
> fail is:
>
> ERROR: R interpreter expression evaluation error
> SQL state: 22000
>
> So if I change my query to the below, then it should just push on
> anyway, I think? Have I constructed it correctly?

Yes, but that's not the question you asked. dblink is a mechanism to write out the data 'mid transaction' -- not necessarily error control. If all you care about is error suppression in the loop, then the exception block should fit the bill. Be advised that since you're calling pl/r, any side effects there (such as writing to a file) may not necessarily be rolled back with the transaction.
I would consider dropping the sleep call unless there is a good reason for it to be there, and would consider putting the contents of the error message in the RAISE, which I would additionally consider raising to 'warning' (which puts stronger emphasis on it entering the log).

Also, I would strongly consider indenting your code. For pl/pgsql, I typically go with two-space indents and absolutely forbid the use of tabs, because they blow up psql when pasting.

BTW, that R error is a fairly generic one. Do you have control over the R code? I can give you some tips on how to trace it down.

merlin

> [revised routing_loop() function snipped]
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
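Merlin's two RAISE suggestions (log the actual error text, and raise at WARNING level) could look something like this in James's exception handler. SQLERRM and SQLSTATE are the standard pl/pgsql variables that carry the message and code of the error just caught; the message wording here is only an example:

------------------------------------------------------------------------
      EXCEPTION
        WHEN SQLSTATE '22000' THEN
          -- SQLERRM holds the text of the error just caught
          RAISE WARNING 'create_route_geometry_mapquest failed for id %: % (SQLSTATE %)',
            record_number.id, SQLERRM, SQLSTATE;
          PERFORM pg_sleep(60);
      END;
------------------------------------------------------------------------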
On 4 September 2013 19:48, Merlin Moncure <mmoncure@gmail.com> wrote:
> Yes, but that's not the question you asked. dblink is a mechanism to
> write out the data 'mid transaction' -- not necessarily error
> control. If all you care about is error suppression in the loop,
> then the exception block should fit the bill. Be advised that since
> you're calling pl/r, any side effects there (such as writing to a
> file) may not necessarily be rolled back with the transaction.
>
> I would consider dropping the sleep call unless there is a good
> reason for it to be there, and would consider putting the contents
> of the error message in the RAISE, which I would additionally
> consider raising to 'warning' (which puts stronger emphasis on it
> entering the log).
>
> BTW, that R error is a fairly generic one. Do you have control over
> the R code? I can give you some tips on how to trace it down.

Hi Merlin,

Thanks for the reply again. Yes, you are right. I started off looking for ways to write out my results mid-query, but actually it's the errors making the function fail that are my main concern. They're kind of related. If I could handle all the errors and just get the function to keep running, that would accomplish the same result for me, i.e. not losing data from queries that ran successfully.

Regarding dropping pg_sleep: it is there so as not to overload the API servers that I am running the query against. The second pg_sleep in particular is there because I believe the generic PL/R error I sometimes get is due to the server being overloaded. If I run the same data a few minutes later it works fine. Hence when I get the PL/R error, I put the query on pause for 60 seconds before it tries to start again.

Yes, I do have control over the R code. Tips on how to refine the error would be appreciated. I wrote the R code and PL/R function by scratching together examples from a few places, with some help from Mark on the PostGIS mailing list, and then some heavy tweaking myself. What makes it frustrating is that if I run the row of data that produced the error 5 minutes later, it works fine. It also works if I run it just in R.

Thanks

James