Thread: Writing results while loop ongoing?

From
James David Smith
Date:
Hi there,

Having some problems today. Hopefully someone can help. The situation is...

I have a query that takes data from one of my tables, sends it to an
API, gets some data back, and then writes it to a results table.

The data comes from a table called 'stage_cleaned'
The query is called 'create_route_geometry_mapquest'
The results table is called 'route_geom'

The problem that I have been having is that if I send say 50 requests
to the API, and all but the 49th complete fine, none of my
results are written to the results table. It's all or nothing. To fix
this I've put the function call within a loop, identified when
the errors occur, and handled them in an exception block. It now works
fine and the function always goes through the entire data. However it only
writes the data to the results table at the end. Is this normal? Could
it not write the data to the results table after it's made each
request? Thoughts? Query below:

------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
DECLARE
  record_number RECORD;
BEGIN
  FOR record_number IN
    SELECT id FROM stage_cleaned
    WHERE google_mode = 'walking' AND route_geom IS NULL
    ORDER BY id LIMIT 5
  LOOP
    LOOP
      BEGIN
        PERFORM create_route_geometry_mapquest(
                  google_origin::text,
                  google_destination::text,
                  google_mode::text,
                  id::text
                ),
                Notice('did stage cleaned id number ' || id),
                pg_sleep(1)
        FROM stage_cleaned
        WHERE route_geom IS NULL
          AND google_mode = 'walking'
          AND id = record_number.id
        ORDER BY id;

        EXIT;  -- success: leave the retry loop for this id

      EXCEPTION
        WHEN SQLSTATE 'XX000' THEN
          NULL;  -- an exception handler needs a statement; swallow and retry
      END;
    END LOOP;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
----------------------------------------------------------------------------

Thanks

James


Re: Writing results while loop ongoing?

From
Merlin Moncure
Date:
On Wed, Sep 4, 2013 at 6:40 AM, James David Smith
<james.david.smith@gmail.com> wrote:
> Hi there,
>
> Having some problems today. Hopefully someone can help. The situation is...
>
> I have a query that takes data from one of my tables, sends it to an
> API, gets some data back, and then writes it to a results table.
>
> The data comes from a table called 'stage_cleaned'
> The query is called 'create_route_geometry_mapquest'
> The results table is called 'route_geom'
>
> The problem that I have been having is that if I send say 50 requests
> to the API, and all but the 49th complete fine, none of my
> results are written to the results table. It's all or nothing. To fix
> this I've put the function call within a loop, identified when
> the errors occur, and handled them in an exception block. It now works
> fine and the function always goes through the entire data. However it only
> writes the data to the results table at the end. Is this normal? Could
> it not write the data to the results table after it's made each
> request? Thoughts? Query below:

This is a fundamental principle of SQL: transaction completion is 'all
at once'.  This allows for a lot of simplifications in terms of data
management that I think you'll come to appreciate over time.  But
occasionally it can be annoying if you're trying to move as much
coding into the server as possible (which is sensible).

"Stored Procedures" (especially) and "Autonomous Transactions" are two
hypothetical features that would allow for server-side coding while
being able to manipulate transaction state independently of function
call execution.  These features are much discussed and sadly not
likely to show up soon.  In the meantime, you have two options:

1. Move the controlling code out of the server and into a client-side
language.  If you're like me, you'll find this distasteful and
silly.
2. Abuse dblink.  With dblink, you can connect the database to itself
and issue transactions that can commit before the controlling
transaction resolves.  This method has certain issues and annoyances
but can generally be made to work.  Autonomous transactions are
basically a formalization of this technique.

http://www.postgresql.org/docs/9.2/static/dblink.html
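
A rough sketch of the dblink approach, assuming a local loopback
connection; the connection string, function name, and results_log table
are hypothetical, not from James's schema:

```sql
-- Sketch only: each dblink_exec() runs on a separate connection, so its
-- work commits immediately, even if the calling transaction later
-- rolls back.
CREATE OR REPLACE FUNCTION process_one(p_id integer) RETURNS void AS $$
BEGIN
  PERFORM dblink_connect('loopback', 'dbname=mydb');  -- connect db to itself
  PERFORM dblink_exec('loopback',
    format('INSERT INTO results_log(id, done_at) VALUES (%s, now())', p_id));
  PERFORM dblink_disconnect('loopback');
END;
$$ LANGUAGE plpgsql;
```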

merlin


Re: Writing results while loop ongoing?

From
Kevin Grittner
Date:
James David Smith <james.david.smith@gmail.com> wrote:

> the function always goes through the entire data. However it
> only writes the data to the results table at the end. Is this
> normal?

It is normal that the work of a transaction is not visible until
and unless that transaction commits.  Execution of a function is
always part of a single transaction.

> Could it not write the data to the results table after it's made
> each request?

It does; but the data does not become visible outside the
transaction writing the data unless the transaction commits.

http://en.wikipedia.org/wiki/ACID#Atomicity
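
As an illustration of the visibility rule (the demo table name is made
up): the writing session sees its own uncommitted rows, while a second
session sees nothing until COMMIT:

```sql
-- Session 1
BEGIN;
INSERT INTO route_geom_demo VALUES (1);
SELECT count(*) FROM route_geom_demo;  -- 1: visible inside the transaction

-- Session 2 (concurrently, default READ COMMITTED)
SELECT count(*) FROM route_geom_demo;  -- 0: uncommitted work is invisible

-- Session 1
COMMIT;  -- only now does session 2 see the row
```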

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Writing results while loop ongoing?

From
James David Smith
Date:
On 4 September 2013 14:35, Kevin Grittner <kgrittn@ymail.com> wrote:
> James David Smith <james.david.smith@gmail.com> wrote:
>
>> the function always goes through the entire data. However it
>> only writes the data to the results table at the end. Is this
>> normal?
>
> It is normal that the work of a transaction is not visible until
> and unless that transaction commits.  Execution of a function is
> always part of a single transaction.
>
>> Could it not write the data to the results table after it's made
>> each request?
>
> It does; but the data does not become visible outside the
> transaction writing the data unless the transaction commits.
>
> http://en.wikipedia.org/wiki/ACID#Atomicity
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Thanks for the help Merlin and Kevin. I think I understand. dblink
looks like it might be the way to go, but to be honest it looks quite
complicated!  I think what I might do instead is to catch the
errors in EXCEPTION clauses. For example, the main reason my queries
fail is:

ERROR: R interpreter expression evaluation error
SQL state: 22000

So if I change my query to the below, it should just push on anyway I
think? Have I constructed it correctly?

James

------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
DECLARE
  record_number RECORD;
BEGIN
  FOR record_number IN
    SELECT id FROM stage_cleaned
    WHERE google_mode = 'walking' AND route_geom IS NULL
    ORDER BY id LIMIT 5
  LOOP
    LOOP
      BEGIN
        PERFORM create_route_geometry_mapquest(
                  google_origin::text,
                  google_destination::text,
                  google_mode::text,
                  id::text
                ),
                Notice('did stage cleaned id number ' || id),
                pg_sleep(1)
        FROM stage_cleaned
        WHERE route_geom IS NULL
          AND google_mode = 'walking'
          AND id = record_number.id
        ORDER BY id;

        EXIT;  -- success: leave the retry loop for this id

      EXCEPTION
        WHEN SQLSTATE '22000' THEN
          RAISE NOTICE 'There is an error, but hopefully I will continue anyway';
          PERFORM pg_sleep(60);  -- wait before retrying the same id
      END;
    END LOOP;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;


Re: Writing results while loop ongoing?

From
Merlin Moncure
Date:
On Wed, Sep 4, 2013 at 12:48 PM, James David Smith
<james.david.smith@gmail.com> wrote:
> On 4 September 2013 14:35, Kevin Grittner <kgrittn@ymail.com> wrote:
>> James David Smith <james.david.smith@gmail.com> wrote:
>>
>>> the function always goes through the entire data. However it
>>> only writes the data to the results table at the end. Is this
>>> normal?
>>
>> It is normal that the work of a transaction is not visible until
>> and unless that transaction commits.  Execution of a function is
>> always part of a single transaction.
>>
>>> Could it not write the data to the results table after it's made
>>> each request?
>>
>> It does; but the data does not become visible outside the
>> transaction writing the data unless the transaction commits.
>>
>> http://en.wikipedia.org/wiki/ACID#Atomicity
>>
>> --
>> Kevin Grittner
>> EDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
>
> Thanks for the help Merlin and Kevin. I think I understand. dblink
> looks like it might be the way to go, but to be honest it looks quite
> complicated!  I think what I might do instead is to catch the
> errors in EXCEPTION clauses. For example, the main reason my queries
> fail is:
>
> ERROR: R interpreter expression evaluation error
> SQL state: 22000
>
> So if I change my query to the below, it should just push on anyway I
> think? Have I constructed it correctly?

Yes, but that's not the question you asked. dblink is a mechanism to
write out the data 'mid-transaction' -- not necessarily error control.
If all you care about is error suppression in the loop, then an
exception block should fit the bill.  Be advised that since you're
calling PL/R, any side effects there (such as writing to a file) may
not necessarily be rolled back with the transaction.

I would consider dropping the sleep call unless there is a good reason
for it to be there, and would consider putting the contents of the
error message in the RAISE, which I would additionally consider
raising to 'warning' (which puts stronger emphasis on it entering the
log).
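
The RAISE suggestion above might look something like this in the
exception handler (SQLERRM and SQLSTATE are the standard PL/pgSQL error
variables; the message layout is just one possibility):

```sql
-- Inside the EXCEPTION block: log the actual error text at WARNING
-- level instead of a fixed NOTICE, so the real cause reaches the log.
EXCEPTION
  WHEN SQLSTATE '22000' THEN
    RAISE WARNING 'id %: % (SQLSTATE %)',
      record_number.id, SQLERRM, SQLSTATE;
    PERFORM pg_sleep(60);
```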

Also, I would strongly consider indenting your code.  For pl/pgsql, I
typically go with two-space indents and absolutely forbid the use of
tabs because they blow up psql when pasting.

BTW, that R error is fairly generic.  Do you have control over
the R code? I can give you some tips on how to track it down.

merlin

> ------------------------------------------------------------------------
> CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
> DECLARE
> record_number RECORD;
> BEGIN
> FOR record_number IN SELECT id FROM stage_cleaned WHERE google_mode =
> 'walking' AND route_geom IS NULL ORDER BY id LIMIT 5
> LOOP
> LOOP
> BEGIN
>
> PERFORM create_route_geometry_mapquest(
> google_origin::text,
> google_destination::text,
> google_mode::text,
> id::text
> ),
> Notice('did stage cleaned id number ' || id ),
> pg_sleep(1)
> FROM stage_cleaned
> WHERE route_geom IS NULL
> AND google_mode = 'walking'
> AND id = record_number.id
> ORDER BY id;
>
> EXIT;
>
> EXCEPTION
> WHEN SQLSTATE '22000' THEN
> RAISE NOTICE 'There is an error, but hopefully I will continue anyway';
> PERFORM pg_sleep(60);
> END;
> END LOOP;
> END LOOP;
> RETURN;
> END;
> $$ LANGUAGE plpgsql;
>
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice


Re: Writing results while loop ongoing?

From
James David Smith
Date:
On 4 September 2013 19:48, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Sep 4, 2013 at 12:48 PM, James David Smith
> <james.david.smith@gmail.com> wrote:
>> On 4 September 2013 14:35, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> James David Smith <james.david.smith@gmail.com> wrote:
>>>
>>>> the function always goes through the entire data. However it
>>>> only writes the data to the results table at the end. Is this
>>>> normal?
>>>
>>> It is normal that the work of a transaction is not visible until
>>> and unless that transaction commits.  Execution of a function is
>>> always part of a single transaction.
>>>
>>>> Could it not write the data to the results table after it's made
>>>> each request?
>>>
>>> It does; but the data does not become visible outside the
>>> transaction writing the data unless the transaction commits.
>>>
>>> http://en.wikipedia.org/wiki/ACID#Atomicity
>>>
>>> --
>>> Kevin Grittner
>>> EDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>
>>
>> Thanks for the help Merlin and Kevin. I think I understand. dblink
>> looks like it might be the way to go, but to be honest it looks quite
>> complicated!  I think what I might do instead is to catch the
>> errors in EXCEPTION clauses. For example, the main reason my queries
>> fail is:
>>
>> ERROR: R interpreter expression evaluation error
>> SQL state: 22000
>>
>> So if I change my query to the below, it should just push on anyway I
>> think? Have I constructed it correctly?
>
> Yes, but that's not the question you asked. dblink is a mechanism to
> write out the data 'mid-transaction' -- not necessarily error control.
> If all you care about is error suppression in the loop, then an
> exception block should fit the bill.  Be advised that since you're
> calling PL/R, any side effects there (such as writing to a file) may
> not necessarily be rolled back with the transaction.
>
> I would consider dropping the sleep call unless there is a good reason
> for it to be there, and would consider putting the contents of the
> error message in the RAISE, which I would additionally consider
> raising to 'warning' (which puts stronger emphasis on it entering the
> log).
>
> Also, I would strongly consider indenting your code.  For pl/pgsql, I
> typically go with two-space indents and absolutely forbid the use of
> tabs because they blow up psql when pasting.
>
> BTW, that R error is a fairly generic error.  Do you have control over
> the R code? I can give you some tips on how to trace it down.
>
> merlin
>
>> ------------------------------------------------------------------------
>> CREATE OR REPLACE FUNCTION routing_loop() RETURNS VOID AS $$
>> DECLARE
>> record_number RECORD;
>> BEGIN
>> FOR record_number IN SELECT id FROM stage_cleaned WHERE google_mode =
>> 'walking' AND route_geom IS NULL ORDER BY id LIMIT 5
>> LOOP
>> LOOP
>> BEGIN
>>
>> PERFORM create_route_geometry_mapquest(
>> google_origin::text,
>> google_destination::text,
>> google_mode::text,
>> id::text
>> ),
>> Notice('did stage cleaned id number ' || id ),
>> pg_sleep(1)
>> FROM stage_cleaned
>> WHERE route_geom IS NULL
>> AND google_mode = 'walking'
>> AND id = record_number.id
>> ORDER BY id;
>>
>> EXIT;
>>
>> EXCEPTION
>> WHEN SQLSTATE '22000' THEN
>> RAISE NOTICE 'There is an error, but hopefully I will continue anyway';
>> PERFORM pg_sleep(60);
>> END;
>> END LOOP;
>> END LOOP;
>> RETURN;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>>

Hi Merlin,

Thanks for the reply again.

Yes, you are right. I started off looking for ways to write out my
results mid-query, but actually it's the errors making the function fail
that are my main concern. They're kind of related. If I could catch
all the errors and just get the function to keep running, that would
accomplish the same result for me, i.e. not losing data from queries
that ran successfully.

Regarding dropping pg_sleep, this is there so as not to overload the
API servers that I am running the query against. The second pg_sleep
in particular is there because I believe the generic PL/R error I get
sometimes is due to the server being overloaded. If I run the same
data a few minutes later it works fine. Hence when I get the PL/R
error, I put the query on pause for 60 seconds before it tries
again.
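
One risk with the retry-on-error structure is that a permanently
failing row retries forever. A hypothetical attempt counter bounds
this; the variable name and the limit of 3 are made up for
illustration, not part of the original function:

```sql
-- Sketch: bound retries so a permanently failing id cannot loop forever.
DECLARE
  attempts integer := 0;
BEGIN
  LOOP
    BEGIN
      -- ... call create_route_geometry_mapquest here ...
      EXIT;  -- success: stop retrying this id
    EXCEPTION
      WHEN SQLSTATE '22000' THEN
        attempts := attempts + 1;
        EXIT WHEN attempts >= 3;   -- give up on this id after 3 tries
        PERFORM pg_sleep(60);      -- back off before retrying
    END;
  END LOOP;
END;
```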

Yes, I do have control over the R code, and tips on how to refine the
error would be appreciated. I wrote the R code and PL/R function by
scratching together examples from a few places, with some help from
Mark on the PostGIS mailing list, and then some heavy tweaking myself.
What makes it frustrating is that if I run the row of data that errored
again 5 minutes later, it works fine. It also works if I run it just
in R.

Thanks

James