Thread: Avoid long-running transactions in a long-running stored procedure?
Once per quarter, we need to load a lot of data, which causes many updates across the database. We have an online transaction processing-style application, which we really want to stay up during the update job.
The programmer coded a stored procedure which does the job well … logically. But as a single PL/pgSQL stored procedure, it is one long-running transaction. At least, that is my interpretation of http://www.postgresql.org/docs/8.0/interactive/plpgsql-porting.html#CO.PLPGSQL-PORTING-COMMIT – and in fact, we do get errors when we try little BEGIN-COMMIT blocks inside a stored procedure.
A single long-running transaction would be bad in production. A long run time = OK, but long-running transaction = site outage.
So I’m asking for advice on whether I can break this into small transactions without too much of a rewrite. Roughly, the algorithm is:
(1) One job dumps the data from the external source into a load table.
(2) Another job calls the stored procedure, which uses a cursor to traverse the load table. A loop for each record:
a. Processes a lot of special cases, with inserts and/or updates to many tables.
Unless this can be done within PL/pgSQL, I will have the programmer refactor job (2) so that the loop is in a Java program, and the “normalization” logic in (a) – the guts of the loop – remains in a smaller stored procedure. The Java loop will call that stored procedure once per row of the load table, each call in a separate transaction. That would both preserve the bulk of the PL/pgSQL code and keep the normalization logic close to the data. So the runtime will be reasonable, probably somewhat longer than his single monolithic stored procedure, but the transactions will be short.
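A minimal sketch of the driver loop described above, using plain JDBC. The procedure name `normalize_load_row`, the class name, and the idea of passing load-table row ids are assumptions made for illustration; none of them come from the actual code base. The sketch commits after every row and rolls back only the failing row, so no transaction lasts longer than one call to the loop body.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class QuarterlyLoadDriver {

    /** The work done for one load-table row inside one transaction. */
    interface RowWork {
        void apply(Connection conn, long loadRowId) throws SQLException;
    }

    /**
     * Hypothetical: normalize_load_row(bigint) stands in for the smaller
     * stored procedure that holds the existing loop body.
     */
    static final RowWork CALL_PROCEDURE = (conn, loadRowId) -> {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT normalize_load_row(?)")) {
            ps.setLong(1, loadRowId);
            ps.execute();
        }
    };

    /**
     * Runs the work once per row id, committing after each row.
     * A failing row is rolled back and counted; the loop then continues.
     * Returns the number of rows that failed.
     */
    static int processEachInOwnTransaction(Connection conn,
                                           List<Long> loadRowIds,
                                           RowWork work) throws SQLException {
        conn.setAutoCommit(false); // explicit transaction boundaries
        int failed = 0;
        for (long rowId : loadRowIds) {
            try {
                work.apply(conn, rowId);
                conn.commit();     // transaction ends here, locks are released
            } catch (SQLException e) {
                conn.rollback();   // undo only this row's changes
                failed++;
            }
        }
        return failed;
    }
}
```

A caller would open a `Connection` with its own JDBC URL, read the row ids from the load table, and pass `QuarterlyLoadDriver.CALL_PROCEDURE` as the work. Whether a failed row should be skipped, retried, or should stop the job is a policy choice that this sketch does not settle.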
We don’t need anything like SERIALIZABLE transaction isolation of the online system from the entire load job.
Thanks for any ideas,
David Crane
DonorsChoose.org
David,

> Once per quarter, we need to load a lot of data, which causes many
> updates across the database. We have an online transaction
> processing-style application, which we really want to stay up during the
> update job.

What you're talking about is "autonomous transactions". There's someone working on them for 8.4, and we may get them next version, but you can't have them now.

However, you can write your stored procedures in an external language (like PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in order to run several separate transactions. Several users are doing this for large ETL jobs.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
On Thu, 2008-02-14 at 17:29 -0800, Josh Berkus wrote:
> David,
>
> > Once per quarter, we need to load a lot of data, which causes many
> > updates across the database. We have an online transaction
> > processing-style application, which we really want to stay up during the
> > update job.
> However, you can write your stored procedures in an external language (like
> PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in
> order to run several separate transactions. Several users are doing this
> for large ETL jobs.

I actually do it externally via a Perl script even, and I'm breaking the data down to even more than minuscule size.
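One way to read "breaking the data down" is to split the pending row ids into small batches and commit once per batch, trading slightly longer transactions for fewer commits. The helper below is only a sketch of that splitting step; the class name and the idea of batching by row id are assumptions, not a description of the Perl script mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * The last batch may be smaller. Order is preserved.
     */
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With a batch size of 1 this degenerates to one transaction per row; larger sizes reduce commit overhead but hold locks for longer, so the right size depends on how much blocking the online application can tolerate.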
Thanks for the prompt replies!

It sounds like these are variations of the same approach. In our case, we need to do a lot of comparing against the old data, audit tables and so forth, so the bulk of the work is in the body of the existing loop (already coded). So I think keeping that loop body in a stand-alone stored procedure will be the most efficient for us. And we'll port the logic outside the loop into a Java program, which is easier for us to schedule through another existing system.

Those autonomous transactions are gonna be nice, but PostgreSQL is plenty nice as it is. Progress is good, though.

Thanks,
David Crane

-----Original Message-----
From: Ow Mun Heng [mailto:Ow.Mun.Heng@wdc.com]
Sent: Thursday, February 14, 2008 8:31 PM
To: josh@agliodbs.com
Cc: pgsql-performance@postgresql.org; David Crane
Subject: Re: [PERFORM] Avoid long-running transactions in a long-running stored procedure?

On Thu, 2008-02-14 at 17:29 -0800, Josh Berkus wrote:
> David,
>
> > Once per quarter, we need to load a lot of data, which causes many
> > updates across the database. We have an online transaction
> > processing-style application, which we really want to stay up during the
> > update job.
> However, you can write your stored procedures in an external language (like
> PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in
> order to run several separate transactions. Several users are doing this
> for large ETL jobs.

I actually do it externally via a Perl script even, and I'm breaking the data down to even more than minuscule size.