Once per quarter we need to load a large batch of data, which drives many updates across the database. The application is OLTP-style, and we really want it to stay up while the load job runs.
Our programmer wrote a stored procedure that does the job well, logically speaking. But as a single PL/pgSQL stored procedure, it runs as one long transaction. At least, that is my interpretation of http://www.postgresql.org/docs/8.0/interactive/plpgsql-porting.html#CO.PLPGSQL-PORTING-COMMIT – and in fact, we do get errors when we try little BEGIN-COMMIT blocks inside the stored procedure.
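To make that concrete, here is a minimal sketch of the sort of thing that fails for us (the function and table names are invented for illustration). In PL/pgSQL, BEGIN opens a block, not a transaction, and an explicit COMMIT is rejected because the function runs entirely inside its caller's transaction:

    CREATE TABLE some_table (touched boolean);

    CREATE OR REPLACE FUNCTION commit_inside() RETURNS void AS $$
    BEGIN   -- opens a PL/pgSQL block, not a transaction
        UPDATE some_table SET touched = true;
        COMMIT;   -- rejected: the function cannot end its caller's transaction
    END;
    $$ LANGUAGE plpgsql;

    SELECT commit_inside();
    -- fails with an error along the lines of:
    -- ERROR:  cannot begin/end transactions in PL/pgSQL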
A single long-running transaction would be bad in production: it would hold its row locks for the duration and keep VACUUM from reclaiming dead rows, blocking the OLTP traffic. A long run time is OK, but a long-running transaction means a site outage.
So I’m asking for advice on whether I can break this into small transactions without too much of a rewrite. Roughly, the algorithm is:
(1) One job dumps the data from the external source into a load table.
(2) Another job calls the stored procedure, which uses a cursor to walk the load table. For each record, the loop:
a. processes a lot of special cases, with inserts and/or updates to many tables (a sketch of this structure follows the list).
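For concreteness, the current procedure is shaped roughly like this – every table, column, and function name below is invented, and the loop body just stands in for the real normalization logic:

    CREATE OR REPLACE FUNCTION process_load() RETURNS void AS $$
    DECLARE
        rec load_table%ROWTYPE;
    BEGIN
        FOR rec IN SELECT * FROM load_table LOOP
            -- (a) normalization: lots of special cases, with inserts
            -- and/or updates to many tables, e.g. update-then-insert:
            UPDATE target_table
               SET amount = rec.amount
             WHERE external_id = rec.external_id;
            IF NOT FOUND THEN
                INSERT INTO target_table (external_id, amount)
                VALUES (rec.external_id, rec.amount);
            END IF;
        END LOOP;
        -- the whole call is a single transaction; nothing commits
        -- until the function returns
    END;
    $$ LANGUAGE plpgsql;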
Unless this can be done within PL/pgSQL, I will have the programmer refactor job (2) so that the loop lives in a Java program, while the “normalization” logic in (a) – the guts of the loop – remains in a smaller stored procedure. The Java loop will call that stored procedure once per row of the load table, each call in a separate transaction. That would both preserve the bulk of the PL/pgSQL code and keep the normalization logic close to the data. The runtime should still be reasonable, probably somewhat longer than his single monolithic stored procedure, but the transactions will be short.
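In SQL terms, the refactoring would look something like this (names invented again). The smaller procedure normalizes exactly one load-table row; the Java loop, running with autocommit on so that each statement is its own transaction, just issues one call per row:

    -- the smaller per-row procedure
    CREATE OR REPLACE FUNCTION process_load_row(p_load_id integer)
    RETURNS void AS $$
    DECLARE
        rec load_table%ROWTYPE;
    BEGIN
        SELECT * INTO rec FROM load_table WHERE load_id = p_load_id;
        -- ... the same normalization logic as in (a) ...
        -- flag the row so an interrupted run can be resumed
        UPDATE load_table SET processed = true
         WHERE load_id = p_load_id;
    END;
    $$ LANGUAGE plpgsql;

    -- what the Java loop issues for each row, one transaction apiece:
    SELECT process_load_row(42);

A side benefit of the assumed processed flag (or of deleting each row once handled): if the job dies partway through, it can be restarted and skip the rows already committed.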
We don’t need anything like SERIALIZABLE isolation between the online system and the load job as a whole; it is fine for the online system to see the load’s changes appear row by row as each small transaction commits.
Thanks for any ideas,
David Crane
DonorsChoose.org