Thread: Avoid long-running transactions in a long-running stored procedure?

From
"David Crane"
Date:

Once per quarter, we need to load a lot of data, which causes many updates across the database.  We have an online transaction processing-style application, which we really want to stay up during the update job.

The programmer coded a stored procedure which does the job well … logically.  But as a single PL/pgSQL stored procedure, it is one long-running transaction.  At least, that is my interpretation of http://www.postgresql.org/docs/8.0/interactive/plpgsql-porting.html#CO.PLPGSQL-PORTING-COMMIT – and in fact, we do get errors when we try little BEGIN-COMMIT blocks inside a stored procedure.

A single long-running transaction would be bad in production.  A long run time = OK, but long-running transaction = site outage.

So I’m asking for advice on whether I can break this into small transactions without too much of a rewrite.  Roughly, the algorithm is:

(1) One job dumps the data from the external source into a load table.

(2) Another job calls the stored procedure, which uses a cursor to traverse the load table.  A loop for each record:

    a. Processes a lot of special cases, with inserts and/or updates to many tables.

Unless this can be done within PL/pgSQL, I will have the programmer refactor job (2) so that the loop is in a Java program, and the “normalization” logic in (a) – the guts of the loop – remains in a smaller stored procedure.  The Java loop will call that stored procedure once per row of the load table, each call in a separate transaction.  That would both preserve the bulk of the PL/pgSQL code and keep the normalization logic close to the data.  So the runtime will be reasonable, probably somewhat longer than his single monolithic stored procedure, but the transactions will be short.
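
A minimal sketch of such a driver loop over JDBC, under stated assumptions: `normalize_load_row(bigint)` is a hypothetical name for the extracted loop body, the class and method names are made up for illustration, and the load-table ids are assumed to have been read into memory beforehand (an ordinary cursor would not survive the per-row commits). It is not the programmer's actual code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PerRowLoader {

    /**
     * Calls the per-row stored procedure once for each load-table id and
     * commits after every call, so no transaction outlives a single row.
     * Returns the ids whose call failed and was rolled back.
     */
    static List<Long> processAll(Connection conn, List<Long> loadIds) throws SQLException {
        List<Long> failed = new ArrayList<>();
        conn.setAutoCommit(false); // commit explicitly, once per row
        // Hypothetical function name; stands in for the existing loop body.
        try (PreparedStatement call = conn.prepareStatement("SELECT normalize_load_row(?)")) {
            for (long id : loadIds) {
                try {
                    call.setLong(1, id);
                    call.execute();
                    conn.commit();   // ends this row's transaction
                } catch (SQLException e) {
                    conn.rollback(); // discards only this row's work
                    failed.add(id);
                }
            }
        }
        return failed;
    }
}
```

With a real connection (for example one obtained from `DriverManager.getConnection`), the caller would first fetch the load-table ids in a short query and then pass them to `processAll`; the returned ids can be logged and retried later.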

We don’t need anything like SERIALIZABLE transaction isolation of the online system from the entire load job.

Thanks for any ideas,

David Crane

DonorsChoose.org

Re: Avoid long-running transactions in a long-running stored procedure?

From
Josh Berkus
Date:
David,

> Once per quarter, we need to load a lot of data, which causes many
> updates across the database.  We have an online transaction
> processing-style application, which we really want to stay up during the
> update job.

What you're talking about is "autonomous transactions".  There's someone
working on them for 8.4, and we may get them next version, but you can't
have them now.

However, you can write your stored procedures in an external language (like
PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in
order to run several separate transactions.  Several users are doing this
for large ETL jobs.


--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Avoid long-running transactions in a long-running stored procedure?

From
Ow Mun Heng
Date:
On Thu, 2008-02-14 at 17:29 -0800, Josh Berkus wrote:
> David,
>
> > Once per quarter, we need to load a lot of data, which causes many
> > updates across the database.  We have an online transaction
> > processing-style application, which we really want to stay up during the
> > update job.
> However, you can write your stored procedures in an external language (like
> PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in
> order to run several separate transactions.  Several users are doing this
> for large ETL jobs.
>

I actually do it externally, via a Perl script, and I'm breaking the
data down into pieces that are even smaller than minuscule.
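
As an illustration of breaking the work into small pieces, here is a sketch of splitting a list of row ids into fixed-size batches, each of which an external script could then process and commit on its own. It is written in Java rather than Perl to match the Java driver discussed in this thread; the class name, method name, and the idea of batching by id are assumptions, not a description of the actual script.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPlanner {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static List<List<Long>> partition(List<Long> ids, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

The batch size trades transaction length against per-commit overhead: a size of 1 gives one transaction per row, while larger sizes commit less often but hold locks longer.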

Re: Avoid long-running transactions in a long-running stored procedure?

From
"David Crane"
Date:
Thanks for the prompt replies!

It sounds like these are variations of the same approach.  In our case,
we need to do a lot of comparing against the old data, audit tables and
so forth, so the bulk of the work is in the body of the existing loop
(already coded).  So I think keeping that loop body in a stand-alone
stored procedure will be the most efficient for us.  And we'll port the
logic outside the loop into a java program, easier for us to schedule
through another existing system.

Those autonomous transactions are gonna be nice, but PostgreSQL is
plenty nice as it is.  Progress is good, though.

Thanks,
David Crane

-----Original Message-----
From: Ow Mun Heng [mailto:Ow.Mun.Heng@wdc.com]
Sent: Thursday, February 14, 2008 8:31 PM
To: josh@agliodbs.com
Cc: pgsql-performance@postgresql.org; David Crane
Subject: Re: [PERFORM] Avoid long-running transactions in a
long-running stored procedure?


On Thu, 2008-02-14 at 17:29 -0800, Josh Berkus wrote:
> David,
>
> > Once per quarter, we need to load a lot of data, which causes many
> > updates across the database.  We have an online transaction
> > processing-style application, which we really want to stay up during the
> > update job.
> However, you can write your stored procedures in an external language (like
> PL/Perl, PL/Ruby, PL/Java or PL/Python) and re-connect to your database in
> order to run several separate transactions.  Several users are doing this
> for large ETL jobs.
>

I actually do it externally, via a Perl script, and I'm breaking the
data down into pieces that are even smaller than minuscule.