Re: in Pl/PgSQL, do commit every 5000 records - Mailing list pgsql-general

From Florian G. Pflug
Subject Re: in Pl/PgSQL, do commit every 5000 records
Date
Msg-id 4412D33A.4070503@phlo.org
Whole thread Raw
In response to in Pl/PgSQL, do commit every 5000 records  (Emi Lu <emilu@encs.concordia.ca>)
Responses Re: in Pl/PgSQL, do commit every 5000 records
List pgsql-general
Emi Lu wrote:
 >> Florian G. Pflug wrote:
 >> < snipped code of stored procedure >
 >>>> Are you aware of the "insert into <table> (<field1>, ..., <fieldn>)
 >>>> select <val1>, .., <valn> from ...."
 >>>> command? It'd be much faster to use that it it's possible...
 >>>>
 >>>
 >>> It did faster. Thank you Florian. Could you hint me why "insert into
 >>> .. select " is faster than a cursor transaction please?
 >>
 >> Well, you're avoiding a lot of overhead. "insert into ... select from .."
 >> is just one sql-statement. Of course, postgres internally does
 >> something similar to your stored procedure, but it's all compiled
 >> C code now (instead of interpreted plpgsql). Additionally, postgres
 >> might be able to optimize this more than you could from plpgsql, because
 >> you're restricted to the api that is exposed to plpgsql, while the
 >> backend-code
 >> might be able to "pull a few more tricks".
 >>
 >> In general, if you have the choice between looping over a large result
 >> in a stored procedure (or, even worse, in a client app) and letting the
 >> backend do the looping, then letting the backend handle it is nearly
 >> always
 >> faster.
 >
 > The information are very helpful! Thank you again Florian.
 >
 > If now, I have a series of queries to be run:
 >
 > 1. "insert into t1... (select .. from ...left join ... .. where ....) "
 > 2. "insert into t2 ... the same sub-query as in 1 "
 > 3. "update t3 set ... from ( the same sub-query as in 1) AS X where
 > t3.pk = X.pk " 4. "update t4 set ... from ( the same sub-query as in 1)
 > AS X where t4.pk = X.pk"
 >
 > . the subquery (select .. from ...left join ... .. where ....) is two
 > big tables doing left join
If running the subquery "(select ... from .. left join ... .. where ...)"
takes a long time, even without inserting the records into a new table
(You can benchmark this with "select count(*) from ... left join ... where ...",
and see how long it takes),
than it might be faster to first do
"create temporary table t as select .. from .. left join ... where ...",
and then use the temp-table instead of the subquery in the other statements.

If this is faster or slower depends on a lot of factors, so you'll have to
test which is better.

 > Will there be a better way between
 >
 > a. put all there 4 queries into one function
 >    in perl or java, just call this function
 >
 > b. in perl / java, write and run the 4 queries independently
Should be about the same - just use whatever fits your overall software design
better.

 > The pl/pgsql function does not allow commit. So, in the function , if
 > any step went wrong, all 4 steps rollback. While in java, after every
 > query, I can do commit. May java speed up all four updates?
In postgresql 8.0 and above, you could use the exception support in plpgsql
to prevent the whole transaction from rolling back in case of an error.
Only the statements _inside_ the block where you caught the error would roll back.

 From java, you could do the same, by using the "savepoint" command manually (or
maybe the jdbc driver for postgres has some support for this - I've never actually
used jdbc).

In any case, the "right right" depends on your application. Are those inserts/updates
independent of each other, or will it cause data inconsistencies if one is done
and the other is net? Is there actually something your app can do if a statement causes
an error, or will it just be reported, and a human will have to fix it?

greetings, Florian Pflug

Attachment

pgsql-general by date:

Previous
From: PFC
Date:
Subject: Re: [SQL] input from a external text file......!
Next
From: Christopher Browne
Date:
Subject: Re: Can the PostgreSQL store the Multimedia files