Thread: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Eileen
Hi,
I have written some Java code which builds a PostgreSQL function. That function calls approximately 6 INSERT statements with a RETURNING clause. I recreate and re-run the function about 900,000 times. I use JDBC to execute these functions on PostgreSQL 8.3 on Windows. When I tried running this on a single connection, it failed (some kind of memory error). So I split the work across JDBC connections in chunks of 5,000. I reran it and everything was fine. It took about 1 hour to execute all the updates.
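In outline, the chunked version was equivalent to something like this (a simplified sketch: the table and column names are placeholders, and the real code goes through the generated function rather than a bare INSERT):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ChunkedInserts {
        private static final int CHUNK_SIZE = 5000;
        private static final int TOTAL = 900000;

        public static void main(String[] args) throws Exception {
            for (int start = 0; start < TOTAL; start += CHUNK_SIZE) {
                // Fresh connection per chunk of 5,000, default autocommit:
                // each statement commits on its own.
                Connection conn = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/mydb", "user", "secret");
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO parent(name) VALUES (?) RETURNING id");
                for (int i = start; i < Math.min(start + CHUNK_SIZE, TOTAL); i++) {
                    ps.setString(1, "row " + i);
                    ResultSet rs = ps.executeQuery(); // RETURNING yields a result set
                    rs.next();
                    long newId = rs.getLong(1); // would feed the other ~5 inserts
                    rs.close();
                }
                ps.close();
                conn.close();
            }
        }
    }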
Since it took so long to perform the update, I wanted to prevent other users from querying the data during that time. So I read about the LOCK command. It seemed like I should LOCK all the tables in the database in ACCESS EXCLUSIVE mode. That would prevent anyone from getting data while the database was making its updates.
Since a LOCK is only valid for one transaction, I set autocommit to FALSE. I also removed the code which chunked up the inserts. I had read that a single transaction ought to have better performance than committing after each insert, but that was clearly not what ended up happening in my case.
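The single-transaction variant, in outline (placeholder table names again; the LOCK has to be issued inside the same transaction as the inserts, which is why autocommit is off):

    import java.sql.Connection;
    import java.sql.Statement;

    public class SingleTransactionRun {
        // Sketch: take the locks, do all the work, commit once at the end.
        static void runLocked(Connection conn) throws Exception {
            conn.setAutoCommit(false); // a LOCK only lives inside a transaction
            Statement st = conn.createStatement();
            // Placeholder table names; the real run locks every table.
            st.execute("LOCK TABLE parent, child IN ACCESS EXCLUSIVE MODE");
            st.close();
            // ... all ~5 million INSERT ... RETURNING statements run here,
            // with no intermediate commits ...
            conn.commit(); // the locks are released at this point
        }
    }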
In my case, a few problems occurred. First, the process ran at least 8 hours and never finished. It did not finish because the hard drive filled up. After running a manual VACUUM FULL, no space was freed. I think this has cost me 20 GB of space. Is there any way to free this space up? I even dropped the database to no avail.
Secondly, why did this process take over 8 hours to run? While reading the performance mailing list, it seems like the recommendation is to run lots of INSERTs in a single commit. Is 5 million too many? Is redefining a function over and over inside a transaction a problem? Does the RETURNING clause present a problem during a single transaction?
If anyone has any suggestions for me, I would really appreciate it.
Tina
Re: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Dave Cramer
Dave Cramer
dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On Thu, Aug 30, 2012 at 2:34 AM, Eileen <hey_here@yahoo.com> wrote:
> Hi,
>
> I have written some Java code which builds a PostgreSQL function. That
> function calls approximately 6 INSERT statements with a RETURNING clause.
> I recreate and re-run the function about 900,000 times. I use JDBC to
> execute these functions on PostgreSQL 8.3 on Windows. When I tried
> running this on a single connection, it failed (some kind of memory
> error). So I split the work across JDBC connections in chunks of 5,000.
> I reran it and everything was fine. It took about 1 hour to execute all
> the updates.
>
> Since it took so long to perform the update, I wanted to prevent other
> users from querying the data during that time. So I read about the LOCK
> command. It seemed like I should LOCK all the tables in the database in
> ACCESS EXCLUSIVE mode. That would prevent anyone from getting data while
> the database was making its updates.

Do you understand how MVCC works? Do you really need to lock out users?

> Since a LOCK is only valid for one transaction, I set autocommit to
> FALSE. I also removed the code which chunked up the inserts. I had read
> that a single transaction ought to have better performance than
> committing after each insert, but that was clearly not what ended up
> happening in my case.

We would need more information as to what you are doing.

> In my case, a few problems occurred. First, the process ran at least
> 8 hours and never finished. It did not finish because the hard drive
> filled up. After running a manual VACUUM FULL, no space was freed.
> I think this has cost me 20 GB of space. Is there any way to free this
> space up? I even dropped the database to no avail.
>
> Secondly, why did this process take over 8 hours to run? While reading
> the performance mailing list, it seems like the recommendation is to run
> lots of INSERTs in a single commit. Is 5 million too many? Is redefining
> a function over and over inside a transaction a problem? Does the
> RETURNING clause present a problem during a single transaction?

VACUUM FULL on 8.3 is not a good idea.

> If anyone has any suggestions for me, I would really appreciate it.

Can you explain at a high level what you are trying to do?

> Tina
Re: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Albe Laurenz
Eileen wrote:
> I have written some Java code which builds a PostgreSQL function. That
> function calls approximately 6 INSERT statements with a RETURNING clause.
> I recreate and re-run the function about 900,000 times. I use JDBC to
> execute these functions on PostgreSQL 8.3 on Windows. When I tried
> running this on a single connection, it failed (some kind of memory
> error). So I split the work across JDBC connections in chunks of 5,000.
> I reran it and everything was fine. It took about 1 hour to execute all
> the updates.
>
> Since it took so long to perform the update, I wanted to prevent other
> users from querying the data during that time. So I read about the LOCK
> command. It seemed like I should LOCK all the tables in the database in
> ACCESS EXCLUSIVE mode. That would prevent anyone from getting data while
> the database was making its updates.
>
> Since a LOCK is only valid for one transaction, I set autocommit to
> FALSE. I also removed the code which chunked up the inserts. I had read
> that a single transaction ought to have better performance than
> committing after each insert, but that was clearly not what ended up
> happening in my case.
>
> In my case, a few problems occurred. First, the process ran at least
> 8 hours and never finished. It did not finish because the hard drive
> filled up. After running a manual VACUUM FULL, no space was freed.
> I think this has cost me 20 GB of space. Is there any way to free this
> space up? I even dropped the database to no avail.

Try to identify what files use the space. Look at the size of directories.

Could it be that "archive_mode" is "on" and you ran out of space for
archived WALs?

When you drop a database, all files that belong to the database are gone.

> Secondly, why did this process take over 8 hours to run? While reading
> the performance mailing list, it seems like the recommendation is to run
> lots of INSERTs in a single commit. Is 5 million too many? Is redefining
> a function over and over inside a transaction a problem? Does the
> RETURNING clause present a problem during a single transaction?

It would be interesting to know how the time was spent.
Were the CPUs busy? Were there locks?

Yours,
Laurenz Albe
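For reference, per-database sizes can be pulled straight over JDBC (pg_database_size and pg_size_pretty both exist on 8.3); WAL in pg_xlog and any archive directory will not show up in these numbers and has to be checked at the operating-system level:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SpaceCheck {
        // Print every database and its on-disk size, largest first.
        static void printDatabaseSizes(Connection conn) throws Exception {
            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery(
                    "SELECT datname, pg_size_pretty(pg_database_size(datname)) "
                  + "FROM pg_database ORDER BY pg_database_size(datname) DESC");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "  " + rs.getString(2));
            }
            rs.close();
            st.close();
        }
    }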
Re: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Scott Marlowe
On Thu, Aug 30, 2012 at 12:34 AM, Eileen <hey_here@yahoo.com> wrote:
> Hi,
>
> I have written some Java code which builds a PostgreSQL function. That
> function calls approximately 6 INSERT statements with a RETURNING clause.
> I recreate and re-run the function about 900,000 times. I use JDBC to
> execute

That's generally a pretty inefficient way of doing things. Can you create a
function that does what you want and not drop / recreate it over and over?
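One way to follow that advice, sketched with made-up table names and a made-up function signature: create the function once, with parameters, then call it 900,000 times through a PreparedStatement:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ReusableFunction {
        static void run(Connection conn) throws Exception {
            // Define the function once (placeholder schema and logic).
            Statement ddl = conn.createStatement();
            ddl.execute(
                "CREATE OR REPLACE FUNCTION insert_record(p_name text, p_value int)\n"
              + "RETURNS bigint AS $$\n"
              + "DECLARE new_id bigint;\n"
              + "BEGIN\n"
              + "  INSERT INTO parent(name) VALUES (p_name) RETURNING id INTO new_id;\n"
              + "  INSERT INTO child(parent_id, value) VALUES (new_id, p_value);\n"
              + "  RETURN new_id;\n"
              + "END;\n"
              + "$$ LANGUAGE plpgsql");
            ddl.close();

            // Call it repeatedly; only the parameters change.
            PreparedStatement call =
                    conn.prepareStatement("SELECT insert_record(?, ?)");
            for (int i = 0; i < 900000; i++) {
                call.setString(1, "row " + i);
                call.setInt(2, i);
                ResultSet rs = call.executeQuery();
                rs.next(); // the returned id, if the caller needs it
                rs.close();
            }
            call.close();
        }
    }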
Re: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Craig Ringer
On 08/30/2012 02:34 PM, Eileen wrote:
> In my case, a few problems occurred. First, the process ran at least
> 8 hours and never finished.

You're on a very old version of Pg, so you're missing out on a lot of
improvements made since then. One of them is, if I recall correctly, an
improvement to exception handling efficiency.

Does your function use BEGIN ... EXCEPTION to (say) handle data validation
errors?

--
Craig Ringer
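For anyone unsure what that construct looks like, here is a hypothetical function with such a block (the schema is made up); a BEGIN ... EXCEPTION section forces PostgreSQL to set up a subtransaction on every call, which adds up over millions of executions:

    import java.sql.Connection;
    import java.sql.Statement;

    public class ExceptionBlockExample {
        // Define a function whose body contains an exception handler.
        static void define(Connection conn) throws Exception {
            Statement st = conn.createStatement();
            st.execute(
                "CREATE OR REPLACE FUNCTION safe_insert(p_name text)\n"
              + "RETURNS void AS $$\n"
              + "BEGIN\n"
              + "  BEGIN\n"
              + "    INSERT INTO parent(name) VALUES (p_name);\n"
              + "  EXCEPTION WHEN unique_violation THEN\n"
              + "    NULL; -- e.g. silently skip duplicates\n"
              + "  END;\n"
              + "END;\n"
              + "$$ LANGUAGE plpgsql");
            st.close();
        }
    }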
Re: JDBC 5 million function insert returning Single Transaction Lock Access Exclusive Problem
From: Eileen
Thank you for your help. At a high level, I am just updating about 900k records in the database with new information, and during that update window I didn't want users to get inconsistent data.
I read about MVCC and discovered that I didn't necessarily need the LOCK statement. However, based on what I read, I thought that snapshots of the database would also cover changes to the schema. I found that not to be the case: when I queried the database while a transaction was in the middle of DROPping tables, I got an error instead of an older snapshot. Is there any database which actually isolates schema changes? I was just curious.
I have verified that while I'm DELETEing rows from one session, other sessions can still retrieve the old data in a consistent state. Although, in order for the DELETEs to actually complete, I had to add an index on all my foreign key columns.
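For illustration with made-up names: PostgreSQL automatically indexes the referenced primary key, but not the referencing column, so without an index like the one below every deleted parent row forces a scan of the child table during the foreign-key check:

    import java.sql.Connection;
    import java.sql.Statement;

    public class AddFkIndex {
        // Index the referencing column of a foreign key (made-up names).
        static void createIndex(Connection conn) throws Exception {
            Statement st = conn.createStatement();
            st.execute("CREATE INDEX child_parent_id_idx ON child (parent_id)");
            st.close();
        }
    }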
Tina