Thread: LOCK TABLE & speeding up mass data loads
Hi,

Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I
have a script that loads data by starting transactions and then
committing the work after every few thousand INSERTs?

Thanks,
Ron
--
Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net
Jefferson, LA USA    http://members.cox.net/ron.l.johnson
"Fear the Penguin!!"
On Sat, 2003-01-25 at 13:57, Ron Johnson wrote:
> Hi,
>
> Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I
> have a script that loads data by starting transactions and then
> committing the work after every few thousand INSERTs?

If you're the only person working on the database, then no. If you're
fighting for resources with a bunch of other people -- then possibly,
but the others won't get anything done during this timeframe (of
course).

Oh, and you're using COPY, right?
--
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
On Sat, 2003-01-25 at 13:07, Rod Taylor wrote:
> On Sat, 2003-01-25 at 13:57, Ron Johnson wrote:
> > Hi,
> >
> > Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I
> > have a script that loads data by starting transactions and then
> > committing the work after every few thousand INSERTs?
>
> If you're the only person working on the database, then no. If you're
> fighting for resources with a bunch of other people -- then possibly,
> but the others won't get anything done during this timeframe (of
> course).

Ok.

> Oh, and you're using COPY, right?

No. Too much data manipulation to do 1st. Also, by committing every
X thousand rows, if the process must be aborted there's no huge
rollback, and the script can then skip to the last committed row and
pick up from there.
--
Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net
Jefferson, LA USA    http://members.cox.net/ron.l.johnson
"Fear the Penguin!!"
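For concreteness, the pattern Ron describes looks roughly like the sketch
below, assuming a Python loader using psycopg2; the driver, table, and
column names are placeholders rather than anything stated in the thread:

    import psycopg2   # hypothetical driver; the thread never names a language

    BATCH = 5000      # "every few thousand INSERTs"

    def load_with_inserts(conn, rows, start_at=0):
        """INSERT row by row, COMMIT every BATCH rows, and on a rerun
        skip straight past rows a previous, aborted run already committed."""
        cur = conn.cursor()
        for i, row in enumerate(rows):
            if i < start_at:
                continue              # skip to the last committed row
            cur.execute("INSERT INTO mytable (a, b, c) VALUES (%s, %s, %s)", row)
            if (i + 1) % BATCH == 0:
                conn.commit()         # small commits: no huge rollback on abort
                print("committed through row", i + 1)  # restart point for a rerun
        conn.commit()                 # whatever is left in the final batch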
On Sun, 25 Jan 2003, Ron Johnson wrote:

> > Oh, and you're using COPY, right?
>
> No. Too much data manipulation to do 1st. Also, by committing every
> X thousand rows, if the process must be aborted there's no huge
> rollback, and the script can then skip to the last committed row and
> pick up from there.

I don't see how the amount of data manipulation makes a difference.
Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
just print the data for the columns, separated by tabs. Where you now
issue a COMMIT, end the copy.

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
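Put another way, the data manipulation stays exactly where it is; only
the output step changes. A minimal sketch of that per-row output step,
assuming Python (copy_line is an illustrative helper, not anything from
the thread), following COPY's text format: tab-separated columns, \N for
NULL, and backslash-escaped tabs, newlines, carriage returns, and
backslashes:

    def copy_line(values):
        """Render one transformed row as a COPY text-format data line."""
        cols = []
        for v in values:
            if v is None:
                cols.append(r"\N")            # COPY's default NULL marker
            else:
                s = str(v)
                s = (s.replace("\\", "\\\\")  # escape backslashes first
                      .replace("\t", "\\t")
                      .replace("\n", "\\n")
                      .replace("\r", "\\r"))
                cols.append(s)
        return "\t".join(cols) + "\n"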
On Sun, 2003-01-26 at 17:10, Curt Sampson wrote:
> On Sun, 25 Jan 2003, Ron Johnson wrote:
> > > Oh, and you're using COPY, right?
> >
> > No. Too much data manipulation to do 1st. Also, by committing every
> > X thousand rows, if the process must be aborted there's no huge
> > rollback, and the script can then skip to the last committed row and
> > pick up from there.
>
> I don't see how the amount of data manipulation makes a difference.
> Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
> just print the data for the columns, separated by tabs. Where you now
> issue a COMMIT, end the copy.

Yes, create an input file for COPY. Great idea. However, if I understand
you correctly, then to avoid rolling back and re-running a complete COPY
(which may entail millions of rows), I'd have to have thousands of
separate input files (which would get processed sequentially).

Here's what I'd like to see:

    COPY table [ ( column [, ...] ) ]
        FROM { 'filename' | stdin }
        [ [ WITH ]
              [ BINARY ]
              [ OIDS ]
              [ DELIMITER [ AS ] 'delimiter' ]
              [ NULL [ AS ] 'null string' ] ]
        [ COMMIT EVERY ... ROWS WITH LOGGING ]    <<<<<<<<<<<<<
        [ SKIP ... ROWS ]                         <<<<<<<<<<<<<

This way, if I'm loading 25M rows, I can have it commit every, say,
1000 rows, and if it pukes 1/2 way thru, then when I restart the
COPY, it can SKIP past what's already been loaded and proceed apace.
--
Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net
Jefferson, LA USA    http://members.cox.net/ron.l.johnson
"Fear the Penguin!!"
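COPY has no such clause, but the effect can be approximated on the client
side by splitting the stream into fixed-size COPY transactions. A rough
sketch, again assuming Python with psycopg2; the chunk size, table name,
and logging are illustrative:

    import io
    import psycopg2   # hypothetical driver choice

    CHUNK = 10000     # rows per transaction

    def chunked_copy(conn, lines, table, skip=0):
        """Feed COPY in CHUNK-sized transactions.

        `lines` is an iterable of finished COPY text-format lines
        (tab-separated, newline-terminated). `skip` emulates the proposed
        SKIP ... ROWS clause: a rerun passes the row count printed by the
        last successful commit."""
        cur = conn.cursor()
        buf, done = [], skip
        for i, line in enumerate(lines):
            if i < skip:
                continue                      # already committed by an earlier run
            buf.append(line)
            if len(buf) == CHUNK:
                cur.copy_from(io.StringIO("".join(buf)), table)
                conn.commit()                 # the COMMIT EVERY ... ROWS part
                done += len(buf)
                print("committed through row", done)  # ... WITH LOGGING
                buf = []
        if buf:                               # final partial chunk
            cur.copy_from(io.StringIO("".join(buf)), table)
            conn.commit()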
On 27 Jan 2003 at 3:08, Ron Johnson wrote:
> Here's what I'd like to see:
>
>     COPY table [ ( column [, ...] ) ]
>         FROM { 'filename' | stdin }
>         [ [ WITH ]
>               [ BINARY ]
>               [ OIDS ]
>               [ DELIMITER [ AS ] 'delimiter' ]
>               [ NULL [ AS ] 'null string' ] ]
>         [ COMMIT EVERY ... ROWS WITH LOGGING ]    <<<<<<<<<<<<<
>         [ SKIP ... ROWS ]                         <<<<<<<<<<<<<
>
> This way, if I'm loading 25M rows, I can have it commit every, say,
> 1000 rows, and if it pukes 1/2 way thru, then when I restart the
> COPY, it can SKIP past what's already been loaded and proceed apace.

IIRC, there is a hook in \copy (the psql command, not the PostgreSQL
COPY command) for how many rows you would like per transaction. I
remember having benchmarked that and concluded that doing the copy in
one transaction is the fastest way of doing it.

Don't have a PostgreSQL installation handy, me being in linux, but this
is definitely possible..

Bye
 Shridhar

--
I still maintain the point that designing a monolithic kernel in 1991 is
a fundamental error. Be thankful you are not my student. You would not
get a high grade for such a design :-)
                (Andrew Tanenbaum to Linus Torvalds)
On Mon, 2003-01-27 at 03:45, Shridhar Daithankar wrote:
> On 27 Jan 2003 at 3:08, Ron Johnson wrote:
> > Here's what I'd like to see:
> >
> >     COPY table [ ( column [, ...] ) ]
> >         FROM { 'filename' | stdin }
> >         [ [ WITH ]
> >               [ BINARY ]
> >               [ OIDS ]
> >               [ DELIMITER [ AS ] 'delimiter' ]
> >               [ NULL [ AS ] 'null string' ] ]
> >         [ COMMIT EVERY ... ROWS WITH LOGGING ]    <<<<<<<<<<<<<
> >         [ SKIP ... ROWS ]                         <<<<<<<<<<<<<
> >
> > This way, if I'm loading 25M rows, I can have it commit every, say,
> > 1000 rows, and if it pukes 1/2 way thru, then when I restart the
> > COPY, it can SKIP past what's already been loaded and proceed apace.
>
> IIRC, there is a hook in \copy (the psql command, not the PostgreSQL
> COPY command) for how many rows you would like per transaction.

I'll have to look into that.

> I remember having benchmarked that and concluded that doing the copy
> in one transaction is the fastest way of doing it.

Boy Scout motto: Be prepared!! (Serves me well as a DBA.)

So it takes a little longer. In case of failure, the time would be
more than made up. Also, wouldn't the WAL grow hugely if many millions
of rows were inserted in one txn?
--
Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net
Jefferson, LA USA    http://members.cox.net/ron.l.johnson
"Fear the Penguin!!"
On 27 Jan 2003 at 15:15, Shridhar Daithankar wrote:
> IIRC, there is a hook in \copy (the psql command, not the PostgreSQL
> COPY command) for how many rows you would like per transaction. I
> remember having benchmarked that and concluded that doing the copy in
> one transaction is the fastest way of doing it.
>
> Don't have a PostgreSQL installation handy, me being in linux, but
> this is definitely possible..

I am sleeping. That should have read XP rather than linux. Grrr..

Bye
 Shridhar

--
Lowery's Law: If it jams -- force it. If it breaks, it needed replacing
anyway.
On 27 Jan 2003 at 3:54, Ron Johnson wrote:
> On Mon, 2003-01-27 at 03:45, Shridhar Daithankar wrote:
> > I remember having benchmarked that and concluded that doing the copy
> > in one transaction is the fastest way of doing it.
>
> Boy Scout motto: Be prepared!! (Serves me well as a DBA.)

Goes for everything else as well..

> So it takes a little longer. In case of failure, the time would be
> more than made up. Also, wouldn't the WAL grow hugely if many millions
> of rows were inserted in one txn?

Nope. If the WAL starts recycling, PostgreSQL should start flushing data
from the WAL to the data files. At any given moment, the WAL will not
exceed what you have configured. It is just a write-ahead log, mostly
intended for crash recovery. (Consequently, it does not help to set the
WAL bigger than required.)

Bye
 Shridhar

--
nominal egg: New Yorkerese for expensive.
On Mon, 27 Jan 2003, Ron Johnson wrote:

> > I don't see how the amount of data manipulation makes a difference.
> > Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
> > just print the data for the columns, separated by tabs. Where you now
> > issue a COMMIT, end the copy.
>
> Yes, create an input file for COPY. Great idea.

That's not quite what I was thinking of. Don't create an input file;
just send the commands directly to the server (if your API supports it).
If worst comes to worst, you could maybe open up a subprocess for a psql
and write to its standard input.

> However, if I understand you correctly, then to avoid rolling back and
> re-running a complete COPY (which may entail millions of rows), I'd
> have to have thousands of separate input files (which would get
> processed sequentially).

Right. But you can probably commit much less often than every 1000 rows.
10,000 or 100,000 would probably be more practical.

cjs
--
Curt Sampson <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
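The psql-subprocess route takes very little code. A minimal sketch,
assuming Python; the database name, table, and column list are
placeholders, and the input is expected to be finished COPY text lines
(e.g. from a helper like copy_line above):

    import subprocess

    def copy_via_psql(lines, dbname="mydb", target="mytable (a, b, c)"):
        """Pipe COPY text-format lines into a psql child running
        COPY ... FROM STDIN."""
        proc = subprocess.Popen(
            ["psql", "-d", dbname, "-c", "COPY %s FROM STDIN" % target],
            stdin=subprocess.PIPE,
            text=True,
        )
        for line in lines:
            proc.stdin.write(line)
        proc.stdin.close()      # EOF ends the COPY; psql then commits it
        return proc.wait()      # non-zero exit means the COPY rolled back

Since each psql -c invocation runs as its own transaction, invoking this
once per chunk of the input gives the commit-every-N-rows behaviour
discussed earlier, at the cost of a new connection per chunk.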