LOCK TABLE & speeding up mass data loads

From: Ron Johnson

Hi,

Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I have
a script that loads data inside transactions, committing the work after
every few thousand INSERTs?
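
(For illustration, a minimal sketch of the kind of batched-INSERT loop I mean;
the psycopg2 driver, the table name, and the columns are placeholders, not
anything from the actual script:)

    import psycopg2

    def rows_from_source():
        # stand-in for the real extraction / data-manipulation step
        yield ("alpha", 1)
        yield ("beta", 2)

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    BATCH = 5000                          # "every few thousand INSERTs"

    for i, row in enumerate(rows_from_source(), start=1):
        # psycopg2 opens a transaction implicitly on the first execute();
        # a LOCK TABLE taken here would only last until the next commit.
        cur.execute("INSERT INTO target (col1, col2) VALUES (%s, %s)", row)
        if i % BATCH == 0:
            conn.commit()                 # commit every BATCH rows

    conn.commit()                         # commit the final partial batch
    cur.close()
    conn.close()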

Thanks,
Ron
--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: LOCK TABLE & speeding up mass data loads

From: Rod Taylor

On Sat, 2003-01-25 at 13:57, Ron Johnson wrote:
> Hi,
>
> Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I have
> a script that loads data inside transactions, committing the work after
> every few thousand INSERTs?

If you're the only person working on the database, then no.  If you're
fighting for resources with a bunch of other people -- then possibly,
but the others won't get anything done during this timeframe (of
course).

Oh, and you're using COPY right?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


Re: LOCK TABLE & speeding up mass data loads

From: Ron Johnson

On Sat, 2003-01-25 at 13:07, Rod Taylor wrote:
> On Sat, 2003-01-25 at 13:57, Ron Johnson wrote:
> > Hi,
> >
> > Would LOCK TABLE ... IN ACCESS EXCLUSIVE MODE speed things up, when I have
> > a script that loads data inside transactions, committing the work after
> > every few thousand INSERTs?
>
> If you're the only person working on the database, then no.  If you're
> fighting for resources with a bunch of other people -- then possibly,
> but the others won't get anything done during this timeframe (of
> course).

Ok.

> Oh, and you're using COPY right?

No.  There's too much data manipulation to do first.  Also, by committing
every X thousand rows, if the process must be aborted there's no huge
rollback, and the script can skip to the last committed row and pick up
from there.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: LOCK TABLE & speeding up mass data loads

From: Curt Sampson

On Sun, 25 Jan 2003, Ron Johnson wrote:

> > Oh, and you're using COPY right?
>
> No.  There's too much data manipulation to do first.  Also, by committing
> every X thousand rows, if the process must be aborted there's no huge
> rollback, and the script can skip to the last committed row and pick up
> from there.

I don't see how the amount of data manipulation makes a difference.
Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
just print the data for the columns, separated by tabs. Where you now
issue a COMMIT, end the copy.
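
In a Python loader, for instance, the change might look roughly like this
(a sketch only: psycopg2, copy_expert(), the table, and the columns are
illustrative choices, and the values are assumed to contain no tabs,
newlines, or backslashes):

    import io
    import psycopg2

    rows = [("alpha", 1), ("beta", 2)]    # stand-in for the manipulated data

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    # Where BEGIN used to be: start building a COPY stream instead.
    buf = io.StringIO()
    for row in rows:
        # Where INSERT used to be: print the columns, separated by tabs.
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)

    # Where COMMIT used to be: end the copy and commit once.
    cur.copy_expert("COPY target (col1, col2) FROM STDIN", buf)
    conn.commit()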

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

Re: LOCK TABLE & speeding up mass data loads

From: Ron Johnson

On Sun, 2003-01-26 at 17:10, Curt Sampson wrote:
> On Sun, 25 Jan 2003, Ron Johnson wrote:
>
> > > Oh, and you're using COPY right?
> >
> > No.  There's too much data manipulation to do first.  Also, by committing
> > every X thousand rows, if the process must be aborted there's no huge
> > rollback, and the script can skip to the last committed row and pick up
> > from there.
>
> I don't see how the amount of data manipulation makes a difference.
> Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
> just print the data for the columns, separated by tabs. Where you now
> issue a COMMIT, end the copy.

Yes, create an input file for COPY.  Great idea.

However, if I understand you correctly, then to avoid having to roll back
and re-run the complete COPY (which may entail millions of rows), I'd have
to have thousands of separate input files (which would get processed
sequentially).

Here's what I'd like to see:
COPY table [ ( column [, ...] ) ]
    FROM { 'filename' | stdin }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ] ]
    [COMMIT EVERY ... ROWS WITH LOGGING]  <<<<<<<<<<<<<
    [SKIP ... ROWS]          <<<<<<<<<<<<<

This way, if I'm loading 25M rows, I can have it commit every, say,
1000 rows, and if it pukes halfway through, then when I restart the
COPY, it can SKIP past what's already been loaded and proceed apace.

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: LOCK TABLE & speeding up mass data loads

From: Shridhar Daithankar

On 27 Jan 2003 at 3:08, Ron Johnson wrote:

> Here's what I'd like to see:
> COPY table [ ( column [, ...] ) ]
>     FROM { 'filename' | stdin }
>     [ [ WITH ]
>           [ BINARY ]
>           [ OIDS ]
>           [ DELIMITER [ AS ] 'delimiter' ]
>           [ NULL [ AS ] 'null string' ] ]
>     [COMMIT EVERY ... ROWS WITH LOGGING]  <<<<<<<<<<<<<
>     [SKIP ... ROWS]          <<<<<<<<<<<<<
>
> This way, if I'm loading 25M rows, I can have it commit every, say,
> 1000 rows, and if it pukes halfway through, then when I restart the
> COPY, it can SKIP past what's already been loaded and proceed apace.

IIRC, there is a hook in \copy (the psql command, not the PostgreSQL COPY command) for how many
rows per transaction you would like.  I remember benchmarking that and
concluding that doing the copy in one transaction is the fastest way to do it.

Don't have a PostgreSQL installation handy, me being on Linux, but this is
definitely possible...

Bye
 Shridhar

--
I still maintain the point that designing a monolithic kernel in 1991 is a
fundamental error.  Be thankful you are not my student.  You would not get a
high grade for such a design :-)  (Andrew Tanenbaum to Linus Torvalds)


Re: LOCK TABLE & speeding up mass data loads

From: Ron Johnson

On Mon, 2003-01-27 at 03:45, Shridhar Daithankar wrote:
> On 27 Jan 2003 at 3:08, Ron Johnson wrote:
>
> > Here's what I'd like to see:
> > COPY table [ ( column [, ...] ) ]
> >     FROM { 'filename' | stdin }
> >     [ [ WITH ]
> >           [ BINARY ]
> >           [ OIDS ]
> >           [ DELIMITER [ AS ] 'delimiter' ]
> >           [ NULL [ AS ] 'null string' ] ]
> >     [COMMIT EVERY ... ROWS WITH LOGGING]  <<<<<<<<<<<<<
> >     [SKIP ... ROWS]          <<<<<<<<<<<<<
> >
> > This way, if I'm loading 25M rows, I can have it commit every, say,
> > 1000 rows, and if it pukes halfway through, then when I restart the
> > COPY, it can SKIP past what's already been loaded and proceed apace.
>
> IIRC, there is a hook in \copy (the psql command, not the PostgreSQL COPY command) for how many

I'll have to look into that.

> rows per transaction you would like.  I remember benchmarking that and
> concluding that doing the copy in one transaction is the fastest way to do it.

Boy Scout motto: Be prepared!!  (Serves me well as a DBA.)

So it takes a little longer.  In case of failure, the time would be
more than made up.  Also, wouldn't the WAL grow hugely if many millions
of rows were inserted in one txn?

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "Fear the Penguin!!"                                          |
+---------------------------------------------------------------+


Re: LOCK TABLE & speeding up mass data loads

From: Shridhar Daithankar

On 27 Jan 2003 at 15:15, Shridhar Daithankar wrote:
> IIRC, there is a hook in \copy (the psql command, not the PostgreSQL COPY command) for how many
> rows per transaction you would like.  I remember benchmarking that and
> concluding that doing the copy in one transaction is the fastest way to do it.
>
> Don't have a PostgreSQL installation handy, me being on Linux, but this is
> definitely possible...

I must be sleeping.  That should have read XP rather than Linux.

Grrr..

Bye
 Shridhar

--
Lowery's Law:    If it jams -- force it.  If it breaks, it needed replacing
anyway.


Re: LOCK TABLE & speeding up mass data loads

From: Shridhar Daithankar

On 27 Jan 2003 at 3:54, Ron Johnson wrote:

> On Mon, 2003-01-27 at 03:45, Shridhar Daithankar wrote:
> > rows per transaction you would like.  I remember benchmarking that and
> > concluding that doing the copy in one transaction is the fastest way to do it.
>
> Boy Scout motto: Be prepared!!  (Serves me well as a DBA.)

Goes for everything else as well..
>
> So it takes a little longer.  In case of failure, the time would be
> more than made up.  Also, wouldn't the WAL grow hugely if many millions
> of rows were inserted in one txn?

Nope.  When the WAL starts recycling, PostgreSQL flushes the data from the
WAL segments out to the data files.

At any given moment, the WAL will not exceed what you have configured.  It is
just a write-ahead log, mostly intended for crash recovery.  (Consequently it
does not help to make the WAL bigger than required.)

Bye
 Shridhar

--
nominal egg:    New Yorkerese for expensive.


Re: LOCK TABLE & speeding up mass data loads

From: Curt Sampson

On Mon, 27 Jan 2003, Ron Johnson wrote:

> > I don't see how the amount of data manipulation makes a difference.
> > Where you now issue a BEGIN, issue a COPY instead. Where you now INSERT,
> > just print the data for the columns, separated by tabs. Where you now
> > issue a COMMIT, end the copy.
>
> Yes, create an input file for COPY.  Great idea.

That's not quite what I was thinking of.  Don't create an input file;
just send the commands directly to the server (if your API supports it).
If worst comes to worst, you could open up a psql subprocess and write to
its standard input.
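
Something along these lines, say (a sketch only; the database, table, and
columns are placeholders, and the values are assumed to contain no tabs,
newlines, or backslashes):

    import subprocess

    rows = [("alpha", 1), ("beta", 2)]    # stand-in for the manipulated data

    # With -c, psql reads the data for COPY ... FROM STDIN from its own
    # standard input; closing that pipe ends the copy.
    proc = subprocess.Popen(
        ["psql", "-d", "mydb", "-c", "COPY target (col1, col2) FROM STDIN"],
        stdin=subprocess.PIPE,
    )
    for row in rows:
        proc.stdin.write(("\t".join(str(v) for v in row) + "\n").encode())
    proc.stdin.close()
    proc.wait()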

> However, if I understand you correctly, then to avoid having to roll back
> and re-run the complete COPY (which may entail millions of rows), I'd have
> to have thousands of separate input files (which would get processed
> sequentially).

Right.

But you can probably commit much less often than every 1000 rows; every
10,000 or 100,000 rows would probably be more practical.
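
For what it's worth, the proposed COMMIT EVERY / SKIP behaviour can also be
approximated client-side: do one COPY per batch and record how far you got in
a small checkpoint file (a sketch only; psycopg2, the file name, the table,
and the columns are all illustrative):

    import io
    import os
    import psycopg2

    BATCH = 100000                        # rows per COPY / per commit
    PROGRESS = "load.progress"            # hypothetical checkpoint file

    def rows_from_source():
        for i in range(1000000):          # stand-in for the manipulated data
            yield ("row-%d" % i, i)

    done = int(open(PROGRESS).read()) if os.path.exists(PROGRESS) else 0

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()

    def flush(buf, n):
        buf.seek(0)
        cur.copy_expert("COPY target (col1, col2) FROM STDIN", buf)
        conn.commit()                     # "COMMIT EVERY ... ROWS"
        open(PROGRESS, "w").write(str(n)) # remember how far we got

    buf, n = io.StringIO(), 0
    for n, row in enumerate(rows_from_source(), start=1):
        if n <= done:                     # "SKIP ... ROWS" on restart
            continue
        buf.write("\t".join(str(v) for v in row) + "\n")
        if n % BATCH == 0:
            flush(buf, n)
            buf = io.StringIO()

    if buf.tell():                        # flush the final partial batch
        flush(buf, n)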

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC