Re: Importing Large Amounts of Data - Mailing list pgsql-hackers

From Curt Sampson
Subject Re: Importing Large Amounts of Data
Date
Msg-id Pine.NEB.4.43.0204161356050.498-100000@angelic.cynic.net
Whole thread Raw
In response to Re: Importing Large Amounts of Data  (Gavin Sherry <swm@linuxworld.com.au>)
Responses Re: Importing Large Amounts of Data  (Gavin Sherry <swm@linuxworld.com.au>)
List pgsql-hackers
On Tue, 16 Apr 2002, Gavin Sherry wrote:

> I don't see any straight forward way of modifying the code to allow a fast
> path directly to relationals on-disk. However, it should be possible to
> bypass locking, RI, MVCC etc with the use of a bootstrap-like tool.

That was my thought. I'm not asking for disk-speed writes through the
database server itself; I can make do with taking the server off-line,
doing the imports, and bringing it back again, as you suggest.

> Given the very low parsing and 'planning' overhead, the real cost would be
> WAL (the bootstrapper could fail and render the database unusable) and the
> subsequent updating of on-disk relations.

MS SQL Server, when doing a BULK INSERT or BCP, can do it as a fully or
"minimally" logged operation. When minimally logged, there's no ability
to roll-forward or recover inserted data, just the ability to go back
to the state at the beginning of the operation. This technique can work
even though an on-line database. A bit more information is available at
http://msdn.microsoft.com/library/en-us/adminsql/ad_impt_bcp_9esz.asp

(You may want to browse this with lynx; the javascript in it is going to
force your screen into a configuration with frames.) You can follow some
of the links in that page for further information.

Another option, for off-line databases, might just be not to log at all.
If you take a backup first, it may be faster to restore the backup and
start again than to try to roll back the operation, or roll it foward
to partial completion and then figure out where to restart your import.
This seems especially likely if you can restore only the files relating
to the table that was actually damanaged.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCHES] [SQL] 16 parameter limit
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] [SQL] 16 parameter limit