Re: Importing Large Amounts of Data - Mailing list pgsql-hackers

From Curt Sampson
Subject Re: Importing Large Amounts of Data
Date
Msg-id Pine.NEB.4.43.0204161017210.498-100000@angelic.cynic.net
In response to Re: Importing Large Amounts of Data  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Importing Large Amounts of Data  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Importing Large Amounts of Data  (Gavin Sherry <swm@linuxworld.com.au>)
List pgsql-hackers
On Mon, 15 Apr 2002, Tom Lane wrote:

> > I'm not looking for "runs a bit faster;" five percent either way
> > makes little difference to me. I'm looking for a five-fold performance
> > increase.
>
> You are not going to get it from this; where in the world did you get
> the notion that data integrity costs that much?

Um...the fact that MySQL imports the same data five times as fast? :-)

Note that this is *only* related to bulk-importing huge amounts of
data. Postgres seems a little bit slower than MySQL at building
the indices afterwards, but this would be expected since (probably
due to higher tuple overhead) the size of the data once in postgres
is about 75% larger than in MySQL: 742 MB vs 420 MB. I've not done
any serious testing of query speed, but the bit of toying I've done
with it shows no major difference.

> Have you tried all the usual speedup hacks?  Turn off fsync, if you
> really think you do not care about crash integrity; use COPY FROM STDIN
> to bulk-load data, not retail INSERTs; possibly drop and recreate
> indexes rather than updating them piecemeal; etc.  You should also
> consider not declaring foreign keys, as the runtime checks for reference
> validity are pretty expensive.

Yes, I did all of the above. (This was all mentioned in my initial
message, except for turning off foreign key constraints--but the
table has no foreign keys.)
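
(For anyone following along: the biggest single win among those hacks
is COPY. As a rough sketch, here's how one might pre-render data into
the tab-separated text format that COPY FROM expects by default, with
\N for NULLs; the table name "sample" and the example rows are made up
for illustration.)

```python
# Sketch: render rows into PostgreSQL COPY text format (tab-separated,
# \N for NULL, with backslash/tab/newline escaped), so they can be
# bulk-loaded instead of issuing retail INSERTs.

def copy_row(values):
    """Render one row in COPY's default text format."""
    out = []
    for v in values:
        if v is None:
            out.append(r"\N")          # COPY's NULL marker
        else:
            s = str(v)
            # Escape characters that are special in COPY text format.
            s = s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
            out.append(s)
    return "\t".join(out) + "\n"

# Hypothetical rows for a table "sample(id, name, note)":
rows = [(1, "alice", None), (2, "bob", "note\twith tab")]
with open("sample.copy", "w") as f:
    for r in rows:
        f.write(copy_row(r))

# Then load it in one shot, e.g.:
#   psql -c "COPY sample FROM STDIN" < sample.copy
# ideally with fsync off and indexes dropped for the duration.
```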

What I'm thinking would be really cool would be to have an "offline"
way of creating tables using a stand-alone program that would write
the files at, one hopes, near disk speed. Maybe it could work by
creating the tables in a detached tablespace, and then you'd attach
the tablespace when you're done. It might even be extended to be
able to do foreign key checks, create indices, and so on. (Foreign
key checks would be useful; I'm not sure that creating indices
would be any faster than just doing it after the tablespace is
attached.)

This would be particularly useful for fast restores of backups.
Downtime while doing a restore is always a huge pain for large
databases.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC