Re: Importing Large Amounts of Data - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Importing Large Amounts of Data
Msg-id: 22794.1018880810@sss.pgh.pa.us
In response to: Re: Importing Large Amounts of Data (Curt Sampson <cjs@cynic.net>)
Responses: Re: Importing Large Amounts of Data (Curt Sampson <cjs@cynic.net>)
List: pgsql-hackers

Curt Sampson <cjs@cynic.net> writes:
> On Mon, 15 Apr 2002, Christopher Kings-Lynne wrote:
>> CREATE TABLE WITHOUT OIDS ...

> As you can see from the schema I gave later in my message, that's
> exactly what I did. But does this actually avoid allocating the
> space in the on-disk tuples? What part of the code deals with this?
> It looks to me like the four bytes for the OID are still allocated
> in the tuple, but not used.

Curt is correct: WITHOUT OIDS does not save any storage.  Having two
different formats for the on-disk tuple header seemed more pain than
the feature was worth.  Also, because of alignment considerations it
would save no storage on machines where MAXALIGN is 8.  (Possibly my
thinking is colored somewhat by the fact that that's so on all my
favorite platforms ;-).)
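
The alignment argument above can be checked with a little arithmetic. The
sketch below is illustrative only: the 31-byte header size is an assumption
chosen for the example, not a figure from this message, and the helper names
are hypothetical. Only the four-byte OID size comes from the thread.

```python
# Illustrative arithmetic only; HEADER_WITH_OID is an assumed size.
HEADER_WITH_OID = 31  # assumed fixed tuple header size in bytes, OID included
OID_BYTES = 4         # the four bytes for the OID mentioned in the thread


def maxalign(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def header_savings(alignment):
    """Bytes saved per tuple if the OID field were dropped from the header."""
    with_oid = maxalign(HEADER_WITH_OID, alignment)
    without_oid = maxalign(HEADER_WITH_OID - OID_BYTES, alignment)
    return with_oid - without_oid


for alignment in (4, 8):
    print(alignment, header_savings(alignment))
```

Under these assumed sizes, the padded header is the same with or without the
OID when MAXALIGN is 8, and four bytes smaller when MAXALIGN is 4.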

However, as for the NULL values bitmap: that's already compacted out
when not used, and always has been AFAIK.

>> It's a bit hard to say "just turn off all the things that ensure your data
>> integrity so it runs a bit faster", if you actually need data integrity.

> I'm not looking for "runs a bit faster;" five percent either way
> makes little difference to me. I'm looking for a five-fold performance
> increase.

You are not going to get it from this; where in the world did you get
the notion that data integrity costs that much?  When the WAL stuff
was added in 7.1, we certainly did not see any five-fold slowdown.
If anything, testing seemed to indicate that WAL sped things up.
A lot would depend on your particular scenario of course.

Have you tried all the usual speedup hacks?  Turn off fsync, if you
really think you do not care about crash integrity; use COPY FROM STDIN
to bulk-load data, not retail INSERTs; possibly drop and recreate
indexes rather than updating them piecemeal; etc.  You should also
consider not declaring foreign keys, as the runtime checks for reference
validity are pretty expensive.

        regards, tom lane
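
A minimal sketch of the bulk-load advice above (drop the index, load with
COPY FROM STDIN, then recreate the index), written as a small generator of a
psql script. The table name, index name, and helper functions are
hypothetical and not part of the original message. The escaping follows the
COPY text format: tab-separated fields, \N for NULL, and backslash escapes
for special characters.

```python
def copy_escape(value):
    """Render one field in COPY text format; None becomes \\N (NULL)."""
    if value is None:
        return "\\N"
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    text = text.replace("\r", "\\r")
    return text


def copy_lines(rows):
    """Yield one tab-separated, newline-terminated line per row."""
    for row in rows:
        yield "\t".join(copy_escape(v) for v in row) + "\n"


def bulk_load_script(table, rows, index_name, index_ddl):
    """Build a psql script: drop index, COPY the data, recreate index."""
    parts = [
        f"DROP INDEX {index_name};\n",
        f"COPY {table} FROM STDIN;\n",
    ]
    parts.extend(copy_lines(rows))
    parts.append("\\.\n")          # end-of-data marker for COPY FROM STDIN
    parts.append(index_ddl + ";\n")
    return "".join(parts)


# Hypothetical table and index, for illustration only.
script = bulk_load_script(
    "items",
    [(1, "a\tb"), (2, None)],
    "items_name_idx",
    "CREATE INDEX items_name_idx ON items (name)",
)
print(script, end="")
```

The resulting script could be fed to psql. Whether dropping and recreating
the index actually wins depends on how much data is loaded relative to what
is already in the table.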