Re: VLDB Features - Mailing list pgsql-hackers

From Trent Shipley
Subject Re: VLDB Features
Date
Msg-id 200712151759.20925.trent_shipley@qwest.net
In response to Re: VLDB Features  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Saturday 2007-12-15 02:14, Simon Riggs wrote:
> On Fri, 2007-12-14 at 18:22 -0500, Tom Lane wrote:
> > Neil Conway <neilc@samurai.com> writes:
> > > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> > > to drop (and log) rows that contain malformed data. That is, rows with
> > > too many or too few columns, rows that result in constraint violations,
> > > and rows containing columns where the data type's input function raises
> > > an error. The last case is the only thing that would be a bit tricky to
> > > implement, I think: you could use PG_TRY() around the
> > > InputFunctionCall, but I guess you'd need a subtransaction to ensure
> > > that you reset your state correctly after catching an error.
> >
> > Yeah.  It's the subtransaction per row that's daunting --- not only the
> > cycles spent for that, but the ensuing limitation to 4G rows imported
> > per COPY.
>
> I'd suggest doing everything at block level
> - wrap each new block of data in a subtransaction
> - apply data to the table block by block (can still work with FSM).
> - apply indexes in bulk for each block, unique ones first.
>
> That then gives you a limit of more than 500 trillion rows, which should
> be enough for anyone.

Wouldn't it only give you more than 500T rows in the best case?  If it hits a
bad row, it has to roll back the failed block and then replay it one row (and
one subtransaction) at a time.  So in the worst case, where there is at least
one exception row per block, I think you would still wind up with a capacity
of only 4G rows.
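To put rough numbers on both cases (an illustration only: the 2^32 figure is
the 32-bit transaction-ID space, while the rows-per-block value is
reverse-engineered from Simon's ">500 trillion" and is not stated anywhere in
the thread):

    #include <stdio.h>

    int
    main(void)
    {
        double subxacts = 4294967296.0;     /* ~2^32 available subtransactions */
        double rows_per_block = 120000.0;   /* assumed rows per batched block */

        /* Best case: one subtransaction per block of rows. */
        printf("best case:  %.3g rows\n", subxacts * rows_per_block); /* ~5.2e14 */

        /* Worst case: every block has a bad row, so each block is replayed
         * row by row, burning one subtransaction per row. */
        printf("worst case: %.3g rows\n", subxacts);                  /* ~4.3e9 */
        return 0;
    }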

