Re: Benchmark Data requested - Mailing list pgsql-performance

From Jignesh K. Shah
Subject Re: Benchmark Data requested
Msg-id 47A8CAED.1040402@sun.com
In response to Re: Benchmark Data requested  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-performance
Commercial DB bulk loaders work the same way: they offer a fast-load
option on the understanding that, in case of error, the whole table is
truncated. I think this also has real-life advantages where PostgreSQL
is used for data marts that are recreated every now and then from other
systems and need fast loaders. So it's not just benchmarking folks like
me who will take advantage of such features. In fact, I have seen
loaders that force a "REPLACE TABLE" clause, meaning the table is
truncated before loading, so there is no confusion about what happens
to the original data in the table; only then do they avoid the logs.
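A minimal Python sketch of the "REPLACE TABLE"-style semantics described above, under the assumption of a toy in-memory table. `Table`, `fast_load`, `recover` and the `loading` flag are hypothetical names for illustration only, not PostgreSQL, pg_bulkload, or any commercial loader's API. The table is truncated up front, rows are loaded without being logged, an error truncates the whole table, and crash recovery truncates any table caught mid-load.

```python
# Hypothetical toy model of a "fast load" that skips logging.
# None of these names correspond to a real PostgreSQL or loader API.
from dataclasses import dataclass, field


@dataclass
class Table:
    rows: list = field(default_factory=list)
    wal: list = field(default_factory=list)  # log records that would be replayed
    loading: bool = False                    # hypothetical "load mode" flag


def fast_load(table, rows, fail_at=None):
    """Truncate first, then append rows without writing any log records."""
    table.rows.clear()          # REPLACE TABLE: original data discarded up front
    table.loading = True
    try:
        for i, row in enumerate(rows):
            if i == fail_at:
                raise RuntimeError("simulated load error")
            table.rows.append(row)  # nothing is appended to table.wal
    except RuntimeError:
        table.rows.clear()      # on error, the whole table is truncated
        raise
    finally:
        table.loading = False


def recover(table):
    """Crash recovery: a table still flagged as loading is emptied."""
    if table.loading:
        table.rows.clear()
        table.loading = False
```

Because the old contents are gone before loading starts, the only two outcomes a user can observe are "fully loaded" or "empty", which is what makes skipping the log tolerable in this model.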


To be honest, it's not the WAL writes to disk that I am worried
about. According to my tests, async_commit comes pretty close to
sync=off and solves the WALWriteLock contention. Maybe we should just
focus on making WAL handling more efficient, which I think also involves
WALInsertLock, which may not be entirely efficient either.


Also, I totally agree that all changes have to be add-on options and
not replacements for existing load methods. The guys in production
support don't even like optimizer query plan changes, let alone a
corrupt index. (In a previous role I spent two days trying to figure
out why a particular query plan on another database changed in
production.)

Simon Riggs wrote:
> On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote:
>
>> That sounds cool to me too..
>>
>> How much work is it to make pg_bulkload work on 8.3? An integrated
>> version is certainly more beneficial.
>>
>
>
>> In particular, I think it will also help other setups, like TPC-E,
>> where this is a problem.
>>
>
> If you don't write WAL then you can lose all your writes in a crash.
> That issue is surmountable on a table with no indexes, or even
> conceivably with one monotonically ascending index. With other indexes
> if we crash then we have a likely corrupt index.
>
> For most production systems I'm aware of, losing an index on a huge
> table is not anything you'd want to trade for performance. Assuming
> you've ever been knee-deep in it on a real server.
>
> Maybe we can have a "load mode" for a table where we skip writing any
> WAL, but if we crash we just truncate the whole table to nothing? Issue
> a WARNING if we enable this mode while any data in table. I'm nervous of
> it, but maybe people really want it?
>
> I don't really want to invent ext2 all over again, where we have to run
> an fsck on a table if we crash while loading. My concern is that many
> people would choose that then blame us for delivering unreliable
> software. e.g. direct path loader on Oracle used to corrupt a PK index
> if you loaded duplicate rows with it (whether it still does I couldn't
> care). That kind of behaviour is simply incompatible with production
> usage, even if it makes for a good benchmark.
>
>
