Re: Benchmark Data requested - Mailing list pgsql-performance
From | Jignesh K. Shah |
---|---|
Subject | Re: Benchmark Data requested |
Date | |
Msg-id | 47A8CAED.1040402@sun.com |
In response to | Re: Benchmark Data requested (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-performance |
Commercial DB bulk loaders work the same way: they give you a fast-loader option on the condition that, in case of error, the whole table is truncated. I think this also has real-life advantages where PostgreSQL is used for datamarts that are recreated every now and then from other systems and want fast loaders. So it's not just the benchmarking folks like me who will take advantage of such features. In fact, I have seen that they force a "REPLACE TABLE" clause, in the sense that the loader truncates the table before loading, so there is no confusion about what happens to the original data in the table, and only then does it avoid the logs.

To be honest, it's not the WAL writes to the disk that I am worried about. According to my tests, async_commit is coming pretty close to sync=off and solves the WALWriteLock contention. We should maybe just focus on making it more efficient, which I think also involves WALInsertLock, which may not be entirely efficient.

Also, all changes have to be add-on options and not replacements for existing loads; I totally agree on that point. The guys in production support don't even like optimizer query plan changes, never mind a corrupt index. (I spent two days in a previous role trying to figure out why a particular query plan on another database changed in production.)

Simon Riggs wrote:
> On Tue, 2008-02-05 at 13:47 -0500, Jignesh K. Shah wrote:
>
>> That sounds cool to me too..
>>
>> How much work is to make pg_bulkload to work on 8.3? An Integrated
>> version is certainly more beneficial.
>>
>> Specially I think it will also help for other setups like TPC-E too
>> where this is a problem.
>
> If you don't write WAL then you can lose all your writes in a crash.
> That issue is surmountable on a table with no indexes, or even
> conceivably with one monotonically ascending index. With other indexes
> if we crash then we have a likely corrupt index.
>
> For most production systems I'm aware of, losing an index on a huge
> table is not anything you'd want to trade for performance. Assuming
> you've ever been knee-deep in it on a real server.
>
> Maybe we can have a "load mode" for a table where we skip writing any
> WAL, but if we crash we just truncate the whole table to nothing? Issue
> a WARNING if we enable this mode while any data in table. I'm nervous of
> it, but maybe people really want it?
>
> I don't really want to invent ext2 all over again, so we have to run an
> fsck on a table if we crash while loading. My concern is that many
> people would choose that then blame us for delivering unreliable
> software. e.g. direct path loader on Oracle used to corrupt a PK index
> if you loaded duplicate rows with it (whether it still does I couldn't
> care). That kind of behaviour is simply incompatible with production
> usage, even if it does good benchmark.
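
The "load mode" idea quoted above (skip WAL during the load, truncate the whole table if a crash interrupts it, warn if the table already holds data) can be sketched as a toy in-memory model. This is only an illustration of the proposed semantics, not PostgreSQL code: the class `ToyTable` and the methods `enable_load_mode`, `bulk_load`, `finish_load` and `recover_after_crash` are hypothetical names made up for this sketch.

```python
import warnings


class ToyTable:
    """Toy in-memory model of a table; hypothetical, not PostgreSQL code."""

    def __init__(self, name):
        self.name = name
        self.rows = []          # current table contents
        self.load_mode = False  # hypothetical per-table "load mode" flag

    def enable_load_mode(self):
        # Mirror the proposed WARNING when the table already holds data,
        # since a crash in load mode would discard those rows as well.
        if self.rows:
            warnings.warn(
                f"table {self.name!r} is not empty; a crash in load mode "
                "would truncate the existing rows too"
            )
        self.load_mode = True

    def bulk_load(self, new_rows):
        # Nothing is logged here, so nothing could be replayed after a crash.
        if not self.load_mode:
            raise RuntimeError("bulk_load requires load mode in this sketch")
        self.rows.extend(new_rows)

    def finish_load(self):
        # Stand-in for making the loaded data durable and leaving load mode.
        self.load_mode = False

    def recover_after_crash(self):
        # A table still in load mode cannot be trusted: truncate it.
        if self.load_mode:
            self.rows.clear()
            self.load_mode = False


# A crash in the middle of a load leaves the table empty.
fact = ToyTable("fact")
fact.enable_load_mode()
fact.bulk_load([1, 2, 3])
fact.recover_after_crash()
print(fact.rows)
```

The point of the model is the trade-off discussed in the thread: a load that completes keeps its rows, while a crash mid-load costs the entire table rather than leaving a possibly corrupt index behind.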