Re: Benchmark Data requested - Mailing list pgsql-performance

From Matthew
Subject Re: Benchmark Data requested
Date
Msg-id Pine.LNX.4.64.0802051439300.20402@aragorn.flymine.org
Whole thread Raw
In response to Re: Benchmark Data requested  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-performance
Apologies for the blank email - mailer problems. I lost all my nicely
typed stuff, too.

On Tue, 5 Feb 2008, Dimitri Fontaine wrote:
>> Multiple table loads ( 1 per table) spawned via script  is bit better
>> but hits wal problems.
>
> pgloader will too hit the WAL problem, but it still may have its benefits, or
> at least we will soon (you can already if you take it from CVS) be able to
> measure if the parallel loading at the client side is a good idea perf. wise.

You'll have to be careful here. Depending on the filesystem, writing large
amounts of data to two files simultaneously can results in the blocks
being interleaved to some degree on the disc, which can cause performance
problems later on.

As for the WAL, I have an suggestion, but I'm aware that I don't know how
PG actually does it, so you'll have to tell me if it is valid.

My impression is that the WAL is used to store writes in a transactional
manner, for writes that can't be written in a transactional manner
directly to the data files. Hence the suggestion for restoring database
dumps to run the whole restore in one transaction, which means that the
table creation is in the same transaction as loading the data into it.
Since the table is not visible to other backends, the writes to it do not
need to go to the WAL, and PG is clever enough to do this.

My suggestion is to extend that slightly. If there is a large chunk of
data to be written to a table, which will be entirely to empty pages or
appended to the of the data file, then there is no risk of corruption of
existing data, and that write could be made directly to the table. You
would have to write a WAL entry reserving the space in the data file, and
then write the data to the file. Then when that WAL entry is checkpointed,
no work would be required.

This would improve the performance of database restores and large writes
which expand the table's data file. So, would it work?

Matthew

--
If pro is the opposite of con, what is the opposite of progress?

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Benchmark Data requested
Next
From: Matthew
Date:
Subject: Re: Benchmark Data requested