Re: pg_restore takes ages - Mailing list pgsql-general

From Tom Lane
Subject Re: pg_restore takes ages
Date
Msg-id 1182.1065216584@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_restore takes ages  ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses Re: pg_restore takes ages
List pgsql-general
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> Yes, you are.  Basically, with fsync on, things have to happen in order.
> I.e.
> write to WAL what you're gonna do.  WAIT for confirmation on write
> write the tuples out.  wait for confirmation
> checkpoint the WAL.  wait for confirmation

Not really.  With fsync on, we *only* sync the WAL writes.  Data writes
can happen whenever, so long as we know the corresponding WAL writes
went down first.  We only wait for data writes to complete before
considering that a checkpoint is complete --- which is something that is
not in the main line of execution and doesn't block other activity.

This is one good reason for keeping WAL on a separate drive from the
data files --- you are then freeing the system to schedule data I/O as
optimally as it can.

> Note that if you're running on IDE drives, you already ARE probably
> running with fsync off if write caching is enabled, so you'll need to turn
> it off (hdparm -W0 /dev/hdx in linux) to ensure fsync actually works.

It'd be interesting to think about whether a write-caching IDE drive
could safely be used for data storage, if WAL is elsewhere.

Right offhand I think the only problem is how to know when it's safe
to consider a checkpoint complete.  Maybe all that would be needed is
a long enough time delay after issuing sync(2) in the checkpoint code.
Do these drives guarantee "data will be written within 30 seconds" or
something like that?  Or can the delay be indefinite when load is heavy?

            regards, tom lane

pgsql-general by date:

Previous
From: Gene Vital
Date:
Subject: Re: book on advanced postgresql?
Next
From: "scott.marlowe"
Date:
Subject: Re: pg_restore takes ages