Thread: pg_xlog on separate drive

pg_xlog on separate drive

From
"Travis Whitton"
Date:
Hey guys, sorry if this is slightly OT for this list, but I figure it's a simple question. If I'm storing pg_xlog on a
second non-redundant drive using the symlink method and the journal drive were to crash, how difficult is recovery? Will
Postgresql simply be able to reinitialize the journal on a new drive and carry on, or is there more to it than that? I
realize any pending transactions would be lost, but that's not a huge concern for me because everything I'm importing
comes from raw data.

Thanks,
Travis

Re: pg_xlog on separate drive

From
Tom Lane
Date:
"Travis Whitton" <tinymountain@gmail.com> writes:
> Hey guys, sorry if this is slightly OT for this list, but I figure it's a
> simple question. If I'm storing pg_xlog on a second non-redundant drive
> using the symlink method and the journal drive were to crash, how difficult
> is recovery? Will Postgresql simply be able to reinitialize the journal on a
> new drive and carry on, or is there more to it than that? I realize any
> pending transactions would be lost, but that's not a huge concern for me
> because everything I'm importing comes from raw data.

Losing xlog is pretty bad: there's a serious risk of data corruption, in
that transactions made since your last checkpoint may be only partially
applied.  I wouldn't recommend a setup in which xlog is less redundant
than your main storage array.
        regards, tom lane
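
[Editorial sketch, not part of Tom's message.] Before weighing the
redundancy question above, it can help to confirm where pg_xlog actually
lives. The following is a minimal Python sketch, assuming a Unix-like
system and the "symlink method" described in the question. The function
name xlog_placement and its return keys are hypothetical, chosen only for
illustration. Note that a different st_dev only means a different
filesystem; it says nothing about whether the underlying storage is
redundant.

```python
import os

def xlog_placement(data_dir):
    """Describe where pg_xlog lives relative to the data directory.

    `data_dir` is the PostgreSQL data directory (the caller supplies it;
    no default location is assumed here).
    """
    xlog = os.path.join(data_dir, "pg_xlog")
    # True when pg_xlog was relocated with the symlink method.
    is_symlink = os.path.islink(xlog)
    # Resolve the symlink (if any) to the real directory.
    target = os.path.realpath(xlog)
    # st_dev identifies the filesystem a path resides on; equal values
    # mean pg_xlog shares a filesystem with the rest of the cluster.
    same_device = os.stat(target).st_dev == os.stat(data_dir).st_dev
    return {
        "is_symlink": is_symlink,
        "target": target,
        "same_device": same_device,
    }
```

Calling xlog_placement() with the path of your own data directory returns
a small dict. If "same_device" is False, the log is on a separate
filesystem, and the question of whether that storage is as redundant as
the main array still has to be answered by looking at the hardware setup.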


Re: pg_xlog on separate drive

From
Markus Schaber
Date:
Hi, Travis,

Travis Whitton wrote:
> Hey guys, sorry if this is slightly OT for this list, but I figure it's
> a simple question. If I'm storing pg_xlog on a second non-redundant
> drive using the symlink method and the journal drive were to crash, how
> difficult is recovery? Will Postgresql simply be able to reinitialize
> the journal on a new drive and carry on, or is there more to it than
> that? I realize any pending transactions would be lost, but that's not a
> huge concern for me because everything I'm importing comes from raw data.

The problem is that you risk inconsistency at both the data and the
structural level.

When the server crashes, some pages in the data files may have been
written only partially, because most disks have a much smaller block
size than the PostgreSQL page size (8k by default).

If the server then cannot replay the WAL, those half-written pages
will not be repaired, and your data may be inconsistent at a very low
semantic level (duplicate rows, missing rows, broken rows, backend
crashes, etc.) with no way to repair it.

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
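
[Editorial sketch, not part of Markus's message.] The partially written
page described above can be illustrated with a toy model in Python. It
assumes a 512-byte sector (real drives vary) and the 8k default page size
mentioned in the message; the function torn_write is hypothetical and
only simulates the outcome, it does not touch any real PostgreSQL files.

```python
PAGE_SIZE = 8192    # PostgreSQL default page size, as stated in the thread
SECTOR_SIZE = 512   # assumed disk sector size; an illustration only

def torn_write(old_page, new_page, sectors_written):
    """Return what is on disk if the crash happens mid-page.

    Only the first `sectors_written` sectors of the new page reached the
    disk; the rest of the page still holds the old contents.
    """
    assert len(old_page) == len(new_page) == PAGE_SIZE
    cut = sectors_written * SECTOR_SIZE
    return new_page[:cut] + old_page[cut:]

old = b"A" * PAGE_SIZE   # page contents before the update
new = b"B" * PAGE_SIZE   # page contents the server intended to write
torn = torn_write(old, new, sectors_written=5)

# The resulting page matches neither the old nor the new version.
print(torn != old and torn != new)
# 5 sectors of new data (2560 bytes) followed by 5632 bytes of old data.
print(torn.count(b"B"), torn.count(b"A"))
```

A page in this mixed state is what WAL replay would normally repair after
a crash. With the log gone, nothing on disk records what the page was
supposed to contain.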


Re: pg_xlog on separate drive

From
"Travis Whitton"
Date:
Thanks for the replies, guys. I think I may be OK in my case, though, because I'll be importing data as a single daily batch from raw data. I'll be taking nightly backups, and in the event of a crash, I can simply restore from a recent backup and then reimport the raw data. I can now see why losing pg_xlog would be a big problem if I were inserting and updating data continuously throughout the day, though.

Thanks,
Travis

On 12/4/06, Markus Schaber <schabi@logix-tt.com> wrote:
Hi, Travis,

Travis Whitton wrote:
> Hey guys, sorry if this is slightly OT for this list, but I figure it's
> a simple question. If I'm storing pg_xlog on a second non-redundant
> drive using the symlink method and the journal drive were to crash, how
> difficult is recovery? Will Postgresql simply be able to reinitialize
> the journal on a new drive and carry on, or is there more to it than
> that? I realize any pending transactions would be lost, but that's not a
> huge concern for me because everything I'm importing comes from raw data.

The problem is that you risk inconsistency at both the data and the
structural level.

When the server crashes, some pages in the data files may have been
written only partially, because most disks have a much smaller block
size than the PostgreSQL page size (8k by default).

If the server then cannot replay the WAL, those half-written pages
will not be repaired, and your data may be inconsistent at a very low
semantic level (duplicate rows, missing rows, broken rows, backend
crashes, etc.) with no way to repair it.
