Thread: pg_xlog on separate drive
Hey guys, sorry if this is slightly OT for this list, but I figure it's a simple question. If I'm storing pg_xlog on a secondnon-redundant drive using the symlink method and the journal drive were to crash, how difficult is recovery? Will Postgresqlsimply be able to reinitialize the journal on a new drive and carry on, or is there more to it than that? I realizeany pending transactions would be lost, but that's not a huge concern for me because everything I'm importing comesfrom raw data. <br /><br />Thanks,<br />Travis<br />
"Travis Whitton" <tinymountain@gmail.com> writes: > Hey guys, sorry if this is slightly OT for this list, but I figure it's a > simple question. If I'm storing pg_xlog on a second non-redundant drive > using the symlink method and the journal drive were to crash, how difficult > is recovery? Will Postgresql simply be able to reinitialize the journal on a > new drive and carry on, or is there more to it than that? I realize any > pending transactions would be lost, but that's not a huge concern for me > because everything I'm importing comes from raw data. Losing xlog is pretty bad: there's a serious risk of data corruption, in that transactions made since your last checkpoint may be only partially applied. I wouldn't recommend a setup in which xlog is less redundant than your main storage array. regards, tom lane
Hi, Travis, Travis Whitton wrote: > Hey guys, sorry if this is slightly OT for this list, but I figure it's > a simple question. If I'm storing pg_xlog on a second non-redundant > drive using the symlink method and the journal drive were to crash, how > difficult is recovery? Will Postgresql simply be able to reinitialize > the journal on a new drive and carry on, or is there more to it than > that? I realize any pending transactions would be lost, but that's not a > huge concern for me because everything I'm importing comes from raw data. The problem is that you risk inconsistency at data and structural level. When the server crashes, it might happen that some pages in the data files are written only partially (because most disks have a much smaller blocksize than the PostgreSQL page size (which is 8k by default)). Now, when the server cannot reply the WAL log, those half-written pages will not be repaired, and your data may be inconsistent at a very low sematic level (duplicate rows, missing rows, broken rows, backend crashes etc.) with no way to repair. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Thanks for the replies guys. I think I may be ok in my case though because I'll be importing data as a single daily batch from raw data. I'll be taking nightly backups, and in the event of a crash, I can simply restore from a recent backup and then reimport the raw data. I can now see why losing pg_xlog would be a big problem if I were inserting and updating data continuously throughout the day though.
Thanks,
Travis
Thanks,
Travis
On 12/4/06, Markus Schaber <schabi@logix-tt.com> wrote:
Hi, Travis,
Travis Whitton wrote:
> Hey guys, sorry if this is slightly OT for this list, but I figure it's
> a simple question. If I'm storing pg_xlog on a second non-redundant
> drive using the symlink method and the journal drive were to crash, how
> difficult is recovery? Will Postgresql simply be able to reinitialize
> the journal on a new drive and carry on, or is there more to it than
> that? I realize any pending transactions would be lost, but that's not a
> huge concern for me because everything I'm importing comes from raw data.
The problem is that you risk inconsistency at data and structural level.
When the server crashes, it might happen that some pages in the data
files are written only partially (because most disks have a much smaller
blocksize than the PostgreSQL page size (which is 8k by default)).
Now, when the server cannot reply the WAL log, those half-written pages
will not be repaired, and your data may be inconsistent at a very low
sematic level (duplicate rows, missing rows, broken rows, backend
crashes etc.) with no way to repair.
HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS
Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org