Re: LVM snapshots - Mailing list pgsql-admin
From | Tom Lane |
---|---|
Subject | Re: LVM snapshots |
Date | |
Msg-id | 7901.1049479338@sss.pgh.pa.us Whole thread Raw |
In response to | Re: LVM snapshots (Murthy Kambhampaty <murthy.kambhampaty@goeci.com>) |
Responses |
Re: LVM snapshots
|
List | pgsql-admin |
Murthy Kambhampaty <murthy.kambhampaty@goeci.com> writes: > If somebody wants to try to replicate this: > 1. Create a table with several hundred thousand records > 2. Start an "insert into tbl2 from select ..." (multiple ones in parallel > maximize your chance of hitting this) > 3. Freeze the filesystem, create a snapshot, unfreeze the filesystem, mount > the snapshot and copy $PGDATA somewhere else > 4. Start a postmaster on the copy of the database cluster; run VACUUM > ANALYZE on tlb2 > you should see a series of > NOTICE: Rel tbl2: Uninitialized page 'nnnnn' - fixing > messages Hmm. This amounts to testing what happens after a Postgres crash (the copied PGDATA will look exactly like the system had gone down at the instant of the freeze). In that scenario zeroed pages aren't too improbable, although I wouldn't expect to see a large number of them. Are you running with a nondefault WAL_BUFFERS setting? The way it could happen is like this: you have backend(s) busily extending the table and filling each page with new tuples. Each time we extend a table, we actually write a page of zeroes to the kernel --- this is so that we'll find out right away if we're out of disk space. Then, *inside the Postgres shared-buffer arena*, the page is initialized with correct header data and filled with tuples. Unless you are short on shared buffers, it will probably not be written out to the kernel until the next checkpoint. So there is an interval where the kernel thinks that this page of the file exists but contains zeroes. Meanwhile, WAL entries for the new tuples are being written into the WAL buffers arena. But they aren't going to be forced to disk until there's a transaction commit (or the WAL buffers become full). If the first WAL entry describing a tuple insertion into the new page gets out to disk (or out to the kernel at least) before the filesystem freeze or Postgres crash, then WAL replay will initialize the zeroed page and you won't see any complaints. But there is clearly an interval where a zeroed page can be added to a table and not initialized by WAL replay. This is all okay and no data loss is possible through this mechanism. We can only lose uncommitted tuples, since a commit would have forced out the WAL entries. But it's hard to see how you could have more than approximately WAL_BUFFERS uninitialized pages present due to this scenario --- a WAL entry can remain unwritten only as long as less than WAL_BUFFERS worth of subsequent WAL traffic has occurred, so the physical extensions of the file can't get all that far ahead of what WAL replay will be able to see. If you're just seeing a few of these notices per try, then I don't think there's anything to worry about (especially if they reference pages very near the end of their tables). Otherwise, there may be some other behavior involved, and we'd better look more closely. regards, tom lane
pgsql-admin by date: