Thread: Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Pavan Deolasee
(Adding -hackers. Did not realize it got dropped)

On Fri, Sep 7, 2012 at 11:25 PM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
On Fri, Sep 7, 2012 at 7:17 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Sep 7, 2012 at 7:00 PM, Marie Bacuno II <gezeala@gmail.com> wrote:
>>
>>
>> On Sep 7, 2012, at 2:19, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>>
>> > or have long running transactions ?
>>
>> Yes, but I don't think there were any when the snapshot was taken. Does the
>> pg_xlog_location_diff() result between the latest and prior checkpoint upon
>> start-up indicate the size of replayed changes?
>>
>
> That's the amount of additional WAL generated after you started the server.
>
>>
>> >
>> > BTW, the following query returns ~60GB. That's the amount of WAL
>> > written between server start and the end of pg_dump (I don't think
>> > pg_xlog_location_diff() is available in the older releases).
>> >
>> > postgres=# select pg_xlog_location_diff('4450/7A14F280',
>> > '4441/5E681F38')/(2^30);
>> >    ?column?
>> > ------------------
>> > 60.1980484202504
>>
>> It would be great to know what these WAL files modified.
>
>
> You would need something like xlogdump to decipher them. I quickly tried
> this and it seems to work against the 8.4 version that you are running:
> https://github.com/snaga/xlogdump
>
> Download the source code, compile it, and run it against one of the most
> recent WAL files in the cluster against which you ran pg_dump. You would
> need to set PATH so that it contains the pg_config of the server you are
> running. Please post the output.
>
> Thanks,
> Pavan
>
>
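
To reproduce this, the steps above boil down to roughly the following (a
sketch only: the install prefix is an example, a pgxs-style Makefile in the
xlogdump repository is assumed, and the 8.4 server's pg_config must come
first on PATH):

## example paths -- adjust to the actual installation
$ export PATH=/usr/local/pgsql-8.4/bin:$PATH
$ git clone https://github.com/snaga/xlogdump
$ cd xlogdump
$ make && make install
$ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A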

Here you go:

## last WAL
$ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A

/dbpool/data/pg_xlog/00000001000044500000007A:

Unexpected page info flags 0003 at offset 0
Skipping unexpected continuation record at offset 0
ReadRecord: record with zero len at 17488/7A14F310
Unexpected page info flags 0001 at offset 150000
Skipping unexpected continuation record at offset 150000
Unable to read continuation page?
 ** maybe continues to next segment **
---------------------------------------------------------------
TimeLineId: 1, LogId: 17488, LogSegment: 122

Resource manager stats:
  [0]XLOG      : 3 records, 120 bytes (avg 40.0 bytes)
                 checkpoint: 3, switch: 0, backup end: 0
  [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
                 commit: 0, abort: 0
  [2]Storage   : 0 record, 0 byte (avg 0.0 byte)
  [3]CLOG      : 0 record, 0 byte (avg 0.0 byte)
  [4]Database  : 0 record, 0 byte (avg 0.0 byte)
  [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
  [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
  [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
  [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
  [9]Heap2     : 2169 records, 43380 bytes (avg 20.0 bytes)
  [10]Heap      : 0 record, 0 byte (avg 0.0 byte)
                 ins: 0, upd/hot_upd: 0/0, del: 0
  [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
  [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
  [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
  [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
  [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)

Backup block stats: 2169 blocks, 16551816 bytes (avg 7631.1 bytes)


I think both my theories are holding up. The Heap2 resource manager is used only for vacuum freeze, lazy vacuum, or HOT prune. Given your access pattern, I bet it's the third activity that is kicking in on your database. You have many pages with dead tuples, and they are getting cleaned at the first opportunity, which happens to be the pg_dump that runs immediately after the server restart. This is evident from all 2169 WAL records in the file being attributed to the Heap2 RM above.

What's additionally happening is that each of these records is on a different heap page. The cleanup activity dirties those pages. Since each of these pages is being dirtied for the first time after a recent checkpoint and full_page_writes is turned ON, the entire page is backed up in the WAL record. You can see the exact number of backup blocks in the stats above.

I don't think we have any mechanism to control or stop HOT from doing what it wants to do, unless you are willing to run a modified server for this purpose. But you can at least bring down the WAL volume by turning full_page_writes OFF.
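
For reference, a minimal sketch of how to do that on this 8.4 server (it
assumes the data directory is /dbpool/data, as in the pg_xlog paths above;
full_page_writes only needs a reload, not a restart):

## 1. In /dbpool/data/postgresql.conf set:
##      full_page_writes = off
## 2. Then reload the configuration:
$ pg_ctl -D /dbpool/data reload
## or, from psql:
postgres=# select pg_reload_conf();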

Thanks,
Pavan

Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Gezeala M. Bacuño II
Adding the pgsql-bugs list in case the OP posts back.

On Fri, Sep 7, 2012 at 11:29 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> [...]
>
> I don't think we have any mechanism to control or stop HOT from doing what
> it wants to do, unless you are willing to run a modified server for this
> purpose. But you can at least bring down the WAL volume by turning
> full_page_writes OFF.
>
> Thanks,
> Pavan

Great. Finally got some light on this. I'll disable full_page_writes
on my next backup and will post back results tomorrow. Thanks.



Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Gezeala M. Bacuño II
On Fri, Sep 7, 2012 at 11:40 AM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
> [...]
>
> Great. Finally got some light on this. I'll disable full_page_writes
> on my next backup and will post back results tomorrow. Thanks.

Results with full_page_writes disabled:

pg_controldata right after cluster start:
Latest checkpoint location:           4469/B8352EB8
Prior checkpoint location:            4469/B5EAE428
Latest checkpoint's REDO location:    4469/B8352EB8
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419770454
Latest checkpoint's NextOID:          654192504
Latest checkpoint's NextMultiXactId:  108124
Latest checkpoint's NextMultiOffset:  220345

pg_controldata after full dump:
Latest checkpoint location:           446A/22A93370
Prior checkpoint location:            446A/22A93328
Latest checkpoint's REDO location:    446A/22A93370
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419770454
Latest checkpoint's NextOID:          654192504
Latest checkpoint's NextMultiXactId:  108124
Latest checkpoint's NextMultiOffset:  220345


$ xlogdump -S /dbpool/data/pg_xlog/000000010000446A00000022

/dbpool/data/pg_xlog/000000010000446A00000022:

ReadRecord: record with zero len at 17514/22A933B8
---------------------------------------------------------------
TimeLineId: 1, LogId: 17514, LogSegment: 34

Resource manager stats:
  [0]XLOG      : 3 records, 120 bytes (avg 40.0 bytes)
                 checkpoint: 3, switch: 0, backup end: 0
  [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
                 commit: 0, abort: 0
  [2]Storage   : 0 record, 0 byte (avg 0.0 byte)
  [3]CLOG      : 0 record, 0 byte (avg 0.0 byte)
  [4]Database  : 0 record, 0 byte (avg 0.0 byte)
  [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
  [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
  [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
  [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
  [9]Heap2     : 68125 records, 8645016 bytes (avg 126.9 bytes)
  [10]Heap      : 28952 records, 4640535 bytes (avg 160.3 bytes)
                 ins: 28952, upd/hot_upd: 0/0, del: 0
  [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
  [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
  [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
  [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
  [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)

Backup block stats: 0 block, 0 byte (avg 0.0 byte)

zfs USED property of the clone after the backup, with full_page_writes disabled: 298GB
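
For comparison with the ~60GB figure earlier in the thread, the WAL generated
during this run could be measured the same way from the two REDO locations
above (a sketch; as noted earlier, pg_xlog_location_diff() is not available
on 8.4, so the calculation would have to run on a newer server):

postgres=# select pg_xlog_location_diff('446A/22A93370', '4469/B8352EB8')/(2^30);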



Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Gezeala M. Bacuño II
You may disable full_page_writes, but as you can see from my previous
post, disabling it did not do the trick. My zfs USED property
continues to increase.
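
A quick way to watch that growth while the dump runs (a sketch; the dataset
name is a guess based on the /dbpool/data paths above):

$ zfs get used dbpool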

On Wed, Oct 17, 2012 at 3:55 PM, ichbinrene <rene.romero.b@gmail.com> wrote:
> I'm experiencing the exact same issue:
>
> PostgreSQL 9.1.6 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305
> (Red Hat 4.4.6-4), 32-bit
> CentOS release 6.3 (Final)
>
> I might also turn full_page_writes off, but I fear for data integrity in
> case of a crash...



Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Robert Haas
On Thu, Oct 18, 2012 at 3:18 PM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
> You may disable full_page_writes, but as you can see from my previous
> post, disabling it did not do the trick. My zfs USED property
> continues to increase.

I think we need somebody to compile with WAL_DEBUG defined and try to
reproduce this with wal_debug = on.  Then we will find out what WAL
records are being emitted, which will presumably give us a clue as to
what is really happening here.
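
A rough sketch of what that involves (paths and flags are examples only;
WAL_DEBUG is the compile-time symbol that enables the wal_debug GUC):

## build a server with WAL_DEBUG defined
$ ./configure --prefix=/usr/local/pgsql-debug CPPFLAGS="-DWAL_DEBUG"
$ make && make install

## then in postgresql.conf:
##   wal_debug = on
## and check the server log for the WAL records emitted while pg_dump runs.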

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company