Thread: BUG #7521: Cannot disable WAL log while using pg_dump

BUG #7521: Cannot disable WAL log while using pg_dump

From
boy@atsc.nl
Date:
The following bug has been logged on the website:

Bug reference:      7521
Logged by:          Boy de Laat
Email address:      boy@atsc.nl
PostgreSQL version: 9.1.4
Operating system:   CentOS 6.2 x86_64
Description:


I've set up slave replication, and just to be safe I also use pg_dump as a
backup mechanism besides replication. But when the pg_dump command is issued,
a lot of WAL is generated, and this causes my slave to stop replicating
because it falls too far behind.

So it would be nice if there is an option to disable WAL logging while
running pg_dump.
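
For reference, a minimal way to watch how far a 9.1 standby has fallen behind
(a rough sketch, run on the standby; these are the 9.1 function names):

-- Sketch: rough view of standby lag on 9.1.
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed,
       now() - pg_last_xact_replay_timestamp() AS replay_delay;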

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Wed, Sep 5, 2012 at 9:57 AM,  <boy@atsc.nl> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7521
> Logged by:          Boy de Laat
> Email address:      boy@atsc.nl
> PostgreSQL version: 9.1.4
> Operating system:   CentOS 6.2 x86_64
> Description:
>
> I've set up slave replication, and just to be safe I also use pg_dump as a
> backup mechanism besides replication. But when the pg_dump command is issued,
> a lot of WAL is generated, and this causes my slave to stop replicating
> because it falls too far behind.
>
> So it would be nice if there is an option to disable WAL logging while
> running pg_dump.

pg_dump doesn't modify any data, so I don't see how it could be
causing WAL logs to get generated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 5, 2012 at 9:57 AM,  <boy@atsc.nl> wrote:
>> So it would be nice if there is an option to disable WAL logging while
>> running pg_dump.

> pg_dump doesn't modify any data, so I don't see how it could be
> causing WAL logs to get generated.

Doesn't hint-bit setting cause WAL traffic these days?

Mind you, I think this request is entirely ill-considered.  Not writing
out the hint bits is not going to be a win in the long run.

            regards, tom lane

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Sep 5, 2012 at 9:57 AM,  <boy@atsc.nl> wrote:
>>> So it would be nice if there is an option to disable WAL logging while
>>> running pg_dump.
>
>> pg_dump doesn't modify any data, so I don't see how it could be
>> causing WAL logs to get generated.
>
> Doesn't hint-bit setting cause WAL traffic these days?

I sure as heck don't think so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Heikki Linnakangas
Date:
On 06.09.2012 13:07, Robert Haas wrote:
> On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Robert Haas<robertmhaas@gmail.com>  writes:
>>> On Wed, Sep 5, 2012 at 9:57 AM,<boy@atsc.nl>  wrote:
>>>> So it would be nice if there is an option to disable WAL logging while
>>>> running pg_dump.
>>
>>> pg_dump doesn't modify any data, so I don't see how it could be
>>> causing WAL logs to get generated.
>>
>> Doesn't hint-bit setting cause WAL traffic these days?
>
> I sure as heck don't think so.

It does not. HOT page pruning does, however. It could be that..
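
One way to see that on a scratch database (a rough sketch; the table name
hot_demo is just for illustration, and whether the read actually prunes
depends on timing, so the locations may or may not differ):

-- Sketch: a read-only scan can still emit WAL via HOT pruning.
CREATE TABLE hot_demo (id int PRIMARY KEY, v text);
INSERT INTO hot_demo SELECT g, 'x' FROM generate_series(1, 100000) g;
UPDATE hot_demo SET v = 'y';                 -- leaves dead tuples behind
CHECKPOINT;
SELECT pg_current_xlog_insert_location();    -- note this value
SELECT count(*) FROM hot_demo;               -- plain SELECT; may prune and write WAL
SELECT pg_current_xlog_insert_location();    -- compare with the value above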

- Heikki

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Thu, Sep 6, 2012 at 4:08 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 06.09.2012 13:07, Robert Haas wrote:
>>>> pg_dump doesn't modify any data, so I don't see how it could be
>>>> causing WAL logs to get generated.
>>>
>>> Doesn't hint-bit setting cause WAL traffic these days?
>>
>> I sure as heck don't think so.
>
> It does not. HOT page pruning does, however. It could be that..

True, but wouldn't you have to be fantastically unlucky for that to
amount to anything material?  I mean, what are the chances of that
happening on more than a very small percentage of blocks?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 06.09.2012 13:07, Robert Haas wrote:
>> On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> Doesn't hint-bit setting cause WAL traffic these days?

>> I sure as heck don't think so.

> It does not. HOT page pruning does, however. It could be that..

Sorry, I was thinking of the freeze case, which is also somewhat
unlikely --- but if it were happening, it could easily happen on
most/all pages of a table.

            regards, tom lane

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
"Boy de Laat"
Date:
At the time my backup starts, I see a lot of WAL being generated.

Kind regards,

Boy de Laat

Sent from my iPhone

On 6 Sep 2012, at 21:44, "Robert Haas" <robertmhaas@gmail.com> wrote:

> On Wed, Sep 5, 2012 at 9:57 AM,  <boy@atsc.nl> wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      7521
> > Logged by:          Boy de Laat
> > Email address:      boy@atsc.nl
> > PostgreSQL version: 9.1.4
> > Operating system:   CentOS 6.2 x86_64
> > Description:
> >
> > I've set up slave replication, and just to be safe I also use pg_dump as a
> > backup mechanism besides replication. But when the pg_dump command is issued,
> > a lot of WAL is generated, and this causes my slave to stop replicating
> > because it falls too far behind.
> >
> > So it would be nice if there is an option to disable WAL logging while
> > running pg_dump.
>
> pg_dump doesn't modify any data, so I don't see how it could be
> causing WAL logs to get generated.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Thu, Sep 6, 2012 at 4:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 06.09.2012 13:07, Robert Haas wrote:
>>> On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>>> Doesn't hint-bit setting cause WAL traffic these days?
>
>>> I sure as heck don't think so.
>
>> It does not. HOT page pruning does, however. It could be that..
>
> Sorry, I was thinking of the freeze case, which is also somewhat
> unlikely --- but if it were happening, it could easily happen on
> most/all pages of a table.

But that only happens in vacuum, which pg_dump does not do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of Thu Sep 06 17:23:07 -0300 2012:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > On 06.09.2012 13:07, Robert Haas wrote:
> >> On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
> >>> Doesn't hint-bit setting cause WAL traffic these days?
>
> >> I sure as heck don't think so.
>
> > It does not. HOT page pruning does, however. It could be that..
>
> Sorry, I was thinking of the freeze case, which is also somewhat
> unlikely --- but if it were happening, it could easily happen on
> most/all pages of a table.

How about secondary effects -- say pg_dump reads a lot of data into
buffers that were previously dirty and those get evicted.  Could page
eviction cause lots of WAL to be emitted?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Thu, Sep 6, 2012 at 5:07 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> How about secondary effects -- say pg_dump reads a lot of data into
> buffers that were previously dirty and those get evicted.  Could page
> eviction cause lots of WAL to be emitted?

No.  Simon's proposed checksum patch would have that effect, but currently no.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Alvaro Herrera
Date:
Excerpts from Boy de Laat's message of Thu Sep 06 17:24:35 -0300 2012:
>
> At the time my backup starts, I see a lot of WAL being generated.

I guess we'd need to see what the generated WAL logs are, either with
xlogdump or XLOG_DEBUG turned on ...
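
To know which segment to feed to xlogdump, something along these lines should
work (a sketch; pg_xlogfile_name() is there at least as far back as 8.4):

-- Sketch: name of the WAL segment currently being written.
SELECT pg_xlogfile_name(pg_current_xlog_location());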

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Pavan Deolasee
Date:
On Fri, Sep 7, 2012 at 2:43 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Excerpts from Boy de Laat's message of Thu Sep 06 17:24:35 -0300 2012:
> >
> > At the time my backup starts, I see a lot of WAL being generated.
>
> I guess we'd need to see what the generated WAL logs are, either with
> xlogdump or XLOG_DEBUG turned on ...
>
>
Can a long-running pg_dump transaction have some ill effect on the amount
of WAL generated? It can definitely cause unusual additional bloat if the
database is being continuously updated during that period, but I'm not sure
it can cause additional WAL.

The OP probably needs to tell us the size of the database, what the access
pattern looks like, how long it takes to dump the entire database, and how
many more WAL files are generated during this process compared to the
average rate.

Thanks,
Pavan

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Gezeala M. Bacuño II
Date:
On Thu, Sep 6, 2012 at 8:48 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Sep 7, 2012 at 2:43 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>>
>> Excerpts from Boy de Laat's message of Thu Sep 06 17:24:35 -0300 2012:
>> >
>> > At the time my backup starts, I see a lot of WAL being generated.
>>
>> I guess we'd need to see what the generated WAL logs are, either with
>> xlogdump or XLOG_DEBUG turned on ...
>>
>
> Can a long-running pg_dump transaction have some ill effect on the amount of
> WAL generated? It can definitely cause unusual additional bloat if the
> database is being continuously updated during that period, but I'm not sure
> it can cause additional WAL.
>
> The OP probably needs to tell us the size of the database, what the access
> pattern looks like, how long it takes to dump the entire database, and how
> many more WAL files are generated during this process compared to the
> average rate.
>
> Thanks,
> Pavan
>

DB size: 3400509331216 (3.09TB)
Full db pg_dump takes around 20 to 21hrs with options "-Fc -Z 1"
pg_dump size: 391161548000 (364.3GB)
checkpoint_segments = 80
#checkpoint_timeout = 5min (default)
#checkpoint_completion_target = 0.5 (default)
version: PostgreSQL 8.4.12

From pg_controldata right after starting the instance:
Latest checkpoint location:           4441/5E681F38
Prior checkpoint location:            4441/5E67D140
Latest checkpoint's REDO location:    4441/5E681F38
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419543166
Latest checkpoint's NextOID:          653512568
Latest checkpoint's NextMultiXactId:  107873
Latest checkpoint's NextMultiOffset:  219841

From pg_controldata after pg_dump:
Latest checkpoint location:           4450/7A14F280
Prior checkpoint location:            4450/7A14E018
Latest checkpoint's REDO location:    4450/7A14F280
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419543166
Latest checkpoint's NextOID:          653512568
Latest checkpoint's NextMultiXactId:  107873
Latest checkpoint's NextMultiOffset:  219841

Background:
The cluster used for the backup was cloned from a zfs snapshot (taken
while the db is running without issuing pg_start/stop_backup). WALs
are replayed and the cluster usually starts up in a minute or so. After
pg_dump, the clone's zfs USED property value is 285G -- that's how much
the blocks have changed over the course of the pg_dump run. Yesterday's
backup clone was 280G.

Aside from pg_dump, a cron job issues a 'select * from
pg_stat_activity' every 9mins. The instance is cloned solely for
backup purposes with zero modifications and autovacuum disabled.



Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Pavan Deolasee
Date:
On Fri, Sep 7, 2012 at 12:15 PM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:

>
> From pg_controldata right after starting the instance:
> Latest checkpoint location:           4441/5E681F38
> Prior checkpoint location:            4441/5E67D140
> Latest checkpoint's REDO location:    4441/5E681F38
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's NextXID:          0/419543166
> Latest checkpoint's NextOID:          653512568
> Latest checkpoint's NextMultiXactId:  107873
> Latest checkpoint's NextMultiOffset:  219841
>
> From pg_controldata after pg_dump:
> Latest checkpoint location:           4450/7A14F280
> Prior checkpoint location:            4450/7A14E018
> Latest checkpoint's REDO location:    4450/7A14F280
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's NextXID:          0/419543166
> Latest checkpoint's NextOID:          653512568
> Latest checkpoint's NextMultiXactId:  107873
> Latest checkpoint's NextMultiOffset:  219841
>
> Background:
> The cluster used for the backup was cloned from a zfs snapshot (taken
> while the db is running without issuing pg_start/stop_backup). WALs
> are replayed and the cluster usually starts up in a minute or so. After
> pg_dump, the clone's zfs USED property value is 285G -- that's how much
> the blocks have changed over the course of the pg_dump run. Yesterday's
> backup clone was 280G.
>
> Aside from pg_dump, a cron job issues a 'select * from
> pg_stat_activity' every 9mins. The instance is cloned solely for
> backup purposes with zero modifications and autovacuum disabled.

Hmm.. so there is definitely a large amount of WAL being written, but no
transaction activity, as shown by the constant NextXID. As someone
mentioned upthread, HOT pruning can cause WAL activity even for what is
otherwise a read-only transaction. Given that pg_dump touches each and
every page in every relation, it's not entirely unlikely that HOT pruning
is acting on many pages. But that should happen only once, so if you take
another dump of the cluster, you should not see more WAL activity.

Does your primary database (which you cloned) get significant
UPDATE/DELETE activity? Further, does it have autovacuum disabled, or
long-running transactions?

BTW, the following query returns ~60GB. That's the amount of WAL
written between server start and the end of pg_dump (I don't think
pg_xlog_location_diff() is available in the older releases).

postgres=# select pg_xlog_location_diff('4450/7A14F280',
'4441/5E681F38')/(2^30);
     ?column?
------------------
 60.1980484202504
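
On servers that lack pg_xlog_location_diff(), like the 8.4 one here, the same
figure can be worked out by hand; a minimal sketch with the two locations
above plugged in (on these older releases each "log id" covers 0xFF000000 bytes):

-- Sketch: manual equivalent of the query above.
-- 0x4450 = 17488, 0x4441 = 17473 (log ids); 0x7A14F280 = 2048193152,
-- 0x5E681F38 = 1583882040 (offsets); one log id = 0xFF000000 = 4278190080 bytes.
select ((17488 - 17473) * 4278190080::bigint
      + (2048193152 - 1583882040)) / 2^30 as wal_gb;   -- ~60.2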

Thanks,
Pavan

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Pavan Deolasee
Date:
On Fri, Sep 7, 2012 at 12:15 PM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:


>  The instance is cloned solely for
> backup purposes with zero modifications and autovacuum disabled.
>

For your restricted use case, it might be worthwhile to
turn full_page_writes OFF before starting the server. That should
definitely reduce a lot of WAL volume.
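
If the clone is already up, full_page_writes is also a reloadable setting, so
it should not even need a restart; a minimal sketch, assuming full_page_writes
has been set to off in the clone's postgresql.conf:

-- Sketch: pick up the postgresql.conf change without a restart.
SELECT pg_reload_conf();
SHOW full_page_writes;    -- should now report "off"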

Thanks,
Pavan

Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Gezeala M. Bacuño II
Date:
adding pgsql-bugs list in case OP posts back.

On Fri, Sep 7, 2012 at 11:29 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> (Adding -hackers. Did not realize it got dropped)
>
> On Fri, Sep 7, 2012 at 11:25 PM, Gezeala M. Bacuño II <gezeala@gmail.com>
> wrote:
>>
>> On Fri, Sep 7, 2012 at 7:17 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>> >
>> >
>> > On Fri, Sep 7, 2012 at 7:00 PM, Marie Bacuno II <gezeala@gmail.com>
>> > wrote:
>> >>
>> >>
>> >> On Sep 7, 2012, at 2:19, Pavan Deolasee <pavan.deolasee@gmail.com>
>> >> wrote:
>> >>
>> >>
>> >> > or have long running transactions ?
>> >>
>> >> Yes but I don't think there are when the snapshot was taken. Does the
>> >> pg_xlog_location_diff() result from latest and prior checkpoint upon
>> >> start-up indicates the size of replayed changes?
>> >>
>> >
>> > Thats the amount of additional WAL generated after you started the
>> > server.
>> >
>> >>
>> >> >
>> >> > BTW, the following query returns ~60GB. Thats the amount of WAL
>> >> > written after the server was started and at the end of pg_dump (I
>> >> > don't think pg_xlog_location_diff() is available in the older
>> >> > releases).
>> >> >
>> >> > postgres=# select pg_xlog_location_diff('4450/7A14F280',
>> >> > '4441/5E681F38')/(2^30);
>> >> >    ?column?
>> >> > ------------------
>> >> > 60.1980484202504
>> >>
>> >> It'll be great to know what the wals modified..?
>> >
>> >
>> > You would need something like xlogdump to decipher them. I quickly tried
>> > this and it seems to work against 8.4 version that you are running.
>> > https://github.com/snaga/xlogdump
>> >
>> > Download the source code, compile and run it against one of the most
>> > recent
>> > WAL files in the cluster against which you ran pg_dump. You would need
>> > to
>> > set PATH to contain the pg_config of the server you are running. Please
>> > post
>> > the output.
>> >
>> > Thanks,
>> > Pavan
>> >
>> >
>>
>> Here you go:
>>
>> ## last WAL
>> $ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A
>>
>> /dbpool/data/pg_xlog/00000001000044500000007A:
>>
>> Unexpected page info flags 0003 at offset 0
>> Skipping unexpected continuation record at offset 0
>> ReadRecord: record with zero len at 17488/7A14F310
>> Unexpected page info flags 0001 at offset 150000
>> Skipping unexpected continuation record at offset 150000
>> Unable to read continuation page?
>>  ** maybe continues to next segment **
>> ---------------------------------------------------------------
>> TimeLineId: 1, LogId: 17488, LogSegment: 122
>>
>> Resource manager stats:
>>   [0]XLOG      : 3 records, 120 bytes (avg 40.0 bytes)
>>                  checkpoint: 3, switch: 0, backup end: 0
>>   [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
>>                  commit: 0, abort: 0
>>   [2]Storage   : 0 record, 0 byte (avg 0.0 byte)
>>   [3]CLOG      : 0 record, 0 byte (avg 0.0 byte)
>>   [4]Database  : 0 record, 0 byte (avg 0.0 byte)
>>   [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
>>   [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
>>   [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
>>   [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
>>   [9]Heap2     : 2169 records, 43380 bytes (avg 20.0 bytes)
>>   [10]Heap      : 0 record, 0 byte (avg 0.0 byte)
>>                  ins: 0, upd/hot_upd: 0/0, del: 0
>>   [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
>>   [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
>>   [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
>>   [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
>>   [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)
>>
>> Backup block stats: 2169 blocks, 16551816 bytes (avg 7631.1 bytes)
>>
>
> I think both my theories seem to be holding up. The Heap2 resource manager is
> used only for vacuum freeze, lazy vacuum or HOT pruning. Given your access
> pattern, I bet it's the third activity that's kicking in on your database. You
> have many pages with dead tuples, and they are getting cleaned at the first
> opportunity, which happens to be the pg_dump that's run immediately after the
> server restart. This is seen by all 2169 WAL records in the file being
> attributed to the Heap2 RM above.
>
> What's additionally happening is that each of these records is on a different
> heap page. The cleanup activity dirties those pages. Since each of these pages
> is being dirtied for the first time after a recent checkpoint and
> full_page_writes is turned ON, the entire page is backed up in the WAL record.
> You can see the exact number of backup blocks in the stats above.
>
> I don't think we have any mechanism to control or stop HOT from doing what
> it wants to do, unless you are willing to run a modified server for this
> reason. But you can at least bring down the WAL volume by turning
> full_page_writes OFF.
>
> Thanks,
> Pavan

Great. Finally got some light on this. I'll disable full_page_writes
on my next backup and will post back results tomorrow. Thanks.
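
As a quick sanity check on those numbers, the 16,551,816 bytes of backup
blocks account for almost the entire 16 MB segment, so nearly all of that WAL
file is full-page images; a rough sketch of the arithmetic:

-- Sketch: share of the 16 MB (16777216-byte) segment taken by full-page images.
select 16551816 / 16777216.0 as fraction_backup_blocks,   -- ~0.99
       16551816 / 2169.0     as avg_backup_block_bytes;   -- ~7631, vs 8192-byte pages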



Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Gezeala M. Bacuño II
Date:
On Fri, Sep 7, 2012 at 11:40 AM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
> adding pgsql-bugs list in case OP posts back.
>
> On Fri, Sep 7, 2012 at 11:29 AM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>> (Adding -hackers. Did not realize it got dropped)
>>
>> On Fri, Sep 7, 2012 at 11:25 PM, Gezeala M. Bacuño II <gezeala@gmail.com>
>> wrote:
>>>
>>> On Fri, Sep 7, 2012 at 7:17 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Fri, Sep 7, 2012 at 7:00 PM, Marie Bacuno II <gezeala@gmail.com>
>>> > wrote:
>>> >>
>>> >>
>>> >> On Sep 7, 2012, at 2:19, Pavan Deolasee <pavan.deolasee@gmail.com>
>>> >> wrote:
>>> >>
>>> >>
>>> >> > or have long running transactions ?
>>> >>
>>> >> Yes but I don't think there are when the snapshot was taken. Does the
>>> >> pg_xlog_location_diff() result from latest and prior checkpoint upon
>>> >> start-up indicates the size of replayed changes?
>>> >>
>>> >
>>> > Thats the amount of additional WAL generated after you started the
>>> > server.
>>> >
>>> >>
>>> >> >
>>> >> > BTW, the following query returns ~60GB. Thats the amount of WAL
>>> >> > written after the server was started and at the end of pg_dump (I
>>> >> > don't think pg_xlog_location_diff() is available in the older
>>> >> > releases).
>>> >> >
>>> >> > postgres=# select pg_xlog_location_diff('4450/7A14F280',
>>> >> > '4441/5E681F38')/(2^30);
>>> >> >    ?column?
>>> >> > ------------------
>>> >> > 60.1980484202504
>>> >>
>>> >> It'll be great to know what the wals modified..?
>>> >
>>> >
>>> > You would need something like xlogdump to decipher them. I quickly tried
>>> > this and it seems to work against 8.4 version that you are running.
>>> > https://github.com/snaga/xlogdump
>>> >
>>> > Download the source code, compile and run it against one of the most
>>> > recent
>>> > WAL files in the cluster against which you ran pg_dump. You would need
>>> > to
>>> > set PATH to contain the pg_config of the server you are running. Please
>>> > post
>>> > the output.
>>> >
>>> > Thanks,
>>> > Pavan
>>> >
>>> >
>>>
>>> Here you go:
>>>
>>> ## last WAL
>>> $ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A
>>>
>>> /dbpool/data/pg_xlog/00000001000044500000007A:
>>>
>>> Unexpected page info flags 0003 at offset 0
>>> Skipping unexpected continuation record at offset 0
>>> ReadRecord: record with zero len at 17488/7A14F310
>>> Unexpected page info flags 0001 at offset 150000
>>> Skipping unexpected continuation record at offset 150000
>>> Unable to read continuation page?
>>>  ** maybe continues to next segment **
>>> ---------------------------------------------------------------
>>> TimeLineId: 1, LogId: 17488, LogSegment: 122
>>>
>>> Resource manager stats:
>>>   [0]XLOG      : 3 records, 120 bytes (avg 40.0 bytes)
>>>                  checkpoint: 3, switch: 0, backup end: 0
>>>   [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
>>>                  commit: 0, abort: 0
>>>   [2]Storage   : 0 record, 0 byte (avg 0.0 byte)
>>>   [3]CLOG      : 0 record, 0 byte (avg 0.0 byte)
>>>   [4]Database  : 0 record, 0 byte (avg 0.0 byte)
>>>   [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
>>>   [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
>>>   [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
>>>   [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
>>>   [9]Heap2     : 2169 records, 43380 bytes (avg 20.0 bytes)
>>>   [10]Heap      : 0 record, 0 byte (avg 0.0 byte)
>>>                  ins: 0, upd/hot_upd: 0/0, del: 0
>>>   [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
>>>   [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
>>>   [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
>>>   [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
>>>   [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)
>>>
>>> Backup block stats: 2169 blocks, 16551816 bytes (avg 7631.1 bytes)
>>>
>>
>> I think both my theories seem to be holding up. The Heap2 resource manager is
>> used only for vacuum freeze, lazy vacuum or HOT pruning. Given your access
>> pattern, I bet it's the third activity that's kicking in on your database. You
>> have many pages with dead tuples, and they are getting cleaned at the first
>> opportunity, which happens to be the pg_dump that's run immediately after the
>> server restart. This is seen by all 2169 WAL records in the file being
>> attributed to the Heap2 RM above.
>>
>> What's additionally happening is that each of these records is on a different
>> heap page. The cleanup activity dirties those pages. Since each of these pages
>> is being dirtied for the first time after a recent checkpoint and
>> full_page_writes is turned ON, the entire page is backed up in the WAL record.
>> You can see the exact number of backup blocks in the stats above.
>>
>> I don't think we have any mechanism to control or stop HOT from doing what
>> it wants to do, unless you are willing to run a modified server for this
>> reason. But you can at least bring down the WAL volume by turning
>> full_page_writes OFF.
>>
>> Thanks,
>> Pavan
>
> Great. Finally got some light on this. I'll disable full_page_writes
> on my next backup and will post back results tomorrow. Thanks.

Results with full_page_writes disabled:

pg_controldata right after cluster start:
Latest checkpoint location:           4469/B8352EB8
Prior checkpoint location:            4469/B5EAE428
Latest checkpoint's REDO location:    4469/B8352EB8
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419770454
Latest checkpoint's NextOID:          654192504
Latest checkpoint's NextMultiXactId:  108124
Latest checkpoint's NextMultiOffset:  220345

pg_controldata after full dump:
Latest checkpoint location:           446A/22A93370
Prior checkpoint location:            446A/22A93328
Latest checkpoint's REDO location:    446A/22A93370
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/419770454
Latest checkpoint's NextOID:          654192504
Latest checkpoint's NextMultiXactId:  108124
Latest checkpoint's NextMultiOffset:  220345


$ xlogdump -S /dbpool/data/pg_xlog/000000010000446A00000022

/dbpool/data/pg_xlog/000000010000446A00000022:

ReadRecord: record with zero len at 17514/22A933B8
---------------------------------------------------------------
TimeLineId: 1, LogId: 17514, LogSegment: 34

Resource manager stats:
  [0]XLOG      : 3 records, 120 bytes (avg 40.0 bytes)
                 checkpoint: 3, switch: 0, backup end: 0
  [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
                 commit: 0, abort: 0
  [2]Storage   : 0 record, 0 byte (avg 0.0 byte)
  [3]CLOG      : 0 record, 0 byte (avg 0.0 byte)
  [4]Database  : 0 record, 0 byte (avg 0.0 byte)
  [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
  [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
  [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
  [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
  [9]Heap2     : 68125 records, 8645016 bytes (avg 126.9 bytes)
  [10]Heap      : 28952 records, 4640535 bytes (avg 160.3 bytes)
                 ins: 28952, upd/hot_upd: 0/0, del: 0
  [11]Btree     : 0 record, 0 byte (avg 0.0 byte)
  [12]Hash      : 0 record, 0 byte (avg 0.0 byte)
  [13]Gin       : 0 record, 0 byte (avg 0.0 byte)
  [14]Gist      : 0 record, 0 byte (avg 0.0 byte)
  [15]Sequence  : 0 record, 0 byte (avg 0.0 byte)

Backup block stats: 0 block, 0 byte (avg 0.0 byte)

zfs USED property of clone after backup with full_page_writes disabled: 298GB
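
Plugging the new checkpoint locations into the same manual arithmetic as
before suggests roughly 1.6 GB of WAL for this run, versus ~60 GB previously;
a rough sketch:

-- Sketch: WAL volume between '4469/B8352EB8' and '446A/22A93370'.
-- 0x446A = 17514, 0x4469 = 17513; 0x22A93370 = 581514096, 0xB8352EB8 = 3090493112.
select ((17514 - 17513) * 4278190080::bigint
      + (581514096 - 3090493112)) / 2^30 as wal_gb;   -- ~1.6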



Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
ichbinrene
Date:
I'm experiencing the exact same issue:

PostgreSQL 9.1.6 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305
(Red Hat 4.4.6-4), 32-bit
CentOS release 6.3 (Final)

I might also turn full_page_writes off but I fear for data integrity in case
of a crash .....




Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Gezeala M. Bacuño II
Date:
You may disable full_page_writes, but as you can see from my previous
post, disabling it did not do the trick. My zfs' USED property
continues to increase.

On Wed, Oct 17, 2012 at 3:55 PM, ichbinrene <rene.romero.b@gmail.com> wrote:
> I'm experiencing the exact same issue:
>
> PostgreSQL 9.1.6 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305
> (Red Hat 4.4.6-4), 32-bit
> CentOS release 6.3 (Final)
>
> I might also turn full_page_writes off but I fear for data integrity in case
> of a crash .....
>



Re: BUG #7521: Cannot disable WAL log while using pg_dump

From
Robert Haas
Date:
On Thu, Oct 18, 2012 at 3:18 PM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
> You may disable full_page_writes, but as you can see from my previous
> post, disabling it did not do the trick. My zfs' USED property
> continues to increase.

I think we need somebody to compile with WAL_DEBUG defined and try to
reproduce this with wal_debug = on.  Then we will find out what WAL
records are being emitted, which will presumably give us a clue as to
what is really happening here.
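
For anyone attempting that, a rough sketch (this assumes a source build with
WAL_DEBUG defined at compile time; the wal_debug setting only exists in such
builds):

-- Sketch: per-session WAL tracing on a build compiled with WAL_DEBUG.
SET wal_debug = on;   -- superuser; each WAL record is then logged as it is inserted
-- then run the pg_dump and watch the server log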

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company