Re: 9.3 pg_archivecleanup broken? - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: 9.3 pg_archivecleanup broken?
Msg-id CAHGQGwH+M630UnYOXiicXikL_gyFxv=bR=o_BSS_6xLMsv4DKw@mail.gmail.com
In response to 9.3 pg_archivecleanup broken?  ("Erik Rijkers" <er@xs4all.nl>)
Responses Re: 9.3 pg_archivecleanup broken?  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Mon, Nov 19, 2012 at 12:43 AM, Erik Rijkers <er@xs4all.nl> wrote:
> (In a test setup) I can't get pg_archivecleanup to remove WAL files in 9.3devel. (A very similar
> setup in 9.2 works fine).
>
> In 9.3 pg_archivecleanup just keeps repeating lines like:
>
> pg_archivecleanup: keep WAL file "/home/aardvark/pg_stuff/archive_dir93/000000000000000000000000" and later
>
> (and does not delete any files.)
>
> Configuration:
>
> # master  pgsql.93_1/data/postgresql.conf:
> data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_1/data'
> listen_addresses = '*'
> max_connections = 100
> shared_buffers = 128MB
> wal_level = hot_standby
> synchronous_commit = on
> checkpoint_segments = 3
> archive_mode = on
> archive_command = 'cp %p /home/aardvark/pg_stuff/archive_dir93/%f < /dev/null'
> max_wal_senders = 3
> synchronous_standby_names = '*'
>
> # slave   pgsql.93_2/data/postgresql.conf:
> data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_2/data'
> listen_addresses = '*'
> port = 6665
> max_connections = 100
> shared_buffers = 128MB
> wal_level = hot_standby
> synchronous_commit = on
> checkpoint_segments = 3
> max_wal_senders = 3
> synchronous_standby_names = ''
> hot_standby = on
> wal_receiver_status_interval = 59
>
> # pgsql.93_2/data/recovery.conf
> primary_conninfo = 'host=127.0.0.1 port=6664 user=aardvark password=sekr1t application_name=wal_receiver_01'
> standby_mode = 'on'
> restore_command = 'cp /home/aardvark/pg_stuff/archive_dir93/%f %p < /dev/null'
> archive_cleanup_command = 'pg_archivecleanup -d /home/aardvark/pg_stuff/archive_dir93 %r'
>
>
> Seeing that the same setup in 9.2 has pg_archivecleanup deleting files, it would seem that some
> bug exists but I haven't followed changes regarding WAL too closely.

Thanks for the report! I was able to reproduce this problem.

What's broken is not pg_archivecleanup itself but %r in archive_cleanup_command,
which is supposed to be replaced by the name of the file containing the last
valid restart point.
In 9.3dev, %r is always wrongly replaced by an invalid WAL filename
(i.e., 0000....0000).
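
To illustrate why an all-zero %r means nothing is ever deleted, here is a
minimal Python sketch of the kind of comparison pg_archivecleanup makes. It
is a simplified model, not the actual C code: I am assuming that segments are
compared by name with the 8-character timeline prefix ignored, and that only
24-character names are considered. The function name removable() and the
sample file names are made up for illustration.

```python
def removable(archived, restart_file):
    """Return the archived segment names that sort before restart_file.

    Assumption: the first 8 characters (the timeline ID) are ignored and
    the remaining 16 hex digits are compared as plain strings.
    """
    return sorted(
        name for name in archived
        if len(name) == 24 and name[8:] < restart_file[8:]
    )


# Hypothetical archive contents.
archived = [
    "000000010000000000000003",
    "000000010000000000000004",
    "000000010000000000000005",
]

# With a valid restart file, the older segments are candidates for removal.
print(removable(archived, "000000010000000000000005"))

# With the all-zero name that %r expands to in 9.3dev, no segment sorts
# before it, so every file is kept.
print(removable(archived, "0" * 24))
```

Under these assumptions the second call returns an empty list, which matches
the "keep WAL file ... and later" output in the report.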

This bug was introduced by commit d5497b95f3ca2fc50c6eef46d3394ab6e6855956.
That commit changed ExecuteRecoveryCommand() so that it calculates the
last valid restart file by using GetOldestRestartPoint(), even though
GetOldestRestartPoint() only works in the startup process and only while
WAL replay is in progress (i.e., InRedo = true).
For archive_cleanup_command, ExecuteRecoveryCommand() is executed by the
checkpointer process, which is why the problem occurs.
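
The all-zero name in the report is what you would get if both the restart
location and the timeline ID were left at zero when the file name was built.
A rough Python sketch of the naming scheme follows. It assumes the default
16MB segment size and the 8+8+8 hex-digit layout of WAL file names;
wal_file_name() is a hypothetical helper, not a PostgreSQL function, and
whether the checkpointer really sees zeros here is my reading of the
symptom, not something the sketch proves.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # assumed default, 16MB
SEGMENTS_PER_XLOG_ID = 0x100000000 // WAL_SEGMENT_SIZE   # 256 with 16MB segments


def wal_file_name(timeline_id, lsn):
    """Build a 24-character WAL file name from a timeline ID and a byte position."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline_id,
        segno // SEGMENTS_PER_XLOG_ID,
        segno % SEGMENTS_PER_XLOG_ID,
    )


# A restart point in the sixth segment of timeline 1.
print(wal_file_name(1, 0x5000000))

# Zero timeline and zero location give the invalid name seen in the report.
print(wal_file_name(0, 0))
```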

I found that recovery_end_command has the same bug, because it calls
ExecuteRecoveryCommand() after WAL replay has completed.

Regards,

-- 
Fujii Masao


