Re: Streaming replication, some small issues - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Streaming replication, some small issues
Msg-id 3f0b79eb0912080338m71505de4g1aa61e6229fc1666@mail.gmail.com
In response to Streaming replication, some small issues  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> A couple of small issues spotted while reviewing the streaming
> replication patch:

Thanks for the review!

> - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
> return zero before a just-launched WAL sender has sent its first
> message. That can lead to WAL files that are still needed by another
> standby to be deleted prematurely.

Oops! I fixed that (in my git repository, see the bottom of this mail).
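To illustrate the first issue, here is a minimal sketch (not the patch's code) of computing the oldest WAL position still needed by running senders. It assumes a hypothetical array of per-sender slots (`WalSndSlot`, `in_use`) and simplifies `XLogRecPtr` to a 64-bit integer where 0 means "invalid / nothing sent yet". A slot whose `sentPtr` is still zero is skipped instead of being treated as position 0, so a just-launched sender cannot turn the result into "no constraint".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified WAL position (assumption); 0 means "invalid / nothing sent yet". */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical per-WAL-sender shared state. */
typedef struct
{
    int        in_use;   /* slot occupied by a running WAL sender */
    XLogRecPtr sentPtr;  /* last WAL position sent; 0 until the first send */
} WalSndSlot;

/*
 * Return the oldest WAL position still needed by any running sender,
 * or InvalidXLogRecPtr if no sender constrains WAL removal.
 * Slots that have not sent anything yet are skipped, so they do not
 * drag the minimum down to zero.
 */
static XLogRecPtr
oldest_wal_send_pointer(const WalSndSlot *slots, size_t nslots)
{
    XLogRecPtr oldest = InvalidXLogRecPtr;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        XLogRecPtr p;

        if (!slots[i].in_use)
            continue;
        p = slots[i].sentPtr;
        if (p == InvalidXLogRecPtr)
            continue;           /* just launched: nothing sent yet */
        if (oldest == InvalidXLogRecPtr || p < oldest)
            oldest = p;
    }
    return oldest;
}
```

A naive minimum over all in-use slots would return 0 as soon as one sender had just started. Note that skipping only keeps the other senders' positions in force; the new sender's own starting point would still need protecting, for example by initializing its `sentPtr` to the position the standby requested.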

> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
> or directory

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01393.php
>> walreceiver shouldn't die on connection error, just to be restarted by
>> startup process. Can we add error handling a la bgwriter and have a
>> retry loop within walreceiver?

Taking your current and previous comments together, do you mean that
walreceiver should always retry connecting to the primary after
a connection error occurs in PQgetXLogData/PQputXLogRecPtr, and
exit when any other error occurs? I'm not sure, though, whether
we can determine the error type precisely.

> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.

I've regarded this case as a restriction. How do you think
we should cope with it?

1. Restriction: is documenting it enough?
2. Safeguard needed:
   - forbid the primary from performing such operations while the
     standby is running?
   - emit a PANIC error on the standby if a primary that has lost
     sync restarts?
3. Full solution: is an automatic resync mechanism required?

> I've also pushed a couple of small cosmetic changes to replication
> branch at git://git.postgresql.org/git/users/heikki/postgres.git

Your changes seem good.

I pulled and merged your changes into my repository:
  git://git.postgresql.org/git/users/fujii/postgres.git  branch: replication

I've also pushed the ability to replicate a backup history file
into the repository.

> I'll continue reviewing...

Thanks a lot!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

