Re: pg_receivewal starting position - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: pg_receivewal starting position |
Date | |
Msg-id | 15651763.8WXi7s64xO@aivenronan Whole thread Raw |
In response to | Re: pg_receivewal starting position (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: pg_receivewal starting position
|
List | pgsql-hackers |
Le mercredi 28 juillet 2021, 08:22:30 CEST Kyotaro Horiguchi a écrit : > At Tue, 27 Jul 2021 07:50:39 +0200, Ronan Dunklau <ronan.dunklau@aiven.io> > wrote in > > Hello, > > > > I've notived that pg_receivewal logic for deciding which LSN to start > > > > streaming at consists of: > > - looking up the latest WAL file in our destination folder, and resume > > from > > > > here > > > > - if there isn't, use the current flush location instead. > > > > This behaviour surprised me when using it with a replication slot: I was > > expecting it to start streaming at the last flushed location from the > > replication slot instead. If you consider a backup tool which will take > > pg_receivewal's output and transfer it somewhere else, using the > > replication slot position would be the easiest way to ensure we don't > > miss WAL files. > > > > Does that make sense ? > > > > I don't know if it should be the default, toggled by a command line flag, > > or if we even should let the user provide a LSN. > > *I* think it is completely reasonable (or at least convenient or less > astonishing) that pg_receivewal starts from the restart_lsn of the > replication slot to use. The tool already decides the clean-start LSN > a bit unusual way. And it seems to me that proposed behavior can be > the default when -S is specified. > As of now we can't get the replication_slot restart_lsn with a replication connection AFAIK. This implies that the patch could require the user to specify a maintenance-db parameter, and we would use that if provided to fetch the replication slot info, or fallback to the previous behaviour. I don't really like this approach as the behaviour changing wether we supply a maintenance-db parameter or not is error-prone for the user. Another option would be to add a new replication command (for example ACQUIRE_REPLICATION_SLOT <slot_name>) to set the replication slot as the current one, and return some info about it (restart_lsn at least for a physical slot). I don't see any reason not to make it work for logical replication connections / slots, but it wouldn't be that useful since we can query the database in that case. Acquiring the replication slot instead of just reading it would make sure that no other process could start the replication between the time we read the restart_lsn and when we issue START_REPLICATION. START_REPLICATION could then check if we already have a replication slot, and ensure it is the same one as the one we're trying to use. From pg_receivewal point of view, this would amount to: - check if we currently have wal in the target directory. - if we do, proceed as currently done, by computing the start lsn and timeline from the last archived wal - if we don't, and we have a slot, run ACQUIRE_REPLICATION_SLOT. Use the restart_lsn as the start lsn if there is one, and don't provide a timeline - if we still don't have a start_lsn, fallback to using the current server wal position as is done. What do you think ? Which information should we provide about the slot ? -- Ronan Dunklau
pgsql-hackers by date: