Help with streaming replication protocol - Mailing list pgsql-general

From Christopher Bottaro
Subject Help with streaming replication protocol
Date
Msg-id DM6PR22MB205937A29DEF55A4D7205B8AE8B20@DM6PR22MB2059.namprd22.prod.outlook.com
List pgsql-general
Hello,

So from a high level, I understand that Postgres will send messages (XLogData) and the receiver needs to ack these messages so Postgres knows it's ok to delete data from disk. I don't understand some details of the protocol though.

Working off the documentation here:

(Side note, it's super annoying that the documentation doesn't name these fields, so I'm going to name them here to make things easier to talk about.)

An XLogData message has two interesting fields:
```
message_wal_start) "The starting point of the WAL data in this message."
server_wal_end) "The current end of WAL on the server."
```

Which one do I care about? It seems like message_wal_start, and I don't even get the point of server_wal_end.
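To make the two fields concrete, here is a minimal sketch of decoding an XLogData payload (the contents of a CopyData message), assuming the layout given in the streaming replication protocol documentation: a `w` tag byte, three 64-bit big-endian integers, then the WAL data. The names `message_wal_start` and `server_wal_end` are my own labels from above, and `parse_xlogdata` is a hypothetical helper, not part of any library.

```python
import struct
from typing import NamedTuple


class XLogData(NamedTuple):
    message_wal_start: int  # my label: "starting point of the WAL data in this message"
    server_wal_end: int     # my label: "current end of WAL on the server"
    server_clock_us: int    # server send time, microseconds since 2000-01-01
    wal_data: bytes         # the rest of the payload


# Tag byte + three 64-bit integers, network byte order, no padding (25 bytes).
_XLOGDATA_HEADER = struct.Struct("!cQQq")


def parse_xlogdata(payload: bytes) -> XLogData:
    """Decode one XLogData payload (hypothetical helper)."""
    tag, start, end, clock = _XLOGDATA_HEADER.unpack_from(payload)
    if tag != b"w":
        raise ValueError(f"not an XLogData message: tag {tag!r}")
    return XLogData(start, end, clock, payload[_XLOGDATA_HEADER.size:])
```

This only shows where each value sits in the message; it does not settle which of the two positions should be acknowledged.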

When sending a "Standby status update" (which is presumably the "ack" message), I don't understand why there are *three* fields regarding what part of the wal I've seen:
```
wal_written) "The location of the last WAL byte + 1 received and written to disk in the standby."
wal_flushed) "The location of the last WAL byte + 1 flushed to disk in the standby."
wal_applied) "The location of the last WAL byte + 1 applied in the standby."
```
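For reference, a minimal sketch of building the "Standby status update" payload, assuming the documented layout: an `r` tag byte, the three WAL positions as 64-bit big-endian integers, a client timestamp in microseconds since 2000-01-01, and a one-byte "reply requested" flag. The function name and the `now_us` parameter are hypothetical; the field names are the labels I made up above.

```python
import struct
import time

# Microseconds between the Unix epoch (1970-01-01) and 2000-01-01 UTC.
PG_EPOCH_OFFSET_US = 946_684_800 * 1_000_000


def build_standby_status_update(wal_written: int, wal_flushed: int,
                                wal_applied: int,
                                reply_requested: bool = False,
                                now_us=None) -> bytes:
    """Build the payload to send inside a CopyData message (hypothetical helper)."""
    if now_us is None:
        # Client clock expressed as microseconds since 2000-01-01.
        now_us = int(time.time() * 1_000_000) - PG_EPOCH_OFFSET_US
    return struct.pack(
        "!cQQQqB",
        b"r",
        wal_written,
        wal_flushed,
        wal_applied,
        now_us,
        1 if reply_requested else 0,
    )
```

Which values belong in the three position fields is exactly what I'm unsure about; the sketch only fixes the byte layout (34 bytes in total).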

It seems like I should be setting all 3 of them to the last message_wal_start that I've seen. If I look at sendFeedback() in pg_recvlogical.c, it doesn't even set wal_applied. Also, it doesn't do the +1 addition that the documentation says to do.

From some experimentation, I found that if I set all three fields to the last seen message_wal_start, then the replication slot's restart_lsn field will not advance past the last XLogData message that I've seen, so if I restart my program, I will get that last XLogData message again (a duplicate).
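When comparing the 64-bit positions from the protocol with the restart_lsn shown in pg_replication_slots, it helps to convert between the integer and the textual `hi/lo` hex form that pg_lsn values are displayed in. A small sketch with hypothetical helper names (zero-padding of the low half may differ from what the server prints, which is why the comparison should be done on the integers):

```python
def format_lsn(lsn: int) -> str:
    """Render a 64-bit WAL position as 'HIGH/LOW' in hexadecimal."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def parse_lsn(text: str) -> int:
    """Parse the textual 'HIGH/LOW' form back into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)
```

For example, `parse_lsn(restart_lsn_text) < message_wal_start` checks whether the slot is still behind a given message.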

To further confuse things, the Postgres server will periodically send a Keepalive message which has:
```
server_wal_end) "The current end of WAL on the server."
```

And it seems I need to send this back via a "Standby status update" message, otherwise the replication slot's restart_lsn doesn't advance.
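A minimal sketch of decoding the keepalive payload, again assuming the documented layout: a `k` tag byte, the server's WAL end as a 64-bit big-endian integer, a server timestamp in microseconds since 2000-01-01, and a one-byte flag asking the client to reply as soon as possible. `parse_keepalive` and the field names are hypothetical labels of mine.

```python
import struct
from typing import NamedTuple


class Keepalive(NamedTuple):
    server_wal_end: int     # my label: "current end of WAL on the server"
    server_clock_us: int    # server send time, microseconds since 2000-01-01
    reply_requested: bool   # True if the server wants a status update right away


# Tag byte + two 64-bit integers + flag byte, network byte order (18 bytes).
_KEEPALIVE = struct.Struct("!cQqB")


def parse_keepalive(payload: bytes) -> Keepalive:
    """Decode one keepalive payload (hypothetical helper)."""
    tag, wal_end, clock, reply = _KEEPALIVE.unpack_from(payload)
    if tag != b"k":
        raise ValueError(f"not a keepalive message: tag {tag!r}")
    return Keepalive(wal_end, clock, reply != 0)
```

A receive loop would dispatch on the first byte of each CopyData payload: `w` for XLogData, `k` for keepalive.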

So I guess it boils down to three questions:
1) What should I care about in the XLogData messages? Which wal position?
2) What should I be sending in the status update messages?
3) Should I be doing anything with the server_wal_end in the keepalive messages?

Thank you for the help. If I get to the point of understanding this well enough, I wouldn't mind adding it to the PostgreSQL wiki.
