Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [HACKERS] Restricting maximum keep segments by repslots |
Date | |
Msg-id | 20200427223342.GA23152@alvherre.pgsql Whole thread Raw |
In response to | Re: [HACKERS] Restricting maximum keep segments by repslots (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: [HACKERS] Restricting maximum keep segments by repslots
(Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
|
List | pgsql-hackers |
On 2020-Apr-08, Kyotaro Horiguchi wrote: > At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > Just avoiding starting replication when restart_lsn is invalid is > sufficient (the attached, which is equivalent to a part of what the > invalidated flag did). I thing that the error message needs a Hint but > it looks on the subscriber side as: > > [22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is invalidated > HINT: The slot exceeds the limit by max_slot_wal_keep_size. > > I don't think it is not clean.. Perhaps the subscriber should remove > the trailing line of the message from the publisher? Thanks for the fix! I propose two changes: 1. reword the error like this: ERROR: replication slot "regression_slot3" cannot be advanced DETAIL: This slot has never previously reserved WAL, or has been invalidated 2. use the same error in one other place, to wit pg_logical_slot_get_changes() and pg_replication_slot_advance(). I made the DETAIL part the same in all places, but the ERROR line is adjusted to what each callsite is doing. I do think that this change in test_decoding is a bit unpleasant: -ERROR: cannot use physical replication slot for logical decoding +ERROR: cannot get changes from replication slot "repl" The test is -- check that we're detecting a streaming rep slot used for logical decoding SELECT 'init' FROM pg_create_physical_replication_slot('repl'); SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > > On the other hand, physical replication doesn't break by invlidation. > > [...] > > If we don't mind that standby can reconnect after a walsender > termination due to the invalidation, we don't need to do something for > this. Restricting max_slot_wal_keep_size to be larger than a certain > threshold would reduce the chance we see that behavior. Yeah, I think you're referring to the fact that StartReplication() doesn't verify the restart_lsn of the slot; and if we do add a check, a few tests that rely on physical replication start to fail. This patch only adds a comment in that spot. But I don't (yet) know what the consequences of this are, or whether it can be fixed by setting a valid restart_lsn ahead of time. This test in pg_basebackup fails, for example: # Running: pg_basebackup -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl-X stream -S slot1 pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: cannot read from replication slot "slot1" DETAIL: This slot has never previously reserved WAL, or has been invalidated pg_basebackup: error: child process exited with exit code 1 pg_basebackup: removing data directory "/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl" not ok 95 - pg_basebackup -X stream with replication slot runs # Failed test 'pg_basebackup -X stream with replication slot runs' # at t/010_pg_basebackup.pl line 461. Anyway I think the current patch can be applied as is -- and if we want physical replication to have some other behavior, we can patch for that afterwards. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: