Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] Restricting maximum keep segments by repslots
Date
Msg-id 20200427223342.GA23152@alvherre.pgsql
Whole thread Raw
In response to Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [HACKERS] Restricting maximum keep segments by repslots  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On 2020-Apr-08, Kyotaro Horiguchi wrote:

> At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> 
> Just avoiding starting replication when restart_lsn is invalid is
> sufficient (the attached, which is equivalent to a part of what the
> invalidated flag did). I thing that the error message needs a Hint but
> it looks on the subscriber side as:
> 
> [22086] 2020-04-08 10:35:04.188 JST ERROR:  could not receive data from WAL stream: ERROR:  replication slot "s1" is
invalidated
>         HINT:  The slot exceeds the limit by max_slot_wal_keep_size.
> 
> I don't think it is not clean.. Perhaps the subscriber should remove
> the trailing line of the message from the publisher?

Thanks for the fix!  I propose two changes:

1. reword the error like this:

ERROR:  replication slot "regression_slot3" cannot be advanced
DETAIL:  This slot has never previously reserved WAL, or has been invalidated

2. use the same error in one other place, to wit
   pg_logical_slot_get_changes() and pg_replication_slot_advance().  I
   made the DETAIL part the same in all places, but the ERROR line is
   adjusted to what each callsite is doing.
   I do think that this change in test_decoding is a bit unpleasant:

-ERROR:  cannot use physical replication slot for logical decoding
+ERROR:  cannot get changes from replication slot "repl"

   The test is
      -- check that we're detecting a streaming rep slot used for logical decoding
      SELECT 'init' FROM pg_create_physical_replication_slot('repl');
      SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

> > On the other hand, physical replication doesn't break by invlidation.
> > [...]
> 
> If we don't mind that standby can reconnect after a walsender
> termination due to the invalidation, we don't need to do something for
> this.  Restricting max_slot_wal_keep_size to be larger than a certain
> threshold would reduce the chance we see that behavior.

Yeah, I think you're referring to the fact that StartReplication()
doesn't verify the restart_lsn of the slot; and if we do add a check, a
few tests that rely on physical replication start to fail.  This patch
only adds a comment in that spot.  But I don't (yet) know what the
consequences of this are, or whether it can be fixed by setting a valid
restart_lsn ahead of time.  This test in pg_basebackup fails, for
example:

# Running: pg_basebackup -D
/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl-X stream -S slot1
 
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  cannot read from replication slot
"slot1"
DETAIL:  This slot has never previously reserved WAL, or has been invalidated
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory
"/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl"
not ok 95 - pg_basebackup -X stream with replication slot runs

#   Failed test 'pg_basebackup -X stream with replication slot runs'
#   at t/010_pg_basebackup.pl line 461.


Anyway I think the current patch can be applied as is -- and if we want
physical replication to have some other behavior, we can patch for that
afterwards.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Darafei "Komяpa" Praliaskouski
Date:
Subject: Re: Parallel GiST build on Cube
Next
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots