Thread: refactor subscription tests to use PostgresNode's wait_for_catchup
It appears that we have unwittingly created some duplicate and
copy-and-paste-prone code in src/test/subscription/ to wait for a
replication subscriber to catch up, when we already have
almost-sufficient code in PostgresNode to do that more compactly.  So I
propose this patch to consolidate that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 08, 2018 at 09:46:21PM -0500, Peter Eisentraut wrote:
> It appears that we have unwittingly created some duplicate and
> copy-and-paste-prone code in src/test/subscription/ to wait for a
> replication subscriber to catch up, when we already have
> almost-sufficient code in PostgresNode to do that more compactly.  So I
> propose this patch to consolidate that.

This looks sane to me.  I have two comments while I read the
surroundings.

> @@ -1505,7 +1515,7 @@ sub wait_for_catchup
>        . $target_lsn . " on "
>        . $self->name . "\n";
>      my $query =
> -qq[SELECT '$target_lsn' <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
> +qq[SELECT $lsn_expr <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
>      $self->poll_query_until('postgres', $query)
>        or die "timed out waiting for catchup, current location is "
>        . ($self->safe_psql('postgres', $query) || '(unknown)');

This log is wrong from the beginning.  Here $query returns a boolean
status and not a location.  I think that when the poll dies because of a
timeout you should do a lookup at ${mode}_lsn from pg_stat_replication
when application_name matching $standby_name.  Could you fix that as
well?

Could you also update promote_standby in RewindTest.pm?  Your refactoring
to use pg_current_wal_lsn() if a target_lsn is not possible makes this
move possible.  Using the generic APIs gives better logs as well.
--
Michael
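[Editor's illustration] The point above is that the polled query yields a boolean, so reusing it in the die message cannot print a location. A minimal sketch of the two queries side by side follows. It assumes a hypothetical standalone helper, build_catchup_queries, which is not part of PostgresNode or of the patch. It only builds SQL strings and never connects to a server. The mode list and the pg_current_wal_lsn() fallback mirror what the quoted hunk and the discussion describe. Like the quoted code, it does no SQL escaping of the standby name.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not PostgresNode API.
# Returns two SQL strings: the boolean "caught up?" query that gets polled,
# and a separate query that returns the standby's actual LSN for logging.
sub build_catchup_queries
{
	my ($standby_name, $mode, $target_lsn) = @_;
	$mode = 'replay' unless defined $mode;

	my %valid_modes = map { $_ => 1 } qw(sent write flush replay);
	die "unknown mode $mode\n" unless $valid_modes{$mode};

	# With no explicit target, compare against the current WAL position
	# of the node the query runs on.
	my $lsn_expr =
	  defined($target_lsn) ? "'$target_lsn'" : 'pg_current_wal_lsn()';

	my $from =
	  "FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name'";

	my $poll_query     = "SELECT $lsn_expr <= ${mode}_lsn $from;";
	my $location_query = "SELECT ${mode}_lsn $from;";
	return ($poll_query, $location_query);
}

my ($poll_query, $location_query) = build_catchup_queries('sub1');
print "$poll_query\n";
print "$location_query\n";
```

Under these assumptions, a timeout message that wants to report a position would run the second string through safe_psql rather than the first.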
On 1/8/18 23:47, Michael Paquier wrote:
>> @@ -1505,7 +1515,7 @@ sub wait_for_catchup
>>        . $target_lsn . " on "
>>        . $self->name . "\n";
>>      my $query =
>> -qq[SELECT '$target_lsn' <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
>> +qq[SELECT $lsn_expr <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
>>      $self->poll_query_until('postgres', $query)
>>        or die "timed out waiting for catchup, current location is "
>>        . ($self->safe_psql('postgres', $query) || '(unknown)');
>
> This log is wrong from the beginning.  Here $query returns a boolean
> status and not a location.  I think that when the poll dies because of a
> timeout you should do a lookup at ${mode}_lsn from pg_stat_replication
> when application_name matching $standby_name.  Could you fix that as
> well?

Should we just remove it?  Apparently, it was never functional to begin
with.  Otherwise, we'd have to write a second query to return the value
to print.  wait_for_slot_catchup has the same issue.  Seems like a lot
of overhead for something that has never been used.

> Could you also update promote_standby in RewindTest.pm?  Your refactoring
> to use pg_current_wal_lsn() if a target_lsn is not possible makes this
> move possible.  Using the generic APIs gives better logs as well.

Right.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 10, 2018 at 09:45:56PM -0500, Peter Eisentraut wrote:
> On 1/8/18 23:47, Michael Paquier wrote:
> Should we just remove it?  Apparently, it was never functional to begin
> with.  Otherwise, we'd have to write a second query to return the value
> to print.  wait_for_slot_catchup has the same issue.  Seems like a lot
> of overhead for something that has never been used.

Fine for me to remove it.
--
Michael
On 1/10/18 22:24, Michael Paquier wrote:
> On Wed, Jan 10, 2018 at 09:45:56PM -0500, Peter Eisentraut wrote:
>> On 1/8/18 23:47, Michael Paquier wrote:
>> Should we just remove it?  Apparently, it was never functional to begin
>> with.  Otherwise, we'd have to write a second query to return the value
>> to print.  wait_for_slot_catchup has the same issue.  Seems like a lot
>> of overhead for something that has never been used.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services