Mark Dilger <mark.dilger@enterprisedb.com> writes:
> Perhaps having the bloom index messed up answers that, though. I think it
> should be easy enough to get the path to the heap main table fork and the
> bloom main index fork for both the primary and standby and do a filesystem
> comparison as part of the wal test. That would tell us if they differ, and
> also if the differences are limited to just one or the other.
I think that's probably overkill, and definitely out-of-scope for
contrib/bloom. If we fear that WAL replay is not reproducing the data
accurately, we should be testing for that in some more centralized place.
Anyway, I confirmed my diagnosis by adding a delay in WAL apply
(0001 below); that makes this test fall over spectacularly.
And 0002 fixes it. So I propose to push 0002 as soon as the
v14 release freeze ends.
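To spell out the race as I understand it: the old check compared the
primary's current LSN against the standby's write_lsn, which only says
the WAL has been written on the standby, not that it has been replayed.
Here is a toy model in C of how the two checks can disagree while
replay lags. The StandbyState struct and both helper functions are
made up for illustration; none of this is PostgreSQL code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; a plain byte position in the WAL. */
typedef uint64_t Lsn;

/* Hypothetical snapshot of what a standby reports back to the primary. */
typedef struct
{
	Lsn		write_lsn;	/* WAL received and written out by the standby */
	Lsn		replay_lsn;	/* WAL actually applied by the startup process */
} StandbyState;

/* The check the old test effectively made: has the WAL been written? */
static bool
caught_up_by_write(Lsn primary_lsn, const StandbyState *s)
{
	return primary_lsn <= s->write_lsn;
}

/* The stricter check: has the WAL been replayed, so queries can see it? */
static bool
caught_up_by_replay(Lsn primary_lsn, const StandbyState *s)
{
	return primary_lsn <= s->replay_lsn;
}
```

With a primary at LSN 1000 and a standby at write_lsn = 1000 and
replay_lsn = 400, the first check says "caught up" and the second does
not. That window is what the 0001 delay widens. As far as I know,
wait_for_catchup waits on replay_lsn by default, which is why 0002
closes it.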
Should we back-patch 0002? I'm inclined to think so. Should
we then also back-patch enablement of the bloom test? Less
sure about that, but I'd lean to doing so. A test that appears
to be there but isn't actually invoked is pretty misleading.
regards, tom lane
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..eecbe57aee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7370,6 +7370,9 @@ StartupXLOG(void)
 			{
 				bool		switchedTLI = false;
+				if (random() < INT_MAX/100)
+					pg_usleep(100000);
+
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
 					(rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
diff --git a/contrib/bloom/t/001_wal.pl b/contrib/bloom/t/001_wal.pl
index 55ad35926f..be8916a8eb 100644
--- a/contrib/bloom/t/001_wal.pl
+++ b/contrib/bloom/t/001_wal.pl
@@ -16,12 +16,10 @@ sub test_index_replay
 {
 	my ($test_name) = @_;
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+
 	# Wait for standby to catch up
-	my $applname = $node_standby->name;
-	my $caughtup_query =
-	  "SELECT pg_current_wal_lsn() <= write_lsn FROM pg_stat_replication WHERE application_name = '$applname';";
-	$node_primary->poll_query_until('postgres', $caughtup_query)
-	  or die "Timed out while waiting for standby 1 to catch up";
+	$node_primary->wait_for_catchup($node_standby);
 	my $queries = qq(SET enable_seqscan=off;
 SET enable_bitmapscan=on;