RE: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: How can end users know the cause of LR slot sync delays? |
Date | |
Msg-id | OSCPR01MB149667826BC79248A20999D1BF51AA@OSCPR01MB14966.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: How can end users know the cause of LR slot sync delays? (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
Responses |
RE: How can end users know the cause of LR slot sync delays?
|
List | pgsql-hackers |
Dear Shlok, Thanks for updating the patch. Here are my comments. 01. ``` + /* Update the slot sync reason */ + SpinLockAcquire(&slot->mutex); + if (slot->slot_sync_skip_reason != skip_reason) + slot->slot_sync_skip_reason = skip_reason; + SpinLockRelease(&slot->mutex); ``` Per my understanding, spinlock is acquired when the attribute on the shared memory is updating. Can you check other parts and follow the rukle? 02. ``` + SpinLockAcquire(&slot->mutex); + synced = slot->data.synced; + SpinLockRelease(&slot->mutex); ``` Same as 1. 03. ``` ``` 04. ``` +#ifdef USE_INJECTION_POINTS + INJECTION_POINT("slot-sync-skip", NULL); +#endif ``` No need do do #ifdef here. INJECTION_POINT() itself checks internally. 05. ``` +# Initialize the primary cluster +my $primary = PostgreSQL::Test::Cluster->new('publisher'); +$primary->init( + allows_streaming => 'logical', + auth_extra => [ '--create-role' => 'repl_role' ]); ``` Do we have to create repl_role? I'm not sure where it's used. 06. ``` +$primary->append_conf( + 'postgresql.conf', qq{ +autovacuum = off +max_prepared_transactions = 1 +}); ``` Do we have to set max_prepared_transactions? PREPARE command is not used. 07. ``` +# Load the injection_points extension +$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points)) ``` We must check whether the injection_points is available or not. See 047_checkpoint_physical_slot.pl. 08. ``` +# Initialize standby from primary backup +my $standby1 = PostgreSQL::Test::Cluster->new('standby1'); +$standby1->init_from_backup( + $primary, $backup_name, + has_streaming => 1, + has_restoring => 1); ``` To clarify, is there a reason why we set has_restoring? Can we remove it? 09. ``` +my $connstr_1 = $primary->connstr; ``` Since this is an only connection string in the test, suffix _1 is not needed. 10. ``` +# Simulate standby connection failure by modifying pg_hba.conf +unlink($primary->data_dir . '/pg_hba.conf'); +$primary->append_conf('pg_hba.conf', + qq{local all all trust} +); ``` What if the system does not have Unix domain socket? I'm afraid all connections could be brocked in this case. 11. ``` +# Attempt to sync replication slots while standby is behind +$standby1->psql('postgres', "SELECT pg_sync_replication_slots();"); ``` When the command can be failed, can you set a error message from the command to a variable? Otherwise an ERROR is output to the result file, which is suprising. ``` psql:<stdin>:1: ERROR: skipping slot synchronization because the received slot sync LSN 0/03019E58 for slot "slot_sync"is ahead of the standby position 0/030000D8 [17:23:21.495](0.384s) ok 2 - slot sync skip when standby is behind ``` 12. ``` +# Restore connectivity between primary and standby +unlink($primary->data_dir . '/pg_hba.conf'); +$primary->append_conf( + 'pg_hba.conf', + qq{ +local all all trust +local replication all trust +}); ``` Same as 10. Also, no need to do unlink() here. 13. ``` +# Create a new logical slot for testing injection point +$primary->safe_psql('postgres', + "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)" +); ``` Before here can you add a description what you would test? 14. ``` # Create a physical replication slot on primary for standby $primary->psql('postgres', q{SELECT pg_create_physical_replication_slot('sb1_slot');}); ``` Use safe_psql instead of psql(). 15. ``` - if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) + if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true))) ``` Is there a possibility that slot is not NULL? It is used when confirmed_flush exceeds the latestFlushPtr, but in this case the function cannot reach here. Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: