RE: speed up a logical replica setup - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: speed up a logical replica setup |
Date | |
Msg-id | TYCPR01MB1207713BEC5C379A05D65E342F54B2@TYCPR01MB12077.jpnprd01.prod.outlook.com |
In response to | Re: speed up a logical replica setup ("Euler Taveira" <euler@eulerto.com>) |
Responses | RE: speed up a logical replica setup |
List | pgsql-hackers |
Dear Euler,

Further comments for v17.

01.
This program assumes that the target server has the same major version as this program, because the target server will be restarted by the same version's pg_ctl command. I think this should be ensured by reading PG_VERSION.

02.
pg_upgrade checks the versions of the executables it uses, such as pg_ctl, postgres, and pg_resetwal. I think this program should do so as well.

03. get_bin_directory

```
if (find_my_exec(path, full_path) < 0)
{
    pg_log_error("The program \"%s\" is needed by %s but was not found in the\n"
                 "same directory as \"%s\".\n",
                 "pg_ctl", progname, full_path);
```

s/"pg_ctl"/progname

04.
Missing canonicalize_path()?

05.
Assume that the target server is a cascading standby, i.e., it also acts as the primary for another standby. In this case, I think the child node would stop working, because pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at that time. I have not tested this, but should the program detect it and exit early?

06.
wait_for_end_recovery() waits forever even if the standby has been disconnected from the primary, right? Should we check the status of the replication via pg_stat_wal_receiver?

07.
The cleanup function has a couple of bugs.

* If subscriptions have been created on the database, the function also tries to drop a publication. But this leads to an ERROR because the publication has already been dropped. See setup_subscriber().
* If the subscription has been created, drop_replication_slot() leads to an ERROR, because the subscriber already tried to drop the replication slot while executing DROP SUBSCRIPTION.

08.
I found that all messages (ERROR, WARNING, INFO, etc.) are output to stderr, but I think they should go to stdout. Is there a reason? pg_dump outputs messages to stderr, but the motivation there might be to avoid confusion with the dump itself.

09.
I'm not sure the cleanup for the subscriber is really needed. Assume that there are two databases, e.g., pg1 and pg2, and we fail to create a subscription on pg2. This can happen when a subscription with the same name has already been created on the primary server. In this case the subscription on pg1 would be removed. But what is the next step? Since the timeline ID on the standby server is larger than that of the primary (note that the standby has been promoted once), we cannot resume physical replication as-is. IIUC the easiest way to retry is to remove the cluster and restart from pg_basebackup. If so, there is no need to clean up the standby because it is corrupted. We can just say "Please remove the cluster and recreate it".

Here is a reproducer.

1. Apply the txt patch atop the 0001 patch.
2. Run test_corruption.sh.
3. When you see the output below [1], connect to testdb from another terminal and run CREATE SUBSCRIPTION for the same subscription on the primary.
4. Finally, pg_createsubscriber fails to create the subscription.

I also attached the server logs of both nodes and the output. Note again that this is a real issue. I used a tricky way to make the names surely overlap, but this can happen randomly.

10.
While investigating #09, I found that we cannot properly report the reason why the subscription could not be created. The output said:

```
pg_createsubscriber: error: could not create subscription "pg_createsubscriber_16389_3884" on database "testdb": out of memory
```

But the standby server log said:

```
ERROR: subscription "pg_createsubscriber_16389_3884" already exists
STATEMENT: CREATE SUBSCRIPTION pg_createsubscriber_16389_3884 CONNECTION 'user=postgres port=5431 dbname=testdb' PUBLICATION pg_createsubscriber_16389 WITH (create_slot = false, copy_data = false, enabled = false)
```

[1]
```
pg_createsubscriber: creating the replication slot "pg_createsubscriber_16389_3884" on database "testdb"
pg_createsubscriber: XXX: sleep 20s
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/
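To illustrate comment 01, here is a minimal standalone sketch of the kind of PG_VERSION check I mean. It is not taken from the patch: the function names `read_datadir_major_version` and `datadir_matches_major_version` are hypothetical, and it assumes PostgreSQL 10 or later, where PG_VERSION holds a single integer such as `17`. A real implementation would presumably use the frontend utilities already available to the program rather than plain stdio.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): read the major version
 * recorded in <datadir>/PG_VERSION.
 *
 * Assumes PostgreSQL 10 or later, where the file contains one integer.
 * Returns the major version, or -1 if the file cannot be read or parsed.
 */
static int
read_datadir_major_version(const char *datadir)
{
    char        path[1024];
    char        buf[64];
    char       *endptr;
    FILE       *fp;
    long        major;
    int         len;

    /* Build the path and reject it if it would be truncated. */
    len = snprintf(path, sizeof(path), "%s/PG_VERSION", datadir);
    if (len < 0 || (size_t) len >= sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* Only the first line is relevant. */
    if (fgets(buf, sizeof(buf), fp) == NULL)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Parse the leading integer; fail if no digits were consumed. */
    major = strtol(buf, &endptr, 10);
    if (endptr == buf || major <= 0)
        return -1;

    return (int) major;
}

/*
 * Hypothetical helper: returns 1 if the data directory was initialized by
 * the given major version, 0 otherwise (including on read errors).
 */
static int
datadir_matches_major_version(const char *datadir, int expected_major)
{
    return read_datadir_major_version(datadir) == expected_major;
}
```

The caller would compare the result against its own compiled-in major version and exit with an error before touching the target server if they differ.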