RE: speed up a logical replica setup - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: speed up a logical replica setup
Msg-id OSBPR01MB25521B15BF950D2523BBE143F5D32@OSBPR01MB2552.jpnprd01.prod.outlook.com
In response to Re: speed up a logical replica setup  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: speed up a logical replica setup
List pgsql-hackers
Dear Tom,

> I have a different but possibly-related complaint: why is
> 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> runs for a bit over 19 seconds, which seems completely out of line
> (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> other test scripts in this directory take much less).  It looks
> like most of the blame falls on this step:
> 
> [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> 
> AFAICS the amount of data being replicated is completely trivial,
> so that it doesn't make any sense for this to take so long --- and
> if it does, that suggests that this tool will be impossibly slow
> for production use.  But I suspect there is a logic flaw causing
> this.

I analyzed the issue. My elog() debugging showed that wait_for_end_recovery() was
wasting time. This happened because the recovery target was not yet satisfied.

We set recovery_target_lsn to the return value of pg_create_logical_replication_slot(),
which is the end of the RUNNING_XACTS record. If we use the returned value as
recovery_target_lsn as-is, however, we must wait for additional WAL to be generated,
because the parameter requires that the replicated WAL pass that point.
On my env, the function waited until the bgwriter emitted an XLOG_RUNNING_XACTS record.

One simple solution is to insert an additional WAL record at the end of the publisher
setup. IIUC, any WAL insertion can reduce the waiting time. The attached
patch inserts a small XLOG_LOGICAL_MESSAGE record, which greatly reduced the execution
time on my environment.
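
To illustrate the reasoning, here is a toy model in C. It is not code from the
patch or from PostgreSQL. The names lsn_t, wal_record, reaches_target() and
replay_until_target() are hypothetical, and the stop rule (recovery stops only
after seeing a record that starts at or after the target LSN) is my simplified
assumption about how recovery_target_lsn behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an LSN (hypothetical; PostgreSQL itself uses XLogRecPtr). */
typedef uint64_t lsn_t;

/* Hypothetical record descriptor: only the start and end positions matter. */
typedef struct
{
    lsn_t start;
    lsn_t end;
} wal_record;

/*
 * Assumed stop rule: the target counts as reached once a record that
 * starts at or after the target LSN has been seen.
 */
bool
reaches_target(const wal_record *rec, lsn_t target)
{
    return rec->start >= target;
}

/*
 * Walk the available records in order. Return the index of the record at
 * which recovery would stop, or -1 if the available WAL is not enough and
 * recovery has to keep waiting for more.
 */
int
replay_until_target(const wal_record *recs, int nrecs, lsn_t target)
{
    assert(recs != NULL || nrecs == 0);

    for (int i = 0; i < nrecs; i++)
    {
        if (reaches_target(&recs[i], target))
            return i;
    }
    return -1;
}
```

In this model, if the WAL is {100,150}, {150,200} and the target is 200 (the end
of the last record), replay_until_target() returns -1, which corresponds to the
wait. Appending one small record {200,230}, as the extra logical message does,
makes it return index 2 immediately.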

```
BEFORE
(13.751s) ok 30 - run pg_createsubscriber on node S
AFTER
(0.749s) ok 30 - run pg_createsubscriber on node S
```

However, even after the modification, the reported failure [1] was not resolved on my env.

What do you think?

[1]: https://www.postgresql.org/message-id/0dffca12-bf17-4a7a-334d-225569de5e6e%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 
