RE: Logical replication timeout - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Logical replication timeout
Date
Msg-id OSCPR01MB14966B646506E0C9B81B3A4CFF5022@OSCPR01MB14966.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Logical replication timeout  (RECHTÉ Marc <marc.rechte@meteo.fr>)
List pgsql-hackers
Dear Shlok,

> 
> Thanks for sharing the analysis.
> 
> I tested the patch on my machine as well and it has worse performance
> for me as well.
> I came up with an alternate approach. In this approach we keep track
> of wal segment the transaction is part of. This helps to iterate
> through only required files during clean up.
> 
> On my machine, I am running the testcase provided by you in [1]. It is
> generating ~1.9 million spill files. For me the transaction completed
> in 56sec.
> Cleanup (deletion of spill files) took around following time:
> With HEAD : ~ 5min
> With latest patch (attached here) : ~2min
> 
> Can you test if this improves performance for you?

I'm also not sure the performance, but I can post my comments.
I'm not sure your patch can properly handle the list operations.

```
+                oldcontext = MemoryContextSwitchTo(rb->context);
+                txn->walsgmts = lappend(txn->walsgmts, curOpenSegNo);
+                MemoryContextSwitchTo(oldcontext);
+
```

IIUC lappend() accepts a point of a Datum, but here a normal value is passed.
Should we define a new struct which represents a node of list and append it
after it is palloc()'d?
Or your code is enough for some reasons?

```
     /* iterate over all possible filenames, and delete them */
-    for (cur = first; cur <= last; cur++)
+    foreach(cell, txn->walsgmts)
     {
+        XLogSegNo curr_segno = (XLogSegNo) lfirst(cell);
         char        path[MAXPGPATH];
 
-        ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, cur);
+        ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid, curr_segno);
         if (unlink(path) != 0 && errno != ENOENT)
             ereport(ERROR,
                     (errcode_for_file_access(),
                      errmsg("could not remove file \"%s\": %m", path)));
     }
+
+    if(txn->walsgmts != NIL)
+    {
+        pfree(txn->walsgmts);
+        txn->walsgmts = NIL;
+    }
```

If above comment is accepted, I feel you can use foreach_delete_current().

=======

Also, even when we optimize the truncation of files, there is a possibility that
replication is timed out. Can you also create a patch which implements [1]?

[1]: https://www.postgresql.org/message-id/CAExHW5s2_T9mULDQRKsdV72wpnA%2BNLT63cX51b51QQVEV4sG5g%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

pgsql-hackers by date:

Previous
From: Vladlen Popolitov
Date:
Subject: Re: Exporting float_to_shortest_decimal_buf(n) with Postgres 17 on Windows
Next
From: Alexander Lakhin
Date:
Subject: Re: Make tuple deformation faster