Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables |
Date | |
Msg-id | CAA4eK1Lb3sY8TEfQrtZ8ceeHy3=Z-H=dsYcbjWnYonD=e8EvHA@mail.gmail.com |
In response to | Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables (Keisuke Kuroda <keisuke.kuroda.3862@gmail.com>) |
Responses | Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables |
List | pgsql-hackers |
On Wed, Sep 23, 2020 at 1:09 PM Keisuke Kuroda <keisuke.kuroda.3862@gmail.com> wrote:
>
> Hi hackers,
>
> I found a problem in logical replication.
> It seems to have the same cause as the following problem.
>
> Creating many tables gets logical replication stuck
> https://www.postgresql.org/message-id/flat/20f3de7675f83176253f607b5e199b228406c21c.camel%40cybertec.at
>
> Logical decoding CPU-bound w/ large number of tables
> https://www.postgresql.org/message-id/flat/CAHoiPjzea6N0zuCi%3D%2Bf9v_j94nfsy6y8SU7-%3Dbp4%3D7qw6_i%3DRg%40mail.gmail.com
>
> # problem
>
> * logical replication enabled
> * walsender process has RelfilenodeMap cache(2000 relations in this case)
> * TRUNCATE or DROP or CREATE many tables in same transaction
>
> At this time, walsender process continues to use 100% of the CPU 1core.
>
...
...
>
> ./src/backend/replication/logical/reorderbuffer.c
>     case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
>         Assert(change->data.command_id != InvalidCommandId);
>
>         if (command_id < change->data.command_id)
>         {
>             command_id = change->data.command_id;
>
>             if (!snapshot_now->copied)
>             {
>                 /* we don't use the global one anymore */
>                 snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
>                                                      txn, command_id);
>             }
>
>             snapshot_now->curcid = command_id;
>
>             TeardownHistoricSnapshot(false);
>             SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
>
>             /*
>              * Every time the CommandId is incremented, we could
>              * see new catalog contents, so execute all
>              * invalidations.
>              */
>             ReorderBufferExecuteInvalidations(rb, txn);
>         }
>
>         break;
>
> Do you have any solutions?
>

Yeah, I have an idea on how to solve this problem. The problem arises
primarily because we used to receive invalidations only at commit time,
and so we need to execute all of them after each command id change.
However, after commit c55040ccd0 (When wal_level=logical, write
invalidations at command end into WAL so that decoding can use this
information.), we know exactly when we need to execute each
invalidation. The idea is that, instead of collecting invalidations only
in ReorderBufferTxn, we also collect them in the form of a
ReorderBufferChange, similar to what we do for other changes (e.g.
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID). We still need to collect
them in ReorderBufferTxn as well, because if the transaction is aborted
or an exception occurs while executing the changes, we need to perform
all the invalidations.

-- 
With Regards,
Amit Kapila.
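
The difference between the two schemes described in the reply can be illustrated with a small toy model. This is only a sketch and not PostgreSQL code: ToyChange, build_toy_txn and the three counting functions are hypothetical names invented for illustration, and the model counts how many invalidation messages would be executed rather than executing anything. It assumes every command in the transaction is followed by one queued invalidation change.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only. These types are hypothetical and far simpler than the
 * real reorder buffer structures in reorderbuffer.c.
 */
typedef enum
{
    TOY_CHANGE_COMMAND_ID,      /* command id was incremented */
    TOY_CHANGE_INVALIDATION     /* invalidations logged at command end */
} ToyChangeKind;

typedef struct
{
    ToyChangeKind kind;
    int           ninvals;      /* used only for TOY_CHANGE_INVALIDATION */
} ToyChange;

/* Sum of all invalidation messages in the transaction. This is what the
 * transaction-level list must still provide for the abort/error path. */
long
toy_invals_on_abort(const ToyChange *changes, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == TOY_CHANGE_INVALIDATION)
            total += changes[i].ninvals;
    return total;
}

/* Commit-time scheme: every command id change replays the complete
 * transaction-level list, so the work grows quadratically. */
long
toy_replay_all_at_each_command(const ToyChange *changes, size_t n)
{
    long total = toy_invals_on_abort(changes, n);
    long executed = 0;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == TOY_CHANGE_COMMAND_ID)
            executed += total;
    return executed;
}

/* Per-change scheme: an invalidation change executes only its own
 * messages, at the point in the stream where it was logged. */
long
toy_replay_per_change(const ToyChange *changes, size_t n)
{
    long executed = 0;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == TOY_CHANGE_INVALIDATION)
            executed += changes[i].ninvals;
    return executed;
}

/* Fill 'out' with ncommands pairs of (command id change, invalidation
 * change) and return the number of changes written. */
size_t
build_toy_txn(ToyChange *out, size_t cap, int ncommands, int invals_per_command)
{
    size_t n = 0;

    assert(ncommands >= 0 && (size_t) ncommands * 2 <= cap);
    for (int c = 0; c < ncommands; c++)
    {
        out[n++] = (ToyChange) {TOY_CHANGE_COMMAND_ID, 0};
        out[n++] = (ToyChange) {TOY_CHANGE_INVALIDATION, invals_per_command};
    }
    return n;
}
```

For a toy transaction of 2000 commands with 3 invalidation messages each, toy_replay_all_at_each_command counts 12,000,000 executions while toy_replay_per_change counts 6,000; toy_invals_on_abort still returns the full 6,000, which is why the transaction-level list is kept for the abort path.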