Decoding speculative insert with toast leaks memory - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Decoding speculative insert with toast leaks memory
Date
Msg-id CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
Whole thread Raw
Responses Re: Decoding speculative insert with toast leaks memory  (Peter Geoghegan <pg@bowt.ie>)
Re: Decoding speculative insert with toast leaks memory  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi All,
We saw OOM in a system where WAL sender consumed Gigabttes of memory
which was never released. Upon investigation, we found out that there
were many ReorderBufferToastHash memory contexts linked to
ReorderBuffer context, together consuming gigs of memory. They were
running INSERT ... ON CONFLICT .. among other things. A similar report
at [1]

I could reproduce a memory leak in wal sender using following steps
Session 1
postgres=# create table t_toast (a int primary key, b text);
postgres=# CREATE PUBLICATION dbz_minimal_publication FOR TABLE public.t_toast;

Terminal 4
$ pg_recvlogical -d postgres --slot pgoutput_minimal_test_slot
--create-slot -P pgoutput
$ pg_recvlogical -d postgres --slot pgoutput_minimal_test_slot --start
-o proto_version=1 -o publication_names='dbz_minimal_publication' -f
/dev/null

Session 1
postgres=# select * from pg_replication_slots ;
         slot_name          |  plugin  | slot_type | datoid | database
| temporary | active | active_pid | xmin | catalog_xmin | restart_lsn
| confirmed_flush_lsn

----------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 pgoutput_minimal_test_slot | pgoutput | logical   |  12402 | postgres
| f         | f      |            |      |          570 | 0/15FFFD0
| 0/1600020

postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' ||
txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;
INSERT 0 1

Session 2 (xid = 571)
postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' ||
txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;

Session 3 (xid = 572)
postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' ||
txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;

Session 1 (this session doesn't modify the table but is essential for
speculative insert to happen.)
postgres=# rollback;

Session 2 and 3 in the order in which you get control back (in my case
session 2 with xid = 571)
INSERT 0 1
postgres=# commit;
COMMIT

other session (in my case session 3 with xid = 572)
INSERT 0 0
postgres=# commit;
COMMIT

With the attached patch, we see following in the server logs
2021-03-25 09:57:20.469 IST [12424] LOG:  starting logical decoding
for slot "pgoutput_minimal_test_slot"
2021-03-25 09:57:20.469 IST [12424] DETAIL:  Streaming transactions
committing after 0/1600020, reading WAL from 0/15FFFD0.
2021-03-25 09:57:20.469 IST [12424] LOG:  logical decoding found
consistent point at 0/15FFFD0
2021-03-25 09:57:20.469 IST [12424] DETAIL:  There are no running transactions.
2021-03-25 09:59:45.494 IST [12424] LOG:  initializing hash table for
transaction 571
2021-03-25 09:59:45.494 IST [12424] LOG:  speculative insert
encountered in transaction 571
2021-03-25 09:59:45.494 IST [12424] LOG:  speculative insert confirmed
in transaction 571
2021-03-25 09:59:45.504 IST [12424] LOG:  destroying toast hash for
transaction 571
2021-03-25 09:59:50.806 IST [12424] LOG:  initializing hash table for
transaction 572
2021-03-25 09:59:50.806 IST [12424] LOG:  speculative insert
encountered in transaction 572
2021-03-25 09:59:50.806 IST [12424] LOG:  toast hash for transaction
572 is not cleared

Observe that the toast_hash was cleaned for the transaction 571 which
successfully inserted the row but was not cleaned for the transaction
572 which performed DO NOTHING instead of INSERT.

Here's the sequence of events which leads to memory leak in wal sender
1. Transaction 571 performs a speculative INSERT which is decoded as
toast insert followed by speculative insert of row
2. decoding toast tuple, causes the toast hash to be created
3. Speculative insert is ignored while decoding
4. Speculative insert is confimed and decoded as a normal INSERT, also
destroying the toast hash
5. Transaction 572 performs speculative insert which is decoded as
toast insert followed by speculative insert of row
6. decoding toast tuple causes the toast hash to be created
7. speculative insert is ignored while decoding
... Speculative INSERT is never confirmed and thus toast hash is never
destroyed leaking memory

In memory context dump we see as many ReorderBufferToastHash as the
number of times the above sequence is repeated.
TopMemoryContext: 1279640 total in 7 blocks; 23304 free (17 chunks);
1256336 used
...
    Replication command context: 32768 total in 3 blocks; 10952 free
(9 chunks); 21816 used
 ...
        ReorderBuffer: 8192 total in 1 blocks; 7656 free (7 chunks); 536 used
          ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0
chunks); 6136 used
          ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0
chunks); 6136 used
          ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0
chunks); 6136 used


The relevant code is all in ReoderBufferCommit() in cases
REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT and
REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM.

About the solution: The speculative insert needs to be ignored since
it can be rolled back later. If speculative insert is not confirmed,
there is no way to know that the speculative insert change required a
toast_hash table and destroy it before the next change starts.
ReorderBufferCommit seems to notice a speculative insert that was
never confirmed in the following code
1624             change_done:
1625
1626                     /*
1627                      * Either speculative insertion was
confirmed, or it was
1628                      * unsuccessful and the record isn't needed anymore.
1629                      */
1630                     if (specinsert != NULL)
1631                     {
1632                         ReorderBufferReturnChange(rb, specinsert);
1633                         specinsert = NULL;
1634                     }
1635
1636                     if (relation != NULL)
1637                     {
1638                         RelationClose(relation);
1639                         relation = NULL;
1640                     }
1641                     break;

but by then we might have reused the toast_hash and thus can not be
destroyed. But that isn't the problem since the reused toast_hash will
be destroyed eventually.

It's only when the change next to speculative insert is something
other than INSERT/UPDATE/DELETE that we have to worry about a
speculative insert that was never confirmed. So may be for those
cases, we check whether specinsert != null and destroy toast_hash if
it exists.

[1] https://www.postgresql-archive.org/Diagnose-memory-leak-in-logical-replication-td6161318.html

-- 
Best Wishes,
Ashutosh Bapat

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: In-placre persistance change of a relation
Next
From: Michael Paquier
Date:
Subject: Re: Proposal: Save user's original authenticated identity for logging