Thread: Decoding speculative insert with toast leaks memory
Hi All,

We saw OOM in a system where the WAL sender consumed gigabytes of memory which was never released. Upon investigation, we found out that there were many ReorderBufferToastHash memory contexts linked to the ReorderBuffer context, together consuming gigabytes of memory. They were running INSERT ... ON CONFLICT ... among other things. A similar report is at [1].

I could reproduce a memory leak in the WAL sender using the following steps.

Session 1:

postgres=# create table t_toast (a int primary key, b text);
postgres=# CREATE PUBLICATION dbz_minimal_publication FOR TABLE public.t_toast;

Terminal 4:

$ pg_recvlogical -d postgres --slot pgoutput_minimal_test_slot --create-slot -P pgoutput
$ pg_recvlogical -d postgres --slot pgoutput_minimal_test_slot --start -o proto_version=1 -o publication_names='dbz_minimal_publication' -f /dev/null

Session 1:

postgres=# select * from pg_replication_slots ;
         slot_name          |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 pgoutput_minimal_test_slot | pgoutput | logical   |  12402 | postgres | f         | f      |            |      |          570 | 0/15FFFD0   | 0/1600020

postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' || txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;
INSERT 0 1

Session 2 (xid = 571):

postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' || txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;

Session 3 (xid = 572):

postgres=# begin;
postgres=# insert into t_toast values (500, repeat('something' || txid_current()::text, 100000)) ON CONFLICT (a) DO NOTHING;

Session 1 (this session doesn't modify the table but is essential for the speculative insert to happen):

postgres=# rollback;

Sessions 2 and 3, in the order in which you get control back (in my case session 2 with xid = 571):

INSERT 0 1
postgres=# commit;
COMMIT

The other session (in my case session 3 with xid = 572):

INSERT 0 0
postgres=# commit;
COMMIT

With the attached patch, we see the following in the server logs:

2021-03-25 09:57:20.469 IST [12424] LOG:  starting logical decoding for slot "pgoutput_minimal_test_slot"
2021-03-25 09:57:20.469 IST [12424] DETAIL:  Streaming transactions committing after 0/1600020, reading WAL from 0/15FFFD0.
2021-03-25 09:57:20.469 IST [12424] LOG:  logical decoding found consistent point at 0/15FFFD0
2021-03-25 09:57:20.469 IST [12424] DETAIL:  There are no running transactions.
2021-03-25 09:59:45.494 IST [12424] LOG:  initializing hash table for transaction 571
2021-03-25 09:59:45.494 IST [12424] LOG:  speculative insert encountered in transaction 571
2021-03-25 09:59:45.494 IST [12424] LOG:  speculative insert confirmed in transaction 571
2021-03-25 09:59:45.504 IST [12424] LOG:  destroying toast hash for transaction 571
2021-03-25 09:59:50.806 IST [12424] LOG:  initializing hash table for transaction 572
2021-03-25 09:59:50.806 IST [12424] LOG:  speculative insert encountered in transaction 572
2021-03-25 09:59:50.806 IST [12424] LOG:  toast hash for transaction 572 is not cleared

Observe that the toast hash was cleaned up for transaction 571, which successfully inserted the row, but was not cleaned up for transaction 572, which performed DO NOTHING instead of the INSERT.

Here's the sequence of events which leads to the memory leak in the WAL sender:

1. Transaction 571 performs a speculative INSERT, which is decoded as a toast insert followed by a speculative insert of the row.
2. Decoding the toast tuple causes the toast hash to be created.
3. The speculative insert is ignored while decoding.
4. The speculative insert is confirmed and decoded as a normal INSERT, which also destroys the toast hash.
5. Transaction 572 performs a speculative insert, which is decoded as a toast insert followed by a speculative insert of the row.
6. Decoding the toast tuple causes the toast hash to be created.
7. The speculative insert is ignored while decoding.
...

The speculative INSERT is never confirmed, and thus the toast hash is never destroyed, leaking memory.

In the memory context dump we see as many ReorderBufferToastHash contexts as the number of times the above sequence was repeated:

TopMemoryContext: 1279640 total in 7 blocks; 23304 free (17 chunks); 1256336 used
...
  Replication command context: 32768 total in 3 blocks; 10952 free (9 chunks); 21816 used
...
    ReorderBuffer: 8192 total in 1 blocks; 7656 free (7 chunks); 536 used
      ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0 chunks); 6136 used
      ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0 chunks); 6136 used
      ReorderBufferToastHash: 8192 total in 1 blocks; 2056 free (0 chunks); 6136 used

The relevant code is all in ReorderBufferCommit(), in the cases for REORDER_BUFFER_CHANGE_INSERT, REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT and REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM.

About the solution: the speculative insert needs to be ignored since it can be rolled back later. If the speculative insert is never confirmed, there is no way to know that the speculative insert change required a toast hash, so the hash cannot be destroyed before the next change starts. ReorderBufferCommit() does notice a speculative insert that was never confirmed, in the following code:

change_done:

            /*
             * Either speculative insertion was confirmed, or it was
             * unsuccessful and the record isn't needed anymore.
             */
            if (specinsert != NULL)
            {
                ReorderBufferReturnChange(rb, specinsert);
                specinsert = NULL;
            }

            if (relation != NULL)
            {
                RelationClose(relation);
                relation = NULL;
            }
            break;

but by then we might have reused the toast_hash, so it cannot be destroyed there. That isn't a problem in itself, since a reused toast_hash will be destroyed eventually. It's only when the change next to the speculative insert is something other than an INSERT/UPDATE/DELETE that we have to worry about a speculative insert that was never confirmed. So maybe for those cases, we check whether specinsert != NULL and destroy the toast_hash if it exists.

[1] https://www.postgresql-archive.org/Diagnose-memory-leak-in-logical-replication-td6161318.html

--
Best Wishes,
Ashutosh Bapat
Attachment
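To make the closing suggestion concrete, here is a minimal sketch of the idea (an illustration only, not the attached patch; it assumes placement in the non-INSERT/UPDATE/DELETE branches of the switch in ReorderBufferCommit(), and uses the two-argument ReorderBufferReturnChange() shown in the excerpt above):

    /*
     * Sketch: while processing a change that is not an
     * INSERT/UPDATE/DELETE, an earlier speculative insert can no longer
     * be confirmed, so free the leftover change and the toast data that
     * was accumulated for it.
     */
    if (specinsert != NULL)
    {
        /* The speculative insert was never confirmed; release it. */
        ReorderBufferReturnChange(rb, specinsert);
        specinsert = NULL;

        /* Destroy the toast hash built for the abandoned insert. */
        if (txn->toast_hash != NULL)
            ReorderBufferToastReset(rb, txn);
    }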
On Wed, Mar 24, 2021 at 10:34 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> Hi All,
> We saw OOM in a system where the WAL sender consumed gigabytes of memory
> which was never released. Upon investigation, we found out that there
> were many ReorderBufferToastHash memory contexts linked to the
> ReorderBuffer context, together consuming gigabytes of memory. They were
> running INSERT ... ON CONFLICT ... among other things. A similar report
> is at [1]

What is the relationship between this bug and commit 7259736a6e5, which dealt specifically with TOAST and speculative insertion resource management issues within reorderbuffer.c? Amit?

--
Peter Geoghegan
On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi All,
> We saw OOM in a system where the WAL sender consumed gigabytes of memory
> which was never released. Upon investigation, we found out that there
> were many ReorderBufferToastHash memory contexts linked to the
> ReorderBuffer context, together consuming gigabytes of memory. They were
> running INSERT ... ON CONFLICT ... among other things. A similar report
> is at [1]
>
> ..
>
> but by then we might have reused the toast_hash, so it cannot be
> destroyed there. That isn't a problem in itself, since a reused
> toast_hash will be destroyed eventually.
>
> It's only when the change next to the speculative insert is something
> other than an INSERT/UPDATE/DELETE that we have to worry about a
> speculative insert that was never confirmed. So maybe for those
> cases, we check whether specinsert != NULL and destroy the toast_hash
> if it exists.
>

Can we consider the possibility to destroy the toast_hash in ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the cleanup of memory till the end of the stream or txn, but there won't be any memory leak.

--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 8:27 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Mar 24, 2021 at 10:34 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > Hi All,
> > We saw OOM in a system where the WAL sender consumed gigabytes of memory
> > which was never released. Upon investigation, we found out that there
> > were many ReorderBufferToastHash memory contexts linked to the
> > ReorderBuffer context, together consuming gigabytes of memory. They were
> > running INSERT ... ON CONFLICT ... among other things. A similar report
> > is at [1]
>
> What is the relationship between this bug and commit 7259736a6e5, which
> dealt specifically with TOAST and speculative insertion resource
> management issues within reorderbuffer.c? Amit?
>

This seems to be a pre-existing bug. It should be reproducible on PG-13 and on branches prior to that commit. Ashutosh, can you confirm?

--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Hi All,
> > We saw OOM in a system where the WAL sender consumed gigabytes of memory
> > which was never released. Upon investigation, we found out that there
> > were many ReorderBufferToastHash memory contexts linked to the
> > ReorderBuffer context, together consuming gigabytes of memory. They were
> > running INSERT ... ON CONFLICT ... among other things. A similar report
> > is at [1]
> >
> > ..
> >
> > but by then we might have reused the toast_hash, so it cannot be
> > destroyed there. That isn't a problem in itself, since a reused
> > toast_hash will be destroyed eventually.
> >
> > It's only when the change next to the speculative insert is something
> > other than an INSERT/UPDATE/DELETE that we have to worry about a
> > speculative insert that was never confirmed. So maybe for those
> > cases, we check whether specinsert != NULL and destroy the toast_hash
> > if it exists.
> >
>
> Can we consider the possibility to destroy the toast_hash in
> ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
> cleanup of memory till the end of the stream or txn, but there won't be
> any memory leak.
>

The other possibility could be to clean it up when we clean the spec insert change in the below code:

    /*
     * There's a speculative insertion remaining, just clean in up, it
     * can't have been successful, otherwise we'd gotten a confirmation
     * record.
     */
    if (specinsert)
    {
        ReorderBufferReturnChange(rb, specinsert, true);
        specinsert = NULL;
    }

But I guess we might miss cleaning it up in case of an error. A similar problem could be there in the idea where we will try to tie the cleanup to the next change.

--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 9:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > Hi All,
> > We saw OOM in a system where the WAL sender consumed gigabytes of memory
> > which was never released. Upon investigation, we found out that there
> > were many ReorderBufferToastHash memory contexts linked to the
> > ReorderBuffer context, together consuming gigabytes of memory. They were
> > running INSERT ... ON CONFLICT ... among other things. A similar report
> > is at [1]
> >
> > ..
> >
> > but by then we might have reused the toast_hash, so it cannot be
> > destroyed there. That isn't a problem in itself, since a reused
> > toast_hash will be destroyed eventually.
> >
> > It's only when the change next to the speculative insert is something
> > other than an INSERT/UPDATE/DELETE that we have to worry about a
> > speculative insert that was never confirmed. So maybe for those
> > cases, we check whether specinsert != NULL and destroy the toast_hash
> > if it exists.
> >
>
> Can we consider the possibility to destroy the toast_hash in
> ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
> cleanup of memory till the end of the stream or txn, but there won't be
> any memory leak.

Currently, we are ignoring XLH_DELETE_IS_SUPER, so maybe we can do something based on this flag?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 27, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 25, 2021 at 11:04 AM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > It's only when the change next to the speculative insert is something
> > > other than an INSERT/UPDATE/DELETE that we have to worry about a
> > > speculative insert that was never confirmed. So maybe for those
> > > cases, we check whether specinsert != NULL and destroy the toast_hash
> > > if it exists.
> > >
> >
> > Can we consider the possibility to destroy the toast_hash in
> > ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
> > cleanup of memory till the end of the stream or txn, but there won't be
> > any memory leak.
>
> The other possibility could be to clean it up when we clean the spec
> insert change in the below code:

Yeah, that could be done.

>     /*
>      * There's a speculative insertion remaining, just clean in up, it
>      * can't have been successful, otherwise we'd gotten a confirmation
>      * record.
>      */
>     if (specinsert)
>     {
>         ReorderBufferReturnChange(rb, specinsert, true);
>         specinsert = NULL;
>     }
>
> But I guess we might miss cleaning it up in case of an error. A
> similar problem could be there in the idea where we will try to tie
> the cleanup to the next change.

In the error case also, we can handle it in the CATCH block, no?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > Can we consider the possibility to destroy the toast_hash in
> > > > ReorderBufferCleanupTXN/ReorderBufferTruncateTXN? It will delay the
> > > > cleanup of memory till the end of the stream or txn, but there won't be
> > > > any memory leak.
> >
> > The other possibility could be to clean it up when we clean the spec
> > insert change in the below code:
>
> Yeah, that could be done.
>
> > But I guess we might miss cleaning it up in case of an error. A
> > similar problem could be there in the idea where we will try to tie
> > the cleanup to the next change.
>
> In the error case also, we can handle it in the CATCH block, no?
>

True, but if you do this clean-up in ReorderBufferCleanupTXN then you don't need to take care of it at separate places. Also, the toast_hash is stored in the txn, so it appears natural to clean it up while releasing the TXN.

--
With Regards,
Amit Kapila.
On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> True, but if you do this clean-up in ReorderBufferCleanupTXN then you
> don't need to take care of it at separate places. Also, the toast_hash
> is stored in the txn, so it appears natural to clean it up while
> releasing the TXN.

Makes sense. Basically, IMHO we will have to do it in TruncateTXN and ReturnTXN, as attached?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
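For readers without the attachment, the change being discussed is presumably along these lines (a sketch under that assumption, not the actual patch; ReorderBufferToastReset() is the existing reorderbuffer.c routine that destroys the per-transaction toast hash):

    /* Sketch: at the end of ReorderBufferTruncateTXN(), after the
     * transaction's changes have been discarded: */
    if (txn->toast_hash != NULL)
        ReorderBufferToastReset(rb, txn);

    /* Sketch: likewise in ReorderBufferReturnTXN(), before the
     * transaction is freed: */
    if (txn->toast_hash != NULL)
        ReorderBufferToastReset(rb, txn);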
On 5/27/21 6:36 AM, Dilip Kumar wrote:
> On Thu, May 27, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Thu, May 27, 2021 at 9:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> True, but if you do this clean-up in ReorderBufferCleanupTXN then you
>> don't need to take care of it at separate places. Also, the toast_hash
>> is stored in the txn, so it appears natural to clean it up while
>> releasing the TXN.
>
> Makes sense. Basically, IMHO we will have to do it in TruncateTXN and
> ReturnTXN, as attached?
>

Yeah, I've been working on a fix over the last couple of days (because of a customer issue), and I ended up with the reset in ReorderBufferReturnTXN too - it does solve the issue in the case I've been investigating.

I'm not sure the reset in ReorderBufferTruncateTXN is correct, though. Isn't it possible that we'll need the TOAST data after streaming part of the transaction? After all, we're not resetting invalidations, tuplecids and snapshot either ... And we'll eventually clean it up after the streamed transaction gets committed (ReorderBufferStreamCommit ends up calling ReorderBufferReturnTXN too).

I wonder if there's a way to free the TOASTed data earlier, instead of waiting until the end of the transaction (as this patch does). But I suspect it'd be way more complex, harder to backpatch, and destroying the hash table is a good idea anyway.

Also, I think the "if (txn->toast_hash != NULL)" checks are not needed, because that's the first thing ReorderBufferToastReset does.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 28, 2021 at 5:16 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 5/27/21 6:36 AM, Dilip Kumar wrote:
> > Makes sense. Basically, IMHO we will have to do it in TruncateTXN and
> > ReturnTXN, as attached?
> >
>
> Yeah, I've been working on a fix over the last couple of days (because of a
> customer issue), and I ended up with the reset in ReorderBufferReturnTXN
> too - it does solve the issue in the case I've been investigating.
>
> I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
> Isn't it possible that we'll need the TOAST data after streaming part of
> the transaction? After all, we're not resetting invalidations, tuplecids
> and snapshot either

Actually, as per the current design, we don't need the toast data after streaming. Currently, we don't allow streaming a transaction if we would need to keep toast data across a stream: e.g. if we only have a toast insert without the main insert we treat that as a partial change, and similarly, if we have a multi-insert with toast, we keep the transaction marked partial until we get the last insert of the multi-insert. So with the current design we don't have any case where we need to keep toast data across streams.

> ... And we'll eventually clean it up after the streamed
> transaction gets committed (ReorderBufferStreamCommit ends up calling
> ReorderBufferReturnTXN too).

Right, but generally after streaming we assert that txn->size should be 0. That could change if we change the above design, but this is what it is today.

> I wonder if there's a way to free the TOASTed data earlier, instead of
> waiting until the end of the transaction (as this patch does). But I
> suspect it'd be way more complex, harder to backpatch, and destroying
> the hash table is a good idea anyway.

Right.

> Also, I think the "if (txn->toast_hash != NULL)" checks are not needed,
> because that's the first thing ReorderBufferToastReset does.

I see. I will change this if we all agree with this idea.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 5/28/21 2:17 PM, Dilip Kumar wrote:
> On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I'm not sure the reset in ReorderBufferTruncateTXN is correct, though.
>> Isn't it possible that we'll need the TOAST data after streaming part of
>> the transaction? After all, we're not resetting invalidations, tuplecids
>> and snapshot either
>
> Actually, as per the current design, we don't need the toast data
> after streaming. Currently, we don't allow streaming a transaction if
> we would need to keep toast data across a stream: e.g. if we only have
> a toast insert without the main insert we treat that as a partial
> change, and similarly, if we have a multi-insert with toast, we keep
> the transaction marked partial until we get the last insert of the
> multi-insert. So with the current design we don't have any case where
> we need to keep toast data across streams.
>
>> ... And we'll eventually clean it up after the streamed
>> transaction gets committed (ReorderBufferStreamCommit ends up calling
>> ReorderBufferReturnTXN too).
>
> Right, but generally after streaming we assert that txn->size should
> be 0. That could change if we change the above design, but this is
> what it is today.
>

Can we add some assert to enforce this?

>> I wonder if there's a way to free the TOASTed data earlier, instead of
>> waiting until the end of the transaction (as this patch does). But I
>> suspect it'd be way more complex, harder to backpatch, and destroying
>> the hash table is a good idea anyway.
>
> Right.
>
>> Also, I think the "if (txn->toast_hash != NULL)" checks are not needed,
>> because that's the first thing ReorderBufferToastReset does.
>
> I see. I will change this if we all agree with this idea.
>

+1 from me. I think it's good enough to do the cleanup at the end, and it's an improvement compared to the current state. There might be cases of transactions doing many such speculative inserts and accumulating a lot of data in the TOAST hash, but I find that very unlikely.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 28, 2021 at 6:01 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 5/28/21 2:17 PM, Dilip Kumar wrote:
> > Right, but generally after streaming we assert that txn->size should
> > be 0. That could change if we change the above design, but this is
> > what it is today.
> >
>
> Can we add some assert to enforce this?
>

There is already an Assert for this. See ReorderBufferCheckMemoryLimit.

--
With Regards,
Amit Kapila.
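For reference, the Assert being referred to sits in ReorderBufferCheckMemoryLimit(), after the largest transaction has been evicted by serializing or streaming it; roughly (quoted from memory, so treat the exact form as an assumption):

    /*
     * After eviction, the transaction should have no entries in memory,
     * and should use 0 bytes for changes.
     */
    Assert(txn->size == 0);
    Assert(txn->nentries_mem == 0);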
On Fri, May 28, 2021 at 5:16 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> I wonder if there's a way to free the TOASTed data earlier, instead of
> waiting until the end of the transaction (as this patch does).
>

IIUC we are anyway freeing the toasted data at the next insert/update/delete. We can try to free it at other change message types like REORDER_BUFFER_CHANGE_MESSAGE but, as you said, that may make the patch more complex, so it seems better to do the fix along the lines of what is proposed in the patch.

--
With Regards,
Amit Kapila.
On 5/29/21 6:29 AM, Amit Kapila wrote:
> On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> I wonder if there's a way to free the TOASTed data earlier, instead of
>> waiting until the end of the transaction (as this patch does).
>>
>
> IIUC we are anyway freeing the toasted data at the next
> insert/update/delete. We can try to free it at other change message
> types like REORDER_BUFFER_CHANGE_MESSAGE but, as you said, that may
> make the patch more complex, so it seems better to do the fix along
> the lines of what is proposed in the patch.
>

+1

Even if we started doing what you mention (freeing the hash for other change types), we'd still need to do what the patch proposes because the speculative insert may be the last change in the transaction. For the other cases it works as a mitigation, so that we don't leak the memory forever.

So let's get this committed, perhaps with a comment explaining that it might be possible to reset earlier if needed.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 29, 2021 at 5:45 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 5/29/21 6:29 AM, Amit Kapila wrote:
> > IIUC we are anyway freeing the toasted data at the next
> > insert/update/delete. We can try to free it at other change message
> > types like REORDER_BUFFER_CHANGE_MESSAGE but, as you said, that may
> > make the patch more complex, so it seems better to do the fix along
> > the lines of what is proposed in the patch.
> >
>
> +1
>
> Even if we started doing what you mention (freeing the hash for other
> change types), we'd still need to do what the patch proposes because the
> speculative insert may be the last change in the transaction. For the
> other cases it works as a mitigation, so that we don't leak the memory
> forever.
>

Right.

> So let's get this committed, perhaps with a comment explaining that it
> might be possible to reset earlier if needed.
>

Okay, I think it would be better if we can test this once for the streaming case as well. Dilip, would you like to do that and send the updated patch as per one of the comments by Tomas?

--
With Regards,
Amit Kapila.
On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, May 29, 2021 at 5:45 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/29/21 6:29 AM, Amit Kapila wrote:
> > On Fri, May 28, 2021 at 5:16 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> I wonder if there's a way to free the TOASTed data earlier, instead of
> >> waiting until the end of the transaction (as this patch does).
> >>
> >
> > IIUC we are anyway freeing the toasted data at the next
> > insert/update/delete. We can try to free it at other change message
> > types like REORDER_BUFFER_CHANGE_MESSAGE but, as you said, that may
> > make the patch more complex, so it seems better to do the fix along
> > the lines of what is proposed in the patch.
> >
>
> +1
>
> Even if we started doing what you mention (freeing the hash for other
> change types), we'd still need to do what the patch proposes because the
> speculative insert may be the last change in the transaction. For the
> other cases it works as a mitigation, so that we don't leak the memory
> forever.
>
Right.
> So let's get this committed, perhaps with a comment explaining that it
> might be possible to reset earlier if needed.
>
Okay, I think it would be better if we can test this once for the
streaming case as well. Dilip, would you like to do that and send the
updated patch as per one of the comments by Tomas?
I will do that sometime.
On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Okay, I think it would be better if we can test this once for the
>> streaming case as well. Dilip, would you like to do that and send the
>> updated patch as per one of the comments by Tomas?
>
> I will do that sometime.

I have changed the patches as Tomas suggested and also created the back-branch patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, 31 May 2021 at 4:29 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, 31 May 2021 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Okay, I think it would be better if we can test this once for the
>> streaming case as well. Dilip, would you like to do that and send the
>> updated patch as per one of the comments by Tomas?
>
>
> I will do that sometime.
I have changed the patches as Tomas suggested and also created the back-branch patches.
I missed doing the test for streaming. I will do that tomorrow and reply back.
On Mon, May 31, 2021 at 6:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, 31 May 2021 at 4:29 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Mon, May 31, 2021 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> >
>> > I will do that sometime.
>>
>> I have changed the patches as Tomas suggested and also created the
>> back-branch patches.
>
> I missed doing the test for streaming. I will do that tomorrow and reply back.

For streaming transactions this issue is not there. The problem can only occur if the last change is a *SPEC INSERT* and after that there is no other UPDATE/INSERT change, because on such a change we reset the toast hash. Now, if the transaction has only a *SPEC INSERT* without a SPEC CONFIRM or any other INSERT/UPDATE, then we will not stream it. And only if we get some next INSERT/UPDATE can the transaction be selected for streaming, but in that case the toast hash will already have been reset. So, as of today, we don't have this problem in streaming mode.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
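As a rough illustration of the mechanism Dilip describes (identifier names as recalled from the PG14 development tree; treat the exact names and placement as assumptions): toast inserts and unconfirmed speculative inserts mark the top-level transaction as containing a partial change, and such transactions are skipped when picking a candidate to stream.

    /* While queuing a change: a speculative insert (or a toast-table
     * insert) marks the transaction as incomplete ... */
    if (change->action == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)
        toptxn->txn_flags |= RBTXN_HAS_PARTIAL_CHANGE;

    /* ... and the confirmation record clears the mark. */
    if (change->action == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM)
        toptxn->txn_flags &= ~RBTXN_HAS_PARTIAL_CHANGE;

    /* When selecting the largest transaction to stream, transactions
     * still marked partial are skipped. */
    if (rbtxn_has_partial_change(txn))
        continue;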
On Mon, May 31, 2021 at 8:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, May 31, 2021 at 6:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I missed doing the test for streaming. I will do that tomorrow and reply back.
>
> For streaming transactions this issue is not there. The problem can
> only occur if the last change is a *SPEC INSERT* and after that there
> is no other UPDATE/INSERT change, because on such a change we reset
> the toast hash. Now, if the transaction has only a *SPEC INSERT*
> without a SPEC CONFIRM or any other INSERT/UPDATE, then we will not
> stream it. And only if we get some next INSERT/UPDATE can the
> transaction be selected for streaming, but in that case the toast
> hash will already have been reset. So, as of today, we don't have this
> problem in streaming mode.
>

What if the next change is a different SPEC_INSERT (REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)? I think in that case we will stream but won't free the toast memory.

--
With Regards,
Amit Kapila.
On Tue, Jun 1, 2021 at 9:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 31, 2021 at 8:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > For streaming transactions this issue is not there. The problem can
> > only occur if the last change is a *SPEC INSERT* and after that there
> > is no other UPDATE/INSERT change, because on such a change we reset
> > the toast hash. Now, if the transaction has only a *SPEC INSERT*
> > without a SPEC CONFIRM or any other INSERT/UPDATE, then we will not
> > stream it. And only if we get some next INSERT/UPDATE can the
> > transaction be selected for streaming, but in that case the toast
> > hash will already have been reset. So, as of today, we don't have this
> > problem in streaming mode.
> >
>
> What if the next change is a different SPEC_INSERT
> (REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)? I think in that case we
> will stream but won't free the toast memory.

But if the next change is again a SPEC INSERT, then we will keep the PARTIAL change flag set and we will not select this transaction for streaming, right?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 1, 2021 at 9:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 9:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > What if the next change is a different SPEC_INSERT
> > (REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT)? I think in that case we
> > will stream but won't free the toast memory.
>
> But if the next change is again a SPEC INSERT, then we will keep the
> PARTIAL change flag set and we will not select this transaction for
> streaming, right?
>

Right. I think you can remove the change related to streamed transactions and probably write some comments on why we don't need it for them. But now I have another question, related to fixing the non-streamed case: what if, after the missing spec_confirm, we get a delete operation in the transaction? It seems ReorderBufferToastReplace always expects an Insert/Update if we have a toast hash active in the transaction.

--
With Regards,
Amit Kapila.
On Tue, Jun 1, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Right. I think you can remove the change related to streamed
> transactions and probably write some comments on why we don't need it
> for them. But now I have another question, related to fixing the
> non-streamed case: what if, after the missing spec_confirm, we get a
> delete operation in the transaction? It seems
> ReorderBufferToastReplace always expects an Insert/Update if we have a
> toast hash active in the transaction.

Yeah, that looks like a problem; I will test this case.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 1, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Right. I think you can remove the change related to streamed
> > transactions and probably write some comments on why we don't need it
> > for them. But now I have another question, related to fixing the
> > non-streamed case: what if, after the missing spec_confirm, we get a
> > delete operation in the transaction? It seems
> > ReorderBufferToastReplace always expects an Insert/Update if we have a
> > toast hash active in the transaction.
>
> Yeah, that looks like a problem; I will test this case.

I am able to hit that Assert after a slight modification of the original test case: basically, I added an extra delete in the spec abort transaction, and I got this assertion.

#0  0x00007f7b8cc3a387 in raise () from /lib64/libc.so.6
#1  0x00007f7b8cc3ba78 in abort () from /lib64/libc.so.6
#2  0x0000000000b027c7 in ExceptionalCondition (conditionName=0xcc11df "change->data.tp.newtuple",
    errorType=0xcc0244 "FailedAssertion", fileName=0xcc0290 "reorderbuffer.c", lineNumber=4601) at assert.c:69
#3  0x00000000008dfeaf in ReorderBufferToastReplace (rb=0x1a73e40, txn=0x1b5d6e8, relation=0x7f7b8dab4d78,
    change=0x1b5fb68) at reorderbuffer.c:4601
#4  0x00000000008db769 in ReorderBufferProcessTXN (rb=0x1a73e40, txn=0x1b5d6e8, commit_lsn=24329048,
    snapshot_now=0x1b4b8d0, command_id=0, streaming=false) at reorderbuffer.c:2187
#5  0x00000000008dc1df in ReorderBufferReplay (txn=0x1b5d6e8, rb=0x1a73e40, xid=748, commit_lsn=24329048,
    end_lsn=24329096, commit_time=675842700629597, origin_id=0, origin_lsn=0) at reorderbuffer.c:2601
#6  0x00000000008dc267 in ReorderBufferCommit (rb=0x1a73e40, xid=748, commit_lsn=24329048, end_lsn=24329096,
    commit_time=675842700629597, origin_id=0, origin_lsn=0) at reorderbuffer.c:2625
#7  0x00000000008cc144 in DecodeCommit (ctx=0x1b319b0, buf=0x7ffdf15fb0a0, parsed=0x7ffdf15faf00, xid=748,
    two_phase=false) at decode.c:744

IMHO, as I stated earlier, one way to fix this problem is that we add the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the queue, maybe with the action name "REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT", and as part of processing that we just clean up the toast and the specinsert tuple and nothing else. If we think that makes sense then I can work on that patch?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
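A sketch of the proposed handling in the change-processing loop might look like this (the REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT action is the proposal, not existing code; the cleanup calls mirror the existing specinsert handling):

    case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:

        /*
         * Sketch: the speculative insert was aborted, so the pending
         * specinsert change and any toast data gathered for it can be
         * freed right away instead of lingering until commit.
         */
        if (specinsert != NULL)
        {
            ReorderBufferReturnChange(rb, specinsert, true);
            specinsert = NULL;
        }

        if (txn->toast_hash != NULL)
            ReorderBufferToastReset(rb, txn);
        break;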
On Tue, Jun 1, 2021 at 11:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Yeah, that looks like a problem; I will test this case.
>
> I am able to hit that Assert after a slight modification of the original
> test case: basically, I added an extra delete in the spec abort
> transaction, and I got this assertion.
>

Can we try other Inserts/Updates after the spec abort to check if there can be other problems due to an active toast_hash?

> IMHO, as I stated earlier, one way to fix this problem is that we add
> the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
> queue, maybe with the action name
> "REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT", and as part of processing
> that we just clean up the toast and the specinsert tuple and nothing
> else. If we think that makes sense then I can work on that patch?
>

I think this should solve the problem, but let's first try to see if we have any other problems. Because, say, if we don't have any other problem, then maybe removing the Assert might work, but I guess we would still need to process the tuple to find that we don't need to assemble toast for it, which again seems like overkill.

--
With Regards,
Amit Kapila.
On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > IMHO, as I stated earlier, one way to fix this problem is that we add
> > the spec abort operation (DELETE + XLH_DELETE_IS_SUPER flag) to the
> > queue, maybe with the action name
> > "REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT", and as part of processing
> > that we just clean up the toast and the specinsert tuple and nothing
> > else. If we think that makes sense then I can work on that patch?
> >
>
> I think this should solve the problem, but let's first try to see if we
> have any other problems. Because, say, if we don't have any other
> problem, then maybe removing the Assert might work, but I guess we would
> still need to process the tuple to find that we don't need to assemble
> toast for it, which again seems like overkill.

Yeah, other operations will also fail. Basically, if txn->toast_hash is not NULL then we assume that we need to assemble the tuple using toast, but if the next insert is into another relation and that relation doesn't have a toast table, then it will fail with the below error. And otherwise, if it is the same relation, then the toast chunks of the previous tuple will be used for constructing this new tuple. I think we must clean up the toast before processing the next tuple, so I think we can go with the solution I mentioned above.

static void
ReorderBufferToastReplace
{
...
    toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
    if (!RelationIsValid(toast_rel))
        elog(ERROR, "could not open relation with OID %u",
             relation->rd_rel->reltoastrelid);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 1, 2021 at 5:22 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think this should solve the problem, but let's first try to see if we
> > have any other problems. Because, say, if we don't have any other
> > problem, then maybe removing the Assert might work, but I guess we would
> > still need to process the tuple to find that we don't need to assemble
> > toast for it, which again seems like overkill.
>
> Yeah, other operations will also fail. Basically, if txn->toast_hash is
> not NULL then we assume that we need to assemble the tuple using
> toast, but if the next insert is into another relation and that relation
> doesn't have a toast table, then it will fail with the below error. And
> otherwise, if it is the same relation, then the toast chunks of the
> previous tuple will be used for constructing this new tuple. I think we
> must clean up the toast before processing the next tuple, so I think we
> can go with the solution I mentioned above.

The attached patch fixes this by queuing the spec abort change and cleaning up the toast hash on spec abort. Currently, in this patch I am queuing up all the spec abort changes, but as an optimization we could avoid queuing the spec abort for toast tables; for that we would need to log a flag in WAL indicating that this XLH_DELETE_IS_SUPER is for a toast relation.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
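On the decode side, the idea is presumably to stop discarding super-deletions in DecodeDelete() and queue them with the new action instead; a sketch under that assumption (the action name is the proposal's, the rest follows the existing shape of DecodeDelete()):

    /* In DecodeDelete(), instead of returning early when
     * XLH_DELETE_IS_SUPER is set: */
    change = ReorderBufferGetChange(ctx->reorder);
    if (xlrec->flags & XLH_DELETE_IS_SUPER)
        change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
    else
        change->action = REORDER_BUFFER_CHANGE_DELETE;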
On Tue, Jun 1, 2021 at 8:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> The attached patch fixes this by queuing the spec abort change and
> cleaning up the toast hash on spec abort. Currently, in this patch I am
> queuing up all the spec abort changes, but as an optimization we could
> avoid queuing the spec abort for toast tables; for that we would need to
> log a flag in WAL indicating that this XLH_DELETE_IS_SUPER is for a
> toast relation.
>

I don't think that is required, especially because we intend to backpatch this, so I would like to keep such an optimization for another day.

Few comments:
------------
1.
/*
 * Super deletions are irrelevant for logical decoding, it's driven by the
 * confirmation records.
 */

The above comment is not required after your other changes.

2.
/*
 * Either speculative insertion was confirmed, or it was
 * unsuccessful and the record isn't needed anymore.
 */
if (specinsert != NULL)

The above comment needs some adjustment.

3.
/*
 * There's a speculative insertion remaining, just clean in up, it
 * can't have been successful, otherwise we'd gotten a confirmation
 * record.
 */
if (specinsert)
{
    ReorderBufferReturnChange(rb, specinsert, true);
    specinsert = NULL;
}

Ideally, we should have an Assert here because we shouldn't reach this point without cleaning up specinsert. If there is still a chance that we can, then we should mention that in the comments.

4.
+ case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT:
+
+ /*
+ * Abort for speculative insertion arrived.

I think here we should explain why we can't piggyback the cleanup on the next insert/update/delete.

5. Can we write a test case for this? I guess we don't need to use multiple sessions if the conflicting record is already present.

Please also see if the same patch works on back-branches. I guess this makes the change a bit tricky as it involves decoding a new message, but I am not sure if there is a better way.

--
With Regards,
Amit Kapila.
On Tue, Jun 1, 2021 at 5:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think this should solve the problem, but let's first try to see if we
> > have any other problems. Because, say, if we don't have any other
> > problem, then maybe removing the Assert might work, but I guess we would
> > still need to process the tuple to find that we don't need to assemble
> > toast for it, which again seems like overkill.
>
> Yeah, other operations will also fail. Basically, if txn->toast_hash is
> not NULL then we assume that we need to assemble the tuple using
> toast, but if the next insert is into another relation and that relation
> doesn't have a toast table, then it will fail with the below error. And
> otherwise, if it is the same relation, then the toast chunks of the
> previous tuple will be used for constructing this new tuple.
>

I think the same-relation case might not create a problem, because it won't find an entry for the tuple in the toast_hash and will return from there, but the other two problems will be there. So, one idea could be to just avoid those two cases (by simply adding a return for them) and still rely on the toast cleanup at the next insert/update/delete. However, your approach looks more natural to me, as it allows us to clean up the memory in all cases instead of waiting till the transaction end. So, I still vote for going with your patch's idea of cleaning up at spec_abort, but I am fine if you and others decide not to process the spec_abort message. What do you think? Tomas, do you have any opinion on this matter?

--
With Regards,
Amit Kapila.
On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think the same-relation case might not create a problem, because it
> won't find an entry for the tuple in the toast_hash and will return
> from there, but the other two problems will be there.

Right.

> So, one idea could be to just avoid those two cases (by simply adding a
> return for them) and still rely on the toast cleanup at the next
> insert/update/delete. However, your approach looks more natural to me,
> as it allows us to clean up the memory in all cases instead of waiting
> till the transaction end. So, I still vote for going with your patch's
> idea of cleaning up at spec_abort, but I am fine if you and others
> decide not to process the spec_abort message. What do you think? Tomas,
> do you have any opinion on this matter?

I agree that processing with spec abort looks more natural, and ideally the current code expects things to be cleaned up after the change; that's the reason we have those assertions and errors. OTOH, I agree that we could just return from those conditions, because now we know that with the current code those situations are possible. My vote is for handling the spec abort option (Option 1), because that looks like the more natural way of handling these issues, and we also don't have to clean up the hash in ReorderBufferReturnTXN if there is no follow-up change after the spec abort.

I am attaching the patches with both approaches for reference. Once we finalize the approach, I will work on the pending review comments and also prepare the back-branch patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
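For comparison, the alternative approach (Option 2) amounts to making ReorderBufferToastReplace() tolerate a stale toast hash instead of asserting or erroring; a sketch of the guards it would need (placement and exact conditions are assumptions):

    /* Sketch: the relation has no toast table, so a leftover toast
     * hash cannot apply to this change; skip instead of elog(ERROR). */
    if (!OidIsValid(relation->rd_rel->reltoastrelid))
        return;

    /* Sketch: e.g. a DELETE carries no new tuple; skip instead of
     * Assert(change->data.tp.newtuple). */
    if (change->data.tp.newtuple == NULL)
        return;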
On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I agree that processing with spec abort looks more natural, and ideally
> the current code expects things to be cleaned up after the change;
> that's the reason we have those assertions and errors. OTOH, I agree
> that we could just return from those conditions, because now we know
> that with the current code those situations are possible. My vote is
> for handling the spec abort option (Option 1), because that looks like
> the more natural way of handling these issues, and we also don't have
> to clean up the hash in ReorderBufferReturnTXN if there is no follow-up
> change after the spec abort.
>

Even if we decide to go with the spec_abort approach, it might be better to still keep the ReorderBufferToastReset call in ReorderBufferReturnTXN so that the hash can be freed in case of an error.

--
With Regards,
Amit Kapila.
On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think the same relation case might not create a problem because it
> > > won't find the entry for it in the toast_hash, so it will return from
> > > there but the other two problems will be there.
> >
> > Right.
> >
> > > So, one idea could be
> > > to just avoid those two cases (by simply adding return for those
> > > cases) and still we can rely on toast clean up on the next
> > > insert/update/delete. However, your approach looks more natural to me
> > > as that will allow us to clean up memory in all cases instead of
> > > waiting till the transaction end. So, I still vote for going with your
> > > patch's idea of cleaning at spec_abort but I am fine if you and others
> > > decide not to process spec_abort message. What do you think? Tomas, do
> > > you have any opinion on this matter?
> >
> > I agree that processing the spec abort looks more natural, and ideally
> > the current code expects it to be getting cleaned after the change;
> > that's the reason we have those assertions and errors.

Okay, so let's go with that approach. I have thought about whether it creates any problem in back-branches but couldn't think of any, primarily because we are not going to send anything additional to the plugin/subscriber. Do you see any problems with back branches if we go with this approach?

> > OTOH, I agree
> > that we can just return from those conditions because now we know that
> > with the current code those situations are possible. My vote is for
> > handling the spec abort option (Option 1) because that looks like the
> > more natural way of handling these issues, and we also don't have to
> > clean up the hash in "ReorderBufferReturnTXN" if there is no followup
> > change after the spec abort.
>
> Even if we decide to go with the spec_abort approach, it might be better
> to still keep the toastreset call in ReorderBufferReturnTXN so that it
> can be freed in case of error.

Please take care of this as well.

--
With Regards,
Amit Kapila.
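For reference, the decode-side half of the "(DELETE + XLH_DELETE_IS_SUPER flag)" idea discussed above would look roughly like this in DecodeDelete(): instead of silently skipping super-deletions, queue them as a dedicated change. A sketch only; variable names and the ReorderBufferQueueChange() signature vary across branches:

    /* sketch: in DecodeDelete(), where xlrec is the xl_heap_delete record */
    change = ReorderBufferGetChange(ctx->reorder);

    if (xlrec->flags & XLH_DELETE_IS_SUPER)
        change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT;
    else
        change->action = REORDER_BUFFER_CHANGE_DELETE;

    /* queue it like any other change; the apply side handles the cleanup */
    ReorderBufferQueueChange(ctx->reorder, XLogRecGetXid(r),
                             buf->origptr, change);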
On Mon, 7 Jun 2021 at 8:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > I think the same relation case might not create a problem because it
> > > > won't find the entry for it in the toast_hash, so it will return from
> > > > there but the other two problems will be there.
> > >
> > > Right.
> > >
> > > > So, one idea could be
> > > > to just avoid those two cases (by simply adding return for those
> > > > cases) and still we can rely on toast clean up on the next
> > > > insert/update/delete. However, your approach looks more natural to me
> > > > as that will allow us to clean up memory in all cases instead of
> > > > waiting till the transaction end. So, I still vote for going with your
> > > > patch's idea of cleaning at spec_abort but I am fine if you and others
> > > > decide not to process spec_abort message. What do you think? Tomas, do
> > > > you have any opinion on this matter?
> > >
> > > I agree that processing the spec abort looks more natural, and ideally
> > > the current code expects it to be getting cleaned after the change;
> > > that's the reason we have those assertions and errors.
>
> Okay, so let's go with that approach. I have thought about whether it
> creates any problem in back-branches but couldn't think of any,
> primarily because we are not going to send anything additional to the
> plugin/subscriber. Do you see any problems with back branches if we go
> with this approach?

I will check this and let you know.

> > > OTOH, I agree
> > > that we can just return from those conditions because now we know that
> > > with the current code those situations are possible. My vote is for
> > > handling the spec abort option (Option 1) because that looks like the
> > > more natural way of handling these issues, and we also don't have to
> > > clean up the hash in "ReorderBufferReturnTXN" if there is no followup
> > > change after the spec abort.
> >
> > Even if we decide to go with the spec_abort approach, it might be better
> > to still keep the toastreset call in ReorderBufferReturnTXN so that it
> > can be freed in case of error.
>
> Please take care of this as well.
Ok
On Mon, Jun 7, 2021 at 8:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > Even if we decide to go with the spec_abort approach, it might be better
> > > to still keep the toastreset call in ReorderBufferReturnTXN so that it
> > > can be freed in case of error.
> >
> > Please take care of this as well.
>
> Ok
I have fixed all pending review comments and also added a test case which is working fine. I haven't yet checked on the back branches. Let's discuss if we think this patch looks fine; then I can apply and test it on the back branches.
--
Attachment
On Mon, Jun 7, 2021 at 6:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have fixed all pending review comments and also added a test case which is working fine.
>

Few observations and questions on the testcase:

1.
+step "s1_lock_s2" { SELECT pg_advisory_lock(2); }
+step "s1_lock_s3" { SELECT pg_advisory_lock(2); }
+step "s1_session" { SET spec.session = 1; }
+step "s1_begin" { BEGIN; }
+step "s1_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000))
ON CONFLICT DO NOTHING; }
+step "s1_abort" { ROLLBACK; }
+step "s1_unlock_s2" { SELECT pg_advisory_unlock(2); }
+step "s1_unlock_s3" { SELECT pg_advisory_unlock(2); }

Here, s1_lock_s3 and s1_unlock_s3 use 2 as the identifier. Don't you need to use 3 in that part of the test?

2. In the test, there seems to be an assumption that we can unlock s2 and s3 one after another, and then both will start waiting on s1, but isn't it possible that before s2 starts waiting on s1, s3 completes its insertion, and then s2 will never proceed with the speculative insertion?

> I haven't yet checked on the back branches. Let's discuss if we think this patch looks fine; then I can apply and test it on the back branches.
>

Sure, that makes sense.

--
With Regards,
Amit Kapila.
On Mon, Jun 7, 2021 at 6:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 7, 2021 at 6:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have fixed all pending review comments and also added a test case which is working fine.
> >
> Few observations and questions on the testcase:
>
> 1.
> +step "s1_lock_s2" { SELECT pg_advisory_lock(2); }
> +step "s1_lock_s3" { SELECT pg_advisory_lock(2); }
> +step "s1_session" { SET spec.session = 1; }
> +step "s1_begin" { BEGIN; }
> +step "s1_insert_tbl1" { INSERT INTO tbl1 VALUES(1, repeat('a', 4000))
> ON CONFLICT DO NOTHING; }
> +step "s1_abort" { ROLLBACK; }
> +step "s1_unlock_s2" { SELECT pg_advisory_unlock(2); }
> +step "s1_unlock_s3" { SELECT pg_advisory_unlock(2); }
>
> Here, s1_lock_s3 and s1_unlock_s3 use 2 as the identifier. Don't you need
> to use 3 in that part of the test?

Yeah, this should be 3.

> 2. In the test, there seems to be an assumption that we can unlock s2
> and s3 one after another, and then both will start waiting on s1, but
> isn't it possible that before s2 starts waiting on s1, s3 completes its
> insertion, and then s2 will never proceed with the speculative insertion?

I agree, such race conditions are possible. Currently, I am not able to think of what we can do here, but I will think more on this.
--
On Mon, Jun 7, 2021 at 6:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > 2. In the test, there seems to be an assumption that we can unlock s2
> > and s3 one after another, and then both will start waiting on s1, but
> > isn't it possible that before s2 starts waiting on s1, s3 completes its
> > insertion, and then s2 will never proceed with the speculative insertion?
>
> I agree, such race conditions are possible. Currently, I am not able to
> think of what we can do here, but I will think more on this.

Based on the off-list discussion, I have modified the test based on the idea shown in "isolation/specs/insert-conflict-specconflict.spec". The other open point we had was about the race condition: how to ensure that when we unlock any session it makes progress. IMHO, the isolation tester is designed in a way that either all the waiting sessions return with their output or block again on a heavyweight lock before performing the next step.

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Based on the off-list discussion, I have modified the test based on
> the idea shown in "isolation/specs/insert-conflict-specconflict.spec".
> The other open point we had was about the race condition: how to
> ensure that when we unlock any session it makes progress. IMHO, the
> isolation tester is designed in a way that either all the waiting
> sessions return with their output or block again on a heavyweight
> lock before performing the next step.
>

Few comments:

1. The test has a lot of similarities and test duplication with what we are doing in insert-conflict-specconflict.spec. Can we move it to insert-conflict-specconflict.spec? I understand that having it in test_decoding has the advantage that we can have all decoding tests in one place, but OTOH, we can avoid a lot of test-code duplication if we add it to insert-conflict-specconflict.spec.

2.
+#permutation "s1_session" "s1_lock_s2" "s1_lock_s3" "s1_begin" "s1_insert_tbl1" "s2_session" "s2_begin" "s2_insert_tbl1" "s3_session" "s3_begin" "s3_insert_tbl1" "s1_unlock_s2" "s1_unlock_s3" "s1_lock_s2" "s1_abort" "s3_commit" "s1_unlock_s2" "s2_insert_tbl2" "s2_commit" "s1_get_changes"

This permutation does not match what we are actually doing.

3.
+# Test that speculative locks are correctly acquired and released, s2
+# inserts, s1 updates.

This test description doesn't seem to be correct. Can we change it to something like: "Test logical decoding of speculative aborts for toast insertion followed by insertion into a different table which doesn't have a toast"?

Also, let's prepare and test the patches for the back-branches. It would be better if you can prepare separate patches for the code and test case for each branch; then I can merge them before commit. This helps with testing on the back-branches.

--
With Regards,
Amit Kapila.
On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Based on the off-list discussion, I have modified the test based on
> > the idea shown in "isolation/specs/insert-conflict-specconflict.spec".
> > The other open point we had was about the race condition: how to
> > ensure that when we unlock any session it makes progress. IMHO, the
> > isolation tester is designed in a way that either all the waiting
> > sessions return with their output or block again on a heavyweight
> > lock before performing the next step.
> >
> Few comments:
> 1. The test has a lot of similarities and test duplication with what
> we are doing in insert-conflict-specconflict.spec. Can we move it to
> insert-conflict-specconflict.spec? I understand that having it in
> test_decoding has the advantage that we can have all decoding tests in
> one place, but OTOH, we can avoid a lot of test-code duplication if we
> add it to insert-conflict-specconflict.spec.

It seems the isolation tests run on the default configuration. Would it be a good idea to change wal_level to logical for the whole isolation tester folder?
On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > Based on the off-list discussion, I have modified the test based on
> > > the idea shown in "isolation/specs/insert-conflict-specconflict.spec".
> > > The other open point we had was about the race condition: how to
> > > ensure that when we unlock any session it makes progress. IMHO, the
> > > isolation tester is designed in a way that either all the waiting
> > > sessions return with their output or block again on a heavyweight
> > > lock before performing the next step.
> > >
> >
> > Few comments:
> > 1. The test has a lot of similarities and test duplication with what
> > we are doing in insert-conflict-specconflict.spec. Can we move it to
> > insert-conflict-specconflict.spec? I understand that having it in
> > test_decoding has the advantage that we can have all decoding tests in
> > one place, but OTOH, we can avoid a lot of test-code duplication if we
> > add it to insert-conflict-specconflict.spec.
>
> It seems the isolation tests run on the default configuration. Would it
> be a good idea to change wal_level to logical for the whole isolation
> tester folder?

No, that doesn't sound like a good idea to me. Let's keep it in test_decoding then.

--
With Regards,
Amit Kapila.
On Wed, Jun 9, 2021 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > Few comments:
> > > 1. The test has a lot of similarities and test duplication with what
> > > we are doing in insert-conflict-specconflict.spec. Can we move it to
> > > insert-conflict-specconflict.spec? I understand that having it in
> > > test_decoding has the advantage that we can have all decoding tests in
> > > one place, but OTOH, we can avoid a lot of test-code duplication if we
> > > add it to insert-conflict-specconflict.spec.
> >
> > It seems the isolation tests run on the default configuration. Would it
> > be a good idea to change wal_level to logical for the whole isolation
> > tester folder?
>
> No, that doesn't sound like a good idea to me. Let's keep it in
> test_decoding then.

Okay, I will work on the remaining comments and the back-branch patches, and send them by tomorrow.
May I suggest using a different name in the blurt_and_lock_123() function, so that it doesn't conflict with the one in insert-conflict-specconflict? Thanks

--
Álvaro Herrera 39°49'30"S 73°17'W
On Wed, Jun 9, 2021 at 8:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> May I suggest using a different name in the blurt_and_lock_123()
> function, so that it doesn't conflict with the one in
> insert-conflict-specconflict? Thanks

Renamed to blurt_and_lock(), is that fine? I have fixed the other comments and also prepared patches for the back branches.

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
- v5-0001-Bug-fix-for-speculative-abort_HEAD.patch
- v5-0001-Bug-fix-for-speculative-abort-v10.patch
- v5-0001-Bug-fix-for-speculative-abort-v12_and_v13.patch
- v5-0001-Bug-fix-for-speculative-abort-v96.patch
- v5-0001-Bug-fix-for-speculative-abort-v11.patch
- v5-0002-Test-logical-decoding-of-speculative-aborts_HEAD.patch
- v5-0002-Test-logical-decoding-of-speculative-aborts-v10.patch
- v5-0002-Test-logical-decoding-of-speculative-aborts-v11.patch
- v5-0002-Test-logical-decoding-of-speculative-aborts-v12_and_v13.patch
- v5-0002-Test-logical-decoding-of-speculative-aborts-v96.patch
On Thu, Jun 10, 2021 at 2:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 8:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > May I suggest using a different name in the blurt_and_lock_123()
> > function, so that it doesn't conflict with the one in
> > insert-conflict-specconflict? Thanks
>
> Renamed to blurt_and_lock(), is that fine?

I think a non-conflicting name should be fine.

> I have fixed the other comments and also prepared patches for the back branches.

Okay, I have verified the fix on all branches, and the newly added test gave an error without the patch and passes with the code-change patch. Few minor things:

1. You forgot to make the change in ReorderBufferChangeSize for the v13 patch.

2. I have made a few changes in the HEAD patch: (a) There was an unnecessary cleanup of spec insert at one place. I have replaced that with an Assert. (b) I have added and edited a few comments, both in the code and in the test patch.

Please find the patch for HEAD attached. Can you please prepare the patch for back-branches by doing all the changes I have done in the patch for HEAD?

--
With Regards,
Amit Kapila.
Attachment
On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Please find the patch for HEAD attached. Can you please prepare the
> patch for back-branches by doing all the changes I have done in the
> patch for HEAD?

Done

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
- v7-0001-Fix-decoding-of-speculative-aborts.patch
- v7-0001-96-Fix-decoding-of-speculative-aborts.patch
- v7-0001-v11-Fix-decoding-of-speculative-aborts.patch
- v7-0001-v12-Fix-decoding-of-speculative-aborts.patch
- v7-0001-v10-Fix-decoding-of-speculative-aborts.patch
- v7-0001-v13-Fix-decoding-of-speculative-aborts.patch
On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Please find the patch for HEAD attached. Can you please prepare the
> > patch for back-branches by doing all the changes I have done in the
> > patch for HEAD?
>
> Done

Thanks, the patch looks good to me. I'll push these early next week (Tuesday) unless someone has any other comments or suggestions.

--
With Regards,
Amit Kapila.
On Fri, Jun 11, 2021 at 7:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Please find the patch for HEAD attached. Can you please prepare the
> > > patch for back-branches by doing all the changes I have done in the
> > > patch for HEAD?
> >
> > Done
>
> Thanks, the patch looks good to me. I'll push these early next week
> (Tuesday) unless someone has any other comments or suggestions.

I think the test in this patch is quite similar to the one Noah has pointed out in the nearby thread [1] as failing at some intervals. Can you please verify, and if we can expect similar failures here, then we might want to consider dropping the test from this patch for now? We can always come back to it once we find a good solution to make it pass consistently.

[1] - https://www.postgresql.org/message-id/20210613073407.GA768908%40rfd.leadboat.com

--
With Regards,
Amit Kapila.
On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think the test in this patch is quite similar to the one Noah has
> pointed out in the nearby thread [1] as failing at some intervals. Can
> you please verify, and if we can expect similar failures here, then we
> might want to consider dropping the test from this patch for now? We
> can always come back to it once we find a good solution to make it
> pass consistently.

test insert-conflict-do-nothing ... ok 646 ms
test insert-conflict-do-nothing-2 ... ok 1994 ms
test insert-conflict-do-update ... ok 1786 ms
test insert-conflict-do-update-2 ... ok 2689 ms
test insert-conflict-do-update-3 ... ok 851 ms
test insert-conflict-specconflict ... FAILED 3695 ms
test delete-abort-savept ... ok 1238 ms

Yeah, this is the same test that we have used as the base for our test, so let's not push this test until it becomes stable.

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 14, 2021 at 9:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think the test in this patch is quite similar to the one Noah has
> > pointed out in the nearby thread [1] as failing at some intervals. Can
> > you please verify, and if we can expect similar failures here, then we
> > might want to consider dropping the test from this patch for now? We
> > can always come back to it once we find a good solution to make it
> > pass consistently.
>
> test insert-conflict-do-nothing ... ok 646 ms
> test insert-conflict-do-nothing-2 ... ok 1994 ms
> test insert-conflict-do-update ... ok 1786 ms
> test insert-conflict-do-update-2 ... ok 2689 ms
> test insert-conflict-do-update-3 ... ok 851 ms
> test insert-conflict-specconflict ... FAILED 3695 ms
> test delete-abort-savept ... ok 1238 ms
>
> Yeah, this is the same test that we have used as the base for our test,
> so let's not push this test until it becomes stable.

Patches without the test case.

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
- v8-0001-96-Fix-decoding-of-speculative-aborts.patch
- v8-0001-v12-Fix-decoding-of-speculative-aborts.patch
- v8-0001-v10-Fix-decoding-of-speculative-aborts.patch
- v8-0001-Fix-decoding-of-speculative-aborts.patch
- v8-0001-v11-Fix-decoding-of-speculative-aborts.patch
- v8-0001-v13-Fix-decoding-of-speculative-aborts.patch
On Mon, Jun 14, 2021 at 12:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 9:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 8:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think the test in this patch is quite similar to the one Noah has
> > > pointed out in the nearby thread [1] as failing at some intervals. Can
> > > you please verify, and if we can expect similar failures here, then we
> > > might want to consider dropping the test from this patch for now? We
> > > can always come back to it once we find a good solution to make it
> > > pass consistently.
> >
> > test insert-conflict-do-nothing ... ok 646 ms
> > test insert-conflict-do-nothing-2 ... ok 1994 ms
> > test insert-conflict-do-update ... ok 1786 ms
> > test insert-conflict-do-update-2 ... ok 2689 ms
> > test insert-conflict-do-update-3 ... ok 851 ms
> > test insert-conflict-specconflict ... FAILED 3695 ms
> > test delete-abort-savept ... ok 1238 ms
> >
> > Yeah, this is the same test that we have used as the base for our test,
> > so let's not push this test until it becomes stable.
>
> Patches without the test case.

Pushed!

--
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
> Pushed!

skink reports that this has valgrind issues:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

2021-06-16 01:20:13.344 UTC [2198271][4/0:0] LOG: received replication command: IDENTIFY_SYSTEM
2021-06-16 01:20:13.384 UTC [2198271][4/0:0] LOG: received replication command: START_REPLICATION SLOT "sub2" LOGICAL 0/0 (proto_version '1', publication_names '"pub2"')
2021-06-16 01:20:13.454 UTC [2198271][4/0:0] LOG: starting logical decoding for slot "sub2"
2021-06-16 01:20:13.454 UTC [2198271][4/0:0] DETAIL: Streaming transactions committing after 0/157C828, reading WAL from 0/157C7F0.
2021-06-16 01:20:13.488 UTC [2198271][4/0:0] LOG: logical decoding found consistent point at 0/157C7F0
2021-06-16 01:20:13.488 UTC [2198271][4/0:0] DETAIL: There are no running transactions.
...
==2198271== VALGRINDERROR-BEGIN
==2198271== Conditional jump or move depends on uninitialised value(s)
==2198271==    at 0x80EF890: rel_sync_cache_relation_cb (pgoutput.c:833)
==2198271==    by 0x666AEB: LocalExecuteInvalidationMessage (inval.c:595)
==2198271==    by 0x53A423: ReceiveSharedInvalidMessages (sinval.c:90)
==2198271==    by 0x666026: AcceptInvalidationMessages (inval.c:683)
==2198271==    by 0x53FBDD: LockRelationOid (lmgr.c:136)
==2198271==    by 0x1D3943: relation_open (relation.c:56)
==2198271==    by 0x26F21F: table_open (table.c:43)
==2198271==    by 0x66D97F: ScanPgRelation (relcache.c:346)
==2198271==    by 0x674644: RelationBuildDesc (relcache.c:1059)
==2198271==    by 0x674BE8: RelationClearRelation (relcache.c:2568)
==2198271==    by 0x675064: RelationFlushRelation (relcache.c:2736)
==2198271==    by 0x6750A6: RelationCacheInvalidateEntry (relcache.c:2797)
==2198271== Uninitialised value was created by a heap allocation
==2198271==    at 0x6AC308: MemoryContextAlloc (mcxt.c:826)
==2198271==    by 0x68A8D9: DynaHashAlloc (dynahash.c:283)
==2198271==    by 0x68A94B: element_alloc (dynahash.c:1675)
==2198271==    by 0x68AA58: get_hash_entry (dynahash.c:1284)
==2198271==    by 0x68B23E: hash_search_with_hash_value (dynahash.c:1057)
==2198271==    by 0x68B3D4: hash_search (dynahash.c:913)
==2198271==    by 0x80EE855: get_rel_sync_entry (pgoutput.c:681)
==2198271==    by 0x80EEDA5: pgoutput_truncate (pgoutput.c:530)
==2198271==    by 0x4E48A2: truncate_cb_wrapper (logical.c:797)
==2198271==    by 0x4EFDDB: ReorderBufferCommit (reorderbuffer.c:1777)
==2198271==    by 0x4E1DBE: DecodeCommit (decode.c:637)
==2198271==    by 0x4E1F31: DecodeXactOp (decode.c:245)
==2198271==
==2198271== VALGRINDERROR-END

regards, tom lane
On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Pushed!
>
> skink reports that this has valgrind issues:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26
>

The problem happens at this line in rel_sync_cache_relation_cb():

    if (entry->map)

I think the reason is that before we initialize 'entry->map' in get_rel_sync_entry(), the invalidation is processed, as part of which, when we try to clean up the entry, we access the uninitialized value. Note, this won't happen in HEAD as we initialize 'entry->map' before we get to process any invalidation. We fixed a similar issue in HEAD some time back as part of commit 69bd60672a, so we need to make a similar change in PG-13 as well.

This problem was introduced by commit d250568121 (Fix memory leak due to RelationSyncEntry.map.), not by the patch in this thread, so keeping Amit L and Osumi-San in the loop.

--
With Regards,
Amit Kapila.
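To make the HEAD behaviour described here concrete, the pattern is to fully initialize the entry immediately after hash_search() returns it, before any catalog access below can deliver invalidations. A condensed sketch of get_rel_sync_entry() follows; the field list is abbreviated, so see pgoutput.c for the real thing:

    entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                              (void *) &relid,
                                              HASH_ENTER, &found);

    if (!found)
    {
        /*
         * Immediately make the new entry valid enough for the
         * invalidation callbacks: they can run during the catalog
         * lookups below and must not see garbage in these fields.
         */
        entry->replicate_valid = false;
        entry->schema_sent = false;
        entry->map = NULL;      /* the field the valgrind trace flagged */
    }

    /* only now perform the catalog access that can fire invalidations */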
On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > Pushed!
> >
> > skink reports that this has valgrind issues:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26
> >
>
> The problem happens at this line in rel_sync_cache_relation_cb():
>
>     if (entry->map)
>
> I think the reason is that before we initialize 'entry->map' in
> get_rel_sync_entry(), the invalidation is processed, as part of which,
> when we try to clean up the entry, we access the uninitialized value.
> Note, this won't happen in HEAD as we initialize 'entry->map' before
> we get to process any invalidation. We fixed a similar issue in HEAD
> some time back as part of commit 69bd60672a, so we need to make a
> similar change in PG-13 as well.
>
> This problem was introduced by commit d250568121 (Fix memory leak due
> to RelationSyncEntry.map.), not by the patch in this thread, so keeping
> Amit L and Osumi-San in the loop.

Thanks.

Maybe not sufficient as a fix, but I wonder if rel_sync_cache_relation_cb() should really also check that replicate_valid is true in the following condition:

    /*
     * Reset schema sent status as the relation definition may have changed.
     * Also free any objects that depended on the earlier definition.
     */
    if (entry != NULL)
    {

If the problem is with HEAD, I don't quite understand why the last statement of the following code block doesn't suffice:

    /* Not found means schema wasn't sent */
    if (!found)
    {
        /* immediately make a new entry valid enough to satisfy callbacks */
        entry->schema_sent = false;
        entry->streamed_txns = NIL;
        entry->replicate_valid = false;
        entry->pubactions.pubinsert = entry->pubactions.pubupdate =
            entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
        entry->publish_as_relid = InvalidOid;
        entry->map = NULL;      /* will be set by maybe_send_schema() if needed */
    }

Do we need the same statement at the end of the following block?

    /* Validate the entry */
    if (!entry->replicate_valid)
    {

--
Amit Langote
EDB: http://www.enterprisedb.com
On Thu, Jun 17, 2021 at 10:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Jun 16, 2021 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > > Pushed!
> > >
> > > skink reports that this has valgrind issues:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26
> > >
> >
> > The problem happens at this line in rel_sync_cache_relation_cb():
> >
> >     if (entry->map)
> >
> > I think the reason is that before we initialize 'entry->map' in
> > get_rel_sync_entry(), the invalidation is processed, as part of which,
> > when we try to clean up the entry, we access the uninitialized value.
> > Note, this won't happen in HEAD as we initialize 'entry->map' before
> > we get to process any invalidation. We fixed a similar issue in HEAD
> > some time back as part of commit 69bd60672a, so we need to make a
> > similar change in PG-13 as well.
> >
> > This problem was introduced by commit d250568121 (Fix memory leak due
> > to RelationSyncEntry.map.), not by the patch in this thread, so keeping
> > Amit L and Osumi-San in the loop.
>
> Thanks.
>
> Maybe not sufficient as a fix, but I wonder if
> rel_sync_cache_relation_cb() should really also check that
> replicate_valid is true in the following condition:

I don't think that is required because we initialize the entry in the "if (!found)" case in HEAD.

> /*
>  * Reset schema sent status as the relation definition may have changed.
>  * Also free any objects that depended on the earlier definition.
>  */
> if (entry != NULL)
> {
>
> If the problem is with HEAD,

The problem occurs only in PG-13. So, we need to make the PG-13 code similar to HEAD as far as the initialization of the entry is concerned.

--
With Regards,
Amit Kapila.
On Thu, Jun 17, 2021 at 3:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jun 17, 2021 at 10:39 AM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > On Thu, Jun 17, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > The problem happens at this line in rel_sync_cache_relation_cb():
> > >
> > >     if (entry->map)
> > >
> > > I think the reason is that before we initialize 'entry->map' in
> > > get_rel_sync_entry(), the invalidation is processed, as part of which,
> > > when we try to clean up the entry, we access the uninitialized value.
> > > Note, this won't happen in HEAD as we initialize 'entry->map' before
> > > we get to process any invalidation. We fixed a similar issue in HEAD
> > > some time back as part of commit 69bd60672a, so we need to make a
> > > similar change in PG-13 as well.
> > >
> > > This problem was introduced by commit d250568121 (Fix memory leak due
> > > to RelationSyncEntry.map.), not by the patch in this thread, so keeping
> > > Amit L and Osumi-San in the loop.
> >
> > Thanks.
> >
> > Maybe not sufficient as a fix, but I wonder if
> > rel_sync_cache_relation_cb() should really also check that
> > replicate_valid is true in the following condition:
>
> I don't think that is required because we initialize the entry in the
> "if (!found)" case in HEAD.

Yeah, I see that. If we can be sure that the callback can't get called between hash_search() allocating the entry and the above code block making the entry look valid, which appears to be the case, then I guess we don't need to worry.

> > /*
> >  * Reset schema sent status as the relation definition may have changed.
> >  * Also free any objects that depended on the earlier definition.
> >  */
> > if (entry != NULL)
> > {
> >
> > If the problem is with HEAD,
>
> The problem occurs only in PG-13. So, we need to make the PG-13 code
> similar to HEAD as far as the initialization of the entry is concerned.

Oh, I missed that the problem report is for the PG13 branch.

How about the attached patch then?

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Oh, I missed that the problem report is for the PG13 branch.
>
> How about the attached patch then?

Looks good. One minor comment: how about making the below comment the same as on HEAD?

-    if (!found || !entry->replicate_valid)
+    if (!found)
+    {
+        /*
+         * Make the new entry valid enough for the callbacks to look at, in
+         * case any of them get invoked during the more complicated
+         * initialization steps below.
+         */

On HEAD:

    if (!found)
    {
        /* immediately make a new entry valid enough to satisfy callbacks */

--
Regards, Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi Dilip,

On Thu, Jun 17, 2021 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Oh, I missed that the problem report is for the PG13 branch.
> >
> > How about the attached patch then?
>
> Looks good.

Thanks for checking.

> One minor comment: how about making the below comment the same as on HEAD?
>
> -    if (!found || !entry->replicate_valid)
> +    if (!found)
> +    {
> +        /*
> +         * Make the new entry valid enough for the callbacks to look at, in
> +         * case any of them get invoked during the more complicated
> +         * initialization steps below.
> +         */
>
> On HEAD:
>
>     if (!found)
>     {
>         /* immediately make a new entry valid enough to satisfy callbacks */

Agree it's better to have the same comment in both branches. Though, I think it should be "the new entry", not "a new entry". I find the sentence I wrote a bit more enlightening, but I am fine with just fixing the aforementioned problem with the existing comment.

I've updated the patch. Also attaching a patch for HEAD for the s/a/the change. While at it, I also capitalized "immediately".

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachment
On Thu, Jun 17, 2021 at 1:35 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi Dilip,
>
> On Thu, Jun 17, 2021 at 4:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Thu, Jun 17, 2021 at 12:52 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > Oh, I missed that the problem report is for the PG13 branch.
> > >
> > > How about the attached patch then?
> >
> > Looks good.
>
> Thanks for checking.
>
> > One minor comment: how about making the below comment the same as on HEAD?
> >
> > -    if (!found || !entry->replicate_valid)
> > +    if (!found)
> > +    {
> > +        /*
> > +         * Make the new entry valid enough for the callbacks to look at, in
> > +         * case any of them get invoked during the more complicated
> > +         * initialization steps below.
> > +         */
> >
> > On HEAD:
> >
> >     if (!found)
> >     {
> >         /* immediately make a new entry valid enough to satisfy callbacks */
>
> Agree it's better to have the same comment in both branches. Though, I
> think it should be "the new entry", not "a new entry". I find the
> sentence I wrote a bit more enlightening, but I am fine with just
> fixing the aforementioned problem with the existing comment.
>
> I've updated the patch. Also attaching a patch for HEAD for the s/a/the
> change. While at it, I also capitalized "immediately".

Your patch looks good to me as well. I would like to retain the comment as it is from master for now. I'll do some testing and push it tomorrow unless there are additional comments.

--
With Regards,
Amit Kapila.
On Thu, Jun 17, 2021 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Your patch looks good to me as well. I would like to retain the
> comment as it is from master for now. I'll do some testing and push it
> tomorrow unless there are additional comments.

Pushed!

--
With Regards,
Amit Kapila.
Hi,

On 6/18/21 5:50 AM, Amit Kapila wrote:
> On Thu, Jun 17, 2021 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Your patch looks good to me as well. I would like to retain the
>> comment as it is from master for now. I'll do some testing and push it
>> tomorrow unless there are additional comments.
>>
>
> Pushed!
>

While rebasing a patch broken by 4daa140a2f5, I noticed that the patch does this:

@@ -63,6 +63,7 @@ enum ReorderBufferChangeType
     REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
     REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
     REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
+    REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
     REORDER_BUFFER_CHANGE_TRUNCATE
 };

I understand adding the ABORT right after CONFIRM, but isn't that an undesirable ABI break for extensions? It changes the value of the TRUNCATE item, so if an extension refers to that somehow, it'd suddenly start failing (until it gets rebuilt). And the failures would be pretty confusing and seemingly contradict the code.

FWIW I don't know how likely it is for an extension to depend on the TRUNCATE value (it'd be far worse for INSERT/UPDATE/DELETE), but it seems moving the new element to the end would solve this.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
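A toy illustration (not PostgreSQL code) of the ABI hazard being described here: inserting an enumerator in the middle renumbers everything after it, so an extension compiled against the old header keeps using the stale numeric value.

    #include <stdio.h>

    /* enum layout the extension was compiled against */
    enum old_change { OLD_SPEC_CONFIRM = 2, OLD_TRUNCATE = 3 };

    /* new layout after inserting SPEC_ABORT before TRUNCATE */
    enum new_change { NEW_SPEC_CONFIRM = 2, NEW_SPEC_ABORT = 3, NEW_TRUNCATE = 4 };

    int
    main(void)
    {
        /* the server now emits 3 for a spec abort ... */
        int action = NEW_SPEC_ABORT;

        /* ... but a non-rebuilt extension still tests against 3 for TRUNCATE */
        if (action == OLD_TRUNCATE)
            printf("extension misreads SPEC_ABORT as TRUNCATE\n");
        return 0;
    }

Appending the new member at the end instead leaves every pre-existing value untouched, which is why that placement is the safe choice for released branches.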
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> While rebasing a patch broken by 4daa140a2f5, I noticed that the patch
> does this:

> @@ -63,6 +63,7 @@ enum ReorderBufferChangeType
>      REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
>      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
>      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
> +    REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
>      REORDER_BUFFER_CHANGE_TRUNCATE
> };

> Isn't that an undesirable ABI break for extensions?

I think it's OK in HEAD. I agree we shouldn't do it like that in the back branches.

regards, tom lane
On Wed, Jun 23, 2021 at 8:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> > While rebasing a patch broken by 4daa140a2f5, I noticed that the patch
> > does this:
>
> > @@ -63,6 +63,7 @@ enum ReorderBufferChangeType
> >      REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID,
> >      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT,
> >      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
> > +    REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
> >      REORDER_BUFFER_CHANGE_TRUNCATE
> > };
>
> > Isn't that an undesirable ABI break for extensions?
>
> I think it's OK in HEAD. I agree we shouldn't do it like that
> in the back branches.

Okay, I'll change this in back branches and HEAD to keep the code consistent, or do you think it is better to retain the order in HEAD as it is and just change it for back-branches?

--
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
>> I think it's OK in HEAD. I agree we shouldn't do it like that
>> in the back branches.

> Okay, I'll change this in back branches and HEAD to keep the code
> consistent, or do you think it is better to retain the order in HEAD
> as it is and just change it for back-branches?

As I said, I'd keep the natural ordering in HEAD.

regards, tom lane
On Thu, Jun 24, 2021 at 12:25:15AM -0400, Tom Lane wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Okay, I'll change this in back branches and HEAD to keep the code
>> consistent, or do you think it is better to retain the order in HEAD
>> as it is and just change it for back-branches?
>
> As I said, I'd keep the natural ordering in HEAD.

Yes, please keep the items in an alphabetical order on HEAD, and just have the new item at the bottom of the enum in the back-branches. That's the usual practice.

--
Michael
Attachment
On Thu, Jun 24, 2021 at 11:04 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 24, 2021 at 12:25:15AM -0400, Tom Lane wrote:
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> >> Okay, I'll change this in back branches and HEAD to keep the code
> >> consistent, or do you think it is better to retain the order in HEAD
> >> as it is and just change it for back-branches?
> >
> > As I said, I'd keep the natural ordering in HEAD.
>
> Yes, please keep the items in an alphabetical order on HEAD, and just
> have the new item at the bottom of the enum in the back-branches.
> That's the usual practice.

Okay, I have back-patched the change till v11 because before that REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT is already at the end.

--
With Regards,
Amit Kapila.