Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date
Msg-id CAFiTN-v0sQ0_xER+=BEo5ZPduLn17AZ8pQs6GN1bsy0xnd=Vvw@mail.gmail.com
In response to Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List pgsql-hackers
On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
>> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> >wrote:
>> >
>> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
>> >> >
>> >> >On further testing, I found that the patch seems to have problems with
>> >> >toast.  Consider below scenario:
>> >> >Session-1
>> >> >Create table large_text(t1 text);
>> >> >INSERT INTO large_text
>> >> >SELECT (SELECT string_agg('x', ',')
>> >> >FROM generate_series(1, 1000000)) FROM generate_series(1, 1000);
>> >> >
>> >> >Session-2
>> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
>> >> >'test_decoding');
>> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
>> >> >*--kaboom*
>> >> >
>> >> >The second statement in Session-2 leads to a crash.
>> >> >
>> >>
>> >> OK, thanks for the report - will investigate.
>> >>
>> >
>> >It was an assertion failure in ReorderBufferCleanupTXN at below line:
>> >+ /* Check we're not mixing changes from different transactions. */
>> >+ Assert(change->txn == txn);
>> >
>>
>> Can you still reproduce this issue with the patch I sent on 28/9? I have
>> been unable to trigger the failure, and it seems pretty similar to the
>> failure you reported (and I fixed) on 28/9.
>
>
> Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids.  I think in session-2 you need to create the
> replication slot before creating the table in session-1 to see this problem.
>
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
>         change->data.tuplecid.cmax = cmax;
>         change->data.tuplecid.combocid = combocid;
>         change->lsn = lsn;
> +       change->txn = txn;
>         change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
>         dlist_push_tail(&txn->tuplecids, &change->node);
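
For context, this is roughly where that assertion fires during cleanup (a simplified
sketch of the tuplecid loop in ReorderBufferCleanupTXN, not the exact patch code):
every change queued on a transaction is expected to point back at it, so any path
that builds a change without setting change->txn trips the check, which is why
ReorderBufferAddNewTupleCids needs the same treatment.

    dlist_foreach_modify(iter, &txn->tuplecids)
    {
        ReorderBufferChange *change;

        change = dlist_container(ReorderBufferChange, node, iter.cur);

        /* Check we're not mixing changes from different transactions. */
        Assert(change->txn == txn);

        ReorderBufferReturnChange(rb, change);
    }
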
>
> Few more comments:
> -----------------------------------
> 1.
> +static bool
> +check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
> +{
> + /*
> + * -1 indicates fallback.
> + *
> + * If we haven't yet changed the boot_val default of -1, just let it be.
> + * logical decoding will look to maintenance_work_mem instead.
> + */
> + if (*newval == -1)
> + return true;
> +
> + /*
> + * We clamp manually-set values to at least 64kB. The maintenance_work_mem
> + * uses a higher minimum value (1MB), so this is OK.
> + */
> + if (*newval < 64)
> + *newval = 64;
>
> I think this needs to be changed as now we don't rely on maintenance_work_mem.  Another thing related to this is that
> I think the default value for logical_decoding_work_mem still seems to be -1.  We need to make it 64MB.  I have seen
> this while debugging memory accounting changes.  I think this is the reason why I was not seeing toast related changes
> being serialized because, in that test, I haven't changed the default value of logical_decoding_work_mem.
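
As a reference for the above, a rough sketch of how the guc.c entry could look once
the maintenance_work_mem fallback is gone, with a 64MB default and the existing 64kB
floor (the description string and exact placement below are my assumptions, not
taken from the patch):

    {
        {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM,
            gettext_noop("Sets the maximum memory to be used for logical decoding."),
            NULL,
            GUC_UNIT_KB
        },
        &logical_decoding_work_mem,
        65536, 64, MAX_KILOBYTES,   /* boot_val 64MB, minimum 64kB */
        NULL, NULL, NULL            /* no check hook needed without the -1 fallback */
    },
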
>
> 2.
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
>
>
> /going modify/going to modify/
>
> 3.
> + *
> + * While updating the existing change with detoasted tuple data, we need to
> + * update the memory accounting info, because the change size will differ.
> + * Otherwise the accounting may get out of sync, triggering serialization
> + * at unexpected times.
> + *
> + * We simply subtract size of the change before rejiggering the tuple, and
> + * then adding the new size. This makes it look like the change was removed
> + * and then added back, except it only tweaks the accounting info.
> + *
> + * In particular it can't trigger serialization, which would be pointless
> + * anyway as it happens during commit processing right before handing
> + * the change to the output plugin.
>   */
>  static void
>  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
>   if (txn->toast_hash == NULL)
>   return;
>
> + /*
> + * We're going modify the size of the change, so to make sure the
> + * accounting is correct we'll make it look like we're removing the
> + * change now (with the old size), and then re-add it at the end.
> + */
> + ReorderBufferChangeMemoryUpdate(rb, change, false);
>
> It is not very clear why this change is required.  Basically, this is done at commit time after which actually we
> shouldn't attempt to spill these changes.  This is mentioned in comments as well, but it is not clear if that is the
> case, then how and when accounting can create a problem.  If possible, can you explain it with an example?
>
IIUC, we are keeping track of the memory in ReorderBuffer, which is
common across the transactions.  So even if this transaction is
committing and will not spill to disk, we still need to keep the memory
accounting correct for future changes in other transactions.
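
To illustrate, a simplified sketch of the accounting helper as I understand it
(not the exact patch code; ReorderBufferChangeSize is assumed to return the
change's memory footprint including tuple data):

    static void
    ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
                                    ReorderBufferChange *change,
                                    bool addition)
    {
        ReorderBufferTXN *txn = change->txn;
        Size        sz = ReorderBufferChangeSize(change);  /* assumed helper */

        if (addition)
        {
            txn->size += sz;
            rb->size += sz;     /* rb->size is shared by all in-progress transactions */
        }
        else
        {
            Assert(txn->size >= sz && rb->size >= sz);
            txn->size -= sz;
            rb->size -= sz;
        }
    }

Since rb->size is, as I understand the patch, the total that is checked against
logical_decoding_work_mem, letting it drift while replacing toast chunks in a
committing transaction would make serialization decisions for the other
in-progress transactions wrong, even though this particular transaction will
never be spilled.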

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


