On 11/19/18 10:28 AM, Masahiko Sawada wrote:
> On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> It seems we have pretty annoying problem with logical decoding when
>> performing VACUUM FULL / CLUSTER on a table with toast-ed data.
>>
>> The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
>> records, the TOAST data is logged as regular INSERT records. XLOG/FPI is
>> ignored in logical decoding, and so reorderbuffer never gets those
>> records. But we do decode the TOAST data, and reorderbuffer stashes them
>> in toast_hash hash table. Which gets reset only when handling a row from
>> the main heap, and that never arrives. So we end up stashing all the
>> TOAST data in memory :-(
>>
>> So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
>> likely to break any logical replication connection. And it does not
>> matter if you replicate this particular table.
>>
>> Luckily enough, this can leverage some of the pieces introduced by
>> e9edc1ba which was meant to deal with rewrites of system tables, and in
>> raw_heap_insert it added this:
>>
>> /*
>> * The new relfilenode's relcache entrye doesn't have the necessary
>> * information to determine whether a relation should emit data for
>> * logical decoding. Force it to off if necessary.
>> */
>> if (!RelationIsLogicallyLogged(state->rs_old_rel))
>> options |= HEAP_INSERT_NO_LOGICAL;
>>
>> As raw_heap_insert is used only for heap rewrites, we can simply remove
>> the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
>> data logged from here.
>>
>
> This fix seems fine to me.
>
>> This does fix the issue, because we still decode the TOAST changes but
>> there are no data and so
>>
>> if (change->data.tp.newtuple != NULL)
>> {
>> dlist_delete(&change->node);
>> ReorderBufferToastAppendChunk(rb, txn, relation,
>> change);
>> }
>>
>> ends up not stashing the change in the hash table.
>
> With the below change you proposed can we remove the above condition
> because toast-insertion changes being processed by the reorder buffer
> always have a new tuple? If a toast-insertion record doesn't have a
> new tuple it has already ignored when decoding.
>
Good point. I think you're right the reorderbuffer part may be
simplified as you propose.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services