Hi,
On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and the ReorderBufferToastReplace does this:
>
> newtup = change->data.tp.newtuple;
>
> heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
>
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical pattern.
It indicates that change->data.tp.newtuple is NULL,
afaict. &newtup->tuple boils down to
((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)
and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.
> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.
Please print both change and *change.
I suspect what's happening is that a change that shouldn't have toast
changes - e.g. a DELETE - somehow has them anyway. That then triggers a
failure in ReorderBufferToastReplace(), which expects newtuple to be
valid.
It's probably worthwhile to add an elog(ERROR) check for this, even if
this does not turn out to be the case.
> > This behaviour does not depend on the data defined in the tables, because
> > we see it in different databases with different sets of tables in publications.
>
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly an access pattern that triggers this
> bug in your environment and not elsewhere.
Yea. Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?
> > Looks like a real issue in logical replication.
> > I will be happy to provide additional information about that issue, but I
> > need to know what else to collect to help solve this
> > problem.
> >
>
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.
>
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
>
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't just grep for it.
Yea, that'd be helpful.
Greetings,
Andres Freund