Home > mailing lists

Re: Segmentation Fault in logical decoding get/peek API - Mailing list pgsql-bugs

From	Jeremy Finzel
Subject	Re: Segmentation Fault in logical decoding get/peek API
Date	March 4, 2019 14:28:36
Msg-id	CAMa1XUivSJ2idopJUhMi-CWundycnH4OdwWOiZ_D+12BvMLObg@mail.gmail.com Whole thread Raw
In response to	Re: Segmentation Fault in logical decoding get/peek API (Jeremy Finzel <finzelj@gmail.com>)
List	pgsql-bugs

Tree view

On Wed, Feb 27, 2019 at 8:34 AM Jeremy Finzel <finzelj@gmail.com> wrote:

On Thu, Feb 14, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Finzel <finzelj@gmail.com> writes:
> On Thu, Feb 14, 2019 at 4:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So my bet is that your problem was fixed by some other commit between
>> 10.3 and 10.6. Maybe the predecessor one, b767b3f2e; but hard to say
>> without more investigation than seems warranted, if the bug's gone.

> I am willing to put in more time debugging this because we want to know
> which clusters are actually susceptible to the bug. Any suggestions to
> proceed are welcome.

Well, as Peter said, "git bisect" and trying to reproduce the problem
at each step would be the way to prove it definitively. Seems mighty
tedious though. Possibly you could shave some time off the process
by assuming it must have been one of the commits that touched
reorderbuffer.c ... a quick check says there have been ten of those
in the v10 branch since 10.3.

regards, tom lane

I have confirmed which commit fixed the segfault:

cee1dd1eeda1e7b86b78f240d24bbfde21d75928 is the first fixed commit

commit cee1dd1eeda1e7b86b78f240d24bbfde21d75928

Author: Alvaro Herrera <alvherre@alvh.no-ip.org>

Date: Tue Mar 6 15:57:20 2018 -0300

Refrain from duplicating data in reorderbuffers

If a walsender exits leaving data in reorderbuffers, the next walsender

that tries to decode the same transaction would append its decoded data

in the same spill files without truncating it first, which effectively

duplicate the data. Avoid that by removing any leftover reorderbuffer

spill files when a walsender starts.

Backpatch to 9.4; this bug has been there from the very beginning of

logical decoding.

Author: Craig Ringer, revised by me

Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada

I'm very curious if anyone more familiar with the code base understands how the old code was vulnerable to a segfault, but not the new code. I don't think in my situation a walsender was exiting at all, so I do think the specific reason for this code change accidentally fixed a potential segfault unrelated to an exiting walsender leaving data in reorderbuffers.

I still can't explain precisely what reproduces it. As I have said before, some unique things about my setup are lots of INSERT ... ON CONFLICT UPDATE queries, as well as btree gist exclusion indexes, but I suspect this has to do with the former somehow, although I haven't been able to create a reliably reproducing crash.

I suggest at least adding something to the 10.4 (and corollary major version) release notes around this.

Here is the release note now:

In logical decoding, avoid possible double processing of WAL data when a walsender restarts (Craig Ringer)

Perhaps some addition like this:

In logical decoding, avoid possible double processing of WAL data when a walsender restarts (Craig Ringer)
This change also had the effect of fixing a potential segfault vulnerability in the logical decoding peek/get_changes API in ReorderBufferCommit

Thanks,

Jeremy

pgsql-bugs by date:

From: Tom Lane
Date: 04 March 2019, 14:27:43
Subject: Re: BUG #15666: Seemingly incorrect error reporting/logging for violation of non-null constraint

From: Amit Langote
Date: 04 March 2019, 15:12:25
Subject: Re: BUG #15662: Row update in foreign partition does not move rowbased on partition key

Re: Segmentation Fault in logical decoding get/peek API - Mailing list pgsql-bugs

Previous

Next