Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering
Date
Msg-id 10261.1588705157@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16329: Valgrind detects an invalid read when building a gistindex with buffering  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: BUG #16329: Valgrind detects an invalid read when building a gistindex with buffering  (Alexander Lakhin <exclusion@gmail.com>)
Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering  (Pavel Borisov <pashkin.elfe@gmail.com>)
List pgsql-bugs
Alexander Lakhin <exclusion@gmail.com> writes:
>> Please look at the patch that modifies the gist regression test to make
>> the issue visible and fixes it by avoiding access to the memory context
>> that can be reset in gistProcessEmptyingQueue().

> Could you please let me know if the fix is incorrect (or not elaborated
> enough to be included in the upcoming releases) or this issue is false
> positive?

I took a look at this.  I see the hazard, but I'm not sure that
I like your proposed quick-fix.  Essentially, the problem is that
gistBuildCallback is supposing that the tuple it creates will survive
the execution of its subroutines, which in fact it won't always.
But that means that at some point the tuple pointer it's passed down to
those subroutines becomes a dangling pointer.  What guarantee would we
have that the subroutines don't access the tuple after that point?

I think the real issue here is confusion about which level of function
"owns" the tempCxt and gets to decide when to reset it.  We can't have
both gistBuildCallback and gistProcessEmptyingQueue doing that.  Maybe
we need more than one temporary context, or maybe we could just skip
the context reset in gistProcessEmptyingQueue and let the storage
accumulate till we get back to gistBuildCallback.  But in any case
it's going to require more analysis.

Or, if we do just hack it as you suggest, there had better be a couple
of fat comments in gistBufferingBuildInsert warning about the hazards.

I was somewhat dismayed to realize from looking at the code coverage
report that we have exactly zero test coverage of the gist buffering
build code paths today.  That's just awful; how did the feature get
committed with no test coverage?  Your proposed test additions raise the
coverage in gistbuild.c and gistbuildbuffers.c to something respectable,
at least.  But it looks like there are still some potentially-delicate
paths that aren't tested, notably the "have to stop buffering" business.
Can we do better at reasonable cost, or are those paths just never
reached without huge data volume?  (Could we make them more reachable
by turning down maintenance_work_mem or some other setting?)

            regards, tom lane



pgsql-bugs by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: BUG #16416: unable to start the server with pg_CTL
Next
From: Andreas Seltenreich
Date:
Subject: Re: BUG #16393: PANIC: cannot abort transaction, it was already committed