On Tue, May 31, 2022 at 01:30:11PM +0900, Michael Paquier wrote:
> On Sat, May 28, 2022 at 09:34:03PM -0700, Noah Misch wrote:
>> On Sat, May 28, 2022 at 08:30:33PM -0700, Nathan Bossart wrote:
>>> I started looking at the problem and hacked together a
>>> proof-of-concept based on your first candidate that seems to fix the
>>> reported issue. However, from the upthread discussion, it is not clear
>>> whether there is agreement on the approach. IIUC there are still many
>>> other code paths that would require a similar treatment, so perhaps
>>> identifying all of those would be a good next step.
>>
>> Agreed. To identify them, perhaps put an ereport(..., errbacktrace()) in
>> aclmask(), then write some index-creating DDL that refers to the
>> largest-possible number of objects.
>
> While we've reached an agreement on the thread related to the
> corruption caused by the incorrect snapshots for concurrent index
> builds, this thread seems to be stalling a bit and we should try to
> move on before getting a release out. Is somebody looking at what we
> have here?

I've spent some time looking at all the code impacted by a117ceb and
0abc1a0, and I've yet to identify any additional problems besides the one
with ResolveOpClass(). However, I'm far from confident in this analysis,
and I still need to try out the ereport() approach that Noah suggested. I
would welcome any assistance identifying other problem areas.

For now, I've attached a slightly polished patch to fix the reported issue.
It's still rather hacky, and it does nothing to reduce the complexity of
the current approach of weaving between user IDs, but perhaps it can serve
as a stopgap solution.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com