Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > I think the problem is that CreateMultiXactId calls
> > GetNewMultiXactId and then RecordNewMultiXact, and the lock is released
> > between the calls. So one backend could try to read the offset before
> > another one had the time to finish writing it.
>
> Ugh, yes, that is clearly a hole :-( even if it turns out not to explain
> Matteo's observation.
>
> I don't see any easy way to fix this except by introducing a lot more
> locking than is there now --- ie, holding the MultiXactGenLock until the
> new mxact's starting offset has been written to disk. Any better ideas?

Well, it isn't a very good solution, because it requires us to retain the
MultiXactGenLock past an XLogInsert and some I/O on SLRU pages.
Previously the lock was mostly used only in short operations and was very
rarely held during I/O. But I don't see any other solution either.

Patch attached.
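
To make the locking change concrete, here is a toy model in C. It is not
the attached patch and not PostgreSQL code. It assumes a pthread mutex
standing in for MultiXactGenLock and an in-memory array standing in for
the offsets SLRU; ToyMultiXactState, toy_init, toy_create_mxact and
toy_member_count are made-up names. The only point is that the id is
assigned and its starting offset recorded under a single lock hold, so a
reader that takes the lock cannot see an assigned id whose offset is
still missing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TOY_MAX_MXACTS 64

/* Hypothetical stand-in for the shared multixact state. */
typedef struct ToyMultiXactState
{
    pthread_mutex_t gen_lock;               /* plays the role of MultiXactGenLock */
    uint32_t        next_mxact;             /* next id to hand out */
    uint32_t        next_offset;            /* next free member offset */
    uint32_t        offsets[TOY_MAX_MXACTS]; /* starting offset per id */
} ToyMultiXactState;

void
toy_init(ToyMultiXactState *s)
{
    pthread_mutex_init(&s->gen_lock, NULL);
    s->next_mxact = 1;
    s->next_offset = 1;
    for (int i = 0; i < TOY_MAX_MXACTS; i++)
        s->offsets[i] = 0;
}

/*
 * Assign a new id AND record its starting offset before releasing the
 * lock.  In the real case the store below would be a WAL insert plus an
 * SLRU page write, which is exactly the part that is expensive to do
 * with the lock held.
 */
uint32_t
toy_create_mxact(ToyMultiXactState *s, uint32_t nmembers)
{
    assert(nmembers > 0);

    pthread_mutex_lock(&s->gen_lock);
    assert(s->next_mxact < TOY_MAX_MXACTS);
    uint32_t id = s->next_mxact++;
    uint32_t offset = s->next_offset;
    s->next_offset += nmembers;
    s->offsets[id] = offset;        /* recorded before anyone else can look */
    pthread_mutex_unlock(&s->gen_lock);

    return id;
}

/*
 * Reader: the member count is the distance from this id's offset to the
 * next id's offset (or to next_offset if this is the newest id).
 */
uint32_t
toy_member_count(ToyMultiXactState *s, uint32_t id)
{
    pthread_mutex_lock(&s->gen_lock);
    assert(id >= 1 && id < s->next_mxact);
    uint32_t start = s->offsets[id];
    uint32_t end = (id + 1 < s->next_mxact) ? s->offsets[id + 1]
                                            : s->next_offset;
    pthread_mutex_unlock(&s->gen_lock);

    return end - start;
}
```

The cost shows up in the sketch too: whatever stands where the array
store is now runs with the lock held, and other backends wait for it.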

I confess to being attracted to Martijn's idea of looping until the correct
answer is obtained. I don't think it's even too difficult to implement.
But I wonder if there's some hidden pitfall.
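
The looping idea can be sketched in the same spirit. Again this is a toy
in C with made-up names (ToyOffsets, toy_offsets_init, toy_record_offset,
toy_try_read_offset, toy_read_offset_retry), and it leans on an
assumption: offset 0 is reserved to mean "not written yet" and is never
handed out as a real offset. The reader retries while it sees 0; the
max_tries bound is only there so the sketch cannot spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS     64
#define TOY_UNWRITTEN 0u    /* assumption: never a valid starting offset */

/* Hypothetical stand-in for the offsets area. */
typedef struct ToyOffsets
{
    _Atomic uint32_t slot[TOY_SLOTS];
} ToyOffsets;

void
toy_offsets_init(ToyOffsets *t)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        atomic_init(&t->slot[i], TOY_UNWRITTEN);
}

/* Writer, second step: publish the starting offset of an assigned id. */
void
toy_record_offset(ToyOffsets *t, uint32_t id, uint32_t offset)
{
    assert(offset != TOY_UNWRITTEN);
    atomic_store(&t->slot[id % TOY_SLOTS], offset);
}

/* Reader, single attempt: false means the writer has not finished yet. */
bool
toy_try_read_offset(ToyOffsets *t, uint32_t id, uint32_t *offset)
{
    uint32_t v = atomic_load(&t->slot[id % TOY_SLOTS]);

    if (v == TOY_UNWRITTEN)
        return false;
    *offset = v;
    return true;
}

/* Reader: loop until a real offset shows up, or give up after max_tries. */
bool
toy_read_offset_retry(ToyOffsets *t, uint32_t id, int max_tries,
                      uint32_t *offset)
{
    for (int i = 0; i < max_tries; i++)
    {
        if (toy_try_read_offset(t, id, offset))
            return true;
        /* a real backend would presumably sleep briefly before retrying */
    }
    return false;
}
```

If 0 could ever be a legitimate offset (after wraparound, say), the
reader could not tell "unwritten" from "written", which might be one of
the pitfalls to look for.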
Thanks to Matteo for finding the bug!
--
Alvaro Herrera http://www.PlanetPostgreSQL.org
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)