Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date
Msg-id CA+TgmobF0fbzxXOUy2ooN8Wh7ABfWfDWwFsgnJM3VuzBV=BpMg@mail.gmail.com
Whole thread Raw
In response to Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Andres Freund <andres@anarazel.de>)
Responses Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Jun 2, 2015 at 4:19 PM, Andres Freund <andres@anarazel.de> wrote:
> I'm not really convinced tying things closer to having done trimming is
> easier to understand than tying things to recovery having finished.
>
> E.g.
>         if (did_trim)
>                 oldestOffset = GetOldestReferencedOffset(oldest_datminmxid);
> imo is harder to understand than if (!InRecovery).
>
> Maybe we could just name it finishedStartup and rename the functions accordingly?

Basing that particular call site on InRecovery doesn't work, because
InRecovery isn't set early enough.  But I'm fine to rename it to
whatever.

> Maybe it's worthwhile to add a 'NB: At this stage the data directory is
> not yet necessarily consistent' StartupMultiXact's comments, to avoid
> reintroducing problems like this?

Sure.

>>       /*
>> +      * We can read this without a lock, because it only changes when nothing
>> +      * else is running.
>> +      */
>> +     did_trim = MultiXactState->didTrimMultiXact;
>
> Err, Hot Standby? It might be ok to not lock, but the comment is
> definitely wrong. I'm inclined to simply use locking, this doesn't look
> sufficiently critical performancewise.

/me nods.  Good point.

> Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
> the disk it'll always get one at a segment boundary, right? I'm not sure
> that's actually ok; because the value at the beginning of the segment
> can very well end up being a 0, as MaybeExtendOffsetSlru() will have
> filled the page with zeros.
>
> I think that should be harmless, the worst that can happen is that
> oldestOffset errorneously is 0, which should be correct, even though
> possibly overly conservative, in these cases.

Uh oh.  That seems like a real bad problem for this approach.  What
keeps that from being the opposite of too conservative?  There's no
"safe" value in a circular numbering space.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Next
From: Andres Freund
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1