Thread: MultiXact member wraparound protections are now enabled
Why is this message logged by default in a fresh installation? The technicality of that message doesn't seem to match the kinds of messages that we normally print at startup.
On Wed, Jul 22, 2015 at 4:11 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Why is this message logged by default in a fresh installation? The
> technicality of that message doesn't seem to match the kinds of messages
> that we normally print at startup.
It seems nobody likes that message.
I did it that way because I wanted to provide an easy way for users to know whether they had those protections enabled. If you don't display the message when things are already OK at startup, users have to make a negative inference, like this: let's see, I'm on a version that is new enough that it would have printed a message if the protections had not been enabled, so the absence of the message must mean things are OK.
But it seemed to me that this could be rather confusing. I thought it would be better to be explicit about whether the protections are enabled in all cases. That way, (1) if you see the message saying they are enabled, they are enabled; (2) if you see the message saying they are disabled, they are disabled; and (3) if you see neither message, your version does not have those protections.
You are not the first person to dislike this, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
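[Editor's illustration] The three-way inference described above can be sketched as a small log check. This is only a sketch under stated assumptions, not anything shipped with PostgreSQL: the function name `classify_protection_status` is hypothetical, the "enabled" text is taken from this thread's subject line, the "disabled" prefix is an assumption about how the counterpart message begins, and the sample log lines are made up for the demonstration.

```python
from typing import Iterable

# Text taken from the thread subject.
ENABLED_MSG = "MultiXact member wraparound protections are now enabled"
# Assumption: the counterpart message starts with this prefix.
DISABLED_MSG = "MultiXact member wraparound protections are disabled"


def classify_protection_status(log_lines: Iterable[str]) -> str:
    """Return 'enabled', 'disabled', or 'unknown' based on the last matching line.

    'unknown' corresponds to case (3): neither message was seen.
    """
    status = "unknown"
    for line in log_lines:
        if ENABLED_MSG in line:
            status = "enabled"
        elif DISABLED_MSG in line:
            status = "disabled"
    return status


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "LOG:  database system is ready to accept connections",
        "LOG:  MultiXact member wraparound protections are now enabled",
    ]
    print(classify_protection_status(sample))
```

The sketch keeps the status from the most recent matching line, so a log that first reports the protections as disabled and later as enabled is classified as enabled.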
On 7/22/15 4:45 PM, Robert Haas wrote:
> But it seemed to me that this could be rather confusing. I thought it
> would be better to be explicit about whether the protections are
> enabled in all cases. That way, (1) if you see the message saying
> they are enabled, they are enabled; (2) if you see the message saying
> they are disabled, they are disabled; and (3) if you see neither
> message, your version does not have those protections.
But this is not documented, AFAICT, so I don't think anyone is going to be able to follow that logic. I don't see anything in the release notes saying, look for this message to see how this applies to you, or whatever.
On Fri, Jul 24, 2015 at 9:14 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 7/22/15 4:45 PM, Robert Haas wrote:
>> But it seemed to me that this could be rather confusing. I thought it
>> would be better to be explicit about whether the protections are
>> enabled in all cases. That way, (1) if you see the message saying
>> they are enabled, they are enabled; (2) if you see the message saying
>> they are disabled, they are disabled; and (3) if you see neither
>> message, your version does not have those protections.
>
> But this is not documented, AFAICT, so I don't think anyone is going to
> be able to follow that logic. I don't see anything in the release notes
> saying, look for this message to see how this applies to you, or whatever.
Good point. I can't tell you what the right thing to do is, and I'm sure there is room for debate about that. I'm only telling you why I did what I did.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 22 July 2015 at 21:45, Robert Haas <robertmhaas@gmail.com> wrote:
> But it seemed to me that this could be rather confusing. I thought it
> would be better to be explicit about whether the protections are
> enabled in all cases. That way, (1) if you see the message saying
> they are enabled, they are enabled; (2) if you see the message saying
> they are disabled, they are disabled; and (3) if you see neither
> message, your version does not have those protections.
(3) would imply that we can't ever remove the message, in case people think they are unprotected.
If we display (1) and then we find a further bug, where does that leave us? Do we put a second "really, really fixed" message?
AIUI this refers to a bug fix, it's not like we've invented some anti-virus mode to actively prevent or even scan for further error. I'm not sure why we need a message to say a bug fix has been applied; that is what the release notes are for.
If something is disabled, we should say so, but otherwise silence means safety and success.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
> On 7/22/15 4:45 PM, Robert Haas wrote:
> > But it seemed to me that this could be rather confusing. I thought it
> > would be better to be explicit about whether the protections are
> > enabled in all cases. That way, (1) if you see the message saying
> > they are enabled, they are enabled; (2) if you see the message saying
> > they are disabled, they are disabled; and (3) if you see neither
> > message, your version does not have those protections.
>
> But this is not documented, AFAICT, so I don't think anyone is going to
> be able to follow that logic. I don't see anything in the release notes
> saying, look for this message to see how this applies to you, or whatever.
I supported inclusion of the message, because it has good potential to help experts studying historical logs to find the root cause of data corruption. The complex histories of clusters showing corruption from this series of bugs have brought great expense to the task of debugging new reports. Given a cluster having full mxact wraparound protections since last corruption-free backup (or since initdb), one can rule out some causes.
On 26 July 2015 at 20:15, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
> > On 7/22/15 4:45 PM, Robert Haas wrote:
> > > But it seemed to me that this could be rather confusing. I thought it
> > > would be better to be explicit about whether the protections are
> > > enabled in all cases. That way, (1) if you see the message saying
> > > they are enabled, they are enabled; (2) if you see the message saying
> > > they are disabled, they are disabled; and (3) if you see neither
> > > message, your version does not have those protections.
> >
> > But this is not documented, AFAICT, so I don't think anyone is going to
> > be able to follow that logic. I don't see anything in the release notes
> > saying, look for this message to see how this applies to you, or whatever.
>
> I supported inclusion of the message, because it has good potential to help
> experts studying historical logs to find the root cause of data corruption.
> The complex histories of clusters showing corruption from this series of bugs
> have brought great expense to the task of debugging new reports. Given a
> cluster having full mxact wraparound protections since last corruption-free
> backup (or since initdb), one can rule out some causes.
Would it be better to replace it with a less specific and more generally useful message?
For example, Server started with release X.y.z
from which we could infer various useful things.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 27, 2015 at 07:59:40AM +0100, Simon Riggs wrote:
> On 26 July 2015 at 20:15, Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
> > > On 7/22/15 4:45 PM, Robert Haas wrote:
> > > > But it seemed to me that this could be rather confusing. I thought it
> > > > would be better to be explicit about whether the protections are
> > > > enabled in all cases. That way, (1) if you see the message saying
> > > > they are enabled, they are enabled; (2) if you see the message saying
> > > > they are disabled, they are disabled; and (3) if you see neither
> > > > message, your version does not have those protections.
> > >
> > > But this is not documented, AFAICT, so I don't think anyone is going to
> > > be able to follow that logic. I don't see anything in the release notes
> > > saying, look for this message to see how this applies to you, or whatever.
> >
> > I supported inclusion of the message, because it has good potential to help
> > experts studying historical logs to find the root cause of data corruption.
> > The complex histories of clusters showing corruption from this series of bugs
> > have brought great expense to the task of debugging new reports. Given a
> > cluster having full mxact wraparound protections since last corruption-free
> > backup (or since initdb), one can rule out some causes.
>
> Would it be better to replace it with a less specific and more generally
> useful message?
>
> For example, Server started with release X.y.z
> from which we could infer various useful things.
That message does sound generally useful, but we couldn't infer $subject from it. While the $subject message appears at startup in simple cases, autovacuum prerequisite work can delay it indefinitely.
On Sat, Jul 25, 2015 at 4:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 22 July 2015 at 21:45, Robert Haas <robertmhaas@gmail.com> wrote:
>> But it seemed to me that this could be rather confusing. I thought it
>> would be better to be explicit about whether the protections are
>> enabled in all cases. That way, (1) if you see the message saying
>> they are enabled, they are enabled; (2) if you see the message saying
>> they are disabled, they are disabled; and (3) if you see neither
>> message, your version does not have those protections.
>
> (3) would imply that we can't ever remove the message, in case people think
> they are unprotected.
>
> If we display (1) and then we find a further bug, where does that leave us?
> Do we put a second "really, really fixed" message?
>
> AIUI this refers to a bug fix, it's not like we've invented some anti-virus
> mode to actively prevent or even scan for further error. I'm not sure why we
> need a message to say a bug fix has been applied; that is what the release
> notes are for.
>
> If something is disabled, we should say so, but otherwise silence means
> safety and success.
Well, I think that we can eventually downgrade or remove the message once (1) we've actually fixed all of the known multixact bugs and (2) a couple of years have gone by and most people are in the clear. But right now, we've still got significant bugs unfixed.
https://wiki.postgresql.org/wiki/MultiXact_Bugs
Therefore, in my opinion, anything that might make it harder to debug problems with the MultiXact system is premature at this point. The detective work that it took to figure out the chain of events that led to the problem fixed in 068cfadf9e2190bdd50a30d19efc7c9f0b825b5e was difficult; I wanted to make sure that future debugging would be easier, not harder. I still think that's the right decision, but I recognize that not everyone agrees.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 July 2015 at 14:20, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jul 25, 2015 at 4:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On 22 July 2015 at 21:45, Robert Haas <robertmhaas@gmail.com> wrote:
> >> But it seemed to me that this could be rather confusing. I thought it
> >> would be better to be explicit about whether the protections are
> >> enabled in all cases. That way, (1) if you see the message saying
> >> they are enabled, they are enabled; (2) if you see the message saying
> >> they are disabled, they are disabled; and (3) if you see neither
> >> message, your version does not have those protections.
> >
> > (3) would imply that we can't ever remove the message, in case people think
> > they are unprotected.
> >
> > If we display (1) and then we find a further bug, where does that leave us?
> > Do we put a second "really, really fixed" message?
> >
> > AIUI this refers to a bug fix, it's not like we've invented some anti-virus
> > mode to actively prevent or even scan for further error. I'm not sure why we
> > need a message to say a bug fix has been applied; that is what the release
> > notes are for.
> >
> > If something is disabled, we should say so, but otherwise silence means
> > safety and success.
>
> Well, I think that we can eventually downgrade or remove the message
> once (1) we've actually fixed all of the known multixact bugs and (2)
> a couple of years have gone by and most people are in the clear. But
> right now, we've still got significant bugs unfixed.
>
> https://wiki.postgresql.org/wiki/MultiXact_Bugs
>
> Therefore, in my opinion, anything that might make it harder to debug
> problems with the MultiXact system is premature at this point. The
> detective work that it took to figure out the chain of events that led
> to the problem fixed in 068cfadf9e2190bdd50a30d19efc7c9f0b825b5e was
> difficult; I wanted to make sure that future debugging would be
> easier, not harder. I still think that's the right decision, but I
> recognize that not everyone agrees.
I do now, thanks for explaining.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 7/28/15 9:20 AM, Robert Haas wrote:
> Well, I think that we can eventually downgrade or remove the message
> once (1) we've actually fixed all of the known multixact bugs and (2)
> a couple of years have gone by and most people are in the clear.
Fair enough. But we should document this better in the future.