Thread: [HACKERS] 10RC1 crash testing MultiXact oddity

[HACKERS] 10RC1 crash testing MultiXact oddity

From
Jeff Janes
Date:
I am running crash recovery tests against 10rc1 by injecting torn page writes, using a test case that generates a lot of multixacts: some naturally, by doing a lot of FK updates, but most artificially, by calling the pg_burn_multixact function from one of the attached patches.

In 22 hours of running, I got 12 instances where messages like this appeared:

MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 681012168 does not exist on disk

This is not a fatal error, and no inconsistent data is found at the end of the run.  But the code comments suggest that this should only happen on a server that has been upgraded from 9.3 or 9.4, which this server has not been.
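
For anyone checking the data directory by hand, the MultiXact ID in the message above can be mapped to the offsets SLRU segment file the server would look for. The sketch below assumes a stock build (8 kB blocks, 4-byte offset entries, 32 pages per SLRU segment, four-hex-digit file names under pg_multixact/offsets); none of these values come from this thread, and locate_multi and segment_file_name are hypothetical helper names, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stock-build defaults; adjust if the server was built differently. */
#define ASSUMED_BLCKSZ            8192u
#define ASSUMED_OFFSETS_PER_PAGE  (ASSUMED_BLCKSZ / 4u)   /* 4-byte offset entries */
#define ASSUMED_PAGES_PER_SEGMENT 32u

/* Hypothetical helper type: where a MultiXact ID lands in the offsets SLRU. */
typedef struct MultiLocation
{
    uint32_t page;      /* page number within the offsets SLRU */
    uint32_t entry;     /* entry index within that page */
    uint32_t segment;   /* segment file number containing the page */
} MultiLocation;

/* Map a MultiXact ID to its page, entry, and segment under the assumptions above. */
static MultiLocation
locate_multi(uint32_t multi)
{
    MultiLocation loc;

    loc.page = multi / ASSUMED_OFFSETS_PER_PAGE;
    loc.entry = multi % ASSUMED_OFFSETS_PER_PAGE;
    loc.segment = loc.page / ASSUMED_PAGES_PER_SEGMENT;
    return loc;
}

/* Format the segment number as a four-hex-digit file name (assumed convention). */
static void
segment_file_name(uint32_t segment, char *buf, size_t len)
{
    snprintf(buf, len, "%04X", (unsigned) segment);
}
```

Under those assumptions, locate_multi(681012168) gives page 332525, entry 968, and segment 10391, so the file to look for would be pg_multixact/offsets/2897.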

Is the presence of this log message something that needs to be investigated further?

Thanks,

Jeff


Attachment

Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Alvaro Herrera
Date:
Jeff Janes wrote:
> I am running some crash recovery testing against 10rc1 by injecting torn
> page writes, using a test case which generates a lot of multixact, some
> naturally by doing a lot fk updates, but most artificially by calling
> the pg_burn_multixact function from one of the attached patches.

Is this new in pg10, or do you also see it in 9.6?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Robert Haas
Date:
On Fri, Sep 22, 2017 at 11:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Is the presence of this log message something that needs to be investigated
> further?

I'd say yes.  It sounds like we have a race condition someplace that
previous fixes in this area failed to adequately understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Jeff Janes
Date:
On Fri, Sep 22, 2017 at 8:47 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Jeff Janes wrote:
> > I am running some crash recovery testing against 10rc1 by injecting torn
> > page writes, using a test case which generates a lot of multixact, some
> > naturally by doing a lot fk updates, but most artificially by calling
> > the pg_burn_multixact function from one of the attached patches.
>
> Is this new in pg10, or do you also see it in 9.6?

It turns out it is not new in pg10.  I spotted it in the log file only by accident while looking for something else.  Now that I am looking for it, I do see it in 9.6 as well.

Cheers,

Jeff

Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Robert Haas
Date:
On Fri, Sep 22, 2017 at 3:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> It turns out it is not new in pg10.  I spotted in the log file only by
> accident while looking for something else.  Now that I am looking for it, I
> do see it in 9.6 as well.

So I guess the next question is whether it also shows up if you initdb
with 9.4.latest and then run the same test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Jeff Janes
Date:
On Fri, Sep 22, 2017 at 1:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > It turns out it is not new in pg10.  I spotted in the log file only by
> > accident while looking for something else.  Now that I am looking for it, I
> > do see it in 9.6 as well.
>
> So I guess the next question is whether it also shows up if you initdb
> with 9.4.latest and then run the same test.

git bisect shows that it first appears in 9.5, at this commit:

commit bd7c348d83a4576163b635010e49dbcac7126f01
Author: Andres Freund <andres@anarazel.de>
Date:   Sat Sep 26 19:04:25 2015 +0200

    Rework the way multixact truncations work.

The patches which enable the crashes and the rapid consumption of xid and multixact both need a little adjustment from the 10rc1 versions, so I'm attaching a combined patch that applies to bd7c348d83.

Not really sure what the next step is here.  I could promote the ereport(LOG...) to a PANIC to get a core dump, but I don't think that would help, because presumably the problem occurred earlier, when the truncation was done, not when it was detected.

Cheers,

Jeff


Attachment

Re: [HACKERS] 10RC1 crash testing MultiXact oddity

From
Alvaro Herrera
Date:
Jeff Janes wrote:
> On Fri, Sep 22, 2017 at 1:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > On Fri, Sep 22, 2017 at 3:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > > It turns out it is not new in pg10.  I spotted in the log file only by
> > > accident while looking for something else.  Now that I am looking for
> > > it, I do see it in 9.6 as well.
> >
> > So I guess the next question is whether it also shows up if you initdb
> > with 9.4.latest and then run the same test.
> >
> 
> git bisect shows that it shows up in 9.5, at this commit:
> 
> commit bd7c348d83a4576163b635010e49dbcac7126f01
> Author: Andres Freund <andres@anarazel.de>
> Date:   Sat Sep 26 19:04:25 2015 +0200
> 
>     Rework the way multixact truncations work.

Oh man.  And I thought we were done with that stuff :-(

> Not really sure what the next step is here.  I could promote the
> ereport(LOG...) to a PANIC to get a core dump, but I don't think that would
> help because presumably the problem occurred early, when the truncation was
> done, not when it was detected.

Probably the best way to track it down is to add some instrumentation
elog(LOG) to the multixact truncation mechanism.
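
As a rough illustration of what such instrumentation might record, the sketch below formats one trace line per truncation: the old and new oldest MultiXact IDs, the offsets segments they fall in, and the member offset range. It is standalone C, not a patch: format_truncation_trace and offsets_segment are hypothetical names, the 2048 entries per page and 32 pages per segment are assumed stock-build values, and in the server the resulting text would be handed to elog(LOG, ...) at whatever point in the truncation code turns out to be relevant.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stock-build defaults (8 kB pages, 4-byte offset entries). */
#define ASSUMED_OFFSETS_PER_PAGE  2048u
#define ASSUMED_PAGES_PER_SEGMENT 32u

/* Offsets SLRU segment number that holds the given MultiXact ID. */
static uint32_t
offsets_segment(uint32_t multi)
{
    return multi / ASSUMED_OFFSETS_PER_PAGE / ASSUMED_PAGES_PER_SEGMENT;
}

/*
 * Hypothetical helper: build the text of one trace line describing a
 * truncation from old_multi up to (but excluding) new_multi.
 * Returns the snprintf result, i.e. the length the full line needs.
 */
static int
format_truncation_trace(char *buf, size_t len,
                        uint32_t old_multi, uint32_t new_multi,
                        uint32_t old_member_off, uint32_t new_member_off)
{
    return snprintf(buf, len,
                    "multixact truncation: offsets [%u, %u), "
                    "offsets segments [%X, %X), members [%u, %u)",
                    (unsigned) old_multi, (unsigned) new_multi,
                    (unsigned) offsets_segment(old_multi),
                    (unsigned) offsets_segment(new_multi),
                    (unsigned) old_member_off, (unsigned) new_member_off);
}
```

Logging both endpoints at truncation time, rather than only at detection time, would let a later "does not exist on disk" message be matched against the truncation that removed the segment.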

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

