Re: found xmin from before relfrozenxid on pg_catalog.pg_authid - Mailing list pgsql-hackers

From Andres Freund
Subject Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Date
Msg-id 20180612003305.74vnsl4scn5sdhhf@alap3.anarazel.de
Whole thread Raw
In response to Re: found xmin from before relfrozenxid on pg_catalog.pg_authid  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2018-05-29 19:14:51 -0400, Alvaro Herrera wrote:
> I added an Assert(DatabasePath != NULL) to
> RelationCacheInitFilePreInvalidate() and didn't see a single crash when
> running the tests.  I thought that adding a "VACUUM FREEZE pg_class"
> in the recovery tests (where there is a standby) ought to do it, but it
> doesn't.  What's the deal there?
> 
> Here are some proposed changes.  Some of these comment edits are WIP :-)

I'm a bit confused by these changes - there seems to be some that look
like a borked diff?  And a number of others look pretty unrelated?

> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

> diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
> index 0d6100fb08..8a8620943f 100644
> --- a/src/backend/utils/cache/inval.c
> +++ b/src/backend/utils/cache/inval.c
> @@ -872,6 +872,8 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
>                                       int nmsgs, bool RelcacheInitFileInval,
>                                       Oid dbid, Oid tsid)
>  {
> +    Assert(InRecovery);
> +

Idk, seems unrelated.

>      if (nmsgs <= 0)
>          return;
>  
> @@ -884,12 +886,13 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
>               dbid);
>  
>          /*
> -         * RelationCacheInitFilePreInvalidate, when the invalidation message
> -         * is for a specific database, requires DatabasePath to be set, but we
> -         * should not use SetDatabasePath during recovery, since it is
> -         * intended to be used only once by normal backends.  Hence, a quick
> -         * hack: set DatabasePath directly then unset after use.
> +         * When the invalidation message is for a specific database,
> +         * RelationCacheInitFilePreInvalidate requires DatabasePath to be set,
> +         * but we're not allowed to use SetDatabasePath during recovery (since
> +         * it is intended to be used by normal backends).  Hence, a quick hack:
> +         * set DatabasePath directly then unset after use.
>           */
> +        Assert(!DatabasePath);    /* don't clobber an existing value */
>          if (OidIsValid(dbid))
>              DatabasePath = GetDatabasePath(dbid, tsid);

Same.


> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index aa4427724d..71b2212cbd 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -1934,6 +1934,11 @@ RelationIdGetRelation(Oid relationId)
>                  RelationClearRelation(rd, true);
>  
>              /*
> +             * Normal entries are valid by now -- except nailed ones when
> +             * loaded before relcache initialization.  There isn't enough
> +             * infrastructure yet to do pg_class lookups, but we need their
> +             * rd_rel entries to be updated, so we let these through.
> +             */
>               * Normally entries need to be valid here, but before the relcache
>               * has been initialized, not enough infrastructure exists to
>               * perform pg_class lookups. The structure of such entries doesn't
> @@ -2346,8 +2351,7 @@ RelationClearRelation(Relation relation, bool rebuild)
>      RelationCloseSmgr(relation);

This sure looks like it's a syntax error? So I guess you might not have
staged the removal ports of the diff?


>      /*
> -     * Treat nailed-in system relations separately, they always need to be
> -     * accessible, so we can't blow them away.
> +     * We cannot blow away nailed-in relations, so treat them especially.
>       */

The former seems just as accurate, and is basically just the already
existing comment?


>      if (relation->rd_isnailed)
>      {
> @@ -5942,7 +5946,8 @@ write_relcache_init_file(bool shared)
>       * wrote out was up-to-date.)
>       *
>       * This mustn't run concurrently with the code that unlinks an init file
> -     * and sends SI messages, so grab a serialization lock for the duration.
> +     * and sends SI messages (see RelationCacheInitFilePreInvalidate), so grab
> +     * a serialization lock for the duration.
>       */
>      LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);

Unrelated?


> @@ -6061,6 +6066,10 @@ RelationHasUnloggedIndex(Relation rel)
>   * changed one or more of the relation cache entries that are kept in the
>   * local init file.
>   *
> + * When DatabasePath is set, both the init file for that database and the
> + * shared (global) init files are to be removed; otherwise only the latter is.
> + * This is useful during recovery (XXX really?)
> + *

I'm confused?


>   * To be safe against concurrent inspection or rewriting of the init file,
>   * we must take RelCacheInitLock, then remove the old init file, then send
>   * the SI messages that include relcache inval for such relations, and then
> @@ -6180,9 +6189,9 @@ unlink_initfile(const char *initfilename, int elevel)
>  {
>      if (unlink(initfilename) < 0)
>      {
> -        /* It might not be there, but log any error other than ENOENT */
> +        /* It might not be there, but report any error other than ENOENT */
>          if (errno != ENOENT)
> -            ereport(ERROR,
> +            ereport(elevel,
>                      (errcode_for_file_access(),
>                       errmsg("could not remove cache file \"%s\": %m",
>                              initfilename)));

Included the elevel inclusion. Can't parse the difference between log
and report here?

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: why partition pruning doesn't work?
Next
From: Andres Freund
Date:
Subject: Re: found xmin from before relfrozenxid on pg_catalog.pg_authid