Thread: pg_upgrade and materialized views

pg_upgrade and materialized views

From
Claudio Freire
Date:
I'm not 100% sure this is a pg_upgrade bug or a pg_dump
--binary-upgrade one, or some other thing, but at this point I'm
fairly certain there's something wrong in one of them.

I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
snapshot off a replica, promoted it, and then did the pg_upgrade there
(to avoid breaking our production server).

It all went very well, except that a database-wide vacuum is
complaining about materialized views, not all of them, specifically
the ones in which we regularly use "REFRESH MATERIALIZED VIEW
CONCURRENTLY" on.

In our production master, those views contain rather old relfrozenxid:

mat=# select relname, relfrozenxid from pg_class where relname like
'%_mv' or relname = 'user_agents_canonical_user_agent_os';
               relname               | relfrozenxid
-------------------------------------+--------------
 os_ranking_mv                       |    272288261
 site_ranking_mv                     |    272260588
 carrier_ranking_mv                  |    272273002
 brand_ranking_mv                    |    226575108
 device_specs_ranking_mv             |    182006046
 user_agents_canonical_user_agent_os |    129807014
(6 rows)

Of those, the last 3 get concurrent refreshes, the first 3 don't.

In the upgraded server, vacuum complained with:

INFO:  vacuuming "public.user_agents_canonical_user_agent_os"
vacuumdb: vacuuming of database "mat" failed: ERROR:  found xmin
244738497 from before relfrozenxid 245830003

Now, 245830003 looks a lot like the current xid during pg_upgrade, so
I believe pg_dump is somehow failing to restore relfrozenxid on those
matviews. In fact, trying pg_dump --binary-upgrade on any matview
shows that it's not setting relfrozenxid, probably because in a normal
dump, matviews are refreshed, but not when --binary-upgrade is used
(since it's usually used with --schema-only as well).

I haven't yet managed to build a minimal case to reproduce this, I'll
post it when I succeed, but I wanted to report the issue now since it
looks like a genuine bug.


Re: pg_upgrade and materialized views

From
Alvaro Herrera
Date:
Claudio Freire wrote:

> In the upgraded server, vacuum complained with:
> 
> INFO:  vacuuming "public.user_agents_canonical_user_agent_os"
> vacuumdb: vacuuming of database "mat" failed: ERROR:  found xmin
> 244738497 from before relfrozenxid 245830003

Wow, these checks were just added shortly before the minors released a
couple of weeks ago, and they've already found one genuine bug!  Nice.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
> --binary-upgrade one, or some other thing, but at this point I'm
> fairly certain there's something wrong in one of them.

The symptoms you're mentioning suggest two bugs: (1) it's not clear
why binary upgrade should treat a matview's relfrozenxid differently
from a regular table's; (2) independently of that, it sounds like REFRESH
MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
matview's relfrozenxid in the source DB.

> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
> snapshot off a replica, promoted it, and then did the pg_upgrade there
> (to avoid breaking our production server).

And that brings replication behavior into the mix, too :-(.  I'd
suggest seeing if you can duplicate these problems without any
replication involved.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-20 18:26:59 -0300, Alvaro Herrera wrote:
> Claudio Freire wrote:
> 
> > In the upgraded server, vacuum complained with:
> > 
> > INFO:  vacuuming "public.user_agents_canonical_user_agent_os"
> > vacuumdb: vacuuming of database "mat" failed: ERROR:  found xmin
> > 244738497 from before relfrozenxid 245830003
> 
> Wow, these checks were just added shortly before the minors released a
> couple of weeks ago, and they've already found one genuine bug!  Nice.

My paranoia is vindicated once more!


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
>> --binary-upgrade one, or some other thing, but at this point I'm
>> fairly certain there's something wrong in one of them.
...
> (2) independently of that, it sounds like REFRESH
> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
> matview's relfrozenxid in the source DB.

Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so
it's not yet due for freezing.

>> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
>> snapshot off a replica, promoted it, and then did the pg_upgrade there
>> (to avoid breaking our production server).
>
> And that brings replication behavior into the mix, too :-(.  I'd
> suggest seeing if you can duplicate these problems without any
> replication involved.

Indeed, I'll try.


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (2) independently of that, it sounds like REFRESH
> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
> matview's relfrozenxid in the source DB.

About that, I did check the view's relfrozenxid and its rows' xmin to
see if there was some obvious breakage.

The following query:

select * from device_specs_ranking_mv where age(xmin) > age((select
relfrozenxid from pg_class where relname = 'device_specs_ranking_mv'))
limit 1;

Yields empty results both in the master and the replica, which, unless
I did something wrong in that query, would rule out replication
issues.


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (2) independently of that, it sounds like REFRESH
>> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
>> matview's relfrozenxid in the source DB.

> Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so
> it's not yet due for freezing.

Perhaps, but it seems pretty suggestive that all of the non-concurrently
refreshed matviews have relfrozenxid significantly newer than all of
the concurrently refreshed ones.  Maybe that's just coincidence, or a
predictable outcome of your usage pattern, but I think it needs
explaining.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-20 18:13:26 -0300, Claudio Freire wrote:
> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
> --binary-upgrade one, or some other thing, but at this point I'm
> fairly certain there's something wrong in one of them.
> 
> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
> snapshot off a replica, promoted it, and then did the pg_upgrade there
> (to avoid breaking our production server).
> 
> It all went very well, except that a database-wide vacuum is
> complaining about materialized views, not all of them, specifically
> the ones in which we regularly use "REFRESH MATERIALIZED VIEW
> CONCURRENTLY" on.
> 
> In our production master, those views contain rather old relfrozenxid:
> 
> mat=# select relname, relfrozenxid from pg_class where relname like
> '%_mv' or relname = 'user_agents_canonical_user_agent_os';
>                relname               | relfrozenxid
> -------------------------------------+--------------
>  os_ranking_mv                       |    272288261
>  site_ranking_mv                     |    272260588
>  carrier_ranking_mv                  |    272273002
>  brand_ranking_mv                    |    226575108
>  device_specs_ranking_mv             |    182006046
>  user_agents_canonical_user_agent_os |    129807014
> (6 rows)
> 
> Of those, the last 3 get concurrent refreshes, the first 3 don't.
> 
> In the upgraded server, vacuum complained with:
> 
> INFO:  vacuuming "public.user_agents_canonical_user_agent_os"
> vacuumdb: vacuuming of database "mat" failed: ERROR:  found xmin
> 244738497 from before relfrozenxid 245830003
> 
> Now, 245830003 looks a lot like the current xid during pg_upgrade, so
> I believe pg_dump is somehow failing to restore relfrozenxid on those
> matviews. In fact, trying pg_dump --binary-upgrade on any matview
> shows that it's not setting relfrozenxid, probably because in a normal
> dump, matviews are refreshed, but not when --binary-upgrade is used
> (since it's usually used with --schema-only as well).

The important part then happens in pg_dump. Note

        /*
         * To create binary-compatible heap files, we have to ensure the same
         * physical column order, including dropped columns, as in the
         * original.  Therefore, we create dropped columns above and drop them
         * here, also updating their attlen/attalign values so that the
         * dropped column can be skipped properly.  (We do not bother with
         * restoring the original attbyval setting.)  Also, inheritance
         * relationships are set up by doing ALTER TABLE INHERIT rather than
         * using an INHERITS clause --- the latter would possibly mess up the
         * column order.  That also means we have to take care about setting
         * attislocal correctly, plus fix up any inherited CHECK constraints.
         * Analogously, we set up typed tables using ALTER TABLE / OF here.
         */
        if (dopt->binary_upgrade &&
            (tbinfo->relkind == RELKIND_RELATION ||
             tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
             tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
        {
...
            appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and relminmxid\n");
            appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                              "SET relfrozenxid = '%u', relminmxid = '%u'\n"
                              "WHERE oid = ",
                              tbinfo->frozenxid, tbinfo->minmxid);
            appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
            appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");

note that the above if clause doesn't include materialized tables. Which
sems to explain this bug?  Could you check that just updating the above
if to include matviews fixes the bug for you?


Looking into this I also saw:

/*
 *    set_frozenxids()
 *
 *    We have frozen all xids, so set datfrozenxid, relfrozenxid, and
 *    relminmxid to be the old cluster's xid counter, which we just set
 *    in the new cluster.  User-table frozenxid and minmxid values will
 *    be set by pg_dump --binary-upgrade, but objects not set by the pg_dump
 *    must have proper frozen counters.
 */
static
void
set_frozenxids(bool minmxid_only)
...
            /* set pg_class.relfrozenxid */
            PQclear(executeQueryOrDie(conn,
                                      "UPDATE    pg_catalog.pg_class "
                                      "SET    relfrozenxid = '%u' "
            /* only heap, materialized view, and TOAST are vacuumed */
                                      "WHERE    relkind IN ("
                                      CppAsString2(RELKIND_RELATION) ", "
                                      CppAsString2(RELKIND_MATVIEW) ", "
                                      CppAsString2(RELKIND_TOASTVALUE) ")",
                                      old_cluster.controldata.chkpnt_nxtxid));

which makes a bit uncomfortable, but I can't quite put my finger on
why.

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
Hi,

On 2018-02-20 13:54:20 -0800, Andres Freund wrote:
> note that the above if clause doesn't include materialized tables. Which
> sems to explain this bug?  Could you check that just updating the above
> if to include matviews fixes the bug for you?

I suspect that a proper fix would be to move the relevant block one
level up, to have it's own if block. The current location seems to be
wrong, this is unrelated work to the header:
         * To create binary-compatible heap files, we have to ensure the same
         * physical column order, including dropped columns, as in the
         * original.  Therefore, we create dropped columns above and drop them
         * here, also updating their attlen/attalign values so that the
         * dropped column can be skipped properly.  (We do not bother with
         * restoring the original attbyval setting.)  Also, inheritance
         * relationships are set up by doing ALTER TABLE INHERIT rather than
         * using an INHERITS clause --- the latter would possibly mess up the
         * column order.  That also means we have to take care about setting
         * attislocal correctly, plus fix up any inherited CHECK constraints.
         * Analogously, we set up typed tables using ALTER TABLE / OF here.
         */
of that block.

A later if block just concerned to restoring frozenxid for all relevant
types of tables seems more appropriate. I also wonder if we shouldn't
just always restore relfrozenxid / minmxid if set on the source system,
rather than having relkind specific dispatch.

Also, dear god, this code could use some refactoring :(

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (2) independently of that, it sounds like REFRESH
>>> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
>>> matview's relfrozenxid in the source DB.
>
>> Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so
>> it's not yet due for freezing.
>
> Perhaps, but it seems pretty suggestive that all of the non-concurrently
> refreshed matviews have relfrozenxid significantly newer than all of
> the concurrently refreshed ones.  Maybe that's just coincidence, or a
> predictable outcome of your usage pattern, but I think it needs
> explaining.

I think it's quite expectable.

The ones that use concurrently are expected to change only slightly
between refreshes. All those matviews get refreshed periodically in
mostly the same schedule, but some are either slow to refresh
concurrently, and thus we prefer a full refresh, or are expected to
change too much for a concurrent refresh to be useful. As such,
concurrently refreshed views are expected to have rows that remain
valid for a long while (old data that has stabilized, the views
themselves represent a few month's worth of data). Fully refreshed
views will always have recent xids because they are recreated often.
I'd say it makes sense.


On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
> The important part then happens in pg_dump. Note
>
>                 /*
>                  * To create binary-compatible heap files, we have to ensure the same
>                  * physical column order, including dropped columns, as in the
>                  * original.  Therefore, we create dropped columns above and drop them
>                  * here, also updating their attlen/attalign values so that the
>                  * dropped column can be skipped properly.  (We do not bother with
>                  * restoring the original attbyval setting.)  Also, inheritance
>                  * relationships are set up by doing ALTER TABLE INHERIT rather than
>                  * using an INHERITS clause --- the latter would possibly mess up the
>                  * column order.  That also means we have to take care about setting
>                  * attislocal correctly, plus fix up any inherited CHECK constraints.
>                  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>                  */
>                 if (dopt->binary_upgrade &&
>                         (tbinfo->relkind == RELKIND_RELATION ||
>                          tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>                          tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>                 {
> ...
>                         appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and
relminmxid\n");
>                         appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>                                                           "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>                                                           "WHERE oid = ",
>                                                           tbinfo->frozenxid, tbinfo->minmxid);
>                         appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>                         appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>
> note that the above if clause doesn't include materialized tables. Which
> sems to explain this bug?  Could you check that just updating the above
> if to include matviews fixes the bug for you?

The pg_dump --binary-only test does produce the necessary SQL to set
relfrozenxid after that change, so it looks like it would fix it.

To be able to fully confirm it, though, I'll have to build a mimal
case to reproduce the issue, because the snapshot I used it not usable
again (can't re-upgrade), and launching another snapshot takes a lot
of time. Basically, I'll get back to you with a confirmation, but it
does look good.

> Looking into this I also saw:
>
> /*
>  *      set_frozenxids()
>  *
>  *      We have frozen all xids, so set datfrozenxid, relfrozenxid, and
>  *      relminmxid to be the old cluster's xid counter, which we just set
>  *      in the new cluster.  User-table frozenxid and minmxid values will
>  *      be set by pg_dump --binary-upgrade, but objects not set by the pg_dump
>  *      must have proper frozen counters.
>  */
> static
> void
> set_frozenxids(bool minmxid_only)
> ...
>                         /* set pg_class.relfrozenxid */
>                         PQclear(executeQueryOrDie(conn,
>                                                                           "UPDATE       pg_catalog.pg_class "
>                                                                           "SET  relfrozenxid = '%u' "
>                         /* only heap, materialized view, and TOAST are vacuumed */
>                                                                           "WHERE        relkind IN ("
>                                                                           CppAsString2(RELKIND_RELATION) ", "
>                                                                           CppAsString2(RELKIND_MATVIEW) ", "
>                                                                           CppAsString2(RELKIND_TOASTVALUE) ")",
>                                                                           old_cluster.controldata.chkpnt_nxtxid));
>
> which makes a bit uncomfortable, but I can't quite put my finger on
> why.

I looked into that one, it's not relevant to this case, since it's
working on template1 (check the conn used there).


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> The important part then happens in pg_dump. Note

>         if (dopt->binary_upgrade &&
>             (tbinfo->relkind == RELKIND_RELATION ||
>              tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>              tbinfo->relkind == RELKIND_PARTITIONED_TABLE))

> note that the above if clause doesn't include materialized tables. Which
> sems to explain this bug?  Could you check that just updating the above
> if to include matviews fixes the bug for you?

I'm also wondering why it *does* include foreign tables.  Surely
relfrozenxid is meaningless for a FT?

> Looking into this I also saw:
> set_frozenxids(bool minmxid_only)
> which makes a bit uncomfortable, but I can't quite put my finger on
> why.

The fact that it's inconsistent with the other list is surely a red flag,
eg seems like we should include RELKIND_PARTITIONED_TABLE there too.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
>> Looking into this I also saw:
>> *      set_frozenxids()
>> which makes a bit uncomfortable, but I can't quite put my finger on
>> why.

> I looked into that one, it's not relevant to this case, since it's
> working on template1 (check the conn used there).

Hm?  Looks to me like it iterates through all the DBs in the cluster.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>>> Looking into this I also saw:
>>> *      set_frozenxids()
>>> which makes a bit uncomfortable, but I can't quite put my finger on
>>> why.
>
>> I looked into that one, it's not relevant to this case, since it's
>> working on template1 (check the conn used there).
>
> Hm?  Looks to me like it iterates through all the DBs in the cluster.
>
>                         regards, tom lane

You're right. But IIUC it's called before restoring globals, so only
postgres, template0 and template1 exist at that point.


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-20 17:05:29 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The important part then happens in pg_dump. Note
> 
> >         if (dopt->binary_upgrade &&
> >             (tbinfo->relkind == RELKIND_RELATION ||
> >              tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
> >              tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
> 
> > note that the above if clause doesn't include materialized tables. Which
> > sems to explain this bug?  Could you check that just updating the above
> > if to include matviews fixes the bug for you?
> 
> I'm also wondering why it *does* include foreign tables.  Surely
> relfrozenxid is meaningless for a FT?

I think it's because that if block originally was concerned about
reconstructing table order, rather than relfrozenxid management. For
that the conditions make sense - matviews can't have columns
added/dropped but foreign tables can.  That's why I'm suggesting to move
the relfrozenxid handling out of there, it seems very likely to cause
similar problems down the road.

> > Looking into this I also saw:
> > set_frozenxids(bool minmxid_only)
> > which makes a bit uncomfortable, but I can't quite put my finger on
> > why.
> 
> The fact that it's inconsistent with the other list is surely a red flag,
> eg seems like we should include RELKIND_PARTITIONED_TABLE there too.

If I understand correctly the purpose of set_frozenxid() is to update
the horizons of system tables, *before* restoring the schema dump. So I
think excluding RELKIND_PARTITIONED_TABLE is harmless for now (i.e. we
can just change it in master).

I wonder if there's a scenario in which a schema restore uses enough
xids to get close to anti-wraparound territory?

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm?  Looks to me like it iterates through all the DBs in the cluster.

> You're right. But IIUC it's called before restoring globals, so only
> postgres, template0 and template1 exist at that point.

Hmm ... there's another call later, but that one only happens for
pre-9.3 source DBs, which won't contain matviews nor partitioned tables.
So maybe it's OK but I'm not quite sure, and even if it is OK it's only
accidentally so.

Be nice if the documentation for this code weren't so relentlessly bad ---
the header comment for this function looks like it's telling you something
useful, but it isn't even close to accurate.  The comment right above the
call in prepare_new_globals() seems unrelated to anyone's version of
reality either.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I wonder if there's a scenario in which a schema restore uses enough
> xids to get close to anti-wraparound territory?

Interesting question ... you'd need one heck of a lot of objects in
the cluster, but we've certainly heard of people with lots of objects.

We could stave that problem off by running the restore steps in
--single-transaction mode, if it weren't that pg_restore will reject the
combination of --create and --single-transaction.  I wonder if we should
allow that, specifying that it means that the single transaction starts
after we reconnect to the new DB.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 7:05 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
>> The important part then happens in pg_dump. Note
>>
>>                 /*
>>                  * To create binary-compatible heap files, we have to ensure the same
>>                  * physical column order, including dropped columns, as in the
>>                  * original.  Therefore, we create dropped columns above and drop them
>>                  * here, also updating their attlen/attalign values so that the
>>                  * dropped column can be skipped properly.  (We do not bother with
>>                  * restoring the original attbyval setting.)  Also, inheritance
>>                  * relationships are set up by doing ALTER TABLE INHERIT rather than
>>                  * using an INHERITS clause --- the latter would possibly mess up the
>>                  * column order.  That also means we have to take care about setting
>>                  * attislocal correctly, plus fix up any inherited CHECK constraints.
>>                  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>>                  */
>>                 if (dopt->binary_upgrade &&
>>                         (tbinfo->relkind == RELKIND_RELATION ||
>>                          tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>>                          tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>>                 {
>> ...
>>                         appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and
relminmxid\n");
>>                         appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>>                                                           "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>>                                                           "WHERE oid = ",
>>                                                           tbinfo->frozenxid, tbinfo->minmxid);
>>                         appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>>                         appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>>
>> note that the above if clause doesn't include materialized tables. Which
>> sems to explain this bug?  Could you check that just updating the above
>> if to include matviews fixes the bug for you?
>
> The pg_dump --binary-only test does produce the necessary SQL to set
> relfrozenxid after that change, so it looks like it would fix it.
>
> To be able to fully confirm it, though, I'll have to build a mimal
> case to reproduce the issue, because the snapshot I used it not usable
> again (can't re-upgrade), and launching another snapshot takes a lot
> of time. Basically, I'll get back to you with a confirmation, but it
> does look good.

Well, the attached script reproduces the issue.

Make a scratch dir somewhere, cd into it, and run it. It will download
9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit
here and there to get to the desired state, pg_upgrade it to 10.2 and
vacuum.

Might even be a good idea to create a regression test with that, but
AFAIK there's nothing remotely like that in the current tests.

The patch you propose makes the test succeed. It may be true that
there might be other similar issues lurking in that code, but it looks
like it fixes this particular issue.

Attachment

Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-20 17:29:01 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wonder if there's a scenario in which a schema restore uses enough
> > xids to get close to anti-wraparound territory?
> 
> Interesting question ... you'd need one heck of a lot of objects in
> the cluster, but we've certainly heard of people with lots of objects.
> 
> We could stave that problem off by running the restore steps in
> --single-transaction mode, if it weren't that pg_restore will reject the
> combination of --create and --single-transaction.  I wonder if we should
> allow that, specifying that it means that the single transaction starts
> after we reconnect to the new DB.

One problem is that we need to restore some types of objects that can be
quite voluminous (e.g. users) during the restore of global objects. But
we IIRC can't make all of that use a single transaction as some commands
like CREATE DATABASE / TABLESPACE etc don't like that.

I suspect one of the more realistic ways to use up huge amounts of xids
would be to have lots of large objects with individual permissions?

A quick and dirt option would be to add a few hand-scheduled transaction
commands at the right places in a pg_upgrade mode :/

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-20 17:24:03 -0500, Tom Lane wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
> > On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hm?  Looks to me like it iterates through all the DBs in the cluster.
> 
> > You're right. But IIUC it's called before restoring globals, so only
> > postgres, template0 and template1 exist at that point.
> 
> Hmm ... there's another call later, but that one only happens for
> pre-9.3 source DBs, which won't contain matviews nor partitioned tables.
> So maybe it's OK but I'm not quite sure, and even if it is OK it's only
> accidentally so.

> Be nice if the documentation for this code weren't so relentlessly bad ---
> the header comment for this function looks like it's telling you something
> useful, but it isn't even close to accurate.  The comment right above the
> call in prepare_new_globals() seems unrelated to anyone's version of
> reality either.

Yea, this code, not just the comments, is bad.  Combining the codepaths
for setting minmxid on all objects when upgrading from an old server
with the task of setting an appropriate horizon for objects created by
initdb, but nothing else, without so much as a comment about it?

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 7:56 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Feb 20, 2018 at 7:05 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
>>> The important part then happens in pg_dump. Note
>>>
>>>                 /*
>>>                  * To create binary-compatible heap files, we have to ensure the same
>>>                  * physical column order, including dropped columns, as in the
>>>                  * original.  Therefore, we create dropped columns above and drop them
>>>                  * here, also updating their attlen/attalign values so that the
>>>                  * dropped column can be skipped properly.  (We do not bother with
>>>                  * restoring the original attbyval setting.)  Also, inheritance
>>>                  * relationships are set up by doing ALTER TABLE INHERIT rather than
>>>                  * using an INHERITS clause --- the latter would possibly mess up the
>>>                  * column order.  That also means we have to take care about setting
>>>                  * attislocal correctly, plus fix up any inherited CHECK constraints.
>>>                  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>>>                  */
>>>                 if (dopt->binary_upgrade &&
>>>                         (tbinfo->relkind == RELKIND_RELATION ||
>>>                          tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>>>                          tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>>>                 {
>>> ...
>>>                         appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and
relminmxid\n");
>>>                         appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>>>                                                           "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>>>                                                           "WHERE oid = ",
>>>                                                           tbinfo->frozenxid, tbinfo->minmxid);
>>>                         appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>>>                         appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>>>
>>> note that the above if clause doesn't include materialized tables. Which
>>> sems to explain this bug?  Could you check that just updating the above
>>> if to include matviews fixes the bug for you?
>>
>> The pg_dump --binary-only test does produce the necessary SQL to set
>> relfrozenxid after that change, so it looks like it would fix it.
>>
>> To be able to fully confirm it, though, I'll have to build a mimal
>> case to reproduce the issue, because the snapshot I used it not usable
>> again (can't re-upgrade), and launching another snapshot takes a lot
>> of time. Basically, I'll get back to you with a confirmation, but it
>> does look good.
>
> Well, the attached script reproduces the issue.
>
> Make a scratch dir somewhere, cd into it, and run it. It will download
> 9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit
> here and there to get to the desired state, pg_upgrade it to 10.2 and
> vacuum.
>
> Might even be a good idea to create a regression test with that, but
> AFAIK there's nothing remotely like that in the current tests.
>
> The patch you propose makes the test succeed. It may be true that
> there might be other similar issues lurking in that code, but it looks
> like it fixes this particular issue.

Oops, the previous script has timing issues, the attached one really
works - just added some sleeps ;-)

Attachment

Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
>> Well, the attached script reproduces the issue.

One thing this doesn't prove by itself is whether the use of
REFRESH MATERIALIZED VIEW CONCURRENTLY has any effect on the
situation.

I poked at that a bit and noted that REFRESH MATERIALIZED VIEW
CONCURRENTLY doesn't seem to change the matview's relfrozenxid
at all, while a plain REFRESH advances the matview's relfrozenxid
to (more or less) the current xid counter.  That probably isn't
a bug, but maybe there's a missed optimization opportunity there.
At any rate, the matview's relfrozenxid isn't going backwards,
so my original fear seems unfounded.

Anyway, I'm thinking the core of the problem here is that we've got
multiple places that know which relkinds are physically transferred
during a pg_upgrade, and they don't all know the same thing.  We
need to centralize that knowledge somehow, or we're going to be
singing this same tune again in the future.  Not quite sure where
to put it though.  pg_dump and pg_upgrade both need to know that,
but the backend doesn't, so I don't quite want to add it in
pg_class.h where the core list of relkinds is.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Andres Freund
Date:

On February 20, 2018 3:44:47 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Anyway, I'm thinking the core of the problem here is that we've got
>multiple places that know which relkinds are physically transferred
>during a pg_upgrade, and they don't all know the same thing.  We
>need to centralize that knowledge somehow, or we're going to be
>singing this same tune again in the future.  Not quite sure where
>to put it though.  pg_dump and pg_upgrade both need to know that,
>but the backend doesn't, so I don't quite want to add it in
>pg_class.h where the core list of relkinds is.

Why do we need any relkind checks here at all? Shouldn't we just transport all xid horizons that are set before into
thenew cluster without filtering? 

And update all preexisting objects that have an xid set to the new the old cluster's nextxid?

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Tue, Feb 20, 2018 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>>> Well, the attached script reproduces the issue.
>
> One thing this doesn't prove by itself is whether the use of
> REFRESH MATERIALIZED VIEW CONCURRENTLY has any effect on the
> situation.
>
> I poked at that a bit and noted that REFRESH MATERIALIZED VIEW
> CONCURRENTLY doesn't seem to change the matview's relfrozenxid
> at all, while a plain REFRESH advances the matview's relfrozenxid
> to (more or less) the current xid counter.

Probably the reason why only concurrently triggers the vacuum
assertion is because only concurrently creates dead tuples.

Vacuum won't check unless there are dead tuples, it skips all-visible pages.


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On February 20, 2018 3:44:47 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I'm thinking the core of the problem here is that we've got
>> multiple places that know which relkinds are physically transferred
>> during a pg_upgrade, and they don't all know the same thing.  We
>> need to centralize that knowledge somehow, or we're going to be
>> singing this same tune again in the future.  Not quite sure where
>> to put it though.  pg_dump and pg_upgrade both need to know that,
>> but the backend doesn't, so I don't quite want to add it in
>> pg_class.h where the core list of relkinds is.

> Why do we need any relkind checks here at all? Shouldn't we just transport all xid horizons that are set before into
thenew cluster without filtering? 

The ones on relations without storage are meaningless, and I'm not
sure it's a good idea to set them nonzero if they're zero initially.
In any case, pg_upgrade *has to* be selective about which files it
tries to physically transfer --- even skipping nonexistent files
wouldn't be good enough, since it should not transfer sequence files.

Also, you were right to complain before that some of the issue here
is considering "must hack dropped columns" to be identical to
"must hack relfrozenxid".  It's not clear that that's true, eg
I think the answer is different for RELKIND_PARTITIONED_TABLE.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Victor Yegorov
Date:
2018-02-21 0:56 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
Well, the attached script reproduces the issue.

Make a scratch dir somewhere, cd into it, and run it. It will download
9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit
here and there to get to the desired state, pg_upgrade it to 10.2 and
vacuum.

Might even be a good idea to create a regression test with that, but
AFAIK there's nothing remotely like that in the current tests.

The patch you propose makes the test succeed. It may be true that
there might be other similar issues lurking in that code, but it looks
like it fixes this particular issue.

Is it possible, that bug #14852 is of the same nature as the issue at hand here?

Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Victor Yegorov <vyegorov@gmail.com> writes:
> Is it possible, that bug #14852 is of the same nature as the issue at hand
> here?
> https://postg.es/m/20171013115320.28049.86457@wrigleys.postgresql.org

For the archives' sake, the correct link is
https://www.postgresql.org/message-id/20171013115320.28049.86457@wrigleys.postgresql.org

Yeah, that does look suspiciously like a set of facts matching this
problem :-(

            regards, tom lane


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
> Victor Yegorov <vyegorov@gmail.com> writes:
> > Is it possible, that bug #14852 is of the same nature as the issue at hand
> > here?
> > https://postg.es/m/20171013115320.28049.86457@wrigleys.postgresql.org
> 
> For the archives' sake, the correct link is
> https://www.postgresql.org/message-id/20171013115320.28049.86457@wrigleys.postgresql.org

Yea, it does.


> Yeah, that does look suspiciously like a set of facts matching this
> problem :-(

I'd personally say ;), given that it's one less potentially data
corrupting bug to worry about ;). And it's one that can be fixed without
dataloss to boot.

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
>> Yeah, that does look suspiciously like a set of facts matching this
>> problem :-(

> I'd personally say ;), given that it's one less potentially data
> corrupting bug to worry about ;). And it's one that can be fixed without
> dataloss to boot.

I've pushed a fix for this.  I concluded after further study that there
wasn't really much call for a unified notion of applicable relkinds,
because actually what this code is dealing with is "relkinds that need
preservation of dropped columns" and "relkinds that need preservation
of relfrozenxid", both of which are somewhat different from "relkinds
that have storage".  What's really lacking is documentation, which
I attempted to supply.  For the code change proper, I did what you
suggested and split the relfrozenxid hacking into an independent
if-block.

            regards, tom lane


Re: pg_upgrade and materialized views

From
Andres Freund
Date:
On 2018-02-21 18:45:26 -0500, Tom Lane wrote:
> I've pushed a fix for this.

Thanks for this work, the new comments indeed make this much
clearer!


Wonder if we should include a comment in the release notes that possible
occurances of the problem can be fixed with REFRESH? Perhaps that's
obvious?

I also wonder if there's a chance that this can result in wrong query
results without getting errors, once the xids appears to be from the
future - if so, should we ask people to refresh matviews if they're
older than pg_upgrade?

Greetings,

Andres Freund


Re: pg_upgrade and materialized views

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Wonder if we should include a comment in the release notes that possible
> occurances of the problem can be fixed with REFRESH?

Yeah, I plan to.  I forgot to mention it in the commit message, but I
imagine it will still be fresh in mind when I make the release notes
on Friday ;-)

> I also wonder if there's a chance that this can result in wrong query
> results without getting errors, once the xids appears to be from the
> future - if so, should we ask people to refresh matviews if they're
> older than pg_upgrade?

I'm inclined to think that this is a rare problem, seeing that it's been
there more than 5 years and we just now identified it.  Probably, most
people REFRESH their matviews often enough that there aren't really
old rows in them.  (It seems to me that REFRESH CONCURRENTLY does share
part of the blame here, because it can allow older rows to linger in the
matview.)  I think it's sufficient to say "refresh if you see one of
these errors".

            regards, tom lane


Re: pg_upgrade and materialized views

From
Claudio Freire
Date:
On Wed, Feb 21, 2018 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
>>> Yeah, that does look suspiciously like a set of facts matching this
>>> problem :-(
>
>> I'd personally say ;), given that it's one less potentially data
>> corrupting bug to worry about ;). And it's one that can be fixed without
>> dataloss to boot.
>
> I've pushed a fix for this.

Thanks for addressing this so quickly :-)