Thread: pg_upgrade and materialized views
I'm not 100% sure this is a pg_upgrade bug or a pg_dump --binary-upgrade one, or some other thing, but at this point I'm fairly certain there's something wrong in one of them.

I just tried to pg_upgrade a database from 9.5 to 10.2. I took a snapshot off a replica, promoted it, and then did the pg_upgrade there (to avoid breaking our production server).

It all went very well, except that a database-wide vacuum is complaining about materialized views, not all of them, specifically the ones on which we regularly use "REFRESH MATERIALIZED VIEW CONCURRENTLY".

In our production master, those views contain rather old relfrozenxid:

mat=# select relname, relfrozenxid from pg_class where relname like
'%_mv' or relname = 'user_agents_canonical_user_agent_os';
               relname               | relfrozenxid
-------------------------------------+--------------
 os_ranking_mv                       |    272288261
 site_ranking_mv                     |    272260588
 carrier_ranking_mv                  |    272273002
 brand_ranking_mv                    |    226575108
 device_specs_ranking_mv             |    182006046
 user_agents_canonical_user_agent_os |    129807014
(6 rows)

Of those, the last 3 get concurrent refreshes, the first 3 don't.

In the upgraded server, vacuum complained with:

INFO: vacuuming "public.user_agents_canonical_user_agent_os"
vacuumdb: vacuuming of database "mat" failed: ERROR: found xmin
244738497 from before relfrozenxid 245830003

Now, 245830003 looks a lot like the current xid during pg_upgrade, so I believe pg_dump is somehow failing to restore relfrozenxid on those matviews. In fact, trying pg_dump --binary-upgrade on any matview shows that it's not setting relfrozenxid, probably because in a normal dump matviews are refreshed, but not when --binary-upgrade is used (since it's usually used with --schema-only as well).

I haven't yet managed to build a minimal case to reproduce this; I'll post it when I succeed, but I wanted to report the issue now since it looks like a genuine bug.

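For reference, a generic version of that check, listing every matview by the age of its freeze horizon (just an illustrative query, not tied to our schema):

-- illustrative: list matviews ordered by how old their relfrozenxid is
select relname, relfrozenxid, age(relfrozenxid)
from pg_class
where relkind = 'm'
order by age(relfrozenxid) desc;
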
Claudio Freire wrote:

> In the upgraded server, vacuum complained with:
>
> INFO: vacuuming "public.user_agents_canonical_user_agent_os"
> vacuumdb: vacuuming of database "mat" failed: ERROR: found xmin
> 244738497 from before relfrozenxid 245830003

Wow, these checks were just added shortly before the minors released a couple of weeks ago, and they've already found one genuine bug! Nice.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Claudio Freire <klaussfreire@gmail.com> writes:
> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
> --binary-upgrade one, or some other thing, but at this point I'm
> fairly certain there's something wrong in one of them.

The symptoms you're mentioning suggest two bugs: (1) it's not clear why binary upgrade should treat a matview's relfrozenxid differently from a regular table's; (2) independently of that, it sounds like REFRESH MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the matview's relfrozenxid in the source DB.

> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
> snapshot off a replica, promoted it, and then did the pg_upgrade there
> (to avoid breaking our production server).

And that brings replication behavior into the mix, too :-(. I'd suggest seeing if you can duplicate these problems without any replication involved.

			regards, tom lane

On 2018-02-20 18:26:59 -0300, Alvaro Herrera wrote:
> Claudio Freire wrote:
>
> > In the upgraded server, vacuum complained with:
> >
> > INFO: vacuuming "public.user_agents_canonical_user_agent_os"
> > vacuumdb: vacuuming of database "mat" failed: ERROR: found xmin
> > 244738497 from before relfrozenxid 245830003
>
> Wow, these checks were just added shortly before the minors released a
> couple of weeks ago, and they've already found one genuine bug! Nice.

My paranoia is vindicated once more!

On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
>> --binary-upgrade one, or some other thing, but at this point I'm
>> fairly certain there's something wrong in one of them.
...
> (2) independently of that, it sounds like REFRESH
> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
> matview's relfrozenxid in the source DB.

Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so it's not yet due for freezing.

>> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
>> snapshot off a replica, promoted it, and then did the pg_upgrade there
>> (to avoid breaking our production server).
>
> And that brings replication behavior into the mix, too :-(. I'd
> suggest seeing if you can duplicate these problems without any
> replication involved.

Indeed, I'll try.

On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (2) independently of that, it sounds like REFRESH
> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
> matview's relfrozenxid in the source DB.

About that, I did check the view's relfrozenxid and its rows' xmin to see if there was some obvious breakage. The following query:

select * from device_specs_ranking_mv
where age(xmin) > age((select relfrozenxid from pg_class
                       where relname = 'device_specs_ranking_mv'))
limit 1;

Yields empty results both in the master and the replica, which, unless I did something wrong in that query, would rule out replication issues.

Claudio Freire <klaussfreire@gmail.com> writes:
> On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (2) independently of that, it sounds like REFRESH
>> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
>> matview's relfrozenxid in the source DB.

> Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so
> it's not yet due for freezing.

Perhaps, but it seems pretty suggestive that all of the non-concurrently refreshed matviews have relfrozenxid significantly newer than all of the concurrently refreshed ones. Maybe that's just coincidence, or a predictable outcome of your usage pattern, but I think it needs explaining.

			regards, tom lane

On 2018-02-20 18:13:26 -0300, Claudio Freire wrote:
> I'm not 100% sure this is a pg_upgrade bug or a pg_dump
> --binary-upgrade one, or some other thing, but at this point I'm
> fairly certain there's something wrong in one of them.
>
> I just tried to pg_upgrade a database from 9.5 to 10.2. I took a
> snapshot off a replica, promoted it, and then did the pg_upgrade there
> (to avoid breaking our production server).
>
> It all went very well, except that a database-wide vacuum is
> complaining about materialized views, not all of them, specifically
> the ones on which we regularly use "REFRESH MATERIALIZED VIEW
> CONCURRENTLY".
>
> In our production master, those views contain rather old relfrozenxid:
>
> mat=# select relname, relfrozenxid from pg_class where relname like
> '%_mv' or relname = 'user_agents_canonical_user_agent_os';
>                relname               | relfrozenxid
> -------------------------------------+--------------
>  os_ranking_mv                       |    272288261
>  site_ranking_mv                     |    272260588
>  carrier_ranking_mv                  |    272273002
>  brand_ranking_mv                    |    226575108
>  device_specs_ranking_mv             |    182006046
>  user_agents_canonical_user_agent_os |    129807014
> (6 rows)
>
> Of those, the last 3 get concurrent refreshes, the first 3 don't.
>
> In the upgraded server, vacuum complained with:
>
> INFO: vacuuming "public.user_agents_canonical_user_agent_os"
> vacuumdb: vacuuming of database "mat" failed: ERROR: found xmin
> 244738497 from before relfrozenxid 245830003
>
> Now, 245830003 looks a lot like the current xid during pg_upgrade, so
> I believe pg_dump is somehow failing to restore relfrozenxid on those
> matviews. In fact, trying pg_dump --binary-upgrade on any matview
> shows that it's not setting relfrozenxid, probably because in a normal
> dump, matviews are refreshed, but not when --binary-upgrade is used
> (since it's usually used with --schema-only as well).

The important part then happens in pg_dump. Note

/*
 * To create binary-compatible heap files, we have to ensure the same
 * physical column order, including dropped columns, as in the
 * original. Therefore, we create dropped columns above and drop them
 * here, also updating their attlen/attalign values so that the
 * dropped column can be skipped properly. (We do not bother with
 * restoring the original attbyval setting.) Also, inheritance
 * relationships are set up by doing ALTER TABLE INHERIT rather than
 * using an INHERITS clause --- the latter would possibly mess up the
 * column order. That also means we have to take care about setting
 * attislocal correctly, plus fix up any inherited CHECK constraints.
 * Analogously, we set up typed tables using ALTER TABLE / OF here.
 */
if (dopt->binary_upgrade &&
    (tbinfo->relkind == RELKIND_RELATION ||
     tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
     tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
{
...
    appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and relminmxid\n");
    appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
                      "SET relfrozenxid = '%u', relminmxid = '%u'\n"
                      "WHERE oid = ",
                      tbinfo->frozenxid, tbinfo->minmxid);
    appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
    appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");

note that the above if clause doesn't include materialized views. Which seems to explain this bug? Could you check that just updating the above if to include matviews fixes the bug for you?

Looking into this I also saw:

/*
 * set_frozenxids()
 *
 * We have frozen all xids, so set datfrozenxid, relfrozenxid, and
 * relminmxid to be the old cluster's xid counter, which we just set
 * in the new cluster. User-table frozenxid and minmxid values will
 * be set by pg_dump --binary-upgrade, but objects not set by the pg_dump
 * must have proper frozen counters.
 */
static void
set_frozenxids(bool minmxid_only)
...
    /* set pg_class.relfrozenxid */
    PQclear(executeQueryOrDie(conn,
                              "UPDATE pg_catalog.pg_class "
                              "SET relfrozenxid = '%u' "
    /* only heap, materialized view, and TOAST are vacuumed */
                              "WHERE relkind IN ("
                              CppAsString2(RELKIND_RELATION) ", "
                              CppAsString2(RELKIND_MATVIEW) ", "
                              CppAsString2(RELKIND_TOASTVALUE) ")",
                              old_cluster.controldata.chkpnt_nxtxid));

which makes me a bit uncomfortable, but I can't quite put my finger on why.

Greetings,

Andres Freund

Hi,

On 2018-02-20 13:54:20 -0800, Andres Freund wrote:
> note that the above if clause doesn't include materialized views. Which
> seems to explain this bug? Could you check that just updating the above
> if to include matviews fixes the bug for you?

I suspect that a proper fix would be to move the relevant block one level up, to have its own if block. The current location seems to be wrong; this is work unrelated to the header comment of that block:

 * To create binary-compatible heap files, we have to ensure the same
 * physical column order, including dropped columns, as in the
 * original. Therefore, we create dropped columns above and drop them
 * here, also updating their attlen/attalign values so that the
 * dropped column can be skipped properly. (We do not bother with
 * restoring the original attbyval setting.) Also, inheritance
 * relationships are set up by doing ALTER TABLE INHERIT rather than
 * using an INHERITS clause --- the latter would possibly mess up the
 * column order. That also means we have to take care about setting
 * attislocal correctly, plus fix up any inherited CHECK constraints.
 * Analogously, we set up typed tables using ALTER TABLE / OF here.
 */

A later if block concerned only with restoring frozenxid for all relevant types of tables seems more appropriate.

I also wonder if we shouldn't just always restore relfrozenxid / minmxid if set on the source system, rather than having relkind specific dispatch.

Also, dear god, this code could use some refactoring :(

Greetings,

Andres Freund

On Tue, Feb 20, 2018 at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Tue, Feb 20, 2018 at 6:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (2) independently of that, it sounds like REFRESH
>>> MATERIALIZED VIEW CONCURRENTLY is somehow preventing advancement of the
>>> matview's relfrozenxid in the source DB.
>
>> Not necessarily. I have vacuum_table_freeze_max_age set to 350M, so
>> it's not yet due for freezing.
>
> Perhaps, but it seems pretty suggestive that all of the non-concurrently
> refreshed matviews have relfrozenxid significantly newer than all of
> the concurrently refreshed ones. Maybe that's just coincidence, or a
> predictable outcome of your usage pattern, but I think it needs
> explaining.

I think it's quite expectable. The ones that use concurrently are expected to change only slightly between refreshes.

All those matviews get refreshed periodically on mostly the same schedule, but some are either too slow to refresh concurrently, and thus we prefer a full refresh, or are expected to change too much for a concurrent refresh to be useful.

As such, concurrently refreshed views are expected to have rows that remain valid for a long while (old data that has stabilized; the views themselves represent a few months' worth of data). Fully refreshed views will always have recent xids because they are recreated often.

I'd say it makes sense.

On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
> The important part then happens in pg_dump. Note
>
> /*
>  * To create binary-compatible heap files, we have to ensure the same
>  * physical column order, including dropped columns, as in the
>  * original. Therefore, we create dropped columns above and drop them
>  * here, also updating their attlen/attalign values so that the
>  * dropped column can be skipped properly. (We do not bother with
>  * restoring the original attbyval setting.) Also, inheritance
>  * relationships are set up by doing ALTER TABLE INHERIT rather than
>  * using an INHERITS clause --- the latter would possibly mess up the
>  * column order. That also means we have to take care about setting
>  * attislocal correctly, plus fix up any inherited CHECK constraints.
>  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>  */
> if (dopt->binary_upgrade &&
>     (tbinfo->relkind == RELKIND_RELATION ||
>      tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>      tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
> {
> ...
>     appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and relminmxid\n");
>     appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>                       "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>                       "WHERE oid = ",
>                       tbinfo->frozenxid, tbinfo->minmxid);
>     appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>     appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>
> note that the above if clause doesn't include materialized views. Which
> seems to explain this bug? Could you check that just updating the above
> if to include matviews fixes the bug for you?

The pg_dump --binary-upgrade test does produce the necessary SQL to set relfrozenxid after that change, so it looks like it would fix it.

To be able to fully confirm it, though, I'll have to build a minimal case to reproduce the issue, because the snapshot I used is not usable again (can't re-upgrade), and launching another snapshot takes a lot of time. Basically, I'll get back to you with a confirmation, but it does look good.

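For reference, with that change the --binary-upgrade dump contains a statement along these lines for each matview (the object name and values below are illustrative, not taken from the real dump):

-- For binary upgrade, set heap's relfrozenxid and relminmxid
-- (illustrative object name and values)
UPDATE pg_catalog.pg_class
SET relfrozenxid = '182006046', relminmxid = '1'
WHERE oid = 'device_specs_ranking_mv'::pg_catalog.regclass;
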
> Looking into this I also saw:
>
> /*
>  * set_frozenxids()
>  *
>  * We have frozen all xids, so set datfrozenxid, relfrozenxid, and
>  * relminmxid to be the old cluster's xid counter, which we just set
>  * in the new cluster. User-table frozenxid and minmxid values will
>  * be set by pg_dump --binary-upgrade, but objects not set by the pg_dump
>  * must have proper frozen counters.
>  */
> static void
> set_frozenxids(bool minmxid_only)
> ...
>     /* set pg_class.relfrozenxid */
>     PQclear(executeQueryOrDie(conn,
>                               "UPDATE pg_catalog.pg_class "
>                               "SET relfrozenxid = '%u' "
>     /* only heap, materialized view, and TOAST are vacuumed */
>                               "WHERE relkind IN ("
>                               CppAsString2(RELKIND_RELATION) ", "
>                               CppAsString2(RELKIND_MATVIEW) ", "
>                               CppAsString2(RELKIND_TOASTVALUE) ")",
>                               old_cluster.controldata.chkpnt_nxtxid));
>
> which makes me a bit uncomfortable, but I can't quite put my finger on
> why.

I looked into that one, it's not relevant to this case, since it's working on template1 (check the conn used there).

Andres Freund <andres@anarazel.de> writes:
> The important part then happens in pg_dump. Note

> if (dopt->binary_upgrade &&
>     (tbinfo->relkind == RELKIND_RELATION ||
>      tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>      tbinfo->relkind == RELKIND_PARTITIONED_TABLE))

> note that the above if clause doesn't include materialized views. Which
> seems to explain this bug? Could you check that just updating the above
> if to include matviews fixes the bug for you?

I'm also wondering why it *does* include foreign tables. Surely relfrozenxid is meaningless for a FT?

> Looking into this I also saw:
> set_frozenxids(bool minmxid_only)
> which makes me a bit uncomfortable, but I can't quite put my finger on
> why.

The fact that it's inconsistent with the other list is surely a red flag, eg seems like we should include RELKIND_PARTITIONED_TABLE there too.

			regards, tom lane

Claudio Freire <klaussfreire@gmail.com> writes:
>> Looking into this I also saw:
>> * set_frozenxids()
>> which makes me a bit uncomfortable, but I can't quite put my finger on
>> why.

> I looked into that one, it's not relevant to this case, since it's
> working on template1 (check the conn used there).

Hm? Looks to me like it iterates through all the DBs in the cluster.

			regards, tom lane

On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>>> Looking into this I also saw:
>>> * set_frozenxids()
>>> which makes me a bit uncomfortable, but I can't quite put my finger on
>>> why.
>
>> I looked into that one, it's not relevant to this case, since it's
>> working on template1 (check the conn used there).
>
> Hm? Looks to me like it iterates through all the DBs in the cluster.
>
> 			regards, tom lane

You're right. But IIUC it's called before restoring globals, so only postgres, template0 and template1 exist at that point.

On 2018-02-20 17:05:29 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The important part then happens in pg_dump. Note
>
> > if (dopt->binary_upgrade &&
> >     (tbinfo->relkind == RELKIND_RELATION ||
> >      tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
> >      tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>
> > note that the above if clause doesn't include materialized views. Which
> > seems to explain this bug? Could you check that just updating the above
> > if to include matviews fixes the bug for you?
>
> I'm also wondering why it *does* include foreign tables. Surely
> relfrozenxid is meaningless for a FT?

I think it's because that if block originally was concerned about reconstructing table order, rather than relfrozenxid management. For that the conditions make sense - matviews can't have columns added/dropped but foreign tables can.

That's why I'm suggesting to move the relfrozenxid handling out of there, it seems very likely to cause similar problems down the road.

> > Looking into this I also saw:
> > set_frozenxids(bool minmxid_only)
> > which makes me a bit uncomfortable, but I can't quite put my finger on
> > why.
>
> The fact that it's inconsistent with the other list is surely a red flag,
> eg seems like we should include RELKIND_PARTITIONED_TABLE there too.

If I understand correctly the purpose of set_frozenxids() is to update the horizons of system tables, *before* restoring the schema dump. So I think excluding RELKIND_PARTITIONED_TABLE is harmless for now (i.e. we can just change it in master).

I wonder if there's a scenario in which a schema restore uses enough xids to get close to anti-wraparound territory?

Greetings,

Andres Freund

Claudio Freire <klaussfreire@gmail.com> writes:
> On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hm? Looks to me like it iterates through all the DBs in the cluster.

> You're right. But IIUC it's called before restoring globals, so only
> postgres, template0 and template1 exist at that point.

Hmm ... there's another call later, but that one only happens for pre-9.3 source DBs, which won't contain matviews nor partitioned tables. So maybe it's OK but I'm not quite sure, and even if it is OK it's only accidentally so.

Be nice if the documentation for this code weren't so relentlessly bad --- the header comment for this function looks like it's telling you something useful, but it isn't even close to accurate. The comment right above the call in prepare_new_globals() seems unrelated to anyone's version of reality either.

			regards, tom lane

Andres Freund <andres@anarazel.de> writes:
> I wonder if there's a scenario in which a schema restore uses enough
> xids to get close to anti-wraparound territory?

Interesting question ... you'd need one heck of a lot of objects in the cluster, but we've certainly heard of people with lots of objects.

We could stave that problem off by running the restore steps in --single-transaction mode, if it weren't that pg_restore will reject the combination of --create and --single-transaction. I wonder if we should allow that, specifying that it means that the single transaction starts after we reconnect to the new DB.

			regards, tom lane

On Tue, Feb 20, 2018 at 7:05 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
>> The important part then happens in pg_dump. Note
>>
>> /*
>>  * To create binary-compatible heap files, we have to ensure the same
>>  * physical column order, including dropped columns, as in the
>>  * original. Therefore, we create dropped columns above and drop them
>>  * here, also updating their attlen/attalign values so that the
>>  * dropped column can be skipped properly. (We do not bother with
>>  * restoring the original attbyval setting.) Also, inheritance
>>  * relationships are set up by doing ALTER TABLE INHERIT rather than
>>  * using an INHERITS clause --- the latter would possibly mess up the
>>  * column order. That also means we have to take care about setting
>>  * attislocal correctly, plus fix up any inherited CHECK constraints.
>>  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>>  */
>> if (dopt->binary_upgrade &&
>>     (tbinfo->relkind == RELKIND_RELATION ||
>>      tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>>      tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>> {
>> ...
>>     appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and relminmxid\n");
>>     appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>>                       "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>>                       "WHERE oid = ",
>>                       tbinfo->frozenxid, tbinfo->minmxid);
>>     appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>>     appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>>
>> note that the above if clause doesn't include materialized views. Which
>> seems to explain this bug? Could you check that just updating the above
>> if to include matviews fixes the bug for you?
>
> The pg_dump --binary-upgrade test does produce the necessary SQL to set
> relfrozenxid after that change, so it looks like it would fix it.
>
> To be able to fully confirm it, though, I'll have to build a minimal
> case to reproduce the issue, because the snapshot I used is not usable
> again (can't re-upgrade), and launching another snapshot takes a lot
> of time. Basically, I'll get back to you with a confirmation, but it
> does look good.

Well, the attached script reproduces the issue.

Make a scratch dir somewhere, cd into it, and run it. It will download 9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit here and there to get to the desired state, pg_upgrade it to 10.2 and vacuum.

Might even be a good idea to create a regression test with that, but AFAIK there's nothing remotely like that in the current tests.

The patch you propose makes the test succeed. It may be true that there might be other similar issues lurking in that code, but it looks like it fixes this particular issue.

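Roughly, the state the script sets up on the 9.6 side before upgrading amounts to something like this (an illustrative sketch with made-up names, not the script itself; CONCURRENTLY needs a unique index on the matview):

-- sketch only: made-up table and matview names
create table base (id int primary key, val int);
insert into base select g, g from generate_series(1, 100000) g;
create materialized view base_mv as select id, val from base;
create unique index on base_mv (id);

-- change some base data, then refresh concurrently: the concurrent
-- refresh applies the diff in place, leaving dead tuples in the
-- matview whose xmin predates the upgrade
update base set val = val + 1 where id % 10 = 0;
refresh materialized view concurrently base_mv;

-- then pg_upgrade to 10.2 (outside SQL) and vacuum the new cluster,
-- which fails with "found xmin ... from before relfrozenxid ..."
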
Attachment
On 2018-02-20 17:29:01 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wonder if there's a scenario in which a schema restore uses enough
> > xids to get close to anti-wraparound territory?
>
> Interesting question ... you'd need one heck of a lot of objects in
> the cluster, but we've certainly heard of people with lots of objects.
>
> We could stave that problem off by running the restore steps in
> --single-transaction mode, if it weren't that pg_restore will reject the
> combination of --create and --single-transaction. I wonder if we should
> allow that, specifying that it means that the single transaction starts
> after we reconnect to the new DB.

One problem is that we need to restore some types of objects that can be quite voluminous (e.g. users) during the restore of global objects. But IIRC we can't make all of that use a single transaction as some commands like CREATE DATABASE / TABLESPACE etc don't like that.

I suspect one of the more realistic ways to use up huge amounts of xids would be to have lots of large objects with individual permissions?

A quick and dirty option would be to add a few hand-scheduled transaction commands at the right places in a pg_upgrade mode :/

Greetings,

Andres Freund

On 2018-02-20 17:24:03 -0500, Tom Lane wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
> > On Tue, Feb 20, 2018 at 7:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hm? Looks to me like it iterates through all the DBs in the cluster.
>
> > You're right. But IIUC it's called before restoring globals, so only
> > postgres, template0 and template1 exist at that point.
>
> Hmm ... there's another call later, but that one only happens for
> pre-9.3 source DBs, which won't contain matviews nor partitioned tables.
> So maybe it's OK but I'm not quite sure, and even if it is OK it's only
> accidentally so.
>
> Be nice if the documentation for this code weren't so relentlessly bad ---
> the header comment for this function looks like it's telling you something
> useful, but it isn't even close to accurate. The comment right above the
> call in prepare_new_globals() seems unrelated to anyone's version of
> reality either.

Yea, this code, not just the comments, is bad. Combining the codepaths for setting minmxid on all objects when upgrading from an old server with the task of setting an appropriate horizon for objects created by initdb, but nothing else, without so much as a comment about it?

Greetings,

Andres Freund

On Tue, Feb 20, 2018 at 7:56 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Feb 20, 2018 at 7:05 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Tue, Feb 20, 2018 at 6:54 PM, Andres Freund <andres@anarazel.de> wrote:
>>> The important part then happens in pg_dump. Note
>>>
>>> /*
>>>  * To create binary-compatible heap files, we have to ensure the same
>>>  * physical column order, including dropped columns, as in the
>>>  * original. Therefore, we create dropped columns above and drop them
>>>  * here, also updating their attlen/attalign values so that the
>>>  * dropped column can be skipped properly. (We do not bother with
>>>  * restoring the original attbyval setting.) Also, inheritance
>>>  * relationships are set up by doing ALTER TABLE INHERIT rather than
>>>  * using an INHERITS clause --- the latter would possibly mess up the
>>>  * column order. That also means we have to take care about setting
>>>  * attislocal correctly, plus fix up any inherited CHECK constraints.
>>>  * Analogously, we set up typed tables using ALTER TABLE / OF here.
>>>  */
>>> if (dopt->binary_upgrade &&
>>>     (tbinfo->relkind == RELKIND_RELATION ||
>>>      tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
>>>      tbinfo->relkind == RELKIND_PARTITIONED_TABLE))
>>> {
>>> ...
>>>     appendPQExpBufferStr(q, "\n-- For binary upgrade, set heap's relfrozenxid and relminmxid\n");
>>>     appendPQExpBuffer(q, "UPDATE pg_catalog.pg_class\n"
>>>                       "SET relfrozenxid = '%u', relminmxid = '%u'\n"
>>>                       "WHERE oid = ",
>>>                       tbinfo->frozenxid, tbinfo->minmxid);
>>>     appendStringLiteralAH(q, fmtId(tbinfo->dobj.name), fout);
>>>     appendPQExpBufferStr(q, "::pg_catalog.regclass;\n");
>>>
>>> note that the above if clause doesn't include materialized views. Which
>>> seems to explain this bug? Could you check that just updating the above
>>> if to include matviews fixes the bug for you?
>>
>> The pg_dump --binary-upgrade test does produce the necessary SQL to set
>> relfrozenxid after that change, so it looks like it would fix it.
>>
>> To be able to fully confirm it, though, I'll have to build a minimal
>> case to reproduce the issue, because the snapshot I used is not usable
>> again (can't re-upgrade), and launching another snapshot takes a lot
>> of time. Basically, I'll get back to you with a confirmation, but it
>> does look good.
>
> Well, the attached script reproduces the issue.
>
> Make a scratch dir somewhere, cd into it, and run it. It will download
> 9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit
> here and there to get to the desired state, pg_upgrade it to 10.2 and
> vacuum.
>
> Might even be a good idea to create a regression test with that, but
> AFAIK there's nothing remotely like that in the current tests.
>
> The patch you propose makes the test succeed. It may be true that
> there might be other similar issues lurking in that code, but it looks
> like it fixes this particular issue.

Oops, the previous script has timing issues, the attached one really works - just added some sleeps ;-)

Attachment
Claudio Freire <klaussfreire@gmail.com> writes:
>> Well, the attached script reproduces the issue.

One thing this doesn't prove by itself is whether the use of REFRESH MATERIALIZED VIEW CONCURRENTLY has any effect on the situation.

I poked at that a bit and noted that REFRESH MATERIALIZED VIEW CONCURRENTLY doesn't seem to change the matview's relfrozenxid at all, while a plain REFRESH advances the matview's relfrozenxid to (more or less) the current xid counter. That probably isn't a bug, but maybe there's a missed optimization opportunity there. At any rate, the matview's relfrozenxid isn't going backwards, so my original fear seems unfounded.

Anyway, I'm thinking the core of the problem here is that we've got multiple places that know which relkinds are physically transferred during a pg_upgrade, and they don't all know the same thing. We need to centralize that knowledge somehow, or we're going to be singing this same tune again in the future. Not quite sure where to put it though. pg_dump and pg_upgrade both need to know that, but the backend doesn't, so I don't quite want to add it in pg_class.h where the core list of relkinds is.

			regards, tom lane

On February 20, 2018 3:44:47 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, I'm thinking the core of the problem here is that we've got
> multiple places that know which relkinds are physically transferred
> during a pg_upgrade, and they don't all know the same thing. We
> need to centralize that knowledge somehow, or we're going to be
> singing this same tune again in the future. Not quite sure where
> to put it though. pg_dump and pg_upgrade both need to know that,
> but the backend doesn't, so I don't quite want to add it in
> pg_class.h where the core list of relkinds is.

Why do we need any relkind checks here at all? Shouldn't we just transport all xid horizons that are set over to the new cluster, without filtering? And update all preexisting objects that have an xid set to the old cluster's nextxid?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

On Tue, Feb 20, 2018 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>>> Well, the attached script reproduces the issue.
>
> One thing this doesn't prove by itself is whether the use of
> REFRESH MATERIALIZED VIEW CONCURRENTLY has any effect on the
> situation.
>
> I poked at that a bit and noted that REFRESH MATERIALIZED VIEW
> CONCURRENTLY doesn't seem to change the matview's relfrozenxid
> at all, while a plain REFRESH advances the matview's relfrozenxid
> to (more or less) the current xid counter.

Probably the reason why only the concurrently refreshed matviews trigger the vacuum error is that only concurrent refreshes create dead tuples. Vacuum won't check unless there are dead tuples; it skips all-visible pages.

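Both effects are easy to see on a toy matview (made-up name below; CONCURRENTLY requires a unique index, and the base data has to have changed for the concurrent refresh to leave any dead tuples behind):

-- illustrative matview name
select relfrozenxid from pg_class where relname = 'base_mv';
refresh materialized view base_mv;
select relfrozenxid from pg_class where relname = 'base_mv';  -- advanced
refresh materialized view concurrently base_mv;
select relfrozenxid from pg_class where relname = 'base_mv';  -- unchanged

-- dead tuples left behind by the concurrent refresh
select n_dead_tup from pg_stat_all_tables where relname = 'base_mv';
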
Andres Freund <andres@anarazel.de> writes:
> On February 20, 2018 3:44:47 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, I'm thinking the core of the problem here is that we've got
>> multiple places that know which relkinds are physically transferred
>> during a pg_upgrade, and they don't all know the same thing. We
>> need to centralize that knowledge somehow, or we're going to be
>> singing this same tune again in the future. Not quite sure where
>> to put it though. pg_dump and pg_upgrade both need to know that,
>> but the backend doesn't, so I don't quite want to add it in
>> pg_class.h where the core list of relkinds is.

> Why do we need any relkind checks here at all? Shouldn't we just
> transport all xid horizons that are set over to the new cluster,
> without filtering?

The ones on relations without storage are meaningless, and I'm not sure it's a good idea to set them nonzero if they're zero initially. In any case, pg_upgrade *has to* be selective about which files it tries to physically transfer --- even skipping nonexistent files wouldn't be good enough, since it should not transfer sequence files.

Also, you were right to complain before that some of the issue here is considering "must hack dropped columns" to be identical to "must hack relfrozenxid". It's not clear that that's true, eg I think the answer is different for RELKIND_PARTITIONED_TABLE.

			regards, tom lane

2018-02-21 0:56 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:

> Well, the attached script reproduces the issue.
>
> Make a scratch dir somewhere, cd into it, and run it. It will download
> 9.6.7 and 10.2, build them, create a test db in 9.6.7, modify it a bit
> here and there to get to the desired state, pg_upgrade it to 10.2 and
> vacuum.
>
> Might even be a good idea to create a regression test with that, but
> AFAIK there's nothing remotely like that in the current tests.
>
> The patch you propose makes the test succeed. It may be true that
> there might be other similar issues lurking in that code, but it looks
> like it fixes this particular issue.

Is it possible, that bug #14852 is of the same nature as the issue at hand here?
https://postg.es/m/20171013115320.28049.86457@wrigleys.postgresql.org

-- 
Victor Yegorov

Victor Yegorov <vyegorov@gmail.com> writes:
> Is it possible, that bug #14852 is of the same nature as the issue at hand
> here?
> https://postg.es/m/20171013115320.28049.86457@wrigleys.postgresql.org

For the archives' sake, the correct link is
https://www.postgresql.org/message-id/20171013115320.28049.86457@wrigleys.postgresql.org

Yeah, that does look suspiciously like a set of facts matching this problem :-(

			regards, tom lane

On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
> Victor Yegorov <vyegorov@gmail.com> writes:
> > Is it possible, that bug #14852 is of the same nature as the issue at hand
> > here?
> > https://postg.es/m/20171013115320.28049.86457@wrigleys.postgresql.org
>
> For the archives' sake, the correct link is
> https://www.postgresql.org/message-id/20171013115320.28049.86457@wrigleys.postgresql.org

Yea, it does.

> Yeah, that does look suspiciously like a set of facts matching this
> problem :-(

I'd personally say ;), given that it's one less potentially data corrupting bug to worry about ;). And it's one that can be fixed without dataloss to boot.

Greetings,

Andres Freund

Andres Freund <andres@anarazel.de> writes:
> On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
>> Yeah, that does look suspiciously like a set of facts matching this
>> problem :-(

> I'd personally say ;), given that it's one less potentially data
> corrupting bug to worry about ;). And it's one that can be fixed without
> dataloss to boot.

I've pushed a fix for this. I concluded after further study that there wasn't really much call for a unified notion of applicable relkinds, because actually what this code is dealing with is "relkinds that need preservation of dropped columns" and "relkinds that need preservation of relfrozenxid", both of which are somewhat different from "relkinds that have storage". What's really lacking is documentation, which I attempted to supply.

For the code change proper, I did what you suggested and split the relfrozenxid hacking into an independent if-block.

			regards, tom lane

On 2018-02-21 18:45:26 -0500, Tom Lane wrote:
> I've pushed a fix for this.

Thanks for this work, the new comments indeed make this much clearer!

Wonder if we should include a comment in the release notes that possible occurrences of the problem can be fixed with REFRESH? Perhaps that's obvious?

I also wonder if there's a chance that this can result in wrong query results without getting errors, once the xids appear to be from the future - if so, should we ask people to refresh matviews if they're older than the pg_upgrade?

Greetings,

Andres Freund

Andres Freund <andres@anarazel.de> writes:
> Wonder if we should include a comment in the release notes that possible
> occurrences of the problem can be fixed with REFRESH?

Yeah, I plan to. I forgot to mention it in the commit message, but I imagine it will still be fresh in mind when I make the release notes on Friday ;-)

> I also wonder if there's a chance that this can result in wrong query
> results without getting errors, once the xids appear to be from the
> future - if so, should we ask people to refresh matviews if they're
> older than the pg_upgrade?

I'm inclined to think that this is a rare problem, seeing that it's been there more than 5 years and we just now identified it. Probably, most people REFRESH their matviews often enough that there aren't really old rows in them. (It seems to me that REFRESH CONCURRENTLY does share part of the blame here, because it can allow older rows to linger in the matview.) I think it's sufficient to say "refresh if you see one of these errors".

			regards, tom lane

On Wed, Feb 21, 2018 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-02-21 10:08:12 -0500, Tom Lane wrote:
>>> Yeah, that does look suspiciously like a set of facts matching this
>>> problem :-(
>
>> I'd personally say ;), given that it's one less potentially data
>> corrupting bug to worry about ;). And it's one that can be fixed without
>> dataloss to boot.
>
> I've pushed a fix for this.

Thanks for addressing this so quickly :-)

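For anyone else who hits the error on an already-upgraded cluster, the fix-up amounts to refreshing the affected matviews; something like this refreshes all of them (a sketch only - refresh selectively if full refreshes are expensive):

-- sketch: issue a plain REFRESH for every materialized view
do $$
declare
    mv regclass;
begin
    for mv in select oid::regclass from pg_class where relkind = 'm'
    loop
        execute format('refresh materialized view %s', mv);
    end loop;
end
$$;
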