Thread: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15896 Logged by: Christoph Berg Email address: myon@debian.org PostgreSQL version: 12beta2 Operating system: Debian Description: When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is raised. (Spotted by Debian's postgresql-common upgrade tests. Previously missed because we were only testing version+1 upgrades.) TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, Zeile: 665) Reproducer: 10.9: CREATE TABLE phone (name varchar(255) PRIMARY KEY, tel int NOT NULL); INSERT INTO phone VALUES ('Alice', 2); CREATE USER foo; GRANT INSERT ON phone TO foo; INSERT INTO phone VALUES ('Bob', 1); pg_upgradecluster -m upgrade 10 foo (runs /usr/lib/postgresql/12/bin/initdb -D /var/lib/postgresql/12/foo --auth-local peer --auth-host md5 --encoding UTF8 --lc-collate de_DE.utf8 --lc-ctype de_DE.utf8 /usr/lib/postgresql/12/bin/pg_upgrade -b /usr/lib/postgresql/10/bin -B /usr/lib/postgresql/12/bin -p 5434 -P 5435 -d /etc/postgresql/10/foo -D /etc/postgresql/12/foo) 12beta2: SELECT * FROM phone ORDER BY name; TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, Zeile: 665) 2019-07-05 11:34:35.040 CEST [14325] LOG: Serverprozess (PID 14376) wurde von Signal 6 beendet: Abgebrochen 2019-07-05 11:34:35.040 CEST [14325] DETAIL: Der fehlgeschlagene Prozess führte aus: SELECT * FROM phone ORDER BY name; Combinations tested: bad: 10->12 9.6->12 9.6->13 good: 10->11 11->12 12->13
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP:FailedAssertion(»!(metad->btm_version >= 3)«
From
Bernd Helmle
Date:
Am Freitag, den 05.07.2019, 09:59 +0000 schrieb PG Bug reporting form: > 12beta2: > SELECT * FROM phone ORDER BY name; > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: > »/build/postgresql-12-3URvLF/postgresql-12- > 12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, > Zeile: 665) > 2019-07-05 11:34:35.040 CEST [14325] LOG: Serverprozess (PID 14376) > wurde > von Signal 6 beendet: Abgebrochen > 2019-07-05 11:34:35.040 CEST [14325] DETAIL: Der fehlgeschlagene > Prozess > führte aus: SELECT * FROM phone ORDER BY name; Additional note from me after a short test: It works if you upgrade to PG11 first _and_ have touched the index in a way it has rewritten the metapage to version 3, e.g. adding new pages to the index. Thanks Bernd
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Alvaro Herrera
Date:
Adding Peter G. to CC. On 2019-Jul-05, PG Bug reporting form wrote: > When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is > raised. (Spotted by Debian's postgresql-common upgrade tests. Previously > missed because we were only testing version+1 upgrades.) > > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: > »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, > Zeile: 665) Seems that _bt_getrootheight is too optimistic about the metapage version it'll find. I suppose this could be handled by just not caching the metapage if it is of the old version ... or maybe by calling _bt_upgrademetapage(). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Fri, Jul 5, 2019 at 8:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: > > »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, > > Zeile: 665) > > Seems that _bt_getrootheight is too optimistic about the metapage > version it'll find. I suppose this could be handled by just not caching > the metapage if it is of the old version ... or maybe by calling > _bt_upgrademetapage(). The problem here predates v12 -- the call to _bt_cachemetadata() was added to _bt_getrootheight() by commit 0a64b45152b, which went into v11. My commit dd299df8189 added a new assertion that fails, but that's just a symptom -- I changed the code in _bt_getrootheight() to use a BTMetaPageData pointer to shared memory (i.e. a pointer to the authoritative version), rather than using the newly-out-of-sync cached version. It shouldn't be out-of-sync at all. _bt_getrootheight() is mostly just something that exists for the planner, so it has no business calling _bt_cachemetadata(), which will "upgrade" the cached metadata image from version 2 to version 3 if it happens to be on version 2. How can it be okay to upgrade the cached version without also upgrading the on-disk/shared_buffers version? This bug was hiding in plain sight. -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Fri, Jul 5, 2019 at 3:00 AM PG Bug reporting form <noreply@postgresql.org> wrote: > When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is > raised. (Spotted by Debian's postgresql-common upgrade tests. Previously > missed because we were only testing version+1 upgrades.) Obviously you were right to adopt pg_upgrade tests that cross more than one version at a time. Thanks! -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Christoph Berg
Date:
Re: Peter Geoghegan 2019-07-06 <CAH2-WznYXDG6aANoag5ixoMrV=0bD-W5i9cVq4cRkb-fA4xRmw@mail.gmail.com> > > When pg_upgrading from 10-or-earlier to 12beta2 or 13devel, an assertion is > > raised. (Spotted by Debian's postgresql-common upgrade tests. Previously > > missed because we were only testing version+1 upgrades.) > > Obviously you were right to adopt pg_upgrade tests that cross more > than one version at a time. Thanks! Indeed, and running 9.6->10->11 tests (that were present before as well, but never run) revealed another bug in pg_upgradecluster itself... Testsuites are just awesome. Christoph
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Fri, Jul 5, 2019 at 8:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > TRAP: FailedAssertion(»!(metad->btm_version >= 3)«, Datei: > > > »/build/postgresql-12-3URvLF/postgresql-12-12~beta2/build/../src/backend/access/nbtree/nbtpage.c«, > > > Zeile: 665) > > > > Seems that _bt_getrootheight is too optimistic about the metapage > > version it'll find. I suppose this could be handled by just not caching > > the metapage if it is of the old version ... or maybe by calling > > _bt_upgrademetapage(). > > The problem here predates v12 -- the call to _bt_cachemetadata() was > added to _bt_getrootheight() by commit 0a64b45152b, which went into > v11. My commit dd299df8189 added a new assertion that fails, but > that's just a symptom -- I changed the code in _bt_getrootheight() to > use a BTMetaPageData pointer to shared memory (i.e. a pointer to the > authoritative version), rather than using the newly-out-of-sync cached > version. It shouldn't be out-of-sync at all. > > _bt_getrootheight() is mostly just something that exists for the > planner, so it has no business calling _bt_cachemetadata(), which will > "upgrade" the cached metadata image from version 2 to version 3 if it > happens to be on version 2. How can it be okay to upgrade the cached > version without also upgrading the on-disk/shared_buffers version? > This bug was hiding in plain sight. CC'ing Alexander, who was also involved in commit 0a64b45152b... -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Michael Paquier
Date:
On Thu, Jul 11, 2019 at 12:18:46AM -0700, Peter Geoghegan wrote: > On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <pg@bowt.ie> wrote: >> _bt_getrootheight() is mostly just something that exists for the >> planner, so it has no business calling _bt_cachemetadata(), which will >> "upgrade" the cached metadata image from version 2 to version 3 if it >> happens to be on version 2. How can it be okay to upgrade the cached >> version without also upgrading the on-disk/shared_buffers version? >> This bug was hiding in plain sight. > > CC'ing Alexander, who was also involved in commit 0a64b45152b... So the problem has been introduced in v11 per this commit, still we only see the issue since v12 because your code relied on a wrong assumption. Per what I am reading, it seems to me that we should fix v11, but that's a live problem for v12 because of the page format upgrade so we need to track it as an open item. -- Michael
Attachment
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Thu, Jul 11, 2019 at 1:16 AM Michael Paquier <michael@paquier.xyz> wrote: > So the problem has been introduced in v11 per this commit, still we > only see the issue since v12 because your code relied on a wrong > assumption. We see the issue on v12 because my code added an assertion that caught the issue. The behavior on v12 is otherwise substantively the same as on v11. Even if it was true that my commits made nbtree rely on the same "wrong assumption" more often than on v11, why would it matter? The bug is still a v11 bug. > Per what I am reading, it seems to me that we should fix > v11, but that's a live problem for v12 because of the page format > upgrade so we need to track it as an open item. The highest priority ought to be fixing the problem in the stable v11 branch. That said, there is hardly any difference between v11 and v12. It isn't possible to upgrade a B-Tree index from v2/v3 to v4 (the latest version) on-the-fly. But if in your judgement it makes sense to add a v12 open item, I have no objection. -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Fri, Jul 5, 2019 at 3:14 PM Peter Geoghegan <pg@bowt.ie> wrote: > The problem here predates v12 -- the call to _bt_cachemetadata() was > added to _bt_getrootheight() by commit 0a64b45152b, which went into > v11. My commit dd299df8189 added a new assertion that fails, but > that's just a symptom -- I changed the code in _bt_getrootheight() to > use a BTMetaPageData pointer to shared memory (i.e. a pointer to the > authoritative version), rather than using the newly-out-of-sync cached > version. It shouldn't be out-of-sync at all. The attached HEAD-only patch fixes the issue by not actually upgrading the page within _bt_cachemetadata(), while making some existing assertions less aggressive. I haven't tested it properly just yet -- ran out of time to do that today. Maybe you can try this out with your original test case, Christoph? Thanks -- Peter Geoghegan
Attachment
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Christoph Berg
Date:
Re: Peter Geoghegan 2019-07-16 <CAH2-Wz=jvYQmPgSxZCdHe1NwK+S3j1jF6ydc1G0eHs7BNdpn2w@mail.gmail.com> > Maybe you can try this out with your original test case, Christoph? I applied the patch to 12/HEAD and the Debian testsuite now passes the 10->12 upgrade test. Thanks, Christoph
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Tue, Jul 16, 2019 at 1:13 AM Christoph Berg <myon@debian.org> wrote: > I applied the patch to 12/HEAD and the Debian testsuite now passes the > 10->12 upgrade test. Thanks for testing! I am not quite sure if I should push ahead with this, simply because I don't know what the point of commit 0a64b45152b really was (Alexandar? Teodor?). Why not just make the assertions a bit more less strict in one or two places? Is the _bt_cachemetadata() function really necessary? Can we remove it now? AFAICT the only purpose that _bt_cachemetadata() serves that isn't better handled by updating the assertions that were failing back in April of 2018 (and still sometimes fail) is initializing the new-to-v3 fields defensively (initializing btm_oldest_btpo_xact and btm_last_cleanup_num_heap_tuples are set to their default values). Even that seems unnecessary, since every piece of code knows that it isn't sensible to read those values. Including contrib/pageinspect, which was actually taught this by commit 0a64b45152b itself. I'm a bit nervous about pushing a commit that will almost be a straight revert of 0a64b45152b without first getting confirmation that _bt_cachemetadata() is actually totally unnecessary. -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Wed, Jul 17, 2019 at 4:10 PM Peter Geoghegan <pg@bowt.ie> wrote: > I am not quite sure if I should push ahead with this, simply because I > don't know what the point of commit 0a64b45152b really was (Alexandar? > Teodor?). Why not just make the assertions a bit more less strict in > one or two places? Is the _bt_cachemetadata() function really > necessary? Can we remove it now? Attached revision of the patch shows what I have in mind -- this is almost a straight revert of 0a64b45152b. This is really no different than the first version, though it couldn't hurt to test it once more if you have time, Christoph. I wonder why commit 0a64b45152b didn't look like this in the first place. My approach is the obvious way to fix the problem that 0a64b45152b was designed to fix. Am I missing something? Prior to Postgres v11, we never supported upgrades -- the nbtree version was 2 for many many years (since before we even had pg_upgrade to worry about). Obviously that meant that assertions that looked like "Assert(metad->btm_version == BTREE_VERSION)" had to be updated by commit 857f9c36cda (this was a big feature commit that commit 0a64b45152b tried to fix-up). It looks like commit 857f9c36cda simply missed a few assertions in the upgrade path, including _bt_getrootheight(). So, again, I don't know why commit 0a64b45152b had to go any further than updating 2 or 3 assertions that were overlooked in commit 857f9c36cda. -- Peter Geoghegan
Attachment
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Christoph Berg
Date:
Re: Peter Geoghegan 2019-07-18 <CAH2-Wznb+-L9_HEjSGZD0Wty8Auqmfkna7pNN6nu+oWis7iWYA@mail.gmail.com> > Attached revision of the patch shows what I have in mind -- this is > almost a straight revert of 0a64b45152b. This is really no different > than the first version, though it couldn't hurt to test it once more > if you have time, Christoph. That version works as well on 12 HEAD. It passes "check world" and the Debian testsuite on top of that running 10->12 upgrade tests. Christoph
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Thu, Jul 18, 2019 at 2:30 AM Christoph Berg <myon@debian.org> wrote: > That version works as well on 12 HEAD. It passes "check world" and the > Debian testsuite on top of that running 10->12 upgrade tests. Pushed a slightly refined version of the same patch back to the v11 branch. Thanks for the report, and for testing! -- Peter Geoghegan
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Alvaro Herrera
Date:
On 2019-Jul-18, Peter Geoghegan wrote: > On Thu, Jul 18, 2019 at 2:30 AM Christoph Berg <myon@debian.org> wrote: > > That version works as well on 12 HEAD. It passes "check world" and the > > Debian testsuite on top of that running 10->12 upgrade tests. > > Pushed a slightly refined version of the same patch back to the v11 branch. Thank you! I wonder what the consequences are in a system that doesn't have assertions enabled. Do we just end up with an "upgraded" cached metapage, and no further ill effects? That seems pretty benign ... I hope we don't end up with an expected v4 metapage or something crazy like that? (I suppose we would have noticed.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15896: pg_upgrade from 10-or-earlier: TRAP: FailedAssertion(»!(metad->btm_version >= 3)«
From
Peter Geoghegan
Date:
On Thu, Jul 18, 2019 at 1:57 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I wonder what the consequences are in a system that doesn't have > assertions enabled. Do we just end up with an "upgraded" cached > metapage, and no further ill effects? That seems pretty benign ... AFAICT, it's benign. The new fields added to the metapage by commit 857f9c36cda were put there because that happened to be convenient. They were only really something that VACUUM is expected to care about. The usual concerns about having a stale meta page don't apply to those fields. The design of commit 0a64b45152b seems to have been "let's just pretend that it's a v3 page in the cached version, not a v2 page, since code that might used a cached version won't care about the difference, and code that does care about the difference definitely won't use the cache". This was made much more complicated by the fact that 0a64b45152b didn't really seem to admit that it was doing this -- it's very weird that the cached version could be *ahead of* the authoritative version. It's now possible that the cached version number will become stale on v11, but that should be harmless for the same reason as it was harmless when it happened in the opposite direction. Besides, there are well established ways in which a cached nbtree metapage is allowed to become stale -- comments above _bt_getrootheight() have always said that that's okay. The situation here was confusing enough that it seemed worth backpatching a fix to v11, even though I believe that nobody using v11 will have see any misbehavior related to commit 0a64b45152b. > I hope we don't end up with an expected v4 metapage or something crazy > like that? (I suppose we would have noticed.) I think that we'd have heard about it by now. -- Peter Geoghegan