Thread: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Hello, hackers.

It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age (or rows are just not cleaned up, not sure) and SELECT FOR UPDATE + UPDATE. I am not certain, but hot_standby_feedback is probably able to cause the same issues.

Steps to reproduce:

1) Start Postgres like this:

docker run -it -p 5432:5432 --name pg -e POSTGRES_PASSWORD=postgres -e LANG=C.UTF-8 -d postgres:14.6 -c vacuum_defer_cleanup_age=1000000

2) Prepare the schema:

CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY, value numeric(15,4) DEFAULT 0 NOT NULL);
INSERT INTO something_is_wrong_here (value) (SELECT 10000 FROM generate_series(0, 100));

3) Prepare a file for pgbench:

BEGIN;
SELECT omg.* FROM something_is_wrong_here AS omg ORDER BY random() LIMIT 1 FOR UPDATE \gset
UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;
END;

4) Run pgbench:

pgbench -c 50 -j 2 -n -f reproduce.bench 'port=5432 host=localhost user=postgres dbname=postgres password=postgres' -T 100 -P 1

I was able to see such a set of errors (looks scary):

ERROR: MultiXactId 30818104 has not been created yet -- apparent wraparound
ERROR: could not open file "base/13757/16385.1" (target block 39591744): previous segment is only 24 blocks
ERROR: attempted to lock invisible tuple
ERROR: could not access status of transaction 38195704
DETAIL: Could not open file "pg_subtrans/0246": No such file or directory.

Best regards,
Michail.
Hello, Andres.

I apologize for the direct ping, but I think your snapshot scalability work in PG14 could be related to the issue.

The TransactionIdRetreatedBy implementation looks correct... But with txid_current=212195 I see errors like "could not access status of transaction 58643736"... So, maybe vacuum_defer_cleanup_age just highlights some special case (something with "previous" xids on the left side of zero?)....

Thanks,
Michail.
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
From
Matthias van de Meent
Date:
On Thu, 5 Jan 2023 at 14:12, Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
>
> Hello, hackers.
>
> It seems like PG 14 works incorrectly with vacuum_defer_cleanup_age
> (or just not cleared rows, not sure) and SELECT FOR UPDATE + UPDATE.
> I am not certain, but hot_standby_feedback probably able to cause the
> same issues.
>
> Steps to reproduce:
>
> [steps]
>
> I was able to see such a set of errors (looks scary):
>
> ERROR: MultiXactId 30818104 has not been created yet -- apparent wraparound
> ERROR: could not open file "base/13757/16385.1" (target block
> 39591744): previous segment is only 24 blocks
This looks quite suspicious too - it wants to access a block at 296GB of data, where only 196kB exist.
> ERROR: attempted to lock invisible tuple
> ERROR: could not access status of transaction 38195704
> DETAIL: Could not open file "pg_subtrans/0246": No such file or directory.
I just saw two instances of this "attempted to lock invisible tuple" error for the 15.1 image (run on Docker in Ubuntu in WSL) with your reproducer script, so this does not seem to be specific to PG14 (.6).
And, after some vacuum and restarting the process, I got the following:
client 29 script 0 aborted in command 2 query 0: ERROR: heap tid from index tuple (111,1) points past end of heap page line pointer array at offset 262 of block 1 in index "something_is_wrong_here_pkey"
There is indeed something wrong there; the page can't be read by pageinspect:
$ select get_raw_page('public.something_is_wrong_here', 111)::bytea;
ERROR: invalid page in block 111 of relation base/5/16385
I don't have access to the v14 data anymore (I tried a restart, which dropped the data :-( ), but will retain my v15 instance for some time to help any debugging.
Kind regards,
Matthias van de Meent
On Thu, Jan 5, 2023 at 1:49 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> client 29 script 0 aborted in command 2 query 0: ERROR: heap tid from
> index tuple (111,1) points past end of heap page line pointer array at
> offset 262 of block 1 in index "something_is_wrong_here_pkey"

This particular error message is from the hardening added to Postgres 15 in commit e7428a99. So it's not surprising that Michail didn't see the same error on 14.

--
Peter Geoghegan
On Thu, Jan 5, 2023 at 3:27 PM Peter Geoghegan <pg@bowt.ie> wrote:
> This particular error message is from the hardening added to Postgres
> 15 in commit e7428a99. So it's not surprising that Michail didn't see
> the same error on 14.

Reproduced this on HEAD locally (no docker), without any difficulty.

FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of heap_lock_tuple() is the first thing that fails on my assert-enabled build.

--
Peter Geoghegan
Hello! Thanks for checking the issue!

> FWIW, I find that the "Assert(ItemIdIsNormal(lp));" at the top of
> heap_lock_tuple() is the first thing that fails on my assert-enabled
> build.

Yes, the same for me:

TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 4292, PID: 33416

> Reproduced this on HEAD locally (no docker), without any difficulty.

It is a little bit harder without docker in my case; I need to adjust the number of connections and threads:

pgbench -c 90 -j 8 -n -f reproduce.bench 'port=5432 host=localhost user=postgres dbname=postgres password=postgres' -T 2000 -P 1

Best regards,
Michail.
Hello.

The few things I have got so far:

1) It is not required to ORDER BY random() to reproduce the issue - it could be done using queries like:

BEGIN;
SELECT omg.* FROM something_is_wrong_here AS omg ORDER BY value -- change is here
LIMIT 1 FOR UPDATE \gset
UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id;
COMMIT;

But for some reason it is harder to reproduce without random() in my case (typically I need to wait for about a minute with 100 connections).

2) It is not an issue at table creation time. The issue is reproducible if vacuum_defer_cleanup_age is set after table preparation.

3) To reproduce the issue, vacuum_defer_cleanup_age should flip the xid over zero (be >= txid_current()). And it is stable.... So, for example, I am unable to reproduce with the value 733, but 734 gives the error each time. Just a single additional txid_current() (after the data is filled) avoids the crash... It looks like the first SELECT FOR UPDATE + UPDATE silently poisons everything somehow.

You could use such a psql script:

DROP TABLE IF EXISTS something_is_wrong_here;
CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY, value numeric(15,4) DEFAULT 0 NOT NULL);
INSERT INTO something_is_wrong_here (value) (SELECT 10000 FROM generate_series(0, 100));
SELECT txid_current() \gset
SELECT :txid_current + 1 AS txid \gset
ALTER SYSTEM SET vacuum_defer_cleanup_age TO :txid;
SELECT pg_reload_conf();

I have attached some scripts if someone goes to reproduce.

Best regards,
Michail.
Hi,

Thomas, CCing you because of the 64bit xid representation aspect.

On 2023-01-06 00:39:44 +0300, Michail Nikolaev wrote:
> I apologize for the direct ping, but I think your snapshot scalability
> work in PG14 could be related to the issue.

Good call!

> The TransactionIdRetreatedBy implementation looks correct... But with
> txid_current=212195 I see errors like "could not access status of
> transaction 58643736"...
> So, maybe vacuum_defer_cleanup_age just highlights some special case
> (something with "previous" xids on the left side of zero?)....

I think the bug is close to TransactionIdRetreatedBy(). Arguably in FullXidRelativeTo(). Or a more fundamental data representation issue with 64bit xids.

To explain, here's a trace to the bottom of GetSnapshotData() leading to the problem. In the case I'm looking at here we end up with

720:		oldestfxid = FullXidRelativeTo(latest_completed, oldestxid);

and xmin is 255271, both correct. Then in TransactionIdRetreatedBy:

	/* apply vacuum_defer_cleanup_age */
	def_vis_xid_data = TransactionIdRetreatedBy(xmin, vacuum_defer_cleanup_age);

things start to be iffy. Because we retreat by vacuum_defer_cleanup_age, which was set to txid_current() in scheme.sql, and the xmin above is that xid, TransactionIdRetreatedBy() first ends up with 0. It then backtracks further to the highest 32bit xid (i.e. 4294967295). So far so good. We could obviously end up with values further in the past as well, if vacuum_defer_cleanup_age were larger.

Things start to seriously go off the rails when we convert that 32bit xid to 64 bit with:

	def_vis_fxid = FullXidRelativeTo(latest_completed, def_vis_xid);

which returns {value = 18446744073709551615}, which is 0-1 in 64bit. However, as 64bit xids are not supposed to wrap around, we're in trouble - it's an xid *very* far into the future, allowing things to be pruned that shouldn't be, because everything is below that.

I don't quite know how to best fix this.

4294967295 is the correct result for TransactionIdRetreatedBy() in this case, and it'd not cause problems for FullXidRelativeTo() if we actually had wrapped around the xid counter before (since then we'd just represent it as an fxid "of the proper epoch"). Basically, in contrast to 32bit xids, 64bit xids cannot represent xids from before the start of the universe, whereas with the modulo arithmetic of 32bit xids that's not a real problem.

It's probably not too hard to fix specifically in this one place - we could just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but it strikes me as a somewhat larger issue for the 64bit xid infrastructure. I suspect this might not be the only place running into problems with such "before the universe" xids.

For a bit I was thinking that we should introduce the notion that a FullTransactionId can be from the past. Specifically that when the upper 32bit are all set, we treat the lower 32bit as a value from before xid 0 using the normal 32bit xid arithmetic. But it sucks to add overhead for that everywhere.

It might be a bit more palatable to designate one individual value, e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far before the start of the universe an xid points to...

FullXidRelativeTo() did assert that the input 32bit xid is in a reasonable range, but unfortunately didn't do a similar check for the 64bit case.

Greetings,

Andres Freund
Hi,

On 2023-01-07 21:06:06 +0300, Michail Nikolaev wrote:
> 2) It is not an issue at table creation time. Issue is reproducible if
> vacuum_defer_cleanup_age set after table preparation.
>
> 3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid
> over zero (be >= txid_current()).
> And it is stable.... So, for example - unable to reproduce with 733
> value, but 734 gives error each time.
> Just a single additional txid_current() (after data is filled) fixes a
> crash... It looks like the first SELECT FOR UPDATE + UPDATE silently
> poisons everything somehow.
> You could use such PSQL script:

FWIW, the concrete value for vacuum_defer_cleanup_age is not crucial to encounter the problem. It just needs to be a value that, when compared to the xid that did the "INSERT INTO something_is_wrong_here", results in a value <= 0. Setting vacuum_defer_cleanup_age to a value much larger than the xid allows the crash to be encountered repeatedly.

Greetings,

Andres Freund
Hi,

On 2023-01-07 16:29:23 -0800, Andres Freund wrote:
> It's probably not too hard to fix specifically in this one place - we could
> just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
> it strikes me as a somewhat larger issue for the 64bit xid infrastructure. I
> suspect this might not be the only place running into problems with such
> "before the universe" xids.

I haven't found other problematic places in HEAD, but did end up finding a less serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify that with vacuum_defer_cleanup_age set, GetFullRecentGlobalXmin() returns values that look likely to cause problems. It's "just" used in gist, luckily.

It's hard to find places that do this kind of arithmetic; we traditionally haven't had a helper for it, so it's open-coded in various ways.

	xidStopLimit = xidWrapLimit - 3000000;
	if (xidStopLimit < FirstNormalTransactionId)
		xidStopLimit -= FirstNormalTransactionId;

and oddly:

	xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age;
	if (xidVacLimit < FirstNormalTransactionId)
		xidVacLimit += FirstNormalTransactionId;

or (in < 14):

	RecentGlobalXmin = globalxmin - vacuum_defer_cleanup_age;
	if (!TransactionIdIsNormal(RecentGlobalXmin))
		RecentGlobalXmin = FirstNormalTransactionId;

The currently existing places I found, other than the ones in procarray.c, luckily don't seem to convert the xids to 64bit xids.

> For a bit I was thinking that we should introduce the notion that a
> FullTransactionId can be from the past. Specifically that when the upper 32bit
> are all set, we treat the lower 32bit as a value from before xid 0 using the
> normal 32bit xid arithmetic. But it sucks to add overhead for that
> everywhere.
>
> It might be a bit more palatable to designate one individual value,
> e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> before the start of the universe an xid points to...

On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I hacked up a patch that converts various fxid functions to inline functions with such assertions, and it indeed quickly catches the problem this thread reported, close to the source of the use.

One issue with that is that it'd reduce what can be input for the xid8 type. But it's hard to believe that'd be a real issue?

It's quite unfortunate that we don't have a test for vacuum_defer_cleanup_age yet :(.

Greetings,

Andres Freund
On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-01-07 16:29:23 -0800, Andres Freund wrote:
> > It's probably not too hard to fix specifically in this one place - we could
> > just clamp vacuum_defer_cleanup_age to be smaller than latest_completed, but
> > it strikes me as a somewhat larger issue for the 64bit xid infrastructure. I
> > suspect this might not be the only place running into problems with such
> > "before the universe" xids.
>
> I haven't found other problematic places in HEAD, but did end up finding a
> less serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did
> verify that with vacuum_defer_cleanup_age set, GetFullRecentGlobalXmin()
> returns values that look likely to cause problems. It's "just" used in gist,
> luckily.
>
> It's hard to find places that do this kind of arithmetic, we traditionally
> haven't had a helper for it. So it's open-coded in various ways.
> [...]
>
> The currently existing places I found, other than the ones in procarray.c,
> luckily don't seem to convert the xids to 64bit xids.

That's good to know.

> > For a bit I was thinking that we should introduce the notion that a
> > FullTransactionId can be from the past. Specifically that when the upper 32bit
> > are all set, we treat the lower 32bit as a value from before xid 0 using the
> > normal 32bit xid arithmetic. But it sucks to add overhead for that
> > everywhere.
> >
> > It might be a bit more palatable to designate one individual value,
> > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > before the start of the universe an xid points to...
>
> On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
> hacked up a patch that converts various fxid functions to inline functions
> with such assertions, and it indeed quickly catches the problem this thread
> reported, close to the source of the use.

Wouldn't it be enough to only fix the constructions in FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does not occur with the patch), and (optionally) bump the first XID available for any cluster to (FirstNormalXid + 1) to retain the 'older than any running transaction' property?

The change only fixes the issue for FullTransactionId, which IMO is OK - I don't think we need to keep xid->xid8->xid symmetric in cases of xid8 wraparound.

> One issue with that is that it'd reduce what can be input for the xid8
> type. But it's hard to believe that'd be a real issue?

Yes, it's unlikely anyone would ever hit that with our current WAL format - we use 24 bytes/xid just to log its use, so we'd use at most epoch 0x1000_0000 in unrealistic scenarios. In addition, technically we already have (3*2^32 - 3) "invalid" xid8 values that can never be produced in FullXidRelativeTo - those few extra invalid values don't matter much to me, except "even more special casing".

Kind regards,

Matthias van de Meent
Hi,

Robert, Mark, CCing you because of the amcheck aspect (see below).

On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:
> On Sun, 8 Jan 2023 at 04:09, Andres Freund <andres@anarazel.de> wrote:
> > > For a bit I was thinking that we should introduce the notion that a
> > > FullTransactionId can be from the past. Specifically that when the upper 32bit
> > > are all set, we treat the lower 32bit as a value from before xid 0 using the
> > > normal 32bit xid arithmetic. But it sucks to add overhead for that
> > > everywhere.
> > >
> > > It might be a bit more palatable to designate one individual value,
> > > e.g. 2^32-1<<32, as a "before xid 0" indicator - it doesn't matter how far
> > > before the start of the universe an xid points to...
> >
> > On IM Thomas suggested we could reserve the 2^32-1 epoch for invalid values. I
> > hacked up a patch that converts various fxid functions to inline functions
> > with such assertions, and it indeed quickly catches the problem this thread
> > reported, close to the source of the use.
>
> Wouldn't it be enough to only fix the constructions in
> FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
> not occur with the patch), and (optionally) bump the first XID
> available for any cluster to (FirstNormalXid + 1) to retain the 'older
> than any running transaction' property?

It's not too hard to fix in individual places, but I suspect that we'll introduce the bug in future places without some more fundamental protection. Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value in ComputeXidHorizons() and GetSnapshotData().

Fixing it in FullXidRelativeTo() doesn't seem quite right: while it's ok to just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age, it doesn't seem necessarily correct for other cases.

> The change only fixes the issue for FullTransactionId, which IMO is OK
> - I don't think we need to keep xid->xid8->xid symmetric in cases of
> xid8 wraparound.

I think we should keep that symmetric, it just gets too confusing / easy to miss bugs otherwise.

> > One issue with that is that it'd reduce what can be input for the xid8
> > type. But it's hard to believe that'd be a real issue?
>
> Yes, it's unlikely anyone would ever hit that with our current WAL
> format - we use 24 bytes/xid just to log its use, so we'd use at
> most epoch 0x1000_0000 in unrealistic scenarios. In addition,
> technically we already have (3*2^32 - 3) "invalid" xid8 values that
> can never be produced in FullXidRelativeTo - those few extra invalid
> values don't matter much to me, except "even more special casing".

Yep.

The attached 0002 is a first implementation of this. The new assertions found at least one bug in amcheck, and one further example of the problem of representing past 32bit xids in 64bit:

1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in update_cached_xid_range(), we end up using the xid 0 (or an outdated value in subsequent calls) to determine whether the epoch needs to be reduced.

2) One test generates an xid from the future (4026531839), which causes the epoch to wrap around (via the epoch--) in FullTransactionIdFromXidAndCtx(). I've hackily fixed that by just representing it as an xid from the future instead, but I'm not sure that's a good answer.

A different approach would be to represent fxids as *signed* 64bit integers. That'd of course lose more range, but could represent everything accurately, and would have a compatible on-disk representation on two's complement platforms (all our platforms). I think the only place that'd need special treatment is U64FromFullTransactionId() / its callers. I think this might be the most robust approach.

Greetings,

Andres Freund
On Tue, Jan 10, 2023 at 8:34 AM Andres Freund <andres@anarazel.de> wrote:
> A different approach would be to represent fxids as *signed* 64bit
> integers. That'd of course lose more range, but could represent everything
> accurately, and would have a compatible on-disk representation on two's
> complement platforms (all our platforms). I think the only place that'd need
> special treatment is U64FromFullTransactionId() / its callers. I think this
> might be the most robust approach.

It does sound like an interesting approach; it means you are free to retreat arbitrarily without ever thinking about it, and by the arguments given (LSN space required to consume fxids) it's still 'enough'. Essentially all these bugs are places where the author already believed it worked that way. (Two's complement is required in the C23 draft.)
> On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:
>
> 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> subsequent calls) to determine whether epoch needs to be reduced.

Can you say a bit more about your analysis here, preferably with pointers to the lines of code you are analyzing? Does the problem exist in amcheck as currently committed, or are you thinking about a problem that arises only after applying your patch? I'm a bit fuzzy on where xid 0 gets used.

Thanks

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:
> > On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:
> >
> > 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> > update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> > subsequent calls) to determine whether epoch needs to be reduced.
>
> Can you say a bit more about your analysis here, preferably with pointers to
> the lines of code you are analyzing? Does the problem exist in amcheck as
> currently committed, or are you thinking about a problem that arises only
> after applying your patch? I'm a bit fuzzy on where xid 0 gets used.

The problems exist in the code as currently committed. I'm not sure what exactly the consequences are; the result is that oldest_fxid will be, at least temporarily, bogus.

Consider the first call to update_cached_xid_range():

/*
 * Update our cached range of valid transaction IDs.
 */
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
	/* Make cached copies */
	LWLockAcquire(XidGenLock, LW_SHARED);
	ctx->next_fxid = ShmemVariableCache->nextXid;
	ctx->oldest_xid = ShmemVariableCache->oldestXid;
	LWLockRelease(XidGenLock);

	/* And compute alternate versions of the same */
	ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
	ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}

The problem is that the call to FullTransactionIdFromXidAndCtx() happens before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx() uses ctx->next_xid:

static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
	uint32		epoch;

	if (!TransactionIdIsNormal(xid))
		return FullTransactionIdFromEpochAndXid(0, xid);
	epoch = EpochFromFullTransactionId(ctx->next_fxid);
	if (xid > ctx->next_xid)
		epoch--;
	return FullTransactionIdFromEpochAndXid(epoch, xid);
}

Because ctx->next_xid is 0, due to not having been set yet, "xid > ctx->next_xid" will always be true, leading to epoch being reduced by one. In the common case of there never having been an xid wraparound, we'll thus underflow epoch, generating an xid far into the future.

The tests encounter the issue today. If you add

	Assert(TransactionIdIsValid(ctx->next_xid));
	Assert(FullTransactionIdIsValid(ctx->next_fxid));

early in FullTransactionIdFromXidAndCtx() it'll be hit in the amcheck/pg_amcheck tests.

Greetings,

Andres Freund
> On Jan 9, 2023, at 2:07 PM, Andres Freund <andres@anarazel.de> wrote:
>
> The tests encounter the issue today. If you add
> 	Assert(TransactionIdIsValid(ctx->next_xid));
> 	Assert(FullTransactionIdIsValid(ctx->next_fxid));
> early in FullTransactionIdFromXidAndCtx() it'll be hit in the
> amcheck/pg_amcheck tests.

Ok, I can confirm that. I find the assertion

	Assert(epoch != (uint32) -1);

a bit simpler to reason about, but either way, I agree it is a bug. Thanks for finding this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:
> > Wouldn't it be enough to only fix the constructions in
> > FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
> > not occur with the patch), and (optionally) bump the first XID
> > available for any cluster to (FirstNormalXid + 1) to retain the 'older
> > than any running transaction' property?
>
> It's not too hard to fix in individual places, but I suspect that we'll
> introduce the bug in future places without some more fundamental protection.
>
> Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
> in ComputeXidHorizons() and GetSnapshotData().

I don't think that clamping the value with oldestXid (as seen in patch 0001, in GetSnapshotData) is right. It would clamp the value relative to the oldest frozen xid of all databases, which can be millions of transactions behind oldestXmin, and thus severely skew the amount of transactions' changes you keep on disk (that is, until oldestXid moves past 1000000).

A similar case can be made for the changes in ComputeXidHorizons - for the described behaviour of vacuum_defer_cleanup_age we would need to clamp the used offset separately for each of the fields in the horizon result, to retain all transaction data for the first 1 million transactions and the ones that may still see those transactions.

> Fixing it in FullXidRelativeTo() doesn't seem quite right, while it's ok to
> just return FirstNormalTransactionId in the case of vacuum_defer_cleanup_age,
> it doesn't seem necessarily correct for other cases.

Understood.

Kind regards,

Matthias van de Meent
Hi,

On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-01-09 17:50:10 +0100, Matthias van de Meent wrote:
> > > Wouldn't it be enough to only fix the constructions in
> > > FullXidRelativeTo() and widen_snapshot_xid() (as attached, $topic does
> > > not occur with the patch), and (optionally) bump the first XID
> > > available for any cluster to (FirstNormalXid + 1) to retain the 'older
> > > than any running transaction' property?
> >
> > It's not too hard to fix in individual places, but I suspect that we'll
> > introduce the bug in future places without some more fundamental protection.
> >
> > Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
> > in ComputeXidHorizons() and GetSnapshotData().
>
> I don't think that clamping the value with oldestXid (as seen in patch
> 0001, in GetSnapshotData) is right.

I agree that using oldestXid to clamp is problematic.

> It would clamp the value relative to the oldest frozen xid of all
> databases, which can be millions of transactions behind oldestXmin,
> and thus severely skew the amount of transactions' changes you keep on
> disk (that is, until oldestXid moves past 1000000).

What precisely do you mean with "skew" here? Do you just mean that it'd take a long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like you might mean more than that?

I'm tempted to go with reinterpreting 64bit xids as signed. Except that it seems like a mighty invasive change to backpatch.

Greetings,

Andres Freund
On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > On Mon, 9 Jan 2023 at 20:34, Andres Freund <andres@anarazel.de> wrote:
> > > It's not too hard to fix in individual places, but I suspect that we'll
> > > introduce the bug in future places without some more fundamental protection.
> > >
> > > Locally I fixed it by clamping vacuum_defer_cleanup_age to a reasonable value
> > > in ComputeXidHorizons() and GetSnapshotData().
> >
> > I don't think that clamping the value with oldestXid (as seen in patch
> > 0001, in GetSnapshotData) is right.
>
> I agree that using oldestXid to clamp is problematic.
>
> > It would clamp the value relative to the oldest frozen xid of all
> > databases, which can be millions of transactions behind oldestXmin,
> > and thus severely skew the amount of transactions' changes you keep on
> > disk (that is, until oldestXid moves past 1000000).
>
> What precisely do you mean with "skew" here? Do you just mean that it'd take a
> long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> you might mean more than that?

h->oldest_considered_running can be extremely old due to the global nature of the value and the potential existence of a snapshot in another database that started in parallel to a very old running transaction.

Example: with vacuum_defer_cleanup_age set to 1000000, it is possible that a snapshot in another database (thus another backend) would result in a local intermediate status result of h->o_c_r = 20, h->s_o_n = 20, h->d_o_n = 10030. The clamped offset would then be 20 (clamped using h->o_c_r), which updates h->data_oldest_nonremovable to 10010. The obvious result is that all but the last 20 transactions from this database's data files are available for cleanup, which contradicts the intention of the vacuum_defer_cleanup_age GUC.

> I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> seems like a mighty invasive change to backpatch.

I'm not sure either. Protecting against underflow by halving the effective valid value space is quite the intervention, but if it is necessary to make this work in a performant manner, it would be worth it. Maybe someone else with more experience can provide their opinion here.

Kind regards,

Matthias van de Meent
Hello.

I have registered it as a patch in the commitfest: https://commitfest.postgresql.org/42/4138/

Best regards,
Michail.
Hi,

On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:
> > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > you might mean more than that?
>
> h->oldest_considered_running can be extremely old due to the global
> nature of the value and the potential existence of a snapshot in
> another database that started in parallel to a very old running
> transaction.

Here's a version that, I think, does not have that issue.

In an earlier, not posted, version I had a vacuum_defer_cleanup_age-specific helper function for this, but it seems likely we'll need it in other places too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by pointer, as the line lengths / repetition otherwise end up making the code hard to read. For now I have TransactionIdRetreatSafely() be private to procarray.c, but I expect we'll have to change that eventually.

Not sure I like TransactionIdRetreatSafely() as a name. Maybe TransactionIdRetreatClamped() is better?

I've been working on a test for vacuum_defer_cleanup_age. It does catch the corruption at hand, but not much more. It's quite painful to write right now. Some of the reasons:
https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de

> > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > seems like a mighty invasive change to backpatch.
>
> I'm not sure either. Protecting against underflow by halving the
> effective valid value space is quite the intervention, but if it is
> necessary to make this work in a performant manner, it would be worth
> it. Maybe someone else with more experience can provide their opinion
> here.

The attached assertions just remove 1/2^32 of the space, by reserving the xid range with the upper 32bit set as something that shouldn't be reachable. That still requires us to change the input routines to reject that range, but I think that's a worthy tradeoff. I didn't find the existing limits for the type to be documented anywhere.

Obviously something like that could only go into HEAD.

Greetings,

Andres Freund
Attachment
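The clamping idea discussed above can be illustrated with a minimal standalone sketch. All names here are illustrative; the committed TransactionIdRetreatSafely() in procarray.c works on PostgreSQL's FullTransactionId type and differs in detail:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FirstNormalTransactionId (xids 0..2 are reserved). */
#define FIRST_NORMAL_FXID UINT64_C(3)

/*
 * Retreat a 64-bit xid by `by`, clamping at the first normal fxid
 * instead of wrapping around "before fxid 0".  A plain subtraction
 * here would underflow for small fxids, which is the class of bug
 * under discussion.
 */
static uint64_t
retreat_safely(uint64_t fxid, uint64_t by)
{
    if (fxid > by && fxid - by >= FIRST_NORMAL_FXID)
        return fxid - by;
    return FIRST_NORMAL_FXID;
}
```

With vacuum_defer_cleanup_age = 1000000 and a young cluster (fxid well below one million), the unclamped subtraction would wrap to an enormous "future" xid; the clamped version pins the horizon at the first normal xid instead.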
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
From
Matthias van de Meent
Date:
On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-10 21:32:54 +0100, Matthias van de Meent wrote:
> > On Tue, 10 Jan 2023 at 20:14, Andres Freund <andres@anarazel.de> wrote:
> > > On 2023-01-10 15:03:42 +0100, Matthias van de Meent wrote:
> > > What precisely do you mean with "skew" here? Do you just mean that it'd take a
> > > long time until vacuum_defer_cleanup_age takes effect? Somehow it sounds like
> > > you might mean more than that?
> >
> > h->oldest_considered_running can be extremely old due to the global
> > nature of the value and the potential existence of a snapshot in
> > another database that started in parallel to a very old running
> > transaction.
>
> Here's a version that, I think, does not have that issue.

Thanks!

> In an earlier, not posted, version I had a vacuum_defer_cleanup_age specific
> helper function for this, but it seems likely we'll need it in other places
> too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> pointer, as the line lengths / repetition otherwise end up making it hard to
> read the code. For now I have TransactionIdRetreatSafely() be private to
> procarray.c, but I expect we'll have to change that eventually.

If TransactionIdRetreatSafely will be exposed outside procarray.c, then I think the xid pointer should be replaced with normal arguments/returns; both for parity with TransactionIdRetreatedBy and to remove this memory store dependency in this hot code path.

> Not sure I like TransactionIdRetreatSafely() as a name. Maybe
> TransactionIdRetreatClamped() is better?

I think the 'safely' version is fine.

> I've been working on a test for vacuum_defer_cleanup_age. It does catch the
> corruption at hand, but not much more. It's quite painful to write, right
> now. Some of the reasons:
> https://postgr.es/m/20230130194350.zj5v467x4jgqt3d6%40awork3.anarazel.de
>
> > > I'm tempted to go with reinterpreting 64bit xids as signed. Except that it
> > > seems like a mighty invasive change to backpatch.
> >
> > I'm not sure either. Protecting against underflow by halving the
> > effective valid value space is quite the intervention, but if it is
> > necessary to make this work in a performant manner, it would be worth
> > it. Maybe someone else with more experience can provide their opinion
> > here.
>
> The attached assertions just removes 1/2**32'ths of the space, by reserving
> the xid range with the upper 32bit set as something that shouldn't be
> reachable.

I think that is acceptable.

> Still requires us to change the input routines to reject that range, but I
> think that's a worthy tradeoff.

Agreed.

> I didn't find the existing limits for the
> type to be documented anywhere.
>
> Obviously something like that could only go into HEAD.

Yeah.

Comments on 0003:

> + /*
> + * FIXME, doubtful this is the best fix.
> + *
> + * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> + * 0. Represent it as an xid from the future instead.
> + */
> + if (epoch == 0)
> + return FullTransactionIdFromEpochAndXid(0, xid);

Shouldn't this be an error condition instead, as this XID should not be able to appear?

On 0004:

> - '0xffffffffffffffff'::xid8,
> - '-1'::xid8;
> + '0xefffffffffffffff'::xid8,
> + '0'::xid8;

The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also test on 0xffff_fffE_ffff_ffff to test for input of our actual max value?

> @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
> while (*str != '\0')
> {
> /* read next value */
> - val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> + raw_fxid = strtou64(str, &endp, 10);
> +
> + val = FullTransactionIdFromU64(raw_fxid);
> + if (!InFullTransactionIdRange(raw_fxid))
> + goto bad_format;

With assertions enabled FullTransactionIdFromU64 will assert the InFullTransactionIdRange condition, meaning we wouldn't hit the branch into bad_format. I think these operations should be swapped, as parsing a snapshot shouldn't run into assertion failures like this if it can error normally.

Maybe this can be added to tests as well?

Kind regards,

Matthias van de Meent
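The swap Matthias suggests — validate the raw value before converting it — can be sketched standalone like this. The range bound and helper names are assumptions based on the discussion (the reserved range with the upper 32 bits set), not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical upper bound implied by reserving the fxid range whose
 * upper 32 bits are all set; the real check is InFullTransactionIdRange()
 * from the patch under discussion.
 */
#define MAX_VALID_FXID UINT64_C(0xFFFFFFFEFFFFFFFF)

/*
 * Validate the raw value *before* converting it, so out-of-range input
 * takes the bad_format path instead of tripping an assertion inside the
 * conversion routine.  Returns 1 on success, 0 for bad format.
 */
static int
parse_fxid(uint64_t raw_fxid, uint64_t *val)
{
    if (raw_fxid > MAX_VALID_FXID)
        return 0;            /* goto bad_format in the real code */
    *val = raw_fxid;         /* stands in for FullTransactionIdFromU64() */
    return 1;
}
```

The point is purely ordering: with the conversion first, an assert-enabled build dies before the graceful error path can run.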
Hi,

On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:
> On Mon, 30 Jan 2023 at 21:19, Andres Freund <andres@anarazel.de> wrote:
> > In an earlier, not posted, version I had a vacuum_defer_cleanup_age specific
> > helper function for this, but it seems likely we'll need it in other places
> > too. So I named it TransactionIdRetreatSafely(). I made it accept the xid by
> > pointer, as the line lengths / repetition otherwise end up making it hard to
> > read the code. For now I have TransactionIdRetreatSafely() be private to
> > procarray.c, but I expect we'll have to change that eventually.
>
> If TransactionIdRetreatSafely will be exposed outside procarray.c,
> then I think the xid pointer should be replaced with normal
> arguments/returns; both for parity with TransactionIdRetreatedBy

That's why I named one version *Retreat, the other Retreated :)

I think it'll make the code easier to read in the other places too; the variable names / function names in this space are uncomfortably long to fit into 78 chars, particularly when there are two references to the same variable in the same line.

> and to remove this memory store dependency in this hot code path.

I doubt that matters much here and in the places it's going to be used. And presumably the compiler will inline it anyway. I'd probably make it a static inline in the header too.

What's making me hesitate about exposing it is that it's quite easy to get things wrong by using a wrong fxid or such.

> > + /*
> > + * FIXME, doubtful this is the best fix.
> > + *
> > + * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> > + * 0. Represent it as an xid from the future instead.
> > + */
> > + if (epoch == 0)
> > + return FullTransactionIdFromEpochAndXid(0, xid);
>
> Shouldn't this be an error condition instead, as this XID should not
> be able to appear?

If you mean error in the sense of ERROR, no, I don't think so. That code tries hard to be able to check many tuples in a row. And if we were to error out here, we'd not be able to do that. We should still report those tuples as corrupt, fwiw.

The reason this path is hit is that a test intentionally corrupts some xids. So the path is reachable and we need to cope somehow.

I'm not really satisfied with this fix either - I mostly wanted to include something sufficient to prevent assertion failures. I had hoped that Mark would look at the amcheck bits and come up with more complete fixes.

> on 0004:
>
> > - '0xffffffffffffffff'::xid8,
> > - '-1'::xid8;
> > + '0xefffffffffffffff'::xid8,
> > + '0'::xid8;
>
> The 0xFF... usages were replaced with "0xEFFF...". Shouldn't we also
> test on 0xffff_fffE_ffff_ffff to test for input of our actual max
> value?

Probably a good idea.

> > @@ -326,7 +329,11 @@ parse_snapshot(const char *str, Node *escontext)
> > while (*str != '\0')
> > {
> > /* read next value */
> > - val = FullTransactionIdFromU64(strtou64(str, &endp, 10));
> > + raw_fxid = strtou64(str, &endp, 10);
> > +
> > + val = FullTransactionIdFromU64(raw_fxid);
> > + if (!InFullTransactionIdRange(raw_fxid))
> > + goto bad_format;
>
> With assertions enabled FullTransactionIdFromU64 will assert the
> InFullTransactionIdRange condition, meaning we wouldn't hit the branch
> into bad_format.
> I think these operations should be swapped, as parsing a snapshot
> shouldn't run into assertion failures like this if it can error
> normally.

Yep.

> Maybe this can be added to tests as well?

I'll check. I thought for a bit it'd not work because we'd perform range checks on the xids, but we don't...

Greetings,

Andres Freund
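The fallback Andres describes — representing an otherwise-unrepresentable xid as an epoch-0 xid "from the future" so the scan can continue and the tuple can still be reported as corrupt — might be sketched like so. This is a simplified illustration with plain integer types, not the amcheck code itself:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit xid to a 64-bit fxid relative to the current
 * (epoch, next_xid).  A 32-bit xid numerically above next_xid normally
 * belongs to the previous epoch; with epoch 0 there is no previous
 * epoch, so fall back to treating it as an epoch-0 xid "from the
 * future" rather than erroring out mid-scan.
 */
static uint64_t
fxid_relative_to(uint32_t xid, uint32_t epoch, uint32_t next_xid)
{
    if (xid <= next_xid)
        return ((uint64_t) epoch << 32) | xid;
    if (epoch == 0)
        return xid;          /* can't go before fxid 0 */
    return ((uint64_t) (epoch - 1) << 32) | xid;
}
```

The epoch == 0 branch is exactly the case the quoted FIXME handles: it is only reachable via corrupt xids (the test corrupts some deliberately), and continuing beats aborting the whole check.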
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
From
Matthias van de Meent
Date:
On Tue, 31 Jan 2023 at 23:48, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-01-31 15:05:17 +0100, Matthias van de Meent wrote:
> > If TransactionIdRetreatSafely will be exposed outside procarray.c,
> > then I think the xid pointer should be replaced with normal
> > arguments/returns; both for parity with TransactionIdRetreatedBy
>
> That's why I named one version *Retreat the other Retreated :)

That part I noticed too :) I don't mind either way, I was just concerned with exposing the function as a prototype, not as an inline static.

> I think it'll make the code easier to read in the other places too, the
> variable names / function names in this space are uncomfortably long to
> fit into 78chars..., particularly when there's two references to the
> same variable in the same line.

I guess that's true, and once inlined there should indeed be no extra runtime overhead.

> 78 chars

Didn't we use 80 columns/chars? How did you get to 78? Not that I can't think of any ways, but none of them stand out to me as obviously correct.

> > and to remove this memory store dependency in this hot code path.
>
> I doubt that matters much here and the places it's going to be used
> in.

I thought that this was executed while still in ProcArrayLock, but instead we've released that lock already by the time we're trying to adjust the horizons, so the 'hot code path' concern is mostly relieved.

> And presumably the compiler will inline it anyway. I'd probably make
> it a static inline in the header too.

Yes, my concern was based on an extern prototype with a private implementation, as that does prohibit inlining and thus would have a requirement to push the data to memory (probably only L1, but still memory).

> What's making me hesitate about exposing it is that it's quite easy to
> get things wrong by using a wrong fxid or such.

I'm less concerned about that when the function is well-documented.

> > > + /*
> > > + * FIXME, doubtful this is the best fix.
> > > + *
> > > + * Can't represent the 32bit xid as a 64bit xid, as it's before fxid
> > > + * 0. Represent it as an xid from the future instead.
> > > + */
> > > + if (epoch == 0)
> > > + return FullTransactionIdFromEpochAndXid(0, xid);
> >
> > Shouldn't this be an error condition instead, as this XID should not
> > be able to appear?
>
> If you mean error in the sense of ERROR, no, I don't think so. That code
> tries hard to be able to check many tuples in a row. And if we were to
> error out here, we'd not be able to do that. We should still report those
> tuples as corrupt, fwiw.
>
> The reason this path is hit is that a test intentionally corrupts some
> xids. So the path is reachable and we need to cope somehow.

I see.

Kind regards,

Matthias van de Meent
Hi,

Heikki, Andrey, CCing you because you wrote

commit 6655a7299d835dea9e8e0ba69cc5284611b96f29
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: 2019-07-24 20:24:07 +0300

    Use full 64-bit XID for checking if a deleted GiST page is old enough.

On 2023-01-07 19:09:56 -0800, Andres Freund wrote:
> I haven't found other problematic places in HEAD, but did end up finding a less
> serious version of this bug in < 14: GetFullRecentGlobalXmin(). I did verify
> that with vacuum_defer_cleanup_age set GetFullRecentGlobalXmin() returns
> values that look likely to cause problems. It's "just" used in gist, luckily.

Is there a good way to make breakage in the page recycling mechanism visible with gist? I guess to see corruption, I'd have to halt a scan before a page is visited with gdb, then cause the page to be recycled prematurely in another session, then unblock the first? Which'd then visit that page, thinking it to be in a different part of the tree than it actually is?

I'm pretty sure it's broken though. On 13, with vacuum_defer_cleanup_age=0, two consecutive VACUUM VERBOSEs in the attached script output

106 index pages have been deleted, 0 are currently reusable.
106 index pages have been deleted, 0 are currently reusable.

in the presence of a prepared transaction. Which makes sense. But with vacuum_defer_cleanup_age=10000:

106 index pages have been deleted, 0 are currently reusable.
106 index pages have been deleted, 106 are currently reusable.

which clearly doesn't seem right. I just can't quite judge how bad that is.

Greetings,

Andres Freund
Attachment
Hello, Andres.

> Not sure I like TransactionIdRetreatSafely() as a name. Maybe
> TransactionIdRetreatClamped() is better?

I think it is better to just replace TransactionIdRetreatedBy. It is not used anymore after `v3-0001-WIP-Fix-corruption-due-to-vacuum_defer_cleanup_ag.patch` - so, it is better to replace the dangerous version in order to avoid similar issues in the future. But we also need to move FullXidRelativeTo in that case (not sure that is safe).

> I think it'll make the code easier to read in the other places too, the
> variable names / function names in this space are uncomfortably long to
> fit into 78chars..., particularly when there's two references to the
> same variable in the same line.

Looks fine to my taste, but it is pretty far from perfect :)

Best regards,
Michail.
Attachment
On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
>
> Is there a good way to make breakage in the page recycling mechanism
> visible with gist? I guess to see corruption, I'd have to halt a scan
> before a page is visited with gdb, then cause the page to be recycled
> prematurely in another session, then unblock the first? Which'd then
> visit that page, thinking it to be in a different part of the tree than
> it actually is?

In most cases landing on one extra page will not affect the scan. The worst case that I can imagine is a scan landing on a page that is the new parent of the deleted page. Even then we cannot end up with an infinite index scan - we will just make one extra loop, although the IndexScan will yield duplicate tids. In case of interference with a concurrent insertion we will get a tree structure that departs from optimal, but that is not a problem.

Best regards, Andrey Borodin.
On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
> Is there a good way to make breakage in the page recycling mechanism
> visible with gist? I guess to see corruption, I'd have to halt a scan
> before a page is visited with gdb, then cause the page to be recycled
> prematurely in another session, then unblock the first? Which'd then
> visit that page, thinking it to be in a different part of the tree than
> it actually is?

Yes. This bug is similar to an ancient nbtree bug fixed back in 2012, by commit d3abbbeb.

> which clearly doesn't seem right.
>
> I just can't quite judge how bad that is.

It's really hard to judge, even if you're an expert. We're talking about a fairly chaotic scenario. My guess is that there is a very small chance of a very unpleasant scenario if you have a GiST index that has regular page deletions, and if you use vacuum_defer_cleanup_age. It's likely that most GiST indexes never have any page deletions due to the workload characteristics.

--
Peter Geoghegan
Hi,

On 2023-02-04 11:10:55 -0800, Peter Geoghegan wrote:
> On Sat, Feb 4, 2023 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
> > Is there a good way to make breakage in the page recycling mechanism
> > visible with gist? I guess to see corruption, I'd have to halt a scan
> > before a page is visited with gdb, then cause the page to be recycled
> > prematurely in another session, then unblock the first? Which'd then
> > visit that page, thinking it to be in a different part of the tree than
> > it actually is?
>
> Yes. This bug is similar to an ancient nbtree bug fixed back in 2012,
> by commit d3abbbeb.
>
> > which clearly doesn't seem right.
> >
> > I just can't quite judge how bad that is.
>
> It's really hard to judge, even if you're an expert. We're talking
> about a fairly chaotic scenario. My guess is that there is a very
> small chance of a very unpleasant scenario if you have a GiST index
> that has regular page deletions, and if you use
> vacuum_defer_cleanup_age. It's likely that most GiST indexes never
> have any page deletions due to the workload characteristics.

Thanks. Sounds like the problem here is too hard to repro. I mostly wanted to know how to be more confident about a fix working correctly. There are no tests for the whole page recycling behaviour, afaics, so it's a bit scary to change things around.

I didn't quite feel confident pushing a fix for this just before a minor release, so I'll push once the minor releases are tagged: a quite minimal fix to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in 14+.

Greetings,

Andres Freund
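The minimal 12-13 fix described above — clamp to FirstNormalTransactionId when RecentGlobalXmin is numerically above the next xid while the epoch is still 0 — might look roughly like this sketch (plain integer types for illustration; not the actual committed code):

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_NORMAL_XID UINT64_C(3)

/*
 * Build a full 64-bit xmin from the current epoch, the next 32-bit xid,
 * and the 32-bit RecentGlobalXmin.  If RecentGlobalXmin > next_xid it
 * belongs to the previous epoch; with epoch 0 there is no previous
 * epoch, so clamp instead of computing an fxid "before 0".
 */
static uint64_t
full_recent_global_xmin(uint32_t epoch, uint32_t next_xid,
                        uint32_t recent_xmin)
{
    if (recent_xmin <= next_xid)
        return ((uint64_t) epoch << 32) | recent_xmin;
    if (epoch == 0)
        return FIRST_NORMAL_XID;    /* the clamp described above */
    return ((uint64_t) (epoch - 1) << 32) | recent_xmin;
}
```

Without the epoch == 0 branch the result would be an enormous "future" fxid, which is what let GiST consider deleted pages reusable far too early in the VACUUM VERBOSE experiment shown earlier.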
Hi,

On 2023-02-06 13:02:05 -0800, Andres Freund wrote:
> I didn't quite feel confident pushing a fix for this just before a minor
> release, so I'll push once the minor releases are tagged. A quite minimal fix
> to GetFullRecentGlobalXmin() in 12-13 (returning FirstNormalTransactionId if
> epoch == 0 and RecentGlobalXmin > nextxid_xid), and the slightly larger fix in
> 14+.

Pushed that.

Mark: I worked some more on the fixes for amcheck.

The second amcheck fix ends up correcting some inaccurate output in the tests - previously xids from before xid 0 were reported to be in the future.

Previously there was no test case exercising exceeding nextxid, without wrapping around into the past. I added that at the end of 004_verify_heapam.pl, because renumbering seemed too annoying.

What do you think?

Somewhat random note:

Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's effectively O(ROWCOUNT^2), albeit with small enough constants to not really matter. I don't think we need to insert the rows one-by-one either. Changing that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't change that, but we also fire off a psql for each tuple for heap_page_items(), with offset $N no less. That seems to be another 500ms.

Greetings,

Andres Freund
Attachment
> On Mar 8, 2023, at 4:15 PM, Andres Freund <andres@anarazel.de> wrote:
>
> I worked some more on the fixes for amcheck.
>
> The second amcheck fix ends up correcting some inaccurate output in the tests
> - previously xids from before xid 0 were reported to be in the future.
>
> Previously there was no test case exercising exceeding nextxid, without
> wrapping around into the past. I added that at the end of
> 004_verify_heapam.pl, because renumbering seemed too annoying.
>
> What do you think?

The changes look reasonable to me.

> Somewhat random note:
>
> Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
> effectively O(ROWCOUNT^2), albeit with small enough constants to not really
> matter. I don't think we need to insert the rows one-by-one either. Changing
> that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't
> change that, but we also fire off a psql for each tuple for heap_page_items(),
> with offset $N no less. That seems to be another 500ms.

I don't recall the reasoning. Feel free to optimize the tests.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-03-09 12:15:16 -0800, Mark Dilger wrote:
> > Somewhat random note:
> >
> > Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's
> > effectively O(ROWCOUNT^2), albeit with small enough constants to not really
> > matter. I don't think we need to insert the rows one-by-one either. Changing
> > that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't
> > change that, but we also fire off a psql for each tuple for heap_page_items(),
> > with offset $N no less. That seems to be another 500ms.
>
> I don't recall the reasoning. Feel free to optimize the tests.

Something like the attached.

I don't know enough perl to know how to interpolate something like

    use constant ROWCOUNT => 17;

so I just made it a variable.

Greetings,

Andres Freund
Attachment
> On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Something like the attached.

I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

> I don't know enough perl to know how to interpolate something like
> use constant ROWCOUNT => 17;
> so I just made it a variable.

Seems fair. I certainly don't mind.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:
> > On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:
> >
> > Something like the attached.
>
> I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.

I did check that, yes :). My process of writing perl is certainly, uh, iterative. No way I would get anything close to working without testing it.

CI now finished the tests as well:
https://cirrus-ci.com/build/6675457702100992

So I'll go ahead and push that.

Greetings,

Andres Freund
On 12.03.23 00:41, Andres Freund wrote:
> Hi,
>
> On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:
>>> On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> Something like the attached.
>>
>> I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.
>
> I did check that, yes :). My process of writing perl is certainly, uh,
> iterative. No way I would get anything close to working without testing it.
>
> CI now finished the tests as well:
> https://cirrus-ci.com/build/6675457702100992
>
> So I'll go ahead and push that.

There is a small issue with this commit (a4f23f9b3c).

In src/bin/pg_amcheck/t/004_verify_heapam.pl, there is code to detect whether the page layout matches expectations, and if not it calls plan skip_all. This commit adds a test

    is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");

*before* that skip_all call. This appears to be invalid. If the skip_all happens, you get a complaint like

    t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
    Parse errors: Bad plan. You planned 0 tests but ran 1.

We could move the is() test after all the skip_all's. Any thoughts?
Hi,

On 2023-05-10 17:44:07 +0200, Peter Eisentraut wrote:
> On 12.03.23 00:41, Andres Freund wrote:
> > Hi,
> >
> > On 2023-03-11 15:34:55 -0800, Mark Dilger wrote:
> > > > On Mar 11, 2023, at 3:22 PM, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > Something like the attached.
> > >
> > > I like that your patch doesn't make the test longer. I assume you've already run the tests and that it works.
> >
> > I did check that, yes :). My process of writing perl is certainly, uh,
> > iterative. No way I would get anything close to working without testing it.
> >
> > CI now finished the tests as well:
> > https://cirrus-ci.com/build/6675457702100992
> >
> > So I'll go ahead and push that.
>
> There is a small issue with this commit (a4f23f9b3c).
>
> In src/bin/pg_amcheck/t/004_verify_heapam.pl, there is code to detect
> whether the page layout matches expectations and if not it calls plan
> skip_all.

Some of these skip_all's don't seem like a good idea. Why is a broken relfrozenxid a cause for skipping a test? But anyway, that's really unrelated to the topic at hand.

> This commit adds a test
>
> is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
>
> *before* that skip_all call. This appears to be invalid. If the skip_all
> happens, you get a complaint like
>
> t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
> Parse errors: Bad plan. You planned 0 tests but ran 1.
>
> We could move the is() test after all the skip_all's. Any thoughts?

I think the easiest fix is to just die if we can't get the offsets - it's not like we can really continue afterwards...

Greetings,

Andres Freund
On 10.05.23 20:04, Andres Freund wrote:
>> This commit adds a test
>>
>> is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
>>
>> *before* that skip_all call. This appears to be invalid. If the skip_all
>> happens, you get a complaint like
>>
>> t/004_verify_heapam.pl (Wstat: 0 Tests: 1 Failed: 0)
>> Parse errors: Bad plan. You planned 0 tests but ran 1.
>>
>> We could move the is() test after all the skip_all's. Any thoughts?
>
> I think the easiest fix is to just die if we can't get the offsets - it's not
> like we can really continue afterwards...

This should do it:

    -is(scalar @lp_off, $ROWCOUNT, "acquired row offsets");
    +scalar @lp_off == $ROWCOUNT or BAIL_OUT("row offset counts mismatch");

But I'm not sure what the latest thinking on BAIL_OUT is. It is used nearby in a similar way though.
Hello Andres,
12.03.2023 02:41, Andres Freund wrote:
CI now finished the tests as well: https://cirrus-ci.com/build/6675457702100992 So I'll go ahead and push that.
As I mentioned at [1], `meson test` fails on Windows x86 platform during
the test pg_amcheck/004_verify_heapam (I'm using VS 2022 Version 17.9.7):
meson setup build --wipe -Dcassert=true
cd build & ninja & meson test
... postgresql:pg_amcheck / pg_amcheck/004_verify_heapam ERROR 6.95s exit status 25
004_verify_heapam_test.log contains:
TRAP: failed Assert("FullTransactionIdIsNormal(fxid)"), File: "../contrib/amcheck/verify_heapam.c", Line: 1915, PID: 2560
2024-07-04 20:56:54.592 PDT [9780] LOG: server process (PID 2560) was terminated by exception 0xC0000409
2024-07-04 20:56:54.592 PDT [9780] DETAIL: Failed process was running: SELECT v.blkno, v.offnum, v.attnum, v.msg FROM pg_catalog.pg_class c, "public".verify_heapam(
relation := c.oid, on_error_stop := false, check_toast := true, skip := 'none'
) v WHERE c.oid = 16438 AND c.relpersistence != 't'
`git bisect` for this anomaly pointed at 4f5d461e0.
(I couldn't compile Postgres on that commit, but with
`git show 53ea2b7ad | git apply` (see also [2]) it's possible.)
The Assert in question is:
else
fxid = FullTransactionIdFromU64(nextfxid_i - diff);
Assert(FullTransactionIdIsNormal(fxid));
It was not clear to me how it comes out that fxid is not normal, until I
looked at the disassembly:
else
fxid = FullTransactionIdFromU64(nextfxid_i - diff);
751812D2 sub ebx,eax
751812D4 sbb edi,edx
Assert(FullTransactionIdIsNormal(fxid));
751812D6 jne FullTransactionIdFromXidAndCtx+0E6h (751812F6h)
751812D8 jb FullTransactionIdFromXidAndCtx+0CFh (751812DFh)
751812DA cmp ebx,3
751812DD jae FullTransactionIdFromXidAndCtx+0E6h (751812F6h)
751812DF push 77Bh
751812E4 push offset string "../contrib/amcheck/verify_heapa@"... (7518C4A4h)
751812E9 push offset string "FullTransactionIdIsNormal(fxid)" (7518DB04h)
751812EE call _ExceptionalCondition (75189FFEh)
The same code fragment for your convenience:
https://ideone.com/8wiGRY
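For readers puzzling over the disassembly above: FullTransactionIdIsNormal(fxid) is, in effect, a check that the 64-bit value is >= 3 (FirstNormalFullTransactionId). On 32-bit x86 the compiler has to lower the 64-bit subtraction to a sub/sbb pair and the comparison to word-wise branches. A simplified standalone equivalent (not the PostgreSQL macro itself, and deliberately ignoring some of the branch ordering in the quoted code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Word-wise version of "is this 64-bit value >= 3?", roughly matching
 * the compare sequence in the disassembly: a nonzero high word means
 * the value is at least 2^32 (so certainly normal); otherwise the low
 * word is compared against 3 (the "cmp ebx,3 / jae" pair).
 */
static int
is_normal_split(uint32_t hi, uint32_t lo)
{
    if (hi != 0)
        return 1;
    return lo >= 3;
}

/* Reference: the straight 64-bit comparison. */
static int
is_normal_u64(uint64_t v)
{
    return v >= UINT64_C(3);
}
```

Note that an unsigned underflow in `nextfxid_i - diff` would produce a huge value that still passes this check; the Assert can only fire when the subtraction genuinely lands on 0, 1, or 2.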
Could you please look at this?
[1] https://www.postgresql.org/message-id/72705e42-42d1-ac6e-e7d5-4baec8a0d2af%40gmail.com
[2] https://postgr.es/m/17967-cd21e34a314141b2@postgresql.org
Best regards,
Alexander