Thread: Failures in constraints regression test, "read only 0 of 8192 bytes"

These two animals seem to have got mixed up about the size of
this relation in the same place:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53

+++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
2024-03-01 08:22:11.624897033 +0100
@@ -573,42 +573,38 @@
  UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
 BEGIN;
 INSERT INTO unique_tbl VALUES (1, 'five');
+ERROR:  could not read blocks 0..0 in file "base/16384/21437": read
only 0 of 8192 bytes

That error message changed slightly in my smgrreadv() commit a couple
of months ago (it would have been "block 0" and now it's "blocks 0..0"
because now we can read more than one block at a time) but I don't
immediately see how anything at that low level could be responsible
for this.
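
For context, that report comes from the short-read check in md.c's vectored
read path; a rough sketch of its shape (the variable names here are
illustrative, not the actual ones):

    /* sketch of the short-read check in the vectored read path; names illustrative */
    if (nbytes != expected_bytes)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("could not read blocks %u..%u in file \"%s\": read only %zu of %zu bytes",
                        blocknum, blocknum + nblocks - 1,
                        FilePathName(v->mdfd_vfd),
                        (size_t) nbytes, (size_t) expected_bytes)));

So the low-level report itself is straightforward; the puzzle is why block 0
is being requested from a file that is too short to contain it.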



These are "my" animals (running at a local university). There's a couple
interesting details:

1) the animals run on the same machine (one with gcc, one with clang)

2) I did upgrade the OS and restarted the machine on 2024/02/26, i.e.
right before the failures started

These might be just coincidences, but maybe something got broken by the
upgrade ... OTOH it's weird it'd affect just HEAD and none of the other
branches, and on two different compilers.

Just to be sure I removed the buildroot, in case there's something wrong
with ccache. It's a wild guess, but I don't have any other idea.

regards

On 3/2/24 22:39, Thomas Munro wrote:
> These two animals seem to have got mixed up about the size of
> this relation in the same place:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2024-02-28%2017%3A34%3A30
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2024-03-01%2006%3A47%3A53
> 
> +++ /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/regress/results/constraints.out
> 2024-03-01 08:22:11.624897033 +0100
> @@ -573,42 +573,38 @@
>   UNIQUE (i) DEFERRABLE INITIALLY DEFERRED;
>  BEGIN;
>  INSERT INTO unique_tbl VALUES (1, 'five');
> +ERROR:  could not read blocks 0..0 in file "base/16384/21437": read
> only 0 of 8192 bytes
> 
> That error message changed slightly in my smgrreadv() commit a couple
> of months ago (it would have been "block 0" and now it's "blocks 0..0"
> because now we can read more than one block at a time) but I don't
> immediately see how anything at that low level could be responsible
> for this.
> 
> 

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Saturday, 2 March 2024 at 23:29:52 CET, Tomas Vondra wrote:
> These are "my" animals (running at a local university). There's a couple
> interesting details:

Hi Tomas,
do you still have the failing cluster data?

Noah pointed me to this thread, and it looks a bit similar to the FSM
corruption issue I'm facing:
https://www.postgresql.org/message-id/1925490.taCxCBeP46%40aivenlaptop

So if you still have the data, it would be nice to see if you indeed have a
corrupted FSM, and if you have indications when it happened.

Best regards,

--
Ronan Dunklau






On 3/4/24 14:16, Ronan Dunklau wrote:
> On Saturday, 2 March 2024 at 23:29:52 CET, Tomas Vondra wrote:
>> These are "my" animals (running at a local university). There are a couple
>> of interesting details:
> 
> Hi Tomas,
> do you still have the failing cluster data?
> 
> Noah pointed me to this thread, and it looks a bit similar to the FSM 
> corruption issue I'm facing:
> https://www.postgresql.org/message-id/1925490.taCxCBeP46%40aivenlaptop
> 
> So if you still have the data, it would be nice to see if you indeed have a 
> corrupted FSM, and if you have indications when it happened.
> 

Sorry, I nuked the buildroot so I don't have the data anymore. Let's see
if it fails again.

regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Happened again.  I see this is OpenSUSE.  Does that mean the file
system is Btrfs?



On 3/8/24 09:33, Thomas Munro wrote:
> Happened again.  I see this is OpenSUSE.  Does that mean the file
> system is Btrfs?


It is, but I don't think that matters - I've been able to reproduce this
locally on my laptop using ext4 filesystem. I'd bet the important piece
here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
only animals running with this).

Also, if this really is a filesystem (or environment) issue, it seems
very strange it'd only affect HEAD and not the other branches. So it
seems quite likely this is actually triggered by a commit.

Looking at the commits from the good/bad runs, I see this:

avocet: good=4c2369a bad=f5a465f
trilobite: good=d13ff82 bad=def0ce3

That means the commit would have to be somewhere here:

f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
28e858c0f95 Improve documentation and GUC description for ...
a661bf7b0f5 Remove flaky isolation tests for timeouts
874d817baa1 Multiple revisions to the GROUP BY reordering tests
466979ef031 Replace lateral references to removed rels in subqueries
a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
d360e3cc60e Fix compiler warning on typedef redeclaration
8af25652489 Introduce a new smgr bulk loading facility.
e612384fc78 Fix mistake in SQL features list
d13ff82319c Fix BF failure in commit 93db6cbda0.

My guess would be 8af25652489, as it's the only storage-related commit.

I'm currently running tests to verify this.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




On 3/8/24 13:21, Tomas Vondra wrote:
> On 3/8/24 09:33, Thomas Munro wrote:
>> Happened again.  I see this is OpenSUSE.  Does that mean the file
>> system is Btrfs?
> 
> 
> It is, but I don't think that matters - I've been able to reproduce this
> locally on my laptop using ext4 filesystem. I'd bet the important piece
> here is -DCLOBBER_CACHE_ALWAYS (and it seems avocet/trilobite are the
> only animals running with this).
> 
> Also, if this really is a filesystem (or environment) issue, it seems
> very strange it'd only affect HEAD and not the other branches. So it
> seems quite likely this is actually triggered by a commit.
> 
> Looking at the commits from the good/bad runs, I see this:
> 
> avocet: good=4c2369a bad=f5a465f
> trilobite: good=d13ff82 bad=def0ce3
> 
> That means the commit would have to be somewhere here:
> 
> f5a465f1a07 Promote assertion about !ReindexIsProcessingIndex to ...
> 57b28c08305 Doc: fix minor typos in two ECPG function descriptions.
> 28e858c0f95 Improve documentation and GUC description for ...
> a661bf7b0f5 Remove flaky isolation tests for timeouts
> 874d817baa1 Multiple revisions to the GROUP BY reordering tests
> 466979ef031 Replace lateral references to removed rels in subqueries
> a6b2a51e16d Avoid dangling-pointer problem with partitionwise ...
> d360e3cc60e Fix compiler warning on typedef redeclaration
> 8af25652489 Introduce a new smgr bulk loading facility.
> e612384fc78 Fix mistake in SQL features list
> d13ff82319c Fix BF failure in commit 93db6cbda0.
> 
> My guess would be 8af25652489, as it's the only storage-related commit.
> 
> I'm currently running tests to verify this.
> 

Yup, the breakage starts with this commit. I haven't looked into the
root cause, or whether the commit maybe just made some pre-existing
issue easier to hit. Also, I haven't followed the discussion on the
pgsql-bugs thread [1], maybe there are some interesting findings.

[1]
https://www.postgresql.org/message-id/flat/1878547.tdWV9SEqCh%40aivenlaptop


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Friday, 8 March 2024 at 14:36:48 CET, Tomas Vondra wrote:
> > My guess would be 8af25652489, as it's the only storage-related commit.
> >
> > I'm currently running tests to verify this.
>
> Yup, the breakage starts with this commit. I haven't looked into the
> root cause, or whether the commit maybe just made some pre-existing
> issue easier to hit. Also, I haven't followed the discussion on the
> pgsql-bugs thread [1], maybe there are some interesting findings.
>

If that happens only on HEAD and not on 16, and doesn't involve WAL replay,
then it's not the same bug.

--
Ronan Dunklau





On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 3/8/24 13:21, Tomas Vondra wrote:
> > My guess would be 8af25652489, as it's the only storage-related commit.
> >
> > I'm currently running tests to verify this.
> >
>
> Yup, the breakage starts with this commit. I haven't looked into the
> root cause, or whether the commit maybe just made some pre-existing
> issue easier to hit. Also, I haven't followed the discussion on the
> pgsql-bugs thread [1], maybe there are some interesting findings.

Adding Heikki.




On 3/8/24 21:29, Thomas Munro wrote:
> On Sat, Mar 9, 2024 at 2:36 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 3/8/24 13:21, Tomas Vondra wrote:
>>> My guess would be 8af25652489, as it's the only storage-related commit.
>>>
>>> I'm currently running tests to verify this.
>>>
>>
>> Yup, the breakage starts with this commit. I haven't looked into the
>> root cause, or whether the commit maybe just made some pre-existing
>> issue easier to hit. Also, I haven't followed the discussion on the
>> pgsql-bugs thread [1], maybe there are some interesting findings.
> 
> Adding Heikki.

I spent a bit of time investigating this today, but haven't made much
progress due to (a) my unfamiliarity with the smgr code in general and
the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
time consuming to iterate and experiment.

However, the smallest schedule that still reproduces the issue is:

-------------------
test: test_setup

test: create_aggregate create_function_sql create_cast constraints
triggers select inherit typed_table vacuum drop_if_exists
updatable_views roleattributes create_am hash_func errors infinite_recurse
-------------------

I tried to reduce the second step to just "constraints", but that does
not fail. Clearly there's some concurrency at play, and having just a
single backend makes that go away.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sat, Mar 9, 2024 at 9:48 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I spent a bit of time investigating this today, but haven't made much
> progress due to (a) my unfamiliarity with the smgr code in general and
> the patch in particular, and (b) CLOBBER_CACHE_ALWAYS making it quite
> time consuming to iterate and experiment.
>
> However, the smallest schedule that still reproduces the issue is:
>
> -------------------
> test: test_setup
>
> test: create_aggregate create_function_sql create_cast constraints
> triggers select inherit typed_table vacuum drop_if_exists
> updatable_views roleattributes create_am hash_func errors infinite_recurse
> -------------------

Thanks, reproduced here (painfully slowly).  Looking...

Huh, also look at these extra problems later in the logs of the latest
trilobite and avocet runs, further down after the short read errors:

TRAP: failed Assert("j > attnum"), File: "heaptuple.c", Line: 640, PID: 15753
postgres: autovacuum worker regression(ExceptionalCondition+0x67)[0x9c5f37]
postgres: autovacuum worker regression[0x4b60c8]
postgres: autovacuum worker regression[0x5ff735]
postgres: autovacuum worker regression[0x5fe468]
postgres: autovacuum worker regression(analyze_rel+0x133)[0x5fd5e3]
postgres: autovacuum worker regression(vacuum+0x6b6)[0x683926]
postgres: autovacuum worker regression[0x7ce5e3]
postgres: autovacuum worker regression[0x7cc4f0]
postgres: autovacuum worker regression(StartAutoVacWorker+0x22)[0x7cc152]
postgres: autovacuum worker regression[0x7d57d1]
postgres: autovacuum worker regression[0x7d37bf]
postgres: autovacuum worker regression[0x6f5a4f]

Then crash recovery fails, in one case with:

2024-03-07 20:28:18.632 CET [15860:4] WARNING:  will not overwrite a used ItemId
2024-03-07 20:28:18.632 CET [15860:5] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12
2024-03-07 20:28:18.632 CET [15860:6] PANIC:  failed to add tuple
2024-03-07 20:28:18.632 CET [15860:7] CONTEXT:  WAL redo at 0/FB07A48
for Heap/INSERT: off: 9, flags: 0x00; blkref #0: rel 1663/16384/2610,
blk 12

... and in another with:

2024-03-05 11:51:27.992 CET [25510:4] PANIC:  invalid lp
2024-03-05 11:51:27.992 CET [25510:5] CONTEXT:  WAL redo at 0/F87A8D0
for Heap/INPLACE: off: 29; blkref #0: rel 1663/16384/20581, blk 0



On Sun, Mar 10, 2024 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks, reproduced here (painfully slowly).  Looking...

I changed that ERROR to a PANIC and now I can see that
_bt_metaversion() is failing to read a meta page (block 0), and the
file is indeed of size 0 in my filesystem.  Which is not cool, for a
btree.  Looking at btbuildempty(), we have this sequence:

    bulkstate = smgr_bulk_start_rel(index, INIT_FORKNUM);

    /* Construct metapage. */
    metabuf = smgr_bulk_get_buf(bulkstate);
    _bt_initmetapage((Page) metabuf, P_NONE, 0, allequalimage);
    smgr_bulk_write(bulkstate, BTREE_METAPAGE, metabuf, true);

    smgr_bulk_finish(bulkstate);

Ooh.  One idea would be that the smgr lifetime stuff is b0rked,
introducing corruption.  Bulk write itself isn't pinning the smgr
relation, it's relying purely on the object not being invalidated,
which the theory of 21d9c3ee's commit message allowed for but ... here
it's destroyed (HASH_REMOVE'd) sooner under CLOBBER_CACHE_ALWAYS,
which I think we failed to grok.  If that's it, I'm surprised that
things don't implode more spectacularly.  Perhaps HASH_REMOVE should
clobber objects in debug builds, similar to pfree?

For that hypothesis, the corruption might not be happening in the
above-quoted code itself, because it doesn't seem to have an
invalidation acceptance point (unless I'm missing it).  Some other
bulk write got mixed up?  Not sure yet.

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).
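
To make the clobber-on-remove idea above concrete, here is a minimal
standalone sketch -- generic C, not dynahash itself -- of how poisoning a
removed entry in debug builds turns a silent use-after-remove into a
deterministic failure, much like CLOBBER_FREED_MEMORY does for pfree'd chunks:

/*
 * Standalone illustration of "clobber on remove"; this is NOT dynahash code,
 * just the general idea: wipe removed entries with a poison byte so that any
 * stale reference fails loudly instead of silently reading garbage.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Entry
{
    unsigned    key;
    unsigned    nblocks;        /* pretend this caches a relation size */
} Entry;

static Entry table[16];         /* toy "hash table": one slot per key */

static Entry *
lookup(unsigned key)
{
    return &table[key % 16];
}

static void
remove_entry(unsigned key)
{
    Entry      *e = lookup(key);

#ifdef CLOBBER_REMOVED
    memset(e, 0x7F, sizeof(Entry));     /* debug build: poison the entry */
#else
    e->key = 0;                         /* silent removal */
#endif
}

int
main(void)
{
    Entry      *ref;

    table[1].key = 1;
    table[1].nblocks = 8;

    ref = lookup(1);            /* caller keeps an unpinned reference ... */
    remove_entry(1);            /* ... which is invalidated behind its back */

    /*
     * With -DCLOBBER_REMOVED the stale reference reads the 0x7F7F7F7F poison
     * and the assertion fails immediately; without it, stale data is read and
     * the bug stays hidden until something downstream misbehaves.
     */
    printf("cached nblocks = %u\n", ref->nblocks);
    assert(ref->nblocks != 0x7F7F7F7FU);
    return 0;
}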



On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I won't be surprised if the answer is: if you're holding a reference,
> you have to get a pin (referring to bulk_write.c).

Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine.  It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().  That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required.  That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine.  It won't be destroyed until after
the end of the transaction.  Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right?
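
Concretely, a sketch of what that would look like at the end of
RelationCacheInvalidate() -- a sketch of the proposal, not the committed
patch:

    /*
     * Sketch of the proposed change: on sinval overflow we still have to
     * close md.c segments and virtual file descriptors, because smgr sinval
     * messages may have been missed, but the SMgrRelation objects themselves
     * must stay valid, since unpinned references (e.g. the one bulk_write.c
     * holds) are assumed to live until end of transaction.
     */
    smgrreleaseall();           /* was: smgrdestroyall() */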



From: Heikki Linnakangas

Thanks for diagnosing this!

On 10/03/2024 08:23, Thomas Munro wrote:
> On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I won't be surprised if the answer is: if you're holding a reference,
>> you have to get a pin (referring to bulk_write.c).
> 
> Ahhh, on second thoughts, I take that back, I think the original
> theory still actually works just fine.  It's just that somewhere in
> our refactoring of that commit, when we were vacillating between
> different semantics for 'destroy' and 'release', I think we made a
> mistake: in RelationCacheInvalidate() I think we should now call
> smgrreleaseall(), not smgrdestroyall().

Yes, I ran the reproducer with 'rr', and came to the same conclusion. 
That smgrdestroyall() call destroys the SmgrRelation, breaking
the new assumption that an unpinned SmgrRelation is valid until end of 
transaction.

> That satisfies the
> requirements for sinval queue overflow: we close md.c segments (and
> most importantly virtual file descriptors), so our lack of sinval
> records can't hurt us, we'll reopen all files as required.  That's
> what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
> smgr pointer remains valid, and retains only its "identity", eg hash
> table key, and that's also fine.  It won't be destroyed until after
> the end of the transaction.  Which was the point, and it allows things
> like bulk_write.c (and streaming_read.c) to hold an smgr reference.
> Right

+1.

I wonder if we can now simplify RelationCacheInvalidate() further. In 
the loop above that smgrdestroyall():

>     while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
>     {
>         relation = idhentry->reldesc;
> 
>         /* Must close all smgr references to avoid leaving dangling ptrs */
>         RelationCloseSmgr(relation);
> 
>         /*
>          * Ignore new relations; no other backend will manipulate them before
>          * we commit.  Likewise, before replacing a relation's relfilelocator,
>          * we shall have acquired AccessExclusiveLock and drained any
>          * applicable pending invalidations.
>          */
>         if (relation->rd_createSubid != InvalidSubTransactionId ||
>             relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
>             continue;
> 

I don't think we need to call RelationCloseSmgr() for relations created 
in the same transaction anymore.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




From: Heikki Linnakangas

On 10/03/2024 11:20, Heikki Linnakangas wrote:
> On 10/03/2024 08:23, Thomas Munro wrote:
>> On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>>> I won't be surprised if the answer is: if you're holding a reference,
>>> you have to get a pin (referring to bulk_write.c).
>>
>> Ahhh, on second thoughts, I take that back, I think the original
>> theory still actually works just fine.  It's just that somewhere in
>> our refactoring of that commit, when we were vacillating between
>> different semantics for 'destroy' and 'release', I think we made a
>> mistake: in RelationCacheInvalidate() I think we should now call
>> smgrreleaseall(), not smgrdestroyall().
> 
> Yes, I ran the reproducer with 'rr', and came to the same conclusion.
> That smgrdestroyall() call destroys the SmgrRelation, breaking
> the new assumption that an unpinned SmgrRelation is valid until end of
> transaction.
> 
>> That satisfies the
>> requirements for sinval queue overflow: we close md.c segments (and
>> most importantly virtual file descriptors), so our lack of sinval
>> records can't hurt us, we'll reopen all files as required.  That's
>> what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
>> smgr pointer remains valid, and retains only its "identity", eg hash
>> table key, and that's also fine.  It won't be destroyed until after
>> the end of the transaction.  Which was the point, and it allows things
>> like bulk_write.c (and streaming_read.c) to hold an smgr reference.
>> Right
> 
> +1.
> 
> I wonder if we can now simplify RelationCacheInvalidate() further. In
> the loop above that smgrdestroyall():
> 
>>     while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
>>     {
>>         relation = idhentry->reldesc;
>>
>>         /* Must close all smgr references to avoid leaving dangling ptrs */
>>         RelationCloseSmgr(relation);
>>
>>         /*
>>          * Ignore new relations; no other backend will manipulate them before
>>          * we commit.  Likewise, before replacing a relation's relfilelocator,
>>          * we shall have acquired AccessExclusiveLock and drained any
>>          * applicable pending invalidations.
>>          */
>>         if (relation->rd_createSubid != InvalidSubTransactionId ||
>>             relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
>>             continue;
>>
> 
> I don't think we need to call RelationCloseSmgr() for relations created
> in the same transaction anymore.

Barring objections, I'll commit the attached.

Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's 
not required for correctness AFAICS. We don't do it in single-rel 
invalidation in RelationCacheInvalidateEntry() either.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Barring objections, I'll commit the attached.

+1

I guess the comment for smgrreleaseall() could also be updated.  It
mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
overflow (InvalidateSystemCaches()) should also be mentioned?

> Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> not required for correctness AFAICS. We don't do it in single-rel
> invalidation in RelationCacheInvalidateEntry() either.

I think we do, because we have missed sinval messages.  It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files.  That is, there are new
files with the same names and we need to open those ones.
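
A tiny standalone demonstration of that hazard -- plain POSIX, nothing
PostgreSQL-specific -- showing that an already-open descriptor keeps seeing
the old, unlinked file even after a new file reuses the same name:

/* Standalone POSIX demo: a stale fd still reads the old file after its name is reused. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    const char *name = "relfilenode_12345"; /* stands in for a recycled relfilenode */
    char        buf[16] = {0};
    int         oldfd;
    int         newfd;

    /* "Old relation": create the file and keep a descriptor open. */
    oldfd = open(name, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (oldfd < 0 || write(oldfd, "OLD", 3) != 3)
        return 1;

    /* The relation is dropped and a new one reuses the same filename. */
    unlink(name);
    newfd = open(name, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (newfd < 0 || write(newfd, "NEW", 3) != 3)
        return 1;

    /* The stale descriptor still reads the unlinked old file, not the new one. */
    if (pread(oldfd, buf, sizeof(buf) - 1, 0) < 0)
        return 1;
    printf("stale fd sees: %s\n", buf);      /* prints "OLD" */

    close(oldfd);
    close(newfd);
    unlink(name);
    return 0;
}

Closing and reopening -- which is what smgrreleaseall() forces by closing the
virtual file descriptors -- makes the backend pick up the new file.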



On Mon, Mar 11, 2024 at 9:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> > not required for correctness AFAICS. We don't do it in single-rel
> > invalidation in RelationCacheInvalidateEntry() either.
>
> I think we do, because we have missed sinval messages.  It's unlikely
> but a relfilenode might have been recycled, and we might have file
> descriptors that point to the unlinked files.  That is, there are new
> files with the same names and we need to open those ones.

... though I think you would be right if Dilip and Robert had
succeeded in their quest to introduce 56-bit non-cycling relfilenodes.
And for the record, we can also shoot ourselves in the foot in another
known case without sinval[1], so more work is needed here, but that
doesn't mean this sinval code path should also aim footwards.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKGLs554tQFCUjv_vn7ft9Xv5LNjPoAd--3Df%2BJJKJ7A8kw%40mail.gmail.com#f099d68e95edcfe408818447d9da04a7



From: Heikki Linnakangas

On 10/03/2024 22:59, Thomas Munro wrote:
> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Barring objections, I'll commit the attached.
> 
> +1

Pushed, thanks!

> I guess the comment for smgrreleaseall() could also be updated.  It
> mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
> overflow (InvalidateSystemCaches()) should also be mentioned?

I removed that comment; people can grep to find the callers.

>> Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
>> not required for correctness AFAICS. We don't do it in single-rel
>> invalidation in RelationCacheInvalidateEntry() either.
> 
> I think we do, because we have missed sinval messages.  It's unlikely
> but a relfilenode might have been recycled, and we might have file
> descriptors that point to the unlinked files.  That is, there are new
> files with the same names and we need to open those ones.

Gotcha.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




From: Alexander Lakhin

Hello Heikki,

11.03.2024 10:09, Heikki Linnakangas wrote:
> On 10/03/2024 22:59, Thomas Munro wrote:
>> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> Barring objections, I'll commit the attached.
>>
>> +1
>
> Pushed, thanks!

Please look at a new anomaly that I've discovered in master.

Starting from af0e7deb4, the following script:
numjobs=80
for ((i=1;i<=50;i++)); do
echo "I $i"

for ((j=1;j<=numjobs;j++)); do createdb db$j; done

for ((j=1;j<=numjobs;j++)); do
echo "
VACUUM FULL pg_class;
REINDEX INDEX pg_class_oid_index;
" | psql -d db$j >/dev/null 2>&1 &

echo "
CREATE TABLE tbl1 (t text);
DROP TABLE tbl1;
" | psql -d db$j >/dev/null 2>&1 &
done
wait

grep 'was terminated' server.log && break;
for ((j=1;j<=numjobs;j++)); do dropdb db$j; done

done

triggers a segfault:
2024-06-19 19:22:49.009 UTC [1607210:6] LOG:  server process (PID 1607671) was terminated by signal 11: Segmentation fault

with the following stack trace:
Core was generated by `postgres: law db50 [local] CREATE TABLE                                       '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/1607671' in core file too small.
#0  0x000055d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415
2415                            relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
(gdb) bt
#0  0x000055d04cb8232e in RelationReloadNailed (relation=0x7f7d0a1b1fd8) at relcache.c:2415
#1  0x000055d04cb8278c in RelationClearRelation (relation=0x7f7d0a1b1fd8, rebuild=true) at relcache.c:2560
#2  0x000055d04cb834e9 in RelationCacheInvalidate (debug_discard=false) at relcache.c:3048
#3  0x000055d04cb72e3c in InvalidateSystemCachesExtended (debug_discard=false) at inval.c:680
#4  0x000055d04cb73190 in InvalidateSystemCaches () at inval.c:794
#5  0x000055d04c9754ad in ReceiveSharedInvalidMessages (
     invalFunction=0x55d04cb72eee <LocalExecuteInvalidationMessage>,
     resetFunction=0x55d04cb7317e <InvalidateSystemCaches>) at sinval.c:105
#6  0x000055d04cb731b4 in AcceptInvalidationMessages () at inval.c:808
#7  0x000055d04c97d2ed in LockRelationOid (relid=2662, lockmode=1) at lmgr.c:136
#8  0x000055d04c404b1f in relation_open (relationId=2662, lockmode=1) at relation.c:55
#9  0x000055d04c47941e in index_open (relationId=2662, lockmode=1) at indexam.c:137
#10 0x000055d04c4787c2 in systable_beginscan (heapRelation=0x7f7d0a1b1fd8, indexId=2662, indexOK=true, snapshot=0x0,
     nkeys=1, key=0x7ffd456a8570) at genam.c:396
#11 0x000055d04cb7e93d in ScanPgRelation (targetRelId=3466, indexOK=true, force_non_historic=false) at relcache.c:381
#12 0x000055d04cb7fe15 in RelationBuildDesc (targetRelId=3466, insertIt=true) at relcache.c:1093
#13 0x000055d04cb81c93 in RelationIdGetRelation (relationId=3466) at relcache.c:2108
#14 0x000055d04c404b29 in relation_open (relationId=3466, lockmode=1) at relation.c:58
#15 0x000055d04cb720a6 in BuildEventTriggerCache () at evtcache.c:129
#16 0x000055d04cb71f6a in EventCacheLookup (event=EVT_SQLDrop) at evtcache.c:68
#17 0x000055d04c61c037 in trackDroppedObjectsNeeded () at event_trigger.c:1342
#18 0x000055d04c61bf02 in EventTriggerBeginCompleteQuery () at event_trigger.c:1284
#19 0x000055d04c9ac744 in ProcessUtilitySlow (pstate=0x55d04d04aca0, pstmt=0x55d04d021540,
     queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
     queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1107
#20 0x000055d04c9ac64d in standard_ProcessUtility (pstmt=0x55d04d021540,
     queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
     params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:1067
#21 0x000055d04c9ab54d in ProcessUtility (pstmt=0x55d04d021540,
     queryString=0x55d04d020830 "CREATE TABLE tbl1 (t text);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
     params=0x0, queryEnv=0x0, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at utility.c:523
#22 0x000055d04c9a9dc6 in PortalRunUtility (portal=0x55d04d0c1020, pstmt=0x55d04d021540, isTopLevel=true,
     setHoldSnapshot=false, dest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1158
#23 0x000055d04c9aa03d in PortalRunMulti (portal=0x55d04d0c1020, isTopLevel=true, setHoldSnapshot=false,
     dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:1315
#24 0x000055d04c9a9487 in PortalRun (portal=0x55d04d0c1020, count=9223372036854775807, isTopLevel=true, run_once=true,
     dest=0x55d04d021800, altdest=0x55d04d021800, qc=0x7ffd456a8fb0) at pquery.c:791
#25 0x000055d04c9a2150 in exec_simple_query (query_string=0x55d04d020830 "CREATE TABLE tbl1 (t text);")
     at postgres.c:1273
#26 0x000055d04c9a71f5 in PostgresMain (dbname=0x55d04d05dbf0 "db50", username=0x55d04d05dbd8 "law") at
postgres.c:4675
#27 0x000055d04c8bd8c8 in BackendRun (port=0x55d04d0540c0) at postmaster.c:4475
#28 0x000055d04c8bcedd in BackendStartup (port=0x55d04d0540c0) at postmaster.c:4151
#29 0x000055d04c8b93a9 in ServerLoop () at postmaster.c:1769
#30 0x000055d04c8b8ca4 in PostmasterMain (argc=3, argv=0x55d04d01a770) at postmaster.c:1468
#31 0x000055d04c76c577 in main (argc=3, argv=0x55d04d01a770) at main.c:197

(gdb) p pg_class_tuple
$1 = (HeapTuple) 0x0

server.log might also contain:
2024-06-19 19:25:38.060 UTC [1618682:5] psql ERROR:  could not read blocks 3..3 in file "base/28531/28840": read only 0 of 8192 bytes

Best regards,
Alexander



On Wed, Jun 19, 2024 at 11:00:00PM +0300, Alexander Lakhin wrote:
> Starting from af0e7deb4, the following script:
> [...]
> triggers a segfault:
> 2024-06-19 19:22:49.009 UTC [1607210:6] LOG:  server process (PID
> 1607671) was terminated by signal 11: Segmentation fault

Open item added for this issue.
--
Michael


From: Heikki Linnakangas

On 19/06/2024 23:00, Alexander Lakhin wrote:
> Please look at a new anomaly that I've discovered in master.
> 
> ...
> 
> triggers a segfault:
> 2024-06-19 19:22:49.009 UTC [1607210:6] LOG:  server process (PID 1607671) was terminated by signal 11: Segmentation fault
> 
> ...
> 
> server.log might also contain:
> 2024-06-19 19:25:38.060 UTC [1618682:5] psql ERROR:  could not read blocks 3..3 in file "base/28531/28840": read only 0 of 8192 bytes

Thanks for the report! I was not able to reproduce the segfault, but I 
do see the "could not read blocks" error very quickly with the script.

In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from 
RelationCacheInvalidate(). I thought it was no longer needed, because we 
no longer free the underlying SmgrRelation.

However, it meant that if the relfilenode of the relation was changed, 
the relation keeps pointing to the SMgrRelation of the old relfilenode. 
So we still need the RelationCloseSmgr() call, in case the relfilenode 
has changed.

Per attached patch.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from 
> RelationCacheInvalidate(). I thought it was no longer needed, because we 
> no longer free the underlying SmgrRelation.

> However, it meant that if the relfilenode of the relation was changed, 
> the relation keeps pointing to the SMgrRelation of the old relfilenode. 
> So we still need the RelationCloseSmgr() call, in case the relfilenode 
> has changed.

Ouch.  How come we did not see this immediately in testing?  I'd have
thought surely such a bug would be exposed by any command that
rewrites a heap.

            regards, tom lane



From: Heikki Linnakangas

On 21/06/2024 02:12, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from
>> RelationCacheInvalidate(). I thought it was no longer needed, because we
>> no longer free the underlying SmgrRelation.
> 
>> However, it meant that if the relfilenode of the relation was changed,
>> the relation keeps pointing to the SMgrRelation of the old relfilenode.
>> So we still need the RelationCloseSmgr() call, in case the relfilenode
>> has changed.
> 
> Ouch.  How come we did not see this immediately in testing?  I'd have
> thought surely such a bug would be exposed by any command that
> rewrites a heap.

There is a RelationCloseSmgr() call in RelationClearRelation(), which 
covers the common cases. The bug only occurs during
RelationCacheInvalidate(), when pg_class's relfilenumber was changed.

Hmm, looking closer, I think this might be a more appropriate place for 
the RelationCloseSmgr() call:

>             /*
>              * If it's a mapped relation, immediately update its rd_locator in
>              * case its relfilenumber changed.  We must do this during phase 1
>              * in case the relation is consulted during rebuild of other
>              * relcache entries in phase 2.  It's safe since consulting the
>              * map doesn't involve any access to relcache entries.
>              */
>             if (RelationIsMapped(relation))
>                 RelationInitPhysicalAddr(relation);

That's where we change the relfilenumber, before the 
RelationClearRelation() call.
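
Spelled out as code, a sketch of that placement -- the comment and the
RelationInitPhysicalAddr() call are the existing relcache.c code quoted
above; the added RelationCloseSmgr() call is the proposal, not necessarily
the exact committed form:

            /*
             * If it's a mapped relation, immediately update its rd_locator in
             * case its relfilenumber changed.  We must do this during phase 1
             * in case the relation is consulted during rebuild of other
             * relcache entries in phase 2.  It's safe since consulting the
             * map doesn't involve any access to relcache entries.
             */
            if (RelationIsMapped(relation))
            {
                /* Sketch: close the smgr handle for the old relfilenumber first. */
                RelationCloseSmgr(relation);
                RelationInitPhysicalAddr(relation);
            }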

-- 
Heikki Linnakangas
Neon (https://neon.tech)




From: Heikki Linnakangas

On 21/06/2024 02:25, Heikki Linnakangas wrote:
> Hmm, looking closer, I think this might be a more appropriate place for
> the RelationCloseSmgr() call:
> 
>>             /*
>>              * If it's a mapped relation, immediately update its rd_locator in
>>              * case its relfilenumber changed.  We must do this during phase 1
>>              * in case the relation is consulted during rebuild of other
>>              * relcache entries in phase 2.  It's safe since consulting the
>>              * map doesn't involve any access to relcache entries.
>>              */
>>             if (RelationIsMapped(relation))
>>                 RelationInitPhysicalAddr(relation);
> 
> That's where we change the relfilenumber, before the
> RelationClearRelation() call.

Pushed a fix that way.

-- 
Heikki Linnakangas
Neon (https://neon.tech)