Thread: BUG #18866: Running pg_freespace() on views triggers an Abort

BUG #18866: Running pg_freespace() on views triggers an Abort

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18866
Logged by:          Robins Tharakan
Email address:      tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   Ubuntu
Description:

Hi,

Passing a view to pg_freespace() triggers an Abort on HEAD. This has been so
since the beginning (049ef3398d05c9dc8f48aa9a6d68440661cfeb87). Given that
this is just an assert, feel free to skip - but thought I'd bring it up, in
case this needs a review.

SQL / Backtrace / Error Log excerpt given below:


SQL
===
$ cat crashing.sql
CREATE EXTENSION pg_freespacemap;
SELECT pg_freespace('pg_roles', 0);


SQL Output
==========
$ psql -f crashing.sql
CREATE EXTENSION
psql:crashing.sql:2: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:crashing.sql:2: error: connection to server was lost


Backtrace
=========
#5  0x00005b073976f7e0 in ExceptionalCondition (conditionName=0x5b07399bb930
"RelFileNumberIsValid(rlocator.relNumber)", fileName=0x5b07399bb90c
"smgr.c", lineNumber=228)
    at assert.c:66
No locals.
#6  0x00005b0739556b62 in smgropen (rlocator=..., backend=-1) at
smgr.c:228
        brlocator = {locator = {spcOid = 1101590320, dbOid = 23303,
relNumber = 1615569600}, backend = 31264}
        reln = 0x5b0741ab40a8
        found = 65
#7  0x00005b07395163cb in RelationGetSmgr (rel=0x7a2060b99bf8) at
../../../../src/include/utils/rel.h:579
No locals.
#8  0x00005b0739516dc2 in fsm_readbuf (rel=0x7a2060b99bf8, addr=...,
extend=false) at freespace.c:558
        blkno = 2
        buf = -998073056
        reln = 0x0
#9  0x00005b0739516705 in GetRecordedFreeSpace (rel=0x7a2060b99bf8,
heapBlk=0) at freespace.c:254
        addr = {level = 0, logpageno = 0}
        slot = 0
        buf = 23303
        cat = 22 '\026'
#10 0x00007a2061ab630c in pg_freespace (fcinfo=0x5b0741b07928) at
pg_freespacemap.c:38
        relid = 12000
        blkno = 0
        freespace = -17281
        rel = 0x7a2060b99bf8
        __func__ = "pg_freespace"
#11 0x00005b0739259dc4 in ExecInterpExpr (state=0x5b0741b077d0,
econtext=0x5b0741b074b8, isnull=0x0) at execExprInterp.c:1001
        d = 100086724982736
        fcinfo = 0x5b0741b07928
        args = 0x5b0741b07948
        op = 0x5b0741b07b50
        resultslot = 0x5b0741b07708
        innerslot = 0x0
        outerslot = 0x0
        scanslot = 0x0
        oldslot = 0x0
        newslot = 0x0


Surfacing Commit
================
Checking (62f36d6924c~0) - 62f36d6924 - fail
Checking (62f36d6924c~10) - 023fb51275 - fail
Checking (62f36d6924c~30) - e215166c9c - fail
.
.
.
Checking (62f36d6924c~3560) - b31ba5310b - fail
Checking (62f36d6924c~3565) - e5b8c4f68f - pass
Checking (62f36d6924c~3562) - 049ef3398d - fail
Checking (62f36d6924c~3563) - cd7f19da34 - pass
Surfacing commit is 049ef3398d05c9dc8f48aa9a6d68440661cfeb87


Postgres Error Logs
===================
TRAP: failed Assert("RelFileNumberIsValid(rlocator.relNumber)"), File:
"smgr.c", Line: 228, PID: 2372149
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(ExceptionalCondition+0xbb)[0x5b073976f7ad]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(smgropen+0x5e)[0x5b0739556b62]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x6b83cb)[0x5b07395163cb]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x6b8dc2)[0x5b0739516dc2]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(GetRecordedFreeSpace+0x50)[0x5b0739516705]
/home/smith/proj/blamepg/lib/postgresql/pg_freespacemap.so(pg_freespace+0xc4)[0x7a2061ab730c]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x3fbdc4)[0x5b0739259dc4]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(ExecInterpExprStillValid+0x5b)[0x5b073925c94a]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x461f27)[0x5b07392bff27]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x461fe5)[0x5b07392bffe5]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x462046)[0x5b07392c0046]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x462269)[0x5b07392c0269]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x418aba)[0x5b0739276aba]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x40aeaf)[0x5b0739268eaf]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x40df9e)[0x5b073926bf9e]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(standard_ExecutorRun+0x1cc)[0x5b07392696eb]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(ExecutorRun+0x54)[0x5b073926951c]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x707af2)[0x5b0739565af2]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(PortalRun+0x274)[0x5b0739565720]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x6ffc69)[0x5b073955dc69]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(PostgresMain+0xb8d)[0x5b0739563461]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x6fafe7)[0x5b0739558fe7]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(postmaster_child_launch+0x174)[0x5b073945cac4]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x605664)[0x5b0739463664]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(+0x602a7e)[0x5b0739460a7e]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(PostmasterMain+0x1546)[0x5b0739460374]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(main+0x38c)[0x5b07392ff7db]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7a206042a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7a206042a28b]
postgres: smith: smith postgres 127.0.0.1(51026)
SELECT(_start+0x25)[0x5b0738f43eb5]
2025-03-25 21:32:25.437 ACDT [2372118] LOG:  client backend (PID 2372149)
was terminated by signal 6: Aborted
2025-03-25 21:32:25.437 ACDT [2372118] DETAIL:  Failed process was running:
SELECT public.pg_freespace('pg_roles', 0);
2025-03-25 21:32:25.437 ACDT [2372118] LOG:  terminating any other active
server processes


Thanks to SQLSmith / SQLReduce.
-
Robins


Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Tender Wang
Date:


PG Bug reporting form <noreply@postgresql.org> 于2025年3月25日周二 22:42写道:
The following bug has been logged on the website:

Bug reference:      18866
Logged by:          Robins Tharakan
Email address:      tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system:   Ubuntu
Description:       

Hi,

Passing a view to pg_freespace() triggers an Abort on HEAD. This has been so
since the beginning (049ef3398d05c9dc8f48aa9a6d68440661cfeb87). Given that
this is just an assert, feel free to skip - but thought I'd bring it up, in
case this needs a review.

SQL / Backtrace / Error Log excerpt given below:


SQL
===
$ cat crashing.sql
CREATE EXTENSION pg_freespacemap;
SELECT pg_freespace('pg_roles', 0);

Yeah, it would crash if you input a foreign table, for example:
create extension postgres_fdw;
CREATE SERVER d FOREIGN DATA WRAPPER postgres_fdw;
CREATE FOREIGN TABLE f (g text) SERVER d; 
SELECT pg_freespace('f', 0);    -- will crash too

I think we can remove the Assert in smgropen().
Any thoughts?

--
Thanks,
Tender Wang
Attachment
Tender Wang <tndrwang@gmail.com> writes:
> PG Bug reporting form <noreply@postgresql.org> 于2025年3月25日周二 22:42写道:
>> Passing a view to pg_freespace() triggers an Abort on HEAD.

> I think we can remove the Assert in smgropen().
> Any thoughts?

That seems like a "solution" that will allow other bugs.  pg_freespace
should be fixed to verify RELKIND_HAS_STORAGE() before it tries to
access said storage.

            regards, tom lane



Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Tender Wang
Date:


Tom Lane <tgl@sss.pgh.pa.us> 于2025年3月26日周三 00:37写道:
Tender Wang <tndrwang@gmail.com> writes:
> PG Bug reporting form <noreply@postgresql.org> 于2025年3月25日周二 22:42写道:
>> Passing a view to pg_freespace() triggers an Abort on HEAD.

> I think we can remove the Assert in smgropen().
> Any thoughts?

That seems like a "solution" that will allow other bugs.  pg_freespace
should be fixed to verify RELKIND_HAS_STORAGE() before it tries to
access said storage.

Thanks for the advice. Please see the attached patch.


--
Thanks,
Tender Wang
Attachment

Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
"Euler Taveira"
Date:
On Tue, Mar 25, 2025, at 10:18 PM, Tender Wang wrote:
Thanks for the advice. Please see the attached patch.

Your patch needs some adjustments. There is no need to include pg_class.h. I
don't like the proposed error message. I prefer saying the relation cannot be
opened because that's what will happen if it reaches this code path. This is an
existing message so no need for translation. The output will be like:

postgres=# SELECT pg_freespace('f', 0); 
ERROR:  cannot open relation "f"
DETAIL:  This operation is not supported for foreign tables.
postgres=# SELECT pg_freespace('pg_roles', 0); 
ERROR:  cannot open relation "pg_roles"
DETAIL:  This operation is not supported for views.
postgres=# SELECT pg_freespace('pg_class', 0); 
pg_freespace 
--------------
         7520
(1 row)

I'm attaching a patch that includes these modifications.


--
Euler Taveira

Attachment
"Euler Taveira" <euler@eulerto.com> writes:
> Your patch needs some adjustments. There is no need to include pg_class.h. I
> don't like the proposed error message. I prefer saying the relation cannot be
> opened because that's what will happen if it reaches this code path.

I don't care for that proposal either: we just did open the relation ;-)

Looking at other places where we throw an error for !RELKIND_HAS_STORAGE:

rawpage.c:
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot get raw page from relation \"%s\"",

pgstatindex.c:

                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot get page count of relation \"%s\"",

tid.c:
                errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                errmsg("cannot look at latest visible tid for relation \"%s.%s\"",

relcache.c:
        elog(ERROR, "relation \"%s\" does not have storage",
             RelationGetRelationName(relation));

So the previous proposal was evidently modeled on the first two of
these precedents.  Personally I prefer messages that say *why*
something failed, so I'd go with something more like "relation \"%s\"
does not have storage".  Use of errdetail_relkind_not_supported is
fine though.

            regards, tom lane



Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
"Euler Taveira"
Date:
On Wed, Mar 26, 2025, at 1:00 PM, Tom Lane wrote:
"Euler Taveira" <euler@eulerto.com> writes:
> Your patch needs some adjustments. There is no need to include pg_class.h. I
> don't like the proposed error message. I prefer saying the relation cannot be
> opened because that's what will happen if it reaches this code path.

I don't care for that proposal either: we just did open the relation ;-)

Fair point.

relcache.c:
        elog(ERROR, "relation \"%s\" does not have storage",
             RelationGetRelationName(relation));

So the previous proposal was evidently modeled on the first two of
these precedents.  Personally I prefer messages that say *why*
something failed, so I'd go with something more like "relation \"%s\"
does not have storage".  Use of errdetail_relkind_not_supported is
fine though.

I thought about saying "no storage" but don't know why was convinced by that
proposal. Your suggestion works for me.


--
Euler Taveira

Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Álvaro Herrera
Date:
On 2025-Mar-25, Euler Taveira wrote:

> On Tue, Mar 25, 2025, at 10:18 PM, Tender Wang wrote:
> > Thanks for the advice. Please see the attached patch.
> 
> Your patch needs some adjustments. There is no need to include pg_class.h.

I think including pg_class.h (which is where RELKIND_HAS_STORAGE() is
defined and where errdetail_relkind_not_supported is declared) is better
than including utils/rel.h, which brings in a whole bunch of additional
stuff in addition to pg_class.h :-)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)



Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
"Euler Taveira"
Date:
On Wed, Mar 26, 2025, at 2:33 PM, Álvaro Herrera wrote:
On 2025-Mar-25, Euler Taveira wrote:

> On Tue, Mar 25, 2025, at 10:18 PM, Tender Wang wrote:
> > Thanks for the advice. Please see the attached patch.

> Your patch needs some adjustments. There is no need to include pg_class.h.

I think including pg_class.h (which is where RELKIND_HAS_STORAGE() is
defined and where errdetail_relkind_not_supported is declared) is better
than including utils/rel.h, which brings in a whole bunch of additional
stuff in addition to pg_class.h :-)

It doesn't work because of RelationGetRelationName() that is defined into
utils/rel.h.


--
Euler Taveira

Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Alvaro Herrera
Date:
On 2025-Mar-26, Euler Taveira wrote:

> It doesn't work because of RelationGetRelationName() that is defined into
> utils/rel.h.

Doh, of course.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)



Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Tender Wang
Date:


Tom Lane <tgl@sss.pgh.pa.us> 于2025年3月27日周四 00:01写道:
"Euler Taveira" <euler@eulerto.com> writes:
> Your patch needs some adjustments. There is no need to include pg_class.h. I
> don't like the proposed error message. I prefer saying the relation cannot be
> opened because that's what will happen if it reaches this code path.

I don't care for that proposal either: we just did open the relation ;-)

Looking at other places where we throw an error for !RELKIND_HAS_STORAGE:

rawpage.c:
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot get raw page from relation \"%s\"",

pgstatindex.c:

                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot get page count of relation \"%s\"",

tid.c:
                errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                errmsg("cannot look at latest visible tid for relation \"%s.%s\"",

relcache.c:
        elog(ERROR, "relation \"%s\" does not have storage",
             RelationGetRelationName(relation));

So the previous proposal was evidently modeled on the first two of
these precedents. 

Yes, I referred to the rawpage.c. 
 
Personally I prefer messages that say *why*
something failed, so I'd go with something more like "relation \"%s\"
does not have storage".  Use of errdetail_relkind_not_supported is
fine though.

 PFA for the updated patch file.

--
Thanks,
Tender Wang
Attachment

Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Richard Guo
Date:
On Thu, Mar 27, 2025 at 4:14 PM Tender Wang <tndrwang@gmail.com> wrote:
>  PFA for the updated patch file.

Not related to this issue but I wonder why blkno is verified after
the relation is opened.  It can be verified beforehand, no?

Thanks
Richard



Richard Guo <guofenglinux@gmail.com> writes:
> Not related to this issue but I wonder why blkno is verified after
> the relation is opened.  It can be verified beforehand, no?

It makes sense to me to check the parameters left-to-right, so
I think verifying blkno after relation is fine.  It's not like
there's value in optimizing the failure case.

This does, however, suggest that we ought to check the relkind
immediately after opening the rel, before the blkno check.
I'll adjust the patch that way and push.

            regards, tom lane



Re: BUG #18866: Running pg_freespace() on views triggers an Abort

From
Tender Wang
Date:


Tom Lane <tgl@sss.pgh.pa.us> 于2025年3月28日周五 00:01写道:
Richard Guo <guofenglinux@gmail.com> writes:
> Not related to this issue but I wonder why blkno is verified after
> the relation is opened.  It can be verified beforehand, no?

It makes sense to me to check the parameters left-to-right, so
I think verifying blkno after relation is fine.  It's not like
there's value in optimizing the failure case.

This does, however, suggest that we ought to check the relkind
immediately after opening the rel, before the blkno check.
I'll adjust the patch that way and push.

Thanks for pushing. 


--
Thanks,
Tender Wang