Thread: RE: Fix for pageinspect bug in PG 17

RE: Fix for pageinspect bug in PG 17

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Tomas,

> Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns
> out to be introduced by my commit

I could not see the link, but I think it is [1], right?

> 
> commit dae761a87edae444d11a411f711f1d679bed5941
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date:   Fri Dec 8 17:07:30 2023 +0100
> 
>     Add empty BRIN ranges during CREATE INDEX
> 
>     ...
> 
> This adds an out argument to brin_page_items, but I failed to consider
> the user may still run with an older version of the extension - either
> after pg_upgrade (as in the report), or when the CREATE EXTENSION
> command specifies VERSION.

I confirmed that 428c0ca added new output entries (version is bumped 11->12),
and the same C function is used for both 11 and 12.

> The new "empty" field is added in the middle of the output tuple, which
> shifts the values, causing segfault when accessing a text field at the
> end of the array.

Agreed and I could reproduce by steps:

```
postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ;
CREATE EXTENSION
postgres=# CREATE TABLE foo (id int);
CREATE TABLE
postgres=# CREATE INDEX ON foo USING brin ( id );
CREATE INDEX
postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
```

> 
> Of course, we add arguments to existing functions pretty often, and we
> know how to do that in backwards-compatible way - pg_stat_statements is
> a good example of how to do that nicely. Unfortunately, it's too late to
> do that for brin_page_items :-( There may already be upgraded systems or
> with installed pageinspect, etc.
> 
> The only fix I can think of is explicitly looking at TupleDesc->natts,
> as in the attached fix.

I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12
and ran again, it could execute well.

Few points:

1.
Do you think we should follow the approach like pg_stat_statements for master?
Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK
because we should avoid code branches as much as possible.

2.
Later readers may surprise the part for checking `rsinfo->setDesc->natts`.
Can you use the boolean and add comments, something like

```
+            /* Support older API version */
+            bool        output_empty_attr = (rsinfo->setDesc->natts >= 8);
```

3.
Do we have to add the test for the fix?

[1]:
https://www.postgresql.org/message-id/VI1PR02MB63331C3D90E2104FD12399D38A5D2%40VI1PR02MB6333.eurprd02.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Fix for pageinspect bug in PG 17

From
Tomas Vondra
Date:
On 11/12/24 09:04, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas,
> 
>> Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns
>> out to be introduced by my commit
> 
> I could not see the link, but I think it is [1], right?
> 

Right. Apologies, I forgot to include the link.

>>
>> commit dae761a87edae444d11a411f711f1d679bed5941
>> Author: Tomas Vondra <tomas.vondra@postgresql.org>
>> Date:   Fri Dec 8 17:07:30 2023 +0100
>>
>>     Add empty BRIN ranges during CREATE INDEX
>>
>>     ...
>>
>> This adds an out argument to brin_page_items, but I failed to consider
>> the user may still run with an older version of the extension - either
>> after pg_upgrade (as in the report), or when the CREATE EXTENSION
>> command specifies VERSION.
> 
> I confirmed that 428c0ca added new output entries (version is bumped 11->12),
> and the same C function is used for both 11 and 12.
> 
>> The new "empty" field is added in the middle of the output tuple, which
>> shifts the values, causing segfault when accessing a text field at the
>> end of the array.
> 
> Agreed and I could reproduce by steps:
> 
> ```
> postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ;
> CREATE EXTENSION
> postgres=# CREATE TABLE foo (id int);
> CREATE TABLE
> postgres=# CREATE INDEX ON foo USING brin ( id );
> CREATE INDEX
> postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> ```
> 
>>
>> Of course, we add arguments to existing functions pretty often, and we
>> know how to do that in backwards-compatible way - pg_stat_statements is
>> a good example of how to do that nicely. Unfortunately, it's too late to
>> do that for brin_page_items :-( There may already be upgraded systems or
>> with installed pageinspect, etc.
>>
>> The only fix I can think of is explicitly looking at TupleDesc->natts,
>> as in the attached fix.
> 
> I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12
> and ran again, it could execute well.
> 
> Few points:
> 
> 1.
> Do you think we should follow the approach like pg_stat_statements for master?
> Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK
> because we should avoid code branches as much as possible.
> 

My plan was to apply the patch to both 17 and HEAD, and then maybe do
something smarter in HEAD in a separate commit. But then Michael pointed
out other pageinspect functions just error out in this version-mismatch
cases, so I think it's better to just do it the same way.

Maybe we could be smarter, but it seems pointless to do it only for one
pageinspect function, while still requiring an upgrade for other more
widely used functions in the same situation (albeit for a much older
version of the extension).

> 2.
> Later readers may surprise the part for checking `rsinfo->setDesc->natts`.
> Can you use the boolean and add comments, something like
> 
> ```
> +            /* Support older API version */
> +            bool        output_empty_attr = (rsinfo->setDesc->natts >= 8);
> ```
> 

Doesn't matter, if we error out.

> 3.
> Do we have to add the test for the fix?
> 

Yes. See the patch I shared a couple minutes ago.


regards

-- 
Tomas Vondra




Re: Fix for pageinspect bug in PG 17

From
Peter Geoghegan
Date:
On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote:
> My plan was to apply the patch to both 17 and HEAD, and then maybe do
> something smarter in HEAD in a separate commit. But then Michael pointed
> out other pageinspect functions just error out in this version-mismatch
> cases, so I think it's better to just do it the same way.

FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it
was better to just paper-over the issue on backbranches instead -- see
commit c788115b.

The problem that I fixed back in 2020 was a problem with the data
types used -- not a failure to consider older versions of the
extension at all. It was just convenient to use the number of columns
to detect the version of the extension to detect a problematic
(incorrectly typed) function.

--
Peter Geoghegan



Re: Fix for pageinspect bug in PG 17

From
Tomas Vondra
Date:

On 11/13/24 18:20, Peter Geoghegan wrote:
> On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote:
>> My plan was to apply the patch to both 17 and HEAD, and then maybe do
>> something smarter in HEAD in a separate commit. But then Michael pointed
>> out other pageinspect functions just error out in this version-mismatch
>> cases, so I think it's better to just do it the same way.
> 
> FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it
> was better to just paper-over the issue on backbranches instead -- see
> commit c788115b.
> 
> The problem that I fixed back in 2020 was a problem with the data
> types used -- not a failure to consider older versions of the
> extension at all. It was just convenient to use the number of columns
> to detect the version of the extension to detect a problematic
> (incorrectly typed) function.
> 

Does that mean you think we should fix the issue at hand differently?
Say, by looking at number of columns and building the correct tuple,
like I did in my initial patch?

regards

-- 
Tomas Vondra




Re: Fix for pageinspect bug in PG 17

From
Peter Geoghegan
Date:
On Wed, Nov 13, 2024 at 7:48 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Nov 13, 2024 at 09:00:30PM +0100, Tomas Vondra wrote:
> > Does that mean you think we should fix the issue at hand differently?
> > Say, by looking at number of columns and building the correct tuple,
> > like I did in my initial patch?
>
> 691e8b2e18 is not something I would have done when it comes to
> pageinspect, FWIW.  There is the superuser argument for this module,
> so I'd vote for an error and apply the same policy across all branches
> as a matter of consistency.

691e8b2e18 was the one that threw the error?

--
Peter Geoghegan



Re: Fix for pageinspect bug in PG 17

From
Peter Geoghegan
Date:
On Wed, Nov 13, 2024 at 3:00 PM Tomas Vondra <tomas@vondra.me> wrote:
> Does that mean you think we should fix the issue at hand differently?
> Say, by looking at number of columns and building the correct tuple,
> like I did in my initial patch?

I don't feel strongly about it either way. But if it was my fix I'd
probably make it work on both versions. There's plenty of precedent
for that.

--
Peter Geoghegan



RE: Fix for pageinspect bug in PG 17

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Tomas,

Thanks for updating the patch.

I've tested new patch and confirmed the brin_pgage_items() could error out:

```
postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx');
ERROR:  function has wrong number of declared columns
HINT:  To resolve the problem, update the "pageinspect" extension to the latest version.
```

I also do not have strong opinions for both approaches.
If I had to say... it's OK to do error out. The source becomes simpler and I cannot
find use-case that user strongly wants to use the older pageinspect extension.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Fix for pageinspect bug in PG 17

From
Tomas Vondra
Date:
I've pushed (and backpatched) a fix for this.

I ended up doing the simplest thing -- error out if the number of
columns does not match, suggesting to update to latest extension version.

I considered handling it in a nicer way, but I didn't like the result
very much and I think that's sufficient for superuser-only extension.
And 691e8b2e18 seems like a reasonable precedent (even though the
backbranches did do a different thing).

I also considered introducing pg_stat_statements-style versioning, but
it's too late to do that in backbranches, and I don't think we expect
the function to change very often to justify this.

regards

-- 
Tomas Vondra