Thread: RE: Fix for pageinspect bug in PG 17
Dear Tomas, > Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns > out to be introduced by my commit I could not see the link, but I think it is [1], right? > > commit dae761a87edae444d11a411f711f1d679bed5941 > Author: Tomas Vondra <tomas.vondra@postgresql.org> > Date: Fri Dec 8 17:07:30 2023 +0100 > > Add empty BRIN ranges during CREATE INDEX > > ... > > This adds an out argument to brin_page_items, but I failed to consider > the user may still run with an older version of the extension - either > after pg_upgrade (as in the report), or when the CREATE EXTENSION > command specifies VERSION. I confirmed that 428c0ca added new output entries (version is bumped 11->12), and the same C function is used for both 11 and 12. > The new "empty" field is added in the middle of the output tuple, which > shifts the values, causing segfault when accessing a text field at the > end of the array. Agreed and I could reproduce by steps: ``` postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ; CREATE EXTENSION postgres=# CREATE TABLE foo (id int); CREATE TABLE postgres=# CREATE INDEX ON foo USING brin ( id ); CREATE INDEX postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx'); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. ``` > > Of course, we add arguments to existing functions pretty often, and we > know how to do that in backwards-compatible way - pg_stat_statements is > a good example of how to do that nicely. Unfortunately, it's too late to > do that for brin_page_items :-( There may already be upgraded systems or > with installed pageinspect, etc. > > The only fix I can think of is explicitly looking at TupleDesc->natts, > as in the attached fix. I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12 and ran again, it could execute well. Few points: 1. Do you think we should follow the approach like pg_stat_statements for master? Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK because we should avoid code branches as much as possible. 2. Later readers may surprise the part for checking `rsinfo->setDesc->natts`. Can you use the boolean and add comments, something like ``` + /* Support older API version */ + bool output_empty_attr = (rsinfo->setDesc->natts >= 8); ``` 3. Do we have to add the test for the fix? [1]: https://www.postgresql.org/message-id/VI1PR02MB63331C3D90E2104FD12399D38A5D2%40VI1PR02MB6333.eurprd02.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
On 11/12/24 09:04, Hayato Kuroda (Fujitsu) wrote: > Dear Tomas, > >> Here's a fix for pageinspect bug in PG17, reported in [1]. The bug turns >> out to be introduced by my commit > > I could not see the link, but I think it is [1], right? > Right. Apologies, I forgot to include the link. >> >> commit dae761a87edae444d11a411f711f1d679bed5941 >> Author: Tomas Vondra <tomas.vondra@postgresql.org> >> Date: Fri Dec 8 17:07:30 2023 +0100 >> >> Add empty BRIN ranges during CREATE INDEX >> >> ... >> >> This adds an out argument to brin_page_items, but I failed to consider >> the user may still run with an older version of the extension - either >> after pg_upgrade (as in the report), or when the CREATE EXTENSION >> command specifies VERSION. > > I confirmed that 428c0ca added new output entries (version is bumped 11->12), > and the same C function is used for both 11 and 12. > >> The new "empty" field is added in the middle of the output tuple, which >> shifts the values, causing segfault when accessing a text field at the >> end of the array. > > Agreed and I could reproduce by steps: > > ``` > postgres=# CREATE EXTENSION pageinspect VERSION "1.11" ; > CREATE EXTENSION > postgres=# CREATE TABLE foo (id int); > CREATE TABLE > postgres=# CREATE INDEX ON foo USING brin ( id ); > CREATE INDEX > postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx'); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > ``` > >> >> Of course, we add arguments to existing functions pretty often, and we >> know how to do that in backwards-compatible way - pg_stat_statements is >> a good example of how to do that nicely. Unfortunately, it's too late to >> do that for brin_page_items :-( There may already be upgraded systems or >> with installed pageinspect, etc. >> >> The only fix I can think of is explicitly looking at TupleDesc->natts, >> as in the attached fix. > > I've tested your patch and it worked well. Also, I tried to upgrade from 11 and 12 > and ran again, it could execute well. > > Few points: > > 1. > Do you think we should follow the approach like pg_stat_statements for master? > Or, do you want to apply the patch both PG17 and HEAD? I think latter one is OK > because we should avoid code branches as much as possible. > My plan was to apply the patch to both 17 and HEAD, and then maybe do something smarter in HEAD in a separate commit. But then Michael pointed out other pageinspect functions just error out in this version-mismatch cases, so I think it's better to just do it the same way. Maybe we could be smarter, but it seems pointless to do it only for one pageinspect function, while still requiring an upgrade for other more widely used functions in the same situation (albeit for a much older version of the extension). > 2. > Later readers may surprise the part for checking `rsinfo->setDesc->natts`. > Can you use the boolean and add comments, something like > > ``` > + /* Support older API version */ > + bool output_empty_attr = (rsinfo->setDesc->natts >= 8); > ``` > Doesn't matter, if we error out. > 3. > Do we have to add the test for the fix? > Yes. See the patch I shared a couple minutes ago. regards -- Tomas Vondra
On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote: > My plan was to apply the patch to both 17 and HEAD, and then maybe do > something smarter in HEAD in a separate commit. But then Michael pointed > out other pageinspect functions just error out in this version-mismatch > cases, so I think it's better to just do it the same way. FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it was better to just paper-over the issue on backbranches instead -- see commit c788115b. The problem that I fixed back in 2020 was a problem with the data types used -- not a failure to consider older versions of the extension at all. It was just convenient to use the number of columns to detect the version of the extension to detect a problematic (incorrectly typed) function. -- Peter Geoghegan
On 11/13/24 18:20, Peter Geoghegan wrote: > On Wed, Nov 13, 2024 at 11:07 AM Tomas Vondra <tomas@vondra.me> wrote: >> My plan was to apply the patch to both 17 and HEAD, and then maybe do >> something smarter in HEAD in a separate commit. But then Michael pointed >> out other pageinspect functions just error out in this version-mismatch >> cases, so I think it's better to just do it the same way. > > FWIW I didn't actually backpatch commit 691e8b2e18. I decided that it > was better to just paper-over the issue on backbranches instead -- see > commit c788115b. > > The problem that I fixed back in 2020 was a problem with the data > types used -- not a failure to consider older versions of the > extension at all. It was just convenient to use the number of columns > to detect the version of the extension to detect a problematic > (incorrectly typed) function. > Does that mean you think we should fix the issue at hand differently? Say, by looking at number of columns and building the correct tuple, like I did in my initial patch? regards -- Tomas Vondra
On Wed, Nov 13, 2024 at 7:48 PM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Nov 13, 2024 at 09:00:30PM +0100, Tomas Vondra wrote: > > Does that mean you think we should fix the issue at hand differently? > > Say, by looking at number of columns and building the correct tuple, > > like I did in my initial patch? > > 691e8b2e18 is not something I would have done when it comes to > pageinspect, FWIW. There is the superuser argument for this module, > so I'd vote for an error and apply the same policy across all branches > as a matter of consistency. 691e8b2e18 was the one that threw the error? -- Peter Geoghegan
On Wed, Nov 13, 2024 at 3:00 PM Tomas Vondra <tomas@vondra.me> wrote: > Does that mean you think we should fix the issue at hand differently? > Say, by looking at number of columns and building the correct tuple, > like I did in my initial patch? I don't feel strongly about it either way. But if it was my fix I'd probably make it work on both versions. There's plenty of precedent for that. -- Peter Geoghegan
Dear Tomas, Thanks for updating the patch. I've tested new patch and confirmed the brin_pgage_items() could error out: ``` postgres=# SELECT * FROM brin_page_items(get_raw_page('foo_id_idx', 2), 'foo_id_idx'); ERROR: function has wrong number of declared columns HINT: To resolve the problem, update the "pageinspect" extension to the latest version. ``` I also do not have strong opinions for both approaches. If I had to say... it's OK to do error out. The source becomes simpler and I cannot find use-case that user strongly wants to use the older pageinspect extension. Best regards, Hayato Kuroda FUJITSU LIMITED
I've pushed (and backpatched) a fix for this. I ended up doing the simplest thing -- error out if the number of columns does not match, suggesting to update to latest extension version. I considered handling it in a nicer way, but I didn't like the result very much and I think that's sufficient for superuser-only extension. And 691e8b2e18 seems like a reasonable precedent (even though the backbranches did do a different thing). I also considered introducing pg_stat_statements-style versioning, but it's too late to do that in backbranches, and I don't think we expect the function to change very often to justify this. regards -- Tomas Vondra