Thread: BUG #16285: bt_metap fails with value is out of range for type integer
BUG #16285: bt_metap fails with value is out of range for type integer
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16285 Logged by: Victor Yegorov Email address: vyegorov@gmail.com PostgreSQL version: 12.2 Operating system: Ubuntu 18.04.3 LTS Description: I have an index, that is giving issues pageinspect-ing it: SELECT * FROM bt_metap('index')\gx ERROR: value "2180413846" is out of range for type integer At the same time: SELECT * FROM pgstatindex('index')\gx -[ RECORD 1 ]------+---------- version | 3 tree_level | 2 index_size | 131571712 root_block_no | 290 internal_pages | 56 leaf_pages | 16003 empty_pages | 0 deleted_pages | 1 avg_leaf_density | 50.06 leaf_fragmentation | 66.08 Looking at the sources of both extensions, I can see, that pgstatindex() is using psprintf(INT64_FORMAT) for page counters and psprintf("%u") for root page, while bt_metap() is using only psprintf("%d"); I assume psprintf("%u") should be used at least for metad->btm_root and metad->btm_fastroot in the bt_metap(PG_FUNCTION_ARGS) function.
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 2, 2020 at 2:40 PM PG Bug reporting form <noreply@postgresql.org> wrote: > Looking at the sources of both extensions, I can see, that pgstatindex() is > using psprintf(INT64_FORMAT) for page counters and psprintf("%u") for root > page, while bt_metap() is using only psprintf("%d"); > > I assume psprintf("%u") should be used at least for metad->btm_root and > metad->btm_fastroot in the bt_metap(PG_FUNCTION_ARGS) function. While this has been wrong forever, the immediate problem here is probably not metad->btm_root. In practice, the root block number is usually fairly low. Very large indexes tend to have a root page that only has a small number of tuples, because the root page was created relatively early in the lifetime of the index. Past a certain point, the root page receives new tuples so infrequently that it almost never happens. I think it's more likely that the problem here is the relatively new column returned by bt_metap(), oldest_xact. That has only been around since Postgres v11. I'm not sure how we should handle this in the backbranches, since only a change in the CREATE FUNCTION declaration of bt_metap() can truly fix the problem. I suppose that we could work around the problem with some kind of kludge, but come up with a real fix for v13. -- Peter Geoghegan
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Victor Yegorov
Date:
вт, 3 мар. 2020 г. в 01:03, Peter Geoghegan <pg@bowt.ie>:
I think it's more likely that the problem here is the relatively new
column returned by bt_metap(), oldest_xact. That has only been around
since Postgres v11.
This made me look at the stats, as this is a test copy of the DB upgraded to 12.2 recently. Per pg_stat_user_tables,
table had never been vacuumed since upgrade, only analyzed.
I've run vacuum manually and this made the issue go away.
It is beyond my skills to try to find the real place for this issue now. I think we might be hitting some uninitialized value somewhere, perhaps?..
Victor Yegorov
On 2020-Mar-02, PG Bug reporting form wrote: > I have an index, that is giving issues pageinspect-ing it: > > SELECT * FROM bt_metap('index')\gx > ERROR: value "2180413846" is out of range for type integer I understand it's gone now, but you could have used \errverbose to display the source function name that caused the error, followed by setting backtrace_functions to that function and repeated the query. That may have resulted in a more precise location of the problem. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 2, 2020 at 3:19 PM Victor Yegorov <vyegorov@gmail.com> wrote: > This made me look at the stats, as this is a test copy of the DB upgraded to 12.2 recently. Per pg_stat_user_tables, > table had never been vacuumed since upgrade, only analyzed. > I've run vacuum manually and this made the issue go away. > > It is beyond my skills to try to find the real place for this issue now. I think we might be hitting some uninitializedvalue somewhere, perhaps?.. The issue is that bt_metap() was declared using the wrong data types -- simple as that. An int4 cannot represent 2180413846 -- only a uint32 (or a TransactionId) can. Technically this is not a recent issue, since we got btm_root wrong right from the start. However, it seems more likely that the relatively recently added oldest_xact field will cause problems in practice. The fact that a manual VACUUM made that go away for you (at least temporarily) is not surprising. The declaration itself is wrong, so it is the declaration itself that we must fix. I can't really see a way of fixing it without introducing a new version of contrib/pageinspect. :-( I propose the attached fix for the master branch only. This is a draft patch. The patch changes the data types to more appropriate, wider integer types. It follows the convention of using int8/bigint where the natural data type to use at the C code level is actually uint32 -- pg_stat_statements does this with queryId (actually, we switched over to 64-bit hashes some time later, but it worked that way before commit cff440d3686). The patch takes another idea from pg_stat_statements: Using a tupledesc's natts field to determine the extension version in use, to maintain compatibility when there are breaking changes to a function's definition. I simply error out when bt_metap() is called using the definition from an older version of the extension, unlike pg_stat_statements. It isn't worth going to the trouble of preserving a set of behaviors that are more or less broken anyway. Better to error out in an obvious way. Especially given that contrib/pageinspect is fundamentally a superuser-only extension, with a relatively small user base. Theoretically, I should also change bt_page_items() (and several functions) to make its blkno In parameter of type int8/bigint (instead of int4/integer). I haven't bothered with that, though. -- Peter Geoghegan
Attachment
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Victor Yegorov
Date:
вт, 3 мар. 2020 г. в 03:39, Alvaro Herrera <alvherre@2ndquadrant.com>:
I understand it's gone now, but you could have used \errverbose to
display the source function name that caused the error, followed by
setting backtrace_functions to that function and repeated the query.
That may have resulted in a more precise location of the problem.
It is not entirely gone, I still have some indexes around that produce this.
This backtracing functionality is available only in the HEAD as far as I can see, so it took me a while
to get to it:
[3760] ERROR: value "4282444360" is out of range for type integer
[3760] BACKTRACE:
postgres: postgres postgres [local] SELECT(pg_strtoint32+0xdf) [0x55ab094e4d7f]
postgres: postgres postgres [local] SELECT(int4in+0xd) [0x55ab094acc5d]
postgres: postgres postgres [local] SELECT(InputFunctionCall+0x7b) [0x55ab095675ab]
postgres: postgres postgres [local] SELECT(BuildTupleFromCStrings+0xa7) [0x55ab093066a7]
/var/lib/postgresql/opt/pg13devel/lib/pageinspect.so(bt_metap+0x199) [0x7f7ec35b7619]
postgres: postgres postgres [local] SELECT(ExecMakeTableFunctionResult+0x40b) [0x55ab09302f0b]
postgres: postgres postgres [local] SELECT(+0x27bd11) [0x55ab09311d11]
postgres: postgres postgres [local] SELECT(ExecScan+0x7b) [0x55ab0930381b]
postgres: postgres postgres [local] SELECT(standard_ExecutorRun+0x102) [0x55ab092fa3f2]
postgres: postgres postgres [local] SELECT(+0x3b6d3d) [0x55ab0944cd3d]
postgres: postgres postgres [local] SELECT(PortalRun+0x2ba) [0x55ab0944e17a]
postgres: postgres postgres [local] SELECT(+0x3b3d07) [0x55ab09449d07]
postgres: postgres postgres [local] SELECT(PostgresMain+0x1df6) [0x55ab0944c046]
postgres: postgres postgres [local] SELECT(+0x33de85) [0x55ab093d3e85]
postgres: postgres postgres [local] SELECT(PostmasterMain+0xf40) [0x55ab093d5000]
postgres: postgres postgres [local] SELECT(main+0x4a4) [0x55ab09154964]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7ecd26eb97]
postgres: postgres postgres [local] SELECT(_start+0x2a) [0x55ab09154a2a]
[3760] STATEMENT: select * from bt_metap('table_pkey');
This backtracing functionality is available only in the HEAD as far as I can see, so it took me a while
to get to it:
[3760] ERROR: value "4282444360" is out of range for type integer
[3760] BACKTRACE:
postgres: postgres postgres [local] SELECT(pg_strtoint32+0xdf) [0x55ab094e4d7f]
postgres: postgres postgres [local] SELECT(int4in+0xd) [0x55ab094acc5d]
postgres: postgres postgres [local] SELECT(InputFunctionCall+0x7b) [0x55ab095675ab]
postgres: postgres postgres [local] SELECT(BuildTupleFromCStrings+0xa7) [0x55ab093066a7]
/var/lib/postgresql/opt/pg13devel/lib/pageinspect.so(bt_metap+0x199) [0x7f7ec35b7619]
postgres: postgres postgres [local] SELECT(ExecMakeTableFunctionResult+0x40b) [0x55ab09302f0b]
postgres: postgres postgres [local] SELECT(+0x27bd11) [0x55ab09311d11]
postgres: postgres postgres [local] SELECT(ExecScan+0x7b) [0x55ab0930381b]
postgres: postgres postgres [local] SELECT(standard_ExecutorRun+0x102) [0x55ab092fa3f2]
postgres: postgres postgres [local] SELECT(+0x3b6d3d) [0x55ab0944cd3d]
postgres: postgres postgres [local] SELECT(PortalRun+0x2ba) [0x55ab0944e17a]
postgres: postgres postgres [local] SELECT(+0x3b3d07) [0x55ab09449d07]
postgres: postgres postgres [local] SELECT(PostgresMain+0x1df6) [0x55ab0944c046]
postgres: postgres postgres [local] SELECT(+0x33de85) [0x55ab093d3e85]
postgres: postgres postgres [local] SELECT(PostmasterMain+0xf40) [0x55ab093d5000]
postgres: postgres postgres [local] SELECT(main+0x4a4) [0x55ab09154964]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7ecd26eb97]
postgres: postgres postgres [local] SELECT(_start+0x2a) [0x55ab09154a2a]
[3760] STATEMENT: select * from bt_metap('table_pkey');
Victor Yegorov
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Fri, Mar 6, 2020 at 1:38 AM Victor Yegorov <vyegorov@gmail.com> wrote: > It is not entirely gone, I still have some indexes around that produce this. > This backtracing functionality is available only in the HEAD as far as I can see, so it took me a while > to get to it: > > [3760] ERROR: value "4282444360" is out of range for type integer This has to be the oldest_xact field. If it was any of the other fields, the "%d" format would not result in an error (it would just result in incorrectly displaying a negative integer). oldest_xact is the only field that uses "%u" (unfortunately, the declaration makes the field an int4/integer, so you may see this error). -- Peter Geoghegan
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Fri, Mar 6, 2020 at 2:23 PM Peter Geoghegan <pg@bowt.ie> wrote: > This has to be the oldest_xact field. If it was any of the other > fields, the "%d" format would not result in an error (it would just > result in incorrectly displaying a negative integer). oldest_xact is > the only field that uses "%u" (unfortunately, the declaration makes > the field an int4/integer, so you may see this error). Pushed a fix for this just now. Thanks for the report! -- Peter Geoghegan
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Victor Yegorov
Date:
вс, 8 мар. 2020 г. в 02:45, Peter Geoghegan <pg@bowt.ie>:
Pushed a fix for this just now.
Thanks for the fix!
Victor Yegorov
Hi, On 2020-03-07 16:45:27 -0800, Peter Geoghegan wrote: > On Fri, Mar 6, 2020 at 2:23 PM Peter Geoghegan <pg@bowt.ie> wrote: > > This has to be the oldest_xact field. If it was any of the other > > fields, the "%d" format would not result in an error (it would just > > result in incorrectly displaying a negative integer). oldest_xact is > > the only field that uses "%u" (unfortunately, the declaration makes > > the field an int4/integer, so you may see this error). > > Pushed a fix for this just now. > > Thanks for the report! ISTM that we need some fix for the back-branches too. Being unable to look at some indexes till 12 has aged out doesn't strike me as good. How about simply printing the wrapped value? That's far from perfect, of course, but clearly better than the current situation in the back branches. Greetings, Andres Freund
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 9, 2020 at 3:09 PM Andres Freund <andres@anarazel.de> wrote: > ISTM that we need some fix for the back-branches too. Being unable to > look at some indexes till 12 has aged out doesn't strike me as good. Actually, the oldest_xact field was added in Postgres 11. > How about simply printing the wrapped value? That's far from perfect, of > course, but clearly better than the current situation in the back > branches. Would you be happy if we always raised a NOTICE that had information about the affected fields? I don't think that we should try to be clever and only do it when we know that it will fail. We should admit that it's broken with a HINT that gets associated with the NOTICE, in order to discourage relying on the number within automated tools. If we were to do this, it would probably only be necessary to backpatch to Postgres 11 and 12. Those are the only stable releases with the oldest_xact field. In practice, it is highly likely to be the thing that causes problems. We will report the root block number at a negative block number when it happens to exceed 2^31-1, but that condition is almost impossible to hit in practice, even when the index size is close to the system-wide limit of relation size. That's why nobody has complained about it in all these years. -- Peter Geoghegan
Hi, On 2020-03-09 15:31:38 -0700, Peter Geoghegan wrote: > On Mon, Mar 9, 2020 at 3:09 PM Andres Freund <andres@anarazel.de> wrote: > > ISTM that we need some fix for the back-branches too. Being unable to > > look at some indexes till 12 has aged out doesn't strike me as good. > > Actually, the oldest_xact field was added in Postgres 11. What do you mean? Since 12 is the newest release affected, we'd potentially (and with increasing likelihood due to clusters living longer) have the problem till 12 is not supported anymore. What am I missing? > > How about simply printing the wrapped value? That's far from perfect, of > > course, but clearly better than the current situation in the back > > branches. > > Would you be happy if we always raised a NOTICE that had information > about the affected fields? I don't think that we should try to be > clever and only do it when we know that it will fail. We should admit > that it's broken with a HINT that gets associated with the NOTICE, in > order to discourage relying on the number within automated tools. I'd just do the s/%u/%d/. > If we were to do this, it would probably only be necessary to > backpatch to Postgres 11 and 12. Those are the only stable releases > with the oldest_xact field. In practice, it is highly likely to be the > thing that causes problems. We will report the root block number at a > negative block number when it happens to exceed 2^31-1, but that > condition is almost impossible to hit in practice, even when the index > size is close to the system-wide limit of relation size. That's why > nobody has complained about it in all these years. pg_class.relpages is also reported as a signed integer :(. Since btm_root/fastroot use %d, it'll just have similar wrapping behaviour. Greetings, Andres Freund
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 9, 2020 at 3:36 PM Andres Freund <andres@anarazel.de> wrote: > What do you mean? Since 12 is the newest release affected, we'd > potentially (and with increasing likelihood due to clusters living > longer) have the problem till 12 is not supported anymore. What am I > missing? But 12 isn't the latest release affected. It just so happens that Victor was using 12, but oldest_xact was actually added by commit 857f9c36 -- that's Postgres 11. To be very precise: I imagine that Victor was using bt_metap() in production on a Postgres 12 installation because he wanted to make sure that his installation had the new stuff (he did a talk about it at EU, so clearly it's of interest to him). The problem is nevertheless not new to Postgres 12. > I'd just do the s/%u/%d/. That's a pretty gross hack. So be it. > pg_class.relpages is also reported as a signed integer :(. Since > btm_root/fastroot use %d, it'll just have similar wrapping behaviour. I guess that means that pageinspect was correct after all! -- Peter Geoghegan
Hi, On 2020-03-09 17:16:47 -0700, Peter Geoghegan wrote: > On Mon, Mar 9, 2020 at 3:36 PM Andres Freund <andres@anarazel.de> wrote: > > What do you mean? Since 12 is the newest release affected, we'd > > potentially (and with increasing likelihood due to clusters living > > longer) have the problem till 12 is not supported anymore. What am I > > missing? > > But 12 isn't the latest release affected. It just so happens that > Victor was using 12, but oldest_xact was actually added by commit > 857f9c36 -- that's Postgres 11. Huh? I think we might be miscommunicating here. My point isn't about the *earliest* release affected, it's about the *latest* version without a fix. IOW, until when is there a supported release without a fix. And once 12 is not supported anymore, 11 is also unsupported. So we'd have a live bug (which would mainly hit while investigating issues) until 12 is unsupported? > To be very precise: I imagine that Victor was using bt_metap() in > production on a Postgres 12 installation because he wanted to make > sure that his installation had the new stuff (he did a talk about it > at EU, so clearly it's of interest to him). The problem is > nevertheless not new to Postgres 12. > > > I'd just do the s/%u/%d/. > > That's a pretty gross hack. So be it. Yea, it is. > > pg_class.relpages is also reported as a signed integer :(. Since > > btm_root/fastroot use %d, it'll just have similar wrapping behaviour. > > I guess that means that pageinspect was correct after all! Well, for some value of correct. I was arguing quite a while ago that we should just make pg_class.relpages a 64bit integer, or introduce an sql level type for block numbers. Greetings, Andres Freund
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 9, 2020 at 5:22 PM Andres Freund <andres@anarazel.de> wrote: > Huh? I think we might be miscommunicating here. My point isn't about the > *earliest* release affected, it's about the *latest* version without a > fix. IOW, until when is there a supported release without a fix. Got it. > And once 12 is not supported anymore, 11 is also unsupported. So we'd > have a live bug (which would mainly hit while investigating issues) > until 12 is unsupported? > > To be very precise: I imagine that Victor was using bt_metap() in > > production on a Postgres 12 installation because he wanted to make > > sure that his installation had the new stuff (he did a talk about it > > at EU, so clearly it's of interest to him). The problem is > > nevertheless not new to Postgres 12. > > > > > I'd just do the s/%u/%d/. > > > > That's a pretty gross hack. So be it. > > Yea, it is. Right. But we only need the gross kludge on 11 and 12 -- there is no "%u" to change on earlier Postgres versions. That will allow all Postgres/pageinspect versions to at least manage to consistently display something within the bt_metap() fields. -- Peter Geoghegan
Re: BUG #16285: bt_metap fails with value is out of range for type integer
From
Peter Geoghegan
Date:
On Mon, Mar 9, 2020 at 5:27 PM Peter Geoghegan <pg@bowt.ie> wrote: > Right. But we only need the gross kludge on 11 and 12 -- there is no > "%u" to change on earlier Postgres versions. That will allow all > Postgres/pageinspect versions to at least manage to consistently > display something within the bt_metap() fields. I pushed commits that make this change to both REL_11_STABLE and REL_12_STABLE. It's a total hack, but better than doing nothing. Somebody just asked about checking the B-Tree version on -general, and I pointed them in the direction of bt_metap(). I think that we'd hear more complaints like Victor's if we left 11 and 12 alone. -- Peter Geoghegan