Thread: Re: Add comment explaining why queryid is int64 in pg_stat_statements
On 15.05.2025 10:08, Shaik Mohammad Mujeeb wrote:
Hi Developers,
In pg_stat_statements.c, the function pg_stat_statements_internal() declares the queryid variable as int64, but assigns it a value of type uint64. At first glance, this might appear to be an overflow issue. However, I think this is intentional - the queryid is cast to int64 to match the bigint type of the queryid column in the pg_stat_statements view.
Please find the attached patch, which adds a clarifying comment to make the rationale explicit and avoid potential confusion for future readers.Thanks and Regards,
Shaik Mohammad Mujeeb
Member Technical Staff
Zoho Corp
I don't think the comment is necessary here. There are no arithmetic or logical operations performed on it. It is only passed as a Datum.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
On 17.05.25 14:49, Michael Paquier wrote: > On Fri, May 16, 2025 at 04:05:01PM +0530, Shaik Mohammad Mujeeb wrote: >> This conversion is intentional - most likely to match the bigint >> type of the queryid column in pg_stat_statements. However, without >> an explicit comment, this can be misleading. A beginner reading this >> might misinterpret it as an unintentional overflow or bug and raise >> unnecessary concerns. Therefore, it´s worth adding a brief comment >> clarifying the intent behind this assignment. > > I don't quite see the value in the comment addition you are suggesting > here: all the user-facing features related to query IDs use signed 64b > integers, and are documented as such in the official docs. The fact > that we store an unsigned value in the backend core code is an > internal implementation artifact, the important point is that we have > a value stored in 8 bytes. There is another, new instance of this confusion. When query IDs are printed to the log, we use a signed-integer format, but the value is unsigned (see log_status_format(), write_jsonlog()). This works correctly in practice, but it triggers warnings from -Wwarnings-format-signedness (not in the default warnings set, but useful to clean up occasionally) and maybe other tools along those lines. This is new because in the past this warning was hidden by additional casts. If we want to keep this arrangement, maybe we should create an explicit function to convert query IDs for output, and then explain all this there. Or why not store query IDs as int64_t internally, too?
On Tue, May 20, 2025 at 02:03:37PM +1200, David Rowley wrote: > Aside from the struct field types changing for Query.queryId, > PlannedStmt.queryId and PgBackendStatus.st_query_id, the > external-facing changes are limited to: > > 1. pgstat_report_query_id() now returns int64 instead of uint64 > 2. pgstat_get_my_query_id() now returns int64 instead of uint64 > 3. pgstat_report_query_id()'s first input parameter is now int64 > > If we were to clean this up, then I expect it's fine to wait until > v19, as it's not really a problem that's new to v18. Hmm. For the query ID, that's not new, but for the plan ID it is. So it seems to me that there could be also an argument for doing all that in v18 rather than releasing v18 with the plan ID being unsigned, keeping a maximum of consistency for both of IDs. AFAIK, this is something that Lukas's plan storing extension exposes as an int64 value to the user and the SQL interfaces, even if it's true that we don't expose it in core, yet. -- Michael
Attachment
On Mon, May 19, 2025 at 08:43:25PM -0700, Lukas Fittl wrote: > Yeah, +1 to making this consistent across both query ID and the new plan > ID, and changing both to int64 internally seems best. Any thoughts from others? I'm OK to take the extra step of switching both fields on HEAD and write a patch for that, relying on what David has sent as a first step towards that. -- Michael
Attachment
On Tue, 20 May 2025 at 17:09, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, May 19, 2025 at 08:43:25PM -0700, Lukas Fittl wrote: > > Yeah, +1 to making this consistent across both query ID and the new plan > > ID, and changing both to int64 internally seems best. > > Any thoughts from others? I'm OK to take the extra step of switching > both fields on HEAD and write a patch for that, relying on what David > has sent as a first step towards that. Given the planId stuff is new and has the same issue, I think that pushes me towards thinking now is better than later for fixing both. I'm happy to adjust my patch unless you've started working on it already. David
Supporting unsigned INTs will be valuable for more than just this obviously, and if we do ever have that capability, we would likely want to go that route. I know I've been asked about why queryIds could be negative more than a few times. Until then, I think the patch being suggested is reasonable and cleaner. A few comments on the patches: 1/ this was missed in pg_stat_statements ./contrib/pg_stat_statements/pg_stat_statements.c: uint64 saved_queryId = pstmt->queryId; 2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a macro since it's used in several places? Also, we can add a comment in the macro such as " Output the queryId as an int64 rather than a uint64, to match the BIGINT column used to emit queryId in system views " I think this comment is needed to address the confusion that is the original subject of this thread. Otherwise, the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c DoJumble(JumbleState *jstate, Node *node) { /* Jumble the given node */ @@ -208,9 +208,9 @@ DoJumble(JumbleState *jstate, Node *node) FlushPendingNulls(jstate); /* Process the jumble buffer and produce the hash value */ - return DatumGetUInt64(hash_any_extended(jstate->jumble, - jstate->jumble_len, - 0)); + return DatumGetInt64(hash_any_extended(jstate->jumble, + jstate->jumble_len, + 0)); } /* @@ -256,10 +256,10 @@ AppendJumbleInternal(JumbleState *jstate, const unsigned char *item, if (unlikely(jumble_len >= JUMBLE_SIZE)) { - uint64 start_hash; + int64 start_hash; - start_hash = DatumGetUInt64(hash_any_extended(jumble, - JUMBLE_SIZE, 0)); + start_hash = DatumGetInt64(hash_any_extended(jumble, + JUMBLE_SIZE, 0)); memcpy(jumble, &start_hash, sizeof(start_hash)); jumble_len = sizeof(start_hash); -- Sami Imseih Amazon Web Services (AWS)
On Tue, May 20, 2025 at 11:28:17PM +1200, David Rowley wrote: > Certainly, a bit late, yes. It basically requires implementing > unsigned types on the SQL level. I believe there are a few sunken > ships along that coastline, and plenty of history in the archives if > you want to understand some of the difficulties. Providing some more context and some more information than this reply, the latest thread that I can find on the matter is this one, from December 2024: https://www.postgresql.org/message-id/CAN3gO4sPBKbfWYK10i294u3kzsfDb4WX891FMbjLnKjMS08u7A%40mail.gmail.com It summarizes nicely the situation and it contains some more general points. This particular point from Tom about numeric promotion and casting hierarchy resonates as a very good one: https://www.postgresql.org/message-id/3180774.1733588677%40sss.pgh.pa.us My own bet is that if you don't want to lose any existing functionality, perhaps we should be just more aggressive with casting requirements on input to begin with even if it means more work when writing queries, even if it comes at a loss of usability that should still be something.. FWIW, I've wanted an unsigned in-core type more than once. It would be a lot of work, but we have a lot of things that exist in the core code that map to unsigned 32b or 64b integer values. Just naming one, WAL LSN difference calculations. XLogRecPtr is a 64b unsigned value, not its representation at SQL level, meaning that we'd overflow after reaching the threshold of the bit tracking the signedness. It's true that a system would need to live long enough to reach such LSNs, but we have also 32b integers plugged here and there. Another one is block numbers, or OID which is an in-core type. Having an in-core unsigned integer type would scale better that creating types mapping to every single internal core concept. -- Michael
Attachment
On Tue, May 20, 2025 at 10:35:39AM -0500, Sami Imseih wrote: > 1/ this was missed in pg_stat_statements > ./contrib/pg_stat_statements/pg_stat_statements.c: uint64 > saved_queryId = pstmt->queryId; Right. Some greps based on "queryid" and "query_id" point that this is the last reference to uint in the tree. > 2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a > macro since it's used in > several places? Also, we can add a comment in the macro such as > " > Output the queryId as an int64 rather than a uint64, to match the > BIGINT column used to emit > queryId in system views > " We are talking about two places in queryjumblefuncs.c. Leaving the code as in David's original proposal is fine IMO. Adding a comment about the implied uint64 -> int64 conversion when calling hash_any_extended() sounds like a good idea to document what we want from the hash. I've had a closer look at David's patch, and I did not spot other places that required similar adjustments. I may have missed something, of course. David, how would you like to move on with this patch set? I can take responsibility for both patches as the plan ID change can qualify as an open item. -- Michael
Attachment
On 20.05.25 08:38, Michael Paquier wrote: > On Tue, May 20, 2025 at 05:51:51PM +1200, David Rowley wrote: >> Given the planId stuff is new and has the same issue, I think that >> pushes me towards thinking now is better than later for fixing both. >> >> I'm happy to adjust my patch unless you've started working on it already. > > Here you go with the attached, to-be-applied on top of your own patch. Whichever way we're going, surely this whole thing could benefit from a typedef something QueryId;
On Thu, May 22, 2025 at 02:36:38PM +1200, David Rowley wrote: > You could argue that if it reduces the locations that need to be > changed by using a typedef, then it's a win. But there are also > negative aspects to typedefs that need to be considered. For me, those > are the added level of indirection of code reading to actually who > what type I'm working with. I personally dislike typedefs like > "typedef PageHeaderData *PageHeader;" as they hide the fact I'm > working with a pointer. > > I'm not outright objecting to the typedef for this. It's just I don't > see it as a clear-cut improvement for this case. Same opinion here. I am not quite clear what there is to gain in hiding the query ID behind a typedef, or even apply that to the plan ID. I have added an open item about the plan ID part as it applies to v18, adding the RMT in CC to get an opinion. If we cannot get a consensus on all that, letting things as they are is still logically correct, even with the -Wwarnings-format-signedness argument which is not included by default currently. Has somebody an opinion to offer? -- Michael
Attachment
On Thu, May 29, 2025 at 01:53:07PM +0900, Michael Paquier wrote: > On Thu, May 22, 2025 at 01:01:14PM +0900, Michael Paquier wrote: >> I have added an open item about the plan ID part as it applies to v18, >> adding the RMT in CC to get an opinion. If we cannot get a consensus >> on all that, letting things as they are is still logically correct, >> even with the -Wwarnings-format-signedness argument which is not >> included by default currently. >> >> Has somebody an opinion to offer? > > It has been one week since this last update, and there has been > nothing except the sound of cicadas. IMO, I think that we should just > pull the switch and make both of these IDs signed on HEAD, taking case > of the potential signedness warning issues. > > Now, I don't really want to take a leap of faith without the RMT being > OK with that now that we are in beta1. After reading through this thread and the latest patch set, I don't see any strong reason for the RMT to object to this change for v18. IIUC some extensions may need to adapt, but we're still a few months from 18.0, so that seems okay. I vaguely recall that we've made other small extension-breaking changes during the beta period for previous major releases. -- nathan
On Thu, May 29, 2025 at 09:28:35AM -0500, Nathan Bossart wrote: > On Thu, May 29, 2025 at 01:53:07PM +0900, Michael Paquier wrote: >> Now, I don't really want to take a leap of faith without the RMT being >> OK with that now that we are in beta1. > > After reading through this thread and the latest patch set, I don't see any > strong reason for the RMT to object to this change for v18. IIUC some > extensions may need to adapt, but we're still a few months from 18.0, so > that seems okay. I vaguely recall that we've made other small > extension-breaking changes during the beta period for previous major > releases. Thanks, Nathan. Let's proceed with the change then. David, would you prefer handling the patch you have written by yourself for the query ID part? -- Michael
Attachment
On Fri, 30 May 2025 at 11:35, Michael Paquier <michael@paquier.xyz> wrote: > Thanks, Nathan. Let's proceed with the change then. David, would you > prefer handling the patch you have written by yourself for the query > ID part? Yes. I'll look at that again shortly. David
Thanks for the review. On Wed, 21 May 2025 at 03:35, Sami Imseih <samimseih@gmail.com> wrote: > 2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a > macro since it's used in > several places? Also, we can add a comment in the macro such as > " > Output the queryId as an int64 rather than a uint64, to match the > BIGINT column used to emit > queryId in system views > " > > I think this comment is needed to address the confusion that is the > original subject of this thread. Otherwise, > the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c I ended up adding a comment in the Query struct detailing why queryId is signed. I imagine, as time passes, we might forget why we did that as hashed values are more naturally unsigned. I think it's wrong to add comments in DoJumble() to mention details about what the calling function does. I think the fact that DoJumble() now returns int64 should be a good enough explanation as to why we want to get the hash value in signed form. David
Re: Add comment explaining why queryid is int64 in pg_stat_statements
From
Shaik Mohammad Mujeeb
Date:
Thanks for making the appropriate changes, David.
Now everything make sense. :)
Now everything make sense. :)
---- On Fri, 30 May 2025 16:39:23 +0530 David Rowley <dgrowleyml@gmail.com> wrote ---
On Fri, 30 May 2025 at 11:51, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 30 May 2025 at 11:35, Michael Paquier <michael@paquier.xyz> wrote:
> > Thanks, Nathan. Let's proceed with the change then. David, would you
> > prefer handling the patch you have written by yourself for the query
> > ID part?
>
> Yes. I'll look at that again shortly.
Now done.
David