Thread: Re: Add comment explaining why queryid is int64 in pg_stat_statements


On 15.05.2025 10:08, Shaik Mohammad Mujeeb wrote:
Hi Developers,

In pg_stat_statements.c, the function pg_stat_statements_internal() declares the queryid variable as int64, but assigns it a value of type uint64. At first glance, this might appear to be an overflow issue. However, I think this is intentional - the queryid is cast to int64 to match the bigint type of the queryid column in the pg_stat_statements view.

Please find the attached patch, which adds a clarifying comment to make the rationale explicit and avoid potential confusion for future readers.



Thanks and Regards,
Shaik Mohammad Mujeeb
Member Technical Staff
Zoho Corp



I don't think the comment is necessary here. There are no arithmetic or logical operations performed on it. It is only passed as a Datum.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Peter Eisentraut
Date:
On 17.05.25 14:49, Michael Paquier wrote:
> On Fri, May 16, 2025 at 04:05:01PM +0530, Shaik Mohammad Mujeeb wrote:
>> This conversion is intentional - most likely to match the bigint
>> type of the queryid column in pg_stat_statements. However, without
>> an explicit comment, this can be misleading. A beginner reading this
>> might misinterpret it as an unintentional overflow or bug and raise
>> unnecessary concerns. Therefore, it´s worth adding a brief comment
>> clarifying the intent behind this assignment.
> 
> I don't quite see the value in the comment addition you are suggesting
> here: all the user-facing features related to query IDs use signed 64b
> integers, and are documented as such in the official docs.  The fact
> that we store an unsigned value in the backend core code is an
> internal implementation artifact, the important point is that we have
> a value stored in 8 bytes.

There is another, new instance of this confusion.  When query IDs are 
printed to the log, we use a signed-integer format, but the value is 
unsigned (see log_status_format(), write_jsonlog()).  This works 
correctly in practice, but it triggers warnings from 
-Wwarnings-format-signedness (not in the default warnings set, but 
useful to clean up occasionally) and maybe other tools along those 
lines.  This is new because in the past this warning was hidden by 
additional casts.  If we want to keep this arrangement, maybe we should 
create an explicit function to convert query IDs for output, and then 
explain all this there.  Or why not store query IDs as int64_t 
internally, too?



Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 02:03:37PM +1200, David Rowley wrote:
> Aside from the struct field types changing for Query.queryId,
> PlannedStmt.queryId and PgBackendStatus.st_query_id, the
> external-facing changes are limited to:
>
> 1. pgstat_report_query_id() now returns int64 instead of uint64
> 2. pgstat_get_my_query_id()  now returns int64 instead of uint64
> 3. pgstat_report_query_id()'s first input parameter is now int64
>
> If we were to clean this up, then I expect it's fine to wait until
> v19, as it's not really a problem that's new to v18.

Hmm.  For the query ID, that's not new, but for the plan ID it is.  So
it seems to me that there could be also an argument for doing all that
in v18 rather than releasing v18 with the plan ID being unsigned,
keeping a maximum of consistency for both of IDs.  AFAIK, this is
something that Lukas's plan storing extension exposes as an int64
value to the user and the SQL interfaces, even if it's true that we
don't expose it in core, yet.
--
Michael

Attachment

Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Mon, May 19, 2025 at 08:43:25PM -0700, Lukas Fittl wrote:
> Yeah, +1 to making this consistent across both query ID and the new plan
> ID, and changing both to int64 internally seems best.

Any thoughts from others?  I'm OK to take the extra step of switching
both fields on HEAD and write a patch for that, relying on what David
has sent as a first step towards that.
--
Michael

Attachment
On Tue, 20 May 2025 at 17:09, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 19, 2025 at 08:43:25PM -0700, Lukas Fittl wrote:
> > Yeah, +1 to making this consistent across both query ID and the new plan
> > ID, and changing both to int64 internally seems best.
>
> Any thoughts from others?  I'm OK to take the extra step of switching
> both fields on HEAD and write a patch for that, relying on what David
> has sent as a first step towards that.

Given the planId stuff is new and has the same issue, I think that
pushes me towards thinking now is better than later for fixing both.

I'm happy to adjust my patch unless you've started working on it already.

David



Supporting unsigned INTs will be valuable for more than just this
obviously, and if we do
ever have that capability, we would likely want to go that route. I
know I've been asked about
why queryIds could be negative more than a few times. Until then, I
think the patch being
suggested is reasonable and cleaner.

A few comments on the patches:

1/ this was missed in pg_stat_statements
./contrib/pg_stat_statements/pg_stat_statements.c:    uint64
saved_queryId = pstmt->queryId;

2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a
macro since it's used in
several places? Also, we can add a comment in the macro such as
"
Output the queryId as an int64 rather than a uint64, to match the
BIGINT column used to emit
queryId in system views
"

I think this comment is needed to address the confusion that is the
original subject of this thread. Otherwise,
the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c

DoJumble(JumbleState *jstate, Node *node)
{
/* Jumble the given node */
@@ -208,9 +208,9 @@ DoJumble(JumbleState *jstate, Node *node)
FlushPendingNulls(jstate);
/* Process the jumble buffer and produce the hash value */
- return DatumGetUInt64(hash_any_extended(jstate->jumble,
- jstate->jumble_len,
- 0));
+ return DatumGetInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
}
/*
@@ -256,10 +256,10 @@ AppendJumbleInternal(JumbleState *jstate, const
unsigned char *item,
if (unlikely(jumble_len >= JUMBLE_SIZE))
{
- uint64 start_hash;
+ int64 start_hash;
- start_hash = DatumGetUInt64(hash_any_extended(jumble,
- JUMBLE_SIZE, 0));
+ start_hash = DatumGetInt64(hash_any_extended(jumble,
+ JUMBLE_SIZE, 0));
memcpy(jumble, &start_hash, sizeof(start_hash));
jumble_len = sizeof(start_hash);

--
Sami Imseih
Amazon Web Services (AWS)



Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 11:28:17PM +1200, David Rowley wrote:
> Certainly, a bit late, yes. It basically requires implementing
> unsigned types on the SQL level. I believe there are a few sunken
> ships along that coastline, and plenty of history in the archives if
> you want to understand some of the difficulties.

Providing some more context and some more information than this reply,
the latest thread that I can find on the matter is this one, from
December 2024:
https://www.postgresql.org/message-id/CAN3gO4sPBKbfWYK10i294u3kzsfDb4WX891FMbjLnKjMS08u7A%40mail.gmail.com

It summarizes nicely the situation and it contains some more general
points.

This particular point from Tom about numeric promotion and casting
hierarchy resonates as a very good one:
https://www.postgresql.org/message-id/3180774.1733588677%40sss.pgh.pa.us
My own bet is that if you don't want to lose any existing
functionality, perhaps we should be just more aggressive with casting
requirements on input to begin with even if it means more work when
writing queries, even if it comes at a loss of usability that should
still be something..

FWIW, I've wanted an unsigned in-core type more than once.  It would
be a lot of work, but we have a lot of things that exist in the core
code that map to unsigned 32b or 64b integer values.  Just naming one,
WAL LSN difference calculations.  XLogRecPtr is a 64b unsigned value,
not its representation at SQL level, meaning that we'd overflow after
reaching the threshold of the bit tracking the signedness.  It's true
that a system would need to live long enough to reach such LSNs, but
we have also 32b integers plugged here and there.  Another one is
block numbers, or OID which is an in-core type.  Having an in-core
unsigned integer type would scale better that creating types mapping
to every single internal core concept.
--
Michael

Attachment

Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 10:35:39AM -0500, Sami Imseih wrote:
> 1/ this was missed in pg_stat_statements
> ./contrib/pg_stat_statements/pg_stat_statements.c:    uint64
> saved_queryId = pstmt->queryId;

Right.  Some greps based on "queryid" and "query_id" point that this
is the last reference to uint in the tree.

> 2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a
> macro since it's used in
> several places? Also, we can add a comment in the macro such as
> "
> Output the queryId as an int64 rather than a uint64, to match the
> BIGINT column used to emit
> queryId in system views
> "

We are talking about two places in queryjumblefuncs.c.  Leaving the
code as in David's original proposal is fine IMO.  Adding a comment
about the implied uint64 -> int64 conversion when calling
hash_any_extended() sounds like a good idea to document what we want
from the hash.

I've had a closer look at David's patch, and I did not spot other
places that required similar adjustments.  I may have missed
something, of course.

David, how would you like to move on with this patch set?  I can take
responsibility for both patches as the plan ID change can qualify as
an open item.
--
Michael

Attachment

Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Peter Eisentraut
Date:
On 20.05.25 08:38, Michael Paquier wrote:
> On Tue, May 20, 2025 at 05:51:51PM +1200, David Rowley wrote:
>> Given the planId stuff is new and has the same issue, I think that
>> pushes me towards thinking now is better than later for fixing both.
>>
>> I'm happy to adjust my patch unless you've started working on it already.
> 
> Here you go with the attached, to-be-applied on top of your own patch.

Whichever way we're going, surely this whole thing could benefit from a

     typedef something QueryId;




Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Thu, May 22, 2025 at 02:36:38PM +1200, David Rowley wrote:
> You could argue that if it reduces the locations that need to be
> changed by using a typedef, then it's a win. But there are also
> negative aspects to typedefs that need to be considered. For me, those
> are the added level of indirection of code reading to actually who
> what type I'm working with. I personally dislike typedefs like
> "typedef PageHeaderData *PageHeader;" as they hide the fact I'm
> working with a pointer.
>
> I'm not outright objecting to the typedef for this. It's just I don't
> see it as a clear-cut improvement for this case.

Same opinion here.  I am not quite clear what there is to gain in
hiding the query ID behind a typedef, or even apply that to the plan
ID.

I have added an open item about the plan ID part as it applies to v18,
adding the RMT in CC to get an opinion.  If we cannot get a consensus
on all that, letting things as they are is still logically correct,
even with the -Wwarnings-format-signedness argument which is not
included by default currently.

Has somebody an opinion to offer?
--
Michael

Attachment
On Thu, May 29, 2025 at 01:53:07PM +0900, Michael Paquier wrote:
> On Thu, May 22, 2025 at 01:01:14PM +0900, Michael Paquier wrote:
>> I have added an open item about the plan ID part as it applies to v18,
>> adding the RMT in CC to get an opinion.  If we cannot get a consensus
>> on all that, letting things as they are is still logically correct,
>> even with the -Wwarnings-format-signedness argument which is not
>> included by default currently.
>> 
>> Has somebody an opinion to offer?
> 
> It has been one week since this last update, and there has been
> nothing except the sound of cicadas.  IMO, I think that we should just
> pull the switch and make both of these IDs signed on HEAD, taking case
> of the potential signedness warning issues.
> 
> Now, I don't really want to take a leap of faith without the RMT being
> OK with that now that we are in beta1.

After reading through this thread and the latest patch set, I don't see any
strong reason for the RMT to object to this change for v18.  IIUC some
extensions may need to adapt, but we're still a few months from 18.0, so
that seems okay.  I vaguely recall that we've made other small
extension-breaking changes during the beta period for previous major
releases.

-- 
nathan



Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Michael Paquier
Date:
On Thu, May 29, 2025 at 09:28:35AM -0500, Nathan Bossart wrote:
> On Thu, May 29, 2025 at 01:53:07PM +0900, Michael Paquier wrote:
>> Now, I don't really want to take a leap of faith without the RMT being
>> OK with that now that we are in beta1.
>
> After reading through this thread and the latest patch set, I don't see any
> strong reason for the RMT to object to this change for v18.  IIUC some
> extensions may need to adapt, but we're still a few months from 18.0, so
> that seems okay.  I vaguely recall that we've made other small
> extension-breaking changes during the beta period for previous major
> releases.

Thanks, Nathan.  Let's proceed with the change then.  David, would you
prefer handling the patch you have written by yourself for the query
ID part?
--
Michael

Attachment
On Fri, 30 May 2025 at 11:35, Michael Paquier <michael@paquier.xyz> wrote:
> Thanks, Nathan.  Let's proceed with the change then.  David, would you
> prefer handling the patch you have written by yourself for the query
> ID part?

Yes. I'll look at that again shortly.

David



Thanks for the review.

On Wed, 21 May 2025 at 03:35, Sami Imseih <samimseih@gmail.com> wrote:
> 2/ Should "DatumGetInt64(hash_any_extended(......" be turned into a
> macro since it's used in
> several places? Also, we can add a comment in the macro such as
> "
> Output the queryId as an int64 rather than a uint64, to match the
> BIGINT column used to emit
> queryId in system views
> "
>
> I think this comment is needed to address the confusion that is the
> original subject of this thread. Otherwise,
> the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c

I ended up adding a comment in the Query struct detailing why queryId
is signed. I imagine, as time passes, we might forget why we did that
as hashed values are more naturally unsigned. I think it's wrong to
add comments in DoJumble() to mention details about what the calling
function does. I think the fact that DoJumble() now returns int64
should be a good enough explanation as to why we want to get the hash
value in signed form.

David



Re: Add comment explaining why queryid is int64 in pg_stat_statements

From
Shaik Mohammad Mujeeb
Date:
Thanks for making the appropriate changes, David.

Now everything make sense. :)





---- On Fri, 30 May 2025 16:39:23 +0530 David Rowley <dgrowleyml@gmail.com> wrote ---

On Fri, 30 May 2025 at 11:51, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 30 May 2025 at 11:35, Michael Paquier <michael@paquier.xyz> wrote:
> > Thanks, Nathan. Let's proceed with the change then. David, would you
> > prefer handling the patch you have written by yourself for the query
> > ID part?
>
> Yes. I'll look at that again shortly.

Now done.

David