Thread: [HACKERS] 64-bit queryId?

[HACKERS] 64-bit queryId?

From
Robert Haas
Date:
Our Query currently has space for a 32-bit queryId, but that seems
reasonably likely to result in collisions:

https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

If you have as many as 50,000 queries, there's a 25% probability of
having at least one collision; that doesn't seem particularly
unrealistic.  Obviously, normalization reduces the number of distinct
queries a lot, but if queries are dynamically generated you might
still have quite a few of them.

How about widening the value to uint64?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> How about widening the value to uint64?

Doesn't really seem like that would guarantee no collisions.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Sat, Sep 30, 2017 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> How about widening the value to uint64?
>
> Doesn't really seem like that would guarantee no collisions.

This moves the possibility of a 25% collision from 50k queries 3.3
billion. Not zero, but safe enough as a minimal change for many years
to come.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Sat, Sep 30, 2017 at 12:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> How about widening the value to uint64?
>
> Doesn't really seem like that would guarantee no collisions.

Well, no duh.  If you come up with a hash function that maps an
infinite domain onto a finite range while guaranteeing no collisions,
I will look forward to reading the paper with interest.

Assuming, however, that you don't manage to prove all known
mathematics inconsistent, what one might reasonably hope to do is
render collisions remote enough that one need not worry about them too
much in practice.  From that point of view, reducing the probability
of a collision by several orders of magnitude seems worth doing if (1)
the cost isn't too much and (2) the probability of a collision as
things stand is significant.  I argue that both of those things are
probably true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Peter Geoghegan
Date:
On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Assuming, however, that you don't manage to prove all known
> mathematics inconsistent, what one might reasonably hope to do is
> render collisions remote enough that one need not worry about them too
> much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting. Often, 90%+ will be one-hit wonders. If that
isn't true, then you're probably not going to find pg_stat_statements
very useful, because you have nothing to focus on.

I have heard complaints about a number of different things in
pg_stat_statements, like the fact that we don't always manage to
replace constants with '?'/'$n' characters in all cases. I heard about
that quite a few times during my time at Heroku. But never this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Peter Geoghegan
Date:
On Sat, Sep 30, 2017 at 8:39 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> Isn't that already true in the case of queryId? I've never heard any
> complaints about collisions. Most people don't change
> pg_stat_statements.max, so the probability of a collision is more like
> 1%. And, that's the probability of *any* collision, not the
> probability of a collision that the user actually cares about. The
> majority of entries in pg_stat_statements among those ten thousand
> will not be interesting.

Correction: ten thousand is an example value of pg_stat_statements.max
in the docs, not the default value. The default is five thousand.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Sat, Sep 30, 2017 at 11:39 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Assuming, however, that you don't manage to prove all known
>> mathematics inconsistent, what one might reasonably hope to do is
>> render collisions remote enough that one need not worry about them too
>> much in practice.
>
> Isn't that already true in the case of queryId? I've never heard any
> complaints about collisions. Most people don't change
> pg_stat_statements.max, so the probability of a collision is more like
> 1%. And, that's the probability of *any* collision, not the
> probability of a collision that the user actually cares about. The
> majority of entries in pg_stat_statements among those ten thousand
> will not be interesting. Often, 90%+ will be one-hit wonders. If that
> isn't true, then you're probably not going to find pg_stat_statements
> very useful, because you have nothing to focus on.

Well, I think that the fact that pg_stat_statements.max exists at all
is something that could be fixed now that we have DSA.  It's hard to
tell right now whether the fact that we don't hear about collisions is
an artifact of that limit or of the underlying workload.  Also, if
someone did have collisions, how would they even know?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Andres Freund
Date:
On 2017-09-30 11:50:08 -0400, Robert Haas wrote:
> Well, I think that the fact that pg_stat_statements.max exists at all
> is something that could be fixed now that we have DSA.

You normally *do* want a limit imo. And given that query strings are now
stored externally, I don't think there's a huge space concern anymore?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Assuming, however, that you don't manage to prove all known
>> mathematics inconsistent, what one might reasonably hope to do is
>> render collisions remote enough that one need not worry about them too
>> much in practice.

> Isn't that already true in the case of queryId? I've never heard any
> complaints about collisions.

More to the point: with 32-bit IDs, it's apparent that you shouldn't
really rely on them being unique, and should design your usage so that
it will survive collisions.  Robert seems to be arguing that if we
merely made the IDs wider, it would be okay to design applications that
don't allow for that and would fail hard on a collision.  I'm reminded
of Weinberg's famous line "If builders built houses the way programmers
build programs, the first woodpecker to come along would destroy
civilization".
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Andres Freund
Date:
On 2017-09-30 12:03:57 -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Assuming, however, that you don't manage to prove all known
> >> mathematics inconsistent, what one might reasonably hope to do is
> >> render collisions remote enough that one need not worry about them too
> >> much in practice.
> 
> > Isn't that already true in the case of queryId? I've never heard any
> > complaints about collisions.
> 
> More to the point: with 32-bit IDs, it's apparent that you shouldn't
> really rely on them being unique, and should design your usage so that
> it will survive collisions.  Robert seems to be arguing that if we
> merely made the IDs wider, it would be okay to design applications that
> don't allow for that and would fail hard on a collision.  I'm reminded
> of Weinberg's famous line "If builders built houses the way programmers
> build programs, the first woodpecker to come along would destroy
> civilization".

I think you're putting words and intent into Robert's mouth.  If you
design a hashtable you're concerned about unnecessary collisions, even
if you're aware of them.  Additionally, it's not clear you always can do
all that much about the collisions, without accepting undue overhead -
see e.g. pg_stat_statements.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Sat, Sep 30, 2017 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> More to the point: with 32-bit IDs, it's apparent that you shouldn't
> really rely on them being unique, and should design your usage so that
> it will survive collisions.

But the point is precisely that we do not do that.  pg_stat_statements
"survives" collisions, but nothing good happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Alexander Korotkov
Date:
On Sat, Sep 30, 2017 at 6:39 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Assuming, however, that you don't manage to prove all known
> mathematics inconsistent, what one might reasonably hope to do is
> render collisions remote enough that one need not worry about them too
> much in practice.

Isn't that already true in the case of queryId? I've never heard any
complaints about collisions. Most people don't change
pg_stat_statements.max, so the probability of a collision is more like
1%. And, that's the probability of *any* collision, not the
probability of a collision that the user actually cares about. The
majority of entries in pg_stat_statements among those ten thousand
will not be interesting. Often, 90%+ will be one-hit wonders. If that
isn't true, then you're probably not going to find pg_stat_statements
very useful, because you have nothing to focus on.

I heard from customers that they periodically dump contents of pg_stat_statements and then build statistics over long period of time.  If even they leave default pg_stat_statements.max unchanged, probability of collision would be significantly higher.

I have heard complaints about a number of different things in
pg_stat_statements, like the fact that we don't always manage to
replace constants with '?'/'$n' characters in all cases. I heard about
that quite a few times during my time at Heroku. But never this.

I'm not sure that we should be oriented only by users complaints in this problem by couple reasons.

1) Some defects could leave unrevealed by users during long periods of time.  We have examples of GiST picksplit algorithm to be bad resulting bad query performance.  However, users never complained about that because they didn't know what is real fair performance for this kind of queries.  In the same way users may suffer from queryId collisions during long time without noticing it.  They might have inaccurate statistics of queries execution, but they don't notice it because they have nothing to compare.

2) Ideally, we should fix potential problems before they materialize, not only after users complaints.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Sat, Sep 30, 2017 at 11:55 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-09-30 11:50:08 -0400, Robert Haas wrote:
>> Well, I think that the fact that pg_stat_statements.max exists at all
>> is something that could be fixed now that we have DSA.
>
> You normally *do* want a limit imo. And given that query strings are now
> stored externally, I don't think there's a huge space concern anymore?

Well, that's probably true.  I guess you don't want your query
generator to run the system out of memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Andres Freund
Date:
On 2017-09-30 18:17:37 -0400, Robert Haas wrote:
> On Sat, Sep 30, 2017 at 11:55 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-09-30 11:50:08 -0400, Robert Haas wrote:
> >> Well, I think that the fact that pg_stat_statements.max exists at all
> >> is something that could be fixed now that we have DSA.
> >
> > You normally *do* want a limit imo. And given that query strings are now
> > stored externally, I don't think there's a huge space concern anymore?
> 
> Well, that's probably true.  I guess you don't want your query
> generator to run the system out of memory.

Hah. I'm just pondering running sqlsmith against a pg without such a
limit.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Greg Stark
Date:
On 30 September 2017 at 21:03, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

> I heard from customers that they periodically dump contents of
> pg_stat_statements and then build statistics over long period of time.  If
> even they leave default pg_stat_statements.max unchanged, probability of
> collision would be significantly higher.

Indeed. It's simple enough to export stats to prometheus with queryid
as the key. Then even if the query ages out of the database stats you
have graphs and derivative metrics going further back.

I have to admit this was my first reaction to the idea of using sha1
hashes for git commits as well. But eventually I came around. If the
chances of a hash collision are smaller than a cosmic ray flipping a
bit or a digital electronics falling into a meta-stable state then I
had to admit there's not much value in being theoretically more
correct.

In practice if the query has aged out of pg_stat_statements and you're
exporting pg_stat_statements metrics to longer-term storage there's
really nothing more "correct" than just using a long enough hash and
assuming there are no collisions anyways. If 64-bits is still not
sufficient we could just go to 160-bit sha1 or 256-bit sha256.

Actually there's a reason I'm wondering if we shouldn't use a
cryptographic hash or even an HMAC. Currently if you're non-superuser
we, quite sensibly, hide the query text. But we also hide the queryid.
The latter is really inconvenient since it really makes the stats
utterly useless. I'm not sure what the rationale was but the only
thing I can think of is a fear that it's possible to reverse engineer
the query using brute force. An HMAC, or for most purposes even a
simple cryptographic hash with a secret salt would make that
impossible.

(I have other advances in pg_stat_statements I would love to see. It
would be so much more helpful if pg_stat_statements also kept a few
examples of query parameters such as the most recent set, the set that
caused the longest execution, maybe the set with the largest of each
metric.)

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> Indeed. It's simple enough to export stats to prometheus with queryid
> as the key. Then even if the query ages out of the database stats you
> have graphs and derivative metrics going further back.

I'm not really ready to buy into that as a supported use-case, because
it immediately leads to the conclusion that we need to promise stability
of query IDs across PG versions, which seems entirely infeasible to me
(at least with anything like the current method of deriving them).
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Greg Stark
Date:
On 1 October 2017 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> Indeed. It's simple enough to export stats to prometheus with queryid
>> as the key. Then even if the query ages out of the database stats you
>> have graphs and derivative metrics going further back.
>
> I'm not really ready to buy into that as a supported use-case, because
> it immediately leads to the conclusion that we need to promise stability
> of query IDs across PG versions, which seems entirely infeasible to me
> (at least with anything like the current method of deriving them).

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.

What they're going to want to do is things like "alert on any query
using more than 0.1 cpu core" or "alert on any query being the top i/o
consumer that isn't amongst the top 10 over the previous day" or
"alert on any query using more than 4x the cpu or i/o on one database
than it does on average across all our databases".  That last one is a
case where it would be nice if the queryid values were stable across
versions but it's hardly surprising that they aren't.

In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:
> Well these kinds of monitoring systems tend to be used by operations
> people who are a lot more practical and a lot less worried about
> theoretical concerns like that.

+1, well said.

> In context the point was merely that the default
> pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
> values are enough. It wouldn't be hard for there to be 64k different
> queries over time and across all the databases in a fleet and at that
> point it becomes likely there'll be a 32-bit collision.

Yeah.

I think Alexander Korotkov's points are quite good, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Magnus Hagander
Date:


On Mon, Oct 2, 2017 at 1:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:
> Well these kinds of monitoring systems tend to be used by operations
> people who are a lot more practical and a lot less worried about
> theoretical concerns like that.

+1, well said.

+1 as well. I think these people would be perfectly find by it changing across a version upgrade as long as they know that's the deal. *Most* of the time there is no version upgrade going on, so it would work fine that time.

Most operations people already deal with a lot of such parameters changing around. I'm sure most of them would be more than happy with an improvement, even if it's not mathematically perfect.

--

Re: [HACKERS] 64-bit queryId?

From
"Joshua D. Drake"
Date:
On 10/01/2017 04:22 PM, Robert Haas wrote:
> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:
>> Well these kinds of monitoring systems tend to be used by operations
>> people who are a lot more practical and a lot less worried about
>> theoretical concerns like that.
> 
> +1, well said.
> 
>> In context the point was merely that the default
>> pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
>> values are enough. It wouldn't be hard for there to be 64k different
>> queries over time and across all the databases in a fleet and at that
>> point it becomes likely there'll be a 32-bit collision.
> 
> Yeah.
> 
> I think Alexander Korotkov's points are quite good, too.
> 

+1 to both of these as well.

jD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
*****     Unless otherwise stated, opinions are my own.   *****


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> +1 to both of these as well.

OK, so here's a patch.  Review appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] 64-bit queryId?

From
Gavin Flower
Date:
On 03/10/17 04:02, Joshua D. Drake wrote:
> On 10/01/2017 04:22 PM, Robert Haas wrote:
>> On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark <stark@mit.edu> wrote:
>>> Well these kinds of monitoring systems tend to be used by operations
>>> people who are a lot more practical and a lot less worried about
>>> theoretical concerns like that.
>>
>> +1, well said.
>>
>>> In context the point was merely that the default
>>> pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
>>> values are enough. It wouldn't be hard for there to be 64k different
>>> queries over time and across all the databases in a fleet and at that
>>> point it becomes likely there'll be a 32-bit collision.
>>
>> Yeah.
>>
>> I think Alexander Korotkov's points are quite good, too.
>>
>
> +1 to both of these as well.
>
> jD
>
Did a calculation:

#         probability of collision
54561        0.499993
54562        0.500005

Essentially, you hit a greater than 50% chance of a collision before you 
get to 55 thousand statements.


Cheers,
Gavin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Peter Geoghegan
Date:
On Mon, Oct 2, 2017 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 2, 2017 at 11:02 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> +1 to both of these as well.
>
> OK, so here's a patch.  Review appreciated.

You need to change the SQL interface as well, although I'm not sure
exactly how. The problem is that you are now passing a uint64 queryId
to Int64GetDatumFast() within pg_stat_statements_internal(). That
worked when queryId was a uint32, because you can easily represent
values <= UINT_MAX as an int64/int8. However, you cannot represent the
second half of the range of uint64 within a int64/int8. I think that
this will behave different depending on USE_FLOAT8_BYVAL, if nothing
else.

The background here is that we used int8 as the output type for the
function when queryId was first exposed via the SQL interface because
there was no 32-bit unsigned int type that we could have used (except
possibly Oid, but that's weird). You see the same pattern in a couple
of other places.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> You need to change the SQL interface as well, although I'm not sure
> exactly how. The problem is that you are now passing a uint64 queryId
> to Int64GetDatumFast() within pg_stat_statements_internal(). That
> worked when queryId was a uint32, because you can easily represent
> values <= UINT_MAX as an int64/int8. However, you cannot represent the
> second half of the range of uint64 within a int64/int8. I think that
> this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Alexander Korotkov
Date:
On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
> You need to change the SQL interface as well, although I'm not sure
> exactly how. The problem is that you are now passing a uint64 queryId
> to Int64GetDatumFast() within pg_stat_statements_internal(). That
> worked when queryId was a uint32, because you can easily represent
> values <= UINT_MAX as an int64/int8. However, you cannot represent the
> second half of the range of uint64 within a int64/int8. I think that
> this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> else.

Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in given context.
#2 might be rather unexpected for users whose previously had non-negative queryIds.  Changing queryId from 32-bit to 64-bit itself might require some adoption from monitoring software. But queryIds are user-visible, and negative queryIds would look rather nonlogical.
#3 would be attaching hard and long-term problem by insufficient reason.
Thus, #1 looks like most harmless solution.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Tue, Oct 3, 2017 at 9:07 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> +1,
> I see 3 options there:
> 1) Drop high-order bit, as you proposed.
> 2) Allow negative queryIds.
> 3) Implement unsigned 64-type.
>
> #1 causes minor loss of precision which looks rather insignificant in given
> context.
> #2 might be rather unexpected for users whose previously had non-negative
> queryIds.  Changing queryId from 32-bit to 64-bit itself might require some
> adoption from monitoring software. But queryIds are user-visible, and
> negative queryIds would look rather nonlogical.

Per the principal of least astonishment perhaps:
https://en.wikipedia.org/wiki/Principle_of_least_astonishment
Negative values tend to be considered as error codes as well.

> #3 would be attaching hard and long-term problem by insufficient reason.
> Thus, #1 looks like most harmless solution.

In this case going for #1 looks like the safest bet.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Andres Freund
Date:
On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:
> On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > Peter Geoghegan <pg@bowt.ie> writes:
> > > You need to change the SQL interface as well, although I'm not sure
> > > exactly how. The problem is that you are now passing a uint64 queryId
> > > to Int64GetDatumFast() within pg_stat_statements_internal(). That
> > > worked when queryId was a uint32, because you can easily represent
> > > values <= UINT_MAX as an int64/int8. However, you cannot represent the
> > > second half of the range of uint64 within a int64/int8. I think that
> > > this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> > > else.
> >
> > Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?
> >
> 
> +1,
> I see 3 options there:
> 1) Drop high-order bit, as you proposed.
> 2) Allow negative queryIds.
> 3) Implement unsigned 64-type.

4) use numeric, efficiency when querying is not a significant concern here
5) use a custom type that doesn't support arithmetic, similar to pg_lsn.

FWIW, I think we should consider going for something like 5) for
pg_class.relpages.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Vladimir Sitnikov
Date:
>OK, so here's a patch.  Review appreciated.

Please correct typo "Write an unsigned integer field (anythign written with UINT64_FORMAT)".  anythign ->  anything.

Vladimir

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Tue, Oct 3, 2017 at 3:12 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:
>> On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +1,
>> I see 3 options there:
>> 1) Drop high-order bit, as you proposed.
>> 2) Allow negative queryIds.
>> 3) Implement unsigned 64-type.
>
> 4) use numeric, efficiency when querying is not a significant concern here
> 5) use a custom type that doesn't support arithmetic, similar to pg_lsn.

Why not just returning a hexa-like text?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Andres Freund
Date:
On 2017-10-03 17:06:20 +0900, Michael Paquier wrote:
> On Tue, Oct 3, 2017 at 3:12 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-10-03 03:07:09 +0300, Alexander Korotkov wrote:
> >> On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> +1,
> >> I see 3 options there:
> >> 1) Drop high-order bit, as you proposed.
> >> 2) Allow negative queryIds.
> >> 3) Implement unsigned 64-type.
> >
> > 4) use numeric, efficiency when querying is not a significant concern here
> > 5) use a custom type that doesn't support arithmetic, similar to pg_lsn.
> 
> Why not just returning a hexa-like text?

Two reasons: First, it'd look fairly different to before, whereas 4/5
would probably just continue to work fairly transparently in a lot of
cases. Secondly, what's the advantage in doing so over 4)?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> +1,
> I see 3 options there:
> 1) Drop high-order bit, as you proposed.
> 2) Allow negative queryIds.
> 3) Implement unsigned 64-type.
>
> #1 causes minor loss of precision which looks rather insignificant in given
> context.
> #2 might be rather unexpected for users whose previously had non-negative
> queryIds.  Changing queryId from 32-bit to 64-bit itself might require some
> adoption from monitoring software. But queryIds are user-visible, and
> negative queryIds would look rather nonlogical.
> #3 would be attaching hard and long-term problem by insufficient reason.
> Thus, #1 looks like most harmless solution.

I think we should just allow negative queryIds.  I mean, the hash
functions have behaved that way for a long time:

rhaas=# select hashtext('');
  hashtext
-------------
 -1477818771
(1 row)

It seems silly to me to throw away a perfectly good bit from the hash
value just because of some risk of minor user confusion.  I do like
Michael's suggestion of outputing hexa-like text, but changing the
types of the user-visible output columns seems like a job for another
patch.  Similarly, if we were to adopt Andres's suggestions of a new
type or using numeric or Alexander's suggestion of implementing a
64-bit unsigned type, I think it should be a separate patch from this
one.

I would note that such a patch will actually create real
incompatibility -- extension upgrades might fail if there are
dependent views, for example -- whereas merely having query IDs start
to sometimes be negative only creates an incompatibility for people
who assumed that the int64 type wouldn't actually use its full range
of allowable values.  I don't deny that someone may have done that, of
course, but I wouldn't guess that it's a common thing... maybe I'm
wrong.

Meanwhile, updated patch with a fix for the typo pointed out by
Vladimir Sitnikov attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] 64-bit queryId?

From
Alexander Korotkov
Date:
On Tue, Oct 3, 2017 at 5:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> +1,
> I see 3 options there:
> 1) Drop high-order bit, as you proposed.
> 2) Allow negative queryIds.
> 3) Implement unsigned 64-type.
>
> #1 causes minor loss of precision which looks rather insignificant in given
> context.
> #2 might be rather unexpected for users whose previously had non-negative
> queryIds.  Changing queryId from 32-bit to 64-bit itself might require some
> adoption from monitoring software. But queryIds are user-visible, and
> negative queryIds would look rather nonlogical.
> #3 would be attaching hard and long-term problem by insufficient reason.
> Thus, #1 looks like most harmless solution.

I think we should just allow negative queryIds.  I mean, the hash
functions have behaved that way for a long time:

rhaas=# select hashtext('');
  hashtext
-------------
 -1477818771
(1 row)

It seems silly to me to throw away a perfectly good bit from the hash
value just because of some risk of minor user confusion.  I do like
Michael's suggestion of outputing hexa-like text, but changing the
types of the user-visible output columns seems like a job for another
patch.  Similarly, if we were to adopt Andres's suggestions of a new
type or using numeric or Alexander's suggestion of implementing a
64-bit unsigned type, I think it should be a separate patch from this
one.

I would note that such a patch will actually create real
incompatibility -- extension upgrades might fail if there are
dependent views, for example -- whereas merely having query IDs start
to sometimes be negative only creates an incompatibility for people
who assumed that the int64 type wouldn't actually use its full range
of allowable values.  I don't deny that someone may have done that, of
course, but I wouldn't guess that it's a common thing... maybe I'm
wrong.

BTW, you didn't comment Tom's suggestion about dropping high-order bit which trades minor user user confusion to minor loss of precision.
TBH, for me it's not so important whether we allow negative queryIds or drop high-order bit.  I would be anyway very good to have 64-(or 63-)bit queryIds committed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Tue, Oct 3, 2017 at 12:04 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> BTW, you didn't comment Tom's suggestion about dropping high-order bit which
> trades minor user user confusion to minor loss of precision.

Oh, I thought I did comment on that.  I favor allowing negative IDs
rather than minor loss of precision.

> TBH, for me it's not so important whether we allow negative queryIds or drop
> high-order bit.  I would be anyway very good to have 64-(or 63-)bit queryIds
> committed.

Great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Tue, Oct 3, 2017 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> It seems silly to me to throw away a perfectly good bit from the hash
> value just because of some risk of minor user confusion.  I do like
> Michael's suggestion of outputing hexa-like text, but changing the
> types of the user-visible output columns seems like a job for another
> patch.  Similarly, if we were to adopt Andres's suggestions of a new
> type or using numeric or Alexander's suggestion of implementing a
> 64-bit unsigned type, I think it should be a separate patch from this
> one.

Yeah, any of them would require a version bump of pg_stat_statements.
My suggestion would be actually to just document the use of to_hex in
pg_stat_statements if there is any issue with query ID signed-ness.
Still, I have yet to see any user stories about complains on the
matter, so even just added a note in the docs may be overdoing it.

Thinking about that freshly today (new day, new thoughts of course), I
think that I would go with the 64-bit version. The patch gets more
simple, and there are ways to easily get around negative numbers by
applying anything like to_hex() on top of pg_stat_statements.

+/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
+#define WRITE_UINT64_FIELD(fldname) \
+   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
+                    node->fldname)
Correct call here.
{
-   return hash_any((const unsigned char *) str, len);
+   return hash_any_extended((const unsigned char *) str, len, 0);}
[...]
-           uint32      start_hash = hash_any(jumble, JUMBLE_SIZE);
+           uint64      start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
Missing two DatumGetUInt64() perhaps? HEAD looks wrong to me as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Tue, Oct 3, 2017 at 9:34 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> +/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
> +#define WRITE_UINT64_FIELD(fldname) \
> +   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
> +                    node->fldname)
> Correct call here.

I'm sorry, but I don't understand this comment.

>  {
> -   return hash_any((const unsigned char *) str, len);
> +   return hash_any_extended((const unsigned char *) str, len, 0);
>  }
> [...]
> -           uint32      start_hash = hash_any(jumble, JUMBLE_SIZE);
> +           uint64      start_hash = hash_any_extended(jumble, JUMBLE_SIZE, 0);
> Missing two DatumGetUInt64() perhaps? HEAD looks wrong to me as well.

Ah, yes, you're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 10:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 3, 2017 at 9:34 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> +/* Write an unsigned integer field (anything written with UINT64_FORMAT) */
>> +#define WRITE_UINT64_FIELD(fldname) \
>> +   appendStringInfo(str, " :" CppAsString(fldname) " " UINT64_FORMAT, \
>> +                    node->fldname)
>> Correct call here.
>
> I'm sorry, but I don't understand this comment.

I just mean that your patch is correct here. I don't always complain :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> I'm sorry, but I don't understand this comment.
>
> I just mean that your patch is correct here. I don't always complain :)

Oh, OK.  I'm all right with my patch being correct.

Here's a new version that hopefully fixes the things that you noticed
were incorrect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 9:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 3, 2017 at 9:39 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> I'm sorry, but I don't understand this comment.
>>
>> I just mean that your patch is correct here. I don't always complain :)
>
> Oh, OK.  I'm all right with my patch being correct.
>
> Here's a new version that hopefully fixes the things that you noticed
> were incorrect.

I am still on the learning curve with pg_stat_statements... This still
does not look complete to me. pgss_hash_fn only makes use of the last
four bytes of the query ID. What about computing the hash using as
also the first four bytes? With the current code, if the last four
bytes of two queries match then they would be counted together looking
at pgss_store().

I have spotted as well this comment in pg_stat_statements.c:   /* Increment the counts, except when jstate is not NULL
*/  if (!jstate)
 
I think that this should be "when jstate is NULL".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Wed, Oct 4, 2017 at 9:49 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I am still on the learning curve with pg_stat_statements... This still
> does not look complete to me. pgss_hash_fn only makes use of the last
> four bytes of the query ID. What about computing the hash using as
> also the first four bytes? With the current code, if the last four
> bytes of two queries match then they would be counted together looking
> at pgss_store().

Not really; dynahash won't merge two keys just because their hash
codes come out the same.  But you're right; that's probably not the
best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
like using tag_hash would be superior.

> I have spotted as well this comment in pg_stat_statements.c:
>     /* Increment the counts, except when jstate is not NULL */
>     if (!jstate)
> I think that this should be "when jstate is NULL".

I think that you're right, but that's unrelated to this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Not really; dynahash won't merge two keys just because their hash
> codes come out the same.  But you're right; that's probably not the
> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
> like using tag_hash would be superior.

Yes, using tag_hash would be just better than any custom formula.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Not really; dynahash won't merge two keys just because their hash
>> codes come out the same.  But you're right; that's probably not the
>> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
>> like using tag_hash would be superior.
>
> Yes, using tag_hash would be just better than any custom formula.

OK, here's v4, which does it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Thu, Oct 5, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 4, 2017 at 10:11 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Wed, Oct 4, 2017 at 11:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Not really; dynahash won't merge two keys just because their hash
>>> codes come out the same.  But you're right; that's probably not the
>>> best way to do it.   TBH, why do we even have pgss_hash_fn?  It seems
>>> like using tag_hash would be superior.
>>
>> Yes, using tag_hash would be just better than any custom formula.
>
> OK, here's v4, which does it that way.

v4 looks correct to me. Testing it through (pgbench and some custom
queries) I have not spotted issues. If the final decision is to use
64-bit query IDs, then this patch could be pushed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> v4 looks correct to me. Testing it through (pgbench and some custom
> queries) I have not spotted issues. If the final decision is to use
> 64-bit query IDs, then this patch could be pushed.

Cool.  Committed, thanks for the review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> v4 looks correct to me. Testing it through (pgbench and some custom
>> queries) I have not spotted issues. If the final decision is to use
>> 64-bit query IDs, then this patch could be pushed.
>
> Cool.  Committed, thanks for the review.

The final result looks fine for me. Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Julien Rouhaud
Date:
On Thu, Oct 12, 2017 at 2:46 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> v4 looks correct to me. Testing it through (pgbench and some custom
>>> queries) I have not spotted issues. If the final decision is to use
>>> 64-bit query IDs, then this patch could be pushed.
>>
>> Cool.  Committed, thanks for the review.
>
> The final result looks fine for me. Thanks.

Sorry for replying so late, but I have a perhaps naive question about
the hashtable handling with this new version.

IIUC, the shared hash table is now created with HASH_BLOBS instead of
HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
table will use tag_hash() to compute the hash key.

tag_hash() uses all the bits present in the given struct, so this can
be problematic if padding bits are not zeroed, which isn't garanted by
C standard for local variable.

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe.  But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> Sorry for replying so late, but I have a perhaps naive question about
> the hashtable handling with this new version.
>
> IIUC, the shared hash table is now created with HASH_BLOBS instead of
> HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
> table will use tag_hash() to compute the hash key.
>
> tag_hash() uses all the bits present in the given struct, so this can
> be problematic if padding bits are not zeroed, which isn't garanted by
> C standard for local variable.
>
> WIth current pgssHashKey definition, there shouldn't be padding bits,
> so it should be safe.  But I wonder if adding an explicit memset() of
> the key in pgss_store() could avoid extension authors to have
> duplicate entries if they rely on this code, or prevent future issue
> in the unlikely case of adding other fields to pgssHashKey.

I guess we should probably add additional comment to the definition of
pgssHashKey warning of the danger.  I'm OK with adding a memset if
somebody can promise me it will get optimized away by all reasonably
commonly-used compilers, but I'm not that keen on adding more cycles
to protect against a hypothetical danger.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Michael Paquier
Date:
On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> Sorry for replying so late, but I have a perhaps naive question about
>> the hashtable handling with this new version.
>>
>> IIUC, the shared hash table is now created with HASH_BLOBS instead of
>> HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
>> table will use tag_hash() to compute the hash key.
>>
>> tag_hash() uses all the bits present in the given struct, so this can
>> be problematic if padding bits are not zeroed, which isn't garanted by
>> C standard for local variable.
>>
>> WIth current pgssHashKey definition, there shouldn't be padding bits,
>> so it should be safe.  But I wonder if adding an explicit memset() of
>> the key in pgss_store() could avoid extension authors to have
>> duplicate entries if they rely on this code, or prevent future issue
>> in the unlikely case of adding other fields to pgssHashKey.
>
> I guess we should probably add additional comment to the definition of
> pgssHashKey warning of the danger.  I'm OK with adding a memset if
> somebody can promise me it will get optimized away by all reasonably
> commonly-used compilers, but I'm not that keen on adding more cycles
> to protect against a hypothetical danger.

A comment is an adapted answer for me too. There is no guarantee that
memset improvements will get committed. They will likely be, but
making a hard promise is difficult.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Julien Rouhaud
Date:
On Thu, Oct 19, 2017 at 1:08 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>> WIth current pgssHashKey definition, there shouldn't be padding bits,
>>> so it should be safe.  But I wonder if adding an explicit memset() of
>>> the key in pgss_store() could avoid extension authors to have
>>> duplicate entries if they rely on this code, or prevent future issue
>>> in the unlikely case of adding other fields to pgssHashKey.
>>
>> I guess we should probably add additional comment to the definition of
>> pgssHashKey warning of the danger.  I'm OK with adding a memset if
>> somebody can promise me it will get optimized away by all reasonably
>> commonly-used compilers, but I'm not that keen on adding more cycles
>> to protect against a hypothetical danger.
>
> A comment is an adapted answer for me too.

I agree, and I'm perfectly fine with adding a comment around pgssHashKey.

PFA a patch to warn about the danger.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] 64-bit queryId?

From
Robert Haas
Date:
On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> I agree, and I'm perfectly fine with adding a comment around pgssHashKey.
>
> PFA a patch to warn about the danger.

Committed a somewhat different version of this - hope you are OK with
the result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 64-bit queryId?

From
Julien Rouhaud
Date:
On Fri, Oct 20, 2017 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 19, 2017 at 2:11 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> I agree, and I'm perfectly fine with adding a comment around pgssHashKey.
>>
>> PFA a patch to warn about the danger.
>
> Committed a somewhat different version of this - hope you are OK with
> the result.

That's much better than what I proposed. Thanks a lot!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers