RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers

From Bykov Ivan
Subject RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date
Msg-id 5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain
Whole thread Raw
In response to Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hello!

> Since then, I see that Ivan
> has already submitted a patch that accounts for NULL nodes and adds a
> byte to the jumble buffer to account for NULLs. This seems quite clean
> and simple. However, Sami seems to have concerns about the overhead of
> doing this. Is that warranted at all? Potentially, there could be a
> decent number of NULL fields. It'll probably be much cheaper than the
> offsetof idea I came up with.

To address the issue of it consuming a lot of bytes in the jumble buffer
for NULL marks, I have added a new patch version for Variant B.
(v2-0001-Query-ID-Calculation-Fix-Variant-B.patch).

This patch adds to JumbleState the count of NULL nodes visited before a
non-NULL one appears. The least significant byte of this count is appended
to the jumble buffer every time the count is not null (indicating that we have
visited non-NULL nodes previously), and a new chunk of data is also appended
to the jumble buffer. I intentionally did not add a final append for the NULL
count in the JumbleQuery processing, as NULL tail nodes of the query do not
improve the reliability of query ID collision resolution.

I believe that my first version, Variant B of the patch, is simple and easy to
understand. I would prefer to use the first version of the Variant B patch,
so I have attached an updated version along with tests
(v3-0001-Query-ID-Calculation-Fix-Variant-B.patch).

As you can see, v2-0001-Query-ID-Calculation-Fix-Variant-B.patch is more
complex and invasive than v3-0001-Query-ID-Calculation-Fix-Variant-B.patch.
I don't think that saving a few bytes in a 1024-byte sized buffer is worth
implementing, even for this simple optimization (v2).

> Can you share your thoughts on Ivan's NULL jumble idea?

I would appreciate any feedback. Thank you.


-----Original Message-----
From: David Rowley <dgrowleyml@gmail.com> 
Sent: Tuesday, March 11, 2025 12:49 PM
To: Michael Paquier <michael@paquier.xyz>
Cc: Быков Иван Александрович <i.bykov@modernsys.ru>; Sami Imseih <samimseih@gmail.com>; pgsql-hackers@postgresql.org
Subject: Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar 
> > one is already lurking in the shadows waiting to jump out on us.
>
> For how many years will be have to wait for this threat hiddent in the 
> shadows?  :)
>
> This issue exists since the query jumbling exists in pgss, so it goes 
> really down the road.  I've spent a bit more than a few minutes on 
> that.

I didn't mean to cause offence here. I just wanted to make it clear that I don't think fixing these types of issues by
swappingthe order of the fields is going to be a sustainable practice, and it would be good to come up with a solution
thateliminates the chances of this class of bug from ever appearing again.  Even if we were to trawl the entire Query
structand everything that could ever be attached to it and we deem that today it's fine and no more such bugs exist,
theperson adding some new SQL feature in the future that adds new data to store in the Query probably isn't going to
givequery jumbling much thought, especially now that the code generation is automatic.
 

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're 
> > jumbling and incorporate that seed into the hash?  I don't think we 
> > could possibly change the offset in a minor version, so I don't 
> > think there's a risk that could cause jumble value instability. 
> > Possibly different CPU architectures would come up with different 
> > offsets through having different struct alignment requirements, but 
> > we don't make any guarantees about the jumble values matching across 
> > different CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems 
> forever, particularly thanks to the fact that we automate the 
> generation of this code now so it is mostly cost-free.  This has a 
> cost for the custom jumbling functions where one needs some 
> imagination, but with models being around while we keep the number of 
> custom functions low, that should be neither complicated nor costly, 
> at least in my experience.

I hadn't really processed this thread fully when I responded yesterday and my response was mostly aimed at the latest
patchon the thread at the time, which I had looked at quickly and wanted to put a stop to changing field orders as a
fixfor this.  Since then, I see that Ivan has already submitted a patch that accounts for NULL nodes and adds a byte to
thejumble buffer to account for NULLs. This seems quite clean and simple. However, Sami seems to have concerns about
theoverhead of doing this. Is that warranted at all? Potentially, there could be a decent number of NULL fields. It'll
probablybe much cheaper than the offsetof idea I came up with.
 

> When I was thinking about the addition of the offset to the jumbling 
> yesterday, I was wondering about two things:
> - if there are some node structures where it could not work.
> - if we'd need to pass down some information of the upper node when 
> doing the jumbling of a node attached to it, which would make the code 
> generation much more annoying.
>
> But after sleeping on it I think that these two points are kind of
> moot: having only the offset passed down to a single _jumbleNode() 
> with the offset compiled at its beginning should be sufficient.  Using 
> "seed" as a term is perhaps a bit confusing, because this is an offset 
> in the upper node, but perhaps that's OK as-is.  It's just more 
> entropy addition.  If somebody has a better idea for this term, feel 
> free.

Can you share your thoughts on Ivan's NULL jumble idea?

> _jumbleList() is an interesting one.  If we want an equivalent of the 
> offset, this could use the item number in the list which would be a 
> rather close equivalent to the offset used elsewhere.  For the int, 
> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for 
> each member, apply the offset at the beginning of _jumbleList(), once, 
> I guess.

Is this affected by the same class of bug that we're aiming to fix here?  (This class being not always jumbling all
fieldsbecause of NULLs or some other value, resulting in the next jumble field applying the same bytes to the buffer as
theprevious field would have, had it not been ignored)
 

> _jumbleVariableSetStmt() should be OK with the offset of "arg" in 
> VariableSetStmt.  The addition of hash_combine64() to count for the 
> offset should be OK.
>
> With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT 
> seem to work fine, without causing noise for the other cases tracked 
> in the regression tests of PGSS.
>
> What do you think?

I mostly now think the class of bug can be fixed by ensuring we never ignore any jumble field for any reason and always
putat least 1 byte onto the buffer with some sort of entropy. I'm keen to understand if I'm missing some case that
you'vethought about that I've not.
 

David

Attachment

pgsql-hackers by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: Reducing the log spam
Next
From: Kirill Reshke
Date:
Subject: Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring