Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
Date | |
Msg-id | Z9AQMKcXJu30TQM5@paquier.xyz Whole thread Raw |
In response to | Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
List | pgsql-hackers |
On Tue, Mar 11, 2025 at 08:48:33PM +1300, David Rowley wrote: > On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote: >> This issue exists since the query jumbling exists in pgss, so it goes >> really down the road. I've spent a bit more than a few minutes on >> that. > > I didn't mean to cause offence here. I just wanted to make it clear > that I don't think fixing these types of issues by swapping the order > of the fields is going to be a sustainable practice, and it would be > good to come up with a solution that eliminates the chances of this > class of bug from ever appearing again. Even if we were to trawl the > entire Query struct and everything that could ever be attached to it > and we deem that today it's fine and no more such bugs exist, the > person adding some new SQL feature in the future that adds new data to > store in the Query probably isn't going to give query jumbling much > thought, especially now that the code generation is automatic. FWIW, I've mentioned the use of the offset in a Node upthread, in the next to last last paragraph of this email: https://www.postgresql.org/message-id/Z84mhjm5RtWtLG4X@paquier.xyz One thing I did not notice yesterday was that it is possible to rely on _jumbleNode() to pass an offset from the parent. So it looks like we had roughly the same idea independently, but I was not able to look more closely at that yesterday. >> But after sleeping on it I think that these two points are kind of >> moot: having only the offset passed down to a single _jumbleNode() >> with the offset compiled at its beginning should be sufficient. Using >> "seed" as a term is perhaps a bit confusing, because this is an offset >> in the upper node, but perhaps that's OK as-is. It's just more >> entropy addition. If somebody has a better idea for this term, feel >> free. > > Can you share your thoughts on Ivan's NULL jumble idea? This is an incomplete solution, I am afraid. The origin of the problem comes from the fact that a Node (here, Query) stores two times successively equivalent Nodes that are used for separate data, with NULL being the difference between both. The problem is not limited to NULL, we could as well, for example, finish with a Node that has a custom jumbling function and the same issue depending on how it is used in a parent Node. Adding the offset from the parent in the jumbling is a much stronger insurance policy that the proposed solution specific to NULL, because each offset is unique in a single Node. >> _jumbleList() is an interesting one. If we want an equivalent of the >> offset, this could use the item number in the list which would be a >> rather close equivalent to the offset used elsewhere. For the int, >> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for >> each member, apply the offset at the beginning of _jumbleList(), once, >> I guess. > > Is this affected by the same class of bug that we're aiming to fix > here? (This class being not always jumbling all fields because of > NULLs or some other value, resulting in the next jumble field applying > the same bytes to the buffer as the previous field would have, had it > not been ignored) Yep, that could be possible. I am not sure if it can be reached currently with the set of parse nodes, but it in theory possible could be possible with a list of Nodes, at least. -- Michael
Attachment
pgsql-hackers by date: