Home > mailing lists

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date	March 11 13:28:00
Msg-id	Z9AQMKcXJu30TQM5@paquier.xyz Whole thread Raw
In response to	Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
List	pgsql-hackers

Tree view

On Tue, Mar 11, 2025 at 08:48:33PM +1300, David Rowley wrote:
> On Tue, 11 Mar 2025 at 19:50, Michael Paquier <michael@paquier.xyz> wrote:
>> This issue exists since the query jumbling exists in pgss, so it goes
>> really down the road.  I've spent a bit more than a few minutes on
>> that.
>
> I didn't mean to cause offence here. I just wanted to make it clear
> that I don't think fixing these types of issues by swapping the order
> of the fields is going to be a sustainable practice, and it would be
> good to come up with a solution that eliminates the chances of this
> class of bug from ever appearing again.  Even if we were to trawl the
> entire Query struct and everything that could ever be attached to it
> and we deem that today it's fine and no more such bugs exist, the
> person adding some new SQL feature in the future that adds new data to
> store in the Query probably isn't going to give query jumbling much
> thought, especially now that the code generation is automatic.

FWIW, I've mentioned the use of the offset in a Node upthread, in the
next to last last paragraph of this email:
https://www.postgresql.org/message-id/Z84mhjm5RtWtLG4X@paquier.xyz

One thing I did not notice yesterday was that it is possible to rely
on _jumbleNode() to pass an offset from the parent.  So it looks like
we had roughly the same idea independently, but I was not able to look
more closely at that yesterday.

>> But after sleeping on it I think that these two points are kind of
>> moot: having only the offset passed down to a single _jumbleNode()
>> with the offset compiled at its beginning should be sufficient.  Using
>> "seed" as a term is perhaps a bit confusing, because this is an offset
>> in the upper node, but perhaps that's OK as-is.  It's just more
>> entropy addition.  If somebody has a better idea for this term, feel
>> free.
>
> Can you share your thoughts on Ivan's NULL jumble idea?

This is an incomplete solution, I am afraid.  The origin of the
problem comes from the fact that a Node (here, Query) stores two times
successively equivalent Nodes that are used for separate data, with
NULL being the difference between both.  The problem is not limited to
NULL, we could as well, for example, finish with a Node that has a
custom jumbling function and the same issue depending on how it is
used in a parent Node.  Adding the offset from the parent in the
jumbling is a much stronger insurance policy that the proposed
solution specific to NULL, because each offset is unique in a single
Node.

>> _jumbleList() is an interesting one.  If we want an equivalent of the
>> offset, this could use the item number in the list which would be a
>> rather close equivalent to the offset used elsewhere.  For the int,
>> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
>> each member, apply the offset at the beginning of _jumbleList(), once,
>> I guess.
>
> Is this affected by the same class of bug that we're aiming to fix
> here?  (This class being not always jumbling all fields because of
> NULLs or some other value, resulting in the next jumble field applying
> the same bytes to the buffer as the previous field would have, had it
> not been ignored)

Yep, that could be possible.  I am not sure if it can be reached
currently with the set of parse nodes, but it in theory possible could
be possible with a list of Nodes, at least.
--
Michael

Attachment

signature.asc

pgsql-hackers by date:

From: Álvaro Herrera
Date: 11 March, 13:21:12
Subject: Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

From: vignesh C
Date: 11 March, 13:30:45
Subject: Re: Add an option to skip loading missing publication to avoid logical replication failure

Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers

Attachment

Previous

Next