Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
Date
Msg-id CAA5RZ0to4CdDnEJgKyr93oG+_KTptNXsDhfanX2EhqyRfvCujA@mail.gmail.com
Whole thread Raw
In response to Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
> If we add something for NULLs, we'll always add distinctClause before
> the sortClause. If the distinctClause is NULL we won't end up with the
> same jumble bytes as if the sortClause were NULL, as the order the
> NULL byte(s) are added to the buffer changes.

That is how I understand it as well. By appending a single character null
terminator to the jumble for every NULL expression, we change the composition
of the final jumble. Passing offsets will accomplish the same thing, but I can't
see how it buys us any additional safeguards.

> I suspect that the overhead will be minimal for all the approaches I'm
> seeing on this thread, but it would not hurt to double-check all that.

Perhaps my concern is overblown. Also, it seems the consensus is for a more
comprehensive solution than that of variant A.

Here is an idea I was playing around with. Why don't we just append a null
terminator for every expression (a variant of variant B)?
I describe this as jumbling the expression position. And if we do
that, it no longer
seems necessary to jumble the expression type either. right?

+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -236,6 +236,13 @@ do { \
        if (expr->str) \
                AppendJumble(jstate, (const unsigned char *)
(expr->str), strlen(expr->str) + 1); \
 } while(0)
+#define JUMBLE_POSITION() \
+do { \
+       char nullterm = '\0'; \
+       AppendJumble(jstate, (const unsigned char *) &(nullterm),
sizeof(nullterm)); \
+       if (expr == NULL) \
+               return; \
+} while(0)

 #include "queryjumblefuncs.funcs.c"

@@ -244,17 +251,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
 {
        Node       *expr = node;

-       if (expr == NULL)
-               return;
-
        /* Guard against stack overflow due to overly complex expressions */
        check_stack_depth();

        /*
-        * We always emit the node's NodeTag, then any additional
fields that are
-        * considered significant, and then we recurse to any child nodes.
+        * We represent a position of a field in the jumble with a null-term.
+        * Doing so ensures that even NULL expressions are accounted for in
+        * the jumble.
         */
-       JUMBLE_FIELD(type);
+       JUMBLE_POSITION();

        switch (nodeTag(expr))
        {

> As the overhead of a single query jumbling is weightless compared to
> the overall query processing

yeah, that is a good point. At least benchmarking the above on a
SELECT only pgbench did not reveal any regression.


--
Sami Imseih
Amazon Web Services (AWS)



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Advanced Patch Feedback Session / pgconf.dev 2025
Next
From: Andres Freund
Date:
Subject: DROP TABLESPACE waits for checkpoint with lwlock held