Re: Fully documenting the design of nbtree row comparison scan keys - Mailing list pgsql-hackers

From Chao Li
Subject Re: Fully documenting the design of nbtree row comparison scan keys
Date
Msg-id 01631F61-7768-4E4E-A46C-CC7C8DD40FD3@gmail.com
Whole thread Raw
In response to Fully documenting the design of nbtree row comparison scan keys  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi Peter,

I am glad to review this patch.

> On Oct 31, 2025, at 05:34, Peter Geoghegan <pg@bowt.ie> wrote:
>
> nbtree row comparison scan keys consist of one header key (which
> appears in so->keyData[]), and 2 or more subkeys (which
> _bt_check_rowcompare accesses by following a pointer stored in the
> header key). This design makes sense to me, but it's not at all
> obvious why the scan keys are structured in this way.
>
> Attached patch adds comments about all this above
> _bt_check_rowcompare. It also adds a couple of new documenting
> assertions. This clears up why don't we just include the subkeys in
> the main so->keyData[] array instead [1], and why it's useful to
> sometimes treat a row compare inequality as if it was a similar scalar
> inequality (on the most significant row member's index column).
>
> I didn't understand every nuance of row compare inequalities myself
> until quite recently. The rules with NULLs are particularly tricky.
>
> It seems worthwhile to clear things up now in large part due to the
> recent addition of code in places like _bt_advance_array_keys -- code
> that wants to to treat row compare keys as if they were just a simple
> scalar inequality on the row compare's most significant column. That
> general behavior isn't new (e.g., _bt_first has long ignored row
> compare scan key markings when deducing a NOT NULL constraint), but
> it's not easy to see why it's correct.
>
> I'd like to commit this on both the master branch and the 18 branch in
> the next couple of days. Seems worth keeping them in sync for this.
>
> [1] https://www.postgresql.org/message-id/24134.1137366192%40sss.pgh.pa.us
> --
> Peter Geoghegan
> <v1-0001-Document-nbtree-row-comparison-design.patch>

I spent 2 hours on this patch. Renaming the function parameter “skey" to “header” as well as adding new asserts make
thingsmuch clearer, which is nice. 

For the function comment added to _bt_check_rowcompare(), I really have a mixed feeling. Personally, I feel that the
newcomment is even harder to understand than reading the code directly. 

```
+ * This is a subroutine for _bt_checkkeys/_bt_check_compare.  Caller passes us
+ * a row compare header key taken from so->keyData[].
+ *
+ * The SQL standard describes row value comparisons in terms of logical
+ * expansions that only use scalar operators.  Consider the following example
+ * row comparison:
+ *
+ * "(a, b, c) > (7, 'bar', 77)"
+ *
+ * This is logically/semantically equivalent to:
+ *
+ * "(a = 7 AND b = 'bar' AND c > 77) OR (a = 7 AND b > 'bar') OR (a > 7)".
+ *
+ * Notice that this condition is satisfied by _all_ rows that satisfy "a > 7",
+ * and by a subset of all rows that satisfy "a >= 7" (possibly all such rows).
+ * It _can't_ be satisfied by other rows (where "a < 7" or where "a IS NULL").
+ * A row comparison header key can therefore often be treated as if it was a
+ * simple scalar inequality on the most significant row member's index column.
+ *
```

So far so good. Clearly explained row comparison.

```
+ * Things get more complicated for our row compare with rows where "a = 7".
+ * Note that a row comparison isn't necessarily satisfied by _every_ row that
+ * appears after the first satisfied/matching row.  A forwards scan that uses
+ * our example qual might first return a row "(a, b, c) = (7, 'zebra', 54)".
+ * But it must not return a row "(a, b, c) = (7, NULL, 1)" that'll appear to
+ * the right of the first match (assumes that "b" was declared NULLS LAST).
+ * The scan only returns additional matches upon reaching rows where "a > 7".
+ * If you rereview our example row comparison's logical expansion, you'll
+ * understand why this is so.
+ *
```

This paragraph is actually describing how index data are stored, (in which order index data are loaded,)  but I think
thatdoesn’t matter to this function. This function just takes a header of ScanKey and an IndexTuple as inputs and
comparethem. Understanding row comparison seems enough to understand this function. 

```
+ * Note that a row comparison key behaves _exactly_ the same as an equivalent
+ * scalar inequality key on its most significant column once the scan reaches
+ * the point where it no longer needs to consider any of its lower-order keys.
+ * For example, once a forwards scan that uses our example qual reaches the
+ * first tuple "a > 7", we'll behave in precisely the same way as our caller
+ * would behave with a scalar inequality "a > 7" for the remainder of the scan
+ * (assuming that the scan never changes direction/never goes backwards).
+ * This includes setting continuescan=false based on a deduced NOT NULL
+ * constraint according to the same rules that our caller applies when a NULL
+ * tuple value fails to satisfy a scalar inequality that's marked required in
+ * the opposite scan direction.
```

_bt_check_rowcompare() actually just does a recursive-like comparison. To evaluate (a1, a2, … an) > (b1, b2 …, bn):

    • Compare the first column.
    • If they differ, that decides.
    • If they’re equal, move to next column and compare the rest.
    • Any NULL comparison yields “unknown”, then the whole thing is false.

The corresponding code for this process is quite short. The big portion of the function are handling the situations
whencolumn data is NULL and when const data is NULL, where there are inline comments to describe the behaviors. 

That’s just my personal feeling, others may think differently. I am sorry if you are unhappy with my comment.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Fabrice Chapuis
Date:
Subject: Re: Issue with logical replication slot during switchover
Next
From: Gureumi
Date:
Subject: [PATCH] Fix Korean typo 'checkpoint' in log