Thread: UNIQUE null treatment option

UNIQUE null treatment option

From
Peter Eisentraut
Date:
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is apparently pretty easy; most of the patch is just to carry the flag 
around to all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

(I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false.  But perhaps the double
negatives make some code harder to read.)

Attachment

Re: UNIQUE null treatment option

From
Marko Tiikkaja
Date:
On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

For the unique index syntax, should this be selectable per column/expression, rather than for the entire index as a whole?


.m

Re: UNIQUE null treatment option

From
Peter Eisentraut
Date:
On 27.08.21 14:44, Marko Tiikkaja wrote:
> On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut 
> <peter.eisentraut@enterprisedb.com 
> <mailto:peter.eisentraut@enterprisedb.com>> wrote:
> 
>     In the SQL:202x draft, this
>     has been formalized by making this implementation-defined and adding
>     an option on unique constraint definitions UNIQUE [ NULLS [NOT]
>     DISTINCT ] to choose a behavior explicitly.
> 
>     The CREATE UNIQUE INDEX syntax extension is not from the standard,
>     it's my own invention.
> 
> 
> For the unique index syntax, should this be selectable per 
> column/expression, rather than for the entire index as a whole?

Semantically, this would be possible, but the bookkeeping to make it 
work seems out of proportion with the utility.  And you'd have the 
unique index syntax out of sync with the unique constraint syntax, which 
would be pretty confusing.



Re: UNIQUE null treatment option

From
Peter Eisentraut
Date:
Here is a rebased version of this patch.

On 27.08.21 14:38, Peter Eisentraut wrote:
> The SQL standard has been ambiguous about whether null values in
> unique constraints should be considered equal or not.  Different
> implementations have different behaviors.  In the SQL:202x draft, this
> has been formalized by making this implementation-defined and adding
> an option on unique constraint definitions UNIQUE [ NULLS [NOT]
> DISTINCT ] to choose a behavior explicitly.
> 
> This patch adds this option to PostgreSQL.  The default behavior
> remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
> is apparently pretty easy; most of the patch is just to carry the flag 
> around to all the places that need it.
> 
> The CREATE UNIQUE INDEX syntax extension is not from the standard,
> it's my own invention.
> 
> (I named all the internal flags, catalog columns, etc. in the
> negative ("nulls not distinct") so that the default PostgreSQL
> behavior is the default if the flag is false.  But perhaps the double
> negatives make some code harder to read.)

Attachment

Re: UNIQUE null treatment option

From
Zhihong Yu
Date:
Hi,

    bool        isunique;
+   bool        nulls_not_distinct;
 } BTSpool;

Looking at the other fields in BTSpool, there is no underscore in field name.
I think the new field can be named nullsdistinct. This way, the double negative is avoided.

Similar comment for new fields in BTShared and BTLeader.

And the naming would be consistent with information_schema.sql where nulls_distinct is used:

+           CAST('YES' AS yes_or_no) AS enforced,
+           CAST(NULL AS yes_or_no) AS nulls_distinct

Cheers

Re: UNIQUE null treatment option

From
Pavel Borisov
Date:
+1 for commiting this feature. Consider this useful.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: UNIQUE null treatment option

From
Maxim Orlov
Date:
I find this patch useful. It includes changes in documentation and tests. Code itself looks reasonable to me. Since, unique constraint check is done by corresponding btree index, it makes this feature implementation elegant and lightweight.

In my view, it is sufficient that heap relation can have different nulls treatment in unique constraints for different unique columns. For example:
CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT DISTINCT);

All the tests are running ok on Linux and MacOS X.

Although, patch doesn't apply with default git apply options. Only with the "three way merge" option (-3). Consider rebasing it, please. Then, in my view, it can be "Ready for committer".
--
Best regards,
Maxim Orlov.

Re: UNIQUE null treatment option

From
Peter Geoghegan
Date:
On Wed, Dec 29, 2021 at 2:06 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Here is a rebased version of this patch.

BTScanInsertData.anynullkeys already effectively means "if the index
is a unique index, then we don't actually need to go through
_bt_check_unique(), or perform any other checkingunique steps". This
is really an instruction about what to do (or not do), based on the
specifics of the values for the insertion scan key plus the index
definition. In other words, the code in _bt_mkscankey() that sets up
BTScanInsertData (an insertion scankey) was written with the exact
requirements of btinsert() in mind -- nothing more.

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

-- 
Peter Geoghegan



Re: UNIQUE null treatment option

From
Peter Geoghegan
Date:
On Thu, Jan 13, 2022 at 10:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I wonder if the logic for setting BTScanInsertData.anynullkeys inside
> _bt_mkscankey() is the place to put your test for
> rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
> would probably necessitate renaming anynullkeys, but that's okay. This
> feels more natural to me because a NULL key column in a NULLS NOT
> DISTINCT unique constraint is very similar to a NULL non-key column in
> an INCLUDE index, as far as our requirements go -- and so both cases
> should probably be dealt with at the same point.

Correction: I meant to write "...a NULL key column in a NULLS DISTINCT
unique constraint is very similar...".

-- 
Peter Geoghegan



Re: UNIQUE null treatment option

From
Pavel Borisov
Date:
I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.

A good point, indeed!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: UNIQUE null treatment option

From
Peter Eisentraut
Date:
On 13.01.22 19:36, Peter Geoghegan wrote:
> I wonder if the logic for setting BTScanInsertData.anynullkeys inside
> _bt_mkscankey() is the place to put your test for
> rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
> would probably necessitate renaming anynullkeys, but that's okay. This
> feels more natural to me because a NULL key column in a NULLS NOT
> DISTINCT unique constraint is very similar to a NULL non-key column in
> an INCLUDE index, as far as our requirements go -- and so both cases
> should probably be dealt with at the same point.

Makes sense.  Here is an updated patch with this change.

I didn't end up renaming anynullkeys.  I came up with names like 
"anyalwaysdistinctkeys", but in the end that felt too abstract, and 
moreover, it would require rewriting a bunch of code comments that refer 
to null values in this context.  Since as you wrote, anynullkeys is just 
a local concern between two functions, this slight inaccuracy is perhaps 
better than some highly general but unclear terminology.
Attachment

Re: UNIQUE null treatment option

From
Maxim Orlov
Date:
Since cfbot did failed with error, probably, unrelated to the patch itself (see https://cirrus-ci.com/task/5330150500859904)
and repeated check did not start automatically, I reattach patch v3 to restart cfbot on this patch.



--
Best regards,
Maxim Orlov.
Attachment

Re: UNIQUE null treatment option

From
Pavel Borisov
Date:
Makes sense.  Here is an updated patch with this change.

I didn't end up renaming anynullkeys.  I came up with names like
"anyalwaysdistinctkeys", but in the end that felt too abstract, and
moreover, it would require rewriting a bunch of code comments that refer
to null values in this context.  Since as you wrote, anynullkeys is just
a local concern between two functions, this slight inaccuracy is perhaps
better than some highly general but unclear terminology.
Agree with that. With the comment it is clear how it works.

I've looked at the patch v3. It seems good enough for me. CFbot tests have also come green.
Suggest it is RFC now.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: UNIQUE null treatment option

From
Peter Eisentraut
Date:
On 28.01.22 13:56, Pavel Borisov wrote:
>     Makes sense.  Here is an updated patch with this change.
> 
>     I didn't end up renaming anynullkeys.  I came up with names like
>     "anyalwaysdistinctkeys", but in the end that felt too abstract, and
>     moreover, it would require rewriting a bunch of code comments that
>     refer
>     to null values in this context.  Since as you wrote, anynullkeys is
>     just
>     a local concern between two functions, this slight inaccuracy is
>     perhaps
>     better than some highly general but unclear terminology.
> 
> Agree with that. With the comment it is clear how it works.
> 
> I've looked at the patch v3. It seems good enough for me. CFbot tests 
> have also come green.
> Suggest it is RFC now.

Committed.  Thanks.