Re: Document NULL - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Document NULL
Date
Msg-id 505f9dfcc6e84a442c6fbdf600ab1df1e5030603.camel@j-davis.com
Whole thread Raw
In response to Re: Document NULL  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Document NULL
List pgsql-hackers
On Fri, 2024-06-28 at 13:39 -0700, David G. Johnston wrote:
> The attached are complete and ready for review.  I did some file
> structure reformatting at the end and left that as the second patch. 
> The first contains all of the content.

I read through v4-0001. Thank you for working on this!

I really like the overall feel of the document: outlines the various
interpretations, behaviors, nuances and rationales; practical and not
philosophical.

Comments:

1.

One idea is to have a brief guidance section to help users know how to
use nulls in their data model effectively. For instance, if you allow
nulls for middle_name to mean "no middle name", then you have to be
careful when concatenating it as part of a larger string (otherwise it
will make the entire result null). Using COALESCE() can be a good
strategy here.

2.

It would be helpful to go through a combined example that shows how
these varous behaviors interact. For instance:

  SELECT r.a, SUM(s.b)
    FROM r LEFT JOIN s ON r.id = s.id
    GROUP BY r.a HAVING SUM(s.b) < 123;

Assume that there are no null values in r or s, and there's one record
in r with no match in s. First, a null value comes into existence from
the outer join when there's no match. Then, the GROUP BY creates a
group with a single null value. Then the SUM aggregates it and returns
null. Then the less-than expression evaluates to null (due to 3VL),
then the HAVING clause excludes the record because it's distinct from
true. That's probably not what the user intended -- the sum of no
records is intuitively less than 123.

3. "...more formally, the Law of the Excluded Middle does not hold:
i.e., p OR NOT(p) != true; for all p."

Switching to formal language here is confusing (and wrong, I think). I
suggest rewording and I don't think you need formal language here: what
you are highlighting is that, when p is the null value, the expression
"p OR NOT p" evaluates to null, which is surprising to someone who is
used to thinking in 2VL.

4. COUNT() with no input is a special case that returns zero, and I
think that's worth mentioning somewhere.

Regards,
    Jeff Davis



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: pg_ctl/miscinit: print "MyStartTime" as a long long instead of long to avoid 2038 problem.
Next
From: Nathan Bossart
Date:
Subject: Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C