Thread: `inet` docs suggestion and possible bug report

`inet` docs suggestion and possible bug report

From
Nathan Long
Date:
On https://www.postgresql.org/docs/current/datatype-net-types.html the opening paragraph says:

> PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in Table 8.21. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions (see Section 9.12).

At least in the case of `inet`, another reason is for accurate comparison. IPv4 and IPv6 both have shorthand textual representations; eg `127.1` = `127.1.0.0`. Text storage would consider these unequal.

I propose modifying the leading paragraph to say:

> PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in Table 8.21. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking, **accurate comparison,** and specialized operators and functions (see Section 9.12).

and appending a sentence to `8.9.1 inet` as follows:

> Because it stores an address in its canonical format, inet provides reliable comparison between addresses. Both IPv4 and IPv6 have shorthand formats which would compare as unequal if stored as plain text; for example, `SELECT '127.1' = '127.1.0.0';` returns false, but `SELECT '::ffff:127.0.0.1'::inet = '::ffff:127.1'::inet;` returns true.

A possible bug report: As of I expected `SELECT '127.1'::inet = '127.0.0.1'::inet;` to return true, but as of 16.6 the cast on the shorthand format fails, even though it handles the IPV6-mapped equivalent.

-- Nathan

Re: `inet` docs suggestion and possible bug report

From
Tom Lane
Date:
Nathan Long <hello@nathanmlong.com> writes:
> At least in the case of `inet`, another reason is for accurate comparison.
> IPv4 and IPv6 both have shorthand textual representations; eg `127.1` =
> `127.1.0.0`. Text storage would consider these unequal.

I'm not sure how much we want to press that point, because AFAICS
the code we use does not have the same abbreviation rules you are
expecting.  Notably, it thinks '127.1' means 127.1.0.0.
(We lifted this logic from BIND 20+ years ago, so while it might
not entirely agree with practice elsewhere, it has a respectable
pedigree and I'm hesitant to mess with it.)

> A possible bug report: As of I expected `SELECT '127.1'::inet =
> '127.0.0.1'::inet;` to return true, but as of 16.6 the cast on the
> shorthand format fails, even though it handles the IPV6-mapped equivalent.

That seems to be falling foul of this restriction in inet_net_pton_ipv4:

    /* Prefix length can default to /32 only if all four octets spec'd. */
    if (bits == -1)
    {
        if (dst - odst == 4)
            bits = 32;
        else
            goto enoent;
    }

although if we relaxed that restriction it'd still fail at the next
bit,

    /* If prefix length overspecifies mantissa, life is bad. */
    if ((bits / 8) > (dst - odst))
        goto enoent;

which is why '127.1/32'::inet also fails.

Maybe somebody should take a look at current BIND and see if they
redefined these rules.  Per our git log, we've not attempted to
sync this code with upstream since 2005.

            regards, tom lane



Re: `inet` docs suggestion and possible bug report

From
Tom Lane
Date:
I wrote:
> Nathan Long <hello@nathanmlong.com> writes:
>> At least in the case of `inet`, another reason is for accurate comparison.
>> IPv4 and IPv6 both have shorthand textual representations; eg `127.1` =
>> `127.1.0.0`. Text storage would consider these unequal.

> I'm not sure how much we want to press that point, because AFAICS
> the code we use does not have the same abbreviation rules you are
> expecting.  Notably, it thinks '127.1' means 127.1.0.0.
> (We lifted this logic from BIND 20+ years ago, so while it might
> not entirely agree with practice elsewhere, it has a respectable
> pedigree and I'm hesitant to mess with it.)

I spent a little while researching this.  BIND stopped including the
relevant code at all sometime in the past 10 years, apparently feeling
that POSIX standardization means the libc versions of inet_pton()
behave sufficiently alike everywhere.  You can still find copies
of their code at, eg,

https://users.isc.org/~each/doxygen/bind9/inet__pton_8c-source.html

and there are also versions in the NetBSD source tree and probably
elsewhere.  As far as I can find, none of these will interpret '127.1'
as 127.0.0.1.  Some will reject it (which is what the POSIX spec for
the function says to do) and some will interpret it as 127.1.0.0.

Where 127.1 => 127.0.0.1 seems to come from is inet_addr (in POSIX)
and inet_aton (not in POSIX), which are legacy IPv4-only functions.
They say (quoting POSIX here):

    Values specified using IPv4 dotted decimal notation take one of
    the following forms:

    a.b.c.d
        When four parts are specified, each shall be interpreted as a
        byte of data and assigned, from left to right, to the four
        bytes of an Internet address.

    a.b.c
        When a three-part address is specified, the last part shall be
        interpreted as a 16-bit quantity and placed in the rightmost
        two bytes of the network address. This makes the three-part
        address format convenient for specifying Class B network
        addresses as "128.net.host".

    a.b
        When a two-part address is supplied, the last part shall be
        interpreted as a 24-bit quantity and placed in the rightmost
        three bytes of the network address. This makes the two-part
        address format convenient for specifying Class A network
        addresses as "net.host".

    a
        When only one part is given, the value shall be stored
        directly in the network address without any byte
        rearrangement.

    All numbers supplied as parts in IPv4 dotted decimal notation may
    be decimal, octal, or hexadecimal.

Frankly, I don't think we want to support this.  Classful network
addresses have gone the way of the dodo.  And the fact that it'd be
inconsistent with our traditional interpretation for some non-error
cases such as '127.1/16'::inet is really problematic.
Moreover, the option to allow octal input is a true disaster, not
least because there is plenty of code out there that is willing to
print IPv4 addresses with zero-padded *decimal* byte values.

So at this point I'm very unexcited about touching the behavior of
inet_in.  Maybe in another universe it would have acted differently,
but we have too many years of history with the current behavior.

I do take your point about the inet types helping to standardize
comparison behavior, but I think we should probably limit the text
to talking about IPv6 abbreviations.  Maybe like

    these types offer input error checking and specialized
    operators and functions (see <xref linkend="functions-net"/>).
+   They also simplify comparisons of inconsistently-written addresses,
+   such as abbreviated and unabbreviated IPv6 addresses.
   </para>

            regards, tom lane



Re: `inet` docs suggestion and possible bug report

From
Nathan Long
Date:
Sounds good. Thanks for researching this!

On Tue, Apr 29, 2025, 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Nathan Long <hello@nathanmlong.com> writes:
>> At least in the case of `inet`, another reason is for accurate comparison.
>> IPv4 and IPv6 both have shorthand textual representations; eg `127.1` =
>> `127.1.0.0`. Text storage would consider these unequal.

> I'm not sure how much we want to press that point, because AFAICS
> the code we use does not have the same abbreviation rules you are
> expecting.  Notably, it thinks '127.1' means 127.1.0.0.
> (We lifted this logic from BIND 20+ years ago, so while it might
> not entirely agree with practice elsewhere, it has a respectable
> pedigree and I'm hesitant to mess with it.)

I spent a little while researching this.  BIND stopped including the
relevant code at all sometime in the past 10 years, apparently feeling
that POSIX standardization means the libc versions of inet_pton()
behave sufficiently alike everywhere.  You can still find copies
of their code at, eg,

https://users.isc.org/~each/doxygen/bind9/inet__pton_8c-source.html

and there are also versions in the NetBSD source tree and probably
elsewhere.  As far as I can find, none of these will interpret '127.1'
as 127.0.0.1.  Some will reject it (which is what the POSIX spec for
the function says to do) and some will interpret it as 127.1.0.0.

Where 127.1 => 127.0.0.1 seems to come from is inet_addr (in POSIX)
and inet_aton (not in POSIX), which are legacy IPv4-only functions.
They say (quoting POSIX here):

    Values specified using IPv4 dotted decimal notation take one of
    the following forms:

    a.b.c.d
        When four parts are specified, each shall be interpreted as a
        byte of data and assigned, from left to right, to the four
        bytes of an Internet address.

    a.b.c
        When a three-part address is specified, the last part shall be
        interpreted as a 16-bit quantity and placed in the rightmost
        two bytes of the network address. This makes the three-part
        address format convenient for specifying Class B network
        addresses as "128.net.host".

    a.b
        When a two-part address is supplied, the last part shall be
        interpreted as a 24-bit quantity and placed in the rightmost
        three bytes of the network address. This makes the two-part
        address format convenient for specifying Class A network
        addresses as "net.host".

    a
        When only one part is given, the value shall be stored
        directly in the network address without any byte
        rearrangement.

    All numbers supplied as parts in IPv4 dotted decimal notation may
    be decimal, octal, or hexadecimal.

Frankly, I don't think we want to support this.  Classful network
addresses have gone the way of the dodo.  And the fact that it'd be
inconsistent with our traditional interpretation for some non-error
cases such as '127.1/16'::inet is really problematic.
Moreover, the option to allow octal input is a true disaster, not
least because there is plenty of code out there that is willing to
print IPv4 addresses with zero-padded *decimal* byte values.

So at this point I'm very unexcited about touching the behavior of
inet_in.  Maybe in another universe it would have acted differently,
but we have too many years of history with the current behavior.

I do take your point about the inet types helping to standardize
comparison behavior, but I think we should probably limit the text
to talking about IPv6 abbreviations.  Maybe like

    these types offer input error checking and specialized
    operators and functions (see <xref linkend="functions-net"/>).
+   They also simplify comparisons of inconsistently-written addresses,
+   such as abbreviated and unabbreviated IPv6 addresses.
   </para>

                        regards, tom lane