Re: macaddr 64 bit (EUI-64) datatype support - Mailing list pgsql-hackers

From Vitaly Burovoy
Subject Re: macaddr 64 bit (EUI-64) datatype support
Msg-id CAKOSWNmCjBBYTW9rOsD41N9vJoTwAL6MLnK2twSU-cSSva4k7g@mail.gmail.com
In response to Re: macaddr 64 bit (EUI-64) datatype support  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: macaddr 64 bit (EUI-64) datatype support
List pgsql-hackers
On 10/12/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>>>> P.S.: I still think it is a good idea to change storage format,

>>> I'm not sure which part of "no" you didn't understand,

I was just paying attention to the words "likelihood" (which I mixed
up with "likeliness"), "we wanted" and "probably".
There was also a note that it "would also break send/recv", but that
behavior can be preserved.
And after your message Julien Rouhaud wrote about mapping MAC-48 to
EUI-64, which leads to the absence of a bit indicating the version of
a stored value. That can be considered new information.

>>> but we're
>>> not breaking on-disk compatibility of existing macaddr columns.

Can I ask why? It will not be a varlen type (typstorage will not be
changed); it just changes typlen to 8 and typalign to 'd'.
For every major release 9.0, 9.1, 9.2 .. 9.6 the docs say "A
dump/restore using pg_dumpall, or use of pg_upgrade, is required".
Both handle changes in the storage format, don't they?

>>> Breaking the on-the-wire binary I/O representation seems like a
>>> nonstarter as well.

I wrote that the binary I/O representation can be preserved for
EUI-48 (MAC-48) values.
The binary format (in the DataRow message) carries the length of each
column value, which is reflected in PGresAttValue.len in libpq.
A client that works with the binary format must consult the length of
the data.
But as long as the client works with (and the columns hold) only
MAC-48 values, nothing hurts it, and PGresAttValue.len is still 6, as
it is now.
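
As a rough illustration of that length check (a sketch only, not libpq
code): the Python function below stands in for a client reading a
binary-format column value. The name decode_macaddr_binary is
hypothetical, and the 8-byte branch assumes the extended type would
simply send all eight octets for values that are not MAC-48.

```python
def decode_macaddr_binary(value: bytes) -> str:
    """Format a binary column value, relying only on its length.

    6 bytes is what macaddr sends today; 8 bytes is the assumed
    length for a value that cannot be expressed as MAC-48.
    """
    if len(value) not in (6, 8):
        raise ValueError(f"unexpected macaddr length: {len(value)}")
    # Same colon-separated hex form for both lengths.
    return ":".join(f"{octet:02x}" for octet in value)


# A client that only ever sees MAC-48 values keeps getting 6 bytes.
print(decode_macaddr_binary(bytes.fromhex("08002b010203")))
# prints 08:00:2b:01:02:03
```

A client that hard-codes 6 bytes keeps working as long as the column
holds only MAC-48 values; it only needs the second branch once 8-byte
values show up.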

>> I think the suggestion was to rename macaddr to macaddr6 or similar,
>> keeping the existing behavior and the current OID.  So existing columns
>> would continue to work fine and maintain on-disk compatibility, but any
>> newly created columns would become the 8-byte variant.
>
> ... and would have different I/O behavior from before, possibly breaking
> applications that expect "macaddr" to mean what it used to.  I'm still
> dubious that that's a good idea.

Only if the new type sends xx:xx:xx:FF:FF:xx:xx:xx instead of the
usual (expected) 6 octets.
Again, that case is similar (in I/O behavior) to my proposal to "just
change 'macaddr' to keep and accept both MAC-48 and MAC-64", but it
allows using the "-k" option of pg_upgrade to avoid rewriting possibly
huge (for instance, 'log') tables. On the other hand, users would
unexpectedly get "macaddr6" in their columns and function names after
the upgrade, which looks rather strange.

> The larger picture here is that we got very little thanks when we squeezed
> IPv6 into the pre-existing inet datatype;

Without any sarcasm, I thank everyone involved in that very much,
because it saves me (and many other people) from distinguishing IPv4
and IPv6 at the application level.
I write apps and just save the remote address of clients to an "inet"
column named "remote_ip" without thinking "what if we start serving
clients via IPv6?"; or I have a column named "allowed_ip" with IPs or
subnets and just save a client's IPv4 or IPv6 address as a whitelist
entry (and use "allowed_ip >>= $1"). It just works.

> there's a large number of people
> who just said "no thanks" and started using the add-on ip4r type instead.

I found a repository[1] on GitHub. From its description it is
understandable why people used ip4r in those days (2005). The reason
"Furthermore, they are variable length types (to support ipv6) with
non-trivial overheads" is the last one mentioned in its README.
When 99.999% of what you deal with is IPv4, storing it in TOAST tables
leads to a big penalty, but the new version of macaddr is not that
wide, so it does not lead to a similar slowdown (it will be stored in
place).

> So I'm not sure why we want to complicate our lives in order to make
> macaddr follow the same path.

Because according to Wikipedia[3] MAC addresses now "are formed
according to the rules of one of three numbering name spaces ...:
MAC-48, EUI-48, and EUI-64." So the IEEE has extended the range of
allowed values from 48 to 64 bits, and since Postgres claims to
support "MAC addresses", I (as a developer who still uses PG as a
primary database) expect support for any kind of MAC address, not a
limited one. I expect it to just work.

I refuse to imagine what I would have to do if I had a column of type
"macaddr" and unexpectedly had to deal with EUI-64 input. Add a new
column, or change the column's type?

In the first case, what do I do with stored procedures? Duplicate the
input parameter to pass the new macaddr8 column (if macaddr was passed
before)? Duplicate the stored procedure?
I would also have to support two columns at the application level.
Why? I just want to store it in the DB, work with it there and get it
back!

In the second case (if the output is not mapped to MAC-48 when that is
possible) I have the same troubles you described (OID, I/O and text
representation, at least for output and maybe also for input).
Moreover, I still have to rewrite tables, and not when I am ready for
it (while migrating from one major version to another) but whenever
the task comes up.

===
I see no types (besides integers, floats and the types related to
them: their ranges and arrays) whose names contain a number indicating
their capacity:

postgres=# select typname from pg_type where typname ~ '[0-9]' and
typname not like 'pg_toast_%';
   typname
-------------
 int8
 int2
 int2vector
 int4
 float4
 float8
 _int2
 _int2vector
 _int4
 _int8
 _float4
 _float8
 int4range
 _int4range
 int8range
 _int8range
(16 rows)

So why should we have the name "macaddr" without a capacity number and
(unexpectedly) "macaddr8" (when a different number appears in the
official name "EUI-64")?

===
I propose a change in which the current behavior for MAC-48 values
does not change at all (for both textual and binary I/O), the internal
representation is always 64 bits long, and input and output are mapped
from (and, when possible, to) MAC-48, for seamless use of the "MAC
address" concept.
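
To make the idea concrete, here is a minimal Python sketch of such a
mapping for textual I/O. It is only an illustration, not the patch:
the names MARKER, parse_text and format_text are hypothetical, and the
FF:FF filler is taken from the xx:xx:xx:FF:FF:xx:xx:xx form mentioned
earlier in this message (an implementation could choose another
filler).

```python
# Assumption: the two filler octets come from xx:xx:xx:FF:FF:xx:xx:xx.
MARKER = b"\xff\xff"


def parse_text(text: str) -> bytes:
    """Parse 6 or 8 colon-separated hex octets into the 8-byte stored form."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) == 6:
        # MAC-48 input: widen it by inserting the filler in the middle.
        return raw[:3] + MARKER + raw[3:]
    if len(raw) == 8:
        return raw
    raise ValueError(f"expected 6 or 8 octets, got {len(raw)}")


def format_text(stored: bytes) -> str:
    """Print the short 6-octet form whenever the stored value allows it."""
    if len(stored) != 8:
        raise ValueError("stored value must be exactly 8 bytes")
    if stored[3:5] == MARKER:
        visible = stored[:3] + stored[5:]
    else:
        visible = stored
    return ":".join(f"{octet:02x}" for octet in visible)


print(format_text(parse_text("08:00:2b:01:02:03")))
# prints 08:00:2b:01:02:03
print(format_text(parse_text("08:00:2b:aa:bb:01:02:03")))
# prints 08:00:2b:aa:bb:01:02:03
```

In this sketch an 8-octet input whose middle octets already equal the
filler also comes back in the 6-octet form, because no separate bit
records which form was entered.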


P.S.: Note that the current version[2] of ip4r has an "ipaddress"
type for both IPv4 and IPv6, just as "inet" does. We'll end up having
a single type for both MAC-48 and MAC-64. Why not do it right away
(without intermediate types)?
As time passes, more and more hardware has EUI-64 addresses, just as
more and more clients have IPv6.

P.P.S.: I experimented with the length of a value in the binary format
(in a client and in "macaddr_recv"). It is possible to distinguish
MAC-48 from EUI-64 input in "macaddr_recv", so no changes are
necessary on the client side as long as it works with the MAC-48
format only.


[1] https://github.com/petere/ip4r-cvs
[2] https://github.com/RhodiumToad/ip4r
[3] https://en.wikipedia.org/wiki/MAC_address

-- 
Best regards,
Vitaly Burovoy


