Re: macaddr 64 bit (EUI-64) datatype support - Mailing list pgsql-hackers
From | Vitaly Burovoy |
---|---|
Subject | Re: macaddr 64 bit (EUI-64) datatype support |
Date | |
Msg-id | CAKOSWNmCjBBYTW9rOsD41N9vJoTwAL6MLnK2twSU-cSSva4k7g@mail.gmail.com |
In response to | Re: macaddr 64 bit (EUI-64) datatype support (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: macaddr 64 bit (EUI-64) datatype support |
List | pgsql-hackers |
On 10/12/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>>>> P.S.: I still think it is a good idea to change storage format,
>>> I'm not sure which part of "no" you didn't understand,

I just paid attention to the words "likelihood" (mixed up with "likeliness"), "we wanted" and "probably". There was also a note that it "would also break send/recv", whose behavior can be preserved. And after your letter Julien Rouhaud wrote about mapping from MAC-48 to EUI-64, which means there is no bit indicating the version of a stored value. That can be considered new information.

>>> but we're
>>> not breaking on-disk compatibility of existing macaddr columns.

Can I ask why? It will not be a varlen type (typstorage will not be changed); it just changes typlen to 8 and typalign to 'd'. For every major release 9.0, 9.1, 9.2 .. 9.6 the docs say "A dump/restore using pg_dumpall, or use of pg_upgrade, is required". Both handle changes in a storage format. Don't they?

>>> Breaking the on-the-wire binary I/O representation seems like a
>>> nonstarter as well.

I wrote that for EUI-48 (MAC-48) values the binary I/O representation can be preserved. The binary format (in the DataRow message) carries the length of the column value, which is reflected in PGresAttValue.len in libpq. If the client works with the binary format, it must consult the length of the data. But as long as the client works with (and the columns hold) only MAC-48 values, nothing hurts it and PGresAttValue.len is "6" as it is now.

>> I think the suggestion was to rename macaddr to macaddr6 or similar,
>> keeping the existing behavior and the current OID. So existing columns
>> would continue to work fine and maintain on-disk compatibility, but any
>> newly created columns would become the 8-byte variant.
>
> ... and would have different I/O behavior from before, possibly breaking
> applications that expect "macaddr" to mean what it used to.
> I'm still dubious that that's a good idea.

Only if the new type sends xx:xx:xx:FF:FF:xx:xx:xx instead of the usual (expected) 6 octets. Again, with my offer that case is similar (in I/O behavior) to "just change 'macaddr' to store and accept both MAC-48 and MAC-64", but it allows using the "-k" option of pg_upgrade to avoid rewriting possibly huge (for instance, 'log') tables (though after the upgrade users unexpectedly get "macaddr6" in their column types and function names, which looks rather strange).

> The larger picture here is that we got very little thanks when we squeezed
> IPv6 into the pre-existing inet datatype;

Without any sarcasm, I am very thankful to everyone involved in it, because it spares me (and many other people) from distinguishing IPv4 and IPv6 at the application level. I write apps and just save the remote address of clients to an "inet" column named "remote_ip" without thinking "what if we start serving clients via IPv6?"; or I have a column named "allowed_ip" with IPs or subnets and just save the client's IPv4 or IPv6 address as a white list (and use "allowed_ip >>= $1"). It just works.

> there's a large number of people
> who just said "no thanks" and started using the add-on ip4r type instead.

I found a repository[1] on GitHub. From the description it is understandable why people used ip4r in those days (2005). The reason "Furthermore, they are variable length types (to support ipv6) with non-trivial overheads" is mentioned last in its README. When you deal with IPv4 in 99.999% of cases, storing it in TOAST tables leads to a big penalty, but the new version of macaddr is not that wide, so it does not lead to a similar slowdown (it will be stored in place).

> So I'm not sure why we want to complicate our lives in order to make
> macaddr follow the same path.
Because according to the Wiki[3] MAC addresses now "are formed according to the rules of one of three numbering name spaces ...: MAC-48, EUI-48, and EUI-64.", so the IEEE has extended the range of allowed values from 48 to 64 bits, and since Postgres claims to support "mac addresses", I (as a developer who still uses PG as a primary database) expect it to support any kind of MAC address, not a limited one. I expect it to just work.

I refuse to imagine what I would have to do if I have a column of type "macaddr" and unexpectedly have to deal with EUI-64 input. Add a new column or change the column's type?

In the first case, what to do with stored procedures? Duplicate the input parameter to pass the new macaddr8 column (if macaddr was passed before)? Duplicate the stored procedure? I also have to support two columns at the application level. Why? I just want to store it in the DB, work with it there and get it back!

In the second case (if output is not mapped to MAC-48 when that is possible) I have the same troubles you wrote about (OID, I/O and text representation, at least for output, maybe also for input). Moreover, I still have to rewrite tables, and not when I'm ready for it (at the migration from one major version to another) but when the task appears.

===
I see no types (besides integers, floats and the types related to them: their ranges and arrays) whose names contain numbers indicating their capacity:

postgres=# select typname from pg_type where typname ~ '[0-9]' and typname not like 'pg_toast_%';
   typname
-------------
 int8
 int2
 int2vector
 int4
 float4
 float8
 _int2
 _int2vector
 _int4
 _int8
 _float4
 _float8
 int4range
 _int4range
 int8range
 _int8range
(16 rows)

So why should we have the name "macaddr" without a capacity number and (unexpectedly) macaddr8 (when a different number appears in the official name "EUI-64")?
===
I offer a change where the current behavior does not change for MAC-48 values at all (for textual and binary I/O), the internal representation is always 64 bits long, and input and output are mapped from (and, when possible, to) MAC-48, for seamless usage of a "mac address" concept.

P.S.: Note that the current version[2] of ip4r has an "ipaddress" type for both IPv4 and IPv6, just like "inet". We'll end up having a single type for both MAC-48 and MAC-64. Why not do it immediately (without intermediate types)? As time passes, more and more hardware has EUI-64, just as more and more clients have IPv6.

P.P.S.: I played around with the length of a value in the binary format (in a client and in "macaddr_recv"). It is possible to distinguish MAC-48 from EUI-64 inputs in "macaddr_recv", so no changes are necessary on the client side as long as it works with the MAC-48 format only.

[1] https://github.com/petere/ip4r-cvs
[2] https://github.com/RhodiumToad/ip4r
[3] https://en.wikipedia.org/wiki/MAC_address

--
Best regards,
Vitaly Burovoy
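
A rough illustration of the idea in the message above (always store 64 bits, tell MAC-48 from EUI-64 by the length of the binary value, and map back to 6 octets on output when possible) might look like the following C sketch. It is not the patch under discussion and does not use PostgreSQL's internal APIs: the type `mac64` and the functions `mac64_from_mac48`, `mac64_is_mac48`, `mac64_recv` and `mac64_send` are hypothetical names invented for this example, and the FF:FF filler is simply taken from the "xx:xx:xx:FF:FF:xx:xx:xx" form quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed 8-byte internal form; not PostgreSQL's own struct. */
typedef struct
{
    uint8_t b[8];
} mac64;

/* Filler octets, assumed from the "xx:xx:xx:FF:FF:xx:xx:xx" form in the message. */
enum { FILL_HI = 0xFF, FILL_LO = 0xFF };

/* Widen a 6-octet MAC-48 value by inserting the filler after the first 3 octets. */
mac64 mac64_from_mac48(const uint8_t in[6])
{
    mac64 m;

    assert(in != NULL);
    memcpy(m.b, in, 3);
    m.b[3] = FILL_HI;
    m.b[4] = FILL_LO;
    memcpy(m.b + 5, in + 3, 3);
    return m;
}

/* True when the stored value carries the filler and can be narrowed to 6 octets. */
bool mac64_is_mac48(const mac64 *m)
{
    return m->b[3] == FILL_HI && m->b[4] == FILL_LO;
}

/* Binary input: the payload length alone tells the two formats apart.
 * Returns false for any length other than 6 or 8. */
bool mac64_recv(const uint8_t *buf, size_t len, mac64 *out)
{
    if (len == 6)
    {
        *out = mac64_from_mac48(buf);
        return true;
    }
    if (len == 8)
    {
        memcpy(out->b, buf, 8);
        return true;
    }
    return false;
}

/* Binary output: emit 6 octets when the value maps back to MAC-48, else 8.
 * Returns the number of bytes written to out. */
size_t mac64_send(const mac64 *m, uint8_t out[8])
{
    if (mac64_is_mac48(m))
    {
        memcpy(out, m->b, 3);
        memcpy(out + 3, m->b + 5, 3);
        return 6;
    }
    memcpy(out, m->b, 8);
    return 8;
}
```

With this scheme a client that only ever sends and receives 6-byte values sees no difference. The price, as noted earlier in the message, is that no bit records which form a value was entered in: an 8-byte input that happens to contain the filler octets would come back as 6 bytes.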