Thread: MACADDR parsing issues

MACADDR parsing issues

From
Matthijs Bomhoff
Date:
Hey,

While playing around with different MAC address notations supported by postgresql, I encountered the following:


db=> select '08002b-010203'::macaddr;
      macaddr
-------------------
 08:00:2b:01:02:03
(1 row)

db=> select '08002b-01023'::macaddr;
      macaddr
-------------------
 08:00:2b:01:02:03
(1 row)

db=> select '08002b-0123'::macaddr;
      macaddr
-------------------
 08:00:2b:00:12:03
(1 row)

db=> select '08002b-123'::macaddr;
ERROR:  invalid input syntax for type macaddr: "08002b-123"
LINE 1: select '08002b-123'::macaddr;
              ^
db=> select '08002b-1203'::macaddr;
ERROR:  invalid octet value in "macaddr" value: "08002b-1203"
LINE 1: select '08002b-1203'::macaddr;
              ^

(These particular results have been encountered on 8.4.4, but similar issues still seem to exist in the git head I
pulledlast week.) 

Note how for example '08002b-0123' is accepted as a valid MAC and is parsed as '08:00:2b:00:12:03' leading to two
additionalzeroes being added in different places. Furthermore, the last example actually matches the pattern for a MAC
withoutdelimiters, incorrectly parsing "-1" as one of the octets and thus resulting in an error about invalid octets
insteadof an invalid syntax. 

In case anyone is interested, I have attached a simple attempt at a patch for the MAC address parser that makes it a
bitmore strict in such a way that it still accepts the formats specified in the documentation, but rejects many other
"broken"addresses that are currently accepted with sometimes surprising results. The attached version also rejects MACs
containingadditional whitespace between the octets and separators etc. The patch probably still needs a bit of work to
makeit more in line with your coding style, as well as a decent review to make sure it doesn't break anything else, but
I'llleave that to those who know more about postgresql, MAC notations and sscanf :) 

I have also added a couple of additional test cases in the same diff, although the code could still use a few more for
possiblecorner cases etc. 

Regards,

Matthijs Bomhoff

Attachment

Re: MACADDR parsing issues

From
Tom Lane
Date:
Matthijs Bomhoff <matthijs@quarantainenet.nl> writes:
>  The attached version also rejects MACs containing additional
>  whitespace between the octets and separators etc.

I was under the impression that allowing whitespace there was a feature,
not a bug.  I'm not sure about the more general question of which
abbreviated MAC formats are or should be allowed, though.  Can you point
to any standards about that?  I'm disinclined to incur the inevitable
application-compatibility complaints from making the code reject things
it now accepts unless we have a pretty solid argument that "it now acts
more like RFC NNNN says it should".

            regards, tom lane

Re: MACADDR parsing issues

From
Matthijs Bomhoff
Date:
On Jun 6, 2011, at 4:31 PM, Tom Lane wrote:

> Matthijs Bomhoff <matthijs@quarantainenet.nl> writes:
>> The attached version also rejects MACs containing additional
>> whitespace between the octets and separators etc.
>=20
> I was under the impression that allowing whitespace there was a feature,
> not a bug.

Ah, I was not sure about that, that's why I explicitly mentioned it.

In my patch I disallowed whitespace on both sides of the separators, as "01=
:02:03:04:05: 06" is currently fine, but "01:02:03:04:05 :06" is not, so I =
thought this might simply have been an unintended consequence of using ssca=
nf. This could of course be changed in my patch.

>  I'm not sure about the more general question of which
> abbreviated MAC formats are or should be allowed, though.  Can you point
> to any standards about that?  I'm disinclined to incur the inevitable
> application-compatibility complaints from making the code reject things
> it now accepts unless we have a pretty solid argument that "it now acts
> more like RFC NNNN says it should".

According to the postgres documentation for the MACADDR data type, "The rem=
aining four input formats are not part of any standard.", and I haven't bee=
n able to find any evidence to the contrary regarding that during a quick s=
earch. I do think some network hardware vendors allow some abbreviated form=
ats, while others don't.  Although I am by no means an expert on this, that=
's why I mentioned someone with more knowledge of such notations should pro=
bably look at it.

I understand and agree that backwards-compatibility is a good thing, howeve=
r I was personally rather surprised to see this happen:

db=3D> select '08002b-0123'::macaddr;
     macaddr
-------------------
08:00:2b:00:12:03

I can't imagine anyone writing an application that counts on this behavior =
(turning "0123" into "001203"; I am inclined not to qualify that as abbrevi=
ation.), in particular as the following leads to an error:

db=3D> select '08002b-1203'::macaddr;
ERROR:  invalid octet value in "macaddr" value: "08002b-1203"
LINE 1: select '08002b-1203'::macaddr;
             ^

Anyway, I only wanted to point this out as some of the current behavior str=
uck me as slightly odd. And I figured it would be nice to add a possible pa=
tch and some additional tests to my email in order to help out a bit in cas=
e anyone thought it would be a good idea to change it. Do with it as you se=
e fit :)

Regards,

Matthijs Bomhoff