Re: [HACKERS] [PATCH] SortSupport for macaddr type - Mailing list pgsql-hackers

From Brandur Leach
Subject Re: [HACKERS] [PATCH] SortSupport for macaddr type
Date
Msg-id CABR_9B_9HsZQ+QUb1_nR4WS_BHGcds7eV4HBJCRPLZ-r0WJM3A@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] SortSupport for macaddr type  (Julien Rouhaud <julien.rouhaud@dalibo.com>)
Responses Re: [HACKERS] [PATCH] SortSupport for macaddr type  (Brandur Leach <brandur@mutelight.org>)
Re: [HACKERS] [PATCH] SortSupport for macaddr type  (Julien Rouhaud <julien.rouhaud@dalibo.com>)
List pgsql-hackers
Hi Julien,

Thank you for taking the time to do this review, and my
apologies for the very delayed response. I lost track of
this work and have only jumped back into it today.

Please find attached a new version of the patch with your
feedback integrated. I've also rebased the patch against
the current master and selected a new OID because my old
one is now in use.

> * you used macaddr_cmp_internal() function name, for uuid
>   the same function is named uuid_internal_cmp().  Using
>   the same naming pattern is probably better.

I was a little split on this one! It's true that UUID uses
`_internal_cmp`, but `_cmp_internal` is also used in a
number of places like `enum`, `timetz`, and `network`. I
don't have a strong feeling about it either way, so I've
changed it to `_internal_cmp` to match UUID as you
suggested.

> * the function comment on macaddr_abbrev_convert()
>   doesn't mention specific little-endian handling

I tried to bake this into the comment text. Here are the
relevant lines of the amended version:

    * Packs the bytes of a 6-byte MAC address into a Datum and treats it as an
    * unsigned integer for purposes of comparison. On a 64-bit machine, there
    * will be two zeroed bytes of padding. The integer is converted to native
    * endianness to facilitate easy comparison.

> * "There will be two bytes of zero padding on the least
>   significant end"
> "least significant bits" would be better

Also done. Here is the amended version:

    * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of
    * the MAC address in. There will be two bytes of zero padding on the end
    * of the least significant bits.

> * This patch will trigger quite a lot modifications after
>   a pgindent run.  Could you try to run pgindent on mac.c
>   before sending an updated patch?

Good call! I've run the new version through pgindent.

Let me know if you have any further feedback and/or
suggestions. Thanks!

Brandur


On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
On 26/08/2016 19:44, Brandur wrote:
> Hello,
>

Hello,

> I've attached a patch to add SortSupport for Postgres' macaddr which has the
> effect of improving the performance of sorting operations for the type. The
> strategy that I employ is very similar to that for UUID, which is to create
> abbreviated keys by packing as many bytes from the MAC address as possible into
> Datums, and then performing fast unsigned integer comparisons while sorting.
>
> I ran some informal local benchmarks, and for cardinality greater than 100k
> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
> interested, I put a few more numbers into a small report here [2].)
>

That's a nice improvement!

> Admittedly, this is not quite as useful as speeding up sorting on a more common
> data type like TEXT or UUID, but the change still seems like a useful
> performance improvement. I largely wrote it as an exercise to familiarize
> myself with the Postgres codebase.
>
> I'll add an entry into the current commitfest as suggested by the Postgres Wiki
> and follow up here with a link.
>
> Thanks, and if anyone has feedback or other thoughts, let me know!
>

I just reviewed your patch.  It applies and compiles cleanly, and the
abbrev feature works as intended.  There's not much to say since this is
heavily inspired on the uuid SortSupport. The only really specific part
is in the abbrev_converter function, and I don't see any issue with it.

I have a few trivial comments:

* you used macaddr_cmp_internal() function name, for uuid the same
function is named uuid_internal_cmp().  Using the same naming pattern is
probably better.

* the function comment on macaddr_abbrev_convert() doesn't mention
specific little-endian handling

* "There will be two bytes of zero padding on the least significant end"

"least significant bits" would be better

* This patch will trigger quite a lot modifications after a pgindent
run.  Could you try to run pgindent on mac.c before sending an updated
patch?

Best regards.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Next
From: David Fetter
Date:
Subject: Re: [HACKERS] 3D Z-curve spatial index