Re: [HACKERS] [PATCH] SortSupport for macaddr type - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [PATCH] SortSupport for macaddr type
Date
Msg-id CAH2-WznDkX59FmxD_rRh3orzfk-5mG4AiByvsz4VvCQSyyeKjw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] SortSupport for macaddr type  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [HACKERS] [PATCH] SortSupport for macaddr type  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Mar 14, 2017 at 3:25 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Looks straightforward at a quick read-through. I have just a couple of
> questions. How much of the performance gain comes from avoiding the
> FunctionCallInvoke overhead, by simply having SortSupport with a comparison
> function, and how much comes from the "abbreviation"?

I assume it's very similar to the existing performance characteristics
that UUID SortSupport has.

> Also, is it worth the cycles and code to have the hyperloglog estimator, and
> aborting the abbreviation if there are only a few distinct values. Or put
> another way, how much does the abbreviation slow things down in the worst
> case?

Postgres doesn't support passing 6 byte types by-value due to a lack
of support from macros in places like postgres.h. There is a
GET_4_BYTES(), and a GET_8_BYTES(), but no GET_6_BYTES(). It doesn't
really seem worth adding those, just like it didn't seem worth it back
when I pointed out that a TID sort is very slow in the context of
CREATE INDEX CONCURRENTLY. (We ended up coming up with an ad-hoc int8
encoding scheme there, which allowed us to use the int8 default
opclass rather than the TID default opclass, you'll recall.)

It's a bit odd that macaddr abbreviated keys are actually always
smaller (not including padding) than sizeof(Datum) on 64-bit machines,
but this guarantees that abbreviated comparisons will reliably
indicate what an authoritative comparison would indicate, since of
course all relevant information is right there in the abbreviated key.
This will be the only time where that's generally true with
abbreviated keys for any type. This seems fine to me, especially
because it lets us compare macaddr using simple 3-way unsigned int
comparisons, which isn't otherwise the case.

I doubt that you'll ever see a real need to abort abbreviation.
Arguably it shouldn't be considered, but maybe it's worth being
consistent. It might matter on 32-bit platforms, where there is only a
single NIC-specific byte to differentiate MAC addresses, but even that
seems like a stretch. With a type like macaddr, where there is no
major cost like strxfrm() to worry about when abbreviating, the only
cost of abbreviating is that tie-breakers will have an extra
index_getattr() or similar. If there was no abbreviation, or if it
aborted, then the index_getattr() can happen once per SortTuple,
up-front.

Nitpick: the patch should probably not refer to 32-bit or 64-bit
systems. Rather, it should refer to Datum size only.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [HACKERS] free space map and visibility map
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] mat views stats