Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: B-Tree support function number 3 (strxfrm() optimization)
Date
Msg-id CAGTBQpY4Qbunj+kYc5hRin3jWP4uovmTbgcKW-VY0LKxH9Ggxg@mail.gmail.com
In response to Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Mon, Jul 14, 2014 at 2:53 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My concern is that it won't be worth it to do the extra work,
> particularly given that I already have 8 bytes to work with. Supposing
> I only had 4 bytes to work with (as researchers writing [2] may have
> only had in 1994), that would leave me with a relatively small number
> of distinct normalized keys in many representative cases. For example,
> I'd have a mere 40,665 distinct normalized keys in the case of my
> "cities" database, rather than 243,782 (out of a set of 317,102 rows)
> for 8 bytes of storage. But if I double that to 16 bytes (which might
> be taken as a proxy for what a good compression scheme could get me),
> I only get a modest improvement - 273,795 distinct keys. To be fair,
> that's in no small part because there are only 275,330 distinct city
> names overall (and so most dups get away with a cheap memcmp() on
> their tie-breaker), but this is a reasonably organic, representative
> dataset.


Are those numbers measured with Mac OS X's strxfrm()?

That was the one with suboptimal entropy in the first 8 bytes.

