Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: B-Tree support function number 3 (strxfrm() optimization)
Date
Msg-id CAGTBQpY4Qbunj+kYc5hRin3jWP4uovmTbgcKW-VY0LKxH9Ggxg@mail.gmail.com
In response to Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Mon, Jul 14, 2014 at 2:53 PM, Peter Geoghegan <pg@heroku.com> wrote:
> My concern is that it won't be worth it to do the extra work,
> particularly given that I already have 8 bytes to work with. Supposing
> I only had 4 bytes to work with (as researchers writing [2] may have
> only had in 1994), that would leave me with a relatively small number
> of distinct normalized keys in many representative cases. For example,
> I'd have a mere 40,665 distinct normalized keys in the case of my
> "cities" database, rather than 243,782 (out of a set of 317,102 rows)
> for 8 bytes of storage. But if I double that to 16 bytes (which might
> be taken as a proxy for what a good compression scheme could get me),
> I only get a modest improvement - 273,795 distinct keys. To be fair,
> that's in no small part because there are only 275,330 distinct city
> names overall (and so most dups get away with a cheap memcmp() on
> their tie-breaker), but this is a reasonably organic, representative
> dataset.


Are those numbers measured with Mac OS X's strxfrm()?

That was the one with suboptimal entropy in the first 8 bytes.

