Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Date
Msg-id CAH2-WznyizQ1B3g0L8CyN7r=GmO_7Ft=SUYWUYCoYyesJ6v_mQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Alik Khilazhev <a.khilazhev@postgrespro.ru>)
Responses Re: [HACKERS] [WIP] Zipfian distribution in pgbench
List pgsql-hackers
On Fri, Jul 21, 2017 at 4:51 AM, Alik Khilazhev
<a.khilazhev@postgrespro.ru> wrote:
> (Latest version of pgbench Zipfian patch)

While I'm +1 on this idea, I think that it would also be nice if there
was an option to make functions like random_zipfian() actually return
a value that has undergone perfect hashing. When this option is used,
any given value that the function returns would actually be taken from
a random mapping to some other value in the same range. So, you can
potentially get a Zipfian distribution without the locality.

While I think that Zipfian distributions are common, it's probably
less common for the popular items to be clustered together within the
primary key, unless the PK has multiple attributes, and the first is
the "popular attribute". For example, there would definitely be a lot
of interest among contiguous items within an index on "(artist,
album)" where "artist" is a popular artist, which is certainly quite
possible. But, if "album" is the primary key, and it's a SERIAL PK,
then there will only be weak temporal locality within the PK, and only
because today's fads are more popular than the fads from previous
years.

Just another idea, that doesn't have to hold this patch up.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: [HACKERS] Draft release notes up for review
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Subscription code improvements