Re: Sequential vs. random values - number of pages in B-tree - Mailing list pgsql-general

From Daniel Verite
Subject Re: Sequential vs. random values - number of pages in B-tree
Date
Msg-id 5d6320bd-0268-4e8b-9cd1-159299cc03a3@mm
Whole thread Raw
In response to Re: Sequential vs. random values - number of pages in B-tree  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: Sequential vs. random values - number of pages in B-tree  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
    Francisco Olarte wrote:

> I think there are some pseudo-random number generators which
> can be made to work with any range, but do not recall which ones right
> now.

There's a simple technique that works on top of a Feistel network,
called the cycle-walking cipher, described for instance at:
http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf

I'm taking the opportunity to add a wiki page:
https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
with sample PL/pgSQL code for the [0..10,000,000] range that might be useful
for other cases.
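To illustrate the idea (this is not the wiki's PL/pgSQL code, just a hypothetical Python sketch with a toy 16-bit Feistel permutation and an arbitrary round function): cycle-walking re-applies the permutation until the output lands inside the target range, which yields a bijection on that range.

```python
def feistel16(x, rounds=4):
    """Toy balanced Feistel permutation on 16-bit values.
    The Feistel structure is invertible for any round function,
    so this is a bijection on [0, 2**16)."""
    l, r = x >> 8, x & 0xFF
    for i in range(rounds):
        f = (r * 157 + i * 97) & 0xFF   # arbitrary deterministic mixing
        l, r = r, l ^ f
    return (l << 8) | r

def pseudo_encrypt_in_range(x, limit):
    """Cycle-walking: keep applying the permutation until the result
    falls in [0, limit).  The walk terminates because the cycle
    containing x eventually returns to x (which is < limit), and the
    resulting map is itself a bijection on [0, limit)."""
    assert 0 <= x < limit <= 2**16
    y = feistel16(x)
    while y >= limit:
        y = feistel16(y)
    return y
```

With a real 32-bit (or wider) Feistel network the same walk applies; the expected number of extra iterations per call is roughly domain_size / limit.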

But for the btree fragmentation and final size issue, TBH I don't expect
that constraining the values to that smaller range will make any
difference in the tests, because it's the dispersion that matters, not
the values themselves.

I mean that, whether the values are well dispersed in the [0..1e7] range or
equally well dispersed in the [0..2**32] range, the probability of a newly
inserted value comparing greater or lower than any previous value in the list
should be the same, so shouldn't the page splits be the same, statistically
speaking?
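A quick sanity check of that argument (an illustrative Python simulation, not something from the thread): for uniform draws, the insertion rank of a newcomer among k existing values is uniform on {0..k} regardless of the range the values are drawn from, so the comparison statistics driving page splits should not depend on the range width.

```python
import random

def insertion_rank_counts(upper, n_trials=2000, k=50, seed=1):
    """For each trial, draw k uniform values from [0, upper), then one
    more, and record how many of the k are smaller than the newcomer
    (its insertion rank).  For uniform draws the rank distribution is
    flat over {0, ..., k}, whatever 'upper' is."""
    rng = random.Random(seed)
    counts = [0] * (k + 1)
    for _ in range(n_trials):
        existing = [rng.randrange(upper) for _ in range(k)]
        newcomer = rng.randrange(upper)
        counts[sum(v < newcomer for v in existing)] += 1
    return counts
```

Running it for upper = 10**7 and upper = 2**32 should give rank histograms that are statistically indistinguishable, which is the intuition behind expecting the same page-split behaviour in both cases.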

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

