Re: Sequential vs. random values - number of pages in B-tree - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Sequential vs. random values - number of pages in B-tree
Date
Msg-id CA+bJJbywO9hknJyqK7x5kixyFwyWRwah1Wn801f4YP_VpdyhRA@mail.gmail.com
Whole thread Raw
In response to Re: Sequential vs. random values - number of pages in B-tree  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-general
On Fri, Aug 19, 2016 at 3:20 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
> There's a simple technique that works on top of a Feistel network,
> called the cycle-walking cipher. Described for instance at:
> http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
> I'm using the opportunity to add a wiki page:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
> with sample plgsql code for the [0..10,000,000] range that might be useful
> for other cases.

Nice reference, nice WikiPage, nice job. Bookmarking every thing.

> But for the btree fragmentation and final size issue, TBH I don't expect
> that constraining the values within that smaller range will make any
> difference
> in the tests, because it's the dispersion that matters, not the values
> themselves.

Neither do I, that is why I stated probabley good enough for tests,  but....

> I mean that, whether the values are well dispersed in the [0..1e7] range or
> equally well dispersed in the [0..2**32] range, the probability of a newly
> inserted value to compare greater or lower to any previous values of the list
> should be the same, so shouldn't the page splits be the same, statistically
> speaking?

I know some btrees do prefix-coding of the keys, and some do it even
with long integers, and some others do delta-coding for integer keys.
I seem to recall postgres does not, that's whay I do not think it will
make a difference.

But anyway, to compare two things like that, as the original poster
was doing, I normally prefer to test just one thing at a time, that's
why I would normally try to do it by writing a sorted file, shuffling
it with sort -R, and copying it, server side if posible, to eliminate
so both

Francisco Olarte.


pgsql-general by date:

Previous
From: Victor Blomqvist
Date:
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Next
From: Francisco Olarte
Date:
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans