Re: Sequential vs. random values - number of pages in B-tree - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Sequential vs. random values - number of pages in B-tree
Date
Msg-id CA+bJJbzHT4SSzE2MTw=1d_u1TFo12w5GzRnbuJR0BjH3eHPrLw@mail.gmail.com
Whole thread Raw
In response to Sequential vs. random values - number of pages in B-tree  (pinker <pinker@onet.eu>)
Responses Re: Sequential vs. random values - number of pages in B-tree  ("Daniel Verite" <daniel@manitou-mail.org>)
Re: Sequential vs. random values - number of pages in B-tree  (pinker <pinker@onet.eu>)
List pgsql-general
Hi:

On Thu, Aug 18, 2016 at 1:32 PM, pinker <pinker@onet.eu> wrote:
...
> create table t01 (id bigint);
> create index i01 on t01(id);
> insert into t01 SELECT s from generate_series(1,10000000) as s;
>
> and random values:
> create table t02 (id bigint);
> create index i02 on t02(id);
> insert into t02 SELECT random()*100 from generate_series(1,10000000) as s;

It's already been told that btrees work that way, if you find it
strange read a bit about them, this is completely normal, but ...

... what I come to point is your test is severely flawed. It probably
does not matter in this case, but you are inserting 10M DIFFERENT
VALUES in the first case and only 100 in the second one, which an
average of 100K DUPLICATES of each. This affects btrees too. You could
try using random*1G, or at least 100M, for a better test ( which may
have even worse behaviour, ideally I would just write 10M integers to
a disk file, then shuffle it and compare COPY FROM times from both ) (
unless you know of an easy way to generate a random permutation on the
fly without using a lot of memory, I do not ).

Francisco Olarte.


pgsql-general by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: SQL help - multiple aggregates
Next
From: "Daniel Verite"
Date:
Subject: Re: Sequential vs. random values - number of pages in B-tree