gistchoose vs. bloat - Mailing list pgsql-hackers

From Alexander Korotkov
Subject gistchoose vs. bloat
Date
Msg-id CAPpHfduhRjJ3WQ3F2yac+UZ98XFoGW4iXfontLZYkBC+dqpFFQ@mail.gmail.com
Whole thread Raw
Responses Re: gistchoose vs. bloat
List pgsql-hackers
Hackers,

While experimenting with gistchoose I achieve interesting results about relation of gistchoose behaviour and gist index bloat.
I've created following testcase for reproducing gist index bloating:
1) Create test table with 1000000 points from geonames
# create table geotest (id serial, point point);
# insert into geotest (point) (select * from geonames order by random() limit 1000000);
2) Create gist index and statistics table. Insert initial statistics (rows count, index size) into table.
# create index geotest_idx on geotest using gist (point) with (buffering = off);
# create table geotest_stats (id serial, count bigint, size bigint);
# insert into geotest_stats (count, size) values ((select count(*) from geotest), pg_relation_size('geotest_idx'));
3) Delete 10% of geotest and vacuum it.
delete from geotest where id in (select id from geotest order by random() limit 100000);
vacuum geotest;
4) Insert new 100000 points fromg eonames.
insert into geotest (point) (select * from geonames order by random() limit 100000);
5) Insert current statistics (rows count, index size) into table.
insert into geotest_stats (count, size) values ((select count(*) from geotest), pg_relation_size('geotest_idx'));
6) Repeat 3-5 steps 50 times.

I get following results with current gistchoose implementation:
test=# select * from geotest1_stats;
 id |  count  |   size    
----+---------+-----------
  1 | 1000000 |  75767808
  2 | 1000000 |  76046336
  3 | 1000000 |  76840960
  4 | 1000000 |  77799424
  5 | 1000000 |  78766080
  6 | 1000000 |  79757312
  7 | 1000000 |  80494592
  8 | 1000000 |  81125376
  9 | 1000000 |  81985536
 10 | 1000000 |  82804736
 11 | 1000000 |  83378176
 12 | 1000000 |  84115456
 13 | 1000000 |  84819968
 14 | 1000000 |  85598208
 15 | 1000000 |  86302720
 16 | 1000000 |  87023616
 17 | 1000000 |  87703552
 18 | 1000000 |  88342528
 19 | 1000000 |  88915968
 20 | 1000000 |  89513984
 21 | 1000000 |  90152960
 22 | 1000000 |  90742784
 23 | 1000000 |  91324416
 24 | 1000000 |  91742208
 25 | 1000000 |  92258304
 26 | 1000000 |  92758016
 27 | 1000000 |  93241344
 28 | 1000000 |  93683712
 29 | 1000000 |  93970432
 30 | 1000000 |  94396416
 31 | 1000000 |  94740480
 32 | 1000000 |  95068160
 33 | 1000000 |  95444992
 34 | 1000000 |  95780864
 35 | 1000000 |  96313344
 36 | 1000000 |  96731136
 37 | 1000000 |  96968704
 38 | 1000000 |  97222656
 39 | 1000000 |  97509376
 40 | 1000000 |  97779712
 41 | 1000000 |  98074624
 42 | 1000000 |  98344960
 43 | 1000000 |  98639872
 44 | 1000000 |  99000320
 45 | 1000000 |  99229696
 46 | 1000000 |  99459072
 47 | 1000000 |  99672064
 48 | 1000000 |  99901440
 49 | 1000000 | 100114432
 50 | 1000000 | 100261888
 51 | 1000000 | 100458496
(51 rows)

Index grow about from 75 MB to 100 MB.

Current implementation of gistchoose select first index tuple which have minimal penalty. It is possible for several tuples to have same minimal penalty. I've created simple patch which selects random from them. I then I've following results for same testcase.

test=# select * from geotest_stats;
 id |  count  |   size   
----+---------+----------
  1 | 1000000 | 74694656
  2 | 1000000 | 74743808
  3 | 1000000 | 74891264
  4 | 1000000 | 75120640
  5 | 1000000 | 75374592
  6 | 1000000 | 75612160
  7 | 1000000 | 75833344
  8 | 1000000 | 76144640
  9 | 1000000 | 76333056
 10 | 1000000 | 76595200
 11 | 1000000 | 76718080
 12 | 1000000 | 76939264
 13 | 1000000 | 77070336
 14 | 1000000 | 77332480
 15 | 1000000 | 77520896
 16 | 1000000 | 77750272
 17 | 1000000 | 77996032
 18 | 1000000 | 78127104
 19 | 1000000 | 78307328
 20 | 1000000 | 78512128
 21 | 1000000 | 78610432
 22 | 1000000 | 78774272
 23 | 1000000 | 78929920
 24 | 1000000 | 79060992
 25 | 1000000 | 79216640
 26 | 1000000 | 79331328
 27 | 1000000 | 79454208
 28 | 1000000 | 79593472
 29 | 1000000 | 79708160
 30 | 1000000 | 79822848
 31 | 1000000 | 79921152
 32 | 1000000 | 80035840
 33 | 1000000 | 80076800
 34 | 1000000 | 80175104
 35 | 1000000 | 80207872
 36 | 1000000 | 80322560
 37 | 1000000 | 80363520
 38 | 1000000 | 80445440
 39 | 1000000 | 80494592
 40 | 1000000 | 80576512
 41 | 1000000 | 80666624
 42 | 1000000 | 80764928
 43 | 1000000 | 80805888
 44 | 1000000 | 80912384
 45 | 1000000 | 80994304
 46 | 1000000 | 81027072
 47 | 1000000 | 81100800
 48 | 1000000 | 81174528
 49 | 1000000 | 81297408
 50 | 1000000 | 81371136
 51 | 1000000 | 81420288
(51 rows)

Now index grow about from 75 MB to 81 MB. It is almost 4 times less index bloat!
I've following explanation of that. If index tuples are overlapping then possibility of insertion into last tuple is much lower than possibility of insertion to the first tuple. Thereby, freed space underlying to last tuples of inner page is not likely to be reused. Selecting random tuple increase that chances.
Obviously, it makes insertion more expensive. I need some more benchmarks to measure it. Probably, we would need a user-visible option for cheaper insert or less bloat.

------
With best regards,
Alexander Korotkov.
Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Resource Owner reassign Locks
Next
From: Andres Freund
Date:
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node