Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2) - Mailing list pgsql-hackers

From Kisung Kim
Subject Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Date
Msg-id CABF0Rr07yJop9OEgaWz5jFYoqqo_-cAv6g-Qy92VV9u5WMMwTA@mail.gmail.com
Whole thread Raw
Responses Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)  (Lukas Fittl <lukas@fittl.com>)
List pgsql-hackers
Hi,

I've run YCSB(Yahoo! Cloud Service Benchmark) on PostgreSQL and MongoDB with WiredTiger.
And I found some interesting results and some issues(maybe) on Btree index of PostgreSQL.

Here is my experiments and results.
YCSB is for document store benchmark and I build following schema in PG.

CREATE TABLE usertable (
    YCSB_KEY varchar(30) primary key,
    FIELDS jsonb);

And the benchmark generated avg-300-byte-length Json documents and loaded 100M rows in PG and Mongo.

First I found that the size difference between PG and Mongo:
I configured Mongo not to use any compression for both storage and index.

MongoDB index size: 2.1 GB
PostgreSQL index size: 5.5 GB

When I used the index bloating estimation script in https://github.com/ioguix/pgsql-bloat-estimation,
the result is as follows:
 current_database | schemaname |     tblname      |              idxname              | real_size  | extra_size |    extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na 
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
ycsb             | public     | usertable        | usertable_pkey                    | 5488852992 | 2673713152 |  48.7116917850949 |         90 | 2362122240 |  43.0348971532448 | f

It says that the bloat_ratio is 42 for the index.

So, I rebuilded the index and the result was changed:

 current_database | schemaname |     tblname      |              idxname              | real_size  | extra_size |    extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na 
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable        | usertable_pkey                    | 3154264064 |  339132416 |  10.7515543758863 |         90 |   27533312 | 0.872891788428275 | f


I am curious about the results
1) why the index was bloated even though rows were only inserted not removed or updated.
2) And then why the bloating is removed when it is rebuilded

I guess that the splits of Btree nodes during inserting rows caused the bloating but I don't know exact reasons.
And given that MongoDB's index size is much smaller than PG after they handled the same workload (only insert),
then is there any chances to improve PG's index behavior.

Thank you very much. 


--

                                                                                                                                                       

Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192

pgsql-hackers by date:

Previous
From: Marko Tiikkaja
Date:
Subject: Re: Assertion failure in REL9_5_STABLE
Next
From: Peter Geoghegan
Date:
Subject: Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)