Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date
Msg-id 4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
In response to Re: [WIP] Effective storage of duplicates in B-tree index.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
The new version of the patch is attached.
This version is even simpler than the previous one,
thanks to the recent btree design changes and all the feedback I received.
I consider it ready for review and testing.

[feature overview]
This patch implements the deduplication of btree non-pivot tuples on leaf
pages in a manner similar to GIN index "posting lists".

A non-pivot posting tuple has the following format:

t_tid | t_info | key values | posting_list[]

where the t_tid and t_info fields are used to store meta info about the
tuple's posting list, and the posting list itself is an array of
ItemPointerData.
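
For illustration, here is a rough C sketch of that layout. It is not taken
from the patch: the struct name PostingTupleSketch and the field grouping
are mine, and the actual patch reuses the ordinary IndexTupleData header
rather than defining a new struct.

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * Illustrative sketch only: a posting tuple keeps the usual index tuple
 * header (t_tid + t_info) and the key values, followed by an array of
 * heap TIDs for all rows that share that key.
 */
typedef struct PostingTupleSketch
{
    ItemPointerData t_tid;      /* reused to carry posting-list meta info */
    unsigned short  t_info;     /* size and flag bits, as in IndexTupleData */
    /* key values follow here, as in a regular non-pivot tuple */
    ItemPointerData posting[FLEXIBLE_ARRAY_MEMBER]; /* the posting list */
} PostingTupleSketch;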

Currently, compression is applied to all indexes except system indexes,
unique indexes, and indexes with included columns.
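
As a sketch of that rule, something like the following check could gate the
feature (the helper name bt_compression_allowed is hypothetical and may not
match the patch; IsSystemRelation and the attribute-count macros are
existing backend helpers):

#include "postgres.h"
#include "catalog/catalog.h"
#include "utils/rel.h"

/* Hypothetical helper: may deduplication be applied to this index? */
static bool
bt_compression_allowed(Relation index)
{
    if (IsSystemRelation(index))
        return false;           /* skip system indexes */
    if (index->rd_index->indisunique)
        return false;           /* skip unique indexes */
    if (IndexRelationGetNumberOfAttributes(index) !=
        IndexRelationGetNumberOfKeyAttributes(index))
        return false;           /* skip indexes with INCLUDE columns */
    return true;
}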

On insertion, compression is applied not to each tuple, but to the page
before a split: if the target page is full, we try to compress it before
splitting.
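
A rough sketch of that insertion flow, under the assumptions above (the
helpers bt_add_tuple and bt_try_compress_page are placeholders rather than
the patch's actual function names; PageGetFreeSpace is the existing bufpage
routine):

#include "postgres.h"
#include "access/itup.h"
#include "storage/bufpage.h"

/* Placeholder prototypes for the hypothetical helpers used below. */
static bool bt_add_tuple(Page page, IndexTuple itup, Size itemsz);
static void bt_try_compress_page(Page page);

/*
 * Sketch: compression is attempted only when the target leaf page has no
 * room for the incoming tuple; if it still does not fit afterwards, the
 * caller falls back to a page split.
 */
static bool
bt_insert_or_compress(Page page, IndexTuple itup, Size itemsz)
{
    if (PageGetFreeSpace(page) >= itemsz)
        return bt_add_tuple(page, itup, itemsz);

    bt_try_compress_page(page); /* merge duplicates into posting tuples */

    if (PageGetFreeSpace(page) >= itemsz)
        return bt_add_tuple(page, itup, itemsz);

    return false;               /* caller proceeds with a page split */
}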

[benchmark results]
CREATE INDEX idx ON tbl (c1);
The index contains 10000000 integer values.

i - number of distinct values in the index.
So i=1 means that all rows have the same key,
and i=10000000 means that all keys are different.

       i | old size (MB) | new size (MB)
---------+---------------+---------------
       1 |           215 |            88
    1000 |           215 |            90
  100000 |           215 |            71
10000000 |           214 |           214

For more details, see the attached diagram with test results.

[future work]
Many things can be improved in this feature.
Personally, I'd prefer to keep this patch as small as possible
and work on other improvements after the basic part is committed,
though I understand that some of these items may be considered
essential for this patch to be approved.

1. Implement a split of the posting tuples on a page split.
2. Implement microvacuum of posting tuples.
3. Add a flag to pg_index that allows enabling/disabling compression
for a particular index.
4. Implement posting list compression.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

