Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Msg-id f5069d7e-91e6-635b-5bfe-dce4e18714e2@postgrespro.ru
In response to Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
List pgsql-hackers
25.09.2019 22:14, Peter Geoghegan wrote:
>
>>> We still haven't added an "off" switch to deduplication, which seems
>>> necessary. I suppose that this should look like GIN's "fastupdate"
>>> storage parameter.
>> Why is it necessary to save this information somewhere but rel->rd_options,
>> while we can easily access this field from _bt_findinsertloc() and
>> _bt_load().
> Maybe, but we also need to access a flag that says it's safe to use
> deduplication. Obviously deduplication is not safe for datatypes like
> numeric and text with a nondeterministic collation. The "is
> deduplication safe for this index?" mechanism will probably work by
> doing several catalog lookups. This doesn't seem like something we
> want to do very often, especially with a buffer lock held -- ideally
> it will be somewhere that's convenient to access.
>
> Do we want to do that separately, and have a storage parameter that
> says "I would like to use deduplication in principle, if it's safe"?
> Or, do we store both pieces of information together, and forbid
> setting the storage parameter to on when it's known to be unsafe for
> the underlying opclasses used by the index? I don't know.
>
> I think that you can start working on this without knowing exactly how
> we'll do those catalog lookups. What you come up with has to work with
> that before the patch can be committed, though.
>
Attached is v19.

* It adds a new btree reloption, "deduplication".
I decided to refactor the code and move BtreeOptions into a separate structure,
rather than adding a new btree-specific value to StdRdOptions.
For now the option can be set even on indexes that do not support deduplication;
in that case it is ignored. Should we add this check to option validation?

* By default, deduplication is on for non-unique indexes and off for unique ones.
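
To illustrate the default described above, here is a minimal sketch, not the code from the patch. SketchBtreeOptions and sketch_dedup_requested() are hypothetical names invented for this example; the real BtreeOptions layout and reloption parsing are not shown in this message. The sketch assumes we can tell whether the user set the option explicitly, and falls back to the uniqueness-based default otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a btree-specific options structure. */
typedef struct SketchBtreeOptions
{
    int  fillfactor;        /* existing btree option, shown only for context */
    bool deduplication_set; /* did the user specify "deduplication" at all? */
    bool deduplication;     /* user's value; meaningful only if *_set is true */
} SketchBtreeOptions;

/*
 * Resolve the effective setting: an explicit user value wins, otherwise
 * deduplication defaults to on for non-unique and off for unique indexes.
 */
static bool
sketch_dedup_requested(const SketchBtreeOptions *opts, bool index_is_unique)
{
    assert(opts != NULL);

    if (opts->deduplication_set)
        return opts->deduplication;

    return !index_is_unique;
}
```

Note that this only answers "did the user ask for deduplication?". Whether it is safe for the index's opclasses is a separate question.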

* The new function _bt_dedup_is_possible() is intended to be the single place
that performs all the checks. For now it is just a stub to ensure that it works.

Is there a way to derive the "deduplication is safe" property from existing
opclass information, or do we need to add a new opclass field? Have you already
started this work? I recall there was another thread, but I didn't manage to find it.
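
For discussion, here is one possible shape for such a single check. It is a sketch under stated assumptions, not the stub from the patch. SketchIndex, its fields, and the per-column "equality is bitwise" flag are all hypothetical; the flag stands in for whatever opclass property we end up using. The safety result is computed once and cached, so that the expensive lookups are not repeated while a buffer lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-index state; not a real PostgreSQL structure. */
typedef struct SketchIndex
{
    bool        dedup_requested; /* effective value of the storage parameter */
    bool        safety_known;    /* has the safety check been run yet? */
    bool        safety_cached;   /* cached result of the safety check */
    int         nkeys;           /* number of key columns */
    const bool *key_is_bitwise;  /* assumed opclass property, one per column */
} SketchIndex;

/* Stand-in for the catalog lookups: every key column must be safe. */
static bool
sketch_compute_safety(const SketchIndex *idx)
{
    for (int i = 0; i < idx->nkeys; i++)
    {
        if (!idx->key_is_bitwise[i])
            return false;
    }
    return true;
}

/* Single entry point: user setting first, then the cached safety check. */
static bool
sketch_dedup_is_possible(SketchIndex *idx)
{
    assert(idx != NULL);

    if (!idx->dedup_requested)
        return false;

    if (!idx->safety_known)
    {
        idx->safety_cached = sketch_compute_safety(idx);
        idx->safety_known = true;
    }
    return idx->safety_cached;
}
```

With this shape, an index with a numeric key column would report false even when the storage parameter is on, which matches the "set but ignored" behaviour described above.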

* I also integrated your latest patch, which enables deduplication on unique
indexes, into this version, since it can now be easily switched on and off.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


