Re: Feature request for adoptive indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Feature request for adoptive indexes
Date
Msg-id c1f6d01e-09ce-ca0e-f0e8-4f386b2bc1e8@enterprisedb.com
Whole thread Raw
In response to Re: Feature request for adoptive indexes  (Pavel Borisov <pashkin.elfe@gmail.com>)
Responses Re: Feature request for adoptive indexes  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers

On 10/26/21 21:39, Pavel Borisov wrote:
> I've already answered OP but it seems in the wrong thread, so I copy it 
> here:
> 
> I think now in many cases you can effectively use covering index to have 
> fast index-only scans without index duplication. It will help if you 
> don't have great selectivity on the last column (most probably you 
> don't). E.g.:
> 
> CREATE INDEX ON table_name (`job`,`nlp`,`year`) INCLUDE (`scan_id`, 
> `issue_flag`, `sequence`)
> 
> But I consider the feature can be useful when there is a very little 
> selectivity in the first index columns. I.e. if (job`,`nlp`,`year') has 
> many repeats and the most selection is done in the last column. I am not 
> sure how often this can arise but in general, I see it as a useful 
> b-tree generalization.
> 
> I'm not sure how it should be done. In my view, we need to add an 
> ordered posting tree as a leaf element if b-tree and now we have index 
> storage only for tuples. The change of on-disk format was previously not 
> easy in nbtree and if we consider the change, we need an extra bit to 
> mark posting trees among index tuples. Maybe it could be done in a way 
> similar to deduplicated tuples if some bits in the tuple header are 
> still could be freed.
> 
> Thoughts?
> 
>     If I get what you propose, you want to have a "top" tree for (job, nlp,
>     year), which "splits" the data set into subsets of ~5000-7000 rows. And
>     then for each subset you want a separate "small" trees on each of the
>     other columns, so in this case three trees.
> 
>     Well, the problem with this is pretty obvious - each of the small trees
>     requires separate copies of the leaf pages. And remember, in a btree the
>     internal pages are usually less than 1% of the index, so this pretty
>     much triples the size of the index. And if you insert a row into the
>     index, it has to insert the item pointer into each of the small trees,
>     likely requiring a separate I/O for each.
> 
>     So I'd bet this is not any different from just having three separate
>     indexes - it doesn't save space, doesn't save I/O, nothing.
> 
> Tomas, I really think we should not try realizing this feature using 
> existing index pages that contain only tuples. You are right, it will 
> cause large overhead. If instead we decide and succeed in creating 
> "posting trees" as a new on-disk page entry type we can have an index 
> with space comparable to the abovementioned covering index but with 
> sorting of values in these trees (i.e. all values are sorted, and "key" 
> ones).
> 

Well, there was no explanation about how it could/should be implemented, 
and maybe there is some elaborate way to handle the "posting trees" that 
I can't quite think of (at least not in the btree context).

I'm still rather skeptical about it - for such feature to be useful the 
prefix columns must not be very selective, i.e. the posting trees are 
expected to be fairly large (e.g. 5-7k rows). It pretty much has to to 
require multiple (many) index pages, in order for the "larger" btree 
index to be slower. And at that point I'd expect the extra overhead to 
be worse than simply defining multiple simple indexes.

A simple experiment would be to measure timing for queries with a 
condition on "sequence" using two indexes:

1) (job, nlp, year, sequence)
2) (job, nlp, year, scan_id, issue_flag, sequence)

The (1) index is "optimal" i.e. there's unlikely to be a better index 
for this query, at least no tree-like. (2) is something like the "worst" 
case index that we can use for this query.

For the new feature to be useful, two things would need to be true:

* query with (2) is much slower than (1)
* the new index would need to be close to (1)

Obviously, if the new index is slower than (2), it's mostly useless 
right off the bat. And it probably can't be faster than (1) in practice, 
as it still is basically a btree index (at least the top half).

So I'd expect the performance to be somewhere between (1) and (2), but 
if (2) is very close to (1) - which I'd bet it is - then the potential 
benefit is also pretty small.

Perhaps I'm entirely wrong and there's a new type of index, better 
suited for cases similar to this. The "posting tree" reference actually 
made me thinking that maybe btree_gin might be applicable here?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: allowing "map" for password auth methods with clientcert=verify-full
Next
From: Arjan van de Ven
Date:
Subject: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint