Re: Feature request for adoptive indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Feature request for adoptive indexes
Date
Msg-id da00cb7f-1b1c-57cc-dd92-c78dd6e5afa6@enterprisedb.com
Whole thread Raw
In response to Re: Feature request for adoptive indexes  (Hayk Manukyan <manukyantt@gmail.com>)
Responses Re: Feature request for adoptive indexes  (Pavel Borisov <pashkin.elfe@gmail.com>)
Re: Feature request for adoptive indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 10/26/21 8:49 AM, Hayk Manukyan wrote:
> ok. here is the deal if I have the following index with 6 column
> 
> CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag,
> sequence);
> 
> I need to specify all 6 columns in where clause in order to fully use
> this index.

What do you mean by "fully use this index"? Yes, the query may use just
some of the columns and there will be a bit of overhead, but I doubt
it'll be measurable.

> It will not be efficient in cases when I have 4 condition in where
> clause also I should follow the order of columns.

So, do some experiments and show us what the difference is. Create an
index on the 4 and 6 columns, and measure timings for a query with just
the 4 columns.

> In case of INCLUDE the 3 columns just will be in index but will not be
> structured as index so it will have affect only if In select I will have
> that 6 columns nothing more.
> 
> In my case I have table with ~15 columns
> In my application  I have to do a lot of queries with following where
> clauses 
> 
> 1. where  job = <something> and nlp = <something> and year = <something>
> and SCAN_ID = <something>
> 2. where  job = <something> and nlp = <something> and year = <something>
> and ISSUE_FLAG = <something>
> 3. where  job = <something> and nlp = <something> and year = <something>
> and SEQUENCE = <something>
> 
> I don't want to index just on  job, nlp, year because for each  job,
> nlp, year I have approximately 5000-7000 rows ,
> overall table have ~50m rows so it is partitioned by job as well.  So if
> I build 3 separate indexes it will be huge resource.
> So I am thinking of having one index which will be job, nlp, year and
> the 4-th layer will be other columns not just included but also in
> B-tree structure. 
> To visualize it will be something like this:
> image.png
> The red part is ordinary index with nested b-trees ant the yellow part
> is adaptive part so depends on
> where clause optimizer can decide which direction (leaf, b-tree
> whatever) to chose.
> In this case I will have one index and will manage red part only once
> for all three cases.
> Those it make sense ? 

If I get what you propose, you want to have a "top" tree for (job, nlp,
year), which "splits" the data set into subsets of ~5000-7000 rows. And
then for each subset you want a separate "small" trees on each of the
other columns, so in this case three trees.

Well, the problem with this is pretty obvious - each of the small trees
requires separate copies of the leaf pages. And remember, in a btree the
internal pages are usually less than 1% of the index, so this pretty
much triples the size of the index. And if you insert a row into the
index, it has to insert the item pointer into each of the small trees,
likely requiring a separate I/O for each.

So I'd bet this is not any different from just having three separate
indexes - it doesn't save space, doesn't save I/O, nothing.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()
Next
From: Mark Dilger
Date:
Subject: Re: CREATEROLE and role ownership hierarchies