Re: Feature request for adoptive indexes - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Feature request for adoptive indexes |
Date | |
Msg-id | da00cb7f-1b1c-57cc-dd92-c78dd6e5afa6@enterprisedb.com Whole thread Raw |
In response to | Re: Feature request for adoptive indexes (Hayk Manukyan <manukyantt@gmail.com>) |
Responses |
Re: Feature request for adoptive indexes
Re: Feature request for adoptive indexes |
List | pgsql-hackers |
On 10/26/21 8:49 AM, Hayk Manukyan wrote: > ok. here is the deal if I have the following index with 6 column > > CREATE INDEX ON job_nlp_year_scan (job, nlp, year, scan_id, issue_flag, > sequence); > > I need to specify all 6 columns in where clause in order to fully use > this index. What do you mean by "fully use this index"? Yes, the query may use just some of the columns and there will be a bit of overhead, but I doubt it'll be measurable. > It will not be efficient in cases when I have 4 condition in where > clause also I should follow the order of columns. So, do some experiments and show us what the difference is. Create an index on the 4 and 6 columns, and measure timings for a query with just the 4 columns. > In case of INCLUDE the 3 columns just will be in index but will not be > structured as index so it will have affect only if In select I will have > that 6 columns nothing more. > > In my case I have table with ~15 columns > In my application I have to do a lot of queries with following where > clauses > > 1. where job = <something> and nlp = <something> and year = <something> > and SCAN_ID = <something> > 2. where job = <something> and nlp = <something> and year = <something> > and ISSUE_FLAG = <something> > 3. where job = <something> and nlp = <something> and year = <something> > and SEQUENCE = <something> > > I don't want to index just on job, nlp, year because for each job, > nlp, year I have approximately 5000-7000 rows , > overall table have ~50m rows so it is partitioned by job as well. So if > I build 3 separate indexes it will be huge resource. > So I am thinking of having one index which will be job, nlp, year and > the 4-th layer will be other columns not just included but also in > B-tree structure. > To visualize it will be something like this: > image.png > The red part is ordinary index with nested b-trees ant the yellow part > is adaptive part so depends on > where clause optimizer can decide which direction (leaf, b-tree > whatever) to chose. > In this case I will have one index and will manage red part only once > for all three cases. > Those it make sense ? If I get what you propose, you want to have a "top" tree for (job, nlp, year), which "splits" the data set into subsets of ~5000-7000 rows. And then for each subset you want a separate "small" trees on each of the other columns, so in this case three trees. Well, the problem with this is pretty obvious - each of the small trees requires separate copies of the leaf pages. And remember, in a btree the internal pages are usually less than 1% of the index, so this pretty much triples the size of the index. And if you insert a row into the index, it has to insert the item pointer into each of the small trees, likely requiring a separate I/O for each. So I'd bet this is not any different from just having three separate indexes - it doesn't save space, doesn't save I/O, nothing. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: