Re: Feature request for adoptive indexes - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Feature request for adoptive indexes |
Date | |
Msg-id | f1f33e93-9fe6-b27c-9dc2-367595d24f49@enterprisedb.com Whole thread Raw |
In response to | Re: Feature request for adoptive indexes (Hayk Manukyan <manukyantt@gmail.com>) |
List | pgsql-hackers |
On 11/1/21 1:24 PM, Hayk Manukyan wrote: > I agree with the above mentioned. > The only concern I have is that we compare little wrong things. > For read we should compare > (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, > year, issue_flag ) VS (job, nlp, year, sequence, Scan_ID, issue_flag) > OR (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) ) > Because our proposed index for reading should be closer to a combination > of those 3 and we have current solutions like index on all or with > Include statement. I don't follow. The whole point of the experiment was to show the gap between a "best case" and "worst case" alternatives, with the assumption the gap would be substantial and the new index type might get close to the best case. Are you suggesting those are not the actual best/worst cases and we should use some other indexes? If yes, which ones? IMHO those best/worst cases are fine because: 1) best case (job, nlp, year, sequence) I don't see how we could get anything better for queries on "sequence" than this index, because that's literally one of the indexes that would be included in the whole index. Yes, if you need to support queries on additional columns, you might need more indexes, but that's irrelevant - why would anyone define those indexes, when the "worst case" btree index with all the columns is so close to the best case? 2) worst case (job, nlp, year, scan_id, issue_flag, sequence) I think an index with INCLUDE is entirely irrelevant here. The reason to use INCLUDE is to define UNIQUE index on a subset of columns, but that's not what we need here. I repeated the benchmark with such index, and the timing is ~150ms, so about 50% slower than the simple index. Sorting on all columns is clearly beneficial even for the last column. So I still think those best/worst cases are sensible, and the proposed index would need to beat the worst case. Which seems challenging, considering how close it is to the best case. Or it might break the best case, if there's some sort of revolutionary way to store the small indexes or something like that. The fact that there's no size difference between the two cases is mostly a coincidence, due to the columns being just 2B each, and with wider values the difference might be substantial, making the gap larger. But then the new index would have to improve on this, but there's no proposal on how to do that. > We should try to find a gap between these three cases. > For DML queries > (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year > INCLUDE(sequence, Scan_ID, issue_flag) ) VS (job, nlp, year, sequence) > AND (job, nlp, year, Scan_ID) and (job, nlp, year, issue_flag ) > Because again the proposed index should be just one and cover all 3 > separate ones. > > If you agree with these cases I will try to find a bigger time frame to > compare these two cases deeper. > > The issue is not high prio but I strongly believe it can help and can be > nice feature for even more complicated cases. > You don't need my approval to run benchmarks etc. If you believe this is beneficial then just do the tests and you'll see if it makes sense ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: