Re: Feature request for adoptive indexes - Mailing list pgsql-hackers
From | Hayk Manukyan |
---|---|
Subject | Re: Feature request for adoptive indexes |
Date | |
Msg-id | CAF+kZOHed=__24BvtOLw4KGR+rcD=rEGUbMhXwvyw3VFfLHvRA@mail.gmail.com Whole thread Raw |
In response to | Re: Feature request for adoptive indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Feature request for adoptive indexes
|
List | pgsql-hackers |
I agree with the above mentioned.
The only concern I have is that we compare little wrong things.
For read we should compare
(job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, year, issue_flag ) VS (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) )
Because our proposed index for reading should be closer to a combination of those 3 and we have current solutions like index on all or with Include statement.
We should try to find a gap between these three cases.
For DML queries
(job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) ) VS (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, year, issue_flag )
Because again the proposed index should be just one and cover all 3 separate ones.
If you agree with these cases I will try to find a bigger time frame to compare these two cases deeper.
The issue is not high prio but I strongly believe it can help and can be nice feature for even more complicated cases.
Best regards.
вс, 31 окт. 2021 г. в 21:33, Tomas Vondra <tomas.vondra@enterprisedb.com>:
On 10/31/21 16:48, Pavel Borisov wrote:
> 4 columns: 106 ms
> 6 columns: 109 ms
>
> So there's like 3% difference between the two cases, and even that
> might
> be just noise. This is consistent with the two indexes being about the
> same size.
>
> I also don't think we can get great speedup in the mentioned case, so it
> is not urgently needed of course. My point is that it is just nice to
> have a multicolumn index constructed on stacked trees constructed on
> separate columns, not on the index tuples as a whole thing.
Well, I'd say "nice to have" features are pointless unless they actually
give tangible benefits (like speedup) to users. I'd bet no one is going
to implement and maintain something unless it has such benefit, because
they have to weight it against other beneficial features.
Maybe there are use cases where this would be beneficial, but so far we
haven't seen one. Usually it's the OP who presents such a case, and a
plausible way to improve it - but it seems this thread presents a
solution and now we're looking for an issue it might solve.
> At least there is a benefit of sparing shared memory if we don't need
> to cache index tuples of several similar indexes, instead caching one
> "compound index". So if someone wants to propose this thing I'd
> support it provided problems with concurrency, which were mentioned
> by Peter are solved.
>
The problem with this it assumes the new index would use (significantly)
less space than three separate indexes. I find that rather unlikely, but
maybe there is a smart way to achieve that (certainly not in detail).
I don't want to sound overly pessimistic and if you have an idea how to
do this, I'd like to hear it. But it seems pretty tricky, particularly
if we assume the suffix columns are more variable (which limits the
"compression" ratio etc.).
> These problems could be appear easy though, as we have index tuples
> constructed in a similar way as heap tuples. Maybe it could be easier if
> we had another heap am, which stored separate attributes (if so it could
> be useful as a better JSON storage method than we have today).
>
IMO this just moved the goalposts somewhere outside the solar system.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date: