Re: Why do indexes and sorts use the database collation? - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Why do indexes and sorts use the database collation? |
Date | |
Msg-id | 4c7b88bdc63c01fb51b56447b5e0c6f1dd702ffa.camel@j-davis.com Whole thread Raw |
In response to | Re: Why do indexes and sorts use the database collation? (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Mon, 2023-11-13 at 10:02 -0800, Andres Freund wrote: > > Inequalities and ORDER BYs can't benefit from an index with a > > different > > collation, but lots of indexes don't need that. > > But we don't know whether the index is used for that. That will be hard to quantify, but perhaps we can discuss in terms of the conditions that must be satisfied for pathkeys provided by an index to be useful (the following is a bit fuzzy, let me know if there's something major I'm missing): a. There needs to be an ORDER BY or inequality somewhere in a query that involves that text field. b. The index must be correlated, or the data cached well enough, or there must be a highly selective inequality. c. For ORDER BY, the result size needs to be large enough for the pathkeys to really matter vs just sorting at the top of the plan. d. The pathkeys must be part of a winning plan (e.g. the winning plan must keep the indexed field on the outer of a join, or use MergeJoin). In my experience, considering a text index speciifically: queries are less likely to use inequalities on text fields than a timestamp field; and indexes on text are less likely to be correlated than an index on a timestamp field. That pushes down the probabilities of (a) and (b) compared with timestamps. (Timestamps obviously don't have collation; I'm just using timestamps as a point of reference where index pathkeys are really useful.) I know the above are hard to quantify (and not statistically independent), but I don't think we should take it for granted that pathkeys on a text index are overwhelmingly useful. I would describe them as "sometimes useful". > > > That makes no sense to me. Either the user cares about ordering, in > which case > the index needs to be in that ordering for efficient ORDER BY I disagree. The user may want top-level ORDER BYs on that field to return 'a' before 'Z', and that's a very reasonable expectation that requires a non-"C" collation. But that does not imply much about what order an index on that field should be. The fact that an ORDER BY exists satisfies only condition (a) above. If the other conditions are not met, then the pathkeys provided by the index are close to useless anyway. The index itself might be useful for other reasons though, like constraints or equality lookups. But indexes for those purposes don't need to provide pathkeys. > You partially > premised your argument on the content of primary keys typically > making non-C > collations undesirable! Primary keys requires (in a practical sense) an index to be created, and that index should be useful for other purposes, too. Equality lookups are clearly required to implement a primary key, so of course the index should be useful for any other equality lookups as well, because that has zero cost. But "useful for other purposes" is not a blank check. Providing useful pathkeys offers some marginal utility (assuming the conditions (a)-(e) are satisfied), but also has a marginal cost (build time and versioning risks). For typical cases I believe Postgres is on the wrong side of that trade; that's all I'm saying. > I'm not sure it's quite that easy. One issue is obviously that this > could lead > to a huge increase in paths we need to keep If there's a particularly bad case you have in mind, please let me know. Otherwise we can sort the details out when it comes to a patch. > > > I'd think the specific collation. Even if we initially perhaps just > get the > default cost from the provider such, it structurally seems the sanest > place to > locate the cost. Makes sense, though I'm thinking we'd still want to special case the fastest collation as "C". Regards, Jeff Davis
pgsql-hackers by date: