Collatability of type "name" - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Collatability of type "name" |
Date | |
Msg-id | 15938.1544377821@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Collatability of type "name"
Re: Collatability of type "name" |
List | pgsql-hackers |
I've been experimenting with the task proposed in [1] of expanding the text_ops operator family to include type "name" as well as cross-type text vs. name operators. These operators would need to offer collation-aware sorting, since that's exactly the difference between text_ops and the non-collation-aware name_ops opfamily. I ran into a nasty stumbling block almost immediately: the proposed name vs. name comparison operators fail, because the parser sees that both inputs are of noncollatable types so it doesn't assign any collation to the operator node. I experimented with leaving out the name vs. name operators and just adding cross-type text vs. name and name vs. text operators. That turns out not to work well at all. Aside from the fact that opr_sanity whines about an incomplete operator family, I found various situations where the planner fails, complaining about things like "missing operator 1(19,19) in opfamily 1994". The root of that mess seems to be that we've supposed that if an equality operator is marked mergejoinable then it is mergejoinable in every opfamily that it's a member of. But that isn't true in an opfamily structure like this. For instance "text = name" should be mergejoinable in the name_ops opclass, since we know how to sort both text and name in non-collation-aware ways. But it's not mergejoinable in the text_ops opclass if text_ops doesn't provide collation-aware name vs. name operators to sort the name input with. We could probably fix that, at the cost of about tripling the work needed to detect whether an operator is really mergejoinable, but I have little confidence that there aren't more problems lurking behind it. There are a lot of aspects of EquivalenceClass processing that look pretty questionable if we're trying to support operators that act this way. For instance, if we derive "a = c" given "a = b" and "b = c", the equality operator in "a = c" might be mergejoinable in a different set of opclasses than the other two operators are, making it debatable whether it can be thought to belong to the same EquivalenceClass at all. So the other approach I'm contemplating is to mark type name as collatable (with "C" as its typcollation, probably). There are two plausible sub-approaches: 1. The regular name comparison operators remain non-collation-aware. This would be the least invasive way but it'd have the odd side-effect that expressions like "namecoll1 < namecoll2 COLLATE something" would be accepted but the collation would be ignored. Also, we'd have to invent some new names for the collation-aware name-vs-name operators, and I don't see any obvious candidate for that. 2. Upgrade the name comparison operators to be collation-aware, with (probably) all the same optimizations for C collation as we have for text. This'd be a cleaner end result but it seems like there are a lot of potential side-effects, e.g. syscache lookups would have to be prepared to pass the right collation argument to name comparisons. I feel like #2 is probably really the Right Thing, but it's also sounding like significantly more work than I thought this was going to involve. Not sure if it's worth the effort right now. Also, I think that either solution would lead to some subtle changes in semantics. For example, right now if you compare a name column to a text value, you get a text (collation-aware) comparison using the database's default collation. It looks like if name columns are marked with attcollation = 'C', that would win and the comparison would now have 'C' collation unless you explicitly override it with a COLLATE clause. I'm not sure this is a bad thing --- it'd be more likely to match the sort order of the index on the column --- but it could surprise people. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/5978.1544030694@sss.pgh.pa.us
pgsql-hackers by date: