Collatability of type "name" - Mailing list pgsql-hackers

From Tom Lane
Subject Collatability of type "name"
Date
Msg-id 15938.1544377821@sss.pgh.pa.us
Whole thread Raw
Responses Re: Collatability of type "name"
Re: Collatability of type "name"
List pgsql-hackers
I've been experimenting with the task proposed in [1] of expanding
the text_ops operator family to include type "name" as well as
cross-type text vs. name operators.  These operators would need to
offer collation-aware sorting, since that's exactly the difference
between text_ops and the non-collation-aware name_ops opfamily.
I ran into a nasty stumbling block almost immediately: the proposed
name vs. name comparison operators fail, because the parser sees
that both inputs are of noncollatable types so it doesn't assign
any collation to the operator node.

I experimented with leaving out the name vs. name operators and
just adding cross-type text vs. name and name vs. text operators.
That turns out not to work well at all.  Aside from the fact that
opr_sanity whines about an incomplete operator family, I found
various situations where the planner fails, complaining about
things like "missing operator 1(19,19) in opfamily 1994".  The
root of that mess seems to be that we've supposed that if an
equality operator is marked mergejoinable then it is mergejoinable
in every opfamily that it's a member of.  But that isn't true in
an opfamily structure like this.  For instance "text = name" should
be mergejoinable in the name_ops opclass, since we know how to sort
both text and name in non-collation-aware ways.  But it's not
mergejoinable in the text_ops opclass if text_ops doesn't provide
collation-aware name vs. name operators to sort the name input with.

We could probably fix that, at the cost of about tripling the work
needed to detect whether an operator is really mergejoinable, but
I have little confidence that there aren't more problems lurking
behind it.  There are a lot of aspects of EquivalenceClass processing
that look pretty questionable if we're trying to support operators
that act this way.  For instance, if we derive "a = c" given "a = b"
and "b = c", the equality operator in "a = c" might be mergejoinable
in a different set of opclasses than the other two operators are,
making it debatable whether it can be thought to belong to the same
EquivalenceClass at all.

So the other approach I'm contemplating is to mark type name as
collatable (with "C" as its typcollation, probably).  There are
two plausible sub-approaches:

1. The regular name comparison operators remain non-collation-aware.
This would be the least invasive way but it'd have the odd side-effect
that expressions like "namecoll1 < namecoll2 COLLATE something"
would be accepted but the collation would be ignored.  Also, we'd
have to invent some new names for the collation-aware name-vs-name
operators, and I don't see any obvious candidate for that.

2. Upgrade the name comparison operators to be collation-aware,
with (probably) all the same optimizations for C collation as we
have for text.  This'd be a cleaner end result but it seems like
there are a lot of potential side-effects, e.g. syscache lookups
would have to be prepared to pass the right collation argument
to name comparisons.

I feel like #2 is probably really the Right Thing, but it's also
sounding like significantly more work than I thought this was going
to involve.  Not sure if it's worth the effort right now.

Also, I think that either solution would lead to some subtle changes
in semantics.  For example, right now if you compare a name column
to a text value, you get a text (collation-aware) comparison using
the database's default collation.  It looks like if name columns
are marked with attcollation = 'C', that would win and the comparison
would now have 'C' collation unless you explicitly override it with
a COLLATE clause.  I'm not sure this is a bad thing --- it'd be more
likely to match the sort order of the index on the column --- but it
could surprise people.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/5978.1544030694@sss.pgh.pa.us


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pg_partition_tree crashes for a non-defined relation
Next
From: Tom Lane
Date:
Subject: Re: pg_partition_tree crashes for a non-defined relation