Re: BUG #16966: Nested loop joining across tables with varchar -> bpchar cast always scans varchar table first - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16966: Nested loop joining across tables with varchar -> bpchar cast always scans varchar table first
Date
Msg-id 4158832.1618591790@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16966: Nested loop joining across tables with varchar -> bpchar cast always scans varchar table first  (Zeb Burke-Conte <zebburkeconte@gmail.com>)
List pgsql-bugs
Zeb Burke-Conte <zebburkeconte@gmail.com> writes:
> I still find this quite counterintuitive, since nothing about my query is
> forcing Postgres to cast the varchar column to bpchar instead of the other
> way around. Is there some arcane standard that requires it?

Don't think so.  It's a bit of an implementation artifact arising from
the fact that varchar has no operators of its own (it's kind of like a
domain over text, in our implementation).  So the parser, faced with
"vc = bp", has to choose whether to use the "text = text" or
"bpchar = bpchar" operator.  It settles on the latter, which I believe
is because it has one more exact match to the actual input types than
"text = text".  (Cf. the resolution rules in [1], whcih are most certainly
not derived from the SQL standard.)  Interestingly, if you'd been
working with "text = bp", then "text = text" would've been picked and
the inefficiency would be on the other side of the join due to needing
to cast the bpchar input.

It may be possible to argue that the SQL spec has a preference here,
but I'm not sure.  (IIRC, their text talks about PAD SPACE attributes
rather than a distinct type, so mapping it onto our API takes some
head-scratching.)  In any case, PG has had these resolution rules for
twenty years plus, so changing them would be a huge compatibility break.

> Changing this could be a "nice-to-have" although I see how it's not a bug
> per se.

Even if we wanted to change the resolution rules, that would just move
the pain somewhere else, as I indicated above.

It's possible that things could be improved by inventing operators
for "varchar = bpchar", "bpchar = varchar", etc and adding them to
all the right index opclasses.  It would be a large amount of work
though and could easily introduce some unexpected side-effects.
Given that we regard bpchar as pretty much of a third-class citizen,
I'm not surprised that nobody has put effort into that.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/typeconv-oper.html



pgsql-bugs by date:

Previous
From: Zeb Burke-Conte
Date:
Subject: Re: BUG #16966: Nested loop joining across tables with varchar -> bpchar cast always scans varchar table first
Next
From: PG Bug reporting form
Date:
Subject: BUG #16967: Extremely slow update statement in trigger