On Fri, Oct 6, 2023 at 2:25 PM Nico Williams <nico@cryptonector.com> wrote:
> > > > Well, that would be making the encoding a per-value property, rather
> > > > than a per-column property like collation as I proposed. I can't see
> > >
> > > On-disk it would be just a property of the type, not part of the value.
> >
> > I mean, that's not how it works.
>
> Sure, because TEXT in PG doesn't have codeset+encoding as part of it --
> it's whatever the database's encoding is. Collation can and should be a
> property of a column, since for Unicode it wouldn't be reasonable to
> make that part of the type. But codeset+encoding should really be a
> property of the type if PG were to support more than one. IMO.

No, what I mean is, you can't just be like "oh, the varlena will be
different in memory than on disk" as if that were no big deal.

I agree that, as an alternative to encoding being a column property,
it could instead be completely a type property, meaning that if you
want to store, say, LATIN1 text in your UTF-8 database, you first
create a latin1text data type and then use it, rather than, as in the
model I proposed, creating a text column and then applying a setting
like ENCODING latin1 to it. I think that there might be some problems
with that model, but it could also have some benefits. If someone were
going to make a run at implementing this, they might want to consider
both designs and evaluate the tradeoffs.

But, even if we were all convinced that this kind of feature was good
to add, I think it would almost certainly be wrong to invent new
varlena features along the way.

--
Robert Haas
EDB: http://www.enterprisedb.com