mark@mark.mielke.cc writes:
> I read "the backend is by and large an ASCII, null-terminated-string
> engine" with "we use UTF-8 [for varlena strings?]" as, a lot of the
> code assumes varlena strings are '\0' terminated, and an assumption
> on my part, that the varlena strings are not stored in the backend
> with a '\0' terminator, therefore, they require being copied out,
> terminated with a '\0', before they can be used?
There are places where we have to do that, the worst from a performance
viewpoint being in string comparison --- we have to null-terminate both
values before we can pass them to strcoll().

One of the large bits that would have to be done before we could even
contemplate using UCS2/UCS4 is getting rid of our dependence on strcoll,
since its API accepts only null-terminated strings.
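
To make the copying cost concrete, here is a rough sketch of the
copy-then-compare pattern. It is not the backend's actual code:
CountedStr, counted_to_cstring() and counted_strcoll() are made-up names
standing in for a length-counted value and the comparison path, and
malloc/free stand in for whatever allocator is really used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted (varlena-style) string:
 * the bytes are NOT followed by a '\0'. */
typedef struct
{
    const char *data;
    size_t      len;
} CountedStr;

/* Copy a counted string into a freshly allocated, null-terminated buffer. */
static char *
counted_to_cstring(CountedStr s)
{
    char *buf = malloc(s.len + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, s.data, s.len);
    buf[s.len] = '\0';
    return buf;
}

/* Compare two counted strings under the current LC_COLLATE setting.
 * strcoll() only accepts null-terminated input, so both values are
 * copied and terminated first; that is two allocations and two copies
 * for every single comparison. */
static int
counted_strcoll(CountedStr a, CountedStr b)
{
    char *ca = counted_to_cstring(a);
    char *cb = counted_to_cstring(b);
    int   result;

    /* Sketch only: real code would report out-of-memory properly. */
    assert(ca != NULL && cb != NULL);

    result = strcoll(ca, cb);
    free(ca);
    free(cb);
    return result;
}
```

A comparison API that took a pointer and a length for each argument
would let both copies go away, which is why the strcoll dependence is
the sticking point.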
> How much effort (past discussions that I've missed from a decade ago?
> hehe) has been put into determining whether a zero-copy architecture,
> or really, a minimum copy architecture, would address some of these
> bottlenecks? Am I dreaming? :-)
We've already done it in places, for instance the new implementation
of "virtual tuples" in TupleTableSlots eliminates a lot of copying
of pass-by-reference values.

regards, tom lane