Re: Implementing full UTF-8 support (aka supporting 0x00) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Implementing full UTF-8 support (aka supporting 0x00)
Date
Msg-id 24883.1470237821@sss.pgh.pa.us
In response to Implementing full UTF-8 support (aka supporting 0x00)  (Álvaro Hernández Tortosa <aht@8kdata.com>)
Responses Re: Implementing full UTF-8 support (aka supporting 0x00)  (Álvaro Hernández Tortosa <aht@8kdata.com>)
List pgsql-hackers
Álvaro Hernández Tortosa <aht@8kdata.com> writes:
>      As has been previously discussed (see 
> https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl 
> for instance) varlena fields cannot accept the literal 0x00 value.

Yup.

>      What would it take to support it?

One key reason why that's hard is that datatype input and output
functions use nul-terminated C strings as the representation of the
text form of any datatype.  We can't readily change that API without
breaking huge amounts of code, much of it not under the core project's
control.
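
To make the failure mode concrete, here is a minimal sketch in plain C.
It is not PostgreSQL code: counted_text, counted_text_out() and
visible_length() are hypothetical stand-ins for a length-counted stored
value, an output function that hands back a nul-terminated string, and a
caller that receives only the char pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stored, length-counted text value.
 * This is NOT the real varlena layout, just enough to show the idea. */
typedef struct
{
    size_t      len;    /* number of payload bytes */
    const char *data;   /* payload, may contain 0x00 bytes */
} counted_text;

/* Hypothetical stand-in for a datatype output function: it returns a
 * malloc'd, nul-terminated copy of the value.  The length is not part
 * of the return value, which is the whole problem. */
char *
counted_text_out(counted_text v)
{
    char       *s;

    assert(v.data != NULL || v.len == 0);

    s = malloc(v.len + 1);
    if (s == NULL)
        return NULL;
    if (v.len > 0)
        memcpy(s, v.data, v.len);
    s[v.len] = '\0';    /* terminator that callers will rely on */
    return s;
}

/* What any caller that only has the char pointer can find out. */
size_t
visible_length(const char *cstr)
{
    return strlen(cstr);
}
```

For a 7-byte value "abc\0def", counted_text_out() copies all seven
bytes, but visible_length() on the result reports 3.  The bytes after
the embedded nul are still in memory, yet nothing downstream of the
char pointer can know they belong to the value.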

There may be other places where nul-terminated strings would be a hazard
(mumble fgets mumble), but offhand that API seems like the major problem
so far as the backend is concerned.

There would be a slew of client-side problems as well.  For example this
would assuredly break psql and pg_dump, along with every other client that
supposes that it can treat PQgetvalue() as returning a nul-terminated
string.  This end of it would possibly be even worse than fixing the
backend, because so little of the affected code is under our control.
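
As a rough illustration of the client-side pattern, the sketch below
contrasts a consumer that trusts the terminator with one that uses an
explicit length.  It does not call libpq: result_field is a hypothetical
stand-in for the (pointer, length) pair a client could get by calling
PQgetlength() alongside PQgetvalue(), and both copy functions are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one field of a query result. */
typedef struct
{
    const char *value;   /* field bytes, followed by a terminating nul */
    int         length;  /* actual number of bytes in the field */
} result_field;

/* Common client pattern: treat the field as a C string.  Copies up to
 * the first nul byte and returns the number of bytes copied, not
 * counting the terminator it writes. */
size_t
copy_as_cstring(result_field f, char *dst, size_t dstsize)
{
    size_t      n;

    assert(dst != NULL && dstsize > 0);

    n = strlen(f.value);
    if (n >= dstsize)
        n = dstsize - 1;
    memcpy(dst, f.value, n);
    dst[n] = '\0';
    return n;
}

/* Length-aware variant: copies f.length bytes (bounded by dstsize) and
 * returns the number of bytes copied.  No terminator is appended. */
size_t
copy_with_length(result_field f, char *dst, size_t dstsize)
{
    size_t      n;

    assert(dst != NULL && f.length >= 0);

    n = (size_t) f.length;
    if (n > dstsize)
        n = dstsize;
    memcpy(dst, f.value, n);
    return n;
}
```

With a 5-byte field "ab\0cd", copy_as_cstring() yields "ab" and
copy_with_length() yields all five bytes.  Every client written in the
first style would silently truncate such a value, and finding all of
those call sites is the hard part.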

In short, the problem is not with having an embedded nul in a stored
text value.  The problem is the reams of code that suppose that the
text representation of any data value is a nul-terminated C string.

        regards, tom lane


