Re: Bug in UTF8-Validation Code? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug in UTF8-Validation Code?
Date
Msg-id 12557.1174153862@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bug in UTF8-Validation Code?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bug in UTF8-Validation Code?  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
I wrote:
> Actually, I have to take back that objection: on closer look, COPY
> validates the data only once and does so before applying its own
> backslash-escaping rules.  So there is a risk in that path too.

> It's still pretty annoying to be validating the data twice in the
> common case where no backslash reduction occurred, but I'm not sure
> I see any good way to avoid it.

Further thought here: if we put encoding verification into textin()
and related functions, could we *remove* it from COPY IN, in the common
case where client and server encodings are the same?  Currently, copy.c
forces a trip through pg_client_to_server for multibyte encodings
even when the encodings are the same, so as to perform validation.
But I'm wondering whether we'd still need that.  There's no risk of
SQL injection in COPY data.  Bogus input encoding could possibly
make for confusion about where the field boundaries are, but bad
data is bad data in any case.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Pavan Deolasee"
Date:
Subject: Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
Next
From: "Mickael DELOISON"
Date:
Subject: Project suggestion: benchmark utility for PostgreSQL