Re: invalidly encoded strings - Mailing list pgsql-hackers

From Tom Lane
Subject Re: invalidly encoded strings
Msg-id 16212.1189394546@sss.pgh.pa.us
In response to Re: invalidly encoded strings  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Jeff Davis <pgsql@j-davis.com> writes:
> Would stringTypeDatum() in parse_type.c be a good place to put the
> pg_verifymbstr()? 

Probably not, in its current form, since it hasn't got any idea where
the "char *string" came from; moreover it is not in any better position
than the typinput function to determine whether there was a bogus
embedded null.

OTOH, there may be no decent way to fix the embedded-null problem
other than by hacking the scanner to reject \0 immediately.  If we
did that it would give us more flexibility about where to put the
encoding validity checks.

In any case, I feel dubious that checking in stringTypeDatum will cover
every code path.  Somewhere around where A_Const gets transformed to
Const seems like it'd be a better plan.  (But I think that in most
utility statement parsetrees, A_Const never does get transformed to
Const; and there seem to be a few places in gram.y where an SCONST
gives rise to something other than A_Const; so this is still not a
bulletproof choice, at least not without additional changes.)

In the short run it might be best to do it in scan.l after all.  A few
minutes' thought about what it'd take to delay the decisions till later
yields a depressingly large number of changes; and we do not have time
to be developing mostly-cosmetic patches for 8.3.  Given that
database_encoding is frozen for any one DB at the moment, and that that
is unlikely to change in the near future, insisting on a solution that
allows it to vary is probably unreasonable at this stage of the game.
        regards, tom lane
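
As an illustration of the scan.l approach discussed above, here is a minimal standalone sketch of the two checks applied to a string literal's bytes after escape processing: reject an embedded \0 immediately, then verify the encoding. It is not PostgreSQL code. The names LitCheck, utf8_seq_len and check_literal are hypothetical, and the sketch assumes UTF-8 only, whereas the real check would presumably go through pg_verifymbstr() against the database encoding.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; not PostgreSQL's. */
typedef enum { LIT_OK, LIT_EMBEDDED_NUL, LIT_BAD_ENCODING } LitCheck;

/* Expected UTF-8 sequence length for a lead byte, or 0 if it cannot lead. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/*
 * Check the de-escaped bytes of a literal.  The length is passed
 * explicitly because an embedded zero byte would otherwise silently
 * truncate a "char *string" seen by later code.
 */
static LitCheck check_literal(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = buf[i];
        size_t n, k;

        if (c == 0)
            return LIT_EMBEDDED_NUL;    /* reject \0 right away */

        n = utf8_seq_len(c);
        if (n == 0 || n > len - i)
            return LIT_BAD_ENCODING;    /* bad lead byte or truncated */

        for (k = 1; k < n; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return LIT_BAD_ENCODING;    /* not a continuation byte */

        /* Exclude overlong forms, surrogates, and values above U+10FFFF. */
        if ((c == 0xE0 && buf[i + 1] < 0xA0) ||
            (c == 0xED && buf[i + 1] > 0x9F) ||
            (c == 0xF0 && buf[i + 1] < 0x90) ||
            (c == 0xF4 && buf[i + 1] > 0x8F))
            return LIT_BAD_ENCODING;

        i += n;
    }
    return LIT_OK;
}
```

Doing both checks at the point where the literal's bytes are first assembled means no later code path can see an unchecked string, at the cost of tying the check to the encoding in force at scan time.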

