Thread: byteain() doesn't parse correctly

byteain() doesn't parse correctly

From: Jered Floyd
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name        :    Jered Floyd
Your email address    :    jered@permabit.com


System Configuration
---------------------
  Architecture (example: Intel Pentium)      : x86

  Operating System (example: Linux 2.0.26 ELF)     : Linux 2.2.17

  PostgreSQL version (example: PostgreSQL-7.0):   PostgreSQL-7.0.3

  Compiler used (example:  gcc 2.8.0)        : gcc 2.95.2


Please enter a FULL description of your problem:
------------------------------------------------

byteain() in backend/utils/adt/varlena.c is just wrong. It can't parse
'\\', claiming 'Bad input string for type bytea'. No, really.

More curious is that it can't handle '\134' either, implying that
multiple levels of parsing are going on.  But it *can* parse
'\\\\' as \\.  This boggles the mind.


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

SELECT '\\'::bytea;
SELECT '\134'::bytea;
SELECT '\\\\'::bytea;
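The three queries can be traced with a simplified model of the first parsing level. This is a sketch under assumptions: the function name unescape_literal is hypothetical, and the escape rules approximate the 7.0-era string-literal scanner rather than reproduce its code. The model shows that '\\' and '\134' both hand byteain() a single backslash, while '\\\\' hands it two.

```python
# A simplified model (an assumption, not the actual PostgreSQL 7.0 source) of
# the first parsing level: the string-literal scanner strips one layer of
# backslash escapes before the bytea input function ever sees the text.

SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def unescape_literal(body: str) -> str:
    """Return the text handed to the type input function for '<body>'."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            # Ordinary character (or a trailing lone backslash): kept as-is.
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in "01234567":
            # Up to three octal digits form one character: \134 is a backslash.
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        else:
            # \\ becomes \, \n becomes a newline, any other \c becomes c.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
    return "".join(out)


for literal in (r"\\", r"\134", r"\\\\"):
    print(f"'{literal}' -> byteain() receives {unescape_literal(literal)!r}")
```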


If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

This routine is cherry fondue; extremely nasty, but we can't prosecute
for that.  It needs a good rewrite, and an audit of the parse chain in
general wouldn't hurt.

As with the previous 2 bugs, I suspect this is something I'll just do
when I port my 7.0.3 routines for making BYTEAs more first-class
citizens to 7.1.  I'd be tickled if you beat me to it, though.

Re: byteain() doesn't parse correctly

From: Tom Lane
Jered Floyd <jered@permabit.com> writes:
> More curious is that it can't handle '\134' either, implying that
> multiple levels of parsing are going on.

You're right, there are multiple levels of parsing going on.  The
string-literal parser gets first crack before the type-specific
input converter does.  If you don't allow for that when counting
backslashes etc, you'll get confused for sure.
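The second level can be sketched in the same way. The following is a rough model of the bytea input function working on the text the literal scanner has already produced. Its rules are assumptions modeled on the behavior described in this thread: "\\" stands for one backslash, "\ooo" stands for one octet, and anything else after a backslash is an error. bytea_in is a hypothetical name and does not come from varlena.c. A lone backslash, which is what both '\\' and '\134' reduce to, is rejected.

```python
# A simplified model (an assumption, not the real varlena.c code) of the
# second parsing level: the bytea input function decoding whatever text the
# string-literal scanner has already handed it.


def bytea_in(text: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Ordinary character: stored as-is (latin-1 maps one char to one octet).
            out += ch.encode("latin-1")
            i += 1
        elif text[i + 1:i + 2] == "\\":
            # A doubled backslash stands for one backslash octet.
            out.append(0x5C)
            i += 2
        elif (len(text) >= i + 4 and text[i + 1] in "0123"
              and all(c in "01234567" for c in text[i + 2:i + 4])):
            # Three octal digits (at most \377) stand for one octet, e.g. \000 is NUL.
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            # A lone backslash, or one followed by anything else, is rejected.
            raise ValueError("Bad input string for type bytea")
    return bytes(out)


# "\\" is the single backslash that both '\\' and '\134' reduce to;
# "\\\\" is the pair of backslashes that '\\\\' reduces to.
for received in ("\\", "\\\\"):
    try:
        print(repr(received), "->", bytea_in(received))
    except ValueError as err:
        print(repr(received), "->", err)
```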

            regards, tom lane

Re: byteain() doesn't parse correctly

From: Tom Lane
Jered Floyd <jered@permabit.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> You're right, there are multiple levels of parsing going on.  The
>> string-literal parser gets first crack before the type-specific
>> input converter does.  If you don't allow for that when counting
>> backslashes etc, you'll get confused for sure.

> Argh. This is really bad.  This means, for example, that I can't have
> NULs in my bytea, which was the whole reason I was using bytea to
> begin with. Actually, maybe not.

Sure you can.  You just have to write them as \000, which actually
will be written \\000 to get through the string-literal parser.
It's not a real *convenient* notation, I agree, but it works.
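The \\000 recipe extends to any byte string. The sketch below assumes the two-level scheme described in this thread. At the bytea level, a byte is written as \ooo and a backslash as \\. Every backslash is then doubled again for the string-literal parser, and single quotes are doubled for SQL. bytea_literal is a hypothetical helper and not a PostgreSQL function.

```python
# Hypothetical helper: build an SQL literal whose bytea value is `data`,
# assuming the escaping rules described in this thread (a sketch, not an API).


def bytea_literal(data: bytes) -> str:
    parts = []
    for b in data:
        if b == 0x5C:
            # Backslash: \\ at the bytea level, doubled again for the literal.
            parts.append("\\\\\\\\")
        elif b == 0x27:
            # Single quote: doubled, as SQL string literals require.
            parts.append("''")
        elif 0x20 <= b <= 0x7E:
            # Printable ASCII passes through unchanged.
            parts.append(chr(b))
        else:
            # \ooo at the bytea level, its backslash doubled: NUL becomes \\000.
            parts.append("\\\\%03o" % b)
    return "'" + "".join(parts) + "'::bytea"


print(bytea_literal(b"a\x00b"))  # prints: 'a\\000b'::bytea
```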

There has been talk of providing alternate paths, such as functions
that would convert bytea to and from other textual representations
like base64.  Nothing's been done yet though.

            regards, tom lane

Re: byteain() doesn't parse correctly

From: Jered Floyd
Tom Lane <tgl@sss.pgh.pa.us> writes:
> You're right, there are multiple levels of parsing going on.  The
> string-literal parser gets first crack before the type-specific
> input converter does.  If you don't allow for that when counting
> backslashes etc, you'll get confused for sure.

Argh. This is really bad.  This means, for example, that I can't have
NULs in my bytea, which was the whole reason I was using bytea to
begin with. Actually, maybe not.

I slightly misevaluated earlier how things are broken. It
*does* work if I escape my escape characters (why am I reminded of
Emacs regexps?), so '\\\\' really does yield a single backslash in a
bytea. The output routine was simply re-escaping things, but lo,
octet_length() tells the truth! *cry*
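The "output routine re-escaping" effect can be shown with a rough model of the bytea output side. The model assumes that output doubles backslashes and writes non-printable octets as \ooo, which only approximates the 7.0-era behavior; bytea_out is a hypothetical name. Under that model, a one-octet value holding a backslash prints as two characters, which is why octet_length() is the more reliable witness.

```python
# A rough model (an assumption, not the actual backend code) of bytea output:
# backslashes are shown doubled and non-printable octets as \ooo.


def bytea_out(data: bytes) -> str:
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\\\\")  # one backslash octet is displayed as \\
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)  # e.g. NUL is displayed as \000
    return "".join(out)


value = b"\\"  # what '\\\\'::bytea stores: a single octet
print(bytea_out(value), len(value))  # prints: \\ 1
```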

Ok, good.  I'm a bit concerned by backend/commands/trigger.c using
byteain() to do argument parsing, but that's not my problem right now.

--Jered