Thread: byteain() doesn't parse correctly
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================

Your name               : Jered Floyd
Your email address      : jered@permabit.com

System Configuration
---------------------
  Architecture (example: Intel Pentium)         : x86
  Operating System (example: Linux 2.0.26 ELF)  : Linux 2.2.17
  PostgreSQL version (example: PostgreSQL-7.0)  : PostgreSQL-7.0.3
  Compiler used (example: gcc 2.8.0)            : gcc 2.95.2

Please enter a FULL description of your problem:
------------------------------------------------

byteain() in backend/utils/adt/varlena.c is just wrong. It can't parse
'\\', claiming 'Bad input string for type bytea'. No, really.

More curious is that it can't handle '\134' either, implying that
multiple levels of parsing are going on.

But, it *can* parse '\\\\' as \\. This boggles the mind.

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

SELECT '\\'::bytea;
SELECT '\134'::bytea;
SELECT '\\\\'::bytea;

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

This routine is cherry fondue; extremely nasty, but we can't prosecute
for that. It needs a good rewrite, and an audit of the parse chain in
general wouldn't hurt.

As with the previous 2 bugs, I suspect this is something I'll just do
when I port my 7.0.3 routines for making BYTEAs more first-class
citizens to 7.1. I'd be tickled if you beat me to it, though.
Jered Floyd <jered@permabit.com> writes:
> More curious is that it can't handle '\134' either, implying that
> multiple levels of parsing are going on.

You're right, there are multiple levels of parsing going on. The
string-literal parser gets first crack before the type-specific
input converter does. If you don't allow for that when counting
backslashes etc, you'll get confused for sure.

regards, tom lane
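
[Illustration] The two passes described above can be modelled with a short
Python sketch. This is a simplified model under stated assumptions, not the
PostgreSQL source: the names lex_string_literal and bytea_in are hypothetical,
and both passes are assumed to understand only a doubled backslash and a
three-digit octal escape.

```python
OCTAL = "01234567"

def lex_string_literal(body):
    """First pass: simplified model of the SQL string-literal parser.

    `body` is the text between the single quotes. Assumption: only
    backslash + 3 octal digits and backslash + any char are handled.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            digits = body[i + 1:i + 4]
            if len(digits) == 3 and all(d in OCTAL for d in digits):
                out.append(chr(int(digits, 8)))  # \ooo -> one character
                i += 4
            else:
                out.append(body[i + 1])          # \x -> x (so \\ -> \)
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def bytea_in(text):
    """Second pass: simplified model of the bytea input converter."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(ord("\\"))                # \\ -> one backslash byte
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in OCTAL for d in digits):
                raise ValueError("Bad input string for type bytea")
            out.append(int(digits, 8))           # \ooo -> one byte
            i += 4
    return bytes(out)

# The three statements from the report, written as raw literal bodies.
for body in (r"\\", r"\134", r"\\\\"):
    try:
        print(body, "->", bytea_in(lex_string_literal(body)))
    except ValueError as err:
        print(body, "->", err)
```

In this model both '\\' and '\134' reach the second pass as a lone backslash,
which it rejects, while '\\\\' reaches it as two backslashes and decodes to a
single backslash byte.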
Jered Floyd <jered@permabit.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> You're right, there are multiple levels of parsing going on. The
>> string-literal parser gets first crack before the type-specific
>> input converter does. If you don't allow for that when counting
>> backslashes etc, you'll get confused for sure.

> Argh. This is really bad. This means, for example, that I can't have
> NULs in my bytea, which was the whole reason I was using bytea to
> begin with. Actually, maybe not.

Sure you can. You just have to write them as \000, which actually
will be written \\000 to get through the string-literal parser.

It's not a real *convenient* notation, I agree, but it works. There
has been talk of providing alternate paths, such as functions that
would convert bytea to and from other textual representations like
base64. Nothing's been done yet though.

regards, tom lane
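
[Illustration] On the client side, the \\000 notation above can be generated
mechanically. The Python helper below is a minimal sketch; the name
bytea_literal_body is hypothetical, and it assumes the two-pass behaviour
described in this thread. It takes the conservative route of writing every
byte as a doubled backslash plus three octal digits, so quotes, backslashes
and NULs need no special cases.

```python
def bytea_literal_body(data: bytes) -> str:
    """Return the text to place between single quotes in a SQL statement.

    Each byte becomes \\\\ooo in Python source, i.e. two backslashes and
    three octal digits in the output. The string-literal parser is assumed
    to collapse the doubled backslash, leaving \\ooo (one backslash plus
    digits) for the bytea input converter to decode.
    """
    return "".join("\\\\%03o" % b for b in data)

# A letter, a NUL byte and a backslash.
body = bytea_literal_body(b"a\x00\\")
print("SELECT '%s'::bytea;" % body)
```

Escaping only the bytes that need it would give shorter literals; escaping
everything trades size for simplicity.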
Tom Lane <tgl@sss.pgh.pa.us> writes:
> You're right, there are multiple levels of parsing going on. The
> string-literal parser gets first crack before the type-specific
> input converter does. If you don't allow for that when counting
> backslashes etc, you'll get confused for sure.

Argh. This is really bad. This means, for example, that I can't have
NULs in my bytea, which was the whole reason I was using bytea to
begin with.

Actually, maybe not. I slightly misevaluated the way in which things
are broken before. It *does* work if I escape my escape characters
(why am I reminded of Emacs regexps?), so '\\\\' really does yield a
single backslash in a bytea. The output routine was simply re-escaping
things, but lo, octet_length() tells the truth! *cry*

Ok, good. I'm a bit concerned by backend/commands/trigger.c using
byteain() to do argument parsing, but that's not my problem right now.

--Jered