Some encoding trouble via libpq - Mailing list pgsql-general

From Billy Gray
Subject Some encoding trouble via libpq
Date
Msg-id 1175089113.167572.176700@l77g2000hsb.googlegroups.com
Responses Re: Some encoding trouble via libpq  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Friends,

I did a little research into the archives of this list for my
particular problem, and while I haven't found the solution, I'm
thinking that maybe I'm approaching it wrong.  If anyone has any
advice, it'd be much appreciated.

On the one hand I have a database in postgres created WITH
ENCODING='UTF8'.  On the other hand I have this C program written with
libpq-fe.h that takes data over standard input, does some checking,
and then inserts it into a table in the aforementioned database.

The trick there is reading from stdin safely (input can be of variable
size), especially since this data will mostly be arriving over a pipe
transport in exim!  Previously I was reading sloppily with getc, which
worked just fine with postgres, but now I'm using fread.  The oddest
thing is that the program does just what it should when run on Mac OS X
against postgres 8.2.3, also on Mac OS X.  It's when I run it on
CentOS 4 that I get this error:

ERROR:  invalid byte sequence for encoding "UTF8": 0xc0f5
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

Now that error message is very literal, and I've been trying to take
it at face value.  Yet, the data going in over stdin is ASCII, which
shouldn't need any conversion to UTF8!  The same data worked just fine
on CentOS before I put together this new fread routine.  And yet, that
routine works just fine on Mac OS X!  Weird!
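
One way to test the "input is ASCII" assumption is to scan the buffer
the program is about to send, rather than the data as it was generated.
The sketch below is only an illustration: first_non_ascii and
report_non_ascii are hypothetical helper names, not part of the program
in this post or of libpq.  Any byte above 0x7F (such as the 0xc0 in the
error above) would show up here before the query runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return the index of the first byte outside
 * 7-bit ASCII in buf[0..len), or -1 if every byte is plain ASCII. */
long first_non_ascii(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        /* Cast first: plain char may be signed on this platform. */
        if ((unsigned char) buf[i] > 0x7F)
            return (long) i;
    }
    return -1;
}

/* Hypothetical helper: log the offending byte, if any, to stderr. */
void report_non_ascii(const char *buf, size_t len)
{
    long pos = first_non_ascii(buf, len);
    if (pos >= 0)
        fprintf(stderr, "non-ASCII byte 0x%02x at offset %ld\n",
                (unsigned char) buf[pos], pos);
}
```

Calling report_non_ascii(message, strlen(message)) just before the
insert would show whether the stray bytes are already in the buffer or
appear somewhere later.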

We have tried setting certain env vars (export LC_ALL=en_US.utf-8,
export PGCLIENTENCODING=utf8) to force a client encoding on the client
side, and we get this slightly different error in that instance:

ERROR:  invalid byte sequence for encoding "UTF8": 0xb0
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

Without further ado, this is the routine used to read off of stdin,
and below that is the snippet of code that does the statement execution
into postgres:

char *
readinput()
{
    char *buffer = (char *) xmalloc (STDIN_BLOCK); // xmalloc is really malloc
    int offset = 0;
    int read = 1;
    int size = STDIN_BLOCK;

    while ( (read > 0) && (offset <= STDIN_MAX) )
    {
        syslog (LOG_DEBUG, "Reading a block...");
        read = fread (buffer + offset, 1, STDIN_BLOCK, stdin);
        offset += read;
        if (read == STDIN_BLOCK)
        {
            size += STDIN_BLOCK;
            buffer = xrealloc (buffer, size);
        }
    } // while

    // null terminate the string...
    memset(buffer + offset + 1, '\0', 1);

    syslog (LOG_DEBUG, "Read message of %d bytes", offset);

    fprintf (stderr, "Contents of the buffer:\n%s\n\n", buffer);

    return buffer;
}
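
For comparison, here is a minimal sketch of a read loop that avoids
three things in the routine above which could leave uninitialized bytes
in the string: the terminator is written at offset + 1, so the byte at
offset is never set; the buffer only grows after a full block has
already been read; and there is no spare byte reserved for the
terminator.  Whether this is the cause of the error is only a hunch.
The names read_all, BLOCK_SIZE and MAX_SIZE are hypothetical stand-ins
(the real values of STDIN_BLOCK and STDIN_MAX are not shown in this
post), and plain malloc/realloc are used in place of the xmalloc
wrappers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096          /* hypothetical stand-in for STDIN_BLOCK */
#define MAX_SIZE   (1024 * 1024) /* hypothetical stand-in for STDIN_MAX */

/* Read up to MAX_SIZE bytes from `in` into a NUL-terminated heap buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 * If out_len is not NULL it receives the number of bytes read. */
char *read_all(FILE *in, size_t *out_len)
{
    assert(in != NULL);

    size_t size = BLOCK_SIZE;
    size_t offset = 0;
    char *buffer = malloc(size + 1);    /* +1 leaves room for the NUL */
    if (buffer == NULL)
        return NULL;

    while (offset < MAX_SIZE) {
        /* Grow before reading so a full block always fits. */
        if (size - offset < BLOCK_SIZE) {
            size_t new_size = size + BLOCK_SIZE;
            char *tmp = realloc(buffer, new_size + 1);
            if (tmp == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            size = new_size;
        }

        size_t n = fread(buffer + offset, 1, BLOCK_SIZE, in);
        offset += n;
        if (n < BLOCK_SIZE)
            break;                      /* EOF or read error */
    }

    buffer[offset] = '\0';  /* terminate at offset, not offset + 1 */
    if (out_len != NULL)
        *out_len = offset;
    return buffer;
}
```

With this shape, message = read_all(stdin, NULL) would hand
PQexecParams a string whose last real byte is immediately followed by
the terminator.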

Later on in my program, I set this buffer to one of my input
parameters and run an insert query:

message = readinput();

    paramValues[0] = ping_id;
    paramValues[1] = event_id;
    paramValues[2] = message; // get from std input! how do we do that again???
    result = PQexecParams(conn,
                            "INSERT INTO event_changes (ping_id, event_id, created_at, "
                            "message) VALUES ($1, $2, NOW(), $3)",
                            3,
                            NULL, // backend figures out type itself
                            paramValues,
                            NULL, // apparently we don't need param lengths
                            NULL, // all text params
                            0 // we don't want binary results, no
                        );

I'm at a bit of a loss as to what I ought to try next, so if anybody
has any hunches, they're much appreciated!  One really interesting
thing: I've recreated my database with encoding SQL_ASCII, which
should make it ignorant of the encoding of the data coming in.  In
that instance, I get a weird variation on the error listed above:

ERROR:  invalid input syntax for integer: "????^?"
STATEMENT:  INSERT INTO event_changes (ping_id, event_id, created_at,
message) VALUES ($1, $2, NOW(), $3)
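
That last message means the text sent for one of the integer
parameters could not be parsed as an integer, so it may be worth
checking ping_id and event_id themselves, not just the message.  The
following is a rough sketch of such a check; looks_like_integer is a
hypothetical helper, and it follows strtol's rules and the range of a C
long, which may be wider than the server's integer type.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: return 1 if s parses completely as a base-10
 * long under strtol rules (optional leading whitespace and sign),
 * 0 if it is empty, out of range, or has trailing characters. */
int looks_like_integer(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;

    char *end = NULL;
    errno = 0;
    (void) strtol(s, &end, 10);
    if (errno == ERANGE)
        return 0;
    /* end == s means no digits were consumed at all. */
    return end != s && *end == '\0';
}
```

Logging a warning when looks_like_integer(paramValues[0]) or
looks_like_integer(paramValues[1]) returns 0 would show whether those
two strings are already damaged before PQexecParams is called.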

Thanks,
Billy

