Re: fast read of binary data - Mailing list pgsql-performance

From Eildert Groeneveld
Subject Re: fast read of binary data
Date
Msg-id 1353570844.30293.8.camel@eno.tzv.local
Whole thread Raw
In response to Re: fast read of binary data  ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
Responses Re: fast read of binary data
List pgsql-performance
On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing using a compressed binary storage scheme
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> >
> > Our current implementation uses the data type bit varying.
> >
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer it unaltered to the client which will do
> > something (uncompressing) with it. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to and fro
> > conversions.
> >
> > Our current implementation uses Perl DBI; we can retrieve the data ok,
> > but apparently there is some converting going on.
> >
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C-library)  for such transfers. However, all sorts funny things happen
> > here which look like conversion issues.
> >
> > In old fashioned network database some decade ago (in pre SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can between transferred UNALTERED between
> > backend and clients.
>
> Using the C API you can specify binary mode for your data, which
> meand that they won't be converted.
>
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary corsors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html),
> but I don't know if DBI or ODBC handles them well.
>
> If you can avoid DBI or ODBC, that would be best.
ok, I did have a look at the libpq librar, and you are right, there is a
way to obtain binary data from the backend through the  PQexecParams

    res = PQexecParams(conn,
                       "DECLARE myportal CURSOR FOR select genotype_bits
from v_genotype_data",
                       0,       /* zero param */
                       NULL,    /* let the backend deduce param type */
                       paramValues,
                       NULL,    /* don't need param lengths since text*/
                       NULL,    /* default to all text params */
                       1);      /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:
    for (i = 0; i < PQntuples(res); i++)
    {
        for (j = 0; j < nFields; j++)
      fwrite(PQgetvalue(res, i, j),100000,1,f);
    }

it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
0000000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060

clearly, these are nice 0 and 1 in ASCII and not as I need it as a bit
stream.

Also, (and in line with this) PQgetvalue(res, i, j) seems to be of type
text.

What am I missing?






pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Hints (was Poor performance using CTE)
Next
From: Heikki Linnakangas
Date:
Subject: Re: Hints (was Poor performance using CTE)