Re: fast read of binary data - Mailing list pgsql-performance

From:           Eildert Groeneveld
Subject:        Re: fast read of binary data
Date:
Msg-id:         1353570844.30293.8.camel@eno.tzv.local
In response to: Re: fast read of binary data ("Albe Laurenz" <laurenz.albe@wien.gv.at>)
List:           pgsql-performance
On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing a compressed binary storage scheme for
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> >
> > Our current implementation uses the data type bit varying.
> >
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer them unaltered to the client, which will do
> > something (uncompressing) with them. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to-and-fro
> > conversions.
> >
> > Our current implementation uses Perl DBI; we can retrieve the data OK,
> > but apparently there is some converting going on.
> >
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C library) for such transfers. However, all sorts of funny things happen
> > here which look like conversion issues.
> >
> > In old-fashioned network databases some decades ago (in pre-SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can be transferred UNALTERED between
> > backend and clients.
>
> Using the C API you can specify binary mode for your data, which
> means that they won't be converted.
>
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary cursors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html),
> but I don't know if DBI or ODBC handles them well.
>
> If you can avoid DBI or ODBC, that would be best.
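[Editorial aside, not part of the original thread: it may help to see concretely what text mode costs for this data type. In text mode each bit of a bit varying value travels as one ASCII byte ('0' or '1'), an eightfold inflation plus a conversion on both ends. A minimal client-side sketch that packs such a text representation back into a bit stream; the helper name is hypothetical:]

```c
#include <stddef.h>
#include <string.h>

/* Illustrative helper (not from the thread): pack an ASCII string of
 * '0'/'1' characters, as returned by a text-mode query on a bit varying
 * column, into packed bytes, most significant bit first.  out must have
 * room for (nbits + 7) / 8 bytes; that byte count is returned. */
static size_t pack_ascii_bits(const char *text, size_t nbits,
                              unsigned char *out)
{
    size_t nbytes = (nbits + 7) / 8;

    memset(out, 0, nbytes);
    for (size_t i = 0; i < nbits; i++)
        if (text[i] == '1')
            out[i / 8] |= (unsigned char)(0x80 >> (i % 8));
    return nbytes;
}
```

This is of course exactly the to-and-fro conversion the poster wants to avoid; it only illustrates the overhead that binary mode eliminates.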
OK, I did have a look at the libpq library, and you are right: there is a
way to obtain binary data from the backend through PQexecParams:

    res = PQexecParams(conn,
                       "DECLARE myportal CURSOR FOR "
                       "select genotype_bits from v_genotype_data",
                       0,           /* zero params */
                       NULL,        /* let the backend deduce param type */
                       paramValues,
                       NULL,        /* don't need param lengths since text */
                       NULL,        /* default to all text params */
                       1);          /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:

    for (i = 0; i < PQntuples(res); i++)
    {
        for (j = 0; j < nFields; j++)
            fwrite(PQgetvalue(res, i, j), 100000, 1, f);
    }

it is clear that the results are NOT in binary format:

    eg(eno,snp): od -b junk | head
    0000000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060

Clearly, these are nice 0s and 1s in ASCII, and not, as I need it, a bit
stream. Also (and in line with this), PQgetvalue(res, i, j) seems to be of
type text.

What am I missing?
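[Editorial aside, not part of the original thread: two things stand out in the snippet above. First, the resultFormat flag passed to PQexecParams applies to the statement being executed, and DECLARE itself returns no rows; it is the subsequent FETCH that must be executed with resultFormat = 1 (or the cursor declared as a BINARY CURSOR) for the data to arrive in binary. Second, fwrite with a fixed length of 100000 ignores the actual field size; PQgetlength(res, i, j) reports it. In binary mode a bit varying value arrives, per varbit's send function, as a 4-byte big-endian bit count followed by the packed bit bytes. A sketch of a decoder for that layout, under that assumption about the wire format:]

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch (assumes varbit's binary send format): parse a buffer holding
 * a binary-mode bit varying value -- a 4-byte big-endian bit count
 * followed by the packed bits, most significant bit first.  On success
 * returns the bit count and points *bits at the packed payload inside
 * buf; returns -1 if buf is too short to be well formed. */
static long varbit_decode(const unsigned char *buf, size_t len,
                          const unsigned char **bits)
{
    if (len < 4)
        return -1;

    uint32_t nbits = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                     ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    size_t nbytes = (nbits + 7) / 8;

    if (len - 4 < nbytes)
        return -1;
    *bits = buf + 4;
    return (long)nbits;
}
```

For a raw pass-through of the packed bits, the write loop above would then become something like fwrite(bits, (nbits + 7) / 8, 1, f) after decoding, with the value obtained via PQgetvalue and its true length via PQgetlength.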