Thread: Scanning a large binary field

Scanning a large binary field

From: Kynn Jones
I have a C program that reads a large binary file and uses the information it reads, plus some user-supplied arguments, to generate an in-memory data structure that is used during the remainder of the program's execution.  I would like to adapt this code so that it gets the original binary data from a Pg database rather than from a file.

One very nice feature of the original scheme is that the reading of the original file was done piecemeal, so that the full content of the file (which is about 0.2GB) was never in memory all at once, which kept the program's memory footprint nice and small.
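
For concreteness, the reading loop follows this general pattern (a simplified sketch, not the actual code):

  /* simplified sketch of the existing file-based scan */
  #include <stdio.h>

  #define CHUNK 8192

  static int scan_file(const char *path)
  {
      FILE *f = fopen(path, "rb");
      char buf[CHUNK];
      size_t n;

      if (f == NULL)
          return -1;

      /* process CHUNK bytes at a time so the full 0.2GB is never
         resident in memory at once */
      while ((n = fread(buf, 1, CHUNK, f)) > 0) {
          /* fold buf[0..n) into the in-memory data structure */
      }

      fclose(f);
      return 0;
  }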

Is there any way to replicate this small memory footprint if the program reads the binary data from a Pg DB instead of from a file?

FWIW, my OS is Linux.

TIA!

Kynn

Re: Scanning a large binary field

From: John R Pierce
Kynn Jones wrote:
> I have a C program that reads a large binary file and uses the
> information it reads, plus some user-supplied arguments, to generate
> an in-memory data structure that is used during the remainder of the
> program's execution.  I would like to adapt this code so that it gets
> the original binary data from a Pg database rather than from a file.
>
> One very nice feature of the original scheme is that the reading of
> the original file was done piecemeal, so that the full content of the
> file (which is about 0.2GB) was never in memory all at once, which
> kept the program's memory footprint nice and small.
>
> Is there any way to replicate this small memory footprint if the
> program reads the binary data from a Pg DB instead of from a file?

is this binary data in any way record- or table-structured, such that it
could be stored as multiple rows and perhaps fields?  if not, why would
you want to put a 200MB blob of amorphous data into a relational
database?



Re: Scanning a large binary field

From: Kynn Jones


On Sun, Mar 15, 2009 at 5:06 PM, John R Pierce <pierce@hogranch.com> wrote:
> Kynn Jones wrote:
>> I have a C program that reads a large binary file and uses the information it reads, plus some user-supplied arguments, to generate an in-memory data structure that is used during the remainder of the program's execution.  I would like to adapt this code so that it gets the original binary data from a Pg database rather than from a file.
>>
>> One very nice feature of the original scheme is that the reading of the original file was done piecemeal, so that the full content of the file (which is about 0.2GB) was never in memory all at once, which kept the program's memory footprint nice and small.
>>
>> Is there any way to replicate this small memory footprint if the program reads the binary data from a Pg DB instead of from a file?
>
> is this binary data in any way record- or table-structured, such that it could be stored as multiple rows and perhaps fields?  if not, why would you want to put a 200MB blob of amorphous data into a relational database?

That's a fair question.  The program in question already gets most of the external data it needs from the relational database.  The only exception is these large amorphous blobs, as you describe them.  My only reason for wanting to put the blobs in the DB as well is to consolidate all the program's external data sources.

Kynn

Re: Scanning a large binary field

From: John R Pierce
Kynn Jones wrote:
> That's a fair question.  The program in question already gets from the
> relational database most of the external data it needs.  The only
> exception to this is these large amorphous blobs, as you describe
> them.  My only reason for wanting to put the blobs in the DB as well
> is to consolidate all the external data sources for the program.

well, look at the LO (large object) facility of postgres.  this is
available to apps that call libpq directly; I have no idea whether any
of the generic 'portable' APIs have such hooks.
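
in libpq terms the read loop would look something along these lines
(untested sketch; the chunk size and the OID are placeholders you'd
fill in from your own schema):

  /* read a large object in fixed-size chunks so that, as with the
     file version, at most CHUNK bytes are in memory at once */
  #include <stdio.h>
  #include <libpq-fe.h>
  #include <libpq/libpq-fs.h>     /* for INV_READ */

  #define CHUNK 8192

  int scan_large_object(PGconn *conn, Oid lobj)
  {
      char buf[CHUNK];
      int fd, n;

      /* large-object operations must run inside a transaction */
      PQclear(PQexec(conn, "BEGIN"));

      fd = lo_open(conn, lobj, INV_READ);
      if (fd < 0) {
          fprintf(stderr, "lo_open: %s", PQerrorMessage(conn));
          return -1;
      }

      while ((n = lo_read(conn, fd, buf, CHUNK)) > 0) {
          /* feed buf[0..n) to the existing parsing code here */
      }

      lo_close(conn, fd);
      PQclear(PQexec(conn, "COMMIT"));
      return 0;
  }

lo_lseek() is also available if you need the equivalent of fseek().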



Re: Scanning a large binary field

From: Sam Mason
On Sun, Mar 15, 2009 at 02:25:11PM -0700, John R Pierce wrote:
> Kynn Jones wrote:
> >That's a fair question.  The program in question already gets from the
> >relational database most of the external data it needs.  The only
> >exception to this is these large amorphous blobs, as you describe
> >them.  My only reason for wanting to put the blobs in the DB as well
> >is to consolidate all the external data sources for the program.
>
> well, look at the LO (large object) facility of postgres.   this is
> available to apps that call libpq directly, I have no idea if any of the
> generic 'portable' APIs would have any such hooks.

They are all exposed as normal SQL functions that can be used as needed;
I've just gone through the pain of getting VB to talk to them, which
included writing base64 encoding and decoding routines because VB didn't
seem very reliable at handling non-ASCII characters.

The C library interface is documented here:

  http://www.postgresql.org/docs/current/static/lo-interfaces.html

and the SQL-level variants are named similarly (sometimes without an
underscore in the name) and have identical semantics.
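
As a rough illustration at the SQL level (the OID here is a placeholder,
and 262144 is the value of the INV_READ flag):

  BEGIN;
  -- open large object 12345 (placeholder OID) for reading; 262144 = INV_READ
  SELECT lo_open(12345, 262144);
  -- read the next 8192 bytes from the descriptor lo_open returned (e.g. 0)
  SELECT loread(0, 8192);
  -- ... repeat loread() until it returns an empty bytea ...
  SELECT lo_close(0);
  COMMIT;

loread() returns a bytea, so you can keep calling it with a fixed length
until it comes back empty, which is what keeps the client-side memory
use bounded.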

--
  Sam  http://samason.me.uk/