Thread: Use PSQLFS for photo storage

Use PSQLFS for photo storage

From
Jason Long
Date:
I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
to store 100 GB of images in PostgreSQL.

Once they are in there I can deal with them.  My main purpose is to use
rsync to get the files into the database.

Is there a better way to load 20,000 plus files reliably into Postgres?

Re: Use PSQLFS for photo storage

From
Alan Hodgson
Date:
On Tuesday 13 January 2009, Jason Long <mailing.list@supernovasoftware.com>
wrote:
> I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
> to store 100 GB of images in PostgreSQL.
>
> Once they are in there I can deal with them.  My main purpose is to use
> rsync to get the files into the database.
>
> Is there a better way to load 20,000 plus files reliably into Postgres?

A perl script using either bytea fields or the lo_ interface via DBD::Pg
would work well.

--
Alan

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
bytea was what I was going for.

Does anyone have a script they would share for this purpose?

If not I will probably use Java because this is what I am familiar with.

The web app I will write for managing my photos will be written in Java.  I want to be able to categorize, label, search,.... my photos.

Alan Hodgson wrote:
On Tuesday 13 January 2009, Jason Long <mailing.list@supernovasoftware.com> 
wrote: 
I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
to store 100 GB of images in PostgreSQL.

Once they are in there I can deal with them.  My main purpose is to use
rsync to get the files into the database.

Is there a better way to load 20,000 plus files reliably into Postgres?   
A perl script using either bytea fields or the lo_ interface via DBD::Pg 
would work well.
 

Re: Use PSQLFS for photo storage

From
Steve Atkins
Date:
On Jan 13, 2009, at 10:34 AM, Jason Long wrote:

> I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
> to store 100 GB of images in PostgreSQL.
>
> Once they are in there I can deal with them.  My main purpose is to
> use rsync to get the files into the database.
>
> Is there a better way to load 20,000 plus files reliably into
> Postgres?

If it's a filesystem then you'd get the files into the system
by copying them there. You wouldn't want to touch the
database manually (that'd be like touching the raw disk
device on a real filesystem).

Conversely, it's just a filesystem. There's not really any use
to putting a filesystem on top of a database on top of a filesystem
other than the (significant) hack value.

In other words, you probably don't really want to do this.

Cheers,
   Steve




Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Steve Atkins wrote:
>
> On Jan 13, 2009, at 10:34 AM, Jason Long wrote:
>
>> I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
>> to store 100 GB of images in PostgreSQL.
>>
>> Once they are in there I can deal with them.  My main purpose is to
>> use rsync to get the files into the database.
>>
>> Is there a better way to load 20,000 plus files reliably into Postgres?
>
> If it's a filesystem then you'd get the files into the system
> by copying them there. You wouldn't want to touch the
> database manually (that'd be like touching the raw disk
> device on a real filesystem).
>
> Conversely, it's just a filesystem. There's not really any use
> to putting a filesystem on top of a database on top of a filesystem
> other than the (significant) hack value.
>
> In other words, you probably don't really want to do this.
>
> Cheers,
>   Steve
>
>
>
I just want an easy way to load the files into the DB and their original
path they were loaded from.

Is possible through SQL to load a file into a bytea column?







Re: Use PSQLFS for photo storage

From
Sam Mason
Date:
On Tue, Jan 13, 2009 at 03:28:18PM -0600, Jason Long wrote:
> Steve Atkins wrote:
> >On Jan 13, 2009, at 10:34 AM, Jason Long wrote:
> >>I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
> >>to store 100 GB of images in PostgreSQL.
> >>
> >>Is there a better way to load 20,000 plus files reliably into Postgres?

That would imply that they're around 5MB on average?  If they're all
under, say, 20MB (or maybe even much more) you should be able to handle
it by doing the most naive things possible.

> I just want an easy way to load the files into the DB and their original
> path they were loaded from.
>
> Is possible through SQL to load a file into a bytea column?

You'd need to generate the SQL somehow; if you know python it's probably
a pretty easy 20 or 30 lines of code to get this working. psycopg seems
to be the recommend way of accessing PG with python and you basically
want to be doing something like:

  import psycopg2;
  filename = "myimage.jpeg"
  conn = psycopg2.connect("");
  conn.cursor().execute(
    "INSERT INTO pictures (filename,data) VALUES (%s,%s);",
    [filename,psycopg2.Binary(open(filename,"rb").read())]);
  conn.commit();

This seems to do the right thing for me, and obviously needs to be put
into a loop of some sort.  But it'll hopefully get you started.


  Sam

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Sam Mason wrote:
On Tue, Jan 13, 2009 at 03:28:18PM -0600, Jason Long wrote: 
Steve Atkins wrote:   
On Jan 13, 2009, at 10:34 AM, Jason Long wrote:     
I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
to store 100 GB of images in PostgreSQL.

Is there a better way to load 20,000 plus files reliably into Postgres?       
That would imply that they're around 5MB on average?  If they're all
under, say, 20MB (or maybe even much more) you should be able to handle
it by doing the most naive things possible.
 
This is correct.  They are all around 5 MB.
I just want an easy way to load the files into the DB and their original 
path they were loaded from.

Is possible through SQL to load a file into a bytea column?   
You'd need to generate the SQL somehow; if you know python it's probably
a pretty easy 20 or 30 lines of code to get this working. psycopg seems
to be the recommend way of accessing PG with python and you basically
want to be doing something like:
 import psycopg2; filename = "myimage.jpeg" conn = psycopg2.connect(""); conn.cursor().execute(   "INSERT INTO pictures (filename,data) VALUES (%s,%s);",    [filename,psycopg2.Binary(open(filename,"rb").read())]); conn.commit();

This seems to do the right thing for me, and obviously needs to be put
into a loop of some sort.  But it'll hopefully get you started.

 Sam 
Never used Python or Perl.  I use primarily Java.  I was thinking of doing something like
INSERT INTO pictures (filename,data) VALUES ('filename','/path/to/my/image/img0009.jpg');

But, this syntax doesn't seem to be supported.

Maybe I can use a custom C function to get the contents of the file.  Then do something like

INSERT INTO pictures (filename,data) VALUES ('/path/to/my/image/img0009.jpg',getBinaryFileContents('/path/to/my/image/img0009.jpg'));

Is there some postgres contrib for something like this?




Re: Use PSQLFS for photo storage

From
Sam Mason
Date:
On Tue, Jan 13, 2009 at 06:22:34PM -0600, Jason Long wrote:
> Sam Mason wrote:
> >You'd need to generate the SQL somehow; if you know python it's probably
> >a pretty easy 20 or 30 lines of code to get this working.

> *Never used Python or Perl.  I use primarily Java.  I was thinking of
> doing something like
>
> *INSERT INTO pictures (filename,data) VALUES
> ('filename','/path/to/my/image/img0009.jpg');

If you're OK with using large objects, instead of "bytea"s, you can use
the lo_import function.  You'd do something like:

  CREATE TABLE pics (
    path TEXT PRIMARY KEY,
    data OID
  );

  INSERT INTO pics (path,data)
    SELECT path, lo_import(path)
    FROM (VALUES ('/path/to/my/image/img0009.jpg')) x(path);

This assumes that the files are accessable to the database server (i.e.
the paths are relative to the server daemon, not the psql command line
or whatever JDBC app you're driving this from).

Does that help any more?


  Sam

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Sam Mason wrote:
On Tue, Jan 13, 2009 at 06:22:34PM -0600, Jason Long wrote: 
Sam Mason wrote:   
You'd need to generate the SQL somehow; if you know python it's probably
a pretty easy 20 or 30 lines of code to get this working.     
 
*Never used Python or Perl.  I use primarily Java.  I was thinking of 
doing something like

*INSERT INTO pictures (filename,data) VALUES 
('filename','/path/to/my/image/img0009.jpg');   
If you're OK with using large objects, instead of "bytea"s, you can use
the lo_import function.  You'd do something like:
 CREATE TABLE pics (   path TEXT PRIMARY KEY,   data OID );
 INSERT INTO pics (path,data)   SELECT path, lo_import(path)   FROM (VALUES ('/path/to/my/image/img0009.jpg')) x(path);

This assumes that the files are accessable to the database server (i.e.
the paths are relative to the server daemon, not the psql command line
or whatever JDBC app you're driving this from).

Does that help any more?

 Sam
 
They are on the server.  I would rather use bytea.  Is it possible to import them as large objects and then use SQL to convert them to bytea?

Re: Use PSQLFS for photo storage

From
Sam Mason
Date:
On Tue, Jan 13, 2009 at 06:43:06PM -0600, Jason Long wrote:
> Sam Mason wrote:
> >If you're OK with using large objects, instead of "bytea"s, you can use
> >the lo_import function.
>
> *They are on the server.  I would rather use bytea.  Is it possible to
> import them as large objects and then use SQL to convert them to bytea?*

You can, but it's pretty backward.  If you Java you'd probably be better
off using it to slurp in the file and do the insert there.  If you
really want to use large objects to go to byteas have a look through the
client side API[1] and then at the functions available to call from SQL
by doing \df from psql as they mirror the client calls very closely.

Be aware that you'll be creating a lot more "garbage" that PG will have
to clean out, so importing many gigabytes of data this way will take
much longer.


  Sam

Re: Use PSQLFS for photo storage

From
Sam Mason
Date:
On Wed, Jan 14, 2009 at 12:56:42AM +0000, Sam Mason wrote:
> If you Java you'd probably be better off using it

Hum, it's getting late.  That should be "If you *know* Java"!  Bed time
for me I think!


  Sam

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Sam Mason wrote:
On Wed, Jan 14, 2009 at 12:56:42AM +0000, Sam Mason wrote: 
If you Java you'd probably be better off using it   
Hum, it's getting late.  That should be "If you *know* Java"!  Bed time
for me I think!

 Sam
 
Thanks for the advice.  I will probably go with Java.  In the inventory system I developed I am already storing documents via bytea.  I will probably do the same with these images.  I will be dealing with them via hibernate so I guess I should import them the same way.

I was hoping for a already developed function that could import an entire directory structure recursively.  I can do this easily enough in Java.

Re: Use PSQLFS for photo storage

From
Reid Thompson
Date:
On Tue, 2009-01-13 at 18:22 -0600, Jason Long wrote:
> Never used Python or Perl.  I use primarily Java.  I was thinking of
> doing something like
> INSERT INTO pictures (filename,data) VALUES ('filename','/path/to/my/image/img0009.jpg');
>
> But, this syntax doesn't seem to be supported.
>
> Maybe I can use a custom C function to get the contents of the file.  Then do something like
>
> INSERT INTO pictures (filename,data) VALUES
('/path/to/my/image/img0009.jpg',getBinaryFileContents('/path/to/my/image/img0009.jpg'));
>
> Is there some postgres contrib for something like this?
>
>
>
>
Simple java stub test program attached.  Modify to meet your needs.  See
file header for URL of original example.

Attachment

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Reid Thompson wrote:
On Tue, 2009-01-13 at 18:22 -0600, Jason Long wrote: 
Never used Python or Perl.  I use primarily Java.  I was thinking of
doing something like
INSERT INTO pictures (filename,data) VALUES ('filename','/path/to/my/image/img0009.jpg');

But, this syntax doesn't seem to be supported.

Maybe I can use a custom C function to get the contents of the file.  Then do something like

INSERT INTO pictures (filename,data) VALUES ('/path/to/my/image/img0009.jpg',getBinaryFileContents('/path/to/my/image/img0009.jpg'));

Is there some postgres contrib for something like this?



   
Simple java stub test program attached.  Modify to meet your needs.  See
file header for URL of original example. 
Thanks for the sample.  I will be using Hibernate with will make my code much cleaner.  I will post an example here when done.  I never use straight JDBC.

Re: Use PSQLFS for photo storage

From
Jason Long
Date:
Steven Lembark wrote:
I would like to use PSQLFS(http://www.edlsystems.com/psqlfs/)
to store 100 GB of images in PostgreSQL.

Once they are in there I can deal with them.  My main purpose is to use
rsync to get the files into the database.

Is there a better way to load 20,000 plus files reliably into Postgres?   
Don't: put them into a reasonably-organized filesystem
and store the paths. 
This is exactly what I do not want to do.  I do not want to deal with the file system at all except for initial import.  I want to manage the photos entirely with SQL.
I use dir names of date-venue-descr
and basenames of the dir-image number (e.g.,
"2009.01.12-foo-bar/123-4567.nef"). You'll probably find
that moving the data into a viewer using a path will be
simpler than pumping the raw data through postgres
anyway. 
Please explain.  What does moving the data into a viewer using a path mean.
At that point you could create COPY statements that look
like a pg_dump output to load the stuff the first time.
 
Please elaborate.