Thread: Use PSQLFS for photo storage
I would like to use PSQLFS (http://www.edlsystems.com/psqlfs/) to store 100 GB of images in PostgreSQL. Once they are in there I can deal with them. My main purpose is to use rsync to get the files into the database.

Is there a better way to load 20,000 plus files reliably into Postgres?
On Tuesday 13 January 2009, Jason Long <mailing.list@supernovasoftware.com> wrote:
> I would like to use PSQLFS (http://www.edlsystems.com/psqlfs/) to store
> 100 GB of images in PostgreSQL.
>
> Once they are in there I can deal with them. My main purpose is to use
> rsync to get the files into the database.
>
> Is there a better way to load 20,000 plus files reliably into Postgres?

A perl script using either bytea fields or the lo_ interface via DBD::Pg would work well.

-- Alan
bytea was what I was going for.
Does anyone have a script they would share for this purpose?
If not I will probably use Java because this is what I am familiar with.
The web app I will write for managing my photos will be written in Java. I want to be able to categorize, label, search, and otherwise organize my photos.
On Jan 13, 2009, at 10:34 AM, Jason Long wrote:
> I would like to use PSQLFS (http://www.edlsystems.com/psqlfs/) to store
> 100 GB of images in PostgreSQL.
>
> Once they are in there I can deal with them. My main purpose is to use
> rsync to get the files into the database.
>
> Is there a better way to load 20,000 plus files reliably into Postgres?

If it's a filesystem then you'd get the files into the system by copying them there. You wouldn't want to touch the database manually (that'd be like touching the raw disk device on a real filesystem).

Conversely, it's just a filesystem. There's not really any use to putting a filesystem on top of a database on top of a filesystem other than the (significant) hack value.

In other words, you probably don't really want to do this.

Cheers,
Steve
Steve Atkins wrote:
> If it's a filesystem then you'd get the files into the system by copying
> them there. You wouldn't want to touch the database manually (that'd be
> like touching the raw disk device on a real filesystem).
>
> Conversely, it's just a filesystem. There's not really any use to putting
> a filesystem on top of a database on top of a filesystem other than the
> (significant) hack value.
>
> In other words, you probably don't really want to do this.

I just want an easy way to load the files into the DB along with the original path they were loaded from.

Is it possible through SQL to load a file into a bytea column?
On Tue, Jan 13, 2009 at 03:28:18PM -0600, Jason Long wrote:
> Steve Atkins wrote:
> > On Jan 13, 2009, at 10:34 AM, Jason Long wrote:
> > > I would like to use PSQLFS (http://www.edlsystems.com/psqlfs/) to
> > > store 100 GB of images in PostgreSQL.
> > >
> > > Is there a better way to load 20,000 plus files reliably into Postgres?

That would imply that they're around 5MB on average? If they're all under, say, 20MB (or maybe even much more) you should be able to handle it by doing the most naive things possible.

> I just want an easy way to load the files into the DB along with the
> original path they were loaded from.
>
> Is it possible through SQL to load a file into a bytea column?

You'd need to generate the SQL somehow; if you know Python it's probably a pretty easy 20 or 30 lines of code to get this working. psycopg seems to be the recommended way of accessing PG with Python, and you basically want to be doing something like:

    import psycopg2

    filename = "myimage.jpeg"
    conn = psycopg2.connect("")
    conn.cursor().execute(
        "INSERT INTO pictures (filename,data) VALUES (%s,%s);",
        [filename, psycopg2.Binary(open(filename, "rb").read())])
    conn.commit()

This seems to do the right thing for me, and obviously needs to be put into a loop of some sort. But it'll hopefully get you started.

Sam
Sam Mason wrote:
> That would imply that they're around 5MB on average? If they're all
> under, say, 20MB (or maybe even much more) you should be able to handle
> it by doing the most naive things possible.

This is correct. They are all around 5 MB.

> You'd need to generate the SQL somehow; if you know Python it's probably
> a pretty easy 20 or 30 lines of code to get this working.

Never used Python or Perl. I use primarily Java. I was thinking of doing something like:

    INSERT INTO pictures (filename,data) VALUES
      ('filename','/path/to/my/image/img0009.jpg');

But this syntax doesn't seem to be supported.

Maybe I can use a custom C function to get the contents of the file, then do something like:

    INSERT INTO pictures (filename,data) VALUES
      ('/path/to/my/image/img0009.jpg',
       getBinaryFileContents('/path/to/my/image/img0009.jpg'));

Is there some Postgres contrib for something like this?
On Tue, Jan 13, 2009 at 06:22:34PM -0600, Jason Long wrote:
> Never used Python or Perl. I use primarily Java. I was thinking of
> doing something like:
>
>     INSERT INTO pictures (filename,data) VALUES
>       ('filename','/path/to/my/image/img0009.jpg');

If you're OK with using large objects, instead of byteas, you can use the lo_import function. You'd do something like:

    CREATE TABLE pics (
      path TEXT PRIMARY KEY,
      data OID
    );

    INSERT INTO pics (path,data)
      SELECT path, lo_import(path)
      FROM (VALUES ('/path/to/my/image/img0009.jpg')) x(path);

This assumes that the files are accessible to the database server (i.e. the paths are relative to the server daemon, not the psql command line or whatever JDBC app you're driving this from).

Does that help any more?

Sam
Sam Mason wrote:
> If you're OK with using large objects, instead of byteas, you can use
> the lo_import function.
>
> This assumes that the files are accessible to the database server (i.e.
> the paths are relative to the server daemon, not the psql command line
> or whatever JDBC app you're driving this from).

They are on the server. I would rather use bytea. Is it possible to import them as large objects and then use SQL to convert them to bytea?
On Tue, Jan 13, 2009 at 06:43:06PM -0600, Jason Long wrote:
> They are on the server. I would rather use bytea. Is it possible to
> import them as large objects and then use SQL to convert them to bytea?

You can, but it's pretty backward. If you Java you'd probably be better off using it to slurp in the file and do the insert there.

If you really want to use large objects to go to byteas, have a look through the client-side API [1] and then at the functions available to call from SQL by doing \df from psql, as they mirror the client calls very closely. Be aware that you'll be creating a lot more "garbage" that PG will have to clean out, so importing many gigabytes of data this way will take much longer.

Sam
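To illustrate that Java route, here is a minimal, untested sketch of slurping one file and inserting it as a bytea. It assumes a table created as "CREATE TABLE pictures (filename text PRIMARY KEY, data bytea)", the PostgreSQL JDBC driver on the classpath, and placeholder connection details:

    import java.io.File;
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class InsertPicture {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver"); // not needed with JDBC4 drivers
            File file = new File(args[0]);

            // Slurp the whole file into memory; fine for ~5 MB images.
            byte[] data = new byte[(int) file.length()];
            FileInputStream in = new FileInputStream(file);
            try {
                int off = 0;
                while (off < data.length) {
                    int n = in.read(data, off, data.length - off);
                    if (n < 0) break;
                    off += n;
                }
            } finally {
                in.close();
            }

            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/photos", "user", "password");
            try {
                PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO pictures (filename, data) VALUES (?, ?)");
                ps.setString(1, file.getPath());
                ps.setBytes(2, data); // setBytes() sends the parameter as bytea
                ps.executeUpdate();
                ps.close();
            } finally {
                conn.close();
            }
        }
    }

Nothing here is PSQLFS-specific; it is just the straightforward bytea insert described above, done client-side so the files don't need to be readable by the server.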
On Wed, Jan 14, 2009 at 12:56:42AM +0000, Sam Mason wrote:
> If you Java you'd probably be better off using it

Hum, it's getting late. That should be "If you *know* Java"! Bed time for me I think!

Sam
Sam Mason wrote:
> Hum, it's getting late. That should be "If you *know* Java"! Bed time
> for me I think!

Thanks for the advice. I will probably go with Java. In the inventory system I developed I am already storing documents via bytea. I will probably do the same with these images. I will be dealing with them via Hibernate, so I guess I should import them the same way.

I was hoping for an already-developed function that could import an entire directory structure recursively, but I can do this easily enough in Java; see the sketch below.
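Something along these lines would do the recursion (untested sketch; same assumed "pictures (filename text, data bytea)" table and placeholder connection string as the earlier example):

    import java.io.File;
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class ImportTree {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/photos", "user", "password");
            conn.setAutoCommit(false); // one commit at the end, not per file
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO pictures (filename, data) VALUES (?, ?)");
            importDir(new File(args[0]), ps);
            conn.commit();
            ps.close();
            conn.close();
        }

        static void importDir(File dir, PreparedStatement ps) throws Exception {
            File[] entries = dir.listFiles();
            if (entries == null) return; // unreadable directory
            for (File f : entries) {
                if (f.isDirectory()) {
                    importDir(f, ps); // recurse into subdirectories
                } else {
                    ps.setString(1, f.getPath()); // keep the original path
                    ps.setBytes(2, readAll(f));
                    ps.executeUpdate();
                }
            }
        }

        static byte[] readAll(File f) throws Exception {
            byte[] data = new byte[(int) f.length()];
            FileInputStream in = new FileInputStream(f);
            try {
                int off = 0;
                while (off < data.length) {
                    int n = in.read(data, off, data.length - off);
                    if (n < 0) break;
                    off += n;
                }
            } finally {
                in.close();
            }
            return data;
        }
    }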
On Tue, 2009-01-13 at 18:22 -0600, Jason Long wrote:
> Never used Python or Perl. I use primarily Java. I was thinking of
> doing something like:
>
>     INSERT INTO pictures (filename,data) VALUES
>       ('filename','/path/to/my/image/img0009.jpg');
>
> But this syntax doesn't seem to be supported.
>
> Maybe I can use a custom C function to get the contents of the file,
> then do something like:
>
>     INSERT INTO pictures (filename,data) VALUES
>       ('/path/to/my/image/img0009.jpg',
>        getBinaryFileContents('/path/to/my/image/img0009.jpg'));
>
> Is there some Postgres contrib for something like this?

Simple Java stub test program attached. Modify to meet your needs. See file header for URL of original example.
Reid Thompson wrote:
> Simple Java stub test program attached. Modify to meet your needs. See
> file header for URL of original example.

Thanks for the sample. I will be using Hibernate, which will make my code much cleaner; I never use straight JDBC. I will post an example here when done.
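For what it's worth, a minimal sketch of what such a Hibernate mapping might look like (illustrative only, not the promised example; the Picture/pictures names are assumptions). One gotcha worth knowing: a plain byte[] property maps to bytea on PostgreSQL, while annotating it @Lob typically maps it to a large object (oid) column instead:

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    @Entity
    @Table(name = "pictures")
    public class Picture {
        @Id
        private String filename;   // original path, used as the key

        @Column(nullable = false)
        private byte[] data;       // plain byte[] -> bytea on PostgreSQL

        protected Picture() {}     // Hibernate needs a no-arg constructor

        public Picture(String filename, byte[] data) {
            this.filename = filename;
            this.data = data;
        }
    }

Persisting is then just session.save(new Picture(path, bytes)) inside a transaction.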
Steven Lembark wrote:
> > I would like to use PSQLFS (http://www.edlsystems.com/psqlfs/) to store
> > 100 GB of images in PostgreSQL.
> >
> > Once they are in there I can deal with them. My main purpose is to use
> > rsync to get the files into the database.
> >
> > Is there a better way to load 20,000 plus files reliably into Postgres?
>
> Don't: put them into a reasonably-organized filesystem and store the
> paths.

This is exactly what I do not want to do. I do not want to deal with the file system at all except for the initial import. I want to manage the photos entirely with SQL.

> I use dir names of date-venue-descr and basenames of the dir-image
> number (e.g., "2009.01.12-foo-bar/123-4567.nef"). You'll probably find
> that moving the data into a viewer using a path will be simpler than
> pumping the raw data through postgres anyway.

Please explain. What does moving the data into a viewer using a path mean?

> At that point you could create COPY statements that look like a pg_dump
> output to load the stuff the first time.

Please elaborate.