Thread: Can I read data load files without loading in db?
We have an awkward situation.

An affiliate organization periodically sends us a stack of CDs. On the first one there are a couple of small scripts to handle installing the data and/or upgrading the database schema. Most of the CDs' contents are large data files to be used to update a PostgreSQL database. This would be OK if we had much need for the full database. In practice, we typically spend all that time loading the data into the database only to turn around and run our own C++ program that generates extracts containing the relatively few data fields we need on another project (where we load the data into a Berkeley database and use it from there).

I would like to skip the time-consuming PostgreSQL load before extracting, and instead generate the extracts we need with a program that reads directly from the database dump/load files on the CDs. The affiliate organization has made it clear that they cannot afford the resources to make a customized, limited edition of their data.

I would appreciate some hints on the issues. Which of the files in such a situation would carry the data layout information? If any of you have actually done something similar (directly read the archive files without loading them), I'd greatly appreciate hearing about it.

Thanks
-bC
On Wed, Aug 23, 2006 at 12:27:09PM -0700, barryc.ctr@gmail.com wrote:
> We have an awkward situation.
>
> An affiliate organization periodically sends us a stack of CDs. On
> the first one there are a couple of small scripts to handle installing
> the data and/or upgrading the database schema. Most of the CD's
> contents are large data files to be used to update a postgresql
> database.

I assume from the question that the data is in pg_dump format? i.e. a
whole pile of COPY statements with data?

In that case just write a perl script to extract the data.

Or specify more precisely if that's not the problem.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
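As a concrete illustration of the perl approach Martijn suggests, here is a minimal sketch, assuming the dump is a plain-text file of "COPY ... FROM stdin;" blocks with tab-separated rows. The table name and column list are placeholders, not anything from the affiliate's actual schema.

    #!/usr/bin/perl
    # Sketch: print selected columns from one table's COPY block in a
    # plain-text pg_dump.  Assumes tab-separated rows ending with a line
    # containing only "\."; it does not handle \N (NULL) or backslash
    # escapes embedded in the data.
    use strict;
    use warnings;

    my $table  = 'some_table';        # placeholder table name
    my @wanted = ('id', 'name');      # placeholder columns to keep

    my @cols;                         # column order from the COPY header
    my $in_copy = 0;

    while (my $line = <>) {
        chomp $line;
        if (!$in_copy
            && $line =~ /^COPY\s+(?:\S+\.)?\Q$table\E\s*\(([^)]*)\)\s+FROM\s+stdin;/i) {
            @cols = map { my $c = $_; $c =~ s/^\s+|\s+$//g; $c =~ s/"//g; $c }
                    split /,/, $1;
            $in_copy = 1;
            next;
        }
        next unless $in_copy;
        if ($line eq '\\.') { $in_copy = 0; next; }   # end-of-data marker
        my %row;
        @row{@cols} = split /\t/, $line, -1;
        print join("\t", map { defined $row{$_} ? $row{$_} : '' } @wanted), "\n";
    }

Run it as "perl extract.pl db-private.pg-dump.1 > extract.tsv". A real version would also need to handle \N null markers and backslash escapes in the COPY data, and could pull several tables' extracts in one pass.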
On Fri, Aug 25, 2006 at 07:36:38AM -0400, barry conner wrote:
> Thanks for your reply Martijn.
>
> I do not know for certain that it's in pg_dump format, not having dealt
> with it specifically before. I will examine the files more closely to look
> for COPY statements. If not in that format, what else could it be that
> could be shoved right into a postgresql database? In any case, would an
> associated file (schema type?) tell me the field name arrangements that the
> data files are loaded with? Or if COPY statements are used, would they
> name the fields being targeted?

It really depends on the format they send you. It's either plain text,
in which case the COPY commands will tell you which fields. If it's the
custom format you can use tar to extract it IIRC.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
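A quick way to tell which case applies before hunting for COPY statements: pg_dump's custom-format archives begin with the magic bytes "PGDMP", while a plain-text dump begins with readable SQL. A small probe, sketched in perl (pass the dump file as an argument):

    #!/usr/bin/perl
    # Sketch: peek at the first five bytes of a dump file.  Custom-format
    # pg_dump archives begin with the magic string "PGDMP"; plain-text
    # dumps begin with readable SQL.
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 dumpfile\n";
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    binmode $fh;
    read($fh, my $magic, 5);
    close $fh;

    if (defined $magic && $magic eq 'PGDMP') {
        print "$file: pg_dump custom-format archive (use pg_restore)\n";
    } else {
        print "$file: not a custom-format archive; look for COPY blocks\n";
    }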
On Friday, 25 August 2006 13:51, Martijn van Oosterhout wrote:
> It really depends on the format they send you. It's either plain text,
> in which case the COPY commands will tell you which fields. If it's the
> custom format you can use tar to extract it IIRC.

You can also use pg_restore to extract "binary" dump formats to plain text.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Thanks for your reply Martijn.
I do not know for certain that it's in pg_dump format, not having dealt with it specifically before. I will examine the files more closely to look for COPY statements. If not in that format, what else could it be that could be shoved right into a postgresql database? In any case, would an associated file (schema type?) tell me the field name arrangements that the data files are loaded with? Or if COPY statements are used, would they name the fields being targeted?
Thanks again.
Peter Eisentraut wrote:
> On Friday, 25 August 2006 13:51, Martijn van Oosterhout wrote:
> > It really depends on the format they send you. It's either plain text,
> > in which case the COPY commands will tell you which fields. If it's the
> > custom format you can use tar to extract it IIRC.
>
> You can also use pg_restore to extract "binary" dump formats to plain text.

I used "less" to look at one of the files. In this case it's named 'db-private.pg-dump.1' and sits in a subdirectory called 'database' on the CD. "less" shows much familiar data in the file, so I would have assumed it was in text format. I don't notice any COPY commands in the file as I'm looking at it via "less", so I try to locate some via "grep", and it tells me that there are matches, but that it's a binary file so it won't show me the actual occurrences!

I guess that I'll need to try your suggestion of using pg_restore to extract them to plain text. Apparently I'll need to add software to this Fedora Core 3 installation since pg_restore isn't installed currently. Do you recall the switches needed?

Thanks!
-bC
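For reference, pg_restore can write the plain-text form of a custom-format archive to a file without any running database: -f names the output file and the last argument is the archive (pg_restore -l lists an archive's contents). Below is a sketch that does the conversion and then prints the COPY headers, which show each table and its column order; the file names are simply the ones mentioned above.

    #!/usr/bin/perl
    # Sketch: turn a custom-format archive into plain SQL with
    # "pg_restore -f <output> <archive>" (no database connection needed),
    # then print the COPY headers, which show each table and its column order.
    use strict;
    use warnings;

    my $archive = shift || 'db-private.pg-dump.1';   # archive name from the thread
    my $plain   = 'db-private-plain.sql';

    system('pg_restore', '-f', $plain, $archive) == 0
        or die "pg_restore failed ($?)\n";

    open my $fh, '<', $plain or die "cannot open $plain: $!\n";
    while (my $line = <$fh>) {
        print $line if $line =~ /^COPY\s/;
    }
    close $fh;

On Fedora, pg_restore normally ships with the PostgreSQL client package rather than the server package.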