Thread: customizing pg_dump together with copy.c's DoCopy function
My first ever newsgroup PostgreSQL question... I want to move data between some very large databases (100+ gb) of different schema at our customer sites. I cannot expect there to be much working partition space, so the databases cannot exist simultaneously. I am also restricted to hours, not days, to move the data. I felt that pg_dump/pg_restore with the compressed format would do the job for me. I was able to create a modified pg_dump program without any difficulty. But I need to customize the speedy COPY FROM and COPY TO commands to perform my necessary schema and data content changes. I tried copying /src/backend/copy.c/h to my customized pg_dump project, renamed them and their DoCopy function, and added it to my makefile. This created conflicts between libpq-fe.h and libpq.h. For example: postgresql-7.4.13/src/interfaces/libpq/libpq-fe.h:191: error: conflicting types for `PQArgBlock' postgresql-7.4.13/src/include/libpq/libpq.h:39: error: previous declaration of `PQArgBlock' Is it possible to compile-link together frontend pg_dump code with backend code from copy.c? If not, how can I go about customizing pg_dump to have low-level control over the speedy but inflexible COPY TO/FROM commands? I already tried all this with regular sql calls and it was unnacceptably far too slow. thanks -Lynn
After further reading, I'm wondering if I should instead try to use libpq calls like PQgetCopyData, PQputCopyData, and PQputCopyEnd. Would they be a viable alternative to provide both the speed and flexibility I'm looking for? -Lynn
"lynnsettle@yahoo.com" <lynnsettle@yahoo.com> writes: > Is it possible to compile-link together frontend pg_dump code with > backend code from copy.c? No. Why do you think you need to modify pg_dump at all? regards, tom lane
> > Is it possible to compile-link together frontend pg_dump code with > > backend code from copy.c? > > No. Why do you think you need to modify pg_dump at all? > pg_dump and pg_restore provide important advantages for upgrading a customer's database on site: They are fast. I want to minimize downtime. They allow compression. I often will have relatively little free disk space to work with. My concept is "customized dump", drop database, create new schema database, "customized restore". My upgrade requires many schema and data content changes. I've tried using standard SQL statements in perl scripts to do all of it, but even with no indexes on inserts, later creating indexes for the lookup work, and every other optimization I know of, a 100gb database requires several days to turn our old database into a new one. I was hoping that I could modify the speedy pg_dump/pg_restore utilities to make these changes "on the fly". It gets tricky because I have to restore some of the data to different tables having varying schema and also change the table linking. But this is all doable as long as I can "massage" the SQL statements and data both when it goes into the dump file and when it is getting restored back out. Or am I trying to do the impossible? -Lynn
pg_dump by default dumps to STDOUT, which you should use in a pipeline to perform any modifications. To me this seems pretty tricky, but should be doable. Modifying pg_dump really strikes me as the wrong way to go about it. Pipelines operate in memory, and should be very fast, depending on how you write the filtering program. You would need to dump the data without compression, then compress it coming out the other end (maybe split it up too). Something like this:
pg_dump | myfilter | gzip | split --bytes=2000M - mydump.
Also, you can't expect to have speed if you have no disk space. Reading/writing to the same disk will kill you. If you could set up some temp space over NFS on the local network, that should gain you some speed.
pg_dump | myfilter | gzip | split --bytes=2000M - mydump.
Also, you can't expect to have speed if you have no disk space. Reading/writing to the same disk will kill you. If you could set up some temp space over NFS on the local network, that should gain you some speed.
On 11 Jul 2006 08:43:17 -0700, lynnsettle@yahoo.com <lynnsettle@yahoo.com > wrote:
> > Is it possible to compile-link together frontend pg_dump code with
> > backend code from copy.c?
>
> No. Why do you think you need to modify pg_dump at all?
>
pg_dump and pg_restore provide important advantages for upgrading a
customer's database on site:
They are fast. I want to minimize downtime.
They allow compression. I often will have relatively little free disk
space to work with.
My concept is "customized dump", drop database, create new schema
database, "customized restore".
My upgrade requires many schema and data content changes. I've tried
using standard SQL statements in perl scripts to do all of it, but even
with no indexes on inserts, later creating indexes for the lookup work,
and every other optimization I know of, a 100gb database requires
several days to turn our old database into a new one. I was hoping that
I could modify the speedy pg_dump/pg_restore utilities to make these
changes "on the fly". It gets tricky because I have to restore some of
the data to different tables having varying schema and also change the
table linking. But this is all doable as long as I can "massage" the
SQL statements and data both when it goes into the dump file and when
it is getting restored back out.
Or am I trying to do the impossible?
-Lynn
Brian, Those are very interesting ideas. Thanks. I've been playing around with pg_dump. Modifying it to selectively dump/restore tables and columns is pretty easy. But as you say, changing the data content within the data buffers to reflect varying column values, changed column types, and new columns seems tricky. I wonder if anyone knows of any example code around somewhere... -Lynn "Brian Mathis" wrote: > pg_dump by default dumps to STDOUT, which you should use in a pipeline to > perform any modifications. To me this seems pretty tricky, but should be > doable. Modifying pg_dump really strikes me as the wrong way to go about > it. Pipelines operate in memory, and should be very fast, depending on how > you write the filtering program. You would need to dump the data without > compression, then compress it coming out the other end (maybe split it up > too). Something like this: > pg_dump | myfilter | gzip | split --bytes=2000M - mydump. > > Also, you can't expect to have speed if you have no disk space. > Reading/writing to the same disk will kill you. If you could set up some > temp space over NFS on the local network, that should gain you some speed. > >