Thread: db partial dumping with pg_dump
We have a patch for pg_dump which adds the ability to dump part of a
table. For example, to dump the Top.Science.Astronomy hierarchy from the
dmoz catalog:

    pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz

We have found it very useful. We'd like to extend it to work with COPY as
well, but COPY has no support for SELECT (of course, it would work only
for the special case where ALL columns are retrieved).

The question is: is it worth submitting the patch for pg_dump and looking
into the COPY code?

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
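[For context: the <@ operator in the example comes from the contrib
ltree module, which stores tree-structured labels such as dmoz category
paths. A minimal sketch of the kind of table the command assumes -- the
column names here are hypothetical:

    -- requires contrib/ltree to be installed in the database
    CREATE TABLE dmoz (
        id   serial PRIMARY KEY,
        path ltree    -- e.g. 'Top.Science.Astronomy.Telescopes'
    );

    -- the rows the -w clause selects: everything at or below
    -- Top.Science.Astronomy in the tree
    SELECT * FROM dmoz WHERE path <@ 'Top.Science.Astronomy';
]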
On Tue, 2002-08-13 at 07:49, Oleg Bartunov wrote:
> We have a patch for pg_dump which adds the ability to dump part of a
> table. For example, to dump the Top.Science.Astronomy hierarchy from
> the dmoz catalog:
>
>     pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz
>
> We have found it very useful. We'd like to extend it to work with COPY
> as well, but COPY has no support for SELECT (of course, it would work
> only for the special case where ALL columns are retrieved).
>
> The question is: is it worth submitting the patch for pg_dump and
> looking into the COPY code?

I've been asked by co-workers for information on how to do this type of
thing. They do partial table dumps to a development system for the
purpose of finding / eliminating bugs.

That said, it may be smart to make the 'select * from <table>' part
yourself, and let the user supply a where clause.
On 13 Aug 2002, Rod Taylor wrote:
> I've been asked by co-workers for information on how to do this type
> of thing. They do partial table dumps to a development system for the
> purpose of finding / eliminating bugs.
>
> That said, it may be smart to make the 'select * from <table>' part
> yourself, and let the user supply a where clause.

Find the patch (against 7.2) in the attachment. Note that it works with
the -d (insert mode) option.

	Regards,
		Oleg
This patch seems extremely messy to me. Unless I'm missing something,
-w just plain fails except when you are dumping a specific table (ie,
-t must be given as well). And heaven help you if you specify a
different table in -t than the one -w is selecting from. This isn't
well thought out.

I'm not at all convinced that such a thing belongs in pg_dump anyway.
It'd be more useful as a manually-invokable feature, I think. You can
almost do this in psql with

    select * from table where something
    \g outfile

but I don't think you can get psql to emit the data in a form that can
be reloaded reliably (it won't quote data characters that look the same
as column delimiters, for instance).

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go

    COPY table TO 'myfile' WHERE ...

We already have column-list support in COPY, so we can already slice the
table vertically --- WHERE would let you slice it horizontally, which
seems a natural extension. (BTW, has anyone taught psql's \copy about
column lists? AFAIR the original patch was only against the backend.)

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.

			regards, tom lane
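[A sketch of the two-step workflow described above, reusing the dmoz
example from the start of the thread. The file names are hypothetical,
and the COPY ... WHERE form is only a proposal at this point -- it does
not exist in any released version:

    # schema only, easy to hand-edit before reloading
    pg_dump -s -t dmoz dmoz > dmoz-schema.sql

    # data for just the slice of interest, with COPY's reload-safe
    # quoting (proposed syntax)
    psql dmoz -c "COPY dmoz TO '/tmp/astronomy.copy' WHERE path <@ 'Top.Science.Astronomy'"
]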
On Tue, 13 Aug 2002, Tom Lane wrote:
> This patch seems extremely messy to me. Unless I'm missing something,
> -w just plain fails except when you are dumping a specific table (ie,
> -t must be given as well). And heaven help you if you specify a
> different table in -t than the one -w is selecting from. This isn't
> well thought out.

You're right. The patch I sent was in no way ready for submission to the
sources! I was just asking about the feature, and attached the patch to
the message for Rod Taylor.

> but I don't think you can get psql to emit the data in a form that can
> be reloaded reliably (it won't quote data characters that look the same
> as column delimiters, for instance).

That was the reason we didn't use psql for dumping.

> I'm finding it hard to visualize situations where I'd want the extra
> baggage of pg_dump for something like this. If I want the schema at
> all, I'll probably want it separate from the data so that I can hack
> the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
> and then do the selective copying separately.

That'd be nice! I often need such a feature because the databases at
work are often too large to play with at home :-)

	Regards,
		Oleg
On Tue, 2002-08-13 at 20:24, Tom Lane wrote:
> What would seem to make sense is adding a WHERE-clause option to
> COPY TO, and then you could go
>
>     COPY table TO 'myfile' WHERE ...

What about:

    COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];

to get the data as INSERT INTO statements (pg_dump -d), with an optional
column list (pg_dump -D)?

----------------
Hannu
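[To illustrate the difference between the two proposed output modes,
using a hypothetical row from the dmoz table:

    -- pg_dump -d style (plain AS INSERT in the proposed syntax):
    INSERT INTO dmoz VALUES (42, 'Top.Science.Astronomy.Telescopes');

    -- pg_dump -D style (AS INSERT WITH COLUMN in the proposed syntax):
    INSERT INTO dmoz (id, path)
        VALUES (42, 'Top.Science.Astronomy.Telescopes');
]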
Actually, loading all this stuff into COPY is not the way to go, I think.

Informix had:

    UNLOAD TO 'filename'
    SELECT ...

I have to admit, this is a superior way to do things compared to what we
have. Is it possible for us to get:

    COPY TO 'filename'
    SELECT ...

It allows any arbitrary table, group by, even order by combination.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> What about:
>
>     COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];
>
> to get the data as INSERT INTO statements (pg_dump -d), with an
> optional column list (pg_dump -D)?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Wed, 2002-08-14 at 00:03, Bruce Momjian wrote:
> Actually, loading all this stuff into COPY is not the way to go, I
> think.
>
> Informix had:
>
>     UNLOAD TO 'filename'
>     SELECT ...
>
> I have to admit, this is a superior way to do things compared to what
> we have. Is it possible for us to get:
>
>     COPY TO 'filename'
>     SELECT ...
>
> It allows any arbitrary table, group by, even order by combination.

It would be more in line with the rest of the system to just allow a
subselect as the 'table':

    COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

-----------------
Hannu
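[A couple more examples of what the subselect form would allow -- again
hypothetical, since none of this syntax is implemented:

    -- the horizontal slice from the start of the thread:
    COPY (select * from dmoz where path <@ 'Top.Science.Astronomy')
        astro TO stdout;

    -- ORDER BY, joins, and aggregates would come along for free:
    COPY (select id, path from dmoz order by path)
        sorted TO '/tmp/dmoz-sorted.copy';
]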
Hannu Krosing wrote:
> It would be more in line with the rest of the system to just allow a
> subselect as the 'table':
>
>     COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

Yep, that would work too. Clearly, we should shoot for something that
leverages the existing SELECT code rather than hang more clauses off of
COPY.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Clearly, we should shoot for something that leverages the existing
> SELECT code rather than hang more clauses off of COPY.

Yeah, that's a good point. COPY IN is still a special case, I think,
but it seems like COPY OUT could be reimplemented as a special tuple
destination for the regular executor machinery. The question is how
much of a performance hit we would be taking. If this slows down
pg_dump a lot, the extra code is worth keeping.

			regards, tom lane
> I'm finding it hard to visualize situations where I'd want the extra
> baggage of pg_dump for something like this. If I want the schema at
> all, I'll probably want it separate from the data so that I can hack
> the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
> and then do the selective copying separately.

Tell you what I'd kill for:

    pg_dump -t users_\* db > users.sql

Sort of thing.

Chris
On Tue, 2002-08-13 at 19:54, Christopher Kings-Lynne wrote:
> Tell you what I'd kill for:
>
>     pg_dump -t users_\* db > users.sql
>
> Sort of thing.

I'd kill for

    pg_restore --ignore-missing-objects --clean ...

so I could run refreshes against databases whose schemas are changing,
rather than having to constantly edit the output of -l. I'd also kill
for

    pg_restore --ignore-existing-objects ...

so I could run the darn thing against a database that already has
pl/pgsql installed in template1 when the dump file wants to install it
again, etc.

ss
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > Tell you what I'd kill for: > pg_dump -t users_\* db > users.sql Yeah, I was just about to start working on pg_dump's -t switch. It absolutely *must* be made schema-aware for 7.3 (-t schema.table doesn't work today) and it seems like we might as well try to allow wildcard support. Another thought I had was to accept multiple -t switches, eg "pg_dump -t foo -t bar -t schema.baz*" to dump all tables matching any of the patterns. regards, tom lane
> Yeah, I was just about to start working on pg_dump's -t switch. It
> absolutely *must* be made schema-aware for 7.3 (-t schema.table
> doesn't work today), and it seems like we might as well try to allow
> wildcard support. Another thought I had was to accept multiple -t
> switches, eg "pg_dump -t foo -t bar -t schema.baz*", to dump all
> tables matching any of the patterns.

What about a switch to set the schema search path as well? You could
also just have the tables comma-separated in the -t?

Chris
Scott Shattuck <ss@technicalpursuit.com> writes:
> I'd also kill for
>
>     pg_restore --ignore-existing-objects ...
>
> so I could run the darn thing against a database that already has
> pl/pgsql installed in template1 when the dump file wants to install it
> again, etc.

In general, I think it's a mistake for pg_restore to bail out on errors.
The underlying pg_dump scripts have been built and optimized on the
assumption that psql would keep plugging after seeing an error. For
example, scripts containing "\connect - foo" still work if there's no
"foo" user ... but only because psql doesn't go belly-up. pg_restore is
way less forgiving.

I think the ideal behavior for pg_restore would be to abandon work on
the current dump item upon seeing a SQL error, but to pick up with the
next one. (Of course we could have an --anal-retentive switch to bail
on first error, but I doubt it'd be used much.) Errors associated with
ownership switches shouldn't cause failure in any case.

Anybody care to submit patches to make this happen?

			regards, tom lane
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > What about a switch to set schema search path as well? I'm leaning to the thought that this is a bad idea, and that pg_dump should act in a way that's search-path-independent. But I need to think more. > You could also just have the tables comma separated in the -t ?? Don't like it --- that makes comma a special character in pg_dump tablename patterns, which is an exception we don't need to make. The multiple-dash-t approach needs no such assumption. regards, tom lane