Thread: Table data exclusion patch for pg_dump

Table data exclusion patch for pg_dump

From
Vadim Trochinsky
Date:
Hello!

This is a patch that allows choosing not to dump the data for the selected 
tables.

The intended usage is to make backups smaller and faster, by allowing skipping 
unneeded data, while still generating a backup that can be restored and obtain 
a fully working application.

I use it to avoid dumping log data, and the content of tables that keep data 
that has a short lifetime (which in the event of a restore would have expired 
by then anyway)

The patch adds the -d and -D arguments: -d  --data=TABLE            dump data for the named table(s) -D
--exclude-data=TABLE   do NOT dump data for the named table(s)
 


I believe the patch to be complete, though I'm very new to the postgresql 
codebase and might have missed something. 

The patch applies to HEAD, compiles and worked properly in my tests.



Re: Table data exclusion patch for pg_dump

From
Tom Lane
Date:
Vadim Trochinsky <me@vadim.ws> writes:
> This is a patch that allows choosing not to dump the data for the selected 
> tables.

Why wouldn't you just use -s ?
        regards, tom lane


Re: Table data exclusion patch for pg_dump

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Vadim Trochinsky <me@vadim.ws> writes:
>   
>> This is a patch that allows choosing not to dump the data for the selected 
>> tables.
>>     
>
> Why wouldn't you just use -s ?
>
>             
>   

You might want the whole schema and data for most but not all of the 
tables (e.g. you might leave out a large session table for a web app).

cheers

andrew


Re: Table data exclusion patch for pg_dump

From
Asko Oja
Date:
How do you use -s to exclude data for some tables from otherwise full dump?<br />Dump schema and data separately?<br
/><br/><div class="gmail_quote">On Fri, May 1, 2009 at 6:38 PM, Tom Lane <span dir="ltr"><<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>></span>wrote:<br /><blockquote class="gmail_quote"
style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">Vadim
Trochinsky<<a href="mailto:me@vadim.ws">me@vadim.ws</a>> writes:<br /> > This is a patch that allows choosing
notto dump the data for the selected<br /> > tables.<br /><br /></div>Why wouldn't you just use -s ?<br /><br />    
                  regards, tom lane<br /><font color="#888888"><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></font></blockquote></div><br /> 

Re: Table data exclusion patch for pg_dump

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Why wouldn't you just use -s ?

> You might want the whole schema and data for most but not all of the 
> tables (e.g. you might leave out a large session table for a web app).

The use-case seems pretty thin to me, and the potential for shooting
oneself in the foot rather large.  We routinely get complaints, for
example, from people who do partial dumps and then find out they don't
restore because of foreign key constraints.  This looks like mostly
a larger-gauge version of that.
        regards, tom lane


Re: Table data exclusion patch for pg_dump

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> Why wouldn't you just use -s ?
>>>       
>
>   
>> You might want the whole schema and data for most but not all of the 
>> tables (e.g. you might leave out a large session table for a web app).
>>     
>
> The use-case seems pretty thin to me, and the potential for shooting
> oneself in the foot rather large.  We routinely get complaints, for
> example, from people who do partial dumps and then find out they don't
> restore because of foreign key constraints.  This looks like mostly
> a larger-gauge version of that.
>
>     
>   

Well, you can shoot yourself in the foot using pg_restore's --use-list 
option too, but that doesn't mean it's not useful. And indeed it could 
be used to achieve the OP's ends, except that he would have spent 
useless time and space dumping the data for a table he doesn't want.

cheers

andrew




Re: Table data exclusion patch for pg_dump

From
Pavel Stehule
Date:
2009/5/1 Tom Lane <tgl@sss.pgh.pa.us>:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> Why wouldn't you just use -s ?
>
>> You might want the whole schema and data for most but not all of the
>> tables (e.g. you might leave out a large session table for a web app).
>
> The use-case seems pretty thin to me, and the potential for shooting
> oneself in the foot rather large.  We routinely get complaints, for
> example, from people who do partial dumps and then find out they don't
> restore because of foreign key constraints.  This looks like mostly
> a larger-gauge version of that.

I am sorry, but this use-case is relative maybe often. When we
migrated from 8.1 to 8.3 we truncates all large audit and log tables.
Without this switch we had to use TRUNCATE or own substitution of
pg_dump. I thing so this switch should be implement like TRUNCATE
statement - you have to exclude all depended tables.

regards
Pavel Stehule

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Table data exclusion patch for pg_dump

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Well, you can shoot yourself in the foot using pg_restore's --use-list 
> option too, but that doesn't mean it's not useful. And indeed it could 
> be used to achieve the OP's ends, except that he would have spent 
> useless time and space dumping the data for a table he doesn't want.

The critical difference, of course, being that when he discovers that it
doesn't work, he *has got* the data he needs to have to perform the
restore.  Excluding data at dump time is considerably more dangerous
because of the likelihood that you won't have any recourse when you
need it.

(In which connection, it is an astonishingly bad idea to repurpose -d
and -D for this behavior, so soon after we deprecated them.)
        regards, tom lane


Re: Table data exclusion patch for pg_dump

From
Jaime Casanova
Date:
On Fri, May 1, 2009 at 2:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> Tom Lane wrote:
>>
>> Vadim Trochinsky <me@vadim.ws> writes:
>>
>>> This is a patch that allows choosing not to dump the data for the
>>> selected tables.
>>>
>> Why wouldn't you just use -s ?
>>
>
> You might want the whole schema and data for most but not all of the tables
> (e.g. you might leave out a large session table for a web app).
>

Actually you can use pg_dump -T to exclude some tables and then
pg_dump -s -t to dump the schema of those tables


--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157