Thread: possibility of partial data dumps with pg_dump

possibility of partial data dumps with pg_dump

From
Никита Старовойтов
Date:
Hello,
to get acquainted with the Postgres code and gain some practice with it, I am working on a small patch that adds the possibility of partial table dumps.
A filtering rule is specified as a standard SQL WHERE clause (without the WHERE keyword).
There are three ways to pass data filters on the command line:

1) using a table pattern inside the --where parameter, with '@' as the divider:
... --where "table_pattern@where_condition" ...

"Where" condition will be used for all tables that match the search pattern.

2) using the --where parameter before any table selection:
... --where "where condition" ...

All tables in the database will be filtered with the given condition.

3) using "where" parameter after table pattern
... -t table_pattern --where "where_condition" ...

Only tables matching the last pattern before --where will be filtered. The third way exists to shorten the command line and to avoid duplicating table patterns when specific tables are dumped.
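
For concreteness, the three forms could look like this with the patch applied (the database "mydb", the table "orders", and its column "created_at" are invented names for illustration):

# 1) pattern and condition joined with the '@' divider
pg_dump mydb --where "orders@created_at >= '2022-01-01'" > partial.sql

# 2) a bare condition, applied to every dumped table
pg_dump mydb --where "created_at >= '2022-01-01'" > partial.sql

# 3) a condition attached to the preceding -t pattern only
pg_dump mydb -t orders --where "created_at >= '2022-01-01'" > partial.sql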

Filters may also be read from a file.
Such a file consists of lines; each line is a table pattern, a WHERE condition, or a table pattern followed by a WHERE condition.
For example, the file
"""
where column_name_1 = 1
table_pattern
table_pattern where column_name_2 = 1
"""
corresponds to the parameters

 --where "column_name_1 == 1" -t table_pattern --where "column_name_2 == 1"

The file format is not ideal, because it doesn't allow passing patterns for other object types (schemas, for example).
I am ready to change it if that functionality is actually needed.
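
For illustration, a concrete file in this format could look as follows (all names invented): the first line filters every table, the second includes a table as-is, and the third includes a table with its own filter.
"""
where active = true
customers
orders where created_at >= '2022-01-01'
"""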

All use cases are covered by tests.

I would be grateful if the patch could get some discussion.

Re: possibility of partial data dumps with pg_dump

From
Pavel Stehule
Date:
Hi

On Tue, 4 Oct 2022 at 12:48, Никита Старовойтов <nikstarall@gmail.com> wrote:
Hello,
to get acquainted with the Postgres code and gain some practice with it, I am working on a small patch that adds the possibility of partial table dumps.
A filtering rule is specified as a standard SQL WHERE clause (without the WHERE keyword).
There are three ways to pass data filters on the command line:

1) using a table pattern inside the --where parameter, with '@' as the divider:
... --where "table_pattern@where_condition" ...

"Where" condition will be used for all tables that match the search pattern.

2) using the --where parameter before any table selection:
... --where "where condition" ...

All tables in the database will be filtered with the given condition.

3) using "where" parameter after table pattern
... -t table_pattern --where "where_condition" ...

Only tables matching the last pattern before --where will be filtered. The third way exists to shorten the command line and to avoid duplicating table patterns when specific tables are dumped.

Filters may also be read from a file.
Such a file consists of lines; each line is a table pattern, a WHERE condition, or a table pattern followed by a WHERE condition.
For example, the file
"""
where column_name_1 = 1
table_pattern
table_pattern where column_name_2 = 1
"""
corresponds to the parameters

 --where "column_name_1 == 1" -t table_pattern --where "column_name_2 == 1"

The file format is not ideal, because it doesn't allow passing patterns for other object types (schemas, for example).
I am ready to change it if that functionality is actually needed.

All use cases are covered by tests.

I would be grateful if the patch could get some discussion.

What is the benefit and the use case? For this case I don't see any benefit over a simple

\copy (SELECT * FROM xx WHERE ...) TO 'file.csv' CSV

or how hard would it be to write a trivial application that exports what you want in the format you want?
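
For example, a one-liner of that kind (table, column, and file names made up):

psql mydb -c "\copy (SELECT * FROM orders WHERE created_at >= '2022-01-01') TO 'orders.csv' CSV HEADER"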

Regards

Pavel

Re: possibility of partial data dumps with pg_dump

From
Julien Rouhaud
Date:
Hi,

On Tue, Oct 04, 2022 at 02:15:16PM +0200, Pavel Stehule wrote:
>
> On Tue, 4 Oct 2022 at 12:48, Никита Старовойтов <nikstarall@gmail.com> wrote:
>
> > Hello,
> > to get acquainted with the Postgres code and gain some practice with it,
> > I am working on a small patch that adds the possibility of partial table
> > dumps.
> > A filtering rule is specified as a standard SQL WHERE clause (without the
> > WHERE keyword)
>
> What is the benefit and the use case? For this case I don't see any benefit
> over a simple
>
> \copy (SELECT * FROM xx WHERE ...) TO 'file.csv' CSV
>
> or how hard would it be to write a trivial application that exports what you
> want in the format you want?

Also, such an approach probably requires a lot of effort to produce a valid backup
(with regard to foreign keys and such).
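
For instance, with a minimal invented schema:

CREATE TABLE customers (id int PRIMARY KEY);
CREATE TABLE orders (
    id int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customers(id)
);

A dump filtered to a subset of orders can still contain customer_id values pointing at customers rows the filter left out, so restoring the data fails the foreign key check unless customers is filtered consistently too.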

There's already a project dedicated to generating such partial (and consistent)
backups: https://github.com/mla/pg_sample.  Maybe that would address your
needs?
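
Something like the following (see the project's README for the exact option syntax; table and column names invented):

pg_sample --limit="orders = created_at >= '2022-01-01'" mydb > sample.sql

pg_sample follows foreign keys while sampling, so the output stays referentially consistent.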



Re: possibility of partial data dumps with pg_dump

From
Никита Старовойтов
Date:
Good afternoon,

indeed, the functionality I started to implement in this patch is very similar to what the program you suggested provides; many of the use cases are the same. Thanks for pointing me to it.

I have been working on preserving referential integrity, but I have not found a simple solution for complex structures, and I am not sure it can be done during the dump process at all. It is obvious, though, that without this the usefulness of the feature is insignificant. When I worked with another database management system, a partial dump feature was available in its dump tool, and I found it useful. But I understand why it might not be worth extending pg_dump with a non-essential feature. Still, I will try again to solve the problem of guaranteed recoverability of the dumped database.

Thanks for the comments, they were really helpful to me.

On Tue, 4 Oct 2022 at 19:24, Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Tue, Oct 04, 2022 at 02:15:16PM +0200, Pavel Stehule wrote:
>
> On Tue, 4 Oct 2022 at 12:48, Никита Старовойтов <nikstarall@gmail.com> wrote:
>
> > Hello,
> > to get acquainted with the Postgres code and gain some practice with it,
> > I am working on a small patch that adds the possibility of partial table
> > dumps.
> > A filtering rule is specified as a standard SQL WHERE clause (without the
> > WHERE keyword)
>
> What is the benefit and the use case? For this case I don't see any benefit
> over a simple
>
> \copy (SELECT * FROM xx WHERE ...) TO 'file.csv' CSV
>
> or how hard would it be to write a trivial application that exports what you
> want in the format you want?

Also, such an approach probably requires a lot of effort to produce a valid backup
(with regard to foreign keys and such).

There's already a project dedicated to generating such partial (and consistent)
backups: https://github.com/mla/pg_sample.  Maybe that would address your
needs?