Many times I've wanted to export a subset of a database, using some sort of row filter condition on some of the large tables. For example, when copying a production database to a staging environment, I might want to include only the past month of some time series data.
We have the existing options:
--include-table=table (and its -t synonym)
--exclude-table=table
--exclude-table-data=table
I propose a new option:
--include-table-data-where=table:filter_clause
One would use this option as follows:
pg_dump --include-table-data-where=largetable:"created_at >= '2018-05-01'" database_name
The filter_clause is used as the contents of a WHERE clause on the query pg_dump runs to fetch the table's rows, so only the matching rows end up in the COPY data that pg_dump produces.
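To make that concrete, here is a rough sketch of how the fetch query could be assembled. It is not the code from the attached patch: build_copy_command is a hypothetical helper, and wrapping the filter in COPY (SELECT ...) TO stdout is my assumption about the simplest way to apply a WHERE clause. A real implementation would also quote the table name properly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pg_dump): write the command used to
 * fetch one table's rows into buf.  With no filter, the whole table is
 * copied; with a filter, the rows are selected through a subquery.
 * Returns the length snprintf would have written.
 */
static int
build_copy_command(char *buf, size_t buflen,
                   const char *qualified_table, const char *filter_clause)
{
    assert(buf != NULL && qualified_table != NULL);

    /* Treat a missing or empty filter as "dump every row". */
    if (filter_clause == NULL || strlen(filter_clause) == 0)
        return snprintf(buf, buflen, "COPY %s TO stdout;", qualified_table);

    /* Assumed form: the filter becomes the WHERE clause of a subquery. */
    return snprintf(buf, buflen,
                    "COPY (SELECT * FROM %s WHERE %s) TO stdout;",
                    qualified_table, filter_clause);
}
```

Since the clause is pasted into the query verbatim, it can be any expression that is valid after WHERE for that table. It also means the clause is trusted input, just like anything else given on the pg_dump command line.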
I've prepared a proposed patch for this, which is attached. The code changes are rather straightforward. I did have to add the ability to carry around an extra pointer-sized object in the simple_list implementation, so that the filter clause can be associated with the OIDs of the tables matching the pattern. Augmenting the existing simple_list implementation seemed the best way to do this while changing as little as possible elsewhere in the codebase. (Note that SimpleOidList is actually only used by pg_dump.)
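For readers who don't want to open the patch, the idea of a list cell that carries an extra pointer looks roughly like the standalone sketch below. The names (OidCell, OidList, oid_list_append, oid_list_lookup) are made up for illustration and are not the identifiers used in simple_list or in the patch. Oid is typedef'd locally as a stand-in for PostgreSQL's type.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* One list cell: an OID plus an opaque pointer-sized payload. */
typedef struct OidCell
{
    struct OidCell *next;
    Oid         val;
    void       *extra;          /* e.g. the filter clause; not owned by the list */
} OidCell;

typedef struct OidList
{
    OidCell    *head;
    OidCell    *tail;
} OidList;

/* Append val to the list, remembering the payload that goes with it. */
static void
oid_list_append(OidList *list, Oid val, void *extra)
{
    OidCell    *cell;

    assert(list != NULL);
    cell = malloc(sizeof(OidCell));
    if (cell == NULL)
        abort();                /* out of memory: give up in this sketch */

    cell->next = NULL;
    cell->val = val;
    cell->extra = extra;

    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
}

/* Return the payload stored for val, or NULL if val is absent (or has none). */
static void *
oid_list_lookup(const OidList *list, Oid val)
{
    const OidCell *cell;

    for (cell = list->head; cell != NULL; cell = cell->next)
    {
        if (cell->val == val)
            return cell->extra;
    }
    return NULL;
}
```

With something like this, each OID that the table pattern expands to can be appended along with a pointer to the same filter string, and the dump code can look up the filter by OID when it reaches that table.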
Feel free to review and propose any amendments.