Re: Add --include-table-data-where option to pg_dump, to export onlya subset of table data - Mailing list pgsql-hackers

From Jeremy Finzel
Subject Re: Add --include-table-data-where option to pg_dump, to export onlya subset of table data
Date
Msg-id CAMa1XUiGKd+xMO35JOtt9R=E-_Hxcyi0PJ3iZmGh8+6MBbpOAg@mail.gmail.com
Whole thread Raw
In response to Add --include-table-data-where option to pg_dump, to export only asubset of table data  (Carter Thaxton <carter.thaxton@gmail.com>)
Responses Re: Add --include-table-data-where option to pg_dump, to export onlya subset of table data
List pgsql-hackers


On Mon, May 21, 2018 at 6:34 AM Carter Thaxton <carter.thaxton@gmail.com> wrote:
Many times I've wanted to export a subset of a database, using some sort of row filter condition on some of the large tables.  E.g. copying a production database to a staging environment, but with some time series data only from the past month.

We have the existing options:
  --include-table=table    (and its -t synonym)
  --exclude-table=table
  --exclude-table-data=table

I propose a new option:
  --include-table-data-where=table:filter_clause

One would use this option as follows:

  pg_dump --include-table-data-where=largetable:"created_at >= '2018-05-01'" database_name

The filter_clause is used as the contents of a WHERE clause when querying the data to generate the COPY statement produced by pg_dump.

I've prepared a proposed patch for this, which is attached.  The code changes are rather straightforward.  I did have to add the ability to carry around an extra pointer-sized object to the simple_list implementation, in order to allow the filter clause to be associated to the matching oids of the table pattern.  It seemed the best way to augment the existing simple_list implementation, but change as little as possible elsewhere in the codebase.  (Note that SimpleOidList is actually only used by pg_dump).

Feel free to review and propose any amendments.


Why not simply use \copy (select * from largetable where created_at >= '2018-05-01') to stdout? That is what I’ve always done when I need something like this and have not found it particularly bothersome but rather quite powerful. And here you have tons of flexibility because you can do joins and whatever else.

FWIW. Thanks,
Jeremy 

pgsql-hackers by date:

Previous
From: Ildus Kurbangaliev
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: "David G. Johnston"
Date:
Subject: Re: Add --include-table-data-where option to pg_dump, to export onlya subset of table data