Re: [HACKERS] Bug in pg_dump --table and --exclude-table fordeclarative partition table handling. - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Bug in pg_dump --table and --exclude-table fordeclarative partition table handling.
Date
Msg-id CAFjFpRfMX_aTXrAeWNoGkwfzhgZ18S7gE7tqfL4y0q8+UzpKew@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Bug in pg_dump --table and --exclude-table fordeclarative partition table handling.  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: [HACKERS] Bug in pg_dump --table and --exclude-table fordeclarative partition table handling.  (amul sul <sulamul@gmail.com>)
List pgsql-hackers
On Tue, May 9, 2017 at 2:59 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Hi Amul,
>
> On 2017/05/09 16:13, amul sul wrote:
>> Hi,
>>
>> Current pg_dump --exclude-table option excludes partitioned relation
>> and dumps all its child relations and vice versa for --table option, which
>> I think is incorrect.
>
> I agree that `pg_dump -t partitioned_table` should dump all of its
> partitions too and that `pg_dump -T partitioned_table` should exclude
> partitions.  Your patch achieves that.  Some comments:
>
> I think we should update doc/src/sgml/ref/pg_dump.sgml to mention this
> behavior.
>
> +static void
> +find_partition_by_relid(Archive *fout, Oid parentId, SimpleOidList *oids)
>
> Is expand_partitioned_table() a slightly better name?
>
>
> +               appendPQExpBuffer(query, "SELECT relkind "
> +                                                 "FROM pg_catalog.pg_class "
> +                                                 "WHERE oid = %u", partid);
> +
> +               res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
> +
> +               if (PQntuples(res) == 0)
> +                       exit_horribly(NULL, "no matching partition tables were ");
> +
> +               relkind = PQgetvalue(res, 0, 0);
> +
> +               if (relkind[0] == RELKIND_PARTITIONED_TABLE)
> +                       find_partition_by_relid(fout, partid, oids);
>
> Instead of recursing like this requiring to send one query for every
> partition, why not issue just one query such that all the partitions in
> the inheritance tree are returned.  For example, as follows:
>
> +    appendPQExpBuffer(query, "WITH RECURSIVE partitions (partoid) AS ("
> +                             "   SELECT inhrelid"
> +                             "   FROM pg_inherits"
> +                             "   WHERE inhparent = %u"
> +                             "     UNION ALL"
> +                             "   SELECT inhrelid"
> +                             "   FROM pg_inherits, partitions"
> +                             "   WHERE inhparent = partitions.partoid )"
> +                             " SELECT partoid FROM partitions",
> +                             parentId);
>
> I included your patch with the above modifications in the attached 0001
> patch, which also contains the documentation updates.  Please take a look.

I think this patch too has the same problem I described in my reply to
Amul; it fires a query to server for every partitioned table it
encounters, that's not very efficient.

>
> When testing the patch, I found that if --table specifies the parent and
> --exclude specifies one of its partitions, the latter won't be forcefully
> be included due to the partitioned table expansion, which seems like an
> acceptable behavior.

unless the partition is default partition or a hash partition.

> On the other hand, if --table specifies a partition
> and --exclude specifies the partition's parent, the latter makes partition
> to be excluded as well as part of the expansion, which seems fine too.
>

I am not sure if it's worth considering partitions and partitioned
table as different entities as far as dump is concerned. We should
probably dump the whole partitioned table or none of it. There's merit
in dumping just a single partition to transfer that data to some other
server, but I am not sure how much the feature would be used.

In order to avoid any such potential anomalies, we should copy dump
flag for a partition from that of the parent as I have described in my
reply to Amul.

> One more thing I am wondering about is whether `pg_dump -t partition`
> should be dumped as a partition definition (that is, as CREATE TABLE
> PARTITION OF) which is what happens currently or as a regular table (just
> CREATE TABLE)?  I'm thinking the latter would be better, but would like to
> confirm before writing any code for that.

If we go about dumping a partition without its parent table, we should
dump CREATE TABLE with proper list of columns and constraints without
PARTITION OF clause. But I am not sure whether we should go that
route.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Jeevan Ladhe
Date:
Subject: Re: [HACKERS] Bug in pg_dump --table and --exclude-table fordeclarative partition table handling.
Next
From: "Mengxing Liu"
Date:
Subject: [HACKERS] [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking inserializable transactions