Thread: [HACKERS] reload-through-the-top-parent switch the partition table
https://www.postgresql.org/mes sage-id/CA%2BTgmoZFn7TJ7QBsFat nuEE%3DGYGdZSNXqr9489n5JBsdy5r FfA%40mail.gmail.com
Testcase:
create schema a;
create schema b;
create schema c;
create table a.t1 ( a int, b int ) partition by list(a);
create table b.t1_p1 partition of a.t1 FOR VALUES in ( 1, 2, 3) partition by list(b);
create table c.t1_p1_p1 partition of b.t1_p1 FOR VALUES in (10, 20 );
insert into a.t1 values ( 2 , 10 );
insert into a.t1 values ( 1 , 20 );
Thanks,
Above thread, it's been pointed out as important consideration
about adding reload-through-the-top- parent switch the partition
table. One small step toward making use of hash function was
adding a switch into pg_dump which reload through the top
parent for the partition table.
Attach patch does the implementation for the same:
- Added switch reload-through-root: (Open for suggestion for the switch name)
- Fix dumpTableData to COPY to the Root table with the reload-through-root switch.
- Fix dumpTableData_insert - to generate the proper insert statement with the reload-through-root switch.
- Added switch reload-through-root into pg_dumpall
Testing:
- Performed test with multi level partition + different schema combination.
- Tested a case with parent and child attributes in different order.
Attaching the pg_dump output for below test with --reload-through-root for COPY and INSERTS.
Testcase:
create schema a;
create schema b;
create schema c;
create table a.t1 ( a int, b int ) partition by list(a);
create table b.t1_p1 partition of a.t1 FOR VALUES in ( 1, 2, 3) partition by list(b);
create table c.t1_p1_p1 partition of b.t1_p1 FOR VALUES in (10, 20 );
insert into a.t1 values ( 2 , 10 );
insert into a.t1 values ( 1 , 20 );
My colleague Robert and I had doubt about the order in of TABLE
and TABLE_DATA. We thought earlier that reload-thought-root might
might not solve the purpose which has been discussed in the above
mentioned thread. But later looking into code I realize the sort order
for DO_TABLE and DO_TABLE_DATA are different, so we don't need
to worry about that issue.
might not solve the purpose which has been discussed in the above
mentioned thread. But later looking into code I realize the sort order
for DO_TABLE and DO_TABLE_DATA are different, so we don't need
to worry about that issue.
TODOs:
- Update documentation for pg_dump & pg_dumpallRushabh Lathia
Attachment
On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > My colleague Robert and I had doubt about the order in of TABLE > and TABLE_DATA. We thought earlier that reload-thought-root might > might not solve the purpose which has been discussed in the above > mentioned thread. But later looking into code I realize the sort order > for DO_TABLE and DO_TABLE_DATA are different, so we don't need > to worry about that issue. Hmm. Does that mean that table data restoration will *absolutely always* follow all CREATE TABLE commands, or is that something that's *normally* true but potentially false if dependency sorting switches things around? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 2, 2017 at 3:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 1, 2017 at 5:34 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> My colleague Robert and I had doubt about the order in of TABLE
> and TABLE_DATA. We thought earlier that reload-thought-root might
> might not solve the purpose which has been discussed in the above
> mentioned thread. But later looking into code I realize the sort order
> for DO_TABLE and DO_TABLE_DATA are different, so we don't need
> to worry about that issue.
Hmm. Does that mean that table data restoration will *absolutely
always* follow all CREATE TABLE commands, or is that something that's
*normally* true but potentially false if dependency sorting switches
things around?
Looking at the dbObjectTypePriority comments that seems like data restoration
will *absolutely always* follow all CREATE TABLE commands.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Rushabh Lathia
On Wed, Aug 2, 2017 at 1:01 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > Looking at the dbObjectTypePriority comments that seems like data > restoration > will *absolutely always* follow all CREATE TABLE commands. Hmm. I wasn't very convinced by those comments, but Tom's commit a1ef01fe163b304760088e3e30eb22036910a495 convinces me that it has to work that way. So I think we are OK on that score. The patch itself looks just fine on a quick glance, modulo the lack of documentation, but I think we need to bikeshed the name of the flag. --reload-through-root is clear as daylight to me, but I'm not sure users will agree. The lack of the word "partition" is perhaps a significant flaw, and pg_dump doesn't really reload anything; it just dumps. The best thing I can come up with after brief thought is --partition-data-via-root, but maybe somebody else has a better idea? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > The patch itself looks just fine on a quick glance, modulo the lack of > documentation, but I think we need to bikeshed the name of the flag. > --reload-through-root is clear as daylight to me, but I'm not sure > users will agree. The lack of the word "partition" is perhaps a > significant flaw, and pg_dump doesn't really reload anything; it just > dumps. > The best thing I can come up with after brief thought is > --partition-data-via-root, but maybe somebody else has a better idea? --restore-via-partition-root ? regards, tom lane
On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> The patch itself looks just fine on a quick glance, modulo the lack of >> documentation, but I think we need to bikeshed the name of the flag. >> --reload-through-root is clear as daylight to me, but I'm not sure >> users will agree. The lack of the word "partition" is perhaps a >> significant flaw, and pg_dump doesn't really reload anything; it just >> dumps. > >> The best thing I can come up with after brief thought is >> --partition-data-via-root, but maybe somebody else has a better idea? > > --restore-via-partition-root ? I worry someone will think that pg_dump is now restoring stuff, but it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> --restore-via-partition-root ? > I worry someone will think that pg_dump is now restoring stuff, but it isn't. Well, the point is that the commands it emits will cause the eventual restore to go through the root. Anyway, I think trying to avoid using a verb altogether is going to result in a very stilted option name. I notice that the option list already includes some references to "insert", so maybe "--insert-via-partition-root"? Although you could argue that that's confusing when we're using COPY. regards, tom lane
Re: [HACKERS] reload-through-the-top-parent switch the partition table
From
"David G. Johnston"
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> --restore-via-partition-root ?
> I worry someone will think that pg_dump is now restoring stuff, but it isn't.
Well, the point is that the commands it emits will cause the eventual
restore to go through the root. Anyway, I think trying to avoid using
a verb altogether is going to result in a very stilted option name.
I notice that the option list already includes some references to
"insert", so maybe "--insert-via-partition-root"? Although you could
argue that that's confusing when we're using COPY.
--use-partitioned-table [partition-name, ...] # if names are omitted it defaults to all partitioned tables
I don't know that we need to use "root" in the argument name to communicate "the top-most if partitioned tables are nested". We have the docs to describe exactly what it does. "Partitioned Table" is what we are calling the main routing table in the docs. "Use" seems adequate.
FWIW my first thought was "--insert-via-partition-root" and I didn't mind that "COPY" commands would be affected implicitly.
David J.
On Wed, Aug 2, 2017 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> --restore-via-partition-root ? > >> I worry someone will think that pg_dump is now restoring stuff, but it isn't. > > Well, the point is that the commands it emits will cause the eventual > restore to go through the root. Anyway, I think trying to avoid using > a verb altogether is going to result in a very stilted option name. > > I notice that the option list already includes some references to > "insert", so maybe "--insert-via-partition-root"? Although you could > argue that that's confusing when we're using COPY. Yeah, that's definitely confusing. I realize that my verbless version is a little odd, but there are numerous precedents for it: --inserts, --column-inserts, --if-exists, --strict-names, ... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/2/17 13:58, Tom Lane wrote: > I notice that the option list already includes some references to > "insert", so maybe "--insert-via-partition-root"? Although you could > argue that that's confusing when we're using COPY. "load" could be more general. But I'm also OK with "restore". -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Robert Haas <robertmhaas@gmail.com> writes: >> > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> --restore-via-partition-root ? >> >> > I worry someone will think that pg_dump is now restoring stuff, but it >> > isn't. >> >> Well, the point is that the commands it emits will cause the eventual >> restore to go through the root. Anyway, I think trying to avoid using >> a verb altogether is going to result in a very stilted option name. >> >> I notice that the option list already includes some references to >> "insert", so maybe "--insert-via-partition-root"? Although you could >> argue that that's confusing when we're using COPY. > > > --use-partitioned-table [partition-name, ...] # if names are omitted it > defaults to all partitioned tables I like this idea since it allows using this feature for selected tables e.g. hash tables. Otherwise, users will be forced to use this option even when there is only a single hash partitioned table and many other list/range partitioned tables. What we are trying to do here is dump the data in a partitioned table as if it's not partitioned. Combine that with --data-only dumps, and one could use it to load partitioned data into unpartitioned or differently partitioned table. So, how about naming the option as --unpartition-partitioned-table [partitioned-table-name, ....] # if names are omitted it defaults to all the partitioned tables. That really says what dump is really doing without focusing on how the data will be used like restoring/inserting/copying etc. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Thu, Aug 3, 2017 at 3:39 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Wed, Aug 2, 2017 at 11:47 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Aug 2, 2017 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > On Wed, Aug 2, 2017 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> --restore-via-partition-root ?
>>
>> > I worry someone will think that pg_dump is now restoring stuff, but it
>> > isn't.
>>
>> Well, the point is that the commands it emits will cause the eventual
>> restore to go through the root. Anyway, I think trying to avoid using
>> a verb altogether is going to result in a very stilted option name.
>>
>> I notice that the option list already includes some references to
>> "insert", so maybe "--insert-via-partition-root"? Although you could
>> argue that that's confusing when we're using COPY.
>
>
> --use-partitioned-table [partition-name, ...] # if names are omitted it
> defaults to all partitioned tables
I like this idea since it allows using this feature for selected
tables e.g. hash tables. Otherwise, users will be forced to use this
option even when there is only a single hash partitioned table and
many other list/range partitioned tables.
+1.
What we are trying to do here is dump the data in a partitioned table
as if it's not partitioned. Combine that with --data-only dumps, and
one could use it to load partitioned data into unpartitioned or
differently partitioned table. So, how about naming the option as
--unpartition-partitioned-table [partitioned-table-name, ....] # if
names are omitted it defaults to all the partitioned tables.
--unpartition-partitioned- table is very confusing.
I liked the previous option which is --use-partitioned-table [partition-name, ...].
The Only problem with --use-partitioned-table is, a user needs to specify the
partition-name in the options list. Imagine if someone having 100's of
partitions then specifying those name is pg_dump option is a pain.
Rather than that:
--use-partitioned-table [partitioned_name, ...] # if
names are omitted it defaults to all the partitioned tables.
Here user need to specify the root relation name in the option - and any
partition table have that as a ROOT, will load the data through
top-parent-relation.
That really says what dump is really doing without focusing on how the
data will be used like restoring/inserting/copying etc.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Rushabh Lathia
Re: [HACKERS] reload-through-the-top-parent switch the partition table
From
"David G. Johnston"
Date:
On Thursday, August 3, 2017, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
--use-partitioned-table [partitioned_name, ...] # ifnames are omitted it defaults to all the partitioned tables.Here user need to specify the root relation name in the option - and anypartition table have that as a ROOT, will load the data throughtop-parent-relation.
This is indeed what I intended to convey. Only specify "root" names.
David J.
On Thu, Aug 3, 2017 at 9:01 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > --unpartition-partitioned-table is very confusing. +1. > I liked the previous option which is --use-partitioned-table > [partition-name, ...]. > The Only problem with --use-partitioned-table is, a user needs to specify > the > partition-name in the options list. Imagine if someone having 100's of > partitions then specifying those name is pg_dump option is a pain. Yeah, that won't work. > Rather than that: > > --use-partitioned-table [partitioned_name, ...] # if > names are omitted it defaults to all the partitioned tables. > > Here user need to specify the root relation name in the option - and any > partition table have that as a ROOT, will load the data through > top-parent-relation. We could do that, but I'm not sure it's a good idea to use getopt_long() with optional options. Sometimes that creates confusion -- is pg_dump --use-partitioned-table salad an attempt to dump the salad database with the --use-partitioned-table option, or an attempt to apply --use-partitioned-table only to partitions whose parent is the salad table? getopt_long() has an answer, but some people may guess incorrectly about what it is. I would be more inclined to make this a global option than something that modifies the behavior for certain tables; the only per-table flags we have right now are just to include/exclude individual tables. You could make --inserts or --no-unlogged-table-data apply to some but not all tables, but we didn't; why start here? I don't like the specific name --use-partitioned-table much either. Use it for what? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 2, 2017 at 4:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 8/2/17 13:58, Tom Lane wrote: >> I notice that the option list already includes some references to >> "insert", so maybe "--insert-via-partition-root"? Although you could >> argue that that's confusing when we're using COPY. > > "load" could be more general. But I'm also OK with "restore". "load" seems better than "restore" to me, both because it's shorter and because it sounds less like pg_dump will be doing the job of pg_restore. So maybe --load-via-partition-root if nobody likes my previous suggestion of --partition-data-via-root ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > So maybe --load-via-partition-root if nobody likes my previous > suggestion of --partition-data-via-root ? WFM. regards, tom lane
Re: [HACKERS] reload-through-the-top-parent switch the partition table
From
"David G. Johnston"
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> So maybe --load-via-partition-root if nobody likes my previous
> suggestion of --partition-data-via-root ?
WFM.
+1
David J.
On 2017/08/04 1:08, David G. Johnston wrote: > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Robert Haas <robertmhaas@gmail.com> writes: >>> So maybe --load-via-partition-root if nobody likes my previous >>> suggestion of --partition-data-via-root ? >> >> WFM. >> > > +1 +1. Thanks, Amit
On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/08/04 1:08, David G. Johnston wrote:
> On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> So maybe --load-via-partition-root if nobody likes my previous
>>> suggestion of --partition-data-via-root ?
>>
>> WFM.
>>
>
> +1
+1.
Thanks,
Amit
Thanks,
Rushabh Lathia
Attachment
Re: [HACKERS] reload-through-the-top-parent switch the partition table
From
Rajkumar Raghuwanshi
Date:
Hi Rushabh,
While testing latest v2 patch, I got a crash when using --load-via-partition-root with --schema options. Below are steps to reproduce.create schema a;
create schema b;
create schema c;
create table t1 (a int,b text) partition by list(a);
create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by list(a);
create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2);
create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4);
create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by list(a);
create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6);
create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8);
insert into t1 values (8,'t1');
insert into a.t1_p1 values (2,'a.t1_p1');
insert into b.t1_p1_p1 values (1,'b.t1_p1_p1');
insert into c.t1_p1_p2 values (3,'c.t1_p1_p2');
insert into b.t1_p2 values (6,'b.t1_p2');
insert into a.t1_p2_p1 values (5,'a.t1_p2_p1');
insert into t1_p2_p2 values (7,'t1_p2_p1');
insert into t1 values (4 ,'t1');
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f d2.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root --schema=a -f d3.dump -Fp
pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents == 1' failed.
Aborted (core dumped)
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root --schema=a -f d3.dump -Fp
pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents == 1' failed.
Aborted (core dumped)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
as pg_dumpall.and also added the documentation for the new switch into pg_dump as wellHere is an update patch, now renamed the switch to --load-via-partition-rootOn Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: On 2017/08/04 1:08, David G. Johnston wrote:
> On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> So maybe --load-via-partition-root if nobody likes my previous
>>> suggestion of --partition-data-via-root ?
>>
>> WFM.
>>
>
> +1
+1.
Thanks,
AmitThanks,Rushabh Lathia
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks Rajkumar for testing and reporting this.
It seems like with we set the numParents and parents only for thedumpable objects (flagInhTables()). Current patch relies on the numParents
and parents to get the root partition TableInfo, but when --schema is been
specified - it doesn't load those information for the object which is not
dumpable.
Now one options is:
1) restrict the --load-via-partition-root to be used with
the --schema or may be some other options - where we restrict the
objects.
Consider this, partition root is in schema 'a' and the partition table is in
schema 'b', if someone specify the --schema b with --load-via-partition-root,
schema 'b', if someone specify the --schema b with --load-via-partition-root,
I think we should not do "INSERT INTO a.tab" to load the data (because
user specified --schema b). 2) fix flagInhTables() to load the numParents and the parents information
for the partition table (can be checked using ispartition), irrespective of
whether object is dumpable is true or not.
May be something like:
@@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables,
/* Don't bother computing anything for non-target tables, either */
if (!tblinfo[i].dobj.dump)
+ {
+ if (tblinfo[i].ispartition)
+ findParentsByOid(&tblinfo[i], inhinfo, numInherits);
continue;
+ }
@@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables,
/* Don't bother computing anything for non-target tables, either */
if (!tblinfo[i].dobj.dump)
+ {
+ if (tblinfo[i].ispartition)
+ findParentsByOid(&tblinfo[i], inhinfo, numInherits);
continue;
+ }
I am still looking into this, meanwhile any inputs are welcome.
On Tue, Aug 8, 2017 at 4:10 PM, Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
[edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp--trying to take pg_dump--create below test dataHi Rushabh,While testing latest v2 patch, I got a crash when using --load-via-partition-root with --schema options. Below are steps to reproduce.
create schema a;
create schema b;
create schema c;
create table t1 (a int,b text) partition by list(a);
create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by list(a);
create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2);
create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4);
create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by list(a);
create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6);
create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8);
insert into t1 values (8,'t1');
insert into a.t1_p1 values (2,'a.t1_p1');
insert into b.t1_p1_p1 values (1,'b.t1_p1_p1');
insert into c.t1_p1_p2 values (3,'c.t1_p1_p2');
insert into b.t1_p2 values (6,'b.t1_p2');
insert into a.t1_p2_p1 values (5,'a.t1_p2_p1');
insert into t1_p2_p2 values (7,'t1_p2_p1');
insert into t1 values (4 ,'t1');[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f d2.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root --schema=a -f d3.dump -Fp
pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents == 1' failed.
Aborted (core dumped)Thanks & Regards,Rajkumar RaghuwanshiQMG, EnterpriseDB CorporationOn Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:--as pg_dumpall.and also added the documentation for the new switch into pg_dump as wellHere is an update patch, now renamed the switch to --load-via-partition-rootOn Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: On 2017/08/04 1:08, David G. Johnston wrote:
> On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> So maybe --load-via-partition-root if nobody likes my previous
>>> suggestion of --partition-data-via-root ?
>>
>> WFM.
>>
>
> +1
+1.
Thanks,
AmitThanks,Rushabh Lathia
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Rushabh Lathia
On Tue, Aug 8, 2017 at 6:18 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > Thanks Rajkumar for testing and reporting this. > > > It seems like with we set the numParents and parents only for the > dumpable objects (flagInhTables()). Current patch relies on the numParents > and parents to get the root partition TableInfo, but when --schema is been > specified - it doesn't load those information for the object which is not > dumpable. > > Now one options is: > > 1) restrict the --load-via-partition-root to be used with > the --schema or may be some other options - where we restrict the > objects. > > Consider this, partition root is in schema 'a' and the partition table is in > schema 'b', if someone specify the --schema b with > --load-via-partition-root, > I think we should not do "INSERT INTO a.tab" to load the data (because > user specified --schema b). > +1, this looks cleaner to me. > 2) fix flagInhTables() to load the numParents and the parents information > for the partition table (can be checked using ispartition), irrespective of > whether object is dumpable is true or not. > > May be something like: > > @@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables, > > /* Don't bother computing anything for non-target tables, either */ > if (!tblinfo[i].dobj.dump) > + { > + if (tblinfo[i].ispartition) > + findParentsByOid(&tblinfo[i], inhinfo, numInherits); > continue; > + } > > I am still looking into this, meanwhile any inputs are welcome. > See the note given in the pg_dump document[1] is : "When -n is specified, pg_dump makes no attempt to dump any other database objects that the selected schema(s) might depend upon. Therefore, there is no guarantee that the results of a specific-schema dump can be successfully restored by themselves into a clean database." If we want to follow the same trend then we could simply dump all partition(ed) belong to a specified schema. In the Rajkumar’s example, we need to dump partitions belong to schema 'a' (i.e t1_p1 and t1_p2_p1), and target root for the insert query will be the same table or the parent belong to the same schema. Regards, Amul [1] https://www.postgresql.org/docs/devel/static/app-pgdump.html
On Tue, Aug 8, 2017 at 8:48 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > It seems like with we set the numParents and parents only for the > dumpable objects (flagInhTables()). Current patch relies on the numParents > and parents to get the root partition TableInfo, but when --schema is been > specified - it doesn't load those information for the object which is not > dumpable. > > Now one options is: > > 1) restrict the --load-via-partition-root to be used with > the --schema or may be some other options - where we restrict the > objects. > > Consider this, partition root is in schema 'a' and the partition table is in > schema 'b', if someone specify the --schema b with > --load-via-partition-root, > I think we should not do "INSERT INTO a.tab" to load the data (because > user specified --schema b). > > 2) fix flagInhTables() to load the numParents and the parents information > for the partition table (can be checked using ispartition), irrespective of > whether object is dumpable is true or not. (1) seems like a pretty arbitrary restriction, so I don't like that option. (2) would hurt performance in some use cases. Do we have an option (3)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 9, 2017 at 1:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 8, 2017 at 8:48 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> It seems like with we set the numParents and parents only for the
> dumpable objects (flagInhTables()). Current patch relies on the numParents
> and parents to get the root partition TableInfo, but when --schema is been
> specified - it doesn't load those information for the object which is not
> dumpable.
>
> Now one options is:
>
> 1) restrict the --load-via-partition-root to be used with
> the --schema or may be some other options - where we restrict the
> objects.
>
> Consider this, partition root is in schema 'a' and the partition table is in
> schema 'b', if someone specify the --schema b with
> --load-via-partition-root,
> I think we should not do "INSERT INTO a.tab" to load the data (because
> user specified --schema b).
>
> 2) fix flagInhTables() to load the numParents and the parents information
> for the partition table (can be checked using ispartition), irrespective of
> whether object is dumpable is true or not.
(1) seems like a pretty arbitrary restriction, so I don't like that
option. (2) would hurt performance in some use cases. Do we have an
option (3)?
How about protecting option 2) with the load-via-partition-root protection. Means
load the parents information even dump is not set only when load-via-partition-root
& ispartition.
This won't hurt performance for the normal cases.
Attaching the patch.
--
Rushabh Lathia
Attachment
On Thu, Aug 10, 2017 at 3:47 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: >> (1) seems like a pretty arbitrary restriction, so I don't like that >> option. (2) would hurt performance in some use cases. Do we have an >> option (3)? > > How about protecting option 2) with the load-via-partition-root protection. > Means > load the parents information even dump is not set only when > load-via-partition-root > & ispartition. > > This won't hurt performance for the normal cases. Yes, that seems like the right approach. + Dump data via the top-most partitioned table (rather than partition + table) when dumping data for the partition table. I think we should phrase this a bit more clearly, something like this: When dumping a COPY or INSERT statement for a partitioned table, target the root of the partitioning hierarchy which contains it rather than the partition itself. This may be useful when reloading data on a server where rows do not always fall into the same partitions as they did on the original server. This could happen, for example, if the partitioning column is of type text and the two system have different definitions of the collation used to partition the data. + printf(_(" --load-via-partition-root load partition table via the root relation\n")); "relation" seems odd to me here. root table? /* Don't bother computing anything for non-target tables, either */ if (!tblinfo[i].dobj.dump) + { + /* + * Load the parents information for the partition table when + * the load-via-partition-root option is set. As we need the + * parents information to get the partition root. + */ + if (dopt->load_via_partition_root && + tblinfo[i].ispartition) + findParentsByOid(&tblinfo[i], inhinfo, numInherits); continue; + } Duplicating the call to findParentsByOid seems less then ideal. How about doing something like this: if (tblinfo[i].dobj.dump) { find_parents = true; mark_parents = true; } else if (dopt->load_via_partition_root && tblinfo[i].ispartition) find_parents = true; if (find_parents) findParentsByOid(&tblinfo[i], inhinfo, numInherits); etc. The comments for this function also need some work - e.g. the function header comment deserves some kind of update for these changes. +static TableInfo * +getRootTableInfo(TableInfo *tbinfo) +{ + Assert(tbinfo->ispartition); + Assert(tbinfo->numParents == 1); + if (tbinfo->parents[0]->ispartition) + return getRootTableInfo(tbinfo->parents[0]); + + return tbinfo->parents[0]; +} This code should iterate, not recurse, to avoid any risk of blowing out the stack. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 10, 2017 at 8:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 10, 2017 at 3:47 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
>> (1) seems like a pretty arbitrary restriction, so I don't like that
>> option. (2) would hurt performance in some use cases. Do we have an
>> option (3)?
>
> How about protecting option 2) with the load-via-partition-root protection.
> Means
> load the parents information even dump is not set only when
> load-via-partition-root
> & ispartition.
>
> This won't hurt performance for the normal cases.
Yes, that seems like the right approach.
+ Dump data via the top-most partitioned table (rather than partition
+ table) when dumping data for the partition table.
I think we should phrase this a bit more clearly, something like this:
When dumping a COPY or INSERT statement for a partitioned table,
target the root of the partitioning hierarchy which contains it rather
than the partition itself. This may be useful when reloading data on
a server where rows do not always fall into the same partitions as
they did on the original server. This could happen, for example, if
the partitioning column is of type text and the two system have
different definitions of the collation used to partition the data.
Done.
+ printf(_(" --load-via-partition-root load partition table via
the root relation\n"));
"relation" seems odd to me here. root table?
Done.
/* Don't bother computing anything for non-target tables, either */
if (!tblinfo[i].dobj.dump)
+ {
+ /*
+ * Load the parents information for the partition table when
+ * the load-via-partition-root option is set. As we need the
+ * parents information to get the partition root.
+ */
+ if (dopt->load_via_partition_root &&
+ tblinfo[i].ispartition)
+ findParentsByOid(&tblinfo[i], inhinfo, numInherits);
continue;
+ }
Duplicating the call to findParentsByOid seems less then ideal. How
about doing something like this:
if (tblinfo[i].dobj.dump)
{
find_parents = true;
mark_parents = true;
}
else if (dopt->load_via_partition_root && tblinfo[i].ispartition)
find_parents = true;
if (find_parents)
findParentsByOid(&tblinfo[i], inhinfo, numInherits);
etc.
Done changes to avoid duplicate call to findParentsByOid().
The comments for this function also need some work - e.g. the function
header comment deserves some kind of update for these changes.
Done.
+static TableInfo *
+getRootTableInfo(TableInfo *tbinfo)
+{
+ Assert(tbinfo->ispartition);
+ Assert(tbinfo->numParents == 1);
+ if (tbinfo->parents[0]->ispartition)
+ return getRootTableInfo(tbinfo->parents[0]);
+
+ return tbinfo->parents[0];
+}
This code should iterate, not recurse, to avoid any risk of blowing
out the stack.
Done.
Please find attach patch with the changes.
Thanks,
Rushabh Lathia
Attachment
On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > Please find attach patch with the changes. I found the way that you had the logic structured in flagInhTables() to be quite hard to follow, so I rewrote it in the attached version. This version also has a bunch of minor cosmetic changes. Barring objections, I'll commit it once the tree opens for v11 development. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Aug 11, 2017 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> Please find attach patch with the changes.
I found the way that you had the logic structured in flagInhTables()
to be quite hard to follow, so I rewrote it in the attached version.
This version also has a bunch of minor cosmetic changes. Barring
objections, I'll commit it once the tree opens for v11 development.
Thanks Robert.
Patch changes looks good to me.
regards,
Rushabh Lathia
On Mon, Aug 14, 2017 at 12:40 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > On Fri, Aug 11, 2017 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia >> <rushabh.lathia@gmail.com> wrote: >> > Please find attach patch with the changes. >> >> I found the way that you had the logic structured in flagInhTables() >> to be quite hard to follow, so I rewrote it in the attached version. >> This version also has a bunch of minor cosmetic changes. Barring >> objections, I'll commit it once the tree opens for v11 development. > > Thanks Robert. > > Patch changes looks good to me. Thanks. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company