Thread: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Tomas Vondra
Date:
Hi,

I've been doing some testing today, and it seems heap_sync is somewhat
confused by partitioned tables. I'm doing a COPY into a partitioned
table (lineitem from TPC-H partitioned per month) like this:

---------------------------------------------------------------------
BEGIN;

CREATE TABLE lineitem (...) PARTITION BY RANGE (l_shipdate);

CREATE TABLE lineitem_1992_01 PARTITION OF lineitem FOR VALUES FROM
('1992-01-01') TO ('1992-02-01');

...

CREATE TABLE lineitem_1998_12 PARTITION OF lineitem FOR VALUES FROM
('1998-12-01') TO ('1999-01-01');

COPY INTO lineitem FROM ....

COMMIT;
---------------------------------------------------------------------

I'm observing COPY failing like this:

ERROR: could not open file "base/16384/16427": No such file or directory

where the relfilenode matches the "lineitem" relation (which is
guaranteed to be empty, as it's partitioned, so it's not surprising the
relfilenode does not exist).

After adding an Assert(false) into mdopen(), which is where the error
comes from, I got this backtrace (full backtrace attached):

...
#3  0x0000000000766387 in mdopen (...) at md.c:609
#4  0x00000000007674b7 in mdnblocks (...) at md.c:876
#5  0x0000000000767a61 in mdimmedsync (...) at md.c:1034
#6  0x00000000004c62b5 in heap_sync (...) at heapam.c:9399
#7  0x00000000005a2bdd in CopyFrom (...) at copy.c:2890
#8  0x00000000005a1621 in DoCopy (...) at copy.c:992
...

So apparently CopyFrom() invokes heap_sync() on the partitioned
relation, which attempts to do mdimmedsync() only on the root. That
seems like a bug to me.

Obviously this only applies to wal_level=minimal. There are multiple
callers of heap_sync(), but only the COPY seems to be affected by this,
because the rest can't see partitioned tables.

So it seems heap_sync() needs to be improved to sync all partitions.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On 19 September 2018 at 12:21, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> So apparently CopyFrom() invokes heap_sync() on the partitioned
> relation, which attempts to do mdimmedsync() only on the root. That
> seems like a bug to me.
>
> Obviously this only applies to wal_level=minimal. There are multiple
> callers of heap_sync(), but only the COPY seems to be affected by this,
> because the rest can't see partitioned tables.
>
> So it seems heap_sync() needs to be improved to sync all partitions.

Wouldn't it be better to modify copy.c to just perform the heap_sync
on just the partitions it touches?

After https://commitfest.postgresql.org/19/1690/ it's very efficient
to know which partitions were used. The bulk load might just be
hitting a single partition, so it makes sense to try to just do the
ones we need to, rather than needlessly performing heap_sync() on all
of them.

For the back branches, the current form of PartitionDispatch is not
particularly inefficient to just skip over the uninitialized
partitions.

6b78231d918 did recently just make the PartitionDispatch struct's
members private to execPartition.c, so doing this will probably
require some API function that can be passed a callback function to do
what we need.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Michael Paquier
Date:
On Wed, Sep 19, 2018 at 01:14:10PM +1200, David Rowley wrote:
> On 19 September 2018 at 12:21, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> So apparently CopyFrom() invokes heap_sync() on the partitioned
>> relation, which attempts to do mdimmedsync() only on the root. That
>> seems like a bug to me.

And the root should not have any physical storage either, so attempting
to sync it is wrong.

>> Obviously this only applies to wal_level=minimal. There are multiple
>> callers of heap_sync(), but only the COPY seems to be affected by this,
>> because the rest can't see partitioned tables.
>>
>> So it seems heap_sync() needs to be improved to sync all partitions.
>
> Wouldn't it be better to modify copy.c to just perform the heap_sync
> on just the partitions it touches?

Yeah, my gut is telling me that this would be the best approach for now,
still I am not sure that this is the best move in the long term.  All
the other callers of heap_sync don't care about partitioned tables, so
we could add an assertion on RELKIND_PARTITIONED_TABLE.  Still, one
crazy argument against that approach would be that we'd need to do the
same thing again if one day CTAS or matviews allow some sort of
automatic creation of partitions.
--
Michael

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Andres Freund
Date:
On 2018-09-19 12:06:47 +0900, Michael Paquier wrote:
> On Wed, Sep 19, 2018 at 01:14:10PM +1200, David Rowley wrote:
> > Wouldn't it be better to modify copy.c to just perform the heap_sync
> > on just the partitions it touches?
> 
> Yeah, my gut is telling me that this would be the best approach for now,
> still I am not sure that this is the best move in the long term.

ISTM heap_sync() would be the entirely wrong layer to handle
partitioning. For several reasons: 1) With pluggable storage, we want to
have multiple different table implementations, doing the syncing on the
heap_* for partitions would thus be wrong. 2) In just about all cases we
only want to sync a few partitions, there's not really a use-case for
doing syncs across all partitions imo.

> All the other callers of heap_sync don't care about partitioned
> tables, so we could add an assertion on RELKIND_PARTITIONED_TABLE.

Or rather, it should assert the expected relkinds?

Greetings,

Andres Freund


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Michael Paquier
Date:
On Wed, Sep 19, 2018 at 02:48:58PM -0700, Andres Freund wrote:
> On 2018-09-19 12:06:47 +0900, Michael Paquier wrote:
>> Yeah, my gut is telling me that this would be the best approach for now,
>> still I am not sure that this is the best move in the long term.
>
> ISTM heap_sync() would be the entirely wrong layer to handle
> partitioning. For several reasons: 1) With pluggable storage, we want to
> have multiple different table implementations, doing the syncing on the
> heap_* for partitions would thus be wrong. 2) In just about all cases we
> only want to sync a few partitions, there's not really a use-case for
> doing syncs across all partitions imo.

I haven't considered things from the angle of 1), which is a very good
point.  2) is also a good argument.

>> All the other callers of heap_sync don't care about partitioned
>> tables, so we could add an assertion on RELKIND_PARTITIONED_TABLE.
>
> Or rather, it should assert the expected relkinds?

Yeah, I think that we are coming back to what heap_create assumes in
term of which relkinds should have storage or not, so a global macro
able to do the work would be adapted perhaps?
--
Michael

Attachment

Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On Thu, 20 Sep 2018 at 11:31, Andres Freund <andres@anarazel.de> wrote:
>
> On 2018-09-19 12:06:47 +0900, Michael Paquier wrote:
> > On Wed, Sep 19, 2018 at 01:14:10PM +1200, David Rowley wrote:
> > > Wouldn't it be better to modify copy.c to just perform the heap_sync
> > > on just the partitions it touches?
> >
> > Yeah, my gut is telling me that this would be the best approach for now,
> > still I am not sure that this is the best move in the long term.
>
> ISTM heap_sync() would be the entirely wrong layer to handle
> partitioning. For several reasons: 1) With pluggable storage, we want to
> have multiple different table implementations, doing the syncing on the
> heap_* for partitions would thus be wrong. 2) In just about all cases we
> only want to sync a few partitions, there's not really a use-case for
> doing syncs across all partitions imo.

I started looking into this and noticed that during the check where we
set the hi_options bits to determine if we can skip WAL and FSM
checks, we only check the table that's the target of the COPY command.
That's fine for normal tables, but for partitioned tables, it's pretty
wrong. We could end up skipping FSM checks for partitions that have
concurrent DML being performed which could bloat the table. We could
also skip writing WAL for partitions that should have had WAL written.

To fix this we'd need to also know each partition that we're about to
route tuples to was also created or truncated in the same transaction,
but we can't know that just yet since we've not opened up any
partitions to check yet.  I think the best back-patchable fix for this
is just to disable the optimization for partitioned tables.

I also noticed that we support COPY FREEZE for partitioned tables and
attempt to just check if the partitioned table was created or
truncated in the current transaction.  This is also wrong as even if
the partitioned table had been just created, it's partitions may have
already existed and actually be visible to other transactions.  It
seems best to just error out if anyone tries to do COPY FREEZE with a
partitioned table as there is no option, in this case, to build
per-partition hi_options lazily since we must error out if FREEZE
cannot be supported before the COPY starts processing tuples.

> > All the other callers of heap_sync don't care about partitioned
> > tables, so we could add an assertion on RELKIND_PARTITIONED_TABLE.
>
> Or rather, it should assert the expected relkinds?

I think an Assert() in heap_sync is a good idea.  As far as I know, it
should only ever be used for normal tables or materialized views. So
perhaps best to assert the rel is one of those?

I've attached a patch to assist discussion on the problem with setting
the incorrect hi_options for partitioned tables.

Comments are welcome.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Amit Langote
Date:
Hi,

On 2018/09/28 10:23, David Rowley wrote:
> On Thu, 20 Sep 2018 at 11:31, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2018-09-19 12:06:47 +0900, Michael Paquier wrote:
>>> On Wed, Sep 19, 2018 at 01:14:10PM +1200, David Rowley wrote:
>>>> Wouldn't it be better to modify copy.c to just perform the heap_sync
>>>> on just the partitions it touches?
>>>
>>> Yeah, my gut is telling me that this would be the best approach for now,
>>> still I am not sure that this is the best move in the long term.
>>
>> ISTM heap_sync() would be the entirely wrong layer to handle
>> partitioning. For several reasons: 1) With pluggable storage, we want to
>> have multiple different table implementations, doing the syncing on the
>> heap_* for partitions would thus be wrong. 2) In just about all cases we
>> only want to sync a few partitions, there's not really a use-case for
>> doing syncs across all partitions imo.
> 
> I started looking into this and noticed that during the check where we
> set the hi_options bits to determine if we can skip WAL and FSM
> checks, we only check the table that's the target of the COPY command.
> That's fine for normal tables, but for partitioned tables, it's pretty
> wrong. We could end up skipping FSM checks for partitions that have
> concurrent DML being performed which could bloat the table. We could
> also skip writing WAL for partitions that should have had WAL written.
> 
> To fix this we'd need to also know each partition that we're about to
> route tuples to was also created or truncated in the same transaction,
> but we can't know that just yet since we've not opened up any
> partitions to check yet.  I think the best back-patchable fix for this
> is just to disable the optimization for partitioned tables.

Agreed.  Data could go into partitions all of which may not pass the tests
used to select the value of hi_options being used to enable this
optimization.  So, it's better to avoid the optimization until we find a
way to set hi_option's value per partition.  Doing it upfront for all
partitions doesn't seem like a good idea as you already said, so we may
not consider patching it like that in back-branches.

> I also noticed that we support COPY FREEZE for partitioned tables and
> attempt to just check if the partitioned table was created or
> truncated in the current transaction.  This is also wrong as even if
> the partitioned table had been just created, it's partitions may have
> already existed and actually be visible to other transactions.  It
> seems best to just error out if anyone tries to do COPY FREEZE with a
> partitioned table as there is no option, in this case, to build
> per-partition hi_options lazily since we must error out if FREEZE
> cannot be supported before the COPY starts processing tuples.

Yeah, the same argument seems to apply here.  Not all partitions might be
created in the same sub-transaction.

>>> All the other callers of heap_sync don't care about partitioned
>>> tables, so we could add an assertion on RELKIND_PARTITIONED_TABLE.
>>
>> Or rather, it should assert the expected relkinds?
> 
> I think an Assert() in heap_sync is a good idea.  As far as I know, it
> should only ever be used for normal tables or materialized views. So
> perhaps best to assert the rel is one of those?

Agree with Andres that we should assert relkinds that heap_sync is
designed to handle, instead of, say, asserting that no partitioned tables
are passed to it.

> I've attached a patch to assist discussion on the problem with setting
> the incorrect hi_options for partitioned tables.
> 
> Comments are welcome.

Thank you for creating the patch and Tomas for reporting the bug.

Looking at the patch itself, does it seem like both the newly added
comments repeat the same point (that we'll need per-partition hi_options
to enable these optimizations) and are pretty close to each other?

Regards,
Amit



Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On 28 September 2018 at 14:25, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Looking at the patch itself, does it seem like both the newly added
> comments repeat the same point (that we'll need per-partition hi_options
> to enable these optimizations) and are pretty close to each other?

Thanks for looking at this.

I don't agree that we can skip explaining why one of the optimisations
can't be applied just because we've explained why a similar
optimisation cannot be applied somewhere close by.  I think that the
WAL/FSM optimisation can fairly easily be improved on and probably
fixed in PG12 as we can just lazily determine per-partition if it can
be applied to that partition or not.

For the FREEZE optimisation, since we ERROR out in cases where it's
requested but is not possible, it does not seem likely we'll ever fix
that since to do that we'd need to determine that all partitions have
just been truncated or were only just created in this transaction.
Since we've both recently done a bit of work in the area of speeding
up COPY, then I doubt either of us would like to go and slow it down
again by adding a pre-check that goes and opens all the partitions
before the copy begins.  That's going to have a huge negative
performance impact on small copies to 1 partition when there are many
partitions attached.

So in this regard, you'll notice that the comments are not that
similar. One explains that we could improve on it, and the other
attempts to mention that it would be surprising if we performed a
FREEZE for some partitions but not others.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Michael Paquier
Date:
On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote:
> I don't agree that we can skip explaining why one of the optimisations
> can't be applied just because we've explained why a similar
> optimisation cannot be applied somewhere close by.  I think that the
> WAL/FSM optimisation can fairly easily be improved on and probably
> fixed in PG12 as we can just lazily determine per-partition if it can
> be applied to that partition or not.

Have you guys looked at what the following patch does for partitions and
how it interacts with it?
https://commitfest.postgresql.org/19/528/

The proposed patch is missing the point that documentation also mentions
the optimizations for COPY with wal_level = minimal:
   <para>
    <command>COPY</command> is fastest when used within the same
    transaction as an earlier <command>CREATE TABLE</command> or
    <command>TRUNCATE</command> command. In such cases no WAL
    needs to be written, because in case of an error, the files
    containing the newly loaded data will be removed anyway.
    However, this consideration only applies when
    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
    must write WAL otherwise.
   </para>
--
Michael

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Amit Langote
Date:
On 2018/09/28 11:46, David Rowley wrote:
> On 28 September 2018 at 14:25, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Looking at the patch itself, does it seem like both the newly added
>> comments repeat the same point (that we'll need per-partition hi_options
>> to enable these optimizations) and are pretty close to each other?
> 
> Thanks for looking at this.
> 
> I don't agree that we can skip explaining why one of the optimisations
> can't be applied just because we've explained why a similar
> optimisation cannot be applied somewhere close by.  I think that the
> WAL/FSM optimisation can fairly easily be improved on and probably
> fixed in PG12 as we can just lazily determine per-partition if it can
> be applied to that partition or not.
> 
> For the FREEZE optimisation, since we ERROR out in cases where it's
> requested but is not possible, it does not seem likely we'll ever fix
> that since to do that we'd need to determine that all partitions have
> just been truncated or were only just created in this transaction.
> Since we've both recently done a bit of work in the area of speeding
> up COPY, then I doubt either of us would like to go and slow it down
> again by adding a pre-check that goes and opens all the partitions
> before the copy begins.  That's going to have a huge negative
> performance impact on small copies to 1 partition when there are many
> partitions attached.
> 
> So in this regard, you'll notice that the comments are not that
> similar. One explains that we could improve on it, and the other
> attempts to mention that it would be surprising if we performed a
> FREEZE for some partitions but not others.

Okay, I see the difference.  I didn't really intend to ask to remove the
second comment altogether; just the part which repeats the why (that
actual relations that the data will go into are not handy for inspection
at this point).

Looking at the second comment:

+/*
+ * We currently disallow COPY FREEZE on partitioned tables.  The
+ * reason for this is that here we're only able to determine details
+ * about the partitioned table.  The actual partitions are not opened
+ * until we start routing tuples to them, so we cannot know in advance
+ * if the partition has just been created or not.  It may be possible
+ * to relax this, but we would need to store hi_options per partition
+ * and it would possibly mean that we'd freeze some partitions but not
+ * others.  An outright ERROR seems better than surprising behavior.
+ */

Does this comment mean that we *may* consider removing this ERROR in the
future and continue with the optimization on per-partition basis even if,
as it says, it might mean that some partitions will contain frozen rows
while others not?

Thanks,
Amit



Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Amit Langote
Date:
On 2018/09/28 12:12, Michael Paquier wrote:
> On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote:
>> I don't agree that we can skip explaining why one of the optimisations
>> can't be applied just because we've explained why a similar
>> optimisation cannot be applied somewhere close by.  I think that the
>> WAL/FSM optimisation can fairly easily be improved on and probably
>> fixed in PG12 as we can just lazily determine per-partition if it can
>> be applied to that partition or not.
> 
> Have you guys looked at what the following patch does for partitions and
> how it interacts with it?
> https://commitfest.postgresql.org/19/528/

Just looked at that patch and noticed that the following hunk won't cope
if COPY's target table is partitioned:

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7674369613..7b9a7af2d2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2416,10 +2416,8 @@ CopyFrom(CopyState cstate)
     {
         hi_options |= HEAP_INSERT_SKIP_FSM;

-        if (!XLogIsNeeded() &&
-            cstate->rel->trigdesc == NULL &&
-            RelationGetNumberOfBlocks(cstate->rel) == 0)
-            hi_options |= HEAP_INSERT_SKIP_WAL;
+        if (!XLogIsNeeded() && RelationGetNumberOfBlocks(cstate->rel) == 0)
+             hi_options |= HEAP_INSERT_SKIP_WAL;
     }

     /*

RelationGetNumberOfBlocks would blow up if passed a partitioned table to it.

Applying David's patch will take care of that though.

> The proposed patch is missing the point that documentation also mentions
> the optimizations for COPY with wal_level = minimal:
>    <para>
>     <command>COPY</command> is fastest when used within the same
>     transaction as an earlier <command>CREATE TABLE</command> or
>     <command>TRUNCATE</command> command. In such cases no WAL
>     needs to be written, because in case of an error, the files
>     containing the newly loaded data will be removed anyway.
>     However, this consideration only applies when
>     <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
>     must write WAL otherwise.
>    </para>

I might be wrong but I'm not sure if we should mention here that this
optimization is not applied to partitioned tables due to what appears to
be a implementation-level limitation?

Thanks,
Amit



Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On 28 September 2018 at 15:12, Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote:
>> I don't agree that we can skip explaining why one of the optimisations
>> can't be applied just because we've explained why a similar
>> optimisation cannot be applied somewhere close by.  I think that the
>> WAL/FSM optimisation can fairly easily be improved on and probably
>> fixed in PG12 as we can just lazily determine per-partition if it can
>> be applied to that partition or not.
>
> Have you guys looked at what the following patch does for partitions and
> how it interacts with it?
> https://commitfest.postgresql.org/19/528/

I've glanced at it. I don't think we're taking anything in the wrong
direction. The patch looks like it would need rebased if this gets in
first.

> The proposed patch is missing the point that documentation also mentions
> the optimizations for COPY with wal_level = minimal:
>    <para>
>     <command>COPY</command> is fastest when used within the same
>     transaction as an earlier <command>CREATE TABLE</command> or
>     <command>TRUNCATE</command> command. In such cases no WAL
>     needs to be written, because in case of an error, the files
>     containing the newly loaded data will be removed anyway.
>     However, this consideration only applies when
>     <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
>     must write WAL otherwise.
>    </para>

I've edited that in the attached patch.  Also reworded a comment that
Amit mentioned and made a small change to the COPY FREEZE docs to
mention no support for partitioned tables.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Tomas Vondra
Date:

On 09/28/2018 06:02 AM, Amit Langote wrote:
> On 2018/09/28 12:12, Michael Paquier wrote:
>> On Fri, Sep 28, 2018 at 02:46:30PM +1200, David Rowley wrote:
>>> I don't agree that we can skip explaining why one of the optimisations
>>> can't be applied just because we've explained why a similar
>>> optimisation cannot be applied somewhere close by.  I think that the
>>> WAL/FSM optimisation can fairly easily be improved on and probably
>>> fixed in PG12 as we can just lazily determine per-partition if it can
>>> be applied to that partition or not.
>>
>> Have you guys looked at what the following patch does for partitions and
>> how it interacts with it?
>> https://commitfest.postgresql.org/19/528/
> 
> Just looked at that patch and noticed that the following hunk won't cope
> if COPY's target table is partitioned:
> 
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index 7674369613..7b9a7af2d2 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -2416,10 +2416,8 @@ CopyFrom(CopyState cstate)
>      {
>          hi_options |= HEAP_INSERT_SKIP_FSM;
> 
> -        if (!XLogIsNeeded() &&
> -            cstate->rel->trigdesc == NULL &&
> -            RelationGetNumberOfBlocks(cstate->rel) == 0)
> -            hi_options |= HEAP_INSERT_SKIP_WAL;
> +        if (!XLogIsNeeded() && RelationGetNumberOfBlocks(cstate->rel) == 0)
> +             hi_options |= HEAP_INSERT_SKIP_WAL;
>      }
> 
>      /*
> 
> RelationGetNumberOfBlocks would blow up if passed a partitioned table to it.
> 
> Applying David's patch will take care of that though.
> 
>> The proposed patch is missing the point that documentation also mentions
>> the optimizations for COPY with wal_level = minimal:
>>    <para>
>>     <command>COPY</command> is fastest when used within the same
>>     transaction as an earlier <command>CREATE TABLE</command> or
>>     <command>TRUNCATE</command> command. In such cases no WAL
>>     needs to be written, because in case of an error, the files
>>     containing the newly loaded data will be removed anyway.
>>     However, this consideration only applies when
>>     <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
>>     must write WAL otherwise.
>>    </para>
> 
> I might be wrong but I'm not sure if we should mention here that this
> optimization is not applied to partitioned tables due to what appears to
> be a implementation-level limitation?
> 

Aren't most limitations implementation-level? ;-)

IMHO it's entirely appropriate to mention this in user-level docs. Based
on the current wording, it's quite natural to assume the optimization
applies to both partitioned and non-partitioned tables. And in fact it
does not. It will still work for individual partitions, though.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On 28 September 2018 at 19:22, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I've edited that in the attached patch.  Also reworded a comment that
> Amit mentioned and made a small change to the COPY FREEZE docs to
> mention no support for partitioned tables.

Added to the November 'fest.

https://commitfest.postgresql.org/20/1811/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Steve Singer
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, failed

--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1534,9 +1534,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
     <command>TRUNCATE</command> command. In such cases no WAL
     needs to be written, because in case of an error, the files
     containing the newly loaded data will be removed anyway.
-    However, this consideration only applies when
-    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
-    must write WAL otherwise.
+    However, this consideration only applies for non-partitioned
+    tables when <xref linkend="guc-wal-level"/> is
+    <literal>minimal</literal> as all commands must write WAL
+    otherwise.
    </para>

Would this be better as "However, this consideration only applies for non-partitioned and the wal_level is minimal"
(Switching the 'when' to 'and the')



--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -222,9 +222,10 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Requests copying the data with rows already frozen, just as they
       would be after running the <command>VACUUM FREEZE</command> command.
       This is intended as a performance option for initial data loading.
-      Rows will be frozen only if the table being loaded has been created
-      or truncated in the current subtransaction, there are no cursors
-      open and there are no older snapshots held by this transaction.
+      Rows will be frozen only for non-partitioned tables if the table
+      being loaded has been created or truncated in the current
+      subtransaction, there are no cursors open and there are no older
+      snapshots held by this transaction.

Similar concern with above

When I read this comment it makes me think that the table is partitioned then the rows will be frozen even if the table
wascreated by an earlier transaction. 
 
Do we want to instead say "Rows will be frozen if the table is not partitioned and the table being loaded has
been...."

Other than that the patch looks fine and works as expected.

The new status of this patch is: Waiting on Author

Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
Thanks for looking at this.

On 1 November 2018 at 16:07, Steve Singer <steve@ssinger.info> wrote:
> --- a/doc/src/sgml/perform.sgml
> +++ b/doc/src/sgml/perform.sgml
> @@ -1534,9 +1534,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
>      <command>TRUNCATE</command> command. In such cases no WAL
>      needs to be written, because in case of an error, the files
>      containing the newly loaded data will be removed anyway.
> -    However, this consideration only applies when
> -    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> as all commands
> -    must write WAL otherwise.
> +    However, this consideration only applies for non-partitioned
> +    tables when <xref linkend="guc-wal-level"/> is
> +    <literal>minimal</literal> as all commands must write WAL
> +    otherwise.
>     </para>
>
> Would this be better as "However, this consideration only applies for non-partitioned and the wal_level is minimal"
> (Switching the 'when' to 'and the')

I ended up going with:

    However, this consideration only applies when
    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> for
    non-partitioned tables as all commands must write WAL otherwise.

> --- a/doc/src/sgml/ref/copy.sgml
> +++ b/doc/src/sgml/ref/copy.sgml
> @@ -222,9 +222,10 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
>        Requests copying the data with rows already frozen, just as they
>        would be after running the <command>VACUUM FREEZE</command> command.
>        This is intended as a performance option for initial data loading.
> -      Rows will be frozen only if the table being loaded has been created
> -      or truncated in the current subtransaction, there are no cursors
> -      open and there are no older snapshots held by this transaction.
> +      Rows will be frozen only for non-partitioned tables if the table
> +      being loaded has been created or truncated in the current
> +      subtransaction, there are no cursors open and there are no older
> +      snapshots held by this transaction.
>
> Similar concern with above

I've changed this to:

      Rows will be frozen only if the table being loaded has been created
      or truncated in the current subtransaction, there are no cursors
      open and there are no older snapshots held by this transaction.  It is
      currently not possible to perform a <command>COPY FREEZE</command> on
      a partitioned table.

Which is just the original text with the new sentence tagged onto the end.

> Other than that the patch looks fine and works as expected.
>
> The new status of this patch is: Waiting on Author

Thanks for looking at this. I've attached an updated patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Steve Singer
Date:
On Thu, 1 Nov 2018, David Rowley wrote:

>
> I ended up going with:
>
>    However, this consideration only applies when
>    <xref linkend="guc-wal-level"/> is <literal>minimal</literal> for
>    non-partitioned tables as all commands must write WAL otherwise.
>

>
> I've changed this to:
>
>      Rows will be frozen only if the table being loaded has been created
>      or truncated in the current subtransaction, there are no cursors
>      open and there are no older snapshots held by this transaction.  It is
>      currently not possible to perform a <command>COPY FREEZE</command> on
>      a partitioned table.
>
> Which is just the original text with the new sentence tagged onto the end.
>
>> Other than that the patch looks fine and works as expected.
>>
>> The new status of this patch is: Waiting on Author
>
> Thanks for looking at this. I've attached an updated patch.

I am happy with the changes.
I think the patch is ready for a committer.


>
> -- 
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Steve



Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On 2 November 2018 at 15:43, Steve Singer <steve@ssinger.info> wrote:
> I am happy with the changes.
> I think the patch is ready for a committer.

Many thanks for your review.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Alvaro Herrera
Date:
Here are versions for branches 10 and 11.  The main change is that the
COPY test didn't have the partitioned table, because it was recently
introduced (0d5f05cde011) so I backpatched that part also.  It's a bit
useless, but I'd rather backpatch the same thing rather than have
different lines there ...

(The version for 10 only differs in the perform.sgml hunk.)

I intend to get this pushed on Monday.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Alvaro Herrera
Date:
FWIW I didn't like the error message:
  cannot perform FREEZE on a partitioned table
but then I noticed other messages also say "cannot perform FREEZE".  I
think they should all say "cannot perform COPY FREEZE" instead.

Not something for this patch to fix, though, I think.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Tomas Vondra
Date:
On 11/17/18 4:53 PM, Alvaro Herrera wrote:
> Here are versions for branches 10 and 11.  The main change is that the
> COPY test didn't have the partitioned table, because it was recently
> introduced (0d5f05cde011) so I backpatched that part also.  It's a bit
> useless, but I'd rather backpatch the same thing rather than have
> different lines there ...
> 

The patch seems fine to me.

It's a bit unfortunate that we simply disable the optimization on 
partitioned tables instead of fixing it somehow, but I guess it's better 
than nothing and no one has a very good idea how to make it work.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables(wal_level=minimal)

From
Alvaro Herrera
Date:
On 2018-Nov-19, Tomas Vondra wrote:

> On 11/17/18 4:53 PM, Alvaro Herrera wrote:
> > Here are versions for branches 10 and 11.  The main change is that the
> > COPY test didn't have the partitioned table, because it was recently
> > introduced (0d5f05cde011) so I backpatched that part also.  It's a bit
> > useless, but I'd rather backpatch the same thing rather than have
> > different lines there ...
> 
> The patch seems fine to me.

Pushed now, thanks.

> It's a bit unfortunate that we simply disable the optimization on
> partitioned tables instead of fixing it somehow, but I guess it's better
> than nothing and no one has a very good idea how to make it work.

Yeah, I think leaving it in the current state is worse than disabling
it, since an error is thrown anyway.  In any case, I guess when we have
a patch it won't be backpatchable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
On Tue, 20 Nov 2018 at 01:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> The patch seems fine to me.

Thanks for looking at it.

> It's a bit unfortunate that we simply disable the optimization on
> partitioned tables instead of fixing it somehow, but I guess it's better
> than nothing and no one has a very good idea how to make it work.

Yeah, I agree.  I added various comments in the patch to explain why
we're not going to the trouble of allowing it. I had hoped that would
do for now.  Lately, there's quite a bit of effort going into reducing
various overheads to having a partitioned table with high numbers of
partitions. A patch to speed up INSERT was just committed, for
example.  Perhaps COPY is less critical to make faster for inserting
small numbers of rows, but I'd not rule out someone complaining
sometime about the overhead of checking all partitions are compatible
with the freeze optimisation before doing the COPY, although, perhaps
we can just tell those people not to use the FREEZE option.

Anyway, I was mostly interested in fixing the bug.  Making COPY FREEZE
work with partitioned tables sounds like a new feature.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

From
David Rowley
Date:
(Returning from leave)

On Tue, 20 Nov 2018 at 03:19, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Pushed now, thanks.

Many thanks for pushing.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services