Thread: [PATCH] Logical decoding of TRUNCATE
Hi, This patch implements support for TRUNCATE statements in logical replication. The work has mainly done by Simon Riggs then finished by me. Tests are written by me. TRUNCATE is treated as a form of DELETE for the purpose of deciding whether to publish, or not. The "TRUNCATE behavior when session_replication_role = replica"[1] patch is required from this patch to work correctly with tables referenced by foreign keys. [1] https://commitfest.postgresql.org/16/1447/ Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Hi, On 2017-12-29 14:15:22 +0100, Marco Nenciarini wrote: > This patch implements support for TRUNCATE statements > in logical replication. The work has mainly done by Simon Riggs then > finished by me. Tests are written by me. > > TRUNCATE is treated as a form of DELETE for the purpose of deciding > whether to publish, or not. It'd be good if you explained exactly what the chosen behaviour is, and why that's the right behaviour in comparison to other alternatives. - Andres
Hi, Il 29/12/17 20:55, Andres Freund ha scritto: > Hi, > > On 2017-12-29 14:15:22 +0100, Marco Nenciarini wrote: >> This patch implements support for TRUNCATE statements >> in logical replication. The work has mainly done by Simon Riggs then >> finished by me. Tests are written by me. >> >> TRUNCATE is treated as a form of DELETE for the purpose of deciding >> whether to publish, or not. > > It'd be good if you explained exactly what the chosen behaviour is, and > why that's the right behaviour in comparison to other alternatives. > here is the description of how the patch works: * A new WAL record type has been added (XLOG_HEAP_TRUNCATE) to allow a precise logical decoding of the TRUNCATE operation. It contains the TRUNCATE flags and the list of the involved tables and sequences. It is treated as a NO-OP during the WAL reply as all the physical operations are logged in SMGR WAL records. * A new TRUNCATE message has been added to the logical protocol. It carries the information about the truncation of one single table specifying if CASCADE or RESTART IDENTITY has been specified. * The ExecuteTruncateGuts function has been extracted from ExecuteTruncate. This function is used by the logical apply process to replicate the TRUNCATE operations, honouring the eventual CASCADE and RESTART IDENTITY flag. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
On 29 December 2017 at 19:55, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2017-12-29 14:15:22 +0100, Marco Nenciarini wrote: >> This patch implements support for TRUNCATE statements >> in logical replication. The work has mainly done by Simon Riggs then >> finished by me. Tests are written by me. >> >> TRUNCATE is treated as a form of DELETE for the purpose of deciding >> whether to publish, or not. > > It'd be good if you explained exactly what the chosen behaviour is, and > why that's the right behaviour in comparison to other alternatives. At present the patch treats TRUNCATE as if it were a DELETE which means that CREATE PUBLICATION insert_only FOR TABLE mydata WITH (publish = 'insert'); will not publish truncates before or after this patch CREATE PUBLICATION insert_only FOR TABLE mydata; will now publish TRUNCATEs, although they were ignored in PG10 so PG10 publications will act differently I had regarded it as a missing piece, but some may view that is a behaviour change in PG11 Alternatively, we could also use WITH (publish = 'truncate') as a separate decision. That is an easy change if we wish it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, I've found some SGML errors in the version v10 of the patch. I've fixed it in version v11 that is attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Patch rebased on the current master. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Attached here there is the complete list of patches required to pass all the tests. The 0001 patch is discussed in a separate thread, but I've posted it also here to ease the review of the 0002. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Hi all, After commit bbd3363e128daec0e70952c1bb2f12ab1f6f1292 that refactor subscription tests to use PostgresNode's wait_for_catchup, the patch needs to be updated to use wait_for_catchup. Attached there is the updated patch. is discussed in a separate thread https://commitfest.postgresql.org/16/1447/, but I've posted it also here to ease the review of the 0002. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Hi, I reviewed 0001 in its own thread. So I think that we generally want this patch and I think the design decisions are right. Namely: TRUNCATE being treated as DELETE in terms of DML filtering makes sense to me as it is basically bulk delete, needs to be mentioned in release notes though. Adding special message to protocol is appropriate as truncate is more DML than DDL in sense of manipulating data so it should be replicated separately from other DDL Processing relations that were truncated when CASCADE is used separately is needed because we allow relations to be filtered by logical replication I see the patch adds new xlog record which is perhaps not ideal but the current one seems utterly unsuitable for decoding so I guess it's okay, especially when it's only added for wal_level = logical which it is. Also TRUNCATE is not exactly high tps operation. Things I am less convinced about: The patch will cascade truncation on downstream if cascade was specified on the upstream, that can potentially be dangerous and we either should not do it and only truncate the tables which were truncated upstream (but without restricting because of FKs), leaving the data inconsistent on downstream (like we do already with DELETE or UPDATE). Or maybe make it into either subscription or publication option so that user can chose the behaviour here as I am sure some people will want it to cascade (but the default should still IMHO be to not cascade as that's safer). > + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts > + * already closes the relations. Setting localrel to NULL in the map entry > + * is still needed. > + */ > + rel->localrel = NULL; This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track which relations it opened and only close those and the rest should be closed by caller? That should also remove the other ugly part which is that the ExecuteTruncateGuts modifies the input list. What if caller wanted to use those relations it sent as parameter later? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 17 January 2018 at 17:07, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > Things I am less convinced about: > > The patch will cascade truncation on downstream if cascade was specified > on the upstream, that can potentially be dangerous and we either should > not do it and only truncate the tables which were truncated upstream > (but without restricting because of FKs), leaving the data inconsistent > on downstream (like we do already with DELETE or UPDATE). Or maybe make > it into either subscription or publication option so that user can chose > the behaviour here as I am sure some people will want it to cascade (but > the default should still IMHO be to not cascade as that's safer). I agree the default should be to NOT cascade. If someone wants cascading as a publication option, that can be added later. >> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts >> + * already closes the relations. Setting localrel to NULL in the map entry >> + * is still needed. >> + */ >> + rel->localrel = NULL; > > This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track > which relations it opened and only close those and the rest should be > closed by caller? That should also remove the other ugly part which is > that the ExecuteTruncateGuts modifies the input list. What if caller > wanted to use those relations it sent as parameter later? Agreed -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi All, Il 18/01/18 17:48, Simon Riggs ha scritto: > On 17 January 2018 at 17:07, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > >> Things I am less convinced about: >> >> The patch will cascade truncation on downstream if cascade was specified >> on the upstream, that can potentially be dangerous and we either should >> not do it and only truncate the tables which were truncated upstream >> (but without restricting because of FKs), leaving the data inconsistent >> on downstream (like we do already with DELETE or UPDATE). Or maybe make >> it into either subscription or publication option so that user can chose >> the behaviour here as I am sure some people will want it to cascade (but >> the default should still IMHO be to not cascade as that's safer). > > I agree the default should be to NOT cascade. > > If someone wants cascading as a publication option, that can be added later. > I agree that not replicating the CASCADE option is the best option according to POLA principle. >>> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts >>> + * already closes the relations. Setting localrel to NULL in the map entry >>> + * is still needed. >>> + */ >>> + rel->localrel = NULL; >> >> This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track >> which relations it opened and only close those and the rest should be >> closed by caller? That should also remove the other ugly part which is >> that the ExecuteTruncateGuts modifies the input list. What if caller >> wanted to use those relations it sent as parameter later? > > Agreed > Attached a new version of the patch addressing these issues. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
On 19/01/18 12:37, Marco Nenciarini wrote: > Hi All, > > Il 18/01/18 17:48, Simon Riggs ha scritto: >> On 17 January 2018 at 17:07, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> >>> Things I am less convinced about: >>> >>> The patch will cascade truncation on downstream if cascade was specified >>> on the upstream, that can potentially be dangerous and we either should >>> not do it and only truncate the tables which were truncated upstream >>> (but without restricting because of FKs), leaving the data inconsistent >>> on downstream (like we do already with DELETE or UPDATE). Or maybe make >>> it into either subscription or publication option so that user can chose >>> the behaviour here as I am sure some people will want it to cascade (but >>> the default should still IMHO be to not cascade as that's safer). >> >> I agree the default should be to NOT cascade. >> >> If someone wants cascading as a publication option, that can be added later. >> > > I agree that not replicating the CASCADE option is the best option > according to POLA principle. > >>>> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts >>>> + * already closes the relations. Setting localrel to NULL in the map entry >>>> + * is still needed. >>>> + */ >>>> + rel->localrel = NULL; >>> >>> This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track >>> which relations it opened and only close those and the rest should be >>> closed by caller? That should also remove the other ugly part which is >>> that the ExecuteTruncateGuts modifies the input list. What if caller >>> wanted to use those relations it sent as parameter later? >> >> Agreed >> > > Attached a new version of the patch addressing these issues. > Besides the small thing I wrote for the 0001 in the other thread I am pretty much happy with this now. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 22/01/18 19:45, Petr Jelinek wrote: > On 19/01/18 12:37, Marco Nenciarini wrote: >> Hi All, >>>>> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts >>>>> + * already closes the relations. Setting localrel to NULL in the map entry >>>>> + * is still needed. >>>>> + */ >>>>> + rel->localrel = NULL; >>>> >>>> This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track >>>> which relations it opened and only close those and the rest should be >>>> closed by caller? That should also remove the other ugly part which is >>>> that the ExecuteTruncateGuts modifies the input list. What if caller >>>> wanted to use those relations it sent as parameter later? >>> >>> Agreed >>> >> >> Attached a new version of the patch addressing these issues. >> > > Besides the small thing I wrote for the 0001 in the other thread I am > pretty much happy with this now. > Actually on second look, I don't like the new boolean parameter much. I'd rather we didn't touch the input list and always close only relations opened inside the ExecuteTruncateGuts(). It may mean more list(s) but the current interface is still not clean. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Il 22/01/18 23:18, Petr Jelinek ha scritto: > On 22/01/18 19:45, Petr Jelinek wrote: > > Actually on second look, I don't like the new boolean parameter much. > I'd rather we didn't touch the input list and always close only > relations opened inside the ExecuteTruncateGuts(). > > It may mean more list(s) but the current interface is still not clean. > Now ExecuteTruncateGuts unconditionally closes the relations that it opens. The caller has now always the responsibility to close the relations passed with the explicit_rels list. Version 15 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Hi, On 23/01/18 15:38, Marco Nenciarini wrote: > Il 22/01/18 23:18, Petr Jelinek ha scritto: >> On 22/01/18 19:45, Petr Jelinek wrote: >> >> Actually on second look, I don't like the new boolean parameter much. >> I'd rather we didn't touch the input list and always close only >> relations opened inside the ExecuteTruncateGuts(). >> >> It may mean more list(s) but the current interface is still not clean. >> > > Now ExecuteTruncateGuts unconditionally closes the relations that it > opens. The caller has now always the responsibility to close the > relations passed with the explicit_rels list. This looks good. > > Version 15 attached. > I see you still do CASCADE on the subscriber though. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Il 23/01/18 18:13, Petr Jelinek ha scritto: > Hi, > > On 23/01/18 15:38, Marco Nenciarini wrote: >> Il 22/01/18 23:18, Petr Jelinek ha scritto: >>> On 22/01/18 19:45, Petr Jelinek wrote: >>> >>> Actually on second look, I don't like the new boolean parameter much. >>> I'd rather we didn't touch the input list and always close only >>> relations opened inside the ExecuteTruncateGuts(). >>> >>> It may mean more list(s) but the current interface is still not clean. >>> >> >> Now ExecuteTruncateGuts unconditionally closes the relations that it >> opens. The caller has now always the responsibility to close the >> relations passed with the explicit_rels list. > > This looks good. > >> >> Version 15 attached. >> > > I see you still do CASCADE on the subscriber though. > No it doesn't. The following code in worker.c prevents that. + /* + * Even if we used CASCADE on the upstream master we explicitly + * default to replaying changes without further cascading. + * This might be later changeable with a user specified option. + */ + cascade = false; There is also a test that check it works as intended: + # should not cascade on replica + $node_publisher->safe_psql('postgres', "TRUNCATE tab_rep CASCADE"); + + $node_publisher->wait_for_catchup($appname); + + $result = $node_subscriber->safe_psql('postgres', + "SELECT count(*), min(a), max(a) FROM tab_notrep_fk"); + is($result, qq(1|-1|-1), + 'check replicated truncate does not cascade on replica'); + + $result = $node_subscriber->safe_psql('postgres', + "SELECT nextval('seq_notrep')"); + is($result, qq(102), + 'check replicated truncate does not restart identities'); + Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
On 23/01/18 18:19, Marco Nenciarini wrote: > Il 23/01/18 18:13, Petr Jelinek ha scritto: >> Hi, >> >> On 23/01/18 15:38, Marco Nenciarini wrote: >>> Il 22/01/18 23:18, Petr Jelinek ha scritto: >>>> On 22/01/18 19:45, Petr Jelinek wrote: >>>> >>>> Actually on second look, I don't like the new boolean parameter much. >>>> I'd rather we didn't touch the input list and always close only >>>> relations opened inside the ExecuteTruncateGuts(). >>>> >>>> It may mean more list(s) but the current interface is still not clean. >>>> >>> >>> Now ExecuteTruncateGuts unconditionally closes the relations that it >>> opens. The caller has now always the responsibility to close the >>> relations passed with the explicit_rels list. >> >> This looks good. >> >>> >>> Version 15 attached. >>> >> >> I see you still do CASCADE on the subscriber though. >> > > No it doesn't. The following code in worker.c prevents that. > > > + /* > + * Even if we used CASCADE on the upstream master we explicitly > + * default to replaying changes without further cascading. > + * This might be later changeable with a user specified option. > + */ > + cascade = false; Ah, that's pretty ugly, why don't we just use DROP_RESTRICT always instead of this (keeping the comment). Unless you plan to make it option as part of this patch, the current coding is confusing. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Il 23/01/18 18:25, Petr Jelinek ha scritto: > On 23/01/18 18:19, Marco Nenciarini wrote: >> Il 23/01/18 18:13, Petr Jelinek ha scritto: >>> Hi, >>> >>> On 23/01/18 15:38, Marco Nenciarini wrote: >>>> Il 22/01/18 23:18, Petr Jelinek ha scritto: >>>>> On 22/01/18 19:45, Petr Jelinek wrote: >>>>> >>>>> Actually on second look, I don't like the new boolean parameter much. >>>>> I'd rather we didn't touch the input list and always close only >>>>> relations opened inside the ExecuteTruncateGuts(). >>>>> >>>>> It may mean more list(s) but the current interface is still not clean. >>>>> >>>> >>>> Now ExecuteTruncateGuts unconditionally closes the relations that it >>>> opens. The caller has now always the responsibility to close the >>>> relations passed with the explicit_rels list. >>> >>> This looks good. >>> >>>> >>>> Version 15 attached. >>>> >>> >>> I see you still do CASCADE on the subscriber though. >>> >> >> No it doesn't. The following code in worker.c prevents that. >> >> >> + /* >> + * Even if we used CASCADE on the upstream master we explicitly >> + * default to replaying changes without further cascading. >> + * This might be later changeable with a user specified option. >> + */ >> + cascade = false; > > Ah, that's pretty ugly, why don't we just use DROP_RESTRICT always > instead of this (keeping the comment). Unless you plan to make it option > as part of this patch, the current coding is confusing. > Ok, Removed. Version 16 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Hi, On 23/01/18 18:47, Marco Nenciarini wrote: > Il 23/01/18 18:25, Petr Jelinek ha scritto: >> On 23/01/18 18:19, Marco Nenciarini wrote: >>> Il 23/01/18 18:13, Petr Jelinek ha scritto: >>>> Hi, >>>> >>>> On 23/01/18 15:38, Marco Nenciarini wrote: >>>>> Il 22/01/18 23:18, Petr Jelinek ha scritto: >>>>>> On 22/01/18 19:45, Petr Jelinek wrote: >>>>>> >>>>>> Actually on second look, I don't like the new boolean parameter much. >>>>>> I'd rather we didn't touch the input list and always close only >>>>>> relations opened inside the ExecuteTruncateGuts(). >>>>>> >>>>>> It may mean more list(s) but the current interface is still not clean. >>>>>> >>>>> >>>>> Now ExecuteTruncateGuts unconditionally closes the relations that it >>>>> opens. The caller has now always the responsibility to close the >>>>> relations passed with the explicit_rels list. >>>> >>>> This looks good. >>>> >>>>> >>>>> Version 15 attached. >>>>> >>>> >>>> I see you still do CASCADE on the subscriber though. >>>> >>> >>> No it doesn't. The following code in worker.c prevents that. >>> >>> >>> + /* >>> + * Even if we used CASCADE on the upstream master we explicitly >>> + * default to replaying changes without further cascading. >>> + * This might be later changeable with a user specified option. >>> + */ >>> + cascade = false; >> >> Ah, that's pretty ugly, why don't we just use DROP_RESTRICT always >> instead of this (keeping the comment). Unless you plan to make it option >> as part of this patch, the current coding is confusing. >> > > Ok, Removed. > Great, thanks, I think this is ready now so marking as such. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I wonder whether this should be dealing with sequences at all. We are not currently publishing any information about sequences, so it seems weird to do it only here. Also, I'd imagine that if we ever get to publishing sequence events, then the sequence restarts would be published as independent events. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 24/01/18 13:50, Peter Eisentraut wrote: > I wonder whether this should be dealing with sequences at all. We are > not currently publishing any information about sequences, so it seems > weird to do it only here. Also, I'd imagine that if we ever get to > publishing sequence events, then the sequence restarts would be > published as independent events. > That depends on if we consider this to be part of sequence handling or truncate statement replication handling. It's true that if we had sequence replication, the restart would get propagated that way anyway. OTOH, if we'll want to add option in the future to propagate cascade (to any additional tables on downstream), it may need to reset sequences which are not even present upstream and as such would not be handled by sequence replication. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 24, 2018 at 6:47 AM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote: > Version 16 attached. Hi Marco, FYI this version doesn't compile: worker.c: In function ‘apply_handle_truncate’: worker.c:888:39: error: ‘cascade’ undeclared (first use in this function) relid = logicalrep_read_truncate(s, &cascade, &restart_seqs); ^ -- Thomas Munro http://www.enterprisedb.com
On 25/01/18 08:26, Thomas Munro wrote: > On Wed, Jan 24, 2018 at 6:47 AM, Marco Nenciarini > <marco.nenciarini@2ndquadrant.it> wrote: >> Version 16 attached. > > Hi Marco, > > FYI this version doesn't compile: > > worker.c: In function ‘apply_handle_truncate’: > worker.c:888:39: error: ‘cascade’ undeclared (first use in this function) > relid = logicalrep_read_truncate(s, &cascade, &restart_seqs); > ^ > Indeed. We also found one more issue - the truncate done by logical replication worker is not logically logged which would break cascading. I expect Marco to send new version shortly with both of these fixed. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Il 25/01/18 13:18, Petr Jelinek ha scritto: > On 25/01/18 08:26, Thomas Munro wrote: >> On Wed, Jan 24, 2018 at 6:47 AM, Marco Nenciarini >> <marco.nenciarini@2ndquadrant.it> wrote: >>> Version 16 attached. >> >> Hi Marco, >> >> FYI this version doesn't compile: >> >> worker.c: In function ‘apply_handle_truncate’: >> worker.c:888:39: error: ‘cascade’ undeclared (first use in this function) >> relid = logicalrep_read_truncate(s, &cascade, &restart_seqs); >> ^ >> > Fixed. > Indeed. > > We also found one more issue - the truncate done by logical replication > worker is not logically logged which would break cascading. > Fixed. > I expect Marco to send new version shortly with both of these fixed. > New patch v17 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Seeing that patch development is coming along at a nice pace, it seems a good time to discuss this question. On 1/4/18 03:24, Simon Riggs wrote: > At present the patch treats TRUNCATE as if it were a DELETE > > ... > > CREATE PUBLICATION insert_only FOR TABLE mydata; > will now publish TRUNCATEs, although they were ignored in PG10 > so PG10 publications will act differently > > I had regarded it as a missing piece, but some may view that is a > behaviour change in PG11 My understanding of this design is that there's no possible way to make a PG11 database behave like a PG10 database did. For example, if someone has a data warehouse with a single table that's subscribed to publishers on multiple source databases, they would certainly not want truncate SQL replicated. If they just upgrade their database without reading all the release notes (not that anyone would ever do that), they might get a surprise data loss in the warehouse. Of course I wouldn't suggest using truncate on their sources - but people may still do it. > Alternatively, we could also use WITH (publish = 'truncate') as a > separate decision. > > That is an easy change if we wish it. Of course the user could simply not _allow_ truncate commands on the source tables via permissions and/or triggers; that seems a little more "correct" to me from a DB design perspective. But I still like Simon's idea here of using WITH as a separate decision. I'm sure that there will be users somewhere who build systems based on our PG10 design - this at least doesn't completely leave them out to dry, and I don't see any big downsides to having the separate decision for truncate. In addition, this seems a little more consistent. In other places that comes to mind (e.g. triggers and privileges), truncate is treated distinctly from delete. Makes sense to me to continue that convention. -Jeremy -- Jeremy Schneider Database Engineer Amazon Web Services +1 312-725-9249 (m) schnjere@amazon.com
Seeing that patch development is coming along at a nice pace, it seems a good time to discuss this question. On 1/4/18 03:24, Simon Riggs wrote: > At present the patch treats TRUNCATE as if it were a DELETE > > ... > > CREATE PUBLICATION insert_only FOR TABLE mydata; > will now publish TRUNCATEs, although they were ignored in PG10 > so PG10 publications will act differently > > I had regarded it as a missing piece, but some may view that is a > behaviour change in PG11 My understanding of this design is that there's no possible way to make a PG11 database behave like a PG10 database did. For example, if someone has a data warehouse with a single table that's subscribed to publishers on multiple source databases, they would certainly not want truncate SQL replicated. If they just upgrade their database without reading all the release notes (not that anyone would ever do that), they might get a surprise data loss in the warehouse. Of course I wouldn't suggest using truncate on their sources - but people may still do it. > Alternatively, we could also use WITH (publish = 'truncate') as a > separate decision. > > That is an easy change if we wish it. Of course the user could simply not _allow_ truncate commands on the source tables via permissions and/or triggers; that seems a little more "correct" to me from a DB design perspective. But I still like Simon's idea here of using WITH as a separate decision. I'm sure that there will be users somewhere who build systems based on our PG10 design - this at least doesn't completely leave them out to dry, and I don't see any big downsides to having the separate decision for truncate. In addition, this seems a little more consistent. In other places that comes to mind (e.g. triggers and privileges), truncate is treated distinctly from delete. Makes sense to me to continue that convention. -Jeremy -- Jeremy Schneider Database Engineer Amazon Web Services +1 312-725-9249 (m) schnjere@amazon.com
On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > The patch will cascade truncation on downstream if cascade was specified > on the upstream, that can potentially be dangerous and we either should > not do it and only truncate the tables which were truncated upstream > (but without restricting because of FKs), leaving the data inconsistent > on downstream (like we do already with DELETE or UPDATE). Or maybe make > it into either subscription or publication option so that user can chose > the behaviour here as I am sure some people will want it to cascade (but > the default should still IMHO be to not cascade as that's safer). Maybe I'm not understanding what is being proposed here, but it sounds like you're saying that if somebody removes a bunch of data on the logical master, replication will remove only some of it from the servers to which the change is replicated. That seems crazy. Then replication can't be counted on to produce a replica. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 26/01/18 02:34, Robert Haas wrote: > On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> The patch will cascade truncation on downstream if cascade was specified >> on the upstream, that can potentially be dangerous and we either should >> not do it and only truncate the tables which were truncated upstream >> (but without restricting because of FKs), leaving the data inconsistent >> on downstream (like we do already with DELETE or UPDATE). Or maybe make >> it into either subscription or publication option so that user can chose >> the behaviour here as I am sure some people will want it to cascade (but >> the default should still IMHO be to not cascade as that's safer). > > Maybe I'm not understanding what is being proposed here, but it sounds > like you're saying that if somebody removes a bunch of data on the > logical master, replication will remove only some of it from the > servers to which the change is replicated. That seems crazy. Then > replication can't be counted on to produce a replica. > No, I was talking about extra tables that might be present on downstream which weren't truncated on upstream. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 25, 2018 at 8:36 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 26/01/18 02:34, Robert Haas wrote: >> On Wed, Jan 17, 2018 at 12:07 PM, Petr Jelinek >> <petr.jelinek@2ndquadrant.com> wrote: >>> The patch will cascade truncation on downstream if cascade was specified >>> on the upstream, that can potentially be dangerous and we either should >>> not do it and only truncate the tables which were truncated upstream >>> (but without restricting because of FKs), leaving the data inconsistent >>> on downstream (like we do already with DELETE or UPDATE). Or maybe make >>> it into either subscription or publication option so that user can chose >>> the behaviour here as I am sure some people will want it to cascade (but >>> the default should still IMHO be to not cascade as that's safer). >> >> Maybe I'm not understanding what is being proposed here, but it sounds >> like you're saying that if somebody removes a bunch of data on the >> logical master, replication will remove only some of it from the >> servers to which the change is replicated. That seems crazy. Then >> replication can't be counted on to produce a replica. >> > No, I was talking about extra tables that might be present on downstream > which weren't truncated on upstream. Oh. That's different. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/24/18 07:53, Petr Jelinek wrote: > That depends on if we consider this to be part of sequence handling or > truncate statement replication handling. It's true that if we had > sequence replication, the restart would get propagated that way anyway. > OTOH, if we'll want to add option in the future to propagate cascade (to > any additional tables on downstream), it may need to reset sequences > which are not even present upstream and as such would not be handled by > sequence replication. Logical replication, as it currently is designed, replicates discrete actions, not commands. A delete command that deletes five rows and cascades to delete three more rows somewhere else ends up being replicated as eight changes. So I think a TRUNCATE command that cascades to four tables and resets six sequences should end up being replicated as ten changes. (Since we currently don't handle sequences at all, we'll omit the six sequence changes for now.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 26/01/18 03:44, Peter Eisentraut wrote: > On 1/24/18 07:53, Petr Jelinek wrote: >> That depends on if we consider this to be part of sequence handling or >> truncate statement replication handling. It's true that if we had >> sequence replication, the restart would get propagated that way anyway. >> OTOH, if we'll want to add option in the future to propagate cascade (to >> any additional tables on downstream), it may need to reset sequences >> which are not even present upstream and as such would not be handled by >> sequence replication. > > Logical replication, as it currently is designed, replicates discrete > actions, not commands. A delete command that deletes five rows and > cascades to delete three more rows somewhere else ends up being > replicated as eight changes. So I think a TRUNCATE command that > cascades to four tables and resets six sequences should end up being > replicated as ten changes. (Since we currently don't handle sequences > at all, we'll omit the six sequence changes for now.) > Well, that depends, I think there are two separate problems a) decoding b) replication. I think it's useful for plugins to know if reset sequence or cascade was specified in the command so I think it should be decoded. Some plugins will definitely want to forward that info. But it's true that since we don't handle sequences in logical replication, the sequence reset does not need to be replicated there. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2018-01-25 14:21:15 +0100, Marco Nenciarini wrote: > + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) > + { > + > + /* > + * Check foreign key references. In CASCADE mode, this should be > + * unnecessary since we just pulled in all the references; but as a > + * cross-check, do it anyway if in an Assert-enabled build. > + */ > #ifdef USE_ASSERT_CHECKING > heap_truncate_check_FKs(rels, false); > + #else > + if (stmt->behavior == DROP_RESTRICT) > + heap_truncate_check_FKs(rels, false); > #endif > + } That *can't* be right. > + case REORDER_BUFFER_CHANGE_TRUNCATE: > + appendStringInfoString(ctx->out, " TRUNCATE:"); > + > + if (change->data.truncate_msg.restart_seqs > + || change->data.truncate_msg.cascade) > + { > + if (change->data.truncate_msg.restart_seqs) > + appendStringInfo(ctx->out, " restart_seqs"); > + if (change->data.truncate_msg.cascade) > + appendStringInfo(ctx->out, " cascade"); > + } > + else > + appendStringInfoString(ctx->out, " (no-flags)"); > + break; > default: > Assert(false); > } I know this has been discussed in the thread already, but it really strikes me as wrong to basically do some mini DDL replication feature via per-command WAL records. > *************** > *** 111,116 **** CREATE PUBLICATION <replaceable class="parameter">name</replaceable> > --- 111,121 ---- > and so the default value for this option is > <literal>'insert, update, delete'</literal>. > </para> > + <para> > + <command>TRUNCATE</command> is treated as a form of > + <command>DELETE</command> for the purpose of deciding whether > + to publish, or not. > + </para> > </listitem> > </varlistentry> > </variablelist> Why is this a good idea? Hm, it seems logicaldecoding.sgml hasn't been updated? > + void > + ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, > + DropBehavior behavior, bool restart_seqs) > + { > + List *rels = list_copy(explicit_rels); Why is this copied? > + * Write a WAL record to allow this set of actions to be logically decoded. > + * We could optimize this away when !RelationIsLogicallyLogged(rel) > + * but that doesn't save much space or time. What you're saying isn't that you're not logging anything, but that you're allocating the header regardless? Because this certainly sounds like you unconditionally log a WAL record. > + * Assemble an array of relids, then an array of seqrelids so we can write > + * a single WAL record for the whole action. > + */ > + logrelids = palloc(maxrelids * sizeof(Oid)); > + foreach (cell, relids_logged) > + { > + nrelids++; > + if (nrelids > maxrelids) > + { > + maxrelids *= 2; > + logrelids = repalloc(logrelids, maxrelids * sizeof(Oid)); > + } > + logrelids[nrelids - 1] = lfirst_oid(cell); > + } > + > + foreach (cell, seq_relids_logged) > + { > + nseqrelids++; > + if ((nrelids + nseqrelids) > maxrelids) > + { > + maxrelids *= 2; > + logrelids = repalloc(logrelids, maxrelids * sizeof(Oid)); > + } > + logrelids[nrelids + nseqrelids - 1] = lfirst_oid(cell); > + } I'm confused. Why do we need the resizing here, when we know the max upfront? > + /* > + * For truncate we list all truncated relids in an array, followed by all > + * sequence relids that need to be restarted, if any. > + * All rels are always within the same database, so we just list dbid once. > + */ > + typedef struct xl_heap_truncate > + { > + Oid dbId; > + uint32 nrelids; > + uint32 nseqrelids; > + uint8 flags; > + Oid relids[FLEXIBLE_ARRAY_MEMBER]; > + } xl_heap_truncate; Given that the space is used anyway due to padding, I'd just make flags 32bit. Greetings, Andres Freund
On 1 April 2018 at 21:01, Andres Freund <andres@anarazel.de> wrote: >> *************** >> *** 111,116 **** CREATE PUBLICATION <replaceable class="parameter">name</replaceable> >> --- 111,121 ---- >> and so the default value for this option is >> <literal>'insert, update, delete'</literal>. >> </para> >> + <para> >> + <command>TRUNCATE</command> is treated as a form of >> + <command>DELETE</command> for the purpose of deciding whether >> + to publish, or not. >> + </para> >> </listitem> >> </varlistentry> >> </variablelist> > > Why is this a good idea? TRUNCATE removes rows, just as DELETE does, so anybody that wants to publish the removal of rows will be interested in this. This avoids unnecessary overcomplication of the existing interface. >> + * Write a WAL record to allow this set of actions to be logically decoded. >> + * We could optimize this away when !RelationIsLogicallyLogged(rel) >> + * but that doesn't save much space or time. > > What you're saying isn't that you're not logging anything, but that > you're allocating the header regardless? Because this certainly sounds > like you unconditionally log a WAL record. It says that, yes, my idea - as explained. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/1/18 16:01, Andres Freund wrote: > On 2018-01-25 14:21:15 +0100, Marco Nenciarini wrote: >> + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) >> + { >> + >> + /* >> + * Check foreign key references. In CASCADE mode, this should be >> + * unnecessary since we just pulled in all the references; but as a >> + * cross-check, do it anyway if in an Assert-enabled build. >> + */ >> #ifdef USE_ASSERT_CHECKING >> heap_truncate_check_FKs(rels, false); >> + #else >> + if (stmt->behavior == DROP_RESTRICT) >> + heap_truncate_check_FKs(rels, false); >> #endif >> + } > > That *can't* be right. You mean the part that ignores this in session_replication_role = replica? I don't like that either. And it's also not clear why it's needed for this feature. >> + case REORDER_BUFFER_CHANGE_TRUNCATE: >> + appendStringInfoString(ctx->out, " TRUNCATE:"); >> + >> + if (change->data.truncate_msg.restart_seqs >> + || change->data.truncate_msg.cascade) >> + { >> + if (change->data.truncate_msg.restart_seqs) >> + appendStringInfo(ctx->out, " restart_seqs"); >> + if (change->data.truncate_msg.cascade) >> + appendStringInfo(ctx->out, " cascade"); >> + } >> + else >> + appendStringInfoString(ctx->out, " (no-flags)"); >> + break; >> default: >> Assert(false); >> } > > I know this has been discussed in the thread already, but it really > strikes me as wrong to basically do some mini DDL replication feature > via per-command WAL records. The problem is that the interaction of TRUNCATE and foreign keys is a mess. Let's say I have a provider database with table1, table2, and table3, all connected by foreign keys, and a replica database with table1, table2, and table4, all connected by foreign keys. I run TRUNCATE table1 CASCADE on the provider. What should replication do? The proposed patch applies the TRUNCATE table1 CASCADE on the replica, which ends up truncating table1, table2, and table4, which is different from what happened on the origin. An alternative might be to make the replication protocol message say "I truncated table1, table2" (let's say table3 is not replicated). (This sounds more like logical replication rather than DDL replication.) That will then fail to apply on the replica, because it requires that you include all tables connected by foreign keys. We could then do what we do in the DELETE case and just ignore foreign keys altogether, which is obviously bad and not a forward-looking behavior. Or we could do what we *should* be doing in the DELETE case and apply cascading actions on the replica to table4, but that kind of row-by-row processing is what we want to avoid by using TRUNCATE in the first place. None of these solutions are good. I don't have any other ideas, though. >> *************** >> *** 111,116 **** CREATE PUBLICATION <replaceable class="parameter">name</replaceable> >> --- 111,121 ---- >> and so the default value for this option is >> <literal>'insert, update, delete'</literal>. >> </para> >> + <para> >> + <command>TRUNCATE</command> is treated as a form of >> + <command>DELETE</command> for the purpose of deciding whether >> + to publish, or not. >> + </para> >> </listitem> >> </varlistentry> >> </variablelist> > > Why is this a good idea? I think it seemed like a good idea at the time, so to speak, but several people have argued against it, so we should probably change this in the final version. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2 April 2018 at 19:30, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >>> *************** >>> *** 111,116 **** CREATE PUBLICATION <replaceable class="parameter">name</replaceable> >>> --- 111,121 ---- >>> and so the default value for this option is >>> <literal>'insert, update, delete'</literal>. >>> </para> >>> + <para> >>> + <command>TRUNCATE</command> is treated as a form of >>> + <command>DELETE</command> for the purpose of deciding whether >>> + to publish, or not. >>> + </para> >>> </listitem> >>> </varlistentry> >>> </variablelist> >> >> Why is this a good idea? > > I think it seemed like a good idea at the time, so to speak, but several > people have argued against it, so we should probably change this in the > final version. Who has argued against it? I see that Andres has asked twice about it and been answered twice, but expressed no opinion. Petr has said he thinks that's right. Nobody else has commented. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-04-02 14:30:50 -0400, Peter Eisentraut wrote: > On 4/1/18 16:01, Andres Freund wrote: > > On 2018-01-25 14:21:15 +0100, Marco Nenciarini wrote: > >> + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) > >> + { > >> + > >> + /* > >> + * Check foreign key references. In CASCADE mode, this should be > >> + * unnecessary since we just pulled in all the references; but as a > >> + * cross-check, do it anyway if in an Assert-enabled build. > >> + */ > >> #ifdef USE_ASSERT_CHECKING > >> heap_truncate_check_FKs(rels, false); > >> + #else > >> + if (stmt->behavior == DROP_RESTRICT) > >> + heap_truncate_check_FKs(rels, false); > >> #endif > >> + } > > > > That *can't* be right. > > You mean the part that ignores this in session_replication_role = > replica? I don't like that either. And it's also not clear why it's > needed for this feature. I primarily the way the code is written. For me code differing like this between USE_ASSERT_CHECKING and not seems like a recipe for disaster. And yea, I'm not sure why the session_replication_role bit is here either. > > I know this has been discussed in the thread already, but it really > > strikes me as wrong to basically do some mini DDL replication feature > > via per-command WAL records. > > The problem is that the interaction of TRUNCATE and foreign keys is a mess. > > Let's say I have a provider database with table1, table2, and table3, > all connected by foreign keys, and a replica database with table1, > table2, and table4, all connected by foreign keys. I run TRUNCATE > table1 CASCADE on the provider. What should replication do? > > The proposed patch applies the TRUNCATE table1 CASCADE on the replica, > which ends up truncating table1, table2, and table4, which is different > from what happened on the origin. I actually think that's a fairly sane behaviour because it allows you to have different schemas on both sides and still deal in a reasonably sane manner. What I'm concerned about is more that we're developing a one-of DDL replication feature w/ corresponding bespoke WAL-logging. > An alternative might be to make the replication protocol message say "I > truncated table1, table2" (let's say table3 is not replicated). (This > sounds more like logical replication rather than DDL replication.) That > will then fail to apply on the replica, because it requires that you > include all tables connected by foreign keys. And entirely fails if there's schema differences. > We could then do what we do in the DELETE case and just ignore foreign > keys altogether, which is obviously bad and not a forward-looking behavior. > > Or we could do what we *should* be doing in the DELETE case and apply > cascading actions on the replica to table4, but that kind of row-by-row > processing is what we want to avoid by using TRUNCATE in the first place. Well, you could also queue a re-check at the end of the processed message, doing a bulk re-check at the end. But that's obviously more work. > >> + <para> > >> + <command>TRUNCATE</command> is treated as a form of > >> + <command>DELETE</command> for the purpose of deciding whether > >> + to publish, or not. > >> + </para> > >> </listitem> > >> </varlistentry> > >> </variablelist> > > > > Why is this a good idea? > > I think it seemed like a good idea at the time, so to speak, but several > people have argued against it, so we should probably change this in the > final version. I think it's e.g. perfectly reasonable to want OLTP changes to be replicated, but not to truncate historical data. So for me those actions should be separate... Greetings, Andres Freund
Hi, On 2018-04-02 07:39:57 +0100, Simon Riggs wrote: > On 1 April 2018 at 21:01, Andres Freund <andres@anarazel.de> wrote: > > >> *************** > >> *** 111,116 **** CREATE PUBLICATION <replaceable class="parameter">name</replaceable> > >> --- 111,121 ---- > >> and so the default value for this option is > >> <literal>'insert, update, delete'</literal>. > >> </para> > >> + <para> > >> + <command>TRUNCATE</command> is treated as a form of > >> + <command>DELETE</command> for the purpose of deciding whether > >> + to publish, or not. > >> + </para> > >> </listitem> > >> </varlistentry> > >> </variablelist> > > > > Why is this a good idea? > > TRUNCATE removes rows, just as DELETE does, so anybody that wants to > publish the removal of rows will be interested in this. I'm not convinced. I think it's perfectly reasonable to want to truncate away data on the live node, but maintain it on an archival node. In that case one commonly wants to continue to maintain OLTP changes (hence DELETE), but not the bulk truncation. I think there's a reasonable counter-argument in that this isn't fine grained enough. > >> + * Write a WAL record to allow this set of actions to be logically decoded. > >> + * We could optimize this away when !RelationIsLogicallyLogged(rel) > >> + * but that doesn't save much space or time. > > > > What you're saying isn't that you're not logging anything, but that > > you're allocating the header regardless? Because this certainly sounds > > like you unconditionally log a WAL record. > > It says that, yes, my idea - as explained. My complaint is that the comment and the actual implementation differ. The above sounds like it's unconditionally WAL logging, but: + /* + * Write a WAL record to allow this set of actions to be logically decoded. + * We could optimize this away when !RelationIsLogicallyLogged(rel) + * but that doesn't save much space or time. + * + * Assemble an array of relids, then an array of seqrelids so we can write + * a single WAL record for the whole action. + */ + logrelids = palloc(maxrelids * sizeof(Oid)); + foreach (cell, relids_logged) + { + nrelids++; + if (nrelids > maxrelids) + { + maxrelids *= 2; + logrelids = repalloc(logrelids, maxrelids * sizeof(Oid)); + } + logrelids[nrelids - 1] = lfirst_oid(cell); + } + + foreach (cell, seq_relids_logged) + { + nseqrelids++; + if ((nrelids + nseqrelids) > maxrelids) + { + maxrelids *= 2; + logrelids = repalloc(logrelids, maxrelids * sizeof(Oid)); + } + logrelids[nrelids + nseqrelids - 1] = lfirst_oid(cell); + } + + if ((nrelids + nseqrelids) > 0) + { + xl_heap_truncate xlrec; + + xlrec.dbId = MyDatabaseId; + xlrec.nrelids = nrelids; + xlrec.nseqrelids = nseqrelids; + xlrec.flags = 0; + if (behavior == DROP_CASCADE) + xlrec.flags |= XLH_TRUNCATE_CASCADE; + if (restart_seqs) + xlrec.flags |= XLH_TRUNCATE_RESTART_SEQS; + + XLogBeginInsert(); + XLogRegisterData((char *) &xlrec, SizeOfHeapTruncate); + XLogRegisterData((char *) logrelids, + (nrelids + nseqrelids) * sizeof(Oid)); + + XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); + + (void) XLogInsert(RM_HEAP_ID, XLOG_HEAP_TRUNCATE); + } Note that the XLogInsert() is in an if branch that's only executed if there's either logged relids or sequence relids. I think logrelids should be allocated at the max size immediately, and the comment rewritten to explicitly only talk about the allocation, rather than sounding like it's about WAL logging as well. Greetings, Andres Freund
> On Apr 2, 2018, at 11:50, Andres Freund <andres@anarazel.de> wrote: > I'm not convinced. I think it's perfectly reasonable to want to truncate > away data on the live node, but maintain it on an archival node. +1. We've had at least one specific request for this, in maintaining a data warehouse from an OLTP system. -- -- Christophe Pettus xof@thebuild.com
Here is an updated patch that addresses some of your concerns. I've split it up into the decoding support and the pgoutput replication support. The problem I see now is that when we WAL-log a truncate, we include all the relids in one record. That seems useful. But during decoding we split this into one change per relid. So at the receiving end, truncate can only process one relation at a time, which will fail if there are foreign keys involved. The solution that had been proposed here was to ignore foreign keys on the subscriber, which is clearly problematic. I wonder why this approach was chosen. If we go through the trouble of WAL-logging all the relations together, why not present them to the output plugin as one so that they can be applied together? This will clearly make questions of filtering and replication set membership and so on more difficult, but that's just the nature of the thing. If you are connecting tables by foreign keys and only replicate some of them, then that's going to have confusing effects no matter what you do. (That's perhaps another reason why having the option of replicating truncate separately from delete could be useful.) I'm going to try to hack up an alternative approach and see how it works out. On 4/1/18 16:01, Andres Freund wrote: > On 2018-01-25 14:21:15 +0100, Marco Nenciarini wrote: >> + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) >> + { >> + >> + /* >> + * Check foreign key references. In CASCADE mode, this should be >> + * unnecessary since we just pulled in all the references; but as a >> + * cross-check, do it anyway if in an Assert-enabled build. >> + */ >> #ifdef USE_ASSERT_CHECKING >> heap_truncate_check_FKs(rels, false); >> + #else >> + if (stmt->behavior == DROP_RESTRICT) >> + heap_truncate_check_FKs(rels, false); >> #endif >> + } > > That *can't* be right. This is actually existing code that just looks funny in the diff after being indented. But I'd rather skip this patch altogether and find a different solution; see above. > I know this has been discussed in the thread already, but it really > strikes me as wrong to basically do some mini DDL replication feature > via per-command WAL records. I think TRUNCATE is special in some ways and it's reasonable to treat it specially. Real DDL is being worked on in the 2PC decoding thread. What we are discussing here isn't going to be applicable there and vice versa, I think. In fact, one of the reasons for this effort is that in BDR TRUNCATE is currently handled like a DDL command, which doesn't work well. >> + <para> >> + <command>TRUNCATE</command> is treated as a form of >> + <command>DELETE</command> for the purpose of deciding whether >> + to publish, or not. >> + </para> >> </listitem> >> </varlistentry> >> </variablelist> > > Why is this a good idea? I have changed this by making this a separate property. > Hm, it seems logicaldecoding.sgml hasn't been updated? I re-checked but didn't find anything suitable to update. >> + void >> + ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, >> + DropBehavior behavior, bool restart_seqs) >> + { >> + List *rels = list_copy(explicit_rels); > > Why is this copied? Because it is overwritten later. I have moved it down a bit to make that a bit clearer. >> + * Write a WAL record to allow this set of actions to be logically decoded. >> + * We could optimize this away when !RelationIsLogicallyLogged(rel) >> + * but that doesn't save much space or time. > > What you're saying isn't that you're not logging anything, but that > you're allocating the header regardless? Because this certainly sounds > like you unconditionally log a WAL record. > I'm confused. Why do we need the resizing here, when we know the max > upfront? I have rewritten this a bit and removed the logging of the sequence relids, which isn't used anywhere after, to make the code a bit simpler. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 4 April 2018 at 03:31, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I wonder why this approach was chosen. I looked at coding it that way and it seemed worse than the way chosen. > I'm going to try to hack up an alternative approach and see how it works > out. If you add a new filter for TRUNCATE it will mean compatibility problems and the need for pg_dump support. Need note in release notes to explain that people will need to add TRUNCATE filter to their existing publications. I was hoping to have that picked up automatically, which can't happen if we have an explicit filter for it. >> I know this has been discussed in the thread already, but it really >> strikes me as wrong to basically do some mini DDL replication feature >> via per-command WAL records. Don't really understand that comment. > I think TRUNCATE is special in some ways and it's reasonable to treat it > specially. Real DDL is being worked on in the 2PC decoding thread. > What we are discussing here isn't going to be applicable there and vice > versa, I think. In fact, one of the reasons for this effort is that in > BDR TRUNCATE is currently handled like a DDL command, which doesn't work > well. Agreed -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/3/18 22:31, Peter Eisentraut wrote: > The problem I see now is that when we WAL-log a truncate, we include all > the relids in one record. That seems useful. But during decoding we > split this into one change per relid. So at the receiving end, truncate > can only process one relation at a time, which will fail if there are > foreign keys involved. The solution that had been proposed here was to > ignore foreign keys on the subscriber, which is clearly problematic. > I'm going to try to hack up an alternative approach and see how it works > out. Done here. I added a separate callback for decoding truncates, which receives all the relations at once. That information can then be shipped off together and applied together on the receiving side. So foreign keys are not a problem anymore. This ended up being a bit larger than the original patch, but I think it's sound behavior and future-proof. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
This sounds like a good approach. > +static void > +pg_decode_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > + int nrelations, Relation relations[], ReorderBufferChange *change) > +{ > + for (i = 0; i < nrelations; i++) > + { > + Oid relid = RelationGetRelid(relations[i]); > + > + if (i > 0) > + appendStringInfoString(ctx->out, ", "); > + > + appendStringInfoString(ctx->out, > + quote_qualified_identifier( > + get_namespace_name(get_rel_namespace(relid)), > + get_rel_name(relid))); Note that you start the loop having the Relation; yet you go extra length to grab the relnamespace and relname from syscache instead of just relations[i]->rd_rel->relname etc. pgoutput doesn't do it that way, so it doesn't affect logical replication, but I think it's best not to create awkward code in test_decoding, since it's so widely copied. > + else if (info == XLOG_HEAP_TRUNCATE) > + { > + xl_heap_truncate *xlrec = (xl_heap_truncate *) rec; > + > + if (xlrec->flags & XLH_TRUNCATE_CASCADE) > + appendStringInfo(buf, "cascade "); > + if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS) > + appendStringInfo(buf, "restart_seqs "); > + appendStringInfo(buf, "nrelids %u", xlrec->nrelids); > + /* skip the list of relids */ > + } Maybe not a big deal, but for future pg_waldump users I'm sure it'll be nice to have the OIDs here. > +void > +ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, > + DropBehavior behavior, bool restart_seqs) > +{ Please add a comment atop this function. > + /* > + * Write a WAL record to allow this set of actions to be logically decoded. > + * > + * Assemble an array of relids so we can write a single WAL record for the > + * whole action. > + */ > + if (list_length(relids_logged) > 0) > + { > + xl_heap_truncate xlrec; > + int i = 0; I wonder if this should happen only if logical decoding? (Maybe it already occurs because relids_logged would be empty? Worth a comment in that case) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/5/18 13:07, Alvaro Herrera wrote: > Note that you start the loop having the Relation; yet you go extra > length to grab the relnamespace and relname from syscache instead of > just relations[i]->rd_rel->relname etc. fixed > Maybe not a big deal, but for future pg_waldump users I'm sure it'll be > nice to have the OIDs here. done >> +void >> +ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged, >> + DropBehavior behavior, bool restart_seqs) >> +{ > > Please add a comment atop this function. done >> + /* >> + * Write a WAL record to allow this set of actions to be logically decoded. >> + * >> + * Assemble an array of relids so we can write a single WAL record for the >> + * whole action. >> + */ >> + if (list_length(relids_logged) > 0) >> + { >> + xl_heap_truncate xlrec; >> + int i = 0; > > I wonder if this should happen only if logical decoding? (Maybe it > already occurs because relids_logged would be empty? Worth a comment in > that case) Added an assertion and a comment. Committed with those changes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Apr 07, 2018 at 07:40:11PM -0400, Peter Eisentraut wrote: > Committed with those changes. Since commit 039eb6e added logical replication support for TRUNCATE, logical apply of the TRUNCATE fails if it chooses a parallel index build: cat >/tmp/most_parallel.conf <<EOCONF; make -C src/test/subscription check TEMP_CONFIG=/tmp/most_parallel.conf # like noah@leadboat.com buildfarm members log_line_prefix = '%m [%p:%l] %q%a ' log_connections = 'true' log_disconnections = 'true' log_statement = 'all' fsync = off authentication_timeout = '600s' wal_sender_timeout = '18000s' # not on v12 backtrace_functions = 'quickdie, GetTransactionSnapshot' # as much parallelism as we can get log_statement = all wal_level = minimal max_wal_senders = 0 force_parallel_mode = regress min_parallel_index_scan_size = 0 min_parallel_table_scan_size = 0 parallel_setup_cost = 0 parallel_tuple_cost = 0 EOCONF Symptom in src/test/subscription/tmp_check/log/010_truncate_subscriber.log: 2020-12-19 17:54:04.669 PST [3629509:1] LOG: logical replication apply worker for subscription "sub1" has started 2020-12-19 17:54:04.682 PST [3629509:2] ERROR: cannot take query snapshot during a parallel operation 2020-12-19 17:54:04.682 PST [3629509:3] BACKTRACE: postgres: subscriber: logical replication worker for subscription 16411 (GetTransactionSnapshot+0x168) [0x951ce8] postgres: subscriber: logical replication worker for subscription 16411 (InitializeParallelDSM+0x16) [0x52cf86] postgres: subscriber: logical replication worker for subscription 16411 (btbuild+0x26a) [0x50905a] postgres: subscriber: logical replication worker for subscription 16411 (index_build+0x14b) [0x569c1b] postgres: subscriber: logical replication worker for subscription 16411 (reindex_index+0x19a) [0x56caea] postgres: subscriber: logical replication worker for subscription 16411 (reindex_relation+0xc0) [0x56d090] postgres: subscriber: logical replication worker for subscription 16411 (ExecuteTruncateGuts+0x376) [0x62f0d6] postgres: subscriber: logical replication worker for subscription 16411 () [0x78d592] postgres: subscriber: logical replication worker for subscription 16411 (ApplyWorkerMain+0x5ab) [0x78e4eb] postgres: subscriber: logical replication worker for subscription 16411 (StartBackgroundWorker+0x23f) [0x75522f] postgres: subscriber: logical replication worker for subscription 16411 () [0x762a6d] postgres: subscriber: logical replication worker for subscription 16411 () [0x7635ee] /lib64/libpthread.so.0(+0xf630) [0x7fe081e97630] /lib64/libc.so.6(__select+0x13) [0x7fe0805c0983] postgres: subscriber: logical replication worker for subscription 16411 () [0x4887ac] postgres: subscriber: logical replication worker for subscription 16411 (PostmasterMain+0x1118) [0x764c88] postgres: subscriber: logical replication worker for subscription 16411 (main+0x6f2) [0x48aae2] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fe0804ed555] postgres: subscriber: logical replication worker for subscription 16411 () [0x48ab49] 2020-12-19 17:54:04.683 PST [3629353:5] LOG: background worker "logical replication worker" (PID 3629509) exited with exitcode 1 (To see that particular failure at commit 039eb6e, one would need to cherry-pick f90e80b to avoid an earlier failure. One would also modify PostgresNode to inject the parallelism settings, because PostgresNode did not support TEMP_CONFIG in those days.)
Hi, On 2020-12-20 04:13:19 +0000, Noah Misch wrote: > postgres: subscriber: logical replication worker for subscription 16411 (GetTransactionSnapshot+0x168) [0x951ce8] > postgres: subscriber: logical replication worker for subscription 16411 (InitializeParallelDSM+0x16) [0x52cf86] > postgres: subscriber: logical replication worker for subscription 16411 (btbuild+0x26a) [0x50905a] > postgres: subscriber: logical replication worker for subscription 16411 (index_build+0x14b) [0x569c1b] > postgres: subscriber: logical replication worker for subscription 16411 (reindex_index+0x19a) [0x56caea] > postgres: subscriber: logical replication worker for subscription 16411 (reindex_relation+0xc0) [0x56d090] > postgres: subscriber: logical replication worker for subscription 16411 (ExecuteTruncateGuts+0x376) [0x62f0d6] > postgres: subscriber: logical replication worker for subscription 16411 () [0x78d592] > postgres: subscriber: logical replication worker for subscription 16411 (ApplyWorkerMain+0x5ab) [0x78e4eb] > postgres: subscriber: logical replication worker for subscription 16411 (StartBackgroundWorker+0x23f) [0x75522f] > postgres: subscriber: logical replication worker for subscription 16411 () [0x762a6d] > postgres: subscriber: logical replication worker for subscription 16411 () [0x7635ee] > /lib64/libpthread.so.0(+0xf630) [0x7fe081e97630] > /lib64/libc.so.6(__select+0x13) [0x7fe0805c0983] > postgres: subscriber: logical replication worker for subscription 16411 () [0x4887ac] > postgres: subscriber: logical replication worker for subscription 16411 (PostmasterMain+0x1118) [0x764c88] > postgres: subscriber: logical replication worker for subscription 16411 (main+0x6f2) [0x48aae2] > /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fe0804ed555] > postgres: subscriber: logical replication worker for subscription 16411 () [0x48ab49] > 2020-12-19 17:54:04.683 PST [3629353:5] LOG: background worker "logical replication worker" (PID 3629509) exited withexit code 1 Hm. Do I understand correctly that this problem is hit solely because the parallel mode code relies on there already have been a transaction snapshot set, thus avoiding the error? And that the code normally only works because GetTransactionSnapshot() will already have been called somewhere, before EnterParallelMode()? Greetings, Andres Freund
On Sun, Dec 20, 2020 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: > Hm. Do I understand correctly that this problem is hit solely because > the parallel mode code relies on there already have been a transaction > snapshot set, thus avoiding the error? And that the code normally only > works because GetTransactionSnapshot() will already have been called > somewhere, before EnterParallelMode()? It seems unlikely that InitializeParallelDSM() was ever intended to be run in a background worker. -- Peter Geoghegan
On Sun, Dec 20, 2020 at 9:43 AM Noah Misch <noah@leadboat.com> wrote: > > Since commit 039eb6e added logical replication support for TRUNCATE, logical > apply of the TRUNCATE fails if it chooses a parallel index build: > I think the TRUNCATE operation should not use parallelism either via apply worker or without it because there is nothing to scan in heap. Additionally, we can have an Assert or elog in InitializeParallelDSM to ensure that it is never invoked by parallel worker. -- With Regards, Amit Kapila.
On Sun, Dec 20, 2020 at 03:54:31PM -0800, Peter Geoghegan wrote: > On Sun, Dec 20, 2020 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: > > Hm. Do I understand correctly that this problem is hit solely because > > the parallel mode code relies on there already have been a transaction > > snapshot set, thus avoiding the error? And that the code normally only > > works because GetTransactionSnapshot() will already have been called > > somewhere, before EnterParallelMode()? I think so. > It seems unlikely that InitializeParallelDSM() was ever intended to be > run in a background worker. That wouldn't surprise me. Nonetheless, when worker_spi runs parallel queries, they work fine. The logical replication worker experiences novel scenarios, because it calls ExecuteTruncateGuts() directly, not as part of an actual TRUNCATE query. That bypasses some of the usual once-per-query setup. On Mon, Dec 21, 2020 at 12:29:37PM +0530, Amit Kapila wrote: > I think the TRUNCATE operation should not use parallelism either via > apply worker or without it because there is nothing to scan in heap. That's fair. > Additionally, we can have an Assert or elog in InitializeParallelDSM > to ensure that it is never invoked by parallel worker. I don't know whether InitializeParallelDSM() operates correctly from inside a parallel worker. That is orthogonal to the bug here.
Hi, On 2020-12-20 15:54:31 -0800, Peter Geoghegan wrote: > On Sun, Dec 20, 2020 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: > > Hm. Do I understand correctly that this problem is hit solely because > > the parallel mode code relies on there already have been a transaction > > snapshot set, thus avoiding the error? And that the code normally only > > works because GetTransactionSnapshot() will already have been called > > somewhere, before EnterParallelMode()? > > It seems unlikely that InitializeParallelDSM() was ever intended to be > run in a background worker. IDK, the environment in a bgworker shouldn't be that different from the normal query environment in a normal connection. And it's far from insane to want to be able to run a paralell query in a bgworker (and I *think* I have seen that work before). This case here seems more like an accidental dependency than anything to me, once that could perhaps even hint at problems in normal backends too. Greetings, Andres Freund
On Mon, Dec 21, 2020 at 09:42:47AM -0800, Andres Freund wrote: > On 2020-12-20 15:54:31 -0800, Peter Geoghegan wrote: > > On Sun, Dec 20, 2020 at 3:13 PM Andres Freund <andres@anarazel.de> wrote: > > > Hm. Do I understand correctly that this problem is hit solely because > > > the parallel mode code relies on there already have been a transaction > > > snapshot set, thus avoiding the error? And that the code normally only > > > works because GetTransactionSnapshot() will already have been called > > > somewhere, before EnterParallelMode()? > > > > It seems unlikely that InitializeParallelDSM() was ever intended to be > > run in a background worker. > > IDK, the environment in a bgworker shouldn't be that different from the > normal query environment in a normal connection. And it's far from > insane to want to be able to run a paralell query in a bgworker (and I > *think* I have seen that work before). This case here seems more like > an accidental dependency than anything to me, once that could perhaps > even hint at problems in normal backends too. Yeah. SPI_execute("TRUNCATE foo", false, 0) has no trouble doing a parallel index build in a bgworker. Let's ignore intention and be pleased about that.