Thread: "unexpected duplicate for tablespace" problem in logical replication
"unexpected duplicate for tablespace" problem in logical replication
From
"wangsh.fnst@fujitsu.com"
Date:
Hi, I met a problem while using logical replication in PG11 and I think all the PG version have this problem. The log looks like: > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx Someone also reported this problem in [1], but no one has responded to it. I did some investigation, and found a way to reproduce this problem. The steps are: 1. create a table (call it tableX) and truncate it. 2. cycle through 2^32 OIDs. 3. restart the database to clear all the cache. 4. create a temp table which make the temp table's OID equals to the tableX's relfilenode and insert any data into tableX. The attachment(run.sh) can reproduce this problem in PG10 and PG11with the help of option 'WITH OIDS'. I don't find any way to cycle the OIDs quickly in branch master, but I use the gdb to reproduce this problem too. Now, function GetNewRelFileNode() only checks: 1. duplicated OIDs in pg_class. 2. relpath(rnode) is exists in disk. However, the result of relpath(temp table) and relpath(non-temp table) are different, temp table's relpath() has a prefix "t%d". That means, if there is a table that value of relfilenode is 20000(but the value of oid isn't 20000), it's possible to create a temp table that value of relfilenode is also 20000. I think function GetNewRelFileNode() should always check the duplicated relfilenode, see the patch(a simple to way to fix this problem is master branch). Any comment? Regards, Shenhao Wang [1] https://www.postgresql.org/message-id/flat/CAM5YvKTPxmMT%3DS7iPcu5SgmaOv4S4nhE1HZRO_sdFX9cXeXXOQ%40mail.gmail.com
Attachment
RE: "unexpected duplicate for tablespace" problem in logical replication
From
"osumi.takamichi@fujitsu.com"
Date:
On Wednesday, April 6, 2022 11:14 AM wangsh.fnst@fujitsu.com <wangsh.fnst@fujitsu.com> wrote: > I met a problem while using logical replication in PG11 and I think all the PG > version have this problem. > > > The log looks like: > > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx > Someone also reported this problem in [1], but no one has responded to it. > > > > I did some investigation, and found a way to reproduce this problem. The > steps are: > > > 1. create a table (call it tableX) and truncate it. > > > 2. cycle through 2^32 OIDs. > > > 3. restart the database to clear all the cache. > > > 4. create a temp table which make the temp table's OID equals to the > tableX's relfilenode and insert any data into tableX. > > > The attachment(run.sh) can reproduce this problem in PG10 and PG11with > the help of option 'WITH OIDS'. I don't find any way to cycle the OIDs > quickly in branch master, but I use the gdb to reproduce this problem too. > > > > Now, function GetNewRelFileNode() only checks: > > > 1. duplicated OIDs in pg_class. > > > 2. relpath(rnode) is exists in disk. > > > However, the result of relpath(temp table) and relpath(non-temp table) > are different, temp table's relpath() has a prefix "t%d". That means, if > there is a table that value of relfilenode is 20000(but the value of oid > isn't 20000), it's possible to create a temp table that value of > relfilenode is also 20000. > > > I think function GetNewRelFileNode() should always check the duplicated > relfilenode, see the patch(a simple to way to fix this problem is master > branch). > > > Any comment? Hi, thank you for your report. It seems correct that there's room that wraparounded oid can be used for temp table, and we get duplicate result when we retrieve it and face the error. I reproduced your issue with HEAD and gdb, by replacing rnode.node.relNode with an existing relfilenode in GetNewRelFileNode(), immediately before the call of relpath(). At the same time, I also confirmed that with your patch, I had another iteration call of GetNewOidWithIndex() and got different oid, even if I inserted an existing relfilenode in the same way. I didn't get the same error with your patch. Below are my several review comments on your patch. (1) Name of isRelNodeCollision How about changing it to "IsCollidedRelNode" ? When I see around of this code, there are functions named as "IsReservedName", "IsSharedRelation" or "IsPinnedObject". So, my suggestion would be more aligned. (2) Remove curly brackets for one sentence + if (pg_class != NULL) + { + usedAsOID = true; + } + else + { + pg_class = table_open(RelationRelationId, AccessShareLock); + usedAsOID = false; + } + Please remove curly brackets for one sentence. Also, I suggest you add a comment something like "Ensure we have opened pg_class in this function" at the top of this check for clarification. (3) variables of isRelNodeCollision + SysScanDesc scandesc; + static ScanKeyData skey[2]; We can move those declaration into the inside of the conditional branch for scankey. (4) Having Assert in isRelNodeCollision How about add an Assert to check the argument relation is not NULL ? (5) argument name of isRelNodeCollision I suggest you change it to pg_class. Best Regards, Takamichi Osumi
RE: "unexpected duplicate for tablespace" problem in logical replication
From
"osumi.takamichi@fujitsu.com"
Date:
On Friday, April 8, 2022 6:44 PM I wrote: > On Wednesday, April 6, 2022 11:14 AM wangsh.fnst@fujitsu.com > <wangsh.fnst@fujitsu.com> wrote: > > I met a problem while using logical replication in PG11 and I think > > all the PG version have this problem. > > > > > > The log looks like: > > > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx > > Someone also reported this problem in [1], but no one has responded to it. > > > > > > > > I did some investigation, and found a way to reproduce this problem. > > The steps are: > > > > > > 1. create a table (call it tableX) and truncate it. > > > > > > 2. cycle through 2^32 OIDs. > > > > > > 3. restart the database to clear all the cache. > > > > > > 4. create a temp table which make the temp table's OID equals to the > > tableX's relfilenode and insert any data into tableX. > > > > > > The attachment(run.sh) can reproduce this problem in PG10 and PG11with > > the help of option 'WITH OIDS'. I don't find any way to cycle the OIDs > > quickly in branch master, but I use the gdb to reproduce this problem too. > > > > > > > > Now, function GetNewRelFileNode() only checks: > > > > > > 1. duplicated OIDs in pg_class. > > > > > > 2. relpath(rnode) is exists in disk. > > > > > > However, the result of relpath(temp table) and relpath(non-temp table) > > are different, temp table's relpath() has a prefix "t%d". That means, > > if there is a table that value of relfilenode is 20000(but the value > > of oid isn't 20000), it's possible to create a temp table that value > > of relfilenode is also 20000. > > > > > > I think function GetNewRelFileNode() should always check the > > duplicated relfilenode, see the patch(a simple to way to fix this > > problem is master branch). > > > > > > Any comment? > Hi, thank you for your report. > > > It seems correct that there's room that wraparounded oid can be used for temp > table, and we get duplicate result when we retrieve it and face the error. > > I reproduced your issue with HEAD and gdb, by replacing rnode.node.relNode > with an existing relfilenode in GetNewRelFileNode(), immediately before the > call of relpath(). One thing I forgot to note is that this bug is not unique to the logical replication. There is other path to hit it for example, pg_filenode_relation in the same procedures with gdb. In the below output, I created tempa table with the same filenode with gdb without having a pair of logical replication and got the same error you reported. postgres=# select oid, relname, relfilenode, reltablespace from pg_class where relname in ('c', 'tempa'); oid | relname | relfilenode | reltablespace -------+---------+-------------+--------------- 16387 | c | 16390 | 0 16390 | tempa | 16390 | 0 (2 rows) postgres=# select pg_filenode_relation(0, 16390); ERROR: unexpected duplicate for tablespace 0, relfilenode 16390 Best Regards, Takamichi Osumi
Re: "unexpected duplicate for tablespace" problem in logical replication
From
"wangsh.fnst@fujitsu.com"
Date:
Hi, Osumi-san, Thank you for your comment. On 4/11/22 15:40, Osumi, Takamichi/大墨 昂道 wrote: > (1) Name of isRelNodeCollision changed. > (2) Remove curly brackets for one sentence changed. > (3) variables of isRelNodeCollision changed. > (4) Having Assert in isRelNodeCollision added. > (5) argument name of isRelNodeCollision changed. BTW, I also change the parameter from null to SnapshotAny scandesc = systable_beginscan(pg_class, ClassTblspcRelfilenodeIndexId, - true, NULL, 2, skey); + true, SnapshotAny, 2, skey); I'm not sure, but I think SnapshotAny is more suitable after reading the source(comment) in function GetNewOidWithIndex(). Please see the attachment Regards, Shenhao Wang
Attachment
Re: "unexpected duplicate for tablespace" problem in logical replication
From
Kyotaro Horiguchi
Date:
At Mon, 11 Apr 2022 10:24:26 +0000, "wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com> wrote in > Please see the attachment Sorry for being late, but this seems to be in the wrong direction. In the problem case, as you know, GetNewRelFileNode does *not* check oid uniqueness in pg_class. The on-catalog duplicate check is done only when the result is to be used as the *table oid* at creation. In other cases it chooses a relfilenode that does not duplicate in storage file names irrelevantly to the relation oid. As the result Non-temporary and temporary tables have separate relfilenode oid spaces, either intentional or not.. Thus, I think we don't want GetNewRelFileNode to check for duplicate oid on pg_class if pg_class is not passed since that significantly narrow the oid space for relfilenode. If we did something like that, we might do check the existence of file names of the both non-temp and temp in GetNewRelFileNode(), but that also narrows the relfilenode oid space. RelidByRelfilenode is called by autoprewarm, logical replication, and pg_filenode_relation. In the autoprewarm and logical replication cases, it is called only for non-temprary relations so letting the function ignore (oid duplicating) temp tables works. pg_relfilienode_relation is somewhat dubious. It is affected by this bug. In the attached PoC, to avoid API change (in case for back-patching), RelidByRelfilenode ignores duplcates of differernt persistence. Also the PoC prioritize on persistent tables to temporary ones but I'm not sure it's good decision, but otherwise we need to change the signature of pg_filenode_relation. The attached is an PoC of that. The first one is a reproduction-aid code. With it applied, the following steps create duplicate relfilenode relations. (apply the first diff) $ rm /tmp/hoge $ (start postgres) # touch /tmp/hoge $ psql =# create table xp (a int); =# truncate xp; =# create temp table xt (a int); =# truncate xt; =# select oid, relname, relfilenode from pg_class where relname like 'x%'; oid | relname | relfilenode -------+---------+------------- 16384 | xp | 55555 55558 | xt | 55555 (2 rows) =# select pg_filenode_relation(0, 55555); ERROR: unexpected duplicate for tablespace 0, relfilenode 55555 (apply the second diff) =# create temp table xt (a int); =# truncate xt; =# select pg_filenode_relation(0, 55555); pg_filenode_relation ---------------------- xp (1 row) What do you think about this direction? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: "unexpected duplicate for tablespace" problem in logical replication
From
"wangsh.fnst@fujitsu.com"
Date:
On 4/12/22 10:45, Kyotaro Horiguchi wrote: > What do you think about this direction? I think this direction is better then mine. This POC looks good to me. Regards, Shenhao Wang
On Tue, Apr 12, 2022 at 11:45 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Mon, 11 Apr 2022 10:24:26 +0000, "wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com> wrote in > > Please see the attachment > > Sorry for being late, but this seems to be in the wrong direction. > > In the problem case, as you know, GetNewRelFileNode does *not* check > oid uniqueness in pg_class. The on-catalog duplicate check is done > only when the result is to be used as the *table oid* at creation. In > other cases it chooses a relfilenode that does not duplicate in > storage file names irrelevantly to the relation oid. As the result > Non-temporary and temporary tables have separate relfilenode oid > spaces, either intentional or not.. > > Thus, I think we don't want GetNewRelFileNode to check for duplicate > oid on pg_class if pg_class is not passed since that significantly > narrow the oid space for relfilenode. If we did something like that, > we might do check the existence of file names of the both non-temp and > temp in GetNewRelFileNode(), but that also narrows the relfilenode oid > space. Agreed. > > RelidByRelfilenode is called by autoprewarm, logical replication, and > pg_filenode_relation. In the autoprewarm and logical replication > cases, it is called only for non-temprary relations so letting the > function ignore (oid duplicating) temp tables works. > pg_relfilienode_relation is somewhat dubious. It is affected by this > bug. In the attached PoC, to avoid API change (in case for > back-patching), RelidByRelfilenode ignores duplcates of differernt > persistence. Also the PoC prioritize on persistent tables to > temporary ones but I'm not sure it's good decision, but otherwise we > need to change the signature of pg_filenode_relation. > > The attached is an PoC of that. The first one is a reproduction-aid > code. With it applied, the following steps create duplicate > relfilenode relations. > > What do you think about this direction? Sounds good to me. If we go in this direction, probably we also need to change the cache checks in RelidByRelfilenode() so that we prioritize non-temporary relations. Otherwise, we will end up returning the OID of temporary relation if it's already cached. Another idea I came up with is that we have RelidByRelfilenode() check only non-temporary relations by adding relpersistence != RELPERSISTENCE_TEMP to the scan key. If we want to support temporary relations too, I think that, in v16, we can add relpersistence to RelidByRelfilenode() as a function argument (and a flag to pg_filenode_relation() accordingly) so that the user can get the name of the temporary relation by like pg_filenode_relation(0, 16386, true). On the other hand, in back branches, if an application is using pg_filenode_relation() to get the name of temporary relations, it would break it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Tue, 19 Apr 2022 at 11:54, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Apr 12, 2022 at 11:45 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Mon, 11 Apr 2022 10:24:26 +0000, "wangsh.fnst@fujitsu.com" <wangsh.fnst@fujitsu.com> wrote in > > > Please see the attachment > > > > Sorry for being late, but this seems to be in the wrong direction. > > > > In the problem case, as you know, GetNewRelFileNode does *not* check > > oid uniqueness in pg_class. The on-catalog duplicate check is done > > only when the result is to be used as the *table oid* at creation. In > > other cases it chooses a relfilenode that does not duplicate in > > storage file names irrelevantly to the relation oid. As the result > > Non-temporary and temporary tables have separate relfilenode oid > > spaces, either intentional or not.. > > > > Thus, I think we don't want GetNewRelFileNode to check for duplicate > > oid on pg_class if pg_class is not passed since that significantly > > narrow the oid space for relfilenode. If we did something like that, > > we might do check the existence of file names of the both non-temp and > > temp in GetNewRelFileNode(), but that also narrows the relfilenode oid > > space. > > Agreed. > > > > > RelidByRelfilenode is called by autoprewarm, logical replication, and > > pg_filenode_relation. In the autoprewarm and logical replication > > cases, it is called only for non-temprary relations so letting the > > function ignore (oid duplicating) temp tables works. > > pg_relfilienode_relation is somewhat dubious. It is affected by this > > bug. In the attached PoC, to avoid API change (in case for > > back-patching), RelidByRelfilenode ignores duplcates of differernt > > persistence. Also the PoC prioritize on persistent tables to > > temporary ones but I'm not sure it's good decision, but otherwise we > > need to change the signature of pg_filenode_relation. > > > > The attached is an PoC of that. The first one is a reproduction-aid > > code. With it applied, the following steps create duplicate > > relfilenode relations. > > > > What do you think about this direction? > > Sounds good to me. If we go in this direction, probably we also need > to change the cache checks in RelidByRelfilenode() so that we > prioritize non-temporary relations. Otherwise, we will end up > returning the OID of temporary relation if it's already cached. > > Another idea I came up with is that we have RelidByRelfilenode() check > only non-temporary relations by adding relpersistence != > RELPERSISTENCE_TEMP to the scan key. If we want to support temporary > relations too, I think that, in v16, we can add relpersistence to > RelidByRelfilenode() as a function argument (and a flag to > pg_filenode_relation() accordingly) so that the user can get the name > of the temporary relation by like pg_filenode_relation(0, 16386, > true). On the other hand, in back branches, if an application is using > pg_filenode_relation() to get the name of temporary relations, it > would break it. I have changed the status to "Waiting on Author" as the points raised by Masahiko Sawada-san is pending. Regards, Vignesh
Re: "unexpected duplicate for tablespace" problem in logical replication
From
Kyotaro Horiguchi
Date:
At Mon, 8 Jan 2024 10:32:03 +0530, vignesh C <vignesh21@gmail.com> wrote in > I have changed the status to "Waiting on Author" as the points raised > by Masahiko Sawada-san is pending. Oops! This slipped off from my mind. Thank you for the reminder. > On Tue, 19 Apr 2022 at 11:54, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > What do you think about this direction? > > > > Sounds good to me. If we go in this direction, probably we also need > > to change the cache checks in RelidByRelfilenode() so that we > > prioritize non-temporary relations. Otherwise, we will end up > > returning the OID of temporary relation if it's already cached. Maybe. It might be better for the cache not to register temprary relations at all. > > Another idea I came up with is that we have RelidByRelfilenode() check > > only non-temporary relations by adding relpersistence != > > RELPERSISTENCE_TEMP to the scan key. I'm not sure it is good considering the case pg_relationumber_relation for a temporary relation. > > If we want to support temporary > > relations too, I think that, in v16, we can add relpersistence to > > RelidByRelfilenode() as a function argument (and a flag to > > pg_filenode_relation() accordingly) so that the user can get the name > > of the temporary relation by like pg_filenode_relation(0, 16386, > > true). On the other hand, in back branches, if an application is using > > pg_filenode_relation() to get the name of temporary relations, it > > would break it. Agreed. So, the attachd files are the patches for the master and older versions respectively. If there is a concern with the patch for the master, the parameter handling of pg_filenode_relation might be not good. If the feature of searching for relations regardless of their persistence is unnecessary, it could be simplified. (Anyway it lacks documentation at all). Also, while the patch for version 11 is attached, I am unable to build this version on my system (although I haven't investigated this deeply, but I'm on Rocky 9 now). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Fri, Jan 12, 2024 at 3:38 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Maybe. It might be better for the cache not to register temprary > relations at all. This point seems worthy of serious consideration to me. Is there any reason why we need RelidByRelfilenumber() to work with temporary relations at all? I understand that the current behavior is exposed via the SQL-callable function, but maybe that's not really intentional. If there's no other use of RelidByRelfilenumber() that needs to care about permanent relations intrinsically, I think we shouldn't hesitate to just cut them out of the mechanism entirely. -- Robert Haas EDB: http://www.enterprisedb.com
Re: "unexpected duplicate for tablespace" problem in logical replication
From
Kyotaro Horiguchi
Date:
At Mon, 15 Jan 2024 16:32:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in > On Fri, Jan 12, 2024 at 3:38 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > Maybe. It might be better for the cache not to register temprary > > relations at all. > > This point seems worthy of serious consideration to me. Is there any > reason why we need RelidByRelfilenumber() to work with temporary > relations at all? I understand that the current behavior is exposed > via the SQL-callable function, but maybe that's not really > intentional. If there's no other use of RelidByRelfilenumber() that > needs to care about permanent relations intrinsically, I think we > shouldn't hesitate to just cut them out of the mechanism entirely. Do you mean that the current behavior of the SQL-callable function is being treated as a bug and should it be corrected? Simply doing so will result in the functions pg_relation_filenode() and pg_filenode_relation() behaving asymmetrically. While there is no need to purposely change the behavior of the former, it is necessary to document the behavior of the latter regardless. The attached patch does the above for the master head. If we apply this approach to older versions, I can adapt and create similar patches for them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Mon, Jan 15, 2024 at 8:58 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > Do you mean that the current behavior of the SQL-callable function is > being treated as a bug and should it be corrected? > > Simply doing so will result in the functions pg_relation_filenode() > and pg_filenode_relation() behaving asymmetrically. While there is no > need to purposely change the behavior of the former, it is necessary > to document the behavior of the latter regardless. > > The attached patch does the above for the master head. If we apply > this approach to older versions, I can adapt and create similar > patches for them. Yes, this patch shows the approach I was asking about. Andres, what do you think about this idea? I wonder if you just momentarily forgot about temporary relations when coding RelidByRelfilenumber -- because for that function to give well-defined answers with temporary relations included, it would need the backend ID as an additional argument. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 02, 2024 at 10:49:17AM -0500, Robert Haas wrote: > Andres, what do you think about this idea? I wonder if you just > momentarily forgot about temporary relations when coding > RelidByRelfilenumber -- because for that function to give well-defined > answers with temporary relations included, it would need the backend > ID as an additional argument. No idea what Andres thinks, but seeing that pg_filenode_relation() uses in input a tablespace OID and a filenode OID while ignoring the prefix that would be used for a temp relation path (with a 't' and the backend number), it is clear that the current function is not suited to make the difference between temporary and persistent relations as we'd need to have a priority order to choose one over the other. And that may not lead to the correct choice. Ignoring temporary relations entirely makes sense: one cannot get a regclass from only a tablespace and a relfilenode, the persistence, as well as a backend ID would also be required. I've not checked the patch in details, but it's to say that the idea to cut temporary relations sounds rather right here. -- Michael
Attachment
On Sun, Mar 17, 2024 at 11:35 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Feb 02, 2024 at 10:49:17AM -0500, Robert Haas wrote: > > Andres, what do you think about this idea? I wonder if you just > > momentarily forgot about temporary relations when coding > > RelidByRelfilenumber -- because for that function to give well-defined > > answers with temporary relations included, it would need the backend > > ID as an additional argument. > > > Ignoring temporary relations entirely makes sense: one cannot get a > regclass from only a tablespace and a relfilenode, the persistence, as > well as a backend ID would also be required. I've not checked the > patch in details, but it's to say that the idea to cut temporary > relations sounds rather right here. That makes sense to me too. Regarding the patch, filtering by the relpersistence in systable_getnext() loop seems to be good to me. Alternatively we can add "relpersistence == RELPERSISTENCE_TEMP" to the scan key. The patch would need regression tests too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com