Re: "unexpected duplicate for tablespace" problem in logical replication - Mailing list pgsql-bugs
From | Ashutosh Bapat |
---|---|
Subject | Re: "unexpected duplicate for tablespace" problem in logical replication |
Date | |
Msg-id | CAExHW5vBePcL9iEFasECB-wmYksZ9nUVnjnzOqkHYvtbCn==0A@mail.gmail.com Whole thread Raw |
In response to | Re: "unexpected duplicate for tablespace" problem in logical replication (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: "unexpected duplicate for tablespace" problem in logical replication
|
List | pgsql-bugs |
On Thu, Jun 19, 2025 at 2:17 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Jun 17, 2025 at 6:29 PM vignesh C <vignesh21@gmail.com> wrote: > > > > On Thu, 25 Jul 2024 at 03:21, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Sun, Mar 17, 2024 at 11:35 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > > > > > On Fri, Feb 02, 2024 at 10:49:17AM -0500, Robert Haas wrote: > > > > > Andres, what do you think about this idea? I wonder if you just > > > > > momentarily forgot about temporary relations when coding > > > > > RelidByRelfilenumber -- because for that function to give well-defined > > > > > answers with temporary relations included, it would need the backend > > > > > ID as an additional argument. > > > > > > > > > > > > Ignoring temporary relations entirely makes sense: one cannot get a > > > > regclass from only a tablespace and a relfilenode, the persistence, as > > > > well as a backend ID would also be required. I've not checked the > > > > patch in details, but it's to say that the idea to cut temporary > > > > relations sounds rather right here. > > > > > > That makes sense to me too. > > > > > > Regarding the patch, filtering by the relpersistence in > > > systable_getnext() loop seems to be good to me. Alternatively we can > > > add "relpersistence == RELPERSISTENCE_TEMP" to the scan key. > > Do you mean relpersistence != RELPERSISTENCE_TEMP? I tried that. It's > not possible to add this check to the scankey since indexes do not > work with != conditions. We will need to use a sequential scan to do > so, but that will affect the performance. I think what Vignesh has > done in his patch is good enough. > > > > > The attached patch adds a new test and resolves an existing test > > failure. However, a downside is that we can no longer verify the > > mapping of the temporary tables. > > Yes, but I think the current way of verification wasn't correct anyway > because it couldn't match the temporary table's relation exactly. We > will need to devise another way to do that, maybe creating a version > of pg_filenode_relation() for temporary tables. > > Some more comments, some of which I have applied in the attached > patches. Please review those. Let me know what you think. > > I feel that we should mention permanent tables in the prologue of > pg_filenode_relation() for somebody who just looks at > pg_filenode_relation(). Also in its pg_proc.dat description for one > who looks at \df+ output. Attached patch does that. > > - * Returns InvalidOid if no relation matching the criteria could be found. > + * Returns InvalidOid if no permanent relation matching the criteria could be > + * found. > > The relpersistence enum has values > #define RELPERSISTENCE_PERMANENT 'p' /* regular table */ > #define RELPERSISTENCE_UNLOGGED 'u' /* unlogged permanent table */ > #define RELPERSISTENCE_TEMP 't' /* temporary table */ > > The description of UNLOGGED mentions "permanent" so it looks like your > use of "permanent" covers both regular and unlogged tables. However > looking purely at the RELPERSISTENCE_ names, it' s possible that one > will associate the word "permanent" with RELPERSISTENCE_PERMANENT > only, and not with RELPERSISTENCE_UNLOGGED. Should we use > "non-temporary" instead to avoid confusion? > > + /* > + * Temporary relations should be excluded. This exclusion is > + * necessary because they can lead to the wrong result; the > + * relfilenumber space is shared with persistent > + * relations. Additionally, excluding them is possible since they > + * are not the focus in this context. > + */ > + if (classform->relpersistence == RELPERSISTENCE_TEMP) > + continue; > + > > I slightly rephrased the comment and moved it to the function prologue > since it defines the function's scope. > > WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND mapped_oid IS DISTINCT FROM oid; > +WHERE relkind IN ('r', 'i', 'S', 'm') AND mapped_oid IS DISTINCT FROM oid; > > Why do we need to remove permanent toast tables from the query? > Instead adjustment below > > SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid > -WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL; > +WHERE (c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL) AND > c.relpersistence != 't'; > > seems enough. Actually, we shouldn't pass temporary tables to > pg_filenode_relation(), since it doesn't map temporary relations now. > Adjusted that way in the attached patch. > > While reviewing the patch, I found something else. The > RelidByRelfilenumber() enters negative cache entries. > RelfilenumberMapInvalidateCallback() treats the negative entries > specifically which indicates that it's intentional. But if somebody > calls pg_filenode_relation() repeatedly with invalid relfilenodes, > that would bloat the cache with "kinda useless entries". It's a way to > make a backend consume a lot of memory quickly and thus perform a DOS. > For example, on my laptop, I could make a backend consume almost 3GB > of memory in 7 minutes. > > #SELECT pg_size_pretty(total_bytes) total_bytes, > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts > where name = 'RelfilenumberMap cache'; > total_bytes | used_bytes > -------------+------------ > (0 rows) > > #\timing > Timing is on. > #select r is null, count(*) from (select pg_filenode_relation(i, j) r > from generate_series(1, 1000) i, generate_series(1, 1000) j) q group > by r is null; > ?column? | count > ----------+--------- > t | 1000000 > (1 row) > > Time: 4705.483 ms (00:04.705) > #SELECT pg_size_pretty(total_bytes) total_bytes, > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts > where name = 'RelfilenumberMap cache'; > total_bytes | used_bytes > -------------+------------ > 40 MB | 39 MB > (1 row) > > Time: 2.351 ms > #select r is null, count(*) from (select pg_filenode_relation(i, j) r > from generate_series(1001, 10000) i, generate_series(1001, 10000) j) q > group by r is null; > ?column? | count > ----------+---------- > f | 153215 > t | 80846785 > (2 rows) > > Time: 421998.039 ms (07:01.998) > #SELECT pg_size_pretty(total_bytes) total_bytes, > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts > where name = 'RelfilenumberMap cache'; > total_bytes | used_bytes > -------------+------------ > 3180 MB | 3176 MB > (1 row) > > Time: 132.187 ms > > Logical replication and autoprewarm may not cause such a large bloat > but there is no limit to passing invalid combinations of reltablespace > and relfilenumber to pg_filenode_relation(). Do we want to prohibit > that case by passing a flag from logical pg_filenode_relation() to not > cache invalid entries? > > I have moved the CF entry to the July commitfest. The patch needed a rebase because of func.sgml refactoring. Here's a rebased patchset. -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-bugs by date: