Re: "unexpected duplicate for tablespace" problem in logical replication - Mailing list pgsql-bugs

From Ashutosh Bapat
Subject Re: "unexpected duplicate for tablespace" problem in logical replication
Date
Msg-id CAExHW5uaSBDktUEyohUYy_QGRZgN=zd=B9gdp2MyDMJZOCisBg@mail.gmail.com
Whole thread Raw
In response to Re: "unexpected duplicate for tablespace" problem in logical replication  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-bugs
On Wed, Aug 13, 2025 at 3:18 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jun 19, 2025 at 2:17 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Jun 17, 2025 at 6:29 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Thu, 25 Jul 2024 at 03:21, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Sun, Mar 17, 2024 at 11:35 PM Michael Paquier <michael@paquier.xyz> wrote:
> > > > >
> > > > > On Fri, Feb 02, 2024 at 10:49:17AM -0500, Robert Haas wrote:
> > > > > > Andres, what do you think about this idea? I wonder if you just
> > > > > > momentarily forgot about temporary relations when coding
> > > > > > RelidByRelfilenumber -- because for that function to give well-defined
> > > > > > answers with temporary relations included, it would need the backend
> > > > > > ID as an additional argument.
> > > > >
> > > > >
> > > > > Ignoring temporary relations entirely makes sense: one cannot get a
> > > > > regclass from only a tablespace and a relfilenode, the persistence, as
> > > > > well as a backend ID would also be required.  I've not checked the
> > > > > patch in details, but it's to say that the idea to cut temporary
> > > > > relations sounds rather right here.
> > > >
> > > > That makes sense to me too.
> > > >
> > > > Regarding the patch, filtering by the relpersistence in
> > > > systable_getnext() loop seems to be good to me. Alternatively we can
> > > > add "relpersistence == RELPERSISTENCE_TEMP" to the scan key.
> >
> > Do you mean relpersistence != RELPERSISTENCE_TEMP? I tried that. It's
> > not possible to add this check to the scankey since indexes do not
> > work with != conditions. We will need to use a sequential scan to do
> > so, but that will affect the performance. I think what Vignesh has
> > done in his patch is good enough.
> >
> > >
> > > The attached patch adds a new test and resolves an existing test
> > > failure. However, a downside is that we can no longer verify the
> > > mapping of the temporary tables.
> >
> > Yes, but I think the current way of verification wasn't correct anyway
> > because it couldn't match the temporary table's relation exactly. We
> > will need to devise another way to do that, maybe creating a version
> > of pg_filenode_relation() for temporary tables.
> >
> > Some more comments, some of which I have applied in the attached
> > patches. Please review those. Let me know what you think.
> >
> > I feel that we should mention permanent tables in the prologue of
> > pg_filenode_relation() for somebody who just looks at
> > pg_filenode_relation(). Also in its pg_proc.dat description for one
> > who looks at \df+ output. Attached patch does that.
> >
> > - * Returns InvalidOid if no relation matching the criteria could be found.
> > + * Returns InvalidOid if no permanent relation matching the criteria could be
> > + * found.
> >
> > The relpersistence enum has values
> > #define   RELPERSISTENCE_PERMANENT 'p' /* regular table */
> > #define   RELPERSISTENCE_UNLOGGED 'u' /* unlogged permanent table */
> > #define   RELPERSISTENCE_TEMP 't' /* temporary table */
> >
> > The description of UNLOGGED mentions "permanent" so it looks like your
> > use of "permanent" covers both regular and unlogged tables. However
> > looking purely at the RELPERSISTENCE_ names, it' s possible that one
> > will associate the word "permanent" with RELPERSISTENCE_PERMANENT
> > only, and not with RELPERSISTENCE_UNLOGGED. Should we use
> > "non-temporary" instead to avoid confusion?
> >
> > + /*
> > + * Temporary relations should be excluded. This exclusion is
> > + * necessary because they can lead to the wrong result; the
> > + * relfilenumber space is shared with persistent
> > + * relations. Additionally, excluding them is possible since they
> > + * are not the focus in this context.
> > + */
> > + if (classform->relpersistence == RELPERSISTENCE_TEMP)
> > + continue;
> > +
> >
> > I slightly rephrased the comment and moved it to the function prologue
> > since it defines the function's scope.
> >
> > WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND mapped_oid IS DISTINCT FROM oid;
> > +WHERE relkind IN ('r', 'i', 'S', 'm') AND mapped_oid IS DISTINCT FROM oid;
> >
> > Why do we need to remove permanent toast tables from the query?
> > Instead adjustment below
> >
> >  SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid
> > -WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL;
> > +WHERE (c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL) AND
> > c.relpersistence != 't';
> >
> >  seems enough. Actually, we shouldn't pass temporary tables to
> > pg_filenode_relation(), since it doesn't map temporary relations now.
> > Adjusted that way in the attached patch.
> >
> > While reviewing the patch, I found something else. The
> > RelidByRelfilenumber() enters negative cache entries.
> > RelfilenumberMapInvalidateCallback() treats the negative entries
> > specifically which indicates that it's intentional. But if somebody
> > calls pg_filenode_relation() repeatedly with invalid relfilenodes,
> > that would bloat the cache with "kinda useless entries". It's a way to
> > make a backend consume a lot of memory quickly and thus perform a DOS.
> > For example, on my laptop, I could make a backend consume almost 3GB
> > of memory in 7 minutes.
> >
> > #SELECT pg_size_pretty(total_bytes) total_bytes,
> > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts
> > where name = 'RelfilenumberMap cache';
> >  total_bytes | used_bytes
> > -------------+------------
> > (0 rows)
> >
> > #\timing
> > Timing is on.
> > #select r is null, count(*) from (select pg_filenode_relation(i, j) r
> > from generate_series(1, 1000) i, generate_series(1, 1000) j) q group
> > by r is null;
> >  ?column? |  count
> > ----------+---------
> >  t        | 1000000
> > (1 row)
> >
> > Time: 4705.483 ms (00:04.705)
> > #SELECT pg_size_pretty(total_bytes) total_bytes,
> > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts
> > where name = 'RelfilenumberMap cache';
> >  total_bytes | used_bytes
> > -------------+------------
> >  40 MB       | 39 MB
> > (1 row)
> >
> > Time: 2.351 ms
> > #select r is null, count(*) from (select pg_filenode_relation(i, j) r
> > from generate_series(1001, 10000) i, generate_series(1001, 10000) j) q
> > group by r is null;
> >  ?column? |  count
> > ----------+----------
> >  f        |   153215
> >  t        | 80846785
> > (2 rows)
> >
> > Time: 421998.039 ms (07:01.998)
> > #SELECT pg_size_pretty(total_bytes) total_bytes,
> > pg_size_pretty(used_bytes) used_bytes FROM pg_backend_memory_contexts
> > where name = 'RelfilenumberMap cache';
> >  total_bytes | used_bytes
> > -------------+------------
> >  3180 MB     | 3176 MB
> > (1 row)
> >
> > Time: 132.187 ms
> >
> > Logical replication and autoprewarm may not cause such a large bloat
> > but there is no limit to passing invalid combinations of reltablespace
> > and relfilenumber to pg_filenode_relation(). Do we want to prohibit
> > that case by passing a flag from logical pg_filenode_relation() to not
> > cache invalid entries?
> >
> > I have moved the CF entry to the July commitfest.
>
> The patch needed a rebase because of func.sgml refactoring. Here's a
> rebased patchset.

Please ignore the 0002 patch in the earlier patchset. It's an
experimental change related to negative entries bloat in
RelfilenumberMap cache.

--
Best Wishes,
Ashutosh Bapat



pgsql-bugs by date:

Previous
From: "zhouenbing"
Date:
Subject: 答复: Empty query_id in pg_stat_activity
Next
From: PG Bug reporting form
Date:
Subject: BUG #19019: Feature Request: allow the use of column reference in DEFAULT expression