Thread: Re: Large expressions in indexes can't be stored (non-TOASTable)

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Thanks to commit 96cdeae, only a few catalogs remain that are missing TOAST
> tables: pg_attribute, pg_class, pg_index, pg_largeobject, and
> pg_largeobject_metadata.  I've attached a short patch to add one for
> pg_index, which resolves the issue cited here.  This passes "check-world"
> and didn't fail for a few ad hoc tests (e.g., VACUUM FULL on pg_index).  I
> haven't spent too much time investigating possible circularity issues, but
> I'll note that none of the system indexes presently use the indexprs and
> indpred columns.

Yeah, the possibility of circularity seems like the main hazard, but
I agree it's unlikely that the entries for system indexes could ever
need out-of-line storage.  There are many other things that would have
to be improved before a system index could use indexprs or indpred.

            regards, tom lane



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Sep 04, 2024 at 03:20:33PM -0400, Jonathan S. Katz wrote:
> On 9/4/24 3:08 PM, Tom Lane wrote:
>> Nathan Bossart <nathandbossart@gmail.com> writes:
>> > Thanks to commit 96cdeae, only a few catalogs remain that are missing TOAST
>> > tables: pg_attribute, pg_class, pg_index, pg_largeobject, and
>> > pg_largeobject_metadata.  I've attached a short patch to add one for
>> > pg_index, which resolves the issue cited here.  This passes "check-world"
>> > and didn't fail for a few ad hoc tests (e.g., VACUUM FULL on pg_index).  I
>> > haven't spent too much time investigating possible circularity issues, but
>> > I'll note that none of the system indexes presently use the indexprs and
>> > indpred columns.
>> 
>> Yeah, the possibility of circularity seems like the main hazard, but
>> I agree it's unlikely that the entries for system indexes could ever
>> need out-of-line storage.  There are many other things that would have
>> to be improved before a system index could use indexprs or indpred.
> 
> Agreed on the unlikeliness of that, certainly in the short-to-mid term. The
> impetus driving this is dealing with a data type that can be quite large,
> and it's unlikely system catalogs will be dealing with anything of that
> nature, or requiring very long expressions that couldn't be encapsulated in
> a different way.

Any objections to committing this?  I've still been unable to identify any
breakage, and adding it now would give us ~1 year of testing before it'd be
available in a GA release.  Perhaps we should at least add something to
misc_sanity.sql that verifies no system indexes are using pg_index's TOAST
table.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Sep 04, 2024 at 03:20:33PM -0400, Jonathan S. Katz wrote:
>> On 9/4/24 3:08 PM, Tom Lane wrote:
>>> Nathan Bossart <nathandbossart@gmail.com> writes:
>>>> Thanks to commit 96cdeae, only a few catalogs remain that are missing TOAST
>>>> tables: pg_attribute, pg_class, pg_index, pg_largeobject, and
>>>> pg_largeobject_metadata.  I've attached a short patch to add one for
>>>> pg_index, which resolves the issue cited here.

> Any objections to committing this?

Nope.

            regards, tom lane



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Alexander Lakhin
Date:
Hello Nathan,

18.09.2024 22:52, Nathan Bossart wrote:
Committed.  I waffled on whether to add a test for system indexes that used
pg_index's varlena columns, but I ended up leaving it out.  I've attached
it here in case anyone thinks we should add it.

I've discovered that Jonathan's initial script:
CREATE TABLE def (id int);
SELECT array_agg(n) b FROM generate_series(1,10_000) n \gset
CREATE OR REPLACE FUNCTION vec_quantizer (a int, b int[]) RETURNS bool
AS $$ SELECT true $$ LANGUAGE SQL IMMUTABLE;
CREATE INDEX ON def (vec_quantizer(id, :'b'));

completed with:
DROP INDEX CONCURRENTLY def_vec_quantizer_idx;

triggers an assertion failure:
TRAP: failed Assert("HaveRegisteredOrActiveSnapshot()"), File: "toast_internals.c", Line: 668, PID: 3723372

with the following stack trace:
ExceptionalCondition at assert.c:52:13
init_toast_snapshot at toast_internals.c:670:2
toast_delete_datum at toast_internals.c:429:60
toast_tuple_cleanup at toast_helper.c:303:30
heap_toast_insert_or_update at heaptoast.c:335:9
heap_update at heapam.c:3752:14
simple_heap_update at heapam.c:4210:11
CatalogTupleUpdate at indexing.c:324:2
index_set_state_flags at index.c:3522:2
index_concurrently_set_dead at index.c:1848:2
index_drop at index.c:2286:3
doDeletion at dependency.c:1362:5
deleteOneObject at dependency.c:1279:12
deleteObjectsInList at dependency.c:229:3
performMultipleDeletions at dependency.c:393:2
RemoveRelations at tablecmds.c:1594:2
ExecDropStmt at utility.c:2008:4
...

This class of assert failures is not new, see e. g., bugs #13809, #18127,
but this concrete instance (with index_set_state_flags()) emerged with
b52c4fc3c and may be worth fixing while on it...

Best regards,
Alexander

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Alexander Lakhin
Date:
Hello Nathan,

19.09.2024 21:36, Nathan Bossart wrote:
> On Thu, Sep 19, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote:
>> completed with:
>> DROP INDEX CONCURRENTLY def_vec_quantizer_idx;
>>
>> triggers an assertion failure:
>> TRAP: failed Assert("HaveRegisteredOrActiveSnapshot()"), File: "toast_internals.c", Line: 668, PID: 3723372
> Ha, that was fast.  The attached patch seems to fix the assertion failures.
> It's probably worth checking if any of the adjacent code paths are
> affected, too.
>

Thank you for your attention to that issue!

I've found another two paths to reach that condition:
CREATE INDEX CONCURRENTLY ON def (vec_quantizer(id, :'b'));
ERROR:  cannot fetch toast data without an active snapshot

REINDEX INDEX CONCURRENTLY def_vec_quantizer_idx;
(or REINDEX TABLE CONCURRENTLY def;)
TRAP: failed Assert("HaveRegisteredOrActiveSnapshot()"), File: "toast_internals.c", Line: 668, PID: 2934502
ExceptionalCondition at assert.c:52:13
init_toast_snapshot at toast_internals.c:670:2
toast_delete_datum at toast_internals.c:429:60
toast_tuple_cleanup at toast_helper.c:303:30
heap_toast_insert_or_update at heaptoast.c:335:9
heap_update at heapam.c:3752:14
simple_heap_update at heapam.c:4210:11
CatalogTupleUpdate at indexing.c:324:2
index_concurrently_swap at index.c:1649:2
ReindexRelationConcurrently at indexcmds.c:4270:3
ReindexIndex at indexcmds.c:2962:1
ExecReindex at indexcmds.c:2884:4
ProcessUtilitySlow at utility.c:1570:22
...

Perhaps it would make sense to check all CatalogTupleUpdate(pg_index, ...)
calls (I've found 10 such instances, but haven't checked them yet).

Best regards,
Alexander



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Fri, Sep 20, 2024 at 08:16:24AM +0900, Michael Paquier wrote:
> Perhaps the reason why these snapshots are pushed should be documented
> with a comment?

Definitely.  I'll add those once we are more confident that we've
identified all the bugs.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Alexander Lakhin
Date:
Hello Nathan,

20.09.2024 19:51, Nathan Bossart wrote:
> Here's a (probably naive) attempt at fixing these, too.  I'll give each
> path a closer look once it feels like we've identified all the bugs.

Thank you for the updated patch!

I tested it with two code modifications (1st is to make each created
expression index TOASTed (by prepending 1M of spaces to the indexeprs
value) and 2nd to make each created index an expression index (by
modifying index_elem_options in gram.y) — both modifications are kludgy so
I don't dare to publish them) and found no other snapshot-related issues
during `make check-world`.

Best regards,
Alexander



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Tue, Sep 24, 2024 at 02:26:08PM -0500, Nathan Bossart wrote:
> On Tue, Sep 24, 2024 at 01:21:45PM +0900, Michael Paquier wrote:
>> On Mon, Sep 23, 2024 at 10:50:21AM -0500, Nathan Bossart wrote:
>>> I carefully inspected all the code paths this patch touches, and I think
>>> I've got all the details right, but I would be grateful if someone else
>>> could take a look.
>> 
>> No objections from here with putting the snapshots pops and pushes
>> outside the inner routines of reindex/drop concurrently, meaning that
>> ReindexRelationConcurrently(), DefineIndex() and index_drop() are fine
>> to do these operations.
> 
> Great.  I plan to push 0001 shortly.

Committed this one.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Oct 16, 2024 at 09:12:31AM +0900, Michael Paquier wrote:
> -   if (!ctx->rel->rd_rel->reltoastrelid)
> +   if (!OidIsValid(RelationGetToastRelid(ctx->rel)))
> 
> This set of diffs in 0002 is a nice cleanup.  I'd wish for relying
> less on zero comparitons when assuming that InvalidOid is in use.

I'm wondering if there's any concern about this one causing back-patching
pain.  If so, I can just add the macro for use in new code.

> +static inline void
> +AssertHasSnapshotForToast(Relation rel)
> +{
> +    /* bootstrap mode in particular breaks this rule */
> +    if (!IsNormalProcessingMode())
> +        return;
> +
> +    /* if the relation doesn't have a TOAST table, we are good */
> +    if (!OidIsValid(RelationGetToastRelid(rel)))
> +        return;
> +
> +    Assert(HaveRegisteredOrActiveSnapshot());
> +}
> 
> Using a separate inlined routine is indeed cleaner as you have
> documented the assumptions behind the check.  Wouldn't it be better to
> use a USE_ASSERT_CHECKING block?  These two checks for normal
> processing and toastrelid are cheap lookups, but we don't need them at
> all in non-assert paths, so I'd suggest to ignore them entirely for
> the non-USE_ASSERT_CHECKING case.

I assume all of this will get compiled out in non-USE_ASSERT_CHECKING
builds as-is, but I see no problem with surrounding it with an #ifdef to be
sure.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Thu, Oct 31, 2024 at 08:45:15AM +0900, Michael Paquier wrote:
> On Wed, Oct 30, 2024 at 03:54:32PM -0500, Nathan Bossart wrote:
>> I'll manage.  0001 was a doozy to back-patch, and this obviously isn't a
>> terribly pressing issue, so I plan to wait until after the November minor
>> release to apply this.
> 
> Okay by me.

I took another look at 0001, and I think it still needs a bit of work.
Specifically, IIUC a few of these code paths are actually fine since they
do not ever touch a varlena column.  For example,
clear_subscription_skip_lsn() only updates subskiplsn, so unless I am
missing a corner case, there is no risk of trying to access a TOAST table
without a snapshot.  Allowing such cases would involve adjusting the new
assertions in 0003 to check for only the varlena columns during
inserts/updates, which is more complicated but seems doable.

Or we could just enforce that you have an active snapshot whenever you
modify a catalog with a TOAST table.  That's simpler, but it requires extra
work in some paths (and probably comments to point out that we're only
pushing an active snapshot to satisfy an assertion).

>> I'm having second thoughts on 0002, so I'll
>> probably leave that one uncommitted, but IMHO we definitely need 0003 to
>> prevent this issue from reoccurring.
> 
> I liked your idea behind 0002, FWIW.  We do that for a lot of fields
> already in a Relation.

Alright, then I'll plan on proceeding with it.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Nov 27, 2024 at 03:20:19PM +0900, Michael Paquier wrote:
> On Mon, Nov 25, 2024 at 01:29:31PM -0600, Nathan Bossart wrote:
>> Or we could just enforce that you have an active snapshot whenever you
>> modify a catalog with a TOAST table.  That's simpler, but it requires extra
>> work in some paths (and probably comments to point out that we're only
>> pushing an active snapshot to satisfy an assertion).
> 
> I may be wrong, but I suspect that enforcing the check without being
> column-based is the right way to go and that this is going to catch
> more errors in the long-term than being a maintenance burden.  So I
> would keep the snapshot check even if it's a bit aggressive, still
> it's useful.  And we are not talking about that may code paths that
> need to be switched to require a snapshot, as well.  Most of the ones
> you have mentioned on this thread are really particular in the ways
> they do transaction handling.  I suspect that it may also catch
> out-of-core issues with extensions doing direct catalog manipulations.

That is useful feedback, thanks.

One other thing that caught my eye is that replorigin_create() uses
SnapshotDirty, so I'm unsure if
PushActiveSnapshot(GetTransactionSnapshot()) is correct there.  The only
other example I found is RelationFindReplTupleByIndex(), which uses
GetLatestSnapshot().  But I do see that CreateSubscription() calls
replorigin_create(), and it seems to rely on the transaction snapshot, so
maybe it's okay, at least for the purpose of TOAST table access...  I'm
finding myself wishing there was a bit more commentary about the proper
usage of these snapshot functions.  Maybe I can try to add some as a
follow-up exercise.

-- 
nathan



On Wed, 27 Nov 2024 at 21:29, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Nov 27, 2024 at 03:20:19PM +0900, Michael Paquier wrote:
> > On Mon, Nov 25, 2024 at 01:29:31PM -0600, Nathan Bossart wrote:
> >> Or we could just enforce that you have an active snapshot whenever you
> >> modify a catalog with a TOAST table.  That's simpler, but it requires extra
> >> work in some paths (and probably comments to point out that we're only
> >> pushing an active snapshot to satisfy an assertion).
> >
> > I may be wrong, but I suspect that enforcing the check without being
> > column-based is the right way to go and that this is going to catch
> > more errors in the long-term than being a maintenance burden.  So I
> > would keep the snapshot check even if it's a bit aggressive, still
> > it's useful.  And we are not talking about that may code paths that
> > need to be switched to require a snapshot, as well.  Most of the ones
> > you have mentioned on this thread are really particular in the ways
> > they do transaction handling.  I suspect that it may also catch
> > out-of-core issues with extensions doing direct catalog manipulations.
>
> That is useful feedback, thanks.

Michael's feedback from [1] is still pending, so I'm updating the
status to "Waiting on Author." Please provide an updated patch and
change the status back to "Needs Review".
[1] - https://www.postgresql.org/message-id/Z0a6IwjW36af71J7%40paquier.xyz

Regards,
Vignesh



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Amit Kapila
Date:
On Wed, Nov 27, 2024 at 9:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Wed, Nov 27, 2024 at 03:20:19PM +0900, Michael Paquier wrote:
> > On Mon, Nov 25, 2024 at 01:29:31PM -0600, Nathan Bossart wrote:
> >> Or we could just enforce that you have an active snapshot whenever you
> >> modify a catalog with a TOAST table.  That's simpler, but it requires extra
> >> work in some paths (and probably comments to point out that we're only
> >> pushing an active snapshot to satisfy an assertion).
> >
> > I may be wrong, but I suspect that enforcing the check without being
> > column-based is the right way to go and that this is going to catch
> > more errors in the long-term than being a maintenance burden.  So I
> > would keep the snapshot check even if it's a bit aggressive, still
> > it's useful.  And we are not talking about that may code paths that
> > need to be switched to require a snapshot, as well.  Most of the ones
> > you have mentioned on this thread are really particular in the ways
> > they do transaction handling.  I suspect that it may also catch
> > out-of-core issues with extensions doing direct catalog manipulations.
>
> That is useful feedback, thanks.
>
> One other thing that caught my eye is that replorigin_create() uses
> SnapshotDirty, so I'm unsure if
> PushActiveSnapshot(GetTransactionSnapshot()) is correct there.
>

Can we dodge adding this push call if we restrict the length of the
origin name to some reasonable limit like 256 or 512 and avoid the
need of toast altogether?

--
With Regards,
Amit Kapila.



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote:
> Can we dodge adding this push call if we restrict the length of the
> origin name to some reasonable limit like 256 or 512 and avoid the
> need of toast altogether?

We did consider just removing pg_replication_origin's TOAST table earlier
[0], but we decided against it at that time.  Maybe it's worth
reconsidering...

[0] https://postgr.es/m/ZvOMBhlb5zrBCG5p%40paquier.xyz

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Amit Kapila
Date:
On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote:
> > Can we dodge adding this push call if we restrict the length of the
> > origin name to some reasonable limit like 256 or 512 and avoid the
> > need of toast altogether?
>
> We did consider just removing pg_replication_origin's TOAST table earlier
> [0], but we decided against it at that time.  Maybe it's worth
> reconsidering...
>

I don't see any good reason in that email for not removing the TOAST
table for pg_replication_origin. I would prefer to remove it rather
than add protection related to its access.

--
With Regards,
Amit Kapila.



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote:
> On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote:
>> > Can we dodge adding this push call if we restrict the length of the
>> > origin name to some reasonable limit like 256 or 512 and avoid the
>> > need of toast altogether?
>>
>> We did consider just removing pg_replication_origin's TOAST table earlier
>> [0], but we decided against it at that time.  Maybe it's worth
>> reconsidering...
> 
> I don't see any good reason in that email for not removing the TOAST
> table for pg_replication_origin. I would prefer to remove it rather
> than add protection related to its access.

The only reason I can think of is that folks might have existing
replication origins with extremely long names that would cause upgrades to
fail.  While I think it would be probably be okay to remove the TOAST table
and restrict origin names to 512 bytes (like we did for password hashes in
commit 8275325), folks routinely complain about NAMEDATALEN, so I think we
should be cautious here.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
"Euler Taveira"
Date:
On Tue, Apr 8, 2025, at 5:25 PM, Nathan Bossart wrote:
On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote:
> On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote:
>> > Can we dodge adding this push call if we restrict the length of the
>> > origin name to some reasonable limit like 256 or 512 and avoid the
>> > need of toast altogether?
>>
>> We did consider just removing pg_replication_origin's TOAST table earlier
>> [0], but we decided against it at that time.  Maybe it's worth
>> reconsidering...

> I don't see any good reason in that email for not removing the TOAST
> table for pg_replication_origin. I would prefer to remove it rather
> than add protection related to its access.

The only reason I can think of is that folks might have existing
replication origins with extremely long names that would cause upgrades to
fail.  While I think it would be probably be okay to remove the TOAST table
and restrict origin names to 512 bytes (like we did for password hashes in
commit 8275325), folks routinely complain about NAMEDATALEN, so I think we
should be cautious here.

The logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID.
It means the maximum origin name is 24. This limited origin name also applies
to pglogical that limits the name to 54 IIRC. I think that covers the majority
of the logical replication setups. There might be a small number of custom
logical replication systems that possibly use long names for replication
origin. I've never seen a replication origin name longer than NAMEDATALEN.

If you consider that the maximum number of replication origin is limited to 2
bytes (65k distinct names), it is reasonable to restrict the replication
origin names to 512 due to the high number of combinations. We generally
expects that a catalog string uses "name" as type if it is an identifier; it
could be the case for roname if author decided to be strict.

This additional TOAST table has no or rare use. +1 for removing it. It is one
less file, one less table and one less index; in short, one less source of data
corruption. ;)


--
Euler Taveira

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Michael Paquier
Date:
On Tue, Apr 08, 2025 at 11:21:31PM -0300, Euler Taveira wrote:
> The logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID.
> It means the maximum origin name is 24. This limited origin name also applies
> to pglogical that limits the name to 54 IIRC. I think that covers the majority
> of the logical replication setups. There might be a small number of custom
> logical replication systems that possibly use long names for replication
> origin. I've never seen a replication origin name longer than NAMEDATALEN.

pg_replication_origin_create() can be used with text as input for the
origin name, still your argument sounds sensible here as I would
suspect that most setups of logical replication are these.

> If you consider that the maximum number of replication origin is limited to 2
> bytes (65k distinct names), it is reasonable to restrict the replication
> origin names to 512 due to the high number of combinations. We generally
> expects that a catalog string uses "name" as type if it is an identifier; it
> could be the case for roname if author decided to be strict.

I would be more cautious than a limit on NAMEDATALEN.  The restriction
suggested by Nathan at 512 bytes should be plenty enough.

> This additional TOAST table has no or rare use. +1 for removing it. It is one
> less file, one less table and one less index; in short, one less source of data
> corruption. ;)

I guess that's the consensus, then.  No objections to the removal here.
--
Michael

Attachment

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Apr 09, 2025 at 01:20:29PM +0900, Michael Paquier wrote:
> I guess that's the consensus, then.  No objections to the removal here.

That would look like the attached.  There are still a couple of other known
TOAST snapshot issues I need to revisit, but this sidesteps a few of them.
But this'll need to wait for a couple of months until v19 development
begins.

-- 
nathan

Attachment

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
"Euler Taveira"
Date:
On Wed, Apr 9, 2025, at 4:05 PM, Nathan Bossart wrote:
On Wed, Apr 09, 2025 at 01:20:29PM +0900, Michael Paquier wrote:
> I guess that's the consensus, then.  No objections to the removal here.

That would look like the attached.  There are still a couple of other known
TOAST snapshot issues I need to revisit, but this sidesteps a few of them.
But this'll need to wait for a couple of months until v19 development
begins.

LGTM. I have a few suggestions.
 
+   /*
+    * To avoid needing a TOAST table for pg_replication_origin, we restrict
+    * replication origin names to 512 bytes.  This should be more than enough
+    * for all practical use.
+    */
+   if (strlen(roname) > MAX_RONAME_LEN)
+       ereport(ERROR,

I wouldn't duplicate the comment. Instead, I would keep it only in origin.h.

+                errdetail("Repilcation origin names must be no longer than %d bytes.",
+                          MAX_RONAME_LEN)));

s/Repilcation/Replication/

+#define MAX_RONAME_LEN (512)

It is just a cosmetic suggestion but I usually use parentheses when it is an
expression but don't include it for a single number.

Should we document this maximum length?


--
Euler Taveira

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Apr 09, 2025 at 08:54:21PM -0300, Euler Taveira wrote:
> LGTM. I have a few suggestions.

Thanks for reviewing.

> +   /*
> +    * To avoid needing a TOAST table for pg_replication_origin, we restrict
> +    * replication origin names to 512 bytes.  This should be more than enough
> +    * for all practical use.
> +    */
> +   if (strlen(roname) > MAX_RONAME_LEN)
> +       ereport(ERROR,
> 
> I wouldn't duplicate the comment. Instead, I would keep it only in origin.h.

Hm.  I agree that duplicating the comment isn't great, but I'm also not
wild about forcing readers to jump to the macro definition to figure out
why there is a length restriction.

> +                errdetail("Repilcation origin names must be no longer than %d bytes.",
> +                          MAX_RONAME_LEN)));
> 
> s/Repilcation/Replication/

Fixed.

> +#define MAX_RONAME_LEN (512)
> 
> It is just a cosmetic suggestion but I usually use parentheses when it is an
> expression but don't include it for a single number.

We use both styles, but the no-parentheses style does seem to be preferred.

    $ grep -E "^#define\s[A-Z_]+\s\([0-9]+\)$" src/* -rI | wc -l
          23
    $ grep -E "^#define\s[A-Z_]+\s[0-9]+$" src/* -rI | wc -l
         861

> Should we document this maximum length?

I've added a note.

-- 
nathan

Attachment

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
Apparently replication origins in tests are supposed to be prefixed with
"regress_".  Here is a new patch with that fixed.

-- 
nathan

Attachment

Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nisha Moond
Date:
On Mon, Apr 21, 2025 at 9:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Apparently replication origins in tests are supposed to be prefixed with
> "regress_".  Here is a new patch with that fixed.
>

Hi Nathan,
Thanks for the patch! I tested it for functionality and here are a few comments:

1) While testing, I noticed an unexpected behavior with the new 512
byte length restriction in place. Since there’s no explicit
restriction on the column roname’s type, it’s possible to insert an
origin name exceeding 512 bytes. For instance, can use the COPY
command -- similar to how pg_dump might dump and restore catalog
tables.

For example:
  postgres=# COPY pg_catalog.pg_replication_origin (roident, roname) FROM stdin;
  Enter data to be copied followed by a newline.
  End with a backslash and a period on a line by itself, or an EOF signal.
  >> 44   regress_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa... [>512 bytes string]
  >> \.
  COPY 1

Once inserted, I was able to use  the pg_replication_origin_xxx
functions with this origin name without any errors.
Not sure how common pg_dump/restore of catalog tables is in real
systems, but should we still consider if this behavior is acceptable?
~~~

Below are a few cosmetic suggestions you might consider.
2)
        <para>
         Creates a replication origin with the given external
         name, and returns the internal ID assigned to it.
+        The name must be no longer than 512 bytes.
        </para></entry>

Would it make sense to make the phrasing a bit more formal, perhaps
something like:
“The name must not exceed 512 bytes.”?

3) For the code comments -
+ /*
+ * To avoid needing a TOAST table for pg_replication_origin, we restrict
+ * replication origin names to 512 bytes.  This should be more than enough
+ * for all practical use.
+ */
+ if (strlen(roname) > MAX_RONAME_LEN)

  a) /bytes.  This/bytes. This/ (there is an extra space before “This”)
  b) /all practical use/all practical uses/
  c) Both (a) and (b) above, also apply to the comment above the macro
“MAX_RONAME_LEN”.

4) The "Detail" part of the error message could be made a bit more
formal and precise -

Current:
DETAIL:  Replication origin names must be no longer than 512 bytes.

Suggestion:
DETAIL:  Replication origin names must not exceed 512 bytes.

5) As Euler pointed out, logical replication already has a built-in
restriction for internally assigned origin names, limiting them to
NAMEDATALEN. Given that, only the SQL function
pg_replication_origin_create() is capable of creating longer origin
names. Would it make sense to consider moving the check into
pg_replication_origin_create() itself, since it seems redundant for
all other callers?
That said, if there's a possibility of future callers needing this
restriction at a lower level, it may be reasonable to leave it as is.

--
Thanks,
Nisha



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Tue, Apr 22, 2025 at 05:17:17PM +0530, Nisha Moond wrote:
> Thanks for the patch! I tested it for functionality and here are a few comments:

Thank you for reviewing.

> 1) While testing, I noticed an unexpected behavior with the new 512
> byte length restriction in place. Since there´s no explicit
> restriction on the column roname´s type, it´s possible to insert an
> origin name exceeding 512 bytes. For instance, can use the COPY
> command -- similar to how pg_dump might dump and restore catalog
> tables.
> 
> For example:
>   postgres=# COPY pg_catalog.pg_replication_origin (roident, roname) FROM stdin;
>   Enter data to be copied followed by a newline.
>   End with a backslash and a period on a line by itself, or an EOF signal.
>   >> 44   regress_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa... [>512 bytes string]
>   >> \.
>   COPY 1
> 
> Once inserted, I was able to use  the pg_replication_origin_xxx
> functions with this origin name without any errors.
> Not sure how common pg_dump/restore of catalog tables is in real
> systems, but should we still consider if this behavior is acceptable?

I'm personally not too worried about this.  The limit is primarily there to
provide a nicer ERROR than "row is too big", which AFAICT is the worst-case
scenario if you go to the trouble of setting allow_system_table_mods and
modifying shared catalogs directly.

> 5) As Euler pointed out, logical replication already has a built-in
> restriction for internally assigned origin names, limiting them to
> NAMEDATALEN. Given that, only the SQL function
> pg_replication_origin_create() is capable of creating longer origin
> names. Would it make sense to consider moving the check into
> pg_replication_origin_create() itself, since it seems redundant for
> all other callers?
> That said, if there's a possibility of future callers needing this
> restriction at a lower level, it may be reasonable to leave it as is.

pg_replication_origin_create() might be a better place.  After all, that's
where we're enforcing the name doesn't start with "pg_" and, for testing,
_does_ start with "regress_".  Plus, it seems highly unlikely that any
other callers of replorigin_create() will ever provide names longer than
512 bytes.

-- 
nathan



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Tue, Apr 22, 2025 at 12:21:01PM -0500, Nathan Bossart wrote:
> On Tue, Apr 22, 2025 at 05:17:17PM +0530, Nisha Moond wrote:
>> 5) As Euler pointed out, logical replication already has a built-in
>> restriction for internally assigned origin names, limiting them to
>> NAMEDATALEN. Given that, only the SQL function
>> pg_replication_origin_create() is capable of creating longer origin
>> names. Would it make sense to consider moving the check into
>> pg_replication_origin_create() itself, since it seems redundant for
>> all other callers?
>> That said, if there's a possibility of future callers needing this
>> restriction at a lower level, it may be reasonable to leave it as is.
> 
> pg_replication_origin_create() might be a better place.  After all, that's
> where we're enforcing the name doesn't start with "pg_" and, for testing,
> _does_ start with "regress_".  Plus, it seems highly unlikely that any
> other callers of replorigin_create() will ever provide names longer than
> 512 bytes.

Here is a new version of the patch with this change.

-- 
nathan

Attachment
Nathan Bossart <nathandbossart@gmail.com> writes:
> Here is a new version of the patch with this change.

I don't see any comments in this patch that capture the real
reason for removing pg_replication_origin's TOAST table,
namely (IIUC) that we'd like to be able to access that catalog
without a snapshot.  I think it's important to record that
knowledge, because otherwise somebody is likely to think they
can undo this change for $whatever-reason.

            regards, tom lane



Re: Large expressions in indexes can't be stored (non-TOASTable)

From
Nathan Bossart
Date:
On Wed, Apr 23, 2025 at 05:33:41PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> Here is a new version of the patch with this change.
> 
> I don't see any comments in this patch that capture the real
> reason for removing pg_replication_origin's TOAST table,
> namely (IIUC) that we'd like to be able to access that catalog
> without a snapshot.  I think it's important to record that
> knowledge, because otherwise somebody is likely to think they
> can undo this change for $whatever-reason.

D'oh, yes, I'd better add that.

-- 
nathan