Thread: BUG #18310: Some SQL commands fail to process duplicate objects with error: tuple already updated by self

The following bug has been logged on the website:

Bug reference:      18310
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:

[ This bug is reported to cover defects discovered while working on bug
#18297. ]

The following query:
CREATE ROLE u;
DROP ROLE u, u;
fails with
ERROR:  tuple already updated by self

Meanwhile, all the other DROP commands accepting a list of objects
(namely, AGGREGATE, DOMAIN, EXTENSION, FOREIGN DATA WRAPPER, FOREIGN
TABLE, FUNCTION, INDEX, MATERIALIZED VIEW, OPERATOR, PROCEDURE,
ROUTINE, SEQUENCE, SERVER, STATISTICS, TABLE, TYPE, VIEW) handle such
duplicates with no error.

Also, the following ALTER query:
CREATE TEXT SEARCH CONFIGURATION ispell_tst (COPY=english);
CREATE TEXT SEARCH DICTIONARY ispell (Template=ispell,
 DictFile=ispell_sample, AffFile=ispell_sample);
ALTER TEXT SEARCH CONFIGURATION ispell_tst ALTER MAPPING FOR word, word
 WITH ispell;
fails with the same
ERROR:  tuple already updated by self

Again, I re-checked all the other ALTER commands that accept a list of
objects, and could not find any others producing the same error.


On Thu, Jan 25, 2024 at 04:00:01AM +0000, PG Bug reporting form wrote:
> The following query:
> CREATE ROLE u;
> DROP ROLE u, u;
> fails with
> ERROR:  tuple already updated by self

Not great.

> Whilst all the other DROP commands, accepting a list of objects (namely,
> AGGREGATE, DOMAIN, EXTENSION, FOREIGN DATA WRAPPER, FOREIGN TABLE,
> FUNCTION, INDEX, MATERIALIZED VIEW, OPERATOR, PROCEDURE, ROUTINE,
> SEQUENCE, SERVER, STATISTICS, TABLE, TYPE, VIEW), handle such duplicates
> with no error.

DROP ROLE has its own parse node and its own dropping logic because
of ACL checks and such, while the others use the unified route through
dependency.c & friends.

For this job, I think user.c is just overengineered and could be
simplified to use a list of OIDs.  There is no need for ObjectAddress
or even ObjectAddresses: all of them are created with
AuthIdRelationId as class ID, and the second loop just fetches the
role IDs retrieved from the ObjectAddresses created in the first loop.
We don't use the individual ObjectAddress either.  On top of all that,
it makes the code a bit easier to follow.

> Also, the following ALTER query:
> CREATE TEXT SEARCH CONFIGURATION ispell_tst (COPY=english);
> CREATE TEXT SEARCH DICTIONARY ispell (Template=ispell,
>  DictFile=ispell_sample, AffFile=ispell_sample);
> ALTER TEXT SEARCH CONFIGURATION ispell_tst ALTER MAPPING FOR word, word
>  WITH ispell;
> fails with the same
> ERROR:  tuple already updated by self
>
> Again, I re-checked all the other ALTER commands, that accept a list of
> objects, and could not find other ones producing the same error.

This one comes down to the manipulation of the list of token names,
where tsearchcmds.c has a few assumptions about the way to do its job
across multiple catalog scans.  We could adjust the code so that the
list of String nodes is rebuilt with some list_append_unique(), but I
cannot convince myself that this is worth the complication.
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> wrote on Friday, January 26, 2024 at 10:51:
> For this job, I think user.c is just overengineered and could be
> simplified to use a list of OIDs.  There is no need for ObjectAddress
> or even ObjectAddresses: all of them are created with
> AuthIdRelationId as class ID, and the second loop just fetches the
> role IDs retrieved from the ObjectAddresses created in the first loop.
> We don't use the individual ObjectAddress either.  On top of all that,
> it makes the code a bit easier to follow.

Agreed. +1

--
Tender Wang
OpenPie:  https://en.openpie.com/


Michael Paquier <michael@paquier.xyz> wrote on Friday, January 26, 2024 at 10:51:
> This one comes down to the manipulation of the list of token names,
> where tsearchcmds.c has a few assumptions about the way to do its job
> across multiple catalog scans.  We could adjust the code so that the
> list of String nodes is rebuilt with some list_append_unique(), but I
> cannot convince myself that this is worth the complication.

I have an idea.
How about using a List *res to replace int *res in getTokenTypes(), so we can use list_append_unique()?
In the outer function, use a List *tokens to replace int *tokens, with ntokens = list_length(tokens) instead of list_length(stmt->tokentype).

--
Tender Wang
OpenPie:  https://en.openpie.com/
On Fri, Jan 26, 2024 at 11:33:24AM +0800, Tender Wang wrote:
> How about using list *res replace int *res in getTokenTypes(), so we can
> use list_append_unique().
> In outer function, use list *tokens to replace int *tokens, and ntokens =
> list_length(tokens) not list_length(stmt->tokentype).

Yeah, I was wondering about this code path.  If you feel strongly
about that, would you like to write a patch?
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> wrote on Friday, January 26, 2024 at 11:41:
> Yeah, I was wondering about this code path.  If you feel strongly
> about that, would you like to write a patch?

OK, I will try to fix it.

--
Tender Wang
OpenPie:  https://en.openpie.com/


Tender Wang <tndrwang@gmail.com> wrote on Friday, January 26, 2024 at 12:54:
> OK, I will try to fix it.

As I said before, returning a List does not look complicated as a way to
solve this issue.
I also found another problem: no NOTICE is reported when the SQL has
IF EXISTS, for example:

postgres=# alter text search configuration ispell_tst drop mapping if exists for test;
ERROR:  token type "test" does not exist

So I changed the getTokenTypes() interface and moved the error reporting
from DropConfigurationMapping() into getTokenTypes().

--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
Hi Tender Wang,

I think your code changes LGTM. Thanks for your contribution.

But I met these errors when running git apply with your patch, so I made
some tiny format modifications.
Please refer to the patch, which is still authored by you.

```bash
$ git apply 0001-Fix-reporting-error-when-process-duplicate-token-typ.patch
0001-Fix-reporting-error-when-process-duplicate-token-typ.patch:38: indent with spaces.
    List       *tokennames = stmt->tokentype;
0001-Fix-reporting-error-when-process-duplicate-token-typ.patch:73: indent with spaces.
        {
0001-Fix-reporting-error-when-process-duplicate-token-typ.patch:74: indent with spaces.
            if (!stmt->missing_ok)
0001-Fix-reporting-error-when-process-duplicate-token-typ.patch:75: indent with spaces.
                ereport(ERROR,
0001-Fix-reporting-error-when-process-duplicate-token-typ.patch:76: indent with spaces.
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
warning: squelched 9 whitespace errors
warning: 14 lines add whitespace errors.
```

Regards

Yongtao Huang
Broadcom Greenplum

Tender Wang <tndrwang@gmail.com> wrote on Friday, January 26, 2024 at 14:21:
> So I changed the getTokenTypes() interface and moved the error reporting
> from DropConfigurationMapping() into getTokenTypes().
Attachment
On Sun, Jan 28, 2024 at 01:32:08AM +0800, Yongtao Huang wrote:
> Hi Tender Wang,
>
> I think your code changes *LGTM*. Thanks for your contribution.

Yes, incorrect indentation is something I have noticed in the previous
patches you have sent to the lists: all the new lines have an
incorrect indentation.  See here, where it is mentioned that the code
base uses 4 column tab spacing with tabs preserved:
https://www.postgresql.org/docs/devel/source-format.html

FWIW, I am OK with this for simple patches, but it is going to make
reviews harder when it comes to reading more complicated patches that
touch more code paths.

> Broadcom Greenplum

Glad to see you're still alive and around.
--
Michael

Attachment
On Fri, Jan 26, 2024 at 11:23:08AM +0800, Tender Wang wrote:
> Michael Paquier <michael@paquier.xyz> wrote on Friday, January 26, 2024 at 10:51:
>> For this job, I think user.c is just overengineered and could be
>> simplified to use a list of OIDs.  There is no need for ObjectAddress
>> or even ObjectAddresses: all of them are created with
>> AuthIdRelationId as class ID, and the second loop just fetches the
>> role IDs retrieved from the ObjectAddresses created in the first loop.
>> We don't use the individual ObjectAddress either.  On top of all that,
>> it makes the code a bit easier to follow.
>
> Agreed. +1

Thanks, I've applied this one down to v16 as of 50b797dc99ec.  v15 and
older do not use a two-pass logic in DropRole(), so one would fail with
a simpler "role does not exist", which is kind of correct in its own
way because the second role cannot be scanned in pg_authid after the
CCI of the first one, so I have let it be in older versions.  It's not
like somebody complained about that in the past, either, as far as I
recall.
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> wrote on Monday, January 29, 2024 at 07:18:
> On Sun, Jan 28, 2024 at 01:32:08AM +0800, Yongtao Huang wrote:
>> I think your code changes *LGTM*. Thanks for your contribution.

Thanks for reviewing this patch and fixing the indentation.

> Yes, incorrect indentation is something I have noticed in the previous
> patches you have sent to the lists: all the new lines have an
> incorrect indentation.  See here, where it is mentioned that the code
> base uses 4 column tab spacing with tabs preserved:
> https://www.postgresql.org/docs/devel/source-format.html

Sorry, my mistake.  Thanks for the reminder.  Yongtao has helped me fix
the indentation issue in his patch, and I don't have new code to add.

--
Tender Wang
OpenPie:  https://en.openpie.com/
On Mon, Jan 29, 2024 at 11:27:15AM +0800, Tender Wang wrote:
> Michael Paquier <michael@paquier.xyz> wrote on Monday, January 29, 2024 at 07:18:
>> Yes, incorrect indentation is something I have noticed in the previous
>> patches you have sent to the lists: all the new lines have an
>> incorrect indentation.  See here, where it is mentioned that the code
>> base uses 4 column tab spacing with tabs preserved:
>> https://www.postgresql.org/docs/devel/source-format.html
>>
>> FWIW, I am OK for simple patches, but this is going to make reviews
>> harder when it comes to read more complicated patches that touch more
>> code paths.
>>
>
> Sorry, my mistake.  Thanks for the reminder.  Yongtao has helped me
> fix the indentation issue in his patch, and I don't have new code to
> add.

The latest version posted on [1] was still not right.  I got a lot of
complaints from a simple `git diff --check`.  I am just looking at the
patch, no need to send something else.

[1]: https://www.postgresql.org/message-id/CAHewXNk5yeUo2XPaqTYSfbkKhEBfKd4VYa7NHv2MzV-mJ9ULpA@mail.gmail.com
--
Michael

Attachment
On Fri, Jan 26, 2024 at 02:21:12PM +0800, Tender Wang wrote:
> As I said before, returning a List does not look complicated as a way
> to solve this issue.
> I also found another problem: no NOTICE is reported when the SQL has
> IF EXISTS, for example:
>
> postgres=# alter text search configuration ispell_tst drop mapping if
> exists for test;
> ERROR:  token type "test" does not exist

It seems to me that there is a misunderstanding here, because whether
this query is correct depends on the parser used by the tsearch
configuration, and on the code.  See details below.

> So I change the func getTokenTypes() interface and remove
> DropConfigurationMapping() error report into getTokenTypes().

+            if (!stmt->missing_ok)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("token type \"%s\" does not exist",
+                                strVal(val))));
+            else
+            {
+                ereport(NOTICE,
+                        (errmsg("token type \"%s\" does not exist, skipping",
+                                strVal(val))));
+            }
[ ... ]
-        if (!found)
-        {
-            if (!stmt->missing_ok)
-            {
-                ereport(ERROR,
-                        (errcode(ERRCODE_UNDEFINED_OBJECT),
-                         errmsg("mapping for token type \"%s\" does not exist",
-                                strVal(val))));
-            }
-            else
-            {
-                ereport(NOTICE,
-                        (errmsg("mapping for token type \"%s\" does not exist, skipping",
-                                strVal(val))));
-            }
-        }

Your patch looks incorrect to me, and changes existing behaviors
because of the changes you have done in both getTokenTypes() *and*
DropConfigurationMapping(): the error handling you have removed can be
reached when attempting to drop a token that is included in the parser
but has *no* mapping.  For example, reusing the test of your patch,
this would include a bunch of tokens:
CREATE TEXT SEARCH CONFIGURATION dup_ispell_tst (COPY=english);
\dF+ dup_ispell_tst

On HEAD, attempting to drop a token that does not exist for the
configuration fails, because the defined token is not part of the
parser:
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
     DROP MAPPING IF EXISTS FOR not_here;
ERROR:  22023: token type "not_here" does not exist
But if a token supported by the parser has had its mapping dropped from
the configuration, we'd get this:
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
      DROP MAPPING FOR word;
ALTER TEXT SEARCH CONFIGURATION
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
     DROP MAPPING IF EXISTS FOR word;
NOTICE:  00000: mapping for token type "word" does not exist, skipping

With the patch and the same queries as previously, we now get that:
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
     DROP MAPPING IF EXISTS FOR not_here;
NOTICE:  00000: token type "not_here" does not exist, skipping
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
     DROP MAPPING IF EXISTS FOR word;
ALTER TEXT SEARCH CONFIGURATION
=# ALTER TEXT SEARCH CONFIGURATION dup_ispell_tst
     DROP MAPPING IF EXISTS FOR word;
ALTER TEXT SEARCH CONFIGURATION

This is incorrect for two reasons:
- In the first query, the token "not_here" is not part of the parser,
and this pattern *has to* be a hard ERROR even with IF EXISTS.
- The token "word" is part of the parser, and we finish by incorrectly
reporting that it gets dropped, missing the NOTICE from the IF EXISTS.

So the patch makes for a worse experience for the user, who then does
not know whether the mapping has been dropped, and no longer knows
whether the mapping was part of the parser used by the configuration.

The root of the problem is that we should fail for an unknown token
name, and skip the failure with IF EXISTS if the mapping does not
exist, while still knowing the name of the token to be able to report
it correctly, and while handling duplicates in getTokenTypes().  So
the patch should *not* change the error checks at all.

I think that we should tweak getTokenTypes() so that it returns *two*
Lists, one for the IDs and a second with the token names, then use
forboth() in DropConfigurationMapping() with the two lists and a
simple foreach() in MakeConfigurationMapping() when overriding the
mappings, while getTokenTypes() checks if a number is in the first
list before adding the name of a token to the second list.  Or we
could use a simple list with pointers to a local structure, though the
second list is only needed by DropConfigurationMapping().  That would
enforce a correct order of the token names and numbers, at least.
I would be tempted to just use one List with a structure (number,
token_name).  It makes the checks for duplicates O(N^2), but we will
never have hundreds of mapping entries in these DDL queries.

What's your pick?
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> wrote on Tuesday, January 30, 2024 at 14:28:
> It seems to me that there is a misunderstanding here, because whether
> this query is correct depends on the parser used by the tsearch
> configuration, and on the code.  See details below.

Yeah, you are right.

> This is incorrect for two reasons:
> - In the first query, the token "not_here" is not part of the parser,
> and this pattern *has to* be a hard ERROR even with IF EXISTS.
> - The token "word" is part of the parser, and we finish by incorrectly
> reporting that it gets dropped, missing the NOTICE from the IF EXISTS.

Thanks for your explanation.

> The root of the problem is that we should fail for an unknown token
> name, and skip the failure with IF EXISTS if the mapping does not
> exist, while still knowing the name of the token to be able to report
> it correctly, and while handling duplicates in getTokenTypes().  So
> the patch should *not* change the error checks at all.

Yeah, returning a list works for MakeConfigurationMapping(), but
DropConfigurationMapping() needs to know the non-existent token name,
so it does not work to just return an OID list from getTokenTypes().
While trying to fix the problem, I found that both getTokenTypes() and
DropConfigurationMapping() report an ERROR, which confused me at the
time.

I misunderstood the behavior in getTokenTypes() and in
DropConfigurationMapping(); I only realized the change could fix the
reported issue.

> I think that we should tweak getTokenTypes() so that it returns *two*
> Lists, one for the IDs and a second with the token names.
> I would be tempted to just use one List with a structure (number,
> token_name).

Hmm, I agree with you.

> What's your pick?

--
Tender Wang
OpenPie:  https://en.openpie.com/
On Tue, Jan 30, 2024 at 03:29:46PM +0800, Tender Wang wrote:
> Michael Paquier <michael@paquier.xyz> wrote on Tuesday, January 30, 2024 at 14:28:
> I misunderstood the behavior in getTokenTypes() and in
> DropConfigurationMapping(); I only realized the change could fix the
> reported issue.

Well, I don't think you should blame yourself, so no worries; we are
all here to learn :)

tsdicts.sql has little to no coverage of the various grammar flavors
we are discussing here, so we should try to close the hole with all
the expectations I've just guessed while analyzing the code and the
patch.  So it is as much a problem of fixing the root issue as of
expanding the regression tests with token names and IF EXISTS.

>> I think that we should tweak getTokenTypes() so as we return *two*
>> Lists, one for the IDs and a second with the token names, then use
>> forboth() in DropConfigurationMapping() with the two lists and a
>> simple foreach in MakeConfigurationMapping() when overriding the
>> mappings, while getTokenTypes() checks if a number is in the first
>> list before adding the name of a token in the second list.  Or we
>> could use a simple list with pointers to a local structure, but the
>> second list is only needed by DropConfigurationMapping().  That would
>> enforce a correct order of the token names and numbers, at least.
>> I would be tempted to just use one List with a structure (number,
>> token_name).  It makes the checks for duplicates O(N^2) but we will
>> never have hundreds of mapping entries in these DDL queries.
>>
>
> Hmm, I agree with you.

Perhaps you'd like to give it a second try?  Based on my suggestion,
you just need to wrap the tokens retrieved into a single structure
like that:
typedef struct
{
    int     num;
    char   *name;
} TSTokenTypes;

Then handle a List of these (it could as well be an array, with its
length returned as an output argument of getTokenTypes()), to keep a
non-duplicated list of tokens with a strict mapping between token name
and token number.
--
Michael

Attachment


Michael Paquier <michael@paquier.xyz> wrote on Tuesday, January 30, 2024 at 15:42:
> Perhaps you'd like to give it a second try?  Based on my suggestion,
> you just need to wrap the tokens retrieved into a single structure
> with a token number and a token name, then handle a List of these.

Thanks a lot.  I would like to give it a second try, and will send a
patch based on your suggestion.

--
Tender Wang
OpenPie:  https://en.openpie.com/
On Tue, Jan 30, 2024 at 04:02:06PM +0800, Tender Wang wrote:
> Thanks a lot. I want to give it a second try. I will send a patch based on
> your suggestion.

Sure, feel free.
--
Michael

Attachment
I refactored the previous patch based on your suggestion; please review
the new attached patch.

Michael Paquier <michael@paquier.xyz> wrote on Tuesday, January 30, 2024 at 16:15:
> On Tue, Jan 30, 2024 at 04:02:06PM +0800, Tender Wang wrote:
>> Thanks a lot.  I would like to give it a second try, and will send a
>> patch based on your suggestion.
>
> Sure, feel free.

--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
On Tue, Jan 30, 2024 at 07:51:45PM +0800, Tender Wang wrote:
> I refactor previes patch based on your suggestion, please review the new
> attached patch.

Thanks for the new patch.  I've found the logic to be basically OK,
minus edits with the structure, the routine names and some comments.

Unfortunately, the set of regression tests was too shy because this
only tested for the case of duplicated tokens when overriding
mappings, while missing:
- ADD MAPPING that failed on a constraint failure.
- DROP MAPPING that failed with a deletion failure.
- The case of IF EXISTS with tokens supported or not supported by a
configuration's parser.  That was not related to the report of this
patch, but as we've discussed it is very easy to miss the historical
distinction between the way this clause is handled.

In short, there was zero coverage for these code paths.  See
DropConfigurationMapping() that was completely red, for one:
https://coverage.postgresql.org/src/backend/commands/tsearchcmds.c.gcov.html

This should be marked as covered pretty soon.  I was also hesitating
about the addition of a test with REPLACE, actually..
--
Michael

Attachment
Thanks for your work on this issue, and thanks for pushing it, again.

Michael Paquier <michael@paquier.xyz> wrote on Wednesday, January 31, 2024 at 12:25:
> Thanks for the new patch.  I've found the logic to be basically OK,
> minus edits with the structure, the routine names and some comments.

--
Tender Wang
OpenPie:  https://en.openpie.com/