Thread: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

[HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

From
Konstantin Knizhnik
Date:
When creating logical replication slots we quite often get the following 
error:

ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin 
already is valid

which cause restart of WAL sender.
The comment to this line doesn't clarify much:
    /* so we don't overwrite the existing value */    if (TransactionIdIsValid(MyPgXact->xmin))        elog(ERROR,
"cannotbuild an initial slot snapshot when 
 
MyPgXact->xmin already is valid");


I have checked that MyPgXact->xmin is set by GetSnapshotData:

#3  0x000000000086025e in GetSnapshotData (snapshot=0xf28040 
<CatalogSnapshotData>) at procarray.c:1714
#4  0x0000000000a48523 in GetNonHistoricCatalogSnapshot (relid=2615) at 
snapmgr.c:479
#5  0x0000000000a484d3 in GetCatalogSnapshot (relid=2615) at snapmgr.c:452
#6  0x00000000004f15bf in systable_beginscan (heapRelation=0x256bdb0, 
indexId=2684, indexOK=1 '\001', snapshot=0x0, nkeys=1, 
key=0x7ffdf633c770) at genam.c:353
#7  0x00000000009d676e in SearchCatCache (cache=0x25230a8, v1=39586984, 
v2=0, v3=0, v4=0) at catcache.c:1169
#8  0x00000000009ec13d in SearchSysCache (cacheId=35, key1=39586984, 
key2=0, key3=0, key4=0) at syscache.c:1109
#9  0x00000000009ec259 in GetSysCacheOid (cacheId=35, key1=39586984, 
key2=0, key3=0, key4=0) at syscache.c:1187
#10 0x0000000000574b85 in get_namespace_oid (nspname=0x25c0ca8 
"pg_catalog", missing_ok=1 '\001') at namespace.c:3009
#11 0x0000000000574886 in LookupExplicitNamespace (nspname=0x25c0ca8 
"pg_catalog", missing_ok=1 '\001') at namespace.c:2871
#12 0x000000000057437d in get_ts_config_oid (names=0x25c2438, 
missing_ok=1 '\001') at namespace.c:2653
#13 0x00000000009f56ca in check_TSCurrentConfig (newval=0x7ffdf633cb90, 
extra=0x7ffdf633cba0, source=PGC_S_FILE) at ts_cache.c:600
#14 0x0000000000a1bbdc in call_string_check_hook (conf=0xf26870 
<ConfigureNamesString+6992>, newval=0x7ffdf633cb90, 
extra=0x7ffdf633cba0, source=PGC_S_FILE, elevel=12) at guc.c:9912
#15 0x0000000000a14420 in parse_and_validate_value (record=0xf26870 
<ConfigureNamesString+6992>, name=0x25c0620 
"default_text_search_config", value=0x25c0658 "pg_catalog.english", 
source=PGC_S_FILE, elevel=12,    newval=0x7ffdf633cb90, newextra=0x7ffdf633cba0) at guc.c:5840
#16 0x0000000000a15543 in set_config_option (name=0x25c0620 
"default_text_search_config", value=0x25c0658 "pg_catalog.english", 
context=PGC_SIGHUP, source=PGC_S_FILE, action=GUC_ACTION_SET, 
changeVal=1 '\001', elevel=12,    is_reload=0 '\000') at guc.c:6442
#17 0x0000000000a1eb5c in ProcessConfigFileInternal (context=PGC_SIGHUP, 
applySettings=1 '\001', elevel=13) at guc-file.l:439
#18 0x0000000000a1e4ee in ProcessConfigFile (context=PGC_SIGHUP) at 
guc-file.l:155
#19 0x000000000082433c in WalSndWaitForWal (loc=25991904) at 
walsender.c:1309
#20 0x0000000000823423 in logical_read_xlog_page (state=0x25a4f10, 
targetPagePtr=25985024, reqLen=6880, targetRecPtr=25991880, 
cur_page=0x25a6c60 "\227\320\005", pageTLI=0x25a57bc) at walsender.c:761
#21 0x0000000000558c3d in ReadPageInternal (state=0x25a4f10, 
pageptr=25985024, reqLen=6880) at xlogreader.c:556
#22 0x0000000000558405 in XLogReadRecord (state=0x25a4f10, 
RecPtr=25991880, errormsg=0x7ffdf633cea8) at xlogreader.c:255
#23 0x000000000080dda6 in DecodingContextFindStartpoint (ctx=0x25a4c50) 
at logical.c:450
#24 0x0000000000823a0c in CreateReplicationSlot (cmd=0x24dc398) at 
walsender.c:934
#25 0x00000000008247e4 in exec_replication_command (cmd_string=0x254e8f0 
"CREATE_REPLICATION_SLOT \"shardman_copy_t_10_3_4_17307_sync_17302\" 
TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1515
#26 0x000000000088eccc in PostgresMain (argc=1, argv=0x24f0b28, 
dbname=0x24f0948 "postgres", username=0x24bf7f0 "knizhnik") at 
postgres.c:4086
#27 0x00000000007ef9e2 in BackendRun (port=0x24dee00) at postmaster.c:4357
#28 0x00000000007ef10c in BackendStartup (port=0x24dee00) at 
postmaster.c:4029
#29 0x00000000007eb6cc in ServerLoop () at postmaster.c:1753
#30 0x00000000007eacb8 in PostmasterMain (argc=3, argv=0x24bd660) at 
postmaster.c:1361
#31 0x0000000000728593 in main (argc=3, argv=0x24bd660) at main.c:228


I wonder if it is normal situation or something goes wrong?





-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with logical replication: MyPgXact->xmin alreadyis valid

From
Petr Jelinek
Date:
On 06/10/17 12:16, Konstantin Knizhnik wrote:
> When creating logical replication slots we quite often get the following
> error:
> 
> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
> already is valid
> 
> which cause restart of WAL sender.
> The comment to this line doesn't clarify much:
> 
>     /* so we don't overwrite the existing value */
>     if (TransactionIdIsValid(MyPgXact->xmin))
>         elog(ERROR, "cannot build an initial slot snapshot when
> MyPgXact->xmin already is valid");
>>
> I wonder if it is normal situation or something goes wrong?
> 

Hi,

no it's not normal situation, it seems you are doing something that
assigns xid before you run the CREATE_REPLICATION_SLOT command on that
connection.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with logical replication: MyPgXact->xmin alreadyis valid

From
Konstantin Knizhnik
Date:

On 06.10.2017 15:29, Petr Jelinek wrote:
> On 06/10/17 12:16, Konstantin Knizhnik wrote:
>> When creating logical replication slots we quite often get the following
>> error:
>>
>> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
>> already is valid
>>
>> which cause restart of WAL sender.
>> The comment to this line doesn't clarify much:
>>
>>      /* so we don't overwrite the existing value */
>>      if (TransactionIdIsValid(MyPgXact->xmin))
>>          elog(ERROR, "cannot build an initial slot snapshot when
>> MyPgXact->xmin already is valid");
>> I wonder if it is normal situation or something goes wrong?
>>
> Hi,
>
> no it's not normal situation, it seems you are doing something that
> assigns xid before you run the CREATE_REPLICATION_SLOT command on that
> connection.

I have not doing something in this connection: it is wal sender 
executing CREATE_REPLICATION_SLOT replication command.
Please look at the stack in my original e-mail. It shows who and when is 
setting MyPgXact->xmin.
It is GetSnapshotData called because of reloading configuration settings.
And configuration setting are reloaded because our application is 
updating "synchronous_standby_names".


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with logical replication: MyPgXact->xmin alreadyis valid

From
Petr Jelinek
Date:
On 06/10/17 16:46, Konstantin Knizhnik wrote:
> 
> 
> On 06.10.2017 15:29, Petr Jelinek wrote:
>> On 06/10/17 12:16, Konstantin Knizhnik wrote:
>>> When creating logical replication slots we quite often get the following
>>> error:
>>>
>>> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
>>> already is valid
>>>
>>> which cause restart of WAL sender.
>>> The comment to this line doesn't clarify much:
>>>
>>>      /* so we don't overwrite the existing value */
>>>      if (TransactionIdIsValid(MyPgXact->xmin))
>>>          elog(ERROR, "cannot build an initial slot snapshot when
>>> MyPgXact->xmin already is valid");
>>> I wonder if it is normal situation or something goes wrong?
>>>
>> Hi,
>>
>> no it's not normal situation, it seems you are doing something that
>> assigns xid before you run the CREATE_REPLICATION_SLOT command on that
>> connection.
> 
> I have not doing something in this connection: it is wal sender
> executing CREATE_REPLICATION_SLOT replication command.
> Please look at the stack in my original e-mail. It shows who and when is
> setting MyPgXact->xmin.
> It is GetSnapshotData called because of reloading configuration settings.
> And configuration setting are reloaded because our application is
> updating "synchronous_standby_names".
> 
> 

Hmm so you start transaction (you have to when running
CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot
is being created the config is reloaded. And since now you are in
transaction the tsearch hook for GUC processing tries to access catalogs
which sets the xmin for the transaction.

That's not good, but I can't really say I have idea about what to do
with it other than to set some kind of flag saying that logical decoding
snapshot is being built and using that to skip catalog access which does
not seem very pretty.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with logical replication: MyPgXact->xmin alreadyis valid

From
Konstantin Knizhnik
Date:
On 10/07/2017 04:26 PM, Petr Jelinek wrote:
>
> Hmm so you start transaction (you have to when running
> CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot
> is being created the config is reloaded. And since now you are in
> transaction the tsearch hook for GUC processing tries to access catalogs
> which sets the xmin for the transaction.

Actually this slot is implicitly created by LogicalRepSyncTableStart to perform initial data sync.

>
> That's not good, but I can't really say I have idea about what to do
> with it other than to set some kind of flag saying that logical decoding
> snapshot is being built and using that to skip catalog access which does
> not seem very pretty.
>
It is not quite clear from the comment why it is not possible to overwrite MyPgXact->xmin or just use existed value.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issue with logical replication: MyPgXact->xmin alreadyis valid

From
Petr Jelinek
Date:
On 07/10/17 18:23, Konstantin Knizhnik wrote:
> On 10/07/2017 04:26 PM, Petr Jelinek wrote:
>>
>> Hmm so you start transaction (you have to when running
>> CREATE_REPLICATION_SLOT with USE_SNAPSHOT parameter). And while the slot
>> is being created the config is reloaded. And since now you are in
>> transaction the tsearch hook for GUC processing tries to access catalogs
>> which sets the xmin for the transaction.
> 
> Actually this slot is implicitly created by LogicalRepSyncTableStart to
> perform initial data sync.
> 

That does not change what I said above.

>>
>> That's not good, but I can't really say I have idea about what to do
>> with it other than to set some kind of flag saying that logical decoding
>> snapshot is being built and using that to skip catalog access which does
>> not seem very pretty.
>>
> It is not quite clear from the comment why it is not possible to
> overwrite MyPgXact->xmin or just use existed value.
> 

We can't use existing value because the snapshot used to read data would
no longer correspond to the one we just built as consistent start point.

About overwriting, well we may be able to do that if there was no active
snapshot in the transaction, ie we could try doing something like
SnapshotResetXmin() or possibly rather InvalidateCatalogSnapshot() since
the xmin was set for reading catalog snapshot before checking the
MyPgXact->xmin.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers