RE: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From | tanghy.fnst@fujitsu.com |
---|---|
Subject | RE: [HACKERS] logical decoding of two-phase transactions |
Date | |
Msg-id | OS0PR01MB611332752AFF6ADF580AB849FB259@OS0PR01MB6113.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [HACKERS] logical decoding of two-phase transactions (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: [HACKERS] logical decoding of two-phase transactions
|
List | pgsql-hackers |
> > 13. > > @@ -507,7 +558,16 @@ CreateSubscription(CreateSubscriptionStmt *stmt, > > bool isTopLevel) > > { > > Assert(slotname); > > > > - walrcv_create_slot(wrconn, slotname, false, > > + /* > > + * Even if two_phase is set, don't create the slot with > > + * two-phase enabled. Will enable it once all the tables are > > + * synced and ready. This avoids race-conditions like prepared > > + * transactions being skipped due to changes not being applied > > + * due to checks in should_apply_changes_for_rel() when > > + * tablesync for the corresponding tables are in progress. See > > + * comments atop worker.c. > > + */ > > + walrcv_create_slot(wrconn, slotname, false, false, > > > > Can't we enable two_phase if copy_data is false? Because in that case, > > all relations will be in a READY state. If we do that then we should > > also set two_phase state as 'enabled' during createsubscription. I > > think we need to be careful to check that connect option is given and > > copy_data is false before setting such a state. Now, I guess we may > > not be able to optimize this to not set 'enabled' state when the > > subscription has no rels. > > > > Fixed in v77-0001 I noticed this modification in v77-0001 and executed "CREATE SUBSCRIPTION ... WITH (two_phase = on, copy_data = false)",but it crashed. ------------- postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub WITH(two_phase = on, copy_data = false); WARNING: relcache reference leak: relation "pg_subscription" not closed WARNING: snapshot 0x34278d0 still active NOTICE: created replication slot "sub" on publisher server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !?> ------------- There are two warnings and a segmentation fault in subscriber log: ------------- 2021-05-24 15:08:32.435 CST [2848572] WARNING: relcache reference leak: relation "pg_subscription" not closed 2021-05-24 15:08:32.435 CST [2848572] WARNING: snapshot 0x32ce8b0 still active 2021-05-24 15:08:33.012 CST [2848555] LOG: server process (PID 2848572) was terminated by signal 11: Segmentation fault 2021-05-24 15:08:33.012 CST [2848555] DETAIL: Failed process was running: CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres'PUBLICATION pub WITH(two_phase = on, copy_data = false); ------------- The backtrace about segmentation fault is attached. It happened in table_close function, we got it because "CurrentResourceOwner"was NULL. I think it was related with the first warning, which reported "relcache reference leak". The backtrace information is attached,too. When updating two-phase state in CreateSubscription function, it released resource owner and set "CurrentResourceOwner"as NULL in CommitTransaction function. The second warning about "snapshot still active" was also happened in CommitTransaction function. It called AtEOXact_Snapshotfunction, checked leftover snapshots and reported the warning. I debugged and found the snapshot was added in function PortalRunUtility by calling PushActiveSnapshot function, the addressof "ActiveSnapshot" at this time was same as the address in warning. In summary, when creating subscription with two_phase = on and copy_data = false, it calls UpdateTwoPhaseState function inCreateSubscription function to set two_phase state as 'enabled', and it checked and released relcache and snapshot tooearly so the NG happened. I think some change should be made to avoid it. Thought? FYI I also tested the new released V78* at [1], the above NG still exists. [1] https://www.postgresql.org/message-id/CAFPTHDab56twVmC%2B0a%3DRNcRw4KuyFdqzW0JAcvJdS63n_fRnOQ%40mail.gmail.com Regards Tang
Attachment
pgsql-hackers by date: