Thread: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot
BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15888 Logged by: Marko Tiikkaja Email address: marko@joh.to PostgreSQL version: 9.6.14 Operating system: OS X 10.12.6 Description: Run: pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start and pg_stat_activity will show something like this until that session disconnects: -[ RECORD 1 ]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- datid | 9672230 datname | dbname pid | 81996 usesysid | 10 usename | marko application_name | pg_recvlogical client_addr | client_hostname | client_port | -1 backend_start | 2019-07-03 10:02:00.459737+02 xact_start | query_start | state_change | 2019-07-03 10:02:00.726645+02 wait_event_type | wait_event | state | idle in transaction backend_xid | backend_xmin | query | which is quite obviously bogus.
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Michael Paquier
Date:
On Wed, Jul 03, 2019 at 08:03:24AM +0000, PG Bug reporting form wrote: > Run: > > pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start > > and pg_stat_activity will show something like this until that session > disconnects: > > which is quite obviously bogus. I think that's the transaction which is started when exporting the snapshot at logical slot creation. Or in short: #2 0x000055b5f8e5720f in StartTransactionCommand () at xact.c:2699 #3 0x000055b5f90fab4e in SnapBuildExportSnapshot (builder=0x55b5fa3427b0) at snapbuild.c:575 #4 0x000055b5f90fe5df in CreateReplicationSlot (cmd=0x55b5fa2914e8) at walsender.c:843 SnapBuildExportSnapshot() needs to keep a transaction context opened to keep the wanted xmin around. In PG >= 10, this problem gets "solved" by using NOEXPORT_SNAPSHOT in the CREATE_REPLICATION_SLOT command. One idea would be to enforce a commit in this case from CreateReplicationSlot/streamutil.c? We know that pg_recvlogical is not going to use this snapshot anyway... -- Michael
Attachment
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Andres Freund
Date:
Hi, On 2019-07-04 13:35:12 +0900, Michael Paquier wrote: > On Wed, Jul 03, 2019 at 08:03:24AM +0000, PG Bug reporting form wrote: > > Run: > > > > pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start > > > > and pg_stat_activity will show something like this until that session > > disconnects: > > > > which is quite obviously bogus. > > I think that's the transaction which is started when exporting the > snapshot at logical slot creation. Or in short: > #2 0x000055b5f8e5720f in StartTransactionCommand () at xact.c:2699 > #3 0x000055b5f90fab4e in SnapBuildExportSnapshot > (builder=0x55b5fa3427b0) at snapbuild.c:575 > #4 0x000055b5f90fe5df in CreateReplicationSlot (cmd=0x55b5fa2914e8) > at walsender.c:843 > > SnapBuildExportSnapshot() needs to keep a transaction context opened > to keep the wanted xmin around. In PG >= 10, this problem gets > "solved" by using NOEXPORT_SNAPSHOT in the CREATE_REPLICATION_SLOT > command. One idea would be to enforce a commit in this case from > CreateReplicationSlot/streamutil.c? We know that pg_recvlogical is > not going to use this snapshot anyway... The transaction ought to be finished by exec_replication_command() calling SnapBuildClearExportedSnapshot(), which then does AbortCurrentTransaction(). So something's wrong if that's not working anymore. I just tested this on a slightly older v12 checkout, and the general mechanism works: S1(replication): postgres[13464][1]=# CREATE_REPLICATION_SLOT foo LOGICAL test_decoding; S2(plain): postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid= 13464; ┌─────────────────────┬─────────────────┬────────────┬─────────────┬──────────────┐ │ state │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │ ├─────────────────────┼─────────────────┼────────────┼─────────────┼──────────────┤ │ idle in transaction │ Client │ ClientRead │ (null) │ 1494339463 │ └─────────────────────┴─────────────────┴────────────┴─────────────┴──────────────┘ (1 row) S1(replication): postgres[13464][1]=# IDENTIFY_SYSTEM; ┌─────────────────────┬──────────┬──────────────┬──────────┐ │ systemid │ timeline │ xlogpos │ dbname │ ├─────────────────────┼──────────┼──────────────┼──────────┤ │ 6704433050907079638 │ 1 │ 3AC/1A25B700 │ postgres │ └─────────────────────┴──────────┴──────────────┴──────────┘ S2(plain): postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid= 13464; ┌───────┬─────────────────┬────────────┬─────────────┬──────────────┐ │ state │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │ ├───────┼─────────────────┼────────────┼─────────────┼──────────────┤ │ idle │ Client │ ClientRead │ (null) │ (null) │ └───────┴─────────────────┴────────────┴─────────────┴──────────────┘ (1 row) I *can* reproduce with 9.6. It's worthwhile to note that the transaction *is* actually aborted, via the path above. As far as I can tell the problem is "solely" that we never reach a set_ps_display() that resets the 'idle in transaction' bit, because START_REPLICATION "takes over" the connection once started, and we don't go through PostgresMain() again. The problem doesn't exist in later versions, because there exec_replication_command() does an explicit /* Report to pgstat that this process is now idle */ pgstat_report_activity(STATE_IDLE, NULL); That's due to commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6 Author: Magnus Hagander <magnus@hagander.net> Date: 2017-12-29 16:19:51 +0100 Properly set base backup backends to active in pg_stat_activity which is post 9.6. Not sure I really feel a huge appetite for whacking this around in the back branches. Greetings, Andres Freund
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Alvaro Herrera
Date:
On 2019-Jul-09, Andres Freund wrote: > The problem doesn't exist in later versions, because there > exec_replication_command() does an explicit > /* Report to pgstat that this process is now idle */ > pgstat_report_activity(STATE_IDLE, NULL); > That's due to > > commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6 > Author: Magnus Hagander <magnus@hagander.net> > Date: 2017-12-29 16:19:51 +0100 > > Properly set base backup backends to active in pg_stat_activity > > which is post 9.6. Not sure I really feel a huge appetite for whacking > this around in the back branches. Hmm ... so what that commit did is precisely to fix this bug. Magnus thought at the time he was fixing a pg10 bug, https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@mail.gmail.com but apparently now we see that the bug was older than that. Maybe it's okay to backpatch further? (Booh to missing "Discussion:" tags in commit messages) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Michael Paquier
Date:
On Tue, Jul 09, 2019 at 05:46:30PM -0400, Alvaro Herrera wrote: > Hmm ... so what that commit did is precisely to fix this bug. > Magnus thought at the time he was fixing a pg10 bug, > https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@mail.gmail.com > but apparently now we see that the bug was older than that. Maybe it's > okay to backpatch further? Yes, I would not be against a back-patch in this case. There is a minor conflict because pre-9.6 WAL senders cannot run SQL queries but that looks simple enough to solve. > (Booh to missing "Discussion:" tags in commit messages) Booh++ -- Michael
Attachment
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Andres Freund
Date:
Hi, On 2019-07-10 13:51:47 +0900, Michael Paquier wrote: > On Tue, Jul 09, 2019 at 05:46:30PM -0400, Alvaro Herrera wrote: > > Hmm ... so what that commit did is precisely to fix this bug. > > Magnus thought at the time he was fixing a pg10 bug, > > https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@mail.gmail.com > > but apparently now we see that the bug was older than that. Maybe it's > > okay to backpatch further? > > Yes, I would not be against a back-patch in this case. There is a > minor conflict because pre-9.6 WAL senders cannot run SQL queries but > that looks simple enough to solve. I'm not worried about backpatching that in isolation - but I'm worried that just backpatching that indidividual commit isn't going to yield particularly satisfactory behaviour. The code around this isn't super robust, and pg10+ are noticably different due to the changes needed to allow to execute normal queries over a replication connection. Greetings, Andres Freund
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Magnus Hagander
Date:
On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-09, Andres Freund wrote:
> The problem doesn't exist in later versions, because there
> exec_replication_command() does an explicit
> /* Report to pgstat that this process is now idle */
> pgstat_report_activity(STATE_IDLE, NULL);
> That's due to
>
> commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6
> Author: Magnus Hagander <magnus@hagander.net>
> Date: 2017-12-29 16:19:51 +0100
>
> Properly set base backup backends to active in pg_stat_activity
>
> which is post 9.6. Not sure I really feel a huge appetite for whacking
> this around in the back branches.
Hmm ... so what that commit did is precisely to fix this bug.
Magnus thought at the time he was fixing a pg10 bug,
https://postgr.es/m/CABUevEwX4g8y=gmgfPzxFKS7gqjSYNR949Xc96OQm=YXJmh_Og@mail.gmail.com
but apparently now we see that the bug was older than that. Maybe it's
okay to backpatch further?
I don't recall any particular reason around the backpatching other than as you say, probably not realizing it. At the time we didn't have logical replication yet (it landed later), so the actual uses of the logical decoding were nowhere near as frequent...
(Booh to missing "Discussion:" tags in commit messages)
If only we could agree on a template/standard for those :P
//Magnus
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Michael Paquier
Date:
On Wed, Jul 10, 2019 at 05:24:44PM +0200, Magnus Hagander wrote: > On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <alvherre@2ndquadrant.com> > wrote: > (Booh to missing "Discussion:" tags in commit messages) >> >> > If only we could agree on a template/standard for those :P This would make an excellent topic for a developper meeting with most of the main committers around the table. -- Michael
Attachment
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Magnus Hagander
Date:
On Thu, Jul 11, 2019 at 1:58 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 10, 2019 at 05:24:44PM +0200, Magnus Hagander wrote:
> On Tue, Jul 9, 2019 at 11:46 PM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> (Booh to missing "Discussion:" tags in commit messages)
>>
>>
> If only we could agree on a template/standard for those :P
This would make an excellent topic for a developper meeting with most
of the main committers around the table.
We've tried that once, and I think even twice, and failed miserably at any form of agreement. But we could certainly try again :)
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Michael Paquier
Date:
On Thu, Jul 11, 2019 at 01:21:39PM +0200, Magnus Hagander wrote: > We've tried that once, and I think even twice, and failed miserably at any > form of agreement. But we could certainly try again :) Or we could just drop a thread about that.. -- Michael
Attachment
Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot
From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes: > We've tried that once, and I think even twice, and failed miserably at any > form of agreement. But we could certainly try again :) I thought we *did* have an agreement, to wit using Discussion: https://postgr.es/m/<message-id> to link to relevant mail thread(s). Some people use more tags but that seems inessential to me. regards, tom lane
Re: BUG #15888: Bogus "idle in transaction" state for logicaldecoding client after creating a slot
From
Michael Paquier
Date:
On Thu, Jul 11, 2019 at 09:44:07AM -0400, Tom Lane wrote: > I thought we *did* have an agreement, to wit using > > Discussion: https://postgr.es/m/<message-id> > > to link to relevant mail thread(s). Some people use more tags > but that seems inessential to me. Hehe. I actually was thinking about advocating for having more of them in the commit logs. I'll just start a new thread about what I had in mind. Perhaps that will lead us nowhere, but let's see. -- Michael