Re: pg_stat_activity crashes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_stat_activity crashes
Date
Msg-id CA+TgmobYX6+d695SE3-UZ1HL326at-9sr4JzQxBW3GAW4exjEA@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_activity crashes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_stat_activity crashes  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Apr 21, 2016 at 2:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 20, 2016 at 10:56 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> At Wed, 20 Apr 2016 15:14:16 +0200, Petr Jelinek <petr@2ndquadrant.com> wrote in <571780A8.4070902@2ndquadrant.com>
>>> I noticed sporadic segfaults when selecting from pg_stat_activity on
>>> current HEAD.
>>>
>>> The culprit is the 53be0b1add7064ca5db3cd884302dfc3268d884e commit
>>> which added more wait info into the pg_stat_get_activity(). More
>>> specifically, the following code is broken:
>>>
>>> +                       proc = BackendPidGetProc(beentry->st_procpid);
>>> + wait_event_type = pgstat_get_wait_event_type(proc->wait_event_info);
>>>
>>> This needs to check if proc is NULL. When reading the code I noticed
>>> that the new functions pg_stat_get_backend_wait_event_type() and
>>> pg_stat_get_backend_wait_event() suffer from the same problem.
>>
>> Good catch.
>
> Agreed.
>
>>> Here is PoC patch which fixes the problem. I am wondering if we should
>>> raise warning in the pg_stat_get_backend_wait_event_type() and
>>> pg_stat_get_backend_wait_event() like the pg_signal_backend() does
>>> when proc is NULL instead of just returning NULL which is what this
>>> patch does though.
>>
>> It still makes the two relevant columns in pg_stat_activity
>> inconsistent each other since it reads the procarray entry twice
>> without a lock on procarray.
>>
>> The attached patch adds pgstat_get_wait_event_info to read
>> wait_event_info in more appropriate way. Then change
>> pg_stat_get_wait_event(_type) to take the wait_event_info.
>>
>> Does this work for you?
>
> This is a hideously way of fixing this problem.  The whole point of
> storing the wait event in a 4-byte integer is that we already assume
> reads of 4 byte integers are atomic and thus no lock is needed.  The
> only thing we need to do is to prevent the value from being read
> twice, and we already have precedent for how to prevent that in
> freelist.c.

That was intended to say "a hideously expensive way".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg_stat_activity crashes
Next
From: Robert Haas
Date:
Subject: Re: max_parallel_degree > 0 for 9.6 beta