Re: RFC: replace pg_stat_activity.waiting with something more descriptive - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Msg-id CA+TgmoZe6zZ6USgfmQsdAtB9zwiQEnPseNBd=h9TN0CbzyH71g@mail.gmail.com
In response to Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Vladimir Borodin <root@simply.name>)
Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Sep 16, 2015 at 12:29 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> Yes, the major question is cost. But I think we should validate our thoughts
> by experiments, given that there are several possible synchronization protocols.
> Ildus posted an implementation of the double-buffering approach that showed
> quite low cost.

I'm not sure exactly which email you are referring to, but I don't
believe that anyone has done experiments that are anywhere near
comprehensive enough to convince ourselves that this won't be a
problem.  If a particular benchmark doesn't show an issue, that can
just mean that the benchmark isn't hitting the case where there is a
problem.  For example, EDB has had customers who have severe
contention apparently on the buffer content lwlocks, resulting in big
slowdowns.  You don't see that in, say, a pgbench run.  But for people
who have certain kinds of queries, it's really bad.  Those sorts of
loads, where the lwlock system really gets stressed, are cases where
adding overhead seems likely to pinch.

> Yes, but some competing products provide comprehensive waits monitoring
> too. That makes me think it should be possible for us as well.

I agree, but keep in mind that some of those products may use
techniques to reduce the overhead that we don't have available.  I
have a strong suspicion that one of those products in particular has
done something clever to make measuring the time cheap on all
platforms.  Whatever that clever thing is, we haven't done it.  So
that matters.
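
For what it's worth, a rough way to gauge how expensive timing is on a given
platform is a standalone micro-benchmark along these lines; this is purely
illustrative, not anything from PostgreSQL, and the clock and iteration count
are arbitrary:

/* Estimate the per-call cost of reading the clock. */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    const long  iters = 10000000;
    long        i;
    double      elapsed_ns;
    struct timespec start, end, tmp;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
        (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per clock_gettime() call\n", elapsed_ns / iters);
    return 0;
}

Numbers like that vary a lot between clock sources and virtualization setups,
which is part of why the per-platform question matters.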

> I think the reason for hooks could be not only disagreements about design,
> but platform-dependent issues too.
> The next step after we have a view of current wait events will be gathering
> some statistics about them. We can contrast at least two approaches here:
> 1) Periodic sampling of the current wait events.
> 2) Measuring the duration of each wait event. We could collect statistics
> locally for a short period and update a shared memory structure periodically
> (using some synchronization protocol).
>
> In the previous attempt to gather lwlock statistics, you predicted that
> sampling could have significant overhead [1]. In contrast, on many systems
> time measurements are cheap. We have implemented both approaches, and our
> tests show that sampling every 1 millisecond produces higher overhead than
> measuring the duration of each individual wait event. We can share another
> version of waits monitoring based on sampling to make these results
> reproducible for everybody. However, cheap time measurements are not
> available on every platform. For instance, ISTM that on Windows time
> measurements are too expensive [2].
>
> That makes me think that we need a pluggable solution, at least for
> statistics: direct measurement of event durations for the majority of
> systems, and sampling for the others as the lesser harm.

To me, those seem like arguments for making it configurable, but not
necessarily for having hooks.
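
Just to make the shape of the second approach quoted above concrete, here is a
rough, standalone sketch (all names and constants below are invented for
illustration; this is not the posted patch): each backend accumulates wait
counts and durations in purely local memory, and copies the totals into a
shared structure at most once per interval, so the hot path never touches
shared memory.

/* Illustrative sketch only: per-backend wait timing accumulated locally
 * and published to a "shared" copy at most once per interval. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define NUM_WAIT_EVENTS     8               /* arbitrary for the example */
#define FLUSH_INTERVAL_NS   (1000 * 1000)   /* publish at most once per 1 ms */

typedef struct WaitStats
{
    uint64_t    count[NUM_WAIT_EVENTS];
    uint64_t    total_ns[NUM_WAIT_EVENTS];
} WaitStats;

static WaitStats local_stats;   /* backend-local, no locking needed */
static WaitStats shared_stats;  /* stand-in for a shared memory structure */
static uint64_t last_flush_ns;

static uint64_t
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Called once per wait; touches only backend-local memory on the hot path. */
static void
record_wait(int event, uint64_t start_ns, uint64_t end_ns)
{
    local_stats.count[event]++;
    local_stats.total_ns[event] += end_ns - start_ns;

    /* Publish the accumulated totals at most once per interval. */
    if (end_ns - last_flush_ns >= FLUSH_INTERVAL_NS)
    {
        memcpy(&shared_stats, &local_stats, sizeof(WaitStats));
        last_flush_ns = end_ns;
    }
}

int
main(void)
{
    int         i;
    uint64_t    t0;
    uint64_t    t1;

    last_flush_ns = now_ns();
    for (i = 0; i < 100000; i++)
    {
        t0 = now_ns();
        /* ... the actual wait (lwlock, lock, I/O, ...) would happen here ... */
        t1 = now_ns();
        record_wait(i % NUM_WAIT_EVENTS, t0, t1);
    }
    printf("event 0: %llu waits, %llu ns total\n",
           (unsigned long long) shared_stats.count[0],
           (unsigned long long) shared_stats.total_ns[0]);
    return 0;
}

A real implementation would still need a synchronization protocol for readers
of the shared copy, which is where the double buffering comes in; the point of
the sketch is only that the per-wait cost is two clock reads plus some local
arithmetic.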

>> I think it's reasonable to consider reporting this data in the PGPROC
>> using a 4-byte integer rather than reporting it through a single byte
>> in the backend status structure.  I believe that addresses the
>> concerns about reporting from auxiliary processes, and it also allows
>> a little more data to be reported.  For anything in excess of that, I
>> think we should think rather harder.  Most likely, such additional
>> detail should be reported only for certain types of wait events, or on
>> a delay, or something like that, so that the core mechanism remains
>> really, really fast.
>
> That sounds reasonable. There are many pending questions, but it seems like
> a step forward to me.

Great, let's do it.  I think we should probably do the work to
separate the non-individual lwlocks into tranches first, though.
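
To illustrate what a 4-byte value buys us (the class codes and the layout
below are made up for the example, not a concrete proposal): the high bits can
carry the wait-event class and the low bits the individual event, so
advertising a wait is a single 32-bit store into a PGPROC field and readers
can decode both parts.

/* Illustrative layout for a 32-bit wait-event value; all names invented. */
#include <stdio.h>
#include <stdint.h>

#define WAIT_EVENT_CLASS_SHIFT  24
#define WAIT_EVENT_ID_MASK      0x00FFFFFFu

/* Invented class codes, purely for the example. */
#define WAIT_CLASS_NONE         0
#define WAIT_CLASS_LWLOCK       1
#define WAIT_CLASS_LOCK         2
#define WAIT_CLASS_BUFFER_PIN   3

static inline uint32_t
make_wait_event(uint32_t classId, uint32_t eventId)
{
    return (classId << WAIT_EVENT_CLASS_SHIFT) | (eventId & WAIT_EVENT_ID_MASK);
}

int
main(void)
{
    /* e.g. "waiting on lwlock in tranche 17", as one 4-byte value */
    uint32_t    wait_event_info = make_wait_event(WAIT_CLASS_LWLOCK, 17);

    printf("class = %u, event = %u\n",
           wait_event_info >> WAIT_EVENT_CLASS_SHIFT,
           wait_event_info & WAIT_EVENT_ID_MASK);
    return 0;
}

Whether readers need any locking around that value is a separate question; the
point is just that the fast path on the reporting side is a single store.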

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


