Thread: Background Processes and reporting

Background Processes and reporting

From
Andres Freund
Date:
Hi,

We now have "Provide much better wait information in pg_stat_activity"
and "Add a generic command progress reporting facility" making it easier
to provide insight into the system.


While working on the writeback control / checkpoint sorting patch I had
the following statement in BufferSync()'s main loop:

   fprintf(stderr, "\33[2K\rto_scan: %d, scanned: %d, %%processed: %.2f, %%writeouts: %.2f",
           num_to_scan, num_processed,
           (((double) num_processed) / num_to_scan) * 100,
           ((double) num_written / num_processed) * 100);
 

which basically printed the progress of a checkpoint, and some
additional detail to stderr. Quite helpful to see whether progress is
"unsteady".

Obviously that's not something that could be committed.

So I'm wondering how we can make it possible to use the aforementioned
"progress reporting facility" to monitor checkpoint progress. To which
Robert replied on IM:
"it wouldn't quite help with that because the checkpointer doesn't show
up as a regular backend"


It seems rather worthwhile to think about how we can expand the coverage
of progress tracking to other types of background processes.
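
For illustration, if the committed progress facility could be used by the
checkpointer, BufferSync()'s loop might report roughly the same numbers as
the fprintf() above through pgstat_progress_update_param(); the
PROGRESS_CHECKPOINT_* parameter indexes below are hypothetical:

    /* hypothetical indexes; the update call is the committed facility's API */
    pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_TOTAL, num_to_scan);
    pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_SCANNED, num_processed);
    pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_WRITTEN, num_written);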


Similarly for the wait event stuff - checkpointer, wal writer,
background writer are in many cases processes that very often are
blocked on locks, IO and such.  Thus restricting the facility to
database connected processes seems like a loss.


Greetings,

Andres Freund



Re: Background Processes and reporting

From
Vladimir Borodin
Date:

On 11 Mar 2016, at 22:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

We now have "Provide much better wait information in pg_stat_activity"
and "Add a generic command progress reporting facility" making it easier
to provide insight into the system.


While working on the writeback control / checkpoint sorting patch I had
the following statement in BufferSync()'s main loop:

   fprintf(stderr, "\33[2K\rto_scan: %d, scanned: %d, %%processed: %.2f, %%writeouts: %.2f",
           num_to_scan, num_processed,
           (((double) num_processed) / num_to_scan) * 100,
           ((double) num_written / num_processed) * 100);

which basically printed the progress of a checkpoint, and some
additional detail to stderr. Quite helpful to see whether progress is
"unsteady".

Obviously that's not something that could be committed.

So I'm wondering how we can make it possible to use the aforementioned
"progress reporting facility" to monitor checkpoint progress. To which
Robert replied on IM:
"it wouldn't quite help with that because the checkpointer doesn't show
up as a regular backend"


It seems rather worthwhile to think about how we can expand the coverage
of progress tracking to other types of background processes.


Similarly for the wait event stuff - checkpointer, wal writer,
background writer are in many cases processes that very often are
blocked on locks, IO and such.  Thus restricting the facility to
database connected processes seems like a loss.

It was stated many times in threads about waits monitoring [0, 1, 2] and supported by different people, but ultimately waits information was stored in PgBackendStatus. Can’t we think one more time about the implementation provided by Ildus and Alexander here [3]? That implementation included 1. waits monitoring for all processes, 2. the ability to trace waits of a particular process to a file, 3. wait events history with sampling every N ms and 4. configurable measurement of wait timings. It has many more features, has been used in production on 100+ databases (patched 9.4) and gives wide opportunities for further development. Of course, huge work would be needed to rebase it onto current master and clean it up, but IMHO it is a much better approach. It seems that the current implementation doesn’t give reasonable ways to implement all those features, and that is really sad.




Greetings,

Andres Freund




--
May the force be with you…

Re: Background Processes and reporting

From
Andres Freund
Date:
On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> It was many times stated in threads about waits monitoring [0, 1, 2]
> and supported by different people, but ultimately waits information
> was stored in PgBackendStatus.

Only that it isn't. It's stored in PGPROC.  This criticism is true of
the progress reporting patch, but a quick scan of the thread doesn't
show authors of the wait events patch participating there.


> Can’t we think one more time about implementation provided by Ildus
> and Alexander here [3]?

I don't think so. Afaics the proposed patch tried to do too many things
at once, and its authors didn't listen well to criticism.  Trying to go
back to that seems like a surefire way to have nothing in 9.6.


> Seems that current implementation doesn’t give reasonable ways to
> implement all that features and it is really sad.

Why is that?


Andres Freund



Re: Background Processes and reporting

From
Andres Freund
Date:
Hi,

On 2016-03-11 11:16:32 -0800, Andres Freund wrote:
> It seems rather worthwhile to think about how we can expand the coverage
> of progress tracking to other types of background processes.

WRT the progress reporting patch, I think we should split (as afaics was
discussed in the thread for a while) off the new part of PgBackendStatus
into its own structure.

That'd not just allow using this from non-backend processes, but would
also have the advantage that the normal PgBackendStatus' changecount
doesn't advance quite so rapidly. E.g. when reporting progress of a
vacuum, the changecount will probably change at quite a rapid rate, but
that's uninteresting for e.g. pg_stat_activity.
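
For illustration, such a split could look roughly like the struct below; all
names here are invented, and the committed code currently keeps the
st_progress_* fields directly inside PgBackendStatus:

    /* one slot per process (including auxiliary processes), in shared memory */
    typedef struct PgProcessProgress
    {
        int         pp_changecount;   /* its own seqlock-style counter */
        int         pp_command;       /* which command/operation is running */
        Oid         pp_target_relid;  /* target relation, if any */
        int64       pp_param[10];     /* command-specific counters */
    } PgProcessProgress;

That way rapid progress updates would only bump pp_changecount, not the
changecount that ordinary pg_stat_activity readers care about.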


> Similarly for the wait event stuff - checkpointer, wal writer,
> background writer are in many cases processes that very often are
> blocked on locks, IO and such.  Thus restricting the facility to
> database connected processes seems like a loss.

I think one way to address this would be to not only report
PgBackendStatus type processes in pg_stat_activity. While that'd
obviously be a compatibility break, I think it'd be an improvement.

Regards,

Andres



Re: Background Processes and reporting

From
Vladimir Borodin
Date:

On 12 Mar 2016, at 0:22, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
It was many times stated in threads about waits monitoring [0, 1, 2]
and supported by different people, but ultimately waits information
was stored in PgBackendStatus.

Only that it isn't. It's stored in PGPROC.  

Sorry, I missed that. So monitoring of wait events for auxiliary processes could still be implemented?

This criticism is true of
the progress reporting patch, but a quick scan of the thread doesn't
show authors of the wait events patch participating there.


Can’t we think one more time about implementation provided by Ildus
and Alexander here [3]?

I don't think so. Afaics the proposed patch tried to do too many things
at once, and its authors didn't listen well to criticism.  Trying to go
back to that seems like a surefire way to have nothing in 9.6.

The idea is not to try to implement all of that at once (let alone in 9.6) but to make it possible to implement all those features eventually. If that is still possible, great.



Seems that current implementation doesn’t give reasonable ways to
implement all that features and it is really sad.

Why is that?

Storing information about a wait event in 4 bytes makes it possible to store only the wait type and event. There is no way to store duration or extra information (e.g. the buffer number for I/O events or for buffer manager LWLocks). Maybe I’m missing something...
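
For reference, the committed 4-byte value packs a wait class and an event
identifier roughly as below (the macro and variable names here are
approximations, not the exact committed ones), which is why nothing else fits:

    #define WAIT_CLASS_LWLOCK   0x01000000U     /* class lives in the high byte */
    #define WAIT_CLASS_LOCK     0x03000000U

    /* class | event id; no bits left for a buffer number, lock mode or duration */
    uint32      wait_event_info = WAIT_CLASS_LWLOCK | (uint32) tranche_id;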



Andres Freund


--
May the force be with you…

Re: Background Processes and reporting

From
Andres Freund
Date:
On 2016-03-12 01:05:43 +0300, Vladimir Borodin wrote:
> > On 12 Mar 2016, at 0:22, Andres Freund <andres@anarazel.de> wrote:
> > Only that it isn't. It's stored in PGPROC.  
> 
> Sorry, I missed that. So monitoring of wait events for auxiliary processes still could be implemented?

It's basically a question of where to report the information.

> >> Seems that current implementation doesn’t give reasonable ways to
> >> implement all that features and it is really sad.
> > 
> > Why is that?
> 
> Storing information about wait event in 4 bytes gives an ability to
> store only wait type and event. No way to store duration or extra
> information (i.e. buffer number for I/O events or buffer manager
> LWLocks). Maybe I’m missing something...

Sure, but that's just incrementally building features?


Greetings,

Andres Freund



Re: Background Processes and reporting

From
Alexander Korotkov
Date:
On Sat, Mar 12, 2016 at 12:22 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> It was many times stated in threads about waits monitoring [0, 1, 2]
> and supported by different people, but ultimately waits information
> was stored in PgBackendStatus.

Only that it isn't. It's stored in PGPROC.

Yes. And this is good. Robert's original concept of a single byte in PgBackendStatus looks incompatible with any further development of wait monitoring.
 
This criticism is true of
the progress reporting patch, but a quick scan of the thread doesn't
show authors of the wait events patch participating there.


> Can’t we think one more time about implementation provided by Ildus
> and Alexander here [3]?

I don't think so. Afaics the proposed patch tried to do too many things
at once, and its authors didn't listen well to criticism.

The original patch really did too many things at once, but it was good as a prototype.
We should move forward by splitting it into many smaller parts.  But we didn't, because of disagreement about the design.

The idea of individually measuring the time of every wait event met criticism because it might have high overhead [1].  This is really so, at least for Windows [2]. But accessing only current values wouldn't be very useful; we need to gather some statistics anyway.  Gathering them by sampling would be both more expensive and less accurate on the majority of systems.  This is why I proposed hooks, to make platform-dependent extensions possible.  Robert rejects hooks because he is "not a big fan of hooks as a way of resolving disagreements about the design" [3].  Besides, that is actually not a design issue but a platform issue...

Another question is wait parameters.  We want to expose wait events with some parameters.  Robert rejects that because it *might* add additional overhead [3]. When I proposed fitting something useful into the hard-won 4 bytes, Robert claimed that it is "too clever" [4].

So, the situation looks like a dead end.  I have no idea how to convince Robert to accept any kind of advanced wait monitoring functionality in PostgreSQL.  I'm thinking about implementing a sampling extension over the current infrastructure just to make the community see that it sucks. Andres, it would be very nice if you have any idea how to move this situation forward.

Another aspect is that EnterpriseDB offers waits monitoring in its proprietary fork [5].
SQL Session/System wait tuning diagnostics
The Dynamic Runtime Instrumentation Tools Architecture (DRITA) allows a DBA to query catalog views to determine the wait events that affect the performance of individual sessions or the system as a whole. DRITA records the number of times each event occurs as well as the time spent waiting; you can use this information to diagnose performance problems.
And they declare that they measure the time of individual wait events.  This is exactly the thing which Robert resists so much.  So, I suspect some kind of hidden motivation here.  Probably, the EDB guys just don't want to lose the advantage of their proprietary product over open source PostgreSQL.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Background Processes and reporting

From
Andres Freund
Date:
On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> Idea of individual time measurement of every wait event met criticism
> because it might have high overhead [1].

Right. And that's actually one of the points which I meant with "didn't
listen to criticism". There've been a lot of examples, on and off list,
where taking timings triggers significant slowdowns.  Yes, in some
bare-metal environments, with a coherent tsc, the overhead can be
low. But that doesn't make it ok to have a high overhead on a lot of
other systems.

Just claiming that that's not a problem will only lead to your position
not being taken seriously.


> This is really so at least for Windows [2].

Measuring timing overhead for a simplistic workload on a single system
doesn't mean that.  Try doing such a test on a vmware esx virtualized
windows machine, on a multi-socket server; in a lot of instances you'll
see two-three orders of magnitude longer average times; with peaks going
into 4-5 orders of magnitude.  And, as sad as it is, realistically most
postgres instances will run in virtualized environments.


> But accessing only current values wouldn't be very useful.  We
> anyway need to gather some statistics.  Gathering it by sampling would be
> both more expensive and less accurate for majority of systems.  This is why
> I proposed hooks to make possible platform dependent extensions.  Robert
> rejects hook because he is "not a big fan of hooks as a way of resolving
> disagreements about the design" [3].

I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.


> Besides that is actually not design issues but platform issues...

I don't see how that's the case.


> Another question is wait parameters.  We want to expose wait event with
> some parameters.  Robert rejects that because it *might* add additional
> overhead [3]. When I proposed to fit something useful into hard-won
> 4-bytes, Robert claims that it is "too clever" [4].

I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
first step to make progress here.


It seems entirely possible to extend the current API in an incremental
fashion, either allowing to disable the individual pieces, or providing
sufficient measurements that it's not needed.


> So, situation looks like dead-end.  I have no idea how to convince Robert
> about any kind of advanced functionality of wait monitoring to PostgreSQL.
> I'm thinking about implementing sampling extension over current
> infrastructure just to make community see that it sucks. Andres, it would
> be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.


> Another aspect is that EnterpriseDB offers waits monitoring in proprietary
> fork [5].

So?

Greetings,

Andres Freund



Re: Background Processes and reporting

From
Alexander Korotkov
Date:
On Sat, Mar 12, 2016 at 2:45 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> Idea of individual time measurement of every wait event met criticism
> because it might have high overhead [1].

Right. And that's actually one of the points which I meant with "didn't
listen to criticism". There've been a lot of examples, on and off list,
where taking timings triggers significant slowdowns.  Yes, in some
bare-metal environments, with a coherent tsc, the overhead can be
low. But that doesn't make it ok to have a high overhead on a lot of
other systems.

Just claiming that that's not a problem will only lead to your position
not being taken seriously.


> This is really so at least for Windows [2].

Measuring timing overhead for a simplistic workload on a single system
doesn't mean that.  Try doing such a test on a vmware esx virtualized
windows machine, on a multi-socket server; in a lot of instances you'll
see two-three orders of magnitude longer average times; with peaks going
into 4-5 orders of magnitude.  And, as sad as it is, realistically most
postgres instances will run in virtualized environments.


> But accessing only current values wouldn't be very useful.  We
> anyway need to gather some statistics.  Gathering it by sampling would be
> both more expensive and less accurate for majority of systems.  This is why
> I proposed hooks to make possible platform dependent extensions.  Robert
> rejects hook because he is "not a big fan of hooks as a way of resolving
> disagreements about the design" [3].

I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.

So, I get the following.
 
1) Detailed wait monitoring might cause high overhead on some systems.
2) We want wait monitoring to be always on. And we don't want options to enable additional features of wait monitoring.
3) We don't want hook of wait events to be exposed.

Can I conclude that we reject detailed wait monitoring by design?
If that is so, and not only Robert thinks so, then let's just admit it and add it to the FAQ, etc.

> Besides that is actually not design issues but platform issues...

I don't see how that's the case.


> Another question is wait parameters.  We want to expose wait event with
> some parameters.  Robert rejects that because it *might* add additional
> overhead [3]. When I proposed to fit something useful into hard-won
> 4-bytes, Robert claims that it is "too clever" [4].

I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
first step to make progress here.


It seems entirely possible to extend the current API in an incremental
fashion, either allowing to disable the individual pieces, or providing
sufficient measurements that it's not needed.


> So, situation looks like dead-end.  I have no idea how to convince Robert
> about any kind of advanced functionality of wait monitoring to PostgreSQL.
> I'm thinking about implementing sampling extension over current
> infrastructure just to make community see that it sucks. Andres, it would
> be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.


> Another aspect is that EnterpriseDB offers waits monitoring in proprietary
> fork [5].

So?

So, we'll end up with every company providing a fork with detailed wait monitoring, while community PostgreSQL refrains from providing such functionality.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Background Processes and reporting

From
Amit Kapila
Date:
On Sat, Mar 12, 2016 at 3:10 AM, Andres Freund <andres@anarazel.de> wrote:
>
>
> > Similarly for the wait event stuff - checkpointer, wal writer,
> > background writer are in many cases processes that very often are
> > blocked on locks, IO and such.  Thus restricting the facility to
> > database connected processes seems like a loss.
>
> I think one way to address this would be to not only report
> PgBackendStatus type processes in pg_stat_activity. While that'd
> obviously be a compatibility break, I think it'd be an improvement.
>

I think another point here which needs more thought is that many of the pg_stat_activity fields are not relevant for background processes. Of course one can say that we can keep those fields as NULL, but I still think that indicates it is not the most suitable way to expose such information.

Another way could be to have a new view like pg_stat_background_activity with only the relevant fields, or to try to expose this via individual views like pg_stat_bgwriter.

Do you intend to get this done for 9.6 considering an add-on patch for wait event information displayed in pg_stat_activity?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Background Processes and reporting

From
Oleg Bartunov
Date:


On Sat, Mar 12, 2016 at 12:45 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:

 
> So, situation looks like dead-end.  I have no idea how to convince Robert
> about any kind of advanced functionality of wait monitoring to PostgreSQL.
> I'm thinking about implementing sampling extension over current
> infrastructure just to make community see that it sucks. Andres, it would
> be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.


We are all very different people from different cultures, so online discussion on ill-defined topics wouldn't work. Let's get back to work.


> Another aspect is that EnterpriseDB offers waits monitoring in proprietary
> fork [5].
 

So?

So, Robert already has experience with the subject; probably he has had a bad experience with the EDB implementation and he'd like to see something better in the community version. That's fair and I accept his position.

Wait monitoring is one of the popular requirements of Russian companies that have migrated from Oracle. The overwhelming majority of them use Linux, so I suggest having a configure flag for including wait monitoring at compile time (default is no wait monitoring), or a GUC variable, which is also off by default, so we have zero to minimal overhead from monitoring. That way we'll satisfy many enterprises and help them choose Postgres, get feedback from production use and have time for improving the feature.
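
For illustration, such an off-by-default switch could be a plain boolean GUC
modeled on the existing track_io_timing entry in guc.c; the name
track_wait_timing below is hypothetical:

    /* declared somewhere such as pgstat.c */
    bool        track_wait_timing = false;

    /* entry in guc.c's ConfigureNamesBool[] */
    {
        {"track_wait_timing", PGC_SUSET, STATS_COLLECTOR,
            gettext_noop("Collects timing information for wait events."),
            NULL
        },
        &track_wait_timing,
        false,
        NULL, NULL, NULL
    },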

 

Greetings,

Andres Freund



Re: Background Processes and reporting

From
Amit Kapila
Date:
On Sat, Mar 12, 2016 at 2:38 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> On Sat, Mar 12, 2016 at 2:45 AM, Andres Freund <andres@anarazel.de> wrote:
>>
>>
>> I think I agree with Robert here. Providing hooks into very low level
>> places tends to lead to problems in my experience; tight control over
>> what happens is often important - I certainly don't want any external
>> code to run while we're waiting for an lwlock.
>
>
> So, I get following.
>  
> 1) Detailed wait monitoring might cause high overhead on some systems.
> 2) We want wait monitoring to be always on. And we don't want options to enable additional features of wait monitoring.
>

I am not able to see how any of the above comments indicate that wait monitoring needs to be always on. Why can't we consider it to be off by default, especially for things like timing calculations where we suspect some performance penalty? And if during development it is proven that none of the additional wait events cause any overhead, then we can keep them on by default.

> 3) We don't want hook of wait events to be exposed.
>
> Can I conclude that we reject detailed wait monitoring by design?
>

I don't think so.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Background Processes and reporting

From
Vladimir Borodin
Date:

On 12 Mar 2016, at 2:45, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
Idea of individual time measurement of every wait event met criticism
because it might have high overhead [1].

Right. And that's actually one of the points which I meant with "didn't
listen to criticism". There've been a lot of examples, on and off list,
where taking timings triggers significant slowdowns.  Yes, in some
bare-metal environments, with a coherent tsc, the overhead can be
low. But that doesn't make it ok to have a high overhead on a lot of
other systems.

That’s why the proposal included a GUC for that, with a default that turns timing measurement off. I don’t remember any objections against that.

And I’m absolutely sure that a real high-load production system (which of course doesn’t use virtualization or Windows) can’t exist without measuring timings. The Oracle guys have written several chapters (!) about that [0]. Long story short, sampling doesn’t give enough precision. I have shown the overhead [1] on bare-metal Linux with a heavily stressed LWLocks workload. BTW, Oracle doesn’t give you any way to turn timing measurement off, even with hidden parameters. All other commercial databases have wait monitoring with timing measurement. Let’s do it and turn it off by default so that all other platforms don’t suffer from it.
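
As a sketch of what "off by default" could mean in code (the
track_wait_timing flag and the accumulation step are hypothetical), the
clock would only be read when the administrator has opted in, so the
default path pays essentially nothing:

    instr_time  wait_start;

    if (track_wait_timing)
        INSTR_TIME_SET_CURRENT(wait_start);

    /* ... perform the actual wait ... */

    if (track_wait_timing)
    {
        instr_time  wait_time;

        INSTR_TIME_SET_CURRENT(wait_time);
        INSTR_TIME_SUBTRACT(wait_time, wait_start);
        /* accumulate INSTR_TIME_GET_MICROSEC(wait_time) per wait event */
    }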



Just claiming that that's not a problem will only lead to your position
not being taken seriously.


This is really so at least for Windows [2].

Measuring timing overhead for a simplistic workload on a single system
doesn't mean that.  Try doing such a test on a vmware esx virtualized
windows machine, on a multi-socket server; in a lot of instances you'll
see two-three orders of magnitude longer average times; with peaks going
into 4-5 orders of magnitude.  And, as sad as it is, realistically most
postgres instances will run in virtualized environments.


But accessing only current values wouldn't be very useful.  We
anyway need to gather some statistics.  Gathering it by sampling would be
both more expensive and less accurate for majority of systems.  This is why
I proposed hooks to make possible platform dependent extensions.  Robert
rejects hook because he is "not a big fan of hooks as a way of resolving
disagreements about the design" [3].

I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.


Besides that is actually not design issues but platform issues...

I don't see how that's the case.


Another question is wait parameters.  We want to expose wait event with
some parameters.  Robert rejects that because it *might* add additional
overhead [3]. When I proposed to fit something useful into hard-won
4-bytes, Robert claims that it is "too clever" [4].

I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
first step to make progress here.


It seems entirely possible to extend the current API in an incremental
fashion, either allowing to disable the individual pieces, or providing
sufficient measurements that it's not needed.


So, situation looks like dead-end.  I have no idea how to convince Robert
about any kind of advanced functionality of wait monitoring to PostgreSQL.
I'm thinking about implementing sampling extension over current
infrastructure just to make community see that it sucks. Andres, it would
be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.


Another aspect is that EnterpriseDB offers waits monitoring in proprietary
fork [5].

So?

Greetings,

Andres Freund




--
May the force be with you…

Re: Background Processes and reporting

From
Vladimir Borodin
Date:

On 12 Mar 2016, at 13:59, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 12, 2016 at 3:10 AM, Andres Freund <andres@anarazel.de> wrote:
>
>
> > Similarly for the wait event stuff - checkpointer, wal writer,
> > background writer are in many cases processes that very often are
> > blocked on locks, IO and such.  Thus restricting the facility to
> > database connected processes seems like a loss.
>
> I think one way to address this would be to not only report
> PgBackendStatus type processes in pg_stat_activity. While that'd
> obviously be a compatibility break, I think it'd be an improvement.
>

I think here another point which needs more thoughts is that many of the pg_stat_activity fields are not relevant for background processes, ofcourse one can say that we can keep those fields as NULL, but still I think that indicates it is not the most suitable way to expose such information.

Another way could be to have new view like pg_stat_background_activity with only relevant fields or try expose via individual views like pg_stat_bgwriter.

From the DBA point of view it is much more convenient to see all wait events in one view. I don’t know if it is right to break compatibility even more, but IMHO exposing this data in different views is a bad plan.


Do you intend to get this done for 9.6 considering an add-on patch for wait event information displayed in pg_stat_activity?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
May the force be with you…

Re: Background Processes and reporting

From
Kevin Grittner
Date:
On Sat, Mar 12, 2016 at 11:40 AM, Vladimir Borodin <root@simply.name> wrote:
> On 12 Mar 2016, at 13:59, Amit Kapila <amit.kapila16@gmail.com> wrote:

>> I think here another point which needs more thoughts is that many of the
>> pg_stat_activity fields are not relevant for background processes, ofcourse
>> one can say that we can keep those fields as NULL, but still I think that
>> indicates it is not the most suitable way to expose such information.
>>
>> Another way could be to have new view like pg_stat_background_activity with
>> only relevant fields or try expose via individual views like
>> pg_stat_bgwriter.
>
> From the DBA point of view it is much more convenient to see all wait events
> in one view. I don’t know if it is right to break compatibility even more, but
> IMHO exposing this data in different views is a bad plan.

+1

If they are split into separate views I think that there will be a
lot of effort put into views to present the UNION of them, probably
with weird corner cases and race conditions.  A single view can
probably better manage race conditions, and a WHERE clause is not
as tricky for the DBA and/or end user.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Background Processes and reporting

From
Robert Haas
Date:
On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
>> So?
>
> So, Robert already has experience with the subject, probably,  he has bad
> experience with edb implementation and he'd like to see something better in
> community version. That's fair and I accept his position.

Bingo - though maybe "bad" experience is not quite as accurate as
"could be better".

> Wait monitoring is one of the popular requirement of russian companies, who
> migrated from Oracle. Overwhelming majority of them use Linux, so I suggest
> to have configure flag for including wait monitoring at compile time
> (default is no wait monitoring), or have GUC variable, which is also off by
> default, so we have zero to minimal overhead of monitoring. That way we'll
> satisfy many enterprises and help them to choose postgres, will get feedback
> from production use and have time for feature improving.

So, right now we can only display the wait information in
pg_stat_activity.  There are a couple of other things that somebody
might want to do:

1. Sample the wait state information across all backends in the
system.  On a large, busy system, this figures to be quite cheap, and
the sampling interval could be configurable.

2. Count every instance of every wait event in every backend, and roll
that up either via shared memory or additional stats messages.

3. Like #2, but with timing information.

4. Like #2, but on a per-query basis, somehow integrated with
pg_stat_statements.
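
A minimal sketch of approach #1, assuming the 4-byte wait_event_info lives
in PGPROC as discussed upthread (the accumulate_sample() helper and the
loop driver are hypothetical):

    int         i;

    for (i = 0; i < ProcGlobal->allProcCount; i++)
    {
        volatile PGPROC *proc = &ProcGlobal->allProcs[i];
        uint32      wait_event_info = proc->wait_event_info;

        if (wait_event_info != 0)
            accumulate_sample(wait_event_info); /* bump a per-event counter */
    }
    /* then sleep for the configurable sampling interval and repeat */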

The challenge with any of these except #1 is that they are going to
produce a huge volume of data, and, whether you believe it or not, #3
is going to sometimes be crushingly slow.  Really.  I tend to think
that #1 might be better than #2 or #3, but I'm not unwilling to listen
to contrary arguments, especially if backed up by careful benchmarking
showing that the performance hit is negligible.  My reason for wanting
to get the stuff we already had committed first is because I have
found that it is best to proceed with these kinds of problems
incrementally, not trying to solve too much in a single commit.  Now
that we have the basics, we can build on it, adding more wait events
and possibly more recordkeeping for the ones we have already - but
anything that regresses performance for people not using the feature
is a dead end in my book, as is anything that introduces overall
stability risks.

I think the way forward from here is that Postgres Pro should (a)
rework their implementation to work with what has already been
committed, (b) consider carefully whether they've done everything
possible to contain the performance loss, (c) benchmark it on several
different machines and workloads to see how much performance loss
there is, and (d) stop accusing me of acting in bad faith.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Background Processes and reporting

From
Vladimir Borodin
Date:

On 14 Mar 2016, at 22:21, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
So?

So, Robert already has experience with the subject, probably,  he has bad
experience with edb implementation and he'd like to see something better in
community version. That's fair and I accept his position.

Bingo - though maybe "bad" experience is not quite as accurate as
"could be better".

Wait monitoring is one of the popular requirement of russian companies, who
migrated from Oracle. Overwhelming majority of them use Linux, so I suggest
to have configure flag for including wait monitoring at compile time
(default is no wait monitoring), or have GUC variable, which is also off by
default, so we have zero to minimal overhead of monitoring. That way we'll
satisfy many enterprises and help them to choose postgres, will get feedback
from production use and have time for feature improving.

So, right now we can only display the wait information in
pg_stat_activity.  There are a couple of other things that somebody
might want to do:

1. Sample the wait state information across all backends in the
system.  On a large, busy system, this figures to be quite cheap, and
the sampling interval could be configurable.

2. Count every instance of every wait event in every backend, and roll
that up either via shared memory or additional stats messages.

3. Like #2, but with timing information.

4. Like #2, but on a per-query basis, somehow integrated with
pg_stat_statements.

5. Show extra information about a wait event (e.g. exclusive or shared mode for LWLocks, relation/forknum/blknum for I/O operations, etc.).


The challenge with any of these except #1 is that they are going to
produce a huge volume of data, and, whether you believe it or not, #3
is going to sometimes be crushingly slow.  Really.  I tend to think
that #1 might be better than #2 or #3, but I'm not unwilling to listen
to contrary arguments, especially if backed up by careful benchmarking
showing that the performance hit is negligible.

I have already shown [0, 1] the overhead of measuring timings on Linux on a representative workload. AFAIK, these tests were the only ones that showed any numbers. All other statements about terrible performance have been and remain unconfirmed.

As for the size of such information, it of course should be configurable. E.g. in Oracle there is a setting for the size of the ring buffer used to store the history of sampling with extra information about each wait event.


 My reason for wanting
to get the stuff we already had committed first is because I have
found that it is best to proceed with these kinds of problems
incrementally, not trying to solve too much in a single commit.  Now
that we have the basics, we can build on it, adding more wait events
and possibly more recordkeeping for the ones we have already - but
anything that regresses performance for people not using the feature
is a dead end in my book, as is anything that introduces overall
stability risks.

OK, doing it in short steps seems to be a good plan. Any objections against giving people the ability to turn on some feature (e.g. the notorious timing measurement) even if it causes some performance degradation? Of course, it should be turned off by default.


I think the way forward from here is that Postgres Pro should (a)
rework their implementation to work with what has already been
committed, (b) consider carefully whether they've done everything
possible to contain the performance loss, (c) benchmark it on several
different machines and workloads to see how much performance loss
there is, and (d) stop accusing me of acting in bad faith.

If anything, I’m not from PostgresPro and I’m not «accusing you». But to be honest, the currently committed implementation has been tested on exactly one machine with two workloads. And I think it is somewhat unfair to demand more from others. That doesn’t mean that testing on exactly one machine with only one OS is enough, of course. I suppose you should ask the authors to test it on some representative hardware and workloads, but if the authors don’t have them, it would be nice to help them with that.

Also it would be really interesting to hear your opinion about Andres’s initial question. Any thoughts about changing the currently committed implementation?


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
May the force be with you…

Re: Background Processes and reporting

From
Andres Freund
Date:
On 2016-03-12 16:29:11 +0530, Amit Kapila wrote:
> On Sat, Mar 12, 2016 at 3:10 AM, Andres Freund <andres@anarazel.de> wrote:
> >
> >
> > > Similarly for the wait event stuff - checkpointer, wal writer,
> > > background writer are in many cases processes that very often are
> > > blocked on locks, IO and such.  Thus restricting the facility to
> > > database connected processes seems like a loss.
> >
> > I think one way to address this would be to not only report
> > PgBackendStatus type processes in pg_stat_activity. While that'd
> > obviously be a compatibility break, I think it'd be an improvement.
> >
> 
> I think here another point which needs more thoughts is that many of the
> pg_stat_activity fields are not relevant for background processes, ofcourse
> one can say that we can keep those fields as NULL, but still I think that
> indicates it is not the most suitable way to expose such information.

But neither are all of them relevant for autovacuum workers, and we
already show them.  pg_stat_activity as a name imo doesn't really imply
that it's about plain queries.  ISTM we should add a 'backend_type'
column that is one of backend|checkpointer|autovacuum|autovacuum-worker|wal writer|bgwriter|bgworker
(or similar). That makes querying easier.  And then display all shmem
connected processes.

> Another way could be to have new view like pg_stat_background_activity with
> only relevant fields or try expose via individual views like
> pg_stat_bgwriter.

I think the second is a pretty bad alternative; it'll force us to add
new views with very similar information; and it'll be hard to get
information about the whole system.   I mean if you want to know which
locks are causing problems, you don't primarily care whether it's a
background process or a backend that has contention issues.


> Do you intend to get this done for 9.6 considering an add-on patch for wait
> event information displayed in pg_stat_activity?

I think we should fix this for 9.6; it's a weakness in a new
interface. Let's not yank people around more than we need to.

I'm willing to do some work on that, if we can agree upon a course.

Andres



Re: Background Processes and reporting

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 3:54 PM, Vladimir Borodin <root@simply.name> wrote:
> 5. Show extra information about wait event (i.e. exclusive of shared mode
> for LWLocks, relation/forknum/blknum for I/O operations, etc.).

I doubt that this is a good idea.  Everybody will pay the cost of it,
and who will get a benefit?  We haven't had any wait monitoring at all
in PostgreSQL for years and years and years and it's only just now
getting to the top of our list of things to fix.  So I have a hard
time believing that now we suddenly need this level of detail.  The
very good thing about the committed implementation is that it requires
*no* synchronization, and anything more than a 4-byte integer will
need some (probably an st_changecount type protocol).  I continue to believe
that a feature that is on for everyone and dirt cheap is going to be
more valuable than anything that is expensive enough to require an
"off" switch.

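For context, an st_changecount-type protocol is a seqlock-style scheme along
the following lines (field names here are illustrative); this retry loop and
the barriers are the synchronization cost being referred to for anything
wider than 4 bytes:

    /* writer */
    entry->changecount++;            /* now odd: update in progress */
    pg_write_barrier();
    entry->wait_event_info = info;
    entry->wait_extra = extra;       /* the part that doesn't fit in 4 bytes */
    pg_write_barrier();
    entry->changecount++;            /* even again: consistent */

    /* reader */
    for (;;)
    {
        int         before = entry->changecount;

        pg_read_barrier();
        snapshot = *entry;           /* copy the whole struct */
        pg_read_barrier();
        if (before == entry->changecount && (before & 1) == 0)
            break;                   /* consistent snapshot obtained */
    }
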
> I have already shown [0, 1] the overhead of measuring timings in linux on
> representative workload. AFAIK, these tests were the only one that showed
> any numbers. All other statements about terrible performance have been and
> remain unconfirmed.

Of course, those numbers are substantial regressions which would
likely make it impractical to turn this on on a heavily-loaded
production system.  On the other hand, the patch actually committed is
turned on by default and Amit posted numbers showing no performance
change at all.

> As for the size of such information it of course should be configurable.
> I.e. in Oracle there is a GUC for the size of ring buffer to store history
> of sampling with extra information about each wait event.

That's a reasonable idea, although not one I'm very excited about.

> Ok, doing it in short steps seems to be a good plan. Any objections against
> giving people an ability to turn some feature (i.e. notorious measuring
> timings) even if it makes some performance degradation? Of course, it should
> be turned off by default.

I am not totally opposed to that, but I think a feature that causes a
10% performance hit when you turn it on will be mostly useless.  The
people who need it won't be able to risk turning it on.

> If anything, I’m not from PostgresPro and I’m not «accusing you». But to be
> honest current committed implementation has been tested exactly on one
> machine with two workloads. And I think, it is somehow unfair to demand more
> from others. Although it doesn’t mean that testing on exactly one machine
> with only one OS is enough, of course. I suppose, you should ask the authors
> to test it on some representative hardware and workload but if authors don’t
> have them, it would be nice to help them with that.

I'm not necessarily opposed to that, but this thread has a lot more
heat than light, and some of the other threads on this topic have had
the same problem. There seems to be tremendous resistance to the idea
that recording timestamps is going to be expensive even though there
are many previous threads on pgsql-hackers about many different
features showing that this is true.  Somehow, I've got to justify a
position which has been taken by many people many times before on this
very same mailing list.  That strikes me as 100% backwards.

Similarly, the position that a wait-reporting interface that does not
require synchronization will be a lot cheaper than one that does
require synchronization has been questioned repeatedly.  I'm not very
interested in spending a lot of time defending that proposition or
producing benchmarking results to support it, and I don't think I
should have to.  We wouldn't have so many patches floating around that
aimed to reduce locking if synchronization overhead didn't cost, and
what is being proposed is to stick those into low-level code paths
that are sometimes highly trafficked.

> Also it would be really interesting to hear your opinion about the initial
> Andres’s question. Any thoughts about changing current committed
> implementation?

I'm a little vague on specifically what Andres has in mind.  I tend to
think that there's not much point in allowing
pg_stat_get_progress_info('checkpointer') because we can just have a
dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
seems better.  Exposing the wait events from background processes
might be worth doing, but I don't think we want to add a bunch of
dummy lines to pg_stat_activity.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Background Processes and reporting

From
Andres Freund
Date:
Hi,

On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
> > I have already shown [0, 1] the overhead of measuring timings in linux on
> > representative workload. AFAIK, these tests were the only one that showed
> > any numbers. All other statements about terrible performance have been and
> > remain unconfirmed.
>
> Of course, those numbers are substantial regressions which would
> likely make it impractical to turn this on on a heavily-loaded
> production system.

A lot of people operating production systems are fine with trading a <=
10% impact for more insight into the system; especially if that
configuration can be changed without a restart.  I know a lot of systems
that use pg_stat_statements, track_io_timing = on, etc; just to get
that. In fact there's people running perf more or less continuously in
production environments; just to get more insight.

I think it's important to get as much information out there without
performance overhead, so it can be enabled by default. But I don't think
it makes sense to not allow features in that cannot be enabled by
default, *if* we tried to make them cheap enough beforehand.


> > Ok, doing it in short steps seems to be a good plan. Any objections against
> > giving people an ability to turn some feature (i.e. notorious measuring
> > timings) even if it makes some performance degradation? Of course, it should
> > be turned off by default.
>
> I am not totally opposed to that, but I think a feature that causes a
> 10% performance hit when you turn it on will be mostly useless.  The
> people who need it won't be able to risk turning it on.

That's not my experience.


> > If anything, I’m not from PostgresPro and I’m not «accusing you». But to be
> > honest current committed implementation has been tested exactly on one
> > machine with two workloads. And I think, it is somehow unfair to demand more
> > from others. Although it doesn’t mean that testing on exactly one machine
> > with only one OS is enough, of course. I suppose, you should ask the authors
> > to test it on some representative hardware and workload but if authors don’t
> > have them, it would be nice to help them with that.
>
> I'm not necessarily opposed to that, but this thread has a lot more
> heat than light

Indeed.


>, and some of the other threads on this topic have had
> the same problem. There seems to be tremendous resistance to the idea
> that recording timestamps is going to be expensive even though there
> are many previous threads on pgsql-hackers about many different
> features showing that this is true.  Somehow, I've got to justify a
> position which has been taken by many people many times before on this
> very same mailing list.  That strikes me as 100% backwards.

Agreed; I find that pretty baffling. It's especially weird that pointing
out problems like timestamp overhead generates a remarkable amount of
hostility.


> > Also it would be really interesting to hear your opinion about the initial
> > Andres’s question. Any thoughts about changing current committed
> > implementation?
>
> I'm a little vague on specifically what Andres has in mind.

That makes two of us.


> I tend to think that there's not much point in allowing
> pg_stat_get_progress_info('checkpointer') because we can just have a
> dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
> seems better.

But that infrastructure isn't really suitable for exposing quickly
changing counters imo. And given that we now have a relatively generic
framework, it seems like a pain to add a custom implementation just for
the checkpointer. Also, using custom infrastructure means it's not
extensible to custom bgworker, which doesn't seem like a good
idea. E.g. it'd be very neat to show the progress of a logical
replication catchup process that way, no?


> Exposing the wait events from background processes
> might be worth doing, but I don't think we want to add a bunch of
> dummy lines to pg_stat_activity.

Why are those dummy lines? It's activity in the cluster? We already show
autovacuum workers in there. And walsenders, if you query the underlying
function, instead of pg_stat_activity (due to a join to pg_database).

Andres



Re: Background Processes and reporting

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> > I have already shown [0, 1] the overhead of measuring timings in linux on
>> > representative workload. AFAIK, these tests were the only one that showed
>> > any numbers. All other statements about terrible performance have been and
>> > remain unconfirmed.
>>
>> Of course, those numbers are substantial regressions which would
>> likely make it impractical to turn this on on a heavily-loaded
>> production system.
>
> A lot of people operating production systems are fine with trading a <=
> 10% impact for more insight into the system; especially if that
> configuration can be changed without a restart.  I know a lot of systems
> that use pg_stat_statements, track_io_timing = on, etc; just to get
> that. In fact there's people running perf more or less continuously in
> production environments; just to get more insight.
>
> I think it's important to get as much information out there without
> performance overhead, so it can be enabled by default. But I don't think
> it makes sense to not allow features in that cannot be enabled by
> default, *if* we tried to make them cheap enough beforehand.

Hmm, OK.  I would have expected you to be on the other side of this
question, so maybe I'm all wet.  One point I am concerned about is
that, right now, we have only a handful of types of wait events.  I'm
very interested in seeing us add more, like I/O and client wait.  So
any overhead we pay here is likely to eventually be paid in a lot of
places - thus it had better be extremely small.

>> I tend to think that there's not much point in allowing
>> pg_stat_get_progress_info('checkpointer') because we can just have a
>> dedicated view for that sort of thing, cf. pg_stat_bgwriter, which
>> seems better.
>
> But that infrastructure isn't really suitable for exposing quickly
> changing counters imo. And given that we now have a relatively generic
> framework, it seems like a pain to add a custom implementation just for
> the checkpointer. Also, using custom infrastructure means it's not
> extensible to custom bgworker, which doesn't seem like a good
> idea. E.g. it'd be very neat to show the progress of a logical
> replication catchup process that way, no?

It isn't really that hard to make a purpose-built shared memory area
for each permanent background process, and I think that would be a
better design.  Then you can have whatever types you need, whatever
column labels make sense, etc.  You can't really do that for command
progress reporting because there are so many commands, but a
single-purpose backend doesn't have that issue.  I do agree that
having background workers report into this facility could make sense.

>> Exposing the wait events from background processes
>> might be worth doing, but I don't think we want to add a bunch of
>> dummy lines to pg_stat_activity.
>
> Why are those dummy lines? It's activity in the cluster? We already show
> autovacuum workers in there. And walsenders, if you query the underlying
> function, instead of pg_stat_activity (due to a join to pg_database).

Hmm.  Well, OK, maybe.  I didn't realize walsenders were showing up
there ... sorta.  I guess my concern was that people would complain
about breaking compatibility, but since we're doing that already maybe
we ought to double down on it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Background Processes and reporting

From
Amit Kapila
Date:
On Tue, Mar 15, 2016 at 1:32 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2016-03-12 16:29:11 +0530, Amit Kapila wrote:
> > On Sat, Mar 12, 2016 at 3:10 AM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > >
> > > > Similarly for the wait event stuff - checkpointer, wal writer,
> > > > background writer are in many cases processes that very often are
> > > > blocked on locks, IO and such.  Thus restricting the facility to
> > > > database connected processes seems like a loss.
> > >
> > > I think one way to address this would be to not only report
> > > PgBackendStatus type processes in pg_stat_activity. While that'd
> > > obviously be a compatibility break, I think it'd be an improvement.
> > >
> >
> > I think here another point which needs more thoughts is that many of the
> > pg_stat_activity fields are not relevant for background processes, ofcourse
> > one can say that we can keep those fields as NULL, but still I think that
> > indicates it is not the most suitable way to expose such information.
>
> But neither are all of them relevant for autovacuum workers, and we
> already show them.
>

Right; currently any process which has been assigned a BackendId is a probable candidate for being displayed in pg_stat_activity, and all the related information is captured in PgBackendStatus.

>  pg_stat_activity as a name imo doesn't really imply
> that it's about plain queries.  ISTM we should add a 'backend_type'
> column that is one of backend|checkpointer|autovacuum|autovacuum-worker|wal writer| bgwriter| bgworker
> (or similar). That makes querying easier.
>

+1 for going that way if we decide to display background process information in the pg_stat_activity view.  However, I think we might need some additional space in shared memory to track some of the statistics we track in PgBackendStatus, or maybe for now just display some minimal stats like wait events for background processes.

>
> > Another way could be to have new view like pg_stat_background_activity with
> > only relevant fields or try expose via individual views like
> > pg_stat_bgwriter.
>
> I think the second is a pretty bad alternative; it'll force us to add
> new views with very similar information; and it'll be hard to get
> information about the whole system.   I mean if you want to know which
> locks are causing problems, you don't primarily care whether it's a
> background process or a backend that has contention issues.
>

Agreed.  OTOH, adding information from two different kinds of processes (one which has a BackendId and one which doesn't) also doesn't sound neat from the internal code perspective, but maybe this is just an initial fear, or maybe it's because we haven't come up with a patch which can show it is actually a simple thing to achieve.

Yet another idea for 9.6 could be: let's just define statistics functions to get wait events for background processes like we have for backends (similar to pg_stat_get_backend_idset() and pg_stat_get_backend_wait_event()). I think we most probably need those kinds of functions anyway once we expose such information for background processes, so having them now will at least provide some way for users to get some minimal information about background processes.  I think that won't need much additional work.

>
> > Do you intend to get this done for 9.6 considering an add-on patch for wait
> > event information displayed in pg_stat_activity?
>
> I think we should fix this for 9.6; it's a weakness in a new
> interface. Let's not yank people around more than we need to.
>
> I'm willing to do some work on that, if we can agree upon a course.
>

Good to hear.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Background Processes and reporting

From
Alexander Korotkov
Date:
On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> > I have already shown [0, 1] the overhead of measuring timings in linux on
>> > representative workload. AFAIK, these tests were the only one that showed
>> > any numbers. All other statements about terrible performance have been and
>> > remain unconfirmed.
>>
>> Of course, those numbers are substantial regressions which would
>> likely make it impractical to turn this on on a heavily-loaded
>> production system.
>
> A lot of people operating production systems are fine with trading a <=
> 10% impact for more insight into the system; especially if that
> configuration can be changed without a restart.  I know a lot of systems
> that use pg_stat_statements, track_io_timing = on, etc; just to get
> that. In fact there's people running perf more or less continuously in
> production environments; just to get more insight.
>
> I think it's important to get as much information out there without
> performance overhead, so it can be enabled by default. But I don't think
> it makes sense to not allow features in that cannot be enabled by
> default, *if* we tried to make them cheap enough beforehand.

Hmm, OK.  I would have expected you to be on the other side of this
question, so maybe I'm all wet.  One point I am concerned about is
that, right now, we have only a handful of types of wait events.  I'm
very interested in seeing us add more, like I/O and client wait.  So
any overhead we pay here is likely to eventually be paid in a lot of
places - thus it had better be extremely small.

OK. Let's start to produce light, not heat.

As I understand it, we have two features which we suspect introduce overhead:
1) Recording parameters of wait events, which requires some kind of synchronization protocol.
2) Recording the duration of wait events, because time measurements might be expensive on some platforms.

At the same time, there are machines and workloads where neither of these features produces measurable overhead.  And we are not talking about toy databases: Vladimir is a DBA at Yandex, which is among the top 20 internet companies in the world by traffic.  They run both of these features on a production high-load database without noticing any overhead from them.

It would be great progress if we decided that we could add both of these features, controlled by GUCs (off by default).
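
To be concrete about what "controlled by GUC" means here, a minimal operational sketch could look like the following; the GUC names are placeholders chosen only for illustration, not proposed as-is:

    -- hypothetical GUCs, off by default, changeable without a restart
    ALTER SYSTEM SET track_wait_event_parameters = on;
    ALTER SYSTEM SET track_wait_event_timing = on;
    SELECT pg_reload_conf();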

If we decide so, then let's start working on this. First, we should construct a list of machines and workloads for testing. Any such list will not be comprehensive, but let's find something that is enough for testing GUC-controlled, off-by-default features.  Then we can turn our conversation from theoretical thoughts to concrete benchmarks that are objective and convincing to everybody.

Otherwise, let's just add these features to the list of unwanted functionality and close this question.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Background Processes and reporting

From
Oleg Bartunov
Date:


On Tue, Mar 15, 2016 at 7:43 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> > I have already shown [0, 1] the overhead of measuring timings in linux on
>> > representative workload. AFAIK, these tests were the only one that showed
>> > any numbers. All other statements about terrible performance have been and
>> > remain unconfirmed.
>>
>> Of course, those numbers are substantial regressions which would
>> likely make it impractical to turn this on on a heavily-loaded
>> production system.
>
> A lot of people operating production systems are fine with trading a <=
> 10% impact for more insight into the system; especially if that
> configuration can be changed without a restart.  I know a lot of systems
> that use pg_stat_statements, track_io_timing = on, etc; just to get
> that. In fact there's people running perf more or less continuously in
> production environments; just to get more insight.
>
> I think it's important to get as much information out there without
> performance overhead, so it can be enabled by default. But I don't think
> it makes sense to not allow features in that cannot be enabled by
> default, *if* we tried to make them cheap enough beforehand.

Hmm, OK.  I would have expected you to be on the other side of this
question, so maybe I'm all wet.  One point I am concerned about is
that, right now, we have only a handful of types of wait events.  I'm
very interested in seeing us add more, like I/O and client wait.  So
any overhead we pay here is likely to eventually be paid in a lot of
places - thus it had better be extremely small.

OK. Let's start to produce light, not heat.

As I understand it, we have two features which we suspect introduce overhead:
1) Recording parameters of wait events, which requires some kind of synchronization protocol.
2) Recording the duration of wait events, because time measurements might be expensive on some platforms.

At the same time, there are machines and workloads where neither of these features produces measurable overhead.  And we are not talking about toy databases: Vladimir is a DBA at Yandex, which is among the top 20 internet companies in the world by traffic.  They run both of these features on a production high-load database without noticing any overhead from them.

It would be great progress if we decided that we could add both of these features, controlled by GUCs (off by default).

enable_waits_statistics ?
 

If we decide so, then let's start working on this. First, we should construct a list of machines and workloads for testing. Any such list will not be comprehensive, but let's find something that is enough for testing GUC-controlled, off-by-default features.  Then we can turn our conversation from theoretical thoughts to concrete benchmarks that are objective and convincing to everybody.

Vladimir, could you provide a test suite, so other people could measure overhead on their machines ?


 

Otherwise, let's just add these features to the list of unwanted functionality and close this question.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Background Processes and reporting

From
Vladimir Borodin
Date:

15 марта 2016 г., в 19:57, Oleg Bartunov <obartunov@gmail.com> написал(а):



On Tue, Mar 15, 2016 at 7:43 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> > I have already shown [0, 1] the overhead of measuring timings in linux on
>> > representative workload. AFAIK, these tests were the only one that showed
>> > any numbers. All other statements about terrible performance have been and
>> > remain unconfirmed.
>>
>> Of course, those numbers are substantial regressions which would
>> likely make it impractical to turn this on on a heavily-loaded
>> production system.
>
> A lot of people operating production systems are fine with trading a <=
> 10% impact for more insight into the system; especially if that
> configuration can be changed without a restart.  I know a lot of systems
> that use pg_stat_statements, track_io_timing = on, etc; just to get
> that. In fact there's people running perf more or less continuously in
> production environments; just to get more insight.
>
> I think it's important to get as much information out there without
> performance overhead, so it can be enabled by default. But I don't think
> it makes sense to not allow features in that cannot be enabled by
> default, *if* we tried to make them cheap enough beforehand.

Hmm, OK.  I would have expected you to be on the other side of this
question, so maybe I'm all wet.  One point I am concerned about is
that, right now, we have only a handful of types of wait events.  I'm
very interested in seeing us add more, like I/O and client wait.  So
any overhead we pay here is likely to eventually be paid in a lot of
places - thus it had better be extremely small.

OK. Let's start to produce light, not heat.

As I understand it, we have two features which we suspect introduce overhead:
1) Recording parameters of wait events, which requires some kind of synchronization protocol.
2) Recording the duration of wait events, because time measurements might be expensive on some platforms.

At the same time, there are machines and workloads where neither of these features produces measurable overhead.  And we are not talking about toy databases: Vladimir is a DBA at Yandex, which is among the top 20 internet companies in the world by traffic.  They run both of these features on a production high-load database without noticing any overhead from them.

It would be great progress if we decided that we could add both of these features, controlled by GUCs (off by default).

enable_waits_statistics ?
 

If we decide so, then let's start working on this. First, we should construct a list of machines and workloads for testing. Any such list will not be comprehensive, but let's find something that is enough for testing GUC-controlled, off-by-default features.  Then we can turn our conversation from theoretical thoughts to concrete benchmarks that are objective and convincing to everybody.

Vladimir, could you provide a test suite, so other people could measure overhead on their machines ?

I have described it to some extent here [0]. Since the majority of concerns were around LWLocks, the plan was to construct a workload with heavy LWLock pressure. This can easily be done even with pgbench in the two following scenarios:
1. Put all the data in shared buffers and on tmpfs and run a read/write test. Contention would be around ProcArrayLock.
2. Put all the data in RAM, but not all of it in shared buffers, and run a read-only test. Contention would be around the buffer manager.

IMHO, these two tests should be representative and should not depend much on hardware.
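
For reference, while either scenario is running under pgbench, the resulting LWLock contention should already be visible through the wait event columns committed for 9.6; a simple snapshot query such as the following (just an observation aid, not part of any proposal) shows where the waiters pile up:

    -- snapshot of current waits across all sessions during the pgbench run
    SELECT wait_event_type, wait_event, count(*) AS waiters
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
    GROUP BY wait_event_type, wait_event
    ORDER BY waiters DESC;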




 

Otherwise, let's just add these features to the list of unwanted functionality and close this question.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 


--
May the force be with you…

Re: Background Processes and reporting

From
Bruce Momjian
Date:
On Sat, Mar 12, 2016 at 08:33:55PM +0300, Vladimir Borodin wrote:
> That’s why proposal included GUC for that with a default to turn timings
> measuring off. I don’t remember any objections against that.
> 
> And I’m absolutely sure that a real highload production (which of course
> doesn’t use virtualization and windows) can’t exist without measuring timings.
> Oracle guys have written several chapters (!) about that [0]. Long story short,
> sampling doesn’t give enough precision. I have shown overhead [1] on bare metal
> linux with high stressed lwlocks workload. BTW Oracle doesn’t give you any ways
> to turn timings measurement off, even with hidden parameters. All other
> commercial databases have waits monitoring with timings measurement. Let’s do
> it and turn it off by default so that all other platforms don’t suffer from it.

I realize that users of other databases have found sampling to be
insufficient, but I think we need to use the Postgres sampling method in
production to see if it is insufficient for Postgres as well.  We can't
design based on the limitations of other databases.

Also, we have enabled a sampling method in 9.6 that we know can be enabled
on every platform with limited overhead.  We can add additional,
potentially higher-overhead methods in later releases.
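
To make that concrete, the sampling approach amounts to nothing more than repeatedly snapshotting pg_stat_activity and reasoning from sample counts rather than measured durations; a minimal sketch (the table and queries here are invented purely for illustration) might be:

    -- accumulate periodic snapshots (run the INSERT on an interval, e.g. via \watch)
    CREATE TABLE wait_event_samples (
        sample_time     timestamptz,
        pid             integer,
        wait_event_type text,
        wait_event      text
    );

    INSERT INTO wait_event_samples
    SELECT now(), pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;

    -- approximate share of time spent in each wait, inferred from sample counts
    SELECT wait_event_type, wait_event,
           round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_samples
    FROM wait_event_samples
    GROUP BY wait_event_type, wait_event
    ORDER BY pct_of_samples DESC;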

Frankly, it would be odd to add these features in the opposite order,
and the stubbornness of some in this thread to understand that is
concerning.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +