Re: Background Processes and reporting - Mailing list pgsql-hackers
From | Vladimir Borodin |
---|---|
Subject | Re: Background Processes and reporting |
Date | |
Msg-id | FC188775-FF46-4202-9958-6F0E1D3E0A0C@simply.name |
In response to | Re: Background Processes and reporting (Andres Freund <andres@anarazel.de>) |
Responses | Re: Background Processes and reporting |
List | pgsql-hackers |
On 12 March 2016, at 2:45, Andres Freund <andres@anarazel.de> wrote:
On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
Idea of individual time measurement of every wait event met criticism
because it might have high overhead [1].
Right. And that's actually one of the points which I meant with "didn't
listen to criticism". There've been a lot of examples, on and off list,
where taking timings triggers significant slowdowns. Yes, in some
bare-metal environments, with a coherent TSC, the overhead can be
low. But that doesn't make it ok to have a high overhead on a lot of
other systems.
That’s why the proposal included a GUC for that, with timing measurement turned off by default. I don’t remember any objections to that.
And I’m absolutely sure that a real high-load production system (which of course doesn’t use virtualization or Windows) can’t exist without measuring timings. Oracle guys have written several chapters (!) about that [0]. Long story short, sampling doesn’t give enough precision. I have shown the overhead [1] on bare-metal Linux with a workload that stresses lwlocks heavily. BTW, Oracle doesn’t give you any way to turn timing measurement off, even with hidden parameters. All other commercial databases have wait monitoring with timing measurement. Let’s do it and turn it off by default so that all other platforms don’t suffer from it.
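To make the sampling-versus-timing argument concrete, here is a toy sketch in Python. It is not PostgreSQL code: the event names, the timeline, and the functions `exact_totals` and `sampled_totals` are all hypothetical, and the short waits are deliberately placed between sampling ticks, which is the worst case for a sampler.

```python
from collections import defaultdict


def exact_totals(events):
    """Sum measured durations per wait event.

    events: list of (name, start_us, duration_us) tuples (hypothetical format).
    """
    totals = defaultdict(int)
    for name, _start, duration in events:
        totals[name] += duration
    return dict(totals)


def sampled_totals(events, period_us, horizon_us):
    """Estimate per-event time by looking at the active wait on every tick.

    Each tick that lands inside a wait credits that wait with one full period.
    """
    totals = defaultdict(int)
    for tick in range(0, horizon_us, period_us):
        for name, start, duration in events:
            if start <= tick < start + duration:
                totals[name] += period_us
                break
    return dict(totals)


# Assumed timeline: one long wait, then five short waits that each start
# 100 us after a tick and end before the next one.
events = [("io_read", 0, 5000)]
events += [("lwlock", 5100 + i * 1000, 200) for i in range(5)]

exact = exact_totals(events)
sampled = sampled_totals(events, period_us=1000, horizon_us=11000)
print("exact:  ", exact)
print("sampled:", sampled)
```

With this timeline the exact accounting attributes 1000 us to `lwlock`, while the sampler never sees it. Waits at random phases would be caught in proportion to their share of time given enough samples, so this shows the failure mode rather than a typical error.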
Just claiming that that's not a problem will only lead to your position
not being taken seriously.
This is really so at least for Windows [2].
Measuring timing overhead for a simplistic workload on a single system
doesn't mean that. Try doing such a test on a VMware ESX virtualized
Windows machine, on a multi-socket server; in a lot of instances you'll
see two to three orders of magnitude longer average times, with peaks going
into four to five orders of magnitude. And, as sad as it is, realistically most
postgres instances will run in virtualized environments.
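The per-call cost of reading the clock is what this disagreement hinges on, and it can be measured on any given machine. PostgreSQL ships `pg_test_timing` for this purpose; the sketch below is only a rough Python analogue with a hypothetical function name, and its result includes interpreter loop overhead, so it overestimates the raw cost of the underlying clock source.

```python
import time


def clock_read_overhead_ns(iterations=100_000):
    """Return the average cost in nanoseconds of one clock read.

    The figure includes Python loop and call overhead, so treat it as a
    relative number for comparing machines, not an absolute one.
    """
    start = time.perf_counter_ns()
    for _ in range(iterations):
        time.perf_counter_ns()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"~{clock_read_overhead_ns():.0f} ns per clock read")
```

Running the same script on bare metal and inside a virtualized guest is one way to check whether the gap described above shows up on the systems at hand.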
But accessing only current values wouldn't be very useful. We
anyway need to gather some statistics. Gathering it by sampling would be
both more expensive and less accurate for the majority of systems. This is why
I proposed hooks to make platform-dependent extensions possible. Robert
rejects hooks because he is "not a big fan of hooks as a way of resolving
disagreements about the design" [3].
I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.
Besides, these are actually not design issues but platform issues...
I don't see how that's the case.
Another question is wait parameters. We want to expose wait events with
some parameters. Robert rejects that because it *might* add additional
overhead [3]. When I proposed to fit something useful into the hard-won
4 bytes, Robert claimed that it is "too clever" [4].
I think no longer treating this as "Robert/EDB vs. pgpro" would be a good
first step to make progress here.
It seems entirely possible to extend the current API in an incremental
fashion, either allowing to disable the individual pieces, or providing
sufficient measurements that it's not needed.
So, the situation looks like a dead end. I have no idea how to convince Robert
to add any kind of advanced wait-monitoring functionality to PostgreSQL.
I'm thinking about implementing a sampling extension over the current
infrastructure just to make the community see that it sucks. Andres, it would
be very nice if you had any idea how to move this situation forward.
I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin. So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.
Another aspect is that EnterpriseDB offers wait monitoring in its proprietary
fork [5].
So?
Greetings,
Andres Freund