Re: Sampling-based timing for EXPLAIN ANALYZE - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Msg-id: 673e9b4b-3bf7-ac85-0bd9-c99952800975@enterprisedb.com
In response to: Re: Sampling-based timing for EXPLAIN ANALYZE (Andres Freund <andres@anarazel.de>)
Responses: Re: Sampling-based timing for EXPLAIN ANALYZE
List: pgsql-hackers

On 1/15/23 21:22, Andres Freund wrote:
> Hi,
>
> On 2023-01-13 09:11:06 +0100, David Geier wrote:
>> Mostly I'm wondering if the sampling based approach gains us enough to be
>> worth it, once the patch to use RDTSC hopefully landed (see [1]).
>
> Well, I'm not sure we have a path forward on it. There's portability and
> accuracy concerns. But more importantly:
>
>> I believe that with the RDTSC patch the overhead of TIMING ON is lower than
>> the overhead of using ANALYZE with TIMING OFF in the first place. Hence, to
>> be really useful, it would be great if we could on top of TIMING SAMPLING
>> also lower the overhead of ANALYZE itself further (e.g. by using a fast path
>> for the default EXPLAIN (ANALYZE, TIMING ON / SAMPLING)). Currently,
>> InstrStartNode() and InstrStopNode() have a ton of branches and without all
>> the typically deactivated code the implementation would be very small and
>> could be placed in an inlinable function.
>
> Yes, I think SAMPLING could get rid of most of the instrumentation overhead -
> at the cost of a bit less detail in the explain, of course. Which could make
> it a lot more feasible to enable something like auto_explain.log_timing in
> busy workloads.
>
> For the sampling mode we don't really need something like
> InstrStart/StopNode. We just need a pointer to the node currently executing -
> not free to set, but still a heck of a lot cheaper than InstrStopNode(), even
> without ->need_timer etc. Then the timer just needs to do
>   instr->sampled_total += (now - last_sample)
>   last_sample = now
>

I don't understand why we would even use timestamps in this case? AFAIK
"sampling profilers" simply increment a counter for the executing node, and
then approximate the time as proportional to the count.

That also avoids issues with timestamp "rounding" - with e.g. a sample rate of
1000Hz, that's 1ms between samples. It's quite possible the node completes
within 1ms, in which case (now - last_sample) ends up being 0 (assuming I
understand the code correctly). And I don't think there's any particularly
good way to correct for this.

It seems ExplainNode() attempts to do some correction, but I doubt that's very
reliable, as these fast nodes will have sampled_total=0, so no matter what you
multiply that by, it'll still be 0.
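
To be concrete, the counter-only variant I mean is roughly this (just an
illustrative sketch, not a patch - sampling_tick, current_node_instr and the
other names are made up):

#include <stdint.h>
#include <stddef.h>

/*
 * Sketch only: the executor keeps a pointer to the instrumentation of the
 * node currently running; the sampling timer just bumps two counters.
 */
typedef struct SamplingInstr
{
    uint64_t    nsamples;       /* samples attributed to this node */
} SamplingInstr;

static SamplingInstr *current_node_instr = NULL;   /* set on node entry/exit */
static uint64_t total_samples = 0;                 /* samples for whole query */

/* invoked at the sampling rate (timer / signal handler) */
static void
sampling_tick(void)
{
    total_samples++;
    if (current_node_instr)
        current_node_instr->nsamples++;
}

/* at EXPLAIN time, approximate per-node time from the counts */
static double
approximate_node_time(const SamplingInstr *instr, double total_query_time)
{
    if (total_samples == 0)
        return 0.0;
    return total_query_time * ((double) instr->nsamples / (double) total_samples);
}

The tick handler does no timestamp arithmetic at all, and a node that usually
finishes within the sampling interval should still accumulate counts over
repeated executions instead of being stuck at sampled_total=0.
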
> I've been thinking that we should consider making more of the instrumentation
> code work like that. The amount of work we're doing in InstrStart/StopNode()
> has steadily risen. When buffer usage and WAL usage are enabled, we're
> executing over 1000 instructions! And each single Instrumentation node is
> ~450 bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
> embedded.
>
> If we instead have InstrStartNode() set up a global pointer to the
> Instrumentation node, we can make the instrumentation code modify both the
> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
> changes - right now nodes "automatically" include the IO/WAL incurred in
> child nodes, but that's just a small bit of additional summing up to be done
> during EXPLAIN.

That's certainly one way to implement it. I wonder if we could make that work
without the global pointer, but I can't think of a way.
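
For the archives, the shape of what you describe would presumably be something
like this (again only a sketch with made-up names - NodeUsage, current_instr,
count_shared_blk_hit - the real accounting is of course spread over bufmgr.c
and friends):

#include "postgres.h"
#include "executor/instrument.h"

/*
 * Sketch only (not a patch): InstrStartNode() would publish the node's
 * counters through a global pointer, and each accounting site would bump
 * both the existing global counters and, if set, the per-node ones.
 */
typedef struct NodeUsage
{
    BufferUsage bufusage;
    WalUsage    walusage;
} NodeUsage;

static NodeUsage *current_instr = NULL;

/* e.g. wherever a shared-buffer hit is currently counted */
static inline void
count_shared_blk_hit(void)
{
    pgBufferUsage.shared_blks_hit++;
    if (current_instr)
        current_instr->bufusage.shared_blks_hit++;
}

EXPLAIN would then have to sum the children's usage into the parents
explicitly, as you note, since nodes would no longer pick up their children's
IO/WAL automatically.
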
> Separately, I think we should consider re-ordering Instrumentation so that
> bufusage_start, walusage_start are after the much more commonly used
> elements. We're forcing ntuples, nloops, .. onto separate cachelines, even
> though they're accounted for unconditionally.

+1 to that


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company