Re: Sampling-based timing for EXPLAIN ANALYZE - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Sampling-based timing for EXPLAIN ANALYZE
Msg-id 9b29c49f-70a4-ee20-b565-6a4cfa0d3c2f@enterprisedb.com
In response to Re: Sampling-based timing for EXPLAIN ANALYZE  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 1/17/23 19:46, Andres Freund wrote:
> Hi,
> 
> On 2023-01-17 19:00:02 +0100, Tomas Vondra wrote:
>> On 1/17/23 18:02, Andres Freund wrote:
>>> On 2023-01-17 15:52:07 +0100, Tomas Vondra wrote:
>>>> That also does not have issues with timestamp "rounding" - considering
>>>> e.g. sample rate 1000Hz, that's 1ms between samples. And it's quite
>>>> possible the node completes within 1ms, in which case
>>>>
>>>>    (now - last_sample)
>>>>
>>>> ends up being 0 (assuming I correctly understand the code).
>>>
>>> That part should be counting in nanoseconds, I think? Unless I misunderstand
>>> something?
>>>
>>
>> The higher precision does not help, because both values come from the
>> *sampled* timestamp (i.e. the one updated from the signal handler). So
>> if the node happens to execute between two signals, the values are going
>> to be the same, and the difference is 0.
> 
> In that case there simply wasn't any sample for the node, and a non-timestamp
> based sample counter wouldn't do anything different?
> 

Yeah, you're right.
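
To spell out the scheme we're talking about, roughly (the identifiers are
made up for illustration, not taken from the actual patch):

#include <stdint.h>
#include <time.h>

/* Updated from the timer signal handler, e.g. at 1000Hz. */
static volatile int64_t last_sample_ns;

static int64_t
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Timer tick: the only place the clock gets read. */
static void
sampling_tick(int signo)
{
    (void) signo;
    last_sample_ns = now_ns();
}

/*
 * Node exit: attribute the time since the sample observed at node entry.
 * If no tick fired while the node ran, last_sample_ns still equals
 * entry_sample and the node gets 0, no matter how precise the clock is.
 */
static void
node_exit(int64_t *node_total_ns, int64_t entry_sample)
{
    *node_total_ns += last_sample_ns - entry_sample;
}

So the nanosecond precision doesn't change anything when no tick lands
inside the node - but as you say, a plain sample counter would miss that
execution just the same.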

> If you're worried about the case where a timer does fire during execution of
> the node, but exactly once, that should provide a difference between the last
> sampled timestamp and the current time. It'll attribute a bit too much to the
> in-progress nodes, but well, that's sampling for you.
> 
> 
> I think a "hybrid" explain mode might be worth thinking about. Use the
> "current" sampling method for the first execution of a node, and for the first
> few milliseconds of a query (or perhaps the first few timestamp
> acquisitions). That provides an accurate explain analyze for short queries,
> without a significant slowdown. Then switch to sampling, which provides decent
> attribution for a bit longer running queries.
> 

Yeah, this is essentially the sampling I imagined when I first read the
subject of this thread: sample which node executions to measure (and then
measure those accurately), whereas these patches sample timestamps.
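
A very rough sketch of that hybrid idea, reusing now_ns()/last_sample_ns
(and the includes) from the sketch above - the cutoff and the struct are
invented here, this is not from any patch:

#define EXACT_LOOPS 3       /* made-up cutoff for exact timing */

typedef struct HybridInstr
{
    uint64_t    nloops;
    int64_t     total_ns;           /* exact + sampled time attributed so far */
    int64_t     entry_ns;           /* exact timestamp taken at node entry */
    int64_t     entry_sample_ns;    /* sampled timestamp seen at node entry */
} HybridInstr;

static void
hybrid_start(HybridInstr *hi)
{
    if (hi->nloops < EXACT_LOOPS)
        hi->entry_ns = now_ns();    /* accurate, but a real clock read */
    else
        hi->entry_sample_ns = last_sample_ns;   /* free, maintained by the timer */
}

static void
hybrid_stop(HybridInstr *hi)
{
    if (hi->nloops < EXACT_LOOPS)
        hi->total_ns += now_ns() - hi->entry_ns;
    else
        hi->total_ns += last_sample_ns - hi->entry_sample_ns;
    hi->nloops++;
}

That way short queries and rarely-executed nodes still get exact numbers,
and only the hot, frequently-executed nodes pay the cheaper sampled path.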

> 
> 
>>>>> I've been thinking that we should consider making more of the instrumentation
>>>>> code work like that. The amount of work we're doing in InstrStart/StopNode()
>>>>> has steadily risen. When buffer usage and WAL usage are enabled, we're
>>>>> executing over 1000 instructions! And each single Instrumentation node is ~450
>>>>> bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
>>>>> embedded.
>>>>>
>>>>> If we instead have InstrStartNode() set up a global pointer to the
>>>>> Instrumentation node, we can make the instrumentation code modify both the
>>>>> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
>>>>> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
>>>>> changes - right now nodes "automatically" include the IO/WAL incurred in child
>>>>> nodes, but that's just a small bit of additional summing-up to be done during
>>>>> EXPLAIN.
>>>>>
>>>>
>>>> That's certainly one way to implement that. I wonder if we could make
>>>> that work without the global pointer, but I can't think of any.
>>>
>>> I don't see a realistic way at least. We could pass down an
>>> "InstrumentationContext" through everything that needs to do IO and WAL. But
>>> that seems infeasible at this point.
> 
>> Why infeasible?
> 
> Primarily the scale of the change. We'd have to pass down the context into all
> table/index AM functions. And into a lot of bufmgr.c, xlog.c functions,
> which'd require their callers to have access to the context.  That's hundreds
> if not thousands of places.
> 
> Adding that many function parameters might turn out to be noticeable
> runtime-wise, due to increased register movement. I think for a number of the
> places where we currently don't, we ought to use a by-reference struct for the
> not-always-used parameters, which could then also contain this context.
> 

OK, I hadn't realized we'd have to pass it to that many places.
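
For the archives, this is roughly how I picture the global-pointer variant
you described above (names invented for the sketch, standing in for
pgBufferUsage and the real accounting sites):

#include <stdint.h>

typedef struct SketchBufUsage
{
    int64_t     shared_blks_hit;
    int64_t     shared_blks_read;
} SketchBufUsage;

/* process-wide counters, standing in for pgBufferUsage */
static SketchBufUsage global_buf_usage;

/* set by InstrStartNode(), cleared by InstrStopNode() */
static SketchBufUsage *current_instr_buf_usage;

static void
sketch_instr_start_node(SketchBufUsage *node_usage)
{
    current_instr_buf_usage = node_usage;
}

static void
sketch_instr_stop_node(void)
{
    current_instr_buf_usage = NULL;
}

/* called from the buffer-hit path instead of a bare counter increment */
static void
sketch_count_shared_hit(void)
{
    global_buf_usage.shared_blks_hit++;
    if (current_instr_buf_usage)
        current_instr_buf_usage->shared_blks_hit++;
}

With that, each node only sees its own I/O, so EXPLAIN has to do the
summing over child nodes you mentioned - but InstrStart/StopNode() no
longer needs to snapshot and diff full BufUsage/WalUsage structs.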


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


