Re: That EXPLAIN ANALYZE patch still needs work - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: That EXPLAIN ANALYZE patch still needs work
Date
Msg-id 20060606210519.GC45331@pervasive.com
In response to Re: That EXPLAIN ANALYZE patch still needs work  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: That EXPLAIN ANALYZE patch still needs work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote:
> I have a theory about this, and it's not pleasant at all.  What I
> think is that we have a Heisenberg problem here: the act of invoking
> gettimeofday() actually changes what is measured.  That is, the
> runtime of the "second part" of ExecProcNode is actually longer when
> we sample than when we don't, not merely due to the extra time spent
> in gettimeofday().  It's not very hard to guess at reasons why, either.
> The kernel entry is probably flushing some part of the CPU's state,
> such as virtual/physical address mapping for the userland address
> space.  After returning from the kernel call, the time to reload
> that state shows up as more execution time within the "second part".
> 
> This theory explains two observations that otherwise are hard to
> explain.  One, that the effect is platform-specific: your machine
> may avoid flushing as much state during a kernel call as mine does.
> And two, that upper plan nodes seem much more affected than lower
> ones.  That makes sense because the execution cycle of an upper node
> will involve touching more userspace data than a lower node, and
> therefore more of the flushed TLB entries will need to be reloaded.

If that's the case, then maybe a more sophisticated method of measuring
the overhead would work. My thought is that on the second call to pull a
tuple from a node (second because the first probably has some anomalies
due to startup), we measure the overhead for that node. This would
probably mean doing the following:
get start time   # I'm not referring to this as gettimeofday to avoid
                 # confusion
gettimeofday()   # this is the gettimeofday call that will happen during
                 # normal operation
get end time

Hopefully, there's no caching effect that would come into play from not
actually touching any of the data structures after the gettimeofday()
call. If that's not the case, it makes measuring the overhead more
complex, but I think it should still be doable...
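
To make the three steps above concrete, here is a rough C sketch. It is
only an illustration under stated assumptions, not code from the patch:
the names elapsed_usec() and estimate_timing_overhead_usec() are
hypothetical, and the loop over several probe calls is my own addition,
since a single gettimeofday() call is usually shorter than the
microsecond resolution of struct timeval. It also does not touch any
node data after the probe call, so it would not capture the caching
effect mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds elapsed between two timeval readings. */
long
elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long) (end->tv_sec - start->tv_sec) * 1000000L
        + (long) (end->tv_usec - start->tv_usec);
}

/*
 * Hypothetical helper: estimate the average cost of one gettimeofday()
 * call by bracketing "samples" probe calls with a start and an end
 * reading.  The end reading itself adds a little to the total, so the
 * result is an approximation.
 */
double
estimate_timing_overhead_usec(int samples)
{
    struct timeval start;
    struct timeval probe;
    struct timeval end;
    int         i;

    assert(samples > 0);

    gettimeofday(&start, NULL);         /* "get start time" */
    for (i = 0; i < samples; i++)
        gettimeofday(&probe, NULL);     /* the call made in normal operation */
    gettimeofday(&end, NULL);           /* "get end time" */

    return (double) elapsed_usec(&start, &end) / samples;
}
```

Calling estimate_timing_overhead_usec(1000) once per node would give an
average per-call figure in microseconds that could then be subtracted
from that node's measured time.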
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

