On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Actually, I think the whole thing is rather badly designed, as warm
>> cache or no you're still proposing to do thousands of kernel calls
>> while holding a potentially contended LWLock. I suggest what we
>> do is (1) read in the whole file, (2) acquire the lock, (3)
>> re-read the whole file in the same buffer, (4) work from the buffer.
>
> fseeko() and fread() are part of the standard library, not any kernel.
> I don't believe that what I've done in qtext_fetch() implies thousands
> of kernel calls, or thousands of context switches.
I ran an strace on a pg_stat_statements backend with a full ~5,000
entries. It seems that glibc is actually content to always make
lseek() and read() syscalls in the event of an fseek(), even though it
does maintain its own buffer and could in principle have a lot more
gumption about that [1]. I agree that we should copy everything into a
buffer in one go.
I think that making the LWLocking duration as brief as possible isn't
critically important. Automated tools will only be interested in new
query texts (implying possible I/O while share locking) as they
observe new entries. But new entries are the only thing worth
considering that may potentially block on that shared lock (although,
not as they write out their query texts, which almost always just
requires a shared lock). New entries ought to be very rare occurrence
compared to the execution of queries - certainly, on the busy
production systems I use pg_stat_statements on, it may be weeks before
a new query is observed (uh, with the possible exception of *my*
ad-hoc pg_stat_statements queries). On my laptop, "explain analyze
select * from pg_stat_statements" indicates a total runtime of ~9ms
with 5,000 regression test entries, even with the (still modest)
overhead of doing all those read()/lseek() calls. So we're talking
about a small impact on a new entry, that will only be affected once,
that was always going to have to write out its query text to file
anyway. On top of all this, in general it's very improbable that any
given new entry will be affected at all, because two unlikely things
need to happen at the same time for that to be possible.
The most important aspect of keeping the overhead of
pg_stat_statements low is that there not be too much cache pressure.
Obviously the huge reduction in its shared memory footprint that this
patch allows helps with that.
[1] http://www.pixelbeat.org/programming/stdio_buffering/
--
Peter Geoghegan