Peter Geoghegan <pg@heroku.com> writes:
> On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> Actually, I think the whole thing is rather badly designed, as warm
>>> cache or no you're still proposing to do thousands of kernel calls
>>> while holding a potentially contended LWLock. I suggest what we
>>> do is (1) read in the whole file, (2) acquire the lock, (3)
>>> re-read the whole file in the same buffer, (4) work from the buffer.
>> fseeko() and fread() are part of the standard library, not any kernel.
>> I don't believe that what I've done in qtext_fetch() implies thousands
>> of kernel calls, or thousands of context switches.
> I ran an strace on a pg_stat_statements backend with a full ~5,000
> entries. It seems that glibc is actually content to always make
> lseek() and read() syscalls in the event of an fseek(), even though it
> does maintain its own buffer and could in principle have a lot more
> gumption about that [1]. I agree that we should copy everything into a
> buffer in one go.
Yeah, that was what I thought might happen. Even if glibc were smarter,
a lot of other libc implementations aren't. Also, ISTM that traversal
of the hash table will generally lead to more-or-less-random access into
the text file. So unless you assume that libc has mmap'd the whole
file, it'd have to do a lot of kernel calls anyway to get different pages
of the file into its buffer at different times.
I think that "read the whole file" is probably going to net out not any
more complex as far as our code goes, and it'll be making a lot fewer
assumptions about how smart the libc level is and what's happening at
the kernel level. I'll have a go at coding it, anyway.
> I think that making the LWLocking duration as brief as possible isn't
> critically important.
Maybe, but I don't like the idea that calling pg_stat_statements()
frequently could result in random glitches in response time for other
queries.
> Automated tools will only be interested in new
> query texts (implying possible I/O while share locking) as they
> observe new entries.
This argument would be more plausible if there were a way to fetch
the text for a previously-observed entry without invoking
pg_stat_statements(). I'm surprised you've not submitted a patch
to add such a function.
regards, tom lane