Thread: Access statistics
Here it comes. The views created for now are: pg_stat_activity Current connections to the entire installation including actual querystrings. pg_stat_database Per database IO summary. pg_stat_{all|sys|user}_tables Per table scan statistics. pg_statio_{all|sys|user}_tables Per table buffer cache statistics. pg_stat_{all|sys|user}_indexes Per index scan statistics. pg_statio_{all|sys|user}_indexes Per index buffer cache statistics. pg_statio_{all|sys|user}_indexes Per sequence buffer cache statistics. Of course, an initdb is required (this patch up to now doesn't bump catversion). The rules regression test is broken due to the new views from initdb. The existing functionality to reset stats isn't callable yet. Should become a utility command someday. Collecting must be configurable per database (catalog change). Due to all the changes that have happened (ever tried to apply a 130K patch after a pgindent run?) I must verify all this patch touches once more. Have fun. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Attachment
Jan Wieck <JanWieck@Yahoo.com> writes: > Here it comes. Hm. I really don't like the way you've uglified the heap_fetch interface; can't that be done better? It's just plain bizarre that heap_fetch is now responsible for counting stats for index scans. And surely this bit of code is wrong: *userbuf = buffer; + + if (iscan != NULL) + pgstat_count_heap_fetch(&iscan->xs_pgstat_info); + else + pgstat_count_heap_fetch(&relation->pgstat_info); } You can't count heapscan info via the relcache entry --- what if there's more than one heapscan in progress on the same rel? What if the relcache entry gets rebuilt by SI invalidation? I don't think the stats stuff should touch the relcache at all. There's been talk in the past of trying to unify the heapscan and indexscan APIs, so that we wouldn't need ugliness like the two sets of coding in AttrDefaultFetch and similar places. Maybe it's time to bite that bullet, if it'd provide a cleaner place to put the stats calls. Also, I do not like the pgStatPmPipe mechanism for detecting postmaster shutdown. That eats two file descriptors in every backend, which is a pretty substantial waste of resources. Why not just have the postmaster send a signal to the stats collector when it's time to shut down? BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions. regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: > > Here it comes. > > Hm. I really don't like the way you've uglified the heap_fetch > interface; can't that be done better? It's just plain bizarre that > heap_fetch is now responsible for counting stats for index scans. > And surely this bit of code is wrong: > > *userbuf = buffer; > + > + if (iscan != NULL) > + pgstat_count_heap_fetch(&iscan->xs_pgstat_info); > + else > + pgstat_count_heap_fetch(&relation->pgstat_info); > } > > You can't count heapscan info via the relcache entry --- what if there's > more than one heapscan in progress on the same rel? What if the > relcache entry gets rebuilt by SI invalidation? I don't think the stats > stuff should touch the relcache at all. The counting end's up in one and the same global slot of the tabstat messages. These slots are collected per frontend command and are identified by the tables oid. Does it matter how often the pointer to that slot is updated to the same value? > > There's been talk in the past of trying to unify the heapscan and > indexscan APIs, so that we wouldn't need ugliness like the two sets of > coding in AttrDefaultFetch and similar places. Maybe it's time to bite > that bullet, if it'd provide a cleaner place to put the stats calls. Exactly that's the problem. Only heap_fetch() knows if the tid passed in is visible or not, so right now the counting could be done there or in all the places where heap_fetch() is called. > > Also, I do not like the pgStatPmPipe mechanism for detecting postmaster > shutdown. That eats two file descriptors in every backend, which is a > pretty substantial waste of resources. Why not just have the postmaster > send a signal to the stats collector when it's time to shut down? Don't kill -9 the postmaster :-) Well, we could do the signal *plus* having the child checking from time to time with a zero kill if the postmaster died to suizide in case. > > BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions. Survived somehow from the time where they resided in a loadable module. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <JanWieck@yahoo.com> writes: >> You can't count heapscan info via the relcache entry --- what if there's >> more than one heapscan in progress on the same rel? What if the >> relcache entry gets rebuilt by SI invalidation? I don't think the stats >> stuff should touch the relcache at all. > The counting end's up in one and the same global slot of the > tabstat messages. These slots are collected per frontend > command and are identified by the tables oid. Does it matter > how often the pointer to that slot is updated to the same > value? In that case, forget about the fields in the heapscan and indexscan structs. You can easily keep both the heap and index access counts in (or referenced by) the relcache entry, no? It just bothers me that the treatment of heap and index counts is so inconsistent. One or the other is wrong, or at least inefficient. BTW, does it really matter whether we are counting an index-driven or heap-driven fetch? Seems to me that if you count indextuple and heaptuple fetches, you can figure out what's going on without having to have heap_fetch know the difference. >> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster >> shutdown. That eats two file descriptors in every backend, which is a >> pretty substantial waste of resources. Why not just have the postmaster >> send a signal to the stats collector when it's time to shut down? > Don't kill -9 the postmaster :-) What's your point? If the postmaster crashes, whether the stats collector goes away without help seems like the least of your worries. You'll end up killing backends manually anyhow, so why not the stats process too? (For that matter, do you care if the stats process doesn't die? Once there is no one left to send messages to it, it'll never do anything, no?) regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@yahoo.com> writes: > >> You can't count heapscan info via the relcache entry --- what if there's > >> more than one heapscan in progress on the same rel? What if the > >> relcache entry gets rebuilt by SI invalidation? I don't think the stats > >> stuff should touch the relcache at all. > > > The counting end's up in one and the same global slot of the > > tabstat messages. These slots are collected per frontend > > command and are identified by the tables oid. Does it matter > > how often the pointer to that slot is updated to the same > > value? > > In that case, forget about the fields in the heapscan and indexscan > structs. You can easily keep both the heap and index access counts > in (or referenced by) the relcache entry, no? It just bothers me > that the treatment of heap and index counts is so inconsistent. > One or the other is wrong, or at least inefficient. heap tuples fetched due to an index scan are counted as tuples_fetched on the indexes entry. Only heap_fetch() calls not done due to an index scan count into the tuples_fetched count for the table (e.g. tuples refetched for deferred trigger invocation and the like). > > BTW, does it really matter whether we are counting an index-driven > or heap-driven fetch? Seems to me that if you count indextuple and > heaptuple fetches, you can figure out what's going on without having > to have heap_fetch know the difference. > > >> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster > >> shutdown. That eats two file descriptors in every backend, which is a > >> pretty substantial waste of resources. Why not just have the postmaster > >> send a signal to the stats collector when it's time to shut down? > > > Don't kill -9 the postmaster :-) > > What's your point? If the postmaster crashes, whether the stats > collector goes away without help seems like the least of your worries. > You'll end up killing backends manually anyhow, so why not the stats > process too? (For that matter, do you care if the stats process doesn't > die? Once there is no one left to send messages to it, it'll never do > anything, no?) Except for generating another FAQ. We haven't been able to stop ppl from killing the postmaster up to now. Why do you think they'll ever learn? They are human ... Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com