Thread: Access statistics
One more feature for discussion :-) In the next couple of hours (at least tomorrow) I would be ready to commit the backend changes for table-/index-access statistics and current backend activity views. Should I apply the patches or provide a separate patch for review first? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <JanWieck@Yahoo.com> writes: > In the next couple of hours (at least tomorrow) I would be > ready to commit the backend changes for table-/index-access > statistics and current backend activity views. > Should I apply the patches or provide a separate patch for > review first? Considering that you've not offered any detailed information about what you plan to do (AFAIR), a patch for review first would be polite ... regards, tom lane
> One more feature for discussion :-) > > In the next couple of hours (at least tomorrow) I would be > ready to commit the backend changes for table-/index-access > statistics and current backend activity views. > > Should I apply the patches or provide a separate patch for > review first? I like doing a cvs diff -c and throwing the patch to PATCHES just before commit. That way, people can see may changes easier and find problems. Of course, if I am at all unsure, I post to patches and wait 2 days. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: > > In the next couple of hours (at least tomorrow) I would be > > ready to commit the backend changes for table-/index-access > > statistics and current backend activity views. > > Should I apply the patches or provide a separate patch for > > review first? > > Considering that you've not offered any detailed information about > what you plan to do (AFAIR), a patch for review first would be polite ... We had that discussion a couple of weeks ago down to if it's better to use UNIX or INET domain UDP sockets. But I expected it not to be detailed enough for our current quality level :-) It's incomplete anyway since the per database configuration in pg_database is missing and some other details Ineed to tidy up. So I'll wrap up a patch tomorrow. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck writes: > In the next couple of hours (at least tomorrow) I would be > ready to commit the backend changes for table-/index-access > statistics and current backend activity views. > > Should I apply the patches or provide a separate patch for > review first? Maybe you could describe what it's going to do and how it's going to work. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut wrote: > Jan Wieck writes: > > > In the next couple of hours (at least tomorrow) I would be > > ready to commit the backend changes for table-/index-access > > statistics and current backend activity views. > > > > Should I apply the patches or provide a separate patch for > > review first? > > Maybe you could describe what it's going to do and how it's going to work. Real programmers don't comment - if it was hard to write it should be hard to read :-) So outing myself not beeing a *real programmer*, this is what I have so far: * On startup the postmaster creates an INET domain UDP socket and bind(2)'s it to localhost:0, meaning the kernel will assign a yet unused, unprivileged port that could be seen with getsockaddr(2). It then starts two background processes of which one is simply a wraparound buffer doing recvfrom(2) on thesocket, checking that the source address of the received packets is the sockets own address (!) and forwardingapproved ones over a pipe to the second one, discribed later. * Backends call some collector functions at various places now (these will finally be macros), that count uptable scans, tuples returned by scans, buffer fetches/hits and the like. At the beginning of a statement thebackends send a message telling the first couple of hundred bytes of the querystring and after the statementis done (just before getting ready for the next command) they send the collected access numbers. Tables, indexes etc. in these statistics are identified by OID, so the data doesn't tell much so far. * The second background process fired by the postmaster collects these numbers into hashtables and as longas it receives messages, it'll write out a summary file every 500 or so milliseconds, telling a snapshot ofcurrent stats. Using select(2) with timeouts ensures that a complete idle DB instance not to waste a single CPUcycle or IO to write these snapshots. On startup it tries to read the last snapshot file in, so the collected statistics survive a postmaster restart. Vacuum reads the file too and sends bulk delete messages for objects that are gone. So the stats don't grow infinitely. * A bunch of new builtin functions gain access to the snapshot file. At first call of one of these functions during a transaction, the backend will read the current file and return the numbers from in memory then. Based on these functions a couple of views can tell these collected stats. Information from the databasessystem catalog is of course required to identify the objects in the stats, but I think those informationsshould only be visible to someone who identified herself as a valid DB user anyway. The visibility of querystrings (this info is available cross DB) is restricted to DB superusers. There has been discussion already about using an INET vs. UNIX UDP socket for the communication. At least forLinux I found INET to be the most effective way of communication. And for security concerns: If someone else thanroot can really send packets to that socket that show up with a source address of 127.0.0.1:n, where n is a portnumber actually occupied by your own socket, be sure you'll have more severe problems than modified accessstatistics. The views should be considered examples. The final naming and layout is subject for discussion. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <JanWieck@Yahoo.com> writes: > So outing myself not beeing a *real programmer*, this is what > I have so far: Hmm ... what is the performance of all this like? Seems like a lot of overhead. Can it be turned off? > * Backends call some collector functions at various places > now (these will finally be macros), that count up table > scans, tuples returned by scans, buffer fetches/hits and > the like. Have you removed the existing stats-gathering support (backend/access/heap/stats.c and so on)? That would buy back at least a few of the cycles involved ... regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: > > So outing myself not beeing a *real programmer*, this is what > > I have so far: > > Hmm ... what is the performance of all this like? Seems like a lot > of overhead. Can it be turned off? Current performance loss is about 2-4% wallclock measured. I expect it to become better when turning some of the functions into macros. The plan is to add another column to pg_database that can be used to turn it on/off on a per database level. Backendsjust decide at startup if they collect and send for their session lifetime. > > * Backends call some collector functions at various places > > now (these will finally be macros), that count up table > > scans, tuples returned by scans, buffer fetches/hits and > > the like. > > Have you removed the existing stats-gathering support > (backend/access/heap/stats.c and so on)? That would buy back > at least a few of the cycles involved ... Not sure if we really should. Let's later decide if it's really obsolete. -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <JanWieck@yahoo.com> writes: > Tom Lane wrote: >> Have you removed the existing stats-gathering support >> (backend/access/heap/stats.c and so on)? That would buy back >> at least a few of the cycles involved ... > Not sure if we really should. Let's later decide if it's > really obsolete. Considering that Bruce long ago ifdef'd out all the code that could actually *do* anything with those stats (like print them), I'd say it's obsolete. In any case, it's too confusing to have two sets of stats-gathering code in there. I vote for getting rid of the old stuff. regards, tom lane
For what its worth, Oracle has a nice feature for resource management called PROFILEs: CREATE PROFILE profile LIMIT [ SESSION_PER_USER [ session_limit | UNLIMITED | DEFAULT ] ] [ CPU_PER_SESSION cpu_session_limit | UNLIMITED | DEFAULT ] ] [ CPU_PER_CALL cpu_call_limit | UNLIMITED | DEFAULT ] ] [ CONNECT_TIME connect_limit | UNLIMITED | DEFAULT ] ] [ IDLE_TIME idle_limit | UNLIMITED | DEFAULT ] ] [ LOGICAL READS_PER_SESSION read_session_limit | UNLIMITED | DEFAULT ] ] [ LOGICAL READS_PER_CALL read_call_limit | UNLIMITED | DEFAULT ] ] [ PRIVATE_SGA memory_limit | UNLIMITED | DEFAULT ] ] [ COMPOSITE_LIMIT resource_cost_limit | UNLIMITED | DEFAULT ] ] which limits things like CPU_PER_CALL and LOGICAL_READS_PER_SESSION to a profile. The ALTER USER command then allows you to assign a PROFILE to a user. This is really nice, since you can prevent run-away queries from denying service by the database to other users. It also can prevent a user from soaking up all of the available connections. You must set a flag in your initSID.ora configuration profile for ORACLE to support profiles. Since Jan is collecting these statistics any way (if the appropriate configuration flag is set), it would be pretty trivial to implement PROFILEs in PostgreSQL. Mike Mascari mascarm@mascari.com -----Original Message----- From: Tom Lane [SMTP:tgl@sss.pgh.pa.us] Jan Wieck <JanWieck@Yahoo.com> writes: > So outing myself not beeing a *real programmer*, this is what > I have so far: Hmm ... what is the performance of all this like? Seems like a lot of overhead. Can it be turned off? > * Backends call some collector functions at various places > now (these will finally be macros), that count up table > scans, tuples returned by scans, buffer fetches/hits and > the like. Have you removed the existing stats-gathering support (backend/access/heap/stats.c and so on)? That would buy back at least a few of the cycles involved ... regards, tom lane
> Tom Lane wrote: > > Jan Wieck <JanWieck@Yahoo.com> writes: > > > So outing myself not beeing a *real programmer*, this is what > > > I have so far: > > > > Hmm ... what is the performance of all this like? Seems like a lot > > of overhead. Can it be turned off? > > Current performance loss is about 2-4% wallclock measured. I > expect it to become better when turning some of the functions > into macros. At 2-4%, I assume it is not enabled by default. I can see the query string part being enabled by default though. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Jan Wieck <JanWieck@yahoo.com> writes: > > Tom Lane wrote: > >> Have you removed the existing stats-gathering support > >> (backend/access/heap/stats.c and so on)? That would buy back > >> at least a few of the cycles involved ... > > > Not sure if we really should. Let's later decide if it's > > really obsolete. > > Considering that Bruce long ago ifdef'd out all the code that could > actually *do* anything with those stats (like print them), I'd say > it's obsolete. In any case, it's too confusing to have two sets of > stats-gathering code in there. I vote for getting rid of the old > stuff. I agree. Rip away. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026