Thread: Sampling profiler updated
I updated the Sampling profiler patch so that it applies to HEAD cleanly.

The basic concept of the patch is the same as for the DTrace probes: call pgstat_push_condition(condition) before an operation and pgstat_pop_condition() at the end of the operation. Those functions should be lightweight because they only change a variable in shared memory, without taking any locks. The stats collector process checks those shared variables periodically and sums up the status in the collector's local memory. We cannot know the exact counts for each operation, but we can expect the sampled numbers to reflect the proportion of time spent in each operation. The sampling result can be retrieved through the pg_profiles system view.

Of course, the profiler could be implemented on top of DTrace or SystemTap, but that is not so easy if we try to avoid any performance regression and to provide the information through a view-like interface. Also, this approach is platform-independent.

A new feature compared with the previous patch is the function pgstat_register_condition(condition, name), which lets extension modules add user-defined conditions.

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
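To make the proposed API concrete, here is a rough sketch of how an extension module might register and use its own condition. Only the function names come from the description above; the condition id, the exact signatures, the header, and the helper names are illustrative assumptions.

    /*
     * Hypothetical sketch, not taken from the patch: an extension registers a
     * user-defined condition and wraps one of its operations with it.
     */
    #include "postgres.h"
    #include "pgstat.h"

    #define MYEXT_COND_REMOTE_CALL  90000       /* made-up condition id */

    void
    myext_init(void)
    {
        pgstat_register_condition(MYEXT_COND_REMOTE_CALL, "MyExt:RemoteCall");
    }

    void
    myext_do_remote_call(void)
    {
        pgstat_push_condition(MYEXT_COND_REMOTE_CALL);
        /* ... the operation whose share of sampled time we want to see ... */
        pgstat_pop_condition();
    }

If the sketch matches the intended API, the time sampled under "MyExt:RemoteCall" would then presumably show up as one more row in pg_profiles.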
On Tue, Jul 14, 2009 at 4:47 AM, Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> I updated the Sampling profiler patch so that it applies to HEAD cleanly.

shouldn't pg_stat_reset() reset these values?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Hi!

Itagaki Takahiro writes:
> I updated the Sampling profiler patch so that it applies to HEAD cleanly.
>
> [...]
>
> Comments welcome.

I believe the profiler could give us a better understanding of where different parts of the user visible response time originate from. The problem with DTrace in my opinion is the lack of support on certain platforms (e.g. Windows) and the need to have kernel support and root access, which might not be available to the DBA or developer.

Have you thought about keeping the counters for each backend isolated? I think in the end it would be beneficial to be able to break down the response time for a critical business transaction in isolation instead of having all backends in one figure.

Do you know the work of Cary Millsap at http://www.method-r.com/ who has been working on response time based tuning in Oracle?

Regards,
Stefan
Stefan Moeding <pgsql@moeding.net> wrote:
> Have you thought about keeping the counters for each backend isolated?
> I think in the end it would be beneficial to be able to break down the
> response time for a critical business transaction in isolation instead
> of having all backends in one figure.

I think per-backend profiling is confusing, because one backend might be reused by multiple jobs if you use connection pooling. If we need more detailed profiling, it should be grouped by query variations. I have another plan for detailed profiling: extending pg_stat_statements for such purposes. It will include the functionality of the log_{parser|planner|executor|statement}_stats parameters. Those are log-based profilers, but a view-based approach seems easier to use.

> Do you know the work of Cary Millsap at http://www.method-r.com/ who has
> been working on response time based tuning in Oracle?

I didn't know that. What is the point of that work? Is there some knowledge we should learn from it?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi!

Thanks for your answer. Here is my general reasoning: I was thinking about a way to use the profiler to determine the resource profile even of (maybe quite short) business transactions. I would like to move away from the technical focus (high CPU usage, high I/O rate, too many disk sorts, ...) towards a more customer/business-centric approach (loading the customer form takes too long). My vision would be to get a profile for just one session and only for the time it takes to load the form. Using the profile for the whole database would hide the exact details when you have other users doing other things. And obviously you need to do it on the production machine during business hours, or you would ignore most of the influencing factors.

The resource profile for the observed business transaction tells us where the time is actually spent. Applying Amdahl's Law also tells us what improvements we can expect from certain changes. Let's assume that 30% of the time is CPU time and the business requests the transaction duration to be cut down to half of the current duration. Without the profile you could only guess that a CPU upgrade might help. With the profile you can prove that even doubling the CPU speed will only get you a 15% improvement.

The advantage, in my opinion, is that it will not only show you the most beneficial approach (the resource that contributes most to the total time) but can also use business-related terms (improve online form X or batch job Y) together with the specific improvements that can be expected.

Itagaki Takahiro writes:
> I think per-backend profiling is confusing, because one backend might be
> reused by multiple jobs if you use connection pooling. If we need more
> detailed profiling, it should be grouped by query variations.

I see your point. I must say that I hadn't thought of pooling, probably because I don't use it. But it is easier to build views around the data with aggregations to hide the details than to try to come up with details when only averages are available.

> I didn't know that. What is the point of that work? Is there some
> knowledge we should learn from it?

I tried to outline most of his message in the first paragraphs above. His Method R response-time-based approach seems to me like a good improvement, as it is measurable in business terms, allows a prediction before you change or buy something, and gives a deterministic and repeatable way to tackle the root cause of the current performance shortcoming.

--
Stefan
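To spell out the arithmetic behind that 15% figure: with 30% of the response time on CPU and the other 70% unaffected by a faster CPU, doubling the CPU speed gives

    T_new = 0.70 * T_old + (0.30 / 2) * T_old = 0.85 * T_old

i.e. only a 15% reduction in total response time, far short of the requested 50%.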
Hi,

On 14 Jul 2009, at 11:47, Itagaki Takahiro wrote:
> I updated the Sampling profiler patch so that it applies to HEAD cleanly.

Which I confirm, as I just spent some time reviewing the patch (it was left unassigned in the commit fest). Reviewing the code didn't turn up anything obvious that I'd flag...

> The basic concept of the patch is the same as for the DTrace probes:
> call pgstat_push_condition(condition) before an operation and
> pgstat_pop_condition() at the end of the operation. Those functions
> should be lightweight because they only change a variable in shared
> memory, without taking any locks.

...except that the typical integration looks like this:

+       pgstat_push_condition(PGCOND_XLOG_OPEN);
        fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
                           S_IRUSR | S_IWUSR);
+       pgstat_pop_condition();

And we have this:

! void
! pgstat_push_condition(PgCondition condition)
! {
!     volatile PgBackendStatus *beentry = MyBEEntry;
!     PgCondition prev;
!
!     if (profiling_interval <= 0 || !beentry)
!         return;

I'm wondering if that will be enough for people not interested in profiling not to be bothered. The patch contains a lot of added call sites, and I'm not sure whether it's possible to arrange for not calling the function at all when profiling is disabled (GUC profiling_interval = 0).

In the same vein:

+ static void
+ LockPageContent(volatile BufferDesc *buf, LWLockMode mode)
+ {
+     pgstat_push_condition(PGCOND_LWLOCK_PAGE);
+     LWLockAcquire(buf->content_lock, mode);
+     pgstat_pop_condition();
+ }
+
+ static void
+ LockPageIO(volatile BufferDesc *buf, LWLockMode mode)
+ {
+     pgstat_push_condition(PGCOND_LWLOCK_IO);
+     LWLockAcquire(buf->io_in_progress_lock, mode);
+     pgstat_pop_condition();
+ }

with the call sites being (in src/backend/storage/buffer/bufmgr.c, FlushRelationBuffers(Relation rel) and FlushDatabaseBuffers(Oid dbid)):

!     LWLockAcquire(bufHdr->content_lock, LW_SHARED);
...
!     LockPageContent(bufHdr, LW_SHARED);

Maybe there's nothing to worry about, but I figured I'd better raise the concern for further reviewing.

Oh, and this too:

*************** LockBuffer(Buffer buffer, int mode)
*** 2372,2380 ****
      if (mode == BUFFER_LOCK_UNLOCK)
          LWLockRelease(buf->content_lock);
      else if (mode == BUFFER_LOCK_SHARE)
!         LWLockAcquire(buf->content_lock, LW_SHARED);
      else if (mode == BUFFER_LOCK_EXCLUSIVE)
!         LWLockAcquire(buf->content_lock, LW_EXCLUSIVE);

Now the build went fine, but the testing (default configuration) not so much:

dim=# create table series(i integer);
dim=# insert into series select generate_series(1, 10000000);
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
WARNING:  condition stack overflow: 10
...
WARNING:  condition stack overflow: 11
WARNING:  condition stack overflow: 11
WARNING:  condition stack overflow: 11
ERROR:  canceling statement due to user request

dim=# select count(*) from series;
 count
-------
     0
(1 row)

Time: 9504.624 ms

dim=# select * from pg_profiles;
 profid |      profname       | profnum
--------+---------------------+---------
  83000 | LWLock:Page         |      15
  58000 | Data:Extend         |       7
  80018 | LWLock:BgWriterComm |       1
  10000 | CPU                 |      85
  22000 | Network:Send        |     128
  80009 | LWLock:WALWrite     |       6
  55000 | Data:Read           |       1
  45000 | XLog:Write          |       1
  15000 | CPU:Utility         |       5
  15100 | CPU:Commit          |       1
  57000 | Data:Write          |      15
  42000 | XLog:Insert         |      24
  46000 | XLog:Flush          |       4
(13 rows)

Time: 11.372 ms

The error I got matches this part of the patch:

! void
! pgstat_push_condition(PgCondition condition)
! {
...
!     if (condition_stack_top < MAX_COND_STACK)
!         condition_stack[condition_stack_top] = prev;
!     else
!         elog(WARNING, "condition stack overflow: %d", condition_stack_top);

So I'm going to change the patch state to "Returned with Feedback" as I guess we'll need to talk about the issue and find a way to solve it, and I don't think this state prevents getting back to the patch in this same fest.

Regards,
--
dim
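As a side note on the overhead concern above: one way to skip the function call entirely when profiling is off would be to test the GUC at the call site through a wrapper macro. This is only a sketch of the idea, not something from the patch, and it glosses over what should happen if profiling_interval changes between a push and its matching pop.

    /* Hypothetical sketch: test the GUC before making the function call.
     * profiling_interval is the GUC the patch introduces; the macro names
     * are made up for illustration. */
    extern int  profiling_interval;

    #define PGSTAT_PUSH_CONDITION(cond) \
        do { \
            if (profiling_interval > 0) \
                pgstat_push_condition(cond); \
        } while (0)

    #define PGSTAT_POP_CONDITION() \
        do { \
            if (profiling_interval > 0) \
                pgstat_pop_condition(); \
        } while (0)

Call sites would then use the macros instead of calling the functions directly, so a disabled profiler costs one predictable branch per call site rather than a function call.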
On Sat, Jul 18, 2009 at 4:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> So I'm going to change the patch state to "Returned with Feedback" as I
> guess we'll need to talk about the issue and find a way to solve it, and
> I don't think this state prevents getting back to the patch in this same
> fest.

In general I would prefer patches to be set to Returned with Feedback only when we think they are probably done for this CommitFest. Otherwise, it's hard to tell which are really done and which are maybe done.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 18, 2009 at 4:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>> So I'm going to change the patch state to "Returned with Feedback" as I
>> guess we'll need to talk about the issue and find a way to solve it, and
>> I don't think this state prevents getting back to the patch in this same
>> fest.

> In general I would prefer patches to be set to Returned with Feedback
> only when we think they are probably done for this CommitFest.
> Otherwise, it's hard to tell which are really done and which are maybe
> done.

What do you want to use then ... Waiting on Author?

			regards, tom lane
On Sat, Jul 18, 2009 at 3:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jul 18, 2009 at 4:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>> So I'm going to change the patch state to "Returned with Feedback" as I
>> guess we'll need to talk about the issue and find a way to solve it, and
>> I don't think this state prevents getting back to the patch in this same
>> fest.
>
> In general I would prefer patches to be set to Returned with Feedback
> only when we think they are probably done for this CommitFest.

why? it seems very simple as is: Returned with Feedback means there is something to clean up, and when the author fixes the problem he can update it to Needs Review.

> Otherwise, it's hard to tell which are really done and which are maybe
> done.

when the rrr thinks the patch is in good shape, mark it as Ready for Committer; then the committer can apply, reject, or return it with feedback again... at the end of the commitfest, if there are patches in Returned with Feedback and the author has responded, those will go to the next commitfest...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Sat, Jul 18, 2009 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jul 18, 2009 at 4:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>>> So I'm going to change the patch state to "Returned with Feedback" as I
>>> guess we'll need to talk about the issue and find a way to solve it, and
>>> I don't think this state prevents getting back to the patch in this same
>>> fest.
>
>> In general I would prefer patches to be set to Returned with Feedback
>> only when we think they are probably done for this CommitFest.
>> Otherwise, it's hard to tell which are really done and which are maybe
>> done.
>
> What do you want to use then ... Waiting on Author?

Yeah, that's what I was thinking.

...Robert
On Sat, Jul 18, 2009 at 5:36 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> On Sat, Jul 18, 2009 at 3:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Jul 18, 2009 at 4:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>>> So I'm going to change the patch state to "Returned with Feedback" as I
>>> guess we'll need to talk about the issue and find a way to solve it, and
>>> I don't think this state prevents getting back to the patch in this same
>>> fest.
>>
>> In general I would prefer patches to be set to Returned with Feedback
>> only when we think they are probably done for this CommitFest.
>
> why? it seems very simple as is:
> Returned with Feedback means there is something to clean up, and when the
> author fixes the problem he can update it to Needs Review.

No, that's what "Waiting on Author" means. "Returned with Feedback" means that it might be acceptable in some CommitFest, but not this one. "Rejected" means we don't want it.

The distinction is important because it affects the review process. If I have reviewed 8 patches and they are all in the status "Waiting on Author", then I am reluctant to take on any more patches, because I might have to re-review as many as 8 patches, and that might be as much (or more) than I'm prepared to take on. But if all of those patches are Returned with Feedback, then I know that they are not coming back, and I can take on a few more without worrying that I'm suddenly going to get slammed by the need to re-review a bunch of stuff.

It is also very hard to close the CommitFest if things that are dead keep coming back to life again. To take an example, suppose that a patch is reviewed on July 16th and the patch author is asked to make some changes. Then, on August 10th, the author resubmits. If we take the view that this is legitimate, then we're going to have a whole slew of resubmits just prior to whatever we set as the last day for resubmits, which will mean that the CommitFest is not going to be closed in a month, and we need that to happen so that people can get back to working on their own patches. What we need to say is: if the patch author can't resubmit within a few days (say, four, maybe a bit more if it's a major patch or early in the CommitFest), then we move it from Waiting on Author to Returned with Feedback and move on.

...Robert
Hi,

On 19 Jul 2009, at 06:30, Robert Haas wrote:
> On Sat, Jul 18, 2009 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> What do you want to use then ... Waiting on Author?
>
> Yeah, that's what I was thinking.

Oh, and I see that "Returned with Feedback" did set a "Close Date", so it's not what I intended anyway. I've changed the status to "Waiting on Author" and if we have no news before the end of the current commit fest, I'll then move it to "Returned with Feedback".

https://commitfest.postgresql.org/action/patch_view?id=99

--
dim
On Sun, Jul 19, 2009 at 8:53 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Oh, and I see that "Returned with Feedback" did set a "Close Date", so it's
> not what I intended anyway. I've changed the status to "Waiting on Author"
> and if we have no news before the end of the current commit fest, I'll then
> move it to "Returned with Feedback".
>
> https://commitfest.postgresql.org/action/patch_view?id=99

Well, actually, that's a little too generous. What happened last time is that a number of people waited to do anything until they were told "hey, we're closing the CommitFest" and then they all resubmitted at once, which resulted in prolonging the CommitFest rather than bringing it to a conclusion. I think we need a rule that when a patch is waiting on author, we give them 4 or 5 days to get back to us, and then move it over to Returned with Feedback. This isn't punishment: it's right in keeping with the philosophy that the CommitFest is intended to commit patches that are either done or need only minor modifications, and it's necessary to make sure that the size of the open patch queue decreases monotonically to zero so we can end the reviewing effort and get back to the development effort.

But it sounds like "Waiting on Author" is the right status for now, and we will move it to Returned with Feedback if there is no update by the end of the week.

...Robert
Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> WARNING: condition stack overflow: 10

> So I'm going to change the patch state to "Returned with Feedback" as I
> guess we'll need to talk about the issue and find a way to solve it, and
> I don't think this state prevents getting back to the patch in this same
> fest.

Oops, I must fix that. I didn't test the default stack depth (10) well. It would be better not to have a limit on the condition stack at all.

BTW, I hope I will have enough feedback from reviewers if the patch is "Returned with Feedback". Are there any issues I need to fix or improve by the next commitfest?

I feel we don't have enough discussion about the feature, such as:
* Is it useful enough, or are there any ideas to make it more useful?
* Is it OK that we have two versions of profiling (this and the DTrace probes)?
* Is the quality of the patch sufficient in terms of implementation?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
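A minimal sketch of what dropping the fixed limit might look like, assuming the condition stack is backend-local as the static array in the posted patch suggests; the names are illustrative, and a real version would have to avoid allocating inside critical sections (for example by growing the array before entering them):

    /* Hypothetical sketch, not from the patch: grow the stack on demand
     * instead of warning once MAX_COND_STACK is exceeded. */
    static PgCondition *condition_stack = NULL;
    static int  condition_stack_top = 0;
    static int  condition_stack_size = 0;

    static void
    push_saved_condition(PgCondition prev)
    {
        if (condition_stack_top >= condition_stack_size)
        {
            int     newsize = Max(condition_stack_size * 2, 16);

            if (condition_stack == NULL)
                condition_stack = MemoryContextAlloc(TopMemoryContext,
                                        newsize * sizeof(PgCondition));
            else
                condition_stack = repalloc(condition_stack,
                                        newsize * sizeof(PgCondition));
            condition_stack_size = newsize;
        }
        condition_stack[condition_stack_top++] = prev;
    }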
Hi,

On 21 Jul 2009, at 07:57, Itagaki Takahiro wrote:
> Oops, I must fix that. I didn't test the default stack depth (10) well.
> It would be better not to have a limit on the condition stack at all.

I'm glad to hear it's possible to implement without arbitrary limits :)

> BTW, I hope I will have enough feedback from reviewers if the patch is
> "Returned with Feedback". Are there any issues I need to fix or improve
> by the next commitfest?
>
> I feel we don't have enough discussion about the feature, such as:
> * Is it useful enough, or are there any ideas to make it more useful?

It looks very useful, but I didn't have time to play around with it enough (I stopped at the warnings, which came very early in testing). It seems it is not possible to reset the profiles, then launch a query and see stats for only that query run (all backends will be profiled together)? Knowing what sort of workload is running (in great detail here, which is good) is still valuable, though; I think I'm going to use it when it's available :)

> * Is it OK that we have two versions of profiling (this and the DTrace
> probes)?

Can't comment on this; I've never used dtrace before and don't have a version of it on my production systems.

> * Is the quality of the patch sufficient in terms of implementation?

I've raised some questions about the performance impact of the function calls when profiling is disabled, and about the code changes related to how some locks are taken; I didn't have more comments than that (I didn't spot other things that need to be done).

Regards,
--
dim
On Tuesday 21 July 2009 09:09:54 Dimitri Fontaine wrote:
> > * Is it OK that we have two versions of profiling (this and the DTrace
> > probes)?
>
> Can't comment on this; I've never used dtrace before and don't have a
> version of it on my production systems.

Well, the objection remains: We already have dtrace support, and dtrace or dtrace-like systems are spreading to many operating systems, so one wonders whether it is useful to clutter the code with another probing system instead of putting some resources, say, into getting systemtap up to speed.
Peter Eisentraut <peter_e@gmx.net> writes:
> Well, the objection remains: We already have dtrace support, and dtrace or
> dtrace-like systems are spreading to many operating systems, so one wonders
> whether it is useful to clutter the code with another probing system instead
> of putting some resources, say, into getting systemtap up to speed.

For the record, I think this patch is a waste of manpower and we should rely on dtrace/systemtap. However, if we are going to make our own homegrown substitute for those facilities, a minimum requirement should be that it uses the dtrace macros already put into the sources, rather than expecting that it gets to clutter the code some more with its own set of tracing markers.

			regards, tom lane
Hi,

Tom Lane <tgl@sss.pgh.pa.us> writes:
> For the record, I think this patch is a waste of manpower and we should
> rely on dtrace/systemtap. However, if we are going to make our own
> homegrown substitute for those facilities, a minimum requirement should
> be that it uses the dtrace macros already put into the sources, rather
> than expecting that it gets to clutter the code some more with its own
> set of tracing markers.

I think going from this form of the patch to reusing the dtrace macros means providing another, entirely different patch to solve the issue (even if the system view and its support function could remain about the same), so I've updated the patch's commit fest status to Rejected.

Regards,
--
dim
On Tue, Jul 21, 2009 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Well, the objection remains: We already have dtrace support, and dtrace or
>> dtrace-like systems are spreading to many operating systems, so one wonders
>> whether it is useful to clutter the code with another probing system instead
>> of putting some resources, say, into getting systemtap up to speed.
>
> For the record, I think this patch is a waste of manpower and we should
> rely on dtrace/systemtap. However, if we are going to make our own
> homegrown substitute for those facilities, a minimum requirement should
> be that it uses the dtrace macros already put into the sources, rather
> than expecting that it gets to clutter the code some more with its own
> set of tracing markers.

dtrace/systemtap doesn't work on every OS someone might care about, but I definitely agree that we should try to reuse the same tracing markers.

...Robert
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> For the record, I think this patch is a waste of manpower and we should
> rely on dtrace/systemtap. However, if we are going to make our own
> homegrown substitute for those facilities, a minimum requirement should
> be that it uses the dtrace macros already put into the sources, rather
> than expecting that it gets to clutter the code some more with its own
> set of tracing markers.

How about exporting the dtrace probes as hook function pointers? For example:

    void (*LWLOCK_WAIT_START_hook)(int, int);

    #define TRACE_POSTGRESQL_LWLOCK_WAIT_START(INT1, INT2) \
        if (LWLOCK_WAIT_START_hook == NULL); else \
            LWLOCK_WAIT_START_hook(INT1, INT2)

    #define TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED() \
        (LWLOCK_WAIT_START_hook != NULL)

If there were such hooks, my profiler could be implemented as a loadable module on top of them. It might be good to initialize LWLOCK_WAIT_START_hook with lwlock__wait__start(). If we do so, the dtrace probes still work and we can avoid the if-null check on each call.

If that is acceptable, I'll also suggest new probe functions like SLEEP, SEND, RECV, SPINLOCK_FAILURE and so on.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
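To illustrate the idea, here is a rough sketch of how a loadable module might install itself on such a hook, following the usual save-previous-then-chain convention for PostgreSQL hooks. The hook variable name comes from the proposal above; everything else (the module, the handler, what it records) is assumed for illustration.

    /* Hypothetical sketch: a profiler module attaching to the proposed hook. */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    extern void (*LWLOCK_WAIT_START_hook)(int, int);    /* from the proposal */

    static void (*prev_lwlock_wait_start)(int, int) = NULL;

    static void
    profiler_lwlock_wait_start(int lockid, int mode)
    {
        /* record a sample for this backend here ... */

        /* chain to whatever was installed before (e.g. the dtrace probe) */
        if (prev_lwlock_wait_start)
            prev_lwlock_wait_start(lockid, mode);
    }

    void
    _PG_init(void)
    {
        prev_lwlock_wait_start = LWLOCK_WAIT_START_hook;
        LWLOCK_WAIT_START_hook = profiler_lwlock_wait_start;
    }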
I wrote:
> How about exporting the dtrace probes as hook function pointers?

Here is a proposal to integrate the profiler into postgres without adding any new tracing markers. The goal is to provide a platform-independent and easy-to-use performance profiler (typically enabled just by adding some configuration to postgresql.conf).

----
1. Add Gen_trace_hooks.sed to generate hook functions from probes.d. It appends hook variables at the tail of probes.h, like:

    extern void (*TRANSACTION_START_hook)(LocalTransactionId arg1);

2. Rewrite trace function calls into PG_TRACE(name, (args...)). The trace macro is defined as:

    #define PG_TRACE(name, args) \
        do { \
            TRACE_POSTGRESQL_##name args; \
            if (name##_hook) \
                name##_hook args; \
        } while(0)

   and called as:

    PG_TRACE(TRANSACTION_START, (vxid.localTransactionId));

   The changes are not strictly necessary, but the PG_TRACE macro is useful for adding common logic to all probes. We can also use it to disable probes; Gen_dummy_probes.sed will no longer be needed.

3. Implement the profiler using trace hooks. Timer callbacks might be needed for periodic sampling, but I'll try simple polling from SQL for now.
----

I tested for performance regressions with empty dtrace probes and empty trace hooks; the differences were 1-2%, close enough to dtrace.

$ pgbench -n -S -c8 -T60
  No probes              : tps = 28103
  ENABLE_TRACE_HOOK only : tps = 28101
  ENABLE_DTRACE only     : tps = 27945
  Enable both            : tps = 27760

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center