Thread: Review: DTrace probes (merged version)

Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
I performed review of merged patch from Robert Treat. At first point the patch 
does not work (SunOS 5.11 snv_86 sun4u sparc SUNW,Sun-Fire-V240)
     truncate             ... ok     alter_table          ... FAILED (test process exited with exit code 2)
sequence            ... ok     polymorphism         ... ok     rowtypes             ... ok     returning            ...
ok    largeobject          ... FAILED (test process exited with exit code 2)     xml                  ... ok
 
test stats                ... FAILED (test process exited with exit code 2)
test tablespace           ... FAILED (test process exited with exit code 2)


However I went through a code and I have following comments:

1) Naming convention:
 - Some probes uses "*end", some "*done". It would be good to select one name. - I prefer to use clog instead of slru
inprobes name. clog is widely known. - It seems to me better to have checkpoint-clog..., checkpoint-subtrans 
 
instead of clog-checkpoint. - buffer-flush was originally dirty-buffer-write-start. I prefer Robert Lor's 
naming.

2) storage read write probes

smgr-read*, smgr-writes probes are in md.c. I personally think it make more 
sense to put them into smgr.c. Only advantage to have it in md.c is that actual 
amount of bytes is possible to monitor.

3) query rewrite probe

There are several probes for query measurement but query rewrite phase is 
missing. See analyze_and_rewrite or pg_parse_and_rewrite

4) autovacuum_start

Autovacuum_start probe is alone. I propose following probes for completeness:

proc-autovacuum-start
proc-autovacuum-stop
proc-bgwriter-start
proc-bgwriter-stop
proc-backend-start
proc-backend-stop
proc-master-start
proc-master-stop

5) Need explain of usage:

I have some doubts about following probes. Could you please explain usage of 
them? example dtrace script is welcome
 - all exec-* probes - mark-dirty, local-mark-dirty


6) several comments about placement:

I published patch on http://reviewdemo.postgresql.org/r/25/. I added several 
comments there.

7) SLRU/CLOG

SLRU probes could be return more info. For example if page was in buffer or if 
physical write is not necessary and so on.

That's all for this moment
        Zdenek        
-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version)

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:

> 1) Naming convention:
>
>  - Some probes uses "*end", some "*done". It would be good to select one name.
>  - I prefer to use clog instead of slru in probes name. clog is widely known.

But slru is also used in pg_subtrans and pg_multixact.  Which maybe
says that we oughta have separate probes for these rather than a single
one in slru.  Otherwise it's going to be difficult telling one from the
other, yes?


> Autovacuum_start probe is alone. I propose following probes for completeness:
>
> proc-autovacuum-start
> proc-autovacuum-stop
> proc-bgwriter-start
> proc-bgwriter-stop

Separate proc-autovacuum-worker-start and proc-autovacuum-launcher-start,
perhaps.  Not that I see any usefulness in tracking autovacuum launcher
start and stop, but then if we're tracking bgwriter start and stop then
it makes the same sense.

> proc-master-start
> proc-master-stop

What's "master" here?


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review: DTrace probes (merged version)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Autovacuum_start probe is alone. I propose following probes for completeness:
>> 
>> proc-autovacuum-start
>> proc-autovacuum-stop
>> proc-bgwriter-start
>> proc-bgwriter-stop

> Separate proc-autovacuum-worker-start and proc-autovacuum-launcher-start,
> perhaps.  Not that I see any usefulness in tracking autovacuum launcher
> start and stop, but then if we're tracking bgwriter start and stop then
> it makes the same sense.

I see no value in cluttering the system with useless probes.  The worker
start/stop are the only ones here with any conceivable application IMHO.
        regards, tom lane


Re: Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
> 
>> 1) Naming convention:
>>
>>  - Some probes uses "*end", some "*done". It would be good to select one name.
>>  - I prefer to use clog instead of slru in probes name. clog is widely known.
> 
> But slru is also used in pg_subtrans and pg_multixact.  Which maybe
> says that we oughta have separate probes for these rather than a single
> one in slru.  Otherwise it's going to be difficult telling one from the
> other, yes?

Yeah, you are right, I missed that it is used in other part too. slru is OK

> 
>> Autovacuum_start probe is alone. I propose following probes for completeness:
>>
>> proc-autovacuum-start
>> proc-autovacuum-stop
>> proc-bgwriter-start
>> proc-bgwriter-stop
> 
> Separate proc-autovacuum-worker-start and proc-autovacuum-launcher-start,
> perhaps.  Not that I see any usefulness in tracking autovacuum launcher
> start and stop, but then if we're tracking bgwriter start and stop then
> it makes the same sense.

The advantage to track start and stop of procese is that you can stop the 
process in dtrace script at the beginning and for example attach debugger or for 
example start counting number of writes per process and so on.

>> proc-master-start
>> proc-master-stop
> 
> What's "master" here?

Main process - postmaster.

    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Autovacuum_start probe is alone. I propose following probes for completeness:
>>>
>>> proc-autovacuum-start
>>> proc-autovacuum-stop
>>> proc-bgwriter-start
>>> proc-bgwriter-stop
> 
>> Separate proc-autovacuum-worker-start and proc-autovacuum-launcher-start,
>> perhaps.  Not that I see any usefulness in tracking autovacuum launcher
>> start and stop, but then if we're tracking bgwriter start and stop then
>> it makes the same sense.
> 
> I see no value in cluttering the system with useless probes.  The worker
> start/stop are the only ones here with any conceivable application IMHO.

As I answered to Alvaro. I needed to catch start of backend several times to 
track call flow or attach debugger. It is possible to use some other dtrace 
magic for that, but it is not easy and there is not way how to determine what 
kind of process it is.  For example how to measure how many writes performs 
bgwriter?
Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version)

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):

>>> proc-master-start
>>> proc-master-stop
>>
>> What's "master" here?
>
> Main process - postmaster.

Huh, surely we're not interested in tracking postmaster start?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review: DTrace probes (merged version)

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:
> Tom Lane napsal(a):

>> I see no value in cluttering the system with useless probes.  The worker
>> start/stop are the only ones here with any conceivable application IMHO.
>
> As I answered to Alvaro. I needed to catch start of backend several times 
> to track call flow or attach debugger. It is possible to use some other 
> dtrace magic for that, but it is not easy and there is not way how to 
> determine what kind of process it is.  For example how to measure how 
> many writes performs bgwriter?

If you need to attach a debugger to a backend, you can use the -W switch
(even on PGOPTIONS if you need it for a particular backend, AFAIR).  If
you want to "truss" it I guess you can use -W too.

Does it have any usefulness beyond that?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> Alvaro Herrera napsal(a):
> 
>>>> proc-master-start
>>>> proc-master-stop
>>> What's "master" here?
>> Main process - postmaster.
> 
> Huh, surely we're not interested in tracking postmaster start?
> 

It depends. See following schema example


postgres::proc-master-start
{   tracking = 1
}

syscall:::open:entry
/ tracking == 1/
{   ... print file name ...
}

postgres::proc-master-stop
{  tracking = 0
}

It is very useful because it say when you can start and stop monitoring for 
example. But I'm not DTrace expert maybe there is different way how to do it.
    Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> Tom Lane napsal(a):
> 
>>> I see no value in cluttering the system with useless probes.  The worker
>>> start/stop are the only ones here with any conceivable application IMHO.
>> As I answered to Alvaro. I needed to catch start of backend several times 
>> to track call flow or attach debugger. It is possible to use some other 
>> dtrace magic for that, but it is not easy and there is not way how to 
>> determine what kind of process it is.  For example how to measure how 
>> many writes performs bgwriter?
> 
> If you need to attach a debugger to a backend, you can use the -W switch
> (even on PGOPTIONS if you need it for a particular backend, AFAIR).  If
> you want to "truss" it I guess you can use -W too.
> 
> Does it have any usefulness beyond that?
> 

Why use million of tools when you can use one? And truss monitors only syscalls 
but with dtrace you are able to use/trace over 80000 probes in the kernel, libc 
and so on. I agree that for debugger you can use -W option but in situation when 
you are not able to use this switch (e.g on customer production machine) dtrace 
is only possible solution. That is why I think that this probes are useful.

    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version)

From
Robert Lor
Date:
Apologies for the delayed response - vacation, travel, etc got in the way!


Zdenek Kotala wrote:
> I performed review of merged patch from Robert Treat. At first point
> the patch does not work (SunOS 5.11 snv_86 sun4u sparc
> SUNW,Sun-Fire-V240)

The attached patch fixed the regression test errors.
>
>
> However I went through a code and I have following comments:
>
> 1) Naming convention:
>
>  - Some probes uses "*end", some "*done". It would be good to select
> one name.
Noted. Will use <name>-done per the convention. This change will be
included in an updated patch later since I think there are a number of
other changes that need to be made.
>  - I prefer to use clog instead of slru in probes name. clog is widely
> known.
I think slru- is okay per your subsequent emails.
>  - It seems to me better to have checkpoint-clog...,
> checkpoint-subtrans instead of clog-checkpoint.
Yes, I was thinking about this too, but the current names are more
consistent with the others. For example:

buffer-checkpoint, buffer-*
xlog-checkpoint, xlog-*
>  - buffer-flush was originally dirty-buffer-write-start. I prefer
> Robert Lor's naming.
Actually, I made this change so the name is consistent with the other
buffer-* probes.
>
> 2) storage read write probes
>
> smgr-read*, smgr-writes probes are in md.c. I personally think it make
> more sense to put them into smgr.c. Only advantage to have it in md.c
> is that actual amount of bytes is possible to monitor.
The current probes return number of bytes, that's why they are in md.c
>
> 3) query rewrite probe
>
> There are several probes for query measurement but query rewrite phase
> is missing. See analyze_and_rewrite or pg_parse_and_rewrite
I believe the rewrite time is accounted for in the query-plan probe.
Need to double check on this.
>
> 4) autovacuum_start
>
> Autovacuum_start probe is alone. I propose following probes for
> completeness:
>
> proc-autovacuum-start
> proc-autovacuum-stop
> proc-bgwriter-start
> proc-bgwriter-stop
> proc-backend-start
> proc-backend-stop
> proc-master-start
> proc-master-stop
Saw a number of emails on this. Will get back on this.
>
> 5) Need explain of usage:
>
> I have some doubts about following probes. Could you please explain
> usage of them? example dtrace script is welcome
>
>  - all exec-* probes
>  - mark-dirty, local-mark-dirty
>
Theo/Robert, do you guys have any sample scripts on the probes you added.
>
> 6) several comments about placement:
>
> I published patch on http://reviewdemo.postgresql.org/r/25/. I added
> several comments there.
>
> 7) SLRU/CLOG
>
> SLRU probes could be return more info. For example if page was in
> buffer or if physical write is not necessary and so on.
Yes, more info could be returned if we can identify specific use cases
that the new data will enable.

--
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql

Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.46
diff -u -3 -p -r1.46 clog.c
--- src/backend/access/transam/clog.c    1 Jan 2008 19:45:46 -0000    1.46
+++ src/backend/access/transam/clog.c    21 Jul 2008 18:14:48 -0000
@@ -36,6 +36,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "postmaster/bgwriter.h"
+#include "pg_trace.h"

 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -323,7 +324,9 @@ void
 CheckPointCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START();
     SimpleLruFlush(ClogCtl, true);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE();
 }


Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.27
diff -u -3 -p -r1.27 multixact.c
--- src/backend/access/transam/multixact.c    1 Jan 2008 19:45:46 -0000    1.27
+++ src/backend/access/transam/multixact.c    21 Jul 2008 18:21:58 -0000
@@ -57,6 +57,7 @@
 #include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "storage/procarray.h"
+#include "pg_trace.h"


 /*
@@ -1526,6 +1527,8 @@ MultiXactGetCheckptMulti(bool is_shutdow
 void
 CheckPointMultiXact(void)
 {
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START();
+
     /* Flush dirty MultiXact pages to disk */
     SimpleLruFlush(MultiXactOffsetCtl, true);
     SimpleLruFlush(MultiXactMemberCtl, true);
@@ -1540,6 +1543,8 @@ CheckPointMultiXact(void)
      */
     if (!InRecovery)
         TruncateMultiXact();
+
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE();
 }

 /*
Index: src/backend/access/transam/slru.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/slru.c,v
retrieving revision 1.44
diff -u -3 -p -r1.44 slru.c
--- src/backend/access/transam/slru.c    1 Jan 2008 19:45:48 -0000    1.44
+++ src/backend/access/transam/slru.c    21 Jul 2008 18:22:00 -0000
@@ -57,6 +57,7 @@
 #include "storage/fd.h"
 #include "storage/shmem.h"
 #include "miscadmin.h"
+#include "pg_trace.h"


 /*
@@ -372,6 +373,8 @@ SimpleLruReadPage(SlruCtl ctl, int pagen
 {
     SlruShared    shared = ctl->shared;

+    TRACE_POSTGRESQL_SLRU_READPAGE_START((uintptr_t)ctl, pageno, write_ok, xid);
+
     /* Outer loop handles restart if we must wait for someone else's I/O */
     for (;;)
     {
@@ -399,6 +402,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen
             }
             /* Otherwise, it's ready to use */
             SlruRecentlyUsed(shared, slotno);
+            TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno);
             return slotno;
         }

@@ -446,6 +450,7 @@ SimpleLruReadPage(SlruCtl ctl, int pagen
             SlruReportIOError(ctl, pageno, xid);

         SlruRecentlyUsed(shared, slotno);
+        TRACE_POSTGRESQL_SLRU_READPAGE_DONE(slotno);
         return slotno;
     }
 }
@@ -470,6 +475,8 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl,
     SlruShared    shared = ctl->shared;
     int            slotno;

+    TRACE_POSTGRESQL_SLRU_READPAGE_READONLY((uintptr_t)ctl, pageno, xid);
+
     /* Try to find the page while holding only shared lock */
     LWLockAcquire(shared->ControlLock, LW_SHARED);

@@ -511,6 +518,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot
     int            pageno = shared->page_number[slotno];
     bool        ok;

+    TRACE_POSTGRESQL_SLRU_WRITEPAGE_START((uintptr_t)ctl, pageno, slotno);
+
     /* If a write is in progress, wait for it to finish */
     while (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
            shared->page_number[slotno] == pageno)
@@ -525,7 +534,10 @@ SimpleLruWritePage(SlruCtl ctl, int slot
     if (!shared->page_dirty[slotno] ||
         shared->page_status[slotno] != SLRU_PAGE_VALID ||
         shared->page_number[slotno] != pageno)
+    {
+        TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE();
         return;
+    }

     /*
      * Mark the slot write-busy, and clear the dirtybit.  After this point, a
@@ -569,6 +581,8 @@ SimpleLruWritePage(SlruCtl ctl, int slot
     /* Now it's okay to ereport if we failed */
     if (!ok)
         SlruReportIOError(ctl, pageno, InvalidTransactionId);
+
+    TRACE_POSTGRESQL_SLRU_WRITEPAGE_DONE();
 }

 /*
@@ -593,6 +607,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa

     SlruFileName(ctl, path, segno);

+    TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_START((uintptr_t)ctl, path, pageno, slotno);
     /*
      * In a crash-and-restart situation, it's possible for us to receive
      * commands to set the commit status of transactions whose bits are in
@@ -607,6 +622,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa
         {
             slru_errcause = SLRU_OPEN_FAILED;
             slru_errno = errno;
+            TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
             return false;
         }

@@ -614,6 +630,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa
                 (errmsg("file \"%s\" doesn't exist, reading as zeroes",
                         path)));
         MemSet(shared->page_buffer[slotno], 0, BLCKSZ);
+        TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(true, slru_errcause, slru_errno);
         return true;
     }

@@ -622,6 +639,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa
         slru_errcause = SLRU_SEEK_FAILED;
         slru_errno = errno;
         close(fd);
+        TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
         return false;
     }

@@ -631,6 +649,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa
         slru_errcause = SLRU_READ_FAILED;
         slru_errno = errno;
         close(fd);
+        TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
         return false;
     }

@@ -638,9 +657,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pa
     {
         slru_errcause = SLRU_CLOSE_FAILED;
         slru_errno = errno;
+        TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
         return false;
     }

+    TRACE_POSTGRESQL_SLRU_READPAGE_PHYSICAL_DONE(true, slru_errcause, slru_errno);
     return true;
 }

@@ -668,6 +689,8 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
     char        path[MAXPGPATH];
     int            fd = -1;

+    TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_START((uintptr_t)ctl, pageno, slotno);
+
     /*
      * Honor the write-WAL-before-data rule, if appropriate, so that we do not
      * write out data before associated WAL records.  This is the same action
@@ -753,6 +776,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
         {
             slru_errcause = SLRU_OPEN_FAILED;
             slru_errno = errno;
+            TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
             return false;
         }

@@ -781,6 +805,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
         slru_errno = errno;
         if (!fdata)
             close(fd);
+        TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
         return false;
     }

@@ -794,6 +819,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
         slru_errno = errno;
         if (!fdata)
             close(fd);
+        TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
         return false;
     }

@@ -808,6 +834,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
             slru_errcause = SLRU_FSYNC_FAILED;
             slru_errno = errno;
             close(fd);
+            TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
             return false;
         }

@@ -815,10 +842,12 @@ SlruPhysicalWritePage(SlruCtl ctl, int p
         {
             slru_errcause = SLRU_CLOSE_FAILED;
             slru_errno = errno;
+            TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(false, slru_errcause, slru_errno);
             return false;
         }
     }

+    TRACE_POSTGRESQL_SLRU_WRITEPAGE_PHYSICAL_DONE(true, slru_errcause, slru_errno);
     return true;
 }

Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.22
diff -u -3 -p -r1.22 subtrans.c
--- src/backend/access/transam/subtrans.c    26 Mar 2008 18:48:59 -0000    1.22
+++ src/backend/access/transam/subtrans.c    21 Jul 2008 18:22:01 -0000
@@ -32,6 +32,7 @@
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -281,7 +282,9 @@ CheckPointSUBTRANS(void)
      * it merely to improve the odds that writing of dirty pages is done by
      * the checkpoint process and not by backends.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START();
     SimpleLruFlush(SubTransCtl, true);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE();
 }


Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.43
diff -u -3 -p -r1.43 twophase.c
--- src/backend/access/transam/twophase.c    19 May 2008 18:16:26 -0000    1.43
+++ src/backend/access/transam/twophase.c    21 Jul 2008 18:22:09 -0000
@@ -57,6 +57,7 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /*
@@ -1387,6 +1388,9 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
      */
     if (max_prepared_xacts <= 0)
         return;                    /* nothing to do */
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START();
+
     xids = (TransactionId *) palloc(max_prepared_xacts * sizeof(TransactionId));
     nxids = 0;

@@ -1444,6 +1448,8 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
     }

     pfree(xids);
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 }

 /*
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.316
diff -u -3 -p -r1.316 xlog.c
--- src/backend/access/transam/xlog.c    13 Jul 2008 20:45:47 -0000    1.316
+++ src/backend/access/transam/xlog.c    21 Jul 2008 18:22:23 -0000
@@ -48,6 +48,7 @@
 #include "utils/builtins.h"
 #include "utils/pg_locale.h"
 #include "utils/ps_status.h"
+#include "pg_trace.h"


 /* File path names (all relative to $PGDATA) */
@@ -484,6 +485,8 @@ XLogInsert(RmgrId rmid, uint8 info, XLog
     if (info & XLR_INFO_MASK)
         elog(PANIC, "invalid xlog info mask %02X", info);

+    TRACE_POSTGRESQL_XLOG_INSERT(rmid, info);
+
     /*
      * In bootstrap mode, we don't actually log anything but XLOG resources;
      * return a phony record pointer.
@@ -912,6 +915,8 @@ begin:;
         XLogwrtRqst FlushRqst;
         XLogRecPtr    OldSegEnd;

+        TRACE_POSTGRESQL_XLOG_SWITCH();
+
         LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

         /*
@@ -1270,12 +1275,14 @@ AdvanceXLInsertBuffer(bool new_segment)
                  * Have to write buffers while holding insert lock. This is
                  * not good, so only write as much as we absolutely must.
                  */
+                TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START();
                 WriteRqst.Write = OldPageRqstPtr;
                 WriteRqst.Flush.xlogid = 0;
                 WriteRqst.Flush.xrecoff = 0;
                 XLogWrite(WriteRqst, false, false);
                 LWLockRelease(WALWriteLock);
                 Insert->LogwrtResult = LogwrtResult;
+                TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE();
             }
         }
     }
@@ -5785,6 +5792,7 @@ CreateCheckPoint(int flags)
             LWLockRelease(WALInsertLock);
             LWLockRelease(CheckpointLock);
             END_CRIT_SECTION();
+            TRACE_POSTGRESQL_XLOG_CHECKPOINT_DONE(UINT_MAX,0,0,0,UINT_MAX);
             return;
         }
     }
@@ -5880,6 +5888,8 @@ CreateCheckPoint(int flags)
     checkPoint.nextXid = ShmemVariableCache->nextXid;
     LWLockRelease(XidGenLock);

+    TRACE_POSTGRESQL_XLOG_CHECKPOINT_START(flags, checkPoint.nextXid);
+
     /* Increase XID epoch if we've wrapped around since last checkpoint */
     checkPoint.nextXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
     if (checkPoint.nextXid < ControlFile->checkPointCopy.nextXid)
@@ -6004,6 +6014,11 @@ CreateCheckPoint(int flags)
     if (log_checkpoints)
         LogCheckpointEnd();

+    TRACE_POSTGRESQL_XLOG_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
+                NBuffers, CheckpointStats.ckpt_segs_added,
+                CheckpointStats.ckpt_segs_removed,
+                CheckpointStats.ckpt_segs_recycled);
+
     LWLockRelease(CheckpointLock);
 }

Index: src/backend/executor/execScan.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execScan.c,v
retrieving revision 1.43
diff -u -3 -p -r1.43 execScan.c
--- src/backend/executor/execScan.c    1 Jan 2008 19:45:49 -0000    1.43
+++ src/backend/executor/execScan.c    21 Jul 2008 18:22:23 -0000
@@ -21,6 +21,7 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
@@ -60,6 +61,7 @@ ExecScan(ScanState *node,
     qual = node->ps.qual;
     projInfo = node->ps.ps_ProjInfo;

+    TRACE_POSTGRESQL_EXECUTOR_SCAN((uintptr_t)node, ((Scan *)node->ps.plan)->scanrelid, (uintptr_t)accessMtd);
     /*
      * If we have neither a qual to check nor a projection to do, just skip
      * all the overhead and return the raw scan tuple.
Index: src/backend/executor/nodeAgg.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeAgg.c,v
retrieving revision 1.158
diff -u -3 -p -r1.158 nodeAgg.c
--- src/backend/executor/nodeAgg.c    12 May 2008 00:00:49 -0000    1.158
+++ src/backend/executor/nodeAgg.c    21 Jul 2008 18:22:25 -0000
@@ -86,6 +86,7 @@
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/datum.h"
+#include "pg_trace.h"


 /*
@@ -805,6 +806,8 @@ ExecAgg(AggState *node)
     if (node->agg_done)
         return NULL;

+    TRACE_POSTGRESQL_EXECUTOR_AGG((uintptr_t)node, ((Agg *) node->ss.ps.plan)->aggstrategy);
+
     if (((Agg *) node->ss.ps.plan)->aggstrategy == AGG_HASHED)
     {
         if (!node->table_filled)
Index: src/backend/executor/nodeGroup.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeGroup.c,v
retrieving revision 1.70
diff -u -3 -p -r1.70 nodeGroup.c
--- src/backend/executor/nodeGroup.c    1 Jan 2008 19:45:49 -0000    1.70
+++ src/backend/executor/nodeGroup.c    21 Jul 2008 18:22:25 -0000
@@ -24,6 +24,7 @@

 #include "executor/executor.h"
 #include "executor/nodeGroup.h"
+#include "pg_trace.h"


 /*
@@ -49,6 +50,8 @@ ExecGroup(GroupState *node)
     numCols = ((Group *) node->ss.ps.plan)->numCols;
     grpColIdx = ((Group *) node->ss.ps.plan)->grpColIdx;

+    TRACE_POSTGRESQL_EXECUTOR_GROUP((uintptr_t)node, numCols);
+
     /*
      * The ScanTupleSlot holds the (copied) first tuple of each group.
      */
Index: src/backend/executor/nodeHash.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeHash.c,v
retrieving revision 1.116
diff -u -3 -p -r1.116 nodeHash.c
--- src/backend/executor/nodeHash.c    1 Jan 2008 19:45:49 -0000    1.116
+++ src/backend/executor/nodeHash.c    21 Jul 2008 18:22:26 -0000
@@ -35,6 +35,7 @@
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 #include "utils/lsyscache.h"
+#include "pg_trace.h"


 static void ExecHashIncreaseNumBatches(HashJoinTable hashtable);
@@ -70,6 +71,8 @@ MultiExecHash(HashState *node)
     ExprContext *econtext;
     uint32        hashvalue;

+    TRACE_POSTGRESQL_EXECUTOR_HASH_MULTI((uintptr_t)node);
+
     /* must provide our own instrumentation support */
     if (node->ps.instrument)
         InstrStartNode(node->ps.instrument);
Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.93
diff -u -3 -p -r1.93 nodeHashjoin.c
--- src/backend/executor/nodeHashjoin.c    1 Jan 2008 19:45:49 -0000    1.93
+++ src/backend/executor/nodeHashjoin.c    21 Jul 2008 18:22:27 -0000
@@ -20,6 +20,7 @@
 #include "executor/nodeHash.h"
 #include "executor/nodeHashjoin.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode,
@@ -58,6 +59,8 @@ ExecHashJoin(HashJoinState *node)
     uint32        hashvalue;
     int            batchno;

+    TRACE_POSTGRESQL_EXECUTOR_HASHJOIN((uintptr_t)node);
+
     /*
      * get information from HashJoin node
      */
Index: src/backend/executor/nodeLimit.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeLimit.c,v
retrieving revision 1.34
diff -u -3 -p -r1.34 nodeLimit.c
--- src/backend/executor/nodeLimit.c    10 Mar 2008 03:37:59 -0000    1.34
+++ src/backend/executor/nodeLimit.c    21 Jul 2008 18:22:28 -0000
@@ -23,6 +23,7 @@

 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
+#include "pg_trace.h"

 static void recompute_limits(LimitState *node);

@@ -41,6 +42,8 @@ ExecLimit(LimitState *node)
     TupleTableSlot *slot;
     PlanState  *outerPlan;

+    TRACE_POSTGRESQL_EXECUTOR_LIMIT((uintptr_t)node);
+
     /*
      * get information from the node
      */
Index: src/backend/executor/nodeMaterial.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeMaterial.c,v
retrieving revision 1.62
diff -u -3 -p -r1.62 nodeMaterial.c
--- src/backend/executor/nodeMaterial.c    23 Mar 2008 00:54:04 -0000    1.62
+++ src/backend/executor/nodeMaterial.c    21 Jul 2008 18:22:28 -0000
@@ -24,6 +24,7 @@
 #include "executor/executor.h"
 #include "executor/nodeMaterial.h"
 #include "miscadmin.h"
+#include "pg_trace.h"

 /* ----------------------------------------------------------------
  *        ExecMaterial
@@ -45,6 +46,8 @@ ExecMaterial(MaterialState *node)
     bool        eof_tuplestore;
     TupleTableSlot *slot;

+    TRACE_POSTGRESQL_EXECUTOR_MATERIAL((uintptr_t)node);
+
     /*
      * get state info from node
      */
Index: src/backend/executor/nodeMergejoin.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeMergejoin.c,v
retrieving revision 1.91
diff -u -3 -p -r1.91 nodeMergejoin.c
--- src/backend/executor/nodeMergejoin.c    13 Apr 2008 20:51:20 -0000    1.91
+++ src/backend/executor/nodeMergejoin.c    21 Jul 2008 18:22:30 -0000
@@ -102,6 +102,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
+#include "pg_trace.h"


 /*
@@ -565,6 +566,8 @@ ExecMergeJoin(MergeJoinState *node)
     bool        doFillOuter;
     bool        doFillInner;

+    TRACE_POSTGRESQL_EXECUTOR_MERGEJOIN((uintptr_t)node);
+
     /*
      * get information from node
      */
Index: src/backend/executor/nodeNestloop.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeNestloop.c,v
retrieving revision 1.46
diff -u -3 -p -r1.46 nodeNestloop.c
--- src/backend/executor/nodeNestloop.c    1 Jan 2008 19:45:49 -0000    1.46
+++ src/backend/executor/nodeNestloop.c    21 Jul 2008 18:22:30 -0000
@@ -24,6 +24,7 @@
 #include "executor/execdebug.h"
 #include "executor/nodeNestloop.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* ----------------------------------------------------------------
@@ -67,6 +68,8 @@ ExecNestLoop(NestLoopState *node)
     List       *otherqual;
     ExprContext *econtext;

+    TRACE_POSTGRESQL_EXECUTOR_NESTLOOP((uintptr_t)node);
+
     /*
      * get information from the node
      */
Index: src/backend/executor/nodeSetOp.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeSetOp.c,v
retrieving revision 1.25
diff -u -3 -p -r1.25 nodeSetOp.c
--- src/backend/executor/nodeSetOp.c    1 Jan 2008 19:45:49 -0000    1.25
+++ src/backend/executor/nodeSetOp.c    21 Jul 2008 18:22:31 -0000
@@ -37,6 +37,7 @@
 #include "executor/executor.h"
 #include "executor/nodeSetOp.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* ----------------------------------------------------------------
@@ -50,6 +51,8 @@ ExecSetOp(SetOpState *node)
     TupleTableSlot *resultTupleSlot;
     PlanState  *outerPlan;

+    TRACE_POSTGRESQL_EXECUTOR_SETOP((uintptr_t)node);
+
     /*
      * get information from the node
      */
Index: src/backend/executor/nodeSort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeSort.c,v
retrieving revision 1.62
diff -u -3 -p -r1.62 nodeSort.c
--- src/backend/executor/nodeSort.c    1 Jan 2008 19:45:49 -0000    1.62
+++ src/backend/executor/nodeSort.c    21 Jul 2008 18:22:32 -0000
@@ -19,6 +19,7 @@
 #include "executor/nodeSort.h"
 #include "miscadmin.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* ----------------------------------------------------------------
@@ -53,6 +54,8 @@ ExecSort(SortState *node)
     dir = estate->es_direction;
     tuplesortstate = (Tuplesortstate *) node->tuplesortstate;

+    TRACE_POSTGRESQL_EXECUTOR_SORT((uintptr_t)node, dir);
+
     /*
      * If first time through, read all tuples from outer plan and pass them to
      * tuplesort.c. Subsequent calls just fetch tuples from tuplesort.
Index: src/backend/executor/nodeSubplan.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeSubplan.c,v
retrieving revision 1.93
diff -u -3 -p -r1.93 nodeSubplan.c
--- src/backend/executor/nodeSubplan.c    12 May 2008 00:00:49 -0000    1.93
+++ src/backend/executor/nodeSubplan.c    21 Jul 2008 18:22:35 -0000
@@ -27,6 +27,7 @@
 #include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 static Datum ExecHashSubPlan(SubPlanState *node,
@@ -80,6 +81,8 @@ ExecHashSubPlan(SubPlanState *node,
     ExprContext *innerecontext = node->innerecontext;
     TupleTableSlot *slot;

+    TRACE_POSTGRESQL_EXECUTOR_SUBPLAN_HASH((uintptr_t)node);
+
     /* Shouldn't have any direct correlation Vars */
     if (subplan->parParam != NIL || node->args != NIL)
         elog(ERROR, "hashed subplan with direct correlation not supported");
@@ -215,6 +218,8 @@ ExecScanSubPlan(SubPlanState *node,
     ListCell   *l;
     ArrayBuildState *astate = NULL;

+    TRACE_POSTGRESQL_EXECUTOR_SUBPLAN_SCAN((uintptr_t)node);
+
     /*
      * We are probably in a short-lived expression-evaluation context. Switch
      * to the per-query context for manipulating the child plan's chgParam,
Index: src/backend/executor/nodeUnique.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeUnique.c,v
retrieving revision 1.56
diff -u -3 -p -r1.56 nodeUnique.c
--- src/backend/executor/nodeUnique.c    1 Jan 2008 19:45:49 -0000    1.56
+++ src/backend/executor/nodeUnique.c    21 Jul 2008 18:22:36 -0000
@@ -28,6 +28,7 @@
 #include "executor/executor.h"
 #include "executor/nodeUnique.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* ----------------------------------------------------------------
@@ -45,6 +46,8 @@ ExecUnique(UniqueState *node)
     TupleTableSlot *slot;
     PlanState  *outerPlan;

+    TRACE_POSTGRESQL_EXECUTOR_UNIQUE((uintptr_t)node);
+
     /*
      * get information from the node
      */
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.82
diff -u -3 -p -r1.82 autovacuum.c
--- src/backend/postmaster/autovacuum.c    21 Jul 2008 15:27:02 -0000    1.82
+++ src/backend/postmaster/autovacuum.c    21 Jul 2008 18:22:39 -0000
@@ -100,6 +100,7 @@
 #include "utils/ps_status.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
+#include "pg_trace.h"


 /*
@@ -1601,6 +1602,7 @@ AutoVacWorkerMain(int argc, char *argv[]
         InitPostgres(NULL, dbid, NULL, &dbname);
         SetProcessingMode(NormalProcessing);
         set_ps_display(dbname, false);
+        TRACE_POSTGRESQL_AUTOVACUUM_START(dbid, dbname);
         ereport(DEBUG1,
                 (errmsg("autovacuum: processing database \"%s\"", dbname)));

@@ -2177,6 +2179,7 @@ do_autovacuum(void)
         {
             /* have at it */
             MemoryContextSwitchTo(TopTransactionContext);
+            TRACE_POSTGRESQL_AUTOVACUUM_RELATION(tab->at_relid, tab->at_datname, tab->at_nspname, tab->at_relname);
             autovacuum_do_vac_analyze(tab, bstrategy);

             /*
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -u -3 -p -r1.176 pgstat.c
--- src/backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
+++ src/backend/postmaster/pgstat.c    21 Jul 2008 18:22:44 -0000
@@ -61,6 +61,7 @@
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/tqual.h"
+#include "pg_trace.h"


 /* ----------
@@ -2202,6 +2203,8 @@ pgstat_report_activity(const char *cmd_s
     TimestampTz start_timestamp;
     int            len;

+    TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
     if (!pgstat_track_activities || !beentry)
         return;

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.234
diff -u -3 -p -r1.234 bufmgr.c
--- src/backend/storage/buffer/bufmgr.c    13 Jul 2008 20:45:47 -0000    1.234
+++ src/backend/storage/buffer/bufmgr.c    21 Jul 2008 18:22:48 -0000
@@ -43,6 +43,7 @@
 #include "utils/rel.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"


 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -213,6 +214,9 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (isExtend)
         blockNum = smgrnblocks(smgr);

+    TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode,
+        smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf);
+
     if (isLocalBuf)
     {
         ReadLocalBufferCount++;
@@ -240,12 +244,19 @@ ReadBuffer_common(SMgrRelation smgr, boo
     {
         if (!isExtend)
         {
+            TRACE_POSTGRESQL_BUFFER_HIT();
+
             /* Just need to update stats before we exit */
             *hit = true;

             if (VacuumCostActive)
                 VacuumCostBalance += VacuumCostPageHit;

+            TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                smgr->smgr_rnode.spcNode,
+                smgr->smgr_rnode.dbNode,
+                smgr->smgr_rnode.relNode, isLocalBuf, found);
+
             return BufferDescriptorGetBuffer(bufHdr);
         }

@@ -297,6 +308,7 @@ ReadBuffer_common(SMgrRelation smgr, boo
             } while (!StartBufferIO(bufHdr, true));
         }
     }
+    TRACE_POSTGRESQL_BUFFER_MISS();

     /*
      * if we have gotten to this point, we have allocated a buffer for the
@@ -368,6 +380,10 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (VacuumCostActive)
         VacuumCostBalance += VacuumCostPageMiss;

+    TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode,
+            smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+            isLocalBuf, found);
+
     return BufferDescriptorGetBuffer(bufHdr);
 }

@@ -510,6 +526,11 @@ BufferAlloc(SMgrRelation smgr,
              * happens to be trying to split the page the first one got from
              * StrategyGetBuffer.)
              */
+
+            TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(blockNum,
+              smgr->smgr_rnode.spcNode, smgr->smgr_rnode.dbNode,
+              smgr->smgr_rnode.relNode);
+
             if (LWLockConditionalAcquire(buf->content_lock, LW_SHARED))
             {
                 /*
@@ -532,6 +553,11 @@ BufferAlloc(SMgrRelation smgr,
                 /* OK, do the I/O */
                 FlushBuffer(buf, NULL);
                 LWLockRelease(buf->content_lock);
+
+                TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(
+                  blockNum, smgr->smgr_rnode.spcNode,
+                  smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+                  (oldFlags & BM_HINT_BITS_TEST));
             }
             else
             {
@@ -850,7 +876,9 @@ MarkBufferDirty(Buffer buffer)
     if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
         VacuumCostBalance += VacuumCostPageDirty;

-    bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
+    bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED | BM_HINT_BITS_TEST);
+
+    TRACE_POSTGRESQL_BUFFER_MARK_DIRTY(buffer);

     UnlockBufHdr(bufHdr);
 }
@@ -1086,6 +1114,8 @@ BufferSync(int flags)
     if (num_to_write == 0)
         return;                    /* nothing to do */

+    TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
+
     /*
      * Loop over all buffers again, and write the ones (still) marked with
      * BM_CHECKPOINT_NEEDED.  In this loop, we start at the clock sweep point
@@ -1117,6 +1147,7 @@ BufferSync(int flags)
         {
             if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
             {
+                TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
                 BgWriterStats.m_buf_written_checkpoints++;
                 num_written++;

@@ -1147,6 +1178,8 @@ BufferSync(int flags)
             buf_id = 0;
     }

+    TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
+
     /*
      * Update checkpoint statistics. As noted above, this doesn't include
      * buffers written by other backends or bgwriter scan.
@@ -1653,11 +1686,13 @@ PrintBufferLeakWarning(Buffer buffer)
 void
 CheckPointBuffers(int flags)
 {
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
     BufferSync(flags);
     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
     smgrsync();
     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
 }


@@ -1759,6 +1794,10 @@ FlushBuffer(volatile BufferDesc *buf, SM
     if (reln == NULL)
         reln = smgropen(buf->tag.rnode);

+    TRACE_POSTGRESQL_BUFFER_FLUSH_START(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode, buf->tag.blockNum);
+
     /*
      * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
      * rule that log updates must hit disk before any of the data-file changes
@@ -1785,6 +1824,10 @@ FlushBuffer(volatile BufferDesc *buf, SM

     BufferFlushCount++;

+    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,
+         buf->tag.blockNum, (buf->flags & BM_HINT_BITS_TEST));
+
     /*
      * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
      * end the io_in_progress state.
@@ -2536,7 +2579,7 @@ TerminateBufferIO(volatile BufferDesc *b
     Assert(buf->flags & BM_IO_IN_PROGRESS);
     buf->flags &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);
     if (clear_dirty && !(buf->flags & BM_JUST_DIRTIED))
-        buf->flags &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);
+        buf->flags &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED | BM_HINT_BITS_TEST);
     buf->flags |= set_flag_bits;

     UnlockBufHdr(buf);
Index: src/backend/storage/buffer/localbuf.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/localbuf.c,v
retrieving revision 1.80
diff -u -3 -p -r1.80 localbuf.c
--- src/backend/storage/buffer/localbuf.c    12 Jun 2008 09:12:31 -0000    1.80
+++ src/backend/storage/buffer/localbuf.c    21 Jul 2008 18:22:48 -0000
@@ -21,6 +21,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
+#include "pg_trace.h"


 /*#define LBDEBUG*/
@@ -236,6 +237,7 @@ MarkLocalBufferDirty(Buffer buffer)

     bufHdr = &LocalBufferDescriptors[bufid];
     bufHdr->flags |= BM_DIRTY;
+    TRACE_POSTGRESQL_BUFFER_MARK_DIRTY_LOCAL(buffer);
 }

 /*
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -u -3 -p -r1.53 deadlock.c
--- src/backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
+++ src/backend/storage/lmgr/deadlock.c    21 Jul 2008 18:22:49 -0000
@@ -30,6 +30,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* One edge in the waits-for graph */
@@ -222,6 +223,8 @@ DeadLockCheck(PGPROC *proc)
          */
         int            nSoftEdges;

+        TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
         nWaitOrders = 0;
         if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
             elog(FATAL, "deadlock seems to have disappeared");
@@ -259,6 +262,8 @@ DeadLockCheck(PGPROC *proc)
         ProcLockWakeup(GetLocksMethodTable(lock), lock);
     }

+    TRACE_POSTGRESQL_DEADLOCK_NOTFOUND(nWaitOrders);
+
     /* Return code tells caller if we had to escape a deadlock or not */
     if (nWaitOrders > 0)
         return DS_SOFT_DEADLOCK;
Index: src/backend/storage/smgr/md.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.138
diff -u -3 -p -r1.138 md.c
--- src/backend/storage/smgr/md.c    2 May 2008 01:08:27 -0000    1.138
+++ src/backend/storage/smgr/md.c    21 Jul 2008 18:22:50 -0000
@@ -26,6 +26,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* interval for calling AbsorbFsyncRequests in mdsync */
@@ -555,6 +556,8 @@ mdread(SMgrRelation reln, BlockNumber bl
     int            nbytes;
     MdfdVec    *v;

+    TRACE_POSTGRESQL_SMGR_READ_START(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode);
+
     v = _mdfd_getseg(reln, blocknum, false, EXTENSION_FAIL);

     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
@@ -569,7 +572,11 @@ mdread(SMgrRelation reln, BlockNumber bl
                         reln->smgr_rnode.dbNode,
                         reln->smgr_rnode.relNode)));

-    if ((nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)
+    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
+
+    TRACE_POSTGRESQL_SMGR_READ_DONE(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode,nbytes, BLCKSZ); 
+
+    if (nbytes != BLCKSZ)
     {
         if (nbytes < 0)
             ereport(ERROR,
@@ -621,6 +628,8 @@ mdwrite(SMgrRelation reln, BlockNumber b
     Assert(blocknum < mdnblocks(reln));
 #endif

+    TRACE_POSTGRESQL_SMGR_WRITE_START(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode);
+
     v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_FAIL);

     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
@@ -657,6 +666,8 @@ mdwrite(SMgrRelation reln, BlockNumber b
                  errhint("Check free disk space.")));
     }

+    TRACE_POSTGRESQL_SMGR_WRITE_DONE(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode,nbytes, BLCKSZ); 
+
     if (!isTemp)
         register_dirty_segment(reln, v);
 }
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.554
diff -u -3 -p -r1.554 postgres.c
--- src/backend/tcop/postgres.c    18 Jul 2008 20:26:06 -0000    1.554
+++ src/backend/tcop/postgres.c    21 Jul 2008 18:22:54 -0000
@@ -71,6 +71,7 @@
 #include "mb/pg_wchar.h"

 #include "pgstat.h"
+#include "pg_trace.h"

 extern int    optind;
 extern char *optarg;
@@ -551,6 +552,8 @@ pg_parse_query(const char *query_string)
 {
     List       *raw_parsetree_list;

+    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
     if (log_parser_stats)
         ResetUsage();

@@ -660,6 +663,8 @@ pg_rewrite_query(Query *query)
         elog_node_display(DEBUG1, "rewritten parse tree", querytree_list,
                           Debug_pretty_print);

+    TRACE_POSTGRESQL_QUERY_PARSE_DONE();
+
     return querytree_list;
 }

@@ -673,6 +678,8 @@ pg_plan_query(Query *querytree, int curs
 {
     PlannedStmt *plan;

+    TRACE_POSTGRESQL_QUERY_PLAN_START();
+
     /* Utility commands have no plans. */
     if (querytree->commandType == CMD_UTILITY)
         return NULL;
@@ -710,6 +717,7 @@ pg_plan_query(Query *querytree, int curs
      */
     if (Debug_print_plan)
         elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);
+    TRACE_POSTGRESQL_QUERY_PLAN_DONE();

     return plan;
 }
@@ -785,6 +793,7 @@ exec_simple_query(const char *query_stri
     bool        isTopLevel;
     char        msec_str[32];

+
     /*
      * Report query to various monitoring facilities.
      */
@@ -792,6 +801,8 @@ exec_simple_query(const char *query_stri

     pgstat_report_activity(query_string);

+    TRACE_POSTGRESQL_QUERY_START(query_string);
+
     /*
      * We use save_log_statement_stats so ShowUsage doesn't report incorrect
      * results because ResetUsage wasn't called.
@@ -1058,6 +1069,8 @@ exec_simple_query(const char *query_stri
     if (save_log_statement_stats)
         ShowUsage("QUERY STATISTICS");

+    TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
     debug_query_string = NULL;
 }

@@ -3493,6 +3506,7 @@ PostgresMain(int argc, char *argv[], con
         {
             if (IsTransactionOrTransactionBlock())
             {
+                TRACE_POSTGRESQL_TRANSACTION_IDLE_START(GetCurrentTransactionId(), GetCurrentCommandId(false));
                 set_ps_display("idle in transaction", false);
                 pgstat_report_activity("<IDLE> in transaction");
             }
@@ -3522,6 +3536,8 @@ PostgresMain(int argc, char *argv[], con
          */
         firstchar = ReadCommand(&input_message);

+        TRACE_POSTGRESQL_TRANSACTION_IDLE_DONE();
+
         /*
          * (4) disable async signal conditions again.
          */
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.123
diff -u -3 -p -r1.123 pquery.c
--- src/backend/tcop/pquery.c    12 May 2008 20:02:02 -0000    1.123
+++ src/backend/tcop/pquery.c    21 Jul 2008 18:22:56 -0000
@@ -24,6 +24,7 @@
 #include "tcop/utility.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -711,6 +712,8 @@ PortalRun(Portal portal, long count, boo

     AssertArg(PortalIsValid(portal));

+    TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
     /* Initialize completion tag to empty string */
     if (completionTag)
         completionTag[0] = '\0';
@@ -857,6 +860,8 @@ PortalRun(Portal portal, long count, boo

     if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
         ShowUsage("EXECUTOR STATISTICS");
+
+    TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

     return result;
 }
@@ -1237,6 +1242,8 @@ PortalRunMulti(Portal portal, bool isTop
              */
             PlannedStmt *pstmt = (PlannedStmt *) stmt;

+            TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
             if (log_executor_stats)
                 ResetUsage();

@@ -1257,6 +1264,8 @@ PortalRunMulti(Portal portal, bool isTop

             if (log_executor_stats)
                 ShowUsage("EXECUTOR STATISTICS");
+
+            TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
         }
         else
         {
Index: src/backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -u -3 -p -r1.2 probes.d
--- src/backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
+++ src/backend/utils/probes.d    21 Jul 2008 18:22:56 -0000
@@ -7,18 +7,123 @@
  * ----------
  */

+
 provider postgresql {

-probe transaction__start(int);
-probe transaction__commit(int);
-probe transaction__abort(int);
-probe lwlock__acquire(int, int);
-probe lwlock__release(int);
-probe lwlock__startwait(int, int);
-probe lwlock__endwait(int, int);
-probe lwlock__condacquire(int, int);
-probe lwlock__condacquire__fail(int, int);
-probe lock__startwait(int, int);
-probe lock__endwait(int, int);
+    /*
+     * Due to a bug in Mac OS X 10.5, using defined types (e.g. uintptr_t,
+     * uint32_t, etc.) cause compilation error.
+     */
+
+    probe transaction__start(unsigned int transactionId);
+    probe transaction__commit(unsigned int transactionId);
+    probe transaction__abort(unsigned int transactionId);
+    probe transaction__idle__start(unsigned int currentTxnId, unsigned int currentCmdId);
+    probe transaction__idle__done();
+
+    probe lwlock__acquire(unsigned int lockId, unsigned int mode);
+    probe lwlock__release(unsigned int lockId);
+    probe lwlock__startwait(unsigned int lockId, unsigned int mode);
+    probe lwlock__endwait(unsigned int lockId, unsigned int mode);
+    probe lwlock__condacquire(unsigned int lockId, unsigned int mode);
+    probe lwlock__condacquire__fail(unsigned int lockId, unsigned int mode);
+    probe lock__startwait(unsigned int lockTagField2, int lockMode);
+    probe lock__endwait(unsigned int lockTagField2, int lockMode);
+
+    probe query__parse__start(char *queryString);
+    probe query__parse__done();
+    probe query__plan__start();
+    probe query__plan__done();
+    probe query__execute__start();
+    probe query__execute__done();
+    probe query__start(char *queryString);
+    probe query__done(char *queryString);
+    probe statement__status(char *cmdString);
+
+    probe sort__start(int, int, int, int, int);
+    probe sort__end(int, long);
+
+    probe buffer__read__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid, int isLocalBuf);
+    probe buffer__read__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int isLocalBuf,
+            int isInBuffPool);
+    probe buffer__write__dirty__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe buffer__write__dirty__done(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid, int isHintBitsWrite);
+
+    probe buffer__flush__start(int tablespaceOid,
+            int databaseOid, int relationOid, int blockNum);
+    probe buffer__flush__done(int tablespaceOid, int databaseOid,
+            int relationOid, int blockNum, int isHintBitsWrite);
+    probe buffer__hit();
+    probe buffer__miss();
+    probe buffer__mark__dirty(int bufferId);
+    probe buffer__mark__dirty__local(int bufferId);
+    probe buffer__checkpoint__start(int);
+    probe buffer__checkpoint__done();
+    probe buffer__sync__start(int numBuffers, int numToWrite);
+    probe buffer__sync__written(int bufId);
+    probe buffer__sync__done(int numBuffers, int numWritten, int numToWrite);
+
+    probe wal__buffer__write__start();
+    probe wal__buffer__write__done();
+
+    probe deadlock__found();
+    probe deadlock__notfound(int);
+
+    probe smgr__read__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe smgr__read__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int bytesRead,
+            int blockSize);
+    probe smgr__write__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe smgr__write__done(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid, int bytesRead,
+            int blockSize);
+
+    probe xlog__insert(unsigned char, unsigned char);
+    probe xlog__switch();
+
+    probe xlog__checkpoint__start(int checkpointFlags, unsigned int transactionId);
+    probe xlog__checkpoint__done(int bufsWritten, int NBuffers, int segsAdded,
+            int segsRemoved, int segsRecycled);
+    probe clog__checkpoint__start();
+    probe clog__checkpoint__done();
+    probe subtrans__checkpoint__start();
+    probe subtrans__checkpoint__done();
+    probe multixact__checkpoint__start();
+    probe multixact__checkpoint__done();
+    probe twophase__checkpoint__start();
+    probe twophase__checkpoint__done();
+
+    probe autovacuum__start(unsigned int databaseOid, char *databaseName);
+    probe autovacuum__relation(unsigned int, char *, char *, char *);
+
+    probe slru__readpage__start(unsigned int, int pageNum, int writeOk, int xid);
+    probe slru__readpage__done(int slotNum);
+    probe slru__readpage__readonly(unsigned int, int, int);
+    probe slru__writepage__start(unsigned int, int pageNum, int slotNum);
+    probe slru__writepage__done();
+    probe slru__readpage__physical__start(unsigned int, char *path, int pageNum, int slotNum);
+    probe slru__readpage__physical__done(int retVal, int errCause, int errNo);
+    probe slru__writepage__physical__start(unsigned int, int pageNum, int slotNum);
+    probe slru__writepage__physical__done(int retVal, int errCause, int errNo);

+    probe executor__scan(unsigned int, unsigned int, unsigned int);
+    probe executor__agg(unsigned int, int);
+    probe executor__group(unsigned int, int);
+    probe executor__hash__multi(unsigned int);
+    probe executor__hashjoin(unsigned int);
+    probe executor__limit(unsigned int);
+    probe executor__material(unsigned int);
+    probe executor__mergejoin(unsigned int);
+    probe executor__nestloop(unsigned int);
+    probe executor__setop(unsigned int);
+    probe executor__sort(unsigned int, int);
+    probe executor__subplan__hash(unsigned int);
+    probe executor__subplan__scan(unsigned int);
+    probe executor__unique(unsigned int);
 };
Index: src/backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.85
diff -u -3 -p -r1.85 tuplesort.c
--- src/backend/utils/sort/tuplesort.c    19 Jun 2008 00:46:05 -0000    1.85
+++ src/backend/utils/sort/tuplesort.c    21 Jul 2008 18:23:00 -0000
@@ -115,12 +115,18 @@
 #include "utils/rel.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* GUC variables */
 #ifdef TRACE_SORT
 bool        trace_sort = false;
 #endif
+
+#define HEAP_SORT    0
+#define INDEX_SORT    1
+#define DATUM_SORT    2
+
 #ifdef DEBUG_BOUNDED_SORT
 bool        optimize_bounded_sort = true;
 #endif
@@ -569,6 +575,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
              "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
              nkeys, workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

     state->nKeys = nkeys;

@@ -636,6 +643,8 @@ tuplesort_begin_index_btree(Relation ind

     state->nKeys = RelationGetNumberOfAttributes(indexRel);

+    TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
     state->comparetup = comparetup_index_btree;
     state->copytup = copytup_index;
     state->writetup = writetup_index;
@@ -713,6 +722,7 @@ tuplesort_begin_datum(Oid datumType,
              "begin datum sort: workMem = %d, randomAccess = %c",
              workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

     state->nKeys = 1;            /* always a one-column sort */

@@ -825,6 +835,11 @@ tuplesort_end(Tuplesortstate *state)
     }
 #endif

+    TRACE_POSTGRESQL_SORT_END(state->tapeset,
+            (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) :
+            (state->allowedMem - state->availMem + 1023) / 1024));
+
+
     MemoryContextSwitchTo(oldcontext);

     /*
Index: src/include/storage/buf_internals.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/storage/buf_internals.h,v
retrieving revision 1.97
diff -u -3 -p -r1.97 buf_internals.h
--- src/include/storage/buf_internals.h    19 Jun 2008 00:46:06 -0000    1.97
+++ src/include/storage/buf_internals.h    21 Jul 2008 18:23:01 -0000
@@ -37,6 +37,7 @@
 #define BM_JUST_DIRTIED            (1 << 5)        /* dirtied since write started */
 #define BM_PIN_COUNT_WAITER        (1 << 6)        /* have waiter for sole pin */
 #define BM_CHECKPOINT_NEEDED    (1 << 7)        /* must write for checkpoint */
+#define BM_HINT_BITS_TEST    (1 << 8)        /* test effect of writes due to hint bits */

 typedef bits16 BufFlags;


Re: Review: DTrace probes (merged version)

From
Peter Eisentraut
Date:
This patch looked OK to me, but the commit fest comment says that it does not 
include comments from the reviewdemo.postgresql.org.  But I don't find any 
comments there.  The latest version of the patch there appears to be the same 
as here.  Zdenek, could you clarify?


Re: Review: DTrace probes (merged version)

From
Zdenek Kotala
Date:
Peter Eisentraut napsal(a):
> This patch looked OK to me, but the commit fest comment says that it does not 
> include comments from the reviewdemo.postgresql.org.  But I don't find any 
> comments there.  The latest version of the patch there appears to be the same 
> as here.  Zdenek, could you clarify?

I'm sorry. I forgot to publish them :( (new tool). It is fixed now. I also 
upload latest patch version and I will review it tomorrow.
    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Zdenek Kotala
Date:
I performed review and I prepared own patch which contains only probes without
any issue. I suggest commit this patch because the rest of patch is independent
and it can be committed next commit fest after rework.

I found following issues:

1) SLRU probes.

I think it is good to have probes there but they needs polish. See my comments
http://reviewdemo.postgresql.org/r/25/


2) XLOG probes

I think there is confuse placement of probes after merge. It needs cleanup.


3) Executor probes

I would like to see any use case for them/

4) smgr probes

I prefer to have this probes in smgr instead of md. The reason why Robert put
them into md is that it returns number of written/read bytes, but it is "always"
BLCKSZ which could be returned from smgr directly. Only difference is
when error occurs during write/read and not all data are written/read.

It needs discuss.

5) autovacuum start probes

I would like to see also stat/stop for any other process types. It was discussed
but no comment from author(s).

6) idle transaction

See my comments
http://reviewdemo.postgresql.org/r/25/

7) query-reewrite is missing


8) mark dirty and BM_HINT... flag

I remove these because I don't see any use case for it. It would be nice provide
some dtrace script or describe basic ideas.



    Thats all Zdenek


Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.46
diff -c -r1.46 clog.c
*** src/backend/access/transam/clog.c    1 Jan 2008 19:45:46 -0000    1.46
--- src/backend/access/transam/clog.c    24 Jul 2008 14:19:23 -0000
***************
*** 36,41 ****
--- 36,42 ----
  #include "access/slru.h"
  #include "access/transam.h"
  #include "postmaster/bgwriter.h"
+ #include "pg_trace.h"

  /*
   * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
***************
*** 323,329 ****
--- 324,332 ----
  CheckPointCLOG(void)
  {
      /* Flush dirty CLOG pages to disk */
+     TRACE_POSTGRESQL_CLOG_CHECKPOINT_START();
      SimpleLruFlush(ClogCtl, true);
+     TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE();
  }


Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.27
diff -c -r1.27 multixact.c
*** src/backend/access/transam/multixact.c    1 Jan 2008 19:45:46 -0000    1.27
--- src/backend/access/transam/multixact.c    24 Jul 2008 14:19:24 -0000
***************
*** 57,62 ****
--- 57,63 ----
  #include "storage/lmgr.h"
  #include "utils/memutils.h"
  #include "storage/procarray.h"
+ #include "pg_trace.h"


  /*
***************
*** 1526,1531 ****
--- 1527,1534 ----
  void
  CheckPointMultiXact(void)
  {
+     TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START();
+
      /* Flush dirty MultiXact pages to disk */
      SimpleLruFlush(MultiXactOffsetCtl, true);
      SimpleLruFlush(MultiXactMemberCtl, true);
***************
*** 1540,1545 ****
--- 1543,1550 ----
       */
      if (!InRecovery)
          TruncateMultiXact();
+
+     TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE();
  }

  /*
Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.22
diff -c -r1.22 subtrans.c
*** src/backend/access/transam/subtrans.c    26 Mar 2008 18:48:59 -0000    1.22
--- src/backend/access/transam/subtrans.c    24 Jul 2008 14:19:24 -0000
***************
*** 32,37 ****
--- 32,38 ----
  #include "access/subtrans.h"
  #include "access/transam.h"
  #include "utils/snapmgr.h"
+ #include "pg_trace.h"


  /*
***************
*** 281,287 ****
--- 282,290 ----
       * it merely to improve the odds that writing of dirty pages is done by
       * the checkpoint process and not by backends.
       */
+     TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START();
      SimpleLruFlush(SubTransCtl, true);
+     TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE();
  }


Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.43
diff -c -r1.43 twophase.c
*** src/backend/access/transam/twophase.c    19 May 2008 18:16:26 -0000    1.43
--- src/backend/access/transam/twophase.c    24 Jul 2008 14:19:24 -0000
***************
*** 57,62 ****
--- 57,63 ----
  #include "storage/smgr.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
+ #include "pg_trace.h"


  /*
***************
*** 1387,1392 ****
--- 1388,1396 ----
       */
      if (max_prepared_xacts <= 0)
          return;                    /* nothing to do */
+
+     TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START();
+
      xids = (TransactionId *) palloc(max_prepared_xacts * sizeof(TransactionId));
      nxids = 0;

***************
*** 1444,1449 ****
--- 1448,1455 ----
      }

      pfree(xids);
+
+     TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
  }

  /*
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -c -r1.176 pgstat.c
*** src/backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
--- src/backend/postmaster/pgstat.c    24 Jul 2008 14:19:30 -0000
***************
*** 61,66 ****
--- 61,67 ----
  #include "utils/ps_status.h"
  #include "utils/rel.h"
  #include "utils/tqual.h"
+ #include "pg_trace.h"


  /* ----------
***************
*** 2202,2207 ****
--- 2203,2210 ----
      TimestampTz start_timestamp;
      int            len;

+     TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
      if (!pgstat_track_activities || !beentry)
          return;

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.234
diff -c -r1.234 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c    13 Jul 2008 20:45:47 -0000    1.234
--- src/backend/storage/buffer/bufmgr.c    24 Jul 2008 14:19:31 -0000
***************
*** 43,48 ****
--- 43,49 ----
  #include "utils/rel.h"
  #include "utils/resowner.h"
  #include "pgstat.h"
+ #include "pg_trace.h"


  /* Note: these two macros only work on shared buffers, not local ones! */
***************
*** 213,218 ****
--- 214,222 ----
      if (isExtend)
          blockNum = smgrnblocks(smgr);

+     TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode,
+         smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf);
+
      if (isLocalBuf)
      {
          ReadLocalBufferCount++;
***************
*** 240,251 ****
--- 244,262 ----
      {
          if (!isExtend)
          {
+             TRACE_POSTGRESQL_BUFFER_HIT();
+
              /* Just need to update stats before we exit */
              *hit = true;

              if (VacuumCostActive)
                  VacuumCostBalance += VacuumCostPageHit;

+             TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                 smgr->smgr_rnode.spcNode,
+                 smgr->smgr_rnode.dbNode,
+                 smgr->smgr_rnode.relNode, isLocalBuf, found);
+
              return BufferDescriptorGetBuffer(bufHdr);
          }

***************
*** 297,302 ****
--- 308,314 ----
              } while (!StartBufferIO(bufHdr, true));
          }
      }
+     TRACE_POSTGRESQL_BUFFER_MISS();

      /*
       * if we have gotten to this point, we have allocated a buffer for the
***************
*** 368,373 ****
--- 380,389 ----
      if (VacuumCostActive)
          VacuumCostBalance += VacuumCostPageMiss;

+     TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode,
+             smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+             isLocalBuf, found);
+
      return BufferDescriptorGetBuffer(bufHdr);
  }

***************
*** 1086,1091 ****
--- 1102,1109 ----
      if (num_to_write == 0)
          return;                    /* nothing to do */

+     TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
+
      /*
       * Loop over all buffers again, and write the ones (still) marked with
       * BM_CHECKPOINT_NEEDED.  In this loop, we start at the clock sweep point
***************
*** 1117,1122 ****
--- 1135,1141 ----
          {
              if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
              {
+                 TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
                  BgWriterStats.m_buf_written_checkpoints++;
                  num_written++;

***************
*** 1147,1152 ****
--- 1166,1173 ----
              buf_id = 0;
      }

+     TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
+
      /*
       * Update checkpoint statistics. As noted above, this doesn't include
       * buffers written by other backends or bgwriter scan.
***************
*** 1653,1663 ****
--- 1674,1686 ----
  void
  CheckPointBuffers(int flags)
  {
+     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
      CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
      BufferSync(flags);
      CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
      smgrsync();
      CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
+     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
  }


***************
*** 1759,1764 ****
--- 1782,1791 ----
      if (reln == NULL)
          reln = smgropen(buf->tag.rnode);

+     TRACE_POSTGRESQL_BUFFER_FLUSH_START(reln->smgr_rnode.spcNode,
+          reln->smgr_rnode.dbNode,
+          reln->smgr_rnode.relNode, buf->tag.blockNum);
+
      /*
       * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
       * rule that log updates must hit disk before any of the data-file changes
***************
*** 1785,1790 ****
--- 1812,1821 ----

      BufferFlushCount++;

+     TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
+          reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode,
+          buf->tag.blockNum);
+
      /*
       * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
       * end the io_in_progress state.
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -c -r1.53 deadlock.c
*** src/backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
--- src/backend/storage/lmgr/deadlock.c    24 Jul 2008 14:19:31 -0000
***************
*** 30,35 ****
--- 30,36 ----
  #include "storage/lmgr.h"
  #include "storage/proc.h"
  #include "utils/memutils.h"
+ #include "pg_trace.h"


  /* One edge in the waits-for graph */
***************
*** 222,227 ****
--- 223,230 ----
           */
          int            nSoftEdges;

+         TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
          nWaitOrders = 0;
          if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
              elog(FATAL, "deadlock seems to have disappeared");
***************
*** 259,264 ****
--- 262,269 ----
          ProcLockWakeup(GetLocksMethodTable(lock), lock);
      }

+     TRACE_POSTGRESQL_DEADLOCK_NOTFOUND(nWaitOrders);
+
      /* Return code tells caller if we had to escape a deadlock or not */
      if (nWaitOrders > 0)
          return DS_SOFT_DEADLOCK;
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.554
diff -c -r1.554 postgres.c
*** src/backend/tcop/postgres.c    18 Jul 2008 20:26:06 -0000    1.554
--- src/backend/tcop/postgres.c    24 Jul 2008 14:19:31 -0000
***************
*** 71,76 ****
--- 71,77 ----
  #include "mb/pg_wchar.h"

  #include "pgstat.h"
+ #include "pg_trace.h"

  extern int    optind;
  extern char *optarg;
***************
*** 551,556 ****
--- 552,559 ----
  {
      List       *raw_parsetree_list;

+     TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
      if (log_parser_stats)
          ResetUsage();

***************
*** 660,665 ****
--- 663,670 ----
          elog_node_display(DEBUG1, "rewritten parse tree", querytree_list,
                            Debug_pretty_print);

+     TRACE_POSTGRESQL_QUERY_PARSE_DONE();
+
      return querytree_list;
  }

***************
*** 673,678 ****
--- 678,685 ----
  {
      PlannedStmt *plan;

+     TRACE_POSTGRESQL_QUERY_PLAN_START();
+
      /* Utility commands have no plans. */
      if (querytree->commandType == CMD_UTILITY)
          return NULL;
***************
*** 710,715 ****
--- 717,723 ----
       */
      if (Debug_print_plan)
          elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);
+     TRACE_POSTGRESQL_QUERY_PLAN_DONE();

      return plan;
  }
***************
*** 785,790 ****
--- 793,799 ----
      bool        isTopLevel;
      char        msec_str[32];

+
      /*
       * Report query to various monitoring facilities.
       */
***************
*** 792,797 ****
--- 801,808 ----

      pgstat_report_activity(query_string);

+     TRACE_POSTGRESQL_QUERY_START(query_string);
+
      /*
       * We use save_log_statement_stats so ShowUsage doesn't report incorrect
       * results because ResetUsage wasn't called.
***************
*** 1058,1063 ****
--- 1069,1076 ----
      if (save_log_statement_stats)
          ShowUsage("QUERY STATISTICS");

+     TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
      debug_query_string = NULL;
  }

Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.123
diff -c -r1.123 pquery.c
*** src/backend/tcop/pquery.c    12 May 2008 20:02:02 -0000    1.123
--- src/backend/tcop/pquery.c    24 Jul 2008 14:19:31 -0000
***************
*** 24,29 ****
--- 24,30 ----
  #include "tcop/utility.h"
  #include "utils/memutils.h"
  #include "utils/snapmgr.h"
+ #include "pg_trace.h"


  /*
***************
*** 711,716 ****
--- 712,719 ----

      AssertArg(PortalIsValid(portal));

+     TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
      /* Initialize completion tag to empty string */
      if (completionTag)
          completionTag[0] = '\0';
***************
*** 857,862 ****
--- 860,867 ----

      if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
          ShowUsage("EXECUTOR STATISTICS");
+
+     TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

      return result;
  }
***************
*** 1237,1242 ****
--- 1242,1249 ----
               */
              PlannedStmt *pstmt = (PlannedStmt *) stmt;

+             TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
              if (log_executor_stats)
                  ResetUsage();

***************
*** 1257,1262 ****
--- 1264,1271 ----

              if (log_executor_stats)
                  ShowUsage("EXECUTOR STATISTICS");
+
+             TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
          }
          else
          {
Index: src/backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -c -r1.2 probes.d
*** src/backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
--- src/backend/utils/probes.d    24 Jul 2008 14:19:31 -0000
***************
*** 7,24 ****
   * ----------
   */

  provider postgresql {

! probe transaction__start(int);
! probe transaction__commit(int);
! probe transaction__abort(int);
! probe lwlock__acquire(int, int);
! probe lwlock__release(int);
! probe lwlock__startwait(int, int);
! probe lwlock__endwait(int, int);
! probe lwlock__condacquire(int, int);
! probe lwlock__condacquire__fail(int, int);
! probe lock__startwait(int, int);
! probe lock__endwait(int, int);

  };
--- 7,76 ----
   * ----------
   */

  provider postgresql {

!     /*
!      * Due to a bug in Mac OS X 10.5, using defined types (e.g. uintptr_t,
!      * uint32_t, etc.) cause compilation error.
!      */
!
!     probe transaction__start(unsigned int transactionId);
!     probe transaction__commit(unsigned int transactionId);
!     probe transaction__abort(unsigned int transactionId);
!
!     probe lwlock__acquire(unsigned int lockId, unsigned int mode);
!     probe lwlock__release(unsigned int lockId);
!     probe lwlock__startwait(unsigned int lockId, unsigned int mode);
!     probe lwlock__endwait(unsigned int lockId, unsigned int mode);
!     probe lwlock__condacquire(unsigned int lockId, unsigned int mode);
!     probe lwlock__condacquire__fail(unsigned int lockId, unsigned int mode);
!     probe lock__startwait(unsigned int lockTagField2, int lockMode);
!     probe lock__endwait(unsigned int lockTagField2, int lockMode);
!
!     probe query__parse__start(char *queryString);
!     probe query__parse__done();
!     probe query__plan__start();
!     probe query__plan__done();
!     probe query__execute__start();
!     probe query__execute__done();
!     probe query__start(char *queryString);
!     probe query__done(char *queryString);
!     probe statement__status(char *cmdString);
!
!     probe sort__start(int, int, int, int, int);
!     probe sort__end(int, long);
!
!     probe buffer__read__start(int blockNum, int tablespaceOid,
!             int databaseOid, int relationOid, int isLocalBuf);
!     probe buffer__read__done(int blockNum, int tablespaceOid,
!              int databaseOid, int relationOid, int isLocalBuf,
!             int isInBuffPool);
!     probe buffer__write__dirty__start(int blockNum, int tablespaceOid,
!             int databaseOid, int relationOid);
!     probe buffer__write__dirty__done(int blockNum, int tablespaceOid,
!             int databaseOid, int relationOid, int isHintBitsWrite);
!
!     probe buffer__flush__start(int tablespaceOid,
!             int databaseOid, int relationOid, int blockNum);
!     probe buffer__flush__done(int tablespaceOid, int databaseOid,
!             int relationOid, int blockNum);
!     probe buffer__hit();
!     probe buffer__miss();
!     probe buffer__checkpoint__start(int);
!     probe buffer__checkpoint__done();
!     probe buffer__sync__start(int numBuffers, int numToWrite);
!     probe buffer__sync__written(int bufId);
!     probe buffer__sync__done(int numBuffers, int numWritten, int numToWrite);
!
!     probe deadlock__found();
!     probe deadlock__notfound(int);

+     probe clog__checkpoint__start();
+     probe clog__checkpoint__done();
+     probe subtrans__checkpoint__start();
+     probe subtrans__checkpoint__done();
+     probe multixact__checkpoint__start();
+     probe multixact__checkpoint__done();
+     probe twophase__checkpoint__start();
+     probe twophase__checkpoint__done();
  };
Index: src/backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.85
diff -c -r1.85 tuplesort.c
*** src/backend/utils/sort/tuplesort.c    19 Jun 2008 00:46:05 -0000    1.85
--- src/backend/utils/sort/tuplesort.c    24 Jul 2008 14:19:32 -0000
***************
*** 115,126 ****
--- 115,132 ----
  #include "utils/rel.h"
  #include "utils/syscache.h"
  #include "utils/tuplesort.h"
+ #include "pg_trace.h"


  /* GUC variables */
  #ifdef TRACE_SORT
  bool        trace_sort = false;
  #endif
+
+ #define HEAP_SORT    0
+ #define INDEX_SORT    1
+ #define DATUM_SORT    2
+
  #ifdef DEBUG_BOUNDED_SORT
  bool        optimize_bounded_sort = true;
  #endif
***************
*** 569,574 ****
--- 575,581 ----
               "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
               nkeys, workMem, randomAccess ? 't' : 'f');
  #endif
+     TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

      state->nKeys = nkeys;

***************
*** 636,641 ****
--- 643,650 ----

      state->nKeys = RelationGetNumberOfAttributes(indexRel);

+     TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
      state->comparetup = comparetup_index_btree;
      state->copytup = copytup_index;
      state->writetup = writetup_index;
***************
*** 713,718 ****
--- 722,728 ----
               "begin datum sort: workMem = %d, randomAccess = %c",
               workMem, randomAccess ? 't' : 'f');
  #endif
+     TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

      state->nKeys = 1;            /* always a one-column sort */

***************
*** 825,830 ****
--- 835,845 ----
      }
  #endif

+     TRACE_POSTGRESQL_SORT_END(state->tapeset,
+             (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) :
+             (state->allowedMem - state->availMem + 1023) / 1024));
+
+
      MemoryContextSwitchTo(oldcontext);

      /*

Re: Review: DTrace probes (merged version) ver_03

From
Theo Schlossnagle
Date:
On Jul 24, 2008, at 11:11 AM, Zdenek Kotala wrote:

> I performed review and I prepared own patch which contains only  
> probes without any issue. I suggest commit this patch because the  
> rest of patch is independent and it can be committed next commit  
> fest after rework.
>
> I found following issues:
>
> 1) SLRU probes.
>
> I think it is good to have probes there but they needs polish. See  
> my comments
> http://reviewdemo.postgresql.org/r/25/

The slru's are quite useful and general enough to use easily.  I used  
them to verify the metered checkpointing stuff:

http://lethargy.org/~jesus/archives/112-Probing-for-Success.html

> 2) XLOG probes
>
> I think there is confuse placement of probes after merge. It needs  
> cleanup.
>
>
> 3) Executor probes
>
> I would like to see any use case for them/

I added them with two thoughts (and knowing that they cost nothing).
(1) you can trace them to assist in debugging an explain plan and to  
better understand the flow of the execution engine.  This is not a  
compelling reason, but a reason none-the-less.
(2) you can trace and existing long-running query for which you do not  
have the original plan (may have changed) and make an educated guess  
at the plan chosen at time of execution.

> 4) smgr probes
>
> I prefer to have this probes in smgr instead of md. The reason why  
> Robert put them into md is that it returns number of written/read  
> bytes, but it is "always" BLCKSZ which could be returned from smgr  
> directly. Only difference is
> when error occurs during write/read and not all data are written/read.
>
> It needs discuss.
>
> 5) autovacuum start probes
>
> I would like to see also stat/stop for any other process types. It  
> was discussed but no comment from author(s).
>
> 6) idle transaction
>
> See my comments
> http://reviewdemo.postgresql.org/r/25/
>
> 7) query-reewrite is missing
>
>
> 8) mark dirty and BM_HINT... flag
>
> I remove these because I don't see any use case for it. It would be  
> nice provide some dtrace script or describe basic ideas.


Perhaps I misunderstood what mark dirty does, but here was my thinking:

Because of the background writer, it is difficult to understand which  
postgres process (and thus query) induced disk writes.  Marking a page  
as dirty is a good indication that a query will be causing I/O and you  
can measure calls to mark dirty per query as a telling metric.

Perhaps I misunderstood, but I have a very serious problem that I  
can't reliably track write I/O to postgresql process ID as the  
bgwriter and the kernel are flushing those dirty blocks to disk while  
the process isn't running.  In my (albeit naive) tests, the mark dirty  
gave me quite expected results for correlating query execution to disk  
I/O to be induced.

--
Theo Schlossnagle
Esoteric Curio -- http://lethargy.org/
OmniTI Computer Consulting, Inc. -- http://omniti.com/



Re: Review: DTrace probes (merged version) ver_03

From
Zdenek Kotala
Date:
Theo Schlossnagle napsal(a):
> 
> On Jul 24, 2008, at 11:11 AM, Zdenek Kotala wrote:
> 
>> I performed review and I prepared own patch which contains only probes 
>> without any issue. I suggest commit this patch because the rest of 
>> patch is independent and it can be committed next commit fest after 
>> rework.
>>
>> I found following issues:
>>
>> 1) SLRU probes.
>>
>> I think it is good to have probes there but they needs polish. See my 
>> comments
>> http://reviewdemo.postgresql.org/r/25/
> 
> The slru's are quite useful and general enough to use easily.  I used 
> them to verify the metered checkpointing stuff:
> 
> http://lethargy.org/~jesus/archives/112-Probing-for-Success.html

I agree that SLRU probes are useful but I'm worry about implementation. I think 
that these probes need more work before commit. Currently there are several bugs 
in placement and arguments (from my point of view).

>> 3) Executor probes
>>
>> I would like to see any use case for them/
> 
> I added them with two thoughts (and knowing that they cost nothing).
> (1) you can trace them to assist in debugging an explain plan and to 
> better understand the flow of the execution engine.  This is not a 
> compelling reason, but a reason none-the-less.
> (2) you can trace and existing long-running query for which you do not 
> have the original plan (may have changed) and make an educated guess at 
> the plan chosen at time of execution.

I'm not executor expert and (1) is useful for me :-). What I'm thinking about is 
if we can mine more information from executor like number of tuples processed by 
node number and so on. I think that it needs discussion.

>> 8) mark dirty and BM_HINT... flag
>>
>> I remove these because I don't see any use case for it. It would be 
>> nice provide some dtrace script or describe basic ideas.
> 
> 
> Perhaps I misunderstood what mark dirty does, but here was my thinking:
> 
> Because of the background writer, it is difficult to understand which 
> postgres process (and thus query) induced disk writes.  Marking a page 
> as dirty is a good indication that a query will be causing I/O and you 
> can measure calls to mark dirty per query as a telling metric.
> 
> Perhaps I misunderstood, but I have a very serious problem that I can't 
> reliably track write I/O to postgresql process ID as the bgwriter and 
> the kernel are flushing those dirty blocks to disk while the process 
> isn't running.  In my (albeit naive) tests, the mark dirty gave me quite 
> expected results for correlating query execution to disk I/O to be induced.
> 

If I understand correctly you need to analyze number of writes per 
query/session. It seems to me, that to use mark dirty is good way, but it 
probably needs more probes. (Robert L. any idea?)

However what I suggested is commit probes without issue now and the rest will be 
processed on the next commit fest after rework/discussion.
    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:
> I performed review and I prepared own patch which contains only probes 
> without any issue. I suggest commit this patch because the rest of patch 
> is independent and it can be committed next commit fest after rework.
>
> I found following issues:

I noticed that CLOG, Subtrans and Multixact probes are added during a
regular Checkpoint, but not during a shutdown flush.  I think the probes
should count that too (probably with the same counter).

In the pgstat_report_activity probe, is it good to call the probe before
taking the fast path out?

In the BUFFER_READ_START probe, we do not include the smgrnblocks()
call, which could be significant since it includes a number of system
calls.

I think BUFFER_HIT and BUFFER_MISS should include the "isLocalBuf" flag.
I also wonder whether BUFFER_HIT should be called in the block above,
lines 220-238, where we check the "found" flag, i.e.
   if (isLocalBuf)   {       ReadLocalBufferCount++;       bufHdr = LocalBufferAlloc(smgr, blockNum, &found);       if
(found)      {           LocalBufferHitCount++;           TRACE_POSTGRESQL_BUFFER_HIT(true);        /* local buffer */
    }else{    TRACE_POSTGRESQL_BUFFER_MISS(true);        /* ditto */}   }   else   {       ReadBufferCount++;
 
       /*        * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is        * not currently in
memory.       */       bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);       if (found)       {
BufferHitCount++;          TRACE_POSTGRESQL_BUFFER_HIT(false);        /* not local */       }else{
TRACE_POSTGRESQL_BUFFER_MISS(false);   /* ditto */}   }
 

(note that this changes the semantics w.r.t. the isExtend flag).


I understand the desire to have DEADLOCK_FOUND, but is there really a
point in having a DEADLOCK_NOTFOUND probe?  Since this code runs every
time someone waits for a lock longer than a second, there would be a lot
of useless counts and nothing useful.

I find it bogus that we include query rewriting in QUERY_PARSE_START/DONE.  
I think query rewriting should be a separate probe.

QUERY_PLAN_START is badly placed -- it should be after the check for
utility commands (alternatively there could be a QUERY_PLAN_DONE in the
fast way out for utility commands, but in that case a "is utility" flag
would be needed.  I don't see that there's any point in tracing planning
of utility commands though).

Why are there no probes for the v3 protocol stuff?  There should
be probes for Parse, Bind, Execute message processing too, for
completeness.  Also, I wonder if these probes should be in the for(;;)
loop in PostgresMain() instead of sprinkled in the other routines.
I note that the probes in PortalRun and PortalRunMulti are schizophrenic
about whether they include utility functions or not.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review: DTrace probes (merged version) ver_03

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I performed review and I prepared own patch which contains only probes
> without any issue. I suggest commit this patch because the rest of
> patch is independent and it can be committed next commit fest after
> rework.

I looked at this patch a little bit.  In addition to the comments Alvaro
made, I have a couple more issues:

* The probes that pass buffer tag elements are already broken by the
pending "relation forks" patch: there is soon going to be another field
in buffer tags.  Perhaps it'd be feasible to pass the buffer tag as a 
single probe argument to make that a bit more future-proof?  I'm not
sure if that would complicate the use of the probe so much as to be
counterproductive.

* I find this to be truly bletcherous:

> !     /* 
> !      * Due to a bug in Mac OS X 10.5, using defined types (e.g. uintptr_t,
> !      * uint32_t, etc.) cause compilation error.  
> !      */
> !       
> !     probe transaction__start(unsigned int transactionId);
> !     probe transaction__commit(unsigned int transactionId);
> !     probe transaction__abort(unsigned int transactionId);

especially since some of the manual translations in the file are flat
out wrong (Oid is unsigned for instance).  Furthermore the comment is
wrong, at least according to my tests with XCode 3.1.  Typedefs seem to
work fine.  What doesn't work fine is #include "postgres.h", which would
be the ideal way to suck in TransactionId and other needed typedefs.
You can get dtrace to invoke the C preprocessor, but at least on OS X,
the D compiler spits up anyway on the contents of some of the system
header files that are pulled in by postgres.h.

What I suggest might be a reasonable compromise is to copy needed
typedefs directly into the probes.d file:
typedef unsigned int LocalTransactionId;
provider postgresql {
probe transaction__start(LocalTransactionId);

This at least makes it possible to declare the probes cleanly,
and it's fairly obvious what to fix if the principal definition of
LocalTransactionId ever changes.  I don't have Solaris to test on, but
on OS X this seems to behave the way we'd like: the typedef itself isn't
copied into the emitted probes.h file, but the emitted extern
declarations use it.
        regards, tom lane


Re: Review: DTrace probes (merged version) ver_03

From
Zdenek Kotala
Date:
Alvaro Herrera napsal(a):
> Zdenek Kotala wrote:
>> I performed review and I prepared own patch which contains only probes 
>> without any issue. I suggest commit this patch because the rest of patch 
>> is independent and it can be committed next commit fest after rework.
>>
>> I found following issues:
> 
> I noticed that CLOG, Subtrans and Multixact probes are added during a
> regular Checkpoint, but not during a shutdown flush.  I think the probes
> should count that too (probably with the same counter).

Yeah, good catch.

> In the pgstat_report_activity probe, is it good to call the probe before
> taking the fast path out?

If you mean to put it behind "if (!pgstat_track_activities || !beentry)" then I 
think that current placement is OK. You should be possibility to track it 
without dependency on pgstat_track_activities GUC variable. Keep in mind that 
DTrace is designed to monitor/trace production system and ability to monitor 
something without any configuration change is main idea.

> In the BUFFER_READ_START probe, we do not include the smgrnblocks()
> call, which could be significant since it includes a number of system
> calls.

It is because the BUFFER_READ_START needs to report correct blockno. My 
suggestion is to add probes to smgrblock function.

> I think BUFFER_HIT and BUFFER_MISS should include the "isLocalBuf" flag.
> I also wonder whether BUFFER_HIT should be called in the block above,
> lines 220-238, where we check the "found" flag, i.e.
> 
>     if (isLocalBuf)
>     {
>         ReadLocalBufferCount++;
>         bufHdr = LocalBufferAlloc(smgr, blockNum, &found);
>         if (found)
>         {
>             LocalBufferHitCount++;
>             TRACE_POSTGRESQL_BUFFER_HIT(true);        /* local buffer */
>         }
>     else
>     {
>         TRACE_POSTGRESQL_BUFFER_MISS(true);        /* ditto */
>     }
>     }
>     else
>     {
>         ReadBufferCount++;
> 
>         /*
>          * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
>          * not currently in memory.
>          */
>         bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);
>         if (found)
>         {
>             BufferHitCount++;
>             TRACE_POSTGRESQL_BUFFER_HIT(false);        /* not local */
>         }
>     else
>     {
>         TRACE_POSTGRESQL_BUFFER_MISS(false);    /* ditto */
>     }
>     }
> 
> (note that this changes the semantics w.r.t. the isExtend flag).

Good point. The question about isExted is if we want to have exact output 
include corner case or be to sync with ReadBufferCount counter. I prefer your 
suggested placement.

> 
> I understand the desire to have DEADLOCK_FOUND, but is there really a
> point in having a DEADLOCK_NOTFOUND probe?  Since this code runs every
> time someone waits for a lock longer than a second, there would be a lot
> of useless counts and nothing useful.

No idea. Robert any comment?

> I find it bogus that we include query rewriting in QUERY_PARSE_START/DONE.  
> I think query rewriting should be a separate probe.

agree

> QUERY_PLAN_START is badly placed -- it should be after the check for
> utility commands (alternatively there could be a QUERY_PLAN_DONE in the
> fast way out for utility commands, but in that case a "is utility" flag
> would be needed.  I don't see that there's any point in tracing planning
> of utility commands though).

agree

> Why are there no probes for the v3 protocol stuff?  There should
> be probes for Parse, Bind, Execute message processing too, for
> completeness.  Also, I wonder if these probes should be in the for(;;)
> loop in PostgresMain() instead of sprinkled in the other routines.
> I note that the probes in PortalRun and PortalRunMulti are schizophrenic
> about whether they include utility functions or not.

+1 It is good suggestion. I hope that Robert will put it on his probe todo list. 
Same like probes for memory management.
Zdenek


Re: Review: DTrace probes (merged version) ver_03

From
Zdenek Kotala
Date:
Tom Lane napsal(a):
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> I performed review and I prepared own patch which contains only probes
>> without any issue. I suggest commit this patch because the rest of
>> patch is independent and it can be committed next commit fest after
>> rework.
> 
> I looked at this patch a little bit.  In addition to the comments Alvaro
> made, I have a couple more issues:
> 
> * The probes that pass buffer tag elements are already broken by the
> pending "relation forks" patch: there is soon going to be another field
> in buffer tags.  Perhaps it'd be feasible to pass the buffer tag as a 
> single probe argument to make that a bit more future-proof?  I'm not
> sure if that would complicate the use of the probe so much as to be
> counterproductive.

It is difficult to say. If you pass only pointer to a structure you can mine all 
data from them but it is little bit complicated. It should be possible to use 
typedef and cast pointer to correct structure. However main problem is that you 
cannot use easily any indirect data in predicates. It needs extra magic. By my 
opinion best approach is use mix of approaches. Important data should be pass 
directly as a argument and rest as a pointer to data structure.

> * I find this to be truly bletcherous:

<snip>
> What doesn't work fine is #include "postgres.h", which would> be the ideal way to suck in TransactionId and other
neededtypedefs.
 

Whats about #include "c.h"? Does it work or does it have same issue?

> 
> What I suggest might be a reasonable compromise is to copy needed
> typedefs directly into the probes.d file:
> 
>     typedef unsigned int LocalTransactionId;
> 
>     provider postgresql {
> 
>     probe transaction__start(LocalTransactionId);
> 
> This at least makes it possible to declare the probes cleanly,
> and it's fairly obvious what to fix if the principal definition of
> LocalTransactionId ever changes.  I don't have Solaris to test on, but
> on OS X this seems to behave the way we'd like: the typedef itself isn't
> copied into the emitted probes.h file, but the emitted extern
> declarations use it.

It works on Solaris as well. However, I think that better solution is to put 
this typedef to the separate header file. DTrace script allows to use typedef or 
include header with typedefs. It would be better to have header file which could 
be used for probe.d and for any other DTrace script.

    Zdenek



Re: Review: DTrace probes (merged version) ver_03

From
Zdenek Kotala
Date:
Zdenek Kotala napsal(a):
> Alvaro Herrera napsal(a):
>> Zdenek Kotala wrote:
>>> I performed review and I prepared own patch which contains only 
>>> probes without any issue. I suggest commit this patch because the 
>>> rest of patch is independent and it can be committed next commit fest 
>>> after rework.
>>>
>>> I found following issues:
>>
>> I noticed that CLOG, Subtrans and Multixact probes are added during a
>> regular Checkpoint, but not during a shutdown flush.  I think the probes
>> should count that too (probably with the same counter).
> 
> Yeah, good catch.

When I'm thinking about it, It seems to me better idea to have

TRACE_POSTGRESQL_CLOG_SHUTDOWN_START();
TRACE_POSTGRESQL_XLOG_SHUTDOWN_START();

Because you will able to determine what was a reason for flush and how log take 
shutting down.


By the way why the shutdown order is following:

05617     CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
05618     ShutdownCLOG();
05619     ShutdownSUBTRANS();
05620     ShutdownMultiXact();

What does happen when kill -9/power lost comes between lines 5617 and 5618?
    Zdenek


Re: Review: DTrace probes (merged version) ver_03

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> Whats about #include "c.h"? Does it work or does it have same issue?

Same problem --- postgres.h isn't actually including any of the
problematic files for itself.

>> What I suggest might be a reasonable compromise is to copy needed
>> typedefs directly into the probes.d file:

> It works on Solaris as well. However, I think that better solution is to put 
> this typedef to the separate header file.

Not going to work, at least not in the general case, because the
typedefs involved are ultimately dependent on system headers that might
or might not be DTrace-clean.  In any case I'm going to resist doing
random restructuring of our core header files for the convenience
of DTrace --- if we take that approach, adding a new probe becomes
a major PITA instead of just a couple more lines of code.  You're likely
to find requests for new probes getting rejected because the required
header rearrangements are too ugly; do you want to take that risk?

We could put the typedefs into probes.d as a stopgap measure, hoping
that Apple (and Sun?) will someday fix their headers and/or the D
compiler so that #include'ing postgres.h works properly.
        regards, tom lane


Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Zdenek Kotala wrote:
> However what I suggested is commit probes without issue now and the 
> rest will be processed on the next commit fest after rework/discussion.
>
>      

Agreed. I'll fix up the remaining issues with the patch you sent.

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Zdenek Kotala wrote:
> Alvaro Herrera napsal(a):
>> Zdenek Kotala wrote:
>>> I performed review and I prepared own patch which contains only 
>>> probes without any issue. I suggest commit this patch because the 
>>> rest of patch is independent and it can be committed next commit 
>>> fest after rework.
>>>
>>> I found following issues:
>>
>> I noticed that CLOG, Subtrans and Multixact probes are added during a
>> regular Checkpoint, but not during a shutdown flush.  I think the probes
>> should count that too (probably with the same counter).
>
> Yeah, good catch.

Ok. I think the names should be the same but pass a true/false argument 
to differentiate the call just like how it's used in SimpleLruFlush.

e.g.

TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false) # shutdown case
TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true)

>
>> In the pgstat_report_activity probe, is it good to call the probe before
>> taking the fast path out?
>
> If you mean to put it behind "if (!pgstat_track_activities || 
> !beentry)" then I think that current placement is OK. You should be 
> possibility to track it without dependency on pgstat_track_activities 
> GUC variable. Keep in mind that DTrace is designed to monitor/trace 
> production system and ability to monitor something without any 
> configuration change is main idea.

This probe just prints out the statement, and it doesn't matter whether 
or not track_activities is enabled.

>
>> In the BUFFER_READ_START probe, we do not include the smgrnblocks()
>> call, which could be significant since it includes a number of system
>> calls.
>
> It is because the BUFFER_READ_START needs to report correct blockno. 
> My suggestion is to add probes to smgrblock function.

Yes, that's the main reason.

>
>> I think BUFFER_HIT and BUFFER_MISS should include the "isLocalBuf" flag.
>> I also wonder whether BUFFER_HIT should be called in the block above,
>> lines 220-238, where we check the "found" flag, i.e.
>>
>>     if (isLocalBuf)
>>     {
>>         ReadLocalBufferCount++;
>>         bufHdr = LocalBufferAlloc(smgr, blockNum, &found);
>>         if (found)
>>         {
>>             LocalBufferHitCount++;
>>             TRACE_POSTGRESQL_BUFFER_HIT(true);        /* local buffer */
>>         }
>>     else
>>     {
>>         TRACE_POSTGRESQL_BUFFER_MISS(true);        /* ditto */
>>     }
>>     }
>>     else
>>     {
>>         ReadBufferCount++;
>>
>>         /*
>>          * lookup the buffer.  IO_IN_PROGRESS is set if the requested 
>> block is
>>          * not currently in memory.
>>          */
>>         bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);
>>         if (found)
>>         {
>>             BufferHitCount++;
>>             TRACE_POSTGRESQL_BUFFER_HIT(false);        /* not local */
>>         }
>>     else
>>     {
>>         TRACE_POSTGRESQL_BUFFER_MISS(false);    /* ditto */
>>     }
>>     }
>>
>> (note that this changes the semantics w.r.t. the isExtend flag).
>
> Good point. The question about isExted is if we want to have exact 
> output include corner case or be to sync with ReadBufferCount counter. 
> I prefer your suggested placement.

Agree.

>
>>
>> I understand the desire to have DEADLOCK_FOUND, but is there really a
>> point in having a DEADLOCK_NOTFOUND probe?  Since this code runs every
>> time someone waits for a lock longer than a second, there would be a lot
>> of useless counts and nothing useful.
>
> No idea. Robert any comment?

Yes, you're right. Will delete.

>
>> I find it bogus that we include query rewriting in 
>> QUERY_PARSE_START/DONE.  I think query rewriting should be a separate 
>> probe.
>
> agree

Will fix. I was also thinking about having the probes by modules but 
wanted to limit the number of probes, but I'm fine having another one.

>
>> QUERY_PLAN_START is badly placed -- it should be after the check for
>> utility commands (alternatively there could be a QUERY_PLAN_DONE in the
>> fast way out for utility commands, but in that case a "is utility" flag
>> would be needed.  I don't see that there's any point in tracing planning
>> of utility commands though).
>
> agree
Ok

>
>> Why are there no probes for the v3 protocol stuff?  There should
>> be probes for Parse, Bind, Execute message processing too, for
>> completeness.  

Thanks for catching this.

>> Also, I wonder if these probes should be in the for(;;)
>> loop in PostgresMain() instead of sprinkled in the other routines.
>> I note that the probes in PortalRun and PortalRunMulti are schizophrenic
>> about whether they include utility functions or not.
As are as I can tell, the current probes in PortalRun/Multi don't 
include utility functions.  Should they be included?


I'll send the patch for the above changes tomorrow!

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Tom Lane wrote:
>
> * The probes that pass buffer tag elements are already broken by the
> pending "relation forks" patch: there is soon going to be another field
> in buffer tags.  Perhaps it'd be feasible to pass the buffer tag as a 
> single probe argument to make that a bit more future-proof?  I'm not
> sure if that would complicate the use of the probe so much as to be
> counterproductive.
>   
Are you referring to this struct?

typedef struct buftag
{       RelFileNode rnode;                      /* physical relation 
identifier */       BlockNumber blockNum;           /* blknum relative to begin of 
reln */
} BufferTag;

I'm not familiar with this pending patch, but why would it break when 
another field is added?

It's certainly possible to pass the buffer tag, but I'd say it's not a 
good idea if the other fields will never be used.

> * I find this to be truly bletcherous:
>
>   
>> !     /* 
>> !      * Due to a bug in Mac OS X 10.5, using defined types (e.g. uintptr_t,
>> !      * uint32_t, etc.) cause compilation error.  
>> !      */
>> !       
>> !     probe transaction__start(unsigned int transactionId);
>> !     probe transaction__commit(unsigned int transactionId);
>> !     probe transaction__abort(unsigned int transactionId);
>>     
>
> especially since some of the manual translations in the file are flat
> out wrong (Oid is unsigned for instance).
Oops!
>   Furthermore the comment is
> wrong, at least according to my tests with XCode 3.1.  Typedefs seem to
> work fine.

The issue is with Apple's dtrace implementation, not Xcode. For more 
info, please see the link below.

http://www.opensolaris.org/jive/thread.jspa?messageID=252503&#252503


> What I suggest might be a reasonable compromise is to copy needed
> typedefs directly into the probes.d file:
>
>     typedef unsigned int LocalTransactionId;
>
>     provider postgresql {
>
>     probe transaction__start(LocalTransactionId);
>
>   
I like this solution.

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Tom Lane
Date:
Robert Lor <Robert.Lor@Sun.COM> writes:
> Tom Lane wrote:
>> * The probes that pass buffer tag elements are already broken by the
>> pending "relation forks" patch: there is soon going to be another field
>> in buffer tags.

> I'm not familiar with this pending patch, but why would it break when 
> another field is added?

By "break" I meant "fail to function usefully".  Yes, it would still
compile, but if you don't have the fork number available then you won't
be able to tell what's really happening in the buffer pool.  You might
as well not pass any of the buffer tag as pass only part of it.

>> Furthermore the comment is
>> wrong, at least according to my tests with XCode 3.1.  Typedefs seem to
>> work fine.

> The issue is with Apple's dtrace implementation, not Xcode. For more 
> info, please see the link below.
> http://www.opensolaris.org/jive/thread.jspa?messageID=252503&#252503

I think what this is complaining about is whether allegedly built-in
typedefs like uintptr_t work.  What we care about is different: can
we write an explicit typedef in the .d file?  I do not know if that
worked in XCode 3.0 or not, but it seems to work fine in the version
of dtrace shipped in 3.1.  (And I'm perfectly fine with telling people
that they can't compile Postgres dtrace support with less than the most
recent tool set, especially since it'll be fairly old by the time 8.4
ships.)
        regards, tom lane


Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Tom Lane wrote:
> By "break" I meant "fail to function usefully".  Yes, it would still
> compile, but if you don't have the fork number available then you won't
> be able to tell what's really happening in the buffer pool.  You might
> as well not pass any of the buffer tag as pass only part of it.
>   

Got it.

>> The issue is with Apple's dtrace implementation, not Xcode. For more 
>> info, please see the link below.
>> http://www.opensolaris.org/jive/thread.jspa?messageID=252503&#252503
>>     
>
> I think what this is complaining about is whether allegedly built-in
> typedefs like uintptr_t work.

This is the message I tried to convey with the comment in probe.d, but I 
guess it was not clear.
>   What we care about is different: can
> we write an explicit typedef in the .d file? 

Yes.

>  I do not know if that
> worked in XCode 3.0 or not, but it seems to work fine in the version
> of dtrace shipped in 3.1.  (And I'm perfectly fine with telling people
> that they can't compile Postgres dtrace support with less than the most
> recent tool set, especially since it'll be fairly old by the time 8.4
> ships.)
>   
I tested on both Xcode 3.0 & 3.1 and both worked.


-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Updated patch attached.

This patch addresses all of Alvaro's feedback except the new probes for
v3 protocol which will be submitted later along with the rest of the probes.

It also incorporates Tom's feedback as explained inline below.

I hope this patch is good to go for this commit fest. Will take care of
the rest in the next fest.

Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>
>> I performed review and I prepared own patch which contains only probes
>> without any issue. I suggest commit this patch because the rest of
>> patch is independent and it can be committed next commit fest after
>> rework.
>>
>
> I looked at this patch a little bit.  In addition to the comments Alvaro
> made, I have a couple more issues:
>
> * The probes that pass buffer tag elements are already broken by the
> pending "relation forks" patch: there is soon going to be another field
> in buffer tags.  Perhaps it'd be feasible to pass the buffer tag as a
> single probe argument to make that a bit more future-proof?  I'm not
> sure if that would complicate the use of the probe so much as to be
> counterproductive.
>
>
>
Took out the buffer tag argument for now. Will figure out how to best
solve this after this relation forks patch is committed.
>
> What I suggest might be a reasonable compromise is to copy needed
> typedefs directly into the probes.d file:
>
>     typedef unsigned int LocalTransactionId;
>
>     provider postgresql {
>
>     probe transaction__start(LocalTransactionId);
>
> This at least makes it possible to declare the probes cleanly,
> and it's fairly obvious what to fix if the principal definition of
> LocalTransactionId ever changes.  I don't have Solaris to test on, but
> on OS X this seems to behave the way we'd like: the typedef itself isn't
> copied into the emitted probes.h file, but the emitted extern
> declarations use it.
>
>
Implemented this suggestion. There are some weirdness with the OS X
compiler causing some of the probe declarations not to compile (see
comments in probe.d).  The compiler spits out some warnings because the
types don't show up in the function prototype in probes.h, but the
probes work okay. I think we can safely ignore the warnings.

--
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql

Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.46
diff -u -3 -p -r1.46 clog.c
--- src/backend/access/transam/clog.c    1 Jan 2008 19:45:46 -0000    1.46
+++ src/backend/access/transam/clog.c    30 Jul 2008 04:02:32 -0000
@@ -36,6 +36,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "postmaster/bgwriter.h"
+#include "pg_trace.h"

 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -313,7 +314,9 @@ void
 ShutdownCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
     SimpleLruFlush(ClogCtl, false);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }

 /*
@@ -323,7 +326,9 @@ void
 CheckPointCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
     SimpleLruFlush(ClogCtl, true);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.27
diff -u -3 -p -r1.27 multixact.c
--- src/backend/access/transam/multixact.c    1 Jan 2008 19:45:46 -0000    1.27
+++ src/backend/access/transam/multixact.c    30 Jul 2008 04:02:33 -0000
@@ -57,6 +57,7 @@
 #include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "storage/procarray.h"
+#include "pg_trace.h"


 /*
@@ -1497,8 +1498,10 @@ void
 ShutdownMultiXact(void)
 {
     /* Flush dirty MultiXact pages to disk */
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false);
     SimpleLruFlush(MultiXactOffsetCtl, false);
     SimpleLruFlush(MultiXactMemberCtl, false);
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false);
 }

 /*
@@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow
 void
 CheckPointMultiXact(void)
 {
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true);
+
     /* Flush dirty MultiXact pages to disk */
     SimpleLruFlush(MultiXactOffsetCtl, true);
     SimpleLruFlush(MultiXactMemberCtl, true);
@@ -1540,6 +1545,8 @@ CheckPointMultiXact(void)
      */
     if (!InRecovery)
         TruncateMultiXact();
+
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
 }

 /*
Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.22
diff -u -3 -p -r1.22 subtrans.c
--- src/backend/access/transam/subtrans.c    26 Mar 2008 18:48:59 -0000    1.22
+++ src/backend/access/transam/subtrans.c    30 Jul 2008 04:02:34 -0000
@@ -32,6 +32,7 @@
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -265,7 +266,9 @@ ShutdownSUBTRANS(void)
      * This is not actually necessary from a correctness point of view. We do
      * it merely as a debugging aid.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(false);
     SimpleLruFlush(SubTransCtl, false);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(false);
 }

 /*
@@ -281,7 +284,9 @@ CheckPointSUBTRANS(void)
      * it merely to improve the odds that writing of dirty pages is done by
      * the checkpoint process and not by backends.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(true);
     SimpleLruFlush(SubTransCtl, true);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.43
diff -u -3 -p -r1.43 twophase.c
--- src/backend/access/transam/twophase.c    19 May 2008 18:16:26 -0000    1.43
+++ src/backend/access/transam/twophase.c    30 Jul 2008 04:02:35 -0000
@@ -57,6 +57,7 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /*
@@ -1387,6 +1388,9 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
      */
     if (max_prepared_xacts <= 0)
         return;                    /* nothing to do */
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START();
+
     xids = (TransactionId *) palloc(max_prepared_xacts * sizeof(TransactionId));
     nxids = 0;

@@ -1444,6 +1448,8 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
     }

     pfree(xids);
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 }

 /*
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -u -3 -p -r1.176 pgstat.c
--- src/backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
+++ src/backend/postmaster/pgstat.c    30 Jul 2008 04:02:37 -0000
@@ -61,6 +61,7 @@
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/tqual.h"
+#include "pg_trace.h"


 /* ----------
@@ -2202,6 +2203,8 @@ pgstat_report_activity(const char *cmd_s
     TimestampTz start_timestamp;
     int            len;

+    TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
     if (!pgstat_track_activities || !beentry)
         return;

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.234
diff -u -3 -p -r1.234 bufmgr.c
--- src/backend/storage/buffer/bufmgr.c    13 Jul 2008 20:45:47 -0000    1.234
+++ src/backend/storage/buffer/bufmgr.c    30 Jul 2008 04:02:38 -0000
@@ -43,6 +43,7 @@
 #include "utils/rel.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"


 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -213,12 +214,22 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (isExtend)
         blockNum = smgrnblocks(smgr);

+    TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode,
+        smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf);
+
     if (isLocalBuf)
     {
         ReadLocalBufferCount++;
         bufHdr = LocalBufferAlloc(smgr, blockNum, &found);
         if (found)
+        {
             LocalBufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(true); /* true == local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(true); /* ditto */
+        }
     }
     else
     {
@@ -230,7 +241,14 @@ ReadBuffer_common(SMgrRelation smgr, boo
          */
         bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);
         if (found)
+        {
             BufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(false); /* false != local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(false); /* ditto */
+        }
     }

     /* At this point we do NOT hold any locks. */
@@ -246,6 +264,11 @@ ReadBuffer_common(SMgrRelation smgr, boo
             if (VacuumCostActive)
                 VacuumCostBalance += VacuumCostPageHit;

+            TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                smgr->smgr_rnode.spcNode,
+                smgr->smgr_rnode.dbNode,
+                smgr->smgr_rnode.relNode, isLocalBuf, found);
+
             return BufferDescriptorGetBuffer(bufHdr);
         }

@@ -368,6 +391,10 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (VacuumCostActive)
         VacuumCostBalance += VacuumCostPageMiss;

+    TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode,
+            smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+            isLocalBuf, found);
+
     return BufferDescriptorGetBuffer(bufHdr);
 }

@@ -1086,6 +1113,8 @@ BufferSync(int flags)
     if (num_to_write == 0)
         return;                    /* nothing to do */

+    TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
+
     /*
      * Loop over all buffers again, and write the ones (still) marked with
      * BM_CHECKPOINT_NEEDED.  In this loop, we start at the clock sweep point
@@ -1117,6 +1146,7 @@ BufferSync(int flags)
         {
             if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
             {
+                TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
                 BgWriterStats.m_buf_written_checkpoints++;
                 num_written++;

@@ -1147,6 +1177,8 @@ BufferSync(int flags)
             buf_id = 0;
     }

+    TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
+
     /*
      * Update checkpoint statistics. As noted above, this doesn't include
      * buffers written by other backends or bgwriter scan.
@@ -1653,11 +1685,13 @@ PrintBufferLeakWarning(Buffer buffer)
 void
 CheckPointBuffers(int flags)
 {
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
     BufferSync(flags);
     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
     smgrsync();
     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
 }


@@ -1759,6 +1793,10 @@ FlushBuffer(volatile BufferDesc *buf, SM
     if (reln == NULL)
         reln = smgropen(buf->tag.rnode);

+    TRACE_POSTGRESQL_BUFFER_FLUSH_START(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode);
+
     /*
      * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
      * rule that log updates must hit disk before any of the data-file changes
@@ -1785,6 +1823,9 @@ FlushBuffer(volatile BufferDesc *buf, SM

     BufferFlushCount++;

+    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode);
+
     /*
      * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
      * end the io_in_progress state.
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -u -3 -p -r1.53 deadlock.c
--- src/backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
+++ src/backend/storage/lmgr/deadlock.c    30 Jul 2008 04:02:39 -0000
@@ -30,6 +30,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* One edge in the waits-for graph */
@@ -222,6 +223,8 @@ DeadLockCheck(PGPROC *proc)
          */
         int            nSoftEdges;

+        TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
         nWaitOrders = 0;
         if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
             elog(FATAL, "deadlock seems to have disappeared");
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.183
diff -u -3 -p -r1.183 lock.c
--- src/backend/storage/lmgr/lock.c    17 Mar 2008 19:44:41 -0000    1.183
+++ src/backend/storage/lmgr/lock.c    30 Jul 2008 04:02:40 -0000
@@ -787,11 +787,11 @@ LockAcquire(const LOCKTAG *locktag,
          * Sleep till someone wakes me up.
          */

-        TRACE_POSTGRESQL_LOCK_STARTWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field2, lockmode);

         WaitOnLock(locallock, owner);

-        TRACE_POSTGRESQL_LOCK_ENDWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field2, lockmode);

         /*
          * NOTE: do not do any material change of state between here and
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lwlock.c,v
retrieving revision 1.51
diff -u -3 -p -r1.51 lwlock.c
--- src/backend/storage/lmgr/lwlock.c    17 Mar 2008 19:44:41 -0000    1.51
+++ src/backend/storage/lmgr/lwlock.c    30 Jul 2008 04:02:40 -0000
@@ -448,7 +448,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
         block_counts[lockid]++;
 #endif

-        TRACE_POSTGRESQL_LWLOCK_STARTWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);

         for (;;)
         {
@@ -459,7 +459,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
             extraWaits++;
         }

-        TRACE_POSTGRESQL_LWLOCK_ENDWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);

         LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.554
diff -u -3 -p -r1.554 postgres.c
--- src/backend/tcop/postgres.c    18 Jul 2008 20:26:06 -0000    1.554
+++ src/backend/tcop/postgres.c    30 Jul 2008 04:02:42 -0000
@@ -71,6 +71,7 @@
 #include "mb/pg_wchar.h"

 #include "pgstat.h"
+#include "pg_trace.h"

 extern int    optind;
 extern char *optarg;
@@ -551,6 +552,8 @@ pg_parse_query(const char *query_string)
 {
     List       *raw_parsetree_list;

+    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
     if (log_parser_stats)
         ResetUsage();

@@ -572,6 +575,8 @@ pg_parse_query(const char *query_string)
     }
 #endif

+    TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
+
     return raw_parsetree_list;
 }

@@ -591,6 +596,8 @@ pg_analyze_and_rewrite(Node *parsetree,
     Query       *query;
     List       *querytree_list;

+    TRACE_POSTGRESQL_QUERY_REWRITE_START(query_string);
+
     /*
      * (1) Perform parse analysis.
      */
@@ -607,6 +614,8 @@ pg_analyze_and_rewrite(Node *parsetree,
      */
     querytree_list = pg_rewrite_query(query);

+    TRACE_POSTGRESQL_QUERY_REWRITE_DONE(query_string);
+
     return querytree_list;
 }

@@ -677,6 +686,8 @@ pg_plan_query(Query *querytree, int curs
     if (querytree->commandType == CMD_UTILITY)
         return NULL;

+    TRACE_POSTGRESQL_QUERY_PLAN_START();
+
     if (log_planner_stats)
         ResetUsage();

@@ -711,6 +722,8 @@ pg_plan_query(Query *querytree, int curs
     if (Debug_print_plan)
         elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);

+    TRACE_POSTGRESQL_QUERY_PLAN_DONE();
+
     return plan;
 }

@@ -785,6 +798,7 @@ exec_simple_query(const char *query_stri
     bool        isTopLevel;
     char        msec_str[32];

+
     /*
      * Report query to various monitoring facilities.
      */
@@ -792,6 +806,8 @@ exec_simple_query(const char *query_stri

     pgstat_report_activity(query_string);

+    TRACE_POSTGRESQL_QUERY_START(query_string);
+
     /*
      * We use save_log_statement_stats so ShowUsage doesn't report incorrect
      * results because ResetUsage wasn't called.
@@ -1058,6 +1074,8 @@ exec_simple_query(const char *query_stri
     if (save_log_statement_stats)
         ShowUsage("QUERY STATISTICS");

+    TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
     debug_query_string = NULL;
 }

Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.123
diff -u -3 -p -r1.123 pquery.c
--- src/backend/tcop/pquery.c    12 May 2008 20:02:02 -0000    1.123
+++ src/backend/tcop/pquery.c    30 Jul 2008 04:02:43 -0000
@@ -24,6 +24,7 @@
 #include "tcop/utility.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -711,6 +712,8 @@ PortalRun(Portal portal, long count, boo

     AssertArg(PortalIsValid(portal));

+    TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
     /* Initialize completion tag to empty string */
     if (completionTag)
         completionTag[0] = '\0';
@@ -857,6 +860,8 @@ PortalRun(Portal portal, long count, boo

     if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
         ShowUsage("EXECUTOR STATISTICS");
+
+    TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

     return result;
 }
@@ -1237,6 +1242,8 @@ PortalRunMulti(Portal portal, bool isTop
              */
             PlannedStmt *pstmt = (PlannedStmt *) stmt;

+            TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
             if (log_executor_stats)
                 ResetUsage();

@@ -1257,6 +1264,8 @@ PortalRunMulti(Portal portal, bool isTop

             if (log_executor_stats)
                 ShowUsage("EXECUTOR STATISTICS");
+
+            TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
         }
         else
         {
Index: src/backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -u -3 -p -r1.2 probes.d
--- src/backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
+++ src/backend/utils/probes.d    30 Jul 2008 04:02:43 -0000
@@ -7,18 +7,99 @@
  * ----------
  */

+
+typedef unsigned int LocalTransactionId;
+typedef int LWLockId;
+typedef int LWLockMode;
+typedef unsigned int locktag_field2;
+typedef int LOCKMODE;
+typedef const char * query_string;
+typedef int sortType;
+typedef int trueFalse;
+typedef int nkeys;
+typedef int workMem;
+typedef int randomAccess;
+typedef unsigned long LogicalTapeSetPtr;
+typedef long spaceUsed;
+typedef unsigned int BlockNumber;
+typedef unsigned int Oid;
+typedef int isLocalBuf;
+typedef int found;
+typedef int flags;
+typedef int num_to_write;
+typedef int num_written;
+typedef int NBuffers;
+typedef int buf_id;
+
+
 provider postgresql {

-probe transaction__start(int);
-probe transaction__commit(int);
-probe transaction__abort(int);
-probe lwlock__acquire(int, int);
-probe lwlock__release(int);
-probe lwlock__startwait(int, int);
-probe lwlock__endwait(int, int);
-probe lwlock__condacquire(int, int);
-probe lwlock__condacquire__fail(int, int);
-probe lock__startwait(int, int);
-probe lock__endwait(int, int);
+    /*
+     * Due to a bug in Mac OS X 10.5, using built-in typedefs (e.g. uintptr_t,
+     * uint32_t, etc.) cause compilation errors.
+     */
+
+    probe transaction__start(LocalTransactionId);
+    probe transaction__commit(LocalTransactionId);
+    probe transaction__abort(LocalTransactionId);
+
+    probe lwlock__acquire(LWLockId, LWLockMode);
+    probe lwlock__release(LWLockId);
+    probe lwlock__wait__start(LWLockId, LWLockMode);
+    probe lwlock__wait__done(LWLockId, LWLockMode);
+    probe lwlock__condacquire(LWLockId, LWLockMode);
+    probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe lock__wait__start(locktag_field2, LOCKMODE);
+     * probe lock__wait__done(locktag_field2, LOCKMODE);
+     */
+    probe lock__wait__start(unsigned int, int);
+    probe lock__wait__done(unsigned int, int);
+
+    probe query__parse__start(query_string);
+    probe query__parse__done(query_string);
+    probe query__rewrite__start(query_string);
+    probe query__rewrite__done(query_string);
+    probe query__plan__start();
+    probe query__plan__done();
+    probe query__execute__start();
+    probe query__execute__done();
+    probe query__start(query_string);
+    probe query__done(query_string);
+    probe statement__status(query_string);
+
+    probe sort__start(sortType, trueFalse, nkeys, workMem, randomAccess);
+    probe sort__end(LogicalTapeSetPtr, spaceUsed);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe buffer__read__start(BlockNumber, Oid, Oid, Oid, isLocalBuf);
+     * probe buffer__read__done(BlockNumber, Oid, Oid, Oid, isLocalBuf, found);
+     */
+    probe buffer__read__start(unsigned int, unsigned int, unsigned int, unsigned int, int);
+    probe buffer__read__done(unsigned int, unsigned int, unsigned int, unsigned int, int, int);
+
+    probe buffer__flush__start(Oid, Oid, Oid);
+    probe buffer__flush__done(Oid, Oid, Oid);
+
+    probe buffer__hit(trueFalse);
+    probe buffer__miss(trueFalse);
+    probe buffer__checkpoint__start(flags);
+    probe buffer__checkpoint__done();
+    probe buffer__sync__start(NBuffers, num_to_write);
+    probe buffer__sync__written(buf_id);
+    probe buffer__sync__done(NBuffers, num_written, num_to_write);
+
+    probe deadlock__found();

+    probe clog__checkpoint__start(trueFalse);
+    probe clog__checkpoint__done(trueFalse);
+    probe subtrans__checkpoint__start(trueFalse);
+    probe subtrans__checkpoint__done(trueFalse);
+    probe multixact__checkpoint__start(trueFalse);
+    probe multixact__checkpoint__done(trueFalse);
+    probe twophase__checkpoint__start();
+    probe twophase__checkpoint__done();
 };
Index: src/backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.85
diff -u -3 -p -r1.85 tuplesort.c
--- src/backend/utils/sort/tuplesort.c    19 Jun 2008 00:46:05 -0000    1.85
+++ src/backend/utils/sort/tuplesort.c    30 Jul 2008 04:02:45 -0000
@@ -115,12 +115,18 @@
 #include "utils/rel.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* GUC variables */
 #ifdef TRACE_SORT
 bool        trace_sort = false;
 #endif
+
+#define HEAP_SORT    0
+#define INDEX_SORT    1
+#define DATUM_SORT    2
+
 #ifdef DEBUG_BOUNDED_SORT
 bool        optimize_bounded_sort = true;
 #endif
@@ -569,6 +575,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
              "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
              nkeys, workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

     state->nKeys = nkeys;

@@ -636,6 +643,8 @@ tuplesort_begin_index_btree(Relation ind

     state->nKeys = RelationGetNumberOfAttributes(indexRel);

+    TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
     state->comparetup = comparetup_index_btree;
     state->copytup = copytup_index;
     state->writetup = writetup_index;
@@ -713,6 +722,7 @@ tuplesort_begin_datum(Oid datumType,
              "begin datum sort: workMem = %d, randomAccess = %c",
              workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

     state->nKeys = 1;            /* always a one-column sort */

@@ -825,6 +835,11 @@ tuplesort_end(Tuplesortstate *state)
     }
 #endif

+    TRACE_POSTGRESQL_SORT_END(state->tapeset,
+            (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) :
+            (state->allowedMem - state->availMem + 1023) / 1024));
+
+
     MemoryContextSwitchTo(oldcontext);

     /*

Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Robert Lor wrote:

Hi,

>> What I suggest might be a reasonable compromise is to copy needed
>> typedefs directly into the probes.d file:

> Implemented this suggestion. There are some weirdness with the OS X  
> compiler causing some of the probe declarations not to compile (see  
> comments in probe.d).  The compiler spits out some warnings because the  
> types don't show up in the function prototype in probes.h, but the  
> probes work okay. I think we can safely ignore the warnings.

These make sense, because they are already typedef's in our code:

> +typedef unsigned int LocalTransactionId;
> +typedef int LWLockId;
> +typedef int LWLockMode;
> +typedef int LOCKMODE;
> +typedef unsigned int BlockNumber;
> +typedef unsigned int Oid;

But I don't see a reason to define the rest:

> +typedef unsigned int locktag_field2;
> +typedef const char * query_string;
> +typedef int sortType;
> +typedef int trueFalse;
> +typedef int nkeys;
> +typedef int workMem;
> +typedef int randomAccess;
> +typedef unsigned long LogicalTapeSetPtr;
> +typedef long spaceUsed;
> +typedef int isLocalBuf;
> +typedef int found;
> +typedef int flags;
> +typedef int num_to_write;
> +typedef int num_written;
> +typedef int NBuffers;
> +typedef int buf_id;

I think you should add a #define Size, perhaps #define bool, and use
those where applicable, and the plain types (int, long, etc) in the rest.

> +    /* The following probe declarations cause compilation errors
> +         * on Mac OS X but not on Solaris. Need further investigation.
> +     * probe lock__wait__start(locktag_field2, LOCKMODE);
> +     * probe lock__wait__done(locktag_field2, LOCKMODE);
> +     */
> +    probe lock__wait__start(unsigned int, int);
> +    probe lock__wait__done(unsigned int, int);

For example I think this should look like
probe lock__wait__start(unsigned int, LOCKMODE);

That Mac OS X problem merits some extra investigation, I think.

Other than this, I think this patch can be committed.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:

Alvaro Herrera wrote:
> But I don't see a reason to define the rest:
>
>
>> +typedef unsigned int locktag_field2;
>> +typedef const char * query_string;
>> +typedef int sortType;
>> +typedef int trueFalse;
>> +typedef int nkeys;
>> +typedef int workMem;
>> +typedef int randomAccess;
>> +typedef unsigned long LogicalTapeSetPtr;
>> +typedef long spaceUsed;
>> +typedef int isLocalBuf;
>> +typedef int found;
>> +typedef int flags;
>> +typedef int num_to_write;
>> +typedef int num_written;
>> +typedef int NBuffers;
>> +typedef int buf_id;
>>
>
> I think you should add a #define Size, perhaps #define bool, and use
> those where applicable, and the plain types (int, long, etc) in the rest.
>

Fixed. Patch attached.
>
> That Mac OS X problem merits some extra investigation, I think.
>
I'm investigating this one and will find the root cause, but I don't
think it should hold back this patch.
> Other than this, I think this patch can be committed.
>
>
I'd appreciate if it can be committed today.


Alvaro, thanks a bunch for the feedback!

--
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql

Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.46
diff -u -3 -p -r1.46 clog.c
--- src/backend/access/transam/clog.c    1 Jan 2008 19:45:46 -0000    1.46
+++ src/backend/access/transam/clog.c    31 Jul 2008 20:09:15 -0000
@@ -36,6 +36,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "postmaster/bgwriter.h"
+#include "pg_trace.h"

 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -313,7 +314,9 @@ void
 ShutdownCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
     SimpleLruFlush(ClogCtl, false);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }

 /*
@@ -323,7 +326,9 @@ void
 CheckPointCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
     SimpleLruFlush(ClogCtl, true);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.27
diff -u -3 -p -r1.27 multixact.c
--- src/backend/access/transam/multixact.c    1 Jan 2008 19:45:46 -0000    1.27
+++ src/backend/access/transam/multixact.c    31 Jul 2008 20:09:16 -0000
@@ -57,6 +57,7 @@
 #include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "storage/procarray.h"
+#include "pg_trace.h"


 /*
@@ -1497,8 +1498,10 @@ void
 ShutdownMultiXact(void)
 {
     /* Flush dirty MultiXact pages to disk */
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false);
     SimpleLruFlush(MultiXactOffsetCtl, false);
     SimpleLruFlush(MultiXactMemberCtl, false);
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false);
 }

 /*
@@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow
 void
 CheckPointMultiXact(void)
 {
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true);
+
     /* Flush dirty MultiXact pages to disk */
     SimpleLruFlush(MultiXactOffsetCtl, true);
     SimpleLruFlush(MultiXactMemberCtl, true);
@@ -1540,6 +1545,8 @@ CheckPointMultiXact(void)
      */
     if (!InRecovery)
         TruncateMultiXact();
+
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
 }

 /*
Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.22
diff -u -3 -p -r1.22 subtrans.c
--- src/backend/access/transam/subtrans.c    26 Mar 2008 18:48:59 -0000    1.22
+++ src/backend/access/transam/subtrans.c    31 Jul 2008 20:09:17 -0000
@@ -32,6 +32,7 @@
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -265,7 +266,9 @@ ShutdownSUBTRANS(void)
      * This is not actually necessary from a correctness point of view. We do
      * it merely as a debugging aid.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(false);
     SimpleLruFlush(SubTransCtl, false);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(false);
 }

 /*
@@ -281,7 +284,9 @@ CheckPointSUBTRANS(void)
      * it merely to improve the odds that writing of dirty pages is done by
      * the checkpoint process and not by backends.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(true);
     SimpleLruFlush(SubTransCtl, true);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.43
diff -u -3 -p -r1.43 twophase.c
--- src/backend/access/transam/twophase.c    19 May 2008 18:16:26 -0000    1.43
+++ src/backend/access/transam/twophase.c    31 Jul 2008 20:09:18 -0000
@@ -57,6 +57,7 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /*
@@ -1387,6 +1388,9 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
      */
     if (max_prepared_xacts <= 0)
         return;                    /* nothing to do */
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START();
+
     xids = (TransactionId *) palloc(max_prepared_xacts * sizeof(TransactionId));
     nxids = 0;

@@ -1444,6 +1448,8 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
     }

     pfree(xids);
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 }

 /*
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -u -3 -p -r1.176 pgstat.c
--- src/backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
+++ src/backend/postmaster/pgstat.c    31 Jul 2008 20:09:20 -0000
@@ -61,6 +61,7 @@
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/tqual.h"
+#include "pg_trace.h"


 /* ----------
@@ -2202,6 +2203,8 @@ pgstat_report_activity(const char *cmd_s
     TimestampTz start_timestamp;
     int            len;

+    TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
     if (!pgstat_track_activities || !beentry)
         return;

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.234
diff -u -3 -p -r1.234 bufmgr.c
--- src/backend/storage/buffer/bufmgr.c    13 Jul 2008 20:45:47 -0000    1.234
+++ src/backend/storage/buffer/bufmgr.c    31 Jul 2008 20:09:21 -0000
@@ -43,6 +43,7 @@
 #include "utils/rel.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"


 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -213,12 +214,22 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (isExtend)
         blockNum = smgrnblocks(smgr);

+    TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode,
+        smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf);
+
     if (isLocalBuf)
     {
         ReadLocalBufferCount++;
         bufHdr = LocalBufferAlloc(smgr, blockNum, &found);
         if (found)
+        {
             LocalBufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(true); /* true == local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(true); /* ditto */
+        }
     }
     else
     {
@@ -230,7 +241,14 @@ ReadBuffer_common(SMgrRelation smgr, boo
          */
         bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);
         if (found)
+        {
             BufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(false); /* false != local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(false); /* ditto */
+        }
     }

     /* At this point we do NOT hold any locks. */
@@ -246,6 +264,11 @@ ReadBuffer_common(SMgrRelation smgr, boo
             if (VacuumCostActive)
                 VacuumCostBalance += VacuumCostPageHit;

+            TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                smgr->smgr_rnode.spcNode,
+                smgr->smgr_rnode.dbNode,
+                smgr->smgr_rnode.relNode, isLocalBuf, found);
+
             return BufferDescriptorGetBuffer(bufHdr);
         }

@@ -368,6 +391,10 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (VacuumCostActive)
         VacuumCostBalance += VacuumCostPageMiss;

+    TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode,
+            smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+            isLocalBuf, found);
+
     return BufferDescriptorGetBuffer(bufHdr);
 }

@@ -1086,6 +1113,8 @@ BufferSync(int flags)
     if (num_to_write == 0)
         return;                    /* nothing to do */

+    TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
+
     /*
      * Loop over all buffers again, and write the ones (still) marked with
      * BM_CHECKPOINT_NEEDED.  In this loop, we start at the clock sweep point
@@ -1117,6 +1146,7 @@ BufferSync(int flags)
         {
             if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
             {
+                TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
                 BgWriterStats.m_buf_written_checkpoints++;
                 num_written++;

@@ -1147,6 +1177,8 @@ BufferSync(int flags)
             buf_id = 0;
     }

+    TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
+
     /*
      * Update checkpoint statistics. As noted above, this doesn't include
      * buffers written by other backends or bgwriter scan.
@@ -1653,11 +1685,13 @@ PrintBufferLeakWarning(Buffer buffer)
 void
 CheckPointBuffers(int flags)
 {
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
     BufferSync(flags);
     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
     smgrsync();
     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
 }


@@ -1759,6 +1793,10 @@ FlushBuffer(volatile BufferDesc *buf, SM
     if (reln == NULL)
         reln = smgropen(buf->tag.rnode);

+    TRACE_POSTGRESQL_BUFFER_FLUSH_START(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode);
+
     /*
      * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
      * rule that log updates must hit disk before any of the data-file changes
@@ -1785,6 +1823,9 @@ FlushBuffer(volatile BufferDesc *buf, SM

     BufferFlushCount++;

+    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode);
+
     /*
      * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
      * end the io_in_progress state.
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -u -3 -p -r1.53 deadlock.c
--- src/backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
+++ src/backend/storage/lmgr/deadlock.c    31 Jul 2008 20:09:22 -0000
@@ -30,6 +30,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* One edge in the waits-for graph */
@@ -222,6 +223,8 @@ DeadLockCheck(PGPROC *proc)
          */
         int            nSoftEdges;

+        TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
         nWaitOrders = 0;
         if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
             elog(FATAL, "deadlock seems to have disappeared");
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.183
diff -u -3 -p -r1.183 lock.c
--- src/backend/storage/lmgr/lock.c    17 Mar 2008 19:44:41 -0000    1.183
+++ src/backend/storage/lmgr/lock.c    31 Jul 2008 20:09:23 -0000
@@ -787,11 +787,11 @@ LockAcquire(const LOCKTAG *locktag,
          * Sleep till someone wakes me up.
          */

-        TRACE_POSTGRESQL_LOCK_STARTWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field2, lockmode);

         WaitOnLock(locallock, owner);

-        TRACE_POSTGRESQL_LOCK_ENDWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field2, lockmode);

         /*
          * NOTE: do not do any material change of state between here and
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lwlock.c,v
retrieving revision 1.51
diff -u -3 -p -r1.51 lwlock.c
--- src/backend/storage/lmgr/lwlock.c    17 Mar 2008 19:44:41 -0000    1.51
+++ src/backend/storage/lmgr/lwlock.c    31 Jul 2008 20:09:24 -0000
@@ -448,7 +448,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
         block_counts[lockid]++;
 #endif

-        TRACE_POSTGRESQL_LWLOCK_STARTWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);

         for (;;)
         {
@@ -459,7 +459,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
             extraWaits++;
         }

-        TRACE_POSTGRESQL_LWLOCK_ENDWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);

         LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.554
diff -u -3 -p -r1.554 postgres.c
--- src/backend/tcop/postgres.c    18 Jul 2008 20:26:06 -0000    1.554
+++ src/backend/tcop/postgres.c    31 Jul 2008 20:09:26 -0000
@@ -71,6 +71,7 @@
 #include "mb/pg_wchar.h"

 #include "pgstat.h"
+#include "pg_trace.h"

 extern int    optind;
 extern char *optarg;
@@ -551,6 +552,8 @@ pg_parse_query(const char *query_string)
 {
     List       *raw_parsetree_list;

+    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
     if (log_parser_stats)
         ResetUsage();

@@ -572,6 +575,8 @@ pg_parse_query(const char *query_string)
     }
 #endif

+    TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
+
     return raw_parsetree_list;
 }

@@ -591,6 +596,8 @@ pg_analyze_and_rewrite(Node *parsetree,
     Query       *query;
     List       *querytree_list;

+    TRACE_POSTGRESQL_QUERY_REWRITE_START(query_string);
+
     /*
      * (1) Perform parse analysis.
      */
@@ -607,6 +614,8 @@ pg_analyze_and_rewrite(Node *parsetree,
      */
     querytree_list = pg_rewrite_query(query);

+    TRACE_POSTGRESQL_QUERY_REWRITE_DONE(query_string);
+
     return querytree_list;
 }

@@ -677,6 +686,8 @@ pg_plan_query(Query *querytree, int curs
     if (querytree->commandType == CMD_UTILITY)
         return NULL;

+    TRACE_POSTGRESQL_QUERY_PLAN_START();
+
     if (log_planner_stats)
         ResetUsage();

@@ -711,6 +722,8 @@ pg_plan_query(Query *querytree, int curs
     if (Debug_print_plan)
         elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);

+    TRACE_POSTGRESQL_QUERY_PLAN_DONE();
+
     return plan;
 }

@@ -785,6 +798,7 @@ exec_simple_query(const char *query_stri
     bool        isTopLevel;
     char        msec_str[32];

+
     /*
      * Report query to various monitoring facilities.
      */
@@ -792,6 +806,8 @@ exec_simple_query(const char *query_stri

     pgstat_report_activity(query_string);

+    TRACE_POSTGRESQL_QUERY_START(query_string);
+
     /*
      * We use save_log_statement_stats so ShowUsage doesn't report incorrect
      * results because ResetUsage wasn't called.
@@ -1058,6 +1074,8 @@ exec_simple_query(const char *query_stri
     if (save_log_statement_stats)
         ShowUsage("QUERY STATISTICS");

+    TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
     debug_query_string = NULL;
 }

Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.123
diff -u -3 -p -r1.123 pquery.c
--- src/backend/tcop/pquery.c    12 May 2008 20:02:02 -0000    1.123
+++ src/backend/tcop/pquery.c    31 Jul 2008 20:09:27 -0000
@@ -24,6 +24,7 @@
 #include "tcop/utility.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -711,6 +712,8 @@ PortalRun(Portal portal, long count, boo

     AssertArg(PortalIsValid(portal));

+    TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
     /* Initialize completion tag to empty string */
     if (completionTag)
         completionTag[0] = '\0';
@@ -857,6 +860,8 @@ PortalRun(Portal portal, long count, boo

     if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
         ShowUsage("EXECUTOR STATISTICS");
+
+    TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

     return result;
 }
@@ -1237,6 +1242,8 @@ PortalRunMulti(Portal portal, bool isTop
              */
             PlannedStmt *pstmt = (PlannedStmt *) stmt;

+            TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
             if (log_executor_stats)
                 ResetUsage();

@@ -1257,6 +1264,8 @@ PortalRunMulti(Portal portal, bool isTop

             if (log_executor_stats)
                 ShowUsage("EXECUTOR STATISTICS");
+
+            TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
         }
         else
         {
Index: src/backend/utils/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/Makefile,v
retrieving revision 1.27
diff -u -3 -p -r1.27 Makefile
--- src/backend/utils/Makefile    17 Mar 2008 19:44:41 -0000    1.27
+++ src/backend/utils/Makefile    31 Jul 2008 20:09:27 -0000
@@ -22,7 +22,7 @@ fmgroids.h fmgrtab.c: Gen_fmgrtab.sh $(t

 probes.h: probes.d
 ifeq ($(enable_dtrace), yes)
-    $(DTRACE) -h -s $< -o $@.tmp
+    $(DTRACE) -C -h -s $< -o $@.tmp
     sed -e 's/POSTGRESQL_/TRACE_POSTGRESQL_/g' $@.tmp >$@
     rm $@.tmp
 else
Index: src/backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -u -3 -p -r1.2 probes.d
--- src/backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
+++ src/backend/utils/probes.d    31 Jul 2008 20:09:27 -0000
@@ -7,18 +7,85 @@
  * ----------
  */

+
+/* typdefs used in PostgreSQL */
+typedef unsigned int LocalTransactionId;
+typedef int LWLockId;
+typedef int LWLockMode;
+typedef int LOCKMODE;
+typedef unsigned int BlockNumber;
+typedef unsigned int Oid;
+
+#define bool char
+
 provider postgresql {

-probe transaction__start(int);
-probe transaction__commit(int);
-probe transaction__abort(int);
-probe lwlock__acquire(int, int);
-probe lwlock__release(int);
-probe lwlock__startwait(int, int);
-probe lwlock__endwait(int, int);
-probe lwlock__condacquire(int, int);
-probe lwlock__condacquire__fail(int, int);
-probe lock__startwait(int, int);
-probe lock__endwait(int, int);
+    /*
+     * Due to a bug in Mac OS X 10.5, using built-in typedefs (e.g. uintptr_t,
+     * uint32_t, etc.) cause compilation errors.
+     */
+
+    probe transaction__start(LocalTransactionId);
+    probe transaction__commit(LocalTransactionId);
+    probe transaction__abort(LocalTransactionId);
+
+    probe lwlock__acquire(LWLockId, LWLockMode);
+    probe lwlock__release(LWLockId);
+    probe lwlock__wait__start(LWLockId, LWLockMode);
+    probe lwlock__wait__done(LWLockId, LWLockMode);
+    probe lwlock__condacquire(LWLockId, LWLockMode);
+    probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe lock__wait__start(unsigned int locktag_field2, LOCKMODE);
+     * probe lock__wait__done(unsigned int locktag_field2, LOCKMODE);
+     */
+    probe lock__wait__start(unsigned int, int);
+    probe lock__wait__done(unsigned int, int);
+
+    probe query__parse__start(const char *query_string);
+    probe query__parse__done(const char *query_string);
+    probe query__rewrite__start(const char *query_string);
+    probe query__rewrite__done(const char *query_string);
+    probe query__plan__start();
+    probe query__plan__done();
+    probe query__execute__start();
+    probe query__execute__done();
+    probe query__start(const char *query_string);
+    probe query__done(const char *query_string);
+    probe statement__status(const char *query_string);
+
+    probe sort__start(int sortType, bool trueFalse, int nkeys, int workMem, bool randomAccess);
+    probe sort__end(unsigned long LogicalTapeSetPtr, long spaceUsed);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe buffer__read__start(BlockNumber, Oid, Oid, Oid, bool isLocalBuf);
+     * probe buffer__read__done(BlockNumber, Oid, Oid, Oid, bool isLocalBuf, bool found);
+     */
+    probe buffer__read__start(unsigned int, unsigned int, unsigned int, unsigned int, bool);
+    probe buffer__read__done(unsigned int, unsigned int, unsigned int, unsigned int, bool, bool);
+
+    probe buffer__flush__start(Oid, Oid, Oid);
+    probe buffer__flush__done(Oid, Oid, Oid);
+
+    probe buffer__hit(bool trueFalse);
+    probe buffer__miss(bool trueFalse);
+    probe buffer__checkpoint__start(int flags);
+    probe buffer__checkpoint__done();
+    probe buffer__sync__start(int NBuffers, int num_to_write);
+    probe buffer__sync__written(int buf_id);
+    probe buffer__sync__done(int NBuffers, int num_written, int num_to_write);
+
+    probe deadlock__found();

+    probe clog__checkpoint__start(bool trueFalse);
+    probe clog__checkpoint__done(bool trueFalse);
+    probe subtrans__checkpoint__start(bool trueFalse);
+    probe subtrans__checkpoint__done(bool trueFalse);
+    probe multixact__checkpoint__start(bool trueFalse);
+    probe multixact__checkpoint__done(bool trueFalse);
+    probe twophase__checkpoint__start();
+    probe twophase__checkpoint__done();
 };
Index: src/backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.85
diff -u -3 -p -r1.85 tuplesort.c
--- src/backend/utils/sort/tuplesort.c    19 Jun 2008 00:46:05 -0000    1.85
+++ src/backend/utils/sort/tuplesort.c    31 Jul 2008 20:09:29 -0000
@@ -115,12 +115,18 @@
 #include "utils/rel.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* GUC variables */
 #ifdef TRACE_SORT
 bool        trace_sort = false;
 #endif
+
+#define HEAP_SORT    0
+#define INDEX_SORT    1
+#define DATUM_SORT    2
+
 #ifdef DEBUG_BOUNDED_SORT
 bool        optimize_bounded_sort = true;
 #endif
@@ -569,6 +575,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
              "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
              nkeys, workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

     state->nKeys = nkeys;

@@ -636,6 +643,8 @@ tuplesort_begin_index_btree(Relation ind

     state->nKeys = RelationGetNumberOfAttributes(indexRel);

+    TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
     state->comparetup = comparetup_index_btree;
     state->copytup = copytup_index;
     state->writetup = writetup_index;
@@ -713,6 +722,7 @@ tuplesort_begin_datum(Oid datumType,
              "begin datum sort: workMem = %d, randomAccess = %c",
              workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

     state->nKeys = 1;            /* always a one-column sort */

@@ -825,6 +835,11 @@ tuplesort_end(Tuplesortstate *state)
     }
 #endif

+    TRACE_POSTGRESQL_SORT_END(state->tapeset,
+            (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) :
+            (state->allowedMem - state->availMem + 1023) / 1024));
+
+
     MemoryContextSwitchTo(oldcontext);

     /*

Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Robert Lor wrote:

>> That Mac OS X problem merits some extra investigation, I think.
>>
> I'm investigating this one and will find the root cause, but I don't
> think it should hold back this patch.
>> Other than this, I think this patch can be committed.
>>
> I'd appreciate if it can be committed today.

I'm looking at it.

FWIW I found that the machinery to compile on non-probes-enabled
machines needs to be updated per the attached patch.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Robert Lor wrote:

> Tom Lane wrote:

>> * The probes that pass buffer tag elements are already broken by the
>> pending "relation forks" patch: there is soon going to be another field
>> in buffer tags.  Perhaps it'd be feasible to pass the buffer tag as a  
>> single probe argument to make that a bit more future-proof?  I'm not
>> sure if that would complicate the use of the probe so much as to be
>> counterproductive.
>
> Took out the buffer tag argument for now. Will figure out how to best  
> solve this after this relation forks patch is committed.

I was checking the DTrace docs for other reasons and I came across this,
which maybe can be useful here:

http://docs.sun.com/app/docs/doc/817-6223/chp-xlate?a=view

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Here's what I have.  Please confirm that this compiles for you.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Alvaro Herrera wrote:
> Here's what I have.  Please confirm that this compiles for you.
>
I made some changes to the sed script so it works with the sed on
Solaris & OS X. I tested this patch on both Solaris and OS X with DTrace
enabled and disabled and also verified that the sed script works with
GNU sed. I hope this is the final change for this patch. Thanks for
catching all the issues, and my bad for not testing with DTrace disabled.

>
> ------------------------------------------------------------------------
>
>


--
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql

Index: src/backend/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/Makefile,v
retrieving revision 1.128
diff -u -3 -p -r1.128 Makefile
--- src/backend/Makefile    17 Mar 2008 19:44:40 -0000    1.128
+++ src/backend/Makefile    1 Aug 2008 03:56:13 -0000
@@ -147,7 +147,7 @@ $(top_builddir)/src/include/utils/probes

 ifeq ($(PORTNAME), solaris)
 utils/probes.o: utils/probes.d $(SUBDIROBJS)
-    $(DTRACE) $(DTRACEFLAGS) -G -s $(call expand_subsys,$^) -o $@
+    $(DTRACE) $(DTRACEFLAGS) -C -G -s $(call expand_subsys,$^) -o $@
 endif


Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.46
diff -u -3 -p -r1.46 clog.c
--- src/backend/access/transam/clog.c    1 Jan 2008 19:45:46 -0000    1.46
+++ src/backend/access/transam/clog.c    1 Aug 2008 03:56:14 -0000
@@ -36,6 +36,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "postmaster/bgwriter.h"
+#include "pg_trace.h"

 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -313,7 +314,9 @@ void
 ShutdownCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
     SimpleLruFlush(ClogCtl, false);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }

 /*
@@ -323,7 +326,9 @@ void
 CheckPointCLOG(void)
 {
     /* Flush dirty CLOG pages to disk */
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
     SimpleLruFlush(ClogCtl, true);
+    TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.27
diff -u -3 -p -r1.27 multixact.c
--- src/backend/access/transam/multixact.c    1 Jan 2008 19:45:46 -0000    1.27
+++ src/backend/access/transam/multixact.c    1 Aug 2008 03:56:15 -0000
@@ -57,6 +57,7 @@
 #include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "storage/procarray.h"
+#include "pg_trace.h"


 /*
@@ -1497,8 +1498,10 @@ void
 ShutdownMultiXact(void)
 {
     /* Flush dirty MultiXact pages to disk */
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(false);
     SimpleLruFlush(MultiXactOffsetCtl, false);
     SimpleLruFlush(MultiXactMemberCtl, false);
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(false);
 }

 /*
@@ -1526,6 +1529,8 @@ MultiXactGetCheckptMulti(bool is_shutdow
 void
 CheckPointMultiXact(void)
 {
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_START(true);
+
     /* Flush dirty MultiXact pages to disk */
     SimpleLruFlush(MultiXactOffsetCtl, true);
     SimpleLruFlush(MultiXactMemberCtl, true);
@@ -1540,6 +1545,8 @@ CheckPointMultiXact(void)
      */
     if (!InRecovery)
         TruncateMultiXact();
+
+    TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
 }

 /*
Index: src/backend/access/transam/subtrans.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/subtrans.c,v
retrieving revision 1.22
diff -u -3 -p -r1.22 subtrans.c
--- src/backend/access/transam/subtrans.c    26 Mar 2008 18:48:59 -0000    1.22
+++ src/backend/access/transam/subtrans.c    1 Aug 2008 03:56:15 -0000
@@ -32,6 +32,7 @@
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -265,7 +266,9 @@ ShutdownSUBTRANS(void)
      * This is not actually necessary from a correctness point of view. We do
      * it merely as a debugging aid.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(false);
     SimpleLruFlush(SubTransCtl, false);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(false);
 }

 /*
@@ -281,7 +284,9 @@ CheckPointSUBTRANS(void)
      * it merely to improve the odds that writing of dirty pages is done by
      * the checkpoint process and not by backends.
      */
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_START(true);
     SimpleLruFlush(SubTransCtl, true);
+    TRACE_POSTGRESQL_SUBTRANS_CHECKPOINT_DONE(true);
 }


Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.43
diff -u -3 -p -r1.43 twophase.c
--- src/backend/access/transam/twophase.c    19 May 2008 18:16:26 -0000    1.43
+++ src/backend/access/transam/twophase.c    1 Aug 2008 03:56:16 -0000
@@ -57,6 +57,7 @@
 #include "storage/smgr.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /*
@@ -1387,6 +1388,9 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
      */
     if (max_prepared_xacts <= 0)
         return;                    /* nothing to do */
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START();
+
     xids = (TransactionId *) palloc(max_prepared_xacts * sizeof(TransactionId));
     nxids = 0;

@@ -1444,6 +1448,8 @@ CheckPointTwoPhase(XLogRecPtr redo_horiz
     }

     pfree(xids);
+
+    TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 }

 /*
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -u -3 -p -r1.176 pgstat.c
--- src/backend/postmaster/pgstat.c    30 Jun 2008 10:58:47 -0000    1.176
+++ src/backend/postmaster/pgstat.c    1 Aug 2008 03:56:18 -0000
@@ -61,6 +61,7 @@
 #include "utils/ps_status.h"
 #include "utils/rel.h"
 #include "utils/tqual.h"
+#include "pg_trace.h"


 /* ----------
@@ -2202,6 +2203,8 @@ pgstat_report_activity(const char *cmd_s
     TimestampTz start_timestamp;
     int            len;

+    TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
+
     if (!pgstat_track_activities || !beentry)
         return;

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.234
diff -u -3 -p -r1.234 bufmgr.c
--- src/backend/storage/buffer/bufmgr.c    13 Jul 2008 20:45:47 -0000    1.234
+++ src/backend/storage/buffer/bufmgr.c    1 Aug 2008 03:56:20 -0000
@@ -43,6 +43,7 @@
 #include "utils/rel.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"


 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -213,12 +214,22 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (isExtend)
         blockNum = smgrnblocks(smgr);

+    TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, smgr->smgr_rnode.spcNode,
+        smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode, isLocalBuf);
+
     if (isLocalBuf)
     {
         ReadLocalBufferCount++;
         bufHdr = LocalBufferAlloc(smgr, blockNum, &found);
         if (found)
+        {
             LocalBufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(true); /* true == local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(true); /* ditto */
+        }
     }
     else
     {
@@ -230,7 +241,14 @@ ReadBuffer_common(SMgrRelation smgr, boo
          */
         bufHdr = BufferAlloc(smgr, blockNum, strategy, &found);
         if (found)
+        {
             BufferHitCount++;
+            TRACE_POSTGRESQL_BUFFER_HIT(false); /* false != local buffer */
+        }
+        else
+        {
+            TRACE_POSTGRESQL_BUFFER_MISS(false); /* ditto */
+        }
     }

     /* At this point we do NOT hold any locks. */
@@ -246,6 +264,11 @@ ReadBuffer_common(SMgrRelation smgr, boo
             if (VacuumCostActive)
                 VacuumCostBalance += VacuumCostPageHit;

+            TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                smgr->smgr_rnode.spcNode,
+                smgr->smgr_rnode.dbNode,
+                smgr->smgr_rnode.relNode, isLocalBuf, found);
+
             return BufferDescriptorGetBuffer(bufHdr);
         }

@@ -368,6 +391,10 @@ ReadBuffer_common(SMgrRelation smgr, boo
     if (VacuumCostActive)
         VacuumCostBalance += VacuumCostPageMiss;

+    TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, smgr->smgr_rnode.spcNode,
+            smgr->smgr_rnode.dbNode, smgr->smgr_rnode.relNode,
+            isLocalBuf, found);
+
     return BufferDescriptorGetBuffer(bufHdr);
 }

@@ -1086,6 +1113,8 @@ BufferSync(int flags)
     if (num_to_write == 0)
         return;                    /* nothing to do */

+    TRACE_POSTGRESQL_BUFFER_SYNC_START(NBuffers, num_to_write);
+
     /*
      * Loop over all buffers again, and write the ones (still) marked with
      * BM_CHECKPOINT_NEEDED.  In this loop, we start at the clock sweep point
@@ -1117,6 +1146,7 @@ BufferSync(int flags)
         {
             if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
             {
+                TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
                 BgWriterStats.m_buf_written_checkpoints++;
                 num_written++;

@@ -1147,6 +1177,8 @@ BufferSync(int flags)
             buf_id = 0;
     }

+    TRACE_POSTGRESQL_BUFFER_SYNC_DONE(NBuffers, num_written, num_to_write);
+
     /*
      * Update checkpoint statistics. As noted above, this doesn't include
      * buffers written by other backends or bgwriter scan.
@@ -1653,11 +1685,13 @@ PrintBufferLeakWarning(Buffer buffer)
 void
 CheckPointBuffers(int flags)
 {
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
     BufferSync(flags);
     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
     smgrsync();
     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
+    TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
 }


@@ -1759,6 +1793,10 @@ FlushBuffer(volatile BufferDesc *buf, SM
     if (reln == NULL)
         reln = smgropen(buf->tag.rnode);

+    TRACE_POSTGRESQL_BUFFER_FLUSH_START(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode);
+
     /*
      * Force XLOG flush up to buffer's LSN.  This implements the basic WAL
      * rule that log updates must hit disk before any of the data-file changes
@@ -1785,6 +1823,9 @@ FlushBuffer(volatile BufferDesc *buf, SM

     BufferFlushCount++;

+    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
+         reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode);
+
     /*
      * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
      * end the io_in_progress state.
Index: src/backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -u -3 -p -r1.53 deadlock.c
--- src/backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
+++ src/backend/storage/lmgr/deadlock.c    1 Aug 2008 03:56:21 -0000
@@ -30,6 +30,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* One edge in the waits-for graph */
@@ -222,6 +223,8 @@ DeadLockCheck(PGPROC *proc)
          */
         int            nSoftEdges;

+        TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
         nWaitOrders = 0;
         if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
             elog(FATAL, "deadlock seems to have disappeared");
Index: src/backend/storage/lmgr/lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lock.c,v
retrieving revision 1.183
diff -u -3 -p -r1.183 lock.c
--- src/backend/storage/lmgr/lock.c    17 Mar 2008 19:44:41 -0000    1.183
+++ src/backend/storage/lmgr/lock.c    1 Aug 2008 03:56:22 -0000
@@ -787,11 +787,11 @@ LockAcquire(const LOCKTAG *locktag,
          * Sleep till someone wakes me up.
          */

-        TRACE_POSTGRESQL_LOCK_STARTWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field2, lockmode);

         WaitOnLock(locallock, owner);

-        TRACE_POSTGRESQL_LOCK_ENDWAIT(locktag->locktag_field2, lockmode);
+        TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field2, lockmode);

         /*
          * NOTE: do not do any material change of state between here and
Index: src/backend/storage/lmgr/lwlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/lwlock.c,v
retrieving revision 1.51
diff -u -3 -p -r1.51 lwlock.c
--- src/backend/storage/lmgr/lwlock.c    17 Mar 2008 19:44:41 -0000    1.51
+++ src/backend/storage/lmgr/lwlock.c    1 Aug 2008 03:56:22 -0000
@@ -448,7 +448,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
         block_counts[lockid]++;
 #endif

-        TRACE_POSTGRESQL_LWLOCK_STARTWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);

         for (;;)
         {
@@ -459,7 +459,7 @@ LWLockAcquire(LWLockId lockid, LWLockMod
             extraWaits++;
         }

-        TRACE_POSTGRESQL_LWLOCK_ENDWAIT(lockid, mode);
+        TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);

         LOG_LWDEBUG("LWLockAcquire", lockid, "awakened");

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.554
diff -u -3 -p -r1.554 postgres.c
--- src/backend/tcop/postgres.c    18 Jul 2008 20:26:06 -0000    1.554
+++ src/backend/tcop/postgres.c    1 Aug 2008 03:56:24 -0000
@@ -71,6 +71,7 @@
 #include "mb/pg_wchar.h"

 #include "pgstat.h"
+#include "pg_trace.h"

 extern int    optind;
 extern char *optarg;
@@ -551,6 +552,8 @@ pg_parse_query(const char *query_string)
 {
     List       *raw_parsetree_list;

+    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
     if (log_parser_stats)
         ResetUsage();

@@ -572,6 +575,8 @@ pg_parse_query(const char *query_string)
     }
 #endif

+    TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
+
     return raw_parsetree_list;
 }

@@ -591,6 +596,8 @@ pg_analyze_and_rewrite(Node *parsetree,
     Query       *query;
     List       *querytree_list;

+    TRACE_POSTGRESQL_QUERY_REWRITE_START(query_string);
+
     /*
      * (1) Perform parse analysis.
      */
@@ -607,6 +614,8 @@ pg_analyze_and_rewrite(Node *parsetree,
      */
     querytree_list = pg_rewrite_query(query);

+    TRACE_POSTGRESQL_QUERY_REWRITE_DONE(query_string);
+
     return querytree_list;
 }

@@ -677,6 +686,8 @@ pg_plan_query(Query *querytree, int curs
     if (querytree->commandType == CMD_UTILITY)
         return NULL;

+    TRACE_POSTGRESQL_QUERY_PLAN_START();
+
     if (log_planner_stats)
         ResetUsage();

@@ -711,6 +722,8 @@ pg_plan_query(Query *querytree, int curs
     if (Debug_print_plan)
         elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);

+    TRACE_POSTGRESQL_QUERY_PLAN_DONE();
+
     return plan;
 }

@@ -785,6 +798,7 @@ exec_simple_query(const char *query_stri
     bool        isTopLevel;
     char        msec_str[32];

+
     /*
      * Report query to various monitoring facilities.
      */
@@ -792,6 +806,8 @@ exec_simple_query(const char *query_stri

     pgstat_report_activity(query_string);

+    TRACE_POSTGRESQL_QUERY_START(query_string);
+
     /*
      * We use save_log_statement_stats so ShowUsage doesn't report incorrect
      * results because ResetUsage wasn't called.
@@ -1058,6 +1074,8 @@ exec_simple_query(const char *query_stri
     if (save_log_statement_stats)
         ShowUsage("QUERY STATISTICS");

+    TRACE_POSTGRESQL_QUERY_DONE(query_string);
+
     debug_query_string = NULL;
 }

Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.123
diff -u -3 -p -r1.123 pquery.c
--- src/backend/tcop/pquery.c    12 May 2008 20:02:02 -0000    1.123
+++ src/backend/tcop/pquery.c    1 Aug 2008 03:56:25 -0000
@@ -24,6 +24,7 @@
 #include "tcop/utility.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -711,6 +712,8 @@ PortalRun(Portal portal, long count, boo

     AssertArg(PortalIsValid(portal));

+    TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
     /* Initialize completion tag to empty string */
     if (completionTag)
         completionTag[0] = '\0';
@@ -857,6 +860,8 @@ PortalRun(Portal portal, long count, boo

     if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
         ShowUsage("EXECUTOR STATISTICS");
+
+    TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

     return result;
 }
@@ -1237,6 +1242,8 @@ PortalRunMulti(Portal portal, bool isTop
              */
             PlannedStmt *pstmt = (PlannedStmt *) stmt;

+            TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
             if (log_executor_stats)
                 ResetUsage();

@@ -1257,6 +1264,8 @@ PortalRunMulti(Portal portal, bool isTop

             if (log_executor_stats)
                 ShowUsage("EXECUTOR STATISTICS");
+
+            TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
         }
         else
         {
Index: src/backend/utils/Gen_dummy_probes.sed
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/Gen_dummy_probes.sed,v
retrieving revision 1.1
diff -u -3 -p -r1.1 Gen_dummy_probes.sed
--- src/backend/utils/Gen_dummy_probes.sed    17 Mar 2008 19:44:41 -0000    1.1
+++ src/backend/utils/Gen_dummy_probes.sed    1 Aug 2008 03:56:25 -0000
@@ -6,11 +6,16 @@
 # $PostgreSQL: pgsql/src/backend/utils/Gen_dummy_probes.sed,v 1.1 2008/03/17 19:44:41 petere Exp $
 #-------------------------------------------------------------------------

-/^probe /!d
-s/^probe \([^(]*\)\(.*\);/\1\2/
+/^[     ]*probe /!d
+s/^[     ]*probe \([^(]*\)\(.*\);/\1\2/
 s/__/_/g
 y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
 s/^/#define TRACE_POSTGRESQL_/
-s/(INT, INT)/(INT1, INT2)/
+s/([^,)]\{1,\})/(INT1)/
+s/([^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2)/
+s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3)/
+s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4)/
+s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5)/
+s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5, INT6)/
 P
 s/(.*$/_ENABLED() (0)/
Index: src/backend/utils/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/Makefile,v
retrieving revision 1.27
diff -u -3 -p -r1.27 Makefile
--- src/backend/utils/Makefile    17 Mar 2008 19:44:41 -0000    1.27
+++ src/backend/utils/Makefile    1 Aug 2008 03:56:25 -0000
@@ -20,9 +20,13 @@ $(SUBDIRS:%=%-recursive): fmgroids.h
 fmgroids.h fmgrtab.c: Gen_fmgrtab.sh $(top_srcdir)/src/include/catalog/pg_proc.h
     AWK='$(AWK)' $(SHELL) $< $(top_srcdir)/src/include/catalog/pg_proc.h

+ifneq ($(enable_dtrace), yes)
+probes.h: Gen_dummy_probes.sed
+endif
+
 probes.h: probes.d
 ifeq ($(enable_dtrace), yes)
-    $(DTRACE) -h -s $< -o $@.tmp
+    $(DTRACE) -C -h -s $< -o $@.tmp
     sed -e 's/POSTGRESQL_/TRACE_POSTGRESQL_/g' $@.tmp >$@
     rm $@.tmp
 else
Index: src/backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -u -3 -p -r1.2 probes.d
--- src/backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
+++ src/backend/utils/probes.d    1 Aug 2008 03:56:27 -0000
@@ -7,18 +7,85 @@
  * ----------
  */

+
+/* typdefs used in PostgreSQL */
+typedef unsigned int LocalTransactionId;
+typedef int LWLockId;
+typedef int LWLockMode;
+typedef int LOCKMODE;
+typedef unsigned int BlockNumber;
+typedef unsigned int Oid;
+
+#define bool char
+
 provider postgresql {

-probe transaction__start(int);
-probe transaction__commit(int);
-probe transaction__abort(int);
-probe lwlock__acquire(int, int);
-probe lwlock__release(int);
-probe lwlock__startwait(int, int);
-probe lwlock__endwait(int, int);
-probe lwlock__condacquire(int, int);
-probe lwlock__condacquire__fail(int, int);
-probe lock__startwait(int, int);
-probe lock__endwait(int, int);
+    /*
+     * Due to a bug in Mac OS X 10.5, using built-in typedefs (e.g. uintptr_t,
+     * uint32_t, etc.) cause compilation errors.
+     */
+
+    probe transaction__start(LocalTransactionId);
+    probe transaction__commit(LocalTransactionId);
+    probe transaction__abort(LocalTransactionId);
+
+    probe lwlock__acquire(LWLockId, LWLockMode);
+    probe lwlock__release(LWLockId);
+    probe lwlock__wait__start(LWLockId, LWLockMode);
+    probe lwlock__wait__done(LWLockId, LWLockMode);
+    probe lwlock__condacquire(LWLockId, LWLockMode);
+    probe lwlock__condacquire__fail(LWLockId, LWLockMode);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe lock__wait__start(unsigned int, LOCKMODE);
+     * probe lock__wait__done(unsigned int, LOCKMODE);
+     */
+    probe lock__wait__start(unsigned int, int);
+    probe lock__wait__done(unsigned int, int);
+
+    probe query__parse__start(const char *);
+    probe query__parse__done(const char *);
+    probe query__rewrite__start(const char *);
+    probe query__rewrite__done(const char *);
+    probe query__plan__start();
+    probe query__plan__done();
+    probe query__execute__start();
+    probe query__execute__done();
+    probe query__start(const char *);
+    probe query__done(const char *);
+    probe statement__status(const char *);
+
+    probe sort__start(int, bool, int, int, bool);
+    probe sort__end(unsigned long, long);
+
+    /* The following probe declarations cause compilation errors
+         * on Mac OS X but not on Solaris. Need further investigation.
+     * probe buffer__read__start(BlockNumber, Oid, Oid, Oid, bool);
+     * probe buffer__read__done(BlockNumber, Oid, Oid, Oid, bool, bool);
+     */
+    probe buffer__read__start(unsigned int, unsigned int, unsigned int, unsigned int, bool);
+    probe buffer__read__done(unsigned int, unsigned int, unsigned int, unsigned int, bool, bool);
+
+    probe buffer__flush__start(Oid, Oid, Oid);
+    probe buffer__flush__done(Oid, Oid, Oid);
+
+    probe buffer__hit(bool);
+    probe buffer__miss(bool);
+    probe buffer__checkpoint__start(int);
+    probe buffer__checkpoint__done();
+    probe buffer__sync__start(int, int);
+    probe buffer__sync__written(int);
+    probe buffer__sync__done(int, int, int);
+
+    probe deadlock__found();

+    probe clog__checkpoint__start(bool);
+    probe clog__checkpoint__done(bool);
+    probe subtrans__checkpoint__start(bool);
+    probe subtrans__checkpoint__done(bool);
+    probe multixact__checkpoint__start(bool);
+    probe multixact__checkpoint__done(bool);
+    probe twophase__checkpoint__start();
+    probe twophase__checkpoint__done();
 };
Index: src/backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.85
diff -u -3 -p -r1.85 tuplesort.c
--- src/backend/utils/sort/tuplesort.c    19 Jun 2008 00:46:05 -0000    1.85
+++ src/backend/utils/sort/tuplesort.c    1 Aug 2008 03:56:30 -0000
@@ -115,12 +115,18 @@
 #include "utils/rel.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* GUC variables */
 #ifdef TRACE_SORT
 bool        trace_sort = false;
 #endif
+
+#define HEAP_SORT    0
+#define INDEX_SORT    1
+#define DATUM_SORT    2
+
 #ifdef DEBUG_BOUNDED_SORT
 bool        optimize_bounded_sort = true;
 #endif
@@ -569,6 +575,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
              "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
              nkeys, workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

     state->nKeys = nkeys;

@@ -636,6 +643,8 @@ tuplesort_begin_index_btree(Relation ind

     state->nKeys = RelationGetNumberOfAttributes(indexRel);

+    TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
     state->comparetup = comparetup_index_btree;
     state->copytup = copytup_index;
     state->writetup = writetup_index;
@@ -713,6 +722,7 @@ tuplesort_begin_datum(Oid datumType,
              "begin datum sort: workMem = %d, randomAccess = %c",
              workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

     state->nKeys = 1;            /* always a one-column sort */

@@ -825,6 +835,11 @@ tuplesort_end(Tuplesortstate *state)
     }
 #endif

+    TRACE_POSTGRESQL_SORT_END(state->tapeset,
+            (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) :
+            (state->allowedMem - state->availMem + 1023) / 1024));
+
+
     MemoryContextSwitchTo(oldcontext);

     /*

Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Alvaro Herrera wrote:
> I was checking the DTrace docs for other reasons and I came across this,
> which maybe can be useful here:
>
> http://docs.sun.com/app/docs/doc/817-6223/chp-xlate?a=view
>
>   
Yes, I  think using the translator  is the best approach to expose 
internal structures in a stable manner.

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Robert Lor wrote:

> I made some changes to the sed script so it works with the sed on  
> Solaris & OS X. I tested this patch on both Solaris and OS X with DTrace  
> enabled and disabled and also verified that the sed script works with  
> GNU sed. I hope this is the final change for this patch. Thanks for  
> catching all the issues, and my bad for not testing with DTrace disabled.

Applied.

Something seems wrong with this argument:

> +    /* The following probe declarations cause compilation errors
> +         * on Mac OS X but not on Solaris. Need further investigation.
> +     * probe buffer__read__start(BlockNumber, Oid, Oid, Oid, bool);
> +     * probe buffer__read__done(BlockNumber, Oid, Oid, Oid, bool, bool);
> +     */
> +    probe buffer__read__start(unsigned int, unsigned int, unsigned int, unsigned int, bool);
> +    probe buffer__read__done(unsigned int, unsigned int, unsigned int, unsigned int, bool, bool);
> +
> +    probe buffer__flush__start(Oid, Oid, Oid);
> +    probe buffer__flush__done(Oid, Oid, Oid);

How come "Oid" works for FLUSH_START but not READ_START and READ_DONE?

Also, I wonder if there's any proof that this works at all on Mac OS X,
given that the rule to create probes.o from probes.d is conditionally
pulled in only for Solaris?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Applied.

I forgot to mention that I renamed the sort_end probe to sort_done, to
keep the naming convention.  It is a backwards-incompatible change, but
there were plenty other probes renamed so my guess is that one more does
not matter all that much.

Also, it seems I cannot sort pg_trace and pgstat consistently :-(

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Alvaro Herrera wrote:
>
> Applied.
>   
Thanks!
>> How come "Oid" works for FLUSH_START but not READ_START and READ_DONE?
>>     

I'll get the answer for this.

>> Also, I wonder if there's any proof that this works at all on Mac OS X,
>> given that the rule to create probes.o from probes.d is conditionally
>> pulled in only for Solaris?
>>     
It does work on OS X. I dare everyone to try it :-)

The implementation of DTrace on OS X is a bit different than on Solaris, 
so the rule your referring to is only needed for Solaris.


-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
>   
>> Applied.
>>     
>
> I forgot to mention that I renamed the sort_end probe to sort_done, to
> keep the naming convention.  It is a backwards-incompatible change, but
> there were plenty other probes renamed so my guess is that one more does
> not matter all that much.
>   
It was overlooked :-( . Good catch!
> Also, it seems I cannot sort pg_trace and pgstat consistently :-(
>
>   
Not sure what you're trying to say here!

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Greg Smith
Date:
One tiny change I'd suggest here:  if you look at the code for checkpoint 
buffer writing there are traces for two points in the process:
 CheckPointBuffers(int flags) {
+       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);        CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
    BufferSync(flags);        CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();        smgrsync();
CheckpointStats.ckpt_sync_end_t= GetCurrentTimestamp();
 
+       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE(); }

Note how the existing code also tracks how long the sync phase took 
compared to the write one, and reports both numbers in the checkpoint 
logs.  It would be nice to add another probe at that same point (just 
after ckpt_sync_t is set) so that dtrace users could instrument all these 
possibilities as well:  just buffer write time/resources, just sync ones, 
or both.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Review: DTrace probes (merged version) ver_03

From
Alvaro Herrera
Date:
Greg Smith wrote:
> One tiny change I'd suggest here:  if you look at the code for checkpoint 
> buffer writing there are traces for two points in the process:
>
>  CheckPointBuffers(int flags)
>  {
> +       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
>         CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
>         BufferSync(flags);
>         CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
>         smgrsync();
>         CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
> +       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
>  }
>
> Note how the existing code also tracks how long the sync phase took  
> compared to the write one, and reports both numbers in the checkpoint  
> logs.  It would be nice to add another probe at that same point (just  
> after ckpt_sync_t is set) so that dtrace users could instrument all these 
> possibilities as well:  just buffer write time/resources, just sync ones, 
> or both.

Sounds like the thing to do would be to pass CheckpointStats into the
DONE probe.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Review: DTrace probes (merged version) ver_03

From
Robert Lor
Date:
Alvaro Herrera wrote:
> Greg Smith wrote:
>   
>> One tiny change I'd suggest here:  if you look at the code for checkpoint 
>> buffer writing there are traces for two points in the process:
>>
>>  CheckPointBuffers(int flags)
>>  {
>> +       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
>>         CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
>>         BufferSync(flags);
>>         CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
>>         smgrsync();
>>         CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
>> +       TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
>>  }
>>
>> Note how the existing code also tracks how long the sync phase took  
>> compared to the write one, and reports both numbers in the checkpoint  
>> logs.  It would be nice to add another probe at that same point (just  
>> after ckpt_sync_t is set) so that dtrace users could instrument all these 
>> possibilities as well:  just buffer write time/resources, just sync ones, 
>> or both.
>>     
>
> Sounds like the thing to do would be to pass CheckpointStats into the
> DONE probe.
>
>   

I like this approach as it avoids the need to have too many probes. I 
will make this change and get it in with the remaining probes for the 
next commit fest.

-- 
Robert Lor           Sun Microsystems
Austin, USA          http://sun.com/postgresql



Re: Review: DTrace probes (merged version) ver_03

From
Greg Smith
Date:
On Fri, 1 Aug 2008, Alvaro Herrera wrote:

> Sounds like the thing to do would be to pass CheckpointStats into the
> DONE probe.

I thought it would be more useful to demarcate where the two phases of the 
checkpoint process were at clearly--the actual times themselves are 
helpful but dtrace can do more than that.  The write and sync phases have 
very different I/O characteristics, and it really should be easy for 
people to analyze them independantly.  I'm not familiar enough with dtrace 
to know if that's easy with the patch as it stands, I think there needs to 
be another probe in the middle there to make that straightforward.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD