Thread: New DTrace probes proposal

New DTrace probes proposal

From
Robert Lor
Date:
(Resending since it didn't work the first time. Not sure if attaching a
tar file was the culprit.)

I'd like to propose adding the following probes (some of which came from
Simon) to 8.4.

I think these probe provide very useful data. Although some of the data
can be collected now, the main advantages with probes, among others, are
(1) they are always available and can be enabled only when needed
especially in production (2) different combinations of probes can be
used together to collect interesting data.

They work on OS X Leopard & Solaris now, and hopefully on FreeBSD soon.

Preliminary patch attached along with sample DTrace scripts.

-Robert

---------------

* Probes to measure query time
query-parse-start (int, char *)
query-parse-done (int, char *)
query-plan-start ()
query-plan-done ()
query-execute-start ()
query-execute-done ()
query-statement-start (int, char *)
query-statement-done (int, char *)

* Probes to measure dirty buffer writes by the backend because bgwriter
is not effective
dirty-buffer-write-start (int, int, int, int)
dirty-buffer-write-done (int, int, int, int)

* Probes to measure physical writes from the shared buffer
buffer-write-start (int, int, int, int)
buffer-write-done (int, int, int, int, int)

* Probes to measure reads of a relation from a particular buffer block
buffer-read-start (int, int, int, int, int)
buffer-read-done (int, int, int, int, int, int)

* Probes to measure the effectiveness of buffer caching
buffer-hit ()
buffer-miss ()

* Probes to measure I/O time because wal_buffers is too small
wal-buffer-write-start ()
wal-buffer-write-done ()

* Probes to measure checkpoint stats such as running time, buffers
written, xlog files added, removed, recycled, etc
checkpoint-start (int)
checkpoint-done (int, int, int, int, int)

* Probes to measure Idle in Transaction and client/network time
idle-transaction-start (int, int)
idle-transaction-done ()

* Probes to measure sort time
sort-start (int, int, int, int, int)
sort-done (int, long)

* Probes to determine whether or not the deadlock detector has found a
deadlock
deadlock-found ()
deadlock-notfound (int)

* Probes to measure reads/writes by block numbers and relations
smgr-read-start (int, int,  int, int)
smgr-read-end (int, int,  int, int, int, int)
smgr-write-start (int, int, int, int)
smgr-write-end (int, int, int, int, int, int)



Index: backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.295
diff -u -3 -p -r1.295 xlog.c
--- backend/access/transam/xlog.c    25 Mar 2008 22:42:42 -0000    1.295
+++ backend/access/transam/xlog.c    16 May 2008 18:37:22 -0000
@@ -50,6 +50,7 @@
 #include "utils/builtins.h"
 #include "utils/pg_locale.h"
 #include "utils/ps_status.h"
+#include "pg_trace.h"


 /* File path names (all relative to $PGDATA) */
@@ -1258,12 +1259,14 @@ AdvanceXLInsertBuffer(bool new_segment)
                  * Have to write buffers while holding insert lock. This is
                  * not good, so only write as much as we absolutely must.
                  */
+                TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START();
                 WriteRqst.Write = OldPageRqstPtr;
                 WriteRqst.Flush.xlogid = 0;
                 WriteRqst.Flush.xrecoff = 0;
                 XLogWrite(WriteRqst, false, false);
                 LWLockRelease(WALWriteLock);
                 Insert->LogwrtResult = LogwrtResult;
+                TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE();
             }
         }
     }
@@ -5665,6 +5668,8 @@ CreateCheckPoint(int flags)
     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();

+    TRACE_POSTGRESQL_CHECKPOINT_START(flags);
+
     /*
      * Use a critical section to force system panic if we have trouble.
      */
@@ -5726,6 +5731,7 @@ CreateCheckPoint(int flags)
             LWLockRelease(WALInsertLock);
             LWLockRelease(CheckpointLock);
             END_CRIT_SECTION();
+            TRACE_POSTGRESQL_CHECKPOINT_DONE(UINT_MAX,0,0,0,UINT_MAX);
             return;
         }
     }
@@ -5945,6 +5951,11 @@ CreateCheckPoint(int flags)
     if (log_checkpoints)
         LogCheckpointEnd();

+    TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
+                NBuffers, CheckpointStats.ckpt_segs_added,
+                CheckpointStats.ckpt_segs_removed,
+                CheckpointStats.ckpt_segs_recycled);
+
     LWLockRelease(CheckpointLock);
 }

Index: backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.228
diff -u -3 -p -r1.228 bufmgr.c
--- backend/storage/buffer/bufmgr.c    1 Jan 2008 19:45:51 -0000    1.228
+++ backend/storage/buffer/bufmgr.c    16 May 2008 18:37:22 -0000
@@ -42,6 +42,7 @@
 #include "storage/smgr.h"
 #include "utils/resowner.h"
 #include "pgstat.h"
+#include "pg_trace.h"


 /* Note: these two macros only work on shared buffers, not local ones! */
@@ -171,6 +172,7 @@ ReadBuffer_common(Relation reln, BlockNu
     if (isExtend)
         blockNum = smgrnblocks(reln->rd_smgr);

+    TRACE_POSTGRESQL_BUFFER_READ_START(blockNum, reln->rd_node.spcNode, reln->rd_node.dbNode, reln->rd_node.relNode,
isLocalBuf);
     pgstat_count_buffer_read(reln);

     if (isLocalBuf)
@@ -200,12 +202,16 @@ ReadBuffer_common(Relation reln, BlockNu
     {
         if (!isExtend)
         {
+            TRACE_POSTGRESQL_BUFFER_HIT();
             /* Just need to update stats before we exit */
             pgstat_count_buffer_hit(reln);

             if (VacuumCostActive)
                 VacuumCostBalance += VacuumCostPageHit;

+            TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum,
+                 reln->rd_node.spcNode, reln->rd_node.dbNode,
+                 reln->rd_node.relNode, isLocalBuf, found);
             return BufferDescriptorGetBuffer(bufHdr);
         }

@@ -257,6 +263,7 @@ ReadBuffer_common(Relation reln, BlockNu
             } while (!StartBufferIO(bufHdr, true));
         }
     }
+    TRACE_POSTGRESQL_BUFFER_MISS();

     /*
      * if we have gotten to this point, we have allocated a buffer for the
@@ -324,6 +331,9 @@ ReadBuffer_common(Relation reln, BlockNu
     if (VacuumCostActive)
         VacuumCostBalance += VacuumCostPageMiss;

+    TRACE_POSTGRESQL_BUFFER_READ_DONE(blockNum, reln->rd_node.spcNode,
+        reln->rd_node.dbNode, reln->rd_node.relNode, isLocalBuf, found);
+
     return BufferDescriptorGetBuffer(bufHdr);
 }

@@ -466,6 +476,11 @@ BufferAlloc(Relation reln,
              * happens to be trying to split the page the first one got from
              * StrategyGetBuffer.)
              */
+
+            TRACE_POSTGRESQL_DIRTY_BUFFER_WRITE_START(blockNum,
+              reln->rd_node.spcNode, reln->rd_node.dbNode,
+              reln->rd_node.relNode);
+
             if (LWLockConditionalAcquire(buf->content_lock, LW_SHARED))
             {
                 /*
@@ -488,6 +503,11 @@ BufferAlloc(Relation reln,
                 /* OK, do the I/O */
                 FlushBuffer(buf, NULL);
                 LWLockRelease(buf->content_lock);
+
+                TRACE_POSTGRESQL_DIRTY_BUFFER_WRITE_DONE(
+                  blockNum, reln->rd_node.spcNode,
+                  reln->rd_node.dbNode, reln->rd_node.relNode,
+                  (oldFlags & BM_HINT_BITS_TEST));
             }
             else
             {
@@ -1734,6 +1754,10 @@ FlushBuffer(volatile BufferDesc *buf, SM
     buf->flags &= ~BM_JUST_DIRTIED;
     UnlockBufHdr(buf);

+    TRACE_POSTGRESQL_BUFFER_WRITE_START(buf->tag.blockNum,
+         reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode);
+
     smgrwrite(reln,
               buf->tag.blockNum,
               (char *) BufHdrGetBlock(buf),
@@ -1741,6 +1765,10 @@ FlushBuffer(volatile BufferDesc *buf, SM

     BufferFlushCount++;

+    TRACE_POSTGRESQL_BUFFER_WRITE_DONE(buf->tag.blockNum,
+         reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
+         reln->smgr_rnode.relNode, (buf->flags & BM_HINT_BITS_TEST));
+
     /*
      * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
      * end the io_in_progress state.
@@ -2155,7 +2183,7 @@ SetBufferCommitInfoNeedsSave(Buffer buff
         Assert(bufHdr->refcount > 0);
         if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
             VacuumCostBalance += VacuumCostPageDirty;
-        bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
+        bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED | BM_HINT_BITS_TEST);
         UnlockBufHdr(bufHdr);
     }
 }
@@ -2492,7 +2520,7 @@ TerminateBufferIO(volatile BufferDesc *b
     Assert(buf->flags & BM_IO_IN_PROGRESS);
     buf->flags &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);
     if (clear_dirty && !(buf->flags & BM_JUST_DIRTIED))
-        buf->flags &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED);
+        buf->flags &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED | BM_HINT_BITS_TEST);
     buf->flags |= set_flag_bits;

     UnlockBufHdr(buf);
Index: backend/storage/lmgr/deadlock.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 1.53
diff -u -3 -p -r1.53 deadlock.c
--- backend/storage/lmgr/deadlock.c    24 Mar 2008 18:22:36 -0000    1.53
+++ backend/storage/lmgr/deadlock.c    16 May 2008 18:37:23 -0000
@@ -30,6 +30,7 @@
 #include "storage/lmgr.h"
 #include "storage/proc.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* One edge in the waits-for graph */
@@ -222,6 +223,8 @@ DeadLockCheck(PGPROC *proc)
          */
         int            nSoftEdges;

+        TRACE_POSTGRESQL_DEADLOCK_FOUND();
+
         nWaitOrders = 0;
         if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
             elog(FATAL, "deadlock seems to have disappeared");
@@ -259,6 +262,8 @@ DeadLockCheck(PGPROC *proc)
         ProcLockWakeup(GetLocksMethodTable(lock), lock);
     }

+    TRACE_POSTGRESQL_DEADLOCK_NOTFOUND(nWaitOrders);
+
     /* Return code tells caller if we had to escape a deadlock or not */
     if (nWaitOrders > 0)
         return DS_SOFT_DEADLOCK;
Index: backend/storage/smgr/md.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.136
diff -u -3 -p -r1.136 md.c
--- backend/storage/smgr/md.c    10 Mar 2008 20:06:27 -0000    1.136
+++ backend/storage/smgr/md.c    16 May 2008 18:37:23 -0000
@@ -26,6 +26,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "pg_trace.h"


 /* interval for calling AbsorbFsyncRequests in mdsync */
@@ -581,6 +582,8 @@ mdread(SMgrRelation reln, BlockNumber bl
     int            nbytes;
     MdfdVec    *v;

+    TRACE_POSTGRESQL_SMGR_READ_START(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode);
+
     v = _mdfd_getseg(reln, blocknum, false, EXTENSION_FAIL);

 #ifdef USE_SEGMENTED_FILES
@@ -599,7 +602,11 @@ mdread(SMgrRelation reln, BlockNumber bl
                         reln->smgr_rnode.dbNode,
                         reln->smgr_rnode.relNode)));

-    if ((nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)
+    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
+
+    TRACE_POSTGRESQL_SMGR_READ_DONE(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode,nbytes, BLCKSZ); 
+
+    if (nbytes != BLCKSZ)
     {
         if (nbytes < 0)
             ereport(ERROR,
@@ -651,6 +658,8 @@ mdwrite(SMgrRelation reln, BlockNumber b
     Assert(blocknum < mdnblocks(reln));
 #endif

+    TRACE_POSTGRESQL_SMGR_WRITE_START(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode);
+
     v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_FAIL);

 #ifdef USE_SEGMENTED_FILES
@@ -691,6 +700,8 @@ mdwrite(SMgrRelation reln, BlockNumber b
                  errhint("Check free disk space.")));
     }

+    TRACE_POSTGRESQL_SMGR_WRITE_DONE(blocknum, reln->smgr_rnode.spcNode, reln->smgr_rnode.dbNode,
reln->smgr_rnode.relNode,nbytes, BLCKSZ); 
+
     if (!isTemp)
         register_dirty_segment(reln, v);
 }
Index: backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.547
diff -u -3 -p -r1.547 postgres.c
--- backend/tcop/postgres.c    26 Mar 2008 18:48:59 -0000    1.547
+++ backend/tcop/postgres.c    16 May 2008 18:37:24 -0000
@@ -70,6 +70,7 @@
 #include "mb/pg_wchar.h"

 #include "pgstat.h"
+#include "pg_trace.h"

 extern int    optind;
 extern char *optarg;
@@ -550,6 +551,8 @@ pg_parse_query(const char *query_string)
 {
     List       *raw_parsetree_list;

+    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
+
     if (log_parser_stats)
         ResetUsage();

@@ -659,6 +662,8 @@ pg_rewrite_query(Query *query)
         elog_node_display(DEBUG1, "rewritten parse tree", querytree_list,
                           Debug_pretty_print);

+    TRACE_POSTGRESQL_QUERY_PARSE_DONE();
+
     return querytree_list;
 }

@@ -672,6 +677,8 @@ pg_plan_query(Query *querytree, int curs
 {
     PlannedStmt *plan;

+    TRACE_POSTGRESQL_QUERY_PLAN_START();
+
     /* Utility commands have no plans. */
     if (querytree->commandType == CMD_UTILITY)
         return NULL;
@@ -709,6 +716,7 @@ pg_plan_query(Query *querytree, int curs
      */
     if (Debug_print_plan)
         elog_node_display(DEBUG1, "plan", plan, Debug_pretty_print);
+    TRACE_POSTGRESQL_QUERY_PLAN_DONE();

     return plan;
 }
@@ -796,6 +804,7 @@ exec_simple_query(const char *query_stri
     bool        isTopLevel;
     char        msec_str[32];

+
     /*
      * Report query to various monitoring facilities.
      */
@@ -803,6 +812,8 @@ exec_simple_query(const char *query_stri

     pgstat_report_activity(query_string);

+    TRACE_POSTGRESQL_QUERY_STATEMENT_START(MyProcPid, query_string);
+
     /*
      * We use save_log_statement_stats so ShowUsage doesn't report incorrect
      * results because ResetUsage wasn't called.
@@ -1064,6 +1075,8 @@ exec_simple_query(const char *query_stri
     if (save_log_statement_stats)
         ShowUsage("QUERY STATISTICS");

+    TRACE_POSTGRESQL_QUERY_STATEMENT_DONE(MyProcPid, query_string);
+
     debug_query_string = NULL;
 }

@@ -3484,6 +3497,7 @@ PostgresMain(int argc, char *argv[], con
         {
             if (IsTransactionOrTransactionBlock())
             {
+                TRACE_POSTGRESQL_IDLE_TRANSACTION_START(GetCurrentTransactionId(), GetCurrentCommandId(false));
                 set_ps_display("idle in transaction", false);
                 pgstat_report_activity("<IDLE> in transaction");
             }
@@ -3513,6 +3527,8 @@ PostgresMain(int argc, char *argv[], con
          */
         firstchar = ReadCommand(&input_message);

+        TRACE_POSTGRESQL_IDLE_TRANSACTION_DONE();
+
         /*
          * (4) disable async signal conditions again.
          */
Index: backend/tcop/pquery.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.122
diff -u -3 -p -r1.122 pquery.c
--- backend/tcop/pquery.c    26 Mar 2008 18:48:59 -0000    1.122
+++ backend/tcop/pquery.c    16 May 2008 18:37:24 -0000
@@ -24,6 +24,7 @@
 #include "tcop/utility.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
+#include "pg_trace.h"


 /*
@@ -714,6 +715,8 @@ PortalRun(Portal portal, long count, boo

     AssertArg(PortalIsValid(portal));

+    TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
     /* Initialize completion tag to empty string */
     if (completionTag)
         completionTag[0] = '\0';
@@ -864,6 +867,8 @@ PortalRun(Portal portal, long count, boo

     if (log_executor_stats && portal->strategy != PORTAL_MULTI_QUERY)
         ShowUsage("EXECUTOR STATISTICS");
+
+    TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();

     return result;
 }
@@ -1234,6 +1239,8 @@ PortalRunMulti(Portal portal, bool isTop
              */
             PlannedStmt *pstmt = (PlannedStmt *) stmt;

+            TRACE_POSTGRESQL_QUERY_EXECUTE_START();
+
             if (log_executor_stats)
                 ResetUsage();

@@ -1254,6 +1261,8 @@ PortalRunMulti(Portal portal, bool isTop

             if (log_executor_stats)
                 ShowUsage("EXECUTOR STATISTICS");
+
+            TRACE_POSTGRESQL_QUERY_EXECUTE_DONE();
         }
         else
         {
Index: backend/utils/probes.d
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.2
diff -u -3 -p -r1.2 probes.d
--- backend/utils/probes.d    2 Jan 2008 02:42:06 -0000    1.2
+++ backend/utils/probes.d    16 May 2008 18:37:24 -0000
@@ -9,16 +9,60 @@

 provider postgresql {

-probe transaction__start(int);
-probe transaction__commit(int);
-probe transaction__abort(int);
-probe lwlock__acquire(int, int);
-probe lwlock__release(int);
-probe lwlock__startwait(int, int);
-probe lwlock__endwait(int, int);
-probe lwlock__condacquire(int, int);
-probe lwlock__condacquire__fail(int, int);
-probe lock__startwait(int, int);
-probe lock__endwait(int, int);
+    probe transaction__start(int);
+    probe transaction__commit(int);
+    probe transaction__abort(int);
+    probe lwlock__acquire(int, int);
+    probe lwlock__release(int);
+    probe lwlock__startwait(int, int);
+    probe lwlock__endwait(int, int);
+    probe lwlock__condacquire(int, int);
+    probe lwlock__condacquire__fail(int, int);
+    probe lock__startwait(int, int);
+    probe lock__endwait(int, int);
+    probe idle__transaction__start(int, int);
+    probe idle__transaction__done();
+    probe wal__buffer__write__start();
+    probe wal__buffer__write__done();
+    probe query__parse__start(char *query_string);
+    probe query__parse__done();
+    probe query__plan__start();
+    probe query__plan__done();
+    probe query__execute__start();
+    probe query__execute__done();
+    probe query__statement__start(int pid, char *query_string);
+    probe query__statement__done(int pid, char *query_string);
+    probe sort__start(int, int, int, int, int);
+    probe sort__end(int, long);
+    probe buffer__read__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid, int isLocalBuf);
+    probe buffer__read__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int isLocalBuf,
+            int isInBuffPool);
+    probe dirty__buffer__write__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe dirty__buffer__write__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int isHintBitsWrite);
+    probe buffer__write__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe buffer__write__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int isHintBitsWrite);
+    probe buffer__hit();
+    probe buffer__miss();
+    probe deadlock__found();
+    probe deadlock__notfound(int);
+    probe checkpoint__start(int checkpointFlags);
+    probe checkpoint__done(int bufsWritten, int NBuffers, int segsAdded,
+            int segsRemoved, int segsRecycled);
+    probe smgr__read__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe smgr__read__done(int blockNum, int tablespaceOid,
+             int databaseOid, int relationOid, int bytesRead,
+            int blockSize);
+    probe smgr__write__start(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid);
+    probe smgr__write__done(int blockNum, int tablespaceOid,
+            int databaseOid, int relationOid, int bytesRead,
+            int blockSize);

 };
Index: backend/utils/sort/tuplesort.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/sort/tuplesort.c,v
retrieving revision 1.83
diff -u -3 -p -r1.83 tuplesort.c
--- backend/utils/sort/tuplesort.c    17 Mar 2008 03:45:36 -0000    1.83
+++ backend/utils/sort/tuplesort.c    16 May 2008 18:37:25 -0000
@@ -115,12 +115,18 @@
 #include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
+#include "pg_trace.h"


 /* GUC variables */
 #ifdef TRACE_SORT
 bool        trace_sort = false;
 #endif
+
+#define HEAP_SORT    0
+#define INDEX_SORT    1
+#define DATUM_SORT    2
+
 #ifdef DEBUG_BOUNDED_SORT
 bool        optimize_bounded_sort = true;
 #endif
@@ -569,6 +575,7 @@ tuplesort_begin_heap(TupleDesc tupDesc,
              "begin tuple sort: nkeys = %d, workMem = %d, randomAccess = %c",
              nkeys, workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(HEAP_SORT, false, nkeys, workMem, randomAccess);

     state->nKeys = nkeys;

@@ -636,6 +643,8 @@ tuplesort_begin_index_btree(Relation ind

     state->nKeys = RelationGetNumberOfAttributes(indexRel);

+    TRACE_POSTGRESQL_SORT_START(INDEX_SORT, enforceUnique, state->nKeys, workMem, randomAccess);
+
     state->comparetup = comparetup_index_btree;
     state->copytup = copytup_index;
     state->writetup = writetup_index;
@@ -713,6 +722,7 @@ tuplesort_begin_datum(Oid datumType,
              "begin datum sort: workMem = %d, randomAccess = %c",
              workMem, randomAccess ? 't' : 'f');
 #endif
+    TRACE_POSTGRESQL_SORT_START(DATUM_SORT, false, 1, workMem, randomAccess);

     state->nKeys = 1;            /* always a one-column sort */

@@ -825,6 +835,11 @@ tuplesort_end(Tuplesortstate *state)
     }
 #endif

+    TRACE_POSTGRESQL_SORT_END(state->tapeset,
+            (state->tapeset ? LogicalTapeSetBlocks(true) :
+            (state->allowedMem - state->availMem + 1023) / 1024));
+
+
     MemoryContextSwitchTo(oldcontext);

     /*
Index: include/storage/buf_internals.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/storage/buf_internals.h,v
retrieving revision 1.95
diff -u -3 -p -r1.95 buf_internals.h
--- include/storage/buf_internals.h    1 Jan 2008 19:45:58 -0000    1.95
+++ include/storage/buf_internals.h    16 May 2008 18:37:27 -0000
@@ -36,6 +36,7 @@
 #define BM_JUST_DIRTIED            (1 << 5)        /* dirtied since write started */
 #define BM_PIN_COUNT_WAITER        (1 << 6)        /* have waiter for sole pin */
 #define BM_CHECKPOINT_NEEDED    (1 << 7)        /* must write for checkpoint */
+#define BM_HINT_BITS_TEST    (1 << 8)        /* test effect of writes due to hint bits */

 typedef bits16 BufFlags;



Attachment

Re: New DTrace probes proposal

From
Greg Smith
Date:
On Sat, 17 May 2008, Robert Lor wrote:

> I'd like to propose adding the following probes (some of which came from 
> Simon) to 8.4.

There's also a big DTrace probe set patch available from OmniTI: 
https://labs.omniti.com/project-dtrace/trunk/postgresql/
http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL

I don't know if you've looked at that before.  There's some overlap but 
many unique and handy probes to each set.  I think it would be nice to 
consider a superset union of the two.  I would guess OmniTI would be glad 
to have their set assimilated into core as well so they don't have to 
maintain their patch past 8.3; hopefully Theo or Robert will chime in on 
that.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: New DTrace probes proposal

From
Robert Lor
Date:
Greg Smith wrote:
> There's also a big DTrace probe set patch available from OmniTI: 
> https://labs.omniti.com/project-dtrace/trunk/postgresql/
> http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL
>
> I don't know if you've looked at that before.  There's some overlap 
> but many unique and handy probes to each set.  I think it would be 
> nice to consider a superset union of the two.
I heard about Theo's probes recently, but I haven't had a chance to look 
at them closely. I'm more than happy to adapt his probes for 8.4 and 
remove any duplicates if there are no objections from Theo.
> I would guess OmniTI would be glad to have their set assimilated into 
> core as well so they don't have to maintain their patch past 8.3;
I'd think so too!


-Robert




Re: New DTrace probes proposal

From
Robert Treat
Date:
On Sunday 18 May 2008 03:18:13 Greg Smith wrote:
> On Sat, 17 May 2008, Robert Lor wrote:
> > I'd like to propose adding the following probes (some of which came from
> > Simon) to 8.4.
>
> There's also a big DTrace probe set patch available from OmniTI:
> https://labs.omniti.com/project-dtrace/trunk/postgresql/
> http://labs.omniti.com/trac/project-dtrace/wiki/Applications#PostgreSQL
>
> I don't know if you've looked at that before.  There's some overlap but
> many unique and handy probes to each set.  I think it would be nice to
> consider a superset union of the two.  I would guess OmniTI would be glad
> to have their set assimilated into core as well so they don't have to
> maintain their patch past 8.3; hopefully Theo or Robert will chime in on
> that.
>

<chime>We've had it in the back of our minds to submit the patch set for 8.4, 
just hadn't gotten around to it. As we've started to see some 3rd party 
uptake of the set, it might be better to get something in sooner rather than 
later, so yes, we'd be happy to see the patch set adopted upstream.</chime>

-- 
Robert Treat
Database Architect
http://www.omniti.com


Re: New DTrace probes proposal

From
Robert Treat
Date:
On Saturday 17 May 2008 22:33:01 Robert Lor wrote:
> (Resending since it didn't work the first time. Not sure if attaching a
> tar file was the culprit.)
>
> I'd like to propose adding the following probes (some of which came from
> Simon) to 8.4.
>

+1

> I think these probe provide very useful data. Although some of the data
> can be collected now, the main advantages with probes, among others, are
> (1) they are always available and can be enabled only when needed
> especially in production (2) different combinations of probes can be
> used together to collect interesting data.
>
> They work on OS X Leopard & Solaris now, and hopefully on FreeBSD soon.
>

certainly by the time 8.4 ships, these should work with freebsd I'd think. 
ideally we would need to confirm this by release time, certainly getting a 
bsd buildfarm member to compile with them would be a start (and very unlikely 
to cause issues)

> Preliminary patch attached along with sample DTrace scripts.
>

One thing I didnt understand after looking at this was... 

> * Probes to measure query time
> query-parse-start (int, char *)

I would have guessed that the arguments might be pid and query string, but 
looking at the probes, I see it defined as:
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);

which doesn't seem to match up... can you explain that piece? 


Overall, I like the probes you have breaking down query 
parsing/planning/executing, though I like ours for measuring autovacuum 
pieces, so I think the end game should be to just merge the two patches 
together (barring any place there is direct conflict)... do you see any 
issues with that? 

One other questions would be what to do with the dtrace scripts. I think 
having a set of these available is a large boon for dtrace users, but do you 
see that as something that needs to be distriubuted with the core? I'd lean 
towards reviving the dtrace project on pgfoundry, but it might be worth 
expanding the dynamic tracing chapter to include more examples and a pointer 
to pgfoundry.  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: New DTrace probes proposal

From
Robert Lor
Date:
Robert Treat wrote:
>
> certainly by the time 8.4 ships, these should work with freebsd I'd think. 
> ideally we would need to confirm this by release time, certainly getting a 
> bsd buildfarm member to compile with them would be a start (and very unlikely 
> to cause issues)
>   
As soon as the DTrace port is working on FreeBSD, I will confirm that 
the probes are working properly, and it's definitely a good idea to get 
a buildfarm machine building with --enable-dtrace.
> One thing I didnt understand after looking at this was... 
>   
>> * Probes to measure query time
>> query-parse-start (int, char *)
>>     
> I would have guessed that the arguments might be pid and query string, but 
> looking at the probes, I see it defined as:
>
>  TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
>
> which doesn't seem to match up... can you explain that piece? 
>   
Having the pid passed as an argument was my original intention, but it's 
actually redundant since the pid is readily available from the script, 
so I will fix the other probes with pid as args.
>
> Overall, I like the probes you have breaking down query 
> parsing/planning/executing, though I like ours for measuring autovacuum 
> pieces, so I think the end game should be to just merge the two patches 
> together (barring any place there is direct conflict)... do you see any 
> issues with that? 
>   
Yes, to avoid confusion, the probes should be merged and resubmitted as 
one patch. Have yours been ported to 8.4 yet? We also need to make sure 
the names and arg types are consistent, probably should work on this 
offline.
> One other questions would be what to do with the dtrace scripts. I think 
> having a set of these available is a large boon for dtrace users, but do you 
> see that as something that needs to be distriubuted with the core?
I don't see the need to include the scripts with core now, maybe some 
point in the future if it makes sense.
>  I'd lean 
> towards reviving the dtrace project on pgfoundry, but it might be worth 
> expanding the dynamic tracing chapter to include more examples and a pointer 
> to pgfoundry.  
>   
Agreed on both. I will add the new scripts to the dtrace project on 
PgFoundry and add more info to the doc. I think you guys have some 
interesting scripts as well that folks will find useful.

-Robert



Re: New DTrace probes proposal

From
Tom Lane
Date:
Robert Lor <Robert.Lor@Sun.COM> writes:
> As soon as the DTrace port is working on FreeBSD, I will confirm that 
> the probes are working properly, and it's definitely a good idea to get 
> a buildfarm machine building with --enable-dtrace.

I'm pretty certain one of the OS X build critters is already testing
that.  Of course it's only testing that it builds, not that it does
anything useful ...
        regards, tom lane


Re: New DTrace probes proposal

From
Robert Treat
Date:
On Friday 06 June 2008 14:32:27 Robert Lor wrote:
> Robert Treat wrote:
> > certainly by the time 8.4 ships, these should work with freebsd I'd
> > think. ideally we would need to confirm this by release time, certainly
> > getting a bsd buildfarm member to compile with them would be a start (and
> > very unlikely to cause issues)
>
> As soon as the DTrace port is working on FreeBSD, I will confirm that
> the probes are working properly, and it's definitely a good idea to get
> a buildfarm machine building with --enable-dtrace.
>
> > One thing I didnt understand after looking at this was...
> >
> >> * Probes to measure query time
> >> query-parse-start (int, char *)
> >
> > I would have guessed that the arguments might be pid and query string,
> > but looking at the probes, I see it defined as:
> >
> >  TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
> >
> > which doesn't seem to match up... can you explain that piece?
>
> Having the pid passed as an argument was my original intention, but it's
> actually redundant since the pid is readily available from the script,
> so I will fix the other probes with pid as args.
>
> > Overall, I like the probes you have breaking down query
> > parsing/planning/executing, though I like ours for measuring autovacuum
> > pieces, so I think the end game should be to just merge the two patches
> > together (barring any place there is direct conflict)... do you see any
> > issues with that?
>
> Yes, to avoid confusion, the probes should be merged and resubmitted as
> one patch. Have yours been ported to 8.4 yet? We also need to make sure
> the names and arg types are consistent, probably should work on this
> offline.
>

We haven't ported our probes to 8.4 yet; the focus of our work has been to 
help people currently running postgres in production, and will probably 
remain with that focus untill closer to feature freeze. (Granted, if we get 
something into 8.4, we may just overhaul our to match that.)  While it would 
be nice to have a clean merge of the two, it's probably simple enough to just 
re-implement the differences into your patch (since yours already compiles on 
8.4).  As far as naming scheme, I'm not particularly wedded to either... is 
there a dtrace naming convention that could be followed? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: New DTrace probes proposal

From
Robert Lor
Date:
Robert Treat wrote:
>
> While it would be nice to have a clean merge of the two, it's probably simple enough to just 
> re-implement the differences into your patch (since yours already compiles on 8.4).
Should be straightforward ... I can do the merge.
>   As far as naming scheme, I'm not particularly wedded to either... is 
> there a dtrace naming convention that could be followed? 
>   
Yep, and the probes I submitted  pretty much follow the suggested naming 
convention.

-Robert