Sorry for taking so long to answer. I am attaching the patch with the changes I made to the pgsql code. I followed the steps for compiling and installing pgsql from:
http://www.postgresql.org/docs/current/static/install-short.html
In summary, the page_id of the page being released in ReleaseBuffer() and ReleaseAndReadBuffer() is written to the file /usr/local/pgsql/data/trace. This file is created manually.
I have also created a PrivateDirtyFlag for each backend, by analogy with PrivateRefCount. I use it to keep track of whether the current backend performed an update on a page in the buffer pool or only read it (this is not relevant for now). The trace file contains one line for each ReleaseBuffer() or ReleaseAndReadBuffer() call. Each line has the format:
operation,tblSpace,dbNode,relNode,blockNumber
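To make the format concrete, here is a minimal Python sketch of how one trace line could be parsed. It assumes the four identifier fields are decimal integers and that the operation field is a short text token; the exact values the patch writes for the operation are not shown here, so the "r" in the usage note is purely hypothetical, as are the names TraceRecord and parse_trace_line.

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    """One line of the trace file (field names follow the format above)."""
    operation: str      # assumed to be a short text token
    tbl_space: int
    db_node: int
    rel_node: int
    block_number: int


def parse_trace_line(line: str) -> TraceRecord:
    """Split a comma-separated trace line into a typed record."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    operation, *ids = fields
    # The four identifiers are assumed to be decimal integers.
    return TraceRecord(operation, *(int(x) for x in ids))
```

For example, parse_trace_line("r,1663,16384,16397,42") would give a record whose block_number is 42 (the numbers here are made up for illustration).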
Once the trace file is complete after the execution of the tpcc benchmark, I use the following shell command to keep only the unique pages:
cut -d',' -f2-5 trace | sort -n -t',' -k1 -k2 -k3 -k4 | uniq
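The same count can be cross-checked without sorting. The following is only a sketch of an equivalent in Python, not what I actually ran: it drops the operation field and collects the remaining four fields in a set. The function name distinct_pages and the sample lines are hypothetical.

```python
import io
from typing import Iterable, Set, Tuple

PageId = Tuple[str, ...]  # (tblSpace, dbNode, relNode, blockNumber) as text


def distinct_pages(lines: Iterable[str]) -> Set[PageId]:
    """Return the set of distinct page identifiers seen in the trace."""
    pages: Set[PageId] = set()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        # Ignore the operation (field 1), like `cut -f2-5` does.
        pages.add(tuple(fields[1:]))
    return pages


# Made-up sample: two operations on the same page plus one other page.
sample = io.StringIO(
    "r,1663,16384,16397,0\n"
    "w,1663,16384,16397,0\n"
    "r,1663,16384,16397,1\n"
)
print(len(distinct_pages(sample)))  # prints 2
```

On the real file this would be called as distinct_pages(open("trace")). A set keeps one entry per distinct page in memory, which should be manageable for a few hundred thousand pages.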
Today I realized that I was making a mistake in running the oltpbenchmark application. Of the 64 warehouses created for tpcc, only 1 was being accessed (hence the 14k distinct pages that I mentioned). I increased the "terminal" option of the tpcc benchmark from 1 to 64, giving one terminal per warehouse.
This gave me a higher number of distinct pages accessed. Unfortunately, of the 800k pages in the database (64 warehouses), running tpcc for 10 minutes resulted in 400k distinct pages being accessed. This is much better than the previous result, but I think it is still not realistic.
I would like to thank you guys for all the attention given to my problem :)