xlog viewer proposal - Mailing list pgsql-hackers

From: Diogo Biazus
Subject: xlog viewer proposal
Date:
Msg-id: eca519a10606220501y7e61f82di8903042f947b89cf@mail.gmail.com
List: pgsql-hackers
I'm developing the Summer of Code project to create an xlog viewer. The tool we want to create is a DBA tool for inspecting the xlog files, looking for specific operations, statistics of database usage, and the status of transactions.

Some use cases:

* Some user made a mistake and committed it to the database. Now the DBA wants to find out the exact xid, to restore to a point in time where this mistake had not yet happened.

* A long-running transaction changed lots of data and we want to undo it. Show the most recent transactions, with their commit times and the total size of WAL for each transaction, so we can spot the big transaction's xid.

* How many write transactions per second am I getting?
The idea I've been discussing with Simon Riggs is to create a set of functions that can be called from within the database. It seems we would need to extract two relations from the log files: transactions and xlog entries.

The functions that extract the entries can easily look into any xlog segment passed as a parameter and return a relation containing those entries. But the functions that extract the transactions would need to read up to the last segment of the xlog to know the status of every transaction.

The function that extracts the xlog entries would read the XLogRecord structures (with all associated data), get a TupleDesc with get_call_result_type(), and return a tuple for each XLogRecord.
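To make that concrete, here is a rough sketch of how such a set-returning function could look, using the standard SRF machinery from funcapi.h. The XLogViewerState struct, the read_next_record() helper, and the column layout are placeholders for illustration only; the actual decoding of XLogRecord structures is the part still to be written.

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(xlogviewer_file);

/* Hypothetical per-call state: an open segment plus a read position. */
typedef struct XLogViewerState
{
    int         segfd;          /* file descriptor of the segment */
    uint32      readoff;        /* current offset within the segment */
} XLogViewerState;

/*
 * Hypothetical helper: decode the next XLogRecord into text columns
 * (record location, previous record, xid, rmgr name, data).  Stubbed out
 * here; it would return false once the segment is exhausted.
 */
static bool
read_next_record(XLogViewerState *state, char **values)
{
    return false;
}

Datum
xlogviewer_file(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    char       *values[5];

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;
        TupleDesc   tupdesc;

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* Build the output tuple descriptor from the declared result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("return type must be a row type")));
        funcctx->attinmeta = TupleDescGetAttInMetadata(tupdesc);

        /* Here we would open the segment named by the argument. */
        funcctx->user_fctx = palloc0(sizeof(XLogViewerState));

        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();

    /* One output row per XLogRecord found in the segment. */
    if (read_next_record((XLogViewerState *) funcctx->user_fctx, values))
    {
        HeapTuple   tuple = BuildTupleFromCStrings(funcctx->attinmeta, values);

        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
    }

    SRF_RETURN_DONE(funcctx);
}

The same skeleton would serve xlogviewer_file_offset() and xlogviewer_dir(); only the work done on the first call would change.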
Another problem is how to present the specific data returned by each operation. We can provide functions that render this data in a human-readable format, such as get_heap_data(), get_btree_data(), etc. For example:

SELECT get_btree_data(data) FROM xlogviewer_file(xxx) WHERE operation = 'BTREE';

Another problem is how to separate implicit ABORTs from explicit ones. It seems it will be necessary to have separate functions for transaction info extraction. As I wrote above, these functions would have to read the xlogs all the way to the present moment to know which transactions are implicitly aborted and which ones are still active.
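To make the classification concrete, here is a sketch of how a forward scan could mark each transaction's status, assuming direct access to the usual XLogRecord header fields (xl_xid, xl_rmid, xl_info). The XactViewerStatus enum and the update_xid_status() helper are invented for illustration; only the xact.h/rmgr.h constants are existing backend definitions.

#include "postgres.h"
#include "access/rmgr.h"        /* RM_XACT_ID */
#include "access/xact.h"        /* XLOG_XACT_COMMIT, XLOG_XACT_ABORT */
#include "access/xlog.h"        /* XLogRecord, XLR_INFO_MASK */

typedef enum XactViewerStatus
{
    XACTVIEW_IN_PROGRESS,       /* seen, but no commit/abort record yet */
    XACTVIEW_COMMITTED,
    XACTVIEW_ABORTED            /* explicit ABORT record found */
} XactViewerStatus;

/* Hypothetical helper: remember the latest status seen for this xid,
 * for instance in a per-call hash table. */
static void
update_xid_status(TransactionId xid, XactViewerStatus status)
{
}

/*
 * Classify one record during a forward scan.  Any xid that is still
 * XACTVIEW_IN_PROGRESS when we run out of xlog is either an open
 * transaction or an implicit abort (for example, a crashed backend),
 * which is why the transaction-oriented functions must read up to the
 * last segment before reporting anything.
 */
static void
classify_record(XLogRecord *record)
{
    uint8       info = record->xl_info & ~XLR_INFO_MASK;

    if (record->xl_rmid != RM_XACT_ID)
    {
        /* an ordinary data-changing record: note that the xid exists */
        update_xid_status(record->xl_xid, XACTVIEW_IN_PROGRESS);
        return;
    }

    if (info == XLOG_XACT_COMMIT)
        update_xid_status(record->xl_xid, XACTVIEW_COMMITTED);
    else if (info == XLOG_XACT_ABORT)
        update_xid_status(record->xl_xid, XACTVIEW_ABORTED);
}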
This design gives lots of flexibility: we can use all the power of SQL to query the xlog files. The one drawback is that you need a working backend to inspect the xlog, but in cases where the database cluster is lost you could always use another cluster. It would be easy to create a wrapper program (like createdb and createuser) that connects to a database and returns the xlog info.

Another advantage is the possibility of querying remote xlogs through a regular PostgreSQL connection, which makes remote administration and cluster management easier (no need to create ssh accounts on all nodes or map a remote filesystem).

Oracle has a similar tool called LogMiner (there is an article about it at http://www.oracle.com/technology/oramag/oracle/05-jul/o45dba.html). This PostgreSQL xlog viewer would also be good for people migrating from Oracle.

Besides, if we created this as a separate program, useful functions related to backend data (the xlogs) would not be available to other backend modules, and it would be easier to end up with redundant code. I've read some emails about already having duplicate code for extracting text out of xlogs (the xlog debug functions).

Why a contrib module? Because it sounds safer to me to create a contrib module, and it seems I won't need to change the existing backend code. So all the code I'm planning to write is new, and I won't need changes in the backend. Another bonus is not bloating the backend, letting this feature be installed only by those who really need it. Afterwards the functions can be integrated into the backend if needed.
Given this design, I would create functions like the following (plus the data-formatting functions):

xlogviewer_file(char *file_path)
Return all entries in a given xlog segment.

xlogviewer_file_offset(char *file_path, uint32 offset)
Return all entries in a given xlog segment, starting from an offset.

xlogviewer_dir(char *directory_path)
Return all entries in all xlog segments inside a directory.

xlogviewer_transactions(char *file_path)
Return all transactions from the directory containing the segment passed as parameter, starting from that segment.
One example of its use:

SELECT * FROM xlogviewer_file('00000001000000000000000F') xlog
WHERE xlog.operation = 'BTREE';

The result set would be something like:

 xlog_record | previous_xlog_record | xid | operation | data
-------------+----------------------+-----+-----------+------
 0/00000220  | 0/000001F0           |   4 | BTREE     |

We could also query for a list of committed transactions:

SELECT
    xlog.xid, count(1), sum(xlog.record_size)
FROM
    xlogviewer_file('00000001000000000000000F') xlog
WHERE
    xlog.operation = 'XACT'
    AND xlog.info = 'COMMIT'
GROUP BY
    xlog.xid
--
Diogo Biazus - diogob@gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br
