xlog viewer proposal - Mailing list pgsql-hackers
From | Diogo Biazus |
---|---|
Subject | xlog viewer proposal |
Date | |
Msg-id | eca519a10606220501y7e61f82di8903042f947b89cf@mail.gmail.com Whole thread Raw |
Responses |
Re: xlog viewer proposal
|
List | pgsql-hackers |
I'm developing the summer of code project to create a xlog viewer.<br />The tool we want to create is a DBA tool used forinspect the xlog files, looking for some operations, statistcs of database usage and status of transactions. <br /><br/>Some use cases:<br />* Some user made a mistake and commited it to the database. Now the DBA wants to find out theexact xid to restore to a point in time where this mistake was yet to happen.<br /><br />* A long running transactionchanged lots of data and we want to undo that. Show the most recent transactions, with their commit times andtotal size of WAL for each transaction, so we can see the big transaction's xid. <br /><br />* How many write transactionsper second am I getting?<br /><br />The idea I've been discussing with Simon Riggs is to create a set of functionsthat can be called from within the database.<br />It seems that we would need to extract 2 relations from the logfiles: transactions and xlog entries. <br />The functions that would extract the entries could easily look into any xlogsegment passed as a parameter and return a relation containing these entries. But the functions needed to extract thetransactions would need to look untill the last segment of the xlog to know the status of every transaction. <br /><br/>The function to extract the xlog entries would read the XLogRecord structures (with all data associated), get a TupleDescwith get_call_result_type() and return a tuple for each XLogRecord.<br /><br />Another problem is how to presentthe specific data returned in each operation? <br />We can make functions to present this data in a humam readableformat like: get_heap_data(), get_btree_data(), etc.<br />For example: SELECT get_btree_data(data) FROM xlogviewer_file(xxx)WHERE operation = 'BTREE';<br /><br /> Other problem is how to sparete implict ABORTs from explict ones?It seems that will be necessary to have separate functions for transaction info extraction. As I wrote above, this functionswould have to read the xlogs all the way to the present moment to know which transactions are implicitly aborted,and which ones are still active. <br /><br />This design gives lots of flexibility, we can use all the SQL powerto query the xlog files. The one drawback is that you have to use a working backend to inspect the xlog, but in caseswhere the database cluster is lost you could always use another cluster. It would be easy to create a wrapper program(like createdb and createuser) to connect to a database and return the xlog info. <br /><br />Other advantage is thepossibility of query remote xlogs trought a pg connection, it makes the remote managing easier and machine cluster managingeasier to (don't have to make ssh accounts on all nodes or map a remote filesystem). <br /><br />Oracle has a similartool called logminer (there goes an article about it <a href="http://www.oracle.com/technology/oramag/oracle/05-jul/o45dba.html">http://www.oracle.com/technology/oramag/oracle/05-jul/o45dba.html </a>).This postgresql xlogviewer would be also good for people migrating from oracle.<br /><br />Besides, if we create asa separate program this would imply having useful functions related to backend data (the xlogs) not available to otherbackend modules. It would be easier to create redundant code also. And I've read some emails about having already duplicatecode for extracting text out of xlogs (the xlogs debug functions). <br /><br />Why a contrib module?<br />BecauseIt sounds safer to me to create a contrib module and it seems that I wont need to change the existing backend code.<br />So all the code I'm planning to write is new, and I wont need changes in the backend. Another bonus is not tobloat the backend and let this feature to be installed by those who really need it. Afterwards they can be integrated inthe backend if needed. <br /><br />Given this design I would create some functions like (plus the data formating functions):<br/><br />xlogviewer_file(char *file_path)<br />Return all entries in a given xlog segment<br /><br />xlogviewer_file_offset(char*file_path, uint32 offset) <br />Return all entries in a given xlog segment from an offset<br/><br />xlogviewer_dir(char *directory_path)<br />Return all entries in all xlog segments inside a directory<br/><br />xlogviewer_transactions(char *file_path)<br />Return all transactions from the directory containing thesegment passed as parameter starting from this segment. <br /><br />One example of it's use:<br />SELECT * FROM xlogviewer_file('00000001000000000000000F')xlog WHERE xlog.operation = 'BTREE';<br />The resultset would be something like:<br/>xlog_record | previous_xlog_record | xid | operation | data <br />-------------+------------------------+-------+-------------+-------<br/> 0/00000220 | 0/000001F0 | 4 | BTREE |<br /><br /><br />We could also query for a list of committed transactions:<br />SELECT<br /> xlog.xid,count(1), sum(xlog.record_size)<br />FROM<br /> xlogviewer_file('00000001000000000000000F') xlog<br />WHERE<br/> xlog.operation = 'XACT'<br /> <a href="http://xlog.info">xlog.info</a> = 'COMMIT'<br />GROUP BY<br /> xlog.xid<br /><br /><br clear="all" /><br />-- <br />Diogo Biazus - <a href="mailto:diogob@gmail.com">diogob@gmail.com</a><br/>Móvel Consultoria<br /><a href="http://www.movelinfo.com.br">http://www.movelinfo.com.br</a><br/><a href="http://www.postgresql.org.br"> http://www.postgresql.org.br</a>
pgsql-hackers by date: