xlog viewer prototype and new proposal - Mailing list pgsql-hackers

From Diogo Biazus
Subject xlog viewer prototype and new proposal
Date
Msg-id eca519a10607062235t270b9345led5fb56666ae3a0a@mail.gmail.com
Whole thread Raw
Responses Re: xlog viewer prototype and new proposal  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
I've worked on a prototype (attached to this email) of the SRF function and I can query the xlog files for some useful info.
I know that the error codes are still incorrect and the tests are missing, but this is only a proof of concept.

Examples of usage:
Query for committed transactions on the xlog segment:

postgres=# SELECT * from xlogviewer('/usr/local/pgsql/data/pg_xlog/000000010000000000000003') where trim(infoname) = 'commit';
 rmid | xid | rmname | info | len | total_len | infoname
------+-----+--------+------+-----+-----------+----------
    1 |   4 | XACT   |  0   |  12 |        40 | commit
    1 |   5 | XACT   |  0   |  12 |        40 | commit
...

Query for the total length of transactions in the segment:
postgres=# SELECT xid, sum(total_len) from xlogviewer('/usr/local/pgsql/data/pg_xlog/000000010000000000000003') group by xid;
 xid |   sum
-----+---------
 499 |   69054
 497 |    1460
...

I'll sumarize some relevant points of our previous thread (can be viewed at: http://archives.postgresql.org/pgsql-hackers/2006-06/msg01069.php ):

I exposed the idea of bringing the xlogdump functionality to a backend module.
The main drawback is the use case where the database is down.
But the access to a failed cluster isn't impossible, just a little bit more dificult, requiring another cluster to be initialized.
I don't think that this is a no-go for the design, of course for the case where the database is down is still better to have a standalone tool.
So there were suggestions to code something that could be used both on the frontend and the backend.
Tom said it would be difficult to use the same functions on backend and frontend.
I think that I could use the same library and pass pointers to memory allocation and error reporting functions.

Advantages in the SRF design:
- Using SQL to agregate and transform data in any way from the logs.
- It's easier for the DBA in the other use cases where the cluster is still active.
- I already have a database connection in cases where I want to translate oid to names.
- I can connect directly to the postgresql server if I want to query xlogs in a remote machine (don't need remote access to the system).
- Easier to integrate with existing admin tools, like PgAdmin.

In any case I need to start ASAP, to have something useful till the end of the google soc.
So if the way to go will be the standalone program, I think that I can enhance it by adding:
- option to translate OIDs to names given a database connection
- find loser transactions (not commited to the end of the log)
- have an options to output only the transactions with their status and some aggregate data (transaction size).
- a CSV output (to read it's data on another programs, including the backend)

There's one functionality I would like to work on but I don't how dificult it would be to acomplish in time:
- Extract the exact operation done in cases of xlog generated by insert/update/delete.

--
Diogo Biazus - diogob@gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br
Attachment

pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: request for feature: psql "DSN" option
Next
From: "Andrew Dunstan"
Date:
Subject: Re: request for feature: psql 'DSN' option