Thread: xlog viewer prototype and new proposal
I've worked on a prototype (attached to this email) of the SRF function, and it can already query the xlog files for some useful info.
I know that the error codes are still incorrect and the tests are missing, but this is only a proof of concept.
Examples of usage:
Query for committed transactions on the xlog segment:
postgres=# SELECT * from xlogviewer('/usr/local/pgsql/data/pg_xlog/000000010000000000000003') where trim(infoname) = 'commit';
rmid | xid | rmname | info | len | total_len | infoname
------+-----+--------+------+-----+-----------+----------
1 | 4 | XACT | 0 | 12 | 40 | commit
1 | 5 | XACT | 0 | 12 | 40 | commit
...
Query for the total length of transactions in the segment:
postgres=# SELECT xid, sum(total_len) from xlogviewer('/usr/local/pgsql/data/pg_xlog/000000010000000000000003') group by xid;
xid | sum
-----+---------
499 | 69054
497 | 1460
...
I'll summarize some relevant points from our previous thread (which can be viewed at http://archives.postgresql.org/pgsql-hackers/2006-06/msg01069.php):
I proposed the idea of bringing the xlogdump functionality into a backend module.
The main drawback is the use case where the database is down.
But access to a failed cluster isn't impossible, just a bit more difficult, requiring another cluster to be initialized.
I don't think this is a no-go for the design; of course, when the database is down it's still better to have a standalone tool.
So there were suggestions to code something that could be used both in the frontend and in the backend.
Tom said it would be difficult to use the same functions on backend and frontend.
I think I could use the same library and pass in pointers to memory allocation and error reporting functions.
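A minimal sketch in C of what I have in mind (all names here are hypothetical, not existing xlogdump code): the shared decoding library never calls palloc/ereport or malloc/fprintf directly, but goes through caller-supplied callbacks, so the backend and the standalone tool can each plug in their own.

#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback table for the shared decoding library.
 * The backend would point these at palloc/ereport wrappers;
 * the standalone tool uses plain libc, as below. */
typedef struct XLogDumpCallbacks
{
    void *(*alloc_mem) (size_t size);
    void   (*free_mem) (void *ptr);
    void   (*report_error) (const char *fmt, ...);
} XLogDumpCallbacks;

/* Standalone error reporter: print to stderr and bail out. */
static void
standalone_error(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    exit(1);
}

static const XLogDumpCallbacks standalone_callbacks = {
    malloc,
    free,
    standalone_error
};

The library would take a pointer to such a table at initialization and use it for every allocation and error, which keeps the decoding code itself identical on both sides.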
Advantages in the SRF design:
- Using SQL to aggregate and transform data from the logs in any way.
- It's easier for the DBA in the other use cases, where the cluster is still active.
- I already have a database connection in cases where I want to translate OIDs to names.
- I can connect directly to the PostgreSQL server if I want to query xlogs on a remote machine (no need for remote access to the system).
- Easier to integrate with existing admin tools, like PgAdmin.
In any case I need to start ASAP to have something useful by the end of the Google SoC.
So if the standalone program is the way to go, I think I can enhance it by adding:
- an option to translate OIDs to names, given a database connection
- an option to find loser transactions (ones not committed by the end of the log; see the sketch just after this list)
- an option to output only the transactions, with their status and some aggregate data (transaction size)
- a CSV output mode (so its data can be read by other programs, including the backend)
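A rough sketch in C of how the loser-transaction pass could work (the record-walking loop is xlogdump's own; note_record and the fixed-size table are hypothetical): remember every xid seen, mark it when a commit/abort record for it appears, and whatever is left unmarked at end of log never completed.

#include <stdbool.h>
#include <stdint.h>

#define MAX_XIDS 65536

static uint32_t xids[MAX_XIDS];         /* xids seen so far */
static bool     completed[MAX_XIDS];    /* commit/abort seen for xids[i]? */
static int      nxids = 0;

/* Called once per xlog record while walking the segment. */
static void
note_record(uint32_t xid, bool is_commit_or_abort)
{
    int i;

    for (i = 0; i < nxids; i++)
        if (xids[i] == xid)
            break;
    if (i == nxids)
    {
        if (nxids == MAX_XIDS)
            return;             /* table full; a real version would grow it */
        xids[nxids] = xid;
        completed[nxids] = false;
        nxids++;
    }
    if (is_commit_or_abort)
        completed[i] = true;
}

/* After the scan, every xids[i] with completed[i] == false is a loser. */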
There's one piece of functionality I'd like to work on, but I don't know how difficult it would be to accomplish in time:
- extract the exact operation performed, for xlog records generated by insert/update/delete
--
Diogo Biazus - diogob@gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br
"Diogo Biazus" <diogob@gmail.com> writes: > I exposed the idea of bringing the xlogdump functionality to a backend > module. The main drawback is the use case where the database is down. But > the access to a failed cluster isn't impossible, just a little bit more > dificult, requiring another cluster to be initialized. Does that mean you're planning to not use the backend's system tables at all? You'll look at the database cluster under analysis to get all that information? If so then that removes a lot of the objections to running in a backend. You're basically just using the backend as a convenient context for manipulating and storing table-like data. It also seems to remove a lot of the motivation for doing it in the backend. You're not going to get any advantages on the implementation side in that case. > - I already have a database connection in cases where I want to translate > oid to names. You can't do that if you want to allow people to initialize a new cluster to analyze a downed cluster. > - I can connect directly to the postgresql server if I want to query xlogs > in a remote machine (don't need remote access to the system). > - Easier to integrate with existing admin tools, like PgAdmin. These are unconvincing to non-windows people. In any case a stand-alone program could always have a postgres module tacked on to call out to it. That's the main reason I think a stand-alone module makes more sense. You can always take a stand-alone module and stick an interface to it into the server. You can't take code meant to run in the server and build a stand-alone environment to run it. -- greg
On 07 Jul 2006 09:58:29 -0400, Greg Stark <gsstark@mit.edu> wrote:
-- "Diogo Biazus" <diogob@gmail.com> writes:
> I exposed the idea of bringing the xlogdump functionality to a backend
> module. The main drawback is the use case where the database is down. But
> the access to a failed cluster isn't impossible, just a little bit more
> dificult, requiring another cluster to be initialized.
Does that mean you're planning to not use the backend's system tables at all?
You'll look at the database cluster under analysis to get all that
information?
If so then that removes a lot of the objections to running in a backend.
You're basically just using the backend as a convenient context for
manipulating and storing table-like data.
It also seems to remove a lot of the motivation for doing it in the backend.
You're not going to get any advantages on the implementation side in that
case.
Yes, that's correct.
> > - I already have a database connection in cases where I want to translate
> > OIDs to names.
>
> You can't do that if you want to allow people to initialize a new cluster to
> analyze a downed cluster.
Sure; it would be possible to do the translation only when the backend is the same one that generated the xlogs.
> > - I can connect directly to the PostgreSQL server if I want to query xlogs
> > on a remote machine (no need for remote access to the system).
> > - Easier to integrate with existing admin tools, like PgAdmin.
>
> These are unconvincing to non-Windows people. In any case a stand-alone
> program could always have a postgres module tacked on to call out to it.
Sure, that's one of the solutions I was thinking about; now I see it may be the best one: a thin backend interface that calls the standalone tool.
What I still don't know is: is it better for this interface to simply call the program and read its output, or to use a set of shared macros/functions? A sketch of the first option follows.
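Roughly what I mean by "calling the program and reading its output", in C (the --csv flag is just the output mode I proposed above, so both the flag and this wrapper are hypothetical):

#include <stdio.h>

/* Hypothetical backend-side wrapper: run the standalone tool on a
 * segment and read its CSV output line by line. Error handling,
 * shell-quoting of the path, and the CSV parsing are all elided. */
static int
read_xlog_via_tool(const char *segment_path)
{
    char  cmd[1024];
    char  line[1024];
    FILE *pipe;

    if (snprintf(cmd, sizeof(cmd), "xlogdump --csv %s", segment_path) >= (int) sizeof(cmd))
        return -1;              /* path too long */

    pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    while (fgets(line, sizeof(line), pipe) != NULL)
    {
        /* parse one CSV record and hand the row to the SRF machinery */
    }

    return pclose(pipe);
}

The shared-functions alternative would instead link the decoding library into both binaries, as in the callback sketch in my first message.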
> That's the main reason I think a stand-alone module makes more sense. You can
> always take a stand-alone module and stick an interface to it into the server.
> You can't take code meant to run in the server and build a stand-alone
> environment to run it.
Sure.
In the last part of the proposal I suggested some improvements to the stand-alone tool; any other ideas?
--
Diogo Biazus - diogob@gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br
On Fri, Jul 07, 2006 at 11:59:41AM -0300, Diogo Biazus wrote:
> On 07 Jul 2006 09:58:29 -0400, Greg Stark <gsstark@mit.edu> wrote:
> > That's the main reason I think a stand-alone module makes more
> > sense. You can always take a stand-alone module and stick an
> > interface to it into the server. You can't take code meant to run in
> > the server and build a stand-alone environment to run it.
>
> Sure.
> In the last part of the proposal I suggested some improvements to the
> stand-alone tool; any other ideas?

Something I've been thinking of while reading this thread. One big disadvantage of doing it in the backend is that your methods of returning data are limited. Your resultset can only return one "type". For example, if you start decoding all the different types of xlog packets, you're going to get different information for each. To display that as the output of a function you're going to have to munge them into a common format. An external program does not suffer this limitation.

In the future it may be worthwhile making a library that can be used by both an external program and the postgres backend, but really, that seems a lot less work than doing the actual decoding itself...

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
On 7/7/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> Something I've been thinking of while reading this thread. One big
> disadvantage of doing it in the backend is that your methods of
> returning data are limited. Your resultset can only return one "type".
> For example, if you start decoding all the different types of xlog
> packets, you're going to get different information for each. To display
> that as the output of a function you're going to have to munge them
> into a common format. An external program does not suffer this
> limitation.
>
> In the future it may be worthwhile making a library that can be used by
> both an external program and the postgres backend, but really, that
> seems a lot less work than doing the actual decoding itself...
>
> Hope this helps,
Sure, but in the backend design this could be handled with one function for the common data and other functions to extract the operation-specific data from the different record types.
But I was hoping to get more input on the functionality expected in the standalone tool: what could improve the existing xlogdump?
--
Diogo Biazus - diogob@gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br