Re: xlog viewer proposal - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: xlog viewer proposal
Date
Msg-id 1151063174.2691.1385.camel@localhost.localdomain
In response to Re: xlog viewer proposal  ("Diogo Biazus" <diogob@gmail.com>)
Responses Re: xlog viewer proposal
List pgsql-hackers
On Thu, 2006-06-22 at 14:57 -0300, Diogo Biazus wrote:
> Agree, the project must choose one path as the starting point. But the
> two options can be given in the long run.

I'm acting as Diogo's mentor for the SoC, so I'm trying to let Diogo
discuss his ideas in the community manner without too much steering.

Diogo's ideas are interesting - they aren't the way I would have done it
either, but that doesn't mean we shouldn't consider this alternative
approach.

> I still think that as a starting point the functions inside the
> database are a good option.

Yes, if we use SRF functions for this, ISTM they are the best place for
them. 

> The reasons are: 
> - using SQL to aggregate and transform data in any way from the logs.

That is a major point here. If xlogdump is purely a stand-alone
program then it will be much less functionally rich and, as Tom mentions,
there are other reasons for having access to a server.

> - it's easier for the DBA in the other use cases where the cluster is
> still active. 

Good point.

> - give more flexibility for managing the xlogs remotely

Not sure what you mean.

> - I think it's faster to implement and to have a working and usable
> tool.

Why do you think that? It sounds like you've got more work since you
effectively need to rewrite the _desc routines.

> And there is one option to minimize the problem in the failed cluster
> case: the wrapper program could give the option to initdb a temporary
> area when no connection is given, creating a backend just to analyze a
> set of xlogs. 

It seems a reasonable assumption that someone reading PostgreSQL logs
would have access to another PostgreSQL cluster. It obviously needs to
work when the server that originated the logs is unavailable, but that
does not mean that all PostgreSQL systems are unavailable. There's no
need to try to wrap initdb - just note that people would have to have
access to a PostgreSQL system.

> Other option is to start by the standalone tool and create a wrapper
> function inside postgresql that would just call this external program
> and extract data from the xlogs using this program's output (with some
> option to output all data in a CSV format). 

I think this idea is a good one, but we must also consider whether it
can be done effectively within the time available. Is this something
you can do now, or something you want to do in future?
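
To make the CSV idea concrete, here is a minimal sketch in Python of
what the consuming side might look like. It is only an illustration:
the column names (xid, rmgr, info) and the sample rows are assumptions,
not the output of xlogdump or any existing tool, and the step of
invoking the external program is left out because no such CSV option
exists yet.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV output; the columns are assumed for illustration only.
sample = (
    "xid,rmgr,info\n"
    "100,Heap,insert\n"
    "100,Transaction,commit\n"
    "101,Heap,update\n"
)

def parse_xlog_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def records_per_rmgr(rows):
    """Count log records per resource manager (the assumed 'rmgr' column)."""
    return Counter(row["rmgr"] for row in rows)

rows = parse_xlog_csv(sample)
counts = records_per_rmgr(rows)
```

Once the rows are in a structured form like this, the same kind of
aggregation could equally be done in SQL by a wrapper function.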

The alternative of reinforcing xlogdump needs to be considered more
fully now and quickly, so coding can begin as soon as possible. 
- Diogo: what additional things can you make xlogdump do?
- Tom: can you say more about what you'd like to see from a tool, to
help Diogo determine the best way forward? What value can he add if you
have already written the tool?


Some other considerations:
The biggest difficulty is finding "loser transactions" - ones that have
not yet committed by the end of the log. You need to do this in both
cases if you want to allow transaction state to be determined precisely
for 100% of transactions; otherwise you might have to have an Unknown
transaction state in addition to the others.
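
As a rough illustration of the Unknown state, here is a minimal sketch
in Python. The (xid, kind) record tuples are a made-up simplification,
not the real xlog record format, and the function names are
hypothetical. Any xid that appears in the log without a later commit or
abort record is reported as a loser.

```python
from enum import Enum

class XactState(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    UNKNOWN = "unknown"  # no commit/abort record seen before end of log

def classify_transactions(records):
    """Map each xid to its state.

    records: iterable of (xid, kind) tuples in log order, where kind is
    'commit', 'abort', or any other record type (assumed format).
    """
    states = {}
    for xid, kind in records:
        if kind == "commit":
            states[xid] = XactState.COMMITTED
        elif kind == "abort":
            states[xid] = XactState.ABORTED
        else:
            # A data record: remember the xid, but do not overwrite a
            # commit/abort state that has already been recorded.
            states.setdefault(xid, XactState.UNKNOWN)
    return states

def loser_transactions(records):
    """Return the sorted xids that never committed or aborted in the log."""
    states = classify_transactions(records)
    return sorted(x for x, s in states.items() if s is XactState.UNKNOWN)

log = [(100, "insert"), (101, "insert"), (100, "commit"),
       (102, "update"), (101, "abort")]
losers = loser_transactions(log)
```

Note that this needs a full pass to the end of the available log before
any transaction can be reported as a loser.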

What nobody has mentioned is that connecting to a db to look up table
names from OIDs is only possible if that db knows about the set of
tables the log files refer to. How would we be certain that the
OID-to-tablename match would be a reliable one?

-- Simon Riggs              EnterpriseDB   http://www.enterprisedb.com


