Thread: [GSOC] - I ntegrity check algorithm for data files

[GSOC] - I ntegrity check algorithm for data files

From
Robert Mach
Date:
Hello,
my name is Robert Mach and I am happy to by working on GSoC project for 
Postgresql. The name of the project is  Integrity check algorithm for 
data files in Postgresql.

So far, I have put together a list of possible checks for data failure 
and I would like to hear as much opinions on this list or even better 
some recommendation on what should be added to this list or removed.

Before presenting possible errors, I divided them into physical and 
logical errors. Physical errors will refer to errors in the structure of 
pages and tuples, whereas logical errors will depict errors that cause 
incorrect performance of postgresql, but are correct according the 
structure of data files.

In order to find PHYSICAL errors:
- check whether the total size of all TOAST table chunks is the same as 
the size mentioned in the TOASTed table
- in case of variable length representation of data (attlen = -1) 
compare the real size of stored data with the size of data mentioned 
within the varlena lenth word
- count the number of rows in a table and compare it with the 
pg_class.reltuples of corresponding record in pg_class.
-check the format of data according to the flags (that determine the 
representation) belonging to them. e.g.: in case of TOASTed values, is 
the size of pointer datum really 20B?, etc.
-check the fields firmly defined by the structure for occurrence of odd data

In order to find LOGICAL errors:
-check the validity of all items in index (e.g. concurrent update and 
index scan with constrain could cause inconsistent snapshot of database 
used for creating the index)/
-After creating of index or running the index scan, check whether all 
tuples that should be indexed are really indexed (e.g. )
-check the validity and effectiveness of free space map (whether it is 
not considering valid data as free, whether the size is fitting the 
needs of database, etc.)
-check whether all user-defined functions in the database are 
visible/usable for the users (maybe also verify privileges..)
-check whether the constrains applied on items are fulfilled:    - the uniqueness    - range of values    - correctness
offoreign key values
 
-in case of very large databases, check whether the wrap around of 
transactions IDs occurred. (Transaction ID = 2 to the power of 32)
-check the integrity of catalogs

I see different ways of delivering this functionality to Postgresql. The 
best of course would be to become part of Postgresql release either as a 
PostgreSQL command (like UPDATE) or as an postgresql server application 
like vacuumdb.
Other possibility is to create a freestanding program that would be 
called with location of datafiles as arguments.
Last possibility is to create an administrative console access (single 
user mode) to the database in which this integrity check could be fired.

I hope to get lots of thoughts  on this proposal as well as lots on 
other ideas on what should be checked in odrer to verify the integrity 
of data in Postgres.

Cheers,

Robert



Re: [GSOC] - I ntegrity check algorithm for data files

From
Martijn van Oosterhout
Date:
On Fri, May 18, 2007 at 12:52:20PM +0200, Robert Mach wrote:
> Before presenting possible errors, I divided them into physical and
> logical errors. Physical errors will refer to errors in the structure of
> pages and tuples, whereas logical errors will depict errors that cause
> incorrect performance of postgresql, but are correct according the
> structure of data files.

Have you looked at pgfsck? I think you also need to check much lower
level stuff also. Let:
- Page headers
- Are all the item pointers in order an non-overlapping
- Is each item correctly structured, do the XIDs look OK. What about
the HINT bits. Can the tuple be decoded properly. One common problem
with data corruption is that postgres with see a varlen header which
says the daa is 2K (or 2M) whereas the item is only 50bytes long, so
that's obviously broken.

Only once you're sure about the actual structure can you worry about
thing like toast pointer (which are important, but less likely to crash
postgres).

All this applies to index pages also.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: [GSOC] - I ntegrity check algorithm for data files

From
Zdenek Kotala
Date:
Robert Mach wrote:

> 
> I see different ways of delivering this functionality to Postgresql. The 
> best of course would be to become part of Postgresql release either as a 
> PostgreSQL command (like UPDATE) or as an postgresql server application 
> like vacuumdb.
> Other possibility is to create a freestanding program that would be 
> called with location of datafiles as arguments.
> Last possibility is to create an administrative console access (single 
> user mode) to the database in which this integrity check could be fired.

I discussed It with Niel and for logical test It should be PostgreSQL 
function. However, for physical layer test database cluster should be in 
inconsistent state and PostgreSQL startup process fails. It means that 
there are only limited possibilities.

1) standalone binary, which will be linked mostly from postgreSQL object 
files and replace main and add new testing functionality. This is 
recommended by Neil.

2) Add new postmaster context/mod - repair and recovery, when postgres 
is run as "postmaster --recovery".

3) My personal favorite is create management console - which allows to 
perform check without physically access to local machine. Management 
console should be use for another purpose - for example disable/enable 
databases from cluster, perform upgrade of datalayer to the new version, 
kill sessions, update postgresql.conf and so on...

However this solution has significant impact on current postgres 
behavior, but I think it should be big deal for postgres.

    Zdenek



Re: [GSOC] - I ntegrity check algorithm for data files

From
Richard Huxton
Date:
Zdenek Kotala wrote:
> 3) My personal favorite is create management console - which allows to 
> perform check without physically access to local machine. Management 
> console should be use for another purpose - for example disable/enable 
> databases from cluster, perform upgrade of datalayer to the new version, 
> kill sessions, update postgresql.conf and so on...
> 
> However this solution has significant impact on current postgres 
> behavior, but I think it should be big deal for postgres.

I believe pgadmin lets you update postgresql.conf remotely, and I know 
you can kill sessions from within PG ( pg_cancel_backend() ).

For the rest, that's what ssh is for imho.

--   Richard Huxton  Archonet Ltd


Re: [GSOC] - I ntegrity check algorithm for data files

From
Zdenek Kotala
Date:
Richard Huxton wrote:
> Zdenek Kotala wrote:
>> 3) My personal favorite is create management console - which allows to 
>> perform check without physically access to local machine. Management 
>> console should be use for another purpose - for example disable/enable 
>> databases from cluster, perform upgrade of datalayer to the new 
>> version, kill sessions, update postgresql.conf and so on...
>>
>> However this solution has significant impact on current postgres 
>> behavior, but I think it should be big deal for postgres.
> 
> I believe pgadmin lets you update postgresql.conf remotely,

IIRC, it is possible only when you have some additional module/function 
installed on server and it works only for pgadmin, not for psql.

> and I know 
> you can kill sessions from within PG ( pg_cancel_backend() ).

If you look how kill session is implemented it does not look much safe.

> For the rest, that's what ssh is for imho.

And what you will do when you don't have ssh access on this machine and 
5432 is only one way how to administrate a server? (Windows is another 
story.)
Zdenek



Re: [GSOC] - I ntegrity check algorithm for data files

From
Dave Page
Date:
Zdenek Kotala wrote:
>> I believe pgadmin lets you update postgresql.conf remotely,
> 
> IIRC, it is possible only when you have some additional module/function
> installed on server and it works only for pgadmin, not for psql.

It needs the adminpack contrib module installed in the maintenance DB
(normally postgres).

In theory it could work for psql, but it's completely impractical -
adminpack just gives us some file IO functions which we just use to read
and write the files. All the editing logic is handled by pgAdmin.

Regards, Dave


Re: [GSOC] - I ntegrity check algorithm for data files

From
Richard Huxton
Date:
Zdenek Kotala wrote:
> Richard Huxton wrote:
> 
>> For the rest, that's what ssh is for imho.
> 
> And what you will do when you don't have ssh access on this machine and 
> 5432 is only one way how to administrate a server? (Windows is another 
> story.)

If I've not got ssh access to the machine, then I'm not the one 
responsible for editing configuration files.

If it's a Windows machine, I'd probably use remote-desktop and pgadmin 
(not that anyone has ever asked me to).

--   Richard Huxton  Archonet Ltd


Re: [GSOC] - I ntegrity check algorithm for data files

From
Zdenek Kotala
Date:
Richard Huxton wrote:
> Zdenek Kotala wrote:
>> Richard Huxton wrote:
>>
>>> For the rest, that's what ssh is for imho.
>>
>> And what you will do when you don't have ssh access on this machine 
>> and 5432 is only one way how to administrate a server? (Windows is 
>> another story.)
> 
> If I've not got ssh access to the machine, then I'm not the one 
> responsible for editing configuration files.

I'm not sure if ssh access is in correlation with db administration. 
However in some Unix system you can get right to user start stop 
postgres (RBAC, sudo ...), but there is not easy way how to allow him to 
access configuration file only in way to share postgres user password.
    Zdenek