Thread: check database integrity
Hi,

we are using 9.3 with data checksums enabled. Now I am looking for a way
to check if all database blocks are still intact.

First I tried pg_filedump. In many cases it simply ignored tampered data
blocks. It is probably not made for this task.

Then I remembered about the pageinspect extension. The following select
is a bit too verbose but it seems to do the job for everything except
fsm files.

SELECT c.oid::regclass::text as rel,
       f.fork,
       ser.i as blocknr,
       pg.*
  FROM pg_class c
 CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
 CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
 CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
 CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
                                     f.fork,
                                     ser.i)) pg
 WHERE sz.sz>0

Am I right?

The problem with the select above is that either page_header() or
get_raw_page() seems to allocate the memory for the page without freeing
it again. The process size grew to ~12.5 GB and the query returned
~1,500,000 rows. And 1.5E6 * 8 kB gives roughly 12 GB. Shared buffers is
~120 MB for this database.

I ran this query in a separate transaction. The memory was freed only
when the backend process exited.

Is there a way to work around this memory leak? Is there a better way to
do what I want?

I also thought about pg_dump. But that does not read indexes, AFAIK.

It would be best if the solution avoided expensive locks. Then I could
also use it in production. But currently I need it only to verify a
backup.

Thanks,
Torsten
On Sun, Jul 20, 2014 at 02:57:20PM +0200, Torsten Förtsch wrote:
> I ran this query in a separate transaction. The memory was freed only
> when the backend process exited.
>
> Is there a way to work around this memory leak?

Why do you think it's a memory leak? You asked for the full dataset; you
need to have the memory to allocate for it. At least, that's what I'd
expect.

> expensive locks. Then I could also use it in production. But currently I
> need it only to verify a backup.

If you need to verify a backup, why isn't pg_dump acceptable? Or is it
that you are somehow trying to prove that what you have on the target
(backup) machine is in fact production-ready? I guess I don't really
understand what you are trying to do.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> Then I remembered about the pageinspect extension. The following select
> is a bit too verbose but it seems to do the job for everything except
> fsm files.

> SELECT c.oid::regclass::text as rel,
>        f.fork,
>        ser.i as blocknr,
>        pg.*
>   FROM pg_class c
>  CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
>  CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
>  CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
>  CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
>                                      f.fork,
>                                      ser.i)) pg
>  WHERE sz.sz>0

> The problem with the select above is that either page_header() or
> get_raw_page() seems to allocate the memory for the page without freeing
> it again.

Probably commit 45b0f3572 will help you with that.

> I ran this query in a separate transaction. The memory was freed only
> when the backend process exited.

AFAIK such memory is released at end of query, even without the patch.
Are you sure you aren't looking at shared-buffer usage? Or maybe you're
on a platform where libc doesn't release freed memory back to the OS.

			regards, tom lane
On 20/07/14 17:35, Tom Lane wrote:
> Torsten Förtsch <torsten.foertsch@gmx.net> writes:
>> Then I remembered about the pageinspect extension. The following select
>> is a bit too verbose but it seems to do the job for everything except
>> fsm files.
>
>> SELECT c.oid::regclass::text as rel,
>>        f.fork,
>>        ser.i as blocknr,
>>        pg.*
>>   FROM pg_class c
>>  CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
>>  CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
>>  CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
>>  CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
>>                                      f.fork,
>>                                      ser.i)) pg
>>  WHERE sz.sz>0
>
>> The problem with the select above is that either page_header() or
>> get_raw_page() seems to allocate the memory for the page without freeing
>> it again.
>
> Probably commit 45b0f3572 will help you with that.

Thanks, Tom. At least the patch description helped. I moved the
page_header() call to the output column list and now it works perfectly.
I'll try the patch next weekend.

>> I ran this query in a separate transaction. The memory was freed only
>> when the backend process exited.
>
> AFAIK such memory is released at end of query, even without the patch.
> Are you sure you aren't looking at shared-buffer usage? Or maybe you're
> on a platform where libc doesn't release freed memory back to the OS.

You are right here. When I wrote the email I reconstructed the behaviour
from my faulty memory. Today I tried it again and the memory is indeed
freed at the end of the query.

Another question, just out of curiosity: for vm and main forks I use
pg_relation_size to figure out the highest page number. That does not
work for fsm. I have at least one fsm file that is 24 kB. Fetching page 0
works, page 1 and above gives an error:

db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 0));
                 page_header
----------------------------------------------
 (114/23485F78,19084,0,24,8192,8192,8192,4,0)
(1 row)

db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
ERROR:  block number 1 is out of range for relation "pg_toast_1255"

db=# select pg_relation_size(2836::oid::regclass, 'fsm');
 pg_relation_size
------------------
            24576

For other relations it works:

db=# select page_header(get_raw_page(60966::oid::regclass::text, 'fsm', i))
       from generate_series(0,2) i;
                  page_header
-----------------------------------------------
 (11F/76884610,-4342,0,24,8192,8192,8192,4,0)
 (11F/768825C0,22465,0,24,8192,8192,8192,4,0)
 (11F/83E9EC38,-29015,0,24,8192,8192,8192,4,0)
(3 rows)

db=# select pg_relation_size(60966::oid::regclass, 'fsm');
 pg_relation_size
------------------
            24576

Is there a way to figure out the highest page number for fsm forks? Is
there perhaps a common way that works for all forks?

Thanks,
Torsten
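(The reworked query itself is not shown in the thread. As a minimal
sketch, assuming "moved the page_header() call to the output column list"
means calling it in the target list rather than through a CROSS JOIN in
the FROM clause, it might look like this:)

SELECT c.oid::regclass::text AS rel,
       f.fork,
       ser.i AS blocknr,
       -- page_header(get_raw_page(...)) moved from a CROSS JOIN in the
       -- FROM clause into the target list; the header comes back as one
       -- composite column instead of separate fields
       page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) AS header
  FROM pg_class c
 CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
 CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
 CROSS JOIN generate_series(0, (sz.sz/8192)::int - 1) ser(i)
 WHERE sz.sz > 0;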
On 20/07/14 16:02, Andrew Sullivan wrote:
>> Then I could also use it in production. But currently I
>> need it only to verify a backup.
>
> If you need to verify a backup, why isn't pg_dump acceptable? Or is it
> that you are somehow trying to prove that what you have on the target
> (backup) machine is in fact production-ready? I guess I don't really
> understand what you are trying to do.

Sorry for kind-of misusing the word backup. What I am doing is this: I
took a base backup and replayed a few xlogs. This is what I meant by
"backup".

What I want to verify is whether all pages in all files match their
checksums. So, I have to make postgres read all pages at least once.
Pg_dump does this for normal tables and toast. But it does not read index
relations as far as I know. A select count(*) from all tables would also
do the job, again without indexes.

The sentence about the backup was only to point out that I don't really
care if the query locks the database for concurrent transactions. But it
would be better if it did not acquire an exclusive lock on all tables.

Torsten
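(As an aside, a minimal sketch of the count(*) idea mentioned above,
generating one statement per ordinary table, could look like the
following. The relkind filter and the use of format() are illustrative
assumptions rather than anything proposed in the thread, and, as noted,
running the generated statements still leaves index pages unread:)

-- generate one "SELECT count(*)" per ordinary table; running the output
-- forces the heap pages of each table to be read, but not its indexes
SELECT format('SELECT count(*) FROM %s;', c.oid::regclass)
  FROM pg_class c
 WHERE c.relkind = 'r';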
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> Another question, just out of curiosity: for vm and main forks I use
> pg_relation_size to figure out the highest page number. That does not
> work for fsm. I have at least one fsm file that is 24 kB. Fetching page
> 0 works, page 1 and above gives an error:

> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 0));
>                  page_header
> ----------------------------------------------
>  (114/23485F78,19084,0,24,8192,8192,8192,4,0)
> (1 row)

> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
> ERROR:  block number 1 is out of range for relation "pg_toast_1255"

> db=# select pg_relation_size(2836::oid::regclass, 'fsm');
>  pg_relation_size
> ------------------
>             24576

That's bizarre. AFAICS, pg_relation_size() reduces to a stat() call,
while the other error looks like it's coming from rawpage.c's check on
RelationGetNumberOfBlocks(), which depends on mdnblocks(), which prefers
to look at the result of lseek(SEEK_END). But both of those should
surely get the same answer, if the file's not changing.

Could you trace through it and see where the results diverge? Also,
what's the actual size of the file on disk?

Alternatively, if you have a way to reproduce this from a standing start,
I'd be happy to poke into it here.

			regards, tom lane

PS: you've not messed around with the value of BLCKSZ, have you? If
different bits of code had different ideas of that constant's value, it'd
possibly explain this.
On 21/07/14 16:17, Tom Lane wrote:
>> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
>> ERROR:  block number 1 is out of range for relation "pg_toast_1255"

>> db=# select pg_relation_size(2836::oid::regclass, 'fsm');
>>  pg_relation_size
>> ------------------
>>             24576
>
> That's bizarre. AFAICS, pg_relation_size() reduces to a stat() call,
> while the other error looks like it's coming from rawpage.c's check on
> RelationGetNumberOfBlocks(), which depends on mdnblocks(), which prefers
> to look at the result of lseek(SEEK_END). But both of those should
> surely get the same answer, if the file's not changing.
>
> Could you trace through it and see where the results diverge? Also,
> what's the actual size of the file on disk?

# select pg_relation_filepath(2836::oid::regclass);
 pg_relation_filepath
----------------------
 base/25317/11790

# ls -l data/base/25317/11790*
-rw------- 1 postgres postgres  8192 Jul 21 07:31 data/base/25317/11790
-rw------- 1 postgres postgres 24576 Jul 21 07:33 data/base/25317/11790_fsm
-rw------- 1 postgres postgres  8192 Jul 21 07:33 data/base/25317/11790_vm

You see, main and vm forks of the relation are one page. Only fsm is 3
pages.

After a fresh restart of the database I attached strace to the backend.
There are only 2 lines in the output that mention that relation:

open("base/25317/11790", O_RDWR)        = 35
lseek(35, 0, SEEK_END)                  = 8192

This happened during this query:

select get_raw_page(2836::oid::regclass::text, 'fsm', 1);

Shouldn't it rather open 11790_fsm? Or is there something that first
checks the main fork to see if the fsm page makes sense? It seems so,
because here is the same query for a relation where it works:

open("base/25317/60966", O_RDWR)        = 39
lseek(39, 0, SEEK_END)                  = 1490944
open("base/25317/60966_fsm", O_RDWR)    = 40
lseek(40, 8192, SEEK_SET)               = 8192
read(40, "\37\1\0\0\360\371\275\212\305\35\0\0\30\0\0 \0 \4 \0\0\0\0\0\0\0\0\372\372\0\372"..., 8192) = 8192

First it opens the main fork, then *_fsm, where it reads a page at offset
8192.

Torsten
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> On 21/07/14 16:17, Tom Lane wrote:
>> Could you trace through it and see where the results diverge? Also,
>> what's the actual size of the file on disk?

> After a fresh restart of the database I attached strace to the backend.
> There are only 2 lines in the output that mention that relation:

> open("base/25317/11790", O_RDWR)        = 35
> lseek(35, 0, SEEK_END)                  = 8192

> This happened during this query:

> select get_raw_page(2836::oid::regclass::text, 'fsm', 1);

> Shouldn't it rather open 11790_fsm?

Doh. I looked right at this code in get_raw_page yesterday:

	if (blkno >= RelationGetNumberOfBlocks(rel))
		elog(ERROR, "block number %u is out of range for relation \"%s\"",
			 blkno, RelationGetRelationName(rel));

RelationGetNumberOfBlocks reports the length of the main fork ... but
this check is applied regardless of which fork we're reading. Should
be using RelationGetNumberOfBlocksInFork, of course.

			regards, tom lane
On 22/07/14 16:58, Tom Lane wrote:
> Doh. I looked right at this code in get_raw_page yesterday:
>
> 	if (blkno >= RelationGetNumberOfBlocks(rel))
> 		elog(ERROR, "block number %u is out of range for relation \"%s\"",
> 			 blkno, RelationGetRelationName(rel));
>
> RelationGetNumberOfBlocks reports the length of the main fork ... but
> this check is applied regardless of which fork we're reading. Should
> be using RelationGetNumberOfBlocksInFork, of course.

Thanks for fixing it. I saw your git commit.

Do you know how long it takes for the change to make it into a new PGDG
Debian package?

Thanks,
Torsten
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> On 22/07/14 16:58, Tom Lane wrote:
>> RelationGetNumberOfBlocks reports the length of the main fork ... but
>> this check is applied regardless of which fork we're reading. Should
>> be using RelationGetNumberOfBlocksInFork, of course.

> Thanks for fixing it. I saw your git commit.

> Do you know how long it takes for the change to make it into a new PGDG
> Debian package?

It will be awhile, since this patch just missed this week's releases :-(.

I believe it's pretty simple to rebuild Debian packages with patches
added to them, so you might take this as an opportunity to learn how to
do that...

			regards, tom lane