Thread: check database integrity
Hi,

we are using 9.3 with data checksums enabled. Now I am looking for a way
to check if all database blocks are still intact.

First I tried pg_filedump. In many cases it simply ignored tampered data
blocks. It is probably not made for this task.

Then I remembered about the pageinspect extension. The following select
is a bit too verbose but it seems to do the job for everything except
fsm files.

SELECT c.oid::regclass::text as rel,
       f.fork,
       ser.i as blocknr,
       pg.*
  FROM pg_class c
 CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
 CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
 CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
 CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
                                     f.fork,
                                     ser.i)) pg
 WHERE sz.sz>0

Am I right?

The problem with the select above is that either page_header() or
get_raw_page() seems to allocate the memory for the page without freeing
it again. The process size grew to ~12.5 GB and the query returned
~1,500,000 rows. And 1.5E6 * 8 kB gives roughly 12 GB. Shared buffers is
~120 MB for this database.

I ran this query in a separate transaction. The memory was freed only
when the backend process exited.

Is there a way to work around this memory leak? Is there a better way to
do what I want?

I also thought about pg_dump. But that does not read indexes, AFAIK.

It would be best if the solution avoided expensive locks. Then I could
also use it in production. But currently I need it only to verify a
backup.

Thanks,
Torsten
On Sun, Jul 20, 2014 at 02:57:20PM +0200, Torsten Förtsch wrote:
> I ran this query in a separate transaction. The memory was freed only
> when the backend process exited.
>
> Is there a way to work around this memory leak?

Why do you think it's a memory leak? You asked for the full dataset; you
need to have the memory to allocate for it. At least, that's what I'd
expect.

> expensive locks. Then I could also use it in production. But currently I
> need it only to verify a backup.

If you need to verify a backup, why isn't pg_dump acceptable? Or is it
that you are somehow trying to prove that what you have on the target
(backup) machine is in fact production-ready? I guess I don't really
understand what you are trying to do.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> Then I remembered about the pageinspect extension. The following select
> is a bit too verbose but it seems to do the job for everything except
> fsm files.

> SELECT c.oid::regclass::text as rel,
>        f.fork,
>        ser.i as blocknr,
>        pg.*
>   FROM pg_class c
>  CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
>  CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
>  CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
>  CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
>                                      f.fork,
>                                      ser.i)) pg
>  WHERE sz.sz>0

> The problem with the select above is that either page_header() or
> get_raw_page() seems to allocate the memory for the page without freeing
> it again.

Probably commit 45b0f3572 will help you with that.

> I ran this query in a separate transaction. The memory was freed only
> when the backend process exited.

AFAIK such memory is released at end of query, even without the patch.
Are you sure you aren't looking at shared-buffer usage? Or maybe you're
on a platform where libc doesn't release freed memory back to the OS.

			regards, tom lane
On 20/07/14 17:35, Tom Lane wrote:
> Torsten Förtsch <torsten.foertsch@gmx.net> writes:
>> Then I remembered about the pageinspect extension. The following select
>> is a bit too verbose but it seems to do the job for everything except
>> fsm files.
>
>> SELECT c.oid::regclass::text as rel,
>>        f.fork,
>>        ser.i as blocknr,
>>        pg.*
>>   FROM pg_class c
>>  CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
>>  CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
>>  CROSS JOIN generate_series(0,(sz.sz/8192)::int-1) ser(i)
>>  CROSS JOIN page_header(get_raw_page(c.oid::regclass::text,
>>                                      f.fork,
>>                                      ser.i)) pg
>>  WHERE sz.sz>0
>
>> The problem with the select above is that either page_header() or
>> get_raw_page() seems to allocate the memory for the page without freeing
>> it again.
>
> Probably commit 45b0f3572 will help you with that.

Thanks, Tom. At least the patch description helped. I moved the
page_header() call to the output column list and now it works perfectly.
I'll try the patch next weekend.

>> I ran this query in a separate transaction. The memory was freed only
>> when the backend process exited.
>
> AFAIK such memory is released at end of query, even without the patch.
> Are you sure you aren't looking at shared-buffer usage? Or maybe you're
> on a platform where libc doesn't release freed memory back to the OS.

You are right here. When I wrote the email I reconstructed the behaviour
from my faulty memory. Today I tried it again and the memory is indeed
freed at the end of the query.

Another question, just out of curiosity: for vm and main forks I use
pg_relation_size to figure out the highest page number. That does not
work for fsm. I have at least one fsm file that is 24 kB. Fetching page 0
works, page 1 and above gives an error:

db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 0));
                 page_header
----------------------------------------------
 (114/23485F78,19084,0,24,8192,8192,8192,4,0)
(1 row)

db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
ERROR:  block number 1 is out of range for relation "pg_toast_1255"

db=# select pg_relation_size(2836::oid::regclass, 'fsm');
 pg_relation_size
------------------
            24576

For other relations it works:

db=# select page_header(get_raw_page(60966::oid::regclass::text, 'fsm', i))
       from generate_series(0,2) i;
                  page_header
-----------------------------------------------
 (11F/76884610,-4342,0,24,8192,8192,8192,4,0)
 (11F/768825C0,22465,0,24,8192,8192,8192,4,0)
 (11F/83E9EC38,-29015,0,24,8192,8192,8192,4,0)
(3 rows)

db=# select pg_relation_size(60966::oid::regclass, 'fsm');
 pg_relation_size
------------------
            24576

Is there a way to figure out the highest page number for fsm forks? Is
there perhaps a common way that works for all forks?

Thanks,
Torsten
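(The reworked query itself is not shown in the thread. As a minimal
sketch, assuming "moved the page_header() call to the output column list"
means calling it in the target list rather than through a CROSS JOIN in
the FROM clause, it might look like this:)

SELECT c.oid::regclass::text AS rel,
       f.fork,
       ser.i AS blocknr,
       -- page_header(get_raw_page(...)) moved from a CROSS JOIN in the
       -- FROM clause into the target list; the header comes back as one
       -- composite column instead of separate fields
       page_header(get_raw_page(c.oid::regclass::text, f.fork, ser.i)) AS header
  FROM pg_class c
 CROSS JOIN (values ('main'::text), ('vm'::text)) f(fork)
 CROSS JOIN pg_relation_size(c.oid::regclass, f.fork) sz(sz)
 CROSS JOIN generate_series(0, (sz.sz/8192)::int - 1) ser(i)
 WHERE sz.sz > 0;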
On 20/07/14 16:02, Andrew Sullivan wrote:
>> Then I could also use it in production. But currently I
>> need it only to verify a backup.
>
> If you need to verify a backup, why isn't pg_dump acceptable? Or is it
> that you are somehow trying to prove that what you have on the target
> (backup) machine is in fact production-ready? I guess I don't really
> understand what you are trying to do.

Sorry for kind-of misusing the word backup. What I am doing is this: I
took a base backup and replayed a few xlogs. This is what I meant by
"backup".

What I want to verify is whether all pages in all files match their
checksums. So, I have to make postgres read all pages at least once.
Pg_dump does this for normal tables and toast. But it does not read index
relations as far as I know. A select count(*) from all tables would also
do the job, again without indexes.

The sentence about the backup was only to point out that I don't really
care if the query locks the database for concurrent transactions. But it
would be better if it did not acquire an exclusive lock on all tables.

Torsten
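(As an aside, a minimal sketch of the count(*) idea mentioned above,
generating one statement per ordinary table, could look like the
following. The relkind filter and the use of format() are illustrative
assumptions rather than anything proposed in the thread, and, as noted,
running the generated statements still leaves index pages unread:)

-- generate one "SELECT count(*)" per ordinary table; running the output
-- forces the heap pages of each table to be read, but not its indexes
SELECT format('SELECT count(*) FROM %s;', c.oid::regclass)
  FROM pg_class c
 WHERE c.relkind = 'r';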
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> Another question, just out of curiosity: for vm and main forks I use
> pg_relation_size to figure out the highest page number. That does not
> work for fsm. I have at least one fsm file that is 24 kB. Fetching page
> 0 works, page 1 and above gives an error:

> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 0));
>                  page_header
> ----------------------------------------------
>  (114/23485F78,19084,0,24,8192,8192,8192,4,0)
> (1 row)

> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
> ERROR:  block number 1 is out of range for relation "pg_toast_1255"

> db=# select pg_relation_size(2836::oid::regclass, 'fsm');
>  pg_relation_size
> ------------------
>             24576

That's bizarre. AFAICS, pg_relation_size() reduces to a stat() call,
while the other error looks like it's coming from rawpage.c's check on
RelationGetNumberOfBlocks(), which depends on mdnblocks(), which prefers
to look at the result of lseek(SEEK_END). But both of those should
surely get the same answer, if the file's not changing.

Could you trace through it and see where the results diverge? Also,
what's the actual size of the file on disk?

Alternatively, if you have a way to reproduce this from a standing start,
I'd be happy to poke into it here.

			regards, tom lane

PS: you've not messed around with the value of BLCKSZ, have you? If
different bits of code had different ideas of that constant's value, it'd
possibly explain this.
On 21/07/14 16:17, Tom Lane wrote:
>> db=# select page_header(get_raw_page(2836::oid::regclass::text, 'fsm', 1));
>> ERROR:  block number 1 is out of range for relation "pg_toast_1255"

>> db=# select pg_relation_size(2836::oid::regclass, 'fsm');
>>  pg_relation_size
>> ------------------
>>             24576
>
> That's bizarre. AFAICS, pg_relation_size() reduces to a stat() call,
> while the other error looks like it's coming from rawpage.c's check on
> RelationGetNumberOfBlocks(), which depends on mdnblocks(), which prefers
> to look at the result of lseek(SEEK_END). But both of those should
> surely get the same answer, if the file's not changing.
>
> Could you trace through it and see where the results diverge? Also,
> what's the actual size of the file on disk?

# select pg_relation_filepath(2836::oid::regclass);
 pg_relation_filepath
----------------------
 base/25317/11790

# ls -l data/base/25317/11790*
-rw------- 1 postgres postgres  8192 Jul 21 07:31 data/base/25317/11790
-rw------- 1 postgres postgres 24576 Jul 21 07:33 data/base/25317/11790_fsm
-rw------- 1 postgres postgres  8192 Jul 21 07:33 data/base/25317/11790_vm

You see, main and vm forks of the relation are one page. Only fsm is 3
pages.

After a fresh restart of the database I attached strace to the backend.
There are only 2 lines in the output that mention that relation:

open("base/25317/11790", O_RDWR)        = 35
lseek(35, 0, SEEK_END)                  = 8192

This happened during this query:

select get_raw_page(2836::oid::regclass::text, 'fsm', 1);

Shouldn't it rather open 11790_fsm? Or is there something that first
checks the main fork to see if the fsm page makes sense? It seems so,
because here is the same query for a relation where it works:

open("base/25317/60966", O_RDWR)        = 39
lseek(39, 0, SEEK_END)                  = 1490944
open("base/25317/60966_fsm", O_RDWR)    = 40
lseek(40, 8192, SEEK_SET)               = 8192
read(40, "\37\1\0\0\360\371\275\212\305\35\0\0\30\0\0 \0 \4 \0\0\0\0\0\0\0\0\372\372\0\372"..., 8192) = 8192

First it opens the main fork, then *_fsm, where it reads a page at offset
8192.

Torsten
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> On 21/07/14 16:17, Tom Lane wrote:
>> Could you trace through it and see where the results diverge? Also,
>> what's the actual size of the file on disk?

> After a fresh restart of the database I attached strace to the backend.
> There are only 2 lines in the output that mention that relation:

> open("base/25317/11790", O_RDWR)        = 35
> lseek(35, 0, SEEK_END)                  = 8192

> This happened during this query:

> select get_raw_page(2836::oid::regclass::text, 'fsm', 1);

> Shouldn't it rather open 11790_fsm?

Doh. I looked right at this code in get_raw_page yesterday:

	if (blkno >= RelationGetNumberOfBlocks(rel))
		elog(ERROR, "block number %u is out of range for relation \"%s\"",
			 blkno, RelationGetRelationName(rel));

RelationGetNumberOfBlocks reports the length of the main fork ... but
this check is applied regardless of which fork we're reading. Should
be using RelationGetNumberOfBlocksInFork, of course.

			regards, tom lane
On 22/07/14 16:58, Tom Lane wrote:
> Doh. I looked right at this code in get_raw_page yesterday:
>
> 	if (blkno >= RelationGetNumberOfBlocks(rel))
> 		elog(ERROR, "block number %u is out of range for relation \"%s\"",
> 			 blkno, RelationGetRelationName(rel));
>
> RelationGetNumberOfBlocks reports the length of the main fork ... but
> this check is applied regardless of which fork we're reading. Should
> be using RelationGetNumberOfBlocksInFork, of course.

Thanks for fixing it. I saw your git commit.

Do you know how long it takes for the change to make it into a new PGDG
Debian package?

Thanks,
Torsten
Torsten Förtsch <torsten.foertsch@gmx.net> writes:
> On 22/07/14 16:58, Tom Lane wrote:
>> RelationGetNumberOfBlocks reports the length of the main fork ... but
>> this check is applied regardless of which fork we're reading. Should
>> be using RelationGetNumberOfBlocksInFork, of course.

> Thanks for fixing it. I saw your git commit.

> Do you know how long it takes for the change to make it into a new PGDG
> Debian package?

It will be awhile, since this patch just missed this week's releases :-(.

I believe it's pretty simple to rebuild Debian packages with patches
added to them, so you might take this as an opportunity to learn how to
do that...

			regards, tom lane