Thread: Re: Does PostgreSQL check database integrity at startup?
Stephen Frost wrote:
> * Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> > Anyway, instead digging into rsync functionality (or bugs - I doubt, but who knows?), I do prefer to have a script I can run to check if there is obvious failures in standby servers.
>
> As mentioned, zero-byte files can be perfectly valid. PostgreSQL does have page-level CRCs, if you initialized your database with them (which I would strongly recommend).

Page-level checksums would not detect the problem being complained about in this thread, however.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> > > Anyway, instead digging into rsync functionality (or bugs - I doubt, but who knows?), I do prefer to have a script I can run to check if there is obvious failures in standby servers.
> >
> > As mentioned, zero-byte files can be perfectly valid. PostgreSQL does have page-level CRCs, if you initialized your database with them (which I would strongly recommend).
>
> Page-level checksums would not detect the problem being complained in this thread, however.

It's entirely unclear to me what the problem being complained about in this thread actually is. The complaint so far was about zero-byte files, but those are entirely valid, so that isn't a problem that anyone can solve..

Given the thread subject, if someone actually wanted to do a database integrity check before startup, they could use pgBackRest to perform a backup with a CRC-enabled database and at least verify that all of the checksums are valid.

We could possibly look into adding some set of additional checks for files which can't actually be zero-byte, perhaps.. I know we have some other one-off checks already.

Thanks!

Stephen
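(For readers following along: whether a cluster was initialized with the page-level checksums mentioned above can be confirmed from SQL. This is only a quick aside, not something from the thread; the setting is read-only and simply reflects whether initdb was run with --data-checksums.)

SHOW data_checksums;
-- or, equivalently:
SELECT current_setting('data_checksums');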
On 27/12/2017 15:02, Stephen Frost wrote:
> Alvaro,
>
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> Stephen Frost wrote:
>>
>>> * Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
>>>> Anyway, instead digging into rsync functionality (or bugs - I doubt, but who knows?), I do prefer to have a script I can run to check if there is obvious failures in standby servers.
>>> As mentioned, zero-byte files can be perfectly valid. PostgreSQL does have page-level CRCs, if you initialized your database with them (which I would strongly recommend).
>> Page-level checksums would not detect the problem being complained in this thread, however.
> It's entirely unclear to me what the problem being complained about in this thread actually is. The complaint so far was about zero-byte files, but those are entirely valid, so that isn't a problem that anyone can solve..
>
> Given the thread subject, if someone actually wanted to do a database integrity check before startup, they could use pgBackRest to perform a backup with a CRC-enabled database and at least verify that all of the checksums are valid.
>
> We could possibly look into adding some set of additional checks for files which can't actually be zero-byte, perhaps.. I know we have some other one-off checks already.
>
> Thanks!
>
> Stephen

Actually, the problem is:

Master => Slave => Backup

In the master server everything is fine.

But at some point in time, the slave became corrupt (one of the base files is zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - the slave server should not even start up at all.

Since backups are taken from the slave server, all backups are also corrupt. I've detected the problem just because I've restored a backup (an excellent practice - nobody should take backups without testing them with the restore procedure).

In the slave server there is no indication that the database is corrupt (nothing in the logs; it starts normally and shows it is applying streamed changes regularly).

So that is the point: how to detect that a database is corrupt so the cluster doesn't even start...

Regards,

Edson
Stephen Frost wrote:
> It's entirely unclear to me what the problem being complained about in this thread actually is.

As Edson explained, a relfilenode in the standby server is zero bytes long when it is not that size in the primary server, and it corresponds to a persistent table. I don't have any satisfactory explanation for that.

> Given the thread subject, if someone actually wanted to do a database integrity check before startup, they could use pgBackRest to perform a backup with a CRC-enabled database and at least verify that all of the checksums are valid.

That's not a complete solution, because a zero-byte file does not contain any CRC. CRCs may detect some problems, but they will not detect this particular kind of corruption.

> We could possibly look into adding some set of additional checks for files which can't actually be zero-byte, perhaps.. I know we have some other one-off checks already.

Some possibilities, from the realm of haven't-had-coffee-yet-after-lunch:

* the length of the FSM fork can let you infer something about the length that the main fork ought to have. Maybe the VM fork too? not sure. (Easy to check: just some math on the size of the FSM/VM forks)

* the largest block number in any item pointer in any index of a table can tell you what's the latest page that should appear in the table. (Expensive: need to scan the indexes completely)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
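(A rough sketch of the first idea above, runnable on the standby itself: flag permanent tables whose main fork is empty on disk even though a free space map or visibility map fork exists for them. This is only a heuristic illustration, not code from the thread; pg_relation_size() per fork is the only building block assumed.)

SELECT c.oid::regclass AS relation,
       pg_relation_size(c.oid, 'main') AS main_bytes,
       pg_relation_size(c.oid, 'fsm')  AS fsm_bytes,
       pg_relation_size(c.oid, 'vm')   AS vm_bytes
FROM pg_class c
WHERE c.relkind = 'r'                      -- ordinary tables
  AND c.relpersistence = 'p'               -- permanent (logged) relations only
  AND pg_relation_size(c.oid, 'main') = 0  -- empty main fork ...
  AND (pg_relation_size(c.oid, 'fsm') > 0  -- ... but a non-empty FSM or VM fork
       OR pg_relation_size(c.oid, 'vm') > 0);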
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > It's entirely unclear to me what the problem being complained about in this thread actually is.
>
> As Edson explained, a relfilenode in the standby server is zero bytes long when it is not that size in the primary server, and it corresponds to a persistent table. I don't have any satisfactory explanation for that.

Somehow I missed the discussion of it being zero bytes for a long time after it's non-zero on the primary (I take that to mean that it's longer than whatever checkpoint timeout is set to..?). That does sound like there might be an actual issue/bug here.

> > Given the thread subject, if someone actually wanted to do a database integrity check before startup, they could use pgBackRest to perform a backup with a CRC-enabled database and at least verify that all of the checksums are valid.
>
> That's not a complete solution, because a zero-byte file does not contain any CRC. CRCs may detect some problems, but they will not detect this particular kind of corruption.

I agree that we can certainly have bugs which will cause corruption that's not detectable by our CRCs. Validating CRCs is just checking for one kind of corruption; using amcheck would provide another level, though it's unclear if that would help in this specific case if it's really just files on the standby (but not the primary) being zero-length long after they should be written out.

> > We could possibly look into adding some set of additional checks for files which can't actually be zero-byte, perhaps.. I know we have some other one-off checks already.
>
> Some possibilities, from the realm of haven't-had-coffee-yet-after-lunch:
>
> * the length of the FSM fork can let you infer something about the length that the main fork ought to have. Maybe the VM fork too? not sure. (Easy to check: just some math on the size of the FSM/VM forks)

I was just chatting with David over lunch specifically about checking that the number of forks was valid and looking at their lengths also makes sense, but the VM is only created/updated based on VACUUM runs, so checking that is a bit tricky. David has ideas about how to handle various races when it comes to forks (in particular by building a manifest once and then seeing if anything in the relation has changed since the manifest was built- if so, assume that WAL has the necessary info and if not then raise a flag) but it's all a bit tricky when the system is running. We might be able to do more checks if we're serious about supporting pre-startup checks though. We were also thinking about having a set of catalog tables that must not be zero and checking for those.

> * the largest block number in any item pointer in any index of a table can tell you what's the latest page that should appear in the table. (Expensive: need to scan the indexes completely)

This is something that I'd expect amcheck to handle; I'm not sure it makes sense to invent something else. David and I have chatted a little bit a while ago about trying to incorporate amcheck but it's rather complicated and not something we plan to do in the near-term. Still, it does seem like it'd be a nice capability to have. I wish it was possible to do off-line though. Of course, a user could do a restore and then run amcheck on the result themselves.

Thanks!

Stephen
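(For readers unfamiliar with amcheck: a minimal sketch of running its btree verification over every permanent btree index in the current database. This assumes PostgreSQL 10 or later with the contrib extension available; bt_index_check() takes only AccessShareLock, so it should also be runnable on a hot standby if the extension was created on the primary. It is offered only as an illustration of the kind of check being discussed.)

-- Must be created on the primary if the check is to run on a standby.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify basic btree invariants for every permanent btree index.
SELECT c.oid::regclass AS index_name, bt_index_check(c.oid::regclass)
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i'
  AND am.amname = 'btree'
  AND c.relpersistence = 'p';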
Edson,

* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> On 27/12/2017 15:02, Stephen Frost wrote:
> >* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> >>Stephen Frost wrote:
> >>>* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> >>>>Anyway, instead digging into rsync functionality (or bugs - I doubt, but who knows?), I do prefer to have a script I can run to check if there is obvious failures in standby servers.
> >>>As mentioned, zero-byte files can be perfectly valid. PostgreSQL does have page-level CRCs, if you initialized your database with them (which I would strongly recommend).
> >>Page-level checksums would not detect the problem being complained in this thread, however.
> >It's entirely unclear to me what the problem being complained about in this thread actually is. The complaint so far was about zero-byte files, but those are entirely valid, so that isn't a problem that anyone can solve..
> >
> >Given the thread subject, if someone actually wanted to do a database integrity check before startup, they could use pgBackRest to perform a backup with a CRC-enabled database and at least verify that all of the checksums are valid.
> >
> >We could possibly look into adding some set of additional checks for files which can't actually be zero-byte, perhaps.. I know we have some other one-off checks already.
>
> Actually, the problem is:
>
> Master => Slave => Backup
>
> In the master server everything is fine.

Alright..

> But at some point in time, slave became corrupt (one of the base files are zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - Slave server shall not even startup at all.

How do you know it should be 16Mb in size...? That sounds like you're describing a WAL file, but you should be archiving your WAL files during a backup, not just using whatever is in pg_xlog/pg_wal..

> Since backups are taken from slave server, all backups are also corrupt.

If you aren't following the appropriate process to perform a backup then, yes, you're going to end up with corrupt and useless/bad backups. Backing up from a replica has only been officially supported using the pg_start/stop_backup methods as of 9.6 and only when doing a non-exclusive backup. Note that the wiki page you're talking about (I think, anyway...) is describing *exclusive* backup, not non-exclusive, and the two are not the same.

> I've detected the problem just because I've restored a backup (excellent practice perhaps - nobody should take backups if not testing it with the restore procedure).

Yes, restoring a backup is excellent practice and something that everyone really should be doing. In my view, at least, everyone should also be using well tested backup tools instead of trying to write their own.

> In slave server there is no indication that the database is corrupt (not in logs, it starts normally and show it is applying stream changes regularly).

This sounds like what's happening is that you're ending up with PG thinking that a crash happened because backup_label is missing, which will happen if you do pg_start/stop_backup on the replica and don't make sure to take the results from pg_stop_backup and create the backup_label file before starting PostgreSQL after the restore.

This isn't something that would happen if you used a backup tool that knew how to perform a non-exclusive backup or how to properly back up using a replica, instead of trying to write your own without understanding how all these pieces play together.

> So that is the point: how to detect that a database is corrupt so cluster doesn't even start...

I'm not sure that the database is actually corrupt in this specific case- it sounds like everything is actually fine, but you didn't include the backup_label file when restoring and therefore PG thinks there is crash recovery happening when it should be replaying WAL from the start of the backup, but how is PG going to know that? Well, it'd know that from the backup_label file, if it was there.. Otherwise, it seems pretty difficult for us to know that we're not doing crash recovery. I'm certainly open to ideas on how to detect that, but nothing springs to mind off-hand.

Again, using a backup tool instead of trying to roll your own would make this much less likely to happen. This isn't something which should happen when you're using pg_basebackup or the other PG-specific backup tools; rather, it's only happening because you're trying to do your own with pg_start/stop_backup and rsync and didn't completely read the documentation on non-exclusive backups from replicas (that said, those docs could certainly use improvement...).

Of course, perhaps I'm misunderstanding exactly what you're doing or what file you're referring to, but this is my best guess based on the information you've provided so far.

Thanks!

Stephen
For context: this was first reported in the Barman forum here:
https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ
They are using Barman for the backups.

Stephen Frost wrote:
> > But at some point in time, slave became corrupt (one of the base files are zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - Slave server shall not even startup at all.
>
> How do you know it should be 16Mb in size...? That sounds like you're describing a WAL file, but you should be archiving your WAL files during a backup, not just using whatever is in pg_xlog/pg_wal..

It's not a WAL file -- it's a file backing a table.

> > Since backups are taken from slave server, all backups are also corrupt.
>
> If you aren't following the appropriate process to perform a backup then, yes, you're going to end up with corrupt and useless/bad backups.

A few guys went over the backup-taking protocol upthread already.

But anyway the backup tool is a moot point. The problem doesn't originate in the backup -- it originates in the standby, from where the backup is taken. The file can be seen as size 0 in the standby. Edson's question is: why wasn't the problem detected in the standby? It seems a valid question to me, to which we currently don't have any good answer.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> For context: this was first reported in the Barman forum here:
> https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ
> They are using Barman for the backups.

Ahhhh, I see. I wasn't aware of that history.

> Stephen Frost wrote:
> > > But at some point in time, slave became corrupt (one of the base files are zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - Slave server shall not even startup at all.
> >
> > How do you know it should be 16Mb in size...? That sounds like you're describing a WAL file, but you should be archiving your WAL files during a backup, not just using whatever is in pg_xlog/pg_wal..
>
> It's not a WAL file -- it's a file backing a table.

Interesting.

> > > Since backups are taken from slave server, all backups are also corrupt.
> >
> > If you aren't following the appropriate process to perform a backup then, yes, you're going to end up with corrupt and useless/bad backups.
>
> A few guys went over the backup-taking protocol upthread already.
>
> But anyway the backup tool is a moot point. The problem doesn't originate in the backup -- it originates in the standby, from where the backup is taken. The file can be seen as size 0 in the standby. Edson's question is: why wasn't the problem detected in the standby? It seems a valid question to me, to which we currently we don't have any good answer.

The last message on that thread seems pretty clear to me- the comment is "I think this is a failure in standby build." It's not clear what that failure was but I agree it doesn't appear related to the backup tool (the comment there is "I'm using rsync"), or, really, PostgreSQL at all (a failure during the build of the replica isn't something we're necessarily going to pick up on..).

As discussed on this thread, zero-byte files are entirely valid to appear in the PostgreSQL data directory.

To try and dig into what happened, I'd probably look at what forks there are of that relation, the entry in pg_class, and try to figure out how it is that replication isn't complaining when the file on the primary appeared to be modified well after the last modify timestamp on the replica. If it's possible to replicate this into a test environment, maybe even do a no-op update of a row of that table and see what happens with replication. One thing I wonder is if this table used to be unlogged and it was later turned into a logged table but something didn't quite happen correctly with that. I'd also suggest looking for other file size mismatches between the primary and the replica.

Thanks!

Stephen
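(A sketch of how the first part of that investigation could look from SQL, run on both the primary and the standby. 'some_table' is a placeholder for the affected relation; pg_stat_file() is superuser-only on these releases, and for relations over 1GB it only reports the first segment.)

-- The catalog entry for the relation (relfilenode names the file on disk).
SELECT relname, relfilenode, relpages, reltuples, relpersistence
FROM pg_class
WHERE oid = 'some_table'::regclass;

-- Path, on-disk size and last modification time of the main fork,
-- as the OS sees it (the path is relative to the data directory).
SELECT pg_relation_filepath('some_table') AS main_fork,
       (pg_stat_file(pg_relation_filepath('some_table'))).size AS size_bytes,
       (pg_stat_file(pg_relation_filepath('some_table'))).modification AS last_modified;

-- Per-fork sizes, which also shows which forks exist and are non-empty.
SELECT fork, pg_relation_size('some_table', fork) AS bytes
FROM unnest(ARRAY['main', 'fsm', 'vm', 'init']) AS fork;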
On 28/12/2017 10:16, Stephen Frost wrote:
> Alvaro,
>
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
>> For context: this was first reported in the Barman forum here: https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ
>> They are using Barman for the backups.
> Ahhhh, I see. I wasn't aware of that history.
>
>> Stephen Frost wrote:
>>>> But at some point in time, slave became corrupt (one of the base files are zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - Slave server shall not even startup at all.
>>> How do you know it should be 16Mb in size...? That sounds like you're describing a WAL file, but you should be archiving your WAL files during a backup, not just using whatever is in pg_xlog/pg_wal..
>> It's not a WAL file -- it's a file backing a table.
> Interesting.
>
>>>> Since backups are taken from slave server, all backups are also corrupt.
>>> If you aren't following the appropriate process to perform a backup then, yes, you're going to end up with corrupt and useless/bad backups.
>> A few guys went over the backup-taking protocol upthread already.
>>
>> But anyway the backup tool is a moot point. The problem doesn't originate in the backup -- it originates in the standby, from where the backup is taken. The file can be seen as size 0 in the standby. Edson's question is: why wasn't the problem detected in the standby? It seems a valid question to me, to which we currently we don't have any good answer.
> The last message on that thread seems pretty clear to me- the comment is "I think this is a failure in standby build." It's not clear what that failure was but I agree it doesn't appear related to the backup tool (the comment there is "I'm using rsync"), or, really, PostgreSQL at all (a failure during the build of the replica isn't something we're necessairly going to pick up on..).
>
> As discussed on this thread, zero-byte files are entirely valid to appear in the PostgreSQL data directory.
>
> To try and dig into what happened, I'd probably look at what forks there are of that relation, the entry in pg_class, and try to figure out how it is that replication isn't complaining when the file on the primary appeared to be modified well after the last modify timestamp on the replica. If it's possible to replica this into a test environment, maybe even do a no-op update of a row of that table and see what happens with replication. One thing I wonder is if this table used to be unlogged and it was later turned into a logged table but something didn't quite happen correctly with that. I'd also suggest looking for other file size mismatches between the primary and the replica.
>
> Thanks!
>
> Stephen

The table was never unlogged. From the very beginning, it was always logged. I have dozens of databases with exactly the same setup - and right now, I'm rebuilding the slave server. Instead of investigating something whose cause I probably will not find, I would like to have an alert for the future.

Would it be possible to include in future versions:

1) After start standby, standby run all WAL files until it is synchronized with master (current behavior)

3) Before getting into "accept read only queries", check if all base files have same size as master server (new behavior). In case something is different, throw an error and stop database startup?

4) Then start "accept read only queries" (current behavior)

???

Thanks,

Edson
Some thoughts....
A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
Something like these could be adapted to compare database content by filesystem checks rather than db queries. Following tablespaces, etc as well.
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
or other similar tools
Yes, there is some overhead, especially for large databases but it would be worth that to robustly ensure genuine and complete duplication.
I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) even this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
Brent Wood
From: Edson Carlos Ericksson Richter <richter@simkorp.com.br>
To: pgsql-general@lists.postgresql.org
Sent: Friday, December 29, 2017 6:47 AM
Subject: Re: Does PostgreSQL check database integrity at startup?
On 28/12/2017 10:16, Stephen Frost wrote:
> Alvaro,
>
> * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
>> For context: this was first reported in the Barman forum here:
>> https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ
>> They are using Barman for the backups.
> Ahhhh, I see. I wasn't aware of that history.
>
>> Stephen Frost wrote:
>>
>>>> But at some point in time, slave became corrupt (one of the base
>>>> files are zero size where it should be 16Mb in size), and IMHO a
>>>> "red alert" should arise - Slave server shall not even startup at
>>>> all.
>>> How do you know it should be 16Mb in size...? That sounds like you're
>>> describing a WAL file, but you should be archiving your WAL files during
>>> a backup, not just using whatever is in pg_xlog/pg_wal..
>> It's not a WAL file -- it's a file backing a table.
> Interesting.
>
>>>> Since backups are taken from slave server, all backups are also corrupt.
>>> If you aren't following the appropriate process to perform a backup
>>> then, yes, you're going to end up with corrupt and useless/bad backups.
>> A few guys went over the backup-taking protocol upthread already.
>>
>> But anyway the backup tool is a moot point. The problem doesn't
>> originate in the backup -- it originates in the standby, from where the
>> backup is taken. The file can be seen as size 0 in the standby.
>> Edson's question is: why wasn't the problem detected in the standby?
>> It seems a valid question to me, to which we currently we don't have any
>> good answer.
> The last message on that thread seems pretty clear to me- the comment is
> "I think this is a failure in standby build." It's not clear what that
> failure was but I agree it doesn't appear related to the backup tool
> (the comment there is "I'm using rsync"), or, really, PostgreSQL at all
> (a failure during the build of the replica isn't something we're
> necessairly going to pick up on..).
>
> As discussed on this thread, zero-byte files are entirely valid to
> appear in the PostgreSQL data directory.
>
> To try and dig into what happened, I'd probably look at what forks there
> are of that relation, the entry in pg_class, and try to figure out how
> it is that replication isn't complaining when the file on the primary
> appeared to be modified well after the last modify timestamp on the
> replica. If it's possible to replica this into a test environment,
> maybe even do a no-op update of a row of that table and see what happens
> with replication. One thing I wonder is if this table used to be
> unlogged and it was later turned into a logged table but something
> didn't quite happen correctly with that. I'd also suggest looking for
> other file size mismatches between the primary and the replica.
>
> Thanks!
>
> Stephen
The table was never unlogged. From very beginning, it was always logged.
I've dozens of databases with exactly same setup - and right now, I'm
rebuilding the slave server. Instead of investigating something probably
I will not find the cause, I would like to have a alert for the future.
Would be possible to include in future versions:
1) After start standby, standby run all WAL files until it is
synchronized with master (current behavior)
3) Before getting into "accept read only queries", check if all base
files have same size as master server (new behavior). In case something
is different, throw an error and stop database startup?
4) Then start "accept read only queries" (current behavior)
???
Thanks,
Edson
Greetings,

* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> Would be possible to include in future versions:
> 1) After start standby, standby run all WAL files until it is synchronized with master (current behavior)
> 3) Before getting into "accept read only queries", check if all base files have same size as master server (new behavior). In case something is different, throw an error and stop database startup?
> 4) Then start "accept read only queries" (current behavior)

I'm afraid it wouldn't be nearly that simple with a running system.

If you're that concerned about it, you could shut down the primary, verify that the replica has fully caught up to where the primary was, and then compare the file sizes on disk using a script yourself (excluding the temporary files and unlogged relations, etc, of course). I'm not completely sure that there isn't a valid reason for them to be different even with everything shut down and fully caught up, but that seems like the best chance of matching things up.

On a running system where lots of processes are potentially writing to different files across the system, getting a point-in-time snapshot of the file sizes across the entire system as of a certain WAL point (so you could replay to there and then compare on the replica) seems unlikely to be possible, at least with just PG. Perhaps with help from the OS/filesystem, you might be able to get such a snapshot.

Do you have some way of knowing that such a check would have actually caught this..? Isn't it possible that the file became zero'd out sometime after the replica started up and was running fine for a while? One of the questions I asked earlier drove at exactly this question- why weren't there any errors thrown during WAL replay on the replica, given that the relation clearly appeared to be modified after the replica was built? Another thing which could be done would be to look through the WAL for references to that file and see what the WAL included for it.

Also, really, if this is happening then there's something pretty wrong with some part of the process and that simply needs to be fixed- just throwing an error saying the replica isn't valid isn't really good for anyone. Where the issue is, I don't think we have any idea, because there's a lot of details missing here, which is why I was asking questions about the other data and information since it might help us figure out what happened. Perhaps it's an issue in the OS or filesystem or another layer (though there should really be logs if that's the case) or maybe it really is a PG issue, but if so, we need a lot more info to debug and address it.

Thanks!

Stephen
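(One possible sketch of the size comparison described above: run the same query on the primary and on the fully caught-up standby, save both outputs, and diff them. This only compares relation sizes via pg_relation_size(); it is not the stricter filesystem-level comparison with the primary shut down, and caveats about legitimate differences still apply.)

-- Run on both clusters, e.g. with psql -Atc, and diff the two result files.
SELECT c.oid::regclass AS relation,
       pg_relation_size(c.oid, 'main') AS main_bytes
FROM pg_class c
WHERE c.relkind IN ('r', 'i', 'm', 't')   -- tables, indexes, matviews, toast
  AND c.relpersistence = 'p'              -- skip temporary and unlogged relations
ORDER BY 1;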
Greetings Brent,

* Brent Wood (pcreso@yahoo.com) wrote:
> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.

PG has support for checksums and there are tools out there to validate that the checksum is correct for all pages which have one, but that wouldn't help in this case because the file is zero'd out (and a zero'd out file is actually a valid file in a PG data directory).

Also, the files on the primary and the replica actually can be different when looked at with a complete-file checksum due to hint bits being set differently (at least, possibly other ways too). That doesn't make them invalid or incorrect though.

Rather than trying to compare a running primary to a running replica, such a check to verify that the files backed up during a backup have the same checksum as the files being restored from that backup can be done, and that *is* done in at least some of the PG backup tools already (pgBackRest has an independent manifest that it stores for each backup which contains the checksum of each file as-backed-up, and it verifies that checksum when performing a restore to make sure that the backed up file wasn't corrupted in place; other tools hopefully have similar).

> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) even this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?

The order of records in a *file* should be the same in the heap on the primary as they are on the replica, but that doesn't mean the contents of those files will be exactly the same (as mentioned above, hint bits can differ). We used to have cases where the indexes could also be different, but I believe that was changed so they should match.

I've used the approach of doing a checksum across the results of an ordered query to compare between systems and that generally does work, but it's a bit tricky if you're trying to compare a table that's heavily modified- you need to determine the point in the WAL stream that you're at on the primary when you run the query and then replay the replica to that point in the WAL and then run the query on the replica, otherwise you could end up with differences that are just because of the ongoing transactions being run to update the table.

Thanks!

Stephen
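(For concreteness, the ordered-query checksum described above might look like the sketch below. 'some_table' and its primary key column 'id' are placeholders, and the caveat about comparing at the same WAL position still applies.)

-- Run on the primary, then on a replica that has replayed to the same
-- WAL position, and compare the two digests.
SELECT md5(string_agg(t::text, E'\n' ORDER BY t.id)) AS table_digest
FROM some_table AS t;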
On 28/12/2017 16:06, Brent Wood wrote:
> Some thoughts....
>
> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
>
> Something like these could be adapted to compare database content by filesystem checks rather than db queries. Following tablespaces, etc as well.
>
> http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
> or other similar tools
>
> Yes, there is some overhead, especially for large databases but it would be worth that to robustly ensure genuine and complete duplication.
>
> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) even this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
>
> Brent Wood
>
> ------------------------------------------------------------------------
> *From:* Edson Carlos Ericksson Richter <richter@simkorp.com.br>
> *To:* pgsql-general@lists.postgresql.org
> *Sent:* Friday, December 29, 2017 6:47 AM
> *Subject:* Re: Does PostgreSQL check database integrity at startup?
>
> On 28/12/2017 10:16, Stephen Frost wrote:
> > Alvaro,
> >
> > * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> >> For context: this was first reported in the Barman forum here: https://groups.google.com/forum/#!msg/pgbarman/3aXWpaKWRFI/weUIZxspDAAJ
> >> They are using Barman for the backups.
> > Ahhhh, I see. I wasn't aware of that history.
> >
> >> Stephen Frost wrote:
> >>>> But at some point in time, slave became corrupt (one of the base files are zero size where it should be 16Mb in size), and IMHO a "red alert" should arise - Slave server shall not even startup at all.
> >>> How do you know it should be 16Mb in size...? That sounds like you're describing a WAL file, but you should be archiving your WAL files during a backup, not just using whatever is in pg_xlog/pg_wal..
> >> It's not a WAL file -- it's a file backing a table.
> > Interesting.
> >
> >>>> Since backups are taken from slave server, all backups are also corrupt.
> >>> If you aren't following the appropriate process to perform a backup then, yes, you're going to end up with corrupt and useless/bad backups.
> >> A few guys went over the backup-taking protocol upthread already.
> >>
> >> But anyway the backup tool is a moot point. The problem doesn't originate in the backup -- it originates in the standby, from where the backup is taken. The file can be seen as size 0 in the standby. Edson's question is: why wasn't the problem detected in the standby? It seems a valid question to me, to which we currently we don't have any good answer.
> > The last message on that thread seems pretty clear to me- the comment is "I think this is a failure in standby build." It's not clear what that failure was but I agree it doesn't appear related to the backup tool (the comment there is "I'm using rsync"), or, really, PostgreSQL at all (a failure during the build of the replica isn't something we're necessairly going to pick up on..).
> >
> > As discussed on this thread, zero-byte files are entirely valid to appear in the PostgreSQL data directory.
> >
> > To try and dig into what happened, I'd probably look at what forks there are of that relation, the entry in pg_class, and try to figure out how it is that replication isn't complaining when the file on the primary appeared to be modified well after the last modify timestamp on the replica. If it's possible to replica this into a test environment, maybe even do a no-op update of a row of that table and see what happens with replication. One thing I wonder is if this table used to be unlogged and it was later turned into a logged table but something didn't quite happen correctly with that. I'd also suggest looking for other file size mismatches between the primary and the replica.
> >
> > Thanks!
> >
> > Stephen
> The table was never unlogged. From very beginning, it was always logged. I've dozens of databases with exactly same setup - and right now, I'm rebuilding the slave server. Instead of investigating something probably I will not find the cause, I would like to have a alert for the future.
>
> Would be possible to include in future versions:
> 1) After start standby, standby run all WAL files until it is synchronized with master (current behavior)
> 3) Before getting into "accept read only queries", check if all base files have same size as master server (new behavior). In case something is different, throw an error and stop database startup?
> 4) Then start "accept read only queries" (current behavior)
> ???
>
> Thanks,
>
> Edson

Thanks, Brent. I'll think about it.

Regards,

Edson
On 28/12/2017 16:26, Stephen Frost wrote:
> Greetings Brent,
>
> * Brent Wood (pcreso@yahoo.com) wrote:
>> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
> PG has support for checksums and there are tools out there to validate that the checksum is correct for all pages which have one, but that wouldn't help in this case because the file is zero'd out (and a zero'd out file is actually a valid file in a PG data directory).
>
> Also, the files on the primary and the replica actually can be different when looked at with a complete-file checksum due to hint bits being set differently (at least, possibly other ways too). That doesn't make them invalid or incorrect though.
>
> Rather than trying to compare a running primary to a running replica, such a check to verify that the files backed up during a backup have the same checksum as the files being restored from that backup can be done, and that *is* done in at least some of the PG backup tools already (pgBackRest has an independent manifest that it stores for each backup which contains the checksum of each file as-backed-up, and it verifies that checksum when performing a restore to make sure that the backed up file wasn't corrupted in place, other tools hopefully have similar).
>
>> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) even this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
> The order of records in a *file* should be the same in the heap on the primary as they are on the replica, but that doesn't mean the contents of those files will be exactly the same (as mentioned above, hint bits can differ). We used to have cases where the indexes could also be different, but I believe that was changed so they should match.
>
> I've used the approach of doing a checksum across the results of an ordered query to compare between systems and that generally does work, but it's a bit tricky if you're trying to compare a table that's heavily modified- you need to determine the point in the WAL stream that you're at on the primary when you run the query and then replay the replica to that point in the WAL and then run the query on the replica, otherwise you could end up with differences that are just because of the ongoing transactions being run to update the table.
>
> Thanks!
>
> Stephen

Stephen, thanks for your detailed reasoning on why this would not be so simple. I'll take all of that into consideration.

I hope I find something that gives me a bit more confidence that my replicas are reliable enough. At this moment, I just don't trust them.

Regards,

Edson
On Thu, Dec 28, 2017 at 1:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings Brent,
* Brent Wood (pcreso@yahoo.com) wrote:
> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
PG has support for checksums and there are tools out there to validate
that the checksum is correct for all pages which have one, but that
wouldn't help in this case because the file is zero'd out (and a zero'd
out file is actually a valid file in a PG data directory).
Also, the files on the primary and the replica actually can be different
when looked at with a complete-file checksum due to hint bits being set
differently (at least, possibly other ways too). That doesn't make them
invalid or incorrect though.
In addition to what Stephen and everyone else said, it is impossible to get a valid snapshot of the whole "file" on a running server without locking the relation and reading it through the PostgreSQL buffer cache. On data files such as heap and index, PostgreSQL does extensive write caching. Preventing data loss from write caching is a primary purpose of WAL. Write caching in the application (PostgreSQL in this case) prevents the OS from actually knowing the correct "logical" state of the file at any given point in time. This means that even a LVM snapshot will not give you consistent data files of a running server, because the not yet written changes (in shared buffers) waiting for a checkpoint to force them into OS buffers won't be visible from outside PostgreSQL.
Regards, Jan
such a check to verify that the files backed up during a backup have the
same checksum as the files being restored from that backup can be done,
and that *is* done in at least some of the PG backup tools already
(pgBackRest has an independent manifest that it stores for each backup
which contains the checksum of each file as-backed-up, and it verifies
that checksum when performing a restore to make sure that the backed up
file wasn't corrupted in place, other tools hopefully have similar).
> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) event this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
The order of records in a *file* should be the same in the heap on the
primary as they are on the replica, but that doesn't mean the contents
of those files will be exactly the same (as mentioned above, hint bits
can differ). We used to have cases where the indexes could also be
different, but I believe that was changed so they should match.
I've used the approach of doing a checksum across the results of an
ordered query to compare between systems and that generally does work,
but it's a bit tricky if you're trying to compare a table that's heavily
modified- you need to determine the point in the WAL stream that you're
at on the primary when you run the query and then replay the replica to
that point in the WAL and then run the query on the replica, otherwise
you could end up with differences that are just because of the ongoing
transactions being run to update the table.
Thanks!
Stephen
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
On 29/12/2017 22:14, Jan Wieck wrote:
> On Thu, Dec 28, 2017 at 1:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
>     Greetings Brent,
>
>     * Brent Wood (pcreso@yahoo.com) wrote:
>     > A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
>
>     PG has support for checksums and there are tools out there to validate that the checksum is correct for all pages which have one, but that wouldn't help in this case because the file is zero'd out (and a zero'd out file is actually a valid file in a PG data directory).
>
>     Also, the files on the primary and the replica actually can be different when looked at with a complete-file checksum due to hint bits being set differently (at least, possibly other ways too). That doesn't make them invalid or incorrect though.
>
> In addition to what Stephen and everyone else said, it is impossible to get a valid snapshot of the whole "file" on a running server without locking the relation and reading it through the PostgreSQL buffer cache. On data files such as heap and index, PostgreSQL does extensive write caching. Preventing data loss from write caching is a primary purpose of WAL. Write caching in the application (PostgreSQL in this case) prevents the OS from actually knowing the correct "logical" state of the file at any given point in time. This means that even a LVM snapshot will not give you consistent data files of a running server, because the not yet written changes (in shared buffers) waiting for a checkpoint to force them into OS buffers won't be visible from outside PostgreSQL.
>
> Regards, Jan
>
>     such a check to verify that the files backed up during a backup have the same checksum as the files being restored from that backup can be done, and that *is* done in at least some of the PG backup tools already (pgBackRest has an independent manifest that it stores for each backup which contains the checksum of each file as-backed-up, and it verifies that checksum when performing a restore to make sure that the backed up file wasn't corrupted in place, other tools hopefully have similar).
>
>     > I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) even this may be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
>
>     The order of records in a *file* should be the same in the heap on the primary as they are on the replica, but that doesn't mean the contents of those files will be exactly the same (as mentioned above, hint bits can differ). We used to have cases where the indexes could also be different, but I believe that was changed so they should match.
>
>     I've used the approach of doing a checksum across the results of an ordered query to compare between systems and that generally does work, but it's a bit tricky if you're trying to compare a table that's heavily modified- you need to determine the point in the WAL stream that you're at on the primary when you run the query and then replay the replica to that point in the WAL and then run the query on the replica, otherwise you could end up with differences that are just because of the ongoing transactions being run to update the table.
>
>     Thanks!
>
>     Stephen
>
> --
> Jan Wieck
> Senior Postgres Architect
> http://pgblog.wi3ck.info

One last question:

There should be a "catalog" that points to where tables are stored in physical files (I think, at least, because at some point PostgreSQL needs to know where to read the data from).

Based on information from this catalog, could I have a tool (perhaps a C function) that checks that the data is really there?

Thanks,

Edson
Jan, all,

* Jan Wieck (jan@wi3ck.info) wrote:
> On Thu, Dec 28, 2017 at 1:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Brent Wood (pcreso@yahoo.com) wrote:
> > > A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplication than simply checking file size - like differently vacuumed tables in each copy could have the same content but be different file sizes.
> >
> > PG has support for checksums and there are tools out there to validate that the checksum is correct for all pages which have one, but that wouldn't help in this case because the file is zero'd out (and a zero'd out file is actually a valid file in a PG data directory).
> >
> > Also, the files on the primary and the replica actually can be different when looked at with a complete-file checksum due to hint bits being set differently (at least, possibly other ways too). That doesn't make them invalid or incorrect though.
>
> In addition to what Stephen and everyone else said, it is impossible to get a valid snapshot of the whole "file" on a running server without locking the relation and reading it through the PostgreSQL buffer cache. On data files such as heap and index, PostgreSQL does extensive write caching. Preventing data loss from write caching is a primary purpose of WAL. Write caching in the application (PostgreSQL in this case) prevents the OS from actually knowing the correct "logical" state of the file at any given point in time. This means that even a LVM snapshot will not give you consistent data files of a running server, because the not yet written changes (in shared buffers) waiting for a checkpoint to force them into OS buffers won't be visible from outside PostgreSQL.

Indeed, and just to follow that through- that's what WAL exists to correct in the event of a crash. Anything cache'd in PostgreSQL but actually committed will have already been written into the WAL and sync'd to disk, so when PostgreSQL restarts it'll realize that a crash had happened and go back to the last checkpoint in the WAL stream and replay the WAL from there to get back to consistency.

This is also how online backups of PostgreSQL happen, but PG can't then start the replay from the *latest* checkpoint, it has to start from the checkpoint where the backup started, and it can't just finish at any point in the WAL stream- it must reach consistency (where the backup finished). The starting location is provided by PostgreSQL when doing a non-exclusive backup to the process issuing the 'pg_stop_backup()' call in its result and must then be put into a file called 'backup_label' when PG is restored from that backup, so that PG knows that a backup was done and where to start WAL replay from.

The stopping location is written into the WAL stream when doing a backup from a primary, but if you're doing a backup from a replica then we use the latest checkpoint written into pg_control as the stopping point- but this means that pg_control *must* be copied last when doing a backup from a standby. Sadly, we don't seem to have gotten around to actually documenting this fact as yet, which is really a bug that we should fix and back-patch because it's absolutely critical.

The various backup tools written by folks who have delved into all of this deeply handle these requirements, thankfully, but it just goes to show how difficult this all is to get right. :/

Thanks!

Stephen
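(To make the backup_label mechanics above concrete, a minimal sketch of a non-exclusive base backup driven from SQL on PostgreSQL 9.6 or later; 'nightly' is just a placeholder label, and the actual file copy is done with whatever tool you use.)

-- All of this must happen in one session that stays connected for the
-- duration of the copy (that is what makes the backup non-exclusive).
SELECT pg_start_backup('nightly', false, false);

-- ... copy the data directory with your tool of choice; when backing up
--     from a standby, copy pg_control last, per the note above ...

-- Returns the stop LSN plus the backup_label and tablespace_map contents.
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);

-- Write the returned labelfile text into a file named backup_label (and
-- spcmapfile into tablespace_map, if it is not empty) at the top of the
-- restored data directory *before* starting PostgreSQL from that backup.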
Greetings,

* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> There should be a "catalog" that point where tables are stored in physical files (I think, at least, because at some point PostgreSQL need to know from where to read the data).

Yes, it's pg_class. Specifically, the relfilenode.

> Based on information from this catalog, would I have a tool (perhaps, a C function) that check that data is really there?

You could write such a function, but it wouldn't be able to be general purpose as a zero-byte file is, in fact, a valid file. You could just as easily do a 'select 1 from table limit 1;' and make sure that you get back a successful, single-row, result, if you wish to verify that certain tables in your database always have at least 1 row.

Thanks!

Stephen
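(One possible heuristic along those lines, offered only as a sketch: list permanent tables whose main fork is empty on disk even though the catalog last saw them with pages. relpages is merely an estimate maintained by VACUUM/ANALYZE, so a hit here is a reason to investigate, not proof of corruption.)

SELECT c.oid::regclass AS relation,
       c.relpages,
       c.reltuples
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relpersistence = 'p'
  AND c.relpages > 0                        -- catalog thinks there were pages
  AND pg_relation_size(c.oid, 'main') = 0;  -- but the main fork is empty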
>There should be a "catalog" that point where tables are stored in physical files...
Here is the query that gives you that information.

SELECT c.oid,
n.nspname as schema,
c.relname as table,
(SELECT oid FROM pg_database WHERE datname = current_database() ) as db_dir,
c.relfilenode as filename
FROM pg_class c
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE relname NOT LIKE 'pg_%'
AND relname NOT LIKE 'information%'
AND relname NOT LIKE 'sql_%'
AND relkind = 'r'
ORDER BY 2, relname;
On Fri, Dec 29, 2017 at 8:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> There should be a "catalog" that point where tables are stored in physical
> files (I think, at least, because at some point PostgreSQL need to know from
> where to read the data).
Yes, it's pg_class. Specifically, the relfilenode.
> Based on information from this catalog, would I have a tool (perhaps, a C
> function) that check that data is really there?
You could write such a function, but it wouldn't be able to be general
purpose as a zero-byte file is, in fact, a valid file. You could just
as easily do a 'select 1 from table limit 1;' and make sure that you get
back a successful, single-row, result, if you wish to verify that certain
tables in your database always have at least 1 row.
Thanks!
Stephen
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Greetings,

* Melvin Davidson (melvin6925@gmail.com) wrote:
> >There should be a "catalog" that point where tables are stored in physical files...
>
> Here is the query that gives you that information.
>
> SELECT c.oid,
>        n.nspname as schema,
>        c.relname as table,
>        (SELECT oid FROM pg_database WHERE datname = current_database() ) as db_dir,
>        c.relfilenode as filename
>   FROM pg_class c
>   JOIN pg_namespace n ON (n.oid = c.relnamespace)
>  WHERE relname NOT LIKE 'pg_%'
>    AND relname NOT LIKE 'information%'
>    AND relname NOT LIKE 'sql_%'
>    AND relkind = 'r'
> ORDER BY 2, relname;

This isn't a terribly good query- it's entirely valid to have 'pg_blah' and 'informationblah' tables in user schemas. If you'd like to filter out the catalogs/internal schemas, filter based on schema name instead.

Also, this doesn't actually provide what Edson is asking for. Edson's asking for a query that uses pg_ls_dir() or some custom function which will run 'stat' on each file and return the size, according to the OS.

Thanks!

Stephen
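(Putting those two corrections together, a sketch of what such a query might look like. It is not an official recipe from the thread: pg_stat_file() is superuser-only on older releases, it only reports the first 1GB segment of a large relation, and the schema filter shown is just one reasonable choice.)

SELECT n.nspname AS schema,
       c.relname AS table,
       pg_relation_filepath(c.oid) AS main_fork_path,
       (pg_stat_file(pg_relation_filepath(c.oid))).size AS bytes_on_disk
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')  -- filter by schema,
  AND n.nspname NOT LIKE 'pg_temp%'                          -- not by relname
  AND n.nspname NOT LIKE 'pg_toast%'
ORDER BY 1, 2;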
On Fri, Dec 29, 2017 at 9:07 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Melvin Davidson (melvin6925@gmail.com) wrote:
> >There should be a "catalog" that point where tables are stored in physical
> files...
>
> Here is the query that gives you that information.
>
> SELECT c.oid,
> n.nspname as schema,
> c.relname as table,
> (SELECT oid FROM pg_database WHERE datname =
> current_database() ) as db_dir,
> c.relfilenode as filename
> FROM pg_class c
> JOIN pg_namespace n ON (n.oid = c.relnamespace)
> WHERE relname NOT LIKE 'pg_%'
> AND relname NOT LIKE 'information%'
> AND relname NOT LIKE 'sql_%'
> AND relkind = 'r'
> ORDER BY 2, relname;
This isn't a terribly good query- it's entirely valid to have 'pg_blah'
and 'informationblah' tables in user schemas. If you'd like to filter
out the catalogs/internal schemas, filter based on schema name instead.
Also, this doesn't actually provide what Edson is asking for. Edson's
asking for a query that uses pg_ls_dir() or some custom function which
will run 'stat' on each file and return the size, according to the OS.
Thanks!
Stephen
Edson's original request was for a query that shows the FILENAMEs for the table.
As for "qood" query, that is entirely an opinion. The query WILL show all files associated
with ALL tables. You are free to edit and reconstruct as you choose.
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Melvin, * Melvin Davidson (melvin6925@gmail.com) wrote: > On Fri, Dec 29, 2017 at 9:07 PM, Stephen Frost <sfrost@snowman.net> wrote: > > * Melvin Davidson (melvin6925@gmail.com) wrote: > > > >There should be a "catalog" that point where tables are stored in > > physical > > > files... > > > > > > Here is the query that gives you that information. > > > > > > SELECT c.oid, > > > n.nspname as schema, > > > c.relname as table, > > > (SELECT oid FROM pg_database WHERE datname = > > > current_database() ) as db_dir, > > > c.relfilenode as filename > > > FROM pg_class c > > > JOIN pg_namespace n ON (n.oid = c.relnamespace) > > > WHERE relname NOT LIKE 'pg_%' > > > AND relname NOT LIKE 'information%' > > > AND relname NOT LIKE 'sql_%' > > > AND relkind = 'r' > > > ORDER BY 2, relname; > > > > This isn't a terribly good query- it's entirely valid to have 'pg_blah' > > and 'informationblah' tables in user schemas. If you'd like to filter > > out the catalogs/internal schemas, filter based on schema name instead. > > > > Also, this doesn't actually provide what Edson is asking for. Edson's > > asking for a query that uses pg_ls_dir() or some custom function which > > will run 'stat' on each file and return the size, according to the OS. > > *Edson's original request was for a query that shows the FILENAMEs for the > table.* As quoted previously, his request included: --- > Based on information from this catalog, would I have a tool (perhaps, a C > function) that check that data is really there? --- Which is asking about having a function to 'stat' the files and check their length ('data is really there'). > *As for "good" query, that is entirely an opinion. The query WILL show all > files associated* > *with ALL tables. You are free to edit and reconstruct as you choose. * No, it won't, it'll filter out tables which exist in user schemas that happen to start with one of the strings that the query includes ('pg_', 'information', and 'sql_'). I encourage you to test it- create a table in the public schema called 'pg_whatever' and see if your query picks it up or not. This isn't a minor complaint about style, the query is outright wrong. Thanks! Stephen
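The test suggested above is easy to reproduce; a quick sketch (the table name is only for illustration):
CREATE TABLE public.pg_whatever (id int);
-- the earlier query filters on relname NOT LIKE 'pg_%', so this perfectly
-- ordinary user table would be silently missing from its output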
>the query is outright wrong.
Really? I submit a query to help and all you can do is criticize? If you don't like the query, write a better one yourself, but
kindly do not try to degrade the efforts of others.
On Fri, Dec 29, 2017 at 9:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
Melvin,
* Melvin Davidson (melvin6925@gmail.com) wrote:
> On Fri, Dec 29, 2017 at 9:07 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Melvin Davidson (melvin6925@gmail.com) wrote:
> > > >There should be a "catalog" that point where tables are stored in
> > physical
> > > files...
> > >
> > > Here is the query that gives you that information.
> > >
> > > SELECT c.oid,
> > > n.nspname as schema,
> > > c.relname as table,
> > > (SELECT oid FROM pg_database WHERE datname =
> > > current_database() ) as db_dir,
> > > c.relfilenode as filename
> > > FROM pg_class c
> > > JOIN pg_namespace n ON (n.oid = c.relnamespace)
> > > WHERE relname NOT LIKE 'pg_%'
> > > AND relname NOT LIKE 'information%'
> > > AND relname NOT LIKE 'sql_%'
> > > AND relkind = 'r'
> > > ORDER BY 2, relname;
> >
> > This isn't a terribly good query- it's entirely valid to have 'pg_blah'
> > and 'informationblah' tables in user schemas. If you'd like to filter
> > out the catalogs/internal schemas, filter based on schema name instead.
> >
> > Also, this doesn't actually provide what Edson is asking for. Edson's
> > asking for a query that uses pg_ls_dir() or some custom function which
> > will run 'stat' on each file and return the size, according to the OS.
>
> *Edson's original request was for a query that shows the FILENAMEs for the
> table.*
As quoted previously, his request included:
---
> Based on information from this catalog, would I have a tool (perhaps, a C
> function) that check that data is really there?
---
Which is asking about having a function to 'stat' the files and check
their length ('data is really there').
> *As for "qood" query, that is entirely an opinion. The query WILL show all
> files associated*
> *with ALL tables. You are free to edit and reconstruct as you choose. *
No, it won't, it'll filter out tables which exist in user schemas that
happen to start with one of the strings that the query includes ('pg_',
'information', and 'sql_').
I encourage you to test it- create a table in the public schema called
'pg_whatever' and see if your query picks it up or not. This isn't a
minor complaint about style, the query is outright wrong.
Thanks!
Stephen
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
> On Dec 29, 2017, at 6:14 PM, Melvin Davidson <melvin6925@gmail.com> wrote: > > Edson's original request was for a query that shows the FILENAMEs for the table. Which the query you provide does not do. > As for "qood" query, that is entirely an opinion. But a well informed one. Your query may work well enough for you, but when you're suggesting others rely on it you should expect more informed users to point out that it has some critical flaws - especially ones that might not be obvious to new users - lest others naively rely on it. > The query WILL show all files associated > with ALL tables. No, it won't. You're filtering out a bunch of things via the name of the table. That doesn't do the right thing. You're heading in the right direction, but the query you gave gets some things badly wrong. Listen to the criticism from others and you'll improve your knowledge and skills. There's absolutely nothing wrong with posting information that's not quite right, nor with getting feedback from others on what's wrong with it. Copping an attitude in response to that feedback is where things go downhill. Cheers, Steve
Melvin, * Melvin Davidson (melvin6925@gmail.com) wrote: > >the query is outright wrong. > Really? I submit a query to help and all you can do is criticize? I explained exactly what was wrong with it, and how to write a proper query to answer this question, with the thought that you'd appreciate the help and would create a proper query. I'm a bit mystified as to why you seem to be unhappy with this. > Yes it is legal to create a table that starts with pg_, but any PostgreSQL > developer should > know that is not a good idea. That's really not an excuse for providing an incorrect query when you could simply adjust the query to properly filter on nspname instead of relname, further, while perhaps 'pg_' isn't a good prefix to use in a PG database (note that other databases, of course, wouldn't have any issue using such a prefix), using a table name starting with 'information' or 'sql_' isn't nearly as questionable, yet your query also filtered those out. > If you don't like the query, write a better one yourself, but > kindly do not try to degrade the efforts of others. No, pointing out that a query is wrong and doesn't accurately answer the question is something which should be done, just as someone should call out a bug in code that's been submitted for inclusion in PostgreSQL. I've tried to be kind and point out exactly what's wrong, along with encouraging words of how to adjust the query to be correct. Having misleading and incorrect information in our list archives isn't being helpful- those archives are preserved for more-or-less forever, end up being returned in google search results, and have caused confusion and even data loss in multiple cases that I'm personally aware of (thank you to pg_resetxlog...). We should all try to work to minimize that, and to correct it when necessary as best we can through responses. Thanks! Stephen
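A minimal rework of the earlier query along the lines suggested above might look like this (a sketch only, filtering on schema name rather than table name):
SELECT c.oid,
       n.nspname as schema,
       c.relname as table,
       (SELECT oid FROM pg_database WHERE datname = current_database()) as db_dir,
       c.relfilenode as filename
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE n.nspname !~ '^pg_'
   AND n.nspname <> 'information_schema'
   AND c.relkind = 'r'
 ORDER BY 2, c.relname;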
This whole discussion left out the critical point of
how on earth did that file end up having zero bytes?
There is the possibility that this is due to a bug in PostgreSQL. But over the 20+ years I've been using and hacking it, that has never happened.
OP never told us what type of OS, hardware and specifically storage subsystem that "slave" was running on. Filesystems zapping files to zero after a catastrophic IO failure (no longer throwing them into lost+found) isn't unheard of.
Regards, Jan
On Fri, Dec 29, 2017 at 8:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Edson Carlos Ericksson Richter (richter@simkorp.com.br) wrote:
> There should be a "catalog" that point where tables are stored in physical
> files (I think, at least, because at some point PostgreSQL need to know from
> where to read the data).
Yes, it's pg_class. Specifically, the relfilenode.
> Based on information from this catalog, would I have a tool (perhaps, a C
> function) that check that data is really there?
You could write such a function, but it wouldn't be able to be general
purpose as a zero-byte file is, in fact, a valid file. You could just
as easily do a 'select 1 from table limit 1;' and make sure that you get
back a successful, single-row, result, if you wish to verify that certain
tables in your database always have at least 1 row.
Thanks!
Stephen
Jan Wieck
Senior Postgres Architect
> On 30 Dec 2017, at 6:09, Jan Wieck <jan@wi3ck.info> wrote: > > This whole discussion left out the critical point of > > how on earth did that file end up having zero bytes? > > There is the possibility that this is due to a bug in PostgreSQL. But over the 20+ years I've been using and hacking it, that has never happened. > > OP never told us what type of OS, hardware and specifically storage subsystem that "slave" was running on. Filesystems zapping files to zero after a catastrophic IO failure (no longer throwing them into lost+found) isn't unheard of. If this is indeed a case of a corrupt database, then it's possible that we're looking at a system that claims to have the disk cache flushed when it hasn't yet done that. It gets even more fun if there are multiple table-spaces in use with different behaviour in "whatever is causing this" (fsync, file system zappery, etc.); the WAL log can get out of synch with the table files in that case. The WAL would reflect a situation where more has been flushed to the respective files than is actually the case - data got lost. But of course it's also entirely possible that these empty files are not in fact a problem, as suggested up-thread. No reason to panic - yet. In fact, I don't see it mentioned explicitly anywhere, but are we actually looking at a problem? The standby server started up just fine, from what I'm reading. Is there any evidence of corruption? So far, the database has been labeled corrupt by Edson because a base file was 0 bytes, but that's apparently not a definitive indication of corruption. What would be definitive is if querying certain relations causes errors or does give results that were never in that database. I do think that those 0 byte files are a good starting point, by querying the relations that those belong to (if any). Otherwise we may be searching for a needle in a haystack that doesn't necessarily contain one. I do realise that not finding any evidence of corruption does not necessarily mean there is none. I'm not convinced by the argument that a CRC check of a 0 byte file on a standby would not detect corruption. At the least, the CRC would be different or we would be left with a CRC on the master that we can't match any CRC's on the slave to if the file is larger on the master. If CRC's can be relied on to detect corruption (which they were designed to do), then that answers Edson's question. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
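For the "query the relations those files belong to" step suggested above, one hedged sketch (the filenode number here is hypothetical; 0 means the default tablespace, and the query has to be run in the database the file lives in):
SELECT pg_filenode_relation(0, 16384);
-- returns the relation as regclass, or NULL if no relation currently uses that filenode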
Greetings Alban, * Alban Hertroys (haramrae@gmail.com) wrote: > In fact, I don't see it mentioned explicitly anywhere, but are we actually looking at a problem? From the discussion in the google barman group, it's clear that the file shouldn't be zero bytes in this specific case. > I'm not convinced by the argument that a CRC check of a 0 byte file on a standby would not detect corruption. At the least, the CRC would be different or we would be left with a CRC on the master that we can't match any CRC's on the slave to if the file is larger on the master. As discussed, files on the primary can be different at a byte level from those on replicas and still be perfectly valid and correct, for a variety of reasons from hint bit differences to differences due to the replica not being at exactly the same point as the primary. > If CRC's can be relied on to detect corruption (which they were designed to do), then that answers Edson's question. The checksums included in PG are page-level and therefore there simply isn't one to look at if the file is zero bytes. Thanks! Stephen
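For completeness, a quick way to confirm whether a cluster was initialized with page-level checksums at all (data_checksums is a read-only setting, so this is safe to run anywhere):
SHOW data_checksums;
-- 'on' only if the cluster was created with initdb --data-checksums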
On Sat, Dec 30, 2017 at 10:27 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings Alban,
* Alban Hertroys (haramrae@gmail.com) wrote:
> In fact, I don't see it mentioned explicitly anywhere, but are we actually looking at a problem?
From the discussion in the google barman group, it's clear that the file
shouldn't be zero bytes in this specific case.
> I'm not convinced by the argument that a CRC check of a 0 byte file on a standby would not detect corruption. At the least, the CRC would be different or we would be left with a CRC on the master that we can't match any CRC's on the slave to if the file is larger on the master.
As discussed, files on the primary can be different at a byte level from
those on replicas and still be perfectly valid and correct, for a
variety of reasons from hint bit differences to differences due to the
replica not being at exactly the same point as the primary.
> If CRC's can be relied on to detect corruption (which they were designed to do), then that answers Edson's question.
The checksums included in PG are page-level and therefore there simply
isn't one to look at if the file is zero bytes.
Thanks!
Stephen
Steve Atkins and Stephen Frost,
You both remind me of gawkers at a disaster, pointing out "mistakes" that
first responders are making, but doing nothing yourselves to help.
My query works as designed and has done so for two years. It shows the
filenames for schemas and tables in the database. If you don't like it,
TOO BAD! Shame on you! Use your brains and write your own query.
Nuff said, I shall reply no more.
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Greetings, * Melvin Davidson (melvin6925@gmail.com) wrote: > My query works as designed and has done so > for two years. It shows the filenames for schemas and tables in the > database. I'm glad to hear that it works in your specific use-case. Unfortunately, it doesn't work in the general case and therefore isn't a good example. A proper query to return the filename for each user table in the current database is: SELECT quote_ident(nsp.nspname) || '.' || quote_ident(c.relname), s.setting || '/base/' || db.oid || '/' || c.relfilenode FROM pg_settings s JOIN pg_database db on (s.name = 'data_directory') JOIN pg_class c on (datname = current_database()) JOIN pg_namespace nsp on (c.relnamespace = nsp.oid) WHERE relfilenode <> 0 AND nsp.nspname !~ '^pg_' AND nsp.nspname <> 'information_schema'; Note that, as discussed earlier in this thread, this doesn't actually answer what Edson was asking for. Here's the query that would answer his original request: SELECT quote_ident(nsp.nspname) || '.' || quote_ident(c.relname), s.setting || '/base/' || db.oid || '/' || c.relfilenode, (pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size as size FROM pg_settings s JOIN pg_database db on (s.name = 'data_directory') JOIN pg_class c on (datname = current_database()) JOIN pg_namespace nsp on (c.relnamespace = nsp.oid) WHERE relfilenode <> 0 AND nsp.nspname !~ '^pg_' AND nsp.nspname <> 'information_schema'; Technically speaking, while these queries are correct for PG10, in prior versions of PostgreSQL it's possible to have user schemas that begin with 'pg_' and therefore the filtering in the WHERE clause would have to be more specific. Note that both of these need to be run as a superuser in older versions of PG. In PostgreSQL 10, a user could be GRANT'd 'pg_read_all_settings' and be able to run the first query. We don't currently support being able to GRANT a non-superuser the ability to run pg_stat_file(), but that will likely be coming in PG 11. Thanks! Stephen
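As a small illustration of the grant mentioned above for PostgreSQL 10 (the role name is hypothetical):
GRANT pg_read_all_settings TO some_monitoring_role;
-- members of pg_read_all_settings can see data_directory and thus run the first query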
On Saturday, 30 December 2017 at 23:06:52, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Melvin Davidson (melvin6925@gmail.com) wrote:
> My query works as designed and has done so
> for two years. It shows the filenames for schemas and tables in the
> database.
I'm glad to hear that it works in your specific use-case.
Unfortunately, it doesn't work in the general case and therefore isn't a
good example. A proper query to return the filename for each user table
in the current database is:
SELECT
quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
s.setting || '/base/' || db.oid || '/' || c.relfilenode
FROM
pg_settings s
JOIN pg_database db on (s.name = 'data_directory')
JOIN pg_class c on (datname = current_database())
JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
WHERE
relfilenode <> 0
AND nsp.nspname !~ '^pg_'
AND nsp.nspname <> 'information_schema';
Note that, as discussed earlier in this thread, this doesn't actually
answer what Edson was asking for. Here's the query that would answer
his original request:
SELECT
quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
s.setting || '/base/' || db.oid || '/' || c.relfilenode,
(pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size as size
FROM
pg_settings s
JOIN pg_database db on (s.name = 'data_directory')
JOIN pg_class c on (datname = current_database())
JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
WHERE
relfilenode <> 0
AND nsp.nspname !~ '^pg_'
AND nsp.nspname <> 'information_schema';
Technically speaking, while these queries are correct for PG10, in prior
versions of PostgreSQL it's possible to have user schemas that begin
with 'pg_' and therefore the filtering in the WHERE clause would have to
be more specific.
Note that both of these need to be run as a superuser in older versions
of PG. In PostgreSQL 10, a user could be GRANT'd 'pg_read_all_settings'
and be able to run the first query. We don't currently support being
able to GRANT a non-superuser the ability to run pg_stat_file(), but
that will likely be coming in PG 11.
Thanks!
That doesn't seem to work with custom types:
andreak@[local]:5433 10.1 andreak=# SELECT
quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
s.setting || '/base/' || db.oid || '/' || c.relfilenode,
(pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size as size
FROM
pg_settings s
JOIN pg_database db on (s.name = 'data_directory')
JOIN pg_class c on (datname = current_database())
JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
WHERE
relfilenode <> 0
AND nsp.nspname !~ '^pg_'
AND nsp.nspname <> 'information_schema';
ERROR: could not stat file "/home/andreak/programs/postgresql-10/data/base/22039391/22039392": No such file or directory
│ public.biginttuple2 │ /home/andreak/programs/postgresql-10/data/base/22039391/22039392 │
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
* Andreas Joseph Krogh (andreas@visena.com) wrote: > SELECT > quote_ident(nsp.nspname) || '.' || quote_ident(c.relname), > s.setting || '/base/' || db.oid || '/' || c.relfilenode, > (pg_stat_file(s.setting || '/base/' || db.oid || '/' || > c.relfilenode)).size as size > FROM > pg_settings s > JOIN pg_database db on (s.name = 'data_directory') > JOIN pg_class c on (datname = current_database()) > JOIN pg_namespace nsp on (c.relnamespace = nsp.oid) > WHERE > relfilenode <> 0 > AND nsp.nspname !~ '^pg_' > AND nsp.nspname <> 'information_schema'; > > Technically speaking, while these queries are correct for PG10, in prior > versions of PostgreSQL it's possible to have user schemas that begin > with 'pg_' and therefore the filtering in the WHERE clause would have to > be more specific. > > Note that both of these need to be run as a superuser in older versions > of PG. In PostgreSQL 10, a user could be GRANT'd 'pg_read_all_settings' > and be able to run the first query. We don't currently support being > able to GRANT a non-superuser the ability to run pg_stat_file(), but > that will likely be coming in PG 11. > > Thanks! > > That doesn't seem to work with custom types: Nothing in this query referred to types, so I'm not sure what custom types would have to do with it..? > andreak@[local]:5433 10.1 andreak=# SELECT > quote_ident(nsp.nspname) || '.' || quote_ident(c.relname), > s.setting || '/base/' || db.oid || '/' || c.relfilenode, > (pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size > as size > FROM > pg_settings s > JOIN pg_database db on (s.name = 'data_directory') > JOIN pg_class c on (datname = current_database()) > JOIN pg_namespace nsp on (c.relnamespace = nsp.oid) > WHERE > relfilenode <> 0 > AND nsp.nspname !~ '^pg_' > AND nsp.nspname <> 'information_schema'; > ERROR: could not stat file > "/home/andreak/programs/postgresql-10/data/base/22039391/22039392": No such > file or directory > > │ public.biginttuple2 │ > /home/andreak/programs/postgresql-10/data/base/22039391/22039392 │ Considering this is saying 'no such file or directory', I'm guessing that somehow your data directory isn't what is listed in pg_settings..? Alternatively, perhaps that table was concurrently dropped? Are you able to provide any specifics about your system? Does the database directory exist? Does that path look reasonable? I find it kind of interesting that the OID of the database and the relfilenode are so close together- exactly what did you do to test this query? Thanks! Stephen
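A quick way to check the first possibility raised above, i.e. whether the server's actual data directory matches what the query pulled from pg_settings:
SHOW data_directory;
-- compare this against the directory prefix in the failing path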
On Sunday, 31 December 2017 at 00:49:31, Stephen Frost <sfrost@snowman.net> wrote:
* Andreas Joseph Krogh (andreas@visena.com) wrote:
> SELECT
> quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
> s.setting || '/base/' || db.oid || '/' || c.relfilenode,
> (pg_stat_file(s.setting || '/base/' || db.oid || '/' ||
> c.relfilenode)).size as size
> FROM
> pg_settings s
> JOIN pg_database db on (s.name = 'data_directory')
> JOIN pg_class c on (datname = current_database())
> JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
> WHERE
> relfilenode <> 0
> AND nsp.nspname !~ '^pg_'
> AND nsp.nspname <> 'information_schema';
>
> Technically speaking, while these queries are correct for PG10, in prior
> versions of PostgreSQL it's possible to have user schemas that begin
> with 'pg_' and therefore the filtering in the WHERE clause would have to
> be more specific.
>
> Note that both of these need to be run as a superuser in older versions
> of PG. In PostgreSQL 10, a user could be GRANT'd 'pg_read_all_settings'
> and be able to run the first query. We don't currently support being
> able to GRANT a non-superuser the ability to run pg_stat_file(), but
> that will likely be coming in PG 11.
>
> Thanks!
>
> That doesn't seem to work with custom types:
Nothing in this query referred to types, so I'm not sure what custom
types would have to do with it..?
> andreak@[local]:5433 10.1 andreak=# SELECT
> quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
> s.setting || '/base/' || db.oid || '/' || c.relfilenode,
> (pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size
> as size
> FROM
> pg_settings s
> JOIN pg_database db on (s.name = 'data_directory')
> JOIN pg_class c on (datname = current_database())
> JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
> WHERE
> relfilenode <> 0
> AND nsp.nspname !~ '^pg_'
> AND nsp.nspname <> 'information_schema';
> ERROR: could not stat file
> "/home/andreak/programs/postgresql-10/data/base/22039391/22039392": No such
> file or directory
>
> │ public.biginttuple2 │
> /home/andreak/programs/postgresql-10/data/base/22039391/22039392 │
Considering this is saying 'no such file or directory', I'm guessing
that somehow your data directory isn't what is listed in pg_settings..?
Alternatively, perhaps that table was concurrently dropped?
Are you able to provide any specifics about your system? Does the
database directory exist? Does that path look reasonable? I find it
kind of interesting that the OID of the database and the relfilenode are
so close together- exactly what did you do to test this query?
Here's a simple test-case:
createdb test
test=# create table foo(id serial primary key, name varchar not null);
test=# CREATE TYPE BigIntTuple2 AS (f1 bigint, f2 bigint);
SELECT
quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
s.setting || '/base/' || db.oid || '/' || c.relfilenode
FROM
pg_settings s
JOIN pg_database db on (s.name = 'data_directory')
JOIN pg_class c on (datname = current_database())
JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
WHERE
relfilenode <> 0
AND nsp.nspname !~ '^pg_'
AND nsp.nspname <> 'information_schema';
┌─────────────────────┬──────────────────────────────────────────────────────────────────┐
│ ?column? │ ?column? │
├─────────────────────┼──────────────────────────────────────────────────────────────────┤
│ public.foo_id_seq │ /home/andreak/programs/postgresql-10/data/base/22058766/22058767 │
│ public.foo │ /home/andreak/programs/postgresql-10/data/base/22058766/22058769 │
│ public.foo_pkey │ /home/andreak/programs/postgresql-10/data/base/22058766/22058776 │
│ public.biginttuple2 │ /home/andreak/programs/postgresql-10/data/base/22058766/22058778 │
└─────────────────────┴──────────────────────────────────────────────────────────────────┘
SELECT
quote_ident(nsp.nspname) || '.' || quote_ident(c.relname),
s.setting || '/base/' || db.oid || '/' || c.relfilenode,
(pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size as size
FROM
pg_settings s
JOIN pg_database db on (s.name = 'data_directory')
JOIN pg_class c on (datname = current_database())
JOIN pg_namespace nsp on (c.relnamespace = nsp.oid)
WHERE
relfilenode <> 0
AND nsp.nspname !~ '^pg_'
AND nsp.nspname <> 'information_schema';
ERROR: could not stat file "/home/andreak/programs/postgresql-10/data/base/22058766/22058778": No such file or directory
$ file /home/andreak/programs/postgresql-10/data/base/22058766/22058776
/home/andreak/programs/postgresql-10/data/base/22058766/22058776: lif file
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Greetings Andreas, * Andreas Joseph Krogh (andreas@visena.com) wrote: > Here's a simple test-case: > > createdb test > test=# create table foo(id serial primary key, name varchar not null); > test=# CREATE TYPE BigIntTuple2 AS (f1 bigint, f2 bigint); Hah, appears we allocate a relfilenode to types too, though it's not entirely clear to me why (evidently, they're not used..). Anyhow, adding a filter on relkind addresses it (though it's not very future-proof, unfortunately): SELECT quote_ident(nsp.nspname) || '.' || quote_ident(c.relname), s.setting || '/base/' || db.oid || '/' || c.relfilenode, (pg_stat_file(s.setting || '/base/' || db.oid || '/' || c.relfilenode)).size as size FROM pg_settings s JOIN pg_database db on (s.name = 'data_directory') JOIN pg_class c on (datname = current_database()) JOIN pg_namespace nsp on (c.relnamespace = nsp.oid) WHERE relfilenode <> 0 AND nsp.nspname !~ '^pg_' AND nsp.nspname <> 'information_schema' AND c.relkind in ('r','i','m','p'); Thanks! Stephen
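For what it's worth, a variant of the same idea that leans on the built-in path helper rather than assembling the path by hand is sketched below; it keeps the relkind filter (composite types have a pg_class entry with relkind 'c' and nothing on disk, which is what tripped the earlier run), and pg_relation_filepath() also copes with relations placed in non-default tablespaces:
SELECT n.nspname, c.relname,
       pg_relation_filepath(c.oid::regclass) as path,
       (pg_stat_file(pg_relation_filepath(c.oid::regclass))).size as size
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relfilenode <> 0
   AND n.nspname !~ '^pg_'
   AND n.nspname <> 'information_schema'
   AND c.relkind IN ('r', 'i', 'm');
-- pg_relation_filepath() returns a path relative to the data directory, which
-- pg_stat_file() accepts; add 'S' to the relkind list to cover sequences as well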
On Sat, Dec 30, 2017 at 10:27 AM, Stephen Frost <sfrost@snowman.net> wrote:
The checksums included in PG are page-level and therefore there simply
isn't one to look at if the file is zero bytes.
And even if the file wasn't zero bytes you can't tell from the per page CRCs if you have all the pages you should have. You could have extra pages that aren't supposed to be there or missing some (or any mix of the two). A per page CRC is useless for those cases.
Regards, Jan
Jan Wieck
Senior Postgres Architect
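On that point, one rough heuristic that needs no file access at all is to compare the page count recorded at the last VACUUM/ANALYZE with the current on-disk size; this is only a sketch and relpages is merely an estimate, but a relation that was known to be large and now reports zero pages on disk is worth a closer look:
SELECT c.oid::regclass AS relation,
       c.relpages AS pages_at_last_vacuum,
       pg_relation_size(c.oid::regclass) / current_setting('block_size')::int AS pages_on_disk_now
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind = 'r'
   AND c.relfilenode <> 0
   AND n.nspname !~ '^pg_'
   AND n.nspname <> 'information_schema'
 ORDER BY c.relpages DESC;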
Jan, all, * Jan Wieck (jan@wi3ck.info) wrote: > On Sat, Dec 30, 2017 at 10:27 AM, Stephen Frost <sfrost@snowman.net> wrote: > > The checksums included in PG are page-level and therefore there simply > > isn't one to look at if the file is zero bytes. > > And even if the file wasn't zero bytes you can't tell from the per page > CRCs if you have all the pages you should have. You could have extra pages > that aren't supposed to be there or missing some (or any mix of the two). > A per page CRC is useless for those cases. Right, which is why it makes sense to have whole-file checksums when doing things like backups, when you're already reading all of the file and can grab a checksum in stream, to detect if anything bad happens to the backed-up file after the backup has completed. It's unclear if that would have helped here or not since we don't know when the file ended up being zero'd out, as I understand it. Thanks! Stephen