Thread: Scriptable way to validate a pg_dump restore ?

Scriptable way to validate a pg_dump restore ?

From
Laura Smith
Date:
Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap
orwhatever. 

Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure
boththe schema and all its rows were restored to the same point at which the dump was taken ? 

Thanks !

Laura



Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:
I'd also like to know how to do this. The current approaches seem, afaict, to involve making on both end of the connection. Even given the inherently racy nature of the issue, that seems unwieldy to me. 

https://www.postgresql.org/message-id/CAHAc2jdAHvp7tFZBP37awcth%3DT3h5WXCN9KjZOvuTNJaAAC_hg%40mail.gmail.com

On Mon, 29 Jan 2024, 14:12 Laura Smith, <n5d9xq3ti233xiyif2vp@protonmail.ch> wrote:
Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

Thanks !

Laura


Re: Scriptable way to validate a pg_dump restore ?

From
Ron Johnson
Date:
On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <n5d9xq3ti233xiyif2vp@protonmail.ch> wrote:
Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.If it doesn't, it would fail for people who do logical replication.  However, no one has noticed. 

That's kinda like being asked to prove that rocks always fall when you drop them.  Either you trust physics, because physics has always worked, or you must watch every rock, because next time it might not fall.  The analogy is slightly flawed, since we always check the pg_dump and pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and record counts during a 9.6 -> 14 migration.  You'll have to modify it for your purpose.

Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
    location text check (location in ('96', '14'))
  , table_name text
  , row_count bigint
  , count_time timestamp without time zone default current_timestamp
  , primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS INTEGER
    LANGUAGE plpgsql
AS $func$
DECLARE
    r RECORD;
    _sql TEXT;
    _table_count BIGINT;
BEGIN
    FOR r IN select relnamespace::regnamespace::text||'.'||relname as table_name
            from pg_class cla
            where relkind = 'r'
            and not exists (select 1   -- excludes parent tables
                            from pg_inherits inh1
                            where inh1.inhparent = cla.oid)
            and relnamespace::regnamespace::text
                    not in ('pg_catalog', 'information_schema', 'dba')
            order by 1
    LOOP
        _sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
        RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), _sql;
        EXECUTE _sql INTO _table_count;
        --RAISE NOTICE '%', _table_count;
        INSERT INTO dba.migration_table_counts (location, table_name, row_count)
            VALUES (_p_source, r.table_name, _table_count);
    END LOOP;
    RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'), 'Finished';
    RETURN 0;
END
$func$;

Run this script in a cron job that executes at the same time as your pg_dump cron job.  Parameters should be for the source database. 
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from dba.get_table_counts('$Source');"

Run the same script on the destination server after the pg_restore is finished.
Dump the source dba.migration_table_counts then load it into the destination dba.migration_table_counts.

These two queries run on the destination server will check that all tables exist in both databases, and that the record counts are the same.

You'll have some slight variations, since the two jobs are in separate transactions.  (Mine won't, since the applications will be shut down, and pg_hba.conf will block them.)

declare -gr SQL1="
    with
    c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
    c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
    select c96.*, c14.*
    from c96 full join c14 on c96.table_name = c14.table_name
    where c96.table_name is null
       or c14.table_name is null
    order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"

declare -gr SQL2="
    with
    c96 as (select table_name, row_count from dba.migration_table_counts where location = '96'),
    c14 as (select table_name, row_count from dba.migration_table_counts where location = '14')
    select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
    from c96 inner join c14 on c96.table_name = c14.table_name
    where c96.row_count != c14.row_count
    order by c96.table_name;"
psql $DB -ac "$SQL2"

Re: Scriptable way to validate a pg_dump restore ?

From
Adrian Klaver
Date:
On 1/29/24 00:12, Laura Smith wrote:
> Hi
> 
> Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap
orwhatever.
 
> 
> Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to
ensureboth the schema and all its rows were restored to the same point at which the dump was taken ?
 

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

*      pg_restore is an utility extracting postgres database definitions
  *      from a backup archive created by pg_dump using the archiver
  *      interface.
  *
  *      pg_restore will read the backup archive and
  *      dump out a script that reproduces
  *      the schema of the database in terms of
  *                user-defined types
  *                user-defined functions
  *                tables
  *                indexes
  *                aggregates
  *                operators
  *                ACL - grant/revoke
  *
  * the output script is SQL that is understood by PostgreSQL
  *
  * Basic process in a restore operation is:
  *
  *      Open the Archive and read the TOC.
  *      Set flags in TOC entries, and *maybe* reorder them.
  *      Generate script to stdout
  *      Exit

Then:

    pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is 
the same in the restored database.

This will not tell you whether all the data was transferred. You will 
either have to trust from pg_dump.c:

  *      pg_dump will read the system catalogs in a database and dump out a
  *      script that reproduces the schema in terms of SQL that is 
understood
  *      by PostgreSQL
  *
  *      Note that pg_dump runs in a transaction-snapshot mode transaction,
  *      so it sees a consistent snapshot of the database including system
  *      catalogs. However, it relies in part on various specialized backend
  *      functions like pg_get_indexdef(), and those things tend to look at
  *      the currently committed state.  So it is possible to get 'cache
  *      lookup failed' error if someone performs DDL changes while a 
dump is
  *      happening. The window for this sort of thing is from the 
acquisition
  *      of the transaction snapshot to getSchemaData() (when pg_dump 
acquires
  *      AccessShareLock on every table it intends to dump). It isn't 
very large,
  *      but it can happen.

Or come up with way to capture the state of the data at the time of dump 
and then compare to restored database. Something like Ron posted.

> 
> Thanks !
> 
> Laura
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:


On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/29/24 00:12, Laura Smith wrote:
> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

*      pg_restore is an utility extracting postgres database definitions
  *      from a backup archive created by pg_dump using the archiver
  *      interface.
  *
  *      pg_restore will read the backup archive and
  *      dump out a script that reproduces
  *      the schema of the database in terms of
  *                user-defined types
  *                user-defined functions
  *                tables
  *                indexes
  *                aggregates
  *                operators
  *                ACL - grant/revoke
  *
  * the output script is SQL that is understood by PostgreSQL
  *
  * Basic process in a restore operation is:
  *
  *      Open the Archive and read the TOC.
  *      Set flags in TOC entries, and *maybe* reorder them.
  *      Generate script to stdout
  *      Exit

Then:

        pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is
the same in the restored database.

This will not tell you whether all the data was transferred. You will
either have to trust from pg_dump.c:

  *      pg_dump will read the system catalogs in a database and dump out a
  *      script that reproduces the schema in terms of SQL that is
understood
  *      by PostgreSQL
  *
  *      Note that pg_dump runs in a transaction-snapshot mode transaction,
  *      so it sees a consistent snapshot of the database including system
  *      catalogs. However, it relies in part on various specialized backend
  *      functions like pg_get_indexdef(), and those things tend to look at
  *      the currently committed state.  So it is possible to get 'cache
  *      lookup failed' error if someone performs DDL changes while a
dump is
  *      happening. The window for this sort of thing is from the
acquisition
  *      of the transaction snapshot to getSchemaData() (when pg_dump
acquires
  *      AccessShareLock on every table it intends to dump). It isn't
very large,
  *      but it can happen.

Or come up with way to capture the state of the data at the time of dump
and then compare to restored database. Something like Ron posted.

Right, for me, state, not just record count is what I'm interested in (for the initial full table copy part of replication). So, given the explanation about the possible per-table window, is there some property of the table that could be used to confirm that a table has made it across? 

I guess there is such a thing since the following incremental syncing would presumably need it. I had hoped the LSN was this thing, but confirmation would be great. 

Thanks, Shaheed


>
> Thanks !
>
> Laura
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com



Re: Scriptable way to validate a pg_dump restore ?

From
Adrian Klaver
Date:
On 1/29/24 09:28, Shaheed Haque wrote:


> 
> Right, for me, state, not just record count is what I'm interested in 
> (for the initial full table copy part of replication). So, given the 
> explanation about the possible per-table window, is there some property 
> of the table that could be used to confirm that a table has made it across?
> 
> I guess there is such a thing since the following incremental syncing 
> would presumably need it. I had hoped the LSN was this thing, but 
> confirmation would be great.

The OP was referring to the pg_dump/pg_restore cycle, you seem to be 
referring to logical replication. Is that correct?

> 
> Thanks, Shaheed
> 
> 
>      >
>      > Thanks !
>      >
>      > Laura
>      >
>      >
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:


On Mon, 29 Jan 2024, 23:57 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/29/24 09:28, Shaheed Haque wrote:


>
> Right, for me, state, not just record count is what I'm interested in
> (for the initial full table copy part of replication). So, given the
> explanation about the possible per-table window, is there some property
> of the table that could be used to confirm that a table has made it across?
>
> I guess there is such a thing since the following incremental syncing
> would presumably need it. I had hoped the LSN was this thing, but
> confirmation would be great.

The OP was referring to the pg_dump/pg_restore cycle, you seem to be
referring to logical replication. Is that correct?

Yes. But I was under the impression that the initial copy of logical replication was the same? 


>
> Thanks, Shaheed
>
>
>      >
>      > Thanks !
>      >
>      > Laura
>      >
>      >
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Scriptable way to validate a pg_dump restore ?

From
Adrian Klaver
Date:
On 1/29/24 10:12, Shaheed Haque wrote:


> 
> Yes. But I was under the impression that the initial copy of logical 
> replication was the same?
>

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

No as what COPY transfers can be affected by WHERE clauses on the 
publisher. Also if you have cascading publishers/subscriptions the 
'original' data maybe upstream of the publisher you are comparing to. 
Finally logical replication is generally not static so there is the 
issue of determining a point in time for the check.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Scriptable way to validate a pg_dump restore ?

From
Laura Smith
Date:
On Monday, 29 January 2024 at 09:06, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

>
> That's kinda like being asked to prove that rocks always fall when you drop them. Either you trust physics, because
physicshas always worked, or you must watch every rock, because next time it might not fall. The analogy is slightly
flawed,since we always check the pg_dump and pg_restore return codes, since something else might impact their function. 
>
> But if you still need evidence, here's what I'm doing to verify table and record counts during a 9.6 -> 14 migration.
You'llhave to modify it for your purpose. 



Thanks Ron !

I must admit that I am willing to trust pg_dump / pg_restore, mostly for the reasons Adrian Klaver implied.

However your script is likely the very thing I was looking for in terms of belt & braces.  So I appreciate you
publishingit as a source of inspiration ! 



Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:


On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/29/24 10:12, Shaheed Haque wrote:


>
> Yes. But I was under the impression that the initial copy of logical
> replication was the same?
>

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

Yes, this is what I meant. 

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.

Good points, understood. For the next bit, let's assume neither of these are in play. 

Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.

Indeed. I currently have a static source db but would eventually like to eliminate the implied downtime. What I'd like to provide my user is some indication of progress initially during the copy_data phase, and for the future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this using only a connection to one db.

I was assuming that logical replication needed "something" similar internally, and was hoping the LSNs were that "something". 

Thanks, Shaheed 


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Scriptable way to validate a pg_dump restore ?

From
Adrian Klaver
Date:
On 1/29/24 11:35, Shaheed Haque wrote:
> 
> 
> On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 1/29/24 10:12, Shaheed Haque wrote:
> 
> 
>      >
>      > Yes. But I was under the impression that the initial copy of logical
>      > replication was the same?
>      >
> 
>     Are you taking about the copy_data option to WITH?
> 
>     If so yes and no.
> 
>     Yes as it uses COPY to transfer the data.
> 
> 
> Yes, this is what I meant.
> 
>     No as what COPY transfers can be affected by WHERE clauses on the
>     publisher. Also if you have cascading publishers/subscriptions the
>     'original' data maybe upstream of the publisher you are comparing to.
> 
> 
> Good points, understood. For the next bit, let's assume neither of these 
> are in play.
> 
>     Finally logical replication is generally not static so there is the
>     issue of determining a point in time for the check.
> 
> 
> Indeed. I currently have a static source db but would eventually like to 
> eliminate the implied downtime. What I'd like to provide my user is some 

Implied downtime of what?

> indication of progress initially during the copy_data phase, and for the 
> future, of the anticipated incremental convergence.
> 
> And, as per my other note, I would ideally like to be able to do this 
> using only a connection to one db.
> 
> I was assuming that logical replication needed "something" similar 
> internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical 
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

> 
> Thanks, Shaheed
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:


On Tue, 30 Jan 2024, 05:02 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/29/24 11:35, Shaheed Haque wrote:
>
>
> On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 1/29/24 10:12, Shaheed Haque wrote:
>
>
>      >
>      > Yes. But I was under the impression that the initial copy of logical
>      > replication was the same?
>      >
>
>     Are you taking about the copy_data option to WITH?
>
>     If so yes and no.
>
>     Yes as it uses COPY to transfer the data.
>
>
> Yes, this is what I meant.
>
>     No as what COPY transfers can be affected by WHERE clauses on the
>     publisher. Also if you have cascading publishers/subscriptions the
>     'original' data maybe upstream of the publisher you are comparing to.
>
>
> Good points, understood. For the next bit, let's assume neither of these
> are in play.
>
>     Finally logical replication is generally not static so there is the
>     issue of determining a point in time for the check.
>
>
> Indeed. I currently have a static source db but would eventually like to
> eliminate the implied downtime. What I'd like to provide my user is some

Implied downtime of what?

> indication of progress initially during the copy_data phase, and for the
> future, of the anticipated incremental convergence.
>
> And, as per my other note, I would ideally like to be able to do this
> using only a connection to one db.
>
> I was assuming that logical replication needed "something" similar
> internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

I'll take a look. Thanks for the tip and the gentle guidance; it is much appreciated. 


>
> Thanks, Shaheed
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Scriptable way to validate a pg_dump restore ?

From
Shaheed Haque
Date:


On Tue, 30 Jan 2024, 05:02 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 1/29/24 11:35, Shaheed Haque wrote:
>
>
> On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 1/29/24 10:12, Shaheed Haque wrote:
>
>
>      >
>      > Yes. But I was under the impression that the initial copy of logical
>      > replication was the same?
>      >
>
>     Are you taking about the copy_data option to WITH?
>
>     If so yes and no.
>
>     Yes as it uses COPY to transfer the data.
>
>
> Yes, this is what I meant.
>
>     No as what COPY transfers can be affected by WHERE clauses on the
>     publisher. Also if you have cascading publishers/subscriptions the
>     'original' data maybe upstream of the publisher you are comparing to.
>
>
> Good points, understood. For the next bit, let's assume neither of these
> are in play.
>
>     Finally logical replication is generally not static so there is the
>     issue of determining a point in time for the check.
>
>
> Indeed. I currently have a static source db but would eventually like to
> eliminate the implied downtime. What I'd like to provide my user is some

Implied downtime of what?

Oh, forgot to say: the downtime of my Django based app. Not anything at the PG level. 


> indication of progress initially during the copy_data phase, and for the
> future, of the anticipated incremental convergence.
>
> And, as per my other note, I would ideally like to be able to do this
> using only a connection to one db.
>
> I was assuming that logical replication needed "something" similar
> internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

>
> Thanks, Shaheed
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com