Re: In-order pg_dump (or in-order COPY TO) - Mailing list pgsql-general
From | Dimitrios Apostolou |
---|---|
Subject | Re: In-order pg_dump (or in-order COPY TO) |
Date | |
Msg-id | 4ss66r31-558o-qq24-332q-no351p7n5osr@tzk.arg Whole thread Raw |
In response to | Re: In-order pg_dump (or in-order COPY TO) (Adrian Klaver <adrian.klaver@aklaver.com>) |
Responses |
Re: In-order pg_dump (or in-order COPY TO)
|
List | pgsql-general |
On Wednesday 2025-08-27 17:25, Adrian Klaver wrote: > Comments in line below. > >> Dump is from PostgreSQL 16, it's pg_dump writing to stdout: >> >> pg_dump -v --format=custom --compress=none --no-toast-compression -- >> serializable-deferrable db_name | borg create ... >> >> >> As you can see the backup (and deduplicating) program is borgbackup. > > Ok, I use BorgBackup and it is fairly forgiving of normal changes. > > FYI, if you ever want to use compression check out gzip --rsyncable, I have > found it plays well with Borg. For more information see: > > https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/ Yes, zstd has also --rsyncable. In this case I let borg do per-chunk compression after deduplication, it has worked well so far. >> Restore is in PostgreSQL 17: >> >> I first create the empty tables by running the DDL commands in version >> control to setup the database. And then I do pg_restore --data-only: >> >> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public -- >> section=data dump_file > > If you are using only the --data section why not --data-only in the pg_dump? I want the dump to be as complete as possible. Didn't think it would create issues. > > Or is the pg_dump output used for other purposes? It has happened that I have selectively restored user schemas from that dump. >> Worth noting is that the above pg_restore goes through the WAL, i.e. all >> writes are done by walwriter, not the backend directly. > > Please explain the above further. The COPY FROM data is going through the WAL, as usual INSERTS do. The writes to disk happen by the walwriter process. OTOH, If you have configured the server with wal_level=minimal and BEGIN a transaction, CREATE or TRUNCATE a table, and then COPY FROM into that table, then the backend process writes directly to the table without logging to the WAL. This can be much faster, but most importantly it avoids situations of WAL overflow that are very difficult to predict and can mess your server up completely. [1] [1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net My patches are for activating that codepath in pg_restore, but they were not used on purpose and I took notice that the writes went via WAL. > > The problem occurs when you do the pg_dump after this restore, correct? Correct. The first pg_dump from the restored pg17 is not deduplicated at all. Most of the tables have not changed (logically at least; apparently they have changed physically). > > Is it the same pg_dump command as you show above? Yes. > >> >> Postgres is standard open source running on own server. It has a couple of >> custom patches that shouldn't matter in this codepath. > > For completeness and just in case they may affect the output what do the > patches do? Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory. https://commitfest.postgresql.org/patch/5809/ https://commitfest.postgresql.org/patch/5817/ And two patches for speeding up pg_restore like mentioned above, under specific arguments that I didn't provide. (one speedup needs --clean, and the other needs --freeze). https://commitfest.postgresql.org/patch/5821/ https://commitfest.postgresql.org/patch/5826/ IIRC I did not activate them (via --clean) because TRUNCATE fails when foreign keys exist. See the discussion threads. Dimitris
pgsql-general by date: