Re: request clarification on pg_restore documentation - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: request clarification on pg_restore documentation |
Msg-id | ZTw6wv8rkkC8IMk0@momjian.us |
In response to | Re: request clarification on pg_restore documentation (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: request clarification on pg_restore documentation |
List | pgsql-hackers |
Does anyone have a suggestion on how to handle this issue? The report is a year old.

---------------------------------------------------------------------------

On Thu, Oct 6, 2022 at 10:28:15AM -0400, Bruce Momjian wrote:
> On Thu, Sep 29, 2022 at 02:30:13PM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> >
> > Page: https://www.postgresql.org/docs/14/app-pgrestore.html
> > Description:
> >
> > pg_restore seems to have two ways to restore data:
> >
> > --section=data
> >
> > or
> >
> > --data-only
> >
> > There is this cryptic warning that --data-only is "similar to but for
> > historical reasons different from" --section=data
> >
> > But there is no further explanation of what those differences are or what
> > might be missed or different in your restore if you pick one option or the
> > other. Maybe one or the other option is the "preferred current way" and one
> > is the "historical way" or they are aimed at different types of use cases,
> > but that's not clear.
>
> [Thread moved from docs to hackers because there are behavioral issues.]
>
> Very good question. I dug into this and found this commit which says
> --data-only and --section=data were equivalent:
>
>     commit a4cd6abcc9
>     Author: Andrew Dunstan <andrew@dunslane.net>
>     Date:   Fri Dec 16 19:09:38 2011 -0500
>
>         Add --section option to pg_dump and pg_restore.
>
>         Valid values are --pre-data, data and post-data. The option can be
>         given more than once. --schema-only is equivalent to
> -->     --section=pre-data --section=post-data. --data-only is equivalent
> -->     to --section=data.
>
> and then this commit which says they are not:
>
>     commit 4317e0246c
>     Author: Tom Lane <tgl@sss.pgh.pa.us>
>     Date:   Tue May 29 23:22:14 2012 -0400
>
>         Rewrite --section option to decouple it from --schema-only/--data-only.
>
> -->     The initial implementation of pg_dump's --section option supposed that the
> -->     existing --schema-only and --data-only options could be made equivalent to
> -->     --section settings. This is wrong, though, due to dubious but long since
>         set-in-stone decisions about where to dump SEQUENCE SET items, as seen in
>         bug report from Martin Pitt. (And I'm not totally convinced there weren't
>         other bugs, either.) Undo that coupling and instead drive --section
>         filtering off current-section state tracked as we scan through the TOC
>         list to call _tocEntryRequired().
>
>         To make sure those decisions don't shift around and hopefully save a few
>         cycles, run _tocEntryRequired() only once per TOC entry and save the result
>         in a new TOC field. This required minor rejiggering of ACL handling but
>         also allows a far cleaner implementation of inhibit_data_for_failed_table.
>
>         Also, to ensure that pg_dump and pg_restore have the same behavior with
>         respect to the --section switches, add _tocEntryRequired() filtering to
>         WriteToc() and WriteDataChunks(), rather than trying to implement section
>         filtering in an entirely orthogonal way in dumpDumpableObject(). This
>         required adjusting the handling of the special ENCODING and STDSTRINGS
>         items, but they were pretty weird before anyway.
>
> and this commit which made them closer:
>
>     commit 5a39114fe7
>     Author: Tom Lane <tgl@sss.pgh.pa.us>
>     Date:   Fri Oct 26 12:12:42 2012 -0400
>
>         In pg_dump, dump SEQUENCE SET items in the data not pre-data section.
>
>         Represent a sequence's current value as a separate TableDataInfo dumpable
>         object, so that it can be dumped within the data section of the archive
> -->     rather than in pre-data. This fixes an undesirable inconsistency between
> -->     the meanings of "--data-only" and "--section=data", and also fixes dumping
>         of sequences that are marked as extension configuration tables, as per a
>         report from Marko Kreen back in July.
>         The main cost is that we do one more
>         SQL query per sequence, but that's probably not very meaningful in most
>         databases.
>
> Looking at the restore code, I see --data-only disabling triggers, while
> --section=data doesn't. I also tested --data-only vs. --section=data in
> pg_dump for the regression database and saw the only differences as the
> creation and comments on large objects, e.g.,
>
>     -- Name: 2121; Type: BLOB; Schema: -; Owner: postgres
>     --
>
>     SELECT pg_catalog.lo_create('2121');
>
>
>     ALTER LARGE OBJECT 2121 OWNER TO postgres;
>
>     --
>     -- Name: LARGE OBJECT 2121; Type: COMMENT; Schema: -; Owner: postgres
>     --
>
>     COMMENT ON LARGE OBJECT 2121 IS 'testing comments';
>
> but the large object _data_ was dumped in both cases.
>
> So, where does this leave us? We know we need --section=data because
> the pre/post-data options are clearly useful, so why would someone use
> --data-only vs. --section=data? We don't document why to use one rather
> than the other, so the --data-only option looks useless to me. Do we
> remove it, adjust it, or leave it alone?
>
> --
> Bruce Momjian <bruce@momjian.us> https://momjian.us
> EDB https://enterprisedb.com
>
> Indecision is a decision. Inaction is an action. Mark Batterson
>

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.
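The pg_dump comparison described in the quoted message (dumping the regression database once with --data-only and once with --section=data, then looking at the differences) can be scripted roughly as below. This is a minimal sketch, not the procedure actually used in the thread: it assumes pg_dump is on PATH and that connection settings come from the usual libpq environment variables (PGHOST, PGPORT, PGUSER, ...). The helper names `pg_dump_cmd`, `diff_dumps` and `compare` are hypothetical and not part of any PostgreSQL tooling.

```python
import difflib
import subprocess

# The two pg_dump switches being compared in the message above.
MODES = {
    "data-only": ["--data-only"],
    "section-data": ["--section=data"],
}


def pg_dump_cmd(dbname: str, mode: str) -> list[str]:
    """Build a plain-text pg_dump command line for one of the two modes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # With no -f/--file and the default plain format, pg_dump writes SQL to stdout.
    return ["pg_dump", *MODES[mode], dbname]


def diff_dumps(data_only: str, section_data: str) -> list[str]:
    """Unified diff between the two dump texts; empty if they are identical."""
    return list(
        difflib.unified_diff(
            data_only.splitlines(),
            section_data.splitlines(),
            fromfile="--data-only",
            tofile="--section=data",
            lineterm="",
        )
    )


def compare(dbname: str = "regression") -> list[str]:
    """Run pg_dump both ways against dbname and diff the two outputs.

    Assumes pg_dump is on PATH and libpq environment variables
    supply the connection settings.
    """
    outputs = [
        subprocess.run(
            pg_dump_cmd(dbname, mode),
            check=True,
            capture_output=True,
            text=True,
        ).stdout
        for mode in ("data-only", "section-data")
    ]
    return diff_dumps(*outputs)
```

Calling `compare("regression")` returns the unified-diff lines, where `+` lines appear only in the --section=data output and `-` lines only in the --data-only output; an empty list means the two plain-text dumps matched. This only compares pg_dump output. It does not exercise the pg_restore trigger-handling difference mentioned in the message, and newer pg_dump releases may emit per-run lines that would also show up in the diff.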