Re: request clarification on pg_restore documentation - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: request clarification on pg_restore documentation
Msg-id ZTw6wv8rkkC8IMk0@momjian.us
In response to Re: request clarification on pg_restore documentation  (Bruce Momjian <bruce@momjian.us>)
Responses Re: request clarification on pg_restore documentation
List pgsql-hackers
Does anyone have a suggestion on how to handle this issue?  The report
is a year old.

---------------------------------------------------------------------------

On Thu, Oct  6, 2022 at 10:28:15AM -0400, Bruce Momjian wrote:
> On Thu, Sep 29, 2022 at 02:30:13PM +0000, PG Doc comments form wrote:
> > The following documentation comment has been logged on the website:
> > 
> > Page: https://www.postgresql.org/docs/14/app-pgrestore.html
> > Description:
> > 
> > pg_restore seems to have two ways to restore data:
> > 
> > --section=data 
> > 
> > or
> > 
> >  --data-only
> > 
> > There is this cryptic warning that --data-only is "similar to but for
> > historical reasons different from" --section=data
> > 
> > But there is no further explanation of what those differences are or what
> > might be missed or different in your restore if you pick one option or the
> > other. Maybe one or the other option is the "preferred current way" and one
> > is the "historical way" or they are aimed at different types of use cases,
> > but that's not clear.
> 
> [Thread moved from docs to hackers because there are behavioral issues.]
> 
> Very good question.  I dug into this and found this commit which says
> --data-only and --section=data were equivalent:
> 
>     commit a4cd6abcc9
>     Author: Andrew Dunstan <andrew@dunslane.net>
>     Date:   Fri Dec 16 19:09:38 2011 -0500
>     
>         Add --section option to pg_dump and pg_restore.
>     
>         Valid values are --pre-data, data and post-data. The option can be
>         given more than once. --schema-only is equivalent to
> -->        --section=pre-data --section=post-data. --data-only is equivalent
> -->        to --section=data.
>     
> and then this commit which says they are not:
> 
>     commit 4317e0246c
>     Author: Tom Lane <tgl@sss.pgh.pa.us>
>     Date:   Tue May 29 23:22:14 2012 -0400
>     
>         Rewrite --section option to decouple it from --schema-only/--data-only.
>     
> -->        The initial implementation of pg_dump's --section option supposed that the
> -->        existing --schema-only and --data-only options could be made equivalent to
> -->        --section settings.  This is wrong, though, due to dubious but long since
>         set-in-stone decisions about where to dump SEQUENCE SET items, as seen in
>         bug report from Martin Pitt.  (And I'm not totally convinced there weren't
>         other bugs, either.)  Undo that coupling and instead drive --section
>         filtering off current-section state tracked as we scan through the TOC
>         list to call _tocEntryRequired().
>     
>         To make sure those decisions don't shift around and hopefully save a few
>         cycles, run _tocEntryRequired() only once per TOC entry and save the result
>         in a new TOC field.  This required minor rejiggering of ACL handling but
>         also allows a far cleaner implementation of inhibit_data_for_failed_table.
>     
>         Also, to ensure that pg_dump and pg_restore have the same behavior with
>         respect to the --section switches, add _tocEntryRequired() filtering to
>         WriteToc() and WriteDataChunks(), rather than trying to implement section
>         filtering in an entirely orthogonal way in dumpDumpableObject().  This
>         required adjusting the handling of the special ENCODING and STDSTRINGS
>         items, but they were pretty weird before anyway.
> 
> and this commit which made them closer:
> 
>     commit 5a39114fe7
>     Author: Tom Lane <tgl@sss.pgh.pa.us>
>     Date:   Fri Oct 26 12:12:42 2012 -0400
>     
>         In pg_dump, dump SEQUENCE SET items in the data not pre-data section.
>     
>         Represent a sequence's current value as a separate TableDataInfo dumpable
>         object, so that it can be dumped within the data section of the archive
> -->        rather than in pre-data.  This fixes an undesirable inconsistency between
> -->        the meanings of "--data-only" and "--section=data", and also fixes dumping
>         of sequences that are marked as extension configuration tables, as per a
>         report from Marko Kreen back in July.  The main cost is that we do one more
>         SQL query per sequence, but that's probably not very meaningful in most
>         databases.
> 
> Looking at the restore code, I see --data-only disabling triggers, while
> --section=data doesn't.  I also tested --data-only vs. --section=data in
> pg_dump for the regression database, and the only differences I saw were
> the creation of, and comments on, large objects, e.g.,
> 
>     -- Name: 2121; Type: BLOB; Schema: -; Owner: postgres
>     --
>     
>     SELECT pg_catalog.lo_create('2121');
> 
> 
>     ALTER LARGE OBJECT 2121 OWNER TO postgres;
> 
>     --
>     -- Name: LARGE OBJECT 2121; Type: COMMENT; Schema: -; Owner: postgres
>     --
> 
>     COMMENT ON LARGE OBJECT 2121 IS 'testing comments';
> 
> but the large object _data_ was dumped in both cases.
> 
> So, where does this leave us?  We know we need --section=data because
> the pre/post-data options are clearly useful, so why would someone use
> --data-only vs. --section=data?  We don't document why to use one rather
> than the other, so the --data-only option looks useless to me.  Do we
> remove it, adjust it, or leave it alone?
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>        https://momjian.us
>   EDB                                      https://enterprisedb.com
> 
>   Indecision is a decision.  Inaction is an action.  Mark Batterson
> 
> 
> 
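For anyone who wants to repeat the pg_dump comparison described in the
quoted message, here is a minimal sketch.  It assumes pg_dump is on PATH
and that the named database is reachable.  The function name
compare_data_dumps and the output file names are hypothetical, chosen only
for illustration.

```shell
# Hypothetical helper: dump the same database with --data-only and with
# --section=data, then show how the two plain-text dumps differ.
# Usage: compare_data_dumps DBNAME [OUTPUT_DIR]
compare_data_dumps() {
    db="$1"
    outdir="${2:-.}"

    # Return 2 if either dump fails, so the caller can tell a dump
    # failure apart from "the dumps differ" (diff's exit status 1).
    pg_dump --data-only -f "$outdir/data-only.sql" "$db" || return 2
    pg_dump --section=data -f "$outdir/section-data.sql" "$db" || return 2

    # diff exits 0 when the files are identical and 1 when they differ.
    diff -u "$outdir/data-only.sql" "$outdir/section-data.sql"
}
```

Something like "compare_data_dumps regression /tmp" would write both dumps
under /tmp and print a unified diff.  This only compares pg_dump output; it
does not exercise the pg_restore trigger handling mentioned above.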

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.


