Re: WIP Patch: pg_dump structured - Mailing list pgsql-hackers

From Attila Soki
Subject Re: WIP Patch: pg_dump structured
Date
Msg-id 2F0C38DB-A665-4D50-9CEF-1395993C7610@attilasoki.com
Whole thread Raw
In response to Re: WIP Patch: pg_dump structured  (Attila Soki <pgsql@attilasoki.com>)
List pgsql-hackers
On 12 Mar 2023, at 22:56, Attila Soki <pgsql@attilasoki.com> wrote:
>> On 12 Mar 2023, at 21:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Won't this fail completely with SQL objects whose names aren't suitable
>> to be pathname components?  "A/B" is a perfectly good name so far as
>> SQL is concerned.  You could also have problems with collisions on
>> case-insensitive filesystems.
>

> You are right about the case-insensivity, this is not handled and will fail. I forgot
> to handle that. I trying to find a way to handle this.

Hi Tom,

Thank you for your feedback.

This is an updated version of the pg_dump structured wip patch (V2) with the
following changes:
- to avoid path collisions on case insensitive filessystems, all path components
  created from user input are suffixed with the hex representation of a 32 bit
  hash. “A/B” and “a/b” will get different suffixes.
- all path components are now filesystem safe

All this is a proposal, if you know a better solution please let me know.

This patch is a WIP (V2). The patch is against master and it compiles
successfully on macOS 13.2.1 aarch64 and on Debian 11 arm64.
To test, execute pg_dump --format=structured --file=/path/to/outputdir dbname

>
>>
>>> This format can be restored by feeding its plaintext toc file (restore-dump.sql)
>>> to psql. The output is also suitable for manipulating the files with standard
>>> editing tools.
>>
>> This seems a little contradictory: if you want to edit the individual
>> files, you'd have to also update restore-dump.sql, or else it's pointless.
>> It might make more sense to consider this as a write-only dump format
>> and not worry about whether it can be restored directly.
>
> The main motivation was to track changes with VCS at the file (object) level,
> editing small files was intended as a second possible use case.
> I did not know that a write-only format would go.

Declaring this format as a write-only dump would allow a more flexible directory
structure since we wouldn't have to maintain the restore order.

>
>>
>>> What do you think of this feature, any chance it will be added to pg_dump once
>>> the patch is ready?
>>
>> I'm not clear on how big the use-case is.  It's not really obvious to
>> me that this'd have any benefit over the existing plain-text dump
>> capability.  You can edit those files too, at least till the schema
>> gets too big for your editor.  (But if you've got many many thousand
>> SQL objects, a file-per-SQL-object directory will also be no fun to
>> deal with.)

Here is a sample use case to demonstrate how this format could be used to track
schema changes with git. The main difference from using the existing plain-text
schema dump is, that this format makes it possible to keep a history of the
actual changes made to the individual objects. For example, to determine which
migrations have changed the foo function.

# import the schema into the repository
cd /path/to/my_app_code
pg_dump --format=structured --schema-only --file=foo_schema foodb
git add foo_schema --all
git commit foo_schema -m'initial commit foo_schema'

# make changes in the db
(my_app migrate foodb)
(psql foodb < tweak.sql)

# get a fresh dump
rm -rf foo_schema
pg_dump --format=structured --schema-only --file=foo_schema foodb

# now inspect the changes under foo_schema: there may be changed, new and
# missing files
git status foo_schema

# commit all schema changes
git add foo_schema -u
git commit foo_schema -m'changes from migration foodb'

# later, inspect changes
git log --stat

# show the history of one object
git log -p -- "foo_schema/path/to/FUNCTIONS/foo.sql"

Sure, the user base for this is narrow.

Thanks for any feedback.

—
Best regards
Attila Soki





Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Schema variables - new implementation for Postgres 15
Next
From: Robert Haas
Date:
Subject: Re: HOT chain validation in verify_heapam()