Re: WIP Patch: pg_dump structured - Mailing list pgsql-hackers
From | Attila Soki |
---|---|
Subject | Re: WIP Patch: pg_dump structured |
Date | |
Msg-id | 2F0C38DB-A665-4D50-9CEF-1395993C7610@attilasoki.com Whole thread Raw |
In response to | Re: WIP Patch: pg_dump structured (Attila Soki <pgsql@attilasoki.com>) |
List | pgsql-hackers |
On 12 Mar 2023, at 22:56, Attila Soki <pgsql@attilasoki.com> wrote: >> On 12 Mar 2023, at 21:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Won't this fail completely with SQL objects whose names aren't suitable >> to be pathname components? "A/B" is a perfectly good name so far as >> SQL is concerned. You could also have problems with collisions on >> case-insensitive filesystems. > > You are right about the case-insensivity, this is not handled and will fail. I forgot > to handle that. I trying to find a way to handle this. Hi Tom, Thank you for your feedback. This is an updated version of the pg_dump structured wip patch (V2) with the following changes: - to avoid path collisions on case insensitive filessystems, all path components created from user input are suffixed with the hex representation of a 32 bit hash. “A/B” and “a/b” will get different suffixes. - all path components are now filesystem safe All this is a proposal, if you know a better solution please let me know. This patch is a WIP (V2). The patch is against master and it compiles successfully on macOS 13.2.1 aarch64 and on Debian 11 arm64. To test, execute pg_dump --format=structured --file=/path/to/outputdir dbname > >> >>> This format can be restored by feeding its plaintext toc file (restore-dump.sql) >>> to psql. The output is also suitable for manipulating the files with standard >>> editing tools. >> >> This seems a little contradictory: if you want to edit the individual >> files, you'd have to also update restore-dump.sql, or else it's pointless. >> It might make more sense to consider this as a write-only dump format >> and not worry about whether it can be restored directly. > > The main motivation was to track changes with VCS at the file (object) level, > editing small files was intended as a second possible use case. > I did not know that a write-only format would go. Declaring this format as a write-only dump would allow a more flexible directory structure since we wouldn't have to maintain the restore order. > >> >>> What do you think of this feature, any chance it will be added to pg_dump once >>> the patch is ready? >> >> I'm not clear on how big the use-case is. It's not really obvious to >> me that this'd have any benefit over the existing plain-text dump >> capability. You can edit those files too, at least till the schema >> gets too big for your editor. (But if you've got many many thousand >> SQL objects, a file-per-SQL-object directory will also be no fun to >> deal with.) Here is a sample use case to demonstrate how this format could be used to track schema changes with git. The main difference from using the existing plain-text schema dump is, that this format makes it possible to keep a history of the actual changes made to the individual objects. For example, to determine which migrations have changed the foo function. # import the schema into the repository cd /path/to/my_app_code pg_dump --format=structured --schema-only --file=foo_schema foodb git add foo_schema --all git commit foo_schema -m'initial commit foo_schema' # make changes in the db (my_app migrate foodb) (psql foodb < tweak.sql) # get a fresh dump rm -rf foo_schema pg_dump --format=structured --schema-only --file=foo_schema foodb # now inspect the changes under foo_schema: there may be changed, new and # missing files git status foo_schema # commit all schema changes git add foo_schema -u git commit foo_schema -m'changes from migration foodb' # later, inspect changes git log --stat # show the history of one object git log -p -- "foo_schema/path/to/FUNCTIONS/foo.sql" Sure, the user base for this is narrow. Thanks for any feedback. — Best regards Attila Soki
Attachment
pgsql-hackers by date: