Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id CAFj8pRDFwyGmyefihxnk4ZCVedPp=Gs3H=CxbPj2-hBguvmoXw@mail.gmail.com
Whole thread Raw
In response to Re: proposal: possibility to read dumped table's name from file  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: proposal: possibility to read dumped table's name from file
List pgsql-hackers


po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> Will do a closer review on the patch shortly.

Had a read through, and tested, the latest posted version today:

+    Read objects filters from the specified file. Specify "-" to read from
+    stdin. Lines of this file must have the following format:
I think this should be <filename>-</filename> and <literal>STDIN</literal> to
match the rest of the docs.


+       <para>
+        With the following filter file, the dump would include table
+        <literal>mytable1</literal> and data from foreign tables of
+        <literal>some_foreign_server</literal> foreign server, but exclude data
+        from table <literal>mytable2</literal>.
+<programlisting>
+include table mytable1
+include foreign_data some_foreign_server
+exclude table mytable2
+</programlisting>
+       </para>
This example is highlighting the issue I've previously raised with the UX/doc
of this feature.  The "exclude table mytable2" is totally pointless in the
above since the exact match of "mytable1" will remove all other objects.  What
we should be doing instead is use the pattern matching aspect along the lines
of the below:

    include table mytable*
    exclude table mytable2

+        The <option>--filter</option> option works just like the other
+        options to include or exclude tables, schemas, table data, or foreign
This should refer to the actual options by name to make it clear which we are
talking about.

fixed



+       printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions\n"
+                        "                               from the filter file\n"));
Before we settle on --filter I think we need to conclude whether this file is
intended to be included from a config file, or used on it's own.  If we gow tih
the former then we might not want a separate option for just --filter.

I prefer to separate two files. Although there is some intersection, I think it is good to have two simple separate files for two really different tasks.
It does filtering, and it should be controlled by option "--filter". When the implementation will be changed, then this option can be changed too.
Filtering is just a pg_dump related feature. Revision of client application configuration is a much more generic task, and if we mix it to one, we can be
in a trap. It can be hard to find one good format for large script generated content, and possibly hand written structured content. For practical
reasons it can be good to have two files too. Filters and configurations can have different life cycles.



+    if (filter_is_keyword(keyword, size, "include"))
I would prefer if this function call was replaced by just the pg_strcasecmp()
call in filter_is_keyword() and the strlen optimization there removed.  The is
not a hot-path, we can afford the string comparison in case of errors.  Having
the string comparison done inline here will improve readability saving the
reading from jumping to another function to see what it does.

I agree that this is not a hot-path, just I don't feel well if I need to make a zero end string just for comparison pg_strcasecmp. Current design reduces malloc/free cycles. It is used in more places, when Postgres parses strings - SQL parser, plpgsql parser. I am not sure about the benefits and costs - pg_strcasecmp can be more readable, but for any keyword I have to call pstrdup and pfree. Is it necessary? My opinion in this part is not too strong - it is a minor issue, maybe I have a little bit different feelings about benefits and costs in this specific case, and if you really think the benefits of rewriting are higher, I'll do it.



+    initStringInfo(&line);
Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
does?

The StringInfo is used because I use the pg_get_line_buf function, and this function uses this API.



+typedef struct
I think these should be at the top of the file with the other typedefs.


done

 

When testing strange object names, I was unable to express this name in the filter file:

$ ./bin/psql
psql (15devel)
Type "help" for help.

danielg=# create table "
danielg"# t
danielg"# t
danielg"# " (a integer);
CREATE TABLE
danielg=# select relname from pg_class order by oid desc limit 1;
 relname
---------
        +
 t      +
 t      +

(1 row)


Good catch - I had badly placed pg_strip_crlf function, fixed and regress tests enhanced

Please check assigned patch

Regards

Pavel



 
--
Daniel Gustafsson               https://vmware.com/

Attachment

pgsql-hackers by date:

Previous
From: Gavin Flower
Date:
Subject: Re: Release 14 Schedule
Next
From: Antonin Houska
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs