Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal: possibility to read dumped table's name from file
Date
Msg-id CAFj8pRCsZuKRRdqZoYYo_wW-YjpWGA_ie9nhwJRd9E+GmsShrQ@mail.gmail.com
In response to Re: proposal: possibility to read dumped table's name from file  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: proposal: possibility to read dumped table's name from file
List pgsql-hackers


st 1. 7. 2020 v 23:24 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > +                                             /* ignore empty rows */
> > > +                                             if (*line != '\0')
> >
> > Maybe: if (*line == '\0') continue;
> > We should also support comments.

Comment support is still missing but easily added :)
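For illustration, comment handling could be as small as a helper like this (my own sketch, not the patch's code; the '#' comment character and the function name are assumptions, since the thread has not settled on a syntax):

```c
#include <ctype.h>

/* Return nonzero when a filter-file line should be ignored:
 * blank lines and comment lines starting with '#'.
 * (Sketch only; the comment character is an assumption.) */
static int
filter_line_is_ignorable(const char *line)
{
    /* skip leading whitespace before testing the first character */
    while (isspace((unsigned char) *line))
        line++;
    return *line == '\0' || *line == '#';
}
```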

I tried this patch and it works for my purposes.

Also, your getline dynamically re-allocates lines of arbitrary length.
Possibly that's not needed.  We'll typically read "+t schema.relname", which is
at most 132 chars.  Maybe it's sufficient to do
char buf[1024];

if (fgets(buf, sizeof(buf), fp) == NULL)
    return NULL;                    /* EOF */
if (strchr(buf, '\n') == NULL)
    error();                        /* line longer than buffer */
ret = pstrdup(buf);

63 bytes is the maximum effective identifier size (NAMEDATALEN - 1), but it is not the maximum size of identifiers in the input. A 1024-byte buffer would very probably be enough for everybody, but I do not want to introduce a new magic limit, especially when the dynamic implementation is not hard.

A table name can be very long - sometimes table names are stored in external systems at full length, and it would not be practical to require truncating them in the filter file.

The dynamic approach is also efficient for this case, because the resized (enlarged) buffer is reused for the following rows, so realloc should be rare. So when I have to choose between two implementations of similar complexity, I prefer the more dynamic code without hardcoded limits. This dynamic behavior has practically no overhead.


In any case, you could have getline return a char *, and (rather than following
GNU) avoid taking char **, size_t * parameters that conflate inputs and outputs.

No, it has a special benefit: it eliminates a short malloc/free cycle per line. When some line is longer, the buffer (and its recorded capacity) is increased, and for following rows of the same or smaller size no realloc is necessary.


I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.
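For illustration, a filter file mixing includes and excludes might look like the following (based on the "+t schema.relname" form quoted above; the "-t" prefix for excludes and the concrete table names are my assumptions, not settled syntax from the patch):

```
+t public.orders
+t public.order_items
-t public.audit_log
```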

Yes, it looks like the better choice.


By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep.  What this patch calls
"--filter" in rsync is called "--filter-from".  rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|").  This option is a
close parallel in pg_dump.

We can talk about the option name - maybe "--filter-from" is better than just "--filter".

Regards

Pavel



--
Justin
