Re: proposal: possibility to read dumped table's name from file - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: proposal: possibility to read dumped table's name from file
Date	July 5, 2020 23:08:09
Msg-id	CAFj8pRCsZuKRRdqZoYYo_wW-YjpWGA_ie9nhwJRd9E+GmsShrQ@mail.gmail.com
In response to	Re: proposal: possibility to read dumped table's name from file (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: proposal: possibility to read dumped table's name from file
List	pgsql-hackers

On Wed, Jul 1, 2020 at 23:24, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > + /* ignore empty rows */
> > > + if (*line != '\0')
> >
> > Maybe: if line=='\0': continue
> > We should also support comments.

Comment support is still missing but easily added :)
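
For illustration only, skipping empty rows and comments could look roughly like the sketch below. It is not code from the patch: the helper name filter_line_is_ignorable is hypothetical, and '#' as the comment marker is an assumption, since the thread does not settle on a syntax at this point.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when a filter-file line carries no
 * pattern, i.e. it is empty, whitespace-only, or a comment.
 * Assumption: comments start with '#' after optional leading whitespace.
 */
bool
filter_line_is_ignorable(const char *line)
{
    assert(line != NULL);

    /* skip leading whitespace, including a trailing newline if present */
    while (*line != '\0' && isspace((unsigned char) *line))
        line++;

    return *line == '\0' || *line == '#';
}
```

A reader loop would call this on each line and simply continue when it returns true.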

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);
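
Spelled out in plain C, the fixed-buffer idea above might look like the following sketch. It is an illustration, not code from the patch: read_line_fixed and FILTER_LINE_MAX are hypothetical names, and malloc is used in place of pstrdup so the example does not depend on PostgreSQL headers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FILTER_LINE_MAX 1024    /* hypothetical fixed limit */

/*
 * Hypothetical sketch: read one line into a fixed stack buffer and return
 * a malloc'd copy without the trailing newline. Returns NULL on EOF, on
 * allocation failure, or when the line does not fit; in the last case
 * *too_long is set to 1.
 */
char *
read_line_fixed(FILE *fp, int *too_long)
{
    char    buf[FILTER_LINE_MAX];
    char   *nl;
    char   *copy;
    size_t  len;

    assert(fp != NULL && too_long != NULL);
    *too_long = 0;

    if (fgets(buf, sizeof(buf), fp) == NULL)
        return NULL;            /* EOF or read error */

    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';
    else if (getc(fp) != EOF)
    {
        /* no newline and more input follows: the line exceeded the buffer */
        *too_long = 1;
        return NULL;
    }
    /* otherwise this is a final line without a newline: accept it */

    len = strlen(buf);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, buf, len + 1);
    return copy;
}
```

The caller owns the returned string and frees it after use, so every line costs one allocation and one free.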

63 bytes is the maximum effective identifier size, but it is not the maximum size of identifiers. A 1024-byte buffer would very probably be enough for everything, but I do not want to introduce a new magic limit, especially when a dynamic implementation is not too hard.

Table names can be very long. Sometimes the names are stored at full length in external storage, and it would not be practical to require truncating them in the filter file.

In this case it is very efficient, because the enlarged buffer is reused for the following rows, so realloc should be rare. When I have to choose between two implementations of similar complexity, I prefer the more dynamic code without hardcoded limits. The dynamic approach has no overhead.

In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

No, it has a specific benefit: it eliminates the short malloc/free cycle. When a line is longer, the buffer (and its recorded size) is increased, and for later rows of the same or smaller size no realloc is necessary.
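
The buffer reuse can be sketched as follows. This is an illustration under stated assumptions, not the getline from the patch: read_line_dynamic is a hypothetical name, the initial capacity of 64 and the doubling policy are arbitrary choices, and the caller is assumed to start with *buf = NULL and *cap = 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: read one line into a caller-owned buffer that is
 * grown on demand and kept between calls. *buf and *cap are both input
 * and output. Returns the line length without the newline, or -1 on EOF
 * or allocation failure.
 */
long
read_line_dynamic(FILE *fp, char **buf, size_t *cap)
{
    size_t  len = 0;
    int     c;

    assert(fp != NULL && buf != NULL && cap != NULL);

    if (*cap == 0)
    {
        /* first call: allocate a small initial buffer */
        *buf = malloc(64);
        if (*buf == NULL)
            return -1;
        *cap = 64;
    }

    while ((c = getc(fp)) != EOF && c != '\n')
    {
        if (len + 1 >= *cap)
        {
            /* grow only when this line is longer than any seen so far */
            size_t  newcap = *cap * 2;
            char   *tmp = realloc(*buf, newcap);

            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap = newcap;
        }
        (*buf)[len++] = (char) c;
    }

    if (c == EOF && len == 0)
        return -1;              /* nothing left to read */

    (*buf)[len] = '\0';
    return (long) len;
}
```

The caller frees the buffer once, after the last line. A long line enlarges the buffer, and every later line of the same or smaller length is read without touching the allocator.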

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

Yes, it looks like the better choice.

By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

We can discuss the option name. Maybe "--filter-from" is better than just "--filter".

Regards

Pavel

--
Justin
