Re: pg_dump refactor patch to remove global variables - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: pg_dump refactor patch to remove global variables
Date
Msg-id 20140911212354.GH4701@eldon.alvh.no-ip.org
Whole thread Raw
In response to Re: pg_dump refactor patch to remove global variables  (Joachim Wieland <joe@mcknight.de>)
Responses Re: pg_dump refactor patch to remove global variables
List pgsql-hackers
Joachim Wieland wrote:
> On Sat, Aug 30, 2014 at 11:12 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

> > - Why is quote_all_identifiers left behind as a global variable?
> 
> As I said, it's really used everywhere. I'm not opposed to passing it
> around (which would be straightforward) but it would really blow up
> the patch size and it would be a rather mechanical patch. I'd rather
> do that as a separate patch, otherwise all other changes would get
> lost in the noise.

I don't understand this bit.  You have struct _dumpOptions containing a
quote_all_identifiers, which seems to be unused.  There's also a static
int quote_all_identifiers in pg_dumpall.c, which I assume hides the
non-static one in dumputils.  And we also have an "extern" in pg_dump.c,
which seems misplaced.  There was an extern in dumputils.h but your
patch removes it.

Oh, after reading some code, I realize that those two variables are
actually separate and do different things: pg_dumpall has one that can
be set by passing --quote-all-identifiers; if set, it activates passing
--quote-all-identifiers to pg_dump.  pg_dump.c misleadingly has an
extern which is enabled by its --quote-all-identifiers switch, and the
effect is to execute "SET quote_all_identifiers=true" to the server.
And finally we have a quote_all_identifiers in dumputils and has a
quoting effect on the fmtId() function.

So the way this works is that pg_dumpall doesn't use the one in
dumputils; and pg_dump.c sets the one in dumputils.c and affects both
the SET command and fmtId().  In other words, unless I miss something,
pg_dumpall would never change the fmtId() behavior, which would be
pretty broken IMHO.

I find this rather odd.  Can we find a more sensible arrangement?  At
least let's move the extern to dumputils.h, remove it from pg_dump.c.
Not totally sure about the static in pg_dumpall.c -- note that it does
call fmtId(), so we might be doing the wrong thing there; maybe we need
to remove the static from there as well, and let --quote-all-identifiers
set the one in dumputils.

Is there a bug here that should be backpatched?  I checked 9.3 and there
is no static in pg_dumpall so it should behave sensibly AFAICT.

> > - Shouldn't pg_dumpall also use DumpOptions?
> 
> It could, it would mostly be a cosmetic change, since pg_dumpall is
> only a wrapper that invokes the pg_dump command. pg_dumpall's argument
> handling is isolated from the rest, so it could use DumpOptions or
> not...

Probably no point, yeah.  However if dumputils.c starts using
DumpOptions, it might need to go in that direction as well.  Not sure.

> Previously dumpencoding and use_role were passed as additional
> arguments to setup_connection(), being NULL when there was no change
> to the encoding (which I found quite ugly). I would say we should
> treat all variables of that group the same, so either all of them
> become local variables in main() and are passed as parameters to
> setup_connection() or we just pass the DumpOptions struct and put them
> in there (as is done in my patch now)... Or we create a new struct
> ConnectionOpts or so...

I think I like the idea of ConnectionOpts as separate from DumpOptions,
TBH.

> > - In dumpOptionsFromRestoreOptions() you are building the return value
> > in a local variable that is not zeroed.  You should probably use
> > NewDumpOptions() there.
> 
> The idea was to avoid malloc()ing and free()ing and to instead just
> create those dumpOptions on the stack, because they're only used for a
> short time and a small chunk of data, but I assume it's more
> consistent to do it as you propose with NewDumpOptions(). So this is
> fixed in the new patch.

I think putting stuff in the stack is fine, as long as you don't depend
on it being initialized to all zeroes, because that's not guaranteed.
You could memset() the struct in the stack also, but really I think it's
simpler to just allocate it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Arthur Silva
Date:
Subject: Re: jsonb format is pessimal for toast compression
Next
From: Alvaro Herrera
Date:
Subject: Re: SKIP LOCKED DATA (work in progress)