Thread: LEFT JOIN in pg_dumpall is a bug

LEFT JOIN in pg_dumpall is a bug

From
Peter Eisentraut
Date:
This snippet in pg_dumpall

$PSQL -d template1 -At -F ' ' \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding),
datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba
= usesysid) WHERE datallowconn;" | \
while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do

(line breaks messed up)

won't actually work if there indeed happens to be a database without a
valid owner, because the 'read' command will take ENCODING as the dba
name.

I guess the real question is, what should be done in this case?  I think
it might be better to error out and let the user fix his database before
backing it up.

(At a glance, I think pg_dump also has some problems with these sort of
constellations.)

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: LEFT JOIN in pg_dumpall is a bug

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> This snippet in pg_dumpall
> $PSQL -d template1 -At -F ' ' \
>   -c "SELECT datname, usename, pg_encoding_to_char(d.encoding),
> datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba
> = usesysid) WHERE datallowconn;" | \
> while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do

> won't actually work if there indeed happens to be a database without a
> valid owner, because the 'read' command will take ENCODING as the dba
> name.

Oops, you're right, the read won't keep the columns straight.  Come to
think of it, it would do the wrong thing for empty-string datname or
usename, too, and it's only because datpath is the last column that
we haven't noticed it doing the wrong thing on empty datpath.

Is there a more robust way of reading the data into the script?

> I guess the real question is, what should be done in this case?  I think
> it might be better to error out and let the user fix his database before
> backing it up.

Possibly.  The prior state of the code (before I put in the LEFT JOIN)
would silently ignore any database with no matching user, which was
definitely NOT a good idea.

I think I'd rather see a warning, though, and let the script try to dump
the DB anyway.

> (At a glance, I think pg_dump also has some problems with these sort of
> constellations.)

Yes, there are a number of places where pg_dump should be doing outer
joins and isn't.  I think Tatsuo is at work on that.
        regards, tom lane


Re: LEFT JOIN in pg_dumpall is a bug

From
Peter Eisentraut
Date:
Tom Lane writes:

> > $PSQL -d template1 -At -F ' ' \
> >   -c "SELECT datname, usename, pg_encoding_to_char(d.encoding),
> > datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba
> > = usesysid) WHERE datallowconn;" | \
> > while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do

> Oops, you're right, the read won't keep the columns straight.  Come to
> think of it, it would do the wrong thing for empty-string datname or
> usename, too,

It won't actually work to restore such a setup, because zero-length
identifiers are no longer allowed.

> Is there a more robust way of reading the data into the script?

Provided that 'cut' is portable, then this works for me:

TAB='   ' # tab here

$PSQL -d template1 -At -F "$TAB" \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding),
datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba
= usesysid) WHERE datallowconn;" | \
while read THINGS; do   DATABASE=`echo "$THINGS" | cut -f 1`   DBOWNER=`echo "$THINGS" | cut -f 2`   ENCODING=`echo
"$THINGS"| cut -f 3`   ISTEMPLATE=`echo "$THINGS" | cut -f 4`   DBPATH=`echo "$THINGS" | cut -f 5`
 

If 'cut' is not portable, then I don't believe you can do it with
IFS-based word splitting, because two adjacent separator characters don't
seem to indicate an empty field but are instead taken as one separator.

> I think I'd rather see a warning, though, and let the script try to dump
> the DB anyway.

Maybe for databases without an owner, but not for empty database or user
names.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: LEFT JOIN in pg_dumpall is a bug

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> Is there a more robust way of reading the data into the script?

> Provided that 'cut' is portable, then this works for me:

My old copy of Horton's _Portable C Software_ says that cut(1) is a
SysV-ism adopted by POSIX.  At that time (1990) it wasn't portable,
and he recommended using awk or sed instead.

If you think depending on POSIX utilities is OK, then use cut.
I'd recommend sed, though.  The GNU coding standards for Makefiles
suggest not depending on programs outside this set:
    cat cmp cp diff echo egrep expr false grep install-info    ln ls mkdir mv pwd rm rmdir sed sleep sort tar test
touchtrue
 
        regards, tom lane


Re: LEFT JOIN in pg_dumpall is a bug

From
Peter Eisentraut
Date:
Tom Lane writes:

> If you think depending on POSIX utilities is OK, then use cut.
> I'd recommend sed, though.

This has gotten pretty silly:

TAB='   ' # tab here

$PSQL -d template1 -At -F "$TAB" \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding),
datistemplate, datpath, 'x' FROM pg_database d LEFT JOIN pg_shadow u ON
(datdba = usesysid) WHERE datallowconn;" | \
while read RECORDS; do   DATABASE=`echo "x$RECORDS" | sed "s/^x\([^$TAB]*\).*/\1/"`   DBOWNER=`echo "x$RECORDS" | sed
"s/^x[^$TAB]*$TAB\([^$TAB]*\).*/\1/"`  ENCODING=`echo "x$RECORDS" | sed
"s/^x[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"`  ISTEMPLATE=`echo "x$RECORDS" | sed
"s/^x[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"`  DBPATH=`echo "x$RECORDS" | sed
"s/^x[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"`

I'm not sure whether this is actually an overall improvement.  I'm tempted
to just coalesce(usename, {some default user}) instead.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: LEFT JOIN in pg_dumpall is a bug

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I'm not sure whether this is actually an overall improvement.  I'm tempted
> to just coalesce(usename, {some default user}) instead.

I thought about that to begin with, but figured you wouldn't like it ;-)
        regards, tom lane