Thread: LEFT JOIN in pg_dumpall is a bug
This snippet in pg_dumpall $PSQL -d template1 -At -F ' ' \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding), datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) WHERE datallowconn;" | \ while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do (line breaks messed up) won't actually work if there indeed happens to be a database without a valid owner, because the 'read' command will take ENCODING as the dba name. I guess the real question is, what should be done in this case? I think it might be better to error out and let the user fix his database before backing it up. (At a glance, I think pg_dump also has some problems with these sort of constellations.) -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > This snippet in pg_dumpall > $PSQL -d template1 -At -F ' ' \ > -c "SELECT datname, usename, pg_encoding_to_char(d.encoding), > datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba > = usesysid) WHERE datallowconn;" | \ > while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do > won't actually work if there indeed happens to be a database without a > valid owner, because the 'read' command will take ENCODING as the dba > name. Oops, you're right, the read won't keep the columns straight. Come to think of it, it would do the wrong thing for empty-string datname or usename, too, and it's only because datpath is the last column that we haven't noticed it doing the wrong thing on empty datpath. Is there a more robust way of reading the data into the script? > I guess the real question is, what should be done in this case? I think > it might be better to error out and let the user fix his database before > backing it up. Possibly. The prior state of the code (before I put in the LEFT JOIN) would silently ignore any database with no matching user, which was definitely NOT a good idea. I think I'd rather see a warning, though, and let the script try to dump the DB anyway. > (At a glance, I think pg_dump also has some problems with these sort of > constellations.) Yes, there are a number of places where pg_dump should be doing outer joins and isn't. I think Tatsuo is at work on that. regards, tom lane
Tom Lane writes: > > $PSQL -d template1 -At -F ' ' \ > > -c "SELECT datname, usename, pg_encoding_to_char(d.encoding), > > datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba > > = usesysid) WHERE datallowconn;" | \ > > while read DATABASE DBOWNER ENCODING ISTEMPLATE DBPATH; do > Oops, you're right, the read won't keep the columns straight. Come to > think of it, it would do the wrong thing for empty-string datname or > usename, too, It won't actually work to restore such a setup, because zero-length identifiers are no longer allowed. > Is there a more robust way of reading the data into the script? Provided that 'cut' is portable, then this works for me: TAB=' ' # tab here $PSQL -d template1 -At -F "$TAB" \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding), datistemplate, datpath FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) WHERE datallowconn;" | \ while read THINGS; do DATABASE=`echo "$THINGS" | cut -f 1` DBOWNER=`echo "$THINGS" | cut -f 2` ENCODING=`echo "$THINGS"| cut -f 3` ISTEMPLATE=`echo "$THINGS" | cut -f 4` DBPATH=`echo "$THINGS" | cut -f 5` If 'cut' is not portable, then I don't believe you can do it with IFS-based word splitting, because two adjacent separator characters don't seem to indicate an empty field but are instead taken as one separator. > I think I'd rather see a warning, though, and let the script try to dump > the DB anyway. Maybe for databases without an owner, but not for empty database or user names. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: >> Is there a more robust way of reading the data into the script? > Provided that 'cut' is portable, then this works for me: My old copy of Horton's _Portable C Software_ says that cut(1) is a SysV-ism adopted by POSIX. At that time (1990) it wasn't portable, and he recommended using awk or sed instead. If you think depending on POSIX utilities is OK, then use cut. I'd recommend sed, though. The GNU coding standards for Makefiles suggest not depending on programs outside this set: cat cmp cp diff echo egrep expr false grep install-info ln ls mkdir mv pwd rm rmdir sed sleep sort tar test touchtrue regards, tom lane
Tom Lane writes: > If you think depending on POSIX utilities is OK, then use cut. > I'd recommend sed, though. This has gotten pretty silly: TAB=' ' # tab here $PSQL -d template1 -At -F "$TAB" \ -c "SELECT datname, usename, pg_encoding_to_char(d.encoding), datistemplate, datpath, 'x' FROM pg_database d LEFT JOIN pg_shadow u ON (datdba = usesysid) WHERE datallowconn;" | \ while read RECORDS; do DATABASE=`echo "x$RECORDS" | sed "s/^x\([^$TAB]*\).*/\1/"` DBOWNER=`echo "x$RECORDS" | sed "s/^x[^$TAB]*$TAB\([^$TAB]*\).*/\1/"` ENCODING=`echo "x$RECORDS" | sed "s/^x[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"` ISTEMPLATE=`echo "x$RECORDS" | sed "s/^x[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"` DBPATH=`echo "x$RECORDS" | sed "s/^x[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB[^$TAB]*$TAB\([^$TAB]*\).*/\1/"` I'm not sure whether this is actually an overall improvement. I'm tempted to just coalesce(usename, {some default user}) instead. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > I'm not sure whether this is actually an overall improvement. I'm tempted > to just coalesce(usename, {some default user}) instead. I thought about that to begin with, but figured you wouldn't like it ;-) regards, tom lane