Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
Date
Msg-id 1628.1572820800@sss.pgh.pa.us
Whole thread Raw
Responses Re: BUG #16059: Tab-completion of filenames in COPY commands removesrequired quotes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: BUG #16059: Tab-completion of filenames in COPY commands removesrequired quotes  (Robert Haas <robertmhaas@gmail.com>)
Re: BUG #16059: Tab-completion of filenames in COPY commands removesrequired quotes  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
[ redirecting to -hackers ]

I wrote:
> Yeah, it seems like a bad idea to override the user's choice to write
> a quote, even if one is not formally necessary.  I propose the attached.

After further experimentation, I concluded that that patch is a bad idea;
it breaks a lot of cases that used to work before.  It turns out that
Readline has a bunch of behaviors for filename completion that occur
outside of the rl_filename_completion_function function proper, and they
all assume that what's passed back from that function is plain unquoted
filename(s).  Notably, completion of a path that includes directory names
just doesn't work well at all anymore with that patch ... nor did it
work well before, if the path contained characters that we thought we
should quote.

The right way to do things, seemingly, is to let
rl_filename_completion_function be invoked without any interference,
and instead put our SQL-aware quoting/dequoting logic into the hooks
Readline provides for that purpose, rl_filename_quoting_function and
rl_filename_dequoting_function.  (It appears that somebody tried to do
that before, way back around the turn of the century, but gave up on it.
Too bad, because it's the right thing.)

Of course, this only works if we have those hooks :-(.  So far as
I can tell, all libreadline variants that might still exist in the wild
do have them; but libedit doesn't, at least not in the version Apple
is shipping.  Hence, what the attached patch does is to make configure
probe for the existence of the hook variables; if they're not there,
fall back to what I proposed previously.  The behavior on libedit is
a bit less nice than one could wish, but it's better than what we have
now.

I've tested this on the oldest and newest readline versions I have at
hand (4.2a and 8.0), as well as the oldest and newest versions of
Apple's libedit derivative; but I haven't tried it on whatever the
BSDen are shipping as libedit.

There's enough moving parts here that this probably needs to go through
a full review cycle, so I'll add it to the next commitfest.  Some notes
for anybody wanting to review:

* The patch now always quotes completed filenames, so quote_if_needed()
is misnamed and overcomplicated for this use-case.  I left the extra
generality in place for possible future use.  On the other hand, this
is the *only* use-case, so you could also argue that we should just
simplify the function's API.  I have no strong opinion about that.

* In addition to the directly-related-to-filenames changes, it turns out
to be necessary to set rl_completer_quote_characters to include at least
single quotes, else Readline doesn't understand that a quoted filename
is quoted.  The patch sets it to include double quotes as well.  This
is probably the aspect of the patch that most needs testing.  The general
effect of this change is that Readline now understands that quoted
strings are single entities, plus it will try to complete the contents
of a string if you ask it.  The side-effects I've noticed seem to be
all beneficial -- for example, if you do

    select * from "foo<TAB>

it now correctly completes table names starting with "foo", which it
did not before.  But there might be some bad effects as well.  Also,
although libedit has this variable, setting it doesn't have that effect
there; I've not really found that the variable does anything at all there.

* The behavior of quote_file_name is directly modeled on what Readline's
default implementation rl_quote_filename does, except that it uses
SQL-aware quoting rules.  The business of passing back the final quote
mark separately is their idea.

* An example of the kind of case that's interesting is that if you type

\lo_import /usr/i<TAB>

then what you get on readline (with this patch) is

\lo_import '/usr/include/

while libedit produces

\lo_import '/usr/include'     (with a space after the trailing quote)

That is, readline knows that the completion-so-far is a directory and
assumes that's not all you want, whereas libedit doesn't know that.
So you typically now have to back up two characters, type slash, and
resume completing.  That's kind of a pain, but I'm not sure we can
make it better very easily.  Anyway, libedit did something close to
that before, too.

* There are also some interesting cases around special characters in
the filename.  It seems to work well for embedded spaces, not so well
for embedded single quotes, though that may well vary across readline
versions.  Again, there seems to be a limited amount we can do about
that, given how much of the relevant logic is buried where we can't
modify it.  And I'm not sure how much I care about that case, anyway.

            regards, tom lane

diff --git a/config/programs.m4 b/config/programs.m4
index 90ff944..e5bf980 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -209,11 +209,12 @@ fi



-# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
-# ---------------------------------------
+# PGAC_READLINE_VARIABLES
+# -----------------------
 # Readline versions < 2.1 don't have rl_completion_append_character
+# Libedit lacks rl_filename_quote_characters and rl_filename_quoting_function

-AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER],
+AC_DEFUN([PGAC_READLINE_VARIABLES],
 [AC_CACHE_CHECK([for rl_completion_append_character], pgac_cv_var_rl_completion_append_character,
 [AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <stdio.h>
 #ifdef HAVE_READLINE_READLINE_H
@@ -228,7 +229,38 @@ AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER],
 if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then
 AC_DEFINE(HAVE_RL_COMPLETION_APPEND_CHARACTER, 1,
           [Define to 1 if you have the global variable 'rl_completion_append_character'.])
-fi])# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
+fi
+AC_CACHE_CHECK([for rl_filename_quote_characters], pgac_cv_var_rl_filename_quote_characters,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <stdio.h>
+#ifdef HAVE_READLINE_READLINE_H
+# include <readline/readline.h>
+#elif defined(HAVE_READLINE_H)
+# include <readline.h>
+#endif
+],
+[rl_filename_quote_characters = "x";])],
+[pgac_cv_var_rl_filename_quote_characters=yes],
+[pgac_cv_var_rl_filename_quote_characters=no])])
+if test x"$pgac_cv_var_rl_filename_quote_characters" = x"yes"; then
+AC_DEFINE(HAVE_RL_FILENAME_QUOTE_CHARACTERS, 1,
+          [Define to 1 if you have the global variable 'rl_filename_quote_characters'.])
+fi
+AC_CACHE_CHECK([for rl_filename_quoting_function], pgac_cv_var_rl_filename_quoting_function,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include <stdio.h>
+#ifdef HAVE_READLINE_READLINE_H
+# include <readline/readline.h>
+#elif defined(HAVE_READLINE_H)
+# include <readline.h>
+#endif
+],
+[rl_filename_quoting_function = 0;])],
+[pgac_cv_var_rl_filename_quoting_function=yes],
+[pgac_cv_var_rl_filename_quoting_function=no])])
+if test x"$pgac_cv_var_rl_filename_quoting_function" = x"yes"; then
+AC_DEFINE(HAVE_RL_FILENAME_QUOTING_FUNCTION, 1,
+          [Define to 1 if you have the global variable 'rl_filename_quoting_function'.])
+fi
+])# PGAC_READLINE_VARIABLES



diff --git a/configure b/configure
index 6b1c779..97556b8 100755
--- a/configure
+++ b/configure
@@ -16409,6 +16409,81 @@ if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then
 $as_echo "#define HAVE_RL_COMPLETION_APPEND_CHARACTER 1" >>confdefs.h

 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for rl_filename_quote_characters" >&5
+$as_echo_n "checking for rl_filename_quote_characters... " >&6; }
+if ${pgac_cv_var_rl_filename_quote_characters+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdio.h>
+#ifdef HAVE_READLINE_READLINE_H
+# include <readline/readline.h>
+#elif defined(HAVE_READLINE_H)
+# include <readline.h>
+#endif
+
+int
+main ()
+{
+rl_filename_quote_characters = "x";
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_var_rl_filename_quote_characters=yes
+else
+  pgac_cv_var_rl_filename_quote_characters=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_var_rl_filename_quote_characters" >&5
+$as_echo "$pgac_cv_var_rl_filename_quote_characters" >&6; }
+if test x"$pgac_cv_var_rl_filename_quote_characters" = x"yes"; then
+
+$as_echo "#define HAVE_RL_FILENAME_QUOTE_CHARACTERS 1" >>confdefs.h
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for rl_filename_quoting_function" >&5
+$as_echo_n "checking for rl_filename_quoting_function... " >&6; }
+if ${pgac_cv_var_rl_filename_quoting_function+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdio.h>
+#ifdef HAVE_READLINE_READLINE_H
+# include <readline/readline.h>
+#elif defined(HAVE_READLINE_H)
+# include <readline.h>
+#endif
+
+int
+main ()
+{
+rl_filename_quoting_function = 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_var_rl_filename_quoting_function=yes
+else
+  pgac_cv_var_rl_filename_quoting_function=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_var_rl_filename_quoting_function" >&5
+$as_echo "$pgac_cv_var_rl_filename_quoting_function" >&6; }
+if test x"$pgac_cv_var_rl_filename_quoting_function" = x"yes"; then
+
+$as_echo "#define HAVE_RL_FILENAME_QUOTING_FUNCTION 1" >>confdefs.h
+
+fi
+
   for ac_func in rl_completion_matches rl_filename_completion_function rl_reset_screen_size
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
diff --git a/configure.in b/configure.in
index 2b9025c..07c00d9 100644
--- a/configure.in
+++ b/configure.in
@@ -1867,7 +1867,7 @@ fi
 LIBS="$LIBS_including_readline"

 if test "$with_readline" = yes; then
-  PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER
+  PGAC_READLINE_VARIABLES
   AC_CHECK_FUNCS([rl_completion_matches rl_filename_completion_function rl_reset_screen_size])
   AC_CHECK_FUNCS([append_history history_truncate_file])
 fi
diff --git a/src/bin/psql/stringutils.c b/src/bin/psql/stringutils.c
index 8c8b4c2..23a6f9f 100644
--- a/src/bin/psql/stringutils.c
+++ b/src/bin/psql/stringutils.c
@@ -282,6 +282,7 @@ strip_quotes(char *source, char quote, char escape, int encoding)
  * entails_quote -    any of these present?  need outer quotes
  * quote -            doubled within string, affixed to both ends
  * escape -            doubled within string
+ * force_quote -    if true, quote the output even if it doesn't "need" it
  * encoding -        the active character-set encoding
  *
  * Do not use this as a substitute for PQescapeStringConn().  Use it for
@@ -289,12 +290,13 @@ strip_quotes(char *source, char quote, char escape, int encoding)
  */
 char *
 quote_if_needed(const char *source, const char *entails_quote,
-                char quote, char escape, int encoding)
+                char quote, char escape, bool force_quote,
+                int encoding)
 {
     const char *src;
     char       *ret;
     char       *dst;
-    bool        need_quotes = false;
+    bool        need_quotes = force_quote;

     Assert(source != NULL);
     Assert(quote != '\0');
diff --git a/src/bin/psql/stringutils.h b/src/bin/psql/stringutils.h
index 16a7af0..fa00141 100644
--- a/src/bin/psql/stringutils.h
+++ b/src/bin/psql/stringutils.h
@@ -22,6 +22,7 @@ extern char *strtokx(const char *s,
 extern void strip_quotes(char *source, char quote, char escape, int encoding);

 extern char *quote_if_needed(const char *source, const char *entails_quote,
-                             char quote, char escape, int encoding);
+                             char quote, char escape, bool force_quote,
+                             int encoding);

 #endif                            /* STRINGUTILS_H */
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 2b1e3cd..a1d64f0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -53,7 +53,7 @@
 #ifdef HAVE_RL_FILENAME_COMPLETION_FUNCTION
 #define filename_completion_function rl_filename_completion_function
 #else
-/* missing in some header files */
+/* decl missing in some header files, but function exists anyway */
 extern char *filename_completion_function();
 #endif

@@ -61,6 +61,15 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif

+/*
+ * Currently we assume that rl_filename_dequoting_function exists if
+ * rl_filename_quoting_function does.  If that proves not to be the case,
+ * we'd need to test for the former, or possibly both, in configure.
+ */
+#ifdef HAVE_RL_FILENAME_QUOTING_FUNCTION
+#define USE_FILENAME_QUOTING_FUNCTIONS 1
+#endif
+
 /* word break characters */
 #define WORD_BREAKS        "\t\n@$><=;|&{() "

@@ -1112,9 +1121,9 @@ static char **get_previous_words(int point, char **buffer, int *nwords);

 static char *get_guctype(const char *varname);

-#ifdef NOT_USED
+#ifdef USE_FILENAME_QUOTING_FUNCTIONS
 static char *quote_file_name(char *text, int match_type, char *quote_pointer);
-static char *dequote_file_name(char *text, char quote_char);
+static char *dequote_file_name(char *text, int quote_char);
 #endif


@@ -1127,8 +1136,32 @@ initialize_readline(void)
     rl_readline_name = (char *) pset.progname;
     rl_attempted_completion_function = psql_completion;

+#ifdef USE_FILENAME_QUOTING_FUNCTIONS
+    rl_filename_quoting_function = quote_file_name;
+    rl_filename_dequoting_function = dequote_file_name;
+#endif
+
     rl_basic_word_break_characters = WORD_BREAKS;

+    rl_completer_quote_characters = "'\"";
+
+    /*
+     * Set rl_filename_quote_characters to "all possible characters",
+     * otherwise Readline will skip filename quoting if it thinks a filename
+     * doesn't need quoting.  Readline actually interprets this as bytes, so
+     * there are no encoding considerations here.
+     */
+#ifdef HAVE_RL_FILENAME_QUOTE_CHARACTERS
+    {
+        unsigned char *fqc = (unsigned char *) pg_malloc(256);
+
+        for (int i = 0; i < 255; i++)
+            fqc[i] = (unsigned char) (i + 1);
+        fqc[255] = '\0';
+        rl_filename_quote_characters = (const char *) fqc;
+    }
+#endif
+
     completion_max_records = 1000;

     /*
@@ -4343,12 +4380,31 @@ complete_from_variables(const char *text, const char *prefix, const char *suffix

 /*
  * This function wraps rl_filename_completion_function() to strip quotes from
- * the input before searching for matches and to quote any matches for which
- * the consuming command will require it.
+ * the input before searching for matches and then re-quote the matches.
+ * (We always quote filenames, even though for some psql meta-commands and
+ * some filenames it wouldn't be strictly necessary.)
+ *
+ * Caller must set completion_charp to a zero- or one-character string
+ * containing the escape character.  This is necessary since \copy has no
+ * escape character, but every other backslash command recognizes "\" as an
+ * escape character.  Since we have only two callers, don't bother providing
+ * a macro to simplify this.
  */
 static char *
 complete_from_files(const char *text, int state)
 {
+#ifdef USE_FILENAME_QUOTING_FUNCTIONS
+    /*
+     * If we're using a version of Readline that exports filename quoting
+     * hooks, use those, and invoke rl_filename_completion_function without
+     * any funny business.  Readline does stuff internally that does not work
+     * well at all if we try to handle dequoting here.
+     */
+    return filename_completion_function(text, state);
+#else
+    /*
+     * Otherwise, we have to do the best we can.
+     */
     static const char *unquoted_text;
     char       *unquoted_match;
     char       *ret = NULL;
@@ -4369,15 +4425,11 @@ complete_from_files(const char *text, int state)
     unquoted_match = filename_completion_function(unquoted_text, state);
     if (unquoted_match)
     {
-        /*
-         * Caller sets completion_charp to a zero- or one-character string
-         * containing the escape character.  This is necessary since \copy has
-         * no escape character, but every other backslash command recognizes
-         * "\" as an escape character.  Since we have only two callers, don't
-         * bother providing a macro to simplify this.
-         */
+        /* Re-quote the result. */
         ret = quote_if_needed(unquoted_match, " \t\r\n\"`",
-                              '\'', *completion_charp, pset.encoding);
+                              '\'', *completion_charp,
+                              true,
+                              pset.encoding);
         if (ret)
             free(unquoted_match);
         else
@@ -4385,6 +4437,7 @@ complete_from_files(const char *text, int state)
     }

     return ret;
+#endif                            /* USE_FILENAME_QUOTING_FUNCTIONS */
 }


@@ -4636,46 +4689,60 @@ get_guctype(const char *varname)
     return guctype;
 }

-#ifdef NOT_USED
+#ifdef USE_FILENAME_QUOTING_FUNCTIONS

 /*
- * Surround a string with single quotes. This works for both SQL and
- * psql internal. Currently disabled because it is reported not to
- * cooperate with certain versions of readline.
+ * Quote a filename according to SQL rules, returning a malloc'd string.
+ * completion_charp must point to escape character or '\0', as per
+ * comments for complete_from_files().
  */
 static char *
 quote_file_name(char *text, int match_type, char *quote_pointer)
 {
     char       *s;
-    size_t        length;

-    (void) quote_pointer;        /* not used */
+    /* We always quote (so quote_if_needed's API is overkill) */
+    s = quote_if_needed(text, " \t\r\n\"`",
+                        '\'', *completion_charp,
+                        true,
+                        pset.encoding);
+
+    /*
+     * If quote_pointer is passed, it's better to handle the trailing quote
+     * that way.
+     */
+    if (quote_pointer && s && *s)
+    {
+        char       *send = s + strlen(s) - 1;

-    length = strlen(text) +(match_type == SINGLE_MATCH ? 3 : 2);
-    s = pg_malloc(length);
-    s[0] = '\'';
-    strcpy(s + 1, text);
-    if (match_type == SINGLE_MATCH)
-        s[length - 2] = '\'';
-    s[length - 1] = '\0';
+        Assert(*send == '\'');
+        *send = '\0';
+        *quote_pointer = '\'';
+    }
     return s;
 }

+/*
+ * Dequote a filename, if it's quoted.
+ * completion_charp must point to escape character or '\0', as per
+ * comments for complete_from_files().
+ */
 static char *
-dequote_file_name(char *text, char quote_char)
+dequote_file_name(char *text, int quote_char)
 {
-    char       *s;
-    size_t        length;
-
-    if (!quote_char)
-        return pg_strdup(text);
+    char       *unquoted_text = strtokx(text, "", NULL, "'", *completion_charp,
+                                        false, true, pset.encoding);

-    length = strlen(text);
-    s = pg_malloc(length - 2 + 1);
-    strlcpy(s, text +1, length - 2 + 1);
-
-    return s;
+    /* expect a NULL return for the empty string only */
+    if (!unquoted_text)
+    {
+        Assert(*text == '\0');
+        unquoted_text = text;
+    }
+    /* readline expects a malloc'd result that it is to free */
+    return pg_strdup(unquoted_text);
 }
-#endif                            /* NOT_USED */
+
+#endif                            /* USE_FILENAME_QUOTING_FUNCTIONS */

 #endif                            /* USE_READLINE */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 939245d..e3baae8 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -492,6 +492,14 @@
 /* Define to 1 if you have the `rl_filename_completion_function' function. */
 #undef HAVE_RL_FILENAME_COMPLETION_FUNCTION

+/* Define to 1 if you have the global variable 'rl_filename_quote_characters'.
+   */
+#undef HAVE_RL_FILENAME_QUOTE_CHARACTERS
+
+/* Define to 1 if you have the global variable 'rl_filename_quoting_function'.
+   */
+#undef HAVE_RL_FILENAME_QUOTING_FUNCTION
+
 /* Define to 1 if you have the `rl_reset_screen_size' function. */
 #undef HAVE_RL_RESET_SCREEN_SIZE

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 36e6bdc..1ca90b1 100644
--- a/src/include/pg_config.h.win32
+++ b/src/include/pg_config.h.win32
@@ -355,6 +355,14 @@
 /* Define to 1 if you have the `rl_filename_completion_function' function. */
 /* #undef HAVE_RL_FILENAME_COMPLETION_FUNCTION */

+/* Define to 1 if you have the global variable 'rl_filename_quote_characters'.
+   */
+/* #undef HAVE_RL_FILENAME_QUOTE_CHARACTERS */
+
+/* Define to 1 if you have the global variable 'rl_filename_quoting_function'.
+   */
+/* #undef HAVE_RL_FILENAME_QUOTING_FUNCTION */
+
 /* Define to 1 if you have the <security/pam_appl.h> header file. */
 /* #undef HAVE_SECURITY_PAM_APPL_H */


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Include triggers in EXPLAIN
Next
From: Thomas Munro
Date:
Subject: Re: Should we add xid_current() or a int8->xid cast?