Thread: psql NUL record and field separator
Inspired by this question http://stackoverflow.com/questions/6857265 I have implemented a way to set the psql record and field separators to a zero byte (ASCII NUL character). This can be very useful in shell scripts to have an unambiguous separator. Other GNU tools such as find, grep, sort, and xargs also support this. So with this you could, for example, do

    psql --record-separator-zero -At -c 'select something from somewhere' | xargs -0 dosomething

I have thought about two different ways to implement this. Attempt one was to make the backslash command option parsing zero-byte-proof top to bottom by using PQExpBuffers, so you could then write \R '\000'. But that turned out to be very invasive and complicated. Worst of all, you couldn't use it from the command line, because psql -R '\000' doesn't work (the octal escape syntax is not used on the command line).

So attempt two, which I present here, is to just have separate syntax to set the separators to zero bytes. From the command line it would be --record-separator-zero and --field-separator-zero, and from within psql it would be \pset recordsep_zero and \pset fieldsep_zero.

I don't care much for the verbosity of this, so I'm still thinking about ways to abbreviate it. I think the most common use of this would be to set the record separator from the command line, so we could use a short option such as -0 or -z for that.

Patch attached. Comments welcome.
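To illustrate why a zero byte is unambiguous where a newline is not: a column value can itself contain a newline, but a PostgreSQL text value can never contain a zero byte. The following is a minimal sketch, assuming a hypothetical helper `count_records()` (not part of psql or xargs) that splits a buffer the way `xargs -0` treats its input, with each delimiter ending a record.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: count the records in a
 * buffer, treating `term` as a record terminator.  A trailing terminator
 * does not start a new, empty record; bytes after the last terminator
 * count as one more, unterminated record.
 */
size_t count_records(const char *buf, size_t len, char term)
{
    size_t count = 0;
    int in_record = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == term)
        {
            /* a terminator closes the current record */
            count++;
            in_record = 0;
        }
        else
            in_record = 1;
    }

    /* leftover bytes without a final terminator form one more record */
    if (in_record)
        count++;
    return count;
}
```

For two values, "a<newline>b" and "c", newline-terminated output ("a\nb\nc\n") yields three records, while NUL-terminated output ("a\nb\0c\0") yields the correct two.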
At 2012-01-14 14:23:49 +0200, peter_e@gmx.net wrote:
>
> Inspired by this question http://stackoverflow.com/questions/6857265 I
> have implemented a way to set the psql record and field separators to
> a zero byte (ASCII NUL character).

Since this patch is in the commitfest, I had a look at it. I agree that the feature is useful. The patch applies and builds cleanly with HEAD@9f9135d1, but needs a further minor tweak to work (attached). Without it, both zero separators get overwritten with the default value after option parsing. The code looks good otherwise.

There's one problem:

> psql --record-separator-zero -At -c 'select something from somewhere' | xargs -0 dosomething

If you run find -print0 and it finds one file, it will still print "filename\0", and xargs -0 will work fine.

But psql --record-separator-zero -At -c 'select 1' will print "1\n", not "1\0" or even "1\0\n", so xargs -0 will use the value "1\n", not "1". If you're doing this in a shell script, handling the last argument specially would be painful.

At issue are (at least) these three lines from print_unaligned_text in src/bin/psql/print.c:

    358     /* the last record needs to be concluded with a newline */
    359     if (need_recordsep)
    360         fputc('\n', fout);

Perhaps the right thing to do would be to change this to output \0 if --record-separator-zero was used (but leave it at \n otherwise)? That is what my second attached patch does:

    $ bin/psql --record-separator-zero --field-separator-zero -At -c 'select 1,2 union select 3,4'|xargs -0 echo
    1 2 3 4

Thoughts?

> I think the most common use of this would be to set the record
> separator from the command line, so we could use a short option
> such as -0 or -z for that.

I agree. The current option names are very unwieldy to type.

-- ams
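The proposed end-of-output rule can be sketched as follows. This is a hypothetical, simplified model, not psql's actual print_unaligned_text(): the function `emit_unaligned()` is invented for illustration, it writes into a caller-supplied buffer instead of a `FILE *`, and it uses single-`char` separators so that a zero byte can be passed directly (the real patch presumably needs a separate flag, since a C string cannot hold a NUL separator). It only shows where the final terminator is chosen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes of s into out at pos; return the new position. */
static size_t append(char *out, size_t pos, const char *s, size_t n)
{
    memcpy(out + pos, s, n);
    return pos + n;
}

/*
 * Hypothetical model of unaligned output: cells are row-major C strings,
 * separators are single bytes.  The caller must supply a large enough
 * buffer; size checks are omitted for brevity.  Returns bytes written.
 */
size_t emit_unaligned(const char *const *cells, size_t nrows, size_t ncols,
                      char fieldsep, char recordsep, char *out)
{
    size_t pos = 0;
    bool need_recordsep = false;

    for (size_t r = 0; r < nrows; r++)
    {
        /* records are separated: recordsep goes before every row but the first */
        if (need_recordsep)
            pos = append(out, pos, &recordsep, 1);
        for (size_t c = 0; c < ncols; c++)
        {
            const char *cell = cells[r * ncols + c];

            if (c > 0)
                pos = append(out, pos, &fieldsep, 1);
            pos = append(out, pos, cell, strlen(cell));
        }
        need_recordsep = true;
    }

    /*
     * Conclude the last record.  The original code always wrote '\n';
     * the proposed rule writes '\0' when the record separator is a zero
     * byte, so xargs -0 sees "1" rather than "1\n".
     */
    if (need_recordsep)
    {
        char last = (recordsep == '\0') ? '\0' : '\n';

        pos = append(out, pos, &last, 1);
    }
    return pos;
}
```

With cells {"1", "2", "3", "4"} in two rows, zero separators produce the eight bytes 1, NUL, 2, NUL, 3, NUL, 4, NUL, which matches the `1 2 3 4` output shown above, while '|' and '\n' produce the familiar "1|2\n3|4\n".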
On Thu, 2012-01-26 at 19:00 +0530, Abhijit Menon-Sen wrote:
> At issue are (at least) these three lines from print_unaligned_text in
> src/bin/psql/print.c:
>
>     358     /* the last record needs to be concluded with a newline */
>     359     if (need_recordsep)
>     360         fputc('\n', fout);
>
> Perhaps the right thing to do would be to change this to output \0 if
> --record-separator-zero was used (but leave it at \n otherwise)? That
> is what my second attached patch does:
>
> $ bin/psql --record-separator-zero --field-separator-zero -At -c
> 'select 1,2 union select 3,4'|xargs -0 echo
> 1 2 3 4
>
> Thoughts?
>
> > I think the most common use of this would be to set the record
> > separator from the command line, so we could use a short option
> > such as -0 or -z for that.
>
> I agree. The current option names are very unwieldy to type.

I have incorporated your two patches and added short options. Updated patch attached.

This made me wonder, however. The existing -F and -R options set the record *separator*. The new options, however, set the record *terminator*. This is the small distinction that you had discovered. Should we rename the options and/or add that to the documentation, or is the new behavior obvious and any new terminology would be too confusing?
At 2012-02-07 13:20:43 +0200, peter_e@gmx.net wrote:
>
> Should we rename the options and/or add that to the documentation, or is
> the new behavior obvious and any new terminology would be too confusing?

I agree there is potential for confusion either way. I tried to come up with a complete and not-confusing wording for all four options, but did not manage to improve on the current state of affairs significantly.

I think it can stay the way it is. The reference to xargs -0 is probably enough to set the right expectations about how it works. We can always add a sentence later to clarify the special-case behaviour of -0 if anyone is actually confused (and the best wording will be more clear in that situation too).

-- Abhijit