Thread: csv format for psql
Hi, This patch implements csv as an output format in psql (\pset format csv). It's quite similar to the unaligned format, except that it applies CSV quoting rules (obviously!) and that it prints no footer and no title. As with unaligned, a header with column names is output unless tuples_only is on. It also supports the fieldsep/fielsep_zero and recordsep/recordsep_zero settings. Most of times, the need for CSV is covered by \copy or COPY with the CSV option, but there are some cases where it would be more practical to have it as an output format in psql. * \copy does not interpolate psql variables and is a single-line command, so making a query fit these contraints can be cumbersome. It can be got around by defining a temporary view and \copy from that view, but that doesn't work in a read-only context such as when connected to a standby. * the server-side COPY TO STDOUT can also be used from psql, typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv, but that's too simple to extract multiple result sets per script. COPY is also more rigid than psql in the options to delimit fields and records. * copy with csv can't help for the output of meta-commands such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql does work with these. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Hello Daniel, > This patch implements csv as an output format in psql > (\pset format csv). Would you consider registering it in the next CF? -- Fabien.
2018-01-30 9:31 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Hi,
This patch implements csv as an output format in psql
(\pset format csv). It's quite similar to the unaligned format,
except that it applies CSV quoting rules (obviously!) and that
it prints no footer and no title.
As with unaligned, a header with column names is output unless
tuples_only is on. It also supports the fieldsep/fielsep_zero
and recordsep/recordsep_zero settings.
Most of times, the need for CSV is covered by \copy or COPY with
the CSV option, but there are some cases where it would be more
practical to have it as an output format in psql.
I absolutely agree
* \copy does not interpolate psql variables and is a single-line
command, so making a query fit these contraints can be cumbersome.
It can be got around by defining a temporary view and
\copy from that view, but that doesn't work in a read-only context
such as when connected to a standby.
* the server-side COPY TO STDOUT can also be used from psql,
typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv,
but that's too simple to extract multiple result sets per script.
COPY is also more rigid than psql in the options to delimit
fields and records.
* copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these.
It is great - long time I miss this feature - It is interesting for scripting, ETL, ..
This format is too important, so some special short or long option can be practical (it will be printed in help)
some like --csv
I found one issue - PostgreSQL default field separator is "|". Maybe good time to use more common "," ?
Or when field separator was not explicitly defined, then use "," for CSV, and "|" for other. Although it can be little bit messy
Thank you
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Fabien COELHO wrote: > > This patch implements csv as an output format in psql > > (\pset format csv). > > Would you consider registering it in the next CF? Done here: https://commitfest.postgresql.org/17/1500/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Pavel Stehule wrote: > This format is too important, so some special short or long option can be > practical (it will be printed in help) > > some like --csv I guess -C/--csv could be used, like there already is -H/--html. > I found one issue - PostgreSQL default field separator is "|". Maybe good > time to use more common "," ? > > Or when field separator was not explicitly defined, then use "," for CSV, > and "|" for other. Although it can be little bit messy Currently there's a strong analogy between the unaligned mode and csv mode. In particular they use the existing pset variables fieldsep, fieldsep_zero, recordsep, recordsep_zero in the same way. If we want to make csv special with regard to the delimiters, that complicates the user interface For instance if fieldsep was changed automatically by \pset format csv, it's not obvious if/when we should switch it back to its previous state, and how the fieldsep switch done automatically would mix or conflict with other \pset commands issued by the user. Or we need to duplicate these variables. Or duplicate only fieldsep, having a fieldsep_csv, leaving out the other variables and not being as close to the unaligned mode. These options don't appeal to me much compared to the simplicity of the current patch. Also, although the comma is the separator defined by the RFC4180 and the default for COPY CSV, people also use the semicolon extensively (because it's what Excel does I guess), which somehow mitigates the importance of comma as the default value. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-01-31 13:58 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv
I guess -C/--csv could be used, like there already is -H/--html.
> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy
Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters, that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.
Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.
The functionality is clean and great - default setting "|" is less. I have not strong opinion about it and I understand to your arguments. Has anybody some idea?
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
2018-01-31 14:09 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-01-31 13:58 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:Pavel Stehule wrote:
> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv
I guess -C/--csv could be used, like there already is -H/--html.
> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy
Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters, that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.
Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.The functionality is clean and great - default setting "|" is less. I have not strong opinion about it and I understand to your arguments. Has anybody some idea?
two other ideas from me
1. currently \pset format xxx command has not any other parameters. Other optional parameters can be set of possible attributes like {, ; | \t header_off escape_all}. If this set will be known by tabcomplete, then it can be very friendly and I don't think so code will be more complex than in your proposal (long options can be "--csv --csv-options=", header_off" )
some like "\pset format csv , header_off
or just limited variant with only field separator specification
\pset format csv ;
2. if we support csv format, then we can introduce \gcsv xxx | gnuplot .... It is just idea.
@1 is much more interesting than @2
Regards
Pavel
RegardsPavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
2018-01-31 13:58 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv
I guess -C/--csv could be used, like there already is -H/--html.
I prefer just long option only. Maybe in future we can use short "C" for something else better. There is only few free short commands.
Regards
Pavel
> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy
Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters, that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.
Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Pavel Stehule wrote: > > I guess -C/--csv could be used, like there already is -H/--html. > > > > I prefer just long option only. Maybe in future we can use short "C" for > something else better. There is only few free short commands. Looking at parse_psql_options(), currently psql has 35 long options and all of them have a corresponding single-character invocation. Unless there is a consensus for it, I don't feel like it's time to create the precedent that some options deserve short forms and others don't. Sure at some point we'll run out of letters, but looking back ten years ago, on Jan 2008 psql had 31 options, so the rate of adding new ones does not look worrying. Besides I like the fact that -C can be seen as a drop-in replacement for -A, because in most cases, it's just a safer version of -A, as it deals with the situation that the separator might be in the contents, which is the main weakness of -A. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Pavel Stehule wrote: > some like "\pset format csv , header_off If we did that, we'd need to reconsider the interactions of this with \t on|off and \pset footer on|off and how to keep things consistent with the unaligned format. I feel like it's not worth the trouble. We can still add this later if users are confused with the interface, but that interface is already well established with the unaligned format. > 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot .... I thought committers rejected this because they didn't want a new variant of \g The csv format makes it possible to not add any new \g-style metacommand and inject csv into the existing ones. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-02-12 15:33 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> some like "\pset format csv , header_off
If we did that, we'd need to reconsider the interactions
of this with \t on|off and \pset footer on|off
and how to keep things consistent with the unaligned
format.
ok
I feel like it's not worth the trouble.
We can still add this later if users are confused with
the interface, but that interface is already well
established with the unaligned format.
> 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot ....
I thought committers rejected this because they didn't want a
new variant of \g
The csv format makes it possible to not add any new \g-style
metacommand and inject csv into the existing ones.
ok
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hello Daniel, > This patch implements csv as an output format in psql > (\pset format csv). It's quite similar to the unaligned format, > except that it applies CSV quoting rules (obviously!) and that > it prints no footer and no title. > As with unaligned, a header with column names is output unless > tuples_only is on. It also supports the fieldsep/fielsep_zero > and recordsep/recordsep_zero settings. Patch applies cleanly and compiles. "make check" ok, although there is no specific test for this feature... The documentation should mention the other CSV options (COPY, \copy, ...) and explain how they compare to this one. Maybe a specific paragraph about how to do CSV? I understand "\pset format csv" as triggering that all outputs compared to per command options. Given that this is somehow already available, I'm wondering why there is no code sharing. I find it annoying that setting csv keeps the default '|' separator, where ISTM that it should be by default "," (as in COMMA separated value:-). However it would not be a good idea to change another variables when setting one, obviously. Maybe some \csv command could set the format to csv, fieldsep to ",", tuples_only to on, recordsep to '\n'? Not sure whether it would be acceptable, though, and how to turn it off once turned on... Probably an average (aka not good) idea:-) The format adds column headers, however they are not escaped properly: psql> \pset format csv psql> \pset fieldsep , psql> SELECT 1 AS "hello, world", 2 AS """"; hello, world," 1,2 Also it does not seem to work properly in expanded mode, both for the column and values: psql> \x psql> SELECT 1 AS "bla""", E'\n,"' AS foo; bla",1 foo, ," There MUST be some tests, especially with ugly stuff (escapes, newlines, double quotes, various types, expanded or not, field seps, strange column names...). > Most of times, the need for CSV is covered by \copy or COPY with > the CSV option, but there are some cases where it would be more > practical to have it as an output format in psql. > > * \copy does not interpolate psql variables and is a single-line > command, so making a query fit these contraints can be cumbersome. > It can be got around by defining a temporary view and > \copy from that view, but that doesn't work in a read-only context > such as when connected to a standby. > > * the server-side COPY TO STDOUT can also be used from psql, > typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv, > but that's too simple to extract multiple result sets per script. > COPY is also more rigid than psql in the options to delimit > fields and records. > > * copy with csv can't help for the output of meta-commands > such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql > does work with these. -- Fabien.
Fabien COELHO wrote: > Maybe some \csv command could set the format to csv, fieldsep to ",", > tuples_only to on, recordsep to '\n'? Not sure whether it would be > acceptable, though, and how to turn it off once turned on... Probably an > average (aka not good) idea:-) Thanks for reviewing this patch! Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C as discussed upthread. I'll add some regression tests shortly. About the default separator, the approach taken by this patch is that the csv output is equivalent to unaligned with csv quoting, so it has the same behavior and follows the options of the unaligned mode as much as possible. That seemed like a good idea to me when I wrote it, but maybe a better idea would to have a new \pset fieldsep_csv parameter with its own command-line option, and ignore {fieldsep, fieldsep_zero, recordsep, recordsep_zero} for this format, just like they are being ignored for HTML, or latex, or asciidoc... Does anyone who think that csv should be added care about this "alignment" with unaligned? :) Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
2018-03-01 17:10 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Fabien COELHO wrote:
> Maybe some \csv command could set the format to csv, fieldsep to ",",
> tuples_only to on, recordsep to '\n'? Not sure whether it would be
> acceptable, though, and how to turn it off once turned on... Probably an
> average (aka not good) idea:-)
Thanks for reviewing this patch!
Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
as discussed upthread. I'll add some regression tests shortly.
About the default separator, the approach taken by this patch is that
the csv output is equivalent to unaligned with csv quoting, so it has
the same behavior and follows the options of the unaligned mode
as much as possible.
That seemed like a good idea to me when I wrote it, but maybe
a better idea would to have a new \pset fieldsep_csv
parameter with its own command-line option, and ignore {fieldsep,
fieldsep_zero, recordsep, recordsep_zero} for this format, just like
they are being ignored for HTML, or latex, or asciidoc...
Does anyone who think that csv should be added care
about this "alignment" with unaligned? :)
good idea
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hello Daniel, > Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C > as discussed upthread. I'll add some regression tests shortly. Basically I'm waiting for the version with regression tests before reviewing. It is unclear whether committer will like it. From my point of view being able to simply set postgres to generate csv is fine with me, with example uses such as: psql --csv 'TABLE Stuff;' > stuff.csv So that having the --csv option to turn to "full csv", i.e. set the format and various seperators as expected, would be a nice have. -- Fabien.
2018-03-07 10:45 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Daniel,Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
as discussed upthread. I'll add some regression tests shortly.
Basically I'm waiting for the version with regression tests before reviewing.
It is unclear whether committer will like it.
From my point of view being able to simply set postgres to generate csv is fine with me, with example uses such as:
psql --csv 'TABLE Stuff;' > stuff.csv
So that having the --csv option to turn to "full csv", i.e. set the format and various seperators as expected, would be a nice have.
There is commad -c and it should be used. The --csv options should not to have a parameter. I don't like a idea to have more options for query execution.
Regards
Pavel
--
Fabien.
Hello Pavel, >> psql --csv 'TABLE Stuff;' > stuff.csv > > There is commad -c and it should be used. The --csv options should not to > have a parameter. I don't like a idea to have more options for query > execution. Yes, I agree and that is indeed what I meant, sorry for the typo. The cleaner example would be something like: psql --csv -c 'TABLE foo' > foo.csv With a -c to introduce the command. -- Fabien.
2018-03-07 19:40 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,psql --csv 'TABLE Stuff;' > stuff.csv
There is commad -c and it should be used. The --csv options should not to
have a parameter. I don't like a idea to have more options for query
execution.
Yes, I agree and that is indeed what I meant, sorry for the typo. The cleaner example would be something like:
psql --csv -c 'TABLE foo' > foo.csv
With a -c to introduce the command.
ok :) it has sense now
Regards
Pavel
--
Fabien.
On Wed, Mar 07, 2018 at 07:40:49PM +0100, Fabien COELHO wrote: > > Hello Pavel, > > >> psql --csv 'TABLE Stuff;' > stuff.csv > > > >There is commad -c and it should be used. The --csv options should not to > >have a parameter. I don't like a idea to have more options for query > >execution. > > Yes, I agree and that is indeed what I meant, sorry for the typo. The > cleaner example would be something like: > > psql --csv -c 'TABLE foo' > foo.csv > > With a -c to introduce the command. This seems pretty specialized. If we're adding something new, how about psql --format=csv -o foo.csv -c 'TABLE foo' Or we could stick with: psql -P format=csv -o foo.csv -c 'TABLE foo' Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
>> psql --csv -c 'TABLE foo' > foo.csv >> >> With a -c to introduce the command. > > This seems pretty specialized. If we're adding something new, how about > > psql --format=csv -o foo.csv -c 'TABLE foo' > > Or we could stick with: > > psql -P format=csv -o foo.csv -c 'TABLE foo' Currently "-P format=csv" uses the unaligned formating separators, i.e. '|' is used. I was suggesting that a special long option could switch several variables to some specific values, i.e. --csv Would be equivalent to something like: -P format=csv -P fieldsep=, -P recordsep=\n (?) -P tuples_only=on ... I.e. really generate some csv from the data in just one option, not many. But this is obviously debatable. -- Fabien.
On Wed, Mar 07, 2018 at 08:04:05PM +0100, Fabien COELHO wrote: > > >> psql --csv -c 'TABLE foo' > foo.csv > >> > >>With a -c to introduce the command. > > > >This seems pretty specialized. If we're adding something new, how about > > > > psql --format=csv -o foo.csv -c 'TABLE foo' > > > >Or we could stick with: > > > > psql -P format=csv -o foo.csv -c 'TABLE foo' > > Currently "-P format=csv" uses the unaligned formating separators, i.e. '|' > is used. I was suggesting that a special long option could switch several > variables to some specific values, i.e. > > --csv > > Would be equivalent to something like: > > -P format=csv -P fieldsep=, -P recordsep=\n (?) -P tuples_only=on ... We have some inconsistency here in that fewer table formats are supported, but I think asciidoc, etc., do this correctly via invocations like: psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo' > I.e. really generate some csv from the data in just one option, not many. > > But this is obviously debatable. I suspect we'll get requests for an all-JSON option, HTML tables, etc., assuming we don't have them already. I'm hoping we can have that all in one framework. I get that setting each of tuples_only, fieldsep, recordsep, etc. might be a bit of a lift for some users, but it's not clear how we'd make a sane default that made choices among those correct for enough users. For example, do we know that we want tuples_only behavior by default? A lot of people's CSV tools assume a header row. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2018-03-07 20:25 GMT+01:00 David Fetter <david@fetter.org>:
On Wed, Mar 07, 2018 at 08:04:05PM +0100, Fabien COELHO wrote:
>
> >> psql --csv -c 'TABLE foo' > foo.csv
> >>
> >>With a -c to introduce the command.
> >
> >This seems pretty specialized. If we're adding something new, how about
> >
> > psql --format=csv -o foo.csv -c 'TABLE foo'
> >
> >Or we could stick with:
> >
> > psql -P format=csv -o foo.csv -c 'TABLE foo'
>
> Currently "-P format=csv" uses the unaligned formating separators, i.e. '|'
> is used. I was suggesting that a special long option could switch several
> variables to some specific values, i.e.
>
> --csv
>
> Would be equivalent to something like:
>
> -P format=csv -P fieldsep=, -P recordsep=\n (?) -P tuples_only=on ...
We have some inconsistency here in that fewer table formats are
supported, but I think asciidoc, etc., do this correctly via
invocations like:
psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo'
> I.e. really generate some csv from the data in just one option, not many.
>
> But this is obviously debatable.
I suspect we'll get requests for an all-JSON option, HTML tables,
etc., assuming we don't have them already.
I'm hoping we can have that all in one framework. I get that setting
each of tuples_only, fieldsep, recordsep, etc. might be a bit of a
lift for some users, but it's not clear how we'd make a sane default
that made choices among those correct for enough users.
For example, do we know that we want tuples_only behavior by default?
A lot of people's CSV tools assume a header row.
I am for default header - it can be disabled by -t option
Pavel
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
I wrote: > a better idea would to have a new \pset fieldsep_csv PFA a v3 patch that implements that, along with regression tests this time. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
David Fetter wrote: > This seems pretty specialized. If we're adding something new, how about > > psql --format=csv -o foo.csv -c 'TABLE foo' It's a bit easier to memorize than -P format=csv, but psql doesn't have any long option that does not a have a short form with a single letter, and both -F and -f are already taken. Contrary to -C that isn't used until now. > Or we could stick with: > > psql -P format=csv -o foo.csv -c 'TABLE foo' That already works as of the current patch, just like with the other formats. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
David Fetter wrote: > We have some inconsistency here in that fewer table formats are > supported, but I think asciidoc, etc., do this correctly via > invocations like: > > psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo' -A is equivalent to -P format=unaligned, so in the above invocation, it cancels the effect of -P format=asciidoc. Anyway -P format=name on the command line is the same as "\pset format name" as a metacommand, so it works for all formats. Some formats (unaligned, html) have corresponding command-line options (-A, -H), and others don't. In this patch, -C is used so that csv would be in the category of formats that can be switched on with the simpler invocation on the command line. If we don't like that, we can leave out -C for future use and let users write -P format=csv. That's not the best choice from my POV though, as csv is a primary choice to export tabular data. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-03-07 21:31 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
David Fetter wrote:
> We have some inconsistency here in that fewer table formats are
> supported, but I think asciidoc, etc., do this correctly via
> invocations like:
>
> psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo'
-A is equivalent to -P format=unaligned, so in the above
invocation, it cancels the effect of -P format=asciidoc.
Anyway -P format=name on the command line
is the same as "\pset format name" as a
metacommand, so it works for all formats.
Some formats (unaligned, html) have corresponding
command-line options (-A, -H), and others don't.
In this patch, -C is used so that csv would be in the
category of formats that can be switched on with the simpler
invocation on the command line.
If we don't like that, we can leave out -C for future use
and let users write -P format=csv.
That's not the best choice from my POV though, as csv
is a primary choice to export tabular data.
-C can be used for certificates or some similar. I like csv, but I am not sure, so it is too important to get short option (the list of free chars will be only shorter)
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
On Wed, Mar 07, 2018 at 09:37:26PM +0100, Pavel Stehule wrote: > 2018-03-07 21:31 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>: > > > David Fetter wrote: > > > > > We have some inconsistency here in that fewer table formats are > > > supported, but I think asciidoc, etc., do this correctly via > > > invocations like: > > > > > > psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo' > > > > -A is equivalent to -P format=unaligned, so in the above > > invocation, it cancels the effect of -P format=asciidoc. > > Anyway -P format=name on the command line > > is the same as "\pset format name" as a > > metacommand, so it works for all formats. > > > > Some formats (unaligned, html) have corresponding > > command-line options (-A, -H), and others don't. > > In this patch, -C is used so that csv would be in the > > category of formats that can be switched on with the simpler > > invocation on the command line. > > If we don't like that, we can leave out -C for future use > > and let users write -P format=csv. > > That's not the best choice from my POV though, as csv > > is a primary choice to export tabular data. > > > > -C can be used for certificates or some similar. I like csv, but I am not > sure, so it is too important to get short option (the list of free chars > will be only shorter) +1 for not using up a single-letter option for this. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2018-03-07 22:16 GMT+01:00 David Fetter <david@fetter.org>:
On Wed, Mar 07, 2018 at 09:37:26PM +0100, Pavel Stehule wrote:
> 2018-03-07 21:31 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
>
> > David Fetter wrote:
> >
> > > We have some inconsistency here in that fewer table formats are
> > > supported, but I think asciidoc, etc., do this correctly via
> > > invocations like:
> > >
> > > psql -P format=asciidoc -o foo.adoc -AtXc 'TABLE foo'
> >
> > -A is equivalent to -P format=unaligned, so in the above
> > invocation, it cancels the effect of -P format=asciidoc.
> > Anyway -P format=name on the command line
> > is the same as "\pset format name" as a
> > metacommand, so it works for all formats.
> >
> > Some formats (unaligned, html) have corresponding
> > command-line options (-A, -H), and others don't.
> > In this patch, -C is used so that csv would be in the
> > category of formats that can be switched on with the simpler
> > invocation on the command line.
> > If we don't like that, we can leave out -C for future use
> > and let users write -P format=csv.
> > That's not the best choice from my POV though, as csv
> > is a primary choice to export tabular data.
> >
>
> -C can be used for certificates or some similar. I like csv, but I am not
> sure, so it is too important to get short option (the list of free chars
> will be only shorter)
+1 for not using up a single-letter option for this.
Is there some rule, so alone long options are disallowed? When this software will be more mature, then we cannot to find "inteligent" short option for lot of tasks.
Regards
Pavel
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>> I.e. really generate some csv from the data in just one option, not many. >> >> But this is obviously debatable. > > I suspect we'll get requests for an all-JSON option, HTML tables, > etc., assuming we don't have them already. I would definitely be fine with --html (which indeed already exists) & --json (which does not, but could some day) as long options. > I'm hoping we can have that all in one framework. ISTM that it is more or less the case if an option simply presets a bunch of existing table output options which is an existing framework. > I get that setting each of tuples_only, fieldsep, recordsep, etc. > might be a bit of a lift for some users, but it's not clear how we'd > make a sane default that made choices among those correct for enough > users. For example, do we know that we want tuples_only behavior by > default? A lot of people's CSV tools assume a header row. If there is a possible disagreement on one option, then let it out and use the corresponding short option if needed? Tuple only: psql --csv -t -c 'TABLE foo' -o foo.csv With title headers: psql --csv -c 'TABLE foo' -o foo.csv Would be okay. -- Fabien.
2018-03-08 9:29 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
+1
I.e. really generate some csv from the data in just one option, not many.
But this is obviously debatable.
I suspect we'll get requests for an all-JSON option, HTML tables,
etc., assuming we don't have them already.
I would definitely be fine with --html (which indeed already exists) & --json (which does not, but could some day) as long options.I'm hoping we can have that all in one framework.
ISTM that it is more or less the case if an option simply presets a bunch of existing table output options which is an existing framework.I get that setting each of tuples_only, fieldsep, recordsep, etc. might be a bit of a lift for some users, but it's not clear how we'd make a sane default that made choices among those correct for enough users. For example, do we know that we want tuples_only behavior by default? A lot of people's CSV tools assume a header row.
If there is a possible disagreement on one option, then let it out and use the corresponding short option if needed?
Tuple only:
psql --csv -t -c 'TABLE foo' -o foo.csv
With title headers:
psql --csv -c 'TABLE foo' -o foo.csv
Would be okay.
+1
Pavel
--
Fabien.
Hello Daniel, > PFA a v3 patch that implements that, along with regression tests this > time. My 0.02 €: Patch applies cleanly, compiles, make check ok, doc generation ok. I'm in favor of having a simple psql way to generate a convenient and compliant CSV output for export/import. I also think that a short option brings little value, and "--csv" is good enough for the purpose, so I would agree to remove the "-C" binding. About "fieldsep_csv", I do not like much the principle of having different output variables to represent the same concept depending on the format. I would rather have reused fieldsep as in your previous submission and set it to "," when under --csv. This is not a strong opinion and other people may differ: the committer opinion is the one to follow:-) The "\n" eol style is hardcoded. Should it use "recordsep"? For instance, https://tools.ietf.org/html/rfc4180 seems to specify CRLF end of lines. The definition is evolving, eg https://www.w3.org/TR/tabular-data-model/ accepts both "\r" and "\r\n". I do not like using windows eol, but I think that it should be possible to do it, which is not the case with this version. The "\pset format" error message in "do_pset" shows values in seemingly random order. The situation is pre-existing but not really satisfactory. I'd suggest to put all values in alphabetical order. In csv_print_field & csv_print_text, you are not consistent when putting braces on blocks with only one instruction. I'd suggest not to put braces in that case. I'd suggest that tests should include more types, not just strings. I would suggest int, float, timestamp, bytea, an array (which uses , as a separator), json (which uses both " and ,)... -- Fabien.
2018-03-08 11:05 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Daniel,PFA a v3 patch that implements that, along with regression tests this time.
My 0.02 €:
Patch applies cleanly, compiles, make check ok, doc generation ok.
I'm in favor of having a simple psql way to generate a convenient and compliant CSV output for export/import.
I also think that a short option brings little value, and "--csv" is good enough for the purpose, so I would agree to remove the "-C" binding.
About "fieldsep_csv", I do not like much the principle of having different output variables to represent the same concept depending on the format. I would rather have reused fieldsep as in your previous submission and set it to "," when under --csv. This is not a strong opinion and other people may differ: the committer opinion is the one to follow:-)
The "\n" eol style is hardcoded. Should it use "recordsep"? For instance, https://tools.ietf.org/html/rfc4180 seems to specify CRLF end of lines. The definition is evolving, eg https://www.w3.org/TR/tabular- data-model/ accepts both "\r" and "\r\n". I do not like using windows eol, but I think that it should be possible to do it, which is not the case with this version.
In this case recordsep is not problem - default is ok.
The "\pset format" error message in "do_pset" shows values in seemingly random order. The situation is pre-existing but not really satisfactory. I'd suggest to put all values in alphabetical order.
In csv_print_field & csv_print_text, you are not consistent when putting braces on blocks with only one instruction. I'd suggest not to put braces in that case.
I'd suggest that tests should include more types, not just strings. I would suggest int, float, timestamp, bytea, an array (which uses , as a separator), json (which uses both " and ,)...
--
Fabien.
2018-03-08 11:17 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-03-08 11:05 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Daniel,PFA a v3 patch that implements that, along with regression tests this time.
My 0.02 €:
Patch applies cleanly, compiles, make check ok, doc generation ok.
I'm in favor of having a simple psql way to generate a convenient and compliant CSV output for export/import.
I also think that a short option brings little value, and "--csv" is good enough for the purpose, so I would agree to remove the "-C" binding.
About "fieldsep_csv", I do not like much the principle of having different output variables to represent the same concept depending on the format. I would rather have reused fieldsep as in your previous submission and set it to "," when under --csv. This is not a strong opinion and other people may differ: the committer opinion is the one to follow:-)
Looks like complex rule. But maybe it is acceptable. If the format is csv, and fieldsep is not defined, then use ','. If it defined, then use defined field sep. I am not sure, if is possible distinct between default and explicitly defined field separator.
If not, then I prefer fieldsep_csv.
The "\n" eol style is hardcoded. Should it use "recordsep"? For instance, https://tools.ietf.org/html/rfc4180 seems to specify CRLF end of lines. The definition is evolving, eg https://www.w3.org/TR/tabular- data-model/ accepts both "\r" and "\r\n". I do not like using windows eol, but I think that it should be possible to do it, which is not the case with this version. In this case recordsep is not problem - default is ok.
The "\pset format" error message in "do_pset" shows values in seemingly random order. The situation is pre-existing but not really satisfactory. I'd suggest to put all values in alphabetical order.
In csv_print_field & csv_print_text, you are not consistent when putting braces on blocks with only one instruction. I'd suggest not to put braces in that case.
I'd suggest that tests should include more types, not just strings. I would suggest int, float, timestamp, bytea, an array (which uses , as a separator), json (which uses both " and ,)...
--
Fabien.
On 3/8/18 05:05, Fabien COELHO wrote: > I'm in favor of having a simple psql way to generate a convenient and > compliant CSV output for export/import. yes > I also think that a short option brings little value, and "--csv" is good > enough for the purpose, so I would agree to remove the "-C" binding. yes > About "fieldsep_csv", I do not like much the principle of having different > output variables to represent the same concept depending on the format. I > would rather have reused fieldsep as in your previous submission and set > it to "," when under --csv. yes > The "\n" eol style is hardcoded. Should it use "recordsep"? yes > The "\pset format" error message in "do_pset" shows values in seemingly > random order. The situation is pre-existing but not really satisfactory. > I'd suggest to put all values in alphabetical order. yes > I'd suggest that tests should include more types, not just strings. I > would suggest int, float, timestamp, bytea, an array (which uses , as a > separator), json (which uses both " and ,)... sounds good -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2018-03-09 3:13 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
On 3/8/18 05:05, Fabien COELHO wrote:
> I'm in favor of having a simple psql way to generate a convenient and
> compliant CSV output for export/import.
yes
> I also think that a short option brings little value, and "--csv" is good
> enough for the purpose, so I would agree to remove the "-C" binding.
yes
> About "fieldsep_csv", I do not like much the principle of having different
> output variables to represent the same concept depending on the format. I
> would rather have reused fieldsep as in your previous submission and set
> it to "," when under --csv.
yes
how will be possible to set different separator ';'? I don't see it with described design
Regards
Pavel
> The "\n" eol style is hardcoded. Should it use "recordsep"?
yes
> The "\pset format" error message in "do_pset" shows values in seemingly
> random order. The situation is pre-existing but not really satisfactory.
> I'd suggest to put all values in alphabetical order.
yes
> I'd suggest that tests should include more types, not just strings. I
> would suggest int, float, timestamp, bytea, an array (which uses , as a
> separator), json (which uses both " and ,)...
sounds good
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>> About "fieldsep_csv", I do not like much the principle of having >>> different output variables to represent the same concept depending on >>> the format. I would rather have reused fieldsep as in your previous >>> submission and set it to "," when under --csv. >> >> yes >> > > how will be possible to set different separator ';'? I don't see it with > described design Indeed, it should be possible. I think that the following should be made to work: psql --csv -P fieldsep=; -c 'TABLE foo' > foo.csv So that it can be changed the semi-colon (or tab or whatever) style if required. -- Fabien.
2018-03-09 8:40 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
About "fieldsep_csv", I do not like much the principle of having different output variables to represent the same concept depending on the format. I would rather have reused fieldsep as in your previous submission and set it to "," when under --csv.
yes
how will be possible to set different separator ';'? I don't see it with
described design
Indeed, it should be possible. I think that the following should be made to work:
psql --csv -P fieldsep=; -c 'TABLE foo' > foo.csv
So that it can be changed the semi-colon (or tab or whatever) style if required.
ok, then all is ok
Regards
Pavel
--
Fabien.
Fabien COELHO wrote: > I also think that a short option brings little value, and "--csv" is good > enough for the purpose, so I would agree to remove the "-C" binding. It's not that accepting -C brings much value by itself, it's that loosing the consistency across all options comes with a negative cost. The point is that up to now all options have a short form and a long form, so --csv would be a deliberate exception. I'm rather unconvinced it's justified, but I seem to be alone in that case, so I'll comply. > About "fieldsep_csv", I do not like much the principle of having different > output variables to represent the same concept depending on the format. I > would rather have reused fieldsep as in your previous submission and set > it to "," when under --csv The trouble with fieldsep is that it defaults to '|', which both you and Pavel say you dislike. Fair enough, it's better to have ',' by default, but the cleanest solution to that is fieldsep_csv with its own default. The solution to set fieldsep automatically to ',' with \pset format csv is problematic. For instance \pset format csv \pset fieldsep ';' changes fieldsep to ';' as expected, but in the other order \pset fieldsep ';' \pset format csv you get ',' while reasonably you'd expect ';' Same problem on the command line. Options are evaluated left-to-right: $ psql --csv -F';' would work as expected, but $ psql -F';' --csv would not. I don't feel good about solving these issues with ad-hoc rules, or inventing the notion that a pset variable has been defined but not user-redefined. This stuff has "inconsistent" written all over it and I don't see a maintainer going along with that. > The "\n" eol style is hardcoded. Should it use "recordsep"? For instance, > https://tools.ietf.org/html/rfc4180 seems to specify CRLF end of lines. > The definition is evolving, eg https://www.w3.org/TR/tabular-data-model/ > accepts both "\r" and "\r\n". I do not like using windows eol, but I think > that it should be possible to do it, which is not the case with this > version. Interesting point. The output stream is opened in text mode so printing '\n' should generate LF on Unix, CR LF on Windows, and I think CR on MacOS. I think that's for the best. recordsep in the unaligned mode doesn't play the role of a line ending because the last line is not finished by recordsep. According to the source code, this is intended, see print_unaligned_text() in print.c: /* * The last record is terminated by a newline, independent of the set * record separator. But when the record separator is a zero byte, we * use that (compatible with find -print0 and xargs). */ > The "\pset format" error message in "do_pset" shows values in seemingly > random order. The situation is pre-existing but not really satisfactory. > I'd suggest to put all values in alphabetical order. ok > In csv_print_field & csv_print_text, you are not consistent when putting > braces on blocks with only one instruction. I'd suggest not to put braces > in that case. ok > I'd suggest that tests should include more types, not just strings. I > would suggest int, float, timestamp, bytea, an array (which uses , as a > separator), json (which uses both " and ,)... I'll do but the printout code is type-agnostic so it's not supposed to make a difference compared to mere literals. Cases with NULLs are missing though, I'll go add some too. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
I wrote: > recordsep in the unaligned mode doesn't play the role of a line ending > because the last line is not finished by recordsep. According to the source >code, this is intended, see print_unaligned_text() in print.c: Something else comes to mind: CSV allows linefeeds inside fields, and we don't want to replace these with record separators. So the notion that recordsep can be used to choose line endings is even less okay than if there was just the last line issue. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
I wrote:
> recordsep in the unaligned mode doesn't play the role of a line ending
> because the last line is not finished by recordsep. According to the source
>code, this is intended, see print_unaligned_text() in print.c:
Something else comes to mind: CSV allows linefeeds inside fields, and
we don't want to replace these with record separators.
So the notion that recordsep can be used to choose line endings
is even less okay than if there was just the last line issue.
I'm not following - if recordsep is not something that would interpreted as a newline then the file we output would have not structural newlines. It might have data newlines but those would be quoted.
David J.
David G. Johnston wrote: > I'm not following - if recordsep is not something that would > interpreted as a newline then the file we output would have not > structural newlines. > It might have data newlines but those would be quoted. They would be, but I don't quite understand the point in your first sentence. All I'm saying is that setting recordsep to '\r\n' does not achieve the goal of obtaining CRLF line endings. Here's an example with the unaligned mode, linux host: (psql -At -P recordsep=$'\r\n' << EOF select E'A\nB' union E'C\nD'; EOF ) | hexdump -C The result is: 00000000 41 0a 42 0d 0a 43 0a 44 0a |A.B..C.D.| The expectation of CRLF line endings is that every LF would be preceded by CR, but here we get that only for 1 LF out of 4. That's useless. It's not a bug. We asked for a CRLF to separate our two records and that's exactly what we got, no more no less. In csv output, the difference would be that there would a double quote character before A and after B, and before C and after D but otherwise it would be the same mix of LF and CRLF. I think that the point of recordsep in unaligned mode is you can set it to something that never appears in the data, especially when embedded newlines might be in the data. In CSV this is solved differently so we don't need it. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
I think that the point of recordsep in unaligned mode is you can set it
to something that never appears in the data, especially when embedded
newlines might be in the data. In CSV this is solved differently so
we don't need it.
I'd rather argue it from the standpoint that \copy doesn't use recordsep nor fieldsep and thus neither should --csv; which is arguably a convenience invocation of \copy that pipes to psql's stdout (and overcomes \copy's single-line limitation - which I think still exists... - and inability to use variables - does it?...). COPY doesn't allow for changing the record separator and the newline output is system-dependent. I can accept the same limitation with this feature.
I suppose the question is how many "COPY" options do we want to expose on the command line, and how does it look?
I'll put a -1 on having a short option (-C or otherwise); "that is the way its always been done" doesn't work for me here - by way of example "-a and -A" is ill-advised; --echo-all does not seem important enough to warrant a short option (especially not a lower-case one) and so the more useful unaligned mode is forced into the secondary capital A position.
David J.
Hello Daniel, About "-C", I'm fine if it is used and if it is not used, really. psql begins to be quite full of one letter options, currently 34 of them, with upper & lower cases and numbers included. > The solution to set fieldsep automatically to ',' with > \pset format csv is problematic. I agree. I'm really advocating that --csv would set fieldsep, but manual pset on format would still do what is expected, and only that, i.e. --csv is NOT a simple shortcut for -P format=csv". > Same problem on the command line. Options are evaluated left-to-right: > > $ psql --csv -F';' > would work as expected, but > $ psql -F';' --csv > would not. ISTM that having an option overriding another one after it is standard practice. I would be fine with that if --csv is documented as "setting format, fieldsep and recordsep to default suited for outputting CSV". Now this is just a personal opinion. >> The "\n" eol style is hardcoded. Should it use "recordsep"? For instance, >> https://tools.ietf.org/html/rfc4180 seems to specify CRLF end of lines. >> The definition is evolving, eg https://www.w3.org/TR/tabular-data-model/ >> accepts both "\r" and "\r\n". I do not like using windows eol, but I think >> that it should be possible to do it, which is not the case with this >> version. > > Interesting point. The output stream is opened in text mode so printing > '\n' should generate LF on Unix, CR LF on Windows, and I think CR on MacOS. > I think that's for the best. I did not know that C's putc/printf/... would change output on sight on different systems. I'm not sure I like it. It would still mean that one cannot change the format to suits \r\n or \n at will, which is kind of disappointing. > recordsep in the unaligned mode doesn't play the role of a line ending > because the last line is not finished by recordsep. It could just be with CSV format? As you point out, there is already an exception with the separator is '\0'. Note that the last line of a CSV file may or may not end with \n or \r\n, according to specs. >> I'd suggest that tests should include more types, not just strings. I >> would suggest int, float, timestamp, bytea, an array (which uses , as a >> separator), json (which uses both " and ,)... > > I'll do but the printout code is type-agnostic so it's not supposed > to make a difference compared to mere literals. Sure, but it seems better to actually see that it works properly for non trivial cases. > Cases with NULLs are missing though, I'll go add some too. Indeed. -- Fabien.
>> recordsep in the unaligned mode doesn't play the role of a line ending >> because the last line is not finished by recordsep. According to the source >> code, this is intended, see print_unaligned_text() in print.c: > > Something else comes to mind: CSV allows linefeeds inside fields, and > we don't want to replace these with record separators. Sure. > So the notion that recordsep can be used to choose line endings > is even less okay than if there was just the last line issue. I'm not following. ISTM that the escaping macanism would work in pretty all reasonable cases, although it might be possible to shot oneself in the foot by setting manually strange values for recordsep, eg '"'. I do not see that as a significant issue. If the user asks for something stupid, they get something stupid, fine. -- Fabien.
Hi Daniel, On 3/10/18 1:49 AM, Fabien COELHO wrote: > >>> recordsep in the unaligned mode doesn't play the role of a line ending >>> because the last line is not finished by recordsep. According to the >>> source >>> code, this is intended, see print_unaligned_text() in print.c: >> >> Something else comes to mind: CSV allows linefeeds inside fields, and >> we don't want to replace these with record separators. > > Sure. > >> So the notion that recordsep can be used to choose line endings >> is even less okay than if there was just the last line issue. > > I'm not following. ISTM that the escaping macanism would work in pretty > all reasonable cases, although it might be possible to shot oneself in > the foot by setting manually strange values for recordsep, eg '"'. I do > not see that as a significant issue. If the user asks for something > stupid, they get something stupid, fine. This patch looks like it is getting close but there has been no update in a while. Do you know when you'll have an updated patch that addresses the minor issues brought up in review and the concern above? Regards, -- -David david@pgmasters.net
David Steele wrote: > Do you know when you'll have an updated patch that addresses the minor > issues brought up in review and the concern above? Here's an update addressing the issues discussed: - fieldsep and recordsep are used, no more fieldsep_csv - the command line option is --csv without short option and is equivalent to -P format=csv -P fieldsep=',' - pset output formats are reordered alphabetically on display - a couple more cases in the regression tests Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Hi
2018-03-22 16:30 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
David Steele wrote:
> Do you know when you'll have an updated patch that addresses the minor
> issues brought up in review and the concern above?
Here's an update addressing the issues discussed:
- fieldsep and recordsep are used, no more fieldsep_csv
- the command line option is --csv without short option and is equivalent
to -P format=csv -P fieldsep=','
- pset output formats are reordered alphabetically on display
- a couple more cases in the regression tests
I am sorry, I don't think so this design is correct. It introduce dependency on arguments order
# correct output
[pavel@nemesis postgresql]$ psql --csv -c "table foo" -F ';'
a;b;c
1;2;Hello
3;4;Nazdar
a;b;c
1;2;Hello
3;4;Nazdar
# error -F is ignored
[pavel@nemesis postgresql]$ psql -F ';' --csv -c "table foo"
a,b,c
1,2,Hello
3,4,Nazdar
[pavel@nemesis postgresql]$ psql -F ';' --csv -c "table foo"
a,b,c
1,2,Hello
3,4,Nazdar
Using \pset format csv means overwriting field sep every time - nobody uses |
I think so dependency on order of psql arguments is significant problem
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hello Pavel, > Using \pset format csv means overwriting field sep every time - nobody uses > | Yep. The alternative is to have a csv-specific separator variable, which does not seem very useful, must be remembered, but this is indeed debatable. > I think so dependency on order of psql arguments is significant problem This is intentional, and this issue/feature already exists, the last argument overwrite previous settings thus will win, eg: psql --pset=format=troff --html -c 'SELECT 1' Will output in html, not in troff. -- Fabien.
2018-03-22 18:38 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,Using \pset format csv means overwriting field sep every time - nobody uses
|
Yep. The alternative is to have a csv-specific separator variable, which does not seem very useful, must be remembered, but this is indeed debatable.I think so dependency on order of psql arguments is significant problem
This is intentional, and this issue/feature already exists, the last argument overwrite previous settings thus will win, eg:
psql --pset=format=troff --html -c 'SELECT 1'
Will output in html, not in troff.
Can we introduce some format specific default separators - if we would not to introduce csv_field_sep options?
It should not be hard. All formats can has '|' like now, and csv can have a ',' - then if field separator is not explicit, then default field separator is used, else specified field separator is used.
You can see my idea in attached patch
Regards
Pavel
postgres=# \pset format csv
Output format is csv.
postgres=# select * from foo;
a,b,c
1,2,Hello
3,4,Nazdar
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# select * from foo;
a;b;c
1;2;Hello
3;4;Nazdar
postgres=# \pset format csv
Output format is csv.
postgres=# select * from foo;
a,b,c
1,2,Hello
3,4,Nazdar
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# select * from foo;
a;b;c
1;2;Hello
3;4;Nazdar
--
Fabien.
Attachment
2018-03-22 19:28 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-03-22 18:38 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,Using \pset format csv means overwriting field sep every time - nobody uses
|
Yep. The alternative is to have a csv-specific separator variable, which does not seem very useful, must be remembered, but this is indeed debatable.I think so dependency on order of psql arguments is significant problem
This is intentional, and this issue/feature already exists, the last argument overwrite previous settings thus will win, eg:
psql --pset=format=troff --html -c 'SELECT 1'
Will output in html, not in troff.Can we introduce some format specific default separators - if we would not to introduce csv_field_sep options?It should not be hard. All formats can has '|' like now, and csv can have a ',' - then if field separator is not explicit, then default field separator is used, else specified field separator is used.You can see my idea in attached patchRegardsPavel
postgres=# \pset format csv
Output format is csv.
postgres=# select * from foo;
a,b,c
1,2,Hello
3,4,Nazdar
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# select * from foo;
a;b;c
1;2;Hello
3;4;Nazdar
The default fieldsep should be "off" that means so format defaults are used. ',' is used for CSV, | for any else.
So all will work like now, but there will be bigger freedom with new format design. Now, all formats with possibility to setting fieldsep, should to share default '|', what I think, is not practical.
--
Fabien.
2018-03-22 20:10 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-03-22 19:28 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:2018-03-22 18:38 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,Using \pset format csv means overwriting field sep every time - nobody uses
|
Yep. The alternative is to have a csv-specific separator variable, which does not seem very useful, must be remembered, but this is indeed debatable.I think so dependency on order of psql arguments is significant problem
This is intentional, and this issue/feature already exists, the last argument overwrite previous settings thus will win, eg:
psql --pset=format=troff --html -c 'SELECT 1'
Will output in html, not in troff.Can we introduce some format specific default separators - if we would not to introduce csv_field_sep options?It should not be hard. All formats can has '|' like now, and csv can have a ',' - then if field separator is not explicit, then default field separator is used, else specified field separator is used.You can see my idea in attached patchRegardsPavel
postgres=# \pset format csv
Output format is csv.
postgres=# select * from foo;
a,b,c
1,2,Hello
3,4,Nazdar
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# select * from foo;
a;b;c
1;2;Hello
3;4;NazdarThe default fieldsep should be "off" that means so format defaults are used. ',' is used for CSV, | for any else.So all will work like now, but there will be bigger freedom with new format design. Now, all formats with possibility to setting fieldsep, should to share default '|', what I think, is not practical.
I can make patch, if there will be a agreement.
comments?
Regards
Pavel
--
Fabien.
Pavel Stehule wrote: > It should not be hard. All formats can has '|' like now, and csv can have a > ',' - then if field separator is not explicit, then default field separator > is used, else specified field separator is used. > > You can see my idea in attached patch With that patch, consider this sequence: postgres=# \pset format csv Output format is csv. postgres=# \pset fieldsep Field separator is "|". postgres=# select 1 as a,2 as b; a,b 1,2 Here psql reports that fieldsep is "|" and right away is using something else in the output. That doesn't look good. You may object that it's fixable by tweaking the output of \pset, \pset fieldsep, and \? variables so that it knows that the current output format is going to use a "hidden" default separator, and then these commands should display that value instead. But that'd be somewhat playing whack-a-mole, as the following sequence would now be possible, with '|' being used as the separator instead of the ',' reported just above: postgres=# \pset format csv Output format is csv. postgres=# \pset fieldsep Field separator is ",". postgres=# \a Output format is aligned. postgres=# select 1 as a,2 as b; a | b ---+--- 1 | 2 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Hello Daniel, >> Do you know when you'll have an updated patch that addresses the minor >> issues brought up in review and the concern above? > > Here's an update addressing the issues discussed: > > - fieldsep and recordsep are used, no more fieldsep_csv > - the command line option is --csv without short option and is equivalent > to -P format=csv -P fieldsep=',' > - pset output formats are reordered alphabetically on display > - a couple more cases in the regression tests Patch applies cleanly, compiles, doc gen ok, "make check" ok. The patch adds a simple way to generate csv output from "psql" queries, much simpler than playing around with COPY or \copy. It allows to generate a clean CSV dump from something as short as: sh> psql --csv -c 'TABLE foo' > foo.csv Documentation is clear. Test cover a significant number of cases (fieldsep, expanded, tuples-only). Although recordsep changes are not actually tested, it worked interactively and I think that tests are sufficient as is. There are somehow remaining point about which a committer/other people input would be nice: (1) There are some mild disagreement whether the fieldsep should be format specific shared with other format. I do not think that a specific fieldsep is worth it, but this is a marginal preference, and other people opinion differ. What is best is not obvious. Pavel also suggested to have a special handling based on whether the fieldsep is explicitely set or not. I'm not too keen on that because it departs significantly from the way psql formatting is currently handled, and what is happening becomes unclear to the user. (2) For interactive use, two commands are required: \pset format csv + \pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv command similar to \H would be appropriate, or not, to set both values more efficiently. Could be something for another patch. Not sure what is the status of the patch if we do not have a clear consensus. -- Fabien.
2018-03-23 12:59 GMT+01:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> It should not be hard. All formats can has '|' like now, and csv can have a
> ',' - then if field separator is not explicit, then default field separator
> is used, else specified field separator is used.
>
> You can see my idea in attached patch
With that patch, consider this sequence:
postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is "|".
postgres=# select 1 as a,2 as b;
a,b
1,2
Here psql reports that fieldsep is "|" and right away is using something
else in the output. That doesn't look good.
yes - my patch was proof concept - nothing more.
But this can be simply solved - if we have a table of default field separator, then if separator is not explicit, then default for used format is printed.
You may object that it's fixable by tweaking the output of \pset,
\pset fieldsep, and \? variables so that it knows that the current
output format is going to use a "hidden" default separator, and
then these commands should display that value instead.
But that'd be somewhat playing whack-a-mole, as the following
sequence would now be possible, with '|' being used as
the separator instead of the ',' reported just above:
postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is ",".
postgres=# \a
Output format is aligned.
postgres=# select 1 as a,2 as b;
a | b
---+---
1 | 2
I am sorry, but path that I sent was just proof concept - I didn't implement defaults for any other related formats.
I'll try to send cleaner patch tomorrow.
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
2018-03-23 18:55 GMT+01:00 Fabien COELHO <fabien.coelho@mines-paristech.fr>:
Hello Daniel,Do you know when you'll have an updated patch that addresses the minor
issues brought up in review and the concern above?
Here's an update addressing the issues discussed:
- fieldsep and recordsep are used, no more fieldsep_csv
- the command line option is --csv without short option and is equivalent
to -P format=csv -P fieldsep=','
- pset output formats are reordered alphabetically on display
- a couple more cases in the regression tests
Patch applies cleanly, compiles, doc gen ok, "make check" ok.
The patch adds a simple way to generate csv output from "psql" queries, much simpler than playing around with COPY or \copy. It allows to generate a clean CSV dump from something as short as:
sh> psql --csv -c 'TABLE foo' > foo.csv
Documentation is clear.
Test cover a significant number of cases (fieldsep, expanded, tuples-only).
Although recordsep changes are not actually tested, it worked interactively
and I think that tests are sufficient as is.
There are somehow remaining point about which a committer/other people input
would be nice:
(1) There are some mild disagreement whether the fieldsep should be format
specific shared with other format. I do not think that a specific fieldsep
is worth it, but this is a marginal preference, and other people opinion
differ. What is best is not obvious.
Pavel also suggested to have a special handling based on whether
the fieldsep is explicitely set or not. I'm not too keen on that because
it departs significantly from the way psql formatting is currently
handled, and what is happening becomes unclear to the user.
(2) For interactive use, two commands are required: \pset format csv +
\pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv
command similar to \H would be appropriate, or not, to set both values
more efficiently. Could be something for another patch.
Not sure what is the status of the patch if we do not have a clear consensus.
I am sorry, but I don't think so this interface is good enough. Using | as default CSV separator is just wrong. It and only it is a problem. Any other is perfect.
Regards
Pavel
--
Fabien.
Hello Pavel, >> The patch adds a simple way to generate csv output from "psql" queries, >> much simpler than playing around with COPY or \copy. It allows to generate >> a clean CSV dump from something as short as: >> >> sh> psql --csv -c 'TABLE foo' > foo.csv >> >> Documentation is clear. >> >> Test cover a significant number of cases (fieldsep, expanded, tuples-only). >> Although recordsep changes are not actually tested, it worked interactively >> and I think that tests are sufficient as is. >> >> There are somehow remaining point about which a committer/other people >> input would be nice: >> >> (1) There are some mild disagreement whether the fieldsep should be >> format specific shared with other format. I do not think that a >> specific fieldsep is worth it, but this is a marginal preference, and >> other people opinion differ. What is best is not obvious. >> >> Pavel also suggested to have a special handling based on whether the >> fieldsep is explicitely set or not. I'm not too keen on that because it >> departs significantly from the way psql formatting is currently >> handled, and what is happening becomes unclear to the user. >> >> (2) For interactive use, two commands are required: \pset format csv + >> \pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv >> command similar to \H would be appropriate, or not, to set both values >> more efficiently. Could be something for another patch. >> >> Not sure what is the status of the patch if we do not have a clear >> consensus. > > I am sorry, but I don't think so this interface is good enough. Using | as > default CSV separator is just wrong. It and only it is a problem. Any other > is perfect. I do not think that there is a perfect solution, so some compromise will be needed or we won't get it. (1) patch v4: "\pset format csv" retains the current fieldsep value, so fields are separated by whatever is in the variable, which means that for getting a standard csv two commands are needed, which is clearly documented, but may be considered as surprising. ISTM that the underlying point is that "format" is really about string escaping, not about the full output format, but this is a pre-existing situation. I'm suggesting to add \csv which would behave like \H to toggle CSV mode so as to improve this situation, with a caveat which is that toggling back \csv would have forgotted the previous settings (just like \H does, though, so would for instance reset to aligned with |), so it would not be perfect. (2) your proposal as I understand it: "\pset format csv" may or may not use the fieldsep, depending on whether it was explicitely set, an information which is not shown, i.e.: \pset fieldsep # fieldsep separator is "|" \pset format csv # would output a,b,c or a|b|c... Because it depends on whether fieldsep was set explicitely to '|' or whether it has this value but it was due to the default. This kind of unclear behavioral determinism does not seem desirable. (3) other option, always use a comma: this was rejected because some people like their comma separated values to be separated by semi-colons or tabs (aka tsv). (4) other option, Daniel v3 or v2: use a distinct "fieldsep_csv" variable initially set to ','. This adds yet another specific variable that has to be remembered, some styles would use fieldsep but csv would not so it is some kind of exception that I would wish to avoid. My current preference order in the suggested solutions is 1, 4, 2, 3, with a significant preference for 1. -- Fabien.
2018-03-24 8:15 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,The patch adds a simple way to generate csv output from "psql" queries,
much simpler than playing around with COPY or \copy. It allows to generate
a clean CSV dump from something as short as:
sh> psql --csv -c 'TABLE foo' > foo.csv
Documentation is clear.
Test cover a significant number of cases (fieldsep, expanded, tuples-only).
Although recordsep changes are not actually tested, it worked interactively
and I think that tests are sufficient as is.
There are somehow remaining point about which a committer/other people input would be nice:
(1) There are some mild disagreement whether the fieldsep should be format specific shared with other format. I do not think that a specific fieldsep is worth it, but this is a marginal preference, and other people opinion differ. What is best is not obvious.
Pavel also suggested to have a special handling based on whether the fieldsep is explicitely set or not. I'm not too keen on that because it departs significantly from the way psql formatting is currently handled, and what is happening becomes unclear to the user.
(2) For interactive use, two commands are required: \pset format csv + \pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv command similar to \H would be appropriate, or not, to set both values more efficiently. Could be something for another patch.
Not sure what is the status of the patch if we do not have a clear
consensus.
I am sorry, but I don't think so this interface is good enough. Using | as
default CSV separator is just wrong. It and only it is a problem. Any other
is perfect.
I do not think that there is a perfect solution, so some compromise will be needed or we won't get it.
(1) patch v4:
"\pset format csv" retains the current fieldsep value, so fields are
separated by whatever is in the variable, which means that for getting
a standard csv two commands are needed, which is clearly documented,
but may be considered as surprising. ISTM that the underlying point is
that "format" is really about string escaping, not about the full output
format, but this is a pre-existing situation.
I'm suggesting to add \csv which would behave like \H to toggle CSV
mode so as to improve this situation, with a caveat which is that
toggling back \csv would have forgotted the previous settings (just
like \H does, though, so would for instance reset to aligned with |),
so it would not be perfect.
this doesn't solve usual format settings by \pset format csv
(2) your proposal as I understand it:
"\pset format csv" may or may not use the fieldsep, depending on
whether it was explicitely set, an information which is not shown, i.e.:
\pset fieldsep # fieldsep separator is "|"
\pset format csv # would output a,b,c or a|b|c...
Because it depends on whether fieldsep was set explicitely to '|' or
whether it has this value but it was due to the default.
This kind of unclear behavioral determinism does not seem desirable.
please, check and test attached patch. It is very simply for usage - and there is not any unclear behave. Just you should to accept so formats can have own defaults for separators.
(3) other option, always use a comma:
this was rejected because some people like their comma separated
values to be separated by semi-colons or tabs (aka tsv).
(4) other option, Daniel v3 or v2:
use a distinct "fieldsep_csv" variable initially set to ','. This adds
yet another specific variable that has to be remembered, some styles
would use fieldsep but csv would not so it is some kind of exception
that I would wish to avoid.
My current preference order in the suggested solutions is 1, 4, 2, 3, with a significant preference for 1.
I am thinking so @1 solves nothing - people are using \pset format ...
@3 is clearly bad - there are not any discussion
@4 can be compromise solution, but then there should be renamed fieldsep. Now, fieldsep is used just for unaligned format - for nothing else. If we introduce fieldsep_csv, then fieldsep should be renamed to fieldsep_unaligned. I can live with it.
But I think so default fieldsep is better option. Please, try my patch and comment it.
Regards
Pavel
--
Fabien.
Attachment
2018-03-24 8:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-03-24 8:15 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,The patch adds a simple way to generate csv output from "psql" queries,
much simpler than playing around with COPY or \copy. It allows to generate
a clean CSV dump from something as short as:
sh> psql --csv -c 'TABLE foo' > foo.csv
Documentation is clear.
Test cover a significant number of cases (fieldsep, expanded, tuples-only).
Although recordsep changes are not actually tested, it worked interactively
and I think that tests are sufficient as is.
There are somehow remaining point about which a committer/other people input would be nice:
(1) There are some mild disagreement whether the fieldsep should be format specific shared with other format. I do not think that a specific fieldsep is worth it, but this is a marginal preference, and other people opinion differ. What is best is not obvious.
Pavel also suggested to have a special handling based on whether the fieldsep is explicitely set or not. I'm not too keen on that because it departs significantly from the way psql formatting is currently handled, and what is happening becomes unclear to the user.
(2) For interactive use, two commands are required: \pset format csv + \pset fieldsep ',' (or ';' or '\t' or whatever...). Maybe some \csv command similar to \H would be appropriate, or not, to set both values more efficiently. Could be something for another patch.
Not sure what is the status of the patch if we do not have a clear
consensus.
I am sorry, but I don't think so this interface is good enough. Using | as
default CSV separator is just wrong. It and only it is a problem. Any other
is perfect.
I do not think that there is a perfect solution, so some compromise will be needed or we won't get it.
(1) patch v4:
"\pset format csv" retains the current fieldsep value, so fields are
separated by whatever is in the variable, which means that for getting
a standard csv two commands are needed, which is clearly documented,
but may be considered as surprising. ISTM that the underlying point is
that "format" is really about string escaping, not about the full output
format, but this is a pre-existing situation.
I'm suggesting to add \csv which would behave like \H to toggle CSV
mode so as to improve this situation, with a caveat which is that
toggling back \csv would have forgotted the previous settings (just
like \H does, though, so would for instance reset to aligned with |),
so it would not be perfect.this doesn't solve usual format settings by \pset format csv
(2) your proposal as I understand it:
"\pset format csv" may or may not use the fieldsep, depending on
whether it was explicitely set, an information which is not shown, i.e.:
\pset fieldsep # fieldsep separator is "|"
\pset format csv # would output a,b,c or a|b|c...
Because it depends on whether fieldsep was set explicitely to '|' or
whether it has this value but it was due to the default.
This kind of unclear behavioral determinism does not seem desirable.please, check and test attached patch. It is very simply for usage - and there is not any unclear behave. Just you should to accept so formats can have own defaults for separators.
(3) other option, always use a comma:
this was rejected because some people like their comma separated
values to be separated by semi-colons or tabs (aka tsv).
(4) other option, Daniel v3 or v2:
use a distinct "fieldsep_csv" variable initially set to ','. This adds
yet another specific variable that has to be remembered, some styles
would use fieldsep but csv would not so it is some kind of exception
that I would wish to avoid.
My current preference order in the suggested solutions is 1, 4, 2, 3, with a significant preference for 1.I am thinking so @1 solves nothing - people are using \pset format ...@3 is clearly bad - there are not any discussion@4 can be compromise solution, but then there should be renamed fieldsep. Now, fieldsep is used just for unaligned format - for nothing else. If we introduce fieldsep_csv, then fieldsep should be renamed to fieldsep_unaligned. I can live with it.But I think so default fieldsep is better option. Please, try my patch and comment it.
minor fix
Regards
Pavel
RegardsPavel
--
Fabien.
Attachment
minor fix
all is working now
[pavel@nemesis postgresql]$ psql --csv -F ';' -c "table foo" postgres
a;b;c
3;4;Nazdar
3;4;Nazdar
[pavel@nemesis postgresql]$ psql -F ';' --csv -c "table foo" postgres
a;b;c
3;4;Nazdar
3;4;Nazdar
[pavel@nemesis postgresql]$ psql --csv -c "table foo" postgres
a,b,c
3,4,Nazdar
3,4,Nazdar
[pavel@nemesis postgresql]$ psql -A -c "table foo" postgres
a|b|c
3|4|Nazdar
3|4|Nazdar
(2 rows)
[pavel@nemesis postgresql]$ psql -A -F , -c "table foo" postgres
a,b,c
3,4,Nazdar
3,4,Nazdar
(2 rows)
[pavel@nemesis postgresql]$ psql --csv -F ';' -c "table foo" postgres
a;b;c
3;4;Nazdar
3;4;Nazdar
[pavel@nemesis postgresql]$ psql -F ';' --csv -c "table foo" postgres
a;b;c
3;4;Nazdar
3;4;Nazdar
[pavel@nemesis postgresql]$ psql --csv -c "table foo" postgres
a,b,c
3,4,Nazdar
3,4,Nazdar
[pavel@nemesis postgresql]$ psql -A -c "table foo" postgres
a|b|c
3|4|Nazdar
3|4|Nazdar
(2 rows)
[pavel@nemesis postgresql]$ psql -A -F , -c "table foo" postgres
a,b,c
3,4,Nazdar
3,4,Nazdar
(2 rows)
Regards
Pavel
RegardsPavelRegardsPavel
--
Fabien.
minor fix
another fix in help
Pavel
postgres=# \f
Field separator is "".
postgres=# \pset format csv
Output format is csv.
postgres=# \f
Field separator is ",".
postgres=# \pset format unaligned
Output format is unaligned.
postgres=# \f
Field separator is "|".
postgres=# \f ;
Field separator is ";".
postgres=# \pset format unaligned
Output format is unaligned.
postgres=# \f
Field separator is ";".
postgres=# \pset format csv
Output format is csv.
postgres=# \f
Field separator is ";".
postgres=# \f
Field separator is "".
postgres=# \pset format csv
Output format is csv.
postgres=# \f
Field separator is ",".
postgres=# \pset format unaligned
Output format is unaligned.
postgres=# \f
Field separator is "|".
postgres=# \f ;
Field separator is ";".
postgres=# \pset format unaligned
Output format is unaligned.
postgres=# \f
Field separator is ";".
postgres=# \pset format csv
Output format is csv.
postgres=# \f
Field separator is ";".
Regards
RegardsPavelRegardsPavel
--
Fabien.
Attachment
Hello Pavel, >>> I'm suggesting to add \csv which would behave like \H to toggle CSV >>> mode so as to improve this situation, with a caveat which is that >>> toggling back \csv would have forgotted the previous settings (just >>> like \H does, though, so would for instance reset to aligned with |), >>> so it would not be perfect. >> >> this doesn't solve usual format settings by \pset format csv Indeed, it does not. Alas, "format" is not strictly about format but more about string escaping. >>> (2) your proposal as I understand it: >>> >>> "\pset format csv" may or may not use the fieldsep, depending on >>> whether it was explicitely set, an information which is not shown, >>> i.e.: >>> >>> \pset fieldsep # fieldsep separator is "|" >>> \pset format csv # would output a,b,c or a|b|c... >>> >>> Because it depends on whether fieldsep was set explicitely to '|' or >>> whether it has this value but it was due to the default. >>> >>> This kind of unclear behavioral determinism does not seem desirable. >>> >> >> please, check and test attached patch. It is very simply for usage - and >> there is not any unclear behave. Just you should to accept so formats can >> have own defaults for separators. I checked, and the behavior is intrinsically strange. Your patch shows an empty '' fieldsep at startup, which is debatable because it is not its actual value. psql> \pset # fieldsep '' -- doubtful ISTM that at the minimum it should show a special "<format-default>" or whatever value, which creates some kind of exception because this special value cannot be set and special values are a bad thing. When a format is set, a current default fielsep is shown. psql> \pset format unaligned psql> \pset # fieldsep '|' -- new default shown psql> SELECT 1 AS one, 2 AS two; one|two 1|2 psql> \pset format csv psql> \pset # fieldsep ',' -- new default shown psql> SELECT 1 AS one, 2 AS two; one,two 1,2 However, if fieldsep is set once, the behaviors changes indefinitely: psql> \pset fieldsep '|' psql> \pset format unaligned # unaligned with '|' psql> \pset format csv # csv with '|' That is once the fieldsep has been used, you are back to the v4 behavior that you are disagreeing with in the first place. The patch creates an incomplete state automaton which some unreachable states, because once a value is set there is no way to switch back to the previous "use default" behavior. Also, if I do "\pset fielsep ''" then the \pset output does not allow to distinguish between the unset state and set to empty state. I would not expect a committer to accept such a strange no-possible-return to previous state because of a hidden state (whether fieldsep has been set or not in the past) behavior. So I do not think that this proposed version is really satisfactory. -- Fabien.
2018-03-24 10:06 GMT+01:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,I'm suggesting to add \csv which would behave like \H to toggle CSV
mode so as to improve this situation, with a caveat which is that
toggling back \csv would have forgotted the previous settings (just
like \H does, though, so would for instance reset to aligned with |),
so it would not be perfect.
this doesn't solve usual format settings by \pset format csv
Indeed, it does not. Alas, "format" is not strictly about format but more about string escaping.(2) your proposal as I understand it:
"\pset format csv" may or may not use the fieldsep, depending on
whether it was explicitely set, an information which is not shown,
i.e.:
\pset fieldsep # fieldsep separator is "|"
\pset format csv # would output a,b,c or a|b|c...
Because it depends on whether fieldsep was set explicitely to '|' or
whether it has this value but it was due to the default.
This kind of unclear behavioral determinism does not seem desirable.
please, check and test attached patch. It is very simply for usage - and
there is not any unclear behave. Just you should to accept so formats can
have own defaults for separators.
I checked, and the behavior is intrinsically strange.
Your patch shows an empty '' fieldsep at startup, which is debatable because it is not its actual value.
it is correct. Default format is aligned, that doesn't use fieldsep.
psql> \pset # fieldsep '' -- doubtful
ISTM that at the minimum it should show a special "<format-default>" or whatever value, which creates some kind of exception because this special value cannot be set and special values are a bad thing.
When a format is set, a current default fielsep is shown.
psql> \pset format unaligned
psql> \pset # fieldsep '|' -- new default shown
psql> SELECT 1 AS one, 2 AS two;
one|two
1|2
psql> \pset format csv
psql> \pset # fieldsep ',' -- new default shown
psql> SELECT 1 AS one, 2 AS two;
one,two
1,2
However, if fieldsep is set once, the behaviors changes indefinitely:
psql> \pset fieldsep '|'
psql> \pset format unaligned
# unaligned with '|'
psql> \pset format csv
# csv with '|'
There should be similar mechanism like fieldsep_zero, that reset settings. Some like fieldsep_default.
That is once the fieldsep has been used, you are back to the v4 behavior that you are disagreeing with in the first place.
The patch creates an incomplete state automaton which some unreachable states, because once a value is set there is no way to switch back to the previous "use default" behavior.
It is not implemented, but it is small problem
Also, if I do "\pset fielsep ''" then the \pset output does not allow to distinguish between the unset state and set to empty state.
This is question - how to fieldsep be related to current format. Aligned format doesn't use fieldsep.
The alternative can be words "not used" -- and then you can see difference between "" and other. But if you see the code, the empty string is used like info about empty fieldsep now.
I would not expect a committer to accept such a strange no-possible-return to previous state because of a hidden state (whether fieldsep has been set or not in the past) behavior.
This design is very similar to already implemented fieldsep_zero - look to code. It is nothing new.
So I do not think that this proposed version is really satisfactory.
So I can do better?
1. use special default string for formats that doesn't field sep - like "not used" or some
2. I can implemet the option fieldsep_default - very similary to fieldsep_zero to reset fieldsep to default state.
please, check updated patch
Regards
Pavel
--
Fabien.
Attachment
Hello Pavel, >> [...] > it is correct. Default format is aligned, that doesn't use fieldsep. My comment is that currently fieldsep is kind of a variable, the value of which is displayed and reliable wrt commands executed afterwards, and the proposed approach changes that by adding a new "default" state. >> [...] > There should be similar mechanism like fieldsep_zero, that reset settings. > Some like fieldsep_default. The user-visible fieldsep_zero is a horrible hack which exposes the fact that C strings cannot hold character zero because it is the end of string marker. The user should not have to care! I wish it would be simply removed, and that "\pset fieldsep '\0'" or something like that would do the trick. Having another one of these, which would create 4 boolean states which would have to be interpreted (eg if default is true and zero is true and sep is ';', and format is csv, what will I get?) by the user to guess what is going to happen looks unhelpful. For me adding another user-visible fieldsep-related boolean a no go. >> Also, if I do "\pset fielsep ''" then the \pset output does not allow to >> distinguish between the unset state and set to empty state. > > This is question - how to fieldsep be related to current format. Aligned > format doesn't use fieldsep. For me \pset reflects the current value of output options. With the proposed approach it does not mean that anymore, as already said above. >> I would not expect a committer to accept such a strange no-possible-return >> to previous state because of a hidden state (whether fieldsep has been set >> or not in the past) behavior. > > This design is very similar to already implemented fieldsep_zero - look to > code. It is nothing new. Alas you are right. I suggest not to go on the same path again:-) > So I can do better? Dunno. Possibly. > 1. use special default string for formats that doesn't field sep - like > "not used" or some > 2. I can implemet the option fieldsep_default - very similary to > fieldsep_zero to reset fieldsep to default state. I strongly dislike option 2, as expressed above. I would enthousiastically review any patch that would aim are removing these "*_zero" options. I might submit it someday. I'm more unclear about option 1. Maybe it could be managed cleanly. I'm still at odds that it would mean that \pset would not show the actual setting anymore, but something harder to explain, "actual value or some format-specific value, depending". -- Fabien.
2018-03-25 15:21 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,[...]it is correct. Default format is aligned, that doesn't use fieldsep.
My comment is that currently fieldsep is kind of a variable, the value of which is displayed and reliable wrt commands executed afterwards, and the proposed approach changes that by adding a new "default" state.[...]There should be similar mechanism like fieldsep_zero, that reset settings.
Some like fieldsep_default.
The user-visible fieldsep_zero is a horrible hack which exposes the fact that C strings cannot hold character zero because it is the end of string marker. The user should not have to care! I wish it would be simply removed, and that "\pset fieldsep '\0'" or something like that would do the trick.
Having another one of these, which would create 4 boolean states which would have to be interpreted (eg if default is true and zero is true and sep is ';', and format is csv, what will I get?) by the user to guess what is going to happen looks unhelpful.
For me adding another user-visible fieldsep-related boolean a no go.Also, if I do "\pset fielsep ''" then the \pset output does not allow to
distinguish between the unset state and set to empty state.
This is question - how to fieldsep be related to current format. Aligned
format doesn't use fieldsep.
For me \pset reflects the current value of output options. With the proposed approach it does not mean that anymore, as already said above.I would not expect a committer to accept such a strange no-possible-return
to previous state because of a hidden state (whether fieldsep has been set
or not in the past) behavior.
This design is very similar to already implemented fieldsep_zero - look to
code. It is nothing new.
Alas you are right. I suggest not to go on the same path again:-)So I can do better?
Dunno. Possibly.1. use special default string for formats that doesn't field sep - like
"not used" or some
2. I can implemet the option fieldsep_default - very similary to
fieldsep_zero to reset fieldsep to default state.
I strongly dislike option 2, as expressed above. I would enthousiastically review any patch that would aim are removing these "*_zero" options. I might submit it someday.
I can remove it simply - but a alternative is implementation of some \pset_reset command maybe:
\pset reset fieldsep
what do you think?
I'm more unclear about option 1. Maybe it could be managed cleanly.
I'm still at odds that it would mean that \pset would not show the actual setting anymore, but something harder to explain, "actual value or some format-specific value, depending".
It can be formulated little bit different - "when a value of field separator is not entered, then format specific default is used (if can be specified - some formats doesn't allow to specify field separator)."
Regards
Pavel
--
Fabien.
2018-03-25 18:02 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
2018-03-25 15:21 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,[...]it is correct. Default format is aligned, that doesn't use fieldsep.
My comment is that currently fieldsep is kind of a variable, the value of which is displayed and reliable wrt commands executed afterwards, and the proposed approach changes that by adding a new "default" state.[...]There should be similar mechanism like fieldsep_zero, that reset settings.
Some like fieldsep_default.
The user-visible fieldsep_zero is a horrible hack which exposes the fact that C strings cannot hold character zero because it is the end of string marker. The user should not have to care! I wish it would be simply removed, and that "\pset fieldsep '\0'" or something like that would do the trick.
Having another one of these, which would create 4 boolean states which would have to be interpreted (eg if default is true and zero is true and sep is ';', and format is csv, what will I get?) by the user to guess what is going to happen looks unhelpful.
For me adding another user-visible fieldsep-related boolean a no go.Also, if I do "\pset fielsep ''" then the \pset output does not allow to
distinguish between the unset state and set to empty state.
This is question - how to fieldsep be related to current format. Aligned
format doesn't use fieldsep.
For me \pset reflects the current value of output options. With the proposed approach it does not mean that anymore, as already said above.I would not expect a committer to accept such a strange no-possible-return
to previous state because of a hidden state (whether fieldsep has been set
or not in the past) behavior.
This design is very similar to already implemented fieldsep_zero - look to
code. It is nothing new.
Alas you are right. I suggest not to go on the same path again:-)So I can do better?
Dunno. Possibly.1. use special default string for formats that doesn't field sep - like
"not used" or some
2. I can implemet the option fieldsep_default - very similary to
fieldsep_zero to reset fieldsep to default state.
I strongly dislike option 2, as expressed above. I would enthousiastically review any patch that would aim are removing these "*_zero" options. I might submit it someday.I can remove it simply - but a alternative is implementation of some \pset_reset command maybe:\pset reset fieldsepwhat do you think?
implemented in attached patch
I'm more unclear about option 1. Maybe it could be managed cleanly.
I'm still at odds that it would mean that \pset would not show the actual setting anymore, but something harder to explain, "actual value or some format-specific value, depending".It can be formulated little bit different - "when a value of field separator is not entered, then format specific default is used (if can be specified - some formats doesn't allow to specify field separator)."
it can be clean from following result
postgres=# \pset fieldsep
User didn't specified field separator.
Current format doesn't specify default field separator.
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# \pset fieldsep
Field separator is ";".
postgres=# \pset fieldsep
User didn't specified field separator.
Current format doesn't specify default field separator.
postgres=# \pset fieldsep ;
Field separator is ";".
postgres=# \pset fieldsep
Field separator is ";".
fieldsep_default was removed
Regards
Pavel
RegardsPavel
--
Fabien.
Attachment
Pavel Stehule wrote: > implemented in attached patch Consider your initial output of \pset, when no option is specified ================= $ ./psql psql (11devel) Type "help" for help. postgres=# \pset border 1 columns 0 expanded off fieldsep not used fieldsep_zero ERROR footer on format aligned linestyle ascii null '' numericlocale off pager 1 pager_min_lines 0 recordsep '\n' recordsep_zero off reset ERROR tableattr title tuples_only off unicode_border_linestyle single unicode_column_linestyle single unicode_header_linestyle single ================ These entries with ERROR correspond in fact to no error at all, or we have to pretend that the default state of psql is erroneous, which doesn't make sense. Also "reset" is not a variable, it seems to be a command, so it probably shouldn't be there in the first place. More generally, I'd think the point of reusing "fieldsep" was to reuse the concept, not reimplement it, let alone changing bits of behavior of the unaligned mode along the way. With this patch, again without specifying any option, just looking at what fieldsep is leads to this: postgres=# \pset fieldsep User didn't specified field separator. Current format doesn't specify default field separator. If this is the way to "solve" the fact that a user has to do \pset fieldsep ',' to get commas in csv mode, then IMV the proposed solution is clearly worse than the stated problem, and worse than simply adding fieldsep_csv to be independant from the unaligned mode. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-03-26 14:44 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> implemented in attached patch
Consider your initial output of \pset, when no option is specified
=================
$ ./psql
psql (11devel)
Type "help" for help.
postgres=# \pset
border 1
columns 0
expanded off
fieldsep not used
fieldsep_zero ERROR
footer on
format aligned
linestyle ascii
null ''
numericlocale off
pager 1
pager_min_lines 0
recordsep '\n'
recordsep_zero off
reset ERROR
tableattr
title
tuples_only off
unicode_border_linestyle single
unicode_column_linestyle single
unicode_header_linestyle single
================
These entries with ERROR correspond in fact to no error at all,
or we have to pretend that the default state of psql is erroneous,
which doesn't make sense.
Also "reset" is not a variable, it seems to be a command,
so it probably shouldn't be there in the first place.
fixed
More generally, I'd think the point of reusing "fieldsep" was to
reuse the concept, not reimplement it, let alone changing
bits of behavior of the unaligned mode along the way.
With this patch, again without specifying any option, just looking
at what fieldsep is leads to this:
postgres=# \pset fieldsep
User didn't specified field separator.
Current format doesn't specify default field separator.
If this is the way to "solve" the fact that a user has to do
\pset fieldsep ','
to get commas in csv mode, then IMV the proposed solution
is clearly worse than the stated problem, and worse than
simply adding fieldsep_csv to be independant from the
unaligned mode.
I don't understand what is wrong there? There can be any message (short - just "unset",...). I understand so default visual can look strange, because aligned mode has not default field separator, but I don't see any other possibility.
Can I do some recapitulation:
1. using CSV with default | as field separator is wrong - probably there is a agreement
2. partial solution is fieldsep_X where X will be format name. It is better than using | for csv, but it introduces new problems:
a) there are not new long, short options for this value
b) the list of pset options is bloating - every possible new format can introduce fieldsep_X option
c) can be messy for people, because the setting fieldsep can has zero effect on csv or some other formats that don't share default with unaligned format.
So my position - I am very strong against to introduce CSV format with | as field separator, and I am not happy if we introduce fieldsep_X like options, because it is not too good too.
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Attachment
Pavel Stehule wrote: > b) the list of pset options is bloating - every possible new format can > introduce fieldsep_X option What new format? The usefulness of fieldsep does not extend outside of xSV, and it's no suprise that there have been no other use for a fieldsep-like variable until now, with the other formats supported in psql. In fact it's even absurd for any format to use a non-fixed separator at a place that is key for being parsed unambiguously. We could even support only the comma and make it non-configurable based on the fact it's Comma-Separated-Values, not Whatever-Separated-Values, except that won't do much to serve the users interests, as the reality is that people use various separators. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-03-26 20:26 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> b) the list of pset options is bloating - every possible new format can
> introduce fieldsep_X option
What new format?
The usefulness of fieldsep does not extend outside of xSV, and it's
no suprise that there have been no other use for a fieldsep-like
variable until now, with the other formats supported in psql.
I can imagine format based on \t as field separator, or some other invisible chars.
In fact it's even absurd for any format to use a non-fixed separator
at a place that is key for being parsed unambiguously.
We could even support only the comma and make it non-configurable
based on the fact it's Comma-Separated-Values, not
Whatever-Separated-Values, except that won't do much
to serve the users interests, as the reality is that
people use various separators.
I don't think so allow only comma as separator is the solution.
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hello Pavel, I'd like to convince you to compromise, because otherwise I'm afraid we will not get this feature. >> 1. use special default string for formats that doesn't field sep - like >>> "not used" or some >>> 2. I can implemet the option fieldsep_default - very similary to >>> fieldsep_zero to reset fieldsep to default state. >>> >> >> I strongly dislike option 2, as expressed above. I would enthousiastically >> review any patch that would aim are removing these "*_zero" options. I >> might submit it someday. >> > > I can remove it simply - but a alternative is implementation of some > \pset_reset command maybe: > > \pset reset fieldsep > > what do you think? I think that changing the semantics of \pset is a nonstarter, it should be only about "output [p]arameter [set]ting", because it has been like that for the last XX years and people are used to it. >> I'm more unclear about option 1. Maybe it could be managed cleanly. >> >> I'm still at odds that it would mean that \pset would not show the actual >> setting anymore, but something harder to explain, "actual value or some >> format-specific value, depending". > > It can be formulated little bit different - "when a value of field > separator is not entered, then format specific default is used (if can be > specified - some formats doesn't allow to specify field separator)." Well, that is 3 lines of explanations where people thought they were just setting a simple variable to a simple value, or showing the actual current value which would be used if needed. My opinion is that some of what you are suggesting could have participated to an alternate and interesting output-parameters-for-psql design, but we are a much too late to change that. The purpose of the patch is just to enable having clean CSV quite easily from psql, possibly for pg11... Changing the design and the underlying user visible behavior would require a much wider and difficult to obtain consensus, and is very unlikely to get in for pg11, if ever. The current version allows "--csv" or need two "\pset" to achieve CSV, without changing the preceding behavior, however unperfect it is... Could you live with that with the benefit of getting it in? I do not claim it is a perfect solution, just that it is a reasonable one. The dynamic default changing approach departs too much for the current user expectations, the user-benefit is not worth it, and committers are very likely to think like that. The only no-surprise, no-behavioral-change, alternative I see is to have a csv-specific fieldsep. I'm not keen on that one because it is yet another variable, one has to remember it, and then it asks the question about recordsep... and I think there are already too many parameters, and more is not desirable, although I probably could live with that if I can live with "fieldsep_zero":-) -- Fabien.
2018-03-27 11:35 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,
I'd like to convince you to compromise, because otherwise I'm afraid we will not get this feature.1. use special default string for formats that doesn't field sep - like"not used" or some
2. I can implemet the option fieldsep_default - very similary to
fieldsep_zero to reset fieldsep to default state.
I strongly dislike option 2, as expressed above. I would enthousiastically
review any patch that would aim are removing these "*_zero" options. I
might submit it someday.
I can remove it simply - but a alternative is implementation of some
\pset_reset command maybe:
\pset reset fieldsep
what do you think?
I think that changing the semantics of \pset is a nonstarter, it should be only about "output [p]arameter [set]ting", because it has been like that for the last XX years and people are used to it.I'm more unclear about option 1. Maybe it could be managed cleanly.
I'm still at odds that it would mean that \pset would not show the actual
setting anymore, but something harder to explain, "actual value or some
format-specific value, depending".
It can be formulated little bit different - "when a value of field
separator is not entered, then format specific default is used (if can be
specified - some formats doesn't allow to specify field separator)."
Well, that is 3 lines of explanations where people thought they were just setting a simple variable to a simple value, or showing the actual current value which would be used if needed.
My opinion is that some of what you are suggesting could have participated to an alternate and interesting output-parameters-for-psql design, but we are a much too late to change that.
The purpose of the patch is just to enable having clean CSV quite easily from psql, possibly for pg11... Changing the design and the underlying user visible behavior would require a much wider and difficult to obtain consensus, and is very unlikely to get in for pg11, if ever.
The current version allows "--csv" or need two "\pset" to achieve CSV, without changing the preceding behavior, however unperfect it is...
Could you live with that with the benefit of getting it in? I do not claim it is a perfect solution, just that it is a reasonable one.
The dynamic default changing approach departs too much for the current user expectations, the user-benefit is not worth it, and committers are very likely to think like that.
The only no-surprise, no-behavioral-change, alternative I see is to have a csv-specific fieldsep. I'm not keen on that one because it is yet another variable, one has to remember it, and then it asks the question about recordsep... and I think there are already too many parameters, and more is not desirable, although I probably could live with that if I can live with "fieldsep_zero":-)
I don't share your opinion so introduction csv-specific fieldsep is surprise less. Current design is wrong - this thread is a evidence. And if we introduce csv-specific fieldsep, then we multiply this wrong design. The fix in this direction is renaming fieldsep to fieldsep-unaliagn - but it is probably too big change too. So this design is nothing what I can mark as good solution. I can live with because it is much better than using pipe as separator for csv, and because real impact is small - for almost people it will be invisible - but have not good feeling from it. Are there some possible alternatives?
Regards
Pavel
--
Fabien.
Hello Pavel, >> I'd like to convince you to compromise, because otherwise I'm afraid we >> will not get this feature. > > [...] >> >> The only no-surprise, no-behavioral-change, alternative I see is to have a >> csv-specific fieldsep. I'm not keen on that one because it is yet another >> variable, one has to remember it, and then it asks the question about >> recordsep... and I think there are already too many parameters, and more is >> not desirable, although I probably could live with that if I can live with >> "fieldsep_zero":-) >> >> > I don't share your opinion so introduction csv-specific fieldsep is > surprise less. Current design is wrong - this thread is a evidence. Wrong is too strong a word. Current design is not perfect, sure. Proposed alternatives are somehow worse, at least to some people mind, which explains why we arrived on this version after a few iterations. > And if we introduce csv-specific fieldsep, then we multiply this wrong > design. The fix in this direction is renaming fieldsep to > fieldsep-unaliagn - but it is probably too big change too. So this > design is nothing what I can mark as good solution. Good, we somehow agree on something! > I can live with because it is much better than using pipe as separator > for csv, and because real impact is small - for almost people it will be > invisible - but have not good feeling from it. > Are there some possible alternatives? Given the date and the fact that the cf end is 3 days away, the proposed short term alternative is Daniel's version, that I feel is reasonable. Ok, people have to do two pset to get comma-separated csv, otherwise they get pipe-separated csv in one pset. You could compromise on that for now, and submit an improvement patch for a later version if you wish. Otherwise, ISTM that the current result of this thread is that there will be no simple CSV in pg11:-( Or maybe I can mark Daniel's latest version as "ready" and point out that there is some disagreement on the thread, so it is not a full consensus. Then a committer can decide whether it is better in like that or that it should be put back in some design stage, possibly with their preference wrt to the kind of solution they think best. -- Fabien.
2018-03-28 10:24 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Pavel,I'd like to convince you to compromise, because otherwise I'm afraid we
will not get this feature.
[...]I don't share your opinion so introduction csv-specific fieldsep is
The only no-surprise, no-behavioral-change, alternative I see is to have a
csv-specific fieldsep. I'm not keen on that one because it is yet another
variable, one has to remember it, and then it asks the question about
recordsep... and I think there are already too many parameters, and more is
not desirable, although I probably could live with that if I can live with
"fieldsep_zero":-)
surprise less. Current design is wrong - this thread is a evidence.
Wrong is too strong a word. Current design is not perfect, sure. Proposed alternatives are somehow worse, at least to some people mind, which explains why we arrived on this version after a few iterations.And if we introduce csv-specific fieldsep, then we multiply this wrong design. The fix in this direction is renaming fieldsep to fieldsep-unaliagn - but it is probably too big change too. So this design is nothing what I can mark as good solution.
Good, we somehow agree on something!I can live with because it is much better than using pipe as separator for csv, and because real impact is small - for almost people it will be invisible - but have not good feeling from it.Are there some possible alternatives?
Given the date and the fact that the cf end is 3 days away, the proposed short term alternative is Daniel's version, that I feel is reasonable. Ok, people have to do two pset to get comma-separated csv, otherwise they get pipe-separated csv in one pset.
I dislike the last Daniel's design. I am sorry.
You could compromise on that for now, and submit an improvement patch for a later version if you wish.
I am able to accept csv specific field sep independent on unaligned field sep.
I have not any other idea. And there is not any good (acceptable) ideas. It is hard to expect so there will be change next year. This space is small, and there are not too much possible variants. We checked all possibility without any agreement.
Otherwise, ISTM that the current result of this thread is that there will be no simple CSV in pg11:-(
Can be. If there is not good enough solution now. If we merge it now, then will be hard to change it due possible compatibility issues.
Or maybe I can mark Daniel's latest version as "ready" and point out that there is some disagreement on the thread, so it is not a full consensus. Then a committer can decide whether it is better in like that or that it should be put back in some design stage, possibly with their preference wrt to the kind of solution they think best.
You can do it. But from my view the Daniel's latest version (with respect to his work) is the worst variant :(. So I am against to commit this version.
Regards
Pavel
--
Fabien.
On Wednesday, March 28, 2018, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Are there some possible alternatives?
Given the date and the fact that the cf end is 3 days away, the proposed short term alternative is Daniel's version, that I feel is reasonable. Ok, people have to do two pset to get comma-separated csv, otherwise they get pipe-separated csv in one pset.
Could someone post how captions, rows-only, and footer pset settings factor into this? Specifically are they fixed to on/off or will they hide/show if users request them explicitly?
My take on this is that --csv mode is/should be an alternate output mode from the existing pset controlled one, and functions basically like "\copy to stdout" and all echoing and metadata outputs are disabled and only query results, with header and the user specified delimiter, are output. No control other than the delimiter seems to be provided in the current design but that could be expanded upon. In that specification the existing fieldsep argument that is tied to pset should not be used and something like --csv-fieldsep should be provided (I like the prefixing to tie the option lexically to the master --csv option).
David J.
On Wednesday, March 28, 2018, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, March 28, 2018, Pavel Stehule <pavel.stehule@gmail.com> wrote:Are there some possible alternatives?
Given the date and the fact that the cf end is 3 days away, the proposed short term alternative is Daniel's version, that I feel is reasonable. Ok, people have to do two pset to get comma-separated csv, otherwise they get pipe-separated csv in one pset.Could someone post how captions, rows-only, and footer pset settings factor into this? Specifically are they fixed to on/off or will they hide/show if users request them explicitly?
I found the original post that covers that indeed we simply fix these values, which is why I was thinking.
and something like --csv-fieldsep should be provided (I like the prefixing to tie the option lexically to the master --csv option).
Or, really, just make --csv take an optional argument which, if present, is the delimiter. No separate argument needed, and we ignore the pset fieldsep argument like we ignore everything else.
Need to decide whether to "not ignore" --tuples-only, which doesn't seem controversial aside from being a pset argument that isn't being ignored in this design...
David J.
On Monday, March 26, 2018, Daniel Verite <daniel@manitou-mail.org> wrote:
We could even support only the comma and make it non-configurable
based on the fact it's Comma-Separated-Values, not
Whatever-Separated-Values, except that won't do much
to serve the users interests, as the reality is that
people use various separators.
I like to call it "Character Separated Values" now for just that reason.
David J.
On 03/28/2018 12:35 PM, David G. Johnston wrote:
On Monday, March 26, 2018, Daniel Verite <daniel@manitou-mail.org> wrote:
We could even support only the comma and make it non-configurable
based on the fact it's Comma-Separated-Values, not
Whatever-Separated-Values, except that won't do much
to serve the users interests, as the reality is that
people use various separators.I like to call it "Character Separated Values" now for just that reason.
Isn't the actual wording Character Delimited Values? I may be picking at hairs here but every single time I use anything to import a CSV or other delimited file (TAB or | usually) that is what the import screen says.
JD
David J.
-- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc *** A fault and talent of mine is to tell it exactly how it is. *** PostgreSQL centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://postgresconf.org ***** Unless otherwise stated, opinions are my own. *****
On Wednesday, March 28, 2018, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
And if we introduce csv-specific fieldsep, then we multiply this wrong design. The fix in this direction is renaming fieldsep to fieldsep-unaliagn - but it is probably too big change too. So this design is nothing what I can mark as good solution.
Good, we somehow agree on something!I can live with because it is much better than using pipe as separator for csv, and because real impact is small - for almost people it will be invisible - but have not good feeling from it.
Concretely...I'm in favor of the "\pset fieldsep_csv ," solution and csv format should always use its existing value. Teach \pset fieldsep to fail if the current format is csv. Being able to specify the csv fieldsep like "\pset format csv ;" would be a plus.
Unaligned format could grow its own fieldsep if it wanted to but it can also just use the default provided fieldsep var whose default value is pipe. If it did grow one it would need to understand "not set" in order to preserve existing behavior. We don't have that problem with csv.
I don't believe we can modify fieldsep without causing unwanted grief.
David J.
On 28 March 2018 at 15:43, Joshua D. Drake <jd@commandprompt.com> wrote:
On 03/28/2018 12:35 PM, David G. Johnston wrote:I like to call it "Character Separated Values" now for just that reason.
Isn't the actual wording Character Delimited Values? I may be picking at hairs here but every single time I use anything to import a CSV or other delimited file (TAB or | usually) that is what the import screen says.
CSV stands for Comma Separated Values, and not anything else common as far as I can tell. A Google search for "csv" turns up the Wikipedia page describing the file format as the first hit, followed by the Wikipedia disambiguation page for CSV, which links to the aforementioned Wikipedia page as the only file-format-related link.
The actual reason I'm posting this is because some of the discussion has made me a bit confused: there is already a CSV format defined for the COPY command and used by the psql \copy. I just want to check that what is being discussed here would use the exact same format as the existing CSV COPY format; and the configurability of them should be the exact same options (which already includes being able to change the delimiter). I think it's important that Postgres not have two CSVs with slightly different behaviours. Indeed, psql could use COPY behind the scenes to generate the CSV output, which would guarantee no nasty surprises.
If this is already widely accepted or if I'm horribly misunderstanding the discussion then I'm sorry for the noise.
On 28 March 2018 at 15:43, Joshua D. Drake <jd@commandprompt.com> wrote:On 03/28/2018 12:35 PM, David G. Johnston wrote:I like to call it "Character Separated Values" now for just that reason.
Isn't the actual wording Character Delimited Values? I may be picking at hairs here but every single time I use anything to import a CSV or other delimited file (TAB or | usually) that is what the import screen says.
CSV stands for Comma Separated Values, and not anything else common as far as I can tell.
Language evolves - Wikipedia just hasn't recognized this particular evolution yet :)
The actual reason I'm posting this is because some of the discussion has made me a bit confused: there is already a CSV format defined for the COPY command and used by the psql \copy. I just want to check that what is being discussed here would use the exact same format as the existing CSV COPY format; and the configurability of them should be the exact same options (which already includes being able to change the delimiter).
Nope, the \copy meta-command, and corresponding COPY SQL command, have additional options that I don't believe will be accessible here. Specifically the "OIDS", "QUOTE", and various "FORCE_" options are not controllable, nor are ESCAPE and ENCODING (this last I think, or at least not as flexible).
I think it's important that Postgres not have two CSVs with slightly different behaviours. Indeed, psql could use COPY behind the scenes to generate the CSV output, which would guarantee no nasty surprises.If this is already widely accepted or if I'm horribly misunderstanding the discussion then I'm sorry for the noise.
I agree this is a possible approach but the way the thinking is presently is that this is just another "\pset format" option with forced defaults excepting the delimiter (DELIMITER),presence of the column names (HEADER), NULL (I think...)
Issuing \copy with the equivalent settings should result in the same output...but the two mechanisms are distinct for better and worse.
David J.
David G. Johnston wrote: > Could someone post how captions, rows-only, and footer pset settings factor > into this? Specifically are they fixed to on/off or will they hide/show if > users request them explicitly? This is described in the doc with: + <para><literal>csv</literal> format writes columns separated + by <literal>fieldsep</literal>, applying the CSV quoting rules + described in RFC-4180 and compatible with the CSV format + of the <command>COPY</command> command. + The header with column names is output unless the + <literal>tuples_only</literal> parameter is <literal>on</literal>. + Title and footers are not printed. + </para> Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
David G. Johnston wrote: > Or, really, just make --csv take an optional argument which, if present, is > the delimiter. I don't think psql can support optional arguments because psql --option foo would be ambiguous, since foo could be the option's value or the name of a database to connect to. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-03-29 14:17 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
David G. Johnston wrote:
> Or, really, just make --csv take an optional argument which, if present, is
> the delimiter.
I don't think psql can support optional arguments because
psql --option foo
would be ambiguous, since foo could be the option's value or
the name of a database to connect to.
sure
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Isaac Morland wrote: > The actual reason I'm posting this is because some of the discussion has > made me a bit confused: there is already a CSV format defined for the COPY > command and used by the psql \copy. I just want to check that what is being > discussed here would use the exact same format as the existing CSV COPY Please see the start of the thread at https://www.postgresql.org/message-id/a8de371e-006f-4f92-ab72-2bbe3ee78f03%40manitou-mail.org where what is proposed is compared to what \copy and COPY already provide. About the CSV acronym itself, the COPY documentation https://www.postgresql.org/docs/current/static/sql-copy.html already refers to it as "Comma Separated Values", even though as we know, the delimiter is user-configurable. There's no difference in psql that would justify a different wording. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
David G. Johnston wrote: > Unaligned format could grow its own fieldsep if it wanted to but it can > also just use the default provided fieldsep var whose default value is > pipe. If it did grow one it would need to understand "not set" in order to > preserve existing behavior. We don't have that problem with csv. fielsep is already owned by the unaligned format. No other format uses fieldsep currently. Why would it needs to grow its own? I don't quite see what you mean here. For csv, Fabien and Peter expressed the opinion that we shouldn't create another fieldsep-like variable specifically for it, but instead reuse fieldsep. That's what my latest patch does. Now it turns out that sharing fieldsep comes with some problems. 1. since a single variable can't have two different default values, we have either to accept '|' as a default csv separator, or introduce some tricks to automagically commute the value. 2. when a user does \pset fieldsep ';' there are doubts on whether this should affect the current mode only, or both, or even the other one when the user goes back and forth between formats. Any proposal where a \pset var value affects by side-effect another pset variable is inconsistent with what has been done until now with these variables, and I expect that side-effects are not exactly sought after in general. 3. the command-line cannot express: use character 'X' for unaligned and 'Y' for csv for the rest of the session. The current patch provides '--csv' to select both the csv format and a comma separator, but when combined with -F, the end result depend on the relative position of the options, which may feel unintuitive or buggy. Again we could tweak that, but only by being inconsistent with how we process other options. Personally I think the benefit of sharing fieldsep is not worth these problems, but I'm waiting for the discussion to continue with more opinions. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
David G. Johnston wrote:
> Unaligned format could grow its own fieldsep if it wanted to but it can
> also just use the default provided fieldsep var whose default value is
> pipe. If it did grow one it would need to understand "not set" in order to
> preserve existing behavior. We don't have that problem with csv.
fielsep is already owned by the unaligned format. No other format
uses fieldsep currently. Why would it needs to grow its own?
I don't quite see what you mean here.
Its mostly a terminology concern...
For csv, Fabien and Peter expressed the opinion that we shouldn't
create another fieldsep-like variable specifically for it, but instead
reuse fieldsep. That's what my latest patch does.
Now it turns out that sharing fieldsep comes with some problems.[...]
Personally I think the benefit of sharing fieldsep is not worth these
problems, but I'm waiting for the discussion to continue with
more opinions.
Basically that because "fieldsep" lacks any kind of suffix it can be considered "unowned" versus "fieldsep_csv" which would be "owned" by the "csv" format. Given the problems I do want an "owned by csv" variable. An owned version for unaligned would then be named "fieldsep_unaligned". But we wouldn't want to rename fieldsep, we'd simply leave it in place as a generic fieldsep variable that formats can choose to use if they wish - with the limitations that come with sharing it with unaligned. Decent odds that no other format would want to but we cannot remove it so we might as well better describe its place now that two formats care about field separation.
fieldsep is "shared" even though at present no one else is sharing it. fieldsep_csv is not shared. fieldsep_unaligned doesn't exist because unaligned uses the shared variable. No one else uses fieldsep, shared or owned.
Saying the "fieldsep" is owned by unaligned mode would be technically correct, though there is no "mark" that makes is readily obvious.
Implementation wise "ownership" could mean that changing the delimiter can only occur while the format type matches. But we could not do that for fieldsep since no such ordering currently is enforced.
David J.
Personally I think the benefit of sharing fieldsep is not worth these
problems, but I'm waiting for the discussion to continue with
more opinions.
Apologies in advance if I mis-represent someone's position.
It seems like having a dedicated option is the consensus opinion. Daniel (the original author) and I both prefer it, Pavel will accept it. Fabien is opposed.
Peter E. was opposed, wanting to leverage both fieldsep and recordsep, but hasn't chimed in recently. His opinion at this point might push this over the edge since he is also a committer.
I would probably suggest maybe just calling it "\pset separator" to match the "S" in "cSv" and not have any name overlap with the fieldsep variable used with unaligned mode. The user will have to learn anything and being more distinct should help the process. We would not consult recordsep though the end-of-line choice should be platform dependent.
David J.
2018-03-30 0:15 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
Personally I think the benefit of sharing fieldsep is not worth these
problems, but I'm waiting for the discussion to continue with
more opinions.Apologies in advance if I mis-represent someone's position.It seems like having a dedicated option is the consensus opinion. Daniel (the original author) and I both prefer it, Pavel will accept it. Fabien is opposed.Peter E. was opposed, wanting to leverage both fieldsep and recordsep, but hasn't chimed in recently. His opinion at this point might push this over the edge since he is also a committer.I would probably suggest maybe just calling it "\pset separator" to match the "S" in "cSv" and not have any name overlap with the fieldsep variable used with unaligned mode. The user will have to learn anything and being more distinct should help the process. We would not consult recordsep though the end-of-line choice should be platform dependent.
-1. The difference between fieldsep and separator is not clear, and the relation between separator and csv is not clean too.
fieldsep_csv or csv_fieldsep is not nice, but it is clear.
Regards
Pavel
David J.
Bonjour Daniel, > For csv, Fabien and Peter expressed the opinion that we shouldn't > create another fieldsep-like variable specifically for it, but instead > reuse fieldsep. That's what my latest patch does. > > Now it turns out that sharing fieldsep comes with some problems. Personnaly I do not have any problem with CSV defaulting to '|' separator, given that anyway people often use anything but a comma for the purpose, including '|'. However Pavel wants to block the patch on this point. Too bad. Maybe you can do (again) a patch with a 'fieldsep_csv' or whatever additional variable? Not sure about 'recordsep'... And then we forward to committers both version and they can chose whatever they think best, including not committing anything? -- Fabien.
On 31/03/18 21:33, Fabien COELHO wrote: > > Bonjour Daniel, > >> For csv, Fabien and Peter expressed the opinion that we shouldn't >> create another fieldsep-like variable specifically for it, but instead >> reuse fieldsep. That's what my latest patch does. >> >> Now it turns out that sharing fieldsep comes with some problems. > > Personnaly I do not have any problem with CSV defaulting to '|' > separator, given that anyway people often use anything but a comma for > the purpose, including '|'. > > However Pavel wants to block the patch on this point. Too bad. > > Maybe you can do (again) a patch with a 'fieldsep_csv' or whatever > additional variable? Not sure about 'recordsep'... > > And then we forward to committers both version and they can chose > whatever they think best, including not committing anything? > Would suggest the choice of 'fieldsep_csv' out of those two (a record would correspond to the whole row), and the default being a comma. Cheers, Gavin
On 31 March 2018 at 04:33, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Bonjour Daniel,For csv, Fabien and Peter expressed the opinion that we shouldn't
create another fieldsep-like variable specifically for it, but instead
reuse fieldsep. That's what my latest patch does.
Now it turns out that sharing fieldsep comes with some problems.
Personnaly I do not have any problem with CSV defaulting to '|' separator, given that anyway people often use anything but a comma for the purpose, including '|'.
However Pavel wants to block the patch on this point. Too bad.
OK, mostly trying to avoid commenting because I doubt I have much to add. But. If I ask for CSV and don't specify any overrides, I expect to get "C"omma separated values, not some other character. More specifically, if I say --csv I expect to get files that are identical with what I would get if I used COPY ... CSV. Actually, COPY ... CSV HEADER, given that psql shows column headings. This also implies that I expect the quoting and related details that are associated with CSV.
And I don't think I'm a weird user. If --csv does anything even a little different from a simple COPY invocation on the same query, at some point it's going to bite somebody and they will rightfully curse the design decisions taken in this thread.
Hello Isaac, >> Personnaly I do not have any problem with CSV defaulting to '|' separator, >> given that anyway people often use anything but a comma for the purpose, >> including '|'. >> >> However Pavel wants to block the patch on this point. Too bad. > > OK, mostly trying to avoid commenting because I doubt I have much to > add. But. If I ask for CSV and don't specify any overrides, I expect to > get "C"omma separated values, not some other character. More > specifically, if I say --csv I expect to get files that are identical > with what I would get if I used COPY ... CSV. My summary was incomplete. The --csv option implementation by Daniel already does that. The issue Pavel is complaining about is that in interactive mode "\pset format csv" does not do the same: it triggers the csv-rule string-escaping mechanism, but does not reset the "fieldsep" variable (eh, it sets the "format" variable) so the default separator under this interactive use is "|" if the "fieldsep" variable is shared. I have suggested a "\csv" interactive command to set both as a convenient shorthand for "\pset format csv & \pset fieldsep ','", similarly to --csv, but Pavel still wants "\pset format csv" to trigger the full csv output. A consequence I forgot about adding "fieldsep_csv", is that it probably has to duplicate the "_zero" ugly hack to be consistent with existing "*sep" variables, or else be inconsistent. Sigh. -- Fabien.
2018-04-01 8:30 GMT+02:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Isaac,Personnaly I do not have any problem with CSV defaulting to '|' separator,
given that anyway people often use anything but a comma for the purpose,
including '|'.
However Pavel wants to block the patch on this point. Too bad.
OK, mostly trying to avoid commenting because I doubt I have much to add. But. If I ask for CSV and don't specify any overrides, I expect to get "C"omma separated values, not some other character. More specifically, if I say --csv I expect to get files that are identical with what I would get if I used COPY ... CSV.
My summary was incomplete. The --csv option implementation by Daniel already does that.
The issue Pavel is complaining about is that in interactive mode "\pset format csv" does not do the same: it triggers the csv-rule string-escaping mechanism, but does not reset the "fieldsep" variable (eh, it sets the "format" variable) so the default separator under this interactive use is "|" if the "fieldsep" variable is shared.
I have suggested a "\csv" interactive command to set both as a convenient shorthand for "\pset format csv & \pset fieldsep ','", similarly to --csv, but Pavel still wants "\pset format csv" to trigger the full csv output.
A consequence I forgot about adding "fieldsep_csv", is that it probably has to duplicate the "_zero" ugly hack to be consistent with existing "*sep" variables, or else be inconsistent. Sigh.
there are not any relation between fieldsep_csv and fieldsep_zero.
The base of this issue is in wrong concept of fieldsep. It is based on idea so one value can be good enough for all formats. It is worked when all formats that used this variable is ONE format - unaligned.
similar issue is with tuplesonly option - see Isaac's mail.
For different formats the different defaults are more consistent - the other formats does it - doesn't use fieldsep - but it lost internal consistency, because its does quietly.
We can have different ideas about users's expectations - but we should to share opinion, so correct solution should be consistent. If I choose some format, then behave should be same without dependency on activation mechanism.
so \csv should be same like \pset format csv or --csv.
I don't share option so CSV format should be exactly same like CSV COPY. COPY is designed for backups - and header is not too important there. When I seen some csv, then there usually header was used.
If I can search some precedent, then it is \pset linestyle and \pset unicode_*** options.
so we can to have
\pset format xxx
and set of local possibly changed defaults
\pset csv_fieldsep ,
\pset csv_tuplesonly on
\pset unaligned_fieldsep |
\pset unaligned_tuplesonly off
the fieldsep, tuplesonly can be just links to currently used ****_fieldsep and ****_tuplesonly
This is consistent - defaults can be correct for Isaac, and I can paste to my .psqlrc
\pset csv_tuplesonly off
and the semantic is clean, and this option will be active just for csv, and doesn't depend on current format - so it can be used in .psqlrc
When it is works well for linestyle, then it can works for some other too
Regards
Pavel
--
Fabien.
On 4/1/18 03:27, Pavel Stehule wrote: > I don't share option so CSV format should be exactly same like CSV COPY. > COPY is designed for backups - and header is not too important there. > When I seen some csv, then there usually header was used. I think in practice a lot of people use COPY also because it's a nice way to get CSV output, even if it's not for backups. The options that COPY has for CSV are clearly designed around making the output compatible with various CSV-variants. If we create another way to do CSV in psql, it would be totally confusing if some of the options are the same and some are different or missing. Another thought: Isn't CSV just the same as unaligned output plus some quoting? Could we add a quote character setting and then define --csv to be quote-character = " and fieldsep = , ? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Apr 5, 2018, at 11:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 4/1/18 03:27, Pavel Stehule wrote: >> I don't share option so CSV format should be exactly same like CSV COPY. >> COPY is designed for backups - and header is not too important there. >> When I seen some csv, then there usually header was used. > > I think in practice a lot of people use COPY also because it's a nice > way to get CSV output, even if it's not for backups. The options that > COPY has for CSV are clearly designed around making the output > compatible with various CSV-variants. +1 From a user standpoint this was mostly how I use COPY. Someone requests a report that they can manipulate in $SPREADSHEET. I write a query, place it inside a “COPY” statement with FORMAT CSV, HEADER TRUE, save to file, and deliver. > Another thought: Isn't CSV just the same as unaligned output plus some > quoting? Could we add a quote character setting and then define --csv > to be quote-character = " and fieldsep = , ? This makes a lot of sense. I’ve also generated CSV files using a combination of: \f , \a \o file.csv and then running the query, but if any of the fields contained a “,” if would inevitably break in $SPREADSHEET. Jonathan
2018-04-06 5:46 GMT+02:00 Jonathan S. Katz <jkatz@postgresql.org>:
> On Apr 5, 2018, at 11:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 4/1/18 03:27, Pavel Stehule wrote:
>> I don't share option so CSV format should be exactly same like CSV COPY.
>> COPY is designed for backups - and header is not too important there.
>> When I seen some csv, then there usually header was used.
>
> I think in practice a lot of people use COPY also because it's a nice
> way to get CSV output, even if it's not for backups. The options that
> COPY has for CSV are clearly designed around making the output
> compatible with various CSV-variants.
+1
From a user standpoint this was mostly how I use COPY. Someone
requests a report that they can manipulate in $SPREADSHEET. I write
a query, place it inside a “COPY” statement with FORMAT CSV,
HEADER TRUE, save to file, and deliver.
> Another thought: Isn't CSV just the same as unaligned output plus some
> quoting? Could we add a quote character setting and then define --csv
> to be quote-character = " and fieldsep = , ?
This makes a lot of sense. I’ve also generated CSV files using a
combination of:
\f ,
\a
\o file.csv
and then running the query, but if any of the fields contained a “,” if would
inevitably break in $SPREADSHEET.
unfortunately, most used CSV separator in Czech Repuplic is ; due MS Excel settings for CR
Regards
Pavel
Jonathan
Hi Pavel,
On Apr 6, 2018, at 1:38 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2018-04-06 5:46 GMT+02:00 Jonathan S. Katz <jkatz@postgresql.org>:
> On Apr 5, 2018, at 11:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 4/1/18 03:27, Pavel Stehule wrote:
>> I don't share option so CSV format should be exactly same like CSV COPY.
>> COPY is designed for backups - and header is not too important there.
>> When I seen some csv, then there usually header was used.
>
> I think in practice a lot of people use COPY also because it's a nice
> way to get CSV output, even if it's not for backups. The options that
> COPY has for CSV are clearly designed around making the output
> compatible with various CSV-variants.
+1
From a user standpoint this was mostly how I use COPY. Someone
requests a report that they can manipulate in $SPREADSHEET. I write
a query, place it inside a “COPY” statement with FORMAT CSV,
HEADER TRUE, save to file, and deliver.
> Another thought: Isn't CSV just the same as unaligned output plus some
> quoting? Could we add a quote character setting and then define --csv
> to be quote-character = " and fieldsep = , ?
This makes a lot of sense. I’ve also generated CSV files using a
combination of:
\f ,
\a
\o file.csv
and then running the query, but if any of the fields contained a “,” if would
inevitably break in $SPREADSHEET.unfortunately, most used CSV separator in Czech Repuplic is ; due MS Excel settings for CR
Sure. So in that case with the above, I would have used "\f ;” or
DELIMITER ‘;’ so it sounds like Peter’s suggestion is still valid.
Jonathan
Pavel Stehule wrote: > so we can to have > > \pset format xxx > > and set of local possibly changed defaults > > \pset csv_fieldsep , > \pset csv_tuplesonly on > \pset unaligned_fieldsep | > \pset unaligned_tuplesonly off tuples_only (\t) is a single setting that is currently used by all formats. It makes sense as it is and I don't quite see what we would gain by "exploding" it. There's also "footer" that goes in tandem with "tuples_only", to switch off the footer while keeping the header and title. Whatever change is considered to "tuples_only", "footer" must be considered with it. For the csv format, tuples_only=off is interpreted as "output the header" and tuples_only=on as "don't output the header". This is consistent with what other formats do. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-04-07 13:55 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> so we can to have
>
> \pset format xxx
>
> and set of local possibly changed defaults
>
> \pset csv_fieldsep ,
> \pset csv_tuplesonly on
> \pset unaligned_fieldsep |
> \pset unaligned_tuplesonly off
tuples_only (\t) is a single setting that is currently used by all
formats. It makes sense as it is and I don't quite see what we
would gain by "exploding" it.
It was a example, how the one default can be not good enough.
Usually we expect in align, unalign headers by default. But somebody can expect tuples only by default for CSV format.
There's also "footer" that goes in tandem with "tuples_only",
to switch off the footer while keeping the header and title.
Whatever change is considered to "tuples_only", "footer" must
be considered with it.
For the csv format, tuples_only=off is interpreted as "output the header"
and tuples_only=on as "don't output the header". This is consistent
with what other formats do.
My note was not about the implementations, it was about different expectations of some users - looks on Isaac's mail, pls.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Peter Eisentraut wrote: > Another thought: Isn't CSV just the same as unaligned output plus some > quoting? Could we add a quote character setting and then define --csv > to be quote-character = " and fieldsep = , ? Plus footer set to off. So --csv would be like \pset format unaligned \pset fieldsep ',' \pset footer off \pset unaligned_quote_character '"' I guess we'd add a \csv command that does these together. There would still be some discomfort with some of the issues discussed upthread. For instance psql -F ';' --csv is likely to reset fieldsep to ',' having then a different outcome than psql --csv -F';' And there's the point on how to come back from \csv to another output, given that it has overwritten "footer" that is used across all formats, and fieldsep used by unaligned. So a user might need to figure out anyway the intricaties behind \csv, if it's not an independant format. On the whole I'm inclined to resubmit the patch with fieldsep_csv and some minor changes based on the rest of the discussion. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Isaac Morland wrote: > OK, mostly trying to avoid commenting because I doubt I have much to add. > But. If I ask for CSV and don't specify any overrides, I expect to get > "C"omma separated values, not some other character. More specifically, if I > say --csv I expect to get files that are identical with what I would get if > I used COPY ... CSV. Actually, COPY ... CSV HEADER, given that psql shows > column headings. This also implies that I expect the quoting and related > details that are associated with CSV. Yes, this is what the current patch does. Only the first version used the same default delimiter as the unaligned mode, but nobody liked that part. As for the header, it's driven by how it's already done for other formats in psql, with the tuples_only setting, which implies that column names are output by default as you mention. Both settings must be changeable, the exact details of how/when is what's being discussed. The output might still differ compared to COPY in that line endings depend on the client-side OS. There's also the minor issue of a single \ by itself on a line, which gets quoted by COPY and not by psql. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Daniel Verite wrote: > The output might still differ compared to COPY in that line endings > depend on the client-side OS. There's also the minor issue > of a single \ by itself on a line, which gets quoted by COPY > and not by psql. I meant \. or backslash followed by period. This sequence means nothing in particular in csv outside of COPY, but it means end-of-data for COPY. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
2018-04-07 14:23 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
+1
Peter Eisentraut wrote:
> Another thought: Isn't CSV just the same as unaligned output plus some
> quoting? Could we add a quote character setting and then define --csv
> to be quote-character = " and fieldsep = , ?
Plus footer set to off. So --csv would be like
\pset format unaligned
\pset fieldsep ','
\pset footer off
\pset unaligned_quote_character '"'
I guess we'd add a \csv command that does these together.
There would still be some discomfort with some of the issues
discussed upthread. For instance
psql -F ';' --csv
is likely to reset fieldsep to ',' having then a different outcome than
psql --csv -F';'
And there's the point on how to come back from \csv to another
output, given that it has overwritten "footer" that is used across
all formats, and fieldsep used by unaligned.
So a user might need to figure out anyway the
intricaties behind \csv, if it's not an independant format.
On the whole I'm inclined to resubmit the patch with
fieldsep_csv and some minor changes based on the rest
of the discussion.
+1
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hi
2018-04-07 14:23 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
Can we restart discussion about this feature? I am sure so there is strong common agreement about usability of this feature. Now, we have good enough time to design and implement it well.
Peter Eisentraut wrote:
> Another thought: Isn't CSV just the same as unaligned output plus some
> quoting? Could we add a quote character setting and then define --csv
> to be quote-character = " and fieldsep = , ?
Plus footer set to off. So --csv would be like
\pset format unaligned
\pset fieldsep ','
\pset footer off
\pset unaligned_quote_character '"'
I guess we'd add a \csv command that does these together.
There would still be some discomfort with some of the issues
discussed upthread. For instance
psql -F ';' --csv
is likely to reset fieldsep to ',' having then a different outcome than
psql --csv -F';'
And there's the point on how to come back from \csv to another
output, given that it has overwritten "footer" that is used across
all formats, and fieldsep used by unaligned.
So a user might need to figure out anyway the
intricaties behind \csv, if it's not an independant format.
On the whole I'm inclined to resubmit the patch with
fieldsep_csv and some minor changes based on the rest
of the discussion.
I am starting this discussion with some initial requirement and we can talk about validity of these requirements first.
1. command line long option --csv set output format CSV, default separator should be ','
2. option -F should to work too (there is (was) first issue - this value is now semicolon). I don't like introduction a new like "-F" option just for CSV, because we can share this option for all output formats where separator can be customized.
3. The behave should not depend on order of usage
4. I miss some short/long option for enabling/disabling header (default should be on), there is option -t, that is inverse value of "use header". I am not sure if it is enough.
5. In interactive mode I prefer activating CSV format via \pset format csv .. like almost all any other supported formats
It should to set output configuration like mentioned
\pset format unaligned
\pset fieldsep ',' -- if not assigned before
\pset footer off
\pset unaligned_quote_character '"'
\pset fieldsep ',' -- if not assigned before
\pset footer off
\pset unaligned_quote_character '"'
similarly like command line I am expecting so I can set \pset fieldsep and \pset format in any order.
expected behave:
psql -c "select ..." --csv -F';'
psql -c "select ..." -F';' --csv
interactive
\pset format csv
\pset fieldsep ;
or
\pset fieldsep ;
\pset format csv
I don't like when one command overwrite settings of some other command. We can introduce some similar like GUC where default values from configure file can be overwritten by custom setting for session. So I am able to thinking about some settings
like
\pset csv_default_separator
\pset csv_default_header
Then there is question to simplify all and use \pset csv_separator, and csv format independent of fieldseparator value? It is possible, but I don't think so have more option for similar value is good idea (for interactive mode).
Why I propose these rich and strong requirements?
1. CSV is fundamental format - I don't know any other with similar value for data import/export
2. unfortunately, important feature (separator) is not same in all areas - in Czech rep most used separator is semicolon - so typical case needs setting format and setting separator.
Regards
Pavel
On Sunday, June 3, 2018, Pavel Stehule <pavel.stehule@gmail.com> wrote:
\pset fieldsep ;\pset format csvI don't like when one command overwrite settings of some other command. We can introduce some similar like GUC where default values from configure file can be overwritten by custom setting for session. So I am able to thinking about some settingslike\pset csv_default_separator\pset csv_default_headerThen there is question to simplify all and use \pset csv_separator, and csv format independent of fieldseparator value? It is possible, but I don't think so have more option for similar value is good idea (for interactive mode).
Having a dedicated option seems infinitely better than adding new settings for defaults and having to keep track of whether the shared field separator is a default versus a user specified value.
Recently we reworked server GUCs to avoid this kind of unset/default behavior. I don't see how introducing or relying upon it in psql would be an advantage.
At this point -1, keep the status quo, for any implementation that tries to make the unaligned mode field separator perform double duty. I'm open, but unlikely, to be convinced that it can be done without unforeseen bad side effects and degraded usability.
David J.
2018-06-03 19:16 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
Regards
On Sunday, June 3, 2018, Pavel Stehule <pavel.stehule@gmail.com> wrote:\pset fieldsep ;\pset format csvI don't like when one command overwrite settings of some other command. We can introduce some similar like GUC where default values from configure file can be overwritten by custom setting for session. So I am able to thinking about some settingslike\pset csv_default_separator\pset csv_default_headerThen there is question to simplify all and use \pset csv_separator, and csv format independent of fieldseparator value? It is possible, but I don't think so have more option for similar value is good idea (for interactive mode).Having a dedicated option seems infinitely better than adding new settings for defaults and having to keep track of whether the shared field separator is a default versus a user specified value.Recently we reworked server GUCs to avoid this kind of unset/default behavior. I don't see how introducing or relying upon it in psql would be an advantage.
I am thinking so psql design is little bit special, because we should to think about more modes - interactive and not interactive, and our goal should be some consistency from user perspective in interactive mode. But for me, the special CSV options are acceptable - although little bit unfriendly from user perspective.
At this point -1, keep the status quo, for any implementation that tries to make the unaligned mode field separator perform double duty. I'm open, but unlikely, to be convinced that it can be done without unforeseen bad side effects and degraded usability.
With respect to your opinion, I don't agree so current status is good - mainly in this case.
Pavel
David J.
Pavel Stehule wrote: > > On the whole I'm inclined to resubmit the patch with > > fieldsep_csv and some minor changes based on the rest > > of the discussion. > > > > +1 PFA an updated version. Usage from the command line: $ psql --csv # or -P format=csv $ psql --csv -P fieldsep_csv=";" # for non-comma csv separator From inside psql: \pset format csv \pset fieldsep_csv '\t' Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Hello Daniel, > PFA an updated version. > Usage from the command line: > $ psql --csv # or -P format=csv > $ psql --csv -P fieldsep_csv=";" # for non-comma csv separator > > From inside psql: > > \pset format csv > \pset fieldsep_csv '\t' Patch applies cleanly, compiles, global make check ok. Doc: "according to the csv rules" -> "according to csv rules."? Doc: "RFC-4180" -> "RFC 4180"? The previous RFC specifies CRLF as eol, but '\n' is hardcoded in the source. I'm fine with that, but I'd suggest that the documentation should said which EOL is used. ISTM that "--csv" & "-C" are not documented, neither in sgml nor under --help. "fieldsep_csv" does not show on the list of output options under "\?". There seems to be a test in the code to set an empty string "" by default, but it is unclear to me when this is triggered. I'd tend to use "CSV" instead of "csv" everywhere it makes sense, eg in the doc (CSV rules) and in variable names in the code (FooCsv -> FooCSV?), but that is pretty debatable. -- Fabien.
2018-08-10 12:25 GMT+02:00 Daniel Verite <daniel@manitou-mail.org>:
Pavel Stehule wrote:
> > On the whole I'm inclined to resubmit the patch with
> > fieldsep_csv and some minor changes based on the rest
> > of the discussion.
> >
>
> +1
PFA an updated version.
Usage from the command line:
$ psql --csv # or -P format=csv
$ psql --csv -P fieldsep_csv=";" # for non-comma csv separator
From inside psql:
\pset format csv
\pset fieldsep_csv '\t'
quick check +1 - I have not a objections
Regards
Pavel
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Fabien COELHO wrote: > Doc: "according to the csv rules" -> "according to csv rules."? Fixed. > Doc: "RFC-4180" -> "RFC 4180"? Fixed. The other references to RFCs use this syntax indeed. > The previous RFC specifies CRLF as eol, but '\n' is hardcoded in the > source. I'm fine with that, but I'd suggest that the documentation should > said which EOL is used. '\n' gets translated by libc when the output is in text mode. We discussed this upthread, but maybe it should be a code comment: added now. > ISTM that "--csv" & "-C" are not documented, neither in sgml nor under > --help. > > "fieldsep_csv" does not show on the list of output options under "\?". Oops, fixed. > There seems to be a test in the code to set an empty string "" by default, > but it is unclear to me when this is triggered. Where is that code? > I'd tend to use "CSV" instead of "csv" everywhere it makes sense, eg in > the doc (CSV rules) and in variable names in the code (FooCsv -> FooCSV?), > but that is pretty debatable. I've changed to upper case in a couple places and added <acronym> tags, but depending on the context sometimes lower case feels more consistent. This is the same as, for instance, ASCII. We display \pset linestyle as ascii, not ASCII, presumably because everything else in the \pset area is lower case. But both cases are accepted in input. Also added a CSV entry in the doc index per Alvaro's suggestion in https://www.postgresql.org/message-id/20180809202125.r4mdx2jzm7hytetz@alvherre.pgsql with pointers to psql and COPY. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Bonjour Daniel, > '\n' gets translated by libc when the output is in text mode. > We discussed this upthread, but maybe it should be a code comment: > added now. My point was more about the documentation which should be clear about what is the EOL. I understand from your point above that the EOL is the platform-specific one, which is yet again fine with me, but should be said clearly in the documentation? >> There seems to be a test in the code to set an empty string "" by >> default, but it is unclear to me when this is triggered. > > Where is that code? + else if (strcmp(param, "fieldsep_csv") == 0) + return pset_quoted_string(popt->topt.fieldSepCsv + ? popt->topt.fieldSepCsv + : ""); It is unclear to me when this "" is triggered. Never? If so, maybe a comment should say so? A few comments about v6: Patch applies cleanly, make check ok, doc build ok. Why removing "-C"? As we already have "-A" an "-H", I was fine with it. It seems that you changed the indentation in "psql-ref.sgml": </varlistentry> - <varlistentry> + <varlistentry> but maybe this is just the previous entry which was misaligned... - {"unaligned", "aligned", "wrapped", "html", "asciidoc", - "latex", "latex-longtable", "troff-ms", NULL}; + {"aligned", "asciidoc", "csv", "html", "latex", "latex-longtable", + "unaligned", "troff-ms", "wrapped", NULL}; If you want alphabetical, 'u' > 't'. While testing I found a small issue if "fieldsep_csv" is set to a strange value: \pset format_csv ',,' SELECT ',', ','; -- gives the ambiguous: ,,,, The rule to decide whether to quote should be made safer/smarter. I'd suggest that if the string contains any of the caracters used in format csv it should be quoted. -- Fabien.
Fabien COELHO wrote: > My point was more about the documentation which should be clear about what > is the EOL. I understand from your point above that the EOL is the > platform-specific one, which is yet again fine with me, but should be said > clearly in the documentation? Okay, I've added a bit in the doc. > + else if (strcmp(param, "fieldsep_csv") == 0) > + return pset_quoted_string(popt->topt.fieldSepCsv > + ? popt->topt.fieldSepCsv > + : ""); > > It is unclear to me when this "" is triggered. Never? If so, maybe a > comment should say so? Currently popt->topt.fieldSepCsv can't be NULL so I've simplified this to just return pset_quoted_string(popt->topt.fieldSepCsv). > Why removing "-C"? As we already have "-A" an "-H", I was fine with it. It was a leftover from v3. Participants in the thread don't seem to want the short option, to my surprise. Pavel argued first against -C upthread, I argued quite a bit in favor of it, the "for" had 0 upvote, and "against" had at least 4 I think, including yours in [1]. > It seems that you changed the indentation in "psql-ref.sgml": > > </varlistentry> > - <varlistentry> > + <varlistentry> Fixed. > - {"unaligned", "aligned", "wrapped", "html", "asciidoc", > - "latex", "latex-longtable", "troff-ms", NULL}; > + {"aligned", "asciidoc", "csv", "html", "latex", "latex-longtable", > + "unaligned", "troff-ms", "wrapped", NULL}; > > If you want alphabetical, 'u' > 't'. Fixed. > > While testing I found a small issue if "fieldsep_csv" is set to a strange > value: > > \pset format_csv ',,' > SELECT ',', ','; > -- gives the ambiguous: > ,,,, > > The rule to decide whether to quote should be made safer/smarter. I'd > suggest that if the string contains any of the caracters used in format > csv it should be quoted. You meant \pset fieldsep_csv ',,' If you do that even SELECT 'foo', 'bar' comes out wrong because it looks like a 3-field row: foo,,bar If we want to prevent people to shoot themselves in the foot with that sort of thing, I've added a couple tests: No double quote, no LF or CR, single character (but multibyte allowed) for the separator. [1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1803081004241.2916%40lancre Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Bonjour Daniel, > Currently popt->topt.fieldSepCsv can't be NULL so I've simplified this > to just return pset_quoted_string(popt->topt.fieldSepCsv). Ok. >> While testing I found a small issue if "fieldsep_csv" is set to a strange >> value: >> >> \pset format_csv ',,' > > You meant \pset fieldsep_csv ',,' Indeed. > If you do that even SELECT 'foo', 'bar' comes out wrong because it looks > like a 3-field row: foo,,bar Yes and no. I asked for ",," as a separator, so probably I really want that and I'd be okay with the result. > If we want to prevent people to shoot themselves in the foot with that > sort of thing, I've added a couple tests: No double quote, no LF or > CR, single character (but multibyte allowed) for the separator. Ok, why not. Patch applies cleanly, compiles, "make check" ok. I tried "\pset fieldsep_csv '\0'" which could be seen as one character, but it does not want it. I'm okay with this behavior. I'd suggest to add a test about rejected fieldsep_csv values, which raises both errors. -- Fabien.
On Tue, Aug 28, 2018 at 08:54:02PM +0200, Fabien COELHO wrote: > Patch applies cleanly, compiles, "make check" ok. > > I tried "\pset fieldsep_csv '\0'" which could be seen as one character, but > it does not want it. I'm okay with this behavior. > > I'd suggest to add a test about rejected fieldsep_csv values, which raises > both errors. The patch has been waiting for input from the author since August, so I am marking it as returned with feedback. If you can produce a new version, please feel free to post it. -- Michael
Attachment
Michael Paquier wrote: > If you can produce a new version, please feel free to post it. Here's a rebased version with a couple regression tests added per the discussions during the previous CF. Now at https://commitfest.postgresql.org/20/1861/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
Bonjour Daniel, > Here's a rebased version with a couple regression tests > added per the discussions during the previous CF. > > Now at https://commitfest.postgresql.org/20/1861/ Patch applies cleanly, compiles, make check ok, doc gen ok. Fine with me. I switched the patch to "Ready". -- Fabien.
On Sat, Nov 03, 2018 at 09:45:14AM +0100, Fabien COELHO wrote: > Patch applies cleanly, compiles, make check ok, doc gen ok. > > Fine with me. I switched the patch to "Ready". I have begun looking at this patch, and there are some noise diffs because of the reordering of the items you are doing in psql code. Ordering them in alphabetical order is a good idea due to the high number of options available, and more would pile up even if this separates a bit "aligned" and "unaligned", so I have have separated those diffs from the core patch and committed it, leaving the core portion of the patch aside for later. -- Michael
Attachment
Michael Paquier wrote: > Ordering them in alphabetical order is a good idea due to the high > number of options available, and more would pile up even if this > separates a bit "aligned" and "unaligned", so I have have separated > those diffs from the core patch and committed it, leaving the core > portion of the patch aside for later. Here's a rebased version following these changes. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
On Tue, Nov 06, 2018 at 02:57:15PM +0100, Daniel Verite wrote: > Here's a rebased version following these changes. Thanks for doing a rebase. First, it seems to me that it is necessary to overcomplicate the interface of psql for that. Most of the other formats don't have their own switch, and it is perfectly possible to use CSV format with just --pset=format=csv. -C could also be useful for other things, say compression. Not reusing fieldsep and creating a new variable looks like the correct way to go, as there is no need to bother about cross-option checks if the format is moved from unaligned to csv and the other way around, especially as the delimiter of CSV needs cannot be larger than one-byte. + pset.popt.topt.fieldSepCsv = pg_strdup(","); Let's store that in a variable instead of hardcoding it. In the regression tests, "col 9" is wanted with a newline? I am still digging into the details of this patch, and I am unfortunately running out of fuel for the day. -- Michael
Attachment
Bonjour Michaël, > First, it seems to me that it is necessary to overcomplicate the > interface of psql for that. Most of the other formats don't have their > own switch, and it is perfectly possible to use CSV format with just > --pset=format=csv. Although this work, it is not very user friendly nor easy to memorize. Quite a few direct format options already exists, eg -A --no-align (yuk, -A means not to align... :-(), -H --html... In this context "--csv" looks pretty unambiguous and more useful, so I would be in favor of keeping this option. > -C could also be useful for other things, say compression. Sure. -C is *NOT* used for --csv in the patch, although it may have been with an earlier version? > In the regression tests, "col 9" is wanted with a newline? Yes, the point is to check whether quoting & escaping are triggered appropriately in column names and values. -- Fabien.
Michael Paquier wrote: > -C could also be useful for other things, say compression. The patch does not grab any short letter, as the consensus went that way. > + pset.popt.topt.fieldSepCsv = pg_strdup(","); > Let's store that in a variable instead of hardcoding it. I guess it could go into a #define in psql/settings.h, along with these: #define DEFAULT_FIELD_SEP "|" #define DEFAULT_RECORD_SEP "\n" > In the regression tests, "col 9" is wanted with a newline? Yes, since we support column names with embedded newlines (even though it's hard to think of a legit use case for that) and CSV fields support embedded newlines too. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Wed, Nov 07, 2018 at 05:36:54PM +0100, Daniel Verite wrote: > I guess it could go into a #define in psql/settings.h, along with > these: > #define DEFAULT_FIELD_SEP "|" > #define DEFAULT_RECORD_SEP "\n" Yes. > Yes, since we support column names with embedded newlines (even > though it's hard to think of a legit use case for that) and CSV fields > support embedded newlines too. Hm. We would likely want at least a comment about that.. Anyway, I am still going through the patch, so no need to send a new version for now. -- Michael
Attachment
On Thu, Nov 08, 2018 at 01:58:34PM +0900, Michael Paquier wrote: > Anyway, I am still going through the patch, so no need to send a new > version for now. Okay, I have done a round of more in-depth review, and the patch looks to be in pretty good shape. Relying on tuples_only to decide if the header should be printed or not looks good to me. + /* check for value being non-empty and with an MB length of 1 */ + if (*value == '\0' || value[PQmblen(value, pset.encoding)] != '\0') It seems to me that this can just be replaced with that: if (strlen(value) != 1) Attached is what I am finishing up with for the moment. Comments are welcome. I am still planning look at that stuff a bit more once again, particularly the printing part, but I am lacking of time now.. -- Michael
Attachment
pá 9. 11. 2018 v 6:57 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Thu, Nov 08, 2018 at 01:58:34PM +0900, Michael Paquier wrote:
> Anyway, I am still going through the patch, so no need to send a new
> version for now.
Okay, I have done a round of more in-depth review, and the patch looks
to be in pretty good shape.
Relying on tuples_only to decide if the header should be printed or not
looks good to me.
+ /* check for value being non-empty and with an MB length of 1 */
+ if (*value == '\0' || value[PQmblen(value, pset.encoding)] != '\0')
It seems to me that this can just be replaced with that:
if (strlen(value) != 1)
Attached is what I am finishing up with for the moment. Comments are
welcome. I am still planning look at that stuff a bit more once again,
particularly the printing part, but I am lacking of time now..
looks ok
Pavel
--
Michael
Michael Paquier wrote: > + /* check for value being non-empty and with an MB length of 1 */ > + if (*value == '\0' || value[PQmblen(value, pset.encoding)] != '\0') > > It seems to me that this can just be replaced with that: > if (strlen(value) != 1) The above is meant to accept a multibyte character as the separator, in which case strlen(value) would be greater than 1. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Fri, Nov 09, 2018 at 11:40:44AM +0100, Daniel Verite wrote: > The above is meant to accept a multibyte character as the separator, > in which case strlen(value) would be greater than 1. Still what's the point except complicating the code? We don't care about anything fancy in the backend-side ProcessCopyOptions() when checking cstate->delim, and having some consistency looks like a good thing to me. I am also questioning a couple of things: it is possible to choose freely how to represent a null value thanks to "\pset null", which gives an equivalent to the COPY option. Choosing if a header is present or not can be done with tuples_only. However there is no option to specify an escape character, no option to specify a quote character, and it is not possible to force quotes for all values. Those are huge advantages as any output can be made compatible with other CSV variants. Isn't what is presented too limited? \copy has at least the advantage to push down those responsibilities to the backend.. -- Michael
Attachment
Michael Paquier wrote: > Still what's the point except complicating the code? We don't care > about anything fancy in the backend-side ProcessCopyOptions() when > checking cstate->delim, and having some consistency looks like a good > thing to me. The backend has its reasons that don't apply to the psql output format, mostly import performance according to [1] It's not that nobody wants delimiter outside of US-ASCII, as people do ask for that sometimes: https://www.postgresql.org/message-id/f02ulk%242r3u%241%40news.hub.org https://github.com/greenplum-db/gpdb/issues/1246 > However there is no option to specify > an escape character, no option to specify a quote character, and it is > not possible to force quotes for all values. Those are huge advantages > as any output can be made compatible with other CSV variants. Isn't > what is presented too limited? The guidelines that the patch has been following are those of RFC 4180 [2] with two exceptions on the field separator that we can define and the end of lines that are OS-dependant instead of the fixed CRLF that IETF seems to see as the norm. The only reference to escaping in the RFC is: "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote" The problem with using non-standard QUOTE or ESCAPE is that it's a violation of the format that goes further than choosing a separator different than comma, which is already a pain point. We can always add these options later if there is demand. I suspect it will never happen. I looked at the 2004 archives when CSV was added to COPY, that's around commit 862b20b38 in case anyone cares to look, but I couldn't find a discussion on these options, all I could find is they were present from the start. But again COPY is concerned with importing the data that preexists, even if it's weird, whereas a psql output formats are not. [1] https://www.postgresql.org/message-id/4C9D2BC5.1080006%40optonline.net [2] https://tools.ietf.org/html/rfc4180 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Fri, Nov 09, 2018 at 05:28:07PM +0100, Daniel Verite wrote: > But again COPY is concerned with importing the data that preexists, > even if it's weird, whereas a psql output formats are not. Hm. I checked the contents of the patch in details which provide output consistent with COPY, but after looking at the global picture I am getting cold feet on this patch for a couple of reasons: - This stuff adds new code paths in the frontend mimicking what the backend already does for years, both doing the same thing. - There are already three ways to fetch data in this format with COPY, \copy and file_fdw, with all three using the same code paths for option validations (I can see the arguments at the top of the thread for which COPY SELECT can actually do everything you want with?). - The experience is confusing, as the psql format uses different options than the backend to do the same things: -- tuples_only instead of HEADER. -- fieldsep_csv instead of DELIMITER -- null is an equivalent of the one with the same name, which is actually consistent. -- encoding is also an equivalent of ENCODING. -- and all the others missing. That looks like a lot. -- Michael
Attachment
Michael Paquier wrote: > - The experience is confusing, as the psql format uses different options > than the backend to do the same things: > -- tuples_only instead of HEADER. > -- fieldsep_csv instead of DELIMITER > -- null is an equivalent of the one with the same name, which is > actually consistent. > -- encoding is also an equivalent of ENCODING. That conveniently ignores the fact that "\pset format somename" as a command doesn't take any additional option, contrary to COPY. We can't do \pset format csv (delimiter=';') If we choosed "delimiter" as an option name, it would have to exist within 20 other names in the \pset namespace and then it would be too vague, whereas "fieldsep_csv" makes it clear what it applies to. "tuples_only" is preexisting, and I don't see how the comment that it's not called "header" could be actionable. Overall, you seem to posit that we should mimic the entire COPY TO interface to implement 'psql --csv'. But the starting point is that 'psql --csv' is just a slightly different (safer) variant of 'psql -At', which is not concerned at all with being consistent with COPY. The model of the csv output format is the unaligned output format, it's just not COPY. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Friday, November 9, 2018, Michael Paquier <michael@paquier.xyz> wrote:
does work with these."
On Fri, Nov 09, 2018 at 05:28:07PM +0100, Daniel Verite wrote:
> But again COPY is concerned with importing the data that preexists,
> even if it's weird, whereas a psql output formats are not.
Hm. I checked the contents of the patch in details which provide output
consistent with COPY, but after looking at the global picture I am
getting cold feet on this patch for a couple of reasons:
- This stuff adds new code paths in the frontend mimicking what the
backend already does for years, both doing the same thing.
From the original post:
"copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psqldoes work with these."
Formatting is usually a client-side concern so this feature fits well there. The fact that we’ve provided a server interface for the same doesn’t preclude its potential desirability in the client.
- There are already three ways to fetch data in this format with COPY,
\copy and file_fdw, with all three using the same code paths for option
validations (I can see the arguments at the top of the thread for which
COPY SELECT can actually do everything you want with?).
Not always conveniently.
- The experience is confusing, as the psql format uses different options
than the backend to do the same things:
Yes, those who use psql need to learn its features. I’d posit that since this syntax is being learned anyway that transferring said knowledge to a newly added csv format will not be confusing. No more so that having to do something that is usually client-side (formatting) on the server in the first place. That we don’t fully replicate the server functionality does’t bother be. This is meant to be a simple and quick ability that handles 95% of use cases and defers to the more powerful server version for the outliers.
Feature-wise I’m on board. Given it’s already written I’d say it should go in unless there are code complexity and support concerns - which given the prevalence of other formats I have to believe there are not.
David J.
po 12. 11. 2018 v 5:19 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Friday, November 9, 2018, Michael Paquier <michael@paquier.xyz> wrote:On Fri, Nov 09, 2018 at 05:28:07PM +0100, Daniel Verite wrote:
> But again COPY is concerned with importing the data that preexists,
> even if it's weird, whereas a psql output formats are not.
Hm. I checked the contents of the patch in details which provide output
consistent with COPY, but after looking at the global picture I am
getting cold feet on this patch for a couple of reasons:
- This stuff adds new code paths in the frontend mimicking what the
backend already does for years, both doing the same thing.From the original post:"copy with csv can't help for the output of meta-commandssuch as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these."Formatting is usually a client-side concern so this feature fits well there. The fact that we’ve provided a server interface for the same doesn’t preclude its potential desirability in the client.- There are already three ways to fetch data in this format with COPY,
\copy and file_fdw, with all three using the same code paths for option
validations (I can see the arguments at the top of the thread for which
COPY SELECT can actually do everything you want with?).Not always conveniently.- The experience is confusing, as the psql format uses different options
than the backend to do the same things:Yes, those who use psql need to learn its features. I’d posit that since this syntax is being learned anyway that transferring said knowledge to a newly added csv format will not be confusing. No more so that having to do something that is usually client-side (formatting) on the server in the first place. That we don’t fully replicate the server functionality does’t bother be. This is meant to be a simple and quick ability that handles 95% of use cases and defers to the more powerful server version for the outliers.
This patch is not simple - not due own complexity, but due current state of psql and output format support. The psql is aged software and implement new format well known is not simple.
The COPY statement has different purpose and because it has server side implementation, it cannot to cover client side space.
I afraid so there hardy be designed some better - and it is unhappy so psql has not native csv support yet.
Regards
Pavel
Feature-wise I’m on board. Given it’s already written I’d say it should go in unless there are code complexity and support concerns - which given the prevalence of other formats I have to believe there are not.David J.
Bonjour Michaël, >> But again COPY is concerned with importing the data that preexists, >> even if it's weird, whereas a psql output formats are not. > > Hm. I checked the contents of the patch in details which provide output > consistent with COPY, but after looking at the global picture I am > getting cold feet on this patch for a couple of reasons: > - This stuff adds new code paths in the frontend mimicking what the > backend already does for years, both doing the same thing. - Backend's COPY cannot write to client space, so the comparison is not relevant. - "\copy (SQL-query) TO STDOUT CSV;" is kind of a pain, because one has to edit around the query, which is not convenient, esp from the command line: sh> psql --csv -c 'SELECT 1 as one, 2 as two' > out.csv vs sh> psql -c "\copy (SELECT 1 AS one, 2 as two) TO STDOUT CSV" > out.csv or mixing the output file name inside the argument, which is very unshell like: sh> psql -c "\copy (SELECT 1 AS one, 2 as two) TO 'out.csv' CSV" If you have a "query.sql" file that you want to output in csv, there is no simple way to do that with \copy/COPY, whereas "psql --csv -f query.sql" looks pretty straightforward to me. Also, in a makefile, I could write: %.csv: %.sql psql --csv -f $< > $@ My point is that \copy, COPY and the proposed CSV format do not address the same use cases. > - There are already three ways to fetch data in this format with COPY, > \copy and file_fdw, with all three using the same code paths for option > validations (I can see the arguments at the top of the thread for which > COPY SELECT can actually do everything you want with?). > - The experience is confusing, as the psql format uses different options > than the backend to do the same things: > -- tuples_only instead of HEADER. > -- fieldsep_csv instead of DELIMITER > -- null is an equivalent of the one with the same name, which is > actually consistent. > -- encoding is also an equivalent of ENCODING. > -- and all the others missing. > That looks like a lot. I disagree on this one: the proposed csv format just follows the existing psql format pattern used for 8 formats and reuses it for csv. Note that there are existing command line options for tuples_only (-t), encoding is inherited from the shell and does not need to be changed that often nowadays, fieldsep_csv is kind-of a pain, but then if someone wants "comma-separated-values" NOT separated by commas, probably they can handle it. Basically the proposed patch addresses a simple and convenient use case which are neither addressed by \copy nor COPY. The fact that more options are available with these commands does it precludes its usefulness as is. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > Basically the proposed patch addresses a simple and convenient use case > which are neither addressed by \copy nor COPY. The fact that more options > are available with these commands does it precludes its usefulness as is. Yeah, I agree that this option is useful independently of whether COPY provides something similar. I think the killer argument is that right now, psql-ref.sgml repeatedly suggests that unaligned mode with fieldsep ',' is a reasonable way to produce comma-separated output for consumption by other programs. That's like handing our users a loaded foot-gun. And, in fact, right now *none* of psql's table output formats is both unambiguous and reasonably simple/popular to use. So the astonishing thing about this patch, IMO, is that we didn't do it a decade ago. I went through the documentation to improve that point, and did some other cosmetic cleanup including rebasing up to HEAD, and got the attached. I think there are two remaining points to settle: 1. Are we limiting the separator to be a single-byte character or not? If we are, the code could be made simpler and faster by working with a "char" value instead of a string. If we're not, then Michael's change needs to be undone (but I didn't do that here). I feel that if we allow multi-byte characters here, we might as well take the training wheels off and just say you can use any separator string you want, as long as it doesn't contain double quote, \r, or \n. Most programs reading a file are not going to perceive a difference between separating fields with a single multibyte character and multiple single-byte characters; either they can cope or they can't. A fair number of them are going to be in the latter category. So we can either say "do what you wish, it's your problem whether anything can read the result" or "we're going to restrict you to something that (perhaps) is more widely readable". I'm a bit inclined to the former viewpoint. If we were in the business of being restrictive, why would we allow the field separator to be changed at all? The name of the format is *comma* separated values, not something else. 2. Speaking of the field separator, I'm pretty desperately unhappy with the choice of "fieldsep_csv" as the parameter name. The trouble with that is that it encourages sticking "fieldsep_csv" in between "fieldsep" and "fieldsep_zero", because alphabet. But "fieldsep" and "fieldsep_zero" are *intimately* tied together, in fact it's only a dubious implementation choice that made them separate parameters at all. It does not make any semantic sense to stick other vaguely-related parameters in between, neither in the documentation nor in \pset output. We could avoid this self-inflicted confusion by choosing a different parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep". Or we could kill both issues by hard-wiring the separator as ','. Thoughts? regards, tom lane diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 6e6d0f4..d53451b 100644 *** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *************** PostgreSQL documentation *** 68,75 **** <listitem> <para> Switches to unaligned output mode. (The default output mode is ! otherwise aligned.) This is equivalent to <command>\pset format ! unaligned</command>. </para> </listitem> </varlistentry> --- 68,75 ---- <listitem> <para> Switches to unaligned output mode. (The default output mode is ! <literal>aligned</literal>.) This is equivalent to ! <command>\pset format unaligned</command>. </para> </listitem> </varlistentry> *************** EOF *** 152,157 **** --- 152,167 ---- </varlistentry> <varlistentry> + <term><option>--csv</option></term> + <listitem> + <para> + Switches to <acronym>CSV</acronym> (Comma-Separated Values) output + mode. This is equivalent to <command>\pset format csv</command>. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><option>-d <replaceable class="parameter">dbname</replaceable></option></term> <term><option>--dbname=<replaceable class="parameter">dbname</replaceable></option></term> <listitem> *************** EOF *** 270,277 **** <term><option>--html</option></term> <listitem> <para> ! Turn on <acronym>HTML</acronym> tabular output. This is ! equivalent to <literal>\pset format html</literal> or the <command>\H</command> command. </para> </listitem> --- 280,287 ---- <term><option>--html</option></term> <listitem> <para> ! Switches to <acronym>HTML</acronym> output mode. This is ! equivalent to <command>\pset format html</command> or the <command>\H</command> command. </para> </listitem> *************** lo_import 152801 *** 2547,2554 **** <listitem> <para> Specifies the field separator to be used in unaligned output ! format. That way one can create, for example, tab- or ! comma-separated output, which other programs might prefer. To set a tab as field separator, type <literal>\pset fieldsep '\t'</literal>. The default field separator is <literal>'|'</literal> (a vertical bar). --- 2557,2564 ---- <listitem> <para> Specifies the field separator to be used in unaligned output ! format. That way one can create, for example, tab-separated ! output, which other programs might prefer. To set a tab as field separator, type <literal>\pset fieldsep '\t'</literal>. The default field separator is <literal>'|'</literal> (a vertical bar). *************** lo_import 152801 *** 2557,2562 **** --- 2567,2585 ---- </varlistentry> <varlistentry> + <term><literal>fieldsep_csv</literal></term> + <listitem> + <para> + Specifies the field separator to be used in + <acronym>CSV</acronym> output format. If the separator character + appears in a field's value, that field is output inside double + quotes, following standard <acronym>CSV</acronym> rules. + The default is a comma. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><literal>fieldsep_zero</literal></term> <listitem> <para> *************** lo_import 152801 *** 2585,2606 **** <listitem> <para> Sets the output format to one of <literal>aligned</literal>, ! <literal>asciidoc</literal>, <literal>html</literal>, ! <literal>latex</literal> (uses <literal>tabular</literal>), <literal>latex-longtable</literal>, <literal>troff-ms</literal>, <literal>unaligned</literal>, or <literal>wrapped</literal>. Unique abbreviations are allowed. </para> <para><literal>unaligned</literal> format writes all columns of a row on one line, separated by the currently active field separator. This is useful for creating output that might be intended to be read ! in by other programs (for example, tab-separated or comma-separated ! format). </para> ! <para><literal>aligned</literal> format is the standard, human-readable, ! nicely formatted text output; this is the default. </para> <para><literal>wrapped</literal> format is like <literal>aligned</literal> but wraps --- 2608,2655 ---- <listitem> <para> Sets the output format to one of <literal>aligned</literal>, ! <literal>asciidoc</literal>, ! <literal>csv</literal>, ! <literal>html</literal>, ! <literal>latex</literal>, <literal>latex-longtable</literal>, <literal>troff-ms</literal>, <literal>unaligned</literal>, or <literal>wrapped</literal>. Unique abbreviations are allowed. </para> + <para><literal>aligned</literal> format is the standard, + human-readable, nicely formatted text output; this is the default. + </para> + <para><literal>unaligned</literal> format writes all columns of a row on one line, separated by the currently active field separator. This is useful for creating output that might be intended to be read ! in by other programs, for example, tab-separated or comma-separated ! format. However, the field separator character is not treated ! specially if it appears in a column's value; ! so <acronym>CSV</acronym> format may be better suited for such ! purposes. </para> ! <para><literal>csv</literal> format ! <indexterm> ! <primary>CSV (Comma-Separated Values) format</primary> ! <secondary>in psql</secondary> ! </indexterm> ! writes column values separated by commas, applying the quoting ! rules described in ! <ulink url="https://tools.ietf.org/html/rfc4180">RFC 4180</ulink>. ! This output is compatible with the CSV format of the server's ! <command>COPY</command> command. ! A header line with column names is generated unless ! the <literal>tuples_only</literal> parameter is ! <literal>on</literal>. Titles and footers are not printed. ! Each row is terminated by the system-dependent end-of-line character, ! which is typically a single newline (<literal>\n</literal>) for ! Unix-like systems or a carriage return and newline sequence ! (<literal>\r\n</literal>) for Microsoft Windows. ! Field separator characters other than comma can be selected with ! <command>\pset fieldsep_csv</command>. </para> <para><literal>wrapped</literal> format is like <literal>aligned</literal> but wraps *************** lo_import 152801 *** 2620,2627 **** language. They are not complete documents! This might not be necessary in <acronym>HTML</acronym>, but in <application>LaTeX</application> you must have a complete ! document wrapper. <literal>latex-longtable</literal> ! also requires the <application>LaTeX</application> <literal>longtable</literal> and <literal>booktabs</literal> packages. </para> </listitem> --- 2669,2680 ---- language. They are not complete documents! This might not be necessary in <acronym>HTML</acronym>, but in <application>LaTeX</application> you must have a complete ! document wrapper. ! The <literal>latex</literal> format ! uses <application>LaTeX</application>'s <literal>tabular</literal> ! environment. ! The <literal>latex-longtable</literal> format ! requires the <application>LaTeX</application> <literal>longtable</literal> and <literal>booktabs</literal> packages. </para> </listitem> *************** first second *** 4588,4604 **** peter@localhost testdb=> <userinput>\pset border 1</userinput> Border style is 1. ! peter@localhost testdb=> <userinput>\pset format unaligned</userinput> ! Output format is unaligned. ! peter@localhost testdb=> <userinput>\pset fieldsep ","</userinput> ! Field separator is ",". peter@localhost testdb=> <userinput>\pset tuples_only</userinput> ! Showing only tuples. peter@localhost testdb=> <userinput>SELECT second, first FROM my_table;</userinput> one,1 two,2 three,3 four,4 </programlisting> Alternatively, use the short commands: <programlisting> --- 4641,4664 ---- peter@localhost testdb=> <userinput>\pset border 1</userinput> Border style is 1. ! peter@localhost testdb=> <userinput>\pset format csv</userinput> ! Output format is csv. peter@localhost testdb=> <userinput>\pset tuples_only</userinput> ! Tuples only is on. peter@localhost testdb=> <userinput>SELECT second, first FROM my_table;</userinput> one,1 two,2 three,3 four,4 + peter@localhost testdb=> <userinput>\pset format unaligned</userinput> + Output format is unaligned. + peter@localhost testdb=> <userinput>\pset fieldsep '\t'</userinput> + Field separator is " ". + peter@localhost testdb=> <userinput>SELECT second, first FROM my_table;</userinput> + one 1 + two 2 + three 3 + four 4 </programlisting> Alternatively, use the short commands: <programlisting> diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index ee88e1c..a18c8a7 100644 *** a/src/bin/psql/command.c --- b/src/bin/psql/command.c *************** exec_command_pset(PsqlScanState scan_sta *** 1957,1964 **** int i; static const char *const my_list[] = { ! "border", "columns", "expanded", "fieldsep", "fieldsep_zero", ! "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", "tableattr", "title", "tuples_only", --- 1957,1964 ---- int i; static const char *const my_list[] = { ! "border", "columns", "expanded", "fieldsep", "fieldsep_csv", ! "fieldsep_zero", "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", "tableattr", "title", "tuples_only", *************** _align2string(enum printFormat in) *** 3616,3621 **** --- 3616,3624 ---- case PRINT_ASCIIDOC: return "asciidoc"; break; + case PRINT_CSV: + return "csv"; + break; case PRINT_HTML: return "html"; break; *************** do_pset(const char *param, const char *v *** 3696,3701 **** --- 3699,3705 ---- /* remember to update error message below when adding more */ {"aligned", PRINT_ALIGNED}, {"asciidoc", PRINT_ASCIIDOC}, + {"csv", PRINT_CSV}, {"html", PRINT_HTML}, {"latex", PRINT_LATEX}, {"latex-longtable", PRINT_LATEX_LONGTABLE}, *************** do_pset(const char *param, const char *v *** 3727,3733 **** } if (match_pos < 0) { ! psql_error("\\pset: allowed formats are aligned, asciidoc, html, latex, latex-longtable, troff-ms, unaligned,wrapped\n"); return false; } else --- 3731,3737 ---- } if (match_pos < 0) { ! psql_error("\\pset: allowed formats are aligned, asciidoc, csv, html, latex, latex-longtable, troff-ms,unaligned, wrapped\n"); return false; } else *************** do_pset(const char *param, const char *v *** 3865,3870 **** --- 3869,3895 ---- popt->topt.fieldSep.separator_zero = true; } + /* field separator for CSV format */ + else if (strcmp(param, "fieldsep_csv") == 0) + { + if (value) + { + /* CSV separator has to be a one-byte character */ + if (strlen(value) != 1) + { + psql_error("\\pset: the CSV field separator must be a single character\n"); + return false; + } + if (value[0] == '"' || value[0] == '\n' || value[0] == '\r') + { + psql_error("\\pset: the CSV field separator cannot be a double quote, a newline, or a carriage return\n"); + return false; + } + free(popt->topt.fieldSepCsv); + popt->topt.fieldSepCsv = pg_strdup(value); + } + } + /* record separator for unaligned text */ else if (strcmp(param, "recordsep") == 0) { *************** printPsetInfo(const char *param, struct *** 4013,4018 **** --- 4038,4050 ---- printf(_("Field separator is zero byte.\n")); } + /* show field separator for CSV format */ + else if (strcmp(param, "fieldsep_csv") == 0) + { + printf(_("Field separator for CSV is \"%s\".\n"), + popt->topt.fieldSepCsv); + } + /* show disable "(x rows)" footer */ else if (strcmp(param, "footer") == 0) { *************** pset_value_string(const char *param, str *** 4207,4212 **** --- 4239,4246 ---- return pset_quoted_string(popt->topt.fieldSep.separator ? popt->topt.fieldSep.separator : ""); + else if (strcmp(param, "fieldsep_csv") == 0) + return pset_quoted_string(popt->topt.fieldSepCsv); else if (strcmp(param, "fieldsep_zero") == 0) return pstrdup(pset_bool_string(popt->topt.fieldSep.separator_zero)); else if (strcmp(param, "footer") == 0) diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 586aebd..961acd9 100644 *** a/src/bin/psql/help.c --- b/src/bin/psql/help.c *************** usage(unsigned short int pager) *** 108,113 **** --- 108,114 ---- fprintf(output, _("\nOutput format options:\n")); fprintf(output, _(" -A, --no-align unaligned table output mode\n")); + fprintf(output, _(" --csv CSV (Comma-Separated Values) table output mode\n")); fprintf(output, _(" -F, --field-separator=STRING\n" " field separator for unaligned output (default: \"%s\")\n"), DEFAULT_FIELD_SEP); *************** slashUsage(unsigned short int pager) *** 272,281 **** fprintf(output, _(" \\H toggle HTML output mode (currently %s)\n"), ON(pset.popt.topt.format == PRINT_HTML)); fprintf(output, _(" \\pset [NAME [VALUE]] set table output option\n" ! " (NAME := {border|columns|expanded|fieldsep|fieldsep_zero|\n" ! " footer|format|linestyle|null|numericlocale|pager|\n" ! " pager_min_lines|recordsep|recordsep_zero|tableattr|title|\n" ! " tuples_only|unicode_border_linestyle|\n" " unicode_column_linestyle|unicode_header_linestyle})\n")); fprintf(output, _(" \\t [on|off] show only rows (currently %s)\n"), ON(pset.popt.topt.tuples_only)); --- 273,282 ---- fprintf(output, _(" \\H toggle HTML output mode (currently %s)\n"), ON(pset.popt.topt.format == PRINT_HTML)); fprintf(output, _(" \\pset [NAME [VALUE]] set table output option\n" ! " (NAME := {border|columns|expanded|fieldsep|fieldsep_csv|\n" ! " fieldsep_zero|footer|format|linestyle|null|numericlocale|\n" ! " pager|pager_min_lines|recordsep|recordsep_zero|tableattr|\n" ! " title|tuples_only|unicode_border_linestyle|\n" " unicode_column_linestyle|unicode_header_linestyle})\n")); fprintf(output, _(" \\t [on|off] show only rows (currently %s)\n"), ON(pset.popt.topt.tuples_only)); diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h index 69e617e..93d0b95 100644 *** a/src/bin/psql/settings.h --- b/src/bin/psql/settings.h *************** *** 13,18 **** --- 13,19 ---- #include "fe_utils/print.h" #define DEFAULT_FIELD_SEP "|" + #define DEFAULT_FIELD_SEP_CSV "," #define DEFAULT_RECORD_SEP "\n" #if defined(WIN32) || defined(__CYGWIN__) diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c index be57574..84d02f4 100644 *** a/src/bin/psql/startup.c --- b/src/bin/psql/startup.c *************** main(int argc, char *argv[]) *** 144,149 **** --- 144,151 ---- pset.popt.topt.stop_table = true; pset.popt.topt.default_footer = true; + pset.popt.topt.fieldSepCsv = pg_strdup(DEFAULT_FIELD_SEP_CSV); + pset.popt.topt.unicode_border_linestyle = UNICODE_LINESTYLE_SINGLE; pset.popt.topt.unicode_column_linestyle = UNICODE_LINESTYLE_SINGLE; pset.popt.topt.unicode_header_linestyle = UNICODE_LINESTYLE_SINGLE; *************** parse_psql_options(int argc, char *argv[ *** 468,473 **** --- 470,476 ---- {"expanded", no_argument, NULL, 'x'}, {"no-psqlrc", no_argument, NULL, 'X'}, {"help", optional_argument, NULL, 1}, + {"csv", no_argument, NULL, 2}, {NULL, 0, NULL, 0} }; *************** parse_psql_options(int argc, char *argv[ *** 658,663 **** --- 661,669 ---- exit(EXIT_SUCCESS); } break; + case 2: + pset.popt.topt.format = PRINT_CSV; + break; default: unknown_option: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 9dbd555..b51fd57 100644 *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *************** psql_completion(const char *text, int st *** 2605,2610 **** --- 2605,2611 ---- /* Complete CREATE EVENT TRIGGER <name> ON with event_type */ else if (Matches("CREATE", "EVENT", "TRIGGER", MatchAny, "ON")) COMPLETE_WITH("ddl_command_start", "ddl_command_end", "sql_drop"); + /* * Complete CREATE EVENT TRIGGER <name> ON <event_type>. EXECUTE FUNCTION * is the recommended grammar instead of EXECUTE PROCEDURE in version 11 *************** psql_completion(const char *text, int st *** 3525,3531 **** COMPLETE_WITH_QUERY(Query_for_list_of_roles); else if (TailMatchesCS("\\pset")) COMPLETE_WITH_CS("border", "columns", "expanded", ! "fieldsep", "fieldsep_zero", "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", --- 3526,3533 ---- COMPLETE_WITH_QUERY(Query_for_list_of_roles); else if (TailMatchesCS("\\pset")) COMPLETE_WITH_CS("border", "columns", "expanded", ! "fieldsep", "fieldsep_csv", "fieldsep_zero", ! "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", *************** psql_completion(const char *text, int st *** 3536,3542 **** else if (TailMatchesCS("\\pset", MatchAny)) { if (TailMatchesCS("format")) ! COMPLETE_WITH_CS("aligned", "asciidoc", "html", "latex", "latex-longtable", "troff-ms", "unaligned", "wrapped"); else if (TailMatchesCS("linestyle")) --- 3538,3544 ---- else if (TailMatchesCS("\\pset", MatchAny)) { if (TailMatchesCS("format")) ! COMPLETE_WITH_CS("aligned", "asciidoc", "csv", "html", "latex", "latex-longtable", "troff-ms", "unaligned", "wrapped"); else if (TailMatchesCS("linestyle")) diff --git a/src/fe_utils/print.c b/src/fe_utils/print.c index cb9a9a0..3b07280 100644 *** a/src/fe_utils/print.c --- b/src/fe_utils/print.c *************** print_aligned_vertical(const printTableC *** 1737,1742 **** --- 1737,1846 ---- /**********************/ + /* CSV format */ + /**********************/ + + + static void + csv_escaped_print(const char *str, FILE *fout) + { + const char *p; + + fputc('"', fout); + for (p = str; *p; p++) + { + if (*p == '"') + fputc('"', fout); /* double quotes are doubled */ + fputc(*p, fout); + } + fputc('"', fout); + } + + static void + csv_print_field(const char *str, FILE *fout, const char *sep) + { + /*---------------- + * Enclose and escape field contents when one of these conditions is met: + * - the field separator is found in the contents. + * - the field contains a CR or LF. + * - the field contains a double quote. + *---------------- + */ + if ((sep != NULL && *sep != '\0' && strstr(str, sep) != NULL) || + strcspn(str, "\r\n\"") != strlen(str)) + csv_escaped_print(str, fout); + else + fputs(str, fout); + } + + static void + print_csv_text(const printTableContent *cont, FILE *fout) + { + const char *const *ptr; + int i; + + if (cancel_pressed) + return; + + /* + * The title and footer are never printed in csv format. The header is + * printed if opt_tuples_only is false. + * + * Despite RFC 4180 saying that end of lines are CRLF, terminate lines + * with '\n', which prints out as the system-dependent EOL string in text + * mode (typically LF on Unix and CRLF on Windows). + */ + if (cont->opt->start_table && !cont->opt->tuples_only) + { + /* print headers */ + for (ptr = cont->headers; *ptr; ptr++) + { + if (ptr != cont->headers) + fputs(cont->opt->fieldSepCsv, fout); + csv_print_field(*ptr, fout, cont->opt->fieldSepCsv); + } + fputc('\n', fout); + } + + /* print cells */ + for (i = 0, ptr = cont->cells; *ptr; i++, ptr++) + { + csv_print_field(*ptr, fout, cont->opt->fieldSepCsv); + if ((i + 1) % cont->ncolumns) + fputs(cont->opt->fieldSepCsv, fout); + else + fputc('\n', fout); + } + } + + static void + print_csv_vertical(const printTableContent *cont, FILE *fout) + { + const char *const *ptr; + int i; + + /* print records */ + for (i = 0, ptr = cont->cells; *ptr; i++, ptr++) + { + if (cancel_pressed) + return; + + /* print name of column */ + csv_print_field(cont->headers[i % cont->ncolumns], fout, + cont->opt->fieldSepCsv); + + /* print field separator */ + fputs(cont->opt->fieldSepCsv, fout); + + /* print field value */ + csv_print_field(*ptr, fout, cont->opt->fieldSepCsv); + + fputc('\n', fout); + } + } + + + /**********************/ /* HTML printing ******/ /**********************/ *************** printTable(const printTableContent *cont *** 3234,3239 **** --- 3338,3349 ---- else print_aligned_text(cont, fout, is_pager); break; + case PRINT_CSV: + if (cont->opt->expanded == 1) + print_csv_vertical(cont, fout); + else + print_csv_text(cont, fout); + break; case PRINT_HTML: if (cont->opt->expanded == 1) print_html_vertical(cont, fout); diff --git a/src/include/fe_utils/print.h b/src/include/fe_utils/print.h index b761349..665a782 100644 *** a/src/include/fe_utils/print.h --- b/src/include/fe_utils/print.h *************** enum printFormat *** 28,33 **** --- 28,34 ---- PRINT_NOTHING = 0, /* to make sure someone initializes this */ PRINT_ALIGNED, PRINT_ASCIIDOC, + PRINT_CSV, PRINT_HTML, PRINT_LATEX, PRINT_LATEX_LONGTABLE, *************** typedef struct printTableOpt *** 112,117 **** --- 113,119 ---- const printTextFormat *line_style; /* line style (NULL for default) */ struct separator fieldSep; /* field separator for unaligned text mode */ struct separator recordSep; /* record separator for unaligned text mode */ + char *fieldSepCsv; /* field separator for csv format */ bool numericLocale; /* locale-aware numeric units separator and * decimal marker */ char *tableAttr; /* attributes for HTML <table ...> */ diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out index 3818cfe..ceea28f 100644 *** a/src/test/regress/expected/psql.out --- b/src/test/regress/expected/psql.out *************** border 1 *** 262,267 **** --- 262,268 ---- columns 0 expanded off fieldsep '|' + fieldsep_csv ',' fieldsep_zero off footer on format aligned *************** last error message: division by zero *** 3243,3245 **** --- 3244,3389 ---- \echo 'last error code:' :LAST_ERROR_SQLSTATE last error code: 22012 \unset FETCH_COUNT + -- test csv format + -- test multi-line headers, wrapping, quoting rules and newline indicators + prepare q as select 'ab,cd' as col1, + 'ab' as "col,2", + E'a\tb' as col3, + '"' as col4, + '""' as col5, + 'a"b' as "col""6", + E'a\nb' as col7, + NULL as col8, + 'ab' as "col + 9", + 'cd' as "col + + 10", + array['ab', E'cd\nef'] as col11, + '{"a":"a,b", "a,b":null, "c":"a,\"b"}'::json as col12 + from generate_series(1,2); + \pset format csv + \pset fieldsep_csv ',' + \pset expanded off + \t off + execute q; + col1,"col,2",col3,col4,col5,"col""6",col7,col8,"col + 9","col + + 10",col11,col12 + "ab,cd",ab,a b,"""","""""","a""b","a + b",,ab,cd,"{ab,""cd + ef""}","{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + "ab,cd",ab,a b,"""","""""","a""b","a + b",,ab,cd,"{ab,""cd + ef""}","{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + \pset fieldsep_csv '\t' + execute q; + col1 col,2 col3 col4 col5 "col""6" col7 col8 "col + 9" "col + + 10" col11 col12 + ab,cd ab "a b" """" """""" "a""b" "a + b" ab cd "{ab,""cd + ef""}" "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + ab,cd ab "a b" """" """""" "a""b" "a + b" ab cd "{ab,""cd + ef""}" "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + \t on + execute q; + ab,cd ab "a b" """" """""" "a""b" "a + b" ab cd "{ab,""cd + ef""}" "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + ab,cd ab "a b" """" """""" "a""b" "a + b" ab cd "{ab,""cd + ef""}" "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + \t off + \pset expanded on + execute q; + col1 ab,cd + col,2 ab + col3 "a b" + col4 """" + col5 """""" + "col""6" "a""b" + col7 "a + b" + col8 + "col + 9" ab + "col + + 10" cd + col11 "{ab,""cd + ef""}" + col12 "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + col1 ab,cd + col,2 ab + col3 "a b" + col4 """" + col5 """""" + "col""6" "a""b" + col7 "a + b" + col8 + "col + 9" ab + "col + + 10" cd + col11 "{ab,""cd + ef""}" + col12 "{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + \pset fieldsep_csv ',' + execute q; + col1,"ab,cd" + "col,2",ab + col3,a b + col4,"""" + col5,"""""" + "col""6","a""b" + col7,"a + b" + col8, + "col + 9",ab + "col + + 10",cd + col11,"{ab,""cd + ef""}" + col12,"{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + col1,"ab,cd" + "col,2",ab + col3,a b + col4,"""" + col5,"""""" + "col""6","a""b" + col7,"a + b" + col8, + "col + 9",ab + "col + + 10",cd + col11,"{ab,""cd + ef""}" + col12,"{""a"":""a,b"", ""a,b"":null, ""c"":""a,\""b""}" + -- illegal csv separators + \pset fieldsep_csv '' + \pset: the CSV field separator must be a single character + \pset fieldsep_csv ',,' + \pset: the CSV field separator must be a single character + \pset fieldsep_csv '\0' + \pset: the CSV field separator must be a single character + \pset fieldsep_csv '\n' + \pset: the CSV field separator cannot be a double quote, a newline, or a carriage return + \pset fieldsep_csv '\r' + \pset: the CSV field separator cannot be a double quote, a newline, or a carriage return + \pset fieldsep_csv '"' + \pset: the CSV field separator cannot be a double quote, a newline, or a carriage return + deallocate q; + \pset format aligned + \pset expanded off + \t off diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql index b45da9b..fd2d0df 100644 *** a/src/test/regress/sql/psql.sql --- b/src/test/regress/sql/psql.sql *************** select 1/(15-unique2) from tenk1 order b *** 688,690 **** --- 688,737 ---- \echo 'last error code:' :LAST_ERROR_SQLSTATE \unset FETCH_COUNT + + -- test csv format + -- test multi-line headers, wrapping, quoting rules and newline indicators + prepare q as select 'ab,cd' as col1, + 'ab' as "col,2", + E'a\tb' as col3, + '"' as col4, + '""' as col5, + 'a"b' as "col""6", + E'a\nb' as col7, + NULL as col8, + 'ab' as "col + 9", + 'cd' as "col + + 10", + array['ab', E'cd\nef'] as col11, + '{"a":"a,b", "a,b":null, "c":"a,\"b"}'::json as col12 + from generate_series(1,2); + + \pset format csv + \pset fieldsep_csv ',' + \pset expanded off + \t off + execute q; + \pset fieldsep_csv '\t' + execute q; + \t on + execute q; + \t off + \pset expanded on + execute q; + \pset fieldsep_csv ',' + execute q; + + -- illegal csv separators + \pset fieldsep_csv '' + \pset fieldsep_csv ',,' + \pset fieldsep_csv '\0' + \pset fieldsep_csv '\n' + \pset fieldsep_csv '\r' + \pset fieldsep_csv '"' + + deallocate q; + \pset format aligned + \pset expanded off + \t off
Or we could kill both issues by hard-wiring the separator as ','.
+1
I've never encountered a situation where a customer wanted a custom delimiter AND quoted strings. So either they wanted pure CSV or a customed TSV.
Could we have another output type called "separated" that uses the existing --fieldsep / --recordsep? Word will get out that csv is faster, but we'd still have the flexibility if somebody really wanted it.
Corey Huinker <corey.huinker@gmail.com> writes: > Could we have another output type called "separated" that uses the existing > --fieldsep / --recordsep? Uh, what's the difference from the existing unaligned format? regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Or we could kill both issues by hard-wiring the separator as ','. Using ';' for the delimiter isn't all that rare. But I don't see any reason to allow multibyte or non-ascii characters or arbitrary strings. -- Andrew (irc:RhodiumToad)
po 26. 11. 2018 v 5:36 odesílatel Andrew Gierth <andrew@tao11.riddles.org.uk> napsal:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Or we could kill both issues by hard-wiring the separator as ','.
Using ';' for the delimiter isn't all that rare.
; is default for CSV produced by MS Excel in Czech mutation - so for some countries, like CR is common.
Regards
Pavel
But I don't see any reason to allow multibyte or non-ascii characters or
arbitrary strings.
--
Andrew (irc:RhodiumToad)
On Sun, Nov 25, 2018 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> Could we have another output type called "separated" that uses the existing
> --fieldsep / --recordsep?
Uh, what's the difference from the existing unaligned format?
No footer and I guess we'd want to escape instances of fieldsep and recordsep in the data, so I guess if we had an option to escape instances of fieldsep/recordsep found in the data, unaligned would work fine.
n Mon, Nov 26, 2018 at 5:53 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
po 26. 11. 2018 v 5:36 odesílatel Andrew Gierth <andrew@tao11.riddles.org.uk> napsal:>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Or we could kill both issues by hard-wiring the separator as ','.
Using ';' for the delimiter isn't all that rare.; is default for CSV produced by MS Excel in Czech mutation - so for some countries, like CR is common.
I believe this is a common occurance in countries that use decimal comma instead of decimal point, which is a fairly large portion of the world.
/Magnus
Tom Lane wrote: > And, in fact, right now *none* of psql's table output formats is both > unambiguous and reasonably simple/popular to use. So the astonishing > thing about this patch, IMO, is that we didn't do it a decade ago. Yeah, that's what motivated this submission in the first place. > I feel that if we allow multi-byte characters here, we might as well > take the training wheels off and just say you can use any separator > string you want, as long as it doesn't contain double quote, \r, or \n. The reason behind disallowing multiple characters was the likeliness of them being mistakes. For instance, this example came up at some point in the discussion: \pset fieldsep_csv ,, To me the probability that a user has fat-fingered this is pretty high, and this would silently produce a really bogus file. Another kind of mistake comes from the difficulty of properly quoting on the command line: psql -- csv -P fieldsep_csv='\t' would be interpreted as a two-character separator despite being obviously not the user's intention. About disallowing characters beyond US-ASCII, I can't find a similar justification. COPY does not allow them, but it's justified (in the archives) by the fear of being slower when importing, which is not a concern here. > We could avoid this self-inflicted confusion by choosing a different > parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep". +1 > Or we could kill both issues by hard-wiring the separator as ','. Ideally people would understand that they can use -A for any delimiter but no quoting, or --csv with strict quoting and in that case a fixed delimiter is fine, since it's going to be safely quoted, its presence in the data is not a problem. But I'm not too confident that everyone would understand it that way, even if it's well explained in the doc. When one is told "please produce CSV files with semi-colons as separators", it's simpler to just produce that rather than arguing that these requirements are probably ill-advised. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Sun, Nov 25, 2018 at 6:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think there are two remaining points to settle: > > 1. Are we limiting the separator to be a single-byte character or not? I agree with what others have said that expanding functionality in this direction is more likely to mask errors than be useful. > I'm a bit inclined to the former viewpoint. > If we were in the business of being restrictive, why would we allow > the field separator to be changed at all? The name of the format is > *comma* separated values, not something else. I still stand by the more inclusive, and arguably modern, name "character separated values" for the abbreviation...which can be taken to mean any single character quite easily and is how it appears to be used these days in any case. > 2. Speaking of the field separator, I'm pretty desperately unhappy > with the choice of "fieldsep_csv" as the parameter name.[...] > We could avoid this self-inflicted confusion by choosing a different > parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep". Make sense to me - with the underscore personally. David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Sun, Nov 25, 2018 at 6:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. Are we limiting the separator to be a single-byte character or not? > I agree with what others have said that expanding functionality in > this direction is more likely to mask errors than be useful. OK, reasonable arguments were made why not to allow multi-character separators. Should we then match the server and insist on a single-byte separator? It's a bit inconsistent if psql can be made to emit "csv" files that COPY can't read, especially when it's otherwise a subset of what COPY allows. >> 2. Speaking of the field separator, I'm pretty desperately unhappy >> with the choice of "fieldsep_csv" as the parameter name.[...] >> We could avoid this self-inflicted confusion by choosing a different >> parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep". > Make sense to me - with the underscore personally. Barring complaints, I'll switch it to "csv_fieldsep". regards, tom lane
Tom Lane wrote: > OK, reasonable arguments were made why not to allow multi-character > separators. Should we then match the server and insist on a single-byte > separator? It's a bit inconsistent if psql can be made to emit "csv" > files that COPY can't read, especially when it's otherwise a subset > of what COPY allows. I seem to be the only one advocating to accept an Unicode character here, and I won't insist on it. There's still a minor annoyance to solve if we want to claim full compatibility with COPY CSV: backslash followed by dot must be quoted to avoid being interpreted as an end of data indicator. A proposed fix is attached. print_csv_vertical() is left unchanged because it's not possible currently to end up with \. alone on a line with the expanded display: we'd need to allow first for an empty field separator, I believe. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
I wrote: > A proposed fix is attached. print_csv_vertical() is left unchanged > because it's not possible currently to end up with \. alone > on a line with the expanded display On second thought, it is possible postgres=# \pset format csv Output format is csv. postgres=# \pset fieldsep_csv '.' Field separator for CSV is ".". postgres=# \x Expanded display is on. postgres=# select '' as "\" ; \. PFA an upgraded fix. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Attachment
"Daniel Verite" <daniel@manitou-mail.org> writes: >> A proposed fix is attached. print_csv_vertical() is left unchanged >> because it's not possible currently to end up with \. alone >> on a line with the expanded display > On second thought, it is possible > postgres=# \pset format csv > Output format is csv. > postgres=# \pset fieldsep_csv '.' > Field separator for CSV is ".". > postgres=# \x > Expanded display is on. > postgres=# select '' as "\" ; > \. Ugh. There's more to that than meets the eye; consider also \pset csv_fieldsep '.' select '\' as d1, '' as d2; or the reverse case where we set the fieldsep to '\' and then emit a second field containing '.'. I think that the approach you're using doesn't really scale to handle all these cases sanely, so what I did instead was just to make csv_print_field force field quoting if any of these cases could possibly apply. That will result in excess quoting in some cases, but I think that's fine, since they're all pretty uncommon. (BTW, it seems only chance that the server's CSV mode isn't also subject to this case, but since it always quotes empty fields, I think we're OK there.) Pushed with that correction and some other last-minute review. regards, tom lane
Pushed with that correction and some other last-minute review.
Great! Thank you
Pavel
regards, tom lane
Tom Lane wrote: > what I did instead was just to make > csv_print_field force field quoting if any of these cases could > possibly apply. That will result in excess quoting in some > cases, but I think that's fine, since they're all pretty uncommon. Indeed. > (BTW, it seems only chance that the server's CSV mode isn't also > subject to this case, but since it always quotes empty fields, > I think we're OK there.) > > Pushed with that correction and some other last-minute review. Thanks! Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite