Thread: psql and tab-delimited output
Hi.
Traditionally, to generate a TSV report, I've simply invoked psql with:
--no-align --field-separator '\t' --pset footer=off
That works in most cases, except when your column values contain tabs themselves.
I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.
So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?
I also considered handling the escaping myself within the SELECT, and then sticking with the first approach above.
Suggestions?
Thanks.
Abelard Hoffman wrote on 06.09.2014 09:32: > Traditionally, to generate a TSV report, I've simply invoked psql > with: --no-align --field-separator '\t' --pset footer=off > > That works in most cases, except when your column values contain tabs > themselves. > > I know that COPY() will escape tabs (as \t), and we can use that from > psql with the \copy command, but that does not include a header row > of the column names. > > So, my question is, what's the simplest way to generate tab-escaped > TSV-formatted reports with the first line containing the list of > column names? Do you have to do that using psql or can you use a different tool? The tool I maintain - SQL Workbench/J - has a command to export tables (or queries) to a wide range of export formats. WbExport can escape of special characters by using \t, \n etc notation for "control" characters http://sql-workbench.net/manual/command-export.html especially http://sql-workbench.net/manual/command-export.html#text-escape-switch This can all be done from a shell/batch script if needed. The tool needs a Java runtime and the Postgres JDBC driver. Thomas
On 09/06/2014 12:32 AM, Abelard Hoffman wrote: > Hi. > > Traditionally, to generate a TSV report, I've simply invoked psql with: > --no-align --field-separator '\t' --pset footer=off > > That works in most cases, except when your column values contain tabs > themselves. > > I know that COPY() will escape tabs (as \t), and we can use that from > psql with the \copy command, but that does not include a header row of > the column names. > > So, my question is, what's the simplest way to generate tab-escaped > TSV-formatted reports with the first line containing the list of column > names? > create table tsv_test (id int, fld_1 varchar); insert into tsv_test values (1, 'test value'); insert into tsv_test values (2, 'test value'); insert into tsv_test values (3, 'test value'); \copy tsv_test to 'data.tsv' with csv header delimiter ' '; aklaver@panda:~> cat data.tsv id fld_1 1 "test value" 2 "test value" 3 "test value" > > I also considered handling the escaping myself within the SELECT, and > then sticking with the first approach above. > > Suggestions? > > Thanks. > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
[snip]
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter ' ';
aklaver@panda:~> cat data.tsv
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the delimiter, we lose the simplicity of TSV. I can't just do a split on /\t/ to get the fields and then unescape the values. At that point it's probably simpler to just switch to standard CSV.
Using your example, the output I'd prefer is:
id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue
I looked at the options for COPY's CSV format, but I don't see a way to disable quoting but still have escaping.
This works, although it's not exactly simple:
DROP TABLE IF EXISTS tsv_test;
CREATE TABLE tsv_test (id int, fld_1 varchar);
INSERT INTO tsv_test VALUES (1, 'test value');
INSERT INTO tsv_test VALUES (2, 'test value');
INSERT INTO tsv_test VALUES (3, 'test value');
SELECT * FROM tsv_test WHERE FALSE; -- to generate header row
COPY tsv_test TO STDOUT;
And then run that through psql with the --no-align --field-separator '\t' --pset footer=off options.
With that, I'd probably generate the report into a temp table, and then run the above to actually export that table as TSV.
@Thomas, yes, I was hoping to stick with just psql, but I'll look at other tools if necessary.
Any other ideas?
Thanks.
On 09/06/2014 10:34 AM, Abelard Hoffman wrote: > On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 09/06/2014 12:32 AM, Abelard Hoffman wrote: > > [snip] > > So, my question is, what's the simplest way to generate tab-escaped > TSV-formatted reports with the first line containing the list of > column > names? > > > > create table tsv_test (id int, fld_1 varchar); > > insert into tsv_test values (1, 'test value'); > insert into tsv_test values (2, 'test value'); > insert into tsv_test values (3, 'test value'); > > \copy tsv_test to 'data.tsv' with csv header delimiter ' '; > > aklaver@panda:~> cat data.tsv > id fld_1 > 1 "test value" > 2 "test value" > 3 "test value" > > > Thanks, Adrian. That works, but since we're using quotes to embed the > delimiter, we lose the simplicity of TSV. I can't just do a split on > /\t/ to get the fields and then unescape the values. At that point it's > probably simpler to just switch to standard CSV. > > Using your example, the output I'd prefer is: > > id fld_1 > 1 test\tvalue > 2 test\tvalue > 3 test\tvalue I guess it depends on what you are using. In Python: In [30]: with open('data.tsv', 'rb') as c_file: c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"') for row in c_reader: print row ....: ['id', 'fld_1'] ['1', 'test\tvalue'] ['2', 'test\tvalue'] ['3', 'test\tvalue'] In [33]: with open('data.tsv', 'rb') as c_file: c_reader = csv.reader(c_file, delimiter='\t', quotechar = '"') for row in c_reader: print row[1] ....: fld_1 test value test value test value The Postgres docs have a good note on the CSV format: http://www.postgresql.org/docs/9.3/interactive/sql-copy.html "Note: Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and COPY might produce files that other programs cannot process." So it always an adventure:) > > I looked at the options for COPY's CSV format, but I don't see a way to > disable quoting but still have escaping. > > This works, although it's not exactly simple: > > DROP TABLE IF EXISTS tsv_test; > > CREATE TABLE tsv_test (id int, fld_1 varchar); > > INSERT INTO tsv_test VALUES (1, 'test value'); > INSERT INTO tsv_test VALUES (2, 'test value'); > INSERT INTO tsv_test VALUES (3, 'test value'); > > SELECT * FROM tsv_test WHERE FALSE; -- to generate header row > COPY tsv_test TO STDOUT; > > And then run that through psql with the --no-align --field-separator > '\t' --pset footer=off options. > With that, I'd probably generate the report into a temp table, and then > run the above to actually export that table as TSV. > > @Thomas, yes, I was hoping to stick with just psql, but I'll look at > other tools if necessary. > > Any other ideas? > > Thanks. > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, Sep 6, 2014 at 11:43 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 09/06/2014 10:34 AM, Abelard Hoffman wrote:On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver <adrian.klaver@aklaver.com<mailto:adrian.klaver@aklaver.com>> wrote:
On 09/06/2014 12:32 AM, Abelard Hoffman wrote:
[snip]
So, my question is, what's the simplest way to generate tab-escaped
TSV-formatted reports with the first line containing the list of
column
names?
create table tsv_test (id int, fld_1 varchar);
insert into tsv_test values (1, 'test value');
insert into tsv_test values (2, 'test value');
insert into tsv_test values (3, 'test value');
\copy tsv_test to 'data.tsv' with csv header delimiter ' ';
aklaver@panda:~> cat data.tsv
id fld_1
1 "test value"
2 "test value"
3 "test value"
Thanks, Adrian. That works, but since we're using quotes to embed the
delimiter, we lose the simplicity of TSV. I can't just do a split on
/\t/ to get the fields and then unescape the values. At that point it's
probably simpler to just switch to standard CSV.I guess it depends on what you are using.
Using your example, the output I'd prefer is:
id fld_1
1 test\tvalue
2 test\tvalue
3 test\tvalue
In Python:
[snip]
Yeah, I can parse CSV easily enough. My boss wants TSV though (I could parse CSV and split out TSV, of course). Even then, having to take the report query (which can be big), strip all the newlines and insert it into a \copy command is kind of a PITA.
I wrote this function, which does what I want. I can call this on each column that might include tabs and then safely generate TSV using just psql:
CREATE OR REPLACE FUNCTION public.tsv_escape(text) RETURNS text AS $$
my %replace = (
"\t" => 't',
"\r" => 'r',
"\n" => 'n',
'\\' => '\\',
);
$_[0] =~ s/(.)/exists $replace{ $1 } ? "\\$replace{$1}" : $1/gse;
return $_[0];
$$ LANGUAGE 'plperl';
Anyone see a way to rewrite that as an 'sql' function rather than plperl?
I also took a look at the psql source. It doesn't look like it would be very hard to add some backslash escaping logic as an option. Am I the only one that would use that? For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.
Thanks.
On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com> wrote: > For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible. For reporting, usually you use the data in the database directly. A TSV or CSV file is not a report, it’s at best a data source for your report. Going through an intermediate format is nota particularly effective approach to create reports, but if you have to (for example because you aren’t _allowed_ accessto the database), generally preferred formats seem to be CSV, XML or JSON; as long as it’s a standard format. TSV is not a common choice. Are you sure your boss actually cares that it’s TSV and not, for example, CSV? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Alban Hertroys-4 wrote > On 07 Sep 2014, at 10:45, Abelard Hoffman < > abelardhoffman@ > > wrote: > >> For reports, everyone else mostly uses other tools? I'd like to stay away >> from GUI-tools, if possible. > > For reporting, usually you use the data in the database directly. > > A TSV or CSV file is not a report, it’s at best a data source for your > report. Going through an intermediate format is not a particularly > effective approach to create reports, but if you have to (for example > because you aren’t _allowed_ access to the database), generally preferred > formats seem to be CSV, XML or JSON; as long as it’s a standard format. > TSV is not a common choice. Are you sure your boss actually cares that > it’s TSV and not, for example, CSV? TSV is generally chosen as an interchange format because data rarely has tabs...if your does that advantage goes away. Same question regarding the report aspect. If the boss wants a human readable text format report I'd suggest using the format function and output the the resultant single column as-is to the screen. Fixed-width columns are better than any kind of delimited format in that use case. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/psql-and-tab-delimited-output-tp5818019p5818097.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 09/07/2014 01:45 AM, Abelard Hoffman wrote: > > > > On Sat, Sep 6, 2014 at 11:43 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 09/06/2014 10:34 AM, Abelard Hoffman wrote: > > On Sat, Sep 6, 2014 at 7:28 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > <mailto:adrian.klaver@aklaver.__com > <mailto:adrian.klaver@aklaver.com>>> wrote: > > On 09/06/2014 12:32 AM, Abelard Hoffman wrote: > > [snip] > > So, my question is, what's the simplest way to generate > tab-escaped > TSV-formatted reports with the first line containing > the list of > column > names? > > > > create table tsv_test (id int, fld_1 varchar); > > insert into tsv_test values (1, 'test value'); > insert into tsv_test values (2, 'test value'); > insert into tsv_test values (3, 'test value'); > > \copy tsv_test to 'data.tsv' with csv header delimiter ' > '; > > aklaver@panda:~> cat data.tsv > id fld_1 > 1 "test value" > 2 "test value" > 3 "test value" > > > Thanks, Adrian. That works, but since we're using quotes to > embed the > delimiter, we lose the simplicity of TSV. I can't just do a split on > /\t/ to get the fields and then unescape the values. At that > point it's > probably simpler to just switch to standard CSV. > > > Using your example, the output I'd prefer is: > > id fld_1 > 1 test\tvalue > 2 test\tvalue > 3 test\tvalue > > > I guess it depends on what you are using. > > In Python: > > [snip] > > Yeah, I can parse CSV easily enough. My boss wants TSV though (I could > parse CSV and split out TSV, of course). Even then, having to take the > report query (which can be big), strip all the newlines and insert it > into a \copy command is kind of a PITA. I see CSV as a generic term that covers all separated value formats, so TSV is just a variation. I am not sure exactly what the end point of all this, so I am probably not going to be able to offer much more. One heads up, in 9.3+ you have the option in \copy(COPY) to use an external program to copy TO or FROM > > I also took a look at the psql source. It doesn't look like it would be > very hard to add some backslash escaping logic as an option. Am I the > only one that would use that? For reports, everyone else mostly uses > other tools? I'd like to stay away from GUI-tools, if possible. Not sure what you are asking for here. Something like this?: http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html 4.1.2.2. String Constants with C-style Escapes test=> select E'a\tb'; ?column? ----------- a b (1 row) > > Thanks. -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, Sep 6, 2014 at 12:32 AM, Abelard Hoffman <abelardhoffman@gmail.com> wrote:
Hi.Traditionally, to generate a TSV report, I've simply invoked psql with:--no-align --field-separator '\t' --pset footer=offThat works in most cases, except when your column values contain tabs themselves.I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.
Which is a shame. \copy really should allow HEADER in the default format, not just CSV format.
And it on the to-do list, just hasn't be done yet:
It seems like it should be fairly easy to do.
So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?
I just assume that none of the column names need escaping, and so select and join them on tabs. At one point I had a perl script that would do this for me, e.g. given a query, it would execute it once with a 'and 1=0' at the end (obviously can't be done legally/efficiently/safely with all queries) to get the column names, then again in a \COPY to get the data, but I seem to have misplaced it.
It worked well as long as you understood it was a dirty hack and so had the limitations of one.
Cheers,
Jeff
>the output I'd prefer is:
> id fld_1
> 1 test\tvalue
> 2 test\tvalue
> 3 >test\tvalue
Does this work for you?
copy (SELECT id, replace(fld_1, ' ', '\t') FROM tsv_test) to stdout with csv header delimiter ' ';
query returned copy data:
id replace
1 test\tvalue
2 test\tvalue
3 test\tvalue
--
> id fld_1
> 1 test\tvalue
> 2 test\tvalue
> 3 >test\tvalue
Does this work for you?
copy (SELECT id, replace(fld_1, ' ', '\t') FROM tsv_test) to stdout with csv header delimiter ' ';
query returned copy data:
id replace
1 test\tvalue
2 test\tvalue
3 test\tvalue
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
On Sun, Sep 7, 2014 at 12:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sat, Sep 6, 2014 at 12:32 AM, Abelard Hoffman <abelardhoffman@gmail.com> wrote:
[snip]
I know that COPY() will escape tabs (as \t), and we can use that from psql with the \copy command, but that does not include a header row of the column names.Which is a shame. \copy really should allow HEADER in the default format, not just CSV format.And it on the to-do list, just hasn't be done yet:It seems like it should be fairly easy to do.
Ah, excellent. That would solve most of my issue.
So, my question is, what's the simplest way to generate tab-escaped TSV-formatted reports with the first line containing the list of column names?I just assume that none of the column names need escaping, and so select and join them on tabs. At one point I had a perl script that would do this for me, e.g. given a query, it would execute it once with a 'and 1=0' at the end (obviously can't be done legally/efficiently/safely with all queries) to get the column names, then again in a \COPY to get the data, but I seem to have misplaced it.It worked well as long as you understood it was a dirty hack and so had the limitations of one.
Yes, that's exactly what I need, although I'm loathe to run the queries twice just to get the headers.
Thanks to everyone for their comments and suggestions.
As additional background, I have a dozen or so "reports" that exist as plain text files, and they're just run by cron with psql and mailed to those who are interested in them. Historically, all of them have been TSV. It's worked fine since all of the data has been computed (e.g., sales counts, etc.). But with a recent report, we're including a customer feedback comment, which obviously can have tabs and newlines, etc.
On Sun, Sep 7, 2014 at 9:25 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
I also took a look at the psql source. It doesn't look like it would be
very hard to add some backslash escaping logic as an option. Am I the
only one that would use that? For reports, everyone else mostly uses
other tools? I'd like to stay away from GUI-tools, if possible.
Not sure what you are asking for here. Something like this?:
http://www.postgresql.org/docs/9.3/static/sql-syntax-lexical.html
I was meaning, it seems like it would be useful to have psql be able to behave like COPY but without having to call \copy.
So IOW, something like:
psql --no-align --field-separator '\t' --field-escape
Where the --field-escape option would cause the data to be escaped exactly like COPY does. Having the HEADER option to COPY's default format will largely solve this though. Then it's just a matter of slurping in the report files, making sure all newlines are removed, and passing them to the appropriate \copy command.
Hi Alban.
On Sun, Sep 7, 2014 at 4:18 AM, Alban Hertroys <haramrae@gmail.com> wrote:
On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com> wrote:
> For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible.
For reporting, usually you use the data in the database directly.
A TSV or CSV file is not a report, it’s at best a data source for your report. Going through an intermediate format is not a particularly effective approach to create reports, but if you have to (for example because you aren’t _allowed_ access to the database), generally preferred formats seem to be CSV, XML or JSON; as long as it’s a standard format.
TSV is not a common choice. Are you sure your boss actually cares that it’s TSV and not, for example, CSV?
Could you expand on that a bit? What sort of tools does management use to generate reports from the database directly?
You're meaning a database warehouse? We just have an OLTP db, so we've always generated reports periodically through cron jobs.
Or maybe "reports" is the wrong word. We generate a bunch of db stats which can then be used however they want (pulled into Excel, etc.).
But would definitely be interested in learning about other approaches.
And yes, I'm sure we could convert everything over to CSV. Just an issue of inertia.
Thanks.
On Sun, Sep 7, 2014 at 5:17 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
>the output I'd prefer is:
> id fld_1
> 1 test\tvalue
> 2 test\tvalue
> 3 >test\tvalue
Does this work for you?
copy (SELECT id, replace(fld_1, ' ', '\t') FROM tsv_test) to stdout with csv header delimiter ' ';
Thank you, Melvin. Yes, I think that's what I'll do for the time-being, until COPY's TSV format supports HEADER.
On 09/08/2014 11:45 AM, Abelard Hoffman wrote: > Hi Alban. > > On Sun, Sep 7, 2014 at 4:18 AM, Alban Hertroys <haramrae@gmail.com > <mailto:haramrae@gmail.com>> wrote: > > On 07 Sep 2014, at 10:45, Abelard Hoffman <abelardhoffman@gmail.com > <mailto:abelardhoffman@gmail.com>> wrote: > > > For reports, everyone else mostly uses other tools? I'd like to stay away from GUI-tools, if possible. > > For reporting, usually you use the data in the database directly. > > A TSV or CSV file is not a report, it’s at best a data source for > your report. Going through an intermediate format is not a > particularly effective approach to create reports, but if you have > to (for example because you aren’t _allowed_ access to the > database), generally preferred formats seem to be CSV, XML or JSON; > as long as it’s a standard format. > TSV is not a common choice. Are you sure your boss actually cares > that it’s TSV and not, for example, CSV? > > > Could you expand on that a bit? What sort of tools does management use > to generate reports from the database directly? > You're meaning a database warehouse? We just have an OLTP db, so we've > always generated reports periodically through cron jobs. > > Or maybe "reports" is the wrong word. We generate a bunch of db stats > which can then be used however they want (pulled into Excel, etc.). > But would definitely be interested in learning about other approaches. Ways I have done it: 1) In Excel use the data tools to run the query and return data directly to the spreadsheet. This assumes a version of Excel that has the data tools. 2) I use Python, so use psycopg2 to pull data from the database and then xlwt to write out a spreadsheet or reportlab to create a pdf or the csv module to create whatever flavor of CSV. > > And yes, I'm sure we could convert everything over to CSV. Just an issue > of inertia. > > Thanks. > -- Adrian Klaver adrian.klaver@aklaver.com