Thread: Markdown format output for psql, design notes
Hi all,
# What I'd like to do
I've been working on the idea of a markdown format for psql, as I said in this thread: https://www.postgresql.org/message-id/flat/CAB_COdiiwTmBcrmjXWCKiqkcPgf_bLodrUyb4GYE6pfKeoK2eg%40mail.gmail.com
An attempt was made a year ago (see here: https://www.postgresql.org/message-id/flat/CAAYBy8bs%3D8vz6Ps_nLW24NJhqcxz4bsWBLawAiwWSPSLdWSmvA%40mail.gmail.com#eb7b6eb6daa60aac1f5fa001f934f89a), but it didn't end up with something committable.
What's more, I disagree with using a `\pset linestyle markdown` option to get markdown output in psql; I prefer `\pset format markdown`.
# Some official doc about markdown
So here are my thoughts (before writing any code):
- "Official" markdown seems to be the Daring Fireball project (see Aaron Swartz's note here: http://www.aaronsw.com/weblog/001189)
- Official markdown doesn't support table formatting. The authors say we can just use HTML inside markdown to do so (it's not very readable for a human, which is why I don't like this option) -> see here: https://daringfireball.net/projects/markdown/syntax#html
- Table markdown was introduced in "Markdown Extra", which was first implemented in PHP (see here: https://michelf.ca/projects/php-markdown/extra/#table)
- I want to make the patch as simple as possible, so I won't implement cell alignment
# The result I want
From points 3 and 4, here is what I'd like to see :
| Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| content | content | content |
| content | content | content |
(2 rows)
**'|' at the beginning and end of each line is optional in Markdown Extra, but there seems to be a consensus to always add them. You may challenge this choice, I'm open to discussion.**
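To make the target layout concrete, here is a minimal Python sketch that renders rows in the shape shown above. It is an illustration only, not psql code (the patch itself would be written in C). `render_table` is a hypothetical name, cells are assumed to be plain strings, and no escaping or alignment is attempted.

```python
def render_table(headers, rows):
    """Render headers and rows as a pipe table with leading and trailing '|'."""
    # Each column is as wide as its widest cell, header included.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    def line(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    # Separator dashes span the cell width plus the two padding spaces.
    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    out = [line(headers), separator]
    out.extend(line(row) for row in rows)
    # Row-count footer in the style psql prints after a result set.
    noun = "row" if len(rows) == 1 else "rows"
    out.append(f"({len(rows)} {noun})")
    return "\n".join(out)
```

Calling `render_table(["Header 1", "Header 2", "Header 3"], [["content"] * 3] * 2)` yields the header line, a separator of ten dashes per column, two content lines padded to the header width, and the `(2 rows)` footer.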
From the fireball project (https://daringfireball.net/projects/markdown/syntax#backslash) and markdown extra (https://michelf.ca/projects/php-markdown/extra/#backslash), it seems we need to backslash escape all of those characters:
~~~
\ backslash
` backtick
* asterisk
_ underscore
{} curly braces
[] square brackets
() parentheses
# hash mark
+ plus sign
- minus sign (hyphen)
. dot
! exclamation mark
: colon
| pipe
~~~
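A minimal Python sketch of that escaping rule, for illustration only. `escape_markdown` and `SPECIAL_CHARS` are hypothetical names, and the character set is simply the list above taken literally; whether every one of these really needs escaping inside a table cell is still open to discussion.

```python
# The characters listed above: the Daring Fireball set plus ':' and '|'
# from Markdown Extra.
SPECIAL_CHARS = "\\`*_{}[]()#+-.!:|"


def escape_markdown(text):
    """Prefix every listed special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)
```

One visible consequence of escaping the full list: a numeric cell such as `-1.5` comes out as `\-1\.5`, which is correct for a markdown processor but less pleasant to read as plain text.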
# psql syntax to get that
It feels to me that we should use the `\pset format` syntax (or `-P` / `--pset=` in batch mode) to tell psql we want markdown. So any of these should produce markdown output:
- `psql -P format=markdown`
- `psql --pset=format=markdown`
- `\pset format markdown` (at the psql prompt)
# Code to change
If I want to code that patch, here are the files I think I'll need to change:
**You're welcome to add any other file that I missed in that list!**
- Documentation
  - doc/src/sgml/ref/psql-ref.sgml
  - src/bin/psql/help.c
- Tests
  - src/test/regress/expected/psql.out
  - src/test/regress/sql/psql.sql
- Code
  - src/bin/psql/command.c
  - src/bin/psql/tab-complete.c
# What I'd like you to do
First, thanks for reading this whole mail, and sorry, I didn't mean to make it so long...
Then I'd like to know **what you think about what I'm about to do** before heading in the wrong direction.
Have a nice day,
Lætitia
-- Think! Do you really need to print this email ?
There is no Planet B.
On Wed, 28 Nov 2018 at 9:59, Lætitia Avrot <laetitia.avrot@gmail.com> wrote:
> What's more, I quite disagree with `\pset linestyle markdown` option to have a markdown output in psql, I prefer `\pset format markdown`.
sure +1
> # The result I want
> From points 3 and 4, here is what I'd like to see :
> | Header 1 | Header 2 | Header 3 |
> |----------|----------|----------|
> | content | content | content |
> | content | content | content |
> (2 rows)
+1
Lætitia Avrot wrote:
> # The result I want
> From points 3 and 4, here is what I'd like to see :
>
> | Header 1 | Header 2 | Header 3 |
> |----------|----------|----------|
> | content | content | content |
> | content | content | content |
> (2 rows)
What would it look like when a field or a header is made of multiple lines?
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
On Wed, 28 Nov 2018 at 16:25, Daniel Verite <daniel@manitou-mail.org> wrote:
> What would it look like when a field or a header is made of multiple
> lines?
I suppose you mean in the standard output, when the screen is too narrow to print the whole line?
Because if the output is redirected to a file (with `\o myfile` for example), the line ends naturally when the row ends.
That's a good question. Markdown Extra doesn't provide any solution for that case. Each newline means a new row.
I'd say that in that case the markdown syntax will be broken, and the user has to redirect the output to a file to get valid markdown. That case could be explained in the documentation.
So if we try with an example, we'd have something like this:
| Header | Header | Header | Header | Header | Header | Header | Header | Header | Header | Reallyreallyreallyreally
TooLongHeader | Header |
| content | content | content | content | content | content | content | content | content | content | Reallyreallyreallyreally
TooLongContent |content |
| content | content | content | content | content | content | content | content | content | content | Reallyreallyreallyreally
TooLongContent |content |
(2 rows)
I couldn't find a way to make it right. If you have a better idea, please share it :-)
Cheers,
Lætitia
On 28/11/2018 09:59, Lætitia Avrot wrote:
> First, thanks to have read that whole mail and sorry I didn't mean to
> make it so long...
> Then I'd like to know ***what you think about what I'm about to do***
> before heading in a wrong direction.
I'm a little bit reluctant for us to write and maintain more and more format styles, especially one as subjective and varied as markdown. I imagine we will constantly be bombarded with "this isn't quite right" or "this isn't compatible with github".
What I personally use is the excellent pandoc tool (https://pandoc.org/) which can convert formats we already output into a multitude of other formats.
psql -qHc "values (E'hello\nworld', 42), ('single line', 5), ('another', null)" | pandoc -f html -t markdown
  -----------------------
  column1         column2
  ------------- ---------
  hello\               42
  world
  single line           5
  another
  -----------------------
(3 rows)\
This handles both column alignment and the multiline issue Daniel raised.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
> I'm a little bit reluctant for us to write and maintain more and more
> format styles, especially one as subjective and varied as markdown. I
> imagine we will constantly be bombarded with "this isn't quite right" or
> "this isn't compatible with github".
I understand your concern. It's a pretty good point.
> What I personally use is the excellent pandoc tool (https://pandoc.org/)
> which can convert formats we already output into a multitude of other
> formats.
> psql -qHc "values (E'hello\nworld', 42), ('single line', 5), ('another',
> null)" | pandoc -f html -t markdown
>   -----------------------
>   column1         column2
>   ------------- ---------
>   hello\               42
>   world
>   single line           5
>   another
>   -----------------------
> (3 rows)\
> This handles both column alignment and the multiline issue Daniel raised.
Well, pandoc doesn't handle line breaks for Markdown Extra.
~~~
psql -qHc "values (E'hello world', 42), ('single line', 5), ('another', null)" log | pandoc -f html -t markdown_phpextra
| column1 | column2 |
|-------------|---------|
| hello world | 42 |
| single line | 5 |
| another | |
~~~
But with a `\n` in a cell, the output is simply the HTML, untransformed (even with the `--wrap=none` option)...
# What stays in my mind
* It's pretty difficult to handle line breaks
* Markdown is not standardised and several flavours exist for table implementation (so why favor one over the others?)
# The question I'd like to ask you
So now, I think we need to ask that fundamental question :
**Is it worth it ?**
Cheers,
Lætitia
Lætitia Avrot wrote:
> I suppose you mean in the standard output when the screen is too short to
> print the whole line ?
> Because if the output is redirected to a file (with `\o myfile` for
> example), the line end naturally when the row ends.
No I meant independently of the screen, if there's an LF character in a cell. Or a '|' character, since that's the same problem: an element of structure happening to be in the contents.
The specs mentioned upthread don't seem to give any indication about that being supported.
Say we have:
  SELECT E'foo\nbar' as "Header1", 'foo|bar' as "Header2"
If the markdown output was produced for the sole purpose of being converted to HTML in the end, which is often the case, it would work to use HTML entities in the output, for instance:
  Header1|Header2
  ---|---
  foo<br>bar|foo&#x7C;bar
This piece seems to be properly processed and rendered by markdown processors I can try (pandoc, grip, github).
But then we'd also need to convert < > and & in the original contents to the equivalent HTML entities, and that would really be markdown-for-html instead of just markdown, I guess.
Best regards,
--
Daniel Vérité
On 29/11/2018 08:26, Lætitia Avrot wrote:
> # What stays in my mind
>
> * It's pretty difficult to handle line breaks
> * Markdown is not standardised and several flavours exist for table
> implementation (so why favor one over the others?)
>
> # The question I'd like to ask you
> So now, I think we need to ask that fundamental question :
>
> ***Is it worth it ?***
And my answer to that is: No.
Markdown isn't standardized enough to support and please everyone.
--
Vik Fearing
Hi,
> No I meant independently of the screen, if there's an LF character
> in a cell. Or a '|' character, since that's the same problem: an
> element of structure happening to be in the contents.
> The specs mentioned upthread don't seem to give any indication
> about that being supported.
I've given a list of characters that need escaping, as stated in the Markdown Extra doc, and `|` is certainly one of them.
For the LF character, I'm totally OK with the fact that it will break the markdown output, and my answer to that is "KISS". I don't want to handle that case. Markdown Extra obviously decided that there is no such thing as a multiline row.
> Say we have:
> SELECT E'foo\nbar' as "Header1", 'foo|bar' as "Header2"
> If the markdown output was produced for the sole purpose of being
> converted to HTML in the end, which is often the case, it would work
> to use HTML entities in the output
I don't use Markdown to create HTML output. I use it to generate PDFs for my customers.
But as Vik said earlier, maybe it's not worth it to provide a markdown output as pandoc can generate the markdown from the HTML output.
And if you need the markdown output to generate HTML why don't you use the HTML output ?
Cheers,
Lætitia
Lætitia Avrot wrote:
> But as Vik said earlier, maybe it's not worth it to provide a markdown
> output as pandoc can generate the markdown from the HTML output.
> And if you need the markdown output to generate HTML why don't you use the
> HTML output ?
The round-trip through pandoc does not do any miracle. The end result is readable to the human eye but structurally broken. If converted back to html, it's no longer a table.
Anyway I tend to agree with Vik on this: "Markdown isn't standardized enough to support and please everyone."
BTW github has independently started to support '|' in the cells by accepting the quoted version '\|' :
https://help.github.com/articles/organizing-information-with-tables/
Now that we have csv as an output format, we can suggest custom csv-to-markdown converters to produce markdown rather than implementing one particular flavor of markdown in psql, or several flavors through flags. The popular script languages have solid CSV parsers that make this relatively easy and safe.
Personally I'd use Perl with something like below, which looks short/simple enough to be shared on wiki.postgresql.org, along with versions in other languages.
#!/usr/bin/perl
# Usage
# inside psql:
#   \pset format csv
#   \o |csvtomarkdown >/tmp/output.md
#   SQL commands...
#   \o
# or psql --csv -c "...query..." | csvtomarkdown
use Text::CSV;
use open qw( :std :encoding(UTF-8) );
my $csv = Text::CSV->new({ binary => 1, eol => $/ });
sub do_format {
  # customize to your needs
  s/&/&amp;/g;
  s/</&lt;/g;
  s/>/&gt;/g;
  s/\n/<br>/g;
  s/\|/&#x7C;/g;
  return $_;
}
my $header = $csv->getline(STDIN);
for (@{$header}) {
  $_ = do_format($_);
}
print join ('|', @{$header}), "\n";
print join ('|', map { "---" } @{$header}), "\n";
while (my $row = $csv->getline(STDIN)) {
  my @contents = map { do_format($_) } @{$row};
  print join('|', @contents), "\n";
}
Best regards,
--
Daniel Vérité
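Since the mail suggests sharing "versions in other languages", here is what a Python counterpart of the Perl script might look like. It is a sketch under assumptions, not a tested wiki-ready tool: `format_cell`, `csv_to_markdown` and `main` are hypothetical names, and the substitutions simply mirror `do_format` (HTML-escape `&`, `<`, `>`, turn newlines into `<br>`, replace `|` with an HTML entity), so the result is the same "markdown-for-HTML" flavor discussed upthread.

```python
import csv
import html
import sys


def format_cell(value):
    # Same substitutions as do_format in the Perl version; customize as needed.
    value = html.escape(value, quote=False)  # escapes &, < and > only
    return value.replace("\n", "<br>").replace("|", "&#x7C;")


def csv_to_markdown(lines):
    """Yield markdown table lines for CSV input (any iterable of text lines)."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to emit
    yield "|".join(format_cell(h) for h in header)
    yield "|".join("---" for _ in header)
    for row in reader:
        yield "|".join(format_cell(c) for c in row)


def main():
    # newline="" lets the csv module see line breaks inside quoted fields
    # untouched; UTF-8 matches what the Perl script sets on its streams.
    sys.stdin.reconfigure(encoding="utf-8", newline="")
    sys.stdout.reconfigure(encoding="utf-8")
    for line in csv_to_markdown(sys.stdin):
        print(line)
```

Saved as an executable `csvtomarkdown` with the usual `if __name__ == "__main__": main()` guard appended, it would be used the same way as the Perl version: `\pset format csv` followed by `\o |csvtomarkdown >/tmp/output.md` inside psql, or `psql --csv -c "...query..." | csvtomarkdown` from the shell.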
On Sat, 1 Dec 2018 at 22:11, Daniel Verite <daniel@manitou-mail.org> wrote:
> Lætitia Avrot wrote:
> > But as Vik said earlier, maybe it's not worth it to provide a markdown
> > output as pandoc can generate the markdown from the HTML output.
> > And if you need the markdown output to generate HTML why don't you use the
> > HTML output ?
> The round-trip through pandoc does not do any miracle.
> The end result is readable to the human eye but structurally
> broken. If converted back to html, it's no longer a table.
> Anyway I tend to agree with Vik on this:
> "Markdown isn't standardized enough to support and please everyone."
> BTW github has independently started to support '|' in the cells
> by accepting the quoted version '\|' :
> https://help.github.com/articles/organizing-information-with-tables/
> Now that we have csv as an output format, we can suggest
> custom csv-to-markdown converters to produce markdown
> rather than implementing one particular flavor of markdown
> in psql, or several flavors through flags. The popular script
> languages have solid CSV parsers that make this relatively easy
> and safe.
I agree with you about the importance of CSV. On the other hand, I don't see a reason why we should not support some very popular markdown formats, although there can be a discussion about which ones: maybe GitHub, JIRA, Confluence.
Regards
Pavel
To sum up, we're at two against writing such a patch and one in favor (with some changes regarding which markdown flavor to implement).
Does anyone else want to join the vote?
Cheers,
Lætitia
On Sun, 2 Dec 2018 at 05:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I agree with you about the importance of CSV. On the other hand, I don't see a reason why we should not support some very popular markdown formats, although there can be a discussion about which ones: maybe GitHub, JIRA, Confluence.