Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki - Mailing list pgsql-hackers

From Jan Michálek
Subject Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
Date
Msg-id CAAYBy8bMBZ2Modd-RyME2YOtFodHaNbc_N7Sy=HnqLS2EkfOYw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
List pgsql-hackers
I have`t any new version recently, because i was waiting for new version. But I will have some time on this in next months. How many time I have to next freeze?

Thanks Je;

2017-09-13 1:14 GMT+02:00 Daniel Gustafsson <daniel@yesql.se>:
> On 08 May 2017, at 12:02, Fabien COELHO <fabien.coelho@mines-paristech.fr> wrote:
>
> Hello Jan,
>
> Please give a number to submitted patches. I think that this was v3.
>
> The patch does NOT fix various issues I pointed out in my previous review:
>
> - tabs introduced in "doc/src/sgml/ref/psql-ref.sgml"
> - too long help line in "src/bin/psql/help.c"
> - spurious space after a comma in "src/fe_utils/print.c"
>   and possibly elsewhere.
>
> On Sun, 23 Apr 2017, Jan Michálek wrote:
>
>>>> Markdown include characters/sequences which are interpreted as markers:
>>>> _Italic_, **Bold**, *** => horizontal rules, > block quote... `inline
>>>> code`... If they are interpreted within a table cell then probably they
>>>> should be escaped somehow.
>>
>> I have treated "_*|<>"
>
> Probably not enough, see below. Note the escaping chars should also be escaped.
>
>>> I`m able to sanitize characters, but complex sequences will be problem. I
>>> will look on this, but I don`t know, if I`m able to do this.
>
> I do not know whether only those are necessary. Have you checked? Guessing is probably not the right approach.
>
>
> Concerning MARKDOWN, and according to the following source about github markdown implementation:
>
>       https://enterprise.github.com/downloads/en/markdown-cheatsheet.pdf
>
> The following characters may need to be backslash escaped, although it does not cover HTML stuff.
>
>       \   backslash
>       `   backtick
>       *   asterisk
>       _   underscore
>       {}  curly braces
>       []  square brackets
>       ()  parentheses
>       #   hash mark
>       +   plus sign
>       -   minus sign (hyphen)
>       .   dot
>       !   exclamation
>
> Another source https://genius.com/3057216 suggests (* # / ( ) [ ] < >),
> which should protect HTML.
>
> However, the escaping seems to be the backslash character, NOT using html encoding &lt; as done in your version.
>
> Where did you find the precise escaping rules for markdown? I do not think that it should be invented...
>
>
> I have looked at RST, according to this reference:
>
>       http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#grid-tables
>
> The good news is that you do not need to handle a special | case because you would only produce clean tables.
>
> I've tested UTF-8 with plane 1 (你好!) and plane 2 (𠀡) and the alignment seems to worked well, incredible!
>
>>> My main interest on this was in rst. I`m using markdown only in github issues and my knowldge about md is poor.
>
> Then maybe only do RST?!
>
> It looks much simpler anyway, and if you do MARKDOWN the support needs to be clean.
>
> About the code:
>
> I'm still at odds with the code which needs to test for markdown to call for different functions in multiple places. If you keep md and in order to avoid that, I would suggest to extend the pg_wcs* functions with a list of caracters which may have different sizes with additionnal args, say:
>
>  pg_wcssize(// same args, plus:
>             char * escaped_chars, // will require escaping
>             int escape_len, // how many chars added when escaping
>             int nllen // len of newline if substituted
>             );
>
> So that pg_wcssize(..., "\r", 1, 1) would behave as before (\n and \t are rather special cases), and the various constants could be held in the format description so the whole thing would be parametric.
>
> Same approach with format.
>
> That would allow to simplify the code significantly and to share it between MARKDOWN and others. Also, the multiple "else if" list would be simplified by using strchr or the escaped_chars string.

This patch was moved into the current Commitfest marked “Waiting for author”
with the above review.  Have you had a chance to work on it addressing the
review comments such that we can expect a new version within this CF?

cheers ./daniel



--
Jelen
Starší čeledín datovýho chlíva

pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL
Next
From: Daniel Gustafsson
Date:
Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki