Re: Doc: typo in config.sgml - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Doc: typo in config.sgml
Msg-id 20241008.090317.1416798526131173425.ishii@postgresql.org
In response to Doc: typo in config.sgml  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Doc: typo in config.sgml
List pgsql-hackers
> On Tue, 1 Oct 2024 22:20:55 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
>> On Tue, 1 Oct 2024 15:16:52 +0900
>> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>> 
>> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
>> > Tatsuo Ishii <ishii@postgresql.org> wrote:
>> > 
>> > > >> That's because a non-breaking space (nbsp) is not encoded as 0xa0 in
>> > > >> UTF-8. In UTF-8, nbsp is "0xc2 0xa0" (2 bytes); 0xa0 is nbsp's code
>> > > >> point in Unicode, i.e. U+00A0.
>> > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>> > > > 
>> > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment. 
>> > > > ([ and ] were not necessary.)
>> > > > 
>> > > > When LC_ALL is not set, `grep -P "\xA0"` could not detect any characters in charset.sgml,
>> > > > but I think it is better to specify both LC_ALL=C and "\xC2\xA0" to make sure
>> > > > nbsp is detected.
>> > > > 
>> > > > One problem is that the -P option is available only in GNU grep; grep on macOS doesn't support it.
>> > > > 
>> > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume the shell is bash.
>> > > > 
>> > > > Maybe a better way is to use perl itself rather than grep, as follows:
>> > > > 
>> > > >  `perl -ne '/\xC2\xA0/ and print' `
>> > > > 
>> > > > I attached a patch fixed in this way.
>> > > 
>> > > GNU sed can also be used without setting LC_ALL:
>> > > 
>> > > sed -n /"\xC2\xA0"/p
>> > > 
>> > > However, I am not sure whether non-GNU sed can do this too...
>> > 
>> > Although I haven't checked it myself, BSD sed doesn't support \x escapes according to [1].
>> > 
>> > [1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>> > 
>> > By the way, I've attached a slightly modified patch that uses the plural form in the message,
>> > the same as check-tabs:
>> > 
>> >  Non-breaking **spaces** appear in SGML/XML files
>> 
>> The previous patch was broken because the perl command failed to return the correct result.
>> I've attached an updated patch to fix the return value. In passing, I added line breaks
>> for long lines.
> 
> I've attached an updated patch.
> I added the comment to explain why Perl is used instead of grep or sed.
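
For reference, the encoding point made earlier in the thread can be checked directly: U+00A0 is the two bytes 0xC2 0xA0 in UTF-8. Below is a minimal POSIX sh sketch of an alternative that was not used in the patch. It builds those bytes with printf octal escapes, so it needs neither grep -P (GNU only) nor bash's $'...' quoting. The function name find_nbsp is hypothetical, and portability to BSD/macOS grep is an assumption that I have not verified on every platform.

```shell
# Build U+00A0 (NO-BREAK SPACE) as its UTF-8 bytes 0xC2 0xA0.
# printf octal escapes are POSIX, so this does not require bash.
nbsp=$(printf '\302\240')

# Show the raw bytes in hex (od -An drops the offset column).
printf '%s' "$nbsp" | od -An -tx1

# Hypothetical helper: print matching lines with line numbers.
# LC_ALL=C makes grep compare raw bytes independent of the user's locale.
# Exit status follows grep: 0 if any line matched, 1 if none did.
find_nbsp() {
    LC_ALL=C grep -n "$nbsp" "$@"
}
```

Because grep already exits 1 when nothing matches, a caller would need to invert the result (e.g. `if find_nbsp *.sgml; then ... fi`) to fail a check when nbsp is found.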

Looks good to me. If there's no objection, I will commit this to
the master branch.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
