Thread: pgsql: Fix XML tag namespace change inadvertantly missed from previous

pgsql: Fix XML tag namespace change inadvertantly missed from previous

From
adunstan@postgresql.org (Andrew Dunstan)
Date:
Log Message:
-----------
Fix XML tag namespace change inadvertently missed from previous fix. Add
regression test for XML names and numeric entities.

Modified Files:
--------------
    pgsql/src/backend/tsearch:
        wparser_def.c (r1.11 -> r1.12)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tsearch/wparser_def.c?r1=1.11&r2=1.12)
    pgsql/src/test/regress/expected:
        tsearch.out (r1.9 -> r1.10)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/tsearch.out?r1=1.9&r2=1.10)
    pgsql/src/test/regress/sql:
        tsearch.sql (r1.4 -> r1.5)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/tsearch.sql?r1=1.4&r2=1.5)

Re: pgsql: Fix XML tag namespace change inadvertantly missed from previous

From
Tom Lane
Date:
adunstan@postgresql.org (Andrew Dunstan) writes:
> Fix XML tag namespace change inadvertently missed from previous fix. Add
> regression test for XML names and numeric entities.

Still one gripe:

regression=# select * from ts_debug(' &#x3bb; &#X3BB;');
  alias  |       description        |  token  | dictionaries | dictionary | lexemes
---------+--------------------------+---------+--------------+------------+---------
 blank   | Space symbols            |         | {}           |            |
 entity  | XML entity               | &#x3bb; | {}           |            |
 blank   | Space symbols            |         | {}           |            |
 blank   | Space symbols            | &#      | {}           |            |
 numword | Word, letters and digits | X3BB    | {simple}     | simple     | {x3bb}
 blank   | Space symbols            | ;       | {}           |            |
(6 rows)

Aren't hexadecimal entities supposed to be case-insensitive?

            regards, tom lane

Re: pgsql: Fix XML tag namespace change inadvertantly missed from previous

From
Andrew Dunstan
Date:

Tom Lane wrote:
> adunstan@postgresql.org (Andrew Dunstan) writes:
>
>> Fix XML tag namespace change inadvertently missed from previous fix. Add
>> regression test for XML names and numeric entities.
>>
>
> Still one gripe:
>
> regression=# select * from ts_debug(' &#x3bb; &#X3BB;');
>   alias  |       description        |  token  | dictionaries | dictionary | lexemes
> ---------+--------------------------+---------+--------------+------------+---------
>  blank   | Space symbols            |         | {}           |            |
>  entity  | XML entity               | &#x3bb; | {}           |            |
>  blank   | Space symbols            |         | {}           |            |
>  blank   | Space symbols            | &#      | {}           |            |
>  numword | Word, letters and digits | X3BB    | {simple}     | simple     | {x3bb}
>  blank   | Space symbols            | ;       | {}           |            |
> (6 rows)
>
> Aren't hexadecimal entities supposed to be case-insensitive?
>
>
>

The 'x' must be lower case, the hex digits can be upper or lower. The
XML spec says:

    CharRef       ::=       '&#' [0-9]+ ';'
                | '&#x' [0-9a-fA-F]+ ';'
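
For illustration only, the production above can be transcribed as a regular
expression. This is a minimal Python sketch, not the code in wparser_def.c;
the names CHAR_REF_XML and is_xml_char_ref are made up for the example, and
the spec's separate "Legal Character" constraint on the resulting code point
is not checked.

```python
import re

# Direct transcription of the XML CharRef production:
#   '&#' [0-9]+ ';'  |  '&#x' [0-9a-fA-F]+ ';'
# CHAR_REF_XML is an illustrative name, not something from PostgreSQL.
CHAR_REF_XML = re.compile(r"&#(?:[0-9]+|x[0-9a-fA-F]+);")


def is_xml_char_ref(text: str) -> bool:
    """Return True if text is exactly one CharRef per the grammar above."""
    return CHAR_REF_XML.fullmatch(text) is not None
```

Under this strict reading the hex digits may be in either case, but an upper
case 'X' marker is rejected, which matches the ts_debug output quoted above.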

cheers

andrew



Re: pgsql: Fix XML tag namespace change inadvertantly missed from previous

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Aren't hexadecimal entities supposed to be case-insensitive?

> The 'x' must be lower case, the hex digits can be upper or lower. The
> XML spec says:

But we're also interested in parsing HTML, and upper case X is
allowed in HTML:
http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1

            regards, tom lane

Re: pgsql: Fix XML tag namespace change inadvertantly missed from previous

From
Andrew Dunstan
Date:

I wrote:
>
>
> Tom Lane wrote:
>>
>>
>> Aren't hexadecimal entities supposed to be case-insensitive?
>>
>>
>>
>
> The 'x' must be lower case, the hex digits can be upper or lower. The
> XML spec says:
>
>    CharRef       ::=       '&#' [0-9]+ ';'
>                | '&#x' [0-9a-fA-F]+ ';'
>
>

But I also see that the HTML spec allows for 'X' as well as 'x', so I'll
change it.
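
As a rough sketch of what the relaxed rule looks like (in Python, purely for
illustration; this is not the actual change to wparser_def.c, and the names
CHAR_REF_HTML and decode_char_ref are hypothetical), the only difference from
the XML production is that the hex marker may be 'x' or 'X':

```python
import re
from typing import Optional

# Hypothetical relaxed pattern: the XML CharRef production, except that
# the hex marker may be 'x' or 'X', as HTML 4 allows.
# Group 1 captures decimal digits, group 2 captures hex digits.
CHAR_REF_HTML = re.compile(r"&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));")


def decode_char_ref(text: str) -> Optional[str]:
    """Return the character a numeric reference names, or None if text
    is not a single numeric character reference."""
    m = CHAR_REF_HTML.fullmatch(text)
    if m is None:
        return None
    dec, hexd = m.groups()
    code = int(dec) if dec is not None else int(hexd, 16)
    # Reject values beyond the Unicode range instead of raising.
    return chr(code) if code <= 0x10FFFF else None
```

With this pattern '&#x3bb;', '&#X3BB;' and '&#955;' all name the same
character.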

cheers

andrew