Re: fulltext parser strange behave - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: fulltext parser strange behave
Date
Msg-id 4739FE1A.3090508@dunslane.net
In response to Re: fulltext parser strange behave  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> I've just been looking at the state machine in wparser_def.c. I think 
>> the processing for entities is also a few bob short in the pound. It 
>> recognises decimal numeric character references, but not hexadecimal 
>> numeric character references. That's fairly silly since the HTML spec 
>> specifically says the latter are "particularly useful". The rules for 
>> named entities are also deficient w.r.t. digits, just like the case of 
>> tags that Tom noticed. This isn't academic: HTML features a number of 
>> named entities with digits in the name (sup2, frac14 for example).
>>
>> In XML at least, legal names are defined by the following rules from the 
>> spec:
>> ...
>> [A-Za-z:_][A-Za-z0-9:_.-]*
>>
>> I suggest we use that or something very close to it as the rule for 
>> names in these patterns.
>
> No objections here.  Who wants to patch wparser_def?
I can get to it some time in the next week; I'm rather snowed under right now.
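
For reference, the name rule quoted above can be sketched as a regular expression together with the three entity forms under discussion. This is only an illustration in Python, not the state machine in wparser_def.c; NAME, ENTITY_RE and is_entity are hypothetical names, and the pattern is the ASCII approximation of the XML Name production given in the quoted text.

```python
import re

# Name rule from the quoted message (ASCII-only approximation of XML Name).
NAME = r"[A-Za-z:_][A-Za-z0-9:_.-]*"

# Hypothetical pattern for illustration; not taken from wparser_def.c.
ENTITY_RE = re.compile(
    r"&(?:"
    r"#[0-9]+"               # decimal numeric character reference, e.g. &#169;
    r"|#[xX][0-9A-Fa-f]+"    # hexadecimal numeric character reference, e.g. &#xA9;
    r"|" + NAME +            # named entity; digits allowed after the first character
    r");"
)


def is_entity(text):
    """Return True if text is exactly one entity or character reference."""
    return ENTITY_RE.fullmatch(text) is not None
```

Under this sketch, names such as sup2 and frac14 are accepted because digits are allowed after the first character, while a name starting with a digit is rejected.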

BTW, I'm also suspicious of the clause that allows <?xml ...; it appears 
that it will also allow <?xfoo and <?XFOO, which seems quite odd, 
especially the latter.
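
To make the concern concrete, here is a rough Python sketch contrasting a loose rule with a stricter one. LOOSE_RE is only a stand-in for the behaviour described above (an 'x' or 'X' followed by any letters) and is an assumption, not a transcription of the actual parser states; STRICT_RE and accepts are likewise hypothetical names.

```python
import re

# Assumed stand-in for the suspicious clause: 'x' or 'X', then any letters.
LOOSE_RE = re.compile(r"<\?[xX][A-Za-z]*")

# Stricter alternative: exactly lowercase "xml", then whitespace or "?>".
STRICT_RE = re.compile(r"<\?xml(?=\s|\?>)")


def accepts(pattern, text):
    """Return True if pattern matches at the start of text."""
    return pattern.match(text) is not None
```

With these definitions the loose rule accepts <?xfoo and <?XFOO, while the strict rule accepts only <?xml followed by whitespace or the closing ?>.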

cheers

andrew

