Re: fulltext parser strange behave - Mailing list pgsql-hackers

From Tom Lane
Subject Re: fulltext parser strange behave
Date
Msg-id 13471.1194634433@sss.pgh.pa.us
In response to Re: fulltext parser strange behave  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: fulltext parser strange behave  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> I've just been looking at the state machine in wparser_def.c. I think 
> the processing for entities is also a few bob short in the pound. It 
> recognises decimal numeric character references, but not hexadecimal 
> numeric character references. That's fairly silly since the HTML spec 
> specifically says the latter are "particularly useful". The rules for 
> named entities are also deficient w.r.t. digits, just like the case of 
> tags that Tom noticed. This isn't academic: HTML features a number of 
> named entities with digits in the name (sup2, frac14 for example).

> In XML at least, legal names are defined by the following rules from the 
> spec:
> ...
> [A-Za-z:_][A-Za-z0-9:_.-]*

> I suggest we use that or something very close to it as the rule for 
> names in these patterns.

No objections here.  Who wants to patch wparser_def?
        regards, tom lane
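
For illustration, the name rule quoted above, together with decimal and hexadecimal numeric character references, can be sketched as a small standalone matcher. This is not the wparser_def.c state machine and not a proposed patch: the function names (match_name, match_entity) are hypothetical, and the sketch covers only the ASCII pattern given in the message, whereas the full XML name production also admits non-ASCII characters.

```c
#include <stddef.h>

/* Hypothetical helpers, not taken from wparser_def.c. */

/* First character of a name: [A-Za-z:_] */
int is_name_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == ':' || c == '_';
}

/* Any later character of a name: [A-Za-z0-9:_.-] */
int is_name_char(unsigned char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9') ||
           c == '.' || c == '-';
}

int is_hex_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Length of the name at the start of s, or 0 if s does not start a name. */
size_t match_name(const char *s)
{
    size_t n;

    if (!is_name_start((unsigned char) s[0]))
        return 0;
    n = 1;
    while (is_name_char((unsigned char) s[n]))
        n++;
    return n;
}

/*
 * Length of a reference at the start of s, including '&' and ';',
 * or 0 if there is none. Accepts "&name;", "&#123;" and "&#x7B;".
 */
size_t match_entity(const char *s)
{
    size_t n;
    size_t digits = 0;

    if (s[0] != '&')
        return 0;

    if (s[1] == '#')
    {
        n = 2;
        if (s[n] == 'x' || s[n] == 'X')
        {
            /* hexadecimal numeric character reference */
            n++;
            while (is_hex_digit((unsigned char) s[n]))
            {
                n++;
                digits++;
            }
        }
        else
        {
            /* decimal numeric character reference */
            while (s[n] >= '0' && s[n] <= '9')
            {
                n++;
                digits++;
            }
        }
        if (digits == 0)
            return 0;
    }
    else
    {
        /* named entity, digits allowed after the first character */
        size_t len = match_name(s + 1);

        if (len == 0)
            return 0;
        n = 1 + len;
    }

    /* a reference must be terminated by ';' */
    return (s[n] == ';') ? n + 1 : 0;
}
```

With this rule, names containing digits such as sup2 and frac14 are accepted in full, while a name may still not begin with a digit.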

