Lexing with different charsets - Mailing list pgsql-hackers

From: Dennis Bjorklund
Subject: Lexing with different charsets
Msg-id: Pine.LNX.4.44.0404131843530.4551-100000@zigo.dhs.org
Responses: Re: Lexing with different charsets
           Re: Lexing with different charsets
           Re: Lexing with different charsets
List: pgsql-hackers
I've spent some more time reading the specs today. Together with Peter E's
explanation (thanks!), I think I now have a fairly good understanding of
the parts that deal with locales.

My next question is about lexing. The spec says that a query can contain
string literals in different charsets, like:
 ... WHERE field1 = _latin1'FooBar' and field2 = _utf8'Åäö'

As far as I can see, either the lexer has to be taught about all the
different charsets, or this is not going to work very well.
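To illustrate what "teaching the lexer about charsets" might involve, here is a minimal Python sketch. It is hypothetical, not PostgreSQL code: `decode_literals` is a made-up helper, Python's codec names `latin1` and `utf8` stand in for the SQL charset names, and it assumes the introducer name and both quotes are single ASCII bytes while each literal body is encoded in the charset its introducer names. The first literal is changed from 'FooBar' to 'Åäö' so that its latin1 bytes actually differ from utf-8.

```python
import re

# Hypothetical introducer syntax: _<charset>'<body>'. The charset name and the
# quotes are assumed to be ASCII bytes; doubled quotes ('') are not handled.
# The body must not contain a 0x27 byte, which holds for latin1 and utf-8.
INTRODUCED_LITERAL = re.compile(rb"_([A-Za-z0-9]+)'([^']*)'")


def decode_literals(query: bytes) -> list[tuple[str, str]]:
    """Return (charset, text) for each introduced literal, decoding each
    body with the codec named by its own introducer."""
    out = []
    for m in INTRODUCED_LITERAL.finditer(query):
        charset = m.group(1).decode("ascii")
        out.append((charset, m.group(2).decode(charset)))
    return out


# The query text is ASCII; each literal body is encoded as its introducer says.
query = (b"SELECT * FROM t WHERE field1 = _latin1'" + "Åäö".encode("latin1")
         + b"' AND field2 = _utf8'" + "Åäö".encode("utf-8") + b"'")

print(decode_literals(query))  # [('latin1', 'Åäö'), ('utf8', 'Åäö')]

# The raw query as a whole is no longer valid utf-8.
try:
    query.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid utf-8:", e.reason)
```

Since the query as a whole does not decode as utf-8, any validation would have to happen per literal, after the lexer has found each introducer.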

If one wants to include a utf-16 string in the query, the lexer cannot
handle it without understanding utf-16. The query itself can also be in
different charsets. If it's in utf-8, for example, then we cannot embed
latin1 strings and still have a query that validates as utf-8. With the
above, we can no longer think of the query as being in a single charset.
That's strange, but okay I guess.
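The utf-16 problem can be shown with a similarly hypothetical sketch of a byte-oriented lexer that ends a literal at the next 0x27 byte. It assumes the quotes are single ASCII bytes while the body is raw UTF-16 (big-endian here); that is only one possible reading, since how the delimiters themselves would be encoded is part of what is unclear. `naive_literal_body` is a made-up name. The check mark U+2713 encodes as the bytes 0x27 0x13 in UTF-16BE, so the scan stops inside the character.

```python
def naive_literal_body(buf: bytes, open_quote: int) -> bytes:
    """Byte-oriented lexer step: the literal ends at the next 0x27 byte."""
    close = buf.index(b"'", open_quote + 1)
    return buf[open_quote + 1:close]


text = "\u2713"                    # CHECK MARK
body = text.encode("utf-16-be")    # b"\x27\x13": the first byte is 0x27, i.e. "'"
query = b"SELECT _utf16'" + body + b"'"

open_quote = query.index(b"'")
print(naive_literal_body(query, open_quote))  # b''; it stopped inside the character
```

Plain ASCII text has the same issue in the other direction: every ASCII character in UTF-16 carries a 0x00 byte, so a lexer that only sees bytes cannot tell where a utf-16 literal ends without decoding it.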

The new wire protocol allows us to send data separately from the query,
which is nice, but the standard talks about literals embedded in the query
text as above, so it's not a solution to the problem.

Maybe I should have addressed this to Peter directly :-)

-- 
/Dennis Björklund


