Thread: UNICODE and PL/PGSQL

UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Dear all,
This works: SELECT '\x65'; => it returns the letter 'e'.
When I do the following in PL/PGSQL it returns the same letter 'e' (as might be expected);
 
CREATE OR REPLACE FUNCTION "public"."myfunction" (out result varchar) RETURNS varchar AS
$body$
DECLARE
  charset varchar := '';
BEGIN
  charset := charset || '\x65';
  result := charset;
  RETURN;
END;
$body$
LANGUAGE 'plpgsql' IMMUTABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER;
 
However, when I compose charset dynamically it doesn't work anymore. Following function returns 'x65' instead of 'e'.
Can anyone tell me why that is and how to make it work? The target is of course to change the values in the FOR control structure.
 
CREATE OR REPLACE FUNCTION "public"."myfunction" (out result varchar) RETURNS varchar AS
$body$
DECLARE
  charset varchar := '';
BEGIN
  FOR i IN 101..101 LOOP
    charset := charset || '\x' || to_hex(i);
  END LOOP;
  result := charset;
  RETURN;
END;
$body$
LANGUAGE 'plpgsql' IMMUTABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER;
 
Thanks for any help.
Bart
 

Re: UNICODE and PL/PGSQL

From
Ragnar
Date:
On mán, 2006-11-27 at 11:52 +0100, Bart Degryse wrote:
> Dear all,
> This works: SELECT '\x65'; => it returns the letter 'e'.

yes, but:
test=# select '\x'||'65';?column?
----------x65
(1 row)


> When I do the following in PL/PGSQL ....

>   FOR i IN 101..101 LOOP
>     charset := charset || '\x' || to_hex(i);
>   END LOOP;


gnari




Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Hi gnari,
I suppose your statement
test=# select '\x'||'65';
is done on some command line interface. I don't have that. I can only use some client program. I'm using EMS SQL Manager 2007 and pgAdmin III 1.3
None of them accepts your statement.
My point is that in my first version of the function (  charset := charset || '\x65';  ) a unicode hex value for the letter e is stored in charset.
When I try to do the same for a range of hex values (  FOR i IN 101..101 LOOP    charset := charset || '\x' || to_hex(i);  ) it is not longer a bunch of hex values that get stored but a series of varchars.
When  you run the first version of the function (  SELECT myfunction();  ) 'e' is returned, with the second version 'x65' is returned, while I want also 'e' to be returned.
Any ideas?


>>> Ragnar <gnari@hive.is> 2006-11-27 15:37 >>>
On mán, 2006-11-27 at 11:52 +0100, Bart Degryse wrote:
> Dear all,
> This works: SELECT '\x65'; => it returns the letter 'e'.

yes, but:
test=# select '\x'||'65';
?column?
----------
x65
(1 row)


> When I do the following in PL/PGSQL ....

>   FOR i IN 101..101 LOOP
>     charset := charset || '\x' || to_hex(i);
>   END LOOP;


gnari


Re: UNICODE and PL/PGSQL

From
Markus Schaber
Date:
Hi, Bart,

Bart Degryse wrote:

> I suppose your statement
> test=# select '\x'||'65';
> is done on some command line interface. I don't have that. I can only
> use some client program.

All versions of PostgreSQL I know are shipped with "psql" as command
line interface. (It's a client program, actually. :-)

> I'm using EMS SQL Manager 2007 and pgAdmin III 1.3
> None of them accepts your statement.

I just tried with pgAdmin III 1.4.3, and it worked fine.

> When I try to do the same for a range of hex values (  FOR i IN 101..101
> LOOP    charset := charset || '\x' || to_hex(i);  ) it is not longer a
> bunch of hex values that get stored but a series of varchars.

The problem is that the \x escaping is done in the parser, so in your
first function, the query engine actually sees"charset := charset || 'e';"

In the second function, the '\x' string is parsed as is, and converted
to the String 'x' instead of being rejected as broken \x sequence, I
think for compatibility reasons. Then, the engine sees:"charset := charset || 'x' || to_hex(i);"


Maybe you can change it to (ASCII version):"charset := charset || chr(i);"
or (256-bit version):"charset := charset || decode(to_hex(i),'hex');


HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Hi Markus,
I should have said "The ICT department doesn't give me access to the psql command line interface".
Thanks for explaining what goes wrong. I now understand the problem. It doesn't solve it though.
I haven't tried your first suggestion since ASCII won't be good enough. I also need to be able to do something like
  charset := charset || '\xC2\xA9';
but then in the dynamic form like
  charset := charset || '\xC2\x' || to_hex(i)';
So I thought your second suggestion might be of help.
I get an error though:
ERROR:  operator does not exist: character varying || bytea
HINT:  No operator matches the given name and argument type(s). You may need to add explicit type casts.
CONTEXT:  SQL statement "SELECT   $1  || decode(to_hex( $2 ), 'hex')"

>>> Markus Schaber <schabi@logix-tt.com> 2006-11-27 16:16 >>>
Hi, Bart,

Bart Degryse wrote:

> I suppose your statement
> test=# select '\x'||'65';
> is done on some command line interface. I don't have that. I can only
> use some client program.

All versions of PostgreSQL I know are shipped with "psql" as command
line interface. (It's a client program, actually. :-)

> I'm using EMS SQL Manager 2007 and pgAdmin III 1.3
> None of them accepts your statement.

I just tried with pgAdmin III 1.4.3, and it worked fine.

> When I try to do the same for a range of hex values (  FOR i IN 101..101
> LOOP    charset := charset || '\x' || to_hex(i);  ) it is not longer a
> bunch of hex values that get stored but a series of varchars.

The problem is that the \x escaping is done in the parser, so in your
first function, the query engine actually sees
"charset := charset || 'e';"

In the second function, the '\x' string is parsed as is, and converted
to the String 'x' instead of being rejected as broken \x sequence, I
think for compatibility reasons. Then, the engine sees:
"charset := charset || 'x' || to_hex(i);"


Maybe you can change it to (ASCII version):
"charset := charset || chr(i);"
or (256-bit version):
"charset := charset || decode(to_hex(i),'hex');


HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: UNICODE and PL/PGSQL

From
Ragnar
Date:
On mán, 2006-11-27 at 15:54 +0100, Bart Degryse wrote:
> Hi gnari,
> I suppose your statement
> test=# select '\x'||'65';
> is done on some command line interface. I don't have that. I can only
> use some client program. I'm using EMS SQL Manager 2007 and pgAdmin
> III 1.3
> None of them accepts your statement.
> My point is that in my first version of the function (  charset :=
> charset || '\x65';  ) a unicode hex value for the letter e is stored
> in charset.

my point was that '\x65' as a literal
is read as 'e', but '\\x' || '65' is
just the concatation of 2 2-char varchars

> When I try to do the same for a range of hex values (  FOR i IN
> 101..101 LOOP    charset := charset || '\x' || to_hex(i);  )

you might be able to do what you want with
set_byte() and encode().

CREATE OR REPLACE FUNCTION myfunction()  RETURNS varchar AS 
$x$  DECLARE x bytea := '     '; BEGIN   FOR i in 101..105 LOOP     x:=set_byte(x,i-101,i);   END LOOP;   RETURN
encode(x,''escape'');END;
 
$x$LANGUAGE 'plpgsql';

select myfunction(); 


gnari




Re: UNICODE and PL/PGSQL

From
Richard Huxton
Date:
Bart Degryse wrote:
> Hi Markus,
> I should have said "The ICT department doesn't give me access to the psql command line interface".
> Thanks for explaining what goes wrong. I now understand the problem. It doesn't solve it though.

What's wrong with the chr() function?
http://www.postgresql.org/docs/8.1/static/functions-string.html

--   Richard Huxton  Archonet Ltd


Re: UNICODE and PL/PGSQL

From
Markus Schaber
Date:
Hi, Bart,

Bart Degryse wrote:

> I should have said "The ICT department doesn't give me access to the
> psql command line interface".

Hmm, and you can't install psql on the same host you use to run pgadmin?

It's just a client, that connects to PostgreSQL the same way as pgamdin.


> ERROR:  operator does not exist: character varying || bytea
> HINT:  No operator matches the given name and argument type(s). You may
> need to add explicit type casts.
> CONTEXT:  SQL statement "SELECT   $1  || decode(to_hex( $2 ), 'hex')"

Hmm. Strange.

Does it happen in a plpgsql function, or when issuing the select as
PREPARE statement via pgadmin?

Which PostgreSQL version are you running?

Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
As the manual says : Character with the given ASCII code
I need characters outside the ASCII range. Something like
  charset := charset || '\xC2\x' || to_hex(i)';

>>> Richard Huxton <dev@archonet.com> 2006-11-27 16:53 >>>
Bart Degryse wrote:
> Hi Markus,
> I should have said "The ICT department doesn't give me access to the psql command line interface".
> Thanks for explaining what goes wrong. I now understand the problem. It doesn't solve it though.

What's wrong with the chr() function?
http://www.postgresql.org/docs/8.1/static/functions-string.html

--
   Richard Huxton
   Archonet Ltd

Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
I haven't got enough permissions to install new programs. Big brother... ;-)
I get the error in a plpgsql function
I use PostgreSQL 8.1.4 with EMS SQL Manager 2005 (3.6.0.1).

>>> Markus Schaber <schabi@logix-tt.com> 2006-11-27 16:53 >>>
Hi, Bart,

Bart Degryse wrote:

> I should have said "The ICT department doesn't give me access to the
> psql command line interface".

Hmm, and you can't install psql on the same host you use to run pgadmin?

It's just a client, that connects to PostgreSQL the same way as pgamdin.


> ERROR:  operator does not exist: character varying || bytea
> HINT:  No operator matches the given name and argument type(s). You may
> need to add explicit type casts.
> CONTEXT:  SQL statement "SELECT   $1  || decode(to_hex( $2 ), 'hex')"

Hmm. Strange.

Does it happen in a plpgsql function, or when issuing the select as
PREPARE statement via pgadmin?

Which PostgreSQL version are you running?

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: UNICODE and PL/PGSQL

From
Richard Huxton
Date:
Bart Degryse wrote:
> As the manual says : Character with the given ASCII code
> I need characters outside the ASCII range. Something like
>   charset := charset || '\xC2\x' || to_hex(i)';

Well, if you've tried it for characters > 127 and it didn't work then I 
don't know that it is possible at all. You *did* try the function?

--   Richard Huxton  Archonet Ltd


Re: UNICODE and PL/PGSQL

From
Markus Schaber
Date:
Hi, Bart,

Bart Degryse wrote:
> I haven't got enough permissions to install new programs. Big brother... ;-)

A developer doesn't get the permissions to use the tools he need for his
work?

Developers absolutely _must_ have root / admin permissions on their own
workstations, that's a prerequirement for them to fulfil their job.

Time to search a new employer, honest!

> I get the error in a plpgsql function

Ah, I just saw that the "decode" function returns bytea, not text.

I'll need a further look into this...

Sorry,
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Yes, I tried and it didn't work. PostgreSQL surely makes something of it, but not the right characters.
Unless Markus can make his idea using "decode" work, this might be something pl/pgsql cannot do.

>>> Richard Huxton <dev@archonet.com> 2006-11-27 17:34 >>>
Bart Degryse wrote:
> As the manual says : Character with the given ASCII code
> I need characters outside the ASCII range. Something like
>   charset := charset || '\xC2\x' || to_hex(i)';

Well, if you've tried it for characters > 127 and it didn't work then I
don't know that it is possible at all. You *did* try the function?

--
   Richard Huxton
   Archonet Ltd

Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Basically, I agree with your viewpoint on developer's permissions, but any annoyance can be fixed with a large enough paycheck...


>>> Markus Schaber <schabi@logix-tt.com> 2006-11-27 17:34 >>>
Hi, Bart,

Bart Degryse wrote:
> I haven't got enough permissions to install new programs. Big brother... ;-)

A developer doesn't get the permissions to use the tools he need for his
work?

Developers absolutely _must_ have root / admin permissions on their own
workstations, that's a prerequirement for them to fulfil their job.

Time to search a new employer, honest!

> I get the error in a plpgsql function

Ah, I just saw that the "decode" function returns bytea, not text.

I'll need a further look into this...

Sorry,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: UNICODE and PL/PGSQL

From
Ragnar
Date:
On þri, 2006-11-28 at 09:14 +0100, Bart Degryse wrote:
> Yes, I tried and it didn't work. PostgreSQL surely makes something of
> it, but not the right characters.

maybe you should show us exacly what you did, and what
you got, and what you expected

> Unless Markus can make his idea using "decode" work, this might be
> something pl/pgsql cannot do.

did you try the function I posted yesterday elsewhere
in this thread?

gnari




Re: UNICODE and PL/PGSQL

From
"Bart Degryse"
Date:
Due to a lack of time, I'm closing the thread. It takes a little more code, but I've decided to just forget about the loop and write every single statement in my function.
Thanks anyway for explaning and brainstorming.

>>> Ragnar <gnari@hive.is> 2006-11-28 10:28 >>>
On þri, 2006-11-28 at 09:14 +0100, Bart Degryse wrote:
> Yes, I tried and it didn't work. PostgreSQL surely makes something of
> it, but not the right characters.

maybe you should show us exacly what you did, and what
you got, and what you expected

> Unless Markus can make his idea using "decode" work, this might be
> something pl/pgsql cannot do.

did you try the function I posted yesterday elsewhere
in this thread?

gnari