Thread: Using psql to insert character codes

Using psql to insert character codes

From
Ian Barwick
Date:
Say using psql I wish to insert a character into a VARCHAR / TEXT / whatever
column using the its hexadecimal representation in the relevant character set
/ encoding. E.g.: C3A4, which represents the character 'ä' (lower case a with
umlaut) in UTF-8.

I can do this:
INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))

Is there any other, shorter way of doing the same?


Ian Barwick
barwick@gmx.net



Re: Using psql to insert character codes

From
Randall Lucas
Date:
Yes, use a Mac!

I'm not being entirely facetious -- if you can use a Mac OS X terminal
prompt (and if your pg config is substantially similar to mine),
inserting any unicode stuff is quite easy.  Simply typing or
cutting-and-pasting at the terminal let me visually put in accented
Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
sinoignoramus) no sweat.

Also, Java works well for this -- if you can write or use a Java app to
take advantage of its I18N, it will handle the encoding issues via JDBC
quite nicely as far as I can tell.

Best,

Randall

On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:

>
> Say using psql I wish to insert a character into a VARCHAR / TEXT /
> whatever
> column using the its hexadecimal representation in the relevant
> character set
> / encoding. E.g.: C3A4, which represents the character 'ä' (lower case
> a with
> umlaut) in UTF-8.
>
> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
> 'escape'))
>
> Is there any other, shorter way of doing the same?
>
>
> Ian Barwick
> barwick@gmx.net
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
> majordomo@postgresql.org)
>



Re: Using psql to insert character codes

From
Ian Barwick
Date:
On Saturday 10 May 2003 18:15, Randall Lucas wrote:
> Yes, use a Mac!
>
> I'm not being entirely facetious -- if you can use a Mac OS X terminal
> prompt (and if your pg config is substantially similar to mine),
> inserting any unicode stuff is quite easy.  Simply typing or
> cutting-and-pasting at the terminal let me visually put in accented
> Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
> sinoignoramus) no sweat.

Thanks, but right answer to wrong question ;-). "mlterm" in Linux / FreeBSD
works fine for me; the "problem" is not inputting in general, but the
ability to use psql to input using the hexadecimal (or decimal) character
code, which I need to do occasionally for some obscure character not
reachable by other means.

The solution I posted works, I was just wondering whether there was some more
elegant method I had overlooked.

Ian Barwick
barwick@gmx.net

> On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:
> > Say using psql I wish to insert a character into a VARCHAR / TEXT /
> > whatever
> > column using the its hexadecimal representation in the relevant
> > character set
> > / encoding. E.g.: C3A4, which represents the character 'ä' (lower case
> > a with
> > umlaut) in UTF-8.
> >
> > I can do this:
> > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
> > 'escape'))
> >
> > Is there any other, shorter way of doing the same?
> >
> >
> > Ian Barwick
> > barwick@gmx.net
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> >     (send "unregister YourEmailAddressHere" to
> > majordomo@postgresql.org)



Re: Using psql to insert character codes

From
Ian Barwick
Date:
On Saturday 10 May 2003 16:20, Ian Barwick wrote:
> Say using psql I wish to insert a character into a VARCHAR / TEXT /
> whatever column using the its hexadecimal representation in the relevant
> character set / encoding. E.g.: C3A4, which represents the character 'ä'
> (lower case a with umlaut) in UTF-8.
>
> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))

correction:
INSERT INTO my_tbl (unitxt) VALUES(encode(decode('c3a4','hex'), 'escape'))

Ian Barwick
barwick@gmx.net



Re: Using psql to insert character codes

From
Randall Lucas
Date:
Ah, I see.  </mac_over_enthusiasm>  In that case, I would recommend
just making a custom function that handles that for you, maybe:

create function hex2esc(text) returns text as '
select encode(decode($1,''hex''), ''escape'');
' language sql;

Not a whole lot more elegant but shorter.

Best,

Randall

On Saturday, May 10, 2003, at 12:53 PM, Ian Barwick wrote:

> On Saturday 10 May 2003 18:15, Randall Lucas wrote:
>> Yes, use a Mac!
>>
>> I'm not being entirely facetious -- if you can use a Mac OS X terminal
>> prompt (and if your pg config is substantially similar to mine),
>> inserting any unicode stuff is quite easy.  Simply typing or
>> cutting-and-pasting at the terminal let me visually put in accented
>> Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
>> sinoignoramus) no sweat.
>
> Thanks, but right answer to wrong question ;-). "mlterm" in Linux /
> FreeBSD
> works fine for me; the "problem" is not inputting in general, but the
> ability to use psql to input using the hexadecimal (or decimal)
> character
> code, which I need to do occasionally for some obscure character not
> reachable by other means.
>
> The solution I posted works, I was just wondering whether there was
> some more
> elegant method I had overlooked.
>
> Ian Barwick
> barwick@gmx.net
>
>> On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:
>>> Say using psql I wish to insert a character into a VARCHAR / TEXT /
>>> whatever
>>> column using the its hexadecimal representation in the relevant
>>> character set
>>> / encoding. E.g.: C3A4, which represents the character 'ä' (lower
>>> case
>>> a with
>>> umlaut) in UTF-8.
>>>
>>> I can do this:
>>> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
>>> 'escape'))
>>>
>>> Is there any other, shorter way of doing the same?
>>>
>>>
>>> Ian Barwick
>>> barwick@gmx.net
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>>     (send "unregister YourEmailAddressHere" to
>>> majordomo@postgresql.org)
>



merge-join for domain with underlying type text

From
Randall Lucas
Date:
Hi Folks,

If I have a domain defined with an underlying type of "text" or 
"varchar," and I need to FULL OUTER JOIN two tables based upon the 
similarity in these fields, I am getting:

ERROR:  FULL JOIN is only supported with mergejoinable join conditions

I have looked at the docs and it seems that it's a matter of making the 
comparison operators for the domain type (as per SQL Reference :: 
"CREATE OPERATOR" section).  However, is there anything tricky about 
adding the merge-join capability to a domain?  Is there a preferred 
method?  Am I barking entirely up the wrong tree (could I, for example, 
use a casting trick?  Simply recasting the fields as type "text" 
doesn't avoid the error)?

Postgres version is 7.3.

Best,

Randall



Re: merge-join for domain with underlying type text

From
Rod Taylor
Date:
On Sat, 2003-05-10 at 17:04, Randall Lucas wrote:
> Hi Folks,
>
> If I have a domain defined with an underlying type of "text" or
> "varchar," and I need to FULL OUTER JOIN two tables based upon the
> similarity in these fields, I am getting:

Seems it works perfectly fine with integers and text based domains on
7.3 and 7.4.  Do you have a full failing example?

junk=# create domain int as integer;
CREATE DOMAIN
junk=# create table a (col1 int);
CREATE TABLE
junk=# create table b (col1 int);
CREATE TABLE
junk=# select * from a full outer join b using (col1);col1
------
(0 rows)
junk=# select version();                               version
------------------------------------------------------------------------PostgreSQL 7.4devel on i386-unknown-freebsd4.8,
compiledby GCC 2.95.4 
(1 row)

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: merge-join for domain with underlying type text

From
Randall Lucas
Date:
-- Hmm, here's some code that manifests the problem:

-- using text itself:
create table text_test_1 ( key text, stuff varchar
);
insert into text_test_1 values ('alpha', 'asdflkjasdf');
insert into text_test_1 values ('bravo', 'asdfdasfsaff');
insert into text_test_1 values ('delta', 'asdfasfdas');
create table text_test_2 ( other_key text, more_stuff varchar
);
insert into text_test_2 values ('charlie', 'asdfasfasfda');
insert into text_test_2 values ('delta', 'asgasgdda');
insert into text_test_2 values ('echo', 'asdasfsdfsfda');
select * from text_test_1 full outer join text_test_2 on  
text_test_1.key = text_test_2.other_key;

-- Works OK, BUT:

-- using domain textual:
create domain textual_test as text;
create table textual_test_1 ( key textual_test, stuff varchar
);
insert into textual_test_1 values ('alpha', 'asdflkjasdf');
insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
insert into textual_test_1 values ('delta', 'asdfasfdas');
create table textual_test_2 ( other_key textual_test, more_stuff varchar
);
insert into textual_test_2 values ('charlie', 'asdfasfasfda');
insert into textual_test_2 values ('delta', 'asgasgdda');
insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
select * from textual_test_1 full outer join textual_test_2 on  
textual_test_1.key = textual_test_2.other_key;

-- Will give ERROR:  FULL JOIN is only supported with mergejoinable  
join conditions

-- clean up
drop table text_test_1;
drop table text_test_2;
drop table textual_test_1;
drop table textual_test_2;
drop domain textual_test;

-- Best,
--
-- Randall

On Saturday, May 10, 2003, at 06:13 PM, Rod Taylor wrote:

> On Sat, 2003-05-10 at 17:04, Randall Lucas wrote:
>> Hi Folks,
>>
>> If I have a domain defined with an underlying type of "text" or
>> "varchar," and I need to FULL OUTER JOIN two tables based upon the
>> similarity in these fields, I am getting:
>
> Seems it works perfectly fine with integers and text based domains on
> 7.3 and 7.4.  Do you have a full failing example?
>
> junk=# create domain int as integer;
> CREATE DOMAIN
> junk=# create table a (col1 int);
> CREATE TABLE
> junk=# create table b (col1 int);
> CREATE TABLE
> junk=# select * from a full outer join b using (col1);
>  col1
> ------
> (0 rows)
>
> junk=# select version();
>                                 version
> ----------------------------------------------------------------------- 
> -
>  PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiled by GCC 2.95.4
> (1 row)
>
> -- 
> Rod Taylor <rbt@rbt.ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
> <signature.asc>



Re: merge-join for domain with underlying type text

From
Rod Taylor
Date:
On Sat, 2003-05-10 at 19:40, Randall Lucas wrote:
> -- Hmm, here's some code that manifests the problem:

Seems it's been fixed in 7.4.  In the mean time, you might try this:

select * from textual_test_1 full outer join textual_test_2 on
CAST(textual_test_1.key AS text) = CAST(textual_test_2.other_key AS
text);

-- LOG --
h=# create domain textual_test as text;
CREATE DOMAIN
h=# create table textual_test_1 (
h(#  key textual_test,
h(#  stuff varchar
h(# );
CREATE TABLE
h=# insert into textual_test_1 values ('alpha', 'asdflkjasdf');
INSERT 154456 1
h=# insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
INSERT 154457 1
h=# insert into textual_test_1 values ('delta', 'asdfasfdas');
INSERT 154458 1
h=# create table textual_test_2 (
h(#  other_key textual_test,
h(#  more_stuff varchar
h(# );
CREATE TABLE
h=# insert into textual_test_2 values ('charlie', 'asdfasfasfda');
INSERT 154464 1
h=# insert into textual_test_2 values ('delta', 'asgasgdda');
INSERT 154465 1
h=# insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
INSERT 154466 1
h=# select * from textual_test_1 full outer join textual_test_2 on
textual_test_1.key = textual_test_2.other_key; key  |    stuff     | other_key |  more_stuff
-------+--------------+-----------+---------------alpha | asdflkjasdf  |           |bravo | asdfdasfsaff |           |
   |              | charlie   | asdfasfasfdadelta | asdfasfdas   | delta     | asgasgdda      |              | echo
| asdasfsdfsfda 
(5 rows)
h=# select version();                               version
------------------------------------------------------------------------PostgreSQL 7.4devel on i386-unknown-freebsd4.8,
compiledby GCC 2.95.4 
(1 row)

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: merge-join for domain with underlying type text

From
Randall Lucas
Date:
OK, I ended up wrapping up the text equality and comparison functions 
in simple sql functions that explicitly take my domain as the type, 
then added =, <>, <, <=, >, >= operators pointing to those functions, 
and now I can perform full outer joins with my domain "literal."

See below for ugly code.

Hackers, why is this?  It seems to me that since everything else (or 
everything I've run into so far, up to the full outer joins) is 
implicitly the same for a simple domain and its underlying base type, 
that it would make sense if this, too, Just Worked.  Perhaps we could 
have the create domain function implicitly perform the operator mapping 
to underlying basetypes to permit merge joins?

Best,

Randall

-- begin ugly code:

create domain literal as text;

create or replace function literaleq(literal, literal) returns boolean 
as ' select texteq($1::text, $2::text);
' language sql;

create or replace function literalne(literal, literal) returns boolean 
as ' select textne($1::text, $2::text);
' language sql;

create or replace function literal_lt(literal, literal) returns boolean 
as ' select text_lt($1::text, $2::text);
' language sql;

create or replace function literal_le(literal, literal) returns boolean 
as ' select text_le($1::text, $2::text);
' language sql;

create or replace function literal_gt(literal, literal) returns boolean 
as ' select text_gt($1::text, $2::text);
' language sql;

create or replace function literal_ge(literal, literal) returns boolean 
as ' select text_ge($1::text, $2::text);
' language sql;

create operator < ( leftarg = literal, rightarg = literal, procedure = literal_lt, commutator = >, negator = >=,
restrict= scalarltsel, join = scalarltjoinsel
 
);
create operator <= ( leftarg = literal, rightarg = literal, procedure = literal_le, commutator = >=, negator = >,
restrict= scalarltsel, join = scalarltjoinsel
 
);
create operator > ( leftarg = literal, rightarg = literal, procedure = literal_gt, commutator = <, negator = <=,
restrict= scalargtsel, join = scalargtjoinsel
 
);
create operator >= ( leftarg = literal, rightarg = literal, procedure = literal_ge, commutator = <=, negator = <,
restrict= scalargtsel, join = scalargtjoinsel
 
);


create operator = ( leftarg = literal, rightarg = literal, procedure = literaleq, commutator = =, negator = <>,
restrict= eqsel, join = eqjoinsel, hashes, sort1 = <, sort2 = <
 
);

create operator <> ( leftarg = literal, rightarg = literal, procedure = literaleq, commutator = <>, negator = =,
restrict= neqsel, join = neqjoinsel, hashes, sort1 = <, sort2 = <
 
);





On Saturday, May 10, 2003, at 07:40 PM, Randall Lucas wrote:

> create table textual_test_1 (
>  key textual_test,
>  stuff varchar
> );
> insert into textual_test_1 values ('alpha', 'asdflkjasdf');
> insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
> insert into textual_test_1 values ('delta', 'asdfasfdas');
> create table textual_test_2 (
>  other_key textual_test,
>  more_stuff varchar
> );
> insert into textual_test_2 values ('charlie', 'asdfasfasfda');
> insert into textual_test_2 values ('delta', 'asgasgdda');
> insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
> select * from textual_test_1 full outer join textual_test_2 on 
> textual_test_1.key = textual_test_2.other_key;



Re: merge-join for domain with underlying type text

From
Tom Lane
Date:
Randall Lucas <rlucas@tercent.net> writes:
> Hackers, why is this?  It seems to me that since everything else (or 
> everything I've run into so far, up to the full outer joins) is 
> implicitly the same for a simple domain and its underlying base type, 
> that it would make sense if this, too, Just Worked.

It does Just Work ... in CVS tip.  I can't make it magically work in
7.3 --- at least not without back-patching a lot of stuff that hasn't
been through beta-testing yet.

The problem is that mergejoin in existing releases will only work on
plain "Var = Var" clauses.  Your domain case doesn't look like that
because of the runtime cast operators.
        regards, tom lane



Re: Using psql to insert character codes

From
Peter Eisentraut
Date:
Ian Barwick writes:

> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))
>
> Is there any other, shorter way of doing the same?

Use octal escapes: '\303\244';

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Using psql to insert character codes

From
Ian Barwick
Date:
On Sunday 11 May 2003 18:54, Peter Eisentraut wrote:
> Ian Barwick writes:
> > I can do this:
> > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))
> >
> > Is there any other, shorter way of doing the same?
>
> Use octal escapes: '\303\244';

Aha, thanks.

That seems to imply though that this statement in the docs:
"...where xxx is an octal number, is the character with the corresponding
ASCII code."

http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=sql-syntax.html

is not completely correct. Maybe it should read something like:

"...where xxx is an octal number which corresponds to a valid character in the
database encoding."


Ian Barwick
barwick@gmx.net


Re: Using psql to insert character codes

From
Peter Eisentraut
Date:
Ian Barwick writes:

> That seems to imply though that this statement in the docs:
> "...where xxx is an octal number, is the character with the corresponding
> ASCII code."
> is not completely correct. Maybe it should read something like:
>
> "...where xxx is an octal number which corresponds to a valid character in the
> database encoding."

What it actually does is insert exactly one byte of the given value. The
problem is that you can easily construct invalid code sequences that way,
and I that should not be allowable.

What I would like better is if \xxx took xxx as a code point and converted
it to the appropriately encoded byte sequence.  For single-byte encodings
that would mean no change, for Unicode it would make a lot more sense than
what it does now, but I don't know if that concept can be applied to all
character set encodings.

-- 
Peter Eisentraut   peter_e@gmx.net