Thread: Using psql to insert character codes
Say using psql I wish to insert a character into a VARCHAR / TEXT / whatever column using the its hexadecimal representation in the relevant character set / encoding. E.g.: C3A4, which represents the character 'ä' (lower case a with umlaut) in UTF-8. I can do this: INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape')) Is there any other, shorter way of doing the same? Ian Barwick barwick@gmx.net
Yes, use a Mac! I'm not being entirely facetious -- if you can use a Mac OS X terminal prompt (and if your pg config is substantially similar to mine), inserting any unicode stuff is quite easy. Simply typing or cutting-and-pasting at the terminal let me visually put in accented Latin, Cyrillic, and Chinese (don't ask me what kind, I am a sinoignoramus) no sweat. Also, Java works well for this -- if you can write or use a Java app to take advantage of its I18N, it will handle the encoding issues via JDBC quite nicely as far as I can tell. Best, Randall On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote: > > Say using psql I wish to insert a character into a VARCHAR / TEXT / > whatever > column using the its hexadecimal representation in the relevant > character set > / encoding. E.g.: C3A4, which represents the character 'ä' (lower case > a with > umlaut) in UTF-8. > > I can do this: > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), > 'escape')) > > Is there any other, shorter way of doing the same? > > > Ian Barwick > barwick@gmx.net > > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to > majordomo@postgresql.org) >
On Saturday 10 May 2003 18:15, Randall Lucas wrote: > Yes, use a Mac! > > I'm not being entirely facetious -- if you can use a Mac OS X terminal > prompt (and if your pg config is substantially similar to mine), > inserting any unicode stuff is quite easy. Simply typing or > cutting-and-pasting at the terminal let me visually put in accented > Latin, Cyrillic, and Chinese (don't ask me what kind, I am a > sinoignoramus) no sweat. Thanks, but right answer to wrong question ;-). "mlterm" in Linux / FreeBSD works fine for me; the "problem" is not inputting in general, but the ability to use psql to input using the hexadecimal (or decimal) character code, which I need to do occasionally for some obscure character not reachable by other means. The solution I posted works, I was just wondering whether there was some more elegant method I had overlooked. Ian Barwick barwick@gmx.net > On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote: > > Say using psql I wish to insert a character into a VARCHAR / TEXT / > > whatever > > column using the its hexadecimal representation in the relevant > > character set > > / encoding. E.g.: C3A4, which represents the character 'ä' (lower case > > a with > > umlaut) in UTF-8. > > > > I can do this: > > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), > > 'escape')) > > > > Is there any other, shorter way of doing the same? > > > > > > Ian Barwick > > barwick@gmx.net > > > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 2: you can get off all lists at once with the unregister command > > (send "unregister YourEmailAddressHere" to > > majordomo@postgresql.org)
On Saturday 10 May 2003 16:20, Ian Barwick wrote: > Say using psql I wish to insert a character into a VARCHAR / TEXT / > whatever column using the its hexadecimal representation in the relevant > character set / encoding. E.g.: C3A4, which represents the character 'ä' > (lower case a with umlaut) in UTF-8. > > I can do this: > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape')) correction: INSERT INTO my_tbl (unitxt) VALUES(encode(decode('c3a4','hex'), 'escape')) Ian Barwick barwick@gmx.net
Ah, I see. </mac_over_enthusiasm> In that case, I would recommend just making a custom function that handles that for you, maybe: create function hex2esc(text) returns text as ' select encode(decode($1,''hex''), ''escape''); ' language sql; Not a whole lot more elegant but shorter. Best, Randall On Saturday, May 10, 2003, at 12:53 PM, Ian Barwick wrote: > On Saturday 10 May 2003 18:15, Randall Lucas wrote: >> Yes, use a Mac! >> >> I'm not being entirely facetious -- if you can use a Mac OS X terminal >> prompt (and if your pg config is substantially similar to mine), >> inserting any unicode stuff is quite easy. Simply typing or >> cutting-and-pasting at the terminal let me visually put in accented >> Latin, Cyrillic, and Chinese (don't ask me what kind, I am a >> sinoignoramus) no sweat. > > Thanks, but right answer to wrong question ;-). "mlterm" in Linux / > FreeBSD > works fine for me; the "problem" is not inputting in general, but the > ability to use psql to input using the hexadecimal (or decimal) > character > code, which I need to do occasionally for some obscure character not > reachable by other means. > > The solution I posted works, I was just wondering whether there was > some more > elegant method I had overlooked. > > Ian Barwick > barwick@gmx.net > >> On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote: >>> Say using psql I wish to insert a character into a VARCHAR / TEXT / >>> whatever >>> column using the its hexadecimal representation in the relevant >>> character set >>> / encoding. E.g.: C3A4, which represents the character 'ä' (lower >>> case >>> a with >>> umlaut) in UTF-8. >>> >>> I can do this: >>> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), >>> 'escape')) >>> >>> Is there any other, shorter way of doing the same? >>> >>> >>> Ian Barwick >>> barwick@gmx.net >>> >>> >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 2: you can get off all lists at once with the unregister command >>> (send "unregister YourEmailAddressHere" to >>> majordomo@postgresql.org) >
Hi Folks, If I have a domain defined with an underlying type of "text" or "varchar," and I need to FULL OUTER JOIN two tables based upon the similarity in these fields, I am getting: ERROR: FULL JOIN is only supported with mergejoinable join conditions I have looked at the docs and it seems that it's a matter of making the comparison operators for the domain type (as per SQL Reference :: "CREATE OPERATOR" section). However, is there anything tricky about adding the merge-join capability to a domain? Is there a preferred method? Am I barking entirely up the wrong tree (could I, for example, use a casting trick? Simply recasting the fields as type "text" doesn't avoid the error)? Postgres version is 7.3. Best, Randall
On Sat, 2003-05-10 at 17:04, Randall Lucas wrote: > Hi Folks, > > If I have a domain defined with an underlying type of "text" or > "varchar," and I need to FULL OUTER JOIN two tables based upon the > similarity in these fields, I am getting: Seems it works perfectly fine with integers and text based domains on 7.3 and 7.4. Do you have a full failing example? junk=# create domain int as integer; CREATE DOMAIN junk=# create table a (col1 int); CREATE TABLE junk=# create table b (col1 int); CREATE TABLE junk=# select * from a full outer join b using (col1);col1 ------ (0 rows) junk=# select version(); version ------------------------------------------------------------------------PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiledby GCC 2.95.4 (1 row) -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
-- Hmm, here's some code that manifests the problem: -- using text itself: create table text_test_1 ( key text, stuff varchar ); insert into text_test_1 values ('alpha', 'asdflkjasdf'); insert into text_test_1 values ('bravo', 'asdfdasfsaff'); insert into text_test_1 values ('delta', 'asdfasfdas'); create table text_test_2 ( other_key text, more_stuff varchar ); insert into text_test_2 values ('charlie', 'asdfasfasfda'); insert into text_test_2 values ('delta', 'asgasgdda'); insert into text_test_2 values ('echo', 'asdasfsdfsfda'); select * from text_test_1 full outer join text_test_2 on text_test_1.key = text_test_2.other_key; -- Works OK, BUT: -- using domain textual: create domain textual_test as text; create table textual_test_1 ( key textual_test, stuff varchar ); insert into textual_test_1 values ('alpha', 'asdflkjasdf'); insert into textual_test_1 values ('bravo', 'asdfdasfsaff'); insert into textual_test_1 values ('delta', 'asdfasfdas'); create table textual_test_2 ( other_key textual_test, more_stuff varchar ); insert into textual_test_2 values ('charlie', 'asdfasfasfda'); insert into textual_test_2 values ('delta', 'asgasgdda'); insert into textual_test_2 values ('echo', 'asdasfsdfsfda'); select * from textual_test_1 full outer join textual_test_2 on textual_test_1.key = textual_test_2.other_key; -- Will give ERROR: FULL JOIN is only supported with mergejoinable join conditions -- clean up drop table text_test_1; drop table text_test_2; drop table textual_test_1; drop table textual_test_2; drop domain textual_test; -- Best, -- -- Randall On Saturday, May 10, 2003, at 06:13 PM, Rod Taylor wrote: > On Sat, 2003-05-10 at 17:04, Randall Lucas wrote: >> Hi Folks, >> >> If I have a domain defined with an underlying type of "text" or >> "varchar," and I need to FULL OUTER JOIN two tables based upon the >> similarity in these fields, I am getting: > > Seems it works perfectly fine with integers and text based domains on > 7.3 and 7.4. Do you have a full failing example? > > junk=# create domain int as integer; > CREATE DOMAIN > junk=# create table a (col1 int); > CREATE TABLE > junk=# create table b (col1 int); > CREATE TABLE > junk=# select * from a full outer join b using (col1); > col1 > ------ > (0 rows) > > junk=# select version(); > version > ----------------------------------------------------------------------- > - > PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiled by GCC 2.95.4 > (1 row) > > -- > Rod Taylor <rbt@rbt.ca> > > PGP Key: http://www.rbt.ca/rbtpub.asc > <signature.asc>
On Sat, 2003-05-10 at 19:40, Randall Lucas wrote: > -- Hmm, here's some code that manifests the problem: Seems it's been fixed in 7.4. In the mean time, you might try this: select * from textual_test_1 full outer join textual_test_2 on CAST(textual_test_1.key AS text) = CAST(textual_test_2.other_key AS text); -- LOG -- h=# create domain textual_test as text; CREATE DOMAIN h=# create table textual_test_1 ( h(# key textual_test, h(# stuff varchar h(# ); CREATE TABLE h=# insert into textual_test_1 values ('alpha', 'asdflkjasdf'); INSERT 154456 1 h=# insert into textual_test_1 values ('bravo', 'asdfdasfsaff'); INSERT 154457 1 h=# insert into textual_test_1 values ('delta', 'asdfasfdas'); INSERT 154458 1 h=# create table textual_test_2 ( h(# other_key textual_test, h(# more_stuff varchar h(# ); CREATE TABLE h=# insert into textual_test_2 values ('charlie', 'asdfasfasfda'); INSERT 154464 1 h=# insert into textual_test_2 values ('delta', 'asgasgdda'); INSERT 154465 1 h=# insert into textual_test_2 values ('echo', 'asdasfsdfsfda'); INSERT 154466 1 h=# select * from textual_test_1 full outer join textual_test_2 on textual_test_1.key = textual_test_2.other_key; key | stuff | other_key | more_stuff -------+--------------+-----------+---------------alpha | asdflkjasdf | |bravo | asdfdasfsaff | | | | charlie | asdfasfasfdadelta | asdfasfdas | delta | asgasgdda | | echo | asdasfsdfsfda (5 rows) h=# select version(); version ------------------------------------------------------------------------PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiledby GCC 2.95.4 (1 row) -- Rod Taylor <rbt@rbt.ca> PGP Key: http://www.rbt.ca/rbtpub.asc
OK, I ended up wrapping up the text equality and comparison functions in simple sql functions that explicitly take my domain as the type, then added =, <>, <, <=, >, >= operators pointing to those functions, and now I can perform full outer joins with my domain "literal." See below for ugly code. Hackers, why is this? It seems to me that since everything else (or everything I've run into so far, up to the full outer joins) is implicitly the same for a simple domain and its underlying base type, that it would make sense if this, too, Just Worked. Perhaps we could have the create domain function implicitly perform the operator mapping to underlying basetypes to permit merge joins? Best, Randall -- begin ugly code: create domain literal as text; create or replace function literaleq(literal, literal) returns boolean as ' select texteq($1::text, $2::text); ' language sql; create or replace function literalne(literal, literal) returns boolean as ' select textne($1::text, $2::text); ' language sql; create or replace function literal_lt(literal, literal) returns boolean as ' select text_lt($1::text, $2::text); ' language sql; create or replace function literal_le(literal, literal) returns boolean as ' select text_le($1::text, $2::text); ' language sql; create or replace function literal_gt(literal, literal) returns boolean as ' select text_gt($1::text, $2::text); ' language sql; create or replace function literal_ge(literal, literal) returns boolean as ' select text_ge($1::text, $2::text); ' language sql; create operator < ( leftarg = literal, rightarg = literal, procedure = literal_lt, commutator = >, negator = >=, restrict= scalarltsel, join = scalarltjoinsel ); create operator <= ( leftarg = literal, rightarg = literal, procedure = literal_le, commutator = >=, negator = >, restrict= scalarltsel, join = scalarltjoinsel ); create operator > ( leftarg = literal, rightarg = literal, procedure = literal_gt, commutator = <, negator = <=, restrict= scalargtsel, join = scalargtjoinsel ); create operator >= ( leftarg = literal, rightarg = literal, procedure = literal_ge, commutator = <=, negator = <, restrict= scalargtsel, join = scalargtjoinsel ); create operator = ( leftarg = literal, rightarg = literal, procedure = literaleq, commutator = =, negator = <>, restrict= eqsel, join = eqjoinsel, hashes, sort1 = <, sort2 = < ); create operator <> ( leftarg = literal, rightarg = literal, procedure = literaleq, commutator = <>, negator = =, restrict= neqsel, join = neqjoinsel, hashes, sort1 = <, sort2 = < ); On Saturday, May 10, 2003, at 07:40 PM, Randall Lucas wrote: > create table textual_test_1 ( > key textual_test, > stuff varchar > ); > insert into textual_test_1 values ('alpha', 'asdflkjasdf'); > insert into textual_test_1 values ('bravo', 'asdfdasfsaff'); > insert into textual_test_1 values ('delta', 'asdfasfdas'); > create table textual_test_2 ( > other_key textual_test, > more_stuff varchar > ); > insert into textual_test_2 values ('charlie', 'asdfasfasfda'); > insert into textual_test_2 values ('delta', 'asgasgdda'); > insert into textual_test_2 values ('echo', 'asdasfsdfsfda'); > select * from textual_test_1 full outer join textual_test_2 on > textual_test_1.key = textual_test_2.other_key;
Randall Lucas <rlucas@tercent.net> writes: > Hackers, why is this? It seems to me that since everything else (or > everything I've run into so far, up to the full outer joins) is > implicitly the same for a simple domain and its underlying base type, > that it would make sense if this, too, Just Worked. It does Just Work ... in CVS tip. I can't make it magically work in 7.3 --- at least not without back-patching a lot of stuff that hasn't been through beta-testing yet. The problem is that mergejoin in existing releases will only work on plain "Var = Var" clauses. Your domain case doesn't look like that because of the runtime cast operators. regards, tom lane
Ian Barwick writes: > I can do this: > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape')) > > Is there any other, shorter way of doing the same? Use octal escapes: '\303\244'; -- Peter Eisentraut peter_e@gmx.net
On Sunday 11 May 2003 18:54, Peter Eisentraut wrote: > Ian Barwick writes: > > I can do this: > > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape')) > > > > Is there any other, shorter way of doing the same? > > Use octal escapes: '\303\244'; Aha, thanks. That seems to imply though that this statement in the docs: "...where xxx is an octal number, is the character with the corresponding ASCII code." http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=sql-syntax.html is not completely correct. Maybe it should read something like: "...where xxx is an octal number which corresponds to a valid character in the database encoding." Ian Barwick barwick@gmx.net
Ian Barwick writes: > That seems to imply though that this statement in the docs: > "...where xxx is an octal number, is the character with the corresponding > ASCII code." > is not completely correct. Maybe it should read something like: > > "...where xxx is an octal number which corresponds to a valid character in the > database encoding." What it actually does is insert exactly one byte of the given value. The problem is that you can easily construct invalid code sequences that way, and I that should not be allowable. What I would like better is if \xxx took xxx as a code point and converted it to the appropriately encoded byte sequence. For single-byte encodings that would mean no change, for Unicode it would make a lot more sense than what it does now, but I don't know if that concept can be applied to all character set encodings. -- Peter Eisentraut peter_e@gmx.net