Thread: How do I remove selected words from text field?

How do I remove selected words from text field?

From
Frank Bax
Date:
I'm not quite sure how to ask for the query I want, so let's start with 
data:

create table t1 (i int, val varchar);
insert into t1 values(1,'A B C D');
insert into t1 values(2,'B D E F');
insert into t1 values(3,'G H I J');
create table t2 (q varchar, z varchar);
insert into t2 values('A','vowel');
insert into t2 values('B','consonant');
insert into t2 values('E','vowel');
insert into t2 values('K','consonant');

t1.val will contain "words" separated by blanks.  It might be better if 
each "word" were a separate row in another table; but that's not how the 
legacy database was built.  I understand this can be simulated by:

select i,a[s] from (select i, generate_subscripts(string_to_array(val,' 
'),1) as s, string_to_array(val,' ') as a from t1) foo;

In my "real life" situation, the "words" are not single letters.

I'd like to write a function that removes selected "words" from t1.val 
based on select on t2.
In the above example; let's exclude all vowels, so I end up with:
1 'B C D'
2 'B D F'
3 'G H I J'

For some "words" in val; there may not be a row when joining to t2.q; 
these words must be included in final result.  In the above example; 
there is no row in t2 where q="I"; so it is included in result.

How do I write such a function?  Can it be done with SQL only?


Re: How do I remove selected words from text field?

From
Osvaldo Kussama
Date:
2010/6/25 Frank Bax <fbax@sympatico.ca>:
> I'm not quite sure how to ask for the query I want, so let's start with
> data:
>
> create table t1 (i int, val varchar);
> insert into t1 values(1,'A B C D');
> insert into t1 values(2,'B D E F');
> insert into t1 values(3,'G H I J');
> create table t2 (q varchar, z varchar);
> insert into t2 values('A','vowel');
> insert into t2 values('B','consonant');
> insert into t2 values('E','vowel');
> insert into t2 values('K','consonant');
>
> t1.val will contain "words" separated by blanks.  It might be better if each
> "word" were a separate row in another table; but that's not how the legacy
> database was built.  I understand this can be simulated by:
>
> select i,a[s] from (select i, generate_subscripts(string_to_array(val,'
> '),1) as s, string_to_array(val,' ') as a from t1) foo;
>
> In my "real life" situation, the "words" are not single letters.
>
> I'd like to write a function that removes selected "words" from t1.val based
> on select on t2.
> In the above example; let's exclude all vowels, so I end up with:
> 1 'B C D'
> 2 'B D F'
> 3 'G H I J'
>
> For some "words" in val; there may not be a row when joining to t2.q; these
> words must be included in final result.  In the above example; there is no
> row in t2 where q="I"; so it is included in result.
>
> How do I write such a function?  Can it be done with SQL only?
>


Try:

SELECT i, array_to_string(array_agg(foo), ' ') "Val"
FROM (SELECT * FROM (SELECT i, regexp_split_to_table(val, E'\\s+') AS
foo FROM t1) bar
LEFT OUTER JOIN t2 ON (bar.foo = t2.q) WHERE z IS DISTINCT FROM 'vowel') foobar
GROUP BY i;i |   Val
---+---------1 | B C D3 | G H I J2 | B D F
(3 linhas)

Osvaldo


Re: How do I remove selected words from text field?

From
Frank Bax
Date:
Osvaldo Kussama wrote:
> 2010/6/25 Frank Bax <fbax@sympatico.ca>:
>> I'm not quite sure how to ask for the query I want, so let's start with
>> data:
>>
>> create table t1 (i int, val varchar);
>> insert into t1 values(1,'A B C D');
>> insert into t1 values(2,'B D E F');
>> insert into t1 values(3,'G H I J');
>> create table t2 (q varchar, z varchar);
>> insert into t2 values('A','vowel');
>> insert into t2 values('B','consonant');
>> insert into t2 values('E','vowel');
>> insert into t2 values('K','consonant');
>>
>> t1.val will contain "words" separated by blanks.  It might be better if each
>> "word" were a separate row in another table; but that's not how the legacy
>> database was built.  I understand this can be simulated by:
>>
>> select i,a[s] from (select i, generate_subscripts(string_to_array(val,'
>> '),1) as s, string_to_array(val,' ') as a from t1) foo;
>>
>> In my "real life" situation, the "words" are not single letters.
>>
>> I'd like to write a function that removes selected "words" from t1.val based
>> on select on t2.
>> In the above example; let's exclude all vowels, so I end up with:
>> 1 'B C D'
>> 2 'B D F'
>> 3 'G H I J'
>>
>> For some "words" in val; there may not be a row when joining to t2.q; these
>> words must be included in final result.  In the above example; there is no
>> row in t2 where q="I"; so it is included in result.
>>
>> How do I write such a function?  Can it be done with SQL only?
>>
> 
> 
> Try:
> 
> SELECT i, array_to_string(array_agg(foo), ' ') "Val"
> FROM (SELECT * FROM (SELECT i, regexp_split_to_table(val, E'\\s+') AS
> foo FROM t1) bar
> LEFT OUTER JOIN t2 ON (bar.foo = t2.q) WHERE z IS DISTINCT FROM 'vowel') foobar
> GROUP BY i;
>  i |   Val
> ---+---------
>  1 | B C D
>  3 | G H I J
>  2 | B D F
> (3 linhas)
> 
> Osvaldo
> 



Excellent!  Thanks!  I've never seen "is distinct from" before.  Looks 
like that was the missing piece to my puzzle.


Re: How do I remove selected words from text field?

From
Frank Bax
Date:
Osvaldo Kussama wrote:
> 2010/6/25 Frank Bax <fbax@sympatico.ca>:
>> I'm not quite sure how to ask for the query I want, so let's start with
>> data:
>>
>> create table t1 (i int, val varchar);
>> insert into t1 values(1,'A B C D');
>> insert into t1 values(2,'B D E F');
>> insert into t1 values(3,'G H I J');
>> create table t2 (q varchar, z varchar);
>> insert into t2 values('A','vowel');
>> insert into t2 values('B','consonant');
>> insert into t2 values('E','vowel');
>> insert into t2 values('K','consonant');
>>
>> t1.val will contain "words" separated by blanks.  It might be better if each
>> "word" were a separate row in another table; but that's not how the legacy
>> database was built.  I understand this can be simulated by:
>>
>> select i,a[s] from (select i, generate_subscripts(string_to_array(val,'
>> '),1) as s, string_to_array(val,' ') as a from t1) foo;
>>
>> In my "real life" situation, the "words" are not single letters.
>>
>> I'd like to write a function that removes selected "words" from t1.val based
>> on select on t2.
>> In the above example; let's exclude all vowels, so I end up with:
>> 1 'B C D'
>> 2 'B D F'
>> 3 'G H I J'
>>
>> For some "words" in val; there may not be a row when joining to t2.q; these
>> words must be included in final result.  In the above example; there is no
>> row in t2 where q="I"; so it is included in result.
>>
>> How do I write such a function?  Can it be done with SQL only?
>>
> 
> 
> Try:
> 
> SELECT i, array_to_string(array_agg(foo), ' ') "Val"
> FROM (SELECT * FROM (SELECT i, regexp_split_to_table(val, E'\\s+') AS
> foo FROM t1) bar
> LEFT OUTER JOIN t2 ON (bar.foo = t2.q) WHERE z IS DISTINCT FROM 'vowel') foobar
> GROUP BY i;
>  i |   Val
> ---+---------
>  1 | B C D
>  3 | G H I J
>  2 | B D F
> (3 linhas)
> 
> Osvaldo
> 



Excellent!  Thanks!  I've never seen "is distinct from" before.  Looks
like that was the missing piece to my puzzle.

When I write this as a function...

CREATE OR REPLACE FUNCTION novowel(text) RETURNS text AS $$  SELECT array_to_string(array_agg(word),' ') FROM  (SELECT
*FROM (SELECT regexp_split_to_table($1, E'\\s+') AS word) bar  LEFT OUTER JOIN t2 ON (bar.word=t2.q)  WHERE z IS
DISTINCTFROM 'vowel') foo  GROUP BY word
 
$$ LANGUAGE SQL

It only works when t1.val is a single word/vowel...

insert into t1 values (4,'E');

select *,novowel(val) from t1; i |    val    |  novowel
---+-----------+----------- 1 | A B C D   | A B C D 2 | B C D E F | B C D E F 3 | G H I J K | G H I J K 4 | E
|
(4 rows)




Re: How do I remove selected words from text field?

From
Frank Bax
Date:
Create some tables; then add some data:

create table t1 (i int, v varchar);
insert into t1 values(1,'A B C D');
insert into t1 values(2,'B D E F');
insert into t1 values(3,'G H I J');
insert into t1 values(4,'E');
create table t2 (q varchar, z varchar);
insert into t2 values('A','vowel');
insert into t2 values('B','consonant');
insert into t2 values('E','vowel');
insert into t2 values('K','consonant');

I am trying to write a function which will:
* split the argument into "words" (separated by blanks);
* remove words that meet a certain condition in another table  (in this example 'vowel');
* reassemble "words" into a string;
* return the result
This query does that job (Thanks Osvaldo):

SELECT i, array_to_string(array_agg(word), ' ') "new-v" FROM
(SELECT * FROM (SELECT i, regexp_split_to_table(v, E'\\s+') AS word FROM 
t1) bar
LEFT OUTER JOIN t2 ON (bar.word=t2.q)
WHERE z IS DISTINCT FROM 'vowel') foo
GROUP BY i;
 i |  new-v
---+--------- 1 | B C D 3 | G H I J 2 | B D F
(3 rows)


When I try to create a function to do the same thing; it only works for 
(4,'E') and not the other tuples.

CREATE OR REPLACE FUNCTION notvowel(text) RETURNS text AS $$  SELECT array_to_string(array_agg(word),' ') FROM  (SELECT
*FROM (SELECT regexp_split_to_table($1, E'\s+') AS word) bar  LEFT OUTER JOIN t2 ON (bar.word=t2.q)  WHERE z IS
DISTINCTFROM 'vowel') foo  GROUP BY word
 
$$ LANGUAGE SQL;

select *,notvowel(v::text) from t1; i |    v    | notvowel
---+---------+---------- 1 | A B C D | A B C D 2 | B D E F | B D E F 3 | G H I J | G H I J 4 | E       |
(4 rows)


I wonder if it has something to do with pattern passed to 
regexp_split_to_table() since inside the function, E'\\s+' results in: i |    v    | notvowel
---+---------+---------- 1 | A B C D | C 2 | B D E F | B 3 | G H I J | G 4 | E       |
(4 rows)



Re: How do I remove selected words from text field?

From
Osvaldo Kussama
Date:
2010/7/1 Frank Bax <fbax@sympatico.ca>:
>
> Create some tables; then add some data:
>
> create table t1 (i int, v varchar);
> insert into t1 values(1,'A B C D');
> insert into t1 values(2,'B D E F');
> insert into t1 values(3,'G H I J');
> insert into t1 values(4,'E');
> create table t2 (q varchar, z varchar);
> insert into t2 values('A','vowel');
> insert into t2 values('B','consonant');
> insert into t2 values('E','vowel');
> insert into t2 values('K','consonant');
>
> I am trying to write a function which will:
> * split the argument into "words" (separated by blanks);
> * remove words that meet a certain condition in another table
>  (in this example 'vowel');
> * reassemble "words" into a string;
> * return the result
> This query does that job (Thanks Osvaldo):
>
> SELECT i, array_to_string(array_agg(word), ' ') "new-v" FROM
> (SELECT * FROM (SELECT i, regexp_split_to_table(v, E'\\s+') AS word FROM t1)
> bar
> LEFT OUTER JOIN t2 ON (bar.word=t2.q)
> WHERE z IS DISTINCT FROM 'vowel') foo
> GROUP BY i;
>
>  i |  new-v
> ---+---------
>  1 | B C D
>  3 | G H I J
>  2 | B D F
> (3 rows)
>
>
> When I try to create a function to do the same thing; it only works for
> (4,'E') and not the other tuples.
>
> CREATE OR REPLACE FUNCTION notvowel(text) RETURNS text AS $$
>  SELECT array_to_string(array_agg(word),' ') FROM
>  (SELECT * FROM (SELECT regexp_split_to_table($1, E'\s+') AS word) bar

Use E'\\s+' or E'[[:space:]]+' in regexp_split_to_table function.


>  LEFT OUTER JOIN t2 ON (bar.word=t2.q)
>  WHERE z IS DISTINCT FROM 'vowel') foo
>  GROUP BY word
> $$ LANGUAGE SQL;
>
> select *,notvowel(v::text) from t1;
>  i |    v    | notvowel
> ---+---------+----------
>  1 | A B C D | A B C D
>  2 | B D E F | B D E F
>  3 | G H I J | G H I J
>  4 | E       |
> (4 rows)
>
>
> I wonder if it has something to do with pattern passed to
> regexp_split_to_table() since inside the function, E'\\s+' results in:
>  i |    v    | notvowel
> ---+---------+----------
>  1 | A B C D | C
>  2 | B D E F | B
>  3 | G H I J | G
>  4 | E       |
> (4 rows)
>


Osvaldo


Re: How do I remove selected words from text field?

From
Frank Bax
Date:
Osvaldo Kussama wrote:
> 2010/7/1 Frank Bax <fbax@sympatico.ca>:
>> Create some tables; then add some data:
>>
>> create table t1 (i int, v varchar);
>> insert into t1 values(1,'A B C D');
>> insert into t1 values(2,'B D E F');
>> insert into t1 values(3,'G H I J');
>> insert into t1 values(4,'E');
>> create table t2 (q varchar, z varchar);
>> insert into t2 values('A','vowel');
>> insert into t2 values('B','consonant');
>> insert into t2 values('E','vowel');
>> insert into t2 values('K','consonant');
>>
>> I am trying to write a function which will:
>> * split the argument into "words" (separated by blanks);
>> * remove words that meet a certain condition in another table
>>  (in this example 'vowel');
>> * reassemble "words" into a string;
>> * return the result
>> This query does that job (Thanks Osvaldo):
>>
>> SELECT i, array_to_string(array_agg(word), ' ') "new-v" FROM
>> (SELECT * FROM (SELECT i, regexp_split_to_table(v, E'\\s+') AS word FROM t1)
>> bar
>> LEFT OUTER JOIN t2 ON (bar.word=t2.q)
>> WHERE z IS DISTINCT FROM 'vowel') foo
>> GROUP BY i;
>>
>>  i |  new-v
>> ---+---------
>>  1 | B C D
>>  3 | G H I J
>>  2 | B D F
>> (3 rows)
>>
>>
>> When I try to create a function to do the same thing; it only works for
>> (4,'E') and not the other tuples.
>>
>> CREATE OR REPLACE FUNCTION notvowel(text) RETURNS text AS $$
>>  SELECT array_to_string(array_agg(word),' ') FROM
>>  (SELECT * FROM (SELECT regexp_split_to_table($1, E'\s+') AS word) bar
> 
> Use E'\\s+' or E'[[:space:]]+' in regexp_split_to_table function.


Both of these produce incorrect results...
  i |    v    | notvowel
---+---------+----------  1 | A B C D | C  2 | B D E F | B  3 | G H I J | G  4 | E       | (4 rows)



Re: How do I remove selected words from text field?

From
silly sad
Date:
On 07/02/10 03:25, Frank Bax wrote:
> Osvaldo Kussama wrote:
>> 2010/7/1 Frank Bax <fbax@sympatico.ca>:
>>> Create some tables; then add some data:
>>>
>>> create table t1 (i int, v varchar);
>>> insert into t1 values(1,'A B C D');
>>> insert into t1 values(2,'B D E F');
>>> insert into t1 values(3,'G H I J');
>>> insert into t1 values(4,'E');

first of all rebuild the table.
explode these strings by space char
and put into another table for convenient use

In other words
construct proper data scheme PRIOR TO USE it.