Thread: extracting words

extracting words

From
Tarlika Elisabeth Schmitz
Date:
I am not quite sure whether this would be sensible or indeed at
all possible to do in SQL:

TABLE product: product_pk,  product_name, department_fk
TABLE product: department_pk,  department_name

example data:

SELECT product_pk, department_name, product_name ... LEFT JOIN ...:

1, "cakes & desserts", "apple crumble"
2, "cakes & desserts", "cheese cake"
3, "starters & soups", "french onion soup"


I need to extract the words from department_name and product_name
(words are separated by spaces) and get something like:

1 cakes
1 desserts
1 apple
1 crumble
2 cakes
2 desserts
2 cheese
2 cake
3 starters
3 soups
3 french
3 onion
3 soup



Would it be more sensible to produce a table join and split the result
into keywords programmatically?






--


Best Regards,

Tarlika Elisabeth Schmitz


A: Because it breaks the logical sequence of discussion
Q: Why is top posting bad? 


Re: extracting words

From
"A. Kretschmer"
Date:
am  Tue, dem 22.04.2008, um 22:25:04 +0100 mailte Tarlika Elisabeth Schmitz folgendes:
> I am not quite sure whether this would be sensible or indeed at
> all possible to do in SQL:
> 
> TABLE product: product_pk,  product_name, department_fk
> TABLE product: department_pk,  department_name
> 
> example data:
> 
> SELECT product_pk, department_name, product_name ... LEFT JOIN ...:
> 
> 1, "cakes & desserts", "apple crumble"
> 2, "cakes & desserts", "cheese cake"
> 3, "starters & soups", "french onion soup"
> 
> 
> I need to extract the words from department_name and product_name
> (words are separated by spaces) and get something like:
> 
> 1 cakes
> 1 desserts
> 1 apple
> 1 crumble
> 2 cakes
> 2 desserts
> 2 cheese
> 2 cake
> 3 starters
> 3 soups
> 3 french
> 3 onion
> 3 soup
> 

A little function (author: David Fetter)

-- split a string to rows, by David Fetter
CREATE OR REPLACE FUNCTION split_to_rows(TEXT, TEXT) RETURNS SETOF TEXT
STRICT
LANGUAGE sql
AS $$   SELECT (string_to_array($1, $2))[s.i]   FROM generate_series(       1,       array_upper(string_to_array($1,
$2),1)   ) AS s(i);
 
$$;


And now:

test=*# select * from product ;id |       val1       |       val2
----+------------------+------------------- 1 | cakes & desserts | apple crumble 2 | cakes & desserts | cheese cake 3 |
starters& soups | french onion soup
 
(3 rows)

test=*# select id, split_to_rows(replace(val1 || ' ' || val2,' &',''),' ') as col1 from product;id |   col1
----+---------- 1 | cakes 1 | desserts 1 | apple 1 | crumble 2 | cakes 2 | desserts 2 | cheese 2 | cake 3 | starters 3
|soups 3 | french 3 | onion 3 | soup
 
(13 rows)



HTH, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net


Re: extracting words

From
Alvaro Herrera
Date:
Tarlika Elisabeth Schmitz wrote:

> I need to extract the words from department_name and product_name
> (words are separated by spaces) and get something like:
> 
> 1 cakes
> 1 desserts
> 1 apple
> 1 crumble
> 2 cakes
> 2 desserts
> 2 cheese
> 2 cake
> 3 starters
> 3 soups
> 3 french
> 3 onion
> 3 soup

regexp_split_to_table is handy (8.3 only):

select pk, regexp_split_to_table(product_name, '[& ]+') from products
union
select pk, regexp_split_to_table(dept_name, '[& ]+') from departments;

Add joins as desired.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: extracting words

From
Tarlika Elisabeth Schmitz
Date:
On Wed, 23 Apr 2008 09:13:03 -0400
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Tarlika Elisabeth Schmitz wrote:
> 
> > I need to extract the words from department_name and product_name
> > (words are separated by spaces) and get something like:
> > 
> > 1 cakes
> > 1 desserts
> > 1 apple
> > 1 crumble
> > 2 cakes
> > 2 desserts
> > 2 cheese
> > 2 cake
> > 3 starters
> > 3 soups
> > 3 french
> > 3 onion
> > 3 soup
> 
> regexp_split_to_table is handy (8.3 only):
> 
> select pk, regexp_split_to_table(product_name, '[& ]+') from products
> union
> select pk, regexp_split_to_table(dept_name, '[& ]+') from departments;
> 
> Add joins as desired.


This is neat!
Unfortunately, on the machine that I need this, we are running 7.4.

--


Best Regards,

Tarlika Elisabeth Schmitz


A: Because it breaks the logical sequence of discussion
Q: Why is top posting bad? 


Re: extracting words

From
Tarlika Elisabeth Schmitz
Date:
On Wed, 23 Apr 2008 07:18:44 +0200
"A. Kretschmer" <andreas.kretschmer@schollglas.com> wrote:

> am  Tue, dem 22.04.2008, um 22:25:04 +0100 mailte Tarlika Elisabeth
> Schmitz folgendes:
> > TABLE product: product_pk,  product_name, department_fk
> > TABLE product: department_pk,  department_name
> > 
> > ...
> > 
> > 
> > I need to extract the words from department_name and product_name
> > (words are separated by spaces) ...
> 
> A little function (author: David Fetter)
> 
> -- split a string to rows, by David Fetter
> CREATE OR REPLACE FUNCTION split_to_rows(TEXT, TEXT) RETURNS SETOF
> TEXT STRICT
> LANGUAGE sql
> AS $$
>     SELECT (string_to_array($1, $2))[s.i]
>     FROM generate_series(
>         1,
>         array_upper(string_to_array($1, $2), 1)
>     ) AS s(i);
> $$;
> 
> 
> And now:
> 
> test=*# select * from product ;
>  id |       val1       |       val2
> ----+------------------+-------------------
>   1 | cakes & desserts | apple crumble
>   2 | cakes & desserts | cheese cake
>   3 | starters & soups | french onion soup
> (3 rows)
> 
> test=*# select id, split_to_rows(replace(val1 || ' ' || val2,'
> &',''),' ') as col1 from product; id |   col1
> ----+----------
>   1 | cakes
>   1 | desserts
>   1 | apple
>   1 | crumble
>   2 | cakes
>   2 | desserts
>   2 | cheese
>   2 | cake
>   3 | starters
>   3 | soups
>   3 | french
>   3 | onion
>   3 | soup
> (13 rows)


Many thanks! This is ingenious!

Before I ask any stupid questions such as "what does replace (text, '
&', '') do?": where can I find a more detailed description of the
string functions?

I would like to treat all white space (new-line, tab, space) as
word separator.

Will the above work on 7.4?


--


Best Regards,

Tarlika Elisabeth Schmitz


A: Because it breaks the logical sequence of discussion
Q: Why is top posting bad?