Re: syntax for reaching into records, specifically ts_stat results - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: syntax for reaching into records, specifically ts_stat results
Date
Msg-id Pine.LNX.4.64.0812092303280.28443@sn.sai.msu.ru
Whole thread Raw
In response to syntax for reaching into records, specifically ts_stat results  (Dan Chak <chak@MIT.EDU>)
Responses Re: syntax for reaching into records, specifically ts_stat results
List pgsql-hackers
try 
select * from ts_stat(....)
btw, performance of ts_stat() was greatly improved in 8.4.

Oleg
On Tue, 9 Dec 2008, Dan Chak wrote:

> Dear Postgres Folk,
>
> In working with tsvectors (added in 8.3), I've come to a place where my 
> syntax-fu has failed me.  I've resorted to turning a result set of records 
> into strings so that I can regexp out the record fields I need, rather than 
> access them directly, as I'm sure it's possible to do with the right 
> syntactic formulation.  Although my solution works, I'm sure it's much less 
> efficient than it could be, and hope someone on the list can help do this the 
> right way.
>
> Basically, I would like to transpose a series of tsvectors (one per row) into 
> columns.  E.g., as tsvects, I have this:
>
> test=# select * from tsvects;
> sentence_id |            tsvect
> -------------+------------------------------
>          1 | 'fox':3 'brown':2 'quick':1
>          2 | 'lazi':1 'eleph':3 'green':2
>
> Instead I want this:
>
> sentence_id | word  | freq
> -------------+-------+------
>          1 | fox   | 1
>          1 | brown | 1
>          1 | quick | 1
>          2 | lazi  | 1
>          2 | eleph | 1
>          2 | green | 1
>
> I am able to generate this with the following view, but the problem is that 
> to create it, I must first cast the ts_stat results to a string, and then 
> regexp out the pertinent pieces:
>
> create or replace view words as
> select sentence_id,
>      substring(stat from '^\\(([^,]+),') as word,
>      substring(stat from ',([^,]+)\\)$') as freq
> from (select sentence_id,
>              ts_stat('select tsvect from tsvects where sentence_id = ' ||
>                 tsvects.sentence_id)::text as stat
>         from tsvects
>      ) as foo;
>
> It seems like there should be a way to access fields in the records returned 
> from ts_stat directly, but I can't figure out how.  Here's the result of the 
> subquery:
>
> test=# select sentence_id,
>              ts_stat('select tsvect from tsvects where sentence_id = ' ||
>                 tsvects.sentence_id)::text as stat
>         from tsvects;
> sentence_id |    stat
> -------------+-------------
>          1 | (fox,1,1)
>          1 | (brown,1,1)
>          1 | (quick,1,1)
>          2 | (lazi,1,1)
>          2 | (eleph,1,1)
>          2 | (green,1,1)
> (6 rows)
>
> If I try to get at the elements (which I believe are named 'word', 'ndoc', 
> 'nentry'), I get a variety of syntax errors:
>
> test=# select sentence_id,
> test-#        stat['word'],
> test-#        stat['nentry']
> test-#   from (select sentence_id,
> test(#                ts_stat('select tsvect from tsvects where sentence_id = 
> ' ||
> test(#                   tsvects.sentence_id) as stat
> test(#           from tsvects
> test(#        ) as foo;
> ERROR:  cannot subscript type record because it is not an array
>
> If I say stat.word (instead of subscripting), I get 'missing FROM-clause 
> entry for table "stat"'.  If I say foo.stat.word, I get 'ERROR:  schema "foo" 
> does not exist'.
>
> Any ideas on how to get into these records with resorting to text parsing?
>
> Thanks,
> Dan
>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: parallel restore vs. windows
Next
From: Tom Lane
Date:
Subject: Re: syntax for reaching into records, specifically ts_stat results