Thread: Strange result for full text query
I have a record with this in the 'ti' field:
On type A and type B uncertainties and its propagation without
derivatives: a contribution to incorporate contemporary metrology to
Physics' laboratories in higher education
On type A and type B uncertainties and its propagation without
derivatives: a contribution to incorporate contemporary metrology to
Physics' laboratories in higher education
And there is a tsv-field which is updated by the following trigger function:
CREATE OR REPLACE FUNCTION isi.update_ritem_tsv()
RETURNS trigger AS
$BODY$
begin
new.tsv := to_tsvector(coalesce(new.ti,'')) || to_tsvector(coalesce(new.ab,''));
return new;
end
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION isi.update_ritem_tsv()
OWNER TO crest;
and tsv contains the following for this record:
"'activ':129 'altern':73,94 'assumpt':62 'b':6,39 'concept':28,122
'contemporari':17 'contribut':14 'deriv':12,91 'discuss':42
'earliest':127 'educ':24,133 'estim':76 'evalu':34 'experiment':128
'explicit':87 'express':54 'goal':65 'guid':51 'higher':23,132
'incorpor':16 'input':82 'introduc':26,71 'introduct':119
'laboratori':21,135 'law':44,106 'main':64 'measur':58
'metrolog':18,121 'output':84 'paper':68 'partial':90 'perceiv':114
'perform':130 'physic':20,134 'possibl':112 'procedur':32,74,95,124
'propag':10,46,79,108 'quantiti':85 'refer':88 'relat':97 'share':100
'simpl':98 'type':2,5,35,38 'uncertainti':7,30,40,48,56,77 'under':61
'valid':103 'valu':116 'without':11,86"
CREATE OR REPLACE FUNCTION isi.update_ritem_tsv()
RETURNS trigger AS
$BODY$
begin
new.tsv := to_tsvector(coalesce(new.ti,'')) || to_tsvector(coalesce(new.ab,''));
return new;
end
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION isi.update_ritem_tsv()
OWNER TO crest;
and tsv contains the following for this record:
"'activ':129 'altern':73,94 'assumpt':62 'b':6,39 'concept':28,122
'contemporari':17 'contribut':14 'deriv':12,91 'discuss':42
'earliest':127 'educ':24,133 'estim':76 'evalu':34 'experiment':128
'explicit':87 'express':54 'goal':65 'guid':51 'higher':23,132
'incorpor':16 'input':82 'introduc':26,71 'introduct':119
'laboratori':21,135 'law':44,106 'main':64 'measur':58
'metrolog':18,121 'output':84 'paper':68 'partial':90 'perceiv':114
'perform':130 'physic':20,134 'possibl':112 'procedur':32,74,95,124
'propag':10,46,79,108 'quantiti':85 'refer':88 'relat':97 'share':100
'simpl':98 'type':2,5,35,38 'uncertainti':7,30,40,48,56,77 'under':61
'valid':103 'valu':116 'without':11,86"
When I use the query
select ut, ti
from isi.ritem A
where
A.tsv @@ to_tsquery('Simple','higher & education')
select ut, ti
from isi.ritem A
where
A.tsv @@ to_tsquery('Simple','higher & education')
The result is zero.
But then I use
A.tsv @@ to_tsquery('Simple','higher & educ')
A.tsv @@ to_tsquery('Simple','higher & educ')
I get more than 54000 records as result.
and with
A.tsv @@ to_tsquery('Simple','education')
A.tsv @@ to_tsquery('Simple','education')
I get 19 records containing words like Educatione, educationism, Education etc.
That makes me think the full text search is probably not very reliable. Why this results?
Regards
Johann
Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
Johann Spies <johann.spies@gmail.com> writes: > I have a record with this in the 'ti' field: > On type A and type B uncertainties and its propagation without > derivatives: a contribution to incorporate contemporary metrology to > Physics' laboratories in higher education > and tsv contains the following for this record: > "'activ':129 'altern':73,94 'assumpt':62 'b':6,39 'concept':28,122 > 'contemporari':17 'contribut':14 'deriv':12,91 'discuss':42 > 'earliest':127 'educ':24,133 'estim':76 'evalu':34 'experiment':128 What have you got default_text_search_config set to? Not "simple": regression=# select to_tsvector('simple','higher education'); to_tsvector -------------------------- 'education':2 'higher':1 (1 row) regression=# select to_tsvector('english','higher education'); to_tsvector --------------------- 'educ':2 'higher':1 (1 row) > When I use the query > select ut, ti > from isi.ritem A > where > A.tsv @@ to_tsquery('Simple','higher & education') > The result is zero. > But then I use > A.tsv @@ to_tsquery('Simple','higher & educ') > I get more than 54000 records as result. Not surprising: the simple config doesn't assume the lexemes are English words. regression=# select to_tsquery('Simple','higher & education'); to_tsquery ------------------------ 'higher' & 'education' (1 row) regression=# select to_tsquery('english','higher & education'); to_tsquery ------------------- 'higher' & 'educ' (1 row) > That makes me think the full text search is probably not very reliable. If misused, yeah. You need to have a fairly clear idea of which language you are indexing; or maybe I should just say that you need to use a consistent text search configuration selection. (Words in other languages will still work, they just might not get stemmed usefully.) regards, tom lane