Thread: Normalized Ranking example incorrect in text search
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html

Ranking Search Results

shows an example which says
"This is the same example using normalized ranking"
and then gives a query which calculates normalization in an incorrect
manner, without using the normalization parameter.

A correct example would be something like this:

SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

I can't rerun the query because I don't have the example data set used.
Is that available?

This section also describes the two ranking functions supplied and
suggests you can also write your own.

- Can we say what the differences are between the two ranking functions?
  Why do we have two?

- Can we supply or link to an example ranking function to allow people
  to write their own?

-- 
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
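For readers who also lack the apod data set, the effect of the normalization argument can be checked on an inline document. This is a minimal sketch, assuming a server with built-in text search and the 'english' configuration. The sample sentence and the column aliases are made up for illustration and are not the data behind the documentation's output.

```sql
-- Compare the unnormalized rank with the rank normalized by flag 8
-- (divide by the number of unique words in the document).
-- The document text here is an illustrative assumption, not the apod data.
SELECT ts_rank_cd(doc, q, 0) AS raw_rank,
       ts_rank_cd(doc, q, 8) AS rank_per_unique_word
FROM (SELECT to_tsvector('english',
                         'Dark matter may interact with the neutrino') AS doc,
             to_tsquery('english', 'neutrino | (dark & matter)') AS q) AS s;
```

The third argument is a bit mask, so several normalizations can be requested together with `|`, for example `2|4`.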
Simon Riggs <simon@2ndquadrant.com> writes:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
> Ranking Search Results
> shows an example which says
> "This is the same example using normalized ranking"
> and then gives a query which calculates normalization in an incorrect
> manner,

On what basis do you claim that's an incorrect manner?  It's exactly
what is described in the paragraph just before the examples.

> A correct example
> would be something like this:
> SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank

Why is that correct (or more correct than other ways)?

> - Can we say what the differences are between the two ranking functions?
> Why do we have two?

We already say that: the _cd function doesn't work without positional
info in the input tsvector.

			regards, tom lane
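The point about positional information can be illustrated by ranking the same document with and without positions. This is a sketch only, assuming the built-in `strip()` function, which removes positions and weights from a tsvector, and an invented sample sentence. Per the documentation's description, ts_rank_cd ignores stripped lexemes, so the last column should come back as zero, while ts_rank still produces a value.

```sql
-- Rank one document four ways: each function, with and without positions.
-- strip(doc) removes the positional information from the tsvector.
SELECT ts_rank(doc, q)           AS rank_with_positions,
       ts_rank(strip(doc), q)    AS rank_stripped,
       ts_rank_cd(doc, q)        AS rank_cd_with_positions,
       ts_rank_cd(strip(doc), q) AS rank_cd_stripped
FROM (SELECT to_tsvector('english',
                         'Dark matter may interact with the neutrino') AS doc,
             to_tsquery('english', 'neutrino | (dark & matter)') AS q) AS s;
```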
I wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> and then gives a query which calculates normalization in an incorrect
>> manner,

> On what basis do you claim that's an incorrect manner?  It's exactly
> what is described in the paragraph just before the examples.

... although on reflection, it seems pretty stupid to be recommending
a method that requires two evaluations at each row of an admittedly
expensive function.  Seems like we should add one more normalization
flag bit:

	32: replace computed rank by rank / (rank + 1)

and then the second example would be

SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

with no change in the example output.

			regards, tom lane
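The two forms discussed above can be placed side by side on an inline document. This is a sketch only, assuming a server on which the proposed flag 32 is implemented. The sample sentence and the column aliases are illustrative and do not come from the apod data. The two columns should agree up to floating-point rounding, since the manual expression is evaluated in a wider type than the function's return value.

```sql
-- manual_norm calls ts_rank_cd twice per row (the cost objected to above);
-- flag_32_norm asks the function to apply rank / (rank + 1) itself.
SELECT ts_rank_cd(doc, q) / (ts_rank_cd(doc, q) + 1) AS manual_norm,
       ts_rank_cd(doc, q, 32)                        AS flag_32_norm
FROM (SELECT to_tsvector('english',
                         'Dark matter may interact with the neutrino') AS doc,
             to_tsquery('english', 'neutrino | (dark & matter)') AS q) AS s;
```

Because rank / (rank + 1) increases monotonically for non-negative ranks and maps them into the range 0 to 1, it rescales the values without changing the row order produced by ORDER BY rank DESC.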