Thread: [pgsql-www] Searching for pgweb

[pgsql-www] Searching for pgweb

From
Magnus Hagander
Date:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

//Magnus



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [pgsql-www] Searching for pgweb

From
Dave Page
Date:
On Wed, Mar 22, 2017 at 4:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Right now our main website search uses plainto_tsquery() to generate the
> searches.
>
> Should we consider switching that to phraseto_tsquery() now that we have
> phrase searching?

+1

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [pgsql-www] Searching for pgweb

From
"Joshua D. Drake"
Date:
On 03/22/2017 09:55 AM, Dave Page wrote:
> On Wed, Mar 22, 2017 at 4:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Right now our main website search uses plainto_tsquery() to generate the
>> searches.
>>
>> Should we consider switching that to phraseto_tsquery() now that we have
>> phrase searching?
>
> +1
>

+1

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.



Re: [pgsql-www] Searching for pgweb

From
Christophe Pettus
Date:
> On Mar 22, 2017, at 09:51, Magnus Hagander <magnus@hagander.net> wrote:
> Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1.

--
-- Christophe Pettus  xof@thebuild.com




Re: [pgsql-www] Searching for pgweb

From
Peter van Hardenberg
Date:
Ok, I'll ask the silly questions: Will it actually give better results? Does it need user documentation added beyond just changing the query?

On Wed, Mar 22, 2017 at 11:20 AM, Christophe Pettus <xof@thebuild.com> wrote:

> On Mar 22, 2017, at 09:51, Magnus Hagander <magnus@hagander.net> wrote:
> Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1.

--
-- Christophe Pettus
   xof@thebuild.com



--
Sent via pgsql-www mailing list (pgsql-www@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-www



--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: [pgsql-www] Searching for pgweb

From
Magnus Hagander
Date:
I think so. For example:

search=# SELECT count(*) FROM webpages WHERE suburl LIKE '/docs/current/%' AND fti @@ plainto_tsquery('create index');
 count 
-------
   342
(1 row)

search=# SELECT count(*) FROM webpages WHERE suburl LIKE '/docs/current/%' AND fti @@ phraseto_tsquery('create index');
 count 
-------
   106
(1 row)


But yes, that is kind of the core of the question.

I don't think it needs any special user docs -- it's not like we have documented how it works now, it's supposed to "just work".

//Magnus



On Thu, Mar 23, 2017 at 9:11 AM, Peter van Hardenberg <pvh@pvh.ca> wrote:
Ok, I'll ask the silly questions: Will it actually give better results? Does it need user documentation added beyond just changing the query?

On Wed, Mar 22, 2017 at 11:20 AM, Christophe Pettus <xof@thebuild.com> wrote:

> On Mar 22, 2017, at 09:51, Magnus Hagander <magnus@hagander.net> wrote:
> Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1.

--
-- Christophe Pettus
   xof@thebuild.com


Re: [pgsql-www] Searching for pgweb

From
Oleg Bartunov
Date:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


Oleg



//Magnus



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [pgsql-www] Searching for pgweb

From
Oleg Bartunov
Date:


On Fri, Mar 24, 2017 at 10:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


 

Oleg



//Magnus



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Searching for pgweb

From
Magnus Hagander
Date:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

2. Would it still make sense to do phrase searching? Sounds like it would? 

--

Re: Searching for pgweb

From
Oleg Bartunov
Date:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there



2. Would it still make sense to do phrase searching? Sounds like it would? 

Yes, it would.

Re: Searching for pgweb

From
Magnus Hagander
Date:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there


Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)


 
2. Would it still make sense to do phrase searching? Sounds like it would? 

Yes, it would.


Cool, thanks.

--

Re: Searching for pgweb

From
Oleg Bartunov
Date:


On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there


Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)

I'd love if you test it, give us feedback what to improve, what to fix.  Then we could try to convince community to accept it.
 


 
2. Would it still make sense to do phrase searching? Sounds like it would? 

Yes, it would.


Cool, thanks.

--

Re: Searching for pgweb

From
Magnus Hagander
Date:


On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there


Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)

I'd love if you test it, give us feedback what to improve, what to fix.  Then we could try to convince community to accept it.


I've applied this one for testing on the main website search.

At the same time I realized we didn't setweight() on the title on regular webpages, so I fixed that too (setting title to weight A).

Basically the conf is:

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = pg_catalog.ispell,
    dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = pg_catalog.synonym,
    synonyms = 'pg_dict' );
CREATE TEXT SEARCH CONFIGURATION pg (
    PARSER = tsparser );
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
      WITH pg_dict, english_ispell, english_stem;
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR email, file, float, host, hword_numpart, int, numhword, numword, sfloat, uint, url, url_path, version WITH simple;

If you have any other suggestions of things we should change there, please let me know!

So far, this is on the main website search and *not* on the archives search. Let's try it there first, but in the long run we should use similar configurations.
 
--

Re: Searching for pgweb

From
Oleg Bartunov
Date:


On Sun, Apr 2, 2017 at 9:37 AM, Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there


Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)

I'd love if you test it, give us feedback what to improve, what to fix.  Then we could try to convince community to accept it.


I've applied this one for testing on the main website search.

At the same time I realized we didn't setweight() on the title on regular webpages, so I fixed that too (setting title to weight A).

Basically the conf is:

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = pg_catalog.ispell,
    dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = pg_catalog.synonym,
    synonyms = 'pg_dict' );
CREATE TEXT SEARCH CONFIGURATION pg (
    PARSER = tsparser );
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
      WITH pg_dict, english_ispell, english_stem;
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR email, file, float, host, hword_numpart, int, numhword, numword, sfloat, uint, url, url_path, version WITH simple;

If you have any other suggestions of things we should change there, please let me know!


Depending on the load I'd use also shared ispell https://github.com/postgrespro/shared_ispell, which will save memory a lot.

So far, this is on the main website search and *not* on the archives search. Let's try it there first, but in the long run we should use similar configurations.
 
--

Re: Searching for pgweb

From
Magnus Hagander
Date:


On Mon, Apr 3, 2017 at 3:37 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Sun, Apr 2, 2017 at 9:37 AM, Magnus Hagander <magnus@hagander.net> wrote:


On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:


On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:


On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:


On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.

Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1

Also, I suggest to use new parser, which better works _ and -, for example:

1.  
select ts_parse('tsparser', 'btree_gin');
    ts_parse
----------------
 (16,btree_gin)
 (11,btree)
 (12,_)
 (11,gin)
(4 rows)

select ts_parse('default', 'btree_gin');
 ts_parse
-----------
 (1,btree)
 (12,_)
 (1,gin)
(3 rows)

 
2. 
select ts_parse('tsparser', 'utc-5');
  ts_parse
------------
 (15,utc-5)
 (11,utc)
 (12,-)
 (9,5)
(4 rows)

select ts_parse('default', 'utc-5');
 ts_parse
----------
 (1,utc)
 (21,-5)
(2 rows)

again, compare
We have also better parsing of email, but I'm not sure we need it on postgres site.

We'll publish soon on github, let me know if you know it.


That sounds interesting. Two questions:

1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there


Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)

I'd love if you test it, give us feedback what to improve, what to fix.  Then we could try to convince community to accept it.


I've applied this one for testing on the main website search.

At the same time I realized we didn't setweight() on the title on regular webpages, so I fixed that too (setting title to weight A).

Basically the conf is:

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = pg_catalog.ispell,
    dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = pg_catalog.synonym,
    synonyms = 'pg_dict' );
CREATE TEXT SEARCH CONFIGURATION pg (
    PARSER = tsparser );
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
      WITH pg_dict, english_ispell, english_stem;
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR email, file, float, host, hword_numpart, int, numhword, numword, sfloat, uint, url, url_path, version WITH simple;

If you have any other suggestions of things we should change there, please let me know!


Depending on the load I'd use also shared ispell https://github.com/postgrespro/shared_ispell, which will save memory a lot. 

Our load is pretty low, so we don't really have need for that one at this point. But I'll try to remember it :)
 
--