Thread: pgsql: Add json(b)_to_tsvector function

pgsql: Add json(b)_to_tsvector function

From
Teodor Sigaev
Date:
Add json(b)_to_tsvector function

Jsonb has a complex nature so there isn't best-for-everything way to convert it
to tsvector for full text search. Current to_tsvector(json(b)) suggests to
convert only string values, but it's possible to index keys, numerics and even
booleans value. To solve that json(b)_to_tsvector has a second required
argument contained a list of desired types of json fields. Second argument is
a jsonb scalar or array right now with possibility to add new options in a
future.

Bump catalog version

Author: Dmitry Dolgov with some editorization by me
Reviewed by: Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1c1791e00065f6986f9d44a78ce7c28b2d1322dd

Modified Files
--------------
doc/src/sgml/func.sgml              |  20 ++++
src/backend/tsearch/to_tsany.c      | 128 ++++++++++++++++++++++----
src/backend/utils/adt/jsonfuncs.c   | 179 ++++++++++++++++++++++++++++++++----
src/include/catalog/catversion.h    |   2 +-
src/include/catalog/pg_proc.h       |  24 +++--
src/include/utils/jsonapi.h         |  21 ++++-
src/test/regress/expected/json.out  | 122 ++++++++++++++++++++++++
src/test/regress/expected/jsonb.out | 122 ++++++++++++++++++++++++
src/test/regress/sql/json.sql       |  30 ++++++
src/test/regress/sql/jsonb.sql      |  30 ++++++
10 files changed, 629 insertions(+), 49 deletions(-)


Re: pgsql: Add json(b)_to_tsvector function

From
Teodor Sigaev
Date:
Forget to add to test 'english' FTS configuration, will fix
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dunlin&dt=2018-04-07%2018%3A05%3A17

Teodor Sigaev wrote:
> Add json(b)_to_tsvector function
> 
> Jsonb has a complex nature so there isn't best-for-everything way to convert it
> to tsvector for full text search. Current to_tsvector(json(b)) suggests to
> convert only string values, but it's possible to index keys, numerics and even
> booleans value. To solve that json(b)_to_tsvector has a second required
> argument contained a list of desired types of json fields. Second argument is
> a jsonb scalar or array right now with possibility to add new options in a
> future.
> 
> Bump catalog version
> 
> Author: Dmitry Dolgov with some editorization by me
> Reviewed by: Teodor Sigaev
> Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
> 
> Branch
> ------
> master
> 
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/1c1791e00065f6986f9d44a78ce7c28b2d1322dd
> 
> Modified Files
> --------------
> doc/src/sgml/func.sgml              |  20 ++++
> src/backend/tsearch/to_tsany.c      | 128 ++++++++++++++++++++++----
> src/backend/utils/adt/jsonfuncs.c   | 179 ++++++++++++++++++++++++++++++++----
> src/include/catalog/catversion.h    |   2 +-
> src/include/catalog/pg_proc.h       |  24 +++--
> src/include/utils/jsonapi.h         |  21 ++++-
> src/test/regress/expected/json.out  | 122 ++++++++++++++++++++++++
> src/test/regress/expected/jsonb.out | 122 ++++++++++++++++++++++++
> src/test/regress/sql/json.sql       |  30 ++++++
> src/test/regress/sql/jsonb.sql      |  30 ++++++
> 10 files changed, 629 insertions(+), 49 deletions(-)
> 

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/


Re: pgsql: Add json(b)_to_tsvector function

From
Teodor Sigaev
Date:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07 
18%3A32%3A02

test select_parallel              ... FAILED

I don't understand how it's connected to json_to_tsquery. Can somebody 
point me what I'm missing?

Teodor Sigaev wrote:
> Add json(b)_to_tsvector function
> 
> Jsonb has a complex nature so there isn't best-for-everything way to convert it
> to tsvector for full text search. Current to_tsvector(json(b)) suggests to
> convert only string values, but it's possible to index keys, numerics and even
> booleans value. To solve that json(b)_to_tsvector has a second required
> argument contained a list of desired types of json fields. Second argument is
> a jsonb scalar or array right now with possibility to add new options in a
> future.
> 
> Bump catalog version
> 
> Author: Dmitry Dolgov with some editorization by me
> Reviewed by: Teodor Sigaev
> Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
> 
> Branch
> ------
> master
> 
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/1c1791e00065f6986f9d44a78ce7c28b2d1322dd
> 
> Modified Files
> --------------
> doc/src/sgml/func.sgml              |  20 ++++
> src/backend/tsearch/to_tsany.c      | 128 ++++++++++++++++++++++----
> src/backend/utils/adt/jsonfuncs.c   | 179 ++++++++++++++++++++++++++++++++----
> src/include/catalog/catversion.h    |   2 +-
> src/include/catalog/pg_proc.h       |  24 +++--
> src/include/utils/jsonapi.h         |  21 ++++-
> src/test/regress/expected/json.out  | 122 ++++++++++++++++++++++++
> src/test/regress/expected/jsonb.out | 122 ++++++++++++++++++++++++
> src/test/regress/sql/json.sql       |  30 ++++++
> src/test/regress/sql/jsonb.sql      |  30 ++++++
> 10 files changed, 629 insertions(+), 49 deletions(-)
> 

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/


Re: pgsql: Add json(b)_to_tsvector function

From
Andres Freund
Date:
On 2018-04-07 21:53:01 +0300, Teodor Sigaev wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07
> 18%3A32%3A02
> 
> test select_parallel              ... FAILED
> 
> I don't understand how it's connected to json_to_tsquery. Can somebody point
> me what I'm missing?

Yea, I was confused as well.

Given that it "only" failed during upgrade check, not the earlier
parallel check, it is possible that it's entirely unrelated and just a
low likelihood event?


This however does clearly seem related:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar&dt=2018-04-07%2018%3A15%3A15

Probably caused by that animal using a non-standard collation?

*** /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/expected/json.out    Sat Apr  7 20:15:40 2018
--- /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/results/json.out    Sat Apr  7 20:29:30 2018
***************
*** 2333,2353 ****
  
  -- json_to_tsvector
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json, '"all"');
!                                     json_to_tsvector                                    
! ----------------------------------------------------------------------------------------
!  '123':8 '456':12 'aaa':2 'b':6 'bbb':4 'c':10 'd':14 'f':18 'fals':20 'g':22 'true':16
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json, '"key"');
!         json_to_tsvector        
! --------------------------------
!  'b':2 'c':4 'd':6 'f':8 'g':10
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json,
'"string"');
!  json_to_tsvector 
! ------------------
!  'aaa':1 'bbb':3
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json,
'"numeric"');
--- 2333,2353 ----
  
  -- json_to_tsvector
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json, '"all"');
!                                            json_to_tsvector                                           
! ------------------------------------------------------------------------------------------------------
!  '123':9 '456':13 'a':1 'aaa':3 'b':7 'bbb':5 'c':11 'd':15 'f':19 'false':21 'g':23 'in':4 'true':17
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json, '"key"');
!            json_to_tsvector           
! --------------------------------------
!  'a':1 'b':3 'c':5 'd':7 'f':9 'g':11
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json,
'"string"');
!     json_to_tsvector    
! ------------------------
!  'aaa':1 'bbb':3 'in':2
  (1 row)
  
  select json_to_tsvector('{"a": "aaa in bbb", "b": 123, "c": 456, "d": true, "f": false, "g": null}'::json,
'"numeric"');

- Andres


Re: pgsql: Add json(b)_to_tsvector function

From
Teodor Sigaev
Date:
> Given that it "only" failed during upgrade check, not the earlier
> parallel check, it is possible that it's entirely unrelated and just a
> low likelihood event?
May be, may be. So, we have bug with low probability.

> This however does clearly seem related:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar&dt=2018-04-07%2018%3A15%3A15
> 
> Probably caused by that animal using a non-standard collation?It should be already fixed in
https://git.postgresql.org/pg/commitdiff/01bb85169afadfe63e2f0e344ff671292080de7e
-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/


Re: pgsql: Add json(b)_to_tsvector function

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-04-07 21:53:01 +0300, Teodor Sigaev wrote:
>> test select_parallel              ... FAILED
>>
>> I don't understand how it's connected to json_to_tsquery. Can somebody point
>> me what I'm missing?

> Given that it "only" failed during upgrade check, not the earlier
> parallel check, it is possible that it's entirely unrelated and just a
> low likelihood event?

Yes.  We've seen that exact symptom before:

https://www.postgresql.org/message-id/7752.1515284083@sss.pgh.pa.us

I have a suspicion that it's related to the significantly-more-common
plan instability we've been seeing in postgres_fdw.

            regards, tom lane