Re: integrated tsearch doesn't work with non utf8 database - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: integrated tsearch doesn't work with non utf8 database
Msg-id 46E53499.1010904@sigaev.ru
In response to Re: integrated tsearch doesn't work with non utf8 database  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: integrated tsearch doesn't work with non utf8 database
List pgsql-hackers
> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
> $1). That means that it will call ts_lexize on every dictionary, which
> will try to load every dictionary. And loading danish_stem dictionary
> fails in latin2 encoding, because of the problem with the stopword file.
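
The failure mode described in the quote can be modelled in a few lines of Python. This is a toy sketch, not PostgreSQL code: MapRow, make_lexize, old_lookup, DictionaryLoadError and the sample rows are hypothetical. It assumes, as the quoted plan suggests, that the pushed-down filter calls ts_lexize for every row the scan visits, so a dictionary that cannot be loaded breaks the lookup even when it belongs to another configuration.

```python
# Toy model of the pre-patch lookup (hypothetical data and names, not
# PostgreSQL code): the lexize-based filter runs on every scanned row.
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class MapRow:
    mapcfg: int        # text search configuration OID
    maptokentype: int  # token type id
    mapseqno: int      # position of the dictionary in the mapping
    mapdict: str       # dictionary name, standing in for its OID


class DictionaryLoadError(Exception):
    """Stands in for the error raised when a dictionary fails to load."""


def make_lexize(broken: Set[str], calls: List[str]):
    """Fake ts_lexize: records each call, fails for dictionaries in
    `broken`, and only the 'simple' dictionary recognises tokens."""
    def lexize(mapdict: str, token: str) -> Optional[List[str]]:
        calls.append(mapdict)
        if mapdict in broken:
            raise DictionaryLoadError(f"cannot load {mapdict}")
        if mapdict == "simple":
            return [token.lower()]
        return None
    return lexize


def old_lookup(rows, cfg, tokid, token, lexize):
    """Assumed pre-patch behaviour: lexize runs before the cheap
    mapcfg/maptokentype tests can reject a row."""
    matches = []
    for r in rows:
        lex = lexize(r.mapdict, token)
        if r.mapcfg == cfg and r.maptokentype == tokid and lex is not None:
            matches.append((r.mapseqno, r.mapdict, lex))
    # First non-NULL result in mapseqno order, as in ORDER BY ... LIMIT 1.
    return min(matches, key=lambda m: m[0]) if matches else None


ROWS = [
    MapRow(11308, 1, 1, "simple"),
    MapRow(11309, 1, 1, "danish_stem"),  # mapped only in another config
]

calls: List[str] = []
try:
    old_lookup(ROWS, 11308, 1, "Hello", make_lexize({"danish_stem"}, calls))
except DictionaryLoadError as err:
    print("lookup failed:", err)     # lookup failed: cannot load danish_stem
print("lexize called for:", calls)  # lexize called for: ['simple', 'danish_stem']
```

The lookup fails even though danish_stem is not mapped for configuration 11308, because the expensive call happens before the row is discarded.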

The attached patch should fix it, I hope.

New plan:
  Hash Join  (cost=2.80..1073.85 rows=80 width=100)
    Hash Cond: (parse.tokid = tt.tokid)
    InitPlan
      ->  Seq Scan on pg_ts_config  (cost=0.00..1.20 rows=1 width=4)
            Filter: (oid = 11308::oid)
      ->  Seq Scan on pg_ts_config  (cost=0.00..1.20 rows=1 width=4)
            Filter: (oid = 11308::oid)
    ->  Function Scan on ts_parse parse  (cost=0.00..12.50 rows=1000 width=36)
    ->  Hash  (cost=0.20..0.20 rows=16 width=68)
          ->  Function Scan on ts_token_type tt  (cost=0.00..0.20 rows=16 width=68)
    SubPlan
      ->  Limit  (cost=6.57..6.60 rows=1 width=36)
            ->  Subquery Scan dl  (cost=6.57..6.60 rows=1 width=36)
                  ->  Sort  (cost=6.57..6.58 rows=1 width=8)
                        Sort Key: ((ts_lexize(m.mapdict, $1) IS NULL)), m.mapseqno
                        ->  Seq Scan on pg_ts_config_map m  (cost=0.00..6.56 rows=1 width=8)
                              Filter: ((mapcfg = 11308::oid) AND (maptokentype = $0))
      ->  Sort  (cost=6.57..6.57 rows=1 width=8)
            Sort Key: m.mapseqno
            ->  Seq Scan on pg_ts_config_map m  (cost=0.00..6.56 rows=1 width=8)
                  Filter: ((mapcfg = 11308::oid) AND (maptokentype = $0))


At least now it checks only the dictionaries that are actually needed.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
*** ./src/backend/catalog/system_views.sql.orig    Mon Sep 10 15:51:27 2007
--- ./src/backend/catalog/system_views.sql    Mon Sep 10 16:09:52 2007
***************
*** 415,422 ****
              ( SELECT mapdict, pg_catalog.ts_lexize(mapdict, parse.token) AS lex
                FROM pg_catalog.pg_ts_config_map AS m
                WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid
!               ORDER BY m.mapseqno ) dl
!         WHERE dl.lex IS NOT NULL
          LIMIT 1
      ) AS "Lexized token"
  FROM pg_catalog.ts_parse(
--- 415,421 ----
              ( SELECT mapdict, pg_catalog.ts_lexize(mapdict, parse.token) AS lex
                FROM pg_catalog.pg_ts_config_map AS m
                WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid
!               ORDER BY pg_catalog.ts_lexize(mapdict, parse.token) IS NULL, m.mapseqno ) dl
          LIMIT 1
      ) AS "Lexized token"
  FROM pg_catalog.ts_parse(
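
The effect of the rewritten ORDER BY can be modelled the same way. Again this is a hypothetical Python sketch, not the actual executor: MapRow, make_lexize, new_lookup and the sample rows are invented for illustration, and it assumes the mapcfg/maptokentype conditions are applied before ts_lexize runs, as the Filter line of the new plan shows.

```python
# Toy model of the patched lookup (hypothetical data and names, not
# PostgreSQL code): cheap filters first, then
# ORDER BY (lex IS NULL), mapseqno LIMIT 1.
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class MapRow:
    mapcfg: int
    maptokentype: int
    mapseqno: int
    mapdict: str


def make_lexize(results: Dict[str, List[str]], broken: Set[str], calls: List[str]):
    """Fake ts_lexize: records each call, fails for dictionaries in
    `broken`, otherwise returns results.get(mapdict) (None = not recognised)."""
    def lexize(mapdict: str, token: str) -> Optional[List[str]]:
        calls.append(mapdict)
        if mapdict in broken:
            raise RuntimeError(f"cannot load {mapdict}")
        return results.get(mapdict)
    return lexize


def new_lookup(rows, cfg, tokid, token, lexize) -> Optional[Tuple[int, str, Optional[List[str]]]]:
    # WHERE m.mapcfg = $1 AND m.maptokentype = parse.tokid:
    # only dictionaries mapped for this config and token type are lexized.
    mapped = [r for r in rows if r.mapcfg == cfg and r.maptokentype == tokid]
    lexized = [(r.mapseqno, r.mapdict, lexize(r.mapdict, token)) for r in mapped]
    # ORDER BY (lex IS NULL), mapseqno: False sorts before True, so
    # non-NULL results come first, each group in mapseqno order.
    lexized.sort(key=lambda t: (t[2] is None, t[0]))
    return lexized[0] if lexized else None  # LIMIT 1


ROWS = [
    MapRow(11308, 1, 1, "english_stem"),
    MapRow(11308, 1, 2, "simple"),
    MapRow(11309, 1, 1, "danish_stem"),  # mapped only in another config
]

calls: List[str] = []
lx = make_lexize({"simple": ["hello"]}, {"danish_stem"}, calls)
print(new_lookup(ROWS, 11308, 1, "Hello", lx))  # (2, 'simple', ['hello'])
print(calls)                                    # ['english_stem', 'simple']
```

As in the patched SQL, every mapped dictionary is still lexized (the sort needs all the keys), but the unmapped danish_stem is never touched. If none of the mapped dictionaries recognises the token, the sketch returns the first-mapped entry with lex None rather than nothing, which is what ORDER BY ... IS NULL with LIMIT 1 yields where the old WHERE dl.lex IS NOT NULL produced no row.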
