Re: Problems with FTS - Mailing list pgsql-performance

From Rauan Maemirov
Subject Re: Problems with FTS
Date
Msg-id AANLkTim4vLhMB5Ty6hTN4XdNoJP0BWYxguqEXt0+r920@mail.gmail.com
Whole thread Raw
In response to Re: Problems with FTS  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Problems with FTS  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Hi, Kevin.

Sorry for long delay.

EXPLAIN ANALYZE SELECT "v"."id", "v"."title" FROM "video" AS "v"
WHERE (v.active) AND (v.fts @@ 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery and v.id <> 500563 ) 
ORDER BY COALESCE(ts_rank_cd( '{0.1, 0.2, 0.7, 1.0}', v.fts, 'dexter:A|season:A|seri:A|декстер:A|качество:A|сезон:A|серия:A'::tsquery), 1) DESC, v.views DESC 
LIMIT 6

"Limit  (cost=103975.50..103975.52 rows=6 width=280) (actual time=2893.193..2893.199 rows=6 loops=1)"
"  ->  Sort  (cost=103975.50..104206.07 rows=92228 width=280) (actual time=2893.189..2893.193 rows=6 loops=1)"
"        Sort Key: (COALESCE(ts_rank_cd('{0.1,0.2,0.7,1}'::real[], fts, '( ( ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery), 1::real)), views"
"        Sort Method:  top-N heapsort  Memory: 25kB"
"        ->  Seq Scan on video v  (cost=0.00..102322.34 rows=92228 width=280) (actual time=0.100..2846.639 rows=54509 loops=1)"
"              Filter: (active AND (fts @@ '( ( ( ( ( ''dexter'':A | ''season'':A ) | ''seri'':A ) | ''декстер'':A ) | ''качество'':A ) | ''сезон'':A ) | ''серия'':A'::tsquery) AND (id <> 500563))"
"Total runtime: 2893.264 ms"

Table scheme:

CREATE TABLE video
(
  id bigserial NOT NULL,
  hash character varying(12),
  account_id bigint NOT NULL,
  category_id smallint NOT NULL,
  converted boolean NOT NULL DEFAULT false,
  active boolean NOT NULL DEFAULT true,
  title character varying(255),
  description text,
  tags character varying(1000),
  authorized boolean NOT NULL DEFAULT false,
  adult boolean NOT NULL DEFAULT false,
  views bigint DEFAULT 0,
  rating real NOT NULL DEFAULT 0,
  screen smallint DEFAULT 2,
  duration smallint,
  "type" smallint DEFAULT 0,
  mp4 smallint NOT NULL DEFAULT 0,
  size bigint,
  size_high bigint DEFAULT 0,
  source character varying(255),
  storage_id smallint NOT NULL DEFAULT 1,
  rule_watching smallint,
  rule_commenting smallint,
  count_comments integer NOT NULL DEFAULT 0,
  count_likes integer NOT NULL DEFAULT 0,
  count_faves integer NOT NULL DEFAULT 0,
  fts tsvector,
  modified timestamp without time zone NOT NULL DEFAULT now(),
  created timestamp without time zone DEFAULT now(),
  CONSTRAINT video_pkey PRIMARY KEY (id),
  CONSTRAINT video_hash_key UNIQUE (hash)
)
WITH (
  OIDS=FALSE
);

Indexes:

CREATE INDEX idx_video_account_id  ON video  USING btree  (account_id);
CREATE INDEX idx_video_created  ON video  USING btree  (created);
CREATE INDEX idx_video_fts  ON video  USING gin  (fts);
CREATE INDEX idx_video_hash  ON video  USING hash  (hash);

(here I tried both gist and gin indexes)

I have 32Gb ram and 2 core quad E5520, 2.27GHz (8Mb cache).

Pgsql conf:
max_connections = 200
shared_buffers = 7680MB
work_mem = 128MB
maintenance_work_mem = 1GB
effective_cache_size = 22GB
default_statistics_target = 100

Anything else?

2010/12/18 Kevin Grittner <Kevin.Grittner@wicourts.gov>
Rauan Maemirov  wrote:

> EXPLAIN SELECT [...]

Please show us the results of EXPLAIN ANALYZE SELECT ...

Also, please show us the table layout (including indexes), and
details about your hardware and PostgreSQL configuration.  See this
page for details:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> As you can see the query doesn't use index.

That means that either the optimizer thinks that the index isn't
usable for this query (due to type mismatch or some such) or that it
thinks a plan without the index costs less to run (i.e., it will
generally run faster).  You haven't told us enough to know whether
that is actually true, much less how to allow PostgreSQL to develop
more accurate costing estimates in your environment if it's currently
wrong about this.

-Kevin

pgsql-performance by date:

Previous
From: Scott Carey
Date:
Subject: Re: "SELECT .. WHERE NOT IN" query running for hours
Next
From: Euler Taveira de Oliveira
Date:
Subject: Re: pgbench to the MAXINT