Periodically slow inserts - Mailing list pgsql-performance

Hello,

We are  using PostgreSQL for  storing data and full-text  search indexes
for the webiste of a daily newspaper. We are very happy overall with the
results, but we have one "weird" behaviour that we would like to solve.

The problem is  when we index objects into the  full-text search part of
the database (which a DELETE and  then an INSERT into a specific table),
the INSERT  sometimes take a long time  (from 10s to 20s),  but the same
insert (and many other similar ones) are fast (below 0.2s).

This  slowness  comes  regularly,   about  every  200  objects  indexed,
regardless  of the frequency  of the  inserts. If  I reindex  one object
every 5 seconds for one hour, or one object every second for 10 minutes,
I've the same  kind of results : around 0.5% of  the time, indexing took
more than 10s.

The  positive point  is that  this slowness  doesn't block  the  rest of
queries to the database, but it's still painful to have to wait (even if
only once in a  while) for 10s or 20s when the  end-user hits the "save"
button.

This  slowness is associated  with very  high IO  load on  the operating
system.  I  tried playing with  checkpoint parameters (making  them more
frequent  or  less  frequent,  but  I  didn't  notice  any  siginificant
difference).

Do you have any hint on how to smooth the process, so we don't have this
regular huge slowdown ?



If you want more details about the setup :

- server is a Xen virtual machine  with 8Gb of memory, disks being 15000
  rpm SAS disks on RAID 1, and CPU being one core of a Nehalem processor
  (but CPU load is low anyway).

- the database schema is like :

CREATE TABLE sesql_index (
  classname varchar(255),
  id integer,
  created_at timestamp,
  modified_at timestamp,
  created_by integer,
  modified_by integer,
  workflow_state integer,
  site_id integer,
  title_text text,
  title_tsv tsvector,
  subtitle_text text,
  subtitle_tsv tsvector,
  fulltext_text text,
  fulltext_tsv tsvector,
  authors integer[],
  folders integer[],
  [...]
  indexed_at timestamp DEFAULT NOW(),
  PRIMARY KEY (classname, id)
);

CREATE TABLE sesql_author (CHECK (classname = 'Author'),
  PRIMARY KEY (classname, id)) INHERITS (sesql_index);

CREATE TABLE sesql_program (CHECK (classname = 'Program'),
  PRIMARY KEY (classname, id)) INHERITS (sesql_index);

CREATE TABLE sesql_default (CHECK (classname = 'Slideshow' OR classname
= 'Book' OR classname = 'Article' OR classname = 'Publication' OR
classname = 'Forum'), PRIMARY KEY (classname, id)) INHERITS (sesql_index);

(with a few other similar tables for different objects).

Inserts/deletes are  done directly into  the child tables,  searches are
done either  on the  master table (sesql_index)  or on the  child tables
depending of the use case (but search works fine anyway).

In  addition to  that we  have several  indexes, created  on  each child
tables :

CREATE INDEX sesql_default_classname_index ON sesql_default (classname);
CREATE INDEX sesql_default_id_index ON sesql_default (id);
CREATE INDEX sesql_default_created_at_index ON sesql_default (created_at);
CREATE INDEX sesql_default_modified_at_index ON sesql_default (modified_at);
CREATE INDEX sesql_default_created_by_index ON sesql_default (created_by);
CREATE INDEX sesql_default_modified_by_index ON sesql_default (modified_by);
CREATE INDEX sesql_default_workflow_state_index ON sesql_default (workflow_state);
CREATE INDEX sesql_default_site_id_index ON sesql_default (site_id);
CREATE INDEX sesql_default_publication_date_index ON sesql_default (publication_date);
CREATE INDEX sesql_default_authors_index ON sesql_default USING GIN (authors);
CREATE INDEX sesql_default_folders_index ON sesql_default USING GIN (folders);

And the  heavy ones, for each  fulltext field, we have  two columns, the
text  and the  tsv, with  an index  on the  tsv, and  the tsv  itself is
updated via a trigger :

CREATE INDEX sesql_default_fulltext_index ON sesql_default USING GIN (fulltext_tsv);

CREATE TRIGGER sesql_default_fulltext_update BEFORE INSERT OR UPDATE
ON sesql_default FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(fulltext_tsv, 'public.lem_french', fulltext_text);

Thanks a lot for reading me until here ;)

Regards,

--
Gaël Le Mignot - gael@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Index scan is not working, why??
Next
From: Kenneth Marshall
Date:
Subject: Re: Periodically slow inserts