Re: Forums & tsearch2 - best practices reg. concurrency - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Forums & tsearch2 - best practices reg. concurrency
Date
Msg-id Pine.GSO.4.63.0506221451470.19683@ra.sai.msu.su
Whole thread Raw
In response to Forums & tsearch2 - best practices reg. concurrency  ("Markus Wollny" <Markus.Wollny@computec.de>)
List pgsql-performance
Markus,

wait for 8.1 which should solve all of these issues. We're working
on GiST concurrency & recovery right now. See http://www.pgsql.ru/db/mw/msg.html?mid=2073083
for details.

Oleg
On Wed, 22 Jun 2005, Markus Wollny wrote:

> Hello!
>
> We're using PostgreSQL 8.0.1 as general backend for all of our websites,
> including our online forums (aka bulletin boards or whatever you wish to
> call that). As for full text search capabilities, we've chosen to
> implement this via tsearch2. However, the tables themselves are quite
> large, and as there's lots of weird user input in them (just no way of
> limiting our users to "proper" orthography), so are the indices; we have
> already split up the main posting-table in two, one containing the more
> recent messages (<6 months) and one for everything else.
>
> Search capabilities have been limited to accessing only one of those,
> either recent or archive. Still, the tsearch2-GiST-index for a table is
> around 325MB in size; the "recent messages" table itself without any
> indices weighs in at about 1.8GB containing over one million rows, the
> archive-table is a little over 3GB and contains about 1.3 million rows.
> A full text search in the table with the recent postings can take up to
> five minutes.
>
> This wouldn't be much of a problem, as we're providing other, quicker
> search options (like searching for an author or a full text search just
> on the topics); the problem with the full text search lies in the
> locking mechanisms: As long as there's a search going on, all the
> subsequent INSERTs or UPDATEs on that table fail due to timeout. This
> means that currently, whenever we allow full text searching, there may
> be a timeframe of more than one hour, during which users cannot write
> any new postings in our forum or edit (i.e. update) anything. This is
> hardly acceptable...
>
> This is what I did to actually diagnose that simple tsearch2-related
> SELECTs where causing the write-locks:
>
> First I started a full text search query which I knew would run over
> four minutes. Then I waited for other users to try and post some
> messages; soon enough a 'ps ax|grep wait' showed several "INSERT/UPDATE
> waiting"-backends. So I took a look at the locks:
>
> select s.current_query as statement,
>        l.mode as lock_mode,
>        l.granted as lock_granted,
>        c.relname as locked_relation,
>        c.relnamespace as locked_relnamespace,
>        c.reltype as locked_reltype
> from pg_stat_activity s,
>        pg_locks l,
>        pg_class c
> where
>        l.pid = s.procpid
> and
>        l.relation = c.oid
> order by age(s.query_start) desc;
>
> I found four locks for the search query at the very beginning of the
> resultset - all of them of the AccessShareLock persuasion and granted
> alright: one on the message-table, one on the thread-table, one on the
> tsearch2-index and another one on the primary key index of the
> thread-table.
>
> The hanging inserts/updates were waiting for an AccessExclusiveLock on
> the tsearch2-index - all the other locks of these queries were marked as
> granted.
>
> As far as I understand from some of the previous messages on the mailing
> list regarding concurrency issues with GiST-type indices, any SELECT
> that's using a tsearch2-index would completely lock write-access to that
> index for the runtime of the query - is that correct so far?
>
> Now I'd like to find out about possible solutions or workarounds for
> this issue. Surely some of you must have encountered quite similar
> situations, so what did you do about it? I already pondered the idea of
> a separate insert/update-queue-table which would then be processed by a
> cron-job, thus separating the information-entry from the actual insert
> into the table that's blocked due to the lock on the index. Another
> possibility (which I find a little bit more compelling) would involve
> replicating the message-table via Slony-I to another database which
> could then be used as only target for any search-queries which require
> use of the GiST-index. Would this provide the needed "asynchronicity" to
> avoid this race condition between the AccessShareLock from the
> search-SELECT and the AccessExclusiveLock from the write access queries?
>
> I'd be very glad to know your opinions on this matter.
>
> Kind regards
>
>   Markus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-performance by date:

Previous
From: Tobias Brox
Date:
Subject: Re: Querying 19million records very slowly
Next
From: Tom Lane
Date:
Subject: Re: Querying 19million records very slowly