Re: Forums & tsearch2 - best practices reg. concurrency - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: Forums & tsearch2 - best practices reg. concurrency |
Date | |
Msg-id | Pine.GSO.4.63.0506221451470.19683@ra.sai.msu.su Whole thread Raw |
In response to | Forums & tsearch2 - best practices reg. concurrency ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-performance |
Markus, wait for 8.1 which should solve all of these issues. We're working on GiST concurrency & recovery right now. See http://www.pgsql.ru/db/mw/msg.html?mid=2073083 for details. Oleg On Wed, 22 Jun 2005, Markus Wollny wrote: > Hello! > > We're using PostgreSQL 8.0.1 as general backend for all of our websites, > including our online forums (aka bulletin boards or whatever you wish to > call that). As for full text search capabilities, we've chosen to > implement this via tsearch2. However, the tables themselves are quite > large, and as there's lots of weird user input in them (just no way of > limiting our users to "proper" orthography), so are the indices; we have > already split up the main posting-table in two, one containing the more > recent messages (<6 months) and one for everything else. > > Search capabilities have been limited to accessing only one of those, > either recent or archive. Still, the tsearch2-GiST-index for a table is > around 325MB in size; the "recent messages" table itself without any > indices weighs in at about 1.8GB containing over one million rows, the > archive-table is a little over 3GB and contains about 1.3 million rows. > A full text search in the table with the recent postings can take up to > five minutes. > > This wouldn't be much of a problem, as we're providing other, quicker > search options (like searching for an author or a full text search just > on the topics); the problem with the full text search lies in the > locking mechanisms: As long as there's a search going on, all the > subsequent INSERTs or UPDATEs on that table fail due to timeout. This > means that currently, whenever we allow full text searching, there may > be a timeframe of more than one hour, during which users cannot write > any new postings in our forum or edit (i.e. update) anything. This is > hardly acceptable... > > This is what I did to actually diagnose that simple tsearch2-related > SELECTs where causing the write-locks: > > First I started a full text search query which I knew would run over > four minutes. Then I waited for other users to try and post some > messages; soon enough a 'ps ax|grep wait' showed several "INSERT/UPDATE > waiting"-backends. So I took a look at the locks: > > select s.current_query as statement, > l.mode as lock_mode, > l.granted as lock_granted, > c.relname as locked_relation, > c.relnamespace as locked_relnamespace, > c.reltype as locked_reltype > from pg_stat_activity s, > pg_locks l, > pg_class c > where > l.pid = s.procpid > and > l.relation = c.oid > order by age(s.query_start) desc; > > I found four locks for the search query at the very beginning of the > resultset - all of them of the AccessShareLock persuasion and granted > alright: one on the message-table, one on the thread-table, one on the > tsearch2-index and another one on the primary key index of the > thread-table. > > The hanging inserts/updates were waiting for an AccessExclusiveLock on > the tsearch2-index - all the other locks of these queries were marked as > granted. > > As far as I understand from some of the previous messages on the mailing > list regarding concurrency issues with GiST-type indices, any SELECT > that's using a tsearch2-index would completely lock write-access to that > index for the runtime of the query - is that correct so far? > > Now I'd like to find out about possible solutions or workarounds for > this issue. Surely some of you must have encountered quite similar > situations, so what did you do about it? I already pondered the idea of > a separate insert/update-queue-table which would then be processed by a > cron-job, thus separating the information-entry from the actual insert > into the table that's blocked due to the lock on the index. Another > possibility (which I find a little bit more compelling) would involve > replicating the message-table via Slony-I to another database which > could then be used as only target for any search-queries which require > use of the GiST-index. Would this provide the needed "asynchronicity" to > avoid this race condition between the AccessShareLock from the > search-SELECT and the AccessExclusiveLock from the write access queries? > > I'd be very glad to know your opinions on this matter. > > Kind regards > > Markus > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-performance by date: