
From: "Yngve N. Pettersen" <yngve@spec-work.net>
Subject: Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2
Msg-id: op.xchgburr3dfyax@killashandra.invalid.invalid
In response to: Re: Possible multiprocess lock/unlock-loop problem in Postgresql 9.2 ("Yngve N. Pettersen" <yngve@spec-work.net>)
List: pgsql-general

Hi again,

I have now had time to do further research on this issue. I have been
able to produce a script (available on request) that reproduces the
problem, even with tables as small as 100 items and using a single
thread, and as a result I may have located the cause: a conditional
(partial) index.
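
For illustration, the structure of such a setup would be roughly as
follows (a simplified sketch with made-up table and index names, not the
actual script):

     CREATE TABLE queue_test (
         id     bigserial PRIMARY KEY,
         run_id integer NOT NULL,
         state  varchar(1)
     );

     INSERT INTO queue_test (run_id, state)
     SELECT 1, E'I' FROM generate_series(1, 100);

     CREATE INDEX queue_test_run_idle
         ON queue_test (run_id)
         WHERE state::text = 'I'::text;

     -- then repeat the fetch/mark cycle, e.g.:
     -- SELECT id FROM queue_test WHERE state = E'I' AND run_id = 1 LIMIT 10 OFFSET 20;
     -- UPDATE queue_test SET state = E'S' WHERE state = E'I' AND id IN (...) RETURNING id;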

As mentioned in the attachment to my previous email, the table that is
having the problem looks like this:

                            Table "public.probedata2_probequeue"
     Column     |         Type         |                             Modifiers
----------------+----------------------+---------------------------------------------------------------------
 id             | bigint               | not null default nextval('probedata2_probequeue_id_seq'::regclass)
 part_of_run_id | integer              | not null
 server_id      | integer              | not null
 state          | character varying(1) |
Indexes:
    "probedata2_probequeue_pkey" PRIMARY KEY, btree (id)
    "run_server" UNIQUE CONSTRAINT, btree (part_of_run_id, server_id)
    "probedata2_probequeue_finished" btree (id) WHERE state::text = 'F'::text
    "probedata2_probequeue_run_idle" btree (part_of_run_id) WHERE state::text = 'I'::text
    "probedata2_probequeue_started" btree (part_of_run_id) WHERE state::text = 'S'::text
Foreign-key constraints:
    "probedata2_probequeue_part_of_run_id_fkey" FOREIGN KEY (part_of_run_id)
        REFERENCES probedata2_proberun(id) DEFERRABLE INITIALLY DEFERRED
    "probedata2_probequeue_server_id_fkey" FOREIGN KEY (server_id)
        REFERENCES probedata2_server(id) DEFERRABLE INITIALLY DEFERRED


In my test database I have been running tests without the
"probedata2_probequeue_run_idle"-equivalent and the other conditional
indexes.

Without the "run_idle"-index the queries (fetch idle candidates, mark as
started, fetch the records) complete in less than a second (<0.9 seconds),
as expected.
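
For anyone wanting to repeat that comparison, the index can be dropped
and recreated with something like the following (a sketch based on the
index definition shown above, not the exact commands from my test
script):

     DROP INDEX probedata2_probequeue_run_idle;
     -- ... run and time the fetch/mark/fetch cycle ...
     CREATE INDEX probedata2_probequeue_run_idle
         ON probedata2_probequeue (part_of_run_id)
         WHERE state::text = 'I'::text;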

*With* the index, the time to complete that operation increases by a
factor ranging from 10-30 for small sequences and a single thread, to
1000 for large sequences (23 million rows, 4-10 threads), taking up to 20
minutes to complete an update of 6000 rows for each thread, with the
process running at 100% CPU the whole time.

The purpose of the index is to provide quick access to the idle items for
a given job, both the records themselves and counts. Normally there is
just a single active job in the table, and at the time the problem occurs
all entries for that job will be in the index.
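
That is, the index is meant to make queries of roughly this shape cheap
(illustrative only; 334 is simply the run id from the log quoted below):

     SELECT count(*) FROM probedata2_probequeue
     WHERE state = E'I' AND part_of_run_id = 334;

     SELECT id FROM probedata2_probequeue
     WHERE state = E'I' AND part_of_run_id = 334
     LIMIT 6000 OFFSET 85103;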

As I mentioned last time, the problematic command is the UPDATE command

       UPDATE probedata2_probequeue SET "state"=E'S'
       WHERE "state" = E'I'  AND id IN ( .....)
       RETURNING id

and I have confirmed that with the built-in Django debug query information
in my test script.
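
For reference, the plan for that statement can be captured with something
like the following (a sketch; the id list is elided just as in the log,
and since EXPLAIN ANALYZE actually executes the UPDATE it is wrapped in a
transaction that is rolled back):

     BEGIN;
     EXPLAIN (ANALYZE, BUFFERS)
     UPDATE probedata2_probequeue SET "state"=E'S'
     WHERE "state" = E'I'  AND id IN ( ..... )
     RETURNING id;
     ROLLBACK;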


On Sat, 08 Feb 2014 15:57:10 +0100, Yngve N. Pettersen
<yngve@spec-work.net> wrote:

> Hi again,
>
> Sorry about the delay, but an unexpected 3-week trip, combined with not
> being able to fix the system's router (whose configuration had become
> corrupted before I left), meant that I could not perform any testing
> until this week, after the router had been fixed and reinstalled.
>
> I just did a test using expanded logging of queries and duration, and
> actually saw this happen with just 4 (four) processes, not the 10 I have
> seen before.
>
> The relevant parts of the log, as well as an EXPLAIN and table info
> dump, are attached in a zipfile; the large parts consisting of 6000
> to-be-updated IDs in the UPDATE commands have been removed for clarity
> (the sequences were not overlapping, with a numerical distance of at
> least 80000 from the other updates, and each sequence was within a range
> of 7500 IDs).
>
> Background: Immediately before this operation the queue had been set up
> with the
> command
>
>      INSERT INTO probedata2_probequeue (part_of_run_id, server_id, state)
>      SELECT '334' AS part_of_run_id, server_id, E'I' AS state FROM
> probedata2_preparedqueueitem
>      WHERE part_of_queue_id = '2'
>
> followed by a COMMIT, a process taking ~15 minutes to copy a 23 million
> entry table into the queue. After this the total number of entries in the
> target table is 70 million, distributed across 3 runs.
>
> Some SELECT count and ANALYZE operations have probably also been
> performed after the initialization operation, before the processes were
> started.
>
> Shortly afterward (within a couple of minutes), four processes initiated
> the SELECT and UPDATE sequence I outlined earlier; each operation was
> started between 1 and 10 seconds after one of the others, with about 20
> seconds from the first command to the last.
>
> SELECT id FROM probedata2_probequeue
> WHERE state = E'I'  AND part_of_run_id = 334
> LIMIT 6000 OFFSET 85103
>
> UPDATE probedata2_probequeue SET "state"=E'S'
> WHERE "state" = E'I'  AND id IN ( .....)
> RETURNING id
>
> The SELECT operations took 21-122 ms to complete, while the UPDATEs took
> 1093000-1120000 ms (~18 minutes) to complete. During this time the
> processes were running at 100% CPU. With more processes involved earlier
> I recall seeing even longer execution times for the UPDATEs, before I
> killed the processes (this is the first time I have seen this kind of
> situation be resolved without killing the processes).
>
> For reference, once these 4 commands had completed (the system is set up
> to wait until it sees task completed messages from the first processes
> that were started, before starting new ones), the next sequence of these
> commands took 122ms and 107ms, respectively, and the second took 50ms and
> 108ms.
>
> Any suggestions for where to investigate further?
>
> I am considering setting up a small example to see if I can reproduce
> the problem, but have not had time to do so yet.
>
> On Sat, 04 Jan 2014 20:06:19 +0100, Yngve N. Pettersen
> <yngve@spec-work.net> wrote:
>
>> On Sat, 04 Jan 2014 19:40:31 +0100, Andrew Sullivan
>> <ajs@crankycanuck.ca> wrote:
>>
>>> On Sat, Jan 04, 2014 at 07:07:08PM +0100, Yngve N. Pettersen wrote:
>>>> I tried that before, but ran into some issues, IIRC a similar looping
>>>> problem as this where queries never ended. I split it up in an
>>>> attempt to
>>>> solve that problem.
>>>
>>> Pulling the data out into the application and sending it back in won't
>>> improve things.  Exactly the same number of rows need to be visited,
>>> but the way you have it now you have to marshall all the data and ship
>>> it to the application too.  So it's automatically slower.  Indeed,
>>> making it slower might have masked your problem.
>>
>> Could be
>>
>>>
>>>> In the select/update case there is no sorting in the query; there is
>>>> an offset/limit clause though, and the number of records retrieved is
>>>> currently restricted to <10000 per query (out of 20 million in the
>>>> active subset).
>>>>
>>>>     SELECT id from queue where state = E'I' and job_id = <integer>
>>>> offset
>>>> <random 200..150000> limit <1-6000>
>>>
>>> This could be part of the problem.  Are the different threads working on
>>> different job_ids, or is that the same job_id?  If you don't SORT that
>>
>> Same job_id, at least in the current system.
>>
>>> query before the OFFSET, then the rows will come back in whatever
>>> order the system likes.
>>
>> I suspect that a sort operation on (currently) 20+ million rows for
>> every query for just 6000 rows (the previous version used 1500 entries)
>> would cause quite a bit more slowness than either breaking the query up
>> into two operations or the risk of collisions would, because each
>> process would have to load all that information (even if it is cached).
>>
>>>> However, in the UPDATE case, the looping processes are all UPDATE
>>>> queries,
>>>> no SELECTs involved.
>>>
>>> But you said it's all in the same transaction scope.  The lock is at
>>> the transaction scope.
>>
>> But the statements it locks/loops on are only UPDATE statements, also
>> in the processes that are waiting.
>>
>>> Anyway, what I'd do is try to cause the condition and post the
>>> pg_locks information.  When I've done this in the past, usually the
>>> best thing to do is also to have query logs on for everything (don't
>>> forget to log the pid!) so you can see what the other transaction
>>> you're waiting on touched already.  You can usually find the inversion
>>> that way.  Once you see it, it's always obvious what you've done, in
>>> my experience (and completely mystifying before that, unfortunately).
>>
>> Will take a look in a few days, probably midweek.
>>
>>
>
>


--
Sincerely,
Yngve N. Pettersen

Using Opera's mail client: http://www.opera.com/mail/

