Re: SSI and predicate locks - a non-trivial use case - Mailing list pgsql-general

From Gianni Ceccarelli
Subject Re: SSI and predicate locks - a non-trivial use case
Date
Msg-id 20130902135944.413cab39@exelion
Whole thread Raw
In response to Re: SSI and predicate locks - a non-trivial use case  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: SSI and predicate locks - a non-trivial use case  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-general
On 2013-08-31 Kevin Grittner <kgrittn@ymail.com> wrote:
> Of course, the subject line gives me some pause -- I'm aware of many
> uses of SSI in non-trivial production environments, including
> multi-terrabyte databases with thousands of concurrent users. In
> some cases the performance hit compared to REPEATABLE READ, which
> would allow anomalies, is about 1%.

The subject line was a product of my ignorance of such use
cases. Also, I'm not really talking about performance: I've seen it
working with no appreciable slow-down in updating documents, relative
to our previous usage (``select for update;update;commit``), which had
the distinct disadvantage of locking out all transactions that wanted
to touch the same documents. Performance-wise, it works brilliantly.

> Using SSI to manage a database queue is known to be a "worst case"
> workload.

Sorry, I was not clear enough. The queue is managed by ActiveMQ, each
concurrent consumer gets a different message and (usually) updates a
different set of documents.

> your problems are with the access to the documents based on the
> requests pulled from the queue.

It's not even much of a problem. I mean, it works and it works well.
I was looking for explanations about the details that I don't
understand, and I thought that providing a use case could help the
implementers tune some of the internal logic.

> I'm afraid that one of the most interesting and pertinent bits of
> the code is written as ellipses. Can you show or describe what the
> REPEATABLE READ transaction is doing in more detail?

It's really not much. Most of the times it's doing a ``SELECT id FROM
docs WHERE id IN (?,?,?,...)`` to check which documents to update
(missing ones may be created, or ignored, depending on the particular
message). Other times it's doing ``SELECT document_id FROM
doc_attribute WHERE attribute_name=? AND value=?`` to get a set of ids
of documents that have some attribute set to a particular value.

> > If we receive a serialisation error, we retry the whole
> > transaction, applying the changes to the new version of the
> > document. Each retry takes about 0.1 seconds.
>
> What percentage of transactions are rolled back? What is the
> processing time in such transactions as a percentage of the total
> load?

Depends on the message. Most times messages touch one or two
documents, and very rarely collide. Sometimes we get two messages that
touch the same 100 documents, and about a quarter of the 200 commits
will fail (200 because 2 consumers are updating 100 documents
each). The consumer processes spend most of their time waiting for
messages, and the rest inside a serialisable transaction. Under our
load tests, the consumers were force-fed messages, so they were
spending essentially all their time updating documents inside such
transactions.

> [Locks without PIDs] are predicate locks related to a transaction
> which has been PREPARED (for two-phase commit) or committed, but
> which may still be relevant because there are overlapping read-write
> transactions which are still active.  One long-running SELECT
> statement, if it is not declared to be in a read-only transaction,
> could cause a large number of such locks to accumulate.

So a long "read committed" transaction will cause locks from
"serialisable" transactions to accumulate? Good to know, I had not
realised that.

> If you are not already doing so, make sure that any serializable
> transaction which is not going to modify data is declared to be
> READ ONLY at the start.

All our serializable transactions modify data. Those that don't, don't
need to be isolated that much, so we declare them "read
committed". Should we declare them as "read committed, read only"?

> > - Is the large number of page locks to be expected?
>
> There is probably some database transaction (and that will count
> individual statements not explicitly included in transactions)
> which is running for minutes.

The slow transactions should only be "read committed". Or we may have
some bug in the code. I'll keep looking.

> As you probably noticed, the heuristics in
> PredicateLockPromotionThreshold() are pretty simple.

Yes, that's the main reason I decided to write to the list.

> and you can always just go really high on
> max_pred_locks_per_transaction instead.

Could it be useful to document that predicate locks are very small
(from the source, I'd say around 150 bytes), so that people don't get
scared to set max_pred_locks_per_transaction very high?

> If you have a workload which you think would do better with
> something more sophisticated, it would be great to have more
> details.

I'm no longer sure that our system is interesting, but I'll be glad to
provide as much detail as I can gather. What kind of details would be
useful?

> If you wanted to benchmark your workload against a custom version
> with a modified PredicateLockPromotionThreshold() function, that
> would be fantastic.

I don't know enough to write such a modified version, but I can run
it.

--
    Dakkar - <Mobilis in mobile>
    GPG public key fingerprint = A071 E618 DD2C 5901 9574
                                 6FE2 40EA 9883 7519 3F88
                        key id = 0x75193F88

Chicken Little was right.

Attachment

pgsql-general by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: store multiple rows with the SELECT INTO statement
Next
From: DT
Date:
Subject: ALTER TABLE transaction isolation problem