Re: UPDATEDs slowing SELECTs in a fully cached database - Mailing list pgsql-performance

From: lars
Subject: Re: UPDATEDs slowing SELECTs in a fully cached database
Msg-id: 4E1BD0C5.5030204@yahoo.com
In response to: Re: UPDATEDs slowing SELECTs in a fully cached database (Robert Klemme <shortcutter@googlemail.com>)
Responses: Re: UPDATEDs slowing SELECTs in a fully cached database ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List: pgsql-performance
On 07/11/2011 08:26 AM, Robert Klemme wrote:
> On Mon, Jul 11, 2011 at 3:13 PM, ktm@rice.edu <ktm@rice.edu> wrote:
>> I do not know if this makes sense in PostgreSQL, given that readers
>> do not block writers and writers do not block readers. Are your
>> UPDATEs to individual rows, each in a separate transaction, or
>> do you UPDATE multiple rows in the same transaction? If you
>> perform multiple updates in a single transaction, you are
>> synchronizing the changes to that set of rows, and that constraint
>> causes other readers that need the correct post-transaction values
>> to wait until the COMMIT completes, which means the WAL write must
>> finish first.
> Which readers would those be?  The docs explicitly state that readers
> are never blocked by writers:
> http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html
> http://www.postgresql.org/docs/9.0/interactive/mvcc.html
>
>  From what I understand about this issue, the observed effect must be
> caused by the implementation and not by a conceptual issue with
> transactions.
>
>> Have you tried disabling synchronous_commit? If this scenario
>> holds, you should be able to reduce the slowdown by un-batching
>> your UPDATEs, as counter-intuitive as that is. This seems to
>> be similar to a problem that I have been looking at with using
>> PostgreSQL as the backend to a Bayesian engine. I am following
>> this thread with interest.
> I don't think this will help (see above).  Also, I would be very
> cautious about doing this, because although the client might get a
> faster acknowledgement, the DB still has to do the same work (i.e.
> WAL, checkpointing etc.) as with synchronous_commit enabled, and it
> still has to process significantly more transactions than in the
> batched version.
>
> Typically there is an optimum batch size: if the batch size is too
> small (say, one row), the ratio of transaction overhead to useful
> work is poor.  If the batch size is too large (say, millions of
> rows), you hit resource limitations (memory) which inevitably force
> the RDBMS to do additional disk IO.
>
> Kind regards
>
> robert
>
Thanks Ken and Robert,

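To make sure we are talking about the same batching, the two UPDATE
shapes Ken describes would look roughly like this (table and column
names are made up for illustration):

    -- un-batched: each UPDATE is its own transaction (autocommit),
    -- so every row pays its own COMMIT and WAL flush
    UPDATE test SET val = val + 1 WHERE id = 1;
    UPDATE test SET val = val + 1 WHERE id = 2;
    -- ... one statement per row

    -- batched: all rows change under a single COMMIT
    BEGIN;
    UPDATE test SET val = val + 1 WHERE id BETWEEN 1 AND 10000;
    COMMIT;

    -- Ken's suggestion, applied per session:
    SET synchronous_commit = off;

My test uses the batched, single-statement form (details below).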
What I am observing is definitely not readers being blocked by writers
through row-level locking.
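
One way to rule lock waits out while the workload runs (assuming
9.0/9.1, where pg_stat_activity still has the procpid, waiting and
current_query columns) is:

    -- list any backend currently waiting on a lock, with its query;
    -- if this never returns rows while the SELECTs are slow, the
    -- slowdown is not caused by blocking
    SELECT procpid, waiting, current_query
      FROM pg_stat_activity
     WHERE waiting;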

This seems to be some implementation detail in Postgres concerning how
dirty pages (or dead versions of tuples) are flushed to disk (see the
other parts of this thread) when they are accessed by a SELECT query.
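
A hedged way to watch that happening is EXPLAIN with the BUFFERS
option (available since 9.0; exactly which buffer counters are printed
varies by version, and the table name here is again made up):

    -- if the SELECT itself writes out pages dirtied by the preceding
    -- UPDATE (presumably while setting hint bits), that shows up in
    -- the Buffers: line of the plan output
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(val) FROM test WHERE id BETWEEN 1 AND 10000;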

The batch size in this case is one SELECT statement accessing 10000
rows via an aggregate (such as COUNT) and an UPDATE modifying 10000
rows in a single statement.
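
A minimal sketch of that shape, with made-up names:

    CREATE TABLE test (id integer PRIMARY KEY, val integer);

    -- the writer: 10000 rows updated in one statement
    UPDATE test SET val = val + 1 WHERE id BETWEEN 1 AND 10000;

    -- the reader: one aggregate over the same 10000 rows
    SELECT count(val) FROM test WHERE id BETWEEN 1 AND 10000;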

I am not trying to optimize this particular use case, but rather to
understand what Postgres is doing, and why SELECT queries are affected
negatively (sometimes severely) by concurrent (or even preceding)
UPDATEs at all, when the database resides completely in the cache.
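
Since everything is cached, one crude way to confirm that writes are
still happening is to watch the cluster-wide write counters between
runs, e.g.:

    -- buffers_backend counts pages written out directly by backends
    -- (as opposed to the bgwriter or checkpoints); if it climbs while
    -- only SELECTs are running, the readers are doing the writing
    SELECT buffers_checkpoint, buffers_clean, buffers_backend
      FROM pg_stat_bgwriter;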

-- Lars

