Re: Delay locking partitions during INSERT and UPDATE - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Delay locking partitions during INSERT and UPDATE
Date
Msg-id 0b1531d4-fce5-c7cf-1018-592fb0456124@2ndquadrant.com
Whole thread Raw
In response to Delay locking partitions during INSERT and UPDATE  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Delay locking partitions during INSERT and UPDATE  (sho kato <kato-sho@jp.fujitsu.com>)
List pgsql-hackers
Hi,

On 11/23/18 1:14 AM, David Rowley wrote:
> As a follow-on from [1] and also discussed in [2], I'd like to propose
> that we don't obtain locks on all partitions during INSERT into a
> partitioned table and UPDATE of a partitioned key and instead, only
> lock the partition when we first route a tuple to it. This means that
> the order that the locks are obtained is no longer well defined and is
> at the mercy of the order that tuples are INSERTed or UPDATEd.  It
> seems worth relaxing this a bit for gains in performance, as when a
> partitioned table contains many partitions, the overhead of locking
> all partitions when inserting a single row, or just a few rows is
> often significantly higher than the cost of doing the actual insert.
> 

Yep, the locking seems like a significant bottleneck. I've done quite a
bit of testing on two machines, using a slightly modified version of
your test script with variable number of partitions (0 means not
partitioned), and the results look like this:

1) xeon e5-2620v4

    partitions        0     100     1000    10000
    ---------------------------------------------
    master        16643    6956     1039      108
    patched       16398   15522    15222    13228

2) i5-2500k

    partitions     0     100    1000    10000
    -----------------------------------------
    master      3901    2892     920       76
    patched     3894    3838    3845     3522

When using UNLOGGED tables to minimize the external noise, it looks like
this:

3) xeon e5-2620v4

    partitions      0      100     1000    10000
    --------------------------------------------
    master      30806     8740     1091      107
    patched     30455    28137    27582    24985

    partitions      0      100     1000    10000
    --------------------------------------------
    master      27662     9013     1277       79
    patched     28263    26474    25794    22434


So the performance benefit is pretty clear - up to 2 orders of magnitude
with 10k partitions, and gets us fairly close to non-partitioned table.

Me gusta.

> The current behaviour was added in 54cde0c4c058073 in order to
> minimise deadlock risk.  It seems that the risk there only comes from
> AELs that could be taken when a partition directly receives a TRUNCATE
> / CREATE INDEX / VACUUM FULL / CLUSTER. There's obviously no conflict
> with other DML operations since two RowExclusiveLocks don't conflict
> with each other.  I think all other AEL obtaining DDL must be
> performed on the top level partitioned table, for example, ADD COLUMN
> can't be done directly on a partition, so there's no added deadlock
> risk from those. For a deadlock to occur one of the above DDL commands
> would have to be executed inside a transaction in an order opposite to
> the order rows are being INSERTed or UPDATEd in the partitioned table.
> If required, such operations could LOCK TABLE the top partitioned
> table to block the DML operation. There's already a risk of similar
> deadlocks from such operations done on multiple separate tables when
> the order they're done is not the same as the order the tables are
> written in a query, although, in that case, the window for the
> deadlock is likely to be much smaller.
> 

Hmmm, yeah.

Per the discussion in [1] the locking was necessary also to ensure
partitions can't disappear while we're building the descriptors in
RelationBuildPartitionDesc(). But AFAICS 3f2393edef fixed this.

The other issue - as you note - is ensuring locking order, to prevent
(or rather reduce the risk of) deadlocks. I agree with your assessment
here, i.e. that locking the parent is a sufficient protection.

Maybe there's an alternative solution with the same benefits and not
sacrificing the lock ordering, but I fail to see how it would work.

> 
> [1] https://www.postgresql.org/message-id/flat/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pg_upgrade: Pass -j down to vacuumdb
Next
From: David Rowley
Date:
Subject: Re: "SELECT ... FROM DUAL" is not quite as silly as it appears