Thread: Transaction locks on first insert into partitioned table partition

Transaction locks on first insert into partitioned table partition

From
Martin Lund Askøe
Date:
I'm experiencing something I cannot explain with regards to partitioned tables.
I create a partitioned table and create a partition on it in one transaction.
I then insert a record into the partition, somehow taking a lock on the master table.
Then I rollback. I then insert a record again, but this time I do not take a lock on the master table.

Why is this happening?

This is my reproduction of the problem.

CREATE TABLE master_table (
  id               bigint NOT NULL,
  some_info   varchar NOT NULL,
  e_timestamp   timestamp NOT NULL
) PARTITION BY RANGE (e_timestamp);
create index on master_table (id);
create index on master_table (e_timestamp);

CREATE TABLE partition_table_2019_01_01 PARTITION OF master_table FOR VALUES FROM ('2019-01-01') TO ('2019-01-02');
-- locks taken are (amongs others)
-- master_table AccessExclusiveLock
-- partition_table_2019_01_01 ShareLock
-- okay so far
commit;

insert into partition_table_2019_01_01 (id, some_info, e_timestamp) VALUES (1, 'hello', '2019-01-01 10:00:00'::timestamp);
-- locks taken are (amongs others)
-- master_table AccessShareLock
-- partition_table_2019_01_01 RowExclusiveLock
-- why a lock on the master_table ?
rollback;

insert into partition_table_2019_01_01 (id, some_info, e_timestamp) VALUES (1, 'hello', '2019-01-01 10:00:00'::timestamp);
-- locks taken are (amongs others)
-- partition_table_2019_01_01 RowExclusiveLock
-- this time no lock on the master_table ???? Did something happen during the transaction that was rolled back?

Any input would be appreciated.
Regards, Martin.

Re: Transaction locks on first insert into partitioned table partition

From
David Rowley
Date:
On Fri, 7 Jun 2019 at 03:31, Martin Lund Askøe
<martinlundaskoe@gmail.com> wrote:
>
> I'm experiencing something I cannot explain with regards to partitioned tables.
> I create a partitioned table and create a partition on it in one transaction.
> I then insert a record into the partition, somehow taking a lock on the master table.
> Then I rollback. I then insert a record again, but this time I do not take a lock on the master table.
>
> Why is this happening?

Attaching the partition to the partitioned table causes a relcache
invalidation (basically an internal cache that each session maintains
to quickly access relation metadata). When you open a relation for the
first time after its cache entry was invalidated, the data must be
reloaded. This requires locking the relation to ensure nobody drops it
out from under us.  For this particular case, we must look up the
parent partitioned table's details in order to determine if the tuple
you're inserting is suitable for that partition. The code in question
is in generate_partition_qual(). Since the partition bound is then
stored in the partition's relcache entry, and not the partitioned
table then we've no need to open the parent again on subsequent
inserts that go directly to the partition. If you attached another
partition or did something else like add a column, then this would
invalidate the entry again and you'd see the lock for the duration of
the transaction.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Transaction locks on first insert into partitioned table partition

From
Martin Lund Askøe
Date:
Hi David.

Thank you for your reply.

Do you know if there is any way to force update the relcache entry during my 'attach'-transaction?
After attaching a new partition and committing that transaction, I would like to be able to start a data insert transaction (potentially long running) on the partition without blocking other transactions from attaching another partition, as I would by taking an AccessShareLock on the master table.

So,
Attach partition A -> transaction 1
Insert into partition A -> transaction 2
Attach partition B -> transaction 3 (should not be blocked by transaction 2)

I can think of workarounds, but they aren't pretty.

Regards, Martin.


On Fri, Jun 7, 2019 at 3:11 AM David Rowley <david.rowley@2ndquadrant.com> wrote:

Attaching the partition to the partitioned table causes a relcache
invalidation (basically an internal cache that each session maintains
to quickly access relation metadata). When you open a relation for the
first time after its cache entry was invalidated, the data must be
reloaded. This requires locking the relation to ensure nobody drops it
out from under us.  For this particular case, we must look up the
parent partitioned table's details in order to determine if the tuple
you're inserting is suitable for that partition. The code in question
is in generate_partition_qual(). Since the partition bound is then
stored in the partition's relcache entry, and not the partitioned
table then we've no need to open the parent again on subsequent
inserts that go directly to the partition. If you attached another
partition or did something else like add a column, then this would
invalidate the entry again and you'd see the lock for the duration of
the transaction.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Transaction locks on first insert into partitioned table partition

From
David Rowley
Date:
On Fri, 7 Jun 2019 at 19:06, Martin Lund Askøe
<martinlundaskoe@gmail.com> wrote:
> Do you know if there is any way to force update the relcache entry during my 'attach'-transaction?

You could just do a dummy insert after the ATTACH PARTITION
transaction ends and before the transaction where you start inserting
the data. Something like:

insert into partition_table_2019_01_01 (id, some_info, e_timestamp)
select 1,'',now() where 1=0;

That'll build and cache the partition bound in the partitions relcache entry.

I'm curious to know why you have concerns about the AccessShareLock on
the partitioned table being held during the insert. Can you explain
that?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Transaction locks on first insert into partitioned table partition

From
Martin Lund Askøe
Date:
Hi David,

It should be obvious that I don't want any sessions to be able to inadvertently be taking locks on the master table.
I could construct a situation where this would produce a deadlock, but this will be based on poor code quality.

Lets say that I have transaction A inserting into a newly attached partition partition_table_2019_01_01 inadvertently taking a access shared lock on the master_table (because of the relcache update).
At some point in transaction A I encounter data which should be inserted into a non-existing partition_table_2019_01_02 (infer the meaning of the dates in the table), so in that same transaction I decide to create and attach that table.
Unfortunately another transaction B (completely unrelated to transaction A, even in a different session) has had the same idea, and has already been granted a access exclusive lock on that table prior to creating the table, but is now missing a lock on the master_table in order to complete the attach.
Now we have deadlock and someone will need to perform a rollback.

Of course this example is a bad one, because I should not decide to create and attach a new partition table in the middle of transaction A, but it does show how my deadlock could happen - "almost" out of my control.

I've tried doing the dummy insert as part of the transaction that creates and attaches the partition table, but it does not update the relcache entry or at least it had no affect.
I still end up taking the shared access lock on the master_table on next insert or deletion. What does work is doing the dummy insert in another transaction, and I don't even have to commit it, I can just do a rollback.

It feels like a bug to me, that the relcache entry update leaves my transaction with an access shared lock.
Are there no way to enforce the cache update during the attach, when I already have the lock on the master_table?

Regards, Martin.

On Sat, Jun 8, 2019 at 8:49 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Fri, 7 Jun 2019 at 19:06, Martin Lund Askøe
<martinlundaskoe@gmail.com> wrote:
> Do you know if there is any way to force update the relcache entry during my 'attach'-transaction?

You could just do a dummy insert after the ATTACH PARTITION
transaction ends and before the transaction where you start inserting
the data. Something like:

insert into partition_table_2019_01_01 (id, some_info, e_timestamp)
select 1,'',now() where 1=0;

That'll build and cache the partition bound in the partitions relcache entry.

I'm curious to know why you have concerns about the AccessShareLock on
the partitioned table being held during the insert. Can you explain
that?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
=?UTF-8?Q?Martin_Lund_Ask=C3=B8e?= <martinlundaskoe@gmail.com> writes:
> It feels like a bug to me, that the relcache entry update leaves my
> transaction with an access shared lock.

It is not a bug.  The fact that the uncommitted insert takes a lock
preventing any DDL changing the partition structure is actually an
essential property, because otherwise such DDL could perhaps change
which partition the inserted row should have gone into.  The DDL
command would then be responsible for moving rows appropriately ...
but it can't see an uncommitted row from another session, so it
would fail to move it.  Result: corrupted data.

It might be possible to draw some lines around which sorts of
partition-changing DDL could result in that sort of change, allowing
more fine-grained locking.  But we don't have any mechanism for that
right now, and I'm not really sure that it'd be cost-effective to
draw finer distinctions.

            regards, tom lane



Re: Transaction locks on first insert into partitioned table partition

From
David Rowley
Date:
On Sun, 9 Jun 2019 at 11:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> =?UTF-8?Q?Martin_Lund_Ask=C3=B8e?= <martinlundaskoe@gmail.com> writes:
> > It feels like a bug to me, that the relcache entry update leaves my
> > transaction with an access shared lock.
>
> It is not a bug.  The fact that the uncommitted insert takes a lock
> preventing any DDL changing the partition structure is actually an
> essential property, because otherwise such DDL could perhaps change
> which partition the inserted row should have gone into.  The DDL
> command would then be responsible for moving rows appropriately ...
> but it can't see an uncommitted row from another session, so it
> would fail to move it.  Result: corrupted data.

If that were the case, we'd already have bugs since the lock on the
parent is only taken when we must obtain the partition bound in
generate_partition_qual(), and that's only going to happen just after
a relcache entry has been invalidated or if it's not been built yet.

The only DDL that could affect what partition the row is meant to go
into is DROP TABLE or DETACH PARTITION on the partition that's being
INSERTed into. In that case, we'll take an AccessExclusiveLock on the
partition itself which would prevent the concurrent INSERT.

To me it sounds very much like Martin is trying to use partitioning in
a way that's not supported. He didn't mention it, by my guess is he's
trying to use a BEFORE INSERT trigger to create a partition that does
not exist. That's a very bad idea and he'll be much better off
creating the partitions before they're required using some job that
runs periodically.  I think this is likely the case since otherwise,
he'd not have a problem with starting a new transaction to perform the
dummy insert to populate the relcache.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Transaction locks on first insert into partitioned table partition

From
Martin Lund Askøe
Date:

The only DDL that could affect what partition the row is meant to go
into is DROP TABLE or DETACH PARTITION on the partition that's being
INSERTed into. In that case, we'll take an AccessExclusiveLock on the
partition itself which would prevent the concurrent INSERT.

To me it sounds very much like Martin is trying to use partitioning in
a way that's not supported. He didn't mention it, by my guess is he's
trying to use a BEFORE INSERT trigger to create a partition that does
not exist. That's a very bad idea and he'll be much better off
creating the partitions before they're required using some job that
runs periodically.  I think this is likely the case since otherwise,
he'd not have a problem with starting a new transaction to perform the
dummy insert to populate the relcache. 

If only specific DDL changes invalidate the relcache entry then why not update the cache when -they- happen. I don't understand what is going on behind the scenes so there may be a good explanation.

I am creating missing partition tables ad-hoc roughly once every day as my table naming pattern would suggest, though not as a before insert trigger. 
I am inserting about 1000 records every second 24/7, so there will be no "good" time to do the DDL update anyway.

I do not understand how this would not be how partitioned tables where intended to be used, nor why you cannot see the problem of having the first insert statement take a lock on the master table "without my knowledge".

I cannot ensure that a potential dummy insert statement happens before a real insert statement, since having the relcache entry invalidated requires my transaction doing the DDL update to be committed, leaving it all to be a race condition with my normal insert operations.

I'm going to do a workaround to this if not bug then IMO bad design. Probably something along the lines of a more managed table maintenance step. I am fortunate that I can do that. Not everyone will be able to.

I now understand why the lock is taking, which was what I was asking. I thank you for your comments and suggestions.

Kind regards, Martin