Thread: Exclusively locking parent tables while disinheriting children.
Hi, When working with partition sets, we're seeing occasional errors of "could not find inherited attribute..." in Select queries. This is apparently caused when an "ALTER TABLE ... NO INHERIT" runs concurrently with another transaction selecting from the relevant child table. I found an old bug report filed against 8.3 back in 2008 [1] I can still reproduce the test case in that report on 9.1.11 and 9.3.5, and it seems to match what we're seeing in production. Tom Lane said at the time that a lock would cause more problems than it solved [2], but when I add an explicit lock statement ("LOCK TABLE ONLY p_tbl, c1_tbl IN ACCESS EXCLUSIVE MODE;") between "BEGIN" and "ALTER TABLE", I get the behaviour I would expect - the SELECT blocks until the transaction is committed, then returns rows from the remaining child table. So what I want to understand is what the risk of adding this lock are - under what circumstances would I expect to see dead locks if I manually added this lock to my partition maintenance functions? If there aren't any, should the database itself acquire this lock during the ALTER TABLE process? There is mention in previous discussions of DROP TABLE also not taking a lock, but even if that case isn't fixable, fixing NO INHERIT would at least provide a documented (and quite intuitive) way to achieve this safely - always disinherit your children before dropping them. [1] http://www.postgresql.org/message-id/200806171229.m5HCTfsI091593%40wwwmaster.postgresql.org [2] http://www.postgresql.org/message-id/19666.1213709303%40sss.pgh.pa.us Regards, -- Rowan Collins [IMSoP]
On 30 July 2015 at 13:35, Rowan Collins <rowan.collins@gmail.com> wrote:
Hi,
When working with partition sets, we're seeing occasional errors of "could not find inherited attribute..." in Select queries. This is apparently caused when an "ALTER TABLE ... NO INHERIT" runs concurrently with another transaction selecting from the relevant child table.
I found an old bug report filed against 8.3 back in 2008 [1] I can still reproduce the test case in that report on 9.1.11 and 9.3.5, and it seems to match what we're seeing in production.
Tom Lane said at the time that a lock would cause more problems than it solved [2], but when I add an explicit lock statement ("LOCK TABLE ONLY p_tbl, c1_tbl IN ACCESS EXCLUSIVE MODE;") between "BEGIN" and "ALTER TABLE", I get the behaviour I would expect - the SELECT blocks until the transaction is committed, then returns rows from the remaining child table.
So what I want to understand is what the risk of adding this lock are - under what circumstances would I expect to see dead locks if I manually added this lock to my partition maintenance functions?
I'm not clear on the problems such a change would present either, but I'm probably overlooking the relevant scenario.
Thom
On 7 August 2015 at 12:34, Thom Brown <thom@linux.com> wrote:
On 30 July 2015 at 13:35, Rowan Collins <rowan.collins@gmail.com> wrote:Hi,
When working with partition sets, we're seeing occasional errors of "could not find inherited attribute..." in Select queries. This is apparently caused when an "ALTER TABLE ... NO INHERIT" runs concurrently with another transaction selecting from the relevant child table.
I found an old bug report filed against 8.3 back in 2008 [1] I can still reproduce the test case in that report on 9.1.11 and 9.3.5, and it seems to match what we're seeing in production.
Tom Lane said at the time that a lock would cause more problems than it solved [2], but when I add an explicit lock statement ("LOCK TABLE ONLY p_tbl, c1_tbl IN ACCESS EXCLUSIVE MODE;") between "BEGIN" and "ALTER TABLE", I get the behaviour I would expect - the SELECT blocks until the transaction is committed, then returns rows from the remaining child table.
So what I want to understand is what the risk of adding this lock are - under what circumstances would I expect to see dead locks if I manually added this lock to my partition maintenance functions?I'm not clear on the problems such a change would present either, but I'm probably overlooking the relevant scenario.
Has anyone got insight as to what's wrong with exclusively locking a parent table to disinherit a child table?
Thom
On Wed, Sep 16, 2015 at 12:10 AM, Thom Brown <thom@linux.com> wrote: > On 7 August 2015 at 12:34, Thom Brown <thom@linux.com> wrote: >> On 30 July 2015 at 13:35, Rowan Collins <rowan.collins@gmail.com> wrote: >>> So what I want to understand is what the risk of adding this lock are - >>> under what circumstances would I expect to see dead locks if I manually >>> added this lock to my partition maintenance functions? >> I'm not clear on the problems such a change would present either, but I'm >> probably overlooking the relevant scenario. > Has anyone got insight as to what's wrong with exclusively locking a parent > table to disinherit a child table? I don't see why that error would occur if concurrently selecting directly from a "child". I can imagine it occurring if selected from the parent. For example, one session performs ALTER TABLE somechild NO INHERIT parent, then another session trying to select from parent would block for lock on somechild. Once it gets the lock, it checks if somechild exists at all for safety (because other session may have dropped it). What it doesn't check though is whether somechild is still in the list of children of parent. Having failed to do that, it encounters the error in question when further along it tries to find parent's attributes in somechild which are no longer there (remember, other session just disinherited parent). That perhaps needs some fixing. I may be missing though. Thanks, Amit