Thread: More parallel pg_dump bogosities
So I started poking at the idea of sorting by size during parallel
restore instead of sorting pg_dump's TOC that way.  While investigating
just where to do that, I discovered that, using the regression database
as test case, restore_toc_entries_parallel() finds these objects to
be *immediately* ready to restore at the start of the parallel phase:

	all TABLE DATA objects --- as expected
	all SEQUENCE SET objects --- as expected
	BLOBS --- as expected
	CONSTRAINT idxpart_another idxpart_another_pkey
	INDEX mvtest_aa
	INDEX mvtest_tm_type
	INDEX mvtest_tvmm_expr
	INDEX mvtest_tvmm_pred
	ROW SECURITY ec1
	ROW SECURITY rls_tbl
	ROW SECURITY rls_tbl_force

I wasn't expecting any POST_DATA objects to be ready at this point,
so I dug into the reasons why these other ones are ready, and found
that:

idxpart_another_pkey is an index on a partitioned table (new feature
in v11).  According to the dump archive, it has a dependency on the
partitioned table.  Normally, repoint_table_dependencies() would change
an index's table dependency to reference the table's TABLE DATA item,
preventing it from being restored before the data is loaded.  But a
partitioned table has no TABLE DATA item, so that doesn't happen.
I guess this is okay, really, but it's a bit surprising.

The other four indexes are on materialized views, which likewise don't
have TABLE DATA items.  This means that when restoring materialized
views, we make their indexes before we REFRESH the matviews.  I guess
that's probably functionally okay (the same thing happens in
non-parallel restores), but it's leaving some parallelism on the table,
because it means more work gets crammed into the REFRESH action.  Maybe
somebody would like to fix that.  I'm not volunteering right now, though.

And lastly, the ROW SECURITY items are ready because they are not marked
with any dependency at all, none, nada.  This seems bad.  In principle
it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
practice that can't happen today, because CREATE TABLE commands get
emitted before we've switched into parallel restore mode at all.  But it's
definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
we've restored the table's data.  Won't that break things?

I think this is easy enough to fix, just force a dependency on the table
to be attached to a ROW SECURITY item; but I wanted to confirm my
conclusion that we need one.

			regards, tom lane
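[For concreteness, the matview restore ordering described above looks
roughly like this in the emitted SQL.  This is a minimal sketch: the
table and matview definitions are simplified stand-ins, not the
regression database's actual mvtest_* objects.]

	-- pg_dump restores matviews unpopulated (WITH NO DATA), so the
	-- index is created before the REFRESH and must be built during it.
	CREATE TABLE mvtest_t (id int, type text);
	CREATE MATERIALIZED VIEW mvtest_tm AS
	    SELECT type, count(*) AS total FROM mvtest_t GROUP BY type
	    WITH NO DATA;
	CREATE UNIQUE INDEX mvtest_tm_type ON mvtest_tm (type);
	-- Both the heap build and the index build land in this one step,
	-- which is why the REFRESH action ends up with the extra work:
	REFRESH MATERIALIZED VIEW mvtest_tm;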
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> So I started poking at the idea of sorting by size during parallel
> restore instead of sorting pg_dump's TOC that way.  While investigating
> just where to do that, I discovered that, using the regression database
> as test case, restore_toc_entries_parallel() finds these objects to
> be *immediately* ready to restore at the start of the parallel phase:
>
> 	all TABLE DATA objects --- as expected
> 	all SEQUENCE SET objects --- as expected
> 	BLOBS --- as expected
> 	CONSTRAINT idxpart_another idxpart_another_pkey
> 	INDEX mvtest_aa
> 	INDEX mvtest_tm_type
> 	INDEX mvtest_tvmm_expr
> 	INDEX mvtest_tvmm_pred
> 	ROW SECURITY ec1
> 	ROW SECURITY rls_tbl
> 	ROW SECURITY rls_tbl_force
>
> I wasn't expecting any POST_DATA objects to be ready at this point,
> so I dug into the reasons why these other ones are ready, and found
> that:
>
> idxpart_another_pkey is an index on a partitioned table (new feature
> in v11).  According to the dump archive, it has a dependency on the
> partitioned table.  Normally, repoint_table_dependencies() would change
> an index's table dependency to reference the table's TABLE DATA item,
> preventing it from being restored before the data is loaded.  But a
> partitioned table has no TABLE DATA item, so that doesn't happen.
> I guess this is okay, really, but it's a bit surprising.
>
> The other four indexes are on materialized views, which likewise don't
> have TABLE DATA items.  This means that when restoring materialized
> views, we make their indexes before we REFRESH the matviews.  I guess
> that's probably functionally okay (the same thing happens in non-parallel
> restores) but it's leaving some parallelism on the table, because it means
> more work gets crammed into the REFRESH action.  Maybe somebody would like
> to fix that.  I'm not volunteering right now, though.

Hrmpf, I agree, that certainly doesn't seem ideal.

> And lastly, the ROW SECURITY items are ready because they are not marked
> with any dependency at all, none, nada.  This seems bad.  In principle
> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> practice that can't happen today, because CREATE TABLE commands get
> emitted before we've switched into parallel restore mode at all.  But it's
> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> we've restored the table's data.  Won't that break things?

We certainly wouldn't want the ROW SECURITY items to be emitted before
the table itself, that wouldn't work.  Emitting ENABLE RLS before
restoring the table's data shouldn't actually break anything though, as
the owner of the table (which is who we're restoring the data as, at
that point, right?) will bypass RLS.  Now, if what's actually set is
FORCE RLS, then we may have an issue if that is emitted before the table
data, since that would cause RLS to be in effect even if we're the owner
of the table.

> I think this is easy enough to fix, just force a dependency on the table
> to be attached to a ROW SECURITY item; but I wanted to confirm my
> conclusion that we need one.

I do think we should have one.

In an ideal world, I think we'd want:

	CREATE TABLE
	TABLE DATA
	ENABLE RLS
	ADD RLS POLICIES
	GRANT RIGHTS

Though, ultimately, those last two could be flipped, since RLS has a
'default deny' policy if no policies exist on the table but RLS is
enabled.
We really shouldn't issue GRANT statements before ENABLE'ing RLS though,
since that might allow someone to access all the rows in the table when
they really should be limited to only some subset.

This all only applies in the cases where we aren't running the restore
in a single transaction, of course, but that's implied by talking about
parallel restore.

I'm not sure how much fun it would be to make it all work that way.
I do think we need the ENABLE RLS bits to depend on the table's
existence, though.

Thanks!

Stephen
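[A minimal sketch of the GRANT-before-ENABLE hazard Stephen describes;
the role, table, and policy names here are made up for illustration,
not taken from the thread.]

	CREATE ROLE app_user;
	CREATE TABLE customer_tbl (a int);
	INSERT INTO customer_tbl VALUES (1), (2);
	GRANT SELECT ON customer_tbl TO app_user;
	-- In this window app_user can read every row, even though the
	-- source database restricted it to a subset via policies.
	ALTER TABLE customer_tbl ENABLE ROW LEVEL SECURITY;
	-- Default-deny now applies: app_user sees no rows at all until
	-- the policies are restored.
	CREATE POLICY customer_pol ON customer_tbl TO app_user USING (a > 1);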
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> And lastly, the ROW SECURITY items are ready because they are not marked
>> with any dependency at all, none, nada.  This seems bad.  In principle
>> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
>> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
>> practice that can't happen today, because CREATE TABLE commands get
>> emitted before we've switched into parallel restore mode at all.  But it's
>> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
>> we've restored the table's data.  Won't that break things?

> We certainly wouldn't want the ROW SECURITY items to be emitted before
> the table itself, that wouldn't work.  Emitting ENABLE RLS before
> restoring the table's data shouldn't actually break anything though as
> the owner of the table (which is who we're restoring the data as,
> at that point, right?) will bypass RLS.

Hmm.  We'd typically be running a restore either as superuser or as the
table owner ...

> Now, if what's actually set is
> FORCE RLS, then we may have an issue if that is emitted before the table
> data since that would cause RLS to be in effect even if we're the owner
> of the table.

... but if FORCE RLS is applied to the table, we gotta problem anyway,
because that command is included right with the initial table creation.
Once the ENABLE comes out, inserts will fail, because there are no
policies yet (those *do* have table dependencies that get moved to
point at the TABLE DATA).

So it's a bug all right, but a sufficiently corner-case one that it's
not too surprising it hasn't been reported.  Even though the ENABLE RLS
item is "ready to restore", it'll still be at the end of the work list,
so you'd need a very high parallel worker count to have much chance of
it coming out before the table's data item gets processed.

>> I think this is easy enough to fix, just force a dependency on the table
>> to be attached to a ROW SECURITY item; but I wanted to confirm my
>> conclusion that we need one.

> I do think we should have one.

I'll do that, but ...

> In an ideal world, I think we'd want:
>
> 	CREATE TABLE
> 	TABLE DATA
> 	ENABLE RLS
> 	ADD RLS POLICIES
> 	GRANT RIGHTS

... as things stand, even with this fix, a parallel restore will emit
CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not
sure how much it matters, though.  The GRANTs should come out last
anyway.

			regards, tom lane
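[A minimal sketch of the FORCE-RLS failure mode Tom describes.  The
table name matches the regression database's, but its definition here
is assumed for illustration.]

	CREATE TABLE rls_tbl_force (a int);
	ALTER TABLE rls_tbl_force FORCE ROW LEVEL SECURITY;   -- dumped with the CREATE
	ALTER TABLE rls_tbl_force ENABLE ROW LEVEL SECURITY;  -- the item that can come out early
	-- With RLS enabled and forced, but no policies restored yet,
	-- default-deny applies even to the owner, so loading the data fails:
	INSERT INTO rls_tbl_force VALUES (1);
	-- ERROR:  new row violates row-level security policy for table "rls_tbl_force"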
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> And lastly, the ROW SECURITY items are ready because they are not marked
> >> with any dependency at all, none, nada.  This seems bad.  In principle
> >> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> >> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> >> practice that can't happen today, because CREATE TABLE commands get
> >> emitted before we've switched into parallel restore mode at all.  But it's
> >> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> >> we've restored the table's data.  Won't that break things?
>
> > We certainly wouldn't want the ROW SECURITY items to be emitted before
> > the table itself, that wouldn't work.  Emitting ENABLE RLS before
> > restoring the table's data shouldn't actually break anything though as
> > the owner of the table (which is who we're restoring the data as,
> > at that point, right?) will bypass RLS.
>
> Hmm.  We'd typically be running a restore either as superuser or as the
> table owner ...

Right.

> > Now, if what's actually set is
> > FORCE RLS, then we may have an issue if that is emitted before the table
> > data since that would cause RLS to be in effect even if we're the owner
> > of the table.
>
> ... but if FORCE RLS is applied to the table, we gotta problem anyway,
> because that command is included right with the initial table creation.
> Once the ENABLE comes out, inserts will fail, because there are no
> policies yet (those *do* have table dependencies that get moved to
> point at the TABLE DATA).

Yeah, I don't think the pg_dump aspect was considered very carefully
when we reworked how the 'force' RLS option works.

> So it's a bug all right, but a sufficiently corner-case one that it's
> not too surprising it hasn't been reported.  Even though the ENABLE RLS
> item is "ready to restore", it'll still be at the end of the work list,
> so you'd need a very high parallel worker count to have much chance of
> it coming out before the table's data item gets processed.

Ok.

> >> I think this is easy enough to fix, just force a dependency on the table
> >> to be attached to a ROW SECURITY item; but I wanted to confirm my
> >> conclusion that we need one.
>
> > I do think we should have one.
>
> I'll do that, but ...

Ok.

> > In an ideal world, I think we'd want:
>
> > 	CREATE TABLE
> > 	TABLE DATA
> > 	ENABLE RLS
> > 	ADD RLS POLICIES
> > 	GRANT RIGHTS
>
> ... as things stand, even with this fix, a parallel restore will emit
> CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not sure
> how much it matters, though.  The GRANTs should come out last anyway.

Yeah, as long as GRANTs come out last, it really shouldn't matter when
the ENABLE happens vs. the CREATE POLICYs, except in the odd FORCE case.

Thanks!

Stephen
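[A quick sketch of why the CREATE POLICY / ENABLE RLS order is harmless
in the non-FORCE case; the names are hypothetical.  Policies may be
created before RLS is enabled, and the owner doing the restore bypasses
RLS either way.]

	CREATE TABLE plain_tbl (a int);
	-- Creating a policy before RLS is enabled is fine; it just has
	-- no effect yet.
	CREATE POLICY plain_pol ON plain_tbl USING (a > 0);
	ALTER TABLE plain_tbl ENABLE ROW LEVEL SECURITY;
	-- Still succeeds: without FORCE, the owner bypasses RLS entirely.
	INSERT INTO plain_tbl VALUES (-1);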
... just when you thought it was safe to go back in the water ...

Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
treat POLICY and ROW SECURITY items as requiring exclusive lock on the
referenced table?  Those commands definitely acquire AccessExclusiveLock
in a quick test.

I haven't looked hard, but I'm suspicious that other recently-added dump
object types may have been missed here too, and even more suspicious
that we'll forget this again in future.  I wonder if we shouldn't invert
the logic, so that instead of a blacklist of object types that we assume
need exclusive lock, we keep a whitelist of object types that are known
not to (which might be just INDEX, not sure).  That way, we'd at least
be failing in a safe direction.

			regards, tom lane
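[One way to run such a quick test, as a sketch: execute the command in
an open transaction and inspect the session's own relation locks.  The
table name here is hypothetical.]

	CREATE TABLE rls_tbl (a int);
	BEGIN;
	ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
	SELECT c.relname, l.mode
	  FROM pg_locks l
	  JOIN pg_class c ON c.oid = l.relation
	 WHERE l.pid = pg_backend_pid()
	   AND l.locktype = 'relation'
	   AND c.relname = 'rls_tbl';
	-- expected: rls_tbl | AccessExclusiveLock
	ROLLBACK;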
On 2018-Aug-28, Tom Lane wrote:

> ... just when you thought it was safe to go back in the water ...
>
> Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
> treat POLICY and ROW SECURITY items as requiring exclusive lock on
> the referenced table?  Those commands definitely acquire
> AccessExclusiveLock in a quick test.
>
> I haven't looked hard, but I'm suspicious that other recently-added
> dump object types may have been missed here too,

I hadn't come across this locking dependency before, so it's pretty
likely that partitioned index attachment has a problem here.

> and even more suspicious that we'll forget this again in future.

... yeah, it seems easy to overlook the need to edit this.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Aug-28, Tom Lane wrote:
>> Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
>> treat POLICY and ROW SECURITY items as requiring exclusive lock on
>> the referenced table?  Those commands definitely acquire
>> AccessExclusiveLock in a quick test.

> I hadn't come across this locking dependency before, so it's pretty
> likely that partitioned index attachment has a problem here.

Hm, it looks like

	ALTER INDEX public.at_partitioned_a_idx
	    ATTACH PARTITION public.at_part_1_a_idx;

takes these locks:

	       relation       |        mode
	----------------------+---------------------
	 at_part_1            | AccessShareLock
	 at_partitioned       | AccessShareLock
	 at_part_1_a_idx      | AccessExclusiveLock
	 at_partitioned_a_idx | AccessExclusiveLock

I'm not aware of exactly what this does to catalog entries, but is it
*really* safe to have only AccessShareLock on the tables?  That sounds
like a bit of wishful thinking :-(.

In any case, the exclusive locks on the indexes are likely sufficient
to block other operations on the tables (maybe leading to deadlocks),
so I'm inclined to think that yeah, parallel restore should refrain
from running this in parallel with other DDL on either table.

			regards, tom lane
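[For reference, a rough reconstruction of that test.  The table and
index definitions are assumed for illustration; the regression
database's actual at_partitioned objects differ.]

	CREATE TABLE at_partitioned (a int) PARTITION BY RANGE (a);
	CREATE TABLE at_part_1 PARTITION OF at_partitioned
	    FOR VALUES FROM (0) TO (10);
	CREATE INDEX at_part_1_a_idx ON at_part_1 (a);
	-- ON ONLY leaves the parent index unattached (invalid) for now:
	CREATE INDEX at_partitioned_a_idx ON ONLY at_partitioned (a);
	BEGIN;
	ALTER INDEX at_partitioned_a_idx ATTACH PARTITION at_part_1_a_idx;
	SELECT c.relname, l.mode
	  FROM pg_locks l
	  JOIN pg_class c ON c.oid = l.relation
	 WHERE l.pid = pg_backend_pid()
	   AND l.locktype = 'relation'
	   AND c.relname LIKE 'at_part%';
	ROLLBACK;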