Thread: More parallel pg_dump bogosities

More parallel pg_dump bogosities

From: Tom Lane
So I started poking at the idea of sorting by size during parallel
restore instead of sorting pg_dump's TOC that way.  While investigating
just where to do that, I discovered that, using the regression database
as test case, restore_toc_entries_parallel() finds these objects to
be *immediately* ready to restore at the start of the parallel phase:

all TABLE DATA objects --- as expected
all SEQUENCE SET objects --- as expected
BLOBS --- as expected
CONSTRAINT idxpart_another idxpart_another_pkey
INDEX mvtest_aa
INDEX mvtest_tm_type
INDEX mvtest_tvmm_expr
INDEX mvtest_tvmm_pred
ROW SECURITY ec1
ROW SECURITY rls_tbl
ROW SECURITY rls_tbl_force

I wasn't expecting any POST_DATA objects to be ready at this point,
so I dug into the reasons why these other ones are ready, and found
that:

idxpart_another_pkey is an index on a partitioned table (new feature
in v11).  According to the dump archive, it has a dependency on the
partitioned table.  Normally, repoint_table_dependencies() would change
an index's table dependency to reference the table's TABLE DATA item,
preventing it from being restored before the data is loaded.  But a
partitioned table has no TABLE DATA item, so that doesn't happen.
I guess this is okay, really, but it's a bit surprising.
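
For reference, the shape of that case is roughly this (a sketch, not
necessarily the exact regression-test DDL):

CREATE TABLE idxpart_another (a int, b int, PRIMARY KEY (a, b))
  PARTITION BY RANGE (a);
-- The pkey's dependency points at idxpart_another itself; since a
-- partitioned table has no TABLE DATA item, repoint_table_dependencies()
-- leaves the dependency alone and the constraint is ready immediately.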

The other four indexes are on materialized views, which likewise don't
have TABLE DATA items.  This means that when restoring materialized
views, we make their indexes before we REFRESH the matviews.  I guess
that's probably functionally okay (the same thing happens in non-parallel
restores) but it's leaving some parallelism on the table, because it means
more work gets crammed into the REFRESH action.  Maybe somebody would like
to fix that.  I'm not volunteering right now, though.
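
To be concrete, the effective order today is (sketching, not exact
archive contents):

CREATE MATERIALIZED VIEW mvtest_tm AS SELECT ... WITH NO DATA;
CREATE UNIQUE INDEX mvtest_tm_type ON mvtest_tm (type);  -- matview still empty
REFRESH MATERIALIZED VIEW mvtest_tm;  -- rows inserted and index maintained here

Doing the REFRESH first would let each CREATE INDEX run as its own
parallelizable job afterwards.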

And lastly, the ROW SECURITY items are ready because they are not marked
with any dependency at all, none, nada.  This seems bad.  In principle
it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
practice that can't happen today, because CREATE TABLE commands get
emitted before we've switched into parallel restore mode at all.  But it's
definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
we've restored the table's data.  Won't that break things?

I think this is easy enough to fix, just force a dependency on the table
to be attached to a ROW SECURITY item; but I wanted to confirm my
conclusion that we need one.

            regards, tom lane


Re: More parallel pg_dump bogosities

From: Stephen Frost
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> So I started poking at the idea of sorting by size during parallel
> restore instead of sorting pg_dump's TOC that way.  While investigating
> just where to do that, I discovered that, using the regression database
> as test case, restore_toc_entries_parallel() finds these objects to
> be *immediately* ready to restore at the start of the parallel phase:
>
> all TABLE DATA objects --- as expected
> all SEQUENCE SET objects --- as expected
> BLOBS --- as expected
> CONSTRAINT idxpart_another idxpart_another_pkey
> INDEX mvtest_aa
> INDEX mvtest_tm_type
> INDEX mvtest_tvmm_expr
> INDEX mvtest_tvmm_pred
> ROW SECURITY ec1
> ROW SECURITY rls_tbl
> ROW SECURITY rls_tbl_force
>
> I wasn't expecting any POST_DATA objects to be ready at this point,
> so I dug into the reasons why these other ones are ready, and found
> that:
>
> idxpart_another_pkey is an index on a partitioned table (new feature
> in v11).  According to the dump archive, it has a dependency on the
> partitioned table.  Normally, repoint_table_dependencies() would change
> an index's table dependency to reference the table's TABLE DATA item,
> preventing it from being restored before the data is loaded.  But a
> partitioned table has no TABLE DATA item, so that doesn't happen.
> I guess this is okay, really, but it's a bit surprising.
>
> The other four indexes are on materialized views, which likewise don't
> have TABLE DATA items.  This means that when restoring materialized
> views, we make their indexes before we REFRESH the matviews.  I guess
> that's probably functionally okay (the same thing happens in non-parallel
> restores) but it's leaving some parallelism on the table, because it means
> more work gets crammed into the REFRESH action.  Maybe somebody would like
> to fix that.  I'm not volunteering right now, though.

Hrmpf, I agree, that certainly doesn't seem ideal.

> And lastly, the ROW SECURITY items are ready because they are not marked
> with any dependency at all, none, nada.  This seems bad.  In principle
> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> practice that can't happen today, because CREATE TABLE commands get
> emitted before we've switched into parallel restore mode at all.  But it's
> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> we've restored the table's data.  Won't that break things?

We certainly wouldn't want the ROW SECURITY items to be emitted before
the table itself; that wouldn't work.  Emitting ENABLE RLS before
restoring the table's data shouldn't actually break anything, though, as
the owner of the table (which is who we're restoring the data as at
that point, right?) will bypass RLS.  Now, if what's actually set is
FORCE RLS, then we may have an issue if that is emitted before the table
data since that would cause RLS to be in effect even if we're the owner
of the table.
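
To illustrate the difference (a quick sketch, run as a non-superuser
table owner):

CREATE TABLE rls_tbl (a int);
ALTER TABLE rls_tbl ENABLE ROW LEVEL SECURITY;
INSERT INTO rls_tbl VALUES (1);  -- ok: the owner bypasses RLS
ALTER TABLE rls_tbl FORCE ROW LEVEL SECURITY;
INSERT INTO rls_tbl VALUES (2);  -- fails: with no policies, default-deny
                                 -- now applies to the owner as well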

> I think this is easy enough to fix, just force a dependency on the table
> to be attached to a ROW SECURITY item; but I wanted to confirm my
> conclusion that we need one.

I do think we should have one.  In an ideal world, I think we'd want:

CREATE TABLE
TABLE DATA
ENABLE RLS
ADD RLS POLICIES
GRANT RIGHTS
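
Spelled out, something like this (a sketch, using a hypothetical table t
and role app_user):

CREATE TABLE t (a int);
COPY t FROM stdin;                           -- TABLE DATA
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
CREATE POLICY t_policy ON t USING (a > 0);
GRANT SELECT ON t TO app_user;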

Though, ultimately, those last two could be flipped, since RLS has a
'default deny' policy if no policies exist on the table but RLS is
enabled.  We really shouldn't issue GRANT statements before ENABLE'ing
RLS though, since that might allow someone to access all the rows in the
table when they really should be limited to only some subset.  This all
only applies in the cases where we aren't running the restore in a
single transaction, of course, but that's implied by talking about
parallel restore.  I'm not sure how much fun it would be to make it all
work that way.  I do think we need the ENABLE RLS bits to depend on the
table's existence, though.

Thanks!

Stephen

Re: More parallel pg_dump bogosities

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> And lastly, the ROW SECURITY items are ready because they are not marked
>> with any dependency at all, none, nada.  This seems bad.  In principle
>> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
>> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
>> practice that can't happen today, because CREATE TABLE commands get
>> emitted before we've switched into parallel restore mode at all.  But it's
>> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
>> we've restored the table's data.  Won't that break things?

> We certainly wouldn't want the ROW SECURITY items to be emitted before
> the table itself; that wouldn't work.  Emitting ENABLE RLS before
> restoring the table's data shouldn't actually break anything, though, as
> the owner of the table (which is who we're restoring the data as at
> that point, right?) will bypass RLS.

Hmm.  We'd typically be running a restore either as superuser or as the
table owner ...

> Now, if what's actually set is
> FORCE RLS, then we may have an issue if that is emitted before the table
> data since that would cause RLS to be in effect even if we're the owner
> of the table.

... but if FORCE RLS is applied to the table, we gotta problem anyway,
because that command is included right with the initial table creation.
Once the ENABLE comes out, inserts will fail, because there are no
policies yet (those *do* have table dependencies that get moved to
point at the TABLE DATA).
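
That is, for such a table, the restore effectively plays out like this
(a sketch of the emitted commands):

CREATE TABLE rls_tbl_force (a int);
ALTER TABLE rls_tbl_force FORCE ROW LEVEL SECURITY;   -- with the table creation
ALTER TABLE rls_tbl_force ENABLE ROW LEVEL SECURITY;  -- no dependency; may come early
COPY rls_tbl_force FROM stdin;   -- fails: no policies yet, and FORCE means
                                 -- default-deny applies even to the owner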

So it's a bug all right, but a sufficiently corner-case one that it's
not too surprising it hasn't been reported.  Even though the ENABLE RLS
item is "ready to restore", it'll still be at the end of the work list,
so you'd need a very high parallel worker count to have much chance of
it coming out before the table's data item gets processed.

>> I think this is easy enough to fix, just force a dependency on the table
>> to be attached to a ROW SECURITY item; but I wanted to confirm my
>> conclusion that we need one.

> I do think we should have one.

I'll do that, but ...

> In an ideal world, I think we'd want:

> CREATE TABLE
> TABLE DATA
> ENABLE RLS
> ADD RLS POLICIES
> GRANT RIGHTS

... as things stand, even with this fix, a parallel restore will emit
CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not sure
how much it matters, though.  The GRANTs should come out last anyway.

            regards, tom lane


Re: More parallel pg_dump bogosities

From: Stephen Frost
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> And lastly, the ROW SECURITY items are ready because they are not marked
> >> with any dependency at all, none, nada.  This seems bad.  In principle
> >> it could mean that parallel restore would try to emit "ALTER TABLE ENABLE
> >> ROW LEVEL SECURITY" before it's created the table :-(.  I think that in
> >> practice that can't happen today, because CREATE TABLE commands get
> >> emitted before we've switched into parallel restore mode at all.  But it's
> >> definitely possible that ENABLE ROW LEVEL SECURITY could be emitted before
> >> we've restored the table's data.  Won't that break things?
>
> > We certainly wouldn't want the ROW SECURITY items to be emitted before
> > the table itself; that wouldn't work.  Emitting ENABLE RLS before
> > restoring the table's data shouldn't actually break anything, though, as
> > the owner of the table (which is who we're restoring the data as at
> > that point, right?) will bypass RLS.
>
> Hmm.  We'd typically be running a restore either as superuser or as the
> table owner ...

Right.

> > Now, if what's actually set is
> > FORCE RLS, then we may have an issue if that is emitted before the table
> > data since that would cause RLS to be in effect even if we're the owner
> > of the table.
>
> ... but if FORCE RLS is applied to the table, we gotta problem anyway,
> because that command is included right with the initial table creation.
> Once the ENABLE comes out, inserts will fail, because there are no
> policies yet (those *do* have table dependencies that get moved to
> point at the TABLE DATA).

Yeah, I don't think the pg_dump aspect was considered very carefully
when we reworked how the 'force' RLS option works.

> So it's a bug all right, but a sufficiently corner-case one that it's
> not too surprising it hasn't been reported.  Even though the ENABLE RLS
> item is "ready to restore", it'll still be at the end of the work list,
> so you'd need a very high parallel worker count to have much chance of
> it coming out before the table's data item gets processed.

Ok.

> >> I think this is easy enough to fix, just force a dependency on the table
> >> to be attached to a ROW SECURITY item; but I wanted to confirm my
> >> conclusion that we need one.
>
> > I do think we should have one.
>
> I'll do that, but ...

Ok.

> > In an ideal world, I think we'd want:
>
> > CREATE TABLE
> > TABLE DATA
> > ENABLE RLS
> > ADD RLS POLICIES
> > GRANT RIGHTS
>
> ... as things stand, even with this fix, a parallel restore will emit
> CREATE POLICY and ENABLE RLS commands in an unpredictable order.  Not sure
> how much it matters, though.  The GRANTs should come out last anyway.

Yeah, as long as GRANTs come out last, it really shouldn't matter when
the ENABLE happens vs. the CREATE POLICY commands, except in the odd
FORCE case.

Thanks!

Stephen

Re: More parallel pg_dump bogosities

From: Tom Lane
... just when you thought it was safe to go back in the water ...

Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
treat POLICY and ROW SECURITY items as requiring exclusive lock on
the referenced table?  Those commands definitely acquire
AccessExclusiveLock in a quick test.
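
The quick test, for the record (a sketch, reusing rls_tbl from upthread):

BEGIN;
CREATE POLICY p ON rls_tbl USING (true);
SELECT mode FROM pg_locks
WHERE pid = pg_backend_pid() AND relation = 'rls_tbl'::regclass;
-- returns AccessExclusiveLock
ROLLBACK;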

I haven't looked hard, but I'm suspicious that other recently-added
dump object types may have been missed here too, and even more
suspicious that we'll forget this again in future.  I wonder if we
shouldn't invert the logic, so that instead of a blacklist of object
types that we assume need exclusive lock, we keep a whitelist of
object types that are known not to (which might be just INDEX,
not sure).  That way, we'd at least be failing in a safe direction.

            regards, tom lane


Re: More parallel pg_dump bogosities

From: Alvaro Herrera
On 2018-Aug-28, Tom Lane wrote:

> ... just when you thought it was safe to go back in the water ...
> 
> Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
> treat POLICY and ROW SECURITY items as requiring exclusive lock on
> the referenced table?  Those commands definitely acquire
> AccessExclusiveLock in a quick test.
> 
> I haven't looked hard, but I'm suspicious that other recently-added
> dump object types may have been missed here too,

I hadn't come across this locking dependency before, so it's pretty
likely that partitioned index attachment has a problem here.

> and even more suspicious that we'll forget this again in future.

... yeah, it seems easy to overlook the need to edit this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: More parallel pg_dump bogosities

From: Tom Lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Aug-28, Tom Lane wrote:
>> Doesn't pg_backup_archiver.c's identify_locking_dependencies() need to
>> treat POLICY and ROW SECURITY items as requiring exclusive lock on
>> the referenced table?  Those commands definitely acquire
>> AccessExclusiveLock in a quick test.

> I hadn't come across this locking dependency before, so it's pretty
> likely that partitioned index attachment has a problem here.

Hm, it looks like

ALTER INDEX public.at_partitioned_a_idx ATTACH PARTITION public.at_part_1_a_idx;

takes these locks:

       relation       |        mode
----------------------+---------------------
 at_part_1            | AccessShareLock
 at_partitioned       | AccessShareLock
 at_part_1_a_idx      | AccessExclusiveLock
 at_partitioned_a_idx | AccessExclusiveLock

I'm not aware of exactly what this does to catalog entries, but is it
*really* safe to have only AccessShareLock on the tables?  That sounds
like a bit of wishful thinking :-(.

In any case, the exclusive locks on the indexes are likely sufficient
to block other operations on the tables (maybe leading to deadlocks),
so I'm inclined to think that yeah, parallel restore should refrain
from running this in parallel with other DDL on either table.

            regards, tom lane