Thread: Re: [PERFORM] Slow BLOBs restoring

Re: [PERFORM] Slow BLOBs restoring

From

Tom Lane

Date:

09 December 2010, 01:29:03

Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> 08.12.2010 22:46, Tom Lane writes:
>> Are you by any chance restoring from an 8.3 or older pg_dump file made
>> on Windows?  If so, it's a known issue.

> No, I tried Linux only.

OK, then it's not the missing-data-offsets issue.

> I think you can reproduce it. First I created a database full of many
> BLOBs on Postgres 8.4.5. Then I created a dump:

Oh, you should have said how many was "many".  I had tried with several
thousand large blobs yesterday and didn't see any problem.  However,
with several hundred thousand small blobs, indeed it gets pretty slow
as soon as you use -j.

oprofile shows all the time is going into reduce_dependencies during the
first loop in restore_toc_entries_parallel (ie, before we've actually
started doing anything in parallel).  The reason is that for each blob,
we're iterating through all of the several hundred thousand TOC entries,
uselessly looking for anything that depends on the blob.  And to add
insult to injury, because the blobs are all marked as SECTION_PRE_DATA,
we don't get to parallelize at all.  I think we won't get to parallelize
the blob data restoration either, since all the blob data is hidden in a
single TOC entry :-(
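
To illustrate why that scan gets expensive, here is a minimal Python sketch
(not pg_restore's actual C code). The names TocEntry, deps, dep_count and
reduce_dependencies_naive are hypothetical stand-ins; the model assumes each
entry carries a list of the dump IDs it depends on, and that finishing any
one entry triggers a scan over every entry.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    dump_id: int
    deps: list = field(default_factory=list)  # dump IDs this entry depends on
    dep_count: int = 0                        # dependencies not yet satisfied

    def __post_init__(self):
        self.dep_count = len(self.deps)


def reduce_dependencies_naive(toc, finished):
    """Scan every TOC entry for dependencies on `finished`.

    Returns the number of entries visited, to make the cost visible.
    """
    visited = 0
    for te in toc:
        visited += 1
        for dep in te.deps:
            if dep == finished.dump_id:
                te.dep_count -= 1
    return visited


def restore_all_naive(toc):
    """Finish entries in order; the total work is N scans of N entries."""
    return sum(reduce_dependencies_naive(toc, te) for te in toc)


# 1000 blobs that nothing depends on: every scan is pure overhead.
blobs = [TocEntry(dump_id=i) for i in range(1, 1001)]
total_visits = restore_all_naive(blobs)
print(total_visits)
```

With N entries the loop visits N * N entries in total, which is why several
hundred thousand small blobs hurt while several thousand large ones do not.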

So the short answer is "don't bother to use -j in a mostly-blobs restore,
because it isn't going to help you in 9.0".

One fairly simple, if ugly, thing we could do about this is skip calling
reduce_dependencies during the first loop if the TOC object is a blob;
effectively assuming that nothing could depend on a blob.  But that does
nothing about the point that we're failing to parallelize blob
restoration.  Right offhand it seems hard to do much about that without
some changes to the archive representation of blobs.  Some things that
might be worth looking at for 9.1:

* Add a flag to TOC objects saying "this object has no dependencies",
to provide a generalized and principled way to skip the
reduce_dependencies loop.  This is only a good idea if pg_dump knows
that or can cheaply determine it at dump time, but I think it can.

* Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them
parallelizable.  Also break the BLOBS data item apart into an item per
BLOB, so that that part's parallelizable.  Maybe we should combine the
metadata and data for each blob into one TOC item --- if we don't, it
seems like we need a dependency, which will put us back behind the
eight-ball.  I think the reason it's like this is we didn't originally
have a separate TOC item per blob; but now that we added that to support
per-blob ACL data, the monolithic BLOBS item seems pretty pointless.
(Another thing that would have to be looked at here is the dependency
between a BLOB and any BLOB COMMENT for it.)

Thoughts?

            regards, tom lane

Re: [PERFORM] Slow BLOBs restoring

From

Robert Haas

Date:

09 December 2010, 09:05:41

On Thu, Dec 9, 2010 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
>> 08.12.2010 22:46, Tom Lane writes:
>>> Are you by any chance restoring from an 8.3 or older pg_dump file made
>>> on Windows?  If so, it's a known issue.
>
>> No, I tried Linux only.
>
> OK, then it's not the missing-data-offsets issue.
>
>> I think you can reproduce it. First I created a database full of many
>> BLOBs on Postgres 8.4.5. Then I created a dump:
>
> Oh, you should have said how many was "many".  I had tried with several
> thousand large blobs yesterday and didn't see any problem.  However,
> with several hundred thousand small blobs, indeed it gets pretty slow
> as soon as you use -j.
>
> oprofile shows all the time is going into reduce_dependencies during the
> first loop in restore_toc_entries_parallel (ie, before we've actually
> started doing anything in parallel).  The reason is that for each blob,
> we're iterating through all of the several hundred thousand TOC entries,
> uselessly looking for anything that depends on the blob.  And to add
> insult to injury, because the blobs are all marked as SECTION_PRE_DATA,
> we don't get to parallelize at all.  I think we won't get to parallelize
> the blob data restoration either, since all the blob data is hidden in a
> single TOC entry :-(
>
> So the short answer is "don't bother to use -j in a mostly-blobs restore,
> because it isn't going to help you in 9.0".
>
> One fairly simple, if ugly, thing we could do about this is skip calling
> reduce_dependencies during the first loop if the TOC object is a blob;
> effectively assuming that nothing could depend on a blob.  But that does
> nothing about the point that we're failing to parallelize blob
> restoration.  Right offhand it seems hard to do much about that without
> some changes to the archive representation of blobs.  Some things that
> might be worth looking at for 9.1:
>
> * Add a flag to TOC objects saying "this object has no dependencies",
> to provide a generalized and principled way to skip the
> reduce_dependencies loop.  This is only a good idea if pg_dump knows
> that or can cheaply determine it at dump time, but I think it can.
>
> * Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them
> parallelizable.  Also break the BLOBS data item apart into an item per
> BLOB, so that that part's parallelizable.  Maybe we should combine the
> metadata and data for each blob into one TOC item --- if we don't, it
> seems like we need a dependency, which will put us back behind the
> eight-ball.  I think the reason it's like this is we didn't originally
> have a separate TOC item per blob; but now that we added that to support
> per-blob ACL data, the monolithic BLOBS item seems pretty pointless.
> (Another thing that would have to be looked at here is the dependency
> between a BLOB and any BLOB COMMENT for it.)
>
> Thoughts?

Is there any use case for restoring a BLOB but not the BLOB COMMENT or
BLOB ACLs?  Can we just smush everything together into one section?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PERFORM] Slow BLOBs restoring

From

Tom Lane

Date:

09 December 2010, 10:50:43

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 9, 2010 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them
>> parallelizable.  Also break the BLOBS data item apart into an item per
>> BLOB, so that that part's parallelizable.  Maybe we should combine the
>> metadata and data for each blob into one TOC item --- if we don't, it
>> seems like we need a dependency, which will put us back behind the
>> eight-ball.  I think the reason it's like this is we didn't originally
>> have a separate TOC item per blob; but now that we added that to support
>> per-blob ACL data, the monolithic BLOBS item seems pretty pointless.
>> (Another thing that would have to be looked at here is the dependency
>> between a BLOB and any BLOB COMMENT for it.)

> Is there any use case for restoring a BLOB but not the BLOB COMMENT or
> BLOB ACLs?  Can we just smush everything together into one section?

The ACLs are already part of the main TOC entry for the blob.  As for
comments, I'd want to keep the handling of those the same as they are
for every other kind of object.  But that just begs the question of why
comments are separate TOC entries in the first place.  We could
eliminate this problem if they became fields of object entries across
the board.  Which would be a non-backwards-compatible change in dump
file format, but doing anything about the other issues mentioned above
will require that anyway.

I'm not certain however about whether it's safe to treat the
object-metadata aspects of a blob as SECTION_DATA rather than
SECTION_PRE_DATA.  That will take a bit of investigation.  It seems like
there shouldn't be any fundamental reason for it not to work, but that
doesn't mean there aren't any weird assumptions buried someplace in
pg_dump ...
        regards, tom lane

Re: [PERFORM] Slow BLOBs restoring

From

Tom Lane

Date:

09 December 2010, 11:05:27

I wrote:
> One fairly simple, if ugly, thing we could do about this is skip calling
> reduce_dependencies during the first loop if the TOC object is a blob;
> effectively assuming that nothing could depend on a blob.  But that does
> nothing about the point that we're failing to parallelize blob
> restoration.  Right offhand it seems hard to do much about that without
> some changes to the archive representation of blobs.  Some things that
> might be worth looking at for 9.1:

> * Add a flag to TOC objects saying "this object has no dependencies",
> to provide a generalized and principled way to skip the
> reduce_dependencies loop.  This is only a good idea if pg_dump knows
> that or can cheaply determine it at dump time, but I think it can.

I had further ideas about this part of the problem.  First, there's no
need for a file format change to fix this: parallel restore is already
groveling over all the dependencies in its fix_dependencies step, so it
could count them for itself easily enough.  Second, the real problem
here is that reduce_dependencies processing is O(N^2) in the number of
TOC objects.  Skipping it for blobs, or even for all dependency-free
objects, doesn't make that very much better: the kind of people who
really need parallel restore are still likely to bump into unreasonable
processing time.  I think what we need to do is make fix_dependencies
build a reverse lookup list of all the objects dependent on each TOC
object, so that the searching behavior in reduce_dependencies can be
eliminated outright.  That will take O(N) time and O(N) extra space,
which is a good tradeoff because you won't care if N is small, while if
N is large you have got to have it anyway.
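
A minimal Python sketch of the reverse-lookup idea, for illustration only
(it is not the pg_restore C code, and the names build_reverse_deps,
deps_by_id, rev and dep_count are hypothetical). It assumes each TOC entry
is identified by a dump ID and lists the dump IDs it depends on.

```python
from collections import defaultdict


def build_reverse_deps(deps_by_id):
    """Map each dump ID to the list of dump IDs that depend on it.

    One pass over all entries and their dependency links.
    """
    rev = defaultdict(list)
    for dump_id, deps in deps_by_id.items():
        for dep in deps:
            rev[dep].append(dump_id)
    return rev


def reduce_dependencies(finished_id, rev, dep_count):
    """Decrement only the dependents of `finished_id`.

    Returns the dump IDs whose dependencies are now all satisfied.
    """
    ready = []
    for dependent in rev.get(finished_id, ()):
        dep_count[dependent] -= 1
        if dep_count[dependent] == 0:
            ready.append(dependent)
    return ready


# Hypothetical TOC: entry 3 depends on 1 and 2; entry 4 depends on 3.
deps_by_id = {1: [], 2: [], 3: [1, 2], 4: [3]}
rev = build_reverse_deps(deps_by_id)
dep_count = {dump_id: len(deps) for dump_id, deps in deps_by_id.items()}

print(reduce_dependencies(1, rev, dep_count))  # [] (3 still waits on 2)
print(reduce_dependencies(2, rev, dep_count))  # [3]
print(reduce_dependencies(3, rev, dep_count))  # [4]
```

In this sketch, finishing an entry touches only its own dependents, so an
entry with no dependents (such as a blob) costs a single lookup instead of a
scan over the whole TOC. Building the map costs one pass over the entries
and their dependency links.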

Barring objections, I will do this and back-patch into 9.0.  There is
maybe some case for trying to fix 8.4 as well, but since 8.4 didn't
make a separate TOC entry for each blob, it isn't as exposed to the
problem.  We didn't back-patch the last round of efficiency hacks in
this area, so I'm thinking it's not necessary here either.  Comments?
        regards, tom lane

Re: [PERFORM] Slow BLOBs restoring

From

Robert Haas

Date:

09 December 2010, 11:56:47

On Thu, Dec 9, 2010 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> One fairly simple, if ugly, thing we could do about this is skip calling
>> reduce_dependencies during the first loop if the TOC object is a blob;
>> effectively assuming that nothing could depend on a blob.  But that does
>> nothing about the point that we're failing to parallelize blob
>> restoration.  Right offhand it seems hard to do much about that without
>> some changes to the archive representation of blobs.  Some things that
>> might be worth looking at for 9.1:
>
>> * Add a flag to TOC objects saying "this object has no dependencies",
>> to provide a generalized and principled way to skip the
>> reduce_dependencies loop.  This is only a good idea if pg_dump knows
>> that or can cheaply determine it at dump time, but I think it can.
>
> I had further ideas about this part of the problem.  First, there's no
> need for a file format change to fix this: parallel restore is already
> groveling over all the dependencies in its fix_dependencies step, so it
> could count them for itself easily enough.  Second, the real problem
> here is that reduce_dependencies processing is O(N^2) in the number of
> TOC objects.  Skipping it for blobs, or even for all dependency-free
> objects, doesn't make that very much better: the kind of people who
> really need parallel restore are still likely to bump into unreasonable
> processing time.  I think what we need to do is make fix_dependencies
> build a reverse lookup list of all the objects dependent on each TOC
> object, so that the searching behavior in reduce_dependencies can be
> eliminated outright.  That will take O(N) time and O(N) extra space,
> which is a good tradeoff because you won't care if N is small, while if
> N is large you have got to have it anyway.
>
> Barring objections, I will do this and back-patch into 9.0.  There is
> maybe some case for trying to fix 8.4 as well, but since 8.4 didn't
> make a separate TOC entry for each blob, it isn't as exposed to the
> problem.  We didn't back-patch the last round of efficiency hacks in
> this area, so I'm thinking it's not necessary here either.  Comments?

Ah, that sounds like a much cleaner solution.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PERFORM] Slow BLOBs restoring

From

Andrew Dunstan

Date:

09 December 2010, 12:01:51


On 12/09/2010 10:05 AM, Tom Lane wrote:
> I think what we need to do is make fix_dependencies
> build a reverse lookup list of all the objects dependent on each TOC
> object, so that the searching behavior in reduce_dependencies can be
> eliminated outright.  That will take O(N) time and O(N) extra space,
> which is a good tradeoff because you won't care if N is small, while if
> N is large you have got to have it anyway.
>
> Barring objections, I will do this and back-patch into 9.0.  There is
> maybe some case for trying to fix 8.4 as well, but since 8.4 didn't
> make a separate TOC entry for each blob, it isn't as exposed to the
> problem.  We didn't back-patch the last round of efficiency hacks in
> this area, so I'm thinking it's not necessary here either.  Comments?


Sounds good. Re 8.4: at a pinch people could probably use the 9.0
pg_restore with their 8.4 dump.

cheers

andrew