Re: [PERFORM] Slow BLOBs restoring - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [PERFORM] Slow BLOBs restoring |
Date | |
Msg-id | AANLkTinvRa_XM88FV8vJjjqkRONeNiQiY=Lxg5iARqtU@mail.gmail.com Whole thread Raw |
In response to | Re: [PERFORM] Slow BLOBs restoring (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [PERFORM] Slow BLOBs restoring
|
List | pgsql-hackers |
On Thu, Dec 9, 2010 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vlad Arkhipov <arhipov@dc.baikal.ru> writes: >> 08.12.2010 22:46, Tom Lane writes: >>> Are you by any chance restoring from an 8.3 or older pg_dump file made >>> on Windows? If so, it's a known issue. > >> No, I tried Linux only. > > OK, then it's not the missing-data-offsets issue. > >> I think you can reproduce it. First I created a database full of many >> BLOBs on Postres 8.4.5. Then I created a dump: > > Oh, you should have said how many was "many". I had tried with several > thousand large blobs yesterday and didn't see any problem. However, > with several hundred thousand small blobs, indeed it gets pretty slow > as soon as you use -j. > > oprofile shows all the time is going into reduce_dependencies during the > first loop in restore_toc_entries_parallel (ie, before we've actually > started doing anything in parallel). The reason is that for each blob, > we're iterating through all of the several hundred thousand TOC entries, > uselessly looking for anything that depends on the blob. And to add > insult to injury, because the blobs are all marked as SECTION_PRE_DATA, > we don't get to parallelize at all. I think we won't get to parallelize > the blob data restoration either, since all the blob data is hidden in a > single TOC entry :-( > > So the short answer is "don't bother to use -j in a mostly-blobs restore, > becausw it isn't going to help you in 9.0". > > One fairly simple, if ugly, thing we could do about this is skip calling > reduce_dependencies during the first loop if the TOC object is a blob; > effectively assuming that nothing could depend on a blob. But that does > nothing about the point that we're failing to parallelize blob > restoration. Right offhand it seems hard to do much about that without > some changes to the archive representation of blobs. Some things that > might be worth looking at for 9.1: > > * Add a flag to TOC objects saying "this object has no dependencies", > to provide a generalized and principled way to skip the > reduce_dependencies loop. This is only a good idea if pg_dump knows > that or can cheaply determine it at dump time, but I think it can. > > * Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them > parallelizable. Also break the BLOBS data item apart into an item per > BLOB, so that that part's parallelizable. Maybe we should combine the > metadata and data for each blob into one TOC item --- if we don't, it > seems like we need a dependency, which will put us back behind the > eight-ball. I think the reason it's like this is we didn't originally > have a separate TOC item per blob; but now that we added that to support > per-blob ACL data, the monolithic BLOBS item seems pretty pointless. > (Another thing that would have to be looked at here is the dependency > between a BLOB and any BLOB COMMENT for it.) > > Thoughts? Is there any use case for restoring a BLOB but not the BLOB COMMENT or BLOB ACLs? Can we just smush everything together into one section? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: