Thread: Should pg_dump dump larger tables first?
All,

It's perhaps not the ideal time for a discussion, but if I thought it would turn into a long discussion then I'd probably not post this, given the current timing in the release cycle.

This is something I thought of while doing a restore of a 40ish GB database which has a few hundred smallish tables of various sizes, up to about 1.5 million records, then a handful of larger tables containing 20-70 million records. During the restore (which was running 4 separate jobs), I was polling SELECT query FROM pg_stat_activity to find out the progress of the restore. I noticed that there were now fewer than 4 jobs running and pg_restore was busy doing COPY into some of the 20-70 million record tables.

If pg_dump were to still follow the dependencies of objects, would there be any reason why it shouldn't back up larger tables first? This should then allow pg_restore to balance the smaller tables across separate jobs at the end of the restore instead of having CPUs sitting idle while, say, 1 job is busy on a big table. Of course this would not improve things for all workloads, but I hardly think that a database with a high number of smallish tables and a small number of large tables is unusual.

If there was consensus that it might be a good idea to craft up a patch to test whether this is worth it, then I'd be willing to give it a go.

Some of the things I thought about but did not have an answer for:

1. Would it be enough just to check the number of blocks in each relation, or would it be better to look at the statistics to estimate the size of the table when it's restored, minus the dead tuples?

2. Would it be a good idea to add an extra pg_dump option for this, or just make it the default for all dumps that contain table data?

Any thoughts on this are welcome.

Regards

David Rowley
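(For concreteness, the block-count check mentioned in point 1 can be approximated with a query like the one below. This is only a sketch of the idea, not what a patch would actually do, and the schema filter is purely illustrative:)

-- List ordinary tables from largest to smallest by on-disk block count.
-- relpages is only an estimate maintained by VACUUM/ANALYZE, so a real
-- implementation might prefer pg_relation_size() or the statistics instead.
SELECT c.oid::regclass AS table_name,
       c.relpages      AS blocks,
       pg_size_pretty(pg_relation_size(c.oid::regclass)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.relpages DESC;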
"David Rowley" <dgrowleyml@gmail.com> writes: > If pg_dump was to still follow the dependencies of objects, would there be > any reason why it shouldn't backup larger tables first? Pretty much every single discussion/complaint about pg_dump's ordering choices has been about making its behavior more deterministic not less so. So I can't imagine such a change would go over well with most folks. Also, it's far from obvious to me that "largest first" is the best rule anyhow; it's likely to be more complicated than that. But anyway, the right place to add this sort of consideration is in pg_restore --parallel, not pg_dump. I don't know how hard it would be for the scheduler algorithm in there to take table size into account, but at least in principle it should be possible to find out the size of the (compressed) table data from examination of the archive file. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Also, it's far from obvious to me that "largest first" is the best rule
> anyhow; it's likely to be more complicated than that.
>
> But anyway, the right place to add this sort of consideration is in
> pg_restore --parallel, not pg_dump. I don't know how hard it would be
> for the scheduler algorithm in there to take table size into account,
> but at least in principle it should be possible to find out the size of
> the (compressed) table data from examination of the archive file.

From some experience with pgloader and loading data in migration processes, often enough the biggest gains are to be had when you load the biggest table in parallel with loading all the little ones. It often works out so that the big table's loading time is not affected, and by the time it's done the rest of the database is done too.

Loading several big'o'tables in parallel tends not to give benefits in the tests I've done so far, but that might be an artefact of Python multithreading; I will do some testing with proper tooling later.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On 1/31/2013 2:06 AM, Dimitri Fontaine wrote:
> Loading several big'o'tables in parallel tend not to give benefits in
> the tests I've done so far, but that might be an artefact of python
> multi threading, I will do some testing with proper tooling later.

or insufficient IO parallelism in your disk hardware, file system, etc.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
On Tue, Jan 29, 2013 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David Rowley" <dgrowleyml@gmail.com> writes:
>> If pg_dump was to still follow the dependencies of objects, would there be
>> any reason why it shouldn't backup larger tables first?
>
> Pretty much every single discussion/complaint about pg_dump's ordering
> choices has been about making its behavior more deterministic not less
> so. So I can't imagine such a change would go over well with most folks.
>
> Also, it's far from obvious to me that "largest first" is the best rule
> anyhow; it's likely to be more complicated than that.

From my experience in the non-database world of processing many files of greatly different sizes in parallel, sorting them so the largest are scheduled first and the smaller ones get "packed" around them is very successful and very easy. I agree that the best rule is surely more complicated than that, but probably so much more so that it will never get implemented.

> But anyway, the right place to add this sort of consideration is in
> pg_restore --parallel, not pg_dump.

Yeah.

Cheers,

Jeff
On Thu, Jan 31, 2013 at 5:06 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Also, it's far from obvious to me that "largest first" is the best rule
>> anyhow; it's likely to be more complicated than that.
>>
>> But anyway, the right place to add this sort of consideration is in
>> pg_restore --parallel, not pg_dump. I don't know how hard it would be
>> for the scheduler algorithm in there to take table size into account,
>> but at least in principle it should be possible to find out the size of
>> the (compressed) table data from examination of the archive file.
>
> From some experiences with pgloader and loading data in migration
> processes, often enough the most gains are to be had when you load the
> biggest table in parallel with loading all the little ones. It often
> makes it so that the big table loading time is not affected, and by the
> time it's done the rest of the database is done too.

We had the idea of doing this with Slony, to try to process subscriptions faster by, yes, indeed, loading large tables first, and throwing reindexing off onto secondary threads. The big win seemed to come specifically from the reindexing aspect; that tends to take a fair bit more time than the indexless COPY.

It would make the subscription process a bit more fragile, and would add quite a bit of development work, for something that didn't seem to be *that* much of a priority, so we never went past the "Gedankenexperiment" of establishing that it seemed feasible.

A side-effect that we didn't have to worry about with Slony, but that would be important for more general use, is what happens to the processing of re-establishing foreign keys. In Slony, we suppress them on subscriber nodes; you'd need to worry about that for general purpose use.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"