Thread: Idea for improving speed of pg_restore
Hi, While on the topic of "need for in-place upgrades", I got to think- ing how the pg_restore could be speeded up. Am I wrong in saying that in the current pg_restore, all of the indexes are created in serial? How about this new, multi-threaded way of doing the pg_restore: 0. On the command line, you specify how many threads you want. 1. pg_restore creates the metadata. 2. pg_restore creates a set of streams/children that have the table loads and index-creates (PKs and "regular" indexes), where each streams does a set of tables. 3. pg_restore waits for all the streams to complete. 4. pg_restore then creates the FKs, stored procedures, triggers, etc, etc. Pardon if this has already been suggested and shot down as impossible or too difficult, etc. -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA "Why should we not accept all in favor of woman suffrage to our platform and association even though they be rabid pro-slavery Democrats." Susan B. Anthony, _History_of_Woman_Suffrage_ http://www.ifeminists.com/introduction/essays/introduction.html
A long time ago, in a galaxy far, far away, ron.l.johnson@cox.net (Ron Johnson) wrote: > Pardon if this has already been suggested and shot down as impossible > or too difficult, etc. None of this is forcibly the problem. The _big_ problem is that this isn't necessarily going to improve the speed of the would-be restore. I have brought servers to their knees by trying to restore multiple databases at once. The reason why this isn't likely to work is that the "multitasking" that you need to maximize is not of the CPUs, but rather of the disk drives. Restoring a database involves, for each table: 1. Reading table data from the source file; 2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL. Ideal performance for this involves having 1. and 2. take place concurrently, where a sequential set of reads from one disk lead to more-or-less sequential writes to the "database drive" and to the "WAL drive." Multitask that and it is easy for the I/O patterns to go bad, where one process is trying to sequence writes to one table whilst another is writing to another, so you thereby have the disk seeking back and forth between the one and the other. There might well be some merit to having the capability to work on several indices on the same table all at once, so that after reading through the table once, _three_ indices are generated, mostly in memory, and then consecutively spilled to disk. But it may be just as well to let the shared memory cache do the work so that the table gets read in once, to generate the first index, and then is already in cache, which will surely help performance for generating the subsequent indices. That's a longwinded way of saying that doing multiple concurrent mass data loads sucks pretty bad. -- "aa454","@","freenet.carleton.ca" http://www3.sympatico.ca/cbbrowne/spiritual.html "If it can't be abused, it's not freedom. A man may be in as just possession of truth as of a city, and yet be forced to surrender." -- Thomas Browne
Christopher Browne <cbbrowne@acm.org> writes: > Restoring a database involves, for each table: > 1. Reading table data from the source file; > 2. Writing data to the database file for the table; > 3. After that, reading the database file data, and > 4. Writing the sorted bits to the index file. > 5. Along with all of this, HEFTY amounts of updates to WAL. An idea that Marc and Jan and I were kicking around last night was to offer a GUC option to disable writes to WAL. During initial data load, you might as well go back to initdb if you have any failure, so why bother with full ACID compliance? I'm not sure if the performance benefit would be great enough to make it worth equipping the system with such a large-caliber foot-gun, but it's something to think about. I tend to agree with your doubts about parallelizing index builds, but there may be scenarios where it's a win; it'd depend on your relative CPU and disk horsepower. (Consider fast disk and multiple not-so-fast CPUs; serial index builds can only use one of the CPUs.) Question is, is it a big enough win for enough people to make it worth supporting? regards, tom lane
On Tue, 16 Sep 2003, Tom Lane wrote: > Christopher Browne <cbbrowne@acm.org> writes: > > Restoring a database involves, for each table: > > 1. Reading table data from the source file; > > 2. Writing data to the database file for the table; > > 3. After that, reading the database file data, and > > 4. Writing the sorted bits to the index file. > > 5. Along with all of this, HEFTY amounts of updates to WAL. > > An idea that Marc and Jan and I were kicking around last night was to > offer a GUC option to disable writes to WAL. During initial data load, > you might as well go back to initdb if you have any failure, so why > bother with full ACID compliance? I'm not sure if the performance > benefit would be great enough to make it worth equipping the system > with such a large-caliber foot-gun, but it's something to think about. > > I tend to agree with your doubts about parallelizing index builds, > but there may be scenarios where it's a win; it'd depend on your > relative CPU and disk horsepower. (Consider fast disk and multiple > not-so-fast CPUs; serial index builds can only use one of the CPUs.) > Question is, is it a big enough win for enough people to make it worth > supporting? I'd say anything that improves postgresql's performance on medium to large iron is worth at least trying. I imagine a 12 disk RAID5 with battery backed RAID would likely be able to use parallelized index builds and even loads. Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
On Tue, 16 Sep 2003, scott.marlowe wrote: > Not so sure on whether the foot gun is a good idea. We already have .22 > calibre foot gun (fsync) that makes for pretty big improvements in load > speed, and we see people all the time on General and Performance running > production servers with it turned off. You know as well as I do the > second we make WAL optional, some people are gonna start running > production servers with it. it shouldn't be too difficult to put some sort of restrictions on its usual ... say if WAL disabled, max connections == 2? :)
ROTFLMAO! That's just the trigger I needed for a belly laugh today. Thanks guys! Marc G. Fournier wrote: >On Tue, 16 Sep 2003, scott.marlowe wrote: > > > >>Not so sure on whether the foot gun is a good idea. We already have .22 >>calibre foot gun (fsync) that makes for pretty big improvements in load >>speed, and we see people all the time on General and Performance running >>production servers with it turned off. You know as well as I do the >>second we make WAL optional, some people are gonna start running >>production servers with it. >> >> > >it shouldn't be too difficult to put some sort of restrictions on its >usual ... say if WAL disabled, max connections == 2? :) > > >---------------------------(end of broadcast)--------------------------- >TIP 7: don't forget to increase your free space map settings > > >
On Mon, 2003-09-15 at 21:55, Christopher Browne wrote: > A long time ago, in a galaxy far, far away, ron.l.johnson@cox.net (Ron Johnson) wrote: > > Pardon if this has already been suggested and shot down as impossible > > or too difficult, etc. > > None of this is forcibly the problem. > > The _big_ problem is that this isn't necessarily going to improve the > speed of the would-be restore. > > I have brought servers to their knees by trying to restore multiple > databases at once. And I've seen aggregate performance increase by parallelizing table inserts. It all depends on the IO sub-system. I'm sure that, as Scott says, the win would be bigger on systems with fast IO systems (large RAID systems, or, even, once table- spaces are implemented, multiple controllers). > The reason why this isn't likely to work is that the "multitasking" > that you need to maximize is not of the CPUs, but rather of the disk > drives. > > Restoring a database involves, for each table: > 1. Reading table data from the source file; So, how about parallelized pg_dump, that can create multiple dump files. Match the number of dump files with the number of pg_restore threads. Of course, as mentioned before, the better the IO system, the more threads it would take to swamp. > 2. Writing data to the database file for the table; > 3. After that, reading the database file data, and > 4. Writing the sorted bits to the index file. > 5. Along with all of this, HEFTY amounts of updates to WAL. Soooo, have pg_restore bypass the WAL. It's one big transaction anyway, so why write to it anyway. (Thanks to Tom for this idea.) > Ideal performance for this involves having 1. and 2. take place > concurrently, where a sequential set of reads from one disk lead to > more-or-less sequential writes to the "database drive" and to the "WAL > drive." > > Multitask that and it is easy for the I/O patterns to go bad, where > one process is trying to sequence writes to one table whilst another > is writing to another, so you thereby have the disk seeking back and > forth between the one and the other. Well, the DBA has to be cluefull enough to match the number of threads to the capacity of the IO system. > There might well be some merit to having the capability to work on > several indices on the same table all at once, so that after reading > through the table once, _three_ indices are generated, mostly in > memory, and then consecutively spilled to disk. But it may be just as I've been dreaming of that for years... > well to let the shared memory cache do the work so that the table gets > read in once, to generate the first index, and then is already in > cache, which will surely help performance for generating the > subsequent indices. If the table is, say, 20GB, then how much would the cache really matter, since it would get full. > That's a longwinded way of saying that doing multiple concurrent mass > data loads sucks pretty bad. No, it doesn't suck, it's a big win, depending on the "capacity" of the IO system. -- ----------------------------------------------------------------- Ron Johnson, Jr. ron.l.johnson@cox.net Jefferson, LA USA After listening to many White House, Pentagon & CENTCOM briefings in both Gulf Wars, it is my firm belief that most "senior correspondents" either have serious agendas that don't get shaken by facts, or are dumb as dog feces.
On Tue, 16 Sep 2003, Marc G. Fournier wrote: > > > On Tue, 16 Sep 2003, scott.marlowe wrote: > > > Not so sure on whether the foot gun is a good idea. We already have .22 > > calibre foot gun (fsync) that makes for pretty big improvements in load > > speed, and we see people all the time on General and Performance running > > production servers with it turned off. You know as well as I do the > > second we make WAL optional, some people are gonna start running > > production servers with it. > > it shouldn't be too difficult to put some sort of restrictions on its > usual ... say if WAL disabled, max connections == 2? :) Even better, restrict logins to superuser only, that way we could still have more than 2 things happening (think of a machine with a huge number of disks in a RAID array kinda thing) or only updateable / writeable by the superuser when in non-WAL mode.
that works too ... basically, adding 'security' for a "load nly" mode shouldn't be to difficult On Tue, 16 Sep 2003, scott.marlowe wrote: > On Tue, 16 Sep 2003, Marc G. Fournier wrote: > > > > > > > On Tue, 16 Sep 2003, scott.marlowe wrote: > > > > > Not so sure on whether the foot gun is a good idea. We already have .22 > > > calibre foot gun (fsync) that makes for pretty big improvements in load > > > speed, and we see people all the time on General and Performance running > > > production servers with it turned off. You know as well as I do the > > > second we make WAL optional, some people are gonna start running > > > production servers with it. > > > > it shouldn't be too difficult to put some sort of restrictions on its > > usual ... say if WAL disabled, max connections == 2? :) > > Even better, restrict logins to superuser only, that way we could still > have more than 2 things happening (think of a machine with a huge number > of disks in a RAID array kinda thing) or only updateable / writeable by > the superuser when in non-WAL mode. > >
"scott.marlowe" <scott.marlowe@ihs.com> writes: > Not so sure on whether the foot gun is a good idea. We already have .22 > calibre foot gun (fsync) that makes for pretty big improvements in load > speed, and we see people all the time on General and Performance running > production servers with it turned off. You know as well as I do the > second we make WAL optional, some people are gonna start running > production servers with it. Well, yeah, they will. On a noncritical server, is that a sin? I mean, if we offer fsync-off, it's not clear to me that offering WAL-off makes the difference between venial and mortal sin. Seems to me we're just putting the weapons in the display case. fsync = .22, WAL = .45, but you shoot your foot with either one it's still gonna ruin your day. regards, tom lane
On 17 Sep 2003 at 0:16, Tom Lane wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > Not so sure on whether the foot gun is a good idea. We already have .22 > > calibre foot gun (fsync) that makes for pretty big improvements in load > > speed, and we see people all the time on General and Performance running > > production servers with it turned off. You know as well as I do the > > second we make WAL optional, some people are gonna start running > > production servers with it. > > Well, yeah, they will. On a noncritical server, is that a sin? I mean, > if we offer fsync-off, it's not clear to me that offering WAL-off makes > the difference between venial and mortal sin. Seems to me we're just > putting the weapons in the display case. fsync = .22, WAL = .45, > but you shoot your foot with either one it's still gonna ruin your day. If somebopdy wants WAL effectively turned off, then can symlink WAL to a ramdisk that has a GB under the carpet. That would offer all the "benefits" of WAL being tunred off. Why this new provision? Is it really that difficult to mount WAL on ramdisk during reload? See, we offer non-transaction mode mysql defaults to, already..:-) Just a thought.. Bye Shridhar -- QOTD: "I used to go to UCLA, but then my Dad got a job."
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes: > On 17 Sep 2003 at 0:16, Tom Lane wrote: >> Well, yeah, they will. On a noncritical server, is that a sin? I mean, >> if we offer fsync-off, it's not clear to me that offering WAL-off makes >> the difference between venial and mortal sin. > If somebopdy wants WAL effectively turned off, then can symlink WAL to > a ramdisk that has a GB under the carpet. That would offer all the > "benefits" of WAL being tunred off. No, because the point of the proposal is to turn off WAL *temporarily* during initial database load. Having to move WAL around and then back again isn't simple, it isn't fool-proof, and it doesn't buy all of the intended speed savings (the above might save some disk bandwidth but it avoids none of the CPU expense associated with creating WAL entries). regards, tom lane
On Wed, 17 Sep 2003, Tom Lane wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > Not so sure on whether the foot gun is a good idea. We already have .22 > > calibre foot gun (fsync) that makes for pretty big improvements in load > > speed, and we see people all the time on General and Performance running > > production servers with it turned off. You know as well as I do the > > second we make WAL optional, some people are gonna start running > > production servers with it. > > Well, yeah, they will. On a noncritical server, is that a sin? I mean, > if we offer fsync-off, it's not clear to me that offering WAL-off makes > the difference between venial and mortal sin. Seems to me we're just > putting the weapons in the display case. fsync = .22, WAL = .45, > but you shoot your foot with either one it's still gonna ruin your day. Now that you mention it, there are some areas where no WAL makes sense, like proxy caching data stores and such, where errors aren't critical because you can just grab the original data. Postgresql, now with MySQL data integrity mode. :-)
tgl@sss.pgh.pa.us (Tom Lane) writes: > Christopher Browne <cbbrowne@acm.org> writes: >> Restoring a database involves, for each table: >> 1. Reading table data from the source file; >> 2. Writing data to the database file for the table; >> 3. After that, reading the database file data, and >> 4. Writing the sorted bits to the index file. >> 5. Along with all of this, HEFTY amounts of updates to WAL. > > An idea that Marc and Jan and I were kicking around last night was > to offer a GUC option to disable writes to WAL. During initial data > load, you might as well go back to initdb if you have any failure, > so why bother with full ACID compliance? I'm not sure if the > performance benefit would be great enough to make it worth equipping > the system with such a large-caliber foot-gun, but it's something to > think about. That is a good thought. To make it _marginally_ less unsafe, it might be an idea to only enable this in some variation on single user mode so that it is made manifestly clear that this is a "recovery" process and that nobody else should be poking their head in on. (Or, in keeping with the metaphor, it involves having a large flashing light that says "Stay off the range - live fire artillery exercise under way!" "Make sure this isn't pointed at Mabel or any other highly trained monkeys or databases you want to keep around!" <http://cbbrowne.com/info/images/shooting.jpg> Similarly, if my brother gets behind the wheel of Certain Large Vehicles, it is good to keep any lurking trees out of the way... <http://www3.ns.sympatico.ca/coffee/pics/brdtnk2.gif> :-)) > I tend to agree with your doubts about parallelizing index builds, > but there may be scenarios where it's a win; it'd depend on your > relative CPU and disk horsepower. (Consider fast disk and multiple > not-so-fast CPUs; serial index builds can only use one of the CPUs.) > Question is, is it a big enough win for enough people to make it > worth supporting? That one would take some benchmarking to see if/where it would be a win. And "fast disk and multiple cheezy CPUs" sounds pretty atypical in these days of Opterons and multiple GHz Intel hardware. That being said, I could point to a _perfect_ server to try it out on, with the problem that it isn't representative of the sort of 'production' environment where you'd actually care about tuning the builds. Certainly doesn't fit into the set of things I can afford to kludge into my schedule :-(. -- let name="aa454" and tld="freenet.carleton.ca" in String.concat "@" [name;tld];; http://www.ntlug.org/~cbbrowne/x.html If nothing ever sticks to Teflon, how do they make Teflon stick to the pan?
We already have on TODO: * Turn off after-change writes if fsync is disabled (?) I am wondering if we should remove the fsync option completely and just have a "os_crash_unsafe" option that turns off fsync and WAL, or at least all the WAL used for os crash recovery --- I think we still need WAL to recover from dirty buffers that didn't get written to the OS cache. --------------------------------------------------------------------------- Tom Lane wrote: > Christopher Browne <cbbrowne@acm.org> writes: > > Restoring a database involves, for each table: > > 1. Reading table data from the source file; > > 2. Writing data to the database file for the table; > > 3. After that, reading the database file data, and > > 4. Writing the sorted bits to the index file. > > 5. Along with all of this, HEFTY amounts of updates to WAL. > > An idea that Marc and Jan and I were kicking around last night was to > offer a GUC option to disable writes to WAL. During initial data load, > you might as well go back to initdb if you have any failure, so why > bother with full ACID compliance? I'm not sure if the performance > benefit would be great enough to make it worth equipping the system > with such a large-caliber foot-gun, but it's something to think about. > > I tend to agree with your doubts about parallelizing index builds, > but there may be scenarios where it's a win; it'd depend on your > relative CPU and disk horsepower. (Consider fast disk and multiple > not-so-fast CPUs; serial index builds can only use one of the CPUs.) > Question is, is it a big enough win for enough people to make it worth > supporting? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073