Thread: Idea for improving speed of pg_restore

Idea for improving speed of pg_restore

From
Ron Johnson
Date:
Hi,

While on the topic of "need for in-place upgrades", I got to think-
ing how the pg_restore could be speeded up.

Am I wrong in saying that in the current pg_restore, all of the
indexes are created in serial?

How about this new, multi-threaded way of doing the pg_restore:
0. On the command line, you specify how many threads you want.
1. pg_restore creates the metadata.
2. pg_restore creates a set of streams/children that have the table
   loads and index-creates (PKs and "regular" indexes), where each
   streams does a set of tables.
3. pg_restore waits for all the streams to complete.
4. pg_restore then creates the FKs, stored procedures, triggers,
   etc, etc.

Pardon if this has already been suggested and shot down as impossible
or too difficult, etc.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Why should we not accept all in favor of woman suffrage to our
platform and association even though they be rabid pro-slavery
Democrats."
Susan B. Anthony, _History_of_Woman_Suffrage_
http://www.ifeminists.com/introduction/essays/introduction.html


Re: Idea for improving speed of pg_restore

From
Christopher Browne
Date:
A long time ago, in a galaxy far, far away, ron.l.johnson@cox.net (Ron Johnson) wrote:
> Pardon if this has already been suggested and shot down as impossible
> or too difficult, etc.

None of this is forcibly the problem.

The _big_ problem is that this isn't necessarily going to improve the
speed of the would-be restore.

I have brought servers to their knees by trying to restore multiple
databases at once.

The reason why this isn't likely to work is that the "multitasking"
that you need to maximize is not of the CPUs, but rather of the disk
drives.

Restoring a database involves, for each table:
 1. Reading table data from the source file;
 2. Writing data to the database file for the table;
 3. After that, reading the database file data, and
 4. Writing the sorted bits to the index file.
 5. Along with all of this, HEFTY amounts of updates to WAL.

Ideal performance for this involves having 1. and 2. take place
concurrently, where a sequential set of reads from one disk lead to
more-or-less sequential writes to the "database drive" and to the "WAL
drive."

Multitask that and it is easy for the I/O patterns to go bad, where
one process is trying to sequence writes to one table whilst another
is writing to another, so you thereby have the disk seeking back and
forth between the one and the other.

There might well be some merit to having the capability to work on
several indices on the same table all at once, so that after reading
through the table once, _three_ indices are generated, mostly in
memory, and then consecutively spilled to disk.  But it may be just as
well to let the shared memory cache do the work so that the table gets
read in once, to generate the first index, and then is already in
cache, which will surely help performance for generating the
subsequent indices.

That's a longwinded way of saying that doing multiple concurrent mass
data loads sucks pretty bad.
--
"aa454","@","freenet.carleton.ca"
http://www3.sympatico.ca/cbbrowne/spiritual.html
"If it  can't be abused, it's  not freedom.  A  man may be in  as just
possession of truth as of a city, and yet be forced to surrender."
-- Thomas Browne

Re: Idea for improving speed of pg_restore

From
Tom Lane
Date:
Christopher Browne <cbbrowne@acm.org> writes:
> Restoring a database involves, for each table:
>  1. Reading table data from the source file;
>  2. Writing data to the database file for the table;
>  3. After that, reading the database file data, and
>  4. Writing the sorted bits to the index file.
>  5. Along with all of this, HEFTY amounts of updates to WAL.

An idea that Marc and Jan and I were kicking around last night was to
offer a GUC option to disable writes to WAL.  During initial data load,
you might as well go back to initdb if you have any failure, so why
bother with full ACID compliance?  I'm not sure if the performance
benefit would be great enough to make it worth equipping the system
with such a large-caliber foot-gun, but it's something to think about.

I tend to agree with your doubts about parallelizing index builds,
but there may be scenarios where it's a win; it'd depend on your
relative CPU and disk horsepower.  (Consider fast disk and multiple
not-so-fast CPUs; serial index builds can only use one of the CPUs.)
Question is, is it a big enough win for enough people to make it worth
supporting?

            regards, tom lane

Re: Idea for improving speed of pg_restore

From
"scott.marlowe"
Date:
On Tue, 16 Sep 2003, Tom Lane wrote:

> Christopher Browne <cbbrowne@acm.org> writes:
> > Restoring a database involves, for each table:
> >  1. Reading table data from the source file;
> >  2. Writing data to the database file for the table;
> >  3. After that, reading the database file data, and
> >  4. Writing the sorted bits to the index file.
> >  5. Along with all of this, HEFTY amounts of updates to WAL.
>
> An idea that Marc and Jan and I were kicking around last night was to
> offer a GUC option to disable writes to WAL.  During initial data load,
> you might as well go back to initdb if you have any failure, so why
> bother with full ACID compliance?  I'm not sure if the performance
> benefit would be great enough to make it worth equipping the system
> with such a large-caliber foot-gun, but it's something to think about.
>
> I tend to agree with your doubts about parallelizing index builds,
> but there may be scenarios where it's a win; it'd depend on your
> relative CPU and disk horsepower.  (Consider fast disk and multiple
> not-so-fast CPUs; serial index builds can only use one of the CPUs.)
> Question is, is it a big enough win for enough people to make it worth
> supporting?

I'd say anything that improves postgresql's performance on medium to large
iron is worth at least trying.  I imagine a 12 disk RAID5 with battery
backed RAID would likely be able to use parallelized index builds and even
loads.

Not so sure on whether the foot gun is a good idea.  We already have .22
calibre foot gun (fsync) that makes for pretty big improvements in load
speed, and we see people all the time on General and Performance running
production servers with it turned off.  You know as well as I do the
second we make WAL optional, some people are gonna start running
production servers with it.


Re: Idea for improving speed of pg_restore

From
"Marc G. Fournier"
Date:

On Tue, 16 Sep 2003, scott.marlowe wrote:

> Not so sure on whether the foot gun is a good idea.  We already have .22
> calibre foot gun (fsync) that makes for pretty big improvements in load
> speed, and we see people all the time on General and Performance running
> production servers with it turned off.  You know as well as I do the
> second we make WAL optional, some people are gonna start running
> production servers with it.

it shouldn't be too difficult to put some sort of restrictions on its
usual ... say if WAL disabled, max connections == 2? :)


Re: Idea for improving speed of pg_restore

From
Dennis Gearon
Date:
ROTFLMAO!

That's just the trigger I needed for a belly laugh today. Thanks guys!

Marc G. Fournier wrote:

>On Tue, 16 Sep 2003, scott.marlowe wrote:
>
>
>
>>Not so sure on whether the foot gun is a good idea.  We already have .22
>>calibre foot gun (fsync) that makes for pretty big improvements in load
>>speed, and we see people all the time on General and Performance running
>>production servers with it turned off.  You know as well as I do the
>>second we make WAL optional, some people are gonna start running
>>production servers with it.
>>
>>
>
>it shouldn't be too difficult to put some sort of restrictions on its
>usual ... say if WAL disabled, max connections == 2? :)
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 7: don't forget to increase your free space map settings
>
>
>


Re: Idea for improving speed of pg_restore

From
Ron Johnson
Date:
On Mon, 2003-09-15 at 21:55, Christopher Browne wrote:
> A long time ago, in a galaxy far, far away, ron.l.johnson@cox.net (Ron Johnson) wrote:
> > Pardon if this has already been suggested and shot down as impossible
> > or too difficult, etc.
>
> None of this is forcibly the problem.
>
> The _big_ problem is that this isn't necessarily going to improve the
> speed of the would-be restore.
>
> I have brought servers to their knees by trying to restore multiple
> databases at once.

And I've seen aggregate performance increase by parallelizing table
inserts.  It all depends on the IO sub-system.

I'm sure that, as Scott says, the win would be bigger on systems
with fast IO systems (large RAID systems, or, even, once table-
spaces are implemented, multiple controllers).

> The reason why this isn't likely to work is that the "multitasking"
> that you need to maximize is not of the CPUs, but rather of the disk
> drives.
>
> Restoring a database involves, for each table:
>  1. Reading table data from the source file;

So, how about parallelized pg_dump, that can create multiple
dump files.  Match the number of dump files with the number
of pg_restore threads.

Of course, as mentioned before, the better the IO system, the
more threads it would take to swamp.

>  2. Writing data to the database file for the table;
>  3. After that, reading the database file data, and
>  4. Writing the sorted bits to the index file.
>  5. Along with all of this, HEFTY amounts of updates to WAL.

Soooo, have pg_restore bypass the WAL.  It's one big transaction
anyway, so why write to it anyway.  (Thanks to Tom for this idea.)

> Ideal performance for this involves having 1. and 2. take place
> concurrently, where a sequential set of reads from one disk lead to
> more-or-less sequential writes to the "database drive" and to the "WAL
> drive."
>
> Multitask that and it is easy for the I/O patterns to go bad, where
> one process is trying to sequence writes to one table whilst another
> is writing to another, so you thereby have the disk seeking back and
> forth between the one and the other.

Well, the DBA has to be cluefull enough to match the number of
threads to the capacity of the IO system.

> There might well be some merit to having the capability to work on
> several indices on the same table all at once, so that after reading
> through the table once, _three_ indices are generated, mostly in
> memory, and then consecutively spilled to disk.  But it may be just as

I've been dreaming of that for years...

> well to let the shared memory cache do the work so that the table gets
> read in once, to generate the first index, and then is already in
> cache, which will surely help performance for generating the
> subsequent indices.

If the table is, say, 20GB, then how much would the cache really
matter, since it would get full.

> That's a longwinded way of saying that doing multiple concurrent mass
> data loads sucks pretty bad.

No, it doesn't suck, it's a big win, depending on the "capacity"
of the IO system.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

After listening to many White House, Pentagon & CENTCOM
briefings in both Gulf Wars, it is my firm belief that most
"senior correspondents" either have serious agendas that don't
get shaken by facts, or are dumb as dog feces.


Re: Idea for improving speed of pg_restore

From
"scott.marlowe"
Date:
On Tue, 16 Sep 2003, Marc G. Fournier wrote:

>
>
> On Tue, 16 Sep 2003, scott.marlowe wrote:
>
> > Not so sure on whether the foot gun is a good idea.  We already have .22
> > calibre foot gun (fsync) that makes for pretty big improvements in load
> > speed, and we see people all the time on General and Performance running
> > production servers with it turned off.  You know as well as I do the
> > second we make WAL optional, some people are gonna start running
> > production servers with it.
>
> it shouldn't be too difficult to put some sort of restrictions on its
> usual ... say if WAL disabled, max connections == 2? :)

Even better, restrict logins to superuser only, that way we could still
have more than 2 things happening (think of a machine with a huge number
of disks in a RAID array kinda thing) or only updateable / writeable by
the superuser when in non-WAL mode.


Re: Idea for improving speed of pg_restore

From
"Marc G. Fournier"
Date:
that works too ... basically, adding 'security' for a "load nly" mode
shouldn't be to difficult

On Tue, 16 Sep 2003, scott.marlowe wrote:

> On Tue, 16 Sep 2003, Marc G. Fournier wrote:
>
> >
> >
> > On Tue, 16 Sep 2003, scott.marlowe wrote:
> >
> > > Not so sure on whether the foot gun is a good idea.  We already have .22
> > > calibre foot gun (fsync) that makes for pretty big improvements in load
> > > speed, and we see people all the time on General and Performance running
> > > production servers with it turned off.  You know as well as I do the
> > > second we make WAL optional, some people are gonna start running
> > > production servers with it.
> >
> > it shouldn't be too difficult to put some sort of restrictions on its
> > usual ... say if WAL disabled, max connections == 2? :)
>
> Even better, restrict logins to superuser only, that way we could still
> have more than 2 things happening (think of a machine with a huge number
> of disks in a RAID array kinda thing) or only updateable / writeable by
> the superuser when in non-WAL mode.
>
>

Re: Idea for improving speed of pg_restore

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> Not so sure on whether the foot gun is a good idea.  We already have .22
> calibre foot gun (fsync) that makes for pretty big improvements in load
> speed, and we see people all the time on General and Performance running
> production servers with it turned off.  You know as well as I do the
> second we make WAL optional, some people are gonna start running
> production servers with it.

Well, yeah, they will.  On a noncritical server, is that a sin?  I mean,
if we offer fsync-off, it's not clear to me that offering WAL-off makes
the difference between venial and mortal sin.  Seems to me we're just
putting the weapons in the display case.  fsync = .22, WAL = .45,
but you shoot your foot with either one it's still gonna ruin your day.

            regards, tom lane

Re: Idea for improving speed of pg_restore

From
"Shridhar Daithankar"
Date:
On 17 Sep 2003 at 0:16, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > Not so sure on whether the foot gun is a good idea.  We already have .22
> > calibre foot gun (fsync) that makes for pretty big improvements in load
> > speed, and we see people all the time on General and Performance running
> > production servers with it turned off.  You know as well as I do the
> > second we make WAL optional, some people are gonna start running
> > production servers with it.
>
> Well, yeah, they will.  On a noncritical server, is that a sin?  I mean,
> if we offer fsync-off, it's not clear to me that offering WAL-off makes
> the difference between venial and mortal sin.  Seems to me we're just
> putting the weapons in the display case.  fsync = .22, WAL = .45,
> but you shoot your foot with either one it's still gonna ruin your day.

If somebopdy wants WAL effectively turned off, then can symlink WAL to a
ramdisk that has a GB under the carpet. That would offer all the "benefits" of
WAL being tunred off.

Why this new provision? Is it really that difficult to mount WAL on ramdisk
during reload?

See, we offer non-transaction mode mysql defaults to, already..:-)

Just a thought..

Bye
 Shridhar

--
QOTD:    "I used to go to UCLA, but then my Dad got a job."


Re: Idea for improving speed of pg_restore

From
Tom Lane
Date:
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> On 17 Sep 2003 at 0:16, Tom Lane wrote:
>> Well, yeah, they will.  On a noncritical server, is that a sin?  I mean,
>> if we offer fsync-off, it's not clear to me that offering WAL-off makes
>> the difference between venial and mortal sin.

> If somebopdy wants WAL effectively turned off, then can symlink WAL to
> a ramdisk that has a GB under the carpet. That would offer all the
> "benefits" of WAL being tunred off.

No, because the point of the proposal is to turn off WAL *temporarily*
during initial database load.  Having to move WAL around and then back
again isn't simple, it isn't fool-proof, and it doesn't buy all of the
intended speed savings (the above might save some disk bandwidth but it
avoids none of the CPU expense associated with creating WAL entries).

            regards, tom lane

Re: Idea for improving speed of pg_restore

From
"scott.marlowe"
Date:
On Wed, 17 Sep 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > Not so sure on whether the foot gun is a good idea.  We already have .22
> > calibre foot gun (fsync) that makes for pretty big improvements in load
> > speed, and we see people all the time on General and Performance running
> > production servers with it turned off.  You know as well as I do the
> > second we make WAL optional, some people are gonna start running
> > production servers with it.
>
> Well, yeah, they will.  On a noncritical server, is that a sin?  I mean,
> if we offer fsync-off, it's not clear to me that offering WAL-off makes
> the difference between venial and mortal sin.  Seems to me we're just
> putting the weapons in the display case.  fsync = .22, WAL = .45,
> but you shoot your foot with either one it's still gonna ruin your day.

Now that you mention it, there are some areas where no WAL makes sense,
like proxy caching data stores and such, where errors aren't critical
because you can just grab the original data.

Postgresql, now with MySQL data integrity mode. :-)


Re: Idea for improving speed of pg_restore

From
cbbrowne@acm.org
Date:
tgl@sss.pgh.pa.us (Tom Lane) writes:
> Christopher Browne <cbbrowne@acm.org> writes:
>> Restoring a database involves, for each table:
>>  1. Reading table data from the source file;
>>  2. Writing data to the database file for the table;
>>  3. After that, reading the database file data, and
>>  4. Writing the sorted bits to the index file.
>>  5. Along with all of this, HEFTY amounts of updates to WAL.
>
> An idea that Marc and Jan and I were kicking around last night was
> to offer a GUC option to disable writes to WAL.  During initial data
> load, you might as well go back to initdb if you have any failure,
> so why bother with full ACID compliance?  I'm not sure if the
> performance benefit would be great enough to make it worth equipping
> the system with such a large-caliber foot-gun, but it's something to
> think about.

That is a good thought.

To make it _marginally_ less unsafe, it might be an idea to only
enable this in some variation on single user mode so that it is made
manifestly clear that this is a "recovery" process and that nobody
else should be poking their head in on.

(Or, in keeping with the metaphor, it involves having a large flashing
light that says "Stay off the range - live fire artillery exercise
under way!"  "Make sure this isn't pointed at Mabel or any other
highly trained monkeys or databases you want to keep around!"
<http://cbbrowne.com/info/images/shooting.jpg>  Similarly, if my
brother gets behind the wheel of Certain Large Vehicles, it is good to
keep any lurking trees out of the way...
<http://www3.ns.sympatico.ca/coffee/pics/brdtnk2.gif> :-))

> I tend to agree with your doubts about parallelizing index builds,
> but there may be scenarios where it's a win; it'd depend on your
> relative CPU and disk horsepower.  (Consider fast disk and multiple
> not-so-fast CPUs; serial index builds can only use one of the CPUs.)
> Question is, is it a big enough win for enough people to make it
> worth supporting?

That one would take some benchmarking to see if/where it would be a
win.  And "fast disk and multiple cheezy CPUs" sounds pretty atypical
in these days of Opterons and multiple GHz Intel hardware.  That being
said, I could point to a _perfect_ server to try it out on, with the
problem that it isn't representative of the sort of 'production'
environment where you'd actually care about tuning the builds.
Certainly doesn't fit into the set of things I can afford to kludge
into my schedule :-(.
--
let name="aa454" and tld="freenet.carleton.ca" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/x.html
If nothing ever sticks to Teflon, how do they make Teflon stick to the
pan?

Re: Idea for improving speed of pg_restore

From
Bruce Momjian
Date:
We already have on TODO:

    * Turn off after-change writes if fsync is disabled (?)

I am wondering if we should remove the fsync option completely and just
have a "os_crash_unsafe" option that turns off fsync and WAL, or at
least all the WAL used for os crash recovery --- I think we still need
WAL to recover from dirty buffers that didn't get written to the OS
cache.

---------------------------------------------------------------------------

Tom Lane wrote:
> Christopher Browne <cbbrowne@acm.org> writes:
> > Restoring a database involves, for each table:
> >  1. Reading table data from the source file;
> >  2. Writing data to the database file for the table;
> >  3. After that, reading the database file data, and
> >  4. Writing the sorted bits to the index file.
> >  5. Along with all of this, HEFTY amounts of updates to WAL.
>
> An idea that Marc and Jan and I were kicking around last night was to
> offer a GUC option to disable writes to WAL.  During initial data load,
> you might as well go back to initdb if you have any failure, so why
> bother with full ACID compliance?  I'm not sure if the performance
> benefit would be great enough to make it worth equipping the system
> with such a large-caliber foot-gun, but it's something to think about.
>
> I tend to agree with your doubts about parallelizing index builds,
> but there may be scenarios where it's a win; it'd depend on your
> relative CPU and disk horsepower.  (Consider fast disk and multiple
> not-so-fast CPUs; serial index builds can only use one of the CPUs.)
> Question is, is it a big enough win for enough people to make it worth
> supporting?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073