Re: Idea for improving speed of pg_restore - Mailing list pgsql-general

From Christopher Browne
Subject Re: Idea for improving speed of pg_restore
Date
Msg-id m3wuc9saqn.fsf@wolfe.cbbrowne.com
In response to Idea for improving speed of pg_restore  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: Idea for improving speed of pg_restore  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Idea for improving speed of pg_restore  (Ron Johnson <ron.l.johnson@cox.net>)
List pgsql-general
A long time ago, in a galaxy far, far away, ron.l.johnson@cox.net (Ron Johnson) wrote:
> Pardon if this has already been suggested and shot down as impossible
> or too difficult, etc.

None of this is necessarily impossible.

The _big_ problem is that this isn't necessarily going to improve the
speed of the would-be restore.

I have brought servers to their knees by trying to restore multiple
databases at once.

The reason why this isn't likely to work is that the "multitasking"
that you need to maximize is not of the CPUs, but rather of the disk
drives.

Restoring a database involves, for each table:
 1. Reading table data from the source file;
 2. Writing data to the database file for the table;
 3. After that, reading the database file data, and
 4. Writing the sorted bits to the index file.
 5. Along with all of this, HEFTY amounts of updates to WAL.

Ideal performance for this involves having 1. and 2. take place
concurrently, where a sequential set of reads from one disk lead to
more-or-less sequential writes to the "database drive" and to the "WAL
drive."
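[As an illustrative aside, not part of the original mail: the usual
way to get a dedicated "WAL drive" is to relocate the WAL directory
onto its own disk via a symlink. The paths and mount point below are
made up for the sketch; run this with the server stopped.]

```shell
# Sketch: relocate WAL to a dedicated disk by symlinking the WAL
# directory (pg_wal in recent releases, pg_xlog in older ones).
# /mnt/waldisk is a hypothetical mount point; adjust paths to taste.
pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data/pg_wal /mnt/waldisk/pg_wal
ln -s /mnt/waldisk/pg_wal /var/lib/pgsql/data/pg_wal
pg_ctl -D /var/lib/pgsql/data start
```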

Multitask that and it is easy for the I/O patterns to go bad: one
process is trying to write sequentially to one table while another is
writing to a different one, so the disk ends up seeking back and
forth between the two.
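[Again an illustrative note, not from the original mail: later
versions of pg_restore grew built-in parallelism for custom-format
dumps, and the seek-thrashing concern above is exactly why the job
count should be sized to the storage, not just to the CPUs.]

```shell
# Sketch: parallel restore of a custom-format dump.
# -j 4 runs four worker processes; on a single spindle this can
# produce exactly the back-and-forth seeking described above,
# so start with a low job count and measure.
pg_restore -j 4 -d mydb mydump.custom
```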

There might well be some merit to having the capability to work on
several indices on the same table all at once, so that after reading
through the table once, _three_ indices are generated, mostly in
memory, and then consecutively spilled to disk.  But it may be just as
well to let the shared memory cache do the work so that the table gets
read in once, to generate the first index, and then is already in
cache, which will surely help performance for generating the
subsequent indices.
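[The consecutive-build pattern amounts to something like the
following; this is my sketch, with made-up table and column names,
not anything from the original mail.]

```shell
# Sketch: build several indices back-to-back after the data load.
# The first CREATE INDEX reads the table from disk; the later ones
# mostly hit the shared-buffer / OS cache, which is the effect
# described above.
psql -d mydb <<'EOF'
CREATE INDEX orders_cust_idx ON orders (customer_id);
CREATE INDEX orders_date_idx ON orders (order_date);
CREATE INDEX orders_stat_idx ON orders (status);
EOF
```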

That's a long-winded way of saying that doing multiple concurrent
mass data loads sucks pretty bad.
--
"aa454","@","freenet.carleton.ca"
http://www3.sympatico.ca/cbbrowne/spiritual.html
"If it  can't be abused, it's  not freedom.  A  man may be in  as just
possession of truth as of a city, and yet be forced to surrender."
-- Thomas Browne
