Re: 8.3 / 8.2.6 restore comparison - Mailing list pgsql-hackers

From Joshua D. Drake
Subject Re: 8.3 / 8.2.6 restore comparison
Msg-id 47C1ADA3.60104@commandprompt.com
In response to Re: 8.3 / 8.2.6 restore comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> At some point, I think we have to bite the bullet and find a way to use 
>> multiple CPUs for a single load. I don't have any good ideas or plans 
>> for that, but hopefully someone does.
> 
> As already mentioned upthread, we could do that today, with zero backend
> changes, by making pg_restore drive multiple sessions.  Now there are
> scenarios where this wouldn't help too much --- eg, a database with only
> one enormous table.  We couldn't parallelize the loading of that table,
> although we could parallelize creation of its indexes.  But for an
> example such as JD is complaining about, we ought to have no trouble
> thrashing his disks into the ground ;-)

Bring it on! But I would note that with the current pg_restore I was 
not able to thrash my disks. I only used four backends (I could have 
used eight), but even with that I was only doing ~45 MB/s. So even if I 
doubled that, I would still have bandwidth to spare. It would still be 
a huge improvement, though.
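
For what it's worth, the multiple-sessions idea can be prototyped today 
from outside the backend by splitting the archive's TOC across several 
pg_restore invocations with -l/-L. A rough sketch in Python (file and 
database names are placeholders; the round-robin split ignores 
dependencies and assumes the schema has already been restored):

    # Prototype of "pg_restore driving multiple sessions": split the
    # TABLE DATA entries of an -Fc archive round-robin across N concurrent
    # pg_restore processes. Illustrative only: it assumes the schema has
    # been restored already and ignores inter-object dependencies, which
    # a real scheduler would have to honor.
    import subprocess

    DUMP, DB, N = "db.dump", "mydb", 4          # placeholder names

    toc = subprocess.run(["pg_restore", "-l", DUMP], check=True,
                         capture_output=True, text=True).stdout
    data_items = [line for line in toc.splitlines()
                  if "TABLE DATA" in line]

    procs = []
    for i in range(N):
        listfile = f"worker{i}.toc"
        with open(listfile, "w") as f:          # per-worker use-list
            f.write("\n".join(data_items[i::N]) + "\n")
        # -L restricts this pg_restore to the TOC items named in listfile
        procs.append(subprocess.Popen(["pg_restore", "-L", listfile,
                                       "-d", DB, DUMP]))

    for p in procs:
        p.wait()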

> 
> What you would need for this is an -Fc or -Ft dump, because a plain
> script output from pg_dump doesn't carry any dependency information,

I would focus on -Fc. Given the limitations of -Ft, this would be a 
good way to start phasing -Ft out.

> much less any index of where in the file different bits of data are.
> Just armwaving, I envision a multiprocess pg_restore like this:
> 
>     * one controller process that tracks the not-yet-loaded
>       TOC items in the dump
>     * N child processes that just transmit a selected TOC item
>       to a connected backend, and then ask the controller
>       what to do next

> Most likely, the bottleneck with this sort of thing would be multiple
> parallel reads from the pg_dump archive file.  Possibly the controller
> process could be taught to schedule COPY and CREATE INDEX operations
> so that not too many processes are trying to read lots of archive
> data at the same time.

A less hackerish and more DBA-oriented bottleneck will be limiting the 
number of backends created for the restore. We don't really want more 
than one backend per CPU; otherwise we just start context switching.
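
To make that concrete, here is a minimal sketch of the controller/worker 
shape Tom describes, with the worker count capped at the CPU count. The 
TOC listing comes from pg_restore -l (one visible slice of the metadata 
an -Fc archive carries and a plain script doesn't); the scheduling is 
deliberately naive, with none of the COPY-vs-CREATE INDEX grouping 
mentioned above, and all names are placeholders:

    # Controller/worker sketch: the parent tracks not-yet-loaded TOC items
    # in a queue; N children each take one item, hand it to a backend by
    # running pg_restore -L with a one-entry use-list, then come back for
    # more. The worker count is capped at the CPU count so we don't just
    # context-switch. Naive on purpose: no dependency ordering and no
    # COPY/CREATE INDEX grouping.
    import multiprocessing as mp
    import os
    import subprocess
    import tempfile

    DUMP, DB = "db.dump", "mydb"                # placeholder names

    def worker(queue):
        while True:
            item = queue.get()
            if item is None:                    # controller: no work left
                return
            with tempfile.NamedTemporaryFile("w", suffix=".toc",
                                             delete=False) as f:
                f.write(item + "\n")            # one-entry use-list
            subprocess.run(["pg_restore", "-L", f.name, "-d", DB, DUMP],
                           check=True)
            os.unlink(f.name)

    if __name__ == "__main__":
        toc = subprocess.run(["pg_restore", "-l", DUMP], check=True,
                             capture_output=True, text=True).stdout
        items = [l for l in toc.splitlines()
                 if l and not l.startswith(";")]    # ";" lines are comments

        queue = mp.Queue()
        for item in items:
            queue.put(item)

        nworkers = max(1, min(os.cpu_count() or 1, len(items)))
        for _ in range(nworkers):
            queue.put(None)                     # one stop marker per worker

        workers = [mp.Process(target=worker, args=(queue,))
                   for _ in range(nworkers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()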

Sincerely,

Joshua D. Drake

