Re: directory archive format for pg_dump - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: directory archive format for pg_dump
Msg-id 4D0A5E56.1080109@enterprisedb.com
In response to Re: directory archive format for pg_dump  (Joachim Wieland <joe@mcknight.de>)
Responses Re: directory archive format for pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 16.12.2010 20:33, Joachim Wieland wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> As soon as we have parallel pg_dump, the next big thing is going to be
>> parallel dump of the same table using multiple processes. Perhaps we should
>> prepare for that in the directory archive format, by allowing the data of a
>> single table to be split into multiple files. That way parallel pg_dump is
>> simple, you just split the table in chunks of roughly the same size, say
>> 10GB each, and launch a process for each chunk, writing to a separate file.
>
> How exactly would you "just split the table in chunks of roughly the
> same size" ?

Check pg_class.relpages, and divide that evenly across the processes. 
That should be good enough.
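For illustration, the even split Heikki suggests could look like the sketch below. This is a hypothetical driver-side helper, not pg_dump code; the function name and interface are assumptions. It takes a table's pg_class.relpages value and a worker count, and returns contiguous page ranges of nearly equal size.

```python
# Hypothetical sketch: divide a table's relpages evenly across N worker
# processes, as suggested above. Each worker would dump one page range
# to its own file in the directory archive.

def split_pages(relpages, nworkers):
    """Return (first_page, last_page) ranges covering pages 0..relpages-1,
    one range per worker, sizes differing by at most one page."""
    ranges = []
    base = relpages // nworkers       # pages every worker gets
    extra = relpages % nworkers       # leftover pages, one each to the first workers
    start = 0
    for i in range(nworkers):
        size = base + (1 if i < extra else 0)
        if size == 0:                 # more workers than pages
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_pages(10, 3))  # -> [(0, 3), (4, 6), (7, 9)]
```

With real tables the ranges would be far larger (e.g. a 10GB chunk is roughly 1.3M 8kB pages), but the arithmetic is the same.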

> Which queries should pg_dump send to the backend? If it
> just sends a bunch of WHERE queries, the server would still scan the
> same data several times since each pg_dump client would result in a
> seqscan over the full table.

Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?", 
but we don't support TidScans for ranges. Perhaps we could add that.
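Combining the two ideas, each page range from the split could be turned into one such ctid-range query. A hypothetical sketch (the helper name is an assumption, and as noted above the server would first need TidScan support for ranges to execute this without a seqscan):

```python
# Hypothetical: build the per-worker query from a page range. A ctid is a
# '(page, item)' pair; item 1 through a large offset spans the whole page.
# Table name and offset bound are illustrative only.

def chunk_query(table, first_page, last_page):
    return ("SELECT * FROM %s WHERE ctid BETWEEN '(%d,1)' AND '(%d,32767)'"
            % (table, first_page, last_page))

print(chunk_query("big_table", 0, 3))
```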

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

