Re: no XLOG during COPY? - Mailing list pgsql-hackers

From Russell Smith
Subject Re: no XLOG during COPY?
Date
Msg-id 48CF81B8.9050206@pws.com.au
In response to Re: no XLOG during COPY?  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: no XLOG during COPY?  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan wrote:
> [snip]
>>  
>
> Er, who doesn't know what for certain, exactly? pg_restore will
> certainly know that it has created the table in another session and
> can thus safely truncate the table in the same transaction as the data
> load.
>
> cheers
>
> andrew
>
I'm confused about why table definition and data can't be loaded in the
same backend and transaction.  Can somebody explain that?

All items in a chain like  A -> B -> C -> D  should be loaded in
the same transaction, as they are serially dependent.  I can't think of a
case where loading table data requires more than just the table itself.
Foreign keys might produce this situation, but if all tables are loaded
with their data I can't see how it can happen, since the referenced
tables must be loaded before the referencing table.  But then I think
these constraints are loaded at the end anyway.

The first cut of this may not have the dependency resolution smarts to
work out how best to group restore items so they can be sent to a
backend together.  My research into how the directed graph dependency
information is stored should allow for dishing out the data to backends
in the best possible way.  But currently there is no graph as such, just
a serial list of items that are safe to load.  Producing the graph will
give a better idea of the maximum concurrency, based on which items
depend on each other.  But the graph has to be built from the dependency
information that's stored.
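
As a rough illustration only (this is not pg_restore code), the idea of
turning stored dependency information into a graph and reading the
available concurrency off it can be sketched in Python.  The item names
and the `deps` mapping are hypothetical; each key lists the items it
depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def concurrency_levels(deps):
    """Group items into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # items whose dependencies are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels


# Hypothetical restore items: one chain A -> B -> C -> D and one chain X -> Y.
deps = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "X": [],
    "Y": ["X"],
}

levels = concurrency_levels(deps)
# The widest batch is an upper bound on how many backends could be busy at once.
max_parallel = max(len(level) for level in levels)
```

With this made-up input the widest batch holds two items, so no more
than two backends could usefully work at the same time.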

Is it also feasible to have the -1 (single transaction) option
complete the largest possible work unit inside a single transaction?
This means there would be 1 transaction per backend work unit, e.g. (A, B,
C, D in the above).  I don't know if indexes can skip WAL if they are
in the table creation transaction, but that would seem like another win
if they were added at the same time as the table.  That does work
against the idea of running all of the index creation statements in
parallel to get the benefit of synchronized scans.  I don't know what's
going to be the biggest win on big hardware as I don't have any.  Just
something to think about.
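
A minimal sketch of the "one transaction per work unit" idea, assuming a
work unit is a set of items connected by dependencies.  The function
names, the `deps` mapping and the SQL statements are all hypothetical;
nothing here reflects how pg_restore is actually implemented.

```python
def work_units(deps):
    """Partition items into independent units (connected components).

    Assumes every item appears as a key in deps and that the key order
    is already a safe serial load order.
    """
    parent = {item: item for item in deps}

    def find(item):
        # Union-find lookup with path halving.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for item, needs in deps.items():
        for need in needs:
            parent[find(item)] = find(need)

    units = {}
    for item in deps:
        units.setdefault(find(item), []).append(item)
    return list(units.values())


def as_transaction(statements):
    """Wrap the statements of one work unit in a single transaction."""
    return "\n".join(["BEGIN;", *statements, "COMMIT;"])


# Hypothetical restore items: key depends on the listed items.
deps = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": [], "Y": ["X"]}
units = work_units(deps)

script = as_transaction([
    "CREATE TABLE a (id integer);",
    "CREATE INDEX a_id_idx ON a (id);",
])
```

Each resulting unit could then be handed to one backend and run as one
transaction, which is the grouping (A, B, C, D) described above.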

Thanks

Russell Smith

