Thread: no XLOG during COPY?
Back in February, Tom said here:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :

> That defeats a couple of optimizations that
> Simon put in recently. The one for no XLOG during COPY is not too
> hard to see how to re-enable, but I'm not sure what else there was.

Could someone please point me at where this optimization was committed?
I'm having trouble locating it.

Thanks

andrew
On Thu, Sep 11, 2008 at 9:01 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Could someone please point me at where this optimization was committed? I'm
> having trouble locating it.

I think it's this one:
http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php

--
Guillaume
Andrew Dunstan wrote:
> Back in February, Tom said here:
> http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :
>
>> That defeats a couple of optimizations that
>> Simon put in recently. The one for no XLOG during COPY is not too
>> hard to see how to re-enable, but I'm not sure what else there was.
>
> Could someone please point me at where this optimization was committed?
> I'm having trouble locating it.

http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Andrew Dunstan wrote:
>> Back in February, Tom said here:
>> http://archives.postgresql.org/pgsql-hackers/2008-02/msg00963.php :
>>
>>> That defeats a couple of optimizations that
>>> Simon put in recently. The one for no XLOG during COPY is not too
>>> hard to see how to re-enable, but I'm not sure what else there was.
>>
>> Could someone please point me at where this optimization was
>> committed? I'm having trouble locating it.
>
> http://archives.postgresql.org/pgsql-committers/2007-01/msg00296.php

Great, thanks (and also to Guillaume).

It looks to me like the simple way around this issue would be to provide
an option to have pg_restore emit:

    begin; truncate foo; copy foo ... commit;

The truncate will be trivial as there won't be any data or indexes at
that stage anyway.

cheers

andrew
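[For context: the pattern Andrew describes relies on the optimization whereby COPY into a table created or truncated in the same transaction can skip WAL logging when WAL archiving is off. A sketch of what such a pg_restore script might look like, using the hypothetical table name "foo" from the thread:]

```sql
-- Sketch only, not pg_restore's actual output.  With archiving off,
-- a COPY into a table truncated in the same transaction can skip
-- writing WAL and instead sync the new relation file at commit.
BEGIN;
TRUNCATE foo;           -- trivial: table was just created, no data or indexes yet
COPY foo FROM stdin;
COMMIT;
```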
On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote:
> Great, thanks (and also to Guillaume).
>
> It looks to me like the simple way around this issue would be to provide
> an option to have pg_restore emit:
> begin; truncate foo; copy foo ... commit;
>
> The truncate will be trivial as there won't be any data or indexes at
> that stage anyway.

Not sure which stage you're talking about. If this is a parallel restore
and you are running a create in one session and a load in another, then
ISTM you have no way of knowing that for certain.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote:
>
>> Great, thanks (and also to Guillaume).
>>
>> It looks to me like the simple way around this issue would be to provide
>> an option to have pg_restore emit:
>> begin; truncate foo; copy foo ... commit;
>>
>> The truncate will be trivial as there won't be any data or indexes at
>> that stage anyway.
>
> Not sure which stage you're talking about. If this is a parallel restore
> and you are running a create in one session and a load in another, then
> ISTM you have no way of knowing that for certain.

Er, who doesn't know what for certain, exactly? pg_restore will
certainly know that it has created the table in another session and can
thus safely truncate the table in the same transaction as the data load.

cheers

andrew
Andrew Dunstan wrote:
> [snip]
>
> Er, who doesn't know what for certain, exactly? pg_restore will
> certainly know that it has created the table in another session and
> can thus safely truncate the table in the same transaction as the data
> load.

I'm confused about why the table definition and data can't be loaded in
the same backend and transaction. Can somebody explain that? All items
in a chain like A -> B -> C -> D should be loaded in the same
transaction, as they are serially dependent. I can't think of a case
where the table data requires more than just the table in order to
load. Foreign keys might produce that situation, but since referenced
tables must be loaded before the referencing table, and these
constraints are added at the end anyway, I can't see how it can happen.

The first cut of this may not have the dependency-resolution smarts to
work out how best to group restore items together to send to a backend.
My research into how the directed-graph dependency information is
stored should allow for dishing out the data to backends in the best
possible way. But currently there is no graph as such, just a serial
list of items that are safe to load. Building the graph would give a
better idea of the maximum concurrency available, based on what depends
on what; that graph has to be built from the dependency information
that's stored.

Is it also feasible to have the -1 (single transaction) option complete
the largest possible unit of work inside a single transaction? That
would mean one transaction per backend work unit (e.g. A, B, C, D
above). I don't know whether index builds can skip WAL when they happen
in the table-creation transaction, but that would seem like another win
if they were added at the same time as the table. It does play against
the idea of running all of the index-creation statements in parallel to
get the benefit of synchronized scans.
I don't know what's going to be the biggest win on big hardware, as I
don't have any. Just something to think about.

Thanks

Russell Smith
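[For context: the grouping Russell describes — serially dependent items such as A -> B -> C -> D go to one backend in order, while independent items can run concurrently — amounts to a topological sort of the dependency graph into batches. A minimal Python sketch under that assumption; the item names and the `depends_on` mapping are hypothetical, not pg_restore's actual data structures:]

```python
from collections import defaultdict, deque

def parallel_batches(items, depends_on):
    """Group restore items into ordered batches.  Every item in a batch
    has all of its dependencies satisfied by earlier batches, so batches
    must run in sequence, but items within one batch could be dispatched
    to separate backends concurrently (Kahn's algorithm, by levels)."""
    indegree = {item: 0 for item in items}
    dependents = defaultdict(list)
    for item, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(item)
            indegree[item] += 1

    ready = deque(i for i in items if indegree[i] == 0)
    batches = []
    while ready:
        batch = list(ready)        # everything currently unblocked
        ready.clear()
        batches.append(batch)
        for item in batch:         # unblock whatever depended on this batch
            for nxt in dependents[item]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
    return batches

# A -> B -> C -> D is a serial chain; E is independent of it.
deps = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": []}
print(parallel_batches(list(deps), deps))
# → [['A', 'E'], ['B'], ['C'], ['D']]
```

The serial chain never parallelizes (one batch per link), while the independent item E rides along in the first batch; a scheduler could hand each chain to one backend as a single work unit.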
Russell Smith wrote:
> Andrew Dunstan wrote:
>> [snip]
>>
>> Er, who doesn't know what for certain, exactly? pg_restore will
>> certainly know that it has created the table in another session and
>> can thus safely truncate the table in the same transaction as the data
>> load.
>
> I'm confused about why table definition and data can't be loaded in the
> same backend and transaction. Can somebody explain that?

That would delay other things that depend on the table definition (e.g.
function definitions). If we don't need to make this link (and we don't)
then I can't see why we should shackle ourselves with it.

cheers

andrew