Thread: pg_restore TODO - delay PK creation

pg_restore TODO - delay PK creation

From
"Iain"
Date:
Hi,
 
I'm wondering if this is already on some todo list for pg_restore but I didn't find any mention of it anywhere, so I thought I should post this and see what people think.. 
 
Basically, I'd like to see an option at restore time to not include the primary key constraint when issuing the create table command. I'd like the PK to be added after data has been loaded using an ALTER command.
 
The principle reason for this is performance.
 
There may also be a bug somewhere, or perhaps just a problem with my system, but I was trying to restore a fairly large table (over 7000000 rows) which would run for a couple hours before failing. Dropping the PK enabled the load to complete in 3 or 4 minutes. Adding the PK took another 3 or 4 minutes which adds up to quite a difference.
 
regards
Iain

Re: pg_restore TODO - delay PK creation

From
Bruce Momjian
Date:
Iain wrote:
> Hi,
>
> I'm wondering if this is already on some todo list for pg_restore but I
> didn't find any mention of it anywhere, so I thought I should post this and
> see what people think..
>
> Basically, I'd like to see an option at restore time to not include the
> primary key constraint when issuing the create table command. I'd like the
> PK to be added after data has been loaded using an ALTER command.
>
> The principle reason for this is performance.
>
> There may also be a bug somewhere, or perhaps just a problem with my system,
> but I was trying to restore a fairly large table (over 7000000 rows) which
> would run for a couple hours before failing. Dropping the PK enabled the
> load to complete in 3 or 4 minutes. Adding the PK took another 3 or 4
> minutes which adds up to quite a difference.

I don't know what PostgreSQL version you have but we currently do what
you suggest and I think have been doing it for a few releases now:

---------------------------------------------------------------------------

--
-- Name: test; Type: TABLE; Schema: public; Owner: postgres
--

CREATE TABLE test (
    x integer NOT NULL
);


ALTER TABLE public.test OWNER TO postgres;

--
-- Data for Name: test; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY test (x) FROM stdin;
1
\.


--
-- Name: test_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY test
    ADD CONSTRAINT test_pkey PRIMARY KEY (x);



--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_restore TODO - delay PK creation

From
"Iain"
Date:
Hi Bruce,

OK, I've just looked into it and you are right, thanks.

In the case I just tested I restored a 7.1 dump to a 7.4.6 db. I had assumed
that this is a restore time issue but in fact it is dependent on the format
of the dump file. I noticed the problem after the data load failed and I
checked the description of the table in question, it had a primary key.

Worth noting for anybody preparing to upgrade large databases from old
versions. This db takes 8 hours to restore on 7.1 but I just dumped it and
restored it in less than 20 minutes total on 7.4 :)

I havn't used 7.4 for dumping/restoring for a long time so I had forgotten
how it worked. We're just starting a redevelopment that will include an
upgrade of the production db so I'm looking forward to working with 7.4
again, though I'd like to be working on v8.

Out of interest, what is the word on using newer versions of pg_dump on
older verisons of  the DB - is it is possible or even wise to unload a 7.1
DB with the 7.4 version of pg?

Regards
iain



----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: "Iain" <iain@mst.co.jp>
Cc: <pgsql-admin@postgresql.org>
Sent: Monday, November 01, 2004 11:36 AM
Subject: Re: [ADMIN] pg_restore TODO - delay PK creation


> Iain wrote:
>> Hi,
>>
>> I'm wondering if this is already on some todo list for pg_restore but I
>> didn't find any mention of it anywhere, so I thought I should post this
>> and
>> see what people think..
>>
>> Basically, I'd like to see an option at restore time to not include the
>> primary key constraint when issuing the create table command. I'd like
>> the
>> PK to be added after data has been loaded using an ALTER command.
>>
>> The principle reason for this is performance.
>>
>> There may also be a bug somewhere, or perhaps just a problem with my
>> system,
>> but I was trying to restore a fairly large table (over 7000000 rows)
>> which
>> would run for a couple hours before failing. Dropping the PK enabled the
>> load to complete in 3 or 4 minutes. Adding the PK took another 3 or 4
>> minutes which adds up to quite a difference.
>
> I don't know what PostgreSQL version you have but we currently do what
> you suggest and I think have been doing it for a few releases now:
>
> ---------------------------------------------------------------------------
>
> --
> -- Name: test; Type: TABLE; Schema: public; Owner: postgres
> --
>
> CREATE TABLE test (
>    x integer NOT NULL
> );
>
>
> ALTER TABLE public.test OWNER TO postgres;
>
> --
> -- Data for Name: test; Type: TABLE DATA; Schema: public; Owner: postgres
> --
>
> COPY test (x) FROM stdin;
> 1
> \.
>
>
> --
> -- Name: test_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
> --
>
> ALTER TABLE ONLY test
>    ADD CONSTRAINT test_pkey PRIMARY KEY (x);
>
>
>
> --
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square, Pennsylvania
> 19073


Re: pg_restore TODO - delay PK creation

From
Bruce Momjian
Date:
Iain wrote:
> Hi Bruce,
>
> OK, I've just looked into it and you are right, thanks.
>
> In the case I just tested I restored a 7.1 dump to a 7.4.6 db. I had assumed
> that this is a restore time issue but in fact it is dependent on the format
> of the dump file. I noticed the problem after the data load failed and I
> checked the description of the table in question, it had a primary key.
>
> Worth noting for anybody preparing to upgrade large databases from old
> versions. This db takes 8 hours to restore on 7.1 but I just dumped it and
> restored it in less than 20 minutes total on 7.4 :)

Quite a dramatic improvement, though 7.1 is in the ancient category.

> I havn't used 7.4 for dumping/restoring for a long time so I had forgotten
> how it worked. We're just starting a redevelopment that will include an
> upgrade of the production db so I'm looking forward to working with 7.4
> again, though I'd like to be working on v8.
>
> Out of interest, what is the word on using newer versions of pg_dump on
> older verisons of  the DB - is it is possible or even wise to unload a 7.1
> DB with the 7.4 version of pg?

It should work.  I see version checks for >=70100 in the code so you
should be fine using 7.4 or 8.0 pg_dump for 7.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_restore TODO - delay PK creation

From
Tom Lane
Date:
"Iain" <iain@mst.co.jp> writes:
> Out of interest, what is the word on using newer versions of pg_dump on
> older verisons of  the DB - is it is possible or even wise to unload a 7.1
> DB with the 7.4 version of pg?

Standard, recommended procedure is to use the later version of pg_dump
to unload data from the old server.  This gets you the benefit of any
bug fixes in the newer pg_dump (of which there are always some...)

Note that pg_dump is usually not designed to produce output that will
load into back-release servers, so this only applies to cross-version
upgrades.  You can't use the 7.4 pg_dump against a 7.1 server and expect
to get a dump file that will load back into the 7.1 server --- more than
likely, the dump file will make use of features that weren't in 7.1.

            regards, tom lane

Re: pg_restore TODO - delay PK creation

From
"Iain"
Date:
Thanks guys, I'll keep that in mind.

Sorry to keep on about this, but please bear with me for one more  m(_ _)m
<-- this is me abasing myself before you

IIRC there has been a bug fix to the COPY command, regarding handling
backslashed Ns or something like that sometime between 7.1 and 7.4, if I
dump the 7.1 db with the 7.4 version of pg_dump then do I get the fixes to
COPY?

The documentation for psql is clear that it uses a frontend version of copy
so I'm gonna assume for now (until I test it am told eitherway) that psql
7.4 will use the later version of copy against a 7.1 db.

Because the old db is badly in need of maintenance and some tables take a
considerable time to unload, I was considering running a few concurrent
sessions of psql and using the COPY command so I can start loading smaller
tables on the new server while the bigger ones are still unloading. Most of
the old tables have to be transformed to a new format, and I wanna get home
by the last train if possible :) .

regards
iain

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Iain" <iain@mst.co.jp>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; <pgsql-admin@postgresql.org>
Sent: Monday, November 01, 2004 12:36 PM
Subject: Re: [ADMIN] pg_restore TODO - delay PK creation


> "Iain" <iain@mst.co.jp> writes:
>> Out of interest, what is the word on using newer versions of pg_dump on
>> older verisons of  the DB - is it is possible or even wise to unload a
>> 7.1
>> DB with the 7.4 version of pg?
>
> Standard, recommended procedure is to use the later version of pg_dump
> to unload data from the old server.  This gets you the benefit of any
> bug fixes in the newer pg_dump (of which there are always some...)
>
> Note that pg_dump is usually not designed to produce output that will
> load into back-release servers, so this only applies to cross-version
> upgrades.  You can't use the 7.4 pg_dump against a 7.1 server and expect
> to get a dump file that will load back into the 7.1 server --- more than
> likely, the dump file will make use of features that weren't in 7.1.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings


Re: pg_restore TODO - delay PK creation

From
Tom Lane
Date:
"Iain" <iain@mst.co.jp> writes:
> IIRC there has been a bug fix to the COPY command, regarding handling
> backslashed Ns or something like that sometime between 7.1 and 7.4, if I
> dump the 7.1 db with the 7.4 version of pg_dump then do I get the fixes to
> COPY?

My recollection is that some of those changes were on the backend side,
and so they would affect the COPY data that pg_dump transcribes to the
dump file.  Using the newer pg_dump will not make you any worse off
AFAIR, but it's not a magic fix for server-side bugs either ...

            regards, tom lane

Re: pg_restore TODO - delay PK creation

From
"Iain"
Date:
Hi Bruce and Tom,

I've got the picture now, thanks.

I'll be sure to test it all pretty thoroughly before the day anyway.

Thanks again.

Iain
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Iain" <iain@mst.co.jp>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; <pgsql-admin@postgresql.org>
Sent: Monday, November 01, 2004 2:56 PM
Subject: Re: [ADMIN] pg_restore TODO - delay PK creation


> "Iain" <iain@mst.co.jp> writes:
>> IIRC there has been a bug fix to the COPY command, regarding handling
>> backslashed Ns or something like that sometime between 7.1 and 7.4, if I
>> dump the 7.1 db with the 7.4 version of pg_dump then do I get the fixes
>> to
>> COPY?
>
> My recollection is that some of those changes were on the backend side,
> and so they would affect the COPY data that pg_dump transcribes to the
> dump file.  Using the newer pg_dump will not make you any worse off
> AFAIR, but it's not a magic fix for server-side bugs either ...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend