Thread: pg_restore taking 4 hours!

pg_restore taking 4 hours!

From
Rodrigo Carvalhaes
Date:
Hi!

I am using PostgreSQL with a proprietary ERP software in Brazil. The
database have around 1.600 tables (each one with +/- 50 columns).
My problem now is the time that takes to restore a dump. My customer
database have arount 500mb (on the disk, not the dump file) and I am
making the dump with pg_dump -Fc, my dumped file have 30mb. To make the
dump, it's taking +/- 1,5 hours BUT to restore (using pg_restore ) it it
takes 4 - 5 hours!!!

Our machine it's a Dell Server Power Edge 1600sc (Xeon 2,4Ghz, with 1GB
memory, 7200 RPM disk). I don't think that there is a machine problem
because it's a server dedicated for the database and the cpu utilization
during the restore is around 30%.

Looking on the lists arquives I found some messages about this and Tom
Lane was saying that then you have a lot of convertions the dump can
delay too much. 90% of the columns on my database are char columns and I
don't have large objects on the database. The restore is delaying too
much because the conversion of the char columns ? How can I have a
better performance on this restore?

I need to find a solution for this because I am convincing customers
that are using SQL Server, DB2 and Oracle to change to PostgreSQL but
this customers have databases of 5GB!!! I am thinking that even with a
better server, the restore will take 2 days!

My data:
Conectiva Linux 10 , Kernel 2.6.8
PostgreSQL 7.4.6.

postgresql.conf modified parameters (the other parameters are the default)
tcpip_socket = true
max_connections = 30
shared_buffers = 30000
sort_mem = 4096
vacuum_mem = 8192
max_fsm_pages = 20000
max_fsm_relations = 1000

Regards,

Rodrigo Carvalhaes

Re: [PERFORM] pg_restore taking 4 hours!

From
Shridhar Daithankar
Date:
On Wednesday 01 Dec 2004 4:46 pm, Rodrigo Carvalhaes wrote:
> I need to find a solution for this because I am convincing customers
> that are using SQL Server, DB2 and Oracle to change to PostgreSQL but
> this customers have databases of 5GB!!! I am thinking that even with a
> better server, the restore will take 2 days!
>
> My data:
> Conectiva Linux 10 , Kernel 2.6.8
> PostgreSQL 7.4.6.
>
> postgresql.conf modified parameters (the other parameters are the default)
> tcpip_socket = true
> max_connections = 30
> shared_buffers = 30000
> sort_mem = 4096
> vacuum_mem = 8192
> max_fsm_pages = 20000
> max_fsm_relations = 1000

Can you try bumping sort mem lot higher(basically whatever the machine can
afford) so that index creation is faster?

Just try setting sort mem for the restore session and see if it helps..

 Shridhar

Re: [PERFORM] pg_restore taking 4 hours!

From
"Riccardo G. Facchini"
Date:
--- Shridhar Daithankar <__> wrote:

> On Wednesday 01 Dec 2004 4:46 pm, Rodrigo Carvalhaes wrote:
> > I need to find a solution for this because I am convincing
> customers
> > that are using SQL Server, DB2 and Oracle to change to PostgreSQL
> but
> > this customers have databases of 5GB!!! I am thinking that even
> with a
> > better server, the restore will take 2 days!
> >
> > My data:
> > Conectiva Linux 10 , Kernel 2.6.8
> > PostgreSQL 7.4.6.
> >
> > postgresql.conf modified parameters (the other parameters are the
> default)
> > tcpip_socket = true
> > max_connections = 30
> > shared_buffers = 30000
> > sort_mem = 4096
> > vacuum_mem = 8192
> > max_fsm_pages = 20000
> > max_fsm_relations = 1000
>
> Can you try bumping sort mem lot higher(basically whatever the
> machine can
> afford) so that index creation is faster?
>
> Just try setting sort mem for the restore session and see if it
> helps..
>
>  Shridhar
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Yes, indexing is probably the issue.

You can always ask them to report how long does it take to restore
their M$-SQL, DB2 and Oracle from a scripting dump.

I've been restoring DB2 for a looong time (on different architectures)
and the main problem comes from indexing.

As an index is basically a dynamic structure that is created on the
physical data (the data stored on the table), what is normally saved is
the index DEFINITION, not the index itself, so this is recreated at
restore time.

Some DB2 architectures (and M$-SQL, and Oracle, and Sybase, and others.
others) may have a backup tool that is capable of saving the index
data, but is almost never used, as the index space itself can grow well
over the data size.

I'll give one example: we have one DB2 on iSeries that runs around the
70Gb of Data and Indexes. We do a full backup that occupies only 45Gb
of Data and we do that in a little more than 1 hour because we only
save the index definitions.

We know for sure that this full backup takes something between 5 and 7
hours because of the reindexing. I had this written down in the Restore
Procedure Manual, so the user can't complain (they know that the
procedure will eventually restore the data and the full functionality).

So, make sure that your client knows of their restore times.

One small trick that can help you:
FIRST restore the tables.
THEN restore the foreingn keys, the constraints and the triggers and
procedures.
LAST restore the indexes and views.
LATEST restore the security.

This way, if you have complicated views and indexes with a lot of info,
the procedure <<<may>>> be shorter.

regards,

R.

Re: [PERFORM] pg_restore taking 4 hours!

From
Tom Lane
Date:
Shridhar Daithankar <ghodechhap@ghodechhap.net> writes:
> On Wednesday 01 Dec 2004 4:46 pm, Rodrigo Carvalhaes wrote:
>> I need to find a solution for this because I am convincing customers
>> that are using SQL Server, DB2 and Oracle to change to PostgreSQL but
>> this customers have databases of 5GB!!! I am thinking that even with a
>> better server, the restore will take 2 days!

> Can you try bumping sort mem lot higher(basically whatever the machine can
> afford) so that index creation is faster?

It would be a good idea to bump up vacuum_mem as well.  In current
sources it's vacuum_mem (well actually maintenance_work_mem) that
determines the speed of CREATE INDEX; I forget just how long that
behavior has been around, but 7.4.6 might do it too.

            regards, tom lane

Re: [PERFORM] pg_restore taking 4 hours!

From
Josh Berkus
Date:
Rodrigo,

> Our machine it's a Dell Server Power Edge 1600sc (Xeon 2,4Ghz, with 1GB
> memory, 7200 RPM disk). I don't think that there is a machine problem
> because it's a server dedicated for the database and the cpu utilization
> during the restore is around 30%.

In addition to Tom and Shridhar's advice, a single IDE disk is simply going to
make restores slow.   A 500MB data file copy on that disk, straight, would
take up to 15 min.   If this is for your ISV application, you need to
seriously re-think your hardware strategy; spending less on processors and
more on disks would be wise.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: pg_restore taking 4 hours!

From
Thierry Missimilly
Date:
Rodrigo Carvalhaes a écrit :

> Hi!
>
> I am using PostgreSQL with a proprietary ERP software in Brazil. The
> database have around 1.600 tables (each one with +/- 50 columns).
> My problem now is the time that takes to restore a dump. My customer
> database have arount 500mb (on the disk, not the dump file) and I am
> making the dump with pg_dump -Fc, my dumped file have 30mb. To make
> the dump, it's taking +/- 1,5 hours BUT to restore (using pg_restore )
> it it takes 4 - 5 hours!!!

I have notice that fac and one way to improve the restore prefomances,
is to avoid build indexes and checking the foreign key in the same step
than the restore.
So, as it is not possible to disable indexes and Foreign key, you have
to drop them and recreate them once the restore step has finished. To do
that you should have a script to recreate the indexes and the Foreign
Key afterward.

>
> Our machine it's a Dell Server Power Edge 1600sc (Xeon 2,4Ghz, with
> 1GB memory, 7200 RPM disk). I don't think that there is a machine
> problem because it's a server dedicated for the database and the cpu
> utilization during the restore is around 30%.
>
> Looking on the lists arquives I found some messages about this and Tom
> Lane was saying that then you have a lot of convertions the dump can
> delay too much. 90% of the columns on my database are char columns and
> I don't have large objects on the database. The restore is delaying
> too much because the conversion of the char columns ? How can I have a
> better performance on this restore?
>
> I need to find a solution for this because I am convincing customers
> that are using SQL Server, DB2 and Oracle to change to PostgreSQL but
> this customers have databases of 5GB!!! I am thinking that even with a
> better server, the restore will take 2 days!
>
> My data:
> Conectiva Linux 10 , Kernel 2.6.8
> PostgreSQL 7.4.6.
>
> postgresql.conf modified parameters (the other parameters are the
> default)
> tcpip_socket = true
> max_connections = 30
> shared_buffers = 30000
> sort_mem = 4096 vacuum_mem = 8192
> max_fsm_pages = 20000
> max_fsm_relations = 1000
>
> Regards,
>
> Rodrigo Carvalhaes
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>


Re: pg_restore taking 4 hours!

From
"Joshua D. Drake"
Date:
Thierry Missimilly wrote:

>
> Rodrigo Carvalhaes a écrit :
>
>> Hi!
>>
>> I am using PostgreSQL with a proprietary ERP software in Brazil. The
>> database have around 1.600 tables (each one with +/- 50 columns).
>> My problem now is the time that takes to restore a dump. My customer
>> database have arount 500mb (on the disk, not the dump file) and I am
>> making the dump with pg_dump -Fc, my dumped file have 30mb. To make
>> the dump, it's taking +/- 1,5 hours BUT to restore (using pg_restore
>> ) it it takes 4 - 5 hours!!!
>
>
> I have notice that fac and one way to improve the restore prefomances,
> is to avoid build indexes and checking the foreign key in the same
> step than the restore.
> So, as it is not possible to disable indexes and Foreign key, you have
> to drop them and recreate them once the restore step has finished. To
> do that you should have a script to recreate the indexes and the
> Foreign Key afterward.
>
There are a couple of things you can do.

1. Turn off Fsync for the restore
2. Restore in three phases:

    1. Schema without constraints or indexes
    2. Restore data
    3. Apply rest of schema with constraints and indexes

3. Increase the number of transaction logs.

Sincerely,

Joshua D. Drake





>>
>> Our machine it's a Dell Server Power Edge 1600sc (Xeon 2,4Ghz, with
>> 1GB memory, 7200 RPM disk). I don't think that there is a machine
>> problem because it's a server dedicated for the database and the cpu
>> utilization during the restore is around 30%.
>>
>> Looking on the lists arquives I found some messages about this and
>> Tom Lane was saying that then you have a lot of convertions the dump
>> can delay too much. 90% of the columns on my database are char
>> columns and I don't have large objects on the database. The restore
>> is delaying too much because the conversion of the char columns ? How
>> can I have a better performance on this restore?
>>
>> I need to find a solution for this because I am convincing customers
>> that are using SQL Server, DB2 and Oracle to change to PostgreSQL but
>> this customers have databases of 5GB!!! I am thinking that even with
>> a better server, the restore will take 2 days!
>>
>> My data:
>> Conectiva Linux 10 , Kernel 2.6.8
>> PostgreSQL 7.4.6.
>>
>> postgresql.conf modified parameters (the other parameters are the
>> default)
>> tcpip_socket = true
>> max_connections = 30
>> shared_buffers = 30000
>> sort_mem = 4096 vacuum_mem = 8192
>> max_fsm_pages = 20000
>> max_fsm_relations = 1000
>>
>> Regards,
>>
>> Rodrigo Carvalhaes
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: Have you searched our list archives?
>>
>>               http://archives.postgresql.org
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Attachment