Re: Strange results using pg_restore - Mailing list pgsql-admin

From Oscar Calderon
Subject Re: Strange results using pg_restore
Date
Msg-id CAOqBXCB-CjAC5=z97PzdDmH9ep8x87ArQodTys_0iKh4jUgrrA@mail.gmail.com
Whole thread Raw
In response to Re: Strange results using pg_restore  (reiner peterke <zedaardv@drizzle.com>)
List pgsql-admin
Thank you Kevin Grittner, i was hoping to find an answer like that, because honestly even when i'm studying a lot about PostgreSQL those days (i'm planning to take Associate Certificate exam in a couple of weeks) i have no previous experience in migration of database of that size (i've only migrated some databases of about 20Mb from applications that we currently host where i work of clients that are just starting working with us, for those cases i used SQL backups) so i have doubts about those execution times. Just as an extra info, that's my computer, and the production server where i'm going to reinstall database and restore backups is even more powerful and dedicated just to host PostgreSQL cluster so it will be faster:

IBM X3850 (that's the model but i'm not sure about the exact version)
RHEL 5.7
16GB RAM
Intel Xeon X7350 2.93Ghz (8 cores)
500 GB hard drive (but in a future they are planning to move the data to a LUN, i don't have too much knowledgement about that but i think that i will have to create a new tablespace in that LUN and then move all the objects in a future)

Thank you also for the piece of advice about VACUUM FREEZE ANALYZE, i was planning just to shutdown PostgreSQL service, then restore the postgresql.conf tuned for normal operations (i will create a copy with settings tuned for restoring and another one with general tuning) and start the server, but i will run this before shutdown. Also i didn't know that it will take some time in shutdown. Or maybe i will wait a couple of minutes befure shutting down.

And thanks Reiner, as i said before i don't have previous migration experience of this level so i was unsure, but i'm glad to hear about your experience with bigger databases. I will check the command pgupgrade, i didn't know about that, altough it isn't an option for me in this situation because, they currently have RHEL 4, and they are forced to upgrade to RHEL 5.7 (because other backup technologies that they have in that server doesn't let him to migrate to the last RHEL stable version) to continue getting support from RHEL provider, but they don't want to upgrade, they want to wipe the server and reinstall RHEL 5.7, so i have to backup everything and extract it before that. and then install PG 9.3 and restore backups.

Thank to everybody, i will share my experience tomorrow, even when isn't as big as yours, but as a reference for other people in my situation.

Regards.

***************************
Oscar Calderon
Analista de Sistemas
Soluciones Aplicativas S.A. de C.V.
www.solucionesaplicativas.com
Cel. (503) 7741 7850 Tel. (503) 2522-2834


2013/10/18 reiner peterke <zedaardv@drizzle.com>
Hi Oscar,

I've done some migration work on several of our databases, in the 150-200 GB  range.  The times have been around 20-40 minuets.
So, depending on the hardware your times seem to be reasonable.

Regarding the smaller size, doing a pg_restore the tables and indexes will be of 'optimal' size.  the restore will build the tables and indexes without any wasted space so there should be a reduction in size over the databases.

check the number of rows and look at sample data.  when i'm wanting to do a quick comparison i'll use a query like,
select to_char('<timespamp_column>','yyyy-mm') dt, count(*) from <table> group by dt;
and compare the results between the two.

If you have the disk space i would recommend using the pg_upgrade command.

reiner

On 17 okt 2013, at 17:48, Oscar Calderon <ocalderon@solucionesaplicativas.com> wrote:

Have a good day. This friday i'm going to migrate an entire database of a government institution in my country. Those are like 4 database of 2GB each one. So, i was preparing about what i'm going to do tomorrow. They currently have PostgreSQL 9.1 installed from source i think, and they will update RHEL version to 5.7, it will be a reinstall of operative system, so before that i will backup the databases, copy contents of pg_hba.conf and so on, and after the reinstallation of RHEL i'm going to install PostgreSQL 9.3 from official repositories alongside contrib and plpython. 

I've already tested that on a Centos 5.7 VM and it works fine. But well, yesterday i was practicing creation of backups and restoring backups in my laptop with the next specs:

Core i7 2.2 Ghz
8GB RAM
30GB empty space
Windows 8.1

In my laptop i also have PostgreSQL 9.3. First i used the sample database pgdellstore, what i did is that i created a database, and ran the SQL file of the database. After that, i created a backup of that database pgdellstore with pg_dump, a custom format, without compression:

pg_dump -h localhost -p 5432 -U postgres -W -Fc -Z0 -C -d pgdellstore > pgdellstore.backup

And everything as normal. After that i created a second database and restored my custom backup with pg_restore trying to speed up using -j option:

pg_restore -h localhost -p 5432 -U postgres -W -d pgdellstore -j4 pgdellstore.backup

Also, before that, i backed up my postgresql.conf and edited the current one with the next parameters, according to some research that i did in google about performance tuning of pg_restore:

shared_buffers=2GB
maintenance_work_mem=1GB
work_mem=128MB
wal_buffers=16MB 
checkpoint_segments=8 
autovacuum=off
archive_mode=off 
fsync=off 
full_page_writes=off
checkpoint_timeout=15min 
checkpoint_completion_target=0.9
track_counts=off
synchronous_commit=off 
bgwriter_delay=50ms

And it restored it in 2 seconds i think, for me it was extremely fast (is a database of 25MB aproximately) and i had doubts, so i have some queries to check database tables size and i compared the results of the first database (the one that i created with original sql script) and the second one (the one that i created using a custom backup of the first one) and i got this:

DB created with original SQL script:

 schemaname |  tablename   | reltuples |  tamanio   | tamanioord
------------+--------------+-----------+------------+------------
 public     | customers    |     20000 | 5016 kB    |    5136384
 public     | orderlines   |     60350 | 4440 kB    |    4546560
 public     | cust_hist    |     60350 | 3976 kB    |    4071424
 public     | products     |     10000 | 1552 kB    |    1589248
 public     | orders       |     12000 | 1384 kB    |    1417216
 public     | inventory    |     10000 | 704 kB     |     720896
 public     | categories   |        16 | 24 kB      |      24576
 public     | afiliado     |         4 | 24 kB      |      24576
 public     | pruebafechas |         0 | 8192 bytes |       8192
 public     | reorder      |         0 | 0 bytes    |          0
(10 filas)

DB created with custom backup based on first db

 schemaname |  tablename   | reltuples |  tamanio   | tamanioord
------------+--------------+-----------+------------+------------
 public     | customers    |     20000 | 4992 kB    |    5111808
 public     | orderlines   |     60350 | 4416 kB    |    4521984
 public     | cust_hist    |     60350 | 3952 kB    |    4046848
 public     | products     |     10000 | 1528 kB    |    1564672
 public     | orders       |     12000 | 1360 kB    |    1392640
 public     | inventory    |     10000 | 680 kB     |     696320
 public     | afiliado     |         4 | 24 kB      |      24576
 public     | categories   |        16 | 24 kB      |      24576
 public     | pruebafechas |         0 | 8192 bytes |       8192
 public     | reorder      |         0 | 0 bytes    |          0
(10 filas)

This is the query that i used to get those results:

SELECT tbl.schemaname, tbl.tablename, obj.reltuples
, pg_size_pretty(pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"'))) tamanio, pg_total_relation_size(text('"' || tbl.schemaname || '"."' || tbl.tablename || '"')) tamanioord
FROM pg_tables tbl, pg_class obj WHERE tbl.tablename = obj.relname
AND tbl.schemaname NOT IN ('pg_catalog', 'information_schema') ORDER BY tamanioord DESC;

As you can see, there are certain variations in size, the number of tuples remains, but the fields tamanio (formatted size) and tamanioord (size without format) have a difference, for example the table customers has 5016 kB versus 4992 kB . I'm afraid that in someway i lost data or something like that, or according to your experience that can be normal and doesn't mean that information is corrupted or something, because tomorrow when i will do that with productions databases it will be a chaos if information gets corrupted.

Also, my second concern is the next: 4 months ago, i did a routine maintenance of those production database, and first as usual i backup all the databases. I took one of the backups of a development database for testing purposes. The backup file weights 279MB and the database is like 1.8 GB  . So, yesterday i also practiced restoring this backup with pg_restore in a new database, with the same configuration from above. The command used was this:

pg_restore -h localhost -p 5432 -U postgres -W --create -d debdbrest -j4 devdb.backup

And it took only 1:30 mins aproximately. For me is suspiciously fast becuse when i was reading about pg_restore and how to enhace performance during backup restoring, i found various cases in previous postgresql versions about pg_restore is too slow, and also, i tried the same restore of the same backups 2 days ago but without postgresql.conf parameters modified and without using -j flag, and it started and after 2 hours it hadn't finished, so i cancelled; also in the first try i noted that it printed a lot of output in the console and in this second try it didn't print anything.

But that time reduction from more than 2 hours to 1:30 mins is too suspicious for me, but i don't know if, as the previous question that i have, you consider that this is normal (a backup of 279MB of a database of 1840MB restored in just 1.30 min).

Regards.

***************************
Oscar Calderon
Analista de Sistemas
Soluciones Aplicativas S.A. de C.V.
www.solucionesaplicativas.com
Cel. (503) 7741 7850 Tel. (503) 2522-2834


pgsql-admin by date:

Previous
From: reiner peterke
Date:
Subject: Re: Strange results using pg_restore
Next
From: Thara Vadakkeveedu
Date:
Subject: CPU usage by sqls