Re: [PERFORM] performance problem on big tables - Mailing list pgsql-performance

From Mariel Cherkassky
Subject Re: [PERFORM] performance problem on big tables
Date
Msg-id CA+t6e1khEV=YeWR-xCk8i4s7q--FKxo_PNmB+OJ=jxnr8g51dg@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] performance problem on big tables  (Daniel Blanch Bataller <daniel.blanch.bataller@gmail.com>)
Responses Re: [PERFORM] performance problem on big tables  (MichaelDBA <MichaelDBA@sqlexec.com>)
List pgsql-performance
I`m searching for a way to improve the current performance, I'm not interesting in using a different tool or writing something new because I'm trying to migrate a system on oracle database to a postgresql database.

2017-08-21 14:53 GMT+03:00 Daniel Blanch Bataller <daniel.blanch.bataller@gmail.com>:

El 21 ago 2017, a las 13:27, Mariel Cherkassky <mariel.cherkassky@gmail.com> escribió:

All this operation runs as part of a big transaction that I run.
How can I create a dump in the oracle server and copy it to the postgresql server from a postgresql transaction ?

I guess you could create a user defined function in any of the available languages (perl, python, java, …). Functions run inside transactions too…this is not simple, though. 

Chopping the table is optional when I use copy, but when I use copy to remote oracle table it takes longer to create the dump. 

It may take longer depending on how the oracle machine, table and database are configured. In my experience oracle is not very fast dumping whole tables, not to mention tables with BLOB data, which can be as slow as hundreds of records per second  (which is probably not your case).

If this transaction is to synchronize data between transactional servers and data analysis servers you may consider using some type of replication where only changes are sent. EnterpriseDB has tools to do such things, I’m not aware of any other tool that can do this between oracle and postgres.

Regards,

Daniel.


2017-08-21 11:37 GMT+03:00 Daniel Blanch Bataller <daniel.blanch.bataller@gmail.com>:

El 21 ago 2017, a las 10:00, Mariel Cherkassky <mariel.cherkassky@gmail.com> escribió:

To summarize, I still have performance problems. My current situation : 

I'm trying to copy the data of many tables in the oracle database into my postgresql tables. I'm doing so by running insert into local_postgresql_temp select * from remote_oracle_table. The performance of this operation are very slow and I tried to check the reason for that and mybe choose a different alternative.

1)First method - Insert into local_postgresql_table select * from remote_oracle_table this generated total disk write of 7 M/s and actual disk write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.

2)second method - copy (select * from oracle_remote_table) to /tmp/dump generates total disk write of 4 M/s and actuval disk write of 100 K/s. The copy utility suppose to be very fast but it seems very slow.




Are you using a FDW to access oracle server and then dump it using copy? This is going to be slow, FDW isn't fast.


-When I run copy from the local dump, the reading is very fast 300 M/s.



You reported it was slow before. What has changed? How much does it take to load the 32G table then?


-I created a 32G file on the oracle server and used scp to copy it and it took me a few minutes.


-The wals directory is located on a different file system. The parameters I assigned :

min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5 
max_worker_processes = 8 
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM : 16G
CPU CORES : 8

HOW can I increase the writes ? How can I get the data faster from the oracle database to my postgresql database?




Extract the table to a file in the oracle server in a format that the COPY utility can read, then copy it to postgres server and load it. You can even pipe commands and do it in a single step.

This is what I meant when I said that COPY is much faster than any thing else. To make it even faster, if I/O is not your bottleneck, you can chop the table in chunks and load it in parallel as I told you before, I have done this many times when migrating data from oracle to postgres. ora2pg uses this method to migrate data from oracle to postgres too. 


2017-08-20 14:00 GMT+03:00 Mariel Cherkassky <mariel.cherkassky@gmail.com>:
I realized something weird. When I`m preforming the copy utility of postgresql in order to create dump from a local table in my postgresql db it takes for 32G table 20 minutes. When I try to use copy for a foregin table (on oracle database) It takes more than 2 hours.. During the copy operation from the foreign table I dont see alot of write operations, with iotop i see that its writes 3 M/s. What else I can check ? 

2017-08-20 9:39 GMT+03:00 Mariel Cherkassky <mariel.cherkassky@gmail.com>:
This server is dedicated to be a postgresql production database, therefore postgresql is the only thing the runs on the server. The fs that I`m using is xfs. I`ll add two different disks - one for the wals and one for the temp tablespace. Regarding the disk, what size should they be considering that the database size is about 250G. Does 16G of ram considered little ? I installed iotop and I see that postgresql writer is writing most of the time and above all.

I mentioned that I perform alot of insert into table select * from table. Before that I remove indexes,constraints and truncate the table. Should I run vacuum before or after the operation ? 

2017-08-17 19:37 GMT+03:00 Claudio Freire <klaussfreire@gmail.com>:
On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
<mariel.cherkassky@gmail.com> wrote:
> I checked with the storage team in the company and they saw that I have alot
> of io on the server. How should I reduce the io that the postgresql uses ?

Do you have concurrent activity on that server?

What filesystem are you using wherever the data is sitting?

If you've got concurrent fsyncs happening, some filesystems handle
that poorly. When you've got WAL and data mixed in a single disk, or
worse, filesystem, it happens often that the filesystem won't handle
the write barriers for the WAL efficiently. I/O gets intermingled with
bulk operations, and even small fsyncs will have to flush writes from
bulk operations, which makes a mess of things.

It is a very good idea, and in fact a recommended practice, to put WAL
on its own disk for that reason mainly.

With that little RAM, you'll also probably cause a lot of I/O in temp
files, so I'd also recommend setting aside another disk for a temp
tablespace so that I/O doesn't block other transactions as well.

This is all assuming you've got concurrent activity on the server. If
not, install iotop and try to see who's causing that much I/O.





pgsql-performance by date:

Previous
From: Daniel Blanch Bataller
Date:
Subject: Re: [PERFORM] performance problem on big tables
Next
From: MichaelDBA
Date:
Subject: Re: [PERFORM] performance problem on big tables