Thread: loading increase into huge table with 50,000,000 records
Hello, sorry for my poor English.

My problem: I am seeing performance problems during a bulk load. Every week I run a massive update of 50,000,000 records plus 2,000,000 inserts into a huge table (50,000,000+ records, ten fields, 12 GB on disk).

Current performance: 120 records/s. At the beginning I got better speed: 1400 records/s.

CPU: dual Xeon 2.40 GHz (512 KB cache)
PostgreSQL version: 8.1.4
OS: Debian Linux, kernel 2.6.17-mm2
Disk: SCSI U320 drive on a U160 SCSI card, software RAID 1
Memory: only 1 GB at this time

My database contains fewer than ten tables, but the main table takes more than 12 GB on disk. It has ten text columns and two date columns. There are only a few connections to this database.

I have tried many ideas:
- grouping several thousand operations into one transaction (BEGIN and COMMIT)
- changing parameters in postgresql.conf:
    shared_buffers (several tests with 30000, 50000, 75000)
    fsync = off
    checkpoint_segments = 10 (several tests with 20 - 30)
    checkpoint_timeout = 1000 (30 - 1800)
    stats_start_collector = off

Unfortunately, I cannot put the pg_xlog files on another disk. None of this gave a convincing result.

My program issues quite simple queries. It does:

    UPDATE table SET dat_update = current_date WHERE id = XXXX;

and, if the id is not found, it does an INSERT into the table.

My sysadmin tells me disk reads/writes are not the problem (checked with iostat).

Do you have any ideas to improve performance for my problem?

Thanks.

Larry.
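[Editor's note] Larry's update-else-insert loop, batched inside one transaction as he describes, can be sketched like this. This is only an illustration: sqlite3 stands in for PostgreSQL, and the table name `big` and its schema are invented for the demo (the thread only names the `id` and `dat_update` columns).

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; table "big" and its schema are
# invented for the demo, only id/dat_update come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, dat_update TEXT)")
conn.executemany("INSERT INTO big VALUES (?, ?)",
                 [(i, "2006-01-01") for i in range(5)])

def upsert_batch(conn, ids, today="2006-07-26"):
    """UPDATE each id; INSERT it if no row was touched.

    One BEGIN/COMMIT wraps the whole batch, as Larry tried."""
    with conn:  # the context manager commits once at the end of the batch
        for i in ids:
            cur = conn.execute(
                "UPDATE big SET dat_update = ? WHERE id = ?", (today, i))
            if cur.rowcount == 0:  # id not found -> insert instead
                conn.execute(
                    "INSERT INTO big (id, dat_update) VALUES (?, ?)", (i, today))

upsert_batch(conn, [3, 4, 5, 6])
print(conn.execute("SELECT count(*) FROM big").fetchone()[0])  # 7: 5 original + 2 new
```

Even batched, this pattern still pays one index lookup per record, which matters later in the thread.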
Hi Larry,

Do you run VACUUM and ANALYZE frequently? Did you check PowerPostgresql.com for hints about PostgreSQL tuning? <http://www.powerpostgresql.com/Docs/>

You can increase wal_buffers, checkpoint_segments and checkpoint_timeout much higher. Here is a sample which works for me:

    wal_buffers = 128
    checkpoint_segments = 256
    checkpoint_timeout = 3600

Cheers
Sven

nuggets72@free.fr schrieb:
> massive update of 50.000.000 records and 2.000.000 insert with a weekly
> frequency in a huge table (+50.000.000 records, ten fields, 12 Go on hard disk)
>
> current performance obtained : 120 records / s
> At the beginning, I got a better speed : 1400 records/s
> [...]
--
Sven Geisler <sgeisler@aeccom.com>    Tel +49.30.5362.1627    Fax .1638
Senior Developer, AEC/communications GmbH Berlin, Germany
Hi Larry, hi Sven,

Sven Geisler wrote:
> You can increase wal_buffers, checkpoint_segments and checkpoint_timeout
> much higher.

You should also increase the free space map settings; the FSM must be large enough to cope with the dead row versions left behind by your weekly bunch of updates.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
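[Editor's note] For 8.1, Markus's advice maps to the `max_fsm_pages`/`max_fsm_relations` settings. The fragment below is a sketch only; the numbers are illustrative guesses, not from the thread, and should be sized from the totals reported at the end of a `VACUUM VERBOSE`.

```
# postgresql.conf fragment (PostgreSQL 8.1) -- illustrative values only;
# size max_fsm_pages from the page totals that VACUUM VERBOSE reports.
max_fsm_pages = 2000000    # must cover pages freed by the weekly ~50M-row update
max_fsm_relations = 1000   # plenty for a database with fewer than ten tables
```

If the FSM is too small, VACUUM cannot remember all the free space, the table bloats week after week, and per-row update cost keeps climbing.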
On 7/26/06, nuggets72@free.fr <nuggets72@free.fr> wrote:
> It does some
> UPDATE table set dat_update=current_date where id=XXXX ;
> And if not found
> id does some
> insert into table
> [...]
> My sysadmin tells me write/read on hard disk aren't the pb (see with iostat)

Your sysadmin is probably wrong. Random queries across a 50M-row table on a machine with 1 GB of memory are going to cause a lot of seeking. Take a look at your pg data folder and you will see it is much larger than 1 GB. A lookup of a cached tuple via a cached index might take 0.2 ms, but it might take 200 ms if it has to go all the way to disk on a 50M-row table; normal reality is somewhere in between, depending on various factors.
I am guessing you are bottlenecked at the lookup, not the update, and my guess is that as you add more memory, the time will drift from the slow case (120/s) back toward the fast case (1400/s).

So you may consider the following alternative:

1. bulk load your 2M update set into a scratch table via the COPY interface
2. update table set dat_update=current_date where table.id=scratch.id
3. insert into table select [...], current_date where not exists (select id from table where table.id = scratch.id);

You may also experiment with a boolean form of #3 using EXCEPT. While running these monster queries, definitely crank up work_mem at the expense of shared_buffers.

merlin
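[Editor's note] Merlin's three steps can be sketched end-to-end. Again sqlite3 stands in for PostgreSQL, so `executemany` substitutes for the COPY interface, and the table names (`big`, `scratch`) and columns besides `id`/`dat_update` are invented for the demo; the set-based statements use subquery forms that run on either system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big (id INTEGER PRIMARY KEY, payload TEXT, dat_update TEXT);
CREATE TABLE scratch (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO big VALUES (?, ?, ?)",
                 [(i, "old", "2006-01-01") for i in range(5)])

# Step 1: bulk load the update set into the scratch table
# (COPY in PostgreSQL; executemany is the sqlite stand-in).
conn.executemany("INSERT INTO scratch VALUES (?, ?)",
                 [(i, "new") for i in (3, 4, 5, 6)])

with conn:
    # Step 2: one set-based UPDATE instead of millions of single-row statements
    conn.execute("""UPDATE big SET dat_update = '2006-07-26'
                    WHERE id IN (SELECT id FROM scratch)""")
    # Step 3: insert only the scratch ids not already present in the big table
    conn.execute("""INSERT INTO big
                    SELECT s.id, s.payload, '2006-07-26' FROM scratch s
                    WHERE NOT EXISTS (SELECT 1 FROM big b WHERE b.id = s.id)""")

print(conn.execute("SELECT count(*) FROM big").fetchone()[0])   # 7 rows total
print(conn.execute(
    "SELECT count(*) FROM big WHERE dat_update = '2006-07-26'"
).fetchone()[0])                                                # 4 touched this week
```

The win is that the planner can satisfy each statement with one scan or join over the scratch set instead of 52 million separate index probes.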