Thread: Inserts are going slower ...
Hi,

I have a database with 10 tables, each holding about 50 000 000 records.
Every day I have to delete about 20 000 records from one of these tables and
insert about the same number. Then I run some aggregations over the other
tables to get weekly results, and global results per user. That means about
180 000 to 300 000 inserts per table each day.

Computing the request takes about 2 to 4 minutes. I run the query into a
temporary table ... then I do an insert into my_table select * from
temp_table. And that's the problem: the INSERT takes (depending on the
table) from 41 minutes up to 2 hours ... for only 180 000 to 300 000
INSERTs ...

The indexes on the table are really useful ... so please do not tell me to
delete some of them ... and I can't drop them before inserting the data ...
it takes really too long to regenerate them ...

I'm configured with no flush (fsync = off), I have 8 GB of RAM, and RAID 5
with 7200 rpm SCSI hard drives. I'm running Linux Debian, with PostgreSQL
7.4.3 compiled by myself.

What can I do to get better results? (configuration options, and/or
hardware upgrade?)
What can I give you to provide more useful information to help me?

Regards,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902
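The daily batch described above, as a minimal SQL sketch; apart from
my_table and temp_table, which the post names, the tables, columns, and the
expiry condition are hypothetical:

  -- a sketch of the daily job, under the assumptions stated above
  BEGIN;

  -- purge about 20 000 expired rows (hypothetical date column and cutoff)
  DELETE FROM my_table WHERE day < CURRENT_DATE - 60;

  -- the 2-4 minute aggregation lands in a temporary table
  CREATE TEMPORARY TABLE temp_table AS
      SELECT user_id, week_no, sum(value) AS total
      FROM daily_table
      GROUP BY user_id, week_no;

  -- the step that takes 41 minutes to 2 hours for 180 000-300 000 rows
  INSERT INTO my_table SELECT * FROM temp_table;

  COMMIT;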
Herve,

> What can I do to get better results? (configuration options, and/or
> hardware upgrade?)
> What can I give you to provide more useful information to help me?

1) What PostgreSQL version are you using?

2) What's your VACUUM, ANALYZE, VACUUM FULL, REINDEX schedule?

3) Can you list the non-default settings in your postgresql.conf?
Particularly: shared_buffers, sort_mem, checkpoint_segments,
effective_cache_size, and max_fsm_pages?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh,

On Tuesday 13 July 2004 19:10, Josh Berkus wrote:
> > What can I do to get better results? (configuration options, and/or
> > hardware upgrade?)
> > What can I give you to provide more useful information to help me?
>
> 1) What PostgreSQL version are you using?

v7.4.3

> 2) What's your VACUUM, ANALYZE, VACUUM FULL, REINDEX schedule?

VACUUM FULL VERBOSE ANALYZE;

Every day, after the calculation I was talking about ...

> 3) Can you list the non-default settings in your postgresql.conf?
> Particularly: shared_buffers, sort_mem, checkpoint_segments,
> effective_cache_size, and max_fsm_pages?

shared_buffers = 48828
sort_mem = 512000
vacuum_mem = 409600
max_fsm_pages = 50000000
max_fsm_relations = 2000
max_files_per_process = 2000
wal_buffers = 1000
checkpoint_segments = 3
effective_cache_size = 5000000
random_page_cost = 3
default_statistics_target = 20
join_collapse_limit = 10

Regards,
--
Hervé Piedvache
From: "Hervé Piedvache" <footcow@noos.fr>
Sent: Tuesday, July 13, 2004 11:42 PM

> effective_cache_size = 5000000

This looks like madness to me. My (modest) understanding is that you are
telling Postgres to assume a 40 GB disk cache (correct me if I am wrong).

By the way, how much effect does this setting have on the planner?

Is there a recommended procedure to estimate the best value for
effective_cache_size on a dedicated DB server?

gnari
gnari wrote:
> Is there a recommended procedure to estimate the best value for
> effective_cache_size on a dedicated DB server?

Rule of thumb (on Linux): on a typically loaded machine, observe the cache
memory of the machine and allocate a good chunk of it as the effective
cache.

To define a good chunk of it, you need to consider how many other things
are running on that machine. If it is file server + web server + database
server, you have to allocate the resources depending upon the requirements.

But remember, it does not guarantee that it will be a good value. It is
just a starting point .. :-) You have to tune it further if required.

HTH

Shridhar
On Wednesday 14 July 2004 12:13, Shridhar Daithankar wrote:
> gnari wrote:
> > Is there a recommended procedure to estimate the best value for
> > effective_cache_size on a dedicated DB server?
>
> Rule of thumb (on Linux): on a typically loaded machine, observe the
> cache memory of the machine and allocate a good chunk of it as the
> effective cache.
>
> To define a good chunk of it, you need to consider how many other things
> are running on that machine. If it is file server + web server + database
> server, you have to allocate the resources depending upon the
> requirements.
>
> But remember, it does not guarantee that it will be a good value. It is
> just a starting point .. :-) You have to tune it further if required.

In my case it's a PostgreSQL dedicated server ...

effective_cache_size = 5000000

With this I give the planner the information that the kernel is able to
cache 5 000 000 disk pages in RAM:

> free
             total       used       free     shared    buffers     cached
Mem:       7959120    7712164     246956          0      17372    7165704
-/+ buffers/cache:     529088    7430032
Swap:      2097136       9880    2087256

What should I put?

Regards,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902
Hervé Piedvache wrote:
> In my case it's a PostgreSQL dedicated server ...
>
> effective_cache_size = 5000000
>
> With this I give the planner the information that the kernel is able to
> cache 5 000 000 disk pages in RAM

That is what, 38 GB of RAM?

> > free
>              total       used       free     shared    buffers     cached
> Mem:       7959120    7712164     246956          0      17372    7165704
> -/+ buffers/cache:     529088    7430032
> Swap:      2097136       9880    2087256
>
> What should I put?

7165704 / 8 = 895713

So, allowing for variation, I would say 875000. That is an 8 GB box, right?
So 875000 is about 7000 MB, which should be rather practical. Of course you
can give it everything you can, but that's up to you.

Can you get EXPLAIN ANALYZE output for the inserts? I suspect some foreign
key checks etc. are taking long and hence it accumulates. But that is just
a wild guess.

Off the top of my head, you have allocated roughly 48K shared buffers,
which seems a bit on the higher side. Can you check with something like
10K-15K?

HTH

Shridhar
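Both of Shridhar's suggestions can be tried directly from psql; a sketch,
reusing the table names from the original post:

  -- effective_cache_size is counted in 8 kB pages, so 7165704 kB of
  -- kernel cache / 8 kB per page = 895713, rounded down to 875000
  SET effective_cache_size = 875000;

  -- EXPLAIN ANALYZE really executes the statement, so wrap the test
  -- insert in a transaction and roll it back to leave the data unchanged
  BEGIN;
  EXPLAIN ANALYZE INSERT INTO my_table SELECT * FROM temp_table;
  ROLLBACK;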
Herve'

I forgot to ask about your hardware. How much RAM, and what's your disk
setup? CPU?

> sort_mem = 512000

Huh? Sort_mem is in K. The above says that you've allocated 512 MB of sort
memory. Is this process the *only* thing going on on the machine?

> vacuum_mem = 409600

Again, 409.6 MB of vacuum memory? That's an odd number, and quite high.

> max_fsm_pages = 50000000

50 million? That's quite high. Certainly enough to have an effect on your
memory usage. How did you calculate this number?

> checkpoint_segments = 3

You should probably increase this if you have the disk space. For massive
insert operations, I've found it useful to have as many as 128 segments
(although this means about 1.5 GB of disk space).

> effective_cache_size = 5000000

If you actually have that much RAM, I'd love to play on your box. Please?

> Off the top of my head, you have allocated roughly 48K shared buffers,
> which seems a bit on the higher side. Can you check with something like
> 10K-15K?

Shridhar, that depends on how much RAM he has. On 4 GB dedicated machines,
I've set shared_buffers as high as 750 MB.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh,

On Wednesday 14 July 2004 18:28, Josh Berkus wrote:
> I forgot to ask about your hardware. How much RAM, and what's your disk
> setup? CPU?

8 GB of RAM
Dual Intel Xeon 2.00 GHz
Hard drives in SCSI RAID 5

/dev/sdb6             101G   87G  8.7G  91% /usr/local/pgsql/data
/dev/sda7             1.8G  129M  1.6G   8% /usr/local/pgsql/data/pg_xlog

Server dedicated to PostgreSQL, with only one database.

> > sort_mem = 512000
>
> Huh? Sort_mem is in K. The above says that you've allocated 512 MB of
> sort memory. Is this process the *only* thing going on on the machine?

PostgreSQL dedicated server, yes ... so it's too much? How do you decide on
a good value?

> > vacuum_mem = 409600
>
> Again, 409.6 MB of vacuum memory? That's an odd number, and quite high.

Yep, but I have 8 GB of memory ... ;o) So why not? Just explain to me why
it's not a good choice ... I did this because of this text from you, found
somewhere:

"As this setting only uses RAM when VACUUM is running, you may wish to
increase it on high-RAM machines to make VACUUM run faster (but never more
than 20% of available RAM!)"

So that's less than 20% of my memory ...

> > max_fsm_pages = 50000000
>
> 50 million? That's quite high. Certainly enough to have an effect on
> your memory usage. How did you calculate this number?

It wasn't done by me ... and the guy is gone ... but at the same time, with
8 GB of RAM ... isn't that a crazy number?

> > checkpoint_segments = 3
>
> You should probably increase this if you have the disk space. For
> massive insert operations, I've found it useful to have as many as 128
> segments (although this means about 1.5 GB of disk space).
>
> > effective_cache_size = 5000000
>
> If you actually have that much RAM, I'd love to play on your box. Please?

Hum ... yes, as Shridhar told me, that number was a crazy one, and it's now
down to 875000 ...

> > Off the top of my head, you have allocated roughly 48K shared buffers,
> > which seems a bit on the higher side. Can you check with something
> > like 10K-15K?
>
> Shridhar, that depends on how much RAM he has. On 4 GB dedicated
> machines, I've set shared_buffers as high as 750 MB.

Could you explain to me the point of reducing this size? I really don't
understand this point ...

regards,
--
Bill Footcow
Josh,

On Wednesday 14 July 2004 18:28, Josh Berkus wrote:
> > checkpoint_segments = 3
>
> You should probably increase this if you have the disk space. For
> massive insert operations, I've found it useful to have as many as 128
> segments (although this means about 1.5 GB of disk space).

Another point: I have also read this:
"NOTE: Since 7.2, turning fsync off does NOT stop WAL. It does stop
checkpointing."

So ... is that still true for 7.4.3? I'm running with fsync = off, so does
the value of checkpoint_segments have no effect?

Thanks for your help ...
--
Bill Footcow
Hervé Piedvache wrote:
> Josh,
>
> On Wednesday 14 July 2004 18:28, Josh Berkus wrote:
> > > checkpoint_segments = 3
> >
> > You should probably increase this if you have the disk space. For
> > massive insert operations, I've found it useful to have as many as 128
> > segments (although this means about 1.5 GB of disk space).
>
> Another point: I have also read this:
> "NOTE: Since 7.2, turning fsync off does NOT stop WAL. It does stop
> checkpointing."
>
> So ... is that still true for 7.4.3? I'm running with fsync = off, so
> does the value of checkpoint_segments have no effect?
>
> Thanks for your help ...

I suggest you check this first. Check the performance tuning guide:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

That is for starters. As Josh suggested, increase checkpoint_segments if
you have the disk space. Correspondingly, the WAL disk space requirements
go up as well.

HTH

Shridhar
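For scale, assuming 7.4's documented behaviour: WAL segment files are 16 MB
each, and pg_xlog normally holds no more than about
2 * checkpoint_segments + 1 of them, so a setting of 16 means roughly
(2 * 16 + 1) * 16 MB ≈ 530 MB of WAL on disk; larger settings scale
accordingly.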
Shridhar,

> I suggest you check this first. Check the performance tuning guide:
>
> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> That is for starters. As Josh suggested, increase checkpoint_segments if
> you have the disk space. Correspondingly, the WAL disk space requirements
> go up as well.

Well, not if he has fsync=off. But having fsync off is a very bad idea.
You do realize, Herve', that if you lose power on that machine you'll most
likely have to restore from backup?

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Josh,

On Thursday 15 July 2004 20:09, Josh Berkus wrote:
> > I suggest you check this first. Check the performance tuning guide:
> >
> > http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> >
> > That is for starters. As Josh suggested, increase checkpoint_segments
> > if you have the disk space. Correspondingly, the WAL disk space
> > requirements go up as well.
>
> Well, not if he has fsync=off. But having fsync off is a very bad idea.
> You do realize, Herve', that if you lose power on that machine you'll
> most likely have to restore from backup?

Hum ... it was only for the speed aspect ... I have been using PostgreSQL
with this option since 7.01 ... and for me fsync=on was so slow ...
Is it really not costly for the system to turn it ON now, with v7.4.3?
Tell me ...

--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902
Herve'

> Hum ... it was only for the speed aspect ... I have been using PostgreSQL
> with this option since 7.01 ... and for me fsync=on was so slow ...
> Is it really not costly for the system to turn it ON now, with v7.4.3?

Well, I wouldn't do it until you've figured out the current performance
problem.

The issue with having fsync=off is that, if someone yanks the power cord on
your server, there is a significant chance that you will have to restore
the database from backup because it will be corrupted. But clearly you've
been living with that risk for some time.

It *is* true that there is significantly less of a performance difference
between 7.4 with fsync off and on than there was between 7.1 with fsync off
and on. But there is still a difference. In 7.0 and 7.1 (I think), turning
fsync off turned WAL off completely, resulting in a substantial difference
in disk activity. Now, it just stops checkpointing, but WAL is still being
recorded -- meaning that disk activity decreases some, but not a lot. The
difference is more noticeable the more vulnerable to contention your disk
system is.

The other reason not to think of fsync=off as a permanent performance tweak
is that we're likely to remove the option sometime in the next two
versions, since an increasing number of features depend on WAL behavior,
and the option is largely a legacy of the 7.0 days, when WAL was sometimes
buggy and needed to be turned off to get the database to start.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> Herve'
>
> I forgot to ask about your hardware. How much RAM, and what's your disk
> setup? CPU?
>
> > sort_mem = 512000
>
> Huh? Sort_mem is in K. The above says that you've allocated 512 MB of
> sort memory. Is this process the *only* thing going on on the machine?

And it is not system-wide either, but, let me say, "per backend" ...

Regards
Gaetano Mendola
Hervé Piedvache wrote:
> Josh,
>
> On Tuesday 13 July 2004 19:10, Josh Berkus wrote:
> > > What can I do to get better results? (configuration options, and/or
> > > hardware upgrade?)
> > > What can I give you to provide more useful information to help me?
> >
> > 1) What PostgreSQL version are you using?
>
> v7.4.3
>
> > 2) What's your VACUUM, ANALYZE, VACUUM FULL, REINDEX schedule?
>
> VACUUM FULL VERBOSE ANALYZE;
>
> Every day, after the calculation I was talking about ...
>
> > 3) Can you list the non-default settings in your postgresql.conf?
> > Particularly: shared_buffers, sort_mem, checkpoint_segments,
> > effective_cache_size, and max_fsm_pages?

> sort_mem = 512000

This is too much. You are instructing Postgres to use 512 MB for each
backend (and at times each backend can use this quantity more than once).

> vacuum_mem = 409600
> max_fsm_pages = 50000000
> max_fsm_relations = 2000

50 million? Ugh. What does Postgres tell you in the log after performing a
VACUUM FULL?

> max_files_per_process = 2000
> wal_buffers = 1000
> checkpoint_segments = 3

For massive inserts you have to increase this number; pump it up to 16.

> effective_cache_size = 5000000

5 GB for an 8 GB system is too much.

> random_page_cost = 3

On your hardware you can decrease it to 2, and also decrease the other CPU
costs.

Regards
Gaetano Mendola

BTW, I live in Paris too, if you need a hand ...
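Both of Gaetano's checks can be run from psql; a sketch, with the caveat
that the exact wording of 7.4's free-space-map summary may differ:

  -- the final INFO lines of a database-wide VACUUM VERBOSE summarize how
  -- many pages the free space map actually needs, the basis for a sane
  -- max_fsm_pages (changing it requires a postgresql.conf edit and restart)
  VACUUM VERBOSE;

  -- random_page_cost can be tried per session before committing it to
  -- postgresql.conf
  SET random_page_cost = 2;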
On Mon, 2004-07-26 at 08:20, Gaetano Mendola wrote:
> Hervé Piedvache wrote:

SNIP

> > sort_mem = 512000
>
> This is too much. You are instructing Postgres to use 512 MB for each
> backend (and at times each backend can use this quantity more than once).

Agreed. If any one process needs this much sort memory, you can set it in
that session with SET sort_mem anyway, so letting every sort consume up to
512 MB is asking for trouble.

> > effective_cache_size = 5000000
>
> 5 GB for an 8 GB system is too much.

No, it's not. Assuming that PostgreSQL with all its shared buffers is using
<2 GB, it's quite likely that the kernel is caching at least 5 GB of disk
data. effective_cache_size doesn't set the size of any cache; it tells the
planner roughly how much the kernel is caching.

> > random_page_cost = 3
>
> On your hardware you can decrease it to 2, and also decrease the other
> CPU costs.

On fast machines it often winds up needing to be set somewhere around 1.2
to 2.0.
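The per-session approach described above, as a sketch; the modest default
of 8192 is an assumption, not a value from the thread:

  -- keep a modest system-wide default in postgresql.conf, for example:
  --     sort_mem = 8192          (8 MB per sort operation)
  -- then raise it only in the session that runs the nightly batch:
  SET sort_mem = 512000;    -- 512 MB for this session's big sorts only
  -- ... run the aggregation and the bulk INSERT here ...
  RESET sort_mem;           -- drop back to the default afterwards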