Thread: updates (PostgreSQL) very slow
Hi,

Can someone please help me? My PostgreSQL queries are very slow, especially update statements. What can I do to improve the speed? I have already tried VACUUM and ANALYZE. From the command line I have run

    vacuumdb -z -a -f

and from inside psql:

    database=# VACUUM FULL ANALYZE;

I work on a Linux (Mandrake 9.1) machine with 2 Xeon processors and 4 GB of RAM. The hard drive on which PostgreSQL runs is an 80 GB drive that reads at 55 Mbit/sec. The next query is an example. The table I am working with here contains 747 524 records and 14 fields (columns).

    bons_acc=# explain update edc_ww set edc=null;
                               QUERY PLAN
    ----------------------------------------------------------------
     Seq Scan on edc_ww  (cost=0.00..156793.91 rows=3491 width=184)
    (1 row)
    Time: 0.61 ms

    bons_acc=# update edc_ww set edc=null;
    UPDATE 747524
    Time: 7628686.23 ms

This is just a Seq Scan where a numeric field must be updated to NULL, but if I run it you can see that this simple query takes forever (7628686.23 ms is over 2 hours just to update 747524 records!). I don't think the tables are too big? Could it be my hardware/software/PostgreSQL? What can I do to optimise PostgreSQL? I have already increased the shared buffers in the conf file as well.

Bobbie
____________________________________________________________
Bobbie van der Westhuizen
Quantitative Animal Breeding (BLUP)
ARC - Animal Improvement Institute
+27-12-672-9128 (o/h)
+27-12-665-1419 (fax)
bobbie@irene.agric.za
____________________________________________________________
> Can someone please help me? My PostgreSQL queries are very slow,
> especially update statements. What can I do to improve the speed?
> I have already tried VACUUM and ANALYZE. From the command line I
> have run vacuumdb -z -a -f, and from inside psql:
> database=# VACUUM FULL ANALYZE;

Good start for taking performance measurements - with the planner's statistics updated you will be able to get accurate analyses.

> I work on a Linux (Mandrake 9.1) machine with 2 Xeon processors and
> 4 GB of RAM. The hard drive on which PostgreSQL runs is an 80 GB
> drive that reads at 55 Mbit/sec. The next query is an example. The
> table I am working with here contains 747 524 records and 14 fields
> (columns).

So you have some decent-sized hardware here, but you may want a second drive for backups or for housing the log files. Also, your database is not exceptionally large - PostgreSQL can handle much larger.

> bons_acc=# explain update edc_ww set edc=null;
>  Seq Scan on edc_ww  (cost=0.00..156793.91 rows=3491 width=184)
> (1 row)
> Time: 0.61 ms
>
> bons_acc=# update edc_ww set edc=null;
> UPDATE 747524
> Time: 7628686.23 ms
>
> This is just a Seq Scan where a numeric field must be updated to
> NULL, but if I run it you can see that this simple query takes
> forever (7628686.23 ms is over 2 hours just to update 747524
> records!). I don't think the tables are too big? Could it be my
> hardware/software/PostgreSQL? What can I do to optimise PostgreSQL?
> I have already increased the shared buffers in the conf file as well.

To improve the performance of large updates such as this one you will need to adjust some other postgresql.conf parameters, such as increasing the number of checkpoint segments and setting logging to a minimal level. Or, to accomplish this particular update quickly, you can drop the edc column and re-add it; with no default specified, the re-added column will be null for every row.

However, updating every row to null in a 700k-row table is going to take a while, and this query is probably not a good test case for judging your database performance. Try testing some of your other queries. Post the EXPLAIN ANALYZE results of those queries to the pgsql-performance@postgresql.org list along with a complete copy of your postgresql.conf file. I don't think you are getting a good measurement of your actual database performance by judging it with this simple test case where every row is updated.

Regards,
Fred
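P.S. A minimal sketch of the drop/re-add approach, assuming edc is a plain numeric column with no indexes, views or constraints depending on it (note the column will reappear at the end of the column list):

    BEGIN;
    ALTER TABLE edc_ww DROP COLUMN edc;
    ALTER TABLE edc_ww ADD COLUMN edc numeric;  -- re-added column is NULL for every existing row
    COMMIT;

The checkpoint change is just a postgresql.conf line, e.g. checkpoint_segments = 10 (the default is 3); more segments means fewer forced checkpoints during a big write.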
"Fred Moyer" <fred@redhotpenguin.com> writes: >> This is just a Seq Scan where a numeric field must be updated to >> NULL but if I run it you can see that this �simple� query takes >> forever (7628686.23 ms this is over 2 hours for only updating >> 747524 records!). > However updating every row to null with 700k rows is going to take a while A while, sure, but 2 hours seems excessive to me too. I'm betting that there are triggers or foreign keys on the table being updated, and that that's where the time is going. It might be possible to improve that, but Bobbie hasn't given us enough information. Another thing that jumps out at me is that this table hasn't been vacuumed or analyzed recently. The planner thinks there are 3491 rows when really there are 747524. That's a bit of a big difference. It won't matter for the UPDATE itself --- a seqscan is a seqscan --- but it might matter for planning foreign-key queries. regards, tom lane
Have you run this update query again and again without vacuuming in between?

http://www.postgresql.org/docs/7.4/static/maintenance.html#ROUTINE-VACUUMING

If so, you might have millions and millions of dead tuples taking up space and slowing things down. If you're running 7.4, install the autovacuum daemon and turn it on. Nice little program that should mostly take care of this issue for you.

Got any foreign keys on that field? Triggers?
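A quick way to see whether bloat is the problem (VACUUM VERBOSE is standard; the table name is taken from the original post):

    bons_acc=# VACUUM VERBOSE edc_ww;

The output reports how many removable (dead) row versions it found and how many pages the table occupies; hundreds of thousands of dead rows, or far more pages than the live data should need, would point at bloat, and a one-off VACUUM FULL would shrink the table back down.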
On Wed, 2004-03-10 at 15:30, Tom Lane wrote:
> "Fred Moyer" <fred@redhotpenguin.com> writes:
> >> This is just a Seq Scan where a numeric field must be updated to
> >> NULL but if I run it you can see that this simple query takes
> >> forever (7628686.23 ms this is over 2 hours for only updating
> >> 747524 records!).
>
> > However updating every row to null with 700k rows is going to take a while
>
> A while, sure, but 2 hours seems excessive to me too. I'm betting that
> there are triggers or foreign keys on the table being updated, and that
> that's where the time is going. It might be possible to improve that,
> but Bobbie hasn't given us enough information.

If there are no foreign keys or triggers, and updating each row takes one drive seek (approximately 9 ms with the 80 GB IDE drive being used here), then doing 747524 seeks will take 6727716 ms, about 10% less than the 7628686 ms for the update above. Is this an accurate estimate, or are these numbers just coincidence? It seems like this could represent the least efficient update scenario.
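(Back-of-envelope check of that arithmetic straight from psql; the 9 ms per seek is just the assumed figure above:)

    bons_acc=# SELECT 747524 * 9 AS est_ms,
                      round(747524 * 9 / 60000.0, 1) AS est_minutes,
                      round(7628686.23 / 60000.0, 1) AS actual_minutes;

which gives 6727716 ms, i.e. about 112 minutes estimated against roughly 127 minutes actual.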
Fred Moyer <fred@redhotpenguin.com> writes:
> On Wed, 2004-03-10 at 15:30, Tom Lane wrote:
>> A while, sure, but 2 hours seems excessive to me too.

> If there are no foreign keys or triggers, and updating each row takes
> one drive seek (approximately 9 ms with the 80 gig IDE drive being used
> here), then doing 747524 seeks will take 6727716 ms, about 10% less than
> the 7628686 ms for the update above. Is this an accurate estimate, or
> are these numbers just coincidence?

Probably coincidence. There's no reason to think that a large UPDATE would expend one disk seek per updated row on average --- there's enough buffering between the UPDATE and the drive heads that under normal circumstances the cost should be lots less.

If I had to bet at this point I'd bet on inefficient foreign-key checks, but since we haven't seen any schema details that's purely speculation.

			regards, tom lane
On 11 Mar 2004 at 2:01, Tom Lane wrote:
> Fred Moyer <fred@redhotpenguin.com> writes:
> > On Wed, 2004-03-10 at 15:30, Tom Lane wrote:
> >> A while, sure, but 2 hours seems excessive to me too.
>
> > If there are no foreign keys or triggers, and updating each row takes
> > one drive seek (approximately 9 ms with the 80 gig IDE drive being used
> > here), then doing 747524 seeks will take 6727716 ms, about 10% less than
> > the 7628686 ms for the update above. Is this an accurate estimate, or
> > are these numbers just coincidence?
>
> Probably coincidence. There's no reason to think that a large UPDATE
> would expend one disk seek per updated row on average --- there's enough
> buffering between the UPDATE and the drive heads that under normal
> circumstances the cost should be lots less.
>
> If I had to bet at this point I'd bet on inefficient foreign-key checks,
> but since we haven't seen any schema details that's purely speculation.
>
> 			regards, tom lane

There are no foreign keys in this table. What schema details do you need? Then I can give them to you. I am a new user of PostgreSQL, so I am not clued up with all of the stuff yet.
__________________________________________________________
Bobbie van der Westhuizen
Quantitative Animal Breeding (BLUP)
ARC - Animal Improvement Institute
+27-12-672-9128 (o/h)
+27-12-665-1419 (fax)
bobbie@irene.agric.za
__________________________________________________________
Hello,

Is this a place where increasing default_statistics_target will help?

Sincerely,

J

Bobbie van der Westhuizen wrote:
> On 11 Mar 2004 at 2:01, Tom Lane wrote:
>
> Fred Moyer <fred@redhotpenguin.com> writes:
>> On Wed, 2004-03-10 at 15:30, Tom Lane wrote:
>>> A while, sure, but 2 hours seems excessive to me too.
>
>> If there are no foreign keys or triggers, and updating each row takes
>> one drive seek (approximately 9 ms with the 80 gig IDE drive being used
>> here), then doing 747524 seeks will take 6727716 ms, about 10% less than
>> the 7628686 ms for the update above. Is this an accurate estimate, or
>> are these numbers just coincidence?
>
> Probably coincidence. There's no reason to think that a large UPDATE
> would expend one disk seek per updated row on average --- there's enough
> buffering between the UPDATE and the drive heads that under normal
> circumstances the cost should be lots less.
>
> If I had to bet at this point I'd bet on inefficient foreign-key checks,
> but since we haven't seen any schema details that's purely speculation.
>
> 			regards, tom lane
>
> There are no foreign keys in this table. What schema details do you
> need? Then I can give them to you. I am a new user of PostgreSQL, so I
> am not clued up with all of the stuff yet.
> __________________________________________________________
> Bobbie van der Westhuizen
> Quantitative Animal Breeding (BLUP)
> ARC - Animal Improvement Institute
> +27-12-672-9128 (o/h)
> +27-12-665-1419 (fax)
> bobbie@irene.agric.za
> __________________________________________________________
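If it comes to that, raising the statistics target is a one-line change either way (the value 100 below is just an example; 7.4's default is 10):

    ALTER TABLE edc_ww ALTER COLUMN edc SET STATISTICS 100;  -- per column
    ANALYZE edc_ww;

or globally in postgresql.conf:

    default_statistics_target = 100

It only sharpens the planner's row estimates, though; an unqualified UPDATE is a seqscan no matter what the statistics say, so it would not speed up this particular query.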
"Bobbie van der Westhuizen" <Bobbie@idpi1.agric.za> writes: >> If I had to bet at this point I'd bet on inefficient foreign-key checks, >> but since we haven't seen any schema details that's purely >> speculation. > There are no foreign-keys in this table. What schema details do you > need, then I can give it to you. I am a new user of postgreSQL so I am > not clude-up with all of the stuff. "pg_dump -s" is the easiest way of exhibiting the schema of a database. regards, tom lane