Thread: Need advice about triggers
Hello,

I have a small table (up to 10000 rows) and every row will be updated once
per minute. The table also has a "before update on each row" trigger
written in plpgsql, but 99.99% of the time the trigger does nothing to the
database: it just compares the old and new values in the row, and those
values are almost always identical.

In a simple test I was able to do 10000 updates on a 1000-row table in
~30s. That is practically enough, but I'd like to have more room to slow
down. The best result I achieved was by doing commit+vacuum every ~500
updates.

How can I improve performance, and will version 7.4 bring anything
valuable for my task? A rewrite to some other scripting language is not a
problem; the trigger is simple enough.

Postgres v7.3.4, shared_buffers=4096, max_fsm settings also bumped up 10
times.

Thanks,

Mindaugas
On Tuesday 09 September 2003 13:40, Mindaugas Riauba wrote:
> Hello,
>
> I have small table (up to 10000 rows) and every row will be updated
> once per minute. Table also has "before update on each row" trigger
> written in plpgsql. But trigger 99.99% of the time will do nothing
> to the database. It will just compare old and new values in the row
> and those values almost always will be identical.
>
> Now I tried simple test and was able to do 10000 updates on 1000
> rows table in ~30s. That's practically enough but I'd like to have
> more room to slow down.
> Also best result I achieved by doing commit+vacuum every ~500
> updates.
>
> How can I improve performance and will version 7.4 bring something
> valuable for my task? Rewrite to some other scripting language is not
> a problem. Trigger is simple enough.

Well, try it without the trigger. If performance improves markedly, it
might be worth rewriting in C.

If not, you're probably saturating the disk I/O - using iostat/vmstat will
let you see what's happening. If it is your disks, you might see if moving
the WAL onto a separate drive would help, or check the archives for plenty
of discussion about raid setups.

> Postgres v7.3.4, shared_buffers=4096 max_fsm settings also bumped up
> 10 times.

Well, effective_cache_size is useful for reads, but won't help with
writing. You might want to look at wal_buffers and see if increasing that
helps, but I couldn't say for sure.

--
Richard Huxton
Archonet Ltd
> How can I improve performance and will version 7.4 bring something
> valuable for my task? Rewrite to some other scripting language is not
> a problem. Trigger is simple enough.

Your best bet is to have additional clients connected to the database
requesting work. Approx NUMCPUs * 2 + 1 seems to be ideal (+1 to ensure
there is something waiting when the others complete; *2 to ensure that you
can have 50% reading from disk, 50% doing calculations).

You may simply want to put vacuum into a loop of its own so it executes
~1 second after the previous run finished. Work should still be going on
even though vacuum is running.
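The client-count rule of thumb above is easy to express directly; a minimal
sketch (the function name is mine, not from the thread):

```python
import os

def ideal_clients(num_cpus: int) -> int:
    """Rule of thumb from the thread: NUMCPUs * 2 + 1 clients.

    *2: roughly half the clients can wait on disk while half use the CPU.
    +1: one extra request is always queued when another client finishes.
    """
    return num_cpus * 2 + 1

# Example: size the client pool for this machine's CPU count.
print(ideal_clients(os.cpu_count() or 1))
```

On a dual-CPU box like the one later in the thread this suggests five
concurrent clients.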
> > How can I improve performance and will version 7.4 bring something
> > valuable for my task? Rewrite to some other scripting language is not
> > a problem. Trigger is simple enough.
>
> Well, try it without the trigger. If performance improves markedly, it
> might be worth rewriting in C.

Nope. Execution time is practically the same without the trigger.

> If not, you're probably saturating the disk I/O - using iostat/vmstat
> will let you see what's happening. If it is your disks, you might see if
> moving the WAL onto a separate drive would help, or check the archives
> for plenty of discussion about raid setups.

The bottleneck in this case is CPU. The postmaster process uses almost
100% of the CPU.

> > Postgres v7.3.4, shared_buffers=4096 max_fsm settings also bumped up
> > 10 times.
>
> Well effective_cache_size is useful for reads, but won't help with
> writing. You might want to look at wal_buffers and see if increasing
> that helps, but I couldn't say for sure.

Disk I/O should not be a problem in this case. vmstat shows ~300kB/s
write activity.

Mindaugas
On Tue, 9 Sep 2003, Mindaugas Riauba wrote:

> Hello,
>
> I have small table (up to 10000 rows) and every row will be updated
> once per minute. Table also has "before update on each row" trigger
> written in plpgsql. But trigger 99.99% of the time will do nothing
> to the database. It will just compare old and new values in the row
> and those values almost always will be identical.

If the rows aren't going to actually change all that often, perhaps you
could program your trigger to just silently drop the update, i.e. only
change the rows that need updating and ignore the rest? That should speed
things up. Unless I'm misunderstanding your needs here.
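A minimal sketch of such an update-suppressing trigger, in the plpgsql
style of that era. The table and column names are assumptions based on
later messages in the thread; note that the timestamp column must be
excluded from the comparison, since the update always changes it:

```sql
-- Hypothetical BEFORE UPDATE trigger that silently drops no-op updates.
-- Returning NULL from a BEFORE ... FOR EACH ROW trigger cancels the
-- operation for that row.  iflastupdate is deliberately not compared:
-- it is set to CURRENT_TIMESTAMP on every update and would never match.
CREATE FUNCTION ifdata_skip_noop() RETURNS trigger AS '
BEGIN
    IF NEW.ifspeed = OLD.ifspeed AND NEW.ifreason = OLD.ifreason THEN
        RETURN NULL;   -- nothing changed: suppress the update entirely
    END IF;
    RETURN NEW;        -- something changed: let the update proceed
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER ifdata_skip_noop_trg
    BEFORE UPDATE ON ifdata
    FOR EACH ROW EXECUTE PROCEDURE ifdata_skip_noop();
```

Suppressed updates write no new row version at all, so they also create no
dead tuples for vacuum to clean up.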
"Mindaugas Riauba" <mind@bi.lt> writes: >> Well, try it without the trigger. If performance improves markedly, it >> might be worth rewriting in C. > Nope. Execution time is practically the same without trigger. >> If not, you're probably saturating the disk I/O - > Bottleneck in this case is CPU. postmaster process uses almost 100% of > CPU. That seems very odd. Updates should be I/O intensive, not CPU intensive. I wouldn't have been surprised to hear of a plpgsql trigger consuming lots of CPU, but without it, I'm not sure where the time is going. Can you show us an EXPLAIN ANALYZE result for a typical update command? regards, tom lane
On Tuesday 09 September 2003 14:33, Mindaugas Riauba wrote:
> > Well, try it without the trigger. If performance improves markedly, it
> > might be worth rewriting in C.
>
> Nope. Execution time is practically the same without trigger.

OK - no point in rewriting it then.

> > If not, you're probably saturating the disk I/O - using iostat/vmstat
> > will let you see what's happening. If it is your disks, you might see
> > if moving the WAL onto a separate drive would help, or check the
> > archives for plenty of discussion about raid setups.
>
> Bottleneck in this case is CPU. postmaster process uses almost 100% of
> CPU.
> Disk I/O should not be a problem in this case. vmstat shows ~300kb/s
> write activity.

Hmm - I must admit I wasn't expecting that. Closest I can get on my test
machine here (AMD 400MHz / 256MB / IDE disk / other stuff running) is
about 20 secs. I've attached the perl script I used - what sort of
timings does it give you?

--
Richard Huxton
Archonet Ltd
Mindaugas Riauba wrote on Tue, 09.09.2003 at 15:40:
> Hello,
>
> I have small table (up to 10000 rows) and every row will be updated
> once per minute. Table also has "before update on each row" trigger
> written in plpgsql. But trigger 99.99% of the time will do nothing
> to the database. It will just compare old and new values in the row
> and those values almost always will be identical.
>
> Now I tried simple test and was able to do 10000 updates on 1000
> rows table in ~30s. That's practically enough but I'd like to have
> more room to slow down.

Is it 10000 *rows* or 10000*1000 = 10 000 000 *rows* updated?

When I run a simple update 10 times on 1000 rows (with no trigger, which
you claim to take about the same time) it took 0.25 sec.

> Also best result I achieved by doing commit+vacuum every ~500
> updates.

It seems like you are updating more than one row at each update?

---------------
Hannu
> >> Well, try it without the trigger. If performance improves markedly,
> >> it might be worth rewriting in C.
>
> > Nope. Execution time is practically the same without trigger.
>
> >> If not, you're probably saturating the disk I/O -
>
> > Bottleneck in this case is CPU. postmaster process uses almost 100% of
> > CPU.
>
> That seems very odd. Updates should be I/O intensive, not CPU
> intensive. I wouldn't have been surprised to hear of a plpgsql trigger
> consuming lots of CPU, but without it, I'm not sure where the time is
> going. Can you show us an EXPLAIN ANALYZE result for a typical update
> command?

Two EXPLAIN ANALYZE results are below, one before and one after REINDEX.
It seems that a REINDEX before the updates helps: time went down to ~17s,
and the CPU is no longer at 100%. vmstat output is also below (the machine
has 2 CPUs, so 40% load means 80% of one CPU).

So the solution would be REINDEX before the updates and VACUUM at the same
time? Without REINDEX, performance slowly degrades.

Mindaugas

router_db=# explain analyze update ifdata set ifspeed=256000,
ifreason='12121', iflastupdate=CURRENT_TIMESTAMP WHERE clientid='#0003904#';
                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using ifdata_clientid_key on ifdata  (cost=0.00..5.64 rows=1
 width=116) (actual time=0.17..0.36 rows=1 loops=1)
   Index Cond: (clientid = '#0003904#'::character varying)
 Total runtime: 1.70 msec
(3 rows)

router_db=# reindex table ifdata;
REINDEX
router_db=# explain analyze update ifdata set ifspeed=256000,
ifreason='12121', iflastupdate=CURRENT_TIMESTAMP WHERE clientid='#0003904#';
                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using ifdata_clientid_key on ifdata  (cost=0.00..5.65 rows=1
 width=116) (actual time=0.06..0.07 rows=1 loops=1)
   Index Cond: (clientid = '#0003904#'::character varying)
 Total runtime: 0.47 msec
(3 rows)

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff   cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0   5048  20616 273556 1614692   0   0     4     3    2     0   0   1   3
 0  0  0   5048  20612 273556 1614692   0   0     0     0  109     8   0   0 100
 0  0  0   5048  20612 273556 1614692   0   0     0   168  144    20   0   0 100
 1  0  0   5048  19420 273556 1614612   0   0     0   192  123  4120  35   2  63
 1  1  0   5048  19420 273572 1614652   0   0     0   672  144  4139  32   2  66
 1  0  0   5048  19420 273580 1614660   0   0     0   360  125  4279  33  12  55
 1  0  0   5048  19420 273580 1614724   0   0     0   272  119  5887  41   2  57
 1  0  0   5048  19420 273580 1614716   0   0     0   488  124  4871  40   1  59
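The workaround Mindaugas describes could be run as a periodic maintenance
step between update bursts. A sketch (the table name comes from the thread;
the pairing of the two commands is my assumption):

```sql
-- On pre-8.x Postgres every UPDATE creates a new row version and a new
-- index entry, so a table updated this heavily bloats its index quickly.
VACUUM ANALYZE ifdata;   -- reclaim dead tuples, refresh planner stats
REINDEX TABLE ifdata;    -- rebuild the bloated index
```

Note that REINDEX takes an exclusive lock on the table, so on 7.3 it
blocks the concurrent updates while it runs.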
Mindaugas Riauba wrote on Wed, 10.09.2003 at 13:21:
> router_db=# explain analyze update ifdata set ifspeed=256000,
> ifreason='12121', iflastupdate=CURRENT_TIMESTAMP WHERE
> clientid='#0003904#';
>                                  QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using ifdata_clientid_key on ifdata  (cost=0.00..5.64 rows=1
>  width=116) (actual time=0.17..0.36 rows=1 loops=1)
>    Index Cond: (clientid = '#0003904#'::character varying)
>  Total runtime: 1.70 msec
> (3 rows)

Could you try the same query on a similar table, where clientid is int4?

Is it faster? Does the performance degrade at a slower rate?

---------------
Hannu
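A sketch of the experiment Hannu suggests, using a copy of the table keyed
by int4. The schema details and the integer key value are guesses based on
the thread, not the actual definition:

```sql
-- Same shape as ifdata, but with an integer key instead of varchar.
CREATE TABLE ifdata_int (
    clientid     int4 PRIMARY KEY,
    ifspeed      int4,
    ifreason     varchar,
    iflastupdate timestamp
);

-- Integer key comparison is cheaper than varchar comparison on every
-- index probe; time this against the varchar-keyed original.
EXPLAIN ANALYZE
UPDATE ifdata_int
   SET ifspeed = 256000, ifreason = '12121',
       iflastupdate = CURRENT_TIMESTAMP
 WHERE clientid = 3904;
```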