Thread: Sampling Profiler for Postgres
Hello,

I think we need two types of profilers: SQL-based and resource-based.

We have some SQL-based profilers, like slow-query logs
(log_min_duration_statement) and contrib/pg_stat_statements in 8.4.
For resource-based profilers, we have DTrace probes[1] and continue to
extend them[2], but unfortunately DTrace only works on Solaris and limited
platforms. Also, it is not so easy for typical users to write profilers
using DTrace without performance degradation.

[1] http://developer.postgresql.org/pgdocs/postgres/dynamic-trace.html
[2] http://archives.postgresql.org/pgsql-hackers/2009-03/msg00226.php

Therefore, I'd like to propose a profiler with a sampling approach in 8.5.
The attached patch is an experimental model of the profiler.
Each backend reports its condition in PgBackendStatus.st_condition
and the stats collector process polls them every second.
This is an extension of the st_waiting field, which reports the locking
condition in pg_stat_activity. The advantages are better portability
and less overhead.

We need to consider how this should coexist with DTrace. I added code to
push/pop conditions in the same places as TRACE_POSTGRESQL_*_START/DONE().
So we could merge the DTrace code and the profiler code, or implement one
of them on top of the other.

I would emphasize that an official profiler is required in this area
because it enables users to share knowledge and documentation;
information-sharing would be difficult if they use home-made profilers.

Comments welcome.
----
Here is a sample output of the profiler with pgbench on Windows:

    $ pgbench -i -s3
    $ psql -c "SELECT pg_save_profiles()"
    $ pgbench -c4 -T60 -n
    transaction type: TPC-B (sort of)
    tps = 401.510694
    $ psql -c "SELECT * FROM pg_diff_profiles"
     profid |      profname      | percent
    --------+--------------------+---------
         19 | XLog:Write         |   23.04  <- means WAL contention
         46 | LWLock:WALWrite    |   23.04  <- same as the above
         32 | Lock:Transaction   |   22.61  <- conflicts on row locks
         15 | Network:Recv       |    7.83
         21 | Data:Stat          |    4.35  <- lseek() is slow on Windows
          7 | CPU:Execute        |    3.91
          3 | CPU                |    3.91
          1 | Idle:InTransaction |    2.61
          5 | CPU:Rewrite        |    1.74
         16 | Network:Send       |    1.74
          6 | CPU:Plan           |    1.74
         31 | Lock:Tuple         |    1.74
          4 | CPU:Parse          |    0.87
         11 | CPU:Commit         |    0.87
    (14 rows)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachment
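As a rough illustration of the approach described in the first message, the following Python sketch models backends that push and pop a "condition" and a collector that polls them. It is not taken from the patch: the `Backend` class, the default "CPU" condition, and the helper names are assumptions made only for this example; the real patch works on PgBackendStatus.st_condition in C.

```python
from collections import Counter


class Backend:
    """Toy stand-in for one backend's status slot (hypothetical, not PgBackendStatus)."""

    def __init__(self):
        # Assumed default condition when nothing more specific has been pushed.
        self._stack = ["CPU"]

    def push(self, condition):
        # Would be called where a TRACE_POSTGRESQL_*_START probe fires.
        self._stack.append(condition)

    def pop(self):
        # Would be called where the matching *_DONE probe fires.
        if len(self._stack) > 1:
            self._stack.pop()

    @property
    def condition(self):
        # What the poller sees: the innermost active condition.
        return self._stack[-1]


def sample(backends, counts):
    """One polling pass: read each backend's current condition and count it."""
    for backend in backends:
        counts[backend.condition] += 1


def percentages(counts):
    """Convert sample counts into (name, percent) pairs, largest first."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, round(100.0 * n / total, 2)) for name, n in ranked]
```

The backends only update a small piece of state; all counting happens in the poller, which is why the overhead on the backends is expected to stay low. The trade-off is that conditions shorter than the polling interval can be missed.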
On Mon, 2009-03-09 at 13:55 +0900, ITAGAKI Takahiro wrote:
> Therefore, I'd like to propose a profiler with a sampling approach in 8.5.
> The attached patch is an experimental model of the profiler.
> Each backend reports its condition in PgBackendStatus.st_condition
> and the stats collector process polls them every second.

Hi Takahiro!

Compiled and works fine here on Ubuntu 8.04 2.6.25.15-bd-mod #1 SMP
PREEMPT Thu Nov 27 10:05:44 BRST 2008 i686 GNU/Linux

    dba@analise3:/srv/postgresql/HEAD$ ./bin/pgbench -i -s3
    dba@analise3:/srv/postgresql/HEAD$ ./bin/pgbench -i -s3 -d postgres
    transaction type: TPC-B (sort of)
    scaling factor: 3
    query mode: simple
    number of clients: 4
    duration: 60 s
    number of transactions actually processed: 3730
    tps = 62.090946 (including connections establishing)
    tps = 62.112183 (excluding connections establishing)

    dba@analise3:/srv/postgresql/HEAD$ ./bin/psql -c "SELECT * FROM pg_diff_profiles" -d postgres
     profid |     profname     | percent
    --------+------------------+---------
         15 | Network:Recv     |   50.45
         16 | Network:Send     |   24.55
         32 | Lock:Transaction |    7.14
          3 | CPU              |    5.80
         20 | XLog:Flush       |    3.13
         31 | Lock:Tuple       |    2.68
          7 | CPU:Execute      |    1.79
          6 | CPU:Plan         |    1.79
         46 | LWLock:WALWrite  |    1.34
         11 | CPU:Commit       |    0.89
         19 | XLog:Write       |    0.45
    (11 rows)

Two questions here:

1) How will this behave in a syncrep environment? I don't have one
here to test this yet.

2) I couldn't find a clear way to disable it. Is there one in this patch,
or are you planning this for the future?

Regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

"Dickson S. Guedes" <listas@guedesoft.net> wrote:
> Compiled and works fine here on Ubuntu 8.04 2.6.25.15-bd-mod #1 SMP
> PREEMPT Thu Nov 27 10:05:44 BRST 2008 i686 GNU/Linux

Thanks for testing. Network (or communication between pgbench and postgres)
seems to be a bottleneck on your machine.

> Two questions here:
>
> 1) How will this behave in a syncrep environment? I don't have one
> here to test this yet.

I think it relates to hot standby, not syncrep.
Profiling is enabled whenever the stats collector process is running.
We already run the collector during warm standby, so profiling would
also be available on log-shipping slaves.

> 2) I couldn't find a clear way to disable it. Is there one in this patch,
> or are you planning this for the future?

Ah, I forgot that sampling should be disabled when track_activities is off.
I'll fix it in the next patch. Also, I'd better measure the overhead
of the patch.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Tue, 2009-03-10 at 10:23 +0900, ITAGAKI Takahiro wrote:
> Thanks for testing. Network (or communication between pgbench and postgres)
> seems to be a bottleneck on your machine.

Yes, it is a very poor machine that I use for quick tests. I'll test in
other environments tomorrow.

> > Two questions here:
> >
> > 1) How will this behave in a syncrep environment? I don't have one
> > here to test this yet.
>
> I think it relates to hot standby, not syncrep.
> Profiling is enabled whenever the stats collector process is running.
> We already run the collector during warm standby, so profiling would
> also be available on log-shipping slaves.

OK. Thanks.

> > 2) I couldn't find a clear way to disable it. Is there one in this patch,
> > or are you planning this for the future?
>
> Ah, I forgot that sampling should be disabled when track_activities is off.
> I'll fix it in the next patch. Also, I'd better measure the overhead
> of the patch.

It would be very nice if I could turn it on and off. When it's done,
please send it to us. I'd like to test it in some stress scenarios,
enabling and disabling it in some environments and comparing with my old
benchmarks.

Regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> For resource-based profilers, we have DTrace probes[1] and continue to
> extend them[2], but unfortunately DTrace only works on Solaris and limited
> platforms.

FWIW, the systemtap guys are really, really close to having a working
DTrace equivalent for Linux:
http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/

It's not *quite* there for our purposes
https://bugzilla.redhat.com/show_bug.cgi?id=488941
but I'll be surprised if I'm not dtracing on my Fedora 10 machine before
the week is out.

I'm not at all convinced that we should be putting effort into a
homegrown, partial substitute for DTrace.

			regards, tom lane
On Mon, 2009-03-09 at 21:57 -0400, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> > For resource-based profilers, we have DTrace probes[1] and continue to
> > extend them[2], but unfortunately DTrace only works on Solaris and limited
> > platforms.
>
> FWIW, the systemtap guys are really, really close to having a working
> DTrace equivalent for Linux:
> http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/
>
> It's not *quite* there for our purposes
> https://bugzilla.redhat.com/show_bug.cgi?id=488941
> but I'll be surprised if I'm not dtracing on my Fedora 10 machine before
> the week is out.

After all this time, you think it will be done in a week :-)

> I'm not at all convinced that we should be putting effort into a
> homegrown, partial substitute for DTrace.

I was, but I'm not anymore.

Do you think we will be able to enable this in builds for 8.4?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, 2009-03-09 at 21:57 -0400, Tom Lane wrote:
>> I'm not at all convinced that we should be putting effort into a
>> homegrown, partial substitute for DTrace.

> I was, but I'm not anymore.

> Do you think we will be able to enable this in builds for 8.4?

The bugzilla entry I pointed to was asking me to enable it for 8.3.
Which I did. It's certainly got some rough edges today, but I fully
expect it to be usable when Fedora 11 ships.

			regards, tom lane
Hi!

Tom Lane writes:
> I'm not at all convinced that we should be putting effort into a
> homegrown, partial substitute for DTrace.

In my opinion, providing DTrace as the only means of profiling would
exclude a number of users from the tuning benefits. DTrace seems to rely
on specific kernel options on Linux, which you might not be able to
influence if you run your business on leased virtual servers hosted
somewhere. DTrace is also not available for all platforms, most notably
Windows.

DTrace might be a great tool for the developers and should probably be
used. For the rest of the world, I see a benefit in having something like
the proposed solution that could be enabled by the database administrator
on every server, or maybe even be the default. I think it would reduce the
guesswork about why something might be slow and the work spent on
'probable' causes, and establish more of a 'tuning by numbers' attitude.

Looking at the existing probes in HEAD, it seems to be your target to
provide high-level resource usage patterns to the user, and I agree that
this is the right abstraction layer. With this proposal I see a way of
providing the resource usage in a (database) user-friendly way: namely as
tuples that the user can access in a familiar manner, without using shell
commands on a server that he might not even have access to. I also see an
easy way of keeping historic data by copying the current state with a
timestamp to a different table, and then being able to look at the
performance problems of last night, when nobody was there to notice them
and fire up a profiler to watch.

Just my 0.02€.

--
Stefan
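The idea of keeping timestamped copies of the profile state and comparing them later can be sketched as follows. This is only a Python model of the concept, loosely mirroring the pg_save_profiles() / pg_diff_profiles pair shown earlier in the thread; the function names, the in-memory `history` list, and the counter layout are assumptions for the example, not how the patch stores anything.

```python
def save_profile(history, timestamp, counters):
    """Append a timestamped copy of the cumulative sample counters."""
    history.append((timestamp, dict(counters)))


def diff_profiles(history, start, end):
    """Percent of samples per condition between the snapshots at start and end."""
    snapshots = dict(history)
    before, after = snapshots[start], snapshots[end]
    # Counters are cumulative, so the activity in the window is the difference.
    delta = {name: after[name] - before.get(name, 0) for name in after}
    total = sum(delta.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 2) for name, n in delta.items() if n > 0}
```

With snapshots saved at regular times (for example from a scheduled job), a problem from the previous night could be examined by diffing the two snapshots that bracket it.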
"Dickson S. Guedes" <listas@guedesoft.net> wrote: > > > 2) I couldn't find a clear way to disable it. There is one in this patch > > > or are you planning this to future? > > > > Ah, I forgot sampling should be disabled when track_activities is off. > > I'll fix it in the next patch. Also, I'd better measure overheads > > by the patch. > > Will be very nice if I could on/off it. When done, please send us. I'd > like to test it in some stress scenarios, enabling and disabling it on > some environment and comparing with my old benchmarks. Here is a new version of the patch. I added a new GUC parameter 'profiling_interval' (ms). Profiling is disabled when the value is 0. The default value is 1 second. You could get more granular results if you set the value to 100-500ms, but 1 sec should be enough for continuous regular load (like benchmarks). I cannot see any differences whether profiling is on/off. So I think sampling has little overheads for now. Please notify me report if you see troubles. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
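To make the effect of the sampling interval concrete, here is a small Python simulation. It is a sketch under assumptions: the timeline of (condition, duration) pairs is invented, `sample_timeline` is a hypothetical helper, and only the rule "an interval of 0 disables profiling" comes from the message above.

```python
def sample_timeline(timeline, interval_ms):
    """Sample a list of (condition, duration_ms) entries at a fixed interval.

    An interval of 0 means profiling is disabled, so nothing is sampled.
    """
    counts = {}
    if interval_ms <= 0:
        return counts
    total_ms = sum(duration for _, duration in timeline)
    for tick in range(0, total_ms, interval_ms):
        elapsed = 0
        # Find which condition was active at this tick.
        for condition, duration in timeline:
            elapsed += duration
            if tick < elapsed:
                counts[condition] = counts.get(condition, 0) + 1
                break
    return counts


# Invented workload: a 40 ms WAL write between two long CPU phases.
timeline = [("CPU:Execute", 950), ("XLog:Write", 40), ("CPU:Execute", 1010)]
```

In this made-up timeline, a 1000 ms interval never lands inside the 40 ms write, while a 10 ms interval does. For a steady benchmark load that runs long enough, short events are still caught on average, which is consistent with the remark that 1 second should be enough there.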
On March 10, 2009, Tom Lane wrote:
> FWIW, the systemtap guys are really, really close to having a working
> DTrace equivalent for Linux:
> http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/
>
> It's not *quite* there for our purposes
> https://bugzilla.redhat.com/show_bug.cgi?id=488941
> but I'll be surprised if I'm not dtracing on my Fedora 10 machine before
> the week is out.

So how is this going? Is it usable? I assume it's source compatible
with the dtrace support that we already have?
Peter Eisentraut <peter_e@gmx.net> writes:
> On March 10, 2009, Tom Lane wrote:
>> FWIW, the systemtap guys are really, really close to having a working
>> DTrace equivalent for Linux:
>> http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/

> So how is this going? Is it usable? I assume it's source compatible
> with the dtrace support that we already have?

Their SCM tip successfully builds our code with --enable-dtrace.
I haven't gotten any further with it than to try the sample script
linked on the page above, but that seemed to work (on a Fedora 10
x86_64 box).

The current 0.9 release does *not* work on our CVS tip (dtrace fails
on more-than-6-argument probes, and there are some other issues),
but you can pull from their git repository:

	install elfutils-devel
	git clone git://sources.redhat.com/git/systemtap.git
	configure --prefix=SOMEWHERE
	make all
	sudo make install

Then build PG with

	PATH=SOMEWHERE/bin:$PATH
	configure --with-includes=SOMEWHERE/include --enable-dtrace

			regards, tom lane
I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On March 10, 2009, Tom Lane wrote:
>>> FWIW, the systemtap guys are really, really close to having a working
>>> DTrace equivalent for Linux:
>>> http://gnu.wildebeest.org/diary/2009/02/24/systemtap-09-markers-everywhere/

>> So how is this going? Is it usable? I assume it's source compatible
>> with the dtrace support that we already have?

> The current 0.9 release does *not* work on our CVS tip (dtrace fails
> on more-than-6-argument probes, and there are some other issues),
> but you can pull from their git repository:

BTW, systemtap 0.9.5 is now available as part of the standard Fedora 10
package set, so you don't have to install any nonstandard software
anymore. I've checked, and 0.9.5 appears to "just work" with our CVS HEAD.
You need these packages:

	$ rpm -qa | grep systemtap
	systemtap-sdt-devel-0.9.5-1.fc10.x86_64
	systemtap-runtime-0.9.5-1.fc10.x86_64
	systemtap-0.9.5-1.fc10.x86_64

Then configure --enable-dtrace, and away you go.

			regards, tom lane
Here is an updated version of the sampling profiler patch.

Condition IDs can now be discrete numbers and don't have to be
contiguous. This lets us insert new conditions between existing
numbers if needed in the future.

I think we need more discussion about how to reconcile this patch with
the DTrace probes, but I'll submit it to the next commit-fest for the
record.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
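The benefit of non-contiguous condition IDs can be shown with a short Python sketch. The IDs and names for CPU, XLog:Write, XLog:Flush and LWLock:WALWrite are the ones from the sample outputs earlier in the thread (they may differ in the updated patch); the `register` and `listing` helpers and the "Example:New" condition are hypothetical and exist only for this illustration.

```python
# Sparse registry keyed by ID; gaps between IDs leave room for later additions.
CONDITIONS = {
    3: "CPU",
    19: "XLog:Write",
    20: "XLog:Flush",
    46: "LWLock:WALWrite",
}


def register(conditions, profid, name):
    """Add a condition under an unused ID; existing IDs never move."""
    if profid in conditions:
        raise ValueError(f"condition id {profid} is already in use")
    conditions[profid] = name


def listing(conditions):
    """Return (profid, profname) pairs ordered by ID."""
    return sorted(conditions.items())
```

Because lookups go by ID rather than by array position, a new condition can be slotted into a gap without renumbering the existing ones, so previously saved profiles remain comparable.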