Thread: Adding getrusage profiling data to EXPLAIN output
Attached is a patch to add getrusage output to EXPLAIN output. This is the patch I mentioned previously in http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and it raises the same issues we were talking about then: should the resource usage stats displayed be per-iteration, or totals for the whole query execution for that node? I can't see dividing by nloops making things clearer, but the discrepancy is becoming more noticeable. I wonder if there's anything we can do to make things clearer.

Incidentally, this is a first cut at the patch written in a single sitting. I need to go through it again to make sure I didn't do anything silly. One thing I know is broken is Windows support. I fleshed out our getrusage stub a bit but I can't test it at all, and the INSTR_TIME_* macros are actually defined differently on Windows, so I can't use them quite so simply on struct timeval. The simplest option is to always define the struct timeval versions, even when INSTR_TIME_* doesn't use them, so this code can use them directly. Another would be to modify the struct rusage definition on Windows to use the native Windows time datatype -- which would be tempting since it would avoid the loss of precision in "only" having microsecond precision.

postgres=# explain (analyze, resource) select * from x;
                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=69.851..1287.025 rows=800000 loops=1)
   Resources: sys=240.000ms user=940.000ms read=27.7MB
 Total runtime: 2431.237 ms
(3 rows)

[...flush buffers and drop caches between tests...]

postgres=# explain (analyze, verbose, buffers, resource) select * from x;
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=28.739..1290.786 rows=800000 loops=1)
   Output: generate_series
   Resources: sys=280.000ms user=890.000ms read=27.7MB minflt=65 nvcsw=9 nivcsw=633
   Buffers: shared read=3540
 Total runtime: 2487.629 ms
(5 rows)

-- greg
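As a rough illustration of the portable path discussed above -- timeval arithmetic applied directly to the struct rusage fields -- a minimal standalone sketch might look like the following. The helper names are illustrative, not taken from the patch:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Subtract one struct timeval from another, normalizing microseconds.
 * This is the kind of helper the portable path needs if the INSTR_TIME_*
 * macros can't be applied to struct timeval directly.
 */
static void
timeval_subtract(struct timeval *result,
                 const struct timeval *end, const struct timeval *start)
{
    result->tv_sec = end->tv_sec - start->tv_sec;
    result->tv_usec = end->tv_usec - start->tv_usec;
    if (result->tv_usec < 0)
    {
        result->tv_sec--;
        result->tv_usec += 1000000;
    }
}

/* Report the system/user CPU time consumed between two rusage snapshots. */
static void
report_rusage_delta(const struct rusage *start, const struct rusage *end)
{
    struct timeval sys, user;

    timeval_subtract(&sys, &end->ru_stime, &start->ru_stime);
    timeval_subtract(&user, &end->ru_utime, &start->ru_utime);

    printf("sys=%ld.%03ldms user=%ld.%03ldms\n",
           (long) (sys.tv_sec * 1000 + sys.tv_usec / 1000),
           (long) (sys.tv_usec % 1000),
           (long) (user.tv_sec * 1000 + user.tv_usec / 1000),
           (long) (user.tv_usec % 1000));
}

int
main(void)
{
    struct rusage start, end;
    volatile long i, x = 0;

    getrusage(RUSAGE_SELF, &start);
    for (i = 0; i < 50000000; i++)
        x += i;                     /* burn some user CPU time */
    getrusage(RUSAGE_SELF, &end);

    report_rusage_delta(&start, &end);
    return 0;
}

Defining such timeval helpers unconditionally, even on platforms where INSTR_TIME_* uses a different representation, would match the "simplest option" described above.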
On Fri, Oct 1, 2010 at 9:16 AM, Greg Stark <stark@mit.edu> wrote:
> Attached is a patch to add getrusage output to EXPLAIN output.
> This is the patch I mentioned previously in
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and
> it raises the same issues we were talking about then.

How much overhead do you have with the "resource" option? getrusage() calls for each tuple might have considerable overhead. How much difference is there between the (analyze) and (analyze, resource) options?

auto_explain and pg_stat_statements will also be adjusted to the change when the patch is accepted. I have been asked for "queries ordered by CPU time" in pg_stat_statements several times; the getrusage infrastructure will make that a real possibility.

-- Itagaki Takahiro
On Thu, Sep 30, 2010 at 6:51 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> How much overhead do you have with the "resource" option? getrusage()
> calls for each tuple might have considerable overhead. How much
> difference is there between the (analyze) and (analyze, resource) options?

Here's strace -c for a select count(*) from x where x is a table with 800,000 narrow rows entirely in shared buffers.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.56    0.025299           0   6400024           getrusage
  0.44    0.000112           0      9872           read
  0.00    0.000000           0         1           open
  0.00    0.000000           0        13           lseek
  0.00    0.000000           0        10           sendto
  0.00    0.000000           0         5           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.025411                6409925           total

That's under 40ns per getrusage call. My test C program seems to take about 300ns per getrusage call, however, so I'm not sure I trust the strace results. Either way that's pretty negligible. It's basically just copying a struct from kernel space to user space.

Of course it depends on what query you run. In my very simple example of doing select count(*) from x where x has 800k rows, the whole query takes about 100ms. Since there are two nodes (the seq scan and the aggregate) it has to do 3.2 million getrusage calls. That's about 960ms total. So it imposes nearly a 10x slowdown, but only because the base measurement was so low. For what it's worth, that's less than the overhead from gettimeofday, which seems to be imposing about 1100ms of overhead.

The observed timings are:

query only                   100.5ms
explain (analyze)           1234.9ms
explain (analyze, resource) 2210.2ms

-- greg
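For reference, a standalone microbenchmark along the following lines can reproduce this kind of per-call measurement. This is a sketch, not the actual test program mentioned above:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

#define CALLS 1000000L

int
main(void)
{
    struct rusage ru;
    struct timeval start, end;
    long i, elapsed_us;

    gettimeofday(&start, NULL);
    for (i = 0; i < CALLS; i++)
        getrusage(RUSAGE_SELF, &ru);    /* the call being measured */
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
               + (end.tv_usec - start.tv_usec);
    printf("%.1f ns per getrusage call\n", elapsed_us * 1000.0 / CALLS);
    return 0;
}

Each loop iteration makes one real syscall, so dividing elapsed time by CALLS gives the per-call cost -- the same arithmetic behind the ~300ns figure above.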
On Sat, Oct 2, 2010 at 3:14 AM, Greg Stark <stark@mit.edu> wrote:
> The observed timings are:
> query only                   100.5ms
> explain (analyze)           1234.9ms
> explain (analyze, resource) 2210.2ms

That's a very large overhead, but I think the feature is still useful as long as there is no overhead when the resource option is disabled.

Some random ideas and comments:
* Top-level-only rusage might be useful as a replacement for the ancient log_xxx_stats settings, and should have smaller overhead than full per-node counters (see the sketch below).
* There is some overlap between this feature and the DTrace hooks. If we need such an extension even though we have DTrace hooks, it might mean the DTrace hooks are hard to use for average users, and maybe also for postgres hackers...
* Some kind of statistical sampling could address the heavy overhead of a stop-watch based algorithm. Sampling is not always accurate, but it might offer a better balance between overhead and resolution.

-- Itagaki Takahiro
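For the first point above, the appeal of top-level-only rusage is that it costs two getrusage() calls per query instead of two per tuple per node, much like the old ShowUsage()/log_xxx_stats path. A rough sketch, using hypothetical function names rather than actual executor hooks:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical per-query snapshot, taken once when execution starts. */
static struct rusage query_start_usage;

static void
query_profile_start(void)
{
    getrusage(RUSAGE_SELF, &query_start_usage);
}

/*
 * At query end, a single additional getrusage() call yields the delta
 * for the whole query -- two calls total, regardless of row count.
 */
static void
query_profile_end(long *sys_us, long *user_us)
{
    struct rusage end;

    getrusage(RUSAGE_SELF, &end);
    *sys_us = (end.ru_stime.tv_sec - query_start_usage.ru_stime.tv_sec) * 1000000L
            + (end.ru_stime.tv_usec - query_start_usage.ru_stime.tv_usec);
    *user_us = (end.ru_utime.tv_sec - query_start_usage.ru_utime.tv_sec) * 1000000L
             + (end.ru_utime.tv_usec - query_start_usage.ru_utime.tv_usec);
}

int
main(void)
{
    long sys_us, user_us;
    volatile double v = 0;
    long i;

    query_profile_start();
    for (i = 0; i < 10000000; i++)
        v += i * 0.5;               /* stand-in for executing a query */
    query_profile_end(&sys_us, &user_us);

    printf("sys=%ldus user=%ldus\n", sys_us, user_us);
    return 0;
}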
On Mon, Oct 4, 2010 at 3:29 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> * There is some overlap between this feature and the DTrace hooks.
> If we need such an extension even though we have DTrace hooks,
> it might mean the DTrace hooks are hard to use for average users,
> and maybe also for postgres hackers...

I've had it on my radar for a long time to make use of DTrace directly in EXPLAIN output. The DTrace folks are really not interested in providing programmatic access to the counter interfaces because they're afraid it will nail down internal interfaces too early. But I do think there's lots of potential there.

> * Some kind of statistical sampling could address the heavy overhead
> of a stop-watch based algorithm. Sampling is not always accurate,
> but it might offer a better balance between overhead and resolution.

Tom experimented with doing statistical sampling and found the results were even worse than the overhead from stopwatch profiling. I think we should be able to subtract the overhead from the measurements if it's a problem.

But actually, in the case of getrusage stats I don't think it's an issue. The main interesting bits in getrusage are inblock and oublock, which don't have any profiling overhead -- getrusage doesn't do any i/o. If you're only concerned with the CPU run-time of an in-memory query you don't really need to look at getrusage at all.

-- greg
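A note on units: on Linux, ru_inblock and ru_oublock count 512-byte blocks, so converting them to the MB figures shown in the sample EXPLAIN output is straightforward arithmetic. A sketch, assuming the Linux block-size convention (which is not guaranteed on other platforms):

#include <stdio.h>
#include <sys/resource.h>

/*
 * Linux accounts ru_inblock/ru_oublock in 512-byte units; this conversion
 * assumes that convention and is not portable to every platform.
 */
#define RUSAGE_BLOCK_SIZE 512

static void
show_block_io(const struct rusage *start, const struct rusage *end)
{
    double read_mb = (end->ru_inblock - start->ru_inblock)
        * (double) RUSAGE_BLOCK_SIZE / (1024.0 * 1024.0);
    double write_mb = (end->ru_oublock - start->ru_oublock)
        * (double) RUSAGE_BLOCK_SIZE / (1024.0 * 1024.0);

    printf("read=%.1fMB written=%.1fMB\n", read_mb, write_mb);
}

int
main(void)
{
    struct rusage start, end;

    getrusage(RUSAGE_SELF, &start);
    /* ... perform some file I/O here ... */
    getrusage(RUSAGE_SELF, &end);

    show_block_io(&start, &end);
    return 0;
}

Consistent with this, the first mail's figures line up: 3540 shared buffer reads of 8kB each work out to the 27.7MB shown on the Resources line.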