Thread: about explain analyze
Hello! Explain analyze takes 3 times more time for execution. Why? wow=# \timing Timing is on. wow=# select max(click.accesses) from click; max ---------- 10944762 (1 row) Time: 105,654 ms wow=# explain analyze select max(click.accesses) from click; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=1103.70..1103.70 rows=1 width=8) (actual time=289.391..289.393 rows=1 loops=1) -> Seq Scan on click (cost=0.00..971.36 rows=52936 width=8) (actual time=0.013..133.943 rows=52936 loops=1) Total runtime: 289.498 ms (3 rows) Time: 290,695 ms wow=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 7.4on i386-unknown-freebsd5.1, compiled by GCC gcc (GCC) 3.2.2 [FreeBSD] 20030205 (release) (1 row) -- Teodor Sigaev E-mail: teodor@sigaev.ru
Teodor Sigaev <teodor@sigaev.ru> writes: > Explain analyze takes 3 times more time for execution. Why? Measurement overhead. It would seem your platform has a particularly slow version of gettimeofday() though ... I've never noticed such a large discrepancy myself. regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > > Explain analyze takes 3 times more time for execution. Why? > > Measurement overhead. It would seem your platform has a particularly > slow version of gettimeofday() though ... I've never noticed such a > large discrepancy myself. Actually, I'm noticing similar "problems" with v7.4 ... "EXPLAIN ANALYZE" seems to be showing some *very* high ms for execution time, but if you run the actual query, it doesn't seem to take even 1/10th the time reported ... This is on a FreeBSD 4.x box ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > Actually, I'm noticing similar "problems" with v7.4 ... "EXPLAIN ANALYZE" > seems to be showing some *very* high ms for execution time, but if you run > the actual query, it doesn't seem to take even 1/10th the time reported > ... Example? I don't see anything out of line here. regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > Actually, I'm noticing similar "problems" with v7.4 ... "EXPLAIN ANALYZE" > > seems to be showing some *very* high ms for execution time, but if you run > > the actual query, it doesn't seem to take even 1/10th the time reported > > ... > > Example? I don't see anything out of line here. pgsql74# /usr/local/pgsql/bin/psql -U pgsql -c "SELECT count(rec_id) FROM url" 186_archivescount --------209872 (1 row) pgsql74# /usr/local/pgsql/bin/psql -U pgsql -c "EXPLAIN ANALYZE SELECT max(rec_id) FROM url" 186_archives QUERY PLAN --------------------------------------------------------------------------------------------------------------------Aggregate (cost=11177.19..11177.19 rows=1 width=4) (actual time=2400.579..2400.585 rows=1 loops=1) -> Seq Scan on url (cost=0.00..10741.55rows=174255 width=4) (actual time=0.056..1387.803 rows=209872 loops=1)Total runtime: 2407.095 ms (3 rows) pgsql74# time /usr/local/pgsql/bin/psql -U pgsql -c "SELECT max(rec_id) FROM url" 186_archives max --------690582 (1 row) 0.000u 0.004s 0:00.32 0.0% 0+0k 0+0io 0pf+0w ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
How many times is gettimeofday called? wow=# explain analyze select 1; QUERY PLAN ------------------------------------------------------------------------------------ Result (cost=0.00..0.01 rows=1 width=0)(actual time=0.008..0.010 rows=1 loops=1) Total runtime: 0.047 ms (2 rows) Several thousands? Tom Lane wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > >>Explain analyze takes 3 times more time for execution. Why? > > > Measurement overhead. It would seem your platform has a particularly > slow version of gettimeofday() though ... I've never noticed such a > large discrepancy myself. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings -- Teodor Sigaev E-mail: teodor@sigaev.ru
Teodor Sigaev <teodor@sigaev.ru> writes: > How many times is gettimeofday called? Twice per plan node visit, if you are doing EXPLAIN ANALYZE. (The number of "visits" is one more than the number of rows returned.) regards, tom lane
> ... if its just me mis-reading the numbers, let me > know ... it just "feels" off Here's what I see: > time psql -c "explain analyze SELECT count(rec_id) FROM url" 186_archives QUERY PLAN --------------------------------------------------------------------------------------------------------------------Aggregate (cost=11177.19..11177.19 rows=1 width=4) (actual time=3539.180..3539.184 rows=1 loops=1) -> Seq Scan on url (cost=0.00..10741.55rows=174255 width=4) (actual time=0.111..2292.234 rows=215552 loops=1)Total runtime: 3539.922 ms (3 rows) 0.006u 0.000s 0:03.57 0.0% 0+0k 0+0io 0pf+0w So there doesn't seem to be any discrepancy between what EXPLAIN says and what time(1) says. It is true that the overhead of EXPLAIN ANALYZE looks a bit high: > time psql -c "SELECT count(rec_id) FROM url" 186_archivescount --------215169 (1 row) 0.000u 0.004s 0:00.77 0.0% 0+0k 0+0io 0pf+0w If I'm doing the arithmetic correctly this makes the measurement overhead about 13 microseconds per row. Presumably that's almost entirely spent on the two gettimeofday() calls, so they are costing upwards of 6 microseconds apiece, which seems like a lot for a modern processor. Might be worth griping to the BSD kernel folk... Still though I think we could be proud that we've gotten the price of a seqscan down to the point where a couple of gettimeofday() calls per row are dominating the runtime. regards, tom lane
None that I've heard of ... Sean? Tom, any way of writing a quite test C program for this? Something to 'simulate' the same thing, but without having to build the whole postmaster? On Fri, 28 Nov 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > On Fri, 28 Nov 2003, Tom Lane wrote: > >> Now that I recall, didn't you complain of something similar with a beta? > > > Yup ... and I bet its not reproducible yet again, is it? :) That would > > make for twice though, with v7.4, that I've come up with - results :) > > Well, it's not reproducibly negative, but it seems reproducibly wrong: > > Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=49641.603..49641.611 rows=1 loops=1) > -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=34.854..724754.474 rows=3570252 loops=1) > Total runtime: 49688.524 ms > > Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=36625.013..36625.018 rows=1 loops=1) > -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=0.128..-676113.173 rows=3572298 loops=1) > Total runtime: 36625.779 ms > > Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=41380.881..41380.885 rows=1 loops=1) > -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=0.091..718200.092 rows=3575264 loops=1) > Total runtime: 41381.504 ms > (3 rows) > > I'm wondering if there's an actual bug in gettimeofday() in this > platform, such that once in a while it returns a value that's off > a minute or so ... > > regards, tom lane > ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
I said: > I'm wondering if there's an actual bug in gettimeofday() in this > platform, such that once in a while it returns a value that's off > a minute or so ... Whoa. Try the following test program. Then send it in to your friendly local BSD hackers .... > uname -a FreeBSD pgsql74.hub.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #3: Sat Sep 20 14:16:48 ADT 2003 root@neptune.hub.org:/usr/obj/usr/src/sys/kernel i386 > ./a.out out of order tv_sec: 1070050646 939970, prev 1070051342 516165 out of order tv_sec: 1070051342 789127, prev 1070050646 939970 out of order tv_usec: 1070051342 789127, prev 1070050646 939970 out of order tv_sec: 1070050654 690255, prev 1070051350 216448 out of order tv_sec: 1070051350 402412, prev 1070050654 690255 out of order tv_usec: 1070051350 402412, prev 1070050654 690255 out of order tv_sec: 1070050672 990359, prev 1070051368 416546 out of order tv_sec: 1070051368 501648, prev 1070050672 990359 out of order tv_usec: 1070051368 501648, prev 1070050672 990359 out of order tv_sec: 1070050672 990363, prev 1070051368 516557 out of order tv_sec: 1070051368 700227, prev 1070050672 990363 out of order tv_usec: 1070051368 700227, prev 1070050672 990363 out of order tv_sec: 1070050675 740524, prev 1070051371 216718 out of order tv_sec: 1070051371 346521, prev 1070050675 740524 out of order tv_usec: 1070051371 346521, prev 1070050675 740524 out of order tv_sec: 1070050676 440415, prev 1070051371 916605 out of order tv_usec: 1070050676 440415, prev 1070051371 916605 out of order tv_sec: 1070051372 238348, prev 1070050676 440415 out of order tv_usec: 1070051372 238348, prev 1070050676 440415 ^C > regards, tom lane #include <stdio.h> #include <sys/time.h> int main() {struct timeval prevtime;struct timeval curtime; gettimeofday(&prevtime, NULL); for (;;){ gettimeofday(&curtime, NULL); if (curtime.tv_usec < 0 || curtime.tv_usec >= 1000000) fprintf(stderr, "bogus tv_usec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); if (curtime.tv_sec < prevtime.tv_sec || curtime.tv_sec > prevtime.tv_sec + 1) fprintf(stderr, "out of ordertv_sec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); if (curtime.tv_usec < prevtime.tv_usec && curtime.tv_sec != prevtime.tv_sec + 1) fprintf(stderr, "out oforder tv_usec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); prevtime = curtime;} return 0; }
"Marc G. Fournier" <scrappy@postgresql.org> writes: > On Fri, 28 Nov 2003, Tom Lane wrote: >> Now that I recall, didn't you complain of something similar with a beta? > Yup ... and I bet its not reproducible yet again, is it? :) That would > make for twice though, with v7.4, that I've come up with - results :) Well, it's not reproducibly negative, but it seems reproducibly wrong: Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=49641.603..49641.611 rows=1 loops=1) -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=34.854..724754.474 rows=3570252 loops=1)Total runtime: 49688.524ms Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=36625.013..36625.018 rows=1 loops=1) -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=0.128..-676113.173 rows=3572298 loops=1)Total runtime: 36625.779ms Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=41380.881..41380.885 rows=1 loops=1) -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=0.091..718200.092 rows=3575264 loops=1)Total runtime: 41381.504ms (3 rows) I'm wondering if there's an actual bug in gettimeofday() in this platform, such that once in a while it returns a value that's off a minute or so ... regards, tom lane
Teodor Sigaev <teodor@sigaev.ru> writes: > Tom, I am afraid that I don't understand. My first example: > wow=# explain analyze select max(click.accesses) from click; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------- > Aggregate (cost=1103.70..1103.70 rows=1 width=8) (actual > time=289.391..289.393 rows=1 loops=1) > -> Seq Scan on click (cost=0.00..971.36 rows=52936 width=8) (actual > time=0.013..133.943 rows=52936 loops=1) > Total runtime: 289.498 ms > (3 rows) > Time: 290,695 ms > select max() returns only one row... If you mean rows in Seq Scan than > gettimeofday was called (52936+1)*2. Huge value :( Right, that many times at the Seq Scan plan node, and (1+1)*2 more times at the Aggregate node. regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > did you happen to take a look at the other set of queries I sent you, that > > were showing 39k and 41k explain results? > > You mean this one? Yup ... > > time psql -c "explain analyze select max(intag) from ndict3" 186_archives > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=48953.823..48953.827 rows=1 loops=1) > -> Seq Scan on ndict3 (cost=0.00..34560.57 rows=2243357 width=4) (actual time=4.903..-666785.605 rows=3516680 loops=1) > Total runtime: 48982.514 ms > (3 rows) > > 0.000u 0.005s 0:49.06 0.0% 0+0k 0+0io 0pf+0w > > time psql -c "select max(intag) from ndict3" 186_archives > max > ------------ > 2147418368 > (1 row) > > 0.000u 0.005s 0:03.06 0.0% 0+0k 0+0io 0pf+0w > > This looks like 46 seconds of overhead for 3516680 rows, or still right > about 13 microseconds per row, so that's consistent. The negative > "actual time" measurement for the Seq Scan row seems pretty broken > though :-(. > > Now that I recall, didn't you complain of something similar with a beta? Yup ... and I bet its not reproducible yet again, is it? :) That would make for twice though, with v7.4, that I've come up with - results :) ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Tom, I am afraid that I don't understand. My first example: wow=# explain analyze select max(click.accesses) from click; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=1103.70..1103.70 rows=1 width=8) (actual time=289.391..289.393 rows=1 loops=1) -> Seq Scan on click (cost=0.00..971.36 rows=52936 width=8) (actual time=0.013..133.943 rows=52936 loops=1) Total runtime: 289.498 ms (3 rows) Time: 290,695 ms select max() returns only one row... If you mean rows in Seq Scan than gettimeofday was called (52936+1)*2. Huge value :( Tom Lane wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > >>How many times is gettimeofday called? > > > Twice per plan node visit, if you are doing EXPLAIN ANALYZE. (The > number of "visits" is one more than the number of rows returned.) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings -- Teodor Sigaev E-mail: teodor@sigaev.ru
"Marc G. Fournier" <scrappy@postgresql.org> writes: > did you happen to take a look at the other set of queries I sent you, that > were showing 39k and 41k explain results? You mean this one? > time psql -c "explain analyze select max(intag) from ndict3" 186_archives QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------Aggregate (cost=40168.96..40168.96 rows=1 width=4) (actual time=48953.823..48953.827 rows=1 loops=1) -> Seq Scan on ndict3 (cost=0.00..34560.57rows=2243357 width=4) (actual time=4.903..-666785.605 rows=3516680 loops=1)Total runtime: 48982.514 ms (3 rows) 0.000u 0.005s 0:49.06 0.0% 0+0k 0+0io 0pf+0w > time psql -c "select max(intag) from ndict3" 186_archives max ------------2147418368 (1 row) 0.000u 0.005s 0:03.06 0.0% 0+0k 0+0io 0pf+0w This looks like 46 seconds of overhead for 3516680 rows, or still right about 13 microseconds per row, so that's consistent. The negative "actual time" measurement for the Seq Scan row seems pretty broken though :-(. Now that I recall, didn't you complain of something similar with a beta? regards, tom lane
did you happen to take a look at the other set of queries I sent you, that were showing 39k and 41k explain results? do they show pretty much the same 'lag' in gettimeofday()? On Fri, 28 Nov 2003, Tom Lane wrote: > > ... if its just me mis-reading the numbers, let me > > know ... it just "feels" off > > Here's what I see: > > > time psql -c "explain analyze SELECT count(rec_id) FROM url" 186_archives > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=11177.19..11177.19 rows=1 width=4) (actual time=3539.180..3539.184 rows=1 loops=1) > -> Seq Scan on url (cost=0.00..10741.55 rows=174255 width=4) (actual time=0.111..2292.234 rows=215552 loops=1) > Total runtime: 3539.922 ms > (3 rows) > > 0.006u 0.000s 0:03.57 0.0% 0+0k 0+0io 0pf+0w > > So there doesn't seem to be any discrepancy between what EXPLAIN says > and what time(1) says. It is true that the overhead of EXPLAIN ANALYZE > looks a bit high: > > > time psql -c "SELECT count(rec_id) FROM url" 186_archives > count > -------- > 215169 > (1 row) > > 0.000u 0.004s 0:00.77 0.0% 0+0k 0+0io 0pf+0w > > If I'm doing the arithmetic correctly this makes the measurement > overhead about 13 microseconds per row. Presumably that's almost > entirely spent on the two gettimeofday() calls, so they are costing > upwards of 6 microseconds apiece, which seems like a lot for a modern > processor. Might be worth griping to the BSD kernel folk... > > Still though I think we could be proud that we've gotten the price of > a seqscan down to the point where a couple of gettimeofday() calls per > row are dominating the runtime. > > regards, tom lane > ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On November 28, 2003 12:33 pm, Tom Lane wrote: > I said: > > I'm wondering if there's an actual bug in gettimeofday() in this > > platform, such that once in a while it returns a value that's off > > a minute or so ... > > Whoa. Try the following test program. Then send it in to your friendly > local BSD hackers .... > > > uname -a > > FreeBSD pgsql74.hub.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #3: Sat Sep > 20 14:16:48 ADT 2003 root@neptune.hub.org:/usr/obj/usr/src/sys/kernel > i386 > > > ./a.out > > out of order tv_sec: 1070050646 939970, prev 1070051342 516165 > out of order tv_sec: 1070051342 789127, prev 1070050646 939970 > out of order tv_usec: 1070051342 789127, prev 1070050646 939970 > out of order tv_sec: 1070050654 690255, prev 1070051350 216448 > out of order tv_sec: 1070051350 402412, prev 1070050654 690255 > out of order tv_usec: 1070051350 402412, prev 1070050654 690255 > out of order tv_sec: 1070050672 990359, prev 1070051368 416546 > out of order tv_sec: 1070051368 501648, prev 1070050672 990359 > out of order tv_usec: 1070051368 501648, prev 1070050672 990359 > out of order tv_sec: 1070050672 990363, prev 1070051368 516557 > out of order tv_sec: 1070051368 700227, prev 1070050672 990363 > out of order tv_usec: 1070051368 700227, prev 1070050672 990363 > out of order tv_sec: 1070050675 740524, prev 1070051371 216718 > out of order tv_sec: 1070051371 346521, prev 1070050675 740524 > out of order tv_usec: 1070051371 346521, prev 1070050675 740524 > out of order tv_sec: 1070050676 440415, prev 1070051371 916605 > out of order tv_usec: 1070050676 440415, prev 1070051371 916605 > out of order tv_sec: 1070051372 238348, prev 1070050676 440415 > out of order tv_usec: 1070051372 238348, prev 1070050676 440415 > ^C > I've been running this code on a pair of FreeBSD (i386) boxen, for some time now, one of which is a 4.8-STABLE, the other is a 5.2-BETA. In 10 minutes of execution the 4.8 box has only shown the following: out of order tv_sec: 1070063393 115226, prev 1070063391 381087 out of order tv_usec: 1070063393 115226, prev 1070063391 381087 the 5.2 box, hasn't outputed anything. I'm goignt o run this test on some other boxen I have and see what/ifany results are happening. > > regards, tom lane > > > #include <stdio.h> > #include <sys/time.h> > > int > main() > { > struct timeval prevtime; > struct timeval curtime; > > gettimeofday(&prevtime, NULL); > > for (;;) > { > gettimeofday(&curtime, NULL); > > if (curtime.tv_usec < 0 || curtime.tv_usec >= 1000000) > fprintf(stderr, "bogus tv_usec: %ld %ld, prev %ld %ld\n", > (long int) curtime.tv_sec, > (long int) curtime.tv_usec, > (long int) prevtime.tv_sec, > (long int) prevtime.tv_usec); > > if (curtime.tv_sec < prevtime.tv_sec || > curtime.tv_sec > prevtime.tv_sec + 1) > fprintf(stderr, "out of order tv_sec: %ld %ld, prev %ld %ld\n", > (long int) curtime.tv_sec, > (long int) curtime.tv_usec, > (long int) prevtime.tv_sec, > (long int) prevtime.tv_usec); > > if (curtime.tv_usec < prevtime.tv_usec && > curtime.tv_sec != prevtime.tv_sec + 1) > fprintf(stderr, "out of order tv_usec: %ld %ld, prev %ld %ld\n", > (long int) curtime.tv_sec, > (long int) curtime.tv_usec, > (long int) prevtime.tv_sec, > (long int) prevtime.tv_usec); > > prevtime = curtime; > } > > return 0; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
Darcy Buskermolen <darcy@wavefire.com> writes: > On November 28, 2003 12:33 pm, Tom Lane wrote: >> Whoa. Try the following test program. Then send it in to your friendly >> local BSD hackers .... > I've been running this code on a pair of FreeBSD (i386) boxen, for some time > now, one of which is a 4.8-STABLE, the other is a 5.2-BETA. Could it be a hardware problem on Marc's box? Or specific to some other aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for example?) The failure is definitely quite repeatable on pgsql74.hub.org. I don't see it on svr1.postgresql.org, though, which seems to be running almost the same kernel. regards, tom lane
I said: > The failure is definitely quite repeatable on pgsql74.hub.org. I don't > see it on svr1.postgresql.org, though, which seems to be running almost > the same kernel. After looking more closely, I take that back: svr1 is failing too, though not as often: > uname -a FreeBSD svr1.postgresql.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #4: Sat Sep 20 14:41:58 ADT 2003 root@jupiter.hub.org:/usr/obj/usr/src/sys/kernel i386 > ./a.out out of order tv_sec: 1070064182 410497, prev 1070064877 836691 <----------- out of order tv_usec: 1070064182 410497, prev 1070064877 836691 out of order tv_sec: 1070064877 920579, prev 1070064182 410497 out of order tv_sec: 1070064901 126624, prev 1070064899 897160 out of order tv_usec: 1070064901 126624, prev 1070064899 897160 out of order tv_sec: 1070064907 306286, prev 1070064905 481423 out of order tv_usec: 1070064907 306286, prev 1070064905 481423 out of order tv_sec: 1070064218 861047, prev 1070064914 337241 <----------- out of order tv_sec: 1070064914 570717, prev 1070064218 861047 out of order tv_usec: 1070064914 570717, prev 1070064218 861047 out of order tv_sec: 1070064241 411391, prev 1070064936 837585 <----------- out of order tv_usec: 1070064241 411391, prev 1070064936 837585 out of order tv_sec: 1070064937 497925, prev 1070064241 411391 out of order tv_sec: 1070064251 811548, prev 1070064947 337739 <----------- out of order tv_sec: 1070064947 508364, prev 1070064251 811548 out of order tv_usec: 1070064947 508364, prev 1070064251 811548 Maybe it's a 4.9-PRERELEASE bug? regards, tom lane
On November 28, 2003 12:33 pm, Tom Lane wrote: > I said: > > I'm wondering if there's an actual bug in gettimeofday() in this > > platform, such that once in a while it returns a value that's off > > a minute or so ... > > Whoa. Try the following test program. Then send it in to your friendly > local BSD hackers .... > > > uname -a > > FreeBSD pgsql74.hub.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #3: Sat Sep > 20 14:16:48 ADT 2003 root@neptune.hub.org:/usr/obj/usr/src/sys/kernel > i386 > > > ./a.out ----8<------ snip Further to my last email, after 30 minutes of running I was able to get simular behavior off the 5.2 test system. -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
On Fri, 28 Nov 2003, Tom Lane wrote: > Darcy Buskermolen <darcy@wavefire.com> writes: > > On November 28, 2003 12:33 pm, Tom Lane wrote: > >> Whoa. Try the following test program. Then send it in to your friendly > >> local BSD hackers .... > > > I've been running this code on a pair of FreeBSD (i386) boxen, for some time > > now, one of which is a 4.8-STABLE, the other is a 5.2-BETA. > > Could it be a hardware problem on Marc's box? Or specific to some other > aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for > example?) > > The failure is definitely quite repeatable on pgsql74.hub.org. I don't > see it on svr1.postgresql.org, though, which seems to be running almost > the same kernel. On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they are all just a microsecond. On an Intel FreeBSD 3.3 I had one just after starting the program that was 2 secs behind. Then I stopped to restart with a leading under time and I've not seen any since. (4'50" it was running) .529 .2 user/sys time = 0.53 on the Linux system user/sys time = 0.2 on the FreeBSD one Make of that what you will. -- Nigel Andrews
On November 28, 2003 04:19 pm, Tom Lane wrote: > I said: > > The failure is definitely quite repeatable on pgsql74.hub.org. I don't > > see it on svr1.postgresql.org, though, which seems to be running almost > > the same kernel. > > After looking more closely, I take that back: svr1 is failing too, > After 30+ minutes I'm now starting to see the same problems on my 5.2 testbed as well. I'm going to fire this test up on my FreeBSD 3.x and 2.x boxen and see if' it's there too. I can confirm that this not a SMP issue as that it's happeing on both UP and SMP boxen for me. > though not as often: > > uname -a > > FreeBSD svr1.postgresql.org 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #4: Sat > Sep 20 14:41:58 ADT 2003 > root@jupiter.hub.org:/usr/obj/usr/src/sys/kernel i386 > > > ./a.out > > out of order tv_sec: 1070064182 410497, prev 1070064877 836691 > <----------- out of order tv_usec: 1070064182 410497, prev 1070064877 > 836691 > out of order tv_sec: 1070064877 920579, prev 1070064182 410497 > out of order tv_sec: 1070064901 126624, prev 1070064899 897160 > out of order tv_usec: 1070064901 126624, prev 1070064899 897160 > out of order tv_sec: 1070064907 306286, prev 1070064905 481423 > out of order tv_usec: 1070064907 306286, prev 1070064905 481423 > out of order tv_sec: 1070064218 861047, prev 1070064914 337241 > <----------- out of order tv_sec: 1070064914 570717, prev 1070064218 861047 > out of order tv_usec: 1070064914 570717, prev 1070064218 861047 > out of order tv_sec: 1070064241 411391, prev 1070064936 837585 > <----------- out of order tv_usec: 1070064241 411391, prev 1070064936 > 837585 > out of order tv_sec: 1070064937 497925, prev 1070064241 411391 > out of order tv_sec: 1070064251 811548, prev 1070064947 337739 > <----------- out of order tv_sec: 1070064947 508364, prev 1070064251 811548 > out of order tv_usec: 1070064947 508364, prev 1070064251 811548 > > Maybe it's a 4.9-PRERELEASE bug? > > regards, tom lane -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they > are all just a microsecond. What do you mean by "just a microsecond"? Attached is a tightened-up test program that will only complain if the value of gettimeofday goes backward (at all) or forward by more than 10 seconds (adjustable as MAX_SKIP). This should be suitable to run on moderately loaded machines where the test program might occasionally not get dispatched for a few seconds. I still see no failures on my own (non-BSD) machines, fairly frequent failures on pgsql74.hub.org: > ./a.out out of order tv_sec: 1070066031 262048, prev 1070066726 688240 out of order tv_sec: 1070066726 785019, prev 1070066031 262048 out of order tv_sec: 1070066062 62622, prev 1070066757 588814 out of order tv_sec: 1070066757 771848, prev 1070066062 62622 out of order tv_sec: 1070066093 262974, prev 1070066788 689167 out of order tv_sec: 1070066788 777486, prev 1070066093 262974 out of order tv_sec: 1070066114 113410, prev 1070066809 589602 out of order tv_sec: 1070066809 724663, prev 1070066114 113410 out of order tv_sec: 1070066145 113899, prev 1070066840 590097 out of order tv_sec: 1070066840 726558, prev 1070066145 113899 out of order tv_sec: 1070066155 263911, prev 1070066850 690103 out of order tv_sec: 1070066850 781343, prev 1070066155 263911 out of order tv_sec: 1070066176 164307, prev 1070066871 590505 out of order tv_sec: 1070066871 643350, prev 1070066176 164307 out of order tv_sec: 1070066217 264846, prev 1070066912 691039 out of order tv_sec: 1070066912 775989, prev 1070066217 264846 out of order tv_sec: 1070066248 65394, prev 1070066943 591592 out of order tv_sec: 1070066943 773822, prev 1070066248 65394 ^C and rarer failures on cvs.postgresql.org: > ./a.out out of order tv_sec: 1070066099 389427, prev 1070066794 865617 out of order tv_sec: 1070066795 17855, prev 1070066099 389427 out of order tv_sec: 1070066252 541729, prev 1070066947 967921 out of order tv_sec: 1070066948 38715, prev 1070066252 541729 out of order tv_sec: 1070066371 393525, prev 1070067066 869715 out of order tv_sec: 1070067068 24754, prev 1070066371 393525 ^C It seems consistent that the error is 695 seconds and change when it happens, and that the very next read gives a correct (or at least plausible) value again. regards, tom lane #include <stdio.h> #include <sys/time.h> #define MAX_SKIP 10 int main() {struct timeval prevtime;struct timeval curtime; gettimeofday(&prevtime, NULL); for (;;){ gettimeofday(&curtime, NULL); if (curtime.tv_usec < 0 || curtime.tv_usec >= 1000000) fprintf(stderr, "bogus tv_usec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); else if (curtime.tv_sec < prevtime.tv_sec || curtime.tv_sec > prevtime.tv_sec + MAX_SKIP) fprintf(stderr, "out of order tv_sec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); else if (curtime.tv_usec < prevtime.tv_usec && curtime.tv_sec == prevtime.tv_sec) fprintf(stderr, "out of order tv_usec: %ld %ld, prev %ld %ld\n", (long int) curtime.tv_sec, (long int) curtime.tv_usec, (long int) prevtime.tv_sec, (long int) prevtime.tv_usec); prevtime = curtime;} return 0; }
On Fri, 28 Nov 2003, Tom Lane wrote: > "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they > > are all just a microsecond. > > What do you mean by "just a microsecond"? I mean it's always a "out of order tv_usec..." line and the difference is 1us. That is a.out gives: out of order tv_usec: 1070065862 374978, prev 1070065862 374979 out of order tv_usec: 1070065867 814300, prev 1070065867 814301 out of order tv_usec: 1070065868 794176, prev 1070065868 794177 out of order tv_usec: 1070065871 553831, prev 1070065871 553832 etc. > > Attached is a tightened-up test program that will only complain if the > value of gettimeofday goes backward (at all) or forward by more than > 10 seconds (adjustable as MAX_SKIP). This should be suitable to run on > moderately loaded machines where the test program might occasionally not > get dispatched for a few seconds. I don't think that would show any for me. I've only got two skips showing from a longer run on my FreeBSD 3.3 and those differ in the seconds counter by only 2. Time between the occurances is about 1100s. -- Nigel
Darcy Buskermolen <darcy@wavefire.com> writes: > After 30+ minutes I'm now starting to see the same problems on my 5.2 testbed > as well. I'm going to fire this test up on my FreeBSD 3.x and 2.x boxen and > see if' it's there too. I can confirm that this not a SMP issue as that it's > happeing on both UP and SMP boxen for me. Good, that's one variable eliminated. Looking at my own data, I notice that when the error happens, the elapsed time shown between the immediately preceding and following okay-looking timestamps is always significant (at least 100 msec and often a second or more). That is kind of a lot for a tight loop containing one simple kernel call, no? I am suspicious that the failure occurs when gettimeofday() is called just as the process is losing control of the CPU (due to using up its timeslice or whatever). When control eventually returns, the process gets a reading that is neither pre-loss-of-CPU nor post-regain-of-CPU, but some unholy combination that nets out as a time about 15 min in the past. Just a theory, but it fits some of the available facts. Can anyone think of a significant interpretation for the number 695 seconds? That's got to be an important clue ... regards, tom lane
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > On Fri, 28 Nov 2003, Tom Lane wrote: >> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they > are all just a microsecond. >> >> What do you mean by "just a microsecond"? > I mean it's always a "out of order tv_usec..." line and the difference is > 1us. That is a.out gives: > out of order tv_usec: 1070065862 374978, prev 1070065862 374979 > out of order tv_usec: 1070065867 814300, prev 1070065867 814301 Fascinating. I'd call that a bug too, but evidently one with a different mechanism than the BSD issue we are chasing. FWIW, I have not seen any failures in a fair amount of runtime on a 2.4.18 (Red Hat 8.0) kernel here, running on a Dell P4. What is the hardware platform you're using? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Whoa. Try the following test program. I tried this on two uniprocessor x86 machines: (1) $ uname -a Linux tokyo 2.6.0-test10 #3 Mon Nov 24 13:43:54 EST 2003 i686 GNU/Linux This machine produces this output, after running the test app for about 30 minutes: out of order tv_sec: 1070067959 929846, prev 1070067957 813066 out of order tv_usec: 1070068017 80145, prev 1070068017 80384 out of order tv_usec: 1070068017 104138, prev 1070068017 104368 (2) $ uname -a FreeBSD home.samurai.com 4.9-RELEASE FreeBSD 4.9-RELEASE #0: Thu Nov 6 21:25:41 EST 2003 bryanf@joe.samurai.com:/usr/obj/usr/src/sys/SAMURAI i386 This machine produces no output after running the test app for about 30 minutes. -Neil
On Fri, 28 Nov 2003, Tom Lane wrote: > "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > On Fri, 28 Nov 2003, Tom Lane wrote: > >> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > On an Intel Linux 2.4.18 I get them quite often, 25 in 1'45", but they > > are all just a microsecond. > >> > >> What do you mean by "just a microsecond"? > > > I mean it's always a "out of order tv_usec..." line and the difference is > > 1us. That is a.out gives: > > > out of order tv_usec: 1070065862 374978, prev 1070065862 374979 > > out of order tv_usec: 1070065867 814300, prev 1070065867 814301 > > Fascinating. I'd call that a bug too, but evidently one with a > different mechanism than the BSD issue we are chasing. > > FWIW, I have not seen any failures in a fair amount of runtime on > a 2.4.18 (Red Hat 8.0) kernel here, running on a Dell P4. What is > the hardware platform you're using? Ah, I have made a mistake. It's only a 2.2.18 kernal. Dual SMP P-III, perhaps that's the issue there. And on the FreeBSD system I've got this: $ time ./a.out 2>&1 | tee a.txt out of order tv_sec: 1070066197 273140, prev 1070066195 721010 out of order tv_usec: 1070066197 273140, prev 1070066195 721010 out of order tv_sec: 1070067322 116061, prev 1070067320 440490 out of order tv_usec: 1070067322 116061, prev 1070067320 440490 out of order tv_sec: 1070067833 514969, prev 1070067831 755019 out of order tv_usec: 1070067833 514969, prev 1070067831 755019 ^C real 38m53.026s user 6m13.953s sys 32m6.589s So not very often there. -- Nigel
On Fri, 28 Nov 2003, Darcy Buskermolen wrote: > I've been running this code on a pair of FreeBSD (i386) boxen, for some time > now, one of which is a 4.8-STABLE, the other is a 5.2-BETA. > > In 10 minutes of execution the 4.8 box has only shown the following: > > out of order tv_sec: 1070063393 115226, prev 1070063391 381087 > out of order tv_usec: 1070063393 115226, prev 1070063391 381087 I'm suspecting that the issue is amplified by load on the server itself ... > the 5.2 box, hasn't outputed anything. I'm goignt o run this test on > some other boxen I have and see what/ifany results are happening. Similar results here with a 5.2 box ...
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > Ah, I have made a mistake. It's only a 2.2.18 kernal. Dual SMP P-III, perhaps > that's the issue there. Hm. I seem to recall there were still lots of SMP bugs in the 2.2.* kernels. > And on the FreeBSD system I've got this: > $ time ./a.out 2>&1 | tee a.txt > out of order tv_sec: 1070066197 273140, prev 1070066195 721010 > out of order tv_usec: 1070066197 273140, prev 1070066195 721010 > out of order tv_sec: 1070067322 116061, prev 1070067320 440490 > out of order tv_usec: 1070067322 116061, prev 1070067320 440490 > out of order tv_sec: 1070067833 514969, prev 1070067831 755019 > out of order tv_usec: 1070067833 514969, prev 1070067831 755019 > ^C AFAICT the above is a legal trace, indicating only that the test program sometimes lost control for more than a second at a time. The "revised" version of the test will not complain about this. Time going backwards by even one microsecond, however, is generally considered Bad News, unless you're actively manipulating the system clock setting. One variable I didn't think to ask about is whether you are running NTP. In my experience an ntp daemon that has achieved lock will never step the clock back by even 1 usec (it's supposed to use much more subtle methods than that to manage the clock ;-)) but maybe under unstable conditions such things could happen. The machines I have tested here all run NTP. regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > Could it be a hardware problem on Marc's box? Or specific to some other > aspect of that installation (Marc, is pgsql74.hub.org multi-CPU, for > example?) All the servers are SMP ... pgsql74 is on a Dual-Xeon ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > I'm suspecting that the issue is amplified by load on the server itself That would fit right in with my idea that the failure occurs when the process loses its timeslice partway through gettimeofday(). Heavier load would presumably make it more likely to lose the CPU. BTW, I can't get top(1) to work on either pgsql74 or svr1, any idea why? regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > I'm suspecting that the issue is amplified by load on the server itself > > That would fit right in with my idea that the failure occurs when the > process loses its timeslice partway through gettimeofday(). Heavier > load would presumably make it more likely to lose the CPU. I'm posting to the freebsd lists about it, see if they have anything to say ... > > BTW, I can't get top(1) to work on either pgsql74 or svr1, any idea why? top is one of those things that do not work within a VM ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Fri, 28 Nov 2003, Tom Lane wrote: > "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > > $ time ./a.out 2>&1 | tee a.txt > > out of order tv_sec: 1070066197 273140, prev 1070066195 721010 > > out of order tv_usec: 1070066197 273140, prev 1070066195 721010 > > out of order tv_sec: 1070067322 116061, prev 1070067320 440490 > > out of order tv_usec: 1070067322 116061, prev 1070067320 440490 > > out of order tv_sec: 1070067833 514969, prev 1070067831 755019 > > out of order tv_usec: 1070067833 514969, prev 1070067831 755019 > > ^C > > AFAICT the above is a legal trace, indicating only that the test program > sometimes lost control for more than a second at a time. The "revised" > version of the test will not complain about this. Doh! I was looking for time going backwards so that's what I saw, even though I already new the same complaint would be made if time skipped forward a second or so. > One variable I didn't think to ask about is whether you are running > NTP. In my experience an ntp daemon that has achieved lock will never > step the clock back by even 1 usec (it's supposed to use much more > subtle methods than that to manage the clock ;-)) but maybe under > unstable conditions such things could happen. The machines I have > tested here all run NTP. Yes, all NTPed. The FreeBSD system, the one that doesn't skip back *smacks forehead*, is the local time server but is generally lightly loaded. Not to say that there's not going to be the odd job it has to run which could make the system busy and time time away from this test. It's just difficult to say that every so often there's some task that makes the system unresponsive for a couple of seconds, which is what this is saying. Anyway, I like your suggestion as the timeslicing being a possible cause. Seems a little strange though, presumably all the numbers going into the timeofday calc are gathered by the one system call at a place past where the thread could have had it's time sliced away and before the next. However, I have no idea how the BSD kernel operates so can't do more than assume and guess. -- Nigel
On Fri, 28 Nov 2003, Marc G. Fournier wrote: > On Fri, 28 Nov 2003, Tom Lane wrote: > > > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > > I'm suspecting that the issue is amplified by load on the server itself > > > > That would fit right in with my idea that the failure occurs when the > > process loses its timeslice partway through gettimeofday(). Heavier > > load would presumably make it more likely to lose the CPU. > > I'm posting to the freebsd lists about it, see if they have anything to > say ... Just got my first response on this, and, in fact, this is 'acceptable behaviour' to a certain extent ... there is a kernel tuneable called kern.timecounter.method, that is either 1 or 0: 0 sacrifice precision in favor of faster code execution 1 return the more precise time (the same as calling microtime() or nanotime()). The intent of the getmicrotime() and getnanotime() functions is to enforce the user's preference for timer accuracyversus execution time. Now, reading the man page, there seems to be a 'microtime' vs 'getmicrotime' function ... these might be preferable to using gettimeofday on FreeBSD? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > Just got my first response on this, and, in fact, this is 'acceptable > behaviour' to a certain extent ... there is a kernel tuneable called > kern.timecounter.method, that is either 1 or 0: 15 minutes of error is acceptable? regards, tom lane
On Fri, 28 Nov 2003, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > Just got my first response on this, and, in fact, this is 'acceptable > > behaviour' to a certain extent ... there is a kernel tuneable called > > kern.timecounter.method, that is either 1 or 0: > > 15 minutes of error is acceptable? Apparently *shrug* That was just the first answer back though, so we'll see what happens ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: >> 15 minutes of error is acceptable? > Apparently *shrug* That was just the first answer back though, so we'll > see what happens ... BTW, it seems from this thread that the BIND guys have seen even more bogus behavior than this from gettimeofday() on BSD. Their attitude is clearly "BSD is broken": http://www.isc.org/ml-archives/bind-users/2001/01/msg01246.html regards, tom lane
On Fri, 28 Nov 2003, Neil Conway wrote: > $ uname -a > FreeBSD home.samurai.com 4.9-RELEASE FreeBSD 4.9-RELEASE #0: Thu Nov 6 21:25:41 EST 2003 bryanf@joe.samurai.com:/usr/obj/usr/src/sys/SAMURAI i386 > > This machine produces no output after running the test app for about > 30 minutes. what is kern.timecounter.method set to? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > On Fri, 28 Nov 2003, Neil Conway wrote: >> $ uname -a >> FreeBSD home.samurai.com 4.9-RELEASE [...] > > what is kern.timecounter.method set to? $ sysctl kern.timecounter.method kern.timecounter.method: 0 FYI, I ran the test program for another 30 minutes (an hour in total) without any output. -Neil
On Fri, 28 Nov 2003, Neil Conway wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > On Fri, 28 Nov 2003, Neil Conway wrote: > >> $ uname -a > >> FreeBSD home.samurai.com 4.9-RELEASE [...] > > > > what is kern.timecounter.method set to? > > $ sysctl kern.timecounter.method > kern.timecounter.method: 0 > > FYI, I ran the test program for another 30 minutes (an hour in total) > without any output. Great, I love consistent results :) Some have 5.x that do it, some don't ... 4.x does it on all my machines, but not yours ... What kinda processor are you running? Intel or AMD? Mine are all Intel ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > What kinda processor are you running? From dmesg: Timecounter "i8254" frequency 1193182 Hz CPU: Intel Celeron (631.29-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x686 Stepping = 6 Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE> -Neil
I ran Tom's second script for two hours on an AMD Duron on Linux 2.4.18. No problems. Gavin
On Fri, Nov 28, 2003 at 08:22:30PM -0500, Tom Lane wrote: > > One variable I didn't think to ask about is whether you are running > NTP. In my experience an ntp daemon that has achieved lock will never > step the clock back by even 1 usec (it's supposed to use much more > subtle methods than that to manage the clock ;-)) but maybe under > unstable conditions such things could happen. The machines I have > tested here all run NTP. ntpd will slew the clock in case the difference is small enough. Which means it will tell the kernel to slow down time a little. In case the offset is too large (0.128 seconds) it will tell the kernel to step the clock. ntpd also has an option to "always slew". But it's broken on atleast Linux 2.4. Note that the kernel uses values from different places to calculate what the current time is, and they're not all running on the same frequency. I wouldn't worry about a 1 us step back. Kurt