Thread: transforms vs. CLOBBER_CACHE_ALWAYS
friarbird is a FreeBSD buildfarm animal running with -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours. However, it's been stuck since Monday running the plpython regression tests. The only relevant commit seems to be the transforms feature. Here's what it's been doing:

pl_regression=# select * from pg_stat_activity where application_name = 'pg_regress';
-[ RECORD 1 ]----+------------------------------
datid            | 27438
datname          | pl_regression
pid              | 15434
usesysid         | 10
usename          | buildfarm
application_name | pg_regress
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2015-04-27 05:51:12.689281-04
xact_start       | 2015-04-27 05:51:28.324329-04
query_start      | 2015-04-27 05:51:28.324329-04
state_change     | 2015-04-27 05:51:28.324341-04
waiting          | f
state            | active
backend_xid      |
backend_xmin     | 5540
query            | SELECT cursor_plan();

I imagine it was in some sort of infinite loop. gdb says it's all in src/backend/utils/cache/plancache.c, although not the same line each time I run it.

I ended up killing it accidentally, but I hope this helps narrow the problem down. The buildfarm server rejected the report because the snapshot was so old, but the relevant report is attached.

cheers

andrew
Attachment
* Andrew Dunstan:
> friarbird is a FreeBSD buildfarm animal running with
> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
> However, it's been stuck since Monday running the plpython regression
> tests. The only relevant commit seems to be the transforms feature.
> Here's what it's been doing:

> query | SELECT cursor_plan();

Same here, on jaguarundi. I actually killed it intentionally this morning, hoping that whatever the problem was might have been fixed already. No such luck.

I would suspect that it might have something to do with the OS, if all the other CCA animals weren't lining up nicely behind in the buildfarm status page.

> I imagine it was in some sort of infinite loop. gdb says it's all in
> src/backend/utils/cache/plancache.c, although not the same line each
> time I run it.

I ktrace'd it this morning, but cleverly did not keep the dump. It looked much the same to me, though: it was reading the same filenode over and over again.

--
Christian
On 4/30/15 2:49 PM, Andrew Dunstan wrote:
> friarbird is a FreeBSD buildfarm animal running with
> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
> However, it's been stuck since Monday running the plpython regression
> tests. The only relevant commit seems to be the transforms feature.

I can reproduce it. I'll look into it.
On 04/30/2015 09:09 PM, Christian Ullrich wrote:
> * Andrew Dunstan:
>
>> friarbird is a FreeBSD buildfarm animal running with
>> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
>> However, it's been stuck since Monday running the plpython regression
>> tests. The only relevant commit seems to be the transforms feature.
>> Here's what it's been doing:
>
>> query | SELECT cursor_plan();
>
> Same here, on jaguarundi. I actually killed it intentionally this
> morning, hoping that whatever the problem was might have been fixed
> already. No such luck.
>
> I would suspect that it might have something to do with the OS, if all
> the other CCA animals weren't lining up nicely behind in the buildfarm
> status page.
>
>> I imagine it was in some sort of infinite loop. gdb says it's all in
>> src/backend/utils/cache/plancache.c, although not the same line each
>> time I run it.
>
> I ktrace'd it this morning, but cleverly did not keep the dump. It
> looked much the same to me, though, it was reading the same filenode
> over and over again.

Yeah, this happened again this morning, so it seems to be quite reliably reproducible. I killed it and I've set friarbird to build without python for now, but this is clearly an issue that needs to be resolved.

Side thought: maybe we need some sort of timeout mechanism for the buildfarm to try to stop it from hanging. There is actually some timeout code in there from back in the CVS days, when CVS would occasionally hang. It could be adapted to time out other steps.

cheers

andrew
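Andrew's side thought about a per-step timeout could look roughly like the following. This is a minimal sketch only, assuming a POSIX host (friarbird and jaguarundi are FreeBSD); the buildfarm client itself is written in Perl, and `run_step` is a hypothetical helper, not the client's existing CVS-era timeout code. The child is started in its own session so that, on timeout, its whole process group can be killed rather than only the direct child; processes that move themselves into another session or process group would escape the kill.

```python
import os
import signal
import subprocess


def run_step(cmd, timeout_secs):
    """Run one build step, killing it if it runs longer than timeout_secs.

    Hypothetical helper for illustration. Returns (status, output), where
    status is "ok", "failed", or "timeout". POSIX only.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so os.killpg() below also reaches grandchildren in that group.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        # Reap the child and collect whatever output it produced.
        out, _ = proc.communicate()
        return "timeout", out
    return ("ok" if proc.returncode == 0 else "failed"), out
```

For example, `run_step(["make", "check"], 4 * 3600)` would report `"timeout"` instead of hanging indefinitely. The limit there is an arbitrary placeholder; on CLOBBER_CACHE_ALWAYS animals, where a normal run already takes hours, it would have to be generous.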
On 05/01/2015 08:57 AM, Andrew Dunstan wrote:
>
> On 04/30/2015 09:09 PM, Christian Ullrich wrote:
>> * Andrew Dunstan:
>>
>>> friarbird is a FreeBSD buildfarm animal running with
>>> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
>>> However, it's been stuck since Monday running the plpython regression
>>> tests. The only relevant commit seems to be the transforms feature.
>>> Here's what it's been doing:
>>
>>> query | SELECT cursor_plan();
>>
>> Same here, on jaguarundi. I actually killed it intentionally this
>> morning, hoping that whatever the problem was might have been fixed
>> already. No such luck.
>>
>> I would suspect that it might have something to do with the OS, if
>> all the other CCA animals weren't lining up nicely behind in the
>> buildfarm status page.
>
> Yeah, this happened again this morning, so it seems to be quite
> reliably reproducible. I killed it and I've set friarbird to build
> without python for now, but this is clearly an issue that needs to be
> resolved.

And I have confirmed that it's not an OS problem: I have reproduced it on a modern Linux instance (it's still running, in fact). So it's quite clearly a bug that needs to be fixed.

cheers

andrew
* Peter Eisentraut wrote:
> On 4/30/15 2:49 PM, Andrew Dunstan wrote:
>> friarbird is a FreeBSD buildfarm animal running with
>> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
>> However, it's been stuck since Monday running the plpython regression
>> tests. The only relevant commit seems to be the transforms feature.
>
> I can reproduce it. I'll look into it.

I looked over the CCA animals and noticed that pademelon and gaur are apparently unaffected; what they have in common is the OS (HP-UX) and the Python version (2.5).

There's nothing I can do about OS-related differences, but I thought I'd check the Python angle. With Python 2.5.6, jaguarundi locks up on the plpython tests just the same as with 3.4, and so does friarbird with 2.7. So that is not the culprit, either.

I ran make check by hand and noticed three tests where it seemed to hang (I gave it at least three minutes each, and the functions in the queries are simple):

plpython_spi        SELECT cursor_plan();
plpython_setof      SELECT test_setof_as_list(1, 'list');
plpython_composite  SELECT multiout_simple_setof();

These are the only plpython tests that mention SETOF at all, and the queries that hung are the first ones in their respective tests to actually build a set. Does that help?

--
Christian
Christian Ullrich <chris@chrullrich.net> writes:
> * Peter Eisentraut wrote:
>> On 4/30/15 2:49 PM, Andrew Dunstan wrote:
>>> friarbird is a FreeBSD buildfarm animal running with
>>> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
>>> However, it's been stuck since Monday running the plpython regression
>>> tests. The only relevant commit seems to be the transforms feature.
>> I can reproduce it. I'll look into it.

> I looked over the CCA animals and noticed that pademelon and gaur are
> apparently unaffected;

pademelon and gaur do not run CCA (if they did, it would take weeks for a run to complete :-().

regards, tom lane
* Tom Lane wrote:
> Christian Ullrich <chris@chrullrich.net> writes:
>> * Peter Eisentraut wrote:
>>> On 4/30/15 2:49 PM, Andrew Dunstan wrote:
>>>> friarbird is a FreeBSD buildfarm animal running with
>>>> -DCLOBBER_CACHE_ALWAYS. It usually completes a run in about 6.5 hours.
>>>> However, it's been stuck since Monday running the plpython regression
>>>> tests. The only relevant commit seems to be the transforms feature.
>
>>> I can reproduce it. I'll look into it.
>
>> I looked over the CCA animals and noticed that pademelon and gaur are
>> apparently unaffected;
>
> pademelon and gaur do not run CCA (if they did, it would take weeks for
> a run to complete :-().

Ah. I obviously had associated the "note" icon with CCA. Sorry.

OTOH, if I had noticed that, I might not have gone into more detail ...

--
Christian