Thread: planner fails on HEAD
Hello I have a relative simple query SELECT q.object_id FROM queue q JOIN outgoing.cps_forms f ON f.id = q.object_id AND q.object_type= 'cp' JOIN flat_file_ex fe ON fe.id = q.rejected_flat_file_id WHERE q.rejected_result= 'ACTV'; The planner fails on this query #0 0x00cf7424 in __kernel_vsyscall () #1 0x004752f1 in raise () from /lib/libc.so.6 #2 0x00476d5e in abort () from /lib/libc.so.6 #3 0x083a1dfe in ExceptionalCondition (conditionName=0x8505474 "!(innerstartsel <= innerendsel)", errorType=0x83db178 "FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937) at assert.c:57 #4 0x08244cea in cost_mergejoin (path=0x93acdd4, root=0x93935d4, sjinfo=0xbfbc9504) at costsize.c:1937 #5 0x0826f859 in create_mergejoin_path (root=0x93935d4, joinrel=0x93aad80, jointype=JOIN_INNER, sjinfo=0xbfbc9504, outer_path=0x93ac0f8, inner_path=0x93ac080, restrict_clauses=0x93acce0, pathkeys=0x0, mergeclauses=0x93adcb4, outersortkeys=0x93adc98, innersortkeys=0x93adcd0) at pathnode.c:1576 #6 0x0824cee4 in sort_inner_and_outer (root=0x93935d4, joinrel=0x93aad80, outerrel=0x93a9a20, innerrel=0x9393e04, jointype=JOIN_INNER, sjinfo=0xbfbc9504, restrictlist=0x93acce0) at joinpath.c:306 #7 add_paths_to_joinrel (root=0x93935d4, joinrel=0x93aad80, outerrel=0x93a9a20, innerrel=0x9393e04, jointype=JOIN_INNER, sjinfo=0xbfbc9504, restrictlist=0x93acce0) at joinpath.c:103 #8 0x0824ea12 in make_join_rel (root=0x93935d4, rel1=0x9393e04, rel2=0x93a9a20) at joinrels.c:733 #9 0x0824ee48 in make_rels_by_clause_joins (root=0x93935d4, level=2) at joinrels.c:268 #10 join_search_one_level (root=0x93935d4, level=2) at joinrels.c:99 #11 0x082410bf in standard_join_search (root=0x93935d4, levels_needed=3, initial_rels=0x93ac998) at allpaths.c:1127 #12 0x082412cf in make_rel_from_joinlist (root=0x93935d4, joinlist=<value optimized out>) at allpaths.c:1058 #13 0x08241390 in make_one_rel (root=0x93935d4, joinlist=0x93aad64) at allpaths.c:103 #14 0x082593d0 in query_planner (root=0x93935d4, tlist=0x939f740, tuple_fraction=0, limit_tuples=-1, cheapest_path=0xbfbc98dc, sorted_path=0xbfbc98d8, num_groups=0xbfbc98d0) at planmain.c:259 #15 0x0825b24d in grouping_planner (root=0x93935d4, tuple_fraction=0) at planner.c:1240 #16 0x0825cfbd in subquery_planner (glob=0x939f37c, parse=0x9370b08, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0, subroot=0xbfbc9a7c) at planner.c:524 #17 0x0825d8dd in standard_planner (parse=0x9370b08, cursorOptions=0, boundParams=0x0) at planner.c:196 #18 0x082d1b2e in pg_plan_query (querytree=0x9370b08, cursorOptions=0, boundParams=0x0) at postgres.c:720 #19 0x082d1c33 in pg_plan_queries (querytrees=0x939f360, cursorOptions=0, boundParams=0x0) at postgres.c:779 #20 0x082d26fc in exec_simple_query (argc=2, argv=0x92f95a4, username=0x92f94a0 "pavel") at postgres.c:944 #21 PostgresMain (argc=2, argv=0x92f95a4, username=0x92f94a0 "pavel") at postgres.c:3859 #22 0x082844ae in BackendRun (port=0x9316600) at postmaster.c:3587 #23 BackendStartup (port=0x9316600) at postmaster.c:3272 #24 0x08284b58 in ServerLoop () at postmaster.c:1350 #25 0x082856f3 in PostmasterMain (argc=3, argv=0x92f8308) at postmaster.c:1110 #26 0x0821cd00 in main (argc=3, argv=0x92f8308) at main.c:199 with little bit modified query planner does ohs=# explain SELECT q.object_id FROM queue q JOIN outgoing.cps_forms f ON f.id = q.object_idAND q.object_type = 'cp' JOIN flat_file_ex fe ON fe.id = q.rejected_flat_file_id WHERE q.rejected_result = 'ACTVa'; QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────Nested Loop (cost=0.00..154.05rows=1 width=4) -> Nested Loop (cost=0.00..145.77 rows=1 width=8) -> Seq Scan on queue q (cost=0.00..137.49rows=1 width=8) Filter: ((object_type = 'cp'::bpchar) AND (rejected_result = 'ACTVa'::bpchar)) -> Index Only Scan using cps_forms_pkey on cps_forms f (cost=0.00..8.27 rows=1 width=4) Index Cond: (id = q.object_id) -> Index Only Scan using flat_file_ex_pkeyon flat_file_ex fe (cost=0.00..8.27 rows=1 width=4) Index Cond: (id = q.rejected_flat_file_id) (8 rows) Data and necessary indexes should be correct ohs=# \dt+ List of relationsSchema │ Name │ Type │ Owner │ Size │ Description ────────┼───────────────────────┼───────┼──────────┼─────────┼─────────────public │ cps_form │ table │ postgres│ 48 kB │public │ flat_file_ex │ table │ pavel │ 2632 kB │public │ np_form │ table │pavel │ 432 kB │public │ np_return_number_form │ table │ pavel │ 48 kB │public │ queue │ table│ pavel │ 568 kB │ (5 rows) ohs=# \dt+ outgoing.* List of relations Schema │ Name │ Type │ Owner │ Size │ Description ──────────┼───────────────────────┼───────┼─────────────┼────────────┼─────────────outgoing │ comlog │ table│ dialtelecom │ 8192 bytes │outgoing │ cps_forms │ table │ dialtelecom │ 40 kB │outgoing │ cps_forms_hist │ table │ dialtelecom │ 72 kB │outgoing │ flatfiles │ table │ dialtelecom │ 1328 kB │outgoing │ flatfiles_hist │ table │ dialtelecom │ 8192 bytes │outgoing │ log │ table │ dialtelecom│ 8192 bytes │outgoing │ np_forms │ table │ dialtelecom │ 400 kB │outgoing │ np_forms_hist │ table │ dialtelecom │ 560 kB │outgoing │ opids_soap_properties │ table │ dialtelecom │ 16 kB │outgoing │order_numbers │ table │ dialtelecom │ 0 bytes │ (10 rows) ohs=# select version(); version ──────────────────────────────────────────────────────────────────────────────────────────────────────────PostgreSQL 9.2develon i686-pc-linux-gnu, compiled by gcc (GCC) 4.5.1 20100924 (Red Hat 4.5.1-4), 32-bit (1 row) [pavel@nemesis ohs]$ uname -a Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 UTC 2011 i686 i686 i386 GNU/Linux I am able to send data by some private stream Regards Pavel Stehule
a plan for modified query is ohs=# explain analyze SELECT object_id, inserted, 'ASSIGN_RSLT', order_id, 2, seqnum, rejected_flat_file_id, true FROM ( SELECT q.object_id, fe.inserted, q.order_id, q.seqnum, q.rejected_flat_file_id, q.rejected_result FROM queue q JOIN outgoing.cps_forms f ON f.id = q.object_id ANDq.object_type = 'cp' JOIN flat_file_ex fe ON fe.id = q.rejected_flat_file_id offset 0) x WHERE rejected_result = 'ACTV'; QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────Subquery Scanon x (cost=11.68..192.72 rows=1 width=24) (actual time=1.748..12.398 rows=139 loops=1) Filter: (x.rejected_result = 'ACTV'::bpchar) Rows Removed by Filter: 17 -> Limit (cost=11.68..192.65 rows=6 width=29) (actual time=1.739..11.655 rows=156 loops=1) -> Nested Loop (cost=11.68..192.65 rows=6 width=29) (actual time=1.732..11.036 rows=156 loops=1) -> Hash Join (cost=11.68..138.77 rows=15 width=21) (actual time=1.459..6.987 rows=186 loops=1) Hash Cond: (q.object_id = f.id) -> SeqScan on queue q (cost=0.00..126.24 rows=186 width=21) (actual time=0.032..4.658 rows=186 loops=1) Filter: (object_type = 'cp'::bpchar) Rows Removed by Filter: 4313 -> Hash (cost=9.08..9.08 rows=208width=4) (actual time=1.402..1.402 rows=208 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 5kB -> Seq Scan on cps_forms f (cost=0.00..9.08 rows=208 width=4) (actual time=0.008..0.576 rows=208 loops=1) -> Index Scan using flat_file_ex_pkey on flat_file_ex fe (cost=0.00..3.58 rows=1 width=12) (actual time=0.008..0.010 rows=1 loops=186) Index Cond: (id = q.rejected_flat_file_id)Total runtime: 12.846 ms (16 rows)
Pavel Stehule <pavel.stehule@gmail.com> writes: > #3 0x083a1dfe in ExceptionalCondition (conditionName=0x8505474 > "!(innerstartsel <= innerendsel)", errorType=0x83db178 > "FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937) > at assert.c:57 [ scratches head ... ] Given that it got past the previous assertions, surely that ought to be impossible. Could we see the values of cost_mergejoin's local variables, please? regards, tom lane
2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> #3 0x083a1dfe in ExceptionalCondition (conditionName=0x8505474 >> "!(innerstartsel <= innerendsel)", errorType=0x83db178 >> "FailedAssertion", fileName=0x8505140 "costsize.c", lineNumber=1937) >> at assert.c:57 > > [ scratches head ... ] Given that it got past the previous assertions, > surely that ought to be impossible. Could we see the values of > cost_mergejoin's local variables, please? It is strange when I put a fprintf(stderr, "const literal") to exactly before or somewhere after assertion, then assertion is ok. Without fprintf assertion fails again it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was configured just with --enable-debug and --enable-cassert when I put elog before calculation outerstartsel, innerstartsel,outerendsel and innerendsel then it fails the output is (last elog result) outer_skip_rows: 0.0 inner_skip_rows: 1.0 outer_rows: 208.000 inner_rows: 1.00000 when I append elog to show selectivity, then it work again - related selectivity is outerstartsel: 0.000 outerendsel: 1.000 innerstartsel: 0.166667 innerendsel: 0.166667 Regards Pavel Stehule > > regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: >> [ scratches head ... ] Given that it got past the previous assertions, >> surely that ought to be impossible. Could we see the values of >> cost_mergejoin's local variables, please? > It is strange > when I put a fprintf(stderr, "const literal") to exactly before or > somewhere after assertion, then assertion is ok. Without fprintf > assertion fails again > it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was > configured just with --enable-debug and --enable-cassert Hmm. I'm betting that gcc has flushed one value to memory but the other one is still in a register that's wider than memory, creating a roundoff hazard. Can you look at the generated assembly code? regards, tom lane
2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: >>> [ scratches head ... ] Given that it got past the previous assertions, >>> surely that ought to be impossible. Could we see the values of >>> cost_mergejoin's local variables, please? > >> It is strange > >> when I put a fprintf(stderr, "const literal") to exactly before or >> somewhere after assertion, then assertion is ok. Without fprintf >> assertion fails again > >> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was >> configured just with --enable-debug and --enable-cassert > > Hmm. I'm betting that gcc has flushed one value to memory but the other > one is still in a register that's wider than memory, creating a roundoff > hazard. Can you look at the generated assembly code? I can, but tomorrow evening, I'll send a code Regards Pavel > > regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was > configured just with --enable-debug and --enable-cassert Is this x86? I can't reproduce it on x86_64. It's fairly easy to get a set of values such that innerstartsel *should* equal innerendsel; but if one value has been rounded to memory precision and the other hasn't, the assert could certainly fail. Some digging around yields the information that the gcc hackers do not consider this a bug, or at least adamantly refuse to do anything about it: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 Comment 47 is particularly relevant to our situation: To summarize, this defect effectively states that:assert( (x/y) == (x/y) )may cause an assertion if compiled with optimization. Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4 indicates that an explicit cast to double should help. Would you check if the problem goes away if the Asserts are changed to Assert((double) outerstartsel <= (double) outerendsel);Assert((double) innerstartsel <= (double) innerendsel); regards, tom lane
On Sun, Dec 4, 2011 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was >> configured just with --enable-debug and --enable-cassert > > Is this x86? I can't reproduce it on x86_64. reading all the comments in the gcc bug report, this is because x86 targets the x87 fpu by default which is where the bug is -- it's a hardware problem. x86_64 targets sse which has stricter standards for rounding. most x86 processors support sse -- is there a reason why we don't target sse? merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Sun, Dec 4, 2011 at 4:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Pavel Stehule <pavel.stehule@gmail.com> writes: > it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was > configured just with --enable-debug and --enable-cassert >> >> Is this x86? �I can't reproduce it on x86_64. > reading all the comments in the gcc bug report, this is because x86 > targets the x87 fpu by default which is where the bug is -- it's a > hardware problem. x86_64 targets sse which has stricter standards for > rounding. most x86 processors support sse -- is there a reason why we > don't target sse? Well, older machines won't have sse, and in any case I think x86 is not the only architecture with the issue, just the most popular one. Floating-point registers that are wider than standard double are hardly an unusual idea. regards, tom lane
Hello 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was >> configured just with --enable-debug and --enable-cassert > > Is this x86? I can't reproduce it on x86_64. > yes, this is x86 platform uname -a Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 UTC 2011 i686 i686 i386 GNU/Linux [pavel@nemesis ~]$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz stepping : 11 cpu MHz : 800.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4785.76 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz stepping : 11 cpu MHz : 800.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4786.60 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: it is Dell latitude D830 > It's fairly easy to get a set of values such that innerstartsel *should* > equal innerendsel; but if one value has been rounded to memory precision > and the other hasn't, the assert could certainly fail. > > Some digging around yields the information that the gcc hackers do not > consider this a bug, or at least adamantly refuse to do anything about it: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 > Comment 47 is particularly relevant to our situation: > > To summarize, this defect effectively states that: > assert( (x/y) == (x/y) ) > may cause an assertion if compiled with optimization. > > Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4 > indicates that an explicit cast to double should help. Would > you check if the problem goes away if the Asserts are changed to > > Assert((double) outerstartsel <= (double) outerendsel); > Assert((double) innerstartsel <= (double) innerendsel); > it doesn't help > regards, tom lane assambler list is attached Regards Pavel Stehule
Attachment
On Mon, Dec 5, 2011 at 12:17 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: >> Pavel Stehule <pavel.stehule@gmail.com> writes: >>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was >>> configured just with --enable-debug and --enable-cassert >> >> Is this x86? I can't reproduce it on x86_64. >> > > yes, this is x86 platform > > uname -a > Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 > UTC 2011 i686 i686 i386 GNU/Linux > > [pavel@nemesis ~]$ cat /proc/cpuinfo > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 15 > model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz > stepping : 11 > cpu MHz : 800.000 > cache size : 4096 KB > physical id : 0 > siblings : 2 > core id : 0 > cpu cores : 2 > apicid : 0 > initial apicid : 0 > fdiv_bug : no > hlt_bug : no > f00f_bug : no > coma_bug : no > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm > constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor > ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi > flexpriority > bogomips : 4785.76 > clflush size : 64 > cache_alignment : 64 > address sizes : 36 bits physical, 48 bits virtual > power management: > > processor : 1 > vendor_id : GenuineIntel > cpu family : 6 > model : 15 > model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz > stepping : 11 > cpu MHz : 800.000 > cache size : 4096 KB > physical id : 0 > siblings : 2 > core id : 1 > cpu cores : 2 > apicid : 1 > initial apicid : 1 > fdiv_bug : no > hlt_bug : no > f00f_bug : no > coma_bug : no > fpu : yes > fpu_exception : yes > cpuid level : 10 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm > constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor > ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi > flexpriority > bogomips : 4786.60 > clflush size : 64 > cache_alignment : 64 > address sizes : 36 bits physical, 48 bits virtual > power management: > > it is Dell latitude D830 > >> It's fairly easy to get a set of values such that innerstartsel *should* >> equal innerendsel; but if one value has been rounded to memory precision >> and the other hasn't, the assert could certainly fail. >> >> Some digging around yields the information that the gcc hackers do not >> consider this a bug, or at least adamantly refuse to do anything about it: >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 >> Comment 47 is particularly relevant to our situation: >> >> To summarize, this defect effectively states that: >> assert( (x/y) == (x/y) ) >> may cause an assertion if compiled with optimization. >> >> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4 >> indicates that an explicit cast to double should help. Would >> you check if the problem goes away if the Asserts are changed to >> >> Assert((double) outerstartsel <= (double) outerendsel); >> Assert((double) innerstartsel <= (double) innerendsel); >> > > it doesn't help > >> regards, tom lane > > assambler list is attached how about: Assert((volatile double) outerstartsel <= (volatile double) outerendsel); etc merlin
2011/12/5 Merlin Moncure <mmoncure@gmail.com>: > On Mon, Dec 5, 2011 at 12:17 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Hello >> >> 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: >>> Pavel Stehule <pavel.stehule@gmail.com> writes: >>>> it looks like gcc bug - gcc 4.5.1 20100924 (Red Hat 4.5.1) It was >>>> configured just with --enable-debug and --enable-cassert >>> >>> Is this x86? I can't reproduce it on x86_64. >>> >> >> yes, this is x86 platform >> >> uname -a >> Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 >> UTC 2011 i686 i686 i386 GNU/Linux >> >> [pavel@nemesis ~]$ cat /proc/cpuinfo >> processor : 0 >> vendor_id : GenuineIntel >> cpu family : 6 >> model : 15 >> model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz >> stepping : 11 >> cpu MHz : 800.000 >> cache size : 4096 KB >> physical id : 0 >> siblings : 2 >> core id : 0 >> cpu cores : 2 >> apicid : 0 >> initial apicid : 0 >> fdiv_bug : no >> hlt_bug : no >> f00f_bug : no >> coma_bug : no >> fpu : yes >> fpu_exception : yes >> cpuid level : 10 >> wp : yes >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov >> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm >> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor >> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi >> flexpriority >> bogomips : 4785.76 >> clflush size : 64 >> cache_alignment : 64 >> address sizes : 36 bits physical, 48 bits virtual >> power management: >> >> processor : 1 >> vendor_id : GenuineIntel >> cpu family : 6 >> model : 15 >> model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz >> stepping : 11 >> cpu MHz : 800.000 >> cache size : 4096 KB >> physical id : 0 >> siblings : 2 >> core id : 1 >> cpu cores : 2 >> apicid : 1 >> initial apicid : 1 >> fdiv_bug : no >> hlt_bug : no >> f00f_bug : no >> coma_bug : no >> fpu : yes >> fpu_exception : yes >> cpuid level : 10 >> wp : yes >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov >> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm >> constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor >> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi >> flexpriority >> bogomips : 4786.60 >> clflush size : 64 >> cache_alignment : 64 >> address sizes : 36 bits physical, 48 bits virtual >> power management: >> >> it is Dell latitude D830 >> >>> It's fairly easy to get a set of values such that innerstartsel *should* >>> equal innerendsel; but if one value has been rounded to memory precision >>> and the other hasn't, the assert could certainly fail. >>> >>> Some digging around yields the information that the gcc hackers do not >>> consider this a bug, or at least adamantly refuse to do anything about it: >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 >>> Comment 47 is particularly relevant to our situation: >>> >>> To summarize, this defect effectively states that: >>> assert( (x/y) == (x/y) ) >>> may cause an assertion if compiled with optimization. >>> >>> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4 >>> indicates that an explicit cast to double should help. Would >>> you check if the problem goes away if the Asserts are changed to >>> >>> Assert((double) outerstartsel <= (double) outerendsel); >>> Assert((double) innerstartsel <= (double) innerendsel); >>> >> >> it doesn't help >> >>> regards, tom lane >> >> assambler list is attached > > how about: > Assert((volatile double) outerstartsel <= (volatile double) outerendsel); doesn't help too Regards Pavel > etc > > merlin
Pavel Stehule <pavel.stehule@gmail.com> writes: > 2011/12/4 Tom Lane <tgl@sss.pgh.pa.us>: >> Is this x86? I can't reproduce it on x86_64. > yes, this is x86 platform > uname -a > Linux nemesis 2.6.35.14-106.fc14.i686.PAE #1 SMP Wed Nov 23 13:39:51 > UTC 2011 i686 i686 i386 GNU/Linux I reproduced this with gcc 4.6.0 on Fedora 15 x86, too. >> Also, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45691#c4 >> indicates that an explicit cast to double should help. Would >> you check if the problem goes away if the Asserts are changed to >> >> Assert((double) outerstartsel <= (double) outerendsel); >> Assert((double) innerstartsel <= (double) innerendsel); > it doesn't help Hmm ... I'm inclined to think this actually *is* a bug, since Jakub is on record as saying it should work. Nonetheless, we need a workaround, since gcc versions behaving this way are going to be widespread for a long time even if we convince them to do something about it (which I suspect they wouldn't given their imperviousness to complaints about the main issue). I'm now thinking the best solution is just to drop these two Asserts. They're not adding anything very useful given the previous ones (which should be safe since those involve quantities rounded to integers). regards, tom lane