Thread: Assertion failure with LEFT JOINs among >500 relations
Hi,
I hit an assertion failure. With asserts disabled, the same query works fine even with more tables (>5000).
Steps to reproduce:
CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 int, value_3 float, value_4 bigint);
250 relations work fine; see the query (too long to copy & paste here):
https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L41

When the number of relations exceeds 500, we hit the assertion; see the query (too long to copy & paste here):
https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L45
And the backtrace:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff639fa2c2 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff63ab5bf1 libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff639646a6 libsystem_c.dylib`abort + 127
frame #3: 0x0000000102180a02 postgres`ExceptionalCondition(conditionName=<unavailable>, errorType=<unavailable>, fileName=<unavailable>, lineNumber=<unavailable>) at assert.c:67:2
frame #4: 0x0000000101ece9b2 postgres`initial_cost_mergejoin(root=0x7ff0000000000000, workspace=0x00007ffeedf5b528, jointype=JOIN_INNER, mergeclauses=<unavailable>, outer_path=0x000000012ebf12d0, inner_path=0x4093d80000000000, outersortkeys=0x0000000000000000, innersortkeys=0x000000012ebf68e8, extra=0x00007ffeedf5b6f8) at costsize.c:3043:2
frame #5: 0x0000000101eda01b postgres`try_mergejoin_path(root=0x0000000104a12618, joinrel=0x000000012ebeede0, outer_path=0x000000012ebf12d0, inner_path=0x00000001283d00e8, pathkeys=0x000000012ebf67e0, mergeclauses=0x000000012ebf6890, outersortkeys=0x0000000000000000, innersortkeys=0x000000012ebf68e8, jointype=JOIN_LEFT, extra=0x00007ffeedf5b6f8, is_partial=<unavailable>) at joinpath.c:615:2
frame #6: 0x0000000101ed9426 postgres`sort_inner_and_outer(root=0x0000000104a12618, joinrel=0x000000012ebeede0, outerrel=<unavailable>, innerrel=<unavailable>, jointype=JOIN_LEFT, extra=0x00007ffeedf5b6f8) at joinpath.c:1038:3
frame #7: 0x0000000101ed8f7a postgres`add_paths_to_joinrel(root=0x0000000104a12618, joinrel=0x000000012ebeede0, outerrel=0x000000012ebe7b48, innerrel=0x0000000127f146e0, jointype=<unavailable>, sjinfo=<unavailable>, restrictlist=0x000000012ebf42b0) at joinpath.c:269:3
frame #8: 0x0000000101edbdc6 postgres`populate_joinrel_with_paths(root=0x0000000104a12618, rel1=0x000000012ebe7b48, rel2=0x0000000127f146e0, joinrel=0x000000012ebeede0, sjinfo=0x000000012809edc8, restrictlist=0x000000012ebf42b0) at joinrels.c:824:4
frame #9: 0x0000000101edb57a postgres`make_join_rel(root=0x0000000104a12618, rel1=0x000000012ebe7b48, rel2=0x0000000127f146e0) at joinrels.c:760:2
frame #10: 0x0000000101edb1ec postgres`make_rels_by_clause_joins(root=0x0000000104a12618, old_rel=0x000000012ebe7b48, other_rels_list=<unavailable>, other_rels=<unavailable>) at joinrels.c:312:11
frame #11: 0x0000000101edada3 postgres`join_search_one_level(root=0x0000000104a12618, level=2) at joinrels.c:123:4
frame #12: 0x0000000101ec7feb postgres`standard_join_search(root=0x0000000104a12618, levels_needed=8, initial_rels=0x000000012ebf4078) at allpaths.c:3097:3
frame #13: 0x0000000101ec6b38 postgres`make_rel_from_joinlist(root=0x0000000104a12618, joinlist=0x00000001280a5618) at allpaths.c:2993:14
frame #14 through frame #99: 86 further recursive frames in postgres`make_rel_from_joinlist(root=0x0000000104a12618, joinlist=<varies>) at allpaths.c:2993:14, identical except for the joinlist argument (elided for brevity)
frame #100: 0x0000000101ec6b38 postgres`make_rel_from_joinlist(root=0x0000000104a12618, joinlist=0x000000012837e358) at allpaths.c:2993:14
frame #101: 0x0000000101ec688f postgres`make_one_rel(root=0x0000000104a12618, joinlist=0x000000012837e358) at allpaths.c:227:8
frame #102: 0x0000000101eec187 postgres`query_planner(root=0x0000000104a12618, qp_callback=<unavailable>, qp_extra=0x00007ffeedf5d000) at planmain.c:269:14
frame #103: 0x0000000101eeea9b postgres`grouping_planner(root=0x0000000104a12618, inheritance_update=<unavailable>, tuple_fraction=<unavailable>) at planner.c:2058:17
frame #104: 0x0000000101eed1a1 postgres`subquery_planner(glob=<unavailable>, parse=0x00000001049ad620, parent_root=<unavailable>, hasRecursion=<unavailable>, tuple_fraction=0) at planner.c:1015:3
frame #105: 0x0000000101eec3b6 postgres`standard_planner(parse=0x00000001049ad620, query_string=<unavailable>, cursorOptions=256, boundParams=0x0000000000000000) at planner.c:405:9
frame #106: 0x0000000101faeaf1 postgres`pg_plan_query(querytree=0x00000001049ad620, query_string="SELECT count(*) FROM users_table u_1123123123123123 LEFT JOIN users_table u0 USING (user_id) LEFT JOIN users_table u1 USING (user_id) LEFT JOIN users_table u2 USING (user_id) LEFT JOIN users_table u3 USING (user_id) LEFT JOIN users_table u4 USING (user_id) LEFT JOIN users_table u5 USING (user_id) LEFT JOIN users_table u6 USING (user_id) LEFT JOIN users_table u7 USING (user_id) LEFT JOIN users_table u8 USING (user_id) LEFT JOIN users_table u9 USING (user_id) LEFT JOIN users_table u10 USING (user_id) LEFT JOIN users_table u11 USING (user_id) LEFT JOIN users_table u12 USING (user_id) LEFT JOIN users_table u13 USING (user_id) LEFT JOIN users_table u14 USING (user_id) LEFT JOIN users_table u15 USING (user_id) LEFT JOIN users_table u16 USING (user_id) LEFT JOIN users_table u17 USING (user_id) LEFT JOIN users_table u18 USING (user_id) LEFT JOIN users_table u19 USING (user_id) LEFT JOIN users_table u20 USING (user_id) LEFT JOIN users_table u21 USING (user_id) LEFT JOIN users_table u22 USING (use"..., cursorOptions=256, boundParams=0x0000000000000000) at postgres.c:875:9
frame #107: 0x0000000101faec32 postgres`pg_plan_queries(querytrees=0x00000001275c20e0, query_string="SELECT count(*) FROM users_table u_1123123123123123 LEFT JOIN users_table u0 USING (user_id) LEFT JOIN users_table u1 USING (user_id) LEFT JOIN users_table u2 USING (user_id) LEFT JOIN users_table u3 USING (user_id) LEFT JOIN users_table u4 USING (user_id) LEFT JOIN users_table u5 USING (user_id) LEFT JOIN users_table u6 USING (user_id) LEFT JOIN users_table u7 USING (user_id) LEFT JOIN users_table u8 USING (user_id) LEFT JOIN users_table u9 USING (user_id) LEFT JOIN users_table u10 USING (user_id) LEFT JOIN users_table u11 USING (user_id) LEFT JOIN users_table u12 USING (user_id) LEFT JOIN users_table u13 USING (user_id) LEFT JOIN users_table u14 USING (user_id) LEFT JOIN users_table u15 USING (user_id) LEFT JOIN users_table u16 USING (user_id) LEFT JOIN users_table u17 USING (user_id) LEFT JOIN users_table u18 USING (user_id) LEFT JOIN users_table u19 USING (user_id) LEFT JOIN users_table u20 USING (user_id) LEFT JOIN users_table u21 USING (user_id) LEFT JOIN users_table u22 USING (use"..., cursorOptions=256, boundParams=0x0000000000000000) at postgres.c:966:11
frame #108: 0x0000000101fb09fa postgres`exec_simple_query(query_string="SELECT count(*) FROM users_table u_1123123123123123 LEFT JOIN users_table u0 USING (user_id) LEFT JOIN users_table u1 USING (user_id) LEFT JOIN users_table u2 USING (user_id) LEFT JOIN users_table u3 USING (user_id) LEFT JOIN users_table u4 USING (user_id) LEFT JOIN users_table u5 USING (user_id) LEFT JOIN users_table u6 USING (user_id) LEFT JOIN users_table u7 USING (user_id) LEFT JOIN users_table u8 USING (user_id) LEFT JOIN users_table u9 USING (user_id) LEFT JOIN users_table u10 USING (user_id) LEFT JOIN users_table u11 USING (user_id) LEFT JOIN users_table u12 USING (user_id) LEFT JOIN users_table u13 USING (user_id) LEFT JOIN users_table u14 USING (user_id) LEFT JOIN users_table u15 USING (user_id) LEFT JOIN users_table u16 USING (user_id) LEFT JOIN users_table u17 USING (user_id) LEFT JOIN users_table u18 USING (user_id) LEFT JOIN users_table u19 USING (user_id) LEFT JOIN users_table u20 USING (user_id) LEFT JOIN users_table u21 USING (user_id) LEFT JOIN users_table u22 USING (use"...) at postgres.c:1158:19
frame #109: 0x0000000101fb024e postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, dbname=<unavailable>, username=<unavailable>) at postgres.c:0
frame #110: 0x0000000101f35f65 postgres`BackendRun(port=0x0000000000000001) at postmaster.c:4536:2
frame #111: 0x0000000101f35830 postgres`BackendStartup(port=<unavailable>) at postmaster.c:4220:3
frame #112: 0x0000000101f35005 postgres`ServerLoop at postmaster.c:1739:7
frame #113: 0x0000000101f3321c postgres`PostmasterMain(argc=3, argv=0x00007fc7a7403250) at postmaster.c:1412:11
frame #114: 0x0000000101e91e06 postgres`main(argc=3, argv=0x00007fc7a7403250) at main.c:210:3
frame #115: 0x00007fff638bf3d5 libdyld.dylib`start + 1
frame #116: 0x00007fff638bf3d5 libdyld.dylib`start + 1
SELECT version();
version
-------------------------------------------------------------------------------------------------------------------
PostgreSQL 13.0 on x86_64-apple-darwin18.7.0, compiled by Apple clang version 11.0.0 (clang-1100.0.33.17), 64-bit
(1 row)
pg_config
BINDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/bin
DOCDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/share/doc
HTMLDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/share/doc
INCLUDEDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/include
PKGINCLUDEDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/include
INCLUDEDIR-SERVER = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/include/server
LIBDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/lib
PKGLIBDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/lib
LOCALEDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/share/locale
MANDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/share/man
SHAREDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/share
SYSCONFDIR = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/etc
PGXS = /Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/Users/onderkalaci/Documents/citus_code/pgenv/pgsql-13.0' '--enable-debug' '--enable-cassert' 'CFLAGS=-ggdb -Og -g3 -fno-omit-frame-pointer' '--with-openssl' '--with-icu' 'LDFLAGS=-L/usr/local/opt/readline/lib -L/usr/local/opt/openssl/lib ' 'CPPFLAGS=-I/usr/local/opt/readline/include -I/usr/local/opt/openssl/include/ ' 'PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig'
CC = gcc
CPPFLAGS = -I/usr/local/Cellar/icu4c/66.1/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -I/usr/local/opt/readline/include -I/usr/local/opt/openssl/include/
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -ggdb -Og -g3 -fno-omit-frame-pointer
CFLAGS_SL =
LDFLAGS = -L/usr/local/opt/readline/lib -L/usr/local/opt/openssl/lib -Wl,-dead_strip_dylibs
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lssl -lcrypto -lz -lreadline -lm
VERSION = PostgreSQL 13.0
On Fri, 9 Oct 2020 at 08:16, Onder Kalaci <onderk@microsoft.com> wrote:
> I hit an assertion failure. With asserts disabled, it works fine even
> with more tables (>5000).
>
> Steps to reproduce:
> CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 int, value_3 float, value_4 bigint);
> 250 relations work fine, see the query (too long to copy & paste here):
> https://gist.github.com/onderkalaci/2b40a18d989da389ee4fb631e1ad7c0e#file-steps_to_assert_pg-sql-L41

I had a quick look at this and I can recreate it using the following (in psql):

select 'explain select count(*) from users_table ' || string_agg('LEFT JOIN users_table u' || x::text || ' USING (user_id)', ' ')
from generate_series(1,379) x;
\gexec

That triggers the assert because Assert(outer_skip_rows <= outer_rows) fails in initial_cost_mergejoin().

The reason it fails is that outer_path_rows has become infinity: calc_joinrel_size_estimate() keeps multiplying in the join selectivity of 0.05 (due to our 200 default num distinct from lack of any stats), which after a number of iterations causes the number to become very large.

Instead of running the 379 joins from above, try with 378 and you get:

 Aggregate  (cost=NaN..NaN rows=1 width=8)
   ->  Nested Loop Left Join  (cost=33329.16..NaN rows=Infinity width=0)
         Join Filter: (users_table.user_id = u378.user_id)
         ->  Merge Left Join  (cost=33329.16..<very large number> width=4)
               Merge Cond: (users_table.user_id = u377.user_id)
               ->  Merge Left Join  (cost=33240.99..<very large number> width=4)

Changing the code in initial_cost_mergejoin() to add:

 if (outer_path_rows <= 0 || isnan(outer_path_rows))
     outer_path_rows = 1;
+else if (isinf(outer_path_rows))
+    outer_path_rows = DBL_MAX;

does seem to fix the problem, but that's certainly not the right fix.

Perhaps the right fix is to modify clamp_row_est() with:

@@ -193,7 +194,9 @@ clamp_row_est(double nrows)
  * better and to avoid possible divide-by-zero when interpolating costs.
  * Make it an integer, too.
  */
-       if (nrows <= 1.0)
+       if (isinf(nrows))
+               nrows = rint(DBL_MAX);
+       else if (nrows <= 1.0)
                nrows = 1.0;
        else
                nrows = rint(nrows);

but the row estimates are getting pretty insane well before then. DBL_MAX is 226 orders of magnitude more than the estimated number of atoms in the observable universe, so it seems pretty unreasonable that someone might figure out a way to store that many tuples on a disk any time soon. Perhaps DBL_MAX is way too big a number to clamp at; I'm just not sure what we should reduce it to so that it is reasonable.

David
David Rowley <dgrowleyml@gmail.com> writes:
> The reason it fails is that outer_path_rows has become infinity due to
> calc_joinrel_size_estimate continually multiplying in the join
> selectivity of 0.05 (due to our 200 default num distinct from lack of
> any stats) which after a number of iterations causes the number to
> become very large.

0.005, but yeah.  We're estimating that each additional join inflates
the output size by about 6x (1270 * 0.005), and after a few hundred of
those, it'll overflow.

> Perhaps the right fix is to modify clamp_row_est() with:

I thought of that too, but as you say, if the rowcount has overflowed a
double then we've got way worse problems.  It'd make more sense to try
to keep the count to a saner value in the first place.

In the end, (a) this is an Assert, so not a problem for production
systems, and (b) it's going to take you longer than you want to wait to
join 500+ tables anyhow, unless maybe they're empty.  I'm kind of
disinclined to do anything in the way of a band-aid fix.  If somebody
has an idea for a different way of estimating the join size with no
stats, we could talk about that.

I notice though that the only way a plan of this sort isn't going to
blow up at execution is if the join multiplication factor is at most 1,
i.e. the join key is unique.  But guess what, we already know what to
do in that case.  Adding a unique or pkey constraint to
users_table.user_id causes the plan to collapse entirely (if they're
left joins) or at least still produce a small rowcount estimate (if
plain joins).

			regards, tom lane
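To make that roughly 6x-per-join growth concrete, here is a minimal standalone C sketch. It is illustrative only, not PostgreSQL source; the 1270-row base estimate and 0.005 selectivity are the figures quoted above. It reports the join count at which the estimate overflows, which lines up with the ~379-join reproduction:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double rows = 1270.0;            /* base row estimate for users_table */
        double growth = 1270.0 * 0.005;  /* ~6.35x output inflation per join */
        int    joins = 0;

        /* mirror the repeated multiplication done once per join */
        while (!isinf(rows))
        {
            rows *= growth;
            joins++;
        }

        /* a double overflows near 1.8e308, after roughly 380 joins here */
        printf("row estimate became infinity after %d joins\n", joins);
        return 0;
    }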
On Fri, 9 Oct 2020 at 12:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Perhaps the right fix is to modify clamp_row_est() with:
>
> I thought of that too, but as you say, if the rowcount has overflowed a
> double then we've got way worse problems.  It'd make more sense to try
> to keep the count to a saner value in the first place.

I wonder if there is something more logical we could do to maintain
sane estimates too, but someone could surely still cause it to blow up
by writing a long series of clause-less joins.  We can't really get
away from the fact that we must estimate those as
inner_rows * outer_rows.

I admit it's annoying to add cycles to clamp_row_est() for such insane
cases.

David
David Rowley <dgrowleyml@gmail.com> writes:
> I admit it's annoying to add cycles to clamp_row_est() for such insane cases.

I poked at this a bit more closely, and noted that the actual problem
is that when we do this:

	outer_skip_rows = rint(outer_path_rows * outerstartsel);

we have outer_path_rows = inf, outerstartsel = 0, and of course inf
times zero is NaN.  So we end up asserting "NaN <= Inf", not
"Inf <= Inf" (which wouldn't have caused a problem).

If we did want to do something here, I'd consider something like

	if (isnan(outer_skip_rows))
		outer_skip_rows = 0;
	if (isnan(inner_skip_rows))
		inner_skip_rows = 0;

(We shouldn't need that for outer_rows/inner_rows, since the endsel
values can't be 0.)

Messing with clamp_row_est would be a much more indirect way of fixing
it, as well as having more widespread effects.  In the end though, I'm
still not terribly excited about this.

			regards, tom lane
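The inf-times-zero behaviour described here is ordinary IEEE 754 arithmetic and can be reproduced outside the planner. In this standalone C sketch the variable names merely mirror initial_cost_mergejoin(); it is not the planner's code:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double outer_path_rows = INFINITY;  /* the overflowed row estimate */
        double outerstartsel   = 0.0;       /* zero startup selectivity */

        /* inf * 0 is NaN under IEEE 754, and rint(NaN) stays NaN */
        double outer_skip_rows = rint(outer_path_rows * outerstartsel);

        printf("outer_skip_rows = %f\n", outer_skip_rows);
        /* every comparison involving NaN is false, so the Assert fires */
        printf("NaN <= Inf -> %d\n", outer_skip_rows <= INFINITY);  /* 0 */
        printf("Inf <= Inf -> %d\n", INFINITY <= INFINITY);         /* 1 */
        return 0;
    }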
On Fri, 9 Oct 2020 at 12:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If we did want to do something here, I'd consider something like
>
>	if (isnan(outer_skip_rows))
>		outer_skip_rows = 0;
>	if (isnan(inner_skip_rows))
>		inner_skip_rows = 0;

Are you worried about the costs above the join that triggers that
coming out as NaN with that fix?  It appears that's the case.  Cost
comparisons of paths with that are not going to do anything along the
lines of sane.

I guess whether or not that matters depends on whether we expect any
real queries to hit this, or if we just want to stop the Assert
failure. ... 500 joins.  I'm willing to listen to an explanation of
the use case, but in the absence of that explanation, I'd be leaning
towards "you're doing it wrong".  If that turns out to be true, then
perhaps your proposed fix is okay.

David
David Rowley <dgrowleyml@gmail.com> writes:
> Are you worried about the costs above the join that triggers that
> coming out as NaN with that fix?  It appears that's the case.

[ pokes at that... ]  Yeah, it looks like nestloop cost estimation also
has some issues with inf-times-zero producing NaN; it's just not
asserting about it.

I notice there are some other ad-hoc isnan() checks scattered about
costsize.c, too.  Maybe we should indeed consider fixing
clamp_row_estimate to get rid of inf (and nan too, I suppose) so that
we'd not need those.  I don't recall the exact cases that made us
introduce those checks, but they were for cases a lot more easily
reachable than this one, I believe.

			regards, tom lane
On Fri, 9 Oct 2020 at 15:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I notice there are some other ad-hoc isnan() checks scattered
> about costsize.c, too.  Maybe we should indeed consider fixing
> clamp_row_estimate to get rid of inf (and nan too, I suppose)
> so that we'd not need those.  I don't recall the exact cases
> that made us introduce those checks, but they were for cases
> a lot more easily reachable than this one, I believe.

Is there actually a case where nrows could be NaN?  If not, then it
seems like a wasted check.  Wouldn't it take one of the input relations
having an Inf row estimate (which won't happen after changing
clamp_row_estimate()), or the selectivity estimate being NaN?

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 9 Oct 2020 at 15:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I notice there are some other ad-hoc isnan() checks scattered
>> about costsize.c, too.  Maybe we should indeed consider fixing
>> clamp_row_estimate to get rid of inf (and nan too, I suppose)
>> so that we'd not need those.  I don't recall the exact cases
>> that made us introduce those checks, but they were for cases
>> a lot more easily reachable than this one, I believe.

> Is there actually a case where nrows could be NaN?  If not, then it
> seems like a wasted check.  Wouldn't it take one of the input
> relations having an Inf row estimate (which won't happen after
> changing clamp_row_estimate()), or the selectivity estimate being
> NaN?

I'm fairly certain that every one of the existing NaN checks was put
there on the basis of hard experience.  Possibly digging in the git
history would offer more info about exactly where the NaNs came from.

			regards, tom lane
On Sat, 10 Oct 2020 at 02:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm fairly certain that every one of the existing NaN checks was put
> there on the basis of hard experience.  Possibly digging in the git
> history would offer more info about exactly where the NaNs came from.

I had a look at this and found there have been quite a number of fixes
which added either the isnan() checks or the <= 0 checks.  Namely:

-----------
commit 72826fb362c4aada6d2431df0b706df448806c02
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Apr 15 17:45:41 2011 -0400

    Guard against incoming rowcount estimate of NaN in cost_mergejoin().

    Although rowcount estimates really ought not be NaN, a bug elsewhere
    could perhaps result in that, and that would cause Assert failure in
    cost_mergejoin, which I believe to be the explanation for bug #5977
    from Anton Kuznetsov.  Seems like a good idea to expend a couple
    more cycles to prevent that, even though the real bug is elsewhere.
    Not back-patching, though, because we don't encourage running
    production systems with Asserts on.

The discussion for that is in
https://www.postgresql.org/message-id/flat/4602.1302705756%40sss.pgh.pa.us#69dd8c334aa714cfac4e0d9b04c5201c

commit 76281aa9647e6a5dfc646514554d0f519e3b8a58
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Mar 26 12:03:12 2016 -0400

    Avoid a couple of zero-divide scenarios in the planner.

commit fd791e7b5a1bf53131ad15e68e4d4f8ca795fcb4
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Mar 24 21:53:04 2008 +0000

    When a relation has been proven empty by constraint exclusion,
    propagate that knowledge up through any joins it participates in.
    We were doing that already in some special cases but not in the
    general case.  Also, defend against zero row estimates for the
    input relations in cost_mergejoin --- this fix may have eliminated
    the only scenario in which that can happen, but be safe.  Per
    report from Alex Solovey.

That one was reported in
https://www.postgresql.org/message-id/flat/BLU136-DAV79FF310AC13FFC96FA2FDAEFD0%40phx.gbl#4cde17b2369fc7e0da83cc7d4aeeaa48
The problem was that an Append with no subpaths could have a 0-row
estimate.
-----------

Because there have been quite a few of these, and this report is yet
another one, I wonder if it's time to try and stamp these out at the
source rather than where the row counts are being used.

I toyed around with the attached patch, but I'm still not that excited
about the clamping of infinite values to DBL_MAX.  The test case I
showed above with generate_series(1,379) still ends up with NaN cost
estimates due to costing a sort with DBL_MAX rows.  When I was writing
the patch, I had it in my head that the costs per row will always be
lower than 1, and because of that, even if the row count is dangerously
close to DBL_MAX, the costs would never be higher than the row count...
Turns out I was wrong about that: clearly, sorting a number of rows
even close to DBL_MAX would be beyond astronomically expensive and
cause the costs to go infinite.

The fd791e7b5 fix was for a subpath-less Append node having a 0-row
estimate and causing problems in the costing of merge join.  In the
patch, I thought it would be better just to fix this by insisting that
an Append always has at least 1 row.  That means even a dummy path
would have 1 row, which will become a const-false Result in the plan.
I've had to add a special case to set the plan_rows back to 0 so that
EXPLAIN shows 0 rows as it did before.

That's not exactly pretty, but I still feel there is merit in insisting
we never have 0-row paths to get away from these types of bugs once and
for all.

The patch does fix the failing Assert.  However, something along these
lines seems more suitable for master only.  The back branches maybe
should just get the more localised isinf() check and clamp to DBL_MAX
that I mentioned earlier in this thread.

I've searched through the code to see if there are other possible cases
where paths may be generated with a 0-row count.  I imagine anything
that has a qual and performs a selectivity estimate will already have a
clamp_row_est() call, since we'd see fractional row counts if it
didn't.  That leaves me with Append / Merge Append and each join type
plus aggregates.

Currently, it seems we never will generate a Merge Append without any
sub-paths.  I wondered if I should just Assert that's the case in
create_merge_append_path().  I ended up just adding a clamp_row_est()
call instead.

calc_joinrel_size_estimate() seems to handle all join path row
estimates.  That uses clamp_row_est().  Aggregate paths can reduce the
number of rows, but I think all the row estimates from those will go
through estimate_num_groups(), which appears never to be able to
return 0.

David
Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> Because there's been quite a few of these, and this report is yet
> another one, I wonder if it's time to try and stamp these out at the
> source rather than where the row counts are being used.

I'm on board with trying to get rid of NaN rowcount estimates more
centrally.  I do not think it is a good idea to try to wire in a
prohibition against zero rowcounts.  That is actually the correct thing
in assorted scenarios --- one example recently under discussion was
ModifyTable without RETURNING, and another is where we can prove that a
restriction clause is constant-false.  At some point I think we are
going to want to deal honestly with those cases instead of sweeping
them under the rug.  So I'm disinclined to remove zero defenses that
we'll just have to put back someday.

I think converting Inf to DBL_MAX, in hopes of avoiding creation of
NaNs later, is fine.  (Note that applying rint() to that is quite
useless --- in every floating-point system, values bigger than
2^number-of-mantissa-bits are certainly integral.)

I'm not sure why you propose to map NaN to one.  Wouldn't mapping it to
Inf (and thence to DBL_MAX) make at least as much sense?  Probably
more, in fact.  We know that unwarranted one-row estimates are absolute
death to our chances of picking a well-chosen plan.

> I toyed around with the attached patch, but I'm still not that excited
> about the clamping of infinite values to DBL_MAX.  The test case I
> showed above with generate_series(1,379) still ends up with NaN cost
> estimates due to costing a sort with DBL_MAX rows.  When I was writing
> the patch, I had it in my head that the costs per row will always be
> lower than 1.

Yeah, that is a good point.  Maybe instead of clamping to DBL_MAX, we
should clamp rowcounts to something that provides some headroom for
multiplication by per-row costs.  A max rowcount of say 1e100 should
serve fine, while still being comfortably more than any non-insane
estimate.

So now I'm imagining something like

#define MAXIMUM_ROWCOUNT 1e100

clamp_row_est(double nrows)
{
	/* Get rid of NaN, Inf, and impossibly large row counts */
	if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
		nrows = MAXIMUM_ROWCOUNT;
	else
		... existing logic ...

Perhaps we should also have some sort of clamp for path cost estimates,
at least to prevent them from being NaNs, which is going to confuse
add_path terribly.

			regards, tom lane
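Filling that fragment out with the pre-existing body of clamp_row_est() from costsize.c yields a sketch like the one below. The comment wording and the exact placement of the NaN test are one reading of the proposal above, not necessarily the committed code:

    #include <math.h>

    #define MAXIMUM_ROWCOUNT 1e100

    double
    clamp_row_est(double nrows)
    {
        /*
         * Avoid infinite and NaN row estimates; costs derived from such
         * values would be useless.  Also force the estimate to be at least
         * one row, to make explain output look better and to avoid possible
         * divide-by-zero when interpolating costs.  Make it an integer, too.
         */
        if (nrows > MAXIMUM_ROWCOUNT || isnan(nrows))
            nrows = MAXIMUM_ROWCOUNT;
        else if (nrows <= 1.0)
            nrows = 1.0;
        else
            nrows = rint(nrows);

        return nrows;
    }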
Thanks for having a look at this.

On Wed, 14 Oct 2020 at 04:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm on board with trying to get rid of NaN rowcount estimates more
> centrally.  I do not think it is a good idea to try to wire in a
> prohibition against zero rowcounts.  That is actually the correct
> thing in assorted scenarios --- one example recently under discussion
> was ModifyTable without RETURNING, and another is where we can prove
> that a restriction clause is constant-false.  At some point I think
> we are going to want to deal honestly with those cases instead of
> sweeping them under the rug.  So I'm disinclined to remove zero
> defenses that we'll just have to put back someday.

OK, that certainly limits the scope here.  It just means we can't get
rid of the <= 0 checks in the join costing functions.  The problem case
that those were added for was a dummy Append.  We still have valid
cases that won't convert the join rel to a dummy rel with a dummy
Append on one side.

> I think converting Inf to DBL_MAX, in hopes of avoiding creation of
> NaNs later, is fine.  (Note that applying rint() to that is quite
> useless --- in every floating-point system, values bigger than
> 2^number-of-mantissa-bits are certainly integral.)

Good point.

> I'm not sure why you propose to map NaN to one.  Wouldn't mapping it
> to Inf (and thence to DBL_MAX) make at least as much sense?  Probably
> more, in fact.  We know that unwarranted one-row estimates are
> absolute death to our chances of picking a well-chosen plan.

That came around due to what the join costing functions were doing,
i.e.:

	/* Protect some assumptions below that rowcounts aren't zero or NaN */
	if (inner_path_rows <= 0 || isnan(inner_path_rows))
		inner_path_rows = 1;

[1] didn't have an example case of how the NaNs were introduced, so I
was mostly just copying the logic that was added to fix that back in
72826fb3.

>> I toyed around with the attached patch, but I'm still not that excited
>> about the clamping of infinite values to DBL_MAX.  The test case I
>> showed above with generate_series(1,379) still ends up with NaN cost
>> estimates due to costing a sort with DBL_MAX rows.  When I was writing
>> the patch, I had it in my head that the costs per row will always be
>> lower than 1.

> Yeah, that is a good point.  Maybe instead of clamping to DBL_MAX,
> we should clamp rowcounts to something that provides some headroom
> for multiplication by per-row costs.  A max rowcount of say 1e100
> should serve fine, while still being comfortably more than any
> non-insane estimate.
>
> So now I'm imagining something like
>
> #define MAXIMUM_ROWCOUNT 1e100

That seems more reasonable.  We likely could push it a bit higher, but
I'm not all that motivated to, since if an estimate that large were
true, you could expect the heat death of the universe to arrive before
your query results, in which case the user would likely struggle to
find electrons to power their computer.

> clamp_row_est(double nrows)
> {
>	/* Get rid of NaN, Inf, and impossibly large row counts */
>	if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
>		nrows = MAXIMUM_ROWCOUNT;
>	else
>		... existing logic ...

I've got something along those lines in the attached.

> Perhaps we should also have some sort of clamp for path cost
> estimates, at least to prevent them from being NaNs which
> is going to confuse add_path terribly.

Hmm, I'm not quite sure where to start with that one.  Many of the path
estimates will already go through clamp_row_est().  There are various
special requirements, e.g. Appends with no subpaths.  So when to apply
it would depend on what path type it is.  I'd say it would need lots of
careful analysis and a scattering of new calls in pathnode.c.

I've ended up leaving the NaN checks in the join costing functions.
There was no case mentioned in [1] that showed how we hit that reported
test case, so I'm not really confident enough to know I'm not just
reintroducing the same problem again by removing them.  The path row
estimate that had the NaN might not have been through clamp_row_est();
many don't.

David

[1] https://www.postgresql.org/message-id/7270.1302902842%40sss.pgh.pa.us
Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 14 Oct 2020 at 04:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So now I'm imagining something like
>> #define MAXIMUM_ROWCOUNT 1e100

> That seems more reasonable.  We likely could push it a bit higher, but
> I'm not all that motivated to, since if an estimate that large were
> true, you could expect the heat death of the universe to arrive before
> your query results, in which case the user would likely struggle to
> find electrons to power their computer.

Right.  But I'm thinking about joins in which both inputs are clamped
to that maximum estimate.  If we allowed it to be as high as 1e200,
then multiplying the two input rowcounts together would itself
overflow.  At 1e100, we can do that and also multiply in a ridiculous
per-row cost, and we're still well below the overflow threshold.  So
this should go pretty far towards preventing internal overflows in any
one plan step's cost & rows calculations.

(For comparison's sake, I believe the number of atoms in the observable
universe is thought to be somewhere on the order of 1e80.  So we are
pretty safe in thinking that no practically-useful rowcount estimate
will exceed 1e100; there is no need to make it higher.)

> I've ended up leaving the NaN checks in the join costing functions.
> There was no case mentioned in [1] that showed how we hit that
> reported test case, so I'm not really confident enough to know I'm not
> just reintroducing the same problem again by removing them.  The path
> row estimate that had the NaN might not have been through
> clamp_row_est(); many don't.

Hmm, I will try to find some time tomorrow to reconstruct that.

			regards, tom lane
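The headroom argument is easy to verify numerically. A small standalone C sketch (illustration only) shows that the product of two maximally clamped inputs times a large per-row cost still fits in a double, while a 1e200 clamp would already overflow at the rowcount product:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* joining two inputs both clamped at 1e100: 1e100 * 1e100 = 1e200 */
        double join_rows = 1e100 * 1e100;

        /* multiplying in an absurd per-row cost still stays below ~1.8e308 */
        double cost = join_rows * 1e50;

        printf("join_rows finite: %d, cost finite: %d\n",
               isfinite(join_rows), isfinite(cost));

        /* with a 1e200 clamp, the rowcount product alone would overflow */
        printf("1e200 * 1e200 finite: %d\n", isfinite(1e200 * 1e200));
        return 0;
    }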
I wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> I've ended up leaving the NaN checks in the join costing functions.
>> There was no case mentioned in [1] that showed how we hit that
>> reported test case, so I'm not really confident enough to know I'm
>> not just reintroducing the same problem again by removing them.  The
>> path row estimate that had the NaN might not have been through
>> clamp_row_est(); many don't.

> Hmm, I will try to find some time tomorrow to reconstruct that.

I'm confused now, because the v2 patch does remove those isnan calls?

I rechecked the archives, and I agree that there's no data about
exactly how we could have gotten a NaN here.  My guess though is
infinity-times-zero in some earlier relation size estimate.  So
hopefully the clamp to 1e100 will make that impossible, or if it
doesn't, then clamp_row_est() should still prevent a NaN from
propagating to the next level up.

I'm good with the v2 patch.

			regards, tom lane
On Sat, 17 Oct 2020 at 06:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm confused now, because the v2 patch does remove those isnan calls?

I think that was a case of a last-minute change of mind and forgetting
to attach the updated patch.

> I rechecked the archives, and I agree that there's no data about
> exactly how we could have gotten a NaN here.  My guess though is
> infinity-times-zero in some earlier relation size estimate.  So
> hopefully the clamp to 1e100 will make that impossible, or if it
> doesn't, then clamp_row_est() should still prevent a NaN from
> propagating to the next level up.
>
> I'm good with the v2 patch.

Thanks a lot for having a look.  I'll proceed in getting the v2 I sent
earlier into master.

For the back branches, I think I'll go with something more minimal in
the form of adding:

 if (outer_path_rows <= 0 || isnan(outer_path_rows))
     outer_path_rows = 1;
+else if (isinf(outer_path_rows))
+    outer_path_rows = DBL_MAX;

and the same for inner_path_rows, to each area in costsize.c which has
that code.  Wondering your thoughts on that.

David
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 17 Oct 2020 at 06:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm good with the v2 patch.

> Thanks a lot for having a look.  I'll proceed in getting the v2 I sent
> earlier into master.

> For the back branches, I think I'll go with something more minimal in
> the form of adding:

TBH, I see no need to do anything in the back branches.  This is not an
issue for production usage.

			regards, tom lane
On Mon, 19 Oct 2020 at 12:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> For the back branches, I think I'll go with something more minimal in
>> the form of adding:
>
> TBH, I see no need to do anything in the back branches.  This is not
> an issue for production usage.

I understand the Assert failure is pretty harmless, so non-assert
builds shouldn't suffer too greatly.  I just assumed that any large
stakeholders invested in upgrading to a newer version of PostgreSQL may
like to run various tests with their application against an
assert-enabled version of PostgreSQL, perhaps to gain some confidence
in the upgrade.  A failing assert is unlikely to inspire additional
confidence.

I'm not set on backpatching; that's just my thoughts.  FWIW, the patch
I'd thought of is attached.

David
Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> On Mon, 19 Oct 2020 at 12:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> TBH, I see no need to do anything in the back branches.  This is not
>> an issue for production usage.

> I understand the Assert failure is pretty harmless, so non-assert
> builds shouldn't suffer too greatly.  I just assumed that any large
> stakeholders invested in upgrading to a newer version of PostgreSQL
> may like to run various tests with their application against an
> assert-enabled version of PostgreSQL, perhaps to gain some confidence
> in the upgrade.  A failing assert is unlikely to inspire additional
> confidence.

If any existing outside regression tests hit such corner cases, then
(a) we'd have heard about it, and (b) likely they'd fail in the older
branch as well.  So I don't buy the argument that this will dissuade
somebody from upgrading.

I do, on the other hand, buy the idea that if anyone is indeed working
in this realm, they might be annoyed by a behavior change in a stable
branch.  So it cuts both ways.  On balance, I don't think we should
touch this in the back branches.

			regards, tom lane
On Mon, 19 Oct 2020 at 12:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> On Mon, 19 Oct 2020 at 12:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> TBH, I see no need to do anything in the back branches.  This is not
>>> an issue for production usage.
>
>> I understand the Assert failure is pretty harmless, so non-assert
>> builds shouldn't suffer too greatly.  I just assumed that any large
>> stakeholders invested in upgrading to a newer version of PostgreSQL
>> may like to run various tests with their application against an
>> assert-enabled version of PostgreSQL, perhaps to gain some confidence
>> in the upgrade.  A failing assert is unlikely to inspire additional
>> confidence.
>
> If any existing outside regression tests hit such corner cases, then
> (a) we'd have heard about it, and (b) likely they'd fail in the older
> branch as well.  So I don't buy the argument that this will dissuade
> somebody from upgrading.

Hmm, well, it was reported to us.  Perhaps swap the word "upgrading"
for "migrating".  It would be good to hear Onder's case to see if he
has a good argument for having a vested interest in pg13 not failing
this way with asserts enabled.

> I do, on the other hand, buy the idea that if anyone is indeed working
> in this realm, they might be annoyed by a behavior change in a stable
> branch.  So it cuts both ways.  On balance I don't think we should
> touch this in the back branches.

I guess we could resolve that concern by just changing the failing
assert to become:

	Assert(outer_skip_rows <= outer_rows || isinf(outer_rows));

It's pretty grotty, but it should address that concern.

David
David Rowley <dgrowleyml@gmail.com> writes:
> It would be good to hear Onder's case to see if he has a good argument
> for having a vested interest in pg13 not failing this way with asserts
> enabled.

Yeah, some context for this report would be a good thing.

(BTW, am I wrong to suppose that the same case fails the same way in
our older branches?  Certainly that Assert has been there a long time.)

> I guess we could resolve that concern by just changing the failing
> assert to become: Assert(outer_skip_rows <= outer_rows ||
> isinf(outer_rows));

I can't really object to just weakening the Assert a tad.  My thoughts
would have run towards checking for the NaN, though.

			regards, tom lane
On Mon, 19 Oct 2020 at 13:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (BTW, am I wrong to suppose that the same case fails the same way in
> our older branches?  Certainly that Assert has been there a long
> time.)

I only tested as far back as 9.5, but it does fail there.

David
On Mon, 19 Oct 2020 at 13:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> I guess we could resolve that concern by just changing the failing
>> assert to become: Assert(outer_skip_rows <= outer_rows ||
>> isinf(outer_rows));
>
> I can't really object to just weakening the Assert a tad.
> My thoughts would have run towards checking for the NaN, though.

I ended up back-patching a change that does that.

Thanks for your input on this, and for the report, Onder.

David