Thread: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Piotr Stefaniak
Date:
Hi, using sqlsmith and UBSan I have found these two division by zero errors: src/backend/optimizer/plan/planner.c:4846/* Convert absolute # of tuples to a fraction; no need to clamp */if (tuple_fraction>= 1.0){ tuple_fraction /= best_path->rows;} and src/backend/optimizer/path/costsize.c:3029 if (subplan->subLinkType == EXISTS_SUBLINK) { /* we only need tofetch 1 tuple */ sp_cost.per_tuple += plan_run_cost / plan->plan_rows; } The first is triggered by this query (reduced by me from the original query string generated by sqlsmith): select 1 from ( select ref_0.location as c0 from public.city as ref_0 ) as subq_0 where EXISTS ( select 1 from ( select sample_0.collname as c0 from pg_catalog.pg_collation as sample_0 ) as subq_1 right join public.tt5 as ref_2 inner join pg_catalog.pg_constraint as ref_4 on (ref_2.z = ref_4.coninhcount) on (subq_1.c0 = ref_4.conname ), lateral ( select 1 from public.shoelace_candelete as ref_5 where false ) as subq_2 ); #0 get_cheapest_fractional_path (rel=0x7ffff7ec32a8, tuple_fraction=1) at src/backend/optimizer/plan/planner.c:4846 #1 0x00000000007422a1 in make_subplan (root=0xf49778, orig_subquery=0x7ffff7f593c8, subLinkType=EXISTS_SUBLINK, subLinkId=0, testexpr=0x0, isTopQual=1 '\001') at src/backend/optimizer/plan/subselect.c:546 #2 0x000000000074470d in process_sublinks_mutator (node=0x7ffff7f610b0, context=0x7fffffffd900) at src/backend/optimizer/plan/subselect.c:1974 #3 0x0000000000744670 in SS_process_sublinks (root=0xf49778, expr=0x7ffff7f610b0, isQual=1 '\001') at src/backend/optimizer/plan/subselect.c:1947 #4 0x0000000000736621 in preprocess_expression (root=0xf49778, expr=0x7ffff7f610b0, kind=0) at src/backend/optimizer/plan/planner.c:848 #5 0x0000000000736700 in preprocess_qual_conditions (root=0xf49778, jtnode=0xf5f790) at src/backend/optimizer/plan/planner.c:893 #6 0x0000000000735ff3 in subquery_planner (glob=0xf3ef70, parse=0xf3e9a0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at src/backend/optimizer/plan/planner.c:600 #7 0x000000000073566b in standard_planner (parse=0xf3e9a0, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:307 #8 0x00000000007353ad in planner (parse=0xf3e9a0, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:177 #9 0x0000000000800d3b in pg_plan_query (querytree=0xf3e9a0, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:798 #10 0x0000000000800dee in pg_plan_queries (querytrees=0xf53648, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857 #11 0x0000000000801093 in exec_simple_query (query_string=0xf07dd8 "select 1\nfrom (\n select ref_0.location as c0\n from public.city as ref_0\n) as subq_0\nwhere EXISTS (\n select 1\n from (\n select sample_0.collname as c0\n from pg_catalog.pg_collation as sample_0\n ) as subq_1\n right join public.tt5 as ref_2\n inner join pg_catalog.pg_constraint as ref_4\n on (ref_2.z = ref_4.coninhcount )\n on (subq_1.c0 = ref_4.conname ),\n lateral (\n select 1\n from public.shoelace_candelete as ref_5\n where false\n ) as subq_2\n);") at src/backend/tcop/postgres.c:1022 #12 0x0000000000805355 in PostgresMain (argc=1, argv=0xe95ee0, dbname=0xe95d40 "regression", username=0xe95d20 "me") at src/backend/tcop/postgres.c:4059 #13 0x000000000077ed44 in BackendRun (port=0xeb2f80) at src/backend/postmaster/postmaster.c:4258 #14 0x000000000077e4a8 in BackendStartup (port=0xeb2f80) at src/backend/postmaster/postmaster.c:3932 #15 0x000000000077ac2c in ServerLoop () at src/backend/postmaster/postmaster.c:1690 #16 0x000000000077a261 in PostmasterMain (argc=5, argv=0xe94e10) at src/backend/postmaster/postmaster.c:1298 #17 0x00000000006c623c in main (argc=5, argv=0xe94e10) at src/backend/main/main.c:228 The second one is triggered by this (again, reduced from the original): select 1 from public.tt5 as subq_0 where EXISTS ( select 1 from public.b_star as ref_0 where false ); #0 cost_subplan (root=0xf3e718, subplan=0xf42780, plan=0xf3fcd8) at src/backend/optimizer/path/costsize.c:3029 #1 0x0000000000742eb9 in build_subplan (root=0xf3e718, plan=0xf3fcd8, subroot=0xf3f6a8, plan_params=0x0, subLinkType=EXISTS_SUBLINK, subLinkId=0, testexpr=0x0, adjust_testexpr=1 '\001', unknownEqFalse=1 '\001') at src/backend/optimizer/plan/subselect.c:887 #2 0x00000000007422c0 in make_subplan (root=0xf3e718, orig_subquery=0xf09628, subLinkType=EXISTS_SUBLINK, subLinkId=0, testexpr=0x0, isTopQual=1 '\001') at src/backend/optimizer/plan/subselect.c:551 #3 0x00000000007446d7 in process_sublinks_mutator (node=0xf3f100, context=0x7fffffffd900) at src/backend/optimizer/plan/subselect.c:1974 #4 0x000000000074463a in SS_process_sublinks (root=0xf3e718, expr=0xf3f100, isQual=1 '\001') at src/backend/optimizer/plan/subselect.c:1947 #5 0x0000000000736621 in preprocess_expression (root=0xf3e718, expr=0xf3f100, kind=0) at src/backend/optimizer/plan/planner.c:848 #6 0x0000000000736700 in preprocess_qual_conditions (root=0xf3e718, jtnode=0xf3e9b0) at src/backend/optimizer/plan/planner.c:893 #7 0x0000000000735ff3 in subquery_planner (glob=0xf09740, parse=0xf093a0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at src/backend/optimizer/plan/planner.c:600 #8 0x000000000073566b in standard_planner (parse=0xf093a0, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:307 #9 0x00000000007353ad in planner (parse=0xf093a0, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:177 #10 0x0000000000800d05 in pg_plan_query (querytree=0xf093a0, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:798 #11 0x0000000000800db8 in pg_plan_queries (querytrees=0xf3e6b8, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857 #12 0x000000000080105d in exec_simple_query (query_string=0xf07dd8 "select 1\nfrom public.tt5 as subq_0\nwhere EXISTS (\n select 1\n from public.b_star as ref_0\n where false\n);") at src/backend/tcop/postgres.c:1022 #13 0x000000000080531f in PostgresMain (argc=1, argv=0xe95ee0, dbname=0xe95d40 "regression", username=0xe95d20 "me") at src/backend/tcop/postgres.c:4059 #14 0x000000000077ed0e in BackendRun (port=0xeb2f80) at src/backend/postmaster/postmaster.c:4258 #15 0x000000000077e472 in BackendStartup (port=0xeb2f80) at src/backend/postmaster/postmaster.c:3932 #16 0x000000000077abf6 in ServerLoop () at src/backend/postmaster/postmaster.c:1690 #17 0x000000000077a22b in PostmasterMain (argc=5, argv=0xe94e10) at src/backend/postmaster/postmaster.c:1298 #18 0x00000000006c623c in main (argc=5, argv=0xe94e10) at src/backend/main/main.c:228 (the back-traces are slightly redacted for readability).
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Tom Lane
Date:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > using sqlsmith and UBSan I have found these two division by zero errors: Hmm, thanks. Seems there's a bit of a disagreement as to whether path->rows is allowed to be zero or not. It normally isn't; but we've created an exception that provably-empty relations have zero rowcount, so now these places had better deal with the case. regards, tom lane
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Piotr Stefaniak
Date:
I'm not saying this is necessarily a bug since the whole function deals with floats, but perhaps it's interesting to note that ndistinct can be 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:/* * Initial estimate of bucketsize fraction is 1/nbucketsas long as the * number of buckets is less than the expected number of distinct values; * otherwise it is 1/ndistinct.*/if (ndistinct > nbuckets) estfract = 1.0 / nbuckets;else estfract = 1.0 / ndistinct; for this query: select subq_0.c1 as c0 from ( select ref_0.a as c0, (select NULL::integer from information_schema.user_defined_types limit 1 offset 1) as c1 from public.rtest_nothn3 as ref_0 limit 130 ) as subq_0 left join ( select sample_0.x as c0 from public.insert_tbl as sample_0 where false ) as subq_1 on subq_0.c1 = subq_1.c0;
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Piotr Stefaniak
Date:
On 2016-03-26 19:29, Piotr Stefaniak wrote: > I'm not saying this is necessarily a bug since the whole function deals > with floats, but perhaps it's interesting to note that ndistinct can be > 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize: On the exact same note, something like this (again reduced from what sqlsmith produced): select 1 from unnest('{}'::boolean[]) a (x) left join ( select * from unnest('{}'::boolean[]) where false ) b (x) on a.x = b.x; leads to vardata.rel->tuples being zero here: if (vardata.rel) ndistinct *= vardata.rel->rows / vardata.rel->tuples; Back-trace attached.
Attachment
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Tom Lane
Date:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > I'm not saying this is necessarily a bug since the whole function deals > with floats, but perhaps it's interesting to note that ndistinct can be > 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize: I think it's basically cosmetic unless you've got a machine that traps zero divide, but still that's good to fix. Thanks for the report! regards, tom lane
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Tom Lane
Date:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes: > On the exact same note, something like this (again reduced from what > sqlsmith produced): > leads to vardata.rel->tuples being zero here: > if (vardata.rel) > ndistinct *= vardata.rel->rows / vardata.rel->tuples; Ugh. That's a bit worse because it'll be 0/0, ie you get a NaN. Thanks for the report. regards, tom lane
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Aleksander Alekseev
Date:
Hello, Piotr. Thanks for report. But I'm having some difficulties reproducing issues you described. I compiled PostgreSQL from master branch on FreeBSD 10.2 using this command: ``` CC=/usr/local/bin/gcc49 CFLAGS="-O0 -g" \ ./configure --enable-cassert --enable-debug \ --prefix=/home/eax/postgresql-install\ && gmake clean && gmake -j2 -s ``` Then I run reinit.sh: ``` #!/usr/bin/env bash P=~/postgresql-install pkill -9 postgres make install rm -rf $P/data $P/bin/initdb -D $P/data echo "max_prepared_transactions = 100" >> $P/data/postgresql.conf echo "wal_level = hot_standby" >> $P/data/postgresql.conf echo "wal_keep_segments = 128" >> $P/data/postgresql.conf echo "max_connections = 10" >> $P/data/postgresql.conf echo "listen_addresses = '*'" >> $P/data/postgresql.conf echo '' > $P/data/logfile echo "host all all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf echo "host replication all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf echo "local replication all trust" >> $P/data/pg_hba.conf $P/bin/pg_ctl -w -D $P/data -l $P/data/logfile start $P/bin/createdb `whoami` $P/bin/psql -c "create table test(k int primary key, v text);" ``` ..., connected to PostgreSQL using psql, in second terminal I attached to the backend process using gdb710 and input `c`. Now in psql: ``` eax=# create table tt5(x int); CREATE TABLE eax=# create table b_star(x int); CREATE TABLE eax=# insert into b_star values (1), (2), (3); INSERT 0 3 eax=# insert into tt5 values (2), (3), (4), (5); INSERT 0 4 eax=# select 1 eax-# from public.tt5 as subq_0 eax-# where EXISTS ( eax(# select 1 eax(# from public.b_star as ref_0 eax(# where false eax(# );?column? ---------- (0 rows) eax=# select 1 eax-# from unnest('{}'::boolean[]) a (x) eax-# left join ( eax(# select * eax(# from unnest('{}'::boolean[]) eax(# where false eax(# ) b (x) on a.x = b.x;?column? ---------- (0 rows) ``` Everything seems to work, no stacktraces in gdb. Could you please provide more concrete steps to reproduce these issues i.e, OS and compiler version, compilation flags (or package version), cluster config, database schema, etc? These steps are required at least to make sure that fixed code really fixes a problem. Also it would be a good idea to include these steps to regression tests. -- Best regards, Aleksander Alekseev http://eax.me/
Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
From
Piotr Stefaniak
Date:
On 2016-03-28 11:33, Aleksander Alekseev wrote: > Hello, Piotr. > > Thanks for report. But I'm having some difficulties reproducing issues > you described. Oh, if you want backtraces then either set a conditional breakpoint or add your own Assert like I did. > Also it would be a good idea to include these steps to regression tests. I agree and I generally think that the more test cases touch previously not covered code paths the better, even if it had to be run as a different make(1) target. Although it seems that at least some people would agree (see [1]), the "make check" split somehow isn't happening. [1] CA+TgmoYMOFE94+3WG3spg9sqAjkv4siXjey+RDrMAMyE-VqsEw@mail.gmail.com