Thread: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

Hi,

using sqlsmith and UBSan I have found these two division by zero errors:

src/backend/optimizer/plan/planner.c:4846/* Convert absolute # of tuples to a fraction; no need to clamp */if
(tuple_fraction>= 1.0){    tuple_fraction /= best_path->rows;}
 

and

src/backend/optimizer/path/costsize.c:3029    if (subplan->subLinkType == EXISTS_SUBLINK)    {        /* we only need
tofetch 1 tuple */        sp_cost.per_tuple += plan_run_cost / plan->plan_rows;    }
 

The first is triggered by this query (reduced by me from the original 
query string generated by sqlsmith):

select 1
from (  select ref_0.location as c0  from public.city as ref_0
) as subq_0
where EXISTS (  select 1  from (    select sample_0.collname as c0    from pg_catalog.pg_collation as sample_0  ) as
subq_1 right join public.tt5 as ref_2    inner join pg_catalog.pg_constraint as ref_4    on (ref_2.z =
ref_4.coninhcount)  on (subq_1.c0 = ref_4.conname ),  lateral (    select 1    from public.shoelace_candelete as ref_5
 where false  ) as subq_2
 
);

#0  get_cheapest_fractional_path (rel=0x7ffff7ec32a8, tuple_fraction=1) 
at src/backend/optimizer/plan/planner.c:4846
#1  0x00000000007422a1 in make_subplan (root=0xf49778, 
orig_subquery=0x7ffff7f593c8, subLinkType=EXISTS_SUBLINK, subLinkId=0, 
testexpr=0x0, isTopQual=1 '\001') at 
src/backend/optimizer/plan/subselect.c:546
#2  0x000000000074470d in process_sublinks_mutator (node=0x7ffff7f610b0, 
context=0x7fffffffd900) at src/backend/optimizer/plan/subselect.c:1974
#3  0x0000000000744670 in SS_process_sublinks (root=0xf49778, 
expr=0x7ffff7f610b0, isQual=1 '\001') at 
src/backend/optimizer/plan/subselect.c:1947
#4  0x0000000000736621 in preprocess_expression (root=0xf49778, 
expr=0x7ffff7f610b0, kind=0) at src/backend/optimizer/plan/planner.c:848
#5  0x0000000000736700 in preprocess_qual_conditions (root=0xf49778, 
jtnode=0xf5f790) at src/backend/optimizer/plan/planner.c:893
#6  0x0000000000735ff3 in subquery_planner (glob=0xf3ef70, 
parse=0xf3e9a0, parent_root=0x0, hasRecursion=0 '\000', 
tuple_fraction=0) at src/backend/optimizer/plan/planner.c:600
#7  0x000000000073566b in standard_planner (parse=0xf3e9a0, 
cursorOptions=256, boundParams=0x0) at 
src/backend/optimizer/plan/planner.c:307
#8  0x00000000007353ad in planner (parse=0xf3e9a0, cursorOptions=256, 
boundParams=0x0) at src/backend/optimizer/plan/planner.c:177
#9  0x0000000000800d3b in pg_plan_query (querytree=0xf3e9a0, 
cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:798
#10 0x0000000000800dee in pg_plan_queries (querytrees=0xf53648, 
cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857
#11 0x0000000000801093 in exec_simple_query (query_string=0xf07dd8 
"select 1\nfrom (\n  select ref_0.location as c0\n  from public.city as 
ref_0\n) as subq_0\nwhere EXISTS (\n  select 1\n  from (\n    select 
sample_0.collname as c0\n    from pg_catalog.pg_collation as sample_0\n  ) as subq_1\n  right join public.tt5 as
ref_2\n   inner join 
 
pg_catalog.pg_constraint as ref_4\n    on (ref_2.z = ref_4.coninhcount 
)\n  on (subq_1.c0 = ref_4.conname ),\n  lateral (\n    select 1\n 
from public.shoelace_candelete as ref_5\n    where false\n  ) as 
subq_2\n);") at src/backend/tcop/postgres.c:1022
#12 0x0000000000805355 in PostgresMain (argc=1, argv=0xe95ee0, 
dbname=0xe95d40 "regression", username=0xe95d20 "me") at 
src/backend/tcop/postgres.c:4059
#13 0x000000000077ed44 in BackendRun (port=0xeb2f80) at 
src/backend/postmaster/postmaster.c:4258
#14 0x000000000077e4a8 in BackendStartup (port=0xeb2f80) at 
src/backend/postmaster/postmaster.c:3932
#15 0x000000000077ac2c in ServerLoop () at 
src/backend/postmaster/postmaster.c:1690
#16 0x000000000077a261 in PostmasterMain (argc=5, argv=0xe94e10) at 
src/backend/postmaster/postmaster.c:1298
#17 0x00000000006c623c in main (argc=5, argv=0xe94e10) at 
src/backend/main/main.c:228

The second one is triggered by this (again, reduced from the original):

select 1
from public.tt5 as subq_0
where EXISTS (  select 1  from public.b_star as ref_0  where false
);

#0  cost_subplan (root=0xf3e718, subplan=0xf42780, plan=0xf3fcd8) at 
src/backend/optimizer/path/costsize.c:3029
#1  0x0000000000742eb9 in build_subplan (root=0xf3e718, plan=0xf3fcd8, 
subroot=0xf3f6a8, plan_params=0x0, subLinkType=EXISTS_SUBLINK, 
subLinkId=0, testexpr=0x0, adjust_testexpr=1 '\001', unknownEqFalse=1 
'\001') at src/backend/optimizer/plan/subselect.c:887
#2  0x00000000007422c0 in make_subplan (root=0xf3e718, 
orig_subquery=0xf09628, subLinkType=EXISTS_SUBLINK, subLinkId=0, 
testexpr=0x0, isTopQual=1 '\001') at 
src/backend/optimizer/plan/subselect.c:551
#3  0x00000000007446d7 in process_sublinks_mutator (node=0xf3f100, 
context=0x7fffffffd900) at src/backend/optimizer/plan/subselect.c:1974
#4  0x000000000074463a in SS_process_sublinks (root=0xf3e718, 
expr=0xf3f100, isQual=1 '\001') at 
src/backend/optimizer/plan/subselect.c:1947
#5  0x0000000000736621 in preprocess_expression (root=0xf3e718, 
expr=0xf3f100, kind=0) at src/backend/optimizer/plan/planner.c:848
#6  0x0000000000736700 in preprocess_qual_conditions (root=0xf3e718, 
jtnode=0xf3e9b0) at src/backend/optimizer/plan/planner.c:893
#7  0x0000000000735ff3 in subquery_planner (glob=0xf09740, 
parse=0xf093a0, parent_root=0x0, hasRecursion=0 '\000', 
tuple_fraction=0) at src/backend/optimizer/plan/planner.c:600
#8  0x000000000073566b in standard_planner (parse=0xf093a0, 
cursorOptions=256, boundParams=0x0) at 
src/backend/optimizer/plan/planner.c:307
#9  0x00000000007353ad in planner (parse=0xf093a0, cursorOptions=256, 
boundParams=0x0) at src/backend/optimizer/plan/planner.c:177
#10 0x0000000000800d05 in pg_plan_query (querytree=0xf093a0, 
cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:798
#11 0x0000000000800db8 in pg_plan_queries (querytrees=0xf3e6b8, 
cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857
#12 0x000000000080105d in exec_simple_query (query_string=0xf07dd8 
"select 1\nfrom public.tt5 as subq_0\nwhere EXISTS (\n  select 1\n  from 
public.b_star as ref_0\n  where false\n);") at 
src/backend/tcop/postgres.c:1022
#13 0x000000000080531f in PostgresMain (argc=1, argv=0xe95ee0, 
dbname=0xe95d40 "regression", username=0xe95d20 "me") at 
src/backend/tcop/postgres.c:4059
#14 0x000000000077ed0e in BackendRun (port=0xeb2f80) at 
src/backend/postmaster/postmaster.c:4258
#15 0x000000000077e472 in BackendStartup (port=0xeb2f80) at 
src/backend/postmaster/postmaster.c:3932
#16 0x000000000077abf6 in ServerLoop () at 
src/backend/postmaster/postmaster.c:1690
#17 0x000000000077a22b in PostmasterMain (argc=5, argv=0xe94e10) at 
src/backend/postmaster/postmaster.c:1298
#18 0x00000000006c623c in main (argc=5, argv=0xe94e10) at 
src/backend/main/main.c:228


(the back-traces are slightly redacted for readability).



Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> using sqlsmith and UBSan I have found these two division by zero errors:

Hmm, thanks.  Seems there's a bit of a disagreement as to whether
path->rows is allowed to be zero or not.  It normally isn't; but
we've created an exception that provably-empty relations have zero
rowcount, so now these places had better deal with the case.
        regards, tom lane



I'm not saying this is necessarily a bug since the whole function deals 
with floats, but perhaps it's interesting to note that ndistinct can be 
0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:/* * Initial estimate of bucketsize fraction is
1/nbucketsas long as the * number of buckets is less than the expected number of distinct values; * otherwise it is
1/ndistinct.*/if (ndistinct > nbuckets)    estfract = 1.0 / nbuckets;else    estfract = 1.0 / ndistinct;
 

for this query:
select subq_0.c1 as c0
from (  select ref_0.a as c0, (select NULL::integer from 
information_schema.user_defined_types limit 1 offset 1) as c1  from public.rtest_nothn3 as ref_0  limit 130
) as subq_0
left join (  select sample_0.x as c0  from public.insert_tbl as sample_0  where false
) as subq_1 on subq_0.c1 = subq_1.c0;



On 2016-03-26 19:29, Piotr Stefaniak wrote:
> I'm not saying this is necessarily a bug since the whole function deals
> with floats, but perhaps it's interesting to note that ndistinct can be
> 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:

On the exact same note, something like this (again reduced from what
sqlsmith produced):

select 1
from unnest('{}'::boolean[]) a (x)
left join (
   select *
   from unnest('{}'::boolean[])
   where false
) b (x) on a.x = b.x;

leads to vardata.rel->tuples being zero here:
    if (vardata.rel)
        ndistinct *= vardata.rel->rows / vardata.rel->tuples;

Back-trace attached.


Attachment
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> I'm not saying this is necessarily a bug since the whole function deals 
> with floats, but perhaps it's interesting to note that ndistinct can be 
> 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:

I think it's basically cosmetic unless you've got a machine that traps
zero divide, but still that's good to fix.  Thanks for the report!
        regards, tom lane



Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
> On the exact same note, something like this (again reduced from what 
> sqlsmith produced):
> leads to vardata.rel->tuples being zero here:
>     if (vardata.rel)
>         ndistinct *= vardata.rel->rows / vardata.rel->tuples;

Ugh.  That's a bit worse because it'll be 0/0, ie you get a NaN.
Thanks for the report.
        regards, tom lane



Re: Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

From
Aleksander Alekseev
Date:
Hello, Piotr.

Thanks for report. But I'm having some difficulties reproducing issues
you described.

I compiled PostgreSQL from master branch on FreeBSD 10.2 using this
command:

```
CC=/usr/local/bin/gcc49 CFLAGS="-O0 -g" \ ./configure --enable-cassert --enable-debug \
--prefix=/home/eax/postgresql-install\ && gmake clean && gmake -j2 -s
 
```

Then I run reinit.sh:

```
#!/usr/bin/env bash

P=~/postgresql-install

pkill -9 postgres
make install

rm -rf $P/data
$P/bin/initdb -D $P/data

echo "max_prepared_transactions = 100" >> $P/data/postgresql.conf
echo "wal_level = hot_standby" >> $P/data/postgresql.conf
echo "wal_keep_segments = 128" >> $P/data/postgresql.conf
echo "max_connections = 10" >> $P/data/postgresql.conf
echo "listen_addresses = '*'" >> $P/data/postgresql.conf

echo '' > $P/data/logfile

echo "host all all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf
echo "host replication all 0.0.0.0/0 trust" >> $P/data/pg_hba.conf
echo "local replication all trust" >> $P/data/pg_hba.conf

$P/bin/pg_ctl -w -D $P/data -l $P/data/logfile start
$P/bin/createdb `whoami`
$P/bin/psql -c "create table test(k int primary key, v text);"
```

..., connected to PostgreSQL using psql, in second terminal I attached
to the backend process using gdb710 and input `c`. Now in psql:

```
eax=# create table tt5(x int);
CREATE TABLE
eax=# create table b_star(x int);
CREATE TABLE
eax=# insert into b_star values (1), (2), (3);
INSERT 0 3
eax=# insert into tt5 values (2), (3), (4), (5);
INSERT 0 4
eax=# select 1
eax-# from public.tt5 as subq_0
eax-# where EXISTS (
eax(#    select 1
eax(#    from public.b_star as ref_0
eax(#    where false
eax(# );?column? 
----------
(0 rows)

eax=# select 1
eax-# from unnest('{}'::boolean[]) a (x)
eax-# left join (
eax(#    select *
eax(#    from unnest('{}'::boolean[])
eax(#    where false
eax(# ) b (x) on a.x = b.x;?column? 
----------
(0 rows)

```

Everything seems to work, no stacktraces in gdb.

Could you please provide more concrete steps to reproduce these
issues i.e, OS and compiler version, compilation flags (or package
version), cluster config, database schema, etc? These steps are required
at least to make sure that fixed code really fixes a problem. Also it
would be a good idea to include these steps to regression tests.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/



On 2016-03-28 11:33, Aleksander Alekseev wrote:
> Hello, Piotr.
>
> Thanks for report. But I'm having some difficulties reproducing issues
> you described.

Oh, if you want backtraces then either set a conditional breakpoint or 
add your own Assert like I did.

> Also it would be a good idea to include these steps to regression tests.

I agree and I generally think that the more test cases touch previously 
not covered code paths the better, even if it had to be run as a 
different make(1) target. Although it seems that at least some people 
would agree (see [1]), the "make check" split somehow isn't happening.

[1] CA+TgmoYMOFE94+3WG3spg9sqAjkv4siXjey+RDrMAMyE-VqsEw@mail.gmail.com