8.3 planner handling of IS NULL in aggregations - Mailing list pgsql-general

From Sam Mason
Subject 8.3 planner handling of IS NULL in aggregations
Date
Msg-id 20080704143712.GY2572@frubble.xen.chris-lamb.co.uk
Whole thread Raw
Responses Re: 8.3 planner handling of IS NULL in aggregations
List pgsql-general
Hi,

I've just noticed that the planner in 8.3.3 doesn't seem to realize the
difference in the result of the following:

  SELECT col, COUNT(*)
  FROM tbl
  GROUP BY col;

and

  SELECT col IS NULL, COUNT(*)
  FROM tbl
  GROUP BY col IS NULL;

For a table with several million distinct values in "col" this
makes quite a difference.  I'd expect to be getting in memory hash
aggregations, but I'm getting a sort step in there instead.  Here's an
example:

  SELECT col1 IS NOT NULL, col2 IS NOT NULL, col3 IS NOT NULL,
    COUNT(*)
  FROM tbl
  GROUP BY 1,2,3
  ORDER BY 1,2,3;

gives the following plan:

  GroupAggregate  (cost=5018623.99..5387423.18 rows=4338999 width=12)
    ->  Sort  (cost=5018623.99..5081536.33 rows=25164936 width=12)
          Sort Key: ((col1 IS NOT NULL)), ((col2 IS NOT NULL)), ((col3 IS NOT NULL))
          ->  Seq Scan on tbl  (cost=0.00..376989.36 rows=25164936 width=12)

I can't see any way for it to produce more than 8 rows of output and so
I'd expect a hash aggregate to be best.  Removing the IS NOT NULLs from
the expression gives basically the same plan and expected number of rows
which then looks reasonable.


  Sam

pgsql-general by date:

Previous
From: Raymond O'Donnell
Date:
Subject: Re: simple installation problem in windows system
Next
From: Raymond O'Donnell
Date:
Subject: Re: simple installation problem in windows system