Thread: Adding a ROLLUP switches to GroupAggregate unexpectedly

Adding a ROLLUP switches to GroupAggregate unexpectedly

From: Chris Cogdon
Hi folks! I’ve a query where adding a ROLLUP to the GROUP BY unexpectedly switches the plan to GroupAggregate, whereas the plain GROUP BY uses HashAggregate. Since the rollup should only add one additional bucket, the switch to having to sort (and thus spill to a temporary file on disk) is very puzzling. This reads like a query optimiser bug to me. This is the first time I’ve posted to the list, so please forgive me if I’ve omitted any “before bugging the list” homework.


Description: Adding a summary row by changing “GROUP BY x” into “GROUP BY ROLLUP (x)” should not cause a switch from HashAggregate to GroupAggregate


Here’s the “explain” from the simple GROUP BY:

projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM api_activities GROUP BY error_code;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=3456930.11..3456930.16 rows=5 width=2) (actual time=26016.222..26016.223 rows=5 loops=1)
   Output: error_code, count(*)
   Group Key: api_activities.error_code
   ->  Seq Scan on public.api_activities  (cost=0.00..3317425.74 rows=27900874 width=2) (actual time=0.018..16232.608 rows=36224844 loops=1)
         Output: id, client_id, date_added, kind, activity, error_code
 Planning time: 0.098 ms
 Execution time: 26016.337 ms
(7 rows)

Changing this to GROUP BY ROLLUP switches to GroupAggregate (with the corresponding to-disk temporary file being created by the sort):

projectdb=> explain analyze verbose SELECT error_code, count ( * ) FROM api_activities GROUP BY rollup (error_code);
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=7149357.90..7358614.52 rows=6 width=2) (actual time=54271.725..82354.144 rows=6 loops=1)
   Output: error_code, count(*)
   Group Key: api_activities.error_code
   Group Key: ()
   ->  Sort  (cost=7149357.90..7219110.09 rows=27900874 width=2) (actual time=54270.636..76651.121 rows=36222428 loops=1)
         Output: error_code
         Sort Key: api_activities.error_code
         Sort Method: external merge  Disk: 424864kB
         ->  Seq Scan on public.api_activities  (cost=0.00..3317425.74 rows=27900874 width=2) (actual time=0.053..34282.239 rows=36222428 loops=1)
               Output: error_code
 Planning time: 2.611 ms
 Execution time: 82437.416 ms
(12 rows)


I’ve given the output of “EXPLAIN ANALYZE VERBOSE” rather than the non-ANALYZE version, but the plan was identical either way.

Running VACUUM FULL ANALYZE on this table makes no difference. Switching to Count(error_code) makes no difference. Using GROUP BY GROUPING SETS ((), error_code) makes no difference.

I understand that HashAggregate is chosen only when the planner estimates that all of the aggregate hash table entries will fit in work_mem. There are only 5 distinct error codes, and the statistics (from pg_stats) show that PostgreSQL knows this. Adding just one more bucket for the “()” grouping should not cause a fallback to GroupAggregate.
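
For reference, this is roughly the check I did against pg_stats (a sketch; the column list is trimmed to what matters here):

SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'api_activities'
  AND attname    = 'error_code';
-- n_distinct comes back as 5 for this column, matching the 5 groups
-- produced by the plain GROUP BY above.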


PostgreSQL version: 9.5.2 (just upgraded today, Thank you! <3 )

(Was exhibiting same problem under 9.5.0)


How installed: apt-get package from apt.postgresql.org


Settings differences:

 application_name: psql
 client_encoding: UTF8
 DateStyle: ISO, MDY
 default_text_search_config: pg_catalog.english
 dynamic_shared_memory_type: posix
 lc_messages: en_US.UTF-8
 lc_monetary: en_US.UTF-8
 lc_numeric: en_US.UTF-8
 lc_time: en_US.UTF-8
 listen_addresses: *
 log_line_prefix: %t [%p-%c-%l][%a][%i][%e][%s][%x-%v] %q%u@%d 
 log_timezone: UTC
 logging_collector: on
 max_connections: 100
 max_stack_depth: 2MB
 port: 5432
 shared_buffers: 1GB
 ssl: on
 ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem
 ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key
 TimeZone: UTC
 work_mem: 128MB


OS and Version: Ubuntu Trusty: Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Program used to connect: psql


Nothing unusual in the logs, apart from the query being logged as taking a long time to run.


I know there are several workarounds I can use for this simple case, such as aggregating in a CTE and then doing the rollup over that (see the sketch below), but I’m simply reporting what I think is a bug in the query optimizer.
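
Something like this is what I have in mind (a sketch, not tested against the real table): do the expensive aggregation with a plain GROUP BY, which is still allowed to use HashAggregate, and apply the ROLLUP only to the handful of rows coming out of the CTE, so the forced sort is over ~5 rows rather than ~36 million. On 9.5 the CTE is an optimisation fence, so the inner GROUP BY is planned on its own.

WITH per_code AS (
    -- cheap HashAggregate over the big table
    SELECT error_code, count(*) AS n
    FROM api_activities
    GROUP BY error_code
)
-- ROLLUP (and its sort) now only has to deal with a few rows
SELECT error_code, sum(n) AS count
FROM per_code
GROUP BY ROLLUP (error_code);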


Thank you for your attention! Please let me know if there’s any additional information you need, or additional tests you’d like to run.


— Chris Cogdon <chris@cogdon.org>
— Using PostgreSQL since 6.2! 




Re: Adding a ROLLUP switches to GroupAggregate unexpectedly

From: Tom Lane
Chris Cogdon <chris@cogdon.org> writes:
> Hi folks! I’ve a query where adding a rollup to the group by switches to
> GroupAggregate unexpectedly, where the standard GROUP BY uses
> HashAggregate.

The current implementation of rollup doesn't support using hashed
aggregation.  I don't know if that's for lack of round tuits or because
it's actually hard, but it's not the planner's fault.

            regards, tom lane


Re: Adding a ROLLUP switches to GroupAggregate unexpectedly

From: "ktm@rice.edu"
On Thu, Mar 31, 2016 at 02:56:48PM -0400, Tom Lane wrote:
> Chris Cogdon <chris@cogdon.org> writes:
> > Hi folks! I’ve a query where adding a rollup to the group by switches to
> > GroupAggregate unexpectedly, where the standard GROUP BY uses
> > HashAggregate.
>
> The current implementation of rollup doesn't support using hashed
> aggregation.  I don't know if that's for lack of round tuits or because
> it's actually hard, but it's not the planner's fault.
>
>             regards, tom lane
>

Hi,

Cribbed from the mailing list:

http://www.postgresql.org/message-id/E1YtRD5-0005Q7-SM@gemulon.postgresql.org

The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.

The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
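
To illustrate what “sharing a sort order” means here (a sketch with hypothetical table and column names, not taken from the commit message): for a query such as

SELECT a, b, count(*)
FROM some_table                 -- hypothetical table
GROUP BY GROUPING SETS ((a), (a, b), (b));

the sets (a) and (a, b) share a sort on a and are computed in one pass over a single sort, while (b) needs a different sort order and so gets an additional sort and aggregation step fed from the output of the first sort, rather than another scan of some_table.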

Regards,
Ken


Re: Adding a ROLLUP switches to GroupAggregate unexpectedly

From: Peter Geoghegan
On Thu, Mar 31, 2016 at 10:03 AM, Chris Cogdon <chris@cogdon.org> wrote:
> Description: Adding a summary row by changing “GROUP BY x” into “GROUP BY
> ROLLUP (x)” should not cause a switch from HashAggregate to GroupAggregate

While this restriction has not been lifted for PostgreSQL 9.6,
external sorting will be much faster in 9.6. During benchmarking,
there were 2x-3x speedups in overall query runtime for many common
cases. This new performance optimization should ameliorate your ROLLUP
problem on 9.6, simply because the sort operation will be so much
faster.

Similarly, we have yet to make HashAggregates spill when they exceed
work_mem, which is another restriction on their use that we should get
around to fixing. As you point out, this restriction continues to be a
major consideration during planning, sometimes resulting in a
GroupAggregate where a HashAggregate would have been faster (even with
spilling of the hash table). However, simply having significantly
faster external sorts once again makes that restriction less of a
problem.

I have noticed that the old replacement selection algorithm that the
external sort would have used here does quite badly on low cardinality
inputs, too. I bet that was a factor here.

--
Peter Geoghegan