
Re: queries with DISTINCT / GROUP BY giving different plans - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: queries with DISTINCT / GROUP BY giving different plans
Date	August 20, 2013 23:32:17
Msg-id	20391.1377041529@sss.pgh.pa.us
In response to	Re: queries with DISTINCT / GROUP BY giving different plans (Tomas Vondra <tv@fuzzy.cz>)
List	pgsql-performance

Tomas Vondra <tv@fuzzy.cz> writes:
> Not quite sure how to parse this (not a native speaker here, sorry).
> Does that mean we want to keep it as it is now (because fixing it would
> cause even worse errors with low estimates)? Or do we want to fix
> hashed_distinct so that it behaves like hashed_grouping?

We need to fix hashed_distinct like this:

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bcc0d45..99284cb 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** choose_hashed_distinct(PlannerInfo *root
*** 2848,2854 ****
--- 2848,2858 ----
       * Don't do it if it doesn't look like the hashtable will fit into
       * work_mem.
       */
+
+     /* Estimate per-hash-entry space at tuple width... */
      hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData));
+     /* plus the per-hash-entry overhead */
+     hashentrysize += hash_agg_entry_size(0);

      if (hashentrysize * dNumDistinctRows > work_mem * 1024L)
          return false;
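
To illustrate the effect of the patch, here is a minimal standalone sketch of the work_mem check before and after the change. It is not the planner code: the SKETCH_* macros and the function names are hypothetical, the 8-byte alignment is an assumption about a typical 64-bit build, and the two size constants are placeholders standing in for sizeof(MinimalTupleData) and hash_agg_entry_size(0), whose real values depend on the build.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, as on a typical 64-bit build. */
#define SKETCH_MAXALIGN(len) ((((size_t) (len)) + 7) & ~((size_t) 7))

/* Placeholder for sizeof(MinimalTupleData); the real value is build-dependent. */
#define SKETCH_MINIMAL_TUPLE_HEADER 16

/* Placeholder for hash_agg_entry_size(0); the real value is build-dependent. */
#define SKETCH_HASH_ENTRY_OVERHEAD 56

/* Per-entry estimate before the patch: aligned tuple width plus tuple header. */
size_t
entry_size_old(size_t path_width)
{
    return SKETCH_MAXALIGN(path_width) +
        SKETCH_MAXALIGN(SKETCH_MINIMAL_TUPLE_HEADER);
}

/* Per-entry estimate after the patch: the same, plus per-hash-entry overhead. */
size_t
entry_size_new(size_t path_width)
{
    return entry_size_old(path_width) + SKETCH_HASH_ENTRY_OVERHEAD;
}

/*
 * Mirrors the test in the patch: hashing is rejected when
 * entry_size * num_distinct_rows > work_mem * 1024 (work_mem is in kB).
 */
bool
fits_in_work_mem(size_t entry_size, double num_distinct_rows, long work_mem_kb)
{
    return (double) entry_size * num_distinct_rows <=
        (double) work_mem_kb * 1024.0;
}
```

With these placeholder constants, a 4-byte-wide row is estimated at 24 bytes per entry before the change and 80 bytes after it. For 30000 distinct rows and work_mem of 1024 kB, the old estimate (720000 bytes) passes the check while the new one (2400000 bytes) does not, so a narrow-row query of that kind could stop choosing the hashed plan.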

I've started a thread over in -hackers about whether it's prudent to
back-patch this change or not.

            regards, tom lane
