Re: Yet another abort-early plan disaster on 9.3

From: Tomas Vondra
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: ,
Msg-id: 5472416C.3080506@fuzzy.cz
(view: Whole thread, Raw)
In response to: Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes)
List: pgsql-performance

Tree view

Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
 Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Greg Stark, )
   Re: Yet another abort-early plan disaster on 9.3  (Claudio Freire, )
   Re: Yet another abort-early plan disaster on 9.3  (Tom Lane, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
   Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure, )
    Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
     Re: Yet another abort-early plan disaster on 9.3  (Tom Lane, )
      Re: Yet another abort-early plan disaster on 9.3  (Gavin Flower, )
       Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes, )
        Re: Yet another abort-early plan disaster on 9.3  (Gavin Flower, )
        Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure, )
      Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
       Re: Yet another abort-early plan disaster on 9.3  ("Graeme B. Bell", )
        Re: Yet another abort-early plan disaster on 9.3  (Claudio Freire, )
        Re: Yet another abort-early plan disaster on 9.3  (Tom Lane, )
         Re: Yet another abort-early plan disaster on 9.3  ("Graeme B. Bell", )
       Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
        Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
         Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
          Re: Yet another abort-early plan disaster on 9.3  (Michael Paquier, )
           Re: Yet another abort-early plan disaster on 9.3  (Evgeniy Shishkin, )
     Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
      Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure, )
       Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
    Re: Yet another abort-early plan disaster on 9.3  (Ryan Johnson, )
   Re: Yet another abort-early plan disaster on 9.3  (Greg Stark, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
  Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes, )
   Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )
   Re: Yet another abort-early plan disaster on 9.3  (Peter Geoghegan, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Peter Geoghegan, )
   Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
  Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes, )
   Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
  Re: Yet another abort-early plan disaster on 9.3  (Greg Stark, )
   Re: Yet another abort-early plan disaster on 9.3  ("Tomas Vondra", )
    Re: Yet another abort-early plan disaster on 9.3  (Craig James, )
     Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
      Re: Yet another abort-early plan disaster on 9.3  (Craig James, )
       Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
    Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
     Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes, )
      Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
   Re: Yet another abort-early plan disaster on 9.3  (Greg Stark, )
    Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra, )
     Re: Yet another abort-early plan disaster on 9.3  (Greg Stark, )
 Re: Yet another abort-early plan disaster on 9.3  (Josh Berkus, )
  Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs, )

On 21.11.2014 19:38, Jeff Janes wrote:
>
> When I run this patch on the regression database, I get a case where
> the current method is exact but the adaptive one is off:
>
> WARNING:  ndistinct estimate current=676.00 adaptive=906.00
>
> select count(distinct stringu1) from onek;
> 676
>
> It should be seeing every single row, so I don't know why the
> adaptive method is off. Seems like a bug.

Thanks for noticing this. I wouldn't call it a bug, but there's clearly
room for improvement.

The estimator, as described in the original paper, does not expect the
sampling to be done "our" way (using fixed number of rows) but assumes
to get a fixed percentage of rows. Thus it does not expect the number of
sampled rows to get so close (or equal) to the total number of rows.

I think the only way to fix this is by checking if samplerows is close
to totalrows, and use a straightforward estimate in that case (instead
of a more sophisticated one). Something along these lines:

    if (samplerows >= 0.95 * totalrows)
        stats->stadistinct = (d + d/0.95) / 2;

which means "if we sampled >= 95% of the table, use the number of
observed distinct values directly".

I have modified the estimator to do the adaptive estimation, and then do
this correction too (and print the values). And with that in place I get
these results

  WARNING:  ndistinct estimate current=676.00 adaptive=996.00
  WARNING:  corrected ndistinct estimate current=676.00 adaptive=693.79

So it gets fairly close to the original estimate (and exact value).

In the end, this check should be performed before calling the adaptive
estimator at all (and not calling it in case we sampled most of the rows).

I also discovered an actual bug in the optimize_estimate() function,
using 'f_max' instead of the number of sampled rows.

Attached is a patch fixing the bug, and implementing the sample size check.

regards
Tomas

Attachment

pgsql-performance by date:

From: Tomas Vondra
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3
From: Johann Spies
Date:
Subject: Re: pgtune + configurations with 9.3