Re: benchmarking the query planner - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: benchmarking the query planner
Msg-id: 9565.1229018968@sss.pgh.pa.us
In response to: Re: benchmarking the query planner  ("Robert Haas" <robertmhaas@gmail.com>)
Responses: Re: benchmarking the query planner  ("Vladimir Sitnikov" <sitnikov.vladimir@gmail.com>)
           Re: benchmarking the query planner  ("Robert Haas" <robertmhaas@gmail.com>)
           Re: benchmarking the query planner  (Simon Riggs <simon@2ndQuadrant.com>)
           Re: benchmarking the query planner  (Gregory Stark <stark@enterprisedb.com>)
           Re: benchmarking the query planner  (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-hackers

"Robert Haas" <robertmhaas@gmail.com> writes:
> Ah, that makes sense.  Here's a test case based on Greg's.  This is
> definitely more than linear once you get above about n = 80, but it's
> not quadratic either.  n = 1000 is only 43x n = 80, and while that's
> surely more than 1000/80 = 12.5, it's also a lot less than (1000/80)^2
> = 156.25.

Yeah, that's not too surprising.  There's a linear cost associated with
fetching/deconstructing the stats array, and a quadratic cost in the
comparison loop.  What these results say is that the upper limit of 1000
keeps you from getting into the range where the linear component is
completely negligible.  If you plot the results and try to fit a curve
like c1*x^2 + c2*x to them, you get a very good fit for all the points
above about 80.  Below that, the curve has a flatter slope, indicating
a smaller linear cost component.  The values in this test are about 100
bytes each, so the breakpoint at 80 seems to correspond to what happens
when the stats array no longer fits in one page.
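
For readers who don't have the upthread scripts handy, the shape of the test
is roughly this: a column whose values are heavily duplicated, so that ANALYZE
stores a full MCV list of the ~100-byte values, and then the planner is timed
over a join on that column, since eqjoinsel presumably compares the two MCV
lists pairwise.  What follows is only a sketch with made-up names, not the
script actually used:

-- hypothetical sketch of the test shape, not Robert's or Greg's script
-- ~1000 distinct ~100-byte values with varying frequencies, so ANALYZE
-- fills the MCV list up to the statistics target
CREATE TABLE planner_bench AS
  SELECT repeat(md5((random() * 1000)::int::text), 3) AS v
  FROM generate_series(1, 100000);
ALTER TABLE planner_bench ALTER COLUMN v SET STATISTICS 1000;  -- target under test
ANALYZE planner_bench;
\timing
-- planning (not executing) this join is what gets timed; with full MCV
-- lists on both sides, eqjoinsel does up to target^2 equality tests.
-- In practice you'd repeat the EXPLAIN many times and sum the timings.
EXPLAIN SELECT * FROM planner_bench a JOIN planner_bench b USING (v);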

I replicated your test and got timings like these, using CVS HEAD with
cassert off:

target    time
10    1.587
20    1.997
30    2.208
40    2.499
50    2.734
60    3.048
70    3.368
80    3.706
90    4.686
100    6.418
150    10.016
200    13.598
250    17.905
300    22.777
400    33.471
500    46.394
600    61.185
700    77.664
800    96.304
900    116.615
1000    140.117
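
To make the "quadratic plus linear" reading concrete, the c1*x^2 + c2*x fit
can be eyeballed directly in SQL: dividing through by x turns it into a
straight line, time/target = c1*target + c2, so an ordinary linear regression
on the transformed points yields the two coefficients (this weights the points
a little differently than fitting the raw times, but it's fine as a sanity
check).  A sketch, using the points above the ~80 breakpoint from the table
above:

SELECT regr_slope(t / n, n)     AS c1,   -- quadratic coefficient
       regr_intercept(t / n, n) AS c2    -- linear coefficient
FROM (VALUES
        (90, 4.686), (100, 6.418), (150, 10.016), (200, 13.598),
        (250, 17.905), (300, 22.777), (400, 33.471), (500, 46.394),
        (600, 61.185), (700, 77.664), (800, 96.304), (900, 116.615),
        (1000, 140.117)
     ) AS samples(n, t);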

So this machine is a bit slower than yours, but otherwise it's pretty
consistent with your numbers.  I then modified the test to use

array[random()::text,random()::text,random()::text,random()::text,random()::text,random()::text]

ie, the same data except stuffed into an array.  I did this because
I know that array_eq is pretty durn expensive, and indeed:

target    time
10    1.662
20    2.478
30    3.119
40    3.885
50    4.636
60    5.437
70    6.403
80    7.427
90    8.473
100    9.597
150    16.542
200    24.919
250    35.225
300    47.423
400    76.509
500    114.076
600    157.535
700    211.189
800    269.731
900    335.427
1000    409.638

Looking at these numbers, one might think the threshold of pain
is about 50, rather than the 100 where I'd put it for the text
example.  However, this is probably an extreme worst case.

On the whole I think we have some evidence here to say that upping the
default value of default_stats_target to 100 wouldn't be out of line,
but 1000 definitely is.  Comments?

BTW, does anyone have an opinion about changing the upper limit for
default_stats_target to, say, 10000?  These tests suggest that you
wouldn't want such a value for a column used as a join key, but
I can see a possible argument for high values in text search and
similar applications.
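
For concreteness, the two knobs in question look like this (the GUC's full
spelling is default_statistics_target; the table and column names here are
hypothetical, and the 10000 is only legal if the upper limit is in fact
raised):

SET default_statistics_target = 100;    -- the proposed new default
-- per-column override for something like a text-search column that
-- wants much finer statistics than a typical join key:
ALTER TABLE documents ALTER COLUMN body_tsv SET STATISTICS 10000;
ANALYZE documents;
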
        regards, tom lane

