On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> In short, this fixes all the cases except for the ASC sorted data. I
> haven't done any code review, but I think we want this.
>
> I'll use data from the i5-2500k, but it applies to the Xeon too, except
> that the Xeon results are more noisy and the speedups are not that
> significant.
>
> For the 'text' data type, and 'random' dataset, the results are these:
>
>     scale    datum    cost-model
>   --------------------------------
>    100000     328%       323%
>   1000000     392%       391%
>   2000000      96%       565%
>   3000000      97%       572%
>   4000000      97%       571%
>   5000000      98%       570%
>
> The numbers are speedup vs. master, so 100% means exactly the same
> speed, 200% means twice as fast.
>
> So while the 'datum' patch gave a very nice speedup for small datasets
> (about 3-4x up to 1M rows), for larger datasets we've seen a small
> regression (~3% slower). With the cost model fix, we actually see a
> significant speedup (about 5.7x) for these cases.

Cool.

> I haven't verified whether this produces the same results, but if it
> does this is very nice.
>
> For the 'DESC' dataset (i.e. data sorted in reverse order), we get even
> better numbers, with up to 6.5x speedup on large datasets.
>
> But for the 'ASC' dataset (i.e. already sorted data), we get this:
>
>     scale    datum    cost-model
>   --------------------------------
>    100000      85%        84%
>   1000000      87%        87%
>   2000000      76%        96%
>   3000000      82%        90%
>   4000000      91%        83%
>   5000000      93%        81%
>
> Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one
tiny difference: the very last tuple is out of order. How does that
look?
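
To be concrete, something along these lines could generate that case.
This is only a sketch: the function name, the zero-padded key format, and
the choice of giving the last row the smallest key are my assumptions, not
what your test script does.

```python
def presorted_with_last_out_of_order(n, width=10):
    """Return n text keys in ascending order, except that the final key
    sorts before all of the others (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least two rows")
    # Keys 1..n-1, zero-padded so that text order matches numeric order.
    rows = [str(i).zfill(width) for i in range(1, n)]
    # The very last tuple gets the smallest key, breaking the sorted order.
    rows.append(str(0).zfill(width))
    return rows


if __name__ == "__main__":
    # Print one key per line, small n just to show the shape of the data.
    for key in presorted_with_last_out_of_order(5):
        print(key)
```

The one-key-per-line output could then be loaded into the test table
(e.g. with COPY) at each of the scales above.
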
--
Peter Geoghegan