Re: Abbreviated keys for text cost model fix - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Abbreviated keys for text cost model fix
Msg-id CAM3SWZR2PDCphC+sWi9y811uYrJZopCj0PSKfafnoWHji=qckw@mail.gmail.com
In response to Re: Abbreviated keys for text cost model fix  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Abbreviated keys for text cost model fix  (Peter Geoghegan <pg@heroku.com>)
Re: Abbreviated keys for text cost model fix  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> In short, this fixes all the cases except for the ASC sorted data. I
> haven't done any code review, but I think we want this.
>
> I'll use data from the i5-2500k, but it applies to the Xeon too, except
> that the Xeon results are more noisy and the speedups are not that
> significant.
>
> For the 'text' data type, and 'random' dataset, the results are these:
>
>       scale    datum    cost-model
>     -------------------------------
>      100000     328%          323%
>     1000000     392%          391%
>     2000000      96%          565%
>     3000000      97%          572%
>     4000000      97%          571%
>     5000000      98%          570%
>
> The numbers are speedup vs. master, so 100% means exactly the same
> speed, 200% means twice as fast.
>
> So while the 'datum' patch gave a very nice speedup for small datasets
> (about 3-4x up to 1M rows), larger datasets showed a small regression
> (~3% slower). With the cost model fix, we actually see a significant
> speedup (about 5.7x) for these cases.

Cool.

> I haven't verified whether this produces the same results, but if it
> does this is very nice.
>
> For 'DESC' dataset (i.e. data sorted in reverse order), we do get even
> better numbers, with up to 6.5x speedup on large datasets.
>
> But for 'ASC' dataset (i.e. already sorted data), we do get this:
>
>       scale    datum    cost-model
>     -------------------------------
>      100000      85%           84%
>     1000000      87%           87%
>     2000000      76%           96%
>     3000000      82%           90%
>     4000000      91%           83%
>     5000000      93%           81%
>
> Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one
tiny difference: The very last tuple is out of order. How does that
look?
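
To be concrete about the dataset I mean, here is a rough sketch of a
generator. It is only an illustration: the function names, the key
length, and the use of random lowercase strings are my assumptions, not
part of the existing benchmark scripts. It also orders keys by Python's
code-point comparison, which may not match the collation the server
uses.

```python
import random
import string


def make_almost_sorted(n, key_len=16, seed=0):
    """Return n text keys in ascending order, except for the final key.

    The final key sorts before every other key, so the input is fully
    sorted apart from the very last tuple. (Hypothetical helper; all
    parameters are illustrative.)
    """
    if n < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    keys = sorted(
        "".join(rng.choice(alphabet) for _ in range(key_len))
        for _ in range(n - 1)
    )
    # "0" compares lower than any lowercase letter by code point, so this
    # key belongs at the front of the sorted order, not at the end.
    keys.append("0" * key_len)
    return keys


def count_descents(keys):
    """Count adjacent pairs that are out of order (a > b)."""
    return sum(1 for a, b in zip(keys, keys[1:]) if a > b)


if __name__ == "__main__":
    rows = make_almost_sorted(100000)
    print(len(rows), "rows,", count_descents(rows), "out-of-order pair(s)")
```

The generated keys could then be written out one per line and loaded
into the test table in that order, so that the sort sees presorted input
right up to the final tuple.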

-- 
Peter Geoghegan


