On Tue, Mar 19, 2024 at 4:34 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 3/18/24 16:55, Tomas Vondra wrote:
> >
> > ...
> >
> > OK, I've restarted the tests for only 0012 and 0014 patches, and I'll
> > wait for these to complete - I don't want to be looking for patterns
> > until we have enough data to smooth this out.
> >
> >
>
> I now have results for 1M and 10M runs on the two builds (0012 and
> 0014), attached is a chart for relative performance plotting
>
> (0014 timing) / (0012 timing)
>
> for "optimal" runs that would pick bitmapscan on their own. There's
> nothing special about the config - I reduced the random_page_cost to
> 1.5-2.0 to reflect both machines have flash storage, etc.
>
> Overall, the chart is pretty consistent with what I shared on Sunday.
> Most of the results are fine (0014 is close to 0012 or faster), but
> there's a bunch of cases that are much slower. Interestingly enough,
> almost all of them are on the i5 machine, almost none on the xeon. My
> guess is this is about the SSD type (SATA vs. NVMe).
>
> Attached is a table of ~50 worst regressions (by the metric above), and
> it's interesting the worst regressions are with eic=0 and eic=1.
>
> I decided to look at the first case (eic=0), and the timings are quite
> stable - there are three runs for each build, with timings close to the
> average (see below the table).
>
> Attached is a script that reproduces this on both machines, but the
> difference is much more significant on i5 (~5x) compared to xeon (~2x).
>
> I haven't investigated what exactly is happening and why, hopefully the
> script will allow you to reproduce this independently. I plan to take a
> look, but I don't know when I'll have time for this.
>
> FWIW if the script does not reproduce this on your machines, I might be
> able to give you access to the i5 machine. Let me know.

I had this particular email on this thread bookmarked so I could go
back and investigate the regression. The patch set has changed since
these benchmarks were run, and I honestly no longer remember what
0014 and 0012 were. There are four remaining patches in the set I
posted earlier today in [1]. All of them are directly related to
bitmap heap scan using the streaming read interface (i.e. not useful
on their own). Therefore, it is time to investigate whether we should
merge streaming read bitmap heap scan.

I ran the query included in the reproducer in this mail a dozen times
on master and with the patches in [1] and the average speedup with my
patch is 12%. So, at least for this query, I don't see a regression.
What do you think about rerunning these old benchmarks to see what
they look like now?
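
For anyone rerunning these, here is a minimal sketch of how the
per-query comparison could be computed from repeated timings. It
assumes the same ratio as the chart above (patched timing / baseline
timing) and defines "speedup" as one minus that ratio; the function
names are hypothetical, and this is not necessarily how the 12%
figure above was derived.

```python
from statistics import mean


def relative_timing(patched_ms, baseline_ms):
    """Mean patched timing divided by mean baseline timing.

    Values below 1.0 mean the patched build was faster on average.
    Both arguments are sequences of per-run timings in the same unit.
    """
    if not patched_ms or not baseline_ms:
        raise ValueError("need at least one timing per build")
    return mean(patched_ms) / mean(baseline_ms)


def speedup_percent(patched_ms, baseline_ms):
    """Average speedup of the patched build, as a percentage.

    Assumption: speedup = (1 - patched/baseline) * 100, so a negative
    result indicates a regression.
    """
    return (1.0 - relative_timing(patched_ms, baseline_ms)) * 100.0
```

Comparing means over a dozen or so runs smooths out run-to-run noise,
but it can hide outliers, so it is probably worth keeping the
individual timings around as well.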

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_as499kHb9B4B4%3D%2Bc%2B4p%2BOF_Bibd4KEdoqyBgjEaEUdgA%40mail.gmail.com