On 2/28/24 21:06, Melanie Plageman wrote:
> On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/28/24 15:56, Tomas Vondra wrote:
>>>> ...
>>>
>>> Sure, I can do that. It'll take a couple hours to get the results, I'll
>>> share them when I have them.
>>>
>>
>> Here are the results with only patches 0001 - 0012 applied (i.e. without
>> the patch introducing the streaming read API, and the patch switching
>> the bitmap heap scan to use it).
>>
>> The changes in performance don't disappear entirely, but the scale is
>> certainly much smaller - both in the complete results for all runs, and
>> for the "optimal" runs that would actually pick bitmapscan.
>
> Hmm. I'm trying to think how my refactor could have had this impact.
> It seems like all the most notable regressions are with 4 parallel
> workers. What do the numeric column labels mean across the top
> (2,4,8,16...) -- are they related to "matches"? And if so, what does
> that mean?
>
That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:
uniform: 10k
linear: 10k
cyclic: 100
So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.
As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.
To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company