Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: BitmapHeapScan streaming read user and prelim refactoring
Msg-id 45bed4f3-a5bf-4a34-b544-7e751bd437e1@enterprisedb.com
In response to Re: BitmapHeapScan streaming read user and prelim refactoring  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: BitmapHeapScan streaming read user and prelim refactoring
List pgsql-hackers

On 2/28/24 21:06, Melanie Plageman wrote:
> On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/28/24 15:56, Tomas Vondra wrote:
>>>> ...
>>>
>>> Sure, I can do that. It'll take a couple hours to get the results, I'll
>>> share them when I have them.
>>>
>>
>> Here are the results with only patches 0001 - 0012 applied (i.e. without
>> the patch introducing the streaming read API, and the patch switching
>> the bitmap heap scan to use it).
>>
>> The changes in performance don't disappear entirely, but the scale is
>> certainly much smaller - both in the complete results for all runs, and
>> for the "optimal" runs that would actually pick bitmapscan.
> 
> Hmm. I'm trying to think how my refactor could have had this impact.
> It seems like all the most notable regressions are with 4 parallel
> workers. What do the numeric column labels mean across the top
> (2,4,8,16...) -- are they related to "matches"? And if so, what does
> that mean?
> 

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values differs by data set, but for 1M rows it's roughly like
this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.
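
To make the arithmetic explicit, here is a rough sketch (not part of the
benchmark scripts; the function name and the assumption that rows are
spread evenly over the distinct values are mine, for illustration only):

```python
# Approximate number of distinct values per data set for 1M rows,
# taken from the figures quoted above.
DISTINCT_VALUES = {"uniform": 10_000, "linear": 10_000, "cyclic": 100}


def matched_fraction(matches: int, dataset: str) -> float:
    """Rough fraction of rows matched by a query hitting `matches` values.

    Assumes rows are spread evenly across the distinct values, and caps
    the result at 1.0 once `matches` exceeds the number of distinct values.
    """
    return min(1.0, matches / DISTINCT_VALUES[dataset])


if __name__ == "__main__":
    for name in DISTINCT_VALUES:
        print(f"{name}: {matched_fraction(128, name):.2%}")
```

The cap is what makes matches=128 cover the whole cyclic data set, which
only has about 100 distinct values.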

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0
(effective_io_concurrency=0).

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company