Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Melanie Plageman
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring
Date |
Msg-id | CAAKRu_YXTOezK3h_YrNJrAUDuAzet59hD1bmmtH4zVPLC00HtA@mail.gmail.com
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses | Re: BitmapHeapScan streaming read user and prelim refactoring
List | pgsql-hackers
On Thu, Feb 29, 2024 at 6:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 2/29/24 23:44, Tomas Vondra wrote:
> >
> > ...
> >
> >>> I do have some partial results, comparing the patches. I only ran one of
> >>> the more affected workloads (cyclic) on the xeon, attached is a PDF
> >>> comparing master and the 0001-0014 patches. The percentages are timing
> >>> vs. the preceding patch (green - faster, red - slower).
> >>
> >> Just confirming: the results are for uncached?
> >
> > Yes, cyclic data set, uncached case. I picked this because it seemed
> > like one of the most affected cases. Do you want me to test some other
> > cases too?
>
> BTW I decided to look at the data from a slightly different angle and
> compare the behavior with increasing effective_io_concurrency. Attached
> are charts for three "uncached" cases:
>
> * uniform, work_mem=4MB, workers_per_gather=0
> * linear-fuzz, work_mem=4MB, workers_per_gather=0
> * uniform, work_mem=4MB, workers_per_gather=4
>
> Each page has charts for master and patched build (with all patches). I
> think there's a pretty obvious difference in how increasing e_i_c
> affects the two builds:

Wow! These visualizations make it exceptionally clear. I want to go to
the Vondra school of data visualizations for performance results!

> 1) On master there's clear difference between eic=0 and eic=1 cases, but
> on the patched build there's literally no difference - for example the
> "uniform" distribution is clearly not great for prefetching, but eic=0
> regresses to eic=1 poor behavior).

Yes, so eic=0 and eic=1 are identical with the streaming read API. That
is, eic 0 does not disable prefetching. Thomas is going to update the
streaming read API to avoid issuing an fadvise for the last block in a
range before issuing a read -- which would mean no prefetching with eic
0 and eic 1. Not doing prefetching with eic 1 actually seems like the
right behavior -- which would be different than what master is doing,
right?

Hopefully this fixes the clear difference between master and the
patched version at eic 0.

> 2) For some reason, the prefetching with eic>1 perform much better with
> the patches, except for with very low selectivity values (close to 0%).
> Not sure why this is happening - either the overhead is much lower
> (which would matter on these "adversarial" data distribution, but how
> could that be when fadvise is not free), or it ends up not doing any
> prefetching (but then what about (1)?).

For the uniform with four parallel workers, eic == 0 being worse than
master makes sense for the above reason. But I'm not totally sure why
eic == 1 would be worse with the patch than with master. Both are doing
a (somewhat useless) prefetch.

With very low selectivity, you are less likely to get readahead
(right?) and similarly less likely to be able to build up > 8kB IOs --
which is one of the main value propositions of the streaming read code.
I imagine that this larger read benefit is part of why the performance
is better at higher selectivities with the patch. This might be a silly
experiment, but we could try decreasing MAX_BUFFERS_PER_TRANSFER on the
patched version and see if the performance gains go away.
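To make sure we're talking about the same two effects, here is a toy,
standalone sketch -- not the actual streaming read code, and all the
names in it (read_range, max_combine, distance) are made up. It just
shows consecutive blocks being coalesced into one larger IO up to a cap
(standing in for MAX_BUFFERS_PER_TRANSFER), and the fadvise being
skipped when the prefetch distance is 0 or 1, which is a rough
simplification of the "don't fadvise a range you're about to read
anyway" change mentioned above:

/*
 * Toy sketch (not the actual read_stream code) of the two behaviors
 * discussed above:
 *   1. consecutive block numbers are coalesced into one larger read,
 *      capped at a limit standing in for MAX_BUFFERS_PER_TRANSFER
 *   2. no fadvise is issued for a range we are about to read
 *      synchronously anyway (the eic = 0 / eic = 1 case)
 * All names here are invented for illustration.
 */
#include <stdio.h>

#define BLCKSZ 8192

static void
read_range(unsigned start, unsigned nblocks, unsigned distance)
{
	/*
	 * With a prefetch distance of 0 or 1, the next thing we would do with
	 * this range is read it, so an fadvise first would just be a wasted
	 * syscall; skip it.
	 */
	if (distance > 1)
		printf("fadvise  blocks %u..%u\n", start, start + nblocks - 1);

	/* One larger IO covering the whole range: nblocks * BLCKSZ bytes */
	printf("read     blocks %u..%u (%u bytes)\n",
		   start, start + nblocks - 1, nblocks * BLCKSZ);
}

int
main(void)
{
	/* Block numbers a bitmap heap scan might hand us, already sorted */
	unsigned	blocks[] = {10, 11, 12, 13, 40, 41, 90};
	unsigned	nblocks = sizeof(blocks) / sizeof(blocks[0]);
	unsigned	max_combine = 16;	/* stand-in for MAX_BUFFERS_PER_TRANSFER */
	unsigned	distance = 1;		/* stand-in for effective_io_concurrency */
	unsigned	start = blocks[0];
	unsigned	len = 1;

	for (unsigned i = 1; i < nblocks; i++)
	{
		/* Extend the current range while blocks stay consecutive */
		if (blocks[i] == start + len && len < max_combine)
			len++;
		else
		{
			read_range(start, len, distance);
			start = blocks[i];
			len = 1;
		}
	}
	read_range(start, len, distance);
	return 0;
}

With that example block list, the four consecutive blocks become a
single 32kB read; that coalescing benefit is exactly what I'd expect to
shrink if we lowered MAX_BUFFERS_PER_TRANSFER, which is why I suggested
that experiment.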
> 3) I'm not sure about the linear-fuzz case, the only explanation I have
> we're able to skip almost all of the prefetches (and read-ahead likely
> works pretty well here).

I started looking at the data generated by linear-fuzz to understand
exactly what effect the fuzz was having, but I haven't had time to
really understand the characteristics of this dataset. In the original
results, I thought uncached linear-fuzz and linear had similar results
(a similar performance improvement over master). What do you expect
with linear vs linear-fuzz?
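In case it helps, this is roughly the kind of thing I was planning to
look at -- purely illustrative, and the table/column names (linear,
linear_fuzz, a) are made up since I don't have your generator script in
front of me:

-- How correlated is the column with physical order? For "linear" I'd
-- expect correlation near 1.0; the question is how much the fuzz
-- erodes that (and therefore how much readahead can still help).
ANALYZE linear, linear_fuzz;

SELECT tablename, attname, correlation, n_distinct
FROM pg_stats
WHERE tablename IN ('linear', 'linear_fuzz')
  AND attname = 'a';

-- For a given selectivity, how many distinct heap pages does the qual
-- actually touch? (ctid -> page number via the text/point cast trick)
SELECT count(DISTINCT (ctid::text::point)[0]) AS heap_pages
FROM linear_fuzz
WHERE a BETWEEN 1000 AND 2000;

- Melanie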