Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: PATCH: Using BRIN indexes for sorted output
Msg-id: CAEze2Wi=DmXRzrOfvWoFqiSjT+arvXFUU=X2rEXkbZFoKUUqVQ@mail.gmail.com
In response to: Re: PATCH: Using BRIN indexes for sorted output (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: PATCH: Using BRIN indexes for sorted output
List: pgsql-hackers
On Thu, 23 Feb 2023 at 19:48, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 2/23/23 17:44, Matthias van de Meent wrote:
> > On Thu, 23 Feb 2023 at 16:22, Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> On 2/23/23 15:19, Matthias van de Meent wrote:
> >>> Comments on 0001, mostly comments and patch design:
> >
> > One more comment:
> >
> >>>> +        range->min_value = bval->bv_values[0];
> >>>> +        range->max_value = bval->bv_values[1];
> >
> > This produces dangling pointers for minmax indexes on by-ref types
> > such as text and bytea, due to the memory context of the decoding
> > tuple being reset every iteration. :-(
> >
>
> Yeah, that sounds like a bug. Also a sign the tests should have some
> by-ref data types (presumably there are none, as that would certainly
> trip some asserts etc.).

I'm not sure we currently trip asserts, as the data we store in the
memory context is very limited, making it unlikely we actually release
the memory region back to the OS. I did get assertion failures by
adding the attached patch, but I don't think that's something we can
do in the final release.

> > With the proposed patch, we do O(ncols_statsenabled) scans over the
> > BRIN index. Each scan reads all ncol columns of all block ranges from
> > disk, so in effect the data scan does on the order of
> > O(ncols_statsenabled * ncols * nranges) IOs, or O(n^2) on cols when
> > all columns have statistics enabled.
> >
>
> I don't think that's the number of I/O operations we'll do, because we
> always read the whole BRIN tuple at once. So I believe it should rather
> be something like
>
>   O(ncols_statsenabled * nranges)
>
> assuming nranges is the number of page ranges. But even that's likely a
> significant overestimate because most of the tuples will be served from
> shared buffers.

We store some data per index attribute, which makes the IO required
for a single indexed range proportional to the number of index
attributes.
If we then scan the index a number of times proportional to the number
of attributes, the cumulative IO load of statistics gathering for that
index is quadratic in the number of attributes, not linear, right?

> Considering how tiny BRIN indexes are, this is likely orders of
> magnitude less I/O than we expend on sampling rows from the table. I
> mean, with the default statistics target we read ~30000 pages (~240MB)
> or more just to sample the rows. Randomly, while the BRIN index is
> likely scanned mostly sequentially.

Mostly agreed, except I think it's not too difficult to imagine a BRIN
index on that scale: the bloom filters, for example, easily take up
several hundreds of bytes each. With the default configuration of 128
pages_per_range, n_distinct_per_range of -0.1, and false_positive_rate
of 0.01, the bloom filter size is 4.36 KiB - each indexed item on its
own page. That is still only 1% of the original table's size, but
there are enough tables larger than 24GB for this to be a significant
issue.

Note that most of my concerns are related to our current
documentation's statement that there are no demerits to multi-column
BRIN indexes:

"""
11.3. Multicolumn Indexes

[...] The only reason to have multiple BRIN indexes instead of one
multicolumn BRIN index on a single table is to have a different
pages_per_range storage parameter.
"""

Wide BRIN indexes add IO overhead for single-attribute scans when
compared to single-attribute indexes, so running N single-attribute
index scans to build the statistics on an N-attribute index is not
great.

> Maybe there are cases where this would be an issue, but I haven't seen
> one when working on this patch (and I did a lot of experiments). I'd
> like to see one before we start optimizing it ...

I'm not only worried about optimizing it; I'm also worried that we're
putting this abstraction at the wrong level in a way that is difficult
to modify.
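As a back-of-the-envelope illustration of the two accountings above (my own sketch, not code from the patch; the function names are made up), the per-attribute-scan model grows quadratically in the number of attributes while the per-tuple model stays linear:

```python
# Illustrative IO-count models for statistics gathering over a BRIN
# index. Names and the unit "one IO per attribute's worth of range
# data" are assumptions for the sketch, not the patch's actual costing.

def io_per_attribute_scans(ncols_statsenabled: int, ncols: int, nranges: int) -> int:
    """One scan per stats-enabled column; each scan touches every
    range's data for all ncols attributes."""
    return ncols_statsenabled * ncols * nranges

def io_per_tuple_scans(ncols_statsenabled: int, nranges: int) -> int:
    """Counting one IO per (scan, range) pair, ignoring tuple width."""
    return ncols_statsenabled * nranges

# With all 8 columns stats-enabled over 1000 ranges, the per-attribute
# accounting is 8x the per-tuple accounting: quadratic vs linear in ncols.
print(io_per_attribute_scans(8, 8, 1000))  # 64000
print(io_per_tuple_scans(8, 1000))         # 8000
```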
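The 4.36 KiB figure can be reproduced with the standard bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits. This is my own arithmetic check, not code from the patch; the assumption that a negative n_distinct_per_range is taken as a fraction of MaxHeapTuplesPerPage (291 for 8kB pages) times pages_per_range is mine:

```python
import math

# Reproduce the quoted bloom filter size under the stated defaults.
pages_per_range = 128
max_heap_tuples_per_page = 291   # 8kB block size (assumption, see lead-in)
n_distinct_per_range = -0.1      # negative: fraction of tuples per range
false_positive_rate = 0.01

# Estimated distinct values per range.
n = -n_distinct_per_range * max_heap_tuples_per_page * pages_per_range

# Optimal bloom filter size in bits for n elements at rate p.
m_bits = -n * math.log(false_positive_rate) / (math.log(2) ** 2)
size_kib = m_bits / 8 / 1024

print(f"{size_kib:.2f} KiB")  # ~4.36 KiB, matching the figure above
```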
> This also reminds me that the issues I actually saw (e.g. memory
> consumption) would be made worse by processing all columns at once,
> because then you need to keep more columns in memory.

Yes, that can be a valid concern, but don't we already do similar
things in the current table statistics gathering?

Kind regards,

Matthias van de Meent