Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: PATCH: Using BRIN indexes for sorted output
Date
Msg-id CAEze2Whd=MCQ61EEUUWJgohHN5cm3bv027N+vS4o6zOuBTY3tw@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: Using BRIN indexes for sorted output  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PATCH: Using BRIN indexes for sorted output  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: PATCH: Using BRIN indexes for sorted output  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On Sat, 18 Feb 2023 at 16:55, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> cfbot complained there's one more place triggering a compiler warning on
> 32-bit systems, so here's a version fixing that.
>
> I've also added a copy of the regression tests but using the indexam
> stats added in 0001. This is just a copy of the already existing
> regression tests, just with enable_indexam_stats=true - this should
> catch some of the issues that went mostly undetected in the earlier
> patch versions.


Comments on 0001, mostly comments and patch design:

> +range_minval_cmp(const void *a, const void *b, void *arg)
> [...]
> +range_maxval_cmp(const void *a, const void *b, void *arg)
> [...]
> +range_values_cmp(const void *a, const void *b, void *arg)

Can the arguments of these functions be modified into the types they
are expected to receive? If not, could you add a comment on why that's
not possible?
I don't think it's good practise to "just" use void* for arguments to
cast them to more concrete types in the next lines without an
immediate explanation.

> +     * Statistics calculated by index AM (e.g. BRIN for ranges, etc.).

Could you please expand on this? We do have GIST support for ranges, too.

> + * brin_minmax_stats
> + *        Calculate custom statistics for a BRIN minmax index.
> + *
> + * At the moment this calculates:
> + *
> + *  - number of summarized/not-summarized and all/has nulls ranges

I think statistics gathering of an index should be done at the AM
level, not attribute level. The docs currently suggest that the user
builds one BRIN index with 16 columns instead of 16 BRIN indexes with
one column each, which would make the statistics gathering use 16x
more IO if the scanned data cannot be reused.

It is possible to build BRIN indexes on more than one column with more
than one opclass family like `USING brin (id int8_minmax_ops, id
int8_bloom_ops)`. This would mean various duplicate statistics fields,
no?
It seems to me that it's more useful to do the null- and n_summarized
on the index level instead of duplicating that inside the opclass.

> + for (heapBlk = 0; heapBlk < nblocks; heapBlk += pagesPerRange)

I am not familiar with the frequency of max-sized relations, but this
would go into an infinite loop for pagesPerRange values >1 for
max-sized relations due to BlockNumber wraparound. I think there
should be some additional overflow checks here.

> +/*
> + * get_attstaindexam
> + *
> + *      Given the table and attribute number of a column, get the index AM
> + *      statistics.  Return NULL if no data available.
> + *

Shouldn't this be "given the index and attribute number" instead of
"the table and attribute number"?
I think we need to be compatible with indexes on expression here, so
that we don't fail to create (or use) statistics for an index `USING
brin ( (int8range(my_min_column, my_max_column, '[]'))
range_inclusion_ops)` when we implement stats for range_inclusion_ops.

> + * Alternative to brin_minmax_match_tuples_to_ranges2, leveraging ordering

Does this function still exist?

.

I'm planning on reviewing the other patches, and noticed that a lot of
the patches are marked WIP. Could you share a status on those, because
currently that status is unknown: Are these patches you don't plan on
including, or are these patches only (or mostly) included for
debugging?


Kind regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Reducing connection overhead in pg_upgrade compat check phase
Next
From: Tom Lane
Date:
Subject: Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)