On 10/21/22 18:44, Tomas Vondra wrote:
>
> ...
>
>> Apart from that, this patch looks good.
>>
Sadly, I don't think we can fix it like this :-(
The problem is that all ranges start with all_nulls=true, because the
new range gets initialized by brin_memtuple_initialize() like that. But
this happens for *every* range before we even start processing the rows.
So this way all the ranges would end up with has_nulls=true, making that
flag pretty useless.
Actually, even just doing "truncate" on the table creates such all-nulls
range for the first range, and serializes it to disk.
I wondered why we even write such tuples for "empty" ranges to disk, for
example after "TRUNCATE" - the table is empty by definition, so how come
we write all-nulls brin summary for the first range?
For example brininsert() checks if the brin tuple was modified and needs
to be written back, but brinbuild() just ignores that, and initializes
(and writes) writes the tuple to disk anyway. I think we should not do
that - there should be a flag in BrinBuildState, tracking if the BRIN
tuple was modified, and we should only write it if it's true.
That means we should never get an on-disk summary representing nothing.
That doesn't fix the issue, though, because we still need to pass the
memtuple tuple to the add_value opclass procedure, and whether it sets
the has_nulls flag depends on whether it's a new tuple representing no
other rows (in which case has_nulls remains false) or whether it was
read from disk (in which case it needs to be flipped to 'true').
But the opclass has no way to tell the difference at the moment - it
just gets the BrinMemTuple. So we'd have to extend this, somehow.
I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(
The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).
Something like the attached (I haven't added any more tests, not sure
what would those look like - I can't think of a query testing this,
although maybe we could check how the flags change using pageinspect).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company