Re: [HACKERS] A design for amcheck heapam verification - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] A design for amcheck heapam verification
Date
Msg-id 20180331012031.ftf4vbiwabsg7uvb@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] A design for amcheck heapam verification  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On 2018-03-29 19:42:42 -0700, Peter Geoghegan wrote:
> >> +     /*
> >> +      * Aim for two bytes per element; this is sufficient to get a false
> >> +      * positive rate below 1%, independent of the size of the bitset or total
> >> +      * number of elements.  Also, if rounding down the size of the bitset to
> >> +      * the next lowest power of two turns out to be a significant drop, the
> >> +      * false positive rate still won't exceed 2% in almost all cases.
> >> +      */
> >> +     bitset_bytes = Min(bloom_work_mem * 1024L, total_elems * 2);
> >> +     /* Minimum allowable size is 1MB */
> >> +     bitset_bytes = Max(1024L * 1024L, bitset_bytes);
> >
> > Some upcasting might be advisable, to avoid dangers of overflows?
> 
> When it comes to sizing work_mem, using long literals to go from KB to
> bytes is How It's Done™. I actually think that's silly myself, because
> it's based on the assumption that long is wider than int, even though
> it isn't on Windows. But that's okay because we have the old work_mem
> size limits on Windows.

> What would the upcasting you have in mind look like?

Just use UINT64CONST()?  Let's try not to introduce further code that'll
need to get painfully fixed.


> 
> >> +     /* Use 64-bit hashing to get two independent 32-bit hashes */
> >> +     hash = DatumGetUInt64(hash_any_extended(elem, len, filter->seed));
> >
> > Hm. Is that smart given how some hash functions are defined? E.g. for
> > int8 the higher bits aren't really that independent for small values:
> 
> Robert suggested that I do this. I don't think that we need to make it
> about the quality of the hash function that we have available. That
> really seems like a distinct question to me. It seems clear that this
> ought to be fine (or should be fine, if you prefer). I understand why
> you're asking about this, but it's not scalable to ask every user of a
> hash function to care that it might be a bit crap. Hash functions
> aren't supposed to be a bit crap.

Well, they're not supposed to be, but they are. Practical realities
matter...  But I think it's moot because we don't use any of the bad
ones, I only got to that later...


> >> +DROP FUNCTION bt_index_check(regclass);
> >> +CREATE FUNCTION bt_index_check(index regclass,
> >> +    heapallindexed boolean DEFAULT false)
> >> +RETURNS VOID
> >> +AS 'MODULE_PATHNAME', 'bt_index_check'
> >> +LANGUAGE C STRICT PARALLEL RESTRICTED;
> >
> > This breaks functions et al referencing the existing function.
> 
> This sounds like a general argument against ever changing a function's
> signature. It's not like we haven't done that a number of times in
> extensions like pageinspect. Does it really matter?

Yes, it does. And imo we shouldn't.


> > Also, I
> > don't quite recall the rules, don't we have to drop the function from
> > the extension first?
> 
> But...I did drop the function?

I mean in the ALTER EXTENSION ... DROP FUNCTION sense.



Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] A design for amcheck heapam verification
Next
From: Thomas Munro
Date:
Subject: Re: [HACKERS] Planning counters in pg_stat_statements