Re: Removing "long int"-related limit on hash table sizes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Removing "long int"-related limit on hash table sizes
Date
Msg-id 44485.1627230484@sss.pgh.pa.us
Whole thread Raw
In response to Re: Removing "long int"-related limit on hash table sizes  (Andres Freund <andres@anarazel.de>)
Responses Re: Removing "long int"-related limit on hash table sizes  (Ranier Vilela <ranier.vf@gmail.com>)
Re: Removing "long int"-related limit on hash table sizes  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2021-07-23 17:15:24 -0400, Tom Lane wrote:
>> That's because they spill to disk where they did not before.  The easy
>> answer of "raise hash_mem_multiplier" doesn't help, because on Windows
>> the product of work_mem and hash_mem_multiplier is clamped to 2GB,
>> thanks to the ancient decision to do a lot of memory-space-related
>> calculations in "long int", which is only 32 bits on Win64.

> We really ought to just remove every single use of long.

I have no objection to that as a long-term goal.  But I'm not volunteering
to do all the work, and in any case it wouldn't be a back-patchable fix.
I feel that we do need to do something about this performance regression
in v13.

> Hm. I wonder if we would avoid some overflow dangers on 32bit systems if
> we made get_hash_memory_limit() and the relevant variables 64 bit,
> rather than 32bit / size_t.  E.g.

No, I don't like that.  Using size_t for memory-size variables is good
discipline.  Moreover, I'm not convinced that even with 64-bit ints,
overflow would be impossible in all the places I fixed here.  They're
multiplying several potentially very large values (one of which
is a float).  I think this is just plain sloppy coding, independently
of which bit-width you choose to be sloppy in.

>> +        skew_mcvs = skew_mcvs * SKEW_HASH_MEM_PERCENT / 100;

> I always have to think about the evaluation order of things like this
> (it's left to right for these), so I'd prefer parens around the
> multiplication. I did wonder briefly whether the SKEW_HASH_MEM_PERCENT /
> 100 just evaluates to 0...

OK, will do.  I see your point, because I'd sort of instinctively
wanted to write that as
        skew_mcvs *= SKEW_HASH_MEM_PERCENT / 100;
which of course would not work.

Thanks for looking at the code.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Yura Sokolov
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Julien Rouhaud
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)