Home > mailing lists

Re: bigint out of range - Mailing list pgsql-general

From	Peter J. Holzer
Subject	Re: bigint out of range
Date	May 19, 2019 10:43:52
Msg-id	20190519104352.fty5grjwrsj5p4zp@hjp.at Whole thread Raw
In response to	Re: bigint out of range (Ron <ronljohnsonjr@gmail.com>)
Responses	Re: bigint out of range Re: bigint out of range
List	pgsql-general

Tree view

On 2019-05-18 19:16:19 -0500, Ron wrote:
> On 5/18/19 5:39 PM, Peter J. Holzer wrote:
> > On 2019-05-18 17:14:59 -0500, Ron wrote:
> > > On 5/18/19 3:49 PM, Peter J. Holzer wrote:
> > >      On 2019-05-18 15:19:22 -0500, Ron wrote:
> > >          On 5/18/19 2:27 PM, Peter J. Holzer wrote:
> > >              On 2019-05-18 10:49:53 -0700, David G. Johnston wrote:
> > >
> > >                  You don’t perform math on a hash
> > >
> > >              That's not generally true. Hashes are used for further computation for
> > >              example in hash tables or in cryptography.
> > >
> > >          How is it "using math" to use a hash key in a hash lookup table?
> > >
> > >      hash modulo table size.
> > >
> > >
> > > I've seen that used when the tablespace is pre-allocated, and you hash modulo
> > > the tablespace page number.  (Yes, performance tanks when you start filling up
> > > pages.)  How do you hash on the (ever growing) table size?
> > The hash function returns a number in a range much larger than the
> > possible number of buckets. 64 bits is a good choice today.
> >
> > To determine the bucket you need to reduce this number to something in
> > the range [0, nr_buckets). This is where modulo comes in:
> >
> > i = h % nr_buckets
> >
> > If the the table fills up, you increase nr_buckets, reallocate and
> > rehash all entries.
>
> Ouch.  Response time on a big table would take a serious hit if that rehash
> happened in the middle of the day on a big OLTP system.

So that might be a reason not to use hash indexes tables where the worst
case insert time is a concern (the average insert time is still O(1)).

But please be aware that I answered your question 'How is it "using
math" to use a hash key?', not 'How are hash indexes in PostgreSQL
implemented?'. So my answer covered the most simple and generic
implementation.

You can split buckets lazily, and in fact PostgreSQL does this. I just
looked at the code briefly, but I don't really understand how it works.
Guess I would have to read Margo Seltzer's paper paper for that.

Still, even with that optimization (or tradeoff - it may make the lookup
slower) the README notes that splits are expensive.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

Attachment

signature.asc

pgsql-general by date:

From: Andrew Gierth
Date: 19 May 2019, 02:48:17
Subject: Re: Strange performance degregation in sql function (PG11.1)

From: Ron
Date: 19 May 2019, 13:38:14
Subject: Re: bigint out of range

Re: bigint out of range - Mailing list pgsql-general

Attachment

Previous

Next