Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP: Fast GiST index build
Date
Msg-id 4E4A5CD5.2040601@enterprisedb.com
Whole thread Raw
In response to Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: WIP: Fast GiST index build
List pgsql-hackers
Looking at the calculation of levelStep:

> +     /*
> +      * Calculate levelStep by available amount of memory. We should be able to
> +      * load into main memory one page of each underlying node buffer (which
> +      * are in levelStep below). That give constraint over
> +      * maintenance_work_mem. Also we should be able to have subtree of
> +      * levelStep level in cache. That give constraint over
> +      * effective_cache_size.
> +      *
> +      * i'th underlying level of sub-tree can consists of
> +      * i^maxIndexTuplesPerPage pages at maximum. So, subtree of levelStep
> +      * levels can't be greater then 2 * maxIndexTuplesPerPage ^ levelStep
> +      * pages. We use some more reserve due to we probably can't take whole
> +      * effective cache and use formula 4 * maxIndexTuplesPerPage ^ levelStep =
> +      * effectiveCache. We use similar logic with maintenance_work_mem. We
> +      * should be able to store at least last pages of all buffers where we are
> +      * emptying current buffer to.
> +      */
> +     effectiveMemory = Min(maintenance_work_mem * 1024 / BLCKSZ,
> +                           effective_cache_size);
> +     levelStep = (int) log((double) effectiveMemory / 4.0) /
> +         log((double) maxIndexTuplesPerPage);
> +

I can see that that's equal to the formula given in the paper, 
log_B(M/4B), but I couldn't see any explanation for that formula in the 
paper. Your explanation makes sense, but where did it come from?

It seems a bit pessimistic: while it's true that the a subtree can't be 
larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter 
upper bound on it. The max size of a subtree of depth n can be 
calculated as the geometric series:

r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)

where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but 
for a large r and small n (which is typical), the 2 * 
maxIndexTuplesPerPage^levelStep formula gives a value that's almost 
twice as large as the real max size of a subtree.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Re: Should we have an optional limit on the recursion depth of recursive CTEs?
Next
From: Magnus Hagander
Date:
Subject: non-ipv6 vs hostnames