Re: Allow a per-tablespace effective_io_concurrency setting - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Allow a per-tablespace effective_io_concurrency setting
Date
Msg-id 55E71E9E.4090403@2ndquadrant.com
In response to Re: Allow a per-tablespace effective_io_concurrency setting  (Andres Freund <andres@anarazel.de>)
Responses Re: Allow a per-tablespace effective_io_concurrency setting  (Andres Freund <andres@anarazel.de>)
Re: Allow a per-tablespace effective_io_concurrency setting  (Julien Rouhaud <julien.rouhaud@dalibo.com>)
List pgsql-hackers
Hi,

On 09/02/2015 03:53 PM, Andres Freund wrote:
>
> Hi,
>
> On 2015-07-18 12:17:39 +0200, Julien Rouhaud wrote:
>> I didn't know that the thread must exists on -hackers to be able to add
>> a commitfest entry, so I transfer the thread here.
>
> Please, in the future, also update the title of the thread to something
> fitting.
>
>> @@ -539,6 +541,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
>>   {
>>       BitmapHeapScanState *scanstate;
>>       Relation    currentRelation;
>> +#ifdef USE_PREFETCH
>> +    int new_io_concurrency;
>> +#endif
>>
>>       /* check for unsupported flags */
>>       Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
>> @@ -598,6 +603,25 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
>>        */
>>       currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
>>
>> +#ifdef USE_PREFETCH
>> +    /* check if the effective_io_concurrency has been overloaded for the
>> +     * tablespace storing the relation and compute the target_prefetch_pages,
>> +     * or just get the current target_prefetch_pages
>> +     */
>> +    new_io_concurrency = get_tablespace_io_concurrency(
>> +            currentRelation->rd_rel->reltablespace);
>> +
>> +
>> +    scanstate->target_prefetch_pages = target_prefetch_pages;
>> +
>> +    if (new_io_concurrency != effective_io_concurrency)
>> +    {
>> +        double prefetch_pages;
>> +       if (compute_io_concurrency(new_io_concurrency, &prefetch_pages))
>> +            scanstate->target_prefetch_pages = rint(prefetch_pages);
>> +    }
>> +#endif
>
> Maybe it's just me - but imo there should be as few USE_PREFETCH
> dependent places in the code as possible. It'll just be 0 when not
> supported, that's fine?

It's not just you. Dealing with code with plenty of ifdefs is annoying - 
it compiles just fine most of the time, until you compile it with some 
rare configuration. Then it either starts producing strange warnings or 
the compilation fails entirely.

It might make a tiny difference on builds without prefetching support 
because of code size, but realistically I think it's noise.

> Especially changing the size of externally visible structs depending
> on a configure-detected ifdef seems wrong to me.

+100 to that
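
To illustrate (a simplified sketch, not the patch itself): if the
struct member exists unconditionally, its size never depends on
configure results, and the ifdef collapses into the one place that
decides whether the value can ever be non-zero:

/*
 * Sketch only: target_prefetch_pages is always present, so sizeof()
 * doesn't change with USE_PREFETCH; builds without prefetch support
 * simply leave it at 0 and the prefetch loops become no-ops.
 */
typedef struct DemoScanState
{
    int     target_prefetch_pages;  /* 0 = no prefetching */
} DemoScanState;

static void
demo_init(DemoScanState *state, int prefetch_pages)
{
#ifdef USE_PREFETCH
    state->target_prefetch_pages = prefetch_pages;
#else
    state->target_prefetch_pages = 0;
#endif
}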

>> +    /*----------
>> +     * The user-visible GUC parameter is the number of drives (spindles),
>> +     * which we need to translate to a number-of-pages-to-prefetch target.
>> +     * The target value is stashed in *extra and then assigned to the actual
>> +     * variable by assign_effective_io_concurrency.
>> +     *
>> +     * The expected number of prefetch pages needed to keep N drives busy is:
>> +     *
>> +     * drives |   I/O requests
>> +     * -------+----------------
>> +     *        1 |   1
>> +     *        2 |   2/1 + 2/2 = 3
>> +     *        3 |   3/1 + 3/2 + 3/3 = 5 1/2
>> +     *        4 |   4/1 + 4/2 + 4/3 + 4/4 = 8 1/3
>> +     *        n |   n * H(n)
>
> I know you just moved this code. But: I don't buy this formula. Like at
> all. Doesn't queuing and reordering entirely invalidate the logic here?

Well, even the comment right after the formula says that:

    * Experimental results show that both of these formulas aren't
    * aggressive enough, but we don't really have any better proposals.
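
For concreteness, the translation being discussed is just a partial
harmonic sum. Here is a minimal, self-contained sketch of that
computation (modelled on the compute_io_concurrency() loop in the
patch, simplified for illustration rather than copied verbatim):

#include <stdio.h>

/*
 * For n drives the target is n * H(n), i.e. n/1 + n/2 + ... + n/n.
 */
static double
prefetch_target(int io_concurrency)
{
    double  pages = 0.0;
    int     i;

    for (i = 1; i <= io_concurrency; i++)
        pages += (double) io_concurrency / (double) i;

    return pages;
}

int
main(void)
{
    int     n;

    /* reproduces the table in the comment: 1, 3, 5.5, 8.33... */
    for (n = 1; n <= 4; n++)
        printf("%d drives -> %.2f prefetch pages\n", n, prefetch_target(n));
    return 0;
}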
 

The formula not being aggressive enough is why users generally either
use 0 or some rather high value (16 and 32 are the values I see most
often). The problem is that we don't really care about the number of
spindles (and not just because SSDs don't have them at all), but about
the target queue length per device. Spinning rust uses TCQ/NCQ to
optimize head movement, while SSDs are parallel by nature (multiple
flash chips with separate channels).

I doubt we can really improve the formula, except maybe by saying "we
want 16 requests per device" and multiplying the number of devices by
that. We don't really have the necessary introspection to determine
better values, and it's not really possible anyway, because the devices
may be hidden behind a RAID controller or a SAN. So we can't really do
much better.

Maybe the best thing we can do is completely abandon the "number of
spindles" idea and just say "number of I/O requests to prefetch",
possibly with an explanation of how to estimate it (devices * queue
length).
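
If we went that way, the estimate itself is trivial; a purely
illustrative sketch (the 16 requests per device and the helper name
are placeholders from this discussion, not measured values):

/*
 * Hypothetical sketch: the user-facing knob becomes "I/O requests to
 * prefetch", estimated as devices times per-device queue length.  The
 * 16 below is the queue-length guess mentioned above, not a measured
 * constant.
 */
#define REQUESTS_PER_DEVICE 16

static int
estimate_prefetch_distance(int devices)
{
    return devices * REQUESTS_PER_DEVICE;   /* e.g. 4 drives -> 64 */
}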

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


