Re: pgbench - allow to specify scale as a size - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - allow to specify scale as a size
Msg-id alpine.DEB.2.20.1802190832140.10483@lancre
In response to Re: pgbench - allow to specify scale as a size  (Alvaro Hernandez <aht@ongres.com>)
List pgsql-hackers
Hello Alvaro & Tom,

>>> Why not then insert a "few" rows, measure size, truncate the table, 
>>> compute the formula and then insert to the desired user requested 
>>> size? (or insert what should be the minimum, scale 1, measure, and 
>>> extrapolate what's missing). It doesn't sound too complicated to me, 
>>> and targeting a size is something that I believe is quite good for 
>>> the user.
>> 
>> The formula I used approximates the whole database, not just one table. 
>> There was one for the table, but this is only part of the issue. In 
>> particular, ISTM that index sizes should be included when caching is 
>> considered.
>> 
>> Also, index sizes probably grow as n ln(n), so some level of 
>> approximation is inevitable.
>> 
>> Moreover, the intrinsic granularity of TPC-B, in multiples of 100,000 
>> rows, makes it not very precise wrt size anyway.
>
> Sure, makes sense, so my second suggestion seems more reasonable: insert 
> with scale 1, measure there (ok, you might need to create indexes only to 
> later drop them), and if computed scale > 1 then insert whatever is left 
> to insert. Shouldn't be a big deal to me.
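A minimal sketch of the scale-1 probe described above, under stated 
assumptions: `bytes_at_scale_1` is the whole-database size measured after 
a scale-1 initialization (e.g. with `pg_database_size()`), and size is 
assumed to grow linearly with scale, which ignores the super-linear index 
growth mentioned upthread. The function names are hypothetical, not 
pgbench code.

```python
def scale_for_size(target_bytes, bytes_at_scale_1):
    """Extrapolate linearly from a scale-1 measurement (hypothetical helper).

    Assumes database size is proportional to scale, which ignores
    the n ln(n) index growth discussed upthread.
    """
    if bytes_at_scale_1 <= 0:
        raise ValueError("scale-1 size must be positive")
    # The scale is an integer (each unit adds 100,000 accounts rows),
    # so the result is only as precise as one scale unit.
    return max(1, round(target_bytes / bytes_at_scale_1))


def remaining_scale(target_scale):
    """Scale units still to load once the scale-1 probe is in place."""
    return max(0, target_scale - 1)
```

For example, a hypothetical 16 MiB measurement at scale 1 and a 10 GiB 
target give a scale of 640, leaving 639 more scale units to load.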

I could implement that, although it would still lead to some 
approximation: ISTM that the very-large-scale regression performed by 
Kaarel is significantly more precise than testing at scale 1 (typically 
a few MiB) and extrapolating that to hundreds of GiB.
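Kaarel's actual regression is not reproduced here. As a hypothetical 
illustration only, the sketch below fits a two-term model 
size(s) ~ a*s + b*s*ln(s), following the n ln(n) remark upthread, to 
whole-database sizes measured at several scales, using ordinary least 
squares solved via the 2x2 normal equations.

```python
import math


def fit_size_model(samples):
    """Least-squares fit of size(s) ~ a*s + b*s*ln(s) (assumed model form).

    samples: list of (scale, size_in_bytes) pairs, scale >= 1, covering
    at least two distinct scales. Returns (a, b).
    """
    # Accumulate the normal-equation sums for features x1 = s, x2 = s*ln(s).
    s11 = s12 = s22 = t1 = t2 = 0.0
    for s, size in samples:
        x1, x2 = s, s * math.log(s)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * size
        t2 += x2 * size
    # Solve [[s11, s12], [s12, s22]] @ [a, b] = [t1, t2] with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("need samples at two or more distinct scales")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def predict_size(a, b, scale):
    """Evaluate the fitted model at a given scale."""
    return a * scale + b * scale * math.log(scale)
```

Note that a single scale-1 measurement cannot identify b at all, since 
ln(1) = 0; measurements at larger scales are what pin down the 
super-linear term.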

Maybe it could be done with a kind of open-ended dichotomy, but creating 
and recreating indexes looks like an ugly solution, and what should 
matter is the whole database size, including the tellers & branches 
tables and all indexes, so I'm not convinced. Now, as the tellers & 
branches tables have basically the same structure as accounts, their size 
could simply be scaled by assuming that they incur the same storage per 
row.
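The open-ended dichotomy mentioned above can be sketched as an 
exponential (doubling) search followed by a bisection over the integer 
scale. In this hypothetical sketch `size_at(scale)` stands in for "load 
at this scale and measure"; for illustration it is replaced by a per-row 
estimate that charges tellers and branches rows the same storage as 
accounts rows (pgbench creates 100,000 accounts, 10 tellers and 1 branch 
per scale unit). `bytes_per_row` is an assumed parameter, and indexes are 
ignored.

```python
ROWS_PER_SCALE = 100_000 + 10 + 1  # accounts + tellers + branches per scale unit


def estimated_db_bytes(scale, bytes_per_row):
    """Whole-database estimate assuming every row costs the same storage.

    bytes_per_row is hypothetical; indexes and per-table overhead are ignored.
    """
    return bytes_per_row * ROWS_PER_SCALE * scale


def smallest_scale_reaching(target_bytes, size_at):
    """Open-ended dichotomy on the integer scale.

    size_at(scale) must be non-decreasing in scale. Returns the smallest
    scale >= 1 whose size is >= target_bytes.
    """
    lo, hi = 0, 1
    # Open-ended phase: double hi until the target is bracketed.
    while size_at(hi) < target_bytes:
        lo, hi = hi, hi * 2
    # Bisection phase, invariant: lo is too small, size_at(hi) >= target.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if size_at(mid) < target_bytes:
            lo = mid
        else:
            hi = mid
    return hi
```

A real load-and-measure version would need on the order of log2(s) 
loads, each creating data and indexes again, which is the cost that 
makes this approach look unattractive.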

Anyway, even if I do not like it, it could be better than nothing. The key 
point for me is that if Tom is dead set against the feature, the patch is 
dead anyway.

Tom, would Alvaro's approach be more acceptable to you than a fixed 
formula that would need updating, keeping in mind that such a feature 
implies some level of approximation?

-- 
Fabien.