On Sat, Feb 17, 2018 at 12:22:37PM -0500, Alvaro Hernandez wrote:
>
>
> On 17/02/18 12:17, Tom Lane wrote:
> > Alvaro Hernandez <aht@ongres.com> writes:
> >> On 17/02/18 11:26, Tom Lane wrote:
> >>> Fabien COELHO <coelho@cri.ensmp.fr> writes:
> >>>> Here is an attempt at extending --scale so that it can be given a size.
> >>> I do not actually find this to be a good idea. It's going to be
> >>> platform-dependent, or not very accurate, or both, and thereby
> >>> contribute to confusion by making results less reproducible.
> >>>
> >>> Plus, what do we do if the backend changes table representation in
> >>> some way that invalidates Kaarel's formula altogether? More confusion
> >>> would be inevitable.
> >> Why not then insert a "few" rows, measure the size, truncate the table,
> >> compute the formula and then insert up to the size the user requested?
> >> (Or insert what should be the minimum, scale 1, measure, and extrapolate
> >> what's missing.) It doesn't sound too complicated to me, and targeting a
> >> size is something that I believe is quite useful for users.
> > Then you'd *really* have irreproducible results.
> >
> > regards, tom lane
>
> You also have irreproducible results today, according to your
> criteria. Either you agree on the number of rows but may not agree on
> the size (today), or you agree on the size but may not agree on the
> number of rows. Right now you can only pick the former, while I think
> people would significantly appreciate the latter. If neither is correct,
> let's at least provide the choice.
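
For what it's worth, the extrapolation Alvaro describes (build scale 1,
measure, then scale up) is only a few lines. The sketch below assumes
on-disk size grows linearly with the scale factor, which is exactly the
assumption Tom is questioning: fixed overhead, indexes, alignment and
platform differences all make it approximate. The measured input could
come from something like pg_total_relation_size() or pg_database_size()
after "pgbench -i -s 1"; the function name is hypothetical.

```python
def scale_for_target_size(target_bytes: int, bytes_at_scale_1: int) -> int:
    """Hypothetical helper: extrapolate a pgbench scale factor from a
    single size measurement taken at scale 1.

    Assumes size grows linearly with scale; real on-disk sizes also
    include fixed overhead and vary by platform, so this is approximate.
    """
    if target_bytes <= 0 or bytes_at_scale_1 <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: round up so the result is at least the
    # requested size (and therefore always at least scale 1).
    return (target_bytes + bytes_at_scale_1 - 1) // bytes_at_scale_1
```

Rounding up means a run never ends up smaller than requested, but Tom's
objection still holds: two machines that measure different values for
bytes_at_scale_1 will pick different scales for the same target.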

What if we consider using ASCII (UTF-8?) text file sizes as a reference
point, something independent of the database?

I realize that even if a flat file size can serve as a more consistent
reference across platforms, it doesn't help with accurately predicting
the final data file sizes, because of architecture-specific nuances or
changes in the backend. But it might still be more meaningful to say
"50 gigabytes of bank account data" rather than "10 million rows of bank
accounts" when thinking in terms of size rather than cardinality.
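
To make the flat-file idea a bit more concrete, here is a rough sketch
of how such a reference size could be computed for pgbench_accounts
without touching a database at all. The layout is my assumption, not
anything pgbench produces today: one ASCII line per account with aid,
bid, abalance (0) and filler (84 blanks) separated by tabs, and 100,000
accounts per unit of scale. Function and constant names are
hypothetical.

```python
ROWS_PER_SCALE = 100_000  # pgbench_accounts rows per unit of scale
FILLER_WIDTH = 84         # assumption: char(84) filler written as blanks


def total_digits(n: int) -> int:
    """Number of decimal digits needed to write 1, 2, ..., n."""
    total, low, width = 0, 1, 1
    while low <= n:
        high = min(n, low * 10 - 1)
        total += (high - low + 1) * width
        low *= 10
        width += 1
    return total


def accounts_text_bytes(scale: int,
                        rows_per_scale: int = ROWS_PER_SCALE,
                        filler_width: int = FILLER_WIDTH) -> int:
    """Hypothetical reference size: bytes of pgbench_accounts written as
    ASCII lines aid<TAB>bid<TAB>abalance<TAB>filler<NEWLINE>, abalance = 0.
    """
    rows = rows_per_scale * scale
    aid_bytes = total_digits(rows)
    # Each branch id 1..scale is shared by rows_per_scale accounts.
    bid_bytes = rows_per_scale * total_digits(scale)
    # Per row: "0" for abalance, the filler, three tabs and a newline.
    fixed_bytes = rows * (1 + filler_width + 4)
    return aid_bytes + bid_bytes + fixed_bytes


print(accounts_text_bytes(1))  # 9488895
```

Because it only counts characters, the number comes out the same on
every platform and every backend version, which is the point; how it
relates to the size of the resulting data files is exactly the part it
does not capture.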
Regards,
Mark
--
Mark Wong http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services