Re: estimating # of distinct values - Mailing list pgsql-hackers

From tv@fuzzy.cz
Subject Re: estimating # of distinct values
Msg-id f597bf00a7301a6b2e251caed26fc3d4.squirrel@sq.gransy.com
In response to Re: estimating # of distinct values  (Jim Nasby <jim@nasby.net>)
Responses Re: estimating # of distinct values  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote:
>> 1) Forks are 'per relation' but the distinct estimators are 'per
>>   column' (or 'per group of columns') so I'm not sure whether the file
>>   should contain all the estimators for the table, or if there should
>>   be one fork for each estimator. The former is a bit difficult to
>>   manage, the latter somehow breaks the current fork naming convention.
>
> Yeah, when I looked at the fork stuff I was disappointed to find out
> there's essentially no support for dynamically adding forks. There's two
> other possible uses for that I can think of:
>
> - Forks are very possibly a more efficient way to deal with TOAST than
> having separate tables. There's a fair amount of overhead we pay for the
> current setup.
> - Dynamic forks would make it possible to do a column-store database, or
> at least something approximating one.
>
> Without some research, there's no way to know if either of the above makes
> sense; but without dynamic forks we're pretty much dead in the water.
>
> So I wonder what it would take to support dynamically adding forks...

Interesting ideas, but a bit out of scope. I think I'll go with one fork
containing all the estimators for now, although it might be inconvenient
in some cases. One such case is rebuilding a single estimator with
increased precision: its size changes, so all the other data in the fork
has to be shifted. But this won't be very common (usually all the
estimators will be rebuilt at the same time), and it's actually doable.
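
To illustrate the shifting problem, here is a rough sketch in Python (not
PostgreSQL code). The class EstimatorFork, its directory of
(offset, length) entries and the byte payloads are all made up for the
example; the real on-disk layout is still undecided.

```python
# Hypothetical layout: all estimators of one relation packed back to back
# in a single byte buffer, with a directory mapping column -> (offset, length).
class EstimatorFork:
    def __init__(self):
        self.data = bytearray()
        self.directory = {}  # column name -> (offset, length)

    def add(self, column, payload):
        # Append a new estimator at the end of the buffer.
        self.directory[column] = (len(self.data), len(payload))
        self.data += payload

    def read(self, column):
        offset, length = self.directory[column]
        return bytes(self.data[offset:offset + length])

    def replace(self, column, payload):
        # Rebuild one estimator, possibly with a different size.
        offset, old_length = self.directory[column]
        delta = len(payload) - old_length
        # Slice assignment grows or shrinks the buffer and moves the tail.
        self.data[offset:offset + old_length] = payload
        self.directory[column] = (offset, len(payload))
        # Every estimator stored after this one moves by delta bytes.
        for name, (off, length) in self.directory.items():
            if off > offset:
                self.directory[name] = (off + delta, length)
        return delta


fork = EstimatorFork()
fork.add("a", b"\x01" * 4)
fork.add("b", b"\x02" * 4)
fork.add("c", b"\x03" * 4)
# Rebuild "b" with doubled size; "c" has to move, "a" stays put.
fork.replace("b", b"\x09" * 8)
print(fork.directory)
```

Rebuilding all the estimators at once avoids the shifting entirely, since
the whole buffer is simply rewritten.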

So the most important question is how to intercept the new/updated rows,
and where to store them. I think each backend should maintain its own
private list of new records and forward them only on commit. Does that
sound reasonable?
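
Roughly what I have in mind, as a toy sketch in Python rather than
backend code. SharedEstimator and Backend are hypothetical names, and the
"estimator" here is just an exact set standing in for the real
probabilistic structure:

```python
class SharedEstimator:
    """Stand-in for the shared distinct estimator (an exact set here)."""

    def __init__(self):
        self.values = set()

    def add_many(self, values):
        self.values.update(values)

    def estimate(self):
        return len(self.values)


class Backend:
    """One backend with a private list of values from uncommitted rows."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.pending = []

    def insert(self, value):
        # Only remembered locally; the shared estimator is not touched yet.
        self.pending.append(value)

    def commit(self):
        # Forward the private list only once the transaction commits.
        self.estimator.add_many(self.pending)
        self.pending.clear()

    def rollback(self):
        # Aborted rows never reach the shared estimator.
        self.pending.clear()


shared = SharedEstimator()
b1, b2 = Backend(shared), Backend(shared)
b1.insert(1)
b1.insert(2)
b2.insert(2)
b2.insert(3)
b1.commit()    # values 1 and 2 become visible to the estimator
b2.rollback()  # values 2 and 3 from b2 are discarded
print(shared.estimate())
```

The point is that an aborted transaction never pollutes the estimator,
at the cost of keeping the list in backend-local memory until commit.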

regards
Tomas


