(2013/07/23 20:02), Greg Smith wrote:
> On 7/23/13 2:16 AM, Satoshi Nagayasu wrote:
>> I've been working on new pgstattuple function to allow
>> block sampling [1] in order to reduce block reads while
>> scanning a table. A PoC patch is attached.
>
> Take a look at all of the messages linked in
> https://commitfest.postgresql.org/action/patch_view?id=778
>
> Jaime and I tried to do what you're working on then, including a random
> block sampling mechanism modeled on the stats_target mechanism. We
> didn't do that as part of pgstattuple though, which was a mistake.
>
> Noah created some test cases as part of his thorough review that were
> not computing the correct results. Getting the results correct for all
> of the various types of PostgreSQL tables and indexes ended up being
> much harder than the sampling part. See
> http://www.postgresql.org/message-id/20120222052747.GE8592@tornado.leadboat.com
> in particular for that.
Thanks for the info. I have read the previous discussion.
I'm looking forward to seeing more feedback on this approach,
both on the design and on the performance improvement.
So I have submitted this to the next CF.
>> This new function, pgstattuple2(), samples only 3,000 blocks
>> (which accounts for 24MB) from the table at random, and estimates
>> several parameters of the entire table.
>
> There should be an input parameter to the function for how much sampling
> to do, and if it's possible to make its scale look like ANALYZE's,
> that's helpful too.
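For illustration, a call with an explicit sample size (in blocks) might
look roughly like this; the second argument and the table name are only
hypothetical, since the current PoC has no such parameter and always
samples 3,000 blocks:

    -- hypothetical form with an explicit block-sample size;
    -- the PoC as posted takes no such argument
    SELECT * FROM pgstattuple2('pgbench_accounts', 3000);
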
>
> I have a project for this summer that includes reviving this topic and
> making sure it works on some real-world systems. If you want to work on
> this too, I can easily combine that project into what you're doing.
Yeah, I'm interested in that. Is there anything you can share?
Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp