Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers
From | Qi Huang |
---|---|
Subject | Re: Gsoc2012 idea, tablesample |
Date | |
Msg-id | BAY159-W3319FCEE2FBB7561A1CEE9A33F0@phx.gbl |
In response to | Gsoc2012 idea, tablesample (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Gsoc2012 idea, tablesample |
List | pgsql-hackers |
Hi, Heikki
Thanks for your advice.
I will change my plan accordingly. But I have a few questions.
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as the
> API, instead of extra SQL syntax.
>
1. "This should be written as an extension, using custom functions as the API": could you explain a bit more what this means?
> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be more advanced than that. Quality of the
> sample is important, as is performance. There was also an interesting
> idea of implementing monetary unit sampling.
2. In the plan, I mentioned using optimizer statistics to improve the quality of the sample. I will put more emphasis on that point. I will also read about monetary unit sampling and add the possibility of implementing it to the plan.
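For illustration, here is a minimal sketch of the textbook systematic variant of monetary unit sampling. Unlike "WHERE random() < ?", which gives every row the same chance of being chosen, it selects rows with probability proportional to a numeric "amount" column. The function name `monetary_unit_sample` and the in-memory list of amounts are assumptions made for this example only; a real extension would apply the same idea while scanning heap tuples.

```python
import random
from itertools import accumulate


def monetary_unit_sample(amounts, n, seed=None):
    """Systematic monetary unit sampling (probability proportional to size).

    Hypothetical helper for illustration. The running total of `amounts` is
    cut into n equal intervals and one "monetary unit" is picked at the same
    random offset inside each interval; the row containing a picked unit is
    selected. Up to floating-point round-off, rows with amount 0 are never
    picked and a row at least as large as the interval is always picked.
    Returns sorted, de-duplicated row indexes (at most n of them).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if any(a < 0 for a in amounts):
        raise ValueError("amounts must be non-negative")

    cumulative = list(accumulate(amounts))   # upper end of each row's range
    total = cumulative[-1] if cumulative else 0
    if total == 0:
        return []

    interval = total / n
    rng = random.Random(seed)
    start = rng.random() * interval          # random offset in [0, interval)

    selected = []
    i = 0
    for k in range(n):
        point = start + k * interval
        if point >= total:                   # only reachable through round-off
            break
        # Advance to the row whose range [cumulative[i-1], cumulative[i]) holds the point.
        while cumulative[i] <= point:
            i += 1
        if not selected or selected[-1] != i:
            selected.append(i)
    return selected


# Toy input: six invoice amounts, sample size 3.
invoice_amounts = [120.00, 4.50, 980.00, 15.25, 300.00, 2.10]
print(monetary_unit_sample(invoice_amounts, 3, seed=1))
```

The systematic form needs only one random number and a single pass over the rows in physical order, which seems a reasonable fit for a scan-based implementation; a large row can be hit by more than one point, which is why the result is de-duplicated.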
> Another idea that Robert Haas suggested was to add support for doing a TID
> scan for a query like "WHERE ctid< '(501,1)'". That's not enough work
> for GSoC project on its own, but could certainly be a part of it.
3. I read the replies about using ctid, but I don't quite understand how that would help. A ctid is just the physical location of a row version within the table. If I run "WHERE ctid < '(501,1)'", what actually happens? Could I add this as an optional part of the project? I can look into how to do it if I have enough time.
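A ctid is a (block number, offset) pair, and tid values are ordered by block number first and then by offset within the block. So "WHERE ctid < '(501,1)'" should match every row version on the first 501 heap pages (blocks 0 through 500). As far as I know, a TID scan at this point only handles equality lookups, so such a range condition is evaluated as a filter over a full sequential scan; Robert's suggestion is presumably to let the executor read only the qualifying pages, giving a cheap page-range restriction that a block-level sampling method could build on. The Python sketch below only models the ordering; `Tid` and `ctid_less_than` are hypothetical names, not PostgreSQL APIs.

```python
from typing import NamedTuple


class Tid(NamedTuple):
    """Toy model of a PostgreSQL ctid (hypothetical, for illustration)."""
    block: int   # heap page number, starting at 0
    offset: int  # item number within the page, starting at 1


def ctid_less_than(tids, bound):
    """Model of "WHERE ctid < bound".

    NamedTuple comparison is lexicographic, which mirrors how tid values
    are ordered: by block number first, then by offset within the block.
    """
    return [t for t in tids if t < bound]


# A toy heap: 1000 pages holding 3 row versions each.
heap = [Tid(block, offset) for block in range(1000) for offset in range(1, 4)]
rows = ctid_less_than(heap, Tid(501, 1))

# Every row version on pages 0..500 qualifies; nothing from page 501 onward.
print(len(rows), min(r.block for r in rows), max(r.block for r in rows))
# prints: 1503 0 500
```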
Best Regards and Thanks
Huang Qi Victor
Computer Science Department, National University of Singapore
> Date: Tue, 17 Apr 2012 09:16:29 +0300
> From: heikki.linnakangas@enterprisedb.com
> To: josh@agliodbs.com
> CC: huangqiyx@hotmail.com; pgsql-hackers@postgresql.org; andres@anarazel.de; alvherre@commandprompt.com; neil.conway@gmail.com; daniel@heroku.com; cbbrowne@gmail.com; kevin.grittner@wicourts.gov
> Subject: [HACKERS] Gsoc2012 idea, tablesample
>
> On 24.03.2012 22:12, Joshua Berkus wrote:
> > Qi,
> >
> > Yeah, I can see that. That's a sign that you had a good idea for a project, actually: your idea is interesting enough that people want to debate it. Make a proposal on Monday and our potential mentors will help you refine the idea.
>
> Yep. The discussion withered, so let me try to summarize:
>
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as the
> API, instead of extra SQL syntax.
>
> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be more advanced than that. Quality of the
> sample is important, as is performance. There was also an interesting
> idea of implementing monetary unit sampling.
>
> I think this would be a useful project if those two points are taken
> care of.
>
> Another idea that Robert Haas suggested was to add support for doing a TID
> scan for a query like "WHERE ctid< '(501,1)'". That's not enough work
> for GSoC project on its own, but could certainly be a part of it.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers