Re: [PATCH]Tablesample Submission - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: [PATCH]Tablesample Submission
Date
Msg-id CAP7QgmkMhjoUi3_RetR7nH4ZOm=B_M7BWdAkv5cf-rekYU5H0w@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH]Tablesample Submission  (Qi Huang <huangqiyx@outlook.com>)
Responses Re: [PATCH]Tablesample Submission
Re: [PATCH]Tablesample Submission
List pgsql-hackers
On Tue, Aug 21, 2012 at 8:08 AM, Qi Huang <huangqiyx@outlook.com> wrote:
>> Please add your patch here:
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
> Hi, Robert
>     I added it under "Miscellaneous".
>     https://commitfest.postgresql.org/action/patch_view?id=918
>
>

Patch does not apply cleanly against latest master.  outfuncs.c,
allpath.c and cost.h have rejected parts.  The make check failed in a
lot of cases up to 26 out of 133.  I didn't look into each issue but I
suggest rebasing on the latest master and making sure the regression
test passes.

Some of the patch don't follow our coding standard.  Please adjust
brace positions, for example.  For the header include list and
Makefile, place a new files in alphabetical order.

The patch doesn't include any documentation.  Consider add some doc
patch for such a big feature like this.

You should update kwlist.h for REPEATABLE but I'm not sure if
REPEATABLE should become a reserved keyword yet.

I don't see why you created T_TableSampleInfo.  TableSampleInfo looks
fine with a simple struct rather than a Node.

If we want to disable a cursor over a sampling table, we should check
it in the parser not the planner.  In the wiki page, one of the TODOs
says about cursor support, but how much difficult is it?  How does the
standard say?

s/skiped/skipped/

Don't we need to reset seed on ExecReScanSampleScan?  Should we add a
new executor node SampleScan?  It seems everything about random
sampling is happening under heapam.

It looks like BERNOULLI allocates heap tuple array beforehand, and
copy all the tuples into it.  This doesn't look acceptable when you
are dealing with a large number of rows in the table.

As wiki says, BERNOULLI relies on the statistics of the table, which
doesn't sound good to me.  Of course we could say this is our
restriction and say good-bye to users who hadn't run ANALYZE first,
but it is too hard for a normal users to use it.  We may need
quick-and-rough count(*) for this.

That is pretty much I have so far.  I haven't read all the code nor
the standard, so I might be wrong somewhere.

Thanks,
-- 
Hitoshi Harada



pgsql-hackers by date:

Previous
From: "Etsuro Fujita"
Date:
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Next
From: Fujii Masao
Date:
Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown