Home > mailing lists

Re: [PATCH]Tablesample Submission - Mailing list pgsql-hackers

From	Hitoshi Harada
Subject	Re: [PATCH]Tablesample Submission
Date	September 18, 2012 12:32:55
Msg-id	CAP7QgmkMhjoUi3_RetR7nH4ZOm=B_M7BWdAkv5cf-rekYU5H0w@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH]Tablesample Submission (Qi Huang <huangqiyx@outlook.com>)
Responses	Re: [PATCH]Tablesample Submission Re: [PATCH]Tablesample Submission
List	pgsql-hackers

Tree view

On Tue, Aug 21, 2012 at 8:08 AM, Qi Huang <huangqiyx@outlook.com> wrote:
>> Please add your patch here:
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
> Hi, Robert
>     I added it under "Miscellaneous".
>     https://commitfest.postgresql.org/action/patch_view?id=918
>
>

Patch does not apply cleanly against latest master.  outfuncs.c,
allpath.c and cost.h have rejected parts.  The make check failed in a
lot of cases up to 26 out of 133.  I didn't look into each issue but I
suggest rebasing on the latest master and making sure the regression
test passes.

Some of the patch don't follow our coding standard.  Please adjust
brace positions, for example.  For the header include list and
Makefile, place a new files in alphabetical order.

The patch doesn't include any documentation.  Consider add some doc
patch for such a big feature like this.

You should update kwlist.h for REPEATABLE but I'm not sure if
REPEATABLE should become a reserved keyword yet.

I don't see why you created T_TableSampleInfo.  TableSampleInfo looks
fine with a simple struct rather than a Node.

If we want to disable a cursor over a sampling table, we should check
it in the parser not the planner.  In the wiki page, one of the TODOs
says about cursor support, but how much difficult is it?  How does the
standard say?

s/skiped/skipped/

Don't we need to reset seed on ExecReScanSampleScan?  Should we add a
new executor node SampleScan?  It seems everything about random
sampling is happening under heapam.

It looks like BERNOULLI allocates heap tuple array beforehand, and
copy all the tuples into it.  This doesn't look acceptable when you
are dealing with a large number of rows in the table.

As wiki says, BERNOULLI relies on the statistics of the table, which
doesn't sound good to me.  Of course we could say this is our
restriction and say good-bye to users who hadn't run ANALYZE first,
but it is too hard for a normal users to use it.  We may need
quick-and-rough count(*) for this.

That is pretty much I have so far.  I haven't read all the code nor
the standard, so I might be wrong somewhere.

Thanks,
-- 
Hitoshi Harada

pgsql-hackers by date:

From: "Etsuro Fujita"
Date: 18 September 2012, 11:46:00
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

From: Fujii Masao
Date: 18 September 2012, 15:32:49
Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

Re: [PATCH]Tablesample Submission - Mailing list pgsql-hackers

Previous

Next