Re: [GSoC 2018] Proposal Draft - Mailing list pgsql-hackers

From Kefan Yang
Subject Re: [GSoC 2018] Proposal Draft
Date
Msg-id CAF448YLzU4vf4GJMiQw7-=DNmWfuSzdgTOWqEm33DBuMtX1S2Q@mail.gmail.com
Whole thread Raw
In response to Re: [GSoC 2018] Proposal Draft  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hello,

Thanks again for all the feedback! They have really helped me a lot, and I've made some changes in my proposal accordingly.

The major updates in this version are in the timeline and implementation sections. I feel there's still a lot to improve, so I would appreciate any comment:)

I decide to add the new hashing table into this proposal, and I am still working on it. This part is still quite incomplete in the proposal.

Regards,

Kefan

2018-03-18 9:39 GMT-07:00 Andrey Borodin <x4mmm@yandex-team.ru>:
Hi Kefar!

18 марта 2018 г., в 5:34, Kefan Yang <starordust@gmail.com> написал(а):

I am Kefan Yang, a third-year Computing Science student from Simon Fraser University, Canada. I am very interested in the sorting algorithm benchmarking and implementation issue you mentioned on the idealist of Google Summer of Code 2018. 

I am currently working on my proposal, but I am a little bit worried about whether I am on the right track. I've encountered some problems in the implementation of the benchmark of those sorting algorithms. I briefly described my current ideas in the proposal draft, but they may change a bit as I read more research papers. I sincerely hope to get some feedback from you to improve my proposal, especially for the implementation part.

I've attached a proposal draft to this email. It's a brief one but I guess it's enough to show my current ideas:)

Peter have already added some context, I'll comment a bit on contents of your proposal.

I expect implementation of both timsort (dual pivot qsort) and introsort (depth-limited qsort). MySQL recently switched to introsort (they just used STL version instead of his own qsort).
Sorting routines are important part of many DBMS parts. But, indeed, sorting of actual user data is related to qsort very distantly - often DBMS deals with external sorting.

There is built-in Postgres way to benchmark - pgbench performance. But I do not think that it's output will be statistically conclusive. I think that benchmarking will be at least as time consuming as implementation itself.

BTW I think it is good to add Postgres hash table implementation to this project too. Currently, we use Larson's linear hashing (HTAB struct, dynahash.c), which is somewhat seldom used nowadays. It would be good to tests cuckoo hashing and some open addressing against HTAB interface.

Please do not forget to add to your proposal review of other's patches. If you will produce some useful output in a patch form and you will want it committed you will have to review someone else patch. This is required by Postgres dev process.

Best regards, Andrey Borodin.

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Question about WalSndWriteData
Next
From: Thomas Munro
Date:
Subject: Re: JIT compiling with LLVM v12.2