Re: [HACKERS] [GSoC] Personal presentation and request for clarification - Mailing list pgsql-hackers

From João Miguel Afonso
Subject Re: [HACKERS] [GSoC] Personal presentation and request for clarification
Date
Msg-id VI1PR0801MB20316676FC26FB842E860198C13C0@VI1PR0801MB2031.eurprd08.prod.outlook.com
In response to Re: [HACKERS] [GSoC] Personal presentation and request for clarification  (Andrew Borodin <borodin@octonica.com>)
List pgsql-hackers

> From: Robert Haas <robertmhaas@gmail.com>
> Sent: 09 March 2017 01:09
>
>> The project that most caught my eye was on "Implementing push-based query
>> executor".
>> Although it completely fits my capabilities and current research, I have
>> some concerns about "The ability to understand and modify PostgreSQL executor
>> code", as I have not had enough time to understand the scale of the
>> changes involved.
> They are formidable.

I want to contribute valuable work, so I will focus on my second
choice: "Sorting algorithms benchmark and implementation". Once I am
more familiar with the PostgreSQL project, I may give the executor
project a try.


> From: pgsql-hackers-owner@Postgresql.org <pgsql-hackers-owner@Postgresql.org> on behalf of Kevin Grittner <kgrittn@gmail.com>
> Sent: 17 March 2017 13:57
>
> Some ideas for desirable content:
>   - A resume or CV of the student, including any prior GSoC work
>   - Their reasons for wanting to participate
>   - What else they have planned for the summer, and what their time
>     commitment to the GSoC work will be
>   - A clear statement that there will be no intellectual property
>     problems with the work they will be doing -- that the PostgreSQL
>     community will be able to use their work without encumbrances
>     (e.g., there should be no agreements related to prior or
>     ongoing work which might assign the rights to the work they do
>     to someone else)
>   - A description of what they will do, and how
>   - Milestones with dates
>   - What they consider to be the test that they have successfully
>     completed the project

Using the information posted HERE and Kevin Grittner's suggestions,
I would like to start writing my proposal and begin working on the
project.

Over the last two weeks I have been using profiling tools such as
dstat, top and iostat on my university's cluster, running NASA's
"NAS Parallel Benchmarks" package. Next, I will start another
academic project using DTrace on a Solaris machine.

I have permanent access to the SeARCH6 cluster (description HERE).
It is not especially powerful, but it is quite heterogeneous: it is
composed of many generations of processors, including both Intel
many-core solutions (KNC, and the unlisted KNL), which I think is
good for testing the algorithms in many different scenarios.

I do not have permission to install new software, so I probably
cannot use dedicated benchmarking software, but the cluster can still
be used to test the algorithms on their own, using selected data sets.
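Testing an algorithm on its own needs nothing beyond a C compiler. A
minimal sketch of such a harness follows. It assumes the C library's
qsort() as a stand-in for the algorithm under test and data sets
generated in memory; the function names are hypothetical, and this is
not PostgreSQL's own sort code.

```c
/*
 * Minimal standalone sort benchmark sketch (hypothetical names).
 * Assumption: the C library's qsort() stands in for the algorithm under
 * test; data sets are generated in memory. Needs only a C compiler.
 */
#include <stdlib.h>
#include <time.h>

/* Comparator for ints; avoids the overflow risk of "return a - b". */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Pseudo-random data from a fixed seed, so runs are repeatable. */
static void fill_random(int *v, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        v[i] = rand();
}

/* Already-sorted data: a best or worst case for some algorithms. */
static void fill_ascending(int *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = (int) i;
}

/* Returns 1 if v[0..n) is in non-decreasing order, 0 otherwise. */
static int is_sorted_int(const int *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i])
            return 0;
    return 1;
}

/* Sorts v in place; returns the processor time used, in milliseconds. */
static double time_sort_ms(int *v, size_t n)
{
    clock_t t0 = clock();

    qsort(v, n, sizeof *v, cmp_int);
    return (double) (clock() - t0) * 1000.0 / CLOCKS_PER_SEC;
}
```

A driver would fill an array with each data set, call time_sort_ms()
and check the result with is_sorted_int(). Note that clock() measures
processor time, so a wall-clock timer would be needed once parallel
sorts are benchmarked.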

My point here is just to mention knowledge and resources that I may
be able to use in the project. More information about my motivation
and skills can be found HERE.

In any case, I would like to accomplish some small goals before the
April 23 deadline, so that I can spot the trickier parts of the
project and be prepared for them.

As I will have classes and assessments in June, and possibly an
internship at the University of Texas in July, I will have to work
on both at the same time. So I made a schedule of what I think I can
do, leaving August almost free to explore the project further
(micro-optimisations, ...) or to catch up in case something does not
go as expected.

I would appreciate it if you could review it and advise me if I am
heading in the wrong direction.

Schedule:

Before April 3:

project-specific work:
- read all the suggested papers
- implement all the sorting algorithms (functional but
 unoptimised versions)
- validate core ideas with the community
integration work:
- read some of the PostgreSQL documentation and source code
- read the HACKERS mailing list

April 3 - May 30:

project-specific work:
- discuss possible benchmarks and optimisation opportunities
- do a simple benchmark of the sort currently used
integration work:
- deepen my understanding of the PostgreSQL project
- keep reading the mailing list and clarify any doubts

May 30 - June 26 (Coding officially begins!):

- set up the final benchmark environment
- properly benchmark the current sort
- macro-optimise all the implemented sorts and define performance
 goals
- test the new code against the current implementation

June 26 - July 24:

micro-optimise all the algorithms:
- study cache/memory issues, vectorisation, ...
- first steps towards parallelism
do a full profile of the current work:
- CPU and memory usage
- execution time
- number of operations (per second)

July 24 - August 29:

- optimise parallel solutions
- discuss some possible optimisations and test them
- revise and document all the code
- produce a useful report for future reference

After August 29:

- stay in touch and look for another project that fits
 my skills


A small aside:

I read this INFO, but I have been struggling to use Internet-style
quoting in Outlook's web client and ended up doing it by hand. I have
never used a development mailing list before, so any tips would be
valuable.
