1) Is there anyone in the PG community who would be interested in such a project and could be a mentor?
2) These two points share a general idea - to simplify work with large amounts of data from different sources - but maybe it would be better to focus on a single task?
I spent a lot of time on the implementation of @1 - maybe I can find a patch somewhere. Both tasks have something in common - you have to divide the import into multiple batches.
The patch is in /dev/null :( My implementation was based on subtransactions of 1000 rows each. When some check failed, I rolled back the subtransaction and imported every row from that block in its own subtransaction. It was a prototype - I didn't look for a smarter implementation.
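
To make that strategy concrete, here is a minimal client-side sketch, assuming Python with psycopg2 and a hypothetical target_table (the real project would live inside COPY itself, but SAVEPOINTs are the SQL-level face of subtransactions, so the control flow is the same): try a batch of 1000 rows in one subtransaction, and on failure roll it back and retry each row in its own subtransaction, skipping only the bad rows.

import psycopg2

BATCH_SIZE = 1000

def insert_row(cur, row):
    # Hypothetical target table and columns, for illustration only.
    cur.execute("INSERT INTO target_table (a, b) VALUES (%s, %s)", row)

def import_rows(conn, rows):
    cur = conn.cursor()
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        cur.execute("SAVEPOINT batch_sp")
        try:
            for row in batch:
                insert_row(cur, row)
            cur.execute("RELEASE SAVEPOINT batch_sp")
        except psycopg2.Error:
            # Some check failed somewhere in the batch: undo the whole
            # batch, then retry each row in its own subtransaction.
            cur.execute("ROLLBACK TO SAVEPOINT batch_sp")
            for row in batch:
                cur.execute("SAVEPOINT row_sp")
                try:
                    insert_row(cur, row)
                    cur.execute("RELEASE SAVEPOINT row_sp")
                except psycopg2.Error:
                    # Skip only the offending row.
                    cur.execute("ROLLBACK TO SAVEPOINT row_sp")
    conn.commit()

The trade-off is the usual one: batches are cheap on the happy path, and the per-row fallback is paid only for batches that actually contain bad rows.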
3) Is it realistic to mostly finish both parts during the 3+ months of almost full-time work, or am I being too presumptuous?
I think it is possible - I am not sure about all the details, but a basic implementation can be done in 3 months.
For some data, some checks depend on row order - that can be a problem in parallel processing - you should define the corner cases; a toy illustration follows.
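
As a sketch of one such corner case (a hypothetical rule, plain Python): suppose a check requires every row's id to be strictly greater than the previous one. It passes when rows arrive in file order, but two parallel workers committing alternating chunks can interleave rows and violate it, even though the input file itself is valid.

def check_sequential(stream):
    # Hypothetical order-dependent check: ids must strictly increase.
    last = None
    for rid in stream:
        if last is not None and rid <= last:
            raise ValueError("out-of-order row %s after %s" % (rid, last))
        last = rid

check_sequential([1, 2, 3, 4, 5, 6])    # OK: rows arrive in file order

# Two workers committing alternating chunks ([1, 2, 3] and [4, 5, 6])
# can interleave like this, and the same check now fails:
# check_sequential([1, 4, 2, 5, 3, 6])  # raises ValueError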
I would very much appreciate any comments and criticism.
P.S. I know about the very interesting ready-made project ideas from the PG community (https://wiki.postgresql.org/wiki/GSoC_2017), but it is always more interesting to solve your own problems, issues, and questions, which are the product of your own experience with software. That's why I dare to propose my own project.
P.P.S. A few words about me: I'm a PhD student in theoretical physics from Moscow, Russia, and have been highly involved in software development since 2010. I believe I have good skills in Python, Ruby, JavaScript, MATLAB, C, and Fortran development, and a basic understanding of algorithm design and analysis.