Thread: GSoC 2014 - mentors, students and admins

GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
Hi all,

Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.

I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.

And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?

Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?

Thanks

Thom


Re: GSoC 2014 - mentors, students and admins

From
Atri Sharma
Date:



On Tue, Jan 28, 2014 at 11:16 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
 


On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,

Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.

I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.

And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?

Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?
 

Hi,

I would like to bring up additions to the MADlib algorithms again this year.

Also, some work on the foreign table constraints could be helpful.


Hi,

Also, can a project on an extension count as a PostgreSQL project for GSoC 2014?

I was thinking of having some support for writable FDW in JDBC_FDW, if possible.

Regards,

Atri
--
Regards,
 
Atri
l'apprenant

Re: GSoC 2014 - mentors, students and admins

From
Atri Sharma
Date:



On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,

Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.

I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.

And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?

Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?
 

Hi,

I would like to bring up additions to the MADlib algorithms again this year.

Also, some work on the foreign table constraints could be helpful.

Regards,

Atri


Re: GSoC 2014 - mentors, students and admins

From
Stephen Frost
Date:
Thom,

* Thom Brown (thom@linux.com) wrote:
> Application to Google Summer of Code 2014 can be made as of next
> Monday (3rd Feb), and then there will be a 12 day window in which to
> submit an application.

This is just for PG to be a participating organization, right?  There's
a while before mentors and students get involved, as I understand it.

> I'd like to gauge interest from both mentors and students as to
> whether we'll want to do this.

Yes.

> And I'd be fine with being admin again this year, unless there's
> anyone else who would like to take up the mantle?

Having you do it works for me. :)

> Who would be up for mentoring this year?  And are there any project
> ideas folk would like to suggest?

I'm interested in mentoring and, unlike previous years, I've been
collecting a personal list of things that I'd like to see worked on for
PG which could be GSoC projects and will provide such in the next few
days to this list (unless there's a different list that people want such
posted to..?).

    Thanks,

        Stephen


Re: GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
On 28 January 2014 19:43, Stephen Frost <sfrost@snowman.net> wrote:
> Thom,
>
> * Thom Brown (thom@linux.com) wrote:
>> Application to Google Summer of Code 2014 can be made as of next
>> Monday (3rd Feb), and then there will be a 12 day window in which to
>> submit an application.
>
> This is just for PG to be a participating organization, right?  There's
> a while before mentors and students get involved, as I understand it.

Yes, correct.  Students and mentors don't need to be signed up until April.

>> Who would be up for mentoring this year?  And are there any project
>> ideas folk would like to suggest?
>
> I'm interested in mentoring and, unlike previous years, I've been
> collecting a personal list of things that I'd like to see worked on for
> PG which could be GSoC projects and will provide such in the next few
> days to this list (unless there's a different list that people want such
> posted to..?).

That's great.  I don't see any problem with posting suggestions here,
although I'd suggest refraining from going in-depth as that can come
later.  If there's enough interest and agreement, we'll go ahead and
apply.

Thom


Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Heikki Linnakangas
Date:
On 01/28/2014 07:34 PM, Thom Brown wrote:
> And I'd be fine with being admin again this year, unless there's
> anyone else who would like to take up the mantle?

Please do, thanks!

> Who would be up for mentoring this year?

I can mentor.

- Heikki


Re: GSoC 2014 - mentors, students and admins

From
David Fetter
Date:
On Tue, Jan 28, 2014 at 05:34:21PM +0000, Thom Brown wrote:
> Hi all,
>
> Application to Google Summer of Code 2014 can be made as of next
> Monday (3rd Feb), and then there will be a 12 day window in which to
> submit an application.
>
> I'd like to gauge interest from both mentors and students as to
> whether we'll want to do this.
>
> And I'd be fine with being admin again this year, unless there's
> anyone else who would like to take up the mantle?

Thanks for your hard work administering last year, and thanks even
more for taking this on in light of that experience :)

> Who would be up for mentoring this year?  And are there any project
> ideas folk would like to suggest?

I'd be delighted to mentor.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Alexander Korotkov
Date:
Hi!

On Tue, Jan 28, 2014 at 9:34 PM, Thom Brown <thom@linux.com> wrote:
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
 
Thanks for your work. I would like to see you as admin this year again.

Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?
 
I would like to be a mentor.

------
With best regards,
Alexander Korotkov.

Re: GSoC 2014 - mentors, students and admins

From
Andreas 'ads' Scherbaum
Date:
Hi,

On 01/28/2014 06:46 PM, Atri Sharma wrote:
> On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com
> <mailto:thom@linux.com>> wrote:
>
>     Hi all,
>
>     Application to Google Summer of Code 2014 can be made as of next
>     Monday (3rd Feb), and then there will be a 12 day window in which to
>     submit an application.
>
>     I'd like to gauge interest from both mentors and students as to
>     whether we'll want to do this.
>
>     And I'd be fine with being admin again this year, unless there's
>     anyone else who would like to take up the mantle?
>
>     Who would be up for mentoring this year?  And are there any project
>     ideas folk would like to suggest?
>
> I would like to bring up the addition to MADLIB algorithms again this year.

I've spoken with the MADlib team at Pivotal and they are ok to support
this proposal. Therefore I offer to mentor this.


Regards,

--
                Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


Re: GSoC 2014 - mentors, students and admins

From
Atri Sharma
Date:
Awesome. I can be an assistant mentor for this one if possible, or I could mentor some other project.

On Tuesday, February 25, 2014, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:

Hi,

On 01/28/2014 06:46 PM, Atri Sharma wrote:
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com
<mailto:thom@linux.com>> wrote:

    Hi all,

    Application to Google Summer of Code 2014 can be made as of next
    Monday (3rd Feb), and then there will be a 12 day window in which to
    submit an application.

    I'd like to gauge interest from both mentors and students as to
    whether we'll want to do this.

    And I'd be fine with being admin again this year, unless there's
    anyone else who would like to take up the mantle?

    Who would be up for mentoring this year?  And are there any project
    ideas folk would like to suggest?

I would like to bring up the addition to MADLIB algorithms again this year.

I've spoken with the MADlib team at Pivotal and they are ok to support this proposal. Therefore I offer to mentor this.


Regards,

--
                                Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project



Re: GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:

Hi,


On 01/28/2014 06:46 PM, Atri Sharma wrote:
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com
<mailto:thom@linux.com>> wrote:

    Hi all,

    Application to Google Summer of Code 2014 can be made as of next
    Monday (3rd Feb), and then there will be a 12 day window in which to
    submit an application.

    I'd like to gauge interest from both mentors and students as to
    whether we'll want to do this.

    And I'd be fine with being admin again this year, unless there's
    anyone else who would like to take up the mantle?

    Who would be up for mentoring this year?  And are there any project
    ideas folk would like to suggest?

I would like to bring up the addition to MADLIB algorithms again this year.

I've spoken with the MADlib team at Pivotal and they are ok to support this proposal. Therefore I offer to mentor this.

Are there any more prospective mentors?  We'll want some folk to act as back-up mentors too to ensure projects can still be completed should any mentor become unavailable.

--
Thom

Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
David Fetter
Date:
On Thu, Feb 27, 2014 at 07:54:13PM +0000, Thom Brown wrote:
> On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>wrote:
> > I've spoken with the MADlib team at Pivotal and they are ok to
> > support this proposal. Therefore I offer to mentor this.
>
> Are there any more prospective mentors?  We'll want some folk to act
> as back-up mentors too to ensure projects can still be completed
> should any mentor become unavailable.

For MADlib, no.  Are you asking for mentors in general?

Cheers,
David.


Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
On Thu, Feb 27, 2014 at 07:54:13PM +0000, Thom Brown wrote:
> On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>wrote:
> > I've spoken with the MADlib team at Pivotal and they are ok to
> > support this proposal. Therefore I offer to mentor this.
>
> Are there any more prospective mentors?  We'll want some folk to act
> as back-up mentors too to ensure projects can still be completed
> should any mentor become unavailable.

For MADlib, no.  Are you asking for mentors in general?

Ah yes, I should clarify.  Yes, mentors in general.

--
Thom

Re: GSoC 2014 - mentors, students and admins

From
Greg Stark
Date:
On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <thom@linux.com> wrote:
> Who would be up for mentoring this year?  And are there any project
> ideas folk would like to suggest?

I mentored in the past and felt I didn't do a very good job because I
didn't really understand the project the student was working on.

There's precisely one project that I feel I would be competent to
mentor at this point. Making hash indexes WAL recoverable. This is
something that's easy to define the scope of and easy to determine if
the student is on track and easy to measure when finished. It's
something where as far as I can tell all the mentor work will be
purely technical advice.

Also it's something the project really really needs and is perfectly
sized for a GSOC project IMHO. Also it's a great project for a student
who might be interested in working on Postgres in the future since it
requires learning all our idiosyncratic build and source conventions
but doesn't require huge or controversial architectural changes.

I fear a number of items in the Wiki seem unrealistically large
projects for GSOC IMNSHO.

--
greg


Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Karol Trzcionka
Date:
On 27.02.2014 22:25, Thom Brown wrote:
On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
For MADlib, no.  Are you asking for mentors in general?

Ah yes, I should clarify.  Yes, mentors in general.
In general I can help, but I'm not sure I'm not still too new to pgsql ;) However, having been a GSoC student, I can try "the other side".
Regards,
Karol

Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Tan Tran
Date:
Hi Greg, pgsql-advocacy, and pgsql-hackers,

I'm interested in doing my GSoC project on this idea. I'm new to indexing and WAL, which I haven't encountered in my classes, but it sounds interesting and valuable to PostgreSQL. So here's my draft proposal. Would you mind giving your opinion and corrections? With your help I'll add some technical detail to my plans.

Thanks,
Tan Tran

1. Introduction
In write-ahead logging (WAL), every modification to the database is first recorded in a write-ahead log, and that log record must reach disk before the modified data page does; the data pages themselves are written out lazily, at the latest at the next checkpoint. This method saves I/O operations, enables continuous archiving for backups, and, in the case of a crash, guarantees that every committed transaction can be recovered by replaying the log from the last checkpoint onward. In PostgreSQL’s implementation, WAL records are written to the XLog, which is divided into files (“segments”, 16MB each by default) that together form a sequential history of changes. Records are continually appended to the latest segment; once a checkpoint completes, segments older than its redo point are no longer needed for crash recovery and can be recycled, or archived if WAL archiving is enabled. Internally, a suite of XLog structures and functions interfaces with the various resource managers so that each can log enough data to restore its changes (“redo”) in case of failure.
Another PostgreSQL feature is the ability to create indexes on columns other than the primary key; for example, on the LastName of a Person even though the primary key is ID. These indexes speed up row lookup. PostgreSQL currently supports five index types: B-tree, hash, GiST, SP-GiST, and GIN. All of them except hash are WAL-logged and therefore crash-recoverable.
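A toy Python sketch of the write-ahead rule and redo described above may help. It is a drastic simplification under stated assumptions: pages, records, and LSNs are plain Python objects (not PostgreSQL's real XLog structures), and each record just appends one item to one page. It shows why replay only needs the log from the checkpoint's redo point onward. It also shows how comparing a page's LSN with the record's LSN (similar in spirit to the check PostgreSQL redo routines make) keeps replay safe to repeat.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0                                   # LSN of the last record applied to this page
    items: list = field(default_factory=list)


@dataclass
class WalRecord:
    lsn: int
    page_no: int
    item: str                                      # toy payload: one item appended to the page


class ToyWal:
    def __init__(self):
        self.log = []                              # records in LSN order, assumed already durable
        self.next_lsn = 1

    def append(self, page_no, item):
        rec = WalRecord(self.next_lsn, page_no, item)
        self.log.append(rec)
        self.next_lsn += 1
        return rec


def apply(page, rec):
    page.items.append(rec.item)
    page.lsn = rec.lsn


def modify(wal, buffers, page_no, item):
    rec = wal.append(page_no, item)                # 1. write the log record first...
    apply(buffers[page_no], rec)                   # 2. ...then change the in-memory page


def redo(wal, disk_pages, redo_start_lsn):
    """Replay records from the checkpoint's redo point, skipping ones already on disk."""
    for rec in wal.log:
        if rec.lsn < redo_start_lsn:
            continue
        page = disk_pages[rec.page_no]
        if page.lsn >= rec.lsn:                    # page already reflects this record
            continue
        apply(page, rec)


wal = ToyWal()
buffers = {0: Page()}
modify(wal, buffers, 0, "a")
disk = copy.deepcopy(buffers)                      # checkpoint: dirty pages flushed to disk
checkpoint_redo = wal.next_lsn                     # replay starts after the checkpoint
modify(wal, buffers, 0, "b")
modify(wal, buffers, 0, "c")                       # crash: these page changes never hit disk
redo(wal, disk, checkpoint_redo)
print(disk[0].items)                               # ['a', 'b', 'c']
```

Running redo a second time, even from LSN 1, changes nothing, because the page LSN is already at least as new as every record.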

2. Proposal
As a GSoC student, I will implement WAL recovery of hash indexes using the other index types’ WAL code as a guide. Roughly, I will:
- Devise a way to store and retrieve hashing data within the XLog data structures. 
- In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?).
- Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types.
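The branching step in the plan above could be prototyped along these lines. Everything here is hypothetical: the record types, payload fields, and the dict-of-lists "index" are invented for illustration and are not PostgreSQL structures. Existing redo routines such as btree_redo() receive an XLogRecord and switch on its info bits in a broadly similar way.

```python
from enum import Enum, auto


class HashXlogType(Enum):
    # Hypothetical record types, invented for this sketch.
    CREATE_INDEX = auto()
    INSERT = auto()
    DELETE = auto()
    SPLIT = auto()


def redo_create_index(index, rec):
    index.clear()
    index.update({b: [] for b in range(rec["nbuckets"])})


def redo_insert(index, rec):
    index[rec["bucket"]].append(rec["key"])


def redo_delete(index, rec):
    index[rec["bucket"]].remove(rec["key"])


def redo_split(index, rec):
    old, new, moved = rec["old_bucket"], rec["new_bucket"], rec["moved_keys"]
    index[old] = [k for k in index[old] if k not in moved]
    index.setdefault(new, []).extend(moved)


REDO_HANDLERS = {
    HashXlogType.CREATE_INDEX: redo_create_index,
    HashXlogType.INSERT: redo_insert,
    HashXlogType.DELETE: redo_delete,
    HashXlogType.SPLIT: redo_split,
}


def hash_redo_sketch(index, rec):
    """Dispatch one (hypothetical) hash WAL record to its redo handler."""
    handler = REDO_HANDLERS.get(rec["type"])
    if handler is None:
        raise RuntimeError(f"hash_redo: unknown record type {rec['type']}")
    handler(index, rec)


index = {}
log = [
    {"type": HashXlogType.CREATE_INDEX, "nbuckets": 2},
    {"type": HashXlogType.INSERT, "bucket": 0, "key": 10},
    {"type": HashXlogType.INSERT, "bucket": 0, "key": 12},
    {"type": HashXlogType.SPLIT, "old_bucket": 0, "new_bucket": 2, "moved_keys": [10]},
    {"type": HashXlogType.DELETE, "bucket": 0, "key": 12},
]
for rec in log:
    hash_redo_sketch(index, rec)
print(index)  # {0: [], 1: [], 2: [10]}
```

A table of handlers keeps each redo operation in its own function, matching the plan of writing one branch per operation.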

3. Benefits
Hash index lookup is O(1), which is asymptotically faster than the O(log n) search of a B-tree. Also, a hash operator class needs only a single support function (the hash function), whereas GIN and GiST operator classes require several. Hash indexes are therefore attractive for columns that are only ever searched by equality; they cannot serve inequality or range searches. However, two things currently stand in the way of their widespread use. From the PostgreSQL documentation,
“Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.”
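My project would solve the first problem. Since streaming and file-based replication both work by shipping WAL, WAL-logging hash indexes should also go a long way toward solving the second; I would like to stay on afterwards to make sure it does.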
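To illustrate the equality-only point, here is a toy comparison (not PostgreSQL's implementation). The "hash index" is just a list of buckets chosen by Python's hash() modulo an assumed bucket count, and a sorted list stands in for a B-tree. An equality probe touches one bucket, but because buckets keep no key order, a range query has to visit all of them.

```python
import bisect

keys = [42, 7, 19, 3, 88, 61, 25]

# Toy hash index: bucket = hash(key) % NBUCKETS (Python's hash, assumed for illustration).
NBUCKETS = 4
buckets = [[] for _ in range(NBUCKETS)]
for k in keys:
    buckets[hash(k) % NBUCKETS].append(k)


def hash_lookup(key):
    # Equality probe: inspect exactly one bucket, however many keys exist.
    return key in buckets[hash(key) % NBUCKETS]


def hash_range(lo, hi):
    # Buckets keep no key order, so a range query must scan every bucket.
    return sorted(k for b in buckets for k in b if lo <= k <= hi)


# Toy B-tree stand-in: a sorted list searched by bisection (O(log n) comparisons).
sorted_keys = sorted(keys)


def btree_lookup(key):
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def btree_range(lo, hi):
    # Ordered keys let a range scan jump straight to the first match.
    return sorted_keys[bisect.bisect_left(sorted_keys, lo):bisect.bisect_right(sorted_keys, hi)]


print(hash_lookup(19), hash_lookup(20))   # True False
print(btree_range(10, 60))                # [19, 25, 42]
```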

To be written: Quantifiable Results, Schedule, Completeness Criteria, Bio


On Feb 28, 2014, at 6:21 AM, Greg Stark <stark@mit.edu> wrote:

On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <thom@linux.com> wrote:
Who would be up for mentoring this year?  And are there any project
ideas folk would like to suggest?

I mentored in the past and felt I didn't do a very good job because I
didn't really understand the project the student was working on.

There's precisely one project that I feel I would be competent to
mentor at this point. Making hash indexes WAL recoverable. This is
something that's easy to define the scope of and easy to determine if
the student is on track and easy to measure when finished. It's
something where as far as I can tell all the mentor work will be
purely technical advice.

Also it's something the project really really needs and is perfectly
sized for a GSOC project IMHO. Also it's a great project for a student
who might be interested in working on Postgres in the future since it
requires learning all our idiosyncratic build and source conventions
but doesn't require huge or controversial architectural changes.

I fear a number of items in the Wiki seem unrealistically large
projects for GSOC IMNSHO.

--
greg


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Tan Tran
Date:
Earlier I posted an email to this thread that I realize "hijacked" the discussion. Please continue replying here instead.
 
On Feb 28, 2014, at 6:59 AM, Karol Trzcionka <karlikt@gmail.com> wrote:

On 27.02.2014 22:25, Thom Brown wrote:
On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
For MADlib, no.  Are you asking for mentors in general?

Ah yes, I should clarify.  Yes, mentors in general.
In general I can help, but I'm not sure I'm not still too new to pgsql ;) However, having been a GSoC student, I can try "the other side".
Regards,
Karol

GSoC on WAL-logging hash indexes

From
Tan Tran
Date:
Hi all,

Earlier I posted this in the wrong thread. Please excuse the double posting.

Tan Tran

Begin forwarded message:

From: Tan Tran <tankimtran@gmail.com>
Subject: Re: [HACKERS] GSoC 2014 - mentors, students and admins
Date: March 2, 2014 at 5:03:14 AM PST
To: Greg Stark <stark@mit.edu>
Cc: pgsql-advocacy <pgsql-advocacy@postgresql.org>, PostgreSQL-development <pgsql-hackers@postgresql.org>



Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Robert Haas
Date:
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran <tankimtran@gmail.com> wrote:
> 2. Proposal
> As a GSoC student, I will implement WAL recovery of hash indexes using the
> other index types' WAL code as a guide. Roughly, I will:
> - Devise a way to store and retrieve hashing data within the XLog data
> structures.
> - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record)
> in hash.c, branch to code for the various redo operations: creating an
> index, inserting into an index, deleting an index, and page operations
> (split, delete, update?).
> - Code each branch by drawing on examples from btree_redo, gin_redo, and
> gist_redo, the existing XLog code of the other index types.

Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled.  I wrote about
this previously here, with an idea for solving the problem:

http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com

Sadly, no one responded.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
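A toy model of the linear-hashing-style, one-bucket-at-a-time splitting that PostgreSQL's hash indexes use may make it easier to see why splits are the sticking point. The key-to-bucket mapping below follows _hash_hashkey2bucket(), and the mask bookkeeping loosely follows _hash_expandtable(). The class itself, its dict-of-lists storage, and the absence of overflow pages and locking are simplifications for illustration. A single split rewrites both the old bucket and the new one. If each page change were logged separately, a crash partway through could leave moved tuples duplicated or missing, unless replay knows how to finish the split.

```python
class ToyHashIndex:
    """Linear-hashing toy: buckets split one at a time as the index grows."""

    def __init__(self, nbuckets=4):                # nbuckets assumed to be a power of 2
        self.maxbucket = nbuckets - 1
        self.lowmask = nbuckets - 1
        self.highmask = (nbuckets << 1) - 1
        self.buckets = {b: [] for b in range(nbuckets)}

    def bucket_for(self, hashkey):
        # Same mapping as PostgreSQL's _hash_hashkey2bucket().
        bucket = hashkey & self.highmask
        if bucket > self.maxbucket:
            bucket &= self.lowmask
        return bucket

    def insert(self, hashkey):
        self.buckets[self.bucket_for(hashkey)].append(hashkey)

    def split_next(self):
        new_bucket = self.maxbucket + 1
        old_bucket = new_bucket & self.lowmask
        # Metadata change: one more bucket; start a new doubling if needed.
        self.maxbucket = new_bucket
        if new_bucket > self.highmask:
            self.lowmask = self.highmask
            self.highmask = new_bucket | self.lowmask
        # Data change: tuples that now map to new_bucket leave old_bucket.
        # This touches pages of BOTH buckets, which is what makes crash-safe
        # logging of a split harder than logging a single-page insert.
        stay, move = [], []
        for k in self.buckets[old_bucket]:
            if self.bucket_for(k) == new_bucket:
                move.append(k)
            else:
                stay.append(k)
        self.buckets[old_bucket] = stay
        self.buckets[new_bucket] = move
        return old_bucket, new_bucket


idx = ToyHashIndex(4)
for h in [0, 4, 8, 12, 5]:
    idx.insert(h)
print(idx.buckets[0])                    # [0, 4, 8, 12]
print(idx.split_next())                  # (0, 4)
print(idx.buckets[0], idx.buckets[4])    # [0, 8] [4, 12]
```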


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Tan Tran
Date:
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a "reach" proposal alongside a more straightforward one.

Tan

On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran <tankimtran@gmail.com> wrote:
> 2. Proposal
> As a GSoC student, I will implement WAL recovery of hash indexes using the
> other index types' WAL code as a guide. Roughly, I will:
> - Devise a way to store and retrieve hashing data within the XLog data
> structures.
> - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record)
> in hash.c, branch to code for the various redo operations: creating an
> index, inserting into an index, deleting an index, and page operations
> (split, delete, update?).
> - Code each branch by drawing on examples from btree_redo, gin_redo, and
> gist_redo, the existing XLog code of the other index types.

Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled.  I wrote about
this previously here, with an idea for solving the problem:

http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com

Sadly, no one responded.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: GSoC 2014 - mentors, students and admins

From
Lubennikova Anastasia
Date:
Thom Brown-2 wrote
> Hi all,
>
> Application to Google Summer of Code 2014 can be made as of next
> Monday (3rd Feb), and then there will be a 12 day window in which to
> submit an application.
>
> I'd like to gauge interest from both mentors and students as to
> whether we'll want to do this.

Hi,
I am interested in taking part in GSoC as a student.
I want to choose "support for index-only scans for GiST". I think it will be
relevant to the community and a good fit for GSoC. I am reading the documentation
and source code now, and preparing to discuss implementation questions and
write a proposal.
What specific knowledge and tools do I need for this task?
Is there useful information anywhere besides the PostgreSQL
documentation?





Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Jeff Janes
Date:
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled.  I wrote about
this previously here, with an idea for solving the problem:

http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com

Sadly, no one responded.  :-(

On Mon, Mar 3, 2014 at 9:39 AM, Tan Tran <tankimtran@gmail.com> wrote:
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a "reach" proposal alongside a more straightforward one.

Tan


Hi Tan,

I'm not familiar with the inner workings of the GSoC, but I don't know if this can be relegated to a "stretch" goal.  WAL logging is an all or nothing thing.  I think that, to be applied to the codebase (which I assume is the goal of GSoC), all actions need to be implemented.  That is probably why this has remained open so long: there is no incremental way to get the code written.  (But I would like to see it get done, I don't want to discourage that.)

Cheers,

Jeff

Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Greg Stark
Date:
On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Unfortunately, I don't believe that it's possible to do this easily
> today because of the way bucket splits are handled.  I wrote about
> this previously here, with an idea for solving the problem:

We could just tackle this in the same incomplete, buggy way that
B-trees handled it for years until Heikki fixed them, and the way GIN
and GiST still do, I believe. Namely, just emit xlog records for each
page individually and, during replay, remember when you have an
"incomplete split" and complain if recovery ends with any still
incomplete. It would be unfortunate to be adding new cases of this
just as Heikki and company are making progress eliminating the ones we
already had, but that's surely better than having no recovery at all.



--
greg
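Greg's fallback can be sketched as follows. The record kinds and split identifiers are hypothetical. A real implementation would key pending splits on block numbers and apply page changes as it goes. The sketch only shows the bookkeeping: remember a split when its first record is replayed, forget it when the completing record arrives, and complain if any are left when recovery ends.

```python
class IncompleteSplitTracker:
    """Toy bookkeeping for multi-record splits during WAL replay (hypothetical record shapes)."""

    def __init__(self):
        self.pending = set()                      # splits whose first half has been replayed

    def replay(self, rec):
        kind, split_id = rec
        if kind == "split_begin":
            self.pending.add(split_id)
        elif kind == "split_finish":
            self.pending.discard(split_id)
        # Any other kind of record would simply be applied to its page here.

    def end_of_recovery(self):
        if self.pending:
            raise RuntimeError(
                f"recovery ended with incomplete splits: {sorted(self.pending)}")


tracker = IncompleteSplitTracker()
records = [("split_begin", 1), ("insert", None), ("split_finish", 1), ("split_begin", 2)]
for rec in records:
    tracker.replay(rec)
try:
    tracker.end_of_recovery()
except RuntimeError as err:
    print(err)                                    # recovery ended with incomplete splits: [2]
```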


Re: GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
All students and mentors (and backup mentors) should now register for
this year's GSoC.  Students only have until Friday next week (up until
21st March 19:00 UTC) to apply.

Thanks

Thom


Re: GSoC 2014 - mentors, students and admins

From
Stephen Frost
Date:
* Stephen Frost (sfrost@snowman.net) wrote:
> I'm interested in mentoring and, unlike previous years, I've been
> collecting a personal list of things that I'd like to see worked on for
> PG which could be GSoC projects and will provide such in the next few
> days to this list (unless there's a different list that people want such
> posted to..?).

Alright, I've updated the wiki page here:

https://wiki.postgresql.org/wiki/GSoC_2014

with my thoughts (finally- sorry about the delay).  Looks like other
folks have been updating it too, which is great.  Hopefully we can
encourage some students to go check it out and try to pick up one of
those projects...

    Thanks,

        Stephen

Re: [HACKERS] GSoC 2014 - mentors, students and admins

From
Fabrízio de Royes Mello
Date:

On Mon, Mar 17, 2014 at 10:43 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
> * Stephen Frost (sfrost@snowman.net) wrote:
> > I'm interested in mentoring and, unlike previous years, I've been
> > collecting a personal list of things that I'd like to see worked on for
> > PG which could be GSoC projects and will provide such in the next few
> > days to this list (unless there's a different list that people want such
> > posted to..?).
>
> Alright, I've updated the wiki page here:
>
> https://wiki.postgresql.org/wiki/GSoC_2014
>
> with my thoughts (finally- sorry about the delay).  Looks like other
> folks have been updating it too, which is great.  Hopefully we can
> encourage some students to go check it out and try to pick up one of
> those projects...
>

Folks, I hope this doesn't cause any trouble, but I added one more item:
* Allow an unlogged table to be changed to logged

We already discussed this feature a while ago [1].

Regards,

[1] http://www.postgresql.org/message-id/CAFcNs+peg3VPG2=v6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ@mail.gmail.com

--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
Timbira: http://www.timbira.com.br
IT blog: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

Re: GSoC 2014 - mentors, students and admins

From
Thom Brown
Date:
Hi all,

There is 1 day left to get submissions in, so students should ensure
that they submit their proposals as soon as possible.  No submissions
will be accepted beyond the deadline of 19:00 UTC tomorrow (Friday
21st March).

Regards

Thom


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Peter Geoghegan
Date:
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> As a GSoC student, I will implement WAL recovery of hash indexes using the
>> other index types' WAL code as a guide.

Frankly, I'm skeptical of the idea that hash indexes will ever really
be useful. I realize that that's a counter-intuitive conclusion, but
there are many things we could do to improve B-Tree CPU costs to make
them closer to those of hash indexes, without making them any less
flexible. I myself would much rather work on that, and intend to.

The O(1) cost seems attractive when you consider that that only
requires that we read one index page from disk to service any given
index scan, but in fact B-Trees almost always only require the same.
They are of course also much more flexible. The concurrency
characteristics of B-Trees are a lot better understood. I sincerely
suggest that we forget about conventional hash table type indexes. I
fear they're a lost cause.
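To make the page-count argument concrete, here is a sketch of an idealized, fully packed B-tree. The fanout of 300 entries per page is an assumption for illustration, not a measured nbtree figure. A probe touches one page per level, but if the inner levels stay cached, only the leaf has to come from disk:

```python
def btree_levels(n_tuples: int, fanout: int) -> int:
    """Levels from root to leaf in an idealized B-tree where every page holds `fanout` entries."""
    pages = (n_tuples + fanout - 1) // fanout  # number of leaf pages (ceiling division)
    levels = 1                                 # the leaf level itself
    while pages > 1:
        pages = (pages + fanout - 1) // fanout # pages needed one level up
        levels += 1
    return levels


def disk_reads_per_probe(n_tuples: int, fanout: int, inner_cached: bool = True) -> int:
    """Pages read from disk for one probe: just the leaf when the inner levels are cached."""
    levels = btree_levels(n_tuples, fanout)
    return 1 if inner_cached else levels


# One billion index entries at an assumed fanout of 300:
print(btree_levels(10**9, 300))                              # 4
print(disk_reads_per_probe(10**9, 300))                      # 1
print(disk_reads_per_probe(10**9, 300, inner_cached=False))  # 4
```

So under these assumptions a four-level tree costs the same single disk read as a hash bucket probe, provided the three upper levels fit in cache.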

--
Peter Geoghegan


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
"ktm@rice.edu"
Date:
On Wed, Apr 30, 2014 at 12:26:20AM -0700, Peter Geoghegan wrote:
> On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> As a GSoC student, I will implement WAL recovery of hash indexes using the
> >> other index types' WAL code as a guide.
>
> Frankly, I'm skeptical of the idea that hash indexes will ever really
> be useful. I realize that that's a counter-intuitive conclusion, but
> there are many things we could do to improve B-Tree CPU costs to make
> them closer to those of hash indexes, without making them any less
> flexible. I myself would much rather work on that, and intend to.
>
> The O(1) cost seems attractive when you consider that that only
> requires that we read one index page from disk to service any given
> index scan, but in fact B-Trees almost always only require the same.
> They are of course also much more flexible. The concurrency
> characteristics of B-Trees are a lot better understood. I sincerely
> suggest that we forget about conventional hash table type indexes. I
> fear they're a lost cause.
>
> --
> Peter Geoghegan
>
Hi Peter,

I do not think that CPU costs matter as much as the O(1) probe to
get a result value, specifically for very large indexes/tables where
even caching the upper levels of a B-tree index would kill your
working set in memory. I know, I know, everyone has so much memory
and can just buy more... but this does matter. I also think that
development of hash indexes has been stalled waiting for WAL
logging. For example, hash indexes can almost trivially become
more space efficient as they grow in size by utilizing the page
number to represent the prefix bits of the hash value for a bucket.
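The space-saving idea can be sketched as follows. This is a hypothetical layout, not PostgreSQL's on-disk format: it assumes 2**bucket_bits buckets and takes "prefix bits" literally, so the high-order `bucket_bits` bits of a 32-bit hash choose the bucket (and therefore the page), and only the remaining suffix needs to be stored in each index entry. As the index grows and `bucket_bits` increases, the stored suffix shrinks:

```python
HASH_BITS = 32  # width of the hash value


def split_hash(h: int, bucket_bits: int) -> tuple[int, int]:
    """Split a 32-bit hash into (bucket number, suffix stored in the entry)."""
    suffix_bits = HASH_BITS - bucket_bits
    bucket = h >> suffix_bits              # prefix bits: implied by which bucket page we are on
    suffix = h & ((1 << suffix_bits) - 1)  # only these bits need to be stored
    return bucket, suffix


def rebuild_hash(bucket: int, suffix: int, bucket_bits: int) -> int:
    """Recover the full hash from the bucket number and the stored suffix."""
    return (bucket << (HASH_BITS - bucket_bits)) | suffix


def stored_bytes(bucket_bits: int) -> int:
    """Bytes per entry needed for the suffix, rounded up to whole bytes."""
    return (HASH_BITS - bucket_bits + 7) // 8


b, s = split_hash(0xDEADBEEF, 8)
print(hex(b), hex(s))                         # 0xde 0xadbeef
print(rebuild_hash(b, s, 8) == 0xDEADBEEF)    # True
print([stored_bytes(k) for k in (0, 8, 16)])  # [4, 3, 2]
```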

My 2 cents.
Ken


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Peter Geoghegan
Date:
On Wed, Apr 30, 2014 at 5:55 AM, ktm@rice.edu <ktm@rice.edu> wrote:
> I do not think that CPU costs matter as much as the O(1) probe to
> get a result value, specifically for very large indexes/tables where
> even caching the upper levels of a B-tree index would kill your
> working set in memory. I know, I know, everyone has so much memory
> and can just buy more... but this does matter.

Have you actually investigated how little memory it takes to store the
inner pages? It's typically less than 1% of the entire index. AFAIK,
hash indexes are not used much in any other system. I think MySQL has
them, and SQL Server 2014 has special in-memory hash table indexes for
in-memory tables, but that's all I can find on Google.
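The "less than 1%" figure follows from the fanout. A sketch under idealized assumptions (fully packed pages, an assumed fanout of 300 downlinks per inner page, and PostgreSQL's default 8 kB block size) that counts inner pages level by level:

```python
def inner_pages(leaf_pages: int, fanout: int) -> int:
    """Total inner (non-leaf) pages above `leaf_pages` leaves in an idealized B-tree."""
    total = 0
    level = leaf_pages
    while level > 1:
        level = (level + fanout - 1) // fanout  # pages one level up (ceiling division)
        total += level
    return total


def inner_fraction(leaf_pages: int, fanout: int) -> float:
    """Share of all index pages that are inner pages."""
    inner = inner_pages(leaf_pages, fanout)
    return inner / (leaf_pages + inner)


leaves = 1_000_000                             # about 7.6 GiB of leaf pages at 8192 bytes each
inner = inner_pages(leaves, 300)
print(inner)                                   # 3347
print(f"{inner_fraction(leaves, 300):.2%}")    # 0.33%
print(round(inner * 8192 / 2**20, 1))          # 26.1 (MiB of inner pages)
```

Under these assumptions, caching every inner page of an index with roughly 7.6 GiB of leaves takes about 26 MiB.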


--
Peter Geoghegan


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Robert Haas
Date:
On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Apr 30, 2014 at 5:55 AM, ktm@rice.edu <ktm@rice.edu> wrote:
>> I do not think that CPU costs matter as much as the O(1) probe to
>> get a result value, specifically for very large indexes/tables where
>> even caching the upper levels of a B-tree index would kill your
>> working set in memory. I know, I know, everyone has so much memory
>> and can just buy more... but this does matter.
>
> Have you actually investigated how little memory it takes to store the
> inner pages? It's typically less than 1% of the entire index. AFAIK,
> hash indexes are not used much in any other system. I think MySQL has
> them, and SQL Server 2014 has special in-memory hash table indexes for
> in-memory tables, but that's all I can find on Google.

I thought the theoretical advantage of hash indexes wasn't that they
were smaller but that you avoided a central contention point (the
btree root).

Of course our current hash indexes have *more* not less contention
than btree but I'm pretty comfortable chalking that up to quality of
implementation rather than anything intrinsic.
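For comparison, a hash index probe does not descend from a root at all: the target bucket is computed directly from the hash value. The sketch below re-expresses in Python the linear-hashing mapping used by PostgreSQL's `_hash_hashkey2bucket()`. In the real code `maxbucket`, `highmask` and `lowmask` come from the index metapage; the values in the example are illustrative (buckets 0 through 5 exist, partway through a doubling from 4 to 8 buckets):

```python
def hashkey_to_bucket(hashkey: int, maxbucket: int, highmask: int, lowmask: int) -> int:
    """Linear-hashing bucket lookup, modeled on _hash_hashkey2bucket()."""
    bucket = hashkey & highmask  # try the mask for the current doubling
    if bucket > maxbucket:       # that bucket has not been split off yet...
        bucket &= lowmask        # ...so its keys still live in the older bucket
    return bucket


MAXBUCKET, HIGHMASK, LOWMASK = 5, 0b111, 0b011
print([hashkey_to_bucket(h, MAXBUCKET, HIGHMASK, LOWMASK) for h in (5, 6, 13, 15)])  # [5, 2, 5, 3]
```

Each probe lands on exactly one bucket, so in this sketch there is no page that every lookup must pass through apart from wherever the three mask values are read from.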

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I thought the theoretical advantage of hash indexes wasn't that they
> were smaller but that you avoided a central contention point (the
> btree root).

> Of course our current hash indexes have *more* not less contention
> than btree but I'm pretty comfortable chalking that up to quality of
> implementation rather than anything intrinsic.

The long and the short of it is that there are *lots* of implementation
deficiencies in our hash indexes.  There's no real way to know whether
they'd be competitive if all those things were rectified, except by doing
the work to fix 'em.  And it's hard to justify putting much effort into
hash indexes so long as there's an elephant in the room of the size of "no
WAL support".  So I'm in favor of getting that fixed, if we have somebody
who's willing to do it.  It might lead to good things later; and even if
it doesn't, the lack of WAL support is an embarrassment.

            regards, tom lane


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Peter Geoghegan
Date:
On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I thought the theoretical advantage of hash indexes wasn't that they
> were smaller but that you avoided a central contention point (the
> btree root).

The B-Tree root isn't really a central contention point at all. The
locking/latching protocol that nbtree uses is remarkably
concurrency-friendly. In the real world, there is pretty much no
exclusive locking of the root page's buffer.

> Of course our current hash indexes have *more* not less contention
> than btree but I'm pretty comfortable chalking that up to quality of
> implementation rather than anything intrinsic.

I am not convinced of that.

--
Peter Geoghegan


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Peter Geoghegan
Date:
On Wed, Apr 30, 2014 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  It might lead to good things later; and even if
> it doesn't, the lack of WAL support is an embarrassment.

I don't think it will, but I do agree that the current state of
affairs is an embarrassment.

--
Peter Geoghegan


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Jeff Janes
Date:
On Wed, Apr 30, 2014 at 12:26 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> As a GSoC student, I will implement WAL recovery of hash indexes using the
> >> other index types' WAL code as a guide.
>
> Frankly, I'm skeptical of the idea that hash indexes will ever really
> be useful. I realize that that's a counter-intuitive conclusion, but
> there are many things we could do to improve B-Tree CPU costs to make
> them closer to those of hash indexes, without making them any less
> flexible. I myself would much rather work on that, and intend to.

If we don't put in the work to make them useful, then they won't ever become useful.

If we do put in the effort (and it would be considerable), then I think they will be.  But you may be correct that the effort required would be better spent on making btree even better.  I don't think we can conclude that definitively without putting in the work to do the experiment.

One advantage of the hash indexes is that the code is simple enough for someone to actually understand it in a summer. Whether it would still be like that after WAL logging was implemented, I don't know.

> The O(1) cost seems attractive when you consider that that only
> requires that we read one index page from disk to service any given
> index scan, but in fact B-Trees almost always only require the same.
> They are of course also much more flexible. The concurrency
> characteristics of B-Trees are a lot better understood.

Not sure what you mean there.  There is a lot less to understand about the concurrency of hash indexes.  I think I understand it pretty well (unlike B-tree); I just don't know what to do with that knowledge.

> I sincerely
> suggest that we forget about conventional hash table type indexes. I
> fear they're a lost cause.

I understand that lost causes are the only ones worth fighting for. :)

Cheers,

Jeff

Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Jeff Janes
Date:
On Wed, Apr 30, 2014 at 11:02 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I thought the theoretical advantage of hash indexes wasn't that they
> > were smaller but that you avoided a central contention point (the
> > btree root).
>
> The B-Tree root isn't really a central contention point at all. The
> locking/latching protocol that nbtree uses is remarkably
> concurrency-friendly. In the real world, there is pretty much no
> exclusive locking of the root page's buffer.

I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention.  When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it.
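Why partitioning the buffer mapping table does not help with a single hot page shows up in a small simulation. It is a sketch: the tag hash (CRC-32 over a hypothetical relation number and block number) and the count of 16 partitions are stand-ins, not PostgreSQL's actual buffer-tag hash or its configured partition count. The point is only that every lookup of the same page lands in the same partition, however many partitions there are:

```python
import struct
import zlib
from collections import Counter

NUM_PARTITIONS = 16  # assumed partition count for this sketch


def partition_for(relnode: int, block: int) -> int:
    """Hypothetical buffer-tag hash: CRC-32 of (relation, block), modulo the partition count."""
    tag = struct.pack("<II", relnode, block)
    return zlib.crc32(tag) % NUM_PARTITIONS


def simulate(num_probes: int = 10_000, relnode: int = 16384, root_block: int = 1):
    """Each index probe pins the root page and then one distinct leaf page."""
    lookups = []
    for i in range(num_probes):
        lookups.append((relnode, root_block))  # every probe starts at the root
        lookups.append((relnode, 1000 + i))    # then a different leaf each time
    load = Counter(partition_for(rel, blk) for rel, blk in lookups)
    hot = partition_for(relnode, root_block)
    return hot, load, len(lookups)


hot, load, total = simulate()
print(f"hot partition share: {load[hot] / total:.0%}, even share: {1 / NUM_PARTITIONS:.0%}")
```

The root alone accounts for half of all lookups here, so its partition carries at least 50% of the load while an even spread would give each partition about 6%.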
 
Cheers,

Jeff

Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Andres Freund
Date:
On 2014-04-30 11:10:22 -0700, Jeff Janes wrote:
> I've seen the simple pinning and unpinning of the root page (or the fast root,
> whatever the first page we bother to pin on a regular basis is called) be a
> point of contention.  When one index dominates the entire system workload, that
> one page also drives contention on the spin lock that protects the lwlock that
> share-protects whichever buffer mapping partition happens to contain it.

To quite some degree that's an implementation deficiency of our lwlocks
though. I've seen *massive* improvements with my lwlock patch for that
problem. Additionally, we need to get rid of the spinlock around
pin/unpin.

That said, even after those optimizations, there remains a significant
amount of cacheline bouncing. That's much easier to avoid for something
like hash indexes than btrees.

I think another advantage is that hash indexes can be *much* smaller
than btrees when the individual rows are wide. I wonder, though, if we
couldn't solve that better by introducing "transforms" around the looked-up
data, e.g. allowing a hash(indexed_column) to be used *transparently*.
If you do that today, a lot of work has to be done in every
query...
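The "transform" idea can be sketched as an index that keeps only a fixed-size hash of a wide column and rechecks the real value on lookup, because distinct values can share a hash. The class and its names are hypothetical, and CRC-32 stands in for whatever hash function would actually be used. The transparency comes from `lookup()` taking the original value, so callers never spell out the hash comparison and the recheck themselves:

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class HashTransformIndex:
    """Hypothetical index that stores a 4-byte hash of a wide key instead of the key itself."""

    def __init__(self, table: Dict[int, str], hash_fn: Optional[Callable[[str], int]] = None):
        self.table = table  # row id -> wide value (stands in for the heap)
        self.hash_fn = hash_fn or (lambda v: zlib.crc32(v.encode("utf-8")))
        self.entries: Dict[int, List[int]] = defaultdict(list)  # hash -> row ids
        for rowid, value in table.items():
            self.entries[self.hash_fn(value)].append(rowid)

    def lookup(self, value: str) -> List[int]:
        """Find rows equal to `value`: probe by hash, then recheck the real value."""
        candidates = self.entries.get(self.hash_fn(value), [])
        return [rowid for rowid in candidates if self.table[rowid] == value]


rows = {1: "x" * 1000, 2: "y" * 1000, 3: "x" * 1000}
print(HashTransformIndex(rows).lookup("x" * 1000))                       # [1, 3]
# A deliberately terrible hash makes every row collide; the recheck keeps results correct.
print(HashTransformIndex(rows, hash_fn=lambda v: 0).lookup("y" * 1000))  # [2]
```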

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GSoC on WAL-logging hash indexes

From
Darren Duncan
Date:
Is there a good reason for this thread being copied to the advocacy list?  It
seems to me just on topic for hackers. -- Darren Duncan