Thread: GSoC 2014 - mentors, students and admins
Hi all, Application to Google Summer of Code 2014 can be made as of next Monday (3rd Feb), and then there will be a 12 day window in which to submit an application. I'd like to gauge interest from both mentors and students as to whether we'll want to do this. And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Who would be up for mentoring this year? And are there any project ideas folk would like to suggest? Thanks Thom
On Tue, Jan 28, 2014 at 11:16 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,
Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.
I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
Hi,
I would like to bring up the addition to MADlib algorithms again this year.
Also, some work on the foreign table constraints could be helpful.
Hi,
Also, can a project on an extension be considered a GSoC 2014 project under PostgreSQL?
I was thinking of having some support for writable FDW in JDBC_FDW, if possible.
--
Regards,
Atri
l'apprenant
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,
Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.
I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
Hi,
I would like to bring up the addition to MADLIB algorithms again this year.
Also, some work on the foreign table constraints could be helpful.
--
Regards,
Atri
l'apprenant
Thom,
* Thom Brown (thom@linux.com) wrote:
> Application to Google Summer of Code 2014 can be made as of next
> Monday (3rd Feb), and then there will be a 12 day window in which to
> submit an application.
This is just for PG to be a participating organization, right? There's a while before mentors and students get involved, as I understand it.
> I'd like to gauge interest from both mentors and students as to
> whether we'll want to do this.
Yes.
> And I'd be fine with being admin again this year, unless there's
> anyone else who would like to take up the mantle?
Having you do it works for me. :)
> Who would be up for mentoring this year? And are there any project
> ideas folk would like to suggest?
I'm interested in mentoring and, unlike previous years, I've been collecting a personal list of things that I'd like to see worked on for PG which could be GSoC projects and will provide such in the next few days to this list (unless there's a different list that people want such posted to..?).
Thanks,
Stephen
On 28 January 2014 19:43, Stephen Frost <sfrost@snowman.net> wrote:
> Thom,
>
> * Thom Brown (thom@linux.com) wrote:
>> Application to Google Summer of Code 2014 can be made as of next
>> Monday (3rd Feb), and then there will be a 12 day window in which to
>> submit an application.
>
> This is just for PG to be a participating organization, right? There's
> a while before mentors and students get involved, as I understand it.
Yes, correct. Students and mentors don't need to be signed up until April.
>> Who would be up for mentoring this year? And are there any project
>> ideas folk would like to suggest?
>
> I'm interested in mentoring and, unlike previous years, I've been
> collecting a personal list of things that I'd like to see worked on for
> PG which could be GSoC projects and will provide such in the next few
> days to this list (unless there's a different list that people want such
> posted to..?).
That's great. I don't see any problem with posting suggestions here, although I'd suggest refraining from going in-depth as that can come later.
If there's enough interest and agreement, we'll go ahead and apply.
Thom
On 01/28/2014 07:34 PM, Thom Brown wrote:
> And I'd be fine with being admin again this year, unless there's
> anyone else who would like to take up the mantle?
Please do, thanks!
> Who would be up for mentoring this year?
I can mentor.
- Heikki
On Tue, Jan 28, 2014 at 05:34:21PM +0000, Thom Brown wrote: > Hi all, > > Application to Google Summer of Code 2014 can be made as of next > Monday (3rd Feb), and then there will be a 12 day window in which to > submit an application. > > I'd like to gauge interest from both mentors and students as to > whether we'll want to do this. > > And I'd be fine with being admin again this year, unless there's > anyone else who would like to take up the mantle? Thanks for your hard work administering last year, and thanks even more for taking this on in light of that experience :) > Who would be up for mentoring this year? And are there any project > ideas folk would like to suggest? I'd be delighted to mentor. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi!
On Tue, Jan 28, 2014 at 9:34 PM, Thom Brown <thom@linux.com> wrote:
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
Thanks for your work. I would like to see you as admin this year again.
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
I would like to be mentor.
------
With best regards,
Alexander Korotkov.
Hi,
On 01/28/2014 06:46 PM, Atri Sharma wrote:
> On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
>> Hi all,
>> Application to Google Summer of Code 2014 can be made as of next
>> Monday (3rd Feb), and then there will be a 12 day window in which to
>> submit an application.
>> I'd like to gauge interest from both mentors and students as to
>> whether we'll want to do this.
>> And I'd be fine with being admin again this year, unless there's
>> anyone else who would like to take up the mantle?
>> Who would be up for mentoring this year? And are there any project
>> ideas folk would like to suggest?
> I would like to bring up the addition to MADLIB algorithms again this year.
I've spoken with the MADlib team at Pivotal and they are ok to support this proposal. Therefore I offer to mentor this.
Regards,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
Awesome. I can be an assistant mentor for this one if possible, or I could mentor some other project.
On Tuesday, February 25, 2014, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:
Hi,
On 01/28/2014 06:46 PM, Atri Sharma wrote:
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,
Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.
I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
I would like to bring up the addition to MADLIB algorithms again this year.
I've spoken with the MADlib team at Pivotal and they are ok to support this proposal. Therefore I offer to mentor this.
Regards,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
--
Regards,
Atri
l'apprenant
On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:
Hi,
On 01/28/2014 06:46 PM, Atri Sharma wrote:
On Tue, Jan 28, 2014 at 11:04 PM, Thom Brown <thom@linux.com> wrote:
Hi all,
Application to Google Summer of Code 2014 can be made as of next
Monday (3rd Feb), and then there will be a 12 day window in which to
submit an application.
I'd like to gauge interest from both mentors and students as to
whether we'll want to do this.
And I'd be fine with being admin again this year, unless there's
anyone else who would like to take up the mantle?
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
I would like to bring up the addition to MADLIB algorithms again this year.
I've spoken with the MADlib team at Pivotal and they are ok to support this proposal. Therefore I offer to mentor this.
Are there any more prospective mentors? We'll want some folk to act as back-up mentors too to ensure projects can still be completed should any mentor become unavailable.
Thom
On Thu, Feb 27, 2014 at 07:54:13PM +0000, Thom Brown wrote:
> On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:
> > I've spoken with the MADlib team at Pivotal and they are ok to
> > support this proposal. Therefore I offer to mentor this.
>
> Are there any more prospective mentors? We'll want some folk to act
> as back-up mentors too to ensure projects can still be completed
> should any mentor become unavailable.
For MADlib, no. Are you asking for mentors in general?
Cheers,
David.
On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
On Thu, Feb 27, 2014 at 07:54:13PM +0000, Thom Brown wrote:
> On 25 February 2014 13:28, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote:
> > I've spoken with the MADlib team at Pivotal and they are ok to
> > support this proposal. Therefore I offer to mentor this.
>
> Are there any more prospective mentors? We'll want some folk to act
> as back-up mentors too to ensure projects can still be completed
> should any mentor become unavailable.
For MADlib, no. Are you asking for mentors in general?
Ah yes, I should clarify. Yes, mentors in general.
Thom
On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <thom@linux.com> wrote: > Who would be up for mentoring this year? And are there any project > ideas folk would like to suggest? I mentored in the past and felt I didn't do a very good job because I didn't really understand the project the student was working on. There's precisely one project that I feel I would be competent to mentor at this point. Making hash indexes WAL recoverable. This is something that's easy to define the scope of and easy to determine if the student is on track and easy to measure when finished. It's something where as far as I can tell all the mentor work will be purely technical advice. Also it's something the project really really needs and is perfectly sized for a GSOC project IMHO. Also it's a great project for a student who might be interested in working on Postgres in the future since it requires learning all our idiosyncratic build and source conventions but doesn't require huge or controversial architectural changes. I fear a number of items in the Wiki seem unrealistically large projects for GSOC IMNSHO. -- greg
On 27.02.2014 22:25, Thom Brown wrote:
> On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
> > For MADlib, no. Are you asking for mentors in general?
> Ah yes, I should clarify. Yes, mentors in general.
In general I can help, but I'm not sure whether I'm too new to pgsql ;) However, after GSoC as a student I can try "the other side".
Regards,
Karol
Hi Greg, pgsql-advocacy, and pgsql-hackers,
I'm interested in doing my GSoC project on this idea. I'm new to indexing and WAL, which I haven't encountered in my classes, but it sounds interesting and valuable to PostgreSQL. So here's my draft proposal. Do you mind giving your opinion and corrections? With your help I'll add some technical detail to my plans.
Thanks,
Tan Tran
Introduction
In write-ahead logging (WAL), all modifications to a database are written to a write-ahead log before being flushed to disk at periodic checkpoints. This method saves I/O operations, enables a continuous backup, and, in the case of database failure, guarantees data integrity up until the last saved checkpoint. In Postgresql’s implementation, transactions are written to XLog, which is divided into 16MB files (“segments”) that together comprise a complete history of transactions. Transactions are continually appended to the latest segment, while checkpointing continually archives segments up until the last checkpoint. Internally, a suite of XLog structures and functions interfaces with the various resource managers so they can log a sufficient amount of data to restore data (“redo”) in case of failure.
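As a rough illustration of the log-first, redo-later idea described above, here is a toy sketch in Python. It is not PostgreSQL's XLog code: the class `ToyWAL` and its methods are hypothetical names invented for this example, and "flushing" is simulated with an in-memory copy.

```python
# Toy write-ahead log (hypothetical, not PostgreSQL's implementation).
# Every change is appended to the log before the page is modified, so
# changes lost from the pages can be rebuilt by replaying the log.

class ToyWAL:
    def __init__(self):
        self.log = []          # append-only list of (key, value) records
        self.pages = {}        # in-memory "data pages"
        self.flushed = {}      # what has reached "disk" at the last checkpoint
        self.checkpoint = 0    # log position where redo must start

    def put(self, key, value):
        self.log.append((key, value))   # 1. write the log record first
        self.pages[key] = value         # 2. then modify the page

    def do_checkpoint(self):
        # Simulate flushing all pages; redo can start from this position.
        self.flushed = dict(self.pages)
        self.checkpoint = len(self.log)

    def crash(self):
        # Lose every page change made since the last checkpoint.
        self.pages = dict(self.flushed)

    def redo(self):
        # Replay log records written after the last checkpoint.
        for key, value in self.log[self.checkpoint:]:
            self.pages[key] = value
```

In this sketch, changes made after `do_checkpoint()` disappear on `crash()` and come back after `redo()`, because the log records were written before the pages were touched.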
Another PostgreSQL feature is the creation of indexes on an invariant custom field; for example, on the LastName of a Person even though the primary key is ID. These custom indexes speed up row lookup. Postgres currently supports four index types: B-tree, GiST, GIN, and hash. Indexes of the first three types are WAL-recoverable, but hash indexes are not.
2. Proposal
As a GSoC student, I will implement WAL recovery of hash indexes using the other index types’ WAL code as a guide. Roughly, I will:
- Devise a way to store and retrieve hashing data within the XLog data structures.
- In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) in hash.c, branch to code for the various redo operations: creating an index, inserting into an index, deleting an index, and page operations (split, delete, update?).
- Code each branch by drawing on examples from btree_redo, gin_redo, and gist_redo, the existing XLog code of the other index types.
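The branching described in the list above could look roughly like the following toy dispatcher. This is only a sketch in Python under loud assumptions: the record kinds (`insert`, `delete`, `split`), the record fields, and the dictionary-based "index" are all hypothetical and do not reflect PostgreSQL's actual record format or the real `hash_redo` signature.

```python
# Toy redo dispatcher. Record kinds and fields are hypothetical and are
# NOT PostgreSQL's record format. Items are small ints used as their own hash.

def redo_insert(index, rec):
    # Re-add the item to its bucket.
    index.setdefault(rec["bucket"], []).append(rec["item"])

def redo_delete(index, rec):
    # Remove the item from its bucket.
    index[rec["bucket"]].remove(rec["item"])

def redo_split(index, rec):
    # Move items whose masked hash selects the new bucket.
    old, new, mask = rec["old"], rec["new"], rec["mask"]
    items = index.get(old, [])
    index[old] = [i for i in items if i & mask != new]
    index[new] = [i for i in items if i & mask == new]

REDO_HANDLERS = {
    "insert": redo_insert,
    "delete": redo_delete,
    "split": redo_split,
}

def hash_redo(index, record):
    # Branch on the record kind, as the proposal describes.
    handler = REDO_HANDLERS.get(record["kind"])
    if handler is None:
        raise ValueError("unknown record kind: %r" % (record["kind"],))
    handler(index, record)
```

A table of handlers keeps each redo operation small and testable on its own; whether the real code would use a switch statement or something else is a design decision left open here.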
Benefits
Hash index searching is O(1), which is asymptotically faster than the O(log n) searching of a B-tree, and does not require custom indexing functions like GIN and GiST inherently do. Therefore it is desirable for rows that will only be retrieved on an equality relation. However, two things currently stand in the way of its popular use. From the PostgreSQL documentation,
“Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.”
My project would solve the first problem, after which I would like to stay on and fix the second.
To be written: Quantifiable Results, Schedule, Completeness Criteria, Bio
On Feb 28, 2014, at 6:21 AM, Greg Stark <stark@mit.edu> wrote:
On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <thom@linux.com> wrote:
Who would be up for mentoring this year? And are there any project
ideas folk would like to suggest?
I mentored in the past and felt I didn't do a very good job because I
didn't really understand the project the student was working on.
There's precisely one project that I feel I would be competent to
mentor at this point. Making hash indexes WAL recoverable. This is
something that's easy to define the scope of and easy to determine if
the student is on track and easy to measure when finished. It's
something where as far as I can tell all the mentor work will be
purely technical advice.
Also it's something the project really really needs and is perfectly
sized for a GSOC project IMHO. Also it's a great project for a student
who might be interested in working on Postgres in the future since it
requires learning all our idiosyncratic build and source conventions
but doesn't require huge or controversial architectural changes.
I fear a number of items in the Wiki seem unrealistically large
projects for GSOC IMNSHO.
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Earlier I posted an email to this thread that I realize "hijacked" the discussion. Please continue replying here instead.
On Feb 28, 2014, at 6:59 AM, Karol Trzcionka <karlikt@gmail.com> wrote:
On 27.02.2014 22:25, Thom Brown wrote:
> On 27 February 2014 21:08, David Fetter <david@fetter.org> wrote:
> > For MADlib, no. Are you asking for mentors in general?
> Ah yes, I should clarify. Yes, mentors in general.
In general I can help, but I'm not sure whether I'm too new to pgsql ;) However, after GSoC as a student I can try "the other side".
Regards,
Karol
Hi all,
Earlier I posted this in the wrong thread. Please excuse the double posting.
Tan Tran
Begin forwarded message:
From: Tan Tran <tankimtran@gmail.com>
Subject: Re: [HACKERS] GSoC 2014 - mentors, students and admins
Date: March 2, 2014 at 5:03:14 AM PST
To: Greg Stark <stark@mit.edu>
Cc: pgsql-advocacy <pgsql-advocacy@postgresql.org>, PostgreSQL-development <pgsql-hackers@postgresql.org>
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran <tankimtran@gmail.com> wrote:
> 2. Proposal
> As a GSoC student, I will implement WAL recovery of hash indexes using the
> other index types' WAL code as a guide. Roughly, I will:
> - Devise a way to store and retrieve hashing data within the XLog data
> structures.
> - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record)
> in hash.c, branch to code for the various redo operations: creating an
> index, inserting into an index, deleting an index, and page operations
> (split, delete, update?).
> - Code each branch by drawing on examples from btree_redo, gin_redo, and
> gist_redo, the existing XLog code of the other index types.
Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled. I wrote about
this previously here, with an idea for solving the problem:
http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com
Sadly, no one responded. :-(
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a "reach" proposal alongside a more straightforward one.
Tan
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran <tankimtran@gmail.com> wrote:
> 2. Proposal
> As a GSoC student, I will implement WAL recovery of hash indexes using the
> other index types' WAL code as a guide. Roughly, I will:
> - Devise a way to store and retrieve hashing data within the XLog data
> structures.
> - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record)
> in hash.c, branch to code for the various redo operations: creating an
> index, inserting into an index, deleting an index, and page operations
> (split, delete, update?).
> - Code each branch by drawing on examples from btree_redo, gin_redo, and
> gist_redo, the existing XLog code of the other index types.
Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled. I wrote about
this previously here, with an idea for solving the problem:
http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com
Sadly, no one responded. :-(
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thom Brown-2 wrote
> Hi all,
>
> Application to Google Summer of Code 2014 can be made as of next
> Monday (3rd Feb), and then there will be a 12 day window in which to
> submit an application.
>
> I'd like to gauge interest from both mentors and students as to
> whether we'll want to do this.
Hi,
I am interested in taking part in GSoC as a student. I want to choose "support for index only scan for GiST". I think it will be relevant to the community and a good fit for GSoC.
I am reading the documentation and source code now, and preparing to discuss implementation questions and write a proposal.
What specific knowledge and tools do I need for this task? Can I find useful information anywhere besides the PostgreSQL documentation?
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled. I wrote about
this previously here, with an idea for solving the problem:
http://www.postgresql.org/message-id/CA+TgmoZyMoJSrFxHXQ06G8jhjXQcsKvDiHB_8z_7nc7hj7iHYQ@mail.gmail.com
Sadly, no one responded. :-(
On Mon, Mar 3, 2014 at 9:39 AM, Tan Tran <tankimtran@gmail.com> wrote:
Thanks for alerting me to your previous idea. While I don't know enough about Postgresql internals to judge its merits yet, I'll write some pseudocode based on it in my proposal; and I'll relegate it to a "reach" proposal alongside a more straightforward one.
Tan
Hi Tan,
I'm not familiar with the inner workings of the GSoC, but I don't know if this can be relegated to a "stretch" goal. WAL logging is an all or nothing thing. I think that, to be applied to the codebase (which I assume is the goal of GSoC), all actions need to be implemented. That is probably why this has remained open so long: there is no incremental way to get the code written. (But I would like to see it get done, I don't want to discourage that.)
Cheers,
Jeff
On Mon, Mar 3, 2014 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Unfortunately, I don't believe that it's possible to do this easily
> today because of the way bucket splits are handled. I wrote about
> this previously here, with an idea for solving the problem:
We could just tackle this in the same incomplete, buggy way that btrees tackled it for years until Heikki fixed them, and the way gin and gist still do, I believe. Namely, just emit xlog records for each page individually and, during replay, remember when you have an "incomplete split" and complain if recovery ends with any still incomplete.
That would be unfortunate to be adding new cases of this just as Heikki and company are making progress eliminating the ones we already had, but that's surely better than having no recovery at all.
--
greg
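The replay bookkeeping Greg describes (remember splits that have started, forget them when they finish, complain about any left over) can be sketched in a few lines of Python. The record kinds `split_start` and `split_finish` and the function name are hypothetical, chosen only for this illustration; the real btree, gin, and gist code is organised differently.

```python
# Toy tracking of "incomplete splits" during replay (hypothetical names).
# Each record is a (kind, split_id) pair; other record kinds are ignored.

def find_incomplete_splits(records):
    incomplete = set()   # ids of splits that started but have not finished
    for kind, split_id in records:
        if kind == "split_start":
            incomplete.add(split_id)
        elif kind == "split_finish":
            incomplete.discard(split_id)
    # Anything still in the set when recovery ends is worth complaining about.
    return incomplete
```

A caller would log a warning (or refuse to continue) if the returned set is non-empty at the end of recovery.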
All students and mentors (and backup mentors) should now register to this year's GSoC. Students only have until Friday next week (up until 21st March 19:00 UTC) to apply. Thanks Thom
* Stephen Frost (sfrost@snowman.net) wrote: > I'm interested in mentoring and, unlike previous years, I've been > collecting a personal list of things that I'd like to see worked on for > PG which could be GSoC projects and will provide such in the next few > days to this list (unless there's a different list that people want such > posted to..?). Alright, I've updated the wiki page here: https://wiki.postgresql.org/wiki/GSoC_2014 with my thoughts (finally- sorry about the delay). Looks like other folks have been updating it too, which is great. Hopefully we can encourage some students to go check it out and try to pick up one of those projects... Thanks, Stephen
On Mon, Mar 17, 2014 at 10:43 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
> * Stephen Frost (sfrost@snowman.net) wrote:
> > I'm interested in mentoring and, unlike previous years, I've been
> > collecting a personal list of things that I'd like to see worked on for
> > PG which could be GSoC projects and will provide such in the next few
> > days to this list (unless there's a different list that people want such
> > posted to..?).
>
> Alright, I've updated the wiki page here:
>
> https://wiki.postgresql.org/wiki/GSoC_2014
>
> with my thoughts (finally- sorry about the delay). Looks like other
> folks have been updating it too, which is great. Hopefully we can
> encourage some students to go check it out and try to pick up one of
> those projects...
>
Folks, if this doesn't cause trouble, I've added one more item:
* Allow an unlogged table to be changed to logged
We already discussed this feature a while ago [1].
Regards,
[1] http://www.postgresql.org/message-id/CAFcNs+peg3VPG2=v6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ@mail.gmail.com
--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
>> Timbira: http://www.timbira.com.br
>> IT blog: http://fabriziomello.blogspot.com
>> LinkedIn profile: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Hi all, There is 1 day left to get submissions in, so students should ensure that they submit their proposals as soon as possible. No submissions will be accepted beyond the deadline of 19:00 UTC tomorrow (Friday 21st March). Regards Thom
On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> As a GSoC student, I will implement WAL recovery of hash indexes using the
>> other index types' WAL code as a guide.
Frankly, I'm skeptical of the idea that hash indexes will ever really be useful. I realize that that's a counter-intuitive conclusion, but there are many things we could do to improve B-Tree CPU costs to make them closer to those of hash indexes, without making them any less flexible. I myself would much rather work on that, and intend to.
The O(1) cost seems attractive when you consider that that only requires that we read one index page from disk to service any given index scan, but in fact B-Trees almost always only require the same. They are of course also much more flexible. The concurrency characteristics of B-Trees are a lot better understood. I sincerely suggest that we forget about conventional hash table type indexes. I fear they're a lost cause.
--
Peter Geoghegan
On Wed, Apr 30, 2014 at 12:26:20AM -0700, Peter Geoghegan wrote:
> On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> As a GSoC student, I will implement WAL recovery of hash indexes using the
> >> other index types' WAL code as a guide.
>
> Frankly, I'm skeptical of the idea that hash indexes will ever really
> be useful. I realize that that's a counter-intuitive conclusion, but
> there are many things we could do to improve B-Tree CPU costs to make
> them closer to those of hash indexes, without making them any less
> flexible. I myself would much rather work on that, and intend to.
>
> The O(1) cost seems attractive when you consider that that only
> requires that we read one index page from disk to service any given
> index scan, but in fact B-Trees almost always only require the same.
> They are of course also much more flexible. The concurrency
> characteristics of B-Trees are a lot better understood. I sincerely
> suggest that we forget about conventional hash table type indexes. I
> fear they're a lost cause.
>
> --
> Peter Geoghegan
>
Hi Peter,
I do not think that CPU costs matter as much as the O(1) probe to get a result value, specifically for very large indexes/tables where even caching the upper levels of a B-tree index would kill your working set in memory. I know, I know, everyone has so much memory and can just buy more... but this does matter. I also think that development of hash indexes has been stalled waiting for WAL logging. For example, hash indexes can almost trivially become more space efficient as they grow in size by utilizing the page number to represent the prefix bits of the hash value for a bucket.
My 2 cents.
Ken
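One possible reading of Ken's space-saving remark is sketched below in Python: if the bucket (page) number already encodes the leading bits of a hash value, each stored entry only needs to keep the remaining bits. This is an interpretation for illustration only. The 32-bit hash width, the function names, and the use of leading rather than trailing bits are assumptions and do not describe PostgreSQL's actual hash index layout.

```python
# Sketch: let the bucket number carry the prefix bits of the hash so that
# only the remaining bits need to be stored per entry. All names and the
# 32-bit width are assumptions made for this example.

HASH_BITS = 32

def split_hash(h, bucket_bits):
    """Return (bucket, remainder) for a HASH_BITS-wide hash value."""
    rest_bits = HASH_BITS - bucket_bits
    bucket = h >> rest_bits                  # prefix bits select the bucket
    remainder = h & ((1 << rest_bits) - 1)   # only these bits need storing
    return bucket, remainder

def join_hash(bucket, remainder, bucket_bits):
    """Rebuild the full hash from the bucket number and stored remainder."""
    return (bucket << (HASH_BITS - bucket_bits)) | remainder
```

As the number of buckets grows, `bucket_bits` grows and the stored remainder shrinks, which is the sense in which such an index could become more space efficient with size.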
On Wed, Apr 30, 2014 at 5:55 AM, ktm@rice.edu <ktm@rice.edu> wrote: > I do not think that CPU costs matter as much as the O(1) probe to > get a result value specifically for very large indexes/tables where > even caching the upper levels of a B-tree index would kill your > working set in memory. I know, I know, everyone has so much memory > and can just buy more... but this does matter. Have you actually investigated how little memory it takes to store the inner pages? It's typically less than 1% of the entire index. AFAIK, hash indexes are not used much in any other system. I think MySQL has them, and SQL Server 2014 has special in-memory hash table indexes for in-memory tables, but that's all I can find on Google. -- Peter Geoghegan
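The "less than 1%" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts pages per level for an idealized, fully packed tree. The fanout of 200 entries per page and the one billion tuples are assumed round numbers for illustration, not measurements of a real PostgreSQL index, and the model ignores fill factor, differing leaf and inner tuple sizes, and the metapage.

```python
import math


def btree_page_counts(n_tuples, entries_per_page):
    """Return the number of pages per level, leaf level first."""
    pages = math.ceil(n_tuples / entries_per_page)
    levels = [pages]
    # Each inner level needs one entry per page of the level below it.
    while pages > 1:
        pages = math.ceil(pages / entries_per_page)
        levels.append(pages)
    return levels


def inner_fraction(n_tuples, entries_per_page):
    """Fraction of all index pages that are inner (non-leaf) pages."""
    levels = btree_page_counts(n_tuples, entries_per_page)
    total = sum(levels)
    return (total - levels[0]) / total


if __name__ == "__main__":
    levels = btree_page_counts(1_000_000_000, 200)
    print(levels)
    print(f"{inner_fraction(1_000_000_000, 200):.4%}")
```

Under these assumptions the inner levels amount to roughly one page in every 200, so the fraction stays well below 1% for any fanout in the hundreds.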
On Wed, Apr 30, 2014 at 12:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Apr 30, 2014 at 5:55 AM, ktm@rice.edu <ktm@rice.edu> wrote: >> I do not think that CPU costs matter as much as the O(1) probe to >> get a result value specifically for very large indexes/tables where >> even caching the upper levels of a B-tree index would kill your >> working set in memory. I know, I know, everyone has so much memory >> and can just buy more... but this does matter. > > Have you actually investigated how little memory it takes to store the > inner pages? It's typically less than 1% of the entire index. AFAIK, > hash indexes are not used much in any other system. I think MySQL has > them, and SQL Server 2014 has special in-memory hash table indexes for > in memory tables, but that's all I can find on Google. I thought the theoretical advantage of hash indexes wasn't that they were smaller but that you avoided a central contention point (the btree root). Of course our current hash indexes have *more* not less contention than btree but I'm pretty comfortable chalking that up to quality of implementation rather than anything intrinsic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I thought the theoretical advantage of hash indexes wasn't that they > were smaller but that you avoided a central contention point (the > btree root). > Of course our current hash indexes have *more* not less contention > than btree but I'm pretty comfortable chalking that up to quality of > implementation rather than anything intrinsic. The long and the short of it is that there are *lots* of implementation deficiencies in our hash indexes. There's no real way to know whether they'd be competitive if all those things were rectified, except by doing the work to fix 'em. And it's hard to justify putting much effort into hash indexes so long as there's an elephant in the room of the size of "no WAL support". So I'm in favor of getting that fixed, if we have somebody who's willing to do it. It might lead to good things later; and even if it doesn't, the lack of WAL support is an embarrassment. regards, tom lane
On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I thought the theoretical advantage of hash indexes wasn't that they > were smaller but that you avoided a central contention point (the > btree root). The B-Tree root isn't really a central contention point at all. The locking/latching protocol that nbtree uses is remarkably concurrency-friendly. In the real world, there is pretty much no exclusive locking of the root page's buffer. > Of course our current hash indexes have *more* not less contention > than btree but I'm pretty comfortable chalking that up to quality of > implementation rather than anything intrinsic. I am not convinced of that. -- Peter Geoghegan
On Wed, Apr 30, 2014 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It might lead to good things later; and even if > it doesn't, the lack of WAL support is an embarrassment. I don't think it will, but I do agree that the current state of affairs is an embarrassment. -- Peter Geoghegan
On Wed, Apr 30, 2014 at 12:26 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> As a GSoC student, I will implement WAL recovery of hash indexes using the
> >> other index types' WAL code as a guide.
>
> Frankly, I'm skeptical of the idea that hash indexes will ever really
> be useful. I realize that that's a counter-intuitive conclusion, but
> there are many things we could do to improve B-Tree CPU costs to make
> them closer to those of hash indexes, without making them any less
> flexible. I myself would much rather work on that, and intend to.
If we don't put in the work to make them useful, then they won't ever become useful.
If we do put in the effort (and it would be considerable) then I think they will be. But you may be correct that the effort required would perhaps be better used in making btree even better. I don't think we can conclude that definitively without putting in the work to do the experiment.
One advantage of the hash indexes is that the code is simple enough for someone to actually understand it in a summer. Whether it would still be like that after WAL logging was implemented, I don't know.
> The O(1) cost seems attractive when you consider that that only
> requires that we read one index page from disk to service any given
> index scan, but in fact B-Trees almost always only require the same.
> They are of course also much more flexible. The concurrency
> characteristics B-Trees are a lot better understood.
Not sure what you mean there. The concurrency issues of the hash index have a lot less that needs to be understood. I think I understand them pretty well (unlike B-tree); I just don't know what to do with that knowledge.
> I sincerely
> suggest that we forget about conventional hash table type indexes. I
> fear they're a lost cause.
I understand that those are the only ones worth fighting for. :)
Cheers,
Jeff
On Wed, Apr 30, 2014 at 11:02 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Apr 30, 2014 at 10:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I thought the theoretical advantage of hash indexes wasn't that they
> > were smaller but that you avoided a central contention point (the
> > btree root).
>
> The B-Tree root isn't really a central contention point at all. The
> locking/latching protocol that nbtree uses is remarkably
> concurrency-friendly. In the real world, there is pretty much no
> exclusive locking of the root page's buffer.
I've seen the simple pinning and unpinning of the root page (or the fast root, whatever the first page we bother to pin on a regular basis is called) be a point of contention. When one index dominates the entire system workload, that one page also drives contention on the spin lock that protects the lwlock that share-protects whichever buffer mapping partition happens to contain it.
Cheers,
Jeff
On 2014-04-30 11:10:22 -0700, Jeff Janes wrote: > I've seen the simple pinning and unpinning of the root page (or the fast root, > whatever the first page we bother to pin on a regular basis is called) be a > point of contention. When one index dominates the entire system workload, that > one page also drives contention on the spin lock that protects the lwlock that > share-protects whichever buffer mapping partition happens to contain it. To quite some degree that's an implementation deficiency of our lwlocks though. I've seen *massive* improvements with my lwlock patch for that problem. Additionally we need to get rid of the spinlock around pin/unpin. That said, even after those optimizations, there remains a significant amount of cacheline bouncing. That's much easier to avoid for something like hash indexes than btrees. I think another advantage is that hash indexes can be *much* smaller than btree when the individual rows are wide. I wonder though if we couldn't solve that better by introducing "transforms" around the looked up data. E.g. allow a hash(indexed_column) to be used *transparently*. If you do that currently, a lot of work has to be done in every query... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
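To make the "transform" idea a little more concrete, here is a minimal sketch of what such a lookup has to do behind the scenes: index a short hash of a wide value, then recheck the real value on every candidate, because different values can share a hash. Everything here is hypothetical and chosen for illustration (the `HashedColumnIndex` class, the 4-byte truncated SHA-256 used as the hash, the in-memory list standing in for a table). It is not a proposal for how PostgreSQL would implement this.

```python
import hashlib
from collections import defaultdict


def short_hash(value: str) -> int:
    """Map a (possibly very wide) string to a 32-bit integer."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashedColumnIndex:
    """Toy index that stores hash(value) -> row ids instead of the values."""

    def __init__(self, rows):
        self.rows = rows  # stands in for the table
        self.index = defaultdict(list)
        for row_id, value in enumerate(rows):
            self.index[short_hash(value)].append(row_id)

    def lookup(self, value):
        # Probe by hash, then recheck against the stored value, since
        # two different values may collide on the same 32-bit hash.
        candidates = self.index.get(short_hash(value), [])
        return [rid for rid in candidates if self.rows[rid] == value]


if __name__ == "__main__":
    rows = ["a" * 1000, "b" * 1000, "a" * 1000]
    idx = HashedColumnIndex(rows)
    print(idx.lookup("a" * 1000))
```

The hash-and-recheck pair inside `lookup` is the work that has to be spelled out by hand in each query today when indexing on a hash of the column; a transparent transform would do both steps on the user's behalf.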
Is there a good reason for this thread being copied to the advocacy list? It seems to me just on topic for hackers. -- Darren Duncan