Re: Berkeley and CMU classes adopt/extend PostgreSQL - Mailing list pgsql-hackers

From Marc G. Fournier
Subject Re: Berkeley and CMU classes adopt/extend PostgreSQL
Msg-id 20030214194204.P23108@hub.org
In response to Berkeley and CMU classes adopt/extend PostgreSQL  ("Joe Hellerstein" <jmh@cs.berkeley.edu>)
List pgsql-hackers
On Tue, 11 Feb 2003, Joe Hellerstein wrote:

> Hi all:
>     I emailed Marc Fournier on this topic some weeks back, but haven't
> heard from him.

And most public apologies for that ... this past month has been a complete
nightmare all around ... we're just finishing up moving our office, and
finally have phone lines again, and hope to have internet again starting
tomorrow ... :(

> 1) We changed the course projects to make the students hack PostgreSQL
> internals, rather than the "minibase" eduware
> 2) We are coordinating the class with a class at CMU being taught by
> Prof. Anastassia ("Natassa") Ailamaki
>
> Our "Homework 2", which is being passed out this week, will ask the
> students to implement a hash-based grouping that spills to disk.  I
> understand this topic has been batted about on the pgsql-hackers list
> recently.  The TAs who've prepared the assignment (Sailesh
> Krishnamurthy at Berkeley and Spiros Papadimitriou at CMU) have also
> implemented a reference solution to the assignment.  Once we've got the
> students' projects all turned in, we'll be very happy to contribute our
> code back to the PostgreSQL project.
>
> I'm hopeful this will lead to many good things:
>
> 1) Each year we can pick another feature to assign in class, and
> contribute back.  We'll need to come up with well-scoped engine
> features that exercise concepts from the class -- eventually we'll run
> out of tractable things that PGSQL needs, but not in the next couple
> of years, I bet.
>
> 2) We'll raise a crop of good students who know Postgres internals.
> Roughly half the Berkeley EECS undergrads take the DB class, and all of
> them will be post-hackers!  (Again, I don't know the stats at CMU.)
>
> So consider this a heads up on the hash-agg front, and on the future
> contributions front.   I'll follow up with another email on
> PostgreSQL-centered research in our group at Berkeley as well.
>
> Another favor I'd ask is that people on the list be a bit hesitant
> about helping our students with their homework!  We would like them to
> do it themselves, more or less :-)
>
> Regards,
> Joe Hellerstein
>
> --
>
> Joseph M. Hellerstein
> Professor, EECS Computer Science Division
> UC Berkeley
> http://www.cs.berkeley.edu/~jmh
>
>
> On Tuesday, February 11, 2003, at 06:54 PM, Sailesh Krishnamurthy wrote:
>
> > From: Hannu Krosing <hannu@tm.ee>
> > Date: Tue Feb 11, 2003  12:21:26  PM US/Pacific
> > To: Tom Lane <tgl@sss.pgh.pa.us>
> > Cc: Bruno Wolff III <bruno@wolff.to>, Greg Stark <gsstark@mit.edu>,
> > pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Hash grouping, aggregates
> >
> >
> > Tom Lane wrote on Tue, 11.02.2003 at 18:39:
> >> Bruno Wolff III <bruno@wolff.to> writes:
> >>>   Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>> Greg Stark <gsstark@mit.edu> writes:
> > >>>>> The neat thing is that hash aggregates would allow grouping on
> > >>>>> data types that have = operators but no useful < operator.
> >>>>
> > >>>> Hm.  Right now I think that would barf on you, because the parser
> > >>>> wants to find the '<' operator to label the grouping column with,
> > >>>> even if the planner later decides not to use it.  It'd take some
> > >>>> redesign of the query data structure (specifically
> > >>>> SortClause/GroupClause) to avoid that.
> >>
> > >>> I think another issue is that for some = operators you still might
> > >>> not be able to use a hash. I would expect the discussion of hash
> > >>> joins in
> > >>> http://developer.postgresql.org/docs/postgres/xoper-optimization.html
> > >>> to apply to hash aggregates as well.
> >>
> > >> Right, the = operator must be hashable or you're out of luck.  But we
> > >> could imagine tweaking the parser to allow GROUP BY if it finds a
> > >> hashable = operator and no sort operator.  The only objection I can
> > >> see to this is that it means the planner *must* use hash
> > >> aggregation, which might be a bad move if there are too many
> > >> distinct groups.
> >
> > If we run out of sort memory, we can always bail out later, preferably
> > with a descriptive error message. It is not as elegant as erring out at
> > parse (or even plan/optimise) time, but the result is /almost/ the
> > same.
> >
> > Relying on hash aggregation will become essential if we are ever going
> > to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it would
> > be nice if hash aggregation could also overflow to disk - I suspect
> > that this will still be faster than running an independent scan for
> > each GROUP BY grouping and merging the results.
> >
> > -----
> > Hannu
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> >
> >
> >
> >
> > --
> > Pip-pip
> > Sailesh
> > http://www.cs.berkeley.edu/~sailesh
> >
>
>
>
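For anyone following along who has not met the homework topic before, the general idea of a hash-based grouping that spills to disk can be sketched as below. This is plain Python, not PostgreSQL executor code, and not the TAs' reference solution; the names hash_aggregate, max_groups and fanout are made up for illustration. Groups are aggregated in an in-memory hash table until it holds max_groups entries; after that, rows belonging to groups not already in the table are hashed into one of fanout temporary partition files, and each partition is aggregated in a later pass (recursively, if it is still too big).

```python
import pickle
import tempfile


def _bucket(key, depth, fanout):
    # Salt the hash with the recursion depth so that a partition which is
    # still too large gets split differently on the next pass.
    return hash((depth, key)) % fanout


def _read_spill(f):
    # Read back the (key, value) pairs pickled into a spill file.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def hash_aggregate(rows, max_groups=1000, fanout=4, depth=0):
    """Yield (key, SUM(value)) for each distinct key in (key, value) rows.

    Hypothetical sketch: at most max_groups groups are held in memory at
    once; rows for any other group are spilled to temporary partition files
    and aggregated in a later pass.
    """
    table = {}
    spills = None
    for key, value in rows:
        if key in table:
            table[key] += value
        elif len(table) < max_groups:
            table[key] = value
        else:
            # Table is full and this group is not in it: spill the row.
            # Once full, the table never admits a new group, so every row of
            # this group ends up in the same partition file.
            if spills is None:
                spills = [tempfile.TemporaryFile() for _ in range(fanout)]
            pickle.dump((key, value), spills[_bucket(key, depth, fanout)])
    yield from table.items()
    if spills is None:
        return
    for f in spills:
        f.seek(0)
        # The recursive call reads the whole partition before yielding.
        yield from hash_aggregate(_read_spill(f), max_groups, fanout, depth + 1)
        f.close()


if __name__ == "__main__":
    rows = [(i % 50, 1) for i in range(1000)]
    sums = dict(hash_aggregate(rows, max_groups=8))
    print(len(sums), sums[0], sums[49])  # 50 20 20
```

Because a group lives either in the in-memory table or entirely in one spill partition, each group is emitted exactly once, and any pass over a partition with more than max_groups groups absorbs max_groups of them, so the recursion terminates. That is the "overflow to disk" behaviour Hannu asks for, instead of bailing out with an error.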

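Greg's point that hashing needs only equality, and Tom's point that the = operator must be hashable, have a direct analogue in Python (used here purely as an illustration; sort_group_count, hash_group_count and CaseInsensitive are hypothetical names). Complex numbers support == and hash() but not <, so sort-based grouping fails on them while hash-based grouping works. The CaseInsensitive class shows the hashability requirement: if = ignores case, the hash has to ignore case too, or equal keys would land in different buckets. In PostgreSQL the corresponding promise is what the HASHES option on an operator asserts, as described on the operator-optimization page linked above.

```python
from itertools import groupby


def sort_group_count(values):
    # Sort-based GROUP BY: sorting needs "<" to bring equal keys together.
    return {k: len(list(g)) for k, g in groupby(sorted(values))}


def hash_group_count(values):
    # Hash-based GROUP BY: needs only "=" plus a hash consistent with it.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


class CaseInsensitive:
    """A key whose "=" ignores case, so its hash must ignore case as well."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, CaseInsensitive) and self.text.lower() == other.text.lower()

    def __hash__(self):
        # Equal keys must produce equal hashes; hashing self.text directly
        # would put "Foo" and "FOO" in different groups.
        return hash(self.text.lower())

    def __repr__(self):
        return f"CaseInsensitive({self.text!r})"


if __name__ == "__main__":
    points = [1 + 2j, 3j, 1 + 2j, 3j, 3j]
    print(hash_group_count(points))  # {(1+2j): 2, 3j: 3}
    try:
        sort_group_count(points)
    except TypeError as exc:
        print("sort-based grouping failed:", exc)
    print(hash_group_count([CaseInsensitive("Foo"), CaseInsensitive("FOO")]))  # {CaseInsensitive('Foo'): 2}
```

Python itself enforces part of this rule: a class that defines __eq__ without __hash__ becomes unhashable, which is roughly the situation of an = operator that is not marked as hashable.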

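Hannu's remark about CUBE and ROLLUP can be illustrated with a sketch that computes several grouping sets in a single scan by keeping one hash table per grouping set. Again this is hypothetical Python with made-up names (rollup, cube, multi_grouping_sums); it keeps everything in memory and is not a proposal for how PostgreSQL should implement these clauses. A sort-based plan would instead need a differently ordered input for grouping sets that do not share a column prefix, such as (region) and (product) in a CUBE.

```python
from itertools import combinations


def rollup(columns):
    # ROLLUP(a, b) -> grouping sets (a, b), (a), ()
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube(columns):
    # CUBE(a, b) -> grouping sets (a, b), (a), (b), ()
    return [gs for n in range(len(columns), -1, -1) for gs in combinations(columns, n)]


def multi_grouping_sums(rows, grouping_sets, measure):
    """SUM(measure) for every grouping set, using a single scan of rows.

    rows are dicts; the result maps each grouping set to {key tuple: sum}.
    """
    tables = {gs: {} for gs in grouping_sets}
    for row in rows:  # one pass over the input feeds every hash table
        for gs, table in tables.items():
            key = tuple(row[c] for c in gs)
            table[key] = table.get(key, 0) + row[measure]
    return tables


if __name__ == "__main__":
    sales = [
        {"region": "east", "product": "a", "amount": 10},
        {"region": "east", "product": "b", "amount": 5},
        {"region": "west", "product": "a", "amount": 7},
    ]
    for gs, table in multi_grouping_sums(sales, cube(["region", "product"]), "amount").items():
        print(gs, table)
```

Memory use here grows with the total number of groups across all grouping sets, which is exactly why Hannu would like hash aggregation to be able to overflow to disk.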