Berkeley and CMU classes adopt/extend PostgreSQL - Mailing list pgsql-hackers

From Joe Hellerstein
Subject Berkeley and CMU classes adopt/extend PostgreSQL
Msg-id 8E12D10F-3E3B-11D7-8738-0003938012B0@cs.berkeley.edu
Responses Re: Berkeley and CMU classes adopt/extend PostgreSQL
List pgsql-hackers
Hi all:

I emailed Marc Fournier on this topic some weeks back, but haven't
heard from him.

I am teaching the undergrad DB course at UC Berkeley, something I do 
with some frequency.  We have the usual 180 students we get every
semester (yep: 180!), but this year we've instituted 2 changes:

1) We changed the course projects to make the students hack PostgreSQL 
internals, rather than the "minibase" eduware
2) We are coordinating the class with a class at CMU being taught by 
Prof. Anastassia ("Natassa") Ailamaki

Our "Homework 2", which is being passed out this week, will ask the
students to implement a hash-based grouping that spills to disk.  I
understand this topic has been batted about on the pgsql-hackers list
recently.  The TAs who've prepared the assignment (Sailesh
Krishnamurthy at Berkeley and Spiros Papadimitriou at CMU) have also
implemented a reference solution to the assignment.  Once we've got the
students' projects all turned in, we'll be very happy to contribute our
code back to the PostgreSQL project.
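For readers who haven't followed the hash-aggregation threads, here is a minimal Python sketch of one common way to build a hash-based grouping that spills to disk. It assumes a memory budget counted in groups (`max_groups`, a hypothetical stand-in for a real byte-based limit). Once the in-memory hash table is full, rows for groups that are not already resident are written to one of several temporary partition files chosen by hashing the group key, and each partition is then aggregated recursively. All names are illustrative; this is not the course's reference solution and not PostgreSQL's executor code.

```python
import pickle
import tempfile


def _read_rows(f):
    """Replay the rows pickled into a spill file, from the beginning."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def hash_aggregate(rows, key, value, agg_init, agg_step,
                   max_groups, num_partitions=4, depth=0):
    """Hash GROUP BY that spills to disk; yields (group_key, state) pairs.

    max_groups: hypothetical memory budget, counted in groups (must be >= 1).
    agg_init() -> initial state; agg_step(state, v) -> new state.
    """
    table = {}
    spill_files = None
    for row in rows:
        k = key(row)
        if k in table:
            # Group already resident: update it in memory.
            table[k] = agg_step(table[k], value(row))
        elif len(table) < max_groups:
            # Room left: start a new in-memory group.
            table[k] = agg_step(agg_init(), value(row))
        else:
            # Table full and group not resident: spill the row to a
            # partition chosen by hashing the key, salted with the depth
            # so that a recursive pass splits the keys differently.
            if spill_files is None:
                spill_files = [tempfile.TemporaryFile()
                               for _ in range(num_partitions)]
            p = hash((k, depth)) % num_partitions
            pickle.dump(row, spill_files[p])
    try:
        # Resident groups are complete: the table never shrinks, so a key
        # that made it into memory never had any of its rows spilled.
        yield from table.items()
        # Every spilled group lives entirely in one partition; aggregate
        # each partition recursively with the same budget.
        for f in spill_files or []:
            yield from hash_aggregate(_read_rows(f), key, value,
                                      agg_init, agg_step, max_groups,
                                      num_partitions, depth + 1)
    finally:
        for f in spill_files or []:
            f.close()


rows = [(i % 10, i) for i in range(100)]  # (group, value) pairs
sums = dict(hash_aggregate(rows, key=lambda r: r[0], value=lambda r: r[1],
                           agg_init=lambda: 0,
                           agg_step=lambda s, v: s + v,
                           max_groups=3))
print(sums[0], sums[9])  # 450 540
```

With ten groups and a budget of three, groups 0, 1 and 2 stay in memory and the remaining rows go through the partition files; the result matches a fully in-memory SUM. Salting the partition hash with the recursion depth is one way to keep an oversized partition from landing in a single bucket again on the next pass.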

I'm hopeful this will lead to many good things:

1) Each year we can pick another feature to assign in class, and 
contribute back.  We'll need to come up with well-scoped engine 
features that exercise concepts from the class -- eventually we'll run 
out of tractable things that PGSQL needs, but not in the next couple 
years I bet.

2) We'll raise a crop of good students who know Postgres internals.  
Roughly half the Berkeley EECS undergrads take the DB class, and all of 
them will be post-hackers!  (Again, I don't know the stats at CMU.)

So consider this a heads up on the hash-agg front, and on the future 
contributions front.   I'll follow up with another email on 
PostgreSQL-centered research in our group at Berkeley as well.

Another favor I'd ask is that people on the list be a bit hesitant 
about helping our students with their homework!  We would like them to 
do it themselves, more or less :-)

Regards,
Joe Hellerstein

--

Joseph M. Hellerstein
Professor, EECS Computer Science Division
UC Berkeley
http://www.cs.berkeley.edu/~jmh


On Tuesday, February 11, 2003, at 06:54 PM, Sailesh Krishnamurthy wrote:

> From: Hannu Krosing <hannu@tm.ee>
> Date: Tue Feb 11, 2003  12:21:26  PM US/Pacific
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: Bruno Wolff III <bruno@wolff.to>, Greg Stark <gsstark@mit.edu>, pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Hash grouping, aggregates
>
>
> Tom Lane wrote on Tue, 11.02.2003 at 18:39:
>> Bruno Wolff III <bruno@wolff.to> writes:
>>>   Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Greg Stark <gsstark@mit.edu> writes:
>>>>> The neat thing is that hash aggregates would allow grouping on data
>>>>> types that have = operators but no useful < operator.
>>>>
>>>> Hm.  Right now I think that would barf on you, because the parser wants
>>>> to find the '<' operator to label the grouping column with, even if the
>>>> planner later decides not to use it.  It'd take some redesign of the
>>>> query data structure (specifically SortClause/GroupClause) to avoid
>>>> that.
>>
>>> I think another issue is that for some = operators you still might not
>>> be able to use a hash. I would expect the discussion for hash joins in
>>> http://developer.postgresql.org/docs/postgres/xoper-optimization.html
>>> would apply to hash aggregates as well.
>>
>> Right, the = operator must be hashable or you're out of luck.  But we
>> could imagine tweaking the parser to allow GROUP BY if it finds a
>> hashable = operator and no sort operator.  The only objection I can see
>> to this is that it means the planner *must* use hash aggregation, which
>> might be a bad move if there are too many distinct groups.
>
> If we run out of sort memory, we can always bail out later, preferably
> with a descriptive error message. It is not as elegant as erring out at
> parse (or even plan/optimise) time, but the result is /almost/ the same.
>
> Relying on hash aggregation will become essential if we are ever going
> to implement the "other" groupings (CUBE, ROLLUP, (), ...), so it would
> be nice if hash aggregation could also overflow to disk - I suspect that
> this will still be faster than running an independent scan for each
> GROUP BY grouping and merging the results.
>
> -----
> Hannu
>
>
> -- 
> Pip-pip
> Sailesh
> http://www.cs.berkeley.edu/~sailesh
>
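The two technical points in the quoted thread can be illustrated outside PostgreSQL. The sketch below uses Python's `complex` type as a stand-in for a data type that has a hashable `=` but no `<`: sort-based grouping fails on it, while hash-based grouping needs only hashing and equality. The last function computes several groupings (in the spirit of ROLLUP and the empty grouping `()`) in a single scan by keeping one hash table per grouping, instead of scanning once per grouping and merging. All function names are hypothetical, and nothing here describes how PostgreSQL's parser or planner actually handle these cases.

```python
from collections import defaultdict
from itertools import groupby


def sort_group_count(keys):
    """Sort-based grouping: sorts first, so the key type needs a '<'."""
    return {k: len(list(g)) for k, g in groupby(sorted(keys))}


def hash_group_count(keys):
    """Hash-based grouping: needs only hash() and '=' on the key type."""
    counts = defaultdict(int)
    for k in keys:
        counts[k] += 1
    return dict(counts)


def grouping_sets_sum(rows, sets, value_col):
    """Sum value_col for every grouping set in one pass over rows.

    rows: iterable of dicts; sets: list of tuples of column names.
    One hash table per grouping set; the empty tuple () is the grand total.
    """
    tables = {s: defaultdict(int) for s in sets}
    for row in rows:
        for s, table in tables.items():
            table[tuple(row[c] for c in s)] += row[value_col]
    return {s: dict(t) for s, t in tables.items()}


keys = [1j, 2j, 1j]
try:
    sort_group_count(keys)
except TypeError as e:
    print("sort-based grouping failed:", e)
print(hash_group_count(keys))  # {1j: 2, 2j: 1}

rows = [
    {"a": 1j, "b": "x", "v": 1},
    {"a": 1j, "b": "y", "v": 2},
    {"a": 2j, "b": "x", "v": 4},
]
print(grouping_sets_sum(rows, [("a", "b"), ("a",), ()], "v"))
```

Keeping one hash table per grouping trades memory for scans: every table grows at once, which is why spilling to disk, as suggested above, matters more once several groupings are computed together.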


