Thread: Call for Google Summer of Code (GSoC) 2012: Project ideas?
Hi

I do have a student who is interested in participating in the Google
Summer of Code (GSoC) 2012.
Now I have the "burden" to look for a cool project... Any ideas?

-Stefan
On 03/08/2012 01:40 PM, Stefan Keller wrote:
> Hi
>
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
>
> -Stefan

How about one of:

1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
page level, any level really).

I know TOAST compresses, but I believe it's only one row at a time. Page
level would compress better because there is more data, and it would also
decrease the amount of IO, so it might speed up disk access.

2) Better partitioning support. Something much more automatic.

3) Take a nice big table, have it inserted/updated a few times a second.
Then make "select * from bigtable where indexed_field = 'somevalue'" work
10 times faster than it does today.

I think there is also a wish list on the wiki somewhere.

-Andy
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> How about one of:
>
> 1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
> page level, any level really).
>
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.
>
> 2) Better partitioning support. Something much more automatic.
>
> 3) Take a nice big table, have it inserted/updated a few times a second.
> Then make "select * from bigtable where indexed_field = 'somevalue'" work
> 10 times faster than it does today.
>
> I think there is also a wish list on the wiki somewhere.
>
> -Andy

Ability to dynamically resize the shared-memory segment without taking
PostgreSQL down :)
On Thu, Mar 8, 2012 at 8:01 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 03/08/2012 01:40 PM, Stefan Keller wrote:
>>
>> Hi
>>
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> How about one of:
>
> 1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
> page level, any level really).
>
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.
>
> 2) Better partitioning support. Something much more automatic.
>
> 3) Take a nice big table, have it inserted/updated a few times a second.
> Then make "select * from bigtable where indexed_field = 'somevalue'" work
> 10 times faster than it does today.
>
> I think there is also a wish list on the wiki somewhere.

Nice ideas. Those aren't projects we should be giving to summer students,
though. I don't suppose many people could do those things in two months,
let alone people with the least experience in both their career and our
codebase.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.

er, but when data is toasted it's spanning pages. page level
compression is a super complicated problem.

something that is maybe more attainable on the compression side of
things is a userland api for compression -- like pgcrypto is for
encryption. even if it didn't make it into core, it could live on
reasonably as a pgfoundry project.

merlin
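To make merlin's suggestion concrete: a pgcrypto-style compression
extension might expose a pair of bytea functions along the lines of the
sketch below. This is purely hypothetical -- the function names, C
symbols, and method argument are invented; no such extension exists.

-- Hypothetical sketch of a pgcrypto-style userland compression API.
-- Nothing below exists today; names and signatures are illustrative.

-- compress(data, method): method might be 'lzf', 'snappy', 'zlib', ...
-- depending on which libraries the extension links in.
CREATE FUNCTION compress(data bytea, method text) RETURNS bytea
    AS 'MODULE_PATHNAME', 'pg_userland_compress'
    LANGUAGE C STRICT IMMUTABLE;

CREATE FUNCTION decompress(data bytea, method text) RETURNS bytea
    AS 'MODULE_PATHNAME', 'pg_userland_decompress'
    LANGUAGE C STRICT IMMUTABLE;

Usage would then mirror pgcrypto's encrypt()/decrypt(): store
compress(convert_to(doc, 'UTF8'), 'zlib') into a bytea column and unwrap
it with decompress() on the way out.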
On 3/9/2012 9:47 AM, Merlin Moncure wrote:
> er, but when data is toasted it's spanning pages. page level
> compression is a super complicated problem.
>
> something that is maybe more attainable on the compression side of
> things is a userland api for compression -- like pgcrypto is for
> encryption. even if it didn't make it into core, it could live on
> reasonably as a pgfoundry project.
>
> merlin

Agreed, it's probably too difficult for a GSoC project. But a userland API
would still be row level, which, in my opinion, is useless. Consider rows
from my apache log that I'm dumping to the database:

date, url, status
2012-3-9 10:15:00, '/index.php?id=4', 202
2012-3-9 10:15:01, '/index.php?id=5', 202
2012-3-9 10:15:02, '/index.php?id=6', 202

That won't compress at all on a row level. But it'll compress 99% on a
"larger" (page/multirow/whatever/?) level.

-Andy
On Fri, Mar 9, 2012 at 10:19 AM, Andy Colson <andy@squeakycode.net> wrote:
> On 3/9/2012 9:47 AM, Merlin Moncure wrote:
>> On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
>>> I know TOAST compresses, but I believe it's only one row at a time.
>>> Page level would compress better because there is more data, and it
>>> would also decrease the amount of IO, so it might speed up disk access.
>>
>> er, but when data is toasted it's spanning pages. page level
>> compression is a super complicated problem.
>>
>> something that is maybe more attainable on the compression side of
>> things is a userland api for compression -- like pgcrypto is for
>> encryption. even if it didn't make it into core, it could live on
>> reasonably as a pgfoundry project.
>
> Agreed, it's probably too difficult for a GSoC project. But a userland
> API would still be row level, which, in my opinion, is useless. Consider
> rows from my apache log that I'm dumping to the database:

It's useless for what you're trying to do, but it would be useful to
people trying to compress large datums (data, I know) before storage
using algorithms that postgres can't support, like lzo.

> date, url, status
> 2012-3-9 10:15:00, '/index.php?id=4', 202
> 2012-3-9 10:15:01, '/index.php?id=5', 202
> 2012-3-9 10:15:02, '/index.php?id=6', 202
>
> That won't compress at all on a row level. But it'll compress 99% on a
> "larger" (page/multirow/whatever/?) level.

sure, but you can only get those rates by giving up the segmented view of
the data that postgres requires. your tuples are very small, and I only
see compression happening on the userland side by employing tricks
specific to your particular dataset (like employing "char" to map the
status, url mapping, etc).

merlin
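The dataset-specific tricks merlin mentions might look roughly like this;
the table and column names are invented for illustration, and "char" (with
the quotes) is PostgreSQL's internal 1-byte type, not char(1):

-- Illustrative sketch: shrink the apache-log rows by moving the URLs
-- into a dictionary table and mapping the few distinct HTTP statuses
-- onto the 1-byte "char" type.
CREATE TABLE urls (
    id  serial PRIMARY KEY,
    url text NOT NULL UNIQUE
);

CREATE TABLE access_log (
    ts     timestamp NOT NULL,
    url_id integer   NOT NULL REFERENCES urls(id),
    status "char"    NOT NULL  -- e.g. 'a' = 200, 'b' = 202, 'c' = 404 ...
);

-- The repetitive URL text is stored once; each log row carries only a
-- timestamp, a 4-byte id, and a 1-byte status:
SELECT l.ts, u.url, l.status
FROM access_log l
JOIN urls u ON u.id = l.url_id;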
Hi!
On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
> Hi
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
Also, for those who are on this thread: we are collecting ideas on the wiki:

http://wiki.postgresql.org/wiki/GSoC_2012
And we have the TODO list:

http://wiki.postgresql.org/wiki/Todo
-selena
Selena Deckelmann wrote:
> On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>
> Also, for those who are on this thread: we are collecting ideas on the wiki:
>
> http://wiki.postgresql.org/wiki/GSoC_2012

I have added Foreign Data Wrappers. I think that would be a good idea for
anybody who wants a clearly defined project - the API is (currently
changing, but) documented, it's a good opportunity to learn to hack
PostgreSQL server code, and you can leverage your knowledge of other
software.

Yours,
Laurenz Albe
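For a feel of what the FDW interface buys you, this is roughly how the
stock file_fdw module (a contrib extension since 9.1) is used; the file
path and column list are made up for the example:

-- Query a CSV file as if it were a table, via the contrib file_fdw
-- wrapper. The path and columns are illustrative.
CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE apache_log (
    ts     timestamp,
    url    text,
    status integer
) SERVER csv_files
  OPTIONS (filename '/var/log/apache/access.csv', format 'csv');

-- Plain SQL now works against the file:
SELECT count(*) FROM apache_log WHERE status = 404;

A student project would implement the same callbacks for some other data
source, which is exactly the "leverage your knowledge of other software"
angle.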
Excuse me if what I say below is nonsensical, for I haven't read much
about compression techniques; these ramblings are just common sense.

I think the debate about the level (row, page, file) of compression arises
when we stick strictly to the axiom of compression that all the
information needed for decompression must be present in the same
compressed unit.

Can't we relax this rule a bit and separate out the compression hints into
a separate file, the way we keep table data in one file and the positional
references (indexes) in another? Would that not eliminate the dilemma
about the boundaries of compression?

Perhaps a periodic, autovacuum-like compressor daemon could take up the
job of recompression, keeping the compression hints updated to match the
latest data present in the file/page at that instant.
Regards,
Samba
On 03/08/12 12:01 PM, Andy Colson wrote:
>
> 2) Better partitioning support. Something much more automatic.

That would be really high on our list - and something that can handle
adding/dropping partitions while there are concurrent transactions
involving the partitioned table.

Also a planner that can cope with optimizing prepared statements where the
partitioning variable is a passed parameter.

--
john r pierce                            N 37, W 122
santa cruz ca                            mid-left coast
+1 to seamless partitioning. The idea of having a student work on this
seems a bit scary, but what seems scary to me may be a piece of cake for a
talented kid :-)

Kiriakos
http://www.mockbites.com

On Mar 13, 2012, at 3:07 PM, John R Pierce wrote:

> On 03/08/12 12:01 PM, Andy Colson wrote:
>>
>> 2) Better partitioning support. Something much more automatic.
>
> That would be really high on our list - and something that can handle
> adding/dropping partitions while there are concurrent transactions
> involving the partitioned table.
>
> Also a planner that can cope with optimizing prepared statements where
> the partitioning variable is a passed parameter.
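For context on why "much more automatic" resonates: partitioning as of
9.1 is assembled by hand from inheritance, CHECK constraints, and a
routing trigger, roughly as sketched below (table, function, and partition
names are illustrative):

-- The manual, inheritance-based partitioning boilerplate that more
-- automatic support would fold away. Names are illustrative.
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);

-- One child table per range, carved out with a CHECK constraint:
CREATE TABLE measurements_2012_03 (
    CHECK (logdate >= DATE '2012-03-01' AND logdate < DATE '2012-04-01')
) INHERITS (measurements);

-- A trigger routes inserts on the parent to the right child:
CREATE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2012-03-01'
       AND NEW.logdate < DATE '2012-04-01' THEN
        INSERT INTO measurements_2012_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row went to a child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();

Adding or dropping a month means new DDL plus an updated trigger body,
which is what makes concurrent add/drop and parameterized plans awkward
today.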
Stefan Keller, 08.03.2012 20:40:
> Hi
>
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
>
> -Stefan

What about an extension to the CREATE TRIGGER syntax that combines trigger
definition and function definition in a single statement?

Something like:

CREATE TRIGGER my_trg BEFORE UPDATE ON some_table
FOR EACH ROW EXECUTE
DO
$body$
BEGIN
  ... here goes the function code ...
END;
$body$
LANGUAGE plpgsql;

which would create both objects (trigger and trigger function) at the same
time in the background.

The CASCADE option of DROP TRIGGER could be enhanced to include the
corresponding function in the DROP as well.

This would make the syntax a bit easier to handle for those cases where a
1:1 relationship exists between triggers and functions, but would still
allow the flexibility to re-use trigger functions in more than one trigger.

Regards
Thomas
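For comparison, the two-statement form that the proposal would collapse
looks like this today (trigger and table names follow Thomas's example;
the function name my_trg_fn is invented):

-- Today's equivalent: a separately created trigger function plus the
-- trigger that references it.
CREATE FUNCTION my_trg_fn() RETURNS trigger AS
$body$
BEGIN
    -- ... here goes the function code ...
    RETURN NEW;
END;
$body$
LANGUAGE plpgsql;

CREATE TRIGGER my_trg
    BEFORE UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE my_trg_fn();

-- Under the proposal, DROP TRIGGER my_trg ON some_table CASCADE would
-- also remove the now-orphaned my_trg_fn().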
Hi all,

2012/3/14 Thomas Kellerer <spam_eater@gmx.net>:
> Stefan Keller, 08.03.2012 20:40:
>
>> Hi
>>
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> What about an extension to the CREATE TRIGGER syntax that combines
> trigger definition and function definition in a single statement?
>
> Something like:
>
> CREATE TRIGGER my_trg BEFORE UPDATE ON some_table
> FOR EACH ROW EXECUTE
> DO
> $body$
> BEGIN
>   ... here goes the function code ...
> END;
> $body$
> LANGUAGE plpgsql;
>
> which would create both objects (trigger and trigger function) at the
> same time in the background.
>
> The CASCADE option of DROP TRIGGER could be enhanced to include the
> corresponding function in the DROP as well.
>
> This would make the syntax a bit easier to handle for those cases where
> a 1:1 relationship exists between triggers and functions, but would
> still allow the flexibility to re-use trigger functions in more than
> one trigger.
>
> Regards
> Thomas

Thanks to all who responded here. There are now two students here at our
university, and it seems that they prefer another open source project
(which I support too).

Let's take some of these good ideas to the Postgres wiki (if there is an
idea page there :->)

-Stefan