Thread: Call for Google Summer of Code (GSoC) 2012: Project ideas?
Hi

I do have a student who is interested in participating in the Google
Summer of Code (GSoC) 2012.
Now I have the "burden" to look for a cool project... Any ideas?

-Stefan
On 03/08/2012 01:40 PM, Stefan Keller wrote:
> Hi
>
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
>
> -Stefan

How about one of:

1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
page level, any level really).

I know TOAST compresses, but I believe it's only one row at a time. Page
level would compress better because there is more data, and it would also
decrease the amount of IO, so it might speed up disk access.

2) Better partitioning support. Something much more automatic.

3) Take a nice big table, have it inserted/updated a few times a second.
Then make "select * from bigtable where indexed_field = 'somevalue'" work
10 times faster than it does today.

I think there is also a wish list on the wiki somewhere.

-Andy
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> How about one of:
>
> 1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
> page level, any level really).
>
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.
>
> 2) Better partitioning support. Something much more automatic.
>
> 3) Take a nice big table, have it inserted/updated a few times a second.
> Then make "select * from bigtable where indexed_field = 'somevalue'" work
> 10 times faster than it does today.
>
> I think there is also a wish list on the wiki somewhere.
>
> -Andy

Ability to dynamically resize the shared-memory segment without taking
PostgreSQL down :)
On Thu, Mar 8, 2012 at 8:01 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 03/08/2012 01:40 PM, Stefan Keller wrote:
>>
>> Hi
>>
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> How about one of:
>
> 1) On-disk page-level compression (maybe with LZF or snappy) (maybe not
> page level, any level really).
>
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.
>
> 2) Better partitioning support. Something much more automatic.
>
> 3) Take a nice big table, have it inserted/updated a few times a second.
> Then make "select * from bigtable where indexed_field = 'somevalue'" work
> 10 times faster than it does today.
>
> I think there is also a wish list on the wiki somewhere.

Nice ideas. Those aren't projects we should be giving to summer students,
though. I don't suppose many people could do those things in two months,
let alone people with the least experience in both their career and our
codebase.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
> I know TOAST compresses, but I believe it's only one row at a time. Page
> level would compress better because there is more data, and it would also
> decrease the amount of IO, so it might speed up disk access.

er, but when data is toasted it's spanning pages. page level
compression is a super complicated problem.

something that is maybe more attainable on the compression side of
things is a userland api for compression -- like pgcrypto is for
encryption. even if it didn't make it into core, it could live on
reasonably as a pgfoundry project.

merlin
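To make merlin's suggestion concrete: a pgcrypto-style compression
extension might expose a pair of bytea functions along the lines of the
sketch below. This is purely hypothetical -- the function names, C
symbols, and method argument are invented; no such extension exists.

-- Hypothetical sketch of a pgcrypto-style userland compression API.
-- Nothing below exists today; names and signatures are illustrative.

-- compress(data, method): method might be 'lzf', 'snappy', 'zlib', ...
-- depending on which libraries the extension links in.
CREATE FUNCTION compress(data bytea, method text) RETURNS bytea
    AS 'MODULE_PATHNAME', 'pg_userland_compress'
    LANGUAGE C STRICT IMMUTABLE;

CREATE FUNCTION decompress(data bytea, method text) RETURNS bytea
    AS 'MODULE_PATHNAME', 'pg_userland_decompress'
    LANGUAGE C STRICT IMMUTABLE;

Usage would then mirror pgcrypto's encrypt()/decrypt(): store
compress(convert_to(doc, 'UTF8'), 'zlib') into a bytea column and unwrap
it with decompress() on the way out.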
On 3/9/2012 9:47 AM, Merlin Moncure wrote:
> er, but when data is toasted it's spanning pages. page level
> compression is a super complicated problem.
>
> something that is maybe more attainable on the compression side of
> things is a userland api for compression -- like pgcrypto is for
> encryption. even if it didn't make it into core, it could live on
> reasonably as a pgfoundry project.
>
> merlin

Agreed, it's probably too difficult for a GSoC project. But a userland API
would still be row level, which, in my opinion, is useless. Consider rows
from my apache log that I'm dumping to the database:

date, url, status
2012-3-9 10:15:00, '/index.php?id=4', 202
2012-3-9 10:15:01, '/index.php?id=5', 202
2012-3-9 10:15:02, '/index.php?id=6', 202

That won't compress at all on a row level. But it'll compress 99% on a
"larger" (page/multirow/whatever/?) level.

-Andy
On Fri, Mar 9, 2012 at 10:19 AM, Andy Colson <andy@squeakycode.net> wrote:
> On 3/9/2012 9:47 AM, Merlin Moncure wrote:
>> On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy@squeakycode.net> wrote:
>>> I know TOAST compresses, but I believe it's only one row at a time.
>>> Page level would compress better because there is more data, and it
>>> would also decrease the amount of IO, so it might speed up disk access.
>>
>> er, but when data is toasted it's spanning pages. page level
>> compression is a super complicated problem.
>>
>> something that is maybe more attainable on the compression side of
>> things is a userland api for compression -- like pgcrypto is for
>> encryption. even if it didn't make it into core, it could live on
>> reasonably as a pgfoundry project.
>
> Agreed, it's probably too difficult for a GSoC project. But a userland
> API would still be row level, which, in my opinion, is useless. Consider
> rows from my apache log that I'm dumping to the database:

It's useless for what you're trying to do, but it would be useful to
people trying to compress large datums (data, I know) before storage
using algorithms that postgres can't support, like lzo.

> date, url, status
> 2012-3-9 10:15:00, '/index.php?id=4', 202
> 2012-3-9 10:15:01, '/index.php?id=5', 202
> 2012-3-9 10:15:02, '/index.php?id=6', 202
>
> That won't compress at all on a row level. But it'll compress 99% on a
> "larger" (page/multirow/whatever/?) level.

sure, but you can only get those rates by giving up the segmented view of
the data that postgres requires. your tuples are very small, and I only
see compression happening on the userland side by employing tricks
specific to your particular dataset (like employing "char" to map the
status, url mapping, etc).

merlin
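The dataset-specific tricks merlin mentions might look roughly like this;
the table and column names are invented for illustration, and "char" (with
the quotes) is PostgreSQL's internal 1-byte type, not char(1):

-- Illustrative sketch: shrink the apache-log rows by moving the URLs
-- into a dictionary table and mapping the few distinct HTTP statuses
-- onto the 1-byte "char" type.
CREATE TABLE urls (
    id  serial PRIMARY KEY,
    url text NOT NULL UNIQUE
);

CREATE TABLE access_log (
    ts     timestamp NOT NULL,
    url_id integer   NOT NULL REFERENCES urls(id),
    status "char"    NOT NULL  -- e.g. 'a' = 200, 'b' = 202, 'c' = 404 ...
);

-- The repetitive URL text is stored once; each log row carries only a
-- timestamp, a 4-byte id, and a 1-byte status:
SELECT l.ts, u.url, l.status
FROM access_log l
JOIN urls u ON u.id = l.url_id;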
Hi!
On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
> Hi
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
Also, for those who are on this thread: we are collecting ideas on the wiki:

http://wiki.postgresql.org/wiki/GSoC_2012
And we have the TODO list:

http://wiki.postgresql.org/wiki/Todo
-selena
Selena Deckelmann wrote:
> On Thursday, March 8, 2012 at 11:40 AM, Stefan Keller wrote:
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>
> Also, for those who are on this thread: we are collecting ideas on the wiki:
>
> http://wiki.postgresql.org/wiki/GSoC_2012

I have added Foreign Data Wrappers. I think that would be a good idea for
anybody who wants a clearly defined project - the API is (currently
changing, but) documented, it's a good opportunity to learn to hack
PostgreSQL server code, and you can leverage your knowledge of other
software.

Yours,
Laurenz Albe
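For a feel of what the FDW interface buys you, this is roughly how the
stock file_fdw module (a contrib extension since 9.1) is used; the file
path and column list are made up for the example:

-- Query a CSV file as if it were a table, via the contrib file_fdw
-- wrapper. The path and columns are illustrative.
CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE apache_log (
    ts     timestamp,
    url    text,
    status integer
) SERVER csv_files
  OPTIONS (filename '/var/log/apache/access.csv', format 'csv');

-- Plain SQL now works against the file:
SELECT count(*) FROM apache_log WHERE status = 404;

A student project would implement the same callbacks for some other data
source, which is exactly the "leverage your knowledge of other software"
angle.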
Excuse me if what I say below is nonsensical, for I haven't read much
about compression techniques; these ramblings are just common sense.

I think the debate about the level (row, page, file) of compression arises
when we stick strictly to the axiom of compression that all the
information needed for decompression must be present in the same
compressed unit.

Can't we relax this rule a bit and separate out the compression hints into
a separate file, the way we keep table data in one file and the positional
references (indexes) in another? Would that not eliminate the dilemma
about the boundaries of compression?

Perhaps a periodic, autovacuum-like compressor daemon could take up the
job of recompression, keeping the compression hints updated to match the
latest data present in the file/page at that instant.
Regards,
Samba
On 03/08/12 12:01 PM, Andy Colson wrote:
>
> 2) Better partitioning support. Something much more automatic.

That would be really high on our list - and something that can handle
adding/dropping partitions while there are concurrent transactions
involving the partitioned table.

Also a planner that can cope with optimizing prepared statements where the
partitioning variable is a passed parameter.

--
john r pierce                            N 37, W 122
santa cruz ca                            mid-left coast
+1 to seamless partitioning. The idea of having a student work on this
seems a bit scary, but what seems scary to me may be a piece of cake for a
talented kid :-)

Kiriakos
http://www.mockbites.com

On Mar 13, 2012, at 3:07 PM, John R Pierce wrote:

> On 03/08/12 12:01 PM, Andy Colson wrote:
>>
>> 2) Better partitioning support. Something much more automatic.
>
> That would be really high on our list - and something that can handle
> adding/dropping partitions while there are concurrent transactions
> involving the partitioned table.
>
> Also a planner that can cope with optimizing prepared statements where
> the partitioning variable is a passed parameter.
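For context on why "much more automatic" resonates: partitioning as of
9.1 is assembled by hand from inheritance, CHECK constraints, and a
routing trigger, roughly as sketched below (table, function, and partition
names are illustrative):

-- The manual, inheritance-based partitioning boilerplate that more
-- automatic support would fold away. Names are illustrative.
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);

-- One child table per range, carved out with a CHECK constraint:
CREATE TABLE measurements_2012_03 (
    CHECK (logdate >= DATE '2012-03-01' AND logdate < DATE '2012-04-01')
) INHERITS (measurements);

-- A trigger routes inserts on the parent to the right child:
CREATE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2012-03-01'
       AND NEW.logdate < DATE '2012-04-01' THEN
        INSERT INTO measurements_2012_03 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row went to a child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();

Adding or dropping a month means new DDL plus an updated trigger body,
which is what makes concurrent add/drop and parameterized plans awkward
today.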
Stefan Keller, 08.03.2012 20:40:
> Hi
>
> I do have a student who is interested in participating in the Google
> Summer of Code (GSoC) 2012.
> Now I have the "burden" to look for a cool project... Any ideas?
>
> -Stefan

What about an extension to the CREATE TRIGGER syntax that combines trigger
definition and function definition in a single statement?

Something like:

CREATE TRIGGER my_trg BEFORE UPDATE ON some_table
FOR EACH ROW EXECUTE
DO
$body$
BEGIN
  ... here goes the function code ...
END;
$body$
LANGUAGE plpgsql;

which would create both objects (trigger and trigger function) at the same
time in the background.

The CASCADE option of DROP TRIGGER could be enhanced to include the
corresponding function in the DROP as well.

This would make the syntax a bit easier to handle for those cases where a
1:1 relationship exists between triggers and functions, but would still
allow the flexibility to re-use trigger functions in more than one trigger.

Regards
Thomas
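For comparison, the two-statement form that the proposal would collapse
looks like this today (trigger and table names follow Thomas's example;
the function name my_trg_fn is invented):

-- Today's equivalent: a separately created trigger function plus the
-- trigger that references it.
CREATE FUNCTION my_trg_fn() RETURNS trigger AS
$body$
BEGIN
    -- ... here goes the function code ...
    RETURN NEW;
END;
$body$
LANGUAGE plpgsql;

CREATE TRIGGER my_trg
    BEFORE UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE my_trg_fn();

-- Under the proposal, DROP TRIGGER my_trg ON some_table CASCADE would
-- also remove the now-orphaned my_trg_fn().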
Hi all,

2012/3/14 Thomas Kellerer <spam_eater@gmx.net>:
> Stefan Keller, 08.03.2012 20:40:
>
>> Hi
>>
>> I do have a student who is interested in participating in the Google
>> Summer of Code (GSoC) 2012.
>> Now I have the "burden" to look for a cool project... Any ideas?
>>
>> -Stefan
>
> What about an extension to the CREATE TRIGGER syntax that combines
> trigger definition and function definition in a single statement?
>
> Something like:
>
> CREATE TRIGGER my_trg BEFORE UPDATE ON some_table
> FOR EACH ROW EXECUTE
> DO
> $body$
> BEGIN
>   ... here goes the function code ...
> END;
> $body$
> LANGUAGE plpgsql;
>
> which would create both objects (trigger and trigger function) at the
> same time in the background.
>
> The CASCADE option of DROP TRIGGER could be enhanced to include the
> corresponding function in the DROP as well.
>
> This would make the syntax a bit easier to handle for those cases where
> a 1:1 relationship exists between triggers and functions, but would
> still allow the flexibility to re-use trigger functions in more than
> one trigger.
>
> Regards
> Thomas

Thanks to all who responded here. There are now two students here at our
university, and it seems that they prefer another open source project
(which I support too).

Let's take some of these good ideas to the Postgres wiki (if there is an
idea page there :->)

-Stefan