Thread: Google SoC--Idea Request

Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

15 April 2006, 16:05:24

Hey everyone,

I know we started a discussion a month or so ago regarding ideas for
SoC projects.  However, after reading through the thread, I didn't see
us nail down any actual items.

As such, we need to quickly put together a list of oh, 15-20 midlevel
project ideas.  I'm sure we can pull some off the TODO list, but we
should also look at project ideas for porting some of the most used
third-party OSS software to PostgreSQL too (portals, CMS systems,
accounting systems, etc.).

All ideas welcome!

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

"Dave Page"

Date:

15 April 2006, 17:25:06

-----Original Message-----
From: "Jonah H. Harris"<jonah.harris@gmail.com>
Sent: 15/04/06 20:06:27
To: "Pgsql Hackers"<pgsql-hackers@postgresql.org>
Subject: [HACKERS] Google SoC--Idea Request

> As such, we need to quickly put together a list of oh, 15-20 midlevel
> project ideas.

There's a couple of listen/notify todos iirc that would be nice to get done - one to allow a message to be sent with
the notify, and one to move from a table based design to shared mem/disk.

Regards, Dave

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: Google SoC--Idea Request

From

Neil Conway

Date:

15 April 2006, 20:02:48

On Sat, 2006-04-15 at 21:24 +0100, Dave Page wrote:
> one to allow a message to be sent with the notify, and one to move
> from a table based design to shared mem/disk.

Doing the latter is a precondition for implementing the former in a
reasonable way, I believe.

BTW, these two web log entries summarizing Mono and Mozilla's
experiences with SoC might make interesting reading:

http://weblogs.mozillazine.org/gerv/archives/2006/03/summer_of_code_six_months_on.html
http://tirania.org/blog/archive/2006/Apr-13.html

> we should also look at project ideas for porting some of the most used
> third-party OSS software to PostgreSQL too (portals, CMS systems,
> accounting systems, etc.).

Given the above, I would be wary of such projects bit-rotting. If the
upstream project hasn't bothered to add PostgreSQL support, there might
be a good reason why: writing truly database-agnostic applications is
not always easy (or even desirable).

-Neil

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

15 April 2006, 20:25:36

On 4/15/06, Neil Conway <neilc@samurai.com> wrote:
> Doing the latter is a precondition for implementing the former in a
> reasonable way, I believe.

> BTW, these two web log entries summarizing Mono and Mozilla's
> experiences with SoC might make interesting reading:

Thanks for the reading material.  I don't think our project is exactly
the same, but it's good information to keep in mind.

> Given the above, I would be wary of such projects bit-rotting. If the
> upstream project hasn't bothered to add PostgreSQL support, there might
> be a good reason why: writing truly database-agnostic applications is
> not always easy (or even desirable).

This isn't always the case.  In a lot of cases, the developers just
wanted to take the easy route and used MySQL... they have a lot of
people asking for PostgreSQL support but they don't have the expertise
to add it themselves.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Robert Treat

Date:

15 April 2006, 23:44:37

On Saturday 15 April 2006 19:25, Jonah H. Harris wrote:
> On 4/15/06, Neil Conway <neilc@samurai.com> wrote:
> > Doing the latter is a precondition for implementing the former in a
> > reasonable way, I believe.
> >
> >
> > BTW, these two web log entries summarizing Mono and Mozilla's
> > experiences with SoC might make interesting reading:
>
> Thanks for the reading material.  I don't think our project is exactly
> the same, but it's good information to keep in mind.
>

Agreed. I sent some ideas to Josh, was thinking he might be posting a list 
soon. I kept it aimed at a few ideas I have had/seen that need an initial 
push to get going but beyond that could be (and likely would be) community 
maintained.  Example?  Extending the build farm code to test external pl
langs or database drivers or patches other modules.  We've talked about it, 
and if someone had the time to make the push, I believe this would be 
community maintained going forward. 

> > Given the above, I would be wary of such projects bit-rotting. If the
> > upstream project hasn't bothered to add PostgreSQL support, there might
> > be a good reason why: writing truly database-agnostic applications is
> > not always easy (or even desirable).
>
> This isn't always the case.  In a lot of cases, the developers just
> wanted to take the easy route and used MySQL... they have a lot of
> people asking for PostgreSQL support but they don't have the expertise
> to add it themselves.
>

I think more important is that the time needed to do an initial port is
often much greater than the time needed to maintain one.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Google SoC--Idea Request

From

"Dave Page"

Date:

16 April 2006, 05:29:56

-----Original Message-----
From: "Jonah H. Harris"<jonah.harris@gmail.com>
Sent: 15/04/06 20:06:27
To: "Pgsql Hackers"<pgsql-hackers@postgresql.org>
Subject: [HACKERS] Google SoC--Idea Request

> As such, we need to quickly put together a list of oh, 15-20 midlevel
> project ideas.

Another thought - a nice C++ project, requiring minimal previous knowledge of existing code would be to add a query
builder to pgAdmin.

Regards, Dave

Re: Google SoC--Idea Request

From

Ned Lilly

Date:

17 April 2006, 10:32:46

OpenMFG has done some work on getting PostgreSQL working with the Drupal CMS and the Mantis bugtracker (and also
integrating those two, btw).  We're in contact with the respective projects about getting our patches worked in, but if
anyone's keeping a tally, just wanted you to be aware.
 

Regards,
Ned


Jonah H. Harris wrote:
> Hey everyone,
> 
> I know we started a discussion a month or so ago regarding ideas for
> SoC projects.  However, after reading through the thread, I didn't see
> us nail down any actual items.
> 
> As such, we need to quickly put together a list of oh, 15-20 midlevel
> project ideas.  I'm sure we can pull some off the TODO list, but we
> should also look at project ideas for porting some of the most used
> third-party OSS software to PostgreSQL too (portals, CMS systems,
> accounting systems, etc.).
> 
> All ideas welcome!
> 
> --
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> 732.331.1324

Re: Google SoC--Idea Request

From

Stephen Frost

Date:

17 April 2006, 11:34:20

* Jonah H. Harris (jonah.harris@gmail.com) wrote:
> I know we started a discussion a month or so ago regarding ideas for
> SoC projects.  However, after reading through the thread, I didn't see
> us nail down any actual items.

I got an email already for a good idea, actually, which is to work on
having pg_hba.conf modifiable from SQL.  The only problem with that is
that it really needs to be done in an acceptable way which requires
probably as much design work as actual programming.  Another idea along
those same lines would be having .k5login-style support for Kerberos.
We'd need a conf-flag for that for backwards compatibility (once the
.k5login-style support exists we should clean up our Kerberos
credentials matching to, for example, not accept 'sfrost/root' for
'sfrost' or 'sfrost@ABC.COM' for 'sfrost@XYZ.com').

It'd also be nice to support SASL, and better hashes than md5.
Thanks,
    Stephen

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

17 April 2006, 17:55:04

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
> All ideas welcome!

I know it's not directly PostgreSQL related, but I'd love to see the
dbt* code improved. Items on my wish-list:

- make it easy to run the test framework and clients on a separate machine from the database server
- keep results in a database
- provide a front-end to allow users to schedule tests in a queue
- add support for windows, at least for the database (theoretically possible to run that way now, but you have to do
everything by hand)
 

Another idea: afaik, spikesource is still offering a bounty for
improvements to OSS test suites, something that'd fit well with SoC.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

Mark Wong

Date:

18 April 2006, 15:31:08

Jim C. Nasby wrote:
> On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
>> All ideas welcome!
> 
> I know it's not directly PostgreSQL related, but I'd love to see the
> dbt* code improved. Items on my wish-list:
> 
> - make it easy to run the test framework and clients on a seperate
>   machine from the database server
> - keep results in a database
> - provide a front-end to allow users to schedule tests in a queue
> - add support for windows, at least for the database (theoretically
>   possible to run that way now, but you have to do everything by hand)
> 
> Another idea: afaik, spikesource is still offering a bounty for
> improvements to OSS test suites, something that'd fit well with SoC.

I second this. :)  There are also the TPC-App (Java) fair-use 
implementation that I've started and the TPC-E (next gen OLTP) that I 
would like to start.

Mark

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

18 April 2006, 16:11:56

On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:
> Jim C. Nasby wrote:
> >On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
> >>All ideas welcome!
> >
> >I know it's not directly PostgreSQL related, but I'd love to see the
> >dbt* code improved. Items on my wish-list:
> >
> >- make it easy to run the test framework and clients on a seperate
> >  machine from the database server
> >- keep results in a database
> >- provide a front-end to allow users to schedule tests in a queue
> >- add support for windows, at least for the database (theoretically
> >  possible to run that way now, but you have to do everything by hand)
> >
> >Another idea: afaik, spikesource is still offering a bounty for
> >improvements to OSS test suites, something that'd fit well with SoC.
> 
> I second this. :)  There are also the TPC-App (Java) fair-use 
> implementation that I've started and the TPC-E (next gen OLTP) that I 
> would like to start.

Maybe before starting on TPC-E it makes sense to try and get a common
framework for all the different tests built? AFAIK most of the
benchmarks all use a fairly standard client-server infrastructure, so we
should hopefully be able to share that between the different tests...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

18 April 2006, 17:10:23

On 4/18/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:
> > Jim C. Nasby wrote:
> > >On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
> > >>All ideas welcome!
> > >
> > >I know it's not directly PostgreSQL related, but I'd love to see the
> > >dbt* code improved. Items on my wish-list:
> > >
> > >- make it easy to run the test framework and clients on a seperate
> > >  machine from the database server
> > >- keep results in a database
> > >- provide a front-end to allow users to schedule tests in a queue
> > >- add support for windows, at least for the database (theoretically
> > >  possible to run that way now, but you have to do everything by hand)
> > >
> > >Another idea: afaik, spikesource is still offering a bounty for
> > >improvements to OSS test suites, something that'd fit well with SoC.
> >
> > I second this. :)  There are also the TPC-App (Java) fair-use
> > implementation that I've started and the TPC-E (next gen OLTP) that I
> > would like to start.
>
> Maybe before starting on TPC-E it makes sense to try and get a common
> framework for all the different tests built? AFAIK most of the
> benchmarks all use a fairly standard client-server infrastructure, so we
> should hopefully be able to share that between the different tests...

I agree with Jim.  A framework would really help out here.  All of the
tests are basically the same and would benefit from a framework.

However, Mark, do you think Java is a reliable benchmarking platform?
At EnterpriseDB, we've tried several Java benchmarks and could never
get as repeatable or reliable of a benchmark as DBT2 gives you.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Mark Wong

Date:

18 April 2006, 17:59:29

Jonah H. Harris wrote:
> On 4/18/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
>> On Tue, Apr 18, 2006 at 11:27:40AM -0700, Mark Wong wrote:
>>> Jim C. Nasby wrote:
>>>> On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
>>>>> All ideas welcome!
>>>> I know it's not directly PostgreSQL related, but I'd love to see the
>>>> dbt* code improved. Items on my wish-list:
>>>>
>>>> - make it easy to run the test framework and clients on a seperate
>>>>  machine from the database server
>>>> - keep results in a database
>>>> - provide a front-end to allow users to schedule tests in a queue
>>>> - add support for windows, at least for the database (theoretically
>>>>  possible to run that way now, but you have to do everything by hand)
>>>>
>>>> Another idea: afaik, spikesource is still offering a bounty for
>>>> improvements to OSS test suites, something that'd fit well with SoC.
>>> I second this. :)  There are also the TPC-App (Java) fair-use
>>> implementation that I've started and the TPC-E (next gen OLTP) that I
>>> would like to start.
>> Maybe before starting on TPC-E it makes sense to try and get a common
>> framework for all the different tests built? AFAIK most of the
>> benchmarks all use a fairly standard client-server infrastructure, so we
>> should hopefully be able to share that between the different tests...
> 
> I agree with Jim.  A framework would really help out here.  All of the
> tests are basically the same and would benefit from a framework.

This has crossed my mind before.  I haven't been able to come up with 
something that I've felt good about on my own though.

> However, Mark, do you think Java is a reliable benchmarking platform? 
> At EnterpriseDB, we've tried several Java benchmarks and could never
> get as repeatable or reliable of a benchmark as DBT2 gives you.

I don't have much experience here yet.  I've only got a portion of the 
TPC-App implemented, although probably enough now to see how repeatable 
it is thus far.  Do you want to give my DBT4 kit a shot? :)  I'm curious
as to what platforms you've tried Java on as I've heard the Linux
implementations aren't as good as their Windows counterparts.  I'm not 
sure how true that is today though.

Mark

Re: Google SoC--Idea Request

From

John DeSoi

Date:

19 April 2006, 12:36:09

Proposed item: Improve PL/PHP support, especially installation on
non-Linux platforms. PL/PHP does not currently work on OS X (not sure
about Windows, but I doubt it).

Alvaro indicated he would be willing to provide direction on this  
with testing support from me. He also said there are several other  
possible PL/PHP issues that would warrant a SoC project.




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

19 April 2006, 13:09:11

On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:
> Alvaro indicated he would be willing to provide direction on this
> with testing support from me. He also said there are several other
> possible PL/PHP issues that would warrant a SoC project.

Cool... let's get 'em all listed here so we can move forward.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

"Joshua D. Drake"

Date:

19 April 2006, 13:10:13

John DeSoi wrote:
> Proposed item: Improve PL/PHP support, especially installation on 
> non-Linux platforms. PL/PHP does not currently work on OS X (not sure 
> about Windows, but I doubt it).

It definitely does NOT work on Windows. MacOSX is just a matter of us 
having some time.

> Alvaro indicated he would be willing to provide direction on this with 
> testing support from me. He also said there are several other possible 
> PL/PHP issues that would warrant a SoC project.

Well my number one issue is the build process which needs to be cleaned 
up but there are other more technical issues to be resolved as well.

Joshua D. Drake

> 
> 
> 
> 
> John DeSoi, Ph.D.
> http://pgedit.com/
> Power Tools for PostgreSQL
> 


-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Re: Google SoC--Idea Request

From

Alvaro Herrera

Date:

19 April 2006, 13:12:55

Joshua D. Drake wrote:
> John DeSoi wrote:
> >Proposed item: Improve PL/PHP support, especially installation on 
> >non-Linux platforms. PL/PHP does not currently work on OS X (not sure 
> >about Windows, but I doubt it).
> 
> It definitely does NOT work on Windows. MacOSX is just a matter of us 
> having some time.
> 
> >Alvaro indicated he would be willing to provide direction on this with 
> >testing support from me. He also said there are several other possible 
> >PL/PHP issues that would warrant a SoC project.
> 
> Well my number one issue is the build process which needs to be cleaned 
> up but there are other more technical issues to be resolved as well.

Yeah, there are also a number of possible improvements documented as
tickets in the Trac site and others that currently exist only as very
vague noise in my head.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Google SoC--Idea Request

From

Robert Treat

Date:

20 April 2006, 09:51:37

On Wednesday 19 April 2006 12:09, Jonah H. Harris wrote:
> On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:
> > Alvaro indicated he would be willing to provide direction on this
> > with testing support from me. He also said there are several other
> > possible PL/PHP issues that would warrant a SoC project.
>
> Cool... let's get 'em all listed here so we can move forward.
>

I think Martijn van Oosterhout's nearby email on Coverity bug reports might
make a good SoC project, but should it also be added to the TODO list?

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Google SoC--Idea Request

From

Martijn van Oosterhout

Date:

20 April 2006, 11:05:14

On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:
> On Wednesday 19 April 2006 12:09, Jonah H. Harris wrote:
> > On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:
> > > Alvaro indicated he would be willing to provide direction on this
> > > with testing support from me. He also said there are several other
> > > possible PL/PHP issues that would warrant a SoC project.
> >
> > Cool... let's get 'em all listed here so we can move forward.
> >
>
> I think Martin Oosterhout's nearby email on coverity bug reports might make a
> good SoC project, but should it also be added to the TODO list?

Nice idea, though it would be much more useful if the reports could be
exported en-masse. There's an export function but it only exports the
user comments, not the error itself. So unless people signup there's no
easy way to get the info to people. :(

In any case, after you weed out the false-positives and exclude ECPG
you're only talking about less than 50 issues that may need to be
addressed. Hardly a project that will take any amount of time.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Google SoC--Idea Request

From

Tom Lane

Date:

20 April 2006, 12:04:47

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:
>> I think Martin Oosterhout's nearby email on coverity bug reports might make a
>> good SoC project, but should it also be added to the TODO list?
> ...
> In any case, after you weed out the false-positives and exclude ECPG
> you're only talking about less than 50 issues that may need to be
> addressed. Hardly a project that will take any amount of time.

Nor one we'd be willing to wait till the summer to address, if any of
the bugs are real.
        regards, tom lane

Re: Google SoC--Idea Request

From

Martijn van Oosterhout

Date:

20 April 2006, 12:48:14

On Thu, Apr 20, 2006 at 11:04:31AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Thu, Apr 20, 2006 at 08:51:25AM -0400, Robert Treat wrote:
> >> I think Martin Oosterhout's nearby email on coverity bug reports might make a
> >> good SoC project, but should it also be added to the TODO list?
> > ...
> > In any case, after you weed out the false-positives and exclude ECPG
> > you're only talking about less than 50 issues that may need to be
> > addressed. Hardly a project that will take any amount of time.
>
> Nor one we'd be willing to wait till the summer to address, if any of
> the bugs are real.

Most of the stuff remaining is memory leaks in the src/bin directories,
and ECPG. The memory leaks are not important there (initdb leaks like a
sieve in many places).

About the only thing in the backend I found interesting was this:

src/backend/utils/hash/dynahash.c function hash_create

The numbers are line numbers. Somewhat squished version, hope I didn't
miss anything.

185  if( flags & HASH_SHARED_MEM) {
193      hashp->hcxt = NULL;
197      if (flags & HASH_ATTACH)
198           return hashp;
199  }
256  if (!init_htab(hashp, nelem))
257  {
258      hash_destroy(hashp);

hash_destroy dereferences hashp->hcxt. I don't see anything in
init_htab that special-cases shared memory hashes. The only way this
could be avoided is if HASH_SHARED_MEM was always combined with
HASH_ATTACH. But if so, why the test?
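
A toy model of that path, for illustration only: the Toy* names and flag values are made up, and this is not the dynahash code. It only mirrors the control flow in the excerpt above, and reports the unsafe cleanup instead of actually dereferencing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for HASH_SHARED_MEM and HASH_ATTACH. */
#define TOY_SHARED_MEM 0x1
#define TOY_ATTACH     0x2

typedef struct ToyContext { int live_allocs; } ToyContext;
typedef struct ToyHash    { ToyContext *hcxt; } ToyHash;

typedef enum
{
    TOY_OK,
    TOY_INIT_FAILED_CLEAN,   /* cleanup ran against a valid context */
    TOY_INIT_FAILED_UNSAFE   /* cleanup would have dereferenced NULL */
} ToyResult;

/* Models hash_destroy: it needs hashp->hcxt to release the table. */
static bool
toy_destroy(ToyHash *hashp)
{
    assert(hashp != NULL);
    if (hashp->hcxt == NULL)
        return false;        /* the real code would crash here */
    hashp->hcxt->live_allocs = 0;
    return true;
}

/* Models the squished hash_create flow; init_ok stands in for init_htab(). */
ToyResult
toy_create(ToyHash *hashp, ToyContext *cxt, int flags, bool init_ok)
{
    assert(hashp != NULL && cxt != NULL);
    hashp->hcxt = cxt;

    if (flags & TOY_SHARED_MEM)
    {
        hashp->hcxt = NULL;          /* line 193 in the excerpt */
        if (flags & TOY_ATTACH)
            return TOY_OK;           /* lines 197-198: early return */
    }

    if (!init_ok)                    /* line 256: init_htab failed */
        return toy_destroy(hashp) ? TOY_INIT_FAILED_CLEAN
                                  : TOY_INIT_FAILED_UNSAFE;
    return TOY_OK;
}
```

In this model only the combination of the shared-memory flag without attach, plus a failing init, reaches the unsafe cleanup; a non-shared table cleans up normally and an attach returns before init is attempted.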

The only other thing we could do, if we were prepared to annotate the
source, is maybe teach it about our locking stuff and have it check
that. But I don't think that's suitable for mainline, more someone's
private tree...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Google SoC--Idea Request

From

Tom Lane

Date:

20 April 2006, 12:56:38

Martijn van Oosterhout <kleptog@svana.org> writes:
> About the only thing in the backend I found interesting was this:
> src/backend/utils/hash/dynahash.c function hash_create

I wonder if we shouldn't just remove the hash_destroy calls in
hash_create's failure paths.  hash_destroy is explicitly not gonna
work on a shared-memory hashtable, and in all other cases I'd expect
that any already-allocated table structure will be in a palloc context
that will get cleaned up during error recovery.
        regards, tom lane

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

20 April 2006, 15:01:41

Another idea; add the ability for buildfarm machines to do a pgbench run
to stress-test the code. Such a test would probably have found the
windows pgbench issue I reported some time ago.

This would have to be optional, as not all buildfarm machines/owners
would tolerate the benchmark.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

Martijn van Oosterhout

Date:

20 April 2006, 18:13:08

On Sat, Apr 15, 2006 at 03:05:20PM -0400, Jonah H. Harris wrote:
> Hey everyone,
>
> I know we started a discussion a month or so ago regarding ideas for
> SoC projects.  However, after reading through the thread, I didn't see
> us nail down any actual items.

Here's an idea: Get the ECPG test programs into a state that they can
be integrated into the regression tests.

There are programs already but you can't easily run them, no schema...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Google SoC--Idea Request

From

Christopher Kings-Lynne

Date:

20 April 2006, 22:10:03

> I think Martin Oosterhout's nearby email on coverity bug reports might make a 
> good SoC project, but should it also be added to the TODO list? 

I may as well put up phpPgAdmin for it.  We have plenty of projects 
available in phpPgAdmin...

Chris

Re: Google SoC--Idea Request

From

Andreas Pflug

Date:

21 April 2006, 05:28:15

Christopher Kings-Lynne wrote:
>> I think Martin Oosterhout's nearby email on coverity bug reports might 
>> make a good SoC project, but should it also be added to the TODO list? 
> 
> 
> I may as well put up phpPgAdmin for it.  We have plenty of projects 
> available in phpPgAdmin...

Same with pgAdmin3.

Regards,
Andreas

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

21 April 2006, 15:11:52

On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:
> Christopher Kings-Lynne wrote:
> >>I think Martin Oosterhout's nearby email on coverity bug reports might 
> >>make a good SoC project, but should it also be added to the TODO list? 
> >
> >
> >I may as well put up phpPgAdmin for it.  We have plenty of projects 
> >available in phpPgAdmin...
> 
> Same with pgAdmin3.

Is there a list of specific projects? I'm pretty sure we can't just say
"work on (php)PgAdmin"...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

21 April 2006, 15:18:12

Robert and I are working on updating it ASAP.

On 4/21/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:
> > Christopher Kings-Lynne wrote:
> > >>I think Martin Oosterhout's nearby email on coverity bug reports might
> > >>make a good SoC project, but should it also be added to the TODO list?
> > >
> > >
> > >I may as well put up phpPgAdmin for it.  We have plenty of projects
> > >available in phpPgAdmin...
> >
> > Same with pgAdmin3.
>
> Is there a list of specific projects? I'm pretty sure we can't just say
> "work on (pgp)PgAdmin...
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Robert Treat

Date:

21 April 2006, 18:48:48

On Friday 21 April 2006 14:11, Jim C. Nasby wrote:
> On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:
> > Christopher Kings-Lynne wrote:
> > >>I think Martin Oosterhout's nearby email on coverity bug reports might
> > >>make a good SoC project, but should it also be added to the TODO list?
> > >
> > >I may as well put up phpPgAdmin for it.  We have plenty of projects
> > >available in phpPgAdmin...
> >
> > Same with pgAdmin3.
>
> Is there a list of specific projects? I'm pretty sure we can't just say
> "work on (pgp)PgAdmin...

http://www.postgresql.org/developer/summerofcode

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

21 April 2006, 19:35:10

On Fri, Apr 21, 2006 at 05:48:33PM -0400, Robert Treat wrote:
> On Friday 21 April 2006 14:11, Jim C. Nasby wrote:
> > On Fri, Apr 21, 2006 at 10:27:48AM +0200, Andreas Pflug wrote:
> > > Christopher Kings-Lynne wrote:
> > > >>I think Martin Oosterhout's nearby email on coverity bug reports might
> > > >>make a good SoC project, but should it also be added to the TODO list?
> > > >
> > > >I may as well put up phpPgAdmin for it.  We have plenty of projects
> > > >available in phpPgAdmin...
> > >
> > > Same with pgAdmin3.
> >
> > Is there a list of specific projects? I'm pretty sure we can't just say
> > "work on (pgp)PgAdmin...
> 
> http://www.postgresql.org/developer/summerofcode

Want to replace

<li><strong>Many TODO Items</strong>A number of the items on our TODO
list have been marked as good projects for beginners whos are new to the
PostgreSQL code. Items on this list have the advantage of already having
general community agreement that the feature is desireable. These items
should also have some general discussion available in the mailing list
archives to help get you started. You can find these items on the <a
href="http://wwwmaster.postgresql.org/docs/faqs.TODO.html">TODO</a>
list, they will be marked with apercent sign (%).
</li>

with

<li><strong>Many TODO Items</strong>: A number of the items on our TODO
list have been marked as good projects for beginners who are new to the
PostgreSQL code. Items on this list have the advantage of already having
general community agreement that the feature is desireable. These items
should also have some general discussion available in the mailing list
archives to help get you started. You can find these items on the <a
href="http://wwwmaster.postgresql.org/docs/faqs.TODO.html">TODO</a>
list, they will be marked with apercent sign (%).
</li>

?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

Andreas Pflug

Date:

21 April 2006, 20:12:30

Jim C. Nasby wrote:
>> Same with pgAdmin3.
>
> Is there a list of specific projects? I'm pretty sure we can't just say
> "work on (php)PgAdmin"...

Our TODO list has some.

Regards,
Andreas

Re: Google SoC--Idea Request

From

Alvaro Herrera

Date:

23 April 2006, 18:45:33

I hope I'm not too late.

Jonah H. Harris wrote:
> On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:
> > Alvaro indicated he would be willing to provide direction on this
> > with testing support from me. He also said there are several other
> > possible PL/PHP issues that would warrant a SoC project.
> 
> Cool... let's get 'em all listed here so we can move forward.

The following is all PL/php related, in no particular order:

1. Add support for IN/OUT parameters, and named parameters.  This should
be easy to do; the majority of the needed infrastructure in PL/php is there
already.  It only needs a bit more love.

2. Clean up memory usage.  Both compilation and execution of a function
should happen on separate, maybe temporary, memory contexts; and provide
adequate cleanup for both (for example when a function is recompiled).

3. Enable it to build separate from the Apache SAPI.

4. Allow huge resultsets to be processed by providing an option to
transparently use a cursor to fetch results partially, when spi_exec()
is called.

5. Clean up the plphp_proc_desc struct.  This involves making sure we
store all the info we need to know about a function; no more, no less.
(I think currently we store things we don't need, and we don't store
some things it would be useful to know).
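
Item 4 can be illustrated outside PL/php. The following is a minimal Python
sketch that uses the standard sqlite3 module instead of SPI; the helper name
fetch_in_batches and the batch size are hypothetical, chosen only for
illustration. It shows the general idea of pulling a large result set through
a cursor in fixed-size chunks instead of materializing it all at once.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_batches(conn: sqlite3.Connection, query: str,
                     batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield the rows of `query` in lists of at most `batch_size` rows."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        rows = cur.fetchmany(batch_size)  # only one chunk is held at a time
        if not rows:
            break
        yield rows


# Illustration only: an in-memory table with 2500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])

batch_sizes = [len(b) for b in fetch_in_batches(conn, "SELECT id FROM t", 1000)]
```

The caller iterates over the batches as usual; whether such chunking should be
the default or an option is exactly the design question the item leaves open.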

I don't think any of these would warrant a SoC by itself.  Maybe the
whole bunch could, however.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

23 April 2006, 19:29:28

Cool... will get them added.

On 4/23/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I hope I'm not too late.
>
> Jonah H. Harris wrote:
> > On 4/19/06, John DeSoi <desoi@pgedit.com> wrote:
> > > Alvaro indicated he would be willing to provide direction on this
> > > with testing support from me. He also said there are several other
> > > possible PL/PHP issues that would warrant a SoC project.
> >
> > Cool... let's get 'em all listed here so we can move forward.
>
> The following is all PL/php related, in no particular order:
>
> 1. Add support for IN/OUT parameters, and named parameters.  This should
> be easy to do; the majority of the needed infrastructure in PL/php is there
> already.  It only needs a bit more love.
>
> 2. Clean up memory usage.  Both compilation and execution of a function
> should happen on separate, maybe temporary, memory contexts; and provide
> adequate cleanup for both (for example when a function is recompiled).
>
> 3. Enable it to build separate from the Apache SAPI.
>
> 4. Allow huge resultsets to be processed by providing an option to
> transparently use a cursor to fetch results partially, when spi_exec()
> is called.
>
> 5. Clean up the plphp_proc_desc struct.  This involves making sure we
> store all the info we need to know about a function; no more, no less.
> (I think currently we store things we don't need, and we don't store
> some things it would be useful to know).
>
>
> I don't think any of these would warrant a SoC by itself.  Maybe the
> whole bunch could, however.
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
>


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

24 April 2006, 14:45:20

Where do we stand with getting much more reasonable default values in
postgresql.conf? Maybe that should be a SoC project, or is it too small?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

Tom Lane

Date:

24 April 2006, 23:36:48

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Where do we stand with getting much more reasonable default values in
> postgresql.conf? Maybe that should be a SoC project, or is it too small?

Define "much more reasonable".

I doubt this is SoC material, simply because the issues have little to
do with coding and a lot to do with persuading people to drop default
support for old platforms.  Which is not something a student is likely
to succeed at.
        regards, tom lane

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

24 April 2006, 23:56:21

On 4/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I doubt this is SoC material, simply because the issues have little to
> do with coding and a lot to do with persuading people to drop default
> support for old platforms.  Which is not something a student is likely
> to succeed at.

While the student could do some benchmarking on relatively new
hardware and make suggestions, I agree with Tom.  Having to keep
support for older platforms doesn't leave much flexibility to change
the defaults.

I just don't see enough work here to warrant a SoC project.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Tom Lane

Date:

25 April 2006, 00:06:07

"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> While the student could do some benchmarking on relatively new
> hardware and make suggestions, I agree with Tom.  Having to keep
> support for older platforms doesn't leave much flexibility to change
> the defaults.

Another point here is that the defaults *are* reasonable for development
and for small installations; the people who are complaining are the ones
who expect to run terabyte databases without any tuning.  (I exaggerate
perhaps, but the point is valid.)

We've talked more than once about offering multiple alternative
starting-point postgresql.conf files to give people an idea of what to
do for small/medium/large installations.  MySQL have done that for years
and it doesn't seem that users are unable to cope with the concept.
But doing this is (a) mostly a matter of testing and documenting, not
coding and (b) probably too small for a SoC project anyway.
        regards, tom lane

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

25 April 2006, 00:12:46

On 4/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We've talked more than once about offering multiple alternative
> starting-point postgresql.conf files to give people an idea of what to
> do for small/medium/large installations.  MySQL have done that for years
> and it doesn't seem that users are unable to cope with the concept.
> But doing this is (a) mostly a matter of testing and documenting, not
> coding and (b) probably too small for a SoC project anyway.

Yeah, it would be nice to offer a small/med/large config file, but
there are also other considerations that affect PostgreSQL and not
MySQL.  An example is the system-wide shared memory maximum... RedHat
defaults to 32M, SuSE to 32M?, and OSX to 4M (or something crazy like
that).  So even if we give out a med/large config file, they won't
work for most people who have default Linux installs.  Tuning
PostgreSQL isn't all that hard, but it may be nice to give people a
starting point.
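
To make the shared memory point concrete, here is a rough Python sketch. It
assumes the default 8 kB block size and treats everything except the buffer
pool as a caller-supplied overhead estimate, so it understates the real
requirement; fits_in_shmmax is a hypothetical helper, not something that
exists in PostgreSQL.

```python
BLOCK_SIZE = 8192  # assumed default PostgreSQL block size, in bytes
MB = 1024 * 1024


def fits_in_shmmax(shared_buffers: int, shmmax_bytes: int,
                   overhead_bytes: int = 0) -> bool:
    """True if `shared_buffers` blocks plus the estimated overhead fit
    under the kernel's shared memory limit."""
    return shared_buffers * BLOCK_SIZE + overhead_bytes <= shmmax_bytes


linux_limit = 32 * MB  # the 32M figure mentioned above
osx_limit = 4 * MB     # the 4M figure mentioned above

small_on_linux = fits_in_shmmax(1000, linux_limit)   # about 8 MB of buffers
small_on_osx = fits_in_shmmax(1000, osx_limit)
large_on_linux = fits_in_shmmax(50000, linux_limit)  # about 390 MB of buffers
```

With these assumed numbers, a 1000-buffer setting fits under a 32M limit but
not under a 4M one, and a hypothetical "large" setting of 50000 buffers fits
under neither.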

I don't know, I'm not averse to adding something like the following to
the SoC ideas:

Benchmark PostgreSQL and analyze results to build optimal default
configuration files for medium and large-scale systems.

Of course, the definitions of medium and large vary, as does the
application (OLTP, DSS, etc.); so we'd have to define them.

Thoughts?

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

25 April 2006, 02:00:49

On Mon, Apr 24, 2006 at 11:05:18PM -0400, Tom Lane wrote:
> "Jonah H. Harris" <jonah.harris@gmail.com> writes:
> > While the student could do some benchmarking on relatively new
> > hardware and make suggestions, I agree with Tom.  Having to keep
> > support for older platforms doesn't leave much flexibility to change
> > the defaults.
> 
> Another point here is that the defaults *are* reasonable for development
> and for small installations; the people who are complaining are the ones
> who expect to run terabyte databases without any tuning.  (I exaggerate
> perhaps, but the point is valid.)
> 
> We've talked more than once about offering multiple alternative
> starting-point postgresql.conf files to give people an idea of what to
> do for small/medium/large installations.  MySQL have done that for years
> and it doesn't seem that users are unable to cope with the concept.
> But doing this is (a) mostly a matter of testing and documenting, not
> coding and (b) probably too small for a SoC project anyway.

My recollection was that there was opposition to offering multiple
config files, but that there was a proposal to make initdb smarter about
picking configuration values.

Personally, I agree that multiple config files would be fine. Or a
really fancy solution would be feeding a config option to initdb and
have it generate an appropriate postgresql.conf.
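
As a sketch of what "feeding a config option" could mean, the Python below
renders a settings block from a named profile. Everything here is an
assumption made for illustration: the profile names, the render_conf helper,
and above all the numbers, which are placeholders and not recommendations.

```python
# Hypothetical profiles; the values are placeholders, not tuning advice.
PROFILES = {
    "small": {"shared_buffers": 1000, "max_connections": 40},
    "medium": {"shared_buffers": 20000, "max_connections": 100},
    "large": {"shared_buffers": 100000, "max_connections": 200},
}


def render_conf(profile, overrides=None):
    """Return postgresql.conf-style text for `profile`.

    `overrides` lets the caller replace individual settings, e.g. values
    that were detected at install time.
    """
    settings = dict(PROFILES[profile])  # raises KeyError for unknown profiles
    settings.update(overrides or {})
    lines = [f"# generated for the '{profile}' profile"]
    lines += [f"{name} = {value}" for name, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


small_conf = render_conf("small")
```

The overrides argument is there because any generated file still has to carry
site-specific values alongside the profile's defaults.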
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

25 April 2006, 02:51:06

On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:
> We have already done some initdb tuning improvements for 8.2

Cool, I hadn't looked at this.

> I would have liked to increase max_connections too, but that would have
> caused problems on OSX, apparently. See previous discussion.

Yeah, their defaults really suck.

> Personally I would much rather see a tuning advisor tool in more general
> use than just provide small/medium/large config setting files.

True dat.

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Andrew Dunstan

Date:

25 April 2006, 03:06:31

Jim C. Nasby wrote:

>On Mon, Apr 24, 2006 at 11:05:18PM -0400, Tom Lane wrote:
>>"Jonah H. Harris" <jonah.harris@gmail.com> writes:
>>>While the student could do some benchmarking on relatively new
>>>hardware and make suggestions, I agree with Tom.  Having to keep
>>>support for older platforms doesn't leave much flexibility to change
>>>the defaults.
>>
>>Another point here is that the defaults *are* reasonable for development
>>and for small installations; the people who are complaining are the ones
>>who expect to run terabyte databases without any tuning.  (I exaggerate
>>perhaps, but the point is valid.)
>>
>>We've talked more than once about offering multiple alternative
>>starting-point postgresql.conf files to give people an idea of what to
>>do for small/medium/large installations.  MySQL have done that for years
>>and it doesn't seem that users are unable to cope with the concept.
>>But doing this is (a) mostly a matter of testing and documenting, not
>>coding and (b) probably too small for a SoC project anyway.
>
>My recollection was that there was opposition to offering multiple
>config files, but that there was a proposal to make initdb smarter about
>picking configuration values.
>
>Personally, I agree that multiple config files would be fine. Or a
>really fancy solution would be feeding a config option to initdb and
>have it generate an appropriate postgresql.conf.


We have already done some initdb tuning improvements for 8.2: shared_buffers
now tops out at 4000 instead of 1000, and initdb now sets max_fsm_pages at a
more realistic level (the top is 200,000 instead of the previously hardcoded
20,000).
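
A sketch of the "tops out at" idea in Python, assuming the approach is to try
candidate values from the cap downward and keep the first one that works. The
helper largest_working and the stand-in test are hypothetical; a real probe
would have to check whether a server actually starts with the setting.

```python
from typing import Callable, Iterable, Optional


def largest_working(candidates: Iterable[int],
                    works: Callable[[int], bool]) -> Optional[int]:
    """Return the largest candidate for which `works` is true, else None."""
    for value in sorted(candidates, reverse=True):
        if works(value):
            return value
    return None


# Stand-in for "does a test server start with this many buffers?":
# pretend the machine allows 20 MB of shared memory and blocks are 8 kB.
LIMIT_BYTES = 20 * 1024 * 1024


def starts_with(buffers: int) -> bool:
    return buffers * 8192 <= LIMIT_BYTES


chosen = largest_working([4000, 3000, 2000, 1000, 500, 100], starts_with)
```

Raising the cap only changes the first candidate tried; a machine that cannot
support it still ends up with a smaller value.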

I would have liked to increase max_connections too, but that would have 
caused problems on OSX, apparently. See previous discussion.

Personally I would much rather see a tuning advisor tool in more general 
use than just provide small/medium/large config setting files.


cheers

andrew

Re: Google SoC--Idea Request

From

Tom Lane

Date:

25 April 2006, 03:17:07

"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:
>> Personally I would much rather see a tuning advisor tool in more general
>> use than just provide small/medium/large config setting files.

> True dat.

One thing that has to be figured out before we can go far with this
is the whole question of how much smarts initdb really ought to have.
Since a lot of packagers think that initdb should be run
non-interactively behind the scenes, the obvious solution of "give
initdb a --small/--medium/--large parameter" does not work all that
nicely.  But on the other hand we can't just tell people to drop in
replacement config files when the one in place contains initdb-created
specifics, such as locale settings.

Now that there's a provision for "include" directives in
postgresql.conf, one way to address this would be to split the
config info into multiple physical files, some containing purely
performance-related settings while others consider functionality.
But that seems more like a wart than a solution to me.  I feel that
we've pushed performance-tuning logic into initdb that probably ought
not be there, and we ought to factor it out again.
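
For illustration, the multi-file split could look like the Python sketch
below, which sorts settings into two hypothetical files and emits a main file
that pulls them in with "include" directives. Which settings count as
performance-related, and the file names themselves, are assumptions made here
and not an agreed classification.

```python
# Assumed classification, for illustration only.
PERFORMANCE_KEYS = {"shared_buffers", "max_fsm_pages", "checkpoint_segments"}


def split_settings(settings):
    """Split a settings dict into (performance, everything else)."""
    perf = {k: v for k, v in settings.items() if k in PERFORMANCE_KEYS}
    other = {k: v for k, v in settings.items() if k not in PERFORMANCE_KEYS}
    return perf, other


def render(settings):
    """Render settings as 'name = value' lines in a stable order."""
    return "".join(f"{k} = {v}\n" for k, v in sorted(settings.items()))


def render_main(include_files):
    """Render a main config file consisting only of include directives."""
    return "".join(f"include '{name}'\n" for name in include_files)


perf, other = split_settings(
    {"shared_buffers": 4000, "max_fsm_pages": 200000, "lc_messages": "'C'"}
)
main_conf = render_main(["performance.conf", "site.conf"])
```

A replacement performance.conf could then be dropped in without touching the
initdb-created specifics in the other file, which is the attraction of the
split and also why it feels like a workaround.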
        regards, tom lane

Re: Google SoC--Idea Request

From

"ipig"

Date:

25 April 2006, 03:44:56

Maybe you could develop a graphical interface, like the Fedora Core setup interface for choosing which packages to
install, so that the user can choose a config file and then make small changes to the parameters.
 
----- Original Message ----- 
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Jonah H. Harris" <jonah.harris@gmail.com>
Cc: "Andrew Dunstan" <andrew@dunslane.net>; "Jim C. Nasby" <jnasby@pervasive.com>; "John DeSoi" <desoi@pgedit.com>;
"Pgsql Hackers" <pgsql-hackers@postgresql.org>
 
Sent: Tuesday, April 25, 2006 2:16 PM
Subject: Re: [HACKERS] Google SoC--Idea Request 


> "Jonah H. Harris" <jonah.harris@gmail.com> writes:
>> On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> Personally I would much rather see a tuning advisor tool in more general
>>> use than just provide small/medium/large config setting files.
> 
>> True dat.
> 
> One thing that has to be figured out before we can go far with this
> is the whole question of how much smarts initdb really ought to have.
> Since a lot of packagers think that initdb should be run
> non-interactively behind the scenes, the obvious solution of "give
> initdb a --small/--medium/--large parameter" does not work all that
> nicely.  But on the other hand we can't just tell people to drop in
> replacement config files when the one in place contains initdb-created
> specifics, such as locale settings.
> 
> Now that there's a provision for "include" directives in
> postgresql.conf, one way to address this would be to split the
> config info into multiple physical files, some containing purely
> performance-related settings while others consider functionality.
> But that seems more like a wart than a solution to me.  I feel that
> we've pushed performance-tuning logic into initdb that probably ought
> not be there, and we ought to factor it out again.
> 
> regards, tom lane
> 

Re: Google SoC--Idea Request

From

"Bort, Paul"

Date:

25 April 2006, 09:29:49

> > Personally I would much rather see a tuning advisor tool in more general
> > use than just provide small/medium/large config setting files.
>
> True dat.

Maybe the SoC project here is just such a tuning advisor tool? Something
that can run pgbench repeatedly, try different settings, and compare
results.
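
A minimal Python sketch of that loop, assuming the benchmark is passed in as
a function that takes a settings dict and returns a score such as
transactions per second. In a real tool that function would apply the
settings, run pgbench and parse its output; here it is a made-up stand-in so
that the example is self-contained, and all names are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Settings = Dict[str, int]


def grid(space: Dict[str, List[int]]) -> List[Settings]:
    """Every combination of the candidate values, one dict per combination."""
    names = sorted(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]


def best_settings(space: Dict[str, List[int]],
                  benchmark: Callable[[Settings], float],
                  runs: int = 3) -> Tuple[Settings, float]:
    """Benchmark each combination `runs` times and return the best mean."""
    results = []
    for settings in grid(space):
        scores = [benchmark(settings) for _ in range(runs)]
        results.append((sum(scores) / len(scores), settings))
    mean, settings = max(results, key=lambda r: r[0])
    return settings, mean


def fake_benchmark(s: Settings) -> float:
    # Made-up scoring function standing in for a real pgbench run.
    return s["shared_buffers"] / 1000 + s["checkpoint_segments"]


space = {"shared_buffers": [1000, 4000], "checkpoint_segments": [3, 16]}
winner, score = best_settings(space, fake_benchmark)
```

An exhaustive grid grows quickly with the number of settings, so a real
advisor would probably need a smarter search and repeated runs to smooth out
noise.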

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

25 April 2006, 09:40:13

On 4/25/06, Bort, Paul <pbort@tmwsystems.com> wrote:
> Maybe the SoC project here is just such a tuning advisor tool? Something
> that can run pgbench repeatedly, try different settings, and compare
> results.

IIRC, that already exists.  I think it was called pg_autotune or
something similar.


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Bruce Momjian

Date:

25 April 2006, 10:17:05

Tom Lane wrote:
> "Jonah H. Harris" <jonah.harris@gmail.com> writes:
> > On 4/25/06, Andrew Dunstan <andrew@dunslane.net> wrote:
> >> Personally I would much rather see a tuning advisor tool in more general
> >> use than just provide small/medium/large config setting files.
> 
> > True dat.
> 
> One thing that has to be figured out before we can go far with this
> is the whole question of how much smarts initdb really ought to have.
> Since a lot of packagers think that initdb should be run
> non-interactively behind the scenes, the obvious solution of "give
> initdb a --small/--medium/--large parameter" does not work all that
> nicely.  But on the other hand we can't just tell people to drop in
> replacement config files when the one in place contains initdb-created
> specifics, such as locale settings.
> 
> Now that there's a provision for "include" directives in
> postgresql.conf, one way to address this would be to split the
> config info into multiple physical files, some containing purely
> performance-related settings while others consider functionality.
> But that seems more like a wart than a solution to me.  I feel that
> we've pushed performance-tuning logic into initdb that probably ought
> not be there, and we ought to factor it out again.

Sounds good. I don't care what we do for 8.2, but we should do
something.

Or am I going to have to bring out my dancing elephant again?  :-)
http://www.janetskiles.com/ART/greeting/greet-ani/dancing-elephant.jpg


--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Google SoC--Idea Request

From

"Jim C. Nasby"

Date:

25 April 2006, 12:09:33

On Tue, Apr 25, 2006 at 08:39:57AM -0400, Jonah H. Harris wrote:
> On 4/25/06, Bort, Paul <pbort@tmwsystems.com> wrote:
> > Maybe the SoC project here is just such a tuning advisor tool? Something
> > that can run pgbench repeatedly, try different settings, and compare
> > results.
> 
> IIRC, that already exists.  I think it was called pg_autotune or
> something similar.

Last time I tried autotune I couldn't get it to work on FreeBSD, and it
tuned a minimum of parameters. For example, it didn't touch
checkpoint_segments, which is pretty essential to tune on a higher-end
server.

Not saying it wouldn't be a good place to start, but I also don't think
it's a replacement for a built-in tuning tool.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Google SoC--Idea Request

From

"Nikolay Samokhvalov"

Date:

02 May 2006, 05:35:13

Proposal: XMLType for PostgreSQL.

*** Minimum: ***
to have special type support for storing XML data and working with it.
This means the following:
 - ability to define any column of a table as of XMLType; internally,
all data is stored as VARCHAR;
 - auto validation of documents against an XML schema, if one was
specified in the column definition or in the XML data sheets themselves
(DTD, XSD or at least one of them) /*contrib/xml2 has such a feature,
but it uses libxml, which means a DOM interface. Maybe it's better to
use some SAX parser to solve this task*/;
 - XPath indexes for queries with path expressions in the WHERE clause
/*I suppose this kind of index would be the most frequently used. I
propose using a good labeling schema and GIST and/or Gin here*/;
 - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
more than 400 pages now and contains some established constructions
that are used in other DBMSes. There is a patch already written by
Pavel Stehule: http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW,
what is its status? It was kept for 8.2, so what is the result?) I
tested it several months ago, and the basic SQL/XML functions worked
fine. It changes the grammar, but there is no other way... So, using
this patch as a part of this project means that this project cannot be
a contrib module, unfortunately. Nevertheless, the current paper of the
SQL/XML standard seems to be mature, so, compared with the existing
implementation, it would be a nice 'landmark';
 - XML domains support: ability to define a domain based on XMLType and
an XML schema definition (e.g., an external DTD file or something
similar). I'd consider the XML schema definition as a restriction of
the entire XMLType (similar to restrictions for plain types, which are
defined as CHECK constraints in a domain definition)
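
To illustrate the SAX remark in the validation item, here is a small Python
sketch that checks a document in a streaming fashion with the standard
xml.sax module, without building a DOM tree. It only checks well-formedness;
validation against a DTD or XSD, which is what the item actually asks for, is
not covered, and is_well_formed is a hypothetical helper name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(document: str) -> bool:
    """Stream-parse `document`; return False on the first parse error."""
    try:
        # ContentHandler ignores all events, so nothing is kept in memory.
        xml.sax.parseString(document.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


good = is_well_formed("<a><b/></a>")
bad = is_well_formed("<a><b></a>")
```

A type input function could reject a value in this way before it is stored
as VARCHAR; the memory needed does not grow with the size of the document.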

*** Maximum: ***
 - all things from the 'minimum' list :-)
 - rich index system:
  * structure index (labeling schema; prefix schemas seem to be best
for this and I suppose GIST would help here). Actually, it would be
full shredding, like the primary index for XML in MS SQL Server, but
I'm aware of better labeling algorithms than simple prefix labeling
(as in SQL Server). Surely, GIST/Gin support would be a great
foundation for these
  * flexible support of path indexes, value indexes and so on (something
like secondary XML indexes in SQL Server...) - as a continuation of
work on path indexes from the 'minimum' list;
 - full-text search abilities (tsearch2 / GIST);
 - different encoding issues (auto conversion to the column's encoding, etc);
 - ability to choose storage type: VARCHAR or 'native' (trees - like
in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)])
mode. Actually, this is a very, very huge task (almost like creating
a DBMS from scratch) and I understand clearly that I won't solve it
using only my own abilities. But the work on the 'minimum' list
(especially if it will be a part of SoC) would be a good starting point
and may involve some other developers who help to implement it. Maybe
at the initial stage, it's worth integrating with some other DBMS and
working with it using two-phase commit (surely, this is not a solution
to all problems, as it means two different execution plans, etc);
 - XQuery and its integration with SQL (according to the SQL/XML standard).
In other words, implementation of the XQuery Data Model - this would be
a great target point (version 1.0 of the entire project);
 - XML views / updatable XML views (actually, it's a crazy idea, but
it's my dream ;-) )
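
For readers unfamiliar with prefix labeling, the Python sketch below assigns
each element a simple path-of-positions label, so that ancestor tests become
prefix comparisons on the labels. This is the plain prefix scheme only, not
ORDPATH and not one of the better algorithms alluded to above; label_tree and
is_ancestor are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Label = Tuple[int, ...]


def label_tree(root: ET.Element) -> Dict[Label, str]:
    """Map each element's prefix label to its tag; the root gets (1,)."""
    labels: Dict[Label, str] = {}

    def visit(node: ET.Element, label: Label) -> None:
        labels[label] = node.tag
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))
    return labels


def is_ancestor(a: Label, b: Label) -> bool:
    """True if the node labeled `a` is a proper ancestor of the node `b`."""
    return len(a) < len(b) and b[:len(a)] == a


doc = ET.fromstring("<book><title/><chapter><para/><para/></chapter></book>")
labels = label_tree(doc)
```

Because the labels also sort in document order, an index over them can answer
both "descendants of X" and "in document order" queries; the known weakness
of this simple variant is that inserting a sibling can force relabeling.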

As a part of SoC I would concentrate on tasks from 'minimum' list. It
would be a good start point.

Some articles:
Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
XISS system (Li, Moon - advanced interval indexes):
http://www.cs.arizona.edu/xiss/
MASS (prefix indexes):
http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
Staircase joins (accelerating XPath Evaluation):
http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
Labeling schema in SQL Server (ORDPATHs):
http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781

One more comment: I'm a PhD student of MIPT, Russia. I plan to create
an overview of XMLType implementations of last versions of three major
commercial DBMSes (ORA, MS, DB2), comparing them to standard and each
other. First article of this comparison is planned to the end of May.
This work will help to understand, where major commercial DBMS vendors
go and why they go there :-) Moreover, I intend to create a technique
for testing of XMLType support in (O)RDBMSes. In spite of the fact
that SoC assumes all work be done by only one person, I expect some
support/help from the following people:
 - Dr. Sergey Kuznetsov (my scientific mentor)
 - Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL
and GIST and Gin, they definitely can help me to be successful);
 - Ivan Zolotukhin (together we plan to create the overview mentioned above)
 - PostgreSQL community (actually, as I've already mentioned, I intend
to use code by Pavel Stehule, and I'm pretty sure that I'll need a lot
of other help from the community)

On 4/15/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> Hey everyone,
>
> I know we started a discussion a month or so ago regarding ideas for
> SoC projects.  However, after reading through the thread, I didn't see
> us nail down any actual items.
>
> As such, we need to quickly put together a list of oh, 15-20 midlevel
> project ideas.  I'm sure we can pull some off the TODO list, but we
> should also look at project ideas for porting some of the most used
> third-party OSS software to PostgreSQL too (portals, CMS systems,
> accounting systems, etc.).
>
> All ideas welcome!
>
> --
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> 732.331.1324
>

--
Best regards,
Nikolay

Re: Google SoC--Idea Request

From

"Jonah H. Harris"

Date:

02 May 2006, 07:36:35

You need to submit this through Google.

Student FAQ:
http://code.google.com/soc/studentfaq.html

Student Sign-up:
http://code.google.com/soc/student_step1.html

On 5/2/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
> Proposal: XMLType for PostgreSQL.
>
> *** Minimum: ***
> to have special type support for storing XML data and working with it.
> This means following:
>  - ability to define any column of a table as of XMLType; internally,
> all data is stored as VARCHAR;
>  - auto validation of documents against XML schema, if it was
> specified in column
> definition or in XML data sheets themselves (DTD, XSD or at least one
> of them) /*contrib/xml2 has such feature, but it uses libxml, what
> means DOM interface. Maybe it's better to use some SAX parser to solve
> this task*/;
>  - XPath indexes for queries with path expressions in WHERE clause /*I
> suppose this kind of indexes would be most frequently used. I propose
> using good labeling schema and GIST and/or Gin here*/;
>  - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
> more than 400 pages now and contains some established constructions,
> that are using in other DBMSes. There is the some patch already
> written by Pavel Stehule:
> http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is with it?
> it was kept for 8.2, so what is the result?) I've tested it several
> months ago, basic SQL/XML functions worked fine. It changes grammar,
> but there is no other way... So, using this patch as a part of this
> project means that this project cannot be contrib module,
> unfortunately. Nevertheless, current paper of SQL/XML standard seems
> to be mature - so, compared with existing implementation it would be a
> nice 'landmark';
>  - XML domains support: ability to define domain based on XMLType and
> XML schema definition (e.g., external DTD file or smth). I'd consider
> XML schema definition as a restriction of entire XML Type (similar to
> restrictions for plain types, which are defined as CHECK constraint in
> domain definition)
>
> *** Maximum: ***
>  - all things from 'minimum' list :-)
>  - rich index system:
>   * structure index (labeling schema; prefix schemas seem to be best
> for this and I
> suppose GIST would help here). Actually, it would be full shredding,
> like primary index for XML in MS SQL Server, but I'm aware of better
> labeling algorithms than simple prefix labeling (as in SQL Server).
> Surely, GIST/Gin support would be great foundation for these
>   * flexible support of path indexes, value indexes and so on (smth
> like secondary XML  indexes in SQL Server...) - as a continuation of
> work on path indexes from 'minimum' list;
>  - full-text search abilities (tsearch2 / GIST);
>  - different encoding issues (auto conversion to column's encoding, etc);
>  - ability to choose storage type: VARCHAR or 'native' (trees - like
> in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)])
> mode. Actually, this is very-very huge task (almost so as creating
> DBMS from scratch) and I understand clearly that I won't solve it
> using only my own abilities. But the work on 'minimum' list
> (especially if it will be a part of SoC) would be a good start point
> and may involve some other developers that help to implement it. Maybe
> at the initial stage, it's worth to integrate with some other DBMS and
> work with it using two-phase commit (surely, this is not a clue to all
> problems, as it
> means two different execution plans, etc);
>  - XQuery and its integration with SQL (according SQL/XML standard).
> In other words,  implementation of XQuery Data Model - this would be
> great target point (version 1.0 of entire  project);
>  - XML views / updatable XML views (actually, it's a crazy idea, but
> it's my dream ;-) )
>
> As a part of SoC I would concentrate on tasks from 'minimum' list. It
> would be a good start point.
>
> Some articles:
> Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
> Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
> XISS system (Li, Moon - advanced interval indexes):
> http://www.cs.arizona.edu/xiss/
> MASS (prefix indexes):
> http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
> Staircase joins (accelerating XPath Evaluation):
> http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
> Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
> XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
> XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
> Labeling schema in SQL Server (ORDPATHs):
> http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781
>
> One more comment: I'm a PhD student of MIPT, Russia. I plan to create
> an overview of XMLType implementations of last versions of three major
> commercial DBMSes (ORA, MS, DB2), comparing them to standard and each
> other. First article of this comparison is planned to the end of May.
> This work will help to understand, where major commercial DBMS vendors
> go and why they go there :-) Moreover, I intend to create a technique
> for testing of XMLType support in (O)RDBMSes. In spite of the fact,
> that SoC assumes all work be done by only one person, I expect some
> support/help from following people:
>  - Dr. Sergey Kuznetsov (my scientific mentor)
>  - Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL
> and GIST and Gin, they definitely can help me to be successful);
>  - Ivan Zolotukhin (together we plan to create the overview mentioned above)
>  - PostgreSQL community (actually, as I've already mentioned, I intend
> using code by Pavel Stehule, and I'm pretty sure that I'll need a lot
> of other help from the community)
>
> On 4/15/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> > Hey everyone,
> >
> > I know we started a discussion a month or so ago regarding ideas for
> > SoC projects.  However, after reading through the thread, I didn't see
> > us nail down any actual items.
> >
> > As such, we need to quickly put together a list of oh, 15-20 midlevel
> > project ideas.  I'm sure we can pull some off the TODO list, but we
> > should also look at project ideas for porting some of the most used
> > third-party OSS software to PostgreSQL too (portals, CMS systems,
> > accounting systems, etc.).
> >
> > All ideas welcome!
> >
> > --
> > Jonah H. Harris, Database Internals Architect
> > EnterpriseDB Corporation
> > 732.331.1324
> >
> >
>
>
> --
> Best regards,
> Nikolay
>


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

Re: Google SoC--Idea Request

From

Martijn van Oosterhout

Date:

14 August 2006, 04:42:00

On Thu, Apr 20, 2006 at 11:56:32AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > About the only thing in the backend I found interesting was this:
> > src/backend/utils/hash/dynahash.c function hash_create
>
> I wonder if we shouldn't just remove the hash_destroy calls in
> hash_create's failure paths.  hash_destroy is explicitly not gonna
> work on a shared-memory hashtable, and in all other cases I'd expect
> that any already-allocated table structure will be in a palloc context
> that will get cleaned up during error recovery.

[re: failure to create hash in shared memory causes crash]

Any thoughts on this? Make it a TODO item, document it, or simply
ignore it?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Google SoC--Idea Request

From

Tom Lane

Date:

14 August 2006, 09:09:47

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Thu, Apr 20, 2006 at 11:56:32AM -0400, Tom Lane wrote:
>> I wonder if we shouldn't just remove the hash_destroy calls in
>> hash_create's failure paths.  hash_destroy is explicitly not gonna
>> work on a shared-memory hashtable, and in all other cases I'd expect
>> that any already-allocated table structure will be in a palloc context
>> that will get cleaned up during error recovery.

> Any thoughts on this? Make it a TODO item, document it, or simply
> ignore it?

It's like a two-line patch, so hardly worth putting in TODO ... might
as well just do it.  IIRC the motivation is mostly to silence a
Coverity warning?
        regards, tom lane

Re: Google SoC--Idea Request

From

Martijn van Oosterhout

Date:

14 August 2006, 09:35:33

On Mon, Aug 14, 2006 at 08:09:36AM -0400, Tom Lane wrote:
> > Any thoughts on this? Make it a TODO item, document it, or simply
> > ignore it?
>
> It's like a two-line patch, so hardly worth putting in TODO ... might
> as well just do it.  IIRC the motivation is mostly to silence a
> Coverity warning?

Well sort of. I can also just tick a box and the warning goes away too.
It just seemed from the discussion that it was something people were
going to fix...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Google SoC--Idea Request

From

Tom Lane

Date:

14 August 2006, 09:41:06

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Mon, Aug 14, 2006 at 08:09:36AM -0400, Tom Lane wrote:
>> It's like a two-line patch, so hardly worth putting in TODO ... might
>> as well just do it.  IIRC the motivation is mostly to silence a
>> Coverity warning?

> Well sort of. I can also just tick a box and the warning goes away too.
> It just seemed from the discussion that it was something people were
> going to fix...

Done now --- I have to admit I'd forgotten about it.
        regards, tom lane
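
For readers following the hash_create discussion, the idea can be sketched roughly as below. This is a minimal stand-alone illustration, not the actual dynahash.c code or the committed patch: Arena, arena_alloc, arena_reset, Table and table_create are hypothetical names standing in for a palloc-style memory context and a table constructor, and the simulate_failure flag exists only to exercise the failure path. The point it tries to show is that when every allocation belongs to a context that error recovery releases wholesale, the constructor's failure path can simply return without calling a destroy routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a palloc-style memory context: every allocation
 * is linked into the arena so the whole set can be released in one call. */
typedef struct Chunk {
    struct Chunk *next;
} Chunk;

typedef struct {
    Chunk  *head;   /* most recent allocation */
    size_t  live;   /* number of allocations not yet released */
} Arena;

/* Allocate size bytes owned by the arena. The payload follows the Chunk
 * header, which is pointer-aligned; that is sufficient for the types
 * used in this sketch. */
void *arena_alloc(Arena *a, size_t size)
{
    assert(a != NULL);
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL)
        return NULL;
    c->next = a->head;
    a->head = c;
    a->live++;
    return c + 1;
}

/* Release everything the arena owns, as error recovery would do for a
 * memory context. */
void arena_reset(Arena *a)
{
    while (a->head != NULL) {
        Chunk *next = a->head->next;
        free(a->head);
        a->head = next;
    }
    a->live = 0;
}

/* Hypothetical, much simplified table header. */
typedef struct {
    size_t  nbuckets;
    void  **buckets;
} Table;

/* Build a table inside the arena. simulate_failure forces the second
 * allocation to fail so the failure path can be exercised. */
Table *table_create(Arena *a, size_t nbuckets, int simulate_failure)
{
    Table *t = arena_alloc(a, sizeof(Table));
    if (t == NULL)
        return NULL;
    t->nbuckets = nbuckets;
    t->buckets = simulate_failure
        ? NULL
        : arena_alloc(a, nbuckets * sizeof(void *));
    if (t->buckets == NULL) {
        /* No destroy call here: the half-built table still belongs to
         * the arena, and the caller's error recovery resets the arena. */
        return NULL;
    }
    for (size_t i = 0; i < nbuckets; i++)
        t->buckets[i] = NULL;
    return t;
}
```

A caller would create the arena, call table_create, and on a NULL result run arena_reset once as part of its error handling; the partially built table is released along with everything else in the arena. A shared-memory table has no such per-context cleanup, which is the case where, per the discussion above, hash_destroy would not work anyway.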