Re: Google SoC--Idea Request - Mailing list pgsql-hackers

From Jonah H. Harris
Subject Re: Google SoC--Idea Request
Msg-id 36e682920605020336i7570d805w75a1c6b5b9345e78@mail.gmail.com
In response to Re: Google SoC--Idea Request  ("Nikolay Samokhvalov" <samokhvalov@gmail.com>)
List pgsql-hackers
You need to submit this through Google.

Student FAQ:
http://code.google.com/soc/studentfaq.html

Student Sign-up:
http://code.google.com/soc/student_step1.html

On 5/2/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
> Proposal: XMLType for PostgreSQL.
>
> *** Minimum: ***
> to have special type support for storing XML data and working with it.
> This means the following:
>  - ability to define any column of a table as XMLType; internally,
> all data is stored as VARCHAR;
>  - automatic validation of documents against an XML schema, if one was
> specified in the column definition or in the XML documents themselves
> (DTD, XSD, or at least one of them) /*contrib/xml2 has such a feature,
> but it uses libxml, which means a DOM interface. Maybe it's better to
> use some SAX parser to solve this task*/;
>  - XPath indexes for queries with path expressions in the WHERE clause
> /*I suppose this kind of index would be the most frequently used. I
> propose using a good labeling scheme and GiST and/or GIN here*/;
>  - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
> more than 400 pages now and contains some established constructs
> that are used in other DBMSes. There is already a patch
> written by Pavel Stehule:
> http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is its
> status? It was held for 8.2, so what was the result?) I tested it
> several months ago, and the basic SQL/XML functions worked fine. It
> changes the grammar, but there is no other way... So, using this patch
> as a part of this project means that this project cannot be a contrib
> module, unfortunately. Nevertheless, the current draft of the SQL/XML
> standard seems to be mature, so, compared with the existing
> implementation, it would be a nice 'landmark';
>  - XML domain support: ability to define a domain based on XMLType and
> an XML schema definition (e.g., an external DTD file or something
> similar). I'd consider the XML schema definition as a restriction on
> the entire XMLType (similar to restrictions on plain types, which are
> defined as CHECK constraints in the domain definition)
>
> *** Maximum: ***
>  - all things from 'minimum' list :-)
>  - rich index system:
>   * structure index (labeling scheme; prefix schemes seem to be best
> for this, and I suppose GiST would help here). Actually, it would be
> full shredding, like the primary XML index in MS SQL Server, but I'm
> aware of better labeling algorithms than simple prefix labeling (as in
> SQL Server). Surely, GiST/GIN support would be a great foundation for
> these;
>   * flexible support of path indexes, value indexes and so on (something
> like secondary XML indexes in SQL Server...) - as a continuation of
> work on path indexes from the 'minimum' list;
>  - full-text search abilities (tsearch2 / GiST);
>  - different encoding issues (auto conversion to column's encoding, etc);
>  - ability to choose storage type: VARCHAR or 'native' (trees - like
> in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)])
> mode. Actually, this is a very, very huge task (almost like creating
> a DBMS from scratch) and I understand clearly that I won't solve it
> using only my own abilities. But the work on the 'minimum' list
> (especially if it is a part of SoC) would be a good starting point
> and may attract other developers who could help implement it. Maybe
> at the initial stage, it's worth integrating with some other DBMS and
> working with it using two-phase commit (surely, this is not a solution
> to all problems, as it means two different execution plans, etc);
>  - XQuery and its integration with SQL (according to the SQL/XML
> standard). In other words, an implementation of the XQuery Data Model -
> this would be a great target point (version 1.0 of the entire project);
>  - XML views / updatable XML views (actually, it's a crazy idea, but
> it's my dream ;-) )
>
> As a part of SoC I would concentrate on tasks from the 'minimum' list.
> It would be a good starting point.
>
> Some articles:
> Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
> Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
> XISS system (Li, Moon - advanced interval indexes):
> http://www.cs.arizona.edu/xiss/
> MASS (prefix indexes):
> http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
> Staircase joins (accelerating XPath Evaluation):
> http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
> Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
> XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
> XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
> Labeling schema in SQL Server (ORDPATHs):
> http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781
>
> One more comment: I'm a PhD student at MIPT, Russia. I plan to write
> an overview of the XMLType implementations in the latest versions of
> three major commercial DBMSes (ORA, MS, DB2), comparing them to the
> standard and to each other. The first article of this comparison is
> planned for the end of May. This work will help to understand where
> the major commercial DBMS vendors are going and why they are going
> there :-) Moreover, I intend to create a technique for testing XMLType
> support in (O)RDBMSes. Although SoC assumes all work is done by only
> one person, I expect some support/help from the following people:
>  - Dr. Sergey Kuznetsov (my scientific mentor)
>  - Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL,
> GiST and GIN, they definitely can help me to be successful);
>  - Ivan Zolotukhin (together we plan to create the overview mentioned above)
>  - PostgreSQL community (actually, as I've already mentioned, I intend
> to use code by Pavel Stehule, and I'm pretty sure that I'll need a lot
> of other help from the community)
>
> On 4/15/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> > Hey everyone,
> >
> > I know we started a discussion a month or so ago regarding ideas for
> > SoC projects.  However, after reading through the thread, I didn't see
> > us nail down any actual items.
> >
> > As such, we need to quickly put together a list of oh, 15-20 midlevel
> > project ideas.  I'm sure we can pull some off the TODO list, but we
> > should also look at project ideas for porting some of the most used
> > third-party OSS software to PostgreSQL too (portals, CMS systems,
> > accounting systems, etc.).
> >
> > All ideas welcome!
> >
> > --
> > Jonah H. Harris, Database Internals Architect
> > EnterpriseDB Corporation
> > 732.331.1324
> >
> >
>
>
> --
> Best regards,
> Nikolay
>
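
For readers following the index discussion in the quoted proposal, here
is a minimal sketch in Python of the prefix labeling idea it mentions
(a structure index plus a path index). It is only an illustration under
stated assumptions, not code from the proposal, from Pavel Stehule's
patch, or from PostgreSQL. The names label_tree, is_ancestor and
build_path_index and the sample document are hypothetical, and the
labels are plain Dewey-style tuples rather than ORDPATHs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_tree(root):
    """Assign each element a Dewey-style prefix label such as (1, 2, 1).

    Returns a list of (label, path, element) in document order.
    """
    labels = []

    def visit(elem, label, parent_path):
        path = parent_path + "/" + elem.tag
        labels.append((label, path, elem))
        # The i-th child extends the parent's label with i.
        for i, child in enumerate(elem, start=1):
            visit(child, label + (i,), path)

    visit(root, (1,), "")
    return labels


def is_ancestor(a, b):
    """Label a is a proper ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def build_path_index(labels):
    """Map each root-to-node path to the labels of the nodes on that path."""
    index = defaultdict(list)
    for label, path, _elem in labels:
        index[path].append(label)
    return index


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<lib>"
    "<book><title>A</title></book>"
    "<book><title>B</title><year>2006</year></book>"
    "</lib>"
)
labels = label_tree(doc)
index = build_path_index(labels)

print(index["/lib/book/title"])        # [(1, 1, 1), (1, 2, 1)]
print(is_ancestor((1, 2), (1, 2, 1)))  # True
```

With labels like these, a path expression can be answered from the
path index alone, and ancestor/descendant checks become prefix
comparisons on the labels, without walking the tree. Sorting the tuples
gives document order.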


--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

