Re: Google SoC--Idea Request - Mailing list pgsql-hackers
From | Jonah H. Harris |
---|---|
Subject | Re: Google SoC--Idea Request |
Date | |
Msg-id | 36e682920605020336i7570d805w75a1c6b5b9345e78@mail.gmail.com |
In response to | Re: Google SoC--Idea Request ("Nikolay Samokhvalov" <samokhvalov@gmail.com>) |
List | pgsql-hackers |
You need to submit this through Google.

Student FAQ: http://code.google.com/soc/studentfaq.html
Student Sign-up: http://code.google.com/soc/student_step1.html

On 5/2/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
> Proposal: XMLType for PostgreSQL.
>
> *** Minimum: ***
> to have special type support for storing XML data and working with it.
> This means the following:
> - ability to define any column of a table as of XMLType; internally,
> all data is stored as VARCHAR;
> - auto validation of documents against an XML schema, if it was
> specified in the column definition or in the XML data sheets themselves
> (DTD, XSD or at least one of them) /*contrib/xml2 has such a feature,
> but it uses libxml, which means a DOM interface. Maybe it's better to
> use some SAX parser to solve this task*/;
> - XPath indexes for queries with path expressions in the WHERE clause
> /*I suppose this kind of index would be most frequently used. I propose
> using a good labeling schema and GIST and/or Gin here*/;
> - some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has
> more than 400 pages now and contains some established constructions
> that are used in other DBMSes. There is a patch already written by
> Pavel Stehule: http://www.pgsql.ru/db/mw/msg.html?mid=2096818.
> (BTW, what is with it? It was kept for 8.2, so what is the result?)
> I tested it several months ago; basic SQL/XML functions worked fine.
> It changes the grammar, but there is no other way... So, using this
> patch as a part of this project means that this project cannot be a
> contrib module, unfortunately. Nevertheless, the current paper of the
> SQL/XML standard seems to be mature - so, compared with the existing
> implementation, it would be a nice 'landmark';
> - XML domains support: ability to define a domain based on XMLType and
> an XML schema definition (e.g., an external DTD file or something).
> I'd consider the XML schema definition as a restriction of the entire
> XML type (similar to restrictions for plain types, which are defined
> as a CHECK constraint in the domain definition)
>
> *** Maximum: ***
> - all things from the 'minimum' list :-)
> - rich index system:
>   * structure index (labeling schema; prefix schemas seem to be best
>   for this and I suppose GIST would help here). Actually, it would be
>   full shredding, like the primary index for XML in MS SQL Server, but
>   I'm aware of better labeling algorithms than simple prefix labeling
>   (as in SQL Server). Surely, GIST/Gin support would be a great
>   foundation for these
>   * flexible support of path indexes, value indexes and so on
>   (something like secondary XML indexes in SQL Server...) - as a
>   continuation of the work on path indexes from the 'minimum' list;
> - full-text search abilities (tsearch2 / GIST);
> - different encoding issues (auto conversion to the column's encoding,
> etc);
> - ability to choose the storage type: VARCHAR or 'native' (trees - like
> in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)])
> mode. Actually, this is a very, very large task (almost as large as
> creating a DBMS from scratch) and I understand clearly that I won't
> solve it using only my own abilities. But the work on the 'minimum'
> list (especially if it will be a part of SoC) would be a good starting
> point and may involve some other developers who help to implement it.
> Maybe at the initial stage, it's worth integrating with some other
> DBMS and working with it using two-phase commit (surely, this is not a
> solution to all problems, as it means two different execution plans,
> etc);
> - XQuery and its integration with SQL (according to the SQL/XML
> standard). In other words, implementation of the XQuery Data Model -
> this would be a great target point (version 1.0 of the entire project);
> - XML views / updatable XML views (actually, it's a crazy idea, but
> it's my dream ;-) )
>
> As a part of SoC I would concentrate on tasks from the 'minimum' list.
> It would be a good starting point.
>
> Some articles:
> Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
> Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
> XISS system (Li, Moon - advanced interval indexes):
> http://www.cs.arizona.edu/xiss/
> MASS (prefix indexes):
> http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
> Staircase joins (accelerating XPath Evaluation):
> http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
> Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
> XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
> XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
> Labeling schema in SQL Server (ORDPATHs):
> http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781
>
> One more comment: I'm a PhD student at MIPT, Russia. I plan to create
> an overview of the XMLType implementations in the latest versions of
> three major commercial DBMSes (ORA, MS, DB2), comparing them to the
> standard and to each other. The first article of this comparison is
> planned for the end of May. This work will help to understand where
> major commercial DBMS vendors go and why they go there :-) Moreover, I
> intend to create a technique for testing XMLType support in
> (O)RDBMSes. In spite of the fact that SoC assumes all work is done by
> only one person, I expect some support/help from the following people:
> - Dr. Sergey Kuznetsov (my scientific mentor)
> - Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL
> and GIST and Gin, they definitely can help me to be successful);
> - Ivan Zolotukhin (together we plan to create the overview mentioned
> above)
> - PostgreSQL community (actually, as I've already mentioned, I intend
> to use code by Pavel Stehule, and I'm pretty sure that I'll need a lot
> of other help from the community)
>
> On 4/15/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> > Hey everyone,
> >
> > I know we started a discussion a month or so ago regarding ideas for
> > SoC projects. However, after reading through the thread, I didn't see
> > us nail down any actual items.
> >
> > As such, we need to quickly put together a list of oh, 15-20 midlevel
> > project ideas. I'm sure we can pull some off the TODO list, but we
> > should also look at project ideas for porting some of the most used
> > third-party OSS software to PostgreSQL too (portals, CMS systems,
> > accounting systems, etc.).
> >
> > All ideas welcome!
> >
> > --
> > Jonah H. Harris, Database Internals Architect
> > EnterpriseDB Corporation
> > 732.331.1324
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >
>
>
> --
> Best regards,
> Nikolay
>

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324
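The quoted proposal mentions "prefix labeling" and path indexes without showing what they look like. The following is a minimal Python sketch of the general idea only. It is not code from this thread, not the patch by Pavel Stehule, and not how PostgreSQL or SQL Server implement it. It assumes Dewey-style labels (a tuple of child positions from the root), and the function names `build_path_index` and `is_ancestor` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_path_index(root):
    """Map each root-to-node tag path to the prefix labels of matching nodes.

    A label is a tuple of 1-based child positions, e.g. (1, 2, 1) is the
    first child of the second child of the root (illustrative scheme).
    """
    index = {}

    def visit(elem, label, parent_path):
        path = parent_path + "/" + elem.tag
        index.setdefault(path, []).append(label)
        for position, child in enumerate(elem, start=1):
            visit(child, label + (position,), path)

    visit(root, (1,), "")
    return index


def is_ancestor(a, b):
    """True if the node labeled `a` is a proper ancestor of the node labeled `b`."""
    return len(a) < len(b) and b[:len(a)] == a


doc = ET.fromstring(
    "<book><title>PG</title>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>GiST</title></chapter></book>"
)
index = build_path_index(doc)
chapter_titles = index["/book/chapter/title"]
```

With labels like these, a path expression such as `/book/chapter/title` becomes a dictionary lookup, and an ancestor/descendant test becomes a prefix comparison on the labels, with no need to walk the document tree. The proposal's point about "better labeling algorithms" concerns schemes that keep such labels compact and stable under inserts, which this sketch does not attempt.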