Thread: patches for items from TODO list
Hello all, We would like to contribute to the Postgresql community by implementing the following items from the TODO list (http://developer.postgresql.org/todo.php): . Allow COPY to understand \x as a hex byte . Allow COPY to optionally include column headings in the first line . Add XML output to COPY The changes are straightforward and include implementation of the features as well as modification of the regression tests and documentation. Before sending a diff file with the changes, we would like to know if these features have been already implemented. Best regards, Jason Lucas and Sergey Ten SourceLabs Dependable Open Source Systems
Sergey Ten wrote: > Hello all, > > We would like to contribute to the Postgresql community by implementing > the following items from the TODO list > (http://developer.postgresql.org/todo.php): > . Allow COPY to understand \x as a hex byte . Allow COPY to optionally > include column headings in the first line . Add XML output to COPY > > The changes are straightforward and include implementation of the > features as well as modification of the regression tests and documentation. > > Before sending a diff file with the changes, we would like to know if > these features have been already implemented. Please check the web site version. Someone has already implemented "Allow COPY to optionally include column headings in the first line". As far as XML, there has been discussion on where that should be done? In the backend, libpq, or psql. It will need discussion on hackers. I assume you have read the developer's FAQ too. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Please check the web site version. Someone has already implemented > "Allow COPY to optionally include column headings in the first line". > > As far as XML, there has been discussion on where that should be done? > In the backend, libpq, or psql. It will need discussion on hackers. I > assume you have read the developer's FAQ too. The other issue is 'what XML format'? Find me a standard data dump XML DTD and I'll change phpPgAdmin to use it as well. Otherwise, phpPgAdmin's is quite simple. Chris
Thank you to all who replied for your time. We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find patches with words "copy" or "xml" in the subject. Could you please clarify what the website version is and where it can be found? Is it sources available at ftp://ftp.postgresql.org? Best regards, Jason, Sergey > -----Original Message----- > From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au] > Sent: Wednesday, May 11, 2005 7:36 PM > To: Bruce Momjian > Cc: Sergey Ten; pgsql-hackers@postgresql.org; jason@sourcelabs.com > Subject: Re: [HACKERS] patches for items from TODO list > > > Please check the web site version. Someone has already implemented > > "Allow COPY to optionally include column headings in the first line". > > > > As far as XML, there has been discussion on where that should be done? > > In the backend, libpq, or psql. It will need discussion on hackers. I > > assume you have read the developer's FAQ too. > > The other issue is 'what XML format'? Find me a standard data dump XML > DTD and I'll change phpPgAdmin to use it as well. > > Otherwise, phpPgAdmin's is quite simple. > > Chris
Sergey Ten wrote: > Thank you to all who replied for your time. > > We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and > http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find > patches > with words "copy" or "xml" in the subject. Could you please clarify what > the > website version is and where it can be found? Is it sources available at > ftp://ftp.postgresql.org? I think the website version of the TODO is that one on www.postgresql.org (see http://www.postgresql.org/docs/faqs.TODO.html, from Developers->Roadmap). "-Allow COPY to optionally include column headings in the first line" is already completed (-). That is what Bruce referenced (below). The patch has already been commited and is therefore not on the patches list anymore: http://archives.postgresql.org/pgsql-committers/2005-05/msg00075.php (I searched for "copy" in the mailing list archives of pgsql-committers.) Bruce Momjian wrote: >> > Please check the web site version. Someone has already implemented >> > "Allow COPY to optionally include column headings in the first line". Hope that clarifies the situation a little bit. Best Regards, Michael Paesold
Hello all, Thank you to all who replied for suggestions and help. Enclosed please find code changes for the following items: - Allow COPY to understand \x as a hex byte, and - Add XML output to COPY The changes include implementation of the features as well as modification of the copy regression test. After a careful consideration we decided to - put XML implementation in the backend and - use XML format described below, with justification of our decision. The XML schema used by the COPY TO command was designed for ease of use and to avoid the problem of column names appearing in XML element names. XML doesn't allow spaces and punctuation in element names but Postgres does allow these characters in column names; therefore, a direct mapping would be problematic. The solution selected places the column names into attribute fields where any special characters they contain can be properly escaped using XML entities. An additional attribute is used to distinguish null fields from empty ones. The example below is taken from the test suite. It demonstrates some basic XML escaping in row 2. Row 3 demonstrates the difference between an empty string (in col2) and a null string (in col3). If a field is null it will always be empty but a field which is empty may or may not be null. Always check the value of the 'null' attribute to be sure when a field is truly null. <?xml version='1.0'?> <table> <row> <col name='col1' null='n'>Jackson, Sam</col> <col name='col2' null='n'>\h</col> </row> <row> <col name='col1' null='n'>It is "perfect".</col> <col name='col2' null='n'> </col> </row> <row> <col name='col1' null='n'></col> <col name='col2' null='y'></col> </row> </table> Please let us know if about any concerns, objections the proposed change may cause. Best regards, Jason Lucas, Sergey Ten SourceLabs > -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Wednesday, May 11, 2005 7:11 PM > To: Sergey Ten > Cc: pgsql-hackers@postgresql.org; jason@sourcelabs.com > Subject: Re: [HACKERS] patches for items from TODO list > > Sergey Ten wrote: > > Hello all, > > > > We would like to contribute to the Postgresql community by implementing > > the following items from the TODO list > > (http://developer.postgresql.org/todo.php): > > . Allow COPY to understand \x as a hex byte . Allow COPY to optionally > > include column headings in the first line . Add XML output to COPY > > > > The changes are straightforward and include implementation of the > > features as well as modification of the regression tests and > documentation. > > > > Before sending a diff file with the changes, we would like to know if > > these features have been already implemented. > > Please check the web site version. Someone has already implemented > "Allow COPY to optionally include column headings in the first line". > > As far as XML, there has been discussion on where that should be done? > In the backend, libpq, or psql. It will need discussion on hackers. I > assume you have read the developer's FAQ too. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania > 19073
Attachment
On Sun, May 15, 2005 at 04:44:57PM +0800, Christopher Kings-Lynne wrote: > Looks like hierarchical queries are now officially stalled :( > > Anyone want to take this up for 8.1? Sergei and Jason, Feel like taking SQL:1999 WITH RECURSIVE? It would be a giant help to the PostgreSQL community. :) http://gppl.moonbone.ru/index.shtml has part of it, and http://candle.pha.pa.us/mhonarc/patches2/msg00175.html has others. There's also MERGE, which is covered starting on page 47 of http://wiscorp.com/sql/SQL2003Features.pdf also pp 839-845 of 5WD-02-Foundation-2003-09.pdf which is part of this: http://wiscorp.com/sql/sql_2003_standard.zip and an overview here: http://www.varlena.com/varlena/GeneralBits/73.php Cheers, D > Hi, > > I haven't done any significant progress on that way because of lack of > free time. > Beside this, I'm recently changed my job and now I'm woking for MySQL. > I think it's not possible for me to continue work on PostgreSQL. > Feel free to take the patch and develop it further as long as you > mention me as author of initial version. > > Regards, Evgen. > > On 5/5/05, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote: > >Hi Evgen, > > > >I just keep pinging this patch thread every once in a while to make sure > >it doesn't get forgotten :) > > > >How is the syncing with 8.1 CVS coming along? > > > >Chris > > > >Evgen Potemkin wrote: > >> Hi hackers! > >> > >> I have done initial implementation of SQL99 WITH clause (attached). > >> It's now only for v7.3.4 and haven't a lot of checks and restrictions. > >> It can execute only simple WITH queries like > >> WITH tree AS (SELECT id,pnt,name,1::int AS level FROM t WHERE id=0 > >> UNION SELECT t.id,t.pnt,t.name,tree.level+1 FROM t JOIN tree ON > >> tree.id=t.pnt ) SELECT * FROM tree; > >> It would be great if someone with knowledge of Pg internals can make a > >> kind of code review and make some advices how to better implement all > >> this. > >> > >> Regards, Evgen. > >> > >> > >> ------------------------------------------------------------------------ > >> > >> > >> ---------------------------(end of broadcast)--------------------------- > >> TIP 3: if posting/reading through Usenet, please send an appropriate > >> subscribe-nomail command to majordomo@postgresql.org so that your > >> message can get through to the mailing list cleanly > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings On Wed, May 11, 2005 at 07:01:50PM -0700, Sergey Ten wrote: > Hello all, > > We would like to contribute to the Postgresql community by implementing > the following items from the TODO list > (http://developer.postgresql.org/todo.php): > . Allow COPY to understand \x as a hex byte . Allow COPY to optionally > include column headings in the first line . Add XML output to COPY > > The changes are straightforward and include implementation of the > features as well as modification of the regression tests and documentation. > > Before sending a diff file with the changes, we would like to know if > these features have been already implemented. > > Best regards, > Jason Lucas and Sergey Ten > SourceLabs > > Dependable Open Source Systems > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
On Mon, May 16, 2005 at 03:09:18PM -0700, jason@sourcelabs.com wrote: > David-- > > My boss has given me approval to put up to 8 hours per week of > SourceLabs' time in on the SQL99 hierarchical query implementation. That's great! :) > (I'm free, of course, to supplement this with whatever of my own > time I can spare.) I'm willing to take on the work. What's the > next step? I suppose the first thing would be to look over the patches I mentioned and the SQL:2003 specification, then put together a preliminary patch and send it to -hackers. Cheers, David. -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
David Fetter <david@fetter.org> writes: >> What's the next step? > I suppose the first thing would be to look over the patches I > mentioned and the SQL:2003 specification, then put together a > preliminary patch and send it to -hackers. You can get useful feedback long before having anything that looks like a patch --- and I'd encourage you to do so. Send us design notes, rough data structures, etc. Quite honestly, hierarchical queries aren't the easiest thing in the world and wouldn't be my recommendation of the first ... or even the second ... backend hacking project for a new posthacker to tackle. If that's where you feel you must start, OK, but try to get as much feedback as soon as you can, sooner not later. I seem to recall some discussion of how to do this in the past; have you trolled the pghackers archives? regards, tom lane
On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > >> What's the next step? > > > I suppose the first thing would be to look over the patches I > > mentioned and the SQL:2003 specification, then put together a > > preliminary patch and send it to -hackers. ... > I seem to recall some discussion of how to do this in the past; > have you trolled the pghackers archives? I think that Jasons inspiration for doing it came from the the fact that there are already now abandoned patches for doing it. So studying/understanding the current patch, and describing and getting feedback from pgsql-hackers should be quite a good way of gaining insight in trickier parts of postgres. So it will not be jumping at new problem and writing a patch, but rather trying to get an existing patch into good shape for being accepted in the backend. -- Hannu Krosing <hannu@skype.net>
David-- My boss has given me approval to put up to 8 hours per week of SourceLabs' time in on the SQL99 hierarchical query implementation. (I'm free, of course, to supplement this with whatever of my own time I can spare.) I'm willing to take on the work. What's the next step? --Jason
Hannu Krosing <hannu@skype.net> writes: > On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote: >> I seem to recall some discussion of how to do this in the past; >> have you trolled the pghackers archives? > I think that Jasons inspiration for doing it came from the the fact that > there are already now abandoned patches for doing it. Having looked over the latest patch, my advice would be to ignore it :-( It's almost completely devoid of documentation, except for comments that he copied-and-pasted from elsewhere without modification. Wrong comments are even worse than none. What I'd like to see before anyone writes a line of code is a text document explaining how this is going to work: what's the plan tree structure, what happens at execution time, how much of the SQL99 spec is going to get implemented. If you don't have that understanding first, you're going to get buried in irrelevant details. regards, tom lane
Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote: > >> I seem to recall some discussion of how to do this in the past; > >> have you trolled the pghackers archives? > > > I think that Jasons inspiration for doing it came from the the fact that > > there are already now abandoned patches for doing it. > > Having looked over the latest patch, my advice would be to ignore it :-( > It's almost completely devoid of documentation, except for comments > that he copied-and-pasted from elsewhere without modification. Wrong > comments are even worse than none. > > What I'd like to see before anyone writes a line of code is a text > document explaining how this is going to work: what's the plan tree > structure, what happens at execution time, how much of the SQL99 spec > is going to get implemented. If you don't have that understanding > first, you're going to get buried in irrelevant details. I have updated the developer's FAQ to cover these suggestions on how to start a patch: 1.4) What do I do after choosing an item to work on? Send an email to pgsql-hackers with a proposal for what you want to do (assuming your contribution is not trivial). Workingin isolation is not advisable because others might be working on the same TODO item, or you might have misunderstoodthe TODO item. In the email, discuss both the internal implementation method you plan to use, and any user-visiblechanges (new syntax, etc). For complex patches, it is important to get community feeback on your proposal beforestarting work. Failure to do so might mean your patch is rejected. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a): > <?xml version='1.0'?> > <table> > <row> > <col name='col1' null='n'>Jackson, Sam</col> > <col name='col2' null='n'>\h</col> > </row> > <row> > <col name='col1' null='n'>It is "perfect".</col> > <col name='col2' null='n'> </col> > </row> > <row> > <col name='col1' null='n'></col> > <col name='col2' null='y'></col> > </row> > </table> Why didn't you do something to the effect of <?xml version='1.0'?> <table><cols> <col name='col1'/> <col name='col2'/></cols><row> <col null='n'>Jackson, Sam</col> <col null='n'>\h</col></row><row> <col null='n'>It is "perfect".</col> <col null='n'> </col></row><row> <col null='n'></col> <col null='y'></col></row> </table> This avoids repeating the column names in every row, which don't change over the rows anyway. By reducing redundant information it also makes structurally invalid XML less likely (whether that is relevant depends on what people do with the XML data). Also you could encode the XML output as UTF-8, which would make the files more readable for humans if the text data is not ASCII. Markus -- Markus Bertheau <twanger@bluetwanger.de>
Sergey Ten wrote: > After a careful consideration we decided to > - put XML implementation in the backend What advantage does putting the XML output mode in the backend provide? -Neil
On T, 2005-05-17 at 11:22 -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote: > >> I seem to recall some discussion of how to do this in the past; > >> have you trolled the pghackers archives? > > > I think that Jasons inspiration for doing it came from the the fact that > > there are already now abandoned patches for doing it. > > Having looked over the latest patch, my advice would be to ignore it :-( > It's almost completely devoid of documentation, except for comments > that he copied-and-pasted from elsewhere without modification. Wrong > comments are even worse than none. ANd even worse - this is from the README on the website: ----8<---------8<---------8<---------8<---------8<---------8<---------8<----- WHAT'S THIS This is a patch which allows PgSQL to make hierarchical queries a la Oracle do. (c) Evgen Potemkin 2003,2004, < gppl at inbox dot ru >, entirely based on PostgreSQL (http://www.postgresql.org) Patch itself distributed under GPL. No warranty of any kind is given, use it at your own risk. ----8<---------8<---------8<---------8<---------8<---------8<---------8<----- So the license is also incompatible :( -- Hannu Krosing <hannu@skype.net>
On Tue, May 17, 2005 at 11:24:19PM +0300, Hannu Krosing wrote: > On T, 2005-05-17 at 11:22 -0400, Tom Lane wrote: > > Hannu Krosing <hannu@skype.net> writes: > > > On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote: > > >> I seem to recall some discussion of how to do this in the past; > > >> have you trolled the pghackers archives? > > > > > I think that Jasons inspiration for doing it came from the the fact that > > > there are already now abandoned patches for doing it. > > > > Having looked over the latest patch, my advice would be to ignore it :-( > > It's almost completely devoid of documentation, except for comments > > that he copied-and-pasted from elsewhere without modification. Wrong > > comments are even worse than none. > > ANd even worse - this is from the README on the website: > > ----8<---------8<---------8<---------8<---------8<---------8<---------8<----- > WHAT'S THIS > > This is a patch which allows PgSQL to make hierarchical queries a la > Oracle do. > > (c) Evgen Potemkin 2003,2004, < gppl at inbox dot ru >, entirely based > on PostgreSQL (http://www.postgresql.org) > Patch itself distributed under GPL. No warranty of any kind is given, > use it at your own risk. It's the notes on the CONNECT BY patch, not the WITH patch. The WITH patch, as far as I can tell, is in the public domain. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
Markus, Thank you for your reply. We considered embedding of an XML schema first followed by data. We decided to stick to our current data format to make sure stateless XML parsers can process it as well. Would it be better to add an option to the COPY command, to allow embedding an XML schema, so more advanced XML parsers can take advantage of it? Thanks, Jason, Sergey > -----Original Message----- > From: Markus Bertheau [mailto:twanger@bluetwanger.de] > Sent: Tuesday, May 17, 2005 6:00 PM > To: Sergey Ten > Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql- > hackers@postgresql.org; jason@sourcelabs.com > Subject: Re: [HACKERS] patches for items from TODO list > > Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a): > > > <?xml version='1.0'?> > > <table> > > <row> > > <col name='col1' null='n'>Jackson, Sam</col> > > <col name='col2' null='n'>\h</col> > > </row> > > <row> > > <col name='col1' null='n'>It is "perfect".</col> > > <col name='col2' null='n'> </col> > > </row> > > <row> > > <col name='col1' null='n'></col> > > <col name='col2' null='y'></col> > > </row> > > </table> > > Why didn't you do something to the effect of > > <?xml version='1.0'?> > <table> > <cols> > <col name='col1'/> > <col name='col2'/> > </cols> > <row> > <col null='n'>Jackson, Sam</col> > <col null='n'>\h</col> > </row> > <row> > <col null='n'>It is "perfect".</col> > <col null='n'> </col> > </row> > <row> > <col null='n'></col> > <col null='y'></col> > </row> > </table> > > This avoids repeating the column names in every row, which don't change > over the rows anyway. By reducing redundant information it also makes > structurally invalid XML less likely (whether that is relevant depends > on what people do with the XML data). > > Also you could encode the XML output as UTF-8, which would make the > files more readable for humans if the text data is not ASCII. > > Markus > > -- > Markus Bertheau <twanger@bluetwanger.de>
Neil, We think that putting it in the backend will make access from other components easier. Thank you, Sergey > -----Original Message----- > From: Neil Conway [mailto:neilc@samurai.com] > Sent: Tuesday, May 17, 2005 7:11 PM > To: Sergey Ten > Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql- > hackers@postgresql.org; jason@sourcelabs.com > Subject: Re: [HACKERS] patches for items from TODO list > > Sergey Ten wrote: > > After a careful consideration we decided to > > - put XML implementation in the backend > > What advantage does putting the XML output mode in the backend provide? > > -Neil
Sergey Ten wrote: > We think that putting it in the backend will make access from other > components easier. In what way? It seems to me that this can be done just as easily in a client application / library, without cluttering the backend with yet another COPY output format. It would also avoid the need to mandate a single XML schema -- different clients will likely have different requirements. Since I am skeptical of the value of this feature in the first place, I think it would do less damage if implemented outside the backend. -Neil
Sergey Ten wrote: > Markus, > > Thank you for your reply. > We considered embedding of an XML schema first followed by data. We decided > to stick to our current data format to make sure stateless XML parsers can > process it as well. Would it be better to add an option to the COPY command, > to allow embedding an XML schema, so more advanced XML parsers can take > advantage of it? Current CVS has a WITH CSV HEADER option. I wonder if we should add a HEADER option to XML to output the schema --- seems to make sense. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Neil Conway wrote: > Sergey Ten wrote: > > We think that putting it in the backend will make access from other > > components easier. > > In what way? > > It seems to me that this can be done just as easily in a client > application / library, without cluttering the backend with yet another > COPY output format. It would also avoid the need to mandate a single XML > schema -- different clients will likely have different requirements. > Since I am skeptical of the value of this feature in the first place, I > think it would do less damage if implemented outside the backend. We considered putting XML in psql or libpq in the past, but the problem is that interfaces like jdbc couldn't take advantage of it. I do think it needs to be in the backend, and I think that is the agreement we had in the past. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
I've been reviewing this patch and some of the following discussion. First, postgresql patches are usually sent as context diffs. I don't object to unidiffs myself, but you should do what everybody else does. Second, it's best not to combine features in one patch. The \x escape piece should be broken out. I'm also a rather worried about COPY producing output which it can't itself parse. We can read our own binary, text and CSV formats, and I think that's a useful validation tool. I know the TODO item only mentions output, but I believe we should rethink that. In any case, if it's valid for us to hand XML to other programs why shouldn't we accept it too. This is all about playing nicely in the playground. One advantage of XML is that, being hierarchical, it can easily express nested composites (records, arrays) in a way that our present text and CSV formats really can't. But unless I missed something this patch doesn't in fact do anything to break out nested composites. Finally, I don't know if there is a standard on this, or even a convention. What do other DBs do? I'm not keen on us just inventing our own XML dialect for something that should after all be most useful in data exchange. Bottom line, much as I would like to see XML input/output, I think this needs lots more thought and discussion. cheers andrew Sergey Ten wrote: >Hello all, > >Thank you to all who replied for suggestions and help. Enclosed please find >code changes for the following items: >- Allow COPY to understand \x as a hex byte, and >- Add XML output to COPY >The changes include implementation of the features as well as modification >of the copy regression test. > >After a careful consideration we decided to >- put XML implementation in the backend and >- use XML format described below, with justification of our decision. > >The XML schema used by the COPY TO command was designed for ease of use and >to avoid the problem of column names appearing in XML element names. >XML doesn't allow spaces and punctuation in element names but Postgres does >allow these characters in column names; therefore, a direct mapping would be >problematic. > >The solution selected places the column names into attribute fields where >any special characters they contain can be properly escaped using XML >entities. An additional attribute is used to distinguish null fields from >empty ones. > >The example below is taken from the test suite. It demonstrates some basic >XML escaping in row 2. Row 3 demonstrates the difference between an empty >string (in col2) and a null string (in col3). If a field is null it will >always be empty but a field which is empty may or may not be null. >Always check the value of the 'null' attribute to be sure when a field is >truly null. > ><?xml version='1.0'?> ><table> > <row> > <col name='col1' null='n'>Jackson, Sam</col> > <col name='col2' null='n'>\h</col> > </row> > <row> > <col name='col1' null='n'>It is "perfect".</col> > <col name='col2' null='n'> </col> > </row> > <row> > <col name='col1' null='n'></col> > <col name='col2' null='y'></col> > </row> ></table> > >Please let us know if about any concerns, objections the proposed change may >cause. > >Best regards, >Jason Lucas, Sergey Ten >SourceLabs > > > >>-----Original Message----- >>From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] >>Sent: Wednesday, May 11, 2005 7:11 PM >>To: Sergey Ten >>Cc: pgsql-hackers@postgresql.org; jason@sourcelabs.com >>Subject: Re: [HACKERS] patches for items from TODO list >> >>Sergey Ten wrote: >> >> >>>Hello all, >>> >>>We would like to contribute to the Postgresql community by implementing >>>the following items from the TODO list >>>(http://developer.postgresql.org/todo.php): >>>. Allow COPY to understand \x as a hex byte . Allow COPY to optionally >>>include column headings in the first line . Add XML output to COPY >>> >>>The changes are straightforward and include implementation of the >>>features as well as modification of the regression tests and >>> >>> >>documentation. >> >> >>>Before sending a diff file with the changes, we would like to know if >>>these features have been already implemented. >>> >>> >>Please check the web site version. Someone has already implemented >>"Allow COPY to optionally include column headings in the first line". >> >>As far as XML, there has been discussion on where that should be done? >>In the backend, libpq, or psql. It will need discussion on hackers. I >>assume you have read the developer's FAQ too. >> >>-- >> Bruce Momjian | http://candle.pha.pa.us >> pgman@candle.pha.pa.us | (610) 359-1001 >> + If your life is a hard drive, | 13 Roberts Road >> + Christ can be your backup. | Newtown Square, Pennsylvania >>19073 >> >> >>------------------------------------------------------------------------ >> >>Index: src/backend/commands/copy.c >>=================================================================== >>RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v >>retrieving revision 1.244 >>diff -u -r1.244 copy.c >>--- src/backend/commands/copy.c 7 May 2005 02:22:46 -0000 1.244 >>+++ src/backend/commands/copy.c 13 May 2005 22:21:00 -0000 >>@@ -84,6 +84,16 @@ >> EOL_CRNL >> } EolType; >> >>+/* >>+ * Represents the format of the file to be read or written >>+ */ >>+typedef enum CopyFmt >>+{ >>+ FMT_TXT, >>+ FMT_BIN, >>+ FMT_CSV, >>+ FMT_XML >>+} CopyFmt; >> >> static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; >> >>@@ -129,14 +139,14 @@ >> static bool line_buf_converted; >> >> /* non-export function prototypes */ >>-static void DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, >>+static void DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, >> char *escape, List *force_quote_atts, bool header_line, bool fe_copy); >>-static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, char *escape, >>+static void CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, char *escape, >> List *force_quote_atts, bool header_line); >>-static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, char *escape, >>+static void CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, char *escape, >> List *force_notnull_atts, bool header_line); >> static bool CopyReadLine(char * quote, char * escape); >> static char *CopyReadAttribute(const char *delim, const char *null_print, >>@@ -171,6 +181,11 @@ >> static void CopySendInt16(int16 val); >> static int16 CopyGetInt16(void); >> >>+static int GetDecimalFromHex(char hex); >>+ >>+static void CopyAttributeOutXML (char *colname, char *string); >>+static void CopySendStringXML(char *string); >>+static char *CopyGetXMLEntity(char c, char *buf); >> >> /* >> * Send copy start/stop messages for frontend copies. These have changed >>@@ -692,10 +707,9 @@ >> List *attnamelist = stmt->attlist; >> List *attnumlist; >> bool fe_copy = false; >>- bool binary = false; >> bool oids = false; >>- bool csv_mode = false; >>- bool header_line = false; >>+ bool header_line = false; >>+ CopyFmt fmt = FMT_TXT; >> char *delim = NULL; >> char *quote = NULL; >> char *escape = NULL; >>@@ -715,11 +729,11 @@ >> >> if (strcmp(defel->defname, "binary") == 0) >> { >>- if (binary) >>+ if (fmt != FMT_TXT) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >> errmsg("conflicting or redundant options"))); >>- binary = intVal(defel->arg); >>+ fmt = FMT_BIN; >> } >> else if (strcmp(defel->defname, "oids") == 0) >> { >>@@ -747,11 +761,19 @@ >> } >> else if (strcmp(defel->defname, "csv") == 0) >> { >>- if (csv_mode) >>+ if (fmt != FMT_TXT) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >> errmsg("conflicting or redundant options"))); >>- csv_mode = intVal(defel->arg); >>+ fmt = FMT_CSV; >>+ } >>+ else if (strcmp(defel->defname, "xml") == 0) >>+ { >>+ if (fmt != FMT_TXT) >>+ ereport(ERROR, >>+ (errcode(ERRCODE_SYNTAX_ERROR), >>+ errmsg("conflicting or redundant options"))); >>+ fmt = FMT_XML; >> } >> else if (strcmp(defel->defname, "header") == 0) >> { >>@@ -798,29 +820,39 @@ >> defel->defname); >> } >> >>- if (binary && delim) >>+ if (fmt == FMT_BIN && delim) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >> errmsg("cannot specify DELIMITER in BINARY mode"))); >> >>- if (binary && csv_mode) >>+ if (fmt == FMT_BIN && null_print) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >>- errmsg("cannot specify CSV in BINARY mode"))); >>+ errmsg("cannot specify NULL in BINARY mode"))); >> >>- if (binary && null_print) >>+ if (fmt == FMT_XML && is_from) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >>- errmsg("cannot specify NULL in BINARY mode"))); >>+ errmsg("XML mode is not available in COPY FROM"))); >>+ >>+ if (fmt == FMT_XML && delim) >>+ ereport(ERROR, >>+ (errcode(ERRCODE_SYNTAX_ERROR), >>+ errmsg("cannot specify DELIMITER in XML mode"))); >>+ >>+ if (fmt == FMT_XML && null_print) >>+ ereport(ERROR, >>+ (errcode(ERRCODE_SYNTAX_ERROR), >>+ errmsg("cannot specify NULL in XML mode"))); >> >> /* Set defaults */ >> if (!delim) >>- delim = csv_mode ? "," : "\t"; >>+ delim = (fmt == FMT_CSV) ? "," : "\t"; >> >> if (!null_print) >>- null_print = csv_mode ? "" : "\\N"; >>+ null_print = (fmt == FMT_CSV) ? "" : "\\N"; >> >>- if (csv_mode) >>+ if (fmt == FMT_CSV) >> { >> if (!quote) >> quote = "\""; >>@@ -835,35 +867,35 @@ >> errmsg("COPY delimiter must be a single character"))); >> >> /* Check header */ >>- if (!csv_mode && header_line) >>+ if (fmt != FMT_CSV && header_line) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY HEADER available only in CSV mode"))); >> >> /* Check quote */ >>- if (!csv_mode && quote != NULL) >>+ if (fmt != FMT_CSV && quote != NULL) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY quote available only in CSV mode"))); >> >>- if (csv_mode && strlen(quote) != 1) >>+ if (fmt == FMT_CSV && strlen(quote) != 1) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY quote must be a single character"))); >> >> /* Check escape */ >>- if (!csv_mode && escape != NULL) >>+ if (fmt != FMT_CSV && escape != NULL) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY escape available only in CSV mode"))); >> >>- if (csv_mode && strlen(escape) != 1) >>+ if (fmt == FMT_CSV && strlen(escape) != 1) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY escape must be a single character"))); >> >> /* Check force_quote */ >>- if (!csv_mode && force_quote != NIL) >>+ if (fmt != FMT_CSV && force_quote != NIL) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY force quote available only in CSV mode"))); >>@@ -873,7 +905,7 @@ >> errmsg("COPY force quote only available using COPY TO"))); >> >> /* Check force_notnull */ >>- if (!csv_mode && force_notnull != NIL) >>+ if (fmt != FMT_CSV && force_notnull != NIL) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("COPY force not null available only in CSV mode"))); >>@@ -889,7 +921,7 @@ >> errmsg("COPY delimiter must not appear in the NULL specification"))); >> >> /* Don't allow the csv quote char to appear in the null string. */ >>- if (csv_mode && strchr(null_print, quote[0]) != NULL) >>+ if (fmt == FMT_CSV && strchr(null_print, quote[0]) != NULL) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("CSV quote character must not appear in the NULL specification"))); >>@@ -1004,7 +1036,7 @@ >> if (pipe) >> { >> if (whereToSendOutput == Remote) >>- ReceiveCopyBegin(binary, list_length(attnumlist)); >>+ ReceiveCopyBegin(fmt == FMT_BIN, list_length(attnumlist)); >> else >> copy_file = stdin; >> } >>@@ -1029,7 +1061,7 @@ >> errmsg("\"%s\" is a directory", filename))); >> } >> } >>- CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode, >>+ CopyFrom(rel, attnumlist, fmt, oids, delim, null_print, >> quote, escape, force_notnull_atts, header_line); >> } >> else >>@@ -1093,7 +1125,7 @@ >> } >> } >> >>- DoCopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode, >>+ DoCopyTo(rel, attnumlist, fmt, oids, delim, null_print, >> quote, escape, force_quote_atts, header_line, fe_copy); >> } >> >>@@ -1124,20 +1156,20 @@ >> * so we don't need to plaster a lot of variables with "volatile". >> */ >> static void >>-DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, >>+DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, >> char *escape, List *force_quote_atts, bool header_line, bool fe_copy) >> { >> PG_TRY(); >> { >> if (fe_copy) >>- SendCopyBegin(binary, list_length(attnumlist)); >>+ SendCopyBegin(fmt == FMT_BIN, list_length(attnumlist)); >> >>- CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode, >>+ CopyTo(rel, attnumlist, fmt, oids, delim, null_print, >> quote, escape, force_quote_atts, header_line); >> >> if (fe_copy) >>- SendCopyEnd(binary); >>+ SendCopyEnd(fmt == FMT_BIN); >> } >> PG_CATCH(); >> { >>@@ -1156,8 +1188,8 @@ >> * Copy from relation TO file. >> */ >> static void >>-CopyTo(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, >>+CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, >> char *escape, List *force_quote_atts, bool header_line) >> { >> HeapTuple tuple; >>@@ -1187,7 +1219,7 @@ >> Oid out_func_oid; >> bool isvarlena; >> >>- if (binary) >>+ if (fmt == FMT_BIN) >> getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid, >> &out_func_oid, >> &isvarlena); >>@@ -1215,7 +1247,7 @@ >> ALLOCSET_DEFAULT_INITSIZE, >> ALLOCSET_DEFAULT_MAXSIZE); >> >>- if (binary) >>+ if (fmt == FMT_BIN) >> { >> /* Generate header for a binary copy */ >> int32 tmp; >>@@ -1233,6 +1265,14 @@ >> } >> else >> { >>+ if (fmt == FMT_XML) >>+ { >>+ CopySendString("<?xml version='1.0'?>"); >>+ CopySendEndOfRow(false); >>+ CopySendString("<table>"); >>+ CopySendEndOfRow(false); >>+ } >>+ >> /* >> * For non-binary copy, we need to convert null_print to client >> * encoding, because it will be sent directly with CopySendString. >>@@ -1262,7 +1302,7 @@ >> strcmp(colname, null_print) == 0); >> } >> >>- CopySendEndOfRow(binary); >>+ CopySendEndOfRow(fmt == FMT_BIN); >> >> } >> } >>@@ -1278,7 +1318,7 @@ >> MemoryContextReset(mycontext); >> oldcontext = MemoryContextSwitchTo(mycontext); >> >>- if (binary) >>+ if (fmt == FMT_BIN) >> { >> /* Binary per-tuple header */ >> CopySendInt16(attr_count); >>@@ -1294,25 +1334,34 @@ >> } >> else >> { >>+ if (fmt == FMT_XML) >>+ CopySendString("<row>"); >>+ >> /* Text format has no per-tuple header, but send OID if wanted */ >> if (oids) >> { >> string = DatumGetCString(DirectFunctionCall1(oidout, >> ObjectIdGetDatum(HeapTupleGetOid(tuple)))); >>- CopySendString(string); >>+ >>+ if (fmt == FMT_XML) >>+ CopyAttributeOutXML("oid", string); >>+ else >>+ CopySendString(string); >>+ >> need_delim = true; >> } >> } >> >> foreach(cur, attnumlist) >> { >>- int attnum = lfirst_int(cur); >>+ int attnum = lfirst_int(cur); >> Datum value; >> bool isnull; >>+ char *colname = NameStr(attr[attnum - 1]->attname); >> >> value = heap_getattr(tuple, attnum, tupDesc, &isnull); >> >>- if (!binary) >>+ if (fmt == FMT_TXT || fmt == FMT_CSV) >> { >> if (need_delim) >> CopySendChar(delim[0]); >>@@ -1321,53 +1370,71 @@ >> >> if (isnull) >> { >>- if (!binary) >>- CopySendString(null_print); /* null indicator */ >>- else >>- CopySendInt32(-1); /* null marker */ >>+ switch (fmt) >>+ { >>+ case FMT_BIN: >>+ CopySendInt32(-1); /* null marker */ >>+ break; >>+ case FMT_XML: >>+ CopyAttributeOutXML(colname, NULL); /* null entity */ >>+ break; >>+ default: >>+ CopySendString(null_print); /* null indicator */ >>+ break; >>+ } >>+ >> } >> else >> { >>- if (!binary) >>+ if (fmt == FMT_BIN) >> { >>- string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1], >>- value)); >>- if (csv_mode) >>- { >>- CopyAttributeOutCSV(string, delim, quote, escape, >>- (strcmp(string, null_print) == 0 || >>- force_quote[attnum - 1])); >>- } >>- else >>- CopyAttributeOut(string, delim); >>- >>+ bytea *outputbytes = >>+ DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1], value)); >>+ /* We assume the result will not have been toasted */ >>+ CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ); >>+ CopySendData(VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ); >> } >> else >> { >>- bytea *outputbytes; >>- >>- outputbytes = DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1], >>- value)); >>- /* We assume the result will not have been toasted */ >>- CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ); >>- CopySendData(VARDATA(outputbytes), >>- VARSIZE(outputbytes) - VARHDRSZ); >>+ string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1], value)); >>+ switch (fmt) >>+ { >>+ case FMT_CSV: >>+ CopyAttributeOutCSV(string, delim, quote, escape, >>+ (strcmp(string, null_print) == 0 >>+ || force_quote[attnum - 1])); >>+ break; >>+ case FMT_XML: >>+ CopyAttributeOutXML(colname, string); >>+ break; >>+ default: >>+ CopyAttributeOut(string, delim); >>+ break; >>+ } >> } >> } >> } >> >>- CopySendEndOfRow(binary); >>+ if (fmt == FMT_XML) >>+ CopySendString("</row>"); >>+ >>+ CopySendEndOfRow(fmt == FMT_BIN); >> >> MemoryContextSwitchTo(oldcontext); >> } >> >> heap_endscan(scandesc); >> >>- if (binary) >>+ if (fmt == FMT_BIN) >> { >> /* Generate trailer for a binary copy */ >> CopySendInt16(-1); >> } >>+ else if (fmt == FMT_XML) >>+ { >>+ CopySendString("</table>"); >>+ CopySendEndOfRow(false); >>+ } >> >> MemoryContextDelete(mycontext); >> >>@@ -1464,8 +1531,8 @@ >> * Copy FROM file to relation. >> */ >> static void >>-CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids, >>- char *delim, char *null_print, bool csv_mode, char *quote, >>+CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids, >>+ char *delim, char *null_print, char *quote, >> char *escape, List *force_notnull_atts, bool header_line) >> { >> HeapTuple tuple; >>@@ -1549,7 +1616,7 @@ >> continue; >> >> /* Fetch the input function and typioparam info */ >>- if (binary) >>+ if (fmt == FMT_BIN) >> getTypeBinaryInputInfo(attr[attnum - 1]->atttypid, >> &in_func_oid, &typioparams[attnum - 1]); >> else >>@@ -1620,7 +1687,7 @@ >> */ >> ExecBSInsertTriggers(estate, resultRelInfo); >> >>- if (!binary) >>+ if (fmt != FMT_BIN) >> file_has_oids = oids; /* must rely on user to tell us this... */ >> else >> { >>@@ -1663,7 +1730,7 @@ >> } >> } >> >>- if (file_has_oids && binary) >>+ if (file_has_oids && fmt == FMT_BIN) >> { >> getTypeBinaryInputInfo(OIDOID, >> &in_func_oid, &oid_typioparam); >>@@ -1681,7 +1748,7 @@ >> /* Initialize static variables */ >> fe_eof = false; >> eol_type = EOL_UNKNOWN; >>- copy_binary = binary; >>+ copy_binary = (fmt == FMT_BIN); >> copy_relname = RelationGetRelationName(rel); >> copy_lineno = 0; >> copy_attname = NULL; >>@@ -1718,14 +1785,14 @@ >> MemSet(values, 0, num_phys_attrs * sizeof(Datum)); >> MemSet(nulls, 'n', num_phys_attrs * sizeof(char)); >> >>- if (!binary) >>+ if (fmt != FMT_BIN) >> { >> CopyReadResult result = NORMAL_ATTR; >> char *string; >> ListCell *cur; >> >> /* Actually read the line into memory here */ >>- done = csv_mode ? >>+ done = (fmt == FMT_CSV) ? >> CopyReadLine(quote, escape) : CopyReadLine(NULL, NULL); >> >> /* >>@@ -1776,7 +1843,7 @@ >> errmsg("missing data for column \"%s\"", >> NameStr(attr[m]->attname)))); >> >>- if (csv_mode) >>+ if (fmt == FMT_CSV) >> { >> string = CopyReadAttributeCSV(delim, null_print, quote, >> escape, &result, &isnull); >>@@ -1789,7 +1856,7 @@ >> string = CopyReadAttribute(delim, null_print, >> &result, &isnull); >> >>- if (csv_mode && isnull && force_notnull[m]) >>+ if (fmt == FMT_CSV && isnull && force_notnull[m]) >> { >> string = null_print; /* set to NULL string */ >> isnull = false; >>@@ -2275,6 +2342,27 @@ >> } >> >> /*---------- >>+ * Returns decimal value for a hexadecimal digit. >>+*---------- >>+ */ >>+static int GetDecimalFromHex(char hex) >>+{ >>+ if (isdigit(hex)) >>+ { >>+ // If it is a digit >>+ return hex - '0'; >>+ } >>+ if (hex < 'a') >>+ { >>+ return hex - 'A' + 10; >>+ } >>+ else >>+ { >>+ return hex - 'a' + 10; >>+ } >>+} >>+ >>+/*---------- >> * Read the value of a single attribute, performing de-escaping as needed. >> * >> * delim is the column delimiter string (must be just one byte for now). >>@@ -2378,6 +2466,29 @@ >> case 'v': >> c = '\v'; >> break; >>+ case 'x': >>+ case 'X': >>+ if (line_buf.cursor < line_buf.len) >>+ { >>+ char hexchar = line_buf.data[line_buf.cursor]; >>+ if (isxdigit(hexchar)) >>+ { >>+ int val = GetDecimalFromHex(hexchar); >>+ line_buf.cursor++; >>+ if (line_buf.cursor < line_buf.len) >>+ { >>+ hexchar = line_buf.data[line_buf.cursor]; >>+ if (isxdigit(hexchar)) >>+ { >>+ line_buf.cursor++; >>+ val = (val << 4) + GetDecimalFromHex(hexchar); >>+ } >>+ } >>+ >>+ c = val & 0xff; >>+ } >>+ } >>+ break; >> >> /* >> * in all other cases, take the char after '\' >>@@ -2760,3 +2871,84 @@ >> >> return attnums; >> } >>+ >>+/* >>+ * Send XML representation of one attribute, with element tagging, null >>+ * marking, and entity escaping. >>+ */ >>+ >>+static void >>+CopyAttributeOutXML (char *colname, char *string) >>+{ >>+ CopySendString("<col name='"); >>+ CopySendStringXML(colname); >>+ CopySendString("' null='"); >>+ CopySendChar((string == NULL) ? 'y' : 'n'); >>+ CopySendString("'>"); >>+ >>+ if (string != NULL) >>+ CopySendStringXML(string); >>+ >>+ CopySendString("</col>"); >>+} >>+ >>+/* >>+ * Sends a string with entity escaping. >>+ */ >>+ >>+static void >>+CopySendStringXML (char *string) >>+{ >>+ char *csr; >>+ for (csr = string; *csr; ++csr) >>+ { >>+ char buf[10]; >>+ char *entity = CopyGetXMLEntity(*csr, buf); >>+ if (entity) >>+ CopySendString(entity); >>+ else >>+ CopySendChar(*csr); >>+ } >>+} >>+ >>+/* >>+ * Locates or creates an XML entity for the given character. >>+ * If that character doesn't require an entity, then the >>+ * function returns NULL. >>+ */ >>+ >>+static char * >>+CopyGetXMLEntity (char c, char *buf) >>+{ >>+ char *entity; >>+ >>+ switch (c) >>+ { >>+ case '<': >>+ entity = "<"; >>+ break; >>+ case '>': >>+ entity = ">"; >>+ break; >>+ case '&': >>+ entity = "&"; >>+ break; >>+ case '\'': >>+ entity = "'"; >>+ break; >>+ case '"': >>+ entity = """; >>+ break; >>+ default: >>+ if (!isgraph(c) && c != ' ') >>+ { >>+ sprintf(buf, "%02x;", (unsigned char)c); >>+ entity = buf; >>+ } >>+ else >>+ entity = NULL; >>+ break; >>+ } >>+ >>+ return entity; >>+} >>Index: src/backend/parser/gram.y >>=================================================================== >>RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v >>retrieving revision 2.491 >>diff -u -r2.491 gram.y >>--- src/backend/parser/gram.y 7 May 2005 02:22:46 -0000 2.491 >>+++ src/backend/parser/gram.y 13 May 2005 22:21:01 -0000 >>@@ -413,6 +413,8 @@ >> >> WHEN WHERE WITH WITHOUT WORK WRITE >> >>+ XML >>+ >> YEAR_P >> >> ZONE >>@@ -1448,6 +1450,10 @@ >> { >> $$ = makeDefElem("header", (Node *)makeInteger(TRUE)); >> } >>+ | XML >>+ { >>+ $$ = makeDefElem("xml", (Node *)makeInteger(TRUE)); >>+ } >> | QUOTE opt_as Sconst >> { >> $$ = makeDefElem("quote", (Node *)makeString($3)); >>Index: src/backend/parser/keywords.c >>=================================================================== >>RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v >>retrieving revision 1.155 >>diff -u -r1.155 keywords.c >>--- src/backend/parser/keywords.c 7 May 2005 02:22:47 -0000 1.155 >>+++ src/backend/parser/keywords.c 13 May 2005 22:21:01 -0000 >>@@ -342,6 +342,7 @@ >> {"without", WITHOUT}, >> {"work", WORK}, >> {"write", WRITE}, >>+ {"xml", XML}, >> {"year", YEAR_P}, >> {"zone", ZONE}, >> }; >>Index: src/test/regress/expected/copy2.out >>=================================================================== >>RCS file: /projects/cvsroot/pgsql/src/test/regress/expected/copy2.out,v >>retrieving revision 1.21 >>diff -u -r1.21 copy2.out >>--- src/test/regress/expected/copy2.out 13 May 2005 06:33:40 -0000 1.21 >>+++ src/test/regress/expected/copy2.out 13 May 2005 22:21:01 -0000 >>@@ -194,6 +194,28 @@ >> --test that we read consecutive LFs properly >> CREATE TEMP TABLE testnl (a int, b text, c int); >> COPY testnl FROM stdin CSV; >>-DROP TABLE x, y; >>+CREATE TABLE z ( >>+ col1 text, >>+ col2 text >>+); >>+COPY z from stdin; >>+COPY z TO stdout; >>+Jackson, Sam \\h >>+ABC \\\\\t >>+It is "perfect". \t >>+ NULL >>+COPY z TO stdout WITH CSV; >>+"Jackson, Sam",\h >>+ABC,\\ >>+"It is ""perfect"".", >>+"",NULL >>+COPY y TO stdout WITH XML; >>+<?xml version='1.0'?> >>+<table> >>+<row><col name='col1' null='n'>Jackson, Sam</col><col name='col2' null='n'>\h</col></row> >>+<row><col name='col1' null='n'>It is "perfect".</col><col name='col2' null='n'> </col></row> >>+<row><col name='col1' null='n'></col><col name='col2' null='y'></col></row> >>+</table> >>+DROP TABLE x, y, z; >> DROP FUNCTION fn_x_before(); >> DROP FUNCTION fn_x_after(); >>Index: src/test/regress/sql/copy2.sql >>=================================================================== >>RCS file: /projects/cvsroot/pgsql/src/test/regress/sql/copy2.sql,v >>retrieving revision 1.12 >>diff -u -r1.12 copy2.sql >>--- src/test/regress/sql/copy2.sql 13 May 2005 06:33:40 -0000 1.12 >>+++ src/test/regress/sql/copy2.sql 13 May 2005 22:21:01 -0000 >>@@ -139,7 +139,22 @@ >> inside",2 >> \. >> >>+CREATE TABLE z ( >>+ col1 text, >>+ col2 text >>+); >> >>-DROP TABLE x, y; >>+COPY z from stdin; >>+Jackson, Sam \\h >>+\x41\x42\x43\xa0\x1 \x5c\x5c\x9 >>+It is "perfect". \t >>+ NULL >>+\. >>+ >>+COPY z TO stdout; >>+COPY z TO stdout WITH CSV; >>+COPY y TO stdout WITH XML; >>+ >>+DROP TABLE x, y, z; >> DROP FUNCTION fn_x_before(); >> DROP FUNCTION fn_x_after(); >> >> >>------------------------------------------------------------------------ >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 6: Have you searched our list archives? >> >> http://archives.postgresql.org >> >>
Bruce Momjian wrote: > We considered putting XML in psql or libpq in the past, but the problem > is that interfaces like jdbc couldn't take advantage of it. Well, you could implement it as a C UDF and use SPI. Or write it as a C client library, and use JNI. Or just provide a Java implementation -- it's not like the COPY -> XML transformation is very complex. To restate the case: - I don't see how this feature is useful. Perhaps I'm mistaken, but I don't think there's a lot of user demand for it (feel free to demonstrate the contrary) - The COPY -> XML transformation is trivial -- it would be easy for clients to roll their own. At the same time, there is no standard or canonical XML representation for COPY output, and I can easily imagine different clients needing different representations. So there is limited value in providing a single, inflexible backend implementation. - There's no need for it to be in the backend, anyway. Perhaps if there were overwhelming demand for this functionality, we would need to provide it for all client libraries and the easiest solution would be to put it in the backend, but I don't think that's the case. If people really think XML COPY output mode is useful, why not implement it client-side first and host it on pgfoundry? If lots of people are using the client-side code, we can consider moving it into the core distribution or the backend itself at that point. -Neil
Neil Conway wrote: > Bruce Momjian wrote: > > We considered putting XML in psql or libpq in the past, but the problem > > is that interfaces like jdbc couldn't take advantage of it. > > Well, you could implement it as a C UDF and use SPI. Or write it as a C > client library, and use JNI. Or just provide a Java implementation -- > it's not like the COPY -> XML transformation is very complex. > > To restate the case: > > - I don't see how this feature is useful. Perhaps I'm mistaken, but I > don't think there's a lot of user demand for it (feel free to > demonstrate the contrary) > > - The COPY -> XML transformation is trivial -- it would be easy for > clients to roll their own. At the same time, there is no standard or > canonical XML representation for COPY output, and I can easily imagine > different clients needing different representations. So there is limited > value in providing a single, inflexible backend implementation. > > - There's no need for it to be in the backend, anyway. Perhaps if there > were overwhelming demand for this functionality, we would need to > provide it for all client libraries and the easiest solution would be to > put it in the backend, but I don't think that's the case. > > If people really think XML COPY output mode is useful, why not implement > it client-side first and host it on pgfoundry? If lots of people are > using the client-side code, we can consider moving it into the core > distribution or the backend itself at that point. All I can say is that we rejected an XML in the client patch a long time ago and the discussion was that it belongs in the backend so everyone can use it. I don't use XML myself so I have no opinion. We need people who need XML output to comment in this thread. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Folks, > - The COPY -> XML transformation is trivial -- it would be easy for > clients to roll their own. At the same time, there is no standard or > canonical XML representation for COPY output, and I can easily imagine > different clients needing different representations. So there is limited > value in providing a single, inflexible backend implementation. I'm going to second Neil here. This feature becomes useful *only* when there is a certified or de-facto universal standard XML representation for database data. Then I could see a case for it. But there isn't. Feel free to throw it on pgFoundry, though. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > I'm going to second Neil here. I think the same --- given the points about lack of standardization, it seems premature to put this into the backend. I'd be for it if there were a clear standard, but ... regards, tom lane
> I'm going to second Neil here. This feature becomes useful *only* when there > is a certified or de-facto universal standard XML representation for database > data. Then I could see a case for it. But there isn't. We've done it in phpPgAdmin (we made up our own standard), and a couple of people use it. I also do not think that it should be in the backend until there is a standard. Here is what phpPgAdmin produces (note NULL handling): <?xml version="1.0" encoding="UTF-8" ?> <data><header> <column name="feature_id" type="varchar" /> <column name="feature_name" type="varchar" /> <columnname="is_supported" type="varchar" /> <column name="is_verified_by" type="varchar" /> <column name="comments"type="varchar" /></header><records> <row> <column name="feature_id">PKG000</column> <columnname="feature_name">Core</column> <column name="is_supported">NO</column> <column name="is_verified_by"null="null"></column> <column name="comments" null="null"></column> </row> <row> <column name="feature_id">PKG001</column> <column name="feature_name">Enhanced datetime facilities</column> <column name="is_supported">YES</column> <column name="is_verified_by" null="null"></column> <column name="comments"null="null"></column> </row> <row> <column name="feature_id">PKG002</column> <column name="feature_name">Enhancedintegrity management</column> <column name="is_supported">NO</column> <column name="is_verified_by"null="null"></column> <column name="comments" null="null"></column> </row> <row> <column name="feature_id">PKG003</column> <column name="feature_name">OLAP facilities</column> <column name="is_supported">NO</column> <column name="is_verified_by" null="null"></column> <column name="comments"null="null"></column> </row> <row> <column name="feature_id">PKG004</column> <column name="feature_name">PSM</column> <column name="is_supported">NO</column> <column name="is_verified_by" null="null"></column> <column name="comments">PL/pgSQL is similar.</column> </row> <row> <column name="feature_id">PKG005</column> <column name="feature_name">CLI</column> <column name="is_supported">NO</column> <column name="is_verified_by" null="null"></column> <column name="comments">ODBCis similar.</column> </row> <row> <column name="feature_id">PKG006</column> <columnname="feature_name">Basic object support</column> <column name="is_supported">NO</column> <column name="is_verified_by"null="null"></column> <column name="comments" null="null"></column> </row> <row> <column name="feature_id">PKG007</column> <column name="feature_name">Enhanced object support</column> <columnname="is_supported">NO</column> <column name="is_verified_by" null="null"></column> <column name="comments"null="null"></column> </row> <row> <column name="feature_id">PKG008</column> <column name="feature_name">Activedatabase</column> <column name="is_supported">NO</column> <column name="is_verified_by"null="null"></column> <column name="comments" null="null"></column> </row> <row> <column name="feature_id">PKG010</column> <column name="feature_name">OLAP</column> <column name="is_supported">NO</column> <column name="is_verified_by" null="null"></column> <column name="comments">NO</column> </row></records> </data>
minor nit: the null attribute should take XMLSchema boolean type values: {true, false, 1, 0} More importantly, how do you handle array or record type fields? If they are just opaque text I don't think that's what I at least would want from XML output routines. cheers andrew Christopher Kings-Lynne wrote: >> I'm going to second Neil here. This feature becomes useful *only* >> when there is a certified or de-facto universal standard XML >> representation for database data. Then I could see a case for it. >> But there isn't. > > > We've done it in phpPgAdmin (we made up our own standard), and a > couple of people use it. I also do not think that it should be in the > backend until there is a standard. Here is what phpPgAdmin produces > (note NULL handling): > > <?xml version="1.0" encoding="UTF-8" ?> > <data> > <header> > <column name="feature_id" type="varchar" /> > <column name="feature_name" type="varchar" /> > <column name="is_supported" type="varchar" /> > <column name="is_verified_by" type="varchar" /> > <column name="comments" type="varchar" /> > </header> > <records> > <row> > <column name="feature_id">PKG000</column> > <column name="feature_name">Core</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG001</column> > <column name="feature_name">Enhanced datetime > facilities</column> > <column name="is_supported">YES</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG002</column> > <column name="feature_name">Enhanced integrity > management</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG003</column> > <column name="feature_name">OLAP facilities</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG004</column> > <column name="feature_name">PSM</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments">PL/pgSQL is similar.</column> > </row> > <row> > <column name="feature_id">PKG005</column> > <column name="feature_name">CLI</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments">ODBC is similar.</column> > </row> > <row> > <column name="feature_id">PKG006</column> > <column name="feature_name">Basic object support</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG007</column> > <column name="feature_name">Enhanced object support</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG008</column> > <column name="feature_name">Active database</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments" null="null"></column> > </row> > <row> > <column name="feature_id">PKG010</column> > <column name="feature_name">OLAP</column> > <column name="is_supported">NO</column> > <column name="is_verified_by" null="null"></column> > <column name="comments">NO</column> > </row> > </records> > </data> > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
I have removed the XML TODO item: * Add XML output to pg_dump and COPY We already allow XML to be stored in the database, and XPath queries can be used onthat data using /contrib/xml2. It also supports XSLT transformations. --------------------------------------------------------------------------- Josh Berkus wrote: > Folks, > > > - The COPY -> XML transformation is trivial -- it would be easy for > > clients to roll their own. At the same time, there is no standard or > > canonical XML representation for COPY output, and I can easily imagine > > different clients needing different representations. So there is limited > > value in providing a single, inflexible backend implementation. > > I'm going to second Neil here. This feature becomes useful *only* when there > is a certified or de-facto universal standard XML representation for database > data. Then I could see a case for it. But there isn't. > > Feel free to throw it on pgFoundry, though. > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073