Thread: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Jim Michaels
Date:
what do you think about foreign data wrappers getting CSV file table I/O?
- I had thought that CSVQL db could be implemented completely with
small amount of memory and file I/O, line at a time. EOL detection
would be needed. It can be CR, LF, or CR+LF. Sometimes beginners get it
backwards (LF+CR), but it's still detectable because it's an immediate
sequence detectable by a state machine or while loop. It should be
written as CR+LF because of standards compliance X2J4579 - that means
go look for it (I have not found it yet).
while (!feof(...) && (ch == '\r' || ch == '\n')) {
    do {
        if (1 != fread(&ch, 1, 1, ...)) {
            ch = 0;  // EOF or read error: stop
            break;
        }
        if (ch != '\r' && ch != '\n') {
            break;  // process non-EOL character
        }
        // at this point, it's an EOL character
    } while (true);
}
or
while (!feof(...) && ch == '\t') {
    if (1 != fread(&ch, 1, 1, ...)) {  // read 1 character
        ch = 0;  // EOF or read error: stop instead of looping on a stale tab
        break;
    }
}
The file could be read in chunks using a buffer. The last chunk would
need to be handled as a special case since, if it exists, it is smaller
than the full buffer size. Try % (modulo).

- the microsoft patented CSV would be required for implementation. it
handles special data with commas and double-quotes in them
- tab-separated I/O would be nice as well.
- you could see >2,000,000 rows, so don't limit it. try GCC's
fopen64/fsetpos,fgetpos/fpos_t/fclose, other vendors use plain fopen
and that works with >=64-bit file sizes. and it's fast. avoid 32-bit
fseek/ftell.
- indexing could be file/RAM array of uint64_t-like file pointers
convertible to fpos_t if needed.
- biggest needed feature is an easier-to-use ALTER TABLE RENAME. a
memorable alternative/alias would be simply RENAME COLUMN columnName
TO newColumnName.
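
A minimal sketch in C of the line-at-a-time ideas in the list above: chunked reads, EOL detection by a small state machine, and an index of 64-bit line offsets. It is an illustration under stated assumptions, not code from PostgreSQL or any foreign data wrapper. The names `build_line_index` and `push_offset` are hypothetical, the 4096-byte buffer is arbitrary, and LF+CR is treated as a single terminator, so a file that mixes conventions may be split ambiguously.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Append one offset to a growable array. Returns 0 on success, -1 on
 * allocation failure (the existing array is left untouched). */
static int push_offset(uint64_t **idx, size_t *n, size_t *cap, uint64_t off)
{
    if (*n == *cap) {
        size_t new_cap = *cap ? *cap * 2 : 64;
        uint64_t *tmp = realloc(*idx, new_cap * sizeof **idx);
        if (tmp == NULL)
            return -1;
        *idx = tmp;
        *cap = new_cap;
    }
    (*idx)[(*n)++] = off;
    return 0;
}

/* Hypothetical helper: read fp in chunks and record the byte offset at
 * which each line starts. CR, LF, CR+LF and LF+CR each end one line.
 * On success returns 0 and stores a malloc'd array in *out (NULL when the
 * file is empty) and its length in *count; the caller frees it.
 * Returns -1 on allocation failure. */
int build_line_index(FILE *fp, uint64_t **out, size_t *count)
{
    unsigned char buf[4096];
    uint64_t *idx = NULL;
    size_t n = 0, cap = 0, got, i;
    uint64_t pos = 0;         /* offset of the byte being examined */
    uint64_t line_start = 0;  /* offset where the current line begins */
    int pending = 0;          /* last EOL byte still waiting for a partner */

    /* fread returns fewer bytes for the final chunk, so it needs no
     * special handling beyond looping over `got` bytes. */
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < got; i++, pos++) {
            int c = buf[i];
            if (c == '\r' || c == '\n') {
                if (pending != 0 && pending != c) {
                    /* second half of CR+LF or LF+CR */
                    pending = 0;
                    line_start = pos + 1;
                } else {
                    /* a new terminator ends the current line */
                    if (push_offset(&idx, &n, &cap, line_start) != 0) {
                        free(idx);
                        return -1;
                    }
                    line_start = pos + 1;
                    pending = c;
                }
            } else {
                pending = 0;
            }
        }
    }
    /* a final line without a terminator still counts */
    if (pos > line_start && push_offset(&idx, &n, &cap, line_start) != 0) {
        free(idx);
        return -1;
    }
    *out = idx;
    *count = n;
    return 0;
}
```

The short final chunk needs no special case here because `fread` reports how many bytes it actually read. Seeking back to a stored offset beyond 2 GiB would still need a 64-bit-capable call (for example POSIX `fseeko`), which this sketch does not show.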

-- 
======
Jim Michaels <jmichae35@gmail.com>


Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
"David G. Johnston"
Date:
On the whole this email is very confusing/hard-to-follow...

On Wed, May 2, 2018 at 2:29 PM, Jim Michaels <jmichae35@gmail.com> wrote:
> what do you think about foreign data wrappers getting CSV file table I/O?

I don't understand the question...

> I had thought that CSVQL db could be implemented completely with
> small amount of memory and file I/O, line at a time

Do you no longer think that then?

I don't see PostgreSQL being all that open to implementing a second query language besides SQL - and doing it functionally seems unlikely unless, like with JSON, you are storing entire CSV files in a table field.

> - the microsoft patented CSV would be required for implementation. it
> handles special data with commas and double-quotes in them

If true, this seems like a show-stopper to anything PostgreSQL would implement.

> - biggest needed feature is an easier-to-use ALTER TABLE RENAME. a
> memorable alternative/alias would be simply RENAME COLUMN columnName
> TO newColumnName.

I don't see us adding new syntax for this...

David J.

Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Ron
Date:
On 05/02/2018 04:49 PM, David G. Johnston wrote:
[snip]
>> - the microsoft patented CSV would be required for implementation. it
>> handles special data with commas and double-quotes in them
>
> If true this seems like a show-stopper to anything PostgreSQL would implement

If MSFT really holds a patent on the CSV format, then PostgreSQL is already in a world of hurt.

--
Angular momentum makes the world go 'round.

Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
John McKown
Date:
On Wed, May 2, 2018 at 4:29 PM, Jim Michaels <jmichae35@gmail.com> wrote:
> what do you think about foreign data wrappers getting CSV file table I/O?
> - I had thought that CSVQL db could be implemented completely with
> <snip>

I don't know what you want to do with this. SQLite already supports it. SQLite is embedded SQL database software. To a great extent, the SQL that it understands is similar to what PostgreSQL understands. It is one of the "standards" that the author uses. It implements "virtual tables" via extensions. One of these is to process CSV files. ref: http://sqlite.org/csv.html The CSV it understands is RFC 4180 format (https://www.ietf.org/rfc/rfc4180.txt). SQLite is open source and is _PUBLIC DOMAIN_. That is, it has NO copyright at all. So you can do anything with it that you want to. You might even be able to use the source to the SQLite extension ( https://www.sqlite.org/src/artifact?ci=trunk&filename=ext/misc/csv.c ) to write the PostgreSQL foreign data wrapper.

So, if you just need something so that you can read a CSV using SQL, then you might consider using SQLite instead of, or in addition to, PostgreSQL. Or, if you need to do subselects, unions, or other things containing both PostgreSQL data & CSV data, then I guess you're stuck with the foreign data wrapper.
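
For reference, the quoting rules that RFC 4180 describes (a field wrapped in double quotes may contain commas, and a literal double quote inside it is written as two) can be sketched in a few lines of C. This is an illustration only, not code taken from SQLite's csv.c. The name `csv_split` is hypothetical, and the sketch assumes one record per NUL-terminated buffer with no line breaks inside quoted fields.

```c
#include <stddef.h>

/* Hypothetical helper: split one CSV record in place, RFC 4180 style.
 * The buffer is modified: fields are unescaped and NUL-terminated, and
 * fields[i] points at the start of field i inside the buffer.
 * Returns the number of fields stored (at most max_fields). */
size_t csv_split(char *line, char **fields, size_t max_fields)
{
    char *src = line;  /* read cursor */
    char *dst = line;  /* write cursor, never ahead of src */
    size_t n = 0;

    while (n < max_fields) {
        fields[n++] = dst;
        if (*src == '"') {
            src++;  /* skip the opening quote */
            while (*src != '\0') {
                if (*src == '"') {
                    if (src[1] == '"') {
                        *dst++ = '"';  /* "" stands for one literal quote */
                        src += 2;
                    } else {
                        src++;         /* closing quote */
                        break;
                    }
                } else {
                    *dst++ = *src++;   /* commas are plain data here */
                }
            }
        }
        /* copy unquoted data up to the next separator */
        while (*src != '\0' && *src != ',')
            *dst++ = *src++;
        if (*src == ',') {
            src++;
            *dst++ = '\0';  /* end this field; another one follows */
        } else {
            *dst = '\0';    /* end of record */
            break;
        }
    }
    return n;
}
```

Given the record `1,"Smith, John","say ""hi"""`, this yields three fields: `1`, `Smith, John`, and `say "hi"`. Splitting in place avoids allocation, at the cost of destroying the original line.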

 

--
We all have skeletons in our closet.
Mine are so old, they have osteoporosis.

Maranatha! <><
John McKown

Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Adrian Klaver
Date:
On 05/02/2018 02:29 PM, Jim Michaels wrote:

> 
> - the microsoft patented CSV would be required for implementation. it
> handles special data with commas and double-quotes in them

Huh?:
https://en.wikipedia.org/wiki/Comma-separated_values#History


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
raf@raf.org
Date:
Ron wrote:

> On 05/02/2018 04:49 PM, David G. Johnston wrote:
> [snip]
> > 
> >     - the microsoft patented CSV would be required for implementation. it
> >     handles special data with commas and double-quotes in them
> > 
> > 
> > If true this seems like a show-stopper to anything PostgreSQL would implement
> 
> If MSFT really holds a patent on the CSV format, then Postgresql is already
> in a world of hurt.

Even if the CSV format was patented, don't patents only last 17 years?
Has anyone found the patent? When was it granted? I would guess in the 1980s.
And can anyone remember MS ever demanding a fee for using it?



Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
George Neuner
Date:
On Wed, 2 May 2018 16:01:01 -0700, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

>On 05/02/2018 02:29 PM, Jim Michaels wrote:
>
>> 
>> - the microsoft patented CSV would be required for implementation. it
>> handles special data with commas and double-quotes in them
>
>Huh?:
>https://en.wikipedia.org/wiki/Comma-separated_values#History


Disclaimer ... I haven't investigated the claim.

However, I would not discount the possibility that Microsoft really
has patented some variation of CSV.  They absolutely did *try* to
copyright the use of + and - symbols for specifying addition and
subtraction operations in VisualBASIC.

It's possible that they slipped something past the examiners.  But
more likely the use of a CSV-like format was specified to be part of a
larger process.  In that case the format itself might not be claimed,
but rather only the *use* of the format for some specific purpose.

IANAL,
George



Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Adrian Klaver
Date:
On 05/03/2018 09:47 AM, George Neuner wrote:
> On Wed, 2 May 2018 16:01:01 -0700, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
> 
>> On 05/02/2018 02:29 PM, Jim Michaels wrote:
>>
>>>
>>> - the microsoft patented CSV would be required for implementation. it
>>> handles special data with commas and double-quotes in them
>>
>> Huh?:
>> https://en.wikipedia.org/wiki/Comma-separated_values#History
> 
> 
> Disclaimer ... I haven't investigated the claim.

Difficult because it is made up of dreams, wishes and nightmares:)

> 
> However, I would not discount the possibility that Microsoft really
> has patented some variation of CSV.  They absolutely did *try* to
> copyright the use of + and - symbols for specifying addition and
> subtraction operations in VisualBASIC.

Not seeing it:


http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT


> 
> It's possible that they slipped something past the examiners.  But
> more likely the use of a CSV-like format was specified to be part of a
> larger process.  In that case the format itself might not be claimed,
> but rather only the *use* of the format for some specific purpose.
> 
> IANAL,
> George
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
George Neuner
Date:
On Thu, 3 May 2018 11:02:00 -0700, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

>On 05/03/2018 09:47 AM, George Neuner wrote:
>> 
>> ..., I would not discount the possibility that Microsoft really
>> has patented some variation of CSV.  They absolutely did *try* to
>> copyright the use of + and - symbols for specifying addition and
>> subtraction operations in VisualBASIC.
>
>Not seeing it:
>

>http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT


That's the patent database.  Microsoft tried to get a *copyright*.  I
don't recall whether it was granted [I don't believe it was], and this
would have been circa ~1990, so it's hard to search for in any case.
Unlike the patent database, the copyright database does not contain
the protected material - it only gives archival references to it.

It generated quite a bit of negative press coverage at the time.  The
basis of Microsoft's argument was that "x + y" was a unique and
protectable expression of the addition concept because it could have
been done in other ways, e.g., by "add(x,y)".


George



Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Ken Tanzer
Date:
On Fri, May 4, 2018 at 1:03 PM, George Neuner <gneuner2@comcast.net> wrote:
> On Thu, 3 May 2018 11:02:00 -0700, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
> >On 05/03/2018 09:47 AM, George Neuner wrote:
> >>
> >> ..., I would not discount the possibility that Microsoft really
> >> has patented some variation of CSV.  They absolutely did *try* to
> >> copyright the use of + and - symbols for specifying addition and
> >> subtraction operations in VisualBASIC.
> >
> >Not seeing it:
> >
> >http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT
>
>
> That's the patent database.  Microsoft tried to get a *copyright*.  I
> don't recall whether it was granted [I don't believe it was], and this
> would have been circa ~1990, so it's hard to search for in any case.
> Unlike the patent database, the copyright database does not contain
> the protected material - it only gives archival references to it.
>
> It generated quite a bit of negative press coverage at the time.  The
> basis of Microsoft's argument was that "x + y" was a unique and
> protectable expression of the addition concept because it could have
> been done in other ways, e.g., by "add(x,y)".



I don't think in general you can copyright a file format.  You can copyright things you create, and you can try to keep secret the information about how they work.  People can't steal your code to create CSV files, but you can't tell people they can't string a bunch of values together with commas in between if they can figure out how to do so all by themselves.  Plus it's hard to see how "fair use" wouldn't protect something as short as "x+y", or ",".

FWIW, Wikipedia includes CSV in its list of open formats.  The article linked below also says no, although it seems UK-based, not U.S.

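Cheers,
Ken

https://en.wikipedia.org/wiki/List_of_open_formats
http://www.blplaw.com/expert-legal-insights/articles/copyright-protect-data-file-formats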






--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
George Neuner
Date:
On Sun, 6 May 2018 15:26:22 -0700, Ken Tanzer <ken.tanzer@gmail.com>
wrote:

>On Fri, May 4, 2018 at 1:03 PM, George Neuner <gneuner2@comcast.net> wrote:
>
>> On Thu, 3 May 2018 11:02:00 -0700, Adrian Klaver
>> <adrian.klaver@aklaver.com> wrote:
>>
>> >On 05/03/2018 09:47 AM, George Neuner wrote:
>> >>
>> >> ..., I would not discount the possibility that Microsoft really
>> >> has patented some variation of CSV.  They absolutely did *try* to
>> >> copyright the use of + and - symbols for specifying addition and
>> >> subtraction operations in VisualBASIC.
>> >
>> >Not seeing it:
>> >
>> >http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=microsoft&FIELD1=AANM&co1=AND&TERM2=csv&FIELD2=&d=PTXT
>>
>>
>> That's the patent database.  Microsoft tried to get a *copyright*.  I
>> don't recall whether it was granted [I don't believe it was], and this
>> would have been circa ~1990, so it's hard to search for in any case.
>> Unlike the patent database, the copyright database does not contain
>> the protected material - it only gives archival references to it.
>>
>> It generated quite a bit of negative press coverage at the time.  The
>> basis of Microsoft's argument was that "x + y" was a unique and
>> protectable expression of the addition concept because it could have
>> been done in other ways, e.g., by "add(x,y)".
>>
>>
>>
>I don't think in general you can copyright a file format. 

CSV is deliberately public domain, but I'm not at all certain that it
could not be copyrighted otherwise.  And I _am_ certain that a more
restricted derivative could be copyrighted. [more below]


You can copyright damn near anything.  Under the Berne convention, the
simple fact of authorship conveys certain rights even if the "work" is
a verbatim copy of something else.

The only rules - at least in the US - are that your expression of the
idea is not known to be common usage, and that it is neither a
verbatim copy, nor a derivative of a previously *registered* work.  

There are 3 kinds of copyrights: registered, explicit, and implicit.
The US recognizes registered and explicit copyrights, but protection
is available only for registered copyrights.
The US does not follow the Berne convention wrt implicit or explicit
copyrights.  Explicit copyrights carry no legal weight in the US, but
they *may* influence the court in case of a dispute because they do
represent a claim of the work.  The US does not recognize implicit
copyrights at all.


Checking that something is a verbatim copy of an existing work is not
that easy ... every registered work is archived in the Library of
Congress (LoC), but much of the LoC *still* is not electronically
searchable.  Determining whether something is or is not derivative of
something else is not even within the US Copyright Office's mandate
... they leave that to the courts.




>You can copyright things you create, 

Including data formats ... 

>and you can try to keep secret the information about how they work.  

Trade secret has nothing whatsoever to do with IP law.  Secrets convey
no legal protection, and a secret may still be an infringement on a
protected, publicly available work.

It might be hard to figure out that a secret is infringing ...


>People can't steal your code to create CSV files, but
>you can't tell people they can't string a bunch of values together with
>commas in between if they can figure out how to do so all by themselves.

>Plus it's hard to see how "fair use" wouldn't protect something as short as
>"x+y", or ",".

"Fair Use" has no effect on whether the work is copyrightable.  It
applies only after the fact as a possible defense against
infringement.

It doesn't even apply here.  Fair Use is not a general exception to
infringement under the law - it is in fact specifically limited to
educational and historical use.

Fair Use does not prescribe any minimum length "sampling" of the work.
It describes that there is a maximum length "sampling" that should be
permitted to be copied verbatim into a new work.  But, even there, the
allowable sample size is not fixed under the law: it is ad hoc, per
work, and decided after the fact by a court in the event of a dispute.

Note that some countries do not recognize the concept of Fair Use.


Punctuation alone is not copyrightable by law - the "work" is required
to be intelligible [for some definition - computer binaries can be
copyrighted].  The issue for a legitimate work would be whether it is
either in the public domain, or otherwise is a common usage not
deserving of a copyright.

In many cases, the Copyright Office is not equipped to determine
either status.  There is a database of registered copyrights, but the
"material" that was copyrighted is NOT in that database.  There is
just an LoC reference to it.  There is no database of public domain
works. Nor is there a database of failed applications.  There would be
archived examiner notes on failed applications, but who knows if they
are searchable?


>FWIW, Wikipedia includes CSV in its list of open formats.  The article
>linked below also says no, although it seems UK-based, not U.S.

Yes.  But being an "open" format doesn't prevent anyone from
*patenting* USE OF THE FORMAT in a larger process.

It probably is true that no one could get a copyright on a data format
consisting of an open-ended comma separated value sequence.  Even the
Copyright Office isn't THAT LAME.

But you absolutely can copyright a comma separated sequence of literal
values.  And you probably can copyright a data format using comma
separation in which a limited number of values, drawn from particular
specified domains, are listed in a particular order.

And you absolutely can *patent* use of any data format for a given
purpose [assuming the purpose itself is patentable].


If you really want to know about this stuff, you need to talk to an IP
attorney [not a general one].  IP law is a specialty, and the details
vary considerably by locale.  Don't try to get legal knowledge from
wikipedia.  Or even from me. <grin>


>Cheers,
>Ken
>
>https://en.wikipedia.org/wiki/List_of_open_formats
>http://www.blplaw.com/expert-legal-insights/articles/copyright-protect-data-file-formats

Back at you,
George



Re: CSVQL? CSV SQL? tab-separated table I/O? RENAME COLUMN

From
Ken Tanzer
Date:


On Sun, May 6, 2018 at 10:22 PM, George Neuner <gneuner2@comcast.net> wrote:
 
>>> That's the patent database.  Microsoft tried to get a *copyright*.  I
>>
>> I don't think in general you can copyright a file format.
>
> And you absolutely can *patent* use of any data format for a given
> purpose [assuming the purpose itself is patentable].

Yes, I wasn't addressing patents, just addressing the issue of copyright that you raised.

> You can copyright damn near anything.  Under the Berne convention, the
> simple fact of authorship conveys certain rights even if the "work" is
> a verbatim copy of something else.
>
> The only rules - at least in the US - are that your expression of the
> idea is not known to be common usage, and that it is neither a
> verbatim copy, nor a derivative of a previously *registered* work.


That is patently untrue.  And really you don't have to be a lawyer to form an opinion about this.

From the US Copyright Office:

To be copyrightable, a work must qualify as an original work of authorship, meaning that it must have been created independently and contain a sufficient amount of creativity.

Copyright law expressly excludes copyright protection for “any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied.”

A recipe is a statement of the ingredients and procedure required for making a dish of food. A mere listing of ingredients or contents, or a simple set of directions, is uncopyrightable. 


Cheers,
Ken
