Thread: Ad-hoc table type?

Ad-hoc table type?

From

pgsql@mohawksoft.com

Date:

28 September 2008, 19:35:23

I was in a discussion with someone about the difference between ad-hoc
storage systems and SQL. Yes, I know, I was rolling my eyes as well. One
thing did strike me though was the idea that a table could contain a
variable number of columns.

Something like this:

create adhoc table foo ();

insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

In an "ad-hoc" table type, when an insert is made, and a column is not
found, then a new varchar column is added.

I know the idea has a lot of holes, and is probably a bad idea, but it
answers an important problem of easily mapping programmatic types to a
database.

Anyone think its interesting?

Re: Ad-hoc table type?

From

Tom Lane

Date:

28 September 2008, 19:39:41

pgsql@mohawksoft.com writes:
> Something like this:

> create adhoc table foo ();

> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

> In an "ad-hoc" table type, when an insert is made, and a column is not
> found, then a new varchar column is added.

> I know the idea has a lot of holes, and is probably a bad idea, but it
> answers an important problem of easily mapping programmatic types to a
> database.

Seems like a table with one contrib/hstore column might be more relevant
to this guy's idea of how to do database design.
        regards, tom lane

Re: Ad-hoc table type?

From

pgsql@mohawksoft.com

Date:

28 September 2008, 20:56:32

> pgsql@mohawksoft.com writes:
>> Something like this:
>
>> create adhoc table foo ();
>
>> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>
>> In an "ad-hoc" table type, when an insert is made, and a column is not
>> found, then a new varchar column is added.
>
>> I know the idea has a lot of holes, and is probably a bad idea, but it
>> answers an important problem of easily mapping programmatic types to a
>> database.
>
> Seems like a table with one contrib/hstore column might be more relevant
> to this guy's idea of how to do database design.
>

That's actually a very cool module, I hadn't seen it before. I've
considered writing something like it, but more XML centric, but I'm not
sure it answers the concept.

I'm not sure if you have dealt with web site sessions and object
persistence crap, but its a pain to get up and running and improving
performance is a drag. Web guys tend to know very little about databases
and tend, sadly, not to be very inquisitive about such things.

Web session and user attribute objects are typically stored in a database
as XML, JSON, or some other aggregated format in a single column (hstore).
That works great for when you just need to access the data by the key, but
if you want to "use" the data outside the web application for something
like OLAP, you have to decide which attributes reside in the aggregate
column or get promoted to a full fledged column. That's why you'll see
tables with username, passwdhash, email, etc. in addition to an aggregated
column of things like screen template, age, etc.

So, how do you have a table of a generally arbitrary number of columns
without creating some sort of aggregate column?  With an aggregate column,
the data isn't on the same level as real column data, so you need to parse
the aggregate to extract a value, and you have to do that for each value.
On top of that, you then have to explain your aggregate strategy to the
web guys.

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works "better" and more naturally than some external
aggregate system built on a column. I know it is a little "outside the
box" thinking, what do you think?

Re: Ad-hoc table type?

From

Tom Lane

Date:

28 September 2008, 21:46:38

pgsql@mohawksoft.com writes:
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works "better" and more naturally than some external
> aggregate system built on a column. I know it is a little "outside the
> box" thinking, what do you think?

I'm failing to see the point.  Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo?

* What datatype should it have?  ("Always varchar" is just lame.)

* Should it have an index?  If so, should it be unique?

* If you keep doing this, you'll soon find yourself reading out
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit.

If you've expended enough thought to be sure that the column is not just
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from "member of an hstore column" to
"real database column" is pretty painful, but I don't see that "allow
columns to spring into existence" solves that in any meaningful way.
Is there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore->'mycol', so at least one of the reasons why you should *need*
to switch to a "real" database column seems bogus.
        regards, tom lane

Re: Ad-hoc table type?

From

Mark Mielke

Date:

28 September 2008, 22:13:03

Not that I'm agreeing with the direction but just as a thinking experiment:<br /><br /> Tom Lane wrote: <blockquote
cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap=""><a class="moz-txt-link-abbreviated"
href="mailto:pgsql@mohawksoft.com">pgsql@mohawksoft.com</a>writes: </pre><blockquote type="cite"><pre wrap="">Being
ableto insert arbitrary named values, and extracting them
 
similarly, IMHO works "better" and more naturally than some external
aggregate system built on a column. I know it is a little "outside the
box" thinking, what do you think?   </pre></blockquote><pre wrap="">
I'm failing to see the point.  Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo? </pre></blockquote><br /> If it's a field in a data structure from a language
suchas Java, it's not a typo.<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us" type="cite"><pre
wrap="">*What datatype should it have?  ("Always varchar" is just lame.) </pre></blockquote><br /> SQLite uses "always
varchar"and it doesn't seem to be a problem. For simpler numbers like "0", the text form can be more compact, and the
databasemay be portable across different hardware architectures.<br /><br /><blockquote
cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap="">* Should it have an index?  If so, should it be
unique?</pre></blockquote><br /> It might be cool for indexes to automatically appear as they become beneficial (and
removedas they become problematic). Unique is a constraint which should be considered separate from whether it should
bean index or not. I don't know if it would be useful or not.<br /><br /><blockquote
cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap="">* If you keep doing this, you'll soon find yourself
readingout
 
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit. </pre></blockquote><br /> Introduce variable field-order for
tuples?Only provide values if non-null? :-)<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us"
type="cite"><prewrap="">If you've expended enough thought to be sure that the column is not just
 
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from "member of an hstore column" to
"real database column" is pretty painful, but I don't see that "allow
columns to spring into existence" solves that in any meaningful way.
Is there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore->'mycol', so at least one of the reasons why you should *need*
to switch to a "real" database column seems bogus. </pre></blockquote><br /> I find the Oracle nested table and data
structuresupport enticing although I do not have experience with it. It seems like it might be a more mature
implementationof hstore? If hstore had everything that was required in terms of performance or flexibility, we wouldn't
needfixed columns at all?<br /><br /> But yes - I tend to agree that the object persistent layer can be hidden away
behindsomething like the Java object persistence model, automatically doing alter table or providing a configured
mappingfrom a description file. This isn't a problem that needs to be solved at the database layer.<br /><br />
Cheers,<br/> mark<br /><br /><pre class="moz-signature" cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

Re: Ad-hoc table type?

From

pgsql@mohawksoft.com

Date:

28 September 2008, 22:54:07

> pgsql@mohawksoft.com writes:
>> Being able to insert arbitrary named values, and extracting them
>> similarly, IMHO works "better" and more naturally than some external
>> aggregate system built on a column. I know it is a little "outside the
>> box" thinking, what do you think?
>
> I'm failing to see the point.  Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

In an automated system like PHP, Java, etc. that's not too likely.

>
> * What datatype should it have?  ("Always varchar" is just lame.)

varchar or text is not "just lame," SQLite used to do that exclusively.
One could argue that XML is nothing more than text.

>
> * Should it have an index?  If so, should it be unique?

The answer to that is, well, no, not unless the dba generates one or it is
declared. Just like any other column. All the rules that apply to "create
table" and "alter table add column" just apply naturally as would be
expected.

create adhoc table userdata(username varchar, email varchar, primary
key(email));

>
> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Well, I fully understand that it is not a general purpose "unlimited"
width sort of thing. In a programing environment, the target environment
for this type of feature, it is unlikely to be a run-away problem.

>
> If you've expended enough thought to be sure that the column is not just
> a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
> command to tell the database the results of your genius.

Like I said, if you've never dealt with a live web site, maintained by a
team of "web dudes," working furiously to keep their job and get paid,
your only hope to keep up with "Oh! I needed to add the 'time to live' of
the session into the session data" is to use an aggregate storage system.

>
> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that "allow
> columns to spring into existence" solves that in any meaningful way.
> Is there some other way we could address such conversions?

Every other solution creates a second tier of data storage. You either
deal with data elements at the table level, or you create a "roll your
own" aggregate mechanism, or make a HUGE table of "user,name,value" table
and force a join and index scan for every select.  (A million users, 5-10
attributes each is an expensive join.)
>
> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a "real" database column seems bogus.

Oh, yea, function indexes work great. I think you did that right?

For what its worth, I don't expect you to jump all over this. It really is
a divergence from classic SQL design. I'm not even sure I like it. In
fact, I don't like it, but the argument that you are being forced to
create a second class data storage mechanism or a relational join for data
that is logically in a single relation does cause one to ponder the
problem.

Re: Ad-hoc table type?

From

"David E. Wheeler"

Date:

29 September 2008, 01:33:51

On Sep 28, 2008, at 17:46, Tom Lane wrote:

> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a "real" database column seems bogus.

The docs say:
  <title>Indexes</title>
  <para>   <type>hstore</> has index support for <literal>@></> and  
<literal>?</>   operators.  You can use either GiST or GIN index types.  For  
example:  </para>  <programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);

CREATE INDEX hidx ON testhstore USING GIN(h);  </programlisting>

I'm not sure what that means. Can you create normal btree or hash  
indexes on hstore columns? And is the index useful for both `@>` and `? 
`?

Thanks,

David

Re: Ad-hoc table type?

From

Oleg Bartunov

Date:

29 September 2008, 02:04:49

What you're talking about is a document based database like
StrokeDB, CouchDB. With hstore you don't need to parse content of 
'aggregate' column, it provides necessary methods. Also, we tried 
to speedup selects using indexes. Probably, we need to refresh our 
interest to hstore, do you have any actual proposals ?

Oleg

On Sun, 28 Sep 2008, pgsql@mohawksoft.com wrote:

>> pgsql@mohawksoft.com writes:
>>> Something like this:
>>
>>> create adhoc table foo ();
>>
>>> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>>
>>> In an "ad-hoc" table type, when an insert is made, and a column is not
>>> found, then a new varchar column is added.
>>
>>> I know the idea has a lot of holes, and is probably a bad idea, but it
>>> answers an important problem of easily mapping programmatic types to a
>>> database.
>>
>> Seems like a table with one contrib/hstore column might be more relevant
>> to this guy's idea of how to do database design.
>>
>
> That's actually a very cool module, I hadn't seen it before. I've
> considered writing something like it, but more XML centric, but I'm not
> sure it answers the concept.
>
> I'm not sure if you have dealt with web site sessions and object
> persistence crap, but its a pain to get up and running and improving
> performance is a drag. Web guys tend to know very little about databases
> and tend, sadly, not to be very inquisitive about such things.
>
> Web session and user attribute objects are typically stored in a database
> as XML, JSON, or some other aggregated format in a single column (hstore).
> That works great for when you just need to access the data by the key, but
> if you want to "use" the data outside the web application for something
> like OLAP, you have to decide which attributes reside in the aggregate
> column or get promoted to a full fledged column. That's why you'll see
> tables with username, passwdhash, email, etc. in addition to an aggregated
> column of things like screen template, age, etc.
>
> So, how do you have a table of a generally arbitrary number of columns
> without creating some sort of aggregate column?  With an aggregate column,
> the data isn't on the same level as real column data, so you need to parse
> the aggregate to extract a value, and you have to do that for each value.
> On top of that, you then have to explain your aggregate strategy to the
> web guys.
>
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works "better" and more naturally than some external
> aggregate system built on a column. I know it is a little "outside the
> box" thinking, what do you think?
>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Ad-hoc table type?

From

tomas@tuxteam.de

Date:

29 September 2008, 03:35:14

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, Sep 28, 2008 at 09:24:48PM -0700, David E. Wheeler wrote:
> On Sep 28, 2008, at 17:46, Tom Lane wrote:
>
>> BTW, I think it is (or should be) possible to create an index on
>> hstore->'mycol', so at least one of the reasons why you should *need*
>> to switch to a "real" database column seems bogus.

[...]

> I'm not sure what that means. Can you create normal btree or hash indexes 
> on hstore columns? And is the index useful for both `@>` and `?`?

That means that those operations are supported by a GiST (or GIN) index,
that is:
 "find the records where col contains 'foo => 1, bar => 2'"

is supported by the index. Likewise for "is contained in" and "has key".
It's a bit like having mini-indexes on all keys (although I guess not
that efficient). Pretty cool, I'd say.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD4DBQFI4HnHBcgs9XrR2kYRAgmiAJ0U9UD8KqX5vLXOGBlW+WwPzzIpEQCY1caS
F4Uug9QD6e0Jw18EvNm28g==
=f8q5
-----END PGP SIGNATURE-----

Re: Ad-hoc table type?

From

pgsql@mohawksoft.com

Date:

29 September 2008, 08:42:55

> What you're talking about is a document based database like
> StrokeDB, CouchDB. With hstore you don't need to parse content of
> 'aggregate' column, it provides necessary methods. Also, we tried
> to speedup selects using indexes. Probably, we need to refresh our
> interest to hstore, do you have any actual proposals ?

Proposals, not at this point. I'm trying to decide (a) if I have the time
and (b) do I do it with Postgres or SQLite. The hstore module, as I said,
looks really cool, I've contemplated something like it. I have a module
provides a set of accessors for an XML text column that works similarly,
but it parses the XML on each access and the application has to create the
XML. (I have XML creation modules for Java, PHP, C++, and standard C
bindings.)

It is more a conflict of data ideology, IMHO. There is a class of data
that is logically on the same level as other data, but is forced into a
secondary storage methodology. It isn't a pressing need as there are work
arounds, but don't you think a cleaner interface make sense? Also, what is
the overhead for the secondary storage mechanism? I think it would make
the life of application developers easier.


>
> Oleg
>
> On Sun, 28 Sep 2008, pgsql@mohawksoft.com wrote:
>
>>> pgsql@mohawksoft.com writes:
>>>> Something like this:
>>>
>>>> create adhoc table foo ();
>>>
>>>> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>>>
>>>> In an "ad-hoc" table type, when an insert is made, and a column is not
>>>> found, then a new varchar column is added.
>>>
>>>> I know the idea has a lot of holes, and is probably a bad idea, but it
>>>> answers an important problem of easily mapping programmatic types to a
>>>> database.
>>>
>>> Seems like a table with one contrib/hstore column might be more
>>> relevant
>>> to this guy's idea of how to do database design.
>>>
>>
>> That's actually a very cool module, I hadn't seen it before. I've
>> considered writing something like it, but more XML centric, but I'm not
>> sure it answers the concept.
>>
>> I'm not sure if you have dealt with web site sessions and object
>> persistence crap, but its a pain to get up and running and improving
>> performance is a drag. Web guys tend to know very little about databases
>> and tend, sadly, not to be very inquisitive about such things.
>>
>> Web session and user attribute objects are typically stored in a
>> database
>> as XML, JSON, or some other aggregated format in a single column
>> (hstore).
>> That works great for when you just need to access the data by the key,
>> but
>> if you want to "use" the data outside the web application for something
>> like OLAP, you have to decide which attributes reside in the aggregate
>> column or get promoted to a full fledged column. That's why you'll see
>> tables with username, passwdhash, email, etc. in addition to an
>> aggregated
>> column of things like screen template, age, etc.
>>
>> So, how do you have a table of a generally arbitrary number of columns
>> without creating some sort of aggregate column?  With an aggregate
>> column,
>> the data isn't on the same level as real column data, so you need to
>> parse
>> the aggregate to extract a value, and you have to do that for each
>> value.
>> On top of that, you then have to explain your aggregate strategy to the
>> web guys.
>>
>> Being able to insert arbitrary named values, and extracting them
>> similarly, IMHO works "better" and more naturally than some external
>> aggregate system built on a column. I know it is a little "outside the
>> box" thinking, what do you think?
>>
>>
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Ad-hoc table type?

From

"David E. Wheeler"

Date:

29 September 2008, 13:08:15

On Sep 28, 2008, at 23:46, tomas@tuxteam.de wrote:

>> I'm not sure what that means. Can you create normal btree or hash
>> indexes
>> on hstore columns? And is the index useful for both `@>` and `?`?
>
> That means that those operations are supported by a GiST (or GIN)
> index,
> that is:
>
>  "find the records where col contains 'foo => 1, bar => 2'"
>
> is supported by the index. Likewise for "is contained in" and "has
> key".
> It's a bit like having mini-indexes on all keys (although I guess not
> that efficient). Pretty cool, I'd say.

Yeah, that does sound good. I look forward to having an excuse for
playing with this type…

Best,

David

Re: Ad-hoc table type?

From

Decibel!

Date:

06 October 2008, 15:51:43

On Sep 29, 2008, at 6:16 AM, pgsql@mohawksoft.com wrote:
> The hstore module, as I said,
> looks really cool, I've contemplated something like it. I have a  
> module
> provides a set of accessors for an XML text column that works  
> similarly,
> but it parses the XML on each access and the application has to  
> create the
> XML. (I have XML creation modules for Java, PHP, C++, and standard C
> bindings.)

Yeah, "ad-hoc" storage is always a huge problem in databases. For  
years the only way to do it was with EAV, which is tricky at best.

In my experience, there typically isn't an un-bounded set of possible  
attribute names. It's usually fairly constrained, but the problem is  
that you never know when a new one will just pop up.

It's very common right now for people to use either XML or YAML to  
deal with this. That has it's own set of problems.

There's a few major improvements to be had here:

1: We should have a flexible storage mechanism that can either be  
used with it's own native syntax, or can interface to other hash  
formats such XML or YAML. Of course, both XML and YAML allow an  
obscene amount of nesting, etc, but generally people are only using  
these in a very simple form to emulate a hash table. It would be  
interesting to allow casting hstore to and from other proprietary  
hash formats as well, such as perl hashes.

2: Storage of attribute names can quickly become *very* expensive.  
Even with short 6-10 character names, you can easily end up using  
half the storage for just attribute names. I'd like to see hstore  
support storing attribute names in a lookup table, or using some  
other means to reduce the storage overhead.

3: Related to #2, storing numbers stinks because you end up burning 1  
byte per digit. Some concept of data type for an attribute would  
improve this.

Sadly, I don't have time to work on any of this. But these things are  
issues to my company, and we do have money. ;)
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828