Thread: Ad-hoc table type?
I was in a discussion with someone about the difference between ad-hoc storage systems and SQL. Yes, I know, I was rolling my eyes as well. One thing did strike me though was the idea that a table could contain a variable number of columns. Something like this: create adhoc table foo (); insert into foo (name, rank, serial) values ('joe', 'sargent', '42'); In an "ad-hoc" table type, when an insert is made, and a column is not found, then a new varchar column is added. I know the idea has a lot of holes, and is probably a bad idea, but it answers an important problem of easily mapping programmatic types to a database. Anyone think its interesting?
pgsql@mohawksoft.com writes: > Something like this: > create adhoc table foo (); > insert into foo (name, rank, serial) values ('joe', 'sargent', '42'); > In an "ad-hoc" table type, when an insert is made, and a column is not > found, then a new varchar column is added. > I know the idea has a lot of holes, and is probably a bad idea, but it > answers an important problem of easily mapping programmatic types to a > database. Seems like a table with one contrib/hstore column might be more relevant to this guy's idea of how to do database design. regards, tom lane
> pgsql@mohawksoft.com writes: >> Something like this: > >> create adhoc table foo (); > >> insert into foo (name, rank, serial) values ('joe', 'sargent', '42'); > >> In an "ad-hoc" table type, when an insert is made, and a column is not >> found, then a new varchar column is added. > >> I know the idea has a lot of holes, and is probably a bad idea, but it >> answers an important problem of easily mapping programmatic types to a >> database. > > Seems like a table with one contrib/hstore column might be more relevant > to this guy's idea of how to do database design. > That's actually a very cool module, I hadn't seen it before. I've considered writing something like it, but more XML centric, but I'm not sure it answers the concept. I'm not sure if you have dealt with web site sessions and object persistence crap, but its a pain to get up and running and improving performance is a drag. Web guys tend to know very little about databases and tend, sadly, not to be very inquisitive about such things. Web session and user attribute objects are typically stored in a database as XML, JSON, or some other aggregated format in a single column (hstore). That works great for when you just need to access the data by the key, but if you want to "use" the data outside the web application for something like OLAP, you have to decide which attributes reside in the aggregate column or get promoted to a full fledged column. That's why you'll see tables with username, passwdhash, email, etc. in addition to an aggregated column of things like screen template, age, etc. So, how do you have a table of a generally arbitrary number of columns without creating some sort of aggregate column? With an aggregate column, the data isn't on the same level as real column data, so you need to parse the aggregate to extract a value, and you have to do that for each value. On top of that, you then have to explain your aggregate strategy to the web guys. Being able to insert arbitrary named values, and extracting them similarly, IMHO works "better" and more naturally than some external aggregate system built on a column. I know it is a little "outside the box" thinking, what do you think?
pgsql@mohawksoft.com writes: > Being able to insert arbitrary named values, and extracting them > similarly, IMHO works "better" and more naturally than some external > aggregate system built on a column. I know it is a little "outside the > box" thinking, what do you think? I'm failing to see the point. Allowing columns to spring into existence without any forethought seems to me to be all minuses and no pluses worth mentioning. * What if the column name is just a typo? * What datatype should it have? ("Always varchar" is just lame.) * Should it have an index? If so, should it be unique? * If you keep doing this, you'll soon find yourself reading out unbelievably wide tables (lots of columns), which won't be especially easy or efficient to process on either the backend or the client side. Plus you might run into the max-columns-per-tuple limit. If you've expended enough thought to be sure that the column is not just a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN command to tell the database the results of your genius. I do see the point that switching from "member of an hstore column" to "real database column" is pretty painful, but I don't see that "allow columns to spring into existence" solves that in any meaningful way. Is there some other way we could address such conversions? BTW, I think it is (or should be) possible to create an index on hstore->'mycol', so at least one of the reasons why you should *need* to switch to a "real" database column seems bogus. regards, tom lane
Not that I'm agreeing with the direction but just as a thinking experiment:<br /><br /> Tom Lane wrote: <blockquote cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap=""><a class="moz-txt-link-abbreviated" href="mailto:pgsql@mohawksoft.com">pgsql@mohawksoft.com</a>writes: </pre><blockquote type="cite"><pre wrap="">Being ableto insert arbitrary named values, and extracting them similarly, IMHO works "better" and more naturally than some external aggregate system built on a column. I know it is a little "outside the box" thinking, what do you think? </pre></blockquote><pre wrap=""> I'm failing to see the point. Allowing columns to spring into existence without any forethought seems to me to be all minuses and no pluses worth mentioning. * What if the column name is just a typo? </pre></blockquote><br /> If it's a field in a data structure from a language suchas Java, it's not a typo.<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us" type="cite"><pre wrap="">*What datatype should it have? ("Always varchar" is just lame.) </pre></blockquote><br /> SQLite uses "always varchar"and it doesn't seem to be a problem. For simpler numbers like "0", the text form can be more compact, and the databasemay be portable across different hardware architectures.<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap="">* Should it have an index? If so, should it be unique?</pre></blockquote><br /> It might be cool for indexes to automatically appear as they become beneficial (and removedas they become problematic). Unique is a constraint which should be considered separate from whether it should bean index or not. I don't know if it would be useful or not.<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us"type="cite"><pre wrap="">* If you keep doing this, you'll soon find yourself readingout unbelievably wide tables (lots of columns), which won't be especially easy or efficient to process on either the backend or the client side. Plus you might run into the max-columns-per-tuple limit. </pre></blockquote><br /> Introduce variable field-order for tuples?Only provide values if non-null? :-)<br /><br /><blockquote cite="mid:22562.1222649186@sss.pgh.pa.us" type="cite"><prewrap="">If you've expended enough thought to be sure that the column is not just a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN command to tell the database the results of your genius. I do see the point that switching from "member of an hstore column" to "real database column" is pretty painful, but I don't see that "allow columns to spring into existence" solves that in any meaningful way. Is there some other way we could address such conversions? BTW, I think it is (or should be) possible to create an index on hstore->'mycol', so at least one of the reasons why you should *need* to switch to a "real" database column seems bogus. </pre></blockquote><br /> I find the Oracle nested table and data structuresupport enticing although I do not have experience with it. It seems like it might be a more mature implementationof hstore? If hstore had everything that was required in terms of performance or flexibility, we wouldn't needfixed columns at all?<br /><br /> But yes - I tend to agree that the object persistent layer can be hidden away behindsomething like the Java object persistence model, automatically doing alter table or providing a configured mappingfrom a description file. This isn't a problem that needs to be solved at the database layer.<br /><br /> Cheers,<br/> mark<br /><br /><pre class="moz-signature" cols="72">-- Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a> </pre>
> pgsql@mohawksoft.com writes: >> Being able to insert arbitrary named values, and extracting them >> similarly, IMHO works "better" and more naturally than some external >> aggregate system built on a column. I know it is a little "outside the >> box" thinking, what do you think? > > I'm failing to see the point. Allowing columns to spring into existence > without any forethought seems to me to be all minuses and no pluses > worth mentioning. > > * What if the column name is just a typo? In an automated system like PHP, Java, etc. that's not too likely. > > * What datatype should it have? ("Always varchar" is just lame.) varchar or text is not "just lame," SQLite used to do that exclusively. One could argue that XML is nothing more than text. > > * Should it have an index? If so, should it be unique? The answer to that is, well, no, not unless the dba generates one or it is declared. Just like any other column. All the rules that apply to "create table" and "alter table add column" just apply naturally as would be expected. create adhoc table userdata(username varchar, email varchar, primary key(email)); > > * If you keep doing this, you'll soon find yourself reading out > unbelievably wide tables (lots of columns), which won't be especially > easy or efficient to process on either the backend or the client side. > Plus you might run into the max-columns-per-tuple limit. Well, I fully understand that it is not a general purpose "unlimited" width sort of thing. In a programing environment, the target environment for this type of feature, it is unlikely to be a run-away problem. > > If you've expended enough thought to be sure that the column is not just > a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN > command to tell the database the results of your genius. Like I said, if you've never dealt with a live web site, maintained by a team of "web dudes," working furiously to keep their job and get paid, your only hope to keep up with "Oh! I needed to add the 'time to live' of the session into the session data" is to use an aggregate storage system. > > I do see the point that switching from "member of an hstore column" to > "real database column" is pretty painful, but I don't see that "allow > columns to spring into existence" solves that in any meaningful way. > Is there some other way we could address such conversions? Every other solution creates a second tier of data storage. You either deal with data elements at the table level, or you create a "roll your own" aggregate mechanism, or make a HUGE table of "user,name,value" table and force a join and index scan for every select. (A million users, 5-10 attributes each is an expensive join.) > > BTW, I think it is (or should be) possible to create an index on > hstore->'mycol', so at least one of the reasons why you should *need* > to switch to a "real" database column seems bogus. Oh, yea, function indexes work great. I think you did that right? For what its worth, I don't expect you to jump all over this. It really is a divergence from classic SQL design. I'm not even sure I like it. In fact, I don't like it, but the argument that you are being forced to create a second class data storage mechanism or a relational join for data that is logically in a single relation does cause one to ponder the problem.
On Sep 28, 2008, at 17:46, Tom Lane wrote: > BTW, I think it is (or should be) possible to create an index on > hstore->'mycol', so at least one of the reasons why you should *need* > to switch to a "real" database column seems bogus. The docs say: <title>Indexes</title> <para> <type>hstore</> has index support for <literal>@></> and <literal>?</> operators. You can use either GiST or GIN index types. For example: </para> <programlisting> CREATE INDEX hidx ON testhstore USING GIST(h); CREATE INDEX hidx ON testhstore USING GIN(h); </programlisting> I'm not sure what that means. Can you create normal btree or hash indexes on hstore columns? And is the index useful for both `@>` and `? `? Thanks, David
What you're talking about is a document based database like StrokeDB, CouchDB. With hstore you don't need to parse content of 'aggregate' column, it provides necessary methods. Also, we tried to speedup selects using indexes. Probably, we need to refresh our interest to hstore, do you have any actual proposals ? Oleg On Sun, 28 Sep 2008, pgsql@mohawksoft.com wrote: >> pgsql@mohawksoft.com writes: >>> Something like this: >> >>> create adhoc table foo (); >> >>> insert into foo (name, rank, serial) values ('joe', 'sargent', '42'); >> >>> In an "ad-hoc" table type, when an insert is made, and a column is not >>> found, then a new varchar column is added. >> >>> I know the idea has a lot of holes, and is probably a bad idea, but it >>> answers an important problem of easily mapping programmatic types to a >>> database. >> >> Seems like a table with one contrib/hstore column might be more relevant >> to this guy's idea of how to do database design. >> > > That's actually a very cool module, I hadn't seen it before. I've > considered writing something like it, but more XML centric, but I'm not > sure it answers the concept. > > I'm not sure if you have dealt with web site sessions and object > persistence crap, but its a pain to get up and running and improving > performance is a drag. Web guys tend to know very little about databases > and tend, sadly, not to be very inquisitive about such things. > > Web session and user attribute objects are typically stored in a database > as XML, JSON, or some other aggregated format in a single column (hstore). > That works great for when you just need to access the data by the key, but > if you want to "use" the data outside the web application for something > like OLAP, you have to decide which attributes reside in the aggregate > column or get promoted to a full fledged column. That's why you'll see > tables with username, passwdhash, email, etc. in addition to an aggregated > column of things like screen template, age, etc. > > So, how do you have a table of a generally arbitrary number of columns > without creating some sort of aggregate column? With an aggregate column, > the data isn't on the same level as real column data, so you need to parse > the aggregate to extract a value, and you have to do that for each value. > On top of that, you then have to explain your aggregate strategy to the > web guys. > > Being able to insert arbitrary named values, and extracting them > similarly, IMHO works "better" and more naturally than some external > aggregate system built on a column. I know it is a little "outside the > box" thinking, what do you think? > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sun, Sep 28, 2008 at 09:24:48PM -0700, David E. Wheeler wrote: > On Sep 28, 2008, at 17:46, Tom Lane wrote: > >> BTW, I think it is (or should be) possible to create an index on >> hstore->'mycol', so at least one of the reasons why you should *need* >> to switch to a "real" database column seems bogus. [...] > I'm not sure what that means. Can you create normal btree or hash indexes > on hstore columns? And is the index useful for both `@>` and `?`? That means that those operations are supported by a GiST (or GIN) index, that is: "find the records where col contains 'foo => 1, bar => 2'" is supported by the index. Likewise for "is contained in" and "has key". It's a bit like having mini-indexes on all keys (although I guess not that efficient). Pretty cool, I'd say. Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD4DBQFI4HnHBcgs9XrR2kYRAgmiAJ0U9UD8KqX5vLXOGBlW+WwPzzIpEQCY1caS F4Uug9QD6e0Jw18EvNm28g== =f8q5 -----END PGP SIGNATURE-----
> What you're talking about is a document based database like > StrokeDB, CouchDB. With hstore you don't need to parse content of > 'aggregate' column, it provides necessary methods. Also, we tried > to speedup selects using indexes. Probably, we need to refresh our > interest to hstore, do you have any actual proposals ? Proposals, not at this point. I'm trying to decide (a) if I have the time and (b) do I do it with Postgres or SQLite. The hstore module, as I said, looks really cool, I've contemplated something like it. I have a module provides a set of accessors for an XML text column that works similarly, but it parses the XML on each access and the application has to create the XML. (I have XML creation modules for Java, PHP, C++, and standard C bindings.) It is more a conflict of data ideology, IMHO. There is a class of data that is logically on the same level as other data, but is forced into a secondary storage methodology. It isn't a pressing need as there are work arounds, but don't you think a cleaner interface make sense? Also, what is the overhead for the secondary storage mechanism? I think it would make the life of application developers easier. > > Oleg > > On Sun, 28 Sep 2008, pgsql@mohawksoft.com wrote: > >>> pgsql@mohawksoft.com writes: >>>> Something like this: >>> >>>> create adhoc table foo (); >>> >>>> insert into foo (name, rank, serial) values ('joe', 'sargent', '42'); >>> >>>> In an "ad-hoc" table type, when an insert is made, and a column is not >>>> found, then a new varchar column is added. >>> >>>> I know the idea has a lot of holes, and is probably a bad idea, but it >>>> answers an important problem of easily mapping programmatic types to a >>>> database. >>> >>> Seems like a table with one contrib/hstore column might be more >>> relevant >>> to this guy's idea of how to do database design. >>> >> >> That's actually a very cool module, I hadn't seen it before. I've >> considered writing something like it, but more XML centric, but I'm not >> sure it answers the concept. >> >> I'm not sure if you have dealt with web site sessions and object >> persistence crap, but its a pain to get up and running and improving >> performance is a drag. Web guys tend to know very little about databases >> and tend, sadly, not to be very inquisitive about such things. >> >> Web session and user attribute objects are typically stored in a >> database >> as XML, JSON, or some other aggregated format in a single column >> (hstore). >> That works great for when you just need to access the data by the key, >> but >> if you want to "use" the data outside the web application for something >> like OLAP, you have to decide which attributes reside in the aggregate >> column or get promoted to a full fledged column. That's why you'll see >> tables with username, passwdhash, email, etc. in addition to an >> aggregated >> column of things like screen template, age, etc. >> >> So, how do you have a table of a generally arbitrary number of columns >> without creating some sort of aggregate column? With an aggregate >> column, >> the data isn't on the same level as real column data, so you need to >> parse >> the aggregate to extract a value, and you have to do that for each >> value. >> On top of that, you then have to explain your aggregate strategy to the >> web guys. >> >> Being able to insert arbitrary named values, and extracting them >> similarly, IMHO works "better" and more naturally than some external >> aggregate system built on a column. I know it is a little "outside the >> box" thinking, what do you think? >> >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Sep 28, 2008, at 23:46, tomas@tuxteam.de wrote: >> I'm not sure what that means. Can you create normal btree or hash >> indexes >> on hstore columns? And is the index useful for both `@>` and `?`? > > That means that those operations are supported by a GiST (or GIN) > index, > that is: > > "find the records where col contains 'foo => 1, bar => 2'" > > is supported by the index. Likewise for "is contained in" and "has > key". > It's a bit like having mini-indexes on all keys (although I guess not > that efficient). Pretty cool, I'd say. Yeah, that does sound good. I look forward to having an excuse for playing with this type… Best, David
On Sep 29, 2008, at 6:16 AM, pgsql@mohawksoft.com wrote: > The hstore module, as I said, > looks really cool, I've contemplated something like it. I have a > module > provides a set of accessors for an XML text column that works > similarly, > but it parses the XML on each access and the application has to > create the > XML. (I have XML creation modules for Java, PHP, C++, and standard C > bindings.) Yeah, "ad-hoc" storage is always a huge problem in databases. For years the only way to do it was with EAV, which is tricky at best. In my experience, there typically isn't an un-bounded set of possible attribute names. It's usually fairly constrained, but the problem is that you never know when a new one will just pop up. It's very common right now for people to use either XML or YAML to deal with this. That has it's own set of problems. There's a few major improvements to be had here: 1: We should have a flexible storage mechanism that can either be used with it's own native syntax, or can interface to other hash formats such XML or YAML. Of course, both XML and YAML allow an obscene amount of nesting, etc, but generally people are only using these in a very simple form to emulate a hash table. It would be interesting to allow casting hstore to and from other proprietary hash formats as well, such as perl hashes. 2: Storage of attribute names can quickly become *very* expensive. Even with short 6-10 character names, you can easily end up using half the storage for just attribute names. I'd like to see hstore support storing attribute names in a lookup table, or using some other means to reduce the storage overhead. 3: Related to #2, storing numbers stinks because you end up burning 1 byte per digit. Some concept of data type for an attribute would improve this. Sadly, I don't have time to work on any of this. But these things are issues to my company, and we do have money. ;) -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828