Thread: latest hstore patch
Hstore patch incorporating changes as previously discussed. In addition the requested new features of conversions to and from array formats have been added (with docs). -- Andrew (irc:RhodiumToad)
Attachment
On Sep 22, 2009, at 7:18 PM, Andrew Gierth wrote:

> Hstore patch incorporating changes as previously discussed.
>
> In addition the requested new features of conversions to and from
> array formats have been added (with docs).

Thanks Andrew. Just a few thoughts for discussion:

* From my previous posts: Is it time to kill off `@` and `~`? Not necessarily for your patch to handle, just wondering what others think.

* I like the %% operator for converting to arrays. Though I think maybe I would have liked %@ better, but maybe that's just the Perl hacker in me.

* I also like the new %# operator to convert to two-dimensional arrays. But if you adopted %@ for arrays, maybe %@@ better indicates a 2-dimensional array? I'm just thinking out loud here, I'm happy to have them no matter what they're called.

* More name stuff: Why `hstore_to_list` rather than `hstore_to_array`? And I'm not sure about `hstore_to_matrix` for the 2-dimensional array. I guess that's better than `hstore_to_multidimensional_array` would be. ;-)

For those following along at home, here's what these guys look like:

    SELECT %% 'a=>foo, b=>bar'::hstore as array_op,
           hstore_to_list('a=>foo, b=>bar'::hstore),
           %# 'a=>foo, b=>bar'::hstore as matrix_op,
           hstore_to_matrix('a=>foo, b=>bar'::hstore);

       array_op    | hstore_to_list |     matrix_op     | hstore_to_matrix
    ---------------+----------------+-------------------+-------------------
     {a,foo,b,bar} | {a,foo,b,bar}  | {{a,foo},{b,bar}} | {{a,foo},{b,bar}}
    (1 row)

Pretty cool!

* Thanks for updating the docs with:
  + BTREE and HASH index support
  + A fix for the populate_hash() pasto
  + A link to a discussion of backslashing and SQL standard strings
  + A note on the overhead of reading the old binary format
  + Notes on how to update from the old binary format

In the attached patch, I made a few tweaks to the hstore docs, after applying your patch. I would have created a new patch with everything, but ran out of time trying to convince Git to create a context diff. This is a unified diff, but short, with just these changes:

* Fixed doc pasto for %#.
* Noted in docs that the format is new in 8.5, rather than "this version".
* Eliminated a redundant "However, ".
* Added an example for creating a HASH index.

In sum: Modulo a discussion of the names of the array casting operators and functions, I think this patch is ready for committer review.

Thanks,

David
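Editorial illustration: the two result shapes shown in the psql output above can be modelled with a plain Python dict standing in for an hstore value. This is only a sketch under stated assumptions. `to_array` and `to_matrix` are hypothetical helper names, not part of hstore, and sorting the keys is an assumption made here so the output is deterministic.

```python
def to_array(h):
    """Flatten a dict into [k1, v1, k2, v2, ...] (the 1-D shape)."""
    out = []
    for key in sorted(h):  # sorted only to make the order predictable
        out.extend([key, h[key]])
    return out


def to_matrix(h):
    """Turn a dict into [[k1, v1], [k2, v2], ...] (the 2-D shape)."""
    return [[key, h[key]] for key in sorted(h)]


if __name__ == "__main__":
    h = {"a": "foo", "b": "bar"}
    print(to_array(h))
    print(to_matrix(h))
```

The 1-D form alternates keys and values in a single list, while the 2-D form keeps each key/value pair together as its own row.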
Attachment
>>>>> "David" == "David E Wheeler" <david@kineticode.com> writes:

 David> Just a few thoughts for discussion:

 David> * From my previous posts: Is it time to kill off `@` and `~`?
 David> Not necessarily for your patch to handle, just wondering what
 David> others think.

I'll take them out if people think that is appropriate.

 David> * I like the %% operator for converting to arrays. Though I
 David> think maybe I would have liked %@ better, but maybe that's
 David> just the Perl hacker in me.

I originally tried just % but something in the grammar stops you using that for a unary op.

 David> * I also like the new %# operator to convert to
 David> two-dimensional arrays. But if you adopted %@ for arrays,
 David> maybe %@@ better indicates a 2-dimensional array? I'm just
 David> thinking out loud here, I'm happy to have them no matter what
 David> they're called.

%@@ is a bit on the ugly side for an operator I think.

 David> * More name stuff: Why `hstore_to_list` rather than
 David> `hstore_to_array`? And I'm not sure about `hstore_to_matrix`
 David> for the 2-dimensional array. I guess that's better than
 David> `hstore_to_multidimensional_array` would be. ;-)

I intentionally avoided hstore_to_array because it would be unclear which one it meant (the 1-d or 2-d result).

-- Andrew (irc:RhodiumToad)
On Sep 23, 2009, at 5:27 PM, Andrew Gierth wrote: > I intentionally avoided hstore_to_array because it would be unclear > which one it meant (the 1-d or 2-d result). Thanks Andrew. Given these replies, unless anyone else wants to weigh in on the array conversion operator and function names, this patch is ready for committer review (along with my tiny doc patch). I'll update the commitfest site to that effect. Thanks, David
[ starting to look at this now... ] Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "David" == "David E Wheeler" <david@kineticode.com> writes: > David> * More name stuff: Why `hstore_to_list` rather than > David> `hstore_to_array`? And I'm not sure about `hstore_to_matrix` > David> for the 2-dimensional array. I guess that's better than > David> `hstore_to_multidimensional_array` would be. ;-) > I intentionally avoided hstore_to_array because it would be unclear > which one it meant (the 1-d or 2-d result). hstore_to_list seems like a pretty horrible name though for something that produces an array. I also note that "array" means "1-D array" according to no less an authority than the SQL standard ;-). I think we could live with hstore_to_array and hstore_to_matrix. Thoughts, other ideas? regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 David> * More name stuff: Why `hstore_to_list` rather than
 David> `hstore_to_array`? And I'm not sure about `hstore_to_matrix`
 David> for the 2-dimensional array. I guess that's better than
 David> `hstore_to_multidimensional_array` would be. ;-)

 >> I intentionally avoided hstore_to_array because it would be
 >> unclear which one it meant (the 1-d or 2-d result).

 Tom> hstore_to_list seems like a pretty horrible name though for
 Tom> something that produces an array. I also note that "array"
 Tom> means "1-D array" according to no less an authority than the SQL
 Tom> standard ;-). I think we could live with hstore_to_array and
 Tom> hstore_to_matrix. Thoughts, other ideas?

I don't feel particularly strongly about the name (I've also intentionally held off on updating the pgfoundry version of the code until this is settled so no-one else should care either).

My own expectation is that the operator should normally be used in preference (though obviously people's tastes will vary in this respect).

-- Andrew (irc:RhodiumToad)
> I don't feel particularly strongly about the name (I've also > intentionally held off on updating the pgfoundry version of the code > until this is settled so no-one else should care either). Well, since we already have string_to_array, hstore_to_array would be consistent. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
On Sep 29, 2009, at 4:11 PM, Andrew Gierth wrote: > I don't feel particularly strongly about the name (I've also > intentionally held off on updating the pgfoundry version of the code > until this is settled so no-one else should care either). I'm down with hstore_to_array() and hstore_to_matrix(). > My own expectation is that the operator should normally be used in > preference (though obviously people's tastes will vary in this > respect). Sure. But I realized that I forgot to ask for array_to_hstore() and matrix_to_hstore(). :-) Would love to have those, too. Not sure about the operators… Best, David
>>>>> "David" == "David E Wheeler" <david@kineticode.com> writes:

 >> I don't feel particularly strongly about the name (I've also
 >> intentionally held off on updating the pgfoundry version of the
 >> code until this is settled so no-one else should care either).

 David> I'm down with hstore_to_array() and hstore_to_matrix().

 >> My own expectation is that the operator should normally be used in
 >> preference (though obviously people's tastes will vary in this
 >> respect).

 David> Sure. But I realized that I forgot to ask for
 David> array_to_hstore() and matrix_to_hstore(). :-) Would love to
 David> have those, too. Not sure about the operators…

hstore(text[]) (which is also present as an explicit cast) covers both of those cases since it can figure out from the array dimensions which is intended.

-- Andrew (irc:RhodiumToad)
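Editorial illustration: the idea of one constructor that picks the interpretation from the array's dimensions can be sketched in Python with nested lists standing in for text[] values. This is a model of the behaviour described above, not the hstore C code. `from_text_array` is a hypothetical name, and the error handling for odd-length or mixed input is an assumption of this sketch.

```python
def from_text_array(arr):
    """Build a dict from either a flat [k, v, k, v, ...] list or a
    list of [k, v] pairs, choosing by the shape of the input."""
    if not arr:
        return {}
    nested = [isinstance(item, list) for item in arr]
    if all(nested):
        # Two-dimensional input: every row must be a key/value pair.
        result = {}
        for row in arr:
            if len(row) != 2:
                raise ValueError("each row must hold exactly a key and a value")
            key, value = row
            result[key] = value
        return result
    if any(nested):
        raise ValueError("cannot mix flat elements and nested rows")
    # One-dimensional input: keys and values alternate.
    if len(arr) % 2 != 0:
        raise ValueError("flat input needs an even number of elements")
    return dict(zip(arr[0::2], arr[1::2]))


if __name__ == "__main__":
    print(from_text_array(["a", "foo", "b", "bar"]))
    print(from_text_array([["a", "foo"], ["b", "bar"]]))
```

Both calls in the usage section build the same mapping, which is why separate array_to_hstore() and matrix_to_hstore() entry points are not needed in this model.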
On Sep 29, 2009, at 5:00 PM, Andrew Gierth wrote: > David> Sure. But I realized that I forgot to ask for > David> array_to_hstore() and matrix_to_hstore(). :-) Would love to > David> have those, too. Not sure about the operators… > > hstore(text[]) (which is also present as an explicit cast) covers both > of those cases since it can figure out from the array dimensions which > is intended. Oooh! RhodiumToad++ Thanks, David
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Hstore patch incorporating changes as previously discussed. > In addition the requested new features of conversions to and from > array formats have been added (with docs). Applied with some mostly-cosmetic editorialization. regards, tom lane
On Sep 30, 2009, at 12:52 PM, Tom Lane wrote: > Applied with some mostly-cosmetic editorialization. And there was much rejoicing… David
"David E. Wheeler" <david@kineticode.com> writes: > On Sep 30, 2009, at 12:52 PM, Tom Lane wrote: >> Applied with some mostly-cosmetic editorialization. > And there was much rejoicing… ... except in the buildfarm. Must be some platform dependency we both missed ... regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> And there was much rejoicing

 Tom> ... except in the buildfarm. Must be some platform dependency
 Tom> we both missed ...

"oops"

-- Andrew (irc:RhodiumToad)
Tom Lane wrote: > > I intentionally avoided hstore_to_array because it would be unclear > > which one it meant (the 1-d or 2-d result). > > hstore_to_list seems like a pretty horrible name though for something > that produces an array. I also note that "array" means "1-D array" > according to no less an authority than the SQL standard ;-). I think > we could live with hstore_to_array and hstore_to_matrix. Thoughts, > other ideas? Off topic, but in normal English usage I thought 'vector' was a 1-D array, and an array could be any number of dimensions. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > > Hstore patch incorporating changes as previously discussed. > > In addition the requested new features of conversions to and from > > array formats have been added (with docs). > > Applied with some mostly-cosmetic editorialization. Are there any pg_migrator effects in this patch? We had discussed this issue in the past with this patch. -- Bruce Momjian <bruce@momjian.us>
Bruce Momjian <bruce@momjian.us> writes: > Are there any pg_migrator effects in this patch? We had discussed this > issue in the past with this patch. The code is upward compatible with the old on-disk format, so that end of things is fine. There's still the issue of how to get the improved module definition (new functions etc) into a migrated database. That's not specific to hstore in any way though, it would affect any contrib module that had added stuff in a new release. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Are there any pg_migrator effects in this patch? We had discussed this > > issue in the past with this patch. > > The code is upward compatible with the old on-disk format, so that > end of things is fine. > > There's still the issue of how to get the improved module definition > (new functions etc) into a migrated database. That's not specific > to hstore in any way though, it would affect any contrib module that > had added stuff in a new release. Most modules just install functions, which are easily uninstalled/reinstalled. A data type like hstore is more complicated assuming it is the data type that is changing and not the support functions. -- Bruce Momjian <bruce@momjian.us>
On Oct 2, 2009, at 8:20 AM, Bruce Momjian wrote:

> Most modules just install functions, which are easily
> uninstalled/reinstalled. A data type like hstore is more complicated
> assuming it is the data type that is changing and not the support
> functions.

Lots of modules install data types. From contrib:

* hstore
* uin
* citext
* cube
* intarray
* ltree

Plus lots of stuff on pgFoundry. It's a problem that needs to be solved. Surely someone, somewhere, has solved this problem, no?

Best,

David
David E. Wheeler wrote: > Plus lots of stuff on pgFoundry. It's a problem that needs to be > solved. Surely someone, somewhere, has solved this problem, no? Dump & reload? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Oct 2, 2009, at 9:43 AM, Alvaro Herrera wrote: >> Plus lots of stuff on pgFoundry. It's a problem that needs to be >> solved. Surely someone, somewhere, has solved this problem, no? > > Dump & reload? Hahahahaha. No, really. "Dump & reload" is a phrase that end users will not put up with for much longer. Best, David
Alvaro Herrera <alvherre@commandprompt.com> writes: > David E. Wheeler wrote: >> Plus lots of stuff on pgFoundry. It's a problem that needs to be >> solved. Surely someone, somewhere, has solved this problem, no? > Dump & reload? The point is it's *not* solved in the context of using pg_migrator. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > David E. Wheeler wrote: > >> Plus lots of stuff on pgFoundry. It's a problem that needs to be > >> solved. Surely someone, somewhere, has solved this problem, no? > > > Dump & reload? > > The point is it's *not* solved in the context of using pg_migrator. Yes, that's my point too, against David's argument that "surely someone must have solved it". What we have here is a new problem, so it's not so clear that there's any solution at all (yet). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Oct 2, 2009, at 10:04 AM, Alvaro Herrera wrote: >> The point is it's *not* solved in the context of using pg_migrator. > > Yes, that's my point too, against David's argument that "surely > someone > must have solved it". What we have here is a new problem, so it's not > so clear that there's any solution at all (yet). Yeah, I didn't mean that someone must've solved it for PostgreSQL, but that this sort of problem must have been solved before, wherever binary data storage is an issue. Best, David
David E. Wheeler wrote: > On Oct 2, 2009, at 9:43 AM, Alvaro Herrera wrote: > > >> Plus lots of stuff on pgFoundry. It's a problem that needs to be > >> solved. Surely someone, somewhere, has solved this problem, no? > > > > Dump & reload? > > Hahahahaha. No, really. "Dump & reload" is a phrase that end users > will not put up with for much longer. Well, if it is just changed syntax, we could whack around the system catalogs. If storage changes, we have to dump/reload that data type. -- Bruce Momjian <bruce@momjian.us>
On Oct 2, 2009, at 11:14 AM, Bruce Momjian wrote: > Well, if it is just changed syntax, we could whack around the system > catalogs. If storage changes, we have to dump/reload that data type. Andrew solved this problem for hstore by making the new version able to read the old representation. It will also update to the new representation when you update a value. Best, David
David E. Wheeler wrote: > On Oct 2, 2009, at 11:14 AM, Bruce Momjian wrote: > > > Well, if it is just changed syntax, we could whack around the system > > catalogs. If storage changes, we have to dump/reload that data type. > > Andrew solved this problem for hstore by making the new version able > to read the old representation. It will also update to the new > representation when you update a value. Nice job. -- Bruce Momjian <bruce@momjian.us>
"David E. Wheeler" <david@kineticode.com> writes:

> On Oct 2, 2009, at 10:04 AM, Alvaro Herrera wrote:
>> Yes, that's my point too, against David's argument that "surely someone
>> must have solved it". What we have here is a new problem, so it's not
>> so clear that there's any solution at all (yet).
>
> Yeah, I didn't mean that someone must've solved it for PostgreSQL, but that
> this sort of problem must have been solved before, wherever binary data
> storage is an issue.

In the extension proposal you can find the idea of an upgrade hook function called with the current and new versions of the extension as arguments. This allows extension authors to provide the data conversion support.

We'd in fact want pg_migrator to find any column using a datatype offered by the extension and for each of them run:

  UPDATE t SET col = ext_upgrade_function(current_version, new_version, col);

A way to indicate that no on-disk change has been made would be a nice optimisation, allowing the UPDATE step to be skipped entirely. That information should be easy to get from each extension's metadata (which can point to functions, like ext_ondisk_change(version, version)) and from pg_depend (any user column hosting an extension-provided datatype should have a tuple there, right?).

The reactions to this part of the proposal were not very warm; in particular, some were concerned that we still have a table rewrite here, which pg_migrator tries hard to avoid, AFAIUI. But upgrading the on-disk format without rewriting table content is not something I feel able to help provide.

Regards,
-- dim

PS: the original proposal for the hook let the upgrade function find which columns to upgrade; on reflection it's not that friendly...
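Editorial illustration: the proposed flow (skip everything when the on-disk format is unchanged, otherwise emit one UPDATE per affected column) can be sketched as a small statement planner. This is a sketch under stated assumptions, not pg_migrator code. `plan_upgrade` and the `ondisk_changed` callback are hypothetical, `ext_upgrade_function` is the placeholder name from the message above, and table and column names are assumed to be trusted identifiers (no quoting or escaping is done).

```python
def plan_upgrade(columns, current_version, new_version, ondisk_changed):
    """Return the UPDATE statements needed to convert stored values.

    columns        -- iterable of (table, column) pairs using the type
    ondisk_changed -- callable(current, new) -> bool, standing in for
                      the extension's "did the format change?" metadata
    """
    if not ondisk_changed(current_version, new_version):
        # No on-disk change: the rewrite step can be skipped entirely.
        return []
    template = (
        "UPDATE {table} SET {col} = "
        "ext_upgrade_function('{cur}', '{new}', {col});"
    )
    return [
        template.format(table=table, col=col,
                        cur=current_version, new=new_version)
        for table, col in columns
    ]


if __name__ == "__main__":
    for stmt in plan_upgrade([("t", "col")], "1.0", "1.1",
                             lambda cur, new: cur != new):
        print(stmt)
```

Each generated statement still rewrites its table, which is the cost the objections to the proposal were about. The early return is the only case in this sketch that avoids it.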
>>>>> "Bruce" == Bruce Momjian <bruce@momjian.us> writes:

 >> There's still the issue of how to get the improved module
 >> definition (new functions etc) into a migrated database. That's
 >> not specific to hstore in any way though, it would affect any
 >> contrib module that had added stuff in a new release.

 Bruce> Most modules just install functions, which are easily
 Bruce> uninstalled/reinstalled. A data type like hstore is more
 Bruce> complicated assuming it is the data type that is changing and
 Bruce> not the support functions.

For hstore, the various changes (and their solutions if any) are roughly these:

1) new functions and operators - these don't present a migration issue other than that they won't be available in a migrated db unless added, which can be done after the fact with CREATE FUNCTION / CREATE OPERATOR as in the install script. (This issue is the same for dump/restore upgrades if the new version is not installed prior to the restore)

2) new internal C names for some functions - this is addressed in hstore by defining both the new and old names, so no migration issue there (running an after the fact CREATE OR REPLACE FUNCTION, as in the install script, will remove the references to the old names; but even that much isn't necessary unless there's actually a naming conflict)

3) Change in the representation of the underlying data. This is handled by having the code recognize the old format and convert it on the fly; this isn't ideal, but it does work.

4) Change in the SQL-level definition of the data type (specifically, the new code adds SEND and RECEIVE functions which weren't previously present). This is a hard one; currently, even for a dump/restore upgrade, this requires that you run the new version's .sql file before restoring the dump, otherwise you get the old type definition with those functions missing, and there's no convenient way to add them back. A migration upgrade would have the same issue.

5) Changes to operator classes; the new version adds two new opclasses, which is easy, but it also adds new operators to two opclasses defined in the old version, and there's no ALTER or CREATE OR REPLACE for those. Again, with a dump/restore upgrade this is fixable by installing the new version before restoring; if you don't do that, there's no convenient way to enable access to the new functionality short of dropping the old opclasses (and all the indexes that use them) and recreating them.

It's really items (4) and (5) (and similar ones, such as changes to operator definitions, and other SQL-level objects that don't have an OR REPLACE option) that currently present an obstacle for all module authors. (3) is something that's more of a case-by-case problem which can be tackled within the module itself.

-- Andrew.
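Editorial illustration: item (3), recognizing the old format on read and writing the new one on update, follows a general "upgrade on write" pattern that can be sketched as below. The thread does not describe hstore's actual on-disk layouts, so both encodings here are invented for the example. The `OLD`/`NEW` tags and the `decode`, `encode`, and `update_value` helpers are all hypothetical.

```python
OLD, NEW = 0, 1  # invented format tags, not hstore's real layout


def decode(stored):
    """Read a stored value in either format and return a dict."""
    tag, payload = stored
    if tag == NEW:
        return dict(payload)
    if tag == OLD:
        # Assumed old layout: flat list of alternating keys and values.
        return dict(zip(payload[0::2], payload[1::2]))
    raise ValueError("unknown format tag: %r" % (tag,))


def encode(h):
    """Always write the new format."""
    return (NEW, dict(h))


def update_value(stored, key, value):
    """Modify one key; the result is re-encoded in the new format."""
    h = decode(stored)
    h[key] = value
    return encode(h)


if __name__ == "__main__":
    old_value = (OLD, ["a", "foo"])
    print(decode(old_value))            # old data stays readable
    print(update_value(old_value, "b", "bar"))  # written back as NEW
```

Values that are never updated stay in the old format and pay the conversion cost on every read, which matches the "isn't ideal, but it does work" trade-off described in item (3).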