Thread: latest hstore patch

latest hstore patch

From
Andrew Gierth
Date:
Hstore patch incorporating changes as previously discussed.

In addition the requested new features of conversions to and from
array formats have been added (with docs).

--
Andrew (irc:RhodiumToad)


Attachment

Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Sep 22, 2009, at 7:18 PM, Andrew Gierth wrote:

> Hstore patch incorporating changes as previously discussed.
>
> In addition the requested new features of conversions to and from
> array formats have been added (with docs).

Thanks Andrew.

Just a few thoughts for discussion:

* From my previous posts: Is it time to kill off `@` and `~`,? Not
necessarily for your patch to handle, just wondering what others think.

* I like the %% operator for converting to arrays. Though I think
maybe I would have liked %@ better, but maybe that's just the Perl
hacker in me.

* I also like the new %# operator to convert to two-dimensional
arrays. But if you adopted %@ for arrays, maybe %@@ better indicates a
2-dimensional array? I'm just thinking out lout here, I'm happy to
have them no matter what they're called.

* More name stuff: Why `hstore_to_list` rather than `hstore_to_array`?
And I'm not sure about `hstore_to_matrix` for the 2-dimensional array.
I guess that's better than `hstore_to_multidimensional_array` would
be. ;-)

   For those following along at home, here's what these guys look like:

       SELECT %% 'a=>foo, b=>bar'::hstore as array_op,
              hstore_to_list('a=>foo, b=>bar'::hstore),
              %#  'a=>foo, b=>bar'::hstore as matrix_op,
              hstore_to_matrix('a=>foo, b=>bar'::hstore);
          array_op    | hstore_to_list |     matrix_op     |
hstore_to_matrix
       ---------------+----------------+-------------------
+-------------------
        {a,foo,b,bar} | {a,foo,b,bar}  | {{a,foo},{b,bar}} | {{a,foo},
{b,bar}}
       (1 row)

   Pretty cool!

* Thanks for updating the docs with:
   + BTREE and HASH index support
   + A fix for the populate_hash() pasto
   + A link to a discussion of backslashing and SQL standard strings
   + A note on the overhead of reading the old binary format
   + Notes on how to update from the old binary format

In the attached patch, I made a few tweaks to the hstore docs, after
applying your patch. I would have created a new patch with everything,
but ran out of time trying to convince Git to create a context diff.
This is a unified diff, but short, with just these changes:

* Fixed doc pasto for %#.
* Noted in docs that the format is new in 8.5, rather than "this
version".
* Eliminated a redundant "However, ".
* Added an example for creating a HASH index.

In sum: Modulo a discussion of the names of the array casting
operators and functions, I think this patch is ready for committer
review.

Thanks,

David


Attachment

Re: latest hstore patch

From
Andrew Gierth
Date:
>>>>> "David" == "David E Wheeler" <david@kineticode.com> writes:
David> Just a few thoughts for discussion:
David> * From my previous posts: Is it time to kill off `@` and `~`,?David> Not necessarily for your patch to handle,
justwondering whatDavid> others think.
 

I'll take them out if people think that is appropriate.
David> * I like the %% operator for converting to arrays. Though IDavid> think maybe I would have liked %@ better, but
maybethat'sDavid> just the Perl hacker in me.
 

I originally tried just % but something in the grammar stops you using
that for a unary op.
David> * I also like the new %# operator to convert toDavid> two-dimensional arrays. But if you adopted %@ for
arrays,David>maybe %@@ better indicates a 2-dimensional array? I'm justDavid> thinking out lout here, I'm happy to have
themno matter whatDavid> they're called.
 

%@@ is a bit on the ugly side for an operator I think.
David> * More name stuff: Why `hstore_to_list` rather thanDavid> `hstore_to_array`?  And I'm not sure about
`hstore_to_matrix`David>for the 2-dimensional array.  I guess that's better thanDavid>
`hstore_to_multidimensional_array`would be. ;-)
 

I intentionally avoided hstore_to_array because it would be unclear
which one it meant (the 1-d or 2-d result).

-- 
Andrew (irc:RhodiumToad)


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Sep 23, 2009, at 5:27 PM, Andrew Gierth wrote:

> I intentionally avoided hstore_to_array because it would be unclear
> which one it meant (the 1-d or 2-d result).

Thanks Andrew.

Given these replies, unless anyone else wants to weigh in on the array  
conversion operator and function names, this patch is ready for  
committer review (along with my tiny doc patch). I'll update the  
commitfest site to that effect.

Thanks,

David



Re: latest hstore patch

From
Tom Lane
Date:
[ starting to look at this now... ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "David" == "David E Wheeler" <david@kineticode.com> writes:
>  David> * More name stuff: Why `hstore_to_list` rather than
>  David> `hstore_to_array`?  And I'm not sure about `hstore_to_matrix`
>  David> for the 2-dimensional array.  I guess that's better than
>  David> `hstore_to_multidimensional_array` would be. ;-)

> I intentionally avoided hstore_to_array because it would be unclear
> which one it meant (the 1-d or 2-d result).

hstore_to_list seems like a pretty horrible name though for something
that produces an array.  I also note that "array" means "1-D array"
according to no less an authority than the SQL standard ;-).  I think
we could live with hstore_to_array and hstore_to_matrix.  Thoughts,
other ideas?
        regards, tom lane


Re: latest hstore patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
David> * More name stuff: Why `hstore_to_list` rather thanDavid> `hstore_to_array`?  And I'm not sure about
`hstore_to_matrix`David>for the 2-dimensional array.  I guess that's better thanDavid>
`hstore_to_multidimensional_array`would be. ;-)
 
>> I intentionally avoided hstore_to_array because it would be>> unclear which one it meant (the 1-d or 2-d result).
Tom> hstore_to_list seems like a pretty horrible name though forTom> something that produces an array.  I also note
that"array"Tom> means "1-D array" according to no less an authority than the SQLTom> standard ;-).  I think we could
livewith hstore_to_array andTom> hstore_to_matrix.  Thoughts, other ideas?
 

I don't feel particularly strongly about the name (I've also
intentionally held off on updating the pgfoundry version of the code
until this is settled so no-one else should care either).

My own expectation is that the operator should normally be used in
preference (though obviously people's tastes will vary in this
respect).

-- 
Andrew (irc:RhodiumToad)


Re: latest hstore patch

From
Josh Berkus
Date:
> I don't feel particularly strongly about the name (I've also
> intentionally held off on updating the pgfoundry version of the code
> until this is settled so no-one else should care either).

Well, since we already have string_to_array, hstore_to_array would be
consistent.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Sep 29, 2009, at 4:11 PM, Andrew Gierth wrote:

> I don't feel particularly strongly about the name (I've also
> intentionally held off on updating the pgfoundry version of the code
> until this is settled so no-one else should care either).

I'm down with hstore_to_array() and hstore_to_matrix().

> My own expectation is that the operator should normally be used in
> preference (though obviously people's tastes will vary in this
> respect).

Sure. But I realized that I forgot to ask for array_to_hstore() and
matrix_to_hstore(). :-) Would love to have those, too. Not sure about
the operators…

Best,

David



Re: latest hstore patch

From
Andrew Gierth
Date:
>>>>> "David" == "David E Wheeler" <david@kineticode.com> writes:
>> I don't feel particularly strongly about the name (I've also>> intentionally held off on updating the pgfoundry
versionof the>> code until this is settled so no-one else should care either). 
David> I'm down with hstore_to_array() and hstore_to_matrix().
>> My own expectation is that the operator should normally be used in>> preference (though obviously people's tastes
willvary in this>> respect). 
David> Sure. But I realized that I forgot to ask forDavid> array_to_hstore() and matrix_to_hstore(). :-) Would love
toDavid>have those, too. Not sure about the operators… 

hstore(text[]) (which is also present as an explicit cast) covers both
of those cases since it can figure out from the array dimensions which
is intended.

--
Andrew (irc:RhodiumToad)


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Sep 29, 2009, at 5:00 PM, Andrew Gierth wrote:

> David> Sure. But I realized that I forgot to ask for
> David> array_to_hstore() and matrix_to_hstore(). :-) Would love to
> David> have those, too. Not sure about the operators…
>
> hstore(text[]) (which is also present as an explicit cast) covers both
> of those cases since it can figure out from the array dimensions which
> is intended.

Oooh! RhodiumToad++

Thanks,

David

Re: latest hstore patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Hstore patch incorporating changes as previously discussed.
> In addition the requested new features of conversions to and from
> array formats have been added (with docs).

Applied with some mostly-cosmetic editorialization.
        regards, tom lane


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Sep 30, 2009, at 12:52 PM, Tom Lane wrote:

> Applied with some mostly-cosmetic editorialization.

And there was much rejoicing…

David

Re: latest hstore patch

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Sep 30, 2009, at 12:52 PM, Tom Lane wrote:
>> Applied with some mostly-cosmetic editorialization.

> And there was much rejoicing�

... except in the buildfarm.  Must be some platform dependency we both
missed ...
        regards, tom lane


Re: latest hstore patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> And there was much rejoicing
Tom> ... except in the buildfarm.  Must be some platform dependencyTom> we both missed ...

"oops"

-- 
Andrew (irc:RhodiumToad)


Re: latest hstore patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> > I intentionally avoided hstore_to_array because it would be unclear
> > which one it meant (the 1-d or 2-d result).
> 
> hstore_to_list seems like a pretty horrible name though for something
> that produces an array.  I also note that "array" means "1-D array"
> according to no less an authority than the SQL standard ;-).  I think
> we could live with hstore_to_array and hstore_to_matrix.  Thoughts,
> other ideas?

Off topic, but in normal English usage I thought 'vector' was a 1-D
array, and an array could be any number of dimensions.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: latest hstore patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > Hstore patch incorporating changes as previously discussed.
> > In addition the requested new features of conversions to and from
> > array formats have been added (with docs).
> 
> Applied with some mostly-cosmetic editorialization.

Are there any pg_migrator affects in this patch?  We had discussed this
issue in the past with this patch.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: latest hstore patch

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Are there any pg_migrator affects in this patch?  We had discussed this
> issue in the past with this patch.

The code is upward compatible with the old on-disk format, so that
end of things is fine.

There's still the issue of how to get the improved module definition
(new functions etc) into a migrated database.  That's not specific
to hstore in any way though, it would affect any contrib module that
had added stuff in a new release.
        regards, tom lane


Re: latest hstore patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Are there any pg_migrator affects in this patch?  We had discussed this
> > issue in the past with this patch.
> 
> The code is upward compatible with the old on-disk format, so that
> end of things is fine.
> 
> There's still the issue of how to get the improved module definition
> (new functions etc) into a migrated database.  That's not specific
> to hstore in any way though, it would affect any contrib module that
> had added stuff in a new release.

Most modules just install functions, which are easily
uninstalled/reinstalled.  A data type like hstore is more complicated
assuming it is the data type that is changing and not the support
functions.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Oct 2, 2009, at 8:20 AM, Bruce Momjian wrote:

> Most modules just install functions, which are easily
> uninstalled/reinstalled.  A data type like hstore is more complicated
> assuming it is the data type that is changing and not the support
> functions.

Lots of modules install data types. From contrib:

* hstore
* uin
* citext
* cube
* inarray
* ltree

Plus lots of stuff on pgFoundry. It's a problem that needs to be  
solved. Surely someone, somewhere, has solved this problem, no?

Best,

David


Re: latest hstore patch

From
Alvaro Herrera
Date:
David E. Wheeler wrote:

> Plus lots of stuff on pgFoundry. It's a problem that needs to be
> solved. Surely someone, somewhere, has solved this problem, no?

Dump & reload?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Oct 2, 2009, at 9:43 AM, Alvaro Herrera wrote:

>> Plus lots of stuff on pgFoundry. It's a problem that needs to be
>> solved. Surely someone, somewhere, has solved this problem, no?
>
> Dump & reload?

Hahahahaha. No, really. "Dump & reload" is a phrase that end users  
will not put up with for much longer.

Best,

David


Re: latest hstore patch

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> David E. Wheeler wrote:
>> Plus lots of stuff on pgFoundry. It's a problem that needs to be
>> solved. Surely someone, somewhere, has solved this problem, no?

> Dump & reload?

The point is it's *not* solved in the context of using pg_migrator.
        regards, tom lane


Re: latest hstore patch

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > David E. Wheeler wrote:
> >> Plus lots of stuff on pgFoundry. It's a problem that needs to be
> >> solved. Surely someone, somewhere, has solved this problem, no?
> 
> > Dump & reload?
> 
> The point is it's *not* solved in the context of using pg_migrator.

Yes, that's my point too, against David's argument that "surely someone
must have solved it".  What we have here is a new problem, so it's not
so clear that there's any solution at all (yet).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Oct 2, 2009, at 10:04 AM, Alvaro Herrera wrote:

>> The point is it's *not* solved in the context of using pg_migrator.
>
> Yes, that's my point too, against David's argument that "surely  
> someone
> must have solved it".  What we have here is a new problem, so it's not
> so clear that there's any solution at all (yet).

Yeah, I didn't mean that someone must've solved it for PostgreSQL, but  
that this sort of problem must have been solved before, wherever  
binary data storage is an issue.

Best,

David


Re: latest hstore patch

From
Bruce Momjian
Date:
David E. Wheeler wrote:
> On Oct 2, 2009, at 9:43 AM, Alvaro Herrera wrote:
> 
> >> Plus lots of stuff on pgFoundry. It's a problem that needs to be
> >> solved. Surely someone, somewhere, has solved this problem, no?
> >
> > Dump & reload?
> 
> Hahahahaha. No, really. "Dump & reload" is a phrase that end users  
> will not put up with for much longer.

Well, if it is just changed syntax, we could wack around the system
catalogs.  If storage changes, we have to dump/reload that data type.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: latest hstore patch

From
"David E. Wheeler"
Date:
On Oct 2, 2009, at 11:14 AM, Bruce Momjian wrote:

> Well, if it is just changed syntax, we could wack around the system
> catalogs.  If storage changes, we have to dump/reload that data type.

Andrew solved this problem for hstore by making the new version able  
to read the old representation. It will also update to the new  
representation when you update a value.

Best,

David


Re: latest hstore patch

From
Bruce Momjian
Date:
David E. Wheeler wrote:
> On Oct 2, 2009, at 11:14 AM, Bruce Momjian wrote:
> 
> > Well, if it is just changed syntax, we could wack around the system
> > catalogs.  If storage changes, we have to dump/reload that data type.
> 
> Andrew solved this problem for hstore by making the new version able  
> to read the old representation. It will also update to the new  
> representation when you update a value.

Nice job.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: latest hstore patch

From
Dimitri Fontaine
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Oct 2, 2009, at 10:04 AM, Alvaro Herrera wrote:
>> Yes, that's my point too, against David's argument that "surely someone
>> must have solved it".  What we have here is a new problem, so it's not
>> so clear that there's any solution at all (yet).
>
> Yeah, I didn't mean that someone must've solved it for PostgreSQL, but that
> this sort of problem must have been solved before, wherever  binary data
> storage is an issue.

In the extension proposal you can find the idea of an upgrade hook
function called with current and new version of the extension as
arguments. This allows for the extension authors to provide the data
conversion support. We'd in fact want pg_migrator to find any columm
using a datatype offered by the extension and for each of them run:
 UPDATE t SET col = ext_upgrade_function(current_version, new_version, col);

A way to indicate that no ondisk change has been made will be a nice
optimisation, allowing to entirely skip the UPDATE step. Those
information should be easy to get from each extension's metadata (which
can point to functions, like ext_ondisk_change(version, version)) and
from pg_depend (any user column hosting an extension provided datatype
should have a tuple there, right?).

The reactions to this part of the proposal where not very warm, in
particular some where concerned that we still have a table rewrite here,
which pg_migrator tries hard to avoid, AFAIUI. But upgrading ondisk
format without rewriting table content is not something I feel able to
help provide.

Regards,
-- 
dim

PS: the original proposal for the hook let the upgrade function find
which columns to upgrade, on reflexion it's not that friendly...


Re: latest hstore patch

From
Andrew Gierth
Date:
>>>>> "Bruce" == Bruce Momjian <bruce@momjian.us> writes:
>> There's still the issue of how to get the improved module>> definition (new functions etc) into a migrated database.
That's>> not specific to hstore in any way though, it would affect any>> contrib module that had added stuff in a new
release.
Bruce> Most modules just install functions, which are easilyBruce> uninstalled/reinstalled.  A data type like hstore is
moreBruce>complicated assuming it is the data type that is changing andBruce> not the support functions.
 

For hstore, the various changes (and their solutions if any) are roughly
these:

1) new functions and operators - these don't present a migration issue
other than that they won't be available in a migrated db unless added,
which can be done after the fact with CREATE FUNCTION / CREATE OPERATOR
as in the install script. (This issue is the same for dump/restore
upgrades if the new version is not installed prior to the restore)

2) new internal C names for some functions - this is addressed in hstore
by defining both the new and old names, so no migration issue there
(running an after the fact CREATE OR REPLACE FUNCTION, as in the install
script, will remove the references to the old names; but even that much
isn't necessary unless there's actually a naming conflict)

3) Change in the representation of the underlying data. This is handled
by having the code recognize the old format and convert it on the fly;
this isn't ideal, but it does work.

4) Change in the SQL-level definition of the data type (specifically,
the new code adds SEND and RECEIVE functions which weren't previously
present). This is a hard one; currently, even for a dump/restore
upgrade, this requires that you run the new version's .sql file before
restoring the dump, otherwise you get the old type definition with
those functions missing, and there's no convenient way to add them
back.  A migration upgrade would have the same issue.

5) Changes to operator classes; the new version adds two new
opclasses, which is easy, but it also adds new operators to two
opclasses defined in the old version, and there's no ALTER or CREATE
OR REPLACE for those.  Again, with a dump/restore upgrade this is
fixable by installing the new version before restoring; if you don't
do that, there's no convenient way to enable access to the new
functionality short of dropping the old opclasses (and all the indexes
that use them) and recreating them.

It's really items (4) and (5) (and similar ones, such as changes to
operator definitions, and other SQL-level objects that don't have an
OR REPLACE option) that currently present an obstacle for all module
authors. (3) is something that's more of a case-by-case problem which
can be tackled within the module itself.

-- 
Andrew.