Thread: record identical operator
Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. The idea is that after RMVC or incremental maintenance, the matview should not be visibly different that it would have been at creation or on a non-concurrent REFRESH. The issue is easy to demonstrate with citext, but anywhere that the = operator allows user-visible differences between "equal" values it can be an issue. test=# CREATE TABLE citext_table ( test(# id serial primary key, test(# name citext test(# ); CREATE TABLE test=# INSERT INTO citext_table (name) test-# VALUES ('one'), ('two'), ('three'), (NULL), (NULL); INSERT 0 5 test=# CREATE MATERIALIZED VIEW citext_matview AS test-# SELECT * FROM citext_table; SELECT 5 test=# CREATE UNIQUE INDEX citext_matview_id test-# ON citext_matview (id); CREATE INDEX test=# UPDATE citext_table SET name = 'Two' WHERE name = 'TWO'; UPDATE 1 At this point, the table and the matview have visibly different values, yet without the patch the query used to find differences for RMVC would be essentially like this (slightly simplified for readability): test=# SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t.id = m.id AND t = m) WHERE t IS NULL OR m IS NULL; id | name | id | name ----+------+----+------ (0 rows) No differences were found, so without this patch, the matview would remain visibly different from the results generated by a run of its defining query. The patch adds an "identical" operator (===) for the record type: test=# SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t.id = m.id AND t === m) WHERE t IS NULL OR m IS NULL; id | name | id | name ----+------+----+------ | | 2 | Two 2 | two | | (2 rows) The difference is now found, so RMVC makes the appropriate change. test=# REFRESH MATERIALIZED VIEW CONCURRENTLY citext_matview; REFRESH MATERIALIZED VIEW test=# SELECT * FROM citext_matview ORDER BY id; id | name ----+------- 1 | one 2 | Two 3 | three 4 | 5 | (5 rows) The patch adds all of the functions, operators, and catalog information to support merge joins using the "identical" operator. The new operator is logically similar to IS NOT DISTINCT FROM for a record, although its implementation is very different. For one thing, it doesn't replace the operation with column level operators in the parser. For another thing, it doesn't look up operators for each type, so the "identical" operator does not need to be implemented for each type to use it as shown above. It compares values byte-for-byte, after detoasting. The test for identical records can avoid the detoasting altogether for any values with different lengths, and it stops when it finds the first column with a difference. I toyed with the idea of supporting hashing of records using this operator, but could not see how that would be a performance win. The identical (===) and not identical (!==) operator names were chosen because of a vague similarity to the "exactly equals" concepts in JavaScript and PHP, which use that name. The semantics aren't quite the same, but it seemed close enough not to be too surprising. The additional operator names seemed natural to me based on the first two, but I'm not really that attached to these names for the operators if someone has a better idea. Since the comparison of record values is not documented (only comparisons involving row value constructors), it doesn't seem like we should document this special case. It is intended primarily for support of matview refresh and maintenance, and it seems likely that record comparison was not documented on the basis that it is intended primarily for support of such things as indexing and merge joins -- so leaving the new operators undocumented seems consistent with existing policy. I'm open to arguments that the policy should change. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Thu, Sep 12, 2013 at 11:27 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.
[...]
The patch adds an "identical" operator (===) for the record type:
[...]
The new operator is logically similar to IS NOT DISTINCT FROM for a
record, although its implementation is very different. For one
thing, it doesn't replace the operation with column level operators
in the parser. For another thing, it doesn't look up operators for
each type, so the "identical" operator does not need to be
implemented for each type to use it as shown above. It compares
values byte-for-byte, after detoasting. The test for identical
records can avoid the detoasting altogether for any values with
different lengths, and it stops when it finds the first column with
a difference.
I toyed with the idea of supporting hashing of records using this
operator, but could not see how that would be a performance win.
The identical (===) and not identical (!==) operator names were
chosen because of a vague similarity to the "exactly equals"
concepts in JavaScript and PHP, which use that name. The semantics
aren't quite the same, but it seemed close enough not to be too
surprising. The additional operator names seemed natural to me
based on the first two, but I'm not really that attached to these
names for the operators if someone has a better idea.
Since the comparison of record values is not documented (only
comparisons involving row value constructors), it doesn't seem like
we should document this special case. It is intended primarily for
support of matview refresh and maintenance, and it seems likely
that record comparison was not documented on the basis that it is
intended primarily for support of such things as indexing and merge
joins -- so leaving the new operators undocumented seems consistent
with existing policy. I'm open to arguments that the policy should
change.
-
Wouldn't it be slightly less surprising / magical to not declare new operators
but just new primitive functions? In the least invasive version they could even
be called matview_is_record_identical or similar.
but just new primitive functions? In the least invasive version they could even
be called matview_is_record_identical or similar.
cheers,
Bene
Bene
Benedikt Grundmann <bgrundmann@janestreet.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: > >> Attached is a patch for a bit of infrastructure I believe to be >> necessary for correct behavior of REFRESH MATERIALIZED VIEW >> CONCURRENTLY as well as incremental maintenance of matviews. >> [...] >> The patch adds an "identical" operator (===) for the record >> type: > Wouldn't it be slightly less surprising / magical to not declare > new operators but just new primitive functions? In the least > invasive version they could even be called > matview_is_record_identical or similar. I'm not sure what is particularly surprising or magical about new operators, but it is true that the patch could be smaller if operators were not added. The SQL functions added by the current patch to support the operator approach are: extern Datum record_image_eq(PG_FUNCTION_ARGS); extern Datum record_image_ne(PG_FUNCTION_ARGS); extern Datum record_image_lt(PG_FUNCTION_ARGS); extern Datum record_image_gt(PG_FUNCTION_ARGS); extern Datum record_image_le(PG_FUNCTION_ARGS); extern Datum record_image_ge(PG_FUNCTION_ARGS); extern Datum btrecordimagecmp(PG_FUNCTION_ARGS); All take two record arguments and all but the last return a boolean. The last one returns -1, 0, or 1 depending on how the values compare. All comparisons are based on memcmp() of the data in its storage format (after detoasting, where applicable). As currently written, the patch still requires a unique index on the matview in order to allow RMVC, but this patch was intended to support removal of that restriction, which is something some people were saying they wanted. It just seemed best to do that with a separate patch once we had the mechanism to support it. RMVC currently generates its "diff" data with a query similar to this: test=# explain analyze test-# SELECT * test-# FROM citext_matview m test-# FULL JOIN citext_table t ON (t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; test=# \pset pager off Pager usage is off. test=# explain analyze SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t.id = m.id AND t === m) WHERE t.id IS NULL OR m.id IS NULL; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Hash Full Join (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1) Hash Cond: (m.id = t.id) Join Filter: (t.* === m.*) Filter: ((t.id IS NULL) OR (m.id IS NULL)) Rows Removed by Filter: 5 -> Seq Scan on citext_matview m (cost=0.00..1.05 rows=5 width=40) (actual time=0.002..0.006 rows=5 loops=1) -> Hash (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on citext_table t (cost=0.00..1.05 rows=5 width=40) (actual time=0.010..0.016 rows=5 loops=1) Total runtime: 0.102 ms (10 rows) With the operator support, we can remove the primary key columns from the join conditions, and it still works, albeit with a slower plan: test=# explain analyze SELECT * FROM citext_matview m FULL JOIN citext_table t ON (t === m) WHERE t.id IS NULL OR m.id IS NULL; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Merge Full Join (cost=2.22..2.32 rows=1 width=16) (actual time=0.072..0.072 rows=0 loops=1) Merge Cond: (m.* === t.*) Filter: ((t.id IS NULL) OR (m.id IS NULL)) Rows Removed by Filter: 5 -> Sort (cost=1.11..1.12 rows=5 width=40) (actual time=0.035..0.035 rows=5 loops=1) Sort Key: m.* Sort Method: quicksort Memory: 25kB -> Seq Scan on citext_matview m (cost=0.00..1.05 rows=5 width=40) (actual time=0.012..0.016 rows=5 loops=1) -> Sort (cost=1.11..1.12 rows=5 width=40) (actual time=0.014..0.014 rows=5 loops=1) Sort Key: t.* Sort Method: quicksort Memory: 25kB -> Seq Scan on citext_table t (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.003 rows=5 loops=1) Total runtime: 0.128 ms (13 rows) So, if the operators are included it will be a very small and simple patch to relax the requirement for a unique index. Currently we generate an error if no index columns were found, but with the new operators we could leave them out and the query would still work. Adding indexes to matviews would then be just a matter of optimization, not a requirement to be able to use the RMVC feature. Using the functions instead of the operators things work just as well as long as we use columns in the join conditions, which is currently based on indexed columns: test=# explain analyze test-# SELECT * test-# FROM citext_matview m test-# FULL JOIN citext_table t ON (t.id = m.id AND record_image_eq(t, m)) test-# WHERE t.id IS NULL OR m.id IS NULL; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Hash Full Join (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1) Hash Cond: (m.id = t.id) Join Filter: record_image_eq(t.*, m.*) Filter: ((t.id IS NULL) OR (m.id IS NULL)) Rows Removed by Filter: 5 -> Seq Scan on citext_matview m (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.006 rows=5 loops=1) -> Hash (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on citext_table t (cost=0.00..1.05 rows=5 width=40) (actual time=0.011..0.015 rows=5 loops=1) Total runtime: 0.101 ms (10 rows) Without columns in the FULL JOIN conditions, the function fails entirely, because it is the operator information which lets the planner know that a FULL JOIN is even possible: test=# explain analyze test-# SELECT * test-# FROM citext_matview m test-# FULL JOIN citext_table t ON (record_image_eq(t, m)) test-# WHERE t.id IS NULL OR m.id IS NULL; ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions So, either the indexes would continue to be required, or we would need some other criteria for deciding which columns to use for the additional FULL JOIN criteria. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Kevin, On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote: > The new operator is logically similar to IS NOT DISTINCT FROM for a > record, although its implementation is very different. For one > thing, it doesn't replace the operation with column level operators > in the parser. For another thing, it doesn't look up operators for > each type, so the "identical" operator does not need to be > implemented for each type to use it as shown above. It compares > values byte-for-byte, after detoasting. The test for identical > records can avoid the detoasting altogether for any values with > different lengths, and it stops when it finds the first column with > a difference. In the general case, that operator sounds dangerous to me. We don't guarantee that a Datum containing the same data always has the same binary representation. E.g. array can have a null bitmap or may not have one, depending on how they were created. I am not actually sure whether that's a problem for your usecase, but I get headaches when we try circumventing the type abstraction that way. Yes, we do such tricks in other places already, but afaik in all those places errorneously believing two Datums are distinct is not error, just a missed optimization. Allowing a general operator with such a murky definition to creep into something SQL exposed... Hm. Not sure. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote: >> The new operator is logically similar to IS NOT DISTINCT FROM for a >> record, although its implementation is very different. For one >> thing, it doesn't replace the operation with column level operators >> in the parser. For another thing, it doesn't look up operators for >> each type, so the "identical" operator does not need to be >> implemented for each type to use it as shown above. It compares >> values byte-for-byte, after detoasting. The test for identical >> records can avoid the detoasting altogether for any values with >> different lengths, and it stops when it finds the first column with >> a difference. > > In the general case, that operator sounds dangerous to me. We don't > guarantee that a Datum containing the same data always has the same > binary representation. E.g. array can have a null bitmap or may not have > one, depending on how they were created. > > I am not actually sure whether that's a problem for your usecase, but I > get headaches when we try circumventing the type abstraction that way. > > Yes, we do such tricks in other places already, but afaik in all those > places errorneously believing two Datums are distinct is not error, just > a missed optimization. Allowing a general operator with such a murky > definition to creep into something SQL exposed... Hm. Not sure. Well, the only two alternatives I could see were to allow user-visible differences not to be carried to the matview if they old and new values were considered "equal", or to implement an "identical" operator or function in every type that was to be allowed in a matview. Given those options, what's in this patch seemed to me to be the least evil. It might be worth noting that this scheme doesn't have a problem with correctness if there are multiple equal values which are not identical, as long as any two identical values are equal. If the query which generates contents for a matview generates non-identical but equal values from one run to the next without any particular reason, that might cause performance problems. To mangle Orwell: "Among pairs of equal values, some pairs are more equal than others." -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-13 14:36:27 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote: > >> The new operator is logically similar to IS NOT DISTINCT FROM for a > >> record, although its implementation is very different. For one > >> thing, it doesn't replace the operation with column level operators > >> in the parser. For another thing, it doesn't look up operators for > >> each type, so the "identical" operator does not need to be > >> implemented for each type to use it as shown above. It compares > >> values byte-for-byte, after detoasting. The test for identical > >> records can avoid the detoasting altogether for any values with > >> different lengths, and it stops when it finds the first column with > >> a difference. > > > > In the general case, that operator sounds dangerous to me. We don't > > guarantee that a Datum containing the same data always has the same > > binary representation. E.g. array can have a null bitmap or may not have > > one, depending on how they were created. > > > > I am not actually sure whether that's a problem for your usecase, but I > > get headaches when we try circumventing the type abstraction that way. > > > > Yes, we do such tricks in other places already, but afaik in all those > > places errorneously believing two Datums are distinct is not error, just > > a missed optimization. Allowing a general operator with such a murky > > definition to creep into something SQL exposed... Hm. Not sure. > > Well, the only two alternatives I could see were to allow > user-visible differences not to be carried to the matview if they > old and new values were considered "equal", or to implement an > "identical" operator or function in every type that was to be > allowed in a matview. Given those options, what's in this patch > seemed to me to be the least evil. > > It might be worth noting that this scheme doesn't have a problem > with correctness if there are multiple equal values which are not > identical, as long as any two identical values are equal. If the > query which generates contents for a matview generates > non-identical but equal values from one run to the next without any > particular reason, that might cause performance problems. I am not actually that concerned with MVCs using this, you're quite capable of analyzing the dangers. What I am wary of is exposing an operator that's basically broken from the get go to SQL. Now, the obvious issue there is that matviews use SQL to refresh :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > I am not actually that concerned with MVCs using this, you're quite > capable of analyzing the dangers. What I am wary of is exposing an > operator that's basically broken from the get go to SQL. > Now, the obvious issue there is that matviews use SQL to refresh :( I'm not sure why these operators are more broken or dangerous than those which already exist to support the text_pattern_ops and bpchar_pattern_ops operator families. I could overload those operator names as much as possible if that seems better. As I said at the start of the thread, I have no particular attachment to these operator names. For example, that would mean using ~>=~ as the operator for record_image_ge() instead of using >==. It would be trivial to make that adjustment to the patch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-13 15:13:20 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > I am not actually that concerned with MVCs using this, you're quite > > capable of analyzing the dangers. What I am wary of is exposing an > > operator that's basically broken from the get go to SQL. > > Now, the obvious issue there is that matviews use SQL to refresh :( > > I'm not sure why these operators are more broken or dangerous than > those which already exist to support the text_pattern_ops and > bpchar_pattern_ops operator families. I could overload those > operator names as much as possible if that seems better. As I said > at the start of the thread, I have no particular attachment to > these operator names. For example, that would mean using ~>=~ as > the operator for record_image_ge() instead of using >==. It would > be trivial to make that adjustment to the patch. Hm. I don't see the similarity. Those have pretty clearly defined behaviour. Not one that's dependendant on padding bytes, null bitmaps that can or cannot be present and such. I guess we need input from others here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > Not one that's dependendant on padding bytes, null bitmaps that > can or cannot be present and such. Can you provide an example of where that's an issue with this patch? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-13 19:20:11 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > Not one that's dependendant on padding bytes, null bitmaps that > > can or cannot be present and such. > > Can you provide an example of where that's an issue with this > patch? I haven't yet tested your patch, but what I am talking about is that e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3]; obviously should be true. But both arrays don't have the same binary representation since the former has a null bitmap, the latter not. So, if you had a composite type like (int4[]) and would compare that without invoking operators you'd return something false in some cases because of the null bitmaps. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > what I am talking about is that > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3]; > obviously should be true. The patch does not change the behavior of the = operator for any type under any circumstances. > But both arrays don't have the same binary representation since > the former has a null bitmap, the latter not. So, if you had a > composite type like (int4[]) and would compare that without > invoking operators you'd return something false in some cases > because of the null bitmaps. Not for the = operator. The new "identical" operator would find them to not be identical, though. Since the new operator is only for the record type, I need to wrap the values in your example: test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record test-# = row (ARRAY[1,2,3])::record; ?column? ---------- t (1 row) test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record test-# === row (ARRAY[1,2,3])::record; ?column? ---------- f (1 row) Or, to borrow from the citext example using arrays: test=# CREATE TABLE array_table ( test(# id serial primary key, test(# nums int4[] test(# ); CREATE TABLE test=# INSERT INTO array_table (nums) test-# VALUES ((ARRAY[1,2,3,NULL])[1:3]), (ARRAY[1,2,3]), test-# (ARRAY[1,2,3]), (NULL), (NULL); INSERT 0 5 test=# CREATE MATERIALIZED VIEW array_matview AS test-# SELECT * FROM array_table; SELECT 5 test=# CREATE UNIQUE INDEX array_matview_id test-# ON array_matview (id); CREATE INDEX test=# select * from array_matview; id | nums ----+--------- 1 | {1,2,3} 2 | {1,2,3} 3 | {1,2,3} 4 | 5 | (5 rows) Note that the on-disk representation of the row where id = 1 differs from the on-disk representation where id in (2,3), both in the table and the matview. test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+------+----+------ (0 rows) ... so the query looking for work for the RMVC statement finds nothing to do. test=# UPDATE array_table SET nums = (ARRAY[1,2,3,NULL])[1:3] test-# WHERE id between 1 and 2; UPDATE 2 Now we have added an unnecessary bitmap to the on-disk storage of the value where id = 2. test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+---------+----+--------- | | 2 | {1,2,3} 2 | {1,2,3} | | (2 rows) ... and the query sees that they differ. test=# REFRESH MATERIALIZED VIEW CONCURRENTLY array_matview; REFRESH MATERIALIZED VIEW test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+------+----+------ (0 rows) The REFRESH causes them to match again, and later REFRESH runs won't see a need to do any work there unless the on-disk representation changes again. As far as I can see, we have four choices: (1) Never update values that are "equal", even if they appear different to the users, as was demonstrated with the citext example. (2) Require every data type which can be used in a matview to implement some new operator or function for "identical". Perhaps that could be mitigated to only implementat it if equal values can have user-visible differences. (3) Embed special cases into record identical tests for types known to allow multiple on-disk representations which have no user-visible differences. (4) Base the need to update a matview column on whether its on-disk representation is identical to what a new run of the defining query would generate. If this causes performance problems for use of a given type in a matview, one possible solution would be to modify that particular type to use a canonical format when storing a value into a record. For example, storing an array which has a bitmap of null values even though there are no nulls in the array could strip the bitmap as it is stored to the record. Currently we are using (4). I only included (3) for completeness; even just typing it as a hypothetical made me want to take a shower. (1) seems pretty horrid, too. (2) isn't evil, exactly, but any types which allowed user-visible differences in equal values would not exhibit correct behavior in matviews until and unless an "identical" operator was added. It might also perform noticeably worse than (4). Option (4) has the advantage of showing correct logical behavior in all types immediately, and restricting any performance fixes to the types with the inconsistent storage formats. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > what I am talking about is that > > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3]; > > obviously should be true. > > The patch does not change the behavior of the = operator for any > type under any circumstances. Yes, sure. I wasn't thinking you would. > > But both arrays don't have the same binary representation since > > the former has a null bitmap, the latter not. So, if you had a > > composite type like (int4[]) and would compare that without > > invoking operators you'd return something false in some cases > > because of the null bitmaps. > > Not for the = operator. The new "identical" operator would find > them to not be identical, though. Yep. And I think that's a problem if exposed to SQL. People won't understand the hazards and end up using it because its faster or somesuch. > Since the new operator is only for the record type, I need to wrap > the values in your example: Yes. > The REFRESH causes them to match again, and later REFRESH runs > won't see a need to do any work there unless the on-disk > representation changes again. Yes, I understand that the matview code itself will just perform superflous work. We use such comparisons in other parts of the code similarly. > As far as I can see, we have four choices: > > (1) Never update values that are "equal", even if they appear > different to the users, as was demonstrated with the citext > example. I think, introducing a noticeable amount of infrastructure for this just because of citext is a bad idea. At some point we need to replace citext with proper case-insensitive collation support - then it really might become necessary. > (2) Require every data type which can be used in a matview to > implement some new operator or function for "identical". Perhaps > that could be mitigated to only implementat it if equal values can > have user-visible differences. That basically would require adding a new member to btree opclasses that btrees don't need themselves... Hm. > (3) Embed special cases into record identical tests for types > known to allow multiple on-disk representations which have no > user-visible differences. I think this is a complete nogo. a) I don't forsee we know of all these cases b) it wouldn't be extensible. Oh. Now that I've read further, I see you feel the same. Good ;) > (4) Base the need to update a matview column on whether its > on-disk representation is identical to what a new run of the > defining query would generate. If this causes performance problems > for use of a given type in a matview, one possible solution would > be to modify that particular type to use a canonical format when > storing a value into a record. For example, storing an array which > has a bitmap of null values even though there are no nulls in the > array could strip the bitmap as it is stored to the record. If matview refreshs weren't using plain SQL and thus wouldn't require exposing that operator to SQL I wouldn't have a problem with this... There's the ungodly ugly choice of having an matview_equal function (or operator) that checks if we're doing a refresh atm... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > If matview refreshs weren't using plain SQL and thus wouldn't > require exposing that operator to SQL I wouldn't have a problem > with this... If RMVC were the end of the story, it might be worth building up a mass of execution nodes directly, although it would be hard to see how we could make the right planning choices (e.g., MergeJoin versus HashJoin) that way. But the whole incremental maintenance area, to have any chance of working accurately and without an endless stream of bugs, needs to be based on relational algebra. There needs to be a way to express that in a much higher level language than execution node creation. If it doesn't use SQL we would need to invent a relational language very much like it, which would be silly when we have a perfectly good language we can already use. The sky is blue; let's move on. The test for identical records will be needed in SQL if we want to have these matview features. We could limit use of that to contexts where MatViewIncrementalMaintenanceIsEnabled(), but I don't see the point. If someone uses an undocumented operator, and uses it inappropriately, they may get a surprising result. We already have undocumented operators to support record comparison and pattern opclasses, and if people use those they may get surprising results. I don't recall any reports of problems from people actually trying to do so. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote: > On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote: > > Andres Freund <andres@2ndquadrant.com> wrote: > > > But both arrays don't have the same binary representation since > > > the former has a null bitmap, the latter not. So, if you had a > > > composite type like (int4[]) and would compare that without > > > invoking operators you'd return something false in some cases > > > because of the null bitmaps. > > > > Not for the = operator. The new "identical" operator would find > > them to not be identical, though. > > Yep. And I think that's a problem if exposed to SQL. People won't > understand the hazards and end up using it because its faster or > somesuch. The important question is whether to document the new operator and/or provide it under a guessable name. If we give the operator a weird name, don't document it, and put an "internal use only" comment in the catalogs, that is essentially as good as hiding this feature at the SQL level. I'm of two minds on that question. On the one hand, MV maintenance is hardly the first use case for an identity operator. Any replication system or user space materialized view implementation might want this. On the other hand, offering it for the record type exclusively is surprising. It's also surprising how records with different numbers of dropped columns can be found identical, even though a record column within the top-level record is not permitted to vary that way. Supposing a decision to document the operator, a second question is whether "===" is the right name: On Thu, Sep 12, 2013 at 03:27:27PM -0700, Kevin Grittner wrote: > The identical (===) and not identical (!==) operator names were > chosen because of a vague similarity to the "exactly equals" > concepts in JavaScript and PHP, which use that name. The semantics > aren't quite the same, but it seemed close enough not to be too > surprising. Maybe. If we were mimicking the JavaScript/PHP operator of the same name, '1.0'::numeric === '1.00'::numeric would return true, and '1'::int4 === '1'::int2 would return false. The patch submitted returns false for the first example and raises a "cannot compare dissimilar column types" error when comparing an int4 record column to an int2 record column. > I think, introducing a noticeable amount of infrastructure for this just > because of citext is a bad idea. > At some point we need to replace citext with proper case-insensitive > collation support - then it really might become necessary. citext is just one example. Others include '1.0'::numeric = '1.00'::numeric and '30 day'::interval = '1 mon'::interval. > > (2) Require every data type which can be used in a matview to > > implement some new operator or function for "identical". Perhaps > > that could be mitigated to only implementat it if equal values can > > have user-visible differences. > > That basically would require adding a new member to btree opclasses that > btrees don't need themselves... Hm. That's the wrong way to model it. Identity is a particular kind of equality, which is to say a particular hash opclass. It so happens that memcmp() is the logical way to implement most/all identity operators, so we can get a btree opclass as well. Type-specific identity operators seem like overkill, anyway. If we find that meaningless variations in a particular data type are causing too many false non-matches for the generic identity operator, the answer is to make the functions generating datums of that type settle on a canonical form. That would be the solution for your example involving array null bitmaps. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 09/15/2013 01:35 PM, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > >> If matview refreshs weren't using plain SQL and thus wouldn't >> require exposing that operator to SQL I wouldn't have a problem >> with this... > If RMVC were the end of the story, it might be worth building up a > mass of execution nodes directly, although it would be hard to see > how we could make the right planning choices (e.g., MergeJoin > versus HashJoin) that way. But the whole incremental maintenance > area, to have any chance of working accurately and without an > endless stream of bugs, needs to be based on relational algebra. > There needs to be a way to express that in a much higher level > language than execution node creation. If it doesn't use SQL we > would need to invent a relational language very much like it, which > would be silly when we have a perfectly good language we can > already use. The sky is blue; let's move on. > > The test for identical records will be needed in SQL if we want to > have these matview features. We could limit use of that to > contexts where MatViewIncrementalMaintenanceIsEnabled(), but I > don't see the point. If someone uses an undocumented operator, and > uses it inappropriately, they may get a surprising result. Just remember to document it as "undocumented" so people will know not to use them ;) Lots of people were bitten when (undocumented) hash functions were changed thus breaking hash-based partitioning. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Hannu Krosing <hannu@2ndQuadrant.com> wrote: > Lots of people were bitten when (undocumented) hash > functions were changed thus breaking hash-based partitioning. Nobody can be affected by the new operators in this patch unless they choose to use them to compare two records. They don't work for any other type and they don't come into play unless you specifically request them. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-15 19:49:26 -0400, Noah Misch wrote: > On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote: > > On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote: > > > Andres Freund <andres@2ndquadrant.com> wrote: > > > > But both arrays don't have the same binary representation since > > > > the former has a null bitmap, the latter not. So, if you had a > > > > composite type like (int4[]) and would compare that without > > > > invoking operators you'd return something false in some cases > > > > because of the null bitmaps. > > > > > > Not for the = operator. The new "identical" operator would find > > > them to not be identical, though. > > > > Yep. And I think that's a problem if exposed to SQL. People won't > > understand the hazards and end up using it because its faster or > > somesuch. > > The important question is whether to document the new operator and/or provide > it under a guessable name. If we give the operator a weird name, don't > document it, and put an "internal use only" comment in the catalogs, that is > essentially as good as hiding this feature at the SQL level. Doesn't match my experience. > Type-specific identity operators seem like overkill, anyway. If we find that > meaningless variations in a particular data type are causing too many false > non-matches for the generic identity operator, the answer is to make the > functions generating datums of that type settle on a canonical form. That > would be the solution for your example involving array null bitmaps. I think that's pretty much unrealistic. I am pretty sure that if either of us starts looking we will find at about a dozen of such cases and miss the other dozen. Not to speak about external code which is damn likely to contain such cases. And I think that efficiency will often make such normalization expensive (consider postgis where Datums afaik can exist with an internal bounding box or without). I think it's far more realistic to implement an identity operator that will fall back to a type specific operator iff equals has "strange" properties. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > I think it's far more realistic to implement an identity operator > that will fall back to a type specific operator iff equals has > "strange" properties. My biggest problem with that approach is that it defaults to incorrect behavior for types for which user-visible differences among "equal" values are possible, and requires per-type fixes to get correct behavior. The approach in this patch provides correct behavior as the default, with possible per-type changes for optimization, as people find that desirable. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Sep 15, 2013 at 6:49 PM, Noah Misch <noah@leadboat.com> wrote: > On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote: >> On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote: >> > Andres Freund <andres@2ndquadrant.com> wrote: >> > > But both arrays don't have the same binary representation since >> > > the former has a null bitmap, the latter not. So, if you had a >> > > composite type like (int4[]) and would compare that without >> > > invoking operators you'd return something false in some cases >> > > because of the null bitmaps. >> > >> > Not for the = operator. The new "identical" operator would find >> > them to not be identical, though. >> >> Yep. And I think that's a problem if exposed to SQL. People won't >> understand the hazards and end up using it because its faster or >> somesuch. > > The important question is whether to document the new operator and/or provide > it under a guessable name. If we give the operator a weird name, don't > document it, and put an "internal use only" comment in the catalogs, that is > essentially as good as hiding this feature at the SQL level. > > I'm of two minds on that question. On the one hand, MV maintenance is hardly > the first use case for an identity operator. Any replication system or user > space materialized view implementation might want this. On the other hand, > offering it for the record type exclusively is surprising. It's also > surprising how records with different numbers of dropped columns can be found > identical, even though a record column within the top-level record is not > permitted to vary that way. > > Supposing a decision to document the operator, a second question is whether > "===" is the right name: I vote to reserve '===' as shorthand for 'IS NOT DISTINCT FROM' and give the binary equality operator a funky name. I would document the operator though. merlin
On 09/16/2013 04:01 PM, Kevin Grittner wrote: > Hannu Krosing <hannu@2ndQuadrant.com> wrote: > >> Lots of people were bitten when (undocumented) hash >> functions were changed thus breaking hash-based partitioning. > Nobody can be affected by the new operators in this patch unless > they choose to use them to compare two records. They don't work > for any other type and they don't come into play unless you > specifically request them. What I meant is that rather than leave it really undocumented, document it as "system function for specific usage, has caveats and may change in future versions. use at your own risk and make sure you know what you are doing" PostgreSQL has good enough introspection features that people tend to find functions and operators using psql-s \d ... > > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Hannu Krosing <hannu@2ndQuadrant.com> wrote: > What I meant is that rather than leave it really undocumented, > document it as "system function for specific usage, has caveats > and may change in future versions. use at your own risk and > make sure you know what you are doing" Well, that was my original assumption and intention; but when I went to look for where the operators for record *equals* were defined, I found that we had apparently chosen to leave them undocumented. Oddly, under a section titled "Row-wise Comparison" we only document the behavior of comparisons involving what the SQL spec calls <row value constructor>. I asked whether that was intentional, and heard only the chirping of crickets: http://www.postgresql.org/message-id/1378848776.70700.YahooMailNeo@web162902.mail.bf1.yahoo.com If we choose not to document the equals operator for records, it hardly makes sense to document the identical operator for records. > PostgreSQL has good enough introspection features that people > tend to find functions and operators using psql-s \d ... One would think so, yet I don't recall seeing anyone posting regarding the existing undocumented record comparison operators. Nor do I recall seeing anyone posting about the undocumented pattern comparison operators. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote: > One would think so, yet I don't recall seeing anyone posting > regarding the existing undocumented record comparison operators. > Nor do I recall seeing anyone posting about the undocumented > pattern comparison operators. I've used and have seen them being used in client code... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote: >> I don't recall seeing anyone posting >> regarding the existing undocumented record comparison operators. >> Nor do I recall seeing anyone posting about the undocumented >> pattern comparison operators. > > I've used and have seen them being used in client code... Which, the record operators or the text pattern operators (which ignore collations and multi-byte character considerations and just use memcmp())? http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/varlena.c;h=5e2c2ddc532c604a05f365f0cf6761033a35be76;hb=master#l1719 Is the fact that you have seen them used in client code even though they are not documented an argument for or against adding documentation for them? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-16 10:58:01 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote: > >> I don't recall seeing anyone posting > >> regarding the existing undocumented record comparison operators. > >> Nor do I recall seeing anyone posting about the undocumented > >> pattern comparison operators. > > > > I've used and have seen them being used in client code... > > Which, the record operators or the text pattern operators (which > ignore collations and multi-byte character considerations and just > use memcmp())? I was referring to the text pattern ops. But I've seen the record ones as well. > http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/varlena.c;h=5e2c2ddc532c604a05f365f0cf6761033a35be76;hb=master#l1719 > > Is the fact that you have seen them used in client code even though > they are not documented an argument for or against adding > documentation for them? Neither. It's an argument that providing operators with confusing behaviours will not go unnoticed/unused. I don't really think the record identical and pattern ops situations are comparable. The latter has a well defined behaviour (using the C locale basically) and is only useable for types where it's well defined. The proposed identical operator returns false for comparisons of actually equal data, that's quite different imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 16, 2013 at 12:46 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Hannu Krosing <hannu@2ndQuadrant.com> wrote: > >> What I meant is that rather than leave it really undocumented, >> document it as "system function for specific usage, has caveats >> and may change in future versions. use at your own risk and >> make sure you know what you are doing" > > Well, that was my original assumption and intention; but when I > went to look for where the operators for record *equals* were > defined, I found that we had apparently chosen to leave them > undocumented. Oddly, under a section titled "Row-wise Comparison" > we only document the behavior of comparisons involving what the SQL > spec calls <row value constructor>. I asked whether that was > intentional, and heard only the chirping of crickets: > > http://www.postgresql.org/message-id/1378848776.70700.YahooMailNeo@web162902.mail.bf1.yahoo.com > > If we choose not to document the equals operator for records, it > hardly makes sense to document the identical operator for records. > >> PostgreSQL has good enough introspection features that people >> tend to find functions and operators using psql-s \d ... > > One would think so, yet I don't recall seeing anyone posting > regarding the existing undocumented record comparison operators. > Nor do I recall seeing anyone posting about the undocumented > pattern comparison operators. This behavior came about via a gripe of mine (but mostly courtesy Tom Lane_: http://www.postgresql.org/message-id/6EE64EF3AB31D5448D0007DD34EEB34101AEF5@Herge.rcsinc.local It brought row-wise comparon closer- (if not exactly to-) SQL spec. The underlying use-case is to do ISAM-like record iteration over the index. "index" being the operative word. merlin
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-16 10:58:01 -0700, Kevin Grittner wrote: >> Andres Freund <andres@2ndquadrant.com> wrote: >>> On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote: >>>> I don't recall seeing anyone posting regarding the existing >>>> undocumented record comparison operators. Nor do I recall >>>> seeing anyone posting about the undocumented pattern >>>> comparison operators. >>> >>> I've used and have seen them being used in client code... >> >> Which, the record operators or the text pattern operators (which >> ignore collations and multi-byte character considerations and >> just use memcmp())? > > I was referring to the text pattern ops. But I've seen the record > ones as well. >> Is the fact that you have seen them used in client code even >> though they are not documented an argument for or against adding >> documentation for them? > > Neither. It's an argument that providing operators with confusing > behaviours will not go unnoticed/unused. It is true that there are likely to be some who find the new operators useful and will read the code to determine how to use them, just as they apparently did for the undocumented operators which already exist. > I don't really think the record identical and pattern ops > situations are comparable. The latter has a well defined > behaviour (using the C locale basically) and is only useable for > types where it's well defined. But it is *not* restricted to text in the C locale. It uses memcmp() on text values even if they contain multi-byte characters and use collations which require different sequencing. > The proposed identical operator returns false for comparisons of > actually equal data, That is the point of it. It tells you whether the record images are byte-for-byte identical for all values within the record. In some cases a detected difference clearly causes a user-visible difference; in others it is merely a performance or storage-space difference. For example in your array example the array with the unnecessary bitmap attached to it takes an extra eight bytes on disk, for no apparent benefit. The point is that code which wants to know whether those are identical images can detect that they are not. This works by default for all types, without having to track down which types are capable of doing this for "equal" values or in which cases the differences are user-visible. > that's quite different imo. Sorting a string based on its byte image, with no regard to its collation or character boundaries, is what these text pattern operators do, and it is exactly what my patch does. My patch just provides such a test for all data types within a record by default, rather than requiring a new opfamily for every type. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Merlin Moncure <mmoncure@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> I don't recall seeing anyone posting regarding the existing >> undocumented record comparison operators. > This behavior came about via a gripe of mine (but mostly courtesy > Tom Lane_: > http://www.postgresql.org/message-id/6EE64EF3AB31D5448D0007DD34EEB34101AEF5@Herge.rcsinc.local > > It brought row-wise comparon closer- (if not exactly to-) SQL > spec. The underlying use-case is to do ISAM-like record > iteration over the index. "index" being the operative word. Indeed. The documented behavior complies with the spec (and is extremely useful!), but only fully matches the behavior when one of the arguments is a row value constructor. The documentation sort of acknowledges this by describing the arguments not as "row" or "record", but as "row_constructor". Comparison operators which followed those exact rules would not be usable for indexing, so we have slightly different semantics to support that. I have no gripe whatsoever with any of that. I'm even OK with not documenting this behavior, since it might do more to confuse than elucidate. I did ask the question, because if we choose not to document the behavior of record comparisons based around the concept of "record equals", I don't think it makes sense to document record comparisons based around the concept of "record image equals". That is the terminology used in the function names and opfamily, but I have been using "identical" in discussion, which perhaps has confused the issue. If we take documentation of record comparisons farther, we need to be prepared to explain why this is correct (because it is required by spec): test=# select row(1, null::text) = row(1, null::text); ?column? ---------- (1 row) ... but this is also correct (because otherwise indexes don't work): test=# select row(1, null::text) = row(1, null::text)::record; ?column? ---------- t (1 row) -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 16, 2013 at 04:28:23PM +0200, Andres Freund wrote: > On 2013-09-15 19:49:26 -0400, Noah Misch wrote: > > Type-specific identity operators seem like overkill, anyway. If we find that > > meaningless variations in a particular data type are causing too many false > > non-matches for the generic identity operator, the answer is to make the > > functions generating datums of that type settle on a canonical form. That > > would be the solution for your example involving array null bitmaps. > > I think that's pretty much unrealistic. I am pretty sure that if either > of us starts looking we will find at about a dozen of such cases and > miss the other dozen. Not to speak about external code which is damn > likely to contain such cases. It wouldn't be a problem if we missed cases or declined to update known cases. The array example probably isn't worth changing. Who's going to repeatedly flip-flop an array column value between the two representations and then bemoan the resulting MV write traffic? > And I think that efficiency will often make such normalization expensive > (consider postgis where Datums afaik can exist with an internal bounding > box or without). If it's so difficult to canonicalize between two supposedly-identical variations, it's likewise a stretch to trust that all credible operations will fail to distinguish between those variations. > I think it's far more realistic to implement an identity operator that > will fall back to a type specific operator iff equals has "strange" > properties. Complicating such efforts, the author of a custom identity operator doesn't have the last word on functions that process the data type. Take your postgis example; if a second party adds a has_internal_bounding_box() function, an identity operator ignoring that facet is no longer valid. memcmp() has served well for HOT and for _equalConst(); why would it suddenly fall short for MV maintenance? nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
On 2013-09-16 16:58:21 -0400, Noah Misch wrote: > memcmp() has served well for HOT and for _equalConst(); why would it suddenly > fall short for MV maintenance? I don't have a problem using it internally, I have a problem exposing the capability to sql. Don't tell me that's the same. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-16 16:58:21 -0400, Noah Misch wrote: >> memcmp() has served well for HOT and for _equalConst(); why >> would it suddenly fall short for MV maintenance? > > I don't have a problem using it internally, I have a problem > exposing the capability to sql. ... like we do in *pattern ops and the suppress_redundant_updates_trigger() function? > Don't tell me that's the same. No, this gives users a way to make the same test that HOT uses for whether values match, albeit undocumented. Well, not exactly the same test, because this patch detoasts before comparing -- but pretty close. The question is, if it's unsafe for a user to make this test, why would it be safe for HOT to use it? I'm really having trouble understanding what problem you have with this. Can you describe a scenario where someone shoots themselves in the foot with it, because I'm not seeing any? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-16 14:39:30 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > On 2013-09-16 16:58:21 -0400, Noah Misch wrote: > >> memcmp() has served well for HOT and for _equalConst(); why > >> would it suddenly fall short for MV maintenance? > > > > I don't have a problem using it internally, I have a problem > > exposing the capability to sql. > > ... like we do in *pattern ops and the That's still an absurd comparison. pattern ops have a defined behaviour, even for multibyte characters. After all, you can simply have a UTF-8 database with C collation. Remember, in a C collation pattern ops can use normal indexes. The only thing that happens is that multibyte chars are sorted differently. They aren't sorted basically randomly or anything. > suppress_redundant_updates_trigger() function? You get superflous trigger calls. So what. It's not usable for anything but a trigger. > I'm really having trouble understanding what problem you have with > this. Can you describe a scenario where someone shoots themselves > in the foot with it, because I'm not seeing any? It certainly will be surprising to just about anyone that something like: SELECT * FROM foo WHERE whatever_composite_or_row === '(...)'; may not return rows where there's no SQL level discernible difference (even by looking at the text conversion) between whatever_composite_or_row and '(...)' because e.g. of the aforementioned array null bitmaps. I can understand claiming that the risk is acceptable but arguing it's not there seems extremly strange to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-09-16 23:58:46 +0200, Andres Freund wrote: > > suppress_redundant_updates_trigger() function? > > You get superflous trigger calls. So what. It's not usable for anything > but a trigger. Primarily unneccesary IO, not unneccessary trigger calls (which can also happen). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-16 14:39:30 -0700, Kevin Grittner wrote: >> Andres Freund <andres@2ndquadrant.com> wrote: >>> On 2013-09-16 16:58:21 -0400, Noah Misch wrote: >>>> memcmp() has served well for HOT and for _equalConst(); why >>>> would it suddenly fall short for MV maintenance? >>> >>> I don't have a problem using it internally, I have a problem >>> exposing the capability to sql. >> >> ... like we do in *pattern ops and the > > That's still an absurd comparison. pattern ops have a defined > behaviour, even for multibyte characters. After all, you can > simply have a UTF-8 database with C collation. Remember, in a C > collation pattern ops can use normal indexes. > The only thing that happens is that multibyte chars are sorted > differently. They aren't sorted basically randomly or anything. Neither are records using the new operator -- there is a deterministic sort order based on the bytes representing the record's values. >> suppress_redundant_updates_trigger() function? > > You get superflous trigger calls. So what. My feeling exactly. >> I'm really having trouble understanding what problem you have >> with this. Can you describe a scenario where someone shoots >> themselves in the foot with it, because I'm not seeing any? > > It certainly will be surprising to just about anyone that > something like: > > SELECT * FROM foo WHERE whatever_composite_or_row === '(...)'; > > may not return rows where there's no SQL level discernible > difference (even by looking at the text conversion) between > whatever_composite_or_row and '(...)' because e.g. of the > aforementioned array null bitmaps. Since the operator is bound to a function named record_image_eq(), it seems to me to be a feature that if the image isn't equal due to a bitmap being present in one and not the other it says they differ. It's not a bug or a problem. It gives you a way to *tell* whether two rows actually have the same representation, which could be valuable for debugging. The closest you can easily come without this is to see that pg_column_size() yields a different result for them, where that is true (as in the array bitmap case). > I can understand claiming that the risk is acceptable but arguing > it's not there seems extremly strange to me. It's not a risk. It's why the operator exists. Perhaps the name of the operator (===) or the fact that I've been using the shorthand description of "identical" instead of writing out "record image equals" in these emails has confused you. I can stop using the shorter description and have absolutely no attachment to the operator name, if that helps. The point of the operator is to determine whether two records, which may compare as "equal" (but which don't necessarily appear the same to a user), have the same byte image for each corresponding value. The point of the opfamily is to be able to do a MergeJoin on this basis. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote: > > I can understand claiming that the risk is acceptable but arguing > > it's not there seems extremly strange to me. > > It's not a risk. It's why the operator exists. Pft. It's fine if the materialized view code uses it. I don't see the danger there. It's about users discovering it. If they notice it, they will use it because "its a crapload faster" than normal row comparisons. And deals with NULLs in a "simpler" way. > Perhaps the name > of the operator (===) or the fact that I've been using the > shorthand description of "identical" instead of writing out "record > image equals" in these emails has confused you. If you really think that "confusing you" is the correct description for concerns about users not understanding limitations (which you didn't seem to know about), go ahead. Way to go to solicit feedback. > I can stop using > the shorter description and have absolutely no attachment to the > operator name, if that helps. You're unfortunately going to need operators if you want mergejoins to work, right? If not I'd have said name it matview_needs_update() or something like that... But yes, === certainly seems like a bad idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/16/2013 04:01 PM, Kevin Grittner wrote: > Hannu Krosing <hannu@2ndQuadrant.com> wrote: > >> Lots of people were bitten when (undocumented) hash >> functions were changed thus breaking hash-based partitioning. > Nobody can be affected by the new operators in this patch unless > they choose to use them to compare two records. They don't work > for any other type and they don't come into play unless you > specifically request them. > Maybe the binary equality operator should be named ==== for "really deeply equal" to distinguish it from === which would be merely NOT DISTINCT FROM we could even go one step further and define ===== to mean "the same". ? This could fit better in conceptual sequence of operator 'strength' -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote: >>> I can understand claiming that the risk is acceptable but >>> arguing it's not there seems extremly strange to me. >> >> It's not a risk. It's why the operator exists. > > Pft. It's fine if the materialized view code uses it. I don't see > the danger there. > It's about users discovering it. If they notice it, they will use > it because "its a crapload faster" than normal row comparisons. To have clean semantics, I think the operator should mean that the stored format of the row is the same. Regarding the array null bitmap example, I think it would be truly weird if the operator said that the stored format was the same, but this happened: test=# select pg_column_size(ARRAY[1,2,3]); pg_column_size ---------------- 36 (1 row) test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]); pg_column_size ---------------- 44 (1 row) They have the same stored format, but a different number of bytes?!? > And deals with NULLs in a "simpler" way. That might be addressed by leaving room to implement IS NOT DISTINCT FROM as an operator. See below. >> Perhaps the name of the operator (===) or the fact that I've >> been using the shorthand description of "identical" instead of >> writing out "record image equals" in these emails has confused >> you. > > If you really think that "confusing you" is the correct > description for concerns about users not understanding > limitations (which you didn't seem to know about), go ahead. Way > to go to solicit feedback. I see how that could be taken in a pejorative way; that was not intended. I apologize. I was rushing a bit on that email to make an appointment shortly afterward. The point was to suggest that bad communication on my part about the intent and concept of the operator may have contributed to you saying that there was a risk of it working as intended. There is perhaps a risk that someone will think that it does something other than what is intended, but when you say that there is a "risk" that it will behave exactly as intended, it does suggest that we're talking past each other. >> I can stop using the shorter description and have absolutely no >> attachment to the operator name, if that helps. > > You're unfortunately going to need operators if you want > mergejoins to work, right? Yes, operators are needed for that. > If not I'd have said name it matview_needs_update() or something > like that... But yes, === certainly seems like a bad idea. I've come to agree with that. The appointment was to meet with a local PostgreSQL user who I've talked to before. He reminded me that his pet peeve was that he couldn't use IS [ NOT ] DISTINCT FROM with the ALL | ANY construct because the IS [ NOT ] DISTINCT FROM predicate isn't an operator. Hannu Krosing <hannu@2ndQuadrant.com> wrote: > Maybe the binary equality operator should be named ==== > for "really deeply equal" > > to distinguish it from === which would be merely NOT DISTINCT FROM > > we could even go one step further and define ===== to mean "the same". > > ? > > This could fit better in conceptual sequence of operator 'strength' I'm inclined to go with this. It would leave the door open to implementing IS NOT DISTINCT FROM on a type-by-type basis. My one concern is that it doesn't make room for a "has no user-visible differences from" operator; but I'm not sure whether that is something that there really is any use for. But if we want to reserve name space for it "just in case" someone has a use for it, that would be between IS NOT DISTINCT FROM and "has the same storage representation as", so that would leave: = equals (but doesn't necessarily look the same as) === IS NOT DISTINCT FROM as an operator ==== reserved for "has no user-visible differences from" ===== stored image is the same Of course, that begs the question of whether == is already "taken". If not, we could knock one '=' off of everything above except for "equals". What existing uses are known for == ? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Of course, that begs the question of whether == is already "taken".
If not, we could knock one '=' off of everything above except for
"equals". What existing uses are known for == ?
== is already taken as a common typo in plpgsql scripts. I strongly prefer if this remained an error.
IF foo == bar THEN ...
IF foo == bar THEN ...
On 09/17/2013 02:46 PM, Rod Taylor wrote:
That was also my reason for not suggesting == .On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Of course, that begs the question of whether == is already "taken".
If not, we could knock one '=' off of everything above except for
"equals". What existing uses are known for == ?== is already taken as a common typo in plpgsql scripts. I strongly prefer if this remained an error.
IF foo == bar THEN ...
It is too widely used in other systems for simple equality check.
-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 09/17/2013 12:52 AM, Andres Freund wrote: > On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote: >>> I can understand claiming that the risk is acceptable but arguing >>> it's not there seems extremly strange to me. >> It's not a risk. It's why the operator exists. > Pft. It's fine if the materialized view code uses it. I don't see the > danger there. > It's about users discovering it. If they notice it, they will use it > because "its a crapload faster" than normal row comparisons. And deals > with NULLs in a "simpler" way. This is why I suggested naming the operator ==== with four '='s as this should make the user ponder why there are so many equal signs. And there are other languages where more equal signs mean "stronger" equality. Basically it is "equality with possible false negatives". If a user wants to use it for speeding up simple equality, it should be used as "WHERE t.* ==== sample AND t.* = sample" > >> Perhaps the name >> of the operator (===) or the fact that I've been using the >> shorthand description of "identical" instead of writing out "record >> image equals" in these emails has confused you. > If you really think that "confusing you" is the correct description for > concerns about users not understanding limitations (which you didn't > seem to know about), IIRC he did start with a visible limitation of = being too weak equality for some fields > go ahead. Way to go to solicit feedback. > >> I can stop using >> the shorter description and have absolutely no attachment to the >> operator name, if that helps. > You're unfortunately going to need operators if you want mergejoins to > work, right? If not I'd have said name it matview_needs_update() or > something like that... But yes, === certainly seems like a bad idea. but having === as an operator form of IS NOT DISTINCT FROM seems like a good idea to me. We will get an operator which most people expect, with sufficiently strange name to make people look up the docs and we will not be violating SQL spec. And it prepares their heads for stronger semantics of ==== :) > > Greetings, > > Andres Freund >
Kevin Grittner <kgrittn@ymail.com> writes: > = equals (but doesn't necessarily look the same as) > === IS NOT DISTINCT FROM as an operator > ==== reserved for "has no user-visible differences from" > ===== stored image is the same I understand the need for more than one equality operator and my programming language of choice exposes eq, eql, equal and equalp for solving a very similar situation. Oh, and also offers = and string= and string-equal, among others. My vote goes against using a different number of the same character to name the operators though, as that's not visually helpful enough IMO. In yet another language whose grammar took inspiration with Prolog, dealing with binary is made with explicitely using the bit syntax expression where any datum is boxed into << and >>. http://erlang.org/doc/reference_manual/expressions.html#id78513 It might not be practical to do the same in SQL and have either the following syntax example or even a unary "bit representation" operator to do the same: SELECT <<row(expression, here)>> = <<column_name>> … That would probably prevent using indexes? It might be inspiration to name the bit-comparison operator something like <<=>> though, or =><= or something else. If only we could force the usage of unicode for the SQL input itself… I'm sure there's a unicode glyph perfect for what we want here… About IS NOT DISTINCT FROM, I would propose something where NOT and DISTINCT are kept as concepts. Not usually is expressed with ! and we already have a bunch of "same as" or "matches" operators with ~, so: <~> IS DISTINCT FROM !<~> IS NOT DISTINCT FROM Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote: > To have clean semantics, I think the operator should mean that the > stored format of the row is the same. Regarding the array null > bitmap example, I think it would be truly weird if the operator > said that the stored format was the same, but this happened: > > test=# select pg_column_size(ARRAY[1,2,3]); > pg_column_size > ---------------- > 36 > (1 row) > > test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]); > pg_column_size > ---------------- > 44 > (1 row) > > They have the same stored format, but a different number of > bytes?!? Hmm. For most of this thread, I was leaning toward the view that comparing the binary representations was the wrong concept, and that we actually needed to have type-specific operators that understand the semantics of the data type. But I think this example convinces me otherwise. What we really want to do here is test whether two values are the same, and if you can feed two values that are supposedly the same to some function and get two different answers, well then they're not really the same, are they? Now, I grant that the array case is pretty weird. An array with an all-zeroes null bitmap is basically semantically identical to one with no null bitmap at all. But there are other such cases as well. You can have two floats that print the same way except when extra_float_digits=3, for example, and I think that's probably a difference that we *wouldn't* want to paper over. You can have a long-form numeric that represents a value that could have been represented as a short-form numeric, which is similar to the array case. There are probably other examples as well. But in each of those cases, the point is that there *is* some operation which will distinguish between the two supposedly-identical values, and therefore they are not identical for all purposes. Therefore, I see no harm in having an operator that tests for are-these-values-identical-for-all-purposes. If that's useful for RMVC, then there may be other legitimate uses for it as well. And once we decide that's OK, I think we ought to document it. Sure, it's a little confusing, but we can explain it, I think. It's a good opportunity to point out to people that, most of the time, they really want something else, like the equality operator for the default btree opclass. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > Therefore, I see no harm in > having an operator that tests for > are-these-values-identical-for-all-purposes. If that's useful for > RMVC, then there may be other legitimate uses for it as well. > > And once we decide that's OK, I think we ought to document it. Sure, > it's a little confusing, but we can explain it, I think. It's a good > opportunity to point out to people that, most of the time, they really > want something else, like the equality operator for the default btree > opclass. For my 2c on this, while this can be useful for *us*, and maybe folks hacking pretty close to PG, I can't get behind introducing this as an '===' or some such operator. I've missed why this can't be a simple function and why in the world we would want to encourage users to use this by making it look like a normal language construct of SQL, which damn well better consider numbers which are equal in value to be equal, regardless of their representation. What the heck is the use case for this being a user-oriented, SQL operator..? Thanks, Stephen
On 2013-09-18 11:50:23 -0400, Stephen Frost wrote: > For my 2c on this, while this can be useful for *us*, and maybe folks > hacking pretty close to PG, I can't get behind introducing this as an > '===' or some such operator. I've missed why this can't be a simple > function and why in the world we would want to encourage users to use > this by making it look like a normal language construct of SQL, which > damn well better consider numbers which are equal in value to be equal, > regardless of their representation. I certainly understand the feeling... I think this really needs to have an obscure name. Like ==!!== or somesuch (is equal very much, but doesn't actually test for equality ;)) > What the heck is the use case for this being a user-oriented, SQL > operator..? The materalized view code uses generated SQL, so it has to be SQL accessible. And it needs to be an operator because the join planning code requires that :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <kgrittn@ymail.com> wrote: >> To have clean semantics, I think the operator should mean that the >> stored format of the row is the same. Regarding the array null >> bitmap example, I think it would be truly weird if the operator >> said that the stored format was the same, but this happened: >> >> test=# select pg_column_size(ARRAY[1,2,3]); >> pg_column_size >> ---------------- >> 36 >> (1 row) >> >> test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]); >> pg_column_size >> ---------------- >> 44 >> (1 row) >> >> They have the same stored format, but a different number of >> bytes?!? > > Hmm. For most of this thread, I was leaning toward the view that > comparing the binary representations was the wrong concept, and that > we actually needed to have type-specific operators that understand the > semantics of the data type. > > But I think this example convinces me otherwise. What we really want > to do here is test whether two values are the same, and if you can > feed two values that are supposedly the same to some function and get > two different answers, well then they're not really the same, are > they? Right. Not only would the per-type solution make materialized views maintenance broken by default, requiring per-type work to make it work reasonably, with silent failures for any type you didn't know about, but "no user-visible differences" is a pretty slippery concept. Did you really think of all the functions someone might use to look at a value? Might there be performance differences we care about that should be handled, even if the user has no way to dig out the difference? Will that change in a future release? > Now, I grant that the array case is pretty weird. An array with an > all-zeroes null bitmap is basically semantically identical to one with > no null bitmap at all. But there are other such cases as well. You > can have two floats that print the same way except when > extra_float_digits=3, for example, and I think that's probably a > difference that we *wouldn't* want to paper over. You can have a > long-form numeric that represents a value that could have been > represented as a short-form numeric, which is similar to the array > case. There are probably other examples as well. But in each of > those cases, the point is that there *is* some operation which will > distinguish between the two supposedly-identical values, and therefore > they are not identical for all purposes. Therefore, I see no harm in > having an operator that tests for > are-these-values-identical-for-all-purposes. If that's useful for > RMVC, then there may be other legitimate uses for it as well. > > And once we decide that's OK, I think we ought to document it. That seems to be the consensus. I don't think we can really document this form of record comparison without also documenting how equality works. I'll work something up for the next version of the patch. > Sure, it's a little confusing, but we can explain it, I think. It's a good > opportunity to point out to people that, most of the time, they really > want something else, like the equality operator for the default btree > opclass. I think the hardest part will be documenting the difference between the row value constructor semantics (which are all that is currently documented) and the record equality semantics (used for sorting and building indexes). In a green field I think I would have argued for having just the standard semantics we have documented, and modifying our sort execution nodes and index builds to deal with that. This is one of those cases where the breakage from changing to that is hard to justify on a cleaner conceptual semantics basis. There also seems to be universal agreement that the operator names should be something other than what I put in the v1 patch, but we don't have agreement on what should be used instead. We need six operators, to support the btree am requirements. Currently the patch has: === !== >== >>> <== <<< Suggested "same as" operators so far are: ==== ===== <<=>> =><= Anyone want to champion one of those, or something else? How about the other five operators to go with your favorite? Keep in mind that this thread has also turned up strong support for an operator to express IS NOT DISTINCT FROM -- so that it can be used with ANY/ALL, among other things. Long term, having an opfamily for that might help us clean up the semantics of record comparison when there are NULLs involved. Currently we use the = operator but act as though IS NOT DISTINCT FROM was specified (except for some cases involving a row value constructor). Any serious discussion of that should probably move to a new thread, but I mention it here because some people wanted to reserve operator space for that, which could conflict with "same as" operators. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Andres Freund (andres@2ndquadrant.com) wrote: > I think this really needs to have an obscure name. Like ==!!== or > somesuch (is equal very much, but doesn't actually test for equality ;)) hah. > > What the heck is the use case for this being a user-oriented, SQL > > operator..? > > The materalized view code uses generated SQL, so it has to be SQL > accessible. And it needs to be an operator because the join planning > code requires that :( Ugh. This feels like a pretty ugly hack to deal with that. I haven't got any magical wand to address it, but making an SQL operator for 'are these really the same bytes' to deal with what is essentially implementation detail is _very_ grotty. Thanks, Stephen
* Kevin Grittner (kgrittn@ymail.com) wrote: > Right. Not only would the per-type solution make materialized views > maintenance broken by default, requiring per-type work to make it > work reasonably, with silent failures for any type you didn't know > about, but "no user-visible differences" is a pretty slippery > concept. I don't like those possibilities, of course, but I'm starting to wonder about this whole concept of looking at it byte-wise. If I'm following correctly, what we're looking at here is having a way for matviews to tell if these bytes are the same as those bytes, for the purpose of deciding to update the matview, right? Yet we can then have cases where the row isn't *actually* different from a value perspective, yet we're going to update it anyway because it's represented slightly differently? What happens if we later want to add support for users to have a matview trigger that's called when a matview row *actually* changes? We'd end up calling it on what are essentially false positives, or having to do some double-check later on "well, did it *really* change?", neither of which is good at all. If we had the IS NOT DISTINCT FROM operators discussed, would that work for this even if it isn't as performant? Or is there an issue with that? Thanks, Stephen
On Wed, Sep 18, 2013 at 10:59 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: >> I think this really needs to have an obscure name. Like ==!!== or >> somesuch (is equal very much, but doesn't actually test for equality ;)) > > hah. > >> > What the heck is the use case for this being a user-oriented, SQL >> > operator..? >> >> The materalized view code uses generated SQL, so it has to be SQL >> accessible. And it needs to be an operator because the join planning >> code requires that :( > > Ugh. This feels like a pretty ugly hack to deal with that. I haven't > got any magical wand to address it, but making an SQL operator for 'are > these really the same bytes' to deal with what is essentially > implementation detail is _very_ grotty. Having matviews using SQL expressible features is a *good* thing. Having a user accessible operator is nice to have (if for no other reason than to allow testing for which matview rows would be refreshed). I just don't understand what all the fuss is about except to make sure not to utilize an operator name that is better suited for other purposes. merlin
Stephen Frost <sfrost@snowman.net> wrote: > making an SQL operator for 'are these really the same bytes' to > deal with what is essentially implementation detail is _very_ > grotty. We already have some such operators, although Andres argues that comparing to that isn't fair because we at least know it is a string of characters; we're just ignoring character boundaries and collations. Some of the operators use for the existing byte comparison opclasses are: ~<~ ~<=~ ~>=~ ~>~ Go ahead and try them out with existing text values. Andres has said that he has seen these used in production systems. = and <> aren't listed above even though they do a byte-for-byte comparison because, well, I guess because we have chosen to treat two UTF8 strings which produce the same set of glyphs using different bytes as unequal. :-/ -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-18 11:06:13 -0500, Merlin Moncure wrote: > > Ugh. This feels like a pretty ugly hack to deal with that. I haven't > > got any magical wand to address it, but making an SQL operator for 'are > > these really the same bytes' to deal with what is essentially > > implementation detail is _very_ grotty. I know the feeling, but I don't have a better suggestion either, so... > Having matviews using SQL expressible features is a *good* thing. > Having a user accessible operator is nice to have (if for no other > reason than to allow testing for which matview rows would be > refreshed). I just don't understand what all the fuss is about except > to make sure not to utilize an operator name that is better suited for > other purposes. It's an externally exposed API with not easily understandable semantics that's not actually all that useful outside specific usecases. If we decide to change it we're creating an API breakage. And we get to deal with people saying it's broken because they don't understand the semantics. That said, I am ok with this if we use strange operator names and document that the semantics are complicated... ==!!< ==!!<= ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Merlin Moncure (mmoncure@gmail.com) wrote: > Having matviews using SQL expressible features is a *good* thing. Fine, then it should be implemented *using SQL*, which is based on *values*, not on how the value is represented in bits and bytes. > Having a user accessible operator is nice to have (if for no other > reason than to allow testing for which matview rows would be > refreshed). If it's not actually *changing* (wrt its value), then I'm not at all impressed with the notion that it's going to get updated anyway. Thanks, Stephen
* Kevin Grittner (kgrittn@ymail.com) wrote: > = and <> aren't listed above even though they do a byte-for-byte > comparison because, well, I guess because we have chosen to treat > two UTF8 strings which produce the same set of glyphs using > different bytes as unequal. :-/ I tend to side with Andres on this case actually- we're being asked to store specific UTF8 bytes by the end user. That is not the same as treating two different numerics which are the same *number* as different because they have different binary representations, which is entirely an internal-to-postgres consideration. Thanks, Stephen
On 09/18/2013 11:39 AM, Stephen Frost wrote: > * Kevin Grittner (kgrittn@ymail.com) wrote: >> = and <> aren't listed above even though they do a byte-for-byte >> comparison because, well, I guess because we have chosen to treat >> two UTF8 strings which produce the same set of glyphs using >> different bytes as unequal. :-/ > I tend to side with Andres on this case actually- we're being asked to > store specific UTF8 bytes by the end user. That is not the same as > treating two different numerics which are the same *number* as > different because they have different binary representations, which is > entirely an internal-to-postgres consideration. The problem is that there are datatypes (citext, postgis,...) that have defined = to return true when comparing two values that are different not just stored differently. Are you saying that matview's should update only when the = operator of the datatype returns false and if people don't like this behaviour they should fix the datatypes? > Thanks, > > Stephen
* Steve Singer (steve@ssinger.info) wrote: > The problem is that there are datatypes (citext, postgis,...) that > have defined = to return true when comparing two values that are > different not just stored differently. If the definition of the type is that they're equal, then they're equal. Certainly there are cases where this is really rather broken (particularly in the postgis case that you mention), but I don't think that means we should change our definition of equality to generally be "are the bytes the same"- clearly that'd lead to incorrect behavior in the NUMERIC case. > Are you saying that > matview's should update only when the = operator of the datatype > returns false and if people don't like this behaviour they should > fix the datatypes? imv, we are depending on the "=" operator to tell us when the values are equal, regardless of type. I have a hard time seeing how we can do anything else. The PostGIS situation is already broken when you consider UNIQUE constraints and, while it's unfortunate that they'd need to change their data type to fix that, I do feel it's on them to deal with it. Anyone can create an extension with their own data type which returns wrong data and results, it's not on us to figure out how to make those work even in the face of blatent violations like making "=" not actually mean "these values are the same". Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > If it's not actually *changing* (wrt its value), then I'm not at > all impressed with the notion that it's going to get updated > anyway. But PostgreSQL very specifically (and as far as I can tell *intentionally*) allows you to *change* a value and have it still be considered *equal*. The concept of equal values really means more like "equivalent" or "close enough" for common purposes. It very specifically does *not* mean the same value. As just one example, think how much easier the citext type would be to implement if it folded all values to lower case as they were input, rather than preserving the data as entered and considering different capitalizations as "equal". The notion that in PostgreSQL a value has not changed if the new value is equal to the old is just flat out wrong. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin,
On Wednesday, September 18, 2013, Kevin Grittner wrote:
On Wednesday, September 18, 2013, Kevin Grittner wrote:
Stephen Frost <sfrost@snowman.net> wrote:
> If it's not actually *changing* (wrt its value), then I'm not at
> all impressed with the notion that it's going to get updated
> anyway.
But PostgreSQL very specifically (and as far as I can tell
*intentionally*) allows you to *change* a value and have it still
be considered *equal*.
I'm curious where you're going with that- of course you can update a value and have the same value (and possibly the same byte representation) stored over the old.
The concept of equal values really means
more like "equivalent" or "close enough" for common purposes. It
very specifically does *not* mean the same value.
I'm really curious about your thoughts on unique indexes then. Should two numerics which are the same value but different byte representations be allowed in a unique index?
As just one example, think how much easier the citext type would be
to implement if it folded all values to lower case as they were
input, rather than preserving the data as entered and considering
different capitalizations as "equal".
If the type operator says they're equal, then I think we need to consider them as equal. If an update happens with a conditional of:
where col1 = 'Abc'
When col1 is 'ABC' using citext, should we still issue the update?
The notion that in PostgreSQL a value has not changed if the new
value is equal to the old is just flat out wrong.
The value *can* be changed to be equal to the existing value but that doesn't make the two values *not equal*.
Thanks,
Stephen
On 09/18/2013 05:53 PM, Andres Freund wrote: > On 2013-09-18 11:50:23 -0400, Stephen Frost wrote: >> For my 2c on this, while this can be useful for *us*, and maybe folks >> hacking pretty close to PG, I can't get behind introducing this as an >> '===' or some such operator. I've missed why this can't be a simple >> function and why in the world we would want to encourage users to use >> this by making it look like a normal language construct of SQL, which >> damn well better consider numbers which are equal in value to be equal, >> regardless of their representation. > I certainly understand the feeling... > > I think this really needs to have an obscure name. Like ==!!== or > somesuch (is equal very much, but doesn't actually test for equality ;)) In PostgreSQL equality can be "anything" :) In other words, we have "pluggable equality", so it is entirely feasible to have an opclass where binary equality is *the* equality the problem started with some "opclass equality" (case insensitive comparison) missing user-visible changes. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Stephen Frost <sfrost@snowman.net> wrote: > I don't think that means we should change our definition of > equality to generally be "are the bytes the same"- clearly that'd > lead to incorrect behavior in the NUMERIC case. Nobody is talking in any way, shape, or form about changing our concept of what is "equal". We're talking about recognizing that in PostgreSQL "equal" does *not* mean "the same". If we used the equal concept for determining what has changed, if someone was tracking numeric data without precision and scale so that they could track accuracy (by storing the correct number of decimal positions) the accuracy could not be replicated to a materialized view. Of course, streaming replication would replicate the change, but if '1.4' was stored in a column copied into a matview and they later updated the source to '1.40' the increase in accuracy would not flow to the matview. That would be a bug, not a feature. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Stephen Frost <sfrost@snowman.net> > Kevin Grittner wrote: >> Stephen Frost <sfrost@snowman.net> wrote: >> >>> If it's not actually *changing* (wrt its value), then I'm not >>> at all impressed with the notion that it's going to get updated >>> anyway. >> >> But PostgreSQL very specifically (and as far as I can tell >> *intentionally*) allows you to *change* a value and have it >> still be considered *equal*. > > I'm curious where you're going with that- of course you can > update a value and have the same value (and possibly the same > byte representation) stored over the old. The way I see it, you can update a column to a different value which will compare as equal. That's fine. Nobody wants to change that. But it is still not the same value. >> The concept of equal values really means >> more like "equivalent" or "close enough" for common purposes. It >> very specifically does *not* mean the same value. > > I'm really curious about your thoughts on unique indexes then. > Should two numerics which are the same value but different byte > representations be allowed in a unique index? Not if it is defined with the default opclass, which will use an equal operator. Of course, this patch would allow an index on a record to be defined with record_image_ops, in which case it would sort by the raw bytes in the values of the record. That's not going to be useful in very many places, which is why it would not be the default. You don't get that behavior unless you ask for it. See this docs page for a similar example related to complex numbers: http://www.postgresql.org/docs/current/interactive/xindex.html#XINDEX-EXAMPLE > If the type operator says they're equal, then I think we need to > consider them as equal. Absolutely. Two different values may be equal within an opclass. > If an update happens with a conditional of: > > where col1 = 'Abc' > > When col1 is 'ABC' using citext, should we still issue the > update? Absolutely not, because the update was requested in the case that the equality test was true. Yet if a row is updated to replace 'Abc' with 'ABC', then streaming replication should copy the "different but equal value" (it does), a normal view should now show 'ABC' (it does), and a refresh of a matview should cause the matview to show 'ABC' (it doesn't in git, but this patch would make that work). > The value *can* be changed to be equal to the existing value but > that doesn't make the two values *not equal*. Nobody has ever argued that they should be considered *not equal*. It's just about providing a way to recognize when two equal values *are not the same*. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote: >> If an update happens with a conditional of: >> >> where col1 = 'Abc' >> >> When col1 is 'ABC' using citext, should we still issue the >> update? > > Absolutely not, because the update was requested in the case that > the equality test was true. Sorry, as if this thread were not long enough, I misread that and gave the wrong answer. Yes, the equal operator was used and the equal operator for two citext values says those are equal, so the row *should* be updated. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > change, but if '1.4' was stored in a column copied into a matview > and they later updated the source to '1.40' the increase in > accuracy would not flow to the matview. That would be a bug, not a > feature. Maybe the answer to that use case is to use the seg extension? http://www.postgresql.org/docs/9.3/interactive/seg.html IOW, colour me unconvinced about that binary-equality opclass use case in MatViews. We are trusting the btree equality operator about everywhere in PostgreSQL and it's quite disturbing to be told that in fact we should not trust it. Would it make sense for you to produce a patch without the extra operators, behavior, opclass and opfamily here so that we can focus on the MatView parts of it? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> change, but if '1.4' was stored in a column copied into a matview >> and they later updated the source to '1.40' the increase in >> accuracy would not flow to the matview. That would be a bug, not a >> feature. > > Maybe the answer to that use case is to use the seg extension? > > http://www.postgresql.org/docs/9.3/interactive/seg.html You are arguing that we should provide lesser support for numeric columns (and who knows how many other types) in materialized views than we do in streaming replication, pg_dump, suppress_redundant_updates_trigger(), and other places? Why? > IOW, colour me unconvinced about that binary-equality opclass use case > in MatViews. We are trusting the btree equality operator about > everywhere in PostgreSQL and it's quite disturbing to be told that in > fact we should not trust it. Who said we should not trust it? I have said that equality in PostgreSQL does not mean the two values appear the same or behave the same in all cases -- the definer of the class gets to determine how many definitions of equality there are, and which (if any) is tied to the = operator name. That should not be news to anybody; it's in the documentation. I'm proposing a second definition of equality with a different operator name for comparing two records, without in any way disturbing the existing definition. I am taken completely by surprise that in this case creating a second opclass for something is somehow controversial. The documentation I cited previously provides a clear example of another case where two completely different concepts of equality for a type are useful. We have, as a community, gone to a fair amount of trouble to make the concept of equality pluggable and allow multiple types of equality per type. To me it seems the perfect tool to solve this problem. Why the fuss? > Would it make sense for you to produce a patch without the extra > operators, behavior, opclass and opfamily here so that we can focus on > the MatView parts of it? No, matviews cannot be fixed without the new operators. Here are the stats on the patch: kgrittn@Kevin-Desktop:~/pg/master$ git diff --stat master..matview contrib/citext/expected/citext.out | 41 +++ contrib/citext/expected/citext_1.out | 41 +++ contrib/citext/sql/citext.sql | 23 ++ src/backend/commands/matview.c | 7 +- src/backend/utils/adt/rowtypes.c | 482 ++++++++++++++++++++++++++ src/include/catalog/pg_amop.h | 10 + src/include/catalog/pg_amproc.h | 1 + src/include/catalog/pg_opclass.h | 1 + src/include/catalog/pg_operator.h | 14 + src/include/catalog/pg_opfamily.h | 1 + src/include/catalog/pg_proc.h | 12 +- src/include/utils/builtins.h | 7 + src/test/regress/expected/opr_sanity.out | 7 +- 13 files changed, 642 insertions(+), 5 deletions(-) The changes to matview.c are the only ones that are matview-specific. Basically, that consists of using the new operator instead of = in a couple places. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/18/2013 09:19 PM, Dimitri Fontaine wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> change, but if '1.4' was stored in a column copied into a matview >> and they later updated the source to '1.40' the increase in >> accuracy would not flow to the matview. That would be a bug, not a >> feature. > Maybe the answer to that use case is to use the seg extension? > > http://www.postgresql.org/docs/9.3/interactive/seg.html > > IOW, colour me unconvinced about that binary-equality opclass use case > in MatViews. We are trusting the btree equality operator about > everywhere in PostgreSQL and it's quite disturbing to be told that in > fact we should not trust it. The problem is, that in this case the simple VIEW and MATVIEW would yield different results. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 09/18/2013 09:41 PM, Kevin Grittner wrote: > Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: >> Kevin Grittner <kgrittn@ymail.com> writes: >>> change, but if '1.4' was stored in a column copied into a matview >>> and they later updated the source to '1.40' the increase in >>> accuracy would not flow to the matview. That would be a bug, not a >>> feature. >> Maybe the answer to that use case is to use the seg extension? >> >> http://www.postgresql.org/docs/9.3/interactive/seg.html > You are arguing that we should provide lesser support for numeric > columns (and who knows how many other types) in materialized views > than we do in streaming replication, pg_dump, > suppress_redundant_updates_trigger(), and other places? Why? > >> IOW, colour me unconvinced about that binary-equality opclass use case >> in MatViews. We are trusting the btree equality operator about >> everywhere in PostgreSQL and it's quite disturbing to be told that in >> fact we should not trust it. > Who said we should not trust it? I have said that equality in > PostgreSQL does not mean the two values appear the same or behave > the same in all cases -- the definer of the class gets to determine > how many definitions of equality there are, and which (if any) is > tied to the = operator name. That should not be news to anybody; > it's in the documentation. I'm proposing a second definition of > equality with a different operator name for comparing two records, > without in any way disturbing the existing definition. Basically what proposed === does is "is guaranteed to be equal". If it is not *guaranteed* it is safe to re-evaluate, either using equal or something else. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 09/18/2013 06:05 PM, Stephen Frost wrote: > * Kevin Grittner (kgrittn@ymail.com) wrote: >> Right. Not only would the per-type solution make materialized views >> maintenance broken by default, requiring per-type work to make it >> work reasonably, with silent failures for any type you didn't know >> about, but "no user-visible differences" is a pretty slippery >> concept. > I don't like those possibilities, of course, but I'm starting to wonder > about this whole concept of looking at it byte-wise. If I'm following > correctly, what we're looking at here is having a way for matviews to > tell if these bytes are the same as those bytes, for the purpose of > deciding to update the matview, right? Basically what is needed is to check if the rowm *might* have changed, so the new value, which may or may not be "equal" would be refreshed into matview. > Yet we can then have cases where > the row isn't *actually* different from a value perspective, yet we're > going to update it anyway because it's represented slightly differently? > > What happens if we later want to add support for users to have a matview > trigger that's called when a matview row *actually* changes? We'd end > up calling it on what are essentially false positives, or having to do > some double-check later on "well, did it *really* change?", neither of > which is good at all. If we had the IS NOT DISTINCT FROM operators > discussed, would that work for this even if it isn't as performant? Or > is there an issue with that? IS NOT DISTINCT solves the problem with weird equality of NULLS and nothing else, so it would not help here. What the proposed operator family solves is "possiby changed" Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 09/18/2013 05:54 PM, Kevin Grittner wrote: > ... > I think the hardest part will be documenting the difference between > the row value constructor semantics (which are all that is > currently documented) and the record equality semantics (used for > sorting and building indexes). In a green field I think I would > have argued for having just the standard semantics we have > documented, and modifying our sort execution nodes and index builds > to deal with that. This is one of those cases where the breakage > from changing to that is hard to justify on a cleaner conceptual > semantics basis. > > There also seems to be universal agreement that the operator names > should be something other than what I put in the v1 patch, but we > don't have agreement on what should be used instead. We need six > operators, to support the btree am requirements. Currently the > patch has: > > === !== >== >>> <== <<< > > Suggested "same as" operators so far are: > > ==== > ===== > <<=>> > =><= > > Anyone want to champion one of those, or something else? How about > the other five operators to go with your favorite? ANother take would be using "possibly unequal" for this with operator defined as *==* (*definitely* equal , or guaranteed to be equal) the inequality operator would thus become !== (may be not equal) and ordering ops would be ?<==(maybe smaller or equal), ?<(maybe smaller) and same for larger ?>== and ?> as a table *==* "binary equal, surely very equal by any other definition as wall" !==? "maybe not equal" -- "binary inequal, may still be equal by other comparison methods" <==? "binary smaller or equal, may be anything by other comparison methods" <? >==? >? Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 09/18/2013 07:53 PM, Stephen Frost wrote: > > I'm really curious about your thoughts on unique indexes then. Should > two numerics which are the same value but different byte > representations be allowed in a unique index? You could have multiple btree opclasses defined which would enforce different kind of "uniqueness" For example you could have an opclass which considers two strings "equal" if four first bytes are equal. If you would create an unique index using that opclass you could not have both "industrial" and "induction" as primary keys as the same time, as the unique index would consider them equal. But you would still want to see the change in your matview after you do UPDATE mytable set id = 'industrial' where id = 'induction'; Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Kevin Grittner <kgrittn@ymail.com> writes: > You are arguing that we should provide lesser support for numeric > columns (and who knows how many other types) in materialized views > than we do in streaming replication, pg_dump, > suppress_redundant_updates_trigger(), and other places? Why? Because you're saying that you need SQL semantics, and probably because I'm not understanding well enough the problem you're trying to solve. > We have, as a community, gone to a fair amount of trouble to make > the concept of equality pluggable and allow multiple types of > equality per type. To me it seems the perfect tool to solve this > problem. Why the fuss? Because I don't understand why you need another equality than the default btree one, certainly. The other opclass, to my knowledge, are only used in relation with index searches, that is when comparing heap or input values with indexed values, right? > No, matviews cannot be fixed without the new operators. Here are > the stats on the patch: Ok, then someone (preferably a commiter) need to understand the problem at hand in a better way than I do now, I guess. If possible I will read through your patch, I'm curious now. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 09/19/2013 12:55 AM, Dimitri Fontaine wrote: >> We have, as a community, gone to a fair amount of trouble to make >> > the concept of equality pluggable and allow multiple types of >> > equality per type. To me it seems the perfect tool to solve this >> > problem. Why the fuss? > Because I don't understand why you need another equality than the > default btree one, certainly. Basically because 'word'::citext and 'Word'::citext are the same to your brain but not to your eyeballs. Unique indexes, for example, only need to please your brain. Matviews need to please your eyeballs. -- Vik
Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> You are arguing that we should provide lesser support for numeric >> columns (and who knows how many other types) in materialized views >> than we do in streaming replication, pg_dump, >> suppress_redundant_updates_trigger(), and other places? Why? > > Because you're saying that you need SQL semantics, and probably because > I'm not understanding well enough the problem you're trying to solve. There are examples in the patch and this thread, but rather than reference back to those I'll add a new one. Without the patch: test=# CREATE TABLE nt (id serial PRIMARY KEY, grp citext, num numeric); CREATE TABLE test=# INSERT INTO nt (grp, num) VALUES test-# ('one', '1.0'), test-# ('one', '2.0'), test-# ('two', '123.000'); INSERT 0 3 test=# CREATE VIEW nv AS SELECT grp, sum(num) AS num FROM nt GROUP BY grp; CREATE VIEW test=# SELECT * FROM nv ORDER BY grp; grp | num -----+--------- one | 3.0 two | 123.000 (2 rows) test=# CREATE MATERIALIZED VIEW nm AS SELECT grp, sum(num) AS num FROM nt GROUP BY grp; SELECT 2 test=# CREATE UNIQUE INDEX nm_id ON nm (grp); CREATE INDEX test=# SELECT * FROM nm ORDER BY grp; grp | num -----+--------- one | 3.0 two | 123.000 (2 rows) test=# UPDATE nt SET grp = 'Two', num = '123.0000' WHERE id = 3; UPDATE 1 test=# REFRESH MATERIALIZED VIEW CONCURRENTLY nm; REFRESH MATERIALIZED VIEW test=# SELECT * FROM nv ORDER BY grp; grp | num -----+---------- one | 3.0 Two | 123.0000 (2 rows) test=# SELECT * FROM nm ORDER BY grp; grp | num -----+--------- one | 3.0 two | 123.000 (2 rows) The problem, as I see it, is that the view and the concurrently refreshed materialized view don't yield the same results for the same query. The rows are equal, but they are not the same. With the patch the matview, after RMVC, looks just the same as the view. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > There are examples in the patch and this thread, but rather than > reference back to those I'll add a new one. Without the patch: Thanks much for doing that. > The problem, as I see it, is that the view and the concurrently > refreshed materialized view don't yield the same results for the > same query. The rows are equal, but they are not the same. With > the patch the matview, after RMVC, looks just the same as the view. My understanding is that if you choose citext then you don't care at all about the case, so that the two relations actually yield the same results for the right definition of "same" here: the citext one. In other words, the results only look different in ways that don't matter for the datatype involved, and I think that if it matters to the user then he needs to review is datatype choices or view definition. So my position on that is that your patch is only adding confusion for no benefits that I'm able to understand. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 09/19/2013 10:54 AM, Dimitri Fontaine wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> There are examples in the patch and this thread, but rather than >> reference back to those I'll add a new one. Without the patch: > Thanks much for doing that. > >> The problem, as I see it, is that the view and the concurrently >> refreshed materialized view don't yield the same results for the >> same query. The rows are equal, but they are not the same. With >> the patch the matview, after RMVC, looks just the same as the view. > My understanding is that if you choose citext then you don't care at all > about the case, so that the two relations actually yield the same > results for the right definition of "same" here: the citext one. > > In other words, the results only look different in ways that don't > matter for the datatype involved, and I think that if it matters to the > user then he needs to review is datatype choices or view definition. > > So my position on that is that your patch is only adding confusion for > no benefits that I'm able to understand. The aim is to have a view and materialized view return the same results. If they do not, the user is guaranteed to be confused and to consider the matview implementation broken the patch solves the general problem of "when the table changes, refresh" After saying it like this, the problem could also be solved by including xmin(s) for rows from underlying table(s)in the matview. Would this be a better approach ? -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> The problem, as I see it, is that the view and the concurrently >> refreshed materialized view don't yield the same results for the >> same query. The rows are equal, but they are not the same. >> With the patch the matview, after RMVC, looks just the same as >> the view. > > My understanding is that if you choose citext then you don't care > at all about the case That's not my understanding. If that was what citext was for it would be much simpler to force the case in creating each value. It *preserves* the case for display, but ignores it for comparisons. That's the contract of the type, like it or not. "Equal" does not mean "the same". They clearly want to preserve and display differences among equal values. Streaming replication would preserve the differences. pg_dump and restore of the data would preserve the differences. SELECTing the data shows the differences. suppress_redundant_updates_trigger() will not suppress an UPDATE setting an "equal" value unless it is also *the same*. A normal VIEW shows the differences. RMV without CONCURRENTLY will show the difference. I'm suggesting that RMVC and the upcoming incremental update of matviews should join this crowd. A matview which is being refreshed with the CONCURRENTLY option, or maintained by incremental maintenance should not look different from a regular VIEW, and should not suddenly look completely different after a non-concurrent REFRESH. If things are left that way, people will quite justifiably feel that matviews are broken. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-19 05:33:22 -0700, Kevin Grittner wrote: > Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > > Kevin Grittner <kgrittn@ymail.com> writes: > > >> The problem, as I see it, is that the view and the concurrently > >> refreshed materialized view don't yield the same results for the > >> same query. The rows are equal, but they are not the same. > >> With the patch the matview, after RMVC, looks just the same as > >> the view. > > > > My understanding is that if you choose citext then you don't care > > at all about the case > > That's not my understanding. If that was what citext was for it > would be much simpler to force the case in creating each value. It > *preserves* the case for display, but ignores it for comparisons. > That's the contract of the type, like it or not. "Equal" does not > mean "the same". They clearly want to preserve and display > differences among equal values. I agree. I am not 100% sure if the can of worms this opens is worth the trouble, but from my POV it's definitely an understandable and sensible goal. My complaints about this "subfeature" were never about trying to get that right for matviews... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hannu Krosing <hannu@2ndQuadrant.com> wrote: > the patch solves the general problem of "when the table changes, > refresh" > > After saying it like this, the problem could also be solved by > including xmin(s) for rows from underlying table(s)in the > matview. > > Would this be a better approach ? Now you're moving from REFRESH territory into incremental maintenance. There have been a very large number of papers written on that topic, I have reviewed the literature, and I have picked a technique which looks like it will be fast and reliable -- based on relational algebra. Let's save discussion of alternatives such as you're suggesting here, for when I get past the easy stuff ... like just refreshing a view so that the new contents are the same as what you would see if you re-ran the query defining the matview. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/19/2013 02:41 PM, Kevin Grittner wrote: > Hannu Krosing <hannu@2ndQuadrant.com> wrote: > >> the patch solves the general problem of "when the table changes, >> refresh" >> >> After saying it like this, the problem could also be solved by >> including xmin(s) for rows from underlying table(s)in the >> matview. >> >> Would this be a better approach ? > Now you're moving from REFRESH territory into incremental > maintenance. There have been a very large number of papers written > on that topic, I have reviewed the literature, and I have picked a > technique which looks like it will be fast and reliable -- based on > relational algebra. Let's save discussion of alternatives such as > you're suggesting here, for when I get past the easy stuff ... like > just refreshing a view so that the new contents are the same as > what you would see if you re-ran the query defining the matview. I'm sure that comparing xmin records would work exactly similar to binary comparisons of records for detecting possible change in 99.999% of real-world cases. I am also pretty sure it would have its own cans of worms all over again when trying to actually implement this in matviews :) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Wed, Sep 18, 2013 at 7:29 PM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 09/19/2013 12:55 AM, Dimitri Fontaine wrote: >>> We have, as a community, gone to a fair amount of trouble to make >>> > the concept of equality pluggable and allow multiple types of >>> > equality per type. To me it seems the perfect tool to solve this >>> > problem. Why the fuss? >> Because I don't understand why you need another equality than the >> default btree one, certainly. > > Basically because 'word'::citext and 'Word'::citext are the same to your > brain but not to your eyeballs. > > Unique indexes, for example, only need to please your brain. Matviews > need to please your eyeballs. Right, and well said. It's perfectly reasonable to want a "unique" index that doesn't allow both "Kevin O'leary" and "Kevin O'Leary" to be listed in the "irish_names" table, but it's also perfectly reasonable to want to remember that the user who entered the data spelled it the second way and not the first. And it's also reasonable to want any corrections made to the table to propagate to any materialized views defined over it. The fact that we have multiple concepts of equality can be confusing, but it's not for no reason, and we're not the only database or programming language to have such a concept. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/12/2013 06:27 PM, Kevin Grittner wrote: > Attached is a patch for a bit of infrastructure I believe to be > necessary for correct behavior of REFRESH MATERIALIZED VIEW > CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. There has been a heated discussion on list about if we even want this type of operator. I will try to summarize it here The arguments against it are * that a record_image_identical call on two records that returns false can still return true under the equality operator, and the records might (or might not) appear to be the same to users * Once we expose this as an operator via SQL someone will find it and use it and then complain when we change things such as the internal representation of a datatype which might break there queries * The differences between = and record_image_identical might not be understood by everywhere who finds the operator leading to confusion * Using a criteria other than equality (the = operator provided by the datatype) might cause problems if we later provide 'on change' triggers for materialized views The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* "binary equal, surely very equal by any other definition as wall" !==? "maybe not equal" -- "binary inequal, may still be equal by other comparison methods" and no one has yet objected to this. My vote would be to update the patch to use those operator names and then figure outwhere we can document them. It it means we have to section describing the equal operator for records as well then maybewe should do that so we can get on with things. Code Review -------------------- The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could have the meet of thesefunctions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based ona parameter that controls if it detoasts the tuple for the compare or returns false on the equals. Beyond that I don't see any issues with the code. You call out a question about two records being compared have a different number of columns should it always be an error,or only an error when they match on the columns they do have in common. The current behaviour is select * FROM r1,r2 where r1<==r2; a | b | a | b | c ---+---+---+---+--- 1 | 1 | 1 | 2 | 1 (1 row) update r2 set b=1; UPDATE 1 test=# select * FROM r1,r2 where r2<==r1; ERROR: cannot compare record types with different numbers of columns This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values ofthe data in the column. If I'm a user testing code that compares two records with this operator and I test my query with1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same recordtype.
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: > I think there is agreement that better (as in more obscure) > operators than === and !== need to be picked and we need to find a > place in the user-docs to warn users of the behaviour of this > operators. Hannu has proposed > > *==* "binary equal, surely very equal by any other definition as wall" > !==? "maybe not equal" -- "binary inequal, may still be equal by > other comparison methods" It's a pity operators must be non-alpha and can't be named. Something like: SELECT foo OPERATOR("byte_equivalent") bar; is simultaneously obscure, yet clear. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
* Kevin Grittner (kgrittn@ymail.com) wrote: > ... like > just refreshing a view so that the new contents are the same as > what you would see if you re-ran the query defining the matview. I've heard this notion a few times of wanting to make sure that what you get from running the query matches the matview. While that makes sense when the equality operator and what-you-see actually match, it doesn't when they don't because the what-you-see ends up being non-deterministic and can change based on the order the datums are seen in during the query processing which can change with different data ordering on disk and even due to simply different plans for the same data, I believe. Consider a GROUP BY with a citext column as one of the key fields. You're going to get whatever value the aggregate happened to come across first when building the HashAgg. Having the binary equality operator doesn't help that and seems like it could even result in change storms happening due to a different plan when the actual data didn't change. Thanks, Stephen
Steve Singer <steve@ssinger.info> wrote: > On 09/12/2013 06:27 PM, Kevin Grittner wrote: >> Attached is a patch for a bit of infrastructure I believe to be >> necessary for correct behavior of REFRESH MATERIALIZED VIEW >> CONCURRENTLY as well as incremental maintenance of matviews. > > Here is attempt at a review of the v1 patch. Thanks for looking at it. > There has been a heated discussion on list about if we even want this > type of operator. It's not a question of whether we want to allow operators which define comparisons between objects in a non-standard way. We already have 11 such cases; this would add one more to make 12. In all cases there are different operators for making a comparison that return potentially different results from the default operators, to support uses that are specific to that type. > I think there is agreement that better (as in more obscure) operators > than === and !== need to be picked and we need to find a place in the > user-docs to warn users of the behaviour of this operators. Hannu has > proposed > > *==* "binary equal, surely very equal by any other definition as > wall" > !==? "maybe not equal" -- "binary inequal, may still be > equal by > other comparison methods" > > and no one has yet objected to this. I do. The issue is that there are a great many places that there are multiple definitions of equality. We generally try to use something which tries to convey something about the nature of the operation, since the fact that it can have different results is a given. There is nothing in those operators that says "binary image comparison". I thought about prepending a hash character in front of normal operators, because that somehow suggests binary operations to my eye (although I have no idea whether it does so for anyone else), but those operators are already used for an alternative set of comparisons for intervals, with a different meaning. I'm still trying to come up with something I really like. > My vote would be to update the patch to > use those operator names and then figure out where we can document them. It it > means we have to section describing the equal operator for records as well then > maybe we should do that so we can get on with things. > Code Review > -------------------- > > The record_image_eq and record_image_cmp functions are VERY similar. It seems > to me that you could have the meet of these functions into a common function > (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots > based on a parameter that controls if it detoasts the tuple for the compare or > returns false on the equals. Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? > Beyond that I don't see any issues with the code. > > You call out a question about two records being compared have a different number > of columns should it always be an error, or only an error when they match on the > columns they do have in common. > > The current behaviour is > > select * FROM r1,r2 where r1<==r2; > a | b | a | b | c > ---+---+---+---+--- > 1 | 1 | 1 | 2 | 1 > (1 row) > > update r2 set b=1; > UPDATE 1 > test=# select * FROM r1,r2 where r2<==r1; > ERROR: cannot compare record types with different numbers of columns > > This seems like a bad idea to me. I don't like that I get a type comparison > error only sometimes based on the values of the data in the column. If I'm > a user testing code that compares two records with this operator and I test my > query with 1 value pair then and it works then I'd naively expect to get a > true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators. test=# create table r1 (c1 int, c2 int); CREATE TABLE test=# create table r2 (c1 int, c2 int, c3 int); CREATE TABLE test=# insert into r1 values (1, 2); INSERT 0 1 test=# insert into r2 values (3, 4, 5); INSERT 0 1 test=# select * from r1 join r2 on r1 < r2; c1 | c2 | c1 | c2 | c3 ----+----+----+----+---- 1 | 2 | 3 | 4 | 5 (1 row) test=# update r1 set c1 = 3, c2 = 4; UPDATE 1 test=# select * from r1 join r2 on r1 < r2; ERROR: cannot compare record types with different numbers of columns Would be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Steve, Thanks for providing a summary. * Steve Singer (steve@ssinger.info) wrote: > The arguments for this patch are > * We want the materialized view to return the same value as would be > returned if the query were executed directly. This not only means > that the values should be the same according to a datatypes = > operator but that they should appear the same 'to the eyeball'. With the cases where the equality operator doesn't match the 'to the eyeball' appearance, this really seems to be a pipe dream to me, unless we *also* define an ordering or similar which would then change the existing semantics for end users (which might be reasonable), but I'm not sure how we could do that without input from the type- iow, I don't think we can say "well, we'll order by byte representation". It's also going to possibly be a performance hit since we're going to have to do both an equality op call during a HashAgg/HashJoin/whatever and have to do a "which one is bigger of these two equal things", and then I wonder if we'd need to allow users to somehow specify "I don't want the bigger of the equal things, I want the smaller, in this query for this equality check." I'd really like to see how we're going to provide for that when the user is doing a GROUP BY without breaking something else or causing problems with later SQL spec revisions. > * Supporting the materialized view refresh check via SQL makes a lot > of sense but doing that requires exposing something via SQL ... which we don't currently expose for the queries that we already support users running today. Users seem to generally be accepting of that too, even though what they end up with in the output isn't necessairly consistent from query to query. The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. > * If we adequately document what we mean by record_image_identical > and the operator we pick for this then users shouldn't be surprised > at what they get if they use this That's a bit over-simplistic, really. We do try to keep to the POLA (principle of least astonishment) and that's not going to be easy here. > I think there is agreement that better (as in more obscure) > operators than === and !== need to be picked and we need to find a > place in the user-docs to warn users of the behaviour of this > operators. Hannu has proposed I'm a bit on the fence about this, after having discussed my concerns with Robert at PostgresOpen. If we're going to expose and support these at the SQL level, and we can figure out some semantics and consistency for using them that isn't totally baffling, then perhaps having them as simple and clear operator names would be reasonable. One concern here is if the SQL spec might decide some day that '===' is a useful operator for something else (perhaps even "they look the same when cast to text"). > *==* "binary equal, surely very equal by any other definition as wall" > !==? "maybe not equal" -- "binary inequal, may still be equal by > other comparison methods" Those look alright to me also though, but we'd need to work out the other operation names, right? And then figure out if we want to use those other operators (less-than, greater-than, etc) when doing equality operations to figure out which equal value to return to the user. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > * Kevin Grittner (kgrittn@ymail.com) wrote: >> ... like >> just refreshing a view so that the new contents are the same as >> what you would see if you re-ran the query defining the matview. > > I've heard this notion a few times of wanting to make sure that what you > get from running the query matches the matview. While that makes sense > when the equality operator and what-you-see actually match, it doesn't > when they don't because the what-you-see ends up being non-deterministic > and can change based on the order the datums are seen in during the > query processing which can change with different data ordering on disk > and even due to simply different plans for the same data, I believe. That's a fair point to some extent. Where notions of equality differ, it is not always non-deterministic, but it can be. For citext you are correct. For a sum() of numeric data, the number of decimal positions will be the largest value seen; the value present in the query results will not vary by order of rows scanned or by plan. The result of this is that with the patch, an incremental refresh of a matview will always match what the query returned at the time it was run (there is no *correctness* problem) but if someone uses a query with non-deterministic results they may have a lot of activity on a concurrent refresh even if there was no change to the underlying data -- so you could have a *performance* penalty in cases where the query returns something different, compared to leaving the old "equal but not the same" results. > Consider a GROUP BY with a citext column as one of the key fields. > You're going to get whatever value the aggregate happened to come across > first when building the HashAgg. Having the binary equality operator > doesn't help that and seems like it could even result in change storms > happening due to a different plan when the actual data didn't change. Yup. A person who wants to GROUP BY a citext value for a matview might want to fold it to a consistent capitalization to avoid that issue. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Stephen Frost <sfrost@snowman.net> wrote: > The issue here is that we're trying to make the mat view look > like what the query would do when run at the same time, which is > a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Friday, September 20, 2013, Kevin Grittner wrote:
Stephen Frost <sfrost@snowman.net> wrote:
> The issue here is that we're trying to make the mat view look
> like what the query would do when run at the same time, which is
> a bit of a losing battle, imv.
If we're not doing that, I don't see the point of matviews.
When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> wrote: > On Friday, September 20, 2013, Kevin Grittner wrote: >> Stephen Frost <sfrost@snowman.net> wrote: >> >>> The issue here is that we're trying to make the mat view look >>> like what the query would do when run at the same time, which >>> is a bit of a losing battle, imv. >> >> If we're not doing that, I don't see the point of matviews. > > When taken out of context, I can see how that might not come > across correctly. I was merely pointing out that it's a losing > battle in the context of types which have equality operators > which claim equalness but cast to text with results that don't > match there. I think the patch I've submitted wins that battle. The only oddity is that if someone uses a query for a matview which can provide results with rows which are equal to previous result rows under the default record opclass but different under this patch's opclass, RMVC could update to the latest query results when someone thinks that might not be necessary. Workarounds would include using a query with deterministic results (like using the upper() or lower() function on a grouped citext column in the result set) or not using the CONCURRENTLY option. Neither seems too onerous. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > The result of this is that with the patch, an incremental refresh > of a matview will always match what the query returned at the time > it was run (there is no *correctness* problem) but if someone uses > a query with non-deterministic results they may have a lot of > activity on a concurrent refresh even if there was no change to the > underlying data -- so you could have a *performance* penalty in > cases where the query returns something different, compared to > leaving the old "equal but not the same" results. You mean 'at the time of the incremental refresh', right? Yet that may or may not match running that query directly by an end-user as the plan that a user might get for the entire query could be different than what is run for an incremental update, or due to statistics updates, etc. > > Consider a GROUP BY with a citext column as one of the key fields. > > You're going to get whatever value the aggregate happened to come across > > first when building the HashAgg. Having the binary equality operator > > doesn't help that and seems like it could even result in change storms > > happening due to a different plan when the actual data didn't change. > > Yup. A person who wants to GROUP BY a citext value for a matview > might want to fold it to a consistent capitalization to avoid that > issue. I'm trying to figure out why that's a perfectly acceptable solution for users running views with GROUP BYs, but apparently it isn't sufficient for mat views? In other words, you would suggest telling users "sorry, you can't rely on the value returned by a GROUP BY on that citext column using a normal view, but we're going to try and do better for mat views". I don't intend the above to imply that we should never update values in mat views when we can do so in a reliable way to ensure that the value matches what a view would return. This matches our notion of UPDATE, where we will still UPDATE a value even if the old value and the new value are equal according to the type's equality operator, when the conditional for the UPDATE is using a reliable type (eg: integer). Thanks, Stephen
On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: > I'm trying to figure out why that's a perfectly acceptable solution for > users running views with GROUP BYs, but apparently it isn't sufficient > for mat views? Err, because users wrote a GROUP BY? They haven't (neccessarily) in the cases of the matviews we're talking about? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Stephen Frost <sfrost@snowman.net> wrote: > * Kevin Grittner (kgrittn@ymail.com) wrote: >> The result of this is that with the patch, an incremental refresh >> of a matview will always match what the query returned at the time >> it was run (there is no *correctness* problem) but if someone uses >> a query with non-deterministic results they may have a lot of >> activity on a concurrent refresh even if there was no change to the >> underlying data -- so you could have a *performance* penalty in >> cases where the query returns something different, compared to >> leaving the old "equal but not the same" results. > > You mean 'at the time of the incremental refresh', right? Yet that may > or may not match running that query directly by an end-user as the plan > that a user might get for the entire query could be different than what > is run for an incremental update, or due to statistics updates, etc. I'm confused. The refresh *does* run the query. Sure, if the query is run again it could return different results. I'm missing the point here. >>> Consider a GROUP BY with a citext column as one of the key >>> fields. You're going to get whatever value the aggregate >>> happened to come across first when building the HashAgg. >>> Having the binary equality operator doesn't help that and seems >>> like it could even result in change storms happening due to a >>> different plan when the actual data didn't change. >> >> Yup. A person who wants to GROUP BY a citext value for a matview >> might want to fold it to a consistent capitalization to avoid that >> issue. > > I'm trying to figure out why that's a perfectly acceptable solution for > users running views with GROUP BYs, but apparently it isn't sufficient > for mat views? In other words, you would suggest telling users "sorry, > you can't rely on the value returned by a GROUP BY on that citext column > using a normal view, but we're going to try and do better for mat > views". Again, I'm lost. If they don't do something to make the result deterministic, it could be different on each run of the VIEW, and on each REFRESH of the matview. I don't see why that is an argument for trying to suppress the effort needed make the matview match the latest run of the query. > I don't intend the above to imply that we should never update values in > mat views when we can do so in a reliable way to ensure that the value > matches what a view would return. This matches our notion of UPDATE, > where we will still UPDATE a value even if the old value and the new > value are equal according to the type's equality operator, when the > conditional for the UPDATE is using a reliable type (eg: integer). Well, we provide a trigger function to suppress the UPDATE operation if the old and new values are identical -- in terms of what is stored. We do not attempt to use the default btree equals operator to suppress updates to different values in some circumstances. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Andres Freund (andres@2ndquadrant.com) wrote: > On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: > > I'm trying to figure out why that's a perfectly acceptable solution for > > users running views with GROUP BYs, but apparently it isn't sufficient > > for mat views? > > Err, because users wrote a GROUP BY? They haven't (neccessarily) in the > cases of the matviews we're talking about? Sure; my thinking was going back to what Hannu had suggested where we have a mechanism to see if the value was updated (using xmin or similar) and then update it in the mat view in that case, without actually doing a comparison at all. That wouldn't necessairly work for the GROUP BY case, but that situation doesn't work reliably today anyway. If we solve the GROUP BY case in SQL for these types then we could use the same for mat views, but we've been happy enough to ignore the issue thus far. Thanks, Stephen
* Kevin Grittner (kgrittn@ymail.com) wrote: > Stephen Frost <sfrost@snowman.net> wrote: > > You mean 'at the time of the incremental refresh', right? Yet that may > > or may not match running that query directly by an end-user as the plan > > that a user might get for the entire query could be different than what > > is run for an incremental update, or due to statistics updates, etc. > > I'm confused. The refresh *does* run the query. Sure, if the > query is run again it could return different results. I'm missing > the point here. Perhaps I'm assuming things are farther along than they are.. I was assumed that 'incremental mat view' updates were actuallly 'partial'- iow, that it keeps track of the rows in the mat view which are impacted by the set of rows in the base relations and then runs specific queries to pull out and operate on the base-rel set-of-rows to update the specific mat-view-rows. If it's running the whole query again then it's certainly more likely to get the same results that the user did, but it's not a guarantee and that's only a happy coincidence while we're still doing the whole query approach (which I hope we'd eventually progress beyond..). > > I'm trying to figure out why that's a perfectly acceptable solution for > > users running views with GROUP BYs, but apparently it isn't sufficient > > for mat views? In other words, you would suggest telling users "sorry, > > you can't rely on the value returned by a GROUP BY on that citext column > > using a normal view, but we're going to try and do better for mat > > views". > > Again, I'm lost. Perhaps my reply to Andres will help clear that up? > If they don't do something to make the result > deterministic, it could be different on each run of the VIEW, and > on each REFRESH of the matview. I don't see why that is an > argument for trying to suppress the effort needed make the matview > match the latest run of the query. Is this really just "run the new query and look if the old and new row are different" wrt 'incremental' view updates? If so, I think we're trying to design something here that we're going to totally break very shortly when we move beyond that kind of sledgehammer approach to incremental mat view updates and I'd rather we figure out a solution which will work beyond that.. > > I don't intend the above to imply that we should never update values in > > mat views when we can do so in a reliable way to ensure that the value > > matches what a view would return. This matches our notion of UPDATE, > > where we will still UPDATE a value even if the old value and the new > > value are equal according to the type's equality operator, when the > > conditional for the UPDATE is using a reliable type (eg: integer). > > Well, we provide a trigger function to suppress the UPDATE > operation if the old and new values are identical -- in terms of > what is stored. We do not attempt to use the default btree equals > operator to suppress updates to different values in some > circumstances. This feels like we're trying to figure out how to create a key for a whole row based on the binary representation of that row and then use that to check if we should update a given row or not. This seems a bit radical, but perhaps that's what we should just do then, rather than invent binary equivilance operators for what-things-look-like-now for this- extract the row columns out using their binary _send functions and then hash the result to build a key that you can use. At least then we're using a documented (well, we could probably improve on this, but still) and stable representation of the data that at least some of our users already deal with. Thanks, Stephen
On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > On 2013-09-20 10:51:46 -0400, Stephen Frost wrote: > > > I'm trying to figure out why that's a perfectly acceptable solution for > > > users running views with GROUP BYs, but apparently it isn't sufficient > > > for mat views? > > > > Err, because users wrote a GROUP BY? They haven't (neccessarily) in the > > cases of the matviews we're talking about? > > Sure; my thinking was going back to what Hannu had suggested where we > have a mechanism to see if the value was updated (using xmin or similar) > and then update it in the mat view in that case, without actually doing > a comparison at all. VACUUM, HOT pruning. Have fun. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Andres Freund (andres@2ndquadrant.com) wrote: > On 2013-09-20 11:05:06 -0400, Stephen Frost wrote: > > Sure; my thinking was going back to what Hannu had suggested where we > > have a mechanism to see if the value was updated (using xmin or similar) > > and then update it in the mat view in that case, without actually doing > > a comparison at all. > > VACUUM, HOT pruning. Have fun. Yea, clearly oversimplified, but I do expect that we're going to reach a point where we're looking at the rows being updated in the base rels and which rows they map to in the view and then marking those rows as needing to be updated. That whole mechanism doesn't depend on this "are-they-binary-equal" approach and is what I had anticipated as the path we'd be going down in the future. The above is also what I recall had been discussed at the hackers meeting, along with some ideas/papers about how to specifically implement partial updates, hence my assumption that was what we were talking about.. Thanks, Stephen
On 09/20/2013 08:38 AM, Kevin Grittner wrote: > Did you look at the record_eq and record_cmp functions which exist > before this patch? Is there a reason we should do it one way for the > default operators and another way for this alternative? > Do you think we should change record_eq and record_cmp to do things > the way you suggest? I think the record_eq and record_cmp functions would be better if they shared code as well, but I don't think changing that should be part of this patch, you aren't otherwise touching those functions. I know some people dislike code that is switch based and prefer duplication, my preference is to avoid duplication. This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. If I'm a user testing code that compares two records with this operator and I test my query with 1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same record type. > Again, this is a result of following the precedent set by the > existing record comparison operators. > > test=# create table r1 (c1 int, c2 int); > CREATE TABLE > test=# create table r2 (c1 int, c2 int, c3 int); > CREATE TABLE > test=# insert into r1 values (1, 2); > INSERT 0 1 > test=# insert into r2 values (3, 4, 5); > INSERT 0 1 > test=# select * from r1 join r2 on r1 < r2; > c1 | c2 | c1 | c2 | c3 > ----+----+----+----+---- > 1 | 2 | 3 | 4 | 5 > (1 row) > > test=# update r1 set c1 = 3, c2 = 4; > UPDATE 1 > test=# select * from r1 join r2 on r1 < r2; > ERROR: cannot compare record types with different numbers of columns > > Would be in favor of forcing a change to the record comparison > operators which have been in use for years? If not, is there a > good reason to have them behave differently in this regard? > > -- I hadn't picked up on that you copied that part of the behaviour from the existing comparison operators. No we would need a really good reason for changing the behaviour of the comparison operators and I don't think we have that. I agree that the binary identical operators should behave similarly to the existing comparison operators and error out on the column number mismatch after comparing the columns that are present in both. Steve > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > >
Dimitri Fontaine escribió: > My understanding is that if you choose citext then you don't care at all > about the case, so that the two relations actually yield the same > results for the right definition of "same" here: the citext one. For the record, I don't think citext means that the user doesn't care" about the case; it only means they want the comparisons to be case-insensitive, but case should nonetheless be preserved. That is, "case-insensitive, case-preserving". A parallel is MS-DOS file name semantics. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 20, 2013 at 11:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > Perhaps I'm assuming things are farther along than they are.. I was > assumed that 'incremental mat view' updates were actuallly 'partial'- > iow, that it keeps track of the rows in the mat view which are > impacted by the set of rows in the base relations and then runs specific > queries to pull out and operate on the base-rel set-of-rows to update > the specific mat-view-rows. If it's running the whole query again then > it's certainly more likely to get the same results that the user did, > but it's not a guarantee and that's only a happy coincidence while we're > still doing the whole query approach (which I hope we'd eventually > progress beyond..). It seems odd to me that you have such strong opinions about what is and is not acceptable here given that you don't seem to familiar with the current state of this work. I will attempt to summarize my understanding. In 9.3, we can refresh a materialized view by taking an access exclusive lock on the relation, rerunning the query, and replacing the contents wholesale. In master, there is a new option to perform the refresh concurrently, which is documented here: http://www.postgresql.org/docs/devel/static/sql-refreshmaterializedview.html It reruns the query in its entirety and then figures out what inserts, updates, and deletes are needed to make the matview match the output of the query (see refresh_by_match_merge). This is an improvement over what is available in 9.3, because even though you still have to rerun the full query, you don't have to lock out access to the table in order to apply the changes. However, currently, it sometimes fails to perform updates that are needed to make the contents of the view match the query output, because it relies on a notion of equality other than exact equality. Kevin is proposing to fix this problem via this patch. Now, the next project Kevin's going to work on, and that he was working on when he discovered this problem, is incremental maintenance: that is, allowing us to update the view *without* needing to rerun the entire query. This record comparison operator will be just as important in that context. The *only* strategy refreshing a materialized view that *doesn't* need this operator is the only we have in 9.3: through out all the old data and replace it with the result of re-executing the query. If you want anything smarter than that, you MUST compare old and new rows for equality so that you can update only those rows that have been changed. And if you compare them *any strategy other than the one Kevin is proposing*, namely binary equality, then you may sometimes decide that a row has not been changed when it really has, and then you won't update the row, and then incremental maintenance will be enabled to produce *wrong answers*. So to me this has come to seem pretty much open and shut. We all know that materialized views are not going to always match the data returned by the underlying query. Perhaps the canonical example is SELECT random(). As you pointed out upthread, any query that is nondeterministic is a potential problem for materialized views. When you write a query that can return different output based on the order in which input rows are scanned, or based on any sort of external state such as a random-number generator, each refresh of a materialized view based on that query may produce different answers. There's not a lot we can do about that, except tell people to avoid using such queries in materialized views that they expect to be stable. However, what we're talking about here is a completely separate problem. Even if the query is 100% deterministic, without this patch, the materialized view can get out of sync with the query results. Granted, most of the ways in which it can get out of sync are fairly subtle: failing to preserve case in a data type where comparisons are text-insensitive; gaining or loosing an all-zeroes null bitmap on an array type; adding or removing trailing zeroes after the decimal point in a numeric. If the materialized view sometimes said "1" when the query was returning "0", we'd presumably all say "that's a bug, let's fix it". But what we're actually talking about is that the query returns "0.00" and the view still says zero. So we're doing a lot of soul-searching about whether that's unequal enough to justify updating the row. Personally, though, there's not a lot of doubt in my mind. If I create a table and I put 0 into a column of that table and then create a materialized view and that 0 ends up in the materialized view, and then I update the 0 to 0.00 and refresh the view, I expect that change to propagate through to the materialized view. It works that way if I select from a regular, non-materialized view; and it also works that way if I select from the table itself. The idea that materialized views should somehow be exempt from reflecting changes to the underlying data in certain corner cases seems odd and indefensible to me, and I can't understand why anyone's arguing that we should do that. If we're going to avoid that, we need this operator. We can argue about how it should be named and whether it should be documented, and we can have all those arguments and still fix the problem. But if we decide we're not going to add this operator, then that seems to be basically saying that we don't want to allow materialized views to accurately reflect the results of the underlying queries. And I think that would be an extremely poor decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > It seems odd to me that you have such strong opinions about what is > and is not acceptable here given that you don't seem to familiar with > the current state of this work. That'd be because the arguments that I've been trying to make around this havn't had terribly much to do with the current state of this work but rather the higher level concepts- which are more important anyway, imv. > I will attempt to summarize my > understanding. In 9.3, we can refresh a materialized view by taking > an access exclusive lock on the relation, rerunning the query, and > replacing the contents wholesale. In master, there is a new option to > perform the refresh concurrently, which is documented here: > > http://www.postgresql.org/docs/devel/static/sql-refreshmaterializedview.html > > It reruns the query in its entirety and then figures out what inserts, > updates, and deletes are needed to make the matview match the output > of the query (see refresh_by_match_merge). So I've gathered and I'm not terribly surprised that it's broken. It was unfortunate that we committed it as such, imv. I'm *not* convinced that means we should build on that in a direction that doesn't appear to be getting us any closer to real matviews. Being in master doesn't make it released. > This is an improvement > over what is available in 9.3, because even though you still have to > rerun the full query, you don't have to lock out access to the table > in order to apply the changes. I see how you can view it as an improvement, but consider what you'd say if a user showed up on IRC wanting to implement *exactly* this in their DB using a cronjob and a bit of plpgsql. Would we encourage him/her and say "gee, that's a great approach, just compare the whole row!" Heck no, we'd say "uhh, you need a key there that you can depend on when you're doing your updates." We'd probably even suggest that they, I dunno, make that key their *primary key* for the matview. If they asked what would happen with their little plpgsql code when using citext or other, similar, type, we'd tell them exactly what would happen when using such a type with regular updates, and maybe even describe how they could get themselves into trouble if they tried to issue updates which changed those key fields and therefore run into possible lock contention from the unique index on values-that-are-not-really-unique-to-them. I can't see anyone encouraging that and here we are building a major new feature on it! To be honest, I'm amazed that I appear to be alone with this. > Now, the next project Kevin's going to work on, and that he was > working on when he discovered this problem, is incremental > maintenance: that is, allowing us to update the view *without* needing > to rerun the entire query. This record comparison operator will be > just as important in that context. You state this but I don't see where you justify this claim.. > The *only* strategy refreshing a > materialized view that *doesn't* need this operator is the only we > have in 9.3: through out all the old data and replace it with the > result of re-executing the query. I really don't agree with this, but allow me to play along a bit. What is going to happen with this incremental updates when you want to use an index to figure out what to update? Are you going to build the index using your binary equality operator, which could allow *duplicates that are not really duplicates*, even in the externally-visible world? And then we'll be returning multiple rows to the user when we shouldn't be? There's a whole slew of evil examples of this- it's called unnecessary DISTINCT's. Or perhaps you'll simply look up based on the actual equality answer and then *pick one* to use, perhaps the most recent, but then that may or may not equal what the actual query would generate *anyway* because of costing, join ordering, etc, etc, as I already pointed out. You're trying to guarantee something here that you can't. Trying to fake it and tell the users "oh, you'll be ok" is actually worse because it means they'll try and depend on it and then get all kinds of upset when it ends up *not* working. I'd much rather tell users "look, if you want to use these for *data* fields, that's fine, because our smart matview will figure out the keys associated with a GROUP BY and will update based on those, but if you GROUP BY a column with a type whose equality operator allows things to be different, you'll get *the same behavior you would get from the query*, which is to say, it won't be deterministic." > If you want anything smarter than > that, you MUST compare old and new rows for equality so that you can > update only those rows that have been changed. And if you compare > them *any strategy other than the one Kevin is proposing*, namely > binary equality, then you may sometimes decide that a row has not been > changed when it really has, and then you won't update the row, and > then incremental maintenance will be enabled to produce *wrong > answers*. So to me this has come to seem pretty much open and shut. The incremental maintenance approach, I'd hope, would be keeping track of what changed and what rows in the view need to be regenerated due to those underlying rows changing. Doing that, you *would* know what needed to be updated, and how. If you're not keeping track of the specific rows that changed underneath, then I'm curious how these incremental matviews will work differently from what's being described here, where we're trying to make the whole freakin' row the key. > We all know that materialized views are not going to always match the > data returned by the underlying query. Perhaps the canonical example > is SELECT random(). As you pointed out upthread, any query that is > nondeterministic is a potential problem for materialized views. When > you write a query that can return different output based on the order > in which input rows are scanned, or based on any sort of external > state such as a random-number generator, each refresh of a > materialized view based on that query may produce different answers. Of course. > There's not a lot we can do about that, except tell people to avoid > using such queries in materialized views that they expect to be > stable. However, what we're talking about here is a completely > separate problem. Even if the query is 100% deterministic, without > this patch, the materialized view can get out of sync with the query > results. Only a subset of the queries involving these types are 100% deterministic and those are the stupid simple cases where the view is just being used as a synonym. As soon as you start doing anything that actual uses the equality operator (and we have a *lot* of things that do!), you're going to get nondeterministic behavior. You're trying to paper over this and I'm not impressed. Or are you telling me that a matview using the binary-equality technique will be deterministic when your GROUP BY a citext column in the view? I don't buy it. > Granted, most of the ways in which it can get out of sync are fairly > subtle: failing to preserve case in a data type where comparisons are > text-insensitive; gaining or loosing an all-zeroes null bitmap on an > array type; adding or removing trailing zeroes after the decimal point > in a numeric. If the materialized view sometimes said "1" when the > query was returning "0", we'd presumably all say "that's a bug, let's > fix it". But what we're actually talking about is that the query > returns "0.00" and the view still says zero. So we're doing a lot of > soul-searching about whether that's unequal enough to justify updating > the row. Personally, though, there's not a lot of doubt in my mind. If it was as open-and-shut as that, and we *could* be 100% deterministic in *all* cases with these special types when going through views, I'd be a lot more on-board. However, we can't; not without insisting that the users use some new set of "binary-aggregation" operators which are able to *pick* which of the equal-but-not-really options they want; and by-the-way, you can *bet* that's going to need to be type-specific. > If I create a table and I put 0 into a column of that table and then > create a materialized view and that 0 ends up in the materialized > view, and then I update the 0 to 0.00 and refresh the view, I expect > that change to propagate through to the materialized view. It works > that way if I select from a regular, non-materialized view; and it > also works that way if I select from the table itself. Sure- and then you try to do the same thing with a join or a group by. Just because it works in the simple SELECT case doesn't mean we get to tell users "you can depend on this!" > The idea that > materialized views should somehow be exempt from reflecting changes to > the underlying data in certain corner cases seems odd and indefensible > to me, and I can't understand why anyone's arguing that we should do > that. *This* implementation of matviews, which doesn't actually track the changes to the underlying data and *also* doesn't regenerate everything is the *only* case where this concern is actually realized- and that's because it's trying to make a key column out of an entire row, which is, I'd argue, generally accepted in the DB world as a *bad idea*, with good reason. > If we're going to avoid that, we need this operator. We can argue > about how it should be named and whether it should be documented, and > we can have all those arguments and still fix the problem. But if we > decide we're not going to add this operator, then that seems to be > basically saying that we don't want to allow materialized views to > accurately reflect the results of the underlying queries. And I think > that would be an extremely poor decision. I don't want this implementation of matviews, which I think we can all agree isn't the end-all, be-all wrt what we want matvies in PG to look like anyway, to introduce things that will cause problems for us and confuse our users. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > Robert Haas (robertmhaas@gmail.com) wrote: >> Now, the next project Kevin's going to work on, and that he was >> working on when he discovered this problem, is incremental >> maintenance: that is, allowing us to update the view *without* >> needing to rerun the entire query. This record comparison >> operator will be just as important in that context. > > You state this but I don't see where you justify this claim.. Unless we can tell whether there are any differences between two versions of a row, we can't accurately generate the delta to drive the incremental maintenance. The initial thread discussing how incremental maintenance would be done is here: http://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com#1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com The thread on the initial patch for REFRESH MATERIALIZED VIEW CONCURRENTLY started with: | Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY | for 9.4 CF1. The goal of this patch is to allow a refresh | without interfering with concurrent reads, using transactional | semantics. | | It is my hope to get this committed during this CF to allow me to | focus on incremental maintenance for the rest of the release cycle. There was much discussion, testing, and revision then, which is here: http://www.postgresql.org/message-id/flat/1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com#1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com I think pretty much every concern raised was addressed except for a lingering doubt expressed by Noah over whether IS NOT DISTINCT FROM semantics were really the right basis for matching rows. Based on that feedback, I spent a lot of time looking at why that might or might not be correct, and decided that I had been wrong to base the behavior on that, for the reasons Robert expressed so clearly a couple messages back. Hence this patch. I had made the mistake of using an operator which used a column-by-column comparison based on the equality operator for the default opclass for comparing two values of each respective column type. I had been challenged on that in the review process, and was responding to it with the fix contained in this patch. The first post on this thread starts with: | Attached is a patch for a bit of infrastructure I believe to be | necessary for correct behavior of REFRESH MATERIALIZED VIEW | CONCURRENTLY as well as incremental maintenance of matviews. I think it is fairly obvious that REFRESH should REgenerate a FRESH copy of the data, versus incremental maintenance -- which attempts to keep the matview up-to-date without regenerating the full set of data. Whenever there is logical replication (and materialized views are, conceptually, one form of that -- within the database) I feel it is important to be able to correct any possible "drift". With matviews, I see the way to do that as the REFRESH command, and I feel that it is important to be able to do that in a way that can run concurrently with readers of the matview -- without blocking them or being blocked by them. Discussion of incremental maintenance really belongs on a different thread. Since I have gone to the trouble to read a lot of papers on the topic, and select one that I think is a good basis for our implementation, I hope everyone will frame discussion in terms of either: - how best to implement the techniques from that paper, or - why some other paper presents a better technique. I think it would be madness to approach implementing incremental maintenance based on an ad hoc set of thoughts rather than a peer-reviewed paper. I know how tempting it is it start from zero and think, "if we just do X we can cover this sort of query." I had a few thoughts like that before I read all those papers and discovered that there were many subtle issues to cover. We will have plenty of time to get creative with alternatives when we get past the query types specifically addressed in the paper and begin to move into, for example, CTEs and window functions. I certainly expect robust discussion around those areas, or even some of the infrastructure for capturing deltas and triggering the incremental maintenance. I really didn't expect to have to burn so much time and energy arguing over whether a REFRESH should leave the matview accurately containing the results of the matview's query. >> We can argue about how it should be named After looking at the existing suggestions and thinking about it I'm leaning toward these operators (based on a star in front of the usual default comparison operators): *= *<> *> *>= *< *<= >> and whether it should be documented I thought we had a consensus to document both the existing record comparison operators and these new ones, and I'm fine with that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 23, 2013 at 1:19 PM, Stephen Frost <sfrost@snowman.net> wrote: > So I've gathered and I'm not terribly surprised that it's broken. It > was unfortunate that we committed it as such, imv. I'm *not* convinced > that means we should build on that in a direction that doesn't appear to > be getting us any closer to real matviews. Being in master doesn't make > it released. > >> This is an improvement >> over what is available in 9.3, because even though you still have to >> rerun the full query, you don't have to lock out access to the table >> in order to apply the changes. > > I see how you can view it as an improvement, but consider what you'd say > if a user showed up on IRC wanting to implement *exactly* this in their > DB using a cronjob and a bit of plpgsql. Would we encourage him/her and > say "gee, that's a great approach, just compare the whole row!" Heck > no, we'd say "uhh, you need a key there that you can depend on when > you're doing your updates." We'd probably even suggest that they, I > dunno, make that key their *primary key* for the matview. If they asked > what would happen with their little plpgsql code when using citext or > other, similar, type, we'd tell them exactly what would happen when > using such a type with regular updates, and maybe even describe how they > could get themselves into trouble if they tried to issue updates which > changed those key fields and therefore run into possible lock contention > from the unique index on values-that-are-not-really-unique-to-them. > > I can't see anyone encouraging that and here we are building a major new > feature on it! To be honest, I'm amazed that I appear to be alone with > this. I've written various scripts over the years for this kind of thing, and they all compared the whole row. I used a PK comparison to determine which row needed to be updated and then compared the remaining fields to figure out which values needed to be updated. I had assumed that's how most people did it; what do you do? Anyway, that is exactly what Kevin is proposing to do here and, to be clear, he's NOT proposing to use the binary-identical semantics to identify the row to be updated. That will happen using the semantics of whatever index the user chooses to create on the PK column. Rather, he's only using the binary-identical to decide which rows are completely unchanged. I might be wrong here, but it seems to me that that renders most of your argument here moot. Under Kevin's proposal, changing a citext column that acts as a PK for the matview WON'T cause the row to be deleted and reinserted, but what it will do is say, oh, it's the same row (the values are equal) but the case is different (the rows are not binary-equal), let me go update the PK column with the new value. From where I stand, that seems like exactly the right behavior. What do you think should happen instead? >> Now, the next project Kevin's going to work on, and that he was >> working on when he discovered this problem, is incremental >> maintenance: that is, allowing us to update the view *without* needing >> to rerun the entire query. This record comparison operator will be >> just as important in that context. > > You state this but I don't see where you justify this claim.. The next sentence was intended as justification. >> The *only* strategy refreshing a >> materialized view that *doesn't* need this operator is the only we >> have in 9.3: through out all the old data and replace it with the >> result of re-executing the query. > > I really don't agree with this, but allow me to play along a bit. What > is going to happen with this incremental updates when you want to use an > index to figure out what to update? We already do use an index to figure out what to update with concurrent refresh; or at least we can, when the query planner thinks that is the cheapest strategy. I don't imagine that that part will change significantly for incremental updates. > Are you going to build the index > using your binary equality operator, which could allow *duplicates that > are not really duplicates*, even in the externally-visible world? And > then we'll be returning multiple rows to the user when we shouldn't be? No, see above. There's no intention that the user should need to use this new opclass to make materialized views work correctly. The user is required to define a unique index in order to use REFRESH MATERIALIZED VIEW CONCURRENTLY, and that unique index defines the semantics used to compare existing rows to new candidate rows. Only after we've matched up the old and new rows do we use the exact-equality semantics to decide whether to update the existing row or do nothing. > There's a whole slew of evil examples of this- it's called unnecessary > DISTINCT's. Or perhaps you'll simply look up based on the actual > equality answer and then *pick one* to use, perhaps the most recent, Sure, all of that would suck, but nobody's proposing anything like that. > but > then that may or may not equal what the actual query would generate > *anyway* because of costing, join ordering, etc, etc, as I already > pointed out. Most queries I write produce the same results every time. The order of the rows may vary when I don't care, but I take care to control the query so that the contents of the rows don't vary. For example, if I use string_agg() to group rows, I make sure the rows always arrive in the same order, using ORDER BY. The reason I do this is because, even today, failing to do so causes too many user-visible artifacts. However, I admit that people who don't adhere to that practice will see additional anomalies with materialized views just as they do when they run the same query repeatedly. However, that will be true in any materialized view implementation and has little or nothing to do with the present proposal. > You're trying to guarantee something here that you can't. > Trying to fake it and tell the users "oh, you'll be ok" is actually > worse because it means they'll try and depend on it and then get all > kinds of upset when it ends up *not* working. I don't know what this is referring to. > I'd much rather tell users "look, if you want to use these for *data* > fields, that's fine, because our smart matview will figure out the keys > associated with a GROUP BY and will update based on those, but if you > GROUP BY a column with a type whose equality operator allows things to > be different, you'll get *the same behavior you would get from the > query*, which is to say, it won't be deterministic." Again, Kevin is proposing something strictly BETTER than this. We *will* figure out the keys associated with the GROUP BY and we *will* update based on those, assuming you define the unique index on the materialized view with the same opclass used for the GROUP BY. However, if you GROUP BY a column with a type whose equality operator allows things to be different, then the materialized view will reference the actual value produced by the last execution of the query, rather than possibly producing something else. >> If you want anything smarter than >> that, you MUST compare old and new rows for equality so that you can >> update only those rows that have been changed. And if you compare >> them *any strategy other than the one Kevin is proposing*, namely >> binary equality, then you may sometimes decide that a row has not been >> changed when it really has, and then you won't update the row, and >> then incremental maintenance will be enabled to produce *wrong >> answers*. So to me this has come to seem pretty much open and shut. > > The incremental maintenance approach, I'd hope, would be keeping track > of what changed and what rows in the view need to be regenerated due to > those underlying rows changing. Doing that, you *would* know what > needed to be updated, and how. If you're not keeping track of the > specific rows that changed underneath, then I'm curious how these > incremental matviews will work differently from what's being described > here, where we're trying to make the whole freakin' row the key. Kevin posted an email discussing the algorithm that he was proposing to implement back in May. http://www.postgresql.org/message-id/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com I would suggest that you read the referenced papers for details as to how the algorithm works. To make a long story short, they do need to keep track of what's changed, and how. However, that still seems largely orthogonal to the present discussion. >> We all know that materialized views are not going to always match the >> data returned by the underlying query. Perhaps the canonical example >> is SELECT random(). As you pointed out upthread, any query that is >> nondeterministic is a potential problem for materialized views. When >> you write a query that can return different output based on the order >> in which input rows are scanned, or based on any sort of external >> state such as a random-number generator, each refresh of a >> materialized view based on that query may produce different answers. > > Of course. > >> There's not a lot we can do about that, except tell people to avoid >> using such queries in materialized views that they expect to be >> stable. However, what we're talking about here is a completely >> separate problem. Even if the query is 100% deterministic, without >> this patch, the materialized view can get out of sync with the query >> results. > > Only a subset of the queries involving these types are 100% > deterministic and those are the stupid simple cases where the view is > just being used as a synonym. As soon as you start doing anything that > actual uses the equality operator (and we have a *lot* of things that > do!), you're going to get nondeterministic behavior. You're trying to > paper over this and I'm not impressed. This does not match my experience. I have written many complex views over the years which it would have been beneficial to materialize. For example, I had a 34-table join involving multiple aggregates in one web app I deployed. I would not consider that to be using the view "just as a synonym", but I can assure you that the results were quite deterministic. That view involved both text and numeric data, as I recall. > Or are you telling me that a matview using the binary-equality technique > will be deterministic when your GROUP BY a citext column in the view? I > don't buy it. No, I'm not telling you that. That would be a stupid thing to say, so I'm glad I'm not saying it. >> Granted, most of the ways in which it can get out of sync are fairly >> subtle: failing to preserve case in a data type where comparisons are >> text-insensitive; gaining or loosing an all-zeroes null bitmap on an >> array type; adding or removing trailing zeroes after the decimal point >> in a numeric. If the materialized view sometimes said "1" when the >> query was returning "0", we'd presumably all say "that's a bug, let's >> fix it". But what we're actually talking about is that the query >> returns "0.00" and the view still says zero. So we're doing a lot of >> soul-searching about whether that's unequal enough to justify updating >> the row. Personally, though, there's not a lot of doubt in my mind. > > If it was as open-and-shut as that, and we *could* be 100% > deterministic in *all* cases with these special types when going through > views, I'd be a lot more on-board. However, we can't; not without > insisting that the users use some new set of "binary-aggregation" > operators which are able to *pick* which of the equal-but-not-really > options they want; and by-the-way, you can *bet* that's going to need to > be type-specific. We don't have to control which operators users can select. The existing implementation, as committed, lets the user control that by choosing which unique indexes to define. It's actually a little bit clunky at the moment; if you define multiple unique indexes on the materialized view, it seems that we consider a row to be "the same row" only if the equality operators for all columns in all such indexes think that the rows hasn't been updated. We might want to revisit that; I can see insisting that the user pick exactly one unique index to define the comparison semantics, and giving them explicit syntax for doing so, precisely to eliminate some of the lock contention and other problems that might occur from excess updates. I guess the importance of such a change depends largely on how likely you think it is that someone might put a unique index on a materialized view for some reason other than to define the comparison semantics. But whether we change that or not, the basic point you are concerned about here is already covered. >> If I create a table and I put 0 into a column of that table and then >> create a materialized view and that 0 ends up in the materialized >> view, and then I update the 0 to 0.00 and refresh the view, I expect >> that change to propagate through to the materialized view. It works >> that way if I select from a regular, non-materialized view; and it >> also works that way if I select from the table itself. > > Sure- and then you try to do the same thing with a join or a group by. > Just because it works in the simple SELECT case doesn't mean we get to > tell users "you can depend on this!" No, but I think the case you're concerned about will work mostly-acceptably today and even better after the proposed patch. If you disagree, perhaps you could prepare a test case demonstrating the problem. I don't think that either Kevin or myself are trying to ram through a feature that is half-baked here, and I certainly agree that it is very important to get as much of this possible worked out before 9.4 arrives in stores everywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > Unless we can tell whether there are any differences between two > versions of a row, we can't accurately generate the delta to drive > the incremental maintenance. This is predicated on the assumption that you simply generate the new view and then try to figure out what to go update. I'm trying to explain that using that methodology is what landed us in this situation to begin with. > The initial thread discussing how > incremental maintenance would be done is here: My apologies for not paying quite as close attention to that thread as I should have. > I think it is fairly obvious that REFRESH should REgenerate a FRESH > copy of the data, versus incremental maintenance -- which attempts > to keep the matview up-to-date without regenerating the full set of > data. Having 'REFRESH' regenerate a fresh copy of the data makes sense to me, and is what we have now, no? The only issue there is that it takes out a big lock, which I appreciate that you're trying to get rid of. > Whenever there is logical replication (and materialized > views are, conceptually, one form of that -- within the database) I > feel it is important to be able to correct any possible "drift". > With matviews, I see the way to do that as the REFRESH command, and > I feel that it is important to be able to do that in a way that can > run concurrently with readers of the matview -- without blocking > them or being blocked by them. Of course. > Discussion of incremental maintenance really belongs on a different > thread. I'm really getting tired of everyone saying "this is the only way to do it" (or perhaps "well, this is already committed, therefore it must be what we're gonna do") when a) we're already planning to rip this out and change it, or so I thought, and b) we're trying to make promises we can't keep with this approach. > Since I have gone to the trouble to read a lot of papers > on the topic, and select one that I think is a good basis for our > implementation, I hope everyone will frame discussion in terms of > either: > - how best to implement the techniques from that paper, or > - why some other paper presents a better technique. My recollection from the hackers meeting is that I'm trying to simply paraphrase what you had said was in the paper wrt keeping track of what rows are changed underneath and using that as a basis to implement the changes necessary in the view. Does the paper you're referring to describe rerunning the whole query and then trying to figure out what's been changed..? That's really what I'm having trouble understanding why anyone would want to implement. I'll try and find time to hunt down the threads and papers on it, but I really could have sworn this was gone over at the hacker meeting- and it made a lot of sense to me, then. > I really didn't expect to have to burn so much time > and energy arguing over whether a REFRESH should leave the matview > accurately containing the results of the matview's query. I appreciate you bringing me up to speed on where things actually are here- again, sorry for not realizing the direction that this was going in earlier; it really didn't even occur to me that it would have gone down this road. I, also, didn't expect to spend so much time on this. > >> We can argue about how it should be named Really, I'm back to trying to figure out why we want to go down this road at all. > >> and whether it should be documented > > I thought we had a consensus to document both the existing record > comparison operators and these new ones, and I'm fine with that. If it gets added, it certainly should be documented, and heavily caveated. Thanks, Stephen
On Mon, Sep 23, 2013 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Anyway, that is exactly what Kevin is proposing to do here and, to be > clear, he's NOT proposing to use the binary-identical semantics to > identify the row to be updated. That will happen using the semantics > of whatever index the user chooses to create on the PK column. > Rather, he's only using the binary-identical to decide which rows are > completely unchanged. I might be wrong here, but it seems to me that > that renders most of your argument here moot. Under Kevin's proposal, > changing a citext column that acts as a PK for the matview WON'T cause > the row to be deleted and reinserted, but what it will do is say, oh, > it's the same row (the values are equal) but the case is different > (the rows are not binary-equal), let me go update the PK column with > the new value. From where I stand, that seems like exactly the right > behavior. What do you think should happen instead? Ah, I'm wrong here. It is true that we use the unique index semantics to compare the results of rerunning the query to what's in the view, but it's not true (currently) that we ever do updates on the view as opposed to delete-and-reinsert. Apparently that code got (mostly) ripped out at some point, but I was confused by a comment that wasn't fully updated to reflect the new reality. Still, I believe that most of the points I'm making here remain valid, because the key assumption you seem to be making is that Kevin is proposing to use binary-identical semantics throughout, and that's not true. Old rows and new candidate rows are matched up using the user-specified opclass, but binary-identical. Binary-identical just determines whether to replace the rows. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert, all, Skipping out on much of the side-discussion to try and drive at the heart of this.. * Robert Haas (robertmhaas@gmail.com) wrote: > I would suggest that you read the referenced papers for details as to > how the algorithm works. To make a long story short, they do need to > keep track of what's changed, and how. I suppose it's good to know that I wasn't completely misunderstanding the discussion in Ottawa. > However, that still seems > largely orthogonal to the present discussion. It *solves* this issue, from where I'm sitting, without any binary operators at all. rows 4, 5, 6 are used to compose matrow 1. When 4, 5, or 6 are updated, matrow 1 gets updated. When the update happens, and it *will* happen (there's no question about "oh, should we actually update this record?"), it's a normal PG update and all of the values in the row get set to whatever the new values are. The only reason we've having any of this discussion is because, in the current implementation, aiui anyway, we're saying "oh, the user wants us to update the matview, but we have *no clue* what actually changed, so we're just gonna try and guess.." This is changing that from "we're gonna try and guess.." to "well, if they aren't *binary identical*, we'll change them." However, if you're actually tracking what's changing and rebuilding the rows based on that, there's no question about binary equality, and you're still only updating the rows that actually need to be updated. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > I'm trying to explain that using that methodology is what landed > us in this situation to begin with. I'm trying to figure out what situation you think we're in. Seriously, if you could apply the patch and show one example that demonstrates what you see to be a problem, that would be great. >> I think it is fairly obvious that REFRESH should REgenerate a FRESH >> copy of the data, versus incremental maintenance -- which attempts >> to keep the matview up-to-date without regenerating the full set of >> data. > > Having 'REFRESH' regenerate a fresh copy of the data makes sense to me, > and is what we have now, no? The only issue there is that it takes out > a big lock, which I appreciate that you're trying to get rid of. > >> Whenever there is logical replication (and materialized >> views are, conceptually, one form of that -- within the database) I >> feel it is important to be able to correct any possible "drift". >> With matviews, I see the way to do that as the REFRESH command, and >> I feel that it is important to be able to do that in a way that can >> run concurrently with readers of the matview -- without blocking >> them or being blocked by them. > > Of course. > >> Discussion of incremental maintenance really belongs on a different >> thread. > > I'm really getting tired of everyone saying "this is the only way to do > it" (or perhaps "well, this is already committed, therefore it must be > what we're gonna do") What I'm saying is that REFRESH and incremental maintenance are two different things, and conflating them just confuses everything. > when a) we're already planning to rip this out and change it, or > so I thought, The entire change to matview-specific code is to use a different operator in two places. Outside of that, it consists of adding the 12th non-default opclass to core. > and b) we're trying to make promises we can't keep with this > approach. I don't see any such. If you do, please describe them; or better yet, give an example. >> Since I have gone to the trouble to read a lot of papers >> on the topic, and select one that I think is a good basis for our >> implementation, I hope everyone will frame discussion in terms of >> either: >> - how best to implement the techniques from that paper, or >> - why some other paper presents a better technique. > > My recollection from the hackers meeting is that I'm trying to simply > paraphrase what you had said was in the paper wrt keeping track of what > rows are changed underneath and using that as a basis to implement the > changes necessary in the view. Does the paper you're referring to > describe rerunning the whole query and then trying to figure out what's > been changed..? That's really what I'm having trouble understanding > why anyone would want to implement. I'll try and find time to hunt down > the threads and papers on it, but I really could have sworn this was > gone over at the hacker meeting- and it made a lot of sense to me, then. The only thing the paper says on the topic is that any incremental maintenance scheme is a heuristic. There will always be cases when it would be faster and less resource-intensive to regenerate the data from the defining query. There is at least an implication that a good implementation will try to identify when it is in such a situation, and ignore the whole incremental maintenance approach in favor of what we are doing with REFRESH. The example they give is if there is an unqualified DELETE of every row in a table which is part of an inner join generating the result, that it would almost be faster to to generate the (empty) result set than to run their algorithm to determine that all the rows need to be deleted. One reason for having a REFRESH that re-runs the query like this is that it *is* a recommended "escape hatch" when a mass operation makes the incremental calculations too expensive. >> I really didn't expect to have to burn so much time >> and energy arguing over whether a REFRESH should leave the matview >> accurately containing the results of the matview's query. > > I appreciate you bringing me up to speed on where things actually are > here- again, sorry for not realizing the direction that this was going > in earlier; it really didn't even occur to me that it would have gone > down this road. I, also, didn't expect to spend so much time on this. > >>>> We can argue about how it should be named > > Really, I'm back to trying to figure out why we want to go down this > road at all. > >>>> and whether it should be documented >> >> I thought we had a consensus to document both the existing record >> comparison operators and these new ones, and I'm fine with that. > > If it gets added, it certainly should be documented, That seems to be the consensus. In fact, I would have submitted that with this patch if there had been any documentation for the default record comparison operators. It seemed like that might have been omitted on purpose, and it seemed weird to add documentation for a non-default operator for records when (a) we didn't document the default operator for records and (b) we don't document many of the other non-default operators already in core. > and heavily caveated. I'm not sure what caveats would be needed. It seems to me that a clear description of what it does would suffice. Like all the other non-default opclasses in core, it will be non-default because it is less frequently useful. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 23, 2013 at 3:45 PM, Stephen Frost <sfrost@snowman.net> wrote: > Robert, all, > > Skipping out on much of the side-discussion to try and drive at the > heart of this.. > > * Robert Haas (robertmhaas@gmail.com) wrote: >> I would suggest that you read the referenced papers for details as to >> how the algorithm works. To make a long story short, they do need to >> keep track of what's changed, and how. > > I suppose it's good to know that I wasn't completely misunderstanding > the discussion in Ottawa. > >> However, that still seems >> largely orthogonal to the present discussion. > > It *solves* this issue, from where I'm sitting, without any binary > operators at all. > > rows 4, 5, 6 are used to compose matrow 1. When 4, 5, or 6 are updated, > matrow 1 gets updated. > > When the update happens, and it *will* happen (there's no question about > "oh, should we actually update this record?"), it's a normal PG update > and all of the values in the row get set to whatever the new values are. I don't know why there shouldn't be a question about that. Suppose that the MAX() aggregate is in use. If 4 or 5 or 6 is updated so as to change the maximum of the three, then matrow 1 needs updating. But if the maximum remains the same, then it doesn't. The right way to decide whether it needs updating is to re-aggregate those three rows and then see whether you get the same (read: binary identical) out of the aggregate that you got the last time you ran it. Also, suppose the same statement updates row 4, row 5, and row 6. Instead of updating the materialized view three times, you do it just once at end-of-statmement, like an AFTER STATEMENT trigger that somehow knows which rows were updated. In this case even something like AVG() could produce the same result as it did before the update. And you'd surely want to avoid updating the matview if the new value was the same as what was already stored in the matview (but not if it was equal but not the same). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > I don't know why there shouldn't be a question about that. Because anything else would be an internal optimization which must be proven to be correct, imv, also.. > Suppose > that the MAX() aggregate is in use. If 4 or 5 or 6 is updated so as > to change the maximum of the three, then matrow 1 needs updating. But > if the maximum remains the same, then it doesn't. The right way to > decide whether it needs updating is to re-aggregate those three rows > and then see whether you get the same (read: binary identical) out of > the aggregate that you got the last time you ran it. You could argue the same about PG doing that for any row update- check if anything is actually *binary* different and, if not, then don't update it. Of course, there's questions about if that's "right" and what about triggers, etc.. > Also, suppose the same statement updates row 4, row 5, and row 6. > Instead of updating the materialized view three times, you do it just > once at end-of-statmement, like an AFTER STATEMENT trigger that > somehow knows which rows were updated. Sorry if I wasn't clear, but that's exactly what I was trying to describe regarding how it should work. I was NOT intending to suggest that each update immediately update the matview. It's just that we keep *track* of what was updated and then, at some convenient point, actually run the process to update the matview rows (maybe in an AFTER statement, maybe 5 minutes from now). > In this case even something > like AVG() could produce the same result as it did before the update. Sure it could. > And you'd surely want to avoid updating the matview if the new value > was the same as what was already stored in the matview (but not if it > was equal but not the same). I don't see why updating a row that was built with AVG() should be avoided over a row that was built with MAX(), unless you're suggesting there's a different set of rows involved in the two or there's some additional optimization around figuring out if these particular changes *should* actually change the result. That's an analysis which could still happen and wouldn't need to rely on any binary equality test, and it'd need to have a whole lot more smarts than this approach anyway or you'll still end up running a query against all of the rows involved in the AVG() to then only decide at the last moment to not update the row, which doesn't strike me as a great optimization. Perhaps that's why we didn't implement it for PG itself? Thanks, Stephen
* Kevin Grittner (kgrittn@ymail.com) wrote: > The only thing the paper says on the topic is that any incremental > maintenance scheme is a heuristic. There will always be cases when > it would be faster and less resource-intensive to regenerate the > data from the defining query. There is at least an implication > that a good implementation will try to identify when it is in such > a situation, and ignore the whole incremental maintenance approach > in favor of what we are doing with REFRESH. The example they give > is if there is an unqualified DELETE of every row in a table which > is part of an inner join generating the result, that it would > almost be faster to to generate the (empty) result set than to run > their algorithm to determine that all the rows need to be deleted. > One reason for having a REFRESH that re-runs the query like this is > that it *is* a recommended "escape hatch" when a mass operation > makes the incremental calculations too expensive. I was just chatting about this with a coworker and we see a *lot* of cases where an unqualified DELETE and then INSERT would be perfectly acceptable over the existing heavy lock that's taken out. We're already doing that in a lot of places because it's actually pretty cheap and the "view" results aren't all that big. It's also completely defensible and doesn't require any new operators. > > and heavily caveated. > > I'm not sure what caveats would be needed. It seems to me that a > clear description of what it does would suffice. Like all the > other non-default opclasses in core, it will be non-default because > it is less frequently useful. "This will claim things are different, even when they aren't different when cast to text, or possibly even when extracted in binary mode, ensure this is really what you want" is a pretty big caveat, imv. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > In this case even something > like AVG() could produce the same result as it did before the update. > And you'd surely want to avoid updating the matview if the new value > was the same as what was already stored in the matview (but not if it > was equal but not the same). BTW, I'm wondering how you're going to explain that, with an optimization of MAX() for matviews that only looks at the records which were updated, that we decided to change 'A' to 'a' on a citext column because they were binary different, yet if you considered all of the rows under the MAX(), 'A' would have nearly always come first and therefore would have been the more likely candidate (and what the user probably would have gotten running the query themselves). Also, as asked, I'll work up an example of how the matview can produce a different result from the regular view in some cases, even with the binary equality stuff. Thanks, Stephen
* Kevin Grittner (kgrittn@ymail.com) wrote: > I'm trying to figure out what situation you think we're in. > Seriously, if you could apply the patch and show one example that > demonstrates what you see to be a problem, that would be great. Here's an example to illustrate what I'm talking about when it comes down to "you can't claim that you'll produce exactly what the query will always, with these types:" =# table citext_table;id | name ----+------- 1 | one 3 | three 5 | 4 | Three 2 | Three (5 rows) =# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name; SELECT 3 =# table citext_matview;name -------onethree (3 rows) =# set enable_seqscan = false; SET =# select name from citext_table where id > 0 group by name;name -------oneThree (3 rows) Yes, the 'set enable_seqscan = false;' is a bit of a cheat- but I hope you agree that it *shouldn't* change the *result* of a query. There's certainly other cases where a different plan could be picked and result in the same kind of difference between the matview and running the query. Build a matview w/ max(id) as id and a unique index on 'id' and you'll still get the same issue. The problem is that the seqscan will pick up on 'three' first while the index scan will find 'Three' first. This is all with the patch from http://www.postgresql.org/message-id/1379024847.48294.YahooMailNeo@web162904.mail.bf1.yahoo.com applied. I simply don't believe it's possible to be consistent in this unless we declare the type's equality to be insufficient for us and, everywhere that we call a type's equality operator, *also* check which of the equal values involved in the comparison is "bigger" or "lower" (based on binary comparison) and actually pick one, consistently, all the time. This is why I'm complaining that we're trying to paper over a difference that just isn't that simple. I understand why you're trying to- it's definitely annoying, but this isn't so much a solution as it is a hack that doesn't actually work in all cases. I don't have a silver bullet for this but I don't like saying "let's implement these binary operators and make things *look* consistent in most cases, and then fail subtly in complex situations." I agree that users may complain if the underlying relation ends up not having *any* entries with the value that's in the matview and a refresh doesn't update it. I expect users will *also* complain if we implement these operators and then they write their own queries using this awesome new 'binary' comparison operator to compare their equal-to-citext strings- and discover that the binary operators say things that *look* the same are actually *different* (thinking encoding issues, overlong encodings, etc). That'd be a subtle and painful bug (technically in their code, not ours, but still) to find, and then what do you do? Argueing against citext (as I've had to do in the past..) would probably be more difficult for some if they saw these operators and figured they could use them. As I mentioned in the thread w/ Robert, when looking at further optimizations down the road, we're going to be in a situation of looking at only a subset of the records and then considering what to do when supporting MAX() efficiently leads us to running 'greater(existing,new)' which returns false, but q and r are binary different. This argument is saying "always replace what's there" but every other, existing, value in the table might be represented by the 'existing' representation. Thanks, Stephen
On Mon, Sep 23, 2013 at 9:21 PM, Stephen Frost <sfrost@snowman.net> wrote: > Here's an example to illustrate what I'm talking about when it comes > down to "you can't claim that you'll produce exactly what the query > will always, with these types:" Your example demonstrates that if a given query can generate two different outputs, A and B, based on the same input data, then the contents of the materialized view cannot be equal to be A and also equal to B. Well, no duh. Of course, you don't need citext, or any other data type with a loose notion of equality, to generate that sort of problem: rhaas=# create table chicken_little (d date); CREATE TABLE rhaas=# insert into chicken_little values ('2012-02-21'); INSERT 0 1 rhaas=# create materialized view henny_penny as select d::text from chicken_little; SELECT 1 rhaas=# table chicken_little; d ------------2012-02-21 (1 row) rhaas=# table henny_penny; d ------------2012-02-21 (1 row) rhaas=# set datestyle = 'german'; SET rhaas=# table chicken_little; d ------------21.02.2012 (1 row) rhaas=# table henny_penny; d ------------2012-02-21 (1 row) But I'm still wondering what this is intended to prove. There are an infinite number of ways to write queries that produce different results, and I think we all know that materialized views aren't going to hold up very well if given such queries. That seems a poor excuse for not fixing the cases that can be made to work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-23 21:21:53 -0400, Stephen Frost wrote: > Here's an example to illustrate what I'm talking about when it comes > down to "you can't claim that you'll produce exactly what the query > will always, with these types:" > > =# table citext_table; > id | name > ----+------- > 1 | one > 3 | three > 5 | > 4 | Three > 2 | Three > (5 rows) > > =# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name; > SELECT 3 > > =# table citext_matview; > name > ------- > > one > three > (3 rows) I don't really understand why you have a problem with this specific thing here. Kevin's goal seems to be to make materialized views behave consistent with the way a plain view would if the matviews would just have been refreshed fully, concurrently or incrementally and there have been no further data changes. SELECT * FROM table WHERE candidate_key IN (...); used in a view or plainly currently guarantees you that you get the original casing for citext. And if you e.g. have some additional expensive joins, such a matview can very well make sense. What does it matter that ... GROUP BY citext_col; doesn't return the same casing consistently? You're aggregating, not accessing the original data. If you need to separate the different casings now, cast to text. Now, I can perfectly understand having a bit of architectural qualms about Kevin's approach of things on the SQL level. But this doesn't seem to be the thread to discuss that. FWIW, I can't forsee any realistic approach that actually won't do such comparisons (although not necessarily through C). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/23/2013 10:38 PM, Stephen Frost wrote: > >>> and heavily caveated. >> I'm not sure what caveats would be needed. It seems to me that a >> clear description of what it does would suffice. Like all the >> other non-default opclasses in core, it will be non-default because >> it is less frequently useful. > "This will claim things are different, even when they aren't different > when cast to text, or possibly even when extracted in binary mode, > ensure this is really what you want" is a pretty big caveat, imv. Yes, it should be documented that it tests for sameness and gives no guarantees that lack of sameness means "different" (as determined by some other operator) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
* Robert Haas (robertmhaas@gmail.com) wrote: > Your example demonstrates that if a given query can generate two > different outputs, A and B, based on the same input data, then the > contents of the materialized view cannot be equal to be A and also > equal to B. Well, no duh. Two different outputs based on what *plan* is chosen. > Of course, you don't need citext, or any other data type with a loose > notion of equality, to generate that sort of problem: [...] > rhaas=# set datestyle = 'german'; > SET I'm talking about *planner differences* changing the results. If we've got a ton of cases where a different plan means different output, then we've got some serious problems. I'd argue that it's pretty darn clear that datestyle is going to be a *slightly* different animal. My example doesn't *require* changing any GUCs, it was just expedient for illustration. > But I'm still wondering what this is intended to prove. These types are, to those that use them at least, a known quantity wrt what you get when using them in GROUP BYs, JOINs, etc. You're trying to 'fix' something that isn't really broken and you're only doing it half-baked anyway because your 'fix' isn't going to actually make these types produce consistent results. > There are an > infinite number of ways to write queries that produce different > results, and I think we all know that materialized views aren't going > to hold up very well if given such queries. That seems a poor excuse > for not fixing the cases that can be made to work. New operators are not without cost, I don't think it's a good idea to expose our internal binary representations of data out to the SQL level, and the justification for adding them is that they solve a case that they don't. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > Skipping out on much of the side-discussion to try and drive at > the heart of this.. > > Robert Haas (robertmhaas@gmail.com) wrote: >> I would suggest that you read the referenced papers for details >> as to how the algorithm works. To make a long story short, they >> do need to keep track of what's changed, and how. > > I suppose it's good to know that I wasn't completely > misunderstanding the discussion in Ottawa. > >> However, that still seems largely orthogonal to the present >> discussion. > > It *solves* this issue, from where I'm sitting, without any > binary operators at all. > [ argument that the only way we should ever do REFRESH is by > using captured deltas, through incremental maintenance techniques > ] That would ensure that a query could not be used to define a matview until we had implemented incremental maintenance for queries of that type and complexity. I expect that to come *close* to covering all useful queries that way will take five to ten years, if all goes well. The approach I'm taking makes all queries available *now*, with improvements in how many can be maintained incrementally improving over time. This is the route every other database I've looked at has taken (for good reason, I think). Ultimately, even when we have incremental maintenance supported for all queries that can be used to define a matview, I think there will be demand for a REFRESH command that re-runs the base query. Not only does that fit the workload for some matviews, but consider this, from the paper I cited[1]: | Recomputing the view from scratch is too wasteful in most cases. | Using the heuristic of inertia (only a part of the view changes | in response to changes in the base relations), it is often | cheaper to compute only the changes in the view. We stress that | the above is only a heuristic. For example, if an entire base | relation is deleted, it may be cheaper to recompute a view that | depends on the deleted relation (if the new view will quickly | evaluate to an empty relation) than to compute the changes to the | view. What we're talking about is a performance heuristic -- not something more profound than that. The route I'm taking is to get it *working correctly now* using the simple technique, and then embarking on the long journey of optimizing progressively more cases. What your argument boils down to IMV is essentially a case of premature optimization. You have yet to show any case where the existing patch does not yield correct results, or show that there is a better way to get to the point this patch takes us. > [ later post shows a query that does not produce deterministic > results ] Sure, two direct runs of that same query, or two runs through a regular view, could show different results (considering synchronized scans, among other things). I don't see what that proves. Of course a refresh of a matview is only going to produce one of those and then will not produce a different result until it is refreshed or (once we add incremental maintenance) something changes in the underlying data. Nobody ever claimed that a query which does not produce consistent results would somehow produce them with this patch. There are queries using citext, numeric, and other types which *do* provide consistent results which are consistently produced by a straight query, a simple view, or a non-concurrent refresh of a materialized view; this patch will cause a concurrent refresh to produce the same results as those, rather than something different. Period. That's the point, and the whole point. You have not shown that it doesn't. You have not shown why adding a 12th non-default opclass is a particular problem here (although we have a consensus to use different operators, to reserve this operator namespace for other things). You have not made any case at all for why people should wait for incremental maintenance to be mature (a project which will take years) before being able to use materialized views with concurrent refreshes. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining Views Incrementally. In SIGMOD 1993, pages 157-167. http://dl.acm.org/citation.cfm?id=170066
On Tue, Sep 24, 2013 at 7:14 AM, Stephen Frost <sfrost@snowman.net> wrote: >> Of course, you don't need citext, or any other data type with a loose >> notion of equality, to generate that sort of problem: > [...] >> rhaas=# set datestyle = 'german'; >> SET > > I'm talking about *planner differences* changing the results. If we've > got a ton of cases where a different plan means different output, then > we've got some serious problems. I'd argue that it's pretty darn clear > that datestyle is going to be a *slightly* different animal. My example > doesn't *require* changing any GUCs, it was just expedient for > illustration. I'm completely confused, here. What's so special about planner differences? Obviously, there are tons of queries that can produce different results based on planner differences, but materialized views didn't create that problem and they aren't responsible for solving it. Also, even restricting ourselves to planner differences, there's no particular link between the behavior of the type's equality operator and whether or not the query always produces the same results. For example, let's consider good old text. rhaas=# create table tag_data (id integer, tag text, userid text, primary key (id, tag)); CREATE TABLE rhaas=# insert into tag_data values (1, 'foo', 'tom'), (1, 'bar', 'dick'), (2, 'baz', 'harry'); INSERT 0 3 rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from tag_data group by 1;id | tags ----+------------------- 1 | foo:tom, bar:dick 2 | baz:harry (2 rows) rhaas=# set enable_seqscan=false; SET rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from tag_data group by 1;id | tags ----+------------------- 1 | bar:dick, foo:tom 2 | baz:harry (2 rows) Now texteq() is just a glorified version of memcmp(), so what does this complaint possibly have to do with Kevin's patch, or even materialized views in general? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > That's the point, and the whole point. You have not shown that it > doesn't. You have not shown why adding a 12th non-default opclass > is a particular problem here (although we have a consensus to use > different operators, to reserve this operator namespace for other > things). We need justification to add operators, imv, especially ones that expose our internal binary representation of data. I worry that adding these will come back to bite us later and that we're making promises we won't be able to keep. If these inconsistencies in what happens with these data types are an issue then REFRESH can be handled as a wholesale DELETE/INSERT. Trying to do this incremental-but-not-really maintenance where the whole query is run but we try to skimp on what's actually getting updated in the matview is a premature optimization, imv, and one which may be less performant and more painful, with more gotchas and challenges for our users, to deal with in the long run. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > We need justification to add operators, imv, especially ones that > expose our internal binary representation of data. I believe I have done so. > I worry that adding these will come back to bite us later How? > and that we're making promises we won't be able to keep. The promise that a concurrent refresh will produce the same set of rows as a non-concurrent one? > If these inconsistencies in what happens with these data types > are an issue then REFRESH can be handled as a wholesale > DELETE/INSERT. I have real-life experience with handling faux materialized views by creating tables (at Wisconsin Courts). We initially did it the way you describe, but the run time was excessive (in Milwaukee County the overnight run did not always complete before the start of business hours the next day). Switching to logic similar to what I've implemented here, it completed an order of magnitude faster, and generated a small fraction of the WAL and logical replication data that the technique you describe did. Performing preliminary steps in temporary tables to minimize the changes needed to permanent tables seems beneficial enough on the face of it that I think the burden of proof should be on someone arguing that deleting all rows and re-inserting (in the same transaction) is, in general, a superior approach to finding the differences and applying only those/ > Trying to do this incremental-but-not-really maintenance where > the whole query is run but we try to skimp on what's actually > getting updated in the matview is a premature optimization, imv, > and one which may be less performant and more painful, with more > gotchas and challenges for our users, to deal with in the long > run. I have the evidence of a ten-fold performance improvement plus minimized WAL and replication work on my side. What evidence do you have to back your assertions? (Don't forget to work in bloat and vacuum truncation issues to the costs of your proposal.) -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > Stephen Frost <sfrost@snowman.net> wrote: > > I worry that adding these will come back to bite us later > > How? User misuse is certainly one consideration, but I wonder what's going to happen if we change our internal representation of data (eg: numerics get changed again), or when incremental matview maintenance happens and we start looking at subsets of rows instead of the entire query. Will the first update of a matview after a change to numeric's internal data structure cause the entire thing to be rewritten? > > and that we're making promises we won't be able to keep. > > The promise that a concurrent refresh will produce the same set of > rows as a non-concurrent one? The promise that we'll always return the binary representation of the data that we saw last. When greatest(x,y) comes back 'false' for a MAX(), we then have to go check "well, does the type consider them equal?", because, if the type considers them equal, we then have to decide if we should replace x with y anyway, because it's different at a binary level. That's what we're saying we'll always do now. We're also saying that we'll replace things based on plan differences rather than based on if the rows underneath actually changed at all. We could end up with material differences in the result of matviews updated through incremental REFRESH and matviews updated through actual incremental mainteance- and people may *care* about those because we've told them (or they discover) they can depend on these types of changes to be reflected in the result. > > Trying to do this incremental-but-not-really maintenance where > > the whole query is run but we try to skimp on what's actually > > getting updated in the matview is a premature optimization, imv, > > and one which may be less performant and more painful, with more > > gotchas and challenges for our users, to deal with in the long > > run. > > I have the evidence of a ten-fold performance improvement plus > minimized WAL and replication work on my side. What evidence do > you have to back your assertions? (Don't forget to work in bloat > and vacuum truncation issues to the costs of your proposal.) I don't doubt that there are cases in both directions and I'm not trying to argue that it'd always be faster, but I doubt it's always slower. I'm surprised that you had a case where the query was apparently quite fast yet the data set hardly changed and resulted in a very large result but I don't doubt that it happened. What I was trying to get at is really that the delete/insert approach would be good enough in very many cases and it wouldn't have what look, to me anyway, as some pretty ugly warts around these cases. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > Kevin Grittner (kgrittn@ymail.com) wrote: >> Stephen Frost <sfrost@snowman.net> wrote: >> > I worry that adding these will come back to bite us later >> >> How? > > User misuse is certainly one consideration, I think that one has been talked to death already, with the consensus that we should use different operator names and document them as a result. Any comparison operator can be misused. Andres said he has seen the operators from text_pattern_ops used in production; but we have not removed, for example, ~>=~ from our text operators as a result. > but I wonder what's going to > happen if we change our internal representation of data (eg: numerics > get changed again), or when incremental matview maintenance happens and > we start looking at subsets of rows instead of the entire query. Will > the first update of a matview after a change to numeric's internal data > structure cause the entire thing to be rewritten? Something like that could happen, if the newly generated values, for example, all took less space than the old. On the first REFRESH on the new version of the software it might delete and insert all rows; then it would stay with the new representation. I have trouble seeing that as a problem, since presumably we only come up with a new representation because it is smaller, faster, or cures some bug. >>> and that we're making promises we won't be able to keep. >> >> The promise that a concurrent refresh will produce the same set of >> rows as a non-concurrent one? > > The promise that we'll always return the binary representation of the > data that we saw last. When greatest(x,y) comes back 'false' for a > MAX(), we then have to go check "well, does the type consider them > equal?", because, if the type considers them equal, we then have to > decide if we should replace x with y anyway, because it's different > at a binary level. That's what we're saying we'll always do now. I'm having a little trouble following that. The query runs as it always has, with all the old definitions of comparisons. After it is done, we check whether the rows are the same. The operation of MAX() will not be affected in any way. If MAX() returns a value which is not the same as what the matview has, the matview will be modified to match what MAX() returned. > We're also saying that we'll replace things based on plan differences > rather than based on if the rows underneath actually changed at all. Only if the user uses a query which does not produce deterministic results. > We could end up with material differences in the result of matviews > updated through incremental REFRESH and matviews updated through > actual incremental mainteance- and people may *care* about those > because we've told them (or they discover) they can depend on these > types of changes to be reflected in the result. Perhaps we should document the recommendation that people not create materialized views with non-deterministic results, but I don't think that should be a hard restriction. For example, I could see someone creating a materialized view to pick a random sample from a jury pool to be the on the jury panel for today (from which juries are selected on that day), and not want that to be predictable and not want it to change until the next refresh of the panel matview. (That would not be my first design choice, but it would not be a horrid choice if there were sufficient logging of the REFRESH statements in a place the user could not modify.) >>> Trying to do this incremental-but-not-really maintenance where >>> the whole query is run but we try to skimp on what's actually >>> getting updated in the matview is a premature optimization, imv, >>> and one which may be less performant and more painful, with more >>> gotchas and challenges for our users, to deal with in the long >>> run. >> >> I have the evidence of a ten-fold performance improvement plus >> minimized WAL and replication work on my side. What evidence do >> you have to back your assertions? (Don't forget to work in bloat >> and vacuum truncation issues to the costs of your proposal.) > > I don't doubt that there are cases in both directions and I'm not trying > to argue that it'd always be faster, but I doubt it's always slower. Sure. We provide a way to support those cases, although it involves blocking. So far, even the tests I expected to be faster with heap replacement have come out marginally slower that way than with CONCURRENTLY, due to index activity; but I have no doubt cases can be constructed that go the other way. > I'm surprised that you had a case where the query was apparently quite > fast yet the data set hardly changed and resulted in a very large result > but I don't doubt that it happened. The history of all events in the county (tens of millions of records in Milwaukee County) need to be scanned to generate the status of cases and the date of last activity, as well as scanning all future calendar events to see what is scheduled. This is compared to tables setting standards for how quickly different types of cases should be handled and how long a case should go without some activity. The results are used to provide a dashboard for judges to see where things are slipping, and drill down to detail so they can get a stalled case moving again. On any one day a relatively small number of cases change status or cross time boundaries which make them need attention. I really look forward to the time when we can support these queries with incremental maintenance in matviews. I don't expect that to be until maybe version 10.3, though. Not only will it eliminate a big batch job each night, but judges will be able to see their work reflected in the dashboard on the same day. > What I was trying to get at is really that the delete/insert > approach would be good enough in very many cases and it wouldn't > have what look, to me anyway, as some pretty ugly warts around > these cases. I guess we could add a "DELETE everything and INSERT the new version of everything option for REFRESH in addition to what is there now, but I would be very reluctant to use it as a replacement. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kevin Grittner (kgrittn@ymail.com) wrote: > Stephen Frost <sfrost@snowman.net> wrote: > > The promise that we'll always return the binary representation of the > > data that we saw last. When greatest(x,y) comes back 'false' for a > > MAX(), we then have to go check "well, does the type consider them > > equal?", because, if the type considers them equal, we then have to > > decide if we should replace x with y anyway, because it's different > > at a binary level. That's what we're saying we'll always do now. > > I'm having a little trouble following that. You're looking at it from the perspective of what's committed today, I think. > The query runs as it > always has, with all the old definitions of comparisons. After it > is done, we check whether the rows are the same. The operation of > MAX() will not be affected in any way. If MAX() returns a value > which is not the same as what the matview has, the matview will be > modified to match what MAX() returned. I'm referring to a case where we're doing incremental maintenance of the matview (in the far distant future..) and all we've got is the current MAX value and the value from the row being updated. We're going to update that MAX on cases where the values are "equal-but-binary-different". > > We're also saying that we'll replace things based on plan differences > > rather than based on if the rows underneath actually changed at all. > > Only if the user uses a query which does not produce deterministic > results. Which is apt to happen.. > > We could end up with material differences in the result of matviews > > updated through incremental REFRESH and matviews updated through > > actual incremental mainteance- and people may *care* about those > > because we've told them (or they discover) they can depend on these > > types of changes to be reflected in the result. > > Perhaps we should document the recommendation that people not > create materialized views with non-deterministic results, but I > don't think that should be a hard restriction. I wasn't suggesting a hard restriction, but I was postulating about how the behavior may change in the future (or we may need to do very hacky things to prevent it from channging) and how these decisions may impact that. > > What I was trying to get at is really that the delete/insert > > approach would be good enough in very many cases and it wouldn't > > have what look, to me anyway, as some pretty ugly warts around > > these cases. > > I guess we could add a "DELETE everything and INSERT the new > version of everything option for REFRESH in addition to what is > there now, but I would be very reluctant to use it as a > replacement. It wouldn't address my concerns anyway, which are around these binary operators and the update-in-place approach being risky and setting us up for problems down the road. Thanks, Stephen
On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote: > It wouldn't address my concerns anyway, which are around these binary > operators and the update-in-place approach being risky and setting us up > for problems down the road. I think that if you want to hold up a bug fix to a feature that's already committed, you need to be more specific than to say that there might be problems down the road. I can't see that you've identified any consequence of this change that is clearly bad. I understand that you don't like the idea of making binary-comparison operators user-visible due to the risk of user-confusion, but we have lots of confusing features already and we handle that by documenting them. This one doesn't seem any worse than average; in fact, to me it seems rather minor. At any rate, I think we're getting distracted from the real point here. Here's what this patch is attempting to fix: rhaas=# create table sales (txn_date date, amount numeric); CREATE TABLE rhaas=# insert into sales values ('2013-09-01', '100.00'), ('2013-09-01', '150.00'), ('2013-09-02', '75.0'); INSERT 0 3 rhaas=# create materialized view sales_summary as select txn_date, sum(amount) as amount from sales group by 1; SELECT 2 rhaas=# create unique index on sales_summary (txn_date); CREATE INDEX rhaas=# select * from sales_summary; txn_date | amount ------------+--------2013-09-01 | 250.002013-09-02 | 75.0 (2 rows) rhaas=# update sales set amount = round(amount, 2); UPDATE 3 rhaas=# refresh materialized view concurrently sales_summary; REFRESH MATERIALIZED VIEW rhaas=# select * from sales_summary; txn_date | amount ------------+--------2013-09-01 | 250.002013-09-02 | 75.0 (2 rows) rhaas=# refresh materialized view sales_summary; REFRESH MATERIALIZED VIEW rhaas=# select * from sales_summary; txn_date | amount ------------+--------2013-09-01 | 250.002013-09-02 | 75.00 (2 rows) Note that, after we change the data in the underlying table, the output of the query changes. But REFRESH MATERIALIZED VIEW CONCURRENTLY doesn't fix it. However, REFRESH MATERIALIZED VIEW without CONCURRENTLY does fix it. That's a bug, because if there are two ways of refreshing a materialized view they should both produce the same answer. Shouldn't they? The fact that users can write queries that don't always give the same answer is not a reason why it's OK for REFRESH CONCURRENTLY to misbehave on queries that do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote: > > It wouldn't address my concerns anyway, which are around these binary > > operators and the update-in-place approach being risky and setting us up > > for problems down the road. > > I think that if you want to hold up a bug fix to a feature that's > already committed, you need to be more specific than to say that there > might be problems down the road. Committed isn't released and I simply can't agree that introducing new operators is a 'bug fix'. Had it gone out as-is, I can't see us back-patching a bunch of new operators to fix it. > Note that, after we change the data in the underlying table, the > output of the query changes. But REFRESH MATERIALIZED VIEW > CONCURRENTLY doesn't fix it. However, REFRESH MATERIALIZED VIEW > without CONCURRENTLY does fix it. That's a bug, because if there are > two ways of refreshing a materialized view they should both produce > the same answer. Shouldn't they? The same queries run over time without changes to the underlying data really should return the same data, shoudln't they? Is it a bug that they don't? In general, I agree that they should produce the same results, as should incrementally maintained views when they happen. I'm not convinced that choosing whatever the 'new' value is to represent the value in the matview for the equal-but-not-binary-identical will always be the correct answer though. > The fact that users can write > queries that don't always give the same answer is not a reason why > it's OK for REFRESH CONCURRENTLY to misbehave on queries that do. This really is, imv, agreeing to hold a higher standard for matviews than we do for what matviews are *based* on- which is queries. Thanks, Stephen
On Tue, Sep 24, 2013 at 1:04 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > It wouldn't address my concerns anyway, which are around these binary >> > operators and the update-in-place approach being risky and setting us up >> > for problems down the road. >> >> I think that if you want to hold up a bug fix to a feature that's >> already committed, you need to be more specific than to say that there >> might be problems down the road. > > Committed isn't released and I simply can't agree that introducing new > operators is a 'bug fix'. Had it gone out as-is, I can't see us > back-patching a bunch of new operators to fix it. You're perfectly welcome to argue that we should rip REFRESH MATERIALIZED VIEW CONCURRENTLY back out, or change the implementation of it. However, that's not the topic of this thread. Had it gone out as is, we would have either had to come up with some more complex fix that could make things right without catalog changes, or we would have had to document that it doesn't work correctly. Fortunately we are not faced with that conundrum. >> Note that, after we change the data in the underlying table, the >> output of the query changes. But REFRESH MATERIALIZED VIEW >> CONCURRENTLY doesn't fix it. However, REFRESH MATERIALIZED VIEW >> without CONCURRENTLY does fix it. That's a bug, because if there are >> two ways of refreshing a materialized view they should both produce >> the same answer. Shouldn't they? > > The same queries run over time without changes to the underlying data > really should return the same data, shoudln't they? Not necessarily. There are and always have been plenty of cases where that isn't true. > Is it a bug that they don't? No. > In general, I agree that they should produce the same results, as should > incrementally maintained views when they happen. I'm not convinced that > choosing whatever the 'new' value is to represent the value in the > matview for the equal-but-not-binary-identical will always be the > correct answer though. Well, you have yet to provide an example of where it isn't the right behavior. I initially thought that perhaps we needed a type-specific concept of exact equality, so that two things would be exactly equal if they would both be dumped the same way by pg_dump, even if the internal representations were different. But on further thought I'm no longer convinced that's a good idea. For example, consider the compact-numeric format that I introduced a few releases back. The changes are backward compatible, so you can run pg_upgrade on a cluster containing the old format and everything works just fine. But your table will be larger than you really want it to be until you rewrite it. Any materialized view that was built by selecting those numeric values will *also* be larger than you want it to be until you rewrite it. Fortunately, you can shrink the table by using a rewriting variant of ALTER TABLE. After you do that, you will perhaps also want to shrink the materialized view. And in fact, REFRESH will do that for you, but currently, REFRESH CONCURRENTLY won't. It seems to me that it would be nicer if it did. Now I admit that's an arguable point. We could certainly define an intermediate notion of equality that is more equal than whatever = does, but not as equal as exact binary equality. However, such a new notion would have no precedence in our existing code, whereas the use of binary equality does. Going to a lot of work to build up this intermediate notion of equality does not seem like a good idea to me; I think a general rule of good programming is not to invent more ways to do things than are clearly necessary, and requiring every core and extension type to define an exact-equality operator just to support this feature seems like excessive in the extreme. Moreover, I think what to include in that intermediate notion of equality would be kind of hard to decide, because there are a lot of subtly different questions that one can ask, and we'd inevitably have to answer the question "how equal does it have to be to be equal enough?". Numerics and arrays have multiple ways of referencing what is intended to be the exact same value, but those can be disambiguated by passing them to a function that cares, like pg_column_size(). Then, on a user-visible level, numerics allow variation in the number of trailing zeroes after the decimal point. Floats have extra digits that aren't shown except with extra_float_digits, so that the value can be less than totally equal even if the text representation is the same. citext intentionally ignores case. In every one of those cases, it's possible that the user of materialized views might say "you know, if it's equal enough that the = operator says it's the same, then I really don't mind if the materialized view maintenance gets skipped". But it's also possible that they might say, "you know, those values are not really the same, and the materialized view really ought to reflect the latest data". I think the conservative (and therefore correct) approach is to decide that we're always going to update if we detect any difference at all. The only real argument for ignoring any differences is that it will mean more changes to the materialized view. And that is true enough, but in practical scenarios such updates will be rare. I would argue that if someone goes through and add spends a lot of time adding or removing trailing zeroes to the base tables on which a materialized view is based, the likelihood is that those trailing zeros do matter and the change ought to reflect through to the materialized views as well. It is not impossible that someone could upgrade to a version of PG that supports a more compact or otherwise-enhanced version of some data type, rewrites the table for some reason, and is then surprised to see a change storm on the associated materialized view as well. But I don't think it's very likely - we have no such changes planned. And even if does happen at some point down the road, I can't see justifying the effort of building a third notion of equality as effort well-spent to avoid it, especially because it's equally possible that the user *would* want the changes to propagate through to the materialized view. >> The fact that users can write >> queries that don't always give the same answer is not a reason why >> it's OK for REFRESH CONCURRENTLY to misbehave on queries that do. > > This really is, imv, agreeing to hold a higher standard for matviews > than we do for what matviews are *based* on- which is queries. I don't think it is, and your explanation of why you think it is does not make any sense to me. If a view produced an answer that the underlying query *could not have produced* then we would clearly consider that a bug. But, you're arguing that a materialized view on which REFRESH CONCURRENTLY should be allowed to produce just such answers. It is obviously true, and unavoidable, that if the query can produce more than one result set depending on the query plan or other factors, then the materialized view contents can match only one of those possible result sets. But you are arguing that it should be allowed to produce some OTHER result set, that a direct execution of the query could *never* have produced, and I can't see how that can be right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > Now I admit that's an arguable point. We could certainly define an > intermediate notion of equality that is more equal than whatever = > does, but not as equal as exact binary equality. I suggested it up-thread and don't recall seeing a response, so here it is again- passing the data through the binary-out function for the type and comparing *that* would allow us to change the interal binary representation of data types and would be something which we could at least explain to users as to why X isn't the same as Y according to this binary operator. > I think the conservative (and therefore correct) approach is to decide > that we're always going to update if we detect any difference at all. And there may be users who are surprised that a refresh changed parts of the table that have nothing to do with what was changed in the underlying relation, because a different plan was used and the result ended up being binary-different. It's easy to explain to users why that would be when we're doing a wholesale replace but I don't think it'll be nearly as clear why that happened when we're not replacing the whole table and why REFRESH can basically end up changing anything (but doesn't consistently). If we're paying attention to the records changed and only updating the matview's records when they're involved, that becomes pretty clear. What's happening here feels very much like unintended consequences. > It is obviously true, and unavoidable, that if the query can produce > more than one result set depending on the query plan or other factors, > then the materialized view contents can match only one of those > possible result sets. But you are arguing that it should be allowed > to produce some OTHER result set, that a direct execution of the query > could *never* have produced, and I can't see how that can be right. I agree that the matview shouldn't produce things which *can't* exist through an execution of the query. I don't intend to argue against that but I realize that's a fallout of not accepting the patch to implement the binary comparison operators. My complaint is more generally that if this approach needs such then there's going to be problems down the road. No, I can't predict exactly what they are and perhaps I'm all wet here, but this kind of binary-equality operations are something I've tried to avoid since discovering what happens when you try to compare a couple of floating point numbers. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> wrote: >> I think the conservative (and therefore correct) approach is to >> decide that we're always going to update if we detect any >> difference at all. > > And there may be users who are surprised that a refresh changed > parts of the table that have nothing to do with what was changed > in the underlying relation, because a different plan was used and > the result ended up being binary-different. Binary different for "equal" values could include box (or other geometry) objects moved to completely new locations and/or not quite the same size. Here is v2 of the patch which changes from the universally disliked operator names v1 used. It also fixes bugs in the row comparisons for pass-by-reference types, fixes a couple nearby comments, and adds regression tests for a matview containing a box column. The box type is interesting in that its = operator only worries about the area of the box, and considers two boxes with no more than EPSILON difference to be equal. This means that boxes at totally different locations can be equal, and that if A = B and B = C it is not necessarily true that A = C. This doesn't cause too much trouble in general because boxes don't have btree opclasses. Since there are types, including core types, without a default btree opclass, materialized views containing them cannot use RMVC as currently committed. This patch would fix that, or we could rip out the current implementation and go to the "delete everything and insert the entire new matview contents" approach. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Sep 24, 2013 at 2:22 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Now I admit that's an arguable point. We could certainly define an >> intermediate notion of equality that is more equal than whatever = >> does, but not as equal as exact binary equality. > > I suggested it up-thread and don't recall seeing a response, so here it > is again- passing the data through the binary-out function for the type > and comparing *that* would allow us to change the interal binary > representation of data types and would be something which we could at > least explain to users as to why X isn't the same as Y according to this > binary operator. > >> I think the conservative (and therefore correct) approach is to decide >> that we're always going to update if we detect any difference at all. > > And there may be users who are surprised that a refresh changed parts of > the table that have nothing to do with what was changed in the > underlying relation, because a different plan was used and the result > ended up being binary-different. It's easy to explain to users why that > would be when we're doing a wholesale replace but I don't think it'll be > nearly as clear why that happened when we're not replacing the whole > table and why REFRESH can basically end up changing anything (but > doesn't consistently). If we're paying attention to the records changed > and only updating the matview's records when they're involved, that > becomes pretty clear. What's happening here feels very much like > unintended consequences. FWIW you make some interesting points (I did a triple take on the plan dependent changes) but I'm 100% ok with the proposed behavior. Matviews satisfy 'truth' as *defined by the underlying query only*. This is key: there may be N candidate 'truths' for that query: it's not IMNSHO reasonable to expect the post-refresh truth to be approximately based in the pre-refresh truth. Even if the implementation happened to do what you're asking for it would only be demonstrating undefined but superficially useful behavior...a good analogy would be the old scan behavior where an unordered scan would come up in 'last update order'. That (again, superficially useful) behavior was undefined and we reserved the right to change it. And we did. Unnecessarily defined behaviors defeat future performance optimizations. So Kevin's patch AIUI defines a hitherto non-user accessible (except in the very special case of row-wise comparison) mechanic to try and cut down the number of rows that *must* be refreshed. It may or may not do a good job at that on a situational basis -- if it was always better we'd probably be overriding the default behavior. I don't think it's astonishing at all for matview to pseudo-randomly adjust case over a citext column; that's just part of the deal with equality ambiguous types. As long as the matview doesn't expose a dataset that was impossible to have been generated by the underlying query, I'm good. merlin
Kevin Grittner <kgrittn@ymail.com> wrote: > Here is v2 of the patch which changes from the universally > disliked operator names v1 used. It also fixes bugs in the row > comparisons for pass-by-reference types, fixes a couple nearby > comments, and adds regression tests for a matview containing a > box column. I accidentally omitted the regression tests for matview with a box column from v2. Rather than redoing the whole thing, here is a v2a patch to add just that omitted part. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Tue, Sep 24, 2013 at 3:22 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Now I admit that's an arguable point. We could certainly define an >> intermediate notion of equality that is more equal than whatever = >> does, but not as equal as exact binary equality. > > I suggested it up-thread and don't recall seeing a response, so here it > is again- passing the data through the binary-out function for the type > and comparing *that* would allow us to change the interal binary > representation of data types and would be something which we could at > least explain to users as to why X isn't the same as Y according to this > binary operator. Sorry, I missed that suggestion. I'm wary of inventing a completely new way of doing this. I don't think that there's any guarantee that the send/recv functions won't expose exactly the same implementation details as a direct check for binary equality. For example, array_send() seems to directly reveal the presence or absence of a NULL bitmap. Even if there were no such anomalies today, it feels fragile to rely on a fairly-unrelated concept to have exactly the semantics we want here, and it will surely be much slower. Binary equality has existing precedent and is used in numerous places in the code for good reason. Users might be confused about the use of those semantics in those places also, but AFAICT nobody is. I think the best thing I can say about this proposal is that it would clearly be better than what we're doing now, which is just plain wrong. I don't think it's the best proposal, however. >> It is obviously true, and unavoidable, that if the query can produce >> more than one result set depending on the query plan or other factors, >> then the materialized view contents can match only one of those >> possible result sets. But you are arguing that it should be allowed >> to produce some OTHER result set, that a direct execution of the query >> could *never* have produced, and I can't see how that can be right. > > I agree that the matview shouldn't produce things which *can't* exist > through an execution of the query. I don't intend to argue against that > but I realize that's a fallout of not accepting the patch to implement > the binary comparison operators. My complaint is more generally that if > this approach needs such then there's going to be problems down the > road. No, I can't predict exactly what they are and perhaps I'm all wet > here, but this kind of binary-equality operations are something I've > tried to avoid since discovering what happens when you try to compare > a couple of floating point numbers. So, I get that. But what I think is that the problem that's coming up here is almost the flip side of that. If you are working with types that are a little bit imprecise, such as floats or citext or box, you want to use comparison strategies that have a little bit of tolerance for error. In the case of box, this is actually built into the comparison operator. In the case of floats, it's not; you as the application programmer have to deal with the fact that comparisons are imprecise - like by avoiding equality comparisons. On the other hand, if you are *replicating* those data types, then you don't want that tolerance. If you're checking whether two boxes are equal, you may indeed want the small amount of fuzziness that our comparison operators allow. But if you're copying a box or a float from one table to another, or from one database to another, you want the values copied exactly, including all of those low-order bits that tend to foul up your comparisons. That's why float8out() normally doesn't display any extra_float_digits - because you as the user shouldn't be relying on them - but pg_dump does back them up because not doing so would allow errors to propagate. Similarly here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: > On 09/12/2013 06:27 PM, Kevin Grittner wrote: > >Attached is a patch for a bit of infrastructure I believe to be > >necessary for correct behavior of REFRESH MATERIALIZED VIEW > >CONCURRENTLY as well as incremental maintenance of matviews. > > Here is attempt at a review of the v1 patch. > > There has been a heated discussion on list about if we even want > this type of operator. I will try to summarize it here > > The arguments against it are > * that a record_image_identical call on two records that returns > false can still return true under the equality operator, and the > records might (or might not) appear to be the same to users > * Once we expose this as an operator via SQL someone will find it > and use it and then complain when we change things such as the > internal representation of a datatype which might > break there queries > * The differences between = and record_image_identical might not be > understood by everywhere who finds the operator leading to confusion > * Using a criteria other than equality (the = operator provided by > the datatype) might cause problems if we later provide 'on change' > triggers for materialized views > > The arguments for this patch are > * We want the materialized view to return the same value as would be > returned if the query were executed directly. This not only means > that the values should be the same according to a datatypes = > operator but that they should appear the same 'to the eyeball'. > * Supporting the materialized view refresh check via SQL makes a lot > of sense but doing that requires exposing something via SQL > * If we adequately document what we mean by record_image_identical > and the operator we pick for this then users shouldn't be surprised > at what they get if they use this This is a good summary. I think there are a few things that make this issue difficult to decide: 1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we could do this with an SQL or C function, there would be less concern about the confusion caused by adding this capability. Question: If we are comparing based on some primary key, why do we need this to optimize? Certainly the primary key index will be used, no? 2. Agregates are already non-deterministic, so some feel that adding this feature doesn't improve much. The counter-argument is that without the binary row comparison ability, rows could be returned that could _never_ have been produced by the base data, which is more of a user surprise than non-deterministic rows. 3. Our type casting and operators are already complex, and adding another set of operators only compounds that. 4. There are legitimate cases where tool makers might want the ability to compare rows binarily, e.g. for replication, and adding these operators would help with that. I think we need to see a patch from Kevin that shows all the row comparisons documented and we can see the impact of the user-visibile part. One interesting approach would be to only allow the operator to be called from SPI queries. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Sep 26, 2013 at 05:50:08PM -0400, Bruce Momjian wrote: > I think we need to see a patch from Kevin that shows all the row > comparisons documented and we can see the impact of the user-visibile > part. > > One interesting approach would be to only allow the operator to be > called from SPI queries. It would also be good to know about similar non-default entries in pg_opclass so we can understand the expected impact. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: >> On 09/12/2013 06:27 PM, Kevin Grittner wrote: >>> Attached is a patch for a bit of infrastructure I believe to be >>> necessary for correct behavior of REFRESH MATERIALIZED VIEW >>> CONCURRENTLY as well as incremental maintenance of matviews. >> >> Here is attempt at a review of the v1 patch. >> >> There has been a heated discussion on list about if we even want >> this type of operator. I will try to summarize it here >> >> The arguments against it are >> * that a record_image_identical call on two records that returns >> false can still return true under the equality operator, and the >> records might (or might not) appear to be the same to users >> * Once we expose this as an operator via SQL someone will find >> it and use it and then complain when we change things such as >> the internal representation of a datatype which might break >> there queries I don't see where that is possible unless they count on the order of records sorted with the new operators, which would not be something I recommend. Changing the internal representation cannot break simple use of the alternative equality definition as long as all you cared about was whether the binary storage format of the values was identical. >> * The differences between = and record_image_identical might not >> be understood by everywhere who finds the operator leading to >> confusion Either they find it and read the code, or we document it and they read the docs. >> * Using a criteria other than equality (the = operator provided >> by the datatype) might cause problems if we later provide 'on >> change' triggers for materialized views I would fight user-defined triggers for matviews tooth and nail. We need to get incremental maintenance working instead. >> The arguments for this patch are >> * We want the materialized view to return the same value as >> would be returned if the query were executed directly. This not >> only means that the values should be the same according to a >> datatypes = operator but that they should appear the same 'to >> the eyeball'. And to functions the user can run against the values. The example with the null bitmap for arrays being included when not necessary is something that isn't directly apparent to the eye, but queries which use pg_column_size would not get equal results. >> * Supporting the materialized view refresh check via SQL makes a >> lot of sense but doing that requires exposing something via SQL >> * If we adequately document what we mean by >> record_image_identical and the operator we pick for this then >> users shouldn't be surprised at what they get if they use this We first need to document the existing record comparison operators. If they read the docs for comparing "row_constructors" and expect that to be the behavior they get when they compare records, they will be surprised. > This is a good summary. I think there are a few things that make > this issue difficult to decide: > > 1. We have to use an operator to give the RMVC (REFRESH > MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize > this query. If we could do this with an SQL or C function, there > would be less concern about the confusion caused by adding this > capability. (a) We can't. (b) Why would that be less confusing? > Question: If we are comparing based on some primary key, why do > we need this to optimize? Because comparing primary keys doesn't tell us whether the old and new values in the row all match. > Certainly the primary key index will be used, no? It currently uses all columns which are part of unique indexes on the matview which are not partial (i.e., they do not use a WHERE clause), they index only on columns (not expressions). It is not possible to define a primary key on a matview. There are two reasons for using these columns: (a) It provides a correctness guarantee against having duplicated rows which have no nulls. This guarantee is what allows us to use the differential technique which is faster than retail DELETE and re-INSERT of everything. (b) Since we don't support hash joins of records, and it would tend to be pretty slow if we did, it allows faster hash joins on one or more columns which stand a good chance of doing this efficiently. > 2. Agregates are already non-deterministic, Only some of them, and only with certain source data. > so some feel that adding this feature doesn't improve much. It allows a materialized view which contains a column of a type without a default btree opclass to *use* concurrent refresh, it makes queries with deterministic results always generate matview results with exactly match the results of a VIEW using the query if it were run at the same moment, and for nondeterministic queries, it provides results consistent with *some* result set which could have been returned by a view at the same time. Unless we do something, none of these things are true. That seems like much to me. > The counter-argument is that without the binary row comparison > ability, rows could be returned that could _never_ have been > produced by the base data, which is more of a user surprise than > non-deterministic rows. Yes, having the matview represent results from from running the query at *some* point in time is *one* of the benefits. > 3. Our type casting and operators are already complex, and > adding another set of operators only compounds that. It cannot have any effect on any of the existing operators, so I'm not sure whether you are referring to the extra operators and functions, or something else. It does not, for example, introduce any risk of "ambiguous operators". > 4. There are legitimate cases where tool makers might want the > ability to compare rows binarily, e.g. for replication, and > adding these operators would help with that. > > I think we need to see a patch from Kevin that shows all the row > comparisons documented and we can see the impact of the > user-visibile part. I'm inclined to first submit a proposed documentation patch for the existing record operators, and see if we can make everyone happy with that, and *then* see about adding documentation for the new ones. Trying to deal with both at once is likely to increase confusion and wheel-spinning. > One interesting approach would be to only allow the operator to > be called from SPI queries. Why would that be a good idea? > It would also be good to know about similar non-default entries > in pg_opclass so we can understand the expected impact. A quick query (lacking schema information and schema qualification) shows what is there by default: select amname, typname, opcname from pg_opclass c join pg_am a on a.oid = c.opcmethod join pg_type t on t.oid = c.opcintype where not opcdefault order by 1, 2, 3; amname | typname | opcname --------+---------+--------------------- btree | bpchar | bpchar_pattern_ops btree | inet | cidr_ops btree | record | record_image_ops btree | text | text_pattern_ops btree | text | varchar_ops btree | text | varchar_pattern_ops hash | bpchar | bpchar_pattern_ops hash | inet | cidr_ops hash | text | text_pattern_ops hash | text | varchar_ops hash | text | varchar_pattern_ops spgist | point | kd_point_ops (12 rows) The text_pattern_ops opclass, for example adds memcmp based comparisons for text strings, that differ from what the default comparisons regarding greater or lesser values: test=# select 'a' < E'\u02E5'; ?column? ---------- f (1 row) test=# select 'a' ~<~ E'\u02E5'; ?column? ---------- t (1 row) Then there are types with no default btree opclass, so "normal" equality tests are not defined. When I run a simple query for such types I get hundreds, although some of those may not be important for this discussion. Let's pick one, just as an example: box. test=# select '(2,2),(1,1)'::box = '(12,82),(12.1,92)'::box; ?column? ---------- t (1 row) test=# \set A '''(1.9999996,1.9999996),(1,1)''::box' test=# \set B '''(2,2),(1,1)''::box' test=# \set C '''(2.0000004,2.0000004),(1,1)''::box' test=# select :A = :B, :B = :C, :A = :C; ?column? | ?column? | ?column? ----------+----------+---------- t | t | f (1 row) The = operator only tells you whether the boxes are approximately the same size. There is an operator which checks for "same as?" based on all corners matching, but it still allows EPSILON differences: test=# select '(2,2),(1,1)'::box ~= '(12,82),(12.1,92)'::box; ?column? ---------- f (1 row) test=# select :A ~= :B, :B ~= :C, :A ~= :C; ?column? | ?column? | ?column? ----------+----------+---------- t | t | t (1 row) These infelicities are not too painful if they are not incorporated into an opclass, however. But lack of a btree opclass for the type means you get this sort of problem with current "record =": test=# select (row ('(2,2),(1,1)'::box))::record = (row ('(2,2),(1,1)'::box))::record; ERROR: could not identify an equality operator for type box The new operator allows record comparisons when the record contains such types: test=# select (row ('(2,2),(1,1)'::box))::record *= (row ('(2,2),(1,1)'::box))::record; ?column? ---------- t (1 row) ... and would notice if you changed between "equal" values: test=# select (row (:A))::record *= (row (:B))::record, test-# (row (:B))::record *= (row (:C))::record, test-# (row (:A))::record *= (row (:C))::record; ?column? | ?column? | ?column? ----------+----------+---------- f | f | f (1 row) -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> wrote: > Bruce Momjian <bruce@momjian.us> wrote: >> I think we need to see a patch from Kevin that shows all the row >> comparisons documented and we can see the impact of the >> user-visibile part. First draft attached. > I'm inclined to first submit a proposed documentation patch for the > existing record operators, and see if we can make everyone happy > with that, and *then* see about adding documentation for the new > ones. Trying to deal with both at once is likely to increase > confusion and wheel-spinning. When I took a stab at it, the new operators only seemed to merit a single paragraph, so that is included after all. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 09/28/2013 03:03 PM, Kevin Grittner wrote: > <para> > + To support matching of rows which include elements without a default > + B-tree operator class, the following operators are defined for composite > + type comparison: > + <literal>*=</>, > + <literal>*<></>, > + <literal>*<</>, > + <literal>*<=</>, > + <literal>*></>, and > + <literal>*>=</>. > + These operators are also used internally to maintain materialized views, > + and may be useful to replication software. To ensure that all user > + visible changes are detected, even when the equality operator for the > + type treats old and new values as equal, the byte images of the stored > + data are compared. While ordering is deterministic, it is not generally > + useful except to facilitate merge joins. Ordering may differ between > + system architectures and major releases of > + <productname>PostgreSQL</productname>. > + </para> How about To support matching of rows which include elements without a default B-tree operator class, the following operators aredefined for composite type comparison: <literal>*=</>, <literal>*<></>, <literal>*<</>, <literal>*<=</>, <literal>*></>, and <literal>*>=</>. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparisons of the two rows with the equality operator is true. The ordering of rows under these comparision operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication but are not intended to be generally useful for writing queries.
Steve Singer <steve@ssinger.info> wrote: > How about > > To support matching of rows which include elements without a default > B-tree operator class, the following operators are defined for composite > type comparison: > <literal>*=</>, > <literal>*<></>, > <literal>*<</>, > <literal>*<=</>, > <literal>*></>, and > <literal>*>=</>. > > These operators compare the internal binary representation of the two > rows. Two rows might have a different binary representation even > though comparisons of the two rows with the equality operator is true. > The ordering of rows under these comparision operators is deterministic > but not otherwise meaningful. These operators are used internally for > materialized views and might be useful for other specialized purposes > such as replication but are not intended to be generally useful for > writing queries. I agree that's an improvement. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote: > >> The arguments for this patch are > >> * We want the materialized view to return the same value as > >> would be returned if the query were executed directly. This not > >> only means that the values should be the same according to a > >> datatypes = operator but that they should appear the same 'to > >> the eyeball'. > > And to functions the user can run against the values. The example > with the null bitmap for arrays being included when not necessary > is something that isn't directly apparent to the eye, but queries > which use pg_column_size would not get equal results. pg_column_size() is by definition internal details, so I am not worried about that not matching. > >> * Supporting the materialized view refresh check via SQL makes a > >> lot of sense but doing that requires exposing something via SQL > >> * If we adequately document what we mean by > >> record_image_identical and the operator we pick for this then > >> users shouldn't be surprised at what they get if they use this > > We first need to document the existing record comparison operators. > If they read the docs for comparing "row_constructors" and expect > that to be the behavior they get when they compare records, they > will be surprised. Well, if they appear in \do, I am thinking they should be documented. > > This is a good summary. I think there are a few things that make > > this issue difficult to decide: > > > > 1. We have to use an operator to give the RMVC (REFRESH > > MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize > > this query. If we could do this with an SQL or C function, there > > would be less concern about the confusion caused by adding this > > capability. > > (a) We can't. > > (b) Why would that be less confusing? Because function names, especially long ones, are more clearly self-documenting than operators. > > Question: If we are comparing based on some primary key, why do > > we need this to optimize? > > Because comparing primary keys doesn't tell us whether the old and > new values in the row all match. OK, but my question was about why we need a full set of operators rather than just equal, and maybe not equal. I thought you said we needed others, e.g. >, so we could do merge joins, but I thought we would just be doing comparisons after primary keys are joined, and if that is true, we could just use a function. Actually, I am now realizing you have to use the non-binary-level equals comparison on keys, then the binary-level equals on rows for this to work --- that's pretty confusing. Is that true? > > 3. Our type casting and operators are already complex, and > > adding another set of operators only compounds that. > > It cannot have any effect on any of the existing operators, so I'm > not sure whether you are referring to the extra operators and > functions, or something else. It does not, for example, introduce > any risk of "ambiguous operators". It makes our system more complex for the user to understand. > > One interesting approach would be to only allow the operator to > > be called from SPI queries. > > Why would that be a good idea? Because then it would be more of an "internal" operator. > > It would also be good to know about similar non-default entries > > in pg_opclass so we can understand the expected impact. > > A quick query (lacking schema information and schema qualification) > shows what is there by default: OK, the unique list is: opcname--------------------- varchar_ops kd_point_ops cidr_ops text_pattern_ops varchar_pattern_ops bpchar_pattern_ops(6rows) Do these all have operators defined too? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 09/30/2013 09:08 AM, Kevin Grittner wrote: > Steve Singer <steve@ssinger.info> wrote: > >> How about >> >> To support matching of rows which include elements without a default >> B-tree operator class, the following operators are defined for composite >> type comparison: >> <literal>*=</>, >> <literal>*<></>, >> <literal>*<</>, >> <literal>*<=</>, >> <literal>*></>, and >> <literal>*>=</>. >> >> These operators compare the internal binary representation of the two >> rows. Two rows might have a different binary representation even >> though comparisons of the two rows with the equality operator is true. >> The ordering of rows under these comparision operators is deterministic >> but not otherwise meaningful. These operators are used internally for >> materialized views and might be useful for other specialized purposes >> such as replication but are not intended to be generally useful for >> writing queries. > I agree that's an improvement. Thanks! Are there any outstanding issues on this patch preventing it from being committed? I think we have discussed this patch enough such that we now have consensus on proceeding with adding a record identical operator to SQL. No one has objected to the latest names of the operators. You haven't adjusted the patch to reduce the duplication between the equality and comparison functions, if you disagree with me and feel that doing so would increase the code complexity and be inconsistent with how we do things elsewhere that is fine. Steve > -- > Kevin Grittner > EDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > >
On Thu, Oct 3, 2013 at 09:59:03AM -0400, Steve Singer wrote: > Are there any outstanding issues on this patch preventing it from > being committed? I have not received answers to my email of October 1: http://www.postgresql.org/message-id/20131001024620.GB13385@momjian.us -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
* Robert Haas (robertmhaas@gmail.com) wrote: > I'm wary of inventing a completely new way of doing this. I don't > think that there's any guarantee that the send/recv functions won't > expose exactly the same implementation details as a direct check for > binary equality. I don't follow this thought. Changing the binary representation which is returned when users use the binary-mode protocol would be a pretty massive user-impacting change, while adding a new way of storing NUMERIC internally wouldn't even be visible to end users. > For example, array_send() seems to directly reveal > the presence or absence of a NULL bitmap. That's part of the definition of what the binary protocol for array *is* though, so that's simply a fact of life for our users. That doesn't mean we can't, say, change the array header to remove the internal type OID and use a mapping from the type OID that's on the tuple to the type OID inside the array- as long as array_send() still produces the same binary structure for the end user. > Even if there were no such > anomalies today, it feels fragile to rely on a fairly-unrelated > concept to have exactly the semantics we want here, and it will surely > be much slower. I agree that it would be slower but performance should be a consideration once correctness is accomplished and this distinction feels a great deal more "correct", imv. > Binary equality has existing precedent and is used in > numerous places in the code for good reason. Users might be confused > about the use of those semantics in those places also, but AFAICT > nobody is. You've stated that a few times and I've simply not had time to run down the validity of it- so, where does internal-to-PG binary equality end up being visible to our users? Independent of that, are there places in the backend which could actually be refactored to use these new operators where it would reduce code complexity? > On the other hand, if you are *replicating* those data types, then you > don't want that tolerance. If you're checking whether two boxes are > equal, you may indeed want the small amount of fuzziness that our > comparison operators allow. But if you're copying a box or a float > from one table to another, or from one database to another, you want > the values copied exactly, including all of those low-order bits that > tend to foul up your comparisons. That's why float8out() normally > doesn't display any extra_float_digits - because you as the user > shouldn't be relying on them - but pg_dump does back them up because > not doing so would allow errors to propagate. Similarly here. I agree that we should be copying the values exactly- and I think we're already good there when it comes to doing a *copy*. I further agree that updating the matview should be a copy, but the manner in which we're doing that is using an equality check to see if the value needs to be updated or not which is where things get a bit fuzzy. If we were consistently copying and updating the value based on some external knowledge that the value has changed (similar to how slony works w/ triggers that dump change sets into a table- it doesn't consider "has any value on this row changed?"; the user did an update, presumably for some purpose, therefore the change gets recorded and propagated), I'd be perfectly happy. Thanks, Stephen
Steve, Thanks for following-up on this; I had meant to reply much sooner but other things got in the way. Thanks again, Stephen * Steve Singer (steve@ssinger.info) wrote: > Are there any outstanding issues on this patch preventing it from > being committed? > I think we have discussed this patch enough such that we now have > consensus on proceeding with adding a record identical operator to > SQL. > No one has objected to the latest names of the operators. > > You haven't adjusted the patch to reduce the duplication between the > equality and comparison functions, if you disagree with me and feel > that doing so would increase the code complexity and be inconsistent > with how we do things elsewhere that is fine.
Bruce Momjian <bruce@momjian.us> wrote: > On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote: >> We first need to document the existing record comparison operators. >> If they read the docs for comparing "row_constructors" and expect >> that to be the behavior they get when they compare records, they >> will be surprised. > > Well, if they appear in \do, I am thinking they should be documented. This patch now covers the ones which are record comparison operators, old and new. Feel free to document the others if you feel strongly on that point; but I don't feel that becomes the business of this patch. >> Because comparing primary keys doesn't tell us whether the old and >> new values in the row all match. > > OK, but my question was about why we need a full set of operators rather > than just equal, and maybe not equal. I thought you said we needed > others, e.g. >, so we could do merge joins, but I thought we would just > be doing comparisons after primary keys are joined, and if that is true, > we could just use a function. http://www.postgresql.org/docs/current/interactive/xoper-optimization.html#AEN54334 > Actually, I am now realizing you have to use the non-binary-level equals > comparison on keys, then the binary-level equals on rows for this to > work --- that's pretty confusing. Is that true? It's a matter of replacing the = operator for record comparisons in these two places with the new *= operator. http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l553 http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l684 (With or without the new operator, the second of these needs to be schema-qualified to avoid trouble if the user adds a different implementation of = ahead of the pg_catalog implementation on search_path, as users can do.) The difference between = and *= would not generally be visible to end users -- just to those working on matview.c. >> A quick query (lacking schema information and schema qualification) >> shows what is there by default: > > OK, the unique list is: > > opcname > --------------------- > varchar_ops > kd_point_ops > cidr_ops > text_pattern_ops > varchar_pattern_ops > bpchar_pattern_ops > (6 rows) > > Do these all have operators defined too? Every operator class is associated with operators. For example, while text_pattern_ops uses the same = and <> operators as the default text opclass (because that already uses memcmp), it adds ~>~, ~>=~, ~<~, and ~<=~ operators which also use memcmp (ignoring character encoding and collation). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Steve Singer <steve@ssinger.info> wrote: > You haven't adjusted the patch to reduce the duplication between the > equality and comparison functions, if you disagree with me and feel that > doing so would increase the code complexity and be inconsistent with how > we do things elsewhere that is fine. I think the main reason to keep them separate is that it makes it easier to compare record_cmp to record_image_cmp and record_eq to record_image_eq to see what the differences and similarities are. Other reasons are that I think all those conditionals inside a combined function would get messy and make the logic harder to understand. The number of places that would need conditionals, plus new wrappers that would be needed, would mean that the net reduction in lines of code would be minimal. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Oct 3, 2013 at 11:12 AM, Stephen Frost <sfrost@snowman.net> wrote: >> Binary equality has existing precedent and is used in >> numerous places in the code for good reason. Users might be confused >> about the use of those semantics in those places also, but AFAICT >> nobody is. > > You've stated that a few times and I've simply not had time to run down > the validity of it- so, where does internal-to-PG binary equality end up > being visible to our users? Independent of that, are there places in > the backend which could actually be refactored to use these new > operators where it would reduce code complexity? Well, the most obvious example is HOT. README.HOT sayeth: % The requirement for doing a HOT update is that none of the indexed % columns are changed. This is checked at execution time by comparing the % binary representation of the old and new values. We insist on bitwise % equality rather than using datatype-specific equality routines. The % main reason to avoid the latter is that there might be multiple notions % of equality for a datatype, and we don't know exactly which one is % relevant for the indexes at hand. We assume that bitwise equality % guarantees equality for all purposes. That bit about multiple notions of equality applies here too - because we don't know what the user will do with the data in the materialized view, Kevin's looking for an operator which guarantees equality for all purposes. Note that HOT could feed everything through the send and recv functions, as you are proposing here, and that would allow HOT to apply in cases where it does not currently apply. We could, for example, perform a HOT update when a value in a numeric column is changed from the long format to the short format without changing the user-perceived value. You could argue that HOT isn't user-visible, but we certainly advise people to think about structuring their indexing in a fashion that does not defeat HOT, so I think to some extent it is user-visible. Also, in xfunc.sgml, we have this note: The planner also sometimes relies on comparing constants via bitwise equality, so you can get undesirable planningresults if logically-equivalent values aren't bitwise equal. There are other places as well. If two node trees are compared using equal(), any Const nodes in that tree will be compared for binary equality. So for example MergeWithExistingConstraint() will error out if the constraints are equal under btree equality operators but not binary equal. equal() is also used in various places in the planner, which may be the reason for the above warning. The point I want to make here is that we have an existing precedent to use bitwise equality when we want to make sure that values are equivalent for all purposes, regardless of what opclass or whatever is in use. There are not a ton of those places but there are some. >> On the other hand, if you are *replicating* those data types, then you >> don't want that tolerance. If you're checking whether two boxes are >> equal, you may indeed want the small amount of fuzziness that our >> comparison operators allow. But if you're copying a box or a float >> from one table to another, or from one database to another, you want >> the values copied exactly, including all of those low-order bits that >> tend to foul up your comparisons. That's why float8out() normally >> doesn't display any extra_float_digits - because you as the user >> shouldn't be relying on them - but pg_dump does back them up because >> not doing so would allow errors to propagate. Similarly here. > > I agree that we should be copying the values exactly- and I think we're > already good there when it comes to doing a *copy*. I further agree > that updating the matview should be a copy, but the manner in which > we're doing that is using an equality check to see if the value needs to > be updated or not which is where things get a bit fuzzy. If we were > consistently copying and updating the value based on some external > knowledge that the value has changed (similar to how slony works w/ > triggers that dump change sets into a table- it doesn't consider "has > any value on this row changed?"; the user did an update, presumably for > some purpose, therefore the change gets recorded and propagated), I'd be > perfectly happy. Sure, that'd work, but it doesn't explain what's wrong with Kevin's proposal. You're basically saying that memcpy(a, b, len) is OK with you but if (memcmp(a, b, len) != 0) memcpy(a, b, len) is not OK with you. I don't understand how you can endorse copying the value exactly, but not be OK with the optimization that says, well if it already matches exactly, then we don't need to copy it. We can certainly rip out the current implementation of REFRESH MATERIALIZED VIEW CONCURRENTLY and replace it with something that deletes every row in the view and reinserts them all, but it will be far less efficient than what we have now. All that is anybody is asking for here is the ability to skip deleting and reinserting rows that are absolutely identical in the old and new versions of the view. Your send/recv proposal would let us also skip deleting and reinserting rows that are ALMOST identical except for not-normally-user-visible binary format differences... but since we haven't worried about allowing such cases for e.g. HOT updates, I don't think we need to worry about them here, either. In practice, such changes are rare as hen's teeth anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Stephen Frost <sfrost@snowman.net> wrote: > If we were consistently copying and updating the value based on > some external knowledge that the value has changed (similar to > how slony works w/ triggers that dump change sets into a table- > it doesn't consider "has any value on this row changed?"; the > user did an update, presumably for some purpose, therefore the > change gets recorded and propagated), I'd be perfectly happy. That is the equivalent of the incremental maintenance feature which won't be coming until after REFRESH has settled down. REFRESH is more like the slony initial population of a table. REFRESH CONCURRENTLY is a way of doing that which allows users to continue to read from the matview without blocking or interruption while the REFRESH is running; I think it would be a bug if that could generate different results from a non-CONCURRENT REFRESH. Wanting incremental maintenance (which we all want) is no reason to block an earlier implementation of REFRESH CONCURRENTLY which is not, and does not use, incremental maintenance. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Oct 3, 2013 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote: > The point I want to make here is that we have an existing precedent to > use bitwise equality when we want to make sure that values are > equivalent for all purposes, regardless of what opclass or whatever is > in use. There are not a ton of those places but there are some. Btree opclasses (which are of course special, per "35.14.6. System Dependencies on Operator Classes") are restricted by the reflexive law, which implies that bitwise equality is a stronger condition than regularly equality. I'm inclined to agree that it isn't a big deal to expose a bitwise equality operator. We're talking about the possibility of being potentially overly eager according to someone's definition about refreshing, not the opposite. With reference to an actual case where this is possible, I don't think this is astonishing, though I grant Stephen that that's aesthetic. I also agree that an intermediate notion of equality isn't worth it. -- Peter Geoghegan
* Robert Haas (robertmhaas@gmail.com) wrote: > You could argue that HOT isn't user-visible, but we certainly advise > people to think about structuring their indexing in a fashion that > does not defeat HOT, so I think to some extent it is user-visible. I do think saying HOT is user-visible is really stretching things and do we really say somewhere "please be careful to make sure your updates to key fields are *BINARY IDENTICAL* to what's stored or HOT won't be used"? If anything, that should be listed on a PG 'gotchas' page. > Also, in xfunc.sgml, we have this note: > > The planner also sometimes relies on comparing constants via > bitwise equality, so you can get undesirable planning results if > logically-equivalent values aren't bitwise equal. And that would be under "C-Language Functions", which also includes things like "take care to zero out any alignment padding bytes that might be present in structs". > There are other places as well. If two node trees are compared using > equal(), any Const nodes in that tree will be compared for binary > equality. These would primairly be cases where we've created a Const out of a string, or similar, which the *user provided*, no? It strikes me as at least unlikely that we'd end up storing a given string from the user in different ways in memory and so this consideration, again, makes sense for people writing C code but not for your general SQL user. > So for example MergeWithExistingConstraint() will error out > if the constraints are equal under btree equality operators but not > binary equal. equal() is also used in various places in the planner, > which may be the reason for the above warning. I wonder if this would need to be changed because you could actually define constraints that operate at a binary level and therefore don't overlap even though they look like they overlap based on btree equality. > The point I want to make here is that we have an existing precedent to > use bitwise equality when we want to make sure that values are > equivalent for all purposes, regardless of what opclass or whatever is > in use. There are not a ton of those places but there are some. I agree that there are some cases and further that these operators provide a way of saying "are these definitely the same?" but they fall down on "are these definitely different?" That makes these operators useful for these kinds of optimizations, but that's it. Providing SQL level optimization-only operators like this is akin to the SQL standard defining indexes. > Sure, that'd work, but it doesn't explain what's wrong with Kevin's > proposal. You're basically saying that memcpy(a, b, len) is OK with > you but if (memcmp(a, b, len) != 0) memcpy(a, b, len) is not OK with > you. I don't understand how you can endorse copying the value > exactly, but not be OK with the optimization that says, well if it > already matches exactly, then we don't need to copy it. Adding new operators isn't all about what happens at the C-code level. That said, I agree that PG, in general, is more 'open' to exposing implementation details than is perhaps ideal, but it can also be quite useful in some instances. I don't really like doing that in top-level operators like this, but it doesn't seem like there's a whole lot of help for it. I'm not convinced that using the send/recv approach would be all that big of a performance hit but I've not tested it. > We can certainly rip out the current implementation of REFRESH > MATERIALIZED VIEW CONCURRENTLY and replace it with something that > deletes every row in the view and reinserts them all, but it will be > far less efficient than what we have now. All that is anybody is > asking for here is the ability to skip deleting and reinserting rows > that are absolutely identical in the old and new versions of the view. If this was an entirely internal thing, it'd be different, but it's not. > Your send/recv proposal would let us also skip deleting and > reinserting rows that are ALMOST identical except for > not-normally-user-visible binary format differences... but since we > haven't worried about allowing such cases for e.g. HOT updates, I > don't think we need to worry about them here, either. In practice, > such changes are rare as hen's teeth anyway. I'm not entirely convinced that what was done for HOT in this regard is a precedent we should be building on top of. Thanks, Stephen
On 10/04/2013 12:22 AM, Stephen Frost wrote: > That said, I agree that PG, in general, is more 'open' to exposing > implementation details than is perhaps ideal, Every *real* system is more open to exposing implementation details than is *ideal*. One very popular "implementation detail" which surprises users over and over is performance under different use cases. There is no way you can hide this. That said, I see nothing bad in having an operator for "binary equal" or alternately called "guaranteed equal". Its negator is not "guaranteed unequal" but "not guaranteed to be equal" The main exposed implementation detail of this operator is that it is very fast and can be recommended to be used at user level for speeding up equal query like this SELECT * FROM t WHERE <guaranteed equal> or <equal> where the plain <equal> will only be called when fast <guaranteed equal> fails. a bit similar to how you can cut down on index size on long text fields by indexing their hashes and then querying SELECT * FROM tWHERE hashtext(verylongtext) = hashtext(sample) AND verylongtext = sample It is absolutely possible that the fact that hash clashes exist also confuses some users the same way the possibility of having binary inequality and be still NOT DISTINCT FROM, but I argue that having a fast "guaranteed equal" operation available to users is useful. some users may also initially get confused by SELECT 3/2; returning 1 and not "the right answer" of 1.5. They may even bitch and moan that PostgreSQL is completely broken. But then they look it up or ask on the net and are confused no more. Greetings Hannu
* Hannu Krosing (hannu@krosing.net) wrote: > The main exposed implementation detail of this operator is that it is > very fast and can be recommended to be used at user level for speeding > up equal query like this > > SELECT * FROM t WHERE <guaranteed equal> or <equal> > > where the plain <equal> will only be called when fast <guaranteed equal> > fails. Yeah, this would be exactly the kind of misuse that we will need to be prepared to support with these new operators. If this is actually faster/better/whatever, then we should be implementing it in our conditional handling, not encouraging users to create hacks like this. > a bit similar to how you can cut down on index size on long text fields by > indexing their hashes and then querying > > SELECT * FROM t > WHERE hashtext(verylongtext) = hashtext(sample) > AND verylongtext = sample This case clearly requires a great deal more thought and consideration on the DBA's side and is also a lot more obvious what it's doing than having 'where x *= 123 or x = 123'. Thanks, Stephen
On Thu, Oct 3, 2013 at 10:46:05AM -0700, Kevin Grittner wrote: > > opcname > > --------------------- > > varchar_ops > > kd_point_ops > > cidr_ops > > text_pattern_ops > > varchar_pattern_ops > > bpchar_pattern_ops > > (6 rows) > > > > Do these all have operators defined too? > > Every operator class is associated with operators. For example, > while text_pattern_ops uses the same = and <> operators as the > default text opclass (because that already uses memcmp), it adds > ~>~, ~>=~, ~<~, and ~<=~ operators which also use memcmp (ignoring > character encoding and collation). OK, my questions have been answered and I am no longer concerned about this patch causing "equality" confusion for our users. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +