Thread: record identical operator

Re: record identical operator

From

Benedikt Grundmann

Date:

13 September 2013, 07:05:37

On Thu, Sep 12, 2013 at 11:27 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.
[...]
The patch adds an "identical" operator (===) for the record type:

[...]

The new operator is logically similar to IS NOT DISTINCT FROM for a
record, although its implementation is very different. For one
thing, it doesn't replace the operation with column level operators
in the parser. For another thing, it doesn't look up operators for
each type, so the "identical" operator does not need to be
implemented for each type to use it as shown above. It compares
values byte-for-byte, after detoasting. The test for identical
records can avoid the detoasting altogether for any values with
different lengths, and it stops when it finds the first column with
a difference.

I toyed with the idea of supporting hashing of records using this
operator, but could not see how that would be a performance win.

The identical (===) and not identical (!==) operator names were
chosen because of a vague similarity to the "exactly equals"
concepts in JavaScript and PHP, which use that name. The semantics
aren't quite the same, but it seemed close enough not to be too
surprising. The additional operator names seemed natural to me
based on the first two, but I'm not really that attached to these
names for the operators if someone has a better idea.

Since the comparison of record values is not documented (only
comparisons involving row value constructors), it doesn't seem like
we should document this special case. It is intended primarily for
support of matview refresh and maintenance, and it seems likely
that record comparison was not documented on the basis that it is
intended primarily for support of such things as indexing and merge
joins -- so leaving the new operators undocumented seems consistent
with existing policy. I'm open to arguments that the policy should
change.

-

Wouldn't it be slightly less surprising / magical to not declare new operators
but just new primitive functions? In the least invasive version they could even
be called matview_is_record_identical or similar.

cheers,

Bene

Re: record identical operator

From

Kevin Grittner

Date:

13 September 2013, 12:55:08

Benedikt Grundmann <bgrundmann@janestreet.com> wrote:

> Kevin Grittner <kgrittn@ymail.com> wrote:
>
>> Attached is a patch for a bit of infrastructure I believe to be
>> necessary for correct behavior of REFRESH MATERIALIZED VIEW
>> CONCURRENTLY as well as incremental maintenance of matviews.
>> [...]
>> The patch adds an "identical" operator (===) for the record
>> type:

> Wouldn't it be slightly less surprising / magical to not declare
> new operators but just new primitive functions?  In the least
> invasive version they could even be called
> matview_is_record_identical or similar.

I'm not sure what is particularly surprising or magical about new
operators, but it is true that the patch could be smaller if
operators were not added.  The SQL functions added by the current
patch to support the operator approach are:

extern Datum record_image_eq(PG_FUNCTION_ARGS);
extern Datum record_image_ne(PG_FUNCTION_ARGS);
extern Datum record_image_lt(PG_FUNCTION_ARGS);
extern Datum record_image_gt(PG_FUNCTION_ARGS);
extern Datum record_image_le(PG_FUNCTION_ARGS);
extern Datum record_image_ge(PG_FUNCTION_ARGS);
extern Datum btrecordimagecmp(PG_FUNCTION_ARGS);

All take two record arguments and all but the last return a
boolean.  The last one returns -1, 0, or 1 depending on how the
values compare.  All comparisons are based on memcmp() of the data
in its storage format (after detoasting, where applicable).

As currently written, the patch still requires a unique index on
the matview in order to allow RMVC, but this patch was intended to
support removal of that restriction, which is something some people
were saying they wanted.  It just seemed best to do that with a
separate patch once we had the mechanism to support it.

RMVC currently generates its "diff" data with a query similar to
this:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (t === m)
test-#   WHERE t.id IS NULL OR m.id IS NULL;
test=# \pset pager off
Pager usage is off.
test=# explain analyze
SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t.id = m.id AND t === m)
  WHERE t.id IS NULL OR m.id IS NULL;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1)
   Hash Cond: (m.id = t.id)
   Join Filter: (t.* === m.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.002..0.006 rows=5 loops=1)
   ->  Hash  (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.010..0.016 rows=5 loops=1)
 Total runtime: 0.102 ms
(10 rows)

With the operator support, we can remove the primary key columns
from the join conditions, and it still works, albeit with a slower
plan:

test=# explain analyze
SELECT *
  FROM citext_matview m
  FULL JOIN citext_table t ON (t === m)
  WHERE t.id IS NULL OR m.id IS NULL;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Merge Full Join  (cost=2.22..2.32 rows=1 width=16) (actual time=0.072..0.072 rows=0 loops=1)
   Merge Cond: (m.* === t.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Sort  (cost=1.11..1.12 rows=5 width=40) (actual time=0.035..0.035 rows=5 loops=1)
         Sort Key: m.*
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.012..0.016 rows=5 loops=1)
   ->  Sort  (cost=1.11..1.12 rows=5 width=40) (actual time=0.014..0.014 rows=5 loops=1)
         Sort Key: t.*
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.003 rows=5 loops=1)
 Total runtime: 0.128 ms
(13 rows)

So, if the operators are included it will be a very small and
simple patch to relax the requirement for a unique index.
Currently we generate an error if no index columns were found, but
with the new operators we could leave them out and the query would
still work.  Adding indexes to matviews would then be just a matter
of optimization, not a requirement to be able to use the RMVC
feature.

Using the functions instead of the operators things work just as
well as long as we use columns in the join conditions, which is
currently based on indexed columns:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (t.id = m.id AND record_image_eq(t, m))
test-#   WHERE t.id IS NULL OR m.id IS NULL;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.11..2.24 rows=1 width=16) (actual time=0.056..0.056 rows=0 loops=1)
   Hash Cond: (m.id = t.id)
   Join Filter: record_image_eq(t.*, m.*)
   Filter: ((t.id IS NULL) OR (m.id IS NULL))
   Rows Removed by Filter: 5
   ->  Seq Scan on citext_matview m  (cost=0.00..1.05 rows=5 width=40) (actual time=0.003..0.006 rows=5 loops=1)
   ->  Hash  (cost=1.05..1.05 rows=5 width=40) (actual time=0.023..0.023 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on citext_table t  (cost=0.00..1.05 rows=5 width=40) (actual time=0.011..0.015 rows=5 loops=1)
 Total runtime: 0.101 ms
(10 rows)

Without columns in the FULL JOIN conditions, the function fails
entirely, because it is the operator information which lets the
planner know that a FULL JOIN is even possible:

test=# explain analyze
test-# SELECT *
test-#   FROM citext_matview m
test-#   FULL JOIN citext_table t ON (record_image_eq(t, m))
test-#   WHERE t.id IS NULL OR m.id IS NULL;
ERROR:  FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

So, either the indexes would continue to be required, or we would
need some other criteria for deciding which columns to use for the
additional FULL JOIN criteria.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

13 September 2013, 15:46:55

Hi Kevin,

On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
> The new operator is logically similar to IS NOT DISTINCT FROM for a
> record, although its implementation is very different.  For one
> thing, it doesn't replace the operation with column level operators
> in the parser.  For another thing, it doesn't look up operators for
> each type, so the "identical" operator does not need to be
> implemented for each type to use it as shown above.  It compares
> values byte-for-byte, after detoasting.  The test for identical
> records can avoid the detoasting altogether for any values with
> different lengths, and it stops when it finds the first column with
> a difference.

In the general case, that operator sounds dangerous to me. We don't
guarantee that a Datum containing the same data always has the same
binary representation. E.g. array can have a null bitmap or may not have
one, depending on how they were created.

I am not actually sure whether that's a problem for your usecase, but I
get headaches when we try circumventing the type abstraction that way.

Yes, we do such tricks in other places already, but afaik in all those
places errorneously believing two Datums are distinct is not error, just
a missed optimization. Allowing a general operator with such a murky
definition to creep into something SQL exposed... Hm. Not sure.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

13 September 2013, 21:36:38

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
>> The new operator is logically similar to IS NOT DISTINCT FROM for a
>> record, although its implementation is very different.  For one
>> thing, it doesn't replace the operation with column level operators
>> in the parser.  For another thing, it doesn't look up operators for
>> each type, so the "identical" operator does not need to be
>> implemented for each type to use it as shown above.  It compares
>> values byte-for-byte, after detoasting.  The test for identical
>> records can avoid the detoasting altogether for any values with
>> different lengths, and it stops when it finds the first column with
>> a difference.
>
> In the general case, that operator sounds dangerous to me. We don't
> guarantee that a Datum containing the same data always has the same
> binary representation. E.g. array can have a null bitmap or may not have
> one, depending on how they were created.
>
> I am not actually sure whether that's a problem for your usecase, but I
> get headaches when we try circumventing the type abstraction that way.
>
> Yes, we do such tricks in other places already, but afaik in all those
> places errorneously believing two Datums are distinct is not error, just
> a missed optimization. Allowing a general operator with such a murky
> definition to creep into something SQL exposed... Hm. Not sure.

Well, the only two alternatives I could see were to allow
user-visible differences not to be carried to the matview if they
old and new values were considered "equal", or to implement an
"identical" operator or function in every type that was to be
allowed in a matview.  Given those options, what's in this patch
seemed to me to be the least evil.

It might be worth noting that this scheme doesn't have a problem
with correctness if there are multiple equal values which are not
identical, as long as any two identical values are equal.  If the
query which generates contents for a matview generates
non-identical but equal values from one run to the next without any
particular reason, that might cause performance problems.

To mangle Orwell: "Among pairs of equal values, some pairs are more
equal than others."

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

13 September 2013, 21:59:10

On 2013-09-13 14:36:27 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-09-12 15:27:27 -0700, Kevin Grittner wrote:
> >> The new operator is logically similar to IS NOT DISTINCT FROM for a
> >> record, although its implementation is very different.  For one
> >> thing, it doesn't replace the operation with column level operators
> >> in the parser.  For another thing, it doesn't look up operators for
> >> each type, so the "identical" operator does not need to be
> >> implemented for each type to use it as shown above.  It compares
> >> values byte-for-byte, after detoasting.  The test for identical
> >> records can avoid the detoasting altogether for any values with
> >> different lengths, and it stops when it finds the first column with
> >> a difference.
> >
> > In the general case, that operator sounds dangerous to me. We don't
> > guarantee that a Datum containing the same data always has the same
> > binary representation. E.g. array can have a null bitmap or may not have
> > one, depending on how they were created.
> >
> > I am not actually sure whether that's a problem for your usecase, but I
> > get headaches when we try circumventing the type abstraction that way.
> >
> > Yes, we do such tricks in other places already, but afaik in all those
> > places errorneously believing two Datums are distinct is not error, just
> > a missed optimization. Allowing a general operator with such a murky
> > definition to creep into something SQL exposed... Hm. Not sure.
>
> Well, the only two alternatives I could see were to allow
> user-visible differences not to be carried to the matview if they
> old and new values were considered "equal", or to implement an
> "identical" operator or function in every type that was to be
> allowed in a matview.  Given those options, what's in this patch
> seemed to me to be the least evil.
>
> It might be worth noting that this scheme doesn't have a problem
> with correctness if there are multiple equal values which are not
> identical, as long as any two identical values are equal.  If the
> query which generates contents for a matview generates
> non-identical but equal values from one run to the next without any
> particular reason, that might cause performance problems.

I am not actually that concerned with MVCs using this, you're quite
capable of analyzing the dangers. What I am wary of is exposing an
operator that's basically broken from the get go to SQL.
Now, the obvious issue there is that matviews use SQL to refresh :(

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

13 September 2013, 22:13:25

Andres Freund <andres@2ndquadrant.com> wrote:

> I am not actually that concerned with MVCs using this, you're quite
> capable of analyzing the dangers. What I am wary of is exposing an
> operator that's basically broken from the get go to SQL.
> Now, the obvious issue there is that matviews use SQL to refresh :(

I'm not sure why these operators are more broken or dangerous than
those which already exist to support the text_pattern_ops and
bpchar_pattern_ops operator families.  I could overload those
operator names as much as possible if that seems better.  As I said
at the start of the thread, I have no particular attachment to
these operator names.  For example, that would mean using ~>=~ as
the operator for record_image_ge() instead of using >==.  It would
be trivial to make that adjustment to the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

13 September 2013, 22:20:19

On 2013-09-13 15:13:20 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > I am not actually that concerned with MVCs using this, you're quite
> > capable of analyzing the dangers. What I am wary of is exposing an
> > operator that's basically broken from the get go to SQL.
> > Now, the obvious issue there is that matviews use SQL to refresh :(
>
> I'm not sure why these operators are more broken or dangerous than
> those which already exist to support the text_pattern_ops and
> bpchar_pattern_ops operator families.  I could overload those
> operator names as much as possible if that seems better.  As I said
> at the start of the thread, I have no particular attachment to
> these operator names.  For example, that would mean using ~>=~ as
> the operator for record_image_ge() instead of using >==.  It would
> be trivial to make that adjustment to the patch.

Hm. I don't see the similarity. Those have pretty clearly defined
behaviour. Not one that's dependendant on padding bytes, null bitmaps
that can or cannot be present and such.

I guess we need input from others here.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

14 September 2013, 02:20:17

Andres Freund <andres@2ndquadrant.com> wrote:

> Not one that's dependendant on padding bytes, null bitmaps that
> can or cannot be present and such.

Can you provide an example of where that's an issue with this
patch?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

14 September 2013, 13:09:57

On 2013-09-13 19:20:11 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> 
> > Not one that's dependendant on padding bytes, null bitmaps that
> > can or cannot be present and such.
> 
> Can you provide an example of where that's an issue with this
> patch?

I haven't yet tested your patch, but what I am talking about is that
e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3];
obviously should be true. But both arrays don't have the same binary
representation since the former has a null bitmap, the latter not.
So, if you had a composite type like (int4[]) and would compare that
without invoking operators you'd return something false in some cases
because of the null bitmaps.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

14 September 2013, 18:26:01

Andres Freund <andres@2ndquadrant.com> wrote:

> what I am talking about is that
> e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3];
> obviously should be true.

The patch does not change the behavior of the = operator for any
type under any circumstances.

> But both arrays don't have the same binary representation since
> the former has a null bitmap, the latter not. So, if you had a
> composite type like (int4[]) and would compare that without
> invoking operators you'd return something false in some cases
> because of the null bitmaps.

Not for the = operator.  The new "identical" operator would find
them to not be identical, though.

Since the new operator is only for the record type, I need to wrap
the values in your example:

test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record
test-#      = row (ARRAY[1,2,3])::record;
 ?column?
----------
 t
(1 row)

test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record
test-#    === row (ARRAY[1,2,3])::record;
 ?column?
----------
 f
(1 row)

Or, to borrow from the citext example using arrays:

test=# CREATE TABLE array_table (
test(#   id serial primary key,
test(#   nums int4[]
test(# );
CREATE TABLE
test=# INSERT INTO array_table (nums)
test-#   VALUES ((ARRAY[1,2,3,NULL])[1:3]), (ARRAY[1,2,3]),
test-#          (ARRAY[1,2,3]), (NULL), (NULL);
INSERT 0 5
test=# CREATE MATERIALIZED VIEW array_matview AS
test-#   SELECT * FROM array_table;
SELECT 5
test=# CREATE UNIQUE INDEX array_matview_id
test-#   ON array_matview (id);
CREATE INDEX
test=# select * from array_matview;
 id |  nums   
----+---------
  1 | {1,2,3}
  2 | {1,2,3}
  3 | {1,2,3}
  4 |
  5 |
(5 rows)

Note that the on-disk representation of the row where id = 1
differs from the on-disk representation where id in (2,3), both in
the table and the matview.

test=# SELECT *
test-#   FROM array_matview m
test-#   FULL JOIN array_table t ON (t.id = m.id AND t === m)
test-#   WHERE t.id IS NULL OR m.id IS NULL;
 id | nums | id | nums
----+------+----+------
(0 rows)

... so the query looking for work for the RMVC statement finds
nothing to do.

test=# UPDATE array_table SET nums = (ARRAY[1,2,3,NULL])[1:3]
test-#   WHERE id between 1 and 2;
UPDATE 2

Now we have added an unnecessary bitmap to the on-disk storage of
the value where id = 2.

test=# SELECT *
test-#   FROM array_matview m
test-#   FULL JOIN array_table t ON (t.id = m.id AND t === m)
test-#   WHERE t.id IS NULL OR m.id IS NULL;
 id |  nums   | id |  nums   
----+---------+----+---------
    |         |  2 | {1,2,3}
  2 | {1,2,3} |    |
(2 rows)

... and the query sees that they differ.

test=# REFRESH MATERIALIZED VIEW CONCURRENTLY array_matview;
REFRESH MATERIALIZED VIEW
test=# SELECT *
test-#   FROM array_matview m
test-#   FULL JOIN array_table t ON (t.id = m.id AND t === m)
test-#   WHERE t.id IS NULL OR m.id IS NULL;
 id | nums | id | nums
----+------+----+------
(0 rows)

The REFRESH causes them to match again, and later REFRESH runs
won't see a need to do any work there unless the on-disk
representation changes again.

As far as I can see, we have four choices:

(1)  Never update values that are "equal", even if they appear
different to the users, as was demonstrated with the citext
example.

(2)  Require every data type which can be used in a matview to
implement some new operator or function for "identical".  Perhaps
that could be mitigated to only implementat it if equal values can
have user-visible differences.

(3)  Embed special cases into record identical tests for types
known to allow multiple on-disk representations which have no
user-visible differences.

(4)  Base the need to update a matview column on whether its
on-disk representation is identical to what a new run of the
defining query would generate.  If this causes performance problems
for use of a given type in a matview, one possible solution would
be to modify that particular type to use a canonical format when
storing a value into a record.  For example, storing an array which
has a bitmap of null values even though there are no nulls in the
array could strip the bitmap as it is stored to the record.

Currently we are using (4).  I only included (3) for completeness;
even just typing it as a hypothetical made me want to take a
shower.  (1) seems pretty horrid, too.  (2) isn't evil, exactly,
but any types which allowed user-visible differences in equal
values would not exhibit correct behavior in matviews until and
unless an "identical" operator was added.  It might also perform
noticeably worse than (4).

Option (4) has the advantage of showing correct logical behavior in
all types immediately, and restricting any performance fixes to the
types with the inconsistent storage formats.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

14 September 2013, 18:58:48

On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > what I am talking about is that
> > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3];
> > obviously should be true.
>
> The patch does not change the behavior of the = operator for any
> type under any circumstances.

Yes, sure. I wasn't thinking you would.

> > But both arrays don't have the same binary representation since
> > the former has a null bitmap, the latter not. So, if you had a
> > composite type like (int4[]) and would compare that without
> > invoking operators you'd return something false in some cases
> > because of the null bitmaps.
>
> Not for the = operator.  The new "identical" operator would find
> them to not be identical, though.

Yep. And I think that's a problem if exposed to SQL. People won't
understand the hazards and end up using it because its faster or
somesuch.

> Since the new operator is only for the record type, I need to wrap
> the values in your example:

Yes.

> The REFRESH causes them to match again, and later REFRESH runs
> won't see a need to do any work there unless the on-disk
> representation changes again.

Yes, I understand that the matview code itself will just perform
superflous work. We use such comparisons in other parts of the code
similarly.

> As far as I can see, we have four choices:
>
> (1)  Never update values that are "equal", even if they appear
> different to the users, as was demonstrated with the citext
> example.

I think, introducing a noticeable amount of infrastructure for this just
because of citext is a bad idea.
At some point we need to replace citext with proper case-insensitive
collation support - then it really might become necessary.

> (2)  Require every data type which can be used in a matview to
> implement some new operator or function for "identical".  Perhaps
> that could be mitigated to only implementat it if equal values can
> have user-visible differences.

That basically would require adding a new member to btree opclasses that
btrees don't need themselves... Hm.

> (3)  Embed special cases into record identical tests for types
> known to allow multiple on-disk representations which have no
> user-visible differences.

I think this is a complete nogo. a) I don't forsee we know of all these
cases b) it wouldn't be extensible.

Oh. Now that I've read further, I see you feel the same. Good ;)

> (4)  Base the need to update a matview column on whether its
> on-disk representation is identical to what a new run of the
> defining query would generate.  If this causes performance problems
> for use of a given type in a matview, one possible solution would
> be to modify that particular type to use a canonical format when
> storing a value into a record.  For example, storing an array which
> has a bitmap of null values even though there are no nulls in the
> array could strip the bitmap as it is stored to the record.

If matview refreshs weren't using plain SQL and thus wouldn't require
exposing that operator to SQL I wouldn't have a problem with this...

There's the ungodly ugly choice of having an matview_equal function (or
operator) that checks if we're doing a refresh atm...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

15 September 2013, 11:35:16

Andres Freund <andres@2ndquadrant.com> wrote:

> If matview refreshs weren't using plain SQL and thus wouldn't
> require exposing that operator to SQL I wouldn't have a problem
> with this...

If RMVC were the end of the story, it might be worth building up a
mass of execution nodes directly, although it would be hard to see
how we could make the right planning choices (e.g., MergeJoin
versus HashJoin) that way.  But the whole incremental maintenance
area, to have any chance of working accurately and without an
endless stream of bugs, needs to be based on relational algebra.
There needs to be a way to express that in a much higher level
language than execution node creation.  If it doesn't use SQL we
would need to invent a relational language very much like it, which
would be silly when we have a perfectly good language we can
already use.  The sky is blue; let's move on.

The test for identical records will be needed in SQL if we want to
have these matview features.  We could limit use of that to
contexts where MatViewIncrementalMaintenanceIsEnabled(), but I
don't see the point.  If someone uses an undocumented operator, and
uses it inappropriately, they may get a surprising result.  We
already have undocumented operators to support record comparison
and pattern opclasses, and if people use those they may get
surprising results.  I don't recall any reports of problems from
people actually trying to do so.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Noah Misch

Date:

15 September 2013, 23:49:35

On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote:
> On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> > Andres Freund <andres@2ndquadrant.com> wrote:
> > > But both arrays don't have the same binary representation since
> > > the former has a null bitmap, the latter not. So, if you had a
> > > composite type like (int4[]) and would compare that without
> > > invoking operators you'd return something false in some cases
> > > because of the null bitmaps.
> > 
> > Not for the = operator.  The new "identical" operator would find
> > them to not be identical, though.
> 
> Yep. And I think that's a problem if exposed to SQL. People won't
> understand the hazards and end up using it because its faster or
> somesuch.

The important question is whether to document the new operator and/or provide
it under a guessable name.  If we give the operator a weird name, don't
document it, and put an "internal use only" comment in the catalogs, that is
essentially as good as hiding this feature at the SQL level.

I'm of two minds on that question.  On the one hand, MV maintenance is hardly
the first use case for an identity operator.  Any replication system or user
space materialized view implementation might want this.  On the other hand,
offering it for the record type exclusively is surprising.  It's also
surprising how records with different numbers of dropped columns can be found
identical, even though a record column within the top-level record is not
permitted to vary that way.

Supposing a decision to document the operator, a second question is whether
"===" is the right name:

On Thu, Sep 12, 2013 at 03:27:27PM -0700, Kevin Grittner wrote:
> The identical (===) and not identical (!==) operator names were
> chosen because of a vague similarity to the "exactly equals"
> concepts in JavaScript and PHP, which use that name.  The semantics
> aren't quite the same, but it seemed close enough not to be too
> surprising.

Maybe.  If we were mimicking the JavaScript/PHP operator of the same name,
'1.0'::numeric === '1.00'::numeric would return true, and '1'::int4 ===
'1'::int2 would return false.  The patch submitted returns false for the first
example and raises a "cannot compare dissimilar column types" error when
comparing an int4 record column to an int2 record column.

> I think, introducing a noticeable amount of infrastructure for this just
> because of citext is a bad idea.
> At some point we need to replace citext with proper case-insensitive
> collation support - then it really might become necessary.

citext is just one example.  Others include '1.0'::numeric = '1.00'::numeric
and '30 day'::interval = '1 mon'::interval.

> > (2)  Require every data type which can be used in a matview to
> > implement some new operator or function for "identical".  Perhaps
> > that could be mitigated to only implementat it if equal values can
> > have user-visible differences.
> 
> That basically would require adding a new member to btree opclasses that
> btrees don't need themselves... Hm.

That's the wrong way to model it.  Identity is a particular kind of equality,
which is to say a particular hash opclass.  It so happens that memcmp() is the
logical way to implement most/all identity operators, so we can get a btree
opclass as well.

Type-specific identity operators seem like overkill, anyway.  If we find that
meaningless variations in a particular data type are causing too many false
non-matches for the generic identity operator, the answer is to make the
functions generating datums of that type settle on a canonical form.  That
would be the solution for your example involving array null bitmaps.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: record identical operator

From

Hannu Krosing

Date:

16 September 2013, 00:08:46

On 09/15/2013 01:35 PM, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
>> If matview refreshs weren't using plain SQL and thus wouldn't
>> require exposing that operator to SQL I wouldn't have a problem
>> with this...
> If RMVC were the end of the story, it might be worth building up a
> mass of execution nodes directly, although it would be hard to see
> how we could make the right planning choices (e.g., MergeJoin
> versus HashJoin) that way.  But the whole incremental maintenance
> area, to have any chance of working accurately and without an
> endless stream of bugs, needs to be based on relational algebra.
> There needs to be a way to express that in a much higher level
> language than execution node creation.  If it doesn't use SQL we
> would need to invent a relational language very much like it, which
> would be silly when we have a perfectly good language we can
> already use.  The sky is blue; let's move on.
>
> The test for identical records will be needed in SQL if we want to
> have these matview features.  We could limit use of that to
> contexts where MatViewIncrementalMaintenanceIsEnabled(), but I
> don't see the point.  If someone uses an undocumented operator, and
> uses it inappropriately, they may get a surprising result.
Just remember to document it as "undocumented" so
people will know not to use them ;)

Lots of people were bitten when (undocumented) hash
functions were changed thus breaking hash-based partitioning.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 14:01:07

Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> Lots of people were bitten when (undocumented) hash
> functions were changed thus breaking hash-based partitioning.

Nobody can be affected by the new operators in this patch unless
they choose to use them to compare two records.  They don't work
for any other type and they don't come into play unless you
specifically request them.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 14:28:36

On 2013-09-15 19:49:26 -0400, Noah Misch wrote:
> On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote:
> > On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
> > > Andres Freund <andres@2ndquadrant.com> wrote:
> > > > But both arrays don't have the same binary representation since
> > > > the former has a null bitmap, the latter not. So, if you had a
> > > > composite type like (int4[]) and would compare that without
> > > > invoking operators you'd return something false in some cases
> > > > because of the null bitmaps.
> > > 
> > > Not for the = operator.  The new "identical" operator would find
> > > them to not be identical, though.
> > 
> > Yep. And I think that's a problem if exposed to SQL. People won't
> > understand the hazards and end up using it because its faster or
> > somesuch.
> 
> The important question is whether to document the new operator and/or provide
> it under a guessable name.  If we give the operator a weird name, don't
> document it, and put an "internal use only" comment in the catalogs, that is
> essentially as good as hiding this feature at the SQL level.

Doesn't match my experience.

> Type-specific identity operators seem like overkill, anyway.  If we find that
> meaningless variations in a particular data type are causing too many false
> non-matches for the generic identity operator, the answer is to make the
> functions generating datums of that type settle on a canonical form.  That
> would be the solution for your example involving array null bitmaps.

I think that's pretty much unrealistic. I am pretty sure that if either
of us starts looking we will find at about a dozen of such cases and
miss the other dozen. Not to speak about external code which is damn
likely to contain such cases.
And I think that efficiency will often make such normalization expensive
(consider postgis where Datums afaik can exist with an internal bounding
box or without).

I think it's far more realistic to implement an identity operator that
will fall back to a type specific operator iff equals has "strange"
properties.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 15:28:25

Andres Freund <andres@2ndquadrant.com> wrote:

> I think it's far more realistic to implement an identity operator
> that will fall back to a type specific operator iff equals has
> "strange" properties.

My biggest problem with that approach is that it defaults to
incorrect behavior for types for which user-visible differences
among "equal" values are possible, and requires per-type fixes to
get correct behavior.  The approach in this patch provides correct
behavior as the default, with possible per-type changes for
optimization, as people find that desirable.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Merlin Moncure

Date:

16 September 2013, 15:42:37

On Sun, Sep 15, 2013 at 6:49 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sat, Sep 14, 2013 at 08:58:32PM +0200, Andres Freund wrote:
>> On 2013-09-14 11:25:52 -0700, Kevin Grittner wrote:
>> > Andres Freund <andres@2ndquadrant.com> wrote:
>> > > But both arrays don't have the same binary representation since
>> > > the former has a null bitmap, the latter not. So, if you had a
>> > > composite type like (int4[]) and would compare that without
>> > > invoking operators you'd return something false in some cases
>> > > because of the null bitmaps.
>> >
>> > Not for the = operator.  The new "identical" operator would find
>> > them to not be identical, though.
>>
>> Yep. And I think that's a problem if exposed to SQL. People won't
>> understand the hazards and end up using it because its faster or
>> somesuch.
>
> The important question is whether to document the new operator and/or provide
> it under a guessable name.  If we give the operator a weird name, don't
> document it, and put an "internal use only" comment in the catalogs, that is
> essentially as good as hiding this feature at the SQL level.
>
> I'm of two minds on that question.  On the one hand, MV maintenance is hardly
> the first use case for an identity operator.  Any replication system or user
> space materialized view implementation might want this.  On the other hand,
> offering it for the record type exclusively is surprising.  It's also
> surprising how records with different numbers of dropped columns can be found
> identical, even though a record column within the top-level record is not
> permitted to vary that way.
>
> Supposing a decision to document the operator, a second question is whether
> "===" is the right name:

I vote to reserve '===' as shorthand for 'IS NOT DISTINCT FROM' and
give the binary equality operator a funky name.  I would document the
operator though.

merlin

Re: record identical operator

From

Hannu Krosing

Date:

16 September 2013, 15:59:08

On 09/16/2013 04:01 PM, Kevin Grittner wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>
>> Lots of people were bitten when (undocumented) hash
>> functions were changed thus breaking hash-based partitioning.
> Nobody can be affected by the new operators in this patch unless
> they choose to use them to compare two records.  They don't work
> for any other type and they don't come into play unless you
> specifically request them.
What I meant is that rather than leave it really undocumented,
document it as "system function for specific usage, has caveats
and may change in future versions. use at your own risk and
make sure you know what you are doing"

PostgreSQL has good enough introspection features that people
tend to find functions and operators using psql-s \d ...
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 17:47:01

Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> What I meant is that rather than leave it really undocumented,
> document it as "system function for specific usage, has caveats
> and may change in future versions. use at your own risk and
> make sure you know what you are doing"

Well, that was my original assumption and intention; but when I
went to look for where the operators for record *equals* were
defined, I found that we had apparently chosen to leave them
undocumented.  Oddly, under a section titled "Row-wise Comparison"
we only document the behavior of comparisons involving what the SQL
spec calls <row value constructor>.   I asked whether that was
intentional, and heard only the chirping of crickets:

http://www.postgresql.org/message-id/1378848776.70700.YahooMailNeo@web162902.mail.bf1.yahoo.com

If we choose not to document the equals operator for records, it
hardly makes sense to document the identical operator for records.

> PostgreSQL has good enough introspection features that people
> tend to find functions and operators using psql-s \d ...

One would think so, yet I don't recall seeing anyone posting
regarding the existing undocumented record comparison operators.
Nor do I recall seeing anyone posting about the undocumented
pattern comparison operators.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 17:49:08

On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote:
> One would think so, yet I don't recall seeing anyone posting
> regarding the existing undocumented record comparison operators. 
> Nor do I recall seeing anyone posting about the undocumented
> pattern comparison operators.

I've used and have seen them being used in client code...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 17:58:08

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote:
>> I don't recall seeing anyone posting
>> regarding the existing undocumented record comparison operators.
>> Nor do I recall seeing anyone posting about the undocumented
>> pattern comparison operators.
>
> I've used and have seen them being used in client code...

Which, the record operators or the text pattern operators (which
ignore collations and multi-byte character considerations and just
use memcmp())?

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/varlena.c;h=5e2c2ddc532c604a05f365f0cf6761033a35be76;hb=master#l1719

Is the fact that you have seen them used in client code even though
they are not documented an argument for or against adding
documentation for them?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 18:05:57

On 2013-09-16 10:58:01 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote:
> >> I don't recall seeing anyone posting
> >> regarding the existing undocumented record comparison operators.
> >> Nor do I recall seeing anyone posting about the undocumented
> >> pattern comparison operators.
> >
> > I've used and have seen them being used in client code...
> 
> Which, the record operators or the text pattern operators (which
> ignore collations and multi-byte character considerations and just
> use memcmp())?

I was referring to the text pattern ops. But I've seen the record ones
as well.

>
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/varlena.c;h=5e2c2ddc532c604a05f365f0cf6761033a35be76;hb=master#l1719
> 
> Is the fact that you have seen them used in client code even though
> they are not documented an argument for or against adding
> documentation for them?

Neither. It's an argument that providing operators with confusing
behaviours will not go unnoticed/unused.

I don't really think the record identical and pattern ops situations are
comparable. The latter has a well defined behaviour (using the C locale
basically) and is only useable for types where it's well defined. The
proposed identical operator returns false for comparisons of actually
equal data, that's quite different imo.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Merlin Moncure

Date:

16 September 2013, 18:23:47

On Mon, Sep 16, 2013 at 12:46 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>
>> What I meant is that rather than leave it really undocumented,
>> document it as "system function for specific usage, has caveats
>> and may change in future versions. use at your own risk and
>> make sure you know what you are doing"
>
> Well, that was my original assumption and intention; but when I
> went to look for where the operators for record *equals* were
> defined, I found that we had apparently chosen to leave them
> undocumented.  Oddly, under a section titled "Row-wise Comparison"
> we only document the behavior of comparisons involving what the SQL
> spec calls <row value constructor>.   I asked whether that was
> intentional, and heard only the chirping of crickets:
>
> http://www.postgresql.org/message-id/1378848776.70700.YahooMailNeo@web162902.mail.bf1.yahoo.com
>
> If we choose not to document the equals operator for records, it
> hardly makes sense to document the identical operator for records.
>
>> PostgreSQL has good enough introspection features that people
>> tend to find functions and operators using psql-s \d ...
>
> One would think so, yet I don't recall seeing anyone posting
> regarding the existing undocumented record comparison operators.
> Nor do I recall seeing anyone posting about the undocumented
> pattern comparison operators.

This behavior came about via a gripe of mine (but mostly courtesy Tom
Lane_: http://www.postgresql.org/message-id/6EE64EF3AB31D5448D0007DD34EEB34101AEF5@Herge.rcsinc.local

It brought row-wise comparon closer- (if not exactly to-) SQL spec.
The underlying use-case is to do ISAM-like record iteration over the
index.  "index" being the operative word.

merlin

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 18:34:52

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-16 10:58:01 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2013-09-16 10:46:53 -0700, Kevin Grittner wrote:
>>>> I don't recall seeing anyone posting regarding the existing
>>>> undocumented record comparison operators.  Nor do I recall
>>>> seeing anyone posting about the undocumented pattern
>>>> comparison operators.
>>>
>>> I've used and have seen them being used in client code...
>>
>> Which, the record operators or the text pattern operators (which
>> ignore collations and multi-byte character considerations and
>> just use memcmp())?
>
> I was referring to the text pattern ops. But I've seen the record
> ones as well.

>> Is the fact that you have seen them used in client code even
>> though they are not documented an argument for or against adding
>> documentation for them?
>
> Neither. It's an argument that providing operators with confusing
> behaviours will not go unnoticed/unused.

It is true that there are likely to be some who find the new
operators useful and will read the code to determine how to use
them, just as they apparently did for the undocumented operators
which already exist.

> I don't really think the record identical and pattern ops
> situations are comparable. The latter has a well defined
> behaviour (using the C locale basically) and is only useable for
> types where it's well defined.

But it is *not* restricted to text in the C locale.  It uses
memcmp() on text values even if they contain multi-byte characters
and use collations which require different sequencing.

> The proposed identical operator returns false for comparisons of
> actually equal data,

That is the point of it.  It tells you whether the record images
are byte-for-byte identical for all values within the record.  In
some cases a detected difference clearly causes a user-visible
difference; in others it is merely a performance or storage-space
difference.  For example in your array example the array with the
unnecessary bitmap attached to it takes an extra eight bytes on
disk, for no apparent benefit.  The point is that code which wants
to know whether those are identical images can detect that they are
not.  This works by default for all types, without having to track
down which types are capable of doing this for "equal" values or in
which cases the differences are user-visible.

> that's quite different imo.

Sorting a string based on its byte image, with no regard to its
collation or character boundaries, is what these text pattern
operators do, and it is exactly what my patch does.  My patch just
provides such a test for all data types within a record by default,
rather than requiring a new opfamily for every type.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 18:53:21

Merlin Moncure <mmoncure@gmail.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:

>> I don't recall seeing anyone posting regarding the existing
>> undocumented record comparison operators.

> This behavior came about via a gripe of mine (but mostly courtesy
> Tom Lane_:
> http://www.postgresql.org/message-id/6EE64EF3AB31D5448D0007DD34EEB34101AEF5@Herge.rcsinc.local
>
> It brought row-wise comparon closer- (if not exactly to-) SQL
> spec.  The underlying use-case is to do ISAM-like record
> iteration over the index.  "index" being the operative word.

Indeed.  The documented behavior complies with the spec (and is
extremely useful!), but only fully matches the behavior when one of
the arguments is a row value constructor.  The documentation sort
of acknowledges this by describing the arguments not as "row" or
"record", but as "row_constructor".  Comparison operators which
followed those exact rules would not be usable for indexing, so we
have slightly different semantics to support that.  I have no gripe
whatsoever with any of that.  I'm even OK with not documenting this
behavior, since it might do more to confuse than elucidate.  I did
ask the question, because if we choose not to document the behavior
of record comparisons based around the concept of "record equals",
I don't think it makes sense to document record comparisons based
around the concept of "record image equals".  That is the
terminology used in the function names and opfamily, but I have
been using "identical" in discussion, which perhaps has confused
the issue.

If we take documentation of record comparisons farther, we need to
be prepared to explain why this is correct (because it is required
by spec):

test=# select row(1, null::text) = row(1, null::text);
 ?column?
----------

(1 row)

... but this is also correct (because otherwise indexes don't
work):

test=# select row(1, null::text) = row(1, null::text)::record;
 ?column?
----------
 t
(1 row)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Noah Misch

Date:

16 September 2013, 20:58:34

On Mon, Sep 16, 2013 at 04:28:23PM +0200, Andres Freund wrote:
> On 2013-09-15 19:49:26 -0400, Noah Misch wrote:
> > Type-specific identity operators seem like overkill, anyway.  If we find that
> > meaningless variations in a particular data type are causing too many false
> > non-matches for the generic identity operator, the answer is to make the
> > functions generating datums of that type settle on a canonical form.  That
> > would be the solution for your example involving array null bitmaps.
> 
> I think that's pretty much unrealistic. I am pretty sure that if either
> of us starts looking we will find at about a dozen of such cases and
> miss the other dozen. Not to speak about external code which is damn
> likely to contain such cases.

It wouldn't be a problem if we missed cases or declined to update known cases.
The array example probably isn't worth changing.  Who's going to repeatedly
flip-flop an array column value between the two representations and then
bemoan the resulting MV write traffic?

> And I think that efficiency will often make such normalization expensive
> (consider postgis where Datums afaik can exist with an internal bounding
> box or without).

If it's so difficult to canonicalize between two supposedly-identical
variations, it's likewise a stretch to trust that all credible operations will
fail to distinguish between those variations.

> I think it's far more realistic to implement an identity operator that
> will fall back to a type specific operator iff equals has "strange"
> properties.

Complicating such efforts, the author of a custom identity operator doesn't
have the last word on functions that process the data type.  Take your postgis
example; if a second party adds a has_internal_bounding_box() function, an
identity operator ignoring that facet is no longer valid.

memcmp() has served well for HOT and for _equalConst(); why would it suddenly
fall short for MV maintenance?

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 21:04:23

On 2013-09-16 16:58:21 -0400, Noah Misch wrote:
> memcmp() has served well for HOT and for _equalConst(); why would it suddenly
> fall short for MV maintenance?

I don't have a problem using it internally, I have a problem exposing
the capability to sql. Don't tell me that's the same.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 21:39:39

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-16 16:58:21 -0400, Noah Misch wrote:
>> memcmp() has served well for HOT and for _equalConst(); why
>> would it suddenly fall short for MV maintenance?
>
> I don't have a problem using it internally, I have a problem
> exposing the capability to sql.

... like we do in *pattern ops and the
suppress_redundant_updates_trigger() function?

> Don't tell me that's the same.

No, this gives users a way to make the same test that HOT uses for
whether values match, albeit undocumented.  Well, not exactly the
same test, because this patch detoasts before comparing -- but
pretty close.  The question is, if it's unsafe for a user to make
this test, why would it be safe for HOT to use it?

I'm really having trouble understanding what problem you have with
this.  Can you describe a scenario where someone shoots themselves
in the foot with it, because I'm not seeing any?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 21:58:52

On 2013-09-16 14:39:30 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-09-16 16:58:21 -0400, Noah Misch wrote:
> >> memcmp() has served well for HOT and for _equalConst(); why
> >> would it suddenly fall short for MV maintenance?
> >
> > I don't have a problem using it internally, I have a problem
> > exposing the capability to sql.
>
> ... like we do in *pattern ops and the

That's still an absurd comparison. pattern ops have a defined behaviour,
even for multibyte characters. After all, you can simply have a UTF-8
database with C collation. Remember, in a C collation pattern ops can
use normal indexes.
The only thing that happens is that multibyte chars are sorted
differently. They aren't sorted basically randomly or anything.

> suppress_redundant_updates_trigger() function?

You get superflous trigger calls. So what. It's not usable for anything
but a trigger.

> I'm really having trouble understanding what problem you have with
> this.  Can you describe a scenario where someone shoots themselves
> in the foot with it, because I'm not seeing any?

It certainly will be surprising to just about anyone that something like:

SELECT * FROM foo WHERE whatever_composite_or_row === '(...)';

may not return rows where there's no SQL level discernible difference
(even by looking at the text conversion) between
whatever_composite_or_row and '(...)' because e.g. of the aforementioned
array null bitmaps.

I can understand claiming that the risk is acceptable but arguing it's
not there seems extremly strange to me.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 22:01:41

On 2013-09-16 23:58:46 +0200, Andres Freund wrote:
> > suppress_redundant_updates_trigger() function?
> 
> You get superflous trigger calls. So what. It's not usable for anything
> but a trigger.

Primarily unneccesary IO, not unneccessary trigger calls (which can also
happen).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

16 September 2013, 22:26:15

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-16 14:39:30 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2013-09-16 16:58:21 -0400, Noah Misch wrote:
>>>> memcmp() has served well for HOT and for _equalConst(); why
>>>> would it suddenly fall short for MV maintenance?
>>>
>>> I don't have a problem using it internally, I have a problem
>>> exposing the capability to sql.
>>
>> ... like we do in *pattern ops and the
>
> That's still an absurd comparison. pattern ops have a defined
> behaviour, even for multibyte characters. After all, you can
> simply have a UTF-8 database with C collation. Remember, in a C
> collation pattern ops can use normal indexes.
> The only thing that happens is that multibyte chars are sorted
> differently. They aren't sorted basically randomly or anything.

Neither are records using the new operator -- there is a
deterministic sort order based on the bytes representing the
record's values.

>> suppress_redundant_updates_trigger() function?
>
> You get superflous trigger calls. So what.

My feeling exactly.

>> I'm really having trouble understanding what problem you have
>> with this.  Can you describe a scenario where someone shoots
>> themselves in the foot with it, because I'm not seeing any?
>
> It certainly will be surprising to just about anyone that
> something like:
>
> SELECT * FROM foo WHERE whatever_composite_or_row === '(...)';
>
> may not return rows where there's no SQL level discernible
> difference (even by looking at the text conversion) between
> whatever_composite_or_row and '(...)' because e.g. of the
> aforementioned array null bitmaps.

Since the operator is bound to a function named record_image_eq(),
it seems to me to be a feature that if the image isn't equal due to
a bitmap being present in one and not the other it says they
differ.  It's not a bug or a problem.  It gives you a way to *tell*
whether two rows actually have the same representation, which could
be valuable for debugging.  The closest you can easily come without
this is to see that pg_column_size() yields a different result for
them, where that is true (as in the array bitmap case).

> I can understand claiming that the risk is acceptable but arguing
> it's not there seems extremly strange to me.

It's not a risk.  It's why the operator exists.  Perhaps the name
of the operator (===) or the fact that I've been using the
shorthand description of "identical" instead of writing out "record
image equals" in these emails has confused you.  I can stop using
the shorter description and have absolutely no attachment to the
operator name, if that helps.

The point of the operator is to determine whether two records,
which may compare as "equal" (but which don't necessarily appear
the same to a user), have the same byte image for each
corresponding value.  The point of the opfamily is to be able to do
a MergeJoin on this basis.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

16 September 2013, 22:52:28

On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote:
> > I can understand claiming that the risk is acceptable but arguing
> > it's not there seems extremly strange to me.
>
> It's not a risk.  It's why the operator exists. 

Pft. It's fine if the materialized view code uses it. I don't see the
danger there.
It's about users discovering it. If they notice it, they will use it
because "its a crapload faster" than normal row comparisons. And deals
with NULLs in a "simpler" way.

> Perhaps the name
> of the operator (===) or the fact that I've been using the
> shorthand description of "identical" instead of writing out "record
> image equals" in these emails has confused you.

If you really think that "confusing you" is the correct description for
concerns about users not understanding limitations (which you didn't
seem to know about), go ahead. Way to go to solicit feedback.

> I can stop using
> the shorter description and have absolutely no attachment to the
> operator name, if that helps.

You're unfortunately going to need operators if you want mergejoins to
work, right? If not I'd have said name it matview_needs_update() or
something like that... But yes, === certainly seems like a bad idea.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Hannu Krosing

Date:

17 September 2013, 09:08:36

On 09/16/2013 04:01 PM, Kevin Grittner wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>
>> Lots of people were bitten when (undocumented) hash
>> functions were changed thus breaking hash-based partitioning.
> Nobody can be affected by the new operators in this patch unless
> they choose to use them to compare two records.  They don't work
> for any other type and they don't come into play unless you
> specifically request them.
>
Maybe the binary equality operator should be named ====
for "really deeply equal"

to distinguish it from === which would be merely NOT DISTINCT FROM

we could even go one step further and define ===== to mean "the same".

?

This could fit better in conceptual sequence of operator 'strength'

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Kevin Grittner

Date:

17 September 2013, 12:23:58

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote:
>>> I can understand claiming that the risk is acceptable but
>>> arguing it's not there seems extremly strange to me.
>>
>> It's not a risk.  It's why the operator exists.
>
> Pft. It's fine if the materialized view code uses it. I don't see
> the danger there.
> It's about users discovering it. If they notice it, they will use
> it because "its a crapload faster" than normal row comparisons.

To have clean semantics, I think the operator should mean that the
stored format of the row is the same.  Regarding the array null
bitmap example, I think it would be truly weird if the operator
said that the stored format was the same, but this happened:

test=# select pg_column_size(ARRAY[1,2,3]);
 pg_column_size
----------------
             36
(1 row)

test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]);
 pg_column_size
----------------
             44
(1 row)

They have the same stored format, but a different number of
bytes?!?

> And deals with NULLs in a "simpler" way.

That might be addressed by leaving room to implement IS NOT
DISTINCT FROM as an operator.  See below.

>> Perhaps the name of the operator (===) or the fact that I've
>> been using the shorthand description of "identical" instead of
>> writing out "record image equals" in these emails has confused
>> you.
>
> If you really think that "confusing you" is the correct
> description for concerns about users not understanding
> limitations (which you didn't seem to know about), go ahead. Way
> to go to solicit feedback.

I see how that could be taken in a pejorative way; that was not
intended.  I apologize.  I was rushing a bit on that email to make
an appointment shortly afterward.  The point was to suggest that
bad communication on my part about the intent and concept of the
operator may have contributed to you saying that there was a risk
of it working as intended.  There is perhaps a risk that someone
will think that it does something other than what is intended, but
when you say that there is a "risk" that it will behave exactly as
intended, it does suggest that we're talking past each other.

>> I can stop using the shorter description and have absolutely no
>> attachment to the operator name, if that helps.
>
> You're unfortunately going to need operators if you want
> mergejoins to work, right?

Yes, operators are needed for that.

> If not I'd have said name it matview_needs_update() or something
> like that... But yes, === certainly seems like a bad idea.

I've come to agree with that.  The appointment was to meet with a
local PostgreSQL user who I've talked to before.  He reminded me
that his pet peeve was that he couldn't use IS [ NOT ] DISTINCT
FROM with the ALL | ANY construct because the IS [ NOT ] DISTINCT
FROM predicate isn't an operator.

Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> Maybe the binary equality operator should be named ====
> for "really deeply equal"
>
> to distinguish it from === which would be merely NOT DISTINCT FROM
>
> we could even go one step further and define ===== to mean "the same".
>
> ?
>
> This could fit better in conceptual sequence of operator 'strength'

I'm inclined to go with this.  It would leave the door open to
implementing IS NOT DISTINCT FROM on a type-by-type basis.  My one
concern is that it doesn't make room for a "has no user-visible
differences from" operator; but I'm not sure whether that is
something that there really is any use for.  But if we want to
reserve name space for it "just in case" someone has a use for it,
that would be between IS NOT DISTINCT FROM and "has the same
storage representation as", so that would leave:

  =      equals (but doesn't necessarily look the same as)
  ===    IS NOT DISTINCT FROM as an operator
  ====   reserved for "has no user-visible differences from"
  =====  stored image is the same

Of course, that begs the question of whether == is already "taken".
If not, we could knock one '=' off of everything above except for
"equals".  What existing uses are known for == ?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Rod Taylor

Date:

17 September 2013, 12:46:47

On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Of course, that begs the question of whether == is already "taken".
If not, we could knock one '=' off of everything above except for
"equals". What existing uses are known for == ?

== is already taken as a common typo in plpgsql scripts. I strongly prefer if this remained an error.

IF foo == bar THEN ...

Re: record identical operator

From

Hannu Krosing

Date:

17 September 2013, 13:15:00

On 09/17/2013 02:46 PM, Rod Taylor wrote:

On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Of course, that begs the question of whether == is already "taken".
If not, we could knock one '=' off of everything above except for
"equals". What existing uses are known for == ?

== is already taken as a common typo in plpgsql scripts. I strongly prefer if this remained an error.

IF foo == bar THEN ...

That was also my reason for not suggesting == .
It is too widely used in other systems for simple equality check.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Hannu Krosing

Date:

17 September 2013, 13:30:45

On 09/17/2013 12:52 AM, Andres Freund wrote:
> On 2013-09-16 15:26:08 -0700, Kevin Grittner wrote:
>>> I can understand claiming that the risk is acceptable but arguing
>>> it's not there seems extremly strange to me.
>> It's not a risk.  It's why the operator exists. 
> Pft. It's fine if the materialized view code uses it. I don't see the
> danger there.
> It's about users discovering it. If they notice it, they will use it
> because "its a crapload faster" than normal row comparisons. And deals
> with NULLs in a "simpler" way.
This is why I suggested naming the operator ==== with four '='s
as this should make the user ponder why there are so many equal signs.
And there are other languages where more equal signs mean "stronger"
equality.

Basically it is "equality with possible false negatives".

If a user wants to use it for speeding up simple equality, it should be
used as
"WHERE t.* ==== sample AND t.* = sample"

>
>> Perhaps the name
>> of the operator (===) or the fact that I've been using the
>> shorthand description of "identical" instead of writing out "record
>> image equals" in these emails has confused you.
> If you really think that "confusing you" is the correct description for
> concerns about users not understanding limitations (which you didn't
> seem to know about), 
IIRC he did start with a visible limitation of = being too weak equality
for some fields
> go ahead. Way to go to solicit feedback.
>
>> I can stop using
>> the shorter description and have absolutely no attachment to the
>> operator name, if that helps.
> You're unfortunately going to need operators if you want mergejoins to
> work, right? If not I'd have said name it matview_needs_update() or
> something like that... But yes, === certainly seems like a bad idea.
but having === as an operator form of IS NOT DISTINCT FROM
seems like a good idea to me.

We will get an operator which most people expect, with sufficiently
strange name to make people look up the docs and we will not be
violating SQL spec.

And it prepares their heads for stronger semantics of ==== :)
>
> Greetings,
>
> Andres Freund
>

Re: record identical operator

From

Dimitri Fontaine

Date:

17 September 2013, 13:56:55

Kevin Grittner <kgrittn@ymail.com> writes:
>   =      equals (but doesn't necessarily look the same as)
>   ===    IS NOT DISTINCT FROM as an operator
>   ====   reserved for "has no user-visible differences from"
>   =====  stored image is the same

I understand the need for more than one equality operator and my
programming language of choice exposes eq, eql, equal and equalp for
solving a very similar situation. Oh, and also offers = and string= and
string-equal, among others.

My vote goes against using a different number of the same character to
name the operators though, as that's not visually helpful enough IMO.

In yet another language whose grammar took inspiration with Prolog,
dealing with binary is made with explicitely using the bit syntax
expression where any datum is boxed into << and >>.
 http://erlang.org/doc/reference_manual/expressions.html#id78513

It might not be practical to do the same in SQL and have either the
following syntax example or even a unary "bit representation" operator
to do the same:
 SELECT <<row(expression, here)>> = <<column_name>> …

That would probably prevent using indexes?

It might be inspiration to name the bit-comparison operator something
like <<=>> though, or =><= or something else. If only we could force the
usage of unicode for the SQL input itself… I'm sure there's a unicode
glyph perfect for what we want here…

About IS NOT DISTINCT FROM, I would propose something where NOT and
DISTINCT are kept as concepts. Not usually is expressed with ! and we
already have a bunch of "same as" or "matches" operators with ~, so:
 <~>   IS DISTINCT FROM !<~>  IS NOT DISTINCT FROM

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: record identical operator

From

Robert Haas

Date:

18 September 2013, 15:06:54

On Tue, Sep 17, 2013 at 8:23 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> To have clean semantics, I think the operator should mean that the
> stored format of the row is the same.  Regarding the array null
> bitmap example, I think it would be truly weird if the operator
> said that the stored format was the same, but this happened:
>
> test=# select pg_column_size(ARRAY[1,2,3]);
>  pg_column_size
> ----------------
>              36
> (1 row)
>
> test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]);
>  pg_column_size
> ----------------
>              44
> (1 row)
>
> They have the same stored format, but a different number of
> bytes?!?

Hmm.  For most of this thread, I was leaning toward the view that
comparing the binary representations was the wrong concept, and that
we actually needed to have type-specific operators that understand the
semantics of the data type.

But I think this example convinces me otherwise.  What we really want
to do here is test whether two values are the same, and if you can
feed two values that are supposedly the same to some function and get
two different answers, well then they're not really the same, are
they?

Now, I grant that the array case is pretty weird.  An array with an
all-zeroes null bitmap is basically semantically identical to one with
no null bitmap at all.  But there are other such cases as well.  You
can have two floats that print the same way except when
extra_float_digits=3, for example, and I think that's probably a
difference that we *wouldn't* want to paper over.  You can have a
long-form numeric that represents a value that could have been
represented as a short-form numeric, which is similar to the array
case.  There are probably other examples as well.  But in each of
those cases, the point is that there *is* some operation which will
distinguish between the two supposedly-identical values, and therefore
they are not identical for all purposes.  Therefore, I see no harm in
having an operator that tests for
are-these-values-identical-for-all-purposes.  If that's useful for
RMVC, then there may be other legitimate uses for it as well.

And once we decide that's OK, I think we ought to document it.  Sure,
it's a little confusing, but we can explain it, I think.  It's a good
opportunity to point out to people that, most of the time, they really
want something else, like the equality operator for the default btree
opclass.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 15:47:55

* Robert Haas (robertmhaas@gmail.com) wrote:
> Therefore, I see no harm in
> having an operator that tests for
> are-these-values-identical-for-all-purposes.  If that's useful for
> RMVC, then there may be other legitimate uses for it as well.
>
> And once we decide that's OK, I think we ought to document it.  Sure,
> it's a little confusing, but we can explain it, I think.  It's a good
> opportunity to point out to people that, most of the time, they really
> want something else, like the equality operator for the default btree
> opclass.

For my 2c on this, while this can be useful for *us*, and maybe folks
hacking pretty close to PG, I can't get behind introducing this as an
'===' or some such operator.  I've missed why this can't be a simple
function and why in the world we would want to encourage users to use
this by making it look like a normal language construct of SQL, which
damn well better consider numbers which are equal in value to be equal,
regardless of their representation.

What the heck is the use case for this being a user-oriented, SQL
operator..?
Thanks,
    Stephen

Re: record identical operator

From

Andres Freund

Date:

18 September 2013, 15:53:33

On 2013-09-18 11:50:23 -0400, Stephen Frost wrote:
> For my 2c on this, while this can be useful for *us*, and maybe folks
> hacking pretty close to PG, I can't get behind introducing this as an
> '===' or some such operator.  I've missed why this can't be a simple
> function and why in the world we would want to encourage users to use
> this by making it look like a normal language construct of SQL, which
> damn well better consider numbers which are equal in value to be equal,
> regardless of their representation.

I certainly understand the feeling...

I think this really needs to have an obscure name. Like ==!!== or
somesuch (is equal very much, but doesn't actually test for equality ;))

> What the heck is the use case for this being a user-oriented, SQL
> operator..?

The materalized view code uses generated SQL, so it has to be SQL
accessible. And it needs to be an operator because the join planning
code requires that :(

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 15:54:59

Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> To have clean semantics, I think the operator should mean that the
>> stored format of the row is the same.  Regarding the array null
>> bitmap example, I think it would be truly weird if the operator
>> said that the stored format was the same, but this happened:
>>
>> test=# select pg_column_size(ARRAY[1,2,3]);
>>   pg_column_size
>> ----------------
>>               36
>> (1 row)
>>
>> test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]);
>>   pg_column_size
>> ----------------
>>               44
>> (1 row)
>>
>> They have the same stored format, but a different number of
>> bytes?!?
>
> Hmm.  For most of this thread, I was leaning toward the view that
> comparing the binary representations was the wrong concept, and that
> we actually needed to have type-specific operators that understand the
> semantics of the data type.
>
> But I think this example convinces me otherwise.  What we really want
> to do here is test whether two values are the same, and if you can
> feed two values that are supposedly the same to some function and get
> two different answers, well then they're not really the same, are
> they?

Right.  Not only would the per-type solution make materialized views
maintenance broken by default, requiring per-type work to make it
work reasonably, with silent failures for any type you didn't know
about, but "no user-visible differences" is a pretty slippery
concept.  Did you really think of all the functions someone might
use to look at a value?  Might there be performance differences we
care about that should be handled, even if the user has no way to
dig out the difference?  Will that change in a future release?

> Now, I grant that the array case is pretty weird.  An array with an
> all-zeroes null bitmap is basically semantically identical to one with
> no null bitmap at all.  But there are other such cases as well.  You
> can have two floats that print the same way except when
> extra_float_digits=3, for example, and I think that's probably a
> difference that we *wouldn't* want to paper over.  You can have a
> long-form numeric that represents a value that could have been
> represented as a short-form numeric, which is similar to the array
> case.  There are probably other examples as well.  But in each of
> those cases, the point is that there *is* some operation which will
> distinguish between the two supposedly-identical values, and therefore
> they are not identical for all purposes.  Therefore, I see no harm in
> having an operator that tests for
> are-these-values-identical-for-all-purposes.  If that's useful for
> RMVC, then there may be other legitimate uses for it as well.
>
> And once we decide that's OK, I think we ought to document it.

That seems to be the consensus.  I don't think we can really
document this form of record comparison without also documenting
how equality works.  I'll work something up for the next version of
the patch.

> Sure, it's a little confusing, but we can explain it, I think.  It's a good
> opportunity to point out to people that, most of the time, they really
> want something else, like the equality operator for the default btree
> opclass.

I think the hardest part will be documenting the difference between
the row value constructor semantics (which are all that is
currently documented) and the record equality semantics (used for
sorting and building indexes).  In a green field I think I would
have argued for having just the standard semantics we have
documented, and modifying our sort execution nodes and index builds
to deal with that.  This is one of those cases where the breakage
from changing to that is hard to justify on a cleaner conceptual
semantics basis.

There also seems to be universal agreement that the operator names
should be something other than what I put in the v1 patch, but we
don't have agreement on what should be used instead.  We need six
operators, to support the btree am requirements.  Currently the
patch has:

=== !== >== >>> <== <<<

Suggested "same as" operators so far are:

====
=====
<<=>>
=><=

Anyone want to champion one of those, or something else?  How about
the other five operators to go with your favorite?

Keep in mind that this thread has also turned up strong support for
an operator to express IS NOT DISTINCT FROM -- so that it can be
used with ANY/ALL, among other things.  Long term, having an
opfamily for that might help us clean up the semantics of record
comparison when there are NULLs involved.  Currently we use the =
operator but act as though IS NOT DISTINCT FROM was specified
(except for some cases involving a row value constructor).  Any
serious discussion of that should probably move to a new thread,
but I mention it here because some people wanted to reserve
operator space for that, which could conflict with "same as"
operators.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 15:56:42

* Andres Freund (andres@2ndquadrant.com) wrote:
> I think this really needs to have an obscure name. Like ==!!== or
> somesuch (is equal very much, but doesn't actually test for equality ;))

hah.

> > What the heck is the use case for this being a user-oriented, SQL
> > operator..?
>
> The materalized view code uses generated SQL, so it has to be SQL
> accessible. And it needs to be an operator because the join planning
> code requires that :(

Ugh.  This feels like a pretty ugly hack to deal with that.  I haven't
got any magical wand to address it, but making an SQL operator for 'are
these really the same bytes' to deal with what is essentially
implementation detail is _very_ grotty.
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 16:03:05

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Right.  Not only would the per-type solution make materialized views
> maintenance broken by default, requiring per-type work to make it
> work reasonably, with silent failures for any type you didn't know
> about, but "no user-visible differences" is a pretty slippery
> concept. 

I don't like those possibilities, of course, but I'm starting to wonder
about this whole concept of looking at it byte-wise.  If I'm following
correctly, what we're looking at here is having a way for matviews to
tell if these bytes are the same as those bytes, for the purpose of
deciding to update the matview, right?  Yet we can then have cases where
the row isn't *actually* different from a value perspective, yet we're
going to update it anyway because it's represented slightly differently?

What happens if we later want to add support for users to have a matview
trigger that's called when a matview row *actually* changes?  We'd end
up calling it on what are essentially false positives, or having to do
some double-check later on "well, did it *really* change?", neither of
which is good at all.  If we had the IS NOT DISTINCT FROM operators
discussed, would that work for this even if it isn't as performant?  Or
is there an issue with that?
Thanks,
    Stephen

Re: record identical operator

From

Merlin Moncure

Date:

18 September 2013, 16:06:19

On Wed, Sep 18, 2013 at 10:59 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> I think this really needs to have an obscure name. Like ==!!== or
>> somesuch (is equal very much, but doesn't actually test for equality ;))
>
> hah.
>
>> > What the heck is the use case for this being a user-oriented, SQL
>> > operator..?
>>
>> The materalized view code uses generated SQL, so it has to be SQL
>> accessible. And it needs to be an operator because the join planning
>> code requires that :(
>
> Ugh.  This feels like a pretty ugly hack to deal with that.  I haven't
> got any magical wand to address it, but making an SQL operator for 'are
> these really the same bytes' to deal with what is essentially
> implementation detail is _very_ grotty.

Having matviews using SQL expressible features is a *good* thing.
Having a user accessible operator is nice to have (if for no other
reason than to allow testing for which matview rows would be
refreshed).  I just don't understand what all the fuss is about except
to make sure not to utilize an operator name that is better suited for
other purposes.

merlin

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 16:12:37

Stephen Frost <sfrost@snowman.net> wrote:

> making an SQL operator for 'are these really the same bytes' to
> deal with what is essentially implementation detail is _very_
> grotty.

We already have some such operators, although Andres argues that
comparing to that isn't fair because we at least know it is a
string of characters; we're just ignoring character boundaries and
collations.  Some of the operators use for the existing byte
comparison opclasses are:

~<~ ~<=~ ~>=~ ~>~

Go ahead and try them out with existing text values.  Andres has
said that he has seen these used in production systems.

= and <> aren't listed above even though they do a byte-for-byte
comparison because, well, I guess because we have chosen to treat
two UTF8 strings which produce the same set of glyphs using
different bytes as unequal.  :-/

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

18 September 2013, 16:13:28

On 2013-09-18 11:06:13 -0500, Merlin Moncure wrote:
> > Ugh.  This feels like a pretty ugly hack to deal with that.  I haven't
> > got any magical wand to address it, but making an SQL operator for 'are
> > these really the same bytes' to deal with what is essentially
> > implementation detail is _very_ grotty.

I know the feeling, but I don't have a better suggestion either, so...

> Having matviews using SQL expressible features is a *good* thing.
> Having a user accessible operator is nice to have (if for no other
> reason than to allow testing for which matview rows would be
> refreshed).  I just don't understand what all the fuss is about except
> to make sure not to utilize an operator name that is better suited for
> other purposes.

It's an externally exposed API with not easily understandable semantics
that's not actually all that useful outside specific usecases. If we
decide to change it we're creating an API breakage. And we get to deal
with people saying it's broken because they don't understand the
semantics.

That said, I am ok with this if we use strange operator names and
document that the semantics are complicated...

==!!<
==!!<=
...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 16:34:21

* Merlin Moncure (mmoncure@gmail.com) wrote:
> Having matviews using SQL expressible features is a *good* thing.

Fine, then it should be implemented *using SQL*, which is based on
*values*, not on how the value is represented in bits and bytes.

> Having a user accessible operator is nice to have (if for no other
> reason than to allow testing for which matview rows would be
> refreshed).

If it's not actually *changing* (wrt its value), then I'm not at all
impressed with the notion that it's going to get updated anyway.
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 16:36:37

* Kevin Grittner (kgrittn@ymail.com) wrote:
> = and <> aren't listed above even though they do a byte-for-byte
> comparison because, well, I guess because we have chosen to treat
> two UTF8 strings which produce the same set of glyphs using
> different bytes as unequal.  :-/

I tend to side with Andres on this case actually- we're being asked to
store specific UTF8 bytes by the end user.  That is not the same as
treating two different numerics which are the same *number* as
different because they have different binary representations, which is
entirely an internal-to-postgres consideration.
Thanks,
    Stephen

Re: record identical operator

From

Steve Singer

Date:

18 September 2013, 16:54:17

On 09/18/2013 11:39 AM, Stephen Frost wrote:
> * Kevin Grittner (kgrittn@ymail.com) wrote:
>> = and <> aren't listed above even though they do a byte-for-byte
>> comparison because, well, I guess because we have chosen to treat
>> two UTF8 strings which produce the same set of glyphs using
>> different bytes as unequal.  :-/
> I tend to side with Andres on this case actually- we're being asked to
> store specific UTF8 bytes by the end user.  That is not the same as
> treating two different numerics which are the same *number* as
> different because they have different binary representations, which is
> entirely an internal-to-postgres consideration.

The problem is that there are datatypes (citext, postgis,...) that have 
defined = to return true when comparing two values that are different 
not just stored differently.  Are you saying that matview's should 
update only when the = operator of the datatype returns false and if 
people don't like this behaviour they should fix the datatypes?

>     Thanks,
>
>         Stephen

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 17:01:55

* Steve Singer (steve@ssinger.info) wrote:
> The problem is that there are datatypes (citext, postgis,...) that
> have defined = to return true when comparing two values that are
> different not just stored differently.

If the definition of the type is that they're equal, then they're equal.
Certainly there are cases where this is really rather broken
(particularly in the postgis case that you mention), but I don't think
that means we should change our definition of equality to generally be
"are the bytes the same"- clearly that'd lead to incorrect behavior in
the NUMERIC case.

> Are you saying that
> matview's should update only when the = operator of the datatype
> returns false and if people don't like this behaviour they should
> fix the datatypes?

imv, we are depending on the "=" operator to tell us when the
values are equal, regardless of type.  I have a hard time seeing how we
can do anything else.  The PostGIS situation is already broken when you
consider UNIQUE constraints and, while it's unfortunate that they'd need
to change their data type to fix that, I do feel it's on them to deal
with it.

Anyone can create an extension with their own data type which returns
wrong data and results, it's not on us to figure out how to make those
work even in the face of blatent violations like making "=" not actually
mean "these values are the same".
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 17:39:11

Stephen Frost <sfrost@snowman.net> wrote:

> If it's not actually *changing* (wrt its value), then I'm not at
> all impressed with the notion that it's going to get updated
> anyway.

But PostgreSQL very specifically (and as far as I can tell
*intentionally*) allows you to *change* a value and have it still
be considered *equal*.  The concept of equal values really means
more like "equivalent" or "close enough" for common purposes.  It
very specifically does *not* mean the same value.

As just one example, think how much easier the citext type would be
to implement if it folded all values to lower case as they were
input, rather than preserving the data as entered and considering
different capitalizations as "equal".

The notion that in PostgreSQL a value has not changed if the new
value is equal to the old is just flat out wrong.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

18 September 2013, 17:53:16

Kevin,

On Wednesday, September 18, 2013, Kevin Grittner wrote:

Stephen Frost <sfrost@snowman.net> wrote:

> If it's not actually *changing* (wrt its value), then I'm not at
> all impressed with the notion that it's going to get updated
> anyway.

But PostgreSQL very specifically (and as far as I can tell
*intentionally*) allows you to *change* a value and have it still
be considered *equal*.

I'm curious where you're going with that- of course you can update a value and have the same value (and possibly the same byte representation) stored over the old.

The concept of equal values really means
more like "equivalent" or "close enough" for common purposes. It
very specifically does *not* mean the same value.

I'm really curious about your thoughts on unique indexes then. Should two numerics which are the same value but different byte representations be allowed in a unique index?

As just one example, think how much easier the citext type would be
to implement if it folded all values to lower case as they were
input, rather than preserving the data as entered and considering
different capitalizations as "equal".

If the type operator says they're equal, then I think we need to consider them as equal. If an update happens with a conditional of:

where col1 = 'Abc'

When col1 is 'ABC' using citext, should we still issue the update?

The notion that in PostgreSQL a value has not changed if the new
value is equal to the old is just flat out wrong.

The value *can* be changed to be equal to the existing value but that doesn't make the two values *not equal*.

Thanks,

Stephen

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 18:06:12

On 09/18/2013 05:53 PM, Andres Freund wrote:
> On 2013-09-18 11:50:23 -0400, Stephen Frost wrote:
>> For my 2c on this, while this can be useful for *us*, and maybe folks
>> hacking pretty close to PG, I can't get behind introducing this as an
>> '===' or some such operator.  I've missed why this can't be a simple
>> function and why in the world we would want to encourage users to use
>> this by making it look like a normal language construct of SQL, which
>> damn well better consider numbers which are equal in value to be equal,
>> regardless of their representation.
> I certainly understand the feeling...
>
> I think this really needs to have an obscure name. Like ==!!== or
> somesuch (is equal very much, but doesn't actually test for equality ;))
In PostgreSQL equality can be "anything" :)

In other words, we have "pluggable equality", so it is entirely
feasible to have an opclass where binary equality is *the* equality

the problem started with some "opclass equality" (case insensitive
comparison) missing user-visible changes.

Cheers


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 18:09:04

Stephen Frost <sfrost@snowman.net> wrote:

> I don't think that means we should change our definition of
> equality to generally be "are the bytes the same"- clearly that'd
> lead to incorrect behavior in the NUMERIC case.

Nobody is talking in any way, shape, or form about changing our
concept of what is "equal".  We're talking about recognizing that
in PostgreSQL "equal" does *not* mean "the same".  If we used the
equal concept for determining what has changed, if someone was
tracking numeric data without precision and scale so that they
could track accuracy (by storing the correct number of decimal
positions) the accuracy could not be replicated to a materialized
view.  Of course, streaming replication would replicate the
change, but if '1.4' was stored in a column copied into a matview
and they later updated the source to '1.40' the increase in
accuracy would not flow to the matview.  That would be a bug, not a
feature.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 18:35:00

Stephen Frost <sfrost@snowman.net>
> Kevin Grittner  wrote:
>> Stephen Frost <sfrost@snowman.net> wrote:
>>
>>> If it's not actually *changing* (wrt its value), then I'm not
>>> at all impressed with the notion that it's going to get updated
>>> anyway.
>>
>> But PostgreSQL very specifically (and as far as I can tell
>> *intentionally*) allows you to *change* a value and have it
>> still be considered *equal*.  
>
> I'm curious where you're going with that- of course you can
> update a value and have the same value (and possibly the same
> byte representation) stored over the old.

The way I see it, you can update a column to a different value
which will compare as equal.  That's fine.  Nobody wants to change
that.  But it is still not the same value.

>> The concept of equal values really means
>> more like "equivalent" or "close enough" for common purposes.  It
>> very specifically does *not* mean the same value.
>
> I'm really curious about your thoughts on unique indexes then.
> Should two numerics which are the same value but different byte
> representations be allowed in a unique index?

Not if it is defined with the default opclass, which will use an
equal operator.  Of course, this patch would allow an index on a
record to be defined with record_image_ops, in which case it would
sort by the raw bytes in the values of the record.  That's not
going to be useful in very many places, which is why it would not
be the default.  You don't get that behavior unless you ask for it.

See this docs page for a similar example related to complex numbers:

http://www.postgresql.org/docs/current/interactive/xindex.html#XINDEX-EXAMPLE

> If the type operator says they're equal, then I think we need to
> consider them as equal.

Absolutely.  Two different values may be equal within an opclass.

> If an update happens with a conditional of:
>
> where col1 = 'Abc'
>
> When col1 is 'ABC' using citext, should we still issue the
> update?

Absolutely not, because the update was requested in the case that
the equality test was true.  Yet if a row is updated to replace
'Abc' with 'ABC', then streaming replication should copy the
"different but equal value" (it does), a normal view should now
show 'ABC' (it does), and a refresh of a matview should cause the
matview to show 'ABC' (it doesn't in git, but this patch would make
that work).

> The value *can* be changed to be equal to the existing value but
> that doesn't make the two values *not equal*.

Nobody has ever argued that they should be considered *not equal*.
It's just about providing a way to recognize when two equal values
*are not the same*.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 18:40:56

Kevin Grittner <kgrittn@ymail.com> wrote:

>> If an update happens with a conditional of:
>>
>> where col1 = 'Abc'
>>
>> When col1 is 'ABC' using citext, should we still issue the
>> update?
>
> Absolutely not, because the update was requested in the case that
> the equality test was true.

Sorry, as if this thread were not long enough, I misread that and
gave the wrong answer.  Yes, the equal operator was used and the
equal operator for two citext values says those are equal, so the
row *should* be updated.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Dimitri Fontaine

Date:

18 September 2013, 19:19:45

Kevin Grittner <kgrittn@ymail.com> writes:
> change, but if '1.4' was stored in a column copied into a matview
> and they later updated the source to '1.40' the increase in
> accuracy would not flow to the matview.  That would be a bug, not a
> feature.

Maybe the answer to that use case is to use the seg extension?
 http://www.postgresql.org/docs/9.3/interactive/seg.html

IOW, colour me unconvinced about that binary-equality opclass use case
in MatViews. We are trusting the btree equality operator about
everywhere in PostgreSQL and it's quite disturbing to be told that in
fact we should not trust it.

Would it make sense for you to produce a patch without the extra
operators, behavior, opclass and opfamily here so that we can focus on
the MatView parts of it?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: record identical operator

From

Kevin Grittner

Date:

18 September 2013, 19:41:37

Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> change, but if '1.4' was stored in a column copied into a matview
>> and they later updated the source to '1.40' the increase in
>> accuracy would not flow to the matview.  That would be a bug, not a
>> feature.
>
> Maybe the answer to that use case is to use the seg extension?
>
>   http://www.postgresql.org/docs/9.3/interactive/seg.html

You are arguing that we should provide lesser support for numeric
columns (and who knows how many other types) in materialized views
than we do in streaming replication, pg_dump,
suppress_redundant_updates_trigger(), and other places?  Why?

> IOW, colour me unconvinced about that binary-equality opclass use case
> in MatViews. We are trusting the btree equality operator about
> everywhere in PostgreSQL and it's quite disturbing to be told that in
> fact we should not trust it.

Who said we should not trust it?  I have said that equality in
PostgreSQL does not mean the two values appear the same or behave
the same in all cases -- the definer of the class gets to determine
how many definitions of equality there are, and which (if any) is
tied to the = operator name.  That should not be news to anybody;
it's in the documentation.  I'm proposing a second definition of
equality with a different operator name for comparing two records,
without in any way disturbing the existing definition.  I am taken
completely by surprise that in this case creating a second opclass
for something is somehow controversial.  The documentation I cited
previously provides a clear example of another case where two
completely different concepts of equality for a type are useful.

We have, as a community, gone to a fair amount of trouble  to make
the concept of equality pluggable and allow multiple types of
equality per type.  To me it seems the perfect tool to solve this
problem.  Why the fuss?

> Would it make sense for you to produce a patch without the extra
> operators, behavior, opclass and opfamily here so that we can focus on
> the MatView parts of it?

No, matviews cannot be fixed without the new operators.  Here are
the stats on the patch:

kgrittn@Kevin-Desktop:~/pg/master$ git diff --stat master..matview
 contrib/citext/expected/citext.out       |   41 +++
 contrib/citext/expected/citext_1.out     |   41 +++
 contrib/citext/sql/citext.sql            |   23 ++
 src/backend/commands/matview.c           |    7 +-
 src/backend/utils/adt/rowtypes.c         |  482 ++++++++++++++++++++++++++
 src/include/catalog/pg_amop.h            |   10 +
 src/include/catalog/pg_amproc.h          |    1 +
 src/include/catalog/pg_opclass.h         |    1 +
 src/include/catalog/pg_operator.h        |   14 +
 src/include/catalog/pg_opfamily.h        |    1 +
 src/include/catalog/pg_proc.h            |   12 +-
 src/include/utils/builtins.h             |    7 +
 src/test/regress/expected/opr_sanity.out |    7 +-
 13 files changed, 642 insertions(+), 5 deletions(-)

The changes to matview.c are the only ones that are
matview-specific.  Basically, that consists of using the new
operator instead of = in a couple places.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 21:09:40

On 09/18/2013 09:19 PM, Dimitri Fontaine wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> change, but if '1.4' was stored in a column copied into a matview
>> and they later updated the source to '1.40' the increase in
>> accuracy would not flow to the matview.  That would be a bug, not a
>> feature.
> Maybe the answer to that use case is to use the seg extension?
>
>   http://www.postgresql.org/docs/9.3/interactive/seg.html
>
> IOW, colour me unconvinced about that binary-equality opclass use case
> in MatViews. We are trusting the btree equality operator about
> everywhere in PostgreSQL and it's quite disturbing to be told that in
> fact we should not trust it.
The problem is, that in this case the simple VIEW and MATVIEW
would yield different results.


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 21:13:46

On 09/18/2013 09:41 PM, Kevin Grittner wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:
>> Kevin Grittner <kgrittn@ymail.com> writes:
>>> change, but if '1.4' was stored in a column copied into a matview
>>> and they later updated the source to '1.40' the increase in
>>> accuracy would not flow to the matview.  That would be a bug, not a
>>> feature.
>> Maybe the answer to that use case is to use the seg extension?
>>
>>    http://www.postgresql.org/docs/9.3/interactive/seg.html
> You are arguing that we should provide lesser support for numeric
> columns (and who knows how many other types) in materialized views
> than we do in streaming replication, pg_dump,
> suppress_redundant_updates_trigger(), and other places?  Why?
>
>> IOW, colour me unconvinced about that binary-equality opclass use case
>> in MatViews. We are trusting the btree equality operator about
>> everywhere in PostgreSQL and it's quite disturbing to be told that in
>> fact we should not trust it.
> Who said we should not trust it?  I have said that equality in
> PostgreSQL does not mean the two values appear the same or behave
> the same in all cases -- the definer of the class gets to determine
> how many definitions of equality there are, and which (if any) is
> tied to the = operator name.  That should not be news to anybody;
> it's in the documentation.  I'm proposing a second definition of
> equality with a different operator name for comparing two records,
> without in any way disturbing the existing definition. 
Basically what proposed === does is "is guaranteed to be equal".
If it is not *guaranteed* it is safe to re-evaluate, either using
equal or something else.



-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 21:22:06

On 09/18/2013 06:05 PM, Stephen Frost wrote:
> * Kevin Grittner (kgrittn@ymail.com) wrote:
>> Right.  Not only would the per-type solution make materialized views
>> maintenance broken by default, requiring per-type work to make it
>> work reasonably, with silent failures for any type you didn't know
>> about, but "no user-visible differences" is a pretty slippery
>> concept.  
> I don't like those possibilities, of course, but I'm starting to wonder
> about this whole concept of looking at it byte-wise.  If I'm following
> correctly, what we're looking at here is having a way for matviews to
> tell if these bytes are the same as those bytes, for the purpose of
> deciding to update the matview, right? 
Basically what is needed is to check if the rowm *might* have
changed, so the new value, which may or may not be "equal"
would be refreshed into matview.
>  Yet we can then have cases where
> the row isn't *actually* different from a value perspective, yet we're
> going to update it anyway because it's represented slightly differently?
>
> What happens if we later want to add support for users to have a matview
> trigger that's called when a matview row *actually* changes?  We'd end
> up calling it on what are essentially false positives, or having to do
> some double-check later on "well, did it *really* change?", neither of
> which is good at all.  If we had the IS NOT DISTINCT FROM operators
> discussed, would that work for this even if it isn't as performant?  Or
> is there an issue with that?
IS NOT DISTINCT solves the problem with weird equality of NULLS
and nothing else, so it would not help here.
What the proposed operator family solves is "possiby changed"

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 21:40:22

On 09/18/2013 05:54 PM, Kevin Grittner wrote:
> ...
> I think the hardest part will be documenting the difference between
> the row value constructor semantics (which are all that is
> currently documented) and the record equality semantics (used for
> sorting and building indexes).  In a green field I think I would
> have argued for having just the standard semantics we have
> documented, and modifying our sort execution nodes and index builds
> to deal with that.  This is one of those cases where the breakage
> from changing to that is hard to justify on a cleaner conceptual
> semantics basis.
>
> There also seems to be universal agreement that the operator names
> should be something other than what I put in the v1 patch, but we
> don't have agreement on what should be used instead.  We need six
> operators, to support the btree am requirements.  Currently the
> patch has:
>
> === !== >== >>> <== <<<
>
> Suggested "same as" operators so far are:
>
> ====
> =====
> <<=>>
> =><=
>
> Anyone want to champion one of those, or something else?  How about
> the other five operators to go with your favorite?
ANother take would be using "possibly unequal" for this with operator
defined as

*==* (*definitely* equal , or guaranteed to be equal)

the inequality operator would thus become !== (may be not equal)

and ordering ops  would be ?<==(maybe smaller or equal), ?<(maybe smaller)
and same for larger ?>== and ?>

as a table

*==*       "binary equal, surely very equal by any other definition as wall"
!==?      "maybe not equal" -- "binary inequal, may still be equal by
other comparison methods"
<==?      "binary smaller or equal, may be anything by other comparison
methods"
<?           
>==?
>?


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Hannu Krosing

Date:

18 September 2013, 21:50:01

On 09/18/2013 07:53 PM, Stephen Frost wrote:
>
> I'm really curious about your thoughts on unique indexes then. Should
> two numerics which are the same value but different byte
> representations be allowed in a unique index?
You could have multiple btree opclasses defined which would enforce
different kind of "uniqueness"

For example you could have an opclass which considers two strings
"equal" if four first bytes are equal.

If you would create an unique index using that opclass you could not
have both "industrial" and
"induction" as primary keys as the same time, as the unique index would
consider them equal.

But you would still want to see the change in your matview after you do

UPDATE mytable set id = 'industrial' where id = 'induction';

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Dimitri Fontaine

Date:

18 September 2013, 22:55:14

Kevin Grittner <kgrittn@ymail.com> writes:
> You are arguing that we should provide lesser support for numeric
> columns (and who knows how many other types) in materialized views
> than we do in streaming replication, pg_dump,
> suppress_redundant_updates_trigger(), and other places?  Why?

Because you're saying that you need SQL semantics, and probably because
I'm not understanding well enough the problem you're trying to solve.

> We have, as a community, gone to a fair amount of trouble  to make
> the concept of equality pluggable and allow multiple types of
> equality per type.  To me it seems the perfect tool to solve this
> problem.  Why the fuss?

Because I don't understand why you need another equality than the
default btree one, certainly. The other opclass, to my knowledge, are
only used in relation with index searches, that is when comparing heap
or input values with indexed values, right?

> No, matviews cannot be fixed without the new operators.  Here are
> the stats on the patch:

Ok, then someone (preferably a commiter) need to understand the problem
at hand in a better way than I do now, I guess. If possible I will read
through your patch, I'm curious now.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: record identical operator

From

Vik Fearing

Date:

18 September 2013, 23:29:16

On 09/19/2013 12:55 AM, Dimitri Fontaine wrote:
>> We have, as a community, gone to a fair amount of trouble  to make
>> > the concept of equality pluggable and allow multiple types of
>> > equality per type.  To me it seems the perfect tool to solve this
>> > problem.  Why the fuss?
> Because I don't understand why you need another equality than the
> default btree one, certainly.

Basically because 'word'::citext and 'Word'::citext are the same to your
brain but not to your eyeballs.

Unique indexes, for example, only need to please your brain.  Matviews
need to please your eyeballs.

-- 
Vik

Re: record identical operator

From

Kevin Grittner

Date:

19 September 2013, 03:49:09

Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> You are arguing that we should provide lesser support for numeric
>> columns (and who knows how many other types) in materialized views
>> than we do in streaming replication, pg_dump,
>> suppress_redundant_updates_trigger(), and other places?  Why?
>
> Because you're saying that you need SQL semantics, and probably because
> I'm not understanding well enough the problem you're trying to solve.

There are examples in the patch and this thread, but rather than
reference back to those I'll add a new one.  Without the patch:

test=# CREATE TABLE nt (id serial PRIMARY KEY, grp citext, num numeric);
CREATE TABLE
test=# INSERT INTO nt (grp, num) VALUES
test-#   ('one', '1.0'),
test-#   ('one', '2.0'),
test-#   ('two', '123.000');
INSERT 0 3
test=# CREATE VIEW nv AS SELECT grp, sum(num) AS num FROM nt GROUP BY grp;
CREATE VIEW
test=# SELECT * FROM nv ORDER BY grp;
 grp |   num  
-----+---------
 one |     3.0
 two | 123.000
(2 rows)

test=# CREATE MATERIALIZED VIEW nm AS SELECT grp, sum(num) AS num FROM nt GROUP BY grp;
SELECT 2
test=# CREATE UNIQUE INDEX nm_id ON nm (grp);
CREATE INDEX
test=# SELECT * FROM nm ORDER BY grp;
 grp |   num  
-----+---------
 one |     3.0
 two | 123.000
(2 rows)

test=# UPDATE nt SET grp = 'Two', num = '123.0000' WHERE id = 3;
UPDATE 1
test=# REFRESH MATERIALIZED VIEW CONCURRENTLY nm;
REFRESH MATERIALIZED VIEW
test=# SELECT * FROM nv ORDER BY grp;
 grp |   num   
-----+----------
 one |      3.0
 Two | 123.0000
(2 rows)

test=# SELECT * FROM nm ORDER BY grp;
 grp |   num  
-----+---------
 one |     3.0
 two | 123.000
(2 rows)

The problem, as I see it, is that the view and the concurrently
refreshed materialized view don't yield the same results for the
same query.  The rows are equal, but they are not the same.  With
the patch the matview, after RMVC, looks just the same as the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Dimitri Fontaine

Date:

19 September 2013, 08:54:30

Kevin Grittner <kgrittn@ymail.com> writes:
> There are examples in the patch and this thread, but rather than
> reference back to those I'll add a new one.  Without the patch:

Thanks much for doing that.

> The problem, as I see it, is that the view and the concurrently
> refreshed materialized view don't yield the same results for the
> same query.  The rows are equal, but they are not the same.  With
> the patch the matview, after RMVC, looks just the same as the view.

My understanding is that if you choose citext then you don't care at all
about the case, so that the two relations actually yield the same
results for the right definition of "same" here: the citext one.

In other words, the results only look different in ways that don't
matter for the datatype involved, and I think that if it matters to the
user then he needs to review is datatype choices or view definition.

So my position on that is that your patch is only adding confusion for
no benefits that I'm able to understand.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: record identical operator

From

Hannu Krosing

Date:

19 September 2013, 11:07:38

On 09/19/2013 10:54 AM, Dimitri Fontaine wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> There are examples in the patch and this thread, but rather than
>> reference back to those I'll add a new one.  Without the patch:
> Thanks much for doing that.
>
>> The problem, as I see it, is that the view and the concurrently
>> refreshed materialized view don't yield the same results for the
>> same query.  The rows are equal, but they are not the same.  With
>> the patch the matview, after RMVC, looks just the same as the view.
> My understanding is that if you choose citext then you don't care at all
> about the case, so that the two relations actually yield the same
> results for the right definition of "same" here: the citext one.
>
> In other words, the results only look different in ways that don't
> matter for the datatype involved, and I think that if it matters to the
> user then he needs to review is datatype choices or view definition.
>
> So my position on that is that your patch is only adding confusion for
> no benefits that I'm able to understand.
The aim is to have a view and materialized view return the same results.

If they do not, the user is guaranteed to be confused and to consider
the matview implementation broken

the patch solves the general problem of "when the table changes, refresh"

After saying it like this, the problem could also be solved by including
xmin(s) for rows from underlying table(s)in the matview.

Would this be a better approach ?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Kevin Grittner

Date:

19 September 2013, 12:33:33

Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:

>> The problem, as I see it, is that the view and the concurrently
>> refreshed materialized view don't yield the same results for the
>> same query.  The rows are equal, but they are not the same.
>> With the patch the matview, after RMVC, looks just the same as
>> the view.
>
> My understanding is that if you choose citext then you don't care
> at all about the case

That's not my understanding.  If that was what citext was for it
would be much simpler to force the case in creating each value.  It
*preserves* the case for display, but ignores it for comparisons.
That's the contract of the type, like it or not.  "Equal" does not
mean "the same".  They clearly want to preserve and display
differences among equal values.

Streaming replication would preserve the differences.  pg_dump and
restore of the data would preserve the differences.  SELECTing the
data shows the differences.  suppress_redundant_updates_trigger()
will not suppress an UPDATE setting an "equal" value unless it is
also *the same*.  A normal VIEW shows the differences. RMV without
CONCURRENTLY will show the difference.  I'm suggesting that RMVC
and the upcoming incremental update of matviews should join this
crowd.

A matview which is being refreshed with the CONCURRENTLY option, or
maintained by incremental maintenance should not look different
from a regular VIEW, and should not suddenly look completely
different after a non-concurrent REFRESH.  If things are left that
way, people will quite justifiably feel that matviews are broken.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

19 September 2013, 12:36:28

On 2013-09-19 05:33:22 -0700, Kevin Grittner wrote:
> Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote:
> > Kevin Grittner <kgrittn@ymail.com> writes:
>
> >> The problem, as I see it, is that the view and the concurrently
> >> refreshed materialized view don't yield the same results for the
> >> same query.  The rows are equal, but they are not the same.
> >> With the patch the matview, after RMVC, looks just the same as
> >> the view.
> >
> > My understanding is that if you choose citext then you don't care
> > at all about the case
>
> That's not my understanding.  If that was what citext was for it
> would be much simpler to force the case in creating each value.  It
> *preserves* the case for display, but ignores it for comparisons.
> That's the contract of the type, like it or not.  "Equal" does not
> mean "the same".  They clearly want to preserve and display
> differences among equal values.

I agree.

I am not 100% sure if the can of worms this opens is worth the trouble,
but from my POV it's definitely an understandable and sensible goal.

My complaints about this "subfeature" were never about trying to get
that right for matviews...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

19 September 2013, 12:41:49

Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> the patch solves the general problem of "when the table changes,
> refresh"
>
> After saying it like this, the problem could also be solved by
> including xmin(s) for rows from underlying table(s)in the
> matview.
>
> Would this be a better approach ?

Now you're moving from REFRESH territory into incremental
maintenance.  There have been a very large number of papers written
on that topic, I have reviewed the literature, and I have picked a
technique which looks like it will be fast and reliable -- based on
relational algebra.  Let's save discussion of alternatives such as
you're suggesting here, for when I get past the easy stuff ... like
just refreshing a view so that the new contents are the same as
what you would see if you re-ran the query defining the matview.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Hannu Krosing

Date:

19 September 2013, 12:59:26

On 09/19/2013 02:41 PM, Kevin Grittner wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>
>> the patch solves the general problem of "when the table changes,
>> refresh"
>>
>> After saying it like this, the problem could also be solved by
>> including xmin(s) for rows from underlying table(s)in the
>> matview.
>>
>> Would this be a better approach ?
> Now you're moving from REFRESH territory into incremental
> maintenance.  There have been a very large number of papers written
> on that topic, I have reviewed the literature, and I have picked a
> technique which looks like it will be fast and reliable -- based on
> relational algebra.  Let's save discussion of alternatives such as
> you're suggesting here, for when I get past the easy stuff ... like
> just refreshing a view so that the new contents are the same as
> what you would see if you re-ran the query defining the matview.
I'm sure that comparing xmin records would work exactly
similar to binary comparisons of records for detecting
possible change in 99.999% of real-world cases.

I am also pretty sure it would have its own cans of worms
all over again when trying to actually implement this in
matviews :)

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Robert Haas

Date:

19 September 2013, 19:20:44

On Wed, Sep 18, 2013 at 7:29 PM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> On 09/19/2013 12:55 AM, Dimitri Fontaine wrote:
>>> We have, as a community, gone to a fair amount of trouble  to make
>>> > the concept of equality pluggable and allow multiple types of
>>> > equality per type.  To me it seems the perfect tool to solve this
>>> > problem.  Why the fuss?
>> Because I don't understand why you need another equality than the
>> default btree one, certainly.
>
> Basically because 'word'::citext and 'Word'::citext are the same to your
> brain but not to your eyeballs.
>
> Unique indexes, for example, only need to please your brain.  Matviews
> need to please your eyeballs.

Right, and well said.

It's perfectly reasonable to want a "unique" index that doesn't allow
both "Kevin O'leary" and "Kevin O'Leary" to be listed in the
"irish_names" table, but it's also perfectly reasonable to want to
remember that the user who entered the data spelled it the second way
and not the first.  And it's also reasonable to want any corrections
made to the table to propagate to any materialized views defined over
it.

The fact that we have multiple concepts of equality can be confusing,
but it's not for no reason, and we're not the only database or
programming language to have such a concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Steve Singer

Date:

19 September 2013, 22:31:49

On 09/12/2013 06:27 PM, Kevin Grittner wrote:
> Attached is a patch for a bit of infrastructure I believe to be
> necessary for correct behavior of REFRESH MATERIALIZED VIEW
> CONCURRENTLY as well as incremental maintenance of matviews.

Here is attempt at a review of the v1 patch.

There has been a heated discussion on list about if we even want this 
type of operator. I will try to summarize it here

The arguments against it are
* that a record_image_identical call on two records that returns false 
can still return true under the equality operator, and the records might 
(or might not) appear to be the same to users
* Once we expose this as an operator via SQL someone will find it and 
use it and then complain when we change things such as the internal 
representation of a datatype which might   break there queries
* The differences between = and record_image_identical might not be 
understood by everywhere who finds the operator leading to confusion
* Using a criteria other than equality (the = operator provided by the 
datatype) might cause problems if we later provide 'on change' triggers 
for materialized views

The arguments for this patch are
* We want the materialized view to return the same value as would be 
returned if the query were executed directly.  This not only means that 
the values should be the same according to a datatypes = operator but 
that they should appear the same 'to the eyeball'.
* Supporting the materialized view refresh check via SQL makes a lot of 
sense but doing that requires exposing something via SQL
* If we adequately document what we mean by record_image_identical and 
the operator we pick for this then users shouldn't be surprised at what 
they get if they use this

I think there is agreement that better (as in more obscure) operators 
than === and !== need to be picked  and we need to find a place in the 
user-docs to warn users of the behaviour of this operators.   Hannu has 
proposed

*==*       "binary equal, surely very equal by any other definition as wall"
!==?      "maybe not equal" -- "binary inequal, may still be equal by
other comparison methods"

and no one has yet objected to this.  My vote would be to update the patch to use those operator names and then figure
outwhere we can document them.  It it means we have to section describing the equal operator for records as well then
maybewe should do that so we can get on with things.

Code Review
--------------------

The record_image_eq and record_image_cmp functions are VERY similar.  It seems to me that you could have the meet of
thesefunctions into a common function (that isn't exposed via SQL) that then behaves differently  in 2 or 3 spots based
ona parameter that controls if it detoasts the tuple for the compare or returns false on the equals.

Beyond that I don't see any issues with the code.

You call out a question about two records being compared have a different number of columns should it always be an
error,or only an error when they match on the columns they do have in common.

The current behaviour is
 select * FROM r1,r2 where r1<==r2; a | b | a | b | c
---+---+---+---+--- 1 | 1 | 1 | 2 | 1
(1 row)

update r2 set b=1;
UPDATE 1
test=# select * FROM r1,r2 where r2<==r1;
ERROR:  cannot compare record types with different numbers of columns

This seems like a bad idea to me.  I don't like that I get a type comparison error only sometimes based on the values
ofthe data in the column.  If I'm a user testing code that compares two records with this operator and I test my query
with1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same
recordtype.

Re: record identical operator - Review

From

Martijn van Oosterhout

Date:

20 September 2013, 10:21:38

On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
> I think there is agreement that better (as in more obscure)
> operators than === and !== need to be picked  and we need to find a
> place in the user-docs to warn users of the behaviour of this
> operators.   Hannu has proposed
>
> *==*       "binary equal, surely very equal by any other definition as wall"
> !==?      "maybe not equal" -- "binary inequal, may still be equal by
> other comparison methods"

It's a pity operators must be non-alpha and can't be named. Something
like:

SELECT foo OPERATOR("byte_equivalent") bar;

is simultaneously obscure, yet clear.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: record identical operator

From

Stephen Frost

Date:

20 September 2013, 12:27:51

* Kevin Grittner (kgrittn@ymail.com) wrote:
> ... like
> just refreshing a view so that the new contents are the same as
> what you would see if you re-ran the query defining the matview.

I've heard this notion a few times of wanting to make sure that what you
get from running the query matches the matview.  While that makes sense
when the equality operator and what-you-see actually match, it doesn't
when they don't because the what-you-see ends up being non-deterministic
and can change based on the order the datums are seen in during the
query processing which can change with different data ordering on disk
and even due to simply different plans for the same data, I believe.

Consider a GROUP BY with a citext column as one of the key fields.
You're going to get whatever value the aggregate happened to come across
first when building the HashAgg.  Having the binary equality operator
doesn't help that and seems like it could even result in change storms
happening due to a different plan when the actual data didn't change.
Thanks,
    Stephen

Re: record identical operator - Review

From

Kevin Grittner

Date:

20 September 2013, 12:38:59

Steve Singer <steve@ssinger.info> wrote:
> On 09/12/2013 06:27 PM, Kevin Grittner wrote:
>> Attached is a patch for a bit of infrastructure I believe to be
>> necessary for correct behavior of REFRESH MATERIALIZED VIEW
>> CONCURRENTLY as well as incremental maintenance of matviews.
>
> Here is attempt at a review of the v1 patch.

Thanks for looking at it.

> There has been a heated discussion on list about if we even want this
> type of operator.

It's not a question of whether we want to allow operators which
define comparisons between objects in a non-standard way.  We
already have 11 such cases; this would add one more to make 12.  In
all cases there are different operators for making a comparison
that return potentially different results from the default
operators, to support uses that are specific to that type.

> I think there is agreement that better (as in more obscure) operators
> than === and !== need to be picked  and we need to find a place in the
> user-docs to warn users of the behaviour of this operators.  Hannu has
> proposed
>
> *==*      "binary equal, surely very equal by any other definition as
> wall"
> !==?      "maybe not equal" -- "binary inequal, may still be
> equal by
> other comparison methods"
>
> and no one has yet objected to this.

I do.  The issue is that there are a great many places that there
are multiple definitions of equality.  We generally try to use
something which tries to convey something about the nature of the
operation, since the fact that it can have different results is a
given.  There is nothing in those operators that says "binary image
comparison".  I thought about prepending a hash character in front
of normal operators, because that somehow suggests binary
operations to my eye (although I have no idea whether it does so
for anyone else), but those operators are already used for an
alternative set of comparisons for intervals, with a different
meaning.  I'm still trying to come up with something I really like.

> My vote would be to update the patch to
> use those operator names and then figure out where we can document them.  It it
> means we have to section describing the equal operator for records as well then
> maybe we should do that so we can get on with things.

> Code Review
> --------------------
>
> The record_image_eq and record_image_cmp functions are VERY similar.  It seems
> to me that you could have the meet of these functions into a common function
> (that isn't exposed via SQL) that then behaves differently  in 2 or 3 spots
> based on a parameter that controls if it detoasts the tuple for the compare or
> returns false on the equals.

Did you look at the record_eq and record_cmp functions which exist
before this patch?  Is there a reason we should do it one way for
the default operators and another way for this alternative?  Do you
think we should change record_eq and record_cmp to do things the
way you suggest?

> Beyond that I don't see any issues with the code.
>
> You call out a question about two records being compared have a different number
> of columns should it always be an error, or only an error when they match on the
> columns they do have in common.
>
> The current behaviour is
>
>   select * FROM r1,r2 where r1<==r2;
>   a | b | a | b | c
> ---+---+---+---+---
>   1 | 1 | 1 | 2 | 1
> (1 row)
>
> update r2 set b=1;
> UPDATE 1
> test=# select * FROM r1,r2 where r2<==r1;
> ERROR:  cannot compare record types with different numbers of columns
>
> This seems like a bad idea to me.  I don't like that I get a type comparison
> error only sometimes based on the values of the data in the column.  If I'm
> a user testing code that compares two records with this operator and I test my
> query with 1 value pair then and it works then I'd naively expect to get a
> true or false on all other value sets of the same record type.

Again, this is a result of following the precedent set by the
existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1 < r2;
 c1 | c2 | c1 | c2 | c3
----+----+----+----+----
  1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1 < r2;
ERROR:  cannot compare record types with different numbers of columns

Would be in favor of forcing a change to the record comparison
operators which have been in use for years?  If not, is there a
good reason to have them behave differently in this regard?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Stephen Frost

Date:

20 September 2013, 13:45:39

Steve,
 Thanks for providing a summary.

* Steve Singer (steve@ssinger.info) wrote:
> The arguments for this patch are
> * We want the materialized view to return the same value as would be
> returned if the query were executed directly.  This not only means
> that the values should be the same according to a datatypes =
> operator but that they should appear the same 'to the eyeball'.

With the cases where the equality operator doesn't match the 'to the
eyeball' appearance, this really seems to be a pipe dream to me, unless
we *also* define an ordering or similar which would then change the
existing semantics for end users (which might be reasonable), but I'm
not sure how we could do that without input from the type- iow, I don't
think we can say "well, we'll order by byte representation".  It's also
going to possibly be a performance hit since we're going to have to do
both an equality op call during a HashAgg/HashJoin/whatever and have to
do a "which one is bigger of these two equal things", and then I wonder
if we'd need to allow users to somehow specify "I don't want the bigger
of the equal things, I want the smaller, in this query for this equality
check."  I'd really like to see how we're going to provide for that when
the user is doing a GROUP BY without breaking something else or causing
problems with later SQL spec revisions.

> * Supporting the materialized view refresh check via SQL makes a lot
> of sense but doing that requires exposing something via SQL

... which we don't currently expose for the queries that we already
support users running today.  Users seem to generally be accepting of
that too, even though what they end up with in the output isn't
necessairly consistent from query to query.  The issue here is that
we're trying to make the mat view look like what the query would do when
run at the same time, which is a bit of a losing battle, imv.

> * If we adequately document what we mean by record_image_identical
> and the operator we pick for this then users shouldn't be surprised
> at what they get if they use this

That's a bit over-simplistic, really.  We do try to keep to the POLA
(principle of least astonishment) and that's not going to be easy here.

> I think there is agreement that better (as in more obscure)
> operators than === and !== need to be picked  and we need to find a
> place in the user-docs to warn users of the behaviour of this
> operators.   Hannu has proposed

I'm a bit on the fence about this, after having discussed my concerns
with Robert at PostgresOpen.  If we're going to expose and support
these at the SQL level, and we can figure out some semantics and
consistency for using them that isn't totally baffling, then perhaps
having them as simple and clear operator names would be reasonable.  One
concern here is if the SQL spec might decide some day that '===' is a
useful operator for something else (perhaps even "they look the same
when cast to text").

> *==*       "binary equal, surely very equal by any other definition as wall"
> !==?      "maybe not equal" -- "binary inequal, may still be equal by
> other comparison methods"

Those look alright to me also though, but we'd need to work out the
other operation names, right?  And then figure out if we want to use
those other operators (less-than, greater-than, etc) when doing equality
operations to figure out which equal value to return to the user.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

20 September 2013, 13:49:38

Stephen Frost <sfrost@snowman.net> wrote:
> * Kevin Grittner (kgrittn@ymail.com) wrote:
>> ... like
>> just refreshing a view so that the new contents are the same as
>> what you would see if you re-ran the query defining the matview.
>
> I've heard this notion a few times of wanting to make sure that what you
> get from running the query matches the matview.  While that makes sense
> when the equality operator and what-you-see actually match, it doesn't
> when they don't because the what-you-see ends up being non-deterministic
> and can change based on the order the datums are seen in during the
> query processing which can change with different data ordering on disk
> and even due to simply different plans for the same data, I believe.

That's a fair point to some extent.  Where notions of equality
differ, it is not always non-deterministic, but it can be.  For
citext you are correct.  For a sum() of numeric data, the number of
decimal positions will be the largest value seen; the value present
in the query results will not vary by order of rows scanned or by
plan.

The result of this is that with the patch, an incremental refresh
of a matview will always match what the query returned at the time
it was run (there is no *correctness* problem) but if someone uses
a query with non-deterministic results they may have a lot of
activity on a concurrent refresh even if there was no change to the
underlying data -- so you could have a *performance* penalty in
cases where the query returns something different, compared to
leaving the old "equal but not the same" results.

> Consider a GROUP BY with a citext column as one of the key fields.
> You're going to get whatever value the aggregate happened to come across
> first when building the HashAgg.  Having the binary equality operator
> doesn't help that and seems like it could even result in change storms
> happening due to a different plan when the actual data didn't change.

Yup.  A person who wants to GROUP BY a citext value for a matview
might want to fold it to a consistent capitalization to avoid that
issue.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Kevin Grittner

Date:

20 September 2013, 13:55:25

Stephen Frost <sfrost@snowman.net> wrote:

> The issue here is that we're trying to make the mat view look
> like what the query would do when run at the same time, which is
> a bit of a losing battle, imv.

If we're not doing that, I don't see the point of matviews.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Stephen Frost

Date:

20 September 2013, 14:00:46

On Friday, September 20, 2013, Kevin Grittner wrote:

Stephen Frost <sfrost@snowman.net> wrote:

> The issue here is that we're trying to make the mat view look
> like what the query would do when run at the same time, which is
> a bit of a losing battle, imv.

If we're not doing that, I don't see the point of matviews.

When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there.

Thanks,

Stephen

Re: record identical operator - Review

From

Kevin Grittner

Date:

20 September 2013, 14:31:29

Stephen Frost <sfrost@snowman.net> wrote:
> On Friday, September 20, 2013, Kevin Grittner  wrote:
>> Stephen Frost <sfrost@snowman.net> wrote:
>>
>>> The issue here is that we're trying to make the mat view look
>>> like what the query would do when run at the same time, which
>>> is a bit of a losing battle, imv.
>>
>> If we're not doing that, I don't see the point of matviews.
>
> When taken out of context, I can see how that might not come
> across correctly. I was merely pointing out that it's a losing
> battle in the context of types which have equality operators
> which claim equalness but cast to text with results that don't
> match there.

I think the patch I've submitted wins that battle.  The only oddity
is that if someone uses a query for a matview which can provide
results with rows which are equal to previous result rows under the
default record opclass but different under this patch's opclass,
RMVC could update to the latest query results when someone thinks
that might not be necessary.  Workarounds would include using a
query with deterministic results (like using the upper() or lower()
function on a grouped citext column in the result set) or not using
the CONCURRENTLY option.  Neither seems too onerous.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

20 September 2013, 14:49:16

* Kevin Grittner (kgrittn@ymail.com) wrote:
> The result of this is that with the patch, an incremental refresh
> of a matview will always match what the query returned at the time
> it was run (there is no *correctness* problem) but if someone uses
> a query with non-deterministic results they may have a lot of
> activity on a concurrent refresh even if there was no change to the
> underlying data -- so you could have a *performance* penalty in
> cases where the query returns something different, compared to
> leaving the old "equal but not the same" results.

You mean 'at the time of the incremental refresh', right?  Yet that may
or may not match running that query directly by an end-user as the plan
that a user might get for the entire query could be different than what
is run for an incremental update, or due to statistics updates, etc.

> > Consider a GROUP BY with a citext column as one of the key fields.
> > You're going to get whatever value the aggregate happened to come across
> > first when building the HashAgg.  Having the binary equality operator
> > doesn't help that and seems like it could even result in change storms
> > happening due to a different plan when the actual data didn't change.
>
> Yup.  A person who wants to GROUP BY a citext value for a matview
> might want to fold it to a consistent capitalization to avoid that
> issue.

I'm trying to figure out why that's a perfectly acceptable solution for
users running views with GROUP BYs, but apparently it isn't sufficient
for mat views?  In other words, you would suggest telling users "sorry,
you can't rely on the value returned by a GROUP BY on that citext column
using a normal view, but we're going to try and do better for mat
views".

I don't intend the above to imply that we should never update values in
mat views when we can do so in a reliable way to ensure that the value
matches what a view would return.  This matches our notion of UPDATE,
where we will still UPDATE a value even if the old value and the new
value are equal according to the type's equality operator, when the
conditional for the UPDATE is using a reliable type (eg: integer).
Thanks,
    Stephen

Re: record identical operator

From

Andres Freund

Date:

20 September 2013, 14:55:18

On 2013-09-20 10:51:46 -0400, Stephen Frost wrote:
> I'm trying to figure out why that's a perfectly acceptable solution for
> users running views with GROUP BYs, but apparently it isn't sufficient
> for mat views?

Err, because users wrote a GROUP BY? They haven't (neccessarily) in the
cases of the matviews we're talking about?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Kevin Grittner

Date:

20 September 2013, 15:01:27

Stephen Frost <sfrost@snowman.net> wrote:
> * Kevin Grittner (kgrittn@ymail.com) wrote:
>> The result of this is that with the patch, an incremental refresh
>> of a matview will always match what the query returned at the time
>> it was run (there is no *correctness* problem) but if someone uses
>> a query with non-deterministic results they may have a lot of
>> activity on a concurrent refresh even if there was no change to the
>> underlying data -- so you could have a *performance* penalty in
>> cases where the query returns something different, compared to
>> leaving the old "equal but not the same" results.
>
> You mean 'at the time of the incremental refresh', right?  Yet that may
> or may not match running that query directly by an end-user as the plan
> that a user might get for the entire query could be different than what
> is run for an incremental update, or due to statistics updates, etc.

I'm confused.  The refresh *does* run the query.  Sure, if the
query is run again it could return different results.  I'm missing
the point here.

>>> Consider a GROUP BY with a citext column as one of the key
>>> fields.  You're going to get whatever value the aggregate
>>> happened to come across first when building the HashAgg.
>>> Having the binary equality operator doesn't help that and seems
>>> like it could even result in change storms happening due to a
>>> different plan when the actual data didn't change.
>>
>> Yup.  A person who wants to GROUP BY a citext value for a matview
>> might want to fold it to a consistent capitalization to avoid that
>> issue.
>
> I'm trying to figure out why that's a perfectly acceptable solution for
> users running views with GROUP BYs, but apparently it isn't sufficient
> for mat views?  In other words, you would suggest telling users "sorry,
> you can't rely on the value returned by a GROUP BY on that citext column
> using a normal view, but we're going to try and do better for mat
> views".

Again, I'm lost.  If they don't do something to make the result
deterministic, it could be different on each run of the VIEW, and
on each REFRESH of the matview.  I don't see why that is an
argument for trying to suppress the effort needed make the matview
match the latest run of the query.

> I don't intend the above to imply that we should never update values in
> mat views when we can do so in a reliable way to ensure that the value
> matches what a view would return.  This matches our notion of UPDATE,
> where we will still UPDATE a value even if the old value and the new
> value are equal according to the type's equality operator, when the
> conditional for the UPDATE is using a reliable type (eg: integer).

Well, we provide a trigger function to suppress the UPDATE
operation if the old and new values are identical -- in terms of
what is stored.  We do not attempt to use the default btree equals
operator to suppress updates to different values in some
circumstances.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

20 September 2013, 15:02:34

* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2013-09-20 10:51:46 -0400, Stephen Frost wrote:
> > I'm trying to figure out why that's a perfectly acceptable solution for
> > users running views with GROUP BYs, but apparently it isn't sufficient
> > for mat views?
>
> Err, because users wrote a GROUP BY? They haven't (neccessarily) in the
> cases of the matviews we're talking about?

Sure; my thinking was going back to what Hannu had suggested where we
have a mechanism to see if the value was updated (using xmin or similar)
and then update it in the mat view in that case, without actually doing
a comparison at all.

That wouldn't necessairly work for the GROUP BY case, but that situation
doesn't work reliably today anyway.  If we solve the GROUP BY case in
SQL for these types then we could use the same for mat views, but we've
been happy enough to ignore the issue thus far.
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

20 September 2013, 15:20:45

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > You mean 'at the time of the incremental refresh', right?  Yet that may
> > or may not match running that query directly by an end-user as the plan
> > that a user might get for the entire query could be different than what
> > is run for an incremental update, or due to statistics updates, etc.
>
> I'm confused.  The refresh *does* run the query.  Sure, if the
> query is run again it could return different results.  I'm missing
> the point here.

Perhaps I'm assuming things are farther along than they are..  I was
assumed that 'incremental mat view' updates were actuallly 'partial'-
iow, that it keeps track of the rows in the mat view which are
impacted by the set of rows in the base relations and then runs specific
queries to pull out and operate on the base-rel set-of-rows to update
the specific mat-view-rows.  If it's running the whole query again then
it's certainly more likely to get the same results that the user did,
but it's not a guarantee and that's only a happy coincidence while we're
still doing the whole query approach (which I hope we'd eventually
progress beyond..).

> > I'm trying to figure out why that's a perfectly acceptable solution for
> > users running views with GROUP BYs, but apparently it isn't sufficient
> > for mat views?  In other words, you would suggest telling users "sorry,
> > you can't rely on the value returned by a GROUP BY on that citext column
> > using a normal view, but we're going to try and do better for mat
> > views".
>
> Again, I'm lost. 

Perhaps my reply to Andres will help clear that up?

> If they don't do something to make the result
> deterministic, it could be different on each run of the VIEW, and
> on each REFRESH of the matview.  I don't see why that is an
> argument for trying to suppress the effort needed make the matview
> match the latest run of the query.

Is this really just "run the new query and look if the old and new row
are different" wrt 'incremental' view updates?  If so, I think we're
trying to design something here that we're going to totally break very
shortly when we move beyond that kind of sledgehammer approach to
incremental mat view updates and I'd rather we figure out a solution
which will work beyond that..
> > I don't intend the above to imply that we should never update values in
> > mat views when we can do so in a reliable way to ensure that the value
> > matches what a view would return.  This matches our notion of UPDATE,
> > where we will still UPDATE a value even if the old value and the new
> > value are equal according to the type's equality operator, when the
> > conditional for the UPDATE is using a reliable type (eg: integer).
>
> Well, we provide a trigger function to suppress the UPDATE
> operation if the old and new values are identical -- in terms of
> what is stored.  We do not attempt to use the default btree equals
> operator to suppress updates to different values in some
> circumstances.

This feels like we're trying to figure out how to create a key for a
whole row based on the binary representation of that row and then use
that to check if we should update a given row or not.  This seems a bit
radical, but perhaps that's what we should just do then, rather than
invent binary equivilance operators for what-things-look-like-now for
this- extract the row columns out using their binary _send functions
and then hash the result to build a key that you can use.  At least then
we're using a documented (well, we could probably improve on this, but
still) and stable representation of the data that at least some of our
users already deal with.
Thanks,
    Stephen

Re: record identical operator

From

Andres Freund

Date:

20 September 2013, 15:22:01

On 2013-09-20 11:05:06 -0400, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2013-09-20 10:51:46 -0400, Stephen Frost wrote:
> > > I'm trying to figure out why that's a perfectly acceptable solution for
> > > users running views with GROUP BYs, but apparently it isn't sufficient
> > > for mat views?
> > 
> > Err, because users wrote a GROUP BY? They haven't (neccessarily) in the
> > cases of the matviews we're talking about?
> 
> Sure; my thinking was going back to what Hannu had suggested where we
> have a mechanism to see if the value was updated (using xmin or similar)
> and then update it in the mat view in that case, without actually doing
> a comparison at all.

VACUUM, HOT pruning. Have fun.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Stephen Frost

Date:

20 September 2013, 15:27:55

* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2013-09-20 11:05:06 -0400, Stephen Frost wrote:
> > Sure; my thinking was going back to what Hannu had suggested where we
> > have a mechanism to see if the value was updated (using xmin or similar)
> > and then update it in the mat view in that case, without actually doing
> > a comparison at all.
>
> VACUUM, HOT pruning. Have fun.

Yea, clearly oversimplified, but I do expect that we're going to reach a
point where we're looking at the rows being updated in the base rels and
which rows they map to in the view and then marking those rows as
needing to be updated.  That whole mechanism doesn't depend on this
"are-they-binary-equal" approach and is what I had anticipated as the
path we'd be going down in the future.

The above is also what I recall had been discussed at the hackers
meeting, along with some ideas/papers about how to specifically
implement partial updates, hence my assumption that was what we were
talking about..
Thanks,
    Stephen

Re: record identical operator - Review

From

Steve Singer

Date:

20 September 2013, 17:12:39

On 09/20/2013 08:38 AM, Kevin Grittner wrote:
> Did you look at the record_eq and record_cmp functions which exist 
> before this patch?  Is there a reason we should do it one way for the 
> default operators and another way for this alternative? 
> Do you think we should change record_eq and record_cmp to do things 
> the way you suggest? 

I think the record_eq and record_cmp functions would be better if they 
shared code as well, but I don't think changing that should be part of 
this patch, you aren't otherwise touching those functions. I know some 
people dislike code that is switch based and prefer duplication, my 
preference is to avoid duplication.

This seems like a bad idea to me.  I don't like that I get a type comparison
error only sometimes based on the values of the data in the column.  If I'm
a user testing code that compares two records with this operator and I test my
query with 1 value pair then and it works then I'd naively expect to get a
true or false on all other value sets of the same record type.

> Again, this is a result of following the precedent set by the
> existing record comparison operators.
>
> test=# create table r1 (c1 int, c2 int);
> CREATE TABLE
> test=# create table r2 (c1 int, c2 int, c3 int);
> CREATE TABLE
> test=# insert into r1 values (1, 2);
> INSERT 0 1
> test=# insert into r2 values (3, 4, 5);
> INSERT 0 1
> test=# select * from r1 join r2 on r1 < r2;
>   c1 | c2 | c1 | c2 | c3
> ----+----+----+----+----
>    1 |  2 |  3 |  4 |  5
> (1 row)
>
> test=# update r1 set c1 = 3, c2 = 4;
> UPDATE 1
> test=# select * from r1 join r2 on r1 < r2;
> ERROR:  cannot compare record types with different numbers of columns
>
> Would be in favor of forcing a change to the record comparison
> operators which have been in use for years?  If not, is there a
> good reason to have them behave differently in this regard?
>
> --
I hadn't picked up on that you copied that part of the behaviour from 
the existing comparison operators.
No we would need a really good reason for changing the behaviour of the 
comparison operators and I don't think we have that.  I agree that the 
binary identical operators should behave similarly to the existing 
comparison operators and error out on the column number mismatch after 
comparing the columns that are present in both.

Steve

> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>

Re: record identical operator

From

Alvaro Herrera

Date:

20 September 2013, 18:50:29

Dimitri Fontaine escribió:

> My understanding is that if you choose citext then you don't care at all
> about the case, so that the two relations actually yield the same
> results for the right definition of "same" here: the citext one.

For the record, I don't think citext means that the user doesn't care"
about the case; it only means they want the comparisons to be
case-insensitive, but case should nonetheless be preserved.  That is,
"case-insensitive, case-preserving".  A parallel is MS-DOS file name
semantics.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: record identical operator

From

Robert Haas

Date:

23 September 2013, 16:27:29

On Fri, Sep 20, 2013 at 11:23 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Perhaps I'm assuming things are farther along than they are..  I was
> assumed that 'incremental mat view' updates were actuallly 'partial'-
> iow, that it keeps track of the rows in the mat view which are
> impacted by the set of rows in the base relations and then runs specific
> queries to pull out and operate on the base-rel set-of-rows to update
> the specific mat-view-rows.  If it's running the whole query again then
> it's certainly more likely to get the same results that the user did,
> but it's not a guarantee and that's only a happy coincidence while we're
> still doing the whole query approach (which I hope we'd eventually
> progress beyond..).

It seems odd to me that you have such strong opinions about what is
and is not acceptable here given that you don't seem to familiar with
the current state of this work.  I will attempt to summarize my
understanding.  In 9.3, we can refresh a materialized view by taking
an access exclusive lock on the relation, rerunning the query, and
replacing the contents wholesale.  In master, there is a new option to
perform the refresh concurrently, which is documented here:

http://www.postgresql.org/docs/devel/static/sql-refreshmaterializedview.html

It reruns the query in its entirety and then figures out what inserts,
updates, and deletes are needed to make the matview match the output
of the query (see refresh_by_match_merge).  This is an improvement
over what is available in 9.3, because even though you still have to
rerun the full query, you don't have to lock out access to the table
in order to apply the changes.  However, currently, it sometimes fails
to perform updates that are needed to make the contents of the view
match the query output, because it relies on a notion of equality
other than exact equality.  Kevin is proposing to fix this problem via
this patch.

Now, the next project Kevin's going to work on, and that he was
working on when he discovered this problem, is incremental
maintenance: that is, allowing us to update the view *without* needing
to rerun the entire query.  This record comparison operator will be
just as important in that context.  The *only* strategy refreshing a
materialized view that *doesn't* need this operator is the only we
have in 9.3: through out all the old data and replace it with the
result of re-executing the query.  If you want anything smarter than
that, you MUST compare old and new rows for equality so that you can
update only those rows that have been changed.  And if you compare
them *any strategy other than the one Kevin is proposing*, namely
binary equality, then you may sometimes decide that a row has not been
changed when it really has, and then you won't update the row, and
then incremental maintenance will be enabled to produce *wrong
answers*.  So to me this has come to seem pretty much open and shut.

We all know that materialized views are not going to always match the
data returned by the underlying query.  Perhaps the canonical example
is SELECT random().  As you pointed out upthread, any query that is
nondeterministic is a potential problem for materialized views.  When
you write a query that can return different output based on the order
in which input rows are scanned, or based on any sort of external
state such as a random-number generator, each refresh of a
materialized view based on that query may produce different answers.
There's not a lot we can do about that, except tell people to avoid
using such queries in materialized views that they expect to be
stable.  However, what we're talking about here is a completely
separate problem.  Even if the query is 100% deterministic, without
this patch, the materialized view can get out of sync with the query
results.

Granted, most of the ways in which it can get out of sync are fairly
subtle: failing to preserve case in a data type where comparisons are
text-insensitive; gaining or loosing an all-zeroes null bitmap on an
array type; adding or removing trailing zeroes after the decimal point
in a numeric.  If the materialized view sometimes said "1" when the
query was returning "0", we'd presumably all say "that's a bug, let's
fix it".  But what we're actually talking about is that the query
returns "0.00" and the view still says zero.  So we're doing a lot of
soul-searching about whether that's unequal enough to justify updating
the row.  Personally, though, there's not a lot of doubt in my mind.
If I create a table and I put 0 into a column of that table and then
create a materialized view and that 0 ends up in the materialized
view, and then I update the 0 to 0.00 and refresh the view, I expect
that change to propagate through to the materialized view.  It works
that way if I select from a regular, non-materialized view; and it
also works that way if I select from the table itself.  The idea that
materialized views should somehow be exempt from reflecting changes to
the underlying data in certain corner cases seems odd and indefensible
to me, and I can't understand why anyone's arguing that we should do
that.

If we're going to avoid that, we need this operator.  We can argue
about how it should be named and whether it should be documented, and
we can have all those arguments and still fix the problem.  But if we
decide we're not going to add this operator, then that seems to be
basically saying that we don't want to allow materialized views to
accurately reflect the results of the underlying queries.  And I think
that would be an extremely poor decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 17:16:40

* Robert Haas (robertmhaas@gmail.com) wrote:
> It seems odd to me that you have such strong opinions about what is
> and is not acceptable here given that you don't seem to familiar with
> the current state of this work.

That'd be because the arguments that I've been trying to make around
this havn't had terribly much to do with the current state of this work
but rather the higher level concepts- which are more important anyway,
imv.

> I will attempt to summarize my
> understanding.  In 9.3, we can refresh a materialized view by taking
> an access exclusive lock on the relation, rerunning the query, and
> replacing the contents wholesale.  In master, there is a new option to
> perform the refresh concurrently, which is documented here:
>
> http://www.postgresql.org/docs/devel/static/sql-refreshmaterializedview.html
>
> It reruns the query in its entirety and then figures out what inserts,
> updates, and deletes are needed to make the matview match the output
> of the query (see refresh_by_match_merge).

So I've gathered and I'm not terribly surprised that it's broken.  It
was unfortunate that we committed it as such, imv.  I'm *not* convinced
that means we should build on that in a direction that doesn't appear to
be getting us any closer to real matviews.  Being in master doesn't make
it released.

> This is an improvement
> over what is available in 9.3, because even though you still have to
> rerun the full query, you don't have to lock out access to the table
> in order to apply the changes.

I see how you can view it as an improvement, but consider what you'd say
if a user showed up on IRC wanting to implement *exactly* this in their
DB using a cronjob and a bit of plpgsql.  Would we encourage him/her and
say "gee, that's a great approach, just compare the whole row!"  Heck
no, we'd say "uhh, you need a key there that you can depend on when
you're doing your updates."  We'd probably even suggest that they, I
dunno, make that key their *primary key* for the matview.  If they asked
what would happen with their little plpgsql code when using citext or
other, similar, type, we'd tell them exactly what would happen when
using such a type with regular updates, and maybe even describe how they
could get themselves into trouble if they tried to issue updates which
changed those key fields and therefore run into possible lock contention
from the unique index on values-that-are-not-really-unique-to-them.

I can't see anyone encouraging that and here we are building a major new
feature on it!  To be honest, I'm amazed that I appear to be alone with
this.

> Now, the next project Kevin's going to work on, and that he was
> working on when he discovered this problem, is incremental
> maintenance: that is, allowing us to update the view *without* needing
> to rerun the entire query.  This record comparison operator will be
> just as important in that context.

You state this but I don't see where you justify this claim..

> The *only* strategy refreshing a
> materialized view that *doesn't* need this operator is the only we
> have in 9.3: through out all the old data and replace it with the
> result of re-executing the query.

I really don't agree with this, but allow me to play along a bit.  What
is going to happen with this incremental updates when you want to use an
index to figure out what to update?  Are you going to build the index
using your binary equality operator, which could allow *duplicates that
are not really duplicates*, even in the externally-visible world?  And
then we'll be returning multiple rows to the user when we shouldn't be?
There's a whole slew of evil examples of this- it's called unnecessary
DISTINCT's.  Or perhaps you'll simply look up based on the actual
equality answer and then *pick one* to use, perhaps the most recent, but
then that may or may not equal what the actual query would generate
*anyway* because of costing, join ordering, etc, etc, as I already
pointed out.  You're trying to guarantee something here that you can't.
Trying to fake it and tell the users "oh, you'll be ok" is actually
worse because it means they'll try and depend on it and then get all
kinds of upset when it ends up *not* working.

I'd much rather tell users "look, if you want to use these for *data*
fields, that's fine, because our smart matview will figure out the keys
associated with a GROUP BY and will update based on those, but if you
GROUP BY a column with a type whose equality operator allows things to
be different, you'll get *the same behavior you would get from the
query*, which is to say, it won't be deterministic."

> If you want anything smarter than
> that, you MUST compare old and new rows for equality so that you can
> update only those rows that have been changed.  And if you compare
> them *any strategy other than the one Kevin is proposing*, namely
> binary equality, then you may sometimes decide that a row has not been
> changed when it really has, and then you won't update the row, and
> then incremental maintenance will be enabled to produce *wrong
> answers*.  So to me this has come to seem pretty much open and shut.

The incremental maintenance approach, I'd hope, would be keeping track
of what changed and what rows in the view need to be regenerated due to
those underlying rows changing.  Doing that, you *would* know what
needed to be updated, and how.  If you're not keeping track of the
specific rows that changed underneath, then I'm curious how these
incremental matviews will work differently from what's being described
here, where we're trying to make the whole freakin' row the key.

> We all know that materialized views are not going to always match the
> data returned by the underlying query.  Perhaps the canonical example
> is SELECT random().  As you pointed out upthread, any query that is
> nondeterministic is a potential problem for materialized views.  When
> you write a query that can return different output based on the order
> in which input rows are scanned, or based on any sort of external
> state such as a random-number generator, each refresh of a
> materialized view based on that query may produce different answers.

Of course.

> There's not a lot we can do about that, except tell people to avoid
> using such queries in materialized views that they expect to be
> stable.  However, what we're talking about here is a completely
> separate problem.  Even if the query is 100% deterministic, without
> this patch, the materialized view can get out of sync with the query
> results.

Only a subset of the queries involving these types are 100%
deterministic and those are the stupid simple cases where the view is
just being used as a synonym.  As soon as you start doing anything that
actual uses the equality operator (and we have a *lot* of things that
do!), you're going to get nondeterministic behavior.  You're trying to
paper over this and I'm not impressed.

Or are you telling me that a matview using the binary-equality technique
will be deterministic when your GROUP BY a citext column in the view?  I
don't buy it.

> Granted, most of the ways in which it can get out of sync are fairly
> subtle: failing to preserve case in a data type where comparisons are
> text-insensitive; gaining or loosing an all-zeroes null bitmap on an
> array type; adding or removing trailing zeroes after the decimal point
> in a numeric.  If the materialized view sometimes said "1" when the
> query was returning "0", we'd presumably all say "that's a bug, let's
> fix it".  But what we're actually talking about is that the query
> returns "0.00" and the view still says zero.  So we're doing a lot of
> soul-searching about whether that's unequal enough to justify updating
> the row.  Personally, though, there's not a lot of doubt in my mind.

If it was as open-and-shut as that, and we *could* be 100%
deterministic in *all* cases with these special types when going through
views, I'd be a lot more on-board.  However, we can't; not without
insisting that the users use some new set of "binary-aggregation"
operators which are able to *pick* which of the equal-but-not-really
options they want; and by-the-way, you can *bet* that's going to need to
be type-specific.

> If I create a table and I put 0 into a column of that table and then
> create a materialized view and that 0 ends up in the materialized
> view, and then I update the 0 to 0.00 and refresh the view, I expect
> that change to propagate through to the materialized view.  It works
> that way if I select from a regular, non-materialized view; and it
> also works that way if I select from the table itself.

Sure- and then you try to do the same thing with a join or a group by.
Just because it works in the simple SELECT case doesn't mean we get to
tell users "you can depend on this!"

> The idea that
> materialized views should somehow be exempt from reflecting changes to
> the underlying data in certain corner cases seems odd and indefensible
> to me, and I can't understand why anyone's arguing that we should do
> that.

*This* implementation of matviews, which doesn't actually track the
changes to the underlying data and *also* doesn't regenerate everything
is the *only* case where this concern is actually realized- and that's
because it's trying to make a key column out of an entire row, which is,
I'd argue, generally accepted in the DB world as a *bad idea*, with good
reason.

> If we're going to avoid that, we need this operator.  We can argue
> about how it should be named and whether it should be documented, and
> we can have all those arguments and still fix the problem.  But if we
> decide we're not going to add this operator, then that seems to be
> basically saying that we don't want to allow materialized views to
> accurately reflect the results of the underlying queries.  And I think
> that would be an extremely poor decision.

I don't want this implementation of matviews, which I think we can all
agree isn't the end-all, be-all wrt what we want matvies in PG to look
like anyway, to introduce things that will cause problems for us and
confuse our users.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

23 September 2013, 18:44:36

Stephen Frost <sfrost@snowman.net> wrote:
> Robert Haas (robertmhaas@gmail.com) wrote:

>> Now, the next project Kevin's going to work on, and that he was
>> working on when he discovered this problem, is incremental
>> maintenance: that is, allowing us to update the view *without*
>> needing to rerun the entire query.  This record comparison
>> operator will be just as important in that context.
>
> You state this but I don't see where you justify this claim..

Unless we can tell whether there are any differences between two
versions of a row, we can't accurately generate the delta to drive
the incremental maintenance.  The initial thread discussing how
incremental maintenance would be done is here:

http://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com#1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com

The thread on the initial patch for REFRESH MATERIALIZED VIEW
CONCURRENTLY started with:

| Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY
| for 9.4 CF1.  The goal of this patch is to allow a refresh
| without interfering with concurrent reads, using transactional
| semantics.
|
| It is my hope to get this committed during this CF to allow me to
| focus on incremental maintenance for the rest of the release cycle.

There was much discussion, testing, and revision then, which is
here:

http://www.postgresql.org/message-id/flat/1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com#1371225929.28496.YahooMailNeo@web162905.mail.bf1.yahoo.com

I think pretty much every concern raised was addressed except for a
lingering doubt expressed by Noah over whether IS NOT DISTINCT FROM
semantics were really the right basis for matching rows.  Based on
that feedback, I spent a lot of time looking at why that might or
might not be correct, and decided that I had been wrong to base the
behavior on that, for the reasons Robert expressed so clearly a
couple messages back.  Hence this patch.

I had made the mistake of using an operator which used a
column-by-column comparison based on the equality operator for the
default opclass for comparing two values of each respective column
type.  I had been challenged on that in the review process, and was
responding to it with the fix contained in this patch.

The first post on this thread starts with:

| Attached is a patch for a bit of infrastructure I believe to be
| necessary for correct behavior of REFRESH MATERIALIZED VIEW
| CONCURRENTLY as well as incremental maintenance of matviews.

I think it is fairly obvious that REFRESH should REgenerate a FRESH
copy of the data, versus incremental maintenance -- which attempts
to keep the matview up-to-date without regenerating the full set of
data.  Whenever there is logical replication (and materialized
views are, conceptually, one form of that -- within the database) I
feel it is important to be able to correct any possible "drift".
With matviews, I see the way to do that as the REFRESH command, and
I feel that it is important to be able to do that in a way that can
run concurrently with readers of the matview -- without blocking
them or being blocked by them.

Discussion of incremental maintenance really belongs on a different
thread.  Since I have gone to the trouble to read a lot of papers
on the topic, and select one that I think is a good basis for our
implementation, I hope everyone will frame discussion in terms of
either:
  -  how best to implement the techniques from that paper, or
  -  why some other paper presents a better technique.

I think it would be madness to approach implementing incremental
maintenance based on an ad hoc set of thoughts rather than a
peer-reviewed paper.  I know how tempting it is it start from zero
and think, "if we just do X we can cover this sort of query."  I
had a few thoughts like that before I read all those papers and
discovered that there were many subtle issues to cover.  We will
have plenty of time to get creative with alternatives when we get
past the query types specifically addressed in the paper and begin
to move into, for example, CTEs and window functions.  I certainly
expect robust discussion around those areas, or even some of the
infrastructure for capturing deltas and triggering the incremental
maintenance.  I really didn't expect to have to burn so much time
and energy arguing over whether a REFRESH should leave the matview
accurately containing the results of the matview's query.

>> We can argue about how it should be named

After looking at the existing suggestions and thinking about it I'm
leaning toward these operators (based on a star in front of the
usual default comparison operators):

  *=  *<>  *>  *>=  *<  *<=

>> and whether it should be documented

I thought we had a consensus to document both the existing record
comparison operators and these new ones, and I'm fine with that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Robert Haas

Date:

23 September 2013, 18:46:49

On Mon, Sep 23, 2013 at 1:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> So I've gathered and I'm not terribly surprised that it's broken.  It
> was unfortunate that we committed it as such, imv.  I'm *not* convinced
> that means we should build on that in a direction that doesn't appear to
> be getting us any closer to real matviews.  Being in master doesn't make
> it released.
>
>> This is an improvement
>> over what is available in 9.3, because even though you still have to
>> rerun the full query, you don't have to lock out access to the table
>> in order to apply the changes.
>
> I see how you can view it as an improvement, but consider what you'd say
> if a user showed up on IRC wanting to implement *exactly* this in their
> DB using a cronjob and a bit of plpgsql.  Would we encourage him/her and
> say "gee, that's a great approach, just compare the whole row!"  Heck
> no, we'd say "uhh, you need a key there that you can depend on when
> you're doing your updates."  We'd probably even suggest that they, I
> dunno, make that key their *primary key* for the matview.  If they asked
> what would happen with their little plpgsql code when using citext or
> other, similar, type, we'd tell them exactly what would happen when
> using such a type with regular updates, and maybe even describe how they
> could get themselves into trouble if they tried to issue updates which
> changed those key fields and therefore run into possible lock contention
> from the unique index on values-that-are-not-really-unique-to-them.
>
> I can't see anyone encouraging that and here we are building a major new
> feature on it!  To be honest, I'm amazed that I appear to be alone with
> this.

I've written various scripts over the years for this kind of thing,
and they all compared the whole row.  I used a PK comparison to
determine which row needed to be updated and then compared the
remaining fields to figure out which values needed to be updated.  I
had assumed that's how most people did it; what do you do?

Anyway, that is exactly what Kevin is proposing to do here and, to be
clear, he's NOT proposing to use the binary-identical semantics to
identify the row to be updated.  That will happen using the semantics
of whatever index the user chooses to create on the PK column.
Rather, he's only using the binary-identical to decide which rows are
completely unchanged.  I might be wrong here, but it seems to me that
that renders most of your argument here moot. Under Kevin's proposal,
changing a citext column that acts as a PK for the matview WON'T cause
the row to be deleted and reinserted, but what it will do is say, oh,
it's the same row (the values are equal) but the case is different
(the rows are not binary-equal), let me go update the PK column with
the new value.  From where I stand, that seems like exactly the right
behavior.  What do you think should happen instead?

>> Now, the next project Kevin's going to work on, and that he was
>> working on when he discovered this problem, is incremental
>> maintenance: that is, allowing us to update the view *without* needing
>> to rerun the entire query.  This record comparison operator will be
>> just as important in that context.
>
> You state this but I don't see where you justify this claim..

The next sentence was intended as justification.

>> The *only* strategy refreshing a
>> materialized view that *doesn't* need this operator is the only we
>> have in 9.3: through out all the old data and replace it with the
>> result of re-executing the query.
>
> I really don't agree with this, but allow me to play along a bit.  What
> is going to happen with this incremental updates when you want to use an
> index to figure out what to update?

We already do use an index to figure out what to update with
concurrent refresh; or at least we can, when the query planner thinks
that is the cheapest strategy.  I don't imagine that that part will
change significantly for incremental updates.

> Are you going to build the index
> using your binary equality operator, which could allow *duplicates that
> are not really duplicates*, even in the externally-visible world? And
> then we'll be returning multiple rows to the user when we shouldn't be?

No, see above.  There's no intention that the user should need to use
this new opclass to make materialized views work correctly.  The user
is required to define a unique index in order to use REFRESH
MATERIALIZED VIEW CONCURRENTLY, and that unique index defines the
semantics used to compare existing rows to new candidate rows.  Only
after we've matched up the old and new rows do we use the
exact-equality semantics to decide whether to update the existing row
or do nothing.

> There's a whole slew of evil examples of this- it's called unnecessary
> DISTINCT's.  Or perhaps you'll simply look up based on the actual
> equality answer and then *pick one* to use, perhaps the most recent,

Sure, all of that would suck, but nobody's proposing anything like that.

> but
> then that may or may not equal what the actual query would generate
> *anyway* because of costing, join ordering, etc, etc, as I already
> pointed out.

Most queries I write produce the same results every time.  The order
of the rows may vary when I don't care, but I take care to control the
query so that the contents of the rows don't vary.  For example, if I
use string_agg() to group rows, I make sure the rows always arrive in
the same order, using ORDER BY.  The reason I do this is because, even
today, failing to do so causes too many user-visible artifacts.
However, I admit that people who don't adhere to that practice will
see additional anomalies with materialized views just as they do when
they run the same query repeatedly.  However, that will be true in any
materialized view implementation and has little or nothing to do with
the present proposal.

> You're trying to guarantee something here that you can't.
> Trying to fake it and tell the users "oh, you'll be ok" is actually
> worse because it means they'll try and depend on it and then get all
> kinds of upset when it ends up *not* working.

I don't know what this is referring to.

> I'd much rather tell users "look, if you want to use these for *data*
> fields, that's fine, because our smart matview will figure out the keys
> associated with a GROUP BY and will update based on those, but if you
> GROUP BY a column with a type whose equality operator allows things to
> be different, you'll get *the same behavior you would get from the
> query*, which is to say, it won't be deterministic."

Again, Kevin is proposing something strictly BETTER than this.  We
*will* figure out the keys associated with the GROUP BY and we *will*
update based on those, assuming you define the unique index on the
materialized view with the same opclass used for the GROUP BY.
However, if you GROUP BY a column with a type whose equality operator
allows things to be different, then the materialized view will
reference the actual value produced by the last execution of the
query, rather than possibly producing something else.

>> If you want anything smarter than
>> that, you MUST compare old and new rows for equality so that you can
>> update only those rows that have been changed.  And if you compare
>> them *any strategy other than the one Kevin is proposing*, namely
>> binary equality, then you may sometimes decide that a row has not been
>> changed when it really has, and then you won't update the row, and
>> then incremental maintenance will be enabled to produce *wrong
>> answers*.  So to me this has come to seem pretty much open and shut.
>
> The incremental maintenance approach, I'd hope, would be keeping track
> of what changed and what rows in the view need to be regenerated due to
> those underlying rows changing.  Doing that, you *would* know what
> needed to be updated, and how.  If you're not keeping track of the
> specific rows that changed underneath, then I'm curious how these
> incremental matviews will work differently from what's being described
> here, where we're trying to make the whole freakin' row the key.

Kevin posted an email discussing the algorithm that he was proposing
to implement back in May.

http://www.postgresql.org/message-id/1368561126.64093.YahooMailNeo@web162904.mail.bf1.yahoo.com

I would suggest that you read the referenced papers for details as to
how the algorithm works.  To make a long story short, they do need to
keep track of what's changed, and how.  However, that still seems
largely orthogonal to the present discussion.

>> We all know that materialized views are not going to always match the
>> data returned by the underlying query.  Perhaps the canonical example
>> is SELECT random().  As you pointed out upthread, any query that is
>> nondeterministic is a potential problem for materialized views.  When
>> you write a query that can return different output based on the order
>> in which input rows are scanned, or based on any sort of external
>> state such as a random-number generator, each refresh of a
>> materialized view based on that query may produce different answers.
>
> Of course.
>
>> There's not a lot we can do about that, except tell people to avoid
>> using such queries in materialized views that they expect to be
>> stable.  However, what we're talking about here is a completely
>> separate problem.  Even if the query is 100% deterministic, without
>> this patch, the materialized view can get out of sync with the query
>> results.
>
> Only a subset of the queries involving these types are 100%
> deterministic and those are the stupid simple cases where the view is
> just being used as a synonym.  As soon as you start doing anything that
> actual uses the equality operator (and we have a *lot* of things that
> do!), you're going to get nondeterministic behavior.  You're trying to
> paper over this and I'm not impressed.

This does not match my experience.  I have written many complex views
over the years which it would have been beneficial to materialize.
For example, I had a 34-table join involving multiple aggregates in
one web app I deployed.  I would not consider that to be using the
view "just as a synonym", but I can assure you that the results were
quite deterministic.  That view involved both text and numeric data,
as I recall.

> Or are you telling me that a matview using the binary-equality technique
> will be deterministic when your GROUP BY a citext column in the view?  I
> don't buy it.

No, I'm not telling you that.  That would be a stupid thing to say, so
I'm glad I'm not saying it.

>> Granted, most of the ways in which it can get out of sync are fairly
>> subtle: failing to preserve case in a data type where comparisons are
>> text-insensitive; gaining or loosing an all-zeroes null bitmap on an
>> array type; adding or removing trailing zeroes after the decimal point
>> in a numeric.  If the materialized view sometimes said "1" when the
>> query was returning "0", we'd presumably all say "that's a bug, let's
>> fix it".  But what we're actually talking about is that the query
>> returns "0.00" and the view still says zero.  So we're doing a lot of
>> soul-searching about whether that's unequal enough to justify updating
>> the row.  Personally, though, there's not a lot of doubt in my mind.
>
> If it was as open-and-shut as that, and we *could* be 100%
> deterministic in *all* cases with these special types when going through
> views, I'd be a lot more on-board.  However, we can't; not without
> insisting that the users use some new set of "binary-aggregation"
> operators which are able to *pick* which of the equal-but-not-really
> options they want; and by-the-way, you can *bet* that's going to need to
> be type-specific.

We don't have to control which operators users can select.  The
existing implementation, as committed, lets the user control that by
choosing which unique indexes to define.  It's actually a little bit
clunky at the moment; if you define multiple unique indexes on the
materialized view, it seems that we consider a row to be "the same
row" only if the equality operators for all columns in all such
indexes think that the rows hasn't been updated.  We might want to
revisit that; I can see insisting that the user pick exactly one
unique index to define the comparison semantics, and giving them
explicit syntax for doing so, precisely to eliminate some of the lock
contention and other problems that might occur from excess updates.  I
guess the importance of such a change depends largely on how likely
you think it is that someone might put a unique index on a
materialized view for some reason other than to define the comparison
semantics.  But whether we change that or not, the basic point you are
concerned about here is already covered.

>> If I create a table and I put 0 into a column of that table and then
>> create a materialized view and that 0 ends up in the materialized
>> view, and then I update the 0 to 0.00 and refresh the view, I expect
>> that change to propagate through to the materialized view.  It works
>> that way if I select from a regular, non-materialized view; and it
>> also works that way if I select from the table itself.
>
> Sure- and then you try to do the same thing with a join or a group by.
> Just because it works in the simple SELECT case doesn't mean we get to
> tell users "you can depend on this!"

No, but I think the case you're concerned about will work
mostly-acceptably today and even better after the proposed patch.  If
you disagree, perhaps you could prepare a test case demonstrating the
problem.  I don't think that either Kevin or myself are trying to ram
through a feature that is half-baked here, and I certainly agree that
it is very important to get as much of this possible worked out before
9.4 arrives in stores everywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 19:05:56

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Unless we can tell whether there are any differences between two
> versions of a row, we can't accurately generate the delta to drive
> the incremental maintenance.

This is predicated on the assumption that you simply generate the new
view and then try to figure out what to go update.  I'm trying to
explain that using that methodology is what landed us in this situation
to begin with.

>   The initial thread discussing how
> incremental maintenance would be done is here:

My apologies for not paying quite as close attention to that thread as I
should have.

> I think it is fairly obvious that REFRESH should REgenerate a FRESH
> copy of the data, versus incremental maintenance -- which attempts
> to keep the matview up-to-date without regenerating the full set of
> data.

Having 'REFRESH' regenerate a fresh copy of the data makes sense to me,
and is what we have now, no?  The only issue there is that it takes out
a big lock, which I appreciate that you're trying to get rid of.

>   Whenever there is logical replication (and materialized
> views are, conceptually, one form of that -- within the database) I
> feel it is important to be able to correct any possible "drift".
> With matviews, I see the way to do that as the REFRESH command, and
> I feel that it is important to be able to do that in a way that can
> run concurrently with readers of the matview -- without blocking
> them or being blocked by them.

Of course.

> Discussion of incremental maintenance really belongs on a different
> thread. 

I'm really getting tired of everyone saying "this is the only way to do
it" (or perhaps "well, this is already committed, therefore it must be
what we're gonna do") when a) we're already planning to rip this out
and change it, or so I thought, and b) we're trying to make promises we
can't keep with this approach.

> Since I have gone to the trouble to read a lot of papers
> on the topic, and select one that I think is a good basis for our
> implementation, I hope everyone will frame discussion in terms of
> either:
>   -  how best to implement the techniques from that paper, or
>   -  why some other paper presents a better technique.

My recollection from the hackers meeting is that I'm trying to simply
paraphrase what you had said was in the paper wrt keeping track of what
rows are changed underneath and using that as a basis to implement the
changes necessary in the view.  Does the paper you're referring to
describe rerunning the whole query and then trying to figure out what's
been changed..?  That's really what I'm having trouble understanding
why anyone would want to implement.  I'll try and find time to hunt down
the threads and papers on it, but I really could have sworn this was
gone over at the hacker meeting- and it made a lot of sense to me, then.

> I really didn't expect to have to burn so much time
> and energy arguing over whether a REFRESH should leave the matview
> accurately containing the results of the matview's query.

I appreciate you bringing me up to speed on where things actually are
here- again, sorry for not realizing the direction that this was going
in earlier; it really didn't even occur to me that it would have gone
down this road.  I, also, didn't expect to spend so much time on this.

> >> We can argue about how it should be named

Really, I'm back to trying to figure out why we want to go down this
road at all.

> >> and whether it should be documented
>
> I thought we had a consensus to document both the existing record
> comparison operators and these new ones, and I'm fine with that.

If it gets added, it certainly should be documented, and heavily
caveated.
Thanks,
    Stephen

Re: record identical operator

From

Robert Haas

Date:

23 September 2013, 19:10:11

On Mon, Sep 23, 2013 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Anyway, that is exactly what Kevin is proposing to do here and, to be
> clear, he's NOT proposing to use the binary-identical semantics to
> identify the row to be updated.  That will happen using the semantics
> of whatever index the user chooses to create on the PK column.
> Rather, he's only using the binary-identical to decide which rows are
> completely unchanged.  I might be wrong here, but it seems to me that
> that renders most of your argument here moot. Under Kevin's proposal,
> changing a citext column that acts as a PK for the matview WON'T cause
> the row to be deleted and reinserted, but what it will do is say, oh,
> it's the same row (the values are equal) but the case is different
> (the rows are not binary-equal), let me go update the PK column with
> the new value.  From where I stand, that seems like exactly the right
> behavior.  What do you think should happen instead?

Ah, I'm wrong here.  It is true that we use the unique index semantics
to compare the results of rerunning the query to what's in the view,
but it's not true (currently) that we ever do updates on the view as
opposed to delete-and-reinsert.  Apparently that code got (mostly)
ripped out at some point, but I was confused by a comment that wasn't
fully updated to reflect the new reality.

Still, I believe that most of the points I'm making here remain valid,
because the key assumption you seem to be making is that Kevin is
proposing to use binary-identical semantics throughout, and that's not
true.  Old rows and new candidate rows are matched up using the
user-specified opclass, but binary-identical.  Binary-identical just
determines whether to replace the rows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 19:42:21

Robert, all,

Skipping out on much of the side-discussion to try and drive at the
heart of this..

* Robert Haas (robertmhaas@gmail.com) wrote:
> I would suggest that you read the referenced papers for details as to
> how the algorithm works.  To make a long story short, they do need to
> keep track of what's changed, and how.

I suppose it's good to know that I wasn't completely misunderstanding
the discussion in Ottawa.

> However, that still seems
> largely orthogonal to the present discussion.

It *solves* this issue, from where I'm sitting, without any binary
operators at all.

rows 4, 5, 6 are used to compose matrow 1.  When 4, 5, or 6 are updated,
matrow 1 gets updated.

When the update happens, and it *will* happen (there's no question about
"oh, should we actually update this record?"), it's a normal PG update
and all of the values in the row get set to whatever the new values are.

The only reason we've having any of this discussion is because, in the
current implementation, aiui anyway, we're saying "oh, the user wants us
to update the matview, but we have *no clue* what actually changed, so
we're just gonna try and guess.."  This is changing that from "we're
gonna try and guess.." to "well, if they aren't *binary identical*,
we'll change them."  However, if you're actually tracking what's
changing and rebuilding the rows based on that, there's no question
about binary equality, and you're still only updating the rows that
actually need to be updated.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

23 September 2013, 19:56:10

Stephen Frost <sfrost@snowman.net> wrote:

> I'm trying to explain that using that methodology is what landed
> us in this situation to begin with.

I'm trying to figure out what situation you think we're in.
Seriously, if you could apply the patch and show one example that
demonstrates what you see to be a problem, that would be great.

>> I think it is fairly obvious that REFRESH should REgenerate a FRESH
>> copy of the data, versus incremental maintenance -- which attempts
>> to keep the matview up-to-date without regenerating the full set of
>> data.
>
> Having 'REFRESH' regenerate a fresh copy of the data makes sense to me,
> and is what we have now, no?  The only issue there is that it takes out
> a big lock, which I appreciate that you're trying to get rid of.
>
>>   Whenever there is logical replication (and materialized
>> views are, conceptually, one form of that -- within the database) I
>> feel it is important to be able to correct any possible "drift".
>> With matviews, I see the way to do that as the REFRESH command, and
>> I feel that it is important to be able to do that in a way that can
>> run concurrently with readers of the matview -- without blocking
>> them or being blocked by them.
>
> Of course.
>
>> Discussion of incremental maintenance really belongs on a different
>> thread.
>
> I'm really getting tired of everyone saying "this is the only way to do
> it" (or perhaps "well, this is already committed, therefore it must be
> what we're gonna do")

What I'm saying is that REFRESH and incremental maintenance are two
different things, and conflating them just confuses everything.

> when a) we're already planning to rip this out and change it, or
> so I thought,

The entire change to matview-specific code is to use a different
operator in two places.  Outside of that, it consists of adding the
12th non-default opclass to core.

> and b) we're trying to make promises we can't keep with this
> approach.

I don't see any such.  If you do, please describe them; or better
yet, give an example.

>> Since I have gone to the trouble to read a lot of papers
>> on the topic, and select one that I think is a good basis for our
>> implementation, I hope everyone will frame discussion in terms of
>> either:
>>   -  how best to implement the techniques from that paper, or
>>   -  why some other paper presents a better technique.
>
> My recollection from the hackers meeting is that I'm trying to simply
> paraphrase what you had said was in the paper wrt keeping track of what
> rows are changed underneath and using that as a basis to implement the
> changes necessary in the view.  Does the paper you're referring to
> describe rerunning the whole query and then trying to figure out what's
> been changed..?  That's really what I'm having trouble understanding
> why anyone would want to implement.  I'll try and find time to hunt down
> the threads and papers on it, but I really could have sworn this was
> gone over at the hacker meeting- and it made a lot of sense to me, then.

The only thing the paper says on the topic is that any incremental
maintenance scheme is a heuristic.  There will always be cases when
it would be faster and less resource-intensive to regenerate the
data from the defining query.  There is at least an implication
that a good implementation will try to identify when it is in such
a situation, and ignore the whole incremental maintenance approach
in favor of what we are doing with REFRESH.  The example they give
is if there is an unqualified DELETE of every row in a table which
is part of an inner join generating the result, that it would
almost be faster to to generate the (empty) result set than to run
their algorithm to determine that all the rows need to be deleted.
One reason for having a REFRESH that re-runs the query like this is
that it *is* a recommended "escape hatch" when a mass operation
makes the incremental calculations too expensive.

>> I really didn't expect to have to burn so much time
>> and energy arguing over whether a REFRESH should leave the matview
>> accurately containing the results of the matview's query.
>
> I appreciate you bringing me up to speed on where things actually are
> here- again, sorry for not realizing the direction that this was going
> in earlier; it really didn't even occur to me that it would have gone
> down this road.  I, also, didn't expect to spend so much time on this.
>
>>>> We can argue about how it should be named
>
> Really, I'm back to trying to figure out why we want to go down this
> road at all.
>
>>>> and whether it should be documented
>>
>> I thought we had a consensus to document both the existing record
>> comparison operators and these new ones, and I'm fine with that.
>
> If it gets added, it certainly should be documented,

That seems to be the consensus.  In fact, I would have submitted
that with this patch if there had been any documentation for the
default record comparison operators.  It seemed like that might
have been omitted on purpose, and it seemed weird to add
documentation for a non-default operator for records when (a) we
didn't document the default operator for records and (b) we don't
document many of the other non-default operators already in core.

> and heavily caveated.

I'm not sure what caveats would be needed.  It seems to me that a
clear description of what it does would suffice.  Like all the
other non-default opclasses in core, it will be non-default because
it is less frequently useful.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Robert Haas

Date:

23 September 2013, 19:59:08

On Mon, Sep 23, 2013 at 3:45 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Robert, all,
>
> Skipping out on much of the side-discussion to try and drive at the
> heart of this..
>
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> I would suggest that you read the referenced papers for details as to
>> how the algorithm works.  To make a long story short, they do need to
>> keep track of what's changed, and how.
>
> I suppose it's good to know that I wasn't completely misunderstanding
> the discussion in Ottawa.
>
>> However, that still seems
>> largely orthogonal to the present discussion.
>
> It *solves* this issue, from where I'm sitting, without any binary
> operators at all.
>
> rows 4, 5, 6 are used to compose matrow 1.  When 4, 5, or 6 are updated,
> matrow 1 gets updated.
>
> When the update happens, and it *will* happen (there's no question about
> "oh, should we actually update this record?"), it's a normal PG update
> and all of the values in the row get set to whatever the new values are.

I don't know why there shouldn't be a question about that.  Suppose
that the MAX() aggregate is in use.  If 4 or 5 or 6 is updated so as
to change the maximum of the three, then matrow 1 needs updating.  But
if the maximum remains the same, then it doesn't.  The right way to
decide whether it needs updating is to re-aggregate those three rows
and then see whether you get the same (read: binary identical) out of
the aggregate that you got the last time you ran it.

Also, suppose the same statement updates row 4, row 5, and row 6.
Instead of updating the materialized view three times, you do it just
once at end-of-statmement, like an AFTER STATEMENT trigger that
somehow knows which rows were updated.  In this case even something
like AVG() could produce the same result as it did before the update.
And you'd surely want to avoid updating the matview if the new value
was the same as what was already stored in the matview (but not if it
was equal but not the same).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 20:30:02

* Robert Haas (robertmhaas@gmail.com) wrote:
> I don't know why there shouldn't be a question about that.

Because anything else would be an internal optimization which must be
proven to be correct, imv, also..

> Suppose
> that the MAX() aggregate is in use.  If 4 or 5 or 6 is updated so as
> to change the maximum of the three, then matrow 1 needs updating.  But
> if the maximum remains the same, then it doesn't.  The right way to
> decide whether it needs updating is to re-aggregate those three rows
> and then see whether you get the same (read: binary identical) out of
> the aggregate that you got the last time you ran it.

You could argue the same about PG doing that for any row update- check
if anything is actually *binary* different and, if not, then don't
update it.  Of course, there's questions about if that's "right" and
what about triggers, etc..

> Also, suppose the same statement updates row 4, row 5, and row 6.
> Instead of updating the materialized view three times, you do it just
> once at end-of-statmement, like an AFTER STATEMENT trigger that
> somehow knows which rows were updated.

Sorry if I wasn't clear, but that's exactly what I was trying to
describe regarding how it should work.  I was NOT intending to suggest
that each update immediately update the matview.  It's just that we
keep *track* of what was updated and then, at some convenient point,
actually run the process to update the matview rows (maybe in an AFTER
statement, maybe 5 minutes from now).

> In this case even something
> like AVG() could produce the same result as it did before the update.

Sure it could.

> And you'd surely want to avoid updating the matview if the new value
> was the same as what was already stored in the matview (but not if it
> was equal but not the same).

I don't see why updating a row that was built with AVG() should be
avoided over a row that was built with MAX(), unless you're suggesting
there's a different set of rows involved in the two or there's some
additional optimization around figuring out if these particular changes
*should* actually change the result.  That's an analysis which could
still happen and wouldn't need to rely on any binary equality test, and
it'd need to have a whole lot more smarts than this approach anyway or
you'll still end up running a query against all of the rows involved in
the AVG() to then only decide at the last moment to not update the row,
which doesn't strike me as a great optimization.  Perhaps that's why we
didn't implement it for PG itself?
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 20:36:01

* Kevin Grittner (kgrittn@ymail.com) wrote:
> The only thing the paper says on the topic is that any incremental
> maintenance scheme is a heuristic.  There will always be cases when
> it would be faster and less resource-intensive to regenerate the
> data from the defining query.  There is at least an implication
> that a good implementation will try to identify when it is in such
> a situation, and ignore the whole incremental maintenance approach
> in favor of what we are doing with REFRESH.  The example they give
> is if there is an unqualified DELETE of every row in a table which
> is part of an inner join generating the result, that it would
> almost be faster to to generate the (empty) result set than to run
> their algorithm to determine that all the rows need to be deleted.
> One reason for having a REFRESH that re-runs the query like this is
> that it *is* a recommended "escape hatch" when a mass operation
> makes the incremental calculations too expensive.

I was just chatting about this with a coworker and we see a *lot* of
cases where an unqualified DELETE and then INSERT would be perfectly
acceptable over the existing heavy lock that's taken out.  We're already
doing that in a lot of places because it's actually pretty cheap and the
"view" results aren't all that big.  It's also completely defensible and
doesn't require any new operators.

> > and heavily caveated.
>
> I'm not sure what caveats would be needed.  It seems to me that a
> clear description of what it does would suffice.  Like all the
> other non-default opclasses in core, it will be non-default because
> it is less frequently useful.

"This will claim things are different, even when they aren't different
when cast to text, or possibly even when extracted in binary mode,
ensure this is really what you want" is a pretty big caveat, imv.
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

23 September 2013, 21:33:47

* Robert Haas (robertmhaas@gmail.com) wrote:
> In this case even something
> like AVG() could produce the same result as it did before the update.
> And you'd surely want to avoid updating the matview if the new value
> was the same as what was already stored in the matview (but not if it
> was equal but not the same).

BTW, I'm wondering how you're going to explain that, with an
optimization of MAX() for matviews that only looks at the records which
were updated, that we decided to change 'A' to 'a' on a citext column
because they were binary different, yet if you considered all of the
rows under the MAX(), 'A' would have nearly always come first and
therefore would have been the more likely candidate (and what the user
probably would have gotten running the query themselves).

Also, as asked, I'll work up an example of how the matview can produce
a different result from the regular view in some cases, even with the
binary equality stuff.
Thanks,
    Stephen

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 01:19:14

* Kevin Grittner (kgrittn@ymail.com) wrote:
> I'm trying to figure out what situation you think we're in.
> Seriously, if you could apply the patch and show one example that
> demonstrates what you see to be a problem, that would be great.

Here's an example to illustrate what I'm talking about when it comes
down to "you can't claim that you'll produce exactly what the query
will always, with these types:"

=# table citext_table;id | name
----+------- 1 | one 3 | three 5 |  4 | Three 2 | Three
(5 rows)

=# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name;
SELECT 3

=# table citext_matview;name
-------onethree
(3 rows)

=# set enable_seqscan = false;
SET

=# select name from citext_table where id > 0 group by name;name
-------oneThree
(3 rows)

Yes, the 'set enable_seqscan = false;' is a bit of a cheat- but I hope
you agree that it *shouldn't* change the *result* of a query.  There's
certainly other cases where a different plan could be picked and result
in the same kind of difference between the matview and running the
query.

Build a matview w/ max(id) as id and a unique index on 'id' and you'll
still get the same issue.  The problem is that the seqscan will pick up
on 'three' first while the index scan will find 'Three' first.

This is all with the patch from
http://www.postgresql.org/message-id/1379024847.48294.YahooMailNeo@web162904.mail.bf1.yahoo.com
applied.

I simply don't believe it's possible to be consistent in this unless we
declare the type's equality to be insufficient for us and, everywhere
that we call a type's equality operator, *also* check which of the equal
values involved in the comparison is "bigger" or "lower" (based on
binary comparison) and actually pick one, consistently, all the time.

This is why I'm complaining that we're trying to paper over a difference
that just isn't that simple.  I understand why you're trying to- it's
definitely annoying, but this isn't so much a solution as it is a hack
that doesn't actually work in all cases.

I don't have a silver bullet for this but I don't like saying "let's
implement these binary operators and make things *look* consistent in
most cases, and then fail subtly in complex situations."  I agree that
users may complain if the underlying relation ends up not having *any*
entries with the value that's in the matview and a refresh doesn't
update it.

I expect users will *also* complain if we implement these operators and
then they write their own queries using this awesome new 'binary'
comparison operator to compare their equal-to-citext strings- and
discover that the binary operators say things that *look* the same are
actually *different* (thinking encoding issues, overlong encodings,
etc).  That'd be a subtle and painful bug (technically in their code,
not ours, but still) to find, and then what do you do?  Argueing against
citext (as I've had to do in the past..) would probably be more
difficult for some if they saw these operators and figured they could
use them.

As I mentioned in the thread w/ Robert, when looking at further
optimizations down the road, we're going to be in a situation of looking
at only a subset of the records and then considering what to do when
supporting MAX() efficiently leads us to running 'greater(existing,new)'
which returns false, but q and r are binary different.  This argument
is saying "always replace what's there" but every other, existing, value
in the table might be represented by the 'existing' representation.
Thanks,
    Stephen

Re: record identical operator

From

Robert Haas

Date:

24 September 2013, 03:08:20

On Mon, Sep 23, 2013 at 9:21 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Here's an example to illustrate what I'm talking about when it comes
> down to "you can't claim that you'll produce exactly what the query
> will always, with these types:"

Your example demonstrates that if a given query can generate two
different outputs, A and B, based on the same input data, then the
contents of the materialized view cannot be equal to be A and also
equal to B.  Well, no duh.

Of course, you don't need citext, or any other data type with a loose
notion of equality, to generate that sort of problem:

rhaas=# create table chicken_little (d date);
CREATE TABLE
rhaas=# insert into chicken_little values ('2012-02-21');
INSERT 0 1
rhaas=# create materialized view henny_penny as select d::text from
chicken_little;
SELECT 1
rhaas=# table chicken_little;    d
------------2012-02-21
(1 row)

rhaas=# table henny_penny;    d
------------2012-02-21
(1 row)

rhaas=# set datestyle = 'german';
SET
rhaas=# table chicken_little;    d
------------21.02.2012
(1 row)

rhaas=# table henny_penny;    d
------------2012-02-21
(1 row)

But I'm still wondering what this is intended to prove.  There are an
infinite number of ways to write queries that produce different
results, and I think we all know that materialized views aren't going
to hold up very well if given such queries.  That seems a poor excuse
for not fixing the cases that can be made to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Andres Freund

Date:

24 September 2013, 08:24:52

On 2013-09-23 21:21:53 -0400, Stephen Frost wrote:
> Here's an example to illustrate what I'm talking about when it comes
> down to "you can't claim that you'll produce exactly what the query
> will always, with these types:"
> 
> =# table citext_table;
>  id | name  
> ----+-------
>   1 | one
>   3 | three
>   5 | 
>   4 | Three
>   2 | Three
> (5 rows)
> 
> =# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name;
> SELECT 3
> 
> =# table citext_matview;
>  name  
> -------
>  
>  one
>  three
> (3 rows)

I don't really understand why you have a problem with this specific
thing here. Kevin's goal seems to be to make materialized views behave
consistent with the way a plain view would if the matviews would just
have been refreshed fully, concurrently or incrementally and there have
been no further data changes.
SELECT * FROM table WHERE candidate_key IN (...);
used in a view or plainly currently guarantees you that you get the
original casing for citext. And if you e.g. have some additional
expensive joins, such a matview can very well make sense.

What does it matter that ... GROUP BY citext_col; doesn't return the
same casing consistently? You're aggregating, not accessing the original
data. If you need to separate the different casings now, cast to text.

Now, I can perfectly understand having a bit of architectural qualms
about Kevin's approach of things on the SQL level. But this doesn't seem
to be the thread to discuss that. FWIW, I can't forsee any realistic
approach that actually won't do such comparisons (although not
necessarily through C).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: record identical operator

From

Hannu Krosing

Date:

24 September 2013, 09:42:31

On 09/23/2013 10:38 PM, Stephen Frost wrote:
>
>>> and heavily caveated.
>> I'm not sure what caveats would be needed.  It seems to me that a
>> clear description of what it does would suffice.  Like all the
>> other non-default opclasses in core, it will be non-default because
>> it is less frequently useful.
> "This will claim things are different, even when they aren't different
> when cast to text, or possibly even when extracted in binary mode,
> ensure this is really what you want" is a pretty big caveat, imv.
Yes, it should be documented that it tests for sameness and gives
no guarantees that lack of sameness means "different" (as
determined by some other operator)

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 11:11:53

* Robert Haas (robertmhaas@gmail.com) wrote:
> Your example demonstrates that if a given query can generate two
> different outputs, A and B, based on the same input data, then the
> contents of the materialized view cannot be equal to be A and also
> equal to B.  Well, no duh.

Two different outputs based on what *plan* is chosen.

> Of course, you don't need citext, or any other data type with a loose
> notion of equality, to generate that sort of problem:
[...]
> rhaas=# set datestyle = 'german';
> SET

I'm talking about *planner differences* changing the results.  If we've
got a ton of cases where a different plan means different output, then
we've got some serious problems.  I'd argue that it's pretty darn clear
that datestyle is going to be a *slightly* different animal.  My example
doesn't *require* changing any GUCs, it was just expedient for
illustration.

> But I'm still wondering what this is intended to prove.

These types are, to those that use them at least, a known quantity wrt
what you get when using them in GROUP BYs, JOINs, etc.  You're trying
to 'fix' something that isn't really broken and you're only doing it
half-baked anyway because your 'fix' isn't going to actually make these
types produce consistent results.

> There are an
> infinite number of ways to write queries that produce different
> results, and I think we all know that materialized views aren't going
> to hold up very well if given such queries.  That seems a poor excuse
> for not fixing the cases that can be made to work.

New operators are not without cost, I don't think it's a good idea to
expose our internal binary representations of data out to the SQL level,
and the justification for adding them is that they solve a case that
they don't.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

24 September 2013, 12:38:36

Stephen Frost <sfrost@snowman.net> wrote:

> Skipping out on much of the side-discussion to try and drive at
> the heart of this..
>
> Robert Haas (robertmhaas@gmail.com) wrote:
>> I would suggest that you read the referenced papers for details
>> as to how the algorithm works.  To make a long story short, they
>> do need to keep track of what's changed, and how.
>
> I suppose it's good to know that I wasn't completely
> misunderstanding the discussion in Ottawa.
>
>> However, that still seems largely orthogonal to the present
>> discussion.
>
> It *solves* this issue, from where I'm sitting, without any
> binary operators at all.

> [ argument that the only way we should ever do REFRESH is by
> using captured deltas, through incremental maintenance techniques
> ]

That would ensure that a query could not be used to define a
matview until we had implemented incremental maintenance for
queries of that type and complexity.  I expect that to come *close*
to covering all useful queries that way will take five to ten
years, if all goes well.  The approach I'm taking makes all queries
available *now*, with improvements in how many can be maintained
incrementally improving over time.  This is the route every other
database I've looked at has taken (for good reason, I think).

Ultimately, even when we have incremental maintenance supported for
all queries that can be used to define a matview, I think there
will be demand for a REFRESH command that re-runs the base query.
Not only does that fit the workload for some matviews, but consider
this, from the paper I cited[1]:

| Recomputing the view from scratch is too wasteful in most cases.
| Using the heuristic of inertia (only a part of the view changes
| in response to changes in the base relations), it is often
| cheaper to compute only the changes in the view.  We stress that
| the above is only a heuristic.  For example, if an entire base
| relation is deleted, it may be cheaper to recompute a view that
| depends on the deleted relation (if the new view will quickly
| evaluate to an empty relation) than to compute the changes to the
| view.

What we're talking about is a performance heuristic -- not
something more profound than that.  The route I'm taking is to get
it *working correctly now* using the simple technique, and then
embarking on the long journey of optimizing progressively more
cases.

What your argument boils down to IMV is essentially a case of
premature optimization.  You have yet to show any case where the
existing patch does not yield correct results, or show that there
is a better way to get to the point this patch takes us.

> [ later post shows a query that does not produce deterministic
> results ]

Sure, two direct runs of that same query, or two runs through a
regular view, could show different results (considering
synchronized scans, among other things).  I don't see what that
proves.  Of course a refresh of a matview is only going to produce
one of those and then will not produce a different result until it
is refreshed or (once we add incremental maintenance) something
changes in the underlying data.

Nobody ever claimed that a query which does not produce consistent
results would somehow produce them with this patch.  There are
queries using citext, numeric, and other types which *do* provide
consistent results which are consistently produced by a straight
query, a simple view, or a non-concurrent refresh of a materialized
view; this patch will cause a concurrent refresh to produce the
same results as those, rather than something different.  Period.
That's the point, and the whole point.  You have not shown that it
doesn't.  You have not shown why adding a 12th non-default opclass
is a particular problem here (although we have a consensus to use
different operators, to reserve this operator namespace for other
things).  You have not made any case at all for why people should
wait for incremental maintenance to be mature (a project which will
take years) before being able to use materialized views with
concurrent refreshes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] A. Gupta, I. S. Mumick, and V. S. Subrahmanian.  Maintaining
Views Incrementally.  In SIGMOD 1993, pages 157-167.
http://dl.acm.org/citation.cfm?id=170066

Re: record identical operator

From

Robert Haas

Date:

24 September 2013, 12:49:54

On Tue, Sep 24, 2013 at 7:14 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> Of course, you don't need citext, or any other data type with a loose
>> notion of equality, to generate that sort of problem:
> [...]
>> rhaas=# set datestyle = 'german';
>> SET
>
> I'm talking about *planner differences* changing the results.  If we've
> got a ton of cases where a different plan means different output, then
> we've got some serious problems.  I'd argue that it's pretty darn clear
> that datestyle is going to be a *slightly* different animal.  My example
> doesn't *require* changing any GUCs, it was just expedient for
> illustration.

I'm completely confused, here.  What's so special about planner
differences?  Obviously, there are tons of queries that can produce
different results based on planner differences, but materialized views
didn't create that problem and they aren't responsible for solving it.

Also, even restricting ourselves to planner differences, there's no
particular link between the behavior of the type's equality operator
and whether or not the query always produces the same results.  For
example, let's consider good old text.

rhaas=# create table tag_data (id integer, tag text, userid text,
primary key (id, tag));
CREATE TABLE
rhaas=# insert into tag_data values (1, 'foo', 'tom'), (1, 'bar',
'dick'), (2, 'baz', 'harry');
INSERT 0 3
rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from
tag_data group by 1;id |       tags
----+------------------- 1 | foo:tom, bar:dick 2 | baz:harry
(2 rows)

rhaas=# set enable_seqscan=false;
SET
rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from
tag_data group by 1;id |       tags
----+------------------- 1 | bar:dick, foo:tom 2 | baz:harry
(2 rows)

Now texteq() is just a glorified version of memcmp(), so what does
this complaint possibly have to do with Kevin's patch, or even
materialized views in general?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 13:29:17

* Kevin Grittner (kgrittn@ymail.com) wrote:
> That's the point, and the whole point.  You have not shown that it
> doesn't.  You have not shown why adding a 12th non-default opclass
> is a particular problem here (although we have a consensus to use
> different operators, to reserve this operator namespace for other
> things). 

We need justification to add operators, imv, especially ones that expose
our internal binary representation of data.  I worry that adding these
will come back to bite us later and that we're making promises we won't
be able to keep.

If these inconsistencies in what happens with these data types are an
issue then REFRESH can be handled as a wholesale DELETE/INSERT.  Trying
to do this incremental-but-not-really maintenance where the whole query
is run but we try to skimp on what's actually getting updated in the
matview is a premature optimization, imv, and one which may be less
performant and more painful, with more gotchas and challenges for our
users, to deal with in the long run.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

24 September 2013, 13:46:33

Stephen Frost <sfrost@snowman.net> wrote:

> We need justification to add operators, imv, especially ones that
> expose our internal binary representation of data.

I believe I have done so.

> I worry that adding these will come back to bite us later

How?

> and that we're making promises we won't be able to keep.

The promise that a concurrent refresh will produce the same set of
rows as a non-concurrent one?

> If these inconsistencies in what happens with these data types
> are an issue then REFRESH can be handled as a wholesale
> DELETE/INSERT.

I have real-life experience with handling faux materialized views
by creating tables (at Wisconsin Courts).  We initially did it the
way you describe, but the run time was excessive (in Milwaukee
County the overnight run did not always complete before the start
of business hours the next day).  Switching to logic similar to
what I've implemented here, it completed an order of magnitude
faster, and generated a small fraction of the WAL and logical
replication data that the technique you describe did.  Performing
preliminary steps in temporary tables to minimize the changes
needed to permanent tables seems beneficial enough on the face of
it that I think the burden of proof should be on someone arguing
that deleting all rows and re-inserting (in the same transaction)
is, in general, a superior approach to finding the differences and
applying only those/

> Trying to do this incremental-but-not-really maintenance where
> the whole query is run but we try to skimp on what's actually
> getting updated in the matview is a premature optimization, imv,
> and one which may be less performant and more painful, with more
> gotchas and challenges for our users, to deal with in the long
> run.

I have the evidence of a ten-fold performance improvement plus
minimized WAL and replication work on my side.  What evidence do
you have to back your assertions?  (Don't forget to work in bloat
and vacuum truncation issues to the costs of your proposal.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 14:36:43

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > I worry that adding these will come back to bite us later
>
> How?

User misuse is certainly one consideration, but I wonder what's going to
happen if we change our internal representation of data (eg: numerics
get changed again), or when incremental matview maintenance happens and
we start looking at subsets of rows instead of the entire query.  Will
the first update of a matview after a change to numeric's internal data
structure cause the entire thing to be rewritten?

> > and that we're making promises we won't be able to keep.
>
> The promise that a concurrent refresh will produce the same set of
> rows as a non-concurrent one?

The promise that we'll always return the binary representation of the
data that we saw last.  When greatest(x,y) comes back 'false' for a
MAX(), we then have to go check "well, does the type consider them
equal?", because, if the type considers them equal, we then have to
decide if we should replace x with y anyway, because it's different
at a binary level.  That's what we're saying we'll always do now.

We're also saying that we'll replace things based on plan differences
rather than based on if the rows underneath actually changed at all.
We could end up with material differences in the result of matviews
updated through incremental REFRESH and matviews updated through
actual incremental mainteance- and people may *care* about those
because we've told them (or they discover) they can depend on these
types of changes to be reflected in the result.

> > Trying to do this incremental-but-not-really maintenance where
> > the whole query is run but we try to skimp on what's actually
> > getting updated in the matview is a premature optimization, imv,
> > and one which may be less performant and more painful, with more
> > gotchas and challenges for our users, to deal with in the long
> > run.
>
> I have the evidence of a ten-fold performance improvement plus
> minimized WAL and replication work on my side.  What evidence do
> you have to back your assertions?  (Don't forget to work in bloat
> and vacuum truncation issues to the costs of your proposal.)

I don't doubt that there are cases in both directions and I'm not trying
to argue that it'd always be faster, but I doubt it's always slower.
I'm surprised that you had a case where the query was apparently quite
fast yet the data set hardly changed and resulted in a very large result
but I don't doubt that it happened.  What I was trying to get at is
really that the delete/insert approach would be good enough in very many
cases and it wouldn't have what look, to me anyway, as some pretty ugly
warts around these cases.
Thanks,        Stephen

Re: record identical operator

From

Kevin Grittner

Date:

24 September 2013, 15:19:12

Stephen Frost <sfrost@snowman.net> wrote:
> Kevin Grittner (kgrittn@ymail.com) wrote:
>> Stephen Frost <sfrost@snowman.net> wrote:
>> > I worry that adding these will come back to bite us later
>>
>> How?
>
> User misuse is certainly one consideration,

I think that one has been talked to death already, with the
consensus that we should use different operator names and document
them as a result.  Any comparison operator can be misused.  Andres
said he has seen the operators from text_pattern_ops used in
production; but we have not removed, for example, ~>=~ from our
text operators as a result.

> but I wonder what's going to
> happen if we change our internal representation of data (eg: numerics
> get changed again), or when incremental matview maintenance happens and
> we start looking at subsets of rows instead of the entire query.  Will
> the first update of a matview after a change to numeric's internal data
> structure cause the entire thing to be rewritten?

Something like that could happen, if the newly generated values,
for example, all took less space than the old.  On the first
REFRESH on the new version of the software it might delete and
insert all rows; then it would stay with the new representation.  I
have trouble seeing that as a problem, since presumably we only
come up with a new representation because it is smaller, faster, or
cures some bug.

>>> and that we're making promises we won't be able to keep.
>>
>> The promise that a concurrent refresh will produce the same set of
>> rows as a non-concurrent one?
>
> The promise that we'll always return the binary representation of the
> data that we saw last.  When greatest(x,y) comes back 'false' for a
> MAX(), we then have to go check "well, does the type consider them
> equal?", because, if the type considers them equal, we then have to
> decide if we should replace x with y anyway, because it's different
> at a binary level.  That's what we're saying we'll always do now.

I'm having a little trouble following that.  The query runs as it
always has, with all the old definitions of comparisons.  After it
is done, we check whether the rows are the same.  The operation of
MAX() will not be affected in any way.  If MAX() returns a value
which is not the same as what the matview has, the matview will be
modified to match what MAX() returned.

> We're also saying that we'll replace things based on plan differences
> rather than based on if the rows underneath actually changed at all.

Only if the user uses a query which does not produce deterministic
results.

> We could end up with material differences in the result of matviews
> updated through incremental REFRESH and matviews updated through
> actual incremental mainteance- and people may *care* about those
> because we've told them (or they discover) they can depend on these
> types of changes to be reflected in the result.

Perhaps we should document the recommendation that people not
create materialized views with non-deterministic results, but I
don't think that should be a hard restriction.  For example, I
could see someone creating a materialized view to pick a random
sample from a jury pool to be the on the jury panel for today (from
which juries are selected on that day), and not want that to be
predictable and not want it to change until the next refresh of the
panel matview.  (That would not be my first design choice, but it
would not be a horrid choice if there were sufficient logging of
the REFRESH statements in a place the user could not modify.)

>>> Trying to do this incremental-but-not-really maintenance where
>>> the whole query is run but we try to skimp on what's actually
>>> getting updated in the matview is a premature optimization, imv,
>>> and one which may be less performant and more painful, with more
>>> gotchas and challenges for our users, to deal with in the long
>>> run.
>>
>> I have the evidence of a ten-fold performance improvement plus
>> minimized WAL and replication work on my side.  What evidence do
>> you have to back your assertions?  (Don't forget to work in bloat
>> and vacuum truncation issues to the costs of your proposal.)
>
> I don't doubt that there are cases in both directions and I'm not trying
> to argue that it'd always be faster, but I doubt it's always slower.

Sure.  We provide a way to support those cases, although it
involves blocking.  So far, even the tests I expected to be faster
with heap replacement have come out marginally slower that way than
with CONCURRENTLY, due to index activity; but I have no doubt cases
can be constructed that go the other way.

> I'm surprised that you had a case where the query was apparently quite
> fast yet the data set hardly changed and resulted in a very large result
> but I don't doubt that it happened.

The history of all events in the county (tens of millions of
records in Milwaukee County) need to be scanned to generate the
status of cases and the date of last activity, as well as scanning
all future calendar events to see what is scheduled.  This is
compared to tables setting standards for how quickly different
types of cases should be handled and how long a case should go
without some activity.  The results are used to provide a dashboard
for judges to see where things are slipping, and drill down to
detail so they can get a stalled case moving again.  On any one day
a relatively small number of cases change status or cross time
boundaries which make them need attention.

I really look forward to the time when we can support these queries
with incremental maintenance in matviews.  I don't expect that to
be until maybe version 10.3, though.  Not only will it eliminate a
big batch job each night, but judges will be able to see their work
reflected in the dashboard on the same day.

> What I was trying to get at is really that the delete/insert
> approach would be good enough in very many cases and it wouldn't
> have what look, to me anyway, as some pretty ugly warts around
> these cases.

I guess we could add a "DELETE everything and INSERT the new
version of everything option for REFRESH in addition to what is
there now, but I would be very reluctant to use it as a
replacement.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 15:57:54

* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > The promise that we'll always return the binary representation of the
> > data that we saw last.  When greatest(x,y) comes back 'false' for a
> > MAX(), we then have to go check "well, does the type consider them
> > equal?", because, if the type considers them equal, we then have to
> > decide if we should replace x with y anyway, because it's different
> > at a binary level.  That's what we're saying we'll always do now.
>
> I'm having a little trouble following that. 

You're looking at it from the perspective of what's committed today, I
think.

> The query runs as it
> always has, with all the old definitions of comparisons.  After it
> is done, we check whether the rows are the same.  The operation of
> MAX() will not be affected in any way.  If MAX() returns a value
> which is not the same as what the matview has, the matview will be
> modified to match what MAX() returned.

I'm referring to a case where we're doing incremental maintenance of the
matview (in the far distant future..) and all we've got is the current
MAX value and the value from the row being updated.  We're going to
update that MAX on cases where the values are
"equal-but-binary-different".

> > We're also saying that we'll replace things based on plan differences
> > rather than based on if the rows underneath actually changed at all.
>
> Only if the user uses a query which does not produce deterministic
> results.

Which is apt to happen..

> > We could end up with material differences in the result of matviews
> > updated through incremental REFRESH and matviews updated through
> > actual incremental mainteance- and people may *care* about those
> > because we've told them (or they discover) they can depend on these
> > types of changes to be reflected in the result.
>
> Perhaps we should document the recommendation that people not
> create materialized views with non-deterministic results, but I
> don't think that should be a hard restriction. 

I wasn't suggesting a hard restriction, but I was postulating about how
the behavior may change in the future (or we may need to do very hacky
things to prevent it from channging) and how these decisions may impact
that.

> > What I was trying to get at is really that the delete/insert
> > approach would be good enough in very many cases and it wouldn't
> > have what look, to me anyway, as some pretty ugly warts around
> > these cases.
>
> I guess we could add a "DELETE everything and INSERT the new
> version of everything option for REFRESH in addition to what is
> there now, but I would be very reluctant to use it as a
> replacement.

It wouldn't address my concerns anyway, which are around these binary
operators and the update-in-place approach being risky and setting us up
for problems down the road.
Thanks,
    Stephen

Re: record identical operator

From

Robert Haas

Date:

24 September 2013, 16:44:35

On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote:
> It wouldn't address my concerns anyway, which are around these binary
> operators and the update-in-place approach being risky and setting us up
> for problems down the road.

I think that if you want to hold up a bug fix to a feature that's
already committed, you need to be more specific than to say that there
might be problems down the road.  I can't see that you've identified
any consequence of this change that is clearly bad.  I understand that
you don't like the idea of making binary-comparison operators
user-visible due to the risk of user-confusion, but we have lots of
confusing features already and we handle that by documenting them.
This one doesn't seem any worse than average; in fact, to me it seems
rather minor.

At any rate, I think we're getting distracted from the real point
here.  Here's what this patch is attempting to fix:

rhaas=# create table sales (txn_date date, amount numeric);
CREATE TABLE
rhaas=# insert into sales values ('2013-09-01', '100.00'),
('2013-09-01', '150.00'), ('2013-09-02', '75.0');
INSERT 0 3
rhaas=# create materialized view sales_summary as select txn_date,
sum(amount) as amount from sales group by 1;
SELECT 2
rhaas=# create unique index on sales_summary (txn_date);
CREATE INDEX
rhaas=# select * from sales_summary;          txn_date  | amount
------------+--------2013-09-01 | 250.002013-09-02 |   75.0
(2 rows)

rhaas=# update sales set amount = round(amount, 2);
UPDATE 3
rhaas=# refresh materialized view concurrently sales_summary;
REFRESH MATERIALIZED VIEW
rhaas=# select * from sales_summary; txn_date  | amount
------------+--------2013-09-01 | 250.002013-09-02 |   75.0
(2 rows)

rhaas=# refresh materialized view sales_summary;
REFRESH MATERIALIZED VIEW
rhaas=# select * from sales_summary; txn_date  | amount
------------+--------2013-09-01 | 250.002013-09-02 |  75.00
(2 rows)

Note that, after we change the data in the underlying table, the
output of the query changes.  But REFRESH MATERIALIZED VIEW
CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
without CONCURRENTLY does fix it.  That's a bug, because if there are
two ways of refreshing a materialized view they should both produce
the same answer.  Shouldn't they?  The fact that users can write
queries that don't always give the same answer is not a reason why
it's OK for REFRESH CONCURRENTLY to misbehave on queries that do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 17:02:12

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > It wouldn't address my concerns anyway, which are around these binary
> > operators and the update-in-place approach being risky and setting us up
> > for problems down the road.
>
> I think that if you want to hold up a bug fix to a feature that's
> already committed, you need to be more specific than to say that there
> might be problems down the road.

Committed isn't released and I simply can't agree that introducing new
operators is a 'bug fix'.  Had it gone out as-is, I can't see us
back-patching a bunch of new operators to fix it.

> Note that, after we change the data in the underlying table, the
> output of the query changes.  But REFRESH MATERIALIZED VIEW
> CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
> without CONCURRENTLY does fix it.  That's a bug, because if there are
> two ways of refreshing a materialized view they should both produce
> the same answer.  Shouldn't they?

The same queries run over time without changes to the underlying data
really should return the same data, shoudln't they?  Is it a bug that
they don't?

In general, I agree that they should produce the same results, as should
incrementally maintained views when they happen.  I'm not convinced that
choosing whatever the 'new' value is to represent the value in the
matview for the equal-but-not-binary-identical will always be the
correct answer though.

> The fact that users can write
> queries that don't always give the same answer is not a reason why
> it's OK for REFRESH CONCURRENTLY to misbehave on queries that do.

This really is, imv, agreeing to hold a higher standard for matviews
than we do for what matviews are *based* on- which is queries.
Thanks,
    Stephen

Re: record identical operator

From

Robert Haas

Date:

24 September 2013, 18:05:40

On Tue, Sep 24, 2013 at 1:04 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > It wouldn't address my concerns anyway, which are around these binary
>> > operators and the update-in-place approach being risky and setting us up
>> > for problems down the road.
>>
>> I think that if you want to hold up a bug fix to a feature that's
>> already committed, you need to be more specific than to say that there
>> might be problems down the road.
>
> Committed isn't released and I simply can't agree that introducing new
> operators is a 'bug fix'.  Had it gone out as-is, I can't see us
> back-patching a bunch of new operators to fix it.

You're perfectly welcome to argue that we should rip REFRESH
MATERIALIZED VIEW CONCURRENTLY back out, or change the implementation
of it.  However, that's not the topic of this thread.  Had it gone out
as is, we would have either had to come up with some more complex fix
that could make things right without catalog changes, or we would have
had to document that it doesn't work correctly.  Fortunately we are
not faced with that conundrum.

>> Note that, after we change the data in the underlying table, the
>> output of the query changes.  But REFRESH MATERIALIZED VIEW
>> CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
>> without CONCURRENTLY does fix it.  That's a bug, because if there are
>> two ways of refreshing a materialized view they should both produce
>> the same answer.  Shouldn't they?
>
> The same queries run over time without changes to the underlying data
> really should return the same data, shoudln't they?

Not necessarily.  There are and always have been plenty of cases where
that isn't true.

> Is it a bug that they don't?

No.

> In general, I agree that they should produce the same results, as should
> incrementally maintained views when they happen.  I'm not convinced that
> choosing whatever the 'new' value is to represent the value in the
> matview for the equal-but-not-binary-identical will always be the
> correct answer though.

Well, you have yet to provide an example of where it isn't the right
behavior.  I initially thought that perhaps we needed a type-specific
concept of exact equality, so that two things would be exactly equal
if they would both be dumped the same way by pg_dump, even if the
internal representations were different.

But on further thought I'm no longer convinced that's a good idea.
For example, consider the compact-numeric format that I introduced a
few releases back.  The changes are backward compatible, so you can
run pg_upgrade on a cluster containing the old format and everything
works just fine.  But your table will be larger than you really want
it to be until you rewrite it.  Any materialized view that was built
by selecting those numeric values will *also* be larger than you want
it to be until you rewrite it.  Fortunately, you can shrink the table
by using a rewriting variant of ALTER TABLE.  After you do that, you
will perhaps also want to shrink the materialized view.  And in fact,
REFRESH will do that for you, but currently, REFRESH CONCURRENTLY
won't.  It seems to me that it would be nicer if it did.

Now I admit that's an arguable point.  We could certainly define an
intermediate notion of equality that is more equal than whatever =
does, but not as equal as exact binary equality.  However, such a new
notion would have no precedence in our existing code, whereas the use
of binary equality does.  Going to a lot of work to build up this
intermediate notion of equality does not seem like a good idea to me;
I think a general rule of good programming is not to invent more ways
to do things than are clearly necessary, and requiring every core and
extension type to define an exact-equality operator just to support
this feature seems like excessive in the extreme.

Moreover, I think what to include in that intermediate notion of
equality would be kind of hard to decide, because there are a lot of
subtly different questions that one can ask, and we'd inevitably have
to answer the question "how equal does it have to be to be equal
enough?".   Numerics and arrays have multiple ways of referencing what
is intended to be the exact same value, but those can be disambiguated
by passing them to a function that cares, like pg_column_size().
Then, on a user-visible level, numerics allow variation in the number
of trailing zeroes after the decimal point.  Floats have extra digits
that aren't shown except with extra_float_digits, so that the value
can be less than totally equal even if the text representation is the
same.  citext intentionally ignores case.  In every one of those
cases, it's possible that the user of materialized views might say
"you know, if it's equal enough that the = operator says it's the
same, then I really don't mind if the materialized view maintenance
gets skipped".  But it's also possible that they might say, "you know,
those values are not really the same, and the materialized view really
ought to reflect the latest data".

I think the conservative (and therefore correct) approach is to decide
that we're always going to update if we detect any difference at all.
The only real argument for ignoring any differences is that it will
mean more changes to the materialized view.  And that is true enough,
but in practical scenarios such updates will be rare.  I would argue
that if someone goes through and add spends a lot of time adding or
removing trailing zeroes to the base tables on which a materialized
view is based, the likelihood is that those trailing zeros do matter
and the change ought to reflect through to the materialized views as
well.  It is not impossible that someone could upgrade to a version of
PG that supports a more compact or otherwise-enhanced version of some
data type, rewrites the table for some reason, and is then surprised
to see a change storm on the associated materialized view as well.
But I don't think it's very likely - we have no such changes planned.
And even if does happen at some point down the road, I can't see
justifying the effort of building a third notion of equality as effort
well-spent to avoid it, especially because it's equally possible that
the user *would* want the changes to propagate through to the
materialized view.

>> The fact that users can write
>> queries that don't always give the same answer is not a reason why
>> it's OK for REFRESH CONCURRENTLY to misbehave on queries that do.
>
> This really is, imv, agreeing to hold a higher standard for matviews
> than we do for what matviews are *based* on- which is queries.

I don't think it is, and your explanation of why you think it is does
not make any sense to me.  If a view produced an answer that the
underlying query *could not have produced* then we would clearly
consider that a bug.  But, you're arguing that a materialized view on
which REFRESH CONCURRENTLY should be allowed to produce just such
answers.

It is obviously true, and unavoidable, that if the query can produce
more than one result set depending on the query plan or other factors,
then the materialized view contents can match only one of those
possible result sets.  But you are arguing that it should be allowed
to produce some OTHER result set, that a direct execution of the query
could *never* have produced, and I can't see how that can be right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Stephen Frost

Date:

24 September 2013, 19:20:09

* Robert Haas (robertmhaas@gmail.com) wrote:
> Now I admit that's an arguable point.  We could certainly define an
> intermediate notion of equality that is more equal than whatever =
> does, but not as equal as exact binary equality.

I suggested it up-thread and don't recall seeing a response, so here it
is again- passing the data through the binary-out function for the type
and comparing *that* would allow us to change the interal binary
representation of data types and would be something which we could at
least explain to users as to why X isn't the same as Y according to this
binary operator.

> I think the conservative (and therefore correct) approach is to decide
> that we're always going to update if we detect any difference at all.

And there may be users who are surprised that a refresh changed parts of
the table that have nothing to do with what was changed in the
underlying relation, because a different plan was used and the result
ended up being binary-different.  It's easy to explain to users why that
would be when we're doing a wholesale replace but I don't think it'll be
nearly as clear why that happened when we're not replacing the whole
table and why REFRESH can basically end up changing anything (but
doesn't consistently).  If we're paying attention to the records changed
and only updating the matview's records when they're involved, that
becomes pretty clear.  What's happening here feels very much like
unintended consequences.

> It is obviously true, and unavoidable, that if the query can produce
> more than one result set depending on the query plan or other factors,
> then the materialized view contents can match only one of those
> possible result sets.  But you are arguing that it should be allowed
> to produce some OTHER result set, that a direct execution of the query
> could *never* have produced, and I can't see how that can be right.

I agree that the matview shouldn't produce things which *can't* exist
through an execution of the query.  I don't intend to argue against that
but I realize that's a fallout of not accepting the patch to implement
the binary comparison operators.  My complaint is more generally that if
this approach needs such then there's going to be problems down the
road.  No, I can't predict exactly what they are and perhaps I'm all wet
here, but this kind of binary-equality operations are something I've
tried to avoid since discovering what happens when you try to compare
a couple of floating point numbers.
Thanks,
    Stephen

Re: record identical operator

From

Kevin Grittner

Date:

24 September 2013, 22:46:46

Stephen Frost <sfrost@snowman.net> wrote:

>> I think the conservative (and therefore correct) approach is to
>> decide that we're always going to update if we detect any
>> difference at all.
>
> And there may be users who are surprised that a refresh changed
> parts of the table that have nothing to do with what was changed
> in the underlying relation, because a different plan was used and
> the result ended up being binary-different.

Binary different for "equal" values could include box (or other
geometry) objects moved to completely new locations and/or not
quite the same size.

Here is v2 of the patch which changes from the universally disliked
operator names v1 used.  It also fixes bugs in the row comparisons
for pass-by-reference types, fixes a couple nearby comments, and
adds regression tests for a matview containing a box column.

The box type is interesting in that its = operator only worries
about the area of the box, and considers two boxes with no more
than EPSILON difference to be equal.  This means that boxes at
totally different locations can be equal, and that if A = B and B =
C it is not necessarily true that A = C.  This doesn't cause too
much trouble in general because boxes don't have btree opclasses.

Since there are types, including core types, without a default
btree opclass, materialized views containing them cannot use RMVC
as currently committed.  This patch would fix that, or we could rip
out the current implementation and go to the "delete everything and
insert the entire new matview contents" approach.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

record-identical-v2.patch

Re: record identical operator

From

Merlin Moncure

Date:

25 September 2013, 02:08:40

On Tue, Sep 24, 2013 at 2:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Now I admit that's an arguable point.  We could certainly define an
>> intermediate notion of equality that is more equal than whatever =
>> does, but not as equal as exact binary equality.
>
> I suggested it up-thread and don't recall seeing a response, so here it
> is again- passing the data through the binary-out function for the type
> and comparing *that* would allow us to change the interal binary
> representation of data types and would be something which we could at
> least explain to users as to why X isn't the same as Y according to this
> binary operator.
>
>> I think the conservative (and therefore correct) approach is to decide
>> that we're always going to update if we detect any difference at all.
>
> And there may be users who are surprised that a refresh changed parts of
> the table that have nothing to do with what was changed in the
> underlying relation, because a different plan was used and the result
> ended up being binary-different.  It's easy to explain to users why that
> would be when we're doing a wholesale replace but I don't think it'll be
> nearly as clear why that happened when we're not replacing the whole
> table and why REFRESH can basically end up changing anything (but
> doesn't consistently).  If we're paying attention to the records changed
> and only updating the matview's records when they're involved, that
> becomes pretty clear.  What's happening here feels very much like
> unintended consequences.

FWIW you make some interesting points (I did a triple take on the plan
dependent changes) but I'm 100% ok with the proposed behavior.
Matviews satisfy 'truth' as *defined by the underlying query only*.
This is key: there may be N candidate 'truths' for that query: it's
not IMNSHO reasonable to expect the post-refresh truth to be
approximately based in the pre-refresh truth.  Even if the
implementation happened to do what you're asking  for it would only be
demonstrating undefined but superficially useful behavior...a good
analogy would be the old scan behavior where an unordered scan would
come up in 'last update order'.  That (again, superficially useful)
behavior was undefined and we reserved the right to change it.  And we
did.  Unnecessarily defined behaviors defeat future performance
optimizations.

So Kevin's patch AIUI defines a hitherto non-user accessible (except
in the very special case of row-wise comparison) mechanic to try and
cut down the number of rows that *must* be refreshed.  It may or may
not do a good job at that on a situational basis -- if it was always
better we'd probably be overriding the default behavior.  I don't
think it's astonishing at all for matview to pseudo-randomly adjust
case over a citext column; that's just part of the deal with equality
ambiguous types.  As long as the matview doesn't expose a dataset that
was impossible to have been generated by the underlying query, I'm
good.

merlin

Re: record identical operator

From

Kevin Grittner

Date:

25 September 2013, 15:40:45

Kevin Grittner <kgrittn@ymail.com> wrote:

> Here is v2 of the patch which changes from the universally
> disliked operator names v1 used.  It also fixes bugs in the row
> comparisons for pass-by-reference types, fixes a couple nearby
> comments, and adds regression tests for a matview containing a
> box column.

I accidentally omitted the regression tests for matview with a box
column from v2.  Rather than redoing the whole thing, here is a v2a
patch to add just that omitted part.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

record-identical-v2a.patch

Re: record identical operator

From

Robert Haas

Date:

26 September 2013, 15:18:30

On Tue, Sep 24, 2013 at 3:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Now I admit that's an arguable point.  We could certainly define an
>> intermediate notion of equality that is more equal than whatever =
>> does, but not as equal as exact binary equality.
>
> I suggested it up-thread and don't recall seeing a response, so here it
> is again- passing the data through the binary-out function for the type
> and comparing *that* would allow us to change the interal binary
> representation of data types and would be something which we could at
> least explain to users as to why X isn't the same as Y according to this
> binary operator.

Sorry, I missed that suggestion.

I'm wary of inventing a completely new way of doing this.  I don't
think that there's any guarantee that the send/recv functions won't
expose exactly the same implementation details as a direct check for
binary equality.  For example, array_send() seems to directly reveal
the presence or absence of a NULL bitmap.  Even if there were no such
anomalies today, it feels fragile to rely on a fairly-unrelated
concept to have exactly the semantics we want here, and it will surely
be much slower.  Binary equality has existing precedent and is used in
numerous places in the code for good reason.  Users might be confused
about the use of those semantics in those places also, but AFAICT
nobody is.

I think the best thing I can say about this proposal is that it would
clearly be better than what we're doing now, which is just plain
wrong.  I don't think it's the best proposal, however.

>> It is obviously true, and unavoidable, that if the query can produce
>> more than one result set depending on the query plan or other factors,
>> then the materialized view contents can match only one of those
>> possible result sets.  But you are arguing that it should be allowed
>> to produce some OTHER result set, that a direct execution of the query
>> could *never* have produced, and I can't see how that can be right.
>
> I agree that the matview shouldn't produce things which *can't* exist
> through an execution of the query.  I don't intend to argue against that
> but I realize that's a fallout of not accepting the patch to implement
> the binary comparison operators.  My complaint is more generally that if
> this approach needs such then there's going to be problems down the
> road.  No, I can't predict exactly what they are and perhaps I'm all wet
> here, but this kind of binary-equality operations are something I've
> tried to avoid since discovering what happens when you try to compare
> a couple of floating point numbers.

So, I get that.

But what I think is that the problem that's coming up here is almost
the flip side of that.  If you are working with types that are a
little bit imprecise, such as floats or citext or box, you want to use
comparison strategies that have a little bit of tolerance for error.
In the case of box, this is actually built into the comparison
operator.  In the case of floats, it's not; you as the application
programmer have to deal with the fact that comparisons are imprecise -
like by avoiding equality comparisons.

On the other hand, if you are *replicating* those data types, then you
don't want that tolerance.  If you're checking whether two boxes are
equal, you may indeed want the small amount of fuzziness that our
comparison operators allow.  But if you're copying a box or a float
from one table to another, or from one database to another, you want
the values copied exactly, including all of those low-order bits that
tend to foul up your comparisons.  That's why float8out() normally
doesn't display any extra_float_digits - because you as the user
shouldn't be relying on them - but pg_dump does back them up because
not doing so would allow errors to propagate.  Similarly here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Bruce Momjian

Date:

26 September 2013, 21:50:14

On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
> On 09/12/2013 06:27 PM, Kevin Grittner wrote:
> >Attached is a patch for a bit of infrastructure I believe to be
> >necessary for correct behavior of REFRESH MATERIALIZED VIEW
> >CONCURRENTLY as well as incremental maintenance of matviews.
> 
> Here is attempt at a review of the v1 patch.
> 
> There has been a heated discussion on list about if we even want
> this type of operator. I will try to summarize it here
> 
> The arguments against it are
> * that a record_image_identical call on two records that returns
> false can still return true under the equality operator, and the
> records might (or might not) appear to be the same to users
> * Once we expose this as an operator via SQL someone will find it
> and use it and then complain when we change things such as the
> internal representation of a datatype which might
>    break there queries
> * The differences between = and record_image_identical might not be
> understood by everywhere who finds the operator leading to confusion
> * Using a criteria other than equality (the = operator provided by
> the datatype) might cause problems if we later provide 'on change'
> triggers for materialized views
> 
> The arguments for this patch are
> * We want the materialized view to return the same value as would be
> returned if the query were executed directly.  This not only means
> that the values should be the same according to a datatypes =
> operator but that they should appear the same 'to the eyeball'.
> * Supporting the materialized view refresh check via SQL makes a lot
> of sense but doing that requires exposing something via SQL
> * If we adequately document what we mean by record_image_identical
> and the operator we pick for this then users shouldn't be surprised
> at what they get if they use this

This is a good summary.  I think there are a few things that make this
issue difficult to decide:

1.  We have to use an operator to give the RMVC (REFRESH MATERIALIZED
VIEW CONCURRENTLY) SPI query the ability to optimize this query.   If we
could do this with an SQL or C function, there would be less concern
about the confusion caused by adding this capability.

Question:  If we are comparing based on some primary key, why do we need
this to optimize?  Certainly the primary key index will be used, no?

2.  Agregates are already non-deterministic, so some feel that adding
this feature doesn't improve much.  The counter-argument is that without
the binary row comparison ability, rows could be returned that could
_never_ have been produced by the base data, which is more of a user
surprise than non-deterministic rows.

3.  Our type casting and operators are already complex, and adding
another set of operators only compounds that.

4.  There are legitimate cases where tool makers might want the ability
to compare rows binarily, e.g. for replication, and adding these
operators would help with that.

I think we need to see a patch from Kevin that shows all the row
comparisons documented and we can see the impact of the user-visibile
part.

One interesting approach would be to only allow the operator to be
called from SPI queries.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: record identical operator - Review

From

Bruce Momjian

Date:

27 September 2013, 00:27:45

On Thu, Sep 26, 2013 at 05:50:08PM -0400, Bruce Momjian wrote:
> I think we need to see a patch from Kevin that shows all the row
> comparisons documented and we can see the impact of the user-visibile
> part.
> 
> One interesting approach would be to only allow the operator to be
> called from SPI queries.

It would also be good to know about similar non-default entries in
pg_opclass so we can understand the expected impact.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: record identical operator - Review

From

Kevin Grittner

Date:

27 September 2013, 22:34:29

Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
>> On 09/12/2013 06:27 PM, Kevin Grittner wrote:
>>> Attached is a patch for a bit of infrastructure I believe to be
>>> necessary for correct behavior of REFRESH MATERIALIZED VIEW
>>> CONCURRENTLY as well as incremental maintenance of matviews.
>>
>> Here is attempt at a review of the v1 patch.
>>
>> There has been a heated discussion on list about if we even want
>> this type of operator. I will try to summarize it here
>>
>> The arguments against it are
>> * that a record_image_identical call on two records that returns
>> false can still return true under the equality operator, and the
>> records might (or might not) appear to be the same to users
>> * Once we expose this as an operator via SQL someone will find
>> it and use it and then complain when we change things such as
>> the internal representation of a datatype which might break
>> there queries

I don't see where that is possible unless they count on the order
of records sorted with the new operators, which would not be
something I recommend.  Changing the internal representation cannot
break simple use of the alternative equality definition as long as
all you cared about was whether the binary storage format of the
values was identical.

>> * The differences between = and record_image_identical might not
>> be understood by everywhere who finds the operator leading to
>> confusion

Either they find it and read the code, or we document it and they
read the docs.

>> * Using a criteria other than equality (the = operator provided
>> by the datatype) might cause problems if we later provide 'on
>> change' triggers for materialized views

I would fight user-defined triggers for matviews tooth and nail.
We need to get incremental maintenance working instead.

>> The arguments for this patch are
>> * We want the materialized view to return the same value as
>> would be returned if the query were executed directly.  This not
>> only means that the values should be the same according to a
>> datatypes = operator but that they should appear the same 'to
>> the eyeball'.

And to functions the user can run against the values.  The example
with the null bitmap for arrays being included when not necessary
is something that isn't directly apparent to the eye, but queries
which use pg_column_size would not get equal results.

>> * Supporting the materialized view refresh check via SQL makes a
>> lot of sense but doing that requires exposing something via SQL
>> * If we adequately document what we mean by
>> record_image_identical and the operator we pick for this then
>> users shouldn't be surprised at what they get if they use this

We first need to document the existing record comparison operators.
If they read the docs for comparing "row_constructors" and expect
that to be the behavior they get when they compare records, they
will be surprised.

> This is a good summary.  I think there are a few things that make
> this issue difficult to decide:
>
> 1.  We have to use an operator to give the RMVC (REFRESH
> MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize
> this query.  If we could do this with an SQL or C function, there
> would be less concern about the confusion caused by adding this
> capability.

 (a)  We can't.

 (b)  Why would that be less confusing?

> Question:  If we are comparing based on some primary key, why do
> we need this to optimize?

Because comparing primary keys doesn't tell us whether the old and
new values in the row all match.

> Certainly the primary key index will be used, no?

It currently uses all columns which are part of unique indexes on
the matview which are not partial (i.e., they do not use a WHERE
clause), they index only on columns (not expressions).  It is not
possible to define a primary key on a matview.  There are two
reasons for using these columns:

(a)  It provides a correctness guarantee against having duplicated
rows which have no nulls.  This guarantee is what allows us to use
the differential technique which is faster than retail DELETE and
re-INSERT of everything.

(b)  Since we don't support hash joins of records, and it would
tend to be pretty slow if we did, it allows faster hash joins on
one or more columns which stand a good chance of doing this
efficiently.

> 2.  Agregates are already non-deterministic,

Only some of them, and only with certain source data.

> so some feel that adding this feature doesn't improve much.

It allows a materialized view which contains a column of a type
without a default btree opclass to *use* concurrent refresh, it
makes queries with deterministic results always generate matview
results with exactly match the results of a VIEW using the query if
it were run at the same moment, and for nondeterministic queries,
it provides results consistent with *some* result set which could
have been returned by a view at the same time.  Unless we do
something, none of these things are true.  That seems like much to
me.

> The counter-argument is that without the binary row comparison
> ability, rows could be returned that could _never_ have been
> produced by the base data, which is more of a user surprise than
> non-deterministic rows.

Yes, having the matview represent results from from running the
query at *some* point in time is *one* of the benefits.

> 3.  Our type casting and operators are already complex, and
> adding another set of operators only compounds that.

It cannot have any effect on any of the existing operators, so I'm
not sure whether you are referring to the extra operators and
functions, or something else.  It does not, for example, introduce
any risk of "ambiguous operators".

> 4.  There are legitimate cases where tool makers might want the
> ability to compare rows binarily, e.g. for replication, and
> adding these operators would help with that.
>
> I think we need to see a patch from Kevin that shows all the row
> comparisons documented and we can see the impact of the
> user-visibile part.

I'm inclined to first submit a proposed documentation patch for the
existing record operators, and see if we can make everyone happy
with that, and *then* see about adding documentation for the new
ones.  Trying to deal with both at once is likely to increase
confusion and wheel-spinning.

> One interesting approach would be to only allow the operator to
> be called from SPI queries.

Why would that be a good idea?

> It would also be good to know about similar non-default entries
> in pg_opclass so we can understand the expected impact.

A quick query (lacking schema information and schema qualification)
shows what is there by default:

select amname, typname, opcname
  from pg_opclass c
  join pg_am a on a.oid = c.opcmethod
  join pg_type t on t.oid = c.opcintype
  where not opcdefault
  order by 1, 2, 3;

 amname | typname |       opcname      
--------+---------+---------------------
 btree  | bpchar  | bpchar_pattern_ops
 btree  | inet    | cidr_ops
 btree  | record  | record_image_ops
 btree  | text    | text_pattern_ops
 btree  | text    | varchar_ops
 btree  | text    | varchar_pattern_ops
 hash   | bpchar  | bpchar_pattern_ops
 hash   | inet    | cidr_ops
 hash   | text    | text_pattern_ops
 hash   | text    | varchar_ops
 hash   | text    | varchar_pattern_ops
 spgist | point   | kd_point_ops
(12 rows)

The text_pattern_ops opclass, for example adds memcmp based
comparisons for text strings, that differ from what the default
comparisons regarding greater or lesser values:

test=# select 'a' < E'\u02E5';
 ?column?
----------
 f
(1 row)

test=# select 'a' ~<~ E'\u02E5';
 ?column?
----------
 t
(1 row)

Then there are types with no default btree opclass, so "normal"
equality tests are not defined.  When I run a simple query for such
types I get hundreds, although some of those may not be important
for this discussion.  Let's pick one, just as an example: box.

test=# select '(2,2),(1,1)'::box = '(12,82),(12.1,92)'::box;
 ?column?
----------
 t
(1 row)

test=# \set A '''(1.9999996,1.9999996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.0000004,2.0000004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | f
(1 row)

The = operator only tells you whether the boxes are approximately
the same size.  There is an operator which checks for "same as?"
based on all corners matching, but it still allows EPSILON
differences:

test=# select '(2,2),(1,1)'::box ~= '(12,82),(12.1,92)'::box;
 ?column?
----------
 f
(1 row)

test=# select :A ~= :B, :B ~= :C, :A ~= :C;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | t
(1 row)

These infelicities are not too painful if they are not incorporated
into an opclass, however.  But lack of a btree opclass for the type
means you get this sort of problem with current "record =":

test=# select (row ('(2,2),(1,1)'::box))::record = (row ('(2,2),(1,1)'::box))::record;
ERROR:  could not identify an equality operator for type box

The new operator allows record comparisons when the record contains
such types:

test=# select (row ('(2,2),(1,1)'::box))::record *= (row ('(2,2),(1,1)'::box))::record;
 ?column?
----------
 t
(1 row)

... and would notice if you changed between "equal" values:

test=# select (row (:A))::record *= (row (:B))::record,
test-#        (row (:B))::record *= (row (:C))::record,
test-#        (row (:A))::record *= (row (:C))::record;
 ?column? | ?column? | ?column?
----------+----------+----------
 f        | f        | f
(1 row)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Kevin Grittner

Date:

28 September 2013, 19:03:32

Kevin Grittner <kgrittn@ymail.com> wrote:
> Bruce Momjian <bruce@momjian.us> wrote:

>> I think we need to see a patch from Kevin that shows all the row
>> comparisons documented and we can see the impact of the
>> user-visibile part.

First draft attached.

> I'm inclined to first submit a proposed documentation patch for the
> existing record operators, and see if we can make everyone happy
> with that, and *then* see about adding documentation for the new
> ones.  Trying to deal with both at once is likely to increase
> confusion and wheel-spinning.

When I took a stab at it, the new operators only seemed to merit a
single paragraph, so that is included after all.
 
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

record-identical-v2b.patch

Re: record identical operator - Review

From

Steve Singer

Date:

29 September 2013, 17:46:54

On 09/28/2013 03:03 PM, Kevin Grittner wrote:
> <para>
> +    To support matching of rows which include elements without a default
> +    B-tree operator class, the following operators are defined for composite
> +    type comparison:
> +    <literal>*=</>,
> +    <literal>*<></>,
> +    <literal>*<</>,
> +    <literal>*<=</>,
> +    <literal>*></>, and
> +    <literal>*>=</>.
> +    These operators are also used internally to maintain materialized views,
> +    and may be useful to replication software.  To ensure that all user
> +    visible changes are detected, even when the equality operator for the
> +    type treats old and new values as equal, the byte images of the stored
> +    data are compared.  While ordering is deterministic, it is not generally
> +    useful except to facilitate merge joins.  Ordering may differ between
> +    system architectures and major releases of
> +    <productname>PostgreSQL</productname>.
> +   </para>

How about
 To support matching of rows which include elements without a default    B-tree operator class, the following operators
aredefined for composite    type comparison:    <literal>*=</>,    <literal>*<></>,    <literal>*<</>,
<literal>*<=</>,   <literal>*></>, and   <literal>*>=</>.

These operators compare the internal binary representation of the two 
rows.   Two rows might have a different binary representation even 
though comparisons of the two rows with the equality operator is true.   
The ordering of rows under these comparision operators is deterministic 
but not otherwise meaningful.  These operators are used internally for 
materialized views and might be useful for other specialized purposes 
such as replication but are not intended to be generally useful for 
writing queries.

Re: record identical operator - Review

From

Kevin Grittner

Date:

30 September 2013, 13:08:26

Steve Singer <steve@ssinger.info> wrote:

> How about
>
>   To support matching of rows which include elements without a default
>     B-tree operator class, the following operators are defined for composite
>     type comparison:
>     <literal>*=</>,
>     <literal>*<></>,
>     <literal>*<</>,
>     <literal>*<=</>,
>     <literal>*></>, and
>     <literal>*>=</>.
>
> These operators compare the internal binary representation of the two
> rows.  Two rows might have a different binary representation even
> though comparisons of the two rows with the equality operator is true.
> The ordering of rows under these comparision operators is deterministic
> but not otherwise meaningful.  These operators are used internally for
> materialized views and might be useful for other specialized purposes
> such as replication but are not intended to be generally useful for
> writing queries.

I agree that's an improvement.  Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Bruce Momjian

Date:

01 October 2013, 02:46:28

On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote:
> >> The arguments for this patch are
> >> * We want the materialized view to return the same value as
> >> would be returned if the query were executed directly.  This not
> >> only means that the values should be the same according to a
> >> datatypes = operator but that they should appear the same 'to
> >> the eyeball'.
> 
> And to functions the user can run against the values.  The example
> with the null bitmap for arrays being included when not necessary
> is something that isn't directly apparent to the eye, but queries
> which use pg_column_size would not get equal results.

pg_column_size() is by definition internal details, so I am not worried
about that not matching.

> >> * Supporting the materialized view refresh check via SQL makes a
> >> lot of sense but doing that requires exposing something via SQL
> >> * If we adequately document what we mean by
> >> record_image_identical and the operator we pick for this then
> >> users shouldn't be surprised at what they get if they use this
> 
> We first need to document the existing record comparison operators.
> If they read the docs for comparing "row_constructors" and expect
> that to be the behavior they get when they compare records, they
> will be surprised.

Well, if they appear in \do, I am thinking they should be documented.

> > This is a good summary.  I think there are a few things that make
> > this issue difficult to decide:
> >
> > 1.  We have to use an operator to give the RMVC (REFRESH
> > MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize
> > this query.  If we could do this with an SQL or C function, there
> > would be less concern about the confusion caused by adding this
> > capability.
> 
>  (a)  We can't.
> 
>  (b)  Why would that be less confusing?

Because function names, especially long ones, are more clearly
self-documenting than operators.

> > Question:  If we are comparing based on some primary key, why do
> > we need this to optimize?
> 
> Because comparing primary keys doesn't tell us whether the old and
> new values in the row all match.

OK, but my question was about why we need a full set of operators rather
than just equal, and maybe not equal.  I thought you said we needed
others, e.g. >, so we could do merge joins, but I thought we would just
be doing comparisons after primary keys are joined, and if that is true,
we could just use a function.

Actually, I am now realizing you have to use the non-binary-level equals
comparison on keys, then the binary-level equals on rows for this to
work --- that's pretty confusing.  Is that true?

> > 3.  Our type casting and operators are already complex, and
> > adding another set of operators only compounds that.
> 
> It cannot have any effect on any of the existing operators, so I'm
> not sure whether you are referring to the extra operators and
> functions, or something else.  It does not, for example, introduce
> any risk of "ambiguous operators".

It makes our system more complex for the user to understand.

> > One interesting approach would be to only allow the operator to
> > be called from SPI queries.
> 
> Why would that be a good idea?

Because then it would be more of an "internal" operator.

> > It would also be good to know about similar non-default entries
> > in pg_opclass so we can understand the expected impact.
> 
> A quick query (lacking schema information and schema qualification)
> shows what is there by default:

OK, the unique list is:
       opcname--------------------- varchar_ops kd_point_ops cidr_ops text_pattern_ops varchar_pattern_ops
bpchar_pattern_ops(6rows)
 

Do these all have operators defined too?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: record identical operator - Review

From

Steve Singer

Date:

03 October 2013, 13:59:20

On 09/30/2013 09:08 AM, Kevin Grittner wrote:
> Steve Singer <steve@ssinger.info> wrote:
>
>> How about
>>
>>     To support matching of rows which include elements without a default
>>       B-tree operator class, the following operators are defined for composite
>>       type comparison:
>>       <literal>*=</>,
>>       <literal>*<></>,
>>       <literal>*<</>,
>>       <literal>*<=</>,
>>       <literal>*></>, and
>>       <literal>*>=</>.
>>
>> These operators compare the internal binary representation of the two
>> rows.  Two rows might have a different binary representation even
>> though comparisons of the two rows with the equality operator is true.
>> The ordering of rows under these comparision operators is deterministic
>> but not otherwise meaningful.  These operators are used internally for
>> materialized views and might be useful for other specialized purposes
>> such as replication but are not intended to be generally useful for
>> writing queries.
> I agree that's an improvement.  Thanks!

Are there any outstanding issues on this patch preventing it from being 
committed?
I think we have discussed this patch enough such that we now have 
consensus on proceeding with adding a record identical operator to SQL.
No one has objected to the latest names of the operators.

You haven't adjusted the patch to reduce the duplication between the 
equality and comparison functions, if you disagree with me and feel that 
doing so would increase the code complexity and be inconsistent with how 
we do things elsewhere that is fine.

Steve

> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>

Re: record identical operator - Review

From

Bruce Momjian

Date:

03 October 2013, 15:04:37

On Thu, Oct  3, 2013 at 09:59:03AM -0400, Steve Singer wrote:
> Are there any outstanding issues on this patch preventing it from
> being committed?

I have not received answers to my email of October 1:
http://www.postgresql.org/message-id/20131001024620.GB13385@momjian.us

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: record identical operator

From

Stephen Frost

Date:

03 October 2013, 15:09:40

* Robert Haas (robertmhaas@gmail.com) wrote:
> I'm wary of inventing a completely new way of doing this.  I don't
> think that there's any guarantee that the send/recv functions won't
> expose exactly the same implementation details as a direct check for
> binary equality.

I don't follow this thought.  Changing the binary representation which
is returned when users use the binary-mode protocol would be a pretty
massive user-impacting change, while adding a new way of storing NUMERIC
internally wouldn't even be visible to end users.

> For example, array_send() seems to directly reveal
> the presence or absence of a NULL bitmap.

That's part of the definition of what the binary protocol for array *is*
though, so that's simply a fact of life for our users.  That doesn't
mean we can't, say, change the array header to remove the internal type
OID and use a mapping from the type OID that's on the tuple to the type
OID inside the array- as long as array_send() still produces the same
binary structure for the end user.

> Even if there were no such
> anomalies today, it feels fragile to rely on a fairly-unrelated
> concept to have exactly the semantics we want here, and it will surely
> be much slower.

I agree that it would be slower but performance should be a
consideration once correctness is accomplished and this distinction
feels a great deal more "correct", imv.

> Binary equality has existing precedent and is used in
> numerous places in the code for good reason.  Users might be confused
> about the use of those semantics in those places also, but AFAICT
> nobody is.

You've stated that a few times and I've simply not had time to run down
the validity of it- so, where does internal-to-PG binary equality end up
being visible to our users?  Independent of that, are there places in
the backend which could actually be refactored to use these new
operators where it would reduce code complexity?

> On the other hand, if you are *replicating* those data types, then you
> don't want that tolerance.  If you're checking whether two boxes are
> equal, you may indeed want the small amount of fuzziness that our
> comparison operators allow.  But if you're copying a box or a float
> from one table to another, or from one database to another, you want
> the values copied exactly, including all of those low-order bits that
> tend to foul up your comparisons.  That's why float8out() normally
> doesn't display any extra_float_digits - because you as the user
> shouldn't be relying on them - but pg_dump does back them up because
> not doing so would allow errors to propagate.  Similarly here.

I agree that we should be copying the values exactly- and I think we're
already good there when it comes to doing a *copy*.  I further agree
that updating the matview should be a copy, but the manner in which
we're doing that is using an equality check to see if the value needs to
be updated or not which is where things get a bit fuzzy.  If we were
consistently copying and updating the value based on some external
knowledge that the value has changed (similar to how slony works w/
triggers that dump change sets into a table- it doesn't consider "has
any value on this row changed?"; the user did an update, presumably for
some purpose, therefore the change gets recorded and propagated), I'd be
perfectly happy.
Thanks,
    Stephen

Re: record identical operator - Review

From

Stephen Frost

Date:

03 October 2013, 15:11:02

Steve,

Thanks for following-up on this; I had meant to reply much sooner but
other things got in the way.
Thanks again,
    Stephen

* Steve Singer (steve@ssinger.info) wrote:
> Are there any outstanding issues on this patch preventing it from
> being committed?
> I think we have discussed this patch enough such that we now have
> consensus on proceeding with adding a record identical operator to
> SQL.
> No one has objected to the latest names of the operators.
>
> You haven't adjusted the patch to reduce the duplication between the
> equality and comparison functions, if you disagree with me and feel
> that doing so would increase the code complexity and be inconsistent
> with how we do things elsewhere that is fine.

Re: record identical operator - Review

From

Kevin Grittner

Date:

03 October 2013, 17:46:13

Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote:

>> We first need to document the existing record comparison operators.
>> If they read the docs for comparing "row_constructors" and expect
>> that to be the behavior they get when they compare records, they
>> will be surprised.
>
> Well, if they appear in \do, I am thinking they should be documented.

This patch now covers the ones which are record comparison
operators, old and new.  Feel free to document the others if you
feel strongly on that point; but I don't feel that becomes the
business of this patch.

>> Because comparing primary keys doesn't tell us whether the old and
>> new values in the row all match.
>
> OK, but my question was about why we need a full set of operators rather
> than just equal, and maybe not equal.  I thought you said we needed
> others, e.g. >, so we could do merge joins, but I thought we would just
> be doing comparisons after primary keys are joined, and if that is true,
> we could just use a function.

http://www.postgresql.org/docs/current/interactive/xoper-optimization.html#AEN54334

> Actually, I am now realizing you have to use the non-binary-level equals
> comparison on keys, then the binary-level equals on rows for this to
> work --- that's pretty confusing.  Is that true?

It's a matter of replacing the = operator for record comparisons in
these two places with the new *= operator.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l553

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l684

(With or without the new operator, the second of these needs to be
schema-qualified to avoid trouble if the user adds a different
implementation of = ahead of the pg_catalog implementation on
search_path, as users can do.)  The difference between = and *=
would not generally be visible to end users -- just to those
working on matview.c.

>> A quick query (lacking schema information and schema qualification)
>> shows what is there by default:
>
> OK, the unique list is:
>
>           opcname
>     ---------------------
>     varchar_ops
>     kd_point_ops
>     cidr_ops
>     text_pattern_ops
>     varchar_pattern_ops
>     bpchar_pattern_ops
>     (6 rows)
>
> Do these all have operators defined too?

Every operator class is associated with operators.  For example,
while text_pattern_ops uses the same = and <> operators as the
default text opclass (because that already uses memcmp), it adds
~>~, ~>=~, ~<~, and ~<=~ operators which also use memcmp (ignoring
character encoding and collation).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator - Review

From

Kevin Grittner

Date:

03 October 2013, 17:50:50

Steve Singer <steve@ssinger.info> wrote:

> You haven't adjusted the patch to reduce the duplication between the
> equality and comparison functions, if you disagree with me and feel that
> doing so would increase the code complexity and be inconsistent with how
> we do things elsewhere that is fine.

I think the main reason to keep them separate is that it makes it
easier to compare record_cmp to record_image_cmp and record_eq to
record_image_eq to see what the differences and similarities are.
Other reasons are that I think all those conditionals inside a
combined function would get messy and make the logic harder to
understand.  The number of places that would need conditionals,
plus new wrappers that would be needed, would mean that the net
reduction in lines of code would be minimal.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Robert Haas

Date:

03 October 2013, 17:53:55

On Thu, Oct 3, 2013 at 11:12 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> Binary equality has existing precedent and is used in
>> numerous places in the code for good reason.  Users might be confused
>> about the use of those semantics in those places also, but AFAICT
>> nobody is.
>
> You've stated that a few times and I've simply not had time to run down
> the validity of it- so, where does internal-to-PG binary equality end up
> being visible to our users?  Independent of that, are there places in
> the backend which could actually be refactored to use these new
> operators where it would reduce code complexity?

Well, the most obvious example is HOT.  README.HOT sayeth:

% The requirement for doing a HOT update is that none of the indexed
% columns are changed.  This is checked at execution time by comparing the
% binary representation of the old and new values.  We insist on bitwise
% equality rather than using datatype-specific equality routines.  The
% main reason to avoid the latter is that there might be multiple notions
% of equality for a datatype, and we don't know exactly which one is
% relevant for the indexes at hand.  We assume that bitwise equality
% guarantees equality for all purposes.

That bit about multiple notions of equality applies here too - because
we don't know what the user will do with the data in the materialized
view, Kevin's looking for an operator which guarantees equality for
all purposes.  Note that HOT could feed everything through the send
and recv functions, as you are proposing here, and that would allow
HOT to apply in cases where it does not currently apply.  We could,
for example, perform a HOT update when a value in a numeric column is
changed from the long format to the short format without changing the
user-perceived value.

You could argue that HOT isn't user-visible, but we certainly advise
people to think about structuring their indexing in a fashion that
does not defeat HOT, so I think to some extent it is user-visible.

Also, in xfunc.sgml, we have this note:
       The planner also sometimes relies on comparing constants via       bitwise equality, so you can get undesirable
planningresults if       logically-equivalent values aren't bitwise equal.

There are other places as well.  If two node trees are compared using
equal(), any Const nodes in that tree will be compared for binary
equality.  So for example MergeWithExistingConstraint() will error out
if the constraints are equal under btree equality operators but not
binary equal.  equal() is also used in various places in the planner,
which may be the reason for the above warning.

The point I want to make here is that we have an existing precedent to
use bitwise equality when we want to make sure that values are
equivalent for all purposes, regardless of what opclass or whatever is
in use.  There are not a ton of those places but there are some.

>> On the other hand, if you are *replicating* those data types, then you
>> don't want that tolerance.  If you're checking whether two boxes are
>> equal, you may indeed want the small amount of fuzziness that our
>> comparison operators allow.  But if you're copying a box or a float
>> from one table to another, or from one database to another, you want
>> the values copied exactly, including all of those low-order bits that
>> tend to foul up your comparisons.  That's why float8out() normally
>> doesn't display any extra_float_digits - because you as the user
>> shouldn't be relying on them - but pg_dump does back them up because
>> not doing so would allow errors to propagate.  Similarly here.
>
> I agree that we should be copying the values exactly- and I think we're
> already good there when it comes to doing a *copy*.  I further agree
> that updating the matview should be a copy, but the manner in which
> we're doing that is using an equality check to see if the value needs to
> be updated or not which is where things get a bit fuzzy.  If we were
> consistently copying and updating the value based on some external
> knowledge that the value has changed (similar to how slony works w/
> triggers that dump change sets into a table- it doesn't consider "has
> any value on this row changed?"; the user did an update, presumably for
> some purpose, therefore the change gets recorded and propagated), I'd be
> perfectly happy.

Sure, that'd work, but it doesn't explain what's wrong with Kevin's
proposal.  You're basically saying that memcpy(a, b, len) is OK with
you but if (memcmp(a, b, len) != 0) memcpy(a, b, len) is not OK with
you.  I don't understand how you can endorse copying the value
exactly, but not be OK with the optimization that says, well if it
already matches exactly, then we don't need to copy it.

We can certainly rip out the current implementation of REFRESH
MATERIALIZED VIEW CONCURRENTLY and replace it with something that
deletes every row in the view and reinserts them all, but it will be
far less efficient than what we have now.  All that is anybody is
asking for here is the ability to skip deleting and reinserting rows
that are absolutely identical in the old and new versions of the view.

Your send/recv proposal would let us also skip deleting and
reinserting rows that are ALMOST identical except for
not-normally-user-visible binary format differences... but since we
haven't worried about allowing such cases for e.g. HOT updates, I
don't think we need to worry about them here, either.  In practice,
such changes are rare as hen's teeth anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Kevin Grittner

Date:

03 October 2013, 18:03:40

Stephen Frost <sfrost@snowman.net> wrote:

> If we were consistently copying and updating the value based on
> some external knowledge that the value has changed (similar to
> how slony works w/ triggers that dump change sets into a table-
> it doesn't consider "has any value on this row changed?"; the
> user did an update, presumably for some purpose, therefore the
> change gets recorded and propagated), I'd be perfectly happy.

That is the equivalent of the incremental maintenance feature which
won't be coming until after REFRESH has settled down.  REFRESH is
more like the slony initial population of a table.  REFRESH
CONCURRENTLY is a way of doing that which allows users to continue
to read from the matview without blocking or interruption while the
REFRESH is running; I think it would be a bug if that could
generate different results from a non-CONCURRENT REFRESH.

Wanting incremental maintenance (which we all want) is no reason to
block an earlier implementation of REFRESH CONCURRENTLY which is
not, and does not use, incremental maintenance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: record identical operator

From

Peter Geoghegan

Date:

03 October 2013, 18:28:09

On Thu, Oct 3, 2013 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The point I want to make here is that we have an existing precedent to
> use bitwise equality when we want to make sure that values are
> equivalent for all purposes, regardless of what opclass or whatever is
> in use.  There are not a ton of those places but there are some.

Btree opclasses (which are of course special, per "35.14.6. System
Dependencies on Operator Classes") are restricted by the reflexive
law, which implies that bitwise equality is a stronger condition than
regularly equality. I'm inclined to agree that it isn't a big deal to
expose a bitwise equality operator.

We're talking about the possibility of being potentially overly eager
according to someone's definition about refreshing, not the opposite.
With reference to an actual case where this is possible, I don't think
this is astonishing, though I grant Stephen that that's aesthetic.

I also agree that an intermediate notion of equality isn't worth it.

-- 
Peter Geoghegan

Re: record identical operator

From

Stephen Frost

Date:

03 October 2013, 22:19:30

* Robert Haas (robertmhaas@gmail.com) wrote:
> You could argue that HOT isn't user-visible, but we certainly advise
> people to think about structuring their indexing in a fashion that
> does not defeat HOT, so I think to some extent it is user-visible.

I do think saying HOT is user-visible is really stretching things and do
we really say somewhere "please be careful to make sure your updates to
key fields are *BINARY IDENTICAL* to what's stored or HOT won't be
used"?  If anything, that should be listed on a PG 'gotchas' page.

> Also, in xfunc.sgml, we have this note:
>
>         The planner also sometimes relies on comparing constants via
>         bitwise equality, so you can get undesirable planning results if
>         logically-equivalent values aren't bitwise equal.

And that would be under "C-Language Functions", which also includes
things like "take care to zero out any alignment padding bytes that
might be present in structs".

> There are other places as well.  If two node trees are compared using
> equal(), any Const nodes in that tree will be compared for binary
> equality.

These would primairly be cases where we've created a Const out of a
string, or similar, which the *user provided*, no?  It strikes me as at
least unlikely that we'd end up storing a given string from the user in
different ways in memory and so this consideration, again, makes sense
for people writing C code but not for your general SQL user.

> So for example MergeWithExistingConstraint() will error out
> if the constraints are equal under btree equality operators but not
> binary equal.  equal() is also used in various places in the planner,
> which may be the reason for the above warning.

I wonder if this would need to be changed because you could actually
define constraints that operate at a binary level and therefore don't
overlap even though they look like they overlap based on btree equality.

> The point I want to make here is that we have an existing precedent to
> use bitwise equality when we want to make sure that values are
> equivalent for all purposes, regardless of what opclass or whatever is
> in use.  There are not a ton of those places but there are some.

I agree that there are some cases and further that these operators
provide a way of saying "are these definitely the same?" but they fall
down on "are these definitely different?"  That makes these operators
useful for these kinds of optimizations, but that's it.  Providing SQL
level optimization-only operators like this is akin to the SQL standard
defining indexes.

> Sure, that'd work, but it doesn't explain what's wrong with Kevin's
> proposal.  You're basically saying that memcpy(a, b, len) is OK with
> you but if (memcmp(a, b, len) != 0) memcpy(a, b, len) is not OK with
> you.  I don't understand how you can endorse copying the value
> exactly, but not be OK with the optimization that says, well if it
> already matches exactly, then we don't need to copy it.

Adding new operators isn't all about what happens at the C-code level.

That said, I agree that PG, in general, is more 'open' to exposing
implementation details than is perhaps ideal, but it can also be quite
useful in some instances.  I don't really like doing that in top-level
operators like this, but it doesn't seem like there's a whole lot of
help for it.  I'm not convinced that using the send/recv approach would
be all that big of a performance hit but I've not tested it.

> We can certainly rip out the current implementation of REFRESH
> MATERIALIZED VIEW CONCURRENTLY and replace it with something that
> deletes every row in the view and reinserts them all, but it will be
> far less efficient than what we have now.  All that is anybody is
> asking for here is the ability to skip deleting and reinserting rows
> that are absolutely identical in the old and new versions of the view.

If this was an entirely internal thing, it'd be different, but it's not.

> Your send/recv proposal would let us also skip deleting and
> reinserting rows that are ALMOST identical except for
> not-normally-user-visible binary format differences... but since we
> haven't worried about allowing such cases for e.g. HOT updates, I
> don't think we need to worry about them here, either.  In practice,
> such changes are rare as hen's teeth anyway.

I'm not entirely convinced that what was done for HOT in this regard is
a precedent we should be building on top of.
Thanks,
    Stephen

Re: record identical operator

From

Hannu Krosing

Date:

04 October 2013, 00:15:07

On 10/04/2013 12:22 AM, Stephen Frost wrote:
> That said, I agree that PG, in general, is more 'open' to exposing
> implementation details than is perhaps ideal, 
Every *real* system is more open to exposing implementation
details than is *ideal*.

One very popular "implementation detail" which surprises users
over and over is performance under different use cases.

There is no way you can hide this.

That said, I see nothing bad in having an operator for "binary equal"
or alternately called "guaranteed equal".

Its negator is not "guaranteed unequal" but "not guaranteed to be equal"

The main exposed implementation detail of this operator is that it is
very fast and can be recommended to be used at user level for speeding
up equal query like this

SELECT * FROM t WHERE <guaranteed equal> or <equal>

where the plain <equal> will only be called when fast <guaranteed equal>
fails.

a bit similar to how you can cut down on index size on long text fields by
indexing their hashes and then querying

SELECT * FROM tWHERE hashtext(verylongtext) = hashtext(sample)   AND verylongtext = sample

It is absolutely possible that the fact that hash clashes exist also
confuses
some users the same way the possibility of having binary inequality and
be still NOT DISTINCT FROM, but I argue that having a fast "guaranteed
equal"
operation available to users is useful.

some users may also initially get confused by SELECT 3/2; returning 1
and not
"the right answer" of 1.5.

They may even bitch and moan that PostgreSQL is completely broken.

But then they look it up or ask on the net and are confused no more.

Greetings
Hannu

Re: record identical operator

From

Stephen Frost

Date:

04 October 2013, 01:07:00

* Hannu Krosing (hannu@krosing.net) wrote:
> The main exposed implementation detail of this operator is that it is
> very fast and can be recommended to be used at user level for speeding
> up equal query like this
>
> SELECT * FROM t WHERE <guaranteed equal> or <equal>
>
> where the plain <equal> will only be called when fast <guaranteed equal>
> fails.

Yeah, this would be exactly the kind of misuse that we will need to be
prepared to support with these new operators.  If this is actually
faster/better/whatever, then we should be implementing it in our
conditional handling, not encouraging users to create hacks like this.

> a bit similar to how you can cut down on index size on long text fields by
> indexing their hashes and then querying
>
> SELECT * FROM t
>  WHERE hashtext(verylongtext) = hashtext(sample)
>     AND verylongtext = sample

This case clearly requires a great deal more thought and consideration
on the DBA's side and is also a lot more obvious what it's doing than
having 'where x *= 123 or x = 123'.
Thanks,
    Stephen

Re: record identical operator - Review

From

Bruce Momjian

Date:

07 October 2013, 00:20:59

On Thu, Oct  3, 2013 at 10:46:05AM -0700, Kevin Grittner wrote:
> >           opcname
> >     ---------------------
> >     varchar_ops
> >     kd_point_ops
> >     cidr_ops
> >     text_pattern_ops
> >     varchar_pattern_ops
> >     bpchar_pattern_ops
> >     (6 rows)
> >
> > Do these all have operators defined too?
> 
> Every operator class is associated with operators.  For example,
> while text_pattern_ops uses the same = and <> operators as the
> default text opclass (because that already uses memcmp), it adds
> ~>~, ~>=~, ~<~, and ~<=~ operators which also use memcmp (ignoring
> character encoding and collation).

OK, my questions have been answered and I am no longer concerned about
this patch causing "equality" confusion for our users.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +