Re: record identical operator - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: record identical operator
Date
Msg-id 20130924143921.GS2706@tamriel.snowman.net
In response to Re: record identical operator  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: record identical operator
List pgsql-hackers
* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > I worry that adding these will come back to bite us later
>
> How?

User misuse is certainly one consideration, but I wonder what's going to
happen if we change our internal representation of data (eg: numerics
get changed again), or when incremental matview maintenance happens and
we start looking at subsets of rows instead of the entire query.  Will
the first update of a matview after a change to numeric's internal data
structure cause the entire thing to be rewritten?

> > and that we're making promises we won't be able to keep.
>
> The promise that a concurrent refresh will produce the same set of
> rows as a non-concurrent one?

The promise that we'll always return the binary representation of the
data that we saw last.  When greatest(x,y) comes back 'false' for a
MAX(), we then have to go check "well, does the type consider them
equal?", because, if the type considers them equal, we then have to
decide if we should replace x with y anyway, because it's different
at a binary level.  That's what we're saying we'll always do now.
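
As a rough illustration of "equal to the type, different at a binary
level", here is a minimal sketch using Python's decimal module as a
stand-in for numeric.  It is only an analogy: the helper names equal()
and identical() are hypothetical, and nothing here reflects how
PostgreSQL actually stores or compares numerics.

```python
from decimal import Decimal


def equal(a: Decimal, b: Decimal) -> bool:
    # Type-level equality: only the numeric value matters.
    return a == b


def identical(a: Decimal, b: Decimal) -> bool:
    # Representation-level comparison: sign, digits and exponent
    # must all match (a stand-in for a byte-wise comparison).
    return a.as_tuple() == b.as_tuple()


x = Decimal("1.0")
y = Decimal("1.00")

print(equal(x, y))      # True: the type considers them equal
print(identical(x, y))  # False: the stored representations differ
```

Under equal() there is nothing to replace; under identical() the stored
value would have to be overwritten even though no comparison the type
itself offers can tell the two apart.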

We're also saying that we'll replace things based on plan differences
rather than based on whether the rows underneath actually changed at all.
We could end up with material differences between the result of matviews
updated through incremental REFRESH and matviews updated through
actual incremental maintenance, and people may *care* about those
because we've told them (or they discover) they can depend on these
types of changes to be reflected in the result.

> > Trying to do this incremental-but-not-really maintenance where
> > the whole query is run but we try to skimp on what's actually
> > getting updated in the matview is a premature optimization, imv,
> > and one which may be less performant and more painful, with more
> > gotchas and challenges for our users, to deal with in the long
> > run.
>
> I have the evidence of a ten-fold performance improvement plus
> minimized WAL and replication work on my side.  What evidence do
> you have to back your assertions?  (Don't forget to work in bloat
> and vacuum truncation issues to the costs of your proposal.)

I don't doubt that there are cases in both directions and I'm not trying
to argue that it'd always be faster, but I doubt it's always slower.
I'm surprised that you had a case where the query was apparently quite
fast, yet the data set hardly changed and resulted in a very large result,
but I don't doubt that it happened.  What I was trying to get at is
really that the delete/insert approach would be good enough in very many
cases and it wouldn't have what look, to me anyway, like some pretty ugly
warts around these cases.
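
To make the trade-off concrete, here is a toy model of the two
approaches, counting row changes only.  It is a sketch under stated
assumptions, not a description of how REFRESH is implemented: the
matview is modelled as a dict keyed by a unique key, and full_refresh()
and diff_refresh() are hypothetical names.

```python
from decimal import Decimal


def full_refresh(old: dict, new: dict) -> dict:
    # Delete/insert approach: every old row goes, every new row arrives.
    return {"delete": len(old), "insert": len(new), "update": 0}


def diff_refresh(old: dict, new: dict, same) -> dict:
    # Diff approach: only touch rows that `same` says have changed.
    deletes = [k for k in old if k not in new]
    inserts = [k for k in new if k not in old]
    updates = [k for k in new if k in old and not same(old[k], new[k])]
    return {"delete": len(deletes), "insert": len(inserts),
            "update": len(updates)}


old = {1: Decimal("1.0"), 2: Decimal("2.5"), 3: Decimal("7")}
new = {1: Decimal("1.00"), 2: Decimal("2.5"), 4: Decimal("9")}

# Compare rows by the type's own equality.
by_equality = diff_refresh(old, new, lambda a, b: a == b)

# Compare rows by representation (sign, digits, exponent).
by_identity = diff_refresh(
    old, new, lambda a, b: a.as_tuple() == b.as_tuple())

print(full_refresh(old, new))  # {'delete': 3, 'insert': 3, 'update': 0}
print(by_equality)             # {'delete': 1, 'insert': 1, 'update': 0}
print(by_identity)             # {'delete': 1, 'insert': 1, 'update': 1}
```

The diff approach writes fewer rows in this example, but which rows it
writes depends entirely on the comparison chosen, which is where the
warts come from.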

Thanks,

        Stephen
