Thread: Whole-row comparison ?

Whole-row comparison ?

From

Date:

17 May 2007, 10:21:56

Hi there,

I'm trying to implement a "versionned" storage for a datawarehouse system, meaning I have tables which store daily information about objects and I would like to have a row inserted (i.e. a new version) only if it differs from the most recent existing row. For instance instead of storing

version attribute1 attribute2

1 x y

2 x y

3 x y

4 z y

5 z y

6 z t

we would only keep the diffs :

version attribute1 attribute2

1 x y

4 z y

6 z t

This would save lots of space and the full table could be easily accessed through a view.

However as the tables have a large number of rows (several hundreds) I would like to avoid having to write conditions like WHERE (old.att1 != new.attr1 or old.attr2 != new.attr2 or ... or old.attr245 != new.attr245 )

So my question is: is there a way to do some kind of whole-row comparison ? For instance a hash of the full row or something similar ? Is there a standard way of solving this problem ?

Thanks a lot for any hind !

Christian

Re: Whole-row comparison ?

From

Richard Broersma Jr

Date:

17 May 2007, 10:38:36

> So my question is: is there a way to do some kind of whole-row
> comparison ? For instance a hash of the full row or something similar ?
> Is there a standard way of solving this problem ?

Sure it is called row wise comparison:
http://www.postgresql.org/docs/8.2/interactive/functions-comparisons.html#ROW-WISE-COMPARISON

you can write something like:

SELECT * FROM Your_tableWHERE (c1,c2,c3,c4) = ('a',2,'xyz',123);

Regards,
Richard Broersma Jr.

Re: Whole-row comparison ?

From

Richard Broersma Jr

Date:

17 May 2007, 14:20:42

--- christian.roche.ext@nsn.com wrote:
> select * from temp."BSC_Table" t, public.bsc_view p where t.id = p.id
> and row(t) <> row(p);
> 
> ERROR: operator does not exist: "temp"."BSC_Table" <> bsc_view
> SQL state: 42883
> Hint: No operator matches the given name and argument type(s). You may
> need to add explicit type casts.
> 

Don't forget to reply all so that everyone on the list can participate.  Also, do this is good
since it doesn't limit you to my limited knowledge. ;)

row() doesn't do what you think it does.

you have to specify every column that you want to compare, so:

row(t.col1, t.col2, t.col3, t.col4) <> row(p.col1, p.col2, p.col3, p.col4)

where the datatypes of each respective column match. i.e. t.col1 maps to p.col1, ...

Regards,
Richard Broersma Jr.

Re: Whole-row comparison ?

From

Date:

17 May 2007, 15:00:54

Ok I understand now that this row(t.*) syntax is new to postgres 8.2

As explained in the documentation, §4.2.11. Row Constructors:

A row constructor can include the syntax rowvalue.*, which will be expanded to a list of the elements of the row value,
justas occurs when the .* syntax is used at the top level of a SELECT list. For example, if table t has columns f1 and
f2,these are the same: 
SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;

This would obviously simplify the syntax of my queries a lot since I have so many columns.  I'm going to try and
installthe newest version and check it.  I'll keep you informed. 

Thanks to all!
Christian

---
> select * from temp."BSC_Table" t, public.bsc_view p where t.id = p.id
> and row(t) <> row(p);
>
> ERROR: operator does not exist: "temp"."BSC_Table" <> bsc_view SQL
> state: 42883
> Hint: No operator matches the given name and argument type(s). You may
> need to add explicit type casts.
>

Don't forget to reply all so that everyone on the list can participate.  Also, do this is good since it doesn't limit
youto my limited knowledge. ;) 

row() doesn't do what you think it does.

you have to specify every column that you want to compare, so:

row(t.col1, t.col2, t.col3, t.col4) <> row(p.col1, p.col2, p.col3, p.col4)

where the datatypes of each respective column match. i.e. t.col1 maps to p.col1, ...

Regards,
Richard Broersma Jr.

Versionning (was: Whole-row comparison)

From

Date:

01 June 2007, 14:08:08

Hi all,

first let me tell you that this nice "whole-row comparison" feature of
postgres 8.2 allowed me to create a versionned database model very
neatly.  The SQL statement that inserts in the destination table only
the one rows that are new or that have changed since last time is very
simply written:

INSERT INTO bsc_table
SELECT nextval('version_seq'), <fields>
FROM load.bsc_table AS ld LEFT JOIN bsc_view AS nt USING (obj_id)
WHERE nt.obj_id IS NULL OR row(nt.*) <> row(ld.*);

bsc_view is a view that returns the latest version of each object in the
bsc table:

CREATE VIEW bsc_view AS
SELECT  <fields>
FROM bsc_table
WHERE (obj_id, ver_id) IN (SELECT obj_id, max(ver_id) FROM bsc_table
GROUP BY obj_id);

This is all nice as long as I only want to access the very last version
of the table.  However what should be done if I now need to access an
earlier version ?  The most elegant way would be to pass a ver_id
parameter to bsc_view, something like :

CREATE VIEW bsc_view(int) AS
SELECT  <fields>
FROM bsc_table
WHERE (obj_id, ver_id) IN (SELECT obj_id, max(ver_id)    FROM bsc_table      WHERE ver_id <= $1        GROUP BY
obj_id));

However postgres doesn't allow parameters in views as far as I know.  I
guess I could create a function returning a set of rows, but then I
would lose most advantages of rewritten views, especially optimization,
right ?

I've contemplated reusing an awful hack from my Access era, namely using
a single-rowed table to store the parameter and joining the view on it.
The parameter would be updated before the view is called; this would
work but would definitely be ugly.  Can someone think of a better way to
do that ?

Thanks a lot,
Christian

-----Original Message-----
I'm trying to implement a "versionned" storage for a datawarehouse
system, meaning I have tables which store daily information about
objects and I would like to have a row inserted (i.e. a new version)
only if it differs from the most recent existing row.  For instance
instead of storing
version    attribute1    attribute2
1            x                y
2            x                y
3            x                y
4            z                y
5            z                y
6            z                t
we would only keep the diffs :
version    attribute1    attribute2
1            x                y
4            z                y
6            z                t

Re: Versionning (was: Whole-row comparison)

From

Andrew Sullivan

Date:

01 June 2007, 14:47:34

On Fri, Jun 01, 2007 at 08:07:46PM +0300, christian.roche.ext@nsn.com wrote:
> I've contemplated reusing an awful hack from my Access era, namely using
> a single-rowed table to store the parameter and joining the view on it.
> The parameter would be updated before the view is called; this would
> work but would definitely be ugly.  Can someone think of a better way to
> do that ?

I sort of don't see how that hack would be any different from a SRF. 
You'd lose the planner benefits anyway, I think, because you'd have
to plan for the generic case where the data could be anything, no?

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."    --Damien Katz

Re: Versionning (was: Whole-row comparison)

From

Date:

01 June 2007, 15:07:20

Hi Andrew,

I must admit I don't really understand what you say.  I don't know what
SRF stand for, and what you say about generic case is not clear to me,
sorry.

My idea is that using a parameter table allows me to keep using a view,
which is optimized for instance when used against a WHERE condition.

For example, I could write :

CREATE VIEW bsc_view AS
SELECT  <fields>
FROM bsc_table
WHERE (obj_id, ver_id) IN (SELECT obj_id, max(ver_id) FROM bsc_table, param_table    WHERE ver_id <= param_table.ver_id
    GROUP BY obj_id)); 

and the following statement would be optimized:

UPDATE param_table SET ver_id = xxx;
SELECT * FROM bsc_view WHERE obj_id = yyy;

which would not be the case would I have used a multi-row function.

Does this make sense ?

Thanks a lot,
Christian

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of ext Andrew Sullivan
Sent: Friday, June 01, 2007 17:47
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Versionning (was: Whole-row comparison)

On Fri, Jun 01, 2007 at 08:07:46PM +0300, christian.roche.ext@nsn.com
wrote:
> I've contemplated reusing an awful hack from my Access era, namely
> using a single-rowed table to store the parameter and joining the view
on it.
> The parameter would be updated before the view is called; this would
> work but would definitely be ugly.  Can someone think of a better way
> to do that ?

I sort of don't see how that hack would be any different from a SRF.
You'd lose the planner benefits anyway, I think, because you'd have to
plan for the generic case where the data could be anything, no?

A

Re: Versionning (was: Whole-row comparison)

From

Andrew Sullivan

Date:

01 June 2007, 15:51:39

On Fri, Jun 01, 2007 at 09:07:10PM +0300, christian.roche.ext@nsn.com wrote:
>  
> Hi Andrew,
> 
> I must admit I don't really understand what you say.  I don't know what
> SRF stand for, and what you say about generic case is not clear to me,
> sorry.

Sorry, it stands for set returning function.  I thought someone
upthread suggested that instead of a view.

> and the following statement would be optimized:
> 
> UPDATE param_table SET ver_id = xxx;
> SELECT * FROM bsc_view WHERE obj_id = yyy; 
> 
> which would not be the case would I have used a multi-row function.
> 
> Does this make sense ?

Yes, but I don't think it's true.  Because you change the value of
ver_id all the time, the actual result can't be collapsed to a
constant, so you end up having to execute the query with the
additional value, and you still have to plan that.  The same thing is
true of a function, which will have its plan prepared the first time
you execute it.  (I could be wrong about this; I suppose the only way
would be to try it.)

A
-- 
Andrew Sullivan  | ajs@crankycanuck.ca
However important originality may be in some fields, restraint and 
adherence to procedure emerge as the more significant virtues in a 
great many others.   --Alain de Botton

Re: Versionning (was: Whole-row comparison)

From

Date:

04 June 2007, 07:40:44

Hi Andrew,

what is worrying me is that if I use a SRF, any additional WHERE
condition would not be taken into account before executing the
underlying query, e.g., in this request using a view, the WHERE
condition would be considered in the final query :

UPDATE params
SET version = ver_id;

SELECT *
FROM bsc_list_view
WHERE obj_id = 'xxx';

because the bsc_list_view would be expanded to the underlying request,
while using a SRF, the whole table would be scaned before the WHERE
condition is applied:

SELECT *
FROM bsc_list_srf(ver_id)
WHERE obj_id = 'xxx';

This is what I mean when I say that the optimization would be lost when
using a SRF.  Now what is the "Right Thing To Do" in this particular
case ?  The nicest thing would really to have parametrized view.  Is
there any fundamental reason why such a beast does not exist, or is it
only postgres (compared to higher-level RDBMS) ?

Thanks a lot !
Christian


-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of ext Andrew Sullivan
Sent: Friday, June 01, 2007 18:51
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Versionning (was: Whole-row comparison)


Yes, but I don't think it's true.  Because you change the value of
ver_id all the time, the actual result can't be collapsed to a constant,
so you end up having to execute the query with the additional value, and
you still have to plan that.  The same thing is true of a function,
which will have its plan prepared the first time you execute it.  (I
could be wrong about this; I suppose the only way would be to try it.)

Re: Versionning (was: Whole-row comparison)

From

Andrew Sullivan

Date:

04 June 2007, 10:12:41

On Mon, Jun 04, 2007 at 01:40:18PM +0300, christian.roche.ext@nsn.com wrote:
> case ?  The nicest thing would really to have parametrized view.  Is
> there any fundamental reason why such a beast does not exist, or is it
> only postgres (compared to higher-level RDBMS) ?

I don't think there's a fundamental reason, no.  But why couldn't you
change your query to issue the SRF directly, with the parameter:

SELECT * FROM some_srf(param1, param2)?

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.    --H.W. Fowler