Thread: combining semi-duplicate rows

combining semi-duplicate rows

From

"hjenkins"

Date:

18 December 2007, 18:44:34

Hello,

I have a table of rows which partially duplicate one another and need to
be merged and moved into a table with a primary key.

As an additional complication, some of the duplicates contain different
information, ex.:
schema1.datatable: key1  None None 3    4
schema2.datatable: key1  1    2    7    None

desired result:
schema1.datatable: key1  1    2    3    4

I looked for a specific function that would do this sort of merge and
failed. So I tried:

UPDATE schema1.datatable SET schema1.datatable.field1 =
schema2.datatable.field1 FROM schema2.datatable
WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
AND schema1.datatable.field1 = None;

Which is suboptimal because I'd need a command for each field, but it
would be a start. However, the schema names are not recognised. I get
error messages to the effect that "cross-database references are not
implemented" or "relation "schema1/2" does not exist.Even the much simpler

SELECT DISTINCT schema2.datatable INTO schema1.datatable;

...gives me these messages. Qualifying right up to the database level
produces "improper qualified name (too many dotted names)".

I'm pretty sure that this isn't a capitalization/quoting problem as
described in the FAQ. Is it not possible to use these functions between
schemas? Or am I misusing the functions in a more basic way?

The problem is somewhat similar to this one:
http://archives.postgresql.org/pgsql-sql/2007-02/msg00055.php

Namely,

Regards,
H.Jenkins

Re: combining semi-duplicate rows

From

Tom Lane

Date:

18 December 2007, 20:28:57

"hjenkins" <hjenkins@uvic.ca> writes:
> So I tried:

> UPDATE schema1.datatable SET schema1.datatable.field1 =
> schema2.datatable.field1 FROM schema2.datatable
> WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
> AND schema1.datatable.field1 = None;

> Which is suboptimal because I'd need a command for each field, but it
> would be a start. However, the schema names are not recognised. I get
> error messages to the effect that "cross-database references are not
> implemented" or "relation "schema1/2" does not exist.

The target column of a SET clause can't be qualified with the relation
name; it would introduce ambiguity in the case of composite-type fields,
and it's useless anyway since the target relation was already given.
Your example works for me (syntactically at least) as

regression=# UPDATE schema1.datatable SET field1 =
schema2.datatable.field1 FROM schema2.datatable
WHERE schema2.datatable.keyfield = schema1.datatable.keyfield
AND schema1.datatable.field1 = 'None';
UPDATE 0

Personally, though, I'd use some aliases to improve readability and
forestall the onset of carpal tunnel syndrome:

regression=# UPDATE schema1.datatable t SET field1 =
s.field1 FROM schema2.datatable s
WHERE s.keyfield = t.keyfield
AND t.field1 = 'None';
UPDATE 0

            regards, tom lane