Thread: INSERT ... SELECT DISTINCT - Doesn't work...

INSERT ... SELECT DISTINCT - Doesn't work...

From
"Cesar A. K. Grossmann"
Date:
Hi

I'm trying to normalize a database, and want to derivate one relation
from another. The original relation is like:

user_operations (
        user_id    integer,
        user_name  varchar,
        user_operation varchar)

It stores the 'user_operation' item, and also works as a user database
(or some like this...). To get the different users from the database, I
have this query:

SELECT DISTINCT user_id, user_name FROM user_operations;

I need to do some normalization works here, and want to insert data from
user_operations in the new 'users' relation:

CREATE TABLE users (
        user_id         integer,
        user_name       varchar);

To inser the data, I have tried the straight:

INSERT
        INTO users (user_id, user_name)
        SELECT DISTINCT user_id, user_name FROM user_operations;

But it doesn't work as I expect. Suppose there are 15000 rows at
user_operations, but only 50 different (user_id, user_name). The SELECT
DISTINCT returns only 50 rows, but the INSERT ... SELECT DISTINCT
inserts 15000 rows!

I think the DISTINCT clause, when used in a INSERT INTO ... SELECT
doesn't have any effect... Is it a bug?

Can anybody help me figure out how to get only the different (user_id,
user_name) from user_operations, without any repeat?

TIA
--
César A. K. Grossmann
ckant@usa.net
http://members.xoom.com/ckant/
http://www.halcyon.com/sciclub/cgi-pvt/instr/instr.html

Re: INSERT ... SELECT DISTINCT - Doesn't work...

From
Mike Mascari
Date:
"Cesar A. K. Grossmann" wrote:
>
> Hi
>
> I'm trying to normalize a database, and want to derivate one relation
> from another. The original relation is like:
>
> user_operations (
>         user_id    integer,
>         user_name  varchar,
>         user_operation varchar)
>
> It stores the 'user_operation' item, and also works as a user database
> (or some like this...). To get the different users from the database, I
> have this query:
>
> SELECT DISTINCT user_id, user_name FROM user_operations;
>
> I need to do some normalization works here, and want to insert data from
> user_operations in the new 'users' relation:
>
> CREATE TABLE users (
>         user_id         integer,
>         user_name       varchar);
>
> To inser the data, I have tried the straight:
>
> INSERT
>         INTO users (user_id, user_name)
>         SELECT DISTINCT user_id, user_name FROM user_operations;
>
> But it doesn't work as I expect. Suppose there are 15000 rows at
> user_operations, but only 50 different (user_id, user_name). The SELECT
> DISTINCT returns only 50 rows, but the INSERT ... SELECT DISTINCT
> inserts 15000 rows!
>
> I think the DISTINCT clause, when used in a INSERT INTO ... SELECT
> doesn't have any effect... Is it a bug?
>
> Can anybody help me figure out how to get only the different (user_id,
> user_name) from user_operations, without any repeat?

Hmm. I can't repeat this behavior in 7.0.0beta3. Are you using
the older 6.x series? If so, you might try:

"CREATE TABLE xxxx AS SELECT DISTINCT..."

or

"SELECT DISTINCT * INTO bar FROM foo..."

but who knows how 6.x will behave... I can only recommend
upgrading at your earliest convenience.

Mike Mascari

Re: INSERT ... SELECT DISTINCT - Doesn't work...

From
Tom Lane
Date:
Mike Mascari <mascarm@mascari.com> writes:
> "Cesar A. K. Grossmann" wrote:
>> I think the DISTINCT clause, when used in a INSERT INTO ... SELECT
>> doesn't have any effect... Is it a bug?

> Hmm. I can't repeat this behavior in 7.0.0beta3. Are you using
> the older 6.x series?

INSERT ... SELECT DISTINCT sort of works in 6.5 (I think it might have
been completely broken in the distant past).  The example Cesar gave
looked safe enough, but you can get burnt by the problem that what's
actually "DISTINCT'd" is the completed tuples ready to be inserted in
the target table.  So, for example,
    CREATE TABLE dest (f1 int, f2 serial);
    INSERT INTO dest(f1) SELECT DISTINCT f1 FROM src;
won't do what it should because the f2 values are distinct.

7.0 fixes that problem, but it still has nasty bugs if the destination
column datatypes aren't the same as those of the source data you
are DISTINCT'ing.  The underlying problem here is that we really need
a two-level querytree representation, so that the SELECT stuff can
happen separately from preparing the data to be INSERTed.  We intend
to fix all this in the 7.2 development cycle; there's a wholesale
querytree redesign planned for that release.

Cesar's example looked like it didn't run into these problems, so I'm
not sure what's going wrong for him.

            regards, tom lane

Re: INSERT ... SELECT DISTINCT - Doesn't work...

From
"Cesar A. K. Grossmann"
Date:
> Cesar's example looked like it didn't run into these problems, so I'm
> not sure what's going wrong for him.

I'm sorry. In fact, I was not so carefull with my analisis and the
example I provided you was not exactly like the real problem.

First, I have failed to use the same type in the columns (integer in one
relation, varchar in the other). I think this was the major problem. To
make the example more realistic:

user_operations (
        user_id    integer,
        user_name  varchar,
        user_operation varchar)

CREATE TABLE users (
        user_id         varchar,
        user_name       varchar);

the column user_id from the relation users was with the wrong type
(varchar). I changed it to integer (as user_operations), and the INSERT
works well.

[]s
--
César A. K. Grossmann
ckant@usa.net
http://members.xoom.com/ckant/
Para os sádicos e corajosos e infantis:
http://www.halcyon.com/sciclub/cgi-pvt/instr/instr.html