Thread: INSERT ... SELECT DISTINCT - Doesn't work...
Hi I'm trying to normalize a database, and want to derivate one relation from another. The original relation is like: user_operations ( user_id integer, user_name varchar, user_operation varchar) It stores the 'user_operation' item, and also works as a user database (or some like this...). To get the different users from the database, I have this query: SELECT DISTINCT user_id, user_name FROM user_operations; I need to do some normalization works here, and want to insert data from user_operations in the new 'users' relation: CREATE TABLE users ( user_id integer, user_name varchar); To inser the data, I have tried the straight: INSERT INTO users (user_id, user_name) SELECT DISTINCT user_id, user_name FROM user_operations; But it doesn't work as I expect. Suppose there are 15000 rows at user_operations, but only 50 different (user_id, user_name). The SELECT DISTINCT returns only 50 rows, but the INSERT ... SELECT DISTINCT inserts 15000 rows! I think the DISTINCT clause, when used in a INSERT INTO ... SELECT doesn't have any effect... Is it a bug? Can anybody help me figure out how to get only the different (user_id, user_name) from user_operations, without any repeat? TIA -- César A. K. Grossmann ckant@usa.net http://members.xoom.com/ckant/ http://www.halcyon.com/sciclub/cgi-pvt/instr/instr.html
"Cesar A. K. Grossmann" wrote: > > Hi > > I'm trying to normalize a database, and want to derivate one relation > from another. The original relation is like: > > user_operations ( > user_id integer, > user_name varchar, > user_operation varchar) > > It stores the 'user_operation' item, and also works as a user database > (or some like this...). To get the different users from the database, I > have this query: > > SELECT DISTINCT user_id, user_name FROM user_operations; > > I need to do some normalization works here, and want to insert data from > user_operations in the new 'users' relation: > > CREATE TABLE users ( > user_id integer, > user_name varchar); > > To inser the data, I have tried the straight: > > INSERT > INTO users (user_id, user_name) > SELECT DISTINCT user_id, user_name FROM user_operations; > > But it doesn't work as I expect. Suppose there are 15000 rows at > user_operations, but only 50 different (user_id, user_name). The SELECT > DISTINCT returns only 50 rows, but the INSERT ... SELECT DISTINCT > inserts 15000 rows! > > I think the DISTINCT clause, when used in a INSERT INTO ... SELECT > doesn't have any effect... Is it a bug? > > Can anybody help me figure out how to get only the different (user_id, > user_name) from user_operations, without any repeat? Hmm. I can't repeat this behavior in 7.0.0beta3. Are you using the older 6.x series? If so, you might try: "CREATE TABLE xxxx AS SELECT DISTINCT..." or "SELECT DISTINCT * INTO bar FROM foo..." but who knows how 6.x will behave... I can only recommend upgrading at your earliest convenience. Mike Mascari
Mike Mascari <mascarm@mascari.com> writes: > "Cesar A. K. Grossmann" wrote: >> I think the DISTINCT clause, when used in a INSERT INTO ... SELECT >> doesn't have any effect... Is it a bug? > Hmm. I can't repeat this behavior in 7.0.0beta3. Are you using > the older 6.x series? INSERT ... SELECT DISTINCT sort of works in 6.5 (I think it might have been completely broken in the distant past). The example Cesar gave looked safe enough, but you can get burnt by the problem that what's actually "DISTINCT'd" is the completed tuples ready to be inserted in the target table. So, for example, CREATE TABLE dest (f1 int, f2 serial); INSERT INTO dest(f1) SELECT DISTINCT f1 FROM src; won't do what it should because the f2 values are distinct. 7.0 fixes that problem, but it still has nasty bugs if the destination column datatypes aren't the same as those of the source data you are DISTINCT'ing. The underlying problem here is that we really need a two-level querytree representation, so that the SELECT stuff can happen separately from preparing the data to be INSERTed. We intend to fix all this in the 7.2 development cycle; there's a wholesale querytree redesign planned for that release. Cesar's example looked like it didn't run into these problems, so I'm not sure what's going wrong for him. regards, tom lane
> Cesar's example looked like it didn't run into these problems, so I'm > not sure what's going wrong for him. I'm sorry. In fact, I was not so carefull with my analisis and the example I provided you was not exactly like the real problem. First, I have failed to use the same type in the columns (integer in one relation, varchar in the other). I think this was the major problem. To make the example more realistic: user_operations ( user_id integer, user_name varchar, user_operation varchar) CREATE TABLE users ( user_id varchar, user_name varchar); the column user_id from the relation users was with the wrong type (varchar). I changed it to integer (as user_operations), and the INSERT works well. []s -- César A. K. Grossmann ckant@usa.net http://members.xoom.com/ckant/ Para os sádicos e corajosos e infantis: http://www.halcyon.com/sciclub/cgi-pvt/instr/instr.html