I can't get the following statement to complete with reasonable time.
I've had it running for over ten hours without getting anywhere. I
suspect (hope) there may be a better way to accomplish what I'm trying
to do (set fields containing unique values to null):
UPDATE requests
SET session = NULL
WHERE session IN
(
SELECT session
FROM requests
GROUP BY session
HAVING COUNT(*) = 1
);
Output of EXPLAIN:
Nested Loop
(cost=170350.16..305352.37 rows=33533 width=98)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=8)
-> Index Scan using requests_session_idx on requests
(cost=0.00..672.92 rows=168 width=106)
Index Cond: (requests."session" = "outer"."session")
If I drop the index on requests(session):
Hash Join
(cost=170350.66..340414.12 rows=33533 width=98)
Hash Cond: ("outer"."session" = "inner"."session")
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=106)
-> Hash
(cost=170350.16..170350.16 rows=200 width=8)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075
width=8)
The subquery itself requires 5-10 min to run on its own, and may return
several million rows.
Using EXISTS rather than IN (I'm using 7.4-RC2, not sure if IN queries
were already improved in this release):
UPDATE requests
SET session = NULL
WHERE NOT EXISTS
(
SELECT r.session
FROM requests r
WHERE
r.session = session
AND NOT r.id = id
);
With and without index:
Result
(cost=227855.74..415334.22 rows=8075449 width=101)
One-Time Filter: (NOT $0)
InitPlan
-> Seq Scan on requests r
(cost=0.00..227855.74 rows=201 width=8)
Filter: (("session" = "session") AND (id <> id))
-> Seq Scan on requests
(cost=0.00..187478.49 rows=8075449 width=101)
I've been running this for more than an hour so far, and no end in
sight, either... Any ideas?