This is all guesswork though, since we don't know quite what's happening on Robert's system. It might be worth setting "log_autovacuum_min_duration = 0" (either globally, or as a reloption on the relevant tables), and seeing if there seems to be any correlation between autovacuum/autoanalyze activity and the occurrences of poor plan choices.
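For reference, here is one way to enable that logging (a sketch; the `ds3.blob` table name is taken from the later messages in this thread, substitute the tables actually involved):

```sql
-- Globally, for every table (requires a config reload):
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

-- Or per-table, as a storage parameter (reloption):
ALTER TABLE ds3.blob SET (log_autovacuum_min_duration = 0);
```

With this set to 0, every autovacuum/autoanalyze run on the affected tables is logged, which makes it possible to line their timestamps up against the slow statements.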
Ok, doing a SET plan_cache_mode=force_custom_plan before the COPY and resetting it after appears to fix the problem. We’re going to run it over the weekend to make sure.
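The workaround described above looks roughly like this (a sketch; the COPY target and column list are placeholders, not the actual statement):

```sql
SET plan_cache_mode = force_custom_plan;
COPY ds3.blob FROM STDIN;  -- the affected bulk load
RESET plan_cache_mode;
```

Forcing custom plans means each execution is planned with the actual parameter values, at the cost of replanning every time, so a stale generic plan can no longer be reused.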
We are at it again. I have a DELETE operation that has been running for 48 minutes so far. I had set plan_cache_mode = force_custom_plan for the entire server before this happened, since we started seeing the COPY slowdown again. I have dtrace information again, but it primarily shows the nested loop scan.
40665,15978,0 years 0 mons 0 days 0 hours 48 mins 49.62347 secs,2021-11-24 20:13:30.017188 +00:00,tapesystem,active,,DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1)),Administrator
So how do we avoid this query plan? Do we need to start doing explicit analyzes after every delete?
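If it does come down to stale statistics, the explicit-analyze workaround would be something like (a sketch, assuming these are the two tables whose row counts the planner is misjudging):

```sql
-- After a large DELETE, refresh statistics so the planner
-- sees the new row counts and value distributions:
ANALYZE ds3.s3_object;
ANALYZE ds3.blob;
```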
EXPLAIN DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'));

(250k rows in blob, 256k rows in s3_object)
'Normal' explain; very few objects with that bucket.
                                          QUERY PLAN
 Delete on blob  (cost=0.71..6.76 rows=1 width=12)
   ->  Nested Loop  (cost=0.71..6.76 rows=1 width=12)
         ->  Index Scan using ds3_s3_object__bucket_id on s3_object  (cost=0.29..2.31 rows=1 width=22)
               Index Cond: (bucket_id = '85b9e793-2141-455c-a752-90c2346cdfe1'::uuid)
         ->  Index Scan using ds3_blob__object_id on blob  (cost=0.42..4.44 rows=1 width=22)
               Index Cond: (object_id = s3_object.id)