In light of this, do you still think it's worthwhile making this change?
As far as I can see, all it's going to do is add extra planner work without any performance gain.
Hmm, with the query below, I can see that the new plan is cheaper than the old plan, and the cost difference exceeds STD_FUZZ_FACTOR.
create table t (a int, b int);
insert into t select i%100000, i from generate_series(1,10000000)i;
analyze t;
-- on master
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=120188.50..120188.51 rows=1 width=4)
   ->  Sort  (cost=120188.50..120436.95 rows=99379 width=4)
         Sort Key: a
         ->  HashAggregate  (cost=118697.82..119691.61 rows=99379 width=4)
               Group Key: a
               ->  Gather  (cost=97331.33..118200.92 rows=198758 width=4)
                     Workers Planned: 2
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(10 rows)
-- on patched
explain (costs on) select distinct a from t order by a limit 1;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=106573.93..106574.17 rows=1 width=4)
   ->  Unique  (cost=106573.93..130260.88 rows=99379 width=4)
         ->  Gather Merge  (cost=106573.93..129763.98 rows=198758 width=4)
               Workers Planned: 2
               ->  Sort  (cost=105573.91..105822.35 rows=99379 width=4)
                     Sort Key: a
                     ->  HashAggregate  (cost=96331.33..97325.12 rows=99379 width=4)
                           Group Key: a
                           ->  Parallel Seq Scan on t  (cost=0.00..85914.67 rows=4166667 width=4)
(9 rows)
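For reference, STD_FUZZ_FACTOR is 1.01 (see compare_path_costs_fuzzily() in pathnode.c), and the new plan clears that bar. A quick psql check, plugging in the Limit totals from the two plans above:

-- new total cost * STD_FUZZ_FACTOR is still below the old total cost,
-- so the fuzzy comparison would consider the new path genuinely cheaper
select 106574.17 * 1.01 as patched_total_fuzzed,  -- ~107639.91
       120188.51 as master_total;                 -- 107639.91 < 120188.51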
It seems that including a LIMIT clause can favor the new plan: with LIMIT 1, the Limit node's cost is dominated by its input's startup cost, and the patched path starts up much cheaper (106573.93 vs. 120188.50), even though its full total cost is higher (130260.88 for the Unique vs. 120436.95 for the Sort).
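To make that concrete, the Limit node's total cost is, roughly, the input's startup cost plus its run cost scaled by the fraction of rows fetched (this is approximately what adjust_limit_rows_costs() computes). Plugging in the numbers from the plans above reproduces the Limit costs shown:

-- limit_total ~ input_startup + (input_total - input_startup) * count / input_rows
select 120188.50 + (120436.95 - 120188.50) * 1 / 99379 as master_limit_total,   -- ~120188.51
       106573.93 + (130260.88 - 106573.93) * 1 / 99379 as patched_limit_total;  -- ~106574.17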