Thread: Performance tradeoff

Performance tradeoff

From
"Shawn Chisholm"
Date:
Hi All,

I am wondering about the relative performance of  "insert into table1 select distinct a,b from ..." and "insert into
table1select a,b from ... group by a,b" when querying tables of different sizes (10K, 100K, 1s, 10s, 100s of millions
ofrows).   

The distinct way tends to sort/unique and the group by tends to hash aggregate... any opinions on which is better?

I can also change the schema to a certain extent, so would it be worthwhile to put indices on the queried tables (or
refactorthem) hoping the distinct does an index scan instead of sort...  would the query planner take advantage of
that?

Thanks,

Shawn


Re: Performance tradeoff

From
Josh Berkus
Date:
Shawn,

> I can also change the schema to a certain extent, so would it be worthwhile
> to put indices on the queried tables (or refactor them) hoping the distinct
> does an index scan instead of sort...  would the query planner take
> advantage of that?

Use the GROUP BY, with an index on the grouped columns and lots of work_mem
(sort_mem in 7.4).   This will give the planner the option of a hashaggregate
which could be significantly faster than the other methods.

--
Josh Berkus
Aglio Database Solutions
San Francisco