Home > mailing lists

Re: Why so slow? - Mailing list pgsql-performance

From	Alessandro Gagliardi
Subject	Re: Why so slow?
Date	February 20, 2012 17:14:55
Msg-id	CAAB3BBKpF8AsDG7D5wdWz7eqK6YwYurMQ2ymjSToXsojXiv2wA@mail.gmail.com Whole thread Raw
In response to	Re: Why so slow? (Ants Aasma <ants.aasma@eesti.ee>)
List	pgsql-performance

Tree view

Ah, that did make a big difference! It went from taking 10x as long to taking only 1.5x as long (about what I would have expected, if not better.) Thank you!

On Fri, Feb 17, 2012 at 9:29 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Feb 17, 2012 8:35 PM, "Alessandro Gagliardi" <alessandro@path.com> wrote:
> Here is the EXPLAIN: http://explain.depesz.com/s/ley
>
> I'm using PostgreSQL 9.0.6 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
>
> My random_page_cost is 2 and yet it still insists on using Seq Scan on blocks.
As could be inferred from the row counts, it's slow because its joining and then aggregating a quarter of the blocks table. The hash join with its sequential scan is probably the correct choice for that type of join, it's the join itself that should be optimized out. The optimizer doesn't figure out that the join can be turned into a semi join if the output is aggregated with distinct and is from only one of the tables (in this case, because the output is the join key, it can be from either table).
To make the optimizers job easier you can rewrite it as a semi-join explicitly:
SELECT DISTINCT(user_id) FROM seen_its WHERE EXISTS (SELECT 1 FROM blocks WHERE blocks.user_id = seen_its.user_id) AND seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND now()::date::timestamp
--
Ants Aasma

pgsql-performance by date:

From: Ofer Israeli
Date: 20 February 2012, 16:17:05
Subject: Re: Insertions slower than Updates?

From: Alessandro Gagliardi
Date: 20 February 2012, 18:06:44
Subject: Indexes and Primary Keys on Rapidly Growing Tables

Re: Why so slow? - Mailing list pgsql-performance

Previous

Next