Home > mailing lists

Re: group by will not use an index? - Mailing list pgsql-performance

From	Brian Herlihy
Subject	Re: group by will not use an index?
Date	January 10, 2007 00:07:19
Msg-id	20070110010703.7815.qmail@web52315.mail.yahoo.com Whole thread Raw
In response to	group by will not use an index? (tsuraan <tsuraan@gmail.com>)
Responses	Re: group by will not use an index?
List	pgsql-performance

Tree view

Actually, as I recently discovered, GROUP BY is faster than DISTINCT. It's just due to how they are implemented, so don't go looking for any deep reason :) The thread "GROUP BY vs DISTINCT" from 2006-12-20 discusses it. DISTINCT sorts the results to find the unique rows, but GROUP BY uses a hash.

Brian

----- Original Message ----
From: Adam Rich <adam.r@sbcglobal.net>
To: tsuraan <tsuraan@gmail.com>; pgsql-performance <pgsql-performance@postgresql.org>
Sent: Wednesday, 10 January, 2007 7:32:50 AM
Subject: Re: [PERFORM] group by will not use an index?

That query looks strange to me (a group by without an aggregate). See if this is

any faster:

SELECT DISTINCT DATE(inserted) FROM Messages

I won't hold my breath though, I don't think there's any way around the full table scan

in Postgres, because the index does not contain enough information about transactional

state, so table access is always required (unlike virtually every other type of db)

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of tsuraan
Sent: Tuesday, January 09, 2007 5:06 PM
To: pgsql-performance
Subject: [PERFORM] group by will not use an index?

I have a table of messages with paths and inserted dates (among other things), like so:

CREATE TABLE Messages (
    msgkey BIGSERIAL PRIMARY KEY,
    path TEXT NOT NULL,
    inserted TIMESTAMP WITHOUT TIMEZONE DEFAULT NOW()
);

I run a query to determine which days actually saw emails come in, like so:

SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

That's obviously not very efficient, so I made an index:

CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted));

However, GROUP BY does not use this index:

=# explain analyze select date(inserted) from messages group by date(inserted);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=104773.10..104789.51 rows=1313 width=8) (actual time=31269.476..31269.557 rows=44 loops=1)
   -> Seq Scan on messages (cost=0.00..101107.25 rows=1466340 width=8) (actual time=23.923..25248.400 rows=1467036 loops=1)
Total runtime: 31269.735 ms
(3 rows)

Is it possible to get pg to use an index in a group by? I don't see why it wouldn't be possible, but maybe I'm missing something.

Using pg 8.1.4...

pgsql-performance by date:

From: Scott Marlowe
Date: 09 January 2007, 22:42:12
Subject: Re: group by will not use an index?

From: "Steinar H. Gunderson"
Date: 10 January 2007, 00:10:14
Subject: Re: group by will not use an index?

Re: group by will not use an index? - Mailing list pgsql-performance

Previous

Next