Thread: PostgreSQL not using index for statement with group by

PostgreSQL not using index for statement with group by

From
Mark Starkman
Date:

I am new to PostgreSQL and I am evaluating it for use as a data  warehouse.  I am really struggling to get a simple query to perform well.  I have put the appropriate indexes on the table (at least they are appropriate from my use with several other RDBMS’s).  However, the query doesn’t perform well, and I’m not sure how to get it to return in reasonable amount of time.  Right now the query takes between 2 – 3 minutes to return.  There are about 39 million rows in the table. Here is all of the information that I have.  Please let me know if you I have done anything wrong or what needs to change.

 

Thanks,

Mark

 

Table Definition:

CREATE TABLE temp_inventory_fact

(

  item_id integer NOT NULL,

  date_id timestamp with time zone NOT NULL,

  "CBL_Key" integer NOT NULL,

  product_group_id integer NOT NULL,

  supplier_id numeric(19) NOT NULL,

  "Cost" numeric(19,9) NOT NULL,

  qty_on_hand numeric(19,9) NOT NULL,

  qty_allocated numeric(19,9) NOT NULL,

  qty_backordered numeric(19,9) NOT NULL,

  qty_on_po numeric(19,9) NOT NULL,

  qty_in_transit numeric(19,9) NOT NULL,

  qty_reserved numeric(19,9) NOT NULL,

  nonstock_id boolean NOT NULL

)

WITH (

  OIDS=FALSE

);

 

Query:

select product_group_id, SUM("Cost")

FROM temp_inventory_Fact

where product_group_id < 100

group by product_group_id

order by product_group_id

limit 50;

 

Indexes on table:

CREATE INDEX idx_temp_inventory_fact_product_cost ON temp_inventory_fact (product_group_id, "Cost");

CREATE INDEX idx_temp_inventory_fact_product ON temp_inventory_fact (product_group_id);

Re: PostgreSQL not using index for statement with group by

From
"Kevin Grittner"
Date:
Mark Starkman <mark.starkman@activant.com> wrote:

> I'm not sure how to get it to return in
> reasonable amount of time.

Some more information could help.

What version of PostgreSQL is this?

Please give an overview of the hardware and OS.

Please show your postgresql.conf file, excluding comments.

Please run your query with EXPLAIN ANALYZE in front, so we can see the
execution plan, with cost estimates compared to actual information.
If the the plan indicates a sequential scan, and you think an indexed
scan may be faster, you might be able to coerce it into the indexed
plan for diagnostic purposes by running this on the connection before
an EXPLAIN ANALYZE run:

set enable_seqscan = off;

You don't want to leave it off, or try to use that in production, but
it might be useful in figuring out what's going on.

That might be enough to diagnose the issue.

-Kevin

Re: PostgreSQL not using index for statement with group by

From
Scott Marlowe
Date:
On Thu, Sep 3, 2009 at 7:33 AM, Mark Starkman<mark.starkman@activant.com> wrote:
> I am new to PostgreSQL and I am evaluating it for use as a data  warehouse.
> I am really struggling to get a simple query to perform well.  I have put
> the appropriate indexes on the table (at least they are appropriate from my
> use with several other RDBMS’s).

Ok, first things first.  Pgsql isn't like most other dbms.  It's
indexes do not contain visibility info, which means that if the db
were to use the indexes to look up entries in a table, it still has to
go back to the table to look those values up to see if they are
visible to the current transation.

So, if you're retrieving a decent percentage of the table, it's
cheaper to just hit the table.  Note that this makes PostgreSQL poorly
suited for very wide tables.

Generally the trick to making large accesses run fast in pgsql is to
increase work_mem.  But some queries just aren't efficient in pgsql
that can be efficient in other dbs.

Possibly clustering on product_group_id would help.