Re: not using indexes on large table

From: Andreas Kostyrka
Subject: Re: not using indexes on large table
Date: ,
Msg-id: 20070421211704.GS4118@andi-lap
(view: Whole thread, Raw)
In response to: not using indexes on large table  (Jeroen Kleijer)
List: pgsql-performance

Tree view

not using indexes on large table  (Jeroen Kleijer, )
 Re: not using indexes on large table  (Vincenzo Romano, )
 Re: not using indexes on large table  (Andreas Kostyrka, )
 Re: not using indexes on large table  ("Dave Dutcher", )
 Re: not using indexes on large table  (Scott Marlowe, )

* Jeroen Kleijer <> [070421 23:10]:
>
> Hi all,
>
> I'm a bit new to PostgreSQL and database design in general so forgive me
> for asking stupid questions. ;-)
>
> I've setup a PostgreSQL database on a Linux machine (2 processor, 1GB
> mem) and while the database itself resides on a NetApp filer, via NFS,
> this doesn't seem to impact the performance to drastically.
>
> I basically use it for indexed tables without any relation between 'em
> so far this has worked perfectly.
>
> For statistics I've created the following table:
> volume varchar(30),
> qtree varchar(255),
> file varchar(512),
> ctime timestamp,
> mtime timestamp,
> atime timestamp
> annd created separate indexes on the volume and qtree columns.
>
> This table gets filled with the copy command and about 2 hours and
> some 40 million records later I issue a reindex command to make sure the
> indexes are accurate. (for good interest, there are some 35 values for
> volume and some 1450 for qtrees)
>
> While filling of this table, my database grows to an (expected) 11.5GB.
>
> The problems comes when I try to do a query without using a where clause
> because by then, it completely discards the indexes and does a complete
> table scan which takes over half an hour! (40.710.725 rows, 1110258
> pages, 1715 seconds)
>
> I've tried several things but doing a query like:
> select distinct volume from project_access_times
> or
> select distinct qtree from project_access_times
> always result in a full sequential table scan even after a 'vacuum' and
> 'vacuum analyze'.

Try:
select volume from project_access_times group by project_access_times;

And no matter, runnning a database over NFS smells like a dead rat.

Hopefully, you've mounted it hard, but still NFS does not have normal
semantics, e.g. locking, etc.

Next thing, as you've got only one client for that NFS mount, try to
make it to cache aggressivly meta data. The ac prefixed options in
nfs(5) come to mind.

Andreas


pgsql-performance by date:

From: Tom Lane
Date:
Subject: Re: Odd problem with planner choosing seq scan
From: Ulrich Cech
Date:
Subject: Re: Large objetcs performance