Large DB - Mailing list pgsql-general
From    | Mooney, Ryan
Subject | Large DB
Msg-id  | B295F3D09B0D3840BEA3B46C51FC389C1F5C06@pnlmse28.pnl.gov
List    | pgsql-general
Hello, I have a single table that just went over 234GB in size with about 290M+ rows, and I think I'm starting to approach some limits since things have gotten quite a bit slower over the last couple of days. The table is really simple and I'm mostly doing simple data-mining queries like the one included below. These have gone from taking under a few minutes to taking tens of minutes over the last week or so (a query like the one below would generally return 1-6 million rows from the base table).

The queries use the indexes fairly well, although I suspect that the column order of host/starttime is suboptimal (there are fewer distinct hosts than starttimes, and the table is naturally in starttime order). I'm going to try adding an index on just starttime (and later just host) and see if I can tune the queries further with that.

I never delete rows from the table, only do inserts (up to around 11,000/minute, mostly in one big burst every minute; this is anticipated to go up somewhat over time). There are about 32 processes doing the inserts (on the same machine - yeah, I know it'd be happier if they moved). I think it might help if there were only one, but for architectural reasons that won't happen for a while.

This is on a dual 3GHz Xeon with 4G RAM and an IDE-SCSI RAID array (ACNC). I'm running RH Fedora with kernel 2.4.22-1.2115.nptlsmp (we'd tried FreeBSD 4/5 early on, but insert speeds were actually better with RH9 by ~10% or so - this was pre-FreeBSD 5.2, but it's a bit late to migrate easily now). I'm trying to figure out ways to squeeze another ounce or two of performance out of this machine; I've included the things I've tuned so far below.
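For concreteness, the extra indexes I'm planning to experiment with would be roughly the following (index names are just placeholders; building each one against a ~290M row table will take a while, so I'll try starttime first):

    create index tp3_starttime on tp3 (starttime);
    create index tp3_host on tp3 (host);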
The query processes are mostly stuck in D state, so I expect that I'm hitting some hardware limitations, but I'm only doing sub-15MB/s from the disk array (from iostat) and I know it can do in the 40-60MB/s range when we tested the raw speed, and only 400 or so tps, which is also well under the array's limits. So I suspect that it's thrashing a bit; this is also indicated by the contrast between rrqm/s (read requests merged per second), which is pushing 2000, and the actual r/s (read requests issued to the device) at around 400 or so (same as tps). I suspect that a lot of the time is spent reading indexes, so a "better" indexing scheme may be my best bet.

Estimating the table size
-------------------------
stats=> select relfilenode,relname from pg_class where relfilenode=37057796;
 relfilenode | relname
-------------+---------
    37057796 | tp3

$ du -sc 37057796* | grep total
234002372       total

However the two indexes are also large (which may be part of the problem, and is why I'm trying just starttime for an index); they are currently in the 140-150G range. The query optimizer thinks I have ~290M rows (I'm not actually doing a real select count, since the last time I did that the table was around 10M rows or so and it took a long while; I don't want to wait days :).
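Incidentally, the planner's estimates can be pulled straight out of pg_class without a full count(*); reltuples and relpages are only as fresh as the last VACUUM/ANALYZE, so this is an estimate, not an exact count (relpages is counted in 8kB blocks with the default block size):

    stats=> select relname, relpages, reltuples
              from pg_class
             where relname like 'tp3%';

That covers the table and both indexes in one shot.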
------------------------------
stats=> explain select count(*) from tp3;
                               QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=7632998.20..7632998.20 rows=1 width=0)
   ->  Seq Scan on tp3  (cost=0.00..6906493.16 rows=290602016 width=0)
(2 rows)

Table def
----------------------------
stats=> \d tp3
              Table "public.tp3"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 host        | character(4)                |
 point       | character varying(64)       |
 type        | character(1)                |
 cooked      | character(1)                |
 starttime   | timestamp without time zone |
 intervallen | interval                    |
 arrivetime  | timestamp without time zone |
 pvalue      | numeric                     |
Indexes:
    "tp3_host_starttime" btree (host, starttime, cooked)
    "tp3_point_starttime" btree (point, starttime, cooked)

Sample data mining query:
----------------------------
select point, avg(pvalue) as avg
  from tp3
 where host in ('node', 'node', ....)
   and starttime between 'timestamp' and 'timestamp'
 group by point

Tuning done so far:
----------------------------
$ cat /etc/sysctl.conf
kernel.shmall=805306368
kernel.shmmax=805306368

$ egrep -v "^#|^$" postgresql.conf
shared_buffers = 60800
sort_mem = 1286720              # min 64, size in KB
vacuum_mem = 102400             # min 1024, size in KB
fsync = false                   # Play fast and loose - whee
max_files_per_process = 1000
wal_buffers = 16
checkpoint_segments = 20
checkpoint_timeout = 100
effective_cache_size = 160000
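For reference, assuming the default 8kB block size, those memory settings work out to roughly:

    shared_buffers       60800 pages  ~ 475 MB  (fits under the 768 MB shmmax)
    effective_cache_size 160000 pages ~ 1.2 GB
    sort_mem             1286720 KB   ~ 1.2 GB  (per sort operation)
    vacuum_mem           102400 KB    = 100 MB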