Re: Large DB - Mailing list pgsql-general
From | Mooney, Ryan |
---|---|
Subject | Re: Large DB |
Date | |
Msg-id | B295F3D09B0D3840BEA3B46C51FC389C1F5C0C@pnlmse28.pnl.gov Whole thread Raw |
In response to | Large DB ("Mooney, Ryan" <ryan.mooney@pnl.gov>) |
Responses |
Re: Large DB
|
List | pgsql-general |
> On Wed, 31 Mar 2004 10:08:26 -0800, "Mooney, Ryan" > <ryan.mooney@pnl.gov> > wrote: > >I haven't run ANALYZE on this table in a while. After about 50-60M > >rows it didn't seem to change the query plan at all and since there > >were never any deletes/updates it seemed like it wasn't > making much/any > >difference (should have been no pages to reclaim). > > Reclaiming pages is not the job of ANALYSE, VACUUM does this. Sorry, yes that's what I meant - lack of sleep :> > Might be a valid assumption as well -- if you're lucky. But > do you want to depend on luck? Eg. 75% of the today's rows > contain timestamps that are greater than what the planner > believes to be the maximum. Point taken. See vacuum info below... > No VACCUM, no ANALYSE, no REINDEX. This explains why the > planner thinks there are only 4M pages, which gives 640 > bytes/row if there were 50M rows at that time. OTOH the > EXPLAIN shows 290M rows for the seq scan. Something doesn't > fit together here. Ok, so I ran a vacuum analyse. It took ~1.7 days to finish. I'm running another one now to see how it does. I had turned off data updates during the vacuum below, I've turned them back on for the current run to get some idea about how its going to work ongoing, this means that the machine was basically idle during the vacuum so that's a pretty good picture of how long just the vacuum took. The invalid page block was caused when I tried the 2.6 kernel (for other reasons than DB performance), its been there for a while, and I can deal w/ the data loss so I just turned on zero_damaged_pages = true (a nice feature would be to be able to set this by table - I didn't see any way to do that), I did try doing the "select ctid from tp3 offset 10257032 limit 20;" and deleting rows with those ctid's, but even after deleting several hundred rows it was still complaining, so I gave up and moved on... INFO: vacuuming "public.tp3" INFO: index "tp3_host_starttime" now contains 1401476187 tuples in 19778677 pages DETAIL: 11657230 index tuples were removed. 108729 index pages have been deleted, 1 are currently reusable. CPU 892.96s/602.59u sec elapsed 149710.57 sec. ERROR: invalid page header in block 10257032 of "tp3_point_starttime" I had thought that there had never been any deletes/updates but my co-worker pointed out that we did do some updates on a couple million rows a couple months back, so that explains the removed index tuples. Here is the explain analyse, you can see why I think that an index on just host might be better - hosts are a small set, starttime is a large set so the index should be more Efficient, at the very least it should be (starttime, host), not (host, starttime) unless the indexing engine is smart enough to make that not matter (I suspect its not???). Db => explain analyse select point, avg(pvalue) as avg from tp3 where host in ('m480', 'm456', 'm455', 'm453', 'm452', 'm451', 'm450', 'm449', 'm368', 'm367', 'm366', 'm365', 'm364', 'm363', 'm362', 'm361', 'm783', 'm782', 'm781', 'm779', 'm778', 'm776', 'm774', 'm739', 'm738', 'm737', 'm736', 'm735', 'm732', 'm731', 'm730', 'm727') and starttime between '2004-03-30 07:28:25' and '2004-03-30 08:37:33' group by point; HashAggregate (cost=195.08..195.08 rows=1 width=25) (actual time=98667.32..98667.37 rows=12 loops=1) -> Index Scan using tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime, tp3_host_starttime on tp3 (cost=0.00..195.07 rows=1 width=25) (actual time=59.42..96366.43 rows=206238 loops=1) Index Cond: (((host = 'm480'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm456'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm455'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm453'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm452'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm451'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm450'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm449'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm368'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm367'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm366'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm365'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm364'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm363'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm362'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm361'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm783'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm782'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm781'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm779'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm778'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm776'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm774'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm739'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm738'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm737'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm736'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm735'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm732'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm731'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm730'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) OR ((host = 'm727'::bpchar) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone))) Filter: (((host = 'm480'::bpchar) OR (host = 'm456'::bpchar) OR (host = 'm455'::bpchar) OR (host = 'm453'::bpchar) OR (host = 'm452'::bpchar) OR (host = 'm451'::bpchar) OR (host = 'm450'::bpchar) OR (host = 'm449'::bpchar) OR (host = 'm368'::bpchar) OR (host = 'm367'::bpchar) OR (host = 'm366'::bpchar) OR (host = 'm365'::bpchar) OR (host = 'm364'::bpchar) OR (host = 'm363'::bpchar) OR (host = 'm362'::bpchar) OR (host = 'm361'::bpchar) OR (host = 'm783'::bpchar) OR (host = 'm782'::bpchar) OR (host = 'm781'::bpchar) OR (host = 'm779'::bpchar) OR (host = 'm778'::bpchar) OR (host = 'm776'::bpchar) OR (host = 'm774'::bpchar) OR (host = 'm739'::bpchar) OR (host = 'm738'::bpchar) OR (host = 'm737'::bpchar) OR (host = 'm736'::bpchar) OR (host = 'm735'::bpchar) OR (host = 'm732'::bpchar) OR (host = 'm731'::bpchar) OR (host = 'm730'::bpchar) OR (host = 'm727'::bpchar)) AND (starttime >= '2004-03-30 07:28:25'::timestamp without time zone) AND (starttime <= '2004-03-30 08:37:33'::timestamp without time zone)) Total runtime: 98760.26 msec (5 rows) > Hackers, what could update reltuples, but not relpages? > > Or, Ryan, is it possible that you already had 290M rows when > you ran ANALYSE and you have more than 1G rows today? Yes, appears to be the case... > BTW, ANALYSE is basically a constant time operation. Having followed the ongoing discusion about this I can concur that it is definitely NOT O(1). Unfortunately I didn't track the "time to vacuum" as The table grew. However I believe that I'm going to follow the suggestions about reducing the table size so I'll have a brand new BD to play with in a couple weeks, so knowing what I know now, I can track that if anyones interested in the data besides me :) > > Servus > Manfred >
pgsql-general by date: