Re: Large DB - Mailing list pgsql-general
From: Mooney, Ryan
Subject: Re: Large DB
Date:
Msg-id: B295F3D09B0D3840BEA3B46C51FC389C1F5C0F@pnlmse28.pnl.gov
In response to: Large DB ("Mooney, Ryan" <ryan.mooney@pnl.gov>)
List: pgsql-general
Actually, in the case referenced I have over 1.4 billion rows. The 206238
row example should be no problem. I had no realistic performance issues
(based on disk I/O bandwidth, etc.) whatsoever at hundreds of millions of
rows, even with sub-optimal indexes. My average query has to reference
2-30 million rows, so yes, that can take up to a few minutes, but even
dd'ing that amount of data to/from the disk array took almost as long
(that's what I used as my base benchmark; if my query approaches that I
don't see any room to complain :>). I could only really see 206K rows
being a major issue IF they were really large rows AND you were doing
really complex queries AND the indexes were really poorly defined, or you
were on REALLY slow hardware. Unfortunately I didn't follow that thread
as closely, so I don't remember exactly what the issues expressed there
were.

> -----Original Message-----
> From: Bob.Henkel@hartfordlife.com [mailto:Bob.Henkel@hartfordlife.com]
> Sent: Tuesday, April 06, 2004 2:17 PM
> To: Ericson Smith
> Cc: Manfred Koizar; pgsql-general@postgresql.org;
> pgsql-general-owner@postgresql.org; Mooney, Ryan
> Subject: Re: [GENERAL] Large DB
>
> I'm a fairly new PostgreSQL user, and a long-time Oracle user, so keep
> that in mind.
>
> So you are telling me that for 206238 rows on a table the best route
> is to break the table into separate tables? To me that is horrible to
> have to do unless you really wanted to do that for some reason. And
> unless the count is way higher than I read, how is 206238 a large
> amount of rows for a table?
>
> From: Ericson Smith <eric@did-it.com>
> Sent by: pgsql-general-owner@postgresql.org
> Date: 04/06/2004 08:27 AM
> To: Manfred Koizar <mkoi-pg@aon.at>
> Cc: "Mooney, Ryan" <ryan.mooney@pnl.gov>, pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Large DB
>
> I've been following this thread with interest since it started, and it
> really seems that there is just too much data in that single table.
> When it comes down to it, making smaller separate tables seems to be
> the way to go. Querying will be a little harder, but much faster.
>
> Warmest regards,
> Ericson Smith
> Tracking Specialist/DBA
> http://www.did-it.com | eric@did-it.com | 516-255-0500
> "When you have to shoot, shoot, don't talk!" - Tuco
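For what it's worth, the "smaller separate tables" suggestion above would
usually be done with table inheritance in 7.x-era PostgreSQL. A minimal
sketch, splitting by month; the parent and child table names and the value
column are invented for illustration, only the host and starttime columns
come from the thread:

  -- Parent table carries the shared definition; each child holds one month.
  CREATE TABLE tp3_parent (
      host      text,
      starttime timestamptz,
      value     float8            -- placeholder for the real point columns
  );

  CREATE TABLE tp3_2004_03 () INHERITS (tp3_parent);
  CREATE TABLE tp3_2004_04 () INHERITS (tp3_parent);

  -- A query against the parent also scans all of the children:
  SELECT count(*)
    FROM tp3_parent
   WHERE starttime >= '2004-04-01' AND starttime < '2004-05-01';

Per-month tables keep VACUUM and index builds bounded, and an old month
can be dropped instead of being deleted row by row.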
>
> Manfred Koizar wrote:
>
> >On Sat, 03 Apr 2004 22:39:31 -0800, "Mooney, Ryan" <ryan.mooney@pnl.gov>
> >wrote:
> >
> >>Ok, so I ran a vacuum analyse. It took ~1.7 days to finish.
> >
> >Just to make it clear: VACUUM and ANALYSE are two different commands.
> >
> >VACUUM is for cleaning up. It has to visit every tuple in every page,
> >and if there are dead row versions, it also has to scan all indices
> >belonging to the table. If there are lots of deleted rows and
> >vacuum_mem is too small, VACUUM has to scan each index several times
> >to delete all index entries pointing to dead rows. This might raise
> >the cost to even more than O(tuplecount).
> >
> >ANALYSE collects a random sample of the rows in the table; the sample
> >size depends on default_statistics_target and the maximum value you
> >have set any column's statistics target to (ALTER TABLE ... ALTER
> >COLUMN ... SET STATISTICS ...). If you didn't touch either, the
> >sample size is 3000 rows. These 3000 rows are then sorted and counted
> >in different ways to generate statistics.
> >
> >The number of pages that have to be touched to collect the sample
> >depends on the table size, but it grows far less than proportionally
> >to the number of pages, nblocks. The cost growth rate is greater than
> >O(ln(nblocks)) and significantly less than O(nblocks). I have no
> >simple formula for it, but I estimate that analysing your tp3 table
> >would need between 28000 and 30000 page reads, which should be doable
> >in a few minutes.
> >
> >VACUUM ANALYSE is just VACUUM followed by ANALYSE, with the added
> >benefit that the number of rows does not have to be estimated by
> >ANALYSE, because VACUUM knows the exact value.
> >
> >>The invalid page block was caused when I tried the 2.6 kernel (for
> >>other reasons than DB performance). It's been there for a while, and
> >>I can deal with the data loss.
> >
> >>ERROR: invalid page header in block 10257032 of "tp3_point_starttime"
> >
> >AFAICS the invalid page is in an index, so there is no data loss. You
> >could simply drop and re-create that index. That might take some
> >time, though :-(
> >
> >>Here is the explain analyse; you can see why I think that an index
> >>on just host might be better - hosts are a small set, starttime is a
> >>large set, so the index should be more efficient,
> >
> >I think you got that backwards. If there are not many hosts, then an
> >index on host is not very selective, IOW you get a lot of hits when
> >you look for a particular host. OTOH if you select a sufficiently
> >small starttime interval, you get only a few rows, so using an index
> >is most efficient.
> >
> >>at the very least it should be (starttime, host), not (host,
> >>starttime), unless the indexing engine is smart enough to make that
> >>not matter (I suspect it's not???).
> >
> >Yes, it should be (starttime, host). And no, PG is generally not
> >smart enough to use an index if there is no condition on the first
> >index column.
> >
> >>  -> Index Scan using tp3_host_starttime, tp3_host_starttime,
> >>     [...], tp3_host_starttime on tp3
> >>     (cost=0.00..195.07 rows=1 width=25)
> >>     (actual time=59.42..96366.43 rows=206238 loops=1)
> >
> >Estimated number of rows: 1
> >Actual number of rows: 206238
> >The planner is way off here. Fortunately your huge number of rows
> >makes it rule out every other (most probably slower) plan.
> >
> >How many hosts are there? Even if there are a few hundred, an index
> >scan with that condition has to access and skip millions of index
> >tuples. An index on (starttime, host) would visit fewer index tuples,
> >and would more likely access the heap tuples in physical order.
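Concretely, the suggested index and the kind of query it serves would look
something like this; the index name, the timestamps, and the host value
are placeholders, only the tp3 table and its starttime/host columns come
from the thread:

  -- Lead with the selective column; the existing index is (host, starttime).
  CREATE INDEX tp3_starttime_host ON tp3 (starttime, host);

  -- A narrow starttime interval keeps the index scan small even though
  -- there are only a few distinct hosts:
  SELECT *
    FROM tp3
   WHERE starttime >= '2004-04-01 00:00'
     AND starttime <  '2004-04-01 06:00'
     AND host = 'somehost';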
> >
> >>Having followed the ongoing discussion about this I can concur that
> >>it is definitely NOT O(1). Unfortunately I didn't track the "time to
> >>vacuum".
> >
> >The discussion on -hackers and the patch I posted earlier today are
> >about ANALYSE, not VACUUM.
> >
> >>However I believe that I'm going to follow the suggestions about
> >>reducing the table size, so I'll have a brand new DB to play with in
> >>a couple weeks,
> >
> >Hopefully we'll see a success story here.
> >
> >>so knowing what I know now, I can track that if anyone's interested
> >>in the data besides me :)
> >
> >VACUUM and ANALYSE times? Certainly.
> >
> >Servus
> > Manfred
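For tracking those times, the two commands can be run and timed separately
from psql. A rough sketch for a 7.x-era setup, where vacuum_mem is the
relevant knob (the value below is illustrative only; tp3 is the table from
the thread):

  -- psql reports elapsed time per statement after \timing.
  \timing

  -- A roomier vacuum_mem lets VACUUM remember more dead-tuple pointers,
  -- so each index needs fewer scan passes.
  SET vacuum_mem = 65536;        -- value in KB; illustrative only

  VACUUM VERBOSE tp3;            -- cleanup: visits every heap page (and the
                                 -- indexes, if there are dead rows)
  ANALYZE tp3;                   -- statistics: samples ~3000 rows by default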