Re: Large DB - Mailing list pgsql-general

From Mooney, Ryan
Subject Re: Large DB
Date
Msg-id B295F3D09B0D3840BEA3B46C51FC389C1F5C0F@pnlmse28.pnl.gov
In response to Large DB  ("Mooney, Ryan" <ryan.mooney@pnl.gov>)
List pgsql-general
Actually in the case referenced I have over 1.4 billion rows.  The
206238 row example should be no problem.  I had no performance issues
whatsoever (beyond what is realistic given disk I/O bandwidth, etc.)
in the hundreds of millions of rows, even with sub-optimal indexes.
My average query has to reference 2-30 million rows, so yes, that can
take up to a few minutes, but even dd'ing that amount of data to/from
the disk array took almost as long (that's what I used as my base
benchmark; if my query approaches that I don't see any room to
complain :>)

I could only really see 206K rows being a major issue IF they were
really large rows
AND you were doing really complex queries AND the indexes were really
poorly defined,
or you were on REALLY slow hardware.  Unfortunately I didn't follow that
thread as
closely, so I don't remember exactly what the issues expressed there
were.

> -----Original Message-----
> From: Bob.Henkel@hartfordlife.com
> Sent: Tuesday, April 06, 2004 2:17 PM
> To: Ericson Smith
> Cc: Manfred Koizar; pgsql-general@postgresql.org;
> pgsql-general-owner@postgresql.org; Mooney, Ryan
> Subject: Re: [GENERAL] Large DB
>
> I'm a fairly new Postgresql user.  And a long time Oracle user, so
> keep that in mind.
>
> So you are telling me that for 206238 rows on a table the best route
> is to break the table into separate tables?  To me that is horrible
> to have to do unless you really wanted to do that for some reason.
> And unless the count is way higher than I read, how is 206238 a
> large number of rows for a table?
>
>
> From: Ericson Smith <eric@did-it.com>
> Sent by: pgsql-general-owner@postgresql.org
> Date: 04/06/2004 08:27 AM
> To: Manfred Koizar <mkoi-pg@aon.at>
> Cc: "Mooney, Ryan" <ryan.mooney@pnl.gov>, pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Large DB
>
> I've been following this thread with interest since it
> started, and it really seems that there is just too much data
> in that single table. When it comes down to it, making
> smaller separate tables seems to be the way to go. Querying
> will be a little harder, but much faster.
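>
> One possible shape for that, just as a sketch (the table names and
> the monthly split are made up for illustration):
>
>   -- carve the big table up by month (illustrative names/ranges)
>   CREATE TABLE tp3_200403 AS
>     SELECT * FROM tp3
>      WHERE starttime >= '2004-03-01' AND starttime < '2004-04-01';
>   CREATE TABLE tp3_200404 AS
>     SELECT * FROM tp3
>      WHERE starttime >= '2004-04-01' AND starttime < '2004-05-01';
>
>   -- a view for the occasional query that has to span months
>   CREATE VIEW tp3_recent AS
>     SELECT * FROM tp3_200403
>     UNION ALL SELECT * FROM tp3_200404;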
>
> Warmest regards,
> Ericson Smith
> Tracking Specialist/DBA
> +-----------------------+---------------------------------+
> | http://www.did-it.com | "When you have to shoot, shoot, |
> | eric@did-it.com       | don't talk!             - Tuco  |
> | 516-255-0500          |                                 |
> +-----------------------+---------------------------------+
>
>
>
> Manfred Koizar wrote:
>
> >On Sat, 03 Apr 2004 22:39:31 -0800, "Mooney, Ryan"
> ><ryan.mooney@pnl.gov>
> >wrote:
> >
> >
> >>Ok, so I ran a vacuum analyse.  It took ~1.7 days to finish.
> >>
> >>
> >
> >Just to make it clear:  VACUUM and ANALYSE are two different
> >commands.
> >
> >VACUUM is for cleaning up.  It has to visit every tuple in every
> >page, and if there are dead row versions, it also has to scan all
> >indices belonging to the table.  If there are lots of deleted rows
> >and vacuum_mem is too small, VACUUM has to scan each index several
> >times to delete all index entries pointing to dead rows.  This
> >might raise the cost to even more than O(tuplecount).
> >
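> >A sketch of what that looks like in practice (the value is in
> >kilobytes, and 64000 is only illustrative):
> >
> >  SET vacuum_mem = 64000;   -- room for more dead-row pointers, so
> >                            -- each index is scanned fewer times
> >  VACUUM VERBOSE tp3;
> >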
> >ANALYSE collects a random sample of the rows in the table; the
> >sample size depends on default_statistics_target and the maximum
> >value you have set any column's statistics target to (ALTER TABLE
> >... ALTER COLUMN ... SET STATISTICS ...).  If you didn't touch
> >either, the sample size is 3000 rows.  Then these 3000 rows are
> >sorted and counted in different ways to generate statistics.
> >
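> >For example, to get a larger sample for one column and then
> >re-analyse (the target of 100 is just an example value):
> >
> >  ALTER TABLE tp3 ALTER COLUMN starttime SET STATISTICS 100;
> >  ANALYSE tp3;
> >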
> >The number of pages that have to be touched to collect the sample
> >depends on the table size, but it grows far less than
> >proportionally to the number of pages, nblocks.  The cost growth
> >rate is greater than O(ln(nblocks)) and significantly less than
> >O(nblocks).  I have no simple formula for it, but I estimate that
> >analysing your tp3 table would need between 28000 and 30000 page
> >reads, which should be doable in a few minutes.
> >
> >VACUUM ANALYSE is just VACUUM followed by ANALYSE, with the added
> >benefit that the number of rows does not have to be estimated by
> >ANALYSE, because VACUUM knows the exact value.
> >
> >
> >
> >>The invalid page block was caused when I tried the 2.6 kernel (for
> >>other reasons than DB performance), it's been there for a while,
> >>and I can deal w/ the data loss
> >>
> >>
> >
> >
> >
> >>ERROR:  invalid page header in block 10257032 of
> >>"tp3_point_starttime"
> >>
> >>
> >
> >AFAICS the invalid page is in an index, so there is no data loss.
> >You could simply drop and re-create that index.  That might take
> >some time, though :-(
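> >
> >Something like this (the column list is only a guess from the index
> >name):
> >
> >  -- index definition is a guess; check \d tp3 for the real one
> >  DROP INDEX tp3_point_starttime;
> >  CREATE INDEX tp3_point_starttime ON tp3 (point, starttime);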
> >
> >
> >
> >>Here is the explain analyse, you can see why I think that an index
> >>on just host might be better - hosts are a small set, starttime is
> >>a large set so the index should be more efficient,
> >>
> >>
> >
> >I think you got that backwards.  If there are not many hosts, then
> >an index on host is not very selective, IOW you get a lot of hits
> >when you look for a particular host.  OTOH if you select a
> >sufficiently small starttime interval, you get only a few rows, so
> >using an index is most efficient.
> >
> >
> >
> >>at the very least it should be (starttime, host), not
> >>(host, starttime) unless the indexing engine is smart enough to
> >>make that not matter (I suspect it's not???).
> >>
> >>
> >
> >Yes, it should be (starttime, host).  And no, PG is generally not
> >smart enough to use an index if there is no condition on the first
> >index column.
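> >
> >For instance (the host value and time window are made up):
> >
> >  CREATE INDEX tp3_starttime_host ON tp3 (starttime, host);
> >
> >  SELECT * FROM tp3
> >   WHERE starttime >= '2004-04-01 00:00'
> >     AND starttime <  '2004-04-01 06:00'
> >     AND host = 'host42';   -- made-up host name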
> >
> >
> >
> >>  ->  Index Scan using tp3_host_starttime, tp3_host_starttime,
> >>      [...], tp3_host_starttime on tp3
> >>(cost=0.00..195.07 rows=1 width=25) (actual time=59.42..96366.43
> >>rows=206238 loops=1)
> >>
> >>
> >
> >Estimated number of rows: 1
> >Actual number of rows: 206238
> >The planner is way off here.  Fortunately your huge number of rows
> >makes it rule out every other (most probably slower) plan.
> >
> >How many hosts are there?  Even if there are a few hundred, an index
> >scan with that condition has to access and skip millions of index
> >tuples.  An index on (starttime, host) would visit fewer index
> >tuples, and would more likely access the heap tuples in physical
> >order.
> >
> >
> >
> >>Having followed the ongoing discussion about this I can concur
> >>that it is definitely NOT O(1).  Unfortunately I didn't track the
> >>"time to vacuum"
> >>
> >>
> >
> >The discussion on -hackers and the patch I posted earlier today are
> >about ANALYSE, not VACUUM.
> >
> >
> >
> >>However I believe that I'm going to follow the suggestions about
> >>reducing the table size so I'll have a brand new DB to play with
> >>in a couple weeks,
> >>
> >>
> >
> >Hopefully we'll see a success story here.
> >
> >
> >
> >>so knowing what I know now, I can track that if anyone's
> >>interested in the data besides me :)
> >>
> >>
> >
> >VACUUM and ANALYSE times?  Certainly.
> >
> >Servus
> > Manfred
> >
