Re: Thousands of tables versus on table? - Mailing list pgsql-performance
From:           david@lang.hm
Subject:        Re: Thousands of tables versus on table?
Date:           2007-06-05
Msg-id:         Pine.LNX.4.64.0706051325500.24361@asgard.lang.hm
In response to: Re: Thousands of tables versus on table? (Gregory Stark <stark@enterprisedb.com>)
Responses:      Re: Thousands of tables versus on table?
List:           pgsql-performance
On Tue, 5 Jun 2007, Gregory Stark wrote:

> "Scott Marlowe" <smarlowe@g2switchworks.com> writes:
>
>> Sorry, I think I initially read your response as "Postgres doesn't
>> really get any faster by breaking the tables up" without the "like
>> that" part.
>
> Well, breaking up the tables like that or partitioning, either way
> should be about equivalent really. Breaking up the tables and doing it
> in the application should perform even better, but it does make the
> schema less flexible and harder to do non-partition based queries and
> so on.

but he said in the initial message that they don't do cross-customer
reports anyway, so there really isn't any non-partition based querying
going on.

> I guess I should explain what I originally meant: a lot of people come
> from a flat-file world and assume that things get slower when you deal
> with large tables. In fact, due to the magic of log(n), accessing
> records from a large index is faster than first looking up the table
> and index info in a small index and then doing a second lookup in an
> index for a table half the size.

however, if your query plan ever does a sequential scan of a table then
you are not doing a log(n) lookup, are you?

> Where the win in partitioning comes in is in being able to disappear
> some of the data entirely. By making part of the index key implicit in
> the choice of partition you get away with a key that's half as large.
> And in some cases you can get away with using a different key entirely
> which wouldn't otherwise have been feasible to index. In some cases
> you can even do sequential scans whereas in an unpartitioned table you
> would have to use an index (or scan the entire table).
>
> But the real reason people partition data is for the management ease.
> Being able to drop and load entire partitions in O(1) makes it
> feasible to manage data on a scale that would simply be impossible
> without partitioned tables.

remember that the original question wasn't about partitioned tables, it
was about the performance problem he was having with one large table
(slow insert speed) and asking if postgres would collapse if he changed
his schema to use a separate table per customer.

I see many cases where people advocate collapsing databases/tables
together by adding a column that indicates which customer each row is
for. however, I really don't understand why it is more efficient to
have a 5B-row table that you run a report/query against 0.1% of than to
have 1000 separate tables of 5M rows each and run a report/query
against 100% of one. it would seem that having to skip over 99.9% of
the data to find the rows that _may_ be relevant would have a
noticeable cost in and of itself.

David Lang
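
p.s. to spell out the comparison in my last paragraph, here is a rough
sketch (the table and column names are made up for illustration, and
the plans in the comments are what you'd typically expect, not output
from a real run):

    -- the single 5B-row table with a customer column: to reach one
    -- customer's 0.1% of the rows you descend a very large index and
    -- then hop around a very large heap fetching them
    CREATE INDEX events_all_customer_idx ON events_all (customer_id);
    SELECT * FROM events_all WHERE customer_id = 42;
    --  -> typically an index or bitmap scan picking ~5M rows out of 5B

    -- one table per customer: 100% of the table is relevant, so a
    -- plain sequential scan never skips over anyone else's data
    SELECT * FROM events_cust_42;
    --  -> Seq Scan over ~5M rows, no index needed at all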
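
and here is a minimal sketch of the partitioning Gregory describes,
using the inheritance + constraint-exclusion mechanism postgres has had
since 8.1 (again the names are made up, and a real setup would also
need a rule or trigger to route inserts to the right child table):

    -- parent table that queries are issued against
    CREATE TABLE events (
        customer_id integer     NOT NULL,
        logged_at   timestamptz NOT NULL,
        payload     text
    );

    -- one child per customer; the CHECK constraint makes customer_id
    -- implicit in the choice of partition, so the per-child index key
    -- can be half as large
    CREATE TABLE events_cust_42 (
        CHECK (customer_id = 42)
    ) INHERITS (events);
    CREATE INDEX events_cust_42_logged_idx ON events_cust_42 (logged_at);

    -- with constraint exclusion on, the planner "disappears" every
    -- child whose CHECK constraint can't match the WHERE clause
    SET constraint_exclusion = on;
    SELECT count(*) FROM events WHERE customer_id = 42;

    -- and dropping one customer's data is the O(1) management win:
    -- no 5M-row DELETE, no vacuum afterwards
    DROP TABLE events_cust_42;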