Re: Thousands of tables versus one table? - Mailing list pgsql-performance

From: david@lang.hm
Subject: Re: Thousands of tables versus one table?
Date: Tue, 5 Jun 2007
Msg-id: Pine.LNX.4.64.0706051325500.24361@asgard.lang.hm
In response to: Re: Thousands of tables versus one table? (Gregory Stark <stark@enterprisedb.com>)
Responses: Re: Thousands of tables versus one table?
List: pgsql-performance

On Tue, 5 Jun 2007, Gregory Stark wrote:

> "Scott Marlowe" <smarlowe@g2switchworks.com> writes:
>
>> Sorry, I think I initially read your response as "Postgres doesn't really get
>> any faster by breaking the tables up" without the "like that" part.
>
> Well breaking up the tables like that or partitioning, either way should be
> about equivalent really. Breaking up the tables and doing it in the
> application should perform even better but it does make the schema less
> flexible and harder to do non-partition based queries and so on.

but he said in the initial message that they don't do cross-customer
reports, so there really isn't any non-partition-based querying going on
anyway.

> I guess I should explain what I originally meant: A lot of people come from a
> flat-file world and assume that things get slower when you deal with large
> tables. In fact due to the magic of log(n) accessing records from a large
> index is faster than first looking up the table and index info in a small
> index and then doing a second lookup in an index for a table half the
> size.

however, if your query plan ever does a sequential scan of a table then
you are not doing a log(n) lookup, are you?
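
(as a quick illustration, assuming a hypothetical 'events' table with an
index on customer_id, EXPLAIN will show you which case you're in:

   -- hypothetical table and index, just to illustrate
   CREATE TABLE events (customer_id int, ts timestamp, payload text);
   CREATE INDEX events_customer_idx ON events (customer_id);

   -- selective predicate: the planner can use the log(n) index path
   EXPLAIN SELECT * FROM events WHERE customer_id = 42;

   -- no usable predicate: sequential scan, no log(n) anywhere
   EXPLAIN SELECT count(*) FROM events;

whether you actually get the index scan depends on the table statistics
and how selective the predicate looks to the planner, of course)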

> Where the win in partitioning comes in is in being able to disappear some of
> the data entirely. By making part of the index key implicit in the choice of
> partition you get away with a key that's half as large. And in some cases you
> can get away with using a different key entirely which wouldn't otherwise have
> been feasible to index. In some cases you can even do sequential scans whereas
> in an unpartitioned table you would have to use an index (or scan the entire
> table).
>
> But the real reason people partition data is really for the management ease.
> Being able to drop and load entire partitions in O(1) makes it feasible to
> manage data on a scale that would simply be impossible without partitioned
> tables.
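
for anyone following along, the setup Gregory describes is done with table
inheritance in current postgres. a rough sketch, with made-up names:

   -- parent table; with insert routing in place it holds no rows itself
   CREATE TABLE events (customer_id int, ts timestamp, payload text);

   -- one child per customer, with a CHECK constraint so that
   -- constraint_exclusion can skip irrelevant partitions entirely
   CREATE TABLE events_cust_42 (CHECK (customer_id = 42)) INHERITS (events);
   CREATE TABLE events_cust_43 (CHECK (customer_id = 43)) INHERITS (events);

   SET constraint_exclusion = on;

   -- dropping one customer's data is O(1): just drop the child
   DROP TABLE events_cust_42;

(you still need an insert rule or trigger to route new rows into the
right child table, which I've left out here)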

remember that the original question wasn't about partitioned tables; it
was about the performance problem he was having with one large table (slow
insert speed), and whether postgres would collapse if he changed his
schema to use a separate table per customer.
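
i.e. the schema he was asking about is just standalone tables, one per
customer, with the customer implicit in the table name. something like
(again, made-up names):

   -- no inheritance, no shared index; one plain table per customer
   CREATE TABLE events_cust_42 (ts timestamp, payload text);
   CREATE TABLE events_cust_43 (ts timestamp, payload text);

   -- an insert only touches that one customer's small table and
   -- indexes, which is where his insert-speed problem was
   INSERT INTO events_cust_42 VALUES (now(), 'some data');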

I see many cases where people advocate collapsing databases/tables
together by adding a column that indicates which customer each row is
for.

however, I really don't understand why it is more efficient to have a 5B
row table that you run a report/query against 0.1% of than it is to have
1000 different tables of 5M rows each and run a report/query against 100%
of one of them. it would seem that having to skip over 99.9% of the data
to find the rows that _may_ be relevant would have a noticeable cost in
and of itself.
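
concretely, the comparison I mean is something like:

   -- report against 0.1% of a 5B row table: even with an index,
   -- that's ~5M rows fetched from pages scattered all over the heap
   SELECT count(*) FROM events WHERE customer_id = 42;

   -- report against 100% of a 5M row table: one sequential scan,
   -- and every page read is relevant data
   SELECT count(*) FROM events_cust_42;

(hypothetical tables from the sketches above; the point is the access
pattern, not the exact numbers)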

David Lang
