Re: Large number of tables slow insert - Mailing list pgsql-performance

From: Scott Carey
Subject: Re: Large number of tables slow insert
Msg-id: a1ec7d000808241808r15beea9jeac0b7bbd9e767f7@mail.gmail.com
In response to: Re: Large number of tables slow insert ("Loic Petit" <tls.wydd@free.fr>)
List: pgsql-performance
I don't know whether the max_fsm_relations issue will solve your problem or not. I do know that you definitely want to increase it to a number larger than the sum of all your tables and indexes, preferably with room to grow. Additionally, the max_fsm_pages value will likely need to be increased as your data size grows.
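As a rough sketch, the relevant postgresql.conf entries look something like the following (the numbers here are placeholders; size them to your own table and index counts):

    # one slot per relation (table or index); leave headroom to grow
    max_fsm_relations = 30000

    # total free-space pages tracked cluster-wide; raise as data grows
    max_fsm_pages = 2000000

A database-wide VACUUM VERBOSE should report at the end of its output whether these settings are currently large enough.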

I work with about 9000 tables at the moment (growing each day) and do not see your issue.  I do not have indexes on most of my tables, and max_fsm_relations is set to 30000.

Although this will increase the number of tables even more, you may want to consider partitioning your tables by time: day, week, or month.
This way, you may not even need an index on the date, as queries will only scan the tables covering the date range specified. (NOTE: this is not true if you use prepared statements; prepared statements + partitioned tables = performance disaster, since the plan is built before the parameter values are known, so the planner cannot exclude any partitions.)
In addition, this may allow you to add the indexes on the partitioned tables at a later date. For example:

Partition by week: the current week's table has no indexes and is thus fast to insert into. But once it becomes last week's table and you are only inserting into a new table, the old one can have indexes added to it; it is now a mostly read-only table. This way, full scans are only needed on the current week's table, which will usually be smaller than the others and more likely to be cached in memory as well. You may want to partition by day or month instead.
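To make that concrete, here is a minimal sketch of the inheritance-based partitioning PostgreSQL 8.x offers (the table and column names are made up for illustration; adapt them to your schema):

    -- parent table; all queries go through this one
    CREATE TABLE readings (
        sensor_id   integer,
        ts          timestamptz NOT NULL,
        temperature real,
        humidity    real,
        light       real
    );

    -- one child per week; the CHECK constraint is what lets the planner
    -- skip partitions outside the queried date range
    CREATE TABLE readings_2008w34 (
        CHECK (ts >= '2008-08-18' AND ts < '2008-08-25')
    ) INHERITS (readings);

    -- once inserts have moved on to the next week's child, index the
    -- now mostly read-only partition
    CREATE INDEX readings_2008w34_ts_idx ON readings_2008w34 (ts);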
You may want to combine several sensors into one table, so that you can partition by day or even hour. It all depends on how you expect to access the data later and how much effort you can afford to spend managing all those tables; postgres only does some of the partitioning work for you, and you have to be very careful with your queries. There are some query-optimizer oddities with partitioned tables that one has to be aware of.
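For instance, constraint exclusion must be switched on, and it only prunes partitions when the date range appears as literals in the query text, which is exactly why prepared statements defeat it. A sketch, reusing the hypothetical readings table above:

    SET constraint_exclusion = on;

    -- the planner scans only the children whose CHECK constraints
    -- overlap this literal range; other weeks are skipped entirely
    SELECT * FROM readings
    WHERE ts >= '2008-08-18' AND ts < '2008-08-19';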

On Sun, Aug 24, 2008 at 3:30 PM, Loic Petit <tls.wydd@free.fr> wrote:
Quite a lot of answers!


> Are all the INSERTs made by one application? Maybe PREPARED STATEMENTS will help you? What about optimization at the application level?
Yes, it's all from one application (through JDBC). Optimizations (grouped transactions, prepared statements) will be done, but this was our very first test under hard conditions, and it scared us all :p.


> Can you tell us what kind of application this is? It sounds like a control systems application where you will write the current values of the sensors with each scan of a PLC.  If so, is that a good idea?  Also is 3,000 sensors realistic? That would be a lot of sensors for one control system.
Our research project is trying to manage large-scale sensor network deployments. 3,000 is quite a huge deployment, but it can be realistic, for a large agricultural deployment for example.


> That would be an insert plus updates to each of your 6 indexes every 0.33 ms. Is that a good idea?  Is there a better strategy? What are you measuring with the instruments e.g. is this a process plant or manufacturing facility? What will people do with this data?
I try to drop as many of the indexes as I can. Actually, I only really need the index on the timestamp, to see for example the last readings and to search historical data by period; the others (on the values) are more for queries like "when was this sensor over 45ºC", and those can do without indexes (slow to query, but less heavy at insert time). I get the data from different TelosB motes that gather temperature / humidity / light.
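Under the current one-table-per-sensor layout, the timestamp index mostly serves queries of this shape (sensor_0042 is a hypothetical per-sensor table name):

    -- most recent reading of one sensor
    SELECT * FROM sensor_0042 ORDER BY ts DESC LIMIT 1;

    -- historical data for a period
    SELECT * FROM sensor_0042
    WHERE ts >= '2008-08-01' AND ts < '2008-08-25';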


> Have you checked what you are bottlenecking on - CPU or disk? Try iostat/top/etc. during the inserts. Also check actual disk utilization (iostat -x on Linux/FreeBSD; varies on others) to see what percentage of time the disk/storage device is busy.
I looked at the results of iostat and top: the postgres process was at 70% CPU. Yes, I know that my test machine is not brand new, but I have to find a good solution on this hardware.

OK, I just ran some tests. It seems that I started hammering the tables too soon after creating them, so VACUUM ANALYZE could not be run first. I have better results now:

Average time to write 10 rows into each table:

ON 1000 TABLES
    Without indexes at all: ~1.5s
    With only the index on timestamp: ~2.5s
    With all indexes: ~30s

ON 3000 TABLES
    Without indexes at all: ~8s
    With only the index on timestamp: ~45s
    With all indexes: ~3min

I don't know why, but the difference with indexes is quite huge! When I ran my vacuum, the system warned me about "max_fsm_relations" (1000). Do you think raising it will change something (as Scott said)? I didn't have time to run tests with VACUUM ANALYZE on the system tables; see you tomorrow for other news...
