Thread: select count() out of memory
Hi

I am volume testing a db model that consists of partitioned tables. The db has been running for a week and a half now and has built up to approx 55000 partition tables of 18000 rows each, so the root table contains about 1 billion rows. When I try to do a "select count(*)" on the root table, it does some work for a while, perhaps 5-10 minutes, and then aborts with

ERROR: out of memory
DETAIL: Failed on request of size 130.

Does anybody have any suggestion as to which parameter I should tune to give it more memory to be able to perform queries on the root table?

regards
thomas

The last part of the db log is the following; I don't think anything other than the last 2 lines is relevant.

pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
MdSmgr: 4186112 total in 9 blocks; 911096 free (4 chunks); 3275016 used
LockTable (locallock hash): 2088960 total in 8 blocks; 418784 free (25 chunks); 1670176 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR: out of memory
DETAIL: Failed on request of size 130.
<tfinneid@student.matnat.uio.no> writes:

> ERROR: out of memory
> DETAIL: Failed on request of size 130.
>
> Does anybody have any suggestion as to which parameter I should tune to
> give it more memory to be able to perform queries on the root table?

This indicates that malloc() failed, which means the system couldn't provide any more memory. Either you have a very low memory ulimit (look at ulimit -a in the same session as Postgres) or your machine is really low on memory. Perhaps you have shared_buffers set very high, or some other program is using all your available memory (and swap)?

> The last part of the db log is the following; I don't think anything other
> than the last 2 lines is relevant.

You're wrong. All the lines like:

> pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
> chunks); 696 used

are a dump of Postgres's current memory allocations and could be useful in showing if there's a memory leak causing this.

Also, what version of Postgres is this?

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Hi

I have tried to answer to the best of my knowledge, but it's running on Solaris 10, and I am not that familiar with Solaris (Go Linux!!! :)

> Either you have a very low memory ulimit (look at ulimit -a
> in the same session as Postgres) or your machine is really low on memory.
> Perhaps you have shared_buffers set very high, or some other program is
> using all your available memory (and swap)?

The machine has 32GB RAM. I don't know how much swap it has, but I do know the disk system is a disk cluster with 16x450GB disks; it probably has a local disk as well, but I don't know how big it is.

-bash-3.00$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 10
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 16357
virtual memory (kbytes, -v) unlimited

This is my config:

checkpoint_segments = 96
effective_cache_size = 128000
shared_buffers = 430000
max_fsm_pages = 208000
max_fsm_relations = 10000

max_connections = 1000

autovacuum = off                # enable autovacuum subprocess?

fsync = on                      # turns forced synchronization on or off
#full_page_writes = on          # recover from partial page writes
wal_sync_method = fdatasync
wal_buffers = 256

commit_delay = 5
#commit_siblings = 5            # range 1-1000

> Also, what version of Postgres is this?

Apparently it's 8.1.8, I thought it was 8.2.

> are a dump of Postgres's current memory allocations and could be useful in
> showing if there's a memory leak causing this.

The file is 20M; these are the last lines (the first line continues until ff_26000):

idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_4_value2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_4_value1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_4_trace_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value7: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_3_trace_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value7: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_2_trace_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value7: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value2: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
idx_attributes_g1_seq_1_ff_1_trace_id: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
MdSmgr: 4186112 total in 9 blocks; 911096 free (4 chunks); 3275016 used
LockTable (locallock hash): 2088960 total in 8 blocks; 418784 free (25 chunks); 1670176 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR: out of memory
DETAIL: Failed on request of size 130.
<tfinneid@student.matnat.uio.no> writes: > max_connections = 1000 Do you actually have anywhere near this number of processes? What is your setting for work_mem? Keep in mind every process could use as much as work_mem and actually it's possible to use that much several times over. Also, what is your maintenance_work_mem and do you have many vacuums or other such commands running at the time? 1,000 processes is a large number of processes. You may be better off re-architecting to run fewer processes simultaneously. But if that's not possible you'll have to keep it in mind to tune other things properly. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
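For reference, the settings Gregory is asking about can be checked from any psql session; this is only a convenience illustration, not part of the original exchange:

  SHOW work_mem;               -- per-sort / per-hash memory; one query may use it several times over
  SHOW maintenance_work_mem;   -- used by VACUUM, CREATE INDEX and similar commands
  SHOW max_connections;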
God morgen

Please display these memory settings from your postgresql.conf file:

  sort_mem
  shared_buffers

Takk
Martin
> <tfinneid@student.matnat.uio.no> writes:
>> max_connections = 1000
>
> Do you actually have anywhere near this number of processes? What is your
> setting for work_mem? Keep in mind every process could use as much as work_mem
> and actually it's possible to use that much several times over.
>
> Also, what is your maintenance_work_mem and do you have many vacuums or other
> such commands running at the time?
>
> 1,000 processes is a large number of processes. You may be better off
> re-architecting to run fewer processes simultaneously. But if that's not
> possible you'll have to keep it in mind to tune other things properly.

The application only needs about 20 connections under normal situations, but might need up to 100 in some situations, e.g. if there is much latency and new connections arrive before another is finished. I could certainly reduce the number to 100 or 50, but do you think that would help with this problem?

regards
thomas
tfinneid@student.matnat.uio.no wrote:

> The file is 20M; these are the last lines (the first line continues
> until ff_26000):
>
> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
> chunks); 632 used

You have 26000 partitions???

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> God morgen > > Please display these memory settings from your postgresql.conf file > sort_mem > shared_buffers I have shown the entire configuration. if its not in the configuration shown, I have changed its value. I have used the configuration example provided by Sun regarding running postgres on solaris. http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp regards thomas
> tfinneid@student.matnat.uio.no wrote:
>
>> The file is 20M; these are the last lines (the first line continues
>> until ff_26000):
>>
>> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
>> chunks); 632 used
>
> You have 26000 partitions???

At the moment the db has 55000 partitions, and that's only a fifth of the complete volume the system will have in production. The reason I chose this solution is that a partition will be loaded with new data every 3-30 seconds, and all that will be read by up to 15 readers every time new data is available. The data will be approx 2-4TB in production in total. So it will be too slow if I put it in a single table with permanent indexes.

I did a test previously, where I created 1 million partitions (without data) and I checked the limits of pg, so I think it should be ok.

thomas
tfinneid@student.matnat.uio.no wrote:

> At the moment the db has 55000 partitions, and that's only a fifth of the
> complete volume the system will have in production. The reason I chose
> this solution is that a partition will be loaded with new data every 3-30
> seconds, and all that will be read by up to 15 readers every time new data
> is available. The data will be approx 2-4TB in production in total. So it
> will be too slow if I put it in a single table with permanent indexes.
>
> I did a test previously, where I created 1 million partitions (without
> data) and I checked the limits of pg, so I think it should be ok.

Clearly it's not. The difference could be the memory usage and wastage for all those relcache entries and other stuff. I would reduce the number of partitions to a more reasonable value (within the tens, most likely).

Maybe your particular problem can be solved by raising max_locks_per_transaction (?) but I wouldn't count on it.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
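For anyone following along: max_locks_per_transaction is set in postgresql.conf (the default is 64, and changing it requires a server restart). The current value, and how many locks are actually being held, can be inspected from SQL:

  SHOW max_locks_per_transaction;   -- 64 unless it has been raised
  SELECT count(*) FROM pg_locks;    -- locks currently held across the cluster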
> I have shown the entire configuration. if its not in the configuration > shown, I have changed its value. I meant to say "I haven't changed its value" thomas
Alvaro Herrera <alvherre@commandprompt.com> writes: > tfinneid@student.matnat.uio.no wrote: >> I did a test previously, where I created 1 million partitions (without >> data) and I checked the limits of pg, so I think it should be ok. > Clearly it's not. You couldn't have tested it too much --- even planning a query over so many tables would take forever, and actually executing it would surely have run the system out of locktable space before it even started scanning. The partitioning facility is designed for partition counts in the tens, or maybe hundreds at the most. regards, tom lane
>> I did a test previously, where I created 1 million partitions (without
>> data) and I checked the limits of pg, so I think it should be ok.
>
> Clearly it's not.

It does not mean my problem has anything to do with the number of partitions. It might have, or it might not, and that's the problem: the cause has not been located yet. According to the documented limits of pg, it should be supported.

> The difference could be the memory usage and wastage for all those
> relcache entries and other stuff. I would reduce the number of
> partitions to a more reasonable value (within the tens, most likely).

The db worked fine until it reached perhaps 30-40 thousand partitions.
> Alvaro Herrera <alvherre@commandprompt.com> writes: >> tfinneid@student.matnat.uio.no wrote: >>> I did a test previously, where I created 1 million partitions (without >>> data) and I checked the limits of pg, so I think it should be ok. > >> Clearly it's not. > > You couldn't have tested it too much --- even planning a query over so > many tables would take forever, and actually executing it would surely > have run the system out of locktable space before it even started > scanning. And this is the testing, so you're right.... Its only the select on the root table that fails. Operations on a single partitions is no problem. > The partitioning facility is designed for partition counts in the tens, > or maybe hundreds at the most. Maybe, but it works even on 55000 partitions as long as the operations are done against a partition and not the root table.
On 10/25/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > tfinneid@student.matnat.uio.no wrote: > >> I did a test previously, where I created 1 million partitions (without > >> data) and I checked the limits of pg, so I think it should be ok. > > > Clearly it's not. > > You couldn't have tested it too much --- even planning a query over so > many tables would take forever, and actually executing it would surely > have run the system out of locktable space before it even started > scanning. > > The partitioning facility is designed for partition counts in the tens, > or maybe hundreds at the most. I've had good results well into the hundreds, but after about 400 or so, things start to get a bit wonky.
tfinneid@student.matnat.uio.no wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > The partitioning facility is designed for partition counts in the tens, > > or maybe hundreds at the most. > > Maybe, but it works even on 55000 partitions as long as the operations are > done against a partition and not the root table. It will work on a million partitions and more, provided you do operations on single partitions. What you want to do is not possible, period. Maybe when we redesign partitioning, but that's far into the future. Kindly do not waste our time (nor yours). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 10/25/07, tfinneid@student.matnat.uio.no <tfinneid@student.matnat.uio.no> wrote:
>
> Its only the select on the root table that fails. Operations on a single
> partitions is no problem.

Not sure I understand exactly what you're saying. Are you selecting directly from the child table, or from the parent table with constraint_exclusion turned on?

If you're hitting the child table directly, you aren't actually using partitioning. It's a wholly independent table at that point.

If you're hitting a single child table through the parent table via constraint_exclusion, then you are using partitioning, but only hitting one physical table.

But hitting the parent table with no constraining where clause is a recipe for disaster. The very reason to use partitioning is so that you never have to scan through a single giant table.

Anyway, you're heading off into new territory with 55,000 partitions. What is the average size, in MB, of one of your partitions? I found with my test there was a point of diminishing returns after 400 or so partitions, at which point indexes were no longer needed, because the average query just seq scanned the partitions it needed, and they were all ~ 16 or 32 Megs.
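To make Scott's distinction concrete, here is a minimal sketch of constraint-exclusion-based partitioning. The table and column names are invented for illustration; they are not the poster's actual schema:

  -- parent plus one child whose CHECK constraint describes its slice of the data
  CREATE TABLE attributes (seq int, trace_id int, value1 numeric);
  CREATE TABLE attributes_seq_42 (CHECK (seq = 42)) INHERITS (attributes);

  SET constraint_exclusion = on;

  -- with a constraining WHERE clause the planner can skip all the other children;
  -- without one, every child table has to be scanned
  SELECT count(*) FROM attributes WHERE seq = 42;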
On Oct 25, 2007, at 9:36 AM, tfinneid@student.matnat.uio.no wrote:

> The db worked fine until it reached perhaps 30-40 thousand partitions.

It depends on how you have the partitions set up and how you're accessing them. Are all of these partitions under the same parent table? If so, then trying to run a SELECT COUNT(*) against the parent table is simply insane. Think about it: you're asking one query to scan 55000 tables. What you need to do is partition based on your access patterns, not what you *think* will help with performance down the road. Look into constraint exclusion, whether or not you can just access child tables directly, and whether you really need all of these under one logical table. Also, no matter how you do the partitioning, once you get up to that many and more relations in your system, dumps and restores take a lot longer.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
On 10/25/07, tfinneid@student.matnat.uio.no <tfinneid@student.matnat.uio.no> wrote:
> I am volume testing a db model that consists of partitioned tables. The
> db has been running for a week and a half now and has built up to approx
> 55000 partition tables of 18000 rows each, so the root table contains
> about 1 billion rows. When I try to do a "select count(*)" on the root
> table, it does some work for a while, perhaps 5-10 minutes, and then
> aborts with
>
> ERROR: out of memory
> DETAIL: Failed on request of size 130.

So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard of anyone having 60,000 or so partitions in a table, and she looked at me like I had a third eye in my forehead and said in her sweet voice "Well, that would certainly be an edge case". She sounded like she was worried about me.
> It will work on a million partitions and more, provided you do
> operations on single partitions.

That's good enough for me, that's exactly what I want. I just used the select count() on the root to get a feeling for how many rows there were in total. And then I thought that the error message was just a configuration issue. But since doing operations like that on a root table of this magnitude is not a good idea, I won't.

> What you want to do is not possible, period. Maybe when we redesign
> partitioning, but that's far into the future. Kindly do not waste our
> time (nor yours).

Thank you for that prompt reply. In all fairness, that's why I asked the question here: to find out the facts, not to be abused for being ignorant about pg.

thomas
tfinneid@student.matnat.uio.no wrote:

>> It will work on a million partitions and more, provided you do
>> operations on single partitions.
>
> That's good enough for me, that's exactly what I want.

In that case, why use partitions at all? They are simple independent tables.

-- 
Alvaro Herrera                         http://www.PlanetPostgreSQL.org/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)
Scott Marlowe escribió: > So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard > of anyone having 60,000 or so partitions in a table, and she looked at > me like I had a third eye in my forehead and said in her sweet voice > "Well, that would certainly be an edge case". She sounded like she > was worried about me. Did you get rid of that third eye already? I would be equally worried. -- Alvaro Herrera http://www.advogato.org/person/alvherre "Someone said that it is at least an order of magnitude more work to do production software than a prototype. I think he is wrong by at least an order of magnitude." (Brian Kernighan)
>> The db worked fine until it reached perhaps 30-40 thousand partitions.
>
> It depends on how you have the partitions set up and how you're
> accessing them. Are all of these partitions under the same parent
> table? If so, then trying to run a SELECT COUNT(*) against the parent
> table is simply insane. Think about it: you're asking one query to
> scan 55000 tables. What you need to do is partition based on your
> access patterns, not what you *think* will help with performance down
> the road. Look into constraint exclusion, whether or not you can
> just access child tables directly, and whether you really need all of
> these under one logical table. Also, no matter how you do the
> partitioning, once you get up to that many and more relations in your
> system, dumps and restores take a lot longer.

The design is based on access patterns, i.e. one partition represents a group of data along a discrete axis, so the partitions are perfect for modeling that. Only the last partition will be used in normal cases. The previous partitions only need to exist until the operator deletes them, which will be sometime between 1-6 weeks.

Regarding dumps and restore; the system will always be offline during those operations and it will be so for several days, because a new project might start at another location in the world, so the travelling there takes time. In the mean time, all admin tasks can be performed without problems, even backup operations that take 3 days.

regards
thomas
On Oct 25, 2007, at 10:36 AM, tfinneid@student.matnat.uio.no wrote:

> The design is based on access patterns, i.e. one partition represents a
> group of data along a discrete axis, so the partitions are perfect for
> modeling that. Only the last partition will be used in normal cases.
> The previous partitions only need to exist until the operator deletes
> them, which will be sometime between 1-6 weeks.
>
> Regarding dumps and restore; the system will always be offline during
> those operations and it will be so for several days, because a new project
> might start at another location in the world, so the travelling there
> takes time. In the mean time, all admin tasks can be performed without
> problems, even backup operations that take 3 days.

Excellent, it sounds like you should be fine then. One thing to note: if you want to get an "idea" of how many rows you have in your partitions, you can run a SUM aggregate on reltuples in pg_class for all of your partitions. The more recent the last ANALYZE for each table, the more accurate those values will be.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
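A sketch of the query Erik is describing; the parent table name used here ('attributes') is only a stand-in for whatever the real root table is called:

  -- approximate total row count over all children of a parent table, without scanning them;
  -- accuracy depends on how recently each child was ANALYZEd
  SELECT sum(c.reltuples) AS approx_rows
  FROM pg_class c
  JOIN pg_inherits i ON i.inhrelid = c.oid
  WHERE i.inhparent = 'attributes'::regclass;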
> Are you selecting directly from the child table, or from the parent
> table with constraint_exclusion turned on?

The problem was when selecting from the parent table; selecting from child tables is no problem. As stated in other replies, I only wanted to know how many rows were in the table in total, it is not a part of the actual operations of the server.

> But hitting the parent table with no constraining where clause is a
> recipe for disaster. The very reason to use partitioning is so that
> you never have to scan through a single giant table.

So I have found out...

> Anyway, you're heading off into new territory with 55,000 partitions.

Perhaps, but I am only using the child tables for actual operations. I also have a couple of indexes on each child table, so there are now about 150000 indexes as well. The intended operations of the server work fine; it's the select on the parent table that fails.

> What is the average size, in MB, of one of your partitions? I found
> with my test there was a point of diminishing returns after 400 or so
> partitions, at which point indexes were no longer needed, because the
> average query just seq scanned the partitions it needed, and they were
> all ~ 16 or 32 Megs.

I have no idea, but I suspect a couple of megabytes each; at least that's the size of the raw data. Then maybe add a couple of megabytes more for internal stuff.

regards
thomas
> So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard
> of anyone having 60,000 or so partitions in a table, and she looked at
> me like I had a third eye in my forehead and said in her sweet voice
> "Well, that would certainly be an edge case". She sounded like she
> was worried about me.

That means I am exploring new territory, which is good. Of course, there is a possibility the design could be done in a different way so as not to need that many partitions. But I asked on this list some months ago for tips on how to design this data model, and what I came up with is what I have today. The problem is that the work is proprietary and confidential, so it's difficult for me to explain in detail what I want to do. But you could look at my previous post asking the question for a description of the problem I want to solve.

regards
thomas
> In that case, why use partitions at all? They are simple independent
> tables.

For two reasons:
- the data logically belongs together.
- it is more practical to create tables as children of a parent table than
  as independent tables:
  - changes to the parent table are applied to all partitions, which prevents
    partitions with diverging DDL.
  - performing the create operation does not require the source code to
    contain the DDL of the parent table.

regards
thomas
> Excellent, it sounds like you should be fine then. One thing to > note: if you want to get an "idea" of how many rows you have in your > partitions, you can run a SUM aggregate on reltuples in pg_class for > all of your partitions. The more recent the last ANALYZE for each > table, the more accurate those values will be. cool, thanks. regards thomas
tfinneid@student.matnat.uio.no wrote:

> For two reasons:
> - the data logically belongs together.
> - it is more practical to create tables as children of a parent table than
>   as independent tables:
>   - changes to the parent table are applied to all partitions, which prevents
>     partitions with diverging DDL.
>   - performing the create operation does not require the source code to
>     contain the DDL of the parent table.

You can use CREATE TABLE LIKE, which copies the definition but does not set the inheritance.

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador y ve si funciona"
(Carlos Duclós)
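For contrast, a minimal sketch of the two approaches being discussed, continuing the invented table names from the earlier illustration (these are not the poster's real names):

  -- inheritance: the child stays attached to the parent, and queries on the parent see its rows
  CREATE TABLE attributes_seq_43 () INHERITS (attributes);

  -- LIKE: copies the column definitions but produces a completely independent table
  CREATE TABLE attributes_seq_44 (LIKE attributes);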
tfinneid@student.matnat.uio.no writes:

>> In that case, why use partitions at all? They are simple independent
>> tables.
>
> For two reasons:
> - the data logically belongs together.
> - it is more practical to create tables as children of a parent table than
>   as independent tables:
>   - changes to the parent table are applied to all partitions, which prevents
>     partitions with diverging DDL.
>   - performing the create operation does not require the source code to
>     contain the DDL of the parent table.

In other words, you really should have only one table; they aren't independent. What you need to do is dial down your ideas of how many partitions are reasonable to have.

			regards, tom lane
> You can use CREATE TABLE LIKE, which copies the definition but does not
> set the inheritance.

I know, but point 1 of reason 2 was why I decided against it. I weighed the different options when I was trying to decide, and I decided on the partitions. So unless there turns out to be a problem with the real operations, I can't see any reason to change.

regards
thomas
On Oct 25, 2007, at 11:16 AM, Alvaro Herrera wrote:

> You can use CREATE TABLE LIKE, which copies the definition but does not
> set the inheritance.

That won't propagate changes made later.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
> In other words, you really should have only one table; they aren't
> independent. What you need to do is dial down your ideas of how many
> partitions are reasonable to have.

Yes, but no. Each partition represents a chunk of information on a discrete timeline, so there is no point in grouping it all into a single table: the access pattern is to only access data from a specific point in time, i.e. a single partition, usually the latest. Since the amount of data is so big (approx 3MB per second), and each partition needs to be indexed before the clients start reading the data (in the same second), I find it better to use partitions, even though I am not actually using the partitioning itself.

regards
thomas
> This is my config:
>
> checkpoint_segments = 96
> effective_cache_size = 128000
> shared_buffers = 430000
> max_fsm_pages = 208000
> max_fsm_relations = 10000
>
> max_connections = 1000
>
> autovacuum = off                # enable autovacuum subprocess?
>
> fsync = on                      # turns forced synchronization on or off
> #full_page_writes = on          # recover from partial page writes
> wal_sync_method = fdatasync
> wal_buffers = 256
>
> commit_delay = 5
> #commit_siblings = 5            # range 1-1000

Now that you hopefully understand more about the server I am building, does anybody have any suggestions for improving the config? I could certainly reduce max_connections to a tenth of what it is, but are there other settings that could either be reduced or increased, or even set to a non-default value?

regards
thomas
tfinneid@student.matnat.uio.no writes:

>> In other words, you really should have only one table; they aren't
>> independent. What you need to do is dial down your ideas of how many
>> partitions are reasonable to have.
>
> Yes, but no. Each partition represents a chunk of information on a
> discrete timeline, so there is no point in grouping it all into a single
> table: the access pattern is to only access data from a specific point
> in time, i.e. a single partition, usually the latest. Since the amount
> of data is so big (approx 3MB per second), and each partition needs to
> be indexed before the clients start reading the data (in the same
> second), I find it better to use partitions, even though I am not
> actually using the partitioning itself.

You are making a common beginner error, which is to suppose that N little tables are better than one big one. They are not. What you're effectively doing is replacing the upper levels of a big table's indexes with lookups in the system catalogs, which in point of fact is a terrible tradeoff from a performance standpoint.

From a database-theory standpoint, if all this data is alike then you should have it all in one big table. There are certain practical cases where it's worth partitioning, but not at the level of granularity that you are proposing. This is why nobody, not even Oracle, tries to support tens of thousands of partitions.

			regards, tom lane
Alvaro Herrera wrote: ... > > You can use CREATE TABLE LIKE, which copies the definition but does not > set the inheritance. > Well, sort of. Unless I'm using it incorrectly it only copies the basic column definitions and, as optionally specified, defaults and some of the constraints. Primary key constraints are lost as CREATE TABLE newtable (LIKE oldtable) does not create any indexes including those necessary for the primary key. I don't know how foreign-key constraints are handled as I haven't used this form of CREATE TABLE where foreign keys are involved. Cheers, Steve
On 10/25/07, Steve Crawford <scrawford@pinpointresearch.com> wrote: > Alvaro Herrera wrote: > ... > > > > You can use CREATE TABLE LIKE, which copies the definition but does not > > set the inheritance. > > > > Well, sort of. > > Unless I'm using it incorrectly it only copies the basic column > definitions and, as optionally specified, defaults and some of the > constraints. > > Primary key constraints are lost as CREATE TABLE newtable (LIKE > oldtable) does not create any indexes including those necessary for the > primary key. > > I don't know how foreign-key constraints are handled as I haven't used > this form of CREATE TABLE where foreign keys are involved. Neither inheritance nor creating LIKE will inherit primary keys. Foreign keys will not be inherited by either method either.
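In other words, whichever way the child tables are created, indexes and any primary key have to be added to each child explicitly. A hypothetical example, continuing the invented names from the sketches above:

  -- neither INHERITS nor LIKE carries the parent's primary key or indexes over
  ALTER TABLE attributes_seq_44 ADD PRIMARY KEY (seq, trace_id);
  CREATE INDEX idx_attributes_seq_44_value1 ON attributes_seq_44 (value1);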
Tom Lane wrote:

> You are making a common beginner error, which is to suppose that N
> little tables are better than one big one. They are not.

Well, that depends on how you define better. For my purposes, it is better.

> What you're
> effectively doing is replacing the upper levels of a big table's indexes
> with lookups in the system catalogs, which in point of fact is a
> terrible tradeoff from a performance standpoint.

Only if you assume I use all data in all tables all the time. But as I have explained in other replies recently, most of the time only data from the newest child table is used.

I did the performance tests before deciding on the design, and having it all in one large table would not perform within requirements at all. The reason was that the indexes for the ever-growing table would take longer and longer to update at each insert. When I use partitions, or child tables, I can use COPY to insert the data into the new child table and then add the indexes to that single table only. That was, by far, the fastest solution.

> From a database-theory standpoint, if all this data is alike then you
> should have it all in one big table.

Then what is the point of partitions, if you cannot use them to separate logically similar data into different partitions when one has a need to do so? Of course I could have put it in a single table, had it not been for the performance. I could have used a discrete timestamp to separate the data, but why? Partitions are more practical.

> There are certain practical cases
> where it's worth partitioning, but not at the level of granularity that
> you are proposing.

If it's practical to use partitions, granularity does not come into the equation.

regards
thomas
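A rough sketch of the load pattern Thomas describes, with invented table, column and file names; it is meant only to illustrate the COPY-then-index ordering, not his actual code:

  -- create the next child for the new chunk of the timeline
  CREATE TABLE attributes_seq_45 (CHECK (seq = 45)) INHERITS (attributes);

  -- bulk-load the incoming data while the child still has no indexes
  -- (server-side path shown; psql's \copy is the client-side equivalent)
  COPY attributes_seq_45 FROM '/tmp/incoming_chunk_45.dat';

  -- build the indexes once, after the data is in place, instead of maintaining them row by row
  CREATE INDEX idx_attributes_seq_45_trace_id ON attributes_seq_45 (trace_id);
  CREATE INDEX idx_attributes_seq_45_value1 ON attributes_seq_45 (value1);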
On 10/25/07, Thomas Finneid <tfinneid@student.matnat.uio.no> wrote:
>
> I did the performance tests before deciding on the design, and having it
> all in one large table would not perform within requirements at all. The
> reason was that the indexes for the ever-growing table would take longer
> and longer to update at each insert. When I use partitions, or child
> tables, I can use COPY to insert the data into the new child table and
> then add the indexes to that single table only. That was, by far, the
> fastest solution.

It may well be that one big table and partial indexes would do what you want. Did you explore partial indexes against one big table? That can be quite handy. i.e.

create table mybigtable (ts timestamp, id int primary key, row1 numeric, ..... rown numeric);

populate with a year's worth of data, i.e. 100M rows or something like that, then

create index mybigtable_20071025 on mybigtable (id)
  where ts between '2007-10-25 00:00:00' and '2007-10-25 23:59:59.99999';

repeat as needed. Now, when you want something from the table, you can just ask for it with a timestamp range and it will hit the index, and the table, all pretty fast. Worth a look I guess.
"Thomas Finneid" <tfinneid@student.matnat.uio.no> writes: >> What you're >> effectively doing is replacing the upper levels of a big table's indexes >> with lookups in the system catalogs, which in point of fact is a >> terrible tradeoff from a performance standpoint. > > Only if you assume I use all data in all tables all the time. But as I have > explained in other replies recently, most of the times only data from the > newest child table is used. Tom's point is that if you have 55k tables then just *finding* the newest child table is fairly expensive. You're accessing a not insignificant-sized index and table of tables. And the situation is worse when you consider the number of columns all those tables have, all the indexes those tables have, all the column keys those indexes the tables have have, etc. Nonetheless you've more or less convinced me that you're not completely nuts. I would suggest not bothering with inheritance though. Inheritance imposes additional costs to track the inheritance relationships. For your purposes you may as well just create separate tables and not bother trying to use inheritance. > If its practical to use partitions, granularity does not come into the > equation. Uhm, yes it does. This is engineering, it's all about trade-offs. Having 55k tables will have costs and benefits. I think it's a bit early to dismiss the costs. Keep in mind that profiling them may be a bit tricky since they occur during planning and DDL that you haven't finished experimenting with yet. The problem you just ran into is just an example of the kind of costs it imposes. You should also consider some form of compromise with separate tables but at a lower level of granularity. Perhaps one partition per day instead of one per 30s. you could drop a partition when all the keys in it are marked as dead. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Em Thursday 25 October 2007 13:36:09 tfinneid@student.matnat.uio.no escreveu:

> Regarding dumps and restore; the system will always be offline during
> those operations and it will be so for several days, because a new project
> might start at another location in the world, so the travelling there
> takes time. In the mean time, all admin tasks can be performed without
> problems, even backup operations that take 3 days.

This sounds a lot like oil exploration... Data gathered from sensors is usually a few TBs, explosions happen at definite intervals, there is interference between sensors, etc.

Putting the data inside the DB fast is part of the solution; getting it out fast to be processed / analyzed is another part.

-- 
Jorge Godoy <jgodoy@gmail.com>
Gregory Stark wrote:

> Tom's point is that if you have 55k tables then just *finding* the newest
> child table is fairly expensive. You're accessing a not insignificant-sized
> index and table of tables. And the situation is worse when you consider the
> number of columns all those tables have, all the indexes those tables have,
> all the column keys those indexes have, etc.

Yes, I got that. But I name the child tables so that when my server receives a read request, I can derive the exact child table name from details in the request, without the system needing to do any internal searches to find the newest table.

> Nonetheless you've more or less convinced me that you're not completely nuts.

Thank you for only regarding me as somewhat nuts :)

> I would suggest not bothering with inheritance though. Inheritance imposes
> additional costs to track the inheritance relationships. For your purposes you
> may as well just create separate tables and not bother trying to use
> inheritance.

As addressed in a previous reply, I find inheritance better for a couple of practical reasons.

>> If it's practical to use partitions, granularity does not come into the
>> equation.
>
> Uhm, yes it does. This is engineering, it's all about trade-offs. Having 55k
> tables will have costs and benefits. I think it's a bit early to dismiss the
> costs. Keep in mind that profiling them may be a bit tricky since they occur
> during planning and DDL that you haven't finished experimenting with yet. The
> problem you just ran into is just an example of the kind of costs it imposes.

See my answer above on why granularity is not relevant in my case.

> You should also consider some form of compromise with separate tables but at a
> lower level of granularity. Perhaps one partition per day instead of one per
> 30s. You could drop a partition when all the keys in it are marked as dead.

The structure of the data is divided along a discrete timeline, so every predefined x seconds a whole new bunch of data arrives, and all of that belongs in a single partition.

regards
thomas
Scott Marlowe wrote:

> It may well be that one big table and partial indexes would do what
> you want. Did you explore partial indexes against one big table?
> That can be quite handy.

Hmm, interesting, I suppose it could work. Thanks for the suggestion, I'll keep it in mind.

regards
thomas
Jorge Godoy wrote:

> Em Thursday 25 October 2007 13:36:09 tfinneid@student.matnat.uio.no escreveu:
>> Regarding dumps and restore; the system will always be offline during
>> those operations and it will be so for several days, because a new project
>> might start at another location in the world, so the travelling there
>> takes time. In the mean time, all admin tasks can be performed without
>> problems, even backup operations that take 3 days.
>
> This sounds a lot like oil exploration... Data gathered from sensors is
> usually a few TBs, explosions happen at definite intervals, there is
> interference between sensors, etc.

Sorry, I can't talk about what the work actually is; a colleague of mine just got reprimanded for merely mentioning he was working on a compression library. The manager thought he was revealing *too much* :)

> Putting the data inside the DB fast is part of the solution; getting it out
> fast to be processed / analyzed is another part.

But you are right about that part, things need to be fast.

regards
thomas
On Fri, Oct 26, 2007 at 08:26:13AM +0200, Thomas Finneid wrote:

> Scott Marlowe wrote:
>> It may well be that one big table and partial indexes would do what
>> you want. Did you explore partial indexes against one big table?
>> That can be quite handy.
>
> Hmm, interesting, I suppose it could work. Thanks for the suggestion,
> I'll keep it in mind.

That's still going to have to do a tablescan on the whole dataset (a couple of terabytes?) before building the index, isn't it? That doesn't sound like something you'd want to do too often.

Are there any thoughts of deferring index updates so that many rows can be merged simultaneously, rather than doing many individual index operations? It sounds as though this is what Thomas is really after, and it would also remove the need for dropping indexes while doing a bulk insert of data. I apologise if this has been discussed before!

Sam
On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:

> Gregory Stark wrote:
>> Tom's point is that if you have 55k tables then just *finding* the newest
>> child table is fairly expensive. You're accessing a not insignificant-sized
>> index and table of tables. And the situation is worse when you consider the
>> number of columns all those tables have, all the indexes those tables have,
>> all the column keys those indexes have, etc.
>
> Yes, I got that. But I name the child tables so that when my server
> receives a read request, I can derive the exact child table name from
> details in the request, without the system needing to do any internal
> searches to find the newest table.

I think the lookup that is being referred to is the fact that if you've got 55k (plus) files in a directory then the filesystem still has to perform a search in the directory to locate the actual file associated with the filename. There probably isn't going to be much difference between the filesystem performing this lookup vs the database descending a few levels of its index structure.

I have a feeling you may have thought about this before, but thought it wouldn't hurt to point it out more explicitly.

Sam
"Sam Mason" <sam@samason.me.uk> writes: > On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote: >> Gregory Stark wrote: >> >Tom's point is that if you have 55k tables then just *finding* the newest >> >child table is fairly expensive. You're accessing a not insignificant-sized >> >index and table of tables. And the situation is worse when you consider the >> >number of columns all those tables have, all the indexes those tables have, >> >all the column keys those indexes the tables have have, etc. >> >> Yes, I got that. But I name the child tables so that I when my server >> receives read requests, I retreive details from the request to be able >> to figure out the exact child table name, without the system needing to >> do any internal searches to find the newest table. > > I think the lookup that is being referred to is the fact that if you've > got 55k (plus) files in a directory then the filesystem still has to > perform a search in the directory to locate the actual file associated > with the filename. There probably isn't going to be much difference > between the filesystem performing this lookup vs the database descending > a few levels of its index structure. That's true but it's in *addition* to the database having to find the catalog records for the table which involves an index lookup itself. Actually many index lookups since it has to look up the catalog record for the table, for all the columns of the table, for all indexes of the table, for all the index keys of those indexes, all constraints of the table, all triggers of the table, all dependencies on other objects and of other objects on this table (this latter is why I suggest not using inheritance). Each of these lookups is using an index to find the table out of 55k records which is just the same work that you're saving in the top level of the index tree. If you think there's one obvious solution then you just haven't analyzed the problem seriously. In serious engineering there are always tradeoffs. The cleanest prettiest solution is not necessarily -- and in cases where you're dealing with large numbers like this almost certainly isn't -- the optimal choice. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Fri, Oct 26, 2007 at 02:21:39PM +0100, Gregory Stark wrote: > "Sam Mason" <sam@samason.me.uk> writes: > > I think the lookup that is being referred to is the fact that if you've > > got 55k (plus) files in a directory then the filesystem still has to > > perform a search in the directory to locate the actual file associated > > with the filename. There probably isn't going to be much difference > > between the filesystem performing this lookup vs the database descending > > a few levels of its index structure. > > That's true but it's in *addition* to the database having to find the catalog > records for the table which involves an index lookup itself. > > Actually many index lookups since it has to look up the catalog record for the > table, for all the columns of the table, for all indexes of the table, for all > the index keys of those indexes, all constraints of the table, all triggers of > the table, all dependencies on other objects and of other objects on this > table (this latter is why I suggest not using inheritance). OK, sounds pretty horrible to contemplate. If most of the queries are to the same table (sounds like they will be) then the relevant information will remain resident in memory, won't it? Sam
> "Sam Mason" <sam@samason.me.uk> writes: > >> On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote: >>> Gregory Stark wrote: >>> >Tom's point is that if you have 55k tables then just *finding* the >>> newest >>> >child table is fairly expensive. You're accessing a not >>> insignificant-sized >>> >index and table of tables. And the situation is worse when you >>> consider the >>> >number of columns all those tables have, all the indexes those tables >>> have, >>> >all the column keys those indexes the tables have have, etc. >>> >>> Yes, I got that. But I name the child tables so that I when my server >>> receives read requests, I retreive details from the request to be able >>> to figure out the exact child table name, without the system needing to >>> do any internal searches to find the newest table. >> >> I think the lookup that is being referred to is the fact that if you've >> got 55k (plus) files in a directory then the filesystem still has to >> perform a search in the directory to locate the actual file associated >> with the filename. There probably isn't going to be much difference >> between the filesystem performing this lookup vs the database descending >> a few levels of its index structure. > > That's true but it's in *addition* to the database having to find the > catalog > records for the table which involves an index lookup itself. > > Actually many index lookups since it has to look up the catalog record for > the > table, for all the columns of the table, for all indexes of the table, for > all > the index keys of those indexes, all constraints of the table, all > triggers of > the table, all dependencies on other objects and of other objects on this > table (this latter is why I suggest not using inheritance). > > Each of these lookups is using an index to find the table out of 55k > records > which is just the same work that you're saving in the top level of the > index > tree. > > If you think there's one obvious solution then you just haven't analyzed > the > problem seriously. In serious engineering there are always tradeoffs. The > cleanest prettiest solution is not necessarily -- and in cases where > you're > dealing with large numbers like this almost certainly isn't -- the optimal > choice. Serious engineering does not imply perfect engineering, I have analyzed it and made my tradeoffs. What you are forgetting here is that you clearly dont understand the enire solution, So I will try to explain it again. And if you still think its bonkers, the I urge you to come up with a solution that works with the requirements. Every predefined X seconds (e.g. 3,6,9,12 etc ) a bunch of data arrives, which must be stored by descrete time groups, e.g. second 3,6,9,12. The data that arrives is approx 4MB per second, so in this case its 12MB. This has to be processed by the server and written to the db, within 1 second. There can be up to 5 writers at the same time. Within that same second, at least 16 readers should be able to read all the data, *each*. Writers and readers are only concerned with the latest data, i.e. data from the latest time group, e.g. second 9. This has to go on every predefined seconds for the next 6-12 weeks, without stop, pause or any errors. These are the requirements. When I performed performance tests I found several unwanted effects from several test scenarios. Here are the most important ones: - single large table, with indexes created when table is created. 
- this leads to the performance of an insert degrading as more data is added, when I get to 1 billion rows it took 50 seconds to add the data. My lesson from this is that - single inserts can never be efficient enough - indexes cause linear performance drop as data volume increases So I tried a different approach, which would address both issues: - separate tables for each bulk of data - use of bulk insert through jdbc COPY. - add indexes to the newly create table after the copy is finished. My lesson from this is: - insert take constant time, no matter how much data is in the base - adding the indexes after insert takes constant time, i.e. some milliseconds. From this I realised that using either single tables or partitions is the way to go, since I only need to access the latest data, i.e. the newest table, in normal situations. After thinking about it and discussing with this group, I found that using partitions would be more practical for two reasons: - changes to the parent table is automatically propagated to all child tables, so the schema remains consistent and the server wont brake because of differences in the tables. - it is more maintainable to use "create with inheritance" sql in source code than the entire ddl of the table. So now I have tested the server 24/7 for a week and a half, with 1 writer and 16 readers writing all the mentioned data, and everything works fine. Expect for the select on the parent table, which now runs out of memory. Which in it self is not a problem since I will never use the parent table in production in any case. regards tom
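A condensed sketch of the loading sequence described above; the table, file and index names are assumptions, and in the real system the load goes through the JDBC COPY interface rather than a server-side file:

    -- 1. a fresh, empty child for the new time group
    CREATE TABLE attributes_g1_seq_1_ff_10 () INHERITS (attributes);

    -- 2. bulk-load the batch; the child has no indexes yet, so the load
    --    cost depends only on the size of this batch, not on the billion
    --    rows already stored elsewhere
    COPY attributes_g1_seq_1_ff_10 FROM '/tmp/batch_ff_10.dat';

    -- 3. index only the freshly loaded child; again bounded by the size
    --    of one batch
    CREATE INDEX idx_attributes_g1_seq_1_ff_10_trace_id
        ON attributes_g1_seq_1_ff_10 (trace_id);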
tfinneid@student.matnat.uio.no wrote: > > tfinneid@student.matnat.uio.no wrote: > > > >> > are a dump of Postgres's current memory allocations and could be > >> useful in > >> > showing if there's a memory leak causing this. > >> > >> The file is 20M, these are the last lines: (the first line continues > >> unttill ff_26000) > >> > >> > >> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0 > >> chunks); 632 used > > > > You have 26000 partitions??? > > At the moment the db has 55000 partitions, and thats only a fifth of the > complete volume the system will have in production. The reason I chose > this solution is that a partition will be loaded with new data every 3-30 > seconds, and all that will be read by up to 15 readers every time new data > is available. The data will be approx 2-4TB in production in total. So it > will be too slow if I put it in a single table with permanent indexes. > > I did a test previously, where I created 1 million partitions (without > data) and I checked the limits of pg, so I think it should be ok. Clearly it's not. The difference could be the memory usage and wastage for all those relcache entries and other stuff. I would reduce the number of partitions to a more reasonable value (within the tens, most likely). Maybe your particular problem can be solved by raising max_locks_per_transaction (?) but I wouldn't count on it. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
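If the lock table is in fact the limit being hit, the setting can be inspected and raised roughly as follows; the value shown is only a placeholder, and as Alvaro says there is no guarantee this is the actual cause of the out-of-memory error:

    -- current setting; the default is 64
    SHOW max_locks_per_transaction;

    -- postgresql.conf (requires a restart); a scan of the parent table
    -- takes one lock per child, so it needs room for all ~55000 partitions,
    -- and the shared lock table is sized roughly as
    -- max_locks_per_transaction * max_connections
    -- max_locks_per_transaction = 256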
On Friday 26 October 2007 8:56 am, tfinneid@student.matnat.uio.no wrote: > > Serious engineering does not imply perfect engineering, I have analyzed it > and made my tradeoffs. What you are forgetting here is that you clearly > dont understand the enire solution, So I will try to explain it again. And > if you still think its bonkers, the I urge you to come up with a solution > that works with the requirements. > > Every predefined X seconds (e.g. 3,6,9,12 etc ) a bunch of data arrives, > which must be stored by descrete time groups, e.g. second 3,6,9,12. The > data that arrives is approx 4MB per second, so in this case its 12MB. This > has to be processed by the server and written to the db, within 1 second. > There can be up to 5 writers at the same time. Within that same second, at > least 16 readers should be able to read all the data, *each*. Writers and > readers are only concerned with the latest data, i.e. data from the latest > time group, e.g. second 9. > This has to go on every predefined seconds for the next 6-12 weeks, > without stop, pause or any errors. These are the requirements. > > When I performed performance tests I found several unwanted effects from > several test scenarios. Here are the most important ones: > > - single large table, with indexes created when table is created. > - this leads to the performance of an insert degrading as more data is > added, when I get > to 1 billion rows it took 50 seconds to add the data. > > My lesson from this is that > - single inserts can never be efficient enough > - indexes cause linear performance drop as data volume increases > > So I tried a different approach, which would address both issues: > > - separate tables for each bulk of data > - use of bulk insert through jdbc COPY. > - add indexes to the newly create table after the copy is finished. > > My lesson from this is: > - insert take constant time, no matter how much data is in the base > - adding the indexes after insert takes constant time, i.e. some > milliseconds. > > From this I realised that using either single tables or partitions is the > way to go, since I only need to access the latest data, i.e. the newest > table, in normal situations. > > After thinking about it and discussing with this group, I found that using > partitions would be more practical for two reasons: > - changes to the parent table is automatically propagated to all child > tables, so the schema remains consistent and the server wont brake because > of differences in the tables. > - it is more maintainable to use "create with inheritance" sql in source > code than the entire ddl of the table. > > So now I have tested the server 24/7 for a week and a half, with 1 writer > and 16 readers writing all the mentioned data, and everything works fine. > Expect for the select on the parent table, which now runs out of memory. > Which in it self is not a problem since I will never use the parent table > in production in any case. > > regards > > tom I might be missing the point, but couldn't you do a Copy to a single table instead of multiple inserts and avoid the index overhead. -- Adrian Klaver aklaver@comcast.net
Adrian Klaver wrote: > I might be missing the point, but couldn't you do a Copy to a single table > instead of multiple inserts and avoid the index overhead. Are you saying I should have one large table with indexes and do a COPY to it, or have one small empty table and do a COPY to it? thomas
-------------- Original message ---------------------- From: Thomas Finneid <tfinneid@student.matnat.uio.no> > > > Adrian Klaver wrote: > > I might be missing the point, but couldn't you do a Copy to a single table > > instead of multiple inserts and avoid the index overhead. > > Are you saying, have one large table with indexes and do a COPY to it or > are you saying a one small empty table and do a COPY to it? > > thomas I'm thinking do a COPY to one large table. If the cost of indexing is relatively fixed as you indicated in your previous post then you reduce the indexing overhead to each COPY operation instead of each insert. -- Adrian Klaver aklaver@comcast.net
Adrian Klaver wrote: > I'm thinking do a COPY to one large table. If the cost of indexing is relatively fixed as > you indicated in your previous post then you reduce the indexing overhead to each > COPY operation instead of each insert. No, what I meant was that creating an index on a table after the data has been loaded with COPY is faster than having one large table with an existing index, where data is added all the time and the index is updated along with it. I did do a performance test some time ago on that or something similar; I don't remember the exact details, but whatever I tested, it was slower. But I will try exactly what you suggested tomorrow, when I'm at the test machine. The issue is that it has to run for some time before it is possible to see any lasting degradation in performance. thomas
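A sketch of the comparison being planned here, using psql's \timing and hypothetical object and file names; the interesting question is whether the first COPY keeps slowing down as the big indexed table grows, while the second pattern stays roughly constant per batch:

    \timing

    -- variant A: one large table whose indexes are maintained during
    -- every bulk load
    COPY attributes_big FROM '/tmp/batch_ff_10.dat';

    -- variant B: load an unindexed child table, then build its index
    COPY attributes_g1_seq_1_ff_10 FROM '/tmp/batch_ff_10.dat';
    CREATE INDEX idx_attributes_g1_seq_1_ff_10_trace_id
        ON attributes_g1_seq_1_ff_10 (trace_id);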
On 25 Okt, 17:36, tfinn...@student.matnat.uio.no wrote: > > The design is based on access patterns, i.e. one partition represents a > group of data along a discrete axis, so the partitions are the perfect for > modeling that. Only the last partition will be used on normal cases. The > previous partitions only need to exists until the operator deletes them, > which will be sometime between 1-6 weeks. This has been interesting reading because I'm working on a system which involves a more batch-oriented approach in loading the data, where I've found partitions to be useful both from a performance perspective (it looks like my indexes would be inconveniently big otherwise for the total volume of data) and from an administrative perspective (it's convenient to control the constraints for discrete subsets of my data). However, if all but the most recent data remains relatively stable, why not maintain your own statistics for each partition or, as someone else suggested, use the pg_class statistics? I'd just be interested to hear what the best practices are when tables get big and where the access patterns favour the most recently loaded data and/or reliably identifiable subsets of the data, as they seem to in this case and in my own case. The various tuning guides out there have been very useful, but isn't there a point at which partitioning is inevitable? Paul
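On the pg_class statistics Paul mentions: an approximate per-partition row count can be read straight from the catalog instead of running count(*) over the parent, for example with something like the query below (the parent name is an assumption, and reltuples is only as fresh as the last VACUUM or ANALYZE, so this is an estimate rather than an exact count):

    -- approximate row counts for every child of the assumed parent table
    SELECT c.relname, c.reltuples::bigint AS approx_rows
    FROM pg_class c
    JOIN pg_inherits i ON i.inhrelid = c.oid
    JOIN pg_class p    ON p.oid = i.inhparent
    WHERE p.relname = 'attributes'
    ORDER BY c.relname;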