Thread: select count() out of memory

select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
Hi

I am volume testing a db model that consists of partitioned tables. The
db has been running for a week and a half now and has built up to contain
approx 55000 partition tables of 18000 rows each. The root table therefore
contains about 1 billion rows. When I try to do a "select count(*)" on the
root table, it does some work for a while, perhaps 5-10 minutes, and then
aborts with

    ERROR:  out of memory
    DETAIL:  Failed on request of size 130.

Does anybody have any suggestion as to which parameter I should tune to
give it more memory to be able to perform queries on the root table?

regards

thomas


The last part of the db log is the following; I don't think anything other
than the last two lines is relevant.

pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768
used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768
used
MdSmgr: 4186112 total in 9 blocks; 911096 free (4 chunks); 3275016 used
LockTable (locallock hash): 2088960 total in 8 blocks; 418784 free (25
chunks); 1670176 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR:  out of memory
DETAIL:  Failed on request of size 130.



Re: select count() out of memory

From
Gregory Stark
Date:
<tfinneid@student.matnat.uio.no> writes:

>
>     ERROR:  out of memory
>     DETAIL:  Failed on request of size 130.
>
> Does anybody have any suggestion as to which parameter I should tune to
> give it more memory to be able to perform queries on the root table?

This indicates that malloc() failed which means the system couldn't provide
any more memory. Either you have a very low memory ulimit (look at ulimit -a
in the same session as Postgres) or your machine is really low on memory.
Perhaps you have shared_buffers set very high or some other program is using
all your available memory (and swap)?
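
A quick way to double-check the relevant settings from within a psql
session is plain SHOW, for example:

    SHOW shared_buffers;
    SHOW work_mem;
    SHOW maintenance_work_mem;
    SHOW max_connections;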

> The last parts of the db log is the following, I dont think anything other
> than the last 2 lines are relevant.

You're wrong. All the lines like:

> pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
> chunks); 696 used

are a dump of Postgres's current memory allocations and could be useful in
showing if there's a memory leak causing this.

Also, what version of Postgres is this?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
Hi

I have tried to answer to the best of my knowledge, but it's running on
Solaris 10 and I am not that familiar with Solaris ( Go Linux!!! :)

> any more memory. Either you have a very low memory ulimit (look at ulimit
> -a
> in the same session as Postgres) or your machine is really low on memory.
> Perhaps you have shared_buffers set very high or some other program is
> using
> all your available memory (and swap)?
>

The machine has 32GB RAM. I don't know how much swap it has, but I do know
the disk system is a disk cluster with 16x450GB disks; it probably has a
local disk as well, but I don't know how big it is.

-bash-3.00$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 16357
virtual memory        (kbytes, -v) unlimited


This is my config:

checkpoint_segments = 96
effective_cache_size = 128000
shared_buffers = 430000
max_fsm_pages = 208000
max_fsm_relations = 10000

max_connections = 1000

autovacuum = off                        # enable autovacuum subprocess?

fsync = on                              # turns forced synchronization on
or off
#full_page_writes = on                  # recover from partial page writes
wal_sync_method = fdatasync
wal_buffers = 256

commit_delay = 5
#commit_siblings = 5                    # range 1-1000
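
For reference, if I have the units right (8 kB per buffer/page by default),
those numbers should work out to roughly:

    shared_buffers       430000 x 8 kB  ~ 3.4 GB
    effective_cache_size 128000 x 8 kB  ~ 1.0 GB
    wal_buffers             256 x 8 kB  =   2 MB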



> Also, what version of Postgres is this?

Apparently it's 8.1.8; I thought it was 8.2.

> are a dump of Postgres's current memory allocations and could be useful in
> showing if there's a memory leak causing this.

The file is 20M; these are the last lines (the first line's naming pattern
continues up to ff_26000):


idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_4_value2: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_4_value1: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_4_trace_id: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value7: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value2: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_3_value1: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_3_trace_id: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value7: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value2: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_2_value1: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_2_trace_id: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value7: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value2: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_1_value1: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
idx_attributes_g1_seq_1_ff_1_trace_id: 1024 total in 1 blocks; 392 free (0
chunks); 632 used
pg_index_indrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_namespace_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_type_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632
used
pg_proc_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0
chunks); 768 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks);
696 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_authid_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_operator_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 392 free (0 chunks);
632 used
pg_class_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 328 free (0
chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 256 free (0 chunks); 768
used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 256 free (0 chunks); 768
used
MdSmgr: 4186112 total in 9 blocks; 911096 free (4 chunks); 3275016 used
LockTable (locallock hash): 2088960 total in 8 blocks; 418784 free (25
chunks); 1670176 used
Timezones: 47592 total in 2 blocks; 5968 free (0 chunks); 41624 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ERROR:  out of memory
DETAIL:  Failed on request of size 130.






Re: select count() out of memory

From
Gregory Stark
Date:
<tfinneid@student.matnat.uio.no> writes:

> max_connections = 1000

Do you actually have anywhere near this number of processes? What is your
setting for work_mem? Keep in mind every process could use as much as work_mem
and actually it's possible to use that much several times over.

Also, what is your maintenance_work_mem and do you have many vacuums or other
such commands running at the time?

1,000 processes is a large number of processes. You may be better off
re-architecting to run fewer processes simultaneously. But if that's not
possible you'll have to keep it in mind to tune other things properly.
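
As a rough worst-case illustration (assuming work_mem is still at the 8.1
default of 1 MB, since it doesn't appear in the config you posted):

    1000 connections x 1 MB work_mem  ~ 1 GB

and a single complex query can allocate work_mem more than once, one chunk
per sort or hash step, so the real ceiling is higher.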

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: select count() out of memory

From
"Martin Gainty"
Date:
Good morning

Please display these memory settings from your postgresql.conf file:
sort_mem
shared_buffers

Thanks
Martin--


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> <tfinneid@student.matnat.uio.no> writes:
>
>> max_connections = 1000
>
> Do you actually have anywhere near this number of processes? What is your
> setting for work_mem? Keep in mind every process could use as much as
> work_mem
> and actually it's possible to use that much several times over.
>
> Also, what is your maintenance_work_mem and do you have many vacuums or
> other
> such commands running at the time?
>
> 1,000 processes is a large number of processes. You may be better off
> re-architecting to run fewer processes simultaneously. But if that's not
> possible you'll have to keep it in mind to tune other things properly.

The application only needs about 20 connections in normal situations,
but might need up to 100 in some situations, for example if there is a lot
of latency and new connections arrive before others have finished.

I could certainly reduce the number to 100 or 50, but do you think that
would help with this problem?

regards

thomas


Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:

> > are a dump of Postgres's current memory allocations and could be useful in
> > showing if there's a memory leak causing this.
>
> The file is 20M, these are the last lines: (the first line continues
> unttill ff_26000)
>
>
> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
> chunks); 632 used

You have 26000 partitions???

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> Good morning
>
> Please display these memory settings from your postgresql.conf file
> sort_mem
> shared_buffers

I have shown the entire configuration. If it's not in the configuration
shown, I have changed its value.

I have used the configuration example provided by Sun regarding running
Postgres on Solaris.

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

regards

thomas




Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no wrote:
>
>> > are a dump of Postgres's current memory allocations and could be
>> useful in
>> > showing if there's a memory leak causing this.
>>
>> The file is 20M, these are the last lines: (the first line continues
>> unttill ff_26000)
>>
>>
>> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
>> chunks); 632 used
>
> You have 26000 partitions???

At the moment the db has 55000 partitions, and that's only a fifth of the
complete volume the system will have in production. The reason I chose
this solution is that a partition will be loaded with new data every 3-30
seconds, and all of that will be read by up to 15 readers every time new
data is available. The data will be approx 2-4TB in total in production.
So it would be too slow if I put it in a single table with permanent
indexes.

I did a test previously where I created 1 million partitions (without
data) and checked the limits of pg, so I think it should be ok.

thomas


Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:
> > tfinneid@student.matnat.uio.no wrote:
> >
> >> > are a dump of Postgres's current memory allocations and could be
> >> useful in
> >> > showing if there's a memory leak causing this.
> >>
> >> The file is 20M, these are the last lines: (the first line continues
> >> unttill ff_26000)
> >>
> >>
> >> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
> >> chunks); 632 used
> >
> > You have 26000 partitions???
>
> At the moment the db has 55000 partitions, and thats only a fifth of the
> complete volume the system will have in production. The reason I chose
> this solution is that a partition will be loaded with new data every 3-30
> seconds, and all that will be read by up to 15 readers every time new data
> is available. The data will be approx 2-4TB in production in total. So it
> will be too slow if I put it in a single table with permanent indexes.
>
> I did a test previously, where I created 1 million partitions (without
> data) and I checked the limits of pg, so I think it should be ok.

Clearly it's not.  The difference could be the memory usage and wastage
for all those relcache entries and other stuff.  I would reduce the
number of partitions to a more reasonable value (within the tens, most
likely)

Maybe your particular problem can be solved by raising
max_locks_per_transaction (?) but I wouldn't count on it.
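
For reference, the shared lock table is sized for about

    max_locks_per_transaction * (max_connections + max_prepared_transactions)

locked objects; with the defaults (64 and 5) and max_connections = 1000 that
is roughly 64 * 1005 ~ 64000 slots, while scanning the parent would need a
lock on every partition plus every index on them. Current usage can be
watched with something like:

    SELECT count(*) FROM pg_locks;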

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> I have shown the entire configuration. if its not in the configuration
> shown, I have changed its value.

I meant to say "I haven't changed its value"

thomas




Re: select count() out of memory

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> tfinneid@student.matnat.uio.no wrote:
>> I did a test previously, where I created 1 million partitions (without
>> data) and I checked the limits of pg, so I think it should be ok.

> Clearly it's not.

You couldn't have tested it too much --- even planning a query over so
many tables would take forever, and actually executing it would surely
have run the system out of locktable space before it even started
scanning.

The partitioning facility is designed for partition counts in the tens,
or maybe hundreds at the most.

            regards, tom lane

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no wrote:
>> > tfinneid@student.matnat.uio.no wrote:
>> >
>> >> > are a dump of Postgres's current memory allocations and could be
>> >> useful in
>> >> > showing if there's a memory leak causing this.
>> >>
>> >> The file is 20M, these are the last lines: (the first line continues
>> >> unttill ff_26000)
>> >>
>> >>
>> >> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free
>> (0
>> >> chunks); 632 used
>> >
>> > You have 26000 partitions???
>>
>> At the moment the db has 55000 partitions, and thats only a fifth of the
>> complete volume the system will have in production. The reason I chose
>> this solution is that a partition will be loaded with new data every
>> 3-30
>> seconds, and all that will be read by up to 15 readers every time new
>> data
>> is available. The data will be approx 2-4TB in production in total. So
>> it
>> will be too slow if I put it in a single table with permanent indexes.
>>
>> I did a test previously, where I created 1 million partitions (without
>> data) and I checked the limits of pg, so I think it should be ok.
>
> Clearly it's not.

It does not mean my problem has anything to do with the number of
partitions. It might, or it might not, and that's the problem: the cause
has not been located yet. According to the documented limits of pg, it
should be ok.

> The difference could be the memory usage and wastage
> for all those relcache entries and other stuff.  I would reduce the
> number of partitions to a more reasonable value (within the tens, most
> likely)

The db worked fine until it reached perhaps 30-40 thousand partitions.




Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> tfinneid@student.matnat.uio.no wrote:
>>> I did a test previously, where I created 1 million partitions (without
>>> data) and I checked the limits of pg, so I think it should be ok.
>
>> Clearly it's not.
>
> You couldn't have tested it too much --- even planning a query over so
> many tables would take forever, and actually executing it would surely
> have run the system out of locktable space before it even started
> scanning.

And this is the testing, so you're right....

It's only the select on the root table that fails. Operations on a single
partition are no problem.

> The partitioning facility is designed for partition counts in the tens,
> or maybe hundreds at the most.

Maybe, but it works even with 55000 partitions, as long as the operations
are done against a partition and not the root table.


Re: select count() out of memory

From
"Scott Marlowe"
Date:
On 10/25/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > tfinneid@student.matnat.uio.no wrote:
> >> I did a test previously, where I created 1 million partitions (without
> >> data) and I checked the limits of pg, so I think it should be ok.
>
> > Clearly it's not.
>
> You couldn't have tested it too much --- even planning a query over so
> many tables would take forever, and actually executing it would surely
> have run the system out of locktable space before it even started
> scanning.
>
> The partitioning facility is designed for partition counts in the tens,
> or maybe hundreds at the most.

I've had good results well into the hundreds, but after about 400 or
so, things start to get a bit wonky.

Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:

> > The partitioning facility is designed for partition counts in the tens,
> > or maybe hundreds at the most.
>
> Maybe, but it works even on 55000 partitions as long as the operations are
> done against a partition and not the root table.

It will work on a million partitions and more, provided you do
operations on single partitions.

What you want to do is not possible, period.  Maybe when we redesign
partitioning, but that's far into the future.  Kindly do not waste our
time (nor yours).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: select count() out of memory

From
"Scott Marlowe"
Date:
On 10/25/07, tfinneid@student.matnat.uio.no
<tfinneid@student.matnat.uio.no> wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> tfinneid@student.matnat.uio.no wrote:
> >>> I did a test previously, where I created 1 million partitions (without
> >>> data) and I checked the limits of pg, so I think it should be ok.
> >
> >> Clearly it's not.
> >
> > You couldn't have tested it too much --- even planning a query over so
> > many tables would take forever, and actually executing it would surely
> > have run the system out of locktable space before it even started
> > scanning.
>
> And this is the testing, so you're right....
>
> Its only the select on the root table that fails. Operations on a single
> partitions is no problem.

Not sure I understand exactly what you're saying.

Are you selecting directly from the child table, or from the parent
table with constraint_exclusion turned on?

If you're hitting the child table directly, you aren't actually using
partitioning.  It's a wholly independent table at that point.

If you're hitting a single child table through the parent table via
constraint_exclusion, then you are using partitioning, but only
hitting one physical table.

But hitting the parent table with no constraining where clause is a
recipe for disaster.  The very reason to use partitioning is so that
you never have to scan through a single giant table.
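
For example (the table and column names here are invented, since I don't
know your schema), with a CHECK constraint on each child and
constraint_exclusion enabled, a query that filters on the partitioning
column should only touch one child:

    SET constraint_exclusion = on;
    EXPLAIN SELECT count(*) FROM attributes WHERE part_key = 26000;
    -- the plan should show only the matching child table being scanned,
    -- not all 55,000 of them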

Anyway, you're heading off into new territory with 55,000 partitions.
What is the average size, in MB, of one of your partitions?  I found
with my test that there was a point of diminishing returns after 400 or so
partitions, at which point indexes were no longer needed, because the
average query just seq scanned the partitions it needed, and they were
all ~16 or 32 MB.

Re: select count() out of memory

From
Erik Jones
Date:
On Oct 25, 2007, at 9:36 AM, tfinneid@student.matnat.uio.no wrote:

>> tfinneid@student.matnat.uio.no wrote:
>>>> tfinneid@student.matnat.uio.no wrote:
>>>>
>>>>>> are a dump of Postgres's current memory allocations and could be
>>>>> useful in
>>>>>> showing if there's a memory leak causing this.
>>>>>
>>>>> The file is 20M, these are the last lines: (the first line
>>>>> continues
>>>>> unttill ff_26000)
>>>>>
>>>>>
>>>>> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks;
>>>>> 392 free
>>> (0
>>>>> chunks); 632 used
>>>>
>>>> You have 26000 partitions???
>>>
>>> At the moment the db has 55000 partitions, and thats only a fifth
>>> of the
>>> complete volume the system will have in production. The reason I
>>> chose
>>> this solution is that a partition will be loaded with new data every
>>> 3-30
>>> seconds, and all that will be read by up to 15 readers every time
>>> new
>>> data
>>> is available. The data will be approx 2-4TB in production in
>>> total. So
>>> it
>>> will be too slow if I put it in a single table with permanent
>>> indexes.
>>>
>>> I did a test previously, where I created 1 million partitions
>>> (without
>>> data) and I checked the limits of pg, so I think it should be ok.
>>
>> Clearly it's not.
>
> I does not mean my problem has anything to do with the number of
> partitions. It might have, or it might not, and thats the problem, the
> cause has not been located yet.
>
> According to the documented limits of pg,
>   The difference could be the memory usage and wastage
>> for all those relcache entries and other stuff.  I would reduce the
>> number of partitions to a more reasonable value (within the tens,
>> most
>> likely)
>
> The db worked fine until it reached perhaps 30-40 thousand partitions.

It depends on how you have the partitions set up and how you're
accessing them.  Are all of these partitions under the same parent
table?  If so, then trying to run a SELECT COUNT(*) against the parent
table is simply insane.  Think about it, you're asking one query to
scan 55000 tables.  What you need to do is partition based on your
access patterns, not what you *think* will help with performance down
the road.  Look into constraint exclusion, whether or not you can
just access child tables directly, and whether you really need all of
these under one logical table.  Also, no matter how you do the
partitioning, once you get up to that many and more relations in your
system, dumps and restores take a lot longer.

Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: select count() out of memory

From
"Scott Marlowe"
Date:
On 10/25/07, tfinneid@student.matnat.uio.no
<tfinneid@student.matnat.uio.no> wrote:
> Hi
>
> I am volume testing a db model that consists of a paritioned tables. The
> db has been running for a week and a half now and has built up to contain
> approx 55000 partition tables of 18000 rows each. The root table therefore
> contains about 1 billion rows. When I try to do a "select count(*)" of the
> root table, it does some work for a while, perhaps 5-10 minutes and the
> aborts with
>
>     ERROR:  out of memory
>     DETAIL:  Failed on request of size 130.

So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard
of anyone having 60,000 or so partitions in a table, and she looked at
me like I had a third eye in my forehead and said in her sweet voice
"Well, that would certainly be an edge case".  She sounded like she
was worried about me.

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no wrote:
> It will work on a million partitions and more, provided you do
> operations on single partitions.

That's good enough for me, that's exactly what I want. I just used the
select count() on the root to get a feeling for how many rows there were in
total. And then I thought that the error message was just a configuration
issue. But since doing operations like that on a root table of this
magnitude is not a good idea, I won't.

> What you want to do is not possible, period.  Maybe when we redesign
> partitioning, but that's far into the future.  Kindly do not waste our
> time (nor yours).

Thank you for that prompt reply.

In all fairness, that's why I asked the question here: to find out the
facts, not to be abused for being ignorant about pg.

thomas



Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:
> > tfinneid@student.matnat.uio.no wrote:
> > It will work on a million partitions and more, provided you do
> > operations on single partitions.
>
> Thats good enough for me, thats exactly what I want.

In that case, why use partitions at all?  They are simple independent
tables.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)

Re: select count() out of memory

From
Alvaro Herrera
Date:
Scott Marlowe wrote:

> So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard
> of anyone having 60,000 or so partitions in a table, and she looked at
> me like I had a third eye in my forehead and said in her sweet voice
> "Well, that would certainly be an edge case".  She sounded like she
> was worried about me.

Did you get rid of that third eye already?  I would be equally worried.

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude."                              (Brian Kernighan)

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
>> The db worked fine until it reached perhaps 30-40 thousand partitions.
>
> It depends on how you have the partitions set up and how you're
> accessing them.  Are all of these partitions under the same parent
> table?  If so, then trying run a SELECT COUNT(*) against the parent
> table is simply insane.  Think about it, you're asking one query to
> scan 55000 tables.  What you need to do is partition based on your
> access patterns, not what you *think* will help with performance down
> the road.  Look into constraint exclusion, whether or not you can
> just access child tables directly, and whether you really need all of
> these under one logical table.  Also, no matter how you do the
> partitioning, once you get up to that many and more relations in your
> system, dumps and restores take a lot longer.

The design is based on access patterns, i.e. one partition represents a
group of data along a discrete axis, so the partitions are perfect for
modeling that. Only the last partition will be used in normal cases. The
previous partitions only need to exist until the operator deletes them,
which will be sometime between 1 and 6 weeks later.

Regarding dumps and restores: the system will always be offline during
those operations, and it will be so for several days, because a new project
might start at another location in the world, and travelling there
takes time. In the meantime, all admin tasks can be performed without
problems, even backup operations that take 3 days.

regards

thomas



Re: select count() out of memory

From
Erik Jones
Date:
On Oct 25, 2007, at 10:36 AM, tfinneid@student.matnat.uio.no wrote:

>>> The db worked fine until it reached perhaps 30-40 thousand
>>> partitions.
>>
>> It depends on how you have the partitions set up and how you're
>> accessing them.  Are all of these partitions under the same parent
>> table?  If so, then trying run a SELECT COUNT(*) against the parent
>> table is simply insane.  Think about it, you're asking one query to
>> scan 55000 tables.  What you need to do is partition based on your
>> access patterns, not what you *think* will help with performance down
>> the road.  Look into constraint exclusion, whether or not you can
>> just access child tables directly, and whether you really need all of
>> these under one logical table.  Also, no matter how you do the
>> partitioning, once you get up to that many and more relations in your
>> system, dumps and restores take a lot longer.
>
> The design is based on access patterns, i.e. one partition
> represents a
> group of data along a discrete axis, so the partitions are the
> perfect for
> modeling that. Only the last partition will be used on normal
> cases. The
> previous partitions only need to exists until the operator deletes
> them,
> which will be sometime between 1-6 weeks.
>
> Regarding dumps and restore; the system will always be offline during
> those operations and it will be so for several days, because a new
> project
> might start at another location in the world, so the travelling there
> takes time. In the mean time, all admin tasks can be performed without
> problems, even backup operations that take 3 days.

Excellent, it sounds like you should be fine then.  One thing to
note:  if you want to get an "idea" of how many rows you have in your
partitions, you can run a SUM aggregate on reltuples in pg_class for
all of your partitions.  The more recent the last ANALYZE for each
table, the more accurate those values will be.
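
Something along these lines should do it (assuming the parent table is
called attributes; adjust to your schema):

    SELECT sum(c.reltuples)::bigint AS approx_rows
      FROM pg_class c
      JOIN pg_inherits i ON i.inhrelid = c.oid
     WHERE i.inhparent = 'attributes'::regclass;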

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> Are you selecting directly from the child table, or from the parent
> table with constraint_exclusion turned on?

The problem was when selecting from the parent table; selecting from
child tables is no problem. As stated in other replies, I only wanted to
know how many rows were in the table in total; it is not part of the
actual operations of the server.

> But hitting the parent table with no constraining where clause is a
> recipe for disaster.  The very reason to use partitioning is so that
> you never have to scan through a single giant table.

So I have found out...

> Anyway, you're heading off into new territory with 55,000 partitions.

Perhaps, but I am only using the child tables for the actual operations.
I also have a couple of indexes on each child table, so there are now
about 150000 indexes as well.
The intended operations of the server work fine; it's the select on the
parent table that fails.

> What is the average size, in MB of one of your partitions?  I found
> with my test, there was a point of diminishing returns after 400 or so
> partitions at which point indexes were no longer needed, because the
> average query just seq scanned the partitions it needed, and they were
> all ~ 16 or 32 Megs.

I have no idea, but I suspect about a couple of megabytes each; at least
that's the size of the raw data. Then maybe add a couple of megabytes more
for internal stuff.

regards

thomas


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> So, out of curiosity, I asked my Oracle DBA friend if she'd ever heard
> of anyone having 60,000 or so partitions in a table, and she looked at
> me like I had a third eye in my forehead and said in her sweet voice
> "Well, that would certainly be an edge case".  She sounded like she
> was worried about me.

That means I am exploring new territory, which is good. Of course, there
is a possibility the design could be done in a different way so as not to
need that many partitions. But I asked on this list some months ago for
tips on how to design this data model, and what I came up with is what I
have today.

The problem is that the work is proprietary and confidential, so it's
difficult for me to explain in detail what I want to do. But you could
look at my previous post asking the question for a description of the
problem I want to solve.

regards

thomas


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no wrote:
>> Thats good enough for me, thats exactly what I want.
>
> In that case, why use partitions at all?  They are simple independent
> tables.

For two reasons:
- the data logically belongs together.
- because it's more practical to create tables as children of a parent
table than as independent tables.
   - changes to the parent table are applied to all partitions, and this
prohibits tables with different ddl.
   - performing the create operation does not require the source code to
contain the ddl of the parent table.


regards

thomas


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> Excellent, it sounds like you should be fine then.  One thing to
> note:  if you want to get an "idea" of how many rows you have in your
> partitions, you can run a SUM aggregate on reltuples in pg_class for
> all of your partitions.  The more recent the last ANALYZE for each
> table, the more accurate those values will be.

cool, thanks.

regards

thomas


Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:
> > tfinneid@student.matnat.uio.no wrote:
> >> Thats good enough for me, thats exactly what I want.
> >
> > In that case, why use partitions at all?  They are simple independent
> > tables.
>
> For two reasons,
> - the data logically belongs together.
> - because its more practical to create tables as childs of a parent table
> than as independent tables.
>    - changes to the table is applied to all partitions, and prohibits
> tables with different dd.
>    - performing the create operation does not require the source code to
> contain the ddl of the parent table.

You can use CREATE TABLE LIKE, which copies the definition but does not
set the inheritance.
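
For example (names invented), something like:

    CREATE TABLE attributes_g1_seq_2_ff_1 (LIKE attributes INCLUDING DEFAULTS);

copies the column definitions and defaults but creates a completely
standalone table, with no inheritance link back to the parent.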

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)

Re: select count() out of memory

From
Tom Lane
Date:
tfinneid@student.matnat.uio.no writes:
>> In that case, why use partitions at all?  They are simple independent
>> tables.

> For two reasons,
> - the data logically belongs together.
> - because its more practical to create tables as childs of a parent table
> than as independent tables.
>    - changes to the table is applied to all partitions, and prohibits
> tables with different dd.
>    - performing the create operation does not require the source code to
> contain the ddl of the parent table.

In other words, you really should have only one table; they aren't
independent.  What you need to do is dial down your ideas of how many
partitions are reasonable to have.

            regards, tom lane

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no wrote:
>> > tfinneid@student.matnat.uio.no wrote:
>> >> Thats good enough for me, thats exactly what I want.
>> >
>> > In that case, why use partitions at all?  They are simple independent
>> > tables.
>>
>> For two reasons,
>> - the data logically belongs together.
>> - because its more practical to create tables as childs of a parent
>> table
>> than as independent tables.
>>    - changes to the table is applied to all partitions, and prohibits
>> tables with different dd.
>>    - performing the create operation does not require the source code to
>> contain the ddl of the parent table.
>
> You can use CREATE TABLE LIKE, which copies the definition but does not
> set the inheritance.

I know, but point 1 of reason 2 was why I decided against it. I was
discussing the different options with myself when I was trying to decide,
and I decided on the partitions. So unless there is a problem with
the real operations, I can't see any reason to change.

regards

thomas




Re: select count() out of memory

From
Erik Jones
Date:
On Oct 25, 2007, at 11:16 AM, Alvaro Herrera wrote:

> tfinneid@student.matnat.uio.no wrote:
>>> tfinneid@student.matnat.uio.no wrote:
>>>> Thats good enough for me, thats exactly what I want.
>>>
>>> In that case, why use partitions at all?  They are simple
>>> independent
>>> tables.
>>
>> For two reasons,
>> - the data logically belongs together.
>> - because its more practical to create tables as childs of a
>> parent table
>> than as independent tables.
>>    - changes to the table is applied to all partitions, and prohibits
>> tables with different dd.
>>    - performing the create operation does not require the source
>> code to
>> contain the ddl of the parent table.
>
> You can use CREATE TABLE LIKE, which copies the definition but does
> not
> set the inheritance.

That won't propagate changes made later.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> tfinneid@student.matnat.uio.no writes:
>>> In that case, why use partitions at all?  They are simple independent
>>> tables.
>
>> For two reasons,
>> - the data logically belongs together.
>> - because its more practical to create tables as childs of a parent
>> table
>> than as independent tables.
>>    - changes to the table is applied to all partitions, and prohibits
>> tables with different dd.
>>    - performing the create operation does not require the source code to
>> contain the ddl of the parent table.
>
> In other words, you really should have only one table; they aren't
> independent.  What you need to do is dial down your ideas of how many
> partitions are reasonable to have.

Yes, but no. Each partition represents a chunk of information on a
discrete timeline, so there is no point in grouping it all into a single
table, because the access pattern is to only access data from a specific
point in time, i.e. a single partition, usually the latest. Since the
amount of data is so big, approx 3MB per second, and each partition needs
to be indexed before the clients start reading the data (in the same
second), I find it's better to use partitions, even though I am not
actually using the partitioning itself.

regards

thomas


Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> this is my config
>
> checkpoint_segments = 96
> effective_cache_size = 128000
> shared_buffers = 430000
> max_fsm_pages = 208000
> max_fsm_relations = 10000
>
> max_connections = 1000
>
> autovacuum = off                        # enable autovacuum subprocess?
>
> fsync = on                              # turns forced synchronization on
> or off
> #full_page_writes = on                  # recover from partial page writes
> wal_sync_method = fdatasync
> wal_buffers = 256
>
> commit_delay = 5
> #commit_siblings = 5                    # range 1-1000

Now that you hopefully understand more about the server I am building,
does anybody have any suggestions for improvements to the config? I could
certainly reduce max_connections to 1/10 of its value, but are there other
settings that should be either reduced or increased, or even set to a
non-default value?

regards

thomas


Re: select count() out of memory

From
Tom Lane
Date:
tfinneid@student.matnat.uio.no writes:
>> In other words, you really should have only one table; they aren't
>> independent.  What you need to do is dial down your ideas of how many
>> partitions are reasonable to have.

> Yes, but no. Each partition represents a chunk of information on a
> discrete timeline. So there is no point in grouping it all into a single
> table, because the access pattern is to only access data from a specific
> point in time, i.e. a single partition, usually the latest. Since the
> amount of data is so big, approx 3MB per second, and each partition needs
> to be indexed before the clients start reading the data (in the same
> second). I find its better to use partitions, even though I am not
> actually using it.

You are making a common beginner error, which is to suppose that N
little tables are better than one big one.  They are not.  What you're
effectively doing is replacing the upper levels of a big table's indexes
with lookups in the system catalogs, which in point of fact is a
terrible tradeoff from a performance standpoint.

From a database-theory standpoint, if all this data is alike then you
should have it all in one big table.  There are certain practical cases
where it's worth partitioning, but not at the level of granularity that
you are proposing.  This is why nobody, not even Oracle, tries to
support tens of thousands of partitions.

            regards, tom lane

Re: select count() out of memory

From
Steve Crawford
Date:
Alvaro Herrera wrote:
...
>
> You can use CREATE TABLE LIKE, which copies the definition but does not
> set the inheritance.
>

Well, sort of.

Unless I'm using it incorrectly it only copies the basic column
definitions and, as optionally specified, defaults and some of the
constraints.

Primary key constraints are lost as CREATE TABLE newtable (LIKE
oldtable) does not create any indexes including those necessary for the
primary key.

I don't know how foreign-key constraints are handled as I haven't used
this form of CREATE TABLE where foreign keys are involved.

Cheers,
Steve

Re: select count() out of memory

From
"Scott Marlowe"
Date:
On 10/25/07, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> Alvaro Herrera wrote:
> ...
> >
> > You can use CREATE TABLE LIKE, which copies the definition but does not
> > set the inheritance.
> >
>
> Well, sort of.
>
> Unless I'm using it incorrectly it only copies the basic column
> definitions and, as optionally specified, defaults and some of the
> constraints.
>
> Primary key constraints are lost as CREATE TABLE newtable (LIKE
> oldtable) does not create any indexes including those necessary for the
> primary key.
>
> I don't know how foreign-key constraints are handled as I haven't used
> this form of CREATE TABLE where foreign keys are involved.

Neither inheritance nor creating LIKE will inherit primary keys.

Foreign keys will not be inherited by either method either.

Re: select count() out of memory

From
Thomas Finneid
Date:
Tom Lane wrote:

> You are making a common beginner error, which is to suppose that N
> little tables are better than one big one.  They are not.

Well that depends on how you define better. For my purposes, it is better.

> What you're
> effectively doing is replacing the upper levels of a big table's indexes
> with lookups in the system catalogs, which in point of fact is a
> terrible tradeoff from a performance standpoint.

Only if you assume I use all data in all tables all the time. But as I
have explained in other replies recently, most of the time only data
from the newest child table is used.

I did the performance tests before deciding on the design, and having it
all in one large table would not perform at all within requirements. The
reason was that the indexes for the ever-growing table would take longer
and longer to update on each insert.

When I use partitions, or child tables, I can use COPY to insert the
data into the new child table and then add the indexes to that single
table only. That was, by far, the fastest solution.
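
Roughly, each load looks something like this (names simplified; COPY here
reads the data from the client over STDIN):

    CREATE TABLE attributes_g1_seq_2_ff_1 () INHERITS (attributes);
    COPY attributes_g1_seq_2_ff_1 FROM STDIN;
    CREATE INDEX idx_attributes_g1_seq_2_ff_1_trace_id
        ON attributes_g1_seq_2_ff_1 (trace_id);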

> From a database-theory standpoint, if all this data is alike then you
> should have it all in one big table.

Then what is the point of partitions, if you cannot use them to somehow
separate logically similar data into different partitions when one has a
need to do so? Of course I could have put it in a single table, had it not
been for the performance. I could have used a discrete timestamp to
separate the data, but why? Partitions are more practical.

> There are certain practical cases
> where it's worth partitioning, but not at the level of granularity that
> you are proposing.

If it's practical to use partitions, granularity does not come into the
equation.

regards

thomas



Re: select count() out of memory

From
"Scott Marlowe"
Date:
On 10/25/07, Thomas Finneid <tfinneid@student.matnat.uio.no> wrote:
>
> Tom Lane wrote:
>
> > You are making a common beginner error, which is to suppose that N
> > little tables are better than one big one.  They are not.
>
> Well that depends on how you define better. For my purposes, it is better.
>
> > What you're
> > effectively doing is replacing the upper levels of a big table's indexes
> > with lookups in the system catalogs, which in point of fact is a
> > terrible tradeoff from a performance standpoint.
>
> Only if you assume I use all data in all tables all the time. But as I
> have explained in other replies recently, most of the times only data
> from the newest child table is used.
>
> I did the performance tests before deciding on the design and having it
> all in one large table would not perform at all within requirements, The
> reason was that the indexes for the ever growing table would take longer
> and longer to update at each insert.
>
> When I use partitions, or child tables, I can use COPY to insert the
> data into the new chilkd table and then add the indexes to the single
> table only. That was, by far, the fastets solution.
>
> > From a database-theory standpoint, if all this data is alike then you
> > should have it all in one big table.
>
> Then, what is the point with partitions, if you can not use it to
> somehow separate logically similar data into different paritions because
> one has a need to do so? Of course I could have put it in a single
> table, had it not been for the performance. I could have used a discrete
> timestamp to separate the data, but why? partitions is more practical.
>
> > There are certain practical cases
> > where it's worth partitioning, but not at the level of granularity that
> > you are proposing.
>
> If its practical to use partitions, granularity does not come into the
> equation.

It may well be that one big table and partial indexes would do what
you want.  Did you explore partial indexes against one big table?
That can be quite handy.

i.e.

create table mybigtable (ts timestamp, id int primary key,
    row1 numeric, ..... rown numeric);

-- populate with a year's worth of data, i.e. 100M rows or something like that

create index mybigtable_20071025 on mybigtable (id)
    where ts between '2007-10-25 00:00:00' and '2007-10-25 23:59:59.99999';

Repeat as needed. Now, when you want something from the table, you
can just ask for it with a timestamp range and it will hit the index,
and the table, all pretty fast.
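
e.g. something like (the id value is obviously made up):

    SELECT * FROM mybigtable
     WHERE id = 12345
       AND ts BETWEEN '2007-10-25 00:00:00' AND '2007-10-25 23:59:59.99999';

The where clause has to imply the index's predicate for the planner to
consider the partial index, which is why the same date range appears in
both.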

Worth a look I guess.

Re: select count() out of memory

From
Gregory Stark
Date:
"Thomas Finneid" <tfinneid@student.matnat.uio.no> writes:

>> What you're
>> effectively doing is replacing the upper levels of a big table's indexes
>> with lookups in the system catalogs, which in point of fact is a
>> terrible tradeoff from a performance standpoint.
>
> Only if you assume I use all data in all tables all the time. But as I have
> explained in other replies recently, most of the times only data from the
> newest child table is used.

Tom's point is that if you have 55k tables then just *finding* the newest
child table is fairly expensive. You're accessing a not insignificant-sized
index and table of tables. And the situation is worse when you consider the
number of columns all those tables have, all the indexes those tables have,
all the column keys those indexes have, etc.

Nonetheless you've more or less convinced me that you're not completely nuts.

I would suggest not bothering with inheritance though. Inheritance imposes
additional costs to track the inheritance relationships. For your purposes you
may as well just create separate tables and not bother trying to use
inheritance.

> If its practical to use partitions, granularity does not come into the
> equation.

Uhm, yes it does. This is engineering, it's all about trade-offs. Having 55k
tables will have costs and benefits. I think it's a bit early to dismiss the
costs. Keep in mind that profiling them may be a bit tricky since they occur
during planning and DDL that you haven't finished experimenting with yet. The
problem you just ran into is just an example of the kind of costs it imposes.

You should also consider some form of compromise with separate tables but
at a lower level of granularity. Perhaps one partition per day instead of
one per 30s. You could drop a partition when all the keys in it are marked
as dead.
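
A sketch of what a daily partition might look like (the column names are
guesses based on your index names, so treat this as illustrative only):

    CREATE TABLE attributes_20071025 (
        CHECK (created_at >= DATE '2007-10-25'
           AND created_at <  DATE '2007-10-26')
    ) INHERITS (attributes);

    CREATE INDEX idx_attributes_20071025_trace_id
        ON attributes_20071025 (trace_id);

    -- and later, once every key in it is dead:
    DROP TABLE attributes_20071025;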

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: select count() out of memory

From
Jorge Godoy
Date:
On Thursday 25 October 2007 13:36:09 tfinneid@student.matnat.uio.no wrote:
>
> Regarding dumps and restore; the system will always be offline during
> those operations and it will be so for several days, because a new project
> might start at another location in the world, so the travelling there
> takes time. In the mean time, all admin tasks can be performed without
> problems, even backup operations that take 3 days.

This sounds a lot like oil exploration...  Data gathered from sensors is
usually a few TBs, explosions have definite intervals, interference between
sensors, etc.

Putting the data inside the DB fast is part of the solution; getting it out
fast to be processed / analyzed is another part.

--
Jorge Godoy      <jgodoy@gmail.com>


Re: select count() out of memory

From
Thomas Finneid
Date:

Gregory Stark wrote:
> Tom's point is that if you have 55k tables then just *finding* the newest
> child table is fairly expensive. You're accessing a not insignificant-sized
> index and table of tables. And the situation is worse when you consider the
> number of columns all those tables have, all the indexes those tables have,
> all the column keys those indexes the tables have have, etc.

Yes, I got that. But I name the child tables so that when my server
receives read requests, I can retrieve details from the request to figure
out the exact child table name, without the system needing to do any
internal searches to find the newest table.

> Nonetheless you've more or less convinced me that you're not completely nuts.

thank you for only regarding me as somewhat nuts :)

> I would suggest not bothering with inheritance though. Inheritance imposes
> additional costs to track the inheritance relationships. For your purposes you
> may as well just create separate tables and not bother trying to use
> inheritance.

As addressed in a previous reply, I find inheritance better for a couple
of practical reasons.

>> If it's practical to use partitions, granularity does not come into the
>> equation.
>
> Uhm, yes it does. This is engineering, it's all about trade-offs. Having 55k
> tables will have costs and benefits. I think it's a bit early to dismiss the
> costs. Keep in mind that profiling them may be a bit tricky since they occur
> during planning and DDL that you haven't finished experimenting with yet. The
> problem you just ran into is just an example of the kind of costs it imposes.

See my earlier answer on why granularity is not relevant in my case.

> You should also consider some form of compromise with separate tables but at a
> lower level of granularity. Perhaps one partition per day instead of one per
> 30s. you could drop a partition when all the keys in it are marked as dead.

The structure of the data is divided along a discrete timeline, so every
predefined X seconds a whole new bunch of data arrives, and all of that
belongs in a single partition.


regards

thomas

Re: select count() out of memory

From
Thomas Finneid
Date:

Scott Marlowe wrote:
> It may well be that one big table and partial indexes would do what
> you want.  Did you explore partial indexes against one big table?
> That can be quite handy.

Hmm, interesting, I suppose it could work. Thanks for the suggestion,
I'll keep it in mind.

regards

thomas


Re: select count() out of memory

From
Thomas Finneid
Date:
Jorge Godoy wrote:
On Thursday 25 October 2007 13:36:09 tfinneid@student.matnat.uio.no wrote:
>> Regarding dumps and restore; the system will always be offline during
>> those operations and it will be so for several days, because a new project
>> might start at another location in the world, so the travelling there
>> takes time. In the mean time, all admin tasks can be performed without
>> problems, even backup operations that take 3 days.
>
> This sounds a lot like oil exploration...  Data gathered from sensors is
> usually a few TBs, explosions have definite intervals, interference between
> sensors, etc.

Sorry, I can't talk about what the work actually is; a colleague of mine
just got reprimanded merely for mentioning he was working on a compression
library. The manager thought he was revealing *too much* :)

> Putting the data inside the DB fast is part of the solution, getting it out
> fast to be processed / analyzed is another part.

But you are right about that part, things need to be fast.

regards

thomas


Re: select count() out of memory

From
Sam Mason
Date:
On Fri, Oct 26, 2007 at 08:26:13AM +0200, Thomas Finneid wrote:
> Scott Marlowe wrote:
> >It may well be that one big table and partial indexes would do what
> >you want.  Did you explore partial indexes against one big table?
> >That can be quite handy.
>
> Hmm, interesting, I suppose it could work. Thanks for the suggestion,
> I'll keep it in mind.

That's still going to have to do a tablescan on the whole dataset (a
couple of terabytes?) before building the index, isn't it? That doesn't
sound like something you'd want to do too often.
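
For reference, a partial index of the kind being suggested would be
something along these lines (table, columns and predicate are all
hypothetical), and its initial build does indeed have to scan the whole
table:

    -- index only the rows belonging to recent time groups; older rows are
    -- never entered into the index at all
    CREATE INDEX idx_attributes_recent
        ON attributes (posx, posy)
        WHERE seq >= 55000;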

Are there any thoughts of deferring index updates so that many rows
can be merged simultaneously, rather than doing many individual index
operations?  It sounds as though this is what Thomas is really after and
it would also remove the need for dropping indexes while doing a bulk
insert of data.  I apologise if this has been discussed before!


  Sam

Re: select count() out of memory

From
Sam Mason
Date:
On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
> Gregory Stark wrote:
> >Tom's point is that if you have 55k tables then just *finding* the newest
> >child table is fairly expensive. You're accessing a not insignificant-sized
> >index and table of tables. And the situation is worse when you consider the
> >number of columns all those tables have, all the indexes those tables have,
> >all the column keys those indexes have, etc.
>
> Yes, I got that. But I name the child tables so that when my server
> receives read requests, I retrieve details from the request to be able
> to figure out the exact child table name, without the system needing to
> do any internal searches to find the newest table.

I think the lookup that is being referred to is the fact that if you've
got 55k (plus) files in a directory then the filesystem still has to
perform a search in the directory to locate the actual file associated
with the filename.  There probably isn't going to be much difference
between the filesystem performing this lookup vs the database descending
a few levels of its index structure.

I have a feeling you may have thought about this before, but thought it
wouldn't hurt to point it out more explicitly.


  Sam

Re: select count() out of memory

From
Gregory Stark
Date:
"Sam Mason" <sam@samason.me.uk> writes:

> On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
>> Gregory Stark wrote:
>> >Tom's point is that if you have 55k tables then just *finding* the newest
>> >child table is fairly expensive. You're accessing a not insignificant-sized
>> >index and table of tables. And the situation is worse when you consider the
>> >number of columns all those tables have, all the indexes those tables have,
>> >all the column keys those indexes have, etc.
>>
>> Yes, I got that. But I name the child tables so that when my server
>> receives read requests, I retrieve details from the request to be able
>> to figure out the exact child table name, without the system needing to
>> do any internal searches to find the newest table.
>
> I think the lookup that is being referred to is the fact that if you've
> got 55k (plus) files in a directory then the filesystem still has to
> perform a search in the directory to locate the actual file associated
> with the filename.  There probably isn't going to be much difference
> between the filesystem performing this lookup vs the database descending
> a few levels of its index structure.

That's true but it's in *addition* to the database having to find the catalog
records for the table which involves an index lookup itself.

Actually many index lookups since it has to look up the catalog record for the
table, for all the columns of the table, for all indexes of the table, for all
the index keys of those indexes, all constraints of the table, all triggers of
the table, all dependencies on other objects and of other objects on this
table (this latter is why I suggest not using inheritance).

Each of these lookups is using an index to find the table out of 55k records
which is just the same work that you're saving in the top level of the index
tree.
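
A rough way to get a feel for the amount of catalog data behind such a
scheme is simply to count the catalog rows it generates; something like the
following (the name pattern is of course made up):

    -- pg_class entries (tables and indexes) created by the partition scheme
    SELECT count(*) FROM pg_class WHERE relname LIKE 'attributes_g1_seq_%';

    -- and the per-column entries that hang off them
    SELECT count(*)
      FROM pg_attribute a
      JOIN pg_class c ON c.oid = a.attrelid
     WHERE c.relname LIKE 'attributes_g1_seq_%';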

If you think there's one obvious solution then you just haven't analyzed the
problem seriously. In serious engineering there are always tradeoffs. The
cleanest prettiest solution is not necessarily -- and in cases where you're
dealing with large numbers like this almost certainly isn't -- the optimal
choice.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: select count() out of memory

From
Sam Mason
Date:
On Fri, Oct 26, 2007 at 02:21:39PM +0100, Gregory Stark wrote:
> "Sam Mason" <sam@samason.me.uk> writes:
> > I think the lookup that is being referred to is the fact that if you've
> > got 55k (plus) files in a directory then the filesystem still has to
> > perform a search in the directory to locate the actual file associated
> > with the filename.  There probably isn't going to be much difference
> > between the filesystem performing this lookup vs the database descending
> > a few levels of its index structure.
>
> That's true but it's in *addition* to the database having to find the catalog
> records for the table which involves an index lookup itself.
>
> Actually many index lookups since it has to look up the catalog record for the
> table, for all the columns of the table, for all indexes of the table, for all
> the index keys of those indexes, all constraints of the table, all triggers of
> the table, all dependencies on other objects and of other objects on this
> table (this latter is why I suggest not using inheritance).

OK, sounds pretty horrible to contemplate.  If most of the queries
are to the same table (sounds like they will be) then the relevant
information will remain resident in memory, won't it?


  Sam

Re: select count() out of memory

From
tfinneid@student.matnat.uio.no
Date:
> "Sam Mason" <sam@samason.me.uk> writes:
>
>> On Fri, Oct 26, 2007 at 08:25:09AM +0200, Thomas Finneid wrote:
>>> Gregory Stark wrote:
>>> >Tom's point is that if you have 55k tables then just *finding* the
>>> newest
>>> >child table is fairly expensive. You're accessing a not
>>> insignificant-sized
>>> >index and table of tables. And the situation is worse when you
>>> consider the
>>> >number of columns all those tables have, all the indexes those tables
>>> have,
>>> >all the column keys those indexes the tables have have, etc.
>>>
>>> Yes, I got that. But I name the child tables so that I when my server
>>> receives read requests, I retreive details from the request to be able
>>> to figure out the exact child table name, without the system needing to
>>> do any internal searches to find the newest table.
>>
>> I think the lookup that is being referred to is the fact that if you've
>> got 55k (plus) files in a directory then the filesystem still has to
>> perform a search in the directory to locate the actual file associated
>> with the filename.  There probably isn't going to be much difference
>> between the filesystem performing this lookup vs the database descending
>> a few levels of its index structure.
>
> That's true but it's in *addition* to the database having to find the
> catalog
> records for the table which involves an index lookup itself.
>
> Actually many index lookups since it has to look up the catalog record for
> the
> table, for all the columns of the table, for all indexes of the table, for
> all
> the index keys of those indexes, all constraints of the table, all
> triggers of
> the table, all dependencies on other objects and of other objects on this
> table (this latter is why I suggest not using inheritance).
>
> Each of these lookups is using an index to find the table out of 55k
> records
> which is just the same work that you're saving in the top level of the
> index
> tree.
>
> If you think there's one obvious solution then you just haven't analyzed
> the
> problem seriously. In serious engineering there are always tradeoffs. The
> cleanest prettiest solution is not necessarily -- and in cases where
> you're
> dealing with large numbers like this almost certainly isn't -- the optimal
> choice.

Serious engineering does not imply perfect engineering; I have analyzed it
and made my tradeoffs. What you are forgetting here is that you clearly
don't understand the entire solution, so I will try to explain it again. And
if you still think it's bonkers, then I urge you to come up with a solution
that works with the requirements.

Every predefined X seconds (e.g. 3, 6, 9, 12, etc.) a bunch of data arrives,
which must be stored by discrete time groups, e.g. second 3, 6, 9, 12. The
data that arrives is approx 4MB per second, so in this case it's 12MB. This
has to be processed by the server and written to the db within 1 second.
There can be up to 5 writers at the same time. Within that same second, at
least 16 readers should be able to read all the data, *each*. Writers and
readers are only concerned with the latest data, i.e. data from the latest
time group, e.g. second 9.
This has to go on at every predefined interval for the next 6-12 weeks,
without stop, pause or any errors. These are the requirements.

When I performed performance tests I found several unwanted effects from
several test scenarios. Here are the most important ones:

- single large table, with indexes created when the table is created.
   - this leads to the performance of an insert degrading as more data is
     added; when I got to 1 billion rows it took 50 seconds to add the data.

     My lessons from this are that
      - single inserts can never be efficient enough
      - indexes cause a linear performance drop as data volume increases

So I tried a different approach, which would address both issues:

- separate tables for each bulk of data
   - use of bulk insert through JDBC COPY.
   - add indexes to the newly created table after the copy is finished.

   My lessons from this are:
   - inserts take constant time, no matter how much data is in the database
   - adding the indexes after the insert takes constant time, i.e. some
     milliseconds.
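
A minimal sketch of that per-bulk pattern, with invented table and column
names (in the real system the COPY step is driven through the JDBC driver's
COPY support):

    -- 1. a fresh table for this bulk (standalone here; the inheritance
    --    variant is shown further down)
    CREATE TABLE attributes_seq_42 (LIKE attributes);

    -- 2. load the whole burst with a single COPY instead of row-by-row inserts
    COPY attributes_seq_42 FROM STDIN;

    -- 3. add the index only after the data is in place
    CREATE INDEX idx_attributes_seq_42_value1 ON attributes_seq_42 (value1);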

From this I realised that using either single tables or partitions is the
way to go, since I only need to access the latest data, i.e. the newest
table, in normal situations.

After thinking about it and discussing it with this group, I found that using
partitions would be more practical for two reasons:
- changes to the parent table are automatically propagated to all child
tables, so the schema remains consistent and the server won't break because
of differences between the tables.
- it is more maintainable to keep the "create with inheritance" SQL in source
code than the entire DDL of the table.
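
In SQL terms the maintainability argument is roughly the following (parent
and column names are hypothetical): the child table carries no column list
of its own, and schema changes to the parent reach every child automatically.

    -- a new child for the next time group: no table DDL duplicated in source code
    CREATE TABLE attributes_seq_55001 () INHERITS (attributes);

    -- a change to the parent propagates to all existing children as well
    ALTER TABLE attributes ADD COLUMN value8 real;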

So now I have tested the server 24/7 for a week and a half, with 1 writer
and 16 readers handling all the mentioned data, and everything works fine,
except for the select on the parent table, which now runs out of memory.
That in itself is not a problem, since I will never use the parent table
in production in any case.

regards

tom



Re: select count() out of memory

From
Alvaro Herrera
Date:
tfinneid@student.matnat.uio.no wrote:
> > tfinneid@student.matnat.uio.no wrote:
> >
> >> > are a dump of Postgres's current memory allocations and could be
> >> > useful in showing if there's a memory leak causing this.
> >>
> >> The file is 20M, these are the last lines: (the first line continues
> >> until ff_26000)
> >>
> >> idx_attributes_g1_seq_1_ff_4_value7: 1024 total in 1 blocks; 392 free (0
> >> chunks); 632 used
> >
> > You have 26000 partitions???
>
> At the moment the db has 55000 partitions, and that's only a fifth of the
> complete volume the system will have in production. The reason I chose
> this solution is that a partition will be loaded with new data every 3-30
> seconds, and all that will be read by up to 15 readers every time new data
> is available. The data will be approx 2-4TB in production in total. So it
> will be too slow if I put it in a single table with permanent indexes.
>
> I did a test previously, where I created 1 million partitions (without
> data) and I checked the limits of pg, so I think it should be ok.

Clearly it's not.  The difference could be the memory usage and wastage
for all those relcache entries and other stuff.  I would reduce the
number of partitions to a more reasonable value (within the tens, most
likely).

Maybe your particular problem can be solved by raising
max_locks_per_transaction (?) but I wouldn't count on it.
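
For reference, that parameter is set in postgresql.conf and only takes
effect after a server restart; the value below is purely illustrative:

    -- current value (the default is 64); raising it means editing
    -- postgresql.conf, e.g. max_locks_per_transaction = 256, and restarting
    SHOW max_locks_per_transaction;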

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: select count() out of memory

From
Adrian Klaver
Date:
On Friday 26 October 2007 8:56 am, tfinneid@student.matnat.uio.no wrote:
>
> Serious engineering does not imply perfect engineering; I have analyzed it
> and made my tradeoffs. What you are forgetting here is that you clearly
> don't understand the entire solution, so I will try to explain it again. And
> if you still think it's bonkers, then I urge you to come up with a solution
> that works with the requirements.
>
> Every predefined X seconds (e.g. 3, 6, 9, 12, etc.) a bunch of data arrives,
> which must be stored by discrete time groups, e.g. second 3, 6, 9, 12. The
> data that arrives is approx 4MB per second, so in this case it's 12MB. This
> has to be processed by the server and written to the db, within 1 second.
> There can be up to 5 writers at the same time. Within that same second, at
> least 16 readers should be able to read all the data, *each*. Writers and
> readers are only concerned with the latest data, i.e. data from the latest
> time group, e.g. second 9.
> This has to go on at every predefined interval for the next 6-12 weeks,
> without stop, pause or any errors. These are the requirements.
>
> When I performed performance tests I found several unwanted effects from
> several test scenarios. Here are the most important ones:
>
> - single large table, with indexes created when the table is created.
>    - this leads to the performance of an insert degrading as more data is
>      added; when I got to 1 billion rows it took 50 seconds to add the data.
>
>      My lessons from this are that
>       - single inserts can never be efficient enough
>       - indexes cause a linear performance drop as data volume increases
>
> So I tried a different approach, which would address both issues:
>
> - separate tables for each bulk of data
>    - use of bulk insert through JDBC COPY.
>    - add indexes to the newly created table after the copy is finished.
>
>    My lessons from this are:
>    - inserts take constant time, no matter how much data is in the database
>    - adding the indexes after the insert takes constant time, i.e. some
>      milliseconds.
>
> From this I realised that using either single tables or partitions is the
> way to go, since I only need to access the latest data, i.e. the newest
> table, in normal situations.
>
> After thinking about it and discussing it with this group, I found that using
> partitions would be more practical for two reasons:
> - changes to the parent table are automatically propagated to all child
> tables, so the schema remains consistent and the server won't break because
> of differences between the tables.
> - it is more maintainable to keep the "create with inheritance" SQL in source
> code than the entire DDL of the table.
>
> So now I have tested the server 24/7 for a week and a half, with 1 writer
> and 16 readers handling all the mentioned data, and everything works fine,
> except for the select on the parent table, which now runs out of memory.
> That in itself is not a problem, since I will never use the parent table
> in production in any case.
>
> regards
>
> tom
I might be missing the point, but couldn't you do a COPY to a single table
instead of multiple inserts and avoid the index overhead?
--
Adrian Klaver
aklaver@comcast.net

Re: select count() out of memory

From
Thomas Finneid
Date:

Adrian Klaver wrote:
> I might be missing the point, but couldn't you do a COPY to a single table
> instead of multiple inserts and avoid the index overhead?

Are you saying have one large table with indexes and do a COPY to it, or
are you saying one small empty table and do a COPY to it?

thomas

Re: select count() out of memory

From
aklaver@comcast.net (Adrian Klaver)
Date:
 -------------- Original message ----------------------
From: Thomas Finneid <tfinneid@student.matnat.uio.no>
>
>
> Adrian Klaver wrote:
> > I might be missing the point, but couldn't you do a COPY to a single table
> > instead of multiple inserts and avoid the index overhead?
>
> Are you saying have one large table with indexes and do a COPY to it, or
> are you saying one small empty table and do a COPY to it?
>
> thomas

I'm thinking do a COPY to one large table.  If the cost of indexing is relatively fixed as
you indicated in your previous post then you reduce the indexing overhead to each
COPY operation instead of each insert.

--
Adrian Klaver
aklaver@comcast.net

Re: select count() out of memory

From
Thomas Finneid
Date:

Adrian Klaver wrote:
> I'm thinking do a COPY to one large table.  If the cost of indexing is relatively fixed as
> you indicated in your previous post then you reduce the indexing overhead to each
> COPY operation instead of each insert.

No, what I meant was that creating an index on a table with data, after
the data has been COPYed, is faster than having one large table with an
existing index where data is added all the time and the index is
updated along with that.

I did do a performance test some time ago on that or something similar;
I don't remember the exact details, but whatever I tested was slower.
But I will try exactly what you suggested tomorrow, when I am at the test
machine. The issue is it has to run for some time before it is possible
to see any lasting degradation in performance.


thomas

Re: select count() out of memory

From
Paul Boddie
Date:
On 25 Oct, 17:36, tfinn...@student.matnat.uio.no wrote:
>
> The design is based on access patterns, i.e. one partition represents a
> group of data along a discrete axis, so the partitions are perfect for
> modeling that. Only the last partition will be used on normal cases. The
> previous partitions only need to exists until the operator deletes them,
> which will be sometime between 1-6 weeks.

This has been interesting reading because I'm working on a system
which involves a more batch-oriented approach in loading the data,
where I've found partitions to be useful both from a performance
perspective (it looks like my indexes would be inconveniently big
otherwise for the total volume of data) and from an administrative
perspective (it's convenient to control the constraints for discrete
subsets of my data). However, if all but the most recent data remains
relatively stable, why not maintain your own statistics for each
partition or, as someone else suggested, use the pg_class statistics?
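
For completeness, the pg_class route looks roughly like this; reltuples is
only an estimate kept up to date by VACUUM/ANALYZE, and the table name
below is invented:

    -- cheap, approximate row count for one partition, with no table scan
    SELECT reltuples::bigint AS approx_rows
      FROM pg_class
     WHERE relname = 'attributes_g1_seq_42';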

I'd just be interested to hear what the best practices are when tables
get big and where the access patterns favour the most recently loaded
data and/or reliably identifiable subsets of the data, as they seem to
in this case and in my own case. The various tuning guides out there
have been very useful, but isn't there a point at which partitioning
is inevitable?

Paul