Thread: Auto-clustering?
Hi all,
I have a table that in the typical case holds two-minute sample data for a few thousand sources. We often need to report on these data for a particular source over a particular time period, and we're finding this query tends to get a bit slow.
The structure of the table:
Table "public.sample"
Column | Type | Modifiers
-------------------+--------------------------+-------------------------------------------------
client | integer | not null
aggregateid | bigint | not null
sample | bigint | not null default nextval('samplekey'::regclass)
customer | integer |
period | integer | not null
starttime | integer | not null
duration | integer | not null
ip | text |
tariff | integer |
bytessentrate | bigint |
bytessent | bigint |
bytesreceived | bigint |
packets | integer | not null
queuetype | integer | not null default 0
collection | integer |
bytesreceivedrate | bigint |
greatestrate | bigint |
invalidated | timestamp with time zone |
Indexes:
"sample_pkey" PRIMARY KEY, btree (sample)
"sample_collection_starttime_idx" btree (collection, starttime)
"sample_customer_starttime_idx" btree (customer, starttime)
"sample_sample_idx" btree (client, sample)
Foreign-key constraints:
"sample_client_fkey" FOREIGN KEY (client) REFERENCES client(client)
fc=# explain analyse select collection, period, tariff, sum(bytesSent), sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as startchunk from sample_20101001 where starttime between 1287493200 and 1290171599 and collection=128 and ip = '10.9.125.207' group by startchunk, tariff, collection, period;
                                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=34959.01..34959.03 rows=1 width=44) (actual time=67047.850..67047.850 rows=0 loops=1)
   ->  Bitmap Heap Scan on sample_20101001  (cost=130.56..34958.91 rows=5 width=44) (actual time=67047.847..67047.847 rows=0 loops=1)
         Recheck Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
         Filter: (ip = '10.9.125.207'::text)
         ->  Bitmap Index Scan on sample_20101001_collection_starttime_idx  (cost=0.00..130.56 rows=9596 width=0) (actual time=9806.115..9806.115 rows=6830 loops=1)
               Index Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
 Total runtime: 67048.201 ms
(7 rows)
I figure at most there should only be ~20,000 rows to be read from disk, and I expect that the index is doing a pretty good job of making sure only the rows that need reading are read. Inclusion of the ip in the query is almost redundant, as most of the time an ip has its own collection... My suspicion is that the rows we're interested in are very sparsely distributed on disk, so we're having to read too many pages for the query...
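A quick way I could check how scattered those rows are (just a sketch using the standard pg_stats view; correlation near zero would mean the matching rows are spread all over the heap, near 1 or -1 that they're already well ordered):

select attname, correlation
from pg_stats
where tablename = 'sample_20101001'
  and attname in ('collection', 'starttime');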
All of the queries on this table report on a single collection, so ideally a collection's data would all be stored in the same part of the disk... or at least clumped together. This can be achieved using CLUSTER; however, as far as I know there's no automated, non-cronesque means of clustering, and having the table become unusable during the CLUSTER is not ideal.
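For reference, the manual route would be something like this (a sketch only; the CLUSTER ... USING form assumes a reasonably recent server, and it takes an exclusive lock for the duration):

-- one-off physical reorder on the (collection, starttime) index; subsequent
-- bare "CLUSTER sample;" runs (e.g. from cron) will reuse the same index
CLUSTER sample USING sample_collection_starttime_idx;
ANALYZE sample;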
I've considered partitioning, but I don't think that's going to give the effect I need. Apparently partitioning only scales to a few dozen child tables, so that would only buy one order of magnitude of improvement for significant complexity.
Are there any other options?
Cheers!
--Royce
2010/12/17 Royce Ausburn <royce@inomial.com>
how about (auto)vacuuming?
you can test this suspicion in a very simple way (a minimal sketch follows the list):
- create a test table (like yours, including indexes and constraints, but with no data)
- insert into test select * from yours order by <the columns you want clustered>
- analyze the test table
- test the query on the new table
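For example (a sketch only; LIKE copies defaults, check constraints and indexes but not foreign keys, and the ORDER BY lists whatever you want physically grouped):

create table sample_test (like sample including defaults including constraints including indexes);
insert into sample_test select * from sample order by collection, starttime;
analyze sample_test;
-- now run the reporting query against sample_test and compare timings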
If the new query is much faster, and if you have intensive random UPD/DEL/INS activity, periodic CLUSTER could be a good idea...
but it depends on actual usage patterns (SELECT/modify ratio, types of updates, and so on).
cron is a way of automation, isn't it :-)
regarding partitioning: I guess it starts to make sense around 10M rows or 10 GB in one table.
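A quick way to see where the table sits relative to those thresholds (just a sketch; the table name is taken from the query plan above):

select pg_size_pretty(pg_total_relation_size('sample_20101001')) as total_size,
       (select reltuples::bigint from pg_class where relname = 'sample_20101001') as approx_rows;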
regarding clustering: it does not help with index bloat.
and finally, you did not specify which PostgreSQL version you are using.
cheers,
Filip
2010/12/17 Filip Rembiałkowski <filip.rembialkowski@gmail.com>:
> regarding clustering: it does not help with index bloat.

I'm almost sure it does, CLUSTER re-creates all indexes from scratch after copying the tuples.

Regards,
Marti
you are right, I must have missed it...
Table "public.u"
Column | Type | Modifiers
--------+-----------------------------+-----------
id | integer |
t | timestamp without time zone |
d | text |
Indexes:
"u_d" btree (d)
"u_id" btree (id)
"u_t" btree (t)
filip@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where relname in('u','u_id','u_t','u_d');
oid | relname | pg_relation_size
-------+---------+------------------
64283 | u | 15187968
64289 | u_id | 6758400
64290 | u_t | 6086656
64291 | u_d | 16482304
filip@filip=# CLUSTER u USING u_t;
CLUSTER
filip@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where relname in('u','u_id','u_t','u_d');
oid | relname | pg_relation_size
-------+---------+------------------
64283 | u | 12115968
64289 | u_id | 3391488
64290 | u_t | 3391488
64291 | u_d | 8216576
(4 rows)
So CLUSTER is effectively CLUSTER + REINDEX... nice.
Table "public.u"
Column | Type | Modifiers
--------+-----------------------------+-----------
id | integer |
t | timestamp without time zone |
d | text |
Indexes:
"u_d" btree (d)
"u_id" btree (id)
"u_t" btree (t)
filip@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where relname in('u','u_id','u_t','u_d');
oid | relname | pg_relation_size
-------+---------+------------------
64283 | u | 15187968
64289 | u_id | 6758400
64290 | u_t | 6086656
64291 | u_d | 16482304
filip@filip=# CLUSTER u USING u_t;
CLUSTER
filip@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where relname in('u','u_id','u_t','u_d');
oid | relname | pg_relation_size
-------+---------+------------------
64283 | u | 12115968
64289 | u_id | 3391488
64290 | u_t | 3391488
64291 | u_d | 8216576
(4 rows)
So CLUSTER is effectively CLUSTER + REINDEX... nice.
> fc=# explain analyse select collection, period, tariff, sum(bytesSent), sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as startchunk from sample_20101001 where starttime between 1287493200 and 1290171599 and collection=128 and ip = '10.9.125.207' group by startchunk, tariff, collection, period;

If CLUSTER locks bother you, and you don't do UPDATEs, you might consider doing something like this:

- accumulate the rows in a "recent" table
- every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your cluster fields)
- DELETE FROM recent the rows you just inserted
- VACUUM recent

The cluster in your archive table will not be perfect, but at least all rows from 1 source in 1 hour will be stored close together. But clustering doesn't need to be perfect either; if you get 100x better locality, that's already good!

Now, if you have a huge amount of data but never query it with a precision exceeding 1 hour, you might consider creating an aggregate table where, at the end of every hour, you only store sum(), min(), max() of the last hour's data, grouped by the fields you want. You could also use a trigger, but that would generate a huge amount of UPDATEs. For the above query you'd do:

INSERT INTO stats_by_hour (columns...)
SELECT collection, ip, period, tariff, sum(bytesSent), sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as startchunk
FROM sample_20101001
WHERE starttime > some value
GROUP BY collection, ip, period, tariff, startchunk;

Then you can run aggregates against this much smaller table instead.
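A rough sketch of that hourly rotation (the "recent" and "archive" table names are only illustrative, and VACUUM has to run outside the transaction):

begin;
insert into sample_archive
  select * from sample_recent
  where starttime < extract(epoch from date_trunc('hour', now()))::integer
  order by collection, starttime;   -- the order you want clustered on disk
delete from sample_recent
  where starttime < extract(epoch from date_trunc('hour', now()))::integer;  -- same cutoff: now() is fixed within the transaction
commit;
vacuum sample_recent;   -- reclaim the space of the rows just deleted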
Royce Ausburn wrote:
> All of the queries on this table report on a single collection, so ideally a collection's data would all be stored in the same part of the disk... or at least clumped together. This can be achieved using CLUSTER; however, as far as I know there's no automated, non-cronesque means of clustering, and having the table become unusable during the CLUSTER is not ideal.

If the lock level used by CLUSTER is a problem for you, you could consider the pg_reorg contrib module. AFAIK, it does similar work as CLUSTER but allows concurrent read and write activity on the table.

Regards.
Philippe.
On 17/12/2010, at 8:27 PM, Filip Rembiałkowski wrote:
> 2010/12/17 Royce Ausburn <royce@inomial.com>
>> Hi all,
>> I have a table that in the typical case holds two-minute sample data for a few thousand sources. We often need to report on these data for a particular source over a particular time period, and we're finding this query tends to get a bit slow.
>
> how about (auto)vacuuming?

A key piece of information I left out: we almost never update rows in this table.

>> I figure at most there should only be ~20,000 rows to be read from disk, and I expect that the index is doing a pretty good job of making sure only the rows that need reading are read. Inclusion of the ip in the query is almost redundant, as most of the time an ip has its own collection... My suspicion is that the rows we're interested in are very sparsely distributed on disk, so we're having to read too many pages for the query...
>
> you can test this suspicion in a very simple way:
> - create a test table (like yours, including indexes and constraints, but with no data)
> - insert into test select * from yours order by <the columns you want clustered>
> - analyze the test table
> - test the query on the new table
> If the new query is much faster, and if you have intensive random UPD/DEL/INS activity, periodic CLUSTER could be a good idea...
> but it depends on actual usage patterns (SELECT/modify ratio, types of updates, and so on).

Good idea! This vastly improves query times.

> and finally, you did not specify which PostgreSQL version you are using.
--Royce
On 17/12/2010, at 9:20 PM, Pierre C wrote:

>> fc=# explain analyse select collection, period, tariff, sum(bytesSent), sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as startchunk from sample_20101001 where starttime between 1287493200 and 1290171599 and collection=128 and ip = '10.9.125.207' group by startchunk, tariff, collection, period;
>
> If CLUSTER locks bother you, and you don't do UPDATEs, you might consider doing something like this:
>
> - accumulate the rows in a "recent" table
> - every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your cluster fields)
> - DELETE FROM recent the rows you just inserted
> - VACUUM recent
>
> The cluster in your archive table will not be perfect, but at least all rows from 1 source in 1 hour will be stored close together. But clustering doesn't need to be perfect either; if you get 100x better locality, that's already good!

That's a really decent idea and can slot in perfectly well with how the application already works! We have existing DBAO code that handles monthly tables; it'll happily pop data into a recent table... In fact we can probably tolerate having a "today" table. Thanks!

--Royce