Thread: Partition table query performance
I have a problem with partitioning and I'm wondering if anyone can provide some insight. I'm trying to find the max value of a column across multiple partitions. The query against the partition set is quite slow, while queries against the child partitions are very fast!
I set up a basic range partition table layout.
A parent table: Data { dataID, sensorID, value, ts }
Child tables Data_YYYY_WEEKNO { dataID, sensorID, value, ts }, inherited from Data.
Each child table has a primary key index on dataID and a composite index on (sensorID, ts).
Each child has check constraints for the (non-overlapping) week range identified in the table name.
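Roughly, the layout looks like this (a sketch only: the column types, the week boundaries, and the "valid" flag that shows up in the queries below are illustrative guesses rather than the exact schema):

-- parent table
CREATE TABLE "Data" (
    "dataID"   bigint,
    "sensorID" integer,
    value      double precision,
    valid      boolean,
    ts         timestamp without time zone
);

-- one child per week, with a non-overlapping CHECK constraint on ts
CREATE TABLE "Data_2008_48" (
    CHECK (ts >= '2008-11-24' AND ts < '2008-12-01')
) INHERITS ("Data");

ALTER TABLE "Data_2008_48" ADD PRIMARY KEY ("dataID");
CREATE INDEX "Data_2008_48_sensorID_ts_index"
    ON "Data_2008_48" ("sensorID", ts);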
I want to perform a simple operation: select the max ts (timestamp) given a sensorID. Given my indexes on the tables, this should be a simple and fast operation.
DB=# EXPLAIN ANALYZE select max(ts) from "Data" where valid=true and "sensorID"=8293 ;
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=334862.92..334862.93 rows=1 width=8) (actual time=85183.381..85183.383 rows=1 loops=1)
-> Append (cost=2.30..329397.68 rows=2186096 width=8) (actual time=1.263..76592.755 rows=2205408 loops=1)
-> Bitmap Heap Scan on "Data" (cost=2.30..8.84 rows=3 width=8) (actual time=0.027..0.027 rows=0 loops=1)
Recheck Cond: ("sensorID" = 8293)
Filter: valid
-> Bitmap Index Scan on "def_data_sensorID_ts" (cost=0.00..2.30 rows=6 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: ("sensorID" = 8293)
-> Index Scan using "Data_2008_01_sensorID_ts_index" on "Data_2008_01" "Data" (cost=0.00..4.27 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ("sensorID" = 8293)
Filter: valid
-> Bitmap Heap Scan on "Data_2008_02" "Data" (cost=3.01..121.08 rows=98 width=8) (actual time=0.017..0.017 rows=0 loops=1)
Recheck Cond: ("sensorID" = 8293)
Filter: valid
-> Bitmap Index Scan on "Data_2008_02_sensorID_ts_index" (cost=0.00..2.99 rows=98 width=0) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: ("sensorID" = 8293)
   ... (remaining partitions omitted; same pattern as above) ...
Total runtime: 85188.694 ms
When I query against a specific partition:
DB=# EXPLAIN ANALYZE select max(ts) from "Data_2008_48" where valid=true and "sensorID"=8293 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.10..0.11 rows=1 width=0) (actual time=3.830..3.832 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..0.10 rows=1 width=8) (actual time=3.817..3.819 rows=1 loops=1)
-> Index Scan Backward using "Data_2008_48_sensorID_ts_index" on "Data_2008_48" (cost=0.00..15304.55 rows=148959 width=8) (actual time=3.813..3.813 rows=1 loops=1)
Index Cond: ("sensorID" = 8293)
Filter: ((ts IS NOT NULL) AND valid)
Total runtime: 0.225 ms
The query plan against the child partition makes sense - it uses the index to find the max value. The query plan against the partition set instead uses a mix of bitmap heap scans and index scans, appends every matching row from every partition, and then aggregates over all of them.
Why would the planner choose a bitmap heap scan after the bitmap index scan - is that really the best choice, and what is it actually doing? And what can I do to speed up this query?
As a sanity check I did a manual union query across all partitions to find the max(ts). My manual union query executed in 13 ms, versus 85,188 ms for the same query against the parent table!
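The manual union isn't reproduced here, but a query of that general shape - one branch per child table, so each branch can use its (sensorID, ts) index - looks roughly like this (only two of the branches are shown):

SELECT max(ts) FROM (
    SELECT max(ts) AS ts FROM "Data_2008_47" WHERE valid = true AND "sensorID" = 8293
    UNION ALL
    SELECT max(ts) AS ts FROM "Data_2008_48" WHERE valid = true AND "sensorID" = 8293
    -- ... one branch per child partition ...
) AS per_partition;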
Greg Jaman
"Greg Jaman" <gjaman@gmail.com> writes: > I have a problem with partitioning and I'm wondering if anyone can provide > some insight. I'm trying to find the max value of a column across multiple > partitions. The query against the partition set is quite slow while queries > against child partitions is very fast! I'm afraid this is a known problematic use case of Postgres's current partitioning support. Postgres is not capable of finding the plan which you're undoubtedly looking for where it uses the same plan as your child table query iterating over the partitions. There are several groups working to improve this in different ways but none of them appear to be on track to be in 8.4 so it will be 8.5 or later before they appear. Sorry. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Thanks Gregory,
I was on IRC yesterday and a few people indicated the same thing...
Searching for the last reading is a very important function for our database. I wrote the function below, which searches all child tables for the max. It is not optimal, because it doesn't skip tables by looking at the check constraints on the child tables to see whether the max found so far already exceeds a table's range. Right now this function executes in about 50 ms, versus the 80+ seconds the same query takes against the partition set.
create or replace function Data_max(in_sensorID integer) returns bigint AS
$$
DECLARE
    childtable RECORD;
    childres   RECORD;
    max_dataID bigint := NULL;
    max_ts     timestamp without time zone;
BEGIN
    -- loop over every child table that inherits from "Data"
    FOR childtable IN
        SELECT pc.relname AS relname
        FROM pg_class pc
        JOIN pg_inherits pi ON pc.oid = pi.inhrelid
        WHERE pi.inhparent = (SELECT oid FROM pg_class WHERE relname = 'Data')
    LOOP
        -- latest reading for this sensor in this child; uses the (sensorID, ts) index
        EXECUTE 'SELECT "dataID", ts FROM ' || quote_ident(childtable.relname)
             || ' WHERE "sensorID" = ' || quote_literal(in_sensorID)
             || ' ORDER BY ts DESC LIMIT 1'
        INTO childres;

        -- keep the newest result seen so far across all children
        IF childres IS NOT NULL THEN
            IF max_ts IS NULL OR childres.ts > max_ts THEN
                max_ts     := childres.ts;
                max_dataID := childres."dataID";
            END IF;
        END IF;
    END LOOP;

    RETURN max_dataID;
END;
$$
LANGUAGE plpgsql;
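The function can then be called directly, or used to pull back the full latest row, for example (using the sensorID from the earlier plans):

SELECT Data_max(8293);

-- or join back to the partition set for the whole reading:
SELECT * FROM "Data" WHERE "dataID" = Data_max(8293);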
On Wed, Nov 26, 2008 at 4:48 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> "Greg Jaman" <gjaman@gmail.com> writes:
>> I have a problem with partitioning and I'm wondering if anyone can provide
>> some insight. I'm trying to find the max value of a column across multiple
>> partitions. The query against the partition set is quite slow, while queries
>> against the child partitions are very fast!
>
> I'm afraid this is a known problematic use case of Postgres's current
> partitioning support. Postgres is not capable of finding the plan which you're
> undoubtedly looking for where it uses the same plan as your child table query
> iterating over the partitions.
>
> There are several groups working to improve this in different ways but none of
> them appear to be on track to be in 8.4 so it will be 8.5 or later before they
> appear. Sorry.
>
> --
> Gregory Stark
> EnterpriseDB  http://www.enterprisedb.com
> Ask me about EnterpriseDB's 24x7 Postgres support!
Maybe this is an obviously dumb thing to do, but it looked reasonable to me. The problem is, the seemingly simple sort below causes a fairly powerful computer to completely freeze for 5-10 minutes. During the sort, you can't login, you can't use any shell sessions you already have open, the Apache server barely works, and even if you do "nice -20 top" before you start the sort, the top(1) command comes to a halt while the sort is proceeding! As nearly as I can tell, the sort operation is causing a swap storm of some sort -- nothing else in my many years of UNIX/Linux experience can cause a "nice -20" process to freeze.

The sort operation never finishes -- it's always killed by the system. Once it dies, everything returns to normal.

This is 8.3.0. (Yes, I'll upgrade soon.) Is this a known bug, or do I have to rewrite this query somehow? Maybe add indexes to all four columns being sorted?

Thanks!
Craig

=> explain select * from plus order by supplier_id, compound_id, units, price;
                               QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=5517200.48..5587870.73 rows=28268100 width=65)
   Sort Key: supplier_id, compound_id, units, price
   ->  Seq Scan on plus  (cost=0.00..859211.00 rows=28268100 width=65)

=> \d plus
         Table "emol_warehouse_1.plus"
    Column     |     Type      | Modifiers
---------------+---------------+-----------
 supplier_id   | integer       |
 supplier_name | text          |
 compound_id   | text          |
 amount        | text          |
 units         | text          |
 price         | numeric(12,2) |
 currency      | text          |
 description   | text          |
 sku           | text          |
Indexes:
    "i_plus_compound_id" btree (supplier_id, compound_id)
    "i_plus_supplier_id" btree (supplier_id)

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 1000000
max_fsm_relations = 5000
synchronous_commit = off
#wal_sync_method = fdatasync
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID controller, 8 disks as RAID10
Craig James wrote:
> Maybe this is an obviously dumb thing to do,

... and it was. I answered my own question: the problem came from using psql(1) to do something I should have done with pg_dump.

> but it looked reasonable to me. The problem is, the seemingly simple sort below
> causes a fairly powerful computer to completely freeze for 5-10 minutes.
> [rest of original message snipped]

Craig
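(For what it's worth, the pg_dump route mentioned above would be something along these lines -- the database name is a placeholder -- and it streams the rows out via COPY instead of buffering the whole result in psql:)

pg_dump --data-only --table=emol_warehouse_1.plus the_database > plus_data.sql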
Don't reply to another message when starting a new thread. People will miss your message.

Craig James wrote:
> Maybe this is an obviously dumb thing to do, but it looked reasonable to me.

Looks reasonable here too - except I'm not sure what I'd do with 2 million rows of sorted table in my console. I'm guessing you're piping the output into something.

> The problem is, the seemingly simple sort below causes a fairly
> powerful computer to completely freeze for 5-10 minutes. During the
> sort, you can't login, you can't use any shell sessions you already have
> open, the Apache server barely works, and even if you do "nice -20 top"
> before you start the sort, the top(1) command comes to a halt while the
> sort is proceeding! As nearly as I can tell, the sort operation is
> causing a swap storm of some sort -- nothing else in my many years of
> UNIX/Linux experience can cause a "nice -20" process to freeze.

Nothing should cause that to your machine. I've never seen "top" just freeze unless you set up some sort of fork-bomb and ramp the load up so fast it can't cope. Oh, and nice-ing the client isn't going to do anything to the backend actually doing the sorting.

> The sort operation never finishes -- it's always killed by the system.
> Once it dies, everything returns to normal.

You're running out of memory then. It'll be the out-of-memory killer (assuming you're on Linux).

> This is 8.3.0. (Yes, I'll upgrade soon.)

Make "soon" more urgent than it has been up to now - no point in risking all your data to some already-fixed bug, is there? Unless you've been carefully tracking the release notes and have established that there's no need in your precise scenario.

> Is this a known bug, or do I have to rewrite this query somehow?
> Maybe add indexes to all four columns being sorted?

Indexes won't necessarily help if you're sorting the whole table. Maybe if you had one on all four columns.

> => explain select * from plus order by supplier_id, compound_id, units, price;

> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

So can you support (1000 * 256 * 2) + 2000 MB of RAM?

> effective_cache_size = 4GB

...while leaving 4GB free for disk caching?

> Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID
> controller, 8 disks as RAID10

It appears not. Remember that work_mem is not only per-connection, a single query can use multiples of it (hence the *2 above). If you genuinely have a lot of connections I'd drop it down to (say) 4MB to make sure you don't swap on a regular basis (should probably be even lower to be truly safe). Then, for the odd case when you need a large value, issue a SET work_mem before the query.

--
Richard Huxton
Archonet Ltd
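To illustrate that last suggestion, the per-session override would look something like this (the value shown is only an example):

-- raise work_mem just before the expensive query, then put it back
SET work_mem = '512MB';
SELECT * FROM plus ORDER BY supplier_id, compound_id, units, price;
RESET work_mem;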
Craig James <craig_james@emolecules.com> writes:
> Maybe this is an obviously dumb thing to do, but it looked reasonable
> to me. The problem is, the seemingly simple sort below causes a
> fairly powerful computer to completely freeze for 5-10 minutes.

trace_sort output might be informative.

			regards, tom lane
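For anyone not familiar with it, trace_sort is a logging switch that can be turned on per session before re-running the query (assuming the server was built with the default TRACE_SORT support), roughly:

SET client_min_messages = log;  -- make the LOG output visible in psql
SET trace_sort = on;            -- log tuplesort memory/disk usage and timings
EXPLAIN ANALYZE SELECT * FROM plus ORDER BY supplier_id, compound_id, units, price;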
>> Maybe this is an obviously dumb thing to do, but it looked reasonable to >> me. > > Looks reasonable here too - except I'm not sure what I'd do with 2 > million rows of sorted table in my console. I'm guessing you're piping > the output into something. Probably it's psql that is choking from buffering the rows. If you want to fetch that huge amount of data into a user application, a CURSOR is the best way to do so.
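A minimal sketch of that cursor approach, fetching the sorted result in batches instead of having the client buffer all of it:

BEGIN;
DECLARE plus_sorted CURSOR FOR
    SELECT * FROM plus ORDER BY supplier_id, compound_id, units, price;
-- fetch and process one batch at a time; repeat until FETCH returns no rows
FETCH 10000 FROM plus_sorted;
CLOSE plus_sorted;
COMMIT;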