Thread: Performance problem with low correlation data
I have a problem with the method that PG uses to access my data. Data is inserted into testinsert every 15 minutes; ne_id varies from 1 to 20000.

CREATE TABLE testinsert
(
  ne_id integer NOT NULL,
  t timestamp without time zone NOT NULL,
  v integer[],
  CONSTRAINT testinsert_pk PRIMARY KEY (ne_id, t)
);

CREATE UNIQUE INDEX testinsert_time_key ON testinsert USING btree (t, ne_id);

This table therefore has a correlation of 1 on t, and a correlation close to 0 on ne_id.

I query this table using another table:

CREATE TABLE idtable
(
  id integer NOT NULL,
  groupname varchar(50),
  CONSTRAINT idtable_pk PRIMARY KEY (id, groupname)
);

CREATE INDEX idtable_group_idx ON idtable USING btree (groupname);

where each id is associated with a group:

select *
from idtable
  left outer join testinsert on id=ne_id
where groupname='a group name'
  and t between $a_date and $another_date;

PG usually chooses a nested loop join over all the ne_ids found for groupname='a group name'. BUT, given the correlation in the table, this is a horrible data access: the table (15GB) gets read randomly, since data for one ne_id is scattered all over the table. The "best" way to read the table would still be a nested loop, but a loop on the "t" values, not on the ne_id values, since data for the same timestamp is "close". Or, even better, something like this would be very nice:

Bitmap Heap Scan
    for each id found in idtable where groupname='a group name'
        BitmapOr
            BitmapIndexScan using ne_id and t between $a_date and $another_date

That is: I understand why PG is using that access method to fetch the indexes, but I would like it to fetch the heaps only after ALL the indexes have been read, so that it could reorder them...

So, given that: how can I tell PG to use an algorithm such as:

fetch the heap
    for each quarter
        for each id found where groupname='a group name'
            fetch all the indexes

instead of:

for each id found where groupname='a group name'
    fetch the heap
    fetch all the indexes where ne_id=id and t between $a_date and $another_date

????

(Some other info:

1) queries against a clustered table are 10-20x faster, but I can't cluster the table (it gets written every 15 minutes and read pretty often);

2) I know all the "t" values that I'm going to query, since there won't be more than 1 t per ne_id per 15 minutes; so I could use a generate_series($a_date, $another_date, '15 minutes'::interval) if that could help somehow:

select *
from idtable
  cross join generate_series($a_date, $another_date, '15 minutes'::interval) as myt
  left outer join testinsert on id=ne_id and myt=t
where groupname='a group name';

but it doesn't help...)
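[Editor's note: the post doesn't show the exact command behind the clustering comparison in point 1, but presumably it was something along these lines. Clustering on the primary key puts each ne_id's rows next to each other, which is what makes the per-ne_id index probes cheap:]

-- a guess at the clustering experiment from point 1; CLUSTER rewrites the
-- whole table in testinsert_pk (ne_id, t) order and takes an exclusive lock
CLUSTER testinsert USING testinsert_pk;
-- refresh statistics afterwards, including the correlation estimates
ANALYZE testinsert;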
On Mon, Jul 6, 2009 at 6:32 PM, Scara Maccai <m_lists@yahoo.it> wrote:
> The "best" way to read the table would still be a nested loop, but a loop on the
> "t" values, not on the ne_id values, since data for the same timestamp is "close".

But that would be a different query -- there's no restriction on the t values in this one.

> How can I tell PG to use an algorithm such as:
>
> fetch the heap
>     for each quarter
>         for each id found where groupname='a group name'
>             fetch all the indexes

Have you tried something using IN or EXISTS instead of a join? The algorithm you describe doesn't work for the join because it has to produce a record which includes the matching group columns. A bitmap scan would return the various groups (I know in your case there's only one, but the optimizer can't be sure) mixed together. That might work better in 8.4 than 8.3, as the IN and EXISTS handling is improved.

Actually I wonder if doing a sequential scan with a hash join against the group list wouldn't be a better option. That would scan more of the heap, but if they're really that spread out it might not make much difference, and it would avoid having to do all the index scans. You could experiment with turning enable_nestloop off and see how fast the hash join plan is.

--
greg
http://mit.edu/~gsstark/resume.pdf
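[Editor's note: in concrete terms, the two experiments suggested above might look like this; the query shapes are guessed from the tables posted earlier, and $a_date / $another_date are the placeholders used in the original post:]

-- IN instead of the join; note this only returns testinsert columns,
-- unlike the original left outer join
SELECT *
FROM testinsert
WHERE ne_id IN (SELECT id FROM idtable WHERE groupname = 'a group name')
  AND t BETWEEN $a_date AND $another_date;

-- compare the hash join plan by disabling nested loops for one session
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT *
FROM idtable
  LEFT OUTER JOIN testinsert ON id = ne_id
WHERE groupname = 'a group name'
  AND t BETWEEN $a_date AND $another_date;
RESET enable_nestloop;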
> But that would be a different query -- there's no restriction on the
> t values in this one.

There is a restriction on the t values:

select *
from idtable
  left outer join testinsert on id=ne_id
where groupname='a group name'
  and t between $a_date and $another_date;

> Have you tried something using IN or EXISTS instead of a join?

I still get a nested loop join on the ne_id column...

> The algorithm you describe doesn't work for the join because it has to
> produce a record which includes the matching group columns.

Yeah, I thought about that. Basically I guess the "perfect" algorithm would be something like:

Hash Join   <---- this is needed to join values from both relations
  -> Bitmap Heap Scan
       for each id found in idtable where groupname='a group name'
         BitmapOr
           BitmapIndexScan using ne_id and t between $a_date and $another_date
  -> select id from idtable where groupname='a group name'

> Actually I wonder if doing a sequential scan with a hash join against
> the group list wouldn't be a better option.

The table is pretty big (60M rows); sequential scans are the reason why my queries are so slow. Since the correlation on the ne_id col is so bad, the planner chooses seq scans when dealing with most of the "t" values, even if the number of "ne_id" values is low.

For the moment I've found this solution: whenever too many "t" values are selected, which would lead the planner towards a seq scan (or a very poor bitmap index scan in case I disable seq scans), I create a temporary table:

create temporary table alldata as
select id as ne_id, t
from generate_series(mydatestart, mydateend, '15 minutes'::interval) as t
  cross join idtable
where groupname='a group name'
order by t, id;

analyze alldata;

select * from alldata left outer join testinsert using (ne_id, t);

Basically I'm doing what I'd like PG to do: since the correlation on the "t" col is good, and the correlation on the "id" col is bad, query the index using the right order: "t" first, "id" second (given by the "order by t, id" on the creation of the temp table).

I would like PG to do that for me. Since it knows an index scan looping on ne_id would be wrong, I'd like it to create a "materialized" table where data is ordered by "t" first, instead of going for the seq scan. This would lead to a x10 - x100 improvement in query time.
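[Editor's note: roughly the same ordering trick can be written inline, without the temporary table, by keeping the ordered subquery from being flattened (OFFSET 0 is the usual PostgreSQL idiom for that). Whether the planner then produces the good plan is another question: the temporary table plus ANALYZE gives it real row estimates, which this sketch does not, so treat it only as an illustration of the same idea:]

select *
from (
  select id as ne_id, myt as t
  from idtable
    cross join generate_series($a_date, $another_date, '15 minutes'::interval) as myt
  where groupname='a group name'
  order by myt, id
  offset 0   -- prevents the subquery from being flattened into the outer query
) as alldata
left outer join testinsert using (ne_id, t);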
Since no one replied to http://www.mail-archive.com/pgsql-general@postgresql.org/msg133360.html, I tried another approach.

I can't cluster the whole table every day; it would take too long (as I said, the table has 60M rows, and I have hundreds of them). Plus, it wouldn't really make much sense: the only portion of the table that needs to be clustered is the one written after the last "cluster" command (since no row is deleted/updated, only inserted every 15 minutes). So I thought: I'll "cluster" only the part that has been written each day:

begin;
lock table testinsert in ACCESS EXCLUSIVE MODE;
insert into testinsert
  select ne_id+100000, t, v
  from testinsert
  where t between '2009-08-01 00:00:00' and '2009-08-02 00:00:00'
  order by ne_id, t;
delete from testinsert
  where t between '2009-08-01 00:00:00' and '2009-08-02 00:00:00'
  and ne_id<100000;
update testinsert set ne_id = ne_id - 100000
  where t between '2009-08-01 00:00:00' and '2009-08-02 00:00:00';
commit;

This would run after midnight on 2009-08-02; the next day would use different time values. What I'm trying to do here is cluster on (ne_id, t) the portion of the table written every day.

Well, I guess the table is laid out as expected, but in pg_stats the correlation for the ne_id col is still VERY low:

select attname, n_distinct, correlation from pg_stats where tablename='testinsert3';

 attname | n_distinct | correlation
---------+------------+-------------
 ne_id   |      20000 |    0.111041   <---- low value
 t       |        864 |    0.987778
 v       |          1 |           1

This leads the planner to sequential scans of the table as soon as 10% of the table has to be read:

explain
select *
FROM idtable as g
  inner join testinsert on id=ne_id
where groupid between 1 and 4
  and t between '2009-08-01 00:00:00' and '2009-08-09 00:00:00'

Hash Join  (cost=134.45..2127071.28 rows=614331 width=244)
  Hash Cond: (testinsert3.ne_id = g.id)
  ->  Seq Scan on testinsert  (cost=0.00..2063200.00 rows=15358272 width=236)
        Filter: ((t >= '2009-08-01 00:00:00'::timestamp without time zone) AND (t <= '2009-08-09 00:00:00'::timestamp without time zone))
  ->  Hash  (cost=124.45..124.45 rows=800 width=8)
        ->  Bitmap Heap Scan on idtable g  (cost=24.45..124.45 rows=800 width=8)
              Recheck Cond: ((groupid >= 1) AND (groupid <= 4))
              ->  Bitmap Index Scan on idtable_pk  (cost=0.00..24.25 rows=800 width=0)
                    Index Cond: ((groupid >= 1) AND (groupid <= 4))

Which is a terrible plan! testinsert contains t values between '2009-08-01' and '2009-08-09', and ne_id from 1 to 20000. But only 800 out of 20000 ne_ids have to be read; there's no need for a table scan!

I guess this is a reflection of the poor "correlation" on ne_id; but, as I said, I don't really think ne_id is that badly correlated. In fact, doing a "select ne_id, t from testinsert limit 100000" I can see that data is laid out pretty much by "ne_id, t", grouped by day (that is, same ne_id for one day, then the next ne_id, and so on until the next day).

How is the "correlation" calculated? Can someone explain to me why, after the procedure above, correlation is so low???
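[Editor's note: one thing that might be worth checking here: the correlation figure comes out of the row sample that ANALYZE takes, so a larger per-column sample occasionally moves the number. The statistics target below is just an example value:]

-- enlarge the sample used for ne_id's statistics and re-check pg_stats
ALTER TABLE testinsert ALTER COLUMN ne_id SET STATISTICS 1000;
ANALYZE testinsert;
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'testinsert' AND attname = 'ne_id';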
m_lists@yahoo.it wrote:

> testinsert contains t values between '2009-08-01' and '2009-08-09', and ne_id
> from 1 to 20000. But only 800 out of 20000 ne_ids have to be read; there's no
> need for a table scan!
> I guess this is a reflection of the poor "correlation" on ne_id; but, as I
> said, I don't really think ne_id is that badly correlated.
> In fact, doing a "select ne_id, t from testinsert limit 100000" I can see
> that data is laid out pretty much by "ne_id, t", grouped by day (that is, same
> ne_id for one day, then the next ne_id, and so on until the next day).
> How is the "correlation" calculated? Can someone explain to me why, after the
> procedure above, correlation is so low???

Did you run ANALYZE after the procedure above?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
> > testinsert contains t values between '2009-08-01' and '2009-08-09', and ne_id
> > from 1 to 20000. But only 800 out of 20000 ne_ids have to be read; there's no
> > need for a table scan!
> > I guess this is a reflection of the poor "correlation" on ne_id; but, as I
> > said, I don't really think ne_id is that badly correlated.
> > In fact, doing a "select ne_id, t from testinsert limit 100000" I can see
> > that data is laid out pretty much by "ne_id, t", grouped by day (that is, same
> > ne_id for one day, then the next ne_id, and so on until the next day).
> > How is the "correlation" calculated? Can someone explain to me why, after the
> > procedure above, correlation is so low???
>
> Did you run ANALYZE after the procedure above?

Yes I did; the correlation on that column stays low.

Of course, I didn't expect a correlation = 1, since data is laid out (pretty much) like this:

(ne_id1) (t1 day1)
(ne_id1) (t2 day1)
...
(ne_id1) (tn day1)
(ne_id2) (t1 day1)
(ne_id2) (t2 day1)
...
(ne_id2) (tn day1)
... (pretty much all the ne_ids)
(ne_id1) (t1 day2)
(ne_id1) (t2 day2)
...
(ne_id1) (tn day2)
(ne_id2) (t1 day2)
(ne_id2) (t2 day2)
...
(ne_id2) (tn day2)
... and so on

So ne_id is not strictly increasing, but it is pretty much the same (sequentially) within a whole day...
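[Editor's note: a layout where ne_id sweeps 1..20000 once per day and then starts over gives a low linear correlation even though each day is perfectly ordered, because the statistic is a single correlation over the whole table, not a per-day one. A toy model with 9 day-sized blocks (matching the 864 distinct t values shown earlier, i.e. 9 days of quarter-hours) lands right around the 0.111 reported by pg_stats; this is only an illustration, not the planner's exact sample-based computation:]

-- toy model of the layout described above: physical position is i,
-- ne_id cycles 1..20000 once per "day", for 9 days
SELECT corr(i::float8, (1 + i % 20000)::float8) AS toy_correlation
FROM generate_series(0, 9 * 20000 - 1) AS g(i);

Roughly, with k such day-blocks the linear correlation comes out near 1/k, so it keeps falling as more days accumulate, no matter how well each individual day is clustered.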