Thread: Optimizer's issue
I found a strange issue in a very simple query. The statistics target for all columns is set to 1000, but I also tried other levels.

create table g (
  id bigint primary key,
  isgroup boolean not null);

create table a (
  groupid bigint references g(id),
  id bigint,
  unique(id, groupid));

analyze g;
analyze a;

select count(*) from a
294

select count(*) from g
320

explain analyze
select *
from g
join a on a.groupid = g.id
where g.isgroup

Hash Join  (cost=5.35..11.50 rows=11 width=25) (actual time=0.261..1.755 rows=294 loops=1)
  Hash Cond: (a.groupid = g.id)
  ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual time=0.047..0.482 rows=294 loops=1)
  ->  Hash  (cost=5.20..5.20 rows=12 width=9) (actual time=0.164..0.164 rows=12 loops=1)
        ->  Seq Scan on g  (cost=0.00..5.20 rows=12 width=9) (actual time=0.042..0.136 rows=12 loops=1)
              Filter: isgroup
Total runtime: 2.225 ms

And this is more interesting:

explain analyze
select *
from g
join a on a.groupid = g.id
where not g.isgroup

Hash Join  (cost=9.05..17.92 rows=283 width=25) (actual time=2.038..2.038 rows=0 loops=1)
  Hash Cond: (a.groupid = g.id)
  ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual time=0.046..0.478 rows=294 loops=1)
  ->  Hash  (cost=5.20..5.20 rows=308 width=9) (actual time=1.090..1.090 rows=308 loops=1)
        ->  Seq Scan on g  (cost=0.00..5.20 rows=308 width=9) (actual time=0.038..0.557 rows=308 loops=1)
              Filter: (NOT isgroup)
Total runtime: 2.126 ms

PostgreSQL 8.3

These queries are part of a big query, and the optimizer puts them at a leaf of the query tree, so the row miscount causes a real problem.

Statistics for table a:

id
--
histogram_bounds: {1,40,73,111,143,174,204,484,683,715,753}
correlation: 0.796828

groupid
-------
n_distinct: 12
most_common_vals: {96,98,21,82,114,131,48,44,173,682,752}
most_common_freqs: {0.265306,0.166667,0.163265,0.136054,0.0884354,0.0782313,0.0714286,0.00680272,0.00680272,0.00680272,0.00680272}
correlation: 0.366704

for table g:

id
--
histogram_bounds: {1,32,64,101,134,166,199,451,677,714,753}
correlation: 1

isgroup
-------
n_distinct: 2
most_common_freqs: {0.9625,0.0375}
correlation: 0.904198
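The estimates themselves can be reconstructed from the statistics above. With no cross-column knowledge, the planner assumes a.groupid is spread evenly over all of g, so the join estimate works out to roughly rows(a) * rows(g after filter) / 320 (this is inferred from the posted numbers, not from a reading of the 8.3 source):

    294 * 12  / 320 = 11.0   -> rows=11  estimated, 294 actual (where g.isgroup)
    294 * 308 / 320 = 283.0  -> rows=283 estimated, 0 actual   (where not g.isgroup)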
Vlad Arkhipov wrote:
> I found a strange issue in a very simple query.

You forgot to mention what your problem is.

Yours,
Laurenz Albe
Albe Laurenz wrote:
> Vlad Arkhipov wrote:
>
>> I found a strange issue in a very simple query.
>>
>
> You forgot to mention what your problem is.
>
> Yours,
> Laurenz Albe
>

It was written below in my first post:
"These queries are part of a big query, and the optimizer puts them at a leaf of the query tree, so the row miscount causes a real problem."
The actual row count for the first query is 294, but the estimate is 11; for the second, 283 estimated versus 0 actual. Postgres tries to use nested loops where it should use a merge or hash join, and vice versa.
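The dependency behind the miscount is easy to make visible (a sketch against the posted schema; the output shown is inferred from the actual row counts in the two plans, not re-run):

    select g.isgroup, count(*)
    from a
    join g on g.id = a.groupid
    group by g.isgroup;

    -- isgroup | count
    ----------+-------
    -- t      |   294

Every row of a references a group with isgroup = true, which per-column statistics cannot tell the planner.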
On Thu, 24 Apr 2008, Vlad Arkhipov wrote:
> It was written below in my first post:
> "These queries are part of a big query, and the optimizer puts them at a
> leaf of the query tree, so the row miscount causes a real problem."
> The actual row count for the first query is 294, but the estimate is 11;
> for the second, 283 estimated versus 0 actual. Postgres tries to use
> nested loops where it should use a merge or hash join, and vice versa.

The queries are taking less than three milliseconds. You only have three hundred rows in each table. The precise algorithm Postgres uses probably won't make much of a difference. Have you tried turning off the relevant join plans to see how long Postgres takes with your desired plan?

When you have millions of rows, the algorithm matters a lot more.

Matthew

--
Richards' Laws of Data Security:
 1. Don't buy a computer.
 2. If you must buy a computer, don't turn it on.
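"Turning off the relevant join plans" refers to the session-local planner switches; for example (standard settings, which ones to disable depends on the plan you want to rule out):

    set enable_hashjoin = off;
    set enable_nestloop = off;  -- or enable_mergejoin, as appropriate

    explain analyze
    select *
    from g
    join a on a.groupid = g.id
    where g.isgroup;

    reset enable_hashjoin;
    reset enable_nestloop;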
On Thu, 24 Apr 2008 03:14:54 +0200, Vlad Arkhipov <arhipov@dc.baikal.ru> wrote:

> I found a strange issue in a very simple query. The statistics target
> for all columns is set to 1000, but I also tried other levels.
>
> create table g (
>   id bigint primary key,
>   isgroup boolean not null);
>
> create table a (
>   groupid bigint references g(id),
>   id bigint,
>   unique(id, groupid));
>
> analyze g;
> analyze a;
>
> select count(*) from a
> 294
>
> select count(*) from g
> 320
>
> explain analyze
> select *
> from g
> join a on a.groupid = g.id
> where g.isgroup
>
> Hash Join  (cost=5.35..11.50 rows=11 width=25) (actual time=0.261..1.755
> rows=294 loops=1)
>   Hash Cond: (a.groupid = g.id)
>   ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
> time=0.047..0.482 rows=294 loops=1)
>   ->  Hash  (cost=5.20..5.20 rows=12 width=9) (actual time=0.164..0.164
> rows=12 loops=1)
>         ->  Seq Scan on g  (cost=0.00..5.20 rows=12 width=9) (actual
> time=0.042..0.136 rows=12 loops=1)
>               Filter: isgroup
> Total runtime: 2.225 ms

You should really post an EXPLAIN ANALYZE of your big query.

This little query plan seems OK to me.
Two very small tables; OK, hash 'em, that's the best plan.
Now, of course, if it is repeated for every row in your JOIN, you have a problem.
The question is: why is it repeated for every row?
This cannot be answered without seeing the whole query.

Another question would be: is there a way to structure the tables differently?
Again, this cannot be answered without seeing the whole query, and some explanation of what the data & fields mean.

Please provide more information...
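One illustrative way to "structure the tables differently" for this particular skew (purely a sketch; the thread never says how the tables were actually redesigned) is to carry the flag on a itself, so the filter and the join estimate no longer depend on a cross-table correlation:

    -- hypothetical: denormalize isgroup onto a and keep it in sync
    alter table a add column isgroup boolean;
    update a set isgroup = g.isgroup from g where g.id = a.groupid;
    analyze a;

    explain analyze
    select *
    from g
    join a on a.groupid = g.id
    where a.isgroup;  -- filter on a's own column, estimated from a's stats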
PFC wrote:
> On Thu, 24 Apr 2008 03:14:54 +0200, Vlad Arkhipov
> <arhipov@dc.baikal.ru> wrote:
>
>> I found a strange issue in a very simple query. The statistics target
>> for all columns is set to 1000, but I also tried other levels.
>>
>> create table g (
>>   id bigint primary key,
>>   isgroup boolean not null);
>>
>> create table a (
>>   groupid bigint references g(id),
>>   id bigint,
>>   unique(id, groupid));
>>
>> analyze g;
>> analyze a;
>>
>> select count(*) from a
>> 294
>>
>> select count(*) from g
>> 320
>>
>> explain analyze
>> select *
>> from g
>> join a on a.groupid = g.id
>> where g.isgroup
>>
>> Hash Join  (cost=5.35..11.50 rows=11 width=25) (actual time=0.261..1.755
>> rows=294 loops=1)
>>   Hash Cond: (a.groupid = g.id)
>>   ->  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
>> time=0.047..0.482 rows=294 loops=1)
>>   ->  Hash  (cost=5.20..5.20 rows=12 width=9) (actual time=0.164..0.164
>> rows=12 loops=1)
>>         ->  Seq Scan on g  (cost=0.00..5.20 rows=12 width=9) (actual
>> time=0.042..0.136 rows=12 loops=1)
>>               Filter: isgroup
>> Total runtime: 2.225 ms
>
> You should really post an EXPLAIN ANALYZE of your big query.
>
> This little query plan seems OK to me.
> Two very small tables; OK, hash 'em, that's the best plan.
> Now, of course, if it is repeated for every row in your JOIN, you have
> a problem.
> The question is: why is it repeated for every row?
> This cannot be answered without seeing the whole query.
>
> Another question would be: is there a way to structure the tables
> differently?
> Again, this cannot be answered without seeing the whole query, and some
> explanation of what the data & fields mean.
>
> Please provide more information...
>

I redesigned the table structure and the query now seems to be faster. You were right; the problem was not in this query.