Re: PostgreSQL OR performance - Mailing list pgsql-performance

From Віталій Тимчишин
Subject Re: PostgreSQL OR performance
Date
Msg-id 331e40660811070114rd70cf0dy194020a9f9ea8802@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL OR performance  ("Віталій Тимчишин" <tivv00@gmail.com>)
Responses Re: PostgreSQL OR performance  ("David Wilson" <david.t.wilson@gmail.com>)
Re: PostgreSQL OR performance  (Richard Huxton <dev@archonet.com>)
List pgsql-performance

Yes, the query should output exactly same result as in "Union" plan. I will run "slow" explain analyze now and will repost after it will complete (tomorrow?).
BTW: I'd say planner should think rows estimated as sum of "ORs" estimation minus intersection, but no more then sum or ORs (if intersection is 0). For first condition it has rows=525975, for second it has rows=2403 (with other plans, of course), so it's strange it has such a high estimation.... It's exactly 50% of full cartesian join of merge, so it does think that every second pair would succeed, that is not true.


I am sorry, I've emptied atom_match table, so one part produce 0 result, but anyway here is explain:

"Merge Join  (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual time=30292.802..755751.242 rows=34749 loops=1)"
"  Merge Cond: (production.run.id = (production.company.run_id)::bigint)"
"  Join Filter: (((production.company.name)::text = (production.company.name)::text) OR (hashed subplan))"
"  ->  Sort  (cost=45474.92..45606.54 rows=52648 width=38) (actual time=562.928..595.128 rows=15507 loops=1)"
"        Sort Key: production.run.id"
"        Sort Method:  external sort  Disk: 880kB"
"        ->  Nested Loop  (cost=1184.82..39904.24 rows=52648 width=38) (actual time=90.571..530.925 rows=15507 loops=1)"
"              ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8) (actual time=3.077..3.078 rows=1 loops=1)"
"                    ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8) (actual time=3.066..3.068 rows=1 loops=1)"
"                          Filter: ((name)::text = 'test'::text)"
"              ->  Nested Loop  (cost=1183.27..39376.19 rows=52648 width=30) (actual time=87.489..484.605 rows=15507 loops=1)"
"                    ->  HashAggregate  (cost=1.55..1.56 rows=1 width=8) (actual time=0.016..0.019 rows=1 loops=1)"
"                          ->  Seq Scan on run  (cost=0.00..1.55 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=1)"
"                                Filter: ((name)::text = 'test'::text)"
"                    ->  Bitmap Heap Scan on company  (cost=1181.72..38592.03 rows=62608 width=30) (actual time=87.465..441.014 rows=15507 loops=1)"
"                          Recheck Cond: ((production.company.run_id)::bigint = production.run.id)"
"                          Filter: ((production.company.status)::text = 'unprocessed'::text)"
"                          ->  Bitmap Index Scan on comp_run  (cost=0.00..1166.07 rows=62608 width=0) (actual time=65.828..65.828 rows=15507 loops=1)"
"                                Index Cond: ((production.company.run_id)::bigint = production.run.id)"
"  ->  Materialize  (cost=469981.13..498937.42 rows=2316503 width=30) (actual time=15915.639..391938.338 rows=242752539 loops=1)"
"        ->  Sort  (cost=469981.13..475772.39 rows=2316503 width=30) (actual time=15915.599..19920.912 rows=2316503 loops=1)"
"              Sort Key: production.company.run_id"
"              Sort Method:  external merge  Disk: 104896kB"
"              ->  Seq Scan on company  (cost=0.00..58808.03 rows=2316503 width=30) (actual time=22.244..7476.954 rows=2316503 loops=1)"
"  SubPlan"
"    ->  Nested Loop  (cost=2267.65..3314.94 rows=22 width=1038) (actual time=0.009..0.009 rows=0 loops=1)"
"          ->  Hash Join  (cost=2267.65..3141.36 rows=22 width=523) (actual time=0.006..0.006 rows=0 loops=1)"
"                Hash Cond: ((atom_match.atom1_id)::integer = s1.id)"
"                ->  Seq Scan on atom_match  (cost=0.00..30.38 rows=1630 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"                      Filter: ((match_function_id)::integer = 2)"
"                ->  Hash  (cost=1292.04..1292.04 rows=12209 width=523) (never executed)"
"                      ->  Index Scan using atomstr_typ on atoms_string s1  (cost=0.00..1292.04 rows=12209 width=523) (never executed)"
"                            Index Cond: ((atom_type_id)::integer = (-1))"
"          ->  Index Scan using pk_atoms_string on atoms_string s2  (cost=0.00..7.88 rows=1 width=523) (never executed)"
"                Index Cond: (s2.id = (atom_match.atom2_id)::integer)"
"Total runtime: 755802.686 ms"

P.S. May be I've chosen wrong list and my Q better belongs to -hackers?

pgsql-performance by date:

Previous
From: "David Rees"
Date:
Subject: Re: Create and drop temp table in 8.3.4
Next
From: "David Wilson"
Date:
Subject: Re: PostgreSQL OR performance