Thread: Question with hashed IN
I've noticed that when the stats are wrong (like in cases where you've loaded data but reltuples hasn't been updated yet), a hashed NOT IN seems to take a significant time penalty. Is this to be expected?

On a pktest table with 1 million integers and a dual table with a single integer, with sort_mem set high enough to give a hashed subplan for the various reltuples values, I saw the following behavior for

  explain analyze select * from dual where a not in (select a from pktest);

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

And the memory usage seemed to be the same even if I set sort_mem back to 1024.
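A minimal way to reproduce the shape of this test (a hedged sketch, not the original perl loader mentioned later in the thread; on current servers sort_mem has been renamed work_mem, and generate_series stands in for the data generator):

  CREATE TABLE pktest (a integer);
  CREATE TABLE dual (a integer);

  -- 1 million integers in pktest, a single integer in dual
  INSERT INTO pktest SELECT g FROM generate_series(1, 1000000) AS g;
  INSERT INTO dual VALUES (1);   -- value present in pktest, so NOT IN returns no rows

  SET work_mem = '100MB';        -- large enough for the planner to pick a hashed subplan

  EXPLAIN ANALYZE
  SELECT * FROM dual WHERE a NOT IN (SELECT a FROM pktest);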
On Sat, 16 Aug 2003, Stephan Szabo wrote:

> I've noticed that when the stats are wrong (like in cases where you've
> loaded data but reltuples hasn't been updated yet), a hashed NOT IN seems
> to take a significant time penalty. Is this to be expected?
>
> On a pktest table with 1 million integers and a dual table with a single
> integer, with sort_mem set high enough to give a hashed subplan for the
> various reltuples values, I saw the following behavior for
>
>   explain analyze select * from dual where a not in (select a from pktest);
>
> with reltuples=1000 for pktest, query takes about 96 seconds
> reltuples=10000, query takes about 15 seconds
> reltuples=100000, query takes about 8 seconds
>
> And the memory usage seemed to be the same even if I set sort_mem back
> to 1024.

Errm, I meant in the cases where it still chose a hashed subplan. Stupid cold medicine.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> with reltuples=1000 for pktest, query takes about 96 seconds
>> reltuples=10000, query takes about 15 seconds
>> reltuples=100000, query takes about 8 seconds

> Errm, I meant in the cases where it still chose a hashed subplan.
> Stupid cold medicine.

I'm confused too. Please explain again when you're feeling better...

            regards, tom lane
On Sun, 17 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> >> with reltuples=1000 for pktest, query takes about 96 seconds
> >> reltuples=10000, query takes about 15 seconds
> >> reltuples=100000, query takes about 8 seconds
>
> > Errm, I meant in the cases where it still chose a hashed subplan.
> > Stupid cold medicine.
>
> I'm confused too. Please explain again when you're feeling better...

Basically, the first thing I noticed was that changing reltuples on the pg_class row for a table affected the speed of

  explain analyze select * from othertable where foo not in (select bar from table);

even when the plan wasn't changing: seqscan + filter on hashed subquery. I thought that was kind of odd, since the plan didn't seem any different but the real-world time changed by about a factor of 10.

Then I noted that changing sort_mem changed the point at which it would choose a hashed subquery in the initial plan, based on the estimated tuples, but didn't seem to actually affect the real memory usage. That means a table with a few million rows but reltuples still set at 1000 would eat up a very large amount of memory (in my case it sent my machine a few hundred megs into swap).
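For concreteness, the reltuples tweak being described looks something like the following (a sketch for a throwaway database only, since it edits the catalog by hand; pktest and dual are the tables from the earlier test):

  -- Make the planner believe pktest holds only 1000 rows, without touching the data.
  UPDATE pg_class SET reltuples = 1000 WHERE relname = 'pktest';

  -- The plan shape stays the same (seqscan + filter on a hashed subplan),
  -- but the measured runtime varies with the reltuples value set above.
  EXPLAIN ANALYZE
  SELECT * FROM dual WHERE a NOT IN (SELECT a FROM pktest);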
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Basically, the first thing I noticed was that changing reltuples
> on the pg_class row for a table affected the speed of
>   explain analyze select * from othertable where foo not in (select bar from table);
> even when the plan wasn't changing: seqscan + filter on hashed subquery.

That doesn't make any sense to me --- AFAICS, only the planner pays any attention to reltuples, so it could only affect things via changing the plan. Could we see details?

> Then I noted that changing sort_mem changed the point at which it would
> choose a hashed subquery in the initial plan, based on the estimated
> tuples, but didn't seem to actually affect the real memory usage.

Yeah, the hashed-subplan code doesn't make any attempt to spill to disk. So if the planner's estimate is badly off, you could see actual usage well in excess of sort_mem.

            regards, tom lane
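The sort_mem interaction works roughly like this (a sketch; the exact crossover point depends on the estimated row count and width, and on current servers the setting is named work_mem):

  -- With a small sort_mem the planner thinks the estimated rows won't fit
  -- in memory and avoids the hashed subplan.
  SET sort_mem = 1024;
  EXPLAIN SELECT * FROM dual WHERE a NOT IN (SELECT a FROM pktest);

  -- With a large sort_mem it picks the hashed subplan. The hash table is then
  -- built from however many rows actually come back, so real memory use tracks
  -- the true row count rather than sort_mem or reltuples.
  SET sort_mem = 65536;
  EXPLAIN SELECT * FROM dual WHERE a NOT IN (SELECT a FROM pktest);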
On Sun, 17 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Basically, the first thing I noticed was that changing reltuples
> > on the pg_class row for a table affected the speed of
> >   explain analyze select * from othertable where foo not in (select bar from table);
> > even when the plan wasn't changing: seqscan + filter on hashed subquery.
>
> That doesn't make any sense to me --- AFAICS, only the planner pays any
> attention to reltuples, so it could only affect things via changing the
> plan. Could we see details?

I've included a perl file that generates data like what I was using, and the output of the commands from that run through psql -E on my machine. The times seem pretty repeatable in any order, so caching and such doesn't seem to be playing a big part.

> > Then I noted that changing sort_mem changed the point at which it would
> > choose a hashed subquery in the initial plan, based on the estimated
> > tuples, but didn't seem to actually affect the real memory usage.
>
> Yeah, the hashed-subplan code doesn't make any attempt to spill to
> disk. So if the planner's estimate is badly off, you could see actual
> usage well in excess of sort_mem.

Ah, that makes sense then.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Sun, 17 Aug 2003, Tom Lane wrote:
>> That doesn't make any sense to me --- AFAICS, only the planner pays any
>> attention to reltuples, so it could only affect things via changing the
>> plan. Could we see details?

> I've included a perl file that generates data like what I was using, and
> the output of the commands from that run through psql -E on my machine.
> The times seem pretty repeatable in any order, so caching and such doesn't
> seem to be playing a big part.

Oh, I see what it is. The initial sizing of the hash table (number of buckets) is done using the planner's estimate of the number of rows out of the subplan. In your later examples the hash table is woefully overloaded, and so searching it takes longer (too many items on each hash chain).

I'm not sure how important this is to work on. We could try to make the executor's hash code more able to adapt when the hash table grows beyond what it was expecting (by rehashing, etc.), but personally I'd rather spend the time on trying to improve the estimate to begin with.

            regards, tom lane
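Rough arithmetic, assuming for illustration that the initial bucket count roughly tracks the planner's estimate:

  estimated rows:       1,000      ->  on the order of 1,000 buckets
  actual rows loaded:   1,000,000  ->  average chain length ~ 1,000,000 / 1,000 = 1,000

Each NOT IN probe that finds no match walks its whole hash chain, so a probe costs on the order of 1,000 comparisons instead of a handful, which lines up with the original timings getting steadily worse as the reltuples estimate shrinks.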
On Sun, 17 Aug 2003, Tom Lane wrote:

> Oh, I see what it is. The initial sizing of the hash table (number of
> buckets) is done using the planner's estimate of the number of rows out
> of the subplan. In your later examples the hash table is woefully
> overloaded, and so searching it takes longer (too many items on each
> hash chain).
>
> I'm not sure how important this is to work on. We could try to make the
> executor's hash code more able to adapt when the hash table grows beyond
> what it was expecting (by rehashing, etc.), but personally I'd rather spend
> the time on trying to improve the estimate to begin with.

Right. In case you're wondering, this all came out of playing with doing the NOT IN query for the ALTER TABLE ADD CONSTRAINT stuff for foreign keys, where those reltuples values are potentially still at their defaults when the ALTER occurs. For performance testing purposes I've built a not-quite-complete version using NOT IN and another using NOT EXISTS (neither handles the zero-column case, although I don't think you can actually specify that syntactically, and neither handles permissions correctly if the owner can't read the other table). The things I noticed were side effects of watching that, distilled into simpler tests.
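The two validation queries being compared presumably look something like this (a sketch with made-up table and column names; the real check also has to skip NULL referencing values under the default MATCH behavior):

  -- NOT IN form: any row returned is a foreign-key violation
  SELECT fk.bar
    FROM fktable fk
   WHERE fk.bar IS NOT NULL
     AND fk.bar NOT IN (SELECT pk.foo FROM pktable pk);

  -- NOT EXISTS form of the same check
  SELECT fk.bar
    FROM fktable fk
   WHERE fk.bar IS NOT NULL
     AND NOT EXISTS (SELECT 1 FROM pktable pk WHERE pk.foo = fk.bar);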
I said:
> Oh, I see what it is. The initial sizing of the hash table (number of
> buckets) is done using the planner's estimate of the number of rows out
> of the subplan. In your later examples the hash table is woefully
> overloaded, and so searching it takes longer (too many items on each
> hash chain).
>
> I'm not sure how important this is to work on. We could try to make the
> executor's hash code more able to adapt when the hash table grows beyond
> what it was expecting (by rehashing, etc.), but personally I'd rather spend
> the time on trying to improve the estimate to begin with.

After some further looking, I realized that the backend's standard hashtable management code (dynahash.c) already has a perfectly good algorithm for expanding hashtables when they start to get full. But the code I'd written for hashed aggregation and grouping didn't use dynahash.c, because the latter didn't support anything more complex than memcmp() for comparing keys. This was obviously a dumb decision in hindsight, so I've invested the couple hours' work needed to improve dynahash's API and make the hashed-aggregation code use it. As of CVS tip I get more reasonable behavior in your example:

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=12510.00..12532.50 rows=500 width=4) (actual time=7803.77..7803.77 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..10010.00 rows=1000000 width=4) (actual time=248.34..5408.39 rows=1000000 loops=1)
 Total runtime: 7979.45 msec
(5 rows)

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=1260.00..1282.50 rows=500 width=4) (actual time=4482.05..4482.05 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..1010.00 rows=100000 width=4) (actual time=0.09..2052.05 rows=1000000 loops=1)
 Total runtime: 4651.02 msec
(5 rows)

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=135.00..157.50 rows=500 width=4) (actual time=5830.16..5830.16 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..110.00 rows=10000 width=4) (actual time=0.03..3413.05 rows=1000000 loops=1)
 Total runtime: 5994.26 msec
(5 rows)

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=22.50..45.00 rows=500 width=4) (actual time=4160.85..4160.85 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.03..1755.32 rows=1000000 loops=1)
 Total runtime: 4326.24 msec
(5 rows)

Before, the same four tests gave runtimes like these on this machine:

 Total runtime: 16590.97 msec
 Total runtime: 13792.45 msec
 Total runtime: 22702.87 msec
 Total runtime: 115465.43 msec

(I forget now whether those times were taken with a backend compiled for profiling --- so they may not be directly comparable. But at least I can say that the increase in runtime with smaller estimates is gone.)

            regards, tom lane
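Why the new numbers are roughly flat (assuming only that the expanded hash table ends up with a bucket count in proportion to the entries actually inserted, which is the point of letting dynahash.c grow it):

  before:  1,000,000 entries in a table sized for ~1,000 estimated rows  ->  chains ~1,000 entries long
  after:   1,000,000 entries, table expanded as it fills                 ->  chains stay a few entries long

so the per-probe cost no longer depends on how good the initial reltuples-based estimate was, which matches the four "Total runtime" figures above no longer degrading as the estimate shrinks.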