Thread: Question with hashed IN

Question with hashed IN

From: Stephan Szabo
I've noticed that when the stats are wrong (like
in cases where you've loaded data but reltuples
hasn't been updated yet) that a hashed NOT IN
seems to take a significant time penalty.  Is
this to be expected?

On a pktest table with 1 million integers and a dual table with a single
integer and sort_mem set high enough to give a hashed subplan for the
various reltuples values, I saw the following behavior for

explain analyze select * from dual where a not in (select a from pktest);

with reltuples=1000 for pktest, query takes about 96 seconds
reltuples=10000, query takes about 15 seconds
reltuples=100000, query takes about 8 seconds

And the memory usage seemed to be the same even if I set sort_mem back
to 1024.



Re: Question with hashed IN

From: Stephan Szabo
On Sat, 16 Aug 2003, Stephan Szabo wrote:

>
> I've noticed that when the stats are wrong (like
> in cases where you've loaded data but reltuples
> hasn't been updated yet) that a hashed NOT IN
> seems to take a significant time penalty.  Is
> this to be expected?
>
> On a pktest table with 1 million integers and a dual table with a single
> integer and sort_mem set high enough to give a hashed subplan for the
> various reltuples values, I saw the following behavior for
>
> explain analyze select * from dual where a not in (select a from pktest);
>
> with reltuples=1000 for pktest, query takes about 96 seconds
> reltuples=10000, query takes about 15 seconds
> reltuples=100000, query takes about 8 seconds
>
> And the memory usage seemed to be the same even if I set sort_mem back
> to 1024.

Errm, I meant in the cases where it still chose a hashed
subplan. Stupid cold medicine.



Re: Question with hashed IN

From: Tom Lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> with reltuples=1000 for pktest, query takes about 96 seconds
>> reltuples=10000, query takes about 15 seconds
>> reltuples=100000, query takes about 8 seconds

> Errm, I meant in the cases where it still chose a hashed
> subplan. Stupid cold medicine.

I'm confused too.  Please explain again when you're feeling better...
        regards, tom lane


Re: Question with hashed IN

From: Tom Lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Basically, the first thing I noticed was that changing reltuples
> on the pg_class row for a table affected the speed of
> explain analyze select * from othertable where foo not in (select bar from
> table);
> even when the plan wasn't changing, seqscan + filter on hashed subquery.

That doesn't make any sense to me --- AFAICS, only the planner pays any
attention to reltuples, so it could only affect things via changing the
plan.  Could we see details?

> Then I noted that changing sort_mem changed the point at which it would
> choose a hashed subquery in the initial plan based on the estimated
> tuples, but didn't seem to actually affect the real memory usage,

Yeah, the hashed-subquery code doesn't make any attempt to spill to
disk.  So if the planner's estimate is badly off, you could see actual
usage well in excess of sort_mem.
        regards, tom lane


Re: Question with hashed IN

From: Stephan Szabo
On Sun, 17 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> >> with reltuples=1000 for pktest, query takes about 96 seconds
> >> reltuples=10000, query takes about 15 seconds
> >> reltuples=100000, query takes about 8 seconds
>
> > Errm, I meant in the cases where it still chose a hashed
> > subplan. Stupid cold medicine.
>
> I'm confused too.  Please explain again when you're feeling better...

Basically, the first thing I noticed was that changing reltuples
on the pg_class row for a table affected the speed of

explain analyze select * from othertable where foo not in (select bar from
table);

even when the plan wasn't changing, seqscan + filter on hashed subquery.
I thought that was kind of odd since the plan didn't seem any different,
but the real world time changed by about a factor of 10.

Then I noted that changing sort_mem changed the point at which it would
choose a hashed subquery in the initial plan based on the estimated
tuples, but didn't seem to actually affect the real memory usage, which
means that a table with a few million rows but reltuples still set at 1000
would eat up a very large amount of memory (in my case it sent my machine
a few hundred megs into swap).



Re: Question with hashed IN

From: Stephan Szabo
On Sun, 17 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Basically, the first thing I noticed was that changing reltuples
> > on the pg_class row for a table affected the speed of
> > explain analyze select * from othertable where foo not in (select bar from
> > table);
> > even when the plan wasn't changing, seqscan + filter on hashed subquery.
>
> That doesn't make any sense to me --- AFAICS, only the planner pays any
> attention to reltuples, so it could only affect things via changing the
> plan.  Could we see details?

I've included a perl file that generates data like that I was using and
the output of the commands from that through psql -E on my machine.  The
times seem pretty repeatable in any order so caching and such doesn't seem
to be playing a big part.

> > Then I noted that changing sort_mem changed the point at which it would
> > choose a hashed subquery in the initial plan based on the estimated
> > tuples, but didn't seem to actually affect the real memory usage,
>
> Yeah, the hashed-subquery code doesn't make any attempt to spill to
> disk.  So if the planner's estimate is badly off, you could see actual
> usage well in excess of sort_mem.

Ah, that makes sense then.

Re: Question with hashed IN

From: Tom Lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Sun, 17 Aug 2003, Tom Lane wrote:
>> That doesn't make any sense to me --- AFAICS, only the planner pays any
>> attention to reltuples, so it could only affect things via changing the
>> plan.  Could we see details?

> I've included a perl file that generates data like that I was using and
> the output of the commands from that through psql -E on my machine.  The
> times seem pretty repeatable in any order so caching and such doesn't seem
> to be playing a big part.

Oh, I see what it is.  The initial sizing of the hash table (number of
buckets) is done using the planner's estimate of the number of rows out
of the subplan.  In your later examples, the hash table is woefully
overloaded and so searching it takes longer (too many items on each
hash chain).
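Tom's explanation can be illustrated outside PostgreSQL with a toy chained hash table (a sketch, not the backend's actual code): the bucket count is fixed from the row estimate when the table is built, so if the estimate is low by a factor of ~1000, every probe has to walk a chain roughly 1000 entries long.

```python
def build_chained_table(keys, n_buckets):
    """Chained hash table whose bucket count is fixed at build time."""
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        buckets[hash(k) % n_buckets].append(k)
    return buckets

rows = range(1_000_000)
right = build_chained_table(rows, 1 << 20)  # sized from an accurate estimate
wrong = build_chained_table(rows, 1 << 10)  # estimate low by a factor of ~1000

# Average chain length = entries / buckets; each NOT IN probe scans a chain.
avg = lambda table: sum(len(b) for b in table) / len(table)
print(avg(right))   # ~0.95 entries inspected per probe
print(avg(wrong))   # 976.5625 entries per probe: ~1000x the work
```

The arithmetic is what matters: with 1,000,000 entries jammed into 1,024 buckets, the average chain is about 976 entries, which is why the underestimated cases in the thread ran roughly an order of magnitude slower despite an identical plan.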

I'm not sure how important this is to work on.  We could try to make the
executor's hash code more able to adapt when the hash table grows beyond
what it was expecting (by rehashing, etc) but personally I'd rather spend
the time on trying to improve the estimate to begin with.
        regards, tom lane


Re: Question with hashed IN

From: Stephan Szabo
On Sun, 17 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Sun, 17 Aug 2003, Tom Lane wrote:
> >> That doesn't make any sense to me --- AFAICS, only the planner pays any
> >> attention to reltuples, so it could only affect things via changing the
> >> plan.  Could we see details?
>
> > I've included a perl file that generates data like that I was using and
> > the output of the commands from that through psql -E on my machine.  The
> > times seem pretty repeatable in any order so caching and such doesn't seem
> > to be playing a big part.
>
> Oh, I see what it is.  The initial sizing of the hash table (number of
> buckets) is done using the planner's estimate of the number of rows out
> of the subplan.  In your later examples, the hash table is woefully
> overloaded and so searching it takes longer (too many items on each
> hash chain).
>
> I'm not sure how important this is to work on.  We could try to make the
> executor's hash code more able to adapt when the hash table grows beyond
> what it was expecting (by rehashing, etc) but personally I'd rather spend
> the time on trying to improve the estimate to begin with.

Right.

In case you're wondering, this all came out of playing with the NOT IN
query for the ALTER TABLE ADD CONSTRAINT check on foreign keys, where the
stats are potentially still at their defaults when the ALTER occurs.  For
performance testing I've built a not-quite-complete version using NOT IN
and another using NOT EXISTS (neither handles the zero-column case,
although I don't think you can actually specify that syntactically, and
neither handles permissions correctly if the owner can't read the other
table).  The things I noticed were side effects of watching that,
distilled into simpler tests.



Re: Question with hashed IN

From: Tom Lane
I said:
> Oh, I see what it is.  The initial sizing of the hash table (number of
> buckets) is done using the planner's estimate of the number of rows out
> of the subplan.  In your later examples, the hash table is woefully
> overloaded and so searching it takes longer (too many items on each
> hash chain).

> I'm not sure how important this is to work on.  We could try to make the
> executor's hash code more able to adapt when the hash table grows beyond
> what it was expecting (by rehashing, etc) but personally I'd rather spend
> the time on trying to improve the estimate to begin with.

After some further looking, I realized that the backend's standard hashtable
management code (dynahash.c) already has a perfectly good algorithm for
expanding hashtables when they start to get full.  But the code I'd
written for hashed aggregation and grouping didn't use dynahash.c,
because the latter didn't support anything more complex than memcmp()
for comparing keys.  This was obviously a dumb decision in hindsight,
so I've invested the couple hours' work needed to improve dynahash's API
and make the hashed-aggregation code use it.
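The fix described above can be sketched with a table that grows past its initial sizing (an illustrative Python sketch, not dynahash itself — dynahash splits one bucket at a time, while doubling here keeps the sketch short):

```python
class GrowingHashTable:
    """Chained hash table that expands when the load factor exceeds 1."""
    MAX_LOAD = 1.0  # average chain length that triggers expansion

    def __init__(self, est_entries=1024):
        # Initial size still comes from the estimate, but it's no longer fatal.
        self.buckets = [[] for _ in range(max(1, est_entries))]
        self.n = 0

    def insert(self, key):
        self.buckets[hash(key) % len(self.buckets)].append(key)
        self.n += 1
        if self.n / len(self.buckets) > self.MAX_LOAD:
            self._expand()

    def _expand(self):
        # Double the bucket array and rehash every entry into it.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for key in chain:
                self.buckets[hash(key) % len(self.buckets)].append(key)

    def __contains__(self, key):
        return key in self.buckets[hash(key) % len(self.buckets)]

t = GrowingHashTable(est_entries=1024)   # estimate low by ~100x
for k in range(100_000):
    t.insert(k)
print(len(t.buckets))       # grew to 131072
print(t.n / len(t.buckets)) # load factor stays bounded despite the bad estimate
```

Chain length stays near one entry no matter how wrong the initial estimate was, which is why the post-fix timings below no longer degrade as reltuples shrinks.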

As of CVS tip I get more reasonable behavior in your example:
                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=12510.00..12532.50 rows=500 width=4) (actual time=7803.77..7803.77 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..10010.00 rows=1000000 width=4) (actual time=248.34..5408.39 rows=1000000 loops=1)
 Total runtime: 7979.45 msec
(5 rows)

                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=1260.00..1282.50 rows=500 width=4) (actual time=4482.05..4482.05 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..1010.00 rows=100000 width=4) (actual time=0.09..2052.05 rows=1000000 loops=1)
 Total runtime: 4651.02 msec
(5 rows)

                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=135.00..157.50 rows=500 width=4) (actual time=5830.16..5830.16 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..110.00 rows=10000 width=4) (actual time=0.03..3413.05 rows=1000000 loops=1)
 Total runtime: 5994.26 msec
(5 rows)

                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on fktest  (cost=22.50..45.00 rows=500 width=4) (actual time=4160.85..4160.85 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on pktest  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.03..1755.32 rows=1000000 loops=1)
 Total runtime: 4326.24 msec
(5 rows)

Before, the same four tests gave runtimes like these on this machine:

 Total runtime: 16590.97 msec
 Total runtime: 13792.45 msec
 Total runtime: 22702.87 msec
 Total runtime: 115465.43 msec
 
(I forget now whether those times were taken with a backend compiled for
profiling --- so they may not be directly comparable.  But at least I
can say that the increase in runtime with smaller estimates is gone.)
        regards, tom lane