Jie Li <jay23jack@gmail.com> writes:
> I'm new to window functions. Recently I run some simple queries but
> surprised to find percent_rank is so slower than rank, could anybody tell me
> why?
Huh, interesting. I can reproduce this with toy data, such as
create table inventory1 (inv_date_sk int, inv_item_sk int);
insert into inventory1 select 1, random()* 100000 from generate_series(1,189000);
explain analyze select inv_date_sk,inv_item_sk, percent_rank()over(partition by inv_date_sk order by inv_item_sk) from
inventory1;
The example is *not* particularly slow if you leave work_mem at default.
But if you bump up work_mem enough so that the WindowAgg's internal
tuplestore fits into memory, it slows down like crazy. A bit of quality
time with oprofile shows that all the time is going into this memmove()
in tuplestore_trim():
/* * Slide the array down and readjust pointers. This may look pretty * stupid, but we expect that there will
usuallynot be very many * tuple-pointers to move, so this isn't that expensive; and it keeps a * lot of other
logicsimple. * * In fact, in the current usage for merge joins, it's demonstrable that * there will always be
exactlyone non-removed tuple; so optimize that * case. */ if (nremove + 1 == state->memtupcount)
state->memtuples[0]= state->memtuples[nremove]; else memmove(state->memtuples, state->memtuples + nremove,
(state->memtupcount - nremove) * sizeof(void *));
We're throwing away one tuple at a time as we advance forward through
the tuplestore, and moving 100000+ tuple pointers each time. Ugh.
This code was all right when written, because (IIRC) the mergejoin
case was actually the only caller. But it's not all right for
WindowAgg's less-predictable usage patterns.
I thought for a bit about changing things around so that the first-used
tuple slot isn't necessarily state->memtuples[0], but just like the
comment says, that complicates a lot of other logic. And there isn't
any easy place to reclaim the wasted slots later.
What seems like the best bet is to put in a heuristic to make
tuplestore_trim simply not do anything until nremove reaches some
reasonably large amount, perhaps 10% of the number of stored tuples.
This wastes up to 10% of the alloted memory, but that seems tolerable.
We could complicate things a bit more by remembering that so-and-so
many slots are authorized to be removed, and forcing a trim operation
to discard them if we find ourselves in memory trouble. I'm not sure
that extra complication is worthwhile though. Comments?
regards, tom lane