Home > mailing lists

Re: strange slow query - lost lot of time somewhere - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: strange slow query - lost lot of time somewhere
Date	May 11, 2022 03:50:11
Msg-id	CAApHDvpJhdmWWb3maE7XRj-F2k+9c7h=12_oENpVYkLW=A4OSQ@mail.gmail.com Whole thread Raw
In response to	Re: strange slow query - lost lot of time somewhere (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

Tree view

On Tue, 10 May 2022 at 14:22, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, May 06, 2022 at 09:27:57PM +1200, David Rowley wrote:
> > I'm very tempted to change the EXPLAIN output in at least master to
> > display the initial and final (maximum) hash table sizes. Wondering if
> > anyone would object to that?
>
> No objection to add it to v15.
>
> I'll point out that "Cache Mode" was added to EXPLAIN between 11.1 and 11.2
> without controversy, so this could conceivably be backpatched to v14, too.
>
> commit 6c32c0977783fae217b5eaa1d22d26c96e5b0085

This is seemingly a good point, but I don't really think it's a case
of just keeping the EXPLAIN output stable in minor versions, it's more
about adding new fields to structs.

I just went and wrote the patch and the fundamental difference seems
to be that what I did in 6c32c0977 managed to only add a new field in
the empty padding between two fields. That resulted in no fields in
the struct being pushed up in their address offset. The idea here is
not to break any extension that's already been compiled that
references some field that comes after that.

In the patch I've just written, I've had to add some fields which
causes sizeof(MemoizeState) to go up resulting in the offsets of some
later fields changing.

One thing I'll say about this patch is that I found it annoying that I
had to add code to cache_lookup() when we failed to find an entry.
That's probably not the end of the world as that's only for cache
misses.  Ideally, I'd just be looking at the size of the hash table at
the end of execution, however, naturally, we must show the EXPLAIN
output before we shut down the executor.

I just copied the Hash Join output. It looks like:

# alter table tt alter column a set (n_distinct=4);
ALTER TABLE
# analyze tt;
# explain (analyze, costs off, timing off) select * from tt inner join
t2 on tt.a=t2.a;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop (actual rows=1000000 loops=1)
   ->  Seq Scan on tt (actual rows=1000000 loops=1)
   ->  Memoize (actual rows=1 loops=1000000)
         Cache Key: tt.a
         Cache Mode: logical
         Hits: 999990  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         Hash Buckets: 16 (originally 4)
         ->  Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
               Index Cond: (a = tt.a)
               Heap Fetches: 0
 Planning Time: 0.483 ms
 Execution Time: 862.860 ms
(12 rows)

Does anyone have any views about the attached patch going into v15?

David

Attachment

memoize_display_hash_buckets.patch

pgsql-hackers by date:

From: Masahiko Sawada
Date: 11 May 2022, 03:46:56
Subject: Re: Perform streaming logical transactions by background workers and parallel apply

From: David Rowley
Date: 11 May 2022, 03:54:19
Subject: Re: First draft of the PG 15 release notes

Re: strange slow query - lost lot of time somewhere - Mailing list pgsql-hackers

Attachment

Previous

Next