Re: Yet another vectorized engine - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Yet another vectorized engine
Date
Msg-id 302c6f54-668a-dd38-f8ba-29f0cd667e02@postgrespro.ru
Whole thread Raw
In response to Re: Yet another vectorized engine  (Hubert Zhang <hzhang@pivotal.io>)
Responses Re: Yet another vectorized engine  (Hubert Zhang <hzhang@pivotal.io>)
List pgsql-hackers


On 26.02.2020 13:11, Hubert Zhang wrote:


and with JIT:

 13.88%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
   7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
   6.03%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   5.55%  postgres  postgres             [.] bpchareq
   4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   4.19%  postgres  postgres             [.] hashbpchar
   4.09%  postgres  vectorize_engine.so  [.] vfloat8pl


I also tested Q1 with your latest code. Result of vectorized is still slow.
PG13 native: 38 secs
PG13 Vec: 30 secs
PG13 JIT: 23 secs
PG13 JIT+Vec: 27 secs


It is strange that your results are much slower than my and profile is very different.
Which postgres configuration you are using?


My perf result is as belows. There are three parts: 
1. lookup_hash_entry(43.5%) this part is not vectorized yet.
It is vectorized in some sense: lookup_hash_entry performs bulk of hash lookups and pass array with results of such lookups to aggregate transmit functions.
It will be possible to significantly increase speed of HashAgg if we store data in order of grouping attributes and use RLE (run length encoding) to peform just one
hash lookup for group of values. But it requires creation of special partitions (like it is done in Vertica and VOPS).

2. scan part: fetch_input_tuple(36%)
3. vadvance_aggregates part(20%)
I also perfed on PG96 vectorized version and got similar perf results and running time of vectorized PG96 and PG13 are also similar. But PG13 is much faster than PG96. So I just wonder whether we merge all the latest executor code of PG13 into the vectorized PG13 branch?

Sorry, I do not understand the question. vectorize_executor contains patched versions of nodeSeqscan  and nodeAgg from standard executor.
When performing porting to PG13, I took the latest version of nodeAgg and tried to apply your patches to it. Certainly not always it was possible and I have to rewrite a lt of places. Concerning nodeSeqscan - I took old version from vectorize_executor and port it to PG13.


- agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ + 39.07% LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ - 36.06% fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03% VExecScan ▒ - 34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch (inlined) ▒ - VSeqNext ▒ + 16.64% table_scan_getnextslot (inlined) ▒ - 10.29% slot_getsomeattrs (inlined) ▒ - 10.17% slot_getsomeattrs_int ▒ + tts_buffer_heap_getsomeattrs ▒ 7.14% VExecStoreColumns ▒ + 1.38% ExecQual (inlined) ▒ - 20.30% Vadvance_aggregates (inlined) ▒ - 17.46% Vadvance_transition_function (inlined) ▒ + 11.95% vfloat8_accum ▒ + 4.74% vfloat8pl ▒ 0.75% vint8inc_any ▒ + 2.77% ExecProject (inlined)


It is strange that I am not seeing lookup_hash_entry in profile in my case.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Resolving the python 2 -> python 3 mess
Next
From: Fujii Masao
Date:
Subject: Re: Wait event that should be reported while waiting for WALarchiving to finish