Re: ADD COLUMN ts tsvector GENERATED too slow - Mailing list pgsql-general

From Florents Tselai
Subject Re: ADD COLUMN ts tsvector GENERATED too slow
Date
Msg-id D8E9BE00-14C4-4964-8DB5-D91A3FDA9456@gmail.com
In response to Re: ADD COLUMN ts tsvector GENERATED too slow  (Florents Tselai <florents.tselai@gmail.com>)
Responses Re: ADD COLUMN ts tsvector GENERATED too slow  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-general


On 6 Jul 2022, at 1:35 PM, Florents Tselai <florents.tselai@gmail.com> wrote:



On 6 Jul 2022, at 1:11 PM, Francisco Olarte <folarte@peoplecall.com> wrote:

On Wed, 6 Jul 2022 at 11:55, Florents Tselai <florents.tselai@gmail.com> wrote:
Also, fwiw, looking at the CPU% and MEM% activity in top, it looks like it's doing data-crunching work.
...
On 06.07.22 10:42, Florents Tselai wrote:
I have a beefy server (40+ worker processes, 40GB+ shared buffers) and a table holding (key text, text text) with around 50M rows.
These are text fields extracted from 4-5 page PDFs each.
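
For reference, the statement in question is roughly of this shape (the table name, the 'english' configuration and the exact left() limit here are illustrative, not the literal DDL):

  -- A stored generated column: adding it forces a full rewrite of the ~50M-row table.
  ALTER TABLE docs
    ADD COLUMN ts tsvector
    GENERATED ALWAYS AS (to_tsvector('english', left("text", 1048575))) STORED;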

How big is your table? From your query it seems you expect texts longer than 1M-1 characters (hence the left(...)), but if you have very big text columns it may be spending a lot of time fully decompressing/reading them (I'm not sure whether left(...) on TOASTed values is optimized to stop after reading enough). Also, it has to rewrite a lot of data to insert the column. If it takes a few ms per row, which I would not rule out, remember that 50M rows * 1 ms/row = 50,000 s, roughly 13.9 hours for every millisecond of per-row cost; at 2 ms/row (which may be right for reading a big row, calculating the vector and writing an even bigger row) it would take more than a day to finish, which would not be surprising given you are asking for a heavy operation.

50M+ rows and iirc pg_relation_size was north of 80GB or so.


If you have stopped it, I would try doing a 1000-row sample in a copied table to get a speed idea.

Haven’t stopped it, as I’m not convinced there’s an alternative to just waiting for it to complete :/
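
A minimal version of that test, with illustrative names, would be something like:

  -- Copy a small sample and time the same generated-column DDL against it.
  CREATE TABLE docs_sample AS
    SELECT key, "text" FROM docs LIMIT 1000;

  \timing on
  ALTER TABLE docs_sample
    ADD COLUMN ts tsvector
    GENERATED ALWAYS AS (to_tsvector('english', left("text", 1048575))) STORED;
  -- Extrapolating (elapsed ms / 1000 rows) * 50M rows gives a rough total.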

Otherwise, with this query, I would normally monitor the disk usage of its files as an indication of progress; I'm not sure there is another thing you could look at without disturbing it.

Actually, I monitored my disk usage and it was **definitely** working, as it had already eaten up an additional 30% of my disk capacity.
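
Since the rewrite goes into a new relfilenode, a rough way to watch it from another session (only a proxy; the growing database size includes the new files) would be:

  -- Total on-disk size of the current database; it should keep growing while the rewrite runs.
  SELECT pg_size_pretty(pg_database_size(current_database()));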

Thus, I’ll have to fall back on my initial solution and use GIN indexes to get tsvectors on the fly.
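
For the record, that fallback is roughly the following (table/column names and the 'english' configuration are again illustrative):

  -- Expression index: tsvectors are computed on the fly, no stored column needed.
  CREATE INDEX CONCURRENTLY docs_text_fts_idx
    ON docs USING gin (to_tsvector('english', "text"));

  -- Queries must use the same expression/configuration to be able to use the index.
  SELECT key
  FROM docs
  WHERE to_tsvector('english', "text") @@ websearch_to_tsquery('english', 'some query');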

FWIW, I would consider high memory usage normal in this kind of query; high CPU would depend on what you call it, but it wouldn't surprise me if it has at least one CPU running flat out detoasting and building the vectors. I do not know if ALTER TABLE can go parallel...


You’re probably right, a lot of the CPU usage could be detoasting.

Francisco Olarte.


Thanks everyone for your comments.
You can consider this solved. 
