Re: ADD COLUMN ts tsvector GENERATED too slow - Mailing list pgsql-general

From Florents Tselai
Subject Re: ADD COLUMN ts tsvector GENERATED too slow
Date
Msg-id CD7B549E-3BEF-41A9-BA53-D23D816A38D3@gmail.com
Whole thread Raw
In response to Re: ADD COLUMN ts tsvector GENERATED too slow  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: ADD COLUMN ts tsvector GENERATED too slow  (Florents Tselai <florents.tselai@gmail.com>)
Re: ADD COLUMN ts tsvector GENERATED too slow  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general

> On 6 Jul 2022, at 1:11 PM, Francisco Olarte <folarte@peoplecall.com> wrote:
>
> On Wed, 6 Jul 2022 at 11:55, Florents Tselai <florents.tselai@gmail.com> wrote:
>> Also, fwiw looking at top the CPU% and MEM% activity, looks like it does data crunching work.
> ...
>>>> On 06.07.22 10:42, Florents Tselai wrote:
>>>>> I have a beefy server (40+ worker processes , 40GB+ shared buffers) and a table holding (key text, text text,) of
around50M rows. 
>>>>> These are text fields extracted from 4-5 page pdfs each.
>
> How big is yout table? from your query it seems you expect more than
> 1M-1 ( left... ), but if you have very big text columns it may be
> spending a lot of time fully decompressing / reading them ( I'm not
> sure if it left(..) on toasted values is optimized to stop after
> reading enough ). Also, it has to rewrite a lot of data to insert the
> columns, it it takes some ms per row which I would not discard 50M
> rows * 1 ms / row = 50ksecs = 500k secs ~=13.9 hours per ms-row, so at
> 2 ms ( which may be right for reading a big row, calculating the
> vector and writing an even bigger row ) it would take more than a day
> to finish, which I would not discard given you are asking for a heavy
> thing.

50M+ rows and iirc pg_relation_size was north of 80GB or so.

>
> If you have stopped it I would try doing a 1000 row sample in a copied

Haven’t stopped it as I’m not convinced    there’s an alternative to just waiting
For it to complete :/

> table to get an speed idea. Otherwise, with this query, I would
> normally monitor disk usage of disk files as an indication of
> progress, I'm not sure there is another thing you could look at
> without disturbing it.
>
> FWIW, I would consider high mem usage normal in these kind of query,
> hi cpu would depend on what you call it, but it wouldn't surprise me
> if it has at least one cpu running at full detoasting and doing
> vectors, I do not know if alter table can go paralell..
>

You’re probably right, a lot of the CPU usage could be detoasting.

> Francisco Olarte.
>
>




pgsql-general by date:

Previous
From: Francisco Olarte
Date:
Subject: Re: ADD COLUMN ts tsvector GENERATED too slow
Next
From: Florents Tselai
Date:
Subject: Re: ADD COLUMN ts tsvector GENERATED too slow