Thread: Removing a key from jsonb is sloooow
I'm trying to remove a key from a jsonb column in a table with 10K rows, and the performance is abysmal. When the key is missing, it takes 5 minutes. When the key is present, it takes even longer.
Test with non-existent key:
>> update projects set misc = misc - 'foo';
Time: 324711.960 ms (05:24.712)
What can I do to improve this?
On 7/17/19 7:30 AM, Volkan Unsal wrote:
> I'm trying to remove a key from a jsonb column in a table with 10K rows,
> and the performance is abysmal. When the key is missing, it takes 5
> minutes. When the key is present, it takes even longer.
>
> Test with non-existent key:
>
> >> update projects set misc = misc - 'foo';
> Time: 324711.960 ms (05:24.712)
>
> What can I do to improve this?

Provide some useful information:

1) Postgres version
2) Table schema
3) Explain analyze of query

--
Adrian Klaver
adrian.klaver@aklaver.com
Hi,

Your current query processes all 10K rows, which is overkill if only a few rows actually contain the key you want to delete. Depending on how big your JSON data is, this could be problematic.

Have you considered adding a WHERE clause to your query? You could also create an index on your jsonb column to speed up identifying the rows that contain the key you want to delete.
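A sketch of both ideas (the index name here is illustrative, not from the original thread):

    -- Only touch rows that actually contain the key;
    -- the ? operator tests for top-level key existence.
    update projects set misc = misc - 'foo' where misc ? 'foo';

    -- A GIN index on the jsonb column can support the ? operator.
    create index projects_misc_gin_idx on projects using gin (misc);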
On Wed, Jul 17, 2019 at 4:31 PM, Volkan Unsal <spocksplanet@gmail.com> wrote:
> I'm trying to remove a key from a jsonb column in a table with 10K rows, and the performance is abysmal. When the key is missing, it takes 5 minutes. When the key is present, it takes even longer.
>
> Test with non-existent key:
>
> >> update projects set misc = misc - 'foo';
> Time: 324711.960 ms (05:24.712)
>
> What can I do to improve this?
On 7/17/19 7:58 AM, Volkan Unsal wrote:

Please post to the list also. Ccing list.

--
Adrian Klaver
adrian.klaver@aklaver.com
More information about my setup:
Postgres version:
PostgreSQL 10.9 (Debian 10.9-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
Table schema:
CREATE TABLE public.projects (
misc jsonb DEFAULT '{}'::jsonb NOT NULL
);
Explain analyze:
explain analyze update projects set misc = misc - 'foo';
Update on projects (cost=0.00..4240.93 rows=10314 width=1149) (actual time=346318.291..346318.295 rows=0 loops=1)
-> Seq Scan on projects (cost=0.00..4240.93 rows=10314 width=1149) (actual time=1.011..266.435 rows=10314 loops=1)
Planning time: 40.087 ms
Trigger trigger_populate_tsv_body_on_projects: time=341202.492 calls=10314
Execution time: 346320.260 ms
Time: 345969.035 ms (05:45.969)
Figured out that it's due to the trigger. Thanks for your help, Adrian!
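For a one-off bulk change like this, one option (a sketch, and only safe if the tsvector maintained by that trigger does not depend on the misc column) is to disable the trigger for the duration of the update:

    begin;
    -- Requires owner privileges and takes a lock on the table.
    alter table projects disable trigger trigger_populate_tsv_body_on_projects;
    update projects set misc = misc - 'foo' where misc ? 'foo';
    alter table projects enable trigger trigger_populate_tsv_body_on_projects;
    commit;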
On Wed, Jul 17, 2019 at 10:39 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/17/19 7:30 AM, Volkan Unsal wrote:
> I'm trying to remove a key from a jsonb column in a table with 10K rows,
> and the performance is abysmal. When the key is missing, it takes 5
> minutes. When the key is present, it takes even longer.
>
> Test with non-existent key:
>
> >> update projects set misc = misc - 'foo';
> Time: 324711.960 ms (05:24.712)
>
> What can I do to improve this?
Provide some useful information:
1) Postgres version
2) Table schema
3) Explain analyze of query
--
Adrian Klaver
adrian.klaver@aklaver.com
Volkan Unsal
Product Engineer
On 7/17/19 7:59 AM, Volkan Unsal wrote:
> Aha, it's due to the trigger, isn't it?

Yes.

--
Adrian Klaver
adrian.klaver@aklaver.com
Volkan Unsal <spocksplanet@gmail.com> writes:
> I'm trying to remove a key from a jsonb column in a table with 10K rows,
> and the performance is abysmal. When the key is missing, it takes 5
> minutes. When the key is present, it takes even longer.

How wide are the jsonb values? It seems likely that most of this is TOAST overhead [1], ie time to reassemble wide jsonb values and then split them up again.

As Tumasgiu already mentioned, it'd likely be useful to suppress updates of rows that don't actually need to change, assuming that the key appears in a minority of rows. And an index could help even more, by avoiding the need to reconstruct wide values to see if the key appears in them.

Of course, if most of the rows need an update, neither of these will help and you just gotta live with it. Possibly reconsider your approach of using a large JSONB value to contain fields you need to update individually. That's never going to be great for performance. SQL (or at least Postgres) is incapable of updating portions of columns efficiently.

regards, tom lane

[1] https://www.postgresql.org/docs/current/storage-toast.html
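One way to check whether TOAST is likely in play is to measure how wide the stored values are; values much over ~2 kB are typically toasted (a sketch using pg_column_size):

    -- Average and maximum on-disk size of the misc column, in bytes.
    select avg(pg_column_size(misc))::int as avg_bytes,
           max(pg_column_size(misc))      as max_bytes
    from projects;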