Re: Avoid sorting when doing an array_agg - Mailing list pgsql-general

From Kiriakos Georgiou
Subject Re: Avoid sorting when doing an array_agg
Date
Msg-id 58C699B6-4616-405A-B28E-ED177866E1F3@olympiakos.com
Whole thread Raw
In response to Avoid sorting when doing an array_agg  (Alexis Woo <awoo2611@gmail.com>)
Responses Re: Avoid sorting when doing an array_agg
List pgsql-general

The array_agg() has nothing to do with it.  It’s the group by.

Without knowing what you are conceptually trying to accomplish, I can’t say much.

On my test 9.4.10 db, a similar example does a HashAggregate, so no sorting (google HashAggregate vs GroupAggregate).  But still it’s an expensive query because of all the I/O.

If I wanted to instantly have the user ids for a specific first, last name and category combo, I’d maintain a summary table via an insert trigger on the users table.

 

Kiriakos

 

 

From: <pgsql-general-owner@postgresql.org> on behalf of Alexis Woo <awoo2611@gmail.com>
Date: Friday, December 2, 2016 at 12:25 PM
To: <pgsql-general@postgresql.org>
Subject: [GENERAL] Avoid sorting when doing an array_agg

 

I have a users table which contains ~70 million rows that looks like this:

 

   Column    |       Type        |

-------------+-------------------+

 id          | integer           |

 first_name  | character varying |

 last_name   | character varying |

 category_id | integer           |

Indexes:

    "users_id_idx" btree (id)

    "users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)

 

I'm trying to retrieve the ids for each (first_name, last_name) couple for one specific category_id.

The query that I'm currently doing is the following:

 

select array_agg(id)

from users

where category_id = 5432

group by first_name, last_name;

 

For which the explain analyze output is the following:

 

 GroupAggregate  (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)

   Group Key: first_name, last_name

   ->  Sort  (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)

         Sort Key: first_name, last_name

         Sort Method: external merge  Disk: 25304kB

         ->  Bitmap Heap Scan on users  (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)

               Recheck Cond: (category_id = 5432)

               Heap Blocks: exact=7938

               ->  Bitmap Index Scan on users_category_id_first_name_last_name_idx  (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)

                     Index Cond: (category_id = 5432)

 

What seems to greatly decrease the performance of the query is the "Sort Method: external merge Disk: 7526kB."

 

Is it possible to aggregate the ids without doing a sort ?

If not, what other options, apart from increasing the work_mem, do I have ?

Thanks,

Alexis

 

pgsql-general by date:

Previous
From: rob stone
Date:
Subject: Re: No select privileges when not connecting from login postgres
Next
From: Joseph Brenner
Date:
Subject: Select works only when connected from login postgres