query patterns for multipass aggregating - Mailing list pgsql-general

From Rob Nikander
Subject query patterns for multipass aggregating
Date
Msg-id 11335409-A083-4951-BC7A-3C4FD118AC04@gmail.com
Whole thread Raw
Responses Re: query patterns for multipass aggregating
List pgsql-general
Hi,

I’ve run into this pattern a few times, and I usually get a little confused. I’m wondering if there are some common
solutionsor techniques. 

Simplified example:

I have tables `items`, `colors`, and `images`. Items have many colors, and many images.

I want a query to list items, with their colors and images. Each result row is an item, so the colors and images must
beaggregated into arrays or json.  

If there were only one other table, it’s a simple join and group…

    select items.*, array_agg(color_name)
    from items join colors on items.id = colors.item_id
    group by items.id

Now to get images too, my first try uses a CTE…

    // Tables:
    // items (id, name, foo)
    // colors (item_id, color_name, color_foo)
    // images (item_id, image_file, image_foo)

    with items_with_colors as (
      // This is the same query as above
      select items.*, array_agg(colors.color_name) as color_names
      from items
      join colors on items.id = colors.item_id
      group by items.id
    )
    // Same idea repeated but now joining to images table
    select items.*, array_agg(images.image_file) as image_files
    from items_with_colors items
    join images on items.id = images.item_id
    group by items.id, items.name, items.foo, items.color_names;  // mmm :(

One first problem is on the last line. I have to list out all the columns. In my real situation there are many more.
Listingthem is only an inconvenience, but I’m more worried that it is internally comparing all the columns, when really
itcould compare only `items.id`. 

So… are there some other techniques that usually (or might) work better? It seems like the kind of thing that could run
inparallel, if I wrote it right.  

Rob



pgsql-general by date:

Previous
From: Adrien Nayrat
Date:
Subject: Re: pg_dump out of memory for large table with LOB
Next
From: "Peter J. Holzer"
Date:
Subject: Re: Fwd: Log file