Home > mailing lists

Re: Efficient sorting the results of a join, without denormalization - Mailing list pgsql-general

From	David G. Johnston
Subject	Re: Efficient sorting the results of a join, without denormalization
Date	May 31, 2015 04:33:24
Msg-id	CAKFQuwb7_DegQ2KXBA6BsEz7ZC1iD5LjVm5mWHP+hQvMKsdM6A@mail.gmail.com Whole thread
In response to	Efficient sorting the results of a join, without denormalization ("Glen M. Witherington" <glen@fea.st>)
Responses	Re: Efficient sorting the results of a join, without denormalization
List	pgsql-general

Tree view

On Saturday, May 30, 2015, Glen M. Witherington <glen@fea.st> wrote:

Sorry about the horrendous subject, let me explain by example:

Let's take this schema:

```
CREATE TABLE a (
id bigserial PRIMARY KEY,
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);

CREATE TABLE b(
id bigserial PRIMARY KEY,
a_id bigint NOT NULL REFERENCES a(id),
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);

CREATE TABLE c(
id bigserial PRIMARY KEY,
b_id bigint NOT NULL REFERENCES b(id),
created_at timestamp with time zone NOT NULL DEFAULT NOW()
);
```

And let's fill it up with some dummy data, that roughly matches the
distribution of mine:

```
INSERT INTO a SELECT FROM generate_series(1, 5);
INSERT INTO b(a_id) SELECT (i % 5) + 1 FROM generate_series(1, 100) i;
INSERT INTO c(b_id) SELECT (trunc(random() * 100)+1) FROM
generate_series(1, 1000000);
```

And here's the query I want to do, efficiently:

````
SELECT * FROM c
JOIN b ON b.id = c.b_id
JOIN a ON a.id = b.a_id
WHERE a.id = 3
ORDER BY b.created_at DESC
LIMIT 10
```

There seems to simply be no index I can put on the data, that allows me
to run this query efficiently. Four hours of playing with this, the only
solution I can see is, normalizing table `c` by adding a field "b's
a_id" and then creating an index on (bs_a_id, created_at).

But surely there's a better solution?

This is one problem with using made up surrogate keys...

The PK of A is a component of both the PK of B and the PK of C but you throw that information away by using serial fields for PKs instead. You should have unique indexes on B and C that incorporate the ID from A and then indeed you will end up with a join sequence that can be executed against efficiently.

All that said you really should put indexes on the foreign keys...

I haven't run the code to see the actual plan....

David J.

pgsql-general by date:

From: Maxim Boguk
Date: 31 May 2015, 03:28:39
Subject: Curious case of huge simple btree indexes bloat.

From: "Glen M. Witherington"
Date: 31 May 2015, 04:43:31
Subject: Re: Efficient sorting the results of a join, without denormalization

Re: Efficient sorting the results of a join, without denormalization - Mailing list pgsql-general

Previous

Next