Splitting Postgres into Separate Clusters? - Mailing list pgsql-general

From: Paul Jungwirth
Subject: Splitting Postgres into Separate Clusters?
Date:
Msg-id: CA+6hpamSK-11vpPXH9QwYzOpNfNvo5iEc7UR=SOWGZCKxK9Naw@mail.gmail.com
Responses: Re: Splitting Postgres into Separate Clusters?
List: pgsql-general
Hello,

I'm running a specialized search engine that indexes a few tens of millions of web pages, keeping everything in Postgres, and one problem I'm starting to see is poor cache hit rates. My database has two or three tables just for the text of the scraped pages, with one row per scrape and a `text` column for the HTML. These tables are almost, but not quite, write-only: they are read by only one job, which uses them to create other objects in the system. I'd like the rest of the database to stay in memory all the time, but I don't really care if I have to read these tables from disk.

To keep these problem tables from dragging down performance on the rest of the system, I'm thinking of splitting them out. I suppose I could store the HTML outside of Postgres entirely, but I'm wondering if a nicer solution would be to keep these tables in a separate "cluster" (i.e. /var/lib/postgresql/9.1/something_else -- I hope this is the right terminology). Then I could tune that cluster differently from the main cluster, or even put it on a different machine. And I could still use dblink to query both clusters conveniently (I think; this isn't really that important).

Does this seem like a worthwhile approach? Is there a better way to deal with a few out-sized tables?
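
For reference, here's roughly how I've been confirming which tables have the poor hit rates. It just reads the standard pg_statio_user_tables view (the LIMIT is arbitrary):

    -- Per-table buffer cache hit ratio, worst offenders first:
    SELECT relname,
           heap_blks_read,
           heap_blks_hit,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT 10;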
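
And the dblink usage I have in mind is something like the following, run from the main cluster after CREATE EXTENSION dblink; -- the port, database, table, and columns here are invented just to sketch the idea:

    SELECT page_id, html
    FROM dblink('host=127.0.0.1 port=5433 dbname=scrapes',
                'SELECT page_id, html FROM scraped_pages WHERE page_id = 42')
         AS t(page_id int, html text);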

Thanks,
Paul

--
_________________________________
Pulchritudo splendor veritatis. ("Beauty is the splendor of truth.")
