Re: [HACKERS] Fwd: Joins and links - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: [HACKERS] Fwd: Joins and links |
Date | |
Msg-id | 3785FF36.D2E85655@trust.ee |
In response to | Re: [HACKERS] Fwd: Joins and links (Bob Devine <devine@cs.utah.edu>) |
List | pgsql-hackers |
Leon wrote:
>
> Hannu Krosing wrote:
> >
> > The difference will quickly degrade as more rows are fetched in one
> > query and cache misses and disk head movement start rattling your
> > disks. The analogy being a man who needs 10 different items from a
> > supermarket and takes 10 full round trips from home to buy them.
>
> Frankly, I didn't even consider fetching database from disk. This
> slows queries immensely and I wonder if there exist someone who
> doesn't keep their entire DB in RAM.

Well, I personally don't even know how I could keep my entire PostgreSQL
DB in RAM :)

It would be interesting to know what percentage of people use PostgreSQL
for databases that are small enough to fit in RAM - surely not the ones
who need splitting of tables >2GB. And I think that setting up PostgreSQL
for maximum RAM usage would make a nice topic for "Optimizing PostgreSQL".

When my backends are mostly idle they usually use about 3-4MB of memory
(hardly enough for any database :). It is quite possible that some bigger
tables end up in the disk cache, but you can expect to find all your data
in that cache only if you run many queries on the same tables in a row and
the machine is otherwise idle.

> I think the gain will raise with the number of tables, because
> the more tables - the more index lookups are saved.

My point is that sometimes even a sequential scan is faster than an index
lookup, and not only because of the overhead of using the index, but
because of the better disk performance of sequential reads vs. random
reads. For in-memory databases this of course does not apply. Still, I'm
quite sure that the main effort in PostgreSQL development has so far gone
into optimising queries where most of the data is fetched from disk.

> > Fixing the optimizer would get a performance gain on a far wider
> > range of tasks, and is still needed for links.
>
> But general fixing of optimizer is total rewritement of it, whereas
> link fix is almost a fast hack.

I'm not too sure about that. It certainly can be done without a _total_
rewrite, but getting all the new node types and access methods into the
parser/planner/executor may not be trivial.

One idea would be a cross-table OID index for anything in memory. Then,
assuming that everything is in memory, using oids as links would be only
trivially, if at all, slower (2-10 memory accesses and comparisons) than a
"straight" link lookup, which could itself end up chasing linear chains of
forward-id-links on frequently updated DBs.

On infrequently updated DBs you could just use triggers and/or cron jobs
to keep your reports updated; I guess that this is what most commercial
OLAP systems do.

Actually I lived my first half-year of using PostgreSQL under the delusion
that lookup by OID would be somehow special (fast). Probably due to my
misunderstanding of the (ever diminishing) O in ORDBMS :)

There have been some attempts to get object-orientedness better supported
by PGSQL (possibly even some contrib functions), but nobody seems to have
needed it badly enough to 1) implement it and 2) shout long enough to get
it included in the standard distribution. Most (all?) of the developers
seem to be die-hard RDBMS guys, and thanks to that we now have a solid and
reasonably fast Relational DBMS with some OO rudiments.

So I guess that unless you do (at least part of) links, no-one else
will ;(

> Unfortunately I already have a project to work on. There is too
> little of me for two projects.

Darn!
I almost hoped we would get one more PostgreSQL hacker, as I'm sure that
after familiarising oneself with the code enough to implement links one
would be quite capable of helping with most of PostgreSQL development
<grin>

----------------------
Hannu
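As a rough illustration of the cross-table "oid as link" idea discussed in the message above: a hypothetical sketch (table and index names are invented, not from the thread), assuming a PostgreSQL version where user tables can still carry the oid system column (implicit on old releases, or via `WITH OIDS` before its removal in PostgreSQL 12). Storing a parent row's oid in the child row and indexing oid turns "following the link" into an ordinary index lookup:

```sql
-- Hypothetical sketch of "oid as link"; all names are invented.
-- Assumes the table can still have an oid system column
-- (implicit in old releases, or CREATE TABLE ... WITH OIDS before PostgreSQL 12).
CREATE TABLE parent (name text) WITH OIDS;
CREATE UNIQUE INDEX parent_oid_idx ON parent (oid);  -- makes oid dereference an index lookup

CREATE TABLE child (parent_oid oid, payload text);

-- "Chasing the link" is then just an indexed join on the stored oid:
SELECT p.name, c.payload
FROM   child c
JOIN   parent p ON p.oid = c.parent_oid;
```

With everything cached in RAM, such a lookup is in the spirit of the "a few memory accesses and comparisons" cost estimated in the message, at the price of maintaining the extra index.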
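And a minimal sketch of the "triggers to keep your reports updated" suggestion for infrequently updated databases, written in present-day PostgreSQL/PL/pgSQL syntax rather than anything available in 1999; the tables, function, and trigger names are invented for illustration:

```sql
-- Hypothetical example: keep a small report table current via a trigger
-- instead of recomputing the aggregate in every reporting query.
CREATE TABLE orders       (customer text, amount numeric);
CREATE TABLE order_totals (customer text PRIMARY KEY, total numeric);

CREATE FUNCTION refresh_order_totals() RETURNS trigger AS $$
BEGIN
    -- Crude full refresh; acceptable when the base table changes rarely.
    DELETE FROM order_totals;
    INSERT INTO order_totals
        SELECT customer, sum(amount) FROM orders GROUP BY customer;
    RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_refresh_totals
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT
    EXECUTE PROCEDURE refresh_order_totals();
```

A cron job running the same `DELETE`/`INSERT` refresh would serve the same purpose when per-statement trigger overhead is unwanted.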