Re: [HACKERS] Need some help on code - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Need some help on code
Date
Msg-id 199806072026.QAA23211@candle.pha.pa.us
In response to Need some help on code  (Maarten Boekhold <maartenb@dutepp0.et.tudelft.nl>)
Responses Re: [HACKERS] Need some help on code
List pgsql-hackers
>
> Hi,
>
> I was trying to change the cluster command to do its writes clustered
> by 100 tuples, hoping to improve performance. However, the code
> I've written crashes. This certainly has to do with some internal state
> of pgsql that isn't preserved in a HeapTuple.
>
> Could somebody with knowledge take a brief look at my code and perhaps
> tell me how to do it properly?

I did not look at the code, but I can pretty much tell you that bunching
the writes will not help performance.  We already do that pretty well
with the cache.

The problem with cluster is the normal problem of using an index to
seek into a data table where the data is not clustered on the index.
Every entry in the index can require a different page, and each page
has to be read in from disk.
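
To make the page-access argument concrete, here is a rough simulation,
not PostgreSQL code.  It counts page reads while visiting tuples in
index (key) order, once for a table stored in key order and once for
the same table in shuffled order.  The page size, the 8-page LRU cache,
and all names are assumptions made up for illustration.

```python
import random


def count_page_reads(heap_order, tuples_per_page, cache_pages):
    """Count page reads when tuples are visited in index (key) order.

    heap_order[i] is the key stored in heap slot i; slot i lives on
    page i // tuples_per_page.  A small LRU cache stands in for the
    buffer cache (a simplifying assumption).
    """
    page_of_key = {key: slot // tuples_per_page
                   for slot, key in enumerate(heap_order)}
    cache = []   # page numbers, least recently used first
    reads = 0
    for key in sorted(page_of_key):      # walk the "index" in key order
        page = page_of_key[key]
        if page in cache:
            cache.remove(page)           # hit: refresh its LRU position
        else:
            reads += 1                   # miss: page must come from disk
            if len(cache) >= cache_pages:
                cache.pop(0)             # evict the least recently used page
        cache.append(page)
    return reads


N = 10_000        # tuples in the table (hypothetical)
PER_PAGE = 100    # tuples per page (hypothetical)

clustered = list(range(N))           # heap already in key order
unclustered = clustered[:]
random.Random(0).shuffle(unclustered)

clustered_reads = count_page_reads(clustered, PER_PAGE, cache_pages=8)
unclustered_reads = count_page_reads(unclustered, PER_PAGE, cache_pages=8)
print("clustered reads:  ", clustered_reads)
print("unclustered reads:", unclustered_reads)
```

In the clustered case each page is read exactly once.  In the shuffled
case nearly every index entry lands on a page that is no longer cached,
so the read count approaches the number of tuples.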

Often the fastest way is to discard the index and just read the table,
sorting it in pieces and merging the pieces.  That is what psort, our
sort code, does.  That is why I recommend the SELECT INTO solution if
you have enough disk space.
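
The "sort in pieces, then merge" idea can be sketched as a generic
external merge sort.  This is not psort's actual code; the function
names, the one-integer-per-line run files, and the run_size parameter
are all hypothetical and chosen only to show the shape of the
technique.

```python
import heapq
import os
import tempfile


def write_run(sorted_values):
    """Write one sorted run to a temp file, one integer per line."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for v in sorted_values:
            f.write(f"{v}\n")
    return path


def read_run(path):
    """Stream a run back sequentially."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, run_size):
    """Yield values in sorted order, holding at most run_size in memory
    while building runs."""
    run_paths = []
    run = []
    for v in values:                  # one sequential pass over the input
        run.append(v)
        if len(run) == run_size:      # piece is full: sort it and spill it
            run_paths.append(write_run(sorted(run)))
            run = []
    if run:                           # spill the final partial piece
        run_paths.append(write_run(sorted(run)))
    try:
        # k-way merge: every run file is read strictly sequentially
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


data = [5, 3, 9, 1, 7, 2, 8]
result = list(external_sort(data, run_size=3))
print(result)
```

Both phases touch the data sequentially, which is why this tends to beat
following an index into an unclustered table, at the cost of the extra
disk space for the runs.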

Once it is clustered, subsequent clusters should be very fast, because
only the out-of-order entries cause random disk seeks.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
