Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From: Mark Pritchard
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date:
Msg-id: 1019094698.25540.116.camel@set.tangent.net.au
In response to: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Mark Pritchard <mark@tangent.net.au>)
List: pgsql-hackers

Apologies for the naff double post, but I meant to add that obviously
the figures for the Solaris box are bogus after the first run... imagine
a file system cache holding an entire 2gb file. I tried creating a 4gb
file on this box, but it bombed with a "file too large" error.
Unfortunately, I can't rip memory out of this box, as I don't have
exclusive access.
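
(On the "file too large" error, by the way: without large file support,
off_t is only 32 bits on both of these platforms, so creating anything
over 2gb fails. Building with the large-file macros, e.g.

    gcc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -o rndpgcst rndpgcst.c

and seeking with fseeko() rather than fseek() should get past the limit.
The attached program defines those macros at the top, but I haven't
re-run the 4gb case, so treat that as untested; the file name rndpgcst.c
is just what I happen to call it.)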

On Thu, 2002-04-18 at 11:49, Mark Pritchard wrote:
> I threw together the attached program (compiles fine with gcc 2.95.2 on
> Solaris 2.6 and egcs-2.91.66 on RedHat Linux 6.2) and ran it a few
> times. Data is below. Usual disclaimers about hastily written code etc
> :)
> 
> Machine = ghoul (generic intel, 384mb ram, dual p3-800, ide disk running
> dma)
> 
> Sequential
> Bytes Read    Time (s)    Bytes / Sec
> 536870912       27.14   19783933.74
> 536870912       27.14   19783990.60
> 536870912       27.11   19801872.14
> 536870912       26.92   19942928.41
> 536870912       27.31   19657408.43
>                         19794026.66 (avg)
> 
> Random        
> Bytes Read    Time (s)    Bytes / Sec
> 1073741824      519.57  2066589.21
> 1073741824      517.78  2073751.44
> 1073741824      516.92  2077193.23
> 1073741824      513.18  2092333.29
> 1073741824      510.68  2102579.88
>                         2082489.41 (avg)
> 
> Machine = jedi (Sun E420, 3gb ram, dual 400s, test on single scsi disk)
> 
> Sequential        
> Bytes Read    Time (s)    Bytes / Sec
> 2097152000      65.19   32167675.28
> 2097152000      65.22   32154114.65
> 2097152000      65.16   32182561.99
> 2097152000      65.12   32206105.12
> 2097152000      64.67   32429463.26
>                         32227984.06 (avg)
> 
> Random        
> Bytes Read    Time (s)    Bytes / Sec
> 4194304000      1522.22 2755394.79
> 4194304000      278.18  15077622.05
> 4194304000      91.43   45874730.07
> 4194304000      61.43   68273795.19
> 4194304000      54.55   76890231.51
>                         41774354.72 (avg)
> 
> If I interpret Tom's "divide" instruction correctly, is that a factor of
> 10 on the linux box?
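> (That is, sequential avg / random avg = 19794026.66 / 2082489.41 = ~9.5.)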
> 
> On Thu, 2002-04-18 at 01:16, Tom Lane wrote:
> > "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes:
> > > On my own few experience I think this could be solved decreasing
> > > random_page_cost, if you would prefer to use indexes than seq scans, then
> > > you can lower random_page_cost to a point in which postgres works as you
> > > want. So the planner would prefer indexes when in standard conditions it
> > > would prefer seq scans.
> > 
> > It's entirely possible that the default value of random_page_cost is too
> > high, at least for many modern machines.  The experiments I did to get
> > the 4.0 figure were done a couple years ago, on hardware that wasn't
> > exactly new at the time.  I have not heard of anyone else trying to
> > measure it though.
> > 
> > I don't think I have the source code I used anymore, but the principle
> > is simple enough:
> > 
> > 1. Make a large file (several times the size of your machine's RAM, to
> > ensure you swamp out kernel disk buffering effects).  Fill with random
> > data. (NB: do not fill with zeroes, some filesystems optimize this away.)
> > 
> > 2. Time reading the file sequentially, 8K per read request.
> > Repeat enough to get a statistically trustworthy number.
> > 
> > 3. Time reading randomly-chosen 8K pages from the file.  Repeat
> > enough to get a trustworthy number (the total volume of pages read
> > should be several times the size of your RAM).
> > 
> > 4. Divide.
> > 
> > The only tricky thing about this is making sure you are measuring disk
> > access times and not being fooled by re-accessing pages the kernel still
> > has cached from a previous access.  (The PG planner does try to account
> > for caching effects, but that's a separate estimate; the value of
> > random_page_cost isn't supposed to include caching effects.)  AFAIK the
> > only good way to do that is to use a large test, which means it takes
> > awhile to run; and you need enough spare disk space for a big test file.
> > 
> > It'd be interesting to get some numbers for this across a range of
> > hardware, filesystems, etc ...
> > 
> >             regards, tom lane
> > 
> > 
> 
> 
> ----
> 

> // Large-file macros: make off_t 64-bit so files over 2gb work
> #define _LARGEFILE_SOURCE
> #define _FILE_OFFSET_BITS 64
> 
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> 
> /**
>  * Constants
>  */
> 
> #define BLOCK_SIZE        (8192)
> 
> /**
>  * Prototypes
>  */
> 
>     // Creates the test file filled with random data
>     void createTestFile(char *testFileName, long long fileSize);
> 
>     // Handles runtime errors by displaying the function, activity and error number
>     void handleError(char *functionName, char *activity);
> 
>     // Standard entry point
>     int main(int argc, char *argv[]);
> 
>     // Prints correct usage and quits
>     void printUsageAndQuit();
> 
>     // Tests performance of random reads of the given file
>     void testRandom(char *testFileName, long long amountToRead);
> 
>     // Tests performance of sequential reads of the given file
>     void testSeq(char *testFileName);
> 
> /**
>  * Definitions
>  */
> 
> /**
>  * createTestFile()
>  */
> void createTestFile(char *testFileName, long long fileSize)
> {
>     FILE *testFile;
>     long long reps, i, j, bufferReps;
>     time_t timetmp;
>     long long *buffer;
>     size_t written;
> 
>     // Indicate op
>     printf("Creating test file %s of %lld mb\n",testFileName,fileSize);
> 
>     // Adjust file size to bytes
>     fileSize *= (1024*1024);
> 
>     // Allocate a buffer for writing out random long longs
>     if (!(buffer = malloc(BLOCK_SIZE)))
>         handleError("createTestFile()","malloc");
> 
>     // Open the file for writing
>     if (!(testFile = fopen(testFileName, "wb")))
>         handleError("createTestFile()","fopen");
> 
>     // Initialise the random number generator
>     srandom(time(NULL));
> 
>     // Write data
>     reps         = fileSize / BLOCK_SIZE;
>     bufferReps    = BLOCK_SIZE / sizeof(long long);
>     for (i = 0; i < reps; i++)
>     {
>         // Fill buffer with random data
>         for (j = 0; j < bufferReps; j++)
>             buffer[j] = random();
> 
>         // Write
>         written = fwrite(buffer, sizeof(long long), bufferReps, testFile);
>         if (written != bufferReps)
>             handleError("createTestFile()","fwrite");
>     }
> 
>     // Flush and close
>     if (fflush(testFile))
>         handleError("createTestFile()","fflush");
>     if (fclose(testFile))
>         handleError("createTestFile()","fclose");
> 
>     // Free buffer
>     free(buffer);
> }
> 
> /**
>  * handleError()
>  */
> void handleError(char *functionName, char *activity)
> {
>     fprintf(stderr, "Error in %s while attempting %s. Error %d (%s)\n", functionName, activity, errno, strerror(errno));
>     exit(1);
> }
> 
> /**
>  * main()
>  */
> int main(int argc, char *argv[])
> {
>     // Print usage and quit if argument count is definitely incorrect
>     if (argc < 3)
>     {
>         // Definitely wrong
>         printUsageAndQuit();
>     }
>     else
>     {
>         // Dispatch
>         if (!strcmp(argv[1], "create"))
>         {
>             if (argc != 4)
>                 printUsageAndQuit();
> 
>             // Create the test file of the specified size
>             createTestFile(argv[2], atol(argv[3]));
>         }
>         else if (!strcmp(argv[1], "seqtest"))
>         {
>             if (argc != 3)
>                 printUsageAndQuit();
> 
>             // Test performance of sequential reads
>             testSeq(argv[2]);
>         }
>         else if (!strcmp(argv[1], "rndtest"))
>         {
>             if (argc != 4)
>                 printUsageAndQuit();
> 
>             // Test performance of random reads
>             testRandom(argv[2], atol(argv[3]));
>         }
>         else
>         {
>             // Unknown command
>             printUsageAndQuit();
>         }
>     }
> 
>     return 0;
> }
> 
> /**
>  * printUsageAndQuit()
>  */
> void printUsageAndQuit()
> {
>     puts("USAGE: rndpgcst [create <file> <size_in_mb>] | [seqtest <file>] | [rndtest <file> <read_in_mb>]");
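> 
>     // Example session (file name and sizes are illustrative):
>     //   rndpgcst create /tmp/pgcost.dat 2048
>     //   rndpgcst seqtest /tmp/pgcost.dat
>     //   rndpgcst rndtest /tmp/pgcost.dat 4096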
> 
>     exit(1);
> }
> 
> /**
>  * testSeq()
>  */
> void testSeq(char *testFileName)
> {
>     FILE *testFile;
>     char *buffer;
>     long long totalRead, thisRead, timeTaken;
>     struct timeval startTime, endTime;
>     struct timezone timezoneDiscard;
> 
>     // Indicate op
>     printf("Sequential read test of %s\n",testFileName);
> 
>     // Grab a buffer
>     buffer = malloc(BLOCK_SIZE);
> 
>     // Open the file for reading
>     if (!(testFile = fopen(testFileName, "rb")))
>         handleError("testSeq()","fopen");
> 
>     // Start timer
>     if (gettimeofday(&startTime, &timezoneDiscard) == -1)
>         handleError("testSeq()", "gettimeofday start");
> 
>     // Read all data from file
>     totalRead = 0;
>     while ((thisRead = fread(buffer, 1, BLOCK_SIZE, testFile)) != 0)
>         totalRead += thisRead;
>     if (ferror(testFile))
>         handleError("testSeq()", "fread");
> 
>     // End timer
>     if (gettimeofday(&endTime, &timezoneDiscard) == -1)
>         handleError("testSeq()", "gettimeofday start");
> 
>     // Close
>     if (fclose(testFile))
>         handleError("testSeq()","fclose");
> 
>     // Free the buffer
>     free(buffer);
> 
>     // Display time taken (cast avoids overflow of 32-bit time_t arithmetic)
>     timeTaken = (long long) (endTime.tv_sec - startTime.tv_sec) * 1000000;
>     timeTaken += (endTime.tv_usec - startTime.tv_usec);
>     printf("%lld bytes read in %f seconds\n", totalRead, (double) timeTaken / (double) 1000000);
> }
> 
> /**
>  * testRandom()
>  */
> void testRandom(char *testFileName, long long amountToRead)
> {
>     FILE *testFile;
>     long long reps, i, fileSize, timeTaken, readPos, thisRead, offsetMax;
>     struct stat fileStats;
>     char *buffer;
>     struct timeval startTime, endTime;
>     struct timezone timezoneDiscard;
> 
>     // Indicate op
>     printf("Random read test of %s for %lld mb\n", testFileName, amountToRead);
> 
>     // Initialise the random number generator
>     srandom(time(NULL));
> 
>     // Adjust amount to read
>     amountToRead *= (1024*1024);
> 
>     // Determine file size
>     if (stat(testFileName, &fileStats) == -1)
>         handleError("testRandom()", "stat");
>     fileSize = fileStats.st_size;
> 
>     // Grab a buffer
>     buffer = malloc(BLOCK_SIZE);
> 
>     // Open the file for reading
>     if (!(testFile = fopen(testFileName, "rb")))
>         handleError("testRandom()","fopen");
> 
>     // Start timer
>     if (gettimeofday(&startTime, &timezoneDiscard) == -1)
>         handleError("testRandom()", "gettimeofday start");
> 
>     // Read data from file
>     reps         = amountToRead / BLOCK_SIZE;
>     offsetMax    = fileSize / BLOCK_SIZE;
>     for (i = 0; i < reps; i++)
>     {
>         // Determine read position
>         readPos = (random() % offsetMax) * BLOCK_SIZE;
> 
>         // Seek and read (fseeko takes a 64-bit off_t with the macros above)
>         if (fseeko(testFile, (off_t) readPos, SEEK_SET) != 0)
>             handleError("testRandom()","fseeko");
>         if ((thisRead = fread(buffer, 1, BLOCK_SIZE, testFile)) != BLOCK_SIZE)
>             handleError("testRandom()","fread");
>     }
> 
>     // End timer
>     if (gettimeofday(&endTime, &timezoneDiscard) == -1)
>         handleError("testRandom()", "gettimeofday start");
> 
>     // Close
>     if (fclose(testFile))
>         handleError("testRandom()","fclose");
> 
>     // Free the buffer
>     free(buffer);
> 
>     // Display time taken (cast avoids overflow of 32-bit time_t arithmetic)
>     timeTaken = (long long) (endTime.tv_sec - startTime.tv_sec) * 1000000;
>     timeTaken += (endTime.tv_usec - startTime.tv_usec);
>     printf("%lld bytes read in %f seconds\n", amountToRead, (double) timeTaken / (double) 1000000);
> }



