From lm@snafu.Sun.COM Fri May 18 14:07:11 1990
From: lm@snafu.Sun.COM (Larry McVoy)
Newsgroups: comp.unix.wizards
Subject: Re: Sun file buffering
Date: 18 May 90 07:05:58 GMT
Reply-To: lm@sun.UUCP (Larry McVoy)
Organization: Sun Microsystems, Mountain View

In article <23359@adm.BRL.MIL> jim@pemrac.space.swri.edu (James Biard) writes:
>I am having trouble with a Sun 370 (Sparc architecture) under Sun OS 4.1
>with very long latency file buffering in memory. I am writing an application
>that writes large amounts of data to a file (or files) at a high rate.
>The system has 32 Mbytes of core, and what I am finding, through vmstat and
>peeking in the kernel, is that the amount of free memory drops to about 256
>kbytes after the program has been started. The rate at which the free memory
>drops to this point is directly correlated with the rate at which I write to
>the output file. If I then kill the program, the free memory level remains at
>this low point for arbitrarily long times (> 30 minutes). If I delete the
>output file (or files), the free memory rises to a normal level of about 24
>Mbytes. I tried using sync during or after program execution, but it made no
>difference. I tried using fsync, and found that fsync after every write raised
>the low water mark for free memory to about 10 kbytes, but that fsync's after
>every 10 or 100 writes made no discernable difference. Naturally, an fsync
>after every write ruined program performance. I tried dynamically raising the
>values of minfree and desfree in the kernel with no effect, but then I'm not
>sure of what I'm doing there, and may have done it wrong.

Oh, boy. I know what is going on. You're probably not going to like the
answer, but the system design has its merits, so pay attention. This is
OS 101 as taught by your preacher. Dig out your Multics books, because
that's where this came from.

You need to understand the difference between the SunOS 4.x VM and file
system and those of other Unix systems (in particular, the file system
cache, aka the buffer cache). (I lie a little here: most of this applies
to Mach too. The SunOS and the Mach VM systems are quite similar.)

In the old systems, God created two portions of core: process space and
file space. God named 10% of memory ``the buffer cache.'' All I/O went
through the buffer cache, with no nasty effects on the process portion of
memory. God looked, and thought it was good.

Then came Adam from a land called Bezerkley. Many improvements were made
in the buffer cache replacement algorithms, but they were not visionary
like God's, so they were called heuristics. These heuristics were also seen
and blessed as good, for they detected sequential I/O and switched from
that nasty LRU to a nice MRU replacement algorithm.

Life went on. Little changed.

Then a disciple of Adam, a fast talking brash young man, left the land of
Bezerkley in search of that promised land, BIG BUSINESS. (He found it.)
He took what was good and ported it to a new machine. He sold many
machines based on the work done in Bezerkley. But after a time he grew
tired of the messy Bezerkley VM system, for it was based on a venerable,
but outdated, machine. And he decided to redesign the VM system for the
good of the world.

So he did. And his friends implemented this new system. And it was good.
As part of the design, the distinction between the process and the file
disappeared. This was known as unified memory and was blessed by all as
good. (This was before it was tested.)

But our friends grew tired from their hard work implementing this new
system and they grew careless. They threw away Adam's carefully tuned
heuristics. And this was bad. They used the old pager, which loosely
approximates LRU. And this was worse, especially for file I/O. And then
they shipped it, because they had had enough and wanted to learn about
new things, like beer.
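
For the skeptics, the LRU versus MRU point can be checked with a toy
model. The sketch below is not SunOS or BSD code; the function name
hit_rate and its parameters are made up for illustration, and it assumes
a fixed-size cache and a file that is read sequentially, front to back,
several times over.

```python
from collections import OrderedDict


def hit_rate(policy, cache_blocks, file_blocks, passes):
    """Replay repeated sequential scans of a file through a small cache.

    policy is "lru" or "mru" (hypothetical names for this sketch).
    Returns the fraction of block accesses served from the cache.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = accesses = 0
    for _ in range(passes):
        for block in range(file_blocks):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)  # mark as most recently used
                continue
            if len(cache) >= cache_blocks:
                # LRU evicts the oldest entry, MRU evicts the newest one.
                cache.popitem(last=(policy == "mru"))
            cache[block] = True
    return hits / accesses


if __name__ == "__main__":
    # A 5-block file scanned 10 times through a 4-block cache.
    print("LRU hit rate:", hit_rate("lru", 4, 5, 10))
    print("MRU hit rate:", hit_rate("mru", 4, 5, 10))
```

With a file one block bigger than the cache, LRU always throws out the
block the scan needs next, so it never hits. MRU gives up one block and
keeps the rest, so most accesses hit.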

So now you know why you are having problems. I suggest that you get
religion but skip straight to the beer part and hang out until a release
from the company of our fast talking friend. I suspect that this release
will address many of your anxieties.
---
What I say is my opinion. I am not paid to speak for Sun, I'm paid to hack.
Besides, I frequently read news when I'm drjhgunghc, err, um, drunk.
Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com