[WARNINGS]
link-id not found - [fmiosacomparativestudypartikernellandchapter1bootup|1
link-id not found - [fmiosacomparativestudypartikernellandchapter3scheduling|3
link-id not found - [fmiosacomparativestudypartikernellandchapter5memorymanagement|5
link-id not found - [fmiosacomparativestudypartikernellandchapter7deadlocks|7
link-id not found - [fmiosacomparativestudypartiiuserlandchapter8iolayers|8
link-id not found - [fmiosacomparativestudypartiiuserlandchapter10environmentlayer|10

Intro (I 1 2 3 4 5 6 7) (II 8 9 10 11) (D) (C) (B

I)

h1. Chapter 9: File Systems

("rm -rf /bin/laden" – Anonymous)

The topic of file systems actually incorporates many subjects that haven't been discussed yet. And, being the first chapter of discussion for traditional User Space, this chapter must cover quite a bit of ground. Physical files, those stored on external media, will be looked at, first. Then, we will take a look at basic input/output (I/O) concepts (which are used with many things, not just disk I/O). Afterwards, we will start to discuss cases when the file system as a whole deals with "files" that may not be the physical variety (i.e. file representations of devices, running processes and their information as a file, etc).

h2. 1. Physical File Basics

Data stored in RAM and cache are only there when the computer is on. The moment the computer is turned off, this data vanishes. This memory is considered volatile. Hard drives, flash cards, CD-ROMs are all examples of non-volatile media. Any memory stored on these will stick around even when the computer is shut off. Data stored on these mediums may take the form of files.

Files come in different shapes and sizes:

  • Regular files include text and binary files. Text files can be edited or otherwise opened to reveal readable text. Binary files are those that are not text. These include archives (compressed files such as JPG or ZIP or TGZ) and executables. Binary files have some sort of internal structure set by a particular standard. JPG files adhere to the Jpeg standard, MP3 files to the Mpeg2 Layer III standard, and executable files to the ELF standard, see Trac "Ticket redminemarkdownextraformatter1":/issues/show/36 or "Executable and Linkable Format (ELF)" in the Bibliography.

  • Directories of files are often files, themselves. The information they contain is a list of what files are in that directory.

  • Character-special files are related to input/output and are used to model serial I/O devices such as terminals, printers, and networks.

  • Block-special files are used to model disks such as floppies and CD-ROMs.

Accessing data inside files can be done in two different ways: sequential and random access. Sequential access is exemplified by tape drives where data can only be read and written in sequential order. If you wanted to read some data from the middle of the file you had to start from the beginning of the file and scan through to the middle. Hard drives and CD-ROMs are an example of random-access devices where you can read and write to any part of a file without having to start from the beginning.

To implement security, files are assigned attributes that the file system driver must strictly adhere to. These attributes include not just date, but also protections (similar to memory protection attributes: read-only, read/write, etc) and permissions (who can do what to the file, Chapter 11 Security).

The next issue is how file and directory structures are designed and allocated.

h3. File Structure

There are a few different ways to implement file allocation. One way is to allocate data contiguously on the device, where the blocks (think: pages; see Managing Disk Space, below) containing the file are stored in contiguous memory. This only works if a file's max size is known ahead of time and it doesn't change much. Otherwise, serious checkerboarding (gaps between chunks of data) can occur as these files shrink, or are deleted, much like some of the early memory models discussed in Chapter 5.

Another allocation method is using linked lists. With this, file blocks are linked to one another (i.e. file block 1 has a pointer at the end of it that points to file block 2, etc). This allows for files' blocks to be placed randomly on the disk. However, random access is difficult and time consuming with this method because the reader will have to start at block 1 and follow the links from one block to the other, turning it into a sequential-data model at run-time.

However, keeping a table of linked lists for allocation would expedite random access. This table would contain all the links for the list. So, when attempting random access, instead of traversing through each block just to get to a single block in the center of the list, you can jump directly there using the link in the table. Since the table is held in memory by the file system driver, the seek can occur without making any disk references. However, keeping this table in memory takes a lot of time to load, and a lot of space to store.

The third, and most widely used, file allocation technique is using i-nodes (index nodes). Each file has an associated little table, called an i-node, that contains some of the attributes of the file and a tree of addresses to all the blocks that the file uses. If the file is large, then one of the entries in the i-node table is the address of a single indirect block, which contains addresses of more blocks of pointers. If the file is even larger, then double indirect blocks point to single indirect blocks, which point to single blocks, etc.

h3. Directory Structure

The basic structure of most file directories has a root directory. This is the directory at the top of the hierarchy. All directories are contained within other directories, which are contained within this. Some file systems denote @/@ as the root directory (POSIX systems) and some use @\@ (MS-DOS and Win32). Let's say you're on a POSIX system in your home directory, i.e. @/home/dimitri/@. Every file you attempt to access will be looked for within this directory. This is called your working directory (or current directory). A file in this directory, say @notes@, would actually have a full, absolute path name of @/home/dimitri/notes@. You can access files in any directory (not just the current one) using the absolute path name for that file.

Directory nodes are typically treated like files and are stored under the same name. They have the same types of attributes, and their data are actually lists of the files that are inside that directory.

h3. Managing Disk Space

When thinking about disk space management, many of the same concepts that dealt with (volatile) memory management apply. Instead of pages, file systems use blocks. These blocks are kept at a uniform size, which simplifies moving and managing. As with pages, the choice of block size requires careful consideration, as well as how to store a list of the free blocks.

Disk drives do go bad, too. With older disk drives, a software method of keeping track of blocks that have gone bad on the drive is a necessity. However, modern disk drives have become more and more reliable with their own built-in bad block management systems. Therefore, this topic won't be discussed here.

Since disk references take so much longer than memory references, keeping a memory cache of recently used file data is useful and is implemented in different OS's. Many algorithms can be used for managing this; most of them are similar to those discussed in Chapter 5 such as LRU and FIFO.

h2. 2. Orthogonality

This is the name applied to the concept that everything is a file, not just physical files residing on a disk. This includes file representations of:

  • devices (from UARTs to network interfaces)

  • networks, protocols, services

  • control and debugging of processes, system management, service discovery and naming (namer)

  • graphics/video

  • permanent storage structures (on disk, flash, mem, network)

This was actually a fairly wild concept when it was introduced, but now it is standard in many file systems (e.g. POSIX systems.) However, most, like UNIX and Linux, only implement this to a degree. For instance, their @dev@ directory may show devices as files, but those "files" are just gateways to the actual block or character devices and can't be traversed with normal filesystem calls, such as @cd@, @ls@, or manipulations, such as those made with @cp@ (copy) or @cat@ (concatenate).

FMI/OS takes this orthogonality concept to the true definition. Below, in section 4, there is an example of one of its servers, @namer@, being accessed through the file system that illustrates the power of this concept.

h2. 3. Monolithic OSes

h3. MS-DOS

On disk, DOS uses linked list table file allocation and avoids large tables by using large (32k) blocks. It also dynamically loads drivers into the kernel when executed.

h3. UNIX

==== File System Structure: ====

Files in UNIX are stored using i-nodes. Directory files are just specially identified files that contain lists of all the i-node#/filename pairs of the files in that directory.

==== I/O: ====

Since drivers are compiled into the kernel, the entirety of the kernel must be recompiled every time a device is added. Linux provides loadable modules that can be loaded at runtime to prevent this, but they are still used by the kernel, directly; this does not help the vulnerability factor of the kernel to the drivers.

These resources are accessible through the file system via a visible path name. For example, access to the primary hard drive is through the file @/dev/hda@. This is the "name" given to the driver that controls the device. User-level software (Level 6) may speak to the hardware device through this file name. Although this concept simplifies communication with drivers, the communication among these drivers, and drivers to devices, take place in the kernel in the form of syscalls. UNIX implements 300 syscalls and counting, actually.

h2. 4. Client-Server

h3. Plan9

In Plan9, device drivers are servers. All the kernel does is direct the messages from the clients to the appropriate servers (via the IPC). Since all the devices just look like servers to the programs then there is a true uniform interface to all resources.

Disks, graphics, security systems, etc., are all servers. A convenient implication of this is network transparency. This means that the introduction of a network is transparent to the system, i.e. remote computers or services appear local to users. This is because the IPC can send messages to processes across a network as easily as it can to local processes. So if a client wanted to access a disk drive that was actually on another computer, it would send a message to its server. The IPC then takes this message and sends it over the network to the appropriate server on the appropriate computer. This is possible because the kernel has its own NIC (Network Information Center: DNS server, etc) and TCP implementations.

Using different types of mappings and bindings each program assembles and computes its own namespace (environment in which it runs). Here are some common syscalls that Plan9 provides to facilitate this:

  • @bind()@ makes one file, or directory, appear as a different name. This is similar to a link usually created by the POSIX @ln@ command, but this can do unions. For instance, @bind("/user/dimitri/lib", "/lib", MBEFORE)@ will enable all searches into @/lib@ to, not only look through the original @/lib@ but @/user/dimitri/lib@ as well. The @MBEFORE@ flag tells the program to search through the latter first.

  • @unmount()@ reverses a @bind()@.

  • @mount()@ is like @bind()@ but actually makes an entire file system appear under a different directory name.

h3. FMI/OS

FMI/OS uses all of the Plan9 file system techniques mentioned above with only a few changes.

One of the key differences is the implementation of Asynchronous Buffer Caching, or "ABC":http://amatus.g-cipher.net/cgi-bin/archzoom.cgi/fmios-pqm@ocgnet.org--fmios/library--libc--1.0--patch-13/fmi/abc.c. This useful tool makes synchronous I/O reads act like asynchronous reads: instead of blocking for sending a buffer at a time, the ABC blocks the thread while it reads the whole block. The ABC also reads a certain number of blocks ahead in anticipation of another read request. It stores these blocks in a linked list whose nodes are pointed to by the nodes of a hash table.

When looking for a block, if it exists in the table it moves that node to the top. If it doesn't exist it creates a new entry (i.e. read the block from the disk and store in the buffer) and places the node at the top of the table. This method provides the ABC an inherent implementation of the LRU (Least Recently Used) paging algorithm.

A prime example of our use of orthogonality can be seen using the @namer@ service (Chapter 4). Suppose there was a server, @foo@, that wanted to be registered as a server, in order to open itself up for receiving messages. @Foo@ would first have to ask the kernel for a port number, let's say it gets the number 1234. Then, as do most programs, @foo@ would send a message to the namer requesting to be "registered" with the port number 1234. The namer will then mark @foo@ down in the lookup table with 1234 as its port number.

Now, there is an alternate way to "register" @foo@ with the namer. Since @namer@ is a service, it has its own directory, @/namer@. So, if you were to create the file in that directory @/namer/foo@, and put 1234 in it, the you would have effectively "registered" @foo@ with port number 1234.


(c)2006 Dimitri Hammond

h4. Comment by dam on Sun Dec 3 14:20:15 2006

Should I include more about FMI/OS's mounting, mount tables, etc?

AddComment

Also available in: HTML TXT