Cache is used to keep the most commonly used sections of RAM in thecache, where it can be accessed quickly. This is necessary becauseCPU speeds increase much faster than speed of memory access. If wecould access RAM at 3 GHz, there wouldn't be any need for cache,because RAM could keep up. Because it can't keep up, we use cache.
What if we wanted more RAM than we had available. For example, wemight have 1 M of RAM, what if we wanted 10 M? How could we manage?
One way to extend the amount of memory accessible by a programis to use disk. Thus, we can use 10 Megs of disk space. At anytime, only 1 Meg resides in RAM.
In effect, RAM acts like cache for disk.
This idea of extending memory is called virtual memory.It's called "virtual" only because it's not RAM. It doesn'tmean it's fake.
The real problem with disk is that it's really, really slowto access. If registers can be accessed in 1 nanosecond,and cache in 5 ns and RAM in about 100 ns, then disk is accessedin fractions of seconds. It can be a million times slower toaccess disk than a register.
The advantage of disk is it's easy to get lots of disk spacefor a small cost.
Still, becaues disk is so slow to access, we want to avoidaccessing disk unnecessarily.
Initially, virtual memory meant the idea of using diskto extend RAM. Programs wouldn't have to care whether thememory was "real" memory (i.e., RAM) or disk. The operating systemand hardware would figure that out.
Later on, virtual memory was used as a means of memoryprotection. Every program uses a range of addressed called theaddress space.
The assumption of operating systems developers is that anyuser program can not be trusted. User programs will try to destroythemselves, other user programs, and the operating system itself.That seems like such a negative view, however, it's how operating systemsare designed. It's not necessary that programs have to be deliberatelymalicious. Programs can be accidentally malicious (modify the dataof a pointer pointing to garbage memory).
Virtual memory can help there too. It can help preventprograms from interfering with other programs.
Occasionally, you want programs to cooperate, and sharememory. Virtual memory can also help in that respect.
Each process is allocated an address space. This is a setof valid addresses that can be used. This address space canbe changed dynamically. For example, the program might requestadditional memory (from dynamic memory allocation) from theoperating system.
If a process tries to access an address that is not partof its address space, an error occurs, and the operating systemtakes over, usually killing the process (core dumps, etc).
How does virtual memory play a role? As you run a program,it generates addresses. Addresses are generated (for RISC machines)in one of three ways:
A load instruction A store instruction Fetching an instruction Load/store create data addresses, while fetching an instructioncreates instruction addresses. Of course, RAM doesn't distinguishbetween the two kinds of addresses. It just sees it as an address.Each address generated by a program is considered virtual.It must be translated to a real physical address. Thus,address tranlation is occuring all the time. As you might imagine,this must be handled in hardware, if it's to be done efficiently.
You might think translating each address from virtual to physicalis a crazy idea, because of how slow it is. However, you getmemory protection from address translation, so it's worth thehardware needed to get memory protection.
There is a corresponding terminology in virtual memory to acache line. It's called a page.
A page is a sequence of N bytes where N is a powerof 2.These days, page sizes are at least 4K in size and maybe aslarge as 64 K or more.
Let's assume that we have 1M of RAM. RAM is also called physicalmemory. We can subdivide the RAM into 4K pages. Thus1M / 4K = 256 pages. Thus, our RAM has 256 physical pages, weachholding 4K.
Let's assume we have 10 M of disk. Thus, we have 2560 disk pages.
In principle, each program may have up to 4 G of address space.Thus, it can, in principle, access 220 virtualpages. In reality, many of those pages are considered invalid pages.
If this looks a lot like a fully-associative cache, butwhose offset is much much larger, it's because that'sbasically what it is.
We must convert the virtual page number to a physical page number.In our example, the virtual page consists of 20 bits. A page table isa data structure which consists of 220 page tableentries (PTEs). Think of the page table as an array of page tableentries, indexed by the virtual page number.
The page table's index starts at 0, and ends at 220 -1.
Here's how it looks:
Suppose your program generates a virtual address. You'dextract bits B31-12 to get the virtual pagenumber. Use that as an index into the above page tableto access the page table entry (PTE).
Each PTE consists of a valid bit and a 20 bit physical page (it's20 bits, because we assume we have 1M of RAM, and 1M of RAM requires20 bits to access each byte). If the valid bit is 1, then the virtualpage is in RAM, and you can get the physical page from the PTE. Thisis called a page hit, and is basically the same as a cache hit.
If the valid bit is 0, the page is not in RAM, and the 20 bitphysical page is meaningless. This means, we must get the disk pagecorresponding to the virtual page from disk and place it into a pagein RAM. This is called a page fault.
Because disk access is slow, slow, slow, we want to minimizethe number of page faults.
In general, this is done by making RAM fully associative. Thatis, any disk page can go into any RAM page (disk, RAM, and virtualpages all have the same size).
In practice, some pages in RAM are reserved for the operatingsystem to make the OS run efficiently.
Then, you'd see if the virtual page had a corresponding physicalpage in RAM using the page table. If the valid bit of the PTE is 1,then you'd translate the virtual page to a physical page, and appendthe page offset. That would give you a physical address in RAM.
What's worse, the page tables we've been talking about areincomplete. If we have a page fault, we need to find the page ondisk. Where is it located? That information is kept in another pagetable, which is indexed by the virtual page (same as the page table wetalked about), and tells you where on disk to find it. Then, we haveto copy that page to RAM, and update the first page table. Thus,we need two page table tables!
These page tables are basically just data. Thus, they occupymemory as any data occupies memory. When we switch from one processto another, we need to load its page table in RAM for easy access.It's useful to keep it located in certain parts of RAM for just such apurpose. If RAM is suitable large, we can have several processes'page tables in RAM at the same time.
A page table register can hold the physical address of thepage table that's currently active to get quick access. Still, theseare large, and we may want to find ways to speed things up.
Another idea is to use a kind of closed hash table. The hash table'ssize is based on the number of physical pages. The number of physicalpages is usually a lot smaller than the number of all virtual pagesput together.
A hash function takes a virtual page number as input, and producesan index into the hash table as the result. Each entry of thehash table consists of a virtual page number and a physical page number.You check to see if the virtual page number matched, and if so, thenyou use the physical page.
If it missed, then you must resolve the collision based on the hashtable. In practice, you may need the number of entries of the hashtable to be a few times larger than the number of physical pages,to avoid excessive collisions.
An inverted page table takes longer to access because you may havecollisions, but it takes up a lot less memory. It helps with pagehits. However, if you have a page fault, you still need a page tablethat maps virtual pages to disk pages, and that will be large.
The idea of a TLB is to create a special cache for translations.Here's one example of a TLB.
Each row in the TLB is like one slot of a cache. Assumewe have 64 rows. When you have a virtual address, you cansplit it into a virtual page and an offset.
In parallel, compare the virtual page to all of the entries of theTLB (say, 64). There should be, at most, one match. Just like afully associative cache, you want to check if the TLB entry is valid.
If a TLB hit occurs, replace the virtual page with a physical pageto create a physical address.
If there's a TLB miss, then it's still possible that the virtualpage resides in RAM. You must now look up the PTE (page table entry)to see if this is the case. If the PTE says the virtual page is inRAM, then you can update the TLB, so that it has a correct virtualto physical page translation.
The TLB is designed to only store a limited subset of virtualto physical page translation. It is really just a cache for thepage table, storing only the most frequently used translations.
The TLB can be kept small enough that it can be fully associative.However, some CPU designers make larger TLBs that are direct mappedor set associative.
They insist on using those numbers. Another customer comes in,and insists on using those numbers too. How can you accomodateboth customers?
Basically, you cheat. You tell the first customer you have boxes100, 101, and 102, but you assign him boxes 200, 201, and 202.Similarly, you tell the second customer that you also have boxes100, 101, and 102, but you assign her boxes 320, 321, and 322.
Whenever customer 1 wants the mail in box 100, you translateit to box 200. Whenever customer 2 wants mail in box 100, youtranslate it to box 320. Thus, your two customers get to use thebox numbers they want, and through the magic of translation, theytwo customers avoid using each other's boxes.
If the post office wanted to reserve its own boxes for itsown use, it could reserve boxes 1 through 100 to itself, and neverassign those boxes, directly or indirectly to a customer. Evenif a customer wants box 50, they can be assigned box 150, safelyoutside the range of boxes reserved for the post office.
This same analogy applies to real programs. Each program canassume it uses the same set of 32 bit virtual addresses. We justmake sure that those virtual pages do not map to the same disk page,nor to the same physical page.
Who's job is it to assign the pages? It's the operating system.When a program starts up, it will want a certain range of addresses.The operating system creates a page table for the program, making surethe disk pages it uses do not conflict with the disk pages ofother programs.
However, because disk access is slower, it makes sense to usedirty bits for pages. Thus, if a page hasn't been modified(maybe because it's read only), there's no reason to copy it backto disk. Just toss it out.
This is good because at one point, programmers had to worryvery much about whether a chunk of memory resided on the diskor in RAM. Programmers had to spend a great deal of effortmanaging this, and it distracted them from coding. With virtualmemory, the management of disk as an extension of RAM is handledautomatically.
There may be issues of synchronization to handle, but that'sa topic that's best left to a course in operating systems.Suffice it to say that we do have a way to map virtualpages to the same disk page to allow for sharing.
Sharing is available when you want two processes to collaboratein a somewhat safe manner. Both processes, for the most part, havetheir own memory. They only share a small region between thetwo of them.
The two are somewhat orthogonal (independent) of each other. Forexample, we could use the disk as an extension of the address spaceaccessible to the programmer, without providing memory protection.Thus, a programmer might be able to access all disk pages, if theoperating system allowed it when setting up page tables.
It's also possible to have memory protection without any disks.Provided each process only needs a small number of pages, we couldallow all of those virtual pages to reside in RAM. For example,suppose each process has only 2 virtual pages. Then, since (inour example) RAM stores 256 pages, we could allow 128 processesto have virtual page 0 and 1, and they would not interferewith each other in virtual memory.