Cache Coherence: The Cache Coherence Directory
-Aviral Mittal (avimit att yhaoo dat cam)
Connect @

Cache Coherency: Cache Coherence Directory
It is assumed that the reader is well versed in Cache Memory Organization before they may read this techrature. You can click here for techrature on Cache Memory Organization.

What is a Cache Coherence Directory?
Hardware based Cache Coherence uses mainly 2 techniques. One of them is using Cache Directory, the other one is called Bus Snooping. This techrature is all about the former i.e. using Cache Directory.
The Cache directory is like a lookup table. A table which stores information about what cache lines are present in the whole of the system, and where those cache lines actually are. For example let us take a system which as 3 CPUs, and each of the CPU has its own cache. Now a cache line can be either totally absent from all the 3 caches, or may be present in 1 or more caches at the same time. The cache directory will have this information in a lookup table. So in short a Directory is like a memory which stores information on which cache lines are stored in which caches.

Example 1: This example is to work out the structure of Cache Directory.

Main Memory Size
256 Bytes
Addr Width = 8 bits
Cache Line Size
1 Byte

Cache Memory Size
8 Bytes
Addr Width = 3 bits
Number of Processors in System
(This is also number of caches
in the system)

Now Since the main memory size is 256 bytes, and cache line size (for the sake of argument) is just 1 byte, there are total 256 Uniqe cache lines in this system.
Each cache memory size is 8 byte, so each cache memory can hold 8 cache lines. Let us also assme that cache is direct-mapped, i.e. 1-Way cache memory, so that a line may only be stored at only 1 given fixed location in the cache memory.
Now there are 3 cache memories in this system, that means this system can hold maximum 24 unique cache lines at any given time. This will happen when all the 3 caches have different cache lines stored in them.

Purpose of the Cache Directory:
One of the Primary purpose of the Cache directory is to keep track of which cache lines out of all 256 unique possible cache lines are present in any of the caches at any given time, and what is the state of a cache line as observed by the Directory. This in turns helps the system to maintain hardware based cache-coherency.
But How? Keep on reading

Step 1. Determination of the existence of the cache line in the system (i.e. if a cache line is presnt in 1 or more of the 3 caches)
The Cache directory must determine after looking into 8 bit address, if the cache line identified by this address is at all present in any of the 3 caches?
This can easily be done using a Separate Memory for the directory. We will call this the Direcory Tag-Memory or Directory Tag-Ram. This will work in the same way as any classical 'Tag-Memory or Tag-Ram'. Since in this example the directory needs to keep track of 24 cache lines, this Tag memory for the directory will have max 32 entries (As nearest power of 2 to 24 is 32). This in turn means 5 bits address for 32 entries. 5 bits will be the 'index' of lookup, and 3 bits will be the 'Tag' stored at the Index location. This Cache memory with 32 locations and 3 bit TAG, will reveal which cache line(s) out of 256 Unique cache lines are present in any of the 3 caches. For example consider a 8 bit address say "1000_1010": In this address the 5 LSBs ("0_1010") serve as 'Index or Set' which are used as an address to the 32 location memory to point exactly 1 of those 32 locations. The contents of the location will be 3 bits 'Tag' and if the tag stored at the location pointed to by the 'Index' matches the 3 MSBs of the address, "(100") then its a match, and the line is deemed to be in the directory and hence in the system in 1 or more of the 3 caches. Its nothing special here, its just the classic way of looking up 'Tags' in the 'Tag' memory in any system with cache(s).
Note that the Size of Tag/Index of the directory has Nothing to do with the size of Tag/Index of the cache memories in the system. The cache memories in the system will usually have different size of Tag/Index to the size of Tag/Index used in the directory lookup table or directory Tag Memory.
Yes, to start with the Directory Tag-Ram will have nothing in it. As the processor(s) start operating, the Directory Tag-Ram will start getting populated. Also in addition to store the 3 bits tag one more bit 'valid/invalid' is also stored for every Tag to indicate if the Tag is Valid or Invalid. At the start up the tag-memory must be initialized and all the entries must  have their valid bit set to 0 to indicate all tags are invalid.
So far this is how the entry in Directory Tag Ram will look like

Valid/Invalid (1 bit)
Tag (3 bits)
4 bits of information for each of the 32 enteris in Directory Tag-Ram.

Step 2. Expanding the information in each of the 32 entreis.
If a cache line is present in any or more of the 3 caches, then the directory must exactly know which cache or caches have it. This can be done using 3 more bits in the entry of directory Tag-Ram as we have 3 cache memories in the system. Lets name them Cache2, Cache1 and Cache0. This is called 'Membership Vector'. Its pretty simple 3 bits correspond to 3 cache memories, and if the line is present in a cache memory, its corresponding bit is high. For example if the cache line identified by the address "1000_1010" is present in Cache0 only the membership vector for this line will be "001". And if this cache line is present in say cache2 and cache1, the membership vector will be "110" and so on and so forth.

Now the entry in the Directory has 3 more bits and it looks like this:

Valid/Invalid (1 bit)
Tag (3 Bits)
Membership Vector (3 Bits).

7 bits of information for each of the 32 entries in Directory Tag-Ram. This 7 Bits of infromation collectively is also called pay-load. So pay-load for Directory Tag-Ram is 7 bits wide.
Note if the Valid/Invalid Bit is 0, then Tag and Membership Vector does not matter, can be just xxx.

The pay-load also needs to be extended further to include information on Cache Line State, and it may also incldue which cache is the 'owner' of the cache line.
The Cache Line can be in any of the MOESI state. One single cache line can be in different states in diffrent caches.
However the directory may not individually keep track of which cache line is in which state in each cache.
Ownersihp of the line must be identified, 2 bits are requierd to identify the owner. Hence 2 more bits for the payload which means the following:

Ownership Code
No Owner
Cache 0 is the owner
Cache 1 is the owner
Cache 2 is the owner

If a cache line is in M/E/O state in any cache, then ONLY this caache has it in M/E/O state, as these cache states are exclusive.
If a cache line is in S state in one cache, it can be in O state in 1 other cache.

The directory needs to know the 'state' of line as a global 'state' view.
So if any cache line is present in any of M/E/O state in any cache, the Directory needs to know the owner, and owner field will be non-zero. A two bit coding will help to identify the 'state' of line from Directory's point of view. In this techrature its called 'StateCode'

State Code
Shared  State (S), Ownership Code must also be "00", Membership Vecotr cannot be 0, also called Shared Clean (SC) State
Modified (M) State, Ownership Code cannot be "00", Membership Vecotr cannot be 0, also called Unique Dirty (UD) State
Exclusive (E) State, Ownership Code cannot be "00", Membersihp Vector cannot be 0, Also Called Unique Clean (UC) State
Owneed (O) State, Ownership Code cannot be "00", Membrsihp Vecotor cannot be 0, also called Shared Dirty (SD) state.

For Invalid (I) state, membership vector is all 0s and Ownership Code is all 0s. State Code does not matter, but can be kept "00" as well.

Note: While  the size in bits for 'Ownership Code' and membership Vector depends upon Number of Caches, the size of State Code will always be 2 bits.

The Tag Payload now looks like this:
Valid/Invalid (1 bit) Tag (3 Bits) Membership Vector (3 Bits). OwnershipCode (2 Bits)
State Code (2 Bits)

Total Number of Payload Bits = 1 + 3 + 3 + 2 +2 = 10 bits in this example

This is how the directory keeps track of Cache lines and the state of cache line in a Cache-coherent System.
Assumptions: There is Only 1 level of Cache in the system, Like L1 (no L2 or L3 etc). And there is only 1 Cache Domain in the System.

<= Cache Organization                             Next => xxxx

Cache Organizations:
4-Way-Set-Associative Cache
2-Way-Set-Associative Cache

Cache Coherence Directory.
Directory Based Cache Coherence.
Cache Coherence Directory Fundamentals
Structure of Cache Coherence Directory.