Integrated Device Technology, Inc. reserves the right to make changes to its products or specifications at any time, without notice, in order to improve design or performance and to supply the best possible product. IDT does not assume any responsibility for use of any circuitry described other than the circuitry embodied in an IDT product. IDT makes no representations that circuitry described herein is free from patent infringement or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent, patent rights, or other rights of Integrated Device Technology, Inc.

LIFE SUPPORT POLICY
Integrated Device Technology’s products are not authorized for use as critical components in life support devices or systems unless a specific written agreement pertaining to such intended use is executed between the manufacturer and an officer of IDT.

1. Life support devices or systems are devices or systems that (a) are intended for surgical implant into the body, or (b) support or sustain life, and whose failure to perform, when properly used in accordance with instructions for use provided in the labeling, can be reasonably expected to result in a significant injury to the user.

2. A critical component is any component of a life support device or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system, or to affect its safety or effectiveness.

The IDT logo is a registered trademark and BiCameral, BurstRAM, BUSMUX, CacheRAM, DECnet, Double-Density, FASTX, Four-Port, FLEXI-CACHE, Flexi-PAK, Flow-thruEDC, IDT/c, IDTenvY, IDT/sae, IDT/sim, IDT/ux, MacStation, MICROSLICE, NICSTAR, Orion, PaletteDAC, REAL8, R3041, R3051, R3052, R3081, R3721, R4600, RISCompiler, RISController, RISCard, RISC/30, RISC Subsystem, RISC Windows, SARAM, SmartLogic, SolutionPak, SyncFIFO, SyncBiFIFO, SPC, and TargetSystem are trademarks of Integrated Device Technology, Inc.

MIPS is a registered trademark of MIPS Computer Systems, Inc.

All others are trademarks of their respective companies.
# Table of Contents

## Overview

**Chapter 1**

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Introduction</td>
<td>1-1</td>
</tr>
<tr>
<td>Features</td>
<td>1-3</td>
</tr>
<tr>
<td>Device Overview</td>
<td>1-4</td>
</tr>
<tr>
<td>Pipeline Overview</td>
<td>1-4</td>
</tr>
<tr>
<td>CPU Register Overview</td>
<td>1-5</td>
</tr>
<tr>
<td>CPU Instruction Set Overview</td>
<td>1-6</td>
</tr>
<tr>
<td>Data Formats and Addressing</td>
<td>1-13</td>
</tr>
<tr>
<td>Coprocessors (CP0-CP2)</td>
<td>1-15</td>
</tr>
<tr>
<td>System Control Coprocessor, CP0</td>
<td>1-15</td>
</tr>
<tr>
<td>Floating-Point Coprocessor</td>
<td>1-18</td>
</tr>
<tr>
<td>Floating-Point Units</td>
<td>1-18</td>
</tr>
<tr>
<td>Virtual to Physical Address Mapping</td>
<td>1-19</td>
</tr>
<tr>
<td>Joint TLB</td>
<td>1-19</td>
</tr>
<tr>
<td>Instruction TLB</td>
<td>1-20</td>
</tr>
<tr>
<td>Data TLB</td>
<td>1-20</td>
</tr>
<tr>
<td>Cache Memory</td>
<td>1-20</td>
</tr>
<tr>
<td>Instruction Cache</td>
<td>1-20</td>
</tr>
<tr>
<td>Data Cache</td>
<td>1-20</td>
</tr>
<tr>
<td>Write Buffer</td>
<td>1-21</td>
</tr>
<tr>
<td>R4600/R4700 Clocks</td>
<td>1-21</td>
</tr>
<tr>
<td>System Interface</td>
<td>1-22</td>
</tr>
<tr>
<td>Comparison of R4600/R4700 and R4400</td>
<td>1-23</td>
</tr>
</tbody>
</table>

## CPU Instruction Set Summary

**Chapter 2**

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Introduction</td>
<td>2-1</td>
</tr>
<tr>
<td>CPU Instruction Formats</td>
<td>2-1</td>
</tr>
<tr>
<td>Load and Store Instructions</td>
<td>2-2</td>
</tr>
<tr>
<td>Scheduling a Load Delay Slot</td>
<td>2-2</td>
</tr>
<tr>
<td>Defining Access Types</td>
<td>2-2</td>
</tr>
<tr>
<td>Computational Instructions</td>
<td>2-4</td>
</tr>
<tr>
<td>64-bit Virtual Address Operations with 32-bit Operands</td>
<td>2-4</td>
</tr>
<tr>
<td>Cycle Timing for Multiply and Divide Instructions</td>
<td>2-4</td>
</tr>
<tr>
<td>Jump and Branch Instructions</td>
<td>2-5</td>
</tr>
<tr>
<td>Overview of Jump Instructions</td>
<td>2-5</td>
</tr>
<tr>
<td>Overview of Branch Instructions</td>
<td>2-5</td>
</tr>
<tr>
<td>Special Instructions</td>
<td>2-5</td>
</tr>
<tr>
<td>Exception Instructions</td>
<td>2-5</td>
</tr>
<tr>
<td>Coprocessor Instructions</td>
<td>2-5</td>
</tr>
</tbody>
</table>

## The CPU Pipeline

**Chapter 3**

- **Introduction** 3-1
- **CPU Pipeline Operation** 3-1
- **CPU Pipeline Stages** 3-2
  - 1I - Instruction Fetch, Phase one 3-2
  - 2I - Instruction Fetch, Phase two 3-2
  - 1R - Register Fetch, Phase one 3-2
  - 2R - Register Fetch, Phase two 3-2
  - 1A - Execution, Phase one 3-2
  - 2A - Execution, Phase two 3-2
  - 1D - Data Fetch, Phase one 3-2
  - 2D - Data Fetch, Phase two 3-3
  - 1W - Write Back, Phase one 3-3
  - 2W - Write Back, Phase two 3-3
- **Branch Delay** 3-4
- **Load Delay** 3-4
- **Interlock and Exception Handling** 3-5
  - **Exception Conditions** 3-6
  - **Stall Conditions** 3-7
  - **Slip Conditions** 3-8
- **R4600/R4700 Write Buffer** 3-9

## Memory Management

**Chapter 4**

- **Translation Lookaside Buffer (TLB)** 4-1
  - **Hits and Misses** 4-1
  - **Multiple Matches** 4-1
- **Address Spaces** 4-1
  - **Virtual Address Space** 4-1
  - **Physical Address Space** 4-2
  - **Virtual-to-Physical Address Translation** 4-2
  - **32-bit Virtual Address Translation** 4-3
  - **64-bit Virtual Address Translation** 4-3
- **Operating Modes** 4-4
  - **User Mode Operations** 4-4
  - **32-bit User Mode (useg)** 4-5
  - **64-bit User Mode (xuseg)** 4-6
  - **Supervisor Mode Operations** 4-6
  - **32-bit Supervisor Mode, User Space (suseg)** 4-7
  - **32-bit Supervisor Mode, Supervisor Space (sseg)** 4-7
  - **64-bit Supervisor Mode, User Space (xsuseg)** 4-7
  - **64-bit Supervisor Mode, Current Supervisor Space (xsseg)** 4-7
  - **64-bit Supervisor Mode, Separate Supervisor Space (csseg)** 4-8
  - **Kernel Mode Operations** 4-8
  - **32-bit Kernel Mode, User Space (kuseg)** 4-10
  - **32-bit Kernel Mode, Kernel Space 0 (kseg0)** 4-10
  - **32-bit Kernel Mode, Kernel Space 1 (kseg1)** 4-10
  - **32-bit Kernel Mode, Supervisor Space (ksseg)** 4-10
  - **32-bit Kernel Mode, Kernel Space 3 (kseg3)** 4-11
  - **64-bit Kernel Mode, User Space (xkuseg)** 4-11
  - **64-bit Kernel Mode, Current Supervisor Space (xksseg)** 4-11
  - **64-bit Kernel Mode, Physical Spaces (xkphys)** 4-12
  - **64-bit Kernel Mode, Kernel Space (xkseg)** 4-12
  - **64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3)** 4-12

<table>
<thead>
<tr>
<th>Topic</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>System Control Coprocessor</td>
<td>4-12</td>
</tr>
<tr>
<td>Format of a TLB Entry</td>
<td>4-13</td>
</tr>
<tr>
<td>CP0 Registers</td>
<td>4-15</td>
</tr>
<tr>
<td>Index Register (0)</td>
<td>4-16</td>
</tr>
<tr>
<td>Random Register (1)</td>
<td>4-16</td>
</tr>
<tr>
<td>EntryLo0 (2) and EntryLo1 (3) Registers</td>
<td>4-17</td>
</tr>
<tr>
<td>PageMask Register (5)</td>
<td>4-17</td>
</tr>
<tr>
<td>Wired Register (6)</td>
<td>4-18</td>
</tr>
<tr>
<td>EntryHi Register (10)</td>
<td>4-18</td>
</tr>
<tr>
<td>Processor Revision Identifier (PRId) Register (15)</td>
<td>4-19</td>
</tr>
<tr>
<td>Config Register (16)</td>
<td>4-19</td>
</tr>
<tr>
<td>Load Linked Address (LLAddr) Register (17)</td>
<td>4-20</td>
</tr>
<tr>
<td>Cache Tag Registers [TagLo (28) and TagHi (29)]</td>
<td>4-21</td>
</tr>
<tr>
<td>Virtual-to-Physical Address Translation Process</td>
<td>4-22</td>
</tr>
<tr>
<td>TLB Misses</td>
<td>4-23</td>
</tr>
<tr>
<td>TLB Instructions</td>
<td>4-23</td>
</tr>
</tbody>
</table>

## CPU Exception Processing

**Chapter 5**

<table>
<thead>
<tr>
<th>Topic</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>How Exception Processing Works</td>
<td>5-1</td>
</tr>
<tr>
<td>Exception Processing Registers</td>
<td>5-1</td>
</tr>
<tr>
<td>Context Register (4)</td>
<td>5-2</td>
</tr>
<tr>
<td>Bad Virtual Address Register (BadVAddr) (8)</td>
<td>5-3</td>
</tr>
<tr>
<td>Count Register (9)</td>
<td>5-3</td>
</tr>
<tr>
<td>Compare Register (11)</td>
<td>5-3</td>
</tr>
<tr>
<td>Status Register (12)</td>
<td>5-4</td>
</tr>
<tr>
<td>Status Register Format</td>
<td>5-4</td>
</tr>
<tr>
<td>Status Register Modes and Access States</td>
<td>5-6</td>
</tr>
<tr>
<td>Status Register Reset</td>
<td>5-6</td>
</tr>
<tr>
<td>Cause Register (13)</td>
<td>5-7</td>
</tr>
<tr>
<td>Exception Program Counter (EPC) Register (14)</td>
<td>5-8</td>
</tr>
<tr>
<td>XContext Register (20)</td>
<td>5-9</td>
</tr>
<tr>
<td>Error Checking and Correcting (ECC) Register (26)</td>
<td>5-9</td>
</tr>
<tr>
<td>Cache Error (CacheErr) Register (27)</td>
<td>5-10</td>
</tr>
<tr>
<td>Error Exception Program Counter (Error EPC) Register (30)</td>
<td>5-11</td>
</tr>
<tr>
<td>Processor Exceptions</td>
<td>5-12</td>
</tr>
<tr>
<td>Exception Types</td>
<td>5-12</td>
</tr>
<tr>
<td>Reset Exception Process</td>
<td>5-12</td>
</tr>
<tr>
<td>Cache Error Exception Process</td>
<td>5-13</td>
</tr>
<tr>
<td>Soft Reset and NMI Exception Process</td>
<td>5-13</td>
</tr>
<tr>
<td>General Exception Process</td>
<td>5-13</td>
</tr>
<tr>
<td>Exception Vector Locations</td>
<td>5-13</td>
</tr>
<tr>
<td>Priority of Exceptions</td>
<td>5-14</td>
</tr>
<tr>
<td>Reset Exception</td>
<td>5-15</td>
</tr>
<tr>
<td>Soft Reset Exception</td>
<td>5-16</td>
</tr>
<tr>
<td>Nonmaskable Interrupt (NMI) Exception</td>
<td>5-17</td>
</tr>
<tr>
<td>Address Error Exception</td>
<td>5-18</td>
</tr>
<tr>
<td>TLB Exceptions</td>
<td>5-19</td>
</tr>
<tr>
<td>TLB Refill Exception</td>
<td>5-19</td>
</tr>
<tr>
<td>TLB Invalid Exception</td>
<td>5-20</td>
</tr>
<tr>
<td>TLB Modified Exception</td>
<td>5-21</td>
</tr>
<tr>
<td>Cache Error Exception</td>
<td>5-22</td>
</tr>
<tr>
<td>Bus Error Exception</td>
<td>5-23</td>
</tr>
<tr>
<td>Integer Overflow Exception</td>
<td>5-24</td>
</tr>
<tr>
<td>Trap Exception</td>
<td>5-25</td>
</tr>
<tr>
<td>System Call Exception</td>
<td>5-26</td>
</tr>
<tr>
<td>Breakpoint Exception</td>
<td>5-27</td>
</tr>
<tr>
<td>Reserved Instruction Exception</td>
<td>5-28</td>
</tr>
<tr>
<td>Coprocessor Unusable Exception</td>
<td>5-29</td>
</tr>
<tr>
<td>Floating-Point Exception</td>
<td>5-30</td>
</tr>
<tr>
<td>Interrupt Exception</td>
<td>5-31</td>
</tr>
<tr>
<td>Exception Handling and Servicing Flowcharts</td>
<td>5-32</td>
</tr>
</tbody>
</table>

## Floating-Point Unit

**Chapter 6**

Overview  
- The R4600/R4700 Floating-Point Coprocessor  6-1
- FPU Features  6-2
- FPU Programming Model  6-2
- Floating-Point General Registers (FGRs)  6-2
- Floating-Point Registers  6-3
- Floating-Point Control Registers  6-3
  - Implementation and Revision Register (FCR0)  6-4
  - Control/Status Register (FCR31)  6-4
  - Accessing the Control/Status Register  6-6
  - IEEE Standard 754  6-6
  - Control/Status Register FS Bit  6-6
  - Control/Status Register Condition Bit  6-6
  - Control/Status Register Cause, Flag, and Enable Fields  6-6
  - Cause Bits  6-6
  - Enable Bits  6-6
  - Flag Bits  6-7
  - Control/Status Register Rounding Mode Control Bits  6-7
- Floating-Point Formats  6-7
- Binary Fixed-Point Format  6-9
- Floating-Point Instruction Set Overview  6-10
  - Floating-Point Load, Store, and Move Instructions  6-11
  - Transfers Between FPU and Memory  6-11
  - Transfers Between FPU and CPU  6-11
  - Load Delay and Hardware Interlocks  6-12
  - Data Alignment  6-12
  - Endianness  6-12
  - Floating-Point Conversion Instructions  6-12
  - Floating-Point Computational Instructions  6-12
  - Branch on FPU Condition Instructions  6-12
  - Floating-Point Compare Operations  6-12
- FPU Instruction Pipeline Overview  6-13
  - Instruction Execution  6-13
  - Instruction Execution Cycle Time  6-14
  - Instruction Scheduling Constraints  6-15
  - FPU Multiplier Constraints  6-15
  - FPU Adder Constraints  6-15
  - Resource Scheduling Rules  6-15

## Floating-Point Exceptions

**Chapter 7**

- Exception Types 7-1
- Exception Trap Processing 7-2
- Flags 7-2
- FPU Exceptions 7-3
- Inexact Exception (I) 7-3
- Invalid Operation Exception (V) 7-3
- Division-by-Zero Exception (Z) 7-4
- Overflow Exception (O) 7-4
- Underflow Exception (U) 7-4
- Unimplemented Instruction Exception (E) 7-5
- Saving and Restoring State 7-5
- Trap Handlers for IEEE Standard 754 Exceptions 7-6

## Processor Signal Descriptions

**Chapter 8**

- Introduction 8-1
- System Interface Signals 8-2
- Clock/Control Interface Signals 8-3
- Interrupt Interface Signals 8-4
- JTAG Interface Signals 8-4
- Initialization Interface Signals 8-5

## Initialization Interface

**Chapter 9**

- Introduction 9-1
- Functional Overview 9-1
- Reset and Initialization Signal Descriptions 9-1
- Power-on Reset 9-3
  - Cold Reset 9-3
  - Warm Reset 9-3
- Initialization Sequence 9-4
- Boot-Mode Settings 9-6

## Clock Interface

**Chapter 10**

- Introduction 10-1
- Signal Terminology 10-1
- Basic System Clocks 10-1
  - MasterClock 10-1
  - MasterOut 10-2
  - SyncIn/SyncOut 10-2
  - PClock 10-2
  - SClock 10-2
  - TClock 10-2
  - RClock 10-2
- System Timing Parameters 10-3
  - Alignment to SClock 10-3
  - Alignment to MasterClock 10-3
  - Phase-Locked Loop (PLL) 10-3
- PLL Components and Operation 10-4
  - Passive Components 10-4
- Connecting Clocks to a Phase-Locked System 10-5
- Connecting Clocks to a System without Phase Locking 10-6
  - Connecting to a Gate-Array Device 10-6
  - Connecting to a CMOS Logic System 10-8

## Cache Organization, Operation and Coherency

**Chapter 11**

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Introduction</td>
<td>11-1</td>
</tr>
<tr>
<td>Memory Organization</td>
<td>11-1</td>
</tr>
<tr>
<td>Overview of Cache Operations</td>
<td>11-2</td>
</tr>
<tr>
<td>R4600/R4700 Cache Description</td>
<td>11-2</td>
</tr>
<tr>
<td>Cache Line Size</td>
<td>11-2</td>
</tr>
<tr>
<td>Cache Organization and Accessibility</td>
<td>11-2</td>
</tr>
<tr>
<td>Organization of the Primary Instruction Cache (I-Cache)</td>
<td>11-3</td>
</tr>
<tr>
<td>Organization of the Primary Data Cache (D-Cache)</td>
<td>11-3</td>
</tr>
<tr>
<td>Accessing the Primary Caches</td>
<td>11-5</td>
</tr>
<tr>
<td>Cache States</td>
<td>11-5</td>
</tr>
<tr>
<td>Primary Cache States</td>
<td>11-6</td>
</tr>
<tr>
<td>Cache Line Ownership</td>
<td>11-6</td>
</tr>
<tr>
<td>Cache Write Policy</td>
<td>11-6</td>
</tr>
<tr>
<td>Cache State Transition Diagrams</td>
<td>11-7</td>
</tr>
<tr>
<td>Cache Coherency Overview</td>
<td>11-7</td>
</tr>
<tr>
<td>Cache Coherency Attributes</td>
<td>11-7</td>
</tr>
<tr>
<td>Uncached</td>
<td>11-8</td>
</tr>
<tr>
<td>Noncoherent</td>
<td>11-8</td>
</tr>
<tr>
<td>Cache Operation Modes</td>
<td>11-8</td>
</tr>
<tr>
<td>R4600/R4700 Processor Synchronization Support</td>
<td>11-8</td>
</tr>
<tr>
<td>Test-and-Set</td>
<td>11-8</td>
</tr>
<tr>
<td>Counter</td>
<td>11-9</td>
</tr>
<tr>
<td>Load Linked and Store Conditional</td>
<td>11-10</td>
</tr>
<tr>
<td>Examples Using LL and SC</td>
<td>11-11</td>
</tr>
</tbody>
</table>

## System Interface

**Chapter 12**

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Introduction</td>
<td>12-1</td>
</tr>
<tr>
<td>Terminology</td>
<td>12-1</td>
</tr>
<tr>
<td>System Interface Description</td>
<td>12-1</td>
</tr>
<tr>
<td>Interface Buses</td>
<td>12-2</td>
</tr>
<tr>
<td>Address and Data Cycles</td>
<td>12-2</td>
</tr>
<tr>
<td>Issue Cycles</td>
<td>12-3</td>
</tr>
<tr>
<td>Handshake Signals</td>
<td>12-4</td>
</tr>
<tr>
<td>System Interface Protocols</td>
<td>12-4</td>
</tr>
<tr>
<td>Master and Slave States</td>
<td>12-5</td>
</tr>
<tr>
<td>Moving from Master to Slave State</td>
<td>12-5</td>
</tr>
<tr>
<td>External Arbitration</td>
<td>12-5</td>
</tr>
<tr>
<td>Uncompelled Change to Slave State</td>
<td>12-5</td>
</tr>
<tr>
<td>Processor and External Requests</td>
<td>12-6</td>
</tr>
<tr>
<td>Rules for Processor Requests</td>
<td>12-6</td>
</tr>
<tr>
<td>Processor Requests</td>
<td>12-7</td>
</tr>
<tr>
<td>Processor Read Request</td>
<td>12-8</td>
</tr>
<tr>
<td>Processor Write Request</td>
<td>12-8</td>
</tr>
<tr>
<td>External Requests</td>
<td>12-9</td>
</tr>
<tr>
<td>External Read Request</td>
<td>12-10</td>
</tr>
<tr>
<td>External Write Request</td>
<td>12-10</td>
</tr>
<tr>
<td>Read Response</td>
<td>12-10</td>
</tr>
</tbody>
</table>

## R4600/R4700 Processor Interrupts

**Chapter 13**

- Introduction  13-1  
- Hardware Interrupts  13-1  
- Nonmaskable Interrupt (NMI)  13-1  
- Asserting Interrupts  13-1  

## R4600/R4700 Error Checking

**Chapter 14**

- Introduction  14-1  
- Error Checking in the Processor  14-1  
  - Types of Error Checking  14-1  
  - Parity Error Detection  14-1  
  - Error Checking Operation  14-2  
  - System Interface  14-2  
  - System Interface Command Bus  14-2  
- Summary of Error Checking Operations  14-3  

## CPU Instruction Set Details

**Appendix A**

- Introduction  A-1  
- Instruction Classes  A-1  
- Instruction Formats  A-2  
- Instruction Notation Conventions  A-2  
  - Instruction Notation Examples  A-4  
- Load and Store Instructions  A-4  
- Jump and Branch Instructions  A-5  
- Coprocessor Instructions  A-6  
- System Control Coprocessor (CP0) Instructions  A-6  
- CPU Instruction Opcode Bit Encoding  A-151  

## FPU Instruction Set Details

**Appendix B**

- Introduction  B-1  
- Instruction Formats  B-1  
  - Floating-Point Loads, Stores, and Moves  B-3  
  - Floating-Point Operations  B-4  
- Instruction Notation Conventions  B-4  
  - Instruction Notation Examples  B-4  
- Load and Store Instructions  B-5  
- Computational Instructions  B-6  
- FPU Instruction Opcode Bit Encoding  B-45  

## Cache Operations Timing

**Appendix C**

- Introduction  C-1  
  - Caveats About Cache Operations  C-1  
  - Cache Operations Tables  C-1  
  - Details on the Fill_I Equation  C-3  

## Standby Mode Operation

**Appendix D**

- Entering Standby Mode  D-1  

## Coprocessor 0 Hazards

**Appendix E**

# List of Figures

<table>
<thead>
<tr>
<th>Number</th>
<th>Figure Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 1.1</td>
<td>R4600/R4700 Block Diagram</td>
<td>1-4</td>
</tr>
<tr>
<td>Figure 1.2</td>
<td>R4600/R4700 CPU Registers</td>
<td>1-5</td>
</tr>
<tr>
<td>Figure 1.3</td>
<td>CPU Instruction Formats</td>
<td>1-6</td>
</tr>
<tr>
<td>Figure 1.4</td>
<td>Big-Endian Byte Ordering</td>
<td>1-13</td>
</tr>
<tr>
<td>Figure 1.5</td>
<td>Little-Endian Byte Ordering</td>
<td>1-13</td>
</tr>
<tr>
<td>Figure 1.6</td>
<td>Little-Endian Data in a Doubleword</td>
<td>1-14</td>
</tr>
<tr>
<td>Figure 1.7</td>
<td>Big-Endian Data in a Doubleword</td>
<td>1-14</td>
</tr>
<tr>
<td>Figure 1.8</td>
<td>Big-Endian Misaligned Word Addressing</td>
<td>1-15</td>
</tr>
<tr>
<td>Figure 1.9</td>
<td>Little-Endian Misaligned Word Addressing</td>
<td>1-15</td>
</tr>
<tr>
<td>Figure 1.10</td>
<td>R4600/R4700 CP0 Registers</td>
<td>1-16</td>
</tr>
<tr>
<td>Figure 1.11</td>
<td>Typical System Block Diagram</td>
<td>1-22</td>
</tr>
<tr>
<td>Figure 2.1</td>
<td>CPU Instruction Formats</td>
<td>2-1</td>
</tr>
<tr>
<td>Figure 3.1</td>
<td>Instruction Pipeline Stages</td>
<td>3-1</td>
</tr>
<tr>
<td>Figure 3.2</td>
<td>CPU Pipeline Activities</td>
<td>3-3</td>
</tr>
<tr>
<td>Figure 3.3</td>
<td>CPU Pipeline Branch Delay</td>
<td>3-4</td>
</tr>
<tr>
<td>Figure 3.4</td>
<td>CPU Pipeline Load Delay</td>
<td>3-4</td>
</tr>
<tr>
<td>Figure 3.5</td>
<td>Correspondence of Pipeline Stage to Interlock Condition</td>
<td>3-5</td>
</tr>
<tr>
<td>Figure 3.6</td>
<td>Exception Detection</td>
<td>3-7</td>
</tr>
<tr>
<td>Figure 3.7</td>
<td>Data Cache Miss</td>
<td>3-8</td>
</tr>
<tr>
<td>Figure 3.8</td>
<td>Instruction Cache Miss</td>
<td>3-9</td>
</tr>
<tr>
<td>Figure 4.1</td>
<td>Overview of a Virtual-to-Physical Address Translation</td>
<td>4-2</td>
</tr>
<tr>
<td>Figure 4.2</td>
<td>32-bit Virtual Address Translation</td>
<td>4-3</td>
</tr>
<tr>
<td>Figure 4.3</td>
<td>64-bit Virtual Address Translation</td>
<td>4-4</td>
</tr>
<tr>
<td>Figure 4.4</td>
<td>User Mode Virtual Address Space</td>
<td>4-5</td>
</tr>
<tr>
<td>Figure 4.5</td>
<td>Supervisor Mode Virtual Address Space</td>
<td>4-6</td>
</tr>
<tr>
<td>Figure 4.6</td>
<td>Kernel Mode Address Space</td>
<td>4-9</td>
</tr>
<tr>
<td>Figure 4.7</td>
<td>CP0 Registers and the TLB</td>
<td>4-13</td>
</tr>
<tr>
<td>Figure 4.8</td>
<td>Format of a TLB Entry</td>
<td>4-14</td>
</tr>
<tr>
<td>Figure 4.9</td>
<td>Fields of the PageMask and EntryHi Registers</td>
<td>4-14</td>
</tr>
<tr>
<td>Figure 4.10</td>
<td>Fields of the EntryLo0 and EntryLo1 Registers</td>
<td>4-15</td>
</tr>
<tr>
<td>Figure 4.11</td>
<td>Index Register</td>
<td>4-16</td>
</tr>
<tr>
<td>Figure 4.12</td>
<td>Random Register</td>
<td>4-16</td>
</tr>
<tr>
<td>Figure 4.13</td>
<td>Wired Register Boundary</td>
<td>4-18</td>
</tr>
<tr>
<td>Figure 4.14</td>
<td>Wired Register</td>
<td>4-18</td>
</tr>
<tr>
<td>Figure 4.15</td>
<td>Processor Revision Identifier Register Format</td>
<td>4-19</td>
</tr>
<tr>
<td>Figure 4.16</td>
<td>Config Register Format</td>
<td>4-19</td>
</tr>
<tr>
<td>Figure 4.17</td>
<td>LLAddr Register Format</td>
<td>4-21</td>
</tr>
<tr>
<td>Figure 4.18</td>
<td>TagLo and TagHi Register (P-cache) Formats</td>
<td>4-21</td>
</tr>
<tr>
<td>Figure 4.19</td>
<td>TLB Address Translation</td>
<td>4-22</td>
</tr>
<tr>
<td>Figure 5.1</td>
<td>Context Register Format</td>
<td>5-2</td>
</tr>
<tr>
<td>Figure 5.2</td>
<td>BadVAddr Register Format</td>
<td>5-3</td>
</tr>
<tr>
<td>Figure 5.3</td>
<td>Count Register Format</td>
<td>5-3</td>
</tr>
<tr>
<td>Figure 5.4</td>
<td>Compare Register Format</td>
<td>5-3</td>
</tr>
<tr>
<td>Figure 5.5</td>
<td>Status Register</td>
<td>5-4</td>
</tr>
<tr>
<td>Figure 5.6</td>
<td>Cause Register Format</td>
<td>5-7</td>
</tr>
<tr>
<td>Figure 5.7</td>
<td>EPC Register Format</td>
<td>5-8</td>
</tr>
<tr>
<td>Figure 5.8</td>
<td>XContext Register Format</td>
<td>5-9</td>
</tr>
<tr>
<td>Figure 5.9</td>
<td>ECC Register Format</td>
<td>5-10</td>
</tr>
<tr>
<td>Figure 5.10</td>
<td>CacheErr Register Format</td>
<td>5-10</td>
</tr>
<tr>
<td>Figure 5.11</td>
<td>ErrorEPC Register Format</td>
<td>5-12</td>
</tr>
<tr>
<td>Figure 5.12</td>
<td>Reset Exception Processing</td>
<td>5-12</td>
</tr>
<tr>
<td>Figure 5.13</td>
<td>Cache Error Exception Processing</td>
<td>5-13</td>
</tr>
<tr>
<td>Figure 5.14</td>
<td>Soft Reset and NMI Exception Processing</td>
<td>5-13</td>
</tr>
<tr>
<td>Figure 5.15</td>
<td>General Exception Processing (Except Reset, Soft Reset, NMI, and Cache Error)</td>
<td>5-13</td>
</tr>
<tr>
<td>Figure 5.16</td>
<td>General Exception Handler (HW)</td>
<td>5-13</td>
</tr>
<tr>
<td>Figure 5.17</td>
<td>General Exception Servicing Guidelines (SW)</td>
<td>5-33</td>
</tr>
<tr>
<td>Figure 5.18</td>
<td>TLB/XTLB Miss Exception Handler (HW)</td>
<td>5-35</td>
</tr>
<tr>
<td>Figure 5.19</td>
<td>TLB/XTLB Exception Servicing Guidelines (SW)</td>
<td>5-36</td>
</tr>
<tr>
<td>Figure 5.20</td>
<td>Cache Error Exception Handling (HW) and Servicing Guidelines (SW)</td>
<td>5-37</td>
</tr>
<tr>
<td>Figure 5.21</td>
<td>Reset, Soft Reset &amp; NMI Exception Handling (HW) and Servicing Guidelines (SW)</td>
<td>5-38</td>
</tr>
<tr>
<td>Figure 6.1</td>
<td>FPU Functional Block Diagram</td>
<td>6-1</td>
</tr>
<tr>
<td>Figure 6.2</td>
<td>FPU Registers</td>
<td>6-3</td>
</tr>
<tr>
<td>Figure 6.3</td>
<td>Implementation/Revision Register</td>
<td>6-4</td>
</tr>
<tr>
<td>Figure 6.4</td>
<td>FP Control/Status Register Bit Assignments</td>
<td>6-5</td>
</tr>
<tr>
<td>Figure 6.5</td>
<td>Control/Status Register Cause, Flag, and Enable Fields</td>
<td>6-5</td>
</tr>
<tr>
<td>Figure 6.6</td>
<td>Single-Precision Floating-Point Format</td>
<td>6-7</td>
</tr>
<tr>
<td>Figure 6.7</td>
<td>Double-Precision Floating-Point Format</td>
<td>6-8</td>
</tr>
<tr>
<td>Figure 6.8</td>
<td>Binary Fixed-Point Format</td>
<td>6-9</td>
</tr>
<tr>
<td>Figure 6.9</td>
<td>FPU Instruction Pipeline</td>
<td>6-13</td>
</tr>
<tr>
<td>Figure 7.1</td>
<td>Control/Status Register Exception/Flag/Trap/Enable Bits</td>
<td>7-1</td>
</tr>
<tr>
<td>Figure 8.1</td>
<td>R4600/R4700 Processor Signals</td>
<td>8-1</td>
</tr>
<tr>
<td>Figure 9.1</td>
<td>Power-on Reset</td>
<td>9-4</td>
</tr>
<tr>
<td>Figure 9.2</td>
<td>Cold Reset</td>
<td>9-5</td>
</tr>
<tr>
<td>Figure 9.3</td>
<td>Warm Reset</td>
<td>9-6</td>
</tr>
<tr>
<td>Figure 10.1</td>
<td>Signal Transitions</td>
<td>10-1</td>
</tr>
<tr>
<td>Figure 10.2</td>
<td>Clock-to-Q Delay</td>
<td>10-1</td>
</tr>
<tr>
<td>Figure 10.3</td>
<td>Processor Clocks, PClock-to-SClock Division by 2</td>
<td>10-3</td>
</tr>
<tr>
<td>Figure 10.4</td>
<td>PLL Passive Components</td>
<td>10-4</td>
</tr>
<tr>
<td>Figure 10.5</td>
<td>R4600/R4700 PLL Network</td>
<td>10-5</td>
</tr>
<tr>
<td>Figure 10.6</td>
<td>R4600/R4700 Processor Phase-Locked System</td>
<td>10-6</td>
</tr>
<tr>
<td>Figure 10.7</td>
<td>Gate-Array System Without Phase Lock, Using the R4600/R4700 Processor</td>
<td>10-7</td>
</tr>
<tr>
<td>Figure 10.8</td>
<td>Gate Array and CMOS System Without Phase Lock, Using the R4600/R4700 Processor</td>
<td>10-9</td>
</tr>
<tr>
<td>Figure 11.1</td>
<td>Logical Hierarchy of Memory</td>
<td>11-1</td>
</tr>
<tr>
<td>Figure 11.2</td>
<td>Cache Support in the R4600/R4700</td>
<td>11-2</td>
</tr>
<tr>
<td>Figure 11.3</td>
<td>R4600/R4700 Primary I-Cache Line Format</td>
<td>11-3</td>
</tr>
<tr>
<td>Figure 11.4</td>
<td>R4600/R4700 8-Word Primary Data Cache Line Format</td>
<td>11-4</td>
</tr>
<tr>
<td>Figure 11.5</td>
<td>Primary Cache Data and Tag Organization</td>
<td>11-5</td>
</tr>
<tr>
<td>Figure 11.6</td>
<td>Primary Data Cache State Diagram</td>
<td>11-7</td>
</tr>
<tr>
<td>Figure 11.7</td>
<td>Synchronization with Test-and-Set</td>
<td>11-9</td>
</tr>
<tr>
<td>Figure 11.8</td>
<td>Synchronization Using a Counter</td>
<td>11-10</td>
</tr>
<tr>
<td>Figure 11.9</td>
<td>Test-and-Set using LL and SC</td>
<td>11-11</td>
</tr>
<tr>
<td>Figure 11.10</td>
<td>Counter Using LL and SC</td>
<td>11-12</td>
</tr>
<tr>
<td>Figure 12.1</td>
<td>System Interface Buses</td>
<td>12-2</td>
</tr>
<tr>
<td>Figure 12.2</td>
<td>State of RdRdy* Signal for Read Requests</td>
<td>12-3</td>
</tr>
<tr>
<td>Figure 12.3</td>
<td>State of WrRdy* Signal for Write Requests</td>
<td>12-3</td>
</tr>
<tr>
<td>Figure 12.4</td>
<td>System Interface Register-to-Register Operation</td>
<td>12-4</td>
</tr>
<tr>
<td>Figure 12.5</td>
<td>Requests and System Events</td>
<td>12-6</td>
</tr>
<tr>
<td>Figure 12.6</td>
<td>Back-to-Back Write Cycle Timing (R4000 compatible mode)</td>
<td>12-7</td>
</tr>
<tr>
<td>Figure 12.7</td>
<td>Processor Requests</td>
<td>12-7</td>
</tr>
<tr>
<td>Figure 12.8</td>
<td>Processor Request</td>
<td>12-8</td>
</tr>
<tr>
<td>Figure 12.9</td>
<td>External Requests</td>
<td>12-9</td>
</tr>
<tr>
<td>Figure 12.10</td>
<td>External Request</td>
<td>12-9</td>
</tr>
<tr>
<td>Figure 12.11</td>
<td>Read Response</td>
<td>12-11</td>
</tr>
<tr>
<td>Figure 12.12</td>
<td>Processor Read Request Protocol</td>
<td>12-16</td>
</tr>
<tr>
<td>Figure 12.13</td>
<td>Uncached Read—External Cycles</td>
<td>12-18</td>
</tr>
<tr>
<td>Figure 12.14</td>
<td>Processor Read Cycle</td>
<td>12-19</td>
</tr>
<tr>
<td>Figure 12.15</td>
<td>Processor Noncoherent Word Write Request Protocol</td>
<td>12-20</td>
</tr>
<tr>
<td>Figure 12.16</td>
<td>Write re-issue</td>
<td>12-20</td>
</tr>
<tr>
<td>Figure 12.17</td>
<td>Pipelined Writes</td>
<td>12-21</td>
</tr>
<tr>
<td>Figure 12.18</td>
<td>Processor Noncoherent Block Write Request Protocol</td>
<td>12-22</td>
</tr>
<tr>
<td>Figure 12.19</td>
<td>Delayed for the Assertion of WrRdy*</td>
<td>12-23</td>
</tr>
<tr>
<td>Figure 12.20</td>
<td>Two Processor Write Requests, Second Write Arbitration Protocol for External Requests</td>
<td>12-24</td>
</tr>
<tr>
<td>Figure 12.21</td>
<td>External Read Request, System Interface in Master State</td>
<td>12-25</td>
</tr>
<tr>
<td>Figure 12.22</td>
<td>System Interface Release External Null Request</td>
<td>12-26</td>
</tr>
<tr>
<td>Figure 12.23</td>
<td>External Write Request, with System Interface Initially in Master State</td>
<td>12-27</td>
</tr>
<tr>
<td>Figure 12.24</td>
<td>Processor Word Read Request, followed by a Word Read Response</td>
<td>12-28</td>
</tr>
<tr>
<td>Figure 12.25</td>
<td>Block Read Response With Zero Wait State</td>
<td>12-29</td>
</tr>
<tr>
<td>Figure 12.26</td>
<td>Block Read Transaction With One Wait State</td>
<td>12-29</td>
</tr>
<tr>
<td>Figure 12.27</td>
<td>Read Response, Reduced Data Rate, System Interface in Slave State</td>
<td>12-30</td>
</tr>
<tr>
<td>Figure 12.28</td>
<td>System Interface Command Syntax Bit Definition</td>
<td>12-33</td>
</tr>
<tr>
<td>Figure 12.29</td>
<td>Read Request SysCmd Bus Bit Definition</td>
<td>12-33</td>
</tr>
<tr>
<td>Figure 12.30</td>
<td>Write Request SysCmd Bus Bit Definition</td>
<td>12-34</td>
</tr>
<tr>
<td>Figure 12.31</td>
<td>Null Request SysCmd Bus Bit Definition</td>
<td>12-36</td>
</tr>
<tr>
<td>Figure 12.32</td>
<td>Data Identifier SysCmd Bus Bit Definition</td>
<td>12-36</td>
</tr>
<tr>
<td>Figure 12.33</td>
<td>Retrieving a Data Block in Sequential Order</td>
<td>12-39</td>
</tr>
<tr>
<td>Figure 12.34</td>
<td>Retrieving Data in a Subblock Order</td>
<td>12-39</td>
</tr>
<tr>
<td>Figure 13.1</td>
<td>Interrupt Register Bits and Enables</td>
<td>13-1</td>
</tr>
<tr>
<td>Figure 13.2</td>
<td>R4600/R4700 Interrupt Signals</td>
<td>13-2</td>
</tr>
<tr>
<td>Figure 13.3</td>
<td>R4600/R4700 Nonmaskable Interrupt Signal</td>
<td>13-2</td>
</tr>
<tr>
<td>Figure 13.4</td>
<td>Masking of the R4600/R4700 Interrupts</td>
<td>13-3</td>
</tr>
<tr>
<td>Figure A.1</td>
<td>CPU Instruction Formats</td>
<td>A-2</td>
</tr>
<tr>
<td>Figure B.1</td>
<td>Load and Store Instruction Format</td>
<td>B-5</td>
</tr>
<tr>
<td>Figure B.2</td>
<td>Computational Instruction Format</td>
<td>B-6</td>
</tr>
<tr>
<td>Figure B.3</td>
<td>Bit Encoding for FPU Instructions</td>
<td>B-45</td>
</tr>
</tbody>
</table>

# List of Tables

<table>
<thead>
<tr>
<th>Number</th>
<th>Table Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Table 1.1</td>
<td>CPU Instruction Set: Load and Store Instructions</td>
<td>1-7</td>
</tr>
<tr>
<td>Table 1.2</td>
<td>CPU Instruction Set: Arithmetic Instructions (ALU Immediate)</td>
<td>1-7</td>
</tr>
<tr>
<td>Table 1.3</td>
<td>CPU Instruction Set: Arithmetic Instructions (3-Operand, R-Type)</td>
<td>1-8</td>
</tr>
<tr>
<td>Table 1.4</td>
<td>CPU Instruction Set: Multiply and Divide Instructions</td>
<td>1-8</td>
</tr>
<tr>
<td>Table 1.5</td>
<td>CPU Instruction Set: Jump and Branch Instructions</td>
<td>1-8</td>
</tr>
<tr>
<td>Table 1.6</td>
<td>CPU Instruction Set: Shift Instructions</td>
<td>1-9</td>
</tr>
<tr>
<td>Table 1.7</td>
<td>CPU Instruction Set: Coprocessor Instructions</td>
<td>1-9</td>
</tr>
<tr>
<td>Table 1.8</td>
<td>CPU Instruction Set: Special Instructions</td>
<td>1-9</td>
</tr>
<tr>
<td>Table 1.9</td>
<td>MIPS 2/MIPS 3 Additional: Load and Store Instructions</td>
<td>1-10</td>
</tr>
<tr>
<td>Table 1.10</td>
<td>MIPS 2/MIPS 3 Additional: Arithmetic Instructions (ALU Immediate)</td>
<td>1-10</td>
</tr>
<tr>
<td>Table 1.11</td>
<td>MIPS 2/MIPS 3 Additional: Multiply and Divide Instructions</td>
<td>1-10</td>
</tr>
<tr>
<td>Table 1.12</td>
<td>MIPS 2/MIPS 3 Additional: Branch Instructions</td>
<td>1-11</td>
</tr>
<tr>
<td>Table 1.13</td>
<td>MIPS 2/MIPS 3 Additional: Arithmetic Instructions (3-operand, R-type)</td>
<td>1-11</td>
</tr>
<tr>
<td>Table 1.14</td>
<td>MIPS 2/MIPS 3 Additional: Shift Instructions</td>
<td>1-11</td>
</tr>
<tr>
<td>Table 1.15</td>
<td>MIPS 2/MIPS 3 Additional: Exception Instructions</td>
<td>1-12</td>
</tr>
<tr>
<td>Table 1.16</td>
<td>MIPS 2/MIPS 3 Additional: Coprocessor Instructions</td>
<td>1-12</td>
</tr>
<tr>
<td>Table 1.17</td>
<td>CP0 Instructions</td>
<td>1-12</td>
</tr>
<tr>
<td>Table 1.18</td>
<td>System Control Coprocessor (CP0) Register Definitions</td>
<td>1-17</td>
</tr>
<tr>
<td>Table 1.19</td>
<td>Floating-Point Latency Cycles</td>
<td>1-18</td>
</tr>
<tr>
<td>Table 1.20</td>
<td>System Interface Comparison Between R4400 PC and R4600/R4700</td>
<td>1-23</td>
</tr>
<tr>
<td>Table 1.21</td>
<td>Cache Comparison Between R4400 PC and R4600/R4700</td>
<td>1-24</td>
</tr>
<tr>
<td>Table 1.22</td>
<td>TLB Comparison Between R4400 PC and R4600/R4700</td>
<td>1-25</td>
</tr>
<tr>
<td>Table 1.23</td>
<td>Pipeline Comparison Between R4400 PC and R4600/R4700</td>
<td>1-25</td>
</tr>
<tr>
<td>Table 1.24</td>
<td>Coprocessor 0 Comparison Between R4400 PC and R4600/R4700</td>
<td>1-26</td>
</tr>
<tr>
<td>Table 1.25</td>
<td>Coprocessor 1 Comparison Between R4400 PC and R4600/R4700</td>
<td>1-26</td>
</tr>
<tr>
<td>Table 2.1</td>
<td>Byte Access within a Doubleword</td>
<td>2-3</td>
</tr>
<tr>
<td>Table 2.2</td>
<td>Multiply/Divide Instruction Cycle Timing</td>
<td>2-4</td>
</tr>
<tr>
<td>Table 3.1</td>
<td>Pipeline Exceptions</td>
<td>3-6</td>
</tr>
<tr>
<td>Table 3.2</td>
<td>Pipeline Interlocks</td>
<td>3-6</td>
</tr>
<tr>
<td>Table 4.1</td>
<td>32-bit and 64-bit User Mode Segments</td>
<td>4-5</td>
</tr>
<tr>
<td>Table 4.2</td>
<td>32-bit and 64-bit Supervisor Mode Segments</td>
<td>4-7</td>
</tr>
<tr>
<td>Table 4.3</td>
<td>32-bit Kernel Mode Segments</td>
<td>4-10</td>
</tr>
<tr>
<td>Table 4.4</td>
<td>64-bit Kernel Mode Segments</td>
<td>4-11</td>
</tr>
<tr>
<td>Table 4.5</td>
<td>Cacheability and Coherency Attributes</td>
<td>4-12</td>
</tr>
<tr>
<td>Table 4.6</td>
<td>TLB Page Coherency (C) Bit Values</td>
<td>4-15</td>
</tr>
<tr>
<td>Table 4.7</td>
<td>Index Register Field Descriptions</td>
<td>4-16</td>
</tr>
<tr>
<td>Table 4.8</td>
<td>Random Register Field Descriptions</td>
<td>4-17</td>
</tr>
<tr>
<td>Table 4.9</td>
<td>Mask Field Values for Page Sizes</td>
<td>4-17</td>
</tr>
<tr>
<td>Table 4.10</td>
<td>Wired Register Field Descriptions</td>
<td>4-18</td>
</tr>
<tr>
<td>Table 4.11</td>
<td>PRId Register Fields</td>
<td>4-19</td>
</tr>
<tr>
<td>Table 4.12</td>
<td>Config Register Fields</td>
<td>4-20</td>
</tr>
<tr>
<td>Table 4.13</td>
<td>Cache Tag Register Fields</td>
<td>4-21</td>
</tr>
<tr>
<td>Table 4.14</td>
<td>TLB Instructions</td>
<td>4-23</td>
</tr>
<tr>
<td>Table 5.1</td>
<td>CP0 Exception Processing Registers</td>
<td>5-2</td>
</tr>
<tr>
<td>Table 5.2</td>
<td>Context Register Fields</td>
<td>5-2</td>
</tr>
<tr>
<td>Table 5.3</td>
<td>Status Register Fields</td>
<td>5-5</td>
</tr>
<tr>
<td>Table 5.4</td>
<td>Cause Register Fields</td>
<td>5-7</td>
</tr>
<tr>
<td>Table 5.5</td>
<td>Cause Register ExcCode Field</td>
<td>5-8</td>
</tr>
<tr>
<td>Table 5.6</td>
<td>XContext Register Fields</td>
<td>5-9</td>
</tr>
<tr>
<td>Table 5.7</td>
<td>ECC Register Fields</td>
<td>5-10</td>
</tr>
<tr>
<td>Table 5.8</td>
<td>CacheErr Register Fields</td>
<td>5-11</td>
</tr>
<tr>
<td>Table 5.9</td>
<td>Exception Vector Base Addresses</td>
<td>5-14</td>
</tr>
<tr>
<td>Table 5.10</td>
<td>Exception Vector Offsets</td>
<td>5-14</td>
</tr>
<tr>
<td>Table 5.11</td>
<td>Exception Priority Order</td>
<td>5-14</td>
</tr>
<tr>
<td>Table 5.12</td>
<td>List of Exception Flowcharts</td>
<td>5-32</td>
</tr>
<tr>
<td>Table 6.1</td>
<td>Floating-Point Control Register Assignments</td>
<td>6-4</td>
</tr>
<tr>
<td>Table 6.2</td>
<td>FCR0 Fields</td>
<td>6-4</td>
</tr>
<tr>
<td>Table 6.3</td>
<td>Control/Status Register Fields</td>
<td>6-5</td>
</tr>
<tr>
<td>Table 6.4</td>
<td>Rounding Mode Bit Decoding</td>
<td>6-7</td>
</tr>
<tr>
<td>Table 6.5</td>
<td>Equations for Calculating Values in Single and Double-Precision Floating-Point Format</td>
<td>6-8</td>
</tr>
<tr>
<td>Table 6.6</td>
<td>Floating-Point Format Parameter Values</td>
<td>6-9</td>
</tr>
<tr>
<td>Table 6.7</td>
<td>Minimum and Maximum Floating-Point Values</td>
<td>6-9</td>
</tr>
<tr>
<td>Table 6.8</td>
<td>Binary Fixed-Point Format Fields</td>
<td>6-9</td>
</tr>
<tr>
<td>Table 6.9</td>
<td>FPU Instruction Summary: Load, Move and Store Instructions</td>
<td>6-10</td>
</tr>
<tr>
<td>Table 6.10</td>
<td>FPU Instruction Summary: Conversion Instructions</td>
<td>6-10</td>
</tr>
<tr>
<td>Table 6.11</td>
<td>FPU Instruction Summary: Computational Instructions</td>
<td>6-11</td>
</tr>
<tr>
<td>Table 6.12</td>
<td>FPU Instruction Summary: Compare and Branch Instructions</td>
<td>6-11</td>
</tr>
<tr>
<td>Table 6.13</td>
<td>Mnemonics and Definitions of Compare Instruction Conditions</td>
<td>6-13</td>
</tr>
<tr>
<td>Table 6.14</td>
<td>Floating-Point Operation Latencies</td>
<td>6-14</td>
</tr>
<tr>
<td>Table 7.1</td>
<td>Default FPU Exception Actions</td>
<td>7-2</td>
</tr>
<tr>
<td>Table 7.2</td>
<td>FPU Exception-Causing Conditions</td>
<td>7-3</td>
</tr>
<tr>
<td>Table 8.1</td>
<td>System Interface Signals</td>
<td>8-2</td>
</tr>
<tr>
<td>Table 8.2</td>
<td>Clock/Control Interface Signals</td>
<td>8-3</td>
</tr>
<tr>
<td>Table 8.3</td>
<td>Interrupt Interface Signals</td>
<td>8-4</td>
</tr>
<tr>
<td>Table 8.4</td>
<td>JTAG Interface Signals</td>
<td>8-4</td>
</tr>
<tr>
<td>Table 8.5</td>
<td>Initialization Interface Signals</td>
<td>8-5</td>
</tr>
<tr>
<td>Table 8.6</td>
<td>R4600/R4700 Processor Signal Summary</td>
<td>8-6</td>
</tr>
<tr>
<td>Table 9.1</td>
<td>R4600/R4700 Processor Signal Summary</td>
<td>9-2</td>
</tr>
<tr>
<td>Table 9.2</td>
<td>Boot-Mode Settings</td>
<td>9-7</td>
</tr>
<tr>
<td>Table 11.1</td>
<td>Cache States</td>
<td>11-6</td>
</tr>
<tr>
<td>Table 11.2</td>
<td>Coherency Attributes and Processor Behavior</td>
<td>11-8</td>
</tr>
<tr>
<td>Table 12.1</td>
<td>Load Miss to Primary Cache</td>
<td>12-11</td>
</tr>
<tr>
<td>Table 12.2</td>
<td>Store Miss to Primary Cache</td>
<td>12-12</td>
</tr>
<tr>
<td>Table 12.3</td>
<td>System Interface Requests</td>
<td>12-14</td>
</tr>
<tr>
<td>Table 12.4</td>
<td>Transmit Data Rates and Patterns</td>
<td>12-30</td>
</tr>
<tr>
<td>Table 12.5</td>
<td>Release Latency for External Requests</td>
<td>12-32</td>
</tr>
<tr>
<td>Table 12.6</td>
<td>Encoding of SysCmd(7:5) for System Interface Commands</td>
<td>12-33</td>
</tr>
<tr>
<td>Table 12.7</td>
<td>Encoding of SysCmd(4:3) for Read Requests</td>
<td>12-34</td>
</tr>
<tr>
<td>Table 12.8</td>
<td>Encoding of SysCmd(2:0) for Block Read Request</td>
<td>12-34</td>
</tr>
<tr>
<td>Table 12.9</td>
<td>Doubleword, Word, or Partial-word Read Request Data Size Encoding of SysCmd(2:0)</td>
<td>12-34</td>
</tr>
<tr>
<td>Table 12.10</td>
<td>Write Request Encoding of SysCmd(4:3)</td>
<td>12-35</td>
</tr>
<tr>
<td>Table 12.11</td>
<td>Block Write Request Encoding of SysCmd(2:0)</td>
<td>12-35</td>
</tr>
<tr>
<td>Table 12.12</td>
<td>Doubleword, Word, or Partial-word Write Request Data Size Encoding of SysCmd(2:0)</td>
<td>12-35</td>
</tr>
<tr>
<td>Table 12.13</td>
<td>External Null Request Encoding of SysCmd(4:3)</td>
<td>12-36</td>
</tr>
<tr>
<td>Table 12.14</td>
<td>Processor Data Identifier Encoding of SysCmd(7:3)</td>
<td>12-37</td>
</tr>
<tr>
<td>Table 12.15</td>
<td>External Data Identifier Encoding of SysCmd(7:3)</td>
<td>12-38</td>
</tr>
<tr>
<td>Table 12.16</td>
<td>Sequence of Doublewords Transferred Using Subblock Ordering: Address 10₂</td>
<td>12-40</td>
</tr>
<tr>
<td>Table 12.17</td>
<td>Sequence of Doublewords Transferred Using Subblock Ordering: Address 11₂</td>
<td>12-40</td>
</tr>
<tr>
<td>Table 12.18</td>
<td>Sequence of Doublewords Transferred Using Subblock Ordering: Address 01₂</td>
<td>12-40</td>
</tr>
<tr>
<td>Table 12.19</td>
<td>Partial Word Transfer Byte Lane Usage</td>
<td>12-41</td>
</tr>
<tr>
<td>Table 14.1</td>
<td>Error Checking and Correcting Summary for Internal Transactions</td>
<td>14-3</td>
</tr>
<tr>
<td>Table 14.2</td>
<td>Error Checking and Correcting Summary for External Transactions</td>
<td>14-3</td>
</tr>
<tr>
<td>Table A.1</td>
<td>CPU Instruction Operation Notations</td>
<td>A-3</td>
</tr>
<tr>
<td>Table A.2</td>
<td>Load and Store Common Functions</td>
<td>A-4</td>
</tr>
<tr>
<td>Table A.3</td>
<td>Access Type Specifications for Loads/Stores</td>
<td>A-5</td>
</tr>
<tr>
<td>Table B.1</td>
<td>Valid FPU Instruction Formats</td>
<td>B-2</td>
</tr>
<tr>
<td>Table B.2</td>
<td>Logical Negation of Predicates by Condition</td>
<td>B-3</td>
</tr>
<tr>
<td>Table B.3</td>
<td>Load and Store Common Functions</td>
<td>B-5</td>
</tr>
<tr>
<td>Table B.4</td>
<td>Format Field Decoding</td>
<td>B-6</td>
</tr>
<tr>
<td>Table B.5</td>
<td>Floating-Point Instructions and Operations</td>
<td>B-7</td>
</tr>
<tr>
<td>Table C.1</td>
<td>Primary Data Cache Operations</td>
<td>C-2</td>
</tr>
<tr>
<td>Table C.2</td>
<td>Primary Instruction Cache Operations</td>
<td>C-3</td>
</tr>
<tr>
<td>Table E.3</td>
<td>Coprocessor 0 Hazards</td>
<td>E-1</td>
</tr>
</tbody>
</table>
Introduction
The IDT79R4600 (R4600) and IDT79R4700 (R4700) support a wide range of processor-based applications. Their combination of low power consumption and high performance makes them well suited to embedded applications such as laser printers, X-terminals, internetworking equipment, imaging equipment, and high-end video games. The R4600 and R4700 are also well suited to high-performance desktop applications such as Windows™ NT desktop and notebook systems, and 3-D workstations.

Because they are hardware- and software-compatible with the IDT79R4400PC family, the R4600 and R4700 can serve in many of the same applications; in addition, they support low-power operation for applications such as notebook computers.

Floating Point
The R4700 has improved FPA multiply operations. All other features of the R4700 are the same as those in the R4600. In this manual, these two products are referred to collectively as the R4600/R4700, except when information pertains only to one of them. In that situation they are referred to individually.

Secondary Cache
The R4600/R4700 does not provide the integrated secondary-cache and multiprocessor support found in the R4000SC and R4000MC, but an external secondary cache is simple to build. For most embedded applications, however, the large on-chip two-way set-associative caches make a secondary cache unnecessary.

Performance
The R4600/R4700 brings R4000SC performance levels to the R4000PC package while providing lower cost and lower power. It does this through larger on-chip caches that are two-way set associative, fewer pipeline stalls, and early restart on data cache misses. The result is higher performance than an R4000 at the same frequency and with the same system latencies (exact figures are system dependent).

Upward Compatibility
The R4600/R4700 provides complete upward application-software compatibility with the IDT79R3000 family of microprocessors, including the IDT79R3000A and the IDT RISController™ family (IDT79R30xx family), as well as the IDT79R4000 family of microprocessors. Support for the Microsoft Windows™ NT and UNISOFT Unix™ V.4 operating systems ensures the availability of thousands of application programs, providing complete solutions for a large number of processing needs. An array of development tools facilitates the rapid development of R4600/R4700-based systems, enabling a wide variety of customers to take advantage of the MIPS Open Architecture philosophy.

Together with the R4400, the R4600/R4700 provides a compatible, timely, and necessary evolution path from 32-bit to true 64-bit computing. The original design objectives of the R4000 clearly mandated this evolution path; the result is a true 64-bit processor fully compatible with 32-bit operating systems and applications.

The R4600/R4700 gives 32-bit applications easy access to 64-bit compute power. The software tools support a wide variety of models, including 32-bit address and data, 64-bit address and data, and 32-bit address with 64-bit data. The 32-bit address and data model allows existing applications to be migrated without first having to “clean up” the software.
The R4600/R4700 brings high performance, large caches, and MMU and FPA functions to these systems. For desktop systems, the R4600/R4700 supports a full migration to 64-bit, allowing 64-bit systems to execute true 64-bit or older 32-bit applications. For embedded applications, the power and bandwidth of 64-bit data types can be used without the memory expansion of 64-bit addressing.

The following list summarizes the R4600/R4700 features. For a feature-by-feature comparison with the R4400PC, refer to Table 1.20 through Table 1.25, beginning on page 1-23.
Features

- True 64-bit microprocessor
  - 64-bit integer operations
  - 64-bit floating-point operations
  - 64-bit registers
  - 64-bit virtual address space

- High-performance microprocessor
  - For R4600: 133 peak MIPS at 133MHz
  - For R4700: 175 peak MIPS at 175MHz
  - For R4600: 44 peak MFLOP/s at 133MHz
  - For R4700: 87 peak MFLOP/s at 175MHz
  - For R4600: 109 SPECint92 and 83 SPECfp92 at 150MHz
  - For R4700: 132 SPECint92 and 94 SPECfp92 at 175MHz

- Large two-way set associative caches on-chip

- Improved FPA multiply performance (R4700 only)
  - 1 mul, 1 add every 4 clock cycles

- High level of integration
  - 64-bit integer CPU
  - 64-bit floating-point unit
  - 16KB instruction cache; 16KB data cache
  - Flexible MMU with large TLB

- Low-power operation
  - 3.3V or 5V power supply options
  - For R4600: 25mW/MHz internal power dissipation
    (2.5W @ 100MHz, 3.3V)
  - For R4700: 24mW/MHz internal power dissipation
    (2.4W @ 100MHz, 3.3V)
  - Standby mode reduces internal power to 400mW

- Fully software compatible with R4000 Processor Family

- Standard operating system support includes:
  - Microsoft Windows NT
  - UNISOFT Unix™ System V.4
  - JMI C-executive
  - VxWorks

- Available in 179-pin PGA or 208-pin MQUAD

- Input and output clock frequency:
  - Input clock at one-half pipeline frequency
  - Output clock is a programmable divisor of the pipeline frequency
  - Selectable bus frequency
  - Ratios of 1/2...1/8 of pipeline rate

- 64GB physical address space

- Processor family for a wide variety of applications
  - Desktop workstations and PCs
  - Deskside or departmental servers
  - Routers
  - High-performance embedded applications
  - Notebooks

- Large number of development tools, including:
  - Cross compilers
  - Logic models
  - Logic analyzer support
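
The clocking, power, and address-space figures above are related by simple arithmetic. The following C sketch works through them. It assumes that the mW/MHz rating applies to the pipeline frequency and that any integer bus divisor from 2 through 8 can be selected (Table 9.2, Boot-Mode Settings, governs the divisors actually available); all function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The pipeline runs at twice the frequency of the input (master) clock. */
static double pipeline_mhz(double input_clock_mhz) {
    return 2.0 * input_clock_mhz;
}

/* The bus (output) clock is the pipeline clock divided by a programmable
   divisor; this sketch assumes any integer from 2 through 8 is allowed. */
static double bus_mhz(double pipeline, int divisor) {
    assert(divisor >= 2 && divisor <= 8);
    return pipeline / divisor;
}

/* Internal power in watts from a mW/MHz rating (assumed to apply to the
   pipeline frequency): for example, 25 mW/MHz at 100 MHz gives 2.5 W. */
static double internal_power_w(double mw_per_mhz, double pipeline) {
    return mw_per_mhz * pipeline / 1000.0;
}

/* A 64GB physical address space corresponds to 36 address bits. */
#define PHYS_ADDR_BITS 36
#define PHYS_ADDR_SPACE_BYTES (1ULL << PHYS_ADDR_BITS)
```

With a 66.5MHz input clock, for instance, the pipeline runs at 133MHz, and a divisor of 2 would give a 66.5MHz bus.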
Device Overview
The R4600/R4700 family provides a high level of integration designed for high-performance, high-bandwidth computing. Its key elements are briefly described below; subsequent chapters present each block in more detail.

Figure 1.1 shows a block level representation of the functional units within the R4600/R4700.

Pipeline Overview
The R4600/R4700 uses a 5-stage pipeline similar to that of the IDT79R3000. The simplicity of this pipeline allows the R4600/R4700 to cost less and consume less power than superscalar or superpipelined processors. Unlike the R3000, the R4600/R4700 performs virtual-to-physical address translation in parallel with cache access. This allows the R4600/R4700 to operate at more than twice the frequency of the R3000 and to support a larger TLB for address translation.

Compared to the 8-stage R4000 pipeline, the R4600/R4700 pipeline is more efficient (it requires fewer stalls) because its branch and load latencies are shorter: both are 2 cycles on the R4600/R4700, versus 3 and 4 cycles, respectively, on the R4000.

The internal pipeline of the R4600/R4700 processor operates at twice the frequency of the master clock, as discussed in Chapter 3. The processor achieves high throughput by pipelining cache accesses, shortening register access times, implementing virtually indexed primary caches, and allowing the latency of certain functional units to span more than one pipeline clock cycle.

Refer to Chapter 3 for a detailed discussion of the CPU pipeline operation, including descriptions of the delay instructions, interruptions to the pipeline flow caused by interlocks and exceptions, and the R4600/R4700 implementation of a store buffer. Refer to Chapter 6 for a detailed discussion of the FPU pipeline.

CPU Register Overview

The R4600/R4700 has thirty-two 64-bit general-purpose registers, which are used for scalar integer operations and address calculation. The register file has two read ports and one write port, and is fully bypassed to minimize operation latency in the pipeline.

Figure 1.2 shows the R4600/R4700 CPU registers.

<table>
<thead>
<tr>
<th>General Purpose Registers</th>
<th>Multiply and Divide Registers</th>
</tr>
</thead>
<tbody>
<tr>
<td>r0</td>
<td>HI</td>
</tr>
<tr>
<td>r1</td>
<td></td>
</tr>
<tr>
<td>r2</td>
<td>LO</td>
</tr>
<tr>
<td>•</td>
<td></td>
</tr>
<tr>
<td>•</td>
<td></td>
</tr>
<tr>
<td>•</td>
<td></td>
</tr>
<tr>
<td>r29</td>
<td></td>
</tr>
<tr>
<td>r30</td>
<td></td>
</tr>
<tr>
<td>r31</td>
<td></td>
</tr>
</tbody>
</table>

Two of the CPU general purpose registers have assigned functions:
- r0 is hardwired to a value of zero, and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.
- r31 is the implicit return-address (link) register for JAL and for the branch-and-link instructions such as BLTZAL and BGEZAL.
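
The behavior of these two registers can be illustrated with a small software model, for example as the core of an instruction-set simulator. This is a minimal C sketch, not IDT code: `gpr_file`, `gpr_read`, `gpr_write`, and `jal_link` are hypothetical names, and the PC + 8 link value reflects the MIPS branch delay slot (see Chapter 3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32 general-purpose registers (64 bits each). */
typedef struct {
    uint64_t r[32];
} gpr_file;

/* r0 is hardwired to zero: reads always return 0. */
static uint64_t gpr_read(const gpr_file *f, unsigned n) {
    assert(n < 32);
    return n == 0 ? 0 : f->r[n];
}

/* Writes to r0 are discarded, so r0 can be the target of any
   instruction whose result is not needed. */
static void gpr_write(gpr_file *f, unsigned n, uint64_t value) {
    assert(n < 32);
    if (n != 0)
        f->r[n] = value;
}

/* JAL-style link: the return address (the instruction after the branch
   delay slot, PC + 8) is written implicitly to r31. */
static void jal_link(gpr_file *f, uint64_t pc) {
    gpr_write(f, 31, pc + 8);
}
```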

The CPU has three special purpose registers:
- PC — Program Counter register
- HI — Multiply and Divide register higher result
- LO — Multiply and Divide register lower result

The two Multiply and Divide registers (HI, LO) store:
- the double-width product of integer multiply operations (high-order half in HI, low-order half in LO), or
- the quotient (in LO) and remainder (in HI) of integer divide operations.
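
The HI/LO split can be made concrete with a short C model of the 32-bit MULT and DIV instructions. It is a sketch under simplifying assumptions: it keeps only the low 32 bits of each register (on the 64-bit R4600/R4700 the 32-bit results are sign-extended into the full 64-bit HI and LO), and the names `hilo`, `mult32`, and `div32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the HI and LO registers. */
typedef struct {
    uint32_t hi, lo;
} hilo;

/* MULT: signed 32 x 32 -> 64-bit product; high half to HI, low half to LO. */
static hilo mult32(int32_t a, int32_t b) {
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    hilo r;
    r.hi = (uint32_t)(p >> 32);
    r.lo = (uint32_t)p;
    return r;
}

/* DIV: quotient to LO, remainder to HI. C99 division truncates toward
   zero and gives the remainder the sign of the dividend, matching DIV. */
static hilo div32(int32_t a, int32_t b) {
    assert(b != 0);                        /* the CPU result is undefined */
    assert(!(a == INT32_MIN && b == -1));  /* overflows in C */
    hilo r;
    r.lo = (uint32_t)(a / b);
    r.hi = (uint32_t)(a % b);
    return r;
}
```

The divide-by-zero check is in the model because the divide instructions do not trap on a zero divisor; software that needs a check must perform it explicitly.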

The R4600/R4700 processor has no Program Status Word (PSW) register as such; this is covered by the Status and Cause registers incorporated within the System Control Coprocessor (CP0). CP0 registers are described later in this chapter.
**CPU Instruction Set Overview**

Each CPU instruction is 32 bits long. As shown in Figure 1.3, there are three instruction formats:

- immediate (I-type)
- jump (J-type)
- register (R-type)

![Figure 1.3 CPU Instruction Formats](image)

Each format contains a number of different instructions, which are described further in this chapter. Fields of the instruction formats are described in Chapter 2.
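
To illustrate how the three formats share one layout, the C sketch below extracts every field from a 32-bit instruction word using the conventional MIPS field positions (6-bit op; 5-bit rs, rt, rd, and sa; 6-bit funct; 16-bit immediate; 26-bit target). Which fields are meaningful depends on the format, which a decoder would normally select from `op`; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All fields of a MIPS instruction word; only those belonging to the
   instruction's format (I, J, or R) are meaningful. */
typedef struct {
    unsigned op, rs, rt, rd, sa, funct; /* op/rs/rt shared; rd/sa/funct R-type */
    uint16_t imm;                       /* I-type 16-bit immediate */
    uint32_t target;                    /* J-type 26-bit target */
} insn_fields;

static insn_fields decode(uint32_t w) {
    insn_fields f;
    f.op     = (w >> 26) & 0x3F;   /* bits 31..26 */
    f.rs     = (w >> 21) & 0x1F;   /* bits 25..21 */
    f.rt     = (w >> 16) & 0x1F;   /* bits 20..16 */
    f.rd     = (w >> 11) & 0x1F;   /* bits 15..11 */
    f.sa     = (w >> 6)  & 0x1F;   /* bits 10..6  */
    f.funct  =  w        & 0x3F;   /* bits 5..0   */
    f.imm    = (uint16_t)(w & 0xFFFF);
    f.target =  w        & 0x03FFFFFF;
    return f;
}
```

For example, `decode(0x00851021)` yields op 0, rs 4, rt 5, rd 2, sa 0, and funct 0x21, the R-type encoding of `addu $2,$4,$5`.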

Instruction decoding is simplified by limiting the number of formats to these three. This limitation means that the more complicated (and less frequently used) operations and addressing modes can be synthesized by the compiler, using sequences of these same simple instructions.
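
A common instance of such synthesis is loading a 32-bit constant, which no single instruction can do because immediates are only 16 bits wide. The usual idiom is LUI followed by ORI. The C sketch below models those two instructions so the result can be checked; the helper names are hypothetical, and registers are modeled as 64-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* LUI: immediate into bits 31..16 with the low half cleared; on a 64-bit
   CPU the 32-bit result is sign-extended to 64 bits. */
static uint64_t lui(uint16_t imm) {
    uint32_t w = (uint32_t)imm << 16;
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ULL | w) : (uint64_t)w;
}

/* ORI: OR with the zero-extended 16-bit immediate. */
static uint64_t ori(uint64_t rs, uint16_t imm) {
    return rs | (uint64_t)imm;
}

/* Two-instruction sequence: lui rt, hi16 ; ori rt, rt, lo16 */
static uint64_t load_const32(uint32_t value) {
    return ori(lui((uint16_t)(value >> 16)), (uint16_t)(value & 0xFFFF));
}
```

ORI is used for the low half rather than ADDIU because ORI zero-extends its immediate, whereas ADDIU sign-extends it and would alter the upper half whenever bit 15 of the constant is set.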

The instruction set can be further divided into the following groupings:

- **Load and Store** instructions move data between memory and general registers. They are all immediate (I-type) instructions, since the only addressing mode supported is base register plus 16-bit signed immediate offset.

- **Computational** instructions perform arithmetic, logical, shift, multiply, and divide operations on values in registers. They include register (R-type, in which both the operands and the result are stored in registers) and immediate (I-type, in which one operand is a 16-bit immediate value) formats.

- **Jump and Branch** instructions change the control flow of a program. Jumps (J-type) are always made to a paged, absolute address formed by combining a 26-bit target address with the high-order bits of the Program Counter; register jumps (R-type) transfer control to the address held in a general register. Branches (I-type) have 16-bit offsets relative to the program counter. Jump And Link instructions save their return address in register 31.

- **Coprocessor** instructions perform operations in the coprocessors. Coprocessor load and store instructions are I-type.

- **Coprocessor 0** (system coprocessor) instructions perform operations on CP0 registers to control the memory management and exception handling facilities of the processor and the standby mode for power management. These are listed in Table 1.17.

- **Special** instructions perform system calls and breakpoint operations. These instructions are always R-type.

- **Exception** instructions cause a branch to the general exception-handling vector based on the result of a comparison. These instructions occur in both R-type (both operands are registers) and I-type (one operand is a 16-bit immediate value) formats.
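
The address arithmetic behind the load/store, jump, and branch groups can be sketched in C. The model assumes the standard MIPS rules: offsets are sign-extended, and jump and branch targets are computed relative to the address of the instruction in the branch delay slot (PC + 4). The function names are hypothetical, and registers are modeled as plain 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit immediate. */
static int64_t sext16(uint16_t v) {
    return (v & 0x8000u) ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* Load/store: base register + sign-extended 16-bit offset. */
static uint64_t effective_address(uint64_t base, uint16_t offset) {
    return base + (uint64_t)sext16(offset);
}

/* J/JAL: the 26-bit target, shifted left 2, replaces the low 28 bits of
   the delay-slot address; the high-order PC bits select the "page". */
static uint64_t jump_target(uint64_t pc, uint32_t target26) {
    assert(target26 <= 0x03FFFFFFu);  /* field is only 26 bits wide */
    uint64_t slot = pc + 4;
    return (slot & ~(uint64_t)0x0FFFFFFF) | ((uint64_t)target26 << 2);
}

/* Branch: delay-slot address + sign-extended offset counted in words. */
static uint64_t branch_target(uint64_t pc, uint16_t offset) {
    return pc + 4 + ((uint64_t)sext16(offset) << 2);
}
```

Because the offset is counted in 32-bit words, a branch can reach roughly ±128KB from the delay slot, while a J-type jump reaches anywhere within the current 256MB region.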

Chapter 2 provides more detail about these instructions, and Appendix A gives a complete description of each.

Table 1.1 through Table 1.16 list the CPU instructions common to MIPS R-Series processors. The last column of each table gives the MIPS ISA level in which the instruction first appeared. Table 1.17 lists the CP0 instructions.

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level¹</th>
</tr>
</thead>
<tbody>
<tr>
<td>LB</td>
<td>Load Byte</td>
<td>I</td>
</tr>
<tr>
<td>LBU</td>
<td>Load Byte Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>LH</td>
<td>Load Halfword</td>
<td>I</td>
</tr>
<tr>
<td>LHU</td>
<td>Load Halfword Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>LW</td>
<td>Load Word</td>
<td>I</td>
</tr>
<tr>
<td>LWL</td>
<td>Load Word Left</td>
<td>I</td>
</tr>
<tr>
<td>LWR</td>
<td>Load Word Right</td>
<td>I</td>
</tr>
<tr>
<td>SB</td>
<td>Store Byte</td>
<td>I</td>
</tr>
<tr>
<td>SH</td>
<td>Store Halfword</td>
<td>I</td>
</tr>
<tr>
<td>SW</td>
<td>Store Word</td>
<td>I</td>
</tr>
<tr>
<td>SWL</td>
<td>Store Word Left</td>
<td>I</td>
</tr>
<tr>
<td>SWR</td>
<td>Store Word Right</td>
<td>I</td>
</tr>
</tbody>
</table>

**Note:**¹ For Tables 1.1 through 1.17 this column refers to the level in which the instruction first appeared.

Table 1.1 CPU Instruction Set: Load and Store Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDI</td>
<td>Add Immediate</td>
<td>I</td>
</tr>
<tr>
<td>ADDIU</td>
<td>Add Immediate Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>SLTI</td>
<td>Set on Less Than Immediate</td>
<td>I</td>
</tr>
<tr>
<td>SLTIU</td>
<td>Set on Less Than Immediate Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>ANDI</td>
<td>AND Immediate</td>
<td>I</td>
</tr>
<tr>
<td>ORI</td>
<td>OR Immediate</td>
<td>I</td>
</tr>
<tr>
<td>XORI</td>
<td>Exclusive OR Immediate</td>
<td>I</td>
</tr>
<tr>
<td>LUI</td>
<td>Load Upper Immediate</td>
<td>I</td>
</tr>
</tbody>
</table>

Table 1.2 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)
<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD</td>
<td>Add</td>
<td>I</td>
</tr>
<tr>
<td>ADDU</td>
<td>Add Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>SUB</td>
<td>Subtract</td>
<td>I</td>
</tr>
<tr>
<td>SUBU</td>
<td>Subtract Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>SLT</td>
<td>Set on Less Than</td>
<td>I</td>
</tr>
<tr>
<td>SLTU</td>
<td>Set on Less Than Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>AND</td>
<td>AND</td>
<td>I</td>
</tr>
<tr>
<td>OR</td>
<td>OR</td>
<td>I</td>
</tr>
<tr>
<td>XOR</td>
<td>Exclusive OR</td>
<td>I</td>
</tr>
<tr>
<td>NOR</td>
<td>NOR</td>
<td>I</td>
</tr>
</tbody>
</table>

**Table 1.3** CPU Instruction Set: Arithmetic (3-Operand, R-Type)

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>MULT</td>
<td>Multiply</td>
<td>I</td>
</tr>
<tr>
<td>MULTU</td>
<td>Multiply Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>DIV</td>
<td>Divide</td>
<td>I</td>
</tr>
<tr>
<td>DIVU</td>
<td>Divide Unsigned</td>
<td>I</td>
</tr>
<tr>
<td>MFHI</td>
<td>Move From HI</td>
<td>I</td>
</tr>
<tr>
<td>MTHI</td>
<td>Move To HI</td>
<td>I</td>
</tr>
<tr>
<td>MFLO</td>
<td>Move From LO</td>
<td>I</td>
</tr>
<tr>
<td>MTLO</td>
<td>Move To LO</td>
<td>I</td>
</tr>
</tbody>
</table>

**Table 1.4** CPU Instruction Set: Multiply and Divide Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>J</td>
<td>Jump</td>
<td>I</td>
</tr>
<tr>
<td>JAL</td>
<td>Jump And Link</td>
<td>I</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>JR</td>
<td>Jump Register</td>
<td>I</td>
</tr>
<tr>
<td>JALR</td>
<td>Jump And Link Register</td>
<td>I</td>
</tr>
<tr>
<td>BEQ</td>
<td>Branch on Equal</td>
<td>I</td>
</tr>
<tr>
<td>BNE</td>
<td>Branch on Not Equal</td>
<td>I</td>
</tr>
<tr>
<td>BLEZ</td>
<td>Branch on Less Than or Equal to Zero</td>
<td>I</td>
</tr>
<tr>
<td>BGTZ</td>
<td>Branch on Greater Than Zero</td>
<td>I</td>
</tr>
<tr>
<td>BLTZ</td>
<td>Branch on Less Than Zero</td>
<td>I</td>
</tr>
<tr>
<td>BGEZ</td>
<td>Branch on Greater Than or Equal to Zero</td>
<td>I</td>
</tr>
<tr>
<td>BLTZAL</td>
<td>Branch on Less Than Zero And Link</td>
<td>I</td>
</tr>
<tr>
<td>BGEZAL</td>
<td>Branch on Greater Than or Equal to Zero And Link</td>
<td>I</td>
</tr>
</tbody>
</table>

Table 1.5  CPU Instruction Set: Jump and Branch Instruction

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>SLL</td>
<td>Shift Left Logical</td>
<td>I</td>
</tr>
<tr>
<td>SRL</td>
<td>Shift Right Logical</td>
<td>I</td>
</tr>
<tr>
<td>SRA</td>
<td>Shift Right Arithmetic</td>
<td>I</td>
</tr>
<tr>
<td>SLLV</td>
<td>Shift Left Logical Variable</td>
<td>I</td>
</tr>
<tr>
<td>SRLV</td>
<td>Shift Right Logical Variable</td>
<td>I</td>
</tr>
<tr>
<td>SRAV</td>
<td>Shift Right Arithmetic Variable</td>
<td>I</td>
</tr>
</tbody>
</table>

Table 1.6  CPU Instruction Set: Shift Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWCz</td>
<td>Load Word to Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>SWCz</td>
<td>Store Word from Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>MTCz</td>
<td>Move To Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>MFCz</td>
<td>Move From Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>CTCz</td>
<td>Move Control to Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>CFCz</td>
<td>Move Control From Coprocessor z</td>
<td>I</td>
</tr>
<tr>
<td>COPz</td>
<td>Coprocessor Operation z</td>
<td>I</td>
</tr>
<tr>
<td>BCzT</td>
<td>Branch on Coprocessor z True</td>
<td>I</td>
</tr>
<tr>
<td>BCzF</td>
<td>Branch on Coprocessor z False</td>
<td>I</td>
</tr>
</tbody>
</table>

Table 1.7  Instruction Set: Coprocessor Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>SYSCALL</td>
<td>System Call</td>
<td>I</td>
</tr>
<tr>
<td>BREAK</td>
<td>Break</td>
<td>I</td>
</tr>
</tbody>
</table>

Table 1.8  CPU Instruction Set: Special Instructions
<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>LD</td>
<td>Load Doubleword</td>
<td>III</td>
</tr>
<tr>
<td>LDL</td>
<td>Load Doubleword Left</td>
<td>III</td>
</tr>
<tr>
<td>LDR</td>
<td>Load Doubleword Right</td>
<td>III</td>
</tr>
<tr>
<td>LL</td>
<td>Load Linked</td>
<td>II</td>
</tr>
<tr>
<td>LLD</td>
<td>Load Linked Doubleword</td>
<td>III</td>
</tr>
<tr>
<td>LWU</td>
<td>Load Word Unsigned</td>
<td>III</td>
</tr>
<tr>
<td>SC</td>
<td>Store Conditional</td>
<td>II</td>
</tr>
<tr>
<td>SCD</td>
<td>Store Conditional Doubleword</td>
<td>III</td>
</tr>
<tr>
<td>SD</td>
<td>Store Doubleword</td>
<td>III</td>
</tr>
<tr>
<td>SDL</td>
<td>Store Doubleword Left</td>
<td>III</td>
</tr>
<tr>
<td>SDR</td>
<td>Store Doubleword Right</td>
<td>III</td>
</tr>
<tr>
<td>SYNC</td>
<td>Sync</td>
<td>II</td>
</tr>
</tbody>
</table>

Table 1.9  MIPS 2/ MIPS 3 Additional: Load and Store Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DADDI</td>
<td>Doubleword Add Immediate</td>
<td>III</td>
</tr>
<tr>
<td>DADDIU</td>
<td>Doubleword Add Immediate Unsigned</td>
<td>III</td>
</tr>
</tbody>
</table>

Table 1.10  MIPS 2/ MIPS 3 Additional: Arithmetic Instructions (ALU Immediate)

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DMULT</td>
<td>Doubleword Multiply</td>
<td>III</td>
</tr>
<tr>
<td>DMULTU</td>
<td>Doubleword Multiply Unsigned</td>
<td>III</td>
</tr>
<tr>
<td>DDIV</td>
<td>Doubleword Divide</td>
<td>III</td>
</tr>
<tr>
<td>DDIVU</td>
<td>Doubleword Divide Unsigned</td>
<td>III</td>
</tr>
</tbody>
</table>

Table 1.11  MIPS 2/ MIPS 3 Additional: Multiply and Divide Instructions
### Table 1.12 MIPS 2/MIPS 3 Additional: Branch Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>BEQL</td>
<td>Branch on Equal Likely</td>
<td>II</td>
</tr>
<tr>
<td>BNEL</td>
<td>Branch on Not Equal Likely</td>
<td>II</td>
</tr>
<tr>
<td>BLEZL</td>
<td>Branch on Less Than or Equal to Zero Likely</td>
<td>II</td>
</tr>
<tr>
<td>BGTZL</td>
<td>Branch on Greater Than Zero Likely</td>
<td>II</td>
</tr>
<tr>
<td>BLTZL</td>
<td>Branch on Less Than Zero Likely</td>
<td>II</td>
</tr>
<tr>
<td>BGEZL</td>
<td>Branch on Greater Than or Equal to Zero Likely</td>
<td>II</td>
</tr>
<tr>
<td>BLTZALL</td>
<td>Branch on Less Than Zero And Link Likely</td>
<td>II</td>
</tr>
<tr>
<td>BGEZALL</td>
<td>Branch on Greater Than or Equal to Zero And Link Likely</td>
<td>II</td>
</tr>
<tr>
<td>BCzTL</td>
<td>Branch on Coprocessor z True Likely</td>
<td>II</td>
</tr>
<tr>
<td>BCzFL</td>
<td>Branch on Coprocessor z False Likely</td>
<td>II</td>
</tr>
</tbody>
</table>

### Table 1.13 MIPS 2/MIPS 3 Additional: Arithmetic Instructions (3-operand, R-type)

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DADD</td>
<td>Doubleword Add</td>
<td>III</td>
</tr>
<tr>
<td>DADDU</td>
<td>Doubleword Add Unsigned</td>
<td>III</td>
</tr>
<tr>
<td>DSUB</td>
<td>Doubleword Subtract</td>
<td>III</td>
</tr>
<tr>
<td>DSUBU</td>
<td>Doubleword Subtract Unsigned</td>
<td>III</td>
</tr>
</tbody>
</table>

### Table 1.14 MIPS 2/MIPS 3 Additional: Shift Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DSLL</td>
<td>Doubleword Shift Left Logical</td>
<td>III</td>
</tr>
<tr>
<td>DSRL</td>
<td>Doubleword Shift Right Logical</td>
<td>III</td>
</tr>
<tr>
<td>DSRA</td>
<td>Doubleword Shift Right Arithmetic</td>
<td>III</td>
</tr>
<tr>
<td>DSLLV</td>
<td>Doubleword Shift Left Logical Variable</td>
<td>III</td>
</tr>
<tr>
<td>DSRLV</td>
<td>Doubleword Shift Right Logical Variable</td>
<td>III</td>
</tr>
<tr>
<td>DSRAV</td>
<td>Doubleword Shift Right Arithmetic Variable</td>
<td>III</td>
</tr>
<tr>
<td>DSLL32</td>
<td>Doubleword Shift Left Logical + 32</td>
<td>III</td>
</tr>
<tr>
<td>DSRL32</td>
<td>Doubleword Shift Right Logical + 32</td>
<td>III</td>
</tr>
<tr>
<td>DSRA32</td>
<td>Doubleword Shift Right Arithmetic + 32</td>
<td>III</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>TGE</td>
<td>Trap if Greater Than or Equal</td>
<td>II</td>
</tr>
<tr>
<td>TGEU</td>
<td>Trap if Greater Than or Equal Unsigned</td>
<td>II</td>
</tr>
<tr>
<td>TLT</td>
<td>Trap if Less Than</td>
<td>II</td>
</tr>
<tr>
<td>TLTU</td>
<td>Trap if Less Than Unsigned</td>
<td>II</td>
</tr>
<tr>
<td>TEQ</td>
<td>Trap if Equal</td>
<td>II</td>
</tr>
<tr>
<td>TNE</td>
<td>Trap if Not Equal</td>
<td>II</td>
</tr>
<tr>
<td>TGEI</td>
<td>Trap if Greater Than or Equal Immediate</td>
<td>II</td>
</tr>
<tr>
<td>TGEIU</td>
<td>Trap if Greater Than or Equal Immediate Unsigned</td>
<td>II</td>
</tr>
<tr>
<td>TLTI</td>
<td>Trap if Less Than Immediate</td>
<td>II</td>
</tr>
<tr>
<td>TLTIU</td>
<td>Trap if Less Than Immediate Unsigned</td>
<td>II</td>
</tr>
<tr>
<td>TEQI</td>
<td>Trap if Equal Immediate</td>
<td>II</td>
</tr>
<tr>
<td>TNEI</td>
<td>Trap if Not Equal Immediate</td>
<td>II</td>
</tr>
</tbody>
</table>

Table 1.15 MIPS 2/MIPS 3 Additional: Exception Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DMFCz</td>
<td>Doubleword Move From Coprocessor z</td>
<td>III</td>
</tr>
<tr>
<td>DMTCz</td>
<td>Doubleword Move To Coprocessor z</td>
<td>III</td>
</tr>
<tr>
<td>LDCz</td>
<td>Load Double Coprocessor z</td>
<td>II</td>
</tr>
<tr>
<td>SDCz</td>
<td>Store Double Coprocessor z</td>
<td>II</td>
</tr>
</tbody>
</table>

Table 1.16 MIPS 2/MIPS 3 Additional: Coprocessor Instructions

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
<th>MIPS ISA Level</th>
</tr>
</thead>
<tbody>
<tr>
<td>DMFC0</td>
<td>Doubleword Move From CP0</td>
<td>III</td>
</tr>
<tr>
<td>DMTC0</td>
<td>Doubleword Move To CP0</td>
<td>III</td>
</tr>
<tr>
<td>MTC0</td>
<td>Move to CP0</td>
<td>I</td>
</tr>
<tr>
<td>MFC0</td>
<td>Move from CP0</td>
<td>I</td>
</tr>
<tr>
<td>TLBR</td>
<td>Read Indexed TLB Entry</td>
<td>I</td>
</tr>
<tr>
<td>TLBWI</td>
<td>Write Indexed TLB Entry</td>
<td>I</td>
</tr>
<tr>
<td>TLBWR</td>
<td>Write Random TLB Entry</td>
<td>I</td>
</tr>
<tr>
<td>TLBP</td>
<td>Probe TLB for Matching Entry</td>
<td>I</td>
</tr>
<tr>
<td>CACHE</td>
<td>Cache Operation</td>
<td>R4xxx only</td>
</tr>
<tr>
<td>ERET</td>
<td>Exception Return</td>
<td>R4xxx only</td>
</tr>
<tr>
<td>WAIT</td>
<td>Enter Standby mode</td>
<td>R4600 only</td>
</tr>
</tbody>
</table>

Table 1.17 CP0 Instructions

**Data Formats and Addressing**

The R4600/R4700 processor uses four data formats: a 64-bit doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte ordering within each of the larger data formats—halfword, word, doubleword—can be configured in either big-endian or little-endian order. Endianness refers to the location of byte 0 within the multi-byte data structure. Figures 1.4 and 1.5 show the ordering of bytes within words and the ordering of words within multiple-word structures for the big-endian and little-endian conventions.

When the R4600/R4700 is configured as a big-endian system, byte 0 is the most-significant (leftmost) byte, which provides compatibility with MC68000 and IBM 370 conventions. Figure 1.4 shows this configuration.

When configured as a little-endian system, byte 0 is always the least-significant (rightmost) byte, which is compatible with iAPX x86 and DEC VAX conventions. Figure 1.5 shows this configuration.

In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit designations are always little-endian (although no instructions explicitly designate bit positions within words).
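As an illustration of the two conventions, the sketch below writes a 32-bit word into four consecutive bytes and reads it back, placing the most-significant byte at byte 0 (big-endian) or the least-significant byte at byte 0 (little-endian). It is a host-independent C model; the type and function names are hypothetical and not part of any IDT software.

```c
#include <stdint.h>

/* Hypothetical selector for the byte-ordering convention being modelled. */
typedef enum { ORDER_BIG_ENDIAN, ORDER_LITTLE_ENDIAN } byte_order;

/* Store a 32-bit word at mem[0..3] using the requested convention.
   Bit 0 is always the least-significant bit, as in this manual. */
static void store_word(uint8_t mem[4], uint32_t word, byte_order order)
{
    for (int i = 0; i < 4; i++) {
        /* Big-endian: byte 0 receives bits 31..24.
           Little-endian: byte 0 receives bits 7..0. */
        int shift = (order == ORDER_BIG_ENDIAN) ? 8 * (3 - i) : 8 * i;
        mem[i] = (uint8_t)(word >> shift);
    }
}

/* Reassemble the word from mem[0..3] using the same convention. */
static uint32_t load_word(const uint8_t mem[4], byte_order order)
{
    uint32_t word = 0;
    for (int i = 0; i < 4; i++) {
        int shift = (order == ORDER_BIG_ENDIAN) ? 8 * (3 - i) : 8 * i;
        word |= (uint32_t)mem[i] << shift;
    }
    return word;
}
```

Storing 0x11223344 yields the bytes 11 22 33 44 (byte 0 first) in big-endian order and 44 33 22 11 in little-endian order; in both cases bit 0 remains the least-significant bit of the word.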

Figures 1.6 and 1.7 show little-endian and big-endian byte ordering in doublewords.

The CPU uses byte addressing for halfword, word, and doubleword accesses with the following alignment constraints:

- Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).
- Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).
- Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).
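The three rules reduce to one check: an access of N bytes (N = 2, 4 or 8) needs an address whose low log2(N) bits are zero. The C sketch below expresses that check with a hypothetical helper name. On the processor itself, a load or store that breaks the rule raises an Address Error exception rather than returning a status.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if an access of `size` bytes (1, 2, 4 or 8) at byte address
   `addr` is naturally aligned, i.e. addr is a multiple of size. */
static bool is_naturally_aligned(uint64_t addr, unsigned size)
{
    /* size is a power of two, so size - 1 masks the low-order address bits */
    return (addr & (uint64_t)(size - 1u)) == 0;
}
```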

The following special instructions load and store words and doublewords that are not aligned on 4-byte (word) or 8-byte (doubleword) boundaries:

- LWL, LWR, SWL, SWR
- LDL, LDR, SDL, SDR

These instructions are used in pairs to access misaligned words and doublewords. Accessing misaligned data takes one additional instruction cycle compared with aligned data, because the access needs the second instruction of the pair (for example, LWL and LWR form a pair). Apart from that extra instruction, the CPU moves unaligned data as fast as a dedicated hardware mechanism would.

Figures 1.8 and 1.9 show the access of a misaligned word that has byte address 3.

![Figure 1.8 Big-Endian Misaligned Word Addressing](image1)

![Figure 1.9 Little-Endian Misaligned Word Addressing](image2)
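To make the pairing concrete, the following C sketch models the big-endian behaviour of LWL and LWR (as in Figure 1.8) on a byte array. It is a behavioural model under stated assumptions (big-endian byte order, 32-bit register values, no sign extension into a 64-bit register), not a description of the hardware; the function names are hypothetical. As is customary, LWL is given the address of the first byte and LWR the address of the last byte (address + 3).

```c
#include <stdint.h>

/* Big-endian LWL model: merge bytes mem[addr .. (addr | 3)] into the
   most-significant end of reg; the remaining low bytes are unchanged. */
static uint32_t lwl_be(const uint8_t *mem, uint32_t addr, uint32_t reg)
{
    uint32_t end = addr | 3u;              /* last byte of the aligned word */
    int shift = 24;                        /* start filling at the MSB */
    for (uint32_t a = addr; a <= end; a++, shift -= 8) {
        reg &= ~(0xFFu << shift);
        reg |= (uint32_t)mem[a] << shift;
    }
    return reg;
}

/* Big-endian LWR model: merge bytes mem[(addr & ~3) .. addr] into the
   least-significant end of reg; the remaining high bytes are unchanged. */
static uint32_t lwr_be(const uint8_t *mem, uint32_t addr, uint32_t reg)
{
    uint32_t start = addr & ~3u;           /* first byte of the aligned word */
    int shift = 0;                         /* fill from the LSB upwards */
    for (uint32_t a = addr; ; a--, shift += 8) {
        reg &= ~(0xFFu << shift);
        reg |= (uint32_t)mem[a] << shift;
        if (a == start)
            break;
    }
    return reg;
}

/* Load a possibly misaligned word the way an LWL/LWR pair would. */
static uint32_t load_unaligned_word_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t reg = 0;
    reg = lwl_be(mem, addr, reg);          /* like LWL rt, 0(base) */
    reg = lwr_be(mem, addr + 3u, reg);     /* like LWR rt, 3(base) */
    return reg;
}
```

For memory holding 00 11 22 33 44 55 66 77, `load_unaligned_word_be(mem, 3)` returns 0x33445566: LWL supplies byte 3 and LWR supplies bytes 4 through 6.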

**Coprocessors (CP0-CP2)**

The MIPS ISA (MIPS III) for the R4600/R4700 (and R4000/R4400) defines three coprocessors (designated CP0 through CP2):

- **Coprocessor 0 (CP0)** is incorporated on the CPU chip and supports the virtual memory system and exception handling. CP0 is also referred to as the System Control Coprocessor.
- **Coprocessor 1 (CP1)** is incorporated on the R4600/R4700, and implements the MIPS floating-point instruction set.
- **Coprocessor 2 (CP2)** is reserved for future use.

CP0 and CP1 are described in the sections that follow.

**System Control Coprocessor, CP0**

CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing diagnostic control and error recovery facilities.

CP0 also controls power management on the R4600/R4700 through a standby mode, which reduces the power consumption of the CPU's internal core. Standby mode is entered by executing the WAIT instruction while the SysAD bus is idle, and is exited on any interrupt. This feature is discussed in Appendix G.

The CP0 registers shown in Figure 1.10 and described in Table 1.18 on page 1.17 manipulate the memory management and exception handling capabilities of the CPU.

**Note:** The results of accessing a reserved or undefined CP0 register are undefined; an exception may or may not occur.

<table>
<thead>
<tr>
<th>Register Name</th>
<th>Reg. #</th>
<th>Register Name</th>
<th>Reg. #</th>
</tr>
</thead>
<tbody>
<tr>
<td>Index</td>
<td>0</td>
<td>Config</td>
<td>16</td>
</tr>
<tr>
<td>Random</td>
<td>1</td>
<td>LLAddr</td>
<td>17</td>
</tr>
<tr>
<td>EntryLo0</td>
<td>2</td>
<td>XContext</td>
<td>20</td>
</tr>
<tr>
<td>EntryLo1</td>
<td>3</td>
<td>ECC</td>
<td>26</td>
</tr>
<tr>
<td>Context</td>
<td>4</td>
<td>CacheErr</td>
<td>27</td>
</tr>
<tr>
<td>PageMask</td>
<td>5</td>
<td>TagLo</td>
<td>28</td>
</tr>
<tr>
<td>Wired</td>
<td>6</td>
<td>TagHi</td>
<td>29</td>
</tr>
<tr>
<td>BadVAddr</td>
<td>8</td>
<td>ErrorEPC</td>
<td>30</td>
</tr>
<tr>
<td>Count</td>
<td>9</td>
<td></td>
<td></td>
</tr>
<tr>
<td>EntryHi</td>
<td>10</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Compare</td>
<td>11</td>
<td></td>
<td></td>
</tr>
<tr>
<td>SR</td>
<td>12</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Cause</td>
<td>13</td>
<td></td>
<td></td>
</tr>
<tr>
<td>EPC</td>
<td>14</td>
<td></td>
<td></td>
</tr>
<tr>
<td>PRId</td>
<td>15</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Figure 1.10** R4600/R4700 CP0 Registers

<table>
<thead>
<tr>
<th>Number</th>
<th>Register</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Index</td>
<td>Programmable pointer into TLB array</td>
</tr>
<tr>
<td>1</td>
<td>Random</td>
<td>Pseudorandom pointer into TLB array <em>(read only)</em></td>
</tr>
<tr>
<td>2</td>
<td>EntryLo0</td>
<td>Low half of TLB entry for even virtual page (VPN)</td>
</tr>
<tr>
<td>3</td>
<td>EntryLo1</td>
<td>Low half of TLB entry for odd virtual page (VPN)</td>
</tr>
<tr>
<td>4</td>
<td>Context</td>
<td>Pointer to kernel virtual page table entry (PTE) for 32-bit address spaces</td>
</tr>
<tr>
<td>5</td>
<td>PageMask</td>
<td>TLB Page Mask</td>
</tr>
<tr>
<td>6</td>
<td>Wired</td>
<td>Number of wired TLB entries</td>
</tr>
<tr>
<td>7</td>
<td>—</td>
<td>Reserved</td>
</tr>
<tr>
<td>8</td>
<td>BadVAddr</td>
<td>Bad virtual address</td>
</tr>
<tr>
<td>9</td>
<td>Count</td>
<td>Timer Count</td>
</tr>
<tr>
<td>10</td>
<td>EntryHi</td>
<td>High half of TLB entry</td>
</tr>
<tr>
<td>11</td>
<td>Compare</td>
<td>Timer Compare</td>
</tr>
<tr>
<td>12</td>
<td>SR</td>
<td>Status register</td>
</tr>
<tr>
<td>13</td>
<td>Cause</td>
<td>Cause of last exception</td>
</tr>
<tr>
<td>14</td>
<td>EPC</td>
<td>Exception Program Counter</td>
</tr>
<tr>
<td>15</td>
<td>PRId</td>
<td>Processor Revision Identifier</td>
</tr>
<tr>
<td>16</td>
<td>Config</td>
<td>Configuration register</td>
</tr>
<tr>
<td>17</td>
<td>LLAddr</td>
<td>Load Linked Address</td>
</tr>
<tr>
<td>18-19</td>
<td>—</td>
<td>Reserved</td>
</tr>
<tr>
<td>20</td>
<td>XContext</td>
<td>Pointer to kernel virtual PTE table for 64-bit address spaces</td>
</tr>
<tr>
<td>21-25</td>
<td>—</td>
<td>Reserved</td>
</tr>
<tr>
<td>26</td>
<td>ECC</td>
<td>Secondary-cache error checking and correcting (ECC) and Primary parity</td>
</tr>
<tr>
<td>27</td>
<td>CacheErr</td>
<td>Cache Error and Status register</td>
</tr>
<tr>
<td>28</td>
<td>TagLo</td>
<td>Cache Tag register</td>
</tr>
<tr>
<td>29</td>
<td>TagHi</td>
<td>Cache Tag register</td>
</tr>
<tr>
<td>30</td>
<td>ErrorEPC</td>
<td>Error Exception Program Counter</td>
</tr>
<tr>
<td>31</td>
<td>—</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Table 1.18 System Control Coprocessor (CP0) Register Definitions
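In system software, Table 1.18 maps naturally onto a set of named constants; these numbers are what MFC0, MTC0, DMFC0 and DMTC0 encode as the CP0 register operand. The C enumeration below is a sketch with hypothetical identifier names. Because accesses to reserved registers have undefined results, it also includes a helper that flags the reserved numbers.

```c
#include <stdbool.h>

/* CP0 register numbers transcribed from Table 1.18 (names hypothetical).
   Reserved numbers 7, 18, 19, 21-25 and 31 are deliberately omitted. */
enum cp0_reg {
    CP0_INDEX    = 0,
    CP0_RANDOM   = 1,
    CP0_ENTRYLO0 = 2,
    CP0_ENTRYLO1 = 3,
    CP0_CONTEXT  = 4,
    CP0_PAGEMASK = 5,
    CP0_WIRED    = 6,
    CP0_BADVADDR = 8,
    CP0_COUNT    = 9,
    CP0_ENTRYHI  = 10,
    CP0_COMPARE  = 11,
    CP0_STATUS   = 12,   /* "SR" in Table 1.18 */
    CP0_CAUSE    = 13,
    CP0_EPC      = 14,
    CP0_PRID     = 15,
    CP0_CONFIG   = 16,
    CP0_LLADDR   = 17,
    CP0_XCONTEXT = 20,
    CP0_ECC      = 26,
    CP0_CACHEERR = 27,
    CP0_TAGLO    = 28,
    CP0_TAGHI    = 29,
    CP0_ERROREPC = 30
};

/* True for register numbers (0..31) that Table 1.18 lists as reserved. */
static bool cp0_is_reserved(unsigned reg)
{
    return reg == 7u || reg == 18u || reg == 19u ||
           (reg >= 21u && reg <= 25u) || reg == 31u;
}
```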

**Floating-Point Co-Processor**

The R4600/R4700 incorporates an entire floating-point co-processor on chip, including a floating-point register file and execution units. The floating-point co-processor forms a “seamless” interface with the integer unit, decoding and executing instructions in parallel with the integer unit. The R4700 enhances the floating-point unit implemented in the original R4600, resulting in an improved peak MFLOP rate.

**Floating-Point Units**

The R4600/R4700 floating-point execution units support single- and double-precision arithmetic, as specified in IEEE Standard 754. The execution unit is split into a separate multiply unit and a combined add/convert/divide/square root unit. Multiplies can overlap with adds and subtracts. The multiplier is partially pipelined, allowing a new multiply to begin every 6 cycles on the R4600 and every 4 cycles on the R4700.

As in the R3010 and R4000, the R4600/R4700 maintains fully precise floating-point exceptions while allowing both overlapped and pipelined operations. Precise exceptions are extremely important in mission-critical environments, such as those programmed in Ada, and highly desirable for debugging in any environment.

The floating-point unit’s operation set includes floating-point add, subtract, multiply, divide, square root, conversion between fixed-point and floating-point format, conversion among floating-point formats, and floating-point compare. These operations comply with the IEEE Standard 754.

Table 1.19 shows the latencies of some of the floating-point instructions in internal processor cycles. Because of pipelining, repeat rates may be higher than these latencies suggest. Also note that many operations are autonomous and can proceed in parallel.

<table>
<thead>
<tr>
<th>Operation</th>
<th>Single Precision</th>
<th>Double Precision</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>SUB</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>MUL (R4600)</td>
<td>8</td>
<td>8</td>
</tr>
<tr>
<td>MUL (R4700)</td>
<td>4</td>
<td>5</td>
</tr>
<tr>
<td>DIV</td>
<td>32</td>
<td>61</td>
</tr>
<tr>
<td>SQRT</td>
<td>31</td>
<td>60</td>
</tr>
<tr>
<td>CMP</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>FIX</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>FLOAT</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>ABS</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>MOV</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>NEG</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>LWC1, LDC1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>SWC1, SDC1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 1.19  Floating-Point Latency Cycles
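For a stream of independent multiplies, latency and repeat rate combine simply: the first result is ready after the latency, and each later multiply can start one repeat interval after the previous one. The sketch below assumes no data dependences and no stalls from other units (an idealization), and uses the figures given in this section: a latency of 8 cycles and a repeat rate of 6 cycles on the R4600, and a latency of 4 (single) or 5 (double) cycles with a repeat rate of 4 cycles on the R4700. The function name is hypothetical.

```c
/* Idealized cycle count for n independent FP multiplies on a partially
   pipelined multiplier: the first result needs `latency` cycles and a new
   multiply may start every `repeat` cycles. Ignores all other stalls. */
static unsigned long fp_mul_stream_cycles(unsigned long n,
                                          unsigned latency,
                                          unsigned repeat)
{
    if (n == 0)
        return 0;
    return (n - 1) * repeat + latency;
}
```

Under these assumptions, 100 double-precision multiplies take about 99 × 6 + 8 = 602 cycles on the R4600 and 99 × 4 + 5 = 401 cycles on the R4700.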

**Virtual to Physical Address Mapping**

The R4600/R4700 provides three modes of operation:

- user mode
- supervisor mode
- kernel mode

This mechanism is available to system software to provide a secure environment for user processes. Bits in the Status register determine the mode of operation. In user mode, the R4600/R4700 provides a single, uniform virtual address space of 256GB (2GB when Status.UX = 0).

When operating in the kernel mode, four distinct virtual address spaces, totalling 1024GB (4GB when Status.KX = 0), are simultaneously available and are differentiated by the high-order bits of the virtual address.

The R4600/R4700 processors also support a supervisor mode in which the virtual address space is 256.5GB (2.5GB when Status.SX = 0), divided into three regions based on the high-order bits of the virtual address.

When the R4600/R4700 uses 64-bit virtual addresses, the address space layouts are an upward compatible extension of the 32-bit virtual address space layout. A detailed description of the addressing is given in Chapter 4.
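The decoding by high-order bits is easiest to see in the 32-bit layout (Status.KX = 0). The sketch below classifies a kernel-mode 32-bit virtual address using the segment names and boundaries of the standard R4000-family layout; those names and boundaries come from the general MIPS III architecture rather than from this section, so Chapter 4 remains the authoritative description. In supervisor mode, only the first 2GB and the 512MB region starting at 0xC0000000 are accessible, which accounts for the 2.5GB figure above.

```c
#include <stdint.h>

/* 32-bit (Status.KX = 0) segments as seen in kernel mode. */
typedef enum { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSSEG, SEG_KSEG3 } seg32;

/* Classify a 32-bit virtual address by its high-order bits. */
static seg32 classify_kernel_va32(uint32_t va)
{
    if (va < 0x80000000u) return SEG_KUSEG;  /* 2GB, TLB-mapped user space       */
    if (va < 0xA0000000u) return SEG_KSEG0;  /* 512MB, unmapped, cacheable       */
    if (va < 0xC0000000u) return SEG_KSEG1;  /* 512MB, unmapped, uncached        */
    if (va < 0xE0000000u) return SEG_KSSEG;  /* 512MB, TLB-mapped supervisor     */
    return SEG_KSEG3;                        /* 512MB, TLB-mapped kernel         */
}
```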

Joint TLB

For fast virtual-to-physical address decoding, the R4600/R4700 uses a large, fully associative TLB that maps 96 virtual pages to their corresponding physical addresses. The TLB is organized as 48 pairs of even-odd entries, and maps a virtual address and address space identifier into the large, 64GB physical address space.

Two mechanisms are provided to assist in controlling the amount of mapped space and the replacement characteristics of various memory regions. First, the page size can be configured on a per-entry basis, from 4KB to 16MB in steps of a factor of 4 (4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB). A CP0 register (PageMask) is loaded with the page size of a mapping, and that size is entered into the TLB when a new entry is written. Thus, operating systems can provide special-purpose maps; for example, a typical frame buffer can be memory mapped using only one TLB entry.
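The page size is encoded in the PageMask register (CP0 register 5) as a mask in bits 24..13, with two more bits set for each factor of 4 above 4KB. The C sketch below derives that value for each legal size and rejects all others; the encoding follows the R4000-family PageMask format, and the function name is hypothetical. Because each JTLB entry maps an even/odd pair of pages, one entry covers twice the page size; a 2MB frame buffer, for example, fits in one entry that uses 1MB pages.

```c
#include <stdint.h>

/* Return the PageMask value for a TLB page size, or -1 if the size is not
   one of 4KB, 16KB, 64KB, 256KB, 1MB, 4MB or 16MB. */
static int64_t pagemask_for_size(uint32_t page_bytes)
{
    uint32_t size = 4096u;
    for (int i = 0; i < 7; i++, size <<= 2) {
        if (page_bytes == size)
            /* (size / 4KB) - 1 gives the run of ones; it starts at bit 13 */
            return (int64_t)((size / 4096u - 1u) << 13);
    }
    return -1;
}
```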

The second mechanism controls the replacement algorithm when a TLB miss occurs. The R4600/R4700 provides a random replacement algorithm to select a TLB entry to be written with a new mapping; however, the processor also provides a mechanism whereby a system-specific number of mappings can be locked into the TLB so that they are never randomly replaced. This facilitates the design of real-time systems by allowing deterministic access to critical software.
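The locking mechanism relies on two CP0 registers from Table 1.18: Wired holds the number of locked entries, and Random supplies the index used by TLBWR (Write Random TLB Entry). The simplified C model below assumes that Random counts down from the highest JTLB entry (47) to the Wired value and then wraps, so entries 0 through Wired - 1 are never selected. The rate at which the hardware advances Random is not modelled, and the names are hypothetical.

```c
#define JTLB_ENTRIES 48u   /* 48 even/odd entry pairs: indices 0..47 */

/* Next value of the modelled Random register. `wired` is assumed to be
   below JTLB_ENTRIES; entries below it are never produced. */
static unsigned random_next(unsigned random, unsigned wired)
{
    return (random <= wired) ? JTLB_ENTRIES - 1u : random - 1u;
}
```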

The joint TLB also contains information to control the cache coherency protocol for each page. Specifically, each page has attribute bits that select one of the following algorithms: uncached, non-coherent write-back, non-coherent write-through with write-allocate, non-coherent write-through without write-allocate, sharable, exclusive, or update. Non-coherent write-back is typically used for both code and data on the R4600/R4700; the write-through modes support more efficient frame buffer accesses than the R4000 family. The coherent modes are supported for R4000 compatibility and generate different transaction types on the system interface; however, cache coherency itself is not supported.

**Instruction TLB**

The R4600/R4700 also incorporates a 2-entry instruction TLB. Each entry maps a 4KB page. The instruction TLB improves performance by allowing instruction address translation to occur in parallel with data address translation. When a miss occurs on an instruction address translation, the least-recently used ITLB entry is filled from the JTLB. The operation of the ITLB is invisible to the user.

**Data TLB**

The R4600/R4700 also incorporates a 4-entry data TLB. Each entry maps a 4KB page. The data TLB improves performance by allowing data address translation to occur in parallel with instruction address translation. When a miss occurs on a data address translation, the DTLB is refilled from the JTLB. The DTLB refill is pseudo-LRU: the least recently used entry of the least recently used half is filled. The operation of the DTLB is invisible to the user.
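The DTLB refill rule can be expressed as a small tree pseudo-LRU. The sketch below assumes the four entries form two fixed halves, {0, 1} and {2, 3}, and keeps one bit for the least recently used half plus one bit per half for its least recently used entry. The hardware's actual state encoding is not described here, so this illustrates the policy rather than the circuit; all names are hypothetical.

```c
/* Pseudo-LRU state for a 4-entry DTLB split into halves {0,1} and {2,3}. */
typedef struct {
    unsigned lru_half;        /* least recently used half (0 or 1)        */
    unsigned lru_in_half[2];  /* least recently used slot within each half */
} dtlb_plru;

/* Record a use of entry e (0..3): its half and its slot become most recent. */
static void dtlb_touch(dtlb_plru *s, unsigned e)
{
    unsigned half = e >> 1, slot = e & 1u;
    s->lru_half = half ^ 1u;           /* the other half is now LRU        */
    s->lru_in_half[half] = slot ^ 1u;  /* the other slot in this half is LRU */
}

/* On a miss, pick the LRU entry of the LRU half, then mark it most recent. */
static unsigned dtlb_victim(dtlb_plru *s)
{
    unsigned half = s->lru_half;
    unsigned e = (half << 1) | s->lru_in_half[half];
    dtlb_touch(s, e);
    return e;
}
```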

**Cache Memory**

In order to keep the R4600/R4700’s high-performance pipeline full and operating efficiently, the R4600/R4700 incorporates on-chip instruction and data caches that can be accessed in a single processor cycle. Each cache has its own 64-bit data path and can be accessed in parallel. The cache subsystem provides the integer and floating-point units with an aggregate bandwidth of 1.6GB per second at a system clock frequency of 50MHz.

Furthermore, the large, two-way set associative caches increase emulation performance of DOS and Windows 3.1 applications when running under Windows NT.

**Instruction Cache**

The R4600/R4700 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 16KB in size and is protected with word parity.

Because the cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access, thus further increasing performance by allowing these two operations to occur simultaneously. The tag holds a 24-bit physical address and valid bit, and is parity protected.

The instruction cache is 64 bits wide, and can be refilled or accessed in a single processor cycle. Instruction fetches require only 32 bits per cycle, for a peak instruction bandwidth of 700 MB/sec @ 175MHz. Sequential accesses take advantage of the 64-bit fetch to reduce power dissipation, and cache miss refill writes 64 bits per cycle to minimize the cache miss penalty. The line size is eight instructions (32 bytes) to maximize performance.
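Since each 16KB two-way cache holds 8KB per way, a 32-byte line is selected by virtual address bits 12..5 and the byte within it by bits 4..0, while the 24-bit tag holds physical address bits 35..12 (compare the Cache Index and Cache Tag rows of Table 1.21). Bit 12 appears in both because, with 4KB pages, it is a virtual address bit that translation may change. The C sketch below, with hypothetical names, splits an address pair into these fields; it applies equally to the data cache, which has the same geometry.

```c
#include <stdint.h>

/* Primary cache geometry: 16KB, two-way, 32-byte lines (256 lines per way). */
#define LINE_BYTES    32u
#define WAY_BYTES     (16u * 1024u / 2u)
#define LINES_PER_WAY (WAY_BYTES / LINE_BYTES)

typedef struct {
    uint32_t offset;  /* byte within the line: vAddr bits 4..0           */
    uint32_t index;   /* line within each way: vAddr bits 12..5          */
    uint32_t tag;     /* compared with the stored tag: pAddr bits 35..12 */
} cache_addr;

static cache_addr split_cache_addr(uint32_t vaddr, uint64_t paddr)
{
    cache_addr a;
    a.offset = vaddr & (LINE_BYTES - 1u);
    a.index  = (vaddr / LINE_BYTES) & (LINES_PER_WAY - 1u);
    a.tag    = (uint32_t)((paddr >> 12) & 0xFFFFFFu);  /* 24-bit tag */
    return a;
}
```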

**Data Cache**

For fast, single cycle data access, the R4600/R4700 includes a 16KB on-chip data cache that is two-way set associative with a fixed 32-byte (eight words) line size. Both the D-cache and the I-cache can be accessed each pipeline cycle; thus, the data bandwidth is 1400 MB/sec @ 175 MHz, in addition to the 700 MB/sec instruction bandwidth.

The data cache is protected with byte parity and its tag is protected with a single parity bit. It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access.

The normal write policy is writeback, which means that a store to a cache line does not immediately cause memory to be updated. This increases system performance by reducing bus traffic and eliminating the bottleneck of waiting for each store operation to finish before issuing a subsequent memory operation. However, software can select write-through on a per-page basis when appropriate, such as for frame buffers.

Associated with the Data Cache is the store buffer. When the R4600/R4700 executes a Store instruction, this single-entry buffer gets written with the store data while the tag comparison is performed. If the tag matches, then the data is written into the Data Cache in the next cycle that the Data Cache is not accessed (the next non-load cycle). The store buffer allows the R4600/R4700 to execute a store every processor cycle and to perform back-to-back stores without penalty.

**Write buffer**

Writes to external memory, whether cache miss writebacks or stores to uncached or write-through addresses, use the on-chip write buffer. The write buffer holds up to four 64-bit address and data pairs or 1 cache line to be written back. The entire buffer is used for a data cache writeback and allows the processor to proceed in parallel with memory update. For uncached and write-through stores, the write buffer significantly increases performance over the R4000 family of processors.
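Functionally, the write buffer behaves as a small first-in, first-out queue between the pipeline and the system interface. The C sketch below models it as a 4-entry ring buffer of address/data pairs. The stall-when-full behaviour (a failed push) and strict oldest-first draining are assumptions made for illustration, the model ignores the case in which the whole buffer holds one cache line for writeback, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4u   /* four 64-bit address/data pairs */

typedef struct {
    uint64_t addr[WB_ENTRIES];
    uint64_t data[WB_ENTRIES];
    unsigned head;      /* oldest entry, next to be written to memory */
    unsigned count;     /* number of occupied entries */
} write_buffer;

/* Queue a store; returns false when full (where the pipeline would wait). */
static bool wb_push(write_buffer *wb, uint64_t addr, uint64_t data)
{
    if (wb->count == WB_ENTRIES)
        return false;
    unsigned tail = (wb->head + wb->count) % WB_ENTRIES;
    wb->addr[tail] = addr;
    wb->data[tail] = data;
    wb->count++;
    return true;
}

/* Retire the oldest entry to memory; returns false if the buffer is empty. */
static bool wb_drain(write_buffer *wb, uint64_t *addr, uint64_t *data)
{
    if (wb->count == 0)
        return false;
    *addr = wb->addr[wb->head];
    *data = wb->data[wb->head];
    wb->head = (wb->head + 1u) % WB_ENTRIES;
    wb->count--;
    return true;
}
```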

**R4600/R4700 Clocks**

The R4600/R4700 uses several clocks. First, there is the pipeline clock, PClock, which drives the pipeline and pipeline-related functions internal to the R4600/R4700. PClock runs at twice the MasterClock frequency. The next clock is the system interface clock, SClock. This is also an internal clock; it is used to sample data at the system interface and to clock data into the processor's system interface output registers. SClock is a divided version of PClock, and the divisor is selected at boot time.

There are three external clocks. (Some outputs are replicated to minimize loading.) MasterOut runs at the same frequency as MasterClock and can be used to clock certain external logic. The other two clocks, TClock (Transmit clock) and RClock (Receive clock), are used by the external agent. TClock clocks the external agent's output registers (signals transmitted to the R4600/R4700) and runs at the same frequency as SClock. RClock clocks the external agent's input registers (signals received from the R4600/R4700); it also runs at the SClock frequency, but its phase leads SClock and TClock by 25% of a cycle. The R4600/R4700 implements an on-chip PLL to eliminate the effects of clock skew.
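The frequency relationships above reduce to a few lines of arithmetic. The sketch below works in integer kilohertz to avoid rounding, accepts the SClock divisors 2 through 8 listed for the R4600/R4700 in Table 1.20, and expresses the RClock phase lead as a time; the function names are hypothetical.

```c
#include <stdint.h>

/* PClock runs at twice MasterClock (frequencies in kHz). */
static uint32_t pclock_khz(uint32_t masterclock_khz)
{
    return 2u * masterclock_khz;
}

/* SClock (and TClock/RClock) = PClock / divisor; returns 0 for a divisor
   outside the range 2..8. */
static uint32_t sclock_khz(uint32_t pclock, uint32_t divisor)
{
    if (divisor < 2u || divisor > 8u)
        return 0;
    return pclock / divisor;
}

/* RClock leads SClock by 25% of a cycle. 1e9 / f_kHz is the period in ps. */
static uint32_t rclock_lead_ps(uint32_t sclock)
{
    return (1000000000u / sclock) / 4u;
}
```

With a 50MHz MasterClock and a divisor of 2, PClock is 100MHz, SClock is 50MHz (a 20ns period), and RClock leads SClock by 5ns.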

**System Interface**

The R4600/R4700 supports a 64-bit system interface that is compatible with the R4000PC system interface. This interface operates from two clocks provided by the R4600/R4700, TClock[1:0] and RClock[1:0], at a division of the pipeline clock.

The interface consists of a 64-bit Address/Data bus with 8 check bits and a 9-bit command bus. In addition, there are 8 handshake signals and 6 interrupt inputs. The interface has a simple timing specification and is capable of transferring data between the processor and memory at a peak rate of 400MB/sec at 50MHz.

Figure 1.11 shows a typical system using the R4600/R4700. This example includes DRAM, a boot EPROM, and an optional secondary cache.

### Comparison of R4600/R4700 and R4400

This section compares features of the R4600/R4700 to the earlier R4400 PC. Table 1.20 to Table 1.26 highlight some of the differences between the R4600/R4700 and the R4400 PC. This list is not exhaustive.

<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/ R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>I/O</td>
<td>R4400: TTL compatible</td>
<td>R4600/R4700: TTL-compatible (5V ±0.5%)</td>
</tr>
<tr>
<td></td>
<td>RV4400: LV CMOS</td>
<td>RV4600/RV4700: LVCMOS (3.3V±0.3V)</td>
</tr>
<tr>
<td>Package</td>
<td>179-pin ceramic PGA</td>
<td>same and 208-pin MQUAD</td>
</tr>
<tr>
<td>JTAG</td>
<td>yes</td>
<td>no (serial out connected directly to serial in)</td>
</tr>
<tr>
<td>Block transfer sizes</td>
<td>16B or 32B</td>
<td>32B</td>
</tr>
<tr>
<td>SClock divisor</td>
<td>2, 3, 4, 6, 8</td>
<td>2, 3, 4, 5, 6, 7, 8</td>
</tr>
<tr>
<td>Non-block writes</td>
<td>max throughput of 4 sclock cycles</td>
<td>two new system interface protocol options that support 2 sclock cycle throughput (remains 4 in compatibility mode)</td>
</tr>
<tr>
<td>Serial configuration</td>
<td>as described in R4000 User's Guide</td>
<td>different, as described in Table 9.2 on page 9-7</td>
</tr>
<tr>
<td>Address bits 63..56 on reads and writes</td>
<td>zero</td>
<td>bits 19..12 of virtual address</td>
</tr>
<tr>
<td>Uncached and write-through stores</td>
<td>uncached stores are buffered in 1-entry uncached store buffer (write through not possible)</td>
<td>uncached and write-through stores buffered in 4-entry write buffer</td>
</tr>
<tr>
<td>SysADC</td>
<td>parity only</td>
<td>same</td>
</tr>
<tr>
<td>SysADC for non-data cycles</td>
<td>parity</td>
<td>zero</td>
</tr>
<tr>
<td>SysCmdP</td>
<td>parity</td>
<td>zero</td>
</tr>
<tr>
<td>Parity error during writeback</td>
<td>use Cache Error exception</td>
<td>output bad parity</td>
</tr>
<tr>
<td>Error bit in data identifier of read responses</td>
<td>Bus Error if error bit set for any doubleword</td>
<td>only check error bit of first doubleword; all other error bits ignored</td>
</tr>
<tr>
<td>Parity error on read data</td>
<td>Bus Error if parity error in any doubleword</td>
<td>bad parity written to cache; take Cache Error exception if bad parity occurs on doublewords that the processor is waiting for</td>
</tr>
<tr>
<td>Block writes</td>
<td>1-2 null cycles between address and data</td>
<td>0 cycles between address and data</td>
</tr>
<tr>
<td>Release after Read Request</td>
<td>variable latency</td>
<td>0 latency</td>
</tr>
<tr>
<td>SysAD value for x cycles of write-back data pattern</td>
<td>data bus undefined</td>
<td>data bus maintains last D cycle value</td>
</tr>
<tr>
<td>SysAD bus use after last D cycle of writeback</td>
<td>data bus undefined</td>
<td>trailing x cycles (e.g. DDxxDDxx, not DDxxDD) follow rule in entry immediately preceding</td>
</tr>
<tr>
<td>Output slew rate</td>
<td>dynamic feedback control</td>
<td>simple CMOS output buffers with 2-bit static strength control</td>
</tr>
<tr>
<td>IOOut output</td>
<td>output slew rate control feedback loop output</td>
<td>driven HIGH, do not connect (reserved for future output)</td>
</tr>
<tr>
<td>IOIn input</td>
<td>output slew rate control input</td>
<td>should be driven high (reserved for future input)</td>
</tr>
<tr>
<td>GrpRunB output</td>
<td>do not connect</td>
<td>same (reserved for future output)</td>
</tr>
<tr>
<td>GrpStallB input</td>
<td>should be connected to VCC</td>
<td>same (reserved for future input)</td>
</tr>
<tr>
<td>FaultB output pin</td>
<td>indicates compare mismatch</td>
<td>driven HIGH, do not connect (reserved for future output)</td>
</tr>
</tbody>
</table>

Table 1.20 System Interface Comparison Between R4400 PC and R4600/R4700
<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cache Sizes</td>
<td>16KB Instruction cache, 16KB Data cache</td>
<td>16KB Instruction cache, 16KB Data cache</td>
</tr>
<tr>
<td>Cache Line Sizes</td>
<td>software selectable between 16B and 32B</td>
<td>fixed at 32B</td>
</tr>
<tr>
<td>Cache Index</td>
<td>vAddr_{13..0}</td>
<td>vAddr_{12..0}</td>
</tr>
<tr>
<td>Cache Tag</td>
<td>pAddr_{35..12}</td>
<td>same</td>
</tr>
<tr>
<td>Cache Organization</td>
<td>direct mapped</td>
<td>2-way set associative</td>
</tr>
<tr>
<td>Data cache write policy</td>
<td>write-allocate and write-back</td>
<td>write-allocate or not based on TLB entry, write-through or not based on TLB entry</td>
</tr>
<tr>
<td>Data cache miss</td>
<td>stall, output address, copy dirty data to writeback buffer, refill cache, output writeback data</td>
<td>same, with FIFO to select the set to refill</td>
</tr>
<tr>
<td>Data order for block reads</td>
<td>sub-block ordering</td>
<td>same</td>
</tr>
<tr>
<td>Data order for block writes</td>
<td>sequential</td>
<td>same</td>
</tr>
<tr>
<td>Instruction cache miss restart</td>
<td>restart after all data received and written to cache</td>
<td>same</td>
</tr>
<tr>
<td>Data cache miss restart</td>
<td>restart after all data received and written to cache</td>
<td>restart on first doubleword, send subsequent doublewords to response buffer</td>
</tr>
<tr>
<td>Instruction Tag</td>
<td>2-bit cache state</td>
<td>1-bit cache state</td>
</tr>
<tr>
<td>Cache miss overhead</td>
<td>5-8 cycles</td>
<td>3 cycles</td>
</tr>
<tr>
<td>Instruction cache parity</td>
<td>1 parity bit per 8 data bits</td>
<td>1 parity bit per 32 data bits</td>
</tr>
<tr>
<td>Data cache parity</td>
<td>1 parity bit per 8 data bits</td>
<td>same</td>
</tr>
</tbody>
</table>

Table 1.21 Cache Comparison Between R4400 PC and R4600/R4700

<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/ R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction virtual address translation</td>
<td>2-entry ITLB</td>
<td>same</td>
</tr>
<tr>
<td>ITLB miss</td>
<td>1 cycle penalty, refilled from JTLB, LRU replacement</td>
<td>1 cycle on branch, jump, and ERET. 2 cycles otherwise, refilled from JTLB, LRU replacement</td>
</tr>
<tr>
<td>Data virtual address translation</td>
<td>done directly in JTLB</td>
<td>4-entry DTLB</td>
</tr>
<tr>
<td>DTLB miss</td>
<td>n.a.</td>
<td>1 cycle penalty, refilled from JTLB, pseudo-LRU replacement</td>
</tr>
<tr>
<td>JTLB</td>
<td>48 entries of even/odd page pairs, fully associative</td>
<td>same</td>
</tr>
<tr>
<td>Page size</td>
<td>4KB, 16KB, ..., 16MB</td>
<td>same</td>
</tr>
<tr>
<td>Multiple entry match in JTLB</td>
<td>sets TS in Status and disables TLB until Reset to prevent damage</td>
<td>no damage for multiple match; no detection or shutdown implemented</td>
</tr>
<tr>
<td>Virtual address size</td>
<td>VSIZE = 40</td>
<td>same</td>
</tr>
<tr>
<td>Physical address size</td>
<td>PSIZE = 36</td>
<td>same</td>
</tr>
</tbody>
</table>

*Table 1.22  TLB Comparison Between R4400 PC and R4600/ R4700*

<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/ R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALU latency</td>
<td>1 cycle</td>
<td>1 cycle</td>
</tr>
<tr>
<td>Load latency</td>
<td>3 cycles</td>
<td>2 cycles</td>
</tr>
<tr>
<td>Branch latency</td>
<td>4 cycles (2 cycle penalty for taken branches)</td>
<td>2 cycles (no penalty for taken branches)</td>
</tr>
<tr>
<td>Store buffer (not write buffer)</td>
<td>2 doublewords</td>
<td>1 doubleword</td>
</tr>
<tr>
<td>Integer multiply</td>
<td>integer multiply hardware, 1 cycle to issue</td>
<td>done in floating-point multiplier, 4 cycles to issue</td>
</tr>
<tr>
<td>Integer divide</td>
<td>done in integer datapath adder, slips until done</td>
<td>done in floating-point adder, 4 cycles to issue</td>
</tr>
<tr>
<td>Integer multiply</td>
<td>HIGH and LOW available at the same time</td>
<td>LOW available one cycle before HIGH</td>
</tr>
<tr>
<td>Integer divide</td>
<td>HIGH and LOW available at the same time</td>
<td>HIGH available one cycle before LOW</td>
</tr>
<tr>
<td>HIGH and LOW hazards</td>
<td>yes, HIGH and LOW written early in pipeline</td>
<td>no, HIGH and LOW written after W</td>
</tr>
<tr>
<td>MFHI/MFLO latency</td>
<td>1 cycle</td>
<td>2 cycles</td>
</tr>
<tr>
<td>SLLV, SRLV, SRAV</td>
<td>2 cycles to issue</td>
<td>1 cycle to issue</td>
</tr>
<tr>
<td>DSLL, DSRL, DSRA, DSLL32, DSRL32, DSRA32, DSLLV, DSRLV, DSRAV</td>
<td>2 cycles to issue</td>
<td>1 cycle to issue</td>
</tr>
</tbody>
</table>

*Table 1.23  Pipeline Comparison Between R4400 PC and R4600/ R4700*
<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>WatchLo, WatchHi</td>
<td>implemented</td>
<td>unimplemented (no watch registers)</td>
</tr>
<tr>
<td>Config</td>
<td>as described in <em>R4000 User's Guide</em></td>
<td>subset</td>
</tr>
<tr>
<td>Status</td>
<td>as described in <em>R4000 User's Guide</em>, but RP not functional</td>
<td>no TS or RP</td>
</tr>
<tr>
<td>Low-power standby mode</td>
<td>no</td>
<td>WAIT instruction disables internal clock, freezing pipeline and other state; resume on interrupt</td>
</tr>
<tr>
<td>MFC0/MTC0 hazard</td>
<td>only hazardous for certain cp0 register combinations</td>
<td>always hazardous -- detected and 1-cycle slip inserted</td>
</tr>
<tr>
<td>EntryLo0, EntryLo1</td>
<td>as described in <em>R4000 User's Guide</em></td>
<td>two new cache algorithms added to C field for non-coherent write-through</td>
</tr>
<tr>
<td>TagLo, TagHi, ECC, CacheErr</td>
<td>R4400SC bits implemented but meaningless</td>
<td>Only bits meaningful on R4400 PC implemented</td>
</tr>
<tr>
<td>TagLo</td>
<td>as described in <em>R4000 User's Guide</em></td>
<td>bits 5..3 read/writeable but otherwise unused, bit 2 used for F bit</td>
</tr>
<tr>
<td>Exceptions</td>
<td>as described in <em>R4000 User's Guide</em> (VCEI and VCED not possible)</td>
<td>VCEI, VCED, and WATCH exceptions not implemented</td>
</tr>
<tr>
<td>Index CACHE ops</td>
<td>use vAddr_{13..4} to select line</td>
<td>use vAddr_{13} to select set, vAddr_{12..5} to select line of set</td>
</tr>
<tr>
<td>Index Store Tag CACHE op</td>
<td>Status.CE ignored</td>
<td>TagLo.P stored if Status.CE set</td>
</tr>
<tr>
<td>PRId</td>
<td>Imp = 0x04</td>
<td>R4600: Imp = 0x20; R4700: Imp = 0x21</td>
</tr>
</tbody>
</table>

**Table 1.24  Coprocessor 0 Comparison Between R4400 PC and R4600/R4700**

<table>
<thead>
<tr>
<th>Item</th>
<th>R4400 PC</th>
<th>R4600/R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>Possible exception stall</td>
<td>only for operands that can cause exceptions</td>
<td>some simplifications in detection hardware</td>
</tr>
<tr>
<td>Floating-point divide</td>
<td>separate divide unit</td>
<td>done in floating-point adder</td>
</tr>
<tr>
<td>Floating-point square root</td>
<td>done in floating-point adder</td>
<td>same</td>
</tr>
<tr>
<td>Converts to/from 64-bit integer</td>
<td>signals an Unimplemented Operation exception for integer operands/results with more than 53 bits of precision</td>
<td>handles full 64-bit operands and results</td>
</tr>
<tr>
<td>Floating-point registers</td>
<td>Status.FR enables all 32 floating point registers</td>
<td>same</td>
</tr>
<tr>
<td>FCR0</td>
<td>Imp = 0x05</td>
<td>R4600: Imp = 0x20; R4700: Imp = 0x21</td>
</tr>
</tbody>
</table>

**Table 1.25  Coprocessor 1 Comparison Between R4400 PC and R4600/R4700**
Introduction
This chapter is an overview of the central processing unit (CPU) instruction set; refer to Appendix A for detailed descriptions of individual CPU instructions.

An overview of the floating-point unit (FPU) instruction set is in Chapter 6; refer to Appendix B for detailed descriptions of individual FPU instructions.

CPU Instruction Formats
Each CPU instruction consists of a single 32-bit word, aligned on a word boundary. There are three instruction formats, as shown in Figure 2.1: immediate (I-type), jump (J-type), and register (R-type). Using a small number of instruction formats simplifies instruction decoding (and thus allows higher-frequency operation), while the compiler synthesizes more complicated (and less frequently used) operations and addressing modes from these three formats as needed.

![Figure 2.1 CPU Instruction Formats](image)
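To make the three formats concrete, the following Python sketch extracts every field from a 32-bit instruction word. The bit positions are the standard MIPS field layout (opcode in bits 31:26, rs in 25:21, rt in 20:16, rd in 15:11, sa in 10:6, funct in 5:0, a 16-bit immediate in 15:0, and a 26-bit target in 25:0); the function name `decode` and its dictionary keys are illustrative choices, not names used in this manual.

```python
def decode(word: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats.

    Which fields are meaningful depends on the opcode (see Appendix A).
    """
    word &= 0xFFFF_FFFF
    return {
        "op": (word >> 26) & 0x3F,       # bits 31:26, present in every format
        "rs": (word >> 21) & 0x1F,       # bits 25:21, I-type and R-type
        "rt": (word >> 16) & 0x1F,       # bits 20:16, I-type and R-type
        "rd": (word >> 11) & 0x1F,       # bits 15:11, R-type destination
        "sa": (word >> 6) & 0x1F,        # bits 10:6, R-type shift amount
        "funct": word & 0x3F,            # bits 5:0, R-type function code
        "immediate": word & 0xFFFF,      # bits 15:0, I-type immediate
        "target": word & 0x03FF_FFFF,    # bits 25:0, J-type target
    }


# ADDU $3, $1, $2 is an R-type word: op 0, rs 1, rt 2, rd 3, funct 0x21.
f = decode(0x0022_1821)
print(f["op"], f["rs"], f["rt"], f["rd"], hex(f["funct"]))  # 0 1 2 3 0x21
```

Because every field sits at a fixed position, all of them can be extracted in parallel and the opcode alone decides which view to use, which is the decoding simplification the paragraph above describes.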

In the MIPS architecture, coprocessor instructions are implementation-dependent: refer to Appendix A for details of individual Coprocessor 0 instructions.
Load and Store Instructions

Load and store are immediate (I-type) instructions that move data between memory and the general registers. The only addressing mode that load and store instructions directly support is base register plus 16-bit signed immediate offset.
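The single addressing mode can be sketched as follows, assuming 64-bit register arithmetic that wraps modulo 2^64. The helper names `sign_extend` and `effective_address` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(base_reg: int, offset16: int) -> int:
    """Base register plus the sign-extended 16-bit immediate, as a 64-bit address."""
    return (base_reg + sign_extend(offset16, 16)) & MASK64


# lw $8, -4($29): the encoded offset 0xFFFC becomes -4 after sign extension.
print(hex(effective_address(0x7FFF_F000, 0xFFFC)))  # 0x7fffeffc
```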

Scheduling a Load Delay Slot

A load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction. The instruction slot immediately following this delayed load instruction is referred to as the load delay slot.

In the R4600/R4700 processor, the instruction immediately following a load instruction can use the contents of the loaded register; in such cases, however, hardware interlocks insert additional real cycles. Consequently, scheduling load delay slots is desirable, both for performance and for compatibility with R-Series (e.g., R3051) processors. However, scheduling load delay slots is not absolutely required.
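The benefit of scheduling can be estimated with a small model. This sketch assumes a one-cycle load delay (as stated in the pipeline chapter), all cache hits, and no other interlocks; the `Instr` record and `load_use_slips` function are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for readability only
    dest: Optional[int]       # destination GPR, or None
    srcs: Tuple[int, ...]     # source GPRs
    is_load: bool = False


def load_use_slips(program: Sequence[Instr]) -> int:
    """Count cycles lost to using a load result in the load delay slot.

    Assumes a one-cycle load delay, cache hits, and no other interlocks.
    """
    slips = 0
    for prev, cur in zip(program, program[1:]):
        # $0 is hard-wired to zero, so a load into it creates no dependency.
        if prev.is_load and prev.dest not in (None, 0) and prev.dest in cur.srcs:
            slips += 1
    return slips


unscheduled = [
    Instr("lw   $2, 0($4)", 2, (4,), True),
    Instr("addu $3, $2, $5", 3, (2, 5)),
    Instr("lw   $6, 4($4)", 6, (4,), True),
    Instr("addu $7, $6, $5", 7, (6, 5)),
]
# Moving the second load into the first load's delay slot removes both slips.
scheduled = [unscheduled[0], unscheduled[2], unscheduled[1], unscheduled[3]]
print(load_use_slips(unscheduled), load_use_slips(scheduled))  # 2 0
```

Both orderings run correctly on the R4600/R4700 because of the interlock; only the scheduled one would also be correct on a processor without load interlocks.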

Defining Access Types

The access type, set by the load or store instruction opcode, indicates the size of the R4600/R4700 data item to be loaded or stored. Access types are defined in Appendix A.

Regardless of access type or byte ordering (endianness), the address given specifies the low-order byte in the addressed field. For a big-endian configuration, the low-order byte is the most-significant byte; for a little-endian configuration, the low-order byte is the least-significant byte.

The access type, together with the three low-order bits of the address, defines the bytes accessed within the addressed doubleword, as shown in Table 2.1 on page 2-3.
Only the combinations shown in Table 2.1 are permissible; other combinations cause address error exceptions. See Appendix A for individual descriptions of CPU load and store instructions.

<table>
<thead>
<tr>
<th>Access Type Mnemonic (Value)</th>
<th>Low-Order Address Bits (2 1 0)</th>
<th>Bytes Accessed, Big Endian (bits 63..0)</th>
<th>Bytes Accessed, Little Endian (bits 63..0)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Doubleword (7)</td>
<td>0 0 0</td>
<td>0 1 2 3 4 5 6 7</td>
<td>7 6 5 4 3 2 1 0</td>
</tr>
<tr>
<td>Septibyte (6)</td>
<td>0 0 0</td>
<td>0 1 2 3 4 5 6</td>
<td>6 5 4 3 2 1 0</td>
</tr>
<tr>
<td></td>
<td>0 0 1</td>
<td>1 2 3 4 5 6 7</td>
<td>7 6 5 4 3 2 1</td>
</tr>
<tr>
<td>Sextibyte (5)</td>
<td>0 0 0</td>
<td>0 1 2 3 4 5</td>
<td>5 4 3 2 1 0</td>
</tr>
<tr>
<td></td>
<td>0 1 0</td>
<td>2 3 4 5 6 7</td>
<td>7 6 5 4 3 2</td>
</tr>
<tr>
<td>Quintibyte (4)</td>
<td>0 0 0</td>
<td>0 1 2 3 4</td>
<td>4 3 2 1 0</td>
</tr>
<tr>
<td></td>
<td>0 1 1</td>
<td>3 4 5 6 7</td>
<td>7 6 5 4 3</td>
</tr>
<tr>
<td>Word (3)</td>
<td>0 0 0</td>
<td>0 1 2 3</td>
<td>3 2 1 0</td>
</tr>
<tr>
<td></td>
<td>1 0 0</td>
<td>4 5 6 7</td>
<td>7 6 5 4</td>
</tr>
<tr>
<td>Triplebyte (2)</td>
<td>0 0 0</td>
<td>0 1 2</td>
<td>2 1 0</td>
</tr>
<tr>
<td></td>
<td>0 0 1</td>
<td>1 2 3</td>
<td>3 2 1</td>
</tr>
<tr>
<td></td>
<td>1 0 0</td>
<td>4 5 6</td>
<td>6 5 4</td>
</tr>
<tr>
<td></td>
<td>1 0 1</td>
<td>5 6 7</td>
<td>7 6 5</td>
</tr>
<tr>
<td>Halfword (1)</td>
<td>0 0 0</td>
<td>0 1</td>
<td>1 0</td>
</tr>
<tr>
<td></td>
<td>0 1 0</td>
<td>2 3</td>
<td>3 2</td>
</tr>
<tr>
<td></td>
<td>1 0 0</td>
<td>4 5</td>
<td>5 4</td>
</tr>
<tr>
<td></td>
<td>1 1 0</td>
<td>6 7</td>
<td>7 6</td>
</tr>
<tr>
<td>Byte (0)</td>
<td>0 0 0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>0 0 1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>0 1 0</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>0 1 1</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td></td>
<td>1 0 0</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td></td>
<td>1 0 1</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td></td>
<td>1 1 0</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td></td>
<td>1 1 1</td>
<td>7</td>
<td>7</td>
</tr>
</tbody>
</table>

Table 2.1 Byte Access within a Doubleword
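Table 2.1 follows a simple rule: an access of type value *n* touches *n* + 1 consecutive bytes starting at the byte selected by the low-order address bits, and only the listed starting positions are legal. The sketch below encodes that rule, listing byte numbers from bit 63 down to bit 0 as the table does. The function name is hypothetical, and raising `ValueError` merely stands in for the hardware Address Error exception.

```python
# Permissible low-order address bits for each access type value, from Table 2.1.
ALLOWED_OFFSETS = {
    7: (0,),              # doubleword
    6: (0, 1),            # septibyte
    5: (0, 2),            # sextibyte
    4: (0, 3),            # quintibyte
    3: (0, 4),            # word
    2: (0, 1, 4, 5),      # triplebyte
    1: (0, 2, 4, 6),      # halfword
    0: tuple(range(8)),   # byte
}


def bytes_accessed(access_type: int, address: int, big_endian: bool = True) -> list[int]:
    """Byte numbers touched in the doubleword, listed from bit 63 down to bit 0."""
    offset = address & 0b111
    if offset not in ALLOWED_OFFSETS[access_type]:
        # Stand-in for the Address Error exception the processor would take.
        raise ValueError("address error: combination not listed in Table 2.1")
    lanes = list(range(offset, offset + access_type + 1))
    # Big endian puts byte 0 in bits 63:56; little endian puts it in bits 7:0.
    return lanes if big_endian else lanes[::-1]


print(bytes_accessed(3, 0x1004), bytes_accessed(3, 0x1004, big_endian=False))
# [4, 5, 6, 7] [7, 6, 5, 4]
```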
Computational Instructions

Computational instructions use either 1) the register (R-type) format, in which both operands are registers, or 2) the immediate (I-type) format, in which one operand is a 16-bit immediate.

Computational instructions perform the following operations on register values:
- arithmetic
- logical
- shift
- multiply
- divide

These operations fit in the following four categories of computational instructions:
- ALU immediate instructions
- three-operand register-type instructions
- shift instructions
- multiply and divide instructions

64-bit Virtual Address Operations with 32-bit operands

Operands to 32-bit operand opcodes must be in sign-extended form. 32-bit operand opcodes include all non-doubleword operations, such as ADD, ADDU, SUB, SUBU, ADDI, SLL, SRL, SRA, and SLLV. The result of an operation that uses an incorrectly sign-extended 32-bit value is unpredictable.
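"Sign-extended form" means that bits 63:32 of the 64-bit register are all copies of bit 31. A minimal checker and converter, with hypothetical function names, might look like this:

```python
def is_sign_extended_32(value: int) -> bool:
    """True if bits 63:32 of the 64-bit register image all equal bit 31."""
    value &= (1 << 64) - 1
    upper = value >> 31                  # bits 63:31 (33 bits)
    return upper == 0 or upper == (1 << 33) - 1


def sign_extend_32(value: int) -> int:
    """Canonical 64-bit register image of a 32-bit result."""
    value &= 0xFFFF_FFFF
    return value | 0xFFFF_FFFF_0000_0000 if value & 0x8000_0000 else value


print(is_sign_extended_32(0x0000_0000_8000_0000))  # False: bit 31 set, bits 63:32 clear
```

An operand such as 0x0000 0000 8000 0000 is therefore not a valid input to ADDU, while its canonical form 0xFFFF FFFF 8000 0000 is.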

Cycle Timing for Multiply and Divide Instructions

MFHI and MFLO instructions (described in Appendix A) are interlocked so that any attempt to read them before prior multiply or divide instructions complete delays the execution of these instructions until the prior instructions finish.

Table 2.2 gives the number of processor cycles (PCycles) required to resolve an interlock or stall between various multiply or divide instructions, and a subsequent MFHI or MFLO instruction.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>R4600</th>
<th>R4700</th>
</tr>
</thead>
<tbody>
<tr>
<td>MULT</td>
<td>10</td>
<td>8</td>
</tr>
<tr>
<td>MULTU</td>
<td>10</td>
<td>8</td>
</tr>
<tr>
<td>DIV</td>
<td>42</td>
<td>42</td>
</tr>
<tr>
<td>DIVU</td>
<td>42</td>
<td>42</td>
</tr>
<tr>
<td>DMULT</td>
<td>12</td>
<td>10</td>
</tr>
<tr>
<td>DMULTU</td>
<td>12</td>
<td>10</td>
</tr>
<tr>
<td>DDIV</td>
<td>74</td>
<td>74</td>
</tr>
<tr>
<td>DDIVU</td>
<td>74</td>
<td>74</td>
</tr>
</tbody>
</table>

Table 2.2 Multiply/Divide Instruction Cycle Timing
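A rough way to use Table 2.2 when scheduling code is sketched below. It assumes that the table values are the stall seen by an MFHI or MFLO placed immediately after the multiply or divide, and that each independent instruction placed in between issues in one cycle and hides one cycle of that latency. Both assumptions, and the function name, are illustrative rather than taken from this manual.

```python
# Cycle counts from Table 2.2, as (R4600, R4700).
MDU_CYCLES = {
    "MULT": (10, 8), "MULTU": (10, 8),
    "DIV": (42, 42), "DIVU": (42, 42),
    "DMULT": (12, 10), "DMULTU": (12, 10),
    "DDIV": (74, 74), "DDIVU": (74, 74),
}


def mfhi_stall(op: str, cpu: str, independent_instrs: int = 0) -> int:
    """Estimated stall cycles for MFHI/MFLO after `op` (simplified model)."""
    r4600, r4700 = MDU_CYCLES[op.upper()]
    latency = {"R4600": r4600, "R4700": r4700}[cpu]
    return max(0, latency - independent_instrs)


print(mfhi_stall("DMULT", "R4700", independent_instrs=4))  # 6
```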

For more information about computational instructions, refer to the individual instruction as described in Appendix A.
Jump and Branch Instructions
Jump and branch instructions change the control flow of a program. All jump and branch instructions occur with a delay of one instruction: the instruction immediately following the jump or branch (known as the instruction in the delay slot) executes while the target instruction is being fetched from storage. The one exception is a branch likely that is not taken, which nullifies its delay-slot instruction, as described under Overview of Branch Instructions.

Overview of Jump Instructions
Subroutine calls in high-level languages are usually implemented with Jump or Jump and Link instructions, both of which are J-type instructions. In J-type format, the 26-bit target address shifts left 2 bits and combines with the high-order 4 bits of the current program counter to form an absolute address.
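The J-type target calculation can be sketched for 32-bit mode as follows. In the MIPS architecture the high-order 4 bits are taken from the address of the delay-slot instruction (the jump's address plus 4), which differs from the jump's own address only when the jump is the last word of a 256-Mbyte region. The function name is hypothetical.

```python
def jump_target(jump_pc: int, target26: int) -> int:
    """Absolute target of J/JAL in 32-bit mode.

    The 26-bit field is shifted left 2 bits and combined with the
    high-order 4 bits of the delay-slot address (jump_pc + 4).
    """
    delay_slot_pc = (jump_pc + 4) & 0xFFFF_FFFF
    return (delay_slot_pc & 0xF000_0000) | ((target26 & 0x03FF_FFFF) << 2)


print(hex(jump_target(0x8001_0000, 0x0004_0000)))  # 0x80100000
```

Because only 28 bits come from the instruction, J and JAL reach only targets in the same 256-Mbyte region; the register forms described next cover larger jumps.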

Returns, dispatches, and large cross-page jumps are usually implemented with the Jump Register or Jump and Link Register instructions. Both are R-type instructions that take the 32-bit or 64-bit byte address contained in one of the general purpose registers.

For more information about jump instructions, refer to the individual instruction as described in Appendix A.

Overview of Branch Instructions
All branch instruction target addresses are computed by adding the address of the instruction in the delay slot to the 16-bit offset, shifted left 2 bits and sign-extended to 32 bits. All branches occur with a delay of one instruction.

If a conditional branch likely is not taken, the instruction in the delay slot is nullified. For regular conditional branches, the delay slot is always executed.
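The branch target rule and the delay-slot behavior can be sketched together. This assumes PC values are kept as 64-bit register images (so sign-extended 32-bit addresses work unchanged); both function names are hypothetical.

```python
def branch_target(branch_pc: int, offset16: int) -> int:
    """Delay-slot address plus the sign-extended 16-bit offset shifted left 2 bits."""
    offset = offset16 & 0xFFFF
    offset = (offset ^ 0x8000) - 0x8000          # sign-extend the 16-bit field
    return (branch_pc + 4 + (offset << 2)) & 0xFFFF_FFFF_FFFF_FFFF


def delay_slot_executes(taken: bool, likely: bool) -> bool:
    """Regular branches always execute the delay slot; branch likely
    nullifies it when the branch is not taken."""
    return taken or not likely


print(hex(branch_target(0x1000, 0xFFFF)))  # 0x1000: offset -1 branches to itself
```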

For more information about branch instructions, refer to the individual instruction as described in Appendix A.

Special Instructions
Special instructions allow the software to initiate traps; they are always R-type. For more information about special instructions, refer to the individual instruction as described in Appendix A.

Exception Instructions
Exception instructions are extensions to the MIPS ISA. For more information about exception instructions, refer to the individual instruction as described in Appendix A.

Coprocessor Instructions
Coprocessor instructions perform operations in their respective coprocessors. Coprocessor loads and stores are I-type, and coprocessor computational instructions have coprocessor-dependent formats.

Individual coprocessor instructions are described in Appendices A (for CP0) and B (for the FPU, CP1).

CP0 instructions perform operations specifically on the System Control Coprocessor registers to manipulate the memory management and exception handling facilities of the processor. Appendix A contains details of the CP0 instructions.
Introduction
This chapter describes the basic operation of the CPU pipeline, including the delay instructions (instructions that follow a branch or load instruction in the pipeline), interruptions to the pipeline flow caused by interlocks and exceptions, and the R4600/R4700 implementation of an uncached store buffer. The FPU pipeline is described in a later chapter.

CPU Pipeline Operation
The R4600/R4700 uses a 5-stage pipeline similar to that of the R3000. The simplicity of this pipeline allows the R4600/R4700 to be lower in cost and power than super-scalar or super-pipelined processors. Unlike the R3000, the R4600/R4700 performs virtual-to-physical address translation in parallel with cache access. This allows the R4600/R4700 to operate at over twice the frequency of the R3000 and to support a larger TLB for address translation.

Compared to the 8-stage R4000 pipeline, the R4600/R4700 is more efficient (requires fewer stalls).

Once the pipeline has been filled, five instructions are executed simultaneously. Figure 3.1 shows the five stages of the instruction pipeline; the next section describes the pipeline stages.
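The overlap can be visualized with a small chart generator. This is an idealized sketch, assuming no stalls or slips so that each instruction enters the I stage one cycle after its predecessor; the function name is hypothetical.

```python
STAGES = ("I", "R", "A", "D", "W")


def pipeline_diagram(n_instructions: int) -> list[str]:
    """Rows of an ideal pipeline chart (no stalls, no slips), one per instruction."""
    cycles = n_instructions + len(STAGES) - 1
    rows = []
    for k in range(n_instructions):
        cells = ["."] * cycles
        for s, name in enumerate(STAGES):
            cells[k + s] = name          # instruction k is in stage s at cycle k + s
        rows.append(f"instr {k}: " + " ".join(cells))
    return rows


for row in pipeline_diagram(5):
    print(row)
```

In the fifth column of the chart, instructions 0 through 4 occupy the W, D, A, R, and I stages respectively: five instructions in flight at once.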

Key to Figure 3.1:
- 1I-1R: instruction cache access
- 2I: instruction virtual-to-physical address translation in the ITLB
- 2A-2D: data cache access and load align
- 1D: data virtual-to-physical address translation in the DTLB
- 1D-2D: virtual-to-physical address translation in the JTLB
- 2R: register file read
- 2R: bypass calculation

Figure 3.1 Instruction Pipeline Stages
CPU Pipeline Stages

This section describes each of the phases of the five pipeline stages. Each stage has 2 phases:

- 1I - Instruction Fetch, Phase one
- 2I - Instruction Fetch, Phase two
- 1R - Register Fetch, Phase one
- 2R - Register Fetch, Phase two
- 1A - Execution, Phase one
- 2A - Execution, Phase two
- 1D - Data Fetch, Phase one
- 2D - Data Fetch, Phase two
- 1W - Write Back, Phase one
- 2W - Write Back, Phase two

1I - Instruction Fetch, Phase one

During the 1I phase the instruction address translation begins in the ITLB.

2I - Instruction Fetch, Phase two

During the 2I phase, the instruction cache fetch begins and the instruction address translation in the ITLB continues.

1R - Register Fetch, Phase one

During the 1R phase, the following occurs:
- The instruction cache fetch finishes.
- The instruction cache tag is checked against the page frame number obtained from the ITLB.

2R - Register Fetch, Phase two

During the 2R phase, the following occurs:
- The instruction decoder decodes the instruction.
- Any required operands are fetched from the register file.
- The decision to issue or slip the instruction (for an interlock condition) is made.
- For a branch, the branch address is calculated.

1A - Execution, Phase one

During the 1A phase, one of the following occurs:
- Any result from the A or D stage is bypassed.
- The arithmetic logic unit (ALU) starts the integer arithmetic, logical or shift operation.
- The ALU calculates the data virtual address for load and store instructions.
- The ALU determines whether the branch condition is true.

2A - Execution, Phase two

During the 2A phase, one of the following occurs:
- The integer arithmetic, logical or shift operation will complete.
- A data cache access will start.
- Store data is shifted to the specified byte position(s).
- The data virtual to physical address translation in the DTLB will start.

1D - Data Fetch, Phase one

During the 1D phase, one of the following occurs:
- The data cache access will continue.
- The data address translation in the DTLB completes.
- The virtual to physical address translation in the JTLB will start.

2D - Data Fetch, Phase two

During the 2D phase, one of the following occurs:
- The data cache access will finish and the data is shifted down and extended.
- The virtual to physical address translation in the JTLB will finish.

The data cache tag is checked against the PFN from the DTLB or JTLB for any data cache access.

1W - Write Back, Phase one
This phase is used internally by the processor to resolve all exceptions, in preparation for the register file write.

2W - Write Back, Phase two
For register-to-register and load instructions, the result is written back to the register file during the 2W phase. Branch instructions perform no operation during this phase.

Figure 3.2 shows the activities occurring during each ALU pipeline stage, for load, store, and branch instructions.

![Figure 3.2 CPU Pipeline Activities](image-url)
**Branch Delay**

The CPU pipeline has a branch delay of one cycle and a load delay of one cycle. The one-cycle branch delay is a result of the branch decision logic operating during the 1A pipeline phase of the branch instruction. This allows the branch target address calculated in the previous phase to be used for the instruction access in the following 1I phase. The pipeline will begin the fetch of the branch path as well as the fall-through path in the cycle following the delay slot. After the branch decision is made, the processor will continue with the fetch of either the branch path (for a taken branch) or the fall-through path (for the non-taken branch).

Figure 3.3 illustrates the branch delay.

**Load Delay**

The completion of a load at the end of the 2D pipeline phase produces an operand that is available for the 1A pipeline phase of the instruction following the load delay slot.

Figure 3.4 shows the load delay of one pipeline cycle.
Interlock and Exception Handling

Smooth pipeline flow is interrupted when cache misses or exceptions occur, or when data dependencies are detected. Interruptions handled using hardware, such as cache misses, are referred to as interlocks, while those that are handled using software are called exceptions.

There are two types of interlocks:
- stalls, which are resolved by halting the pipeline
- slips, which require the back end of the pipeline to advance while the front end of the pipeline is held static

At each cycle, exception and interlock conditions are checked for all active instructions.

Because each exception or interlock condition corresponds to a particular pipeline stage, a condition can be traced back to the particular instruction in the exception/interlock stage, as shown in Figure 3.5. For instance, a Reserved Instruction (RI) exception is raised in the execution (A) stage.

![Figure 3.5 Pipeline stage (I, R, A, D, W) in which each stall, slip, and exception condition is detected](image)

Figure 3.5 Correspondence of Pipeline Stage to Interlock Condition

Table 3.1 and Table 3.2, which follow, describe the pipeline exceptions and interlocks listed in Figure 3.5.

<table>
<thead>
<tr>
<th>Exception</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ITLB</td>
<td>Instruction Translation or Address Exception</td>
</tr>
<tr>
<td>Intr</td>
<td>External Interrupt</td>
</tr>
<tr>
<td>IBE</td>
<td>Instruction Bus Error</td>
</tr>
<tr>
<td>RI</td>
<td>Reserved Instruction</td>
</tr>
<tr>
<td>BP</td>
<td>Breakpoint</td>
</tr>
<tr>
<td>SC</td>
<td>System Call</td>
</tr>
<tr>
<td>CUn</td>
<td>Coprocessor Unusable</td>
</tr>
<tr>
<td>IPErr</td>
<td>Instruction Parity Error</td>
</tr>
<tr>
<td>OVF</td>
<td>Integer Overflow</td>
</tr>
<tr>
<td>FPE</td>
<td>FP Interrupt</td>
</tr>
<tr>
<td>ExTrap</td>
<td>EX Stage Traps</td>
</tr>
<tr>
<td>DTLB</td>
<td>Data Translation or Address Exception</td>
</tr>
<tr>
<td>TLBMod</td>
<td>TLB Modified</td>
</tr>
<tr>
<td>DBE</td>
<td>Data Bus Error</td>
</tr>
<tr>
<td>DPErr</td>
<td>Data Parity Error</td>
</tr>
<tr>
<td>NMI</td>
<td>Non-maskable Interrupt (or Soft Reset)</td>
</tr>
<tr>
<td>Reset</td>
<td>Reset</td>
</tr>
</tbody>
</table>

Table 3.1 Pipeline Exceptions

<table>
<thead>
<tr>
<th>Interlock</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ITM</td>
<td>Instruction TLB Miss</td>
</tr>
<tr>
<td>ICM</td>
<td>Instruction Cache Miss</td>
</tr>
<tr>
<td>CPE</td>
<td>Coprocessor Possible Exception</td>
</tr>
<tr>
<td>DCM</td>
<td>Data Cache Miss</td>
</tr>
<tr>
<td>LDI</td>
<td>Load Interlock</td>
</tr>
<tr>
<td>MDSt</td>
<td>Multiply/Divide Start</td>
</tr>
<tr>
<td>FCBsy</td>
<td>FP Coprocessor Busy</td>
</tr>
</tbody>
</table>

Table 3.2 Pipeline Interlocks

**Exception Conditions**

When an exception condition is detected for an instruction, the R4600/R4700 kills (cancels) that instruction and all instructions that follow it in the pipeline. Accordingly, any stall conditions and any later exception conditions that may have referenced this instruction are inhibited; there is no benefit in servicing stalls for a cancelled instruction.

When the killed instruction reaches the W stage, its exception flag causes it to write various CP0 registers with the exception state, change the current PC to the appropriate exception vector address, and clear the exception bits of earlier pipeline stages.

This implementation allows all preceding instructions to complete execution and prevents all subsequent instructions from completing. Thus the value in the EPC is sufficient to restart execution. It also ensures that exceptions are taken in the order of execution; an instruction taking an exception may itself be killed by an instruction further down the pipeline that takes an exception in a later cycle.

Figure 3.6 shows the exception detection procedure (e.g., a reserved instruction exception).

![Figure 3.6 Exception Detection](image)

**Stall Conditions**

Stalls are used to stop the pipeline for conditions detected after the R pipe-stage. When a stall occurs, the processor will resolve the condition and then the pipeline will continue.
Figure 3.7 shows a data cache miss stall.

The data cache miss is detected in the D pipe stage. If the cache line to be replaced is dirty (the W bit is set), the data is moved to the internal write buffer in the next cycle. The first doubleword of data is returned to the cache in 3 and the pipeline then restarts. The remainder of the cache line is returned in the subsequent cycles. The dirty data is written to memory some time after the entire new cache line has been returned.

**Slip Conditions**

During the 2R and 1A pipe-stages, internal logic determines whether it is possible to start the current instruction in this cycle. If all of the source operands are available (either from the register file or via the internal bypass logic) and all the hardware resources necessary to complete the instruction will be available at the necessary time(s), then the instruction “issues”; otherwise, the instruction “slips”. Slipped instructions are retried on subsequent cycles until they issue. The back end of the pipeline (stages D and W) advances normally during slips in an attempt to resolve the conflict, and NOPs are inserted into the resulting bubble in the pipeline. Instructions killed by branch likely instructions, ERET, or exceptions do not cause slips.

Figure 3.8 shows an instruction cache miss.

Instruction cache misses are detected in the R stage, as shown in Figure 3.8, and the pipeline slips in its A stage. A writeback is never required for an instruction cache miss: writes are not allowed to the I cache, so it can never contain dirty data. Note that early restart is not employed for instruction cache misses; the requested cache line is loaded into the cache in its entirety, and only then does the pipeline restart.

R4600/R4700 Write Buffer

The R4600/R4700 contains a write buffer to improve the performance of writes to the external memory. Writes to external memory, whether cache miss writebacks or stores to uncached or write-through addresses, use this on-chip write buffer. The write buffer holds up to four 64-bit address and data pairs.

For a cache miss write-back, the entire buffer holds the write-back data, allowing the processor to proceed in parallel with the memory update. For uncached and write-through stores, the write buffer decouples the CPU from the write to memory, increasing performance over the R4000 family of processors. If the write buffer is full, additional stores stall until there is room for them in the write buffer.
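The stall-when-full behavior can be illustrated with a toy model of the four-entry buffer. The drain rate (one entry retired every `drain_cycles` cycles) is a made-up parameter for illustration; real drain timing depends on the system interface. The class and method names are hypothetical.

```python
from collections import deque


class WriteBuffer:
    """Toy model of a four-entry buffer of (address, 64-bit data) pairs."""

    DEPTH = 4

    def __init__(self, drain_cycles: int) -> None:
        self.entries: deque = deque()
        self.drain_cycles = drain_cycles      # assumed memory write time per entry
        self._countdown = drain_cycles

    def tick(self) -> None:
        """Advance one cycle, retiring the oldest entry when its write completes."""
        if self.entries:
            self._countdown -= 1
            if self._countdown == 0:
                self.entries.popleft()
                self._countdown = self.drain_cycles

    def store(self, address: int, data: int) -> int:
        """Queue a store and return the cycles the CPU stalled waiting for room."""
        stalls = 0
        while len(self.entries) == self.DEPTH:
            self.tick()
            stalls += 1
        self.entries.append((address, data & 0xFFFF_FFFF_FFFF_FFFF))
        return stalls


wb = WriteBuffer(drain_cycles=3)
total = 0
for i in range(6):                      # a burst of six back-to-back stores
    total += wb.store(0x1000 + 8 * i, i)
    wb.tick()                           # each store occupies one pipeline cycle
print(total)  # 1
```

With these assumed numbers the buffer absorbs the first five stores without delay, and only the sixth waits one cycle for an entry to drain.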
The R4600/R4700 processor provides a full-featured memory management unit (MMU) which uses an on-chip Translation Lookaside Buffer (TLB) to translate virtual addresses into physical addresses.

This chapter describes the processor virtual and physical address spaces, the virtual-to-physical address translation, the operation of the TLB in making these translations, and those System Control Coprocessor (CP0) registers that provide the software interface to the TLB.

**Translation Lookaside Buffer (TLB)**

Mapped virtual addresses are translated into physical addresses using an on-chip TLB. The TLB is a fully associative memory that holds 48 entries, which provide mapping to 48 odd/even page pairs (96 pages). When address mapping is indicated, each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the EntryHi register.

Page sizes range from 4 Kbytes to 16 Mbytes, each size four times the previous one: 4K, 16K, 64K, 256K, 1M, 4M, and 16M.

**Hits and Misses**

If there is a virtual address match, or hit, in the TLB, the physical page number is extracted from the TLB and concatenated with the offset to form the physical address (see Figure 4.1).

If no match occurs (TLB miss), an exception is taken and software refills the TLB from the page table resident in memory. Software can write over a selected TLB entry or use a hardware mechanism to write into a random entry.

**Multiple Matches**

The R4600/R4700 does not provide any detection or shutdown mechanism for multiple matches in the TLB. This condition cannot damage the processor, but the result is undefined; software must never allow multiple matches to occur.

**Address Spaces**

This section describes the virtual and physical address spaces and the manner in which virtual addresses are converted or “translated” into physical addresses in the TLB.

**Virtual Address Space**

The processor virtual address can be either 32 or 64 bits wide, depending on the mode of operation (User, Supervisor, or Kernel) and the setting of the corresponding extended address bit in the Status register (UX, SX, or KX).

- If the extended address bit is 0, addresses are 32 bits wide.
- If the extended address bit is 1, addresses are 64 bits wide.

Both 32-bit and 64-bit addresses wrap in the same way; for example, in 64-bit mode, 0xFFFF FFFF FFFF FFFF wraps to 0x0000 0000 0000 0000. While the R4400 slips on shifts of more than 32 bits and on variable shifts, the R4600/R4700 does not.

---

1. There are virtual-to-physical address translations that occur outside of the TLB. For example, addresses in the kseg0 and kseg1 spaces are unmapped: in these spaces, the physical address is VA[28:0] zero-extended to 36 bits (PA[35:29] = 0, PA[28:0] = VA[28:0]).
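The unmapped kseg0/kseg1 translation in the footnote above amounts to masking off the upper address bits. This sketch assumes the standard kseg0 and kseg1 ranges (0x8000 0000 through 0x9FFF FFFF and 0xA000 0000 through 0xBFFF FFFF) and accepts either the 32-bit address or its sign-extended 64-bit form; the function name is hypothetical.

```python
def kseg01_to_physical(vaddr: int) -> int:
    """Unmapped kseg0/kseg1 translation: PA[35:29] = 0, PA[28:0] = VA[28:0]."""
    va32 = vaddr & 0xFFFF_FFFF
    upper = vaddr >> 32                  # 0 (32-bit form) or all ones (sign-extended)
    if not 0x8000_0000 <= va32 <= 0xBFFF_FFFF or upper not in (0, 0xFFFF_FFFF):
        raise ValueError("not a kseg0/kseg1 address")
    return va32 & 0x1FFF_FFFF


print(hex(kseg01_to_physical(0xFFFF_FFFF_BFC0_0000)))  # 0x1fc00000
```

Since kseg0 and kseg1 both map onto the same low 512 Mbytes of physical memory, 0x8000 1000 and 0xA000 1000 reach the same physical location (one cached, one uncached).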
Figure 4.1 shows the translation of a virtual address into a physical address.

As shown in Figure 4.2 and Figure 4.3, the virtual address is extended with an 8-bit address space identifier (ASID), which reduces the frequency of TLB flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi register, described later in this chapter. The Global bit (G) is in the EntryLo0 and EntryLo1 registers, described later in this chapter.

**Physical Address Space**

Using a 36-bit address, the processor physical address space encompasses 64 Gbytes. The following section describes the translation of a virtual address to a physical address.

**Virtual-to-Physical Address Translation**

Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual address in the TLB; there is a match when the virtual page number (VPN) of the address is the same as the VPN field of the entry, and either:

- the Global (G) bit of the TLB entry is set, or
- the ASID field of the virtual address is the same as the ASID field of the TLB entry.

This match is referred to as a **TLB hit**. If there is no match, a TLB Miss exception is taken by the processor and software is allowed to refill the TLB from a page table of virtual/physical addresses in memory.

If there is a virtual address match in the TLB, the physical address is output from the TLB and concatenated with the **Offset**, which represents an address within the page frame space. The Offset does not pass through the TLB.
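The matching rules above can be summarized in a simplified software model. Each entry here maps an even/odd pair of equal-sized pages; the fields are illustrative and deliberately do not reproduce the EntryHi, EntryLo0/EntryLo1, or PageMask register layouts, and valid, dirty, and cache-attribute bits are ignored. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TlbEntry:
    """Simplified entry mapping an even/odd pair of equal-sized pages."""
    pair_base: int    # virtual base of the page pair, aligned to 2 * page_size
    page_size: int    # 4K, 16K, 64K, 256K, 1M, 4M or 16M bytes
    asid: int         # 8-bit address space identifier
    global_: bool     # G bit: ASID is ignored when set
    even_base: int    # physical base address of the even page
    odd_base: int     # physical base address of the odd page


def translate(tlb: Sequence[TlbEntry], vaddr: int, asid: int) -> Optional[int]:
    """Physical address on a TLB hit, or None on a miss (refill exception)."""
    for e in tlb:                            # hardware compares all entries at once
        if vaddr - vaddr % (2 * e.page_size) != e.pair_base:
            continue                         # VPN mismatch
        if not (e.global_ or e.asid == asid):
            continue                         # ASID mismatch on a non-global entry
        offset = vaddr % e.page_size         # the offset does not pass through the TLB
        odd = (vaddr // e.page_size) & 1     # selects the even or odd page of the pair
        return (e.odd_base if odd else e.even_base) + offset
    return None


tlb = [TlbEntry(0x0040_0000, 4096, 5, False, 0x1000_0000, 0x2000_0000)]
print(hex(translate(tlb, 0x0040_1123, asid=5)))  # 0x20000123
```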

Virtual-to-physical translation is described in greater detail throughout the remainder of this chapter; Figure 4.19 on page 22 is a flow diagram of the process.

The next two sections describe the 32-bit and 64-bit address translations.
32-bit Virtual Address Translation

Figure 4.2 shows the virtual-to-physical-address translation of a 32-bit virtual address.

- The top portion of Figure 4.2 shows a virtual address with a 12-bit, or 4Kbyte, page size, labelled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.
- The bottom portion of Figure 4.2 shows a virtual address with a 24-bit, or 16Mbyte, page size, labelled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.

![Virtual Address with 1M (2²⁰) 4-Kbyte pages](image1)

![Virtual Address with 256 (2⁸) 16-Mbyte pages](image2)

64-bit Virtual Address Translation

Figure 4.3 on page 4 shows the virtual-to-physical-address translation of a 64-bit virtual address. This figure illustrates the two extremes in the range of possible page sizes: a 4Kbyte page (12 bits) and a 16Mbyte page (24 bits).

- The top portion of Figure 4.3 shows a virtual address with a 12-bit, or 4Kbyte, page size, labelled Offset. The remaining 28 bits of the address represent the VPN, and index the 256M-entry page table.
- The bottom portion of Figure 4.3 shows a virtual address with a 24-bit, or 16Mbyte, page size, labelled Offset. The remaining 16 bits of the address represent the VPN, and index the 64K-entry page table.
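The VPN widths and page-table sizes quoted for Figures 4.2 and 4.3 follow from subtracting the offset width from the virtual address width. This sketch assumes a 32-bit address for Figure 4.2 and the 40-bit (1-Tbyte) user address for Figure 4.3; the names are hypothetical.

```python
PAGE_SIZES = [4 * 1024 * 4**n for n in range(7)]   # 4K, 16K, 64K, 256K, 1M, 4M, 16M


def split_vaddr(vaddr: int, page_size: int, va_bits: int) -> tuple[int, int]:
    """Split a virtual address into (VPN, offset) for the given page size."""
    if page_size not in PAGE_SIZES:
        raise ValueError("unsupported page size")
    offset_bits = page_size.bit_length() - 1
    vaddr &= (1 << va_bits) - 1          # keep only the translated address bits
    return vaddr >> offset_bits, vaddr & (page_size - 1)


for size in (4 * 1024, 16 * 1024 * 1024):
    for va_bits in (32, 40):
        vpn_bits = va_bits - (size.bit_length() - 1)
        print(f"{size} byte pages, {va_bits}-bit VA: {vpn_bits}-bit VPN, "
              f"{2 ** vpn_bits} page-table entries")
```

The loop reproduces the figures' counts: 20-bit and 8-bit VPNs (1M and 256 entries) for 32-bit addresses, and 28-bit and 16-bit VPNs (256M and 64K entries) for 40-bit addresses.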
The processor has three operating modes that function in both 32- and 64-bit operations:
- User mode
- Supervisor mode
- Kernel mode
These modes are described in the next three sections.

User Mode Operations
In User mode, a single, uniform virtual address space—labelled User segment—is available; its size is:
- 2 Gbytes (2^31 bytes) for Status.UX = 0 (useg)
- 1 Tbyte (2^40 bytes) for Status.UX = 1 (xuseg)
Figure 4.4 shows the User mode virtual address space.

The User segment starts at address 0 and the current active user process resides in either useg (32-bit virtual addressing) or xuseg (in 64-bit virtual addressing). The TLB identically maps all references to useg/xuseg from all modes, and controls cache accessibility.

The processor operates in User mode when the Status register contains the following bit-values:

- KSU bits = \(10_2\)
- EXL = 0
- ERL = 0

In conjunction with these bits, the UX bit in the Status register selects between 32- or 64-bit User virtual addressing as follows:

- when UX = 0, 32-bit useg space is selected
- when UX = 1, 64-bit xuseg space is selected

Table 4.1 lists the characteristics of the two user mode segments, useg and xuseg.

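<table>
<thead>
<tr>
<th rowspan="2">Address Bit Values</th>
<th colspan="4">Status Register Bit Values</th>
<th rowspan="2">Segment Name</th>
<th rowspan="2">Address Range</th>
<th rowspan="2">Segment Size</th>
</tr>
<tr>
<th>KSU</th>
<th>EXL</th>
<th>ERL</th>
<th>UX</th>
</tr>
</thead>
<tbody>
<tr>
<td>32-bit (A(31) = 0)</td>
<td>\(10_2\)</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>useg</td>
<td>0x0000 0000 through 0x7FFF FFFF</td>
<td>2 Gbyte (2^31 bytes)</td>
</tr>
<tr>
<td>64-bit (A(63:40) = 0)</td>
<td>\(10_2\)</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>xuseg</td>
<td>0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF</td>
<td>1 Tbyte (2^40 bytes)</td>
</tr>
</tbody>
</table>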

Table 4.1 32-bit and 64-bit User Mode Segments

### 32-bit User Mode (useg)

In User mode, when Status.UX = 0, User mode virtual addressing is compatible with the 32-bit addressing model shown in Figure 4.4, and a 2-Gbyte user address space is available, labelled useg.
All valid User mode virtual addresses have their most-significant bit cleared to 0; any attempt to reference an address with the most-significant bit set while in User mode causes an Address Error exception. In 32-bit User mode virtual addressing, the TLB refill exception vector is used for TLB misses. The system maps all references to useg through the TLB, and bit settings within the TLB entry for the page determine the cacheability of a reference.

### 64-bit User Mode (xuseg)

In User mode, when Status.UX = 1, User mode virtual addressing is extended to the 64-bit model shown in Figure 4.4, and a 1-Tbyte user address space is available, labelled xuseg.

All valid User mode virtual addresses have bits 63:40 equal to 0; an attempt to reference an address with bits 63:40 not equal to 0 causes an Address Error exception.

The extended addressing TLB refill exception vector is used for TLB misses.
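The two User mode address checks can be sketched as below. Following the note under Figure 4.5, a 32-bit address whose bit 31 is not properly sign-extended is also treated as an Address Error. `ValueError` stands in for the exception, and the function name is hypothetical.

```python
def check_user_address(vaddr: int, ux: int) -> str:
    """Classify a User-mode reference as useg or xuseg, or raise for an Address Error."""
    vaddr &= (1 << 64) - 1
    if ux == 0:
        # 32-bit mode: bit 31 must be 0 and, being sign-extended, so must bits 63:32.
        if vaddr > 0x7FFF_FFFF:
            raise ValueError("Address Error: most-significant bit set in User mode")
        return "useg"
    if vaddr >> 40:
        raise ValueError("Address Error: bits 63:40 not zero")
    return "xuseg"


print(check_user_address(0x0040_0000, ux=0))  # useg
```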

### Supervisor Mode Operations

Supervisor mode is designed for layered operating systems in which a true kernel runs in R4600/R4700 Kernel mode, and the rest of the operating system runs in Supervisor mode.

The processor operates in Supervisor mode when the Status register contains the following bit-values:
- KSU = 01₂
- EXL = 0
- ERL = 0

In conjunction with these bits, the SX bit in the Status register selects between 32- or 64-bit Supervisor mode virtual addressing:
- when SX = 0, 32-bit supervisor space virtual addressing is selected
- when SX = 1, 64-bit supervisor space virtual addressing is selected

Figure 4.5 shows Supervisor mode address mapping. Table 4.2, which follows the figure, lists the characteristics of the supervisor mode segments; descriptions of the address spaces follow.

#### Figure 4.5  Supervisor Mode Virtual Address Space

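<table>
<thead>
<tr>
<th>Addressing</th>
<th>Address Range</th>
<th>Segment</th>
<th>Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>32-bit*</td>
<td>0xE000 0000 through 0xFFFF FFFF</td>
<td></td>
<td>Address error</td>
</tr>
<tr>
<td>32-bit*</td>
<td>0xC000 0000 through 0xDFFF FFFF</td>
<td>sseg</td>
<td>0.5 GB Mapped</td>
</tr>
<tr>
<td>32-bit*</td>
<td>0x8000 0000 through 0xBFFF FFFF</td>
<td></td>
<td>Address error</td>
</tr>
<tr>
<td>32-bit*</td>
<td>0x0000 0000 through 0x7FFF FFFF</td>
<td>suseg</td>
<td>2 GB Mapped</td>
</tr>
<tr>
<td>64-bit</td>
<td>0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF</td>
<td></td>
<td>Address error</td>
</tr>
<tr>
<td>64-bit</td>
<td>0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF</td>
<td>csseg</td>
<td>0.5 GB Mapped</td>
</tr>
<tr>
<td>64-bit</td>
<td>0x4000 0100 0000 0000 through 0xFFFF FFFF BFFF FFFF</td>
<td></td>
<td>Address error</td>
</tr>
<tr>
<td>64-bit</td>
<td>0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF</td>
<td>xsseg</td>
<td>1 TB Mapped</td>
</tr>
<tr>
<td>64-bit</td>
<td>0x0000 0100 0000 0000 through 0x3FFF FFFF FFFF FFFF</td>
<td></td>
<td>Address error</td>
</tr>
<tr>
<td>64-bit</td>
<td>0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF</td>
<td>xsuseg</td>
<td>1 TB Mapped</td>
</tr>
</tbody>
</table>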

Note: *In 32-bit virtual addressing, bit 31 must be sign-extended through bits 63:32; an address that is not properly sign-extended causes an Address Error exception.

### 32-bit Supervisor Mode, User Space (suseg)

In Supervisor mode, when Status.SX = 0 and the most-significant bit of the 32-bit virtual address is 0, the suseg virtual address space is selected; it covers the full $2^{31}$ bytes (2 Gbytes) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space starts at virtual address 0x0000 0000 and runs through 0x7FFF FFFF.

### 32-bit Supervisor Mode, Supervisor Space (sseg)

In Supervisor mode, when Status.SX = 0 and the three most-significant bits of the 32-bit virtual address are 110₂, the sseg virtual address space is selected; it covers $2^{29}$ bytes (512 Mbytes) of the current supervisor address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space begins at virtual address 0xC000 0000 and runs through 0xDFFF FFFF.

### 64-bit Supervisor Mode, User Space (xsuseg)

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual address are 00₂, the xsuseg virtual address space is selected; it covers the full $2^{40}$ bytes (1 Tbyte) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space starts at virtual address 0x0000 0000 0000 0000 and runs through 0x0000 00FF FFFF FFFF.

### 64-bit Supervisor Mode, Current Supervisor Space (xsseg)

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual address are 01₂, the xsseg current supervisor virtual address space is selected. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address. This mapped space begins at virtual address 0x4000 0000 0000 0000 and runs through 0x4000 00FF FFFF FFFF.

### 64-bit Supervisor Mode, Separate Supervisor Space (csseg)

In Supervisor mode, when Status.SX = 1 and bits 63:62 of the virtual address are 11₂, the csseg separate supervisor virtual address space is selected. Addressing of the csseg is compatible with addressing sseg in 32-bit mode. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0xFFFF FFFF C000 0000 and runs through 0xFFFF FFFF DFFF FFFF.
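
The Supervisor mode decoding described above, including the sign-extension rule from the figure note, can be sketched in C as follows. The enum and function names are hypothetical, and the caller is assumed to pass Status.SX and the full 64-bit register value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the Supervisor mode segments described above. */
typedef enum {
    SEG_ADDR_ERROR, SEG_SUSEG, SEG_SSEG, SEG_XSUSEG, SEG_XSSEG, SEG_CSSEG
} sup_segment;

/* True if bit 31 of va is replicated through bits 63:32. */
bool sign_extended_32(uint64_t va)
{
    uint64_t upper = va >> 31;            /* bits 63:31 (33 bits) */
    return upper == 0 || upper == 0x1FFFFFFFFull;
}

/* sx is Status.SX; va is the full 64-bit register value. */
sup_segment classify_supervisor(uint64_t va, bool sx)
{
    if (!sx) {
        if (!sign_extended_32(va))
            return SEG_ADDR_ERROR;
        uint32_t a = (uint32_t)va;
        if ((a >> 31) == 0)
            return SEG_SUSEG;             /* 0x0000 0000 through 0x7FFF FFFF */
        if ((a >> 29) == 0x6)
            return SEG_SSEG;              /* 110: 0xC000 0000 through 0xDFFF FFFF */
        return SEG_ADDR_ERROR;
    }
    switch (va >> 62) {
    case 0x0:                             /* xsuseg: bits 61:40 must be zero */
        return (va >> 40) == 0 ? SEG_XSUSEG : SEG_ADDR_ERROR;
    case 0x1:                             /* xsseg: bits 61:40 must be zero */
        return ((va >> 40) & 0x3FFFFFull) == 0 ? SEG_XSSEG : SEG_ADDR_ERROR;
    case 0x3:                             /* csseg: 0xFFFF FFFF C000 0000 through DFFF FFFF */
        return (va >> 29) == 0x7FFFFFFFEull ? SEG_CSSEG : SEG_ADDR_ERROR;
    default:                              /* 10: not accessible in Supervisor mode */
        return SEG_ADDR_ERROR;
    }
}
```

Note that in 32-bit mode the sseg address 0xC000 0000 must appear in the register as 0xFFFF FFFF C000 0000; the zero-extended form fails the sign-extension check.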

### Kernel Mode Operations

The processor operates in Kernel mode when the Status register contains one of the following values:

- $KSU = 00_2$
- $EXL = 1$
- $ERL = 1$

In conjunction with these bits, the $KX$ bit in the Status register selects between 32- or 64-bit Kernel mode addressing:

- when $KX = 0$, 32-bit kernel space virtual addressing is selected
- when $KX = 1$, 64-bit kernel space virtual addressing is selected
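
Taken together with the User and Supervisor conditions earlier in this chapter, the mode rules reduce to a small decision procedure. The C sketch below works on an already-decoded copy of the Status fields (their bit positions in Status are not given in this section), and every type and function name in it is hypothetical.

```c
#include <stdbool.h>

typedef enum { MODE_USER, MODE_SUPERVISOR, MODE_KERNEL, MODE_UNDEFINED } cpu_mode;

/* Hypothetical, already-decoded copy of the Status fields named in this
 * section; their bit positions in Status are not given here. */
typedef struct {
    unsigned ksu;          /* 2-bit KSU field */
    bool exl, erl;
    bool ux, sx, kx;
} status_fields;

cpu_mode current_mode(const status_fields *s)
{
    if (s->exl || s->erl || s->ksu == 0x0)
        return MODE_KERNEL;               /* any one of the Kernel conditions */
    if (s->ksu == 0x1)
        return MODE_SUPERVISOR;           /* KSU = 01, EXL = 0, ERL = 0 */
    if (s->ksu == 0x2)
        return MODE_USER;                 /* KSU = 10, EXL = 0, ERL = 0 */
    return MODE_UNDEFINED;                /* KSU = 11 is not described here */
}

/* Selects the UX, SX, or KX bit that matches the current mode. */
bool uses_64bit_addressing(const status_fields *s)
{
    switch (current_mode(s)) {
    case MODE_KERNEL:     return s->kx;
    case MODE_SUPERVISOR: return s->sx;
    case MODE_USER:       return s->ux;
    default:              return false;
    }
}
```

Because EXL = 1 forces Kernel mode regardless of KSU, an exception taken from User mode immediately switches address decoding to the KX setting.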

The processor enters Kernel mode whenever an exception is detected and it remains in Kernel mode until an Exception Return (ERET) instruction is executed. The ERET instruction restores the processor to the mode existing prior to the exception.

Kernel mode virtual address space is divided into regions differentiated by the high-order bits of the virtual address, as shown in Figure 4.6.

Figure 4.6  Kernel Mode Address Space

Note: *In 32-bit virtual addressing, bit 31 must be sign-extended through bits 63:32; an address that is not properly sign-extended causes an Address Error exception.

Table 4.3 lists the characteristics of the 32-bit kernel mode segments, and Table 4.4 lists the characteristics of the 64-bit kernel mode segments.

<table>
<thead>
<tr>
<th>Address Bit Values</th>
<th>Status Register Is One Of These Values</th>
<th>Segment Name</th>
<th>Address Range</th>
<th>Segment Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>A(31) = 0</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 0</td>
<td>kuseg</td>
<td>0x0000 0000 through 0x7FFF FFFF</td>
<td>2 Gbytes (2³¹ bytes)</td>
</tr>
<tr>
<td>A(31:29) = 100₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 0</td>
<td>kseg0</td>
<td>0x8000 0000 through 0x9FFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(31:29) = 101₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 0</td>
<td>kseg1</td>
<td>0xA000 0000 through 0xBFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(31:29) = 110₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 0</td>
<td>ksseg</td>
<td>0xC000 0000 through 0xDFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(31:29) = 111₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 0</td>
<td>kseg3</td>
<td>0xE000 0000 through 0xFFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
</tbody>
</table>

Table 4.3 32-bit Kernel Mode Segments

### 32-bit Kernel Mode, User Space (kuseg)

In Kernel mode, when Status.KX = 0, and the most-significant bit of the virtual address, A31, is cleared, the 32-bit kuseg virtual address space is selected; it covers the full 2³¹ bytes (2 Gbytes) of the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

### 32-bit Kernel Mode, Kernel Space 0 (kseg0)

In Kernel mode, when Status.KX = 0 and the most-significant three bits of the virtual address are 100₂, 32-bit kseg0 virtual address space is selected; it is the current 2²⁹-byte (512-Mbyte) kernel physical space.

References to kseg0 are not mapped through the TLB; the physical address is obtained by subtracting 0x8000 0000 from the virtual address (that is, physical address bits 35:29 are 0 and bits 28:0 equal VA[28:0]).

The K0 field of the Config register, described in this chapter, controls cacheability and coherency.

### 32-bit Kernel Mode, Kernel Space 1 (kseg1)

In Kernel mode, when Status.KX = 0 and the most-significant three bits of the 32-bit virtual address are 101₂, 32-bit kseg1 virtual address space is selected; it is the current 2²⁹-byte (512-Mbyte) kernel physical space.

References to kseg1 are not mapped through the TLB; the physical address is obtained by subtracting 0xA000 0000 from the virtual address (that is, physical address bits 35:29 are 0 and bits 28:0 equal VA[28:0]).

Caches are disabled for accesses to these addresses, and physical memory (or memory-mapped I/O device registers) is accessed directly.
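
Because kseg0 and kseg1 differ only in address bit 29, both unmapped translations reduce to keeping VA[28:0]. A minimal C sketch for 32-bit Kernel mode (KX = 0) follows; the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for 32-bit Kernel mode (KX = 0). Returns false for
 * the mapped segments (kuseg, ksseg, kseg3), which go through the TLB. */
bool kseg01_to_phys(uint32_t va, uint64_t *pa, bool *uses_config_k0)
{
    uint32_t top3 = va >> 29;
    if (top3 != 0x4 && top3 != 0x5)       /* 100 = kseg0, 101 = kseg1 */
        return false;
    /* Same as subtracting 0x8000 0000 (kseg0) or 0xA000 0000 (kseg1):
     * PA(35:29) = 0 and PA(28:0) = VA(28:0). */
    *pa = va & 0x1FFFFFFFu;
    /* kseg0 cacheability comes from Config.K0; kseg1 is always uncached. */
    *uses_config_k0 = (top3 == 0x4);
    return true;
}
```

For example, the kseg1 address 0xBFC0 0000 and the kseg0 address 0x9FC0 0000 both reach physical address 0x1FC0 0000, one uncached and one using the K0 attribute.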

### 32-bit Kernel Mode, Supervisor Space (ksseg)

In Kernel mode, when Status.KX = 0 and the most-significant three bits of the 32-bit virtual address are 110₂, the ksseg virtual address space is selected; it is the current 2²⁹-byte (512-Mbyte) supervisor virtual space.

The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

### 32-bit Kernel Mode, Kernel Space 3 (kseg3)

In Kernel mode, when Status.KX = 0 and the most-significant three bits of the 32-bit virtual address are 111₂, the kseg3 virtual address space is selected; it is the current 2²⁹-byte (512-Mbyte) kernel virtual space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

<table>
<thead>
<tr>
<th>Address Bit Values</th>
<th>Status Register Is One Of These Values</th>
<th>Segment Name</th>
<th>Address Range</th>
<th>Segment Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>A(63:62) = 00₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>xkuseg</td>
<td>0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF</td>
<td>1 Tbyte (2⁴⁰ bytes)</td>
</tr>
<tr>
<td>A(63:62) = 01₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>xksseg</td>
<td>0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF</td>
<td>1 Tbyte (2⁴⁰ bytes)</td>
</tr>
<tr>
<td>A(63:62) = 10₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>xkphys</td>
<td>0x8000 0000 0000 0000 through 0xBFFF FFFF FFFF FFFF</td>
<td>Eight 2³⁶-byte spaces</td>
</tr>
<tr>
<td>A(63:62) = 11₂</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>xkseg</td>
<td>0xC000 0000 0000 0000 through 0xC000 00FF 7FFF FFFF</td>
<td>2⁴⁰ − 2³¹ bytes</td>
</tr>
<tr>
<td>A(63:62) = 11₂, A(61:31) = –1</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>ckseg0</td>
<td>0xFFFF FFFF 8000 0000 through 0xFFFF FFFF 9FFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(63:62) = 11₂, A(61:31) = –1</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>ckseg1</td>
<td>0xFFFF FFFF A000 0000 through 0xFFFF FFFF BFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(63:62) = 11₂, A(61:31) = –1</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>cksseg</td>
<td>0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
<tr>
<td>A(63:62) = 11₂, A(61:31) = –1</td>
<td>KSU = 00₂ or EXL = 1 or ERL = 1, with KX = 1</td>
<td>ckseg3</td>
<td>0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF</td>
<td>512 Mbytes (2²⁹ bytes)</td>
</tr>
</tbody>
</table>

Table 4.4 64-bit Kernel Mode Segments

### 64-bit Kernel Mode, User Space (xkuseg)

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual address are 00₂, the xkuseg virtual address space is selected; it covers the current user address space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

As a special feature for the ECC handler, if the ERL bit of the Status register is set, the user address region becomes a 2³¹-byte unmapped, uncached space. This allows the ECC exception code to operate uncached using r0 as a base register.

### 64-bit Kernel Mode, Current Supervisor Space (xksseg)

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual address are 01₂, the xksseg virtual address space is selected; it is the current supervisor virtual space. The virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address.

### 64-bit Kernel Mode, Physical Spaces (xkphys)

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual address are 10₂, the xkphys virtual address space is selected; it is a set of eight $2^{36}$-byte kernel physical spaces. Accesses with address bits 58:36 not equal to 0 cause an address error.

References to this space are not mapped; the physical address selected is taken from bits 35:0 of the virtual address. Bits 61:59 of the virtual address specify the cacheability and coherency attributes, as shown in Table 4.5.

<table>
<thead>
<tr>
<th>Value (61:59)</th>
<th>Cacheability and Coherency Attributes</th>
<th>Starting Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Cacheable, noncoherent, write-through, no write allocate</td>
<td>0x8000 0000 0000 0000</td>
</tr>
<tr>
<td>1</td>
<td>Cacheable, noncoherent, write-through, write allocate</td>
<td>0x8800 0000 0000 0000</td>
</tr>
<tr>
<td>2</td>
<td>Uncached</td>
<td>0x9000 0000 0000 0000</td>
</tr>
<tr>
<td>3</td>
<td>Cacheable, noncoherent, write-back</td>
<td>0x9800 0000 0000 0000</td>
</tr>
<tr>
<td>4 - 7</td>
<td>Reserved</td>
<td>0xA000 0000 0000 0000</td>
</tr>
</tbody>
</table>

Table 4.5 Cacheability and Coherency Attributes
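
The xkphys rules above (bits 58:36 must be zero, bits 61:59 give the attribute, bits 35:0 give the physical address) can be sketched as a C decoder. The enum and function names are hypothetical; reserved attribute values 4 through 7 are returned to the caller rather than rejected.

```c
#include <stdint.h>

typedef enum { XKPHYS_OK, XKPHYS_NOT_XKPHYS, XKPHYS_ADDRESS_ERROR } xkphys_status;

/* Hypothetical decoder for a 64-bit Kernel mode (KX = 1) xkphys address. */
xkphys_status xkphys_decode(uint64_t va, unsigned *attr, uint64_t *pa)
{
    if ((va >> 62) != 0x2)                 /* bits 63:62 must be 10 */
        return XKPHYS_NOT_XKPHYS;
    if (((va >> 36) & 0x7FFFFFull) != 0)   /* bits 58:36 must be zero */
        return XKPHYS_ADDRESS_ERROR;
    *attr = (unsigned)((va >> 59) & 0x7);  /* bits 61:59, see Table 4.5 (4-7 reserved) */
    *pa = va & 0xFFFFFFFFFull;             /* bits 35:0 are the physical address */
    return XKPHYS_OK;
}
```

For example, 0x9000 0000 0000 1000 decodes to attribute 2 (uncached) at physical address 0x1000, matching the 0x9000 0000 0000 0000 starting address in Table 4.5.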

### 64-bit Kernel Mode, Kernel Space (xkseg)

In Kernel mode, when Status.KX = 1 and bits 63:62 of the 64-bit virtual address are 11₂, the address space selected is one of the following:
- kernel virtual space, xkseg, the current kernel virtual space; the virtual address is extended with the contents of the 8-bit ASID field to form a unique virtual address
- one of the four 32-bit kernel compatibility spaces, as described in the next section.

### 64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3)

In Kernel mode, when Status.KX = 1, bits 63:62 of the 64-bit virtual address are 11₂, and bits 61:31 of the virtual address equal –1 (all ones), address bits 30:29, as shown in Figure 4.6, select one of the following 512-Mbyte compatibility spaces.
- ckseg0. This 64-bit virtual address space is an unmapped region, compatible with the 32-bit address model kseg0. The K0 field of the Config register, described in this chapter, controls cacheability and coherency.
- ckseg1. This 64-bit virtual address space is an unmapped and uncached region, compatible with the 32-bit address model kseg1.
- cksseg. This 64-bit virtual address space is the current supervisor virtual space, compatible with the 32-bit address model ksseg.
- ckseg3. This 64-bit virtual address space is kernel virtual space, compatible with the 32-bit address model kseg3.
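
Since bits 63:62 = 11₂ and bits 61:31 = –1 together mean that bits 63:31 are all ones, selecting a compatibility space needs one comparison and a two-bit switch. A C sketch with hypothetical names:

```c
#include <stdint.h>

typedef enum { CK_NONE, CK_SEG0, CK_SEG1, CK_SSEG, CK_SEG3 } ck_space;

/* Hypothetical classifier for the 64-bit compatibility spaces (KX = 1). */
ck_space compat_space(uint64_t va)
{
    /* Bits 63:62 = 11 and bits 61:31 = -1 together mean bits 63:31 are all ones. */
    if ((va >> 31) != 0x1FFFFFFFFull)
        return CK_NONE;                    /* not a compatibility space address */
    switch ((va >> 29) & 0x3) {            /* bits 30:29 select the space */
    case 0:  return CK_SEG0;               /* 0xFFFF FFFF 8000 0000 */
    case 1:  return CK_SEG1;               /* 0xFFFF FFFF A000 0000 */
    case 2:  return CK_SSEG;               /* 0xFFFF FFFF C000 0000 */
    default: return CK_SEG3;               /* 0xFFFF FFFF E000 0000 */
    }
}
```

This is the same test a 32-bit kernel address passes after sign extension, which is why 32-bit kernel code runs unchanged in these spaces.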

### System Control Coprocessor

The System Control Coprocessor (CP0) is implemented as an integral part of the CPU, and supports memory management, address translation, exception handling, and other privileged operations. CP0 contains the registers shown in Figure 4.7 plus a 48-entry TLB. The sections that follow describe how the processor uses each of the memory management-related registers.

Each CP0 register has a unique number that identifies it; this number is referred to as the register number. For instance, the Page Mask register is register number 5.

### Format of a TLB Entry

Figure 4.8 shows the TLB entry formats for both 32- and 64-bit virtual addressing. Each field of an entry has a corresponding field in the EntryHi, EntryLo0, EntryLo1, or PageMask registers, as shown in Figure 4.9 and Figure 4.10; for example, the Mask field of the TLB entry is also held in the PageMask register.

The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers are nearly the same as the TLB entry format. The one exception is the Global field (G bit), which is used in the TLB but is reserved in the EntryHi register. Figure 4.9 and Figure 4.10 describe the TLB entry fields that are shown in Figure 4.8.

<table>
<thead>
<tr>
<th>TLB Entry Bits</th>
<th>Field</th>
<th>Width (bits)</th>
</tr>
</thead>
<tbody>
<tr>
<td>255:217</td>
<td>0</td>
<td>39</td>
</tr>
<tr>
<td>216:205</td>
<td>MASK</td>
<td>12</td>
</tr>
<tr>
<td>204:192</td>
<td>0</td>
<td>13</td>
</tr>
<tr>
<td>191:190</td>
<td>R</td>
<td>2</td>
</tr>
<tr>
<td>189:168</td>
<td>0</td>
<td>22</td>
</tr>
<tr>
<td>167:141</td>
<td>VPN2</td>
<td>27</td>
</tr>
<tr>
<td>140</td>
<td>G</td>
<td>1</td>
</tr>
<tr>
<td>139:136</td>
<td>0</td>
<td>4</td>
</tr>
<tr>
<td>135:128</td>
<td>ASID</td>
<td>8</td>
</tr>
<tr>
<td>127:94</td>
<td>0</td>
<td>34</td>
</tr>
<tr>
<td>93:70</td>
<td>PFN</td>
<td>24</td>
</tr>
<tr>
<td>69:67</td>
<td>C</td>
<td>3</td>
</tr>
<tr>
<td>66</td>
<td>D</td>
<td>1</td>
</tr>
<tr>
<td>65</td>
<td>V</td>
<td>1</td>
</tr>
<tr>
<td>64</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>63:30</td>
<td>0</td>
<td>34</td>
</tr>
<tr>
<td>29:6</td>
<td>PFN</td>
<td>24</td>
</tr>
<tr>
<td>5:3</td>
<td>C</td>
<td>3</td>
</tr>
<tr>
<td>2</td>
<td>D</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>V</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Figure 4.8 Format of a TLB Entry

<table>
<thead>
<tr>
<th>PageMask Bits</th>
<th>Field</th>
</tr>
</thead>
<tbody>
<tr>
<td>31:25</td>
<td>0</td>
</tr>
<tr>
<td>24:13</td>
<td>MASK</td>
</tr>
<tr>
<td>12:0</td>
<td>0</td>
</tr>
</tbody>
</table>

- Mask: Page comparison mask.
- 0: Reserved. Must be written as zeroes, and returns zeroes when read.

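<table>
<thead>
<tr>
<th>Field</th>
<th>EntryHi Bits (32-bit VA)</th>
<th>EntryHi Bits (64-bit VA)</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td>(not present)</td>
<td>63:62</td>
</tr>
<tr>
<td>Fill</td>
<td>(not present)</td>
<td>61:40</td>
</tr>
<tr>
<td>VPN2</td>
<td>31:13</td>
<td>39:13</td>
</tr>
<tr>
<td>0</td>
<td>12:8</td>
<td>12:8</td>
</tr>
<tr>
<td>ASID</td>
<td>7:0</td>
<td>7:0</td>
</tr>
</tbody>
</table>

- VPN2: Virtual page number divided by two (each entry maps two pages).
- ASID: Address space ID field. An 8-bit field that lets multiple processes share the TLB; each process has a distinct mapping of otherwise identical virtual page numbers.
- R: Region (00₂ → user, 01₂ → supervisor, 11₂ → kernel), used to match virtual address bits 63:62.
- Fill: Reserved. Returns zero when read; ignored on writes.
- 0: Reserved. Must be written as zeroes, and returns zeroes when read.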

Figure 4.9 Fields of the PageMask and EntryHi Registers

The TLB page coherency attribute (C) bits specify whether references to the page should be cached; if cached, the algorithm selects between several coherency attributes. Table 4.6 shows the coherency attributes selected by the C bits.

<table>
<thead>
<tr>
<th>C(5:3) Value</th>
<th>Page Coherency Attribute</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Cacheable, noncoherent, write-through, no write allocate</td>
</tr>
<tr>
<td>1</td>
<td>Cacheable, noncoherent, write-through, write allocate</td>
</tr>
<tr>
<td>2</td>
<td>Uncached</td>
</tr>
<tr>
<td>3</td>
<td>Cacheable, noncoherent, write-back</td>
</tr>
<tr>
<td>4 - 7</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Table 4.6 TLB Page Coherency (C) Bit Values

**CP0 Registers**

The following sections describe the CP0 registers (shown in Figure 4.7) that are assigned specifically as a software interface with memory management (each register is followed by its register number in parentheses).

- Index register (CP0 register number 0)
- Random register (1)
- EntryLo0 (2) and EntryLo1 (3) registers
- PageMask register (5)
- Wired register (6)
- EntryHi register (10)
- PRId register (15)
- Config register (16)
- LLAddr register (17)
- TagLo (28) and TagHi (29) registers

### Index Register (0)

The Index register is a 32-bit, read/write register containing six bits to index an entry in the TLB. The high-order bit of the register shows the success or failure of a TLB Probe (TLBP) instruction.

The Index register also specifies the TLB entry affected by TLB Read (TLBR) or TLB Write Index (TLBWI) instructions.

Figure 4.11 shows the format of the Index register; Table 4.7, which follows the figure, describes the Index register fields.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>P</td>
<td>Probe failure. Set to 1 when the previous TLB Probe (TLBP) instruction was unsuccessful.</td>
</tr>
<tr>
<td>Index</td>
<td>Index to the TLB entry affected by the TLB Read (TLBR) and TLB Write Index (TLBWI) instructions.</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
</tbody>
</table>
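
Software typically examines the Index register right after a TLBP. Because Figure 4.11 is not reproduced here, the C sketch below assumes that P is bit 31 (the high-order bit, as stated above) and that the six-bit Index field occupies bits 5:0; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors; they assume P is bit 31 (the high-order bit) and
 * the Index field is the low six bits (5:0) of the Index register. */
bool index_probe_failed(uint32_t index_reg)
{
    return (index_reg >> 31) & 1u;   /* P = 1: last TLBP found no match */
}

unsigned index_entry(uint32_t index_reg)
{
    return index_reg & 0x3Fu;        /* 0 through 47 on a 48-entry TLB */
}
```

When `index_probe_failed` is false, `index_entry` names the matching entry, which a following TLBR or TLBWI would then use.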

### Random Register (1)

The Random register is a read-only register of which six bits index an entry in the TLB. This register decrements as each instruction executes, and its values range between an upper and a lower bound, as follows:

- A lower bound is set by the number of TLB entries reserved for exclusive use by the operating system (the contents of the Wired register).
- An upper bound is set by the total number of TLB entries. The upper bound is therefore 47 (the TLB entries are numbered 0 through 47).

The R4600/R4700 implements this register differently from the R4000: The R4000 counts both valid and invalid instructions, while the R4600/R4700 counts only valid instructions.

The Random register specifies the entry in the TLB that is affected by the TLB Write Random instruction. The register does not need to be read for this purpose; however, the register is readable to verify proper operation of the processor.

To simplify testing, the Random register is set to the value of the upper bound upon system reset. This register is also set to the upper bound when the Wired register is written.
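
The interaction between Random and Wired described above can be modelled in a few lines of C. This is a hypothetical software model, not a hardware interface; in particular, it assumes that once Random has reached the Wired value, the next decrement reloads the upper bound of 47.

```c
#define TLB_ENTRIES  48u
#define RANDOM_UPPER (TLB_ENTRIES - 1u)   /* 47 */

/* Hypothetical software model of the Wired/Random pair (not a hardware API). */
typedef struct {
    unsigned wired;    /* lower bound: number of wired entries */
    unsigned random;   /* current Random value */
} tlb_random_model;

void model_reset(tlb_random_model *m)
{
    m->wired = 0;                  /* Wired is 0 after reset */
    m->random = RANDOM_UPPER;      /* Random starts at the upper bound */
}

void model_write_wired(tlb_random_model *m, unsigned wired)
{
    m->wired = wired;
    m->random = RANDOM_UPPER;      /* writing Wired reloads Random */
}

/* Called once per counted instruction. Assumption: after Random reaches
 * the Wired value, the next decrement reloads the upper bound. */
void model_step(tlb_random_model *m)
{
    if (m->random <= m->wired)
        m->random = RANDOM_UPPER;
    else
        m->random--;
}
```

With Wired = 8, the model cycles Random through 47 down to 8, so a TLBWR can never overwrite entries 0 through 7.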

Figure 4.12 shows the format of the Random register; Table 4.8 describes the Random register fields.

### EntryLo0 (2) and EntryLo1 (3) Registers

The EntryLo register consists of two registers that have identical formats:
- EntryLo0 is used for even virtual pages.
- EntryLo1 is used for odd virtual pages.

The EntryLo0 and EntryLo1 registers are read/write registers. They hold the physical page frame number (PFN) of the TLB entry for even and odd pages, respectively, when performing TLB read and write operations. Figure 4.10 shows the format of these registers.

### PageMask Register (5)

The PageMask register is a read/write register used for reading from or writing to the TLB; it holds a comparison mask that sets the variable page size for each TLB entry, as shown in Table 4.9.

TLB read and write operations use this register as either a source or a destination; when virtual addresses are presented for translation into physical addresses, the corresponding bits in the TLB identify which virtual address bits among bits 24:13 are used in the comparison.

When the Mask field is not one of the values shown in Table 4.9, the operation of the TLB is undefined.

<table>
<thead>
<tr>
<th>Page Size</th>
<th>24</th>
<th>23</th>
<th>22</th>
<th>21</th>
<th>20</th>
<th>19</th>
<th>18</th>
<th>17</th>
<th>16</th>
<th>15</th>
<th>14</th>
<th>13</th>
</tr>
</thead>
<tbody>
<tr>
<td>4 Kbytes</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>16 Kbytes</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>64 Kbytes</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>256 Kbytes</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1 Mbyte</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>4 Mbytes</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>16 Mbytes</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 4.9 Mask Field Values for Page Sizes
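
Each step in Table 4.9 multiplies the page size by four and sets two more Mask bits, starting at bit 13, so the PageMask value can be computed rather than looked up. The C sketch below follows that reading (Mask field in bits 24:13, per Figure 4.9); the function name is hypothetical. A 16-Kbyte page, for example, yields 0x6000 (bits 14:13 set), and any size not listed in Table 4.9 is rejected, since such Mask values leave TLB operation undefined.

```c
#include <stdint.h>

/* Hypothetical helper: returns the PageMask register value for a page size
 * in bytes (4 KB through 16 MB in steps of 4x), or UINT32_MAX if the size
 * is not one of the Table 4.9 sizes. */
uint32_t pagemask_for_size(uint32_t page_bytes)
{
    uint32_t ones = 0;  /* number of Mask bits set, starting at bit 13 */
    for (uint32_t size = 4096u; size <= (16u << 20); size <<= 2, ones += 2) {
        if (size == page_bytes)
            return ((1u << ones) - 1u) << 13;
    }
    return UINT32_MAX;
}
```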

### Wired Register (6)

The Wired register is a read/write register that specifies the boundary between the wired and random entries of the TLB, as shown in Figure 4.13. Wired entries are nonreplaceable entries, which cannot be overwritten by a TLB write random operation. Random entries can be overwritten.

![Figure 4.13 Wired Register Boundary](image)

The Wired register is set to 0 upon system reset. Writing this register also sets the Random register to the value of its upper bound (see Random register, above). Figure 4.14 shows the format of the Wired register; Table 4.10, which follows the figure, describes the register fields.

![Figure 4.14 Wired Register](image)

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Wired</td>
<td>TLB Wired boundary (the number of wired TLB entries)</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
</tbody>
</table>

Table 4.10 Wired Register Field Descriptions

### EntryHi Register (10)

The EntryHi register holds the high-order bits of a TLB entry for TLB read and write operations.

The EntryHi register is accessed by the TLB Probe, TLB Write Random, TLB Write Indexed, and TLB Read Indexed instructions.

Figure 4.9 shows the format of this register.

When either a TLB refill, TLB invalid, or TLB modified exception occurs, the EntryHi register is loaded with the virtual page number (VPN2) and the ASID of the virtual address that did not have a matching TLB entry. (See Chapter 5 for more information about these exceptions.)
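
For 64-bit addressing, the value loaded into EntryHi on one of these exceptions can be illustrated in C using the field positions from Figures 4.8 and 4.9 (R in bits 63:62, VPN2 in bits 39:13, ASID in bits 7:0). The function is a hypothetical model of what the hardware records; it leaves the Fill bits and bits 12:8 at zero. For example, a fault at 0xC000 0012 3456 7ABC with ASID 0x5A gives 0xC000 0012 3456 605A.

```c
#include <stdint.h>

/* Hypothetical model of the 64-bit EntryHi value recorded for a faulting
 * virtual address: R (63:62) and VPN2 (39:13) come from va, ASID from the
 * current address space. Fill (61:40) and bits 12:8 are left at zero. */
uint64_t entryhi_from_fault(uint64_t va, uint8_t asid)
{
    const uint64_t R_MASK    = 0xC000000000000000ull;  /* bits 63:62 */
    const uint64_t VPN2_MASK = 0x000000FFFFFFE000ull;  /* bits 39:13 */
    return (va & R_MASK) | (va & VPN2_MASK) | asid;
}
```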

### Processor Revision Identifier (PRId) Register (15)

The 32-bit, read-only Processor Revision Identifier (PRId) register contains information identifying the implementation and revision level of the CPU and CP0. Figure 4.15 shows the format of the PRId register; Table 4.11 describes the PRId register fields.

![Figure 4.15 Processor Revision Identifier Register Format](image)

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Imp</td>
<td>Implementation number</td>
</tr>
<tr>
<td></td>
<td>R4600: Imp = 0x20</td>
</tr>
<tr>
<td></td>
<td>R4700: Imp = 0x21</td>
</tr>
<tr>
<td>Rev</td>
<td>Revision number</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
</tbody>
</table>

Table 4.11 PRId Register Fields

The low-order byte (bits 7:0) of the PRId register is interpreted as a revision number, and the next byte (bits 15:8) is interpreted as an implementation number. The implementation number is 0x20 for the R4600 and 0x21 for the R4700. The contents of the high-order halfword (bits 31:16) of the register are reserved.

The revision number is stored as a value in the form \( y.x \), where \( y \) is a major revision number in bits 7:4 and \( x \) is a minor revision number in bits 3:0.

The revision number can distinguish some chip revisions; however, there is no guarantee that changes to the chip will be reflected in the PRId register, or that changes to the revision number reflect real chip changes. For this reason, these values are not listed, and software should not rely on the revision number in the PRId register to characterize the chip. Certain attributes, such as cache size, are independent of the implementation number.

### Config Register (16)

The Config register specifies various configuration options selected on R4600/R4700 processors; Table 4.12 lists these options.

Some configuration options, as defined by Config bits 31:3, are set by the hardware during reset and are included in the Config register as read-only status bits for software to access. The K0 field (Config bits 2:0) is the only read/write field and is controlled by software; its value is undefined after reset.

Figure 4.16 shows the format of the Config register; Table 4.12, which follows the figure, describes the Config register fields.

![Figure 4.16 Config Register Format](image)

### Load Linked Address (LLAddr) Register (17)

The read/write Load Linked Address (LLAddr) register contains the physical address read by the most recent Load Linked instruction. This register is for diagnostic purposes only, and serves no function during normal operation.

Figure 4.17 shows the format of the LLAddr register; PAddr represents bits of the physical address, PA(35:4).

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>EC</td>
<td>System clock ratio:</td>
</tr>
<tr>
<td></td>
<td>0 → processor clock frequency divided by 2</td>
</tr>
<tr>
<td></td>
<td>1 → processor clock frequency divided by 3</td>
</tr>
<tr>
<td></td>
<td>2 → processor clock frequency divided by 4</td>
</tr>
<tr>
<td></td>
<td>3 → processor clock frequency divided by 5</td>
</tr>
<tr>
<td></td>
<td>4 → processor clock frequency divided by 6</td>
</tr>
<tr>
<td></td>
<td>5 → processor clock frequency divided by 7</td>
</tr>
<tr>
<td></td>
<td>6 → processor clock frequency divided by 8</td>
</tr>
<tr>
<td></td>
<td>7 Reserved</td>
</tr>
<tr>
<td>EP</td>
<td>Writeback data rate:</td>
</tr>
<tr>
<td></td>
<td>0 → DDDD Doubleword every cycle</td>
</tr>
<tr>
<td></td>
<td>1 → DDxDDx 2 Doublewords every 3 cycles</td>
</tr>
<tr>
<td></td>
<td>2 → DDxxDDxx 2 Doublewords every 4 cycles</td>
</tr>
<tr>
<td></td>
<td>3 → DxDDxDDx 2 Doublewords every 4 cycles</td>
</tr>
<tr>
<td></td>
<td>4 → DDxxxxDDxxxx 2 Doublewords every 5 cycles</td>
</tr>
<tr>
<td></td>
<td>5 → DDxxxxDDxxxx 2 Doublewords every 6 cycles</td>
</tr>
<tr>
<td></td>
<td>6 → DxxxDDxDDxDDxx 2 Doublewords every 6 cycles</td>
</tr>
<tr>
<td></td>
<td>7 → DDxxxxDDxxxxxx 2 Doublewords every 7 cycles</td>
</tr>
<tr>
<td></td>
<td>8 → DxxxDDxDDxDDxDDxx 2 Doublewords every 8 cycles</td>
</tr>
<tr>
<td></td>
<td>9 - 15 Reserved</td>
</tr>
<tr>
<td>BE</td>
<td>BigEndianMem</td>
</tr>
<tr>
<td></td>
<td>0 → Little endian</td>
</tr>
<tr>
<td></td>
<td>1 → Big endian</td>
</tr>
<tr>
<td>IC</td>
<td>Primary I-cache Size (I-cache size = $2^{12+IC}$ bytes). In the R4600/R4700 processor, this is set to 16 Kbytes (IC = 010₂)</td>
</tr>
<tr>
<td>DC</td>
<td>Primary D-cache Size (D-cache size = $2^{12+DC}$ bytes). In the R4600/R4700 processor, this is set to 16 Kbytes (DC = 010₂)</td>
</tr>
<tr>
<td>IB</td>
<td>Primary I-cache line size</td>
</tr>
<tr>
<td></td>
<td>1 → 32 bytes (8 Words)</td>
</tr>
<tr>
<td>DB</td>
<td>Primary D-cache line size</td>
</tr>
<tr>
<td></td>
<td>1 → 32 bytes (8 Words)</td>
</tr>
<tr>
<td>K0</td>
<td>kseg0 coherency algorithm (see EntryLo0 and EntryLo1 registers)</td>
</tr>
<tr>
<td>Others</td>
<td>Reserved. Returns indicated values when read.</td>
</tr>
</tbody>
</table>

Table 4.12 Config Register Fields
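
Two Config computations follow directly from Table 4.12: the cache size formula and the update of K0 in bits 2:0. The C sketch below works on values already read from the register (the positions of IC and DC are not stated in this text, so the size helper takes the extracted field value). All function names are hypothetical, and the CP0 write itself (an MTC0) is not shown. K0 takes the same encoding as the EntryLo C bits in Table 4.6.

```c
#include <stdint.h>

/* Cache size from an already-extracted IC or DC field: 2^(12 + field) bytes. */
uint32_t primary_cache_bytes(unsigned field)
{
    return 1u << (12u + field);
}

/* K0 occupies Config bits 2:0 and is the only software-writable field. */
unsigned config_k0(uint32_t config)
{
    return config & 0x7u;
}

/* Returns a new Config image with K0 replaced and all other bits as read. */
uint32_t config_with_k0(uint32_t config, unsigned cca)
{
    return (config & ~0x7u) | (cca & 0x7u);
}
```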

### Cache Tag Registers [TagLo (28) and TagHi (29)]

The TagLo and TagHi registers are 32-bit read/write registers that hold the primary cache tag and parity during cache initialization, cache diagnostics, or cache error processing. The Tag registers are written by the CACHE and MTC0 instructions.

The P field of these registers is ignored on Index Store Tag operations. Parity is computed by the store operation.

The Windows NT operating system uses the CP0 TagLo register to save and restore gp registers in its TLB refill exception handler. Thus all 32 bits of TagLo must be present, even though some of them have no use for its primary purpose.

Figure 4.18 shows the format of these registers for primary cache operations. Table 4.13 lists the field definitions of the TagLo and TagHi registers.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>PTagLo</td>
<td>Specifies the physical address bits 35:12</td>
</tr>
<tr>
<td>PState</td>
<td>Specifies the primary cache state</td>
</tr>
<tr>
<td>P</td>
<td>Specifies the primary tag even parity bit</td>
</tr>
<tr>
<td>F</td>
<td>The FIFO bit used to implement FIFO refill of the cache</td>
</tr>
<tr>
<td>RWNT</td>
<td>Read/Write bits required for Windows NT</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes; returns zeroes when read</td>
</tr>
</tbody>
</table>

Table 4.13 Cache Tag Register Fields

### Virtual-to-Physical Address Translation Process

During virtual-to-physical address translation, if the Global bit (G) of a TLB entry is not set, the CPU compares the 8-bit ASID of the virtual address with the ASID of that TLB entry to see if there is a match.

The following comparison is also made:

- For 32-bit virtual addresses, the highest 7-to-19 bits (depending upon the page size) of the virtual address are compared to the contents of the TLB virtual page number.
- For 64-bit virtual addresses, the highest 15-to-27 bits (depending upon the page size) of the virtual address are compared to the contents of the TLB virtual page number.

If a TLB entry matches, the physical address and access control bits (C, D, and V) are retrieved from the matching TLB entry. While the V bit of the entry must be set for a valid translation to take place, it is not involved in the determination of a matching TLB entry.

Figure 4.19 illustrates the TLB address translation process.

![Figure 4.19 TLB Address Translation](image-url)
**TLB Misses**

If there is no TLB entry that matches the virtual address, a TLB miss exception occurs. If the access control bits ($D$ and $V$) indicate that the access is not valid, a TLB modification or TLB invalid exception occurs. If the $C$ bits equal $010_2$, the physical address that is retrieved accesses main memory, bypassing the cache.
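The matching and checking rules above can be modeled in software. The C sketch below is a simplified model under stated assumptions, not a description of the hardware: it handles 4-Kbyte pages only (no page mask), ignores the region bits, and uses a hypothetical `tlb_entry` layout rather than the real EntryHi/EntryLo encoding. It shows that the ASID comparison is skipped for global entries, that V and D are examined only after a match, and that C = 010₂ marks the access as uncached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one TLB entry; 4-Kbyte pages only. */
typedef struct {
    uint64_t pfn;   /* physical frame number (PA >> 12) */
    uint8_t  c;     /* cache algorithm; 2 (010 binary) = uncached */
    bool     d;     /* dirty: page is writable */
    bool     v;     /* valid */
} tlb_page;

typedef struct {
    uint64_t vpn2;     /* virtual address bits 39:13 (VA >> 13) */
    uint8_t  asid;     /* 8-bit address space identifier */
    bool     global;   /* G: skip the ASID comparison */
    tlb_page page[2];  /* [0] = even page, [1] = odd page */
} tlb_entry;

typedef enum { TLB_HIT, TLB_REFILL, TLB_INVALID, TLB_MODIFIED } tlb_result;

#define VPN2_MASK ((1ULL << 27) - 1) /* 27 bits: VA bits 39:13 */

tlb_result tlb_lookup(const tlb_entry *tlb, int entries, uint64_t va,
                      uint8_t cur_asid, bool is_store,
                      uint64_t *pa, bool *uncached)
{
    uint64_t vpn2 = (va >> 13) & VPN2_MASK;
    for (int i = 0; i < entries; i++) {
        const tlb_entry *e = &tlb[i];
        /* Match on VPN2, and on ASID unless the entry is global.
           V is deliberately not part of the match. */
        if (e->vpn2 != vpn2 || (!e->global && e->asid != cur_asid))
            continue;
        const tlb_page *p = &e->page[(va >> 12) & 1]; /* bit 12: even/odd */
        if (!p->v)
            return TLB_INVALID;
        if (is_store && !p->d)
            return TLB_MODIFIED;
        *pa = (p->pfn << 12) | (va & 0xFFF);
        *uncached = (p->c == 2);
        return TLB_HIT;
    }
    return TLB_REFILL; /* no matching entry: TLB miss */
}
```

For example, a store through an entry whose odd page has D = 0 returns `TLB_MODIFIED`, while the same virtual address looked up with a different ASID on a non-global entry returns `TLB_REFILL`.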

**TLB Instructions**

Table 4.14 lists the instructions that the CPU provides for working with the TLB. See Appendix A for a detailed description of these instructions.

<table>
<thead>
<tr>
<th>Op Code</th>
<th>Description of Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>TLBP</td>
<td>Translation Lookaside Buffer Probe</td>
</tr>
<tr>
<td>TLBR</td>
<td>Translation Lookaside Buffer Read</td>
</tr>
<tr>
<td>TLBWI</td>
<td>Translation Lookaside Buffer Write Index</td>
</tr>
<tr>
<td>TLBWR</td>
<td>Translation Lookaside Buffer Write Random</td>
</tr>
</tbody>
</table>

*Table 4.14 TLB Instructions*
This chapter describes CPU exception processing: it first explains how exception processing works, then describes the format and use of each CPU exception register.

The chapter concludes with a description of each exception’s cause, together with the manner in which the CPU processes and services these exceptions. For information about Floating-Point Unit exceptions, see Chapter 7.

**How Exception Processing Works**

The processor receives exceptions from a number of sources, including translation lookaside buffer (TLB) misses, arithmetic overflows, I/O interrupts, and system calls. When the CPU detects one of these exceptions, the normal sequence of instruction execution is suspended and the processor enters Kernel mode (see Chapter 4 for a description of system operating modes).

The processor then disables interrupts and forces execution of a software exception handler located at a fixed address. The handler may save the context of the processor, including the contents of the program counter, the current operating mode (User or Supervisor), and the status of the interrupts (enabled or disabled). This context is saved so that it can be restored when the exception has been serviced.

When an exception occurs, the CPU loads the Exception Program Counter (EPC) register with a location where execution can restart after the exception has been serviced. The restart location in the EPC register is the address of the instruction that caused the exception or, if the instruction was executing in a branch delay slot, the address of the branch instruction immediately preceding the delay slot.

The registers described later in the chapter assist in this exception processing by retaining address, cause and status information.

For a description of the exception handling process, see the description of the individual exception contained in this chapter, or the flowcharts at the end of this chapter.

**Exception Processing Registers**

This section describes the CP0 registers that are used in exception processing. Table 5.1 lists these registers along with their numbers; each register has a unique identification number, referred to as its *register number*. For instance, the ECC register is register number 26. The remaining CP0 registers are used in memory management, as described in Chapter 4.

Software examines the CP0 registers during exception processing to determine the cause of the exception and the state of the CPU at the time the exception occurred. The registers in Table 5.1 are used in exception processing, and are described in the sections that follow.

<table>
<thead>
<tr>
<th>Register Name</th>
<th>Reg. No.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Context</td>
<td>4</td>
</tr>
<tr>
<td>BadVAddr (Bad Virtual Address)</td>
<td>8</td>
</tr>
<tr>
<td>Count</td>
<td>9</td>
</tr>
<tr>
<td>Compare register</td>
<td>11</td>
</tr>
<tr>
<td>Status</td>
<td>12</td>
</tr>
<tr>
<td>Cause</td>
<td>13</td>
</tr>
<tr>
<td>EPC (Exception Program Counter)</td>
<td>14</td>
</tr>
<tr>
<td>XContext</td>
<td>20</td>
</tr>
<tr>
<td>ECC</td>
<td>26</td>
</tr>
<tr>
<td>CacheErr (Cache Error and Status)</td>
<td>27</td>
</tr>
<tr>
<td>ErrorEPC (Error Exception Program Counter)</td>
<td>30</td>
</tr>
</tbody>
</table>

Table 5.1 CP0 Exception Processing Registers

**Context Register (4)**

The Context register is a read/write register containing the pointer to an entry in the page table entry (PTE) array; this array is an operating system data structure that stores virtual-to-physical address translations. When there is a TLB miss, operating system software loads the TLB with the missing translation from the PTE array. Normally, the operating system uses the Context register to address the current page map, which resides in the kernel-mapped segment kseg3. The Context register duplicates some of the information provided in the BadVAddr register, but the information is arranged in a form that is more useful for a software TLB exception handler. Figure 5.1 shows the format of the Context register; Table 5.2, which follows the figure, describes the Context register fields.

**Figure 5.1 Context Register Format**

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>BadVPN2</td>
<td>This field is written by hardware on a miss. It contains the virtual page number (VPN) of the most recent virtual address that did not have a valid translation.</td>
</tr>
<tr>
<td>PTEBase</td>
<td>This field is a read/write field for use by the operating system. It is normally written with a value that allows the operating system to use the Context register as a pointer into the current PTE array in memory.</td>
</tr>
</tbody>
</table>

Table 5.2 Context Register Fields

The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that caused the TLB miss; bit 12 is excluded because a single TLB entry maps to an even-odd page pair. For a 4-Kbyte page size, this format can directly address the pair-table of 8-byte PTEs. For other page and PTE sizes, shifting and masking this value produces the appropriate address.
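The following C sketch shows how the BadVPN2 value lines up with a table of 8-byte PTE pairs. Because Figure 5.1 is not reproduced here, the bit positions (PTEBase in bits 63:23, BadVPN2 in bits 22:4, bits 3:0 zero) are an assumption based on the usual R4000-family layout, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed Context layout: PTEBase in bits 63:23, BadVPN2 in bits 22:4,
   bits 3:0 zero. */
#define CTX_BADVPN2_SHIFT 4
#define CTX_BADVPN2_MASK  (0x7FFFFULL << CTX_BADVPN2_SHIFT) /* 19 bits */

/* Model of the hardware update on a TLB miss: VA bits 31:13 are copied
   into BadVPN2, and the OS-owned PTEBase field is left untouched. */
uint64_t context_after_miss(uint64_t context, uint32_t bad_vaddr)
{
    uint64_t vpn2 = (bad_vaddr >> 13) & 0x7FFFF;
    return (context & ~CTX_BADVPN2_MASK) | (vpn2 << CTX_BADVPN2_SHIFT);
}

/* With 4-Kbyte pages and 8-byte PTEs, each even/odd pair occupies 16 bytes,
   so the Context value is already the address of the even PTE and the odd
   PTE lies 8 bytes further on. */
uint64_t even_pte_addr(uint64_t context) { return context; }
uint64_t odd_pte_addr(uint64_t context)  { return context + 8; }
```

The shift by 4 is what makes the "directly address" case work: 16 bytes per PTE pair means the VPN2 index is multiplied by 16, which is exactly where BadVPN2 sits.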
Bad Virtual Address Register (BadVAddr) (8)

The Bad Virtual Address register (BadVAddr) is a read-only register that displays the most recent virtual address that caused one of the following exceptions: Address Error (e.g., unaligned access), TLB Invalid, TLB Modified, TLB Refill, Virtual Coherency Data Access, or Virtual Coherency Instruction Fetch.

The processor does not write to the BadVAddr register when the EXL bit in the Status register is set to a 1.

Figure 5.2 shows the format of the BadVAddr register.

![BadVAddr Register](image)

**Note:** The BadVAddr register does not save any information for bus errors, since bus errors are not addressing errors.

Count Register (9)

The Count register acts as a timer, incrementing at a constant rate (half the maximum instruction issue rate) whether or not an instruction is executed or retired, and whether or not any forward progress is made through the pipeline.

This register can be read or written. It can be written for diagnostic purposes or system initialization; for example, to synchronize processors.

Figure 5.3 shows the format of the Count register.

![Count Register](image)

Compare Register (11)

The Compare register acts as a timer (see also the Count register); it maintains a stable value that does not change on its own.

When the value of the Count register equals the value of the Compare register, interrupt bit IP(7) in the Cause register is set. This causes an interrupt as soon as the interrupt is enabled.

Writing a value to the Compare register, as a side effect, clears the timer interrupt.

For diagnostic purposes, the Compare register is a read/write register. In normal use, however, the Compare register is write-only. Figure 5.4 shows the format of the Compare register.

![Compare Register](image)
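Count and Compare are commonly used together as a periodic timer. The C sketch below takes a hypothetical `pclock_hz` parameter for the pipeline clock and assumes one instruction can issue per pipeline clock, so Count runs at `pclock_hz / 2`. Because the interrupt is raised only when Count equals Compare, a Compare value that Count has already passed will not interrupt until Count wraps around; the hypothetical `deadline_missed` helper shows one way to detect that case with wrap-safe arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count ticks for a given interval, assuming Count runs at pclock_hz / 2. */
uint32_t ticks_for_us(uint32_t pclock_hz, uint32_t microseconds)
{
    uint64_t count_hz = pclock_hz / 2u;
    return (uint32_t)(count_hz * microseconds / 1000000u);
}

/* Next Compare value; unsigned 32-bit arithmetic wraps just as Count does. */
uint32_t next_compare(uint32_t compare, uint32_t interval)
{
    return compare + interval;
}

/* True if Count has already reached or passed 'compare', in which case the
   timer interrupt will not arrive until Count wraps all the way around. */
bool deadline_missed(uint32_t count_now, uint32_t compare)
{
    return (int32_t)(count_now - compare) >= 0;
}
```

A timer handler would typically write the new Compare value (which also clears the pending timer interrupt), read Count back, and if `deadline_missed` reports true, advance Compare again rather than wait for a wraparound.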
Status Register (12)

The Status register (SR) is a read/write register that contains the operating mode, interrupt enabling, and diagnostic states of the processor. Figure 5.5 shows the format of the entire register, and Table 5.3 describes each field. The more important Status register fields are:

- The 8-bit Interrupt Mask (IM) field controls the enabling of eight interrupt conditions. An interrupt causes an exception only if interrupts are enabled and the corresponding bit is set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register. For more information, refer to the Interrupt Pending (IP) field of the Cause register. IM[1:0] are the masks for the two software interrupts, while IM[7:2] correspond to Int[5:0].

- The 4-bit Coprocessor Usability (CU) field controls the usability of the four possible coprocessors. Regardless of the CU0 bit setting, CP0 is always usable in Kernel mode. In all other cases, executing an instruction for, or accessing, an unusable coprocessor causes an exception.

- The 9-bit Diagnostic Status (DS) field (Status[24:16]) is used for self-testing and for checking the cache and virtual memory system.

- The Reverse-Endian (RE) bit, bit 25, reverses the endianness of the machine in User mode. The processor can be configured as either little-endian or big-endian at system reset. This selection is always used in Kernel and Supervisor modes, and also in User mode when the RE bit is 0. Setting the RE bit to 1 inverts the User mode endianness.

Status Register Format

Figure 5.5 shows the format of the Status register. Table 5.3, which follows the figure, describes the Status register fields.
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CU</td>
<td>Controls the usability of each of the four coprocessor unit numbers. CP0 is always usable when in Kernel mode, regardless of the setting of the CU<sub>0</sub> bit.<br>1 → usable<br>0 → unusable</td>
</tr>
<tr>
<td>FR</td>
<td>Enables additional floating-point registers.<br>0 → 16 registers<br>1 → 32 registers</td>
</tr>
<tr>
<td>RE</td>
<td>Reverse-Endian bit, valid in User mode.</td>
</tr>
<tr>
<td>BEV</td>
<td>Controls the location of TLB refill and general exception vectors.<br>0 → normal<br>1 → bootstrap</td>
</tr>
<tr>
<td>SR</td>
<td>1 → Indicates a soft reset or NMI has occurred.</td>
</tr>
<tr>
<td>CH</td>
<td>Hit (tag match and valid state) or miss indication for the last CACHE Hit Invalidate, Hit Write Back Invalidate, Hit Write Back, or Hit Set Virtual for a primary cache.<br>0 → miss<br>1 → hit</td>
</tr>
<tr>
<td>CE</td>
<td>Contents of the ECC register set or modify the check bits of the caches when CE = 1; see description of the ECC register.</td>
</tr>
<tr>
<td>DE</td>
<td>Specifies that cache parity errors cannot cause exceptions.<br>0 → parity remains enabled<br>1 → disables parity</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
<tr>
<td>IM</td>
<td>Interrupt Mask controls the enabling of each of the external, internal, and software interrupts. An interrupt is taken if interrupts are enabled, and the corresponding bits are set in both the Interrupt Mask field of the Status register and the Interrupt Pending field of the Cause register. IM[7:2] correspond to interrupts Int[5:0] and IM[1:0] to the software interrupts.<br>0 → disabled<br>1 → enabled</td>
</tr>
<tr>
<td>KX</td>
<td>KX controls whether the TLB Refill Vector or the XTLB Refill Vector address is used for TLB misses on kernel addresses.<br>0 → TLB Refill Vector<br>1 → XTLB Refill Vector</td>
</tr>
<tr>
<td>SX</td>
<td>Enables 64-bit virtual addressing and operations in Supervisor mode. The extended-addressing TLB refill exception is used for TLB misses on supervisor addresses.<br>0 → 32-bit<br>1 → 64-bit</td>
</tr>
<tr>
<td>UX</td>
<td>Enables 64-bit virtual addressing and operations in User mode. The extended-addressing TLB refill exception is used for TLB misses on user addresses.<br>0 → 32-bit<br>1 → 64-bit</td>
</tr>
<tr>
<td>KSU</td>
<td>Mode bits<br>10<sub>2</sub> → User<br>01<sub>2</sub> → Supervisor<br>00<sub>2</sub> → Kernel</td>
</tr>
<tr>
<td>ERL</td>
<td>Error Level<br>0 → normal<br>1 → error</td>
</tr>
<tr>
<td>EXL</td>
<td>Exception Level<br>0 → normal<br>1 → exception<br><strong>Note:</strong> When going from 0 to 1, IE should be disabled (0) first. This would be done when preparing to return from the exception handler, such as before executing the ERET instruction.</td>
</tr>
<tr>
<td>IE</td>
<td>Interrupt Enable<br>0 → disable interrupts<br>1 → enables interrupts</td>
</tr>
</tbody>
</table>

*Table 5.3 Status Register Fields*
Status Register Modes and Access States

Fields of the Status register set the modes and access states described in the sections that follow.

Interrupt Enable: Interrupts are enabled when all of the following conditions are true:
- $IE = 1$
- $EXL = 0$
- $ERL = 0$

If these conditions are met, the $IM$ bits determine which interrupt sources can cause an exception.

Note: Setting the $IE$ bit may be delayed by up to 3 cycles. If performing nested interrupts, re-enable the $IE$ bit first.

Operating Modes: The following CPU Status register bit settings are required for User, Kernel, and Supervisor modes (see Chapter 4 for more information about operating modes).
- The processor is in User mode when $KSU = 10_2$, $EXL = 0$, and $ERL = 0$.
- The processor is in Supervisor mode when $KSU = 01_2$, $EXL = 0$, and $ERL = 0$.
- The processor is in Kernel mode when $KSU = 00_2$, or $EXL = 1$, or $ERL = 1$.
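The interrupt-enable and operating-mode rules above reduce to a few bit tests on a Status value. In this C sketch the positions of IE, EXL, ERL, KSU, and IM follow the usual R4000-family layout (an assumption, since Figure 5.5 is not reproduced here); RE at bit 25 comes from the text, and the Cause IP field is taken as bits 15:8 from the General Exception pseudocode. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Status bit positions (R4000-family layout); RE from the text. */
#define SR_IE        (1u << 0)
#define SR_EXL       (1u << 1)
#define SR_ERL       (1u << 2)
#define SR_KSU_SHIFT 3
#define SR_KSU_MASK  (3u << SR_KSU_SHIFT)
#define SR_IM_SHIFT  8
#define SR_RE        (1u << 25)

typedef enum { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_UNDEFINED } cpu_mode;

cpu_mode operating_mode(uint32_t sr)
{
    if (sr & (SR_EXL | SR_ERL))
        return MODE_KERNEL;                 /* EXL = 1 or ERL = 1 */
    switch ((sr & SR_KSU_MASK) >> SR_KSU_SHIFT) {
    case 0:  return MODE_KERNEL;            /* KSU = 00 */
    case 1:  return MODE_SUPERVISOR;        /* KSU = 01 */
    case 2:  return MODE_USER;              /* KSU = 10 */
    default: return MODE_UNDEFINED;         /* KSU = 11: not defined in the text */
    }
}

/* IE = 1, EXL = 0, and ERL = 0. */
bool interrupts_enabled(uint32_t sr)
{
    return (sr & SR_IE) && !(sr & (SR_EXL | SR_ERL));
}

/* Interrupt lines that would be taken: bits set in both IM and IP (15:8). */
uint8_t deliverable_interrupts(uint32_t sr, uint32_t cause)
{
    if (!interrupts_enabled(sr))
        return 0;
    return (uint8_t)(((sr & cause) >> SR_IM_SHIFT) & 0xFFu);
}

/* Effective byte order: the reset-time setting, inverted only in User
   mode when RE = 1. */
bool big_endian_now(bool reset_big_endian, uint32_t sr)
{
    bool invert = operating_mode(sr) == MODE_USER && (sr & SR_RE);
    return reset_big_endian != invert;
}
```

For example, a Status value of 0x11 (KSU = 10₂, IE = 1) is User mode with interrupts enabled; setting EXL (0x13) switches to Kernel mode and disables interrupts without changing IE itself.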

32- and 64-bit Virtual Addressing: The following CPU Status register bit settings select 32- or 64-bit virtual addressing for User and Supervisor operating modes. Enabling 64-bit virtual addressing permits the execution of 64-bit opcodes and translation of 64-bit virtual addresses. 64-bit virtual addressing for User and Supervisor modes can be set independently but is always used for Kernel mode.
- The $KX$ field controls whether the TLB Refill Vector or the XTLB Refill Vector address is used for TLB misses on Kernel addresses. 64-bit opcodes are always valid in Kernel mode.
- 64-bit addressing and operations are enabled for Supervisor mode when $SX = 1$.
- 64-bit addressing and operations are enabled for User mode when $UX = 1$.

Kernel Address Space Accesses: Access to the kernel address space is allowed when the processor is in Kernel mode.

Supervisor Address Space Accesses: Access to the supervisor address space is allowed when the processor is in Kernel or Supervisor mode, as described above in the paragraph titled Operating Modes.

User Address Space Accesses: Access to the user address space is allowed in any of the three operating modes.

Status Register Reset

The contents of the Status register are undefined at reset, except for the $ERL$ and $BEV$ bits, which are set to 1, and the $SR$ bit, which is cleared to 0.

The $SR$ bit distinguishes a Reset from a Soft Reset or a Nonmaskable Interrupt (NMI).
Cause Register (13)
The 32-bit read/write Cause register describes the cause of the most recent exception.
Figure 5.6 shows the fields of this register; Table 5.4, which follows the figure, describes the Cause register fields. A 5-bit exception code (ExcCode) indicates the cause of the most recent exception, as listed in Table 5.5.
All bits in the Cause register, with the exception of the IP(1:0) bits, are read-only; IP(1:0) are used for software interrupts.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>BD</td>
<td>Indicates whether the last exception taken occurred in a branch delay slot. 1 → delay slot 0 → normal</td>
</tr>
<tr>
<td>CE</td>
<td>Coprocessor unit number referenced when a Coprocessor Unusable exception is taken.</td>
</tr>
<tr>
<td>IP</td>
<td>Indicates an interrupt is pending. 1 → interrupt pending 0 → no interrupt</td>
</tr>
<tr>
<td>ExcCode</td>
<td>Exception code field (see Table 5.5)</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
</tbody>
</table>

Table 5.4 Cause Register Fields

<table>
<thead>
<tr>
<th>Exception Code Value</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Int</td>
<td>Interrupt</td>
</tr>
<tr>
<td>1</td>
<td>Mod</td>
<td>TLB modification exception</td>
</tr>
<tr>
<td>2</td>
<td>TLBL</td>
<td>TLB exception (load or instruction fetch)</td>
</tr>
<tr>
<td>3</td>
<td>TLBS</td>
<td>TLB exception (store)</td>
</tr>
<tr>
<td>4</td>
<td>AdEL</td>
<td>Address error exception (load or instruction fetch)</td>
</tr>
<tr>
<td>5</td>
<td>AdES</td>
<td>Address error exception (store)</td>
</tr>
<tr>
<td>6</td>
<td>IBE</td>
<td>Bus error exception (instruction fetch)</td>
</tr>
<tr>
<td>7</td>
<td>DBE</td>
<td>Bus error exception (data reference: load or store)</td>
</tr>
<tr>
<td>8</td>
<td>Sys</td>
<td>Syscall exception</td>
</tr>
<tr>
<td>9</td>
<td>Bp</td>
<td>Breakpoint exception</td>
</tr>
<tr>
<td>10</td>
<td>RI</td>
<td>Reserved instruction exception</td>
</tr>
<tr>
<td>11</td>
<td>CpU</td>
<td>Coprocessor Unusable exception</td>
</tr>
<tr>
<td>12</td>
<td>Ov</td>
<td>Arithmetic Overflow exception</td>
</tr>
<tr>
<td>13</td>
<td>Tr</td>
<td>Trap exception</td>
</tr>
<tr>
<td>14</td>
<td>—</td>
<td>Reserved</td>
</tr>
<tr>
<td>15</td>
<td>FPE</td>
<td>Floating-Point exception</td>
</tr>
<tr>
<td>16–31</td>
<td>—</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

**Table 5.5 Cause Register ExcCode Field**

**Exception Program Counter (EPC) Register (14)**

The Exception Program Counter (EPC) is a read/write register that contains the address at which processing resumes after an exception has been serviced.

For synchronous exceptions, the EPC register contains either:
- the virtual address of the instruction that was the direct cause of the exception, or
- the virtual address of the immediately preceding branch or jump instruction (when the instruction is in a branch delay slot, and the Branch Delay bit in the Cause register is set).

The processor does not write to the EPC register when the EXL bit in the Status register is set to a 1.

Figure 5.7 shows the format of the EPC register.
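The Cause and EPC fields described above can be decoded as follows. The field positions in this C sketch come from the General Exception pseudocode (BD in bit 31, CE in bits 29:28, IP in bits 15:8, ExcCode in bits 6:2), and the mnemonics come from Table 5.5; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions from the General Exception pseudocode. */
static inline bool     cause_bd(uint32_t c)      { return (c >> 31) & 1u; }
static inline unsigned cause_ce(uint32_t c)      { return (c >> 28) & 3u; }
static inline unsigned cause_ip(uint32_t c)      { return (c >> 8) & 0xFFu; }
static inline unsigned cause_exccode(uint32_t c) { return (c >> 2) & 0x1Fu; }

/* Mnemonics from Table 5.5; NULL for reserved codes (14 and 16-31). */
const char *exccode_name(unsigned code)
{
    static const char *const names[16] = {
        "Int", "Mod", "TLBL", "TLBS", "AdEL", "AdES", "IBE", "DBE",
        "Sys", "Bp",  "RI",   "CpU",  "Ov",   "Tr",   NULL,   "FPE"
    };
    return code < 16 ? names[code] : NULL;
}

/* EPC points at the branch when BD = 1, so the instruction that actually
   raised the exception sits in the delay slot 4 bytes later. */
uint64_t faulting_instruction(uint64_t epc, uint32_t cause)
{
    return cause_bd(cause) ? epc + 4 : epc;
}
```

A handler that wants to skip the faulting instruction (for example, after servicing a Syscall) can add 4 to EPC when BD = 0; when BD = 1, EPC points at the branch, so the branch itself must be emulated before resuming.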
**XContext Register (20)**

The read/write XContext register contains a pointer to an entry in the page table entry (PTE) array, an operating system data structure that stores virtual-to-physical address translations. When there is a TLB miss, the operating system software loads the TLB with the missing translation from the PTE array. The XContext register duplicates some of the information provided in the BadVAddr register, and puts it in a form useful for a software TLB exception handler.

The XContext register is for use with the XTLB refill handler, which loads TLB entries for references to a 64-bit address space, and is included solely for operating system use. The operating system sets the PTE base field in the register, as needed. Normally, the operating system uses the XContext register to address the current page map, which resides in the kernel-mapped segment kseg3.

Figure 5.8 shows the format of the XContext register; Table 5.6, which follows the figure, describes the XContext register fields.

![Figure 5.8 XContext Register Format](image)

The 27-bit BadVPN2 field has bits 39:13 of the virtual address that caused the TLB miss; bit 12 is excluded because a single TLB entry maps to an even-odd page pair. For a 4-Kbyte page size, this format may be used directly to address the pair-table of 8-byte PTEs. For other page and PTE sizes, shifting and masking this value produces the appropriate address.

<table>
<thead>
<tr>
<th><strong>Field</strong></th>
<th><strong>Description</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>BadVPN2</td>
<td>The Bad Virtual Page Number/2 field is written by hardware on a miss. It contains the VPN of the most recent invalidly translated virtual address.</td>
</tr>
<tr>
<td>R</td>
<td>The Region field contains bits 63:62 of the virtual address. 00<sub>2</sub> = user, 01<sub>2</sub> = supervisor, 11<sub>2</sub> = kernel.</td>
</tr>
<tr>
<td>PTEBase</td>
<td>The Page Table Entry Base read/write field is normally written with a value that allows the operating system to use the XContext register as a pointer into the current PTE array in memory.</td>
</tr>
</tbody>
</table>
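The XContext update on a miss can be modeled the same way as the Context update. This C sketch assumes the usual R4000-family bit positions (PTEBase in bits 63:33, R in bits 32:31, BadVPN2 in bits 30:4, bits 3:0 zero), since Figure 5.8 is not reproduced here; the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed XContext layout: PTEBase 63:33, R 32:31, BadVPN2 30:4, zero 3:0. */
#define XCTX_PTEBASE_MASK (~0ULL << 33)

/* Model of the hardware update on an XTLB miss: PTEBase is preserved,
   R receives VA bits 63:62, and BadVPN2 receives VA bits 39:13. */
uint64_t xcontext_after_miss(uint64_t xcontext, uint64_t bad_vaddr)
{
    uint64_t region = bad_vaddr >> 62;                /* VA bits 63:62 */
    uint64_t vpn2   = (bad_vaddr >> 13) & 0x7FFFFFF;  /* VA bits 39:13, 27 bits */
    return (xcontext & XCTX_PTEBASE_MASK) | (region << 31) | (vpn2 << 4);
}
```

As with Context, the 4-bit shift means that, for 4-Kbyte pages and 8-byte PTEs, the result can be used directly as the address of the even PTE of the pair.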

**Error Checking and Correcting (ECC) Register (26)**

The 8-bit Error Checking and Correcting (ECC) register reads or writes primary-cache data parity bits for cache initialization, cache diagnostics, or cache error processing. (Tag parity is loaded from and stored to the TagLo register.)

The ECC register is loaded by the Index Load Tag CACHE operation. The contents of the ECC register are:

- written into the primary data cache on store instructions (instead of the computed parity) when the CE bit of the Status register is set
- substituted for the computed instruction parity for the CACHE operation Fill

To force a particular cache parity value, use the Status CE bit together with the ECC register.
Figure 5.9 shows the format of the ECC register; Table 5.7, which follows the figure, describes the register fields.

![ECC Register](image)

**Table 5.7 ECC Register Fields**

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ECC</td>
<td>An 8-bit field specifying the parity bits read from or written to a primary cache.</td>
</tr>
<tr>
<td>0</td>
<td>Reserved. Must be written as zeroes, and returns zeroes when read.</td>
</tr>
</tbody>
</table>
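A common reason to force a parity value is to inject a deliberate parity error when testing a Cache Error handler. The C sketch below computes candidate ECC register values. It assumes one even-parity bit per byte of a 64-bit doubleword, with ECC bit i covering byte i; that bit ordering is an assumption this section does not specify, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Even parity: returns 1 when the byte has an odd number of 1 bits, so that
   data plus parity together hold an even number of 1 bits. */
uint8_t even_parity_byte(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* Assumed mapping: ECC bit i covers data bits 8i+7:8i. */
uint8_t ecc_for_doubleword(uint64_t data)
{
    uint8_t ecc = 0;
    for (int i = 0; i < 8; i++)
        ecc |= (uint8_t)(even_parity_byte((uint8_t)(data >> (8 * i))) << i);
    return ecc;
}

/* Value to load into the ECC register (with Status.CE = 1) so that the next
   store writes deliberately wrong parity for one byte. */
uint8_t ecc_with_bad_byte(uint64_t data, int byte_index)
{
    return ecc_for_doubleword(data) ^ (uint8_t)(1u << byte_index);
}
```

After the store, CE would normally be cleared again so that later stores use computed parity.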

**Cache Error (CacheErr) Register (27)**

The 32-bit read-only CacheErr register records parity errors in the primary cache. Parity errors cannot be corrected.

The CacheErr register holds cache index and status bits that indicate the source and nature of the error; it is loaded when a Cache Error exception is asserted. The Cache Error exception is also asserted when a read response returns with bad parity.

Figure 5.10 shows the format of the CacheErr register; the table that follows the figure describes the CacheErr register fields.

![CacheErr Register](image)
**Error Exception Program Counter (ErrorEPC) Register (30)**

The `ErrorEPC` register is similar to the `EPC` register, except that `ErrorEPC` is used on parity error exceptions. It is also used to store the program counter (PC) on Reset, Soft Reset, and nonmaskable interrupt (NMI) exceptions.

The read/write `ErrorEPC` register contains the virtual address at which instruction processing can resume after servicing an error. This address can be:

- the virtual address of the instruction that caused the exception
- the virtual address of the immediately preceding branch or jump instruction, when the instruction that caused the exception is in a branch delay slot.

There is no branch delay slot indication for the `ErrorEPC` register.
Figure 5.11 shows the format of the ErrorEPC register.

![ErrorEPC Register Format](image)

**Processor Exceptions**

This section describes the processor exceptions: the cause of each exception, its processing by the hardware, and its servicing by a handler (software). The exception types, together with their exception processing operations, are described in the next section.

**Exception Types**

This section gives sample exception handler operations for the following exception types:
- reset
- soft reset
- nonmaskable interrupt (NMI)
- cache error
- remaining processor exceptions

When both the EXL and ERL bits in the Status register are 0, the KSU bits in the Status register select the operating mode (User, Supervisor, or Kernel). When the EXL bit or the ERL bit is 1, the processor is in Kernel mode.

When the processor takes an exception, the EXL bit is set to 1, which means the system is in Kernel mode. After saving the appropriate state, the exception handler typically changes KSU to Kernel mode and resets the EXL bit to 0. When restoring the state and restarting, the handler restores the previous KSU value and sets the EXL bit back to 1.

Returning from an exception also resets the EXL bit to 0 (see the ERET instruction in Appendix A).
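The EXL transitions described above can be traced with the C sketch below. It models one plausible handler sequence under stated assumptions: the handler switches KSU to Kernel before clearing EXL (otherwise clearing EXL would drop back to whatever mode KSU selects), and it clears IE before setting EXL again, as the EXL note in Table 5.3 advises. Bit positions follow the usual R4000-family layout, and the function names are hypothetical.

```c
#include <stdint.h>

#define SR_IE       (1u << 0)
#define SR_EXL      (1u << 1)
#define SR_KSU_MASK (3u << 3)

/* Hardware, on a general exception: SR <- SR31:2 || 1 || SR0 (set EXL). */
uint32_t sr_on_exception(uint32_t sr) { return sr | SR_EXL; }

/* Handler, after saving state: KSU = 00 (Kernel), then EXL = 0. With IE
   still 1, interrupts and nested exceptions are possible again. */
uint32_t sr_enter_kernel(uint32_t sr) { return sr & ~(SR_KSU_MASK | SR_EXL); }

/* Return, step 1: clear IE while EXL is still 0. */
uint32_t sr_return_step1(uint32_t sr) { return sr & ~SR_IE; }

/* Return, step 2: restore the interrupted KSU and IE with EXL = 1;
   interrupts stay disabled because EXL = 1. */
uint32_t sr_return_step2(uint32_t saved_sr) { return saved_sr | SR_EXL; }

/* ERET: clears EXL (ERL handling is not modeled here). */
uint32_t sr_after_eret(uint32_t sr) { return sr & ~SR_EXL; }
```

Starting from User mode with interrupts enabled (0x11), the sequence passes through 0x13, 0x01, 0x00, and 0x13, and ERET brings Status back to 0x11.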

In the following sections, sample hardware processes for various exceptions are shown, together with the servicing required by the handler (software).

**Reset Exception Process**

Figure 5.12 shows the Reset exception process.

```
T: undefined
Random ← TLBENTRIES-1
Wired ← 0
Config ← 0 || EC || EP || 00000000 || BE || 110 || 010 || 010 || 1 || 1 || 0 || undefined
ErrorEPC ← PC
SR ← SR31:23 || 1 || 0 || 0 || SR19:3 || 1 || SR1:0
PC ← 0xFFFF FFFF BFC0 0000
```

Figure 5.12 Reset Exception Processing
Cache Error Exception Process

Figure 5.13 shows the Cache Error exception process.

```
T: ErrorEPC ← PC
   CacheErr ← ER || EC || ED || ET || ES || EE || EB || 0^25
   SR ← SR31:3 || 1 || SR1:0
   if SR22 = 1 then /* test the BEV bit (SR22) */
      PC ← 0xFFFF FFFF BFC0 0200 + 0x100 /* access boot-PROM area */
   else
      PC ← 0xFFFF FFFF A000 0000 + 0x100 /* access main memory area */
   endif
```

Figure 5.13 Cache Error Exception Processing
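Figure 5.13 can be read as a state update. This C sketch models only ErrorEPC, Status, and PC in a hypothetical `cpu_state` structure and does not model the CacheErr fields; ERL is Status bit 2 and BEV is bit 22, as in the figure's pseudocode.

```c
#include <stdint.h>

/* Hypothetical model holding only the state this figure touches. */
typedef struct {
    uint64_t pc, error_epc;
    uint32_t sr;
} cpu_state;

#define SR_ERL (1u << 2)
#define SR_BEV (1u << 22)

void take_cache_error(cpu_state *s)
{
    s->error_epc = s->pc;
    s->sr |= SR_ERL;                                  /* SR <- SR31:3 || 1 || SR1:0 */
    uint64_t base = (s->sr & SR_BEV)
                        ? 0xFFFFFFFFBFC00200ULL       /* boot-PROM area */
                        : 0xFFFFFFFFA0000000ULL;      /* kseg1: uncached main memory */
    s->pc = base + 0x100;
}
```

Both bases are uncached, so the handler does not depend on the cache that just reported an error.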

Soft Reset and NMI Exception Process

Figure 5.14 shows the Soft Reset and NMI exception process.

```
T: ErrorEPC ← PC
   SR ← SR31:23 || 1 || 0 || 1 || SR19:3 || 1 || SR1:0
   PC ← 0xFFFF FFFF BFC0 0000
```

Figure 5.14 Soft Reset and NMI Exception Processing
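The Status updates in Figures 5.12 and 5.14 differ only in bit 20 (SR). The C sketch below transcribes the two concatenation expressions as bit masks; only Status is modeled (not Random, Wired, Config, or ErrorEPC), and the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SR_ERL   (1u << 2)
#define SR_SR    (1u << 20)
#define SR_BIT21 (1u << 21)
#define SR_BEV   (1u << 22)

/* Reset: SR <- SR31:23 || 1 || 0 || 0 || SR19:3 || 1 || SR1:0 */
uint32_t sr_after_reset(uint32_t sr)
{
    return (sr & ~(SR_BIT21 | SR_SR)) | SR_BEV | SR_ERL;
}

/* Soft Reset and NMI: SR <- SR31:23 || 1 || 0 || 1 || SR19:3 || 1 || SR1:0 */
uint32_t sr_after_soft_reset_or_nmi(uint32_t sr)
{
    return (sr & ~SR_BIT21) | SR_BEV | SR_SR | SR_ERL;
}
```

Because all three exceptions share the Reset vector, code at that vector can test the SR bit to tell a Reset apart from a Soft Reset or NMI.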

General Exception Process

Figure 5.15 shows the process used for exceptions other than Reset, Soft Reset, NMI, and Cache Error.

```
T: Cause ← BD || 0 || CE || 0^{12} || Cause_{15:8} || 0 || ExcCode || 0^2
   if SR1 = 0 then /* EXL = 0: no exception is currently being handled */
      EPC ← PC
   endif
   SR ← SR31:2 || 1 || SR0
   if SR22 = 1 then /* test the BEV bit (SR22) */
      PC ← 0xFFFF FFFF BFC0 0200 + vector /* access to uncached space */
   else
      PC ← 0xFFFF FFFF 8000 0000 + vector /* access to cached space */
   endif
```

Figure 5.15 General Exception Processing (Except Reset, Soft Reset, NMI, and Cache Error)
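Figure 5.15 can also be modeled in C. This sketch uses a hypothetical `cpu_state` structure and treats `pc` as the address of the instruction that caused the exception; the figure's `EPC ← PC` is read as the restart PC, so when BD = 1 the sketch backs EPC up to the preceding branch, as the EPC description requires. The `vector` argument is the offset from Table 5.10.

```c
#include <stdint.h>

/* Hypothetical model holding only the state this figure touches. */
typedef struct {
    uint64_t pc, epc;
    uint32_t sr, cause;
} cpu_state;

#define SR_EXL (1u << 1)
#define SR_BEV (1u << 22)

void take_general_exception(cpu_state *s, unsigned bd, unsigned ce,
                            unsigned exccode, uint64_t vector)
{
    /* Cause <- BD || 0 || CE || 0^12 || Cause15:8 || 0 || ExcCode || 0^2 */
    s->cause = ((uint32_t)(bd & 1u) << 31) | ((uint32_t)(ce & 3u) << 28)
             | (s->cause & 0xFF00u) | ((uint32_t)(exccode & 0x1Fu) << 2);
    if (!(s->sr & SR_EXL))              /* EPC is written only when EXL = 0 */
        s->epc = bd ? s->pc - 4 : s->pc;
    s->sr |= SR_EXL;                    /* SR <- SR31:2 || 1 || SR0 */
    uint64_t base = (s->sr & SR_BEV) ? 0xFFFFFFFFBFC00200ULL  /* uncached */
                                     : 0xFFFFFFFF80000000ULL; /* cached */
    s->pc = base + vector;
}
```

Note that a second exception taken while EXL = 1 still rewrites Cause but leaves EPC pointing at the original restart address.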

Exception Vector Locations

The Reset, Soft Reset, and NMI exceptions are always vectored to location 0xFFFF FFFF BFC0 0000 (virtual address), which lies in kseg1, the unmapped and uncached kernel segment.

Addresses for all other exceptions are a combination of a vector offset and a base address. The base address is determined by the BEV bit of the Status register, as shown in Table 5.9.
Table 5.10 shows the vector offset that is added to the base address to create the exception address.

<table>
<thead>
<tr>
<th>BEV</th>
<th>R4600/R4700 Processor Vector Base</th>
<th>Cache Error Base</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0xFFFF FFFF 8000 0000</td>
<td>0xFFFF FFFF A000 0000</td>
</tr>
<tr>
<td>1</td>
<td>0xFFFF FFFF BFC0 0200</td>
<td>0xFFFF FFFF BFC0 0200</td>
</tr>
</tbody>
</table>

Table 5.9 Exception Vector Base Addresses

As shown in Table 5.9, when $BEV = 0$, the Cache Error exception uses a vector base in $kseg1$ (0xFFFF FFFF A000 0000) instead of the $kseg0$ base (0xFFFF FFFF 8000 0000) used by the other exceptions, so the Cache Error handler runs uncached.

When $BEV = 1$, the vector base for the Cache Error exception is 0xFFFF FFFF BFC0 0200. This is an uncached and unmapped space, allowing the exception to bypass the cache and TLB.

<table>
<thead>
<tr>
<th>Exception</th>
<th>R4600/R4700 Processor Vector Offset</th>
</tr>
</thead>
<tbody>
<tr>
<td>TLB refill, EXL = 0</td>
<td>0x000</td>
</tr>
<tr>
<td>XTLB refill, EXL = 0 (X = 64-bit TLB)</td>
<td>0x080</td>
</tr>
<tr>
<td>Cache Error</td>
<td>0x100</td>
</tr>
<tr>
<td>Others</td>
<td>0x180</td>
</tr>
</tbody>
</table>

Table 5.10 Exception Vector Offsets
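Tables 5.9 and 5.10 combine into a single address calculation. In this C sketch the enum and function names are hypothetical, and a refill exception taken while EXL = 1 is treated as falling under "Others" (offset 0x180), which is how the "EXL = 0" qualifiers in Table 5.10 are read here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EXC_TLB_REFILL, EXC_XTLB_REFILL, EXC_CACHE_ERROR, EXC_OTHER } exc_kind;

/* Base from Table 5.9 plus offset from Table 5.10. */
uint64_t exception_vector(exc_kind kind, bool bev, bool exl)
{
    uint64_t base = bev ? 0xFFFFFFFFBFC00200ULL : 0xFFFFFFFF80000000ULL;
    switch (kind) {
    case EXC_TLB_REFILL:
        return base + (exl ? 0x180 : 0x000);
    case EXC_XTLB_REFILL:
        return base + (exl ? 0x180 : 0x080);
    case EXC_CACHE_ERROR:   /* separate base column in Table 5.9 */
        return (bev ? 0xFFFFFFFFBFC00200ULL : 0xFFFFFFFFA0000000ULL) + 0x100;
    case EXC_OTHER:
    default:
        return base + 0x180;
    }
}
```

The separate refill vectors exist so that the frequent TLB refill case can be handled by a short dedicated routine; a refill that occurs while already at exception level goes through the general handler instead.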

Priority of Exceptions

The remainder of this chapter describes exceptions in the order of their priority, as shown in Table 5.11. While more than one exception can occur for a single instruction, only the exception with the highest priority is reported.

<table>
<thead>
<tr>
<th>Exception Priority</th>
<th>Exception / Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Reset <em>(highest priority)</em></td>
</tr>
<tr>
<td>2</td>
<td>Soft Reset</td>
</tr>
<tr>
<td>3</td>
<td>Nonmaskable Interrupt (NMI)</td>
</tr>
<tr>
<td>4</td>
<td>Address error — Instruction fetch</td>
</tr>
<tr>
<td>5</td>
<td>TLB refill — Instruction fetch</td>
</tr>
<tr>
<td>6</td>
<td>TLB invalid — Instruction fetch</td>
</tr>
<tr>
<td>7</td>
<td>Cache error — Instruction fetch</td>
</tr>
<tr>
<td>8</td>
<td>Bus error — Instruction fetch</td>
</tr>
<tr>
<td>9</td>
<td>Integer overflow, Trap, System Call, Breakpoint, Reserved Instruction, Coprocessor Unusable, or Floating-Point Exception</td>
</tr>
<tr>
<td>10</td>
<td>Address error — Data access</td>
</tr>
<tr>
<td>11</td>
<td>TLB refill — Data access</td>
</tr>
<tr>
<td>12</td>
<td>TLB invalid — Data access</td>
</tr>
<tr>
<td>13</td>
<td>TLB modified — Data write</td>
</tr>
<tr>
<td>14</td>
<td>Cache error — Data access</td>
</tr>
<tr>
<td>15</td>
<td>Bus error — Data access</td>
</tr>
<tr>
<td>16</td>
<td>Interrupt <em>(lowest priority)</em></td>
</tr>
</tbody>
</table>

Table 5.11 Exception Priority Order
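Table 5.11 amounts to a fixed-order scan. In this C sketch, a hypothetical bitmask records which conditions apply to one instruction (bit i for priority i + 1), and the lowest set bit is the one reported.

```c
/* Table 5.11 in priority order, highest first. */
#define NUM_PRIORITIES 16

static const char *const exception_priority[NUM_PRIORITIES] = {
    "Reset", "Soft Reset", "NMI",
    "Address error (fetch)", "TLB refill (fetch)", "TLB invalid (fetch)",
    "Cache error (fetch)", "Bus error (fetch)",
    "Ov, Tr, Sys, Bp, RI, CpU, or FPE",
    "Address error (data)", "TLB refill (data)", "TLB invalid (data)",
    "TLB modified (data write)", "Cache error (data)", "Bus error (data)",
    "Interrupt",
};

/* Returns the index of the condition that is reported, or -1 if none. */
int reported_exception(unsigned pending)
{
    for (int i = 0; i < NUM_PRIORITIES; i++)
        if (pending & (1u << i))
            return i;
    return -1;
}
```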

Generally speaking, the exceptions described in the following sections are handled ("processed") by hardware; these exceptions are then serviced by software.
Reset Exception
This section explains the Reset exception.

Cause
The Reset exception occurs when the ColdReset*\(^1\) signal is asserted and then deasserted. This exception is not maskable.

Processing
The CPU provides a special exception vector for this exception, located at 0xFFFF FFFF BFC0 0000.

The Reset vector resides in unmapped and uncached CPU address space, so the hardware need not initialize the TLB or the cache to process this exception. It also means the processor can fetch and execute instructions while the caches and virtual memory are in an undefined state.

The contents of all registers in the CPU are undefined when this exception occurs, except for the following register fields:
- In the Status register, \( SR \) is cleared to 0, and \( ERL \) and \( BEV \) are set to 1. All other bits are undefined.
- The Random register is initialized to the value of its upper bound.
- The Wired register is initialized to 0.
- Some of the Config Register bits are initialized from the boot-time mode stream.

Reset exception processing is shown in Figure 5.12.

Servicing
The Reset exception is serviced by:
- initializing all processor registers, coprocessor registers, caches, and the memory system
- performing diagnostic tests
- bootstrapping the operating system

\(^1\) In the following sections (and throughout this manual), a signal followed by an asterisk, such as **Reset\***, is active low.
**Soft Reset Exception**

This section explains the Soft Reset exception.

**Cause**

The Soft Reset exception occurs in response to the **Reset\*** input signal, and execution begins at the Reset vector when **Reset\*** is deasserted. This exception is not maskable.

**Processing**

This exception uses the Reset exception vector, which is located within unmapped and uncached address space, so the cache and TLB need not be initialized to process it. When a Soft Reset occurs, the SR bit of the Status register is set to distinguish this exception from a Reset exception.

The primary purpose of the Soft Reset exception is to reinitialize the processor after a fatal error during normal operation. Unlike an NMI, this exception resets all cache and bus state machines. Like Reset, it can be taken with the processor in any state; the caches, TLB, and normal exception vectors need not be properly initialized. Soft Reset preserves the state of the caches and memory system while resetting the bus and cache state machines.

When this exception occurs, the contents of all registers are preserved except for:

- **ErrorEPC** register, which contains the restart PC
- **ERL** bit of the **Status** register, which is set to 1
- **SR** bit of the **Status** register, which is set to 1
- **BEV** bit of the **Status** register, which is set to 1

However, because a Soft Reset can abort cache and bus operations in progress, the cache and memory state is undefined when this exception occurs.

Soft Reset exception processing is shown in Figure 5.14 on page 13.

**Servicing**

The Soft Reset exception is serviced by saving the current processor state for diagnostic purposes and then reinitializing the system as for the Reset exception.
Nonmaskable Interrupt (NMI) Exception

This section explains the Nonmaskable Interrupt exception.

Cause

The Nonmaskable Interrupt (NMI) exception occurs in response to the falling edge of the NMI* pin, or to an external write to the Int*[6] bit of the Interrupt register.

Unlike all other interrupts, this interrupt is not maskable; it occurs regardless of the settings of the EXL, ERL, and IE bits in the Status register.

Processing

This exception uses the Reset exception vector, which is located in unmapped and uncached address space, so the cache and TLB need not be initialized to process an NMI. When an NMI exception occurs, the SR bit of the Status register is set to differentiate this exception from a Reset exception.

Because an NMI can occur in the midst of another exception, it is not normally possible to continue program execution after servicing an NMI.

Unlike Reset and Soft Reset, but like other exceptions, NMI is taken only at instruction boundaries. The state of the caches and memory system is preserved by this exception.

To terminate a pending read that has hung, the best approach is to return a bus error. However, if you wish to use a CPU exception to indicate a hung read, Soft Reset is preferable to NMI.

When this exception occurs, the contents of all registers are preserved except for:

- ErrorEPC register, which contains the restart PC
- ERL bit of the Status register, which is set to 1
- SR bit of the Status register, which is set to 1
- BEV bit of the Status register, which is set to 1

NMI exception processing is shown in Figure 5.14 on page 13.

Servicing

The NMI exception is serviced by saving the current processor state for diagnostic purposes and then reinitializing the system as for the Reset exception.
Address Error Exception

This section explains the Address Error exception.

Cause

The Address Error exception occurs when an attempt is made to execute one of the following:

- load or store a doubleword that is not aligned on a doubleword boundary (except when using the unaligned doubleword instructions LDL, LDR, SDL, and SDR)
- load, fetch, or store a word that is not aligned on a word boundary (except when loading or storing with the unaligned word instructions LWL, LWR, SWL, and SWR)
- load or store a halfword that is not aligned on a halfword boundary
- reference the kernel address space from User or Supervisor mode
- reference the supervisor address space from User mode

This exception is not maskable.

Processing

The common exception vector is used for this exception. The AdEL code (instruction fetch or load) or the AdES code (store) in the ExcCode field of the Cause register is set, indicating whether the instruction (shown by the EPC register and BD bit in the Cause register) caused the exception with an instruction reference, load operation, or store operation.

When this exception occurs, the BadVAddr register retains the virtual address that was not properly aligned or referenced protected address space. The contents of the VPN field of the Context and EntryHi registers are undefined, as are the contents of the EntryLo register.

The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot. If it is in a branch delay slot, the EPC register contains the address of the preceding branch instruction, and the BD bit of the Cause register is set to indicate this.

Address Error exception processing is shown in Figure 5.15 on page 13.

Servicing

Typically the process executing at the time is handed a segmentation violation signal. This error is usually fatal to the process incurring the exception.

To resume execution, the EPC register must be altered so that the unaligned reference instruction does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register + 4) before returning.

If an unaligned reference instruction is in a branch delay slot, interpretation of the branch instruction is required to resume execution.
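The EPC/BD rule used here (and by the System Call, Breakpoint, and Bus Error handlers below) can be sketched as follows. This is a hypothetical helper, not a kernel routine; it assumes BD is Cause bit 31, as Figure 5.16 shows, and it deliberately refuses the delay-slot case because that requires interpreting the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAUSE_BD (1u << 31)  /* Branch Delay bit of the Cause register */

/* Address of the instruction that actually faulted: EPC itself, or the
   delay-slot instruction at EPC + 4 when Cause.BD is set. */
static uint64_t faulting_instruction(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Compute a resume address that skips the faulting instruction. This is
   only valid outside a branch delay slot; in a delay slot the branch
   must be interpreted (not modelled here), so return false. */
static bool skip_faulting_instruction(uint64_t epc, uint32_t cause,
                                      uint64_t *resume)
{
    assert(resume != 0);
    if (cause & CAUSE_BD)
        return false;
    *resume = epc + 4;
    return true;
}
```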
TLB Exceptions

This section explains the TLB exceptions.

Three types of TLB exceptions can occur:

• TLB Refill occurs when there is no TLB entry that matches an attempted reference to a mapped address space.
• TLB Invalid occurs when a virtual address reference matches a TLB entry that is marked invalid.
• TLB Modified occurs when the virtual address of a store operation matches a TLB entry that is marked valid but is not dirty (the entry is not writable).
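The three cases above form a simple decision sequence. A minimal sketch, where `struct tlb_lookup` is a hypothetical summary of one lookup (whether an entry matched, and the V and D bits of the selected page half):

```c
#include <assert.h>
#include <stdbool.h>

enum tlb_exc { TLB_OK, TLB_REFILL, TLB_INVALID, TLB_MODIFIED };

/* Hypothetical summary of one TLB lookup. */
struct tlb_lookup {
    bool matched;  /* some entry matched the virtual page (and ASID) */
    bool valid;    /* V bit of the matched page */
    bool dirty;    /* D bit: page is writable */
};

/* Decide which TLB exception, if any, a reference would raise. */
static enum tlb_exc classify_tlb(struct tlb_lookup r, bool is_store)
{
    if (!r.matched)
        return TLB_REFILL;      /* no matching entry */
    if (!r.valid)
        return TLB_INVALID;     /* entry marked invalid */
    if (is_store && !r.dirty)
        return TLB_MODIFIED;    /* valid but not writable */
    return TLB_OK;
}
```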

The following three subsections describe the TLB exceptions.

TLB Refill Exception

This subsection explains the TLB refill exception.

Cause

The TLB refill exception occurs when there is no TLB entry to match a reference to a mapped address space. This exception is not maskable.

Processing

There are two special exception vectors for this exception: one for references to 32-bit virtual address spaces, and one for references to 64-bit virtual address spaces. The UX, SX, and KX bits of the Status register determine whether the user, supervisor, or kernel address space referenced is a 32-bit or 64-bit space. TLB refill exceptions use these vectors only when the EXL bit of the Status register is 0. This exception sets the TLBL code (instruction fetch or load) or the TLBS code (store) in the ExcCode field of the Cause register. This code indicates whether the instruction, as shown by the EPC register and the BD bit in the Cause register, caused the miss by an instruction reference, load operation, or store operation.
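Vector selection can be sketched as below. The two base addresses and the common-vector offset 0x180 come from Figure 5.16; the refill offsets 0x000 (32-bit space) and 0x080 (64-bit space) are not stated in this section and are assumed from the R4000-family architecture. The function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vector bases (sign-extended), as shown in Figure 5.16. */
#define BASE_NORMAL    0xFFFFFFFF80000000ULL  /* BEV = 0: unmapped, cached */
#define BASE_BOOTSTRAP 0xFFFFFFFFBFC00200ULL  /* BEV = 1: unmapped, uncached */

#define OFF_TLB_REFILL  0x000u  /* assumption: 32-bit address space */
#define OFF_XTLB_REFILL 0x080u  /* assumption: 64-bit address space */
#define OFF_GENERAL     0x180u  /* common exception vector */

/* Select the vector for a TLB miss. With EXL already 1 the miss is a
   nested exception and goes to the common vector instead. */
static uint64_t tlb_miss_vector(bool bev, bool exl, bool space_is_64bit)
{
    uint64_t base = bev ? BASE_BOOTSTRAP : BASE_NORMAL;
    if (exl)
        return base + OFF_GENERAL;
    return base + (space_is_64bit ? OFF_XTLB_REFILL : OFF_TLB_REFILL);
}
```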

When this exception occurs, the BadVAddr, Context, XContext, and EntryHi registers hold the virtual address that failed address translation. The EntryHi register also contains the ASID from which the translation fault occurred. The Random register normally suggests a valid location in which to place the replacement TLB entry. The contents of the EntryLo registers are undefined. The EPC register contains the address of the instruction that caused the exception, unless this instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

TLB Refill exception processing is shown in Figure 5.15 on page 13.

Servicing

To service this exception, the contents of the Context or XContext register are used as a virtual address to fetch the memory locations containing the physical page frame and access control bits for a pair of TLB entries. The two entries are placed into the EntryLo0 and EntryLo1 registers, and the EntryHi and EntryLo registers are then written into the TLB (typically with the TLBWR instruction, which uses the Random register to select the entry).
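A minimal sketch of how Context locates the page-table pair. It assumes the R4000-family 32-bit Context layout (PTEBase in bits 31:23, BadVPN2 = VA[31:13] in bits 22:4, bits 3:0 zero), which this section does not spell out, and a hypothetical OS table of 16-byte EntryLo0/EntryLo1 pairs. A real handler uses Context directly as a virtual address; the array index below stands in for that.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit Context layout: PTEBase in bits 31:23 (written by the
   OS), BadVPN2 in bits 22:4, bits 3:0 zero, so each VPN2 selects a
   16-byte slot. */
#define CONTEXT_PTEBASE_MASK 0xFF800000u

/* What the hardware loads into Context on a TLB refill. */
static uint32_t context_from_badvaddr(uint32_t ptebase, uint32_t badvaddr)
{
    uint32_t badvpn2 = badvaddr >> 13;   /* VPN of the even/odd page pair */
    return (ptebase & CONTEXT_PTEBASE_MASK) | (badvpn2 << 4);
}

/* Hypothetical OS page-table slot: one pair per VPN2, already formatted
   as EntryLo0 (even page) and EntryLo1 (odd page). */
struct pte_pair {
    uint64_t entrylo0;
    uint64_t entrylo1;
};

/* Software side of the refill: index the table by BadVPN2; the two
   values then go to EntryLo0/EntryLo1 before TLBWR. */
static struct pte_pair refill_lookup(const struct pte_pair *table,
                                     uint32_t context)
{
    assert(table != 0);
    uint32_t badvpn2 = (context >> 4) & 0x7FFFFu;
    return table[badvpn2];
}
```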

It is possible that the virtual address used to obtain the physical address and access control information lies on a page that is not resident in the TLB. This condition is handled by allowing a second TLB refill exception to occur within the TLB refill handler. This second exception goes to the common exception vector because the EXL bit of the Status register is set.
**TLB Invalid Exception**

This subsection explains the TLB invalid exception.

**Cause**

The TLB invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid (TLB valid bit cleared). This exception is not maskable.

**Processing**

The common exception vector is used for this exception. The `TLBL` or `TLBS` code in the `ExcCode` field of the `Cause` register is set. This indicates whether the instruction, as shown by the `EPC` register and `BD` bit in the `Cause` register, caused the miss by an instruction reference, load operation, or store operation.

When this exception occurs, the `BadVAddr`, `Context`, `XContext` and `EntryHi` registers contain the virtual address that failed address translation. The `EntryHi` register also contains the ASID from which the translation fault occurred. The `Random` register normally contains a valid location in which to put the replacement TLB entry. The contents of the `EntryLo` registers are undefined.

The `EPC` register contains the address of the instruction that caused the exception unless this instruction is in a branch delay slot, in which case the `EPC` register contains the address of the preceding branch instruction and the `BD` bit of the `Cause` register is set.

TLB Invalid exception processing is shown in Figure 5.15 on page 13.

**Servicing**

A TLB entry is typically marked invalid when one of the following is true:

- a virtual address does not exist
- the virtual address exists, but is not in main memory (a page fault)
- a trap is desired on any reference to the page (for example, to maintain a reference bit or during debug)

After servicing the cause of a TLB Invalid exception, the TLB entry is located with TLBP (TLB Probe), and replaced by an entry with that entry’s `Valid` bit set.
**TLB Modified Exception**

This subsection explains the TLB modified exception.

**Cause**

The TLB modified exception occurs when the virtual address of a store operation matches a TLB entry that is marked valid but is not dirty, and therefore is not writable. This exception is not maskable.

**Processing**

The common exception vector is used for this exception, and the Mod code in the Cause register is set.

When this exception occurs, the BadVAddr, Context, XContext and EntryHi registers contain the virtual address that failed address translation. The EntryHi register also contains the ASID from which the translation fault occurred. The contents of the EntryLo registers are undefined.

The EPC register contains the address of the instruction that caused the exception unless that instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

TLB Modified exception processing is shown in Figure 5.15 on page 13.

**Servicing**

The kernel uses the failed virtual address or virtual page number to identify the corresponding access control information. The page identified may or may not permit write accesses; if writes are not permitted, a write protection violation occurs.

If write accesses are permitted, the kernel marks the page frame dirty/writable in its own data structures. The TLBP instruction places the index of the TLB entry that must be altered into the Index register. The EntryLo0 and EntryLo1 registers are loaded with the physical page frame and access control bits for the page pair (with the D bit set for the page being written), and the EntryHi and EntryLo registers are written into the TLB at that index (with the TLBWI instruction).
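The "mark dirty" step can be sketched as a bit operation on an EntryLo value. The EntryLo layout used here (PFN in bits 29:6, C in 5:3, D bit 2, V bit 1, G bit 0) is the R4000-family format and an assumption; `page_writable` stands in for the kernel's own (hypothetical) permission check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EntryLo fields: PFN 29:6, C 5:3, D 2, V 1, G 0. */
#define ENTRYLO_G (1u << 0)
#define ENTRYLO_V (1u << 1)
#define ENTRYLO_D (1u << 2)

/* Returns false for a write-protection violation; otherwise sets D so
   the updated value can be written back with TLBWI. */
static bool service_tlb_modified(uint32_t *entrylo, bool page_writable)
{
    assert(entrylo != 0);
    if (!page_writable)
        return false;           /* write protection violation */
    *entrylo |= ENTRYLO_D;      /* mark dirty/writable */
    return true;
}
```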
Cache Error Exception

This section explains the Cache Error exception.

Cause

The Cache Error exception occurs when a primary cache parity error is detected. This exception can be masked by setting the DE bit of the Status register.

Processing

The processor sets the ERL bit in the Status register, saves the exception restart address in ErrorEPC register, and then transfers to a special vector in uncached space:

- If the BEV bit = 0, the vector is 0xFFFF FFFF A000 0100.
- If the BEV bit = 1, the vector is 0xFFFF FFFF BFC0 0300.

No other registers are changed.
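The hardware steps for this exception (including the delay-slot rule and DE masking shown in Figure 5.20) can be modelled as follows. This is a hypothetical sketch: `struct cpu` is an illustrative model, and the Status bit positions for ERL (bit 2) and BEV (bit 22) are assumed R4000-family values (DE = bit 16 matches Figure 5.20).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_ERL (1u << 2)
#define STATUS_DE  (1u << 16)
#define STATUS_BEV (1u << 22)

/* Hypothetical CPU model: pc holds the address of the instruction
   that encountered the error. */
struct cpu {
    uint64_t pc;
    uint64_t error_epc;
    uint32_t status;
};

/* Returns false (no exception taken) when DE = 1. */
static bool take_cache_error(struct cpu *c, bool in_delay_slot)
{
    assert(c != 0);
    if (c->status & STATUS_DE)
        return false;
    /* Restart at the branch if the instruction is in its delay slot. */
    c->error_epc = in_delay_slot ? c->pc - 4 : c->pc;
    c->status |= STATUS_ERL;
    c->pc = (c->status & STATUS_BEV) ? 0xFFFFFFFFBFC00300ULL
                                     : 0xFFFFFFFFA0000100ULL;
    return true;
}
```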

Cache Error exception processing is shown in Figure 5.13 on page 13.

Servicing

All errors should be logged. To correct cache parity errors, the system uses the CACHE instruction to invalidate the cache block, overwrites the old data through a cache miss, and resumes execution with an ERET.

Other errors are not correctable and are likely to be fatal to the current process.
Bus Error Exception

This section explains the Bus Error exception.

Cause
A Bus Error exception is raised by board-level circuitry for events such as bus time-out, backplane bus parity errors, and invalid physical memory addresses or access types. This exception is not maskable.

A Bus Error exception occurs only when a cache miss refill, uncached reference, or unbuffered write occurs synchronously; a Bus Error exception resulting from a buffered write transaction must be reported using the general interrupt mechanism.

Processing
The common exception vector is used for a Bus Error exception. The *IBE* code (instruction fetch) or *DBE* code (load or store) in the *ExcCode* field of the *Cause* register is set, signifying whether the instruction (as indicated by the *EPC* register and *BD* bit in the *Cause* register) caused the exception by an instruction reference, load operation, or store operation.

The *EPC* register contains the address of the instruction that caused the exception, unless it is in a branch delay slot, in which case the *EPC* register contains the address of the preceding branch instruction and the *BD* bit of the *Cause* register is set. Bus Error processing is shown in Figure 5.15 on page 13.

Servicing
The physical address at which the fault occurred can be computed from information available in the CP0 registers.

- If the *IBE* code in the *Cause* register is set (indicating an instruction fetch reference), the virtual address is contained in the *EPC* register.
- If the *DBE* code is set (indicating a load or store reference), the instruction that caused the exception is located at the virtual address contained in the *EPC* register, or at that address plus 4 if the *BD* bit of the *Cause* register is set.

The virtual address of the load or store reference can then be obtained by interpreting the instruction. The physical address can be obtained by using the TLBP instruction to locate the matching TLB entry, reading that entry with TLBR, and computing the physical page number from the *EntryLo* register.
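The final step (physical page number plus page offset) can be sketched as below, assuming 4 KB pages (PageMask = 0) and the R4000-family EntryLo format with the PFN in bits 29:6; both are assumptions, and the helper names are hypothetical. VA bit 12 selects the even (EntryLo0) or odd (EntryLo1) page of the pair.

```c
#include <assert.h>
#include <stdint.h>

/* With 4 KB pages, VA bit 12 picks the even or odd page of the pair. */
static uint32_t select_entrylo(uint32_t lo0, uint32_t lo1, uint64_t va)
{
    return (va & 0x1000u) ? lo1 : lo0;
}

/* PA = PFN << 12 | VA[11:0]; PFN is EntryLo bits 29:6 (assumed). */
static uint64_t physical_address(uint32_t entrylo, uint64_t va)
{
    uint64_t pfn = (entrylo >> 6) & 0xFFFFFFu;   /* 24-bit PFN */
    return (pfn << 12) | (va & 0xFFFu);
}
```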

The process executing at the time of this exception is handed a bus error signal, which is usually fatal.
Integer Overflow Exception
This section explains the Integer Overflow exception.

Cause
An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, DADDI, or DSUB¹ instruction results in a two's complement overflow. This exception is not maskable.
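The overflow condition the hardware checks for ADD/ADDI and SUB can be reproduced in C. A minimal sketch for the 32-bit forms only (the 64-bit DADD/DSUB forms follow the same pattern at bit 63); it uses unsigned arithmetic so the C code itself never relies on signed overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADD/ADDI: overflow iff both operands have the same sign and the
   sum's sign differs from it. */
static bool add32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                      /* wraps, no UB */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}

/* SUB: overflow iff the operands differ in sign and the result's
   sign differs from the minuend's. */
static bool sub32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;
    return (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
}
```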

Processing
The common exception vector is used for this exception, and the OV code in the Cause register is set.
The EPC register contains the address of the instruction that caused the exception unless the instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.
Integer Overflow exception processing is shown in Figure 5.15 on page 13.

Servicing
The process executing at the time of the exception is handed a floating-point exception/integer overflow signal. This error is usually fatal to the current process.

---
¹ See Appendix A for instruction descriptions.
Trap Exception

This section explains the Trap exception.

Cause

The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI, TGEUI, TLTI, TLTUI, TEQI, or TNEI¹ instruction results in a TRUE condition. This exception is not maskable.
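The trap conditions are plain signed or unsigned comparisons. A minimal sketch for the register forms; the immediate forms compare against a sign-extended 16-bit immediate in the same way (for the unsigned immediate forms, the sign-extended immediate is then compared as unsigned). The enum names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum trap_op { TGE, TGEU, TLT, TLTU, TEQ, TNE };

/* True if the trap instruction's condition holds (exception taken). */
static bool trap_taken(enum trap_op op, int64_t rs, int64_t rt)
{
    switch (op) {
    case TGE:  return rs >= rt;
    case TGEU: return (uint64_t)rs >= (uint64_t)rt;
    case TLT:  return rs < rt;
    case TLTU: return (uint64_t)rs < (uint64_t)rt;
    case TEQ:  return rs == rt;
    case TNE:  return rs != rt;
    }
    return false;
}
```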

Processing

The common exception vector is used for this exception, and the Tr code in the Cause register is set.

The EPC register contains the address of the instruction causing the exception unless the instruction is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set.

Trap exception processing is shown in Figure 5.15 on page 13.

Servicing

The process executing at the time of a Trap exception is handed a floating-point exception/integer overflow signal. This error is usually fatal.

---

¹ See Appendix A for instruction descriptions.
System Call Exception

This section explains the System Call exception.

Cause
A System Call exception occurs during an attempt to execute the SYSCALL instruction. This exception is not maskable.

Processing
The common exception vector is used for this exception, and the Sys code in the Cause register is set.

The EPC register contains the address of the SYSCALL instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.

If the SYSCALL instruction is in a branch delay slot, the BD bit of the Cause register is set; otherwise this bit is cleared.

System Call exception processing is shown in Figure 5.15 on page 13.

Servicing
When this exception occurs, control is transferred to the applicable system routine.

To resume execution, the EPC register must be altered so that the SYSCALL instruction does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register + 4) before returning.

If a SYSCALL instruction is in a branch delay slot, a more complicated algorithm, beyond the scope of this description, may be required.
**Breakpoint Exception**

This section explains the Breakpoint exception.

**Cause**

A Breakpoint exception occurs when an attempt is made to execute the BREAK instruction. This exception is not maskable.

**Processing**

The common exception vector is used for this exception, and the BP code in the Cause register is set.

The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.

If the BREAK instruction is in a branch delay slot, the BD bit of the Cause register is set; otherwise the bit is cleared.

Breakpoint exception processing is shown in Figure 5.15 on page 13.

**Servicing**

When the Breakpoint exception occurs, control is transferred to the applicable system routine. Additional distinctions can be made by loading the instruction whose address the EPC register contains and analyzing the unused bits of the BREAK instruction (bits 25:6). If the instruction resides in a branch delay slot, 4 must be added to the contents of the EPC register (EPC register + 4) to locate it.
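Extracting the code field from the fetched instruction is a shift and mask. A minimal sketch, assuming the standard MIPS BREAK encoding (SPECIAL major opcode 0 in bits 31:26 and function code 0x0D in bits 5:0); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_SPECIAL  0x00u  /* major opcode, bits 31:26 */
#define FUNCT_BREAK 0x0Du  /* function code, bits 5:0 (assumed encoding) */

static bool is_break(uint32_t insn)
{
    return (insn >> 26) == OP_SPECIAL && (insn & 0x3Fu) == FUNCT_BREAK;
}

/* The 20 otherwise-unused bits (25:6) that software may use as a code. */
static uint32_t break_code(uint32_t insn)
{
    assert(is_break(insn));
    return (insn >> 6) & 0xFFFFFu;
}
```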

To resume execution, the EPC register must be altered so that the BREAK instruction does not re-execute; this is accomplished by adding a value of 4 to the EPC register (EPC register + 4) before returning.

If a BREAK instruction is in a branch delay slot, interpretation of the branch instruction is required to resume execution.
Reserved Instruction Exception

This section explains the Reserved Instruction exception.

Cause

The Reserved Instruction exception occurs when one of the following conditions occurs:

- an attempt is made to execute an instruction with an undefined major opcode (bits 31:26)
- an attempt is made to execute a SPECIAL instruction with an undefined minor opcode (bits 5:0)
- an attempt is made to execute a REGIMM instruction with an undefined minor opcode (bits 20:16)
- an attempt is made to execute a 64-bit operation in User or Supervisor mode while 32-bit virtual addressing is in effect. 64-bit operations are always valid in Kernel mode, regardless of the value of the KX bit in the Status register.

This exception is not maskable.

Reserved Instruction exception processing is shown in Figure 5.15 on page 13.

Processing

The common exception vector is used for this exception, and the RI code in the Cause register is set.

The EPC register contains the address of the reserved instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.

Servicing

No instructions in the MIPS ISA are currently interpreted (emulated) by system software. The process executing at the time of this exception is handed an illegal instruction/reserved operand fault signal. This error is usually fatal.
Coprocessor Unusable Exception

This section explains the Coprocessor Unusable exception.

Cause
The Coprocessor Unusable exception occurs when an attempt is made to execute a coprocessor instruction for either:
- a corresponding coprocessor unit that has not been marked usable, or
- CP0 instructions, when the unit has not been marked usable and the process executes in User mode.
This exception is not maskable.

Processing
The common exception vector is used for this exception, and the CpU code in the Cause register is set. The contents of the CE (Coprocessor Usage Error) field of the Cause register indicate which of the four coprocessors was referenced. The EPC register contains the address of the unusable coprocessor instruction unless it is in a branch delay slot, in which case the EPC register contains the address of the preceding branch instruction.
Coprocessor Unusable exception processing is shown in Figure 5.15 on page 13.

Servicing
The coprocessor unit to which an attempted reference was made is identified by the Coprocessor Usage Error field, which results in one of the following situations:
- If the process is entitled to access the coprocessor, the coprocessor is marked usable and the corresponding user state is restored to the coprocessor.
- If the process is entitled to access the coprocessor, but the coprocessor does not exist or has failed, the coprocessor instruction can be interpreted in software.
- If the BD bit is set in the Cause register, the branch instruction must be interpreted; then the coprocessor instruction can be emulated and execution resumed with the EPC register advanced past the coprocessor instruction.
- If the process is not entitled to access the coprocessor, the process executing at the time is handed an illegal instruction/privileged instruction fault signal. This error is usually fatal.
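The first servicing case can be sketched as field extraction plus one bit set. This is a hypothetical helper set, assuming the R4000-family layouts: ExcCode in Cause bits 6:2 with value 11 for CpU, CE in Cause bits 29:28, and CU3..CU0 in Status bits 31:28. Restoring the user's coprocessor state is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define EXC_CPU 11u  /* assumed ExcCode value for Coprocessor Unusable */

static unsigned exc_code(uint32_t cause) { return (cause >> 2) & 0x1Fu; }

/* CE field (Cause bits 29:28): which coprocessor (0-3) was referenced. */
static unsigned cause_ce(uint32_t cause) { return (cause >> 28) & 0x3u; }

/* Status.CUn (bit 28 + n): mark coprocessor n usable for the process. */
static uint32_t mark_cu_usable(uint32_t status, unsigned n)
{
    assert(n < 4);
    return status | (1u << (28 + n));
}
```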
Floating-Point Exception

This section explains the Floating-Point exception.

Cause

The Floating-Point exception is used by the floating-point coprocessor. This exception is not maskable.

Processing

The common exception vector is used for this exception, and the FPE code in the Cause register is set.

The contents of the Floating-Point Control/Status register indicate the cause of this exception.

Floating-Point exception processing is shown in Figure 5.15 on page 13.

Servicing

This exception is cleared by clearing the appropriate bit in the Floating-Point Control/Status register.

For an unimplemented instruction exception, the kernel should emulate the instruction; for other exceptions, the kernel should pass the exception to the user program that caused the exception.
**Interrupt Exception**

This section explains the Interrupt exception.

**Cause**

The Interrupt exception occurs when one of the eight interrupt conditions is asserted. The significance of these interrupts is dependent upon the specific system implementation.

Each of the eight interrupts can be masked by clearing the corresponding bit in the Int-Mask field of the Status register, and all eight interrupts can be masked at once by clearing the IE bit of the Status register.

**Processing**

The common exception vector is used for this exception, and the Int code in the Cause register is set.

The IP field of the Cause register indicates current interrupt requests. More than one bit may be set at the same time, and it is even possible for no bits to be set if the interrupt is asserted and then deasserted before this register is read.

Interrupt exception processing is shown in Figure 5.15 on page 13.

**Servicing**

If the interrupt is caused by one of the two software-generated interrupts (SW1 or SW0), the interrupt condition is cleared by setting the corresponding Cause register bit (IP1 or IP0) to 0.

If the interrupt is hardware-generated, the interrupt condition is cleared by correcting the condition causing the interrupt pin to be asserted.
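The masking and clearing rules above can be sketched as follows. Assumptions: Cause.IP and Status.IM both occupy bits 15:8, IE/EXL/ERL are Status bits 0/1/2 (R4000-family layout), and interrupts are taken only when IE = 1 and EXL = ERL = 0, consistent with the flowchart comments later in this chapter. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_IE  (1u << 0)
#define STATUS_EXL (1u << 1)
#define STATUS_ERL (1u << 2)
#define IP_IM_MASK 0xFF00u  /* Cause.IP and Status.IM, bits 15:8 */

/* Bit n of the result is set if interrupt n is pending and would be taken. */
static unsigned takeable_interrupts(uint32_t status, uint32_t cause)
{
    bool enabled = (status & STATUS_IE) &&
                   !(status & (STATUS_EXL | STATUS_ERL));
    if (!enabled)
        return 0;
    return (unsigned)((cause & status & IP_IM_MASK) >> 8);
}

/* Clear software interrupt n (0 = SW0, 1 = SW1) by writing its IP bit to 0. */
static uint32_t clear_soft_interrupt(uint32_t cause, unsigned n)
{
    assert(n < 2);
    return cause & ~(1u << (8 + n));
}
```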

**NOTE:** Because of the write buffer, a store to an external device is not necessarily performed until after later instructions in the pipeline finish. The programmer must therefore ensure that the store has actually occurred before the return-from-exception instruction (ERET) is executed; otherwise, the interrupt may be serviced again even though no interrupt should be pending.
Exception Handling and Servicing Flowcharts

The remainder of this chapter contains flowcharts for the exceptions described above, together with guidelines for their handlers. The flowcharts are listed in Table 5.12.

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 5.16, Figure 5.17</td>
<td>General exceptions and their exception handler</td>
</tr>
<tr>
<td>Figure 5.18, Figure 5.19</td>
<td>TLB/XTLB miss exception and its exception handler</td>
</tr>
<tr>
<td>Figure 5.20</td>
<td>Cache error exception and its handler</td>
</tr>
<tr>
<td>Figure 5.21</td>
<td>Reset, Soft Reset, and NMI exceptions, and a guideline to their handler</td>
</tr>
</tbody>
</table>

Table 5.12 List of Exception Flowcharts

In general, exceptions are handled by hardware (HW) and then serviced by software (SW).
CPU Exception Processing

Figure 5.16  General Exception Handler (HW)

Applies to exceptions other than Reset, Soft Reset, NMI, CacheErr, or first-level TLB miss. Note: Interrupts can be masked by IE or IMs.

- Set FP Control Status Register*
- EnHi ← VPN2, ASID*
- Context ← VPN2*
- Set Cause Register: ExcCode, CE
- Check EXL (SR1) to determine whether the exception occurred within another exception:
  - EXL = 1: skip to the BEV check (EPC and BD are not updated)
  - EXL = 0: Instruction in branch delay slot?
    - Yes: Cause 31 (BD) ← 1; Set BadVA; EPC ← (PC - 4)
    - No: Cause 31 (BD) ← 0; Set BadVA; EPC ← PC
    - Then EXL ← 1 (processor forced to Kernel mode and interrupts disabled)
- Check BEV (base is sign extended for 64 bits):
  - BEV = 0 (normal): PC ← 0xFFFF FFFF 8000 0000 + 0x180 (unmapped, cached)
  - BEV = 1 (bootstrap): PC ← 0xFFFF FFFF BFC0 0200 + 0x180 (unmapped, uncached)

Comments

*The FP Control Status Register is set only if the respective exception occurs. EnHi and X/Context are set only for TLB Invalid, Modified, and Refill exceptions.
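The hardware sequence in Figure 5.16 can be modelled in C. This is a hypothetical sketch, not a cycle-accurate model: `struct cpu` is illustrative, the ExcCode field position (Cause bits 6:2) and the BEV bit position (Status bit 22) are assumed R4000-family values, and the FP Control/Status, EntryHi, Context, and BadVAddr updates are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_EXL (1u << 1)
#define STATUS_BEV (1u << 22)
#define CAUSE_BD   (1u << 31)
#define CAUSE_EXCCODE_MASK 0x7Cu   /* ExcCode field, Cause bits 6:2 */

/* Hypothetical CPU model; pc holds the address of the excepting instruction. */
struct cpu {
    uint64_t pc;
    uint64_t epc;
    uint32_t status;
    uint32_t cause;
};

static void enter_general_exception(struct cpu *c, unsigned exc_code,
                                    bool in_delay_slot)
{
    assert(c != 0 && exc_code < 32);
    c->cause = (c->cause & ~CAUSE_EXCCODE_MASK) | (exc_code << 2);
    if (!(c->status & STATUS_EXL)) {
        /* Not nested: record the restart point and the BD flag. */
        if (in_delay_slot) {
            c->cause |= CAUSE_BD;
            c->epc = c->pc - 4;
        } else {
            c->cause &= ~CAUSE_BD;
            c->epc = c->pc;
        }
        c->status |= STATUS_EXL;   /* kernel mode, interrupts disabled */
    }
    /* Nested (EXL already 1): EPC and BD are left unchanged. */
    c->pc = ((c->status & STATUS_BEV) ? 0xFFFFFFFFBFC00200ULL
                                      : 0xFFFFFFFF80000000ULL) + 0x180;
}
```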
### Figure 5.17  General Exception Servicing Guidelines (SW)

- MFC0: X/Context, EPC, Status, Cause
- MTC0 (optional, only to enable interrupts while keeping Kernel mode): set Status bits KSU ← 00, EXL ← 0, IE ← 1
- Check the Cause register and jump to the appropriate service code
- Service code
- EXL = 1
- MTC0: EPC, Status
- ERET

**Comments**

- On entry:
  - Unmapped vector, so TLBMod, TLBInv, and TLB Refill exceptions are not possible
  - EXL = 1, so Interrupt exceptions are disabled
  - OS/System to avoid all other exceptions
  - Only CacheErr, Reset, Soft Reset, and NMI exceptions are possible.
- After EXL = 0, all exceptions are allowed (except interrupts if masked by IE or IM, and CacheErr if masked by DE).
- ERET:
  - ERET is not allowed in the branch delay slot of another jump instruction
  - Processor does not execute the instruction in ERET's branch delay slot
  - PC ← EPC; EXL ← 0
  - LLbit ← 0
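The ERET behavior listed here (and in Figure 5.20 for the ERL case) can be sketched as a small state update. The register model and bit positions (EXL = Status bit 1, ERL = bit 2) are assumptions for illustration; ERL takes priority, so an ERET after a Cache Error, Reset, Soft Reset, or NMI returns through ErrorEPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_EXL (1u << 1)
#define STATUS_ERL (1u << 2)

/* Hypothetical CPU model. */
struct cpu {
    uint64_t pc;
    uint64_t epc;
    uint64_t error_epc;
    uint32_t status;
    bool llbit;
};

static void eret(struct cpu *c)
{
    assert(c != 0);
    if (c->status & STATUS_ERL) {
        c->pc = c->error_epc;       /* PC <- ErrorEPC; ERL <- 0 */
        c->status &= ~STATUS_ERL;
    } else {
        c->pc = c->epc;             /* PC <- EPC; EXL <- 0 */
        c->status &= ~STATUS_EXL;
    }
    c->llbit = false;               /* LLbit <- 0: a pending SC will fail */
}
```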
Figure 5.18 TLB/XTLB Miss Exception Handler (HW)
Figure 5.19 TLB/XTLB Exception Servicing Guidelines (SW)

- Unmapped vector so TLBMod, TLBInv, TLB Refill or VCEP exceptions not possible
- EXL=1 so Interrupt exceptions disabled
- OS/System to avoid all other exceptions
- Only CacheErr, Reset, Soft Reset, NMI exceptions possible.

- Load the mapping for the virtual address using the Context register, move it to EntryLo0/EntryLo1, and write it into the TLB
- There could be another TLB miss while mapping the data or instruction address. The processor then jumps to the general exception vector, since EXL is 1. (Option: complete the first-level refill in the general exception handler, or ERET to the original instruction and take the exception again.)

- ERET is not allowed in the branch delay slot of another Jump Instruction
- Processor does not execute the instruction which is in the ERET's branch delay slot
  - PC ← EPC; EXL ← 0
  - LLbit ← 0
Figure 5.20  Cache Error Exception Handling (HW) and Servicing Guidelines (SW)

Note: Can be masked/disabled by DE (SR16) bit = 1

- Set CacheErr register
- Instruction in branch delay slot?
  - Yes: ErrorEPC ← (PC - 4)
  - No: ErrorEPC ← PC
- ERL ← 1
- Check BEV (base is sign extended for 64 bits):
  - BEV = 0 (normal): PC ← 0xFFFF FFFF A000 0000 + 0x100 (unmapped, uncached)
  - BEV = 1 (bootstrap): PC ← 0xFFFF FFFF BFC0 0200 + 0x100 (unmapped, uncached)

Comments

* Unmapped, uncached vector, so TLB-related and Cache Error exceptions are not possible
* ERL = 1, so Interrupt exceptions are disabled
* OS/System to avoid all other exceptions
* Only Reset, Soft Reset, and NMI exceptions are possible.

* ERET is not allowed in the branch delay slot of another jump instruction
* Processor does not execute the instruction in ERET’s branch delay slot
* PC ← ErrorEPC; ERL ← 0
* LLbit ← 0
**Figure 5.21  Reset, Soft Reset & NMI Exception Handling (HW) and Servicing Guidelines (SW)**

**Soft Reset or NMI Exception**
- Status:
  - BEV ← 1
  - SR ← 1
  - ERL ← 1

**Reset Exception**
- Random ← TLBENTRIES - 1
- Wired ← 0
- Config ← Update(31:6)|| Undef(5:0)
- Status:
  - BEV ← 1
  - SR ← 0
  - ERL ← 1

- ErrorEPC ← PC

- PC ← 0xFFFF FFFF BFC0 0000

---

**Reset, Soft Reset & NMI Servicing Guidelines (SW)**

- Check Status bit 20 (SR):
  - SR = 0: Reset Service Code
  - SR = 1: NMI?
    - Yes: NMI Service Code, then ERET (optional)
    - No: Soft Reset Service Code
- Note: There is no indication from the processor to differentiate between NMI and Soft Reset; there must be a system-level indication.
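The decision above can be sketched as follows. SR = Status bit 20 comes from the figure; `system_nmi_flag` is a hypothetical system-level indication (for example, a board register bit), since the processor itself cannot distinguish NMI from Soft Reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SR (1u << 20)

enum reset_kind { RESET_COLD, RESET_SOFT, RESET_NMI };

/* system_nmi_flag: hypothetical board-level indication that an NMI,
   not a Soft Reset, caused entry at the Reset vector. */
static enum reset_kind classify_reset(uint32_t status, bool system_nmi_flag)
{
    if (!(status & STATUS_SR))
        return RESET_COLD;      /* SR = 0: cold Reset */
    return system_nmi_flag ? RESET_NMI : RESET_SOFT;
}
```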

---
This chapter describes the R4600 and R4700 floating-point unit (FPU) features, including the programming model, instruction set and formats, and the pipeline.

The FPU, with associated system software, fully conforms to the requirements of ANSI/IEEE Standard 754–1985, IEEE Standard for Binary Floating-Point Arithmetic. In addition, the MIPS architecture fully supports the standard's recommendations and provides precise exceptions.

Overview

The FPU operates as a coprocessor for the CPU (it is assigned coprocessor label CP1), and extends the CPU instruction set to perform arithmetic operations on floating-point values.

The R4600/R4700 Floating-Point Coprocessor

The R4600/R4700 incorporates an entire floating-point coprocessor on chip, including a floating-point register file and execution units. The floating-point coprocessor forms a seamless interface with the integer unit, decoding and executing instructions in parallel with the integer unit. Compared with the R4600, the R4700 floating-point coprocessor performs floating-point multiply operations faster.

The R4600/R4700 uses the floating-point unit to perform integer multiply and divide, placing the results in the HI and LO registers; the values can then be transferred to the general-purpose register file using the MFHI and MFLO instructions. An integer multiply takes 2 fewer clock cycles on the R4700 than on the R4600, while an integer divide takes the same number of clock cycles on both. The R4700 also speeds up floating-point multiplies relative to the R4600, performing a single-precision multiply in 4 clock cycles and a double-precision multiply in 5 clock cycles.

Figure 6.1 illustrates the functional organization of the FPU.
FPU Features
This section briefly describes the operating model, the load/store instruction set, and the coprocessor interface in the FPU. A more detailed description is given in the sections that follow.

- **Full 64-bit Operation.** When the FR bit in the CPU Status register equals 0, the FPU is configured for sixteen 64-bit registers for double-precision values or thirty-two 32-bit registers for single-precision values. When the FR bit in the CPU Status register equals 1, the FPU is configured for thirty-two 64-bit registers. Each register can hold single- or double-precision values. The FPU also includes a 32-bit Control/Status register that provides access to all IEEE-Standard exception handling capabilities.

- **Load and Store Instruction Set.** Like the CPU, the FPU uses a load- and store-oriented instruction set, with single-cycle load and store operations. Overlap of multiply and add is supported.

- **Tightly Coupled Coprocessor Interface.** The FPU resides on-chip to form a tightly coupled unit with a seamless integration of floating-point and fixed-point instruction sets.

FPU Programming Model
This section describes the set of FPU registers and their data organization. The FPU registers include **Floating-Point General Purpose registers (FGRs)** and two control registers: **Control/Status** and **Implementation/Revision**.

Floating-Point General Registers (FGRs)
The FPU has a set of **Floating-Point General Purpose registers (FGRs)** that can be accessed in the following ways:

- As 32 general-purpose registers (32 FGRs), each of which is 32 bits wide when the FR bit in the CPU Status register equals 0, or as 32 general-purpose registers (32 FGRs), each of which is 64 bits wide when FR equals 1. The CPU accesses these registers through move, load, and store instructions.
- As 16 floating-point registers (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the CPU Status register equals 0. The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to adjacently numbered FGRs as shown in Figure 6.2 on page 6-3.
- As 32 floating-point registers (see the next section for a description of FPRs), each of which is 64 bits wide, when the FR bit in the CPU Status register equals 1. The FPRs hold values in either single- or double-precision floating-point format. Each FPR corresponds to an FGR as shown in Figure 6.2.
Floating-Point Registers

The FPU provides:
- 16 Floating-Point registers (FPRs) for Status.FR = 0, or
- 32 Floating-Point registers (FPRs) for Status.FR = 1.

These 64-bit registers hold floating-point values during floating-point operations and are physically formed from the Floating-Point General Purpose registers (FGRs). When the FR bit in the Status register equals 1, each FPR references a single 64-bit FGR. When FR equals 0, each FPR is formed from a pair of adjacent 32-bit FGRs.

The FPRs hold values in either single- or double-precision floating-point format. If the FR bit equals 0, only even register numbers (the lower-numbered register of each pair, as shown in Figure 6.2) can be used to address FPRs. When the FR bit is set to 1, all FPR register numbers are valid.

If the FR bit equals 0 during a double-precision floating-point operation, the FGRs are accessed in pairs. Thus, in a double-precision operation, selecting Floating-Point Register 0 (FPR0) actually addresses the adjacent Floating-Point General Purpose registers FGR0 and FGR1.
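The FR-dependent mapping from FPR numbers to FGRs described above can be sketched in C. The function name `fpr_to_fgrs` is hypothetical. The assignment of the low-order word to the even-numbered FGR is an assumption based on the usual MIPS arrangement; Figure 6.2 is authoritative.

```c
#include <assert.h>
#include <stdbool.h>

/* Maps a double-precision FPR number to the FGR(s) that hold its value.
   Assumption: with FR = 0 the even FGR holds the low-order word and the
   odd FGR the high-order word (the usual MIPS arrangement).
   Returns false if the FPR number is not legal for this FR setting. */
static bool fpr_to_fgrs(int fpr, int fr, int *low_fgr, int *high_fgr)
{
    assert(fr == 0 || fr == 1);
    if (fpr < 0 || fpr > 31)
        return false;
    if (fr == 1) {              /* 32 x 64-bit FGRs: one FGR per FPR */
        *low_fgr = *high_fgr = fpr;
        return true;
    }
    if (fpr % 2 != 0)           /* FR = 0: only even FPR numbers are valid */
        return false;
    *low_fgr = fpr;             /* e.g. FPR0 -> FGR0 (low) and FGR1 (high) */
    *high_fgr = fpr + 1;
    return true;
}
```

For example, with FR = 0 a request for FPR3 is rejected, while with FR = 1 FPR3 maps to FGR3 alone.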

Floating-Point Control Registers

The FPU has 32 control registers (FCRs) that can only be accessed by move operations. The FCRs are described below:
- The Implementation/Revision register (FCR0) holds revision information about the FPU.
- The Control/Status register (FCR31) controls and monitors exceptions, holds the result of compare operations, and establishes rounding modes.
- FCR1 to FCR30 are reserved.
Table 6.1 lists the assignments of the FCRs.

<table>
<thead>
<tr>
<th>FCR Number</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td>FCR0</td>
<td>Coprocessor implementation and revision register</td>
</tr>
<tr>
<td>FCR1 to FCR30</td>
<td>Reserved</td>
</tr>
<tr>
<td>FCR31</td>
<td>Rounding mode, cause, trap enables, and flags</td>
</tr>
</tbody>
</table>

Table 6.1 Floating-Point Control Register Assignments

Implementation and Revision Register, (FCR0)
The read-only Implementation and Revision register (FCR0) specifies the implementation and revision number of the FPU. Software can use this information to determine the coprocessor revision and performance level, and diagnostic software can also use it.

Figure 6.3 shows the layout of the register; Table 6.2, which follows the figure, describes the Implementation and Revision register (FCR0) fields.

Figure 6.3 Implementation/Revision Register (FCR0)

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Imp</td>
<td>Implementation number: 0x20 for the R4600, 0x21 for the R4700</td>
</tr>
<tr>
<td>Rev</td>
<td>Revision number in the form of y.x</td>
</tr>
<tr>
<td>0</td>
<td>Reserved.</td>
</tr>
</tbody>
</table>

Table 6.2 FCR0 Fields

The revision number is a value of the form y.x, where:

- y is a major revision number held in bits 7:4.
- x is a minor revision number held in bits 3:0.

The revision number distinguishes some chip revisions; however, there is no guarantee that changes to the chip are necessarily reflected by the revision number, or that changes to the revision number necessarily reflect real chip changes. For this reason revision number values are not listed, and software should not rely on the revision number to characterize the chip.
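The following is a minimal C sketch of decoding FCR0. It assumes the usual MIPS layout, with Imp in bits 15:8 and Rev in bits 7:0; Figure 6.3 is the authority on bit positions. The helper names are hypothetical. In keeping with the caution above, the part is named only from the Imp field, and the revision is only formatted for display.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: usual MIPS FCR0 layout, Imp in bits 15:8, Rev in bits 7:0
   (y in 7:4, x in 3:0), bits 31:16 reserved (0). */
struct fcr0_info {
    unsigned imp;       /* implementation number */
    unsigned rev_major; /* y */
    unsigned rev_minor; /* x */
};

static struct fcr0_info decode_fcr0(uint32_t fcr0)
{
    struct fcr0_info info;
    info.imp = (fcr0 >> 8) & 0xFFu;
    info.rev_major = (fcr0 >> 4) & 0xFu;
    info.rev_minor = fcr0 & 0xFu;
    return info;
}

/* Names the part from the Imp values listed in Table 6.2. */
static const char *fpu_part_name(uint32_t fcr0)
{
    switch (decode_fcr0(fcr0).imp) {
    case 0x20: return "R4600";
    case 0x21: return "R4700";
    default:   return "unknown";
    }
}

/* Formats the revision as "y.x" for logs; not for feature decisions. */
static void format_revision(uint32_t fcr0, char *buf, size_t len)
{
    struct fcr0_info info = decode_fcr0(fcr0);
    snprintf(buf, len, "%u.%u", info.rev_major, info.rev_minor);
}
```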

Control/Status Register (FCR31)
The Control/Status register (FCR31) contains control and status information that can be accessed by instructions in either Kernel or User mode. FCR31 also selects the arithmetic rounding mode, enables User mode traps, identifies any exceptions that occurred in the most recently executed instruction, and records any exceptions that occurred without being trapped.
Figure 6.4 on page 6-5 shows the format of the Control/Status register, and Table 6.3, which follows the figure, describes the Control/Status register fields. Figure 6.5 on page 6-5 shows the Control/Status register Cause, Flag, and Enable fields.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>FS</td>
<td>When set, denormalized results are flushed to 0 instead of causing an unimplemented operation exception.</td>
</tr>
<tr>
<td>C</td>
<td>Condition bit. See description of Control/Status register Condition bit.</td>
</tr>
<tr>
<td>Cause</td>
<td>Cause bits. See Figure 6.5 and the description of Control/Status register Cause, Flag, and Enable bits.</td>
</tr>
<tr>
<td>Enables</td>
<td>Enable bits. See Figure 6.5 and the description of Control/Status register Cause, Flag, and Enable bits.</td>
</tr>
<tr>
<td>Flags</td>
<td>Flag bits. See Figure 6.5 and the description of Control/Status register Cause, Flag, and Enable bits.</td>
</tr>
<tr>
<td>RM</td>
<td>Rounding mode bits. See Table 6.4 on page 7 and the description of Control/Status register Rounding Mode Control bits.</td>
</tr>
</tbody>
</table>

Table 6.3 Control/Status Register Fields
Accessing the Control/Status Register

When the Control/Status register is read by a Move Control From Coprocessor 1 (CFC1) instruction, all unfinished instructions in the pipeline are completed before the contents of the register are moved to the main processor. If a floating-point exception occurs as the pipeline empties, the FP exception is taken and the CFC1 instruction is re-executed after the exception is serviced.

The bits in the Control/Status register can be set or cleared by writing to the register using a Move Control To Coprocessor 1 (CTC1) instruction. CTC1 is not issued until all previous floating-point operations are complete.
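The following sketch shows a read-modify-write of FCR31 using CFC1 and CTC1. The inline-assembly forms mirror those in glibc's MIPS `_FPU_GETCW`/`_FPU_SETCW` macros. The off-target fallback and all function names are hypothetical, and are included only so the example builds and runs on any host.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__mips__)
/* On a MIPS target: CFC1/CTC1 on FPU control register 31. */
static inline uint32_t read_fcr31(void)
{
    uint32_t v;
    __asm__ volatile("cfc1 %0, $31" : "=r"(v));
    return v;
}

static inline void write_fcr31(uint32_t v)
{
    __asm__ volatile("ctc1 %0, $31" : : "r"(v));
}
#else
/* Hypothetical stand-in so the sketch can be built and tested off-target. */
static uint32_t simulated_fcr31;
static inline uint32_t read_fcr31(void) { return simulated_fcr31; }
static inline void write_fcr31(uint32_t v) { simulated_fcr31 = v; }
#endif

/* Replaces the FCR31 bits selected by mask with value and returns the old
   contents. The caller must not leave a Cause bit set together with its
   Enable bit, or the CTC1 itself raises a floating-point exception. */
static uint32_t update_fcr31(uint32_t mask, uint32_t value)
{
    assert((value & ~mask) == 0);
    uint32_t old = read_fcr31();        /* CFC1: pending FP ops complete first */
    write_fcr31((old & ~mask) | value); /* CTC1 */
    return old;
}
```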

IEEE Standard 754

IEEE Standard 754 specifies that floating-point operations detect certain exceptional cases, raise flags, and can invoke an exception handler when an exception occurs. These features are implemented in the MIPS architecture with the Cause, Enable, and Flag fields of the Control/Status register. The Flag bits implement IEEE 754 exception status flags, and the Cause and Enable bits implement exception handling.

Control/Status Register FS Bit

When the FS bit is set, denormalized results are flushed to 0 instead of causing an unimplemented operation exception.

Control/Status Register Condition Bit

When a floating-point compare operation takes place, the result is stored in bit 23, the Condition bit, which allows the state of the condition line to be saved and restored. The C bit is set to 1 if the condition is true and cleared to 0 if the condition is false. Bit 23 is affected only by compare and Move Control To FPU (CTC1) instructions.

Control/Status Register Cause, Flag, and Enable Fields

Figure 6.5 on page 6-5 illustrates the Cause, Flag, and Enable fields of the Control/Status register.

Cause Bits

Bits 17:12 in the Control/Status register contain Cause bits, as shown in Figure 6.5 on page 6-5, which reflect the results of the most recently executed instruction. The Cause bits are a logical extension of the CP0 Cause register; they identify the exceptions raised by the last floating-point operation and raise an interrupt or exception if the corresponding enable bit is set. If more than one exception occurs on a single instruction, each appropriate bit is set.

The Cause bits are written by each floating-point operation (but not by load, store, or move operations). The Unimplemented Operation (E) bit is set to 1 if software emulation is required; otherwise it remains 0. The other bits are set to 1 or 0 to indicate the occurrence or non-occurrence, respectively, of an IEEE 754 exception.

When a floating-point exception is taken, no results are stored, and the only state affected is the Cause bits. Exceptions caused by an immediately previous floating-point operation can be determined by reading the Cause field.
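Reading the Cause field after an operation can be sketched as follows. The bit order inside the field (E, V, Z, O, U, I, from bit 17 down to bit 12) is assumed from the usual MIPS FCR31 layout, because Figure 6.5 is not reproduced here. The table and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cause field position and bit order (bits 17:12 = E V Z O U I). */
#define FCR31_CAUSE_SHIFT 12
#define FCR31_CAUSE_MASK  (0x3Fu << FCR31_CAUSE_SHIFT)

/* Indexed by bit number within the Cause field, lowest bit first. */
static const char *const cause_names[6] = {
    "Inexact (I)", "Underflow (U)", "Overflow (O)",
    "Division by Zero (Z)", "Invalid Operation (V)",
    "Unimplemented Operation (E)"
};

/* Fills out[] with the exceptions reported by the last FP operation,
   lowest Cause bit first, and returns how many were found. */
static int fcr31_causes(uint32_t fcr31, const char *out[], int max)
{
    assert(max >= 6);
    uint32_t cause = (fcr31 & FCR31_CAUSE_MASK) >> FCR31_CAUSE_SHIFT;
    int n = 0;
    for (int bit = 0; bit < 6; bit++)
        if (cause & (1u << bit))
            out[n++] = cause_names[bit];
    return n;
}
```

Because more than one Cause bit can be set by a single instruction, the function returns a list rather than a single code.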

Enable Bits

A floating-point operation that sets an enabled Cause bit forces an immediate exception, as does setting both a Cause bit and its Enable bit with CTC1. The floating-point exception or interrupt is enabled when the corresponding Enable bit is set.

There is no Enable bit for Unimplemented Operation (E). Setting Unimplemented Operation always generates a floating-point exception.

Before returning from a floating-point exception, or before doing a CTC1, software must first clear the enabled Cause bits to prevent a repeat of the interrupt. Thus, User mode programs can never observe enabled Cause bits set; if this information is required in a User mode handler, it must be passed somewhere other than the Control/Status register.

For a floating-point operation that sets only unenabled Cause bits, no exception occurs and the default result defined by IEEE 754 is stored. In this case, the exceptions that were caused by the immediately previous floating-point operation can be determined by reading the Cause field.
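The Enable rules above reduce to simple mask arithmetic. This sketch assumes the usual MIPS FCR31 layout: Cause in bits 17:12 ordered E V Z O U I, and Enables in bits 11:7 ordered V Z O U I (Figure 6.5 is authoritative). It computes which Cause bits force an exception, and the value a handler might write back with CTC1 before returning. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FCR31 field positions (usual MIPS layout). */
#define CAUSE_SHIFT  12            /* Cause bits 17:12 = E V Z O U I */
#define ENABLE_SHIFT 7             /* Enable bits 11:7 = V Z O U I   */
#define CAUSE_E      (1u << 17)    /* Unimplemented Operation: no Enable */

/* Cause bits that force an exception: each Cause bit whose Enable bit
   is set, plus E, which always traps. */
static uint32_t trapping_causes(uint32_t fcr31)
{
    uint32_t enables = (fcr31 >> ENABLE_SHIFT) & 0x1Fu;
    return fcr31 & ((enables << CAUSE_SHIFT) | CAUSE_E);
}

/* FCR31 value a handler could write back with CTC1 before returning,
   so that the same exception is not raised again. */
static uint32_t clear_enabled_causes(uint32_t fcr31)
{
    uint32_t cleared = fcr31 & ~trapping_causes(fcr31);
    assert(trapping_causes(cleared) == 0);
    return cleared;
}
```

Unenabled Cause bits (such as an untrapped Inexact) are left in place, so they remain visible to a later CFC1.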

**Flag Bits**

When an exception case is detected and the exception Enable is not set, the corresponding flag bit is set. If an exception is taken, none of the flag bits are modified. Note however that system software may set the flag bits before invoking a user exception handler.

The Flag bits are cumulative and indicate that an exception was raised by an operation executed since they were explicitly reset. Flag bits are set to 1 if an IEEE 754 exception is raised; otherwise they remain unchanged. The Flag bits are never cleared as a side effect of floating-point operations; however, they can be set or cleared by writing a new value into the Control/Status register with a Move Control To Coprocessor 1 (CTC1) instruction.
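The following simplified software model (not a description of the hardware) shows how Cause, Enable, and Flag bits interact when one floating-point operation updates FCR31. It assumes the usual MIPS FCR31 layout: Cause in bits 17:12, Enables in bits 11:7, and Flags in bits 6:2, with the IEEE bits ordered V Z O U I from high to low. It ignores Unimplemented Operation, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed FCR31 field positions (usual MIPS layout). */
#define CAUSE_SHIFT  12   /* Cause bits 17:12 = E V Z O U I */
#define ENABLE_SHIFT 7    /* Enable bits 11:7 = V Z O U I   */
#define FLAG_SHIFT   2    /* Flag bits 6:2    = V Z O U I   */

/* Simplified model of one FP operation's effect on FCR31.
   raised: IEEE exceptions detected, a 5-bit mask V Z O U I (bit 4..bit 0).
   Returns true if an exception is taken. */
static bool model_fp_operation(uint32_t *fcr31, uint32_t raised)
{
    assert((raised & ~0x1Fu) == 0);
    uint32_t enables = (*fcr31 >> ENABLE_SHIFT) & 0x1Fu;

    /* Every FP operation rewrites the Cause field (E is not modeled). */
    *fcr31 = (*fcr31 & ~(0x3Fu << CAUSE_SHIFT)) | (raised << CAUSE_SHIFT);

    if (raised & enables)
        return true;                 /* exception taken: Flags untouched */

    *fcr31 |= raised << FLAG_SHIFT;  /* untrapped: sticky Flags accumulate */
    return false;
}
```

After an untrapped Inexact result, a following exception-free operation clears the Cause bits but leaves the Inexact Flag set, which is the cumulative behavior described above.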

**Control/Status Register Rounding Mode Control Bits**

Bits 1 and 0 in the Control/Status register constitute the Rounding Mode (RM) field.

As shown in Table 6.4, these bits specify the rounding mode that the FPU uses for all floating-point operations.

<table>
<thead>
<tr>
<th>Rounding Mode RM(1:0)</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>RN</td>
<td>Round result to nearest representable value: round to value with least-significant bit 0 when the two nearest representable values are equally near.</td>
</tr>
<tr>
<td>1</td>
<td>RZ</td>
<td>Round toward 0: round to value closest to and not greater in magnitude than the infinitely precise result.</td>
</tr>
<tr>
<td>2</td>
<td>RP</td>
<td>Round toward $+\infty$: round to value closest to and not less than the infinitely precise result.</td>
</tr>
<tr>
<td>3</td>
<td>RM</td>
<td>Round toward $-\infty$: round to value closest to and not greater than the infinitely precise result.</td>
</tr>
</tbody>
</table>
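Selecting a rounding mode amounts to replacing bits 1:0 of FCR31 with one of the encodings in Table 6.4. In this sketch the enum and function names are hypothetical. The new value would then be written back with CTC1.

```c
#include <assert.h>
#include <stdint.h>

/* RM field encodings from Table 6.4 (FCR31 bits 1:0). */
enum fpu_round { FPU_RN = 0, FPU_RZ = 1, FPU_RP = 2, FPU_RM = 3 };
#define FCR31_RM_MASK 0x3u

static enum fpu_round fcr31_rounding(uint32_t fcr31)
{
    return (enum fpu_round)(fcr31 & FCR31_RM_MASK);
}

/* Returns fcr31 with only the RM field replaced; all other bits
   (Cause, Enables, Flags, C, FS) are preserved. */
static uint32_t fcr31_with_rounding(uint32_t fcr31, enum fpu_round rm)
{
    assert((unsigned)rm <= 3u);
    return (fcr31 & ~FCR31_RM_MASK) | (uint32_t)rm;
}
```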

**Floating-Point Formats**

The FPU performs both 32-bit (single-precision) and 64-bit (double-precision) IEEE standard floating-point operations. The 32-bit single-precision format has a 24-bit signed-magnitude fraction field \((f+s)\) and an 8-bit exponent \((e)\), as shown in Figure 6.6.
The 64-bit double-precision format has a 53-bit signed-magnitude fraction field \((f+s)\) and an 11-bit exponent, as shown in Figure 6.7.

As shown in the above figures, numbers in floating-point format are composed of three fields:
- sign field, \(s\)
- biased exponent, \(e = E + \text{bias}\)
- fraction, \(f = .b_1 b_2 \ldots b_{p-1}\)

The range of the unbiased exponent \(E\) includes every integer between the two values \(E_{\text{min}}\) and \(E_{\text{max}}\) inclusive, together with two other reserved values:
- \(E_{\text{min}} - 1\) (to encode \(\pm 0\) and denormalized numbers)
- \(E_{\text{max}} + 1\) (to encode \(\pm\infty\) and NaNs [Not a Number])

For single- and double-precision formats, each representable nonzero numerical value has just one encoding.

For single- and double-precision formats, the value of a number, \(v\), is determined by the equations shown in Table 6.5.

<table>
<thead>
<tr>
<th>No.</th>
<th>Equation</th>
</tr>
</thead>
<tbody>
<tr>
<td>(1)</td>
<td>if \(E = E_{\text{max}} + 1\) and \(f \neq 0\), then \(v\) is NaN, regardless of \(s\)</td>
</tr>
<tr>
<td>(2)</td>
<td>if \(E = E_{\text{max}} + 1\) and \(f = 0\), then \(v = (-1)^s \infty\)</td>
</tr>
<tr>
<td>(3)</td>
<td>if \(E_{\text{min}} \leq E \leq E_{\text{max}}\), then \(v = (-1)^s 2^E (1.f)\)</td>
</tr>
<tr>
<td>(4)</td>
<td>if \(E = E_{\text{min}} - 1\) and \(f \neq 0\), then \(v = (-1)^s 2^{E_{\text{min}}} (0.f)\)</td>
</tr>
<tr>
<td>(5)</td>
<td>if \(E = E_{\text{min}} - 1\) and \(f = 0\), then \(v = (-1)^s 0\)</td>
</tr>
</tbody>
</table>

Table 6.5 Equations for Calculating Values in Single and Double-Precision Floating-Point Format
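The equations in Table 6.5 can be checked with a small decoder for single-precision bit patterns that uses the parameters from Table 6.6 (bias 127, \(E_{\text{min}} = -126\), \(E_{\text{max}} = +127\)). This is an illustrative sketch with hypothetical function names. It computes in host double precision, which represents every single-precision value exactly.

```c
#include <assert.h>
#include <math.h>     /* NAN, INFINITY */
#include <stdint.h>

/* x * 2^n by repeated doubling or halving; exact for the ranges used here. */
static double scale2(double x, int n)
{
    for (; n > 0; n--) x *= 2.0;
    for (; n < 0; n++) x *= 0.5;
    return x;
}

/* Value v of a single-precision bit pattern, following Table 6.5. */
static double decode_single(uint32_t bits)
{
    const int bias = 127, emin = -126, emax = 127;
    int s = (int)(bits >> 31);
    int E = (int)((bits >> 23) & 0xFFu) - bias;  /* unbiased exponent */
    uint32_t f = bits & 0x7FFFFFu;               /* 23 stored fraction bits */
    double frac = (double)f / 8388608.0;         /* 0.f (divide by 2^23) */
    double sign = s ? -1.0 : 1.0;

    if (E == emax + 1)                           /* equations (1) and (2) */
        return f != 0 ? NAN : sign * INFINITY;
    if (E >= emin) {                             /* equation (3): normalized */
        assert(E <= emax);
        return sign * scale2(1.0 + frac, E);
    }
    /* E == emin - 1: (4) denormalized, or (5) signed zero when f == 0 */
    return sign * scale2(frac, emin);
}
```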

For all floating-point formats, if \(v\) is NaN, the most-significant bit of \(f\) determines whether the value is a signaling or quiet NaN; \(v\) is a signaling NaN if the most-significant bit of \(f\) is set, otherwise, \(v\) is a quiet NaN.
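On this FPU, a NaN with the most-significant fraction bit set is signaling. That is the reverse of the encoding recommended by IEEE 754-2008 and used by most other architectures, so classification is best done on raw bit patterns rather than with host library helpers. The following is a minimal sketch with hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* NaN classification on raw bit patterns using this FPU's convention:
   fraction MSB set = signaling NaN, fraction MSB clear = quiet NaN. */
static bool single_is_nan(uint32_t b)
{
    return (b & 0x7F800000u) == 0x7F800000u && (b & 0x007FFFFFu) != 0;
}

static bool single_is_signaling_nan(uint32_t b)
{
    return single_is_nan(b) && (b & 0x00400000u) != 0;
}

static bool double_is_nan(uint64_t b)
{
    return (b & UINT64_C(0x7FF0000000000000)) == UINT64_C(0x7FF0000000000000) &&
           (b & UINT64_C(0x000FFFFFFFFFFFFF)) != 0;
}

static bool double_is_signaling_nan(uint64_t b)
{
    return double_is_nan(b) && (b & UINT64_C(0x0008000000000000)) != 0;
}
```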
Table 6.6 defines the values for the format parameters. Minimum and maximum floating-point values are given in Table 6.7.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Single</th>
<th>Double</th>
</tr>
</thead>
<tbody>
<tr>
<td>f</td>
<td>24</td>
<td>53</td>
</tr>
<tr>
<td>$E_{\text{max}}$</td>
<td>+127</td>
<td>+1023</td>
</tr>
<tr>
<td>$E_{\text{min}}$</td>
<td>-126</td>
<td>-1022</td>
</tr>
<tr>
<td>Exponent bias</td>
<td>+127</td>
<td>+1023</td>
</tr>
<tr>
<td>Exponent width in bits</td>
<td>8</td>
<td>11</td>
</tr>
<tr>
<td>Integer bit</td>
<td>hidden</td>
<td>hidden</td>
</tr>
<tr>
<td>Fraction width in bits</td>
<td>24</td>
<td>53</td>
</tr>
<tr>
<td>Format width in bits</td>
<td>32</td>
<td>64</td>
</tr>
</tbody>
</table>

Table 6.6 Floating-Point Format Parameter Values

<table>
<thead>
<tr>
<th>Type</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Float Minimum</td>
<td>1.40129846e-45</td>
</tr>
<tr>
<td>Float Minimum Norm</td>
<td>1.17549435e-38</td>
</tr>
<tr>
<td>Float Maximum</td>
<td>3.40282347e+38</td>
</tr>
<tr>
<td>Double Minimum</td>
<td>4.9406564584124654e-324</td>
</tr>
<tr>
<td>Double Minimum Norm</td>
<td>2.2250738585072014e-308</td>
</tr>
<tr>
<td>Double Maximum</td>
<td>1.7976931348623157e+308</td>
</tr>
</tbody>
</table>

Table 6.7 Minimum and Maximum Floating-Point Values

**Binary Fixed-Point Format**

Binary fixed-point values are held in 2’s complement format. Unsigned fixed-point values are not directly provided by the floating-point instruction set. Figure 6.8 illustrates binary fixed-point format; Table 6.8, which follows the figure, lists the binary fixed-point format fields.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>sign</td>
<td>sign bit</td>
</tr>
<tr>
<td>integer</td>
<td>integer value</td>
</tr>
</tbody>
</table>

Table 6.8 Binary Fixed-Point Format Fields
Floating-Point Instruction Set Overview

All FPU instructions are 32 bits long and aligned on a word boundary. They can be divided into the following groups:

- **Load, Store, and Move** instructions move data between memory, the main processor, and the FPU General Purpose registers.
- **Conversion** instructions perform conversion operations between the various data formats.
- **Computational** instructions perform arithmetic operations on floating-point values in the FPU registers.
- **Compare** instructions perform comparisons of the contents of registers and set the Condition bit based on the results.
- **Branch on FPU Condition** instructions perform a branch to the specified target if the specified coprocessor condition is met.

Table 6.9 through Table 6.12 list the instruction set of the FPU. A complete description of each instruction is provided in Appendix B.

In the instruction formats shown in Table 6.9 through Table 6.12, the `fmt` appended to the instruction opcode specifies the data format: `s` specifies single-precision binary floating-point, `d` specifies double-precision binary floating-point, and `w` specifies binary fixed-point.

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWC1</td>
<td>Load Word to FPU</td>
</tr>
<tr>
<td>SWC1</td>
<td>Store Word from FPU</td>
</tr>
<tr>
<td>LDC1</td>
<td>Load Doubleword to FPU</td>
</tr>
<tr>
<td>SDC1</td>
<td>Store Doubleword from FPU</td>
</tr>
<tr>
<td>MTC1</td>
<td>Move Word To FPU</td>
</tr>
<tr>
<td>MFC1</td>
<td>Move Word From FPU</td>
</tr>
<tr>
<td>CTC1</td>
<td>Move Control Word To FPU</td>
</tr>
<tr>
<td>CFC1</td>
<td>Move Control Word From FPU</td>
</tr>
<tr>
<td>DMTC1</td>
<td>Doubleword Move To FPU</td>
</tr>
<tr>
<td>DMFC1</td>
<td>Doubleword Move From FPU</td>
</tr>
</tbody>
</table>

*Table 6.9 FPU Instruction Summary: Load, Move and Store Instructions*

<table>
<thead>
<tr>
<th>OpCode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CVT.S.fmt</td>
<td>Floating-point Convert to Single FP</td>
</tr>
<tr>
<td>CVT.D.fmt</td>
<td>Floating-point Convert to Double FP</td>
</tr>
<tr>
<td>CVT.W.fmt</td>
<td>Floating-point Convert to Single Fixed Point</td>
</tr>
<tr>
<td>ROUND.W.fmt</td>
<td>Floating-point Round</td>
</tr>
<tr>
<td>TRUNC.W.fmt</td>
<td>Floating-point Truncate</td>
</tr>
<tr>
<td>CEIL.W.fmt</td>
<td>Floating-point Ceiling</td>
</tr>
<tr>
<td>FLOOR.W.fmt</td>
<td>Floating-point Floor</td>
</tr>
</tbody>
</table>

*Table 6.10 FPU Instruction Summary: Conversion Instructions*
Floating-Point Load, Store, and Move Instructions

This section discusses the manner in which the FPU uses the load, store and move instructions listed in Table 6.9 on page 10; Appendix B provides a detailed description of each instruction.

Transfers Between FPU and Memory

All data movement between the FPU and memory is accomplished by using one of the following instructions:

- Load Word To Coprocessor 1 (LWC1) or Store Word From Coprocessor 1 (SWC1) instructions, which reference a single 32-bit word of the FPU general registers.
- Load Doubleword To Coprocessor 1 (LDC1) or Store Doubleword From Coprocessor 1 (SDC1) instructions, which reference a 64-bit doubleword.

These load and store operations are unformatted; no format conversions are performed and therefore no floating-point exceptions can occur due to these operations.

With the LDC1 and SDC1 instructions, the R4600/R4700 floating-point unit can take advantage of the 64-bit wide data cache and issue a coprocessor load or store doubleword instruction every cycle.

Transfers Between FPU and CPU

Data can also be moved directly between the FPU and the CPU by using one of the following instructions:

- Move To Coprocessor 1 (MTC1)
- Move From Coprocessor 1 (MFC1)
- Doubleword Move To Coprocessor 1 (DMTC1)
- Doubleword Move From Coprocessor 1 (DMFC1)

Like the floating-point load and store operations, these operations perform no format conversions and never cause floating-point exceptions.
Load Delay and Hardware Interlocks
The instruction immediately following a load can use the contents of the loaded register. In that case the hardware interlocks, which costs additional real cycles. For this reason, scheduling load delay slots is desirable, although it is not required for functionally correct code.

Data Alignment
All coprocessor loads and stores reference the following aligned data items:
- For word loads and stores, the access type is always WORD, and the low-order 2 bits of the address must always be 0.
- For doubleword loads and stores, the access type is always DOUBLE-WORD, and the low-order 3 bits of the address must always be 0.
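These alignment rules translate directly into address-mask checks. The following minimal sketch uses hypothetical names. On MIPS processors, an access that fails these checks raises an Address Error exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* LWC1/SWC1: the low-order 2 address bits must be 0. */
static bool cp1_word_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* LDC1/SDC1: the low-order 3 address bits must be 0. */
static bool cp1_doubleword_aligned(uintptr_t addr)
{
    return (addr & 0x7u) == 0;
}
```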

Endianness
Regardless of byte-numbering order (endianness) of the data, the address specifies the byte that has the smallest byte address in the addressed field. For a big-endian system, it is the leftmost byte; for a little-endian system, it is the rightmost byte.
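The following sketch illustrates the rule. Given a 32-bit word stored at a word-aligned address A, it returns the byte found at address A + offset for either byte order. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte at address A + offset when word w is stored at word-aligned A.
   Offset 0 is the byte the address names: the leftmost (most significant)
   byte on a big-endian system, the rightmost (least significant) byte on
   a little-endian system. */
static uint8_t byte_at(uint32_t w, unsigned offset, bool big_endian)
{
    assert(offset < 4);
    unsigned shift = big_endian ? 8u * (3u - offset) : 8u * offset;
    return (uint8_t)(w >> shift);
}
```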

Floating-Point Conversion Instructions
Conversion instructions perform conversions between the various data formats such as single- or double-precision, fixed- or floating-point formats. Table 6.10 on page 10 lists conversion instructions; Appendix B gives a detailed description of each instruction.

Floating-Point Computational Instructions
Computational instructions perform arithmetic operations on floating-point values in registers. Table 6.11 on page 11 lists the computational instructions, and Appendix B provides a detailed description of each instruction. There are two categories of computational instructions:
- 3-Operand Register-Type instructions, which perform floating-point addition, subtraction, multiplication, division, and square root.
- 2-Operand Register-Type instructions, which perform floating-point absolute value, move, and negate.

Branch on FPU Condition Instructions
Table 6.12 on page 11 lists the Branch on FPU (coprocessor unit 1) condition instructions, which test the result of the FPU compare (C.fmt.cond) instructions. Appendix B gives a detailed description of each instruction.

Floating-Point Compare Operations
The floating-point compare (C.fmt.cond) instructions interpret the contents of two FPU registers (fs, ft) in the specified format (fmt) and arithmetically compare them. A result is determined based on the comparison and conditions (cond) specified in the instruction.
Table 6.12 on page 11 lists the compare instructions; Appendix B gives a detailed description of each instruction. Table 6.13 on page 13 lists the mnemonics for the compare instruction conditions.
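The 16 conditions in Table 6.13 can be modeled by combining three predicates (less, equal, unordered) with a signaling flag. The sketch below assumes the usual MIPS encoding of the 4-bit cond field, in which the left-hand column of Table 6.13 corresponds to values 0 (F) through 15 (NGT). Under that encoding, bit 2 selects less, bit 1 selects equal, bit 0 selects unordered, and bit 3 makes an unordered comparison raise Invalid Operation. Appendix B is authoritative. The function name is hypothetical, and signaling-NaN operands are not modeled separately.

```c
#include <assert.h>
#include <math.h>     /* isnan */
#include <stdbool.h>

/* Result of C.cond.fmt for a 4-bit cond value (0 = F ... 15 = NGT).
   *invalid reports whether a signaling predicate (cond bit 3) saw an
   unordered pair, which would raise Invalid Operation. */
static bool fp_compare(unsigned cond, double fs, double ft, bool *invalid)
{
    assert(cond < 16);
    bool unordered = isnan(fs) || isnan(ft);
    bool less = !unordered && fs < ft;
    bool equal = !unordered && fs == ft;
    *invalid = unordered && (cond & 8u) != 0;
    return ((cond & 4u) && less) ||
           ((cond & 2u) && equal) ||
           ((cond & 1u) && unordered);
}
```

The right-hand column of Table 6.13 lists the negations of the left-hand conditions. These are typically obtained by testing the opposite sense of the Condition bit, for example with BC1F instead of BC1T.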
FPU Instruction Pipeline Overview

The FPU provides an instruction pipeline that parallels the CPU instruction pipeline. It shares the same five-stage pipeline architecture with the CPU (see Chapter 3).

Instruction Execution

Figure 6.9 illustrates the 5-stage FPU pipeline. It is the same as the integer pipeline, but it accommodates the longer execution times of the floating-point instructions.

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Definition</th>
<th>Mnemonic</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>F</td>
<td>False</td>
<td>T</td>
<td>True</td>
</tr>
<tr>
<td>UN</td>
<td>Unordered</td>
<td>OR</td>
<td>Ordered</td>
</tr>
<tr>
<td>EQ</td>
<td>Equal</td>
<td>NEQ</td>
<td>Not Equal</td>
</tr>
<tr>
<td>UEQ</td>
<td>Unordered or Equal</td>
<td>OLG</td>
<td>Ordered Less Than or Greater Than</td>
</tr>
<tr>
<td>OLT</td>
<td>Ordered Less Than</td>
<td>UGE</td>
<td>Unordered or Greater Than or Equal</td>
</tr>
<tr>
<td>ULT</td>
<td>Unordered or Less Than</td>
<td>OGE</td>
<td>Ordered Greater Than or Equal</td>
</tr>
<tr>
<td>OLE</td>
<td>Ordered Less Than or Equal</td>
<td>UGT</td>
<td>Unordered or Greater Than</td>
</tr>
<tr>
<td>ULE</td>
<td>Unordered or Less Than or Equal</td>
<td>OGT</td>
<td>Ordered Greater Than</td>
</tr>
<tr>
<td>SF</td>
<td>Signaling False</td>
<td>ST</td>
<td>Signaling True</td>
</tr>
<tr>
<td>NGLE</td>
<td>Not Greater Than or Less Than or Equal</td>
<td>GLE</td>
<td>Greater Than or Less Than or Equal</td>
</tr>
<tr>
<td>SEQ</td>
<td>Signaling Equal</td>
<td>SNE</td>
<td>Signaling Not Equal</td>
</tr>
<tr>
<td>NGL</td>
<td>Not Greater Than or Less Than</td>
<td>GL</td>
<td>Greater Than or Less Than</td>
</tr>
<tr>
<td>LT</td>
<td>Less Than</td>
<td>NLT</td>
<td>Not Less Than</td>
</tr>
<tr>
<td>NGE</td>
<td>Not Greater Than or Equal</td>
<td>GE</td>
<td>Greater Than or Equal</td>
</tr>
<tr>
<td>LE</td>
<td>Less Than or Equal</td>
<td>NLE</td>
<td>Not Less Than or Equal</td>
</tr>
<tr>
<td>NGT</td>
<td>Not Greater Than</td>
<td>GT</td>
<td>Greater Than</td>
</tr>
</tbody>
</table>

Table 6.13 Mnemonics and Definitions of Compare Instruction Conditions

Figure 6.9 FPU Instruction Pipeline
Figure 6.9 on page 6-13 assumes that one instruction is completed every PCycle. Most FPU instructions, however, require more than one cycle in the EX stage, so the FPU must stall the pipeline whenever an instruction cannot proceed because of a register or resource conflict.

Floating-point operations proceed in parallel with non-floating-point operations. Floating-point operations are not allowed to overlap each other, with two exceptions:

- An add operation may start 2 cycles after the start of a multiply and thus will be completely overlapped by the multiply.
- A multiply operation may overlap for up to 2 cycles, as follows:
  - R4600: A new multiply may start 6 cycles after another multiply.
  - R4700: A new multiply may start 4 cycles after another multiply (for both single and double precision).

Integer and other non-floating-point operations can execute in parallel with floating-point operations. All of this is handled automatically by internal hardware in the R4600/R4700.

### Instruction Execution Cycle Time

Unlike CPU instructions, almost all of which execute in a single cycle, FPU instructions may require more than one cycle to execute.

Table 6.14 gives the minimum latency of each floating-point operation.

<table>
<thead>
<tr>
<th>Operation</th>
<th>Pipeline Cycles (Single)</th>
<th>Pipeline Cycles (Double)</th>
<th>Operation</th>
<th>Pipeline Cycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD.fmt</td>
<td>4</td>
<td>4</td>
<td>BC1T</td>
<td></td>
</tr>
<tr>
<td>SUB.fmt</td>
<td>4</td>
<td>4</td>
<td>BC1F</td>
<td></td>
</tr>
<tr>
<td>MUL.fmt (R4600)</td>
<td>8</td>
<td>8</td>
<td>BC1TL</td>
<td></td>
</tr>
<tr>
<td>MUL.fmt (R4700)</td>
<td>4</td>
<td>5</td>
<td></td>
<td></td>
</tr>
<tr>
<td>DIV.fmt</td>
<td>32</td>
<td>61</td>
<td>BC1FL</td>
<td></td>
</tr>
<tr>
<td>SQRT.fmt</td>
<td>31</td>
<td>60</td>
<td>LWC1, LDC1</td>
<td></td>
</tr>
<tr>
<td>ABS.fmt</td>
<td>1</td>
<td>1</td>
<td>SWC1, SDC1</td>
<td></td>
</tr>
<tr>
<td>MOV.fmt</td>
<td>1</td>
<td>1</td>
<td>TRUNC.W.fmt</td>
<td></td>
</tr>
<tr>
<td>NEG.fmt</td>
<td>1</td>
<td>1</td>
<td>MTC1, DMTC1</td>
<td></td>
</tr>
<tr>
<td>ROUND.W.fmt</td>
<td>4</td>
<td>4</td>
<td>MFC1, DMFC1</td>
<td></td>
</tr>
<tr>
<td>CEIL.W.fmt</td>
<td>4</td>
<td>4</td>
<td>CTC1</td>
<td></td>
</tr>
<tr>
<td>FLOOR.W.fmt</td>
<td>4</td>
<td>4</td>
<td>CFC1</td>
<td></td>
</tr>
<tr>
<td>CVT.S.fmt</td>
<td>(a)</td>
<td>4</td>
<td>CMP</td>
<td></td>
</tr>
<tr>
<td>CVT.D.fmt</td>
<td>2</td>
<td>(a)</td>
<td>FIX</td>
<td></td>
</tr>
<tr>
<td>CVT.W.fmt</td>
<td>4</td>
<td>4</td>
<td>FLOAT</td>
<td></td>
</tr>
<tr>
<td>C.fmt.cond</td>
<td>3</td>
<td>3</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

*Note:* (a) These operations are illegal.

Table 6.14 Floating-Point Operation Latencies
Instruction Scheduling Constraints
The FPU resource scheduler only issues instructions to the FPU op units (adder and multiplier) when no hardware use conflicts will occur. In addition, some overlap possibilities are disallowed to keep the scheduler simple (and/or increase performance).

FPU Multiplier Constraints
The FPU multiplier is partially pipelined in the R4600, allowing a new multiply to begin every 6 cycles. It is more fully pipelined in the R4700, allowing a new multiply to begin every 4 cycles.

FPU Adder Constraints
The FPU scheduler may issue an add operation (ADD.fmt or SUB.fmt) as early as 2 cycles after the start of a multiply (MUL.fmt).

Resource Scheduling Rules
The FPU Resource Scheduler issues instructions while adhering to the rules described below. These scheduling rules optimize op unit executions; if the rules are not followed, the hardware interlocks to guarantee correct operation.

DIV.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The adder is idle (division is performed in the adder).
• The multiplier is idle.

MUL.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The multiplier is one of the following:
  - idle.
  - started execution of the current multiply at least 6 cycles earlier (4 cycles on the R4700).
• The adder is idle.

SQRT.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The adder is idle.
• The multiplier is idle.

CVT.fmt instructions can only start when all of the following conditions are met in the 1A phase.
• The adder is idle.
• The multiplier is idle.

ADD.[S,D] or SUB.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The adder is idle.
• The multiplier is either:
  - idle.
  - started execution of the current multiply at least 2 cycles earlier.

NEG.[S,D] or ABS.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The adder is idle.
• The multiplier is idle.

C.COND.[S,D] can start only when all of the following conditions are met in the 1A phase.
• The adder is idle.
• The multiplier is idle.
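The resource scheduling rules above can be summarized as a predicate. This is a simplified software model with hypothetical names, not a description of the scheduler's implementation. It takes the R4700's 4-cycle multiply repeat rate from the multiplier constraints, while the MUL rule listed above states the R4600's 6 cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation classes matching the rules above. */
enum fpu_op { OP_ADD_SUB, OP_MUL, OP_DIV, OP_SQRT, OP_CVT, OP_NEG_ABS, OP_CMP };

struct fpu_units {
    bool adder_busy;
    bool mul_busy;
    int cycles_since_mul_start; /* used only while mul_busy is true */
    bool is_r4700;
};

/* Returns true if op may start in the 1A phase; otherwise the hardware
   interlocks until the condition holds. */
static bool fpu_can_start(enum fpu_op op, const struct fpu_units *u)
{
    assert(u != NULL);
    if (u->adder_busy)          /* every rule requires an idle adder */
        return false;
    switch (op) {
    case OP_ADD_SUB:
        return !u->mul_busy || u->cycles_since_mul_start >= 2;
    case OP_MUL:
        return !u->mul_busy ||
               u->cycles_since_mul_start >= (u->is_r4700 ? 4 : 6);
    default:                    /* DIV, SQRT, CVT, NEG/ABS, C.cond */
        return !u->mul_busy;
    }
}
```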
---
This chapter describes FPU floating-point exceptions, including FPU exception types, exception trap processing, exception flags, saving and restoring state when handling an exception, and trap handlers for IEEE Standard 754 exceptions.

A floating-point exception occurs whenever the FPU cannot handle either the operands or the results of a floating-point operation in its normal way. The FPU responds by generating an exception to initiate a software trap or by setting a status flag.

**Exception Types**

The FP Control/Status register described in Chapter 6 contains an Enable bit for each IEEE exception type. These Enable bits determine whether an exception causes the FPU to initiate a trap or to set a status flag.

- If a trap is taken, the FPU remains in the state found at the beginning of the operation and a software exception handling routine executes.
- If no trap is taken, an appropriate value is written into the FPU destination register and execution continues.

The FPU supports the five IEEE Standard 754 exceptions:
- Inexact (I)
- Underflow (U)
- Overflow (O)
- Division by Zero (Z)
- Invalid Operation (V)

Each of these five exceptions has a Cause bit, an Enable bit, and a Flag bit (status flag) in the Control/Status register.

The FPU adds a sixth exception type, Unimplemented Operation (E), which indicates that the operation must be performed by a software implementation. The Unimplemented Operation exception has no Enable or Flag bit; whenever this exception occurs, an unimplemented exception trap is taken.

Figure 7.1 illustrates the Control/Status register bits that support exceptions.

Figure 7.1 Control/Status Register Exception/Flag/Trap/Enable Bits
Each of the five IEEE Standard 754 exceptions (V, Z, O, U, I) is associated with a trap under user control, and is enabled by setting one of the five Enable bits. When an exception occurs and its corresponding Enable bit is not set, both the corresponding Cause and Flag bits are set. When an exception occurs and its corresponding Enable bit is set, the corresponding Cause bit is set and the subsequent exception processing allows a trap to be taken.

**Exception Trap Processing**

When a floating-point exception trap is taken, the processor Cause register indicates that the floating-point coprocessor caused the trap: its exception code field contains the Floating-Point Exception (FPE) code. The Cause bits of the floating-point Control/Status register then indicate the reason for the floating-point exception. These bits are, in effect, an extension of the system coprocessor Cause register.
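A handler might confirm that it is dealing with a floating-point trap by checking the exception code in the CP0 Cause register and then extracting the FCSR Cause field. The sketch below assumes the usual MIPS encodings: ExcCode in Cause bits 6:2, FPE = 15, and the FCSR Cause field in bits 17:12. On real hardware the two register values would be read with MFC0 and CFC1. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define CP0_EXCCODE(cause) (((cause) >> 2) & 0x1Fu)  /* assumed: bits 6:2   */
#define EXCCODE_FPE        15u                        /* assumed FPE code    */
#define FCSR_CAUSE(fcsr)   (((fcsr) >> 12) & 0x3Fu)   /* assumed: bits 17:12 */

/* Hypothetical helper: return the FCSR Cause field (bits 5..0 = E V Z O U I)
   for a floating-point exception trap, or 0 for any other exception. */
static uint32_t fp_trap_reason(uint32_t cp0_cause, uint32_t fcsr)
{
    if (CP0_EXCCODE(cp0_cause) != EXCCODE_FPE)
        return 0;                 /* not raised by the FPU */
    return FCSR_CAUSE(fcsr);      /* which FP exception(s) occurred */
}
```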

**Flags**

A Flag bit is provided for each IEEE exception. A Flag bit is set to 1 when its corresponding exception occurs and no exception trap is signaled.

A Flag bit is reset by writing a new value into the Control/Status register. Software can save and restore flags either individually or as a group.
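The group save-and-restore pattern for the sticky flags can be sketched with plain mask operations on an FCSR image. This assumes the Flag field occupies bits 6:2, as in the usual MIPS layout. On hardware, the image would be read with CFC1 and written back with CTC1. The function names are hypothetical.

```c
#include <stdint.h>

#define FCSR_FLAG_MASK (0x1Fu << 2)   /* assumed Flag field, bits 6:2 */

/* Hypothetical helper: extract the sticky Flag bits (V Z O U I). */
static uint32_t fcsr_save_flags(uint32_t fcsr)
{
    return fcsr & FCSR_FLAG_MASK;
}

/* Hypothetical helper: clear all Flag bits, leaving other fields intact. */
static uint32_t fcsr_clear_flags(uint32_t fcsr)
{
    return fcsr & ~FCSR_FLAG_MASK;
}

/* Hypothetical helper: replace the Flag field with previously saved flags. */
static uint32_t fcsr_restore_flags(uint32_t fcsr, uint32_t saved)
{
    return (fcsr & ~FCSR_FLAG_MASK) | (saved & FCSR_FLAG_MASK);
}
```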

When no exception trap is signaled, the floating-point coprocessor takes a default action, providing a substitute value for the exception-causing result of the floating-point operation. The particular default action taken depends upon the type of exception. Table 7.1 lists the default action taken by the FPU for each of the IEEE exceptions.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
<th>Rounding Mode</th>
<th>Default action</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>Inexact exception</td>
<td>Any</td>
<td>Supply a rounded result</td>
</tr>
<tr>
<td>U</td>
<td>Underflow exception</td>
<td>Any</td>
<td>Take an Unimplemented Operation exception unless the FCSR FS bit is set</td>
</tr>
<tr>
<td>O</td>
<td>Overflow exception</td>
<td>RN</td>
<td>Modify overflow values to $\infty$ with the sign of the intermediate result</td>
</tr>
<tr>
<td></td>
<td></td>
<td>RZ</td>
<td>Modify overflow values to the format’s largest finite number with the sign of the intermediate result</td>
</tr>
<tr>
<td></td>
<td></td>
<td>RP</td>
<td>Modify negative overflows to the format’s most negative finite number; modify positive overflows to $+\infty$</td>
</tr>
<tr>
<td></td>
<td></td>
<td>RM</td>
<td>Modify positive overflows to the format’s largest finite number; modify negative overflows to $-\infty$</td>
</tr>
<tr>
<td>Z</td>
<td>Division by zero</td>
<td>Any</td>
<td>Supply a properly signed $\infty$</td>
</tr>
<tr>
<td>V</td>
<td>Invalid operation</td>
<td>Any</td>
<td>Supply a quiet Not a Number (NaN)</td>
</tr>
</tbody>
</table>

**Table 7.1 Default FPU Exception Actions**
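The Overflow rows of Table 7.1 amount to a small decision table on the rounding mode and the sign of the intermediate result. The following sketch applies it to double format, using `DBL_MAX` as the format's largest finite number. The enumeration values assume the standard MIPS RM encoding (RN = 0, RZ = 1, RP = 2, RM = 3). The function name is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Assumed MIPS FCSR RM field encoding (bits 1:0). */
enum fp_rm { RM_RN = 0, RM_RZ = 1, RM_RP = 2, RM_RM = 3 };

/* Hypothetical helper: untrapped overflow result for double format,
   following the Overflow rows of Table 7.1. */
static double overflow_default(enum fp_rm rm, int negative)
{
    switch (rm) {
    case RM_RN: return negative ? -INFINITY : INFINITY;
    case RM_RZ: return negative ? -DBL_MAX  : DBL_MAX;
    case RM_RP: return negative ? -DBL_MAX  : INFINITY;
    case RM_RM: return negative ? -INFINITY : DBL_MAX;
    }
    return NAN;   /* not reached for a valid rounding mode */
}
```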

The FPU detects eight exception-causing conditions internally. When the FPU encounters one of these unusual situations, it signals either an IEEE exception or an Unimplemented Operation exception (E). Table 7.2 lists the exception-causing conditions of the IEEE Standard 754.

<table>
<thead>
<tr>
<th>FPU Internal Result</th>
<th>IEEE Standard 754</th>
<th>Trap Enable</th>
<th>Trap Disable</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inexact result</td>
<td>I</td>
<td>I</td>
<td>I</td>
<td>Loss of accuracy</td>
</tr>
<tr>
<td>Exponent overflow</td>
<td>O, I<sup>a</sup></td>
<td>O, I</td>
<td>O, I</td>
<td>Normalized exponent &gt; $E_{\text{max}}$</td>
</tr>
<tr>
<td>Division by zero</td>
<td>Z</td>
<td>Z</td>
<td>Z</td>
<td>Zero is (exponent = $E_{\text{min}} - 1$, mantissa = 0)</td>
</tr>
<tr>
<td>Overflow on convert</td>
<td>V</td>
<td>E</td>
<td>E</td>
<td>Source out of integer range</td>
</tr>
<tr>
<td>Signaling NaN source</td>
<td>V</td>
<td>V</td>
<td>V</td>
<td>Signaling NaN source produces quiet NaN result</td>
</tr>
<tr>
<td>Invalid operation</td>
<td>V</td>
<td>V</td>
<td>V</td>
<td></td>
</tr>
<tr>
<td>Exponent underflow</td>
<td>U</td>
<td>E</td>
<td>E</td>
<td>Normalized exponent &lt; $E_{\text{min}}$</td>
</tr>
<tr>
<td>Denormalized source</td>
<td>None</td>
<td>E</td>
<td>E</td>
<td>Exponent = $E_{\text{min}} - 1$ and mantissa $\neq$ 0</td>
</tr>
</tbody>
</table>

Note: <sup>a</sup> The IEEE Standard 754 specifies an inexact exception on overflow only if the overflow trap is disabled.

**Table 7.2 FPU Exception-Causing Conditions**
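The Notes column of Table 7.2 distinguishes zeros and denormalized numbers purely by their encoded fields. In IEEE double format, a biased exponent field of 0 corresponds to $E_{\text{min}} - 1$, so an operand can be classified as in the minimal C sketch below. The enumeration and function names are hypothetical. Single format would use an 8-bit exponent and a 23-bit fraction instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical classes matching the Notes column of Table 7.2. */
enum fp_class { FPC_ZERO, FPC_DENORMAL, FPC_NORMAL, FPC_INF_OR_NAN };

/* Hypothetical helper: classify an IEEE double by its encoded fields. */
static enum fp_class classify_double(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                     /* reinterpret bits */

    uint32_t exp  = (uint32_t)((bits >> 52) & 0x7FFu);  /* biased exponent */
    uint64_t mant = bits & ((UINT64_C(1) << 52) - 1);   /* fraction field  */

    if (exp == 0)          /* exponent field 0 encodes E_min - 1 */
        return mant == 0 ? FPC_ZERO : FPC_DENORMAL;
    if (exp == 0x7FFu)     /* all ones: infinity or NaN */
        return FPC_INF_OR_NAN;
    return FPC_NORMAL;
}
```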

**FPU Exceptions**

The following sections describe the conditions that cause the FPU to generate each of its exceptions and detail the FPU response to each exception-causing condition.

### Inexact Exception (I)

The FPU generates the Inexact exception if the rounded result of an operation is not exact or if it overflows. Before execution begins, the FPU usually examines the operands of a floating-point operation to determine, from their exponent values, whether the operation can possibly cause an exception. If an instruction could cause an exception trap, the FPU uses a coprocessor stall to execute the instruction.

It is impossible, however, for the FPU to determine in advance whether an instruction will produce an inexact result. Therefore, if Inexact exception traps are enabled, the FPU uses the coprocessor stall mechanism to execute all floating-point operations that require more than two cycles. Because this mode of execution can reduce performance, Inexact exception traps should be enabled only when necessary.

**Trap Enabled Results:** If Inexact exception traps are enabled, the result register is not modified and the source registers are preserved.

**Trap Disabled Results:** The rounded or overflowed result is delivered to the destination register if no other software trap occurs.
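Inexactness cannot be predicted from exponents alone, but for some operations it is easy to state exactly. As an illustration (not the FPU's detection logic), converting a 32-bit integer to single precision, as CVT.S.W does, is inexact exactly when the integer's significant bits do not fit in the 24-bit single-precision significand. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would converting w to single precision (24-bit
   significand) be inexact? Trailing zero bits only affect the exponent,
   so they are discarded before counting significant bits. */
static bool cvt_s_w_is_inexact(int32_t w)
{
    uint32_t m = (w < 0) ? (uint32_t)0 - (uint32_t)w : (uint32_t)w;  /* |w| */
    if (m == 0)
        return false;           /* zero converts exactly */
    while ((m & 1u) == 0)
        m >>= 1;                /* drop trailing zeros */
    return m >= (1u << 24);     /* more than 24 significant bits remain */
}
```

For example, 16777217 ($2^{24} + 1$) converts inexactly, while 16777216 and 16777215 convert exactly.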

### Invalid Operation Exception (V)

The Invalid Operation exception is signaled if one or both of the operands are invalid for an implemented operation. When the exception occurs without a trap, the MIPS ISA defines the result as a quiet Not a Number (NaN). The invalid operations are:

- Addition or subtraction: magnitude subtraction of infinities, such as $(+\infty) + (-\infty)$ or $(-\infty) - (-\infty)$
- Multiplication: $0 \times \infty$, with any signs
- Division: $0/0$ or $\infty/\infty$, with any signs
- Comparison using predicates involving < or > without ? (that is, predicates that do not include the unordered relation), when the operands are unordered
- Any arithmetic operation on a signaling NaN. A move (MOV) operation is not considered to be an arithmetic operation, but absolute value (ABS) and negate (NEG) are, and they cause this exception if their operand is a signaling NaN.
- Square root: $\sqrt{x}$, where $x < 0$

Software can simulate the Invalid Operation exception for other operations that are invalid for the given source operands. Examples of these operations include IEEE Standard 754-specified functions implemented in software, such as Remainder: $x \text{ REM } y$, where $y$ is 0 or $x$ is infinite; conversion of a floating-point number to a decimal format whose value causes an overflow, is infinity, or is NaN; and transcendental functions, such as $\ln(-5)$ or $\cos^{-1}(3)$. Refer to Appendix B for examples or for routines to handle these cases.

**Trap Enabled Results:** The original operand values are undisturbed.

**Trap Disabled Results:** The FPU sets the Invalid Operation Exception flag and a quiet NaN is delivered to the destination register.
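The invalid arithmetic cases listed above can be checked directly from the operand values. This sketch covers only the add, subtract, multiply, divide, and square-root cases for double operands. It deliberately ignores NaN operands, because the signaling-NaN encoding of older MIPS FPUs differs from the later IEEE 754 recommendation. The enumeration and function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical operation selector for this sketch. */
enum fp_op { OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_SQRT };

/* Hypothetical helper: true when the operation is invalid per the list
   above (NaN operands not considered). OP_SQRT uses only a. */
static bool is_invalid_operation(enum fp_op op, double a, double b)
{
    bool sign_a = signbit(a) != 0, sign_b = signbit(b) != 0;

    switch (op) {
    case OP_ADD:  /* (+inf) + (-inf): infinities of opposite sign */
        return isinf(a) && isinf(b) && sign_a != sign_b;
    case OP_SUB:  /* (-inf) - (-inf): infinities of the same sign */
        return isinf(a) && isinf(b) && sign_a == sign_b;
    case OP_MUL:  /* 0 * inf, any signs */
        return (a == 0.0 && isinf(b)) || (isinf(a) && b == 0.0);
    case OP_DIV:  /* 0/0 or inf/inf, any signs */
        return (a == 0.0 && b == 0.0) || (isinf(a) && isinf(b));
    case OP_SQRT: /* sqrt(x) with x < 0; -0 compares equal to 0, so it is valid */
        return a < 0.0;
    }
    return false;
}
```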

### Division-by-Zero Exception (Z)

The Division-by-Zero exception is signaled on an implemented divide operation if the divisor is zero and the dividend is a finite nonzero number. Software can simulate this exception for other operations that produce a signed infinity, such as $\ln(0)$, $\sec(\pi/2)$, $\csc(0)$, or $0^{-1}$.

**Trap Enabled Results:** The result register is not modified, and the source registers are preserved.

**Trap Disabled Results:** The result, when no trap occurs, is a correctly signed infinity.
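The signaling condition and the untrapped result for Division-by-Zero fit in a few lines. This is a sketch for double operands. The sign of the infinity is the exclusive OR of the operand signs, and the sign of the zero divisor counts, as in ordinary IEEE division. The function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: a divide signals Z when the divisor is zero and
   the dividend is a finite nonzero number. */
static bool signals_div_by_zero(double dividend, double divisor)
{
    return divisor == 0.0 && isfinite(dividend) && dividend != 0.0;
}

/* Hypothetical helper: untrapped result, an infinity whose sign is the
   XOR of the dividend and divisor signs. */
static double div_by_zero_default(double dividend, double divisor)
{
    bool negative = (signbit(dividend) != 0) != (signbit(divisor) != 0);
    return negative ? -INFINITY : INFINITY;
}
```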

### Overflow Exception (O)

The Overflow exception is signaled when the magnitude of the rounded floating-point result, with an unbounded exponent range, is larger than the largest finite number of the destination format. (This exception also sets the Inexact exception and Flag bits.)

**Trap Enabled Results:** The result register is not modified, and the source registers are preserved.

**Trap Disabled Results:** The result, when no trap occurs, is determined by the rounding mode and the sign of the intermediate result.

### Underflow Exception (U)

Two related events contribute to the Underflow exception:

- creation of a tiny nonzero result between $\pm 2^{E_{\text{min}}}$, which can cause some later exception because it is so tiny
- extraordinary loss of accuracy during the approximation of such tiny numbers by denormalized numbers.

IEEE Standard 754 allows a variety of ways to detect these events, but requires that they be detected the same way for all operations.

Tininess can be detected by one of the following methods:

- after rounding (when a nonzero result, computed as though the exponent range were unbounded, would lie strictly between $\pm 2^{E_{\text{min}}}$)
- before rounding (when a nonzero result, computed as though the exponent range and the precision were unbounded, would lie strictly between $\pm 2^{E_{\text{min}}}$).

The MIPS architecture requires that tininess be detected after rounding.
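The two tininess rules differ for results just below $2^{E_{\text{min}}}$. As a worked example for double format (53-bit significand, round to nearest), consider an exact result slightly smaller than $2^{E_{\text{min}}}$. The largest 53-bit value below $2^{E_{\text{min}}}$ is $\left(1 - 2^{-53}\right) 2^{E_{\text{min}}}$.

$$
\begin{aligned}
x &= \left(1 - 2^{-55}\right) 2^{E_{\text{min}}} < 2^{E_{\text{min}}} && \text{(tiny before rounding)} \\
\left|x - 2^{E_{\text{min}}}\right| &= 2^{-55} \cdot 2^{E_{\text{min}}} \\
\left|x - \left(1 - 2^{-53}\right) 2^{E_{\text{min}}}\right| &= 3 \cdot 2^{-55} \cdot 2^{E_{\text{min}}} \\
\operatorname{round}(x) &= 2^{E_{\text{min}}} && \text{(not tiny after rounding)}
\end{aligned}
$$

Under the before-rounding rule this result would be tiny. Under the after-rounding rule required by MIPS, it is not.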

Loss of accuracy can be detected by one of the following methods:

- denormalization loss (when the delivered result differs from what would have been computed if the exponent range were unbounded)
- inexact result (when the delivered result differs from what would have been computed if the exponent range and precision were both unbounded).

The MIPS architecture requires that loss of accuracy be detected as an inexact result.

**Trap Enabled Results:** When an underflow trap is enabled, underflow is signaled when tininess is detected, regardless of loss of accuracy. If underflow traps are enabled, the result register is not modified, and the source registers are preserved.

**Trap Disabled Results:** When an underflow trap is not enabled and the FCSR FS bit is clear, an Unimplemented Operation exception is taken. When an underflow trap is not enabled and FS is set, the FPU raises Inexact and returns either 0 or $\pm 2^{E_{\text{min}}}$, as appropriate for the current rounding mode.

### Unimplemented Instruction Exception (E)

Any attempt to execute an instruction with an operation code or format code that has been reserved for future definition sets the *Unimplemented* bit in the *Cause* field in the FPU Control/Status register and traps. The operand and destination registers remain undisturbed and the instruction is emulated in software. Any of the IEEE Standard 754 exceptions can arise from the emulated operation, and these exceptions in turn are simulated.

The Unimplemented Instruction exception can also be signaled when unusual operands or result conditions are detected that the implemented hardware cannot handle properly. These include:

- Denormalized operand
- Quiet NaN operand
- Underflow
- Reserved opcodes
- Unimplemented formats
- Conversion of a floating-point number to a fixed-point format when an overflow occurs or the source operand value is Infinity or a NaN
- Operations which are invalid for their format (for instance, CVT.S.S)

Denormalized and NaN operands are only trapped if the instruction is a convert or computational operation. Moves and compares do not trap if their operands are either denormalized or NaNs.

The use of this exception for such conditions is optional; most of these conditions are newly developed and are not expected to be widely used in early implementations. The architecture provides hooks so that these conditions can be implemented with software assistance, maintaining full compatibility with IEEE Standard 754.

**Trap Enabled Results:** The original operand values are undisturbed.

**Trap Disabled Results:** This trap cannot be disabled.

**Saving and Restoring State**

Sixteen or thirty-two doubleword coprocessor load or store operations (depending on the setting of the Status register FR bit) save or restore the floating-point register state in memory. The remaining control and status information can be saved or restored with the Move To/From Coprocessor Control Register instructions (CTC1 and CFC1), together with saving and restoring the processor registers through which those values pass. Normally, the Control/Status register is saved first and restored last.

When the coprocessor Control/Status register (**FCR31**) is read, and the coprocessor is executing one or more floating-point instructions, the instruction(s) in progress are either completed or reported as exceptions. The architecture requires that no more than one of these pending instructions can cause an exception. Information indicating the type of exception is placed in the Control/Status register. When state is restored, state information in the status word indicates that exceptions are pending.

Writing a zero value to the *Cause* field of Control/Status register clears all pending exceptions, permitting normal processing to restart after the floating-point register state is restored.

The *Cause* field of the Control/Status register holds the results of only one instruction; the FPU examines source operands before an operation is initiated to determine if this instruction can possibly cause an exception. If an exception is possible, the FPU executes the instruction in stall mode to ensure that no more than one instruction (that might cause an exception) is executed at a time.
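The ordering described above can be sketched against a simulated register file: the Control/Status register is saved first and restored last, and the Cause field is cleared when pending exceptions are to be discarded. The structure and function names are hypothetical. Real code would use CFC1/CTC1 for FCR31 and doubleword loads and stores for the data registers. The 32-entry array assumes the configuration with 32 doubleword registers.

```c
#include <stdint.h>

#define FCSR_CAUSE_MASK (0x3Fu << 12)   /* assumed Cause field, bits 17:12 */

/* Hypothetical model of FPU state: data registers plus FCR31. */
struct fp_state {
    uint32_t fcsr;
    uint64_t fpr[32];
};

/* Hypothetical helper: save the Control/Status register first. */
static void fp_save(const struct fp_state *hw, struct fp_state *ctx)
{
    ctx->fcsr = hw->fcsr;                  /* Control/Status saved first */
    for (int i = 0; i < 32; i++)
        ctx->fpr[i] = hw->fpr[i];          /* then the data registers */
}

/* Hypothetical helper: restore a context. If discard_pending is nonzero,
   zero is written to the Cause field so no exception remains pending. */
static void fp_restore(struct fp_state *hw, const struct fp_state *ctx,
                       int discard_pending)
{
    for (int i = 0; i < 32; i++)
        hw->fpr[i] = ctx->fpr[i];          /* data registers first */
    hw->fcsr = discard_pending ? (ctx->fcsr & ~FCSR_CAUSE_MASK)
                               : ctx->fcsr; /* Control/Status restored last */
}
```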

**Trap Handlers for IEEE Standard 754 Exceptions**

The IEEE Standard 754 strongly recommends that users be allowed to specify a trap handler for any of the five standard exceptions; the trap handler can either compute or specify a substitute result to be placed in the destination register of the operation.

By retrieving the instruction addressed by the processor Exception Program Counter (EPC) register, the trap handler determines:

- the exceptions that occurred during the operation
- the operation being performed
- the destination format
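A handler typically starts by fetching the instruction word and splitting out its fields. The sketch below assumes the standard MIPS COP1 arithmetic format: opcode 0x11, fmt 16 for single and 17 for double, and funct 0 for ADD. It also handles a faulting instruction in a branch delay slot. In that case the Cause register BD bit (bit 31) is set and EPC points at the branch, so the instruction is at EPC + 4. Conversions encode the destination format in the funct field rather than in fmt, and this sketch does not decode them. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of a COP1 arithmetic instruction. */
struct cop1_insn {
    unsigned opcode; /* bits 31:26, 0x11 for COP1           */
    unsigned fmt;    /* bits 25:21, 16 = S, 17 = D           */
    unsigned ft;     /* bits 20:16, second source register   */
    unsigned fs;     /* bits 15:11, first source register    */
    unsigned fd;     /* bits 10:6,  destination register     */
    unsigned funct;  /* bits 5:0,   operation (0 = ADD, ...) */
};

/* Hypothetical helper: split an instruction word into its fields. */
static struct cop1_insn decode_cop1(uint32_t word)
{
    struct cop1_insn d;
    d.opcode = (word >> 26) & 0x3Fu;
    d.fmt    = (word >> 21) & 0x1Fu;
    d.ft     = (word >> 16) & 0x1Fu;
    d.fs     = (word >> 11) & 0x1Fu;
    d.fd     = (word >> 6)  & 0x1Fu;
    d.funct  = word & 0x3Fu;
    return d;
}

/* Hypothetical helper: address of the faulting FP instruction. If Cause.BD
   (bit 31) is set, EPC points at the branch and the instruction follows it. */
static uint64_t faulting_pc(uint64_t epc, uint32_t cp0_cause)
{
    return (cp0_cause & 0x80000000u) ? epc + 4 : epc;
}
```

For example, the word for ADD.D $f0, $f2, $f4 decodes to fmt 17, fs 2, ft 4, fd 0, and funct 0.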

On Overflow or Underflow exceptions (except for conversions), and on Inexact exceptions, the trap handler gains access to the correctly rounded result by examining source registers and simulating the operation in software.

On Overflow or Underflow exceptions encountered on floating-point conversions, and on Invalid Operation and Divide-by-Zero exceptions, the trap handler gains access to the operand values by examining the source registers of the instruction.

The IEEE Standard 754 recommends that, if enabled, the overflow and underflow traps take precedence over a separate inexact trap. This prioritization is accomplished in software; hardware sets the bits for both the Inexact exception and the Overflow or Underflow exception.
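Because the hardware sets both bits, the handler must apply the IEEE Standard 754 precedence itself when it picks which user routine to run. The sketch below dispatches on the FCSR Cause and Enable fields, each shifted down to start at bit 0. Placing E, V, and Z ahead of O and U is an assumption made for illustration; the text only constrains Overflow and Underflow relative to Inexact. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical masks for the Cause field shifted right by 12
   (and the Enable field shifted right by 7, without C_E). */
enum { C_I = 1 << 0, C_U = 1 << 1, C_O = 1 << 2,
       C_Z = 1 << 3, C_V = 1 << 4, C_E = 1 << 5 };

/* Hypothetical set of user handlers. */
enum handler { H_NONE, H_UNIMPL, H_INVALID, H_DIVZERO,
               H_OVERFLOW, H_UNDERFLOW, H_INEXACT };

/* Hypothetical dispatcher: enabled Overflow/Underflow take precedence
   over Inexact even though hardware reports both. */
static enum handler select_handler(uint32_t cause, uint32_t enables)
{
    uint32_t pending = (cause & C_E) | (cause & enables & 0x1Fu);

    if (pending & C_E) return H_UNIMPL;    /* E always traps */
    if (pending & C_V) return H_INVALID;
    if (pending & C_Z) return H_DIVZERO;
    if (pending & C_O) return H_OVERFLOW;
    if (pending & C_U) return H_UNDERFLOW;
    if (pending & C_I) return H_INEXACT;
    return H_NONE;
}
```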

### Introduction

This chapter describes the signals used by and in conjunction with the R4600/R4700 processor. The signals include the System interface, the Clock/Control interface, the Interrupt interface, the Joint Test Action Group (JTAG) interface, and the Initialization interface.

Signals are listed in bold, and low-active signals have a trailing asterisk; for instance, the low-active Read Ready signal is **RdRdy\***. The signal description also indicates whether the signal is an input (the processor receives it) or an output (the processor sends it out).

Figure 8.1 illustrates the functional groupings of the processor signals.

![Figure 8.1 R4600/R4700 Processor Signals](image.png)

### System Interface Signals

System interface signals provide the connection between the R4600/R4700 processor and the other components in the system. Table 8.1 lists the system interface signals.

<table>
<thead>
<tr>
<th>Name</th>
<th>Definition</th>
<th>Direction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ExtRqst*</td>
<td>External request</td>
<td>Input</td>
<td>An external agent asserts ExtRqst* to request use of the System interface. The processor grants the request by asserting Release*.</td>
</tr>
<tr>
<td>Release*</td>
<td>Release interface</td>
<td>Output</td>
<td>In response to the assertion of ExtRqst* or a CPU read request, the processor asserts Release*, signaling to the requesting device that the System interface is available.</td>
</tr>
<tr>
<td>RdRdy*</td>
<td>Read ready</td>
<td>Input</td>
<td>The external agent asserts RdRdy* to indicate that it can accept a processor read request.</td>
</tr>
<tr>
<td>SysAD(63:0)</td>
<td>System address/data bus</td>
<td>Input/Output</td>
<td>A 64-bit address and data bus for communication between the processor and an external agent.</td>
</tr>
<tr>
<td>SysADC(7:0)</td>
<td>System address/data check bus</td>
<td>Input/Output</td>
<td>An 8-bit bus containing check bits for the SysAD bus.</td>
</tr>
<tr>
<td>SysCmd(8:0)</td>
<td>System command/data identifier</td>
<td>Input/Output</td>
<td>A 9-bit bus for command and data identifier transmission between the processor and an external agent.</td>
</tr>
<tr>
<td>SysCmdP</td>
<td>System command/data identifier bus parity</td>
<td>Input/Output</td>
<td>A single, even-parity bit for the SysCmd bus.</td>
</tr>
<tr>
<td>ValidIn*</td>
<td>Valid input</td>
<td>Input</td>
<td>The external agent asserts ValidIn* when it is driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus.</td>
</tr>
<tr>
<td>ValidOut*</td>
<td>Valid output</td>
<td>Output</td>
<td>The processor asserts ValidOut* when it is driving a valid address or data on the SysAD bus and a valid command or data identifier on the SysCmd bus.</td>
</tr>
<tr>
<td>WrRdy*</td>
<td>Write ready</td>
<td>Input</td>
<td>An external agent asserts WrRdy* when it can accept a processor write request.</td>
</tr>
</tbody>
</table>

*Table 8.1 System Interface Signals*

### Clock/Control Interface Signals

The Clock/Control interface signals make up the interface for clocking and maintenance. Table 8.2 lists the Clock/Control interface signals.

<table>
<thead>
<tr>
<th>Name</th>
<th>Definition</th>
<th>Direction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>IOOut</td>
<td>I/O output</td>
<td>Output</td>
<td>Reserved for future output. Always High.</td>
</tr>
<tr>
<td>IOIn</td>
<td>I/O input</td>
<td>Input</td>
<td>Reserved for future input. Should be driven High.</td>
</tr>
<tr>
<td>MasterClock</td>
<td>Master clock</td>
<td>Input</td>
<td>Master clock input that establishes the processor operating frequency. It is 1/2 the pipeline frequency.</td>
</tr>
<tr>
<td>MasterOut</td>
<td>Master clock out</td>
<td>Output</td>
<td>Master clock output aligned with MasterClock.</td>
</tr>
<tr>
<td>RClock(1:0)</td>
<td>Receive clocks</td>
<td>Output</td>
<td>Two identical receive clocks that establish the System interface frequency.</td>
</tr>
<tr>
<td>SyncOut</td>
<td>Synchronization</td>
<td>Output</td>
<td>SyncOut must be connected to SyncIn through an interconnect that models the interconnect between MasterOut, TClock, RClock, and the external agent.</td>
</tr>
<tr>
<td>SyncIn</td>
<td>Synchronization</td>
<td>Input</td>
<td>Synchronization clock input.</td>
</tr>
<tr>
<td>TClock(1:0)</td>
<td>Transmit clocks</td>
<td>Output</td>
<td>Two identical transmit clocks that establish the System interface frequency.</td>
</tr>
<tr>
<td>VccP</td>
<td>Quiet Vcc for PLL</td>
<td>Input</td>
<td>Quiet Vcc for the internal phase locked loop.</td>
</tr>
<tr>
<td>VssP</td>
<td>Quiet Vss for PLL</td>
<td>Input</td>
<td>Quiet Vss for the internal phase locked loop.</td>
</tr>
</tbody>
</table>

Table 8.2 Clock/Control Interface Signals

### Interrupt Interface Signals

The Interrupt interface signals make up the interface used by external agents to interrupt the R4600/R4700 processor. Six hardware interrupts (Int*[5:0]) and one NMI are available on the R4600/R4700. Table 8.3 lists the Interrupt interface signals.

<table>
<thead>
<tr>
<th>Name</th>
<th>Definition</th>
<th>Direction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Int*[5:0]</td>
<td>Interrupt</td>
<td>Input</td>
<td>Six general processor interrupts, bit-wise ORed with bits 5:0 of the interrupt register.</td>
</tr>
<tr>
<td>NMI*</td>
<td>Nonmaskable interrupt</td>
<td>Input</td>
<td>Nonmaskable interrupt, ORed with bit 6 of the interrupt register.</td>
</tr>
</tbody>
</table>

Table 8.3 Interrupt Interface Signals
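The OR described in Table 8.3 can be modeled as follows, treating a low level on an active-low pin as asserted. This is a behavioral sketch only. The bit packing (Int*[5:0] in bits 5:0 and NMI* in bit 6 of `pins_n`) and the function name are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical model: pins_n holds the electrical level of Int*[5:0] in
   bits 5:0 and NMI* in bit 6; int_reg is the internal interrupt register.
   A low level on an active-low pin means the interrupt is asserted. */
static uint32_t pending_interrupts(uint32_t pins_n, uint32_t int_reg)
{
    uint32_t asserted = ~pins_n & 0x7Fu;   /* invert active-low levels */
    return asserted | (int_reg & 0x7Fu);   /* bit-wise OR per Table 8.3 */
}
```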

### JTAG Interface Signals

The R4600/R4700 does not implement JTAG. The signals are provided for compatibility with existing R4x00PC designs. Table 8.4 lists the JTAG interface signals.

<table>
<thead>
<tr>
<th>Name</th>
<th>Definition</th>
<th>Direction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>JTDI</td>
<td>JTAG data in</td>
<td>Input</td>
<td>Connected directly to JTDO. No JTAG implemented. Should be pulled High.</td>
</tr>
<tr>
<td>JTCK</td>
<td>JTAG clock input</td>
<td>Input</td>
<td>Unused input. Should be pulled High.</td>
</tr>
<tr>
<td>JTDO</td>
<td>JTAG data out</td>
<td>Output</td>
<td>Connected directly to JTDI. If no external scan used, this is a no connect.</td>
</tr>
<tr>
<td>JTMS</td>
<td>JTAG command</td>
<td>Input</td>
<td>Unused input. Should be pulled High.</td>
</tr>
</tbody>
</table>

Table 8.4 JTAG Interface Signals

### Initialization Interface Signals

The Initialization interface signals make up the interface by which an external agent initializes the processor operating parameters. Table 8.5 lists the Initialization interface signals.

<table>
<thead>
<tr>
<th>Name</th>
<th>Definition</th>
<th>Direction</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ColdReset*</td>
<td>Cold reset</td>
<td>Input</td>
<td>This signal must be asserted for a power on reset or a cold reset. The clocks SClock, TClock, and RClock begin to cycle and are synchronized with the deasserted edge of ColdReset*. ColdReset* must be deasserted synchronously with MasterClock.</td>
</tr>
<tr>
<td>ModeClock</td>
<td>Boot mode clock</td>
<td>Output</td>
<td>Serial boot-mode data clock output; runs at the MasterClock frequency divided by 256 (MasterClock/256).</td>
</tr>
<tr>
<td>ModeIn</td>
<td>Boot mode data in</td>
<td>Input</td>
<td>Serial boot-mode data input.</td>
</tr>
<tr>
<td>Reset*</td>
<td>Reset</td>
<td>Input</td>
<td>This signal must be asserted for any reset sequence. It can be asserted synchronously or asynchronously for a cold reset, or synchronously to initiate a warm reset. Reset* must be deasserted synchronously with MasterClock.</td>
</tr>
<tr>
<td>VCCOk</td>
<td>Vcc is OK</td>
<td>Input</td>
<td>When asserted, this signal indicates to the processor that VCC &gt; VCC-min for more than 100 milliseconds and will remain stable. The assertion of VCCOk initiates the initialization sequence.</td>
</tr>
</tbody>
</table>

Table 8.5 Initialization Interface Signals

Table 8.6 lists the R4600/R4700 processor signals and their possible states.

<table>
<thead>
<tr>
<th>Description</th>
<th>Name</th>
<th>I/O</th>
<th>Asserted State</th>
<th>3-State</th>
<th>Reset State</th>
</tr>
</thead>
<tbody>
<tr>
<td>System address/data bus</td>
<td>SysAD[63:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System address/data check bus</td>
<td>SysADC[7:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System command/data identifier bus</td>
<td>SysCmd[8:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System command/data identifier bus parity</td>
<td>SysCmdP</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>Valid input</td>
<td>ValidIn*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Valid output</td>
<td>ValidOut*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>External request</td>
<td>ExtRqst*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Release interface</td>
<td>Release*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>Read ready</td>
<td>RdRdy*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Write ready</td>
<td>WrRdy*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Interrupts</td>
<td>Int*[5:0]</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Nonmaskable interrupt</td>
<td>NMI*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Boot mode data in</td>
<td>ModeIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Boot mode clock</td>
<td>ModeClock</td>
<td>O</td>
<td>High</td>
<td>No</td>
<td>d</td>
</tr>
<tr>
<td>JTAG data in</td>
<td>JTDI</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>JTAG data out</td>
<td>JTDO</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>JTAG command</td>
<td>JTMS</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>JTAG clock input</td>
<td>JTCK</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Transmit clocks</td>
<td>TClock[1:0]</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Receive clocks</td>
<td>RClock[1:0]</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Master clock</td>
<td>MasterClock</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Master clock out</td>
<td>MasterOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Synchronization clock out</td>
<td>SyncOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Synchronization clock in</td>
<td>SyncIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>I/O output</td>
<td>IOOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>I/O input</td>
<td>IOIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Vcc is OK</td>
<td>VCCOk</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Cold reset</td>
<td>ColdReset*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Reset</td>
<td>Reset*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Fault</td>
<td>Fault*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
</tbody>
</table>

**Key to Reset State Column:**
- a All I/O pins [SysAD[63:0], SysADC[7:0], etc.] remain 3-stated until the Reset* signal deasserts.
- b All output only pins (ValidOut*, Release*, etc.), except the clocks, are 3-stated until the ColdReset* signal deasserts.
- c All clocks, except ModeClock, are 3-stated until VCCOk asserts.
- d ModeClock is always driven.
- NA Not applicable to input pins.

### Introduction

This chapter describes the R4600/R4700 Initialization interface, including the reset signal descriptions and types, the initialization sequence with its signals and timing dependencies, and the boot modes, which are set at initialization time.

Signal names are listed in bold letters; for instance, the signal **VCCOk** indicates that the Vcc voltage is stable. Low-active signals are indicated by an asterisk at the end of the name, as in **ColdReset\***.

### Functional Overview

The R4600/R4700 processor has the following three types of resets. Refer to Figure 9.1 on page 9-4, Figure 9.2 on page 9-5, and Figure 9.3 on page 9-6 for timing diagrams of these resets.

- **Power-on reset**: Starts when the power supply is turned on and completely reinitializes the internal state machine of the processor without saving any state information.

- **Cold reset**: Restarts all clocks, but the power supply remains stable. A cold reset completely reinitializes the internal state machine of the processor without saving any state information.

- **Warm reset**: Restarts processor, but does not affect clocks. A warm reset preserves the processor internal state.

These resets use the **VCCOk**, **ColdReset\***, and **Reset\*** input signals, which are summarized in the next subsection. Each type of reset operation is described later in this chapter.

The Initialization interface is a serial interface that operates at the frequency of **MasterClock** divided by 256 (that is, **MasterClock**/256). This low-frequency operation allows the initialization information to be stored in a low-cost EPROM or PLD.
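As a worked example of the divide-by-256 relationship, a 50 MHz MasterClock gives a ModeClock of 195.3125 kHz. The sketch below computes this value and the time needed to shift a serial stream, assuming one boot-mode bit per ModeClock period. The stream length is a parameter because this section does not state it. The function names are hypothetical.

```c
/* Hypothetical helper: ModeClock frequency is MasterClock / 256. */
static double mode_clock_hz(double master_clock_hz)
{
    return master_clock_hz / 256.0;
}

/* Hypothetical helper: time to shift n_bits of boot-mode data, assuming
   one bit per ModeClock period (an assumption, not stated in the text). */
static double mode_stream_seconds(double master_clock_hz, unsigned n_bits)
{
    return (double)n_bits / mode_clock_hz(master_clock_hz);
}
```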

### Reset and Initialization Signal Descriptions

This section describes the three reset signals, **VCCOk**, **ColdReset\***, and **Reset\***, and the two initialization signals, **ModeIn** and **ModeClock**.

- **VCCOk**: When asserted<sup>1</sup>, **VCCOk** indicates to the processor that the 5.0 (3.3) volt power supply (Vcc) has been above 4.75 (3.0) volts for more than 100 milliseconds (ms) and is expected to remain stable. The assertion of **VCCOk** initiates the reading of the boot-time mode control serial stream. This is described in the subsection “Initialization Sequence” on page 9-4.

- **ColdReset\***: The **ColdReset\*** signal must be asserted (low) for either a power-on reset or a cold reset. The clocks **SClock**, **TClock**, and **RClock** begin to cycle and are synchronized with the deasserted (high) edge of **ColdReset\***. **ColdReset\*** must be deasserted synchronously with **MasterClock**.

- **Reset\***: The **Reset\*** signal must be asserted for any reset sequence. It can be asserted synchronously or asynchronously for a cold reset, or synchronously to initiate a warm reset. **Reset\*** must be deasserted synchronously with **MasterClock**.

- **ModeIn**: Serial boot mode data in.

- **ModeClock**: Serial boot mode clock output, at the **MasterClock** frequency divided by 256 (**MasterClock**/256).

---

<sup>1</sup> *Asserted* means the signal is true, or in its valid state. For example, the low-active **Reset\*** signal is said to be asserted when it is in a low (true) state; the high-active **VCCOk** signal is true when it is asserted high.

Table 9.1 lists the processor signals and their possible states.

<table>
<thead>
<tr>
<th>Description</th>
<th>Name</th>
<th>I/O</th>
<th>Asserted State</th>
<th>3-State</th>
<th>Reset State</th>
</tr>
</thead>
<tbody>
<tr>
<td>System address/data bus</td>
<td>SysAD[63:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System address/data check bus</td>
<td>SysADC[7:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System command/data identifier bus</td>
<td>SysCmd[8:0]</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>System command/data identifier bus parity</td>
<td>SysCmdP</td>
<td>I/O</td>
<td>High</td>
<td>Yes</td>
<td>a</td>
</tr>
<tr>
<td>Valid input</td>
<td>ValidIn*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Valid output</td>
<td>ValidOut*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>External request</td>
<td>ExtRqst*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Release interface</td>
<td>Release*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>Read ready</td>
<td>RdRdy*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Write ready</td>
<td>WrRdy*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Interrupts</td>
<td>Int*[5:0]</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Nonmaskable interrupt</td>
<td>NMI*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Boot mode data in</td>
<td>ModeIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Boot mode clock</td>
<td>ModeClock</td>
<td>O</td>
<td>High</td>
<td>No</td>
<td>d</td>
</tr>
<tr>
<td>JTAG data in</td>
<td>JTDI</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>JTAG data out</td>
<td>JTDO</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>JTAG command</td>
<td>JTMS</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>JTAG clock input</td>
<td>JTCK</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Transmit clocks</td>
<td>TClock[1:0]</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Receive clocks</td>
<td>RClock[1:0]</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Master clock</td>
<td>MasterClock</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Master clock out</td>
<td>MasterOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Synchronization clock out</td>
<td>SyncOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>c</td>
</tr>
<tr>
<td>Synchronization clock in</td>
<td>SyncIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>I/O output</td>
<td>IOOut</td>
<td>O</td>
<td>High</td>
<td>Yes</td>
<td>b</td>
</tr>
<tr>
<td>I/O input</td>
<td>IOIn</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Vcc is OK</td>
<td>VCCOk</td>
<td>I</td>
<td>High</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Cold reset</td>
<td>ColdReset*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Reset</td>
<td>Reset*</td>
<td>I</td>
<td>Low</td>
<td>No</td>
<td>NA</td>
</tr>
<tr>
<td>Fault</td>
<td>Fault*</td>
<td>O</td>
<td>Low</td>
<td>Yes</td>
<td>b</td>
</tr>
</tbody>
</table>

Key to Reset State Column:

a  All I/O pins (SysAD[63:0], SysADC[7:0], etc.) remain 3-stated until the Reset* signal deasserts.

b  All output only pins (ValidOut*, Release*, etc.), except the clocks, are 3-stated until the ColdReset* signal deasserts.

c  All clocks, except ModeClock, are 3-stated until VCCOk asserts.

d  ModeClock is always driven.

NA  Not applicable to input pins.
Power-on Reset

Figure 9.1, Figure 9.2, and Figure 9.3 illustrate the power-on, warm, and cold resets.

This is the sequence for a power-on reset:
1. Power-on reset applies a stable Vcc of at least 4.5 (3.0) volts from the 5.0 (3.3) volt power supply to the processor. During this time, VCCOk is deasserted. ColdReset* and Reset* are asserted and the MasterClock input oscillates.
2. After at least 100 ms of stable Vcc and MasterClock, the VCCOk signal is asserted to the processor. The assertion of VCCOk begins the initialization of the processor. After the mode bits have been read in, the processor allows its internal phase locked loops to lock, stabilizing the processor internal clock, PClock, the SyncOut-SyncIn clock path (described in Chapter 10), and the master clock output, MasterOut.
3. ColdReset* is asserted for at least 64K (or $2^{16}$) MasterClock cycles after the assertion of VCCOk. Once the processor reads the boot-time mode control serial data stream, ColdReset* can be deasserted. ColdReset* must be deasserted synchronously with MasterClock.
4. The deasserted edge of ColdReset* synchronizes the edges of SClock, TClock, and RClock (to all processors, if in a multiprocessor system).
5. After ColdReset* is deasserted synchronously and SClock, TClock, and RClock have stabilized, Reset* is deasserted to allow the processor to begin running. (Reset* must be held asserted for at least 64 MasterClock cycles after the deassertion of ColdReset*.) Reset* must be deasserted synchronously with MasterClock.

Note: ColdReset* must be asserted when VCCOk asserts. The behavior of the processor is undefined if VCCOk asserts while ColdReset* is deasserted.
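The minimum timing in the power-on sequence can be added up directly from the numbers above. The sketch below is illustrative only. It assumes an integer MasterClock period in nanoseconds, measures time from the moment Vcc and MasterClock are both stable, and ignores the additional internal VCCOk delay and any design margin. The names `power_on_timeline` and `min_power_on_timeline` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VCCOK_MIN_DELAY_NS   100000000ULL /* >= 100 ms of stable Vcc and MasterClock */
#define COLDRESET_MIN_CYCLES 65536ULL     /* >= 64K MasterClock cycles after VCCOk */
#define RESET_MIN_CYCLES     64ULL        /* >= 64 cycles after ColdReset* deasserts */

/* Earliest permitted edges, in ns from the start of stable Vcc/MasterClock. */
typedef struct {
    uint64_t vccok_assert_ns;       /* VCCOk may assert */
    uint64_t coldreset_deassert_ns; /* ColdReset* may deassert */
    uint64_t reset_deassert_ns;     /* Reset* may deassert */
} power_on_timeline;

power_on_timeline min_power_on_timeline(uint64_t masterclock_period_ns)
{
    assert(masterclock_period_ns > 0);
    power_on_timeline t;
    t.vccok_assert_ns = VCCOK_MIN_DELAY_NS;
    t.coldreset_deassert_ns =
        t.vccok_assert_ns + COLDRESET_MIN_CYCLES * masterclock_period_ns;
    t.reset_deassert_ns =
        t.coldreset_deassert_ns + RESET_MIN_CYCLES * masterclock_period_ns;
    return t;
}
```

For example, with a 20 ns (50 MHz) MasterClock, `min_power_on_timeline(20)` places the earliest ColdReset* deassertion at 101,310,720 ns and the earliest Reset* deassertion at 101,312,000 ns. The 100 ms VCCOk delay dominates the total.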

Cold Reset

A cold reset can begin at any time after the processor has read the initialization data stream; it causes the processor to start with the Reset exception.

A cold reset requires the same sequence as a power-on reset except that the power is presumed to be stable before the assertion of the reset inputs and the deassertion of VCCOk.

To begin the reset sequence, VCCOk must be deasserted for a minimum of 100 ms before reassertion.

Warm Reset

To execute a warm reset, the Reset* input is asserted synchronously with MasterClock. It is then held asserted for at least 64 MasterClock cycles before being deasserted synchronously with MasterClock. The processor internal clocks, PClock and SClock, and the System interface clocks, TClock and RClock, are not affected by a warm reset. The boot-time mode control serial data stream is not read by the processor on a warm reset. A warm reset forces the processor to start with a Soft Reset exception.
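A warm-reset generator only has to change Reset* on clock edges and hold it asserted for at least 64 cycles. The following is a hypothetical cycle-level model; the type and function names are illustrative and are not part of any IDT interface. Each call to `warm_reset_tick` represents one rising edge of the clock that drives the reset logic, which is assumed here to be synchronous with MasterClock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WARM_RESET_MIN_CYCLES 64u

/* Reset* is active low: reset_n == false means Reset* is asserted. */
typedef struct {
    unsigned remaining; /* cycles left to hold Reset* asserted */
    bool reset_n;       /* level currently driven onto Reset* */
} warm_reset_gen;

void warm_reset_init(warm_reset_gen *g)
{
    assert(g != NULL);
    g->remaining = 0;
    g->reset_n = true; /* deasserted */
}

/* Request a warm reset; a hold time below the minimum is rounded up. */
void warm_reset_request(warm_reset_gen *g, unsigned hold_cycles)
{
    assert(g != NULL);
    g->remaining = hold_cycles < WARM_RESET_MIN_CYCLES ? WARM_RESET_MIN_CYCLES
                                                       : hold_cycles;
}

/* Advance one clock edge. reset_n only changes here, so Reset* is both
 * asserted and deasserted synchronously with the clock. */
void warm_reset_tick(warm_reset_gen *g)
{
    assert(g != NULL);
    if (g->remaining > 0) {
        g->reset_n = false;
        g->remaining--;
    } else {
        g->reset_n = true;
    }
}
```

Rounding short requests up to 64 cycles keeps the generator from violating the minimum hold time even if the caller asks for less.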

The master clock output, MasterOut, can be used to clock the external logic that generates any reset-related signals that must be synchronous with MasterClock.

After a power-on reset, cold reset, or warm reset, all processor internal state machines are reset, and the processor begins execution at the reset vector. All processor internal states are preserved during a warm reset, although the precise state of the caches depends on whether or not a cache miss sequence has been interrupted by resetting the processor state machines.
Initialization Sequence
The boot-mode initialization sequence begins immediately after VCCOk is asserted. As the processor reads the serial stream of 256 bits through the ModeIn pin, the boot-mode bits initialize all fundamental processor modes. (The signals used are described in Chapter 8.)

This is the initialization sequence:
1. The system deasserts the VCCOk signal. The ModeClock output is held asserted.
2. The processor synchronizes the ModeClock output to the assertion of VCCOk. The first rising edge of ModeClock occurs at least 256 MasterClock cycles after VCCOk is asserted; internal delays on the VCCOk signal can add further cycles. After the first rising edge, each subsequent rising edge occurs 256 MasterClock cycles after the previous one.
3. Each bit of the initialization stream is presented at the ModeIn pin after each rising edge of the ModeClock. The processor samples 256 initialization bits from the ModeIn input.
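The ModeClock spacing above determines how long the boot-mode stream takes. The sketch below computes the earliest MasterClock cycle, counted from VCCOk assertion, of each ModeClock rising edge. It ignores the extra internal VCCOk delay, which can only make the edges later. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_STREAM_BITS         256u
#define MASTERCLOCKS_PER_MODECLK 256u

/* Earliest MasterClock cycle of the k-th ModeClock rising edge (k = 0 is
 * the first edge), counted from the assertion of VCCOk. */
uint32_t modeclock_rising_edge_cycle(uint32_t k)
{
    assert(k < BOOT_STREAM_BITS);
    return (k + 1u) * MASTERCLOCKS_PER_MODECLK;
}

/* Minimum number of MasterClock cycles needed to clock in all 256 bits. */
uint32_t boot_stream_cycles(void)
{
    return modeclock_rising_edge_cycle(BOOT_STREAM_BITS - 1u);
}
```

The result is 256 × 256 = 65,536 = 2^16 MasterClock cycles. This is the same 64K figure given for the minimum ColdReset* assertion time, which is consistent with ColdReset* being deasserted only after the boot-mode stream has been read.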

![Figure 9.1 Power-on Reset](image)

**Figure 9.1** Power-on Reset
Figure 9.2 Cold Reset
Unlike the R4000, the speed of the R4600/R4700 output drivers is statically controlled at boot time.

Table 9.2 lists the processor boot-mode settings. The following rules apply to the boot-mode settings listed in the table:

- Bit 0 of the stream is presented to the processor when VCCOk is first asserted.
- Selecting a reserved value results in undefined processor behavior.
- Bits 19 to 255 are reserved bits.
- Zeros must be scanned in for all reserved bits.
<table>
<thead>
<tr>
<th>Serial Bit</th>
<th>Value</th>
<th>Mode Setting</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td></td>
<td>Reserved (must be zero)</td>
</tr>
<tr>
<td>1:4</td>
<td></td>
<td><strong>XmitDatPat:</strong> System interface data rate for block writes only (bit 4 most significant)</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>DDDD</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>DDxDDx</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>DDxxDDxx</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>DxDxDxDx</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>DDxxxDDxxx</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>DDxxxxDDxxxx</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>DxxDxxDxxDxx</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>DDxxxxxxDDxxxxxx</td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>DxxxDxxxDxxxDxxx</td>
</tr>
<tr>
<td></td>
<td>9-15</td>
<td>Reserved</td>
</tr>
<tr>
<td>5:7</td>
<td></td>
<td><strong>SysCkRatio:</strong> PClock-to-SClock divisor; sets the frequency relationship between PClock and SClock, RClock, and TClock (bit 7 most significant)</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>Divide by 2</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>Divide by 3</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Divide by 4</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>Divide by 5</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>Divide by 6</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>Divide by 7</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>Divide by 8</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>Reserved</td>
</tr>
<tr>
<td>8</td>
<td></td>
<td><strong>EndBit:</strong> Specifies byte ordering</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>Little-endian ordering</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>Big-endian ordering</td>
</tr>
<tr>
<td>9:10</td>
<td></td>
<td><strong>Non-block Write:</strong> Selects the manner in which non-block writes are handled (bit 10 most significant)</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>R4x00 compatible</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Pipelined writes</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>Write re-issue</td>
</tr>
<tr>
<td>11</td>
<td></td>
<td><strong>TmrIntEn:</strong> Disables the timer interrupt on Int*[5]</td>
</tr>
<tr>
<td>12</td>
<td></td>
<td>Reserved (must be zero)</td>
</tr>
<tr>
<td>13:14</td>
<td></td>
<td><strong>Drv_Out:</strong> Output driver slew rate control (bit 14 most significant). Affects only outputs that are not clocks.</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>100% strength (fastest)</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>83% strength</td>
</tr>
<tr>
<td>15</td>
<td></td>
<td><strong>TClock[0]*:</strong> TClock[0] enable</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>Enabled</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>Disabled</td>
</tr>
<tr>
<td>17</td>
<td></td>
<td><strong>RClock[0]*:</strong> RClock[0] enable</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>Enabled</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>Disabled</td>
</tr>
<tr>
<td>19:255</td>
<td></td>
<td>Reserved (must be zero)</td>
</tr>
</tbody>
</table>

Table 9.2 Boot-Mode Settings
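The following sketch shows how an external agent might build the 256-bit serial stream from the fields in Table 9.2. It is a hypothetical helper, not an IDT-supplied routine. It assumes that within each multi-bit field the highest-numbered serial bit is the most significant, as the table states. It also assumes that bits 16 and 18, which this table does not describe, are left at zero. Reserved bits are cleared, and only the values listed in the table are accepted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOOT_STREAM_BITS 256

/* Write `width` bits of `value` starting at serial bit `lsb`; the
 * highest-numbered bit of the field is the most significant. */
static void put_field(uint8_t stream[BOOT_STREAM_BITS], unsigned lsb,
                      unsigned width, unsigned value)
{
    assert(lsb + width <= BOOT_STREAM_BITS);
    assert(width < 32u && value < (1u << width));
    for (unsigned i = 0; i < width; i++)
        stream[lsb + i] = (uint8_t)((value >> i) & 1u);
}

typedef struct {
    unsigned xmit_dat_pat;    /* bits 1:4, values 0..8 */
    unsigned sys_ck_ratio;    /* bits 5:7, values 0..6 */
    unsigned big_endian;      /* bit 8: 1 = big-endian */
    unsigned non_block_write; /* bits 9:10: 0, 2, or 3 */
    unsigned tmr_int_en;      /* bit 11, raw bit value */
    unsigned drv_out;         /* bits 13:14: 2 (binary 10) or 3 (binary 11) */
    unsigned tclock0_disable; /* bit 15: 0 = enabled */
    unsigned rclock0_disable; /* bit 17: 0 = enabled */
} boot_mode;

/* stream[i] holds serial bit i (0 or 1); bit 0 is shifted in first. */
void encode_boot_mode(const boot_mode *m, uint8_t stream[BOOT_STREAM_BITS])
{
    assert(m->xmit_dat_pat <= 8);
    assert(m->sys_ck_ratio <= 6);
    assert(m->non_block_write == 0 || m->non_block_write == 2 ||
           m->non_block_write == 3);
    assert(m->drv_out == 2 || m->drv_out == 3);
    memset(stream, 0, BOOT_STREAM_BITS); /* bits 0, 12, 16, 18, 19:255 stay zero */
    put_field(stream, 1, 4, m->xmit_dat_pat);
    put_field(stream, 5, 3, m->sys_ck_ratio);
    put_field(stream, 8, 1, m->big_endian);
    put_field(stream, 9, 2, m->non_block_write);
    put_field(stream, 11, 1, m->tmr_int_en);
    put_field(stream, 13, 2, m->drv_out);
    put_field(stream, 15, 1, m->tclock0_disable);
    put_field(stream, 17, 1, m->rclock0_disable);
}
```

Rejecting values that the table marks as reserved or does not list is deliberate, because selecting a reserved value results in undefined processor behavior.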
Introduction
This chapter describes the clock signals ("clocks") used in the R4600/R4700 processor and the processor status reporting mechanism.

The subject matter includes basic system clocks, system timing parameters, connecting clocks to a phase-locked system, connecting clocks to a system without phase locking, and processor status outputs.

Signal Terminology
The following terminology is used in this chapter (and book) when describing signals:
- **Rising edge** indicates a low-to-high transition.
- **Falling edge** indicates a high-to-low transition.
- **Clock-to-Q delay** is the amount of time it takes for a signal to move from the input of a device (clock) to the output of the device (Q).

Figure 10.1 and Figure 10.2 illustrate these terms.

Basic System Clocks
The various clock signals used in the R4600/R4700 processor are described below, starting with **MasterClock**, upon which the processor bases all internal and external clocking. Note: All output clocks will have approximately a 50% duty cycle ± the jitter and any difference in rise and/or fall times.

**MasterClock**
The processor bases all internal and external clocking on the single **MasterClock** input signal.
**MasterOut**

The processor generates the clock output signal, MasterOut, at the same frequency as MasterClock and aligns MasterOut with MasterClock, if SyncIn is properly connected to SyncOut. MasterOut clocks certain external logic, such as the reset logic.

**SyncIn/ SyncOut**

The processor generates SyncOut at the same frequency as MasterClock and aligns SyncIn with MasterClock. SyncOut must be connected to SyncIn either directly or through an external buffer. The processor can compensate for both output driver and input buffer delays (and, when necessary, the delay caused by an external buffer, according to the connections of TClock and RClock to the rest of the system) when aligning SyncIn with MasterClock. Figure 10.8 on page 10-9 gives an illustration of SyncOut connected to SyncIn through an external buffer.

**PClock**

The processor generates an internal clock, PClock, at twice the frequency of MasterClock and precisely aligns every other rising edge of PClock with the rising edge of MasterClock.

All internal registers and latches use PClock, which is the pipeline clock rate.

**SClock**

The R4600/R4700 processor divides PClock by 2, 3, 4, 5, 6, 7, or 8, as selected by the SysCkRatio boot-mode bits, to generate the internal clock signal, SClock. The processor uses SClock to sample data at the system interface and to clock data into the processor system interface output registers.

The first rising edge of SClock, after ColdReset* is deasserted, is aligned with the first rising edge of MasterClock.

**TClock**

TClock (transmit clock) clocks the output registers of an external agent, and can be a global system clock for any other logic in the external agent. TClock is identical to SClock. The edges of TClock align precisely with the edges of SClock and TClock can also be aligned with MasterClock, when SyncIn is properly connected to SyncOut.

**RClock**

The external agent uses RClock (receive clock) to clock its input registers. The processor generates RClock at the same frequency as SClock, although RClock leads TClock and SClock by 25 percent of SClock cycle time.
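The clock relationships above can be turned into concrete periods. The sketch below assumes that the SysCkRatio divisor equals the boot-mode value plus 2, which matches the entries in Table 9.2 (0 selects divide by 2, 6 selects divide by 8). Periods are in picoseconds, and integer division truncates if the MasterClock period is not evenly divisible. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned long pclock_ps;      /* PClock runs at twice MasterClock */
    unsigned long sclock_ps;      /* TClock and RClock share this period */
    unsigned long rclock_lead_ps; /* RClock leads SClock/TClock by 25% */
} clock_periods;

clock_periods derive_clocks(unsigned long masterclock_ps, unsigned sys_ck_ratio)
{
    assert(masterclock_ps > 0);
    assert(sys_ck_ratio <= 6); /* 7 is reserved */
    clock_periods c;
    c.pclock_ps = masterclock_ps / 2;
    c.sclock_ps = c.pclock_ps * (sys_ck_ratio + 2u); /* SClock = PClock / divisor */
    c.rclock_lead_ps = c.sclock_ps / 4;
    return c;
}
```

For a 50 MHz MasterClock (20,000 ps) and SysCkRatio 0, PClock is 10,000 ps, SClock is 20,000 ps (the same frequency as MasterClock), and RClock leads TClock by 5,000 ps.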
Figure 10.3 shows the clocks for a **PClock**-to-**SClock** division by 2.

System Timing Parameters

As shown in Figure 10.3, data provided to the processor must be stable a minimum of $t_{DS}$ nanoseconds (ns) before the rising edge of **SClock** and be held valid for a minimum of $t_{DH}$ ns after the rising edge of **SClock**.

Alignment to **SClock**

Processor data becomes stable a minimum of $t_{DM}$ ns and a maximum of $t_{DO}$ ns after the rising edge of **SClock**. This drive time is the sum of the maximum delay through the processor output drivers and the maximum clock-to-Q delay of the processor output registers.

Alignment to **MasterClock**

Certain processor inputs (specifically **VCCOk**, **ColdReset\***, and **Reset\***) are sampled relative to **MasterClock**, and certain processor outputs are driven relative to **MasterClock**. The same setup, hold, and drive parameters, $t_{DS}$, $t_{DH}$, $t_{DM}$, and $t_{DO}$, shown in Figure 10.3, apply to these inputs and outputs, but they are measured relative to **MasterClock** instead of **SClock**.

Phase-Locked Loop (PLL)

The processor aligns **SyncOut**, **PClock**, **SClock**, **TClock**, and **RClock** with internal phase-locked loop (PLL) circuits that generate aligned clocks based on **SyncOut**/**SyncIn**. By their nature, PLL circuits can generate aligned clocks only for **MasterClock** frequencies within a limited range.
Clocks generated by PLL circuits contain some inherent inaccuracy, or *jitter*: a clock aligned with **MasterClock** by the PLL can lead or trail **MasterClock** by as much as the related maximum jitter specified in the data sheet.

## PLL Components and Operation

The passive components required for the Phase Locked Loop circuit are contained in the packages for the R4600 and R4700. There are no required external passive components.

### Passive Components

The Phase Locked Loop circuit requires several passive components for proper operation, which are connected to **PLLCap0**, **PLLCap1**, **VccP**, and **VssP**, as illustrated in Figure 10.4.

![Figure 10.4 PLL Passive Components](image)

It is essential to isolate the analog power and ground for the PLL circuit (**VccP/VssP**) from the regular power and ground (**Vcc/Vss**). Initial evaluations have yielded good results with the following values:

\[
\begin{aligned}
R &= 5\ \Omega \\
C1 &= 1\ \text{nF} \\
C2 &= 82\ \text{nF} \\
C3 &= 10\ \mu\text{F} \\
Cp &= 470\ \text{pF}
\end{aligned}
\]

Since the optimum values for the filter components depend upon the application and the system noise environment, these values should be considered as starting points for further experimentation within your specific application.
Figure 10.5 shows the internal PLL and clock distribution network of the R4600/R4700.

Connecting Clocks to a Phase-Locked System

When the processor is used in a phase-locked system, the external agent must phase lock its operation to a common MasterClock. In such a system, the delivery of data and data sampling have common characteristics, even if the components have different delay values. For example, transmission time (the amount of time a signal takes to move from one component to another along a trace on the board) between any two components A and B of a phase-locked system can be calculated from the following equation:

\[
\text{Transmission Time} = (\text{SClock period}) - (t_{DO} \text{ for A}) - (t_{DS} \text{ for B}) - (\text{Clock Jitter for A Max}) - (\text{Clock Jitter for B Max})
\]
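The phase-locked transmission-time equation is a straight subtraction, so a timing budget can be checked mechanically. In the sketch below, all quantities are in nanoseconds and the function name is hypothetical. The numbers in the example are placeholders chosen for easy arithmetic, not data-sheet values.

```c
#include <assert.h>

/* Time left for the board trace between components A and B in a
 * phase-locked system. A negative result means the path cannot meet
 * timing at this SClock period. */
double pl_transmission_time_ns(double sclock_period, double tdo_a, double tds_b,
                               double jitter_a_max, double jitter_b_max)
{
    assert(sclock_period > 0.0);
    return sclock_period - tdo_a - tds_b - jitter_a_max - jitter_b_max;
}
```

With a 20 ns SClock period, $t_{DO}$ of 9 ns for A, $t_{DS}$ of 4.5 ns for B, and 0.5 ns maximum jitter on each side, the sketch leaves 5.5 ns for the trace.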
Figure 10.6 shows a block-level diagram of a phase-locked system using the R4600/R4700 processor.

Connecting Clocks to a System without Phase Locking

When the R4600/R4700 processor is used in a system in which the external agent cannot lock its phase to a common MasterClock, the output clocks RClock and TClock can clock the remainder of the system. Two clocking methodologies are described in this section: connecting to a gate-array device or connecting to discrete CMOS logic devices.

Connecting to a Gate-Array Device

When connecting to a gate-array device, both RClock and TClock are used within the gate-array. The gate array internally buffers RClock and uses this buffered version to clock registers that sample processor outputs.

These sampling registers should be immediately followed by staging registers clocked by an internally buffered version of TClock. This buffered version of TClock should be the global system clock for the logic inside the gate array and the clock for all registers that drive processor inputs. Figure 10.7 on page 10-7 is a block diagram of this circuit.

Staging registers place a constraint on the sum of the clock-to-Q delay of the sample registers and the setup time of the synchronizing registers inside the gate arrays, as shown in the following equation:

\[
\text{Clock-to-Q Delay} + \text{Setup of Synch Register} \leq 0.25 \times (\text{RClock period}) - (\text{Max Clock Jitter for RClock}) - (\text{Max Delay Mismatch for Clock Buffers on RClock and TClock})
\]
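The 0.25 factor comes from RClock leading TClock by 25 percent of the cycle. A value captured on an RClock edge has only that quarter cycle to reach the staging register, which is clocked on the following TClock edge. The following is a minimal check of the constraint, with all values in nanoseconds. The function and parameter names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the sample-to-staging register path inside the gate array meets
 * the quarter-cycle constraint above. */
bool staging_path_ok(double clk_to_q_sample, double setup_sync,
                     double rclock_period, double rclock_jitter_max,
                     double buffer_mismatch_max)
{
    assert(rclock_period > 0.0);
    double budget = 0.25 * rclock_period - rclock_jitter_max - buffer_mismatch_max;
    return clk_to_q_sample + setup_sync <= budget;
}
```

For example, with a 40 ns RClock period and 1 ns each of jitter and buffer mismatch, the budget is 8 ns. A 3 ns clock-to-Q delay plus a 4 ns setup time fits, but 5 ns plus 4 ns does not.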
Figure 10.7 is a block diagram of a system without phase lock, using the R4600/R4700 processor with an external agent implemented as a gate array.

In a system without phase lock, the transmission time for a signal from the processor to an external agent composed of gate arrays can be calculated from the following equation:

\[
\text{Transmission Time} = 0.75 \times (\text{TClock period}) - (t_{DO} \text{ for R4600/R4700}) + (\text{Min External Clock Buffer Delay}) - (\text{External Sample Register Setup Time}) - (\text{Max Clock Jitter for R4600/R4700 Internal Clocks}) - (\text{Max Clock Jitter for RClock})
\]

The transmission time for a signal from an external agent composed of gate arrays to the processor in a system without phase lock can be calculated from the following equation:

\[
\text{Transmission Time} = (\text{TClock period}) - (t_{DS} \text{ for R4600/R4700}) - (\text{Max External Clock Buffer Delay}) - (\text{Max External Output Register Clock-to-Q Delay}) - (\text{Max Clock Jitter for TClock}) - (\text{Max Clock Jitter for R4600/R4700 Internal Clocks})
\]
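Both gate-array equations can be evaluated the same way. In the processor-to-agent direction, the sampling edge is a buffered RClock edge that falls 75 percent of a cycle after the launching SClock edge. The minimum buffer delay is added because it pushes that sampling edge later. In the reverse direction, the launching edge is a buffered TClock edge, so the maximum buffer delay is subtracted. All values are in nanoseconds, the function names are hypothetical, and the test values are placeholders.

```c
#include <assert.h>

/* Processor -> gate-array agent, system without phase lock. */
double ga_tx_from_cpu_ns(double tclock_period, double tdo_cpu,
                         double ext_buf_delay_min, double ext_sample_setup,
                         double cpu_jitter_max, double rclock_jitter_max)
{
    assert(tclock_period > 0.0);
    return 0.75 * tclock_period - tdo_cpu + ext_buf_delay_min
           - ext_sample_setup - cpu_jitter_max - rclock_jitter_max;
}

/* Gate-array agent -> processor, system without phase lock. */
double ga_tx_to_cpu_ns(double tclock_period, double tds_cpu,
                       double ext_buf_delay_max, double ext_clk_to_q_max,
                       double tclock_jitter_max, double cpu_jitter_max)
{
    assert(tclock_period > 0.0);
    return tclock_period - tds_cpu - ext_buf_delay_max - ext_clk_to_q_max
           - tclock_jitter_max - cpu_jitter_max;
}
```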

**Connecting to a CMOS Logic System**

The processor uses matched delay clock buffers to generate aligned clocks to external CMOS logic. A matched delay clock buffer is inserted in the **SyncOut/SyncIn** alignment path of the processor, skewing **SyncOut, MasterOut, RClock**, and **TClock** to lead **MasterClock** by the buffer delay amount, while leaving **PClock** aligned with **MasterClock**.

The remaining matched delay clock buffers are available to generate a buffered version of **TClock** aligned with **MasterClock**. Alignment error of this buffered **TClock** is the sum of the maximum delay mismatch of the matched delay clock buffers, and the maximum clock jitter of **TClock**.

As the global system clock for the discrete logic that forms the external agent, the buffered version of **TClock** clocks registers that sample processor outputs, as well as clocking the registers that drive the processor inputs.

The transmission time for a signal from the processor to an external agent composed of discrete CMOS logic devices can be calculated from the following equation:

\[
\text{Transmission Time} = (\text{TClock period}) - (t_{DO} \text{ for R4600/R4700}) - (\text{External Sample Register Setup Time}) - (\text{Max External Clock Buffer Delay Mismatch}) - (\text{Max Clock Jitter for R4600/R4700 Internal Clocks}) - (\text{Max Clock Jitter for TClock})
\]
Figure 10.8 is a block diagram of a system without phase lock, employing the R4600/R4700 processor and an external agent composed of both a gate array and discrete CMOS logic devices.

![Block Diagram of System Without Phase Lock](image)

The transmission time for a signal from an external agent composed of discrete CMOS logic devices to the processor can be calculated from the following equation:

\[
\text{Transmission Time} = (\text{TClock period}) - (t_{DS} \text{ for R4600/R4700}) - (\text{Max External Output Register Clock-to-Q Delay}) - (\text{Max External Clock Buffer Delay Mismatch}) - (\text{Max Clock Jitter for R4600/R4700 Internal Clocks}) - (\text{Max Clock Jitter for TClock})
\]

In this clocking methodology, the hold time of data driven from the processor to an external sampling register is a critical parameter. To guarantee hold time, the minimum output delay of the processor, \( t_{DM} \), must be greater than the sum of the following:

\[
t_{DM} > (\text{Min hold time for the external sampling register}) + (\text{Max clock jitter for R4600/R4700 internal clocks}) + (\text{Max clock jitter for TClock}) + (\text{Max delay mismatch of the external clock buffers})
\]
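The three discrete-CMOS conditions (both transmission times and the hold-time requirement) can be collected into one small checker. The following is a sketch with hypothetical names. All values are in nanoseconds, and the test values are placeholders rather than data-sheet figures. Because the buffered TClock is aligned with MasterClock, only the buffer delay mismatch, and not the absolute buffer delay, appears in these equations.

```c
#include <assert.h>
#include <stdbool.h>

/* Processor -> discrete CMOS external agent. */
double cmos_tx_from_cpu_ns(double tclock_period, double tdo_cpu,
                           double ext_sample_setup, double buf_mismatch_max,
                           double cpu_jitter_max, double tclock_jitter_max)
{
    assert(tclock_period > 0.0);
    return tclock_period - tdo_cpu - ext_sample_setup - buf_mismatch_max
           - cpu_jitter_max - tclock_jitter_max;
}

/* Discrete CMOS external agent -> processor. */
double cmos_tx_to_cpu_ns(double tclock_period, double tds_cpu,
                         double ext_clk_to_q_max, double buf_mismatch_max,
                         double cpu_jitter_max, double tclock_jitter_max)
{
    assert(tclock_period > 0.0);
    return tclock_period - tds_cpu - ext_clk_to_q_max - buf_mismatch_max
           - cpu_jitter_max - tclock_jitter_max;
}

/* Hold-time condition: t_DM must exceed the sum of the listed terms. */
bool cmos_hold_ok(double tdm_cpu, double ext_hold_min, double cpu_jitter_max,
                  double tclock_jitter_max, double buf_mismatch_max)
{
    return tdm_cpu > ext_hold_min + cpu_jitter_max + tclock_jitter_max
                     + buf_mismatch_max;
}
```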
Introduction
This chapter describes in detail the cache memory: its place in the R4600/R4700 memory organization and individual operations of the primary cache.

This chapter uses the following terminology:
• The primary cache may also be referred to as the P-cache.
• The primary data cache may also be referred to as the D-cache.
• The primary instruction cache may also be referred to as the I-cache.

These terms are used interchangeably throughout this book.

Memory Organization

Figure 11.1 shows the R4600/R4700 system memory hierarchy. In the logical memory hierarchy, caches lie between the CPU and main memory. They are designed to make the speedup of memory accesses transparent to the user. Each functional block in Figure 11.1 has the capacity to hold more data than the block above it. For instance, physical main memory has a larger capacity than the primary cache. At the same time, each functional block takes longer to access than any block above it. For instance, it takes longer to access data in main memory than in the CPU on-chip registers.

The R4600/R4700 processor has two on-chip primary caches: one holds instructions (the instruction cache), the other holds data (the data cache).
Overview of Cache Operations

As described earlier, caches provide fast temporary data storage, and they make the speedup of memory accesses transparent to the user. In general, the processor accesses cache-resident instructions or data through the following procedure:

1. The processor, through the on-chip cache controller, attempts to access the next instruction or data in the primary cache.

2. The cache controller checks to see if this instruction or data is present in the primary cache.
   - If the instruction/data is present, the processor retrieves it. This is called a primary-cache hit.
   - If the instruction/data is not present in the primary cache, it is retrieved as a cache line from memory and written into the primary cache. This is called a primary-cache miss.

3. The processor retrieves the instruction/data from the primary cache and operation continues. For a data cache miss, the processor can restart the pipeline after the first doubleword (the one at the miss address) is retrieved and continues the cache line refill in parallel.
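The hit/miss procedure above can be modeled for one primary cache using the geometry given later in this chapter: 16 Kbytes, two-way set associative, 32-byte lines, virtually indexed and physically tagged. This is a behavioral sketch only. Data storage is omitted. The per-set `fifo` field is a hypothetical stand-in for the FIFO replacement bit that simply alternates the refill way, and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u
#define NUM_SETS   256u /* 16 KB / (2 ways x 32 bytes) */
#define NUM_WAYS   2u

typedef struct {
    bool     valid[NUM_SETS][NUM_WAYS];
    uint32_t ptag[NUM_SETS][NUM_WAYS]; /* physical address bits 35:12 */
    uint8_t  fifo[NUM_SETS];           /* hypothetical: next way to refill */
} pcache;

void pcache_init(pcache *c) { memset(c, 0, sizeof *c); }

/* Returns true on a primary-cache hit. On a miss, the whole line is
 * (conceptually) refilled from memory into the way selected by the set's
 * FIFO state, which is then complemented. */
bool pcache_access(pcache *c, uint32_t va, uint64_t pa)
{
    assert(c != NULL);
    uint32_t set = (va / LINE_BYTES) % NUM_SETS;       /* VA[12:5] */
    uint32_t tag = (uint32_t)((pa >> 12) & 0xFFFFFFu); /* PA[35:12] */
    for (unsigned w = 0; w < NUM_WAYS; w++)
        if (c->valid[set][w] && c->ptag[set][w] == tag)
            return true;                               /* hit */
    unsigned victim = c->fifo[set];
    c->valid[set][victim] = true;                      /* refill the line */
    c->ptag[set][victim] = tag;
    c->fifo[set] ^= 1u;
    return false;                                      /* miss */
}
```

Addresses 0x1000, 0x3000, and 0x5000 all map to set 128. After the first two are loaded, touching the third evicts the older line (0x1000), so a later access to 0x1000 misses again.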

It is possible for the same data to be in two places simultaneously: main memory and the primary cache. This data is kept consistent through the use of either a write-back or a write-through methodology. For a write-back cache, the modified data is not written back to memory until the cache line is replaced. In a write-through cache, the data is written to memory as the cached data is modified (with a possible delay due to the write buffer).

R4600/ R4700 Cache Description

This section describes the organization of on-chip primary caches. As Figure 11.1 on page 1 shows, the R4600/R4700 contains separate primary instruction and data caches.

Figure 11.2 provides block diagrams of the R4600/R4700 memory model.

![Figure 11.2 Cache Support in the R4600/ R4700](image)

Cache Line Size

A cache line is the smallest unit of information that can be fetched from memory to be filled into the cache. A primary cache line is 8 words in length, and is represented by a single tag.

Upon a cache miss in the primary cache, the missing cache line is loaded from memory into the primary cache.

Cache Organization and Accessibility

This section describes the organization of the primary cache, including the manner in which it is mapped, the addressing used to index the cache, and composition of the cache lines. The primary instruction and data caches are indexed with a virtual address (VA).
Organization of the Primary Instruction Cache (I-Cache)

Each line of primary i-cache data (although it is actually an instruction, it is referred to as data to distinguish it from its tag) has an associated 28-bit tag that contains a 24-bit physical address, a single valid bit, a reserved bit, a single parity bit and the FIFO replacement bit. Word parity is used on i-cache data.

The R4600/R4700 processor primary i-cache has the following characteristics:
- two-way set associative
- indexed with a virtual address
- checked with a physical tag
- organized with 8-word (32-byte) cache line.

Figure 11.3 shows the format of a primary i-cache line.

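```
 27   26   25   24   23                          0
+----+----+----+----+-----------------------------+
| F  | P  | 0  | V  |            PTag             |
+----+----+----+----+-----------------------------+
  1    1    1    1                24

PTag:  Physical tag (bits 35:12 of the physical address)
V:     Valid bit
0:     Reserved bit
F:     FIFO replacement bit; complemented on refill
P:     Even parity for the PTag and V fields
DataP: Even parity; 1 parity bit per word of data
Data:  Cache data
```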
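The tag layout above can be decoded with shifts and masks. The sketch below unpacks a raw 28-bit I-cache tag and checks its parity. It treats P as ordinary even parity, so the total number of 1 bits across PTag, V, and P must be even. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t ptag;   /* bits 23:0, physical address bits 35:12 */
    bool     valid;  /* bit 24 */
    bool     parity; /* bit 26, even parity over PTag and V */
    bool     fifo;   /* bit 27, FIFO replacement bit */
} icache_tag;

icache_tag icache_tag_decode(uint32_t raw)
{
    assert((raw >> 28) == 0); /* the tag is 28 bits wide */
    icache_tag t;
    t.ptag = raw & 0xFFFFFFu;
    t.valid = (raw >> 24) & 1u;
    t.parity = (raw >> 26) & 1u;
    t.fifo = (raw >> 27) & 1u;
    return t;
}

/* Even parity holds if PTag, V, and P together contain an even number of 1s. */
bool icache_tag_parity_ok(icache_tag t)
{
    unsigned ones = t.valid;
    for (uint32_t v = t.ptag; v != 0; v &= v - 1) /* clear lowest set bit */
        ones++;
    return ((ones + t.parity) & 1u) == 0;
}
```

The physical page address covered by a tag is `(uint64_t)t.ptag << 12`.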

Organization of the Primary Data Cache (D-Cache)

Each line of primary d-cache data has an associated 30-bit tag that contains a 24-bit physical address, 2-bit cache line state, a write-back bit, a parity bit for the physical address and cache state fields, a parity bit for the write-back bit and the FIFO replacement bit.

The R4600/R4700 processor primary d-cache has the following characteristics:
- write-back or write-through on a per-page basis
- two-way set associative
- indexed with a virtual address
- checked with a physical tag
- organized with 8-word (32-byte) cache line.
Figure 11.4 shows the format of a primary D-cache line.

In the R4600/R4700, the W (write-back) bit, not the cache state, indicates whether or not the primary cache contains modified data that must be written back to memory.

**Note:** There is no hardware support for cache coherency. Thus the only cache states used are Dirty Exclusive and Invalid.
Accessing the Primary Caches

Figure 11.5 shows the virtual address (VA) index into the primary caches. The instruction cache and the data cache are each 16 Kbytes.

Figure 11.5 Primary Cache Data and Tag Organization
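The cache size, associativity, and line size fix the address fields used for the lookup. The following worked derivation uses only figures stated in this chapter. The remark about bit 12 assumes the 4-Kbyte minimum page size.

\[
\begin{aligned}
\text{sets per cache} &= \frac{16384\ \text{bytes}}{2\ \text{ways} \times 32\ \text{bytes per line}} = 256 = 2^{8} \\
\text{byte offset within line} &= \mathrm{VA}[4:0] \quad (32 = 2^{5}) \\
\text{set index} &= \mathrm{VA}[12:5] \quad (8\ \text{bits}) \\
\text{PTag} &= \mathrm{PA}[35:12] \quad (24\ \text{bits})
\end{aligned}
\]

Index bits VA[11:5] lie within the page offset, so they equal PA[11:5]. Index bit VA[12] is above the 4-Kbyte page offset, but the physical tag starts at PA[12], so the tag compare still covers every physical address bit that the index does not.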

Cache States

The terms below are used to describe the state of a cache line:

- **Exclusive**: a cache line that is present in exactly one cache in the system is exclusive. This is always the case for the R4600/R4700. All cache lines are in an exclusive state.
- **Dirty**: a cache line that contains data that has changed since it was loaded from memory is dirty.
- **Clean**: a cache line that contains data that has not changed since it was loaded from memory is clean.
- **Shared**: a cache line that is present in more than one cache in the system is shared. The R4600/R4700 does not provide hardware cache coherency, so this state should never occur in normal operation.

The R4600/R4700 supports only the four cache states shown in Table 11.1 on page 6. Under normal operation, the only states that occur in the R4600/R4700 are Dirty Exclusive and Invalid.

**Note**: Even though valid data is in the Dirty Exclusive state, it may still be consistent with memory. Check the W (write-back) bit to determine whether the cache line must be written back to memory when it is replaced.
Each primary cache line in the R4600/R4700 system is in one of the states described in Table 11.1.

<table>
<thead>
<tr>
<th>Cache Line State</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Invalid</td>
<td>A cache line that does not contain valid information must be marked invalid, and cannot be used. A cache line in any other state than invalid is assumed to contain valid information.</td>
</tr>
<tr>
<td>Shared</td>
<td>A cache line that is present in more than one cache in the system is shared. This state will not occur for normal operations.</td>
</tr>
<tr>
<td>Clean Exclusive</td>
<td>A clean exclusive cache line contains valid information and this cache line is not present in any other cache. The cache line is consistent with memory and is not owned by the processor (see “Cache Line Ownership” on page 6 in this chapter). This state will not occur for normal operations.</td>
</tr>
<tr>
<td>Dirty Exclusive</td>
<td>A dirty exclusive cache line contains valid information and is not present in any other cache. The cache line may or may not be consistent with memory and is owned by the processor (see “Cache Line Ownership” on page 6 in this chapter). Use the W bit to determine if the line must be written back on replacement.</td>
</tr>
</tbody>
</table>

**Table 11.1 Cache States**

**Primary Cache States**

Each primary data cache line is normally in one of the following states:
- invalid
- dirty exclusive

Each primary instruction cache line is in one of the following states:
- invalid
- valid

**Cache Line Ownership**

The processor is the owner of a cache line when it is in the dirty exclusive state and is responsible for the contents of that line. There can only be one owner for each cache line.

The ownership of a cache line is set and maintained through the rules described below.

- A processor assumes ownership of the cache line if the state of the primary cache line is dirty exclusive.
- A processor that owns a cache line in a write-back page is responsible for writing the line back to memory if the line has been modified (its W bit is set) and is either replaced or the target of a Write-back or Write-back Invalidate CACHE instruction. The CACHE instruction is explained in Appendix A.
- Memory always owns clean cache lines.
- The processor gives up ownership of a cache line when the state of the cache line changes to invalid.

Because every valid data cache line is in the Dirty Exclusive state under normal operating conditions, these rules make the processor the owner of every valid line in its data cache.

**Cache Write Policy**

The R4600/R4700 processor manages its primary data cache by using either a write-back or a write-through policy on a per-page basis. In a write-back cache, the data is not written back to memory until the cache line is replaced. A write-through policy means the store data is written to the cache and to memory. The write of the data to memory may not occur at the same time as the write to cache due to the write buffer.

For a write-back entry, if the cache line is valid and has been modified (the W bit is set), the processor writes this cache line back to memory when the line is replaced, either in the course of satisfying a cache miss or during the execution of a Write-back or Write-back Invalidate CACHE instruction.
For a write-through entry, whenever a store hits in the cache line, the data is also written to memory through the write buffer. The store neither sets nor clears the \( W \) bit of a write-through cache line. This allows a store through a different virtual address that maps to the same physical address, but with a write-back policy, to still set the \( W \) bit. For a store miss to a write-through line, the action taken depends on the write-allocation policy. For a write-allocate entry, the cache line is first retrieved from memory and the store then continues. For a no-write-allocate entry, the store is simply posted to the system interface through the write buffer, in the same manner as an uncached write.
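The per-page store handling can be summarized as a decision function. This is a sketch, not a description of the actual pipeline, and the names are hypothetical. It makes two assumptions that go beyond the text. First, a store miss to a write-back page is shown as refilling the line and then storing. Second, after a write-allocate refill on a write-through page, the store is handled like a write-through hit.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PAGE_WRITE_BACK, PAGE_WRITE_THROUGH } page_policy;

/* What a single store does to the D-cache and the write buffer. */
typedef struct {
    bool refill_first; /* fetch the line from memory before storing */
    bool write_cache;  /* store data is written into the D-cache line */
    bool write_buffer; /* store data is posted to memory via the write buffer */
    bool set_w;        /* the line's W bit is set */
} store_action;

store_action store_action_for(page_policy policy, bool hit, bool write_allocate)
{
    store_action a = { false, false, false, false };
    if (policy == PAGE_WRITE_BACK) {
        a.refill_first = !hit;  /* assumption: write-back misses allocate */
        a.write_cache = true;
        a.set_w = true;         /* line must be written back on replacement */
        return a;
    }
    assert(policy == PAGE_WRITE_THROUGH);
    if (hit || write_allocate) {
        a.refill_first = !hit;  /* write-allocate miss: refill, then continue */
        a.write_cache = true;
        a.write_buffer = true;  /* memory updated through the write buffer */
        /* W is neither set nor cleared by a write-through store */
    } else {
        a.write_buffer = true;  /* no-write-allocate miss: like an uncached write */
    }
    return a;
}
```

Leaving `set_w` false for every write-through case mirrors the rule that lets a write-back alias of the same physical line remain the only source of a set W bit.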

When the processor writes a cache line back to memory, it does not ordinarily retain a copy of the line, and the state of the line is changed to invalid. There are exceptions, however. For example, when a line is written back by the Hit Write-back CACHE instruction, the processor retains its copy: if the \( W \) bit is set, the line is written back and the \( W \) bit is cleared, but the line remains valid. The processor signals this line retention during the write by setting `SysCmd(2)` to 1, as described in Chapter 12.

**Cache State Transition Diagrams**

The following sections describe the cache state diagrams that illustrate the cache state transitions for the primary cache. Figure 11.6 shows the state diagram of the primary cache.

For normal operation, an external agent that supplies a cache line need not return the line's initial state (see Chapter 12 for a definition of an external agent). This is because the only read requests the R4600/R4700 should issue are for noncoherent data, and the lower three bits of the data identifier are reserved. The R4600/R4700 automatically sets the initial state of the line to dirty exclusive (DE). After the line is filled, its state is affected by the following events:

- A store to a dirty exclusive line leaves the line in the dirty exclusive state.
- The state is changed to invalid when:
  - a CACHE invalidate operation is performed on the line, or
  - the line is replaced.
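To make the policy and state rules concrete, here is a minimal C sketch of a single primary data cache line. It is a software model under stated assumptions, not IDT code: the type and function names are hypothetical, clearing the W bit on a line fill is an assumption, and `memory_writes` only counts writes that would be sent toward the system interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one primary data cache line (not IDT code). */
typedef enum { LINE_INVALID, LINE_DIRTY_EXCLUSIVE } line_state;
typedef enum { WRITE_BACK, WRITE_THROUGH } write_policy;

typedef struct {
    line_state state;
    bool w;                 /* W bit: set when a write-back store modifies the line */
} cache_line;

static int memory_writes;   /* writes that would go toward the system interface */

/* Line fill: the initial state is set to dirty exclusive (DE).
 * Clearing W on a fill is an assumption of this sketch. */
static void fill(cache_line *l) {
    l->state = LINE_DIRTY_EXCLUSIVE;
    l->w = false;
}

/* Store hit: the line stays DE. Write-back sets W; write-through leaves W
 * unchanged and posts the store data through the write buffer. */
static void store_hit(cache_line *l, write_policy p) {
    assert(l->state == LINE_DIRTY_EXCLUSIVE);
    if (p == WRITE_BACK)
        l->w = true;
    else
        memory_writes++;
}

/* Hit Write-back CACHE operation: write back if W is set, clear W,
 * and keep the line valid. */
static void hit_writeback(cache_line *l) {
    if (l->state == LINE_DIRTY_EXCLUSIVE && l->w) {
        memory_writes++;
        l->w = false;
    }
}

/* Replacement: a line with W set is written back; the line becomes invalid. */
static void replace(cache_line *l) {
    if (l->state == LINE_DIRTY_EXCLUSIVE && l->w)
        memory_writes++;
    l->state = LINE_INVALID;
    l->w = false;
}

/* CACHE invalidate (without write-back): the line simply becomes invalid. */
static void invalidate(cache_line *l) {
    l->state = LINE_INVALID;
    l->w = false;
}
```

Note that `store_hit` on a write-through line never touches `w`. This is what lets a write-back alias of the same physical line keep its W bit.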

![Figure 11.6 Primary Data Cache State Diagram](image)

**Cache Coherency Overview**

Systems using more than one master must have a mechanism to maintain data consistency throughout the system. This mechanism is called a cache coherency protocol. The R4600/R4700 does not provide any hardware cache coherency; cache coherency must be maintained by software.

**Cache Coherency Attributes**

Cache coherency attributes are necessary to ensure the consistency of data throughout the system.
Bits in the translation look-aside buffer (TLB) control coherency on a per-page basis. Specifically, the TLB contains 3 bits per entry that provide two possible coherency attribute types; they are listed below and described more fully in the following sections:

- uncached
- noncoherent (includes 3 attribute values)

Table 11.2 summarizes the behavior of the processor on load misses and store misses for each of the coherency attribute types listed above. The following sections describe in detail these coherency attribute types.

<table>
<thead>
<tr>
<th>Attribute Type</th>
<th>Load Miss</th>
<th>Store Miss</th>
</tr>
</thead>
<tbody>
<tr>
<td>Uncached</td>
<td>Main memory read</td>
<td>Main memory write</td>
</tr>
<tr>
<td rowspan="2">Noncoherent</td>
<td rowspan="2">Noncoherent read</td>
<td>Noncoherent read (write-allocate page)</td>
</tr>
<tr>
<td>Main memory write (no write-allocate page)</td>
</tr>
</tbody>
</table>

Table 11.2 Coherency Attributes and Processor Behavior

**Uncached**

Lines within an uncached page are never in a cache. When a page has the uncached coherency attribute, the processor issues a doubleword, partial-doubleword, word, or partial-word read or write request directly to main memory (bypassing the cache) for any load or store to a location within that page.

**Noncoherent**

Lines with a noncoherent attribute type can reside in a cache. A load miss causes the processor to issue a noncoherent block read request to a location within the cached page. For a store miss to a write-allocate page, the processor issues a noncoherent block read request to a location within the cached page and then completes the store (for a write-through page, the store data is also written through to memory). If the page has the no write-allocate attribute, a store miss generates a write to memory, as in the uncached case.
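Table 11.2 and the two descriptions above reduce to a small decision function. This C sketch is illustrative only. The enum names are hypothetical, and the three noncoherent variants correspond to the write-back, write-through with write-allocate, and write-through with no write-allocate attributes named in Chapter 12.

```c
#include <stdbool.h>

/* Hypothetical names for the page attributes and the resulting requests. */
typedef enum {
    ATTR_UNCACHED,
    ATTR_NC_WRITE_BACK,        /* noncoherent, write-back */
    ATTR_NC_WT_ALLOCATE,       /* noncoherent, write-through with write-allocate */
    ATTR_NC_WT_NO_ALLOCATE     /* noncoherent, write-through with no write-allocate */
} page_attr;

typedef enum {
    REQ_MEMORY_READ,           /* read directly from main memory, bypassing the cache */
    REQ_MEMORY_WRITE,          /* write directly to memory (through the write buffer) */
    REQ_NC_BLOCK_READ          /* noncoherent block read of the whole line */
} miss_request;

/* Request issued for a load or store that misses. Uncached accesses always
 * bypass the cache, so this sketch treats them as misses. */
static miss_request miss_request_for(page_attr attr, bool is_store) {
    if (attr == ATTR_UNCACHED)
        return is_store ? REQ_MEMORY_WRITE : REQ_MEMORY_READ;
    if (is_store && attr == ATTR_NC_WT_NO_ALLOCATE)
        return REQ_MEMORY_WRITE;   /* posted like an uncached write */
    return REQ_NC_BLOCK_READ;      /* fetch the line, then complete the access */
}
```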

**Cache Operation Modes**

The R4600/R4700 processor supports only the no-secondary-cache mode of R4x00 operation; in this mode, only the uncached and noncoherent coherency attributes apply.

**R4600/R4700 Processor Synchronization Support**

In a multiprocessor system, it is essential that two or more processors working on a common task execute without corrupting each other’s subtasks. Synchronization, an operation that guarantees orderly access to shared memory, must be implemented for a properly functioning multiprocessor system. This section discusses two of the more widely used methods: test-and-set, and counter. Although the R4600/R4700 does not support symmetric multiprocessing (SMP), these methods are useful for multi-master systems and heterogeneous multiprocessing.

**Test-and-Set**

Test-and-set uses a variable called the semaphore, which protects data from being simultaneously modified by more than one processor. In other words, a processor can lock out other processors from accessing shared data when the processor is in a critical section, a part of a program in which no more than a fixed number of processors is allowed to execute. In the case of test-and-set, only one processor can enter the critical section.
Figure 11.7 illustrates a test-and-set synchronization procedure that uses a semaphore; when the semaphore is set to 0, the shared data is unlocked, and when the semaphore is set to 1, the shared data is locked.

The processor begins by loading the semaphore and checking to see if it is unlocked (set to 0) in steps 1 and 2. If the semaphore is not 0, the processor loops back to step 1. If the semaphore is 0, indicating the shared data is not locked, the processor next tries to lock out any other access to the shared data (step 3). If not successful, the processor loops back to step 1, and reloads the semaphore.

If the processor succeeds in setting the semaphore (step 4), it gains access to the shared data, executes the critical section of code (step 5), completes its task, unlocks the semaphore (step 6), and continues processing.
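The steps above can be sketched in C11 as follows. This is an illustration, not the manual's code. A C11 compare-and-swap stands in for the "try to lock" step (step 3); on MIPS targets, compilers typically implement such atomics with LL/SC sequences like those described later in this chapter. The `tas_lock` type and its functions are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock type: semaphore 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int sem;
} tas_lock;

static void tas_acquire(tas_lock *l) {
    for (;;) {
        /* Steps 1-2: load the semaphore; if it is locked, load it again. */
        if (atomic_load_explicit(&l->sem, memory_order_relaxed) != 0)
            continue;
        /* Step 3: try to change the semaphore from 0 to 1 atomically. */
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(&l->sem, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;   /* Step 4: the semaphore is set; enter the critical section. */
        /* Another master won the race (or the attempt failed): back to step 1. */
    }
}

/* Step 6: after the critical section (step 5), unlock the semaphore. */
static void tas_release(tas_lock *l) {
    atomic_store_explicit(&l->sem, 0, memory_order_release);
}
```

Reading the semaphore before attempting the atomic update (steps 1 and 2) keeps a waiting processor spinning on loads rather than repeatedly attempting the locking operation.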

**Counter**

Another common synchronization technique uses a counter. A counter is a designated memory location that can be incremented or decremented.

In the test-and-set method, only one processor at a time is permitted to enter the critical section. Using a counter, up to \(N\) processors are allowed to concurrently execute the critical section. All processors after the \(N\)th processor must wait until one of the \(N\) processors exits the critical section and a space becomes available.

The counter works by not allowing more than one processor to modify it at any given time. Conceptually, the counter can be viewed as a variable that counts the number of limited resources (for example, the number of processes, or software licenses, etc.).
Figure 11.8 shows this process.

**Figure 11.8  Synchronization Using a Counter**
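A counter that admits up to N processors can be sketched in C11 as follows. In this hypothetical version, the counter holds the number of free slots (initialized to N). Entering decrements it only if it is positive, and the compare-and-swap ensures that only one processor's update to the counter succeeds at a time.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: holds the number of free slots, initialized to N. */
typedef struct {
    atomic_int free_slots;
} counter_sem;

/* Try to enter the critical section. Succeeds only if a slot is free;
 * the compare-and-swap lets only one processor's update win at a time. */
static bool counter_try_enter(counter_sem *c) {
    int n = atomic_load_explicit(&c->free_slots, memory_order_relaxed);
    while (n > 0) {
        /* On failure, n is reloaded with the current value and retried. */
        if (atomic_compare_exchange_weak_explicit(&c->free_slots, &n, n - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;   /* all N slots are taken: the caller must wait and retry */
}

/* Leave the critical section, freeing a slot for a waiting processor. */
static void counter_exit(counter_sem *c) {
    atomic_fetch_add_explicit(&c->free_slots, 1, memory_order_release);
}
```

With N = 2, the first two calls to `counter_try_enter` succeed, the third fails until one of the two holders calls `counter_exit`.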

**Load Linked and Store Conditional**

The R4600/R4700 instructions *Load Linked* (LL) and *Store Conditional* (SC) provide support for processor synchronization. These two instructions work very much like their simpler counterparts, load and store. The LL instruction, in addition to doing a simple load, has the side effect of setting a bit called the link bit. This link bit forms a breakable link between the LL instruction and the subsequent SC instruction. The SC performs a simple store if the link bit is set when the store executes. If the link bit is not set, then the store fails to execute. The success or failure of the SC is indicated in the target register of the store.

The link is broken upon completion of an ERET (return from exception) instruction.

The most important features of LL and SC are:

- They provide a mechanism for generating all of the common synchronization primitives including test-and-set, counters, sequencers, etc., with no additional overhead.
- When they operate, bus traffic is generated only if the state of the cache line changes; lock words stay in the cache until some other processor takes ownership of that cache line.
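The link-bit behavior described above can be modeled in a few lines of C. This is a hypothetical single-processor model; the names `model_ll`, `model_sc`, and `model_eret` are illustrative. It tracks only the link bit and the ERET rule given here, and ignores any other events that may break the link on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical processor state: only the link bit is modeled. */
typedef struct {
    bool link_bit;
} cpu_model;

/* LL: an ordinary load that also sets the link bit. */
static uint32_t model_ll(cpu_model *cpu, const uint32_t *addr) {
    cpu->link_bit = true;
    return *addr;
}

/* SC: performs the store only if the link bit is still set. The return value
 * plays the role of SC's target register: 1 = success, 0 = failure. */
static uint32_t model_sc(cpu_model *cpu, uint32_t *addr, uint32_t value) {
    if (!cpu->link_bit)
        return 0;        /* store suppressed */
    *addr = value;
    return 1;
}

/* Completion of ERET breaks the link. */
static void model_eret(cpu_model *cpu) {
    cpu->link_bit = false;
}

/* Atomic increment written as an LL/SC retry loop. */
static uint32_t model_increment(cpu_model *cpu, uint32_t *addr) {
    uint32_t old;
    do {
        old = model_ll(cpu, addr);
    } while (!model_sc(cpu, addr, old + 1));
    return old + 1;
}
```

If an exception were taken between `model_ll` and `model_sc`, the ERET would clear the link bit. The SC would then fail and the loop would retry from the LL.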
**Examples Using LL and SC**

Figure 11.9 shows how to implement test-and-set using LL and SC instructions.

![Flowchart for test-and-set using LL and SC instructions](image-url)

**Figure 11.9 Test-and-Set using LL and SC**
Figure 11.10 shows synchronization using a counter.

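```
Loop1: LL    r2,(r1)      # load the counter and set the link bit
       BLEZ  r2,Loop1     # counter <= 0: no slot free, try again
       NOP                # branch delay slot

       ADDIU r3,r2,-1     # decrement a copy of the counter
       SC    r3,(r1)      # store only if the link is intact; r3 = 1 on success, 0 on failure

       BEQZ  r3,Loop1     # SC failed: start over
       NOP                # branch delay slot

...                       # critical section

Loop2: LL    r2,(r1)      # load the counter and set the link bit
       ADDIU r3,r2,1      # increment a copy to release the slot
       SC    r3,(r1)      # conditional store; r3 = success flag

       BEQZ  r3,Loop2     # SC failed: retry
       NOP                # branch delay slot
```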

**Figure 11.10 Counter Using LL and SC**
Introduction
The System interface allows the processor to access external resources needed to satisfy cache misses and uncached operations, while permitting an external agent access to some of the processor internal resources.

This chapter describes the system interface from the point of view of both the processor and the external agent.

Terminology
The following terms are used in this chapter:

An external agent is any logic device connected to the processor, over the system interface, that allows the processor to issue requests.

A system event is an event that occurs within the processor and requires access to external system resources.

Sequence refers to the precise series of requests that a processor generates to service a system event.

Protocol refers to the cycle-by-cycle signal transitions that occur on the system interface pins to assert a processor or external request.

Syntax refers to the precise definition of bit patterns on encoded buses, such as the command bus.

System Interface Description
The R4600/R4700 processor supports a 64-bit address/data interface that can be used to build a simple uniprocessor system with main memory. The System interface consists of:

- 64-bit address and data bus, SysAD
- 8-bit SysAD check bus, SysADC (even parity only)
- 9-bit command bus, SysCmd
- six handshake signals:
  - RdRdy*, WrRdy*
  - ExtRqst*, Release*
  - ValidIn*, ValidOut*

The processor uses the system interface to access external resources in order to service processor requests such as cache misses, cache line write-backs, write-through stores and uncached operations.
Interface Buses

Figure 12.1 shows the primary communication paths for the system interface: a 64-bit address and data bus, \texttt{SysAD(63:0)}, and a 9-bit command bus, \texttt{SysCmd(8:0)}. The \texttt{SysAD} and \texttt{SysCmd} buses are bidirectional; that is, they are driven by the processor to issue a processor request, and by the external agent to issue an external request (see “Processor and External Request Protocols” on page 12-14 for more information).

A request through the system interface consists of:

- an address
- a System interface command that specifies the precise nature of the request
- a series of data elements if the request is for a write or read response.

Address and Data Cycles

Cycles in which the \texttt{SysAD} bus contains a valid address are called \textit{address cycles}. Cycles in which the \texttt{SysAD} bus contains valid data are called \textit{data cycles}. Validity is determined by the state of the \texttt{ValidIn*} and \texttt{ValidOut*} signals (described in “Interface Buses” on page 12-2).

The \texttt{SysCmd} bus identifies the contents of the \texttt{SysAD} bus during any cycle in which it is valid. The most significant bit of the \texttt{SysCmd} bus is always used to indicate whether the current cycle is an address cycle or a data cycle.

- During address cycles [\texttt{SysCmd(8) = 0}], the remainder of the \texttt{SysCmd} bus, \texttt{SysCmd(7:0)}, contains a \textit{System interface command} (the encoding of system interface commands is detailed in “System Interface Commands and Data Identifiers” on page 12-32).
- During data cycles [\texttt{SysCmd(8) = 1}], the remainder of the \texttt{SysCmd} bus, \texttt{SysCmd(7:0)}, contains a \textit{data identifier} (the encoding of data identifiers is detailed later in this chapter).
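Decoding \texttt{SysCmd(8:0)} as described above is a matter of testing bit 8. The following C sketch uses hypothetical names. The position of the Erroneous Data bit within a data identifier, SysCmd(5), is taken from the note on external read requests later in this chapter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the 9-bit SysCmd(8:0) bus. */
typedef struct {
    bool is_data_cycle;   /* SysCmd(8): 0 = address cycle, 1 = data cycle */
    uint8_t payload;      /* SysCmd(7:0): command or data identifier */
} syscmd_fields;

static syscmd_fields decode_syscmd(uint16_t syscmd) {
    syscmd_fields f;
    f.is_data_cycle = ((syscmd >> 8) & 1u) != 0;
    f.payload = (uint8_t)(syscmd & 0xFFu);
    return f;
}

/* In a data cycle, SysCmd(5) of the data identifier is the Erroneous Data bit. */
static bool data_is_erroneous(uint16_t syscmd) {
    syscmd_fields f = decode_syscmd(syscmd);
    return f.is_data_cycle && ((f.payload >> 5) & 1u) != 0;
}
```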
**Issue Cycles**

There are two types of processor issue cycles:
- processor read request issue cycles
- processor write request issue cycles.

The processor samples the signal `RdRdy*` to determine the *issue cycle* for a processor read request; the processor samples the signal `WrRdy*` to determine the *issue cycle* of a processor write request.

As shown in Figure 12.2, `RdRdy*` must be asserted for one clock cycle, two cycles prior to the address cycle of the processor read request to define the address cycle as the issue cycle (cycle 5 in Figure 12.2). `RdRdy*` does not need to be asserted during the issue cycle.

![Figure 12.2 State of RdRdy* Signal for Read Requests](image)

Note: `RdRdy*` must be sampled LOW at the end of cycle 3, which is marked with an asterisk.

As shown in Figure 12.3, `WrRdy*` must be asserted for one clock cycle, two cycles prior to the first address cycle of the processor write request to define the address cycle as the issue cycle (cycle 5 in Figure 12.3). `WrRdy*` does not need to be asserted during the issue cycle.

![Figure 12.3 State of WrRdy* Signal for Write Requests](image)

Note: `WrRdy*` must be sampled LOW at the end of cycle 3, which is marked with an asterisk.

The processor repeats the address cycle for the request until the conditions for a valid issue cycle are met. After the issue cycle, if the processor request requires data to be sent, the data transmission begins. There is only one issue cycle for any processor request.
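The issue-cycle rule can be expressed as a search over sampled signal values. In this C sketch (names hypothetical), `rdy_n[c]` is the value of `RdRdy*` or `WrRdy*` sampled at the end of cycle `c`, where `false` means asserted (LOW). The processor repeats its address cycle until the ready signal was asserted two cycles earlier.

```c
#include <stdbool.h>

/* Returns the issue cycle of a processor request whose first address cycle is
 * first_addr_cycle: the first cycle c >= first_addr_cycle such that the
 * active-low ready signal was sampled LOW at the end of cycle c - 2.
 * Returns -1 if no issue cycle occurs within the supplied trace. */
static int find_issue_cycle(const bool *rdy_n, int ncycles, int first_addr_cycle) {
    for (int c = first_addr_cycle; c < ncycles; c++) {
        /* Each cycle that fails the test is a repeated address cycle. */
        if (c >= 2 && !rdy_n[c - 2])
            return c;
    }
    return -1;
}
```

With the signal sampled LOW only at the end of cycle 3 and the first address cycle in cycle 4, the function returns 5, matching the issue cycle in Figure 12.2.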

The processor accepts external requests, even while attempting to issue a processor request, by releasing the system interface to slave state in response to an assertion of `ExtRqst*` by the external agent.
Note that the rules governing the issue cycle of a processor request are strictly applied to determine the action the processor takes. The processor either:
- completes the issuance of the processor request in its entirety before the external request is accepted, or
- releases the system interface to slave state without completing the issuance of the processor request.

In the latter case, the processor issues the processor request (provided the processor request is still necessary) after the external request is complete. The rules governing an issue cycle again apply to the processor request.

**Handshake Signals**

The processor manages the flow of requests through the following six control signals:
- \textbf{RdRdy*}, \textbf{WrRdy*} are used by the external agent to indicate when it can accept a new read (\textbf{RdRdy*}) or write (\textbf{WrRdy*}) transaction.
- \textbf{ExtRqst*}, \textbf{Release*} are used to transfer control of the \textbf{SysAD} and \textbf{SysCmd} buses. \textbf{ExtRqst*} is used by an external agent to indicate a need to control the interface. \textbf{Release*} is asserted by the processor when it transfers the mastership of the system interface to the external agent.
- The \textbf{R4600/R4700} processor uses \textbf{ValidOut*} and the external agent uses \textbf{ValidIn*} to indicate valid command/data on the \textbf{SysCmd}/\textbf{SysAD} buses.

**System Interface Protocols**

As Figure 12.4 shows, the system interface operates from register to register. That is, processor outputs come directly from output registers and begin to change with the rising edge of \textbf{SClock}.

Processor inputs are fed directly to input registers that latch these input signals with the rising edge of \textbf{SClock}. This allows the system interface to run at the highest possible clock frequency.

---

1. \textbf{SClock} is an internal clock used by the processor to sample data at the system interface and to clock data into the processor system interface output registers; see Chapter 10 for more details.
Master and Slave States

When the R4600/R4700 processor is driving the SysAD and SysCmd buses, the system interface is in master state. When the external agent is driving the SysAD and SysCmd buses, the system interface is in slave state.

In master state, the processor drives the SysAD and SysCmd buses and will assert the signal ValidOut* whenever these buses are valid.

In slave state, the external agent drives the SysAD and SysCmd buses and asserts the signal ValidIn* whenever these buses are valid.

Moving from Master to Slave State

The system interface remains in master state unless one of the following occurs:

• The external agent requests and is granted the system interface (external arbitration).
• The processor issues a read request and performs an uncompelled change to slave state.

External Arbitration

The system interface must be in slave state for the external agent to issue an external request through the system interface. The transition from master state to slave state is arbitrated by the processor using the system interface handshake signals ExtRqst* and Release*. This transition is described by the following procedure:

1. An external agent signals that it wishes to issue an external request by asserting ExtRqst*.
2. When the processor is ready to accept an external request, it releases the system interface from master to slave state by asserting Release* for one cycle.
3. The system interface returns to master state as soon as the issue of the external request is complete.

This process is described in “External Arbitration Protocol” on page 12-24.

Uncompelled Change to Slave State

An uncompelled change to slave state is the transition of the system interface from master state to slave state, initiated by the processor when a processor read request is pending. Release* is asserted automatically after a read request. An uncompelled change to slave state occurs during the issue cycle of a read request.

After an uncompelled change to slave state, the processor returns to master state at the end of the next external request. This can be a read response, or some other type of external request.

An external agent must note that the processor has performed an uncompelled change to slave state and begin driving the SysAD bus along with the SysCmd bus. As long as the system interface is in slave state, the external agent can begin a single external request without arbitrating for the system interface; that is, without asserting ExtRqst*.

After the external request, the system interface returns to master state.

After issuing a read request, while that request is pending, the processor automatically switches the system interface to slave state, even though the external agent is not arbitrating to issue an external request. This transition to slave state allows the external agent to return read response data quickly.
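The transitions between master and slave state can be summarized as a small state machine. This C sketch is a simplified, hypothetical model. It covers only external arbitration, the uncompelled change on a read request, and the return to master state after an external request. It does not model what happens when a read is still pending after an unrelated external request completes.

```c
#include <stdbool.h>

/* Hypothetical model of system-interface ownership. */
typedef enum { IF_MASTER, IF_SLAVE } if_state;

typedef struct {
    if_state state;
    bool read_pending;
} sys_if;

/* External arbitration: the processor answers ExtRqst* by asserting
 * Release* for one cycle, moving the interface to slave state. */
static void grant_external_request(sys_if *s) {
    if (s->state == IF_MASTER)
        s->state = IF_SLAVE;
}

/* Issuing a read request causes an uncompelled change to slave state. */
static void issue_read_request(sys_if *s) {
    s->read_pending = true;
    s->state = IF_SLAVE;
}

/* Completing an external request (including a read response) returns the
 * interface to master state; a read response also retires the pending read. */
static void complete_external_request(sys_if *s, bool is_read_response) {
    if (is_read_response)
        s->read_pending = false;
    s->state = IF_MASTER;
}
```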
Processor and External Requests

There are two broad categories of requests: *processor requests* and *external requests*. These two categories are described in this section.

When a system event occurs, the processor issues either a single request or a series of requests—called *processor requests*—through the system interface, to access an external resource and service the event. For this to work, the processor system interface must be connected to an external agent that is compatible with the system interface protocol, and can coordinate access to system resources.

An external agent requesting access to a processor status register generates an *external request*. This access request passes through the system interface. System events and request cycles are shown in Figure 12.5.

![Figure 12.5 Requests and System Events](image)

**Rules for Processor Requests**

The following rules apply to processor requests.

- After issuing a processor read request, the processor cannot issue a subsequent read request until it has received a read response.
- After the processor has issued a write request in R4x00 compatible write mode (set at boot time), the processor cannot issue a subsequent request until at least four cycles after the issue cycle of the write request. This means back-to-back write requests with a single data cycle are separated by two unused system cycles, as shown in Figure 12.6.
- After the processor has issued a write request in either of the two new write modes, write reissue and pipelined writes, the processor can issue a subsequent write immediately provided the WrRdy* requirement is met. This is discussed in more detail later in this chapter.
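The spacing rules for back-to-back single-data-cycle writes can be written as simple arithmetic. In this sketch the mode names are hypothetical. It assumes that "immediately" in the new write modes means the cycle after the write's single data cycle, which is consistent with the two-cycle throughput described for these modes. The `WrRdy*` requirement still applies in every mode and is not modeled.

```c
/* Hypothetical names for the boot-time write modes. */
typedef enum { WRITE_R4X00_COMPAT, WRITE_REISSUE, WRITE_PIPELINED } write_mode;

/* Earliest cycle at which the next request (compatible mode) or next write
 * (new modes) may issue after a single-data-cycle write issued in
 * issue_cycle. WrRdy* is not modeled. */
static int earliest_next_issue(write_mode mode, int issue_cycle) {
    if (mode == WRITE_R4X00_COMPAT)
        return issue_cycle + 4;  /* address, data, then two unused cycles */
    return issue_cycle + 2;      /* address + data: back-to-back writes */
}
```

In compatible mode, writes issued in cycles 10 and 14 leave cycles 12 and 13 idle, which are the two unused system cycles mentioned in the rule.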
Processor Requests

A processor request is a request or a series of requests, through the system interface, to access some external resource. As shown in Figure 12.7, processor requests include only reads and writes.

Read request asks for a block, doubleword, partial doubleword, word, or partial word of data either from main memory or from another system resource.

Write request provides a block, doubleword, partial doubleword, word, or partial word of data to be written either to main memory or to another system resource.

Processor requests are managed by the processor in the equivalent of the R4000/R4400 no-secondary-cache mode.

In no-secondary-cache mode, the processor issues requests in a strict sequential fashion; that is, the processor is only allowed to have one request pending at any time. For example, the processor issues a read request and waits for a read response before issuing any subsequent requests. The processor submits a write request only if there are no read requests pending.

The processor has the input signals \textbf{RdRdy*} and \textbf{WrRdy*} to allow an external agent to manage the flow of processor requests. \textbf{RdRdy*} controls the flow of processor read requests, while \textbf{WrRdy*} controls the flow of processor write requests.

The processor request cycle sequence is shown in Figure 12.8.
Processor Read Request

When a processor issues a read request, the external agent must access the specified resource and return the requested data. (Processor read requests are described in this section; external read requests are described in “External Requests” on page 12-9.)

A processor read request can be split from the external agent’s return of the requested data; in other words, the external agent can initiate an unrelated external request before it returns the response data for a processor read. A processor read request is completed after the last word of response data has been received from the external agent.

Note that the data identifier (see “System Interface Commands and Data Identifiers” on page 12-32) associated with the response data can signal that the returned data is erroneous, causing the processor to take a bus error.

Processor read requests that have been issued, but for which data has not yet been returned, are said to be pending. A read remains pending until the requested read data is returned.

In no-secondary-cache mode, the external agent must be capable of accepting a processor read request any time the following two conditions are met:

- There is no processor read request pending.
- The signal $\text{RdRdy}^*$ has been asserted for one clock cycle, two cycles before the issue cycle.

Processor Write Request

When a processor issues a write request, the specified resource is accessed and the data is written to it. (Processor write requests are described in this section; external write requests are described in “External Requests” on page 12-9.)

A processor write request is complete after the last word of data has been transmitted to the external agent.

In no-secondary-cache mode, the external agent must be capable of accepting a processor write request any time the following two conditions are met:

- No processor read request is pending.
- The signal $\text{WrRdy}^*$ has been asserted for one clock cycle, two cycles before the issue cycle.
The R4600/R4700 adds two new write modes to improve the throughput of non-block writes. These modes allow two-cycle throughput for back-to-back non-block writes. The actual protocol is discussed in the write protocol section of this chapter. The external agent must be capable of accepting a processor write request in these modes under the same conditions as in the R4x00 compatible mode (except as explained in the protocol section).
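The acceptance conditions listed above for read and write requests, together with the one-pending-request rule of no-secondary-cache mode, can be sketched as a small gate. The names here are hypothetical. The `ready` flag stands for `RdRdy*` or `WrRdy*` having been asserted two cycles before the issue cycle.

```c
#include <stdbool.h>

/* Hypothetical model of no-secondary-cache request sequencing. */
typedef struct {
    bool read_pending;
} request_gate;

/* A request may issue only if no read is pending and the matching
 * ready signal was asserted two cycles before the issue cycle. */
static bool can_issue(const request_gate *g, bool ready) {
    return !g->read_pending && ready;
}

static bool issue_read(request_gate *g, bool rdrdy_ok) {
    if (!can_issue(g, rdrdy_ok))
        return false;
    g->read_pending = true;   /* pending until the last word of the response arrives */
    return true;
}

/* A write completes once its data is sent, so it leaves nothing pending. */
static bool issue_write(request_gate *g, bool wrrdy_ok) {
    return can_issue(g, wrrdy_ok);
}

static void read_response_complete(request_gate *g) {
    g->read_pending = false;
}
```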

**External Requests**

External requests include read, write and null requests, as shown in Figure 12.9. This section also includes a description of read response, a special case of an external request.

*Read* request asks for a word of data from the processor's internal resource.

*Write* request provides a word of data to be written to the processor's internal resource.

*Null* request requires no action by the processor; it provides a mechanism for the external agent to return control of the system interface to the master state without affecting the processor.

The processor controls the flow of external requests through the arbitration signals `ExtRqst*` and `Release*`, as shown in Figure 12.10. The external agent must acquire mastership of the system interface before it is allowed to issue an external request; the external agent arbitrates for mastership of the system interface by asserting `ExtRqst*` and then waiting for the processor to assert `Release*` for one cycle.
Mastership of the system interface always returns to the processor after an external request is issued. The processor does not accept a subsequent external request until it has completed the current request.

If there are no processor requests pending, the processor decides, based on its internal state, whether to accept the external request, or to issue a new processor request. The processor can issue a new processor request even if the external agent is requesting access to the system interface.

The external agent asserts ExtRqst* indicating that it wishes to begin an external request. The external agent then waits for the processor to signal that it is ready to accept this request by asserting Release*. The processor signals that it is ready to accept an external request based on the criteria listed below.

- The processor completes any processor request that is in progress.
- While waiting for the assertion of RdRdy* to issue a processor read request, the processor can accept an external request if the request is delivered to the processor one or more cycles before RdRdy* is asserted.
- While waiting for the assertion of WrRdy* to issue a processor write request, the processor can accept an external request provided the request is delivered to the processor one or more cycles before WrRdy* is asserted.
- If the processor is waiting for the response to a read request after it has made an uncompelled change to slave state, the external agent can issue an external request before providing the read response data.

**External Read Request**

In contrast to a processor read request, data is returned directly in response to an external read request; no other requests can be issued until the processor returns the requested data. An external read request is complete after the processor returns the requested word of data.

The data identifier (see “System Interface Commands and Data Identifiers” on page 12-32) associated with the response data can signal that the returned data is erroneous, causing the processor to take a bus error.

**Note:** The R4600/R4700 does not contain any resources that are readable by an external read request; in response to an external read request the processor returns undefined data and a data identifier with its Erroneous Data bit, SysCmd(5), set.

**External Write Request**

When an external agent issues a write request, the specified resource is accessed and the data is written to it. An external write request is complete after the word of data has been transmitted to the processor.

The only processor resource available to an external write request is the IP field of the Cause register.

**Read Response**

A read response returns data in response to a processor read request, as shown in Figure 12.11. While a read response is technically an external request, it has one characteristic that differentiates it from all other external requests—it does not perform system interface arbitration. For this reason, read responses are handled separately from all other external requests, and are simply called read responses. When a read response comes back with bad parity for the first datum, a cache error exception results.
Handling Requests

This section details the sequence, protocol, and syntax (see “Terminology” on page 12-1 for definitions of these terms) of both processor and external requests. The following system events are discussed:

- load miss (no-secondary-cache mode)
- store miss (no-secondary-cache mode)
- store hit
- uncached loads/stores
- CACHE operations
- load linked store conditional.

Load Miss

When a processor load misses in the primary cache, the processor must obtain from the external agent the cache line that contains the data element to be loaded before it can proceed.

If the new cache line replaces a current cache line with a W bit set, the current cache line must be written back.

The processor examines the coherency attribute (cache coherency attributes are described in Chapter 11) in the TLB entry for the page that contains the requested cache line, and issues the following request:

- If the coherency attribute is noncoherent, the processor issues a noncoherent block read request.

Table 12.1 shows the actions taken on a load miss to primary cache.

<table>
<thead>
<tr>
<th rowspan="2">Page Attribute</th>
<th colspan="2">State of Data Cache Line Being Replaced</th>
</tr>
<tr>
<th>Clean/Invalid</th>
<th>Dirty (W=1)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Noncoherent</td>
<td>NCR</td>
<td>NCR/W</td>
</tr>
</tbody>
</table>

- NCR: Processor noncoherent block read request
- NCR/W: Processor noncoherent block read request followed by processor block write request

Table 12.1  Load Miss to Primary Cache
No-Secondary-Cache Mode — Load Miss

In no-secondary-cache mode, if the cache line must be written back on a load miss, the read request is issued and completed before the write request is handled. The processor takes the following steps:

1. The processor issues a noncoherent read request for the cache line that contains the data element to be loaded.
2. The processor then waits for an external agent to provide the read response.
3. The processor restarts the pipeline as soon as the first doubleword arrives (the doubleword containing the data that missed is returned first). The rest of the cache line is written into the cache in parallel.

If the current cache line must be written back, the processor issues a write request to save the dirty cache line in memory.

Store Miss

When a processor store misses in the primary cache, the processor requests from the external agent the cache line that contains the target location of the store, but only for pages that are write-back, or write-through with write-allocate. The processor examines the coherency attribute in the TLB entry for the page that contains the requested cache line (TLB page coherency attributes are listed in Chapter 4) to determine whether the line is write-allocate or no write-allocate.

The processor then executes one of the following requests:

- If the coherency attribute is noncoherent, write-back or noncoherent, write-through with write-allocate, a noncoherent block read request is issued.
- If the coherency attribute is noncoherent, write-through with no write-allocate, the processor issues a non-block write request.

Table 12.2 shows the actions taken on a store miss to the primary cache.

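<table>
<thead>
<tr>
<th rowspan="2">Page Attribute</th>
<th colspan="2">State of Data Cache Line Being Replaced</th>
</tr>
<tr>
<th>Clean/Invalid</th>
<th>Dirty (W=1)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Noncoherent, write-back or write-through with write-allocate</td>
<td>NCR</td>
<td>NCR/W</td>
</tr>
<tr>
<td>Noncoherent, write-through with no write-allocate</td>
<td>NCW</td>
<td>NCW</td>
</tr>
</tbody>
</table>

- NCR: Processor noncoherent block read request
- NCR/W: Processor noncoherent block read request followed by processor block write request
- NCW: Processor non-block write request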

Table 12.2  Store Miss to Primary Cache

No-Secondary-Cache Mode — Store Miss

If the coherency attribute is write-back, or write-through with write-allocate, the processor issues a read request for the cache line that contains the target location of the store, then waits for the external agent to provide the read data in response to the read request. Then, if the current cache line must be written back, the processor issues a write request for the current cache line. For a write-through, no write-allocate store miss, the processor issues only a write request.
In no-secondary-cache mode, if the new cache line replaces a current cache line whose Write back (W) bit is set, the current cache line moves to an internal write buffer before the new cache line is loaded in the primary cache.
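Tables 12.1 and 12.2 can be condensed into one function that returns the request sequence for a miss on a noncoherent page. This C sketch uses hypothetical names; `victim_dirty` means the line being replaced has its W bit set. In the NCR/W case the read is issued and completed first, and the victim, held in the internal write buffer, is written back afterward.

```c
#include <stdbool.h>

/* Request sequences named as in Tables 12.1 and 12.2. */
typedef enum {
    SEQ_NCR,     /* noncoherent block read */
    SEQ_NCR_W,   /* noncoherent block read, then block write of the victim */
    SEQ_NCW      /* non-block write only */
} miss_sequence;

/* Hypothetical names for the three noncoherent page attributes. */
typedef enum { NC_WRITE_BACK, NC_WT_ALLOCATE, NC_WT_NO_ALLOCATE } nc_attr;

static miss_sequence miss_sequence_for(nc_attr attr, bool is_store, bool victim_dirty) {
    /* No line is allocated, so nothing is replaced or written back. */
    if (is_store && attr == NC_WT_NO_ALLOCATE)
        return SEQ_NCW;
    /* Load misses and allocating store misses fetch the line; a dirty
     * victim adds a block write after the read completes. */
    return victim_dirty ? SEQ_NCR_W : SEQ_NCR;
}
```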

**Store Hit**

This section describes store hits in no-secondary-cache mode for both write-back and write-through lines.

**No-Secondary-Cache Mode — Store Hit**

In no-secondary-cache mode, the action on the system interface is determined by whether the line is write-back or write-through. A store hit to a line that uses a write-back policy sets the line to the dirty exclusive cache state, and no bus transaction is generated. For a line with a write-through policy, the store also generates a processor write request for the store data.

**Uncached Loads or Stores**

When the processor performs an uncached load, it issues a noncoherent word read request (the actual access can be for a doubleword, word, partial word or byte, but the request is called a word read request to differentiate it from the block read request). When the processor performs an uncached store, it issues a doubleword, partial doubleword, word, or partial word write request.

The CPU expects valid data and parity on the full 64-bit SysAD bus, even when the request is for less than a doubleword. An external agent that does not drive all 64 bits with valid data must therefore tell the processor not to check parity. In other words, either return all 64 bits with valid parity, or indicate that parity is not to be checked.
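
To make the "return all 64 bits with parity" option concrete, the sketch below computes eight check bits for one 64-bit SysAD value. It assumes one even-parity bit per byte lane, with SysADC(i) covering SysAD(8i+7:8i); this section does not state the parity scheme, so confirm it against the SysADC signal description before relying on it. The function name is hypothetical.

```python
def sysadc_even_parity(doubleword: int) -> int:
    """Return 8 check bits for a 64-bit SysAD value.

    Assumption: bit i of the result is the even-parity bit for byte lane i,
    so each byte together with its check bit contains an even number of ones.
    """
    if not 0 <= doubleword < (1 << 64):
        raise ValueError("SysAD carries exactly 64 bits")
    check = 0
    for lane in range(8):
        byte = (doubleword >> (8 * lane)) & 0xFF
        # 1 when the byte has an odd number of ones, making the total even.
        parity = bin(byte).count("1") & 1
        check |= parity << lane
    return check
```

A response that carries only a word or a byte still has to present consistent check bits on all eight lanes under this scheme, which is why the alternative is to suppress the parity check.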

All processor writes are buffered from the system interface by the four-entry write buffer. Write requests are sent to the system interface when no other requests are in progress. If the write buffer contains any entries when a read request is needed (a cache miss or an uncached load), the write buffer is flushed before the read request is issued.

Both a data cache miss and an uncached data load will flush the write buffer.
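
The ordering rule can be modeled with a toy FIFO. This Python sketch assumes that a full buffer simply drains its oldest entry before accepting a new store; it ignores cycle timing entirely, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class WriteBuffer:
    """Toy model of the four-entry processor write buffer (not cycle-accurate)."""

    DEPTH = 4

    def __init__(self) -> None:
        self.entries: Deque[str] = deque()
        self.bus: List[str] = []  # requests in the order they reach the system interface

    def store(self, write: str) -> None:
        if len(self.entries) == self.DEPTH:
            # Assumption: a full buffer stalls the store until the oldest entry drains.
            self.bus.append(self.entries.popleft())
        self.entries.append(write)

    def drain_when_idle(self) -> None:
        # Buffered writes go out when no other request is in progress.
        while self.entries:
            self.bus.append(self.entries.popleft())

    def read(self, request: str) -> None:
        # A cache miss or uncached load flushes the buffer before the read is issued.
        self.drain_when_idle()
        self.bus.append(request)
```

For example, two buffered stores followed by an uncached load reach the bus as the two writes and then the read, never the read first.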

**CACHE Operations**

The processor provides a variety of CACHE operations to maintain the state and contents of the primary cache. During the execution of the CACHE operation instructions, the processor can issue write requests.

**Load Linked/Store Conditional Operation**

Generally, the execution of a Load Linked/Store Conditional instruction sequence is not visible at the system interface; that is, no special requests are generated due to the execution of this instruction sequence.

There is, however, one situation in which the execution of a Load Linked/Store Conditional instruction sequence is visible, as indicated by the link address retained bit, signaled on SysCmd(2) during a processor read request. This situation occurs when the data location targeted by a Load Linked/Store Conditional instruction sequence maps to the same cache line as the instruction area containing the Load Linked/Store Conditional code sequence. In this case, immediately after the Load Linked instruction executes, the cache line that contains the link location is replaced by the instruction line containing the code. The link address is kept in a register separate from the cache, and remains active as long as the link bit, set by the Load Linked instruction, is set.

The link bit, which is set by the Load Linked instruction, is cleared by a change of cache state for the line containing the link address, or by a Return From Exception.
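
The link-bit rules can be sketched as a small state object. This is a hypothetical Python model, not processor logic: it assumes 32-byte primary cache lines (eight words, matching the block size used in this chapter) and does not model what a Store Conditional itself does to the link bit.

```python
from typing import Optional


class LinkState:
    """Hypothetical model of the link bit and the link-address register."""

    LINE_BYTES = 32  # assumption: 32-byte primary data cache lines

    def __init__(self) -> None:
        self.link_bit = False
        self.link_address: Optional[int] = None

    def load_linked(self, paddr: int) -> None:
        # LL sets the link bit and records the address in a register
        # that is separate from the cache.
        self.link_bit = True
        self.link_address = paddr

    def cache_state_changed(self, line_paddr: int) -> None:
        # Any cache-state change (including replacement) of the line holding
        # the link address clears the link bit.
        if self.link_address is None:
            return
        if line_paddr // self.LINE_BYTES == self.link_address // self.LINE_BYTES:
            self.link_bit = False

    def return_from_exception(self) -> None:
        # A Return From Exception also clears the link bit.
        self.link_bit = False

    def store_conditional_succeeds(self) -> bool:
        # SC succeeds only while the link bit is still set.
        return self.link_bit
```

In the visible case described above, the instruction fetch that replaces the line holding the link location is exactly such a cache-state change, so the link bit is cleared.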

For more information, refer to Chapter 11, or see the specific Load Linked and Store Conditional instructions described in Appendix A.

**Processor and External Request Protocols**

The following sections contain a cycle-by-cycle description of the bus arbitration protocols for each type of processor and external request. Table 12.3 lists the abbreviations and definitions for each of the buses that are used in the timing diagrams that follow.

<table>
<thead>
<tr>
<th>Scope</th>
<th>Abbreviation</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>Global</td>
<td>Unsd</td>
<td>Unused</td>
</tr>
<tr>
<td>SysAD bus</td>
<td>Addr</td>
<td>Physical address</td>
</tr>
<tr>
<td></td>
<td>Data&lt;n&gt;</td>
<td>Data element number n of a block of data</td>
</tr>
<tr>
<td>SysCmd bus</td>
<td>Cmd</td>
<td>An unspecified system interface command</td>
</tr>
<tr>
<td></td>
<td>Read</td>
<td>A processor or external read request command</td>
</tr>
<tr>
<td></td>
<td>Write</td>
<td>A processor or external write request command</td>
</tr>
<tr>
<td></td>
<td>SINull</td>
<td>A system interface release external null request command</td>
</tr>
<tr>
<td></td>
<td>NData</td>
<td>A noncoherent data identifier for a data element other than the last data element</td>
</tr>
<tr>
<td></td>
<td>NEOD</td>
<td>A noncoherent data identifier for the last data element</td>
</tr>
</tbody>
</table>

Table 12.3 System Interface Requests

**Processor Request Protocols**

Processor request protocols described in this section include:
- read
- write

**Note:** In the timing diagrams, the two closely spaced, wavy vertical lines (see SCycle 2 in Figure 12.20 on page 12-24) indicate one or more identical cycles.

**Processor Read Request Protocol Steps**

The following sequence describes the protocol for a processor read request (the numbered steps below correspond to the numbers in Figure 12.12 on page 12-16).

1. **RdRdy** is asserted low, indicating the external agent is ready to accept a read request.

2. With the system interface in master state, a processor read request is issued by driving a read command on the **SysCmd** bus and a read address on the **SysAD** bus.

3. At the same time, the processor asserts **ValidOut** for one cycle, indicating valid data is present on the **SysCmd** and the **SysAD** buses.

   **Note:** Only one processor read request can be pending at a time.

4. The processor makes an uncompelled change to slave state at the issue cycle of the read request by asserting the **Release** signal for one cycle.

   **Note:** The external agent must not assert the signal **ExtRqst** for the purposes of returning a read response, but rather must wait for the uncompelled change to slave state. The signal **ExtRqst** can be asserted before or during a read response to perform an external request other than a read response.

5. The processor releases the **SysCmd** and the **SysAD** buses one SCycle after the assertion of **Release**.

6. The external agent drives the **SysCmd** and the **SysAD** buses within two cycles after the assertion of **Release**.

Once in slave state (starting at cycle 5 in Figure 12.12), the external agent can return the requested data through a read response. The read response can return the requested data or, if the requested data could not be successfully retrieved, an indication that the returned data is erroneous. If the returned data is erroneous, the processor takes a bus error exception.

**Note:** The R4600/R4700 checks only the error bit for the first doubleword of read response data; all other error bits are ignored.

Figure 12.12 illustrates a processor read request, coupled with an uncompelled change to slave state.

**Note:** Timings for the *SysADC* and *SysCmdP* buses are the same as those of the *SysAD* and *SysCmd* buses, respectively.

The assertion of Release* indicates either an uncompelled change to slave state, or a response to the assertion of ExtRqst*, whereupon the processor accepts either a read response, or any other external request. If any external request other than a read response is issued, the processor performs another uncompelled change to slave state after processing the external request.

The actual read response, where the external agent returns the requested data, is shown later in this chapter.
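
The issue rule and bus handoff for a processor read request can be expressed as cycle arithmetic. This Python sketch combines the steps above with the RdRdy* flow-control rule given under "Processor Request and Flow Control" (a read issues only in an address cycle for which RdRdy* was asserted two cycles earlier). The function and key names are hypothetical, and the read response itself is not modeled.

```python
from typing import Dict, List


def read_request_events(ready: int, rdrdy_low: List[bool]) -> Dict[str, int]:
    """Return the SCycles of the key events of one processor read request.

    ready:     first SCycle in which the processor wants to issue the read
    rdrdy_low: rdrdy_low[t] is True when RdRdy* is asserted (low) in SCycle t
    """
    for t in range(max(ready, 2), len(rdrdy_low)):
        # Issue only if RdRdy* was sampled asserted two cycles earlier.
        if rdrdy_low[t - 2]:
            return {
                # Read on SysCmd, Addr on SysAD; ValidOut* and Release* asserted.
                "issue": t,
                # The processor stops driving SysCmd/SysAD one SCycle later.
                "processor releases": t + 1,
                # The external agent drives the buses within two cycles of Release*.
                "agent drives by": t + 2,
            }
    raise RuntimeError("RdRdy* was not asserted early enough in the simulated window")
```

If RdRdy* is held low, the earliest issue cycle in this model is SCycle 2, simply because the rule needs a sample two cycles back.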

**External Instruction Read Response Time**

The R4600/R4700 accesses the external bus on an instruction cache miss or an uncached reference. The time required for an external read is the sum of the overhead at the beginning and end of the read, the time to drive the address, and the time to receive the response data.

**Instruction Read Latency Steps for System Clock**

The read latency for a system clock in the divide-by-two mode is as follows:

1. The startup overhead is one to two pipeline cycles (PCycle) for the CPU to transfer the address to the pads to be output. The second PCycle is needed if the miss is detected on a PCycle not aligned with the rising edge of SClock.
2. The CPU drives the address on the SysAD bus for two PCycles.
3. The CPU tri-states the SysAD bus for two PCycles.
4. The CPU waits for the main memory to return the data. This is expressed as \( n \times 2 \) PCycles.
5. Main memory drives the first doubleword on the SysAD bus for two PCycles.
6. The remaining three doublewords of instructions are driven on the SysAD bus for \( 3 \times 2 \) PCycles.

**Notes on the Instruction Read Latency Steps:**

- For instruction misses, the pipeline restarts only after all the instructions are returned.
- \( n \) is the total number of idle SClock cycles, including any idle cycles between doublewords. For zero-wait-state systems, \( n = 0 \).

**Example of Instruction Block Read With Zero Wait State**

The following example shows an instruction block read with a zero wait state:

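<table>
<thead>
<tr>
<th>Step</th>
<th>Description</th>
<th>PCycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>CPU overhead for cache miss detection</td>
<td>1-2</td>
</tr>
<tr>
<td>2.</td>
<td>Address driven on SysAD bus</td>
<td>2</td>
</tr>
<tr>
<td>3.</td>
<td>SysAD bus tri-stated</td>
<td>2</td>
</tr>
<tr>
<td>4.</td>
<td>Memory latency to return the data</td>
<td>0*2</td>
</tr>
<tr>
<td>5.</td>
<td>First double word driven on SysAD bus</td>
<td>2</td>
</tr>
<tr>
<td>6.</td>
<td>Remaining three doublewords returned</td>
<td>3*2=6</td>
</tr>
</tbody>
</table>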

Total PCycles: 13-14
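
In divide-by-two mode the six steps reduce to a simple sum: (1 to 2) + 2 + 2 + 2n + 2 + 6 PCycles. The Python sketch below (hypothetical function name) evaluates it and reproduces the 13 to 14 PCycle total for a zero-wait-state system.

```python
from typing import Tuple


def instruction_block_read_pcycles(n: int) -> Tuple[int, int]:
    """Return (min, max) PCycles for an instruction block read in divide-by-two mode.

    n is the total number of idle SClock cycles (0 for a zero-wait-state system).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Steps 2-6: address (2), turnaround (2), memory latency (2n),
    # first doubleword (2), remaining three doublewords (3 * 2).
    fixed = 2 + 2 + 2 * n + 2 + 3 * 2
    # Step 1 costs 1 PCycle, or 2 if the miss is not aligned with SClock.
    return 1 + fixed, 2 + fixed
```

Each additional idle SClock cycle adds two PCycles, because one SClock cycle spans two PCycles in divide-by-two mode.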

**External Data Read Response Time**

The R4600/R4700 accesses the external bus on a data cache miss or an uncached reference. The time required for an external read is the sum of the overhead at the beginning and end of the read, the time to drive the address, and the time to receive the response data.

**Data Read Latency Steps for System Clock**
The read latency for a system clock in the divide-by-two mode is as follows:

1. The startup overhead is one to two pipeline cycles (PCycles) for the CPU to generate the parity for the address to be output. The second PCycle is needed if the miss is detected on a PCycle not aligned with the rising edge of SClock.
2. The CPU drives the address on the SysAD bus for two PCycles.
3. The CPU tri-states the SysAD bus for two PCycles.
4. The CPU waits for the main memory to return the data. This is expressed as $n \times 2$ PCycles, where $n$ is the number of SClock cycles for the first data to be returned in a block read, or the latency for a single read. For a zero-wait-state memory system, $n = 0$.
5. Main memory drives the first doubleword on the SysAD bus for two PCycles.
6. The end of the overhead is two PCycles: one to transfer the data from the pads and generate the parity, and one to write to the register (or cache, if it is cacheable data).

**Notes on the Data Read Latency Steps:**

a. If $n=0$ and the line being replaced is dirty, the CPU takes one to two additional PCycles of overhead to move the dirty data into the write buffer.
b. The additional latency for returning the remaining three data elements should be added in a similar fashion.
c. If the cache line being replaced must be written back, the read request is issued first, and the write is completed afterward.

**Example of Data Single Read With Zero Wait State**

The following example shows a data single read with a zero wait state:

<table>
<thead>
<tr>
<th>Step</th>
<th>Description</th>
<th>PCycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>CPU overhead for cache miss detection</td>
<td>1-2</td>
</tr>
<tr>
<td>2.</td>
<td>Address driven on SysAD bus</td>
<td>2</td>
</tr>
<tr>
<td>3.</td>
<td>SysAD bus tri-stated</td>
<td>2</td>
</tr>
<tr>
<td>4.</td>
<td>Memory latency to return the data</td>
<td>0*2</td>
</tr>
<tr>
<td>5.</td>
<td>First double word driven on SysAD bus</td>
<td>2</td>
</tr>
<tr>
<td>6.</td>
<td>CPU overhead to write the data cache, do the fixup, and then restart</td>
<td>2</td>
</tr>
</tbody>
</table>

Total PCycles: 9-10
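
The same arithmetic applies to a data single read: (1 to 2) + 2 + 2 + 2n + 2 + 2 PCycles, plus note a's extra 1 to 2 PCycles when $n = 0$ and the replaced line is dirty. This Python sketch (hypothetical function name) reproduces the 9 to 10 PCycle example.

```python
from typing import Tuple


def data_single_read_pcycles(n: int, victim_dirty: bool = False) -> Tuple[int, int]:
    """Return (min, max) PCycles for a data single read in divide-by-two mode.

    n is the memory latency in SClock cycles (0 for a zero-wait-state system).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Steps 2-6: address (2), turnaround (2), memory latency (2n),
    # first doubleword (2), register/cache write and restart (2).
    fixed = 2 + 2 + 2 * n + 2 + 2
    # Step 1 costs 1 or 2 PCycles depending on SClock alignment.
    low, high = 1 + fixed, 2 + fixed
    if n == 0 and victim_dirty:
        # Note a: moving the dirty line into the write buffer adds 1-2 PCycles.
        low, high = low + 1, high + 2
    return low, high
```

For a block read, the cycles for the remaining three data elements would be added in the same way, as note b describes.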

**External Cycles for Read Latency**

The external cycles required to obtain the response data are similar to those shown in Figure 12.13. With a larger SClock divisor, the response data takes longer to arrive.

![Figure 12.13 Uncached Read—External Cycles](image-url)
The same operation is shown in greater detail in Figure 12.14. These figures assume the following:

1. Data is returned immediately after the Release* is asserted, and after the bus turn-around cycle (when the CPU tri-states the bus to allow the external agent to drive it).

2. The data meets the setup and hold requirements for the rising edge of the SClock that is identified in the preceding and following figures with an asterisk.

![Figure 12.14 Processor Read Cycle](image)
Signals shown: SClock (sample edge marked with an asterisk), SysAD bus (Addr, Data0), SysCmd bus (Read, NEOD), ValidOut*, ValidIn*, ExtRqst*, Release*, RdRdy*.

**Processor Write Request Protocol**

Processor write requests are issued using one of two protocols.

- Doubleword, partial doubleword, word, or partial word writes use a word\(^1\) write request protocol.
- Block writes use a block write request protocol.

Processor word write requests are issued with the system interface in master state, as described in the following steps. Figure 12.15 shows a processor noncoherent word write request cycle.

1. A processor single word write request is issued by driving a write command on the **SysCmd** bus and a write address on the **SysAD** bus.

2. The processor asserts **ValidOut\***.

3. The processor drives a data identifier on the **SysCmd** bus and data on the **SysAD** bus.

4. The data identifier associated with the data cycle must contain a last data cycle indication. At the end of the cycle, **ValidOut\*** is deasserted.

**Note:** Timings for the **SysADC** and **SysCmdP** buses are the same as those of the **SysAD** and **SysCmd** buses, respectively.

\(^1\) Called word to distinguish it from block request protocol. Data transferred can actually be doubleword, partial doubleword, word, or partial word.

The R4600/R4700 interface requires that WrRdy* be asserted for one clock cycle, two system cycles before a write is issued. In R4000 non-block write compatible mode, each write address/data pair is followed by two null system cycles, so that an external agent that deasserts WrRdy* immediately upon receiving the write that fills its buffer has time to stop the next write; consecutive writes are therefore spaced four system cycles apart. This is illustrated in Figure 12.6 on page 12-7.

An address/data pair every four system cycles does not provide sufficient performance for all applications. For this reason, the R4600/R4700 provides two new protocol options that modify the R4000 back-to-back write protocol to allow an address/data pair every two system cycles. The first protocol, called write re-issue, allows WrRdy* to be deasserted during the address cycle and forces the write to be re-issued. The second, called pipelined writes, leaves the sample point of WrRdy* unchanged and requires that the external agent accept one more write than the R4000 protocol.
The write re-issue protocol is shown in Figure 12.16. A write is issued only when WrRdy* is asserted both two cycles before the address cycle and during the address cycle.

![Figure 12.16 Write re-issue](image)

The pipelined write protocol is shown in Figure 12.17. This protocol maintains the R4000 write issue rule (issue if WrRdy* asserted two cycles prior to the address cycle, for one clock cycle), but simply eliminates the two null cycles between writes. The external agent is then required to accept one more write after it deasserts WrRdy*.

![Figure 12.17 Pipelined Writes](image)

All three write protocols apply to both single writes and block writes. With pipelined writes, for example, a single write can be followed immediately by a block write, which the external agent must accept.
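
The three write issue rules can be compared with a small simulation. This Python sketch is an approximation under stated assumptions: each single write is an address cycle plus one data cycle, a write that cannot issue is retried in the next cycle, and aborted re-issue attempts are not recorded. The protocol names and function name are hypothetical.

```python
from typing import List

PROTOCOLS = ("r4000", "reissue", "pipelined")  # hypothetical names


def write_address_cycles(protocol: str, wrrdy_low: List[bool], writes: int) -> List[int]:
    """Return the SCycles in which write address cycles are accepted.

    wrrdy_low[t] is True when WrRdy* is asserted (low) in SCycle t.
    """
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    # R4000-compatible mode inserts two null cycles after each address/data pair.
    spacing = 4 if protocol == "r4000" else 2
    accepted: List[int] = []
    t = 2  # earliest cycle that has a WrRdy* sample two cycles back
    while len(accepted) < writes and t < len(wrrdy_low):
        # All modes: WrRdy* sampled two cycles before the address cycle.
        ok = wrrdy_low[t - 2]
        if protocol == "reissue":
            # Write re-issue: WrRdy* must also be asserted in the address cycle.
            ok = ok and wrrdy_low[t]
        if ok:
            accepted.append(t)
            t += spacing
        else:
            t += 1  # assumption: retry in the next cycle
    return accepted
```

With WrRdy* held low, R4000 mode accepts writes at cycles 2, 6, 10 and the two new modes at 2, 4, 6. If the agent deasserts WrRdy* from cycle 3 onward, pipelined writes still deliver the write at cycle 4 (the extra write the agent must accept), while write re-issue and R4000 mode do not.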

Processor block write requests are issued with the system interface in master state, as described below; a processor noncoherent block request for eight words of data is illustrated in Figure 12.18 on page 12-22.

1. The processor issues a write command on the **SysCmd** bus and a write address on the **SysAD** bus.
2. The processor asserts **ValidOut***.
3. The processor drives a data identifier on the **SysCmd** bus and data on the **SysAD** bus.
4. The processor asserts **ValidOut*** for a number of cycles sufficient to transmit the block of data.
5. The data identifier associated with the last data cycle must contain a last data cycle indication.
Figure 12.18 illustrates a processor noncoherent block request for eight words of data with a data pattern of DDDD.
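
The command and data identifiers of a block write follow directly from the steps above: one address cycle, then one data cycle per doubleword, with only the last tagged as the end of data. This Python sketch uses the NData and NEOD abbreviations from Table 12.3 and assumes the DDDD pattern (one data cycle per SClock cycle); the function name is hypothetical.

```python
from typing import List, Tuple


def block_write_cycles(doublewords: int = 4) -> List[Tuple[str, str]]:
    """Return (SysCmd, SysAD) contents per cycle of a noncoherent block write."""
    if doublewords < 1:
        raise ValueError("a block write carries at least one doubleword")
    cycles = [("Write", "Addr")]  # address cycle, ValidOut* asserted
    for i in range(doublewords):
        # Every data cycle carries a data identifier; only the last one
        # contains the last-data-cycle indication (NEOD).
        ident = "NEOD" if i == doublewords - 1 else "NData"
        cycles.append((ident, f"Data{i}"))
    return cycles
```

Eight words of data are four doublewords, so the default call yields one address cycle followed by three NData cycles and one NEOD cycle.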

**Processor Request and Flow Control**

The external agent uses RdRdy* to control the flow of processor read requests. Figure 12.19 on page 12-23 illustrates this flow control, as described in the steps below.

1. The processor samples the signal RdRdy* to determine if the external agent is capable of accepting a read request.
2. The signal WrRdy* controls the flow of a processor write request.
3. The processor does not complete the issue of a read request until it issues an address cycle for which RdRdy* was asserted two cycles earlier.
4. The processor does not complete the issue of a write request until it issues an address cycle for which WrRdy* was asserted two cycles earlier.
Figure 12.19 illustrates two processor write requests in which the issue of the second is delayed until WrRdy* is asserted.

**Note:** Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses, respectively.

**External Request Protocols**

External requests can only be issued with the system interface in slave state. An external agent asserts ExtRqst* to arbitrate (see “External Arbitration Protocol” on page 12-24) for the system interface, then waits for the processor to release the system interface to slave state by asserting Release* before the external agent issues an external request. If the system interface is already in slave state (that is, the processor has previously performed an uncompelled change to slave state), the external agent can begin an external request immediately.

After issuing an external request, the external agent must return the system interface to master state. If the external agent does not have any additional external requests to perform, ExtRqst* must be deasserted two cycles after the cycle in which Release* was asserted. For a string of external requests, ExtRqst* remains asserted until the last request cycle, whereupon it is deasserted two cycles after the cycle in which Release* was asserted.

The processor continues to handle external requests as long as ExtRqst* is asserted; however, the processor cannot release the system interface to slave state for a subsequent external request until it has completed the current request. As long as ExtRqst* is asserted, the string of external requests is not interrupted by a processor request.

This section describes the following external request protocols:

- read
- null
- write
- read response

**External Arbitration Protocol**

System interface arbitration uses the signals `ExtRqst*` and `Release*` as described above. Figure 12.20 is a timing diagram of the arbitration protocol, in which slave and master states are shown.

The arbitration cycle consists of the following steps:
1. The external agent asserts `ExtRqst*` when it wishes to submit an external request.
2. The processor waits until it is ready to handle an external request, whereupon it asserts `Release*` for one cycle.
3. The processor sets the `SysAD` and `SysCmd` buses to tri-state.
4. The external agent must begin driving the `SysAD` bus and the `SysCmd` bus two cycles after the assertion of `Release*`.
5. The external agent deasserts `ExtRqst*` two cycles after the assertion of `Release*`, unless the external agent wishes to perform an additional external request.
6. The external agent sets the `SysAD` and the `SysCmd` buses to tri-state at the completion of an external request.

The processor can start issuing a processor request one cycle after the external agent sets the bus to tri-state.

**Note:** Timings for the `SysADC` and `SysCmdP` buses are the same as those of the `SysAD` and `SysCmd` buses, respectively.
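
The arbitration steps translate into fixed offsets from the Release* cycle. This Python sketch is illustrative only: the release latency is passed in because it is processor-dependent, `request_cycles` (how long the agent drives the buses) is a hypothetical parameter, and the one-cycle turnaround at Release* + 1 is inferred from the external read request steps.

```python
from typing import Dict, Optional


def arbitration_events(extrqst: int, release_latency: int, request_cycles: int,
                       more_requests: bool = False) -> Dict[str, Optional[int]]:
    """Return the SCycle of each arbitration event for one external request."""
    if release_latency < 0 or request_cycles < 1:
        raise ValueError("invalid timing parameters")
    release = extrqst + release_latency  # step 2: Release* asserted for one cycle
    agent_drives = release + 2           # step 4: agent drives two cycles after Release*
    agent_tristate = agent_drives + request_cycles  # step 6: first cycle after the request
    return {
        "Release*": release,
        "turnaround": release + 1,       # step 3: buses tri-stated by the processor
        "agent drives": agent_drives,
        # Step 5: deasserted two cycles after Release* unless more requests follow.
        "ExtRqst* deasserted": None if more_requests else release + 2,
        "agent tri-states": agent_tristate,
        # The processor may issue one cycle after the agent tri-states the bus.
        "processor may issue": agent_tristate + 1,
    }
```

For example, with ExtRqst* asserted in cycle 3, a two-cycle release latency and a one-cycle request, Release* falls in cycle 5, the agent drives in cycle 7, and the processor may issue its next request in cycle 9.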

---

**External Read Request Protocol**

External reads are requests for a word of data from a processor internal resource, such as a register. External read requests cannot be split; that is, no other request can occur between the external read request and its read response.
Figure 12.21 shows a timing diagram of an external read request, which consists of the following steps:
1. An external agent asserts `ExtRqst*` to arbitrate for the system interface.
2. The processor releases the system interface to slave state by asserting `Release*` for one cycle and then deasserting it.
3. After `Release*` is deasserted, the `SysAD` and `SysCmd` buses are tri-stated for one cycle.
4. The external agent drives a read request command on the `SysCmd` bus and a read request address on the `SysAD` bus, and asserts `ValidIn*` for one cycle.
5. After the address and command are sent, the external agent releases the `SysCmd` and `SysAD` buses by setting them to tri-state, allowing the processor to drive them. The processor, having accessed the data that is the target of the read, returns this data to the external agent by driving a data identifier on the `SysCmd` bus and the response data on the `SysAD` bus, and asserting `ValidOut*` for one cycle. The data identifier indicates that this is last-data-cycle response data.
6. The system interface is in master state. The processor continues driving the `SysCmd` and `SysAD` buses after the read response is returned.

**Note:** Timings for the `SysADC` and `SysCmdP` buses are the same as those of the `SysAD` and `SysCmd` buses, respectively.

External read requests are only allowed to read a word of data from the processor. The processor response to external read requests for any data element other than a word is undefined.

**External Null Request Protocol**

External null requests require no action from the processor other than to return the system interface to master state.

Figure 12.22 shows a timing diagram of the external null request cycle, which consists of the following steps:

1. The external agent asserts `ExtRqst*` to arbitrate for the system interface.
2. The processor releases the system interface to slave state by asserting `Release*`.
3. The external agent drives a system interface release external null request command on the `SysCmd` bus, and asserts `ValidIn*` for one cycle to return the system interface to master state.
4. The `SysAD` bus is unused (does not contain valid data) during the address cycle associated with an external null request.
5. After the address cycle is issued, the null request is complete.

For a system interface release external null request, the external agent releases the `SysCmd` and `SysAD` buses, and expects the system interface to return to master state.

![System Interface Release External Null Request](image)

**External Write Request Protocol**

External write requests use a protocol identical to the processor single word write protocol, except that the `ValidIn*` signal is asserted instead of `ValidOut*`.

Figure 12.23 on page 12-27 shows a timing diagram of an external write request, which consists of the following steps:

1. The external agent asserts `ExtRqst*` to arbitrate for the system interface.
2. The processor releases the system interface to slave state by asserting `Release*`.
3. The external agent drives a write command on the `SysCmd` bus and a write address on the `SysAD` bus, and asserts `ValidIn*`.
4. The external agent drives a data identifier on the `SysCmd` bus and data on the `SysAD` bus, and asserts `ValidIn*`.
5. The data identifier associated with the data cycle must contain a coherent or noncoherent last data cycle indication.
6. After the data cycle is issued, the write request is complete, and the external agent sets the `SysCmd` and `SysAD` buses to tri-state, allowing the system interface to return to master state.

**Note:** Timings for the `SysADC` and `SysCmdP` buses are the same as those of the `SysAD` and `SysCmd` buses, respectively.

External write requests are only allowed to write a word of data to the processor. Processor behavior in response to an external write request for any data element other than a word is undefined.

**Read Response Protocol**

An external agent must return data to the processor in response to a processor read request by using a read response protocol. A read response protocol consists of the following steps:

1. The external agent waits for the processor to perform an uncompelled change to slave state.
2. The external agent returns the data through a single data cycle or a series of data cycles.
3. After the last data cycle is issued, the read response is complete and the external agent sets the `SysCmd` and `SysAD` buses to a tri-state.
4. The system interface returns to master state.
5. The data identifier for each data cycle must indicate that the data is *response data*.
6. The data identifier associated with the last data cycle must contain a *last data cycle* indication.

**Note:** The processor always performs an uncompelled change to slave state in the same cycle that it issues a read request.

For read responses to noncoherent block read requests (the only read request used in normal R4600/R4700 operation), the response data does not need to identify an initial cache state. The R4600/R4700 automatically assigns the dirty exclusive cache state.

The data identifier associated with a data cycle can indicate that the data transmitted during that cycle is erroneous; however, an external agent must return a data block of the correct size regardless of the fact that the data may be in error. The R4600/R4700 checks only the error bit for the first doubleword of a block; the other error bits for the block are ignored. If an initial erroneous data cycle is detected, the processor takes a bus error at the completion of the data transfer.
Read response data must only be delivered to the processor when a processor read request is pending. The behavior of the processor is undefined when a read response is presented to it and there is no processor read pending.
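
The acceptance rules for read response data can be summarized as a checker. This Python sketch is a hypothetical model, not processor behavior: it counts only cycles with ValidIn* asserted and a data identifier on SysCmd, raises an error for the cases the text calls undefined (too many or too few data items, or a last item not tagged as last), and otherwise reports whether a bus error would be taken based on the first doubleword's error bit alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCycle:
    valid_in: bool    # ValidIn* asserted in this cycle
    is_data_id: bool  # SysCmd carries a data identifier
    last: bool        # identifier has the last-data-cycle indication
    error: bool       # identifier marks the data as erroneous


def check_read_response(cycles: List[DataCycle], expected: int = 4) -> bool:
    """Return True if the processor would take a bus error for this response."""
    accepted = [c for c in cycles if c.valid_in and c.is_data_id]
    if len(accepted) != expected:
        raise ValueError("wrong number of data items: processor behavior undefined")
    if not accepted[-1].last:
        raise ValueError("last data item not tagged as last: behavior undefined")
    # Only the first doubleword's error bit is examined; later ones are ignored.
    return accepted[0].error
```

Because idle cycles without ValidIn* are skipped, the same check covers a reduced-rate response in which the agent drives data only every few cycles.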

Figure 12.24 illustrates a processor word read request followed by a word read response. Figure 12.25 illustrates a read response for a processor block read with the system interface already in slave state. Figure 12.26 illustrates a block read transaction with one wait state.

**Note:** Timings for the SysADC and SysCmdP buses are the same as those of the SysAD and SysCmd buses, respectively.

**Data Rate Control**

The system interface supports a maximum data rate of one doubleword per cycle. The rate at which the processor can transmit data is limited by the rate at which the external agent can accept it.

**Read Data Pattern**

The rate at which data is delivered to the processor can be determined by the external agent—for example, the external agent can drive data and assert ValidIn* every $n$ cycles, instead of every cycle. An external agent can deliver data at any rate it chooses, but must not deliver data to the processor any faster than the processor is capable of receiving it.

The processor accepts a cycle as valid only when ValidIn* is asserted and the SysCmd bus contains a data identifier. If the external agent sends more data items than requested (e.g., a fifth doubleword of read response data with ValidIn* asserted), or if the last data item of a block read (i.e., the fourth doubleword) is not tagged as the last data item, it is an error and the processor's behavior is undefined.
Figure 12.27 shows a read response with reduced data rate and with the system interface in slave state.

---

**Write Data Transfer Patterns**

The write data pattern specifies the pattern the R4600/R4700 uses when writing a block to the external agent. This pattern is specified through the mode bits.

A data pattern is a sequence of letters indicating the **data** and **unused** cycles that repeat to provide the appropriate data rate. For example, the data pattern **DDxx** specifies a repeatable data rate of two doublewords every four cycles, with the last two cycles unused.
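
A data pattern can be evaluated mechanically: its rate is the number of D cycles divided by the pattern length, and repeating it gives the cycle offsets at which each doubleword is driven. This Python sketch (hypothetical function names) confirms the DDxx example: two doublewords every four cycles.

```python
from fractions import Fraction
from typing import List


def _check(pattern: str) -> None:
    if not pattern or set(pattern) - {"D", "x"} or "D" not in pattern:
        raise ValueError("pattern must use only 'D' and 'x' and contain a 'D'")


def pattern_rate(pattern: str) -> Fraction:
    """Doublewords per SClock cycle for a repeating data pattern."""
    _check(pattern)
    return Fraction(pattern.count("D"), len(pattern))


def data_cycle_offsets(pattern: str, doublewords: int = 4) -> List[int]:
    """SClock offsets (0 = first data cycle) at which each doubleword is driven."""
    _check(pattern)
    offsets: List[int] = []
    t = 0
    while len(offsets) < doublewords:
        if pattern[t % len(pattern)] == "D":
            offsets.append(t)
        t += 1
    return offsets
```

For a four-doubleword block, DDxx places data in cycles 0, 1, 4 and 5, with the bus holding the last data value during the unused cycles.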

Table 12.4 lists the maximum processor data rate and the data pattern for each data rate.

<table>
<thead>
<tr>
<th>Maximum Data Transmit Rate (Block Writes)</th>
<th>Data Pattern</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 Double/1 SClock Cycle</td>
<td>DDDD</td>
</tr>
<tr>
<td>2 Doubles/3 SClock Cycles</td>
<td>DDxDDx</td>
</tr>
<tr>
<td>1 Double/2 SClock Cycles</td>
<td>DDxxDDxx</td>
</tr>
<tr>
<td>1 Double/2 SClock Cycles</td>
<td>DxDxDxDx</td>
</tr>
<tr>
<td>2 Doubles/5 SClock Cycles</td>
<td>DDxxxDDxxx</td>
</tr>
<tr>
<td>1 Double/3 SClock Cycles</td>
<td>DDxxxxDDxxxx</td>
</tr>
<tr>
<td>1 Double/3 SClock Cycles</td>
<td>DxxDxxDxxDxx</td>
</tr>
<tr>
<td>1 Double/4 SClock Cycles</td>
<td>DDxxxxxxDDxxxxxx</td>
</tr>
<tr>
<td>1 Double/4 SClock Cycles</td>
<td>DxxxDxxxDxxxDxxx</td>
</tr>
</tbody>
</table>

*Table 12.4 Transmit Data Rates and Patterns*

In Table 12.4 data patterns are specified using the letters **D** and **x**: **D** indicates a data cycle and **x** indicates an unused cycle. During the unused cycles, the data bus will maintain the last data value (D).

**Independent Transmissions on the SysAD Bus**

In most applications, the SysAD bus is a point-to-point connection, running from the processor to a bidirectional registered transceiver residing in an external agent. For these applications, the SysAD bus has only two possible drivers, the processor or the external agent.

Certain applications may require connection of additional drivers and receivers to the SysAD bus, to allow transmissions over the SysAD bus in which the processor is not involved. These are called independent transmissions. To effect an independent transmission, the external agent must coordinate control of the SysAD bus by using the arbitration handshake signals and external null requests.

An independent transmission on the SysAD bus follows this procedure:
1. The external agent requests mastership of the SysAD bus, to issue an external request.
2. The processor releases the system interface to slave state.
3. The external agent then allows the independent transmission to take place on the SysAD bus, making sure that ValidIn* is not asserted while the transmission is occurring.
4. When the transmission is complete, the external agent must issue a system interface release external null request to return the system interface to master state.
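The four steps above can be viewed as a small state machine inside the external agent. The following C sketch models that sequencer one cycle at a time. All state and input names are hypothetical.

The sketch makes two simplifying assumptions:
- ExtRqst* is held asserted while in the requested state, although this is not shown as an output.
- In this model, ValidIn* is asserted only to issue the release external null request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical external-agent states for an independent transmission. */
typedef enum {
    IT_IDLE,          /* processor is master of the system interface   */
    IT_REQUESTED,     /* ExtRqst* asserted, waiting for Release*       */
    IT_TRANSFERRING,  /* independent transfer in progress on SysAD     */
    IT_RELEASING      /* issue the system interface release null request */
} it_state;

typedef struct {
    bool want_transfer;  /* agent has an independent transfer queued */
    bool release_seen;   /* processor asserted Release* (slave state) */
    bool transfer_done;  /* independent transfer has completed        */
} it_inputs;

/* Advance one cycle. *valid_in reports whether the agent asserts
 * ValidIn* this cycle; it must stay deasserted during the transfer
 * (step 3) so the processor ignores the bus traffic. */
it_state it_step(it_state s, it_inputs in, bool *valid_in)
{
    assert(valid_in != NULL);
    *valid_in = false;
    switch (s) {
    case IT_IDLE:
        return in.want_transfer ? IT_REQUESTED : IT_IDLE;        /* step 1 */
    case IT_REQUESTED:
        return in.release_seen ? IT_TRANSFERRING : IT_REQUESTED; /* step 2 */
    case IT_TRANSFERRING:
        return in.transfer_done ? IT_RELEASING : IT_TRANSFERRING; /* step 3 */
    case IT_RELEASING:
        *valid_in = true;   /* drive the release external null request */
        return IT_IDLE;     /* step 4: processor returns to master state */
    }
    return IT_IDLE;
}
```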

System Interface Endianness

The endianness of the system interface is programmed at boot time through the boot-time mode control interface (see chapter 9, Initialization Interface), and remains fixed until the next time the processor boot-time mode bits are read. Software cannot change the endianness of the system interface and the external system; software can set the reverse endian bit to reverse the interpretation of endianness inside the processor, but the endianness of the system interface remains unchanged.

System Interface Cycle Time

The processor specifies minimum and maximum cycle counts for various processor transactions and for the processor response time to external requests. Processor requests themselves are constrained by the system interface request protocol, and request cycle counts can be determined by examining the protocol. The following system interface interactions can vary within minimum and maximum cycle counts:

- waiting period for the processor to release the system interface to slave state in response to an external request (release latency)
- response time for an external request that requires a response (external response latency)

The remainder of this section describes and tabulates the minimum and maximum cycle counts for these system interface interactions.
Release Latency

Release latency is generally defined as the number of cycles the processor can wait to release the system interface to slave state for an external request. When no processor requests are in progress, internal activity can cause the processor to wait some number of cycles before releasing the system interface. Release latency is therefore more specifically defined as the number of cycles that occur between the assertion of ExtRqst* and the assertion of Release*.

There are three categories of release latency:
- Category 1: when the external request signal is asserted two cycles before the last cycle of a processor request.
- Category 2: when the external request signal is not asserted during a processor request, or is asserted during the last cycle of a processor request.
- Category 3: when the processor makes an uncompelled change to slave state.

Table 12.5 summarizes the minimum and maximum release latencies for requests that fall into categories 1, 2 and 3. Note that the maximum and minimum cycle count values are subject to change.

<table>
<thead>
<tr>
<th>Category</th>
<th>Minimum PCycles</th>
<th>Maximum PCycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>4</td>
<td>6</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>24</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 12.5  Release Latency for External Requests

The differences in the minimum and maximum times are due to internal conditions not readily observable externally.
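A bus monitor or testbench could check measured release latencies against Table 12.5. The C sketch below encodes the tabulated bounds. The manual notes that these values are subject to change, so treat the numbers as placeholders to update from the current data sheet. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Release-latency bounds from Table 12.5, in PCycles, by category 1-3. */
static const struct { unsigned min, max; } release_latency[4] = {
    {0, 0},    /* index 0 unused */
    {4, 6},    /* category 1 */
    {4, 24},   /* category 2 */
    {0, 0},    /* category 3: uncompelled change to slave state */
};

/* True if the measured ExtRqst*-to-Release* delay, in PCycles, lies
 * within the tabulated bounds for the given category. */
bool release_latency_ok(unsigned category, unsigned pcycles)
{
    assert(category >= 1 && category <= 3);
    return pcycles >= release_latency[category].min &&
           pcycles <= release_latency[category].max;
}
```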

System Interface Commands and Data Identifiers

System interface commands specify the nature and attributes of any system interface request; this specification is made during the address cycle for the request. System interface data identifiers specify the attributes of data transmitted during a system interface data cycle.

The following sections describe the syntax, that is, the bitwise encoding of system interface commands and data identifiers.

Reserved bits and reserved fields in the command or data identifier should be set to 1 for system interface commands and data identifiers associated with external requests. For system interface commands and data identifiers associated with processor requests, reserved bits and reserved fields in the command and data identifier are undefined.

Command and Data Identifier Syntax

System interface commands and data identifiers are encoded in 9 bits and are transmitted on the SysCmd bus from the processor to an external agent, or from an external agent to the processor, during address and data cycles. Bit 8 (the most-significant bit) of the SysCmd bus determines whether the current content of the SysCmd bus is a command or a data identifier and, therefore, whether the current cycle is an address cycle or a data cycle. For system interface commands, SysCmd(8) must be set to 0. For system interface data identifiers, SysCmd(8) must be set to 1.
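The C helpers below show how decoding logic might tell commands from data identifiers using SysCmd(8). They also extract a bit field written in the manual's SysCmd(hi:lo) notation. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYSCMD_DATA_ID (1u << 8)   /* SysCmd(8): 1 = data identifier */

/* SysCmd(8) = 1: data identifier (data cycle). */
static inline bool syscmd_is_data_id(uint16_t syscmd)
{
    return (syscmd & SYSCMD_DATA_ID) != 0;
}

/* SysCmd(8) = 0: command (address cycle). */
static inline bool syscmd_is_command(uint16_t syscmd)
{
    return !syscmd_is_data_id(syscmd);
}

/* Extract the field SysCmd(hi:lo) from the 9-bit SysCmd value. */
static inline unsigned syscmd_field(uint16_t syscmd, unsigned hi, unsigned lo)
{
    assert(hi < 9 && lo <= hi);
    return (syscmd >> lo) & ((1u << (hi - lo + 1)) - 1u);
}
```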
System Interface Command Syntax

This section describes the SysCmd bus encoding for system interface commands. Figure 12.28 shows a common encoding used for all system interface commands.

Figure 12.28 System Interface Command Syntax Bit Definition

SysCmd(8) must be set to 0 for all system interface commands. SysCmd(7:5) specify the system interface request type, which may be read, write, or null; Table 12.6 lists the encoding of SysCmd(7:5).

<table>
<thead>
<tr>
<th>SysCmd(7:5)</th>
<th>Command</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Read Request</td>
</tr>
<tr>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>2</td>
<td>Write Request</td>
</tr>
<tr>
<td>3</td>
<td>Null Request</td>
</tr>
<tr>
<td>4 - 7</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Table 12.6 Encoding of SysCmd(7:5) for System Interface Commands

SysCmd(4:0) are specific to each type of request and are defined in each of the following sections.
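Putting Table 12.6 together with the common command layout gives a simple C sketch for assembling and decoding command words. Here SysCmd(8) = 0, SysCmd(7:5) holds the type, and SysCmd(4:0) holds the request-specific bits. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Request types encoded in SysCmd(7:5) (Table 12.6). */
enum syscmd_type {
    SYSCMD_READ  = 0,
    SYSCMD_WRITE = 2,
    SYSCMD_NULL  = 3,
};

/* Assemble a command: SysCmd(8) = 0, SysCmd(7:5) = type,
 * SysCmd(4:0) = request-specific bits. */
static inline uint16_t syscmd_command(enum syscmd_type type, unsigned low5)
{
    assert(low5 <= 0x1Fu);
    return (uint16_t)(((unsigned)type << 5) | low5);
}

/* Return the request type from SysCmd(7:5) of a command word. */
static inline unsigned syscmd_command_type(uint16_t cmd)
{
    assert((cmd & 0x100u) == 0);   /* must be a command, not a data ID */
    return (cmd >> 5) & 0x7u;
}
```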

Read Requests

Figure 12.29 shows the format of a SysCmd read request.

Figure 12.29 Read Request SysCmd Bus Bit Definition
Table 12.7, Table 12.8, and Table 12.9 list the encoding of *SysCmd(4:0)* for read requests.

<table>
<thead>
<tr>
<th>SysCmd(4:3)</th>
<th>Read Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 - 1</td>
<td>Reserved</td>
</tr>
<tr>
<td>2</td>
<td>Noncoherent block read</td>
</tr>
<tr>
<td>3</td>
<td>Doubleword, partial doubleword, word, or partial word</td>
</tr>
</tbody>
</table>

**Table 12.7 Encoding of SysCmd(4:3) for Read Requests**

<table>
<thead>
<tr>
<th>SysCmd(2)</th>
<th>Link Address Retained Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Link address not retained</td>
</tr>
<tr>
<td>1</td>
<td>Link address retained</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(1:0)</th>
<th>Read Block Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>8 words</td>
</tr>
<tr>
<td>2 - 3</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

**Table 12.8 Encoding of SysCmd(2:0) for Block Read Request**

<table>
<thead>
<tr>
<th>SysCmd(2:0)</th>
<th>Read Data Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1 byte valid (Byte)</td>
</tr>
<tr>
<td>1</td>
<td>2 bytes valid (Halfword)</td>
</tr>
<tr>
<td>2</td>
<td>3 bytes valid (Tribyte)</td>
</tr>
<tr>
<td>3</td>
<td>4 bytes valid (Word)</td>
</tr>
<tr>
<td>4</td>
<td>5 bytes valid (Quintibyte)</td>
</tr>
<tr>
<td>5</td>
<td>6 bytes valid (Sextibyte)</td>
</tr>
<tr>
<td>6</td>
<td>7 bytes valid (Septibyte)</td>
</tr>
<tr>
<td>7</td>
<td>8 bytes valid (Doubleword)</td>
</tr>
</tbody>
</table>

**Table 12.9 Doubleword, Word, or Partial-word Read Request Data Size Encoding of SysCmd(2:0)**
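Tables 12.6 through 12.9 together determine the full read command. The C sketch below builds it for the two read forms. The function names are hypothetical, and the block size is fixed at 8 words because that is the only non-reserved encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read commands: SysCmd(8) = 0, SysCmd(7:5) = 0 (read request). */

/* Noncoherent block read: SysCmd(4:3) = 2, SysCmd(2) = link address
 * retained, SysCmd(1:0) = 1 (8 words). */
static inline uint16_t read_block_cmd(bool link_retained)
{
    return (uint16_t)((0u << 5) | (2u << 3) |
                      ((link_retained ? 1u : 0u) << 2) | 1u);
}

/* Doubleword/partial-word read: SysCmd(4:3) = 3,
 * SysCmd(2:0) = number of valid bytes minus one (Table 12.9). */
static inline uint16_t read_single_cmd(unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    return (uint16_t)((0u << 5) | (3u << 3) | (nbytes - 1u));
}
```

For example, a full doubleword read encodes as `0x1F`, and a block read without the link address retained encodes as `0x011`.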

**Write Requests**

Figure 12.30 shows the format of a *SysCmd* write request.
Table 12.10 lists the write attributes encoded in bits *SysCmd*(4:3). Table 12.11 lists the block write replacement attributes and block size encoded in bits *SysCmd*(2:0). Table 12.12 lists the data size encoding in bits *SysCmd*(2:0) for doubleword, word, and partial-word write requests.

<table>
<thead>
<tr>
<th>SysCmd(4:3)</th>
<th>Write Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>Reserved</td>
</tr>
<tr>
<td>2</td>
<td>Block write</td>
</tr>
<tr>
<td>3</td>
<td>Doubleword, partial doubleword, word, or partial word</td>
</tr>
</tbody>
</table>

Table 12.10 Write Request Encoding of *SysCmd*(4:3)

<table>
<thead>
<tr>
<th>SysCmd(2)</th>
<th>Cache Line Replacement Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Cache line replaced</td>
</tr>
<tr>
<td>1</td>
<td>Cache line retained</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(1:0)</th>
<th>Write Block Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Reserved</td>
</tr>
<tr>
<td>1</td>
<td>8 words</td>
</tr>
<tr>
<td>2 - 3</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Table 12.11 Block Write Request Encoding of *SysCmd*(2:0)

<table>
<thead>
<tr>
<th>SysCmd(2:0)</th>
<th>Write Data Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1 byte valid (Byte)</td>
</tr>
<tr>
<td>1</td>
<td>2 bytes valid (Halfword)</td>
</tr>
<tr>
<td>2</td>
<td>3 bytes valid (Tribyte)</td>
</tr>
<tr>
<td>3</td>
<td>4 bytes valid (Word)</td>
</tr>
<tr>
<td>4</td>
<td>5 bytes valid (Quintibyte)</td>
</tr>
<tr>
<td>5</td>
<td>6 bytes valid (Sextibyte)</td>
</tr>
<tr>
<td>6</td>
<td>7 bytes valid (Septibyte)</td>
</tr>
<tr>
<td>7</td>
<td>8 bytes valid (Doubleword)</td>
</tr>
</tbody>
</table>

Table 12.12 Doubleword, Word, or Partial-word Write Request Data Size Encoding of *SysCmd*(2:0)
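Write commands follow the same pattern as reads, with SysCmd(7:5) = 2. The C sketch below assembles them from Tables 12.10 through 12.12. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write commands: SysCmd(8) = 0, SysCmd(7:5) = 2 (write request). */

/* Block write: SysCmd(4:3) = 2, SysCmd(2) = cache line retained,
 * SysCmd(1:0) = 1 (8 words). */
static inline uint16_t write_block_cmd(bool line_retained)
{
    return (uint16_t)((2u << 5) | (2u << 3) |
                      ((line_retained ? 1u : 0u) << 2) | 1u);
}

/* Doubleword/partial-word write: SysCmd(4:3) = 3,
 * SysCmd(2:0) = number of valid bytes minus one (Table 12.12). */
static inline uint16_t write_single_cmd(unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    return (uint16_t)((2u << 5) | (3u << 3) | (nbytes - 1u));
}
```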
Null Requests

Figure 12.31 shows the format of a SysCmd null request.

![Figure 12.31 Null Request SysCmd Bus Bit Definition](image)

System interface release external null requests use the null request command. Table 12.13 lists the encoding of SysCmd(4:3) for external null requests. SysCmd(2:0) are reserved for all null requests.

<table>
<thead>
<tr>
<th>SysCmd(4:3)</th>
<th>Null Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>System Interface release</td>
</tr>
<tr>
<td>1 - 3</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Table 12.13  External Null Request Encoding of SysCmd(4:3)
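The release null request is the command an external agent issues to hand the system interface back to the processor. The C sketch below shows its encoding. It assumes that the reserved SysCmd(2:0) are driven to 1, following the rule stated earlier for reserved bits in external requests. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYSCMD_NULL_TYPE     3u    /* SysCmd(7:5) = 3: null request  */
#define SYSCMD_NULL_RELEASE  0u    /* SysCmd(4:3) = 0: SI release    */
#define SYSCMD_RESERVED_2_0  0x7u  /* reserved bits driven to 1      */

static const uint16_t SYSCMD_SI_RELEASE =
    (uint16_t)((SYSCMD_NULL_TYPE << 5) | (SYSCMD_NULL_RELEASE << 3) |
               SYSCMD_RESERVED_2_0);

/* Recognize a release null request, ignoring reserved SysCmd(2:0). */
static inline bool syscmd_is_si_release(uint16_t cmd)
{
    return (cmd & 0x1F8u) ==
           ((SYSCMD_NULL_TYPE << 5) | (SYSCMD_NULL_RELEASE << 3));
}
```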

System Interface Data Identifier Syntax

This section defines the encoding of the SysCmd bus for system interface data identifiers. Figure 12.32 shows a common encoding scheme used for all system interface data identifiers.

![Figure 12.32 Data Identifier SysCmd Bus Bit Definition](image)

SysCmd(8) must be set to 1 for all system interface data identifiers. System interface data identifiers use the format for noncoherent data.

Noncoherent Data

Noncoherent data is defined as follows:

- data that is associated with processor block write requests and processor doubleword, partial doubleword, word, or partial word write requests
- data that is returned in response to a processor noncoherent block read request or a processor doubleword, partial doubleword, word, or partial word read request
- data that is associated with external write requests
- data that is returned in response to an external read request
**Data Identifier Bit Definitions**

*SysCmd(7)* marks the last data element and *SysCmd(6)* indicates whether or not the data is response data, for both processor and external coherent and noncoherent data identifiers. Response data is data returned in response to a read request.

*SysCmd(5)* indicates whether or not the data element is error free. Erroneous data contains an uncorrectable error and is returned to the processor, forcing a bus error. The processor delivers data with the good data bit deasserted if a primary parity error is detected for a transmitted data item.

*SysCmd(4)* indicates to the processor whether to check the data and check bits for this data element.

*SysCmd(3)* is reserved for external data identifiers.

*SysCmd(4:3)* are reserved for noncoherent processor data identifiers.

*SysCmd(2:0)* are reserved for noncoherent data identifiers.

Table 12.14 lists the encoding of *SysCmd(7:3)* for processor data identifiers.

<table>
<thead>
<tr>
<th>SysCmd(7)</th>
<th>Last Data Element Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Last data element</td>
</tr>
<tr>
<td>1</td>
<td>Not the last data element</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(6)</th>
<th>Response Data Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Data is response data</td>
</tr>
<tr>
<td>1</td>
<td>Data is not response data</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(5)</th>
<th>Good Data Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Data is error free</td>
</tr>
<tr>
<td>1</td>
<td>Data is erroneous</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(4:3)</th>
<th>Reserved</th>
</tr>
</thead>
</table>

Table 12.14 Processor Data Identifier Encoding of SysCmd(7:3)
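Note that the fields in Table 12.14 are active-low: a 0 means "last", "response", and "error free". The C sketch below decodes a processor data identifier, for example in an external agent receiving write data. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool last;      /* SysCmd(7) == 0: last data element */
    bool response;  /* SysCmd(6) == 0: response data     */
    bool good;      /* SysCmd(5) == 0: data is error free */
} data_id_fields;

static data_id_fields decode_processor_data_id(uint16_t syscmd)
{
    assert(syscmd & 0x100u);           /* SysCmd(8) must be 1 */
    data_id_fields f;
    f.last     = ((syscmd >> 7) & 1u) == 0;
    f.response = ((syscmd >> 6) & 1u) == 0;
    f.good     = ((syscmd >> 5) & 1u) == 0;
    return f;                          /* SysCmd(4:0) reserved: ignored */
}
```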
Table 12.15 lists the encoding of **SysCmd(7:3)** for external data identifiers.

<table>
<thead>
<tr>
<th>SysCmd(7)</th>
<th>Last Data Element Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Last data element</td>
</tr>
<tr>
<td>1</td>
<td>Not the last data element</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(6)</th>
<th>Response Data Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Data is response data</td>
</tr>
<tr>
<td>1</td>
<td>Data is not response data</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(5)</th>
<th>Good Data Indication</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Data is error free</td>
</tr>
<tr>
<td>1</td>
<td>Data is erroneous</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(4)</th>
<th>Data Checking Enable</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Check the data and check bits</td>
</tr>
<tr>
<td>1</td>
<td>Do not check the data and check bits</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SysCmd(3)</th>
<th>Reserved</th>
</tr>
</thead>
</table>

**Table 12.15 External Data Identifier Encoding of SysCmd(7:3)**
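An external agent returning read response data drives a data identifier with each doubleword. The C sketch below assembles one from Table 12.15. It assumes the reserved SysCmd(3:0) are driven to 1, as the earlier rule on reserved bits for external requests suggests. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint16_t encode_external_data_id(bool last, bool response,
                                        bool erroneous, bool check)
{
    uint16_t id = 0x100u;               /* SysCmd(8) = 1: data identifier   */
    if (!last)      id |= 1u << 7;      /* 0 = last data element            */
    if (!response)  id |= 1u << 6;      /* 0 = response data                */
    if (erroneous)  id |= 1u << 5;      /* 1 = data is erroneous            */
    if (!check)     id |= 1u << 4;      /* 1 = do not check data/check bits */
    id |= 0x0Fu;                        /* reserved SysCmd(3:0) set to 1    */
    return id;
}
```

For a four-doubleword block read response, the agent would pass `last = false` for the first three doublewords and `last = true` for the fourth.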

**System Interface Addresses**

System interface addresses are full 36-bit physical addresses presented on the least-significant 36 bits (bits 35 through 0) of the **SysAD** bus during address cycles; the remaining bits of the **SysAD** bus are unused during address cycles.

**Addressing Conventions**

Addresses associated with doubleword, partial doubleword, word, or partial word transactions are aligned for the size of the data element. The system uses the following address conventions:

- Addresses associated with block requests are aligned to doubleword boundaries; that is, the low-order 3 bits of address are 0.
- Doubleword requests set the low-order 3 bits of address to 0.
- Word requests set the low-order 2 bits of address to 0.
- Halfword requests set the low-order bit of address to 0.
- Byte, tribyte, quintibyte, sextibyte, and septibyte requests use the byte address.
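The conventions above translate directly into an alignment check. The C sketch below could serve in a bus monitor. It checks only the rules listed and does not verify that odd-sized transfers stay within the permitted offsets of Table 12.19. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check an address against the addressing conventions. `nbytes` is the
 * transfer size (1-8); block requests pass is_block = true. */
static bool sysad_addr_ok(uint64_t addr, unsigned nbytes, bool is_block)
{
    assert(nbytes >= 1 && nbytes <= 8);
    if (addr >> 36)                    /* must fit in SysAD(35:0) */
        return false;
    if (is_block || nbytes == 8)
        return (addr & 0x7u) == 0;     /* low-order 3 bits are 0 */
    if (nbytes == 4)
        return (addr & 0x3u) == 0;     /* low-order 2 bits are 0 */
    if (nbytes == 2)
        return (addr & 0x1u) == 0;     /* low-order bit is 0 */
    return true;                       /* 1, 3, 5, 6, 7 bytes: byte address */
}
```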

**Subblock Ordering**

The order in which data is returned in response to a processor block read request is *subblock ordering*. In subblock ordering, the processor delivers the address of the requested doubleword within the block. An external agent must return the block of data using subblock ordering, starting with the addressed doubleword.

A block of data elements (whether bytes, halfwords, words, or doublewords) can be retrieved from storage in two ways: in sequential order, or using a subblock order. This section describes these retrieval methods, with an emphasis on subblock ordering. Note that the R4600/R4700 uses subblock ordering only for block reads.
Example of Sequential Ordering
Sequential ordering retrieves the data elements of a block in serial, or sequential, order.
Figure 12.33 shows a sequential order in which DW0 is taken first and DW3 is taken last.

![Figure 12.33 Retrieving a Data Block in Sequential Order](image)

Example of Subblock Ordering
Subblock ordering allows the system to define the order in which the data elements are retrieved. The smallest data element of a block transfer for the R4600/R4700 is a doubleword, and Figure 12.34 shows the retrieval of a block of data that consists of 4 doublewords (the cache line size is 8 words), in which DW2 is taken first.

![Figure 12.34 Retrieving Data in a Subblock Order](image)

Using the subblock ordering shown in Figure 12.34, the doubleword at the target address is retrieved first (DW2), followed by the remaining doubleword (DW3) in this quadword. Next, the quadword that fills out the octalword is retrieved in the same order as the prior quadword (in this case, DW0 followed by DW1).
Subblock ordering may be easier to understand by looking at how the address of each doubleword is generated as it is retrieved. The subblock ordering logic generates this address by executing a bit-wise exclusive-OR (XOR) of the starting block address with the output of a binary counter that increments with each doubleword, starting at doubleword zero (00₂).

Using this scheme, Table 12.16, Table 12.17, and Table 12.18 list the subblock ordering of doublewords for an 8-word block, based on three different starting-block addresses: 10₂, 11₂, and 01₂. The subblock ordering is generated by an XOR of the subblock address (10₂, 11₂, or 01₂) with the binary count of the doubleword (00₂ through 11₂). Thus, the third doubleword retrieved from a block of data with a starting address of 10₂ is found by taking the XOR of address 10₂ with the third binary count, 10₂. The result is 00₂, or DW0 (shown in Table 12.16).

<table>
<thead>
<tr>
<th>Cycle</th>
<th>Starting Block Address</th>
<th>Binary Count</th>
<th>Doubleword Retrieved</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>10</td>
<td>00</td>
<td>10</td>
</tr>
<tr>
<td>2</td>
<td>10</td>
<td>01</td>
<td>11</td>
</tr>
<tr>
<td>3</td>
<td>10</td>
<td>10</td>
<td>00</td>
</tr>
<tr>
<td>4</td>
<td>10</td>
<td>11</td>
<td>01</td>
</tr>
</tbody>
</table>

Table 12.16  Sequence of Doublewords Transferred Using Subblock Ordering: Address 10₂

<table>
<thead>
<tr>
<th>Cycle</th>
<th>Starting Block Address</th>
<th>Binary Count</th>
<th>Doubleword Retrieved</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>11</td>
<td>00</td>
<td>11</td>
</tr>
<tr>
<td>2</td>
<td>11</td>
<td>01</td>
<td>10</td>
</tr>
<tr>
<td>3</td>
<td>11</td>
<td>10</td>
<td>01</td>
</tr>
<tr>
<td>4</td>
<td>11</td>
<td>11</td>
<td>00</td>
</tr>
</tbody>
</table>

Table 12.17  Sequence of Doublewords Transferred Using Subblock Ordering: Address 11₂

<table>
<thead>
<tr>
<th>Cycle</th>
<th>Starting Block Address</th>
<th>Binary Count</th>
<th>Doubleword Retrieved</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>01</td>
<td>00</td>
<td>01</td>
</tr>
<tr>
<td>2</td>
<td>01</td>
<td>01</td>
<td>00</td>
</tr>
<tr>
<td>3</td>
<td>01</td>
<td>10</td>
<td>11</td>
</tr>
<tr>
<td>4</td>
<td>01</td>
<td>11</td>
<td>10</td>
</tr>
</tbody>
</table>

Table 12.18  Sequence of Doublewords Transferred Using Subblock Ordering: Address 01₂
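The XOR rule behind Tables 12.16 through 12.18 is short enough to express directly. The C sketch below assumes an 8-word (32-byte) block, as described above. The first function generates the doubleword order. The second gives the byte address of the *n*-th doubleword returned for a block read. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword indices of a 4-doubleword block in subblock order:
 * index = starting doubleword XOR binary count (0..3). */
static void subblock_order(unsigned start_dw, unsigned order[4])
{
    assert(start_dw < 4);
    for (unsigned count = 0; count < 4; ++count)
        order[count] = start_dw ^ count;
}

/* Byte address of the n-th doubleword returned for a block read whose
 * address cycle carried `addr` (the requested doubleword). */
static uint64_t subblock_addr(uint64_t addr, unsigned n)
{
    assert(n < 4);
    uint64_t block = addr & ~(uint64_t)0x1F;    /* 32-byte block base */
    unsigned start = (unsigned)(addr >> 3) & 0x3u;
    return block | ((uint64_t)(start ^ n) << 3);
}
```

With a starting doubleword of 2, `subblock_order` yields 2, 3, 0, 1, which matches Table 12.16.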

For block write requests, the processor always delivers the address of the doubleword at the beginning of the block; the processor delivers data beginning with the doubleword at the beginning of the block and progresses sequentially through the doublewords that form the block.
During data cycles, the valid byte lanes depend upon the position of the data with respect to the aligned doubleword (this may be a byte, halfword, tribyte, quadbyte/word, quintibyte, sextibyte, septibyte, or an octalbyte/doubleword). For example, in little-endian mode, on a byte request where the address modulo 8 is 0, **SysAD(7:0)** are valid during the data cycles.

Table 12.19 shows the byte lanes used for partial word transfers for both little and big endian.

<table>
<thead>
<tr>
<th># Bytes</th>
<th>Address Mod 8</th>
<th>SysAD byte lanes used (big endian)</th>
<th>SysAD byte lanes used (little endian)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 (000)</td>
<td>0</td>
<td>63:56</td>
<td>7:0</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>55:48</td>
<td>15:8</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>47:40</td>
<td>23:16</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>39:32</td>
<td>31:24</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>31:24</td>
<td>39:32</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>23:16</td>
<td>47:40</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>15:8</td>
<td>55:48</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>7:0</td>
<td>63:56</td>
</tr>
<tr>
<td>2 (001)</td>
<td>0</td>
<td>63:48</td>
<td>15:0</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>47:32</td>
<td>31:16</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>31:16</td>
<td>47:32</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>15:0</td>
<td>63:48</td>
</tr>
<tr>
<td>3 (010)</td>
<td>0</td>
<td>63:40</td>
<td>23:0</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>55:32</td>
<td>31:8</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>31:8</td>
<td>55:32</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>23:0</td>
<td>63:40</td>
</tr>
<tr>
<td>4 (011)</td>
<td>0</td>
<td>63:32</td>
<td>31:0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>31:0</td>
<td>63:32</td>
</tr>
<tr>
<td>5 (100)</td>
<td>0</td>
<td>63:24</td>
<td>39:0</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>39:0</td>
<td>63:24</td>
</tr>
<tr>
<td>6 (101)</td>
<td>0</td>
<td>63:16</td>
<td>47:0</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>47:0</td>
<td>63:16</td>
</tr>
<tr>
<td>7 (110)</td>
<td>0</td>
<td>63:8</td>
<td>55:0</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>55:0</td>
<td>63:8</td>
</tr>
<tr>
<td>8 (111)</td>
<td>0</td>
<td>63:0</td>
<td>63:0</td>
</tr>
</tbody>
</table>

Table 12.19 Partial Word Transfer Byte Lane Usage
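The lane assignment in Table 12.19 follows from where byte offset *k* of the aligned doubleword sits on SysAD. The little-endian mapping matches the example in the text, where offset 0 uses SysAD(7:0). The big-endian mapping, where offset 0 uses SysAD(63:56), is assumed here from the conventional MIPS byte numbering. The C sketch below computes the contiguous bit range for a transfer. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* SysAD bit range [*hi:*lo] carrying an `nbytes`-byte transfer at byte
 * offset `offset` (address mod 8). Little endian puts offset k on bits
 * 8k+7:8k; big endian puts it on bits 63-8k:56-8k. */
static void sysad_lanes(unsigned nbytes, unsigned offset, bool big_endian,
                        unsigned *hi, unsigned *lo)
{
    assert(nbytes >= 1 && nbytes <= 8 && offset + nbytes <= 8);
    if (big_endian) {
        *hi = 63u - 8u * offset;
        *lo = 64u - 8u * (offset + nbytes);
    } else {
        *hi = 8u * (offset + nbytes) - 1u;
        *lo = 8u * offset;
    }
}
```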
Processor Internal Address Map

External reads and writes provide access to processor internal resources that may be of interest to an external agent. The processor decodes bits `SysAD(6:0)` of the address associated with an external read or write request to determine which processor internal resource is the target.

However, the R4600/R4700 does not contain any resources that are *readable* through an external read request. Therefore, in response to an external read request the processor returns undefined data and a data identifier with its *Erroneous Data* bit, `SysCmd(5)`, set.

The *Interrupt* register is the only processor internal resource available for *write* access by an external request. The *Interrupt* register is accessed by an external write request with an address of 000₂ on bits 6:4 of the `SysAD` bus.

The interrupt register is described in detail in Chapter 13, “R4600/R4700 Processor Interrupts.”
Introduction

The R4600/R4700 processor supports the following interrupts: six hardware interrupts, one internal “timer interrupt,” two software interrupts, and one nonmaskable interrupt. The processor takes an exception on any interrupt.

This chapter describes the six hardware and single nonmaskable interrupts. A description of the software and the timer interrupts can be found in Chapter 5. CPU exception processing is also described in Chapter 5. Floating-point exception processing is described in Chapter 6.

Hardware Interrupts

The six CPU hardware interrupts can be caused by external write requests to the R4600/R4700, or can be caused through dedicated interrupt pins. These pins are latched into an internal register by the rising edge of SClock.

Nonmaskable Interrupt (NMI)

The nonmaskable interrupt is caused either by an external write request to the R4600/R4700 or by a dedicated pin in the R4600/R4700. This pin is latched into an internal register by the rising edge of SClock.

Asserting Interrupts

External writes to the CPU are directed to various internal resources based on an internal address map of the processor. When SysAD[6:4] = 0 during the address cycle of an external write request, the write is directed to an architecturally transparent register called the Interrupt register. This register is available for external writes but not for external reads.

During a data cycle, SysAD[22:16] are the write enables for the seven individual Interrupt register bits (0 = disabled, 1 = enabled) and SysAD[6:0] are the values to be written into these bits (0 = no interrupt, 1 = interrupt). This allows any subset of the Interrupt register to be set or cleared with a single write request. Figure 13.1 shows the mechanics of an external write to the Interrupt register.

![Figure 13.1 Interrupt Register Bits and Enables](image-url)
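The enable/value split means an external agent can set or clear any subset of the seven Interrupt register bits in one write. The C sketch below builds the SysAD data-cycle value and models how the register would update. Bit 6 is the NMI bit. The function names are hypothetical, and zeroing the value bits that are not enabled is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* SysAD data for an Interrupt register write: SysAD(22:16) = write
 * enables, SysAD(6:0) = values (1 = interrupt, 0 = no interrupt). */
static uint64_t intreg_write_data(unsigned mask, unsigned value)
{
    assert(mask <= 0x7Fu && value <= 0x7Fu);
    return ((uint64_t)mask << 16) | (uint64_t)(value & mask);
}

/* Model the register update: only enabled bits take the new value. */
static unsigned intreg_apply(unsigned reg, uint64_t data)
{
    unsigned enables = (unsigned)(data >> 16) & 0x7Fu;
    unsigned values  = (unsigned)data & 0x7Fu;
    return (reg & ~enables & 0x7Fu) | (values & enables);
}
```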
Figure 13.2 shows how the R4600/R4700 interrupts are readable through the *Cause* register. The interrupt bits, Int*(5:0), are latched into the internal register by the rising edge of SClock.

- Bit 5 of the *Interrupt* register in the R4600/R4700 is ORed with the Int*(5) pin and then multiplexed with the internal *Timer Interrupt* signal. This result is directly readable as bit 15 of the *Cause* register.
- Bits 4:0 of the *Interrupt* register are bit-wise ORed with the current value of the interrupt pins Int*(4:0), and the result is directly readable as bits 14:10 of the *Cause* register.
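The C sketch below models the *Cause*(15:10) derivation just described. It rests on two assumptions that the text does not state:
- `pins` holds the latched Int*(5:0) already converted to active-high (1 = asserted).
- `timer_selected` is a hypothetical stand-in for whatever selects the *Timer Interrupt* onto bit 15.

The function name is also hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t cause_ip_bits(unsigned intreg, unsigned pins,
                              bool timer_int, bool timer_selected)
{
    assert(intreg <= 0x7Fu && pins <= 0x3Fu);
    unsigned merged = (intreg | pins) & 0x3Fu;          /* bit-wise OR, bits 5:0 */
    uint32_t cause = (uint32_t)(merged & 0x1Fu) << 10;  /* 4:0 -> Cause 14:10 */
    bool bit15 = timer_selected ? timer_int : ((merged >> 5) & 1u);
    if (bit15)
        cause |= 1u << 15;                              /* Cause bit 15 */
    return cause;
}
```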

Figure 13.3 shows the internal derivation of the *NMI* signal, for the R4600/R4700 processor.

The *NMI* pin is latched into an internal register by the rising edge of SClock. Bit 6 of the *Interrupt* register is then ORed with the inverted value of *NMI* to form the nonmaskable interrupt. An NMI is caused only by the falling edge of the latched signal, not by its level.
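The C sketch below models this edge-triggered behavior, calling `nmi_clock` once per SClock rising edge. It assumes the edge detection applies to the combined signal: Interrupt register bit 6 ORed with the inverted, latched NMI* pin. A falling edge on the pin then appears as a rising edge of that combined signal. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool prev_level;   /* previous value of the combined active-high signal */
} nmi_detector;

/* `nmi_n_pin` is the latched NMI* level (true = high). Returns true on
 * the cycle an NMI should be raised. */
static bool nmi_clock(nmi_detector *d, bool nmi_n_pin, bool intreg_bit6)
{
    assert(d != NULL);
    bool level = intreg_bit6 || !nmi_n_pin;   /* OR with inverted NMI* */
    bool fire = level && !d->prev_level;      /* edge, not level */
    d->prev_level = level;
    return fire;
}
```

Holding NMI* low raises only one NMI. Another NMI requires the signal to return high first.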
Figure 13.4 shows the masking of the R4600/R4700 interrupt signal.

- **Cause** register bits 15:8 (IP7-IP0) are AND-ORed with **Status** register interrupt mask bits 15:8 (IM7-IM0) to mask individual interrupts.
- **Status** register bit 0 is a global Interrupt Enable (IE). It is ANDed with the output of the AND-OR logic to produce the R4600/R4700 interrupt signal.

![Figure 13.4 Masking of the R4600/R4700 Interrupts](image-url)
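The masking logic in Figure 13.4 reduces to a few bit operations on the *Cause* and *Status* values. The C sketch below models only what the bullets above describe: IP7-IP0 masked by IM7-IM0, then gated by IE. Any other *Status* bits that may also affect interrupt recognition are deliberately left out. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cause IP7-IP0 (bits 15:8) AND Status IM7-IM0 (bits 15:8), ORed
 * together, then ANDed with Status IE (bit 0). */
static bool interrupt_pending(uint32_t cause, uint32_t status)
{
    bool any_unmasked = (cause & status & 0xFF00u) != 0;
    bool ie = (status & 0x1u) != 0;
    return any_unmasked && ie;
}
```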
Introduction
This chapter describes the Error Checking mechanism used in the R4600/R4700 processor.

Error Checking in the Processor
Error checking codes allow the processor to detect and sometimes correct errors made when moving data from one place to another.
Two major types of data errors can occur in data transmission:
• hard errors, which are permanent, arise from broken interconnects, internal shorts, or open leads
• soft errors, which are transient, are caused by system noise, power surges, and alpha particles.
Hard errors must be corrected by physical repair of the damaged equipment and restoration of data from backup. Soft errors can be corrected by using error checking and correcting codes.

Types of Error Checking
The R4600/R4700 uses parity (error detection only).

Parity Error Detection
Parity is the simplest error detection scheme. Appending a single bit, called a parity bit, to an item of data allows single-bit errors to be detected; however, these errors cannot be corrected.
There are two types of parity:
• Odd parity sets the parity bit to 1 when the data contains an even number of 1s, making the total number of 1s (including the parity bit) odd.
• Even parity sets the parity bit to 1 when the data contains an odd number of 1s, making the total number of 1s (including the parity bit) even.
Odd and even parity are shown in the example below:

<table>
<thead>
<tr>
<th>Data(3:0)</th>
<th>Odd Parity Bit</th>
<th>Even Parity Bit</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 0 1 0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0 0 0 0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1 1 1 1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1 1 0 1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

The first row of the example above shows a single bit in Data(3:0) with a value of 1; this bit is Data(1).
• In even parity, the parity bit is set to 1. This makes 2 (an even number) the total number of bits with a value of 1.
• In odd parity, the parity bit is set to 0 to keep the total number of 1-value bits odd; in the case shown above, that total is the single bit Data(1).

The example below shows odd and even parity bits for various data values:

<table>
<thead>
<tr>
<th>Data(3:0)</th>
<th>Odd Parity Bit</th>
<th>Even Parity Bit</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 1 1 0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0 0 0 0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1 1 1 1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1 1 0 1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>

Parity allows single-bit error detection, but it does not indicate which bit is in error. For example, suppose an odd-parity value of 00011 arrives. The last bit is the parity bit, and since odd parity demands an odd number (1, 3, 5) of 1s, this data is in error: it has an even number of 1s. However, it is impossible to tell which bit is in error.
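The parity values in the tables above, and the 00011 example, can be reproduced with a short C sketch. The even parity bit is the XOR of all data bits, and the odd parity bit is its complement. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Even parity bit: 1 when `data` has an odd number of 1s, so the total
 * including the parity bit is even. */
static unsigned even_parity(uint64_t data)
{
    unsigned p = 0;
    while (data) {
        p ^= 1u;
        data &= data - 1;   /* clear the lowest set bit */
    }
    return p;
}

/* Odd parity bit: the complement of the even parity bit. */
static unsigned odd_parity(uint64_t data)
{
    return even_parity(data) ^ 1u;
}
```

To check a received odd-parity value, count the 1s in data plus parity bit together. For 00011 the count is even, so `even_parity(0x3)` returns 0, and the value is flagged as bad.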
**Error Checking Operation**

The processor uses parity to verify data correctness as it passes data between the system interface and the primary caches.

**System Interface**

The processor generates correct check bits for doubleword, word, or partial-word data transmitted to the system interface. As it checks for data correctness, the processor passes data check bits from the primary cache directly to the system interface without changing them.

The processor does not check data received from the system interface for external writes. An external agent can prevent the processor from checking read response data from the system interface by setting the `NChck` bit in the data identifier.

For cache refill, if the NChck bit is set, the CPU will generally correct parity before placing data into the cache. The R4600/R4700 checks parity only for the first doubleword returned on a block instruction fetch, that is, the doubleword containing the instruction that missed in the cache. This doubleword is checked just as if it had been read out of the ICache, using a byte parity check. For single reads with the NChck bit set, the CPU checks parity on all 64 bits, even if the transfer size is smaller.

When the R4600/R4700 checks parity, it does not actually regenerate the word parity; rather, it turns the byte parity supplied by the system into word parity by XORing the byte parity bits in groups of four. As a result, if the system supplies bad byte parity, bad word parity is written into the cache. This is done to be consistent with what happens in the DCache.
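The C sketch below shows why XORing four byte parities yields word parity. It also shows how a bad byte parity propagates into a bad word parity. It assumes even parity and a hypothetical packing in which bit *i* of the byte-parity value covers byte lane *i* (SysAD bits 8i+7:8i). Neither detail is specified in this section, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Even parity of one byte (XOR of its 8 bits). */
static unsigned byte_parity(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* Byte parities for a doubleword: bit i covers byte lane i. */
static unsigned byte_parities(uint64_t dw)
{
    unsigned p = 0;
    for (unsigned i = 0; i < 8; ++i)
        p |= byte_parity((uint8_t)(dw >> (8 * i))) << i;
    return p;
}

/* Fold eight byte-parity bits into two word-parity bits by XORing them
 * in groups of four: bit 0 covers SysAD(31:0), bit 1 covers SysAD(63:32).
 * The fold does not look at the data, so a bad byte parity carries
 * straight through into a bad word parity. */
static unsigned word_parities_from_bytes(unsigned bp)
{
    assert(bp <= 0xFFu);
    unsigned lo = byte_parity((uint8_t)(bp & 0x0Fu));
    unsigned hi = byte_parity((uint8_t)(bp >> 4));
    return lo | (hi << 1);
}
```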

The processor does not check addresses received from the system interface and does not generate correct check bits for addresses transmitted to the system interface.

The processor does not contain a data corrector; instead, the processor takes a cache error exception when it detects an error based on data check bits. Software is responsible for error handling.

**System Interface Command Bus**

In the R4600/R4700 processor, the system interface command bus has no parity. The processor always drives `SysCmdP` to zero during CPU valid cycles, and it does not check `SysCmdP` when the system interface is in slave state.

### Summary of Error Checking Operations

Error Checking operations are summarized in Table 14.1 and Table 14.2.

<table>
<thead>
<tr>
<th>Bus</th>
<th>Uncached Load</th>
<th>Uncached Store</th>
<th>Primary Cache Load from System Interface</th>
<th>Primary Cache Write to System Interface</th>
<th>Cache Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>Processor Data</td>
<td>From System Interface</td>
<td>Not Checked</td>
<td>From System Interface unchanged</td>
<td>Checked; Trap on Error</td>
<td>Check on cache write-back; Trap on Error</td>
</tr>
<tr>
<td>System Interface Address/Command and Check Bits: Transmit</td>
<td>Not Generated</td>
<td>Not Generated</td>
<td>Not Generated</td>
<td>Not Generated</td>
<td>Not Generated</td>
</tr>
<tr>
<td>System Interface Address/Command and Check Bits: Receive</td>
<td>Not Checked</td>
<td>NA</td>
<td>Not Checked</td>
<td>NA</td>
<td>NA</td>
</tr>
<tr>
<td>System Interface Data</td>
<td>Checked; Trap on Error</td>
<td>From Processor</td>
<td>Checked; Trap on Error</td>
<td>From Primary Cache</td>
<td>From Primary Cache</td>
</tr>
<tr>
<td>System Interface Data Check Bits</td>
<td>Checked; Trap on Error</td>
<td>Generated</td>
<td>Checked; Trap on Error</td>
<td>From Primary Cache</td>
<td>From Primary Cache</td>
</tr>
</tbody>
</table>

Table 14.1 Error Checking and Correcting Summary for Internal Transactions

<table>
<thead>
<tr>
<th>Bus</th>
<th>Read Request</th>
<th>Write Request</th>
</tr>
</thead>
<tbody>
<tr>
<td>Processor Data</td>
<td>NA</td>
<td>NA</td>
</tr>
<tr>
<td>System Interface Address, Command, and Check Bits: Transmit</td>
<td>Generated</td>
<td>NA</td>
</tr>
<tr>
<td>System Interface Address, Command, and Check Bits: Receive</td>
<td>Not Checked</td>
<td>Not Checked</td>
</tr>
<tr>
<td>System Interface Data</td>
<td>From Processor</td>
<td>Checked; Trap on Error</td>
</tr>
<tr>
<td>System Interface Data Check Bits</td>
<td>Generated</td>
<td>Checked; Trap on Error</td>
</tr>
</tbody>
</table>

Table 14.2 Error Checking and Correcting Summary for External Transactions

Introduction

This appendix provides a detailed description of the operation of each R4600/R4700 instruction. The instructions are listed in alphabetical order.

Exceptions that may occur due to the execution of each instruction are listed after the description of each instruction. Descriptions of the immediate cause and manner of handling exceptions are omitted from the instruction descriptions in this appendix.

Figures at the end of this appendix list the bit encoding for the constant fields of each instruction, and the bit encoding for each individual instruction is included with that instruction.

Instruction Classes

CPU instructions are divided into the following classes:

• **Load and Store** instructions move data between memory and general registers. They are all I-type instructions, since the only addressing mode supported is *base register + 16-bit immediate offset*.

• **Computational** instructions perform arithmetic, logical and shift operations on values in registers. They occur in both R-type (both operands are registers) and I-type (one operand is a 16-bit immediate) formats.

• **Jump and Branch** instructions change the control flow of a program. Jumps are made either to absolute 26-bit word addresses (J-type format) or, for returns and dispatches, to register addresses (R-type). Branches have 16-bit offsets relative to the program counter (I-type).

  **Jump and Link** instructions save their return address in register 31.

• **Coprocessor** instructions perform operations in the coprocessors. Coprocessor loads and stores are I-type. Coprocessor computational instructions have coprocessor-dependent formats (see the FPU instructions in Appendix B). Coprocessor zero (CP0) instructions manipulate the memory management and exception handling facilities of the processor.

• **Special** instructions perform a variety of tasks, including movement of data between special and general registers, trap, and breakpoint. They are always R-type.

Instruction Formats

Every CPU instruction consists of a single 32-bit word aligned on a word boundary. The major instruction formats are shown in Figure A.1.

<table>
<thead>
<tr>
<th>Format</th>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>I-Type (Immediate)</td>
<td>op</td>
<td>rs</td>
<td>rt</td>
<td colspan="3">immediate</td>
</tr>
<tr>
<td>J-Type (Jump)</td>
<td>op</td>
<td colspan="5">target</td>
</tr>
<tr>
<td>R-Type (Register)</td>
<td>op</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>shamt</td>
<td>funct</td>
</tr>
</tbody>
</table>

- **op**: 6-bit operation code
- **rs**: 5-bit source register specifier
- **rt**: 5-bit target (source/destination) or branch condition
- **immediate**: 16-bit immediate, branch displacement or address displacement
- **target**: 26-bit jump target address
- **rd**: 5-bit destination register specifier
- **shamt**: 5-bit shift amount
- **funct**: 6-bit function field
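
The field layout in Figure A.1 can be expressed as shift-and-mask operations. In this C sketch, the struct and function names are hypothetical. Every field is extracted unconditionally, and which fields are meaningful depends on the instruction's format, as determined from `op`.

```c
#include <stdint.h>

/* All fields from Figure A.1; only those of the actual format are meaningful. */
typedef struct {
    unsigned op, rs, rt, rd, shamt, funct;
    uint16_t immediate;
    uint32_t target;
} mips_fields;

static mips_fields decode_fields(uint32_t insn)
{
    mips_fields f;
    f.op        = (insn >> 26) & 0x3Fu;          /* bits 31..26, all formats */
    f.rs        = (insn >> 21) & 0x1Fu;          /* bits 25..21, I/R-type    */
    f.rt        = (insn >> 16) & 0x1Fu;          /* bits 20..16, I/R-type    */
    f.rd        = (insn >> 11) & 0x1Fu;          /* bits 15..11, R-type      */
    f.shamt     = (insn >> 6)  & 0x1Fu;          /* bits 10..6,  R-type      */
    f.funct     =  insn        & 0x3Fu;          /* bits 5..0,   R-type      */
    f.immediate = (uint16_t)(insn & 0xFFFFu);    /* bits 15..0,  I-type      */
    f.target    =  insn & 0x03FFFFFFu;           /* bits 25..0,  J-type      */
    return f;
}
```

For example, the word 0x00221821 decodes to op = 0, rs = 1, rt = 2, rd = 3 and funct = 0x21, which is the R-type encoding of `ADDU r3, r1, r2`.
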

---

Instruction Notation Conventions

In this appendix, all variable subfields in an instruction format (such as *rs*, *rt*, *immediate*, etc.) are shown in lowercase names.

For the sake of clarity, we sometimes use an alias for a variable subfield in the formats of specific instructions. For example, we use *rs = base* in the format for load and store instructions. Such an alias is always lowercase, since it refers to a variable subfield.

Figures with the actual bit encoding for all the mnemonics are located at the end of this Appendix, and the bit encoding also accompanies each instruction.

In the instruction descriptions that follow, the *Operation* section describes the operation performed by each instruction using a high-level language notation.
Special symbols used in the notation are described in Table A.1.

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>←</td>
<td>Assignment.</td>
</tr>
<tr>
<td>||</td>
<td>Bit string concatenation.</td>
</tr>
<tr>
<td>\(x^y\)</td>
<td>Replication of bit value \(x\) into a \(y\)-bit string. Note: \(x\) is always a single-bit value.</td>
</tr>
<tr>
<td>\(x_{y..z}\)</td>
<td>Selection of bits \(y\) through \(z\) of bit string \(x\). Little-endian bit notation is always used. If \(y\) is less than \(z\), this expression is an empty (zero-length) bit string.</td>
</tr>
<tr>
<td>+</td>
<td>2's complement or floating-point addition.</td>
</tr>
<tr>
<td>-</td>
<td>2's complement or floating-point subtraction.</td>
</tr>
<tr>
<td>*</td>
<td>2's complement or floating-point multiplication.</td>
</tr>
<tr>
<td>div</td>
<td>2's complement integer division.</td>
</tr>
<tr>
<td>mod</td>
<td>2's complement modulo.</td>
</tr>
<tr>
<td>/</td>
<td>Floating-point division.</td>
</tr>
<tr>
<td>&lt;</td>
<td>2's complement less than comparison.</td>
</tr>
<tr>
<td>and</td>
<td>Bit-wise logical AND.</td>
</tr>
<tr>
<td>or</td>
<td>Bit-wise logical OR.</td>
</tr>
<tr>
<td>xor</td>
<td>Bit-wise logical XOR.</td>
</tr>
<tr>
<td>nor</td>
<td>Bit-wise logical NOR.</td>
</tr>
<tr>
<td>GPR[x]</td>
<td>General register \(x\). The content of GPR[0] is always zero. Attempts to alter the content of GPR[0] have no effect.</td>
</tr>
<tr>
<td>CPR[z, x]</td>
<td>Coprocessor unit \(z\), general register \(x\).</td>
</tr>
<tr>
<td>CCR[z, x]</td>
<td>Coprocessor unit \(z\), control register \(x\).</td>
</tr>
<tr>
<td>COC[z]</td>
<td>Coprocessor unit \(z\) condition signal.</td>
</tr>
<tr>
<td>BigEndianMem</td>
<td>Big-endian mode as configured at reset (0 → Little, 1 → Big). Specifies the endianness of the memory interface (see LoadMemory and StoreMemory) and the endianness of Kernel and Supervisor mode execution.</td>
</tr>
<tr>
<td>ReverseEndian</td>
<td>Signal to reverse the endianness of load and store instructions in User mode; effected by setting the RE bit of the Status register. Thus, ReverseEndian may be computed as (\(SR_{25}\) and User mode).</td>
</tr>
<tr>
<td>BigEndianCPU</td>
<td>The endianness for load and store instructions (0 → Little, 1 → Big). In User mode, this endianness may be reversed by setting \(SR_{25}\). Thus, BigEndianCPU may be computed as BigEndianMem XOR ReverseEndian.</td>
</tr>
<tr>
<td>LLbit</td>
<td>Bit of state used by the synchronization instructions. Set by LL, cleared by ERET and Invalidate, and read by SC.</td>
</tr>
<tr>
<td>T+i:</td>
<td>Indicates the time steps between operations. The statements within a time step are executed in sequential order (as modified by conditional and loop constructs). Operations marked \(T+i\): are executed at instruction cycle \(i\) relative to the start of execution of the instruction. Thus, an instruction that starts at time \(j\) executes operations marked \(T+i\): at time \(i + j\). The order of execution between two instructions, or between two operations, that execute at the same time is not defined, and it should be interpreted pessimistically.</td>
</tr>
</tbody>
</table>

Table A.1 CPU Instruction Operation Notations

Instruction Notation Examples

The following examples illustrate the application of some of the instruction notation conventions:

Example #1:

\[
\text{GPR}[rt] \leftarrow \text{immediate} \parallel 0^{16}
\]

Sixteen zero bits are concatenated with an immediate value (typically 16 bits), and the 32-bit string (with the lower 16 bits set to zero) is assigned to General-Purpose Register rt.

Example #2:

\[
(\text{immediate}_{15})^{16} \parallel \text{immediate}_{15..0}
\]

Bit 15 (the sign bit) of an immediate value is extended for 16 bit positions, and the result is concatenated with bits 15 through 0 of the immediate value to form a 32-bit sign extended value.
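
Both notation examples map directly onto integer shifts and masks. In this C sketch, which uses hypothetical function names, Example #1 places a 16-bit immediate in the upper half of a 32-bit string. Example #2 replicates bit 15 sixteen times to sign-extend the immediate.

```c
#include <stdint.h>

/* Example #1: immediate || 0^16 (immediate in bits 31..16, zeros below). */
static uint32_t concat_imm_zero16(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* Example #2: (immediate_15)^16 || immediate_15..0, a 32-bit sign extension. */
static uint32_t sign_extend16_to32(uint16_t immediate)
{
    uint32_t replicated = (immediate & 0x8000u) ? 0xFFFF0000u : 0u;  /* (immediate_15)^16 */
    return replicated | immediate;                                    /* || immediate_15..0 */
}
```
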

Load and Store Instructions

In the R4600/R4700, as in the case of processors, the instruction immediately following a load may use the loaded contents of the register. In such cases, the hardware interlocks, requiring additional real cycles, so scheduling load delay slots is still desirable, although not required for functional code.

The R4600/R4700 implementation of the MIPS ISA provides two special instructions, Load Linked and Store Conditional. These instructions are used in carefully coded sequences to provide one of several synchronization primitives, including test-and-set, bit-level locks, semaphores, and sequencers/event counts.
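
The following C sketch models how a Load Linked/Store Conditional pair can build an atomic read-modify-write. It is a hypothetical single-processor software model, not hardware access, and all function names are illustrative. The variable `llbit` stands for the LLbit of Table A.1: LL sets it, SC reads it, and ERET or an invalidate clears it.

The increment uses a retry loop. If SC finds LLbit cleared, it performs no store and reports failure, and the sequence is repeated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the LLbit; not real hardware state. */
static bool llbit;

static uint32_t ll_model(const uint32_t *addr)
{
    llbit = true;                  /* LL sets LLbit */
    return *addr;
}

/* SC stores only if LLbit is set; returns 1 on success, 0 if suppressed. */
static int sc_model(uint32_t *addr, uint32_t value)
{
    if (!llbit)
        return 0;
    *addr = value;
    return 1;
}

/* Stands in for ERET or an invalidate of the linked line: clears LLbit. */
static void clear_llbit_model(void)
{
    llbit = false;
}

/* Atomic increment: repeat LL / modify / SC until SC succeeds. */
static uint32_t atomic_increment_model(uint32_t *counter)
{
    uint32_t old;
    do {
        old = ll_model(counter);
    } while (!sc_model(counter, old + 1));
    return old + 1;
}
```
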

In the load and store descriptions, the functions listed in Table A.2 are used to summarize the handling of virtual addresses and physical memory.

<table>
<thead>
<tr>
<th>Function</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>AddressTranslation</td>
<td>Uses the TLB to find the physical address given the virtual address. The function fails and an exception is taken if the required translation is not present in the TLB.</td>
</tr>
<tr>
<td>LoadMemory</td>
<td>Uses the cache and main memory to find the contents of the word containing the specified physical address. The low-order two bits of the address and the Access Type field indicate which of the four bytes within the data word need to be returned. If the cache is enabled for this access, the entire word is returned and loaded into the cache.</td>
</tr>
<tr>
<td>StoreMemory</td>
<td>Uses the cache, write buffer, and main memory to store the word or part of word specified as data in the word containing the specified physical address. The low-order two bits of the address and the Access Type field indicate which of the four bytes within the data word should be stored.</td>
</tr>
</tbody>
</table>

Table A.2 Load and Store Common Functions

As shown in Table A.3, the Access Type field indicates the size of the data item to be loaded or stored. Regardless of access type or byte-numbering order (endianness), the address specifies the byte that has the smallest byte address in the addressed field. For a big-endian machine, this is the leftmost byte, which contains the sign of a 2’s complement number; for a little-endian machine, this is the rightmost byte.

<table>
<thead>
<tr>
<th>Access Type Mnemonic</th>
<th>Value</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>DOUBLEWORD</td>
<td>7</td>
<td>8 bytes (64 bits)</td>
</tr>
<tr>
<td>SEPTIBYTE</td>
<td>6</td>
<td>7 bytes (56 bits)</td>
</tr>
<tr>
<td>SEXTIBYTE</td>
<td>5</td>
<td>6 bytes (48 bits)</td>
</tr>
<tr>
<td>QUINTIBYTE</td>
<td>4</td>
<td>5 bytes (40 bits)</td>
</tr>
<tr>
<td>WORD</td>
<td>3</td>
<td>4 bytes (32 bits)</td>
</tr>
<tr>
<td>TRIPLEBYTE</td>
<td>2</td>
<td>3 bytes (24 bits)</td>
</tr>
<tr>
<td>HALFWORD</td>
<td>1</td>
<td>2 bytes (16 bits)</td>
</tr>
<tr>
<td>BYTE</td>
<td>0</td>
<td>1 byte (8 bits)</td>
</tr>
</tbody>
</table>

Table A.3  Access Type Specifications for Loads/Stores

The bytes within the addressed doubleword which are used can be determined directly from the access type and the three low-order bits of the address.
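
The following C sketch shows how the byte lanes can be derived from the access type and the three low-order address bits. Lane *n* is data bits 8n+7..8n. The access type is the Table A.3 value, which equals the number of bytes minus one. The function name is hypothetical.

In big-endian order, byte address 0 is the leftmost (most significant) byte, so it occupies lane 7. The sketch assumes the access fits within the doubleword and ignores any lanes beyond it. It does not model alignment exceptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns an 8-bit mask of the byte lanes used within the addressed doubleword.
 * access_type: Table A.3 value (bytes - 1). Lane n = data bits 8n+7..8n. */
static uint8_t byte_lane_mask(unsigned access_type, uint64_t vaddr, bool big_endian)
{
    unsigned first = (unsigned)(vaddr & 7u);   /* smallest byte address in the field */
    unsigned last  = first + access_type;      /* largest byte address in the field  */
    uint8_t mask = 0;
    for (unsigned b = first; b <= last && b < 8; b++) {
        unsigned lane = big_endian ? 7u - b : b;   /* big-endian byte 0 is leftmost */
        mask |= (uint8_t)(1u << lane);
    }
    return mask;
}
```

For example, a WORD access (value 3) at an address ending in 000 uses lanes 3..0 (mask 0x0F) on a little-endian machine and lanes 7..4 (mask 0xF0) on a big-endian machine.
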

**Jump and Branch Instructions**

All jump and branch instructions have an architectural delay of exactly one instruction. That is, the instruction immediately following a jump or branch (the instruction in the delay slot) is always executed while the target instruction is being fetched from storage. A delay slot may not itself be occupied by a jump or branch instruction; however, this error is not detected, and the results of such an operation are undefined.

If an exception or interrupt prevents the completion of a legal instruction in a delay slot, the hardware sets the **EPC** register to point at the preceding jump or branch instruction. When the code is restarted, both the jump or branch instruction and the instruction in the delay slot are re-executed.

Because jump and branch instructions may be restarted after exceptions or interrupts, they must be restartable. Therefore, when a jump or branch instruction stores a return link value, register 31 (the register in which the link is stored) may not be used as a source register.

Since instructions must be word-aligned, a **Jump Register** or **Jump and Link Register** instruction must use a register whose two low-order bits are zero. If these low-order bits are not zero, an address exception will occur when the jump target instruction is subsequently fetched.
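
The two rules above, EPC placement for delay-slot exceptions and Jump Register target alignment, can be sketched in C as follows. The sketch assumes 4-byte instructions, so the jump or branch that owns a delay slot sits one word before it. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPC for an exception on the instruction at faulting_pc. If that instruction
 * is in a delay slot, EPC points at the preceding jump/branch (one word
 * earlier), so both instructions are re-executed on restart. */
static uint64_t epc_for(uint64_t faulting_pc, bool in_delay_slot)
{
    return in_delay_slot ? faulting_pc - 4 : faulting_pc;
}

/* A JR/JALR target must have its two low-order bits clear; otherwise an
 * address exception occurs when the target instruction is fetched. */
static bool jr_target_faults(uint64_t target)
{
    return (target & 3u) != 0;
}
```

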
**Coprocessor Instructions**

Coprocessors are alternate execution units that have register files separate from the CPU's. The R4600/R4700 architecture (MIPS III) provides three coprocessor units, or classes, and these coprocessors have two register spaces, each containing thirty-two registers. These registers may be either 32 or 64 bits wide.

- The first space, *coprocessor general* registers, may be directly loaded from memory and stored into memory, and their contents may be transferred between the coprocessor and processor.
- The second space, *coprocessor control* registers, may only have their contents transferred directly between the coprocessor and the processor. Coprocessor instructions may alter registers in either space.

**System Control Coprocessor (CP0) Instructions**

Some special limitations apply to operations involving CP0, which is incorporated within the CPU. The move to/from coprocessor instructions are the only valid mechanism for writing to and reading from the CP0 registers.

Several CP0 instructions are defined to directly read, write, and probe TLB entries and to modify the operating modes in preparation for returning to User mode or interrupt-enabled states.

**ADD**

Format:

ADD rd, rs, rt

Description:

The contents of general register rs and the contents of general register rt are added to form the result. The result is placed into general register rd. The operands must be valid sign-extended, 32-bit values.

An overflow exception occurs if the carries out of bits 30 and 31 differ (2’s complement overflow). The destination register rd is not modified when an integer overflow exception occurs.
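
The overflow rule can be checked directly. This C sketch is a model of ADD with hypothetical names. It computes the carry out of bit 30 and the carry out of bit 31 separately, reports overflow when they differ, and otherwise sign-extends the 32-bit sum into the 64-bit destination. The operands are assumed to be valid sign-extended 32-bit values, as the description requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* (v_31)^32 || v_31..0 : sign-extend a 32-bit result into a 64-bit register. */
static int64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Model of ADD. Returns false on 2's complement overflow (rd left unchanged);
 * otherwise stores the sign-extended 32-bit sum in *rd and returns true. */
static bool add_model(int64_t rs, int64_t rt, int64_t *rd)
{
    uint32_t a = (uint32_t)rs;
    uint32_t b = (uint32_t)rt;
    /* Carry out of bit 30: bit 31 of the sum of the low 31 bits. */
    unsigned carry30 = (((a & 0x7FFFFFFFu) + (b & 0x7FFFFFFFu)) >> 31) & 1u;
    /* Carry out of bit 31: bit 32 of the full 33-bit sum. */
    unsigned carry31 = (unsigned)((((uint64_t)a + b) >> 32) & 1u);
    if (carry30 != carry31)
        return false;          /* integer overflow exception */
    *rd = sext32(a + b);
    return true;
}
```
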

Operation:

\[
\begin{align*}
T: & \quad \text{temp} \leftarrow \text{GPR}[rs] + \text{GPR}[rt] \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}_{31..0}
\end{align*}
\]

Exceptions:

Integer overflow exception
### ADDI - Add Immediate

#### Format:

```
ADDI rt, rs, immediate
```

#### Description:

The 16-bit `immediate` is sign-extended and added to the contents of general register `rs` to form the result. The result is placed into general register `rt`. The `rs` operand must be a valid sign-extended 32-bit value.

An overflow exception occurs if the carries out of bits 30 and 31 differ (2’s complement overflow). The destination register `rt` is not modified when an integer overflow exception occurs.
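
This C sketch models ADDI; the names are hypothetical. It sign-extends the immediate first. It then detects overflow with the sign-bit test, which is equivalent to the carry rule: overflow occurs exactly when both operands share a sign and the sum's sign differs from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* (v_31)^32 || v_31..0 */
static int64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Model of ADDI. Returns false on overflow (rt left unchanged). */
static bool addi_model(int64_t rs, uint16_t immediate, int64_t *rt)
{
    uint32_t a = (uint32_t)rs;
    uint32_t b = (immediate & 0x8000u) ? (0xFFFF0000u | immediate) : immediate;
    uint32_t sum = a + b;
    /* Sum's sign differs from both operands' signs => 2's complement overflow. */
    if (((a ^ sum) & (b ^ sum) & 0x80000000u) != 0)
        return false;
    *rt = sext32(sum);
    return true;
}
```
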

#### Operation:

```
T: temp ← GPR[rs] + ((immediate_15)^48 || immediate_15..0)
   GPR[rt] ← (temp_31)^32 || temp_31..0
```

#### Exceptions:

- Integer overflow exception

ADDIU  Add Immediate Unsigned

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDIU</td>
<td>rs</td>
<td>rt</td>
<td>immediate</td>
</tr>
<tr>
<td>0 0 1 0 0 1</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

ADDIU  rt, rs, immediate

**Description:**

The 16-bit *immediate* is sign-extended and added to the contents of general register rs to form the result. The result is placed into general register rt. No integer overflow exception occurs under any circumstances. The rs operand must be a valid sign-extended 32-bit value.

The only difference between this instruction and the ADDI instruction is that ADDIU never causes an overflow exception.
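
Despite the name, ADDIU still sign-extends its immediate; "unsigned" means only that no overflow trap is taken. This C sketch, with a hypothetical function name, shows that the 32-bit sum simply wraps before being sign-extended into the 64-bit register.

```c
#include <stdint.h>

/* (v_31)^32 || v_31..0 */
static int64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Model of ADDIU: same arithmetic as ADDI, but the 32-bit sum wraps silently. */
static int64_t addiu_model(int64_t rs, uint16_t immediate)
{
    uint32_t b = (immediate & 0x8000u) ? (0xFFFF0000u | immediate) : immediate;
    return sext32((uint32_t)rs + b);
}
```
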

**Operation:**

\[
\begin{align*}
T: & \quad \text{temp} \leftarrow \text{GPR}[rs] + ((\text{immediate}_{15})^{48} \parallel \text{immediate}_{15..0}) \\
& \quad \text{GPR}[rt] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}_{31..0}
\end{align*}
\]

**Exceptions:**

None
## ADDU

### Add Unsigned

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>0</td>
<td>ADDU</td>
</tr>
<tr>
<td>0 0 0 0 0 0</td>
<td></td>
<td></td>
<td></td>
<td>0 0 0 0 0</td>
<td>1 0 0 0 0 1</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

ADDU rd, rs, rt

**Description:**

The contents of general register rs and the contents of general register rt are added to form the result. The result is placed into general register rd. No overflow exception occurs under any circumstances. The source operands must be valid sign-extended, 32-bit values.

The only difference between this instruction and the ADD instruction is that ADDU never causes an overflow exception.
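
This C sketch models ADDU; the function name is hypothetical. It highlights how the wrapped 32-bit result lands in the 64-bit destination register. For example, 0x7FFFFFFF + 1 wraps to 0x80000000, which is then sign-extended to 0xFFFFFFFF80000000 with no trap.

```c
#include <stdint.h>

/* (v_31)^32 || v_31..0 */
static int64_t sext32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Model of ADDU: 32-bit wrap-around add, sign-extended, never traps. */
static int64_t addu_model(int64_t rs, int64_t rt)
{
    return sext32((uint32_t)rs + (uint32_t)rt);
}
```
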

**Operation:**

\[
\begin{align*}
T: & \quad \text{temp} \leftarrow \text{GPR}[rs] + \text{GPR}[rt] \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}_{31..0}
\end{align*}
\]

**Exceptions:**

None

**AND**

**Format:**

AND rd, rs, rt

**Description:**
The contents of general register rs are combined with the contents of general register rt in a bit-wise logical AND operation. The result is placed into general register rd.

**Operation:**

\[
T: \text{GPR}[rd] \leftarrow \text{GPR}[rs] \text{ and GPR}[rt]
\]

**Exceptions:**
None

**ANDI**  And Immediate

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>ANDI</td>
<td>rs</td>
<td>rt</td>
<td>immediate</td>
</tr>
<tr>
<td>0 0 1 1 0 0</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
ANDI  rt, rs, immediate

**Description:**
The 16-bit *immediate* is zero-extended and combined with the contents of general register *rs* in a bit-wise logical AND operation. The result is placed into general register *rt*.
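
Unlike ADDI and ADDIU, ANDI zero-extends its immediate, so bits 63..16 of the result are always zero. This C sketch uses a hypothetical function name. It shows that an immediate of 0xFFFF masks out everything but the low halfword, whereas sign-extending 0xFFFF would have produced -1.

```c
#include <stdint.h>

/* Model of ANDI: 0^48 || (immediate and GPR[rs]_15..0). */
static uint64_t andi_model(uint64_t rs, uint16_t immediate)
{
    return rs & (uint64_t)immediate;   /* zero-extended immediate clears bits 63..16 */
}
```
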

**Operation:**

\[
T: \ \text{GPR}[rt] \leftarrow 0^{48} \parallel (\text{immediate and } \text{GPR}[rs]_{15..0})
\]

**Exceptions:**
None

BCzF  Branch On Coprocessor z False

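<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz</td>
<td>BC</td>
<td>BCF</td>
<td>offset</td>
</tr>
<tr>
<td>0 1 0 0 x x*</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 0</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>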

Format:

BCzF offset

Description:

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor z’s condition signal (CpCond), as sampled during the previous instruction, is false, then the program branches to the target address with a delay of one instruction.

Because the internal condition signal is sampled during the previous instruction, there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal.
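
The target computation and the inverted condition can be sketched in C as follows. The function names are hypothetical, and `branch_pc` is assumed to be the address of the branch itself, so the delay slot is at `branch_pc + 4`. `coc_prev` stands for COC[z] as sampled during the previous instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delay-slot address plus (offset_15)^46 || offset || 0^2. */
static uint64_t branch_target(uint64_t branch_pc, uint16_t offset)
{
    uint64_t disp = (offset & 0x8000u) ? (0xFFFFFFFFFFFF0000ULL | offset) : offset;
    return branch_pc + 4 + (disp << 2);
}

/* BCzF: branch when the sampled condition signal is false. Returns the PC
 * that follows the (always executed) delay slot. */
static uint64_t bczf_pc_after_delay_slot(uint64_t branch_pc, uint16_t offset, bool coc_prev)
{
    return !coc_prev ? branch_target(branch_pc, offset) : branch_pc + 8;
}
```

An offset of 0xFFFF (that is, -1) gives a displacement of -4 from the delay slot, which is the branch instruction itself.
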

Operation:

\[
\begin{align*}
T-1: & \quad \text{condition} \leftarrow \text{not COC}[z] \\
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
T+1: & \quad \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

Note: *See the table “Opcode Bit Encoding” on next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

Exceptions:

Coprocessor unusable exception

Opcode Bit Encoding:

<table>
<thead>
<tr>
<th>Instruction</th>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BC0F</td>
<td>0 1 0 0 0 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 0</td>
<td>offset</td>
</tr>
<tr>
<td>BC1F</td>
<td>0 1 0 0 0 1</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 0</td>
<td>offset</td>
</tr>
<tr>
<td>BC2F</td>
<td>0 1 0 0 1 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 0</td>
<td>offset</td>
</tr>
</tbody>
</table>

- Opcode
- Coprocessor Unit Number
- BC sub-opcode
- Branch condition

BCzFL  Branch On Coprocessor z False Likely

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz</td>
<td>BC</td>
<td>BCFL</td>
<td>offset</td>
</tr>
<tr>
<td>0 1 0 0 x x*</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 0</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
BCzFL  offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor z's condition signal, as sampled during the previous instruction, is false, the program branches to the target address with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Because the internal condition signal is sampled during the previous instruction, there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal.
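
The "likely" variants differ from ordinary branches only in the not-taken case, where the delay-slot instruction is nullified. This C sketch, with hypothetical names, returns both the next PC and whether the delay slot executes. `branch_pc` is assumed to be the address of the branch itself, and `coc_prev` stands for the sampled COC[z].

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     delay_slot_executes;   /* false when the slot is nullified */
    uint64_t next_pc;               /* PC after the delay slot */
} likely_outcome;

static likely_outcome branch_likely(uint64_t branch_pc, uint16_t offset, bool taken)
{
    uint64_t disp = (offset & 0x8000u) ? (0xFFFFFFFFFFFF0000ULL | offset) : offset;
    likely_outcome r;
    r.delay_slot_executes = taken;                  /* not taken: NullifyCurrentInstruction */
    r.next_pc = taken ? branch_pc + 4 + (disp << 2) : branch_pc + 8;
    return r;
}

/* BCzFL: taken when the sampled condition signal is false. */
static likely_outcome bczfl_model(uint64_t branch_pc, uint16_t offset, bool coc_prev)
{
    return branch_likely(branch_pc, offset, !coc_prev);
}
```
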

NOTE: *See the table “Opcode Bit Encoding” on next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Operation:**

\[
\begin{align*}
T-1: & \quad \text{condition} \leftarrow \text{not} \ COC[z] \\
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \quad \text{else} \\
& \quad \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \quad \text{endif}
\end{align*}
\]

**Exceptions:**
Coprocessor unusable exception

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BC0FL</td>
<td>0 1 0 0 0 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 0</td>
<td>offset</td>
</tr>
<tr>
<td>BC1FL</td>
<td>0 1 0 0 0 1</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 0</td>
<td>offset</td>
</tr>
<tr>
<td>BC2FL</td>
<td>0 1 0 0 1 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 0</td>
<td>offset</td>
</tr>
</tbody>
</table>

- Opcode
- Coprocessor Unit Number
- BC sub-opcode
- Branch condition

**BCzT**  Branch On Coprocessor z True

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz</td>
<td>BC</td>
<td>BCT</td>
<td>offset</td>
</tr>
<tr>
<td>0 1 0 0 x x*</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 1</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

BCzT  offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor z's condition signal (CpCond), as sampled during the previous instruction, is true, then the program branches to the target address with a delay of one instruction.

Because the internal condition signal is sampled during the previous instruction, there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal.

**Operation:**

\[
\begin{align*}
T-1: & \text{ condition } \leftarrow \text{COC}[z] \\
T: & \text{ target } \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
T+1: & \text{ if condition then} \\
& \quad \text{PC } \leftarrow \text{PC } + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

NOTE: *See the table “Opcode Bit Encoding” on next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Exceptions:**

Coprocessor unusable exception

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>Instruction</th>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BC0T</td>
<td>0 1 0 0 0 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 1</td>
<td>offset</td>
</tr>
<tr>
<td>BC1T</td>
<td>0 1 0 0 0 1</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 1</td>
<td>offset</td>
</tr>
<tr>
<td>BC2T</td>
<td>0 1 0 0 1 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 0 1</td>
<td>offset</td>
</tr>
</tbody>
</table>

BCzTL  Branch On Coprocessor z True Likely

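<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz</td>
<td>BC</td>
<td>BCTL</td>
<td>offset</td>
</tr>
<tr>
<td>0 1 0 0 x x*</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 1</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>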

Format:

BCzTL  offset

Description:

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If coprocessor z's condition signal, as sampled during the previous instruction, is true, the program branches to the target address with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Because the internal condition signal is sampled during the previous instruction, there must be at least one instruction between this instruction and a coprocessor instruction that changes the internal condition signal.

Operation:

\[
\begin{align*}
T-1: & \quad \text{condition} \leftarrow \text{COC}[z] \\
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]

NOTE: *See the table “Opcode Bit Encoding” on next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

Exceptions:

Coprocessor unusable exception

Opcode Bit Encoding:

<table>
<thead>
<tr>
<th>Instruction</th>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BC0TL</td>
<td>0 1 0 0 0 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 1</td>
<td>offset</td>
</tr>
<tr>
<td>BC1TL</td>
<td>0 1 0 0 0 1</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 1</td>
<td>offset</td>
</tr>
<tr>
<td>BC2TL</td>
<td>0 1 0 0 1 0</td>
<td>0 1 0 0 0</td>
<td>0 0 0 1 1</td>
<td>offset</td>
</tr>
</tbody>
</table>

- Opcode
- Coprocessor Unit Number
- BC sub-opcode
- Branch condition

**BEQ**  Branch On Equal

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BEQ</td>
<td>rs</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>0 0 0 1 0 0</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
BEQ rs, rt, offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs and the contents of general register rt are compared. If the two registers are equal, then the program branches to the target address, with a delay of one instruction.
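
Running the target computation in reverse shows how an offset field is chosen for a given destination. This C sketch of a hypothetical assembler helper makes three assumptions: `branch_pc` is the address of the BEQ itself, the displacement is measured from the delay slot, and it must be word-aligned and fit in a signed 18-bit byte displacement. The encoder packs the fields using the BEQ opcode 000100.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: offset field that makes a branch at branch_pc reach
 * target. Returns false if the target is misaligned or out of range. */
static bool beq_offset(uint64_t branch_pc, uint64_t target, uint16_t *offset)
{
    uint64_t diff = target - (branch_pc + 4);        /* relative to the delay slot */
    if (diff & 3u)
        return false;                                /* not word-aligned */
    if (diff + 0x20000u >= 0x40000u)                 /* outside -2^17 .. 2^17 - 4 */
        return false;
    *offset = (uint16_t)(diff >> 2);                 /* low 16 bits of diff / 4 */
    return true;
}

/* Pack BEQ rs, rt, offset (opcode 000100 in bits 31..26). */
static uint32_t encode_beq(unsigned rs, unsigned rt, uint16_t offset)
{
    return (0x04u << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) | offset;
}
```

Unsigned arithmetic keeps the range check and the truncation well-defined in C even when the displacement is negative.
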

**Operation:**

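\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs] = \text{GPR}[rt]) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]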

**Exceptions:**
None

**BEQL**  Branch On Equal Likely


**Format:**

BEQL rs, rt, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs and the contents of general register rt are compared. If the two registers are equal, the program branches to the target address with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs] = \text{GPR}[rt]) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

None
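The only difference between BEQL and BEQ (and, likewise, between every branch-likely instruction in this section and its ordinary counterpart) is what happens to the delay slot when the branch is not taken. The Python sketch below models that rule at the level of "which instructions complete"; the function name and the use of strings as instruction labels are hypothetical, and it is not a cycle-accurate pipeline model.

```python
def executed_after_branch(taken: bool, likely: bool,
                          delay_slot: str, fall_through: str, target: str) -> list:
    """Return the instructions that complete after a conditional branch.

    Ordinary branches (BEQ, BNE, ...) always execute the delay slot.
    Branch-likely forms (BEQL, BNEL, ...) nullify it when not taken.
    """
    executed = []
    if taken or not likely:
        executed.append(delay_slot)
    # Execution continues at the target when taken, otherwise sequentially.
    executed.append(target if taken else fall_through)
    return executed

print(executed_after_branch(taken=False, likely=True,
                            delay_slot="slot", fall_through="next", target="tgt"))
```

A compiler can therefore fill the delay slot of a branch-likely instruction with an instruction from the branch target, since it has no effect on the fall-through path.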
**BGEZ**  Branch On Greater Than Or Equal To Zero  **BGEZ**

**Format:**

BGEZ rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the contents of general register rs have the sign bit cleared, then the program branches to the target address, with a delay of one instruction.

**Operation:**

```
T:  target <- (offset_{15})^{46} || offset || 0^2
    condition <- (GPR[rs]_{63} = 0)
T+1: if condition then
      PC <- PC + target
    endif
```

**Exceptions:**

None
**BGEZAL**  Branch On Greater Than Or Equal To Zero And Link  **BGEZAL**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM (0 0 0 0 0 1)</td>
<td>rs</td>
<td>BGEZAL (1 0 0 0 1)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
BGEZAL rs, offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. Unconditionally, the address of the instruction after the delay slot is placed in the link register, r31. If the contents of general register rs have the sign bit cleared, then the program branches to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an instruction is not restartable. An attempt to execute this instruction with register 31 specified as rs is not trapped, however.

**Operation:**

\[
\begin{align*}
T & : \text{target} \leftarrow (\text{offset}_{15})^{46} || \text{offset} || 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 0) \\
& \quad \text{GPR}[31] \leftarrow \text{PC} + 8 \\
T+1 & : \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**
None
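The and-link behavior can be sketched in Python as follows. The dictionary register file, the function name `bgezal`, and the representation of registers as unsigned 64-bit integers are assumptions for illustration. The sketch reads the condition before writing r31, which shows why rs = r31 is not restartable: after the link write, r31 no longer holds the value the condition was computed from, so re-executing the branch could take a different path.

```python
MASK64 = (1 << 64) - 1

def bgezal(regs: dict, pc: int, rs: int, offset16: int):
    """Model BGEZAL at address pc; updates regs[31] and returns (taken, next_pc).

    next_pc is the address fetched after the delay slot: the target when the
    branch is taken, otherwise pc + 8.
    """
    offset = ((offset16 & 0xFFFF) ^ 0x8000) - 0x8000   # sign-extend 16 bits
    target = (pc + 4 + (offset << 2)) & MASK64          # relative to the delay slot
    taken = ((regs[rs] >> 63) & 1) == 0                 # sign bit of rs cleared
    regs[31] = (pc + 8) & MASK64                        # link is unconditional
    return taken, (target if taken else (pc + 8) & MASK64)

regs = {5: 0x10, 31: 0}
print(bgezal(regs, 0x1000, 5, 2), hex(regs[31]))
```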
**BGEZALL**  Branch On Greater Than Or Equal To Zero And Link Likely  **BGEZALL**

**Format:**

BGEZALL rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. Unconditionally, the address of the instruction after the delay slot is placed in the link register, r31. If the contents of general register rs have the sign bit cleared, then the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

General register rs may not be general register 31, because such an instruction is not restartable. An attempt to execute this instruction with register 31 specified as rs is not trapped, however.

**Operation:**

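\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 0) \\
& \quad \text{GPR}[31] \leftarrow \text{PC} + 8 \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]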

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM (0 0 0 0 0 1)</td>
<td>rs</td>
<td>BGEZALL (1 0 0 1 1)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Exceptions:**

None
### BGEZL

**Branch On Greater Than Or Equal To Zero Likely**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM (0 0 0 0 0 1)</td>
<td>rs</td>
<td>BGEZL (0 0 0 1 1)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

BGEZL rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the contents of general register rs have the sign bit cleared, then the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

```plaintext
T: target ← (offset_{15})^{46} || offset || 0^2
   condition ← (GPR[rs]_{63} = 0)
T+1: if condition then
   PC ← PC + target
  else
     NullifyCurrentInstruction
endif
```

**Exceptions:**

None
**BGTZ**  Branch On Greater Than Zero

**Format:**

BGTZ rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs are compared to zero. If the contents of general register rs have the sign bit cleared and are not equal to zero, then the program branches to the target address, with a delay of one instruction.
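Each sign-bit condition in this section is equivalent to a signed comparison with zero. The Python sketch below checks that equivalence for BGEZ, BGTZ, BLTZ, and BLEZ on a few 64-bit boundary patterns; the dictionaries and helper names are illustrative and do not come from the manual.

```python
MASK64 = (1 << 64) - 1

def sign_bit(x: int) -> int:
    return (x >> 63) & 1

def to_signed(x: int) -> int:
    """Reinterpret an unsigned 64-bit register image as a signed value."""
    x &= MASK64
    return x - (1 << 64) if sign_bit(x) else x

# Conditions written the way the Operation sections state them.
CONDITIONS = {
    "BGEZ": lambda x: sign_bit(x) == 0,
    "BGTZ": lambda x: sign_bit(x) == 0 and x != 0,
    "BLTZ": lambda x: sign_bit(x) == 1,
    "BLEZ": lambda x: sign_bit(x) == 1 or x == 0,
}

# The same conditions as signed comparisons against zero.
SIGNED = {
    "BGEZ": lambda v: v >= 0,
    "BGTZ": lambda v: v > 0,
    "BLTZ": lambda v: v < 0,
    "BLEZ": lambda v: v <= 0,
}

for pattern in (0, 1, 0x7FFF_FFFF_FFFF_FFFF, 0x8000_0000_0000_0000, MASK64):
    for name, cond in CONDITIONS.items():
        assert cond(pattern) == SIGNED[name](to_signed(pattern)), (name, hex(pattern))
print("sign-bit conditions match signed comparisons")
```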

**Operation:**

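\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 0) \text{ and } (\text{GPR}[rs] \neq 0^{64}) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]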

**Exceptions:**

None
**BGTZL**  
**Branch On Greater Than Zero Likely**  

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BGTZL (0 1 0 1 1 1)</td>
<td>rs</td>
<td>0 (0 0 0 0 0)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

BGTZL rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs are compared to zero. If the contents of general register rs have the sign bit cleared and are not equal to zero, then the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 0) \text{ and } (\text{GPR}[rs] \neq 0^{64}) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

None
**BLEZ**  Branch On Less Than Or Equal To Zero  **BLEZ**

**Format:**

BLEZ rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs are compared to zero. If the contents of general register rs have the sign bit set, or are equal to zero, then the program branches to the target address, with a delay of one instruction.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \text{ or } (\text{GPR}[rs] = 0^{64}) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

None
**BLEZL**  Branch On Less Than Or Equal To Zero Likely  **BLEZL**

Format:
BLEZL rs, offset

Description:
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs are compared to zero. If the contents of general register rs have the sign bit set, or are equal to zero, then the program branches to the target address, with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

Operation:

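\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \text{ or } (\text{GPR}[rs] = 0^{64}) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]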

Exceptions:
None
### BLTZ
Branch On Less Than Zero

**Format:**

BLTZ rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the contents of general register rs have the sign bit set, then the program branches to the target address, with a delay of one instruction.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

None
**BLTZAL**  Branch On Less Than Zero And Link  **BLTZAL**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM (0 0 0 0 0 1)</td>
<td>rs</td>
<td>BLTZAL (1 0 0 0 0)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
BLTZAL rs, offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. Unconditionally, the address of the instruction after the delay slot is placed in the link register, r31. If the contents of general register rs have the sign bit set, then the program branches to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an instruction is not restartable. An attempt to execute this instruction with register 31 specified as rs is not trapped, however.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \\
& \quad \text{GPR}[31] \leftarrow \text{PC} + 8 \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**
None
**BLTZALL**  Branch On Less Than Zero And Link Likely  **BLTZALL**

**Format:**

BLTZALL rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. Unconditionally, the address of the instruction after the delay slot is placed in the link register, r31. If the contents of general register rs have the sign bit set, then the program branches to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an instruction is not restartable. An attempt to execute this instruction with register 31 specified as rs is not trapped, however. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

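**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \\
& \quad \text{GPR}[31] \leftarrow \text{PC} + 8 \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]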

**Exceptions:**

None
**BLTZL**  Branch On Less Than Zero Likely  

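<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM (0 0 0 0 0 1)</td>
<td>rs</td>
<td>BLTZL (0 0 0 1 0)</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>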

**Format:**

BLTZL rs, offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the contents of general register rs have the sign bit set, then the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

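\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs]_{63} = 1) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]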

**Exceptions:**

None
**BNE**  
**Branch On Not Equal**  

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BNE (0 0 0 1 0 1)</td>
<td>rs</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**  
BNE rs, rt, offset

**Description:**  
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs and the contents of general register rt are compared. If the two registers are not equal, then the program branches to the target address, with a delay of one instruction.

**Operation:**

\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs] \neq \text{GPR}[rt]) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**  
None
**BNEL** Branch On Not Equal Likely

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>BNEL (0 1 0 1 0 1)</td>
<td>rs</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
BNEL rs, rt, offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs and the contents of general register rt are compared. If the two registers are not equal, then the program branches to the target address, with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

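\[
\begin{align*}
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
& \quad \text{condition} \leftarrow (\text{GPR}[rs] \neq \text{GPR}[rt]) \\
T+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]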

**Exceptions:**
None
**BREAK**  Breakpoint  **BREAK**

**Format:**
BREAK

**Description:**
A breakpoint trap occurs, immediately and unconditionally transferring control to the exception handler. The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
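A handler can recover the code field by reading the instruction word at the address held in EPC and extracting its code bits. The Python sketch below assumes the standard MIPS BREAK layout (SPECIAL opcode 000000 in bits 31..26, a 20-bit code in bits 25..6, function field 001101 in bits 5..0), which this section does not show; the helper names are hypothetical, and the sketch ignores the case where the BREAK sits in a branch delay slot, in which EPC points at the branch instead.

```python
BREAK_FUNCT = 0b001101

def encode_break(code: int) -> int:
    """Build a BREAK instruction word carrying a 20-bit code (hypothetical helper)."""
    if not 0 <= code < (1 << 20):
        raise ValueError("code must fit in 20 bits")
    return (0b000000 << 26) | (code << 6) | BREAK_FUNCT

def break_code(word: int) -> int:
    """Extract the code field from an instruction word loaded from EPC."""
    if (word >> 26) != 0 or (word & 0x3F) != BREAK_FUNCT:
        raise ValueError("not a BREAK instruction")
    return (word >> 6) & 0xFFFFF

word = encode_break(7)
print(hex(word), break_code(word))
```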

**Operation:**

\[ T: \quad \text{BreakpointException} \]

**Exceptions:**
Breakpoint exception
**CACHE**  Cache  **CACHE**

Format:
CACHE op, offset(base)

Description:
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The virtual address is translated to a physical address using the TLB, and the 5-bit sub-opcode specifies a cache operation for that address.

If CP0 is not usable (in User or Supervisor mode with the CP0 enable bit in the Status register clear), a coprocessor unusable exception is taken. The operation of this instruction on any operation/cache combination not listed below is undefined, as is its operation on uncached addresses.

The R4600/R4700 uses only the tag comparisons, not the valid bits, to choose which data it supplies to the instruction unit. It is therefore important that the tags in sets A and B never match.

The Index operation uses part of the virtual address to specify a cache block, with vAddr_{13} selecting the set to access.

For a primary cache of 16KB with 32 bytes per tag, vAddr_{12..5} specifies the block.
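For the 16KB primary caches described here, the Index fields can be pulled out of a virtual address as shown below. This Python sketch covers only the bit ranges named above (vAddr_13 for the set, vAddr_12..5 for the block, vAddr_4..3 for the doubleword); the function name and the tuple it returns are assumptions for illustration.

```python
def cache_index_fields(vaddr: int):
    """Split a virtual address into (set, block, doubleword) for Index operations."""
    set_select = (vaddr >> 13) & 0x1   # vAddr_13: selects one of the two sets
    block = (vaddr >> 5) & 0xFF        # vAddr_12..5: one of 256 blocks per set
    doubleword = (vaddr >> 3) & 0x3    # vAddr_4..3: doubleword within the 32-byte block
    return set_select, block, doubleword

# Stepping through any 16KB-aligned range in 32-byte strides visits every
# (set, block) pair exactly once; the base address here is hypothetical.
base = 0x8000_0000
pairs = {cache_index_fields(a)[:2] for a in range(base, base + 16 * 1024, 32)}
print(len(pairs))
```

A loop of this shape is how software would typically walk an entire cache with Index operations, using an unmapped base address to avoid TLB exceptions as the text notes.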

Index Load Tag also uses vAddr_{4..3} to select the doubleword for reading parity. When the CE bit of the Status register is set, Hit WriteBack, Hit WriteBack Invalidate, Index WriteBack Invalidate, and Fill also use vAddr_{4..3} to select the doubleword that has its parity modified. This operation is performed unconditionally.

The Hit operation accesses the specified cache as normal data references, and performs the specified operation if the cache block contains valid data with the specified physical address (a hit). If both sets are invalid or contain different addresses (a miss), no operation is performed.

Write back from a primary cache goes to memory. The address to be written is specified by the cache tag and not the translated physical address.

TLB Refill and TLB Invalid exceptions can occur on any operation. For Index operations (where the physical address is used to index the cache but need not match the cache tag) unmapped addresses may be used to avoid TLB exceptions. This operation never causes TLB Modified or Virtual Coherency exceptions.

Bits 17..16 of the instruction specify the cache as follows:

<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>I</td>
<td>primary instruction</td>
</tr>
<tr>
<td>1</td>
<td>D</td>
<td>primary data</td>
</tr>
<tr>
<td>2 - 3</td>
<td>NA</td>
<td>Undefined</td>
</tr>
</tbody>
</table>
Bits 20..18 (this value is listed under the **Code** column) of the instruction specify the operation as follows:

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Operation</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>I</td>
<td>Index Invalidate</td>
<td>Set the cache state of the cache block to Invalid. Unlike the R4000, Index_Invalidate_I writes the physical address of the cache op into the tag when it clears the valid bit.</td>
</tr>
<tr>
<td>0</td>
<td>D</td>
<td>Index Write-Back Invalidate</td>
<td>Examine the cache state and W bit of the primary data cache block at the index specified by the virtual address. If the state is not Invalid and the W bit is set, then write back the block to memory. The address to write is taken from the primary cache tag. Set cache state of primary cache block to Invalid.</td>
</tr>
<tr>
<td>1</td>
<td>I, D</td>
<td>Index Load Tag</td>
<td>Read the tag for the cache block at the specified index and place it into the TagLo CP0 register, ignoring parity errors. Also load the data parity bits into the ECC register.</td>
</tr>
<tr>
<td>2</td>
<td>I, D</td>
<td>Index Store Tag</td>
<td>Write the tag for the cache block at the specified index from the TagLo and TagHi CP0 registers.</td>
</tr>
<tr>
<td>3</td>
<td>D</td>
<td>Create Dirty Exclusive</td>
<td>This operation is used to avoid loading data needlessly from memory when writing new contents into an entire cache block. If the cache block does not contain the specified address, and the block is dirty, write it back to memory. In all cases, set the cache block tag to the specified physical address and set the cache state to Dirty Exclusive.</td>
</tr>
<tr>
<td>4</td>
<td>I, D</td>
<td>Hit Invalidate</td>
<td>If the cache block contains the specified address, mark the cache block invalid.</td>
</tr>
<tr>
<td>5</td>
<td>D</td>
<td>Hit WriteBack Invalidate</td>
<td>If the cache block contains the specified address, write back the data if it is dirty, and mark the cache block invalid.</td>
</tr>
<tr>
<td>5</td>
<td>I</td>
<td>Fill</td>
<td>Fill the primary instruction cache block from memory. If the CE bit of the Status register is set, the contents of the ECC register are used instead of the computed parity bits for the addressed doubleword when it is written to the instruction cache. Uses bit 13 of the virtual address to pick the set.</td>
</tr>
<tr>
<td>6</td>
<td>D</td>
<td>Hit WriteBack</td>
<td>If the cache block contains the specified address, and the W bit is set, write back the data to memory and clear the W bit.</td>
</tr>
<tr>
<td>6</td>
<td>I</td>
<td>Hit WriteBack</td>
<td>If the cache block contains the specified address, write back the data unconditionally.</td>
</tr>
</tbody>
</table>

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{CacheOp}(\text{op}, \text{vAddr}, \text{pAddr})
\end{align*}
\]

**Exceptions:**

Coprocessor unusable exception
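The 5-bit op field combines the two tables above: bits 20..18 give the operation code and bits 17..16 the cache, so op = (code << 2) | cache. The Python sketch below assembles a complete CACHE instruction word under the assumption of the standard MIPS encoding (opcode 101111 in bits 31..26, base in 25..21, op in 20..16, offset in 15..0), which is not shown on this page; the helper names are hypothetical.

```python
CACHE_OPCODE = 0b101111
CACHE_SELECT = {"I": 0, "D": 1}   # bits 17..16 (2 and 3 are undefined)

def cache_op(code: int, cache: str) -> int:
    """Combine an operation code (bits 20..18) and a cache selector (bits 17..16)."""
    if not 0 <= code <= 7:
        raise ValueError("operation code is 3 bits")
    return (code << 2) | CACHE_SELECT[cache]

def encode_cache(op: int, base: int, offset: int) -> int:
    """Assemble CACHE op, offset(base) as a 32-bit instruction word."""
    return ((CACHE_OPCODE << 26) | ((base & 0x1F) << 21)
            | ((op & 0x1F) << 16) | (offset & 0xFFFF))

# Hit WriteBack Invalidate on the primary data cache: code 5, cache D.
op = cache_op(5, "D")
print(hex(op), hex(encode_cache(op, base=4, offset=-32)))
```

Remember from the table that the same code means different operations for I and D (for example, code 5 is Fill for the instruction cache).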
**CFCz**  Move Control From Coprocessor  **CFCz**

**Format:**
CFCz rt, rd

**Description:**
The contents of coprocessor control register rd of coprocessor unit z are loaded into general register rt.
This instruction is not valid for CP0.

**Operation:**

\[
\begin{align*}
T: & \quad \text{data} \leftarrow (\text{CCR}[z, \text{rd}]_{31})^{32} || \text{CCR}[z, \text{rd}] \\
T+1: & \quad \text{GPR}[rt] \leftarrow \text{data}
\end{align*}
\]

**Exceptions:**
Coprocessor unusable exception

*Opcode Bit Encoding:*

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Bits 31..26 (opcode)</th>
<th>Bits 25..21 (CF sub-opcode)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CFC1</td>
<td>0 1 0 0 0 1</td>
<td>0 0 0 1 0</td>
</tr>
<tr>
<td>CFC2</td>
<td>0 1 0 0 1 0</td>
<td>0 0 0 1 0</td>
</tr>
</tbody>
</table>

Field legend for the encodings above: opcode (bits 31..28), coprocessor unit number (bits 27..26), coprocessor sub-operation (bits 25..21).
**COPz**  Coprocessor Operation  **COPz**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz (0 1 0 0 x x*)</td>
<td>CO (1)</td>
<td>cofun</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>25</td>
</tr>
</tbody>
</table>

**Description:**
A coprocessor operation is performed. The operation may specify and reference internal coprocessor registers, and may change the state of the coprocessor condition line, but does not modify state within the processor or the cache/memory system. Details of coprocessor operations are contained in Appendix B.

**Operation:**

\[ T: \text{CoprocessorOperation}(z, \text{cofun}) \]

**Exceptions:**
- Coprocessor unusable exception
- Coprocessor interrupt or Floating-Point Exception

*Opcode Bit Encoding:*

<table>
<thead>
<tr>
<th>COPz</th>
<th>Bit # 31 30 29 28 27 26 25</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0</td>
<td>0 1 0 0 0 0 1</td>
<td></td>
</tr>
<tr>
<td>COP1</td>
<td>0 1 0 0 0 1 1</td>
<td></td>
</tr>
<tr>
<td>COP2</td>
<td>0 1 0 0 1 0 1</td>
<td></td>
</tr>
</tbody>
</table>

- Opcode
- CO sub-opcode (see end of Appendix A)
- Coprocessor Unit Number
**CTCz**  
**Move Control to Coprocessor**  

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz (0 1 0 0 x x*)</td>
<td>CT (0 0 1 1 0)</td>
<td>rt</td>
<td>rd</td>
<td>0 (0 0 0 0 0 0 0 0 0 0 0)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Format:**
CTCz rt, rd

**Description:**
The contents of general register rt are loaded into control register rd of coprocessor unit z.
This instruction is not valid for CP0.

**Operation:**

\[
\begin{align*}
T &: \text{ data } \leftarrow \text{GPR}[rt] \\
T + 1 & : \text{CCR}[z,rd] \leftarrow \text{data}
\end{align*}
\]

**Exceptions:**
- Coprocessor unusable exception

\*See “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.
**DADD**  Doubleword Add  **DADD**

Format:
DADD rd, rs, rt

Description:
The contents of general register rs and the contents of general register rt are added to form the result. The result is placed into general register rd.

An overflow exception occurs if the carries out of bits 62 and 63 differ (2’s complement overflow). The destination register rd is not modified when an integer overflow exception occurs.
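The overflow rule can be modelled by computing the carry out of bit 62 (the carry into bit 63) and the carry out of bit 63 and comparing them. This Python sketch works on unsigned 64-bit register images; the function names are hypothetical, and Python's built-in `OverflowError` stands in for the processor's integer overflow exception. It also includes the non-trapping DADDU sum for comparison.

```python
MASK64 = (1 << 64) - 1
LOW63 = MASK64 >> 1

def dadd_overflows(a: int, b: int) -> bool:
    """True when the carries out of bits 62 and 63 differ."""
    a &= MASK64
    b &= MASK64
    carry_out_62 = ((a & LOW63) + (b & LOW63)) >> 63   # carry into bit 63
    carry_out_63 = (a + b) >> 64
    return carry_out_62 != carry_out_63

def dadd(a: int, b: int) -> int:
    """DADD: raise on 2's complement overflow, leaving the destination untouched."""
    if dadd_overflows(a, b):
        raise OverflowError("integer overflow exception")
    return ((a & MASK64) + (b & MASK64)) & MASK64

def daddu(a: int, b: int) -> int:
    """DADDU: the same sum, wrapped modulo 2**64, never trapping."""
    return ((a & MASK64) + (b & MASK64)) & MASK64

print(dadd_overflows(0x7FFF_FFFF_FFFF_FFFF, 1), hex(daddu(0x7FFF_FFFF_FFFF_FFFF, 1)))
```

Adding -1 and 1 produces a carry out of both bits, so it does not overflow; adding the largest positive value and 1 carries only into bit 63, so it does.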

Operation:
\[
T: \quad \text{GPR}[rd] \leftarrow \text{GPR}[rs] + \text{GPR}[rt]
\]

Exceptions:
Integer overflow exception
**DADDI**  Doubleword Add Immediate  **DADDI**

<table>
<thead>
<tr>
<th>Format:</th>
<th>DADDI rt, rs, immediate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Description:</td>
<td>The 16-bit immediate is sign-extended and added to the contents of general register rs to form the result. The result is placed into general register rt. An overflow exception occurs if carries out of bits 62 and 63 differ (2’s complement overflow). The destination register rt is not modified when an integer overflow exception occurs.</td>
</tr>
<tr>
<td>Operation:</td>
<td>T: GPR[rt] ← GPR[rs] + ((immediate_{15})^{48} || immediate_{15..0})</td>
</tr>
<tr>
<td>Exceptions:</td>
<td>Integer overflow exception</td>
</tr>
</tbody>
</table>
**DADDIU** Doubleword Add Immediate Unsigned **DADDIU**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>DADDIU (0 1 1 0 0 1)</td>
<td>rs</td>
<td>rt</td>
<td>immediate</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
DADDIU rt, rs, immediate

**Description:**
The 16-bit immediate is sign-extended and added to the contents of general register rs to form the result. The result is placed into general register rt. No integer overflow exception occurs under any circumstances.

The only difference between this instruction and the DADDI instruction is that DADDIU never causes an overflow exception.
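Because the immediate is sign-extended before the add, DADDIU can subtract as well as add, and the 64-bit result simply wraps instead of trapping. The Python sketch below illustrates both points with a hypothetical `daddiu` helper operating on unsigned 64-bit register images.

```python
MASK64 = (1 << 64) - 1

def daddiu(rs_value: int, immediate: int) -> int:
    """rt <- rs + sign_extend(immediate), modulo 2**64, never trapping."""
    imm = ((immediate & 0xFFFF) ^ 0x8000) - 0x8000   # (immediate_15)^48 || immediate
    return (rs_value + imm) & MASK64

print(hex(daddiu(0x10, 0xFFF0)))                 # immediate 0xFFFF0 encodes -16
print(hex(daddiu(0x7FFF_FFFF_FFFF_FFFF, 1)))     # wraps where DADDI would trap
```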

**Operation:**

\[ T: \quad \text{GPR}[rt] \leftarrow \text{GPR}[rs] + ((\text{immediate}_{15})^{48} \parallel \text{immediate}_{15..0}) \]

**Exceptions:**
None
**DADDU**  Doubleword Add Unsigned  **DADDU**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL (0 0 0 0 0 0)</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>0 (0 0 0 0 0)</td>
<td>DADDU (1 0 1 1 0 1)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
DADDU rd, rs, rt

**Description:**
The contents of general register rs and the contents of general register rt are added to form the result. The result is placed into general register rd. No overflow exception occurs under any circumstances. The only difference between this instruction and the DADD instruction is that DADDU never causes an overflow exception.

**Operation:**

\[ T: \quad \text{GPR}[rd] \leftarrow \text{GPR}[rs] + \text{GPR}[rt] \]

**Exceptions:**
None
**DDIV**  Doubleword Divide  **DDIV**

Format:
DDIV rs, rt

Description:
The contents of general register rs are divided by the contents of general register rt, treating both operands as 2’s complement values. No overflow exception occurs under any circumstances, and the result of this operation is undefined when the divisor is zero.

This instruction is typically followed by additional instructions to check for a zero divisor and for overflow.

When the operation completes, the quotient word of the double result is loaded into special register LO, and the remainder word of the double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. Correct operation requires separating reads of HI or LO from writes by two or more instructions.
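The checks recommended above can be folded into a model of the divide itself. The Python sketch below assumes the usual MIPS convention that the quotient is truncated toward zero and the remainder takes the sign of the dividend (the Operation section writes div and mod without spelling this out); it returns None for the two cases software is expected to catch, a zero divisor and the unrepresentable quotient of -2^63 / -1. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x: int) -> int:
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def ddiv(rs_value: int, rt_value: int):
    """Return (LO, HI) as unsigned 64-bit images, or None where the result is undefined."""
    n, d = to_signed64(rs_value), to_signed64(rt_value)
    if d == 0:
        return None                  # zero divisor: software must test first
    if n == -(1 << 63) and d == -1:
        return None                  # quotient 2**63 does not fit in 64 bits
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q                       # truncate toward zero
    r = n - q * d                    # remainder has the sign of the dividend
    return q & MASK64, r & MASK64

print(ddiv((-7) & MASK64, 2))
```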

Operation:

\[
\begin{align*}
T-2: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T-1: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T: & \quad \text{LO} \leftarrow \text{GPR}[rs] \div \text{GPR}[rt] \\
& \quad \text{HI} \leftarrow \text{GPR}[rs] \mod \text{GPR}[rt]
\end{align*}
\]

Exceptions:
None
**DDIVU**  Doubleword Divide Unsigned  **DDIVU**

Format:
DDIVU rs, rt

Description:
The contents of general register rs are divided by the contents of general register rt, treating both operands as unsigned values. No integer overflow exception occurs under any circumstances, and the result of this operation is undefined when the divisor is zero.

This instruction is typically followed by additional instructions to check for a zero divisor.

When the operation completes, the quotient word of the double result is loaded into special register LO, and the remainder word of the double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. Correct operation requires separating reads of HI or LO from writes by two or more instructions.

Operation:

\[
\begin{align*}
T-2: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T-1: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T: & \quad \text{LO} \leftarrow (0 \parallel \text{GPR}[rs]) \text{ div } (0 \parallel \text{GPR}[rt]) \\
& \quad \text{HI} \leftarrow (0 \parallel \text{GPR}[rs]) \text{ mod } (0 \parallel \text{GPR}[rt])
\end{align*}
\]

Exceptions:
None
**DIV**  Divide  **DIV**

Format:
DIV rs, rt

Description:
The contents of general register rs are divided by the contents of
general register rt, treating both operands as 2’s complement values. No
overflow exception occurs under any circumstances, and the result of this
operation is undefined when the divisor is zero.
The operands must be valid sign-extended, 32-bit values.
This instruction is typically followed by additional instructions to
test for a zero divisor and for overflow.
When the operation completes, the quotient word of the double result
is loaded into special register LO, and the remainder word of the double
result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results
of those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions.
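On the 64-bit processor the 32-bit quotient and remainder are sign-extended into LO and HI. The Python sketch below models that, assuming the usual MIPS convention of truncation toward zero with the remainder taking the sign of the dividend; the helper names are hypothetical, and it returns None for a zero divisor or the overflowing quotient of -2^31 / -1, where the result is undefined.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def sext32(x: int) -> int:
    """Sign-extend the low 32 bits of x to an unsigned 64-bit image."""
    x &= MASK32
    return (x - (1 << 32)) & MASK64 if x >> 31 else x

def div32(rs_value: int, rt_value: int):
    """Return (LO, HI) for DIV, or None when the result is undefined."""
    n = rs_value & MASK32
    d = rt_value & MASK32
    n = n - (1 << 32) if n >> 31 else n    # operands are sign-extended 32-bit values
    d = d - (1 << 32) if d >> 31 else d
    if d == 0 or (n == -(1 << 31) and d == -1):
        return None
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q                             # truncate toward zero
    r = n - q * d
    return sext32(q), sext32(r)            # LO <- (q_31)^32 || q, HI <- (r_31)^32 || r

print([hex(v) for v in div32((-7) & MASK64, 2)])
```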

Operation:

\[
\begin{align*}
T-2: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T-1: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T: & \quad q \leftarrow \text{GPR}[rs]_{31..0} \text{ div } \text{GPR}[rt]_{31..0} \\
& \quad r \leftarrow \text{GPR}[rs]_{31..0} \text{ mod } \text{GPR}[rt]_{31..0} \\
& \quad \text{LO} \leftarrow (q_{31})^{32} \parallel q_{31..0} \\
& \quad \text{HI} \leftarrow (r_{31})^{32} \parallel r_{31..0}
\end{align*}
\]

Exceptions:
None
**DIVU**  Divide Unsigned  **DIVU**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL (0 0 0 0 0 0)</td>
<td>rs</td>
<td>rt</td>
<td>0 (0 0 0 0 0 0 0 0 0 0)</td>
<td>DIVU (0 1 1 0 1 1)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>10</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
DIVU rs, rt

**Description:**
The contents of general register rs are divided by the contents of general register rt, treating both operands as unsigned values. No integer overflow exception occurs under any circumstances, and the result of this operation is undefined when the divisor is zero.

The operands must be valid sign-extended, 32-bit values.

This instruction is typically followed by additional instructions to check for a zero divisor.

When the operation completes, the quotient word of the double result is loaded into special register LO, and the remainder word of the double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of those instructions are undefined. Correct operation requires separating reads of HI or LO from writes by two or more instructions.
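The spacing rule for HI and LO lends itself to a simple static check. The Python sketch below scans a list of assembly lines and flags any divide whose two preceding instructions include MFHI or MFLO; the string-based program format and the function name are hypothetical, and only the divide instructions in this section are treated as HI/LO writers.

```python
HILO_WRITERS = {"DIV", "DIVU", "DDIV", "DDIVU"}
HILO_READERS = {"MFHI", "MFLO"}

def mnemonic(line: str) -> str:
    return line.split()[0].upper()

def hilo_hazards(program: list) -> list:
    """Return indices of HI/LO writers preceded within two slots by MFHI or MFLO."""
    hazards = []
    for i, line in enumerate(program):
        if mnemonic(line) not in HILO_WRITERS:
            continue
        window = program[max(0, i - 2):i]
        if any(mnemonic(prev) in HILO_READERS for prev in window):
            hazards.append(i)
    return hazards

code = ["mflo $2", "nop", "divu $4, $5", "mfhi $3", "ddiv $6, $7"]
print(hilo_hazards(code))
```

Inserting a second `nop` (or any unrelated instruction) between the MFLO and the DIVU in the example removes the first hazard.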

**Operation:**

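\[
\begin{align*}
T-2: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T-1: & \quad \text{LO} \leftarrow \text{undefined} \\
& \quad \text{HI} \leftarrow \text{undefined} \\
T: & \quad q \leftarrow (0 \parallel \text{GPR}[rs]_{31..0}) \text{ div } (0 \parallel \text{GPR}[rt]_{31..0}) \\
& \quad r \leftarrow (0 \parallel \text{GPR}[rs]_{31..0}) \text{ mod } (0 \parallel \text{GPR}[rt]_{31..0}) \\
& \quad \text{LO} \leftarrow (q_{31})^{32} \parallel q_{31..0} \\
& \quad \text{HI} \leftarrow (r_{31})^{32} \parallel r_{31..0}
\end{align*}
\]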

**Exceptions:**
None
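
As a sketch of the Operation above, the following C function models how DIVU fills LO and HI on a 64-bit register file. The names `hilo_t`, `sign_extend32`, and `divu_model` are illustrative and not part of any IDT tool. Because the hardware result is undefined for a zero divisor, the model reports failure instead of guessing a value; real code tests the divisor with separate instructions, as noted in the description.

```c
#include <stdint.h>

/* Illustrative container for the two special registers. */
typedef struct {
    uint64_t lo;  /* quotient, sign-extended from 32 bits  */
    uint64_t hi;  /* remainder, sign-extended from 32 bits */
} hilo_t;

/* Replicate bit 31 into bits 63..32: (w31)^32 || w31..0 */
static uint64_t sign_extend32(uint32_t w)
{
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ull | w) : (uint64_t)w;
}

/* Sketch of DIVU: only the low 32 bits of each operand are used and both
   are treated as unsigned. Returns 0 for a zero divisor, where the
   hardware result is undefined; otherwise fills *out and returns 1. */
int divu_model(uint64_t rs, uint64_t rt, hilo_t *out)
{
    uint32_t dividend = (uint32_t)rs;  /* GPR[rs]31..0 */
    uint32_t divisor  = (uint32_t)rt;  /* GPR[rt]31..0 */
    if (divisor == 0)
        return 0;
    out->lo = sign_extend32(dividend / divisor);
    out->hi = sign_extend32(dividend % divisor);
    return 1;
}
```

Note the sign extension even though the division is unsigned: with rs = 0xFFFFFFFFFFFFFFFF (the sign-extended form of 0xFFFFFFFF) and rt = 1, LO receives 0xFFFFFFFFFFFFFFFF.
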
**DMFC0** Doubleword Move From System Control Coprocessor

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..0</th></tr>
</thead>
<tbody>
<tr><td>COP0<br>010000</td><td>DMF<br>00001</td><td>rt</td><td>rd</td><td>0<br>00000000000</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>11</td></tr>
</tbody>
</table>

**Format:**
DMFC0 rt, rd

**Description:**
The contents of coprocessor register rd of the CP0 are loaded into general register rt.

This operation is defined in kernel mode regardless of the setting of the Status.KX bit. Execution of this instruction in supervisor mode with Status.SX = 0 or in user mode with Status.UX = 0 causes a reserved instruction exception. All 64 bits of the general register destination are written from the coprocessor register source. The operation of DMFC0 on a 32-bit coprocessor 0 register is undefined.

**Operation:**

\[
\begin{align*}
T: & \quad \text{data} \leftarrow \text{CPR}[0,\text{rd}] \\
T+1: & \quad \text{GPR}[rt] \leftarrow \text{data}
\end{align*}
\]

**Exceptions:**
- Coprocessor unusable exception
- Reserved instruction exception for supervisor mode with Status.SX = 0 or user mode with Status.UX = 0.
**DMTC0** Doubleword Move To System Control Coprocessor

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..0</th></tr>
</thead>
<tbody>
<tr><td>COP0<br>010000</td><td>DMT<br>00101</td><td>rt</td><td>rd</td><td>0<br>00000000000</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>11</td></tr>
</tbody>
</table>

**Format:**

DMTC0 rt, rd

**Description:**

The contents of general register rt are loaded into coprocessor register rd of the CP0.

This operation is defined in kernel mode regardless of the setting of the Status.KX bit. Execution of this instruction in supervisor mode with Status.SX = 0 or in user mode with Status.UX = 0 causes a reserved instruction exception.

All 64 bits of the coprocessor 0 register are written from the general register source. The operation of DMTC0 on a 32-bit coprocessor 0 register is undefined.

Because this instruction may alter the state of the virtual address translation system, the operation of load instructions, store instructions, and TLB operations immediately before and after this instruction is undefined.

**Operation:**

\[
\begin{align*}
T: & \quad \text{data} \leftarrow \text{GPR}[rt] \\
T+1: & \quad \text{CPR}[0,rd] \leftarrow \text{data}
\end{align*}
\]

**Exceptions:**

Reserved instruction exception for supervisor mode with Status.SX = 0 or user mode with Status.UX = 0.
DMULT Doubleword Multiply

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>rt</td><td>0<br>0000000000</td><td>DMULT<br>011100</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>10</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DMULT rs, rt

**Description:**
The contents of general registers rs and rt are multiplied, treating both operands as 2’s complement values. No integer overflow exception occurs under any circumstances.

When the operation completes, the low-order doubleword of the 128-bit result is loaded into special register LO, and the high-order doubleword is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of these instructions are undefined. Correct operation requires separating reads of HI or LO from writes by a minimum of two other instructions.

**Operation:**

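\[
\begin{align*}
T-2: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T-1: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T: & \quad t \leftarrow \text{GPR}[rs] * \text{GPR}[rt] \\
& \quad LO \leftarrow t_{63..0} \\
& \quad HI \leftarrow t_{127..64}
\end{align*}
\]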

**Exceptions:**
None
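
The following C sketch computes the DMULT result and splits it into LO and HI. It assumes a GCC or Clang compiler on a 64-bit host, which provides the non-standard `__int128` type; converting a `uint64_t` above INT64_MAX to `int64_t` is implementation-defined in C before C23, and these compilers wrap it modulo 2^64. The names `hilo_t` and `dmult_model` are hypothetical.

```c
#include <stdint.h>

/* LO receives bits 63..0 and HI bits 127..64 of the product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} hilo_t;

/* Sketch of DMULT: signed 64 x 64 -> 128-bit multiply.
   Assumes a GCC/Clang 64-bit host that provides __int128. */
hilo_t dmult_model(uint64_t rs, uint64_t rt)
{
    /* Reinterpret both registers as two's-complement values. */
    __int128 t = (__int128)(int64_t)rs * (__int128)(int64_t)rt;
    unsigned __int128 u = (unsigned __int128)t;

    hilo_t r;
    r.lo = (uint64_t)u;          /* t63..0   */
    r.hi = (uint64_t)(u >> 64);  /* t127..64 */
    return r;
}
```

For example, (-1) × 2 gives LO = 0xFFFFFFFFFFFFFFFE and HI = 0xFFFFFFFFFFFFFFFF, the two halves of the 128-bit value -2.
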
DMULTU Doubleword Multiply Unsigned

Format:
DMULTU rs, rt

Description:
The contents of general register rs and the contents of general register rt are multiplied, treating both operands as unsigned values. No overflow exception occurs under any circumstances.

When the operation completes, the low-order doubleword of the 128-bit result is loaded into special register LO, and the high-order doubleword is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of these instructions are undefined. Correct operation requires separating reads of HI or LO from writes by a minimum of two instructions.

Operation:

T–2:  LO ← undefined
      HI ← undefined
T–1:  LO ← undefined
      HI ← undefined
T:    t ← (0 || GPR[rs]) * (0 || GPR[rt])
      LO ← t_{63..0}
      HI ← t_{127..64}

Exceptions:
None
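
DMULTU needs the full 128-bit unsigned product. The sketch below avoids compiler extensions by splitting each operand into 32-bit halves and summing the four partial products (schoolbook multiplication); `dmultu_model` and `hilo_t` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint64_t lo;  /* t63..0   */
    uint64_t hi;  /* t127..64 */
} hilo_t;

/* Portable 64 x 64 -> 128-bit unsigned multiply from 32-bit halves. */
hilo_t dmultu_model(uint64_t rs, uint64_t rt)
{
    uint64_t a_lo = rs & 0xFFFFFFFFu, a_hi = rs >> 32;
    uint64_t b_lo = rt & 0xFFFFFFFFu, b_hi = rt >> 32;

    uint64_t p0 = a_lo * b_lo;  /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;  /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;  /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;  /* weight 2^64 */

    /* Middle column: three terms, each below 2^32, so no overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    hilo_t r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Squaring 0xFFFFFFFFFFFFFFFF, for instance, yields HI = 0xFFFFFFFFFFFFFFFE and LO = 1, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.
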
### DSLL
#### Doubleword Shift Left Logical

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>0<br>00000</td><td>rt</td><td>rd</td><td>sa</td><td>DSLL<br>111000</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DSLL rd, rt, sa

**Description:**
The contents of general register rt are shifted left by sa bits, inserting zeros into the low-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow 0 \parallel sa
\]

\[
GPR[rd] \leftarrow GPR[rt]_{(63-s)\ldots0} \parallel 0^s
\]

**Exceptions:**

None
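
The `s ← 0 || sa` step means DSLL can only encode shifts of 0 to 31 bits; larger amounts use DSLL32. A minimal C sketch (the function name `dsll_model` is hypothetical):

```c
#include <stdint.h>

/* Sketch of DSLL. sa is the 5-bit instruction field; s <- 0 || sa. */
uint64_t dsll_model(uint64_t rt, unsigned sa)
{
    unsigned s = sa & 0x1Fu;  /* 5-bit field; bit 5 of s is 0 */
    return rt << s;           /* zeros enter at the low-order end */
}
```

Unlike the 32-bit SLL, whose result is sign-extended from bit 31, DSLL does not sign-extend: `dsll_model(1, 31)` is 0x0000000080000000.
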
**DSLLV Doubleword Shift Left Logical Variable**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>rt</td><td>rd</td><td>0<br>00000</td><td>DSLLV<br>010100</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

- **Format:**
  
  DSLLV rd, rt, rs

- **Description:**
  
  The contents of general register rt are shifted left by the number of bits specified by the low-order six bits contained in general register rs, inserting zeros into the low-order bits. The result is placed in register rd.

- **Operation:**

  T:  $s \leftarrow \text{GPR}[rs]_{5..0}$  
  $\text{GPR}[rd] \leftarrow \text{GPR}[rt]_{(63-s)..0} \parallel 0^s$

- **Exceptions:**

  None
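
In C, shifting a 64-bit value by 64 or more bits is undefined behavior, so a model of DSLLV must apply the same six-bit mask that the hardware applies to GPR[rs]. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Sketch of DSLLV: the shift amount is GPR[rs]5..0, i.e. 0..63.
   The mask also keeps the C shift well defined. */
uint64_t dsllv_model(uint64_t rt, uint64_t rs)
{
    unsigned s = (unsigned)(rs & 0x3Fu);  /* low-order six bits of rs */
    return rt << s;                       /* zeros enter at the low end */
}
```

A shift count of 64 in rs therefore behaves like a shift of 0, and 65 like a shift of 1.
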
**DSLL32** Doubleword Shift Left Logical + 32

**Format:**
DSLL32 rd, rt, sa

**Description:**
The contents of general register rt are shifted left by 32+sa bits, inserting zeros into the low-order bits. The result is placed in register rd.

**Operation:**

\[
\begin{align*}
T & : s \leftarrow 1 \parallel sa \\
\text{GPR}[rd] & \leftarrow \text{GPR}[rt]_{(63-s)..0} \parallel 0^{s}
\end{align*}
\]

**Exceptions:**
None
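
Setting the high bit of the six-bit shift amount (`s ← 1 || sa`) makes DSLL32 shift by 32 through 63 bits. A C sketch (the name `dsll32_model` is hypothetical):

```c
#include <stdint.h>

/* Sketch of DSLL32: s <- 1 || sa, i.e. a shift of 32 + sa. */
uint64_t dsll32_model(uint64_t rt, unsigned sa)
{
    unsigned s = 32u | (sa & 0x1Fu);  /* 1 || sa */
    return rt << s;                   /* zeros enter at the low-order end */
}
```

`DSLL32 rd, rt, 0` therefore moves the low word of rt into the high word of rd and clears the low word.
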
**DSRA**

**Doubleword Shift Right Arithmetic**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>0<br>00000</td><td>rt</td><td>rd</td><td>sa</td><td>DSRA<br>111011</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DSRA rd, rt, sa

**Description:**
The contents of general register rt are shifted right by sa bits, sign-extending the high-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow 0 || sa \\
GPR[rd] \leftarrow (GPR[rt]_{63})^s || GPR[rt]_{63..s}
\]

**Exceptions:**
None
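
C does not guarantee that `>>` on a negative signed integer replicates the sign bit (the result is implementation-defined), so this sketch shifts an unsigned value and fills the vacated high-order bits explicitly, matching the `(GPR[rt]63)^s` term. The name `dsra_model` is hypothetical.

```c
#include <stdint.h>

/* Sketch of DSRA: s <- 0 || sa;
   result = (GPR[rt]63)^s || GPR[rt]63..s */
uint64_t dsra_model(uint64_t rt, unsigned sa)
{
    unsigned s = sa & 0x1Fu;           /* 0 || sa: 0..31 */
    uint64_t shifted = rt >> s;        /* logical shift: zeros enter */
    if (rt >> 63)                      /* negative: copy bit 63 into the top s bits */
        shifted |= ~(UINT64_MAX >> s); /* zero mask when s == 0 */
    return shifted;
}
```
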
**DSRAV** Doubleword Shift Right Arithmetic Variable

**Format:**

DSRAV rd, rt, rs

**Description:**

The contents of general register rt are shifted right by the number of bits specified by the low-order six bits of general register rs, sign-extending the high-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow \text{GPR}[rs]_{5..0} \\
\text{GPR}[rd] \leftarrow (\text{GPR}[rt]_{63})^s \| \text{GPR}[rt]_{63..s}
\]

**Exceptions:**

None
DSRA32 Doubleword Shift Right Arithmetic + 32

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>0<br>00000</td><td>rt</td><td>rd</td><td>sa</td><td>DSRA32<br>111111</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DSRA32 rd, rt, sa

**Description:**
The contents of general register rt are shifted right by 32+sa bits, sign-extending the high-order bits. The result is placed in register rd.

**Operation:**

\[
\begin{align*}
T & : s \leftarrow 1 || sa \\
GPR[rd] & \leftarrow (GPR[rt]_{63})^s || GPR[rt]_{63..s}
\end{align*}
\]

**Exceptions:**
None
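
With `s ← 1 || sa`, `DSRA32 rd, rt, 0` extracts the high word of rt as a sign-extended 32-bit value. The C sketch below fills the vacated bits explicitly, because `>>` on negative signed values is implementation-defined in C; `dsra32_model` is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of DSRA32: s <- 1 || sa, an arithmetic shift by 32 + sa. */
uint64_t dsra32_model(uint64_t rt, unsigned sa)
{
    unsigned s = 32u | (sa & 0x1Fu);  /* 1 || sa: 32..63 */
    uint64_t shifted = rt >> s;       /* logical shift: zeros enter */
    if (rt >> 63)                     /* negative: replicate bit 63 */
        shifted |= ~(UINT64_MAX >> s);
    return shifted;
}
```
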
**DSRL**

**Doubleword Shift Right Logical**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>0<br>00000</td><td>rt</td><td>rd</td><td>sa</td><td>DSRL<br>111010</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**

DSRL rd, rt, sa

**Description:**

The contents of general register rt are shifted right by sa bits, inserting zeros into the high-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow 0 || sa \\
GPR[rd] \leftarrow 0^s || GPR[rt]_{63..s}
\]

**Exceptions:**

None
### DSRLV

**Doubleword Shift Right Logical Variable**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>rt</td><td>rd</td><td>0<br>00000</td><td>DSRLV<br>010110</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**

DSRLV rd, rt, rs

**Description:**

The contents of general register rt are shifted right by the number of bits specified by the low-order six bits of general register rs, inserting zeros into the high-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow \text{GPR}[rs]_{5..0} \\
\text{GPR}[rd] \leftarrow 0^s \ || \ \text{GPR}[rt]_{63..s}
\]

**Exceptions:**

None
**DSRL32**  
**Doubleword Shift Right Logical + 32**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>0<br>00000</td><td>rt</td><td>rd</td><td>sa</td><td>DSRL32<br>111110</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DSRL32 rd, rt, sa

**Description:**
The contents of general register rt are shifted right by $32 + sa$ bits, inserting zeros into the high-order bits. The result is placed in register rd.

**Operation:**

\[
T: \quad s \leftarrow 1 \parallel sa \\
GPR[rd] \leftarrow 0^s \parallel GPR[rt]_{63..s}
\]

**Exceptions:**
None
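
One common use of the +32 logical shifts, shown here as an illustration rather than as a sequence taken from this manual, is zero-extending the low word of a register with `DSLL32 rd, rt, 0` followed by `DSRL32 rd, rd, 0`. This matters because 32-bit operations such as LW leave their results sign-extended. Modeled in C (all function names hypothetical):

```c
#include <stdint.h>

/* DSLL32: s <- 1 || sa, shift left by 32 + sa, zeros enter at the bottom. */
static uint64_t dsll32_model(uint64_t rt, unsigned sa)
{
    return rt << (32u | (sa & 0x1Fu));
}

/* DSRL32: s <- 1 || sa, shift right by 32 + sa, zeros enter at the top. */
uint64_t dsrl32_model(uint64_t rt, unsigned sa)
{
    return rt >> (32u | (sa & 0x1Fu));
}

/* Hypothetical helper for the sequence DSLL32 rd, rt, 0 ; DSRL32 rd, rd, 0 */
uint64_t zero_extend_low_word(uint64_t rt)
{
    return dsrl32_model(dsll32_model(rt, 0), 0);
}
```
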
DSUB Doubleword Subtract

Format:
DSUB rd, rs, rt

Description:
The contents of general register rt are subtracted from the contents of general register rs to form a result. The result is placed into general register rd.
The only difference between this instruction and the DSUBU instruction is that DSUBU never traps on overflow.
An integer overflow exception takes place if the carries out of bits 62 and 63 differ (2’s complement overflow). The destination register rd is not modified when an integer overflow exception occurs.

Operation:
T: GPR[rd] ← GPR[rs] − GPR[rt]

Exceptions:
Integer overflow exception
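
The following C sketch contrasts DSUB and DSUBU. Both compute the difference modulo 2^64; DSUB additionally detects two's-complement overflow, which occurs exactly when the operands have different signs and the sign of the result differs from that of rs (equivalently, when the carries out of bits 62 and 63 differ). The model leaves rd unchanged on overflow, mirroring the exception behavior described above. The function names are hypothetical.

```c
#include <stdint.h>

/* DSUBU: unsigned arithmetic wraps modulo 2^64 and never traps. */
uint64_t dsubu_model(uint64_t rs, uint64_t rt)
{
    return rs - rt;
}

/* DSUB: returns 1 and writes *rd on success; returns 0 with *rd
   untouched where the hardware raises an integer overflow exception. */
int dsub_model(uint64_t rs, uint64_t rt, uint64_t *rd)
{
    uint64_t diff = rs - rt;
    /* Bit 63 of (rs ^ rt) & (rs ^ diff) is set on signed overflow. */
    if (((rs ^ rt) & (rs ^ diff)) >> 63)
        return 0;
    *rd = diff;
    return 1;
}
```
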
**DSUBU** Doubleword Subtract Unsigned

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>rt</td><td>rd</td><td>0<br>00000</td><td>DSUBU<br>101111</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
DSUBU rd, rs, rt

**Description:**
The contents of general register \( rt \) are subtracted from the contents of general register \( rs \) to form a result. The result is placed into general register \( rd \).

The only difference between this instruction and the DSUB instruction is that DSUBU never traps on overflow. No integer overflow exception occurs under any circumstances.

**Operation:**

\[
T: \quad \text{GPR}[rd] \leftarrow \text{GPR}[rs] - \text{GPR}[rt]
\]

**Exceptions:**
None
**ERET** Exception Return

<table>
<thead>
<tr><th>31..26</th><th>25</th><th>24..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>COP0<br>010000</td><td>CO<br>1</td><td>0<br>0000000000000000000</td><td>ERET<br>011000</td></tr>
<tr><td>6</td><td>1</td><td>19</td><td>6</td></tr>
</tbody>
</table>

**Format:**
ERET

**Description:**
ERET is the R4600 instruction for returning from an interrupt, exception, or error trap. Unlike a branch or jump instruction, ERET does not execute the next instruction.

ERET must not itself be placed in a branch delay slot.

If the processor is servicing an error trap (SR₂ = 1), the PC is loaded from *ErrorEPC* and the *ERL* bit of the *Status* register (SR₂) is cleared. Otherwise (SR₂ = 0), the PC is loaded from *EPC* and the *EXL* bit of the *Status* register (SR₁) is cleared.

An ERET executed between an LL and an SC also causes the SC to fail.

**Operation:**

\[
\begin{align*}
T: & \quad \text{if } SR_{2} = 1 \text{ then} \\
& \quad\quad PC \leftarrow \text{ErrorEPC} \\
& \quad\quad SR \leftarrow SR_{31..3} \parallel 0 \parallel SR_{1..0} \\
& \quad \text{else} \\
& \quad\quad PC \leftarrow \text{EPC} \\
& \quad\quad SR \leftarrow SR_{31..2} \parallel 0 \parallel SR_{0} \\
& \quad \text{endif} \\
& \quad \text{LLbit} \leftarrow 0
\end{align*}
\]

**Exceptions:**
Coprocessor unusable exception
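
A C sketch of the ERET state update follows. The `cpu_t` structure, its field names, and `eret_model` are hypothetical; only the bit positions follow the text above (ERL is Status bit 2, EXL is Status bit 1).

```c
#include <stdint.h>

/* Hypothetical CPU state used only for this illustration. */
typedef struct {
    uint64_t pc, epc, error_epc;
    uint32_t status;  /* Status register (SR) */
    int llbit;        /* link bit used by LL/SC */
} cpu_t;

#define SR_EXL (1u << 1)  /* SR1 */
#define SR_ERL (1u << 2)  /* SR2 */

void eret_model(cpu_t *c)
{
    if (c->status & SR_ERL) {   /* servicing an error trap */
        c->pc = c->error_epc;
        c->status &= ~SR_ERL;   /* SR31..3 || 0 || SR1..0 */
    } else {
        c->pc = c->epc;
        c->status &= ~SR_EXL;   /* SR31..2 || 0 || SR0 */
    }
    c->llbit = 0;               /* a following SC will fail */
}
```

If both ERL and EXL are set, the first ERET returns from the error trap and clears only ERL; EXL stays set until a later ERET.
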
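**J** Jump

<table>
<thead>
<tr><th>31..26</th><th>25..0</th></tr>
</thead>
<tbody>
<tr><td>J<br>000010</td><td>target</td></tr>
<tr><td>6</td><td>26</td></tr>
</tbody>
</table>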

**Format:**

J target

**Description:**

The 26-bit target address is shifted left two bits and combined with the high-order bits of the address of the delay slot. The program unconditionally jumps to this calculated address with a delay of one instruction.

**Operation:**

- T: temp $\leftarrow$ target
- T+1: PC $\leftarrow$ PC$_{63..28}$ || temp || 0$^2$

**Exceptions:**

None
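
The target computation can be checked with a short C sketch. `j_target` is a hypothetical name; `delay_slot_pc` is the address of the delay-slot instruction, which is the PC value used at T+1.

```c
#include <stdint.h>

/* Sketch of the J target: PC63..28 || target || 0^2, where PC is the
   delay-slot address and instr is the 32-bit J instruction word. */
uint64_t j_target(uint64_t delay_slot_pc, uint32_t instr)
{
    uint64_t index = instr & 0x03FFFFFFu;           /* 26-bit target field */
    return (delay_slot_pc & ~UINT64_C(0x0FFFFFFF))  /* keep PC63..28       */
           | (index << 2);                          /* word address, 0^2   */
}
```

Because only bits 27..0 are replaced, J reaches any word in the same 256-MB region as the delay slot and nothing outside it.
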
**JAL**

**Jump And Link**

**Format:**

JAL target

**Description:**
The 26-bit target address is shifted left two bits and combined with the high-order bits of the address of the delay slot. The program unconditionally jumps to this calculated address with a delay of one instruction. The address of the instruction after the delay slot is placed in the link register, r31.

**Operation:**

- T: temp $\leftarrow$ target
- GPR[31] $\leftarrow$ PC + 8
- T+1: PC $\leftarrow$ PC$_{63..28}$ || temp || 0$^2$

**Exceptions:**

None
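
The sketch below (hypothetical names) shows both JAL effects and one edge case worth knowing: the high-order target bits come from the delay slot, not from the JAL itself, so a JAL in the last word of a 256-MB region jumps into the next region.

```c
#include <stdint.h>

typedef struct {
    uint64_t target;  /* new PC after the delay slot */
    uint64_t link;    /* value written to GPR[31]    */
} jal_result_t;

/* Sketch of JAL; jal_pc is the address of the JAL instruction. */
jal_result_t jal_model(uint64_t jal_pc, uint32_t instr)
{
    uint64_t delay_slot_pc = jal_pc + 4;
    jal_result_t r;
    r.link = jal_pc + 8;  /* instruction after the delay slot */
    r.target = (delay_slot_pc & ~UINT64_C(0x0FFFFFFF))
             | ((uint64_t)(instr & 0x03FFFFFFu) << 2);
    return r;
}
```
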
**JALR** Jump And Link Register

**Format:**

- JALR rs
- JALR rd, rs

**Description:**

The program unconditionally jumps to the address contained in general register rs, with a delay of one instruction. The address of the instruction after the delay slot is placed in general register rd. The default value of rd, if omitted in the assembly language instruction, is 31.

Register specifiers rs and rd may not be equal, because such an instruction does not have the same effect when re-executed. However, an attempt to execute this instruction is *not* trapped, and the result of executing such an instruction is undefined.

Since instructions must be word-aligned, a **Jump and Link Register** instruction must specify a target register (rs) whose two low-order bits are zero. If these low-order bits are not zero, an address exception will occur when the jump target instruction is subsequently fetched.

**Operation:**

- T: temp $\leftarrow$ GPR[rs]
- GPR[rd] $\leftarrow$ PC + 8
- T+1: PC $\leftarrow$ temp

**Instruction format:**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>0<br>00000</td><td>rd</td><td>0<br>00000</td><td>JALR<br>001001</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Exceptions:**

None

**JR**

**Jump Register**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL<br>000000</td><td>rs</td><td>0<br>000000000000000</td><td>JR<br>001000</td></tr>
<tr><td>6</td><td>5</td><td>15</td><td>6</td></tr>
</tbody>
</table>

**Format:**

JR rs

**Description:**

The program unconditionally jumps to the address contained in general register rs, with a delay of one instruction.

Since instructions must be word-aligned, a Jump Register instruction must specify a target register (rs) whose two low-order bits are zero. If these low-order bits are not zero, an address exception will occur when the jump target instruction is subsequently fetched.

**Operation:**

- T: temp $\leftarrow$ GPR[rs]
- T+1: PC $\leftarrow$ temp

**Exceptions:**

None
### LB Load Byte

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>LB<br>100000</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**

LB rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the byte at the memory location specified by the effective address are sign-extended and loaded into general register rt.

**Operation:**

\[
\begin{align*}
T: & \quad vAddr \leftarrow ((\text{offset}_{15})^{48} || \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& \quad pAddr \leftarrow pAddr_{\text{PSIZE} - 1..3} || (pAddr_{2..0} \text{xor ReverseEndian}^3) \\
& \quad \text{mem} \leftarrow \text{LoadMemory}(\text{uncached, BYTE, pAddr, vAddr, DATA}) \\
& \quad \text{byte} \leftarrow vAddr_{2..0} \text{xor BigEndianCPU}^3 \\
& \quad \text{GPR}[rt] \leftarrow (\text{mem}_{7+8*\text{byte}})^{56} || \text{mem}_{7+8*\text{byte}..8*\text{byte}}
\end{align*}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
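
LoadMemory returns an aligned doubleword, and LB then selects one byte lane from it. The C sketch below models that selection for both byte orders, together with the zero-extending LBU variant. The function names are hypothetical, and bit 0 of the `mem` argument is taken to be bit 0 of the doubleword in the Operation.

```c
#include <stdint.h>

/* byte <- vAddr2..0 xor BigEndianCPU^3 */
static unsigned byte_lane(uint64_t vaddr, int big_endian_cpu)
{
    return (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 7u : 0u);
}

/* LB: (mem_{7+8*byte})^56 || mem_{7+8*byte..8*byte} */
uint64_t lb_extract(uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    uint64_t b = (mem >> (8u * byte_lane(vaddr, big_endian_cpu))) & 0xFFu;
    return (b & 0x80u) ? (b | 0xFFFFFFFFFFFFFF00ull) : b;
}

/* LBU: 0^56 || mem_{7+8*byte..8*byte} */
uint64_t lbu_extract(uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    return (mem >> (8u * byte_lane(vaddr, big_endian_cpu))) & 0xFFu;
}
```
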
**LBU**  
Load Byte Unsigned

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>LBU<br>100100</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**  
LBU rt, offset(base)

**Description:**  
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the byte at the memory location specified by the effective address are zero-extended and loaded into general register rt.

**Operation:**

\[
T: \quad vAddr \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}]
\]

\[
(p\text{Addr, uncached}) \leftarrow \text{AddressTranslation}(v\text{Addr, DATA})
\]

\[
p\text{Addr} \leftarrow p\text{Addr}_{\text{PSIZE} - 1..3} \| (p\text{Addr}_{2..0} \text{xor ReverseEndian}^3)
\]

\[
\text{mem} \leftarrow \text{LoadMemory}(\text{uncached, BYTE, pAddr, vAddr, DATA})
\]

\[
\text{byte} \leftarrow v\text{Addr}_{2..0} \text{xor BigEndianCPU}^3
\]

\[
\text{GPR}[\text{rt}] \leftarrow 0^{56} \| \text{mem}_{7+8*\text{byte}..8*\text{byte}}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
**LD** Load Doubleword

**Format:**
LD rt, offset(base)

**Description:**
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the 64-bit doubleword at the memory location specified by the effective address are loaded into general register rt.

If any of the three least-significant bits of the effective address are non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad vAddr \leftarrow ((\text{offset}_{15})^{48} || \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& \quad \text{mem} \leftarrow \text{LoadMemory}(\text{uncached, DOUBLEWORD}, pAddr, vAddr, \text{DATA}) \\
& \quad \text{GPR}[rt] \leftarrow \text{mem}
\end{align*}
\]

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
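
The effective-address and alignment rules can be sketched in C as follows. `ld_effective_address` is a hypothetical helper; it returns 0 where the processor would take an address error exception.

```c
#include <stdint.h>

/* Sketch of LD address formation: sign-extend the 16-bit offset
   ((offset15)^48 || offset15..0), add GPR[base] modulo 2^64, and
   require doubleword alignment. */
int ld_effective_address(uint64_t base, int16_t offset, uint64_t *vaddr)
{
    uint64_t va = base + (uint64_t)(int64_t)offset;
    if (va & 7u)
        return 0;  /* one of the three low-order bits is set */
    *vaddr = va;
    return 1;
}
```
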
**LDCz** Load Doubleword To Coprocessor

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>LDCz<br>1101xx*</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**
LDCz rt, offset(base)

**Description:**
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The processor reads a doubleword from the addressed memory location and makes the data available to coprocessor unit z. The manner in which each coprocessor uses the data is defined by the individual coprocessor specifications.

If any of the three least-significant bits of the effective address are non-zero, an address error exception takes place.

This instruction is not valid for use with CP0.

This instruction is undefined when the least-significant bit of the rt field is non-zero.

Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception, not a coprocessor unusable exception.

NOTE: \* See the table “Opcode Bit Encoding” below, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Operation:**

\[
\begin{align*}
T: & \quad vAddr \leftarrow ((offset_{15})^{48} || offset_{15..0}) + GPR[base] \\
& \quad (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& \quad \text{mem} \leftarrow \text{LoadMemory}(\text{uncached, DOUBLEWORD}, pAddr, vAddr, \text{DATA}) \\
& \quad \text{COPzLD}(rt, \text{mem})
\end{align*}
\]

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
- Coprocessor unusable exception
- Reserved instruction exception (coprocessor 3)

**Opcode Bit Encoding:**

<table>
<thead>
<tr><th>Instruction</th><th>Opcode (bits 31..28)</th><th>Coprocessor unit number (bits 27..26)</th></tr>
</thead>
<tbody>
<tr><td>LDC1</td><td>1101</td><td>01</td></tr>
<tr><td>LDC2</td><td>1101</td><td>10</td></tr>
</tbody>
</table>
**LDL Load Doubleword Left**

**Format:**
LDL rt, offset(base)

**Description:**
This instruction can be used in combination with the LDR instruction to load a register with eight consecutive bytes from memory, when the bytes cross a doubleword boundary. LDL loads the left portion of the register with the appropriate part of the high-order doubleword; LDR loads the right portion of the register with the appropriate part of the low-order doubleword.

The LDL instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte. It reads bytes only from the doubleword in memory which contains the specified starting byte. From one to eight bytes will be loaded, depending on the starting byte specified.

Conceptually, it starts at the specified byte in memory and loads that byte into the high-order (left-most) byte of the register; then it loads bytes from memory into the register until it reaches the low-order byte of the doubleword in memory. The least-significant (right-most) byte(s) of the register will not be changed.

The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDL (or LDR) instruction which also specifies register rt.

No address exceptions due to alignment are possible.
**Operation:**

\[
\begin{align*}
T: & \quad vAddr \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& \quad pAddr \leftarrow pAddr_{\text{PSIZE}-1..3} \parallel (pAddr_{2..0} \text{ xor } \text{ReverseEndian}^{3}) \\
& \quad \text{if BigEndianMem} = 0 \text{ then} \\
& \quad\quad pAddr \leftarrow pAddr_{\text{PSIZE}-1..3} \parallel 0^{3} \\
& \quad \text{endif} \\
& \quad \text{byte} \leftarrow vAddr_{2..0} \text{ xor } \text{BigEndianCPU}^{3} \\
& \quad \text{mem} \leftarrow \text{LoadMemory}(\text{uncached}, \text{byte}, pAddr, vAddr, \text{DATA}) \\
& \quad \text{GPR}[rt] \leftarrow \text{mem}_{7+8*\text{byte}..0} \parallel \text{GPR}[rt]_{55-8*\text{byte}..0}
\end{align*}
\]

Given a doubleword in a register and a doubleword in memory, the operation of LDL is as follows:

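<table>
<thead>
<tr><th></th><th colspan="8">Bytes, most significant (bits 63..56) to least significant (bits 7..0)</th></tr>
</thead>
<tbody>
<tr><td>Register</td><td>A</td><td>B</td><td>C</td><td>D</td><td>E</td><td>F</td><td>G</td><td>H</td></tr>
<tr><td>Memory</td><td>I</td><td>J</td><td>K</td><td>L</td><td>M</td><td>N</td><td>O</td><td>P</td></tr>
</tbody>
</table>

<table>
<thead>
<tr><th rowspan="2">vAddr<sub>2..0</sub></th><th colspan="4">BigEndianCPU = 0</th><th colspan="4">BigEndianCPU = 1</th></tr>
<tr><th>Destination</th><th>Type</th><th>Offset (LEM)</th><th>Offset (BEM)</th><th>Destination</th><th>Type</th><th>Offset (LEM)</th><th>Offset (BEM)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>P B C D E F G H</td><td>0</td><td>0</td><td>7</td><td>I J K L M N O P</td><td>7</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>O P C D E F G H</td><td>1</td><td>0</td><td>6</td><td>J K L M N O P H</td><td>6</td><td>0</td><td>1</td></tr>
<tr><td>2</td><td>N O P D E F G H</td><td>2</td><td>0</td><td>5</td><td>K L M N O P G H</td><td>5</td><td>0</td><td>2</td></tr>
<tr><td>3</td><td>M N O P E F G H</td><td>3</td><td>0</td><td>4</td><td>L M N O P F G H</td><td>4</td><td>0</td><td>3</td></tr>
<tr><td>4</td><td>L M N O P F G H</td><td>4</td><td>0</td><td>3</td><td>M N O P E F G H</td><td>3</td><td>0</td><td>4</td></tr>
<tr><td>5</td><td>K L M N O P G H</td><td>5</td><td>0</td><td>2</td><td>N O P D E F G H</td><td>2</td><td>0</td><td>5</td></tr>
<tr><td>6</td><td>J K L M N O P H</td><td>6</td><td>0</td><td>1</td><td>O P C D E F G H</td><td>1</td><td>0</td><td>6</td></tr>
<tr><td>7</td><td>I J K L M N O P</td><td>7</td><td>0</td><td>0</td><td>P B C D E F G H</td><td>0</td><td>0</td><td>7</td></tr>
</tbody>
</table>

**LEM** Little-endian memory (BigEndianMem = 0)

**BEM** BigEndianMem = 1

**Type** AccessType (see Table 2.1 on page 3) sent to memory

**Offset** pAddr<sub>2..0</sub> sent to memory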

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
## LDR

### Load Doubleword Right

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>LDR<br>011011</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**

LDR rt, offset(base)

**Description:**

This instruction can be used in combination with the LDL instruction to load a register with eight consecutive bytes from memory, when the bytes cross a doubleword boundary. LDR loads the right portion of the register with the appropriate part of the low-order doubleword; LDL loads the left portion of the register with the appropriate part of the high-order doubleword.

The LDR instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte. It reads bytes only from the doubleword in memory which contains the specified starting byte. From one to eight bytes will be loaded, depending on the starting byte specified.

Conceptually, it starts at the specified byte in memory and loads that byte into the low-order (right-most) byte of the register; then it loads bytes from memory into the register until it reaches the high-order byte of the doubleword in memory. The most significant (left-most) byte(s) of the register will not be changed.

The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LDR (or LDL) instruction which also specifies register rt.

No address exceptions due to alignment are possible.

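For example, with big-endian byte ordering, suppose the doubleword at address 0 holds the bytes 0 through 7 (the next doubleword begins at address 8) and register `$24` holds A B C D E F G H:

<table>
<thead>
<tr><th></th><th>Contents</th></tr>
</thead>
<tbody>
<tr><td>Memory, address 0 (big-endian)</td><td>0 1 2 3 4 5 6 7</td></tr>
<tr><td><code>$24</code> before</td><td>A B C D E F G H</td></tr>
<tr><td><code>$24</code> after <code>LDR $24,4($0)</code></td><td>A B C 0 1 2 3 4</td></tr>
</tbody>
</table>

After the LDR instruction, register `$24` contains A B C 0 1 2 3 4: the five bytes at addresses 0 through 4 are loaded into the five low-order bytes of the register, and the three high-order bytes (A B C) are left unchanged.
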
**Operation:**

\[
\begin{align*}
T: & \quad vAddr \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& \quad pAddr \leftarrow pAddr_{\text{PSIZE}-1..3} \parallel (pAddr_{2..0} \text{ xor } \text{ReverseEndian}^{3}) \\
& \quad \text{if BigEndianMem} = 1 \text{ then} \\
& \quad\quad pAddr \leftarrow pAddr_{\text{PSIZE}-1..3} \parallel 0^{3} \\
& \quad \text{endif} \\
& \quad \text{byte} \leftarrow vAddr_{2..0} \text{ xor } \text{BigEndianCPU}^{3} \\
& \quad \text{mem} \leftarrow \text{LoadMemory}(\text{uncached}, \text{DOUBLEWORD} - \text{byte}, pAddr, vAddr, \text{DATA}) \\
& \quad \text{GPR}[rt] \leftarrow \text{GPR}[rt]_{63..64-8*\text{byte}} \parallel \text{mem}_{63..8*\text{byte}}
\end{align*}
\]

Given a doubleword in a register and a doubleword in memory, the operation of LDR is as follows:

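<table>
<thead>
<tr><th></th><th colspan="8">Bytes, most significant (bits 63..56) to least significant (bits 7..0)</th></tr>
</thead>
<tbody>
<tr><td>Register</td><td>A</td><td>B</td><td>C</td><td>D</td><td>E</td><td>F</td><td>G</td><td>H</td></tr>
<tr><td>Memory</td><td>I</td><td>J</td><td>K</td><td>L</td><td>M</td><td>N</td><td>O</td><td>P</td></tr>
</tbody>
</table>

<table>
<thead>
<tr><th rowspan="2">vAddr<sub>2..0</sub></th><th colspan="4">BigEndianCPU = 0</th><th colspan="4">BigEndianCPU = 1</th></tr>
<tr><th>Destination</th><th>Type</th><th>Offset (LEM)</th><th>Offset (BEM)</th><th>Destination</th><th>Type</th><th>Offset (LEM)</th><th>Offset (BEM)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>I J K L M N O P</td><td>7</td><td>0</td><td>0</td><td>A B C D E F G I</td><td>0</td><td>7</td><td>0</td></tr>
<tr><td>1</td><td>A I J K L M N O</td><td>6</td><td>1</td><td>0</td><td>A B C D E F I J</td><td>1</td><td>6</td><td>0</td></tr>
<tr><td>2</td><td>A B I J K L M N</td><td>5</td><td>2</td><td>0</td><td>A B C D E I J K</td><td>2</td><td>5</td><td>0</td></tr>
<tr><td>3</td><td>A B C I J K L M</td><td>4</td><td>3</td><td>0</td><td>A B C D I J K L</td><td>3</td><td>4</td><td>0</td></tr>
<tr><td>4</td><td>A B C D I J K L</td><td>3</td><td>4</td><td>0</td><td>A B C I J K L M</td><td>4</td><td>3</td><td>0</td></tr>
<tr><td>5</td><td>A B C D E I J K</td><td>2</td><td>5</td><td>0</td><td>A B I J K L M N</td><td>5</td><td>2</td><td>0</td></tr>
<tr><td>6</td><td>A B C D E F I J</td><td>1</td><td>6</td><td>0</td><td>A I J K L M N O</td><td>6</td><td>1</td><td>0</td></tr>
<tr><td>7</td><td>A B C D E F G I</td><td>0</td><td>7</td><td>0</td><td>I J K L M N O P</td><td>7</td><td>0</td><td>0</td></tr>
</tbody>
</table>

**LEM** Little-endian memory (BigEndianMem = 0)

**BEM** BigEndianMem = 1

**Type** AccessType (see Table 2.1 on page 3) sent to memory

**Offset** pAddr<sub>2..0</sub> sent to memory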

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
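
LDL and LDR are normally paired to load an unaligned doubleword. The C sketch below models the two merge formulas from the LDL and LDR Operation sections and then combines them in the usual big-endian sequence `LDL rt, 0(p)` followed by `LDR rt, 7(p)`. It assumes a big-endian CPU and memory; `read_be_dword`, `load_unaligned_be`, and the other function names are hypothetical, and the byte array stands in for physical memory with no address translation.

```c
#include <stdint.h>

/* LDL: GPR[rt] <- mem_{7+8*byte..0} || GPR[rt]_{55-8*byte..0} */
uint64_t ldl_merge(uint64_t rt, uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    unsigned byte = (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 7u : 0u);
    unsigned nbits = 8u * (byte + 1u);  /* bits taken from memory */
    uint64_t keep = (nbits == 64u) ? 0 : (UINT64_MAX >> nbits);
    return (mem << (64u - nbits)) | (rt & keep);
}

/* LDR: GPR[rt] <- GPR[rt]_{63..64-8*byte} || mem_{63..8*byte} */
uint64_t ldr_merge(uint64_t rt, uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    unsigned byte = (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 7u : 0u);
    unsigned shift = 8u * byte;
    uint64_t keep = ~(UINT64_MAX >> shift);  /* high 8*byte bits of rt */
    return (rt & keep) | (mem >> shift);
}

/* Hypothetical helper: the aligned doubleword containing addr, as a
   big-endian memory returns it (lowest address = most significant byte). */
static uint64_t read_be_dword(const uint8_t *ram, uint64_t addr)
{
    uint64_t v = 0;
    uint64_t a = addr & ~UINT64_C(7);
    for (int i = 0; i < 8; i++)
        v = (v << 8) | ram[a + (uint64_t)i];
    return v;
}

/* Unaligned big-endian load: LDL rt, 0(p) then LDR rt, 7(p). */
uint64_t load_unaligned_be(const uint8_t *ram, uint64_t p)
{
    uint64_t rt = 0;
    rt = ldl_merge(rt, read_be_dword(ram, p), p, 1);
    rt = ldr_merge(rt, read_be_dword(ram, p + 7), p + 7, 1);
    return rt;
}
```

With bytes 0x00 through 0x0F in memory, `load_unaligned_be(ram, 3)` returns 0x030405060708090A: LDL supplies bytes 3 through 7 and LDR supplies bytes 8 through 10.
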
**LH Load Halfword**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>LH<br>100001</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**

LH rt, offset(base)

**Description:**

The 16-bit *offset* is sign-extended and added to the contents of general register *base* to form a virtual address. The contents of the halfword at the memory location specified by the effective address are sign-extended and loaded into general register *rt*.

If the least-significant bit of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR[base]} \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian}^{2} \parallel 0)) \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU}^{2} \parallel 0) \\
& \quad \text{GPR[rt]} \leftarrow (\text{mem}_{15+8*\text{byte}})^{48} \parallel \text{mem}_{15+8*\text{byte}..8*\text{byte}}
\end{align*}
\]
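
The halfword selection and extension steps of LH (and, for comparison, LHU, which differs only in zero-extending) can be sketched in C as below. This is an illustrative model under the same assumptions as the pseudocode: `mem` is the doubleword returned by LoadMemory as seen by the CPU, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LH and LHU raise an address error when vAddr[0] is non-zero. */
static int halfword_misaligned(uint64_t vaddr)
{
    return (vaddr & 1u) != 0;
}

/* byte <- vAddr[2..0] xor (BigEndianCPU^2 || 0); take mem[15+8*byte..8*byte]. */
static uint64_t select_halfword(uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    unsigned byte = (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 6u : 0u);
    return (mem >> (8u * byte)) & 0xFFFFu;
}

/* LH: replicate bit 15 into bits 63..16. */
static uint64_t lh_result(uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    uint64_t half = select_halfword(mem, vaddr, big_endian_cpu);
    return (half & 0x8000u) ? (half | 0xFFFFFFFFFFFF0000ULL) : half;
}

/* LHU: bits 63..16 stay zero. */
static uint64_t lhu_result(uint64_t mem, uint64_t vaddr, int big_endian_cpu)
{
    return select_halfword(mem, vaddr, big_endian_cpu);
}
```

For the halfword 0x8001, LH produces 0xFFFFFFFFFFFF8001 while LHU produces 0x0000000000008001; that extension step is the only difference between the two instructions.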

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
### LHU

**Load Halfword Unsigned**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LHU<br>1 0 0 1 0 1</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

LHU rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the halfword at the memory location specified by the effective address are zero-extended and loaded into general register rt.

If the least-significant bit of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
T: \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}]
\]

\[
(\text{pAddr, uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)}
\]

\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE} - 1..3} \parallel (\text{pAddr}_{2..0} \text{xor (ReverseEndian}^2 \parallel 0))
\]

\[
\text{mem} \leftarrow \text{LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)}
\]

\[
\text{byte} \leftarrow \text{vAddr}_{2..0} \text{xor (BigEndianCPU}^2 \parallel 0)
\]

\[
\text{GPR}[rt] \leftarrow 0^{48} \parallel \text{mem}_{15+8\times\text{byte}..8\times\text{byte}}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LL Load Linked**

Format:
LL rt, offset(base)

Description:
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the word at the memory location specified by the effective address are loaded into general register rt. The loaded word is sign-extended.

This instruction implicitly performs a SYNC operation; all loads and stores to shared memory fetched prior to the LL must access memory before the LL, and loads and stores to shared memory fetched subsequent to the LL must access memory after the LL. The processor begins checking the accessed word for modification by other processors and devices.

Load Linked and Store Conditional can be used to atomically update memory locations as shown:

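```
L1:
    LL    T1, (T0)     # load the word and set LLbit
    ADD   T2, T1, 1    # compute the new value
    SC    T2, (T0)     # store only if LLbit is still set; T2 <- 1 on success, 0 on failure
    BEQ   T2, 0, L1    # the store failed: retry from the LL
    NOP                # branch delay slot
```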

This atomically increments the word addressed by T0. Changing the ADD to an OR changes this to an atomic bit set.
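
For comparison, the same read-modify-write retry pattern can be written portably with C11 atomics. This is an analogue, not a description of the IDT hardware: `atomic_compare_exchange_weak` may fail spuriously, much as SC may fail, and on MIPS targets compilers commonly implement it with an LL/SC loop, although the exact code is compiler-dependent. The sketch uses wrapping unsigned arithmetic; the ADD in the assembly example would instead raise an integer overflow exception when incrementing 0x7FFFFFFF.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically increment *word and return the new value. */
static uint32_t atomic_increment(_Atomic uint32_t *word)
{
    uint32_t old = atomic_load(word);
    /* On failure, `old` is reloaded with the current contents and the loop
     * retries, like branching back to L1 when SC leaves 0 in T2. */
    while (!atomic_compare_exchange_weak(word, &old, old + 1u))
        ;
    return old + 1u;
}
```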

This instruction is available in User mode, and it is not necessary for CP0 to be enabled.

The operation of LL is undefined if the addressed location is uncached. For synchronization between multiple processors, the operation of LL is also undefined if the addressed location is noncoherent. A cache miss between the LL and the SC may cause the SC to fail, so no load or store should be executed between them; otherwise the SC may never succeed. Exceptions also cause SC to fail, so persistent exceptions must be avoided.

If either of the two least-significant bits of the effective address is non-zero, an address error exception takes place.

Operation:

\[
\begin{align*}
\text{T: } & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR[base]} \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \parallel 0^2)) \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, WORD, pAddr, vAddr, DATA)} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \parallel 0^2) \\
& \quad \text{GPR[rt]} \leftarrow (\text{mem}_{31+8*\text{byte}})^{32} \parallel \text{mem}_{31+8*\text{byte}..8*\text{byte}} \\
& \quad \text{LLbit} \leftarrow 1 \\
& \quad \text{SyncOperation()}
\end{align*}
\]

Exceptions:

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LLD Load Linked Doubleword**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LLD</strong><br>1 1 0 1 0 0</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

LLD rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the doubleword at the memory location specified by the effective address are loaded into general register rt.

This instruction implicitly performs a SYNC operation; all loads and stores to shared memory fetched prior to the LLD must access memory before the LLD, and loads and stores to shared memory fetched subsequent to the LLD must access memory after the LLD. The processor begins checking the accessed doubleword for modification by other processors and devices.

Load Linked Doubleword and Store Conditional Doubleword can be used to atomically update memory locations:

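```
L1:
    LLD   T1, (T0)     # load the doubleword and set LLbit
    DADD  T2, T1, 1    # 64-bit add (ADD is a 32-bit operation)
    SCD   T2, (T0)     # store only if LLbit is still set; T2 <- 1 on success, 0 on failure
    BEQ   T2, 0, L1    # the store failed: retry from the LLD
    NOP                # branch delay slot
```

This atomically increments the doubleword addressed by T0. Changing the DADD to an OR changes this to an atomic bit set.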
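
The doubleword bit-set variant mentioned above can likewise be sketched with C11 atomics on a 64-bit object. As with any such analogue, this is not the processor's implementation; `atomic_bit_set64` is a hypothetical name, and the standard `atomic_fetch_or` performs the same operation in a single call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically OR `mask` into *dword and return the previous value. */
static uint64_t atomic_bit_set64(_Atomic uint64_t *dword, uint64_t mask)
{
    uint64_t old = atomic_load(dword);
    /* Retry until no other agent has modified *dword between the load and
     * the store, mirroring the load-linked / OR / store-conditional loop. */
    while (!atomic_compare_exchange_weak(dword, &old, old | mask))
        ;
    return old;
}
```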

The operation of LLD is undefined if the addressed location is uncached. For synchronization between multiple processors, the operation of LLD is also undefined if the addressed location is noncoherent. A cache miss between the LLD and the SCD may cause the SCD to fail, so no load or store should be executed between them; otherwise the SCD may never succeed. Exceptions also cause SCD to fail, so persistent exceptions must be avoided.

This instruction is available in User mode, and it is not necessary for CP0 to be enabled.

If any of the three least-significant bits of the effective address is non-zero, an address error exception takes place.

Operation:

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)} \\
& \quad \text{GPR}[rt] \leftarrow \text{mem} \\
& \quad \text{LLbit} \leftarrow 1 \\
& \quad \text{SyncOperation()}
\end{align*}
\]

Exceptions:

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LUI Load Upper Immediate**

### Format:

LUI rt, immediate

### Description:

The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The resulting word is sign-extended to 64 bits and placed into general register rt.

### Operation:

\[
T: \text{GPR}[rt] \leftarrow (\text{immediate}_{15})^{32} \parallel \text{immediate} \parallel 0^{16}
\]
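
A small C model of the LUI result makes the sign extension concrete. The pairing with ORI shown here is a common way to build a 32-bit constant, because ORI zero-extends its immediate. The function names are hypothetical, and the model returns the 64-bit register image as an unsigned value.

```c
#include <assert.h>
#include <stdint.h>

/* GPR[rt] <- (immediate[15])^32 || immediate || 0^16 */
static uint64_t lui_result(uint16_t immediate)
{
    uint64_t r = (uint64_t)immediate << 16;
    if (immediate & 0x8000u)
        r |= 0xFFFFFFFF00000000ULL;   /* copy bit 31 into bits 63..32 */
    return r;
}

/* LUI rt, hi16 followed by ORI rt, rt, lo16 (ORI zero-extends lo16). */
static uint64_t lui_ori_result(uint32_t value)
{
    return lui_result((uint16_t)(value >> 16)) | (uint64_t)(value & 0xFFFFu);
}
```

Note that a constant with bit 31 set, such as 0xDEADBEEF, ends up sign-extended to 0xFFFFFFFFDEADBEEF in the 64-bit register.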

### Exceptions:

None

**LW Load Word**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LW<br>1 0 0 0 1 1</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

LW rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the word at the memory location specified by the effective address are loaded into general register rt. The loaded word is sign-extended.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \parallel 0^2)) \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, WORD, pAddr, vAddr, DATA)} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \parallel 0^2) \\
& \quad \text{GPR}[rt] \leftarrow (\text{mem}_{31+8*\text{byte}})^{32} \parallel \text{mem}_{31+8*\text{byte}..8*\text{byte}}
\end{align*}
\]
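
The effective-address calculation shared by LW and the other loads and stores in this section, together with LW's alignment check, can be sketched in C as follows. The helper names are hypothetical; the sketch does not model address translation or the other conditions under which an address error can occur.

```c
#include <assert.h>
#include <stdint.h>

/* vAddr <- ((offset[15])^48 || offset[15..0]) + GPR[base]
 * Converting the int16_t offset to int64_t sign-extends it; the addition is
 * done on uint64_t, so it is computed modulo 2^64 without undefined behavior. */
static uint64_t effective_address(uint64_t base, int16_t offset)
{
    return base + (uint64_t)(int64_t)offset;
}

/* LW: address error if either of vAddr[1..0] is non-zero. */
static int word_misaligned(uint64_t vaddr)
{
    return (vaddr & 3u) != 0;
}
```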

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LWCz Load Word To Coprocessor**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWCz<br>1 1 0 0 x x*</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
LWCz rt, offset(base)

**Description:**
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The processor reads a word from the addressed memory location, and makes the data available to coprocessor unit z.

The manner in which each coprocessor uses the data is defined by the individual coprocessor specifications.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

This instruction is not valid for use with CP0.

NOTE: \*See the table “Opcode Bit Encoding” on the next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Operation:**

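\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \parallel 0^2)) \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, WORD, pAddr, vAddr, DATA)} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \parallel 0^2) \\
& \quad \text{COPzLW (byte, rt, mem)}
\end{align*}
\]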

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception
- Coprocessor unusable exception

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>LWCz</th>
<th>Bits 31..28 (opcode)</th>
<th>Bits 27..26 (coprocessor unit number)</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWC1</td>
<td>1 1 0 0</td>
<td>0 1</td>
</tr>
<tr>
<td>LWC2</td>
<td>1 1 0 0</td>
<td>1 0</td>
</tr>
</tbody>
</table>
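
As an illustration of the LWCz encoding, the hypothetical C helper below assembles an LWCz instruction word from its fields. It assumes z is 1 or 2, since the description states that LWCz is not valid for CP0.

```c
#include <assert.h>
#include <stdint.h>

/* LWCz rt, offset(base):
 *   bits 31..26  1 1 0 0 z z   (opcode with coprocessor unit number)
 *   bits 25..21  base
 *   bits 20..16  rt
 *   bits 15..0   offset (16-bit two's complement) */
static uint32_t encode_lwcz(unsigned z, unsigned base, unsigned rt, int16_t offset)
{
    return ((0x30u | (z & 3u)) << 26)
         | ((base & 31u) << 21)
         | ((rt & 31u) << 16)
         | (uint32_t)(uint16_t)offset;
}
```

For example, `encode_lwcz(1, 29, 4, 8)` gives 0xC7A40008 (LWC1, base 29, rt 4, offset 8).
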

**LWL Load Word Left**

**Format:**

LWL rt, offset(base)

**Description:**

This instruction can be used in combination with the LWR instruction to load a register with four consecutive bytes from memory, when the bytes cross a word boundary. LWL loads the left portion of the register with the appropriate part of the high-order word; LWR loads the right portion of the register with the appropriate part of the low-order word.

The LWL instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte. It reads bytes only from the word in memory which contains the specified starting byte. From one to four bytes will be loaded, depending on the starting byte specified. The loaded word is sign-extended.

Conceptually, it starts at the specified byte in memory and loads that byte into the high-order (left-most) byte of the register; then it loads bytes from memory into the register until it reaches the low-order byte of the word in memory. The least-significant (right-most) byte(s) of the register will not be changed.

The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWL (or LWR) instruction which also specifies register rt.

No address exceptions due to alignment are possible.

**Operation:**

Given a doubleword in a register and a doubleword in memory, the operation of LWL is as follows:

```
T:   vAddr ← ((offset_{15})^{48} || offset_{15..0}) + GPR[base]
    (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
    pAddr ← pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^3)
    if BigEndianMem = 0 then
        pAddr ← pAddr_{PSIZE-1..2} || 0^2
    endif
    byte ← vAddr_{1..0} xor BigEndianCPU^2
    word ← vAddr_{2} xor BigEndianCPU
    mem ← LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
    temp ← mem_{32*word+8*byte+7..32*word} || GPR[rt]_{23-8*byte..0}
    GPR[rt] ← (temp_{31})^{32} || temp
```

<table>
<thead>
<tr>
<th rowspan="2">vAddr_{2..0}</th>
<th colspan="3">BigEndianCPU = 0</th>
<th colspan="3">BigEndianCPU = 1</th>
</tr>
<tr>
<th>destination</th>
<th>type</th>
<th>offset</th>
<th>destination</th>
<th>type</th>
<th>offset</th>
</tr>
</thead>
<tbody>
<tr><td>0</td><td>S S S S P F G H</td><td>0</td><td>0</td><td>S S S S I J K L</td><td>3</td><td>4</td></tr>
<tr><td>1</td><td>S S S S O P G H</td><td>1</td><td>0</td><td>S S S S J K L H</td><td>2</td><td>4</td></tr>
<tr><td>2</td><td>S S S S N O P H</td><td>2</td><td>0</td><td>S S S S K L G H</td><td>1</td><td>4</td></tr>
<tr><td>3</td><td>S S S S M N O P</td><td>3</td><td>0</td><td>S S S S L F G H</td><td>0</td><td>4</td></tr>
<tr><td>4</td><td>S S S S L F G H</td><td>0</td><td>4</td><td>S S S S M N O P</td><td>3</td><td>0</td></tr>
<tr><td>5</td><td>S S S S K L G H</td><td>1</td><td>4</td><td>S S S S N O P H</td><td>2</td><td>0</td></tr>
<tr><td>6</td><td>S S S S J K L H</td><td>2</td><td>4</td><td>S S S S O P G H</td><td>1</td><td>0</td></tr>
<tr><td>7</td><td>S S S S I J K L</td><td>3</td><td>4</td><td>S S S S P F G H</td><td>0</td><td>0</td></tr>
</tbody>
</table>

Key to table:
- **LEM**: Little-endian memory (BigEndianMem = 0)
- **BEM**: BigEndianMem = 1
- **Type**: AccessType (see Table 2.1 on page 3) sent to memory
- **Offset**: pAddr_{2..0} sent to memory
- **S**: sign-extend of destination_{31}
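
The LWL merge can be checked against the table with a C model. It is a sketch under stated assumptions: `mem` is the doubleword as seen by the CPU (BigEndianMem and ReverseEndian are not modelled), the bytes A..H and I..P are encoded as 0x01..0x08 and 0x11..0x18, and `lwl_result` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LWL:
 *   byte    <- vAddr[1..0] xor BigEndianCPU^2
 *   word    <- vAddr[2] xor BigEndianCPU
 *   temp    <- mem[32*word+8*byte+7..32*word] || GPR[rt][23-8*byte..0]
 *   GPR[rt] <- (temp[31])^32 || temp */
static uint64_t lwl_result(uint64_t reg, uint64_t mem, unsigned vaddr,
                           int big_endian_cpu)
{
    unsigned byte = (vaddr & 3u) ^ (big_endian_cpu ? 3u : 0u);
    unsigned word = ((vaddr >> 2) & 1u) ^ (big_endian_cpu ? 1u : 0u);
    uint32_t w = (uint32_t)(mem >> (32u * word));
    /* Memory bytes byte..0 move to the top of the 32-bit result. */
    uint32_t from_mem = w << (8u * (3u - byte));
    /* The low 3 - byte register bytes are kept (none when byte == 3). */
    uint32_t keep = (byte == 3u) ? 0u : (0xFFFFFFFFu >> (8u * (byte + 1u)));
    uint32_t temp = from_mem | ((uint32_t)reg & keep);
    return (temp & 0x80000000u) ? (0xFFFFFFFF00000000ULL | temp) : (uint64_t)temp;
}
```

With vaddr = 0 and BigEndianCPU = 0 the model returns 0x18060708, i.e. S S S S P F G H with the sign bytes all zero.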

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LWR Load Word Right**

**Format:**

LWR rt, offset(base)

**Description:**

This instruction can be used in combination with the LWL instruction to load a register with four consecutive bytes from memory, when the bytes cross a word boundary. LWR loads the right portion of the register with the appropriate part of the low-order word; LWL loads the left portion of the register with the appropriate part of the high-order word.

The LWR instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which can specify an arbitrary byte. It reads bytes only from the word in memory which contains the specified starting byte. From one to four bytes will be loaded, depending on the starting byte specified. The loaded word is sign-extended.

Conceptually, it starts at the specified byte in memory and loads that byte into the low-order (right-most) byte of the register; then it loads bytes from memory into the register until it reaches the high-order byte of the word in memory. The most significant (left-most) byte(s) of the register will not be changed.

The contents of general register rt are internally bypassed within the processor so that no NOP is needed between an immediately preceding load instruction which specifies register rt and a following LWR (or LWL) instruction which also specifies register rt.

No address exceptions due to alignment are possible.

Operation:

Given a doubleword in a register and a doubleword in memory, the operation of LWR is as follows:

<table>
<thead>
<tr>
<th rowspan="3">vAddr_{2..0}</th>
<th colspan="4">BigEndianCPU = 0</th>
<th colspan="4">BigEndianCPU = 1</th>
</tr>
<tr>
<th rowspan="2">destination</th>
<th rowspan="2">type</th>
<th colspan="2">offset</th>
<th rowspan="2">destination</th>
<th rowspan="2">type</th>
<th colspan="2">offset</th>
</tr>
<tr>
<th>LEM</th>
<th>BEM</th>
<th>LEM</th>
<th>BEM</th>
</tr>
</thead>
<tbody>
<tr><td>0</td><td>S S S S M N O P</td><td>3</td><td>0</td><td>4</td><td>S S S S E F G I</td><td>0</td><td>7</td><td>0</td></tr>
<tr><td>1</td><td>S S S S E M N O</td><td>2</td><td>1</td><td>4</td><td>S S S S E F I J</td><td>1</td><td>6</td><td>0</td></tr>
<tr><td>2</td><td>S S S S E F M N</td><td>1</td><td>2</td><td>4</td><td>S S S S E I J K</td><td>2</td><td>5</td><td>0</td></tr>
<tr><td>3</td><td>S S S S E F G M</td><td>0</td><td>3</td><td>4</td><td>S S S S I J K L</td><td>3</td><td>4</td><td>0</td></tr>
<tr><td>4</td><td>S S S S I J K L</td><td>3</td><td>4</td><td>0</td><td>S S S S E F G M</td><td>0</td><td>3</td><td>4</td></tr>
<tr><td>5</td><td>S S S S E I J K</td><td>2</td><td>5</td><td>0</td><td>S S S S E F M N</td><td>1</td><td>2</td><td>4</td></tr>
<tr><td>6</td><td>S S S S E F I J</td><td>1</td><td>6</td><td>0</td><td>S S S S E M N O</td><td>2</td><td>1</td><td>4</td></tr>
<tr><td>7</td><td>S S S S E F G I</td><td>0</td><td>7</td><td>0</td><td>S S S S M N O P</td><td>3</td><td>0</td><td>4</td></tr>
</tbody>
</table>

#### Key to table:
- **LEM**: Little-endian memory (BigEndianMem = 0)
- **BEM**: BigEndianMem = 1
- **Type**: AccessType (see Table 2.1 on page 3) sent to memory
- **Offset**: pAddr_{2..0} sent to memory
- **S**: sign-extend of destination_{31}
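
A matching C model of LWR is sketched below, under the same assumptions as the table (the doubleword is given as seen by the CPU; memory-side endianness is not modelled). It also shows the combined use described above on a little-endian CPU: LWR at byte address a followed by LWL at a + 3 assembles a word that crosses a word boundary. The helper names are hypothetical, and the sketch only handles the case where a..a+3 lie in one doubleword.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit result into a 64-bit register image. */
static uint64_t sext32(uint32_t w)
{
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ULL | w) : (uint64_t)w;
}

/* byte <- vAddr[1..0] xor BigEndianCPU^2, word <- vAddr[2] xor BigEndianCPU;
 * returns the selected 32-bit word of the doubleword. */
static uint32_t select_word(uint64_t mem, unsigned vaddr, int be, unsigned *byte)
{
    unsigned word = ((vaddr >> 2) & 1u) ^ (be ? 1u : 0u);
    *byte = (vaddr & 3u) ^ (be ? 3u : 0u);
    return (uint32_t)(mem >> (32u * word));
}

/* LWR: memory bytes 3..byte go to the low end of the register word;
 * the top `byte` bytes of the register word are kept. */
static uint64_t lwr_result(uint64_t reg, uint64_t mem, unsigned vaddr, int be)
{
    unsigned byte;
    uint32_t w = select_word(mem, vaddr, be, &byte);
    uint32_t keep = ~(0xFFFFFFFFu >> (8u * byte));   /* 0 when byte == 0 */
    return sext32(((uint32_t)reg & keep) | (w >> (8u * byte)));
}

/* LWL: memory bytes byte..0 go to the high end of the register word;
 * the low 3 - byte bytes of the register word are kept. */
static uint64_t lwl_result(uint64_t reg, uint64_t mem, unsigned vaddr, int be)
{
    unsigned byte;
    uint32_t w = select_word(mem, vaddr, be, &byte);
    uint32_t keep = (byte == 3u) ? 0u : (0xFFFFFFFFu >> (8u * (byte + 1u)));
    return sext32((w << (8u * (3u - byte))) | ((uint32_t)reg & keep));
}

/* Little-endian CPU: LWR at a, then LWL at a + 3, loads the possibly
 * unaligned word at byte address a (a..a+3 must lie in this doubleword). */
static uint64_t load_unaligned_le(uint64_t mem, unsigned a)
{
    uint64_t r = lwr_result(0, mem, a, 0);
    return lwl_result(r, mem, a + 3u, 0);
}
```

With the doubleword 0x1112131415161718, the bytes at little-endian addresses 2..5 are 0x16, 0x15, 0x14, 0x13, and `load_unaligned_le(mem, 2)` assembles them into 0x13141516.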

#### Exceptions:
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LWU Load Word Unsigned**

**Format:**

LWU rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of the word at the memory location specified by the effective address are loaded into general register rt. The loaded word is zero-extended.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \parallel 0^2)) \\
& \quad \text{mem} \leftarrow \text{LoadMemory (uncached, WORD, pAddr, vAddr, DATA)} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \parallel 0^2) \\
& \quad \text{GPR}[rt] \leftarrow 0^{32} \parallel \text{mem}_{31+8*\text{byte}..8*\text{byte}}
\end{align*}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**MFC0 Move From System Control Coprocessor**

Format:
MFC0 rt, rd

Description:
The contents of coprocessor register rd of CP0 are loaded into general register rt. This instruction may be used on both 32-bit and 64-bit CP0 registers.

Operation:

\[
\begin{align*}
T: & \quad \text{data} \leftarrow \text{CPR}[0, \text{rd}] \\
T+1: & \quad \text{GPR}[rt] \leftarrow (\text{data}_{31})^{32} \parallel \text{data}_{31..0}
\end{align*}
\]

Exceptions:
Coprocessor unusable exception

**MFCz Move From Coprocessor**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPz<br>0 1 0 0 x x*</td>
<td>MF<br>0 0 0 0 0</td>
<td>rt</td>
<td>rd</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Note:**  
\*See the table “Opcode Bit Encoding” on the next page, or “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Format:**  
MFCz rt, rd

**Description:**  
The contents of coprocessor register rd of coprocessor z are loaded into general register rt.
Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception, not a coprocessor unusable exception.

**Operation:**

\[
\begin{align*}
T: & \quad \text{if } rd_0 = 0 \text{ then } \\
& \quad \quad \text{data} \leftarrow CPR[z,rd_{4..1} || 0]_{31..0} \\
& \quad \text{else } \\
& \quad \quad \text{data} \leftarrow CPR[z,rd_{4..1} || 0]_{63..32} \\
& \quad \text{endif} \\
T+1: & \quad \text{GPR}[rt] \leftarrow (data_{31})^{32} || data
\end{align*}
\]
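
The half-register selection in the MFCz operation can be modelled in C as follows. The sketch assumes the 64-bit register CPR[z, rd_{4..1} || 0] has already been read into `reg`; `mfcz_result` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* rd[0] = 0 selects bits 31..0 of the even register, rd[0] = 1 selects
 * bits 63..32; the chosen 32 bits are sign-extended into GPR[rt]. */
static uint64_t mfcz_result(uint64_t reg, unsigned rd)
{
    uint32_t data = (rd & 1u) ? (uint32_t)(reg >> 32) : (uint32_t)reg;
    return (data & 0x80000000u) ? (0xFFFFFFFF00000000ULL | data) : (uint64_t)data;
}
```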

**Exceptions:**  
Coprocessor unusable exception  
Reserved instruction exception (coprocessor 3)

**Opcode Bit Encoding:**

**MFHI Move From HI**

**Format:**
MFHI rd

**Description:**
The contents of special register HI are loaded into general register rd.
To ensure proper operation in the event of interruptions, the two instructions that follow an MFHI instruction must not be any of the instructions that modify the HI register: MULT, MULTU, DIV, DIVU, MTHI, DMULT, DMULTU, DDIV, DDIVU.

**Operation:**

\[
T: \text{GPR}[rd] \leftarrow \text{HI}
\]

**Exceptions:**
None

**MFLO Move From LO**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL<br>0 0 0 0 0 0</td>
<td>0</td>
<td>rd</td>
<td>0</td>
<td>MFLO<br>0 1 0 0 1 0</td>
</tr>
<tr>
<td>6</td>
<td>10</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

MFLO  rd

**Description:**

The contents of special register *LO* are loaded into general register *rd*. To ensure proper operation in the event of interruptions, the two instructions that follow an MFLO instruction must not be any of the instructions that modify the *LO* register: MULT, MULTU, DIV, DIVU, MTLO, DMULT, DMULTU, DDIV, DDIVU.

**Operation:**

\[
T: \text{GPR}[rd] \leftarrow \text{LO}
\]

**Exceptions:**

None

**MTC0 Move To System Control Coprocessor**

Format:
MTC0 rt, rd

Description:
The contents of general register rt are loaded into coprocessor register rd of CP0.
Because the state of the virtual address translation system may be altered by this instruction, the operation of load instructions, store instructions, and TLB operations immediately before and after this instruction is undefined.

Operation:

\[
\begin{align*}
T: & \quad \text{data} \leftarrow \text{GPR}[rt] \\
T+1: & \quad \text{CPR}[0, rd] \leftarrow \text{data}
\end{align*}
\]

Exceptions:
Coprocessor unusable exception

**MTCz Move To Coprocessor**

**Format:**

MTCz rt, rd

**Description:**

The contents of general register \( rt \) are loaded into coprocessor register \( rd \) of coprocessor \( z \). Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception, not a coprocessor unusable exception.

**Operation:**

\[
\begin{align*}
T: & \quad \text{data} \leftarrow \text{GPR}[rt]_{31..0} \\
T+1: & \quad \text{if } rd_0 = 0 \text{ then} \\
& \quad \quad \text{CPR}[z, rd_{4..1} \parallel 0] \leftarrow \text{CPR}[z, rd_{4..1} \parallel 0]_{63..32} \parallel \text{data} \\
& \quad \text{else} \\
& \quad \quad \text{CPR}[z, rd_{4..1} \parallel 0] \leftarrow \text{data} \parallel \text{CPR}[z, rd_{4..1} \parallel 0]_{31..0} \\
& \quad \text{endif}
\end{align*}
\]
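
Conversely, the MTCz write can be sketched as a function that returns the updated 64-bit coprocessor register. As with the other models here, `mtcz_result` is a hypothetical name and the coprocessor register file itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Writes GPR[rt][31..0] into one half of the 64-bit register
 * CPR[z, rd[4..1] || 0]; the other half is preserved. */
static uint64_t mtcz_result(uint64_t reg, uint64_t gpr_rt, unsigned rd)
{
    uint64_t data = gpr_rt & 0xFFFFFFFFULL;               /* GPR[rt][31..0] */
    if ((rd & 1u) == 0)
        return (reg & 0xFFFFFFFF00000000ULL) | data;      /* low half */
    return (data << 32) | (reg & 0xFFFFFFFFULL);          /* high half */
}
```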

**Exceptions:**

- Coprocessor unusable exception
- Reserved instruction exception (coprocessor 3)

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>MTCz</th>
<th>Bits 31..28 (opcode)</th>
<th>Bits 27..26 (coprocessor unit number)</th>
<th>Bits 25..21 (coprocessor suboperation)</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0</td>
<td>0 1 0 0</td>
<td>0 0</td>
<td>0 0 1 0 0</td>
</tr>
<tr>
<td>COP1</td>
<td>0 1 0 0</td>
<td>0 1</td>
<td>0 0 1 0 0</td>
</tr>
<tr>
<td>COP2</td>
<td>0 1 0 0</td>
<td>1 0</td>
<td>0 0 1 0 0</td>
</tr>
</tbody>
</table>

**MTHI Move To HI**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL<br>0 0 0 0 0 0</td>
<td>rs</td>
<td>0</td>
<td>MTHI<br>0 1 0 0 0 1</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>15</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

MTHI rs

**Description:**

The contents of general register *rs* are loaded into special register *HI*. If an MTHI operation is executed following a MULT, MULTU, DIV, or DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI instructions, the contents of special register *LO* are undefined.

**Operation:**

- T–2: HI ← undefined
- T–1: HI ← undefined
- T: HI ← GPR[rs]

**Exceptions:**

None

**MTLO Move To LO**

**Format:**

MTLO rs

**Description:**
The contents of general register rs are loaded into special register LO. If an MTLO operation is executed following a MULT, MULTU, DIV, or DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI instructions, the contents of special register HI are undefined.

**Operation:**

- T–2: LO ← undefined
- T–1: LO ← undefined
- T: LO ← GPR[rs]

**Exceptions:**
None

**MULT Multiply**

**Format:**

MULT rs, rt

**Description:**
The contents of general registers *rs* and *rt* are multiplied, treating both operands as 32-bit 2’s complement values. No integer overflow exception occurs under any circumstances. The operands must be valid 32-bit, sign-extended values.

When the operation completes, the low-order word of the double result is loaded into special register *LO*, and the high-order word of the double result is loaded into special register *HI*.

If either of the two preceding instructions is MFHI or MFLO, the results of these instructions are undefined. Correct operation requires separating reads of *HI* or *LO* from writes by a minimum of two other instructions.

**Operation:**

\[
\begin{align*}
T-2: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T-1: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T: & \quad t \leftarrow \text{GPR}[rs]_{31..0} \times \text{GPR}[rt]_{31..0} \\
& \quad LO \leftarrow (t_{31})^{32} \parallel t_{31..0} \\
& \quad HI \leftarrow (t_{63})^{32} \parallel t_{63..32}
\end{align*}
\]
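
The MULT result can be modelled in C as below, which shows that LO and HI each receive a sign-extended 32-bit half of the 64-bit signed product. The structure and function names are hypothetical, and the model ignores the T-2/T-1 hazard rules described above.

```c
#include <assert.h>
#include <stdint.h>

struct hilo {
    uint64_t hi;
    uint64_t lo;
};

/* Sign-extend a 32-bit value to a 64-bit register image. */
static uint64_t sext32(uint32_t w)
{
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ULL | w) : (uint64_t)w;
}

/* Interpret bits 31..0 of a register as a 32-bit two's-complement value. */
static int64_t low_word_signed(uint64_t r)
{
    int64_t v = (int64_t)(r & 0xFFFFFFFFULL);
    return (v & 0x80000000LL) ? v - 0x100000000LL : v;
}

/* MULT: t <- GPR[rs][31..0] * GPR[rt][31..0] as a signed 64-bit product. */
static struct hilo mult_result(uint64_t rs, uint64_t rt)
{
    /* |a * b| <= 2^62, so the product cannot overflow int64_t. */
    uint64_t t = (uint64_t)(low_word_signed(rs) * low_word_signed(rt));
    struct hilo r = { sext32((uint32_t)(t >> 32)), sext32((uint32_t)t) };
    return r;
}
```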

**Exceptions:**
None

**MULTU Multiply Unsigned**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL<br>0 0 0 0 0 0</td>
<td>rs</td>
<td>rt</td>
<td>0</td>
<td>MULTU<br>0 1 1 0 0 1</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>10</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

MULTU rs, rt

**Description:**

The contents of general register rs and the contents of general register rt are multiplied, treating both operands as unsigned values. No overflow exception occurs under any circumstances. The operands must be valid 32-bit, sign-extended values.

When the operation completes, the low-order word of the double result is loaded into special register LO, and the high-order word of the double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the results of these instructions are undefined. Correct operation requires separating reads of HI or LO from writes by a minimum of two instructions.

**Operation:**

\[
\begin{align*}
T-2: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T-1: & \quad LO \leftarrow \text{undefined} \\
& \quad HI \leftarrow \text{undefined} \\
T: & \quad t \leftarrow (0 \parallel \text{GPR}[rs]_{31..0}) \ast (0 \parallel \text{GPR}[rt]_{31..0}) \\
& \quad LO \leftarrow (t_{31})^{32} \parallel t_{31..0} \\
& \quad HI \leftarrow (t_{63})^{32} \parallel t_{63..32}
\end{align*}
\]
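
A C model of MULTU differs from MULT only in forming an unsigned product. Following the operation above, LO and HI are still written as sign-extended 32-bit values, which the sketch makes visible. The names are hypothetical, and the hazard rules are not modelled.

```c
#include <assert.h>
#include <stdint.h>

struct hilo {
    uint64_t hi;
    uint64_t lo;
};

static uint64_t sext32(uint32_t w)
{
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ULL | w) : (uint64_t)w;
}

/* MULTU: t <- (0 || GPR[rs][31..0]) * (0 || GPR[rt][31..0]). */
static struct hilo multu_result(uint64_t rs, uint64_t rt)
{
    /* Two 32-bit unsigned factors always fit in a 64-bit product. */
    uint64_t t = (rs & 0xFFFFFFFFULL) * (rt & 0xFFFFFFFFULL);
    struct hilo r = { sext32((uint32_t)(t >> 32)), sext32((uint32_t)t) };
    return r;
}
```

For example, 0xFFFFFFFF × 2 = 0x1FFFFFFFE leaves HI = 1 and LO = 0xFFFFFFFFFFFFFFFE.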

**Exceptions:**

None

**NOR**

**Format:**
NOR rd, rs, rt

**Description:**
The contents of general register rs are combined with the contents of general register rt in a bit-wise logical NOR operation. The result is placed into general register rd.

**Operation:**

\[
T: \quad GPR[rd] \leftarrow GPR[rs] \text{ nor } GPR[rt]
\]

**Exceptions:**
None

**OR**

**Format:**

OR rd, rs, rt

**Description:**

The contents of general register rs are combined with the contents of general register rt in a bit-wise logical OR operation. The result is placed into general register rd.

**Operation:**

\[
T: \quad GPR[rd] \leftarrow GPR[rs] \text{ or } GPR[rt]
\]

**Exceptions:**

None

**ORI Or Immediate**

**Format:**

ORI rt, rs, immediate

**Description:**

The 16-bit immediate is zero-extended and combined with the contents of general register rs in a bit-wise logical OR operation. The result is placed into general register rt.

**Operation:**

\[ T: \text{GPR}[rt] \leftarrow \text{GPR}[rs]_{63..16} \parallel (\text{immediate} \text{ or } \text{GPR}[rs]_{15..0}) \]

**Exceptions:**

None

**SB Store Byte**

**Format:**

SB rt, offset(base)

**Description:**

The 16-bit *offset* is sign-extended and added to the contents of general register *base* to form a virtual address. The least-significant byte of register *rt* is stored at the effective address.

**Operation:**

\[
T: \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \ || \ \text{offset}_{15..0}) + \text{GPR}[base]
\]

\[
(\text{pAddr, uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)}
\]

\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE-1..3}} \ || \ (\text{pAddr}_{2..0} \ xor \ \text{ReverseEndian}^{3})
\]

\[
\text{byte} \leftarrow \text{vAddr}_{2..0} \ xor \ \text{BigEndianCPU}^{3}
\]

\[
\text{data} \leftarrow \text{GPR}[rt]_{63-8*\text{byte..0}} \ || \ 0^{8*\text{byte}}
\]

\[
\text{StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)}
\]
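
The data-positioning step of SB can be sketched in C. Only the selected byte lane is written to memory, so the other bits of the shifted value do not matter. The helper name is hypothetical, and memory-side endianness (ReverseEndian) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* SB: byte <- vAddr[2..0] xor BigEndianCPU^3 and
 * data <- GPR[rt][63-8*byte..0] || 0^(8*byte), i.e. the low byte of rt is
 * shifted into byte lane `byte` of the doubleword sent to StoreMemory. */
static uint64_t sb_data(uint64_t rt, uint64_t vaddr, int big_endian_cpu)
{
    unsigned byte = (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 7u : 0u);
    return rt << (8u * byte);
}
```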

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**SC Store Conditional**

**Format:**

SC rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of general register rt are conditionally stored at the memory location specified by the effective address.

This instruction implicitly performs a SYNC operation; loads and stores to shared memory fetched prior to the SC must access memory before the SC; loads and stores to shared memory fetched subsequent to the SC must access memory after the SC.

If any other processor or device has modified the physical address since the time of the previous Load Linked instruction, or if an ERET instruction occurs between the Load Linked instruction and this store instruction, the store fails and is inhibited from taking place.

The success or failure of the store operation (as defined above) is indicated by the contents of general register rt after execution of the instruction. A successful store sets the contents of general register rt to 1; an unsuccessful store sets it to 0.
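
The LLbit rule can be illustrated with a toy C model in which another agent's write clears the link. This is a hypothetical sketch: it models one word only and omits ERET, SyncOperation, and caches. It also leaves LLbit unchanged after SC, because the operation below does not specify otherwise.

```c
#include <assert.h>
#include <stdint.h>

struct linked_word {
    uint32_t value;
    int llbit;    /* set by LL, cleared when another agent writes the word */
};

/* LL: read the word and set LLbit. */
static uint32_t ll(struct linked_word *w)
{
    w->llbit = 1;
    return w->value;
}

/* A store by another processor or device breaks the link. */
static void remote_write(struct linked_word *w, uint32_t v)
{
    w->value = v;
    w->llbit = 0;
}

/* SC: store only if LLbit is still set; return what SC leaves in rt. */
static uint32_t sc(struct linked_word *w, uint32_t v)
{
    if (w->llbit)
        w->value = v;
    return (uint32_t)w->llbit;
}
```

In the scenario below, the first SC fails because the remote write intervened. A retry loop like the LL example reloads the value and succeeds on the second attempt.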

The operation of Store Conditional is undefined when the address is different from the address used in the last Load Linked.

This instruction is available in User mode; it is not necessary for CP0 to be enabled.

If either of the two least-significant bits of the effective address is non-zero, an address error exception takes place.

If this instruction should both fail and take an exception, the exception takes precedence.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \| (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \| 0^{2})) \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \| 0^{2}) \\
& \quad \text{data} \leftarrow \text{GPR}[rt]_{63-8*\text{byte}..0} \| 0^{8*\text{byte}} \\
& \quad \text{if LLbit then} \\
& \quad \quad \text{StoreMemory}(\text{uncached}, \text{WORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
& \quad \text{endif} \\
& \quad \text{GPR}[rt] \leftarrow 0^{63} \| \text{LLbit} \\
& \quad \text{SyncOperation}()
\end{align*}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

### Store Conditional Doubleword (SCD)

**Format:**

SCD rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of general register rt are conditionally stored at the memory location specified by the effective address.

This instruction implicitly performs a SYNC operation; loads and stores to shared memory fetched prior to the SCD must access memory before the SCD; loads and stores to shared memory fetched subsequent to the SCD must access memory after the SCD.

If any other processor or device has modified the physical address since the time of the previous Load Linked Doubleword instruction, or if an ERET instruction occurs between the Load Linked Doubleword instruction and this store instruction, the store fails and is inhibited from taking place.

The success or failure of the store operation (as defined above) is indicated by the contents of general register rt after execution of the instruction. A successful store sets the contents of general register rt to 1; an unsuccessful store sets it to 0.

The operation of Store Conditional Doubleword is undefined when the address is different from the address used in the last Load Linked Doubleword.

This instruction is available in User mode; it is not necessary for CP0 to be enabled.

If any of the three least-significant bits of the effective address is non-zero, an address error exception takes place.

If this instruction should both fail and take an exception, the exception takes precedence.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} || \text{offset}_{15..0}) + \text{GPR[base]} \\
& \quad (\text{pAddr, uncached}) \leftarrow \text{AddressTranslation (vAddr, DATA)} \\
& \quad \text{data} \leftarrow \text{GPR[rt]} \\
& \quad \text{if LLbit then} \\
& \quad \quad \text{StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)} \\
& \quad \quad \text{endif} \\
& \quad \text{GPR[rt]} \leftarrow 0^{63} || \text{LLbit} \\
& \quad \text{SyncOperation()} 
\end{align*}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception
### Store Doubleword (SD)

**Format:**

\[
\text{SD } rt, \text{ offset}(\text{base})
\]

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of general register rt are stored at the memory location specified by the effective address.

If any of the three least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

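\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{data} \leftarrow \text{GPR}[rt] \\
& \quad \text{StoreMemory}(\text{uncached}, \text{DOUBLEWORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]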

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception
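
The effective-address calculation and the natural-alignment rule shared by the store instructions in this section can be sketched in C. This is an illustrative model with hypothetical function names; it does not model address translation or raise the exception, it only evaluates the condition that causes the address error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* vAddr <- ((offset15)^48 || offset15..0) + GPR[base]
   The 16-bit offset is sign-extended to 64 bits; the add wraps modulo 2^64. */
static uint64_t effective_address(uint64_t base, uint16_t offset) {
    uint64_t ext = (uint64_t)offset;
    if (offset & 0x8000u) {
        ext |= 0xFFFFFFFFFFFF0000ull;   /* replicate bit 15 into bits 63..16 */
    }
    return base + ext;
}

/* True if an access of `size` bytes (1, 2, 4 or 8) at vaddr is naturally
   aligned. SD, SCD and SDCz use size 8, so the low three bits must be zero. */
static bool naturally_aligned(uint64_t vaddr, unsigned size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (vaddr & (uint64_t)(size - 1u)) == 0;
}
```

For example, an offset field of 0xFFF8 (that is, -8) with base 0x1000 gives 0xFF8, which is doubleword-aligned, whereas 0x1004 is word-aligned but would raise an address error for SD.
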

### Store Doubleword From Coprocessor (SDCz)

**Format:**

SDCz rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. Coprocessor unit z sources a doubleword, which the processor writes to the addressed memory location. The data to be stored is defined by individual coprocessor specifications.

If any of the three least-significant bits of the effective address is non-zero, an address error exception takes place.

This instruction is not valid for use with CP0.

This instruction is undefined when the least-significant bit of the rt field is non-zero.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation} (\text{vAddr}, \text{DATA}) \\
& \quad \text{data} \leftarrow \text{COPzSD}(\text{rt}) \\
& \quad \text{StoreMemory} (\text{uncached}, \text{DOUBLEWORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

**Note:** \*See the table in this section under “Opcode Bit Encoding.” Also see “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception
- Coprocessor unusable exception

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>SDCz</th>
<th>Bit #31</th>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SDC1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>SDC2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

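In these encodings, bits 31..28 are fixed opcode bits and bits 27..26 give the coprocessor unit number (z).

### Store Doubleword Left (SDL)

**Format:**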
SDL rt, offset(base)

**Description:**
This instruction can be used with the SDR instruction to store the contents of a register into eight consecutive bytes of memory, when the bytes cross a doubleword boundary. SDL stores the left portion of the register into the appropriate part of the high-order doubleword of memory; SDR stores the right portion of the register into the appropriate part of the low-order doubleword.

The SDL instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which may specify an arbitrary byte. It alters only the doubleword in memory which contains that byte. From one to eight bytes will be stored, depending on the starting byte specified.

Conceptually, it starts at the most-significant byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the low-order byte of the doubleword in memory.

No address exceptions due to alignment are possible.

**Operation:**

```
T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
   (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
   pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor ReverseEndian^3)
   if BigEndianMem = 0 then
      pAddr ← pAddrPSIZE-1..3 || 0^3
   endif
   byte ← vAddr2..0 xor BigEndianCPU^3
   data ← 0^(56-8*byte) || GPR[rt]63..56-8*byte
   StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
```
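
To make the pseudocode concrete, the following C sketch (hypothetical names, not part of any IDT software) computes the AccessType and the pAddr<sub>2..0</sub> offset that SDL sends to memory. It assumes that address translation leaves pAddr<sub>2..0</sub> equal to vAddr<sub>2..0</sub> and that ReverseEndian = BigEndianCPU xor BigEndianMem; under those assumptions its results agree with the type and offset columns of the SDL table in this section.

```c
#include <assert.h>

/* Hypothetical model of the request SDL issues to memory. */
typedef struct {
    unsigned type;    /* AccessType passed to StoreMemory (0 = 1 byte ... 7 = 8 bytes) */
    unsigned offset;  /* pAddr2..0 sent to memory */
} SdlRequest;

static SdlRequest model_sdl_request(unsigned vaddr_lo, unsigned big_endian_cpu,
                                    unsigned big_endian_mem) {
    assert(vaddr_lo < 8 && big_endian_cpu < 2 && big_endian_mem < 2);
    /* Assumption: ReverseEndian = BigEndianCPU xor BigEndianMem. */
    unsigned reverse_endian = big_endian_cpu ^ big_endian_mem;
    /* pAddr2..0 xor ReverseEndian^3 (pAddr2..0 assumed equal to vAddr2..0) */
    unsigned paddr_lo = vaddr_lo ^ (reverse_endian ? 7u : 0u);
    /* if BigEndianMem = 0 then pAddr <- pAddrPSIZE-1..3 || 0^3 */
    if (big_endian_mem == 0) {
        paddr_lo = 0;
    }
    /* byte <- vAddr2..0 xor BigEndianCPU^3; SDL passes byte as the AccessType */
    unsigned byte = vaddr_lo ^ (big_endian_cpu ? 7u : 0u);
    SdlRequest r = { byte, paddr_lo };
    return r;
}
```

For instance, `model_sdl_request(1, 1, 1)` yields type 6 (seven bytes) at offset 1, matching the BigEndianCPU = 1, vAddr<sub>2..0</sub> = 1 row.
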
Given a doubleword in a register and a doubleword in memory, the operation of SDL is as follows:

**SDL**

- **Register**
  - A B C D E F G H
- **Memory**
  - I J K L M N O P

<table>
<thead>
<tr><th>vAddr<sub>2..0</sub></th><th>BigEndianCPU = 0: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th><th>BigEndianCPU = 1: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>I J K L M N O A</td><td>0</td><td>0</td><td>7</td><td>A B C D E F G H</td><td>7</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>I J K L M N A B</td><td>1</td><td>0</td><td>6</td><td>I A B C D E F G</td><td>6</td><td>0</td><td>1</td></tr>
<tr><td>2</td><td>I J K L M A B C</td><td>2</td><td>0</td><td>5</td><td>I J A B C D E F</td><td>5</td><td>0</td><td>2</td></tr>
<tr><td>3</td><td>I J K L A B C D</td><td>3</td><td>0</td><td>4</td><td>I J K A B C D E</td><td>4</td><td>0</td><td>3</td></tr>
<tr><td>4</td><td>I J K A B C D E</td><td>4</td><td>0</td><td>3</td><td>I J K L A B C D</td><td>3</td><td>0</td><td>4</td></tr>
<tr><td>5</td><td>I J A B C D E F</td><td>5</td><td>0</td><td>2</td><td>I J K L M A B C</td><td>2</td><td>0</td><td>5</td></tr>
<tr><td>6</td><td>I A B C D E F G</td><td>6</td><td>0</td><td>1</td><td>I J K L M N A B</td><td>1</td><td>0</td><td>6</td></tr>
<tr><td>7</td><td>A B C D E F G H</td><td>7</td><td>0</td><td>0</td><td>I J K L M N O A</td><td>0</td><td>0</td><td>7</td></tr>
</tbody>
</table>

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

- **LEM** Little-endian memory (BigEndianMem = 0)
- **BEM** Big-endian memory (BigEndianMem = 1)
- **Type** AccessType (see Table 2.1 on page 2-3) sent to memory
- **Offset** pAddr<sub>2..0</sub> sent to memory

### Store Doubleword Right (SDR)

**Format:**
SDR rt, offset(base)

**Description:**
This instruction can be used with the SDL instruction to store the contents of a register into eight consecutive bytes of memory, when the bytes cross a boundary between two doublewords. SDR stores the right portion of the register into the appropriate part of the low-order doubleword; SDL stores the left portion of the register into the appropriate part of the high-order doubleword of memory.

The SDR instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which may specify an arbitrary byte. It alters only the doubleword in memory which contains that byte. From one to eight bytes will be stored, depending on the starting byte specified.

Conceptually, it starts at the least-significant (rightmost) byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the high-order byte of the doubleword in memory. No address exceptions due to alignment are possible.

**Operation:**

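\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \| (\text{pAddr}_{2..0} \text{ xor } \text{ReverseEndian}^{3}) \\
& \quad \text{if BigEndianMem} = 1 \text{ then} \\
& \quad \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \| 0^{3} \\
& \quad \text{endif} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } \text{BigEndianCPU}^{3} \\
& \quad \text{data} \leftarrow \text{GPR}[rt]_{63-8*\text{byte}..0} \| 0^{8*\text{byte}} \\
& \quad \text{StoreMemory}(\text{uncached}, \text{DOUBLEWORD}-\text{byte}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]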

Given a doubleword in a register and a doubleword in memory, the operation of SDR is as follows:
**SDR**

<table>
<thead>
<tr>
<th>Register</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
<th>H</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory</td>
<td>I</td>
<td>J</td>
<td>K</td>
<td>L</td>
<td>M</td>
<td>N</td>
<td>O</td>
<td>P</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr><th>vAddr<sub>2..0</sub></th><th>BigEndianCPU = 0: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th><th>BigEndianCPU = 1: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>A B C D E F G H</td><td>7</td><td>0</td><td>0</td><td>H J K L M N O P</td><td>0</td><td>7</td><td>0</td></tr>
<tr><td>1</td><td>B C D E F G H P</td><td>6</td><td>1</td><td>0</td><td>G H K L M N O P</td><td>1</td><td>6</td><td>0</td></tr>
<tr><td>2</td><td>C D E F G H O P</td><td>5</td><td>2</td><td>0</td><td>F G H L M N O P</td><td>2</td><td>5</td><td>0</td></tr>
<tr><td>3</td><td>D E F G H N O P</td><td>4</td><td>3</td><td>0</td><td>E F G H M N O P</td><td>3</td><td>4</td><td>0</td></tr>
<tr><td>4</td><td>E F G H M N O P</td><td>3</td><td>4</td><td>0</td><td>D E F G H N O P</td><td>4</td><td>3</td><td>0</td></tr>
<tr><td>5</td><td>F G H L M N O P</td><td>2</td><td>5</td><td>0</td><td>C D E F G H O P</td><td>5</td><td>2</td><td>0</td></tr>
<tr><td>6</td><td>G H K L M N O P</td><td>1</td><td>6</td><td>0</td><td>B C D E F G H P</td><td>6</td><td>1</td><td>0</td></tr>
<tr><td>7</td><td>H J K L M N O P</td><td>0</td><td>7</td><td>0</td><td>A B C D E F G H</td><td>7</td><td>0</td><td>0</td></tr>
</tbody>
</table>

- **LEM**: Little-endian memory (BigEndianMem = 0)
- **BEM**: Big-endian memory (BigEndianMem = 1)
- **Type**: AccessType (see Table 2.1 on page 2-3) sent to memory
- **Offset**: pAddr<sub>2..0</sub> sent to memory

### Exceptions:
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception
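
The pairing of SDL and SDR can be illustrated with a byte-array model of memory. This C sketch assumes a big-endian configuration (BigEndianCPU = BigEndianMem = 1, so ReverseEndian = 0), ignores address translation and exceptions, and uses hypothetical function names. Under those assumptions an unaligned doubleword at address a is written with SDL at a followed by SDR at a + 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SDL, big-endian model: the register's bytes, most significant first, go to
   addr, addr+1, ... up to the last byte of the aligned doubleword
   (8 - k bytes, where k = addr mod 8). */
static void model_sdl_be(uint8_t *mem, uint64_t addr, uint64_t reg) {
    unsigned k = (unsigned)(addr & 7u);
    for (unsigned i = 0; i < 8u - k; i++) {
        mem[addr + i] = (uint8_t)(reg >> (56u - 8u * i));
    }
}

/* SDR, big-endian model: the register's bytes, least significant first, go to
   addr, addr-1, ... down to the first byte of the aligned doubleword
   (k + 1 bytes). */
static void model_sdr_be(uint8_t *mem, uint64_t addr, uint64_t reg) {
    unsigned k = (unsigned)(addr & 7u);
    for (unsigned i = 0; i <= k; i++) {
        mem[addr - i] = (uint8_t)(reg >> (8u * i));
    }
}

/* Unaligned doubleword store: SDL at a, then SDR at a + 7. */
static void model_unaligned_sd_be(uint8_t *mem, uint64_t a, uint64_t reg) {
    assert(mem != NULL);
    model_sdl_be(mem, a, reg);
    model_sdr_be(mem, a + 7u, reg);
}
```

With a = 3, SDL writes five bytes (addresses 3 to 7) and SDR writes the remaining three (addresses 8 to 10). When a is already aligned, both instructions write the same eight bytes.
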

## Store Halfword (SH)

### Format:
\[ \text{SH } rt, \text{ offset(base)} \]

### Description:
The 16-bit \( \text{offset} \) is sign-extended and added to the contents of general register \( \text{base} \) to form an unsigned effective address. The least-significant halfword of register \( rt \) is stored at the effective address. If the least-significant bit of the effective address is non-zero, an address error exception occurs.

### Operation:
\[
T: \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \ || \ \text{offset}_{15..0}) + \text{GPR}[\text{base}]
\]
\[
(\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation} (\text{vAddr}, \text{DATA})
\]
\[
\text{pAddr} \leftarrow \text{pAddr}_{PSIZE-1..3} \ || \ (\text{pAddr}_{2..0} \ \text{xor} (\text{ReverseEndian}^2 \ || \ 0))
\]
\[
\text{byte} \leftarrow \text{vAddr}_{2..0} \ \text{xor} (\text{BigEndianCPU}^2 \ || \ 0)
\]
\[
\text{data} \leftarrow \text{GPR}[rt]_{63-8*\text{byte}..0} \ || \ 0^{8*\text{byte}}
\]
\[
\text{StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)}
\]
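
A small C model of the byte-lane calculation shows where the halfword lands on the 64-bit data path. The function name is hypothetical, and address translation, ReverseEndian (which only affects pAddr) and the memory write itself are left out; only the two bytes selected by the HALFWORD access type would actually be written.

```c
#include <assert.h>
#include <stdint.h>

/* Returns `data` as computed by the SH pseudocode; *byte_out receives `byte`. */
static uint64_t model_sh_data(uint64_t vaddr, uint64_t rt, unsigned big_endian_cpu,
                              unsigned *byte_out) {
    /* SH needs a halfword-aligned address; otherwise an address error occurs. */
    assert((vaddr & 1u) == 0);
    /* byte <- vAddr2..0 xor (BigEndianCPU^2 || 0) */
    unsigned byte = (unsigned)(vaddr & 7u) ^ (big_endian_cpu ? 6u : 0u);
    *byte_out = byte;
    /* data <- GPR[rt]63-8*byte..0 || 0^(8*byte): the low halfword moves up to lane `byte` */
    return rt << (8u * byte);
}
```

In little-endian mode a store to an address ending in 2 uses lane 2; in big-endian mode a store to an address ending in 0 uses lane 6, the most significant halfword of the doubleword.
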

### Exceptions:
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**SLL**  
Shift Left Logical

**Format:**

\[
\text{SLL } rd, \ rt, \ sa
\]

**Description:**

The contents of general register \( rt \) are shifted left by \( sa \) bits, inserting zeros into the low-order bits.

The result is placed in register \( rd \).

The operand must be a valid sign-extended, 32-bit value.

**Operation:**

\[
\begin{align*}
T: & \ s \leftarrow 0 || sa \\
& \text{temp} \leftarrow \text{GPR}[rt]_{(31-s)..0} || 0^s \\
& \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} || \text{temp}
\end{align*}
\]

**Exceptions:**

None

**SLLV**  
Shift Left Logical Variable

**Format:**

SLLV rd, rt, rs

**Description:**

The contents of general register rt are shifted left the number of bits specified by the low-order five bits contained in general register rs, inserting zeros into the low-order bits.

The result is placed in register rd.

The operand must be a valid sign-extended, 32-bit value.

**Operation:**

\[
\begin{align*}
T: & \quad s \leftarrow 0 \parallel \text{GPR}[rs]_{4..0} \\
& \quad \text{temp} \leftarrow \text{GPR}[rt]_{(31-s)..0} \parallel 0^s \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}
\end{align*}
\]
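
The following C sketch (hypothetical helper names) mirrors the SLL and SLLV operation descriptions: the shift is applied to the low 32 bits and the word result is then sign-extended by replicating temp<sub>31</sub>. For SLLV it takes the shift amount from GPR[rs]<sub>4..0</sub>, as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* (temp31)^32 || temp: replicate bit 31 into bits 63..32. */
static uint64_t sign_extend32(uint32_t w) {
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ull | w) : (uint64_t)w;
}

/* SLL rd, rt, sa: sa is the 5-bit shift-amount field (0..31). */
static uint64_t model_sll(uint64_t rt, unsigned sa) {
    assert(sa < 32);
    /* temp <- GPR[rt](31-s)..0 || 0^s; bits shifted past bit 31 are lost */
    uint32_t temp = (uint32_t)((uint32_t)rt << sa);
    return sign_extend32(temp);
}

/* SLLV rd, rt, rs: only GPR[rs]4..0 is used, so the amount is 0..31. */
static uint64_t model_sllv(uint64_t rt, uint64_t rs) {
    return model_sll(rt, (unsigned)(rs & 0x1Fu));
}
```

Following the pseudocode, a shift by zero returns the low word of rt sign-extended, and an SLLV amount of 33 behaves like a shift by 1.
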

**Exceptions:**

None

**SLT**  
Set On Less Than

**Format:**  
SLT rd, rs, rt

**Description:**  
The contents of general register rt are subtracted from the contents of general register rs. Considering both quantities as signed integers, if the contents of general register rs are less than the contents of general register rt, the result is set to one; otherwise the result is set to zero.

The result is placed into general register rd.

No integer overflow exception occurs under any circumstances. The comparison is valid even if the subtraction used during the comparison overflows.

**Operation:**

\[
\begin{align*}
T: & \quad \text{if GPR}[rs] < \text{GPR}[rt] \text{ then} \\
& \quad \quad \text{GPR}[rd] \leftarrow 0^{63} \ || \ 1 \\
& \quad \text{else} \\
& \quad \quad \text{GPR}[rd] \leftarrow 0^{64} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**  
None
## SLTI

**Set On Less Than Immediate**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>SLTI 001010</td><td>rs</td><td>rt</td><td>immediate</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**

SLTI rt, rs, immediate

**Description:**

The 16-bit immediate is sign-extended and subtracted from the contents of general register rs. Considering both quantities as signed integers, if rs is less than the sign-extended immediate, the result is set to one; otherwise the result is set to zero.

The result is placed into general register rt.

No integer overflow exception occurs under any circumstances. The comparison is valid even if the subtraction used during the comparison overflows.

**Operation:**

\[
\begin{align*}
T: & \quad \text{if GPR}[rs] < ((\text{immediate}_{15})^{48} \ || \ \text{immediate}_{15..0}) \text{ then} \\
& \quad \quad \text{GPR}[rt] \leftarrow 0^{63} \ || \ 1 \\
& \quad \text{else} \\
& \quad \quad \text{GPR}[rt] \leftarrow 0^{64} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

None

**SLTIU**  
Set On Less Than Immediate Unsigned

**Format:**  
SLTIU rt, rs, immediate

**Description:**  
The 16-bit immediate is sign-extended and subtracted from the contents of general register rs. Considering both quantities as unsigned integers, if rs is less than the sign-extended immediate, the result is set to one; otherwise the result is set to zero.

The result is placed into general register rt.

No integer overflow exception occurs under any circumstances. The comparison is valid even if the subtraction used during the comparison overflows.

**Operation:**

\[
\begin{align*}
T: & \quad \text{if } (0 \ || \ \text{GPR}[rs]) < (0 \ || \ (\text{immediate}_{15})^{48} \ || \ \text{immediate}_{15..0}) \text{ then} \\
& \quad \quad \text{GPR}[rt] \leftarrow 0^{63} \ || \ 1 \\
& \quad \text{else} \\
& \quad \quad \text{GPR}[rt] \leftarrow 0^{64} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**  
None

**SLTU**  
Set On Less Than Unsigned  

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL 000000</td><td>rs</td><td>rt</td><td>rd</td><td>0 00000</td><td>SLTU 101011</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**  
SLTU rd, rs, rt

**Description:**  
The contents of general register rt are subtracted from the contents of general register rs. Considering both quantities as unsigned integers, if the contents of general register rs are less than the contents of general register rt, the result is set to one; otherwise the result is set to zero. The result is placed into general register rd.

No integer overflow exception occurs under any circumstances. The comparison is valid even if the subtraction used during the comparison overflows.

**Operation:**

```
T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
      GPR[rd] ← 0^63 || 1
   else
      GPR[rd] ← 0^64
   endif
```
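
The four set-on-less-than variants differ only in how the operands are interpreted. This C sketch (hypothetical function names) models them on raw 64-bit register values. Note that SLTIU still sign-extends its immediate before the unsigned comparison, so an immediate field of 0xFFFF is compared as 2<sup>64</sup> - 1.

```c
#include <assert.h>
#include <stdint.h>

/* (immediate15)^48 || immediate15..0 */
static uint64_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFFFFFFFFFF0000ull | imm) : (uint64_t)imm;
}

/* Signed 64-bit less-than on raw register bits. Flipping bit 63 maps
   two's-complement order onto unsigned order, which avoids relying on an
   implementation-defined unsigned-to-signed conversion. */
static int signed_less(uint64_t a, uint64_t b) {
    const uint64_t bias = 0x8000000000000000ull;
    return (a ^ bias) < (b ^ bias);
}

static uint64_t model_slt(uint64_t rs, uint64_t rt)  { return signed_less(rs, rt) ? 1u : 0u; }
static uint64_t model_sltu(uint64_t rs, uint64_t rt) { return rs < rt ? 1u : 0u; }

static uint64_t model_slti(uint64_t rs, uint16_t imm) {
    return model_slt(rs, sign_extend16(imm));
}

/* SLTIU: sign-extend the immediate, then compare as unsigned. */
static uint64_t model_sltiu(uint64_t rs, uint16_t imm) {
    return model_sltu(rs, sign_extend16(imm));
}
```

For example, with rs = 5 and an immediate of 0xFFFF, SLTI compares 5 < -1 (false), while SLTIU compares 5 < 2<sup>64</sup> - 1 (true).
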

**Exceptions:**  
None

**SRA**  
Shift Right Arithmetic

**Format:**
SRA rd, rt, sa

**Description:**
The contents of general register rt are shifted right by sa bits, sign-extending the high-order bits.
The result is placed in register rd.
The operand must be a valid sign-extended, 32-bit value.

**Operation:**
\[
\begin{align*}
T & : \ s \leftarrow 0 || sa \\
& \quad \text{temp} \leftarrow (\text{GPR}[rt]_{31})^s || \text{GPR}[rt]_{31..s} \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} || \text{temp}
\end{align*}
\]

**Exceptions:**
None

**SRAV**  
Shift Right Arithmetic Variable

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL 000000</td><td>rs</td><td>rt</td><td>rd</td><td>0 00000</td><td>SRAV 000111</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**

SRAV rd, rt, rs

**Description:**

The contents of general register rt are shifted right by the number of bits specified by the low-order five bits of general register rs, sign-extending the high-order bits.

The result is placed in register rd.

The operand must be a valid sign-extended, 32-bit value.

**Operation:**

\[
\begin{align*}
T: & \quad s \leftarrow \text{GPR}[rs]_{4..0} \\
& \quad \text{temp} \leftarrow (\text{GPR}[rt]_{31})^s \ || \ \text{GPR}[rt]_{31..s} \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} \ || \ \text{temp}
\end{align*}
\]
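
A C sketch of SRA and SRAV (hypothetical names) follows the pseudocode literally: the vacated high bits of the 32-bit word are filled with copies of GPR[rt]<sub>31</sub>, and the word result is then sign-extended to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* (temp31)^32 || temp */
static uint64_t sign_extend32(uint32_t w) {
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ull | w) : (uint64_t)w;
}

/* SRA rd, rt, sa: temp <- (GPR[rt]31)^s || GPR[rt]31..s */
static uint64_t model_sra(uint64_t rt, unsigned sa) {
    assert(sa < 32);
    uint32_t w = (uint32_t)rt;                 /* GPR[rt]31..0 */
    uint32_t temp = w >> sa;                   /* GPR[rt]31..s moved down */
    if ((w & 0x80000000u) != 0 && sa > 0) {
        temp |= ~(0xFFFFFFFFu >> sa);          /* fill the s vacated bits with bit 31 */
    }
    return sign_extend32(temp);
}

/* SRAV rd, rt, rs: same operation with s <- GPR[rs]4..0. */
static uint64_t model_srav(uint64_t rt, uint64_t rs) {
    return model_sra(rt, (unsigned)(rs & 0x1Fu));
}
```

Shifting 0x80000000 right arithmetically by 4 gives 0xF8000000, which is then sign-extended to 0xFFFFFFFFF8000000.
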

**Exceptions:**

None

**SRL**  
*Shift Right Logical*

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL 000000</td><td>0 00000</td><td>rt</td><td>rd</td><td>sa</td><td>SRL 000010</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**
SRL rd, rt, sa

**Description:**
The contents of general register rt are shifted right by sa bits, inserting zeros into the high-order bits.
The result is placed in register rd.
The operand must be a valid sign-extended, 32-bit value.

**Operation:**

\[
\begin{align*}
T: & \quad s \leftarrow 0 || sa \\
& \quad \text{temp} \leftarrow 0^s || \text{GPR}[rt]_{31..s} \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} || \text{temp}
\end{align*}
\]

**Exceptions:**
None

**SRLV**  
Shift Right Logical Variable

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..11</th><th>10..6</th><th>5..0</th></tr>
</thead>
<tbody>
<tr><td>SPECIAL 000000</td><td>rs</td><td>rt</td><td>rd</td><td>0 00000</td><td>SRLV 000110</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>5</td><td>5</td><td>6</td></tr>
</tbody>
</table>

**Format:**

SRLV rd, rt, rs

**Description:**

The contents of general register rt are shifted right by the number of bits specified by the low-order five bits of general register rs, inserting zeros into the high-order bits.

The result is placed in register rd.

The operand must be a valid sign-extended, 32-bit value.

**Operation:**

\[
\begin{align*}
T: & \quad s \leftarrow \text{GPR}[rs]_{4..0} \\
& \quad \text{temp} \leftarrow 0^s || \text{GPR}[rt]_{31..s} \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} || \text{temp}
\end{align*}
\]
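
Because the pseudocode sign-extends temp<sub>31</sub> after the shift, a logical right shift by a non-zero amount always yields a non-negative word, while a shift by zero simply sign-extends the low word of rt. The C sketch below (hypothetical names) models SRL and SRLV under that reading.

```c
#include <assert.h>
#include <stdint.h>

/* (temp31)^32 || temp */
static uint64_t sign_extend32(uint32_t w) {
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ull | w) : (uint64_t)w;
}

/* SRL rd, rt, sa: temp <- 0^s || GPR[rt]31..s, then sign-extend temp. */
static uint64_t model_srl(uint64_t rt, unsigned sa) {
    assert(sa < 32);
    uint32_t temp = (uint32_t)rt >> sa;
    return sign_extend32(temp);
}

/* SRLV rd, rt, rs: same operation with s <- GPR[rs]4..0. */
static uint64_t model_srlv(uint64_t rt, uint64_t rs) {
    return model_srl(rt, (unsigned)(rs & 0x1Fu));
}
```
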

**Exceptions:**

None

**SUB**  
Subtract

**Format:**

SUB rd, rs, rt

**Description:**

The contents of general register rt are subtracted from the contents of general register rs to form a result. The result is placed into general register rd. The operands must be valid sign-extended, 32-bit values.

The only difference between this instruction and the SUBU instruction is that SUBU never traps on overflow.

An integer overflow exception takes place if the carries out of bits 30 and 31 differ (2’s complement overflow). The destination register rd is not modified when an integer overflow exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad \text{temp} \leftarrow \text{GPR}[rs] - \text{GPR}[rt] \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} || \text{temp}_{31..0}
\end{align*}
\]

**Exceptions:**

Integer overflow exception
### SUBU

**Subtract Unsigned**

#### Format:

SUBU rd, rs, rt

#### Description:

The contents of general register rt are subtracted from the contents of general register rs to form a result.

The result is placed into general register rd.

The operands must be valid sign-extended, 32-bit values.

The only difference between this instruction and the SUB instruction is that SUBU never traps on overflow. No integer overflow exception occurs under any circumstances.

#### Operation:

\[
\begin{align*}
T: & \quad \text{temp} \leftarrow \text{GPR}[rs] - \text{GPR}[rt] \\
& \quad \text{GPR}[rd] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}_{31..0}
\end{align*}
\]

#### Exceptions:

None

**SW**  
Store Word

**Format:**

SW rt, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. The contents of general register rt are stored at the memory location specified by the effective address.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \| (\text{pAddr}_{2..0} \text{ xor } (\text{ReverseEndian} \| 0^{2})) \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \text{ xor } (\text{BigEndianCPU} \| 0^{2}) \\
& \quad \text{data} \leftarrow \text{GPR}[rt]_{63-8*\text{byte}..0} \| 0^{8*\text{byte}} \\
& \quad \text{StoreMemory}(\text{uncached}, \text{WORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**SWCz**  
**Store Word From Coprocessor**  

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>SWCz 1 1 1 0 x*</td><td>base</td><td>rt</td><td>offset</td></tr>
<tr><td>6</td><td>5</td><td>5</td><td>16</td></tr>
</tbody>
</table>

**Format:**  
SWCz rt, offset(base)

**Description:**  
The 16-bit offset is sign-extended and added to the contents of general register base to form a virtual address. Coprocessor unit z sources a word, which the processor writes to the addressed memory location.

The data to be stored is defined by individual coprocessor specifications.

This instruction is not valid for use with CP0.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

Execution of the instruction referencing coprocessor 3 causes a reserved instruction exception, not a coprocessor unusable exception.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \| \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \| (\text{pAddr}_{2..0} \oplus (\text{ReverseEndian} \| 0^{2})) \\
& \quad \text{byte} \leftarrow \text{vAddr}_{2..0} \oplus (\text{BigEndianCPU} \| 0^{2}) \\
& \quad \text{data} \leftarrow \text{COPzSW}(\text{byte}, \text{rt}) \\
& \quad \text{StoreMemory}(\text{uncached}, \text{WORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

**Note:**  
\*See the table in this section under “Opcode Bit Encoding.” Also see “CPU Instruction Opcode Bit Encoding” at the end of Appendix A.

**Exceptions:**

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception
- Coprocessor unusable exception
- Reserved instruction exception (coprocessor 3)

**Opcode Bit Encoding:**

<table>
<thead>
<tr>
<th>SWCz</th>
<th>Bit #31</th>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SWC1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>SWCz</th>
<th>Bit #31</th>
<th>30</th>
<th>29</th>
<th>28</th>
<th>27</th>
<th>26</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SWC2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

In these encodings, bits 31..28 are fixed opcode bits and bits 27..26 give the coprocessor unit number (z).

**SWL**

**Store Word Left**

<table>
<thead>
<tr><th>31..26</th><th>25..21</th><th>20..16</th><th>15..0</th></tr>
</thead>
<tbody>
<tr><td>SWL 101010</td><td>base</td><td>rt</td><td>offset</td></tr>
</tbody>
</table>

**Format:**

SWL rt, offset(base)

**Description:**

This instruction can be used with the SWR instruction to store the contents of a register into four consecutive bytes of memory, when the bytes cross a word boundary. SWL stores the left portion of the register into the appropriate part of the high-order word of memory; SWR stores the right portion of the register into the appropriate part of the low-order word.

The SWL instruction adds its sign-extended 16-bit offset to the contents of general register base to form a virtual address which may specify an arbitrary byte. It alters only the word in memory which contains that byte. From one to four bytes will be stored, depending on the starting byte specified.

Conceptually, it starts at the most-significant byte of the register and copies it to the specified byte in memory; then it copies bytes from register to memory until it reaches the low-order byte of the word in memory.

No address exceptions due to alignment are possible.

**Operation:**

\[
\begin{align*}
T: & \quad \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& \quad (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \parallel (\text{pAddr}_{2..0} \text{ xor } \text{ReverseEndian}^{3}) \\
& \quad \text{if BigEndianMem} = 0 \text{ then} \\
& \quad \quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \parallel 0^{2} \\
& \quad \text{endif} \\
& \quad \text{byte} \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^{2} \\
& \quad \text{if } (\text{vAddr}_{2} \text{ xor } \text{BigEndianCPU}) = 0 \text{ then} \\
& \quad \quad \text{data} \leftarrow 0^{32} \parallel 0^{24-8*\text{byte}} \parallel \text{GPR}[rt]_{31..24-8*\text{byte}} \\
& \quad \text{else} \\
& \quad \quad \text{data} \leftarrow 0^{24-8*\text{byte}} \parallel \text{GPR}[rt]_{31..24-8*\text{byte}} \parallel 0^{32} \\
& \quad \text{endif} \\
& \quad \text{StoreMemory}(\text{uncached}, \text{byte}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

Given a doubleword in a register and a doubleword in memory, the operation of SWL is as follows:

**SWL**

<table>
<thead>
<tr>
<th>Register</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
<th>H</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory</td>
<td>I</td>
<td>J</td>
<td>K</td>
<td>L</td>
<td>M</td>
<td>N</td>
<td>O</td>
<td>P</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr><th>vAddr<sub>2..0</sub></th><th>BigEndianCPU = 0: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th><th>BigEndianCPU = 1: destination</th><th>type</th><th>offset (LEM)</th><th>offset (BEM)</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>I J K L M N O E</td><td>0</td><td>0</td><td>7</td><td>E F G H M N O P</td><td>3</td><td>4</td><td>0</td></tr>
<tr><td>1</td><td>I J K L M N E F</td><td>1</td><td>0</td><td>6</td><td>I E F G M N O P</td><td>2</td><td>4</td><td>1</td></tr>
<tr><td>2</td><td>I J K L M E F G</td><td>2</td><td>0</td><td>5</td><td>I J E F M N O P</td><td>1</td><td>4</td><td>2</td></tr>
<tr><td>3</td><td>I J K L E F G H</td><td>3</td><td>0</td><td>4</td><td>I J K E M N O P</td><td>0</td><td>4</td><td>3</td></tr>
<tr><td>4</td><td>I J K E M N O P</td><td>0</td><td>4</td><td>3</td><td>I J K L E F G H</td><td>3</td><td>0</td><td>4</td></tr>
<tr><td>5</td><td>I J E F M N O P</td><td>1</td><td>4</td><td>2</td><td>I J K L M E F G</td><td>2</td><td>0</td><td>5</td></tr>
<tr><td>6</td><td>I E F G M N O P</td><td>2</td><td>4</td><td>1</td><td>I J K L M N E F</td><td>1</td><td>0</td><td>6</td></tr>
<tr><td>7</td><td>E F G H M N O P</td><td>3</td><td>4</td><td>0</td><td>I J K L M N O E</td><td>0</td><td>0</td><td>7</td></tr>
</tbody>
</table>

**LEM** Little-endian memory (BigEndianMem = 0)

**BEM** Big-endian memory (BigEndianMem = 1)

**Type** AccessType (see Table 2.1 on page 2-3) sent to memory

**Offset** pAddr2..0 sent to memory
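The destination columns above can be cross-checked against the Operation pseudocode with a small C model. This is a sketch under assumptions of our own: the register and memory doublewords are held as `uint64_t` values with A (or I) as the most-significant byte, only the data merge is modeled (no address translation, offset generation, or bus cycle), and `swl_merge` is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of the SWL data merge on a 64-bit memory doubleword.
   mem: current memory doubleword (I..P, I most significant)
   reg: GPR[rt] (A..H, A most significant; only E..H, bits 31..0, are stored) */
uint64_t swl_merge(uint64_t mem, uint64_t reg, unsigned vaddr, int big_endian_cpu) {
    unsigned flip = big_endian_cpu ? 1u : 0u;
    unsigned byte = (vaddr & 3u) ^ (flip ? 3u : 0u);    /* vAddr_1..0 xor BigEndianCPU^2 */
    uint32_t word = (uint32_t)reg;                      /* GPR[rt]_31..0 */
    uint32_t data = word >> (24 - 8 * byte);            /* GPR[rt]_31..24-8*byte, right-aligned */
    uint32_t mask = 0xFFFFFFFFu >> (24 - 8 * byte);     /* the byte+1 bytes being written */
    /* (vAddr_2 xor BigEndianCPU) = 0 selects the low word, otherwise the high word */
    unsigned shift = (((vaddr >> 2) & 1u) ^ flip) ? 32u : 0u;
    uint64_t lane = (uint64_t)mask << shift;
    return (mem & ~lane) | ((uint64_t)data << shift);
}
```

With the memory doubleword holding the ASCII letters I through P and the register holding A through H, `swl_merge(mem, reg, 1, 1)` gives I E F G M N O P, the vAddr2..0 = 1 entry for BigEndianCPU = 1.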

**Exceptions:**
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**SWR Store Word Right**

<table>
<thead>
<tr>
<th>31</th>
<th>26</th>
<th>25</th>
<th>21</th>
<th>20</th>
<th>16</th>
<th>15</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 0 1 1 1 0</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Format:**

\[ \text{SWR rt, offset(base)} \]

**Description:**

This instruction can be used with the SWL instruction to store the contents of a register into four consecutive bytes of memory when the bytes cross a boundary between two words. SWR stores the right portion of the register into the appropriate part of the low-order word; SWL stores the left portion of the register into the appropriate part of the high-order word of memory.

The SWR instruction adds its sign-extended 16-bit \( \text{offset} \) to the contents of general register \( \text{base} \) to form a virtual address which may specify an arbitrary byte. It alters only the word in memory which contains that byte. From one to four bytes will be stored, depending on the starting byte specified.

Conceptually, it starts at the least-significant (rightmost) byte of the register and copies it to the specified byte in memory; then copies bytes from register to memory until it reaches the high-order byte of the word in memory.

No address exceptions due to alignment are possible.
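The "conceptual" byte-by-byte behavior described above can be modeled on a plain byte array. This sketch assumes big-endian memory and a 32-bit register value. `swl_be`, `swr_be`, and `store_unaligned_be` are hypothetical names, and pairing SWL at address A with SWR at A+3 is the usual idiom for an unaligned big-endian word store, stated here as an assumption rather than taken from this manual.

```c
#include <stddef.h>
#include <stdint.h>

/* SWL, big-endian: start with the most-significant register byte at addr and
   keep copying until the last byte of that aligned word has been written. */
void swl_be(uint8_t *mem, size_t addr, uint32_t reg) {
    int shift = 24;
    size_t a = addr;
    do {
        mem[a] = (uint8_t)(reg >> shift);
        shift -= 8;
        a++;
    } while ((a & 3u) != 0);
}

/* SWR, big-endian: start with the least-significant register byte at addr and
   keep copying toward lower addresses until the word's first byte is written. */
void swr_be(uint8_t *mem, size_t addr, uint32_t reg) {
    int shift = 0;
    size_t a = addr;
    for (;;) {
        mem[a] = (uint8_t)(reg >> shift);
        if ((a & 3u) == 0) {
            break;
        }
        shift += 8;
        a--;
    }
}

/* Store a 32-bit value at any byte address with the SWL A / SWR A+3 pairing. */
void store_unaligned_be(uint8_t *mem, size_t addr, uint32_t value) {
    swl_be(mem, addr, value);
    swr_be(mem, addr + 3, value);
}
```

When the address is already word-aligned, both halves write the same four bytes, so the pair is harmless in that case too.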

---

**Example:** `SWR $24,4($0)` with big-endian memory

<table>
<thead>
<tr>
<th>Word address</th>
<th>Memory before</th>
<th>Memory after</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0 1 2 3</td>
<td>0 1 2 3</td>
</tr>
<tr>
<td>4</td>
<td>4 5 6 7</td>
<td>D 5 6 7</td>
</tr>
</tbody>
</table>

Register `$24` holds A B C D before and after the store.


**Operation:**

$$\begin{align*}
T:\quad & vAddr \leftarrow ((\text{offset}_{15})^{48} \parallel \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& (pAddr, \text{uncached}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}) \\
& pAddr \leftarrow pAddr_{\text{PSIZE}-1..3} \parallel (pAddr_{2..0} \text{ xor } \text{ReverseEndian}^3) \\
& \text{if BigEndianMem} = 0 \text{ then} \\
& \quad pAddr \leftarrow pAddr_{31..2} \parallel 0^2 \\
& \text{endif} \\
& \text{byte} \leftarrow vAddr_{1..0} \text{ xor } \text{BigEndianCPU}^2 \\
& \text{if } (vAddr_2 \text{ xor } \text{BigEndianCPU}) = 0 \text{ then} \\
& \quad \text{data} \leftarrow 0^{32} \parallel \text{GPR}[rt]_{31-8\cdot\text{byte}..0} \parallel 0^{8\cdot\text{byte}} \\
& \text{else} \\
& \quad \text{data} \leftarrow \text{GPR}[rt]_{31-8\cdot\text{byte}..0} \parallel 0^{8\cdot\text{byte}} \parallel 0^{32} \\
& \text{endif} \\
& \text{StoreMemory}(\text{uncached}, \text{WORD}-\text{byte}, \text{data}, pAddr, vAddr, \text{DATA})
\end{align*}$$

Given a doubleword in a register and a doubleword in memory, the operation of SWR is as follows:

### SWR

<table>
<thead>
<tr>
<th>Register</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
<th>H</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory</td>
<td>I</td>
<td>J</td>
<td>K</td>
<td>L</td>
<td>M</td>
<td>N</td>
<td>O</td>
<td>P</td>
</tr>
</tbody>
</table>

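<table>
<thead>
<tr>
<th rowspan="2">vAddr2..0</th>
<th colspan="4">BigEndianCPU = 0</th>
<th colspan="4">BigEndianCPU = 1</th>
</tr>
<tr>
<th>destination</th>
<th>type</th>
<th>offset (LEM)</th>
<th>offset (BEM)</th>
<th>destination</th>
<th>type</th>
<th>offset (LEM)</th>
<th>offset (BEM)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>I J K L E F G H</td>
<td>3</td>
<td>0</td>
<td>4</td>
<td>H J K L M N O P</td>
<td>0</td>
<td>7</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>I J K L F G H P</td>
<td>2</td>
<td>1</td>
<td>4</td>
<td>G H K L M N O P</td>
<td>1</td>
<td>6</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>I J K L G H O P</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>F G H L M N O P</td>
<td>2</td>
<td>5</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>I J K L H N O P</td>
<td>0</td>
<td>3</td>
<td>4</td>
<td>E F G H M N O P</td>
<td>3</td>
<td>4</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>E F G H M N O P</td>
<td>3</td>
<td>4</td>
<td>0</td>
<td>I J K L H N O P</td>
<td>0</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>F G H L M N O P</td>
<td>2</td>
<td>5</td>
<td>0</td>
<td>I J K L G H O P</td>
<td>1</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>6</td>
<td>G H K L M N O P</td>
<td>1</td>
<td>6</td>
<td>0</td>
<td>I J K L F G H P</td>
<td>2</td>
<td>1</td>
<td>4</td>
</tr>
<tr>
<td>7</td>
<td>H J K L M N O P</td>
<td>0</td>
<td>7</td>
<td>0</td>
<td>I J K L E F G H</td>
<td>3</td>
<td>0</td>
<td>4</td>
</tr>
</tbody>
</table>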

*LEM* Little-endian memory (BigEndianMem = 0)

*BEM* Big-endian memory (BigEndianMem = 1)

*Type* AccessType (see Table 2.1 on page 2-3) sent to memory

*Offset* pAddr_{2..0} sent to memory
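The SWR destination columns can be reproduced the same way. In this sketch (a hypothetical `swr_merge`, with the same byte-order conventions as assumed for the SWL model: `uint64_t` values, A or I as the most-significant byte), the register word is shifted left by 8 × byte and the upper 4 - byte bytes of the selected memory word are replaced, mirroring `GPR[rt]31-8*byte..0 || 0^(8*byte)` and the `WORD-byte` access type.

```c
#include <stdint.h>

/* Sketch of the SWR data merge on a 64-bit memory doubleword. */
uint64_t swr_merge(uint64_t mem, uint64_t reg, unsigned vaddr, int big_endian_cpu) {
    unsigned flip = big_endian_cpu ? 1u : 0u;
    unsigned byte = (vaddr & 3u) ^ (flip ? 3u : 0u);   /* vAddr_1..0 xor BigEndianCPU^2 */
    uint32_t word = (uint32_t)reg;                     /* GPR[rt]_31..0 */
    uint32_t data = word << (8 * byte);                /* GPR[rt]_31-8*byte..0 || 0^(8*byte) */
    uint32_t mask = 0xFFFFFFFFu << (8 * byte);         /* the 4 - byte bytes being written */
    /* (vAddr_2 xor BigEndianCPU) = 0 selects the low word, otherwise the high word */
    unsigned shift = (((vaddr >> 2) & 1u) ^ flip) ? 32u : 0u;
    uint64_t lane = (uint64_t)mask << shift;
    return (mem & ~lane) | ((uint64_t)data << shift);
}
```

For example, with BigEndianCPU = 1 and vAddr2..0 = 0, only one byte is written and the result is H J K L M N O P.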

### Exceptions:

- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

SYNC

### Format:

```
SYNC
```

### Description:

The SYNC instruction ensures that any loads and stores fetched prior to the present instruction are completed before any loads or stores after this instruction are allowed to start. Use of the SYNC instruction to serialize certain memory references may be required in a multiprocessor environment for proper synchronization. For example, suppose processor A stores DATA and then sets FLAG, while processor B waits for FLAG to be set and then loads DATA.

The SYNC in processor A prevents DATA from being written after FLAG, which could cause processor B to read stale data. The SYNC in processor B prevents DATA from being read before FLAG, which could likewise result in reading stale data. For processors that execute loads and stores only in order with respect to shared memory, this instruction is a NOP.
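The DATA/FLAG pattern can be sketched with C11 fences. This is an analogy rather than an R4600 interface: `atomic_thread_fence` stands in for each SYNC, `shared_data` and `shared_flag` are hypothetical stand-ins for DATA and FLAG, and whether a compiler lowers these fences to a SYNC instruction depends on the toolchain.

```c
#include <stdatomic.h>

/* Hypothetical shared locations standing in for DATA and FLAG. */
static int shared_data;
static atomic_int shared_flag = 0;

/* Processor A: store DATA, fence (the role of A's SYNC), then set FLAG. */
void producer(int value) {
    shared_data = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);
}

/* Processor B: wait for FLAG, fence (the role of B's SYNC), then load DATA. */
int consumer(void) {
    while (atomic_load_explicit(&shared_flag, memory_order_relaxed) == 0) {
        /* spin until FLAG is observed */
    }
    atomic_thread_fence(memory_order_acquire);
    return shared_data;
}
```

In a real program `producer` and `consumer` would run on different threads or processors; calling them one after the other in a single thread simply returns the stored value.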

### Operation:

```
T: SyncOperation()
```

### Exceptions:

None

**SYSCALL**  
**System Call**

**Format:**
SYSCALL

**Description:**
A system call exception occurs, immediately and unconditionally transferring control to the exception handler.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

**Operation:**

`T: SystemCallException`

**Exceptions:**
System Call exception

### TEQ

**Trap If Equal**

### Format:

```
TEQ rs, rt
```

### Description:

The contents of general register `rt` are compared to general register `rs`. If the contents of general register `rs` are equal to the contents of general register `rt`, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

### Operation:

```
T: if GPR[rs] = GPR[rt] then
    TrapException
endif
```

### Exceptions:

- Trap exception

**TEQI**  
**Trap If Equal Immediate**

**Format:**
TEQI rs, immediate

**Description:**
The 16-bit *immediate* is sign-extended and compared to the contents of general register *rs*. If the contents of general register *rs* are equal to the sign-extended *immediate*, a trap exception occurs.
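The term (immediate15)^48 || immediate15..0 is ordinary two's-complement sign extension of the 16-bit field to 64 bits. A minimal C sketch of the TEQI trap condition follows; `sign_extend16` and `teqi_traps` are hypothetical names, and the sign extension is written out explicitly so it does not rely on implementation-defined narrowing conversions.

```c
#include <stdbool.h>
#include <stdint.h>

/* (immediate_15)^48 || immediate_15..0, computed without narrowing casts. */
uint64_t sign_extend16(uint16_t imm) {
    int64_t v = (imm & 0x8000u) ? (int64_t)imm - 0x10000 : (int64_t)imm;
    return (uint64_t)v;
}

/* True when TEQI rs, immediate would take a trap exception. */
bool teqi_traps(uint64_t gpr_rs, uint16_t imm) {
    return gpr_rs == sign_extend16(imm);
}
```

So `TEQI rs, 0xFFFF` traps when rs holds -1 (all ones), not when it holds 0xFFFF.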

**Operation:**

```
T: if GPR[rs] = (immediate_{15})^{48} || immediate_{15..0} then
    TrapException
endif
```

**Exceptions:**
Trap exception

TGE

Trap If Greater Than Or Equal

Format:
TGE rs, rt

Description:
The contents of general register rt are compared to the contents of general register rs. Considering both quantities as signed integers, if the contents of general register rs are greater than or equal to the contents of general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

Operation:

T: if GPR[rs] ≥ GPR[rt] then
   TrapException
endif

Exceptions:
Trap exception

**TGEI**  
**Trap If Greater Than Or Equal Immediate**

**Format:**  
TGEI rs, immediate

**Description:**  
The 16-bit *immediate* is sign-extended and compared to the contents of general register *rs*. Considering both quantities as signed integers, if the contents of general register *rs* are greater than or equal to the sign-extended *immediate*, a trap exception occurs.

**Operation:**

```
T: if GPR[rs] ≥ (immediate_{15})^{48} || immediate_{15..0} then
   TrapException
endif
```

**Exceptions:**  
Trap exception

TGEIU

Trap If Greater Than Or Equal Immediate Unsigned

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM 0 0 0 0 0 1</td>
<td>rs</td>
<td>TGEIU 0 1 0 0 1</td>
<td>immediate</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

TGEIU rs, immediate

**Description:**
The 16-bit *immediate* is sign-extended and compared to the contents of general register *rs*. Considering both quantities as unsigned integers, if the contents of general register *rs* are greater than or equal to the sign-extended *immediate*, a trap exception occurs.
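Note that TGEIU sign-extends the immediate before treating both operands as unsigned, so an immediate of 0x8000 or above is compared against a value close to 2^64. A sketch of the condition (function names hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the 16-bit immediate to 64 bits. */
uint64_t sext16(uint16_t imm) {
    return (imm & 0x8000u) ? ((uint64_t)imm | 0xFFFFFFFFFFFF0000ULL) : (uint64_t)imm;
}

/* TGEIU: sign-extend first, then compare as unsigned 64-bit integers. */
bool tgeiu_traps(uint64_t gpr_rs, uint16_t imm) {
    return gpr_rs >= sext16(imm);
}
```

For instance, `TGEIU rs, 0xFFFF` traps only when rs holds all ones.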

**Operation:**

\[
\begin{aligned}
T:\ & \text{if } (0 \parallel \text{GPR}[rs]) \geq (0 \parallel (\text{immediate}_{15})^{48} \parallel \text{immediate}_{15..0}) \text{ then} \\
& \quad \text{TrapException} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

Trap exception

**TGEU**  Trap If Greater Than Or Equal Unsigned

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL 0 0 0 0 0 0</td>
<td>rs</td>
<td>rt</td>
<td>code</td>
<td>TGEU 1 1 0 0 0 1</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>10</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

TGEU rs, rt

**Description:**

The contents of general register rt are compared to the contents of general register rs. Considering both quantities as unsigned integers, if the contents of general register rs are greater than or equal to the contents of general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
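Two details of this group can be sketched in C: the 10-bit code field occupies bits 15..6 of the instruction word (per the encoding table above), and TGE and TGEU differ only in whether the comparison is signed. The names below are hypothetical; the signed compare flips the sign bit and compares unsigned, which avoids implementation-defined conversions from `uint64_t` to `int64_t`.

```c
#include <stdbool.h>
#include <stdint.h>

/* code field: bits 15..6 of a register-register trap instruction word. */
unsigned trap_code(uint32_t insn) {
    return (unsigned)((insn >> 6) & 0x3FFu);
}

/* TGE: signed compare. Flipping bit 63 maps signed order onto unsigned order. */
bool tge_traps(uint64_t rs, uint64_t rt) {
    const uint64_t sign = 0x8000000000000000ULL;
    return (rs ^ sign) >= (rt ^ sign);
}

/* TGEU: unsigned compare. */
bool tgeu_traps(uint64_t rs, uint64_t rt) {
    return rs >= rt;
}
```

With rs holding all ones and rt holding 0, TGE does not trap (-1 < 0) while TGEU does.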

**Operation:**

```plaintext
T: if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
    TrapException
endif
```

**Exceptions:**

Trap exception

TLBP  Probe TLB For Matching Entry

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 0 1 0 0 0 0</td>
<td>CO 1</td>
<td>0</td>
<td>TLBP 0 0 1 0 0 0</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>19</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
TLBP

**Description:**
The Index register is loaded with the address of the TLB entry whose contents match the contents of the EntryHi register. If no TLB entry matches, the high-order bit of the Index register is set.

The architecture does not specify the operation of memory references associated with the instruction immediately after a TLBP instruction, nor is the operation specified if more than one TLB entry matches.
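The probe loop can be sketched in C with a simplified entry layout. The `tlb_entry` structure and its field names are hypothetical (they flatten the TLB[i] bit ranges used in the Operation below). The sketch keeps scanning after a hit, so with multiple matches it returns the last one, whereas the architecture leaves that case unspecified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one TLB entry (field names are ours). */
struct tlb_entry {
    uint32_t vpn2;  /* TLB[i]_167..141: 27-bit VPN2 */
    uint32_t mask;  /* TLB[i]_216..205: 12-bit page-mask field */
    int g;          /* TLB[i]_140: global bit */
    uint8_t asid;   /* TLB[i]_135..128 */
};

/* Returns the value TLBP would leave in the Index register. */
uint32_t tlbp(const struct tlb_entry *tlb, size_t entries, uint64_t entryhi) {
    const uint32_t vpn2_bits = 0x7FFFFFFu;                       /* 27 bits */
    uint32_t hi_vpn2 = (uint32_t)((entryhi >> 13) & vpn2_bits);  /* EntryHi_39..13 */
    uint8_t hi_asid = (uint8_t)(entryhi & 0xFFu);                /* EntryHi_7..0 */
    uint32_t index = 0x80000000u;                                /* 1 || 0^31: no match */
    for (size_t i = 0; i < entries; i++) {
        /* not (0^15 || mask): ignore the VPN2 bits covered by the page mask */
        uint32_t keep = ~(tlb[i].mask & 0xFFFu) & vpn2_bits;
        int vpn_match = (tlb[i].vpn2 & keep) == (hi_vpn2 & keep);
        int asid_match = tlb[i].g || tlb[i].asid == hi_asid;
        if (vpn_match && asid_match) {
            index = (uint32_t)(i & 0x3Fu);                       /* 0^26 || i_5..0 */
        }
    }
    return index;
}
```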

**Operation:**

\[
\begin{aligned}
T:\ & \text{Index} \leftarrow 1 \parallel 0^{31} \\
& \text{for } i \text{ in } 0..\text{TLBEntries}-1 \\
& \quad \text{if } \big((\text{TLB}[i]_{167..141} \text{ and not } (0^{15} \parallel \text{TLB}[i]_{216..205})) = (\text{EntryHi}_{39..13} \text{ and not } (0^{15} \parallel \text{TLB}[i]_{216..205}))\big) \text{ and} \\
& \quad\quad (\text{TLB}[i]_{140} \text{ or } (\text{TLB}[i]_{135..128} = \text{EntryHi}_{7..0})) \text{ then} \\
& \quad\quad \text{Index} \leftarrow 0^{26} \parallel i_{5..0} \\
& \quad \text{endif} \\
& \text{endfor}
\end{aligned}
\]

**Exceptions:**
Coprocessor unusable exception

TLBR  Read Indexed TLB Entry

Format:
TLBR

Description:
The G bit (which controls ASID matching) read from the TLB is written into both of the EntryLo0 and EntryLo1 registers.
The EntryHi and EntryLo registers are loaded with the contents of the TLB entry pointed at by the contents of the TLB Index register. The operation is invalid (and the results are unspecified) if the contents of the TLB Index register are greater than the number of TLB entries in the processor.

Operation:

\[
\begin{aligned}
T:\ & \text{PageMask} \leftarrow \text{TLB}[\text{Index}_{5..0}]_{255..192} \\
& \text{EntryHi} \leftarrow \text{TLB}[\text{Index}_{5..0}]_{191..128} \text{ and not } \text{TLB}[\text{Index}_{5..0}]_{255..192} \\
& \text{EntryLo1} \leftarrow \text{TLB}[\text{Index}_{5..0}]_{127..65} \parallel \text{TLB}[\text{Index}_{5..0}]_{140} \\
& \text{EntryLo0} \leftarrow \text{TLB}[\text{Index}_{5..0}]_{63..1} \parallel \text{TLB}[\text{Index}_{5..0}]_{140}
\end{aligned}
\]

Exceptions:
Coprocessor unusable exception

**TLBWI**  
Write Indexed TLB Entry

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 0 1 0 0 0 0</td>
<td>CO 1</td>
<td>0</td>
<td>TLBWI 0 0 0 0 1 0</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>19</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
TLBWI

**Description:**
The $G$ bit of the TLB is written with the logical AND of the $G$ bits in the
*EntryLo0* and *EntryLo1* registers.
The TLB entry pointed at by the contents of the TLB Index register is
loaded with the contents of the *EntryHi* and *EntryLo* registers.
The operation is invalid (and the results are unspecified) if the contents
of the TLB Index register are greater than the number of TLB entries in the
processor.
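Because the TLBR operation writes TLB bit 140 into the least-significant bit of each EntryLo register, the G bit can be read as EntryLo bit 0. Under that reading, the G-bit handling of TLBWI/TLBWR and of TLBR reduces to the following sketch (function names hypothetical):

```c
#include <stdint.h>

/* TLBWI/TLBWR: the stored G bit is the AND of the G bits (bit 0) of EntryLo0 and EntryLo1. */
unsigned tlb_write_g(uint64_t entrylo0, uint64_t entrylo1) {
    return (unsigned)(entrylo0 & entrylo1 & 1u);
}

/* TLBR: EntryLo <- stored field || G, i.e. the stored upper bits shifted up with G in bit 0. */
uint64_t tlb_read_entrylo(uint64_t stored_bits, unsigned g) {
    return (stored_bits << 1) | (uint64_t)(g & 1u);
}
```

A page is therefore global only if both EntryLo registers have G set when the entry is written.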

**Operation:**
\[
T: \quad \text{TLB}[\text{Index}_{5..0}] \leftarrow \text{PageMask} \parallel (\text{EntryHi and not PageMask}) \parallel \text{EntryLo1} \parallel \text{EntryLo0}
\]

**Exceptions:**
Coprocessor unusable exception

TLBWR  Write Random TLB Entry

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 0 1 0 0 0 0</td>
<td>CO 1</td>
<td>0</td>
<td>TLBWR 0 0 0 1 1 0</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>19</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
TLBWR

**Description:**
The G bit of the TLB is written with the logical AND of the G bits in the EntryLo0 and EntryLo1 registers.
The TLB entry pointed at by the contents of the TLB Random register is loaded with the contents of the EntryHi and EntryLo registers.

**Operation:**

\[
T: \quad \text{TLB}[\text{Random}_{5..0}] \leftarrow \text{PageMask} \parallel (\text{EntryHi and not PageMask}) \parallel \text{EntryLo1} \parallel \text{EntryLo0}
\]

**Exceptions:**
Coprocessor unusable exception

TLT

Trap If Less Than

Format:
TLT rs, rt

Description:
The contents of general register rt are compared to general register rs. Considering both quantities as signed integers, if the contents of general register rs are less than the contents of general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

Operation:
```
T: if GPR[rs] < GPR[rt] then
   TrapException
endif
```

Exceptions:
Trap exception

**TLTI**  
**Trap If Less Than Immediate**

**Format:**

TLTI rs, immediate

**Description:**

The 16-bit immediate is sign-extended and compared to the contents of general register rs. Considering both quantities as signed integers, if the contents of general register rs are less than the sign-extended immediate, a trap exception occurs.

**Operation:**

\[
\begin{aligned}
T:\ & \text{if GPR}[rs] < (\text{immediate}_{15})^{48} \parallel \text{immediate}_{15..0} \text{ then} \\
& \quad \text{TrapException} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

Trap exception

**TLTIU**  
**Trap If Less Than Immediate Unsigned**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM 0 0 0 0 0 1</td>
<td>rs</td>
<td>TLTIU 0 1 0 1 1</td>
<td>immediate</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**
TLTIU rs, immediate

**Description:**
The 16-bit immediate is sign-extended and compared to the contents of general register rs. Considering both quantities as unsigned integers, if the contents of general register rs are less than the sign-extended immediate, a trap exception occurs.

**Operation:**

```plaintext
T: if (0 || GPR[rs]) < (0 || (immediate_{15})^{48} || immediate_{15..0}) then
   TrapException
endif
```

**Exceptions:**
Trap exception

## TLTU

### Trap If Less Than Unsigned

<table>
<thead>
<tr>
<th>31</th>
<th>26</th>
<th>25</th>
<th>21</th>
<th>20</th>
<th>16</th>
<th>15</th>
<th>6</th>
<th>5</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL 0 0 0 0 0 0</td>
<td>rs</td>
<td>rt</td>
<td>code</td>
<td>TLTU 1 1 0 0 1 1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

#### Format:

`TLTU rs, rt`

#### Description:

The contents of general register `rt` are compared to general register `rs`. Considering both quantities as unsigned integers, if the contents of general register `rs` are less than the contents of general register `rt`, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

#### Operation:

```
T:   if (0 || GPR[rs]) < (0 || GPR[rt]) then
    TrapException
endif
```

#### Exceptions:

- Trap exception

**TNE**  
**Trap If Not Equal**

**Format:**

TNE rs, rt

**Description:**

The contents of general register rt are compared to general register rs. If the contents of general register rs are not equal to the contents of general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.

**Operation:**

```
T: if GPR[rs] ≠ GPR[rt] then
    TrapException
endif
```

**Exceptions:**

Trap exception

### TNEI

**Trap If Not Equal Immediate**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM 0 0 0 0 0 1</td>
<td>rs</td>
<td>TNEI 0 1 1 1 0</td>
<td>immediate</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

TNEI rs, immediate

**Description:**

The 16-bit *immediate* is sign-extended and compared to the contents of general register rs. If the contents of general register rs are not equal to the sign-extended *immediate*, a trap exception occurs.

**Operation:**

```
T: if GPR[rs] ≠ (immediate_{15})^{48} || immediate_{15..0} then
   TrapException
endif
```

**Exceptions:**

Trap exception

WAIT

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 0 1 0 0 0 0</td>
<td>CO 1</td>
<td>0</td>
<td>WAIT 1 0 0 0 0 0</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>19</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
WAIT

**Description:**
The WAIT instruction is used to halt the internal pipeline and thus reduce the power consumption of the CPU. See Appendix G for more details.

**Operation:**

```plaintext
T: if SysAD bus is idle then
    StopPipeline
endif
```

**Exceptions:**
Coprocessor unusable exception

**XOR**  
**Exclusive OR**

**Format:**

XOR rd, rs, rt

**Description:**

The contents of general register rs are combined with the contents of general register rt in a bit-wise logical exclusive OR operation. The result is placed into general register rd.

**Operation:**

\[ T: \ GPR[rd] \leftarrow GPR[rs] \ xor \ GPR[rt] \]

**Exceptions:**

None

**XORI**  
**Exclusive OR Immediate**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>XORI 0 0 1 1 1 0</td>
<td>rs</td>
<td>rt</td>
<td>immediate</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**  
XORI rt, rs, immediate

**Description:**  
The 16-bit *immediate* is zero-extended and combined with the contents of general register rs in a bit-wise logical exclusive OR operation. The result is placed into general register rt.
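XORI zero-extends its immediate (0^48 || immediate), unlike the trap-immediate instructions above, which sign-extend it. A short C sketch with hypothetical function names; in C, converting a `uint16_t` to `uint64_t` already zero-extends:

```c
#include <stdint.h>

/* XOR: GPR[rd] <- GPR[rs] xor GPR[rt] */
uint64_t xor_op(uint64_t rs, uint64_t rt) {
    return rs ^ rt;
}

/* XORI: GPR[rt] <- GPR[rs] xor (0^48 || immediate) */
uint64_t xori_op(uint64_t rs, uint16_t imm) {
    return rs ^ (uint64_t)imm;  /* upper 48 bits of rs pass through unchanged */
}
```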

**Operation:**

\[ T: \text{GPR}[rt] \leftarrow \text{GPR}[rs] \oplus (0^{48} \parallel \text{immediate}) \]

**Exceptions:**  
None

CPU Instruction Opcode Bit Encoding

The remainder of this Appendix presents the opcode bit encoding for the CPU instruction set (ISA and extensions), as implemented by the R4600/R4700.

Table A.4 lists the R4600/R4700 Opcode Bit Encoding.
Table A.4

**CPU Instruction Set Details**

**Key to Table:**
- `*` Operation codes marked with an asterisk cause reserved instruction exceptions in all current implementations and are reserved for future versions of the architecture.
- γ Operation codes marked with a gamma cause a reserved instruction exception. They are reserved for future versions of the architecture.
- δ Operation codes marked with a delta are valid only for R4600 processors with CP0 enabled, and cause a reserved instruction exception on other processors.
- φ Operation codes marked with a phi are invalid but do not cause reserved instruction exceptions in R4600 implementations.

<table>
<thead>
<tr>
<th>Op</th>
<th>16..12</th>
<th>11..8</th>
<th>7..4</th>
<th>3..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADDI</td>
<td>ADDIU</td>
<td>SLTI</td>
<td>SLTIU</td>
<td>ANDI</td>
</tr>
<tr>
<td>COP0</td>
<td>COP1</td>
<td>COP2</td>
<td>*</td>
<td>BEQ</td>
</tr>
<tr>
<td>DADDI</td>
<td>DADDIU</td>
<td>LDL</td>
<td>LDR</td>
<td>*</td>
</tr>
<tr>
<td>LB</td>
<td>LH</td>
<td>LWL</td>
<td>LW</td>
<td>LBU</td>
</tr>
<tr>
<td>SB</td>
<td>SH</td>
<td>SWL</td>
<td>SW</td>
<td>SDL</td>
</tr>
<tr>
<td>LL</td>
<td>LWC1</td>
<td>LWC2</td>
<td>*</td>
<td>LLD</td>
</tr>
<tr>
<td>SC</td>
<td>SWC1</td>
<td>SWC2</td>
<td>*</td>
<td>SCD</td>
</tr>
<tr>
<td>SPECIAL</td>
<td>ADDI</td>
<td>COP0</td>
<td>COP1</td>
<td>COP2</td>
</tr>
<tr>
<td>Opcode</td>
<td>SLL</td>
<td>SRL</td>
<td>SRA</td>
<td>SLLV</td>
</tr>
<tr>
<td></td>
<td>JR</td>
<td>JALR</td>
<td>*</td>
<td>SYSCALL</td>
</tr>
<tr>
<td></td>
<td>MFHI</td>
<td>MTHI</td>
<td>MFLO</td>
<td>MTLO</td>
</tr>
<tr>
<td></td>
<td>MULT</td>
<td>MULTU</td>
<td>DIV</td>
<td>DIVU</td>
</tr>
<tr>
<td></td>
<td>ADD</td>
<td>ADDU</td>
<td>SUB</td>
<td>SUBU</td>
</tr>
<tr>
<td></td>
<td>*</td>
<td>SLT</td>
<td>SLTU</td>
<td>DADD</td>
</tr>
<tr>
<td></td>
<td>TGE</td>
<td>TGEU</td>
<td>TLT</td>
<td>TLTU</td>
</tr>
<tr>
<td></td>
<td>DSLL</td>
<td>*</td>
<td>DSRL</td>
<td>DSRA</td>
</tr>
<tr>
<td>REGIMM rt</td>
<td>BLTZ</td>
<td>BGEZ</td>
<td>BLTZL</td>
<td>BGEZL</td>
</tr>
<tr>
<td></td>
<td>TGEI</td>
<td>TGEIU</td>
<td>TLTI</td>
<td>TLTIU</td>
</tr>
<tr>
<td></td>
<td>BLTZAL</td>
<td>BGEZAL</td>
<td>BLTZALL</td>
<td>BGEZALL</td>
</tr>
<tr>
<td>COPz rs</td>
<td>MF</td>
<td>DMF</td>
<td>CF</td>
<td>?</td>
</tr>
<tr>
<td>COPz rt</td>
<td>BCF</td>
<td>BCT</td>
<td>BCFL</td>
<td>BCTL</td>
</tr>
<tr>
<td></td>
<td>CO</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>CP0 Function</td>
<td>TLBR</td>
<td>TLBWI</td>
<td>TLBWR</td>
<td></td>
</tr>
<tr>
<td></td>
<td>ERET</td>
<td>WAIT</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table A.4
Introduction

This appendix provides a detailed description of each floating-point unit (FPU) instruction (refer to Appendix A for a detailed description of the CPU instructions). The instructions are listed alphabetically, and any exceptions that may occur due to the execution of each instruction are listed after the description of each instruction. Descriptions of the immediate causes and the manner of handling exceptions are omitted from the instruction descriptions in this appendix (refer to Chapter 7 for detailed descriptions of floating-point exceptions and handling).

Figure B.3 on page B-45 lists the entire bit encoding for the constant fields of the floating-point instruction set; the bit encoding for each instruction is included with that individual instruction.

Instruction Formats

There are three basic instruction format types:

- I-Type, or Immediate instructions, which include load and store operations
- M-Type, or Move instructions
- R-Type, or Register instructions, which include the two- and three-register floating-point operations.

The instruction description subsections that follow show how these three basic instruction formats are used by:

- Load and store instructions
- Move instructions
- Floating-Point computational instructions
- Floating-Point branch instructions

Floating-point instructions are mapped onto the MIPS coprocessor instructions, defining coprocessor unit number one (CP1) as the floating-point unit.

Each operation is valid only for certain formats. Implementations may support some of these formats and operations through emulation, but they only need to support combinations that are valid (marked V in Table B.1).

Combinations marked R in Table B.1 are not currently specified by this architecture, and cause an unimplemented instruction trap. They will be available for future extensions to the architecture.
The coprocessor branch on condition true/false instructions can be used to logically negate any predicate. Thus, the 32 possible conditions require only 16 distinct comparisons, as shown in Table B.2 below.

<table>
<thead>
<tr>
<th>Operation</th>
<th>Source Format</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADD</td>
<td>V V R R</td>
</tr>
<tr>
<td>SUB</td>
<td>V V R R</td>
</tr>
<tr>
<td>MUL</td>
<td>V V R R</td>
</tr>
<tr>
<td>DIV</td>
<td>V V R R</td>
</tr>
<tr>
<td>SQRT</td>
<td>V V R R</td>
</tr>
<tr>
<td>ABS</td>
<td>V V R R</td>
</tr>
<tr>
<td>MOV</td>
<td>V V</td>
</tr>
<tr>
<td>NEG</td>
<td>V V R R</td>
</tr>
<tr>
<td>TRUNC.L</td>
<td>V V</td>
</tr>
<tr>
<td>ROUND.L</td>
<td>V V</td>
</tr>
<tr>
<td>CEIL.L</td>
<td>V V</td>
</tr>
<tr>
<td>FLOOR.L</td>
<td>V V</td>
</tr>
<tr>
<td>TRUNC.W</td>
<td>V V</td>
</tr>
<tr>
<td>ROUND.W</td>
<td>V V</td>
</tr>
<tr>
<td>CEIL.W</td>
<td>V V</td>
</tr>
<tr>
<td>FLOOR.W</td>
<td>V V</td>
</tr>
<tr>
<td>CVT.S</td>
<td>V V V V</td>
</tr>
<tr>
<td>CVT.D</td>
<td>V V</td>
</tr>
<tr>
<td>CVT.W</td>
<td>V V</td>
</tr>
<tr>
<td>CVT.L</td>
<td>V V</td>
</tr>
<tr>
<td>C</td>
<td>V V R R</td>
</tr>
</tbody>
</table>

Table B.1 Valid FPU Instruction Formats
**Floating-Point Loads, Stores, and Moves**

All movement of data between the floating-point coprocessor and memory is accomplished by coprocessor load and store operations, which reference the floating-point coprocessor *General Purpose* registers. These operations are unformatted; no format conversions are performed and, therefore, no floating-point exceptions can occur due to these operations.

Data may also be directly moved between the floating-point coprocessor and the processor by *move to coprocessor* and *move from coprocessor* instructions. Like the floating-point load and store operations, move to/from operations perform no format conversions and never cause floating-point exceptions.

An additional pair of coprocessor registers is available, called *Floating-Point Control* registers, for which the only data movement operations supported are moves to and from processor *General Purpose* registers.

---

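<table>
<thead>
<tr>
<th colspan="2">Condition Mnemonic</th>
<th rowspan="2">Code</th>
<th colspan="4">Relations</th>
<th rowspan="2">Invalid Operation Exception if Unordered</th>
</tr>
<tr>
<th>True</th>
<th>False</th>
<th>Greater Than</th>
<th>Less Than</th>
<th>Equal</th>
<th>Unordered</th>
</tr>
</thead>
<tbody>
<tr>
<td>F</td>
<td>T</td>
<td>0</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>No</td>
</tr>
<tr>
<td>UN</td>
<td>OR</td>
<td>1</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>No</td>
</tr>
<tr>
<td>EQ</td>
<td>NEQ</td>
<td>2</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>No</td>
</tr>
<tr>
<td>UEQ</td>
<td>OGL</td>
<td>3</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>No</td>
</tr>
<tr>
<td>OLT</td>
<td>UGE</td>
<td>4</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>F</td>
<td>No</td>
</tr>
<tr>
<td>ULT</td>
<td>OGE</td>
<td>5</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>T</td>
<td>No</td>
</tr>
<tr>
<td>OLE</td>
<td>UGT</td>
<td>6</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>F</td>
<td>No</td>
</tr>
<tr>
<td>ULE</td>
<td>OGT</td>
<td>7</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>T</td>
<td>No</td>
</tr>
<tr>
<td>SF</td>
<td>ST</td>
<td>8</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>Yes</td>
</tr>
<tr>
<td>NGLE</td>
<td>GLE</td>
<td>9</td>
<td>F</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>Yes</td>
</tr>
<tr>
<td>SEQ</td>
<td>SNE</td>
<td>10</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>Yes</td>
</tr>
<tr>
<td>NGL</td>
<td>GL</td>
<td>11</td>
<td>F</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>Yes</td>
</tr>
<tr>
<td>LT</td>
<td>NLT</td>
<td>12</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>F</td>
<td>Yes</td>
</tr>
<tr>
<td>NGE</td>
<td>GE</td>
<td>13</td>
<td>F</td>
<td>T</td>
<td>F</td>
<td>T</td>
<td>Yes</td>
</tr>
<tr>
<td>LE</td>
<td>NLE</td>
<td>14</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>F</td>
<td>Yes</td>
</tr>
<tr>
<td>NGT</td>
<td>GT</td>
<td>15</td>
<td>F</td>
<td>T</td>
<td>T</td>
<td>T</td>
<td>Yes</td>
</tr>
</tbody>
</table>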

Table B.2 Logical Negation of Predicates by Condition True/False
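The sixteen condition codes decompose into independent bits. Reading them as bit 0 = unordered, bit 1 = equal, bit 2 = less than, and bit 3 = raise Invalid Operation when unordered (our interpretation of the mnemonics, not a statement from this manual), a compare can be sketched in C as below. `fp_compare` is hypothetical, signaling-NaN behavior is ignored, and the False mnemonic of each pair is simply the negation of the returned value.

```c
#include <math.h>
#include <stdbool.h>

/* Evaluate a 4-bit compare condition (0..15) on two doubles.
   Sets *invalid when the operands are unordered and bit 3 of cond is set. */
bool fp_compare(double fs, double ft, unsigned cond, bool *invalid) {
    bool unordered = isnan(fs) || isnan(ft);
    bool less = !unordered && fs < ft;
    bool equal = !unordered && fs == ft;
    *invalid = unordered && (cond & 8u) != 0;
    return ((cond & 1u) && unordered) ||
           ((cond & 2u) && equal) ||
           ((cond & 4u) && less);
}
```

No bit ever selects "greater than", which is why that column is F throughout; branching on false covers it (for example, OGT is the negation of ULE).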
Floating-Point Operations

The floating-point unit operation set includes:

- floating-point add
- floating-point subtract
- floating-point multiply
- floating-point divide
- floating-point square root
- convert between fixed-point and floating-point formats
- convert between floating-point formats
- floating-point compare

These operations satisfy the accuracy requirements of IEEE Standard 754. Specifically, each operation produces a result identical to the infinite-precision result rounded to the specified format using the current rounding mode.

Instructions must specify the format of their operands. Except for conversion functions, mixed-format operations are not provided.

Instruction Notation Conventions

In this appendix, all variable subfields in an instruction format (such as $fs$, $ft$, $immediate$, and so on) are shown in lower-case. The instruction name (such as ADD, SUB, and so on) is shown in upper-case.

For the sake of clarity, we sometimes use an alias for a variable subfield in the formats of specific instructions. For example, we use $rs = base$ in the format for load and store instructions. Such an alias is always lower case, since it refers to a variable subfield.

In some instructions, the instruction subfields $op$ and $function$ can have constant 6-bit values. When reference is made to these instructions, upper-case mnemonics are used. For instance, in the floating-point ADD instruction we use $op = COP1$ and $function = FADD$. In other cases, a single field has both fixed and variable subfields, so the name contains both upper- and lower-case characters. Bit encodings for the mnemonics are shown in Figure B.3 at the end of this appendix and are also included with each individual instruction.

In the instruction description examples that follow, the Operation section describes the operation performed by each instruction using a high-level language notation.

Instruction Notation Examples

The following examples illustrate the application of some of the instruction notation conventions:

Example #1:

```
GPR[rt] ← immediate || 0^{16}
```

Sixteen zero bits are concatenated with an immediate value (typically 16 bits), and the 32-bit string (with the lower 16 bits set to zero) is assigned to General Purpose Register $rt$.
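As an illustration only, Example #1 can be modeled in Python by treating registers as plain integers. The function name `load_upper` is hypothetical, and the sketch assumes a 16-bit immediate field:

```python
def load_upper(immediate: int) -> int:
    """Model GPR[rt] <- immediate || 0^16 for a 16-bit immediate field."""
    imm16 = immediate & 0xFFFF   # keep only the 16-bit immediate
    return imm16 << 16           # appending 16 zero bits is a left shift by 16


print(hex(load_upper(0x1234)))   # prints 0x12340000
```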

Example #2:

```
(immediate_{15})^{16} || immediate_{15..0}
```

Bit 15 (the sign bit) of an immediate value is extended for 16 bit positions, and the result is concatenated with bits 15 through 0 of the immediate value to form a 32-bit sign extended value.
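A Python sketch of Example #2 follows. It assumes the result is wanted as a 32-bit bit pattern, and `sign_extend16` and `as_signed32` are hypothetical helper names:

```python
def sign_extend16(immediate: int) -> int:
    """Model (immediate_15)^16 || immediate_15..0 as a 32-bit pattern."""
    imm16 = immediate & 0xFFFF
    sign = (imm16 >> 15) & 1            # bit 15 is the sign bit
    upper = 0xFFFF if sign else 0x0000  # sign bit replicated 16 times
    return (upper << 16) | imm16        # concatenate with bits 15..0


def as_signed32(word: int) -> int:
    """Read a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


print(hex(sign_extend16(0xFFFC)), as_signed32(sign_extend16(0xFFFC)))  # prints 0xfffffffc -4
```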

Load and Store Instructions

In the R4600 implementation, the instruction immediately following a load may use the contents of the register being loaded. In such cases, the hardware *interlocks*, requiring additional real cycles, so scheduling load delay slots is still desirable for performance, although it is not required for functionally correct code.

The behavior of the load and store instructions depends on the width of the FGRs.

- When the FR bit in the Status register equals zero, the Floating-Point General registers (FGRs) are 32 bits wide.
- When the FR bit in the Status register equals one, the Floating-Point General registers (FGRs) are 64 bits wide.

In the load and store operation descriptions, the functions listed in Table B.3 are used to summarize the handling of virtual addresses and physical memory.

<table>
<thead>
<tr>
<th>Function</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>AddressTranslation</td>
<td>Uses the TLB to find the physical address given the virtual address. The function fails and an exception is taken if the required translation is not present in the TLB.</td>
</tr>
<tr>
<td>LoadMemory</td>
<td>Uses the cache and main memory to find the contents of the word containing the specified physical address. The low-order two bits of the address and the Access Type field indicate which of the four bytes within the data word need to be returned. If the cache is enabled for this access, the entire word is returned and loaded into the cache.</td>
</tr>
<tr>
<td>StoreMemory</td>
<td>Uses the cache, write buffer, and main memory to store the word or part of word specified as data in the word containing the specified physical address. The low-order two bits of the address and the Access Type field indicate which of the four bytes within the data word should be stored.</td>
</tr>
</tbody>
</table>

Table B.3 Load and Store Common Functions

Figure B.1 shows the I-Type instruction format used by load and store operations.

I-Type (Immediate)

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>op</td>
<td>base</td>
<td>ft</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

- op is a 6-bit operation code
- base is the 5-bit base register specifier
- ft is a 5-bit source (for stores) or destination (for loads) FPA register specifier
- offset is the 16-bit signed immediate offset

Figure B.1 Load and Store Instruction Format

All coprocessor loads and stores reference aligned-word data items. Thus, for word loads and stores, the access type field is always WORD, and the low-order two bits of the address must always be zero.

For doubleword loads and stores, the access type field is always DOUBLEWORD, and the low-order three bits of the address must always be zero.
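The field layout of Figure B.1 and the alignment rules above can be checked with a short Python sketch. The effective-address rule used here is an assumption, since this section does not spell it out: it takes the contents of the base register plus the sign-extended offset, wrapped to 64 bits, which is the usual MIPS convention. All names are hypothetical.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
# Low-order address bits (as a mask) that must be zero for each access type.
ALIGN_MASK = {"WORD": 0b11, "DOUBLEWORD": 0b111}


@dataclass
class IType:
    op: int      # bits 31..26
    base: int    # bits 25..21
    ft: int      # bits 20..16
    offset: int  # bits 15..0, stored here already sign-extended


def decode_itype(word: int) -> IType:
    raw = word & 0xFFFF
    offset = raw - 0x10000 if raw & 0x8000 else raw  # sign-extend 16 bits
    return IType(op=(word >> 26) & 0x3F,
                 base=(word >> 21) & 0x1F,
                 ft=(word >> 16) & 0x1F,
                 offset=offset)


def effective_address(base_value: int, insn: IType) -> int:
    # Assumption: GPR[base] + sign-extended offset, modulo 2^64.
    return (base_value + insn.offset) & MASK64


def is_aligned(vaddr: int, access_type: str) -> bool:
    """True when the required low-order address bits are all zero."""
    return (vaddr & ALIGN_MASK[access_type]) == 0
```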

Regardless of byte-numbering order (endianness), the address specifies that byte which has the smallest byte-address in the addressed field. For a big-endian machine, this is the leftmost byte; for a little-endian machine, this is the rightmost byte.
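To make the byte-numbering rule concrete, this Python sketch lists the bytes of a field in increasing address order and picks the byte at the field's address. The helper names are hypothetical.

```python
def bytes_by_address(value: int, size: int, big_endian: bool) -> list[int]:
    """Bytes of an aligned field, ordered from lowest to highest address."""
    return list(value.to_bytes(size, "big" if big_endian else "little"))


def byte_at_field_address(value: int, size: int, big_endian: bool) -> int:
    """The byte with the smallest address, which the field address names."""
    return bytes_by_address(value, size, big_endian)[0]


word = 0x11223344
print(hex(byte_at_field_address(word, 4, True)))   # prints 0x11 (leftmost byte)
print(hex(byte_at_field_address(word, 4, False)))  # prints 0x44 (rightmost byte)
```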

Computational Instructions

Computational instructions include all of the arithmetic floating-point operations performed by the FPU.

Figure B.2 shows the R-Type instruction format used for computational operations.

![Figure B.2 Computational Instruction Format](image)

The function field indicates the floating-point operation to be performed.

Each floating-point instruction can be applied to a number of operand formats. The operand format for an instruction is specified by the 5-bit format field; decoding for this field is shown in Table B.4.

<table>
<thead>
<tr>
<th>Code</th>
<th>Mnemonic</th>
<th>Size</th>
<th>Format</th>
</tr>
</thead>
<tbody>
<tr>
<td>16</td>
<td>S</td>
<td>single</td>
<td>Binary floating-point</td>
</tr>
<tr>
<td>17</td>
<td>D</td>
<td>double</td>
<td>Binary floating-point</td>
</tr>
<tr>
<td>18</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>19</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
<tr>
<td>20</td>
<td>W</td>
<td>single</td>
<td>32-bit binary fixed-point</td>
</tr>
<tr>
<td>21</td>
<td>L</td>
<td>longword</td>
<td>64-bit binary fixed-point</td>
</tr>
<tr>
<td>22–31</td>
<td>Reserved</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table B.4 Format Field Decoding

Table B.5 lists all floating-point instructions.

<table>
<thead>
<tr>
<th>Code (5:0)</th>
<th>Mnemonic</th>
<th>Operation</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ADD</td>
<td>Add</td>
</tr>
<tr>
<td>1</td>
<td>SUB</td>
<td>Subtract</td>
</tr>
<tr>
<td>2</td>
<td>MUL</td>
<td>Multiply</td>
</tr>
<tr>
<td>3</td>
<td>DIV</td>
<td>Divide</td>
</tr>
<tr>
<td>4</td>
<td>SQRT</td>
<td>Square root</td>
</tr>
<tr>
<td>5</td>
<td>ABS</td>
<td>Absolute value</td>
</tr>
<tr>
<td>6</td>
<td>MOV</td>
<td>Move</td>
</tr>
<tr>
<td>7</td>
<td>NEG</td>
<td>Negate</td>
</tr>
<tr>
<td>8</td>
<td>ROUND.L</td>
<td>Convert to long fixed-point, rounded to nearest/even</td>
</tr>
<tr>
<td>9</td>
<td>TRUNC.L</td>
<td>Convert to long fixed-point, rounded toward zero</td>
</tr>
<tr>
<td>10</td>
<td>CEIL.L</td>
<td>Convert to long fixed-point, rounded to $+\infty$</td>
</tr>
<tr>
<td>11</td>
<td>FLOOR.L</td>
<td>Convert to long fixed-point, rounded to $-\infty$</td>
</tr>
<tr>
<td>12</td>
<td>ROUND.W</td>
<td>Convert to single fixed-point, rounded to nearest/even</td>
</tr>
<tr>
<td>13</td>
<td>TRUNC.W</td>
<td>Convert to single fixed-point, rounded toward zero</td>
</tr>
<tr>
<td>14</td>
<td>CEIL.W</td>
<td>Convert to single fixed-point, rounded to $+\infty$</td>
</tr>
<tr>
<td>15</td>
<td>FLOOR.W</td>
<td>Convert to single fixed-point, rounded to $-\infty$</td>
</tr>
<tr>
<td>16-31</td>
<td>–</td>
<td>Reserved</td>
</tr>
<tr>
<td>32</td>
<td>CVT.S</td>
<td>Convert to single floating-point</td>
</tr>
<tr>
<td>33</td>
<td>CVT.D</td>
<td>Convert to double floating-point</td>
</tr>
<tr>
<td>34</td>
<td>–</td>
<td>Reserved</td>
</tr>
<tr>
<td>35</td>
<td>–</td>
<td>Reserved</td>
</tr>
<tr>
<td>36</td>
<td>CVT.W</td>
<td>Convert to 32-bit binary fixed-point</td>
</tr>
<tr>
<td>37</td>
<td>CVT.L</td>
<td>Convert to 64-bit binary fixed-point</td>
</tr>
<tr>
<td>38-47</td>
<td>–</td>
<td>Reserved</td>
</tr>
<tr>
<td>48-63</td>
<td>C</td>
<td>Floating-point compare</td>
</tr>
</tbody>
</table>

Table B.5 Floating-Point Instructions and Operations
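The format and function fields can be decoded together. The sketch below is a hypothetical Python disassembler for the computational instructions. It assumes the R-Type field positions shown in the per-instruction encodings later in this appendix (COP1 in bits 31..26, fmt 25..21, ft 20..16, fs 15..11, fd 10..6, function 5..0). It also assumes that the low four bits of a compare function code (48 to 63) are the condition code of Table B.2. The `$fN` register syntax is illustrative only.

```python
COP1 = 0b010001
FORMATS = {16: "S", 17: "D", 20: "W", 21: "L"}  # Table B.4; others reserved
FUNCTIONS = {                                   # Table B.5; others reserved
    0: "ADD", 1: "SUB", 2: "MUL", 3: "DIV", 4: "SQRT", 5: "ABS",
    6: "MOV", 7: "NEG", 8: "ROUND.L", 9: "TRUNC.L", 10: "CEIL.L",
    11: "FLOOR.L", 12: "ROUND.W", 13: "TRUNC.W", 14: "CEIL.W",
    15: "FLOOR.W", 32: "CVT.S", 33: "CVT.D", 36: "CVT.W", 37: "CVT.L",
}
THREE_OPERAND = {"ADD", "SUB", "MUL", "DIV"}
CONDITIONS = ["F", "UN", "EQ", "UEQ", "OLT", "ULT", "OLE", "ULE",
              "SF", "NGLE", "SEQ", "NGL", "LT", "NGE", "LE", "NGT"]


def disassemble(word: int) -> str:
    op, fmt = (word >> 26) & 0x3F, (word >> 21) & 0x1F
    ft, fs, fd = (word >> 16) & 0x1F, (word >> 11) & 0x1F, (word >> 6) & 0x1F
    funct = word & 0x3F
    if op != COP1:
        raise ValueError("not a COP1 instruction")
    if fmt not in FORMATS:
        raise ValueError(f"reserved format code {fmt}")
    suffix = FORMATS[fmt]
    if funct >= 48:                          # C.cond.fmt fs, ft
        return f"C.{CONDITIONS[funct & 0xF]}.{suffix} $f{fs}, $f{ft}"
    if funct not in FUNCTIONS:
        raise ValueError(f"reserved function code {funct}")
    name = FUNCTIONS[funct]
    if name in THREE_OPERAND:
        return f"{name}.{suffix} $f{fd}, $f{fs}, $f{ft}"
    return f"{name}.{suffix} $f{fd}, $f{fs}"


add_d = (COP1 << 26) | (17 << 21) | (6 << 16) | (2 << 11) | (4 << 6) | 0
print(disassemble(add_d))  # prints ADD.D $f4, $f2, $f6
```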

In the following pages, the notation FGR refers to the 32 General Purpose registers FGR0 through FGR31 of the FPU, and FPR refers to the floating-point registers of the FPU.

- When the FR bit in the Status register (SR(26)) equals zero, only the even floating-point registers are valid and the 32 General Purpose registers of the FPU are 32 bits wide.
- When the FR bit in the Status register (SR(26)) equals one, both odd and even floating-point registers may be used and the 32 General Purpose registers of the FPU are 64 bits wide.

The following routines are used in the description of the floating-point operations to retrieve the value of an FPR or to change the value of an FGR:

FR = 0

```
value ← ValueFPR(fpr, fmt)
  case fmt of
    S, W:
      if fpr_{0} = 0 then
        value ← FGR[fpr]_{31..0}
      else
        value ← FGR[fpr - 1]_{63..32}
      endif
    D:
      /* undefined for fpr not even */
      value ← FGR[fpr]
  endcase

StoreFPR(fpr, fmt, value):
  case fmt of
    S, W:
      if fpr_{0} = 0 then
        FGR[fpr] ← FGR[fpr]_{63..32} || value
      else
        FGR[fpr - 1] ← value || FGR[fpr - 1]_{31..0}
      endif
    D:
      /* undefined for fpr not even */
      FGR[fpr] ← value
  endcase
```

FR = 1

```
value ← ValueFPR(fpr, fmt)
  case fmt of
    S, W:
      value ← FGR[fpr]_{31..0}
    D, L:
      value ← FGR[fpr]
  endcase

StoreFPR(fpr, fmt, value):
  case fmt of
    S, W:
      FGR[fpr] ← undefined^{32} || value
    D, L:
      FGR[fpr] ← value
  endcase
```
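The register-file behavior these routines describe can be modeled in Python. This is a hypothetical sketch with the following modeling choices:

- The 32 physical 64-bit FGRs are plain integers.
- For FR = 0 it follows the StoreFPR half-register mapping: an even fpr uses bits 31..0 of FGR[fpr], and an odd fpr uses bits 63..32 of FGR[fpr - 1].
- Where the manual says undefined32, it stores zero.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
HIGH32 = MASK64 ^ MASK32


class FPURegisters:
    """Hypothetical model of ValueFPR/StoreFPR for a given FR setting."""

    def __init__(self, fr: int) -> None:
        self.fr = fr
        self.fgr = [0] * 32  # physical 64-bit registers

    def value_fpr(self, fpr: int, fmt: str) -> int:
        if fmt in ("S", "W"):
            if self.fr == 0 and fpr % 2 == 1:
                return (self.fgr[fpr - 1] >> 32) & MASK32  # upper half
            return self.fgr[fpr] & MASK32                  # lower half
        if self.fr == 0 and (fmt != "D" or fpr % 2 == 1):
            raise ValueError("not defined for FR = 0")
        return self.fgr[fpr]                               # D (or L with FR = 1)

    def store_fpr(self, fpr: int, fmt: str, value: int) -> None:
        if fmt in ("S", "W"):
            value &= MASK32
            if self.fr == 1:
                self.fgr[fpr] = value          # undefined32 modeled as zero
            elif fpr % 2 == 0:
                self.fgr[fpr] = (self.fgr[fpr] & HIGH32) | value
            else:
                self.fgr[fpr - 1] = (value << 32) | (self.fgr[fpr - 1] & MASK32)
            return
        if self.fr == 0 and (fmt != "D" or fpr % 2 == 1):
            raise ValueError("not defined for FR = 0")
        self.fgr[fpr] = value & MASK64
```

With FR = 0, storing single values to $f2 and $f3 fills the two halves of one physical register, which ValueFPR(2, D) then reads back as a single doubleword.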

ABS.fmt  Floating-Point Absolute Value  ABS.fmt

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>0</td>
<td>fs</td>
<td>fd</td>
<td>ABS</td>
</tr>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>000101</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
ABS.fmt fd, fs

**Description:**
The contents of the FPU register specified by fs are interpreted in the specified format and the arithmetic absolute value is taken. The result is placed in the floating-point register specified by fd.

The absolute value operation is arithmetic; a NaN operand signals invalid operation.

This instruction is valid only for single- and double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

**Operation:**

```
T: StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt)))
```

**Exceptions:**
- Coprocessor unusable exception
- Coprocessor exception trap

**Coprocessor Exceptions:**
- Unimplemented operation exception
- Invalid operation exception

ADD.fmt  Floating-Point Add  ADD.fmt

Format:
ADD.fmt fd, fs, ft

Description:
The contents of the FPU registers specified by fs and ft are interpreted in the specified format and arithmetically added. The result is rounded as if calculated to infinite precision and then rounded to the specified format (fmt), according to the current rounding mode. The result is placed in the floating-point register (FPR) specified by fd.

This instruction is valid only for single- and double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

Operation:

T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt))

Exceptions:
- Coprocessor unusable exception
- Floating-Point exception

Coprocessor Exceptions:
- Unimplemented operation exception
- Invalid operation exception
- Inexact exception
- Overflow exception
- Underflow exception

BC1F Branch On FPU False (Coprocessor 1)

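<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>BC</td>
<td>BCF</td>
<td>offset</td>
</tr>
<tr>
<td>010001</td>
<td>01000</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>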

**Format:**
BC1F offset

**Description:**
A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the result of the last floating-point compare is false, the program branches to the target address, with a delay of one instruction.

**Operation:**

```
T-1: condition ← not COC[1]
T:   target ← (offset_{15})^{46} || offset || 0^{2}
T+1: if condition then
         PC ← PC + target
     endif
```

**Exceptions:**
Coprocessor unusable exception
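The target arithmetic in the BC1F operation can be sketched in Python as below. The names are hypothetical. The sketch assumes a 64-bit program counter, so the offset's sign bit is replicated 46 times (46 + 16 + 2 = 64). It also assumes a 4-byte instruction size for the fall-through path.

```python
MASK64 = (1 << 64) - 1


def branch_target(delay_slot_pc: int, offset: int) -> int:
    """PC + ((offset_15)^46 || offset || 0^2), wrapped to 64 bits."""
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    return (delay_slot_pc + (signed << 2)) & MASK64


def bc1f_next_fetch(delay_slot_pc: int, offset: int, coc1: bool) -> int:
    """Address fetched after the delay slot: the target if COC[1] is false."""
    if not coc1:
        return branch_target(delay_slot_pc, offset)
    return (delay_slot_pc + 4) & MASK64  # fall through past the delay slot
```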

BC1FL  Branch On FPU False Likely (Coprocessor 1)  BC1FL

**Format:**

BC1FL offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.

If the result of the last floating-point compare is false, the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

```
T-1: condition ← not COC[1]
T:   target ← (offset_{15})^{46} || offset || 0^{2}
T+1: if condition then
         PC ← PC + target
     else
         NullifyCurrentInstruction
     endif
```

**Exceptions:**

Coprocessor unusable exception

BC1T  Branch On FPU True (Coprocessor 1)  BC1T

**Format:**

BC1T offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the result of the last floating-point compare is true, the program branches to the target address, with a delay of one instruction.

**Operation:**

```
T-1: condition ← COC[1]
T:   target ← (offset_{15})^{46} || offset || 0^{2}
T+1: if condition then
         PC ← PC + target
     endif
```

**Exceptions:**

Coprocessor unusable exception

BC1TL  Branch On FPU True Likely  (Coprocessor 1)  BC1TL

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>BC</td>
<td>BCTL</td>
<td>offset</td>
</tr>
<tr>
<td>010001</td>
<td>01000</td>
<td>00011</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

BC1TL offset

**Description:**

A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.

If the result of the last floating-point compare is true, the program branches to the target address, with a delay of one instruction. If the conditional branch is not taken, the instruction in the branch delay slot is nullified.

**Operation:**

\[
\begin{align*}
T-1: & \quad \text{condition} \leftarrow \text{COC[1]} \\
T: & \quad \text{target} \leftarrow (\text{offset}_{15})^{46} \parallel \text{offset} \parallel 0^2 \\
T+1: & \quad \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target} \\
& \quad \text{else} \\
& \quad \text{NullifyCurrentInstruction} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**

Coprocessor unusable exception
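The four COP1 branches differ only in the sense of the test and in what happens to the delay slot. Here is a hypothetical Python summary, based on the descriptions of BC1F, BC1FL, BC1T, and BC1TL above:

```python
from typing import NamedTuple


class BranchOutcome(NamedTuple):
    taken: bool
    delay_slot_executes: bool


# mnemonic -> (branch when COC[1] equals this, nullify delay slot if not taken)
VARIANTS = {
    "BC1F": (False, False),
    "BC1T": (True, False),
    "BC1FL": (False, True),
    "BC1TL": (True, True),
}


def resolve(mnemonic: str, coc1: bool) -> BranchOutcome:
    branch_on, likely = VARIANTS[mnemonic]
    taken = coc1 == branch_on
    # Plain branches always execute the delay slot; the "likely" forms
    # nullify it when the branch is not taken.
    return BranchOutcome(taken=taken, delay_slot_executes=taken or not likely)


print(resolve("BC1TL", False))
# prints BranchOutcome(taken=False, delay_slot_executes=False)
```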

C.cond.fmt  Floating-Point Compare  C.cond.fmt

**Format:**

C.cond.fmt fs, ft

**Description:**

The contents of the floating-point registers specified by fs and ft are interpreted in the specified format and arithmetically compared.

A result is determined from the comparison and the condition specified in the instruction. If either value is a Not a Number (NaN) and the high-order bit of the condition field (cond3) is set, an invalid operation exception is taken. After a one-instruction delay, the condition is available for testing with branch on floating-point coprocessor condition instructions.

Comparisons are exact and can neither overflow nor underflow. Four mutually-exclusive relations are possible as results: less than, equal, greater than, and unordered. The last case arises when one or both of the operands are NaN; every NaN compares unordered with everything, including itself.

Comparisons ignore the sign of zero, so +0 = –0.

This instruction is valid only for single- and double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

**Note:** *See “FPU Instruction Opcode Bit Encoding” at the end of Appendix B.*

**Operation:**

```
T: if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then
       less ← false
       equal ← false
       unordered ← true
       if cond_{3} then
           signal InvalidOperationException
       endif
   else
       less ← ValueFPR(fs, fmt) < ValueFPR(ft, fmt)
       equal ← ValueFPR(fs, fmt) = ValueFPR(ft, fmt)
       unordered ← false
   endif
   condition ← (cond_{2} and less) or (cond_{1} and equal) or (cond_{0} and unordered)
   FCR[31]_{23} ← condition
   COC[1] ← condition
```

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Unimplemented operation exception
- Invalid operation exception
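The compare operation can be traced in Python, with host doubles standing in for the register contents. This hypothetical sketch returns the condition together with a flag for the invalid-operation signal instead of modeling the trap itself. The condition code indexes the mnemonics of Table B.2.

```python
import math

# "True" mnemonics for condition codes 0..15 (Table B.2).
CONDITIONS = ["F", "UN", "EQ", "UEQ", "OLT", "ULT", "OLE", "ULE",
              "SF", "NGLE", "SEQ", "NGL", "LT", "NGE", "LE", "NGT"]


def compare(cond: int, fs: float, ft: float) -> tuple[bool, bool]:
    """Return (condition, invalid_signaled) for C.cond.fmt fs, ft."""
    invalid = False
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        invalid = bool(cond & 0b1000)                   # cond_3
    else:
        less, equal, unordered = fs < ft, fs == ft, False  # +0 == -0
    condition = ((bool(cond & 0b0100) and less) or      # cond_2
                 (bool(cond & 0b0010) and equal) or     # cond_1
                 (bool(cond & 0b0001) and unordered))   # cond_0
    return condition, invalid


print(CONDITIONS[12], compare(12, 1.0, 2.0))  # prints LT (True, False)
```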

**CEIL.L.fmt**

**Floating-Point Ceiling to Long Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>0</td>
<td>fs</td>
<td>fd</td>
<td>CEIL.L</td>
</tr>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>001010</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

CEIL.L.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the long fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to $+\infty$ (2).

This instruction is valid only for conversion from single- or double-precision floating-point formats. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity, NaN, or the correctly rounded integer result is outside of $-2^{63}$ to $2^{63}-1$, the Invalid operation exception is raised. If the Invalid operation is not enabled then no exception is taken and $2^{63}-1$ is returned.

**Operation:**

\[
T: \text{StoreFPR}(fd, L, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, L))
\]

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception

**CEIL.W.fmt**  
**Floating-Point Ceiling to Single Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>0</td>
<td>fs</td>
<td>fd</td>
<td>CEIL.W</td>
</tr>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>001110</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

CEIL.W.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the single fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to \( +\infty \) (2).

This instruction is valid only for conversion from single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of \(-2^{31}\) to \(2^{31}-1\), the Invalid operation exception is raised. If the Invalid operation is not enabled then no exception is taken and \(2^{31}-1\) is returned.

**Operation:**

\[
T: \text{StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))}
\]

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception
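The range check and the default result described for CEIL.L.fmt and CEIL.W.fmt can be sketched in Python as follows. This is a hypothetical helper operating on host doubles. It reports the invalid case as a flag, assuming the Invalid trap is disabled. It does not model the Inexact or Overflow exceptions.

```python
import math


def ceil_to_fixed(x: float, bits: int) -> tuple[int, bool]:
    """(result, invalid) for CEIL.W (bits=32) or CEIL.L (bits=64)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x) or math.isinf(x):
        return hi, True                  # default result 2^(bits-1) - 1
    r = math.ceil(x)                     # round toward +infinity, exact int
    if not lo <= r <= hi:
        return hi, True
    return r, False
```

Note that an input of 2^31 - 0.5 is invalid for CEIL.W even though it is below 2^31. The range check applies to the correctly rounded integer, which here is 2^31.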

**CFC1**  
Move Control Word From FPU  
(Coprocessor 1)

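<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>CF</td>
<td>rt</td>
<td>fs</td>
<td>0</td>
</tr>
<tr>
<td>010001</td>
<td>00010</td>
<td></td>
<td></td>
<td>00000000000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>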

**Format:**
CFC1 rt, fs

**Description:**
The contents of the FPU control register fs are loaded into general register rt.
This operation is only defined when fs equals 0 or 31.
The contents of general register rt are undefined for time T of the instruction immediately following this load instruction.

**Operation:**

\[
\begin{align*}
T & : \quad \text{temp} \leftarrow \text{FCR}[fs] \\
T+1 & : \quad \text{GPR}[rt] \leftarrow (\text{temp}_{31})^{32} \parallel \text{temp}
\end{align*}
\]

**Exceptions:**
Coprocessor unusable exception
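The CFC1 operation sign-extends the 32-bit control word into the 64-bit general register. Below is a hypothetical Python sketch, in which the `fcr` dictionary stands in for the control registers:

```python
def cfc1(fcr: dict[int, int], fs: int) -> int:
    """Return GPR[rt] <- (temp_31)^32 || temp as a 64-bit pattern."""
    if fs not in (0, 31):
        raise ValueError("CFC1 is only defined for fs = 0 or fs = 31")
    temp = fcr[fs] & 0xFFFFFFFF
    upper = 0xFFFFFFFF if temp & 0x80000000 else 0  # bit 31 replicated 32 times
    return (upper << 32) | temp
```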

CTC1 Move Control Word To FPU (Coprocessor 1) CTC1

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>CT</td>
<td>rt</td>
<td>fs</td>
<td>0</td>
</tr>
<tr>
<td>010001</td>
<td>00110</td>
<td></td>
<td></td>
<td>00000000000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

Format:
CTC1 rt, fs

Description:
The contents of general register rt are loaded into FPU control register fs. This operation is only defined when fs equals 31.
Writing to Control Register 31, the floating-point Control/Status register, causes an interrupt or exception if any cause bit and its corresponding enable bit are both set. The register will be written before the exception occurs. The contents of floating-point control register fs are undefined for time T of the instruction immediately following this load instruction.

Operation:

\[
\begin{align*}
T & : \text{temp} \leftarrow \text{GPR}[\text{rt}]_{31..0} \\
T+1 & : \text{FCR}[\text{fs}] \leftarrow \text{temp} \\
& \text{COC}[1] \leftarrow \text{FCR}[31]_{23}
\end{align*}
\]

Exceptions:
- Coprocessor unusable exception
- Floating-Point exception

Coprocessor Exceptions:
- Unimplemented operation exception
- Invalid operation exception
- Division by zero exception
- Inexact exception
- Overflow exception
- Underflow exception
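The exception check on a CTC1 write to FCR31 can be sketched in Python. Only the condition bit position (23) comes from the operation above. The following are assumptions based on the usual MIPS FCR31 layout:

- Cause bits occupy 12..17, with the Unimplemented bit at 17.
- Enable bits occupy 7..11.
- The Unimplemented cause bit has no Enable bit and always traps.

Check these against the FCR31 description in this manual before relying on them.

```python
C_BIT = 23              # condition bit, from COC[1] <- FCR[31]_23 above
CAUSE_SHIFT = 12        # assumed: Cause I, U, O, Z, V at bits 12..16, E at 17
ENABLE_SHIFT = 7        # assumed: Enable I, U, O, Z, V at bits 7..11
UNIMPLEMENTED = 1 << 5  # assumed: E is the sixth Cause bit


def ctc1_fcr31(value: int) -> tuple[int, bool, bool]:
    """Return (new FCR31, COC[1], exception_pending) for CTC1 with fs = 31."""
    fcr31 = value & 0xFFFFFFFF            # temp <- GPR[rt]31..0
    coc1 = bool((fcr31 >> C_BIT) & 1)
    cause = (fcr31 >> CAUSE_SHIFT) & 0x3F
    enable = (fcr31 >> ENABLE_SHIFT) & 0x1F
    # The register is written first; the exception (if any) follows.
    pending = bool(cause & enable) or bool(cause & UNIMPLEMENTED)
    return fcr31, coc1, pending
```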

CVT.D.fmt  
Convert to Double Floating-Point Format

Format:
CVT.D.fmt fd, fs

Description:
The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the double binary floating-point format. The result is placed in the floating-point register specified by fd.

This instruction is valid only for conversions from single floating-point format or from 32-bit or 64-bit fixed-point format.

If the single floating-point or single fixed-point format is specified, the operation is exact. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

Operation:

T: StoreFPR (fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D))

Exceptions:
- Coprocessor unusable exception
- Floating-Point exception

Coprocessor Exceptions:
- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception
- Underflow exception
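W sources convert exactly while L sources may not, and the reason is the double's 53-bit significand. It is enough for any 32-bit integer but not for every 64-bit integer (single-precision sources are likewise always representable). Below is a hypothetical Python check using the host's conversion:

```python
def cvt_d_from_fixed(n: int, bits: int) -> tuple[float, bool]:
    """(result, inexact) for CVT.D.W (bits=32) or CVT.D.L (bits=64).

    Uses the host's int-to-double conversion, which rounds to nearest
    with ties to even; other rounding modes are not modeled.
    """
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("value does not fit the source format")
    d = float(n)
    return d, int(d) != n


print(cvt_d_from_fixed(2**31 - 1, 32))  # prints (2147483647.0, False)
print(cvt_d_from_fixed(2**53 + 1, 64))  # prints (9007199254740992.0, True)
```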

**CVT.L.fmt**  
**Convert to Long Fixed-Point Format**

**Format:**

CVT.L.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the long fixed-point format. The result is placed in the floating-point register specified by fd.

This instruction is valid only for conversions from single- or double-precision floating-point formats.

When the source operand is an Infinity, NaN, or the correctly rounded integer result is outside of \(-2^{63}\) to \(2^{63} - 1\), the Invalid operation exception is raised. If the Invalid operation is not enabled then no exception is taken and \(2^{63} - 1\) is returned.

**Operation:**

\[
T: \quad \text{StoreFPR (fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))}
\]

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception

CVT.S.fmt  
Convert to Single Floating-Point Format

Format:
CVT.S.fmt fd, fs

Description:
The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the single binary floating-point format. The result is placed in the floating-point register specified by fd. Rounding occurs according to the currently specified rounding mode.

This instruction is valid only for conversions from double floating-point format, or from 32-bit or 64-bit fixed-point format. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

Operation:

T: StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S))

Exceptions:
- Coprocessor unusable exception
- Floating-Point exception

Coprocessor Exceptions:
- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception
- Underflow exception
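CVT.S.fmt from double rounds according to the current rounding mode. The hypothetical Python sketch below covers only round to nearest/even. It relies on the host conversion performed by the standard `struct` module under the default floating-point environment. CPython's `struct` raises `OverflowError` when a finite double is too large for single precision, which stands in for the Overflow exception here.

```python
import math
import struct


def cvt_s_from_double(x: float) -> tuple[float, bool]:
    """(result, inexact) for CVT.S.D with round to nearest/even."""
    s = struct.unpack("<f", struct.pack("<f", x))[0]  # may raise OverflowError
    inexact = not math.isnan(x) and s != x
    return s, inexact


print(cvt_s_from_double(0.5))          # prints (0.5, False)
print(cvt_s_from_double(1 + 2**-24))   # prints (1.0, True): a tie, rounded to even
```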
### CVT.W.fmt Convert to Fixed-Point Format

**Format:**

CVT.W.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the single fixed-point format. The result is placed in the floating-point register specified by fd. This instruction is valid only for conversion from single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of $-2^{31}$ to $2^{31} - 1$, an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and $2^{31} - 1$ is returned.

**Operation:**

\[
T: \quad \text{StoreFPR}(fd, W, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, W))
\]

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception
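CEIL, FLOOR, ROUND, and TRUNC each force one rounding direction. CVT.W.fmt, by contrast, is expected to round according to the current rounding mode. That reading is an assumption here, as are the codes 0 (nearest/even) and 1 (toward zero). Codes 2 (+∞) and 3 (-∞) match the CEIL and FLOOR descriptions. A hypothetical Python sketch on host doubles, reporting the invalid case as a flag:

```python
import math

RN, RZ, RP, RM = 0, 1, 2, 3  # rounding-mode codes (0 and 1 assumed)


def round_integral(x: float, mode: int) -> int:
    if mode == RN:
        return round(x)        # Python's round() on a float ties to even
    if mode == RZ:
        return math.trunc(x)
    if mode == RP:
        return math.ceil(x)
    if mode == RM:
        return math.floor(x)
    raise ValueError("rounding mode must be 0..3")


def cvt_w(x: float, mode: int) -> tuple[int, bool]:
    """(result, invalid) for CVT.W.fmt; invalid returns 2^31 - 1."""
    hi = (1 << 31) - 1
    if math.isnan(x) or math.isinf(x):
        return hi, True
    r = round_integral(x, mode)
    if not -(1 << 31) <= r <= hi:
        return hi, True
    return r, False
```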

**DIV.fmt**  Floating-Point Divide  **DIV.fmt**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>DIV</td>
</tr>
<tr>
<td>010001</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>000011</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**  
DIV.fmt fd, fs, ft

**Description:**  
The contents of the floating-point registers specified by fs and ft are interpreted in the specified format and arithmetically divided. The result is rounded as if calculated to infinite precision and then rounded to the specified format, according to the current rounding mode. The result is placed in the floating-point register specified by fd.

This instruction is valid only for single- and double-precision floating-point formats.

The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

**Operation:**

\[
T: \text{StoreFPR (fd, fmt, ValueFPR(fs, fmt)/ValueFPR(ft, fmt))}
\]

**Exceptions:**
- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**
- Unimplemented operation exception
- Invalid operation exception
- Division-by-zero exception
- Inexact exception
- Overflow exception
- Underflow exception

**DMFC1 Doubleword Move From Floating-Point Coprocessor**

**Format:**

```
DMFC1 rt, fs
```

**Description:**

The contents of floating-point coprocessor register $fs$ are stored into processor register $rt$.

The contents of general register $rt$ are undefined for time $T$ of the instruction immediately following this load instruction.

The $FR$ bit in the Status register specifies whether all 32 registers of the R4600 are addressable. When $FR$ equals zero, this instruction is not defined when the least significant bit of $fs$ is non-zero. When $FR$ is set, $fs$ may specify either odd or even registers.

**Operation:**

```
T: if SR26 = 1 then
   data ← CPR[1,fs]
else
   data ← CPR[1,fs4..1 || 0]
endif
T+1: GPR[rt] ← data
```

**Exceptions:**

Coprocessor unusable exception
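The register selection in the DMFC1 operation, and the matching step in DMTC1, can be written as a small Python function. This hypothetical sketch follows the pseudocode literally and also reports whether the manual defines the access:

```python
def dmfc1_source(fs: int, fr: int) -> tuple[int, bool]:
    """(CP1 register index read by DMFC1, whether the access is defined)."""
    fs &= 0x1F                           # fs is a 5-bit field
    if fr == 1:                          # SR26 = 1: every register is addressable
        return fs, True
    return fs & 0b11110, (fs & 1) == 0   # fs4..1 || 0; odd fs is undefined
```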

**DMTC1 Doubleword Move To Floating-Point Coprocessor**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>DMT</td>
<td>rt</td>
<td>fs</td>
<td>0</td>
</tr>
<tr>
<td>010001</td>
<td></td>
<td></td>
<td></td>
<td>00000000000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Format:**
DMTC1 rt, fs

**Description:**
The contents of general register rt are loaded into coprocessor register fs of the CP1.
The contents of floating-point register fs are undefined for time T of the instruction immediately following this load instruction.
The FR bit in the Status register specifies whether all 32 registers of the R4600 are addressable. When FR equals zero, this instruction is not defined when the least significant bit of fs is non-zero. When FR equals one, fs may specify either odd or even registers.

**Operation:**

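\[
\begin{align*}
T: \quad & \text{data} \leftarrow \text{GPR}[rt] \\
T+1: \quad & \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{CPR}[1, fs] \leftarrow \text{data} \\
& \text{else} \\
& \quad \text{CPR}[1, fs_{4..1} \mathbin{||} 0] \leftarrow \text{data} \\
& \text{endif}
\end{align*}
\]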

**Exceptions:**
Coprocessor unusable exception

**FLOOR.L.fmt**  
**Floating-Point Floor to Long Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>0 (00000)</td>
<td>fs</td>
<td>fd</td>
<td>FLOOR.L (001011)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

FLOOR.L.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the long fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to –∞ (RM = 3).

This instruction is valid only for conversion from single- or double-precision floating-point formats.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of \(-2^{63}\) to \(2^{63}-1\), an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and \(2^{63}-1\) is returned.
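FLOOR, CEIL, ROUND and TRUNC differ only in the rounding mode they force and in the destination width: 32 bits for .W and 64 bits for .L. The Python sketch below models that shared behavior, including the \(2^{n-1}-1\) value returned when the Invalid operation exception is not enabled. It assumes the standard RM encoding (0 nearest/even, 1 toward zero, 2 toward +∞, 3 toward –∞). `convert_to_fixed` is a hypothetical name, and the sketch does not model the Inexact or Unimplemented operation exceptions.

```python
import math
from typing import Tuple

# Rounding-mode (RM) values: 0 = nearest/even, 1 = toward zero,
# 2 = toward +infinity, 3 = toward -infinity.
RM_NEAREST, RM_ZERO, RM_PLUS_INF, RM_MINUS_INF = 0, 1, 2, 3


def convert_to_fixed(x: float, rm: int, bits: int) -> Tuple[int, bool]:
    """Return (result, invalid) for a ROUND/TRUNC/CEIL/FLOOR conversion.

    bits is 32 for the .W forms and 64 for the .L forms. An invalid
    conversion returns 2**(bits - 1) - 1, the value delivered when the
    Invalid operation exception is not enabled.
    """
    largest = 2 ** (bits - 1) - 1
    smallest = -(2 ** (bits - 1))
    if math.isnan(x) or math.isinf(x):
        return largest, True
    if rm == RM_NEAREST:
        r = round(x)        # round() on a float ties to even
    elif rm == RM_ZERO:
        r = math.trunc(x)
    elif rm == RM_PLUS_INF:
        r = math.ceil(x)
    elif rm == RM_MINUS_INF:
        r = math.floor(x)
    else:
        raise ValueError("rm must be 0..3")
    if not smallest <= r <= largest:
        return largest, True
    return r, False
```

In this model FLOOR.L.fmt corresponds to `convert_to_fixed(x, RM_MINUS_INF, 64)` and ROUND.W.fmt corresponds to `convert_to_fixed(x, RM_NEAREST, 32)`.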

**Operation:**

\[
T: \text{StoreFPR}(fd, L, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, L))
\]

**Exceptions:**

Coprocessor unusable exception  
Floating-Point exception

**Coprocessor Exceptions:**

Invalid operation exception  
Unimplemented operation exception  
Inexact exception  
Overflow exception

**FLOOR.W.fmt**  
**Floating-Point Floor to Single Fixed-Point Format**

**Format:**
FLOOR.W.fmt fd, fs

**Description:**
The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the single fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to –∞ (RM = 3).

This instruction is valid only for conversion from single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of \(-2^{31}\) to \(2^{31}-1\), an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and \(2^{31}-1\) is returned.

**Operation:**

\[
T: \text{StoreFPR}(fd, W, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, W))
\]

**Exceptions:**
Coprocessor unusable exception
Floating-Point exception

**Coprocessor Exceptions:**
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception

**LDC1 Load Doubleword to FPU (Coprocessor 1)**

**Format:**

LDC1 ft, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form an unsigned effective address.

When FR = 0, the contents of the doubleword at the memory location specified by the effective address are loaded into registers ft and ft+1 of the floating-point coprocessor. This instruction is not valid, and is undefined, when the least significant bit of ft is non-zero.

When FR = 1, the contents of the doubleword at the memory location specified by the effective address are loaded into the 64-bit register ft of the floating-point coprocessor.

The FR bit of the Status register [SR26] specifies whether all 32 registers of the R4600 are addressable. If FR equals zero, this instruction is not defined when the least significant bit of ft is non-zero. If FR equals one, ft may specify either odd or even registers.

If any of the three least-significant bits of the effective address are non-zero, an address error exception takes place.
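LDC1, SDC1, LWC1 and SWC1 share the same address computation and alignment rule, which can be sketched as follows. This is a minimal model that assumes `base_value` is the unsigned 64-bit contents of GPR[base]. The helper names are hypothetical, and `ValueError` stands in for the Address error exception.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(offset: int) -> int:
    """(offset_15)^48 || offset_15..0, returned as an unsigned 64-bit pattern."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset |= 0xFFFF_FFFF_FFFF_0000
    return offset


def effective_address(base_value: int, offset: int, size: int) -> int:
    """Compute vAddr and apply the natural-alignment check.

    size is 8 for LDC1/SDC1 and 4 for LWC1/SWC1. A ValueError stands in for
    the Address error exception the processor would take.
    """
    vaddr = (sign_extend16(offset) + base_value) & MASK64
    if vaddr & (size - 1):
        raise ValueError(f"address error: vAddr 0x{vaddr:016x} is not {size}-byte aligned")
    return vaddr
```

A negative offset such as -8 becomes 0xFFFF_FFFF_FFFF_FFF8 before the addition. The sum wraps modulo \(2^{64}\), so a base of 0x1000 yields 0xFF8.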

**Operation:**

\[
\begin{align*}
T: \quad & \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \mathbin{||} \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \text{data} \leftarrow \text{LoadMemory}(\text{uncached}, \text{DOUBLEWORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
& \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{CPR}[1, ft] \leftarrow \text{data} \\
& \text{else} \\
& \quad \text{CPR}[1, ft_{4..1} \mathbin{||} 0] \leftarrow \text{data} \\
& \text{endif}
\end{align*}
\]

**Exceptions:**

- Coprocessor unusable
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**LWC1 Load Word to FPU (Coprocessor 1)**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWC1 (110001)</td>
<td>base</td>
<td>ft</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

LWC1 ft, offset(base)

**Description:**

The 16-bit offset is sign-extended and added to the contents of general register base to form an unsigned effective address. The contents of the word at the memory location specified by the effective address are loaded into register ft of the floating-point coprocessor.

The FR bit of the Status register specifies whether all 64-bit floating-point registers are addressable. If FR equals zero, LWC1 loads either the low half (even ft) or the high half (odd ft) of one of the 16 even floating-point registers. If FR equals one, LWC1 loads the low 32 bits of any even or odd floating-point register.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: \quad & \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \mathbin{||} \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \mathbin{||} (\text{pAddr}_{2..0} \ \text{xor}\ (\text{ReverseEndian} \mathbin{||} 0^2)) \\
& \text{mem} \leftarrow \text{LoadMemory}(\text{uncached}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
& \text{byte} \leftarrow \text{vAddr}_{2..0} \ \text{xor}\ (\text{BigEndianCPU} \mathbin{||} 0^2) \\
& \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{CPR}[1, ft] \leftarrow \text{undefined}^{32} \mathbin{||} \text{mem}_{31+8 \cdot \text{byte}..8 \cdot \text{byte}} \\
& \text{else if } ft_0 = 0 \text{ then} \\
& \quad \text{CPR}[1, ft_{4..1} \mathbin{||} 0] \leftarrow \text{CPR}[1, ft_{4..1} \mathbin{||} 0]_{63..32} \mathbin{||} \text{mem}_{31+8 \cdot \text{byte}..8 \cdot \text{byte}} \\
& \text{else} \\
& \quad \text{CPR}[1, ft_{4..1} \mathbin{||} 0] \leftarrow \text{mem}_{31+8 \cdot \text{byte}..8 \cdot \text{byte}} \mathbin{||} \text{CPR}[1, ft_{4..1} \mathbin{||} 0]_{31..0} \\
& \text{endif}
\end{align*}
\]
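The byte-lane arithmetic in the LWC1 Operation is easy to misread, so here is a minimal Python sketch of it. It assumes ReverseEndian is 0. It also assumes the loaded doubleword `mem` holds memory byte 0 in bits 7..0 on a little-endian system and in bits 63..56 on a big-endian one. `select_word` is a hypothetical name.

```python
def select_word(mem: int, vaddr: int, big_endian_cpu: bool) -> int:
    """Return mem_{31+8*byte..8*byte}, the word LWC1 takes from doubleword mem.

    byte = vAddr_{2..0} xor (BigEndianCPU || 0^2); ReverseEndian is assumed 0.
    """
    byte = (vaddr & 0b111) ^ (0b100 if big_endian_cpu else 0)
    return (mem >> (8 * byte)) & 0xFFFF_FFFF
```

Take memory bytes 0x00 through 0x07 on a big-endian CPU. Loading the word at offset 0 computes byte = 0 xor 4 = 4, which selects mem<sub>63..32</sub>, so the result is 0x00010203.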

**Exceptions:**

- Coprocessor unusable
- TLB refill exception
- TLB invalid exception
- Bus error exception
- Address error exception

**MFC1 Move From FPU (Coprocessor 1)**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>MF (00000)</td>
<td>rt</td>
<td>fs</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>
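The field layout in the table can be turned directly into an encoder. This minimal sketch uses only the values shown in the table: COP1 = 010001, MF = 00000, and zero in bits 10..0. `encode_mfc1` is a hypothetical helper, not part of any assembler API.

```python
COP1 = 0b010001  # primary opcode, bits 31..26
MF = 0b00000     # bits 25..21 for MFC1


def encode_mfc1(rt: int, fs: int) -> int:
    """Assemble MFC1 rt, fs; bits 10..0 are zero."""
    for name, value in (("rt", rt), ("fs", fs)):
        if not 0 <= value <= 31:
            raise ValueError(f"{name} must be in 0..31")
    return (COP1 << 26) | (MF << 21) | (rt << 16) | (fs << 11)
```

For example, `encode_mfc1(2, 4)` (MFC1 r2, f4) yields 0x44022000.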

**Format:**
MFC1 rt, fs

**Description:**
The contents of register *fs* from the floating-point coprocessor are loaded into processor register *rt*; the 32-bit value is sign-extended to 64 bits.

The contents of register *rt* are undefined for time *T* of the instruction immediately following this load instruction.

The FR bit of the Status register specifies whether all 32 registers of the R4600 are addressable. If FR equals zero, MFC1 reads either the low half (even *fs*) or the high half (odd *fs*) of one of the 16 even floating-point registers. If FR equals one, MFC1 reads the low 32 bits of any even or odd floating-point register.

**Operation:**

\[
\begin{align*}
T: \quad & \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{data} \leftarrow \text{CPR}[1, fs] \\
& \text{else if } fs_0 = 0 \text{ then} \\
& \quad \text{data} \leftarrow \text{CPR}[1, fs_{4..1} \mathbin{||} 0]_{31..0} \\
& \text{else} \\
& \quad \text{data} \leftarrow \text{CPR}[1, fs_{4..1} \mathbin{||} 0]_{63..32} \\
& \text{endif} \\
T+1: \quad & \text{GPR}[rt] \leftarrow (\text{data}_{31})^{32} \mathbin{||} \text{data}
\end{align*}
\]
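The read side of the Operation, including the final sign extension into the 64-bit GPR, looks like this in Python. It is a sketch only. `fpr` is a hypothetical list of 32 unsigned 64-bit register images. With FR = 1 the sketch takes the low 32 bits of CPR[1, fs], as the description states.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def mfc1_value(fpr, fs: int, fr: int) -> int:
    """64-bit value MFC1 writes to GPR[rt] (fpr: 32 unsigned 64-bit images)."""
    if fr == 1:
        data = fpr[fs] & MASK32                    # low word of CPR[1, fs]
    elif (fs & 1) == 0:
        data = fpr[fs & 0b11110] & MASK32          # CPR[1, fs_{4..1} || 0]_{31..0}
    else:
        data = (fpr[fs & 0b11110] >> 32) & MASK32  # CPR[1, fs_{4..1} || 0]_{63..32}
    if data & 0x8000_0000:                         # (data_31)^32 || data
        data |= 0xFFFF_FFFF_0000_0000
    return data & MASK64
```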

**Exceptions:**
Coprocessor unusable exception

**MOV.fmt Floating-Point Move**

**Format:**
MOV.fmt fd, fs

**Description:**
The contents of the FPU register specified by fs are interpreted in the specified format and are copied into the FPU register specified by fd. The move operation is non-arithmetic; no IEEE 754 exceptions occur as a result of the instruction.
This instruction is valid only for single- or double-precision floating-point formats.
The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

**Operation:**

T: StoreFPR(fd, fmt, ValueFPR(fs, fmt))

**Exceptions:**
Coprocessor unusable exception
Floating-Point exception

**Coprocessor Exceptions:**
Unimplemented operation exception

**MTC1 Move To FPU (Coprocessor 1)**

**Format:**
MTC1 rt, fs

**Description:**
The contents of register rt are loaded into the FPU general register at location fs.
The contents of floating-point register fs are undefined for time T of the instruction immediately following this load instruction.
The FR bit of the Status register specifies whether all 32 registers of the R4600 are addressable. If FR equals zero, MTC1 loads either the low half (even fs) or the high half (odd fs) of one of the 16 even floating-point registers. If FR equals one, MTC1 loads the low 32 bits of any even or odd floating-point register.

**Operation:**

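\[
\begin{align*}
T: \quad & \text{data} \leftarrow \text{GPR}[rt]_{31..0} \\
T+1: \quad & \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{CPR}[1, fs] \leftarrow \text{undefined}^{32} \mathbin{||} \text{data} \\
& \text{else if } fs_0 = 0 \text{ then} \\
& \quad \text{CPR}[1, fs_{4..1} \mathbin{||} 0] \leftarrow \text{CPR}[1, fs_{4..1} \mathbin{||} 0]_{63..32} \mathbin{||} \text{data} \\
& \text{else} \\
& \quad \text{CPR}[1, fs_{4..1} \mathbin{||} 0] \leftarrow \text{data} \mathbin{||} \text{CPR}[1, fs_{4..1} \mathbin{||} 0]_{31..0} \\
& \text{endif}
\end{align*}
\]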
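The matching write-side merge can be sketched the same way. As before, `fpr` is a hypothetical list of 32 unsigned 64-bit images. The FR = 1 case leaves the upper word undefined, so this model simply writes zeros there. Real hardware makes no such guarantee.

```python
MASK32 = 0xFFFF_FFFF
HIGH32 = 0xFFFF_FFFF_0000_0000


def mtc1_write(fpr, fs: int, gpr_value: int, fr: int) -> None:
    """Apply MTC1 to the register image list fpr in place."""
    data = gpr_value & MASK32                      # GPR[rt]_{31..0}
    if fr == 1:
        fpr[fs] = data                             # upper word undefined; zero chosen here
    else:
        even = fs & 0b11110                        # fs_{4..1} || 0
        if (fs & 1) == 0:
            fpr[even] = (fpr[even] & HIGH32) | data           # replace low half
        else:
            fpr[even] = (data << 32) | (fpr[even] & MASK32)   # replace high half
```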

**Exceptions:**
Coprocessor unusable exception

**MUL.fmt Floating-Point Multiply**

**Format:**

MUL.fmt fd, fs, ft

**Description:**

The contents of the floating-point registers specified by fs and ft are interpreted in the specified format and arithmetically multiplied. The result is rounded as if calculated to infinite precision and then rounded to the specified format, according to the current rounding mode. The result is placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-point formats.

The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

**Operation:**

\[
T: \text{StoreFPR}(fd, fmt, \text{ValueFPR}(fs, fmt) \times \text{ValueFPR}(ft, fmt))
\]

**Exceptions:**

- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**

- Unimplemented operation exception
- Invalid operation exception
- Inexact exception
- Overflow exception
- Underflow exception

**NEG.fmt Floating-Point Negate**

### Format:

NEG.fmt fd, fs

### Description:

The contents of the FPU register specified by \( fs \) are interpreted in the specified format and the arithmetic negation is taken (polarity of the sign-bit is changed). The result is placed in the FPU register specified by \( fd \).

The negate operation is arithmetic; a NaN operand signals an Invalid operation exception.

This instruction is valid only for single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the \( FR \) bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the \( FR \) bit in the Status register equals one, both even and odd register numbers are valid.
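NEG.fmt only flips the sign bit, so it can be modeled directly on the IEEE 754 bit pattern. The sketch below covers the double-precision case using Python's `struct` module. `InvalidOperation` is a hypothetical stand-in for the FPU exception, and the model always behaves as if that exception were enabled.

```python
import math
import struct


class InvalidOperation(Exception):
    """Hypothetical stand-in for the FPU Invalid operation exception."""


def neg_d(x: float) -> float:
    """NEG.D sketch: flip bit 63 of the double; NaN operands are invalid."""
    if math.isnan(x):
        raise InvalidOperation("NaN operand to NEG.fmt")
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return result
```

Working on the bit pattern rather than computing `-x` makes it explicit that zeros and infinities only change sign.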

### Operation:

\[
T: \text{StoreFPR}(fd, fmt, \text{Negate(ValueFPR}(fs, fmt)))
\]

### Exceptions:

- Coprocessor unusable exception
- Floating-Point exception

### Coprocessor Exceptions:

- Unimplemented operation exception
- Invalid operation exception

**ROUND.L.fmt**  
**Floating-Point Round to Long Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>0 (00000)</td>
<td>fs</td>
<td>fd</td>
<td>ROUND.L (001000)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

ROUND.L.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the long fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to nearest/even (RM = 0).

This instruction is valid only for conversion from single- or double-precision floating-point formats.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of \(-2^{63}\) to \(2^{63}-1\), an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and \(2^{63}-1\) is returned.

**Operation:**

T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

**Exceptions:**

Coprocessor unusable exception  
Floating-Point exception

**Coprocessor Exceptions:**

Invalid operation exception  
Unimplemented operation exception  
Inexact exception  
Overflow exception

**ROUND.W.fmt**  
**Floating-Point Round to Single Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>0 (00000)</td>
<td>fs</td>
<td>fd</td>
<td>ROUND.W (001100)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**
ROUND.W.fmt fd, fs

**Description:**
The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the single fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round to the nearest/even (RM = 0).

This instruction is valid only for conversion from single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of $-2^{31}$ to $2^{31} - 1$, an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and $2^{31} - 1$ is returned.

**Operation:**

\[
T: \text{StoreFPR}(fd, W, \text{ConvertFmt(ValueFPR(fs, fmt), fmt, W)})
\]

**Exceptions:**
- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**
- Invalid operation exception
- Unimplemented operation exception
- Inexact exception
- Overflow exception

**SDC1 Store Doubleword from FPU (Coprocessor 1)**

**Format:**  
SDC1 ft, offset(base)

**Description:**  
The 16-bit offset is sign-extended and added to the contents of general register base to form an unsigned effective address.

When FR = 0, the contents of registers ft and ft+1 from the floating-point coprocessor are stored at the memory location specified by the effective address. This instruction is not valid, and is undefined, when the least significant bit of ft is non-zero.

When FR = 1, the contents of the 64-bit register ft are stored to the doubleword at the memory location specified by the effective address.

The FR bit of the Status register (SR26) specifies whether all 32 registers of the R4600 are addressable. When FR equals zero, this instruction is not defined if the least significant bit of ft is non-zero. If FR equals one, ft may specify either odd or even registers.

If any of the three least-significant bits of the effective address are non-zero, an address error exception takes place.

**Operation:**

```
T:  vAddr ← ((offset_{15})^{48} || offset_{15..0}) + GPR[base]
    (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
    if SR_{26} = 1 then
        data ← CPR[1, ft]
    else
        data ← CPR[1, ft_{4..1} || 0]
    endif
    StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
```

**Exceptions:**  
- Coprocessor unusable
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**SQRT.fmt Floating-Point Square Root**

| 31..26 | 25..21 | 20..16 | 15..11 | 10..6 | 5..0 |
| --- | --- | --- | --- | --- | --- |
| COP1 (010001) | fmt | 0 (00000) | fs | fd | SQRT (000100) |
| 6 | 5 | 5 | 5 | 5 | 6 |

**Format:**

SQRT.fmt fd, fs

**Description:**
The contents of the floating-point register specified by fs are interpreted in the specified format and the positive arithmetic square root is taken. The result is rounded as if calculated to infinite precision and then rounded to the specified format, according to the current rounding mode. If the value of fs corresponds to –0, the result will be –0. The result is placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-point formats.

The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.
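A short Python sketch can check the two special cases in the description: the sign of a zero operand is preserved (–0 in, –0 out), and other negative operands are invalid. It assumes IEEE 754 doubles and the host's default round-to-nearest mode. `InvalidOperation` is a hypothetical stand-in for the FPU exception.

```python
import math


class InvalidOperation(Exception):
    """Hypothetical stand-in for the FPU Invalid operation exception."""


def sqrt_d(x: float) -> float:
    """SQRT.D sketch: -0 gives -0, other negative operands are invalid."""
    if x < 0.0:            # False for -0.0 (and for NaN)
        raise InvalidOperation("square root of a negative operand")
    return math.sqrt(x)    # IEEE 754 sqrt keeps the sign of a zero operand
```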

**Operation:**

\[
T: \text{StoreFPR}(fd, fmt, \text{SquareRoot}(\text{ValueFPR}(fs, fmt)))
\]

**Exceptions:**
- Coprocessor unusable exception
- Floating-Point exception

**Coprocessor Exceptions:**
- Unimplemented operation exception
- Invalid operation exception
- Inexact exception

**SUB.fmt Floating-Point Subtract**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>SUB (000001)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>
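To assemble a SUB.fmt word from this layout, shift each field into place. The function code 000001 comes from the function-field encoding in Figure B.3 of this appendix. The fmt values 16 (S) and 17 (D) are the standard MIPS codes; they are not shown on this page and are an assumption here. `encode_sub` is a hypothetical helper.

```python
COP1 = 0b010001
FMT_S, FMT_D = 16, 17   # assumed standard MIPS fmt codes (not shown on this page)
FUNC_SUB = 0b000001     # SUB entry of the function-field encoding


def encode_sub(fmt: int, fd: int, fs: int, ft: int) -> int:
    """Assemble SUB.fmt fd, fs, ft using the field layout in the table above."""
    for name, reg in (("fd", fd), ("fs", fs), ("ft", ft)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must be in 0..31")
    return ((COP1 << 26) | (fmt << 21) | (ft << 16)
            | (fs << 11) | (fd << 6) | FUNC_SUB)
```

For example, `encode_sub(FMT_S, 0, 2, 4)` (SUB.S f0, f2, f4) gives 0x46041001.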

Format:
SUB.fmt fd, fs, ft

Description:
The contents of the floating-point registers specified by fs and ft are interpreted in the specified format and arithmetically subtracted. The result is rounded as if calculated to infinite precision and then rounded to the specified format, according to the current rounding mode. The result is placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-point formats.

The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

Operation:

T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) – ValueFPR(ft, fmt))

Exceptions:
Coprocessor unusable exception
Floating-Point exception

Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Inexact exception
Overflow exception
Underflow exception

**SWC1 Store Word from FPU (Coprocessor 1)**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SWC1 (111001)</td>
<td>base</td>
<td>ft</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:**

SWC1 ft, offset(base)

**Description:**

The 16-bit *offset* is sign-extended and added to the contents of general register *base* to form an unsigned effective address. The contents of register *ft* from the floating-point coprocessor are stored at the memory location specified by the effective address.

The *FR* bit of the *Status* register specifies whether all 64-bit floating-point registers are addressable.

- If FR = 0, SWC1 stores either the high or low half of the 16 even floating-point registers.
- If FR = 1, SWC1 stores the low 32 bits of both even and odd floating-point registers.

If either of the two least-significant bits of the effective address is non-zero, an address error exception occurs.

**Operation:**

\[
\begin{align*}
T: \quad & \text{vAddr} \leftarrow ((\text{offset}_{15})^{48} \mathbin{||} \text{offset}_{15..0}) + \text{GPR}[\text{base}] \\
& (\text{pAddr}, \text{uncached}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}) \\
& \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..3} \mathbin{||} (\text{pAddr}_{2..0} \ \text{xor}\ (\text{ReverseEndian} \mathbin{||} 0^2)) \\
& \text{byte} \leftarrow \text{vAddr}_{2..0} \ \text{xor}\ (\text{BigEndianCPU} \mathbin{||} 0^2) \\
& \text{if SR}_{26} = 1 \text{ then} \\
& \quad \text{data} \leftarrow \text{CPR}[1, ft]_{63-8 \cdot \text{byte}..0} \mathbin{||} 0^{8 \cdot \text{byte}} \\
& \text{else if } ft_0 = 0 \text{ then} \\
& \quad \text{data} \leftarrow \text{CPR}[1, ft_{4..1} \mathbin{||} 0]_{63-8 \cdot \text{byte}..0} \mathbin{||} 0^{8 \cdot \text{byte}} \\
& \text{else} \\
& \quad \text{data} \leftarrow 0^{32-8 \cdot \text{byte}} \mathbin{||} \text{CPR}[1, ft_{4..1} \mathbin{||} 0]_{63..32-8 \cdot \text{byte}} \\
& \text{endif} \\
& \text{StoreMemory}(\text{uncached}, \text{WORD}, \text{data}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]
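The data word handed to StoreMemory can be modeled as the selected 32-bit register half, shifted into its byte lane. This minimal Python sketch assumes ReverseEndian is 0 and uses a hypothetical `fpr` list of 64-bit register images. Bits outside the addressed word are shown as zero here, since StoreMemory with access type WORD writes only that word.

```python
MASK32 = 0xFFFF_FFFF


def swc1_data(fpr, ft: int, vaddr: int, fr: int, big_endian_cpu: bool) -> int:
    """64-bit data passed to StoreMemory(WORD) by SWC1 (ReverseEndian = 0)."""
    byte = (vaddr & 0b111) ^ (0b100 if big_endian_cpu else 0)
    if fr == 1:
        word = fpr[ft] & MASK32                     # low word of CPR[1, ft]
    elif (ft & 1) == 0:
        word = fpr[ft & 0b11110] & MASK32           # low half of the even register
    else:
        word = (fpr[ft & 0b11110] >> 32) & MASK32   # high half of the even register
    return word << (8 * byte)                       # move the word into its byte lane
```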

**Exceptions:**

- Coprocessor unusable
- TLB refill exception
- TLB invalid exception
- TLB modification exception
- Bus error exception
- Address error exception

**TRUNC.L.fmt**  
**Floating-Point Truncate to Long Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>0 (00000)</td>
<td>fs</td>
<td>fd</td>
<td>TRUNC.L (001001)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

TRUNC.L.fmt fd, fs

**Description:**

The contents of the floating-point register specified by fs are interpreted in the specified source format, fmt, and arithmetically converted to the long fixed-point format. The result is placed in the floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round toward zero (RM = 1).

This instruction is valid only for conversion from single- or double-precision floating-point formats.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of \(-2^{63}\) to \(2^{63}-1\), an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and \(2^{63}-1\) is returned.

**Operation:**

\[
T: \text{StoreFPR}(fd, L, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, L))
\]

**Exceptions:**

Coprocessor unusable exception
Floating-Point exception

**Coprocessor Exceptions:**

Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception

**TRUNC.W.fmt**  
**Floating-Point Truncate to Single Fixed-Point Format**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 (010001)</td>
<td>fmt</td>
<td>0 (00000)</td>
<td>fs</td>
<td>fd</td>
<td>TRUNC.W (001101)</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

Format: TRUNC.W.fmt fd, fs

Description:
The contents of the FPU register specified by fs are interpreted in the specified source format fmt and arithmetically converted to the single fixed-point format. The result is placed in the FPU register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is rounded as if the current rounding mode is round toward zero (RM = 1).

This instruction is valid only for conversion from single- or double-precision floating-point formats. The operation is not defined if bit 0 of any register specification is set and the FR bit in the Status register equals zero, since the register numbers specify an even-odd pair of adjacent coprocessor general registers. When the FR bit in the Status register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded integer result is outside of $-2^{31}$ to $2^{31}-1$, an Invalid operation exception is raised. If Invalid operation is not enabled, then no exception is taken and $2^{31}-1$ is returned.

Operation:

```
T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))
```

Exceptions:
Coprocessor unusable exception
Floating-Point exception

Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
### FPU Instruction Opcode Bit Encoding

**Figure B.3** Bit Encoding for FPU Instructions

**opcode** (bits 31..29 select the row, bits 28..26 the column; only rows that contain FPU opcodes are shown)

| 31..29 \ 28..26 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | | COP1 | | | | | | |
| 6 | | LWC1 | | | | LDC1 | | |
| 7 | | SWC1 | | | | SDC1 | | |

**br** (rt field of the BC instructions; bits 20..19 select the row, bits 18..16 the column)

| 20..19 \ 18..16 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | BCF | BCT | BCFL | BCTL | γ | γ | γ | γ |
| 1 | γ | γ | γ | γ | γ | γ | γ | γ |
| 2 | γ | γ | γ | γ | γ | γ | γ | γ |
| 3 | γ | γ | γ | γ | γ | γ | γ | γ |

**function** (bits 5..3 select the row, bits 2..0 the column)

| 5..3 \ 2..0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ADD | SUB | MUL | DIV | SQRT | ABS | MOV | NEG |
| 1 | ROUND.L | TRUNC.L | CEIL.L | FLOOR.L | ROUND.W | TRUNC.W | CEIL.W | FLOOR.W |
| 2 | δ | δ | δ | δ | δ | δ | δ | δ |
| 3 | δ | δ | δ | δ | δ | δ | δ | δ |
| 4 | CVT.S | CVT.D | δ | δ | CVT.W | CVT.L | δ | δ |
| 5 | δ | δ | δ | δ | δ | δ | δ | δ |
| 6 | C.F | C.UN | C.EQ | C.UEQ | C.OLT | C.ULT | C.OLE | C.ULE |
| 7 | C.SF | C.NGLE | C.SEQ | C.NGL | C.LT | C.NGE | C.LE | C.NGT |

Key to Table:

- γ Operation codes marked with a gamma cause a reserved instruction exception. They are reserved for future versions of the architecture.
- δ Operation codes marked with a delta cause unimplemented operation exceptions in all current implementations and are reserved for future versions of the architecture.
- η Valid when 64-bit operand opcodes are enabled.

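Decoding the function field follows directly from the function table: bits 5..3 pick the row and bits 2..0 pick the column. The sketch below covers only rows 0 and 1. `decode_function` and its error for the other rows are illustrative choices, not part of the manual.

```python
# Rows 0 and 1 of the function-field table: bits 5..3 select the row,
# bits 2..0 the column.
FUNCTION_ROWS = {
    0: ["ADD", "SUB", "MUL", "DIV", "SQRT", "ABS", "MOV", "NEG"],
    1: ["ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L",
        "ROUND.W", "TRUNC.W", "CEIL.W", "FLOOR.W"],
}


def decode_function(word: int) -> str:
    """Return the mnemonic stem for the function field (bits 5..0) of word."""
    func = word & 0x3F
    row, col = func >> 3, func & 0b111
    if row not in FUNCTION_ROWS:
        raise KeyError(f"function code {func:06b} is outside rows 0 and 1")
    return FUNCTION_ROWS[row][col]
```

For example, the function field 001011 decodes to FLOOR.L, which matches the FLOOR.L.fmt encoding.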

### Introduction

This appendix lists cycle counts and caveats for R4600/R4700 cache operation timing.

### Caveats About Cache Operations

1. All cycle counts are in processor cycles.
2. All cache ops have lower priority than cache misses, write backs and external requests. If the write back buffer contains unwritten data when a cache op is executed, the write back buffer will be retired before the cache op is begun.
   If an instruction cache miss occurs at the same time as a cache op is executed, the instruction cache miss will be handled first. Cache ops are mutually exclusive with respect to data cache misses. External requests will be completed before beginning a cache op.
3. For all data cache ops the cache op machine waits for the store buffer and response buffer to empty before beginning the cache op. This can add 3 cycles to any data cache op if there is data in the response buffer or store buffer. The response buffer contains data from the last data cache miss that has not yet been written to the data cache. The store buffer contains delayed store data waiting to be written to the data cache.
4. Cache ops of the form xxxx_Writeback_xxxx may perform a write back which will fill the write back buffer. Write backs can affect subsequent cache ops, since they will stall until the write back buffer is written back to memory. Cache ops which fill the write back buffer are noted as (writeback) in the following tables.
5. All cycle counts are best case assuming no interference from the mechanisms described above.

### Cache Operations Tables

Table C.1 and Table C.2 list the timing of the data cache and instruction cache operations, respectively. A detailed explanation of the Fill_I equation follows Table C.2.
<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Number of Cycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Index_Writeback_Invalidate_D</td>
<td>10 cycles if the cache line is clean. 12 cycles if the cache line is dirty (Writeback).</td>
</tr>
<tr>
<td>1</td>
<td>Index_Load_Tag_D</td>
<td>7 cycles.</td>
</tr>
<tr>
<td>2</td>
<td>Index_Store_Tag_D</td>
<td>8 cycles.</td>
</tr>
<tr>
<td>3</td>
<td>Create_Dirty_Exclusive_D</td>
<td>10 cycles for a cache hit. 13 cycles for a cache miss if the cache line is clean. 15 cycles for a cache miss if the cache line is dirty (Writeback).</td>
</tr>
<tr>
<td>4</td>
<td>Hit_Invalidate_D</td>
<td>7 cycles for a cache miss. 9 cycles for a cache hit.</td>
</tr>
<tr>
<td>5</td>
<td>Hit_Writeback_Invalidate_D</td>
<td>7 cycles for a cache miss. 12 cycles for a cache hit if the cache line is clean. 14 cycles for a cache hit if the cache line is dirty (Writeback).</td>
</tr>
<tr>
<td>7</td>
<td>Hit_Writeback_D</td>
<td>7 cycles for a cache miss. 10 cycles for a cache hit if the cache line is clean. 14 cycles for a cache hit if the cache line is dirty (Writeback).</td>
</tr>
</tbody>
</table>

**Note:**
1. Code number corresponds to the code column of the CACHE instruction in Appendix A.
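
To make the best-case figures in Table C.1 easier to apply, the following C sketch encodes them as a lookup keyed by the CACHE op code. The function name and its boolean parameters are hypothetical, chosen for illustration only. The optional 3-cycle penalty follows caveat 3 (store or response buffer not yet empty); the other caveats (pending write backs, instruction cache misses, external requests) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Best-case cycle counts from Table C.1, keyed by the CACHE op code.
 * dcache_op_cycles() is a hypothetical helper, not an IDT-provided API.
 *   hit          - the address matched a valid tag (meaningful for Hit_* ops
 *                  and Create_Dirty_Exclusive_D)
 *   dirty        - the affected cache line is dirty
 *   buffers_busy - caveat 3: the store or response buffer still holds data,
 *                  which can add 3 cycles before the op begins
 * Returns -1 for code 6, which has no data cache entry in Table C.1. */
int dcache_op_cycles(int code, bool hit, bool dirty, bool buffers_busy)
{
    int cycles;

    assert(code >= 0 && code <= 7); /* op codes in Table C.1 span 0..7 */

    switch (code) {
    case 0: /* Index_Writeback_Invalidate_D */
        cycles = dirty ? 12 : 10;
        break;
    case 1: /* Index_Load_Tag_D */
        cycles = 7;
        break;
    case 2: /* Index_Store_Tag_D */
        cycles = 8;
        break;
    case 3: /* Create_Dirty_Exclusive_D: on a miss, dirty refers to the
               line being replaced */
        cycles = hit ? 10 : (dirty ? 15 : 13);
        break;
    case 4: /* Hit_Invalidate_D */
        cycles = hit ? 9 : 7;
        break;
    case 5: /* Hit_Writeback_Invalidate_D */
        cycles = hit ? (dirty ? 14 : 12) : 7;
        break;
    case 7: /* Hit_Writeback_D */
        cycles = hit ? (dirty ? 14 : 10) : 7;
        break;
    default: /* code 6: not listed for the data cache */
        return -1;
    }
    return buffers_busy ? cycles + 3 : cycles;
}
```

For instance, a Hit_Writeback_Invalidate_D that hits a dirty line while the store buffer is still draining would be estimated at 14 + 3 = 17 cycles, via `dcache_op_cycles(5, true, true, true)`.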
Details of the Fill_I Equation

These are the definitions of the variables in the Fill_I equation in Table C.2:

**SYSDIV:** Number of processor cycles per system cycle; ranges from 2 to 8.

**ML:** Number of system cycles of memory latency, defined as the number of cycles the SysAD bus is driven by the external agent before the first double word of data appears.

**D:** Number of system cycles required to return the block of data, defined as the number of cycles beginning when the first double word of data appears on the SysAD bus and ending when the last double word of data appears on the SysAD bus, inclusive.

---

<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Number of Cycles</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Index_Invalidate_I</td>
<td>7 cycles.</td>
</tr>
<tr>
<td>1</td>
<td>Index_Load_Tag_I</td>
<td>7 cycles.</td>
</tr>
<tr>
<td>2</td>
<td>Index_Store_Tag_I</td>
<td>8 cycles.</td>
</tr>
<tr>
<td>3</td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr>
<td>4</td>
<td>Hit_Invalidate_I</td>
<td>7 cycles for a cache miss. 9 cycles for a cache hit.</td>
</tr>
<tr>
<td>5</td>
<td>Fill_I</td>
<td>The cycle count must be calculated from the system's response to a memory access, because Fill_I causes an instruction cache refill from memory. This equation yields the number of processor cycles for a Fill_I cache op: \(20 + 10 + [0 - (SYSDIV - 1)] + (2 \times SYSDIV) + (ML \times SYSDIV) + (D \times SYSDIV)\)</td>
</tr>
<tr>
<td>6</td>
<td>Hit_Writeback_I</td>
<td>7 cycles for a cache miss. 20 cycles for a cache hit (Writeback).</td>
</tr>
</tbody>
</table>

**Note:**
1. Code number corresponds to the code column of the CACHE instruction in Appendix A.
2. For definitions and discussion of the Fill_I equation variables refer to the subsection “Details of the Fill_I Equation,” which follows this table.
3. The term \(0 - (SYSDIV - 1)\) denotes a range rather than a subtraction: it has a value between 0 and \(SYSDIV - 1\), depending on the alignment of the execution of the cache op with the system clock.

---

Table C.2  Primary Instruction Cache Operations
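
As a worked illustration of the Fill_I equation, the C sketch below evaluates 20 + 10 + (2 + ML + D) × SYSDIV and then adds the clock-alignment term at both ends of its range (0 and SYSDIV - 1), giving a minimum and a maximum processor-cycle count. It assumes integer inputs for SYSDIV, ML, and D as defined in "Details of the Fill_I Equation"; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper bounding the Fill_I cycle count (Table C.2).
 *   sysdiv - processor cycles per system cycle (2..8)
 *   ml     - memory latency, in system cycles
 *   d      - system cycles to return the block of data (inclusive count)
 * The alignment term [0 - (SYSDIV - 1)] lies anywhere from 0 to SYSDIV - 1,
 * so both ends of that range are reported. */
struct fill_i_bounds {
    int min_cycles;
    int max_cycles;
};

struct fill_i_bounds fill_i_cycles(int sysdiv, int ml, int d)
{
    assert(sysdiv >= 2 && sysdiv <= 8);
    assert(ml >= 0 && d >= 1);

    /* Every term of the equation except the alignment term. */
    int fixed = 20 + 10 + (2 * sysdiv) + (ml * sysdiv) + (d * sysdiv);

    struct fill_i_bounds b = { fixed, fixed + (sysdiv - 1) };
    return b;
}
```

For example, with SYSDIV = 2, ML = 5, and D = 4 (illustrative values, not taken from any particular system), the function reports a range of 52 to 53 processor cycles.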

The R4600/R4700 provides a means to reduce the amount of power consumed by the internal core when the CPU would otherwise not be performing any useful operations. This is known as “Standby Mode” and is discussed in this appendix.

**Entering Standby Mode**

To enter Standby Mode, execute the WAIT instruction. When the WAIT instruction finishes the W pipe-stage, if the **SysAD** bus is idle, the internal clocks shut down, freezing the pipeline. The PLL, the internal timer, some of the input pin clocks (**Int[5:0]**, **NMI***, **ExtRqst***, **Reset***, and **ColdReset***), and the output clocks (**TClock[1:0]**, **RClock[1:0]**, **SyncOut**, **ModeClock**, and **MasterOut**) continue to run. If the conditions are not met when the WAIT instruction finishes the W pipe-stage (that is, the **SysAD** bus is not idle), the WAIT is treated as a NOP.

Once the CPU is in Standby Mode, any interrupt, including **ExtRqst*** or **Reset***, will cause the CPU to exit Standby Mode.
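
The entry and exit rules above amount to a small state machine. The C sketch below models them, assuming the only inputs are a WAIT instruction finishing the W pipe-stage, whether the SysAD bus is idle at that moment, and the assertion of an interrupt (including ExtRqst* or Reset*). The enum and function names are hypothetical and do not correspond to any IDT software interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the Standby Mode rules; not an IDT interface. */
enum cpu_state { CPU_RUNNING, CPU_STANDBY };
enum cpu_event {
    EV_WAIT_AT_W, /* a WAIT instruction finishes the W pipe-stage */
    EV_INTERRUPT  /* any interrupt, including ExtRqst* or Reset*  */
};

/* Returns the next state; sysad_idle only matters for EV_WAIT_AT_W. */
enum cpu_state standby_step(enum cpu_state s, enum cpu_event ev, bool sysad_idle)
{
    assert(ev == EV_WAIT_AT_W || ev == EV_INTERRUPT);

    if (ev == EV_INTERRUPT)
        return CPU_RUNNING;   /* any interrupt exits Standby Mode */

    if (s == CPU_RUNNING && sysad_idle)
        return CPU_STANDBY;   /* internal clocks stop, pipeline freezes */

    return s;                 /* otherwise (e.g. SysAD busy) WAIT acts as a NOP */
}
```

On the processor itself, software simply issues the WAIT instruction, typically from an idle loop; the model only captures when that instruction actually enters Standby Mode and what brings the CPU back out.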

This appendix identifies the R4600 and R4700 Coprocessor 0 hazards. In Table E.1, the number of instructions required between instruction A (which places a value in a CP0 register) and instruction B (which uses the same register as a source) is computed using the following formula:

\[(\text{destination stage of } A) - (\text{source stage of } B) - 1\]

<table>
<thead>
<tr>
<th>Operation</th>
<th>Source Name</th>
<th>Source Stage</th>
<th>Destination Name</th>
<th>Destination Stage</th>
</tr>
</thead>
<tbody>
<tr>
<td>MTC0</td>
<td>gpr rt</td>
<td>2(A)</td>
<td>cpr rd</td>
<td>4(W)α</td>
</tr>
<tr>
<td>MFC0</td>
<td>cpr rd</td>
<td>2(A)</td>
<td>gpr rt</td>
<td>4(W)α</td>
</tr>
<tr>
<td>TLBR</td>
<td>Index, TLB</td>
<td>2(A)</td>
<td>PageMask, EntryHi, EntryLo0, EntryLo1</td>
<td>4(W)</td>
</tr>
<tr>
<td>TLBWI</td>
<td>Index or Random, PageMask, EntryHi, EntryLo0, EntryLo1</td>
<td>2(A)</td>
<td>TLB</td>
<td>3(D)β</td>
</tr>
<tr>
<td>TLBWR</td>
<td>Index or Random, PageMask, EntryHi, EntryLo0, EntryLo1</td>
<td>2(A)</td>
<td>TLB</td>
<td>3(D)β</td>
</tr>
<tr>
<td>TLBP</td>
<td>PageMask, EntryHi</td>
<td>2(A)</td>
<td>Index</td>
<td>4(W)</td>
</tr>
<tr>
<td>ERET</td>
<td>EPC or ErrorEPC, Status.ERL</td>
<td>2(A)</td>
<td>Status.EXL, Status.ERL</td>
<td>4(W)γ</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>LLbit</td>
<td>4(W)</td>
</tr>
<tr>
<td>CACHE Index Load Tag</td>
<td></td>
<td></td>
<td>TagLo, TagHi, ECC</td>
<td>3(D)</td>
</tr>
<tr>
<td>CACHE Index Store Tag</td>
<td>TagLo, TagHi, ECC</td>
<td>3(D)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Instruction fetch</td>
<td>EntryHi.ASID, Status.KSU, Status.RE, Config.K0C, TLB</td>
<td>0(I)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Instruction fetch</td>
<td>Status.ERL, Status.EXL</td>
<td>0(I)γ</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Instruction fetch exception</td>
<td></td>
<td></td>
<td>EPC, Status, Cause</td>
<td>4(W)</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td>BadVAddr, Context, EntryHi</td>
<td>4(W)</td>
</tr>
<tr>
<td>Coprocessor usable test</td>
<td>Status.CU, Status.KSU, Status.EXL, Status.ERL</td>
<td>1(R)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Interrupt</td>
<td>Cause.IP, Status.IM, Status.IE, Status.EXL, Status.ERL</td>
<td>2(A)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Load/Store</td>
<td>EntryHi.ASID, Status.KSU, Status.RE, Status.EXL, Status.ERL, Config.K0C, TLB</td>
<td>2(A)</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Load/Store exception</td>
<td></td>
<td></td>
<td>EPC, Status, Cause, BadVAddr, Context, EntryHi</td>
<td>4(W)</td>
</tr>
</tbody>
</table>

Notes:
- α There must be at least one instruction between an MTC0 and an MFC0.
- β TLBW instructions cause a one-cycle slip.
- γ Instruction fetches following an ERET see the change in EXL or ERL in stage 2 (A) of the ERET, in anticipation of the ERET completing. If the ERET does not complete, these instructions are killed before they commit changes in state other than noted by d.
- The pipe stage corresponding to each stage number is given in parentheses.
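
As a worked illustration of the hazard formula, the C sketch below encodes the stage numbers used in Table E.1 and computes the required spacing between a writing instruction A and a reading instruction B. The enum and function names are hypothetical, and reporting a zero or negative result as 0 (no intervening instructions needed) is this sketch's reading of the formula rather than a statement from the table.

```c
#include <assert.h>

/* Stage numbers as used in Table E.1: 0(I) 1(R) 2(A) 3(D) 4(W). */
enum pipe_stage { STAGE_I = 0, STAGE_R = 1, STAGE_A = 2, STAGE_D = 3, STAGE_W = 4 };

/* Instructions required between A (writes a CP0 value at dest_of_a)
 * and B (reads it at src_of_b):
 *   (destination stage of A) - (source stage of B) - 1
 * A non-positive result is reported as 0; this clamp is an assumption. */
int cp0_hazard_gap(enum pipe_stage dest_of_a, enum pipe_stage src_of_b)
{
    assert(dest_of_a <= STAGE_W && src_of_b <= STAGE_W);

    int gap = (int)dest_of_a - (int)src_of_b - 1;
    return gap > 0 ? gap : 0;
}
```

Applied to the first two rows, an MTC0 (destination 4(W)) followed by an MFC0 (source 2(A)) needs 4 - 2 - 1 = 1 intervening instruction, which agrees with note α. An MTC0 to EntryHi followed by instruction fetches that depend on EntryHi.ASID (source 0(I)) would need 4 - 0 - 1 = 3.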

Certain combinations of instructions are not permitted because the results of executing such combinations are unpredictable in the face of events such as pipeline delays, cache misses, interrupts, and exceptions.

Most hazards result from instructions modifying and reading state in different pipeline stages. Such hazards are defined between pairs of instructions, not on a single instruction in isolation. Other hazards are associated with the restartability of instructions in the presence of exceptions.