Welcome to SecureRISC.org

I created this site to explore unconventional ideas in processor architecture. For decades, these concepts existed only as notes, scribbles, and thoughts. Recently, however, I decided to document some of this material in the hope that it might prove useful someday, even if only as a basis for discussion.

Table of Contents
The Author’s Journey to This Proposal
One Particular Exploration
Block Structured Renaming
RISC-V Proposals

The Author’s Journey to This Proposal

In the 1970s, I was programming PDP-11s (in assembler), PDP-10s (in assembler and Lisp on ITS), and Multics (in PL/1). Later in my career, I worked on a Common Lisp implementation for MIPS processors. Also in the 1970s, I observed colleagues developing computational fluid dynamics codes for the CDC 6600 and, later, the Cray-1. This indirect exposure gave me an appreciation of the Cray-1 approach to processor architecture, including RISC and vectors. Additionally, friends were working on Lisp machines (e.g., the Symbolics 3600), sparking my interest in some of their problem solutions. After this early exposure to Lisp, I have retained an interest in better support for it and similar languages, which feature dynamic typing and garbage collection, but more in the context of high-performance architectures such as the Cray-1.

Since the early 1980s, I have been involved in designing processor architectures. My first experience began while contributing to a compiler (Pastel) and operating system (Amber) for a new processor designed by others. The processor, the S-1 Mark II, was 72 boards of ECL logic, which led some of us to consider creating a simpler processor. We began work on what we, somewhat unimaginatively, called the S-2. This processor was designed to fit on two boards of ECL logic. While I was unfamiliar with the term at the time, the S-2 closely resembled the RISC ideas being developed in academia, except it had exactly one memory operand per instruction (similar to the PDP-10), but was otherwise a simple fixed 32-bit instruction format with a general purpose register file and simple pipelined instructions. Many machines of the era relied on microcode, but the S-2 avoided it.

The S-1 Mark II operating system team was writing a new operating system that incorporated many ideas from Multics alongside novel concepts. Consequently, both the Mark II and the S-2 included Multics-like virtual memory and protection features. In subsequent years, I worked on processor architectures without such features, which the new operating system dominating the scene, Unix (later Linux), did not know how to use. Despite this, I have maintained an interest in reviving those features for future systems.

In 1985, I joined the MIPS compiler team to work on the code generator, but later transitioned to the role of Director of Architecture after the original architect left. As a member of the compiler team, I disagreed with several design choices in the original MIPS ISA, such as the absence of load interlocks, the use of branch delay slots, absolute addresses in JAL, among others. However, it was too late to make changes, and many at MIPS thought these were central features of the ISA. These roles provided me with a close-up view of academic RISC architectures. One of my early tasks as an architect was to define the 64‑bit extension to the existing 32‑bit ISA in the late 1980s. Since then, I have generally believed that 32‑bit ISAs should be retired in favor of 64‑bit ones.

In the late 1990s, I was working at SGI on out-of-order processors and conducted various studies on the available Instruction Level Parallelism (ILP) in programs. These studies assumed infinite parallelism but accounted for latencies based on cache hierarchies and branch prediction. For example, an L1 data-cache hit might have a 2‑cycle latency, while an L1 miss, L2 hit might have a 6‑cycle latency. What fascinated me was that instruction fetch emerged as the limiting factor for ILP in these studies, likely because branch prediction in the 1990s was not particularly advanced. With modern branch prediction techniques, such as TAGE, this may no longer hold as true. However, I still believe that addressing the instruction fetch bottleneck can be valuable. The issue with instruction fetch is that it resembles linked list processing. Even worse than simple list processing, there is parsing and arithmetic at each node to find the next link. Linked list processing is particularly latency-sensitive and is often replaced with array processing in high-performance computing, where possible, because of its performance advantages. I explored ways to make instruction fetch behave more like array processing but didn’t succeed in fully realizing that goal. Instead, I focused on reducing the parsing required at each list node. I refer to this general approach as a block-structured instruction set. One can think of it as replacing the Branch Target Buffer (BTB), which most contemporary processors generate dynamically, with a compiler-generated structure resident in code segments and cached in a new cache. This approach turned out to have many other advantages (e.g., prefetching, line size fills, control flow integrity features, better support for parallel instruction decode, and so on), and I have retained an interest in exploring the potential of block-structured Instruction Set Architectures (ISAs).
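The linked-list analogy can be made concrete with a toy model. The sketch below is only an illustration: the instruction tuples, the descriptor layout (start, count, successor), and the function names are hypothetical and are not SecureRISC's actual formats. It contrasts a fetch loop that must decode every instruction to learn the next PC with one that follows one descriptor per basic block.

```python
# Toy model only: encodings, field names, and functions are hypothetical.

def fetch_conventional(code, pc):
    """Follow the instruction stream one instruction at a time."""
    trace, decodes = [], 0
    while True:
        op, target = code[pc]          # must decode this instruction ...
        decodes += 1
        trace.append(pc)
        if op == "halt":
            return trace, decodes
        # ... before the next PC is known (the "next link" of the list).
        pc = target if op == "jump" else pc + 1


def fetch_blocks(descriptors, block):
    """Follow compiler-generated descriptors, one per basic block."""
    trace, lookups = [], 0
    while block is not None:
        start, count, successor = descriptors[block]
        lookups += 1
        # The whole instruction range of the block is known at once.
        trace.extend(range(start, start + count))
        block = successor
    return trace, lookups


# Two basic blocks: PCs 0..2 (ending in a jump to 10) and PCs 10..11.
code = {
    0: ("alu", None), 1: ("alu", None), 2: ("jump", 10),
    10: ("alu", None), 11: ("halt", None),
}
descriptors = {"A": (0, 3, "B"), "B": (10, 2, None)}

print(fetch_conventional(code, 0))
print(fetch_blocks(descriptors, "A"))
```

Both loops visit the same five instruction addresses, but in this toy case the conventional walk takes five dependent decode steps while the descriptor walk takes two dependent lookups.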

It has long been evident that poorly designed programming languages, such as C and C++, have encouraged the development of software riddled with security vulnerabilities. Early in my career, I believed our industry would learn to do better over time. By the late 1990s, however, I began to have doubts. Rather than hope for the best, I started to consider the value of incorporating processor features that could assist languages, compilers, and operating systems in checks that would detect such vulnerabilities. As the decades have passed with minimal progress in this area, I have maintained an interest in exploring how processor architecture could help close this gap.

Also in the late 1990s, when I was working at Silicon Graphics, some people there were proposing micro-architectural alternatives to Out-of-Order (OoO) for latency tolerance. I don’t know, but I suspect this was based on the decoupling in earlier scientific computing processors such as the Astronautics ZS-1. While I don’t think such decoupling is a replacement for OoO architectures, I have retained an interest in instruction sets that keep decoupling feasible, which might either be combined with OoO or be used in low-end implementations of an ISA that are less susceptible to some of the many security flaws introduced by speculation on OoO implementations (e.g., Spectre, Meltdown, Foreshadow, PACMAN, Retbleed, etc.).

Between 1997 and 2001 I worked on a processor architecture that targeted, among other things, Digital Signal Processing (DSP). In that field it was common for processors to have zero-overhead loop features, and so Tensilica’s Xtensa ISA did as well. Given that such features solve branch prediction for such loops, and thereby generally make branch prediction elsewhere more effective, I have retained an interest in ISA features that may help in this regard, even if they are not necessarily zero-overhead. (A simple zero-overhead loop feature in a block-structured ISA might be the ability to repeat the next basic block N times, but that would not be general enough to support loops with branches, so I have tended toward other methods.)

The highest design priorities of Tensilica’s Xtensa ISA were extensibility and code size, and I chose to give Xtensa register windows because of their enormous benefits to code size, primarily in making function entry/exit small, which also optimizes performance for function-call intensive programs. Unfortunately, I don’t see register windows as fitting particularly well into the current ISA proposals, given the heterogeneous register file, despite their advantages for function-call intensive languages and programs. Xtensa’s register windows were much more efficient than SPARC’s (I think SPARC basically killed the register window idea for subsequent ISAs by making the register window increment 16; Xtensa instead has increments of 0, 4, 8, and 12, which allows it to get more utility out of 64 physical registers than SPARC did with its 144).

In 2022 I encountered the University of Cambridge Capability Hardware Enhanced RISC Instructions (CHERI) research effort. I found their work impressive, despite some concerns, and have sought to make my proposals be CHERI-capable for applications which can tolerate doubleword pointers. I don’t see doubleword pointers being used for everything in a system however, and so the current proposals support CHERI capabilities without requiring them everywhere.

Given the above interests, I have tinkered with a few different processor architectures that combine these things: more sophisticated virtual memory and protection, block-structured ISAs, better branch prediction, and things that address security, such as better bounds and other checking, CHERI capabilities, and support for garbage collection and dynamic typing. Some of these things are synergistic. For example, garbage collection can be a security feature, as explicit memory reclamation can lead to programming errors that introduce security issues, and some of the features that support dynamic typing support other capabilities. This synergy has encouraged me to propose architectures with all of these features.

In the late 2010s, long after I had been thinking about unconventional processor architectures, I was introduced to the RISC‑V ISA, which seemed very conventional, i.e. much like the MIPS ISA that I had worked on in the 1980s and 1990s, but which had cleaned up MIPS’ worst warts and worked on modernizing its virtual memory from the days when I last followed it. Thus RISC‑V is very much a conventional ISA, and because RISC‑V is open-source, I find it a useful point of comparison to my explorations, and I have often modified my exposition to be more RISC‑V centric and even to adopt some of RISC‑V’s innovations when they fit.

In 2023, after I described a little of the block-structured ISA idea to a colleague, he sent me to Bird et al.’s 1993 Supercomputing paper The Effectiveness of Decoupling. Unfortunately, the paper did not go into detail on the mechanisms, so it is not possible to say how much of their Control Decoupling presages the block-structured ISA, but there is a modest amount of similarity between what they called the Control Processor and what I propose below, where it is called the Basic Block Engine. Their Address and Data Processor separation is also similar to some of the structures in the proposed ISA that facilitate decoupling of address generation and computation along the lines of the ZS-1 cited earlier.

One Particular Exploration

One particular exploration of these ideas has been developed a little further than others. For the time being I am calling it SecureRISC in the hope that over time I can improve it to live up to its name. It has been in the back of my mind for decades, but it got slightly more attention after my last full-time employment ended in 2001. Over time, I might introduce other explorations on these pages. For example, while SecureRISC is block-structured, there are additional ways in which one might take block-structured ISAs further, such as facilitating register renaming on blocks, rather than on individual instructions (I have tinkered with this and it seems promising). But I leave that for the future.

Please understand that SecureRISC is not a specification at this point. It is a set of explorations, some spelled out in detail, some less specific, with the intent that writing them down could lead to useful discussion. So with that introduction, here is the current SecureRISC proposal.

Block Structured Renaming

Since I mentioned it above, I will elaborate slightly. Imagine the basic block descriptor included, in addition to what SecureRISC includes, the set of source registers used by the basic block, and the set of output registers of the basic block. Renaming could be done for the basic block as a whole, rather than on each instruction in the block. Within the basic block, instruction sources would either reference the Nth source register to the block or the result of the Nth instruction local to the basic block. Instructions would not need explicit destination register fields as a result (this would be in the basic block descriptor).
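A minimal sketch of this idea follows, under stated assumptions. The class names, the operand encoding ("src" for the Nth block source, "loc" for the result of the Nth local instruction), and the two-operation toy ALU are all hypothetical, chosen only for illustration; they are not part of SecureRISC. The point it shows is that the rename map is consulted once per block source and updated once per block output, while instructions inside the block carry no destination fields.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str          # "add" or "mul" in this toy model
    operands: list   # ("src", n): nth block source; ("loc", n): nth local result


@dataclass
class BlockDescriptor:
    sources: list    # architectural registers read by the block
    outputs: dict    # architectural register -> index of producing instruction
    instrs: list


def rename_block(block, rename_map, next_phys):
    """Rename the block as a whole instead of instruction by instruction."""
    # One map lookup per block source, before any output is remapped.
    src_phys = [rename_map[r] for r in block.sources]
    # One new physical register per block output.
    new_map = {}
    for arch in block.outputs:
        new_map[arch] = next_phys
        next_phys += 1
    rename_map.update(new_map)
    return src_phys, new_map, next_phys


def execute_block(block, src_values):
    """Evaluate the block using only block-local operand references."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    results = []
    for ins in block.instrs:
        vals = [src_values[n] if kind == "src" else results[n]
                for kind, n in ins.operands]
        results.append(ops[ins.op](*vals))
    return {arch: results[i] for arch, i in block.outputs.items()}


# Block computing r3 = (r1 + r2) * r1.
block = BlockDescriptor(
    sources=[1, 2],
    outputs={3: 1},
    instrs=[Instr("add", [("src", 0), ("src", 1)]),
            Instr("mul", [("loc", 0), ("src", 0)])],
)
rename_map = {1: 10, 2: 11, 3: 12}
src_phys, new_map, next_phys = rename_block(block, rename_map, 32)
print(src_phys, new_map, execute_block(block, [2, 5]))
```

The intermediate sum never receives an architectural or physical register name in this sketch, since only the block's declared outputs are renamed.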

RISC-V Proposals

SecureRISC is the primary target for these pages, but occasionally I take ideas from SecureRISC and adapt them for RISC‑V. They are here primarily because of the tools I created for producing SecureRISC register figures. If any of these proposals were to generate interest in the RISC‑V world, it would be necessary to convert them to asciidoc.

RISC-V Garbage Collection

A primary goal of SecureRISC is to support Garbage Collection (GC) efficiently and SecureRISC Garbage Collection describes the proposal for this. Most of what is proposed for SecureRISC has been adapted to RISC‑V and is described in proposal for RISC‑V GC.

Alternative 64‑bit Translation for RISC‑V

Currently 64‑bit RISC‑V has Sv39, Sv48, and Sv57 translation models for its supervisors using 3, 4, and 5‑level page tables with 512 PTEs per level for virtual address spaces of −2³⁸..2³⁸−1, of −2⁴⁷..2⁴⁷−1, and −2⁵⁶..2⁵⁶−1 respectively, with a 56‑bit physical address space. An obvious extension to Sv64 using a 6‑level page table for an address space of −2⁶³..2⁶³−1 is likely someday. As an alternative, I have created a 64‑bit translation for RISC‑V called Ssv64 based on a subset of the SecureRISC translation proposal that I believe has significant advantages compared to the existing three models and the obvious extension to 64 bits.
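The address-space figures above follow from simple arithmetic: each page-table level with 512 PTEs contributes 9 index bits, on top of a 12-bit page offset for the 4 KiB base page that these RISC‑V modes use. The sketch below just reproduces that arithmetic; the function names are mine, not from any specification.

```python
PAGE_OFFSET_BITS = 12   # 4 KiB base pages
INDEX_BITS = 9          # 512 PTEs per level -> 9 index bits per level


def va_bits(levels):
    """Virtual address width covered by a table with full 9-bit levels."""
    return PAGE_OFFSET_BITS + INDEX_BITS * levels


def va_range(levels):
    """Signed virtual address range for the given number of levels."""
    bits = va_bits(levels)
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


for name, levels in [("Sv39", 3), ("Sv48", 4), ("Sv57", 5)]:
    lo, hi = va_range(levels)
    print(f"{name}: {levels} levels, {va_bits(levels)} bits, {lo}..{hi}")

# Six full levels would cover more than 64 bits.
print("6 full levels:", va_bits(6), "bits")
```

By this arithmetic, six full levels would span 66 bits, so a 6-level table for a 64-bit address space would need only 7 index bits at its top level (64 − 12 − 5×9 = 7).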

Ssv64 was designed to be as RISC‑V Sv57 etc. compatible as possible, which meant changing a number of things carried over from SecureRISC. In March 2023 I back-ported some of those changes into SecureRISC, since in most cases they don’t reduce SecureRISC functionality and gratuitous incompatibility isn’t helpful.

Alternative Smmtt Proposal

Proposal for Alternative Smmtt is a proposal that borrows from Ssv64 above to replace the fixed table structure in the proposed RISC‑V Smmtt extension with something more flexible.

Tagged RISC-V

The RISC‑V multiverse concept has been proposed, with universes representing aligned entities within a general RISC‑V framework. For example, CHERI RISC‑V is considered a separate universe from the main line of RISC‑V. I propose that a stepping stone to SecureRISC might be a new universe called Tagged RISC‑V, which is basically the RISC‑V instruction set with SecureRISC Virtual Memory and tagging, including using two tags for CHERI-128 capabilities. Unfortunately this loses the Block Structured aspect of SecureRISC, and the Control Flow Integrity aspects associated with it, but it can provide bounds checking with CHERI, or sized pointers or cliques, as well as better Garbage Collection, and support for runtime typing.

Proposal for RISC‑V Matrix

SecureRISC has a matrix accumulator feature that I propose for extending the RISC‑V Vector ISA for AI.

SecureRISC.org Mail Policy
Do not send unsolicited commercial email (i.e. spam) to this site!
We reserve the right to charge up to US$5000 per violation.

		<webmaster at securerisc.org>
2024-09-02