Computational RAM
Computational RAM, or C-RAM, is random-access memory with processing elements integrated into the design. This enables C-RAM to be used as a SIMD (single instruction, multiple data) computer. It can also be used to make more efficient use of the memory bandwidth within a memory chip.
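
The SIMD idea can be illustrated with a minimal Python sketch. It assumes a hypothetical memory array in which every row has its own one-word processing element; the names `CRamArray` and `broadcast` are invented for this illustration and do not correspond to any real C-RAM interface.

```python
from typing import Callable, List


class CRamArray:
    """Toy model: every memory row has its own processing element (PE)."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = list(rows)  # one word of data per row

    def broadcast(self, op: Callable[[int], int]) -> None:
        """Apply one instruction to every row (single instruction, multiple data).

        In hardware all PEs would act in the same cycle; the list
        comprehension here only simulates that.
        """
        self.rows = [op(word) for word in self.rows]


mem = CRamArray([1, 2, 3, 4])
mem.broadcast(lambda w: w * 2 + 1)  # the same operation is applied to every row
result = mem.rows
```

In this model the data never leaves the array: only the instruction is sent in, which is the sense in which on-chip memory bandwidth is used more efficiently.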

Perhaps the most influential implementations of computational RAM came from The Berkeley IRAM Project.

Some embarrassingly parallel computational problems are already limited by the von Neumann bottleneck between the CPU and the DRAM.
Some researchers expect that, for the same total cost, a machine built from computational RAM will run orders of magnitude faster than a traditional general-purpose computer on these kinds of problems.

As of 2011, the "DRAM process" (few layers; optimized for high capacitance) and the "CPU process" (many layers; optimized for high frequency; relatively expensive per square millimeter) are distinct enough that there are three approaches to computational RAM:
  • starting with a CPU-optimized process and a device that uses lots of embedded SRAM, add an additional process step (making it even more expensive per square millimeter) to allow replacing the embedded SRAM with embedded DRAM (eDRAM), giving ~3x area savings on the SRAM areas (and so lowering net cost per chip).
  • starting with a system with a separate CPU chip and DRAM chip(s), add small amounts of "coprocessor" computational ability to the DRAM, working within the limits of the DRAM process and adding only small amounts of area to the DRAM, to do things that would otherwise be slowed down by the narrow bottleneck between CPU and DRAM: zero-fill selected areas of memory, copy large blocks of data from one location to another, find where (if anywhere) a given byte occurs in some block of data, etc. The resulting system (the unchanged CPU chip plus "smart DRAM" chip(s)) is at least as fast as the original system, and potentially slightly lower in cost. The cost of the small amount of extra area is expected to be more than paid back in savings in expensive test time, since there is now enough computational capability on a "smart DRAM" for a wafer full of DRAM to do most testing internally in parallel, rather than the traditional approach of fully testing one DRAM chip at a time with expensive external automatic test equipment.
  • starting with a DRAM-optimized process, tweak the process to make it slightly more like the "CPU process", and build a (relatively low-frequency, but low-power and very high bandwidth) general-purpose CPU within the limits of that process. The Berkeley IRAM Project, TOMI Technology
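
The "smart DRAM" operations named in the second approach (zero-fill, block copy, byte search) can be sketched as a toy Python model. Everything here is an assumption made for illustration: the `SmartDram` class, its method names, and the `bus_bytes` counter are hypothetical, and the counter tracks only data bytes crossing the CPU-DRAM bus, ignoring command and address traffic.

```python
class SmartDram:
    """Toy model of a DRAM chip with a few in-memory helper commands."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)
        self.bus_bytes = 0  # data bytes moved across the CPU-DRAM bus

    # Conventional access: every data byte crosses the bus.
    def read(self, addr: int) -> int:
        self.bus_bytes += 1
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        self.bus_bytes += 1
        self.cells[addr] = value

    # "Smart" commands: the work happens inside the chip, so no data
    # bytes cross the bus. No bounds checking in this sketch.
    def zero_fill(self, start: int, length: int) -> None:
        self.cells[start:start + length] = bytes(length)

    def copy(self, src: int, dst: int, length: int) -> None:
        self.cells[dst:dst + length] = self.cells[src:src + length]

    def find(self, start: int, length: int, value: int) -> int:
        """Return the address of the first matching byte, or -1 if absent."""
        return self.cells.find(value, start, start + length)


dram = SmartDram(1024)
for i in range(256):
    dram.write(i, i % 251 + 1)  # fill a block over the bus; values are never 0

dram.copy(0, 512, 256)          # block copy performed inside the chip
dram.zero_fill(0, 256)          # zero-fill performed inside the chip
hit = dram.find(512, 256, 42)   # byte search performed inside the chip
```

In this model the copy, zero-fill, and search leave `bus_bytes` unchanged, whereas doing the same work from the CPU with `read` and `write` would move every byte across the bus at least once.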
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 