Research

My research interests are in developing energy-efficient systems for portable multimedia devices. Power and battery life have become critical concerns with the ever-increasing use of multimedia applications, such as computational photography and video playback, on portable devices. Efficiently integrating such high-complexity multimedia applications into battery-operated mobile devices is a key motivation for this work. I am interested in exploring power reduction techniques at various stages of the design, including hardware-oriented algorithms, highly parallel architectures and low-voltage circuit design.

With continued scaling of transistor technology, local transistor variations are becoming very significant. As part of my master's thesis, I developed a Statistical Static Timing Analysis (SSTA) methodology for the timing analysis of ICs designed to operate at ultra-low voltages. This technique accurately predicts statistical circuit performance in the presence of transistor variations at low voltage.

Reconfigurable Processor for Computational Photography

Computational photography applications significantly extend and enhance the capabilities of existing cameras. The high computational complexity of such multimedia processing applications necessitates fast hardware implementations to allow real-time processing. This work implements a reconfigurable multi-application processor to enable energy-efficient real-time computational photography on portable multimedia devices.

The reconfigurable hardware implements bilateral filtering, a non-linear filtering technique with a wide range of computational photography applications, using a Bilateral Grid structure, which represents an image as a 3D data structure and filters it with a 3D Gaussian kernel. The processor implements three applications: High Dynamic Range (HDR) imaging; Low-Light Enhancement, which merges flash and non-flash images so that the natural scene ambience is preserved while achieving high detail and low noise; and Glare Reduction. The filtering engine can also be accessed from off-chip and used with other applications.
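The grid-based bilateral filter described above can be sketched in software as three steps: splat, blur, slice. This is a minimal pure-Python sketch for a grayscale image, assuming intensities in [0, 1]. The function and parameter names (`bilateral_grid_filter`, `sigma_s`, `sigma_r`) are hypothetical, the 3D Gaussian is approximated by a [1, 2, 1]/4 binomial kernel along each axis, and nothing here describes how the processor's hardware actually organizes or stores the grid.

```python
import math

def _zeros(n0, n1, n2):
    """3D nested list of zeros with shape (n0, n1, n2)."""
    return [[[0.0] * n2 for _ in range(n1)] for _ in range(n0)]

def _blur_axis(g, axis):
    """Convolve a 3D grid with the binomial kernel [1, 2, 1] / 4 along one axis.

    Cells outside the grid count as zero; this is harmless because the value
    and weight grids are blurred identically and divided at slicing time."""
    dims = (len(g), len(g[0]), len(g[0][0]))
    out = _zeros(*dims)
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                acc = 2.0 * g[i][j][k]
                for step in (-1, 1):
                    nb = [i, j, k]
                    nb[axis] += step
                    if 0 <= nb[axis] < dims[axis]:
                        acc += g[nb[0]][nb[1]][nb[2]]
                out[i][j][k] = acc / 4.0
    return out

def _trilinear(g, p0, p1, p2):
    """Trilinearly interpolate grid g at a non-negative fractional position."""
    i0, j0, k0 = int(p0), int(p1), int(p2)
    f0, f1, f2 = p0 - i0, p1 - j0, p2 - k0
    acc = 0.0
    for di, wi in ((0, 1.0 - f0), (1, f0)):
        for dj, wj in ((0, 1.0 - f1), (1, f1)):
            for dk, wk in ((0, 1.0 - f2), (1, f2)):
                acc += wi * wj * wk * g[i0 + di][j0 + dj][k0 + dk]
    return acc

def bilateral_grid_filter(img, sigma_s, sigma_r):
    """Approximate bilateral filter of a grayscale image (rows of floats in [0, 1])."""
    h, w = len(img), len(img[0])
    # ceil(...) + 1 cells cover every splat index along each axis;
    # one extra cell keeps trilinear slicing in bounds.
    shape = (math.ceil((h - 1) / sigma_s) + 2,
             math.ceil((w - 1) / sigma_s) + 2,
             math.ceil(1.0 / sigma_r) + 2)
    val, wt = _zeros(*shape), _zeros(*shape)

    # 1) Splat: accumulate (intensity, count) into the nearest grid cell.
    for r in range(h):
        for c in range(w):
            v = img[r][c]
            i, j, k = round(r / sigma_s), round(c / sigma_s), round(v / sigma_r)
            val[i][j][k] += v
            wt[i][j][k] += 1.0

    # 2) Blur: separable 3D smoothing (binomial stand-in for a Gaussian).
    for axis in range(3):
        val, wt = _blur_axis(val, axis), _blur_axis(wt, axis)

    # 3) Slice: read the grid back at each pixel's (y, x, intensity) position
    #    and normalise by the interpolated weight.
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            v = img[r][c]
            pos = (r / sigma_s, c / sigma_s, v / sigma_r)
            den = _trilinear(wt, *pos)
            row.append(_trilinear(val, *pos) / den if den > 0 else v)
        out.append(row)
    return out
```

When `sigma_r` is small relative to an edge's contrast, pixels on opposite sides of the edge land in different intensity cells and are never averaged together, which is what makes the filter edge-preserving while still smoothing within flat regions.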

The implementation significantly accelerates bilateral filtering and enables various edge-aware image processing applications to run in real time on HD images. The processor, implemented in 40 nm CMOS technology, operates from 25 MHz at 0.5 V to 98 MHz at 0.9 V. The testchip achieves 13 megapixel/s throughput while consuming 1.4 mJ/megapixel at 0.9 V, a significant energy reduction compared to CPU/GPU implementations.
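As a quick consistency check, the throughput and energy figures together imply the average power of the filtering workload at 0.9 V, assuming both were measured at the same operating point (this derived value is simple arithmetic, not a reported measurement):

```latex
P = 13\,\frac{\text{Mpixel}}{\text{s}} \times 1.4\,\frac{\text{mJ}}{\text{Mpixel}}
  = 18.2\,\frac{\text{mJ}}{\text{s}} = 18.2\,\text{mW}
```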

The processor is combined with an FPGA that implements a DDR2 memory interface and a USB interface. This allows the processor to be integrated with DDR2 memory, a camera and a host PC, providing a portable system for live computational photography.

Publications

  • R. Rithe, P. Raina, N. Ickes, S. V. Tenneti, A. P. Chandrakasan, "Reconfigurable Processor for Energy-Efficient Computational Photography," IEEE Journal of Solid-State Circuits (JSSC), Vol. 48, No. 11, 2908-2919, November 2013. [paper]

  • R. Rithe, P. Raina, N. Ickes, S. V. Tenneti, A. P. Chandrakasan, "Reconfigurable Processor for Energy-Scalable Computational Photography," IEEE International Solid-State Circuits Conference (ISSCC), 164-165, February 2013. [paper | slides]

  • Live Demonstration at the ISSCC Demo Session. [video]

  • Live Demonstration at the MIT Wireless Retreat. [video]

In the News

    Picture Perfect: Quick, efficient chip cleans up common flaws in amateur photographs. (MIT News)
    Image Processor Makes for Better Photos and Performance (IEEE The Institute)
    MIT imaging chip creates natural-looking flash photos. (Engadget)
    MIT's new chip promises 'professional-looking' photos on your smartphone. (DPReview)
    MIT's developing a chip that makes you a better smartphone photographer. (Gizmodo)
    Improve your smartphone's photo quality with this chip. (Mashable)

Multi-Standard Low-Power Video Coding

This project aims to design a reconfigurable video encoder supporting H.264/AVC High Profile and VC-1 Advanced Profile video coding standards with 4k x 2k resolution at 30fps, implemented on a single low-power ASIC. The work explores algorithmic, architectural, and circuit-level innovations that can be applied to each of the functional blocks in a multi-standard video encoder to enable low-voltage operation while maintaining performance.

The transform engine is a critical part of a video codec, and increased coding efficiency often comes at the cost of increased complexity in the transform module. In this work we propose a shared-reconfigurable transform engine for the H.264/AVC and VC-1 video coding standards that exploits the structural similarity and symmetry of their transforms. We also propose an approach that eliminates the need for an explicit transpose memory in 2D transforms, and we exploit data dependency to reduce power consumption. Ten different versions of the transform engine, including variants with and without hardware sharing and with and without transpose memory, are implemented in the design. The design is fabricated in a commercial 45 nm CMOS technology, and all implemented versions are verified. The shared-reconfigurable transform engine without transpose memory supports Quad Full-HD (3840x2160) video encoding at 30 fps while operating at 0.52 V, with a measured power of 214 µW.
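The structural similarity between the two standards' transforms can be illustrated with their 4-point matrices. In this minimal sketch (hypothetical names, integer arithmetic only, scaling and rounding stages omitted), the H.264/AVC 4x4 forward core transform matrix and the VC-1 4-point transform matrix, as commonly given, are both computed by the same even/odd butterfly, with only three constants changing between standards. The 2D transform is a row pass followed by a column pass; in software the "transpose" is just indexing, so this does not reproduce the paper's transpose-memory-free hardware scheme.

```python
# 4-point matrices as commonly given for the H.264/AVC forward core transform
# and the VC-1 4-point transform (scaling/rounding stages omitted).
H264_4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
VC1_4 = [[17, 17, 17, 17], [22, 10, -10, -22], [17, -17, -17, 17], [10, -22, 22, -10]]

# Hypothetical (even, odd_a, odd_b) constants that specialise one shared butterfly.
COEFFS = {"h264": (1, 2, 1), "vc1": (17, 22, 10)}

def butterfly4(x, std):
    """4-point transform of vector x via the shared even/odd butterfly."""
    e, oa, ob = COEFFS[std]
    s0, s1 = x[0] + x[3], x[1] + x[2]  # symmetric sums feed the even rows
    d0, d1 = x[0] - x[3], x[1] - x[2]  # antisymmetric differences feed the odd rows
    return [e * (s0 + s1), oa * d0 + ob * d1, e * (s0 - s1), ob * d0 - oa * d1]

def transform2d(block, std):
    """Separable 2D transform Y = T X T^T: a row pass, then a column pass."""
    rows = [butterfly4(r, std) for r in block]  # rows of X T^T
    cols = [butterfly4([rows[i][j] for i in range(4)], std) for j in range(4)]
    # cols[j] holds column j of Y; re-index into row-major order.
    return [[cols[j][i] for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    """Plain matrix product, used only as a reference to check the butterfly."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

if __name__ == "__main__":
    X = [[5, -3, 2, 0], [1, 4, -2, 7], [0, 0, 3, -1], [6, -5, 1, 2]]
    for std, T in (("h264", H264_4), ("vc1", VC1_4)):
        assert transform2d(X, std) == matmul(matmul(T, X), transpose(T))
    print(transform2d([[1] * 4 for _ in range(4)], "h264")[0][0])  # 16
```

Both matrices share the same sign pattern and even/odd symmetry and differ only in their multiplier constants, which suggests why the adder structure can be shared and only the constant multiplications need to be reconfigurable.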

Publications

  • R. Rithe, C. C. Cheng, A. Chandrakasan, "Quad Full-HD Transform Engine for Dual-Standard Low-Power Video Coding," IEEE Journal of Solid-State Circuits (JSSC), Vol. 47, No. 11, 2724-2736, November 2012. [paper]

  • R. Rithe, C. C. Cheng, A. Chandrakasan, "Quad Full-HD Transform Engine for Dual-Standard Low-Power Video Coding," IEEE Asian Solid-State Circuits Conference (A-SSCC), 401-404, November 2011. [paper | slides]

SSTA Design Methodology for Low Voltage Operation

To achieve ultra-low power (ULP) operation, ICs are being designed for supply voltages below 0.5 V. At these low voltages, random dopant fluctuations (RDFs) result in a stochastic component of logic delay that can be comparable to the global corner delay. Moreover, the probability density function (PDF) of this stochastic delay can be highly non-Gaussian. To predict the statistical impact of RDF-induced local variations on logic timing, these effects must be incorporated into the timing closure methodology. This work proposes a computationally efficient methodology for stochastic characterization of standard cell libraries at low voltage, where the cell delay is a nonlinear function of the transistor random variables (RVs) and the resulting cell delay has a non-Gaussian PDF. It also presents a computationally efficient methodology for computing any point on the PDF of a timing path (TP) delay when the cell delays are non-Gaussian. The method is called Operating Point Analysis (OPA). This work develops the general OPA theory and applies it to cell library characterization, timing path analysis and full-chip timing closure. The approach has been implemented using commercial CAD tools and integrated into a commercial IC design flow.
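Why moment-based (Gaussian) timing breaks down at low voltage can be seen with a single cell. This minimal sketch assumes a hypothetical exponential delay model d = d0·exp(s·x) with x ~ N(0, 1), which is what a purely subthreshold current model gives when s = σVt/(n·kT/q); the values of `d0` and `s` are illustrative, not characterized data. Because d is monotone in x, evaluating the delay at the +3σ point of x gives its exact 3σ quantile, whereas fitting a Gaussian to the delay's mean and standard deviation underestimates it. This only illustrates the idea of evaluating a nonlinear delay at a point in random-variable space; it is not the OPA algorithm from the publications, which handles multiple RVs and full timing paths.

```python
import math

def exact_delay_quantile(d0, s, n_sigma):
    """n-sigma delay of d = d0 * exp(s * x), x ~ N(0, 1) (hypothetical model).

    d is monotone increasing in x, so the n-sigma point of x maps directly
    to the n-sigma quantile of d: evaluate the nonlinear function at a point
    in RV space instead of propagating moments."""
    return d0 * math.exp(s * n_sigma)

def gaussian_delay_quantile(d0, s, n_sigma):
    """Same quantile estimated as mean + n * std, i.e. a Gaussian fit to the
    (actually lognormal) delay distribution."""
    mean = d0 * math.exp(s * s / 2)
    std = d0 * math.sqrt((math.exp(s * s) - 1) * math.exp(s * s))
    return mean + n_sigma * std

if __name__ == "__main__":
    exact = exact_delay_quantile(1.0, 0.5, 3.0)
    gauss = gaussian_delay_quantile(1.0, 0.5, 3.0)
    print(f"exact {exact:.3f}, Gaussian fit {gauss:.3f}")  # exact 4.482, Gaussian fit 2.945
```

With these illustrative values the Gaussian fit is about 34% low at 3σ, because the lognormal tail is much heavier on the slow side than a symmetric Gaussian allows.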

The approach is validated by comparison with Monte Carlo simulation. At 0.5 V, OPA timing results are within 5% of Monte Carlo analysis, compared to errors on the order of 50% with Gaussian SSTA. Timing closure using OPA in the design of a 28 nm DSP SoC ensures reliable operation down to 0.6 V while minimizing the area and power overhead.
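The kind of Monte Carlo comparison described above can be mimicked on a toy path. This sketch uses hypothetical names and parameter values, and models each cell delay as d0·exp(s·x) with independent x ~ N(0, 1), a stand-in for characterized cells. It estimates the +3σ delay of a four-cell path by Monte Carlo and compares it with a Gaussian estimate built from the path's mean and variance, a simplified stand-in for Gaussian SSTA. The numbers it prints are not the 5% and 50% figures quoted above, which come from the authors' circuits at 0.5 V.

```python
import math
import random
from statistics import NormalDist

def path_delay_samples(n_cells, d0, s, n_samples, seed=0):
    """Monte Carlo samples of a path delay: the sum of n_cells independent
    cell delays, each d0 * exp(s * x) with x ~ N(0, 1) (hypothetical model)."""
    rng = random.Random(seed)
    return [sum(d0 * math.exp(s * rng.gauss(0.0, 1.0)) for _ in range(n_cells))
            for _ in range(n_samples)]

def empirical_quantile(samples, p):
    """Simple order-statistic estimate of the p-quantile."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

def gaussian_path_quantile(n_cells, d0, s, n_sigma):
    """Gaussian estimate: match the path mean and variance, then take mean + n * std."""
    mean_cell = d0 * math.exp(s * s / 2)
    var_cell = d0 * d0 * (math.exp(s * s) - 1) * math.exp(s * s)
    return n_cells * mean_cell + n_sigma * math.sqrt(n_cells * var_cell)

if __name__ == "__main__":
    n_sigma = 3.0
    p = NormalDist().cdf(n_sigma)  # probability level of the +3 sigma point
    mc = empirical_quantile(path_delay_samples(4, 1.0, 0.5, 100_000), p)
    gauss = gaussian_path_quantile(4, 1.0, 0.5, n_sigma)
    print(f"Monte Carlo {mc:.3f}, Gaussian {gauss:.3f}, error {(gauss - mc) / mc:+.1%}")
```

Adding more independent cells makes the path sum more Gaussian (central limit theorem), so the gap narrows on long paths; short paths with strongly nonlinear cells are where a Gaussian estimate is least reliable.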

Publications

  • N. Ickes, G. Gammie, M. E. Sinangil, R. Rithe, J. Gu, A. Wang, H. Mair, S. Datla, B. Rong, S. Honnavara-Prasad, L. Ho, G. Baldwin, D. Buss, A. P. Chandrakasan, U. Ko, "A 28 nm 0.6 V Low Power DSP for Mobile Applications," IEEE Journal of Solid-State Circuits (JSSC), Vol. 47, No. 1, 35-46, January 2012. [paper]

  • R. Rithe, S. Chou, J. Gu, A. Wang, S. Datla, G. Gammie, D. Buss, A. Chandrakasan, "The Effect of Random Dopant Fluctuations on Logic Timing at Low Voltage," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 20, No. 5, 911-924, May 2012. [paper]

  • G. Gammie, N. Ickes, M. Sinangil, R. Rithe, J. Gu, A. Wang, H. Mair, S. Datla, B. Rong, S. Honnavara-Prasad, L. Ho, G. Baldwin, D. Buss, A. Chandrakasan, U. Ko, "A 28nm 0.6V Low-Power DSP for Mobile Applications," IEEE International Solid-State Circuits Conference (ISSCC), 132-133, February 2011. [paper | slides]

  • R. Rithe, S. Chou, J. Gu, A. Wang, S. Datla, G. Gammie, D. Buss, A. Chandrakasan, "Cell Library Characterization at Low Voltage using Non-Linear Operating Point Analysis of Local Variations," International Conference on VLSI Design, 112-117, January 2011. [paper | slides]

  • R. Rithe, J. Gu, A. Wang, S. Datla, G. Gammie, D. Buss, A. Chandrakasan, "Non-Linear Operating Point Statistical Analysis for Local Variations in Logic Timing at Low Voltage," Design, Automation and Test in Europe (DATE) Conference, 965-968, March 2010. [paper | slides]

In the News

    MIT, TI Describe 28nm Mobile Apps Processor. (PCMag)
    MIT, TI tip 28-nm app processor. (EE Times)
    ISSCC: TI and MIT see DSP running off 0.6V. (Electronics Weekly)
    Texas Instruments & MIT: A 28nm 0.6V Low-Power DSP for Mobile Applications. (EDN Network)

Awards

  • MTL Annual Research Conference Best Presentation Award - 'Reconfigurable Processor for Computational Photography', 2013
  • Ernst Guillemin Award for best S.M. thesis in Electrical Engineering, MIT, 2010
  • MTL Annual Research Conference Best Presentation Award - 'Low-Power Multi-Standard Ultra-HD Video Codec', 2010
  • Best B.Tech. Thesis Award, IIT Kharagpur, 2008
  • InfoUSA Summer Research Fellowship, University of Southern California (USC), 2007