| # | Title | Team Members | TA | Documents | Sponsor |
| --- | --- | --- | --- | --- | --- |
| 23 | Portable RAW Reconstruction Accelerator for Legacy CCD Imaging | Arnav Gaddam, Guyan Wang, Yuhong Chen | Gerasimos Gerogiannis | design_document1.pdf, other1.docx, other2.pdf | |

# **RFA: Portable RAW Reconstruction Accelerator for Legacy CCD Imaging**

Group Members: Guyan Wang, Yuhong Chen

## **1\. Problem Statement**

**The "Glass-Silicon Gap":** Many legacy digital cameras (circa 2000-2010) are equipped with premium optics (Leica, Zeiss, high-grade Nikon/Canon glass) that outresolve their internal processing pipelines. While the optical pathway is high-fidelity, the final image quality is bottlenecked by:

- **Obsolete Signal Chains:** Early-stage Analogue-to-Digital Converters (ADCs) and readout circuits introduce significant read noise and pattern noise.
- **Destructive Processing:** In-camera JPEGs destroy dynamic range and detail. Even legacy RAW files are often processed with rudimentary demosaicing algorithms that fail to distinguish high-frequency texture from sensor noise.
- **Usability Void:** Users seeking the unique "CCD look" are forced to rely on cumbersome desktop post-processing workflows (e.g., Lightroom, Topaz), preventing a portable, shoot-to-share experience.

## **2\. Solution Overview**

**The "Digital Back" External Accelerator:** We propose a standalone, handheld hardware device, a "smart reconstruction box," that interfaces physically with legacy CCD cameras. Instead of relying on the camera's internal image processor, this device ingests the raw sensor data (CCD RAW) and applies a hybrid reconstruction pipeline.

The core innovation is a **Hardware-Oriented Hybrid Pipeline**:

- **Classical Signal Processing:** Handles deterministic error correction (black level subtraction, gain normalization, hot pixel mapping); a sketch of this stage follows the list.
- **Learned Estimator (AI):** A lightweight Convolutional Neural Network (CNN) or Vision Transformer model optimized for microcontroller inference (TinyML). This model does not "hallucinate" new detail; it acts as a probabilistic estimator that separates signal from stochastic noise based on the known noise characteristics of CCD sensors.
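
A minimal sketch of the classical half of this split, assuming a uint16 Bayer mosaic and a precomputed boolean hot-pixel map (NumPy/SciPy stand in here for the fixed-point C the firmware would actually run):

```python
# Sketch only: the STM32 firmware would implement equivalent fixed-point C.
import numpy as np
from scipy.ndimage import median_filter

def classical_stage(raw: np.ndarray, black_level: float, gain: float,
                    hot_pixels: np.ndarray) -> np.ndarray:
    img = np.clip(raw.astype(np.float32) - black_level, 0.0, None)  # black level subtraction
    img *= gain                                                     # gain normalization
    # Hot pixel mapping: replace flagged pixels with a local median.
    # (Simplified: a real pipeline would take the median within the same CFA channel.)
    med = median_filter(img, size=3)
    img[hot_pixels] = med[hot_pixels]
    return img
```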

The device will feature a touchscreen interface for file selection and "film simulation" style filter application, targeting an output quality perceptually comparable to a modern full-frame sensor (e.g., Sony A7 III) in terms of dynamic range recovery and noise floor.

## **3\. Solution Components**

### **Component A: The Compute Core (Embedded Host)**

- **MCU:** STMicroelectronics **STM32H7 Series** (e.g., STM32H747/H757).
- _Rationale:_ Dual-core architecture (Cortex-M7 + M4) allows separation of UI logic and heavy DSP operations. The Chrom-ART Accelerator helps with display handling, while the high clock speed supports the computationally intensive reconstruction algorithms.
- **Memory:** External SDRAM/HyperRAM expansion and high-speed QSPI Flash for AI model weight storage. External RAM is essential for buffering full-resolution RAW files: even a 10MP mosaic unpacked to 16 bits per pixel occupies ~20 MB, far beyond the ~1 MB of on-chip SRAM, and 24MP files need proportionally more.

### **Component B: Connectivity & Data Ingestion Interface**

- **Physical I/O:** USB OTG (On-The-Go) Host port.
- _Function:_ The device acts as a USB Host, mounting the camera (or the camera's card reader) as a Mass Storage Device to pull RAW files (.CR2, .NEF, .RAF, .DNG); a host-side parsing sketch follows this list.
- **Storage:** On-board MicroSD card slot for saving processed/reconstructed JPEGs or TIFFs.
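
Before committing to firmware, the RAW ingestion path can be prototyped on a host machine; a minimal sketch using the LibRaw-based `rawpy` package (an assumption for desktop prototyping only, not part of the embedded design):

```python
# Desktop prototyping sketch; the file path is a placeholder, and the firmware
# would need its own minimal per-format parser instead of rawpy/LibRaw.
import rawpy

with rawpy.imread("sample.nef") as raw:          # .CR2 / .DNG open the same way
    bayer = raw.raw_image_visible.copy()         # unprocessed Bayer mosaic (uint16)
    print("mosaic shape:", bayer.shape)
    print("black levels:", raw.black_level_per_channel)
    print("white level:", raw.white_level)
    print("CFA pattern:\n", raw.raw_pattern)     # 2x2 Bayer layout indices
```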

### **Component C: Hybrid Reconstruction Algorithm**

- **Stage 1 (DSP):** Linearization, dark frame subtraction (optional calibration), and white balance gain application.
- **Stage 2 (NPU/AI):** A quantization-aware trained model (likely TFLite for Microcontrollers or STM32Cube.AI) trained specifically on _noisy CCD to clean CMOS_ image pairs.
- _Task:_ Joint Demosaicing and Denoising (JDD).
- **Stage 3 (Color):** Application of specific "Film Looks" (LUTs) selected by the user via the UI. A condensed sketch of all three stages follows.
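
A condensed sketch of the three stages under stated assumptions: an RGGB mosaic, `model` standing in for the quantized JDD network, and a 1D film-look LUT:

```python
import numpy as np

def reconstruct(bayer: np.ndarray, black: float, wb_gains: tuple,
                model, lut: np.ndarray) -> np.ndarray:
    # Stage 1 (DSP): linearize, subtract black level, apply WB gains per CFA site.
    img = np.clip(bayer.astype(np.float32) - black, 0.0, None)
    img[0::2, 0::2] *= wb_gains[0]   # R sites (assumes an RGGB pattern)
    img[0::2, 1::2] *= wb_gains[1]   # G
    img[1::2, 0::2] *= wb_gains[1]   # G
    img[1::2, 1::2] *= wb_gains[2]   # B
    # Stage 2 (AI): joint demosaic + denoise; `model` is a placeholder for the
    # quantized network and returns an HxWx3 image in [0, 1].
    rgb = model(img)
    # Stage 3 (Color): apply the user-selected "film look" as a 1D LUT.
    idx = np.clip((rgb * (len(lut) - 1)).astype(np.int32), 0, len(lut) - 1)
    return lut[idx]
```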

### **Component D: Human-Machine Interface (HMI)**

- **Display:** 2.8" to 3.5" Capacitive Touchscreen (SPI or MIPI DSI interface).
- **GUI Stack:** TouchGFX or LVGL.
- _Workflow:_ User plugs in camera -> Device scans for RAWs -> User selects thumbnails -> User chooses "Filter/Profile" -> Device processes and saves to SD card (the scan step is sketched below).
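
A small sketch of the scan step in this workflow (host-side Python for illustration; the firmware would instead walk the mounted FAT volume through its USB mass-storage stack):

```python
from pathlib import Path

RAW_EXTS = {".cr2", ".nef", ".raf", ".dng"}

def scan_for_raws(mount_point: str) -> list[Path]:
    """Enumerate RAW files on the mounted camera volume."""
    return [p for p in Path(mount_point).rglob("*")
            if p.suffix.lower() in RAW_EXTS]

print(scan_for_raws("/mnt/camera"))   # hypothetical mount point
```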

## **4\. Criterion for Success**

To be considered successful, the prototype must meet the following benchmarks:

- **Quality Parity:** The output image, when blind-tested against the same scene shot on a modern CMOS sensor (Sony A7 III class), must show no statistically significant difference in perceived noise at ISO 400-800 equivalents (a measurement sketch follows this list).
- **Edge Preservation:** The AI reconstruction must demonstrate a reduction in color moiré and false-color artifacts compared to standard bilinear demosaicing, without "smoothing" genuine texture (measured via MTF charts).
- **Latency:** Total processing time for a 10-megapixel RAW file must be under **15 seconds** on the STM32 hardware.
- **Universal RAW Support:** Successful parsing and decoding of at least two major legacy formats (e.g., Nikon .NEF from D200 era and Canon .CR2 from 5D Classic era).
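
A sketch of how the quality-parity benchmark could be scored on matched flat gray-card crops; the synthetic stand-in data, patch coordinates, and noise levels below are illustrative only:

```python
import numpy as np
from scipy import stats

def patch_residuals(img: np.ndarray, y: int, x: int, s: int = 64) -> np.ndarray:
    patch = img[y:y + s, x:x + s].astype(np.float64)
    return (patch - patch.mean()).ravel()        # zero-mean noise residuals

# Stand-in frames so the sketch runs; real gray-card crops replace these.
rng = np.random.default_rng(0)
recon_ccd = 0.5 + rng.normal(0, 0.010, (512, 512))   # reconstructed CCD frame
ref_cmos  = 0.5 + rng.normal(0, 0.009, (512, 512))   # A7 III-class reference

res_ccd = patch_residuals(recon_ccd, 100, 100)
res_cmos = patch_residuals(ref_cmos, 100, 100)

# Levene's test for equal variances: a large p-value means the two noise
# levels are statistically indistinguishable at this ISO setting.
stat, p = stats.levene(res_ccd, res_cmos)
print(f"noise std: CCD={res_ccd.std():.4f}, CMOS={res_cmos.std():.4f}, p={p:.3f}")
```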

## **5\. Alternatives**

- **Desktop Post-Processing (Software Only):**
- _Pros:_ Effectively unlimited computing power, established tools (DxO PureRAW), highly customizable.
- _Cons:_ Destroys the portability of the photography experience; cannot be done "in the field." Users must also become proficient with the software's parameters, which demands significant self-training (not beginner-friendly).
- **Smartphone App (via USB-C dongle):**
- _Pros:_ Powerful processors (Snapdragon/A-Series), high-res screens, easy to use.
- _Cons:_ Lack of low-level control over USB mass storage protocols for obscure legacy cameras; high friction in file management; operating system overhead prevents bare-metal optimization of the signal pipeline; and generic smartphone processing algorithms are not tuned for legacy CCD sensor characteristics.
- **FPGA Implementation (Zynq/Cyclone):**
- _Pros:_ Parallel processing could make reconstruction instant.
- _Cons:_ Significantly higher complexity, cost, and power consumption compared to an STM32 implementation; higher barrier to entry for a "mini project."

# Smart Glasses for the Blind

Featured Project

# Team Members

- Ahmed Nahas (anahas2)

- Siraj Khogeer (khogeer2)

- Abdulrahman Maaieh (amaaieh2)

# Problem:

The underlying motive behind this project is the heart-wrenching fact that, with all the developments in science and technology, the visually impaired have been left with nothing but a simple white cane; a stick among today’s scientific novelties. Our overarching goal is to create a wearable assistive device for the visually impaired by giving them an alternative way of “seeing” through sound. The idea revolves around glasses/headset that allow the user to walk independently by detecting obstacles and notifying the user, creating a sense of vision through spatial awareness.

# Solution:

Our objective is to create smart glasses/headset that allow the visually impaired to ‘see’ through sound. The general idea is to map the user’s surroundings through depth maps and a normal camera, then map both to audio that allows the user to perceive their surroundings.

We'll use two low-power I2C ToF imagers to build a depth map of the user's surroundings, as well as an SPI camera for ML features such as object recognition. These cameras/imagers will be connected to our ESP32-S3 WROOM, which downsamples some of the input and offloads it to our phone app/webpage for heavier processing (object recognition, as well as the depth-map-to-sound algorithm, which is quite complex and builds on research papers we've found).
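
A sketch of the app side of this offload, assuming (as a design placeholder) that the ESP32-S3 packs each pair of 8x8 depth frames into one 256-byte UDP packet of little-endian uint16 millimeter values:

```python
import socket
import numpy as np

PORT = 5005                                    # hypothetical port number
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))

while True:                                    # Ctrl+C to stop
    data, _ = sock.recvfrom(256)
    depth = np.frombuffer(data, dtype="<u2").reshape(2, 8, 8)  # two imagers
    # Hand off to the depth-map-to-audio algorithm (see the App Subsystem).
    print("nearest obstacle (mm):", depth.min())
```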

---

# Subsystems:

## Subsystem 1: Microcontroller Unit

We will use an ESP32 as our MCU, chosen mainly for its Wi-Fi capability and for processing power sufficient to service all the sensor connections we need.

- ESP32-S3 WROOM : https://www.digikey.com/en/products/detail/espressif-systems/ESP32-S3-WROOM-1-N8/15200089

## Subsystem 2: ToF Depth Imagers/Cameras Subsystem

This subsystem is the main sensor subsystem for getting the depth map data. This data will be transformed into audio signals to allow a visually impaired person to perceive obstacles around them.

There will be two ToF sensors, connected to the ESP32 MCU over two I2C connections, to provide a wide combined FOV. Each sensor provides an 8x8 zone depth array over a 63-degree FOV (see the stitching sketch below).

- x2 SparkFun Qwiic Mini ToF Imager - VL53L5CX: https://www.sparkfun.com/products/19013
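
A sketch of how the two zone maps could be merged and indexed, assuming the imagers sit side by side with negligible FOV overlap:

```python
import numpy as np

def stitch(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Combine the two 8x8 VL53L5CX zone maps into one wide 8x16 map."""
    assert left.shape == right.shape == (8, 8)
    return np.hstack([left, right])            # ~126 degrees total, if no overlap

def zone_azimuth(col: int, n_cols: int = 16, fov_deg: float = 126.0) -> float:
    """Map a column index to azimuth in degrees (0 = straight ahead)."""
    return (col + 0.5) / n_cols * fov_deg - fov_deg / 2

wide = stitch(np.zeros((8, 8)), np.ones((8, 8)))
print(wide.shape, zone_azimuth(0), zone_azimuth(15))   # (8, 16) -59.06 59.06
```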

## Subsystem 3: SPI Camera Subsystem

This subsystem will allow us to capture a colored image of the user's surroundings. A captured image will allow us to implement egocentric computer vision, processed on the app. We will implement one ML feature as a baseline for this project (one of: scene description, object recognition, etc.). Feedback is only given when prompted by a button on the PCB: when the user clicks the button on the glasses/headset, they hear a description of their surroundings. Hence we don't need real-time object recognition; a frame rate as low as 1 fps suffices, in contrast to the depth maps, which need a higher frame rate and lower latency. This is exciting because such an input allows for other ML features/integrations that can scale drastically beyond this course. A recognition sketch follows the part link below.

- x1 Mega 3MP SPI Camera Module: https://www.arducam.com/product/presale-mega-3mp-color-rolling-shutter-camera-module-with-solid-camera-case-for-any-microcontroller/
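
A sketch of the button-triggered recognition path on the app side, using the TFLite interpreter API; the model file is a placeholder for whichever classifier we settle on:

```python
import numpy as np
import tflite_runtime.interpreter as tflite

interp = tflite.Interpreter(model_path="mobilenet_v2.tflite")  # placeholder model
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]

def describe(frame: np.ndarray) -> int:
    """Return the top class index for one camera frame (HxWx3, model-sized)."""
    interp.set_tensor(inp["index"], frame[np.newaxis].astype(inp["dtype"]))
    interp.invoke()
    return int(np.argmax(interp.get_tensor(out["index"])))
```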

## Subsystem 4: Stereo Audio Circuit

This subsystem is in charge of converting the digital audio from the ESP32 and app into stereo output to be used with earphones or speakers. This includes digital-to-analog conversion and voltage clamping/regulation. We may also add adjustable volume through a potentiometer.

- DAC Circuit

  - 2x Op-Amps for stereo output, TLC27L1ACP: https://www.ti.com/product/TLC27L1A/part-details/TLC27L1ACP

- SJ1-3554NG (AUX)

  - Connection to speakers/earphones: https://www.digikey.com/en/products/detail/cui-devices/SJ1-3554NG/738709

- Bone conduction transducer (optional, to be tested)

  - Will allow for a bone conduction audio output, easily integrated around the ear in place of earphones, to be tested for effectiveness. Replaced with earphones otherwise. https://www.adafruit.com/product/1674

## Subsystem 5: App Subsystem

- React Native app/webpage that connects directly to the ESP

- Does the heavy processing for the spatial-awareness algorithm as well as the object recognition or scene description algorithms (using libraries such as YOLO, OpenCV, TFLite); the depth-map-to-audio mapping is sketched after this list

- Sends audio output back to the ESP to be played through the stereo audio circuit
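
One possible depth-map-to-stereo mapping, as a minimal sketch (the actual algorithm builds on the research papers mentioned above; all constants here are illustrative):

```python
import numpy as np

SAMPLE_RATE = 16_000

def depth_to_stereo(depth_mm: np.ndarray, dur: float = 0.1) -> np.ndarray:
    """Pan toward the nearest obstacle; closer means louder and higher-pitched."""
    r, c = np.unravel_index(np.argmin(depth_mm), depth_mm.shape)
    d = depth_mm[r, c] / 1000.0                      # nearest obstacle, meters
    pan = c / (depth_mm.shape[1] - 1)                # 0 = far left, 1 = far right
    vol = float(np.clip(1.0 - d / 4.0, 0.0, 1.0))    # audible within ~4 m
    freq = 400.0 + 400.0 * vol                       # higher pitch when closer
    t = np.arange(int(SAMPLE_RATE * dur)) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * freq * t) * vol
    return np.stack([tone * (1 - pan), tone * pan], axis=1)   # (N, 2) stereo

demo = np.full((8, 16), 4000, dtype=np.uint16)   # 4 m everywhere...
demo[3, 2] = 800                                 # ...except one obstacle at 0.8 m
print(depth_to_stereo(demo).shape)               # (1600, 2) sample buffer
```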

## Subsystem 6: Battery and Power Management

This subsystem is in charge of power delivery, voltage regulation, and battery management for the rest of the circuit and devices. It takes in the unregulated battery voltage and steps it up or down according to each component's needs.

- Main Power Supply

  - Lithium-ion battery pack

- Voltage Regulators

  - Linear, buck, and boost regulators for the MCU, sensors, and DAC

- Enclosure and Routing

  - Plastic enclosure for the battery pack

---

# Criterion for Success

**Obstacle Detection:**

- Be able to identify the difference between an obstacle that is 1 meter away vs an obstacle that is 3 meters away.

- Be able to differentiate between obstacles on the right vs the left side of the user

- Be able to perceive an object moving from left to right or right to left in front of the user

**MCU:**

- Offload data from the sensor subsystems onto the application through a Wi-Fi connection.

- Control and receive data from the sensors (ToF imagers and SPI camera) using SPI and I2C.

- Receive audio from the application and pass it to the DAC for stereo out.

**App/Webpage:**

- Successfully connects to the ESP through Wi-Fi or BLE

- Processes data (ML and depth map algorithms)

- Process image using ML for object recognition

- Transforms depth map into spatial audio

- Sends audio back to ESP for audio output

**Audio:**

- Have working stereo output on the PCB for use with wired earphones or built-in speakers

- Have Bluetooth working on the app if a user wants to use wireless audio

- Potentially add hardware volume control

**Power:**

- Be able to operate the device using battery power. Safe voltage levels and regulation are needed.

- 5.5V Max
