Augmented Object
Intelligence XR
A mobile AR capstone that makes everyday objects intelligent — point your camera at anything and AI-generated context appears anchored to it in 3D space. Built on Google's XR-Objects research, extended into a functional prototype across four sprints. My role was integration engineering and QA: the work that kept the Unity project building, the team unblocked, and every feature tested before it shipped.
// 01 — The Goal
Making objects
intelligent.
The core idea behind this project is simple: what if you could point your phone at anything and the world explained itself back to you? Most AR experiences today overlay fixed, pre-authored content — a label, a logo, a marker. AOI XR takes a different approach. Instead of pre-authored responses, it uses a live LLM to generate context-aware answers the moment an object is detected. The overlay isn't static — it's a real AI response, tailored to that specific object, in that moment.
This project extends Google's published XR-Objects research, which demonstrated the concept but stopped short of a mobile-deployable prototype. Our goal was to close that gap — to build something that actually runs on an iPhone or Android device, is accessible to everyday users, and can serve real-world needs across education, healthcare, retail, and beyond. The Sprint 4 scavenger hunt demo is a concrete proof of that: a fully playable, LLM-driven AR experience running on a physical device.
Democratize intelligent AR
XR-Objects existed as a research demo. We wanted to make it real — running on an ordinary smartphone, not a research rig. No special hardware, no pre-authored content, just point and learn.
Accessibility first
A strong emphasis on inclusive design: colorblind-safe confidence palettes, support for visually impaired and neurodiverse users, and gesture/voice as alternative input methods alongside tap.
Real-world applicability
Not a toy — a foundation for practical use. A student sees an unfamiliar lab instrument and asks what it does. A nurse checks a medication bottle. A warehouse worker identifies a part. The LLM adapts to each context.
Designed for real domains
// 02 — How It Works
Four stages.
Real time.
Every time the camera detects an object, four stages run in sequence — from raw camera frame to a 3D text anchor appearing in the user's physical environment. Each stage is a deliberate architectural choice balancing latency, accuracy, and device constraints.
camera frames & spatial data → object + hand tracking → LLM REST API → world-space canvas → anchored to the physical object
Stage 1 — Detection
AR Camera · XRCameraSubsystem · MediaPipe
AR Foundation's XRCameraSubsystem streams frames from the device camera while XRSessionSubsystem manages AR lifecycle. MediaPipe Unity Plugin runs a TensorFlow Lite object detection model on-device — no cloud vision call — and outputs detected object labels and bounding boxes each frame.
- ≥5 FPS detection rate target
- 10+ concurrent detections supported
- On-device ML — privacy preserved, no streaming
- Hand tracking enabled for gesture input
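As a sketch, the hand-off from Stage 1 into the rest of the pipeline might look like the following. The `Detection` struct, `DetectionFeed` class, and event names are illustrative assumptions, not the project's actual API:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative shape of a single on-device detection result (assumed, not the real type).
public struct Detection
{
    public string label;   // e.g. "chair"
    public float score;    // model confidence, 0-1
    public Rect box;       // 2D bounding box in screen space
}

public class DetectionFeed : MonoBehaviour
{
    [SerializeField] private float minConfidence = 0.5f; // threshold value is an assumption

    public event System.Action<Detection> ObjectDetected;

    // Called once per processed camera frame with the detector's raw output.
    public void OnFrameResults(List<Detection> detections)
    {
        foreach (var d in detections)
        {
            if (d.score < minConfidence) continue; // drop low-confidence hits
            ObjectDetected?.Invoke(d);             // hand off to the AOI Integration Manager
        }
    }
}
```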
Stage 2 — LLM Query
AOI Integration Manager · UnityWebRequest · REST LLM
The AOI Integration Manager receives the detection label, builds a structured natural-language prompt, and fires it to the LLM backend via UnityWebRequest. The backend — evaluated across Google PaLI, LLaMA, and OpenAI-compatible APIs — returns a contextual response. The manager routes that text to the anchor system. Target response time is under 100ms.
- <100ms response time target
- Swappable backend — endpoint and auth header only
- API key secured via environment variables
- Mock fallback for offline development and testing
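A minimal sketch of the Stage 2 request path using `UnityWebRequest`. The endpoint URL, JSON shape, and `AOI_LLM_API_KEY` variable name are assumptions (the backend is swappable by design); only the UnityWebRequest calls themselves are standard Unity API:

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class LlmClient : MonoBehaviour
{
    [SerializeField] private string endpoint = "https://example.com/v1/chat"; // placeholder

    public IEnumerator Query(string objectLabel, System.Action<string> onResponse)
    {
        // Build a structured natural-language prompt from the detection label.
        string prompt = $"In one sentence, explain what a \"{objectLabel}\" is used for.";
        byte[] body = Encoding.UTF8.GetBytes(JsonUtility.ToJson(new Prompt { text = prompt }));

        using (var req = new UnityWebRequest(endpoint, UnityWebRequest.kHttpVerbPOST))
        {
            req.uploadHandler = new UploadHandlerRaw(body);
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            // Key read from the environment, never hardcoded (see the threat model).
            req.SetRequestHeader("Authorization",
                "Bearer " + System.Environment.GetEnvironmentVariable("AOI_LLM_API_KEY"));

            yield return req.SendWebRequest();

            // On failure the caller falls back to the mock response for offline work.
            onResponse(req.result == UnityWebRequest.Result.Success
                ? req.downloadHandler.text
                : null);
        }
    }

    [System.Serializable] private class Prompt { public string text; }
}
```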
Stage 3 — Spatial Anchoring
AR Foundation · World-Space Canvas · UIAnchorManager
The UIAnchorManager creates a world-space Unity Canvas at the detected object's estimated 3D position. A billboard effect ensures it always rotates to face the camera. Object pooling recycles anchor components instead of instantiating and destroying them every frame, eliminating the GC pressure that would cause visible stutters during sustained detection sessions.
- Billboard effect — readable from any angle
- Confidence-coded color with colorblind-safe palette
- Object pool — zero GC allocations at runtime
- Fade-in/out + pulse animation coroutines
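The two anchor techniques above, billboarding and pooling, can be sketched as follows; class names are illustrative, not the project's actual components:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Pool that recycles anchor GameObjects instead of Instantiate/Destroy churn.
public class AnchorPool : MonoBehaviour
{
    [SerializeField] private GameObject anchorPrefab;
    private readonly Queue<GameObject> pool = new Queue<GameObject>();

    public GameObject Acquire(Vector3 worldPos)
    {
        var go = pool.Count > 0 ? pool.Dequeue() : Instantiate(anchorPrefab);
        go.transform.position = worldPos;
        go.SetActive(true);
        return go;                 // no Instantiate in steady state, so no GC spike
    }

    public void Release(GameObject go)
    {
        go.SetActive(false);
        pool.Enqueue(go);          // recycled, never Destroyed
    }
}

// Keeps a world-space canvas rotated toward the camera every frame.
public class Billboard : MonoBehaviour
{
    void LateUpdate()
    {
        var cam = Camera.main;
        if (cam == null) return;   // guard: Camera.main can be null (see code reviews)
        transform.rotation = Quaternion.LookRotation(
            transform.position - cam.transform.position, Vector3.up);
    }
}
```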
Sprint 4 — Scavenger Hunt
LLM item gen · Tap-to-detect · AR Marker Spawner
The final sprint pivoted the detection pipeline into an AR scavenger hunt for the live class demo. The LLM generates a 5–10 item list appropriate to the physical environment (classroom, outdoor, etc.), a new TapToDetectionFeeder feeds taps into the detection pipeline, a DetectorBridge formats results for matching, and AR markers spawn on detected surfaces via raycasting. A full session lifecycle runs from main menu through gameplay to end screen.
- LLM generates context-appropriate item lists
- Tap → detect → match → highlight → register
- AR markers placed on detected planes via raycast
- Session timer · ascending score · clean end state
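The tap → raycast → marker step can be sketched with AR Foundation's `ARRaycastManager`; the component wiring and prefab names are assumptions:

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class TapMarkerSpawner : MonoBehaviour
{
    [SerializeField] private ARRaycastManager raycastManager;
    [SerializeField] private GameObject markerPrefab;
    private static readonly List<ARRaycastHit> hits = new List<ARRaycastHit>();

    void Update()
    {
        if (Input.touchCount == 0 || Input.GetTouch(0).phase != TouchPhase.Began) return;

        // Raycast the tap against detected planes; nearest hit comes first.
        if (raycastManager.Raycast(Input.GetTouch(0).position, hits,
                                   TrackableType.PlaneWithinPolygon))
        {
            var pose = hits[0].pose;
            Instantiate(markerPrefab, pose.position, pose.rotation);
        }
    }
}
```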
// 03 — Architecture
Why we built it
this way.
Each technology decision was a deliberate trade-off. Understanding these choices is part of understanding the project — not just what was built, but why.
Unity projects are fragile under merges: a missing .meta file, a scene load order issue, or a misplaced serialized field can leave the project compiling but broken at runtime. Feature branch development with mandatory PR reviews caught several of these before they reached main. The CI pipeline adds automated checks specifically designed for Unity failure modes: meta file validation, null safety scanning, script compilation verification, and a smoke test that fails the build on any logged exception during scene initialization.
// 04 — My Role
Integration engineer.
The glue of the project.
My contribution was less about owning a single feature and more about owning reliability. When the Unity project entered Safe Mode in Sprint 4, I diagnosed and fixed the compilation errors that blocked the entire team. When scene transitions broke after a merge, I rewired the Build Settings and inspector references. When the AR camera rendered a yellow background or black screen on an iOS build, I traced the URP clear flag configuration and fixed it. Alongside that integration work, I ran QA across every sprint and contributed to the CI/CD pipeline gates that prevented these issues from recurring.
| Contribution | What I did | Sprint |
|---|---|---|
| AR Foundation Setup | Set up Unity's AR Foundation subsystem and provider layer in Sprint 1 — wiring XRSessionSubsystem, XRCameraSubsystem, XRPlaneSubsystem, and XRImageTrackingSubsystem to their ARKit (iOS) and ARCore/OpenXR (Quest) providers. Configured Player Settings for AR compatibility across iOS and Android. This established the AR session lifecycle and camera intrinsics pipeline that every later feature relied on. | S1 |
| iOS Build Work | Committed iterative iOS builds throughout the project to validate AR rendering on physical hardware. Identified the Unity → Xcode connection error in Sprint 1 (documented as a known bug, fixed in Sprint 2). In Sprints 2–3 debugged the full iOS build pipeline — camera clear flags, URP transparency, background color, and Xcode export settings — and documented iOS-specific AR build differences for camera permissions and device compatibility. | S1–S3 |
| AR Rendering Fixes | Debugged URP transparency and background rendering for iOS builds — fixed camera clear flags, background color (yellow bug), and scene lighting resets after merges. Resolved black screen after URP build. Maintained consistent AR camera rendering across merges and Player Settings changes across both sprints. | S2–S3 |
| Scene Management | Organized Asset and Scene hierarchy for cross-platform builds. Merged MainScene.unity after binary conflict. Verified scene references, prefab integrity, and material assignments after every teammate merge across three sprints to prevent silent breakage. | S1–S3 |
| AOISetupHelper Expansion | Expanded AOISetupHelper in Sprint 3 so the full XR stack installs automatically — auto-creating ARSession, ARSessionOrigin, AR Camera, world-space Canvas, and required Prefabs on scene load. This eliminated the manual per-machine setup friction that caused inconsistent environments across the team, and validated camera permissions and anchor pool creation as part of the auto-init flow. | S3 |
| Anchor Debug Panel | Built UIAnchorDebugPanel — in-editor context menu buttons for running deterministic anchor smoke tests, counting active anchors, and clearing the pool. Validates the full anchor pipeline without needing a physical AR device connected. | S3 |
| Anchor Lifecycle | Refactored UIAnchor to properly handle fade-in/out, pulse animation, and click callbacks. Fixed coroutine timing conflicts that caused anchors to persist after their lifetime expired. Implemented UpdateAnchor() for dynamic detection refreshes. | S3 |
| CI/CD Extensions | Added pre-commit C# formatter (whitespace + naming rules), script compilation validation, missing .meta file detection, and pipeline logging. Wired the CI smoke test to fail the build if any exception is logged during scene initialization — turning a manual catch into an automated gate. | S3 |
| UI & Scene Flow | Implemented Entry/Main Menu screen, gameplay navigation, and Exit/Quit for the Sprint 4 scavenger hunt. Repaired scene management connections across menu → gameplay → end screen, including correcting Build Settings ordering and rewiring broken inspector references after merge. | S4 |
| Compilation Unblocking | Resolved the Unity Safe Mode startup issue that prevented the project from running — identified invalid class structures, misplaced [SerializeField] attributes, and missing type references introduced by a merge, then corrected them to restore the project for the team before the live demo. | S4 |
| Code Reviews | Reviewed PRs each sprint: scene merges, MediaPipe simulation updates, AOISetupHelper, UI anchor adjustments. Feedback covered null reference handling around Camera.main, animation coroutine cleanup, confidence threshold hardcoding, and Unity naming conventions. | S1–S4 |
| Project Coordination | Managed team communication across all four sprints. Logged merge issues and bug reports clearly so teammates could act without re-investigating the same problem. Divided tasks for Sprint 4, ran recap meetings, and contributed to all sprint deliverable documentation. | S1–S4 |
// 05 — Quality Assurance
Testing a system
you can't easily mock.
AR systems are notoriously hard to test — the environment is physical, the camera feed is live, and subsystems like ARKit and ARCore don't have simple unit-testable interfaces. The test strategy had to work around this by splitting coverage across four approaches: automated smoke tests in CI, manual functional tests on device, in-editor simulation via mock data, and formal test case documentation with build-linked results for traceability.
8 Testing Methodologies
Exploratory testing — Open-ended sessions on physical devices to find edge cases the spec didn't anticipate — especially around AR rendering, camera permissions, and lighting changes in real environments.
Automated smoke testing — Scene load and AR subsystem initialization verified on every CI run. The UIAnchorDebugPanel enables deterministic in-editor smoke tests without a connected AR device.
Integration testing — Full end-to-end flows: object detected → LLM queried → anchor placed → user taps anchor → result displayed. Tests the integration seams between components, not just individual units.
Performance testing — Frame rate benchmarks targeting ≥5 FPS detection, <100ms LLM response, and memory usage under 2GB during sustained AR sessions. Anchor pooling verified to eliminate GC allocation spikes.
Security testing — API key exposure scanning (no hardcoded keys in source), input validation for LLM prompts (injection prevention), and camera data privacy (no unnecessary storage or transmission of frames).
Accessibility testing — Colorblind-safe confidence color palette verified across three standard color blindness types. Anchor text contrast ratios checked. Foundation for future text-to-speech and haptic output.
System testing — Verifying the full detection → manager → LLM → anchor chain works together across real build targets, not just in editor. Cross-platform parity between iOS ARKit and Android ARCore builds.
Regression testing — Post-merge re-runs of all test cases after scene conflicts, PR merges, or Player Settings changes — preventing the category of bug where a fix in one area silently breaks another.
31 Test Cases Across 4 Sprints
Each test case includes component scope, configuration, exact steps, expected result, and a linked build commit for traceability. Sprints 1–3 built the core anchor and detection coverage; Sprint 4 added tap-to-detect and scavenger hunt session flow.
// 06 — CI / CD Pipeline
Automated gates
for a Unity project.
Unity projects break in ways that aren't obvious from a diff. A missing .meta file, a scene serialized with a stale GUID, or a missing component reference compiles cleanly but crashes at runtime. The CI/CD pipeline was designed around these Unity-specific failure modes — with quality gates that catch issues before they reach main and cause the team to lose a working build. Every push and pull request runs the full pipeline automatically.
main / develop → meta validation → Play Mode tests → fail on exception → GitHub artifact → version tag
- Pre-commit C# formatter — enforces whitespace and naming rules before a commit lands, so style issues never enter PR review
- Null safety scan — counts null checks in every C# script; flags components that dereference without guarding (the root cause of BUG-001)
- Error handling verification — scans for try-catch coverage in async and network-facing code paths
- Missing .meta file detection — Unity silently breaks when .meta files are absent from commits; this gate catches them before merge
- Script compilation validation — runs Unity's headless compiler to surface compile errors that only appear on specific platform targets
- Smoke test gate — boots the Unity scene in Edit Mode via CLI; if any exception is logged during initialization, the build fails immediately
- Unity Edit Mode and Play Mode test execution on every push
- Multi-platform build matrix — iOS (ARKit), Android (ARCore)
- Component initialization checks — Start/Awake present in MonoBehaviours
- Input validation patterns — IsNullOrEmpty / IsNullOrWhiteSpace coverage
- API key security scan — regex-based detection, excludes valid placeholder strings
- Code quality checks — TODO/FIXME detection, script file counting
- Artifact upload — build outputs stored per run for download and device deployment
- Cross-platform compatibility — Ubuntu, macOS, and Windows runners tested
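The smoke test gate could be expressed as a Unity Edit Mode test along these lines; the scene path and class name are illustrative, while `LogAssert.NoUnexpectedReceived()` is the standard Unity Test Framework call that fails the test if any unexpected error or exception was logged:

```csharp
using NUnit.Framework;
using UnityEditor.SceneManagement;
using UnityEngine.TestTools;

public class SceneSmokeTest
{
    [Test]
    public void MainSceneLoadsWithoutErrors()
    {
        // Open the main scene; Awake/OnEnable on scene objects run here.
        EditorSceneManager.OpenScene("Assets/Scenes/MainScene.unity"); // path illustrative

        // Fail the test (and the CI build) if anything logged an error or exception.
        LogAssert.NoUnexpectedReceived();
    }
}
```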
Security & Threat Model
The project included a formal threat model identifying four attack surfaces and their mitigations — integrated into the CI pipeline rather than left as documentation only.
| Asset / Surface | Risk | Mitigation |
|---|---|---|
| AR Camera Data | Data leakage — raw camera frames captured or intercepted | TLS encryption on all API calls; frames never stored or transmitted beyond the LLM prompt; no cloud vision API used for detection |
| LLM API Key | Key exposure in source code or logs | Environment variables only — never hardcoded; CI security scan with regex detection on every push; excluded placeholder strings from false-positive triggering |
| LLM Prompt Input | Prompt injection via crafted object labels or user input | Input validation on all user-generated strings before they enter the prompt template; IsNullOrEmpty guards on detection labels; CI scan for validation patterns |
| Cloud Storage / Backend | Unauthorized access to backend APIs or stored data | Role-based access control; Firebase authentication for backend endpoints; TLS/SSL on all client-server communication |
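The prompt-injection row above amounts to a whitelist guard applied before a label enters the prompt template. A minimal sketch, where the 64-character cap and allowed character class are illustrative choices, not the project's documented rules:

```csharp
using System.Text.RegularExpressions;

public static class PromptGuard
{
    // Accept only short labels made of letters, digits, spaces, hyphens, apostrophes.
    public static bool IsSafeLabel(string label) =>
        !string.IsNullOrWhiteSpace(label)
        && label.Length <= 64
        && Regex.IsMatch(label, @"^[A-Za-z0-9 \-']+$");
}
```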
// 07 — Team
Started as 3.
Grew to 5 for the demo.
The project ran through CPSC 490/491 with Alyssa Barrientos and Riya Jain as the two primary contributors after a third original team member withdrew following Sprint 1. Three additional contributors joined for Sprint 4 to build the scavenger hunt prototype for the live class demo.
- AR rendering & URP fixes
- Scene management & merges
- Anchor debug tooling
- UI / scene flow (Sprint 4)
- Compilation unblocking
- Code reviews · QA · coordination
- MediaPipe integration
- AOI Integration Manager & LLM backend
- GitHub Actions CI/CD pipeline
- Formal test cases (TC-001–005, TC-019–031)
- Bug tracking & operations docs
- David — LLM item list gen, timer, screens
- Mohamed — session lifecycle & interaction flow
- Marco — offline generator, AR marker spawner, UI manager
// 08 — Future Work
Where this goes
from here.
The prototype proves the concept — detection, LLM query, and spatial anchor rendering all running in a real AR session. These are the natural next engineering steps to close the gap between research prototype and production system.
On-Device LLM
Replace the REST call with a quantized model (Phi-3, Gemma 2B, or LLaMA 3 8B) running via llama.cpp or ExecuTorch on the device. Eliminates network latency, API key exposure, and the offline failure mode entirely. Feasible on iPhone 15 Pro and modern Android flagships.
Vision-Language Input
Upgrade from label-based prompting (MediaPipe outputs "chair" → send "chair" to LLM) to sending a cropped image patch to a vision-language model (LLaVA, PaLI-Gemma). Richer semantic context, better responses for ambiguous or unusual objects the detector labels poorly.
Persistent Spatial Memory
Use ARKit Scene Reconstruction or ARCore's Geospatial API to persist anchor positions between sessions — so the same physical object in the same room shows its cached LLM response on re-entry without re-querying the backend.
Multi-User Shared AR
Synchronize anchor state across devices via ARCore Cloud Anchors or Niantic Lightship. Multiple users in the same physical space could collaboratively tag and annotate objects — turning the tool into a shared, persistent knowledge layer for classrooms or field teams.
Full Accessibility Layer
Text-to-speech output of anchor labels for users with visual impairments, haptic feedback on detection events, high-contrast and large-text anchor modes. The colorblind-safe palette from Sprint 3 is the first layer of this work.
Known Issues to Fix
Unsubscribe AR events in OnDisable as well as OnDestroy in AOIIntegrationManager to close the memory leak. Replace the Arial.ttf runtime font reference with a bundled asset. Add structured fallback caching when the LLM endpoint is unreachable.
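The OnDisable unsubscription fix is small in spirit; a sketch, with the event hub and handler names being illustrative stand-ins rather than the project's actual members:

```csharp
using UnityEngine;

public class AOIIntegrationManager : MonoBehaviour
{
    void OnEnable()  { DetectionEvents.Detected += OnObjectDetected; }
    void OnDisable() { DetectionEvents.Detected -= OnObjectDetected; } // runs on disable AND before destroy
    void OnDestroy() { DetectionEvents.Detected -= OnObjectDetected; } // belt-and-braces; -= is safe to repeat

    private void OnObjectDetected(string label) { /* update anchors, etc. */ }
}

// Minimal stand-in for the project's detection event source.
public static class DetectionEvents
{
    public static event System.Action<string> Detected;
    public static void Raise(string label) => Detected?.Invoke(label);
}
```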