A Symbiotic Human-Machine Depth Sensor

Sketch of a person wearing a Pupil Labs headset and looking at a box.

The goal of this project is to explore how much we can learn about the physical objects a user is looking at by observing gaze depth. We envision a symbiotic scenario in which current technology (e.g. depth cameras) is extended with "human sensing data". A depth camera builds a rough model of a static environment, and gaze depth is merged into that model by leveraging unique properties of human vision:

  • the human eye as a near-range depth camera
  • using smooth pursuit eye movements to recognise dynamic objects

Fusing human abilities with physical sensors could combine the individual advantages of each to overcome current technical difficulties. It is also a first step towards exploring future symbiotic human-machine sensors.


Setup of the human-machine depth sensor.

The system consists of a head-mounted Pupil Labs eye tracker and an OptiTrack motion capture system, which together allow us to measure 3D gaze inside a volume of approx. 50 × 50 × 50 cm. Additionally, we used a ChArUco board to measure the position of the eye tracker's world camera. The three coordinate systems are fused inside a Unity application, which lets us visualise the 3D gaze position in real time.
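The fusion of coordinate systems can be illustrated with a minimal sketch that chains 4x4 homogeneous rigid transforms. This is not the authors' Unity code. The frame names (`mocap`, `head`, `cam`), the helper functions and all numbers are hypothetical. We also assume that the ChArUco step yields the world camera's fixed offset from the tracked headset.

```python
"""Sketch: chaining coordinate systems with 4x4 homogeneous transforms.

Hypothetical frames (illustrative only, not the authors' Unity code):
  mocap - OptiTrack world frame
  head  - rigid body on the headset, whose pose OptiTrack reports
  cam   - eye tracker world camera; its offset from `head` is ASSUMED to
          come from a one-off ChArUco-based calibration
Axis conventions (e.g. Unity's left-handed frame) are ignored here.
"""
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vec3 = Tuple[float, float, float]


def rigid_transform(yaw: float, t: Vec3) -> Matrix:
    """Rotation by `yaw` radians about the y (up) axis, followed by translation t."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, 0.0, s, t[0]],
        [0.0, 1.0, 0.0, t[1]],
        [-s, 0.0, c, t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b, i.e. apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m: Matrix, p: Vec3) -> Vec3:
    """Transform a 3D point given in homogeneous form (w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Made-up example values (metres, radians):
T_mocap_head = rigid_transform(math.pi / 2, (0.0, 1.6, 0.0))  # headset pose from mocap
T_head_cam = rigid_transform(0.0, (0.0, 0.05, 0.08))           # camera offset on headset
T_mocap_cam = compose(T_mocap_head, T_head_cam)                 # maps cam -> mocap frame

gaze_in_cam = (0.0, 0.0, 0.5)  # a gaze point 0.5 m in front of the world camera
gaze_in_mocap = transform_point(T_mocap_cam, gaze_in_cam)
print(gaze_in_mocap)  # approximately (0.58, 1.65, 0.0)
```

Composing the transforms once per frame and applying the result to every gaze sample keeps all measurements in a single shared frame. That shared frame is what a real-time visualisation needs.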

Our 3D gaze algorithm builds on the implementation by Wang et al., which uses gaze point triangulation. It works as follows:

  1. The eye tracker is calibrated with a 9-point calibration on a 2D plane in the real world (see the calibration plane in the figure).
  2. The estimated 2D gaze point of each eye (given by the eye tracker) is projected onto the calibration plane. This yields the two corresponding gaze points on the plane in 3D world coordinates.
  3. We apply gaze point triangulation to calculate the 3D gaze point. We cast two rays, each from one of the user's eyes through that eye's gaze point on the plane, and the 3D gaze point lies where the two rays (approximately) intersect.
Illustration of the human-machine depth sensor setup
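Steps 2 and 3 above can be sketched in a few lines of plain Python. This is a minimal illustration, not Wang et al.'s or the authors' implementation, and it rests on two assumptions:

* The eye centres are already known in the same 3D frame as the calibration plane.
* After calibration, each eye's 2D gaze is available as normalised (u, v) coordinates on the plane.

Because real gaze rays rarely intersect exactly, the sketch uses the common midpoint method: it returns the midpoint of the shortest segment between the two rays. The helper names and example numbers are hypothetical.

```python
"""Sketch of 3D gaze estimation by triangulating two gaze rays.

ASSUMPTIONS (hypothetical, not taken from the authors' code):
- eye centres are known in the same 3D frame as the calibration plane;
- each eye's 2D gaze is given as normalised (u, v) coordinates on the plane.
"""
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, k: float) -> Vec3:
    return (a[0] * k, a[1] * k, a[2] * k)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_point(origin: Vec3, axis_u: Vec3, axis_v: Vec3, u: float, v: float) -> Vec3:
    """Step 2: lift normalised plane coordinates (u, v) to a 3D point on the plane."""
    return add(origin, add(scale(axis_u, u), scale(axis_v, v)))


def triangulate(eye_l: Vec3, gaze_l: Vec3, eye_r: Vec3, gaze_r: Vec3,
                eps: float = 1e-12) -> Optional[Vec3]:
    """Step 3: midpoint of the shortest segment between the two gaze rays.

    Each ray starts at an eye centre and passes through that eye's gaze point
    on the plane. Returns None if the rays are (nearly) parallel.
    """
    d1, d2 = sub(gaze_l, eye_l), sub(gaze_r, eye_r)
    w0 = sub(eye_l, eye_r)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel rays: no finite vergence point
    s = (b * e - c * d) / denom  # parameter along the left-eye ray
    t = (a * e - b * d) / denom  # parameter along the right-eye ray
    closest_l = add(eye_l, scale(d1, s))
    closest_r = add(eye_r, scale(d2, t))
    return scale(add(closest_l, closest_r), 0.5)


# Made-up example: eyes 6 cm apart, a 50 x 50 cm calibration plane 0.5 m away,
# and the user fixating a point 0.4 m straight ahead.
eye_left, eye_right = (-0.03, 0.0, 0.0), (0.03, 0.0, 0.0)
origin, axis_u, axis_v = (-0.25, -0.25, 0.5), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0)
gaze_left = plane_point(origin, axis_u, axis_v, 0.515, 0.5)   # about (0.0075, 0, 0.5)
gaze_right = plane_point(origin, axis_u, axis_v, 0.485, 0.5)  # about (-0.0075, 0, 0.5)
gaze_3d = triangulate(eye_left, gaze_left, eye_right, gaze_right)
print(gaze_3d)  # approximately (0.0, 0.0, 0.4)
```

The gaze points on the plane cross over because the fixated point lies in front of the plane. The left-eye ray therefore hits the plane to the right of centre, and vice versa. Note that `eps` is an absolute threshold, so it depends on the units and ray lengths used.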

Gaze Scans

The gaze scans were obtained by one author, who scanned the objects with their eyes, i.e. by consciously looking at the objects' outlines and main features from a distance of approx. 0.5 m. As a proof of concept, we tested our approach with three objects that differ in geometric complexity and in the depth cues they contain:

  • a simple three-dimensional geometric object (Scan 1: black box)
  • simple geometric objects at different depth levels (Scan 2: letters built from Lego bricks)
  • an organic object that itself spans several depth levels (Scan 3: head of mannequin)
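A gaze scan produces a stream of 3D gaze points. One way to turn such a stream into a sparse point cloud of the object is to apply two steps:

1. Discard samples outside the measurement volume.
2. Merge samples that fall into the same small voxel, for example the many samples recorded during a single fixation.

The sketch below shows this. Both steps, the voxel size, the volume's placement and the sample values are hypothetical illustrations. The authors do not describe this post-processing.

```python
"""Sketch: turning a stream of 3D gaze samples into a sparse point cloud.

HYPOTHETICAL post-processing (not described by the authors):
1. keep only samples inside the ~50 x 50 x 50 cm measurement volume;
2. merge samples that fall into the same voxel into their mean position.
"""
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def crop_to_volume(points: Sequence[Point], lo: Point, hi: Point) -> List[Point]:
    """Drop samples outside the axis-aligned box [lo, hi] (e.g. tracking outliers)."""
    return [p for p in points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]


def voxel_downsample(points: Sequence[Point], voxel_size: float) -> List[Point]:
    """Average all samples that share a cubic voxel of edge length voxel_size."""
    buckets: Dict[Tuple[int, int, int], List[Point]] = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / voxel_size),
               math.floor(p[1] / voxel_size),
               math.floor(p[2] / voxel_size))
        buckets[key].append(p)
    return [
        (sum(p[0] for p in ps) / len(ps),
         sum(p[1] for p in ps) / len(ps),
         sum(p[2] for p in ps) / len(ps))
        for ps in buckets.values()
    ]


# Made-up samples in metres; the volume's placement is also an assumption.
samples = [(0.101, 0.10, 0.40), (0.103, 0.10, 0.40), (0.30, 0.20, 0.45), (0.90, 0.0, 0.40)]
volume_lo, volume_hi = (0.0, 0.0, 0.25), (0.5, 0.5, 0.75)
cloud = voxel_downsample(crop_to_volume(samples, volume_lo, volume_hi), voxel_size=0.01)
print(len(cloud))  # 2: two samples merged, one sample cropped away
```

Averaging per voxel reduces the weight of long fixations on a single spot. The voxel size trades spatial detail against noise from the gaze estimate.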


Teresa Hirzle

Enrico Rukzio


Teresa Hirzle, Jan Gugenheimer, Florian Geiselhart, Andreas Bulling, and Enrico Rukzio. 2018. Towards a Symbiotic Human-Machine Depth Sensor: Exploring 3D Gaze for Object Reconstruction. In The 31st Annual ACM Symposium on User Interface Software and Technology Adjunct Proceedings (UIST '18 Adjunct). ACM, New York, NY, USA, 114-116. DOI: doi.org/10.1145/3266037.3266119


Gaze Scan 1: black box

Illustration of the gaze scan of a box.

Gaze Scan 2: letters built with Lego bricks

Illustration of the gaze scan of four letters (U, I, S, T) built with Lego bricks.

Gaze Scan 3: head of mannequin

Illustration of the gaze scan of a mannequin's head.