A Design Space for Gaze Interaction on Head-Mounted Displays

Augmented and virtual reality (AR/VR) head-mounted display (HMD) applications inherently rely on three-dimensional information. In contrast to gaze interaction on a two-dimensional screen, gaze interaction in AR and VR therefore also requires estimating a user's gaze in 3D (3D gaze).
While first applications, such as foveated rendering, hint at the compelling potential of combining HMDs and gaze, a systematic analysis is missing. To fill this gap, we present the first design space for gaze interaction on HMDs.
Design Space
The unique properties of human depth perception and the specific technical requirements of current head-mounted displays call for a structured analysis to identify key challenges and characterize the potential for future interaction design. Our design space offers an approach to such a structured analysis, which is currently missing. The design space is presented as a two-dimensional matrix, also known as a Zwicky box. It is spanned by two dimensions:
- D1: technical properties of HMDs and
- D2: properties of human depth perception.
Dimensions, Parameters, and Values:
D1 is defined by three parameters that were selected according to generally accepted technical classifications of HMD technology (see section 2 "classifications of HMDs"):
- device type,
- display type, and
- world knowledge.
D2 is defined by two parameters that were selected based on an analysis of human depth cues (see section 2 "human depth perception") and their application for measuring gaze depth (see section 2 "measuring gaze depth"):
- oculomotor depth cue and
- ocularity.
Each parameter takes one of two values. For D1 these are:
- device type (AR/VR)
- display type (monoscopic/stereoscopic)
- world knowledge (none/full)
and for D2:
- oculomotor depth cue (vergence/accommodation)
- ocularity (monocular/binocular).
To reduce complexity, we restricted each parameter to two values. This does not cover the whole spectrum of possibilities; however, the set of values can be expanded in the future.
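The parameters and values listed above can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: it simply enumerates every value combination of D1 as a row and every combination of D2 as a column. The ordering of rows and columns and the `cells` dictionary are assumptions made for illustration, so the indices here do not necessarily match the cell numbers used in the text, and the published matrix may not include every combination.

```python
from itertools import product

# Parameters and values as listed in the text.
D1 = {
    "device type": ("AR", "VR"),
    "display type": ("monoscopic", "stereoscopic"),
    "world knowledge": ("none", "full"),
}
D2 = {
    "oculomotor depth cue": ("vergence", "accommodation"),
    "ocularity": ("monocular", "binocular"),
}


def combinations(dimension):
    """Return every value combination of a dimension as a list of dicts."""
    names = list(dimension)
    return [dict(zip(names, values)) for values in product(*dimension.values())]


rows = combinations(D1)     # technical properties of the HMD
columns = combinations(D2)  # properties of human depth perception

# Hypothetical layout: one initially empty cell per (row, column) pair.
cells = {(r, c): [] for r in range(len(rows)) for c in range(len(columns))}

print(len(rows), "rows x", len(columns), "columns =", len(cells), "cells")
```

Expanding the design space in the future then amounts to adding a value to one of the tuples or a new parameter to `D1` or `D2`.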
Views on the Design Space
The broad applicability of our design space is shown by three exemplary views on how the cells of the design space can be filled.
Technology-based View:
This view fills the design space with technical devices in combination with eye tracking devices and gaze depth algorithms. For example, the Microsoft HoloLens with an attached Pupil Labs eye tracking add-on (capable of obtaining a 3D gaze point) can be placed in cell (2,2). Here, D1 defines the row to which the HoloLens belongs: it is an AR device with a stereoscopic display and full world knowledge. D2 defines the column to which the Pupil Labs eye tracker and its gaze estimation algorithm belong: the headset obtains a 3D gaze point based on binocular vergence estimates.
Application-based View:
This view fills the design space with applications that combine a 3D gaze-based interaction approach with HMDs. One example of this view was presented by Hirzle et al. In their work, they implemented an application that creates a 3D scan of a gazed-at object. For D1, they used a stereoscopic AR device with full world knowledge (row 2). For D2, the application relies on binocular vergence estimates to calculate a 3D gaze point in space. The application is therefore placed in cell (2,2) of the design space.
Interaction-based View:
This view fills the design space with general interaction possibilities. These possibilities can then be used or implemented in concrete applications, as presented in the previous view. An example of this category is using a user’s 3D gaze point in space to correctly position augmented content in the real world. For D1, this interaction possibility requires full world knowledge and a stereoscopic display to correctly display depth information. For D2, it requires the estimation of a 3D gaze point. It can therefore be positioned in cells (2,1) and (2,2).
Usage of the Design Space

We propose two ways in which our design space can be used in practice to derive new technological combinations, applications, and interaction possibilities. The classification-based technique positions applications inside the space to identify technological requirements. Using the design space as an ideation tool aims to derive new interaction possibilities, applications, and even new device types based on fundamental components of the design space.
Classification-based Technique:
The classification-based technique of using the design space aims to identify technological requirements for given content and can be seen as a top-down approach. It is mainly directed towards designers and practitioners who have specific application scenarios or interaction possibilities for gaze interaction on HMDs in mind. The aim is to identify devices and concrete implementations they could use. This approach also helps to identify interaction possibilities that have to be supported, similar to a requirements analysis. By filling several cells, this approach helps to identify alternative implementations or device types that could be used.
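The top-down lookup described here can be sketched as a simple filter: an application states the parameter values it needs, and every combination of the remaining parameters is a candidate configuration. This is an illustrative sketch only; the function name `candidate_configurations` and the example requirements are hypothetical and not part of the original work.

```python
from itertools import product

# Parameters of D1 and D2 with their two values each, as listed in the text.
PARAMETERS = {
    "device type": ("AR", "VR"),
    "display type": ("monoscopic", "stereoscopic"),
    "world knowledge": ("none", "full"),
    "oculomotor depth cue": ("vergence", "accommodation"),
    "ocularity": ("monocular", "binocular"),
}


def candidate_configurations(requirements):
    """List all parameter combinations compatible with the requirements.

    `requirements` maps a parameter name to the value the application needs.
    Parameters that are not mentioned are left open.
    """
    unknown = set(requirements) - set(PARAMETERS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    names = list(PARAMETERS)
    matches = []
    for values in product(*PARAMETERS.values()):
        config = dict(zip(names, values))
        if all(config[name] == value for name, value in requirements.items()):
            matches.append(config)
    return matches


# Hypothetical application: needs a stereoscopic AR device with full world
# knowledge and leaves the gaze depth estimation method open.
alternatives = candidate_configurations(
    {"device type": "AR", "display type": "stereoscopic", "world knowledge": "full"}
)
for config in alternatives:
    print(config["oculomotor depth cue"], config["ocularity"])
```

Each returned configuration corresponds to one alternative way of realizing the application; adding further requirements narrows the list.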
Ideation Tool:
Using the design space as an ideation tool is mainly directed towards researchers, designers, and practitioners, with the aim of deriving new device types, interaction possibilities, and application ideas that expand the content of the design space.
For this, we define three types of components (P, V, and G) that cover a user’s view of the world in the context of the design space. P refers to knowledge that is available about the physical world, V refers to virtual world knowledge, and G refers to knowledge that is available about the 3D gaze representation. This definition of component types is inspired by Milgram et al.’s view on HMDs, which defines them as having a physical and a virtual part. We combine this perspective with a representation of the user’s gaze in 3D.
An important realization is that G can exist inside both the virtual world V and the physical world P, but not in both at the same time. This is important to keep in mind when designing applications. For VR systems, P is meaningless and only V and G apply, since VR by definition relies on virtual content only.
The component types are strongly influenced by the parameters occurring in the design space. We present two different components each for P and V, and three for G. However, these sets are not exhaustive, and users are encouraged to expand them in the future. Inspired by Card et al.’s operators, we then define exemplary aggregations that are applied to transfer the properties of each component into new expressions of device types or basic interaction possibilities. We choose two aggregations, which we will describe in more detail: addition and substitution.
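The two constraints stated for the component types (G lies in either P or V but never in both at once, and P does not apply to VR) can be captured in a few lines. This is a minimal sketch under assumptions: the names `World`, `GazeState`, and `available_components` are hypothetical, and the concrete components and the addition and substitution aggregations are not modeled because they are not detailed here.

```python
from dataclasses import dataclass
from enum import Enum


class World(Enum):
    PHYSICAL = "P"
    VIRTUAL = "V"


def available_components(device_type):
    """Component types that apply to a device type ("AR" or "VR")."""
    if device_type == "AR":
        return ("P", "V", "G")
    if device_type == "VR":
        return ("V", "G")  # P is meaningless for VR
    raise ValueError("device_type must be 'AR' or 'VR'")


@dataclass(frozen=True)
class GazeState:
    device_type: str   # "AR" or "VR"
    gaze_world: World  # the single world in which G currently lies

    def __post_init__(self):
        # Storing exactly one World value encodes "P or V, never both".
        if self.gaze_world.value not in available_components(self.device_type):
            raise ValueError("G cannot lie in P on a VR device")


print(available_components("VR"))
print(GazeState("AR", World.PHYSICAL))
```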
Example Applications
We implemented two example applications that were derived using the design space: Eye Health and X-Ray Vision.
Eye Health:
We implemented four exercises that aim to train the eye muscles and help with eye redness, fatigue, and tension. They are also designed to help with spasm of accommodation, a condition in which the eye remains in a constant state of contraction. The exercises are based on the Eye Care Plus application.
- Split Images:
Here a so-called "split image" is presented on a virtual plane. A split image is an image that is split in half, with each half shown on one side of the virtual plane. A red bar is positioned in front of the image and can be moved forward and backward. When the user focuses on the red bar, it turns green, which indicates that the system is tracking the user’s eye movements. When the user focuses on the forward-moving bar, the two split images in the background merge into one percept. This exercise aims, among other things, to improve focusing and stimulate the vision center of the brain.
- Follow the Bouncing Ball:
A sphere is moving through space and the user has to follow it with her eyes. The sphere turns green when a fixation is detected. This application aims to reduce eye stress and tension and tries to make the eyes more focused.
- Focus Shift:
This application aims to train the eye muscles by "forcing" the eyes to refocus between objects that move forward and backward. A sphere and a cube are presented on the display, one of which moves forward or backward while the other stays still. The user has to follow the moving object with her eyes; a successful fixation is again indicated by the object turning green.
- Look Through the Wall:
This exercise trains voluntary divergent eye movements. A 3D object is positioned behind a slightly transparent wall. The user has to focus on the object behind the wall. Once this is detected, the wall in front of the object disappears. Once the user refocuses on an object in front of the wall, the wall appears again. This can be tested with different degrees of transparency and also aims to train the eye muscles.
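The wall behavior in "Look Through the Wall" can be sketched as a small state update driven by the estimated gaze depth. This is an assumption-laden illustration rather than the implemented logic: the function name `update_wall_visible`, the use of metres, and the `margin` dead zone (added so that a noisy depth estimate near the wall does not make the wall flicker) are all hypothetical.

```python
def update_wall_visible(wall_visible, gaze_depth, wall_depth, margin=0.1):
    """Return the new visibility of the wall for one gaze sample.

    gaze_depth and wall_depth are distances from the user (assumed metres).
    margin is a hypothetical dead zone around the wall depth.
    """
    if wall_visible and gaze_depth > wall_depth + margin:
        return False  # user focuses behind the wall: hide it
    if not wall_visible and gaze_depth < wall_depth - margin:
        return True   # user refocuses in front of the wall: show it again
    return wall_visible  # otherwise keep the current state


# Example: wall at 2.0 m, a short sequence of gaze depth estimates.
visible = True
history = []
for depth in [1.0, 1.95, 2.3, 2.05, 1.5]:
    visible = update_wall_visible(visible, depth, wall_depth=2.0)
    history.append(visible)
print(history)  # [True, True, False, False, True]
```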
X-Ray Vision:
This application was derived from the design space following the ideation tool technique as described in section 4. In this application, hidden virtual content is triggered when the user focuses on a point behind the wall. This is difficult, because our eyes are not used to focusing on an invisible point in space. Therefore we provide a "scaffolding pattern". This pattern is meant to support the user in refocusing, i.e. performing voluntary convergent and divergent eye movements, by applying pictorial depth cues (e.g. points in the front are bigger than points in the background). When the user successfully performs a divergent eye movement, i.e. when the distance from the 3D gaze point to the wall reaches a predefined threshold, the hidden virtual content is displayed. The horizon in the image helps to keep the eyes fixated at a certain distance.
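One common way to obtain a 3D gaze point from binocular vergence is to take the midpoint of the shortest segment between the two gaze rays and then compare its depth with the wall. The sketch below illustrates that idea under assumptions: it is not necessarily the method used in the application, the function names are hypothetical, the wall is assumed to be a plane perpendicular to the z axis (the viewing direction), and the numbers are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gaze_point_from_vergence(left_origin, left_dir, right_origin, right_dir):
    """Midpoint of the shortest segment between the two gaze rays.

    Returns None if the rays are (nearly) parallel, i.e. no vergence point.
    """
    w = tuple(l - r for l, r in zip(left_origin, right_origin))
    a = dot(left_dir, left_dir)
    b = dot(left_dir, right_dir)
    c = dot(right_dir, right_dir)
    d = dot(left_dir, w)
    e = dot(right_dir, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom  # parameter along the left ray
    t = (a * e - b * d) / denom  # parameter along the right ray
    p_left = tuple(o + s * k for o, k in zip(left_origin, left_dir))
    p_right = tuple(o + t * k for o, k in zip(right_origin, right_dir))
    return tuple((p + q) / 2 for p, q in zip(p_left, p_right))


def reveals_hidden_content(gaze_point, wall_depth, threshold):
    """True if the gaze point lies more than `threshold` behind the wall."""
    return gaze_point is not None and gaze_point[2] - wall_depth > threshold


# Eyes 6 cm apart, both looking at a point 2 m straight ahead (assumed values).
point = gaze_point_from_vergence(
    (-0.03, 0.0, 0.0), (0.03, 0.0, 2.0),
    (0.03, 0.0, 0.0), (-0.03, 0.0, 2.0),
)
print(point)
print(reveals_hidden_content(point, wall_depth=1.5, threshold=0.3))
```

Because vergence angles change very little at larger distances, depth estimates from real eye tracking data are noisy, so a practical threshold would likely need tuning or smoothing over several samples.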

Technology-Based View on the Design Space

The technology-based view on the design space can be filled with device types and with 3D gaze tracking approaches or eye tracking devices. Here each device type can be positioned in one row of the design space, and each eye tracking device or 3D gaze estimation technique can be positioned in one column.
Application-based View on the Design Space

The application-based view on the design space can be filled with concrete applications. When an application is positioned in only one cell, it can be implemented exclusively with the corresponding devices from the technology-based view. Applications that are positioned in more than one cell indicate alternative implementations or device types.
Interaction-based View on the Design Space

The interaction-based view on the design space can be filled with basic interaction possibilities, which can then be applied or implemented in concrete applications.