
Nod to confirm: head gestures in digital interaction


Incorporating head gestures into our interactions with technology can open up new levels of synchronicity and emotional connection with innovative social technologies.

Image generated using Leonardo.ai from the prompt "A human and a strange robot looking at each other."

Why head gestures?

Head gestures are an essential component of nonverbal communication. They encompass a range of motions like nodding, shaking, tilting, and chin pointing, all reflecting our inner thoughts and emotions. These gestures are prominently used during conversations to express agreement or disagreement, communicate from a distance in noisy settings, or respond to profound internal dialogues.

Despite the ubiquity of head gesturing in human interaction, its application in technology interfaces is surprisingly limited. Current technological interactions do not use gestures like nodding to confirm a transaction, shaking the head to decline a call, or tilting the head to react to social media content.

Head-tracking technology is increasingly prominent in products such as Virtual Reality (VR) headsets, drone Point of View (POV) gear, and headphones equipped with Spatial Audio. In these applications, head movements are captured and converted into coordinates, translating physical motions into a virtual environment or spatial audio effects.

Images generated using Leonardo.ai from the prompts "A woman wearing a VR set on her head in her home" and "medium shot of a man wearing first-person view goggles, flying a drone."

Additional use of head motion tracking is found in accessibility technology. For individuals with physical limitations, head gestures can offer a crucial alternative for interfacing with computers, smartphones, and other digital devices. This technology uses cameras to track head movements and translate them into a predefined set of digital commands. It does not process context or look for meaning.

Our head's motion is more than a mere rotation indicator. Our interactions with future technologies can become more meaningful if our intentions and reactions are better communicated.

The future is in our heads

Head gestures can enhance human-computer interaction (HCI) in various ways, offering a more intuitive, accessible, and efficient means of engaging with technology. Here are some key areas where head gestures can improve HCI:

Hands-Free Operation

Head gestures enable users to control devices without touching them when their hands are occupied or unavailable, such as while driving, cooking, or even performing surgery.

Enhanced User Experience

By integrating head gestures, user interfaces can become more engaging and interactive. They allow for a more organic interaction with digital content, as users can nod, shake their heads, or look toward certain areas to execute commands.

Non-Verbal Communication in Digital Interactions

Head gestures can convey non-verbal cues in online communications, adding a layer of expressiveness to video calls, virtual meetings, and online collaboration.

Context-Aware Computing

Head gestures can provide contextual information to devices. For example, a computer could pause a video when the user looks away, or a car’s infotainment system might switch displays based on where the driver is looking. Social robots with voice-operated interfaces and head gesture interpretation can read users’ reactions, predict intent and emotion, or recognize a miscommunication.

What is out there now?

An extensive body of academic research exists on interpreting head gestures for computer interaction. Study subjects range from statistical models for recognizing head gestures, such as Hidden Markov models, to studies that successfully train AI models to predict and interpret gestures into contextual meaning when combined with speech recognition models. A study from 2007, which trained several AI models to recognize and categorize head gestures, showed quantitative and qualitative benefits of interacting through contextual dialogs over graphical and touch-based user interfaces (Morency, L.-P. et al., 2007, p. 583). The advances in AI models and the extensive research into head gestures demonstrate their potential to become a vital part of our interaction with upcoming technology.

An opportunity to improve

The most common method for head tracking in research and accessibility technology involves cameras and facial recognition software, which visually capture and analyze the user’s face and motions. This is the standard approach outside Virtual Reality (VR) and First-Person View (FPV) headgear applications. However, there is growing potential in exploring new, innovative methods for head tracking that do not rely on a camera fixed on the user’s face at a specific angle.

REAL-TIME NON-RIGID DRIVER HEAD TRACKING FOR DRIVER MENTAL STATE ESTIMATION — Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/3D-head-tracking-Our-real-time-non-rigid-tracking-algorithm-estimates-the-3D-pose-of-the_fig4_244261171 [accessed 5 Mar, 2024]

Camera-based head tracking is only as precise as the algorithm processing the images, and there’s a possibility that subtle nuances and varying intensities of movement might not be processed.

Wearable technology can act as a bridge between behavior and technology.

This is where wearable technology comes into play. Products like earbuds and headphones with this feature use a built-in gyroscope to track head movements. This approach enhances the accuracy of head tracking and opens up new opportunities for exploring user interaction, particularly in the realm of audio experiences. By leveraging the microphone, speakers, and head tracking in headphones, we can achieve more nuanced and meaningful interactions, bridging the gap between natural human communication and digital products.

The Idea: Gesture-controlled Robot

This project originated from a technical curiosity. It began during an Arduino workshop in my Master’s program in Human-Computer Interaction (HCI) at Reichman University in Israel. The initial idea was to connect AirPods to an Arduino and translate head movements into servo actions. The ultimate ambition was to integrate this setup with a research robot from the Milab — Media Innovation Laboratory that hosts the HCI program. A critical component of this idea was the development of an app. This app was the core of the communication between the AirPods and Arduino, acting as a bridge and the “brain” of the setup.

The full setup — the iPhone app, AirPods Pro, and the Magnetform robot.

Technical Details

The basic setup used to test the integration’s feasibility had three components:

Apple AirPods Pro

The AirPods use a combination of technologies to track head movements. Beyond their audio capabilities, they are equipped with a gyroscope and an accelerometer, which detect the orientation and motion of the head along the X, Y, and Z axes. Once these coordinates are captured, they are transmitted to the iPhone using Bluetooth Low Energy (BLE).

iPhone app

The app serves as a conduit between the AirPods and the Arduino. It starts with receiving head motion data from the AirPods using an Application Programming Interface (API) known as CMHeadphoneMotionManager, provided by iOS. The rotation coordinates are relayed from the iPhone to the Arduino using Bluetooth (BT).

For the construction of this application, I chose Unity for its iOS build capabilities and ready-to-use solutions for complex development challenges, including handling multiple BT connections and providing a visual representation of the data.

The iPhone app built with Unity

The app’s backbone was an open-source GitHub project titled Headphone Motion Unity Plugin, created by Anastasia Devana. This Unity plugin was essential as it offered a 3D interface to represent head motion and included the implementation of the Headphone Motion Manager API.

To establish a Bluetooth connection with the Arduino, I used the Arduino Bluetooth Plugin, developed by Tony Abou Zaidan. This plugin facilitates Unity applications connecting with various Bluetooth devices, including the specific Bluetooth chip used for the Arduino board.

Arduino platform

The initial version of my project was set up with a straightforward set of components:

An Arduino Uno board served as the central processing unit.
A 20 kg digital servo (any servo will do).
A DSD TECH HM-10 Bluetooth 4.0 BLE module for Bluetooth connectivity.

First version: Arduino Uno board, HM-10 module & a servo.

The HM-10 module communicates with the Arduino through the serial port. The rotation coordinates received from the iPhone are first split into separate characters and then converted into floating-point variables. These variables are then mapped to a range usable by the servo (750–2250). I initially used the standard Servo library to control the servo. However, I encountered servo jittering when using it together with the SoftwareSerial.h library. The problem is that the two libraries cannot operate simultaneously: both depend on precise interrupt timing, and SoftwareSerial blocks interrupts long enough to disturb the servo pulses. To resolve this, I switched to the AltSoftSerial and ServoTimer2 libraries, which rely on separate hardware timers and provided stable performance without the jittering issue. This adjustment was a key step in ensuring the smooth operation of the servo in response to the head motion data.
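
For context, here is a minimal sketch of how this library pairing might be set up on an Arduino Uno; the servo pin choice and the 9600 baud rate are illustrative assumptions rather than the exact project code:

#include <AltSoftSerial.h> // fixed to pins 8 (RX) and 9 (TX) on an Uno, uses Timer1
#include <ServoTimer2.h>   // generates servo pulses from Timer2

AltSoftSerial btSerial; // serial link to the HM-10 module
ServoTimer2 servoX;

void setup() {
  btSerial.begin(9600); // assumed HM-10 default baud rate
  servoX.attach(5);     // any free digital pin; ServoTimer2 expects pulse widths in microseconds
}

void loop() {
  if (btSerial.available() > 0) {
    // read the incoming "<0.25>"-style packet here, parse it,
    // then write a pulse width between 750 and 2250 to servoX
  }
}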

Integration Process

Connecting the AirPods to the app

The Unity plugin I used provided all the necessary components to receive and use the rotation data effectively. Its features included:

Comprehensive API integration: The plugin came equipped with all the required API calls and initializations for the motion API, streamlining the process of integrating the head motion data into the app.
Basic 3D UI: It offered a user interface that graphically represented the head rotation in 3D space.
Controls: The plugin included basic controls for enabling, disabling, and calibrating the coordinates.
Bluetooth connection indicators: Status indicators for the Bluetooth connection, providing real-time feedback on the connectivity status.

Building on this foundation, the next step was establishing the Bluetooth connection to the Arduino’s Bluetooth module.

Connecting the app to an Arduino

The Arduino Bluetooth Plugin proved beneficial in facilitating the integration process. It offered comprehensive documentation and examples for integrating it with the existing app.

However, transmitting and receiving the data was a real undertaking. It required an understanding of the Bluetooth specification and its communication mechanism, which was essential for ensuring effective interaction between the devices. The second and more challenging aspect was transmitting the rotation coordinates to the Bluetooth plugin and then accurately reconstructing them into usable values without lag.

Connecting to the HM-10 module

By entering the device name during the initial setup, the plugin seamlessly manages the rest of the connection process. Once the Arduino is powered on, the app automatically connects to the Bluetooth module. This discovery significantly streamlined the connection process and eliminated the need for manual pairing.

BThelper.setDeviceName("DSD TECH");

Syncing the transmission

Handling the coordinates required working with several data types. First, the data had to be extracted from the API responses. This involved transforming the rotation vector variable (Quaternion) into values for separate axes. These values were then joined into a single string and transmitted to the Arduino. Achieving minimal delay between the head motion and the corresponding servo movement was crucial.

To manage synchronization, I leveraged Unity’s Update function, which runs once per rendered frame, roughly 60 times per second on the iPhone, providing a consistent rate for the entire sequence. During each cycle of the Update function, provided the Bluetooth connection was active, the latest rotation data for each axis was read and converted into a string variable within a custom command.

helper.WriteCharacteristic(bluetoothHelperCharacteristic, "<" + vXout.ToString("F2") + ">");

For simplicity, I used just the X-axis value, called "vXout", which ranges from -1 to 1. It was formatted into a string, retaining only two decimal places to keep the packet size small.
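
For example, a slight upward nod that produces vXout = 0.25 is sent over Bluetooth as the short packet "<0.25>", and the matching downward nod as "<-0.25>".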

Setting up the Arduino

The physical setup was straightforward: connecting to the serial transmission (TX) and reception (RX) pins and calling Serial.begin within the Arduino sketch’s setup() function.

However, the challenging aspect for me was converting the incoming data into a format the Arduino could use effectively. The data arrived as a series of character values (char), and transforming these back into floating-point variables, complete with their original positive or negative sign, required some learning. After some trial and error, I stumbled upon a Stack Overflow discussion that provided the key: a solution for parsing transmitted characters into other data types.

void parseData() { // split the data into its parts
  char * strtokIndx; // this is used by strtok() as an index
  strtokIndx = strtok(tempChars, ","); // get the first part - the string
  X_axis = atof(strtokIndx); // convert it to a float and store it in X_axis
}
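
For completeness: parseData() assumes the characters between the "<" and ">" markers have already been buffered into tempChars. A receiving routine in the spirit of the same Stack Overflow pattern could look like the sketch below, where btSerial stands in for the AltSoftSerial link and the buffer size is an arbitrary choice of mine:

const byte numChars = 16;
char receivedChars[numChars]; // raw packet contents
char tempChars[numChars];     // working copy handed to strtok()
boolean newData = false;

void recvWithStartEndMarkers() { // collect everything between '<' and '>'
  static boolean recvInProgress = false;
  static byte ndx = 0;
  char rc;
  while (btSerial.available() > 0 && newData == false) {
    rc = btSerial.read();
    if (recvInProgress) {
      if (rc != '>') {
        receivedChars[ndx] = rc;
        if (ndx < numChars - 1) ndx++;
      } else {
        receivedChars[ndx] = '\0'; // terminate the string
        recvInProgress = false;
        ndx = 0;
        newData = true;
      }
    } else if (rc == '<') {
      recvInProgress = true;
    }
  }
}

// in loop(): when newData is true, strcpy(tempChars, receivedChars); parseData(); newData = false;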

Once the X_axis values were received correctly, they were mapped to the values acceptable by the ServoTimer2 library.

servoXData = X_axis * 100;                        // scale the -1..1 float into an integer range
servoXMove = map(servoXData, -50, 50, 750, 2250); // map to the pulse widths ServoTimer2 expects
servoX.write(servoXMove);                         // drive the servo
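
To make the mapping concrete: a head rotation that yields X_axis = 0.25 becomes servoXData = 25, and map(25, -50, 50, 750, 2250) returns 1875, a pulse width three quarters of the way across the servo’s range.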

First Win

After a thorough debugging session, I achieved the first milestone: the ability to control the servo motion by nodding my head. The Arduino’s response to my movements was smooth and almost instantaneous, thanks to the high frame rate. Incorporating the two additional axes into the system didn’t compromise performance. Managing the motion of three servos proved to be just as seamless as controlling a single one.

To further enhance the app’s functionality, I integrated additional controls. These new features included a real-time on-screen display of the axes’ values, adding an informative layer, and a button to toggle transmission to the Arduino on and off for more control during testing.

With the foundational setup proving successful, the project was ready to connect the app to one of the lab’s research robots.

The Robot

Selecting a robot partner

Milab, the research lab hosting the program, specializes in using various non-anthropomorphic (non-humanoid) robots for studies in Human-Robot interaction. Among these, the robot selected for our project, as recommended by my mentors, was called Magnetform. Designed by Iddo Wald and Prof. Oren Zuckerman, Magnetform was initially conceived as a “shape-change display toolkit” for soft materials.

Magnetform: A Shape-change Display Toolkit for Material-oriented Designers

The unique non-humanoid design of Magnetform presents an intriguing challenge: it requires translating head rotations into arm movements. This contrasts with humanoid robots, where mimicking head movements would be more intuitive. With non-humanoid robots like Magnetform, creating empathic and synchronized interactions demands a more creative approach. Studying these challenges and exploring innovative ways to bridge this gap is a key area of research at Milab.

Structure

The Magnetform robot uses an Arduino Uno board with a PCA9685 16-channel 12-bit PWM Servo Motor Driver. This combination controls the eight servos operating the movements of its four arms. The robot’s symmetric design offers a unique advantage — it can be positioned in various configurations, effectively altering the interaction context.

At the tip of each arm, there’s a disk-shaped magnet. These magnets manipulate materials placed on the robot’s top cover, adding a dynamic and interactive element to its operation.

PWM Servo Motor Driver powering the eight servos, connected to an Arduino Uno below.

Integration

To integrate with the Magnetform, several modifications were necessary in its Arduino setup. First, I incorporated the HM-10 Bluetooth (BT) module into the existing system. Then, I replaced the ServoTimer2 library with the Adafruit PWMServoDriver library to address the Servo Motor Driver. Following this, I mapped the eight servo channels, assigning a movement algorithm to each of the robot’s four arms. These algorithms were designed to consider each arm’s location, movement direction, and motion range to ensure the arms do not collide. Additionally, I added a variable to regulate the overall motion speed, allowing for smoother and more controlled movements.
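
As a rough sketch of what that driver swap involves, the PCA9685 is initialized once and each servo is then addressed by channel; the 50 Hz frequency and the example pulse value here are illustrative assumptions, not the Magnetform’s actual tuning:

#include <Wire.h>
#include <Adafruit_PWMServoDriver.h>

Adafruit_PWMServoDriver pwm = Adafruit_PWMServoDriver(); // default I2C address 0x40

void setup() {
  pwm.begin();
  pwm.setPWMFreq(50); // a typical update rate for analog servos
}

void loop() {
  // setPWM(channel, on, off) drives one of the 16 outputs;
  // channels 0-7 correspond to the eight arm servos
  pwm.setPWM(0, 0, 400); // example: hold channel 0 near the middle of the 300-500 range used below
  delay(20);
}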

With these changes in place, I replaced the Magnetform’s original code with the updated version, setting the stage for the experiment.

The following is an example of the code for moving one arm:

servo1Move = map(constrainedInputAvg, minimumValue, maximumValue, 300, 500); // Map the incoming value to the PCA9685 pulse range
servo1AMove = map(constrainedInputX, minimumValue, threashold, 500, 300); // Reversed output range for the paired servo
pwm.setPWM(0, 0, servo1Move); // Send the current pulse to the servo on channel 0
pwm.setPWM(1, 0, servo1AMove); // ...and to its counterpart on channel 1

The constrained input variables are crucial: they confine the values received from the app to a predefined range, ensuring consistent and predictable movement:

int constrainedInputX;
constrainedInputX = constrain(servoXData, minimumValue, maximumValue);
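
This step matters because Arduino’s map() does not clamp its output: without constrain(), an out-of-range reading from the app would push the servo beyond the intended 300–500 window.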

With the core integration complete, I began experimenting with various movement combinations in response to head rotations. The latest configuration of this setup is demonstrated in the accompanying video.


The final additions to the app were features to record and save head rotation data to a file alongside controls for starting, stopping, and clearing the recorded data. A play/stop button was also included. These enhancements mean the app can stream live data or replay a recorded sequence of movements, opening up opportunities for creating interactive experiences with the Magnetform.

Recording and Playing features added to the final version before testing

Testing and Refinement

After the integration was complete, the robot was ready to be evaluated. With the help of the project’s mentors, we planned a short evaluation protocol of the two operation modes available in the app — the live streaming and the prerecorded movements. Our objective was to gauge the user-friendliness and intuitiveness of the technical setup and to determine user preferences between the two modes and their reasons for such preferences. To add relevance and depth to our experiment, we decided to take advantage of the primary purpose of headphones and use music as a key element of the test.

The participants were asked to wear the AirPods and listen to two tracks, each paired with one of the two modes.

When streaming live, the participant reacted to the music by nodding their head, which prompted a corresponding action from the robot. In contrast, in the prerecorded mode, the robot automatically executed a preset sequence of head movements in sync with the music as soon as the track began playing, encouraging the participant to react.

Two of the six participants during the evaluation

*The AirPods were inspected and sanitized for health and safety reasons before each use.

Following the evaluation with six participants, we observed positive reactions for both modes of operation. Interestingly, the prerecorded motion condition edged out slightly in preference. However, we encountered a technical snag: some participants only recognized the robot’s responses to their head movements after being informed about this feature. Once aware, their feedback turned notably positive.

This initial feedback highlights an area for improvement: making the interaction more discoverable, either through technical tweaks or protocol changes. Although the direct responses to the robot’s interactions are still inconclusive, the overall positive reactions towards both modes are encouraging. This paves the way for more in-depth experiments in the future, exploring head gesture interactions further.

Future Developments and Integration

The advent of AI technology reveals exciting new possibilities for this project. By integrating AI, we can now attempt to translate head movements into meaningful gestures. We can achieve this by training an AI model with data collected from the headphones. Combined with advanced speech recognition models, we can create a nuanced correlation between head gestures and their conversational context. This is done by recording both the head movements and the speech simultaneously using the headphones’ sensors and microphone.

Smartphones might be reaching a plateau in terms of user experience. The interaction potential with flat user interfaces and buttons has its limits. Looking ahead, we envision a future where specialized AI devices, such as the Rabbit R1, AI-enhanced household appliances, and interactive social robots, will incorporate head gesture recognition models. This integration promises to make interactions more natural and intuitive, enhancing the user experience.

*I would like to mention that my programming skills are self-taught and coding isn’t my profession. The article focuses mainly on the process and the challenges, and less on code and refined development assets.

Links & References

Schmidt, A. (2014, January 1). Context-Aware Computing. Interaction Design Foundation — IxDF. https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/context-aware-computing-context-awareness-context-aware-user-interfaces-and-implicit-interaction
C. Morimoto, Y. Yacoob and L. Davis. 1996. Recognition of head gestures using hidden Markov models. In Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, pp. 461–465 vol. 3. doi: 10.1109/ICPR.1996.546990. https://ieeexplore.ieee.org/document/546990
Understanding assistive technologies: What is a head mouse system? (2023). https://www.boia.org/blog/understanding-assistive-technologies-what-is-a-head-mouse-system
Louis-Philippe Morency, Candace Sidner, Christopher Lee, and Trevor Darrell. 2005. Contextual recognition of head gestures. In Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI '05). Association for Computing Machinery, New York, NY, USA, 18–24. https://dl.acm.org/doi/10.1145/1088463.1088470
Morency, L.-P., Sidner, C., Lee, C., & Darrell, T. (2007). Head gestures for perceptual interfaces: The role of context in improving recognition. Artificial Intelligence, 171(8), 568–585. doi:10.1016/j.artint.2007.04.003. https://www.sciencedirect.com/science/article/pii/S0004370207000641
Apple Developer. CMHeadphoneMotionManager. https://developer.apple.com/documentation/coremotion/cmheadphonemotionmanager
DSD TECH. HM-10 Bluetooth Module Datasheet. https://people.ece.cornell.edu/land/courses/ece4760/PIC32/uart/HM10/DSD%20TECH%20HM-10%20datasheet.pdf
Anastasia Devana. Headphone Motion Unity Plugin. https://github.com/anastasiadevana/HeadphoneMotion
Tony Abou Zaidan. Arduino Bluetooth Plugin. https://assetstore.unity.com/packages/tools/input-management/arduino-bluetooth-plugin-98960
Stack Overflow. Arduino Serial parsing. https://stackoverflow.com/questions/49625253/arduino-serial-parsing
Rostislav Varzar. track_robot: Solved problem with Bluetooth and servos. https://github.com/vrxfile/track_robot
Adafruit. Adafruit PCA9685 PWM Servo Driver Library. https://github.com/adafruit/Adafruit-PWM-Servo-Driver-Library
Iddo Yehoshua Wald and Oren Zuckerman. 2021. Magnetform: a Shape-change Display Toolkit for Material-oriented Designers. In Proceedings of the Fifteenth International Conference on Tangible, Embedded, and Embodied Interaction (TEI '21). Association for Computing Machinery, New York, NY, USA, Article 90, 1–14. https://doi.org/10.1145/3430524.3446066
Iddo Wald, Yoav Orlev, Andrey Grishko, and Oren Zuckerman. 2017. Animating Matter: Creating Organic-like Movement in Soft Materials. In Proceedings of the 2017 ACM Conference Companion Publication on Designing Interactive Systems (DIS '17 Companion). Association for Computing Machinery, New York, NY, USA, 84–89. https://doi.org/10.1145/3064857.3079124

