In the mid-1990s a few curious people in the AV industry were asking, “Why are there so many bad user interfaces?” At the time it seemed that the only guidance was, “If you can put the interface in front of your mom and she can figure it out, it must be good enough.” In reality, when our technical biases (a.k.a. the “curse of knowledge”) didn’t limit the empathy and observational skills we needed to create good user interfaces, our lack of graphic design and information architecture skills did.

Now I look around at the industry and see so much more awareness of design practices. Writers like Donald Norman, Alan Cooper, Edward Tufte, and many others have provided guidance on creating usable interfaces. In the past decade Apple taught the world how to use and design a touchscreen UI. AV interface designers who employ known design practices and leverage mobile user interface standards are creating GUIs that are better than ever.

Now we’re living through another interface revolution. Interaction options have expanded rapidly. We chat with chatbots, we speak to devices—and they talk back. We gesture to screens; computers with “vision” see and recognize us. And we still have the classics—touchscreen interfaces. Recently, I was asked to give a lecture about interface design in the age of the Internet of Things, and, as usual, I said yes to the opportunity before knowing what I was going to say (I recommend this as a strategy for life, in general). When I started sorting through all of these new options, I felt like a beginner again. I am convinced that none of the newer options will entirely supplant what we’ve been doing, but there seems to be little guidance about which interaction to use when. Then I began to realize that these new interactions mirror many human senses and capabilities. This is the start of a better framework for designing with these new tools: thoroughly analyze human interaction and map the appropriate interface mode to it, creating the most natural dialog between systems and humans.

In Werner Herzog’s new movie, Lo and Behold, he provides an outsider’s perspective of the internet and takes us on an interesting journey. One of the people in the movie is internet pioneer Leonard Kleinrock, who near the end of the movie gestures to the room around him and says, “This room should know I’m here. I should be able to talk to it, it should be able to give me an answer…with speech…with a display, in a very natural way. I should be able to use gestures and touch and even smell and all my senses to interact in a very humanistic way with this technology around me.” This reinforced the way I have been thinking about the goal of new user interface design.

Today we should not be asking, “How should we lay out this touchscreen?” We should be asking, “What are the users’ goals? How do people most naturally achieve those goals (with or without technology)? What sort of interaction will allow them to reach their goals with the least amount of friction?” The actual mode of interaction—whether it is voice commands, gesture, touch, etc.—should vary and be guided by how humans have interacted with the world around them for thousands of years.

Kleinrock goes on to say, “And once that technology is added to our physical world and becomes embedded in our walls, in our desks, in our bodies, in our fingernails, in our cars, in our offices, in our homes, it should disappear and become invisible.” The goal of interface design has always been to achieve a certain type of natural interaction. Many people have referred to a good GUI design as “intuitive,” but touchscreens have never provided a completely natural mode of interaction. As we improve our ability to design multi-modal interface dialogs, we may finally approach interactions that deserve to be called intuitive.

Modes of Interaction
Although it may be some time before we figure out how to use taste and smell in the design of system interfaces, the following modes of interaction can be considered for designing dialogs with buildings:

Sight: Computer vision can provide many types of input for the designer. Systems like Leap Motion, Microsoft’s Kinect, and Google’s emerging Soli technology all allow some level of gestural control of systems. Facial recognition can provide a way of determining who people are, and infrared vision can tell you where people are located, even in the dark.

Sound: Sound can serve both as human input in the form of speech recognition (computer ears) and as system feedback, whether simple bleeps and blips or fully conversant systems such as Apple’s Siri and Amazon’s Echo (computer voice).

Touch: Touchscreens are much better than they used to be, as mobile devices develop the ability to read more gestures, adapt to finger or stylus control, and provide better haptic feedback. These advanced interactions will be expected on every display in the future, and designers will need to implement these features in larger, embedded touchscreen environments.

Facial Expressions: Donald Norman’s book, Turn Signals Are the Facial Expressions of Automobiles, provides good guidance as we create feedback mechanisms for any system. Red colors on a display may indicate a type of system “anger,” while cool colors may express that the system is “calm.”

Thought: Artificial intelligence is a tool that will eventually be common within the built environment. Amazon’s Alexa and Google’s Nest have given us a small glimpse into the possibilities of employing AI within buildings—and this is just the beginning.

Paul Chavez (pchavez@harman.com) is the director of systems applications for Harman Pro Group. He is a usability evangelist and a futurologist. Chavez has designed a variety of audiovisual systems ranging from themed attractions to super yachts. He has also taught and written on the topics of interaction design, audiovisual design, and networking.