IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces


IUI4EUD: intelligent user interfaces for end-user development

End-User Developers program to meet some goal other than the code itself. This includes scientists, data analysts, and the general public when they write code. We have been working for many years on various ways to make end-user development more successful. In this talk, I will focus on two new projects where we are applying intelligent user interfaces to this long-standing challenge. In Sugilite, the user can teach an intelligent agent new skills interactively, using the user interfaces of relevant smartphone apps, through a combination of programming by example (PBE) and natural language instructions. For instance, a user can teach Sugilite how to order the cheaper car between Uber and Lyft, even though Sugilite has no access to their APIs, no knowledge about the task domain, and no understanding of the concept "cheap" in advance. Another project, called Verdant, focuses on helping data scientists, including those using Machine Learning and AI, to do exploratory programming. Verdant supports micro-versioning in computational notebooks for understanding the differences between the outputs and code of different versions, backtracking, tracing the provenance of output back to its code, and searching the history. A goal for Verdant is to intelligently organize and summarize the raw history data to help data scientists make effective choices from it.

Explaining AI: fairly? well?

Explainable AI (XAI) has started experiencing explosive growth, echoing the explosive growth that preceded it: AI being used for practical purposes that impact the general public. This spread of AI into the world outside of research labs brings with it pressures and requirements that many of us have perhaps not thought about deeply enough. In this keynote address, I will explain why I think we have a very long way to go.

One way to characterize our current state is that we're doing "fairly well", doing some explaining of some things. In a sense, this is reasonable: the XAI field is young, and still finding its way. However, moving forward demands progress in (at least) three areas.

(1) How we go about XAI research: Explainable AI cannot succeed if the only research foundations brought to bear on it are AI foundations. Likewise, it cannot succeed if the only foundations used are from psychology, education, etc. Thus, a challenge for our emerging field is how to conduct XAI research in a truly effective multi-disciplinary fashion, one that integrates the foundations of what we can make AI algorithms do with solid, well-founded principles for explaining the complex ideas behind those algorithms to real people. Fortunately, a few researchers have started to build such foundations.

(2) What we can succeed at explaining: So far, we as a field are doing a certain amount of cherry picking as to what we explain. We tend to choose what to explain by what we can figure out how to explain---but we are leaving too much out. One urgent case in point is the societal and legal need to explain fairness properties of AI systems.

The above challenges are important, but the field is already becoming aware of them. Thus, this keynote will focus mostly on the third challenge, namely:

(3) Who we can explain to. Who are the people we've even tried to explain AI to, so far? What are the societal implications of who we explain to well and who we do not?

Our field has not even begun to consider this question. In this keynote I'll discuss why we have to explain to populations to whom we've given little thought---diverse people in many dimensions, including gender diversity, cognitive diversity, and age diversity.

Addressing all of these challenges is necessary before we can claim to explain AI fairly and well.

Humans and robots together: engineering sociality and collaboration

Intelligent agents and robots will become part of our daily lives. As they do, they will not only carry out specific tasks in the environment but also partner with us socially and collaboratively. Groups of social robots may interact with groups of humans performing joint activities. Yet, research related to hybrid groups of humans and robots is still limited. What does it mean to be part of a hybrid group? Can social robots team up with humans? How do humans respond to social robots as partners? Do humans trust them? To research these questions we need a deep understanding of how robots can interact socially in groups. That involves being able to identify and characterize group members, evaluate the dependencies between the behaviors of different members, understand and consider different roles, and infer the dynamics of interactions within a group, drawing on a common past to build an anticipated future.

In this talk I will discuss how to engineer social robots that act autonomously as members of a group collaborating with both humans and other robots. Different aspects need to be considered: what types of interaction mechanisms need to be built into social robots so they can act in groups; how to build the capabilities that make them able to perceive, identify, and understand the human members of a group; how to model and retain information about past shared experiences of a group; how to model and adapt to the emotions of a group; and how to allow a robot to respond to the dynamics of social interactions [Correia et al. 2019].

I will start by providing an overview of recent research in social human-robot teams, and present different scenarios to illustrate the work. I will explore applications in the context of education [Alves-Oliveira et al. 2019] and entertainment [Correia et al. 2016] [Oliveira et al. 2018], describing how autonomous robotic partners may take into account the different preferences and characteristics of the humans, and act in a social manner that respects the group's characteristics [Correia et al. 2018].

SESSION: Intelligent support

Understanding the effectiveness of adaptive guidance for narrative visualization: a gaze-based analysis

We study the effectiveness of adaptive guidance at helping users process textual documents with embedded visualizations, known as narrative visualizations. We do so by leveraging eye tracking to analyze in depth the effect that adaptations meant to guide the user's gaze to relevant parts of the visualizations have on users with different levels of visualization literacy. Results indicate that the adaptations succeed in guiding attention to salient components of the narrative visualizations, especially by generating more transitions between key components of the visualization (i.e., datapoints, labels, and legend). We also show that the adaptation helps users with lower levels of visualization literacy to better map datapoints to the legend, which leads in part to improved comprehension of the visualization. These findings shed light on how adaptive guidance helps users with different levels of visualization literacy, informing the design of personalized narrative visualizations.

Towards making videos accessible for low vision screen magnifier users

People with low vision who use screen magnifiers to interact with computing devices find it very challenging to interact with dynamically changing digital content such as videos, since they do not have the luxury of time to manually move, i.e., pan the magnifier lens to different regions of interest (ROIs) or zoom into these ROIs before the content changes across frames.

In this paper, we present SViM, a first-of-its-kind screen-magnifier interface for such users that leverages advances in computer vision, particularly video saliency models, to identify salient ROIs in videos. SViM's interface allows users to zoom in and out of any point of interest and switch between ROIs via mouse clicks, and provides assistive panning with the added flexibility of letting the user explore regions of the video beyond the ROIs identified by SViM.

A subjective and objective evaluation in a user study with 13 low vision screen magnifier users revealed that, overall, the participants had a better user experience with SViM than with extant screen magnifiers, indicative of its promise and potential for making videos accessible to low vision screen magnifier users.

VASTA: a vision and language-assisted smartphone task automation system

We present VASTA, a novel vision and language-assisted Programming By Demonstration (PBD) system for smartphone task automation. Development of a robust PBD automation system requires overcoming three key challenges: first, how to make a particular demonstration robust to positional and visual changes in the user interface (UI) elements; second, how to recognize changes in the automation parameters to make the demonstration as generalizable as possible; and third, how to recognize from the user utterance what automation the user wishes to carry out. To address the first challenge, VASTA leverages state-of-the-art computer vision techniques, including object detection and optical character recognition, to accurately label interactions demonstrated by a user, without relying on the underlying UI structures. To address the second and third challenges, VASTA takes advantage of advanced natural language understanding algorithms for analyzing the user utterance to trigger the VASTA automation scripts, and to determine the automation parameters for generalization. We run an initial user study that demonstrates the effectiveness of VASTA at clustering user utterances, understanding changes in the automation parameters, detecting desired UI elements, and, most importantly, automating various tasks. A demo video of the system is available here: http://y2u.be/kr2xE-FixjI.

An eye gaze-driven metric for estimating the strength of graphical passwords based on image hotspots

In this paper, we propose an eye gaze-driven metric based on hotspot vs. non-hotspot segments of images for unobtrusively estimating the strength of user-created graphical passwords by analyzing the users' eye gaze behavior during password creation. To examine the feasibility of this method, i.e., the existence of correlation between the proposed metric and the strength of users' generated passwords, we conducted an eye-tracking study (n=42), in which users created a graphical password with a personalized image that triggers declarative memory of users (familiar image) vs. an image illustrating generic content unfamiliar to the users' episodic and semantic memory (generic image). Results revealed a strong positive correlation between the password strength and the proposed eye gaze-driven metric, pointing towards a new direction for the design of intelligent eye gaze-driven graphical password schemes for unobtrusively assisting users in making better password choices.
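As a purely hypothetical illustration (the abstract does not specify the metric's actual form), a gaze-driven estimate of this kind could, for instance, relate fixation time on hotspot segments to total fixation time during password creation:

```latex
m = \frac{\sum_{f \in F_{\mathrm{hotspot}}} d_f}{\sum_{f \in F} d_f}
```

Here F would be the set of fixations recorded during password creation, F_hotspot those landing on hotspot segments, and d_f each fixation's duration; the authors' actual formulation may differ.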

WordBlender: principles and tools for generating word blends

Combining text and images is a powerful strategy in graphic design because images convey meaning faster, but text conveys more precise meaning. Word blends are a technique to combine both elements in a succinct yet expressive image. In a word blend, a letter is replaced by a symbol relevant to the message. This is difficult because the replacement must look blended enough to be readable, yet distinct enough for the symbol to be recognized. Currently, there are no known design principles for finding aesthetically pleasing word blends. To establish these principles, we run two experiments and find that to be readable, the object should have a similar shape to the letter. However, to be aesthetically pleasing, the font should match some of the secondary features of the image: color, style, and thickness. We present WordBlender, an AI-powered design tool to quickly and easily create word blends based on these visual design principles. WordBlender automatically generates shape-based matches and allows users to explore combinations of color, style, and font that improve the design of blends.

SESSION: Active & interactive ML

iSeqL: interactive sequence learning

Exploratory analysis of unstructured text is a difficult task, particularly when defining and extracting domain-specific concepts. We present iSeqL, an interactive tool for the rapid construction of customized text mining models through sequence labeling. With iSeqL, analysts engage in an active learning loop, labeling text instances and iteratively assessing trained models by viewing model predictions in the context of both individual text instances and task-specific visualizations of the full dataset. To build suitable models with limited training data, iSeqL leverages transfer learning and pre-trained contextual word embeddings within a recurrent neural architecture. Through case studies and an online experiment, we demonstrate the use of iSeqL to quickly bootstrap models sufficiently accurate to perform in-depth exploratory analysis. With less than an hour of annotation effort, iSeqL users are able to generate stable outputs over custom extracted entities, including context-sensitive discovery of phrases that were never manually labeled.
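As a rough sketch of the kind of recurrent tagging architecture the abstract describes (not the authors' implementation; the embedding dimension, label set, and framework below are assumptions), a per-token classifier can be run over frozen pre-trained contextual embeddings:

```python
# Minimal sketch, assuming pre-computed contextual embeddings from a frozen encoder.
import torch
import torch.nn as nn

class SequenceTagger(nn.Module):
    def __init__(self, emb_dim=768, hidden=128, num_labels=3):  # e.g. O/B/I labels
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_labels)

    def forward(self, embeddings):          # embeddings: (batch, seq_len, emb_dim)
        states, _ = self.lstm(embeddings)   # (batch, seq_len, 2*hidden)
        return self.head(states)            # per-token label logits

tagger = SequenceTagger()
fake_batch = torch.randn(2, 10, 768)        # 2 sentences, 10 tokens each
logits = tagger(fake_batch)
loss = nn.CrossEntropyLoss()(logits.view(-1, 3), torch.zeros(20, dtype=torch.long))
loss.backward()                             # compute gradients for one training step
```

In an active-learning loop of the kind described, such a model would be retrained after each small batch of newly labeled spans.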

Dynamic word recommendation to obtain diverse crowdsourced paraphrases of user utterances

Building task-oriented bots requires mapping a user utterance to an intent with its associated entities to serve the request. Doing so is not easy since it requires large quantities of high-quality and diverse training data to learn how to map all possible variations of utterances with the same intent. Crowdsourcing may be an effective, inexpensive, and scalable technique for collecting such large datasets. However, the diversity of the results suffers from the priming effect (i.e., workers are more likely to reuse the words in the sentence we ask them to paraphrase). In this paper, we leverage priming as an opportunity rather than a threat: we dynamically generate word suggestions to motivate crowd workers towards producing diverse utterances. The key challenge is to make suggestions that can improve diversity without resulting in semantically invalid paraphrases. To achieve this, we propose a probabilistic model that generates continuously improved versions of word suggestions that balance diversity and semantic relevance. Our experiments show that the proposed approach improves the diversity of crowdsourced paraphrases.
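To make the diversity-relevance trade-off concrete, here is a deliberately simplified scoring sketch; it is not the paper's probabilistic model, and the weighting scheme and inputs are assumptions:

```python
# Illustrative only: trade off semantic relevance to the seed utterance against
# novelty with respect to words crowd workers have already used.
from collections import Counter

def score_candidates(candidates, relevance, used_words, alpha=0.7):
    """relevance: dict word -> similarity to the seed utterance (e.g. embedding cosine);
    used_words: Counter of words seen in collected paraphrases so far."""
    total = sum(used_words.values()) or 1
    scores = {}
    for w in candidates:
        novelty = 1.0 - used_words[w] / total       # rarely-used words score higher
        scores[w] = alpha * relevance.get(w, 0.0) + (1 - alpha) * novelty
    return sorted(scores, key=scores.get, reverse=True)

used = Counter({"book": 5, "flight": 4, "ticket": 1})
print(score_candidates(["reserve", "ticket", "plane"],
                       {"reserve": 0.8, "ticket": 0.9, "plane": 0.7}, used))
```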

Investigating audio data visualization for interactive sound recognition

Interactive machine learning techniques have great potential to personalize media recognition models for each individual user by letting them browse and annotate a large amount of training data. However, graphical user interfaces (GUIs) for interactive machine learning have been mainly investigated in image and text recognition scenarios, not in other data modalities such as sound. In a scenario where users browse a large number of audio files to search for and annotate target samples corresponding to their own sound recognition classes, it is difficult for them to easily navigate the overall sample structure due to the non-visual nature of audio data. In this work, we investigate design issues for interactive sound recognition by comparing different visualization techniques ranging from audio spectrograms to deep learning-based audio-to-image retrieval. Based on an analysis of a user study, we clarify the advantages and disadvantages of audio visualization techniques, and provide design implications for interactive sound recognition GUIs using a massive amount of audio samples.

Interactive deep singing-voice separation based on human-in-the-loop adaptation

This paper presents a deep-learning-based interactive system separating the singing voice from input polyphonic music signals. Although deep neural networks have been successful for singing voice separation, no approach using them allows any user interaction for improving the separation quality. We present a framework that allows a user to interactively fine-tune the deep neural model at run time to adapt it to the target song. This is enabled by designing unified networks consisting of two U-Net architectures based on frequency spectrogram representations: one for estimating the spectrogram mask that can be used to extract the singing-voice spectrogram from the input polyphonic spectrogram; the other for estimating the fundamental frequency (F0) of the singing voice. Although it is not easy for the user to edit the mask, he or she can iteratively correct errors in part of the visualized F0 trajectory through simple interaction. Our unified networks leverage the user-corrected F0 to improve the rest of the F0 trajectory through the model adaptation, which results in better separation quality. We validated this approach in a simulation experiment showing that the F0 correction can improve the quality of singing-voice separation. We also conducted a pilot user study with an expert musician, who used our system to produce a high-quality singing-voice separation result.

Contextual multi-armed bandit strategies for diagnosing post-harvest diseases of apple

This paper describes a decision support system for the diagnosis of post-harvest diseases in apple fruit. The diagnostic system builds on user feedback elicitation via multiple conversational rounds. The interaction with the user is conducted by selecting and displaying images depicting symptoms of diseased apples. The system is able to adapt the image selection mechanism by exploiting previous user feedback through a contextual multi-armed bandit approach. We performed a large-scale user experiment in which different strategies for image selection were compared in order to identify which reloading strategy makes users more effective in the diagnosis task. Concretely, we compared contextual multi-armed bandit methods with two baseline strategies and found that the exploration-exploitation principle significantly paid off in comparison to a greedy and to a random stratified selection strategy. Although the application context is very domain specific, we believe that the general methodology of conversational item selection for eliciting user preferences applies to other scenarios in decision support and recommendation systems.
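For readers unfamiliar with contextual bandits, the sketch below shows LinUCB, one standard strategy that balances exploration and exploitation; the paper's exact algorithm, features, and reward signal are not specified here, so everything below is an assumed stand-in:

```python
# Minimal LinUCB sketch. Each "arm" stands for an image that could be shown next;
# the context vector would encode the dialogue state / previous feedback.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]        # per-arm ridge matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)                               # exploit + exploration bonus
        return int(np.argmax(scores))

    def update(self, arm, context, reward):                  # reward: e.g. user confirms a match
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

bandit = LinUCB(n_arms=4, dim=8)
ctx = np.random.rand(8)
arm = bandit.select(ctx)
bandit.update(arm, ctx, reward=1.0)
```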

SESSION: Gestures & accessibility

Pose2Pose: pose selection and transfer for 2D character animation

An artist faces two challenges when creating a 2D animated character to mimic a specific human performance. First, the artist must design and draw a collection of artwork depicting portions of the character in a suitable set of poses, for example arm and hand poses that can be selected and combined to express the range of gestures typical for that person. Next, to depict a specific performance, the artist must select and position the appropriate set of artwork at each moment of the animation. This paper presents a system that addresses these challenges by leveraging video of the target human performer. Our system tracks arm and hand poses in an example video of the target. The UI displays clusters of these poses to help artists select representative poses that capture the actor's style and personality. From this mapping of pose data to character artwork, our system can generate an animation from a new performance video. It relies on a dynamic programming algorithm to optimize for smooth animations that match the poses found in the video. Artists used our system to create four 2D characters and were pleased with the final automatically animated results. We also describe additional applications addressing audio-driven or text-based animations.
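The abstract only says that a dynamic programming algorithm trades off pose fidelity against smoothness; a generic Viterbi-style formulation of that trade-off (with placeholder costs, not the authors' actual objective) might look like:

```python
# Sketch: pick one artwork pose per frame so it matches the tracked pose while
# discouraging jittery switches between consecutive frames.
import numpy as np

def assign_artwork(match_cost, switch_penalty=1.0):
    """match_cost[t, k]: cost of showing artwork pose k at frame t."""
    T, K = match_cost.shape
    dp = match_cost[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        new_dp = np.empty(K)
        for k in range(K):
            trans = dp + switch_penalty * (np.arange(K) != k)  # cost of switching artwork
            back[t, k] = int(np.argmin(trans))
            new_dp[k] = trans[back[t, k]] + match_cost[t, k]
        dp = new_dp
    path = [int(np.argmin(dp))]            # backtrack the cheapest pose sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

costs = np.random.rand(5, 3)               # 5 frames, 3 candidate artwork poses
print(assign_artwork(costs))
```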

Personalized gestural interaction applied in a gesture interactive game-based approach for people with disabilities

Technology can support people with disabilities to participate in social and economic life. Through relevant Human-Computer Interaction approaches, such as Intelligent User Interfaces, people with motor and speech impairments may be able to communicate in different ways. Augmentative and Alternative Communication supported by Computer Vision systems can benefit from the recognition of users' remaining functional motions as an alternative interaction design approach. Based on a methodology in which gestures and their meanings are created and configured by users and their caregivers, we developed a Computer Vision system, named PGCA, that employs machine learning techniques to create personalized gestural interaction as an Assistive Technology resource for communication purposes. Using a low-cost approach, PGCA has been used with students with motor and speech impairments to create personalized gesture datasets and to identify improvements to the system. This paper presents an experiment carried out with the target audience using a game-based approach, in which three students used PGCA to interact with communication boards and to play a game. The system was evaluated by special education professionals using the System Usability Scale and was considered suitable for its purpose. Results from the experiment suggest the technical feasibility of the methodology and the system, and add knowledge about how people with disabilities interact with a game.

SaIL: saliency-driven injection of ARIA landmarks

Navigating webpages with screen readers is a challenge even with recent improvements in screen reader technologies and the increased adoption of web standards for accessibility, namely ARIA. ARIA landmarks, an important aspect of ARIA, let screen reader users access different sections of a webpage quickly by enabling them to skip over blocks of irrelevant or redundant content. However, these landmarks are used sporadically and inconsistently by web developers, and in many cases are entirely absent from web pages. Therefore, we propose SaIL, a scalable approach that automatically detects the important sections of a web page, and then injects ARIA landmarks into the corresponding HTML markup to facilitate quick access to these sections. The central concept underlying SaIL is visual saliency, which is determined using a state-of-the-art deep learning model that was trained on gaze-tracking data collected from sighted users in the context of web browsing. We present the findings of a pilot study that demonstrated the potential of SaIL in reducing both the time and effort spent in navigating webpages with screen readers.
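Conceptually, the landmark-injection step can be pictured as follows; this is a hypothetical sketch (the selector-to-saliency mapping, threshold, and labels are invented for illustration and are not SaIL's actual pipeline):

```python
# Given saliency scores for candidate page sections, mark the most salient ones as
# ARIA landmarks so screen-reader users can jump to them directly.
from bs4 import BeautifulSoup

def inject_landmarks(html, saliency, top_k=3):
    """saliency: dict mapping a CSS selector to a visual-saliency score."""
    soup = BeautifulSoup(html, "html.parser")
    ranked = sorted(saliency, key=saliency.get, reverse=True)[:top_k]
    for i, selector in enumerate(ranked, start=1):
        for el in soup.select(selector):
            el["role"] = "region"                       # generic ARIA landmark role
            el["aria-label"] = f"Important section {i}"
    return str(soup)

html = "<div id='hero'>Breaking news</div><div id='ads'>...</div>"
print(inject_landmarks(html, {"#hero": 0.9, "#ads": 0.1}, top_k=1))
```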

SESSION: Natural language interfaces

User-in-the-loop adaptive intent detection for instructable digital assistant

People are becoming increasingly comfortable using Digital Assistants (DAs) to interact with services or connected objects. However, for non-programming users, the available possibilities for customizing their DA are limited and do not include the possibility of teaching the assistant new tasks. To make the most of the potential of DAs, users should be able to customize assistants by instructing them through Natural Language (NL). To provide such functionalities, NL interpretation in traditional assistants should be improved: (1) The intent identification system should be able to recognize new forms of known intents, and to acquire new intents as they are expressed by the user. (2) In order to be adaptive to novel intents, the Natural Language Understanding module should be sample efficient, and should not rely on a pretrained model. Rather, the system should continuously collect training data as it learns new intents from the user. In this work, we propose AidMe (Adaptive Intent Detection in Multi-Domain Environments), a user-in-the-loop adaptive intent detection framework that allows the assistant to adapt to its user by learning their intents as the interaction progresses. AidMe builds its repertoire of intents and collects data to train a model of semantic similarity evaluation that can discriminate between the learned intents and autonomously discover new forms of known intents. AidMe addresses two major issues - intent learning and user adaptation - for instructable digital assistants. We demonstrate the capabilities of AidMe as a standalone system by comparing it with a one-shot learning system and a pretrained NLU module through simulations of interactions with a user. We also show how AidMe can smoothly integrate into an existing instructable digital assistant.

Predictive text encourages predictable writing

Intelligent text entry systems, including the now-ubiquitous predictive keyboard, can make text entry more efficient, but little is known about how these systems affect the content that people write. To study how predictive text systems affect content, we compared image captions written with different kinds of predictive text suggestions. Our key findings were that captions written with suggestions were shorter and that they included fewer words that the system did not predict. Suggestions also boosted text entry speed, but with diminishing benefit for faster typists. Our findings imply that text entry systems should be evaluated not just by speed and accuracy but also by their effect on the content written.

Speech modulated typography: towards an affective representation model

The transcription of expressive speech into text is a lossy process, since traditional textual resources are typically not capable of fully representing prosody. Speech modulated typography aims to narrow the gap between expressive speech and its textual transcription, with potential applications in affect-sensitive text interfaces, closed-captioning, and automated voice transcriptions. This paper proposes and evaluates two different representation models of prosody-related acoustic features of expressive speech mapped as axes of a variable font. Our experiment tested participants' preferences for four of these modulations: font weight, letter width, letter slant, and baseline shift. Each of these represented utterances expressed in one of five emotions (anger, happiness, neutrality, sadness, and surprise). Participants preferred font weight for sentences spoken with intensity and baseline shift for quieter utterances. In both cases, the distance between each syllable's fundamental frequency and centroid frequency was a good predictor of these preferences.

SESSION: Recommender systems 1

Recommendation for video advertisements based on personality traits and companion content

People encounter video ads every day when they access online content. While ads can be annoying or greeted with resistance, they can also be seen as informative and enjoyable. We asked the question, what might make an ad more enjoyable? And, do people with different personality traits prefer to watch different ads --- could it be possible to better match ads and people? To answer these questions, we conducted an online study where we asked people to watch video ads of different emotional sentiments. We also measured their personality traits through an online survey. We found that the sentiment of people's preferred video ads varies significantly based on their personality traits. Additionally, we investigated how, when these ads are accompanied by other content, the emotional state induced by that accompanying content affects people's ad preferences. We found that there was a complex relationship between the emotional state induced by accompanying content and people's ad preference when an ad highlighted either an alertness or calmness sentiment. However, when an ad highlighted activeness and amusement, the relationship was not significant. Overall, our results show that people's personality traits and their emotional states are two key elements that predict the tone of their preferred video ads.

Navigation-by-preference: a new conversational recommender with preference-based feedback

We present Navigation-by-Preference, n-by-p, a new conversational recommender that uses what the literature calls preference-based feedback. Given a seed item, the recommender helps the user navigate through item space to find an item that aligns with her long-term preferences (revealed by her user profile) but also satisfies her ephemeral, short-term preferences (revealed by the feedback she gives during the dialog). Different from previous work on preference-based feedback, n-by-p does not assume structured item descriptions (such as sets of attribute-value pairs) but works instead in the case of unstructured item descriptions (such as sets of keywords or tags), thus extending preference-based feedback to new domains where structured item descriptions are not available. Different too is that it can be configured to ignore long-term preferences or to take them into account, to work only on positive feedback or to also use negative feedback, and to take previous rounds of feedback into account or to use just the most recent feedback.

We use an offline experiment with simulated users to compare 60 configurations of n-by-p. We find that a configuration that includes long-term preferences, that uses both positive and negative feedback, and that uses previous rounds of feedback is the one with highest hit-rate. It also obtains the best survey responses and lowest measures of effort in a trial with real users that we conducted with a web-based system. Notable too is that the user trial has a novel protocol for experimenting with short-term preferences.
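To give a flavor of preference-based feedback over unstructured tag sets, here is a heavily simplified scoring sketch; it is not n-by-p itself, and the blending weight and set-overlap scoring are assumptions:

```python
# Rough sketch: score catalog items by overlap with tags of liked/disliked items
# (short-term preferences), blended with tags from the user profile (long-term).
def score_item(item_tags, liked_tags, disliked_tags, profile_tags, w_long=0.3):
    short_term = len(item_tags & liked_tags) - len(item_tags & disliked_tags)
    long_term = len(item_tags & profile_tags)
    return (1 - w_long) * short_term + w_long * long_term

catalog = {"movie_a": {"space", "drama"}, "movie_b": {"space", "comedy"}}
liked, disliked = {"space"}, {"comedy"}
profile = {"drama", "thriller"}
best = max(catalog, key=lambda m: score_item(catalog[m], liked, disliked, profile))
print(best)   # movie_a: matches short-term "space" and long-term "drama"
```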

Explaining recommendations by means of aspect-based transparent memories

Recommender Systems have seen substantial progress in terms of algorithmic sophistication recently. Yet, the systems mostly act as black boxes and are limited in their capacity to explain why an item is recommended. In many cases recommendation methods are employed in scenarios where users not only rate items, but also convey their opinion on various relevant aspects, for instance by means of textual reviews. Such user-generated content can serve as a useful source for deriving explanatory information to increase system intelligibility and, thereby, the user's understanding.

We propose a recommendation and explanation method that exploits the comprehensiveness of textual data to make the underlying criteria and mechanisms that lead to a recommendation more transparent. Concretely, the method incorporates neural memories that store aspect-related opinions extracted from raw review data. We apply attention mechanisms to transparently write and read information from memory slots.
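For reference, a soft-attention read over memory slots is conventionally written as below, where the slots M_j have keys k_j and q is the query; the paper's exact parameterization may differ:

```latex
\mathrm{read}(q, M) = \sum_{j} \alpha_j \, M_j,
\qquad
\alpha_j = \frac{\exp(q^\top k_j)}{\sum_{j'} \exp(q^\top k_{j'})}
```

The attention weights \alpha_j indicate which aspect-related memory slots contribute to a recommendation, which is what makes the read step transparent.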

Besides customary offline experiments, we conducted an extensive user study. The results indicate that our approach achieves a higher overall quality of explanations compared to a state-of-the-art baseline. Utilizing Structural Equation Modeling, we additionally reveal three linked key factors that constitute explanation quality: Content adequacy, presentation adequacy, and linguistic adequacy.

SESSION: Intelligent visualizations

With respect to what?: simultaneous interaction with dimension reduction and clustering projections

Direct manipulation interactions on projections are often incorporated in visual analytics applications. These interactions enable analysts to provide incremental feedback to the system in a semi-supervised manner, demonstrating relationships that the analyst wishes to find within the data. However, determining the precise intent of the analyst is a challenge. When an analyst interacts with a projection, the inherent ambiguity of some interactions leads to a variety of possible interpretations that the system could infer. Previous work has demonstrated the utility of clusters as an interaction target to address this "With Respect to What" problem in dimension-reduced projections. However, clusters bring interaction-inference challenges of their own. In this work, we discuss the interaction space for the simultaneous use of semi-supervised dimension reduction and clustering algorithms. Within this exploration, we highlight existing interaction challenges of such interactive analytical systems, describe the benefits and drawbacks of introducing clustering, and demonstrate a set of interactions from this space.

How do visual explanations foster end users' appropriate trust in machine learning?

We investigated the effects of example-based explanations for a machine learning classifier on end users' appropriate trust. We explored the effects of spatial layout and visual representation in an in-person user study with 33 participants. We measured participants' appropriate trust in the classifier, quantified the effects of different spatial layouts and visual representations, and observed changes in users' trust over time. The results show that each explanation improved users' trust in the classifier, and the combination of explanation, human, and classification algorithm yielded much better decisions than the human and classification algorithm separately. Yet these visual explanations lead to different levels of trust and may cause inappropriate trust if an explanation is difficult to understand. Visual representation and performance feedback strongly affect users' trust, and spatial layout shows a moderate effect. Our results do not support that individual differences (e.g., propensity to trust) affect users' trust in the classifier. This work advances the state-of-the-art in trust-able machine learning and informs the design and appropriate use of automated systems.

Towards visual debugging for multi-target time series classification

Multi-target classification of multivariate time series data poses a challenge in many real-world applications (e.g., predictive maintenance). Machine learning methods, such as random forests and neural networks, support training these classifiers. However, the debugging and analysis of possible misclassifications remain challenging due to the often complex relations between targets, classes, and the multivariate time series data. We propose a model-agnostic visual debugging workflow for multi-target time series classification that enables the examination of relations between targets, partially correct predictions, potential confusions, and the classified time series data. The workflow, as well as the prototype, aims to foster an in-depth analysis of multi-target classification results to identify potential causes of mispredictions visually. We demonstrate the usefulness of the workflow in the field of predictive maintenance in a usage scenario, showing how users can iteratively explore and identify critical classes, as well as relationships between targets.

An analytical framework for formulating metrics for evaluating multi-dimensional time-series data

This paper proposes a visual analytics framework for formulating metrics for evaluating multi-dimensional time-series data. Multi-dimensional time-series data have been collected and utilized in different domains. We believe evaluation metrics play an important role in utilizing such data, for example in decision making and in labeling training data used in machine learning. However, it is a difficult task even for domain experts to formulate metrics. To support the process of formulating metrics, the proposed framework represents a metric as a linear combination of data attributes, and provides a means for formulating it through interactive data exploration. A prototype interface that visualizes the target data as an animated scatter plot was implemented. Through this interface, several visualized objects can be directly manipulated: a node or a trajectory of an instance, and a convex hull representing a group of nodes and trajectories. The linear combination of attributes is adjusted according to the type of object the user manipulates. The effectiveness of the proposed framework was demonstrated through two application examples with real-world data.
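Concretely, a metric of the kind described can be written as a weighted sum of the instance's attributes, with the weights w_i being what the interactive manipulation adjusts:

```latex
m(x) = \sum_{i=1}^{d} w_i \, a_i(x)
```

Here a_i(x) denotes the i-th attribute of instance x and d the number of attributes; the framework's specific weight-update rules are not reproduced here.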

SESSION: Sketching

Evaluating new requirements to pen-centric intelligent user interface based on end-to-end mathematical expressions recognition

Advances in end-to-end deep learning recognition systems allow intelligent user interfaces to move forward. At the same time, they lead to new implications and restrictions in UI design. In this paper, we discuss new requirements for a pen-centric intelligent user interface for operating with mathematical expressions. We argue that the following features are necessary for such an interface: use of variables (both automatically assigned and user-defined), quick editing, and support for connected writing. We demonstrate and evaluate these features in SMath - a solution for iterative input and automatic calculation of handwritten mathematical expressions. It allows pen-based input and editing of two-dimensional mathematical expressions in various scenarios, with the possibility of reusing the results of previous calculations. Through the evaluation study, we justify the proposed requirements from the standpoint of user needs and preferences. In addition to the automatic metrics collected while participants carried out test tasks with the application, we gathered and summarized user feedback, which can be further used to formulate functional requirements and design the user interface. We also provide an evaluation of the recognition performance of the system on an open-source dataset of handwritten mathematical expressions.

Creative sketching partner: an analysis of human-AI co-creativity

The creative sketching partner (CSP) is a proof-of-concept intelligent interface to inspire designers while sketching in response to a specified design task. With this interactive system we are studying the effect of an AI model of visual and conceptual similarity for selecting the AI's sketch response as an inspiration to the current state of the user's sketch. Specifically, we are interested in the user's behavior and response to an AI partner when engaged in a design task. By developing deep learning models of the sketches from a large-scale dataset, the user can control the amount of visual and conceptual similarity of the AI response when requesting inspiration from the CSP. We conducted a study with 50 design students to examine the participants' interaction behavior and their self-reports. The participants' behavior maps into clusters that correlate with three types of design creativity: combinatorial, exploratory, and transformational. Our findings demonstrate that the tool can facilitate ideation and overcome design fixation. In addition, analysis suggests that inspiration related to conceptual similarity is more associated with transformational creativity, and inspiration related to visual similarity occurs more frequently during the detailed stages of design and is more prevalent with combinatorial creativity.

Recognizing perspective accuracy: an intelligent user interface for assisting novices

Sketching in perspective is a valuable skill for art, and for professional disciplines like industrial design, architecture, and engineering. However, it tends to be a difficult skill to grasp for novices. We have developed an algorithm that can classify rectilinear perspective strokes, as well as classify which of those strokes are accurate on a stroke-by-stroke basis. We also developed an intelligent user interface which can provide real-time accuracy feedback on a user's free-hand digital perspective sketch. To evaluate the system, we conducted a between-subjects user study with 40 novice participants which involved sketching city street corners in 2-point perspective. We discovered that the participants who received real-time intelligent feedback improved their perspective accuracy significantly (p < 0.0005) in a subsequent unassisted sketch, suggesting a mild learning effect and knowledge transfer. Through qualitative feedback we also derived principles for how intelligent real-time feedback can best assist novices in learning perspective.

Draw with me: human-in-the-loop for image restoration

The purpose of image restoration is to recover the original state of damaged images. To overcome the disadvantages of the traditional, manual image restoration process, such as the high time consumption and required domain knowledge, automatic inpainting methods have been developed. These methods, however, can have limitations for complex images and may require a lot of input data. To mitigate these limitations, we present "interactive Deep Image Prior", a combination of manual and automated, Deep-Image-Prior-based restoration in the form of an interactive process with the human in the loop. In this process a human can iteratively embed knowledge to provide guidance and control for the automated inpainting process. For this purpose, we extended Deep Image Prior with a user interface, which we subsequently analyzed in a user study. Our key question is whether the interactivity increases the restoration quality subjectively and objectively. Secondarily, we were also interested in how such a collaborative system is perceived by users.

Our evaluation shows that, even with very little human guidance, our interactive approach has a restoration performance on par or superior to other methods. Meanwhile, very positive results of our user study suggest that learning systems with the human-in-the-loop positively contribute to user satisfaction. We therefore conclude that an interactive, cooperative approach is a viable option for image restoration and potentially other ML tasks where human knowledge can be a correcting or guiding influence.

Photo-realistic emoticon generation using multi-modal input

Emojis have changed the way humans communicate today. They are the most convenient non-linguistic social cues available to us in this era of social media. But there is no methodology wherein users can creatively interact with their systems to generate personalised emojis or edit existing ones. While there have been some experiments that enable networks to create images, there is no comprehensive solution that gives users the control to create personalised emoticons. In this work, we propose an end-to-end architecture to create a realistic emoji from a roughly drawn sketch. Our generated emojis show a PSNR of 20.30 dB and an SSIM of 0.914. Additionally, we look at a multi-modal architecture which generates an emoji when given an incomplete sketch along with a handwritten word describing the associated emotion.
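For reference, the PSNR figure quoted above follows the standard definition, where MAX is the maximum possible pixel value (e.g., 255 for 8-bit images) and MSE is the mean squared error between the generated and reference emoji:

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
```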

SESSION: Explanations 1

Parallel embeddings: a visualization technique for contrasting learned representations

We introduce "Parallel Embeddings", a new technique that generalizes the classical Parallel Coordinates visualization technique to sequences of learned representations. This visualization technique is designed for concept-oriented "model comparison" tasks, allowing data scientists to understand qualitative differences in how models interpret input data. We compare user performance with our tool against the TensorBoard Embedding Projector for understanding model accuracy and qualitative model differences. With our tool, users were more accurate and learned strategies for the tasks more quickly. Furthermore, users' analytical process in the comparison condition was positively influenced by using our tool beforehand.

Evaluating saliency map explanations for convolutional neural networks: a user study

Convolutional neural networks (CNNs) offer great machine learning performance over a range of applications, but their operation is hard to interpret, even for experts. Various explanation algorithms have been proposed to address this issue, yet limited research effort has been reported concerning their user evaluation. In this paper, we report on an online between-group user study designed to evaluate the performance of "saliency maps" - a popular explanation algorithm for image classification applications of CNNs. Our results indicate that saliency maps produced by the LRP algorithm helped participants to learn about some specific image features the system is sensitive to. However, the maps seem to provide very limited help for participants to anticipate the network's output for new images. Drawing on our findings, we highlight implications for design and further research on explainable AI. In particular, we argue the HCI and AI communities should look beyond instance-level explanations.

NJM-Vis: interpreting neural joint models in NLP

Neural joint models have been shown to outperform non-joint models on several NLP and Vision tasks and constitute a thriving area of research in AI and ML. Although several researchers have worked on enhancing the interpretability of single-task neural models, in this work we present what is, to the best of our knowledge, the first interface to support the interpretation of results produced by joint models, focusing in particular on NLP settings. Our interface is intended to enhance interpretability of these models for both NLP practitioners and domain experts (e.g., linguists).

Trust in AutoML: exploring information needs for establishing trust in automated machine learning systems

We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists' trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies - qualitative interviews, a controlled experiment, and a card-sorting task - to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust and understandability of the tool; and out of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust in an AutoML tool.

AutoAIViz: opening the blackbox of automated artificial intelligence with conditional parallel coordinates

Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists from the tedious manual work. However, today's AutoAI systems often present only limited to no information about the process of how they select and generate model results. Thus, users often do not understand the process, nor do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.

SESSION: Multimodal

Scones: towards conversational authoring of sketches

Iteratively refining and critiquing sketches are crucial steps to developing effective designs. We introduce Scones, a mixed-initiative, machine-learning-driven system that enables users to iteratively author sketches from text instructions. Scones is a novel deep-learning-based system that iteratively generates scenes of sketched objects composed with semantic specifications from natural language. Scones exceeds state-of-the-art performance on a text-based scene modification task, and introduces a mask-conditioned sketching model that can generate sketches with poses specified by high-level scene information. In an exploratory user evaluation of Scones, participants reported enjoying an iterative drawing task with Scones, and suggested additional features for further applications. We believe Scones is an early step towards automated, intelligent systems that support human-in-the-loop applications for communicating ideas through sketching in art and design.

PolySquare: a search engine for 3D models with tag propagation

Searching for desired 3D models is not easy because many of them are not well labeled; annotations often contain inconsistent information (e.g., uploaders' personal naming conventions) and lack important details (e.g., detailed ornaments and patterns) of each model. We introduce PolySquare, a search engine for 3D models based on tag propagation---the process of assigning existing tags to other similar but unlabeled models while considering important local properties. For instance, a tag `wheel' of a wheelchair can be spread to other objects with wheels. Furthermore, PolySquare allows people to interactively refine the search results by iteratively including desired shapes and excluding unwanted ones. We evaluate the performance of tag propagation by measuring the precision-recall of propagation results with various similarity thresholds and demonstrate the effectiveness of the use of local features. We also showcase how PolySquare handles unrefined tags through a case study using real 3D model data from Google Poly.
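A heavily simplified view of threshold-based tag propagation is sketched below; the similarity function, threshold, and data are placeholders and do not reproduce PolySquare's local shape features:

```python
# Copy a tag from labeled models to unlabeled ones whose similarity exceeds a threshold.
def propagate_tags(labeled, unlabeled, similarity, threshold=0.8):
    """labeled: {model_id: set_of_tags}; similarity(a, b) -> float in [0, 1]."""
    propagated = {m: set() for m in unlabeled}
    for src, tags in labeled.items():
        for dst in unlabeled:
            if similarity(src, dst) >= threshold:
                propagated[dst] |= tags          # e.g. "wheel" spreads to wheeled models
    return propagated

sim = lambda a, b: 0.9 if (a, b) == ("wheelchair", "stroller") else 0.2
print(propagate_tags({"wheelchair": {"wheel"}}, ["stroller", "mug"], sim))
```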

Generating need-adapted multimodal fragments

Multimodal content is central to digital communications and has been shown to increase user engagement, making it indispensable in today's digital economy. Image-text combination is a common multimodal manifestation seen in several digital forums, e.g., banners, online ads, and social posts. The choice of a specific image-text combination is dictated by (a) the information to be represented, (b) the strength of the image and text modalities in representing that information, and (c) the need of the reader consuming the content. Given an input content representing the information to be conveyed in a multimodal fragment, creating variants that account for these factors is a non-trivial and tedious task, calling for automation. In this paper, we propose a holistic approach to automatically create multimodal image-text fragments derived from an unstructured input content and tailored towards a target need. The proposed approach aligns the fragment to the target need both in terms of content and style. With the help of metric-based and human evaluations, we show the effectiveness of the proposed approach in generating multimodal fragments aligned to target needs while also capturing the information to be presented.

Interacting in mixed reality: exploring behavioral differences between children and adults

With the development of intelligent interfaces and emerging technologies, children and adult users are provided with exciting interaction approaches in several applications. Holographic applications were until recently only available to a few people, mostly experts and researchers. In this work, we investigate the differences between children and adult users in their interaction behavior in mixed reality when asked to perform a task. Analysis of the results demonstrates that children can be more efficient during their interaction in these environments, while adults are more confident and their experience and knowledge is an advantage in achieving a task.

Proposal of a support system for learning brushstrokes in transcription for beginners

The purpose of this research is to propose a calligraphy learning support system focusing on brushstrokes. In conventional calligraphy, the learner imitates and writes characters while looking at a model placed next to the writing paper. However, it is difficult for the learner to learn brushstroke characteristics, such as the correct pressure and speed of brush movement, just by looking at a model. Therefore, we added an onomatopoeia utterance function to the conventional system to address these problems. The onomatopoeia represents brushstroke characteristics such as pressure and speed. We compared the proposed method, which includes this function, to a baseline method without it. The results show that with the proposed method, correct brushstrokes could be learned in a shorter time than with the baseline, confirming the potential of the proposed method to improve calligraphy learning.

SESSION: User modeling/personalization 1

The effect of numerical and textual information on visual engagement and perceptions of AI-driven persona interfaces

In an experiment, we present 38 marketing and data analytics professionals with two online AI-driven persona interfaces, one using numbers and the other using text. We employ eye tracking, think-aloud, and a post-engagement survey for data collection to measure perception of and visual engagement with the personas along 7 constructs. Results show that the use of numbers has a mixed effect on the perceptions and visual engagement of the persona profile, with job role a determining factor in whether numbers or text affect end users for 2 of the constructs. The use of numbers has a significant positive effect on analysts' perceptions of usefulness, but decreases the perceived completeness of the personas for both marketers and analysts. This research has both theoretical and practical consequences for AI-driven persona development and interface design, suggesting that the inclusion of numbers can have a desirable effect for certain roles but possible negative effects on user perceptions.

Driver drowsiness in automated and manual driving: insights from a test track study

Driver drowsiness is a major cause of traffic accidents. Automated driving might counteract this problem, but in the lower automation levels, the driver is still responsible as a fallback. The impact of driver drowsiness on automated driving under realistic conditions is, however, currently unknown. This work contributes to risk and hazard assessment with a field study comparing manual to level-2 automated driving. The experiment was conducted on a test track using an instrumented vehicle. Results (N=30) show that in automated driving, driver drowsiness (self-rated with the Karolinska Sleepiness Scale) is significantly higher than in manual driving. Age group (20--25, 65--70 years) and driving time have a strong impact on the self-ratings. In addition to the subjective measures, a correlation was also identified between drowsiness and heart rate data. The gained knowledge can be useful for the development of drowsiness detection systems and the dynamic configuration of driver-vehicle interfaces based on user state.

The effect of user characteristics in time series visualizations

There is increasing evidence that user characteristics can have a significant impact on visualization effectiveness, suggesting that visualizations could be enriched with personalization mechanisms that better fit each user's specific needs and abilities. In this paper, we contribute to this body of work with a study that investigates the impact of six user characteristics on the effectiveness of time series visualizations, which had not previously been investigated in relation to personalizing information visualization. We report on a controlled user study that compares four possible time series visualization techniques. User performance, and how it was affected by user characteristics, was measured while participants performed tasks from a formal taxonomy using Twitter data about real-world events. Our results show that both the personality trait of locus of control and the cognitive ability of verbal working memory influence which visualization is more effective when dealing with demanding and complex tasks. These findings extend the need for personalization to visualizations for time series data, and we discuss them in the context of creating systems that can utilize knowledge of the user's specific characteristics in order to present the most suitable visualization for each user.

Design and evaluation of intelligent agent prototypes for assistance with focus and productivity at work

Current research on building intelligent agents for aiding with productivity and focus in the workplace is quite limited, despite the ubiquity of information workers across the globe. In our work, we present a productivity agent which helps users schedule and block out time on their calendar to focus on important tasks, monitor and intervene with distractions, and reflect on their daily mood and goals in a single, standalone application. We created two different prototype versions of our agent: a text-based (TB) agent with a similar UI to a standard chatbot, and a more emotionally expressive virtual agent (VA) that employs a video avatar and the ability to detect and respond appropriately to users' emotions. We evaluated these two agent prototypes against an existing product (control) condition through a three-week, within-subjects study design with 40 participants across different work roles in a large organization. We found that participants scheduled 134% more time with the TB prototype, and 110% more time with the VA prototype, for focused tasks compared to the control condition. Users reported that they felt more satisfied and productive with the VA agent. However, the perception of anthropomorphism in the VA was polarized, with several participants suggesting that the human appearance was unnecessary. We discuss important insights from our work for the future design of conversational agents for productivity, wellbeing, and focus in the workplace.

SESSION: Disability & accessibility

Dynamic layout optimization for multi-user interaction with a large display

In this paper, we propose a user interface that allows multiple users to obtain information interactively and to operate the interface simultaneously on a large display. Users can interact with the display using hand gestures, which enables many users to easily use the system from a distance. So that each user can effectively use the screen without being interfered with by other users, the system dynamically optimizes the screen layout. We created a prototype system and conducted an experiment comparing the proposed interface with a version without optimization. The results showed that the participants were able to obtain more information per unit time using the proposed interface. Moreover, the participants felt less hurried, less frustrated, and less interfered with by other users.
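
To make the layout idea concrete, here is a toy sketch (not the authors' optimizer; the panel width, display width, and greedy strategy are assumptions) that places each user's panel on a one-dimensional display axis as close to the user as possible while preventing overlap:

    # Toy sketch of dynamic panel placement on a large display (illustrative only).
    def layout_panels(user_positions, panel_width=300, display_width=3000):
        """Greedy left-to-right placement preserving user order, avoiding overlap."""
        placements = []
        cursor = 0
        for x in sorted(user_positions):
            left = min(max(x - panel_width / 2, cursor),
                       display_width - panel_width)
            placements.append(left)
            cursor = left + panel_width   # next panel must start after this one
        return placements

    print(layout_panels([400, 500, 2500]))   # panels of close-by users get nudged apart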

ALCOVE: an accessible comic reader for people with low vision

Low vision is a visual impairment that affects one's capacity to perform daily activities, including reading. Since difficulty accessing printed materials is an obstacle to education, employment, and social activities, adaptations of these materials are often required, such as large print or audio transcriptions. The advent of digital documents, on computers and mobile devices, has enhanced the accessibility of text for people with low vision; however, there is still a lack of accessible books for recreational reading. In this work, we study how an accessible reader can impact readers' engagement with comics, for both children and adults. We summarise the results of online surveys about the current usage of comics by readers with low vision and how professionals attempt to provide them. This led to the design of ALCOVE, an accessible comic reader for people with low vision. We report the results of a user study with 11 participants with low vision that examined both the usefulness of the proposed comic reader and the reading presentation mode. Results show that the participants found our application easy and very convenient to use.

Investigating the intelligibility of a computer vision system for blind users

Computer vision systems to help blind users are becoming increasingly common, yet often these systems are not intelligible. Our work investigates the intelligibility of a wearable computer vision system that helps blind users locate and identify people in their vicinity. Providing a continuous stream of information, this system allows us to explore intelligibility through interaction and instructions, going beyond studies of intelligibility that focus on explaining a decision a computer vision system might make. In a study with 13 blind users, we explored whether varying instructions (either basic or enhanced) about how the system worked would change blind users' experience of the system. We found that offering a more detailed set of instructions affected neither how successfully users used the system nor their perceived workload. We did, however, find evidence of significant differences in what they knew about the system, and found that they employed different, and potentially more effective, use strategies. Our findings have important implications for researchers and designers of computer vision systems for blind users, as well as more general implications for understanding what it means to make interactive computer vision systems intelligible.

SESSION: User modeling/personalization 2

Racial mirroring effects on human-agent interaction in psychotherapeutic conversations

Conversational agents are increasingly utilized to deliver mental health interventions. However, these systems are characterized by relatively poor adoption and adherence. Our study explores the "racial mirroring" effects on how people perceive and engage with agents in the context of psychotherapy. We developed a conversational system with racially heterogeneous personas using strong visual cues. We conducted an experiment by randomly assigning participants (N=212) to racial mirroring, non-mirroring and control groups. Our results suggest that racial mirroring did influence human-agent interaction in terms of perceived interpersonal closeness, user satisfaction, disclosure comfort, desire to continue interacting, and projected future relationship with agents. In this paper, we present the conversational system, experimental procedure, and results. We conclude with design recommendations for employing conversational agents in mental health intervention.

Calibration of game dynamics for a more even multi-player experience

When designing a serious game it is paramount to take into account that the players' performance may greatly vary. In many types of games the performance will impact the game experience, as it is related to a point acquisition mechanic. In specific contexts, such as multi-player non-competitive or collaborative games, the designer may desire to minimize this variability, reducing the difference in game progression between low-performing and high-performing players. In this paper we detail such a context: an educational serious game for sustainability addressed to classes of a primary school, where the low performance of a class may not be necessarily related to the children's effort. We thus present and evaluate a novel approach for the calibration of the game dynamics for achieving a more even player experience.

SESSION: Methods

Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems

Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.
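
A minimal illustration of the evaluation question raised here, using synthetic numbers rather than the study's data: checking how well a proxy measure predicts performance on the actual decision-making task by correlating the two across participants or conditions.

    # Illustrative sketch only; the scores below are invented, not the study's results.
    from scipy.stats import pearsonr

    proxy_scores = [0.71, 0.64, 0.80, 0.58, 0.75]    # e.g. accuracy at predicting the AI's output
    actual_scores = [0.52, 0.60, 0.49, 0.63, 0.55]   # e.g. accuracy on the real decision task
    r, p = pearsonr(proxy_scores, actual_scores)
    print(f"r = {r:.2f}, p = {p:.3f}")               # in these made-up numbers, the proxy is uninformative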

Keeping it "organized and logical": after-action review for AI (AAR/AI)

Explainable AI (XAI) is growing in importance as AI pervades modern society, but few have studied how XAI can directly support people trying to assess an AI agent. Without a rigorous process, people may approach assessment in ad hoc ways---leading to the possibility of wide variations in assessment of the same agent due only to variations in their processes. AAR, or After-Action Review, is a method some military organizations use to assess human agents, and it has been validated in many domains. Drawing upon this strategy, we derived an AAR for AI, to organize ways people assess reinforcement learning (RL) agents in a sequential decision-making environment. The results of our qualitative study revealed several strengths and weaknesses of the AAR/AI process and the explanations embedded within it.

What is "intelligent" in intelligent user interfaces?: a meta-analysis of 25 years of IUI

This reflection paper takes the 25th IUI conference milestone as an opportunity to analyse in detail the understanding of intelligence in the community: Despite the focus on intelligent UIs, it has remained elusive what exactly renders an interactive system or user interface "intelligent", also in the fields of HCI and AI at large. We follow a bottom-up approach to analyse the emergent meaning of intelligence in the IUI community: In particular, we apply text analysis to extract all occurrences of "intelligent" in all IUI proceedings. We manually review these with regard to three main questions: 1) What is deemed intelligent? 2) How (else) is it characterised? and 3) What capabilities are attributed to an intelligent entity? We discuss the community's emerging implicit perspective on characteristics of intelligence in intelligent user interfaces and conclude with ideas for stating one's own understanding of intelligence more explicitly.
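
A minimal sketch of this kind of keyword extraction (the corpus handling and matching rules here are illustrative assumptions, not the authors' pipeline):

    # Extract sentences mentioning a form of "intelligent"/"intelligence" from raw text.
    import re

    def intelligent_mentions(text):
        """Return sentences containing a word starting with 'intellig'."""
        sentences = re.split(r"(?<=[.!?])\s+", text)
        pattern = re.compile(r"\bintellig\w*\b", re.IGNORECASE)
        return [s for s in sentences if pattern.search(s)]

    sample = ("We present an intelligent user interface. "
              "The study had 20 participants. "
              "Intelligence is attributed to the agent.")
    for s in intelligent_mentions(sample):
        print(s)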

Human-in-the-loop AI in government: a case study

In this paper, we present a novel application where Human-Computer Interaction (HCI) meets Artificial Intelligence (AI) and discuss obstacles that need to be resolved on the long journey from research to production. Unlike academia and industries that have been at the forefront of automation for decades, government is a new player in the field, though an important one. We build systems that are used on a large scale, and we collect data to inform policymakers. Using the example of the Household Budget Survey, we demonstrate how government agencies can apply Human-in-the-Loop AI to automate the production of official statistics. The aim is to save time and resources on repetitive, labour-intensive tasks that machines are good at, allowing humans to focus on value-added tasks requiring flexibility and intelligence. One major challenge is the human factor. How will users, who are accustomed to manual tasks, react to the complexity of AI? How should we design the interface to give them a good user experience? How do we measure success? Indeed, one key step towards production is to secure funding, which requires presenting potential success in a way that stakeholders can understand. Stressing the importance of formulating problems from a practical business viewpoint, we hope to bridge the communication gap and help the research community reach out to more potential users and solve more novel real-world problems.

SESSION: Explanations 2

Understanding the effects of control and transparency in searching as learning

In this paper, we analyze the benefits of adopting user interfaces that offer control and transparency for searching in contexts of learning activities. Concretely, we conducted a user study with pharmacy students performing a problem-solving task in the course of a university lecture. The task involved finding scientific papers containing relevant information to solve a clinical case. Students were split into two independent groups and assigned one search tool to perform the task. The baseline group worked with PubMed, a popular search engine in the life sciences domain, whereas the second half of the class was assigned an exploratory search system (ESS) designed for control and transparency. In the analysis, we cover the objective and subjective dimensions of the task outcomes. Firstly, the objective analysis addresses the inherent difficulty of the search task in a learning scenario and identifies certain improvements in performance for those students using the ESS, most notably when searching for primary-source content. The subjective analysis investigates the human factors side, providing evidence that the ESS effectively increases the perception of control and transparency and is able to produce a better user experience. Lastly, we report on perceived learning as a subjective dimension measured separately from user experience.

Leveraging rationales to improve human task performance

Machine learning (ML) systems across many application areas are increasingly demonstrating performance that is beyond that of humans. In response to the proliferation of such models, the field of Explainable AI (XAI) has sought to develop techniques that enhance the transparency and interpretability of machine learning methods. In this work, we consider a question not previously explored within the XAI and ML communities: Given a computational system whose performance exceeds that of its human user, can explainable AI capabilities be leveraged to improve the performance of the human? We study this question in the context of the game of Chess, for which computational game engines that surpass the performance of the average player are widely available. We introduce the Rationale-Generating Algorithm, an automated technique for generating rationales for utility-based computational methods, which we evaluate with a multi-day user study against two baselines. The results show that our approach produces rationales that lead to statistically significant improvement in human task performance, demonstrating that rationales automatically generated from an AI's internal task model can be used not only to explain what the system is doing, but also to instruct the user and ultimately improve their task performance.
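
To illustrate the general idea of rationales for utility-based choosers (a hypothetical sketch, not the paper's Rationale-Generating Algorithm), a rationale can contrast the utility of the chosen action with that of the runner-up:

    # Illustrative rationale generation for a utility-based decision maker.
    def explain_choice(utilities):
        """utilities: dict mapping action name -> numeric utility estimate."""
        ranked = sorted(utilities.items(), key=lambda kv: kv[1], reverse=True)
        (best, u_best), (second, u_second) = ranked[0], ranked[1]
        return (f"Chose '{best}' (utility {u_best:.2f}) over '{second}' "
                f"(utility {u_second:.2f}); the margin of {u_best - u_second:.2f} "
                f"reflects the evaluation terms favouring '{best}'.")

    print(explain_choice({"Nf3": 0.42, "e4": 0.35, "d4": 0.30}))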

Digging into user control: perceptions of adherence and instability in transparent models

We explore predictability and control in interactive systems where controls are easy to validate. Human-in-the-loop techniques allow users to guide unsupervised algorithms by exposing and supporting interaction with underlying model representations, increasing transparency and promising fine-grained control. However, these models must balance user input and the underlying data, meaning they sometimes update slowly, poorly, or unpredictably---either by not incorporating user input as expected (adherence) or by making other unexpected changes (instability). While prior work exposes model internals and supports user feedback, less attention has been paid to users' reactions when transparent models limit control. Focusing on interactive topic models, we explore user perceptions of control using a study where 100 participants organize documents with one of three distinct topic modeling approaches. These approaches incorporate input differently, resulting in varied adherence, stability, update speeds, and model quality. Participants disliked slow updates most, followed by lack of adherence. Instability was polarizing: some participants liked it when it surfaced interesting information, while others did not. Across modeling approaches, participants differed only in whether they noticed adherence.
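
A small sketch of how adherence and instability might be quantified (illustrative only; the study's actual measures may differ): compare topic assignments before and after a user refinement, separating the documents the user touched from those left alone.

    # Toy adherence/instability metrics for an interactive topic model (assumed names).
    def adherence_and_instability(before, after, requested):
        """before/after: dict doc_id -> topic; requested: dict doc_id -> desired topic."""
        adhered = sum(after[d] == t for d, t in requested.items())
        untouched = [d for d in before if d not in requested]
        changed = sum(before[d] != after[d] for d in untouched)
        return adhered / len(requested), changed / max(len(untouched), 1)

    before = {"d1": 0, "d2": 0, "d3": 1, "d4": 2}
    after = {"d1": 1, "d2": 0, "d3": 2, "d4": 2}
    print(adherence_and_instability(before, after, requested={"d1": 1}))
    # -> (1.0, 0.333...): full adherence, but one of three untouched documents drifted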

ViCE: visual counterfactual explanations for machine learning models

The continued improvements in the predictive accuracy of machine learning models have allowed for their widespread practical application. Yet, many decisions made with seemingly accurate models still require verification by domain experts. In addition, end-users of a model also want to understand the reasons behind specific decisions. Thus, the need for interpretability is increasingly paramount. In this paper we present an interactive visual analytics tool, ViCE, that generates counterfactual explanations to contextualize and evaluate model decisions. Each sample is assessed to identify the minimal set of changes needed to flip the model's output. These explanations aim to provide end-users with personalized actionable insights with which to understand, and possibly contest or improve, automated decisions. The results are effectively displayed in a visual interface where counterfactual explanations are highlighted and interactive methods are provided for users to explore the data and model. The functionality of the tool is demonstrated by its application to a home equity line of credit dataset.
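
A minimal sketch of the underlying counterfactual idea (not the ViCE implementation; the greedy search, step size, and synthetic data are assumptions): perturb one feature at a time until the model's prediction flips.

    # Greedy counterfactual search against a fitted binary classifier (illustrative only).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    model = LogisticRegression().fit(X, y)

    def counterfactual(x, model, step=0.1, max_iter=200):
        """Nudge single features until the predicted class flips."""
        x_cf = x.copy()
        target = 1 - model.predict(x.reshape(1, -1))[0]
        for _ in range(max_iter):
            if model.predict(x_cf.reshape(1, -1))[0] == target:
                return x_cf                       # counterfactual found
            best, best_p = None, -1.0
            for j in range(len(x_cf)):            # try moving each feature up/down
                for delta in (step, -step):
                    trial = x_cf.copy()
                    trial[j] += delta
                    p = model.predict_proba(trial.reshape(1, -1))[0][target]
                    if p > best_p:
                        best, best_p = trial, p
            x_cf = best
        return None                               # no counterfactual found within budget

    x0 = X[0]
    print("original prediction:", model.predict(x0.reshape(1, -1))[0])
    print("counterfactual:", counterfactual(x0, model))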

SESSION: Wearable & ubiquitous

Detecting errors in pick and place procedures: detecting errors in multi-stage and sequence-constrained manual retrieve-assembly procedures

Many human activities, such as manufacturing and assembly, are sequence-constrained procedural tasks (SPTs): they consist of a series of steps that must be executed in a specific spatial/temporal order. However, these tasks can be error prone: steps can be missed out, executed out-of-order, or repeated. The ability to automatically predict that a person is about to commit an error could greatly help in these cases. The prediction could be used, for example, to provide feedback to prevent mistakes or mitigate their effects. In this paper, we present a novel approach for real-time error prediction in multi-step sequence tasks which uses a minimum viable set of behavioural signals. We make three main contributions. The first is an architecture for real-time error prediction based on task tracking and intent prediction. The second is an exploration of the effectiveness of hand position and eye-gaze tracking for these purposes: we confirm that eye-gaze is more effective for intent prediction, that hand tracking is more accurate for task tracking, and that combining the two provides the best overall response. We show that using hand and gaze tracking data we can predict selection/placement errors with an F1 score of 97%, approximately 300ms before the error would occur. Finally, we discuss the application of this hand-gaze error detection architecture, used in conjunction with head-mounted AR displays, to support industrial manual assembly.
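
A simplified sketch of fusing the two behavioural streams for error prediction (illustrative only; the feature sets and labels below are synthetic assumptions, and the paper's architecture separates task tracking from intent prediction rather than using a single fused classifier):

    # Early fusion of hand and gaze features for error prediction (toy data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    hand = rng.normal(size=(1000, 6))     # e.g. windowed hand position/velocity features
    gaze = rng.normal(size=(1000, 4))     # e.g. fixation target / dwell-time features
    X = np.hstack([hand, gaze])           # simple concatenation of both streams
    y = rng.integers(0, 2, size=1000)     # 1 = upcoming error, 0 = correct step

    clf = RandomForestClassifier(random_state=0)
    # Random data scores near chance; real, time-aligned windows would be needed.
    print(cross_val_score(clf, X, y, scoring="f1").mean())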

Multi-attention deep recurrent neural network for nursing action evaluation using wearable sensor

A nursing action evaluation system that can assess the performance of students practicing patient-handling nursing skills is urgently needed to address the nursing educator shortage. Such an evaluation system should be designed with few hand-crafted procedures, for the sake of scalability. Additionally, achieving high accuracy in nursing action recognition, especially fine-grained action recognition, remains a problem. This is reflected in recognizing the correct versus incorrect ways in which students perform a nursing action; low accuracy there would mislead the nursing students. We propose a multi-attention deep recurrent neural network (MA-DRNN) model for nursing action recognition that directly processes the raw acceleration and rotational speed signals from wearable sensors. Data samples of target nursing actions in a nursing skill called patient transfer were collected to train and compare the models. The experiment results show that the proposed model reaches approximately 96% recognition accuracy for four target fine-grained nursing action classes, helped by the attention mechanism on the time and layer domains, and outperforms state-of-the-art models for wearable sensor-based HAR.
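
For orientation, a minimal sketch of a recurrent sensor classifier with attention over time steps (not the MA-DRNN itself; channel counts, layer sizes, and the single attention head are assumptions, and the paper's layer-domain attention is omitted):

    # LSTM classifier with simple temporal attention pooling (illustrative only).
    import torch
    import torch.nn as nn

    class AttentiveLSTMClassifier(nn.Module):
        def __init__(self, n_channels=6, hidden=64, n_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)          # score each time step
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                         # x: (batch, time, channels)
            h, _ = self.lstm(x)                       # (batch, time, hidden)
            w = torch.softmax(self.attn(h), dim=1)    # attention weights over time
            pooled = (w * h).sum(dim=1)               # weighted temporal summary
            return self.head(pooled)                  # class logits

    model = AttentiveLSTMClassifier()
    logits = model(torch.randn(8, 100, 6))            # accel + gyro, 100 samples per window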

Ring-based finger tracking using capacitive sensors and long short-term memory

We present a ring-shaped interaction device, called PeriSense, that uses capacitive sensing to enable finger tracking. The angles of the ring-wearing finger and its adjacent fingers are sensed by measuring capacitive proximity between electrodes and human skin. To map the capacitive measurements to finger angles, we use long short-term memory (LSTM). By wearing the ring on the middle finger, the angles of the index, middle, and ring finger can be determined from the capacitive measurements. We collected sample data from 17 users using a Leap Motion camera, which provided the reference data. In a leave-one-user-out cross-validation test, we measured a mean absolute error of 13.02 degrees over all finger angles. The motions of the little finger and the thumb are out of the capacitive measurement range or covered by the index and ring finger, respectively; with natural finger movements, the LSTM estimates the angles of the little finger and thumb from the movement pattern of the other fingers.
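
A minimal sketch of the regression setup described here (not the PeriSense code; channel count, window length, and layer sizes are assumptions): an LSTM maps a window of capacitive readings to three finger angles and is trained with an absolute-error loss, matching the reported MAE metric.

    # LSTM regression from capacitive readings to finger angles (illustrative only).
    import torch
    import torch.nn as nn

    class CapacitiveToAngles(nn.Module):
        def __init__(self, n_channels=8, hidden=64, n_angles=3):
            super().__init__()
            self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_angles)   # index, middle, ring angles

        def forward(self, x):                         # x: (batch, time, n_channels)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])              # regress angles from the last time step

    model = CapacitiveToAngles()
    dummy = torch.randn(16, 50, 8)                    # 16 windows of 50 samples, 8 electrodes
    angles = model(dummy)                             # (16, 3) predicted angles in degrees
    # In training, targets would be reference angles from the Leap Motion camera.
    loss = nn.L1Loss()(angles, torch.zeros_like(angles))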

SESSION: Recommender systems 2

Using affordances to improve AI support of social media posting decisions

Intelligent systems are limited in their ability to match the fluid social needs of people. We use affordances---people's perceptions of the utilities of a target system---as a means of creating models that provide intelligent systems with a better understanding of how people make decisions. We study affordance-based models in the context of social network site (SNS) usage, a domain where people have complex social needs often poorly supported by technology. Using data collected via a scenario-based survey (N=674), we build two affordance-based models of people's multi-SNS posting behavior. Our results highlight the feasibility of using affordances to help intelligent systems support people's decision-making behavior: both of our models are ~15% more accurate than a majority-class baseline, and they are ~33% and ~48% more accurate than a random baseline for this task. We contrast our approach with other ways of modeling posting behavior and discuss the implications of using affordances for modeling human behavior for intelligent systems.
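
The kind of baseline comparison reported above can be illustrated with a small sketch (synthetic data, not the survey responses; scikit-learn's DummyClassifier stands in for the majority-class and random baselines):

    # Compare a simple model against majority-class and random baselines.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.dummy import DummyClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=674, n_classes=3, n_informative=5,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, clf in [("model", LogisticRegression(max_iter=1000)),
                      ("majority", DummyClassifier(strategy="most_frequent")),
                      ("random", DummyClassifier(strategy="uniform", random_state=0))]:
        clf.fit(X_tr, y_tr)
        print(name, round(accuracy_score(y_te, clf.predict(X_te)), 3))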

With a little help from my peers: depicting social norms in a recommender interface to promote energy conservation

How can recommender interfaces help users to adopt new behaviors? In the behavioral change literature, nudges and norms are studied to understand how to convince people to take action (e.g. towel re-use is boosted when stating that `75% of hotel guests' do so), but what is advised is typically not personalized. Most recommender systems know what to recommend in a personalized way, but not much research has considered how to present such advice to help users to change their current habits. We examine the value of presenting normative messages (e.g. `75% of users do X') based on actual user data in a personalized energy recommender interface called `Saving Aid'. In a study among 207 smart thermostat owners, we compared three different normative explanations (`Global', `Similar', and `Experienced' norm rates) to a non-social baseline (`kWh savings'). Although none of the norms increased the total number of chosen measures directly, we show evidence that the effect of norms seems to be mediated by the perceived feasibility of the measures. Also, how norms were presented (i.e. specific source, adoption rate) affected which measures were chosen within our Saving Aid interface.

TAPrec: supporting the composition of trigger-action rules through dynamic recommendations

Nowadays, users can personalize the joint behavior of their connected entities, i.e., smart devices and online services, by means of trigger-action rules. As the number of supported technologies grows, however, so does the design space, i.e., the possible combinations of different triggers and actions: without proper support, users often experience difficulties in discovering rules and their related functionality. In this paper, we introduce TAPrec, an End-User Development platform that supports the composition of trigger-action rules with dynamic recommendations. By exploiting a hybrid and semantic recommendation algorithm, TAPrec suggests, at composition time, either a) new rules to be used or b) actions for auto-completing a rule. Recommendations, in particular, are computed to follow the user's high-level intention, i.e., by focusing on the rules' final purpose rather than on low-level details like manufacturers and brands. We compared TAPrec with a widely used trigger-action programming platform in a study with 14 end users. Results show evidence that TAPrec is appreciated and can effectively simplify the personalization of connected entities: recommendations promoted creativity by helping users personalize new functionality that is not easily noticeable in existing platforms.
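
A toy sketch of purpose-level recommendation (not TAPrec's hybrid semantic algorithm; the rule descriptions below are invented): rank candidate actions by textual similarity between the user's stated purpose and each action's description.

    # Rank candidate actions by TF-IDF similarity to the trigger's stated purpose.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    actions = {
        "turn on porch light": "increase home security when arriving after dark",
        "send me a reminder": "avoid forgetting scheduled tasks",
        "start robot vacuum": "keep the house clean while away",
    }
    trigger_purpose = "improve security when I get home at night"

    vec = TfidfVectorizer()
    docs = [trigger_purpose] + list(actions.values())
    tfidf = vec.fit_transform(docs)
    scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    for name, score in sorted(zip(actions, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {name}")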