The creation and use of datasets and of data science and machine learning artifacts are a critical contemporary force shaping human cultures.
How should we study data science practices?
Bring your research perspective to an interdisciplinary workshop among researchers from HCI, sociology, psychology, computer science, machine learning and more.
With the rise and frequent opacity of big data, there has been a growing need to understand who is working in data science and how they are doing their work. Researchers in HCI, CSCW, and critical data studies have begun to examine these questions (for more detail, see Background below). In this workshop, we invite researchers and practitioners to share their observations, experiences, hypotheses, and insights. Depending on participant interest, we hope to answer questions such as the following.
For most people, HCI engagement with data science is relatively new. We will therefore spend half a day sharing knowledge and perspectives in this domain through brief presentations of participants’ position papers (which we will make available in advance on the workshop website). The focus will be on the intersection of data science and HCI. Co-organizers will group the presentations into "mini-sessions" based on common themes. We will conclude the morning by forming small groups for the afternoon session, based on shared interests identified during the morning discussions. The afternoon will conclude with small-group reports, a plan for future meetings, plans for a workshop report, and an optional informal dinner.
Schedule
If you would like to join the growing community of researchers and practitioners in HCI and allied fields who are building new insights, methods, and collaborative practices around data science, submit a position paper on topics including (but not limited to) the following:
We encourage you to contact us at husdatworkshop@gmail.com if you have any questions about the workshop or its themes.
Your submission abstract should be a single PDF file of two to four pages in total, and should include the following information:
Because you will submit your abstract by email, please also include a brief paragraph in the email about the following. This will help us organize the workshop around interdisciplinary interests.
We encourage authors to use the ACM SIGCHI Extended Abstract Format for their submissions.
Template:
At least one author of each accepted position paper must attend the workshop. All authors who attend the workshop must register for the workshop and at least one day of the conference.
Applications are open. Please email your submissions directly to husdatworkshop@gmail.com
Extraordinary claims are made about the promises and current successes of data science [1, 8, 14, 15, 19, 22, 29, 33]. While some of these claims are stated for the future [7], Agarwal and Dhar editorialized in 2014 that "This is powerful... we are in principle already there" [2]. Meanwhile, at the dystopian extreme, scholars warn about the "mythology" of working with big data: the quantitative nature of data creates the illusion that all data-driven outcomes are objective, ethical, or true [5]. Further complicating these discussions, there is considerable diversity in the tools and methods [12], challenges [13], and job roles [15, 19, 23] involved in data science. Detailed studies will necessarily be partial and contextualized. They add depth of description and understanding, but may lack a broader view.
Several studies in HCI, CSCW, FAT (Fairness, Accountability, and Transparency), and critical data studies have begun to look at facets of these diverse topics.
Dealing with data, or data wrangling, has been estimated to take up 80-90% of the effort in a typical data science project [11, 16, 30]. Understanding how people approach their data is therefore important. Bilis contrasted two views of the analyst’s relationship with data [4]. In one view, the analyst takes a relatively passive stance and receives data as "given" by the environment ("donné"). In a second view, the analyst takes a more active role as s/he captures data ("capta"). Pine and Liboiron made a similar point, claiming that in some cases "human-computer interactions start before the data reaches the computer because various measurement interfaces are the invisible premise of data and databases" (emphasis in their original text) [28]. Feinberg describes the "design" of data [9], and Patel et al. similarly describe the creation of features for analysis [27]. Muller et al. documented the sometimes necessary processes of creating data, including the creation of ground truth data [24].
Passi and Jackson described an ongoing tension over the use of algorithms as rules [25]. They propose that data science students learn to practice a kind of data vision that emphasizes the discretionary craftspersonship needed to handle data analysis appropriately. This notion of "crafts[person]ship" appears again in studies of how novices develop machine learning models: novices fail when they cannot intuitively relate the code they write to the data itself [27, 35]. This problem is especially important from an HCI perspective, in which domain knowledge is often crucial to understanding how to analyze a topic and its data (e.g., [34]). Recording the outcomes and managing the diverse experimental analytic histories (i.e., provenance) of data and code are also challenging. Despite the promise of literate programming [20], people who engage in data science tend to skimp on documentation, apparently because of a tension between dynamic, engaged exploration and time-consuming explanation [17, 18, 31].
Despite the paucity of colleague-oriented documentation in many data science projects, there is increasing evidence that data science workers nonetheless collaborate. In an ethnographic study, Passi and Jackson reported on diverse actors with diverse motivations working in and with corporate data science teams [26]. They highlighted issues of trust among heterogeneous teams of designers, managers, business analysts, and data scientists. These issues are often resolved in part through work that is simultaneously "calculative" (e.g., reliance on quantified metrics, statistical tests) and "collaborative" (e.g., negotiation and translation work). Chang et al. proposed collaborative commenting in JupyterLab notebooks [6]. Recent tools such as CoCalc and Colaboratory provide real-time chat and multi-user storage in a notebook programming environment.
In summary, the field of data science presents us with multiple tensions which might be addressed through HCI research. A partial list of these tensions would include:
Some of these challenges may be resolvable. Other challenges may take the form of enduring analytic dimensions that inform our research plans and outcomes.
Michael Muller works as a researcher at IBM Research AI, where he studies data science work, and collaborates with data science workers to design future tools for data science.
Bonnie John is a Senior Interaction Designer at Bloomberg, where she uses user-centered methods to design and evaluate tools for financial data scientists and collaborates with Project Jupyter.
Melanie Feinberg is an associate professor at the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill. She studies the practices by which data is made, and the characteristics of data as both design artifact and design material.
Mary Beth Kery is a PhD student at the Human-Computer Interaction Institute at Carnegie Mellon University. Her research focuses on studying programmer behavior and designing new kinds of programming tools to support exploratory data science work.
Timothy George works as a UI/UX Designer for Project Jupyter, where he designs next-generation data science tools. He also works to develop open standards, protocols, and practices for practicing data scientists.
Samir Passi is a PhD candidate in the Department of Information Science at Cornell University. His research focuses on the forms of human work in data science learning, research, and practice. He studies such forms of work ethnographically in the context of academic as well as corporate data science.
Steven Jackson is an Associate Professor and Chair of Information Science at Cornell University. His work addresses questions of ethics, policy and practice in emerging computing fields.