1st CHI 2019 Workshop on Human-Centered Study of Data Science Work Practices

How should we study data science practices?

The creation and use of datasets, data science and machine learning artifacts is a critical contemporary force on human cultures.

Bring your research perspective to an interdisciplinary workshop among researchers from HCI, sociology, psychology, computer science, machine learning and more.

Call for Participation

We hope to gather the growing community of researchers and practitioners in HCI and allied fields who are building new insights, methods, and collaborative practices around data science. Topics of interest include (but are not limited to):


  • Contextualize and understand data science work practices - by individuals, and by groups and teams
  • Characterize the work practices of data science workers, including programming, ideation, and collaboration
  • Build tools or methods to support human activities in data science work
  • Show how practices of data creation and aggregation form part of the work of data science
  • Understand the shared and unique design challenges of data science environments, including methods and tools for comprehending data, data wrangling, model building, debugging, collaborating and communicating results, especially to nonprogrammers
  • Support the incorporation of diverse ethics and human values into data science work, e.g., in relation to algorithmic fairness or bias reduction
  • Bring sociotechnical and organizational perspectives on data work to bear on data science education and practice
  • Suggest methods of standardizing or coordinating data collection across organizational and industry boundaries
  • Bridge the gap between the knowledge of data scientists and that of domain experts in various fields of application
  • Widen the audience for data science beyond highly technically skilled programmers, to include UX designers, project managers, novice programmers, and other stakeholders in a data-driven project
  • Help policymakers to build more effective, appropriate, and transparent rules around the complex domains of data science work

We encourage you to contact us if you have any questions about the workshop or themes!

Overview

With the rise of big data, there has been an increasing need to understand who is working in data science, and how they are doing their work. HCI and CSCW researchers have begun to examine these questions. In this workshop, we invite researchers to share their observations, experiences, hypotheses, and insights, in the hopes of developing a taxonomy of work practices and open issues in the behavioral and social study of data science and data science workers.

The outcomes of data science work are increasingly influential in much of the world. How people work in data science is equally important, if we are to understand, support, and critique (as appropriate) these important societal forces. We invite our colleagues to work with us toward a deeper understanding of how humans perform the work of data science.

Extraordinary claims are made about the promises and current successes of data science. While some of these claims are stated for the future, Agarwal and Dhar editorialize that “This is powerful… we are in principle already there”. At a different extreme, scholars have criticized the “mythology” of such claims.

Further complicating these discussions, there is considerable diversity in the tools and methods, challenges, and job roles involved in data science. Detailed study will necessarily be partial and contextualized, adding depth of description and understanding, but potentially lacking a broader view.

Several studies in HCI and CSCW have begun to look at facets of these diverse topics. Passi and Jackson described an on-going tension over the use of algorithmic rules. They propose that data science students can learn to practice a kind of data vision that treats rules more as guidance (“rules-based”) than as formal constraint (“rules-bound”).

Dealing with data, or “data wrangling,” has been estimated to take up 80-90% of the effort in a typical data science project. Understanding how people approach their data is therefore important.

Bilis contrasted two views of the analyst’s relationship with data. In one view, the analyst takes a relatively passive stance, and receives data as “given” by the environment (“donné”). In a second view, the analyst takes a more active role as s/he captures data (“capta”). Pine and Liboiron made a similar point, claiming that in some cases “human-computer interactions start before the data reaches the computer because various measurement interfaces are the invisible premise of data and databases” (emphasis in their original text).

Mentis et al. described curatorial practices with data among transplant surgeons as they engaged in the necessary work of “crafting the image” for one another, and Taylor et al. offered similar observations of how a local community “enacted a multiplicity of ‘small worlds’” in their data. Feinberg describes the “design” of data, and Patel et al. similarly describe the creation of features for analysis. Muller et al. documented the sometimes necessary processes of creating data, including the creation of ground truth data.

Recording the outcomes and managing the diverse experimental analytic histories of data science work are also challenging. Despite the promise of literate programming [KNUTH84], people engaged in data science tend to scant their documentation, apparently because of a tension between dynamic exploration and time-consuming explanation.

Call for Participation

Submission Details:

Your submission abstract should be a single PDF file of 2-4 pages in total, and should include the following information:

  1. Name, title, affiliation, and email of authors
  2. A description of one or more themes of particular interest to you that are related to the workshop topic. This should be presented as an extended abstract summarizing a research idea, including relevant related work and any original research contributions.
  3. A short summary of your background, interest in this area, and motivations for participating in the workshop.
  4. If relevant, you may provide links to additional online materials in the PDF.

As you will submit your abstract via email, please include a brief paragraph or so in the email about the following. This will help us organize the workshop around interdisciplinary interests a bit better!

  1. a brief overview of your background;
  2. your specific interests as they relate to one or more workshop themes outlined above; and
  3. what you hope to achieve from the workshop.

We encourage authors to use the ACM SIGCHI Extended Abstract Format for their submissions.
Template:

Important Dates:

Paper submission deadline: February 12, 2019 (11:59 pm EST)
Notification of acceptance: March 1, 2019
Workshop at CHI in Glasgow: May 4 or 5, 2019 (TBD)

Applications are open! Please email your submissions directly to husdatworkshop@gmail.com

Submission Evaluation:

  • Applications will be peer reviewed by organizers and, where appropriate, external reviewers.
  • Successful applications will be selected based on their relevance to the workshop themes, their fit with the program, and the background of applicants. A small number of successful submissions will be invited to present their accounts at the workshop.
  • Applications will NOT be penalized for lack of adherence to ACM formatting guidelines.

Organizers


Michael Muller (Primary contact)

Michael Muller works as a researcher at IBM Research AI, where he studies data science work, and collaborates with data science workers to design future tools for data science.

Bonnie E. John

Bonnie John is a Senior Interaction Designer at Bloomberg, where she uses user-centered methods to design and evaluate tools for financial data scientists and collaborates with Project Jupyter.

Melanie Feinberg

Melanie Feinberg is an associate professor at the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill. She studies the practices by which data is made, and the characteristics of data as both design artifact and design material.

Mary Beth Kery

Mary Beth Kery is a PhD student at the Human-Computer Interaction Institute at Carnegie Mellon University. Her research focuses on studying programmer behavior and designing new kinds of programming tools to support exploratory data science work.

Timothy George

Timothy George works as a UI/UX designer for Project Jupyter, where he designs next-generation data science tools. He also works to develop open standards, protocols and practices for practicing data scientists.

Samir Passi

Samir Passi is a PhD candidate in the Department of Information Science at Cornell University. His research focuses on the forms of human work in data science learning, research, and practice. He studies such forms of work ethnographically in the context of academic as well as corporate data science.

Steven Jackson

Steven Jackson is an Associate Professor and Chair of Information Science at Cornell University. His work addresses questions of ethics, policy and practice in emerging computing fields.