Angler: Helping Machine Translation Practitioners Prioritize Model Improvements

(*Authors contributed equally)
Angler enables ML developers to easily explore and curate challenge sets for machine translation. (A) The Table View lists all challenge sets, which users can compare by metrics such as sample count, model performance, and familiarity score. After a user selects a set, (B) the Detail View allows them to explore its samples across various dimensions. (B1) The Timeline lets users query data samples by time. (B2) The Spotlight presents visualizations with linking and brushing to help users characterize the set from different angles. (B3) The Sentence List shows all selected data samples and lets users refine the selection before exporting the challenge set for downstream tasks.
Machine learning (ML) models can fail in unexpected ways in the real world, but not all model failures are equal. With finite time and resources, ML practitioners are forced to prioritize their model debugging and improvement efforts. Through interviews with 13 ML practitioners, we found that they construct small targeted test sets to estimate an error's nature, scope, and impact on users. We built on this insight in a case study with machine translation models, and developed Angler, an interactive visual analytics tool to help practitioners prioritize model improvements. In an observational study with 7 machine translation experts, we used Angler to understand prioritization practices when the input space is infinite, and obtaining reliable signals of model quality is expensive. Our study revealed that participants could form more interesting and user-focused hypotheses for prioritization by analyzing quantitative summary statistics and qualitatively assessing data by reading sentences.
  title = {Angler: {{Helping Machine Translation Practitioners Prioritize Model Improvements}}},
  booktitle = {{{CHI Conference}} on {{Human Factors}} in {{Computing Systems}}},
  author = {Robertson, Samantha and Wang, Zijie J. and Moritz, Dominik and Kery, Mary Beth and Hohman, Fred},
  year = {2023},
  doi = {10.1145/3544548.3580790},
  langid = {english}