Photo editing can be a challenging task, and it becomes even more difficult on the small, portable screens such as camera phones that are now frequently used to edit images. To address this problem we present PixelTone, a multimodal photo editing interface that combines speech and direct manipulation. We utilize semantic distance modeling, allowing users to express voice commands using their own words instead of application-enforced terms. Additionally, user's can point to subjects in an image, tag them with names, and refer to those tags while simultaneously editing using voice commands.
Gierad Laput, Mira Dontcheva, Gregg Wilensky, Walter Chang, Aseem Agarwala, Jason Linder, and Eytan Adar. 2013. PixelTone: a multimodal interface for image editing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, USA, 2185-2194. DOI=http://dx.doi.org/10.1145/2470654.2481301