ZENSORS: FREQUENTLY ASKED QUESTIONS
Why not just use Computer Vision?
Our goal was not to beat existing CV systems; many advanced techniques exist for improving accuracy. Rather, our combination of simple CV features and machine learning shows that even a basic approach can achieve high accuracy.
Even with our “simple” approach, we are unaware of any system in the HCI or CV literature that approaches the breadth of Zensors. For example, here are four of the 13 sensors we deployed as part of an early study:
- “How messy is the counter?” (scale)
- “Is there leftover food?” (binary)
- “What type of food do you see?” (category)
- “How many people are standing in line?” (number)
Across all of these, Zensors offers:
- diverse, natural-language questions
- sensors authored by non-experts (see video)
- reasonably high accuracy
- zero up-front training data
- live data within seconds of authoring
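To make the sensor types above concrete, here is a minimal sketch of how a sensor definition might be represented in code. The `Sensor` and `AnswerType` names are illustrative assumptions, not the actual Zensors implementation:

```python
from dataclasses import dataclass
from enum import Enum

class AnswerType(Enum):
    SCALE = "scale"        # "How messy is the counter?"
    BINARY = "binary"      # "Is there leftover food?"
    CATEGORY = "category"  # "What type of food do you see?"
    NUMBER = "number"      # "How many people are standing in line?"

@dataclass
class Sensor:
    question: str          # natural-language question shown to the crowd
    answer_type: AnswerType
    interval_seconds: int  # how often a frame is sampled

# A sensor authored by a non-expert, entirely in natural language:
line_sensor = Sensor("How many people are standing in line?",
                     AnswerType.NUMBER, interval_seconds=30)
```

Note that nothing in the definition requires training data: the question alone is enough for the crowd to start answering.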
If you're sending an image every 30 seconds, say, wouldn't that be a lot of images to process?
Correct, but we employ image similarity detection: we only forward frames that differ significantly from a previously captured frame. This cuts the volume of data sent to the crowd by roughly 40–60%.
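A minimal sketch of this frame-filtering idea, assuming grayscale frames represented as flat lists of pixel intensities and a mean-absolute-difference threshold (the actual similarity metric Zensors uses is not specified here):

```python
def mean_abs_diff(a, b):
    # Mean absolute pixel difference between two equal-sized grayscale frames.
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def filter_frames(frames, threshold=10.0):
    """Keep only frames that differ noticeably from the last kept frame."""
    kept = []
    last = None
    for frame in frames:
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(frame)
            last = frame
    return kept

# Two near-duplicate pairs: only one frame from each pair is forwarded.
frames = [[0] * 4, [1] * 4, [50] * 4, [51] * 4]
forwarded = filter_frames(frames, threshold=10.0)
```

In a static scene (an empty counter overnight, for example), almost every frame is suppressed, which is where the bandwidth and labeling savings come from.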
Is Zensors' accuracy bounded by Computer Vision or by the crowd?
We show that accuracy is bounded by crowd accuracy, mainly due to image quality and question ambiguity. However, there are ways to mitigate this, e.g., question templates, richer context, and example labels.
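Beyond the mitigations above, a standard way to reduce crowd noise is to collect several answers per frame and aggregate them. This majority-vote sketch is a generic technique, not necessarily how Zensors combines answers; `aggregate_labels` is a hypothetical helper:

```python
from collections import Counter

def aggregate_labels(answers):
    """Majority vote over multiple crowd answers for one frame.

    Returns the winning answer and the agreement ratio, a rough
    signal of question ambiguity (low agreement = ambiguous question).
    """
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)

label, agreement = aggregate_labels(["yes", "yes", "no"])
```

A low agreement ratio could also be used to flag a sensor whose question needs rewording or richer context.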
How much have you explored Machine Learning in this work?
We are just scratching the surface of machine learning. We continue to explore the factors that contribute to sensor accuracy, and we plan to investigate more sophisticated computer vision and machine learning techniques, such as deep learning.
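To illustrate how crowd labels can bootstrap even a very basic learner, the sketch below trains a toy nearest-centroid classifier on crowd-labeled feature vectors. The feature representation and classifier choice are assumptions for illustration only; they do not describe Zensors' actual ML pipeline:

```python
def train_nearest_centroid(examples):
    # examples: list of (feature_vector, crowd_label) pairs.
    # Average the feature vectors per label to form one centroid per label.
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, vec):
    # Assign the label of the nearest centroid (squared Euclidean distance).
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vec))
    return min(centroids, key=dist)

# Crowd-labeled frames (features could be, say, pixel-region statistics):
examples = [([0, 0], "empty"), ([1, 1], "empty"), ([10, 10], "busy")]
centroids = train_nearest_centroid(examples)
```

Once such a model agrees with the crowd often enough, frames can be routed to it instead of to human labelers, which is the usual motivation for this crowd-to-ML handoff.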