According to a recent survey, a majority of companies have now adopted artificial intelligence (AI). Benefits aside, AI regularly fails when making critical business decisions. This introduces significant risk for these companies and their end users, and failures can strike at the most unpredictable moments.
Computer vision (CV) is a field of AI that assesses and acts on information from visual inputs, such as digital images and video. CV models tend to be significantly more failure-prone than tabular or natural language processing (NLP) models because images are extremely high-dimensional, necessitating models that are far more complex and harder to interpret.
Real-world examples
CV has proven to be widely applicable, and failure can take many forms. One example is Twitter’s biased photo-cropping algorithm, which was used to automatically crop and zoom in on images displayed in tweets. A study of 10,000 images demonstrated bias in what the algorithm chose to highlight: it favored white people over Black people and exhibited gender bias as well. Twitter has since minimized its use of the algorithm and will likely stop using it altogether.
Another example is facial recognition. Its use is particularly contentious in law enforcement because racial biases embedded in the models inflict adverse impacts on marginalized groups. Some cities, such as Boston and San Francisco, have gone so far as to ban the use of facial recognition in policing. In the widely cited Gender Shades study, researcher Joy Buolamwini “uncovered severe gender and skin-type bias in gender classification” in facial analysis technology from leading companies like IBM and Microsoft.
Analyzing a CV model
Let’s walk through a concrete example to illustrate the magnitude of AI risk in CV. We trained a widely used model architecture (ResNet-18) on a well-known zero-shot learning benchmark (Animals with Attributes 2). The model’s task is to differentiate among 40 types of animals.
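To make this concrete, here is a minimal sketch of what such a training setup could look like in PyTorch. The data path, hyperparameters, and single training pass below are illustrative assumptions, not our exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Hypothetical path to an ImageFolder-style copy of Animals with Attributes 2,
# organized into one subdirectory per animal class.
AWA2_DIR = "data/awa2/train"

# Standard ImageNet preprocessing, since we fine-tune a pretrained backbone.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder(AWA2_DIR, transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

# Start from an ImageNet-pretrained ResNet-18 and swap in a 40-way head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 40)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```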
Our testing detected a host of vulnerabilities. An image brightness test immediately finds four images that are abnormally bright given the data the model was trained on.
The results of this test indicate that images with abnormal brightness levels are a threat in production: CV models are liable to fail on slight changes or minor abnormalities in the data.
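A simple version of such a brightness test is easy to sketch: compute the mean pixel intensity of every training image, then flag production images whose brightness sits far outside that distribution. The helper names and z-score threshold below are illustrative, not our production implementation:

```python
import numpy as np
from PIL import Image

def mean_brightness(path: str) -> float:
    """Average grayscale pixel intensity of an image."""
    with Image.open(path) as img:
        return float(np.asarray(img.convert("L"), dtype=np.float32).mean())

def find_brightness_outliers(train_paths, prod_paths, z_threshold=3.0):
    """Flag production images whose brightness is abnormal vs. training data."""
    train = np.array([mean_brightness(p) for p in train_paths])
    mu, sigma = train.mean(), train.std()
    return [p for p in prod_paths
            if abs(mean_brightness(p) - mu) / sigma > z_threshold]
```

A z-score over mean intensity is the simplest possible choice; per-channel statistics or heavier-tailed tests would catch subtler abnormalities.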
The test case below again makes clear that CV models are at risk from small changes in the data. A small amount of blur fools our state-of-the-art model into believing that a napping tiger is in fact a deer, further demonstrating the risks of putting CV models into use.
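A do-it-yourself version of this kind of blur test is also straightforward: perturb the input and check whether the predicted label flips. The function name, kernel size, and sigma below are arbitrary illustrative choices:

```python
import torch
from torchvision.transforms import functional as F

@torch.no_grad()
def blur_changes_prediction(model, image, kernel_size=9, sigma=2.0):
    """Return (original_label, blurred_label) for one normalized (3, H, W) tensor.

    A label that flips under mild Gaussian blur signals that the model is
    brittle to this perturbation.
    """
    model.eval()
    original = model(image.unsqueeze(0)).argmax(dim=1).item()
    blurred_image = F.gaussian_blur(image, kernel_size=kernel_size, sigma=sigma)
    blurred = model(blurred_image.unsqueeze(0)).argmax(dim=1).item()
    return original, blurred
```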
In manufacturing, blurred images may prevent fault-detection systems from catching defective parts, which could ultimately harm people. In agricultural systems, the brightness of the day may significantly affect the accuracy of automated crop management systems and, in turn, the food supply.
How leaders manage risk today
To better understand the unique challenges presented by managing CV models in production, Robust Intelligence hosted a roundtable with stakeholders and leaders in the field. Some key takeaways from the discussion are as follows:
- Testing and monitoring models is burdensome for teams
- Responding to an environment change is extremely cumbersome and takes a long time
- Many respond slowly to distribution shift because they fear hurrying the redeployment process will introduce additional errors
- Many are curious about best practices for model validation when models are redeployed
- Some save hard samples in a dataset bank that they hope will eventually cover all cases
- It is difficult to predict what abnormal inputs one will see in production
It’s notable that our roundtable participants largely expressed similar pain points despite coming from different industries and use cases. It was insightful to hear how some teams are attempting to manage risk. Two approaches sound especially promising: making model redeployment more efficient by automating the processing of “hard samples” seen during production, and regression testing new models before they ship, as sketched below.
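As a rough illustration of the latter, a redeployment gate over a saved bank of hard samples might look like the sketch below; the function names and the simple accuracy criterion are hypothetical:

```python
import torch

@torch.no_grad()
def accuracy_on_bank(model, hard_bank):
    """Accuracy over a saved bank of (image_tensor, true_label) hard samples."""
    model.eval()
    hits = sum(model(x.unsqueeze(0)).argmax(dim=1).item() == y
               for x, y in hard_bank)
    return hits / len(hard_bank)

def passes_regression_test(old_model, new_model, hard_bank, tolerance=0.0):
    """Gate redeployment: the candidate must not regress on the hard samples."""
    return accuracy_on_bank(new_model, hard_bank) >= (
        accuracy_on_bank(old_model, hard_bank) - tolerance)
```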
How to mitigate risk
It is clear that teams share common challenges in deploying CV models to production, and it’s important to address this risk by taking proactive steps to avoid these failures. Robust Intelligence is uniquely suited to solve this problem. As the industry’s first end-to-end solution designed to eliminate AI failure, we stress test your models in development and protect them in production.
Request a demo here if you’d like to learn how we can eliminate risk from your CV models.