Computer Vision AI and its Impact on Business
Reid McCrabb
February 8, 2024
“Vision is the most underinvested, highest potential field of AI. Language will never have the bitrate and dimensionality to approximate the universe. Language asks AI to come down to our level vs reaching its greatest potential. Language is merely a way to decode back to our human brains. It’s an inherently lossy way to compress dimensionality to our small minds. We are stuck in a local minima due to the success of ChatGPT with a grand race focused largely on language instead of vision.” – Suhail Doshi
Introduction
The surge of AI in business is being led by LLMs, but this has become a gateway drug into Computer Vision technology. Unlike Large Language Models that interpret and generate text, Computer Vision focuses on enabling computers to understand and interpret the visual world.
This technology automates a wide range of tasks traditionally performed by humans. By analyzing visual data, computer vision systems can recognize patterns in the world and act upon them or provide the information back to humans.
As this technology evolves, computer vision AI is set to redefine how businesses operate, making processes faster, more accurate, and significantly less reliant on manual intervention.
Business Applications
Most visual actions that humans perform are quantifiable by data and can be outsourced to technology. From scanning the roads for passing cars to tracking the number of customers that enter a storefront, computer vision applications are becoming more and more relevant in daily life. Below are some examples of the use cases emerging today.
Self Driving
Perhaps the most prevalent example of computer vision is in self-driving. Computer Vision Companies leading this space include Tesla and Waymo.
Tesla now has a “Full-self driving” package, where car operators can essentially become passengers in their own vehicle. Tesla’s self-driving is still improving, today it is still a requirement for a driver to be operation ready.
Waymo, on the other hand, is completely driverless. The self-driving taxi service is incredibly popular in San Francisco and to a smaller degree in Phoenix. Waymo plans to roll out the self driving taxi service in Austin and Los Angelos next.
Below, is a simplified version of computer vision being optimized for self-driving.
Coding
The ability to code has never been easier due to AI. As the barrier to entry lowers, more people are becoming developers, and the actual definition of what a developer is, rapidly is shifting.
Learning the syntax for given languages is not necessary in many cases today, and will be increasingly less useful in the future. A simple natural language prompt to AI can produce the needed code for most simple use cases.
Text-based applications, like chatGPT and Co-pilot are leading the way in the AI coding revolution. But, a series of use cases surrounding computer vision has been unleashed. Most notably, screenshots to code.
By supplying an image of a webpage, or of a Figma file, language models can write front end code at a pretty impressive level. The user can then continue iterating via text-based directions, or go into the actual code itself and make changes.
Retail
Quantifying customers coming through your door can enable companies to understand what days are busy, who their cliental is, the average length of stay, number of items purchased and much more.
This information is incredibly valuable to a business, and can enable them to prepare staff, inventory, advertising, and more based on the data collected.
This coffee shop uses computer vision to track the productivity of baristas, as well how much time customers are spending in the shop, and the number of cups of coffee consumed.
Understanding Computer Vision
At its core, Computer Vision enables machines to identify pixels and translate them into numerical data.
Rigorous training of these visual AI models grants the ability to achieve a high level of accuracy, often surpassing human capabilities by reducing the likelihood of biases and errors inherent in human judgment.
Furthermore, the speed of computer-based computation significantly exceeds that of human reasoning. Computer Vision systems can process and analyze visual data in real time, providing instant results without the delay typically associated with manual data processing and analysis. This efficiency opens up new possibilities for applications where timely decision-making is critical.
Linkt’s First Computer Vision Product
At Linkt, computer vision is a primary focus. We are currently developing a model that will be implemented to automate the quoting process for laborious tasks.
By combining a mixture of Computer vision and Transformer Models, we can automate both the tedious task of identifying objects and offering a quote to a client. On the company side, the job is given a rating on its level of difficulty, which then notifies which employees to send to the job.
More specifically, in the use case of a moving company we are working with, this means identifying pieces of furniture, calculating the dimensions, as well as if an item is fragile, bulky, or a tight fit and then using LLMs to reason a price and difficulty level based on the mixture of variables.
Computer Vision Innovation
Just like with Large Language Models, OpenAI’s closed source model, GPT-4V, is a leader in the vision AI space and used by many companies today. The model is much newer than the LLM models, and experiments are being run on what the actual use cases are. Open-source models are also available, the top ones today include LLaVA, CogVLM, and Yolov8.
Separate from Computer Vision, but ultimately related, is the development of spatial computing. This is the technology powering the Apple Vision Pro. The device has a new operating system, called the VisionOS. As well as a brand new app store, released on January 16, 2024 with 600 apps currently running.
While there are many fun user apps taking advantage of this new medium of computation, there is likely to be a new surge of business use cases to emerge.
Businesses that are able to capitalize on this paradigm shift, will reap the rewards. The future belongs to those who can envision and execute on the potential of these emerging use cases.