Computer Vision | AI Fundamentals Course | 3.2

Look around wherever you are right now.  See your phone?  Your coffee cup?  Maybe a dog lying on the floor or a car passing by outside? Now imagine teaching a computer to see all those things, not just to see them but to actually understand what it’s looking at.  That, in a nutshell, is what Computer Vision is all about.

Sounds wild, right?  But we’re already living in a world where computers recognize faces, detect diseases in X-rays, and help self-driving cars figure out when to stop at a red light. So in this post, we’re going to break down what computer vision is, how it works, and explore two foundational concepts in the field:

  • Image classification
  • Object detection

What is Computer Vision?

Computer Vision is a field of artificial intelligence that trains machines to “see” and make sense of the visual world. We humans do this all day, every day, without even thinking about it.  Our eyes take in the world, and our brains interpret what we’re seeing, whether it’s a cat, a stop sign, or a facial expression.

With computers, the process is more mechanical, but the goal is the same:  help machines understand and act on visual data.

Why is Computer Vision a Big Deal?

Here’s why this field matters so much:

  • Healthcare:  AI can flag possible tumors in medical scans, sometimes faster than a human reviewer.
  • Autonomous Vehicles:  Self-driving cars use cameras & vision to navigate.
  • Retail:  Smart stores use vision to track products & detect theft.
  • Security:  Facial recognition for access control & surveillance.
  • Agriculture:  Drones scan crops for health, hydration, & pests.
  • Accessibility:  Apps help visually impaired users identify objects or read text aloud.

And that’s just scratching the surface.  Computer vision is everywhere and growing fast.

How Does Computer Vision Work?

Let’s walk through the basic idea using an example.

Say you want to teach a computer to recognize pictures of dogs.  You’d start with a bunch of dog images.  The computer doesn’t see a “dog” the way we do.  It sees numbers:  a grid of pixels, each stored as color values.

Through a process called training, we help the computer learn patterns that distinguish a dog from, say, a loaf of bread or a bicycle. Generally, the more varied, well-labeled examples it sees, the better its predictions get.
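
To make “it sees numbers” concrete, here’s a minimal sketch of how an image might be stored.  The tiny images and the helper names (image_shape, mean_brightness) are made up for illustration; real images have far more pixels and are usually handled with a library rather than plain lists.

```python
# A hypothetical 3x3 grayscale image: each number is one pixel's brightness,
# from 0 (black) to 255 (white).
gray_image = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]

# A color image stores three numbers per pixel: (red, green, blue).
color_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def image_shape(image):
    """Return (height, width) of an image stored as a list of rows."""
    return len(image), len(image[0])


def mean_brightness(image):
    """Average pixel value of a grayscale image."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


print(image_shape(gray_image))      # (3, 3)
print(mean_brightness(gray_image))  # one number summarizing the whole image
print(color_image[0][0])            # (255, 0, 0), a pure red pixel
```

Everything a vision model learns, it learns from grids of numbers like these.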

Two Key Pillars of Computer Vision

Let’s zoom in on two of the most foundational and fascinating applications of computer vision:

  • Image Classification
  • Object Detection

Both are related, but they serve different purposes.

Part 1:  What is Image Classification?

Imagine you give an AI a photo and ask, “What’s in this image?”  If the AI responds with just one label, like “dog” or “pizza”, that’s image classification.

Definition:  Image classification is the process of assigning a label to an image from a fixed set of categories.
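
In code, “assigning a label from a fixed set of categories” often boils down to scoring every category and picking the highest.  Here’s a rough sketch of that last step.  The category list, the scores, and the function names are assumptions for illustration, and softmax is one common way (not the only one) to turn raw scores into probabilities.

```python
import math

CATEGORIES = ["cat", "car", "pizza"]  # the fixed set of labels (illustrative)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, categories=CATEGORIES):
    """Return the single most likely label and its probability."""
    probabilities = softmax(scores)
    best = max(range(len(categories)), key=lambda i: probabilities[i])
    return categories[best], probabilities[best]


# Made-up scores standing in for what a trained model might output for one image.
label, confidence = classify([2.0, 0.5, 0.1])
print(label, round(confidence, 2))  # cat 0.73
```

Notice that the answer is always exactly one label, no matter how many things are actually in the picture.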

Think of it like sorting pictures into folders:

  • All the cat pics go in the “Cat” folder.
  • All the car pics go in the “Car” folder.
  • All the pizza pics go in the “Pizza” folder.

Real-World Examples of Image Classification

  • Instagram or Facebook automatically tagging your friends.
  • Medical imaging tools detecting whether a scan is healthy or abnormal.
  • Wildlife conservation using drones to classify animals in photos.
  • Email filters recognizing & classifying spam based on images.

How Does It Work?

Behind the scenes, the process usually looks like this:

  • Step 1:  Data Collection
    • You gather thousands (or millions) of labeled images.  For example, photos labeled as “cat”, “dog”, “car”, etc.
  • Step 2:  Preprocessing
    • Images are resized, normalized, or converted to grayscale to reduce complexity.
  • Step 3:  Model Training
    • You use a machine learning model, often a Convolutional Neural Network (CNN), to learn from the images.
  • Step 4:  Testing & Inference
    • Once trained, you can feed the model a new image, and it will predict the most likely label.
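
The four steps above can be sketched end to end in a few lines.  To keep it runnable without any libraries, this sketch swaps the CNN for a much simpler stand-in (a nearest-centroid classifier) and uses a made-up dataset of 1x2-pixel “images” that are already the same size, so no resizing is needed.  Treat every name here as hypothetical; it only shows the shape of the workflow.

```python
def to_grayscale(rgb_image):
    """Average the red, green, and blue values of every pixel."""
    return [[sum(pixel) / 3 for pixel in row] for row in rgb_image]


def normalize(gray_image):
    """Scale 0-255 pixel values into the 0-1 range and flatten to one list."""
    return [value / 255 for row in gray_image for value in row]


def preprocess(rgb_image):
    return normalize(to_grayscale(rgb_image))


def train(labeled_images):
    """'Training' here just averages the examples of each label (a centroid)."""
    sums, counts = {}, {}
    for rgb_image, label in labeled_images:
        features = preprocess(rgb_image)
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, rgb_image):
    """Return the label whose centroid is closest to the new image."""
    features = preprocess(rgb_image)

    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: distance(model[label]))


# Step 1: a tiny made-up dataset of labeled images.
DARK, BRIGHT = (10, 10, 10), (240, 240, 240)
dataset = [
    ([[DARK, DARK]], "night"),
    ([[(30, 20, 25), DARK]], "night"),
    ([[BRIGHT, BRIGHT]], "day"),
    ([[BRIGHT, (200, 220, 210)]], "day"),
]

model = train(dataset)                              # Steps 2 and 3
print(predict(model, [[(220, 230, 225), BRIGHT]]))  # Step 4: prints "day"
```

A real CNN learns far richer patterns than an average brightness, but the loop is the same:  labeled data in, preprocessing, training, then predictions on images the model has never seen.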

Part 2:  What is Object Detection?

Image classification is great, but what if there are multiple things in the image?  Say you’re looking at a photo of a dog, a ball, and a tree.  You don’t just want to know what’s in the image.  You want to know:

  • What is there?
  • Where is it in the image?

That’s where object detection comes in.

Definition:  Object detection identifies multiple objects within an image & draws bounding boxes around them, labeling each object.

So instead of saying “this is a dog”, it says:

  • “There’s a dog at (x1, y1)-(x2, y2), a ball at (x3, y3)-(x4, y4), and a tree at (x5, y5)-(x6, y6).”
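
Here’s one way those coordinate pairs might look in code.  This is a sketch under a common convention that the post doesn’t spell out:  (x1, y1) is the top-left corner, (x2, y2) is the bottom-right, and values are in pixels.  The BoundingBox class and the coordinates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self):
        return self.x2 - self.x1

    @property
    def height(self):
        return self.y2 - self.y1

    @property
    def area(self):
        return self.width * self.height

    def contains(self, x, y):
        """True if the pixel (x, y) falls inside the box."""
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


# Made-up coordinates for the dog, ball, and tree from the example above.
boxes = [
    BoundingBox("dog", 40, 120, 200, 260),
    BoundingBox("ball", 220, 230, 260, 270),
    BoundingBox("tree", 300, 10, 420, 280),
]

for box in boxes:
    print(f"{box.label}: {box.width}x{box.height} px at ({box.x1}, {box.y1})")
```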

Real-World Examples of Object Detection

  • Autonomous Vehicles:  Spotting pedestrians, street signs, & other vehicles.
  • Retail:  Detecting products on shelves for inventory management.
  • Security Cameras:  Recognizing people or unusual objects.
  • Sports Analytics:  Tracking players, the ball, & movement patterns.
  • Smartphone Cameras:  Enhancing autofocus & facial detection.

How Does Object Detection Work?

It’s a bit more complex than classification, but let’s simplify it.

  • Step 1:  Annotated Training Data
    • You need images with bounding boxes & labels for each object in them.
    • Example
      • A “person” box
      • A “bicycle” box
      • A “dog” box
  • Step 2:  Train an Object Detection Model
    • Popular architectures for object detection include:
      • YOLO (You Only Look Once)
      • SSD (Single Shot MultiBox Detector)
      • Faster R-CNN (Region-based Convolutional Neural Network)
    • These models detect & localize multiple objects simultaneously.
  • Step 3:  Output
    • You feed an image, and the model spits out:
      • Coordinates of bounding boxes
      • The predicted object in each box
      • Confidence score (like:  “I’m 93% sure this is a truck”)
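
To show what you might do with that output, here’s a small sketch that drops low-confidence detections and then checks how well a predicted box lines up with a hand-annotated one.  The detections, the 0.5 threshold, and the function names are all made up.  The overlap measure used here, intersection over union (IoU), isn’t covered in this post, but it’s a widely used way to compare two boxes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes, from 0.0 to 1.0."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence score is below the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Made-up model output: one dict per detected object.
detections = [
    {"label": "truck", "box": (50, 60, 250, 200), "confidence": 0.93},
    {"label": "person", "box": (300, 80, 340, 190), "confidence": 0.71},
    {"label": "dog", "box": (10, 10, 30, 30), "confidence": 0.22},
]

confident = keep_confident(detections)
print([d["label"] for d in confident])  # ['truck', 'person']

# Compare the predicted truck box with a hand-annotated one (also made up).
annotated_truck = (60, 60, 250, 210)
print(round(iou(detections[0]["box"], annotated_truck), 2))  # 0.89
```

An IoU of 1.0 means the two boxes match exactly, and 0.0 means they don’t overlap at all.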

Image Classification vs. Object Detection

Here’s a quick side-by-side to clarify the difference:

Feature               | Image Classification        | Object Detection
Answers the question  | “What is this image?”       | “What’s in this image and where?”
Output                | One label per image         | Multiple labels with bounding boxes
Example Task          | Dog or Cat?                 | Find all dogs, cats, and birds in this photo
Real-World Use        | Medical scans, spam filters | Self-driving cars, security, retail automation

Challenges in Computer Vision

Computer vision is powerful, but it’s not perfect.  Here are a few common challenges:

  • Lighting & Weather
    • A model trained on sunny photos may struggle on a rainy day.
  • Angles & Perspectives
    • An object viewed from a weird angle might confuse the system.
  • Occlusion
    • What if a dog is behind a chair?  Can the AI still recognize it?
  • Bias in Training Data
    • If a facial recognition model is trained mostly on one ethnicity, it may perform poorly on others, which can lead to dangerous consequences.
  • Processing Power
    • High-end vision models (especially real-time ones) can demand a lot of computing power.

These challenges are why researchers are constantly refining algorithms, data collection methods, & fairness metrics.
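
As a taste of what a fairness check can look like, here’s a minimal sketch that measures accuracy separately for each group in a test set.  The records, group names, and the accuracy_by_group helper are invented for illustration, and per-group accuracy is just one simple metric among many.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: fraction of correct predictions} for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        if true_label == predicted_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up evaluation results: (group, true label, predicted label).
records = [
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_a", "match", "match"),
    ("group_a", "no_match", "no_match"),
    ("group_b", "match", "no_match"),
    ("group_b", "no_match", "no_match"),
    ("group_b", "match", "match"),
    ("group_b", "no_match", "match"),
]

scores = accuracy_by_group(records)
print(scores)  # {'group_a': 1.0, 'group_b': 0.5}
gap = max(scores.values()) - min(scores.values())
print(f"accuracy gap between groups: {gap:.2f}")
```

A large gap like this one is a signal to go back and look at the training data before trusting the model.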

What You Should Know Moving Forward

By now, you’ve got a solid grip on:

  • What computer vision is
  • The basics of image classification
  • How object detection works
  • Real-world uses of both
  • Some challenges & ongoing improvements

If you’re serious about diving deeper into the world of AI, understanding how machines see & interpret the world is essential. Even cooler?  You can actually experiment with pre-built tools like:

  • Google’s Teachable Machine
  • OpenCV (Open Source Computer Vision Library)
  • TensorFlow & PyTorch for model training

No PhD required to start playing around.

Final Thoughts

Computer Vision is the bridge between the digital & physical worlds.

  • It helps machines recognize, understand, & interact with the world around them.
  • It powers everything from Google Photos to robot navigation systems.
  • And with tools like image classification & object detection, it’s making things smarter, safer, & more efficient.

But the journey doesn’t stop there. As you continue your AI Fundamentals journey, think about this:  How would you want an AI to see the world?  What data would you feed it?  How would you ensure it sees fairly & accurately?

Because training an AI to see isn’t just a technical challenge…it’s a human one too.