Structure of Machine Learning Models | AI Fundamentals Course | 2.2

Let’s say you’re scrolling through your favorite streaming app.  You watched a few rom-coms last week, and suddenly, it’s suggesting a dozen more.  Magic? Nah.  That’s machine learning (ML) doing its thing behind the scenes.  But how does it actually work?

Welcome to your deep-dive into the structure of machine learning models, the inner workings that power everything from recommendation engines and facial recognition to language translation and self-driving cars. In this post, we’re going to break it down step by step:

  • What features & labels are (and why they matter)
  • What training data is and how models learn from it
  • How we evaluate if a model is actually doing its job

Let’s dive in.

What is a Machine Learning Model?

Imagine you’re trying to teach a dog tricks.  You give it a command (“sit”), show it how, and give it a treat when it gets it right; after enough repetition, the dog learns what “sit” means.

A machine learning model is a little like that dog.  But instead of learning tricks, it’s learning patterns from data.  You give it a bunch of examples (called training data), tell it what the correct answers are (labels), and it adjusts itself internally to get better over time.  Eventually, it can make predictions or decisions based on new data it hasn’t seen before.

Now let’s explore each part of this process in more detail.

Features: The Ingredients of Your Model

Think of features as the information you give to the machine learning model to help it make a decision.  It’s like the ingredients in a recipe.  Without them, the model can’t cook up any results.

Examples of Features

Let’s say you want to build a model that predicts the price of a house.  Here are some features you might use:

  • Square footage
  • Number of bedrooms
  • Number of bathrooms
  • ZIP code
  • Year built
  • Lot size

Each of these things is a feature: a data point that helps the model understand what might influence the house price.
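To make that concrete, here’s a minimal sketch of how one house could be represented in code (the field names and values are made up purely for illustration):

```python
# One house, described only by its features (the inputs the model gets to see).
# All values here are hypothetical.
house = {
    "square_footage": 1500,
    "bedrooms": 3,
    "bathrooms": 2,
    "zip_code": "90210",
    "year_built": 1995,
    "lot_size_sqft": 6000,
}
```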

Feature Types

Features generally fall into a few types (there’s a quick code sketch after this list):

  • Numerical:  Like age, income, temperature
  • Categorical:  Like color, gender, or ZIP code
  • Boolean:  True / false or yes / no
  • Textual:  Like reviews or comments (requires special handling)
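Here’s a tiny, hypothetical sketch of what each type can look like for a single record, and the kind of handling it usually needs:

```python
# A single made-up record mixing the four feature types.
record = {
    "age": 34,                    # numerical: usable as-is
    "favorite_color": "blue",     # categorical: usually one-hot encoded
    "is_subscriber": True,        # boolean: usually converted to 0/1
    "last_review": "Great app!",  # textual: needs vectorizing (e.g., bag-of-words)
}

# Categorical -> one-hot encoding: one 0/1 column per possible color.
colors = ["red", "green", "blue"]
color_one_hot = [1 if record["favorite_color"] == c else 0 for c in colors]
print(color_one_hot)                 # [0, 0, 1]

# Boolean -> number.
print(int(record["is_subscriber"]))  # 1
```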

Pro Tip

Choosing and creating the right features (a process called feature engineering) is one of the most important skills in machine learning.  Bad features = bad model.
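As a quick, hedged example of what that looks like in practice, here’s one way you might derive new, more informative features from the raw ones (names and numbers are invented for illustration):

```python
# Raw features for one house (hypothetical values).
house = {"square_footage": 1500, "year_built": 1995, "lot_size_sqft": 6000}

CURRENT_YEAR = 2024  # an assumption just for this example

# Engineered features: often more predictive than the raw columns alone.
house["age"] = CURRENT_YEAR - house["year_built"]                      # 29
house["yard_sqft"] = house["lot_size_sqft"] - house["square_footage"]  # 4500

print(house)
```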

Labels: The Correct Answers We Train On

So if features are the ingredients, labels are the final dish you’re trying to make.  In machine learning, a label is the value or category you want the model to predict.

Examples of Labels

Using our house price example:

  • The label is the actual sale price of the house.

If you’re training a model to recognize spam emails:

  • The label might be “spam” or “not spam”.

If you’re training a model to recognize cats in images:

  • The label is either “cat” or “no cat”.

Supervised Learning = Features + Labels

Most machine learning models (especially in beginner-level projects) use something called supervised learning, where the model learns from both the features and the correct answers (labels).  The idea is simple:  show the model examples and the correct labels, and it will learn to predict the labels from the features.
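Here’s a minimal sketch of that idea, assuming you have scikit-learn installed; the tiny dataset is invented purely to show the features-plus-labels pattern:

```python
from sklearn.linear_model import LinearRegression

# Features: [square footage, bedrooms].  Labels: sale price.  All values made up.
X = [[1500, 3], [2400, 4], [900, 2], [1800, 3]]
y = [250_000, 410_000, 160_000, 300_000]

model = LinearRegression()
model.fit(X, y)                    # learn from features + labels
print(model.predict([[2000, 3]]))  # predict the price of a house it hasn't seen
```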

Training Data: The Fuel for Learning

You’ve got your features (input) and your labels (output).  Now you need to connect the dots, and that’s where training data comes in.

What Is Training Data?

Training data is the collection of examples the model uses to learn patterns.  Each example includes both features and a label.

Let’s go back to the house price example.  Your training data might look like this:
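
  • 1,500 sq ft, 3 bedrooms, 2 bathrooms, built in 1995 → sold for $250,000
  • 2,400 sq ft, 4 bedrooms, 3 bathrooms, built in 2010 → sold for $410,000
  • 900 sq ft, 2 bedrooms, 1 bathroom, built in 1978 → sold for $160,000

(The numbers are made up purely to show the shape of the data: each row pairs a set of features with its label, the sale price.)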

The model analyzes these examples and “learns” how the input features relate to the output price.  Over time, the model updates its internal math (called weights and biases) to improve its predictions.

Testing & Evaluation: Did the Model Learn Anything Useful?

Okay, so your model has been trained.  Now what?  You need to evaluate it: basically, test how well it performs on new, unseen data.  Why?  Because we don’t want a model that just memorizes the training data.  We want one that generalizes to new examples.

Step 1:  Split Your Data

Most ML workflows involve splitting your dataset into:

  • Training Set (usually ~70-80% of the data):  For training the model
  • Testing Set (the remaining ~20-30%):  For evaluating performance

Sometimes, there’s also a validation set, used to fine-tune settings before the final test.
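In scikit-learn, that split is usually a single function call.  Here’s a quick sketch (the toy data and the 80/20 split are just illustrative):

```python
from sklearn.model_selection import train_test_split

# Features and labels (made-up housing data).
X = [[1500, 3], [2400, 4], [900, 2], [1800, 3], [2100, 4]]
y = [250_000, 410_000, 160_000, 300_000, 355_000]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # ~80% train, ~20% test
)
```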

Step 2:  Make Predictions on the Test Set

Now, the model makes predictions using the features in the test set.  Then we compare those predictions with the actual labels in the test set.
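Picking up the hypothetical split from the sketch above (X_train, X_test, y_train, y_test), that step looks something like this:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)     # train on the training portion only
y_pred = model.predict(X_test)  # predict on features the model has never seen

for predicted, actual in zip(y_pred, y_test):
    print(f"predicted {predicted:,.0f} vs. actual {actual:,}")
```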

Step 3:  Use Evaluation Metrics

How do we measure whether the model is good?  It depends on the kind of task you’re doing.  Let’s explore some common types.

Evaluation Methods in Machine Learning

A. For Classification Problems (e.g., spam vs. not spam)

Common metrics include the following (there’s a quick code sketch after this list):

  • Accuracy
    • The percentage of correct predictions
    • Simple & intuitive, but not always reliable (especially for imbalanced data)
  • Precision
    • Of all the positive predictions the model made, how many were correct?
    • Good when false positives are costly (for example, flagging a legitimate email as spam).
  • Recall
    • Of all the actual positives, how many did the model find?
    • Important when missing a positive is dangerous (like detecting fraud).
  • F1 Score
    • The harmonic mean of precision and recall.
    • Balances the two and is great when you need an overall performance metric.
  • Confusion Matrix
    • A table that shows true positives, false positives, true negatives, and false negatives.
    • It gives a deeper look into how the model performs.
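Here’s a minimal sketch of those metrics in scikit-learn, using small made-up lists of labels and predictions (1 = spam, 0 = not spam):

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (made up)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # the model's predictions (made up)

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows = actual, columns = predicted
```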

B. For Regression Problems (e.g., predicting house prices)

You’re predicting a number instead of a category, so the metrics look a little different (there’s a quick code sketch after this list).

  • Mean Absolute Error (MAE)
    • Average of how far off predictions are from actual values.
  • Mean Squared Error (MSE)
    • Like MAE, but squares the errors before averaging, which penalizes larger errors more.
  • Root Mean Squared Error (RMSE)
    • Just the square root of MSE, which puts the error back in the original units (like dollars).
  • R2 Score (Coefficient of Determination)
    • Measures how well the model explains the variability of the target.
    • A score of 1.0 means perfect prediction.
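And here are the regression equivalents, again with tiny made-up numbers standing in for house prices:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [250_000, 410_000, 160_000, 300_000]  # actual prices (made up)
y_pred = [240_000, 420_000, 175_000, 290_000]  # predicted prices (made up)

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = mse ** 0.5  # back to the original units (dollars)
r2 = r2_score(y_true, y_pred)

print(f"MAE: {mae:,.0f}   RMSE: {rmse:,.0f}   R2: {r2:.3f}")
```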

Analogy:  Teaching a Kid to Read

Let’s tie it all together with a real-world analogy.

  • Step 1:  Features
    • You show the kid flashcards with letters on them.  These letters are the features.
  • Step 2:  Labels
    • You also tell the kid what each letter is supposed to sound like.  These sounds are the labels.
  • Step 3:  Training
    • You go through hundreds of flashcards together. This is the training data.
  • Step 4:  Evaluation
    • Later, you test the kid with new flashcards to see if they can still read correctly.  You evaluate how many they got right, how fast, and whether they struggled with certain letters. These are the evaluation metrics.

That’s machine learning in a nutshell.

A Peek Inside the Model

If you’re wondering what’s actually happening under the hood, here’s a sneak peek.

Most models use something called a function: a bit of math that maps inputs (features) to outputs (predictions). For simple models, it might be a linear equation:
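
  predicted price ≈ (a × square footage) + (b × number of bedrooms) + … + c

(Here a and b are coefficients and c is a base value; the exact form depends on which features you use, so treat this as an illustration of our housing example, not a fixed formula.)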

The model learns the best values for those coefficients (like 200 or 10,000) by analyzing training data.  For more complex models like neural networks, the math gets deeper, but the principle stays the same:  map input features to output labels.

Building a Machine Learning Model Step-by-Step

Let’s recap everything with a simple step-by-step outline (there’s a compact end-to-end code sketch after the list):

  1. Collect Data:  Get your dataset (like housing data or emails).
  2. Extract Features & Labels:  Decide what information to use (features) and what to predict (labels).
  3. Split the Data:  Into training & test sets.
  4. Train the Model:  Feed the training data to the model.
  5. Make Predictions:  On the test data.
  6. Evaluate Performance:  Use metrics like accuracy, MAE, or F1 score.
  7. Tune & Improve:  Adjust features, try different algorithms, or clean up your data to get better results.
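If you want to see those seven steps as one compact, hedged sketch with scikit-learn (the toy dataset is invented, and a real project would involve far more data and care at every step), here it is:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1-2. Collect data, then pick features ([sq ft, bedrooms]) and labels (price).
X = [[1500, 3], [2400, 4], [900, 2], [1800, 3], [2100, 4], [1200, 2]]
y = [250_000, 410_000, 160_000, 300_000, 355_000, 200_000]

# 3. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# 4. Train the model.
model = LinearRegression().fit(X_train, y_train)

# 5. Make predictions on the test data.
y_pred = model.predict(X_test)

# 6. Evaluate performance.
print("MAE:", mean_absolute_error(y_test, y_pred))

# 7. Tune & improve: try new features, more data, or a different algorithm.
```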

Final Thoughts

Congrats! You now understand the basic structure of machine learning models.  This isn’t just geeky knowledge for coders or data scientists.  It’s foundational for anyone who wants to understand how AI works, make better use of smart tools, or even pursue a career in machine learning.

Whether you’re trying to:

  • Build your first model
  • Understand how AI makes decisions
  • Explain these concepts to others

– this knowledge gives you the power to do it.

Just remember:  AI models are only as good as the data we give them and the way we measure their success.  With great data (and great responsibility), we can build systems that really do make life better.