Building Blocks of AI
Welcome to an exciting lesson where we explore the core components that make artificial intelligence (AI) systems work! This lesson will introduce the fundamental building blocks of AI, such as datasets, features, and models, and show you how to interact with these concepts through coding and hands-on activities.
Objective
By the end of this lesson, you will:
- Understand the key components of AI systems: datasets, features, and models.
- Learn how AI uses data to make predictions and decisions.
- Write Python code to create, analyze, and visualize datasets.
- Build and test a basic machine learning model to predict outcomes.
Core Concepts
1. What is a Dataset?
A dataset is the foundation of any AI system. It is a collection of data that AI uses to learn patterns, make decisions, and provide meaningful insights. Think of a dataset as a table with rows and columns where each row represents an observation, and each column represents a feature or attribute of that observation.
Key Characteristics of a Dataset:
- Rows (Observations): Each row in a dataset represents a single observation or data point. For example, in a student dataset, each row might represent an individual student.
- Columns (Features): Each column contains attributes or features of the observation, such as “Name,” “Hours Studied,” or “Test Score.”
Example Dataset:
Imagine a dataset about students’ performance:
| Name | Hours Studied | Test Score |
|---|---|---|
| Alice | 2 | 50 |
| Bob | 4 | 70 |
| Charlie | 6 | 85 |
Why Are Datasets Important?
- Datasets provide the raw material for AI systems to learn and make predictions. Without data, an AI system has nothing to analyze or learn from.
- High-quality datasets with accurate and relevant information are critical for building effective AI models.
Activity: Think about a dataset in your daily life—such as your favorite music app’s playlist. What kinds of data might it store (e.g., song title, artist, play count)?
2. What are Features?
Features are the attributes or characteristics of the data that help an AI model make predictions. These are the building blocks of your dataset, and they play a critical role in determining the success of your AI model.
Key Points About Features:
- Relevance: Features must be relevant to the problem you are trying to solve. For example, “Hours Studied” is highly relevant when predicting “Test Score.”
- Quantitative vs. Qualitative Features: Features can be numbers (e.g., hours studied) or categories (e.g., subject names).
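The difference between numeric and categorical features can be seen directly in pandas. The sketch below is only an illustration: the "Subject" column and its values are invented for this example, and one-hot encoding with pd.get_dummies is one common way (not the only one) to turn categories into numbers a model can use.

```python
import pandas as pd

# Hypothetical data: the "Subject" column is invented for illustration
students = pd.DataFrame({
    "Hours Studied": [2, 4, 6],          # quantitative (numeric) feature
    "Subject": ["Math", "Art", "Math"],  # qualitative (categorical) feature
})

# Inspect how pandas stores each feature
print(students.dtypes)

# One-hot encode the categorical column so that every feature is numeric
encoded = pd.get_dummies(students, columns=["Subject"])
print(encoded)
```

After encoding, each subject gets its own column, and a row is marked in the column that matches its subject.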
Discussion Point:
What other features might influence test scores? For example, you might consider:
- Amount of sleep.
- Type of study material used.
- Number of practice tests taken.
Activity: Look at a simple dataset. Can you identify which features are most likely to affect the outcome?
3. What is a Model?
A model is the part of the AI system that learns from the data and makes predictions. Think of the model as the “brain” of your AI system. It looks at the features in the dataset, learns patterns from the data, and uses those patterns to predict outcomes.
How Does a Model Work?
- Training: During training, the model uses a dataset to identify relationships between features and outcomes. For example, it learns that as “Hours Studied” increases, “Test Score” also tends to increase.
- Prediction: After training, the model can use new data to make predictions. For instance, if a student studies for 8 hours, the model might predict a test score of 90.
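To make the two steps concrete, here is a minimal sketch of "training" and "prediction" written by hand with NumPy, using the three students from the example dataset. It assumes a straight-line model fitted by ordinary least squares; the helper name predict is made up for this sketch. On such a tiny dataset the fitted line predicts about 103.3 for 8 hours, which is above a 100-point scale, so treat the numbers as an illustration of the mechanics rather than a realistic forecast.

```python
import numpy as np

# The three students from the example dataset
hours = np.array([2, 4, 6], dtype=float)
scores = np.array([50, 70, 85], dtype=float)

# "Training": estimate the slope and intercept of the best-fitting line
# using the ordinary least-squares formulas
hours_centered = hours - hours.mean()
scores_centered = scores - scores.mean()
slope = np.sum(hours_centered * scores_centered) / np.sum(hours_centered ** 2)
intercept = scores.mean() - slope * hours.mean()


def predict(h):
    """Hypothetical helper: apply the learned line to new study hours."""
    return slope * h + intercept


# "Prediction": use the learned pattern on data the model has not seen
print("Slope:", slope)
print("Intercept:", intercept)
print("Predicted score for 8 hours:", predict(8))
```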
Types of Models:
- Linear Regression: Finds a straight-line relationship between features and outcomes.
- Classification Models: Categorize data into groups (e.g., spam vs. non-spam emails).
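For comparison with linear regression, a classification model predicts a category instead of a number. The sketch below assumes scikit-learn's LogisticRegression and uses invented pass/fail labels (1 = pass, 0 = fail); neither the labels nor the choice of model comes from the lesson's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Train a classifier that sorts students into two groups
clf = LogisticRegression()
clf.fit(hours, passed)

# Predict the group for two new students
new_students = np.array([[1.5], [7.5]])
labels = clf.predict(new_students)
print("Predicted classes for 1.5 and 7.5 hours:", labels)
```

Here the output is a class label for each student rather than a score.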
Activity: Think about real-life examples where AI models are used, such as predicting weather or recommending movies. How do you think these models make their predictions?
Hands-On Coding
Let’s dive into Python to explore these concepts in action!
1. Importing and Exploring Data
We’ll use Python’s pandas library to create and explore datasets. This allows us to analyze and manipulate data efficiently.
import pandas as pd
# Create a simple dataset
data = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Hours Studied": [2, 4, 6],
    "Test Score": [50, 70, 85]
})
# Display the dataset
print(data)
# View basic statistics
print(data.describe())
Activity:
- Run the code above. What insights can you gather from the dataset? Try adding a new row for another student.
- Use data.head() to display the first few rows of the dataset.
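If you get stuck on the activity, one possible approach is sketched below. The new student ("Dana", 8 hours, score 92) is made up for this example, and pd.concat is just one way to append a row.

```python
import pandas as pd

# Recreate the lesson's dataset so this snippet runs on its own
data = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Hours Studied": [2, 4, 6],
    "Test Score": [50, 70, 85]
})

# Hypothetical new student, invented for illustration
new_row = pd.DataFrame({
    "Name": ["Dana"],
    "Hours Studied": [8],
    "Test Score": [92]
})

# Append the new row and renumber the index
data = pd.concat([data, new_row], ignore_index=True)

# Show the first rows and the updated average score
print(data.head())
print("Average test score:", data["Test Score"].mean())
```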
2. Visualizing Data
Visualization helps us understand the relationship between features. We’ll use Python’s matplotlib library to create a scatter plot.
import matplotlib.pyplot as plt
# Data for plotting
hours = data["Hours Studied"]
scores = data["Test Score"]
# Create a scatter plot
plt.scatter(hours, scores, color='blue')
plt.title("Hours Studied vs Test Score")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.grid(True)
plt.show()
Activity:
- Modify the data points to observe how the graph updates. Can you identify a trend in the data?
- Add a line of best fit (use numpy.polyfit) to highlight the relationship.
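One way to approach the line-of-best-fit activity is sketched below. It assumes a degree-1 (straight-line) fit, for which numpy.polyfit returns the slope followed by the intercept; the colors and labels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same values as the lesson's dataset
hours = np.array([2, 4, 6])
scores = np.array([50, 70, 85])

# Fit a degree-1 polynomial (a straight line) to the points
slope, intercept = np.polyfit(hours, scores, 1)

# Plot the original points and the fitted line together
plt.scatter(hours, scores, color="blue", label="Students")
plt.plot(hours, slope * hours + intercept, color="red", label="Line of best fit")
plt.title("Hours Studied vs Test Score")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.legend()
plt.grid(True)
plt.show()
```

The line does not pass through every point; it is the straight line that stays closest to all of them overall.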
3. Building a Simple Model
We’ll use scikit-learn to create a machine learning model that predicts test scores based on study hours. This example uses linear regression, a simple and widely used technique.
from sklearn.linear_model import LinearRegression
import numpy as np
# Prepare the data for modeling
hours_studied = np.array([2, 4, 6]).reshape(-1, 1)
test_scores = np.array([50, 70, 85])
# Create and train the model
model = LinearRegression()
model.fit(hours_studied, test_scores)
# Predict test scores for new data
new_hours = np.array([8, 10]).reshape(-1, 1)
predictions = model.predict(new_hours)
print("Predicted scores for 8 and 10 hours studied:", predictions)
# Display the model's coefficients
print("Slope:", model.coef_)
print("Intercept:", model.intercept_)
Activity:
- Experiment with different input values for new_hours. How do the predictions change?
- Discuss: What does the slope (coefficient) represent in this context?
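As a starting point for the discussion, the sketch below checks one interpretation of the slope: the change in predicted score for each additional hour studied. It refits the same model on the lesson's three data points. It also shows that a straight line keeps rising for large inputs, so predictions far outside the training data (such as 10 hours) can exceed 100 and should be read with caution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as in the lesson
hours_studied = np.array([2, 4, 6]).reshape(-1, 1)
test_scores = np.array([50, 70, 85])

model = LinearRegression()
model.fit(hours_studied, test_scores)

# coef_ holds one value per feature; there is only one feature here
slope = model.coef_[0]

# Predictions one hour apart differ by exactly the slope
pred_4, pred_5 = model.predict(np.array([[4], [5]]))
print("Change in prediction from 4 to 5 hours:", pred_5 - pred_4)
print("Slope:", slope)

# Extrapolating well beyond the training data gives a score above 100
pred_10 = model.predict(np.array([[10]]))[0]
print("Predicted score for 10 hours:", pred_10)
```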
Copyright 2024 MAIS Solutions, LLC All Rights Reserved
