Datasets and Features

Learn Datasets and Features through a small prediction dataset: what the terms mean, when to use them, the code pattern, and a small task you can test immediately.

Plain meaning

A dataset is a collection of rows (examples). In each row, the features are the input values the model learns from, and the label is the answer it should predict. Learn the input, apply the smallest working syntax, check the output, then reuse the pattern in a real project.

Why it matters

Datasets and features matter because real Machine Learning work needs a consistent way to train, validate and explain a predictive model. Without a clear separation of features and labels, an experiment becomes harder to change, test and review.

Real use

In a real project, datasets and features are the starting point of a beginner machine learning experiment: you define the features and labels, choose metrics, and set aside validation rows.

Working example

Core pattern

This is the version to read first, run next, and modify last.

dataset = [
    {"feature": 1.2, "label": "low"},
    {"feature": 3.8, "label": "high"},
    {"feature": 2.4, "label": "medium"},
]
features = [row["feature"] for row in dataset]
labels = [row["label"] for row in dataset]
print({"rows": len(dataset), "features": features, "labels": labels})

Expected output

The code does not train a model or compute a metric. It separates the dataset into features and labels and prints one dictionary:

{'rows': 3, 'features': [1.2, 3.8, 2.4], 'labels': ['low', 'high', 'medium']}

Line by line

What each part does

Line 1 opens a list named dataset. Each item in the list is one row.

Lines 2 to 4 add three rows. Each row is a dictionary with a numeric "feature" (the input) and a "label" (the answer to predict).

Line 5 closes the list.

Line 6 collects the feature value from every row into the list features.

Line 7 collects the label from every row into the list labels, in the same row order as features.

Line 8 prints the row count together with both lists.

Methods and commands

Datasets and Features reference

Use these terms and formulas with the working example above.

features

X = [[feature_1, feature_2]]

Represent inputs the model can learn from.

features = [[area, bedrooms] for area, bedrooms, price in rows]

train/test split

split rows into train_rows and test_rows

Measure performance on data the model did not train on.

train_rows = rows[:80]
test_rows = rows[80:]
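
Slicing the first 80 rows only gives a fair split when the rows are in no particular order. The sketch below shuffles first, then slices. The rows are made-up (area, bedrooms, price) tuples and the 80/20 ratio is an assumption carried over from the slice above, not a rule.

```python
import random

# Hypothetical rows: (area, bedrooms, price). Values are invented for illustration.
rows = [(50 + i, 1 + i % 4, 100_000 + 2_000 * i) for i in range(100)]

# Shuffle a copy with a fixed seed so the split is reproducible
# and does not depend on the original row order.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

split_at = int(len(shuffled) * 0.8)  # 80% train, 20% test
train_rows = shuffled[:split_at]
test_rows = shuffled[split_at:]

print(len(train_rows), len(test_rows))  # 80 20
```

Every row lands in exactly one of the two lists, so the test rows stay unseen during training.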

label

y = [target]

Represent the answer the model should predict.

labels = [price for area, bedrooms, price in rows]
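
The features and label snippets both unpack a variable called rows that the reference never defines. A minimal runnable version, assuming each row is an (area, bedrooms, price) tuple with invented values, might look like this:

```python
# Hypothetical rows: (area in square metres, bedrooms, price).
rows = [
    (52.0, 1, 150_000),
    (80.5, 3, 240_000),
    (65.0, 2, 199_000),
]

# Inputs: one [area, bedrooms] list per row.
features = [[area, bedrooms] for area, bedrooms, price in rows]
# Targets: the price of the same row, in the same order.
labels = [price for area, bedrooms, price in rows]

# Each feature row must line up with exactly one label.
assert len(features) == len(labels)

print(features)  # [[52.0, 1], [80.5, 3], [65.0, 2]]
print(labels)    # [150000, 240000, 199000]
```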

baseline

predict the average (numeric labels) or the majority class (class labels)

Create a simple reference before using a complex model.

baseline = sum(labels) / len(labels)
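
The average only works when labels are numbers. As a sketch of both baselines, the example below computes a mean for numeric labels and a majority class for class labels. The prices and classes are invented for illustration.

```python
from collections import Counter

# Numeric labels: the baseline predicts the mean for every row.
prices = [150_000, 240_000, 199_000, 211_000]
mean_baseline = sum(prices) / len(prices)
print(mean_baseline)  # 200000.0

# Class labels: the baseline predicts the most common class for every row.
classes = ["low", "high", "low", "medium", "low"]
majority_class, count = Counter(classes).most_common(1)[0]
baseline_accuracy = count / len(classes)
print(majority_class, baseline_accuracy)  # low 0.6
```

A model is only worth keeping if it beats these numbers on rows it did not train on.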

accuracy

correct / total

Score classification when classes are reasonably balanced.

accuracy = correct / len(actual)
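
The snippet above assumes correct and actual already exist. One way to fill them in, using made-up actual and predicted lists, is to count the positions where the two lists agree:

```python
# Hypothetical true labels and model predictions, in the same row order.
actual = ["low", "high", "medium", "low", "high"]
predicted = ["low", "high", "low", "low", "medium"]

if len(actual) != len(predicted):
    raise ValueError("actual and predicted must have the same length")

# True counts as 1 and False as 0, so sum() counts the matches.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

print(correct, accuracy)  # 3 0.6
```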

precision

tp / (tp + fp)

Measure how many positive predictions were actually positive.

precision = true_positive / (true_positive + false_positive)

recall

tp / (tp + fn)

Measure how many real positives the model found.

recall = true_positive / (true_positive + false_negative)
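
The precision and recall formulas above need three counts. The sketch below derives them from two invented lists of binary labels, where 1 is the positive class, and guards against division by zero when a count is empty.

```python
# Hypothetical binary labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

pairs = list(zip(actual, predicted))
true_positive = sum(1 for a, p in pairs if a == 1 and p == 1)
false_positive = sum(1 for a, p in pairs if a == 0 and p == 1)
false_negative = sum(1 for a, p in pairs if a == 1 and p == 0)

# Fall back to 0.0 when there is nothing to divide by.
predicted_positive = true_positive + false_positive
real_positive = true_positive + false_negative
precision = true_positive / predicted_positive if predicted_positive else 0.0
recall = true_positive / real_positive if real_positive else 0.0

print(true_positive, false_positive, false_negative)  # 3 1 1
print(precision, recall)  # 0.75 0.75
```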

standardization

(value - mean) / std

Put numeric features on comparable scales.

scaled = [(x - mean) / std for x in values]
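
The standardization snippet leaves mean and std undefined. A minimal sketch using Python's statistics module follows. The values are invented, and the population standard deviation (pstdev) is an assumption, since the reference does not say which variant it means.

```python
import statistics

# Hypothetical numeric feature values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(values)   # 5.0
std = statistics.pstdev(values)   # 2.0 (population standard deviation)

if std == 0:
    raise ValueError("all values are identical, so there is nothing to scale")

scaled = [(x - mean) / std for x in values]
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

When a train/test split is in use, mean and std are usually computed from the training rows only and then reused to scale the test rows.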

Try it yourself

Edit and run the concept

Change one thing at a time so the output stays easy to understand.

Examples

Worked example

This example keeps the core pattern unchanged so you can compare your own variations against it.

Beginner example

python

dataset = [
    {"feature": 1.2, "label": "low"},
    {"feature": 3.8, "label": "high"},
    {"feature": 2.4, "label": "medium"},
]
features = [row["feature"] for row in dataset]
labels = [row["label"] for row in dataset]
print({"rows": len(dataset), "features": features, "labels": labels})

The code builds three rows, separates features from labels, and prints the row count together with both lists. It does not train a model or compute a metric.

Practice

Build understanding

Rewrite the Datasets and Features example for a prediction dataset of your own, using your own features and labels.

Add one edge case, such as a row with a missing feature or label, and record the output.

Explain where Datasets and Features fits inside a beginner machine learning experiment.

Mini task

Build one tiny step of a beginner machine learning experiment that uses Datasets and Features, then write the expected output before running it.

Checklist

Use it correctly

Datasets and Features is easier when connected to a real task.
Small examples are the fastest way to catch misunderstandings.
Practice, quiz review and projects reinforce the lesson.
Line-by-line review turns copied code into understood code.

Common mistake

Skipping the small datasets and features example and trying to memorize the rule first.

Best practice

Use descriptive names so the example explains itself.

Interview prep

Datasets and Features questions

Use these as concise model answers, then rewrite them in your own words.

1. What is Datasets and Features in Machine Learning?

A dataset is the collection of examples a model learns from. Features are the input values in each example, and the label is the answer the model should predict. A strong answer includes the purpose, a tiny example, and the result you expect after running it.

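2. Why do developers use datasets and features?

Real Machine Learning work needs a consistent way to train, validate and explain a predictive model. Keeping features and labels clearly separated makes an experiment easier to change, test and review.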

3. How would you use datasets and features in a real project?

In a real project, datasets and features are the starting point of a beginner machine learning experiment: you define the features and labels, choose metrics, and set aside validation rows. Start with the simple syntax, keep names clear, run the code, then handle one edge case before expanding the experiment.

4. What mistake should a beginner avoid with datasets and features?

Skipping the small datasets and features example and trying to memorize the rule first.

5. How would you explain the Machine Learning Introduction topic during an interview?

Explain Machine Learning Introduction through its purpose, a small example, and one common mistake.

6. How would you explain the AI vs ML vs Deep Learning topic during an interview?

Explain AI vs ML vs Deep Learning through its purpose, a small example, and one common mistake.

Simple rule

Start with the working example, change one value, run it again, and explain why the output changed. That makes datasets and features useful instead of memorized.