Lecture 10 - NumPy for Data Science (Arrays, Indexing, Vectorization)

Learn NumPy for Data Science with practical examples. Master arrays, indexing, slicing and vectorized operations in Python with this beginner-friendly guide.

Introduction

NumPy is the core numerical library in Python and a backbone of data science, machine learning, and scientific computing. Major data science libraries such as Pandas and Scikit-Learn are built on top of NumPy, and deep learning frameworks such as TensorFlow and PyTorch are designed to interoperate with NumPy arrays.

In this lecture, you will learn how NumPy works, why it is faster than regular Python lists, and how to use NumPy arrays for efficient data analysis. This lecture focuses on arrays, indexing, slicing, and vectorized operations, which are essential skills for any data scientist.

By the end of this lecture, you will be able to:

Understand NumPy arrays and how they differ from Python lists
Create and manipulate arrays
Use indexing and slicing to access data
Perform fast calculations using vectorization

What is NumPy?

NumPy stands for Numerical Python. It provides:

High-performance multidimensional arrays
Mathematical functions for numerical operations
Tools for linear algebra, statistics, and random numbers

Why NumPy is Important for Data Science

Faster computation than Python lists
Less memory usage
Cleaner and more readable code
Industry standard for numerical computing


Installing and Importing NumPy

If NumPy is not already installed, you can install it using pip:

pip install numpy

Import NumPy in your Python program:

import numpy as np

The alias np is the standard convention used in the NumPy documentation and across the Python data science community.

Understanding NumPy Arrays

A NumPy array is a collection of elements of the same data type, typically stored in one contiguous block of memory.

Creating a NumPy Array

import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr)

Output:

[10 20 30 40 50]

Difference Between Python List and NumPy Array

Feature	Python List	NumPy Array
Data Type	Mixed	Same type
Speed	Slower	Faster
Memory	More	Less
Operations	Loop-based	Vectorized
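The differences in the table can be seen in a small sketch. The variable names here are illustrative only and are not part of the lecture's own examples.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an array scales each element.
list_times_two = py_list * 2      # [1, 2, 3, 1, 2, 3]
arr_times_two = np_arr * 2        # elements become 2, 4, 6

# A list keeps mixed types as they are; an array converts to one common dtype.
mixed = np.array([1, 2.5, 3])     # the integers are converted to floats

print(list_times_two)
print(arr_times_two)
print(mixed.dtype)                # float64
```

This is also why mixing data types in an array is usually best avoided: NumPy silently picks a single type that can hold every element.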

Creating Arrays Using NumPy Functions

Zeros and Ones

np.zeros(5)
np.ones(5)

Range of Values

np.arange(0, 10, 2)

Evenly Spaced Numbers

np.linspace(0, 1, 5)
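To see what these creation functions return, here is a short sketch that stores and prints each result. The variable names are chosen for illustration.

```python
import numpy as np

zeros = np.zeros(5)              # five zeros, float64 by default
ones = np.ones(5)                # five ones, float64 by default
evens = np.arange(0, 10, 2)      # start at 0, stop before 10, step 2
spaced = np.linspace(0, 1, 5)    # 5 points from 0 to 1, both ends included

print(zeros)
print(ones)
print(evens)    # values: 0, 2, 4, 6, 8
print(spaced)   # values: 0, 0.25, 0.5, 0.75, 1
```

Note the difference between the two range functions: arange takes a step size and excludes the stop value, while linspace takes the number of points and includes the stop value.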


Array Dimensions and Shape

One-Dimensional Array

arr = np.array([1, 2, 3])
print(arr.ndim)

Output:

1

Two-Dimensional Array

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)

Output:

(2, 3)

This means 2 rows and 3 columns.
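The attributes ndim, shape, and size are often checked together when inspecting an array. A minimal sketch, reusing the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.ndim)    # 2  (number of dimensions)
print(matrix.shape)   # (2, 3)  (rows, columns)
print(matrix.size)    # 6  (total number of elements)

# The shape tuple can be unpacked directly.
rows, cols = matrix.shape
print(rows, cols)     # 2 3
```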

Indexing NumPy Arrays

Indexing allows you to access specific elements from an array.

1D Array Indexing

arr = np.array([10, 20, 30, 40])
print(arr[1])

Output:

20

2D Array Indexing

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[1, 2])

Output:

6
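Indexing starts at 0, and negative indices count from the end, just as with Python lists. The sketch below shows a few more ways to index the same arrays; the variable names are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

last = arr[-1]              # 40, the last element
second_row = matrix[1]      # a single index on a 2D array selects a whole row
corner = matrix[-1, -1]     # 6, last row and last column

print(last)
print(second_row)           # values: 4, 5, 6
print(corner)
```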

Slicing NumPy Arrays

Slicing extracts a portion of an array.

Slicing 1D Arrays

arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])

Output:

[20 30 40]

Slicing 2D Arrays

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix[:2, :2])

Output:

[[1 2]
 [4 5]]
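Slices follow the pattern start:stop:step, and the stop index is excluded. The following sketch shows a few more slicing patterns on the same data; the variable names are chosen for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

every_other = arr[::2]        # step of 2: values 10, 30, 50
second_col = matrix[:, 1]     # all rows, column 1: values 2, 5, 8
last_two_rows = matrix[1:]    # rows 1 and 2, all columns

print(every_other)
print(second_col)
print(last_two_rows)
```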

Boolean Indexing

Boolean indexing is widely used in data science for filtering data.

arr = np.array([10, 20, 30, 40, 50])
print(arr[arr > 25])

Output:

[30 40 50]
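The expression arr > 25 produces an array of True and False values (a mask), and that mask selects the matching elements. Conditions can be combined with & (and) and | (or), with each condition wrapped in parentheses. A short sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = arr > 25
print(mask)                               # [False False  True  True  True]

# Combine conditions with & and |; the parentheses are required.
middle = arr[(arr > 15) & (arr < 45)]     # values 20, 30, 40
extremes = arr[(arr < 20) | (arr > 40)]   # values 10, 50

print(middle)
print(extremes)
```

Python's and / or keywords do not work on arrays here; the element-wise operators & and | are used instead.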

Vectorized Operations (Most Important Concept)

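Vectorization means performing operations on entire arrays without writing explicit Python loops. The looping still happens, but inside NumPy's compiled code, which is why it is usually much faster.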

Example Without NumPy (Slow)

data = [1, 2, 3, 4]
result = []

for x in data:
    result.append(x * 2)

Same Task Using NumPy (Fast)

arr = np.array([1, 2, 3, 4])
print(arr * 2)

Output:

[2 4 6 8]

Common Vectorized Operations

arr + 5
arr - 2
arr * 3
arr / 2
arr ** 2
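To make the results of these operations visible, here is the same list as a runnable sketch using arr = [1, 2, 3, 4]. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

added = arr + 5        # 6, 7, 8, 9
subtracted = arr - 2   # -1, 0, 1, 2
tripled = arr * 3      # 3, 6, 9, 12
halved = arr / 2       # 0.5, 1.0, 1.5, 2.0 (division always gives floats)
squared = arr ** 2     # 1, 4, 9, 16

for result in (added, subtracted, tripled, halved, squared):
    print(result)
```

Each operation returns a new array and leaves arr unchanged.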

Mathematical Functions in NumPy

NumPy provides built-in math functions optimized for performance.

arr = np.array([1, 4, 9, 16])

np.sqrt(arr)
np.mean(arr)
np.sum(arr)
np.max(arr)
np.min(arr)
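The calls above return values without printing them. A minimal sketch that stores and prints each result for the same array:

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

roots = np.sqrt(arr)     # element-wise: 1.0, 2.0, 3.0, 4.0
mean = np.mean(arr)      # 7.5
total = np.sum(arr)      # 30
largest = np.max(arr)    # 16
smallest = np.min(arr)   # 1

print(roots)
print(mean, total, largest, smallest)
```

Note that np.sqrt works element by element and returns an array, while mean, sum, max, and min reduce the whole array to a single number.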

Broadcasting in NumPy

Broadcasting allows NumPy to perform element-wise operations on arrays of different but compatible shapes, for example an array and a single number.

arr = np.array([1, 2, 3])
print(arr + 10)

Each element gets +10 automatically.
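Broadcasting also works between arrays, not only between an array and a number. As a sketch, the example below adds a 1D row and a column to a 2D matrix; the names row and col are chosen for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# The row is added to every row of the matrix.
plus_row = matrix + row             # [[11 22 33], [14 25 36]]

# The column is added to every column of the matrix.
plus_col = matrix + col             # [[101 102 103], [204 205 206]]

print(plus_row)
print(plus_col)
```

Shapes are compatible when, comparing from the last dimension backwards, the sizes are equal or one of them is 1. Incompatible shapes raise a ValueError.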

Real-World Data Science Example

marks = np.array([65, 70, 80, 90, 55])

average = np.mean(marks)
passed = marks[marks >= 60]

print("Average:", average)
print("Passed Students:", passed)

This type of operation is common in analytics and machine learning preprocessing.
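The same boolean mask can also be used for counting. As a small extension of the example, and assuming the same pass mark of 60, the sketch below computes how many students passed and the pass rate; the variable names are illustrative.

```python
import numpy as np

marks = np.array([65, 70, 80, 90, 55])

passed_mask = marks >= 60          # True where the student passed
num_passed = passed_mask.sum()     # True counts as 1, so this is 4
pass_rate = passed_mask.mean()     # fraction of True values, 0.8

print("Average:", np.mean(marks))  # 72.0
print("Number passed:", num_passed)
print("Pass rate:", pass_rate)
```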

Common Mistakes to Avoid

Using Python loops instead of vectorization
Mixing data types unnecessarily
Forgetting array shape when working with 2D data
Modifying array slices unintentionally (views vs copies)
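The last mistake deserves a closer look. A basic slice of a NumPy array is a view onto the same data, not a copy, so changing the slice changes the original. A minimal sketch of the problem and of the fix with .copy():

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A slice is a view: writing to it also changes arr.
view = arr[1:4]
view[0] = 99
print(arr)                            # [10 99 30 40 50]
print(np.shares_memory(arr, view))    # True

# An explicit copy is independent of the original.
safe = arr[1:4].copy()
safe[0] = -1
print(arr)                            # still [10 99 30 40 50]
print(np.shares_memory(arr, safe))    # False
```

Boolean indexing, unlike basic slicing, returns a copy, so arr[arr > 25] can be modified without affecting arr.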

Conclusion

NumPy is the foundation of Python data science. Understanding arrays, indexing, slicing, and vectorization will make your code faster, cleaner, and more professional. This lecture prepares you for working with real datasets, which we will start handling using Pandas in the next lecture.

Next Lecture: Lecture 11 – Pandas Fundamentals (Series, DataFrames, Reading Datasets)
