Lecture 10 – NumPy for Data Science (Arrays, Indexing, Vectorization)

Learn NumPy for Data Science with practical examples. Master arrays, indexing, slicing and vectorized operations in Python with this beginner-friendly guide.

Introduction

NumPy is the core numerical library in Python and a backbone of data science, machine learning, and scientific computing. Libraries such as Pandas and Scikit-Learn are built on top of NumPy, and deep learning frameworks such as TensorFlow and PyTorch are designed to interoperate with NumPy arrays.

In this lecture, you will learn how NumPy works, why it is faster than regular Python lists, and how to use NumPy arrays for efficient data analysis. This lecture focuses on arrays, indexing, slicing, and vectorized operations, which are essential skills for any data scientist.

By the end of this lecture, you will be able to:

  • Understand NumPy arrays and how they differ from Python lists
  • Create and manipulate arrays
  • Use indexing and slicing to access data
  • Perform fast calculations using vectorization

What is NumPy?

NumPy stands for Numerical Python. It provides:

  • High-performance multidimensional arrays
  • Mathematical functions for numerical operations
  • Tools for linear algebra, statistics, and random numbers

Why NumPy is Important for Data Science

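  • Faster numerical computation than Python lists, because operations run in compiled code
  • Lower memory usage, because elements are stored as raw values rather than as separate Python objects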
  • Cleaner and more readable code
  • Industry standard for numerical computing
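The memory claim above can be checked with a rough sketch like the one below. It assumes CPython and a 64-bit integer array; the variable names are illustrative, and the exact list size varies by Python version, so only the array size is stated as a fixed number.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores raw values: n elements * 8 bytes each
array_bytes = np_arr.nbytes

print("List bytes :", list_bytes)
print("Array bytes:", array_bytes)   # 8000
```

On a typical CPython build the list total comes out several times larger than the array's 8000 bytes.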

Installing and Importing NumPy

If NumPy is not already installed, you can install it using pip:

pip install numpy

Import NumPy in your Python program:

import numpy as np

The alias np is a standard convention used worldwide.

Understanding NumPy Arrays

A NumPy array (an ndarray) is a collection of elements that all share the same data type, stored by default in one contiguous block of memory.

Creating a NumPy Array

import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr)

Output:

[10 20 30 40 50]
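Because every element must share one data type, NumPy converts mixed input to a common type. The short sketch below illustrates this with the dtype attribute; the exact integer type depends on the platform, so it is described in the comment rather than asserted.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # integers and a float together

print(ints.dtype)    # an integer type (int64 on most platforms)
print(mixed.dtype)   # float64: the integers were upcast to float
print(mixed)         # [1.  2.5 3. ]
```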

Difference Between Python List and NumPy Array

Feature       Python List    NumPy Array
Data Type     Mixed          Same type
Speed         Slower         Faster
Memory        More           Less
Operations    Loop-based     Vectorized
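The "Operations" row is the one that most often surprises beginners. As a small illustration, the same operators behave differently on a list and on an array:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

print(py_list * 2)         # [1, 2, 3, 1, 2, 3]  (the list is repeated)
print(np_arr * 2)          # [2 4 6]             (each element is doubled)

print(py_list + py_list)   # [1, 2, 3, 1, 2, 3]  (concatenation)
print(np_arr + np_arr)     # [2 4 6]             (element-wise addition)
```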

Creating Arrays Using NumPy Functions

Zeros and Ones

np.zeros(5)
np.ones(5)

Range of Values

np.arange(0, 10, 2)

Evenly Spaced Numbers

np.linspace(0, 1, 5)
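Printing the results of the four functions above makes their behavior easier to compare. Note that np.arange excludes the stop value, while np.linspace includes it by default. The variable names are only for illustration.

```python
import numpy as np

zeros = np.zeros(5)
ones = np.ones(5)
evens = np.arange(0, 10, 2)      # start, stop (excluded), step
spaced = np.linspace(0, 1, 5)    # start, stop (included), number of points

print(zeros)    # [0. 0. 0. 0. 0.]
print(ones)     # [1. 1. 1. 1. 1.]
print(evens)    # [0 2 4 6 8]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```

np.zeros and np.ones return floating-point arrays unless a dtype is given.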

Array Dimensions and Shape

One-Dimensional Array

arr = np.array([1, 2, 3])
print(arr.ndim)

Two-Dimensional Array

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)

Output:

(2, 3)

This means 2 rows and 3 columns.
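A short sketch that puts ndim, shape, and size side by side, and uses reshape to rearrange the same six values into a different shape:

```python
import numpy as np

arr = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim, arr.shape)         # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)
print(matrix.size)                 # 6 (total number of elements)

# reshape keeps the elements but changes rows and columns
reshaped = matrix.reshape(3, 2)
print(reshaped.shape)              # (3, 2)
```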

Indexing NumPy Arrays

Indexing allows you to access specific elements from an array.

1D Array Indexing

arr = np.array([10, 20, 30, 40])
print(arr[1])

Output:

20

2D Array Indexing

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[1, 2])

Output:

6
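Indexing starts at 0, and negative indices count from the end, just as with Python lists. A few more cases on the same arrays, as a quick sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[0])           # 10  (first element)
print(arr[-1])          # 40  (last element)

print(matrix[0])        # [1 2 3]  (a single index selects a whole row)
print(matrix[0, 0])     # 1        (row 0, column 0)
print(matrix[-1, -1])   # 6        (last row, last column)
```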

Slicing NumPy Arrays

Slicing extracts a portion of an array.

Slicing 1D Arrays

arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])

Output:

[20 30 40]

Slicing 2D Arrays

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix[:2, :2])

Output:

[[1 2]
 [4 5]]
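The general slice form is start:stop:step, where stop is excluded and any part may be omitted. The sketch below shows a few other common patterns, including selecting a column:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(arr[::2])        # [10 30 50]        (every second element)
print(arr[::-1])       # [50 40 30 20 10]  (reversed)

print(matrix[:, 1])    # [2 5 8]           (all rows, column 1)
print(matrix[1:, :])   # rows 1 and 2, all columns:
                       # [[4 5 6]
                       #  [7 8 9]]
```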

Boolean Indexing

Boolean indexing is widely used in data science for filtering data.

arr = np.array([10, 20, 30, 40, 50])
print(arr[arr > 25])

Output:

[30 40 50]
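The expression arr > 25 produces an array of True/False values (a mask), and indexing with it keeps only the True positions. Conditions can be combined with & (and) and | (or); each condition needs its own parentheses. A small sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = arr > 25
print(mask)                            # [False False  True  True  True]

# Combine conditions with & and |, not the keywords "and" / "or"
print(arr[(arr > 15) & (arr < 45)])    # [20 30 40]
print(arr[(arr < 20) | (arr > 40)])    # [10 50]
```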

Vectorized Operations (Most Important Concept)

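Vectorization means performing operations on entire arrays without writing explicit Python loops. The looping still happens, but inside NumPy's compiled code, which is much faster than a Python for loop.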

Example Without NumPy (Slow)

data = [1, 2, 3, 4]
result = []

for x in data:
    result.append(x * 2)

Same Task Using NumPy (Fast)

arr = np.array([1, 2, 3, 4])
print(arr * 2)

Output:

[2 4 6 8]
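To see the speed difference yourself, you can time both approaches with the standard timeit module. This is only a rough sketch: the array size and function names are arbitrary choices, and the measured times depend on your machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

n = 100_000
data = list(range(n))
arr = np.arange(n)

def double_with_loop():
    # Plain Python: one multiplication per loop iteration
    result = []
    for x in data:
        result.append(x * 2)
    return result

def double_with_numpy():
    # Vectorized: one expression for the whole array
    return arr * 2

loop_time = timeit.timeit(double_with_loop, number=10)
numpy_time = timeit.timeit(double_with_numpy, number=10)

print(f"Loop : {loop_time:.4f} s")
print(f"NumPy: {numpy_time:.4f} s")
```

Both functions produce the same values; only the time taken differs.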

Common Vectorized Operations

arr + 5
arr - 2
arr * 3
arr / 2
arr ** 2
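Printing each of these for a small array shows that every operation is applied element by element. Note that division always returns floating-point values.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr + 5)    # [6 7 8 9]
print(arr - 2)    # [-1  0  1  2]
print(arr * 3)    # [ 3  6  9 12]
print(arr / 2)    # [0.5 1.  1.5 2. ]
print(arr ** 2)   # [ 1  4  9 16]
```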

Mathematical Functions in NumPy

NumPy provides built-in math functions optimized for performance.

arr = np.array([1, 4, 9, 16])

np.sqrt(arr)
np.mean(arr)
np.sum(arr)
np.max(arr)
np.min(arr)
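Here are the results of those calls, followed by a sketch of the axis argument, which lets the same functions summarize a 2D array by column or by row. The matrix is an illustrative example.

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

print(np.sqrt(arr))   # [1. 2. 3. 4.]
print(np.mean(arr))   # 7.5
print(np.sum(arr))    # 30
print(np.max(arr))    # 16
print(np.min(arr))    # 1

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(matrix, axis=0))   # [5 7 9]   (sum down each column)
print(np.sum(matrix, axis=1))   # [ 6 15]   (sum across each row)
```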

Broadcasting in NumPy

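Broadcasting allows NumPy to perform operations on arrays of different but compatible shapes by stretching the smaller one to match the larger, without copying data. The simplest case is an array combined with a single number: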

arr = np.array([1, 2, 3])
print(arr + 10)

The scalar 10 is added to every element automatically, giving [11 12 13].
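Broadcasting also works between arrays. As a sketch with illustrative names: a row of shape (3,) is applied to every row of a (2, 3) matrix, and a column of shape (2, 1) is applied to every column. Shapes that cannot be lined up raise a ValueError.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

try:
    matrix + np.array([1, 2])         # shape (2,) does not match 3 columns
except ValueError as exc:
    print("Cannot broadcast:", exc)
```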

Real-World Data Science Example

marks = np.array([65, 70, 80, 90, 55])

average = np.mean(marks)
passed = marks[marks >= 60]

print("Average:", average)
print("Passed Students:", passed)

This type of operation is common in analytics and machine learning preprocessing.
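The same marks array can be taken a step further. The sketch below uses np.where to label each student and uses the fact that True counts as 1 to compute a pass rate; the variable names and the "Pass"/"Fail" labels are illustrative.

```python
import numpy as np

marks = np.array([65, 70, 80, 90, 55])

passed_mask = marks >= 60                        # True where the student passed
labels = np.where(passed_mask, "Pass", "Fail")   # pick a label per element
num_passed = passed_mask.sum()                   # True counts as 1
pass_rate = passed_mask.mean() * 100             # share of True values, in percent

print(labels)                            # ['Pass' 'Pass' 'Pass' 'Pass' 'Fail']
print("Number passed:", num_passed)      # Number passed: 4
print(f"Pass rate: {pass_rate:.1f}%")    # Pass rate: 80.0%
```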

Common Mistakes to Avoid

  • Using Python loops instead of vectorization
  • Mixing data types unnecessarily
  • Forgetting array shape when working with 2D data
  • Modifying array slices unintentionally (views vs copies)
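The last point deserves a concrete illustration. A basic slice of a NumPy array is a view of the same data, not a copy, so writing to the slice changes the original. Calling .copy() gives an independent array. A minimal sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]      # a view: shares memory with arr
view[0] = 99
print(arr)           # [10 99 30 40 50]  (the original changed too)

safe = arr[1:4].copy()   # an independent copy
safe[0] = -1
print(arr)           # [10 99 30 40 50]  (the original is unchanged)
print(safe)          # [-1 30 40]
```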

Conclusion

NumPy is the foundation of Python data science. Understanding arrays, indexing, slicing, and vectorization will make your code faster, cleaner, and more professional. This lecture prepares you for working with real datasets, which we will start handling using Pandas in the next lecture.

Next Lecture: Lecture 11 – Pandas Fundamentals (Series, DataFrames, Reading Datasets)
