Learn NumPy for Data Science with practical examples. Master arrays, indexing, slicing and vectorized operations in Python with this beginner-friendly guide.
Introduction
NumPy is the core numerical library in Python and a backbone of data science, machine learning, and scientific computing. Libraries such as Pandas and Scikit-Learn are built on top of NumPy, and deep learning frameworks such as TensorFlow and PyTorch convert to and from NumPy arrays.
In this lecture, you will learn how NumPy works, why it is faster than regular Python lists, and how to use NumPy arrays for efficient data analysis. This lecture focuses on arrays, indexing, slicing, and vectorized operations, which are essential skills for any data scientist.
By the end of this lecture, you will be able to:
- Understand NumPy arrays and how they differ from Python lists
- Create and manipulate arrays
- Use indexing and slicing to access data
- Perform fast calculations using vectorization
What is NumPy?
NumPy stands for Numerical Python. It provides:
- High-performance multidimensional arrays
- Mathematical functions for numerical operations
- Tools for linear algebra, statistics, and random numbers
Why NumPy is Important for Data Science
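- Faster computation than Python lists for numerical work, especially on large datasets
- Lower memory usage, because elements are stored as raw values of a single type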
- Cleaner and more readable code
- Industry standard for numerical computing
Installing and Importing NumPy
If NumPy is not already installed, you can install it using pip:
pip install numpy
Import NumPy in your Python program:
import numpy as np
The alias np is the standard convention used in NumPy's own documentation and across the Python data science community.
Understanding NumPy Arrays
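A NumPy array (ndarray) is a collection of elements of the same data type, normally stored in one contiguous block of memory. This compact, fixed-type layout is what makes fast element-wise operations possible.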
Creating a NumPy Array
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr)
Output:
[10 20 30 40 50]
Difference Between Python List and NumPy Array
| Feature | Python List | NumPy Array |
|---|---|---|
| Data Type | Mixed | Same type |
| Speed | Slower | Faster |
| Memory | More | Less |
| Operations | Loop-based | Vectorized |
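The rows of the table can be checked with a short sketch. The variable names here are illustrative, and the list size is only a rough estimate because Python may share small integer objects between lists.

```python
import sys
import numpy as np

# Data type: a list keeps mixed types, an array converts everything to one dtype.
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)
print(arr.dtype)  # float64: the integers were upcast to floats

# Operations: "+" concatenates lists but adds arrays element by element.
print([1, 2, 3] + [4, 5, 6])                      # [1, 2, 3, 4, 5, 6]
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))  # [5 7 9]

# Memory: the array stores raw 8-byte integers, while the list stores
# pointers to separate Python int objects.
n = 1000
int_array = np.arange(n, dtype=np.int64)
py_list = list(range(n))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("array bytes:", int_array.nbytes)  # 8000
print("list bytes (rough estimate):", list_bytes)
```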
Creating Arrays Using NumPy Functions
Zeros and Ones
np.zeros(5)  # five zeros, stored as floats by default
np.ones(5)   # five ones, stored as floats by default
Range of Values
np.arange(0, 10, 2)  # start at 0, stop before 10, step by 2: [0 2 4 6 8]
Evenly Spaced Numbers
np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both ends included
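A point that often trips up beginners is that np.arange excludes the stop value while np.linspace includes it. The sketch below contrasts the two. It also shows two optional arguments that the calls above did not use: a shape tuple and a dtype.

```python
import numpy as np

# arange takes a step size and stops BEFORE the end value.
stepped = np.arange(0, 1, 0.25)
print(stepped)  # [0.   0.25 0.5  0.75]

# linspace takes a number of points and INCLUDES the end value.
spaced = np.linspace(0, 1, 5)
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]

# Passing a tuple as the shape creates a 2D array.
grid = np.zeros((2, 3))
print(grid.shape)  # (2, 3)

# dtype overrides the default float type.
counts = np.ones(5, dtype=int)
print(counts)      # [1 1 1 1 1]
```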
Array Dimensions and Shape
One-Dimensional Array
arr = np.array([1, 2, 3])
print(arr.ndim)  # 1, because the array has a single axis
Two-Dimensional Array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)
Output:
(2, 3)
This means 2 rows and 3 columns.
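As a quick check, the three attributes ndim, shape, and size can be printed side by side for both arrays. This is a minimal sketch that reuses the arrays from the examples above.

```python
import numpy as np

arr = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# ndim = number of axes, shape = length of each axis, size = total elements
print(arr.ndim, arr.shape, arr.size)           # 1 (3,) 3
print(matrix.ndim, matrix.shape, matrix.size)  # 2 (2, 3) 6

# shape is a plain tuple, so it can be unpacked directly.
rows, cols = matrix.shape
print(rows, cols)  # 2 3
```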
Indexing NumPy Arrays
Indexing allows you to access specific elements from an array.
1D Array Indexing
arr = np.array([10, 20, 30, 40])
print(arr[1])
Output:
20
2D Array Indexing
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[1, 2])
Output:
6
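Indexing is zero-based, and in 2D the order is row first, then column. The following sketch adds a few cases not shown above (negative indices, whole-row access, and assignment through an index), using the same arrays.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[0])    # 10, the first element
print(arr[-1])   # 40, negative indices count from the end

print(matrix[0])       # [1 2 3], a single index returns a whole row
print(matrix[1, 2])    # 6, row 1 and column 2
print(matrix[-1, -1])  # 6, last row and last column

# Indexing also works on the left side of an assignment.
arr[1] = 25
print(arr)  # [10 25 30 40]
```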
Slicing NumPy Arrays
Slicing extracts a portion of an array.
Slicing 1D Arrays
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])
Output:
[20 30 40]
Slicing 2D Arrays
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print(matrix[:2, :2])
Output:
[[1 2]
 [4 5]]
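Slices follow the pattern start:stop:step, where stop is excluded and any part can be omitted. Here is a short sketch of a few more variations on the same arrays, including how to select a whole column.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(arr[:3])    # [10 20 30], from the start up to index 3 (excluded)
print(arr[::2])   # [10 30 50], every second element
print(arr[::-1])  # [50 40 30 20 10], reversed

print(matrix[:, 1])  # [2 5 8], all rows of column 1
print(matrix[1, :])  # [4 5 6], all columns of row 1
print(matrix[1:, 1:])
# [[5 6]
#  [8 9]]
```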
Boolean Indexing
Boolean indexing is widely used in data science for filtering data.
arr = np.array([10, 20, 30, 40, 50])
print(arr[arr > 25])
Output:
[30 40 50]
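Behind the scenes, the comparison arr > 25 produces an array of True/False values (a mask), and the mask selects the matching elements. The sketch below makes the mask visible and combines conditions with & (and) and | (or). Each condition needs its own parentheses.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = arr > 25
print(mask)        # [False False  True  True  True]
print(mask.sum())  # 3, because True counts as 1

# Combine conditions with & and |, wrapping each one in parentheses.
print(arr[(arr > 15) & (arr < 45)])  # [20 30 40]
print(arr[(arr < 20) | (arr > 40)])  # [10 50]
```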
Vectorized Operations (Most Important Concept)
Vectorization means performing operations on entire arrays without using loops.
Example Without NumPy (Slow)
data = [1, 2, 3, 4]
result = []
for x in data:
    result.append(x * 2)  # double each element, one at a time
print(result)  # [2, 4, 6, 8]
Same Task Using NumPy (Fast)
arr = np.array([1, 2, 3, 4])
print(arr * 2)
Output:
[2 4 6 8]
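To see the speed difference yourself, you can time both approaches with the standard timeit module. This is a rough sketch: the array size, the repeat count, and the function names are arbitrary choices, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
data = list(range(n))
arr = np.arange(n)

def double_with_loop():
    # Pure Python: one multiplication and one append per element.
    result = []
    for x in data:
        result.append(x * 2)
    return result

def double_with_numpy():
    # NumPy: a single vectorized multiplication over the whole array.
    return arr * 2

loop_seconds = timeit.timeit(double_with_loop, number=20)
numpy_seconds = timeit.timeit(double_with_numpy, number=20)

print(f"Python loop: {loop_seconds:.4f} s")
print(f"NumPy:       {numpy_seconds:.4f} s")
```

On large arrays the NumPy version is typically much faster, because the loop runs in compiled code instead of the Python interpreter.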
Common Vectorized Operations
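# Results shown for arr = np.array([1, 2, 3, 4])
arr + 5   # [6 7 8 9]
arr - 2   # [-1  0  1  2]
arr * 3   # [ 3  6  9 12]
arr / 2   # [0.5 1.  1.5 2. ]
arr ** 2  # [ 1  4  9 16]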
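The same operators also work between two arrays of equal length, pairing elements by position. The prices and quantities below are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical example data: unit prices and quantities for three items.
prices = np.array([2.5, 4.0, 1.5])
quantities = np.array([4, 1, 10])

# Element-wise multiplication: prices[0] * quantities[0], and so on.
totals = prices * quantities
print(totals)        # [10.  4. 15.]
print(totals.sum())  # 29.0
```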
Mathematical Functions in NumPy
NumPy provides built-in math functions optimized for performance.
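arr = np.array([1, 4, 9, 16])
np.sqrt(arr)  # element-wise square root: [1. 2. 3. 4.]
np.mean(arr)  # average: 7.5
np.sum(arr)   # total: 30
np.max(arr)   # largest value: 16
np.min(arr)   # smallest value: 1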
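On 2D data, these functions accept an optional axis argument that controls the direction of the calculation. The sketch below goes slightly beyond the 1D example above: axis=0 works down the columns and axis=1 works across the rows.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.sum(matrix))           # 21, the sum of every element
print(np.sum(matrix, axis=0))   # [5 7 9], one sum per column
print(np.sum(matrix, axis=1))   # [ 6 15], one sum per row
print(np.mean(matrix, axis=1))  # [2. 5.], one average per row
```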
Broadcasting in NumPy
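Broadcasting allows NumPy to perform operations on arrays of different but compatible shapes. The smaller operand is virtually stretched to match the larger one, without copying any data.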
arr = np.array([1, 2, 3])
print(arr + 10)
The scalar 10 is broadcast across the whole array, so every element is increased by 10 and the output is [11 12 13].
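Broadcasting is not limited to scalars. As a sketch, a 1D array can be combined with every row of a 2D array, and a column-shaped array with every column. Shapes that cannot be matched raise a ValueError.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# The row is added to each row of the matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is added to each column of the matrix.
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) are not compatible, so NumPy raises an error.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Cannot broadcast:", err)
```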
Real-World Data Science Example
marks = np.array([65, 70, 80, 90, 55])
average = np.mean(marks)
passed = marks[marks >= 60]
print("Average:", average)
print("Passed Students:", passed)
This type of operation is common in analytics and machine learning preprocessing.
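The same boolean mask can answer a few more questions about the marks. This is a small sketch that continues the example, and the variable names are illustrative.

```python
import numpy as np

marks = np.array([65, 70, 80, 90, 55])

passed_mask = marks >= 60      # True where the student passed
print(passed_mask.sum())       # 4, the number of students who passed
print(passed_mask.mean())      # 0.8, the pass rate as a fraction
print(marks[~passed_mask])     # [55], marks below the pass line
print(marks.max() - marks.min())  # 35, the spread between best and worst
```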
Common Mistakes to Avoid
- Using Python loops instead of vectorization
- Mixing data types unnecessarily
- Forgetting array shape when working with 2D data
- Modifying array slices unintentionally (views vs copies)
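The last mistake in the list is the hardest to spot, so here is a minimal sketch of it. A slice is a view onto the original data, which means writing to the slice also changes the original array. Calling .copy() avoids this. The final lines show what happens when data types are mixed.

```python
import numpy as np

# A slice is a view: it shares memory with the original array.
arr = np.array([10, 20, 30, 40, 50])
view = arr[1:4]
view[0] = 99
print(arr)  # [10 99 30 40 50], the original changed too

# .copy() creates independent data.
arr = np.array([10, 20, 30, 40, 50])
safe = arr[1:4].copy()
safe[0] = 99
print(arr)  # [10 20 30 40 50], the original is untouched

# Mixing numbers and text turns every element into a string.
mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)  # U, meaning Unicode strings, so arithmetic no longer works
```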
Conclusion
NumPy is the foundation of Python data science. Understanding arrays, indexing, slicing, and vectorization will make your code faster, cleaner, and more professional. This lecture prepares you for working with real datasets, which we will start handling using Pandas in the next lecture.
Next Lecture: Lecture 11 – Pandas Fundamentals (Series, DataFrames, Reading Datasets)




