The Complete Python Mastery Course
Python · NumPy · Pandas
From Absolute Beginner → Professional Developer
Introduction & Setup
1.1 What is Python?
Python is a high-level, interpreted, general-purpose programming language created by Guido van Rossum and first released in 1991. Its design philosophy emphasises code readability and simplicity, making it one of the most beginner-friendly languages while remaining powerful enough for cutting-edge AI and data science.
1.2 Why Learn Python?
- Web development (Django, Flask, FastAPI)
- Data Science & Machine Learning (NumPy, Pandas, scikit-learn, TensorFlow)
- Automation & scripting: save hours of repetitive work
- Finance & Quantitative Analysis
- Scientific computing & research
- DevOps, Cloud, and APIs
1.3 Installing Python
Step 1: Visit https://www.python.org/downloads/ and download Python 3.11+ for your OS.
Step 2: During installation on Windows, tick "Add Python to PATH".
Step 3: Verify installation:
python --version   # e.g. Python 3.11.4
pip --version      # package manager
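The same check can be made from inside Python, which helps when several interpreters are installed side by side. This is a minimal sketch using only the standard library; the (3, 11) threshold simply mirrors the version recommended in Step 1 and is an assumption you can adjust.

```python
import sys

# Version of the interpreter that is actually running this script
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor} from {sys.executable}")

# (3, 11) mirrors the course recommendation; change it if you target another version
meets_recommendation = (major, minor) >= (3, 11)
print("Meets course recommendation:", meets_recommendation)
```

If the path printed here is not the one you expected, the PATH setting from Step 2 is the first thing to check.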
1.4 Installing Libraries
pip install numpy pandas matplotlib jupyter
1.5 Your First Python Program
# my_first_program.py
print("Hello, World!")
print("Welcome to Python!")
- VS Code + Python extension is the best free IDE for beginners.
- Jupyter Notebook is great for interactive coding: run `jupyter notebook`
- Online alternative: https://replit.com (no installation needed).
- Install Python and create a file called `hello.py`
- Print your name, age, and favourite hobby on separate lines.
- Add a comment at the top describing what the program does.
# Introduces me to Python
print("Name: Alex")
print("Age: 25")
print("Hobby: Hiking")
Variables & Data Types
2.1 What is a Variable?
A variable is a labelled container that stores a value. In Python you do NOT need to declare a type: Python infers it automatically from the assigned value (dynamic typing).
name = "Alice"       # str: text
age = 30             # int: whole number
height = 1.75        # float: decimal
is_cool = True       # bool: True / False
nothing = None       # NoneType: absence of value
print(type(name))    # <class 'str'>
print(type(age))     # <class 'int'>
print(type(height))  # <class 'float'>
2.2 Core Data Types
| Type | Example |
|---|---|
| int | 42, -7, 1_000_000 |
| float | 3.14, -0.001, 2.5e3 |
| str | "hello", 'world' |
| bool | True, False |
| NoneType | None |
| complex | 3 + 4j |
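The table can be checked directly in the interpreter. The short sketch below inspects one value of each type and shows two details that often surprise beginners: `bool` is a subclass of `int`, and underscores in numeric literals are purely visual. The variable names are illustrative only.

```python
values = [42, 3.14, "hello", True, None, 3 + 4j]
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'float', 'str', 'bool', 'NoneType', 'complex']

# bool is a subclass of int, so True behaves like 1 in arithmetic
bool_is_int = isinstance(True, int)
true_plus_true = True + True
print(bool_is_int, true_plus_true)  # True 2

# Underscores in numeric literals are only visual separators
same_number = 1_000_000 == 1000000
print(same_number)  # True

# A complex number exposes its real and imaginary parts as floats
z = 3 + 4j
print(z.real, z.imag, abs(z))  # 3.0 4.0 5.0
```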
2.3 Arithmetic Operators
a, b = 10, 3
print(a + b)    # 13        addition
print(a - b)    # 7         subtraction
print(a * b)    # 30        multiplication
print(a / b)    # 3.333...  true division
print(a // b)   # 3         floor division
print(a % b)    # 1         modulus (remainder)
print(a ** b)   # 1000      exponentiation
2.4 Comparison & Logical Operators
print(5 == 5)          # True   equal
print(5 != 3)          # True   not equal
print(5 > 3)           # True   greater than
print(5 <= 5)          # True   less than or equal
print(True and False)  # False
print(True or False)   # True
print(not True)        # False
2.5 Type Conversion
x = "42"
y = int(x)      # str → int
z = float(x)    # str → float
w = str(100)    # int → str
b = bool(0)     # int → bool (0 → False)
# Two-step conversion
int(float("3.14"))   # 3
2.6 Multiple Assignment & Swapping
x, y, z = 1, 2, 3
# Swap without a temp variable
a, b = 10, 20
a, b = b, a
print(a, b)   # 20 10
# Same value to many variables
p = q = r = 0
- Variable names are case-sensitive: `myVar` ≠ `myvar` ≠ `MYVAR`
- Use snake_case; it's the Python convention (PEP 8).
- Avoid single-letter names except for counters (`i`, `j`, `k`).
Scenario: You are building a simple payroll calculator for a small café.
- Create variables: `employee_name`, `hourly_rate`, `hours_worked`, `tax_rate` (use realistic values)
- Calculate `gross_pay`, `tax_amount`, and `net_pay`
- Print a formatted payslip showing all values
- Verify all types using `type()`
employee_name = "Maria"
hourly_rate = 15.50
hours_worked = 40
tax_rate = 0.20
gross_pay = hourly_rate * hours_worked   # 620.0
tax_amount = gross_pay * tax_rate        # 124.0
net_pay = gross_pay - tax_amount         # 496.0
print("===== PAYSLIP =====")
print("Employee  :", employee_name)
print("Gross Pay : $", gross_pay)
print("Tax (20%) : $", tax_amount)
print("Net Pay   : $", net_pay)
print(type(employee_name))   # <class 'str'>
print(type(hourly_rate))     # <class 'float'>
Strings: The Language of Data
3.1 Creating Strings
s1 = "Hello"             # double quotes
s2 = 'World'             # single quotes (identical)
s3 = """This is a
multi-line string."""    # triple quotes
s4 = r"C:\Users\file"    # raw string: backslash is literal
3.2 String Indexing & Slicing: Memorise This
Python uses zero-based indexing. Negative indices count from the end.
word = "Python"
#  P  y  t  h  o  n
#  0  1  2  3  4  5   (positive)
# -6 -5 -4 -3 -2 -1   (negative)
print(word[0])      # P       first character
print(word[-1])     # n       last character
print(word[2:5])    # tho     slice [start:stop] (stop excluded)
print(word[:3])     # Pyt     from start
print(word[3:])     # hon     to end
print(word[::2])    # Pto     every 2nd character (step)
print(word[::-1])   # nohtyP  reverses the string
3.3 Essential String Methods
| Method | What it does |
|---|---|
| .upper() / .lower() | Convert case |
| .strip() / .lstrip() / .rstrip() | Remove whitespace |
| .split(sep) | Split into a list |
| .join(iterable) | Join a list into string |
| .replace(old, new) | Find & replace |
| .find(sub) | Index of first occurrence (-1 if not found) |
| .count(sub) | Count occurrences |
| .startswith(s) / .endswith(s) | Boolean checks |
| .isdigit() / .isalpha() | Character type checks |
| .zfill(n) | Pad with zeros on the left |
| .center(n, char) | Centre in a field of width n |
text = "  Hello, World!  "
print(text.strip())                # "Hello, World!"
print(text.strip().lower())        # "hello, world!"
print(text.strip().split(", "))    # ["Hello", "World!"]
words = ["Python", "is", "awesome"]
print(" ".join(words))             # "Python is awesome"
email = "user@example.com"
print(email.split("@")[1])         # "example.com"
print(email.replace("@", " at "))  # "user at example.com"
3.4 f-Strings: Modern Python's Superpower
name = "Alice"
score = 95.678
print(f"Hello, {name}!")
print(f"Score: {score:.2f}")     # Score: 95.68     (2 decimal places)
print(f"Score: {score:08.2f}")   # Score: 00095.68  (zero-padded)
print(f"2 + 2 = {2 + 2}")        # expressions inside {}
print(f"Name upper: {name.upper()}")
# Debug shorthand (Python 3.8+)
x = 42
print(f"{x = }")                 # x = 42
3.5 Format Spec Cheat Sheet
| Format Spec | Meaning & Example |
|---|---|
| {:.2f} | Float 2 decimal places → 3.14 |
| {:,} | Thousand separator → 1,000,000 |
| {:>10} | Right-align in width 10 |
| {:<10} | Left-align in width 10 |
| {:^10} | Centre in width 10 |
| {:05d} | Zero-pad integer → 00042 |
| {:.2%} | Percentage → 75.00% |
| {:.2e} | Scientific notation → 1.23e+04 (plain {:e} prints six decimals: 1.230000e+04) |
| {:b} | Binary → 1010 |
| {:x} | Hex → ff |
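Each row of the cheat sheet can be tried in an f-string. The sketch below applies every spec to a sample value chosen for illustration (the sample values are assumptions, picked to reproduce the outputs listed in the table).

```python
# One sample value per format spec; the keys are just labels for printing
specs = {
    "two_decimals": f"{3.14159:.2f}",
    "thousands": f"{1000000:,}",
    "right_align": f"{'hi':>10}",
    "left_align": f"{'hi':<10}",
    "centre": f"{'hi':^10}",
    "zero_pad": f"{42:05d}",
    "percent": f"{0.75:.2%}",
    "scientific": f"{12300:.2e}",
    "binary": f"{10:b}",
    "hex": f"{255:x}",
}

# The | markers make the padding produced by the alignment specs visible
for label, text in specs.items():
    print(f"{label:<14}|{text}|")
```

The same specs work with `format(value, spec)` and `str.format()`, so `format(0.75, ".2%")` gives the same text as the f-string version.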
Scenario: Build a receipt generator for a coffee shop.
- Store: `shop_name="The Daily Grind"`, `item="Latte"`, `qty=3`, `price=4.75`, `customer="Bob"`
- Calculate `total = qty * price`
- Print a formatted receipt with the shop name centred in a 40-char line of `=`
- Extract the domain from `"orders@dailygrind.com"`
- Count how many vowels are in the item name
shop_name = "The Daily Grind"
item = "Latte"
qty = 3
price = 4.75
customer = "Bob"
total = qty * price
print("=" * 40)
print(shop_name.center(40))
print("=" * 40)
print(f"Customer : {customer}")
print(f"Item     : {item} x{qty} @ ${price:.2f} each")
print(f"Total    : ${total:.2f}")
print(f"Thank you for visiting {shop_name.upper()}!")
email = "orders@dailygrind.com"
print("Domain:", email.split("@")[1])    # dailygrind.com
vowels = sum(1 for ch in item.lower() if ch in "aeiou")
print(f"Vowels in '{item}': {vowels}")   # 2
Control Flow: if / elif / else
4.1 The if Statement
temperature = 35
if temperature > 30:
    print("It is hot outside!")   # indentation is mandatory; 4 spaces is the PEP 8 convention
4.2 if / elif / else
score = 72
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
elif score >= 70:
    grade = "C"
elif score >= 60:
    grade = "D"
else:
    grade = "F"
print(f"Grade: {grade}")   # Grade: C
4.3 Ternary (One-line if)
# value_if_true if condition else value_if_false
age = 20
status = "adult" if age >= 18 else "minor"
discount = 0.20 if age >= 65 else 0.0
4.4 Truthy & Falsy Values: Critical Knowledge
# FALSY values (evaluated as False in a boolean context):
# False, None, 0, 0.0, 0j, "", [], (), {}, set()
# TRUTHY: everything else
name = ""
if not name:
    print("Name cannot be empty!")
items = []
if not items:
    print("Cart is empty")
4.5 match / case (Python 3.10+)
command = "quit"
match command:
    case "start":
        print("Starting...")
    case "stop" | "quit":
        print("Stopping...")
    case _:                       # default (like else)
        print(f"Unknown: {command}")
Rules: Child (0-12) $8 · Teen (13-17) $10 · Adult (18-64) $15 · Senior (65+) $10
Tuesday = 20% off · Members get an extra $2 off after all discounts
- Test: `age=25, day="Tuesday", is_member=True` → should be $10
- Test: `age=10, day="Friday", is_member=False` → should be $8
age = 25
day = "Tuesday"
is_member = True
if age <= 12:
    price = 8
elif age <= 17:
    price = 10
elif age <= 64:
    price = 15
else:
    price = 10
if day == "Tuesday":
    price = price * 0.80
if is_member:
    price = price - 2
print(f"Ticket price: ${price:.2f}")   # $10.00
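Wrapping the same rules in a function makes both test cases easy to run side by side. This is one possible sketch, not the only valid solution; the function name `ticket_price` and the rounding to cents are assumptions added here.

```python
def ticket_price(age, day, is_member):
    """Return the ticket price in dollars under the rules listed above."""
    if age <= 12:
        price = 8
    elif age <= 17:
        price = 10
    elif age <= 64:
        price = 15
    else:
        price = 10
    if day == "Tuesday":
        price *= 0.80          # 20% off on Tuesdays
    if is_member:
        price -= 2             # member discount is applied last
    return round(price, 2)     # rounding to cents is an assumption of this sketch


adult_tuesday_member = ticket_price(25, "Tuesday", True)
child_friday = ticket_price(10, "Friday", False)
print(f"${adult_tuesday_member:.2f}")   # $10.00
print(f"${child_friday:.2f}")           # $8.00
```

Because the discounts are applied in a fixed order, swapping the two `if` blocks would change the result for members on Tuesdays, which is why the rules spell out "after all discounts".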
Loops: for & while
5.1 for Loop
for i in range(5):              # 0, 1, 2, 3, 4
    print(i)
for i in range(1, 11, 2):       # 1, 3, 5, 7, 9
    print(i)
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit.upper())
5.2 enumerate(): Index + Value
students = ["Alice", "Bob", "Carol"]
for i, name in enumerate(students, start=1):
    print(f"{i}. {name}")
5.3 zip(): Loop Multiple Lists Together
names = ["Alice", "Bob", "Carol"]
scores = [95, 82, 78]
for name, score in zip(names, scores):
    print(f"{name}: {score}")
5.4 while Loop
n = 5
while n > 0:
    print(n)
    n -= 1
print("Blastoff!")
5.5 break, continue, else
# break: exit the loop immediately
for n in range(10):
    if n == 5:
        break
    print(n)    # prints 0 1 2 3 4
# continue: skip to the next iteration
for n in range(10):
    if n % 2 == 0:
        continue
    print(n)    # prints 1 3 5 7 9
# else on a for loop: runs only if the loop did not break
for n in range(2, 10):
    if 100 % n == 0:
        print(f"Factor: {n}")
        break
else:
    print("No factor found")
5.6 List Comprehensions: Pythonic One-Liners
# [expression for item in iterable if condition]
squares = [x**2 for x in range(1, 11)]
# [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
evens = [x for x in range(20) if x % 2 == 0]
upper = [n.upper() for n in ["alice", "bob"]]
# Nested: 3×3 multiplication table
table = [[i*j for j in range(1, 4)] for i in range(1, 4)]
daily_sales = [120, 85, 200, 60, 175, 90, 310, 45, 220, 130]
- Print each day's sales with day number (Day 1: $120)
- Find total sales using a loop
- List comprehension: sales that exceeded $100
- Find which day had highest sales using enumerate
- While loop: start $1000, subtract sales >$150, stop when <$500
- Count days with sales below $100 using a comprehension
daily_sales = [120, 85, 200, 60, 175, 90, 310, 45, 220, 130]
# 1. Print each day
for i, sale in enumerate(daily_sales, start=1):
    print(f"Day {i}: ${sale}")
# 2. Total
total = sum(daily_sales)
print(f"Total: ${total}")   # $1435
# 3. Sales > $100
high_sales = [s for s in daily_sales if s > 100]
# 4. Best day
best_day, best_sale = 1, daily_sales[0]
for i, sale in enumerate(daily_sales, start=1):
    if sale > best_sale:
        best_sale, best_day = sale, i
print(f"Best: Day {best_day} - ${best_sale}")   # Day 7 - $310
# 5. Cash register
balance = 1000
big_days = [s for s in daily_sales if s > 150]
idx = 0
while balance >= 500 and idx < len(big_days):
    balance -= big_days[idx]
    idx += 1
print(f"Balance: ${balance}")   # $315
# 6. Slow days
slow = [s for s in daily_sales if s < 100]
print(f"Slow days: {len(slow)}")   # 4 (85, 60, 90, 45)
Functions
6.1 Defining & Calling
def greet(name):
    """Greet a person by name."""   # docstring
    return f"Hello, {name}!"
message = greet("Alice")
print(message)   # Hello, Alice!
6.2 Parameters: Positional, Keyword, Default
def create_profile(name, age, country="US"):
    return f"{name}, {age}, {country}"
print(create_profile("Alice", 30))          # Alice, 30, US
print(create_profile(age=25, name="Bob"))   # keyword order doesn't matter
print(create_profile("Chen", 22, "CN"))     # override default
6.3 *args and **kwargs
# *args: any number of positional arguments (stored as a tuple)
def add_all(*numbers):
    return sum(numbers)
print(add_all(1, 2, 3, 4, 5))   # 15
# **kwargs: any number of keyword arguments (stored as a dict)
def print_info(**details):
    for key, value in details.items():
        print(f"{key}: {value}")
print_info(name="Alice", job="Engineer", city="NYC")
6.4 Lambda Functions
# lambda args : expression
square = lambda x: x ** 2
print(square(5))   # 25
# Lambdas shine as sort keys
students = [("Alice", 88), ("Bob", 72), ("Carol", 95)]
students.sort(key=lambda s: s[1], reverse=True)
print(students)    # [('Carol', 95), ('Alice', 88), ('Bob', 72)]
6.5 Scope: LEGB Rule
# L: Local   E: Enclosing   G: Global   B: Built-in
x = "global"
def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x)    # local
    inner()
    print(x)        # enclosing
outer()
print(x)            # global
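The LEGB example only reads and shadows names. To rebind a name that lives in an outer scope, Python provides the `nonlocal` and `global` keywords. The sketch below is illustrative; `make_counter`, `bump_global`, and the variable names are made up for this example.

```python
counter = 0   # module-level (global) name


def make_counter():
    count = 0   # lives in the enclosing scope of increment()

    def increment():
        nonlocal count    # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return increment


def bump_global():
    global counter        # rebind the module-level name
    counter += 1


tick = make_counter()
tick()
tick()
third = tick()
bump_global()
print(third, counter)   # 3 1
```

Without the `nonlocal` line, `count += 1` would raise `UnboundLocalError`, because assignment inside a function makes the name local by default.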
6.6 map(), filter(), reduce()
numbers = [1, 2, 3, 4, 5, 6]
doubled = list(map(lambda x: x * 2, numbers))          # [2, 4, 6, 8, 10, 12]
evens = list(filter(lambda x: x % 2 == 0, numbers))    # [2, 4, 6]
from functools import reduce
product = reduce(lambda a, b: a * b, numbers)          # 720
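For comparison, the same three results can be produced with list comprehensions and `math.prod`, which many Python programmers find easier to read. This is a side-by-side sketch rather than a recommendation to avoid `map`, `filter`, or `reduce`.

```python
from functools import reduce
import math

numbers = [1, 2, 3, 4, 5, 6]

# map vs. comprehension
doubled_map = list(map(lambda x: x * 2, numbers))
doubled_comp = [x * 2 for x in numbers]

# filter vs. comprehension with a condition
evens_filter = list(filter(lambda x: x % 2 == 0, numbers))
evens_comp = [x for x in numbers if x % 2 == 0]

# reduce vs. math.prod (available from Python 3.8)
product_reduce = reduce(lambda a, b: a * b, numbers)
product_prod = math.prod(numbers)

print(doubled_map == doubled_comp)      # True
print(evens_filter == evens_comp)       # True
print(product_reduce, product_prod)     # 720 720
```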
- Write `deposit(balance, amount)`: raise `ValueError` if amount ≤ 0
- Write `withdraw(balance, amount)`: raise an error if amount > balance
- Write `transaction_summary(*transactions)`: returns `(deposited, withdrawn, net)`
- Write `apply_interest(balance, rate=0.05)`
- Use `map` + lambda to apply a 2% fee to a list of withdrawal amounts
def deposit(balance, amount):
    if amount <= 0:
        raise ValueError("Must be positive")
    return balance + amount
def withdraw(balance, amount):
    if amount <= 0:
        raise ValueError("Must be positive")
    if amount > balance:
        raise ValueError("Insufficient funds")
    return balance - amount
def transaction_summary(*transactions):
    deposited = sum(t for t in transactions if t > 0)
    withdrawn = sum(abs(t) for t in transactions if t < 0)
    return deposited, withdrawn, deposited - withdrawn
def apply_interest(balance, rate=0.05):
    return balance * (1 + rate)
bal = 1000
bal = deposit(bal, 500)      # 1500
bal = withdraw(bal, 200)     # 1300
bal = apply_interest(bal)    # 1365.0
withdrawals = [100, 250, 75]
after_fee = list(map(lambda x: x * 1.02, withdrawals))
Data Structures: List, Tuple, Dict, Set
7.1 Lists: Ordered & Mutable
fruits = ["apple", "banana", "cherry"]
fruits.append("date")         # add to end
fruits.insert(1, "avocado")   # insert at index
fruits.remove("apple")        # remove by value
popped = fruits.pop()         # remove & return last
fruits.sort()                 # sort in-place
print(fruits.count("date"))   # count occurrences
# Safe copy (important!)
copy = fruits[:]              # slice copy
# WARNING: other = fruits is NOT a copy, just a second reference to the same list!
7.2 Tuples: Ordered & Immutable
point = (3, 4)
x, y = point                  # unpacking
from collections import namedtuple
Person = namedtuple("Person", ["name", "age", "city"])
alice = Person("Alice", 30, "NYC")
print(alice.name)             # Alice
7.3 Dictionaries: Key-Value Store
person = {"name": "Alice", "age": 30}
person.get("salary", 0) # default if key missing
person["email"] = "a@mail.com" # add key
del person["age"] # delete key
for key, val in person.items():
    print(f"{key}: {val}")
# Dict comprehension
squares = {x: x**2 for x in range(1, 6)}
7.4 Sets: Unordered & Unique
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}
a | b # union {1,2,3,4,5,6,7,8}
a & b # intersection {4, 5}
a - b # difference {1, 2, 3}
a ^ b # symmetric diff {1,2,3,6,7,8}
# Remove duplicates from a list
unique = list(set(["a", "b", "a"]))
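One caveat with `list(set(...))` is that sets do not preserve the original order of the list. When order matters, two common alternatives are sketched below; the sample list is made up for illustration.

```python
letters = ["b", "a", "b", "c", "a"]

# Order of the result is not guaranteed
unique_any_order = list(set(letters))

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(letters))

# Sorting gives a deterministic order when first-seen order is not needed
unique_sorted = sorted(set(letters))

print(unique_in_order)   # ['b', 'a', 'c']
print(unique_sorted)     # ['a', 'b', 'c']
```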
inventory = {
"apple": {"price": 0.50, "stock": 200, "category": "fruit"},
"bread": {"price": 2.50, "stock": 50, "category": "bakery"},
"milk": {"price": 1.20, "stock": 80, "category": "dairy"},
"banana": {"price": 0.30, "stock": 150, "category": "fruit"},
"cheese": {"price": 4.00, "stock": 30, "category": "dairy"},
}
- Print each item with price and stock
- Find items with stock below 60
- Calculate total inventory value (price × stock)
- Add eggs: price=$3.00, stock=100, category="dairy"
- Apply 10% price increase to all fruit
- Get a set of all unique categories
- Create a list of (item, price) tuples sorted by price descending
# 1. Print all
for item, d in inventory.items():
    print(f"{item:10} ${d['price']:.2f}  stock: {d['stock']}")
# 2. Low stock
low = [item for item, d in inventory.items() if d["stock"] < 60]
# 3. Total value
total = sum(d["price"] * d["stock"] for d in inventory.values())
# 4. Add eggs
inventory["eggs"] = {"price": 3.00, "stock": 100, "category": "dairy"}
# 5. Fruit price increase
for item, d in inventory.items():
    if d["category"] == "fruit":
        d["price"] = round(d["price"] * 1.10, 2)
# 6. Unique categories
cats = {d["category"] for d in inventory.values()}
# 7. Sorted by price
by_price = sorted(inventory.items(),
                  key=lambda x: x[1]["price"], reverse=True)
File I/O & Error Handling
8.1 Reading & Writing Files
# Use "with": the file closes automatically
with open("data.txt", "r") as f:
    content = f.read()         # entire file as one string
with open("data.txt", "r") as f:
    for line in f:             # memory-efficient: one line at a time
        print(line.strip())
# "w" = write (overwrites), "a" = append
with open("output.txt", "w") as f:
    f.write("Line 1\n")
8.2 CSV & JSON
import csv, json
# Read CSV (newline="" is what the csv module documentation recommends)
with open("sales.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["product"], row["revenue"])
# JSON read / write
data = {"name": "Alice", "scores": [95, 82, 78]}
with open("data.json", "w") as f:
    json.dump(data, f, indent=4)
with open("data.json") as f:
    loaded = json.load(f)
8.3 Exception Handling
try:
    x = int(input("Enter a number: "))
    y = 100 / x
except ValueError:
    print("That is not a number!")
except ZeroDivisionError:
    print("Cannot divide by zero!")
else:
    print(f"Result: {y}")      # runs only if NO exception
finally:
    print("This ALWAYS runs")
8.4 Custom Exceptions
class InsufficientFundsError(Exception):
    def __init__(self, amount, balance):
        super().__init__(
            f"Cannot withdraw ${amount}; balance is ${balance}")
def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(amount, balance)
    return balance - amount
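A custom exception becomes more useful once the caller catches it by name. The sketch below repeats the class so that it runs on its own; storing `amount` and `balance` as attributes is an addition made for this example, so the handler can inspect them.

```python
class InsufficientFundsError(Exception):
    def __init__(self, amount, balance):
        super().__init__(f"Cannot withdraw ${amount}; balance is ${balance}")
        # Extra attributes (an assumption of this sketch) let handlers inspect the values
        self.amount = amount
        self.balance = balance


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(amount, balance)
    return balance - amount


balance = 100
try:
    balance = withdraw(balance, 250)
except InsufficientFundsError as err:
    # The assignment above never happened, so balance is still 100
    shortfall = err.amount - err.balance
    message = str(err)
    print(f"{message} (short by ${shortfall})")
```

Catching the specific class rather than a bare `Exception` keeps unrelated bugs, such as a `TypeError` from a bad argument, from being silently swallowed.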
- Create a dict of 5 students with 4 test scores each
- Write `save_gradebook(data, filename)`: saves to JSON
- Write `load_gradebook(filename)`: handles `FileNotFoundError`
- Write `get_average(scores)`: raises `ValueError` if empty
- Save, load, and print each student's average
- Export to CSV: Name, Score1…Score4, Average
import json, csv
gradebook = {
    "Alice": [92, 88, 95, 91],
    "Bob": [75, 82, 70, 78],
    "Carol": [98, 96, 100, 94],
}
def save_gradebook(data, filename):
    with open(filename, "w") as f:
        json.dump(data, f, indent=4)
def load_gradebook(filename):
    try:
        with open(filename) as f:
            return json.load(f)
    except FileNotFoundError:
        print("File not found")
        return {}
def get_average(scores):
    if not scores:
        raise ValueError("Empty list")
    return sum(scores) / len(scores)
save_gradebook(gradebook, "grades.json")
loaded = load_gradebook("grades.json")
for name, scores in loaded.items():
    print(f"{name}: {get_average(scores):.1f}")
with open("grades.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Name", "S1", "S2", "S3", "S4", "Average"])
    for name, scores in loaded.items():
        w.writerow([name] + scores + [round(get_average(scores), 1)])
Object-Oriented Programming
9.1 Classes & Objects
class Dog:
    species = "Canis lupus familiaris"   # class variable
    def __init__(self, name, breed, age):
        self.name = name                 # instance variables
        self.breed = breed
        self.age = age
    def bark(self):
        return f"{self.name} says: Woof!"
    def __str__(self):
        return f"{self.name} ({self.breed}, {self.age} yrs)"
rex = Dog("Rex", "Labrador", 3)
print(rex)          # Rex (Labrador, 3 yrs)
print(rex.bark())   # Rex says: Woof!
9.2 Inheritance & Polymorphism
class Animal:
    def __init__(self, name):
        self.name = name
    def speak(self):
        raise NotImplementedError
class Cat(Animal):
    def speak(self):
        return f"{self.name} says: Meow!"
class Duck(Animal):
    def speak(self):
        return f"{self.name} says: Quack!"
animals = [Cat("Whiskers"), Duck("Donald")]
for a in animals:
    print(a.speak())   # polymorphism: same call, different result
9.3 Properties & Encapsulation
class BankAccount:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance   # _ = private by convention
    @property
    def balance(self):
        return self._balance
    @balance.setter
    def balance(self, amount):
        if amount < 0:
            raise ValueError("Cannot be negative")
        self._balance = amount
    def deposit(self, amount):
        self._balance += amount
        return self               # enables chaining
acc = BankAccount("Alice", 1000)
acc.deposit(500)      # returns acc, so calls can be chained
print(acc.balance)    # 1500
9.4 Magic (Dunder) Methods Reference
| Method | Triggered by |
|---|---|
| __init__ | Construction: Dog() |
| __str__ | print(obj) / str(obj) |
| __repr__ | repr(obj), in REPL |
| __len__ | len(obj) |
| __getitem__ | obj[key] |
| __contains__ | "x" in obj |
| __add__ | obj1 + obj2 |
| __eq__ | obj1 == obj2 |
| __lt__ | obj1 < obj2 (enables sorting) |
| __enter__ / __exit__ | with obj: (context manager) |
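To see several of these hooks working together, the sketch below defines a small container class. `Playlist` is a hypothetical class invented for this example; it implements most of the methods in the table (the context-manager pair is left out to keep it short).

```python
class Playlist:
    """Hypothetical container used only to exercise the methods in the table."""

    def __init__(self, name, tracks=None):
        self.name = name
        self.tracks = list(tracks or [])

    def __repr__(self):                 # repr(obj), REPL display
        return f"Playlist({self.name!r}, {self.tracks!r})"

    def __str__(self):                  # print(obj) / str(obj)
        return f"{self.name} ({len(self)} tracks)"

    def __len__(self):                  # len(obj)
        return len(self.tracks)

    def __getitem__(self, index):       # obj[index]
        return self.tracks[index]

    def __contains__(self, track):      # track in obj
        return track in self.tracks

    def __add__(self, other):           # obj1 + obj2
        return Playlist(f"{self.name} + {other.name}", self.tracks + other.tracks)

    def __eq__(self, other):            # obj1 == obj2
        return isinstance(other, Playlist) and self.tracks == other.tracks

    def __lt__(self, other):            # obj1 < obj2, used by sorted()
        return len(self) < len(other)


road = Playlist("Road", ["a", "b", "c"])
gym = Playlist("Gym", ["x"])
both = road + gym

print(both)                              # Road + Gym (4 tracks)
print(both[0], "x" in both)              # a True
print(sorted([road, gym])[0].name)       # Gym
print(road == Playlist("Copy", ["a", "b", "c"]))   # True
```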
- Create `Book` class: title, author, year, isbn, available=True. Methods: `checkout()`, `return_book()`
- Create `Library` class: `add_book()`, `remove_book(isbn)`, `find_by_author()`, `available_books()`, `__len__`
- Add 5 books, checkout 2, call `available_books()` and `find_by_author()`
class Book:
    def __init__(self, title, author, year, isbn):
        self.title, self.author = title, author
        self.year, self.isbn = year, isbn
        self.available = True
    def checkout(self):
        if not self.available:
            raise ValueError(f"{self.title} already checked out")
        self.available = False
    def return_book(self):
        self.available = True
    def __str__(self):
        s = "✓" if self.available else "✗"
        return f"[{s}] {self.title} by {self.author}"
class Library:
    def __init__(self, name):
        self.name, self.books = name, []
    def add_book(self, book):
        self.books.append(book)
    def remove_book(self, isbn):
        self.books = [b for b in self.books if b.isbn != isbn]
    def find_by_author(self, author):
        return [b for b in self.books
                if author.lower() in b.author.lower()]
    def available_books(self):
        return [b for b in self.books if b.available]
    def __len__(self):
        return len(self.books)
Modules, Packages & Virtual Environments
10.1 Importing Modules
import math
print(math.sqrt(16))    # 4.0
from math import sqrt, pi
print(sqrt(25))         # 5.0
import numpy as np      # alias
import pandas as pd
10.2 Useful Standard Library Modules
| Module | Purpose |
|---|---|
| math | sqrt, pi, floor, ceil, log, sin, cos |
| random | random(), randint(), choice(), shuffle() |
| datetime | date, time, datetime, timedelta |
| os | getcwd(), listdir(), makedirs(), path.join() |
| re | Regular expressions: search, match, findall, sub |
| collections | Counter, defaultdict, deque, namedtuple |
| json | load, loads, dump, dumps |
| csv | reader, writer, DictReader, DictWriter |
| pathlib | Modern file path handling: Path() |
| itertools | chain, product, combinations, permutations |
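A few one-liners give a feel for how these modules are used in practice. The sketch below touches six of them; all sample data (the sentence, the date, the path) is invented for illustration, and nothing is written to disk.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import combinations
from pathlib import Path
import math
import re

# collections.Counter: count hashable items
word_counts = Counter("the cat and the hat".split())
print(word_counts.most_common(1))        # [('the', 2)]

# datetime: date arithmetic with timedelta
deadline = date(2024, 1, 31) + timedelta(days=30)
print(deadline)                          # 2024-03-01

# itertools.combinations: all unordered pairs
pairs = list(combinations(["A", "B", "C"], 2))
print(pairs)                             # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# pathlib: build and inspect a path without touching the file system
report = Path("reports") / "2024" / "summary.csv"
print(report.suffix, report.stem)        # .csv summary

# math and re
rounded = (math.ceil(4.2), math.floor(4.8))
numbers_found = re.findall(r"\d+", "Order 66, aisle 7")
print(rounded, numbers_found)            # (5, 4) ['66', '7']
```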
10.3 Virtual Environments
python -m venv myenv               # create
source myenv/bin/activate          # activate (Mac/Linux)
myenv\Scripts\activate             # activate (Windows)
pip install numpy pandas           # install into env
pip freeze > requirements.txt      # save deps
pip install -r requirements.txt    # recreate on another machine
deactivate                         # exit
10.4 if __name__ == "__main__"
def main():
    print("Running as main program")
if __name__ == "__main__":
    main()
# Imported:     __name__ == "module_name" → main() is NOT called
# Run directly: __name__ == "__main__"    → main() IS called
NumPy: Numerical Computing
NumPy provides fast, memory-efficient N-dimensional arrays whose vectorised operations are often 10 to 100× faster than equivalent pure Python loops. It is the foundation of Python's data science ecosystem.
import numpy as np
11.1 Creating Arrays
a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3], [4, 5, 6]])   # 2D
np.zeros((3, 4))                   # 3×4 of zeros
np.ones((2, 3))                    # 2×3 of ones
np.full((3, 3), 7)                 # 3×3 filled with 7
np.eye(4)                          # 4×4 identity matrix
np.arange(0, 10, 2)                # [0 2 4 6 8]
np.linspace(0, 1, 5)               # [0. 0.25 0.5 0.75 1.]
np.random.rand(3, 3)               # 3×3 uniform [0, 1)
np.random.randint(1, 100, (4, 4))  # random ints in [1, 100)
11.2 Array Attributes: Know These!
a = np.array([[1, 2, 3], [4, 5, 6]])
a.shape               # (2, 3): rows, columns
a.ndim                # 2: number of dimensions
a.size                # 6: total elements
a.dtype               # int64 on most platforms: data type
a.astype(np.float64)  # returns a copy with a new dtype
11.3 Indexing & Slicing
a = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
a[1, 2]       # 60: row 1, col 2
a[0, :]       # [10 20 30]: entire first row
a[:, 1]       # [20 50 80]: entire second column
a[1:, 1:]     # [[50 60] [80 90]]
# Boolean indexing
a[a > 50]     # [60 70 80 90]
a[a < 30] = 0                # set elements < 30 to 0
np.where(a > 50, a, -1)      # conditional fill
11.4 Vectorised Operations (No Loops Needed!)
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
a + b        # [11 22 33 44]
a * b        # [10 40 90 160]
a ** 2       # [1 4 9 16]
a * 3        # [3 6 9 12]: broadcasting (scalar)
np.sqrt(a)   # [1. 1.41 1.73 2.]
np.exp(a)    # e^1, e^2, …
np.log(b)    # natural log
11.5 Aggregation Functions
data = np.array([[4, 7, 2, 9], [1, 8, 5, 3], [6, 0, 4, 8]])
np.sum(data)            # 57
np.sum(data, axis=0)    # sum each column
np.sum(data, axis=1)    # sum each row
np.mean(data)
np.median(data)
np.std(data)
np.argmin(data)         # index of min (flattened)
np.argmax(data)
np.cumsum([1, 2, 3, 4]) # [1 3 6 10]
11.6 Reshaping & Stacking
a = np.arange(12)
b = a.reshape(3, 4)       # 3 rows, 4 cols
c = b.T                   # transpose → (4, 3)
d = b.flatten()           # back to 1D
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.vstack([x, y])         # vertical stack → (2, 3)
np.hstack([x, y])         # horizontal → (6,)
np.column_stack([x, y])   # as columns → (3, 2)
11.7 Linear Algebra
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A @ B                        # matrix multiplication
np.linalg.det(A)             # determinant: -2.0 (up to rounding)
np.linalg.inv(A)             # inverse
b = np.array([5, 11])
x = np.linalg.solve(A, b)    # solve Ax = b
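A quick way to build trust in `np.linalg.solve` and `np.linalg.inv` is to substitute the results back into the original equations. The sketch below assumes NumPy is installed and reuses the same `A` and `b`; comparisons use `np.allclose` because floating-point results are rarely exact.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 11])

# Solve the system  x + 2y = 5,  3x + 4y = 11
x = np.linalg.solve(A, b)
print(x)   # approximately [1. 2.]

# Substituting back should reproduce b (up to floating-point rounding)
residual_ok = np.allclose(A @ x, b)

# A matrix times its inverse should give the identity, again up to rounding
identity_ok = np.allclose(A @ np.linalg.inv(A), np.eye(2))

print(residual_ok, identity_ok)   # True True
```

For solving linear systems, `solve` is generally preferred over computing the inverse explicitly, since it avoids forming `inv(A)` at all.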
np.random.seed(99) temps = np.random.normal(loc=15, scale=8, size=365)
- Find mean, median, min, max, std
- Count days above 25°C
- Find indices of the 5 hottest days
- Convert to Fahrenheit: `F = C * 9/5 + 32`
- Split into 4 seasons, find mean temp per season
- Create 52×7 matrix (weeks×days), find hottest day per week
- Normalise to 0-1: `(x - min) / (max - min)`
import numpy as np
np.random.seed(99)
temps = np.random.normal(loc=15, scale=8, size=365)
print(f"Mean:   {temps.mean():.2f}°C")
print(f"Median: {np.median(temps):.2f}°C")
print(f"Min:    {temps.min():.2f}°C")
print(f"Max:    {temps.max():.2f}°C")
print(f"Std:    {temps.std():.2f}°C")
hot_days = np.sum(temps > 25)
top5_idx = np.argsort(temps)[-5:][::-1]
temps_f = temps * 9/5 + 32
seasons = np.array_split(temps, 4)
names = ["Spring", "Summer", "Autumn", "Winter"]
for name, s in zip(names, seasons):
    print(f"{name}: {s.mean():.2f}°C")
weekly = temps[:364].reshape(52, 7)
hottest = weekly.max(axis=1)          # hottest temperature in each week
hottest_day = weekly.argmax(axis=1)   # day index (0-6) of that temperature
norm = (temps - temps.min()) / (temps.max() - temps.min())
Pandas: Data Analysis
Pandas provides two powerful structures, Series (1D) and DataFrame (2D), that make tabular data analysis intuitive and fast.
import pandas as pd
import numpy as np
12.1 Series
prices = pd.Series([1.99, 3.49, 0.99],
                   index=["apple", "bread", "banana"])
print(prices["bread"])         # 3.49
print(prices[prices > 1.5])    # boolean filter
12.2 DataFrame Creation
df = pd.DataFrame({
"name": ["Alice", "Bob", "Carol", "David"],
"dept": ["Eng", "HR", "Eng", "Fin"],
"salary": [90000, 55000, 95000, 70000],
"years": [5, 3, 8, 4],
})
df.shape # (4, 4)
df.info() # types + nulls
df.describe() # statistics
12.3 Selecting Data
df["salary"]                     # Series
df[["name", "salary"]]           # DataFrame
df.loc[0]                        # row by label
df.loc[1:3, "name":"salary"]     # label slice (both ends inclusive)
df.iloc[0, 2]                    # row 0, col 2 (integer position)
# Boolean filters
df[df["salary"] > 70000]
df[(df["dept"] == "Eng") & (df["years"] > 4)]
df[df["dept"].isin(["Eng", "Fin"])]
12.4 Modifying DataFrames
df["bonus"] = df["salary"] * 0.10     # add column
df["salary"] = df["salary"] * 1.05    # 5% raise
df["level"] = df["years"].apply(
    lambda y: "Senior" if y >= 5 else "Junior")
df.drop(columns=["bonus"], inplace=True)
df.rename(columns={"years": "experience"}, inplace=True)
12.5 Handling Missing Data
df.isnull().sum()    # count NaN per column
df.dropna()          # remove rows with ANY NaN
df["salary"] = df["salary"].fillna(df["salary"].mean())   # assign back instead of inplace=True
df.ffill()           # forward fill (fillna(method="ffill") is deprecated in recent pandas)
12.6 GroupBy: Split → Apply → Combine
grp = df.groupby("dept")
grp["salary"].mean()                              # mean salary per dept
grp["salary"].agg(["mean", "min", "max", "std"])
grp.size()                                        # headcount
# Transform: keep original shape
df["dept_avg"] = df.groupby("dept")["salary"].transform("mean")
12.7 Merging & Joining
merged = pd.merge(employees, departments,
on="dept_id", how="inner")
# how: "inner" | "left" | "right" | "outer"
df_all = pd.concat([df1, df2], ignore_index=True) # stack rows
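The snippet above refers to `employees` and `departments` without defining them. The sketch below fills them in with small made-up tables (an assumption for illustration, and it requires pandas to be installed) so the effect of each `how` option on the row count can be seen.

```python
import pandas as pd

# Hypothetical sample tables; dept_id 4 and dept_id 3 deliberately have no partner
employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "dept_id": [1, 2, 4],
})
departments = pd.DataFrame({
    "dept_id": [1, 2, 3],
    "dept_name": ["Eng", "HR", "Fin"],
})

inner = pd.merge(employees, departments, on="dept_id", how="inner")   # matches only
left = pd.merge(employees, departments, on="dept_id", how="left")     # all employees
outer = pd.merge(employees, departments, on="dept_id", how="outer")   # everything

print(len(inner), len(left), len(outer))   # 2 3 4

# Carol's department is missing, so her dept_name is NaN in the left join
missing_names = int(left["dept_name"].isna().sum())
print(missing_names)                       # 1
```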
12.8 Pivot Tables
pivot = sales.pivot_table(
values="revenue",
index="month",
columns="product",
aggfunc="sum"
)
12.9 Reading & Writing Data
df = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
df.to_excel("output.xlsx", index=False)
df = pd.read_json("data.json")
12.10 String & DateTime Operations
# .str accessor
df["name"].str.upper()
df["email"].str.contains("@gmail")
df["phone"].str.replace("-", "")
# .dt accessor
df["date"] = pd.to_datetime(df["date_str"])
df["year"] = df["date"].dt.year
df["weekday"] = df["date"].dt.day_name()
# Resample time series ("ME" = month end, pandas 2.2+; older versions use "M")
monthly = df.set_index("date").resample("ME").sum()
np.random.seed(42); n = 200
df = pd.DataFrame({
"date": pd.date_range("2024-01-01", periods=n, freq="D"),
"store": np.random.choice(["A","B","C","D"], n),
"product": np.random.choice(["Coffee","Tea","Cake","Sandwich"], n),
"qty": np.random.randint(1, 50, n),
"price": np.random.uniform(2, 12, n).round(2),
})
df["revenue"] = (df["qty"] * df["price"]).round(2)
- Show first 5 and last 5 rows
- Total revenue per store (groupby)
- Best-selling product by total quantity
- Filter: transactions with revenue > $300
- Add month column, find monthly revenue trends
- Best day of week by average revenue
- Pivot table: stores × products × sum of revenue
- Best store + product combination
- Find missing values per column
- Export summary to CSV
# 1. Head / tail
df.head(5)
df.tail(5)
# 2. Revenue per store
rev_store = df.groupby("store")["revenue"].sum().sort_values(ascending=False)
# 3. Best-selling product
best = df.groupby("product")["qty"].sum().idxmax()
# 4. High-value transactions
big = df[df["revenue"] > 300]
# 5. Monthly revenue
df["month"] = df["date"].dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum()
# 6. Best weekday
df["weekday"] = df["date"].dt.day_name()
best_day = df.groupby("weekday")["revenue"].mean().idxmax()
# 7. Pivot table
pivot = df.pivot_table(values="revenue", index="store",
                       columns="product", aggfunc="sum", fill_value=0)
# 8. Best store + product
sp = df.groupby(["store", "product"])["revenue"].sum()
print(sp.idxmax(), sp.max())
# 9. Missing values
df.isnull().sum()
# 10. Export
summary = df.groupby(["store", "product"])["revenue"].sum().reset_index()
summary.to_csv("store_product_revenue.csv", index=False)
Capstone Projects
- Transaction class: date, description, amount, category, type (income/expense)
- Functions: add_transaction(), load_transactions(), save_transactions()
- Analysis: total income, total expenses, net savings, breakdown by category
- Filter by month/year · Identify top 3 spending categories
- Generate synthetic data: 50 students × 5 subjects × 4 exams
- Per-student: overall average, subject strengths/weaknesses
- Per-subject: class average, pass rate (≥50), top/bottom 10%
- Identify students at risk (avg < 50) · Export full report to CSV
- Pivot table: students × subjects × average
- Simulate daily temp, humidity, rainfall for 10 cities × 365 days
- NumPy: vectorised heat index = f(temp, humidity)
- Pandas: resample to weekly/monthly summaries
- Hottest week, wettest month, most humid city
- Detect "heatwave" events: 3+ consecutive days above 35°C
- Export findings to JSON
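The heatwave item in the last project is the least obvious step, so here is one possible starting point using only the standard library. The function name `find_heatwaves`, its parameters, and the sample readings are all assumptions made for this sketch; a NumPy or Pandas version would be equally valid.

```python
from itertools import groupby


def find_heatwaves(temps, threshold=35.0, min_days=3):
    """Return (start_index, length) for each run of at least min_days readings above threshold."""
    heatwaves = []
    day = 0
    # groupby splits the sequence into consecutive runs of hot / not-hot days
    for is_hot, run in groupby(temps, key=lambda t: t > threshold):
        length = len(list(run))
        if is_hot and length >= min_days:
            heatwaves.append((day, length))
        day += length
    return heatwaves


# Made-up daily maximums in °C
daily = [30, 36, 37, 38, 31, 36, 36, 29, 40, 41, 36, 39]
events = find_heatwaves(daily)
print(events)   # [(1, 3), (8, 4)]
```

The two-day run at indices 5 and 6 is skipped because it is shorter than `min_days`, which matches the "3+ consecutive days" rule.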
Quick Reference
A.1 Built-in Functions
| Function | Description |
|---|---|
| print(*args) | Output to screen |
| input(prompt) | Read from keyboard (returns str) |
| len(x) | Length of sequence |
| range(start,stop,step) | Lazy sequence of integers (stop excluded) |
| type(x) | Return type of x |
| isinstance(x, T) | Check if x is instance of T |
| int(), float(), str(), bool() | Type conversion |
| sorted(it, key, reverse) | Return sorted list |
| enumerate(it, start) | Return (index, value) pairs |
| zip(*its) | Combine iterables element-wise |
| map(fn, it) | Apply function to iterable |
| filter(fn, it) | Filter iterable by function |
| sum / min / max | Aggregate numeric iterables |
| any(it) / all(it) | True if any/all truthy |
| help(x) / dir(x) | Show docs / list attributes |
A.2 Common Error Types
| Error | Common Cause |
|---|---|
| SyntaxError | Typo, missing colon, unclosed bracket |
| IndentationError | Inconsistent spaces/tabs |
| NameError | Variable used before assignment |
| TypeError | Wrong type for operation (str + int) |
| ValueError | Right type, wrong value (int("abc")) |
| IndexError | List index out of range |
| KeyError | Dict key does not exist |
| AttributeError | Object has no such attribute |
| ZeroDivisionError | Dividing by zero |
| FileNotFoundError | File path incorrect |
| ImportError | Module not found / not installed |
| RecursionError | Missing base case in recursion |
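Several rows of this table can be reproduced safely by triggering each error inside a `try` block. The helper `error_name` below is a made-up name for this sketch; it runs a small callable and reports which exception class was raised.

```python
def error_name(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return "no error"


# Each lambda delays the faulty expression until error_name() calls it
examples = {
    'int("abc")': lambda: int("abc"),
    '"age: " + 30': lambda: "age: " + 30,
    "[1, 2, 3][5]": lambda: [1, 2, 3][5],
    '{"a": 1}["b"]': lambda: {"a": 1}["b"],
    "1 / 0": lambda: 1 / 0,
    '"text".push()': lambda: "text".push(),
    "undefined_name": lambda: undefined_name,   # deliberately undefined
}

results = {source: error_name(action) for source, action in examples.items()}
for source, name in results.items():
    print(f"{source:<16} -> {name}")
```

Reading the exception name first, then the message, is usually the fastest way to locate the cause when a traceback appears.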
A.3 NumPy Quick Reference
| Function | Description |
|---|---|
| np.array() | Create array from list |
| np.zeros/ones/full(shape) | Filled arrays |
| np.arange / np.linspace | Range arrays |
| np.random.rand/randn/randint | Random arrays |
| a.shape, a.dtype, a.size | Array properties |
| a.reshape() / a.flatten() | Change shape / to 1D |
| a.T | Transpose |
| np.vstack / np.hstack | Stack arrays |
| np.sum/mean/std/var(axis=) | Aggregations |
| np.sort / np.argsort | Sort / sort indices |
| np.where(cond, a, b) | Conditional selection |
| a @ b / np.dot(a,b) | Matrix multiply |
| np.linalg.inv/det/solve | Linear algebra |
A.4 Pandas Quick Reference
| Operation | Syntax |
|---|---|
| Read CSV / Excel | pd.read_csv() / pd.read_excel() |
| First/last N rows | df.head(n) / df.tail(n) |
| Shape, info, describe | df.shape / df.info() / df.describe() |
| Select column | df["col"] or df.col |
| Select by label / position | df.loc[] / df.iloc[] |
| Boolean filter | df[df["col"] > val] |
| Add / drop column | df["new"]=val / df.drop(columns=) |
| Sort | df.sort_values("col", ascending=False) |
| Group & aggregate | df.groupby("col")["val"].agg(fn) |
| Pivot table | df.pivot_table(values,index,columns,aggfunc) |
| Merge | pd.merge(df1, df2, on=, how=) |
| Missing values | df.isnull().sum() / df.fillna() / df.dropna() |
| Apply function | df["col"].apply(fn) |
| String methods | df["col"].str.upper() etc. |
| DateTime | df["col"].dt.year etc. |
| Resample time series | df.resample("ME").sum() |
Happy Coding!
Practice daily, read others' code, and build real projects.