Python 101 for Data Analytics
As part of your Data Analytics journey, mastering Python is essential.
This guide covers the fundamentals and the “must-know” setup to start your projects effectively.
About Python
Philosophy & Syntax
Python’s guiding philosophy is encapsulated in “The Zen of Python”, which prioritizes readability and simplicity.
Its hallmarks include:
- Concise Syntax: Strongly but dynamically typed, so no explicit type annotations are required.
- Indentation: Code blocks are defined by whitespace, promoting clean formatting.
- Versatility: Supports functional, procedural, and object-oriented programming (OOP).
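To make "strongly typed yet dynamic" concrete, here is a minimal sketch. The variable names are illustrative only. It shows that a name can be rebound to a value of another type, while mixing incompatible types still raises an error rather than converting silently.

```python
# Dynamic typing: a name can be rebound to a value of a different type.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Strong typing: Python refuses to mix incompatible types implicitly.
try:
    result = "1" + 1
except TypeError as exc:
    result = f"TypeError: {exc}"

# An explicit conversion states the intent and succeeds.
total = int("1") + 1

print(first_type, second_type)
print(result)
print(total)
```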
Applications in Data
Python is the go-to choice for:
- Data Analysis: Pandas, NumPy.
- Visualization: Matplotlib, Seaborn.
- Machine Learning: Scikit-learn, TensorFlow.
Setup & Environment
Installation
Python is free and open source.
You can download it from python.org.
[!NOTE] Check your version: Run python --version in a terminal, or import sys; print(sys.version) in Python, to verify your installation.
Choosing an IDE
- VSCode / VSCodium: General-purpose editors with Python support through extensions.
- Spyder: Optimized for Data Science.
- Jupyter Notebook: The industry standard for Exploratory Data Analysis (EDA).
Python’s Data Structures
| Structure | Mutable? | Ordered? | Duplicates? | Example |
|---|---|---|---|---|
| Lists | Yes | Yes | Yes | [1, 2, 2, 3] |
| Dictionaries | Yes | Yes* | Keys No | {"id": 1, "name": "A"} |
| Sets | Yes | No | No | {"a", "b", "c"} |
| Tuples | No | Yes | Yes | (10, 20) (Space efficient) |
*Dictionaries preserve insertion order as of Python 3.7.
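The properties in the table can be checked directly. The following is a small sketch with made-up sample values (the "score" key is purely illustrative), exercising mutability, ordering, and duplicate handling for each structure.

```python
# Lists: mutable, ordered, duplicates allowed.
numbers = [1, 2, 2, 3]
numbers.append(4)

# Dictionaries: mutable, keys are unique, insertion order is preserved.
record = {"id": 1, "name": "A"}
record["name"] = "B"      # existing key: the value is overwritten
record["score"] = 9.5     # new key: appended after the existing ones
keys_in_order = list(record)

# Sets: mutable, unordered, duplicates are dropped.
unique = set(numbers)

# Tuples: immutable, so item assignment raises TypeError.
point = (10, 20)
try:
    point[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(numbers, keys_in_order, unique, tuple_is_mutable)
```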
Logic & Loops
The “Pythonic” Way
Avoid using manual index counters.
Loop directly over the contents:
# List comprehension: build a filtered list in a single expression
your_list = ["kiwi", "fig", "banana"]  # sample data
best_list = [item for item in your_list if len(item) >= 4]
Regular Loops
for item in your_list:
    if len(item) >= 4:
        print(item)
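When the position of each item is needed as well, the built-in enumerate is one way to keep the loop free of a manual counter. This is a sketch with sample data, not the only option.

```python
your_list = ["kiwi", "fig", "banana", "plum"]  # sample data

# enumerate yields (position, item) pairs, so no manual counter is needed.
long_items = []
for position, item in enumerate(your_list, start=1):
    if len(item) >= 4:
        long_items.append((position, item))

print(long_items)
```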
Functions & Modular Code
Regular Functions
Use def to create reusable logic.
For organizational clarity, you can import custom functions from other scripts:
from my_utils import calculate_metric as udf
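As an illustration of what such a module might contain, here is a hypothetical calculate_metric. Its name comes from the import above, but its body, parameters, and behavior are assumptions made purely for this sketch.

```python
# my_utils.py (hypothetical contents, for illustration only)
def calculate_metric(values, precision=2):
    """Return the mean of values, rounded to `precision` decimal places."""
    if not values:
        raise ValueError("values must not be empty")
    return round(sum(values) / len(values), precision)


print(calculate_metric([10, 20, 40]))
```

Keeping helpers like this in a separate file lets several scripts or notebooks reuse the same logic.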
Lambda Functions
Anonymous, one-line functions useful for quick operations:
multiply = lambda a, b: a * b
print(multiply(2, 3)) # Output: 6
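In practice, lambdas are often passed inline to another function rather than assigned to a name. A short sketch with invented sample rows:

```python
rows = [("north", 120), ("south", 95), ("east", 143)]  # sample data

# Sort by the second field, largest first, using a lambda as the sort key.
by_sales = sorted(rows, key=lambda row: row[1], reverse=True)

# Apply a quick transformation to every row with map.
doubled = list(map(lambda row: row[1] * 2, rows))

print(by_sales)
print(doubled)
```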
Documentation (Docstrings)
Always document your complex functions to ensure your work is understandable:
def add(num1, num2):
    """Add up two integer numbers."""
    return num1 + num2
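For functions with more to explain, a multi-line docstring can describe the arguments and return value. The layout below follows one common convention (Google style) and is a suggestion, not a requirement.

```python
def add(num1, num2):
    """Add two numbers.

    Args:
        num1: First addend.
        num2: Second addend.

    Returns:
        The sum of num1 and num2.
    """
    return num1 + num2


# The docstring is available at runtime, e.g. through help(add) or __doc__.
summary_line = add.__doc__.splitlines()[0]
print(summary_line)
```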
Object-Oriented Programming (Briefly)
Classes allow you to bundle data and functionality together.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def greet(self):
        print(f"Hello, my name is {self.name}")
p1 = Person("Yosua", 31)
p1.greet()
Managing Dependencies (FAQ)
To ensure your code is reproducible, you must manage environments properly.
1. Venv (Built-in)
Ideal for simple, project-specific isolation.
python -m venv myenv
source myenv/bin/activate # Linux/macOS (on Windows: myenv\Scripts\activate)
pip install pandas
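To confirm from Python that the environment is actually active, one option is to compare sys.prefix with sys.base_prefix. This sketch assumes an environment created with venv; the helper name is illustrative, and other tools such as Conda may not be detected this way.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```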
2. Conda
Best for cross-platform projects involving non-Python dependencies (R, C++).
conda create -n myenv python=3.11
conda activate myenv
conda install numpy
3. UV (High Performance)
UV is an extremely fast Python package manager written in Rust.
It can replace pip, pip-compile, and venv, and its documentation reports it to be 10-100x faster than pip.
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
# Usage Example
uv init
uv add pandas numpy
uv run app.py
4. Pipenv & Poetry
Tools that pin dependencies in a lockfile so that others can replicate your exact environment. Poetry also handles building and publishing packages.
[!TIP] Reproduction is key: Always share a requirements.txt or pyproject.toml with your data projects.
Best Practices & Reliability
Project Structure (The README)
A good code repository starts with a great README.md. It should explain:
- What the project does.
- How to set up the environment.
- How to run the main application or notebooks.
Managing Secrets (.env)
Never hardcode API keys (like OPENAI_API_KEY) in your scripts. Use a .env file and the python-dotenv library.
Example .env file:
OPENAI_API_KEY="your-api-key-here"
DB_PASSWORD="secure-password"
Loading secrets in Python:
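import os
from dotenv import load_dotenv
load_dotenv()  # Read the .env file into environment variables
api_key = os.getenv("OPENAI_API_KEY")  # Avoid printing secrets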
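Because os.getenv returns None when a variable is missing, it can help to fail early with a clear message. The helper below is a hypothetical sketch (require_env is not part of python-dotenv), and it sets a placeholder value only so that the example runs on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder so the sketch runs standalone; real values come from .env.
os.environ.setdefault("DB_PASSWORD", "secure-password")

db_password = require_env("DB_PASSWORD")
print("DB_PASSWORD loaded:", bool(db_password))
```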
FAQ
Using Python to Pull AWS S3 Data
If you have data stored in S3 buckets, you can interact with it using the AWS CLI for exploration and Boto3 for programmatic access.
1. AWS CLI (Exploration)
Install the CLI and configure your credentials to browse your buckets from the terminal.
# Check installation
aws --version
# Configure credentials
aws configure
# List buckets
aws s3 ls
2. Boto3 (Python Integration)
Boto3 is the official AWS SDK for Python. It allows you to download, upload, and query data directly from your scripts.
pip install boto3
Basic Example:
import boto3
# Initialize S3 client
s3 = boto3.client('s3')
# List objects in a specific bucket (each call returns at most 1,000 keys)
response = s3.list_objects_v2(Bucket='your-bucket-name')
for obj in response.get('Contents', []):
    print(obj['Key'])
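A single list_objects_v2 call returns at most 1,000 keys, so larger buckets need pagination. The sketch below uses Boto3's paginator for that. The iter_keys helper and the "raw/" prefix are illustrative assumptions; the client is passed in as an argument so the function itself does not depend on credentials.

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield every object key in `bucket` under `prefix`, across all pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (requires boto3 and configured AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# for key in iter_keys(s3, "your-bucket-name", prefix="raw/"):
#     print(key)
```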