JAlcocerTech E-books

Python 101 for Data Analytics

As part of your Data Analytics journey, mastering Python is essential.

This guide covers the fundamentals and the “must-know” setup to start your projects effectively.


About Python

Philosophy & Syntax

Python’s guiding philosophy is encapsulated in “The Zen of Python”, which prioritizes readability and simplicity.

Its hallmarks include:

  • Concise Syntax: Dynamically but strongly typed. Type annotations are optional, yet incompatible types are not silently mixed (for example, "1" + 1 raises an error).
  • Indentation: Code blocks are defined by whitespace, promoting clean formatting.
  • Versatility: Supports functional, procedural, and object-oriented programming (OOP).
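
As a minimal sketch of what "strongly typed yet dynamic" means in practice: a name can be rebound to a value of a different type, but Python refuses to combine incompatible types. The variable names below are illustrative only.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Strong typing: mixing incompatible types raises an error instead of guessing.
try:
    result = "1" + 1
except TypeError:
    result = None  # Python refuses to add a str and an int

print(first_type, second_type, result)
```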

Applications in Data

Python is the go-to choice for:

  • Data Analysis: (Pandas, NumPy).
  • Visualization: (Matplotlib, Seaborn).
  • Machine Learning: (Scikit-learn, TensorFlow).

Setup & Environment

Installation

Python is free and open source.

You can download it at python.org.

[!NOTE] Check your version: Run python --version in a terminal (python3 --version on some systems), or run import sys; print(sys.version) inside Python, to verify your installation.
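
The same check can be done from inside a script. The sketch below assumes a hypothetical minimum version of 3.9; that number is not a requirement of this guide, so adjust it to whatever your project needs.

```python
import sys

# Hypothetical minimum version for this sketch; adjust to your project.
MIN_VERSION = (3, 9)

def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum

print(sys.version_info[:3], python_is_recent_enough())
```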

Choosing an IDE

  • VSCode / VSCodium: General-purpose editors that handle Python well through extensions.
  • Spyder: Optimized for Data Science.
  • Jupyter Notebook: The industry standard for Exploratory Data Analysis (EDA).

Python’s Data Structures

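| Structure    | Mutable? | Ordered? | Duplicates? | Example                    |
|--------------|----------|----------|-------------|----------------------------|
| Lists        | Yes      | Yes      | Yes         | [1, 2, 2, 3]               |
| Dictionaries | Yes      | No*      | Keys: No    | {"id": 1, "name": "A"}     |
| Sets         | Yes      | No       | No          | {"a", "b", "c"}            |
| Tuples       | No       | Yes      | Yes         | (10, 20) (space efficient) |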

*Dictionaries preserve insertion order as of Python 3.7, so in practice they behave as ordered.
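
The properties of the four structures can be checked with a short sketch. The values are sample data chosen for illustration.

```python
# List: mutable, ordered, allows duplicates.
numbers = [1, 2, 2, 3]
numbers.append(4)

# Dictionary: mutable, unique keys; assigning to an existing key overwrites it.
record = {"id": 1, "name": "A"}
record["name"] = "B"

# Set: mutable, unordered, duplicates are dropped automatically.
letters = {"a", "b", "c", "a"}

# Tuple: immutable; attempting item assignment raises TypeError.
point = (10, 20)
try:
    point[0] = 99
except TypeError:
    tuple_is_immutable = True

print(numbers, record, len(letters), tuple_is_immutable)
```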


Logic & Loops

The “Pythonic” Way

Avoid using manual index counters.

Loop directly over the contents:

# List comprehension: filter a list in a single expression
your_list = ["kiwi", "apple", "fig", "banana"]  # sample data
best_list = [item for item in your_list if len(item) >= 4]

Regular Loops

for item in your_list:
    if len(item) >= 4:
        print(item)
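
When the position of each item is genuinely needed, enumerate() is one way to get it without a manual counter. A minimal sketch, using made-up sample data:

```python
your_list = ["kiwi", "apple", "fig", "banana"]  # sample data for illustration

# enumerate() yields (position, item) pairs, so no manual counter is needed.
long_items = []
for position, item in enumerate(your_list, start=1):
    if len(item) >= 4:
        long_items.append((position, item))

print(long_items)
```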

Functions & Modular Code

Regular Functions

Use def to create reusable logic.

For organizational clarity, you can import custom functions from other scripts:

from my_utils import calculate_metric as udf
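
For that import to work, a file named my_utils.py has to be importable, for example by sitting next to your script. What calculate_metric computes is not specified here, so the sketch below uses the arithmetic mean purely as a placeholder.

```python
# my_utils.py (hypothetical module; the metric below is only a placeholder)

def calculate_metric(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```

With that file in place, the aliased import lets you call udf([1, 2, 3]) from your main script.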

Lambda Functions

Anonymous, one-line functions useful for quick operations:

multiply = lambda a, b: a * b
print(multiply(2, 3)) # Output: 6
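
Lambdas are most often passed directly as arguments rather than assigned to a name. A small sketch of a lambda used as a sort key, with invented sample data:

```python
# Sample (name, score) pairs, invented for illustration.
scores = [("ana", 82), ("ben", 95), ("cy", 78)]

# The lambda tells sorted() to order the tuples by their second element.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)

print(ranked)
```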

Documentation (Docstrings)

Always document your complex functions to ensure your work is understandable:

def add(num1, num2):
    """Add up two integer numbers."""
    return num1 + num2
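
A slightly fuller variant is sketched below. It adds optional type hints and an argument section to the docstring; the layout shown is one common convention, not the only accepted one.

```python
def add(num1: float, num2: float) -> float:
    """Return the sum of two numbers.

    Args:
        num1: First addend.
        num2: Second addend.
    """
    return num1 + num2

# The docstring is stored on the function and is what help(add) displays.
summary_line = add.__doc__.splitlines()[0]
print(summary_line)
```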

Object-Oriented Programming (Briefly)

Classes allow you to bundle data and functionality together.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        print(f"Hello, my name is {self.name}")

p1 = Person("Yosua", 31)
p1.greet()

Managing Dependencies (FAQ)

To ensure your code is reproducible, you must manage environments properly.

1. Venv (Built-in)

Ideal for simple, project-specific isolation.

python -m venv myenv
source myenv/bin/activate # Linux/macOS (on Windows: myenv\Scripts\activate)
pip install pandas
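
To confirm that the environment is actually active, you can ask the interpreter itself. This is a minimal sketch that relies on the standard sys.prefix and sys.base_prefix attributes; the function name is made up for illustration.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```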

2. Conda

Best for cross-platform projects involving non-Python dependencies (R, C++).

conda create -n myenv python=3.11
conda activate myenv
conda install numpy

3. UV (High Performance)

UV is an extremely fast Python package manager written in Rust.

It can replace pip, pip-compile, and venv, and its developers report it as 10-100x faster than pip.

# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh

# Usage Example
uv init
uv add pandas numpy
uv run app.py

4. Pipenv & Poetry

Modern tools that handle lockfiles and packaging more robustly, ensuring that others can replicate your exact environment.

[!TIP] Reproducibility is key: Always share a requirements.txt or pyproject.toml with your data projects.
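
Before writing those files, it can help to see which versions are installed in the active environment. The sketch below uses the standard library's importlib.metadata; the helper name and the package list are examples, not part of any tool mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Package names here are examples; list the ones your project depends on.
for name in ("pandas", "numpy"):
    found = installed_version(name)
    print(name, found if found is not None else "not installed")
```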


Best Practices & Reliability

Project Structure (The README)

A good code repository starts with a great README.md. It should explain:

  • What the project does.
  • How to set up the environment.
  • How to run the main application or notebooks.

Managing Secrets (.env)

Never hardcode API keys (like OPENAI_API_KEY) in your scripts. Use a .env file and the python-dotenv library.

Example .env file:

OPENAI_API_KEY="your-api-key-here"
DB_PASSWORD="secure-password"

Loading secrets in Python:

import os
from dotenv import load_dotenv
load_dotenv()  # load variables from the .env file into os.environ
print(os.getenv("OPENAI_API_KEY"))
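
Since os.getenv returns None when a variable is missing, it can be useful to fail early with a clear message. A minimal sketch follows; require_env and DEMO_API_KEY are hypothetical names used only for this example.

```python
import os

def require_env(name):
    """Return an environment variable's value, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Demo with a made-up variable so the sketch runs without a real secret.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
api_key = require_env("DEMO_API_KEY")
print("Key loaded, length:", len(api_key))
```

Printing only the length, rather than the key itself, keeps the secret out of logs and notebook output.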

FAQ

Using Python to Pull AWS S3 Data

If you have data stored in S3 buckets, you can interact with it using the AWS CLI for exploration and Boto3 for programmatic access.

1. AWS CLI (Exploration)

Install the CLI and configure your credentials to browse your buckets from the terminal.

# Check installation
aws --version

# Configure credentials
aws configure

# List buckets
aws s3 ls

2. Boto3 (Python Integration)

Boto3 is the official AWS SDK for Python. It allows you to download, upload, and query data directly from your scripts.

pip install boto3

Basic Example:

import boto3

# Initialize S3 client
s3 = boto3.client('s3')

# List objects in a specific bucket
response = s3.list_objects_v2(Bucket='your-bucket-name')
for obj in response.get('Contents', []):
    print(obj['Key'])
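
A single list_objects_v2 call returns at most 1,000 objects, so larger buckets need pagination. The sketch below wraps Boto3's paginator in a helper; iter_object_keys is a hypothetical name, and the client is passed in as an argument so the function itself does not depend on credentials being configured.

```python
def iter_object_keys(s3_client, bucket, prefix=""):
    """Yield every object key in a bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    kwargs = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix  # restrict the listing to keys under this prefix
    for page in paginator.paginate(**kwargs):
        # Pages for an empty bucket or prefix have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]

# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   for key in iter_object_keys(s3, "your-bucket-name"):
#       print(key)
```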