JAlcocerTech E-books

Self-Hosted Business Intelligence Tools

If you work in data and analytics, at some point you’ll need to build a dashboard.

This chapter covers open-source BI tools you can self-host using Docker.

Why Self-Host BI Tools?

Benefits:

  • Data Control: Keep sensitive data on your infrastructure
  • Cost Savings: No per-user licensing fees
  • Customization: Full control over features and integrations
  • Privacy: No data sent to third-party services
  • Learning: Understand the full stack

Popular Commercial Alternatives:

  • Power BI (Microsoft)
  • Tableau (Salesforce)
  • Looker (Google Cloud)

Open-Source BI Platforms

Apache Superset

Apache Superset is a modern, enterprise-ready business intelligence web application.

Key Features:

  • Rich visualizations with 40+ chart types
  • SQL IDE for data exploration
  • Semantic layer for defining custom dimensions and metrics
  • Support for most SQL databases
  • Role-based access control
  • Caching for performance

Docker Setup:

git clone https://github.com/apache/superset.git
cd superset
git checkout 3.0.0

# Non-dev deployment (the Superset docs advise against Docker Compose for production use)
TAG=3.0.0 docker compose -f docker-compose-non-dev.yml pull
TAG=3.0.0 docker compose -f docker-compose-non-dev.yml up -d

Access at: http://localhost:8088

Default Credentials:

  • Username: admin
  • Password: admin (change after first login)

Integration with Data Sources:

Superset works with:

  • PostgreSQL, MySQL, MariaDB
  • BigQuery, Redshift, Snowflake
  • Trino (formerly PrestoSQL)
  • ClickHouse for real-time analytics
  • Many more via SQLAlchemy
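
Superset asks for a SQLAlchemy URI when you add a database. A minimal sketch of building one follows, assuming a hypothetical helper name (build_sqlalchemy_uri) and example credentials. It percent-encodes the user and password so characters such as @ or / do not break the URI.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database):
    """Return a SQLAlchemy-style URI with the credentials percent-encoded."""
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Example values only; replace with your own host and credentials.
    print(build_sqlalchemy_uri(
        "postgresql", "superset", "p@ss/word", "postgres", 5432, "analytics"
    ))
```

The dialect prefix depends on the driver installed in the Superset image, so check the Superset database documentation for the exact string your database needs.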

Use Cases:

  • Business dashboards
  • Data exploration
  • Real-time analytics
  • Embedded analytics

Metabase

Metabase is a simple, open-source BI tool that makes it easy for everyone in your company to ask questions and learn from data.

Key Features:

  • User-friendly interface (no SQL required for basic queries)
  • SQL mode for advanced users
  • Automatic dashboard generation
  • Email reports and alerts
  • Embedded analytics
  • 20+ visualization types

Docker Setup:

docker run -d -p 3000:3000 \
  -v metabase-data:/metabase-data \
  -e "MB_DB_FILE=/metabase-data/metabase.db" \
  --name metabase \
  metabase/metabase

Docker Compose:

version: '3.8'
services:
  metabase:
    image: metabase/metabase:latest
    container_name: metabase
    ports:
      - "3000:3000"
    volumes:
      - metabase-data:/metabase-data
    environment:
      - MB_DB_FILE=/metabase-data/metabase.db
    restart: unless-stopped

volumes:
  metabase-data:

Access at: http://localhost:3000
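
Metabase can take a minute or two to start. A small readiness check could look like the sketch below, which polls the /api/health endpoint until it reports "ok". The function names, retry count, and delay are assumptions for illustration. The fetch and sleep parameters are injectable so the loop can be tried without a running container.

```python
import json
import time
import urllib.request


def fetch_status(url):
    """GET the URL and return the 'status' field of its JSON body, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError, AttributeError):
        # Connection refused, HTTP errors (e.g. 503 while starting), or a non-JSON body
        return None


def wait_until_ready(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll the health endpoint; return True once it reports "ok", else False."""
    for _ in range(attempts):
        if fetch(url) == "ok":
            return True
        sleep(delay)
    return False
```

Usage, assuming the port mapping above: wait_until_ready("http://localhost:3000/api/health").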

Integration Examples:

With MariaDB:

  1. Deploy MariaDB alongside Metabase:

version: '3.8'
services:
  mariadb:
    image: mariadb:latest
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: analytics
      MYSQL_USER: metabase
      MYSQL_PASSWORD: metabasepass
    ports:
      - "3306:3306"
    volumes:
      - mariadb-data:/var/lib/mysql

  metabase:
    image: metabase/metabase:latest
    ports:
      - "3000:3000"
    depends_on:
      - mariadb

volumes:
  mariadb-data:

  2. In the Metabase UI:
    • Add Database → MySQL (Metabase connects to MariaDB through its MySQL driver)
    • Host: mariadb
    • Port: 3306
    • Database: analytics
    • Username: metabase
    • Password: metabasepass

Embedded Analytics:

Metabase supports static embedding for dashboards:

<iframe
  src="https://metabase.example.com/embed/dashboard/TOKEN"
  frameborder="0"
  width="800"
  height="600"
  allowtransparency
></iframe>

Use Cases:

  • Quick business dashboards
  • Self-service analytics for non-technical users
  • IoT data visualization
  • Embedded analytics in applications
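
The TOKEN in the static embedding iframe shown earlier is a JWT signed with the embedding secret key from the Metabase admin settings. The Metabase docs use a JWT library for this. The sketch below signs an HS256 token with only the standard library so it stays self-contained. The function names, the 10-minute expiry, and the dashboard ID are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_embed_token(secret, dashboard_id, params=None, ttl_seconds=600, now=None):
    """Build an HS256-signed JWT naming the dashboard to embed."""
    header = {"alg": "HS256", "typ": "JWT"}
    issued = int(time.time() if now is None else now)
    payload = {
        "resource": {"dashboard": dashboard_id},
        "params": params or {},          # locked filter values, if any
        "exp": issued + ttl_seconds,     # token expiry (Unix seconds)
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def embed_url(site_url, token):
    """Return the iframe src for a signed dashboard token."""
    return f"{site_url.rstrip('/')}/embed/dashboard/{token}#bordered=true&titled=true"
```

Generate the token on the server side only. The secret key must never reach the browser.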

Redash

Redash is an open-source tool for querying, visualizing, and sharing data.

Key Features:

  • Support for 35+ data sources
  • SQL-based queries
  • Collaborative query editing
  • Scheduled queries and alerts
  • API for programmatic access
  • Dashboard sharing

Docker Setup:

git clone https://github.com/getredash/setup
cd setup
./setup.sh

Docker Compose (Simplified):

version: '3.8'
services:
  server:
    image: redash/redash:latest
    command: server
    depends_on:
      - postgres
      - redis
    ports:
      - "5000:5000"
    environment:
      REDASH_WEB_WORKERS: 4
      REDASH_DATABASE_URL: "postgresql://postgres:password@postgres/postgres"
      REDASH_REDIS_URL: "redis://redis:6379/0"
      REDASH_SECRET_KEY: "your-secret-key-here"
    restart: unless-stopped

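  scheduler:
    image: redash/redash:latest
    command: scheduler
    depends_on:
      - server
    environment:
      REDASH_DATABASE_URL: "postgresql://postgres:password@postgres/postgres"
      REDASH_REDIS_URL: "redis://redis:6379/0"

  # Workers execute the queries; without one, queries stay queued
  worker:
    image: redash/redash:latest
    command: worker
    depends_on:
      - server
    environment:
      REDASH_DATABASE_URL: "postgresql://postgres:password@postgres/postgres"
      REDASH_REDIS_URL: "redis://redis:6379/0"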

  postgres:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: password
    volumes:
      - postgres-data:/var/lib/postgresql/data

  redis:
    image: redis:6-alpine

volumes:
  postgres-data:

Initialize the database once before first use (docker compose run --rm server create_db), then access at: http://localhost:5000

Supported Data Sources:

  • PostgreSQL, MySQL, MongoDB
  • BigQuery, Redshift, Snowflake
  • Elasticsearch
  • ClickHouse
  • REST APIs
  • CSV files
  • And 25+ more

Use Cases:

  • SQL-focused teams
  • Multi-source data analysis
  • Scheduled reports
  • API-driven dashboards
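
For API-driven use, Redash exposes the latest cached result of a saved query over HTTP. The sketch below builds that request and reads the rows. The function names, the query ID, and the API key are placeholders, and the endpoint path and response shape should be checked against the API docs for your Redash version.

```python
import json
import urllib.request


def build_results_request(base_url, query_id, api_key):
    """Build a GET request for the cached results of a saved query."""
    url = f"{base_url.rstrip('/')}/api/queries/{query_id}/results.json"
    # Redash accepts the API key in an "Authorization: Key ..." header
    return urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})


def fetch_rows(request):
    """Send the request and return the result rows as a list of dicts."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["query_result"]["data"]["rows"]
```

Usage, assuming a query with ID 42 exists: rows = fetch_rows(build_results_request("http://localhost:5000", 42, "YOUR_API_KEY")).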

Grafana

Grafana is a widely used open-source platform for monitoring and observability dashboards.

Key Features:

  • Beautiful, flexible dashboards
  • Support for 100+ data sources
  • Alerting and notifications
  • Plugin ecosystem
  • Template variables for dynamic dashboards
  • JSON-based dashboard definitions

Docker Setup:

docker run -d \
  -p 3000:3000 \
  --name=grafana \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana-oss

Docker Compose:

version: '3.8'
services:
  grafana:
    image: grafana/grafana-oss:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped

volumes:
  grafana-data:

Access at: http://localhost:3000

Default Credentials:

  • Username: admin
  • Password: admin (change on first login)

Popular Data Sources:

  • Prometheus (metrics)
  • Loki (logs)
  • InfluxDB (time-series)
  • Elasticsearch
  • PostgreSQL, MySQL
  • Tempo (traces)

LGTM Stack:

The LGTM stack is a complete observability solution:

  • Loki: Log aggregation
  • Grafana: Visualization
  • Tempo: Distributed tracing
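  • Mimir: Metrics (long-term storage for Prometheus metrics; the Compose example below runs plain Prometheus instead, which is simpler on a single host)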

Docker Compose for LGTM:

version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Anonymous Admin access is for local testing only; remove it on any shared host
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki

  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
    volumes:
      - tempo-data:/var/tempo

volumes:
  grafana-data:
  prometheus-data:
  loki-data:
  tempo-data:
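
The Compose file above mounts ./prometheus.yml, so that file has to exist before the stack starts. Below is a minimal sketch that renders one. The function name, the 15s scrape interval, and the single self-scrape job are assumptions for illustration, not a recommended production configuration.

```python
def render_prometheus_config(jobs, scrape_interval="15s"):
    """Render a minimal prometheus.yml from a {job_name: [targets]} mapping."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        quoted = ", ".join(f'"{target}"' for target in targets)
        lines += [
            f"  - job_name: {job_name}",
            "    static_configs:",
            f"      - targets: [{quoted}]",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prometheus scraping its own metrics endpoint inside the container
    print(render_prometheus_config({"prometheus": ["localhost:9090"]}), end="")
```

Redirect the output to prometheus.yml next to the Compose file, then add further jobs for the services you want to scrape.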

Dashboard as Code:

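Grafana dashboards are defined in JSON. The "dashboard" wrapper shown here is the shape the HTTP API expects when creating a dashboard; a dashboard exported from the UI contains only the inner object: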

{
  "dashboard": {
    "title": "My Dashboard",
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "title": "CPU Usage",
        "targets": [
          {
            "expr": "rate(cpu_usage[5m])"
          }
        ]
      }
    ]
  }
}
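
Because dashboards are plain JSON, they can be generated from code and kept in version control. The sketch below builds a payload in the shape used by Grafana's dashboard-creation API. The helper names are hypothetical, and the "timeseries" panel type is assumed here as the current successor to the older graph panel.

```python
import json


def timeseries_panel(panel_id, title, expr):
    """Return one panel definition with a single PromQL target."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "targets": [{"expr": expr}],
    }


def dashboard_payload(title, panels, overwrite=False):
    """Wrap panels in the payload shape used to create a dashboard over HTTP."""
    return {
        # id and uid are left empty so Grafana assigns them on creation
        "dashboard": {"id": None, "uid": None, "title": title, "panels": panels},
        "overwrite": overwrite,
    }


if __name__ == "__main__":
    payload = dashboard_payload(
        "My Dashboard",
        [timeseries_panel(1, "CPU Usage", "rate(cpu_usage[5m])")],
    )
    print(json.dumps(payload, indent=2))
```

The printed JSON can be sent to a Grafana instance with an authenticated POST, or committed to a repository and reviewed like any other code.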

Use Cases:

  • Infrastructure monitoring
  • Application performance monitoring (APM)
  • IoT sensor data visualization
  • Business metrics
  • Log analysis

Specialized Tools

Kibana (ELK Stack)

Kibana is the visualization layer for the Elastic Stack (Elasticsearch, Logstash, Kibana).

Key Features:

  • Powerful search and filtering
  • Real-time data exploration
  • Machine learning capabilities (anomaly detection requires a paid Elastic subscription)
  • Geospatial analysis
  • Security analytics

Query Languages:

KQL (Kibana Query Language):

Simple, user-friendly syntax:

viewerID : * and site : "somename" and (HttpPlayerPlaybackEndEvent_assetType : * or HttpPlayerStartEvent_assetType : *)

Elasticsearch Query DSL (Domain-Specific Language):

JSON-based and more expressive. This query is roughly equivalent to the KQL example above:

{
  "query": {
    "bool": {
      "must": [
        {"exists": {"field": "viewerID"}},
        {"term": {"site.keyword": {"value": "somename"}}},
        {
          "bool": {
            "should": [
              {"exists": {"field": "HttpPlayerPlaybackEndEvent_assetType"}},
              {"exists": {"field": "HttpPlayerStartEvent_assetType"}}
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
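
Nested bool queries get hard to read when written by hand. One way to keep them manageable is to compose them from small helpers, as in the sketch below. The helper names are hypothetical. A KQL clause such as field : * is expressed with an exists query, and the result is a plain dict that can be serialized and sent to the Elasticsearch search endpoint.

```python
import json


def exists(field):
    """Match documents where the field has any value (KQL: field : *)."""
    return {"exists": {"field": field}}


def term(field, value):
    """Match documents where a keyword field equals the value exactly."""
    return {"term": {field: {"value": value}}}


def all_of(*clauses):
    """AND: every clause must match."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses):
    """OR: at least one clause must match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


query = {
    "query": all_of(
        exists("viewerID"),
        term("site.keyword", "somename"),
        any_of(
            exists("HttpPlayerPlaybackEndEvent_assetType"),
            exists("HttpPlayerStartEvent_assetType"),
        ),
    )
}

if __name__ == "__main__":
    print(json.dumps(query, indent=2))
```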

Relationship:

  • Lucene: Core search engine library
  • KQL: User-friendly layer for Kibana
  • DSL: Most powerful and flexible query language

Use Cases:

  • Log analysis
  • Security monitoring (SIEM)
  • Application performance monitoring
  • Business analytics on Elasticsearch data

Chronograf

Chronograf is the visualization component of the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor).

Key Features:

  • Real-time visualization of time-series data
  • Template-based dashboards
  • Alert management
  • Data exploration

Use Cases:

  • IoT sensor monitoring
  • Infrastructure metrics
  • Time-series data visualization

Integration:

  • Works seamlessly with InfluxDB
  • Pre-built templates for common use cases

Comparison & Decision Guide

Feature Comparison

Tool      Best For       Difficulty  Data Sources   Customization
Superset  Enterprise BI  Medium      40+            High
Metabase  Self-service   Easy        20+            Medium
Redash    SQL users      Easy        35+            Medium
Grafana   Monitoring     Medium      100+           Very High
Kibana    Log analysis   Medium      Elasticsearch  High

When to Use Which Tool

Choose Superset if:

  • You need enterprise-grade BI
  • You want rich visualizations
  • You have technical users
  • You need semantic layers

Choose Metabase if:

  • You want simplicity
  • You have non-technical users
  • You need quick setup
  • You want embedded analytics

Choose Redash if:

  • Your team knows SQL
  • You have multiple data sources
  • You need scheduled queries
  • You want API access

Choose Grafana if:

  • You’re monitoring infrastructure
  • You need real-time dashboards
  • You use Prometheus/InfluxDB
  • You want alerting

Choose Kibana if:

  • You use Elasticsearch
  • You need log analysis
  • You want machine learning
  • You need security analytics

Resource Requirements

Minimum Requirements:

Tool      RAM    CPU      Storage
Superset  4GB    2 cores  10GB
Metabase  2GB    1 core   5GB
Redash    2GB    2 cores  10GB
Grafana   512MB  1 core   1GB
Kibana    2GB    2 cores  10GB

Production Recommendations:

  • Superset: 8GB RAM, 4 cores
  • Metabase: 4GB RAM, 2 cores
  • Redash: 4GB RAM, 2 cores
  • Grafana: 2GB RAM, 2 cores
  • Kibana: 4GB RAM, 4 cores
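
When several tools share one host, the minimum figures above add up quickly. The sketch below totals them for a chosen set of tools and checks the result against a host. The function names are hypothetical, and summing CPU cores is a deliberately conservative assumption, since containers can share cores in practice.

```python
# Minimum requirements from the table above: (RAM in GB, CPU cores, storage in GB)
MINIMUMS = {
    "Superset": (4.0, 2, 10),
    "Metabase": (2.0, 1, 5),
    "Redash": (2.0, 2, 10),
    "Grafana": (0.5, 1, 1),
    "Kibana": (2.0, 2, 10),
}


def total_minimums(tools):
    """Sum the minimum RAM, cores, and storage for the given tool names."""
    ram = sum(MINIMUMS[name][0] for name in tools)
    cores = sum(MINIMUMS[name][1] for name in tools)
    storage = sum(MINIMUMS[name][2] for name in tools)
    return ram, cores, storage


def fits_on_host(tools, host_ram_gb, host_cores, host_storage_gb):
    """Return True if the host meets the summed minimums for all tools."""
    ram, cores, storage = total_minimums(tools)
    return ram <= host_ram_gb and cores <= host_cores and storage <= host_storage_gb


if __name__ == "__main__":
    print(total_minimums(["Metabase", "Grafana"]))
```

The totals exclude the operating system and the databases being queried, so leave headroom beyond what the function reports.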

Docker Deployment Best Practices

General Guidelines

1. Use Docker Compose:

version: '3.8'
services:
  app:
    image: your-bi-tool:latest
    ports:
      - "3000:3000"
    volumes:
      - app-data:/data
    environment:
      - DB_HOST=database
      - DB_PORT=5432
    restart: unless-stopped
    depends_on:
      - database

  database:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: secure-password
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  app-data:
  db-data:

2. Use Environment Variables:

# .env file
DB_PASSWORD=secure-password
SECRET_KEY=random-secret-key
ADMIN_EMAIL=admin@example.com
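
Placeholder values like the ones above should be replaced with random secrets before deployment. The sketch below renders a .env file with generated values using Python's secrets module. The function name is hypothetical, and the variable names simply mirror the example above.

```python
import secrets


def render_env(admin_email, nbytes=32):
    """Return .env file content with freshly generated random secrets."""
    values = {
        # token_urlsafe uses only letters, digits, '-' and '_', so no quoting is needed
        "DB_PASSWORD": secrets.token_urlsafe(nbytes),
        "SECRET_KEY": secrets.token_urlsafe(nbytes),
        "ADMIN_EMAIL": admin_email,
    }
    return "".join(f"{key}={value}\n" for key, value in values.items())


if __name__ == "__main__":
    print(render_env("admin@example.com"), end="")
```

Write the output to .env once, keep that file out of version control, and let Docker Compose read it for variable substitution.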

3. Persist Data:

Always use volumes for:

  • Database data
  • Application configuration
  • User uploads
  • Cache

4. Networking:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

services:
  app:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend

5. Security:

  • Use secrets for sensitive data
  • Run containers as non-root users
  • Limit resource usage
  • Keep images updated

For example, this fragment runs the app as a non-root user and caps its CPU and memory:

services:
  app:
    image: app:latest
    user: "1000:1000"
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G

Real-World Use Cases

IoT Monitoring

Scenario: Monitor temperature sensors across multiple locations

Stack:

  • Data Collection: MQTT → InfluxDB
  • Visualization: Grafana
  • Alerting: Grafana alerts

Why: Real-time updates, time-series optimization, built-in alerting
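
Between MQTT and InfluxDB, each sensor reading has to be turned into InfluxDB line protocol. The sketch below covers only that formatting step, so the MQTT subscriber and the HTTP write to InfluxDB are left out. The function names, measurement, and tag names are assumptions for illustration.

```python
def _escape(value, chars):
    """Backslash-escape each of the given characters in the value."""
    text = str(value)
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Format a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(key, ',= ')}={_escape(val, ',= ')}"
        for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, ',= ')}={_format_field(val)}" for key, val in fields.items()
    )
    return f"{head}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line_protocol(
        "temperature", {"location": "office 1"}, {"celsius": 21.5}, 1700000000000000000
    ))
```

Tags are sorted by key, which InfluxDB recommends for write performance. Location and sensor ID work well as tags because Grafana can then filter and group by them.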

Business Analytics

Scenario: Sales dashboard for non-technical users

Stack:

  • Data Storage: PostgreSQL
  • Visualization: Metabase
  • Embedding: Metabase embedded in company portal

Why: User-friendly, no SQL required, easy embedding

Log Analysis

Scenario: Application log monitoring and troubleshooting

Stack:

  • Data Collection: Filebeat → Elasticsearch
  • Visualization: Kibana
  • Alerting: ElastAlert 2 (the maintained fork of the original ElastAlert)

Why: Full-text search, powerful filtering, machine learning

Multi-Source Analytics

Scenario: Combine data from PostgreSQL, MongoDB, and APIs

Stack:

  • Visualization: Redash
  • Scheduling: Redash scheduled queries
  • Sharing: Redash dashboards

Why: 35+ data source support, SQL-based, API access

Conclusion

Self-hosting BI tools gives you:

  • Control: Over your data and infrastructure
  • Cost Savings: No per-user fees
  • Flexibility: Customize to your needs
  • Privacy: Data stays on your servers

Getting Started:

  1. Start Simple: Begin with Metabase or Grafana
  2. Understand Your Needs: Match tool to use case
  3. Use Docker: Simplifies deployment and management
  4. Plan for Scale: Consider resource requirements
  5. Secure Your Setup: Use SSL, authentication, and access controls