Self-Hosted Business Intelligence Tools
If you work in the data and analytics space, sooner or later you’ll need to build a dashboard.
This chapter covers open-source BI tools you can self-host using Docker.
Why Self-Host BI Tools?
Benefits:
- Data Control: Keep sensitive data on your infrastructure
- Cost Savings: No per-user licensing fees
- Customization: Full control over features and integrations
- Privacy: No data sent to third-party services
- Learning: Understand the full stack
Popular Commercial Alternatives:
- Power BI (Microsoft)
- Tableau (Salesforce)
- Looker (Google Cloud)
Open-Source BI Platforms
Apache Superset
Apache Superset is a modern, enterprise-ready business intelligence web application.
Key Features:
- Rich visualizations with 40+ chart types
- SQL IDE for data exploration
- Semantic layer for defining custom dimensions and metrics
- Support for most SQL databases
- Role-based access control
- Caching for performance
Docker Setup:
git clone https://github.com/apache/superset.git
cd superset
git checkout 3.0.0
# Production deployment
TAG=3.0.0 docker compose -f docker-compose-non-dev.yml pull
TAG=3.0.0 docker compose -f docker-compose-non-dev.yml up -d
Access at: http://localhost:8088
Default Credentials:
- Username: admin
- Password: admin
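Change the default password immediately. One way (a sketch, assuming the app container created by docker-compose-non-dev.yml is named superset_app) is Superset's bundled Flask-AppBuilder CLI:

```shell
# Reset the admin password from inside the running Superset container.
# Container name superset_app is an assumption; check `docker ps` on your host.
docker exec -it superset_app \
  superset fab reset-password --username admin --password 'a-strong-password'
```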
Integration with Data Sources:
Superset works with:
- PostgreSQL, MySQL, MariaDB
- BigQuery, Redshift, Snowflake
- Trino (formerly PrestoSQL)
- ClickHouse for real-time analytics
- Many more via SQLAlchemy
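Drivers for most of these databases are not bundled in the Superset image. A minimal sketch of the usual approach: list the extra Python driver in docker/requirements-local.txt (a convention the Superset repo's Docker setup supports) and rebuild. The ClickHouse driver name below is one example; substitute the driver for your database.

```shell
# Add an extra SQLAlchemy driver to the Superset image build.
mkdir -p docker
cat > docker/requirements-local.txt <<'EOF'
clickhouse-connect
EOF
# Then rebuild so the driver is installed, e.g.:
# docker compose -f docker-compose-non-dev.yml up -d --build
```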
Use Cases:
- Business dashboards
- Data exploration
- Real-time analytics
- Embedded analytics
Metabase
Metabase is a simple, open-source BI tool that makes it easy for everyone in your company to ask questions and learn from data.
Key Features:
- User-friendly interface (no SQL required for basic queries)
- SQL mode for advanced users
- Automatic dashboard generation
- Email reports and alerts
- Embedded analytics
- 20+ visualization types
Docker Setup:
docker run -d -p 3000:3000 \
  -v metabase-data:/metabase-data \
  -e "MB_DB_FILE=/metabase-data/metabase.db" \
  --name metabase \
  metabase/metabase
Docker Compose:
version: '3.8'
services:
  metabase:
    image: metabase/metabase:latest
    container_name: metabase
    ports:
      - "3000:3000"
    volumes:
      - metabase-data:/metabase-data
    environment:
      - MB_DB_FILE=/metabase-data/metabase.db
    restart: unless-stopped

volumes:
  metabase-data:
Access at: http://localhost:3000
Integration Examples:
With MariaDB:
- Deploy MariaDB:
version: '3.8'
services:
  mariadb:
    image: mariadb:latest
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: analytics
      MYSQL_USER: metabase
      MYSQL_PASSWORD: metabasepass
    ports:
      - "3306:3306"
    volumes:
      - mariadb-data:/var/lib/mysql

  metabase:
    image: metabase/metabase:latest
    ports:
      - "3000:3000"
    depends_on:
      - mariadb

volumes:
  mariadb-data:
- In Metabase UI: Add Database → MariaDB
  - Host: mariadb
  - Port: 3306
  - Database: analytics
  - Username: metabase
  - Password: metabasepass
Embedded Analytics:
Metabase supports static embedding for dashboards:
<iframe
  src="https://metabase.example.com/embed/dashboard/TOKEN"
  frameborder="0"
  width="800"
  height="600"
  allowtransparency
></iframe>
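The TOKEN in the URL is a JWT signed with your instance's embedding secret key (found under Admin → Embedding). A minimal sketch of generating one in shell, assuming a hypothetical secret and dashboard id 1:

```shell
# Build a Metabase static-embedding JWT (HS256) by hand.
secret="your-embedding-secret-key"   # hypothetical; copy from Metabase admin
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

header=$(printf '{"alg":"HS256","typ":"JWT"}' | b64url)
# resource/params/exp is the payload shape Metabase expects; exp is 10 min out.
payload=$(printf '{"resource":{"dashboard":1},"params":{},"exp":%d}' \
  $(( $(date +%s) + 600 )) | b64url)
sig=$(printf '%s.%s' "$header" "$payload" \
  | openssl dgst -sha256 -hmac "$secret" -binary | b64url)

token="$header.$payload.$sig"
echo "https://metabase.example.com/embed/dashboard/$token#bordered=true&titled=true"
```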
Use Cases:
- Quick business dashboards
- Self-service analytics for non-technical users
- IoT data visualization
- Embedded analytics in applications
Redash
Redash is an open-source tool for querying, visualizing, and sharing data.
Key Features:
- Support for 35+ data sources
- SQL-based queries
- Collaborative query editing
- Scheduled queries and alerts
- API for programmatic access
- Dashboard sharing
Docker Setup:
git clone https://github.com/getredash/setup
cd setup
./setup.sh
Docker Compose (Simplified):
version: '3.8'
services:
  server:
    image: redash/redash:latest
    command: server
    depends_on:
      - postgres
      - redis
    ports:
      - "5000:5000"
    environment:
      REDASH_WEB_WORKERS: 4
      REDASH_DATABASE_URL: "postgresql://postgres:password@postgres/postgres"
      REDASH_REDIS_URL: "redis://redis:6379/0"
      REDASH_SECRET_KEY: "your-secret-key-here"
    restart: unless-stopped

  worker:
    image: redash/redash:latest
    command: scheduler
    depends_on:
      - server
    environment:
      REDASH_DATABASE_URL: "postgresql://postgres:password@postgres/postgres"
      REDASH_REDIS_URL: "redis://redis:6379/0"

  postgres:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: password
    volumes:
      - postgres-data:/var/lib/postgresql/data

  redis:
    image: redis:6-alpine

volumes:
  postgres-data:
Access at: http://localhost:5000
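If you use the simplified Compose file instead of setup.sh, Redash's metadata tables must be created once before first login; the Redash Docker documentation uses a one-off run of the server container for this:

```shell
# One-time initialization of Redash's internal Postgres schema.
docker compose run --rm server create_db
```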
Supported Data Sources:
- PostgreSQL, MySQL, MongoDB
- BigQuery, Redshift, Snowflake
- Elasticsearch
- ClickHouse
- REST APIs
- CSV files
- And 25+ more
Use Cases:
- SQL-focused teams
- Multi-source data analysis
- Scheduled reports
- API-driven dashboards
Grafana
Grafana is the leading open-source platform for monitoring and observability.
Key Features:
- Beautiful, flexible dashboards
- Support for 100+ data sources
- Alerting and notifications
- Plugin ecosystem
- Template variables for dynamic dashboards
- JSON-based dashboard definitions
Docker Setup:
docker run -d \
  -p 3000:3000 \
  --name=grafana \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana-oss
Docker Compose:
version: '3.8'
services:
  grafana:
    image: grafana/grafana-oss:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    restart: unless-stopped

volumes:
  grafana-data:
Access at: http://localhost:3000
Default Credentials:
- Username: admin
- Password: admin (change on first login)
Popular Data Sources:
- Prometheus (metrics)
- Loki (logs)
- InfluxDB (time-series)
- Elasticsearch
- PostgreSQL, MySQL
- Tempo (traces)
LGTM Stack:
The LGTM stack (Loki, Grafana, Tempo, Mimir) is a complete observability solution:
- Loki: Log aggregation
- Grafana: Visualization
- Tempo: Distributed tracing
- Prometheus: Metrics (a common stand-in for Mimir)
Docker Compose for LGTM:
version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-data:/var/lib/grafana

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki

  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
    volumes:
      - tempo-data:/var/tempo

volumes:
  grafana-data:
  prometheus-data:
  loki-data:
  tempo-data:
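The Prometheus service above bind-mounts a ./prometheus.yml that you must supply. A minimal config that just scrapes Prometheus itself can be written like this:

```shell
# Create a minimal Prometheus config next to the compose file.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF
```

Add further `scrape_configs` entries as you wire in exporters.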
Dashboard as Code:
Grafana dashboards are defined in JSON:
{
  "dashboard": {
    "title": "My Dashboard",
    "panels": [
      {
        "id": 1,
        "type": "graph",
        "title": "CPU Usage",
        "targets": [
          { "expr": "rate(cpu_usage[5m])" }
        ]
      }
    ]
  }
}
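Because the definition is plain JSON, dashboards can be provisioned programmatically. A sketch using Grafana's dashboard HTTP API (admin:admin credentials and localhost:3000 assumed from the Compose example above):

```shell
# Save a dashboard definition, then push it via Grafana's HTTP API.
cat > dashboard.json <<'EOF'
{
  "dashboard": {
    "title": "My Dashboard",
    "panels": []
  },
  "overwrite": true
}
EOF

# POST fails harmlessly if no Grafana instance is listening.
curl -s -u admin:admin \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/api/dashboards/db \
  -d @dashboard.json || true
```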
Use Cases:
- Infrastructure monitoring
- Application performance monitoring (APM)
- IoT sensor data visualization
- Business metrics
- Log analysis
Specialized Tools
Kibana (ELK Stack)
Kibana is the visualization layer for the Elastic Stack (Elasticsearch, Logstash, Kibana).
Key Features:
- Powerful search and filtering
- Real-time data exploration
- Machine learning capabilities
- Geospatial analysis
- Security analytics
Query Languages:
KQL (Kibana Query Language):
Simple, user-friendly syntax:
viewerID : * and site : "somename" and (HttpPlayerPlaybackEndEvent_assetType : * or HttpPlayerStartEvent_assetType : *)
Elasticsearch DSL (Domain-Specific Language):
JSON-based, more powerful:
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "viewerID": { "value": "*" }
          }
        },
        {
          "term": {
            "site.keyword": { "value": "somename" }
          }
        },
        {
          "bool": {
            "should": [
              {
                "wildcard": {
                  "HttpPlayerPlaybackEndEvent_assetType": { "value": "*" }
                }
              },
              {
                "wildcard": {
                  "HttpPlayerStartEvent_assetType": { "value": "*" }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
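DSL queries like the one above can be sent straight to Elasticsearch's _search endpoint, which is handy for testing outside Kibana. A trimmed sketch; the index name player-events is a hypothetical placeholder:

```shell
# Run a DSL term query against a local Elasticsearch instance.
cat > query.json <<'EOF'
{
  "query": {
    "term": { "site.keyword": { "value": "somename" } }
  }
}
EOF

# curl fails harmlessly if no Elasticsearch is listening on 9200.
curl -s -H "Content-Type: application/json" \
  -X POST "http://localhost:9200/player-events/_search" \
  -d @query.json || true
```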
Relationship:
- Lucene: Core search engine library
- KQL: User-friendly layer for Kibana
- DSL: Most powerful and flexible query language
Use Cases:
- Log analysis
- Security monitoring (SIEM)
- Application performance monitoring
- Business analytics on Elasticsearch data
Chronograf
Chronograf is the visualization component of the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor).
Key Features:
- Real-time visualization of time-series data
- Template-based dashboards
- Alert management
- Data exploration
Use Cases:
- IoT sensor monitoring
- Infrastructure metrics
- Time-series data visualization
Integration:
- Works seamlessly with InfluxDB
- Pre-built templates for common use cases
Comparison & Decision Guide
Feature Comparison
| Tool | Best For | Difficulty | Data Sources | Customization |
|---|---|---|---|---|
| Superset | Enterprise BI | Medium | 40+ | High |
| Metabase | Self-service | Easy | 20+ | Medium |
| Redash | SQL users | Easy | 35+ | Medium |
| Grafana | Monitoring | Medium | 100+ | Very High |
| Kibana | Log analysis | Medium | Elasticsearch | High |
When to Use Which Tool
Choose Superset if:
- You need enterprise-grade BI
- You want rich visualizations
- You have technical users
- You need semantic layers
Choose Metabase if:
- You want simplicity
- You have non-technical users
- You need quick setup
- You want embedded analytics
Choose Redash if:
- Your team knows SQL
- You have multiple data sources
- You need scheduled queries
- You want API access
Choose Grafana if:
- You’re monitoring infrastructure
- You need real-time dashboards
- You use Prometheus/InfluxDB
- You want alerting
Choose Kibana if:
- You use Elasticsearch
- You need log analysis
- You want machine learning
- You need security analytics
Resource Requirements
Minimum Requirements:
| Tool | RAM | CPU | Storage |
|---|---|---|---|
| Superset | 4GB | 2 cores | 10GB |
| Metabase | 2GB | 1 core | 5GB |
| Redash | 2GB | 2 cores | 10GB |
| Grafana | 512MB | 1 core | 1GB |
| Kibana | 2GB | 2 cores | 10GB |
Production Recommendations:
- Superset: 8GB RAM, 4 cores
- Metabase: 4GB RAM, 2 cores
- Redash: 4GB RAM, 2 cores
- Grafana: 2GB RAM, 2 cores
- Kibana: 4GB RAM, 4 cores
Docker Deployment Best Practices
General Guidelines
1. Use Docker Compose:
version: '3.8'
services:
  app:
    image: your-bi-tool:latest
    ports:
      - "3000:3000"
    volumes:
      - app-data:/data
    environment:
      - DB_HOST=database
      - DB_PORT=5432
    restart: unless-stopped
    depends_on:
      - database

  database:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: secure-password
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  app-data:
  db-data:
2. Use Environment Variables:
# .env file
DB_PASSWORD=secure-password
SECRET_KEY=random-secret-key
ADMIN_EMAIL=admin@example.com
3. Persist Data:
Always use volumes for:
- Database data
- Application configuration
- User uploads
- Cache
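Named volumes also make backups straightforward. One common pattern (a sketch, using Metabase's volume from earlier as the example) is a throwaway Alpine container that tars the volume to the current directory:

```shell
# Back up a named Docker volume to a tarball on the host.
docker run --rm \
  -v metabase-data:/data \
  -v "$PWD":/backup \
  alpine tar czf /backup/metabase-data-backup.tar.gz -C /data .
```

Restore by reversing the mounts and extracting into /data.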
4. Networking:
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

services:
  app:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend
5. Security:
- Use secrets for sensitive data
- Run containers as non-root users
- Limit resource usage
- Keep images updated
services:
  app:
    image: app:latest
    user: "1000:1000"
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
Real-World Use Cases
IoT Monitoring
Scenario: Monitor temperature sensors across multiple locations
Stack:
- Data Collection: MQTT → InfluxDB
- Visualization: Grafana
- Alerting: Grafana alerts
Why: Real-time updates, time-series optimization, built-in alerting
Business Analytics
Scenario: Sales dashboard for non-technical users
Stack:
- Data Storage: PostgreSQL
- Visualization: Metabase
- Embedding: Metabase embedded in company portal
Why: User-friendly, no SQL required, easy embedding
Log Analysis
Scenario: Application log monitoring and troubleshooting
Stack:
- Data Collection: Filebeat → Elasticsearch
- Visualization: Kibana
- Alerting: ElastAlert
Why: Full-text search, powerful filtering, machine learning
Multi-Source Analytics
Scenario: Combine data from PostgreSQL, MongoDB, and APIs
Stack:
- Visualization: Redash
- Scheduling: Redash scheduled queries
- Sharing: Redash dashboards
Why: 35+ data source support, SQL-based, API access
Conclusion
Self-hosting BI tools gives you:
- Control: Over your data and infrastructure
- Cost Savings: No per-user fees
- Flexibility: Customize to your needs
- Privacy: Data stays on your servers
Getting Started:
- Start Simple: Begin with Metabase or Grafana
- Understand Your Needs: Match tool to use case
- Use Docker: Simplifies deployment and management
- Plan for Scale: Consider resource requirements
- Secure Your Setup: Use SSL, authentication, and access controls