SQL Ecosystem & Tools
Trino (formerly Presto)
Trino is an open-source, distributed SQL query engine for running interactive queries across multiple data sources.
Key Features:
- Query data where it lives (S3, HDFS, Kafka, MongoDB, etc.)
- Federated queries across diverse sources
- Parallel, distributed execution across worker nodes for speed
- User-defined functions (UDFs) written in Java and deployed as plugins
SHOW CATALOGS;
SHOW TABLES FROM catalog.schema;
SELECT * FROM catalog.schema.table;
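Trino treats a double-quoted string as a single identifier, so each part of a fully qualified name has to be quoted on its own. The sketch below shows one way to build such names in Python and compose a federated query from them. The catalog, schema, table, and column names (`hive`, `mongodb`, `sales.orders`, `crm.customers`) are hypothetical, and the helper functions are illustrative only, not part of any Trino client library.

```python
def quote_identifier(part: str) -> str:
    """Wrap one identifier in double quotes, doubling any embedded quote."""
    return '"' + part.replace('"', '""') + '"'


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build catalog.schema.table with each part quoted separately."""
    return ".".join(quote_identifier(p) for p in (catalog, schema, table))


# Hypothetical catalogs: "hive" over object storage, "mongodb" for documents.
orders = qualified_name("hive", "sales", "orders")
customers = qualified_name("mongodb", "crm", "customers")

# One statement joins tables that live in two different systems.
federated_sql = (
    f"SELECT c.name, sum(o.total) AS revenue\n"
    f"FROM {orders} AS o\n"
    f"JOIN {customers} AS c ON o.customer_id = c.id\n"
    f"GROUP BY c.name"
)

print(orders)
print(federated_sql)
```

The resulting string can be sent through any of the clients listed below. Quoting is only required when a name contains special characters or clashes with a reserved word, but quoting every part separately is always safe.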
Connectors: Kafka, MariaDB, Google Sheets, MongoDB, Druid, Prometheus, HDFS, S3, GCS
Clients: Redash, Superset, Metabase, Grafana, Python, R
- Managed option: Starburst Galaxy
- Docs: trino.io
DuckDB
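An embedded (in-process) OLAP database optimized for analytical queries. It runs inside the host application, with no separate server to manage.
Well suited to local data analysis, including querying files such as CSV and Parquet directly.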
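As a rough illustration of local analysis with DuckDB, the sketch below aggregates a small CSV file from Python. It assumes the third-party `duckdb` package is installed (`pip install duckdb`). The file contents, column names, and function names are made up for the example.

```python
import csv
from pathlib import Path

# Aggregate straight from a CSV file; the path is bound as a parameter.
REVENUE_BY_REGION = """
    SELECT region, sum(amount) AS revenue
    FROM read_csv_auto(?)
    GROUP BY region
    ORDER BY region
"""


def write_sample_csv(path: Path) -> None:
    """Write a tiny, made-up sales file so the sketch has some input."""
    rows = [("region", "amount"), ("north", 10), ("south", 5), ("north", 7)]
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def revenue_by_region(csv_path: Path) -> list:
    """Run the aggregation in-process and return the rows as tuples."""
    # Imported here so the module still loads where duckdb is not installed.
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()  # no argument means an in-memory database
    try:
        return con.execute(REVENUE_BY_REGION, [str(csv_path)]).fetchall()
    finally:
        con.close()
```

Calling `write_sample_csv(Path("sales.csv"))` followed by `revenue_by_region(Path("sales.csv"))` should return one row per region. No server is started at any point: the query runs inside the Python process.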
Hue, a web-based SQL editor, can be started locally with Docker (the UI is served on port 8888):
docker run -d -p 8888:8888 gethue/hue:latest
UIs for DuckDB:
3. Useful Tools for Database Exploration
- ChartDB: A visual database diagram editor (ERDs from a single query).
- DuckDB: An in-process OLAP database (the “SQLite for Analytics”).
- DBeaver / Beekeeper Studio: Universal database managers with excellent GUIs.
- Hue: An open-source SQL assistant for Big Data clusters (Hadoop/Hive).