Data Governance & Quality
Data governance is the practice of managing the availability, usability, integrity, and security of an organization's data.
What is Data Governance?
Data governance establishes:
- Policies — Rules for data handling
- Standards — Consistent definitions and formats
- Processes — Workflows for data management
- Accountability — Clear ownership and responsibilities
Why It Matters
| Driver | Description |
|---|---|
| Data quality | Ensure accuracy, consistency, reliability |
| Compliance | Meet regulatory requirements (GDPR, HIPAA) |
| Common language | Standardized definitions across the organization |
| Trust | Confidence in data for decision-making |
Data Governance Roles
| Role | Responsibility |
|---|---|
| Data Governance Officer | Oversees governance activities, enforces policies |
| Data Steward | Day-to-day data quality and standards |
| Data Owner | Business accountability for data domain |
| Data Quality Analyst | Monitors and improves data quality |
| Data Privacy Officer | Ensures regulatory compliance |
Data Quality
Data quality describes how well data fits its intended purpose. It is usually assessed along several dimensions.
Quality Dimensions
| Dimension | Description | Example Issue |
|---|---|---|
| Accuracy | Correctly represents reality | Wrong phone number |
| Completeness | All required data present | Missing email addresses |
| Consistency | Uniform across systems | “USA” vs “United States” |
| Timeliness | Current and up-to-date | Outdated inventory counts |
| Validity | Conforms to rules/formats | Invalid date format |
| Uniqueness | No duplicates | Same customer twice |
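
Some of these dimensions can be checked mechanically. The sketch below counts issues for completeness, validity, and uniqueness over a few made-up customer records. The field names (`id`, `email`, `signup`) and the rules are assumptions chosen for illustration. Accuracy and timeliness usually need an external reference to compare against, so they are left out.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "signup": "2024-01-15"},
    {"id": 2, "email": "", "signup": "2024-02-30"},
    {"id": 1, "email": "ana@example.com", "signup": "2024-01-15"},
]

def is_valid_date(text):
    """Validity: the value must parse as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

def quality_report(rows):
    """Count issues for three dimensions: completeness, validity, uniqueness."""
    seen_ids = set()
    report = {"missing_email": 0, "invalid_date": 0, "duplicate_id": 0}
    for row in rows:
        if not row["email"]:                    # completeness
            report["missing_email"] += 1
        if not is_valid_date(row["signup"]):    # validity
            report["invalid_date"] += 1
        if row["id"] in seen_ids:               # uniqueness
            report["duplicate_id"] += 1
        seen_ids.add(row["id"])
    return report

print(quality_report(records))
```

Each counter maps to one row of the table, which makes it easy to track a dimension over time or to fail a pipeline run when a count exceeds a threshold.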
Data Profiling
Data profiling examines data to understand its:
- Structure and format
- Distributions and patterns
- Anomalies and errors
- Relationships and dependencies
Goal: Identify quality issues before they impact analytics.
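
As a rough illustration of profiling, the sketch below summarizes a single column: how many values are missing, how many are distinct, and which character patterns occur. The function name `profile_column` and the sample phone numbers are hypothetical. Dedicated profiling tools report far more than this.

```python
from collections import Counter

def shape(value):
    """Replace digits with 9 and letters with A to expose the value's format."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value)
    )

def profile_column(values):
    """Summarize one column: null rate, distinct count, top values, patterns."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
        "patterns": Counter(shape(v) for v in non_null).most_common(3),
    }

# Hypothetical phone column with one missing value and one format anomaly.
phones = ["555-0101", "555-0102", None, "5550103", "555-0101"]
summary = profile_column(phones)
print(summary["patterns"])
```

The pattern counts surface the anomaly directly: most values share one shape, and the value without a hyphen stands out as a second, rarer shape.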
Master Data Management (MDM)
MDM maintains a single, consistent view of critical business entities.
What is Master Data?
Core entities shared across the organization:
- Customers
- Products
- Suppliers
- Locations
- Employees
Why MDM?
| Benefit | Description |
|---|---|
| Single source of truth | One authoritative record |
| Reduced redundancy | Eliminate duplicates |
| Better decisions | Consistent data across systems |
| Operational efficiency | Less time reconciling data |
MDM Types
| Type | Focus | Use Case |
|---|---|---|
| Operational MDM | Transactional systems | Real-time consistency |
| Analytical MDM | Warehouses and analytics | Reporting consistency |
Golden Record
The golden record is the single, trusted version of a master data entity.
Creation process:
- Data enrichment — Add missing information
- Cleansing — Correct errors
- Matching — Identify related records
- Merging — Combine matched records
- Survivorship — Select best values for each attribute
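
The cleansing, matching, merging, and survivorship steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular MDM product works: records are matched on an exact normalized email, and the survivorship rule is "newest non-empty value wins". The record layout and function names are hypothetical, and enrichment is omitted because it depends on an external data source.

```python
from collections import defaultdict

# Hypothetical source records for customers held in two systems.
sources = [
    {"source": "crm", "email": "Ana@Example.com ", "name": "Ana Diaz",
     "phone": None, "updated": "2024-03-01"},
    {"source": "billing", "email": "ana@example.com", "name": "A. Diaz",
     "phone": "555-0101", "updated": "2024-05-10"},
    {"source": "crm", "email": "bo@example.com", "name": "Bo Li",
     "phone": "555-0199", "updated": "2024-01-20"},
]

def cleanse(record):
    """Cleansing: normalize the match key (trim and lowercase the email)."""
    cleaned = dict(record)
    cleaned["email"] = record["email"].strip().lower()
    return cleaned

def match(records):
    """Matching: group records that share the same normalized email."""
    groups = defaultdict(list)
    for record in records:
        groups[record["email"]].append(record)
    return list(groups.values())

def merge(group):
    """Merging with survivorship: the newest non-empty value wins per attribute."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in ("email", "name", "phone"):
        golden[field] = next((r[field] for r in newest_first if r[field]), None)
    golden["sources"] = sorted({r["source"] for r in group})
    return golden

golden_records = [merge(g) for g in match([cleanse(r) for r in sources])]
for record in golden_records:
    print(record)
```

Real matching is rarely an exact key comparison. Fuzzy or probabilistic matching and per-attribute survivorship rules (for example, trusting one source system for addresses) are common refinements.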
MDM Tools
- IBM InfoSphere MDM
- SAP Master Data Governance
- Informatica MDM
- Talend MDM
Enterprise Data Categories
| Category | Description | Examples |
|---|---|---|
| Metadata | Data about data | Schema, lineage, definitions |
| Transactional | Day-to-day operations | Orders, payments, logs |
| Master | Core business entities | Customers, products |
| Reference | Lookup/classification data | Country codes, status codes |
| Unstructured | No predefined format | Documents, emails, images |
Data Classification
Classification based on sensitivity:
| Level | Description | Examples |
|---|---|---|
| Public | Freely shareable | Press releases, public reports |
| Internal | Organization-only | Internal memos, policies |
| Confidential | Restricted access | Financial records, contracts |
| Restricted | Highest protection | PII, PHI, trade secrets |
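
Because the levels are ordered, they map naturally onto an ordered enumeration. The sketch below is one possible encoding. The names `Sensitivity`, `can_access`, and `classify_dataset`, and the rule that a dataset inherits the label of its most sensitive column, are assumptions for illustration.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Classification levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

def can_access(clearance, label):
    """A user may read data labeled at or below their clearance level."""
    return clearance >= label

def classify_dataset(column_labels):
    """Assumed rule: a dataset is as sensitive as its most sensitive column."""
    return max(column_labels.values(), default=Sensitivity.PUBLIC)

# Hypothetical column labels for a patient table.
patient_table = {
    "clinic_name": Sensitivity.PUBLIC,
    "appointment_notes": Sensitivity.CONFIDENTIAL,
    "diagnosis": Sensitivity.RESTRICTED,
}
print(classify_dataset(patient_table).name)
print(can_access(Sensitivity.CONFIDENTIAL, classify_dataset(patient_table)))
```

In practice access usually also depends on role and purpose, so a level comparison like this is a first filter rather than a complete policy.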
Data Protection Regulations
Key Regulations
| Regulation | Jurisdiction | Focus |
|---|---|---|
| GDPR | European Union | Personal data protection, consent |
| HIPAA | United States | Privacy and security of health information |
| CCPA | California, USA | Consumer privacy rights |
Protected Data Types
| Type | Definition | Examples |
|---|---|---|
| PII (Personally Identifiable Information) | Information that can identify an individual, alone or combined with other data | Name, SSN, address, email |
| PHI (Protected Health Information) | Individually identifiable health information, protected under HIPAA | Medical records, diagnoses |
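
One common protective measure is to redact obvious identifiers before text leaves a controlled system. The sketch below masks email addresses and values shaped like US Social Security numbers. The two regular expressions are deliberately simple assumptions and will miss many real-world identifiers (names, addresses, free-text health details), so this is only a starting point.

```python
import re

# Simplified patterns, assumed for illustration; not exhaustive detectors.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    """Replace email addresses and SSN-shaped values with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

print(redact("Contact ana@example.com, SSN 123-45-6789."))
```

Pattern-based redaction catches well-structured identifiers only. Fields known to hold PII or PHI are better protected at the source through access control, encryption, or tokenization.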
GDPR Key Requirements
- Consent — Where processing relies on consent, it must be freely given, specific, informed, and unambiguous (consent is one of several lawful bases)
- Right to access — Individuals can request their data
- Right to erasure — “Right to be forgotten”
- Data portability — Users can transfer their data
- Breach notification — Report qualifying breaches to the supervisory authority within 72 hours of becoming aware of them
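
The 72-hour window is simple date arithmetic, and incident tooling often tracks it explicitly. The sketch below computes the deadline from the moment the organization became aware of a breach. The function names and the sample timestamp are hypothetical, and timestamps are kept in UTC to avoid time-zone ambiguity.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(became_aware_at):
    """Latest time to notify, counted from when the breach became known."""
    return became_aware_at + NOTIFICATION_WINDOW

def hours_remaining(became_aware_at, now):
    """Hours left before the deadline; negative once it has passed."""
    remaining = notification_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600

# Hypothetical incident timestamp.
aware = datetime(2024, 6, 3, 9, 30, tzinfo=timezone.utc)
deadline = notification_deadline(aware)
print(deadline.isoformat())
print(hours_remaining(aware, aware + timedelta(hours=24)))
```

The clock starts when the organization becomes aware of the breach, not when the breach occurred, which is why the function takes the awareness time as its input.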
Entity-Relationship Diagrams (ERDs)
ERDs visualize the structure of a database: its entities, their attributes, and the relationships between them.
Components
| Element | Representation |
|---|---|
| Entity | Rectangle (table) |
| Attribute | Oval or listed inside entity |
| Relationship | Diamond or line with labels |
| Cardinality | Notation showing 1:1, 1:N, N:M |
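
To connect the diagram elements to an actual schema, the sketch below builds a small in-memory SQLite database with Python's standard `sqlite3` module. Entities become tables, attributes become columns, a 1:N relationship becomes a foreign key, and an N:M relationship becomes a junction table. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
-- 1:N: one customer places many orders.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- N:M: orders and products are linked through a junction table.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana Diaz')")
conn.execute("INSERT INTO customer_order VALUES (10, 1)")

# An order that points at a nonexistent customer violates the relationship.
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The composite primary key on `order_item` expresses that a given product appears at most once per order, which is the kind of rule a cardinality annotation on the diagram would capture.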
Common Notations
- UML (Unified Modeling Language)
- Chen — Original ER notation
- Crow’s Foot — Shows cardinality visually
- IDEF1X — Published by NIST as a US federal standard (FIPS PUB 184), since withdrawn
Designing Data Systems
Functional Design
Defines features, functions, and workflows:
- User interface specifications
- Data management requirements
- System interactions
Data Structure Design
Defines data organization and relationships:
- Entity-relationship diagrams
- Data models (conceptual → logical → physical)
- Constraints and validation rules
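
Constraints and validation rules from the data structure design can also be expressed in application code. The sketch below uses a frozen dataclass that rejects invalid values on construction. The `Customer` entity, its fields, and the three rules are hypothetical examples, and the email check is intentionally crude.

```python
from dataclasses import dataclass

# Hypothetical reference data for the allowed status values.
VALID_STATUSES = {"active", "suspended", "closed"}

@dataclass(frozen=True)
class Customer:
    """Logical model of a customer with validation rules enforced on creation."""
    customer_id: int
    email: str
    status: str = "active"

    def __post_init__(self):
        if self.customer_id <= 0:
            raise ValueError("customer_id must be positive")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status}")

valid = Customer(1, "ana@example.com")
print(valid)

try:
    Customer(2, "ana@example.com", status="archived")
except ValueError as error:
    print(error)
```

Rules enforced in code complement database constraints rather than replace them: the database remains the last line of defense when several applications write to the same tables.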
Tools
| Category | Examples |
|---|---|
| ER Modeling | ER/Studio, MySQL Workbench, Visio |
| Data Modeling | PowerDesigner, Toad Data Modeler |
| UML | Visual Paradigm, Enterprise Architect |
Key Takeaways
- Data governance establishes policies, standards, and accountability
- Data quality has six key dimensions: accuracy, completeness, consistency, timeliness, validity, uniqueness
- MDM creates golden records, each a single source of truth for a master data entity
- Classify data by sensitivity — public, internal, confidential, restricted
- Know your regulations — GDPR, HIPAA, CCPA
- PII and PHI require special protection
- ERDs document database structure visually