JAlcocerTech E-books

Data Governance & Quality

Data Governance is the process of managing the availability, usability, integrity, and security of data within an organization.


What is Data Governance?

Data governance establishes:

  • Policies — Rules for data handling
  • Standards — Consistent definitions and formats
  • Processes — Workflows for data management
  • Accountability — Clear ownership and responsibilities

Why It Matters

Driver          | Description
Data quality    | Ensure accuracy, consistency, reliability
Compliance      | Meet regulatory requirements (GDPR, HIPAA)
Common language | Standardized definitions across organization
Trust           | Confidence in data for decision-making

Data Governance Roles

Role                    | Responsibility
Data Governance Officer | Oversees governance activities, enforces policies
Data Steward            | Day-to-day data quality and standards
Data Owner              | Business accountability for data domain
Data Quality Analyst    | Monitors and improves data quality
Data Privacy Officer    | Ensures regulatory compliance

Data Quality

Data quality measures how well data serves its intended purpose.

Quality Dimensions

Dimension    | Description                 | Example Issue
Accuracy     | Correctly represents reality | Wrong phone number
Completeness | All required data present    | Missing email addresses
Consistency  | Uniform across systems       | “USA” vs “United States”
Timeliness   | Current and up-to-date       | Outdated inventory counts
Validity     | Conforms to rules/formats    | Invalid date format
Uniqueness   | No duplicates                | Same customer twice
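
Several of these dimensions can be checked mechanically. The following is a minimal sketch, assuming a small list of customer records; the field names (`id`, `email`, `country`, `signup`) and the sample values are hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "USA", "signup": "2024-01-15"},
    {"id": 2, "email": None, "country": "United States", "signup": "2024-02-30"},
    {"id": 1, "email": "ana@example.com", "country": "USA", "signup": "2024-01-15"},
]

def completeness(rows, field):
    """Completeness: share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def is_valid_iso_date(value):
    """Validity: the value must parse as a real YYYY-MM-DD calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False

def duplicate_ids(rows, key="id"):
    """Uniqueness: return the key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        else:
            seen.add(r[key])
    return dupes

def distinct_values(rows, field):
    """Consistency: several spellings of one concept show up as extra values."""
    return {r[field] for r in rows}

print(completeness(records, "email"))
print([r["signup"] for r in records if not is_valid_iso_date(r["signup"])])
print(duplicate_ids(records))
print(distinct_values(records, "country"))
```

Accuracy and timeliness are harder to test in isolation, since they usually need a trusted reference source or a freshness threshold agreed with the data owner.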

Data Profiling

Data profiling examines data to understand its:

  • Structure and format
  • Distributions and patterns
  • Anomalies and errors
  • Relationships and dependencies

Goal: Identify quality issues before they impact analytics.
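
As a rough illustration of profiling, the sketch below summarizes a single column: how many values it holds, how many are null, how many are distinct, and which Python types appear. The function name `profile_column` and the sample data are assumptions made for this example, not part of any particular profiling tool.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: size, nulls, distinct values, types, top values."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": dict(Counter(type(v).__name__ for v in non_null)),
        "most_common": Counter(non_null).most_common(2),
    }

# Hypothetical inventory quantities, including a null and a mistyped value.
quantities = [10, 12, None, "12", 10, 10]
report = profile_column(quantities)

for metric, value in report.items():
    print(f"{metric}: {value}")
```

A mixed `types` entry (here both `int` and `str`) is a typical anomaly that profiling surfaces before the data reaches a report.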


Master Data Management (MDM)

MDM maintains a single, consistent view of critical business entities.

What is Master Data?

Core entities shared across the organization:

  • Customers
  • Products
  • Suppliers
  • Locations
  • Employees

Why MDM?

Benefit                | Description
Single source of truth | One authoritative record
Reduced redundancy     | Eliminate duplicates
Better decisions       | Consistent data across systems
Operational efficiency | Less time reconciling data

MDM Types

Type            | Focus                    | Use Case
Operational MDM | Transactional systems    | Real-time consistency
Analytical MDM  | Warehouses and analytics | Reporting consistency

Golden Record

The golden record is the single, trusted version of a master data entity.

Creation process:

  1. Data enrichment — Add missing information
  2. Cleansing — Correct errors
  3. Matching — Identify related records
  4. Merging — Combine matched records
  5. Survivorship — Select best values for each attribute
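
The cleansing, matching, merging, and survivorship steps above can be sketched in a few lines. This is a simplified illustration, not how any MDM product works: it assumes exact matching on a normalized email and a "newest non-empty value wins" survivorship rule, and it skips enrichment because that step depends on an external data source. All record fields and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records for the same customers held in two source systems.
source_records = [
    {"source": "crm", "email": " Ana@Example.com ", "name": "Ana Diaz",
     "phone": None, "updated": "2024-03-01"},
    {"source": "billing", "email": "ana@example.com", "name": "A. Diaz",
     "phone": "555-0100", "updated": "2024-05-10"},
    {"source": "crm", "email": "bo@example.com", "name": "Bo Chen",
     "phone": "555-0199", "updated": "2024-01-20"},
]

def cleanse(record):
    """Cleansing: normalize the email that serves as the match key."""
    cleaned = dict(record)
    cleaned["email"] = record["email"].strip().lower()
    return cleaned

def match(records):
    """Matching: group records that share the same normalized email."""
    groups = defaultdict(list)
    for r in records:
        groups[r["email"]].append(r)
    return list(groups.values())

def survive(group):
    """Merging and survivorship: per attribute, keep the newest non-empty value."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in ("email", "name", "phone"):
        golden[field] = next((r[field] for r in newest_first if r[field]), None)
    golden["sources"] = sorted({r["source"] for r in group})
    return golden

golden_records = [survive(g) for g in match([cleanse(r) for r in source_records])]
for record in golden_records:
    print(record)
```

Real matching is usually fuzzy (names, addresses, phonetic keys), and survivorship rules often rank sources by trust rather than by recency alone.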

MDM Tools

  • IBM InfoSphere MDM
  • SAP Master Data Governance
  • Informatica MDM
  • Talend MDM

Enterprise Data Categories

Category      | Description                | Examples
Metadata      | Data about data            | Schema, lineage, definitions
Transactional | Day-to-day operations      | Orders, payments, logs
Master        | Core business entities     | Customers, products
Reference     | Lookup/classification data | Country codes, status codes
Unstructured  | No predefined format       | Documents, emails, images

Data Classification

Classification based on sensitivity:

Level        | Description        | Examples
Public       | Freely shareable   | Press releases, public reports
Internal     | Organization-only  | Internal memos, policies
Confidential | Restricted access  | Financial records, contracts
Restricted   | Highest protection | PII, PHI, trade secrets
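
One way to make these levels enforceable is to treat them as an ordered scale. The sketch below assumes a simple linear hierarchy in which a clearance grants access to its own level and everything below it; real access control usually also considers role and need-to-know. The dataset names and the `can_access` helper are hypothetical.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """The four levels above, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

# Hypothetical mapping of datasets to classification levels.
DATASET_LEVELS = {
    "press_releases": Sensitivity.PUBLIC,
    "internal_memos": Sensitivity.INTERNAL,
    "contracts": Sensitivity.CONFIDENTIAL,
    "patient_records": Sensitivity.RESTRICTED,
}

def can_access(clearance, dataset):
    """Allow access only when the clearance is at or above the dataset's level.

    Unclassified datasets are treated as RESTRICTED (fail closed).
    """
    level = DATASET_LEVELS.get(dataset, Sensitivity.RESTRICTED)
    return clearance >= level

print(can_access(Sensitivity.INTERNAL, "press_releases"))
print(can_access(Sensitivity.INTERNAL, "contracts"))
print(can_access(Sensitivity.CONFIDENTIAL, "unlabelled_export"))
```

Defaulting unknown datasets to the highest level is a deliberate choice here: data that has not been classified yet is safer locked down than exposed.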

Data Protection Regulations

Key Regulations

Regulation | Jurisdiction    | Focus
GDPR       | European Union  | Personal data protection, consent
HIPAA      | United States   | Healthcare information
CCPA       | California, USA | Consumer privacy rights

Protected Data Types

Type                                      | Definition                 | Examples
PII (Personally Identifiable Information) | Identifies an individual   | Name, SSN, address, email
PHI (Protected Health Information)        | Health-related information | Medical records, diagnoses
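
A common protective measure is to mask PII before text leaves a controlled system. The following sketch assumes two simple patterns, email addresses and numbers shaped like US Social Security numbers; the regular expressions are deliberately simplified, will miss some real-world formats, and are not a substitute for a dedicated detection tool.

```python
import re

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text):
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)

sample = "Contact ana@example.com, SSN 123-45-6789."
masked = mask_pii(sample)
print(masked)  # Contact [EMAIL], SSN [SSN].
```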

GDPR Key Requirements

  • Consent — Explicit permission for data processing
  • Right to access — Individuals can request their data
  • Right to erasure — “Right to be forgotten”
  • Data portability — Users can transfer their data
  • Breach notification — 72-hour reporting requirement
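
The 72-hour breach notification window lends itself to a small worked example. The sketch below assumes the clock starts when the organization becomes aware of the breach and uses timezone-aware UTC timestamps; the function names and the sample date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(aware_at):
    """Latest time to report, counted from when the breach became known."""
    return aware_at + NOTIFICATION_WINDOW

def hours_remaining(aware_at, now):
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600

# Hypothetical incident: breach discovered on 3 June 2024 at 09:30 UTC.
aware_at = datetime(2024, 6, 3, 9, 30, tzinfo=timezone.utc)
deadline = notification_deadline(aware_at)

print(deadline.isoformat())  # 2024-06-06T09:30:00+00:00
print(hours_remaining(aware_at, aware_at + timedelta(hours=24)))  # 48.0
```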

Entity-Relationship Diagrams (ERDs)

ERDs visualize database structure.

Components

Element      | Representation
Entity       | Rectangle (table)
Attribute    | Oval or listed inside entity
Relationship | Diamond or line with labels
Cardinality  | Notation showing 1:1, 1:N, N:M
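
To show how these diagram elements map onto an actual database, the sketch below builds a small schema with Python's built-in `sqlite3` module. Entities become tables, attributes become columns, a 1:N relationship becomes a foreign key, and an N:M relationship becomes a junction table. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- 1:N: one customer places many orders.
CREATE TABLE purchase_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- N:M: orders and products are linked through a junction table.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES purchase_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (order_id, product_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.executemany("INSERT INTO purchase_order VALUES (?, 1)", [(10,), (11,)])
order_count = conn.execute(
    "SELECT COUNT(*) FROM purchase_order WHERE customer_id = 1"
).fetchone()[0]
print(order_count)  # 2

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase_order VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```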

Common Notations

  • UML (Unified Modeling Language)
  • Chen — Original ER notation
  • Crow’s Foot — Shows cardinality visually
  • IDEF1X — NIST standard

Designing Data Systems

Functional Design

Defines features, functions, and workflows:

  • User interface specifications
  • Data management requirements
  • System interactions

Data Structure Design

Defines data organization and relationships:

  • Entity-relationship diagrams
  • Data models (conceptual → logical → physical)
  • Constraints and validation rules
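
Constraints and validation rules can be declared in the physical model so the database itself rejects bad data. The following is a minimal sketch using Python's built-in `sqlite3` module; the `employee` table, its columns, and the allowed status values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    status      TEXT NOT NULL CHECK (status IN ('active', 'inactive')),
    salary      REAL CHECK (salary >= 0)
)
""")

def try_insert(row):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "ana@example.com", "active", 50000)),   # valid row
    try_insert((2, "ana@example.com", "active", 42000)),   # duplicate email
    try_insert((3, "bo@example.com", "retired", 42000)),   # status not allowed
    try_insert((4, "cy@example.com", "active", -1)),       # negative salary
]
print(results)  # [True, False, False, False]
```

Declaring the rules in the schema means every application writing to the table is held to the same standard, rather than each one re-implementing the checks.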

Tools

Category      | Examples
ER Modeling   | ER/Studio, MySQL Workbench, Visio
Data Modeling | PowerDesigner, Toad Data Modeler
UML           | Visual Paradigm, Enterprise Architect

Key Takeaways

  1. Data governance establishes policies, standards, and accountability
  2. Data quality has six key dimensions: accuracy, completeness, consistency, timeliness, validity, uniqueness
  3. MDM creates golden records — single source of truth for master data
  4. Classify data by sensitivity — public, internal, confidential, restricted
  5. Know your regulations — GDPR, HIPAA, CCPA
  6. PII and PHI require special protection
  7. ERDs document database structure visually