JAlcocerTech E-books

Advanced: Encoding and Evolution

Systems rarely upgrade all at once. Old and new versions of code must often coexist, so data formats need to support gradual change.

Encoding

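Encoding (also called serialization) turns in-memory objects into bytes for files, messages, or network calls; decoding reverses the process. Common families include textual formats such as JSON and XML, binary variants of those formats, and schema-based binary formats such as Protocol Buffers, Thrift, and Avro.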

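Human-readable formats are easy to inspect but verbose, and they can be ambiguous about types: JSON, for example, does not distinguish integers from floating-point numbers and has no native type for binary strings. Binary formats are compact and efficient, but depend more heavily on schemas and tooling.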
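The trade-off can be seen by encoding the same record both ways. This is a minimal sketch using only the Python standard library; the record, its field names, and the `LAYOUT` string are hypothetical and chosen for illustration.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"user_id": 1337, "score": 98.5, "active": True}

# Text encoding: self-describing, so field names travel with every record.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary encoding: "<Id?" means little-endian uint32, float64, bool.
# The layout acts as an implicit schema that writer and reader must
# agree on out of band; nothing in the bytes names the fields.
LAYOUT = "<Id?"
binary_bytes = struct.pack(
    LAYOUT, record["user_id"], record["score"], record["active"]
)


def decode_binary(data: bytes) -> dict:
    """Decode bytes written with LAYOUT back into a dict."""
    user_id, score, active = struct.unpack(LAYOUT, data)
    return {"user_id": user_id, "score": score, "active": active}
```

The fixed layout takes 13 bytes and the JSON text is several times larger, but the binary bytes are meaningless to a reader that does not know `LAYOUT`.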

Backward and Forward Compatibility

Backward compatibility means new code can read old data. Forward compatibility means old code can read new data. Real deployments often need both because rolling upgrades, retries, queues, caches, and stored records mix versions.
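As a small illustration of both directions, assume a JSON payload where version 2 of the writer adds an optional `email` field. The function names and fields below are hypothetical.

```python
import json


def write_v1(name: str) -> str:
    """Old writer: knows only the 'name' field."""
    return json.dumps({"name": name})


def write_v2(name: str, email: str) -> str:
    """New writer: adds an optional 'email' field."""
    return json.dumps({"name": name, "email": email})


def read_v1(payload: str) -> dict:
    """Old reader: picks out the fields it knows and ignores the rest.
    This is what makes it forward compatible with v2 data."""
    data = json.loads(payload)
    return {"name": data["name"]}


def read_v2(payload: str) -> dict:
    """New reader: supplies a default when 'email' is absent.
    This is what makes it backward compatible with v1 data."""
    data = json.loads(payload)
    return {"name": data["name"], "email": data.get("email")}


# Backward compatibility: new code reads old data.
new_reads_old = read_v2(write_v1("Ada"))

# Forward compatibility: old code reads new data.
old_reads_new = read_v1(write_v2("Ada", "ada@example.com"))
```

Had version 2 made `email` required, `read_v2` would fail on every record written by `write_v1`.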

Compatible schema evolution usually means:

  • adding optional fields rather than required fields
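  • preserving field identifiers (tag numbers or field names) once they are in use, and never reusing them for a different meaning
  • avoiding changes that reinterpret existing data
  • maintaining defaults so a reader can fill in fields that are missing from data written by another version
  • ignoring unknown fields rather than rejecting them, and preserving them when data is read and written back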
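The rules above can be sketched with a toy tag-based format. This is not the wire format of Protocol Buffers, Thrift, or Avro; the "wire" here is just a dict keyed by tag number, and the schemas, `Record`, `decode`, and `encode` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Any

# Schema: tag number -> (field name, default). Tags are never reused.
SCHEMA_V1 = {1: ("name", ""), 2: ("age", 0)}
SCHEMA_V2 = {1: ("name", ""), 2: ("age", 0), 3: ("email", None)}


@dataclass
class Record:
    fields: dict[str, Any]
    # Tags this reader's schema does not know, kept so they survive a rewrite.
    unknown: dict[int, Any] = field(default_factory=dict)


def decode(wire: dict[int, Any], schema: dict) -> Record:
    """Map known tags to names, fill defaults, and keep unknown tags."""
    fields = {
        name: wire.get(tag, default) for tag, (name, default) in schema.items()
    }
    unknown = {tag: value for tag, value in wire.items() if tag not in schema}
    return Record(fields, unknown)


def encode(record: Record, schema: dict) -> dict[int, Any]:
    """Write known fields by tag, then put unknown tags back untouched."""
    wire = {tag: record.fields[name] for tag, (name, _) in schema.items()}
    wire.update(record.unknown)
    return wire


# An old (v1) service updates a record written by a new (v2) service.
new_wire = {1: "Ada", 2: 36, 3: "ada@example.com"}
old_view = decode(new_wire, SCHEMA_V1)
old_view.fields["age"] = 37
rewritten = encode(old_view, SCHEMA_V1)

# A new (v2) reader fills defaults for fields missing from old data.
from_old = decode({1: "Bob"}, SCHEMA_V2)
```

Because `decode` keeps unknown tags, the old service updates `age` without silently dropping the `email` value it does not understand.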

Dataflow Through Systems

Data moves through databases, services, and asynchronous messages. Each path has different evolution pressure.

In databases, old records may remain for years. In service calls, compatibility matters during rolling deploys. In queues and logs, messages may be consumed long after they are produced.
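For long-lived stored records, one common approach is to upgrade each record when it is read instead of rewriting the whole dataset. The sketch below assumes each record carries a `schema_version` field (records without one are treated as version 1); the upgrade functions and field names are hypothetical.

```python
CURRENT_VERSION = 3


def v1_to_v2(rec: dict) -> dict:
    """v2 added an optional 'email' field."""
    return {**rec, "schema_version": 2, "email": None}


def v2_to_v3(rec: dict) -> dict:
    """v3 added an optional 'tags' field."""
    return {**rec, "schema_version": 3, "tags": []}


# One upgrade step per stored version, applied in sequence.
UPGRADES = {1: v1_to_v2, 2: v2_to_v3}


def upgrade_on_read(rec: dict) -> dict:
    """Bring a stored record up to CURRENT_VERSION without mutating it."""
    rec = dict(rec)
    while rec.get("schema_version", 1) < CURRENT_VERSION:
        step = UPGRADES[rec.get("schema_version", 1)]
        rec = step(rec)
    return rec


# A record written years ago, before versions were tracked.
old_row = {"name": "Ada"}
upgraded = upgrade_on_read(old_row)
```

Each step only adds fields with defaults, so application code sees one shape even though the storage holds several.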

RPC and Messaging

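Remote procedure calls can make a network call look like a local function call, but the two behave differently. A network call has variable latency and can time out or fail partway, leaving the caller unsure whether the request was processed. Retrying it may then deliver a duplicate request.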
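The duplicate-request problem can be sketched without a real network. In the simulation below the server applies the first request but the response is lost, so the client retries. `FakeServer`, `TransientError`, and `call_with_retries` are hypothetical names, and deduplicating by request ID is one possible technique, not the only one.

```python
import time
import uuid


class TransientError(Exception):
    """Stands in for a timeout or dropped connection."""


class FakeServer:
    """Simulated remote service that loses its first response."""

    def __init__(self):
        self.balance = 0
        self.calls = 0
        self._results = {}  # request_id -> result already computed

    def deposit(self, request_id: str, amount: int) -> int:
        self.calls += 1
        # Apply the effect only once per request_id (idempotent handler).
        if request_id not in self._results:
            self.balance += amount
            self._results[request_id] = self.balance
        if self.calls == 1:
            # The work was done, but the caller never hears about it.
            raise TransientError("response lost")
        return self._results[request_id]


def call_with_retries(send, payload, attempts=3, base_delay=0.01):
    """Retry with exponential backoff, reusing one request ID throughout."""
    request_id = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return send(request_id, payload)
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


server = FakeServer()
result = call_with_retries(server.deposit, 100)
```

The server receives two calls but the balance is 100, not 200, because the retry carries the same request ID as the original attempt.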

Message passing makes time and failure more explicit. It can decouple producers from consumers, absorb bursts, and support replay, but it introduces ordering, idempotency, and delivery semantics as design concerns.
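Decoupling and replay can be illustrated with an in-memory append-only log in which each consumer tracks its own offset. This is a simplified sketch, not the API of any real broker; `Log` and `Consumer` are hypothetical classes.

```python
class Log:
    """Append-only message log; entries are never modified or removed."""

    def __init__(self):
        self._entries = []

    def append(self, message: dict) -> int:
        """Store a message and return its offset."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read_from(self, offset: int):
        """Yield (offset, message) pairs starting at the given offset."""
        return enumerate(self._entries[offset:], start=offset)


class Consumer:
    """Processes messages at its own pace and remembers where it stopped."""

    def __init__(self, log: Log, start_offset: int = 0):
        self.log = log
        self.offset = start_offset
        self.total = 0

    def poll(self) -> None:
        for offset, message in self.log.read_from(self.offset):
            self.total += message["amount"]
            self.offset = offset + 1


log = Log()

# The producer writes a burst while no consumer is running.
for amount in (5, 10, 20):
    log.append({"amount": amount})

# A consumer catches up later, then picks up a message that arrives afterwards.
live = Consumer(log)
live.poll()
log.append({"amount": 1})
live.poll()

# Replay: a new consumer rebuilds the same state from offset 0.
replayed = Consumer(log, start_offset=0)
replayed.poll()
```

The producer never waits for a consumer, and a consumer added later can reconstruct state from the start of the log. Messages written long ago are still read, which is why the encoding of old messages must remain decodable.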

Practical Takeaway

Design data formats as long-lived contracts:

  • version schemas deliberately
  • test old/new producer and consumer combinations
  • make consumers tolerant of unknown fields
  • use idempotent handlers for retried messages
  • avoid pretending that network calls behave like function calls
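Testing producer and consumer combinations can be as simple as running every writer version against every reader version. The sketch below assumes JSON payloads and two hypothetical versions; in a real project the writers and readers would come from the released code of each version.

```python
import itertools
import json

# Hypothetical writers, one per released version.
WRITERS = {
    "v1": lambda: json.dumps({"id": 1, "name": "Ada"}),
    "v2": lambda: json.dumps(
        {"id": 1, "name": "Ada", "email": "ada@example.com"}
    ),
}


def read_v1(payload: str) -> dict:
    data = json.loads(payload)
    return {"id": data["id"], "name": data["name"]}


def read_v2(payload: str) -> dict:
    data = json.loads(payload)
    return {"id": data["id"], "name": data["name"], "email": data.get("email")}


# Hypothetical readers, one per released version.
READERS = {"v1": read_v1, "v2": read_v2}


def check_matrix() -> list:
    """Run every writer against every reader and collect failures."""
    failures = []
    pairs = itertools.product(WRITERS.items(), READERS.items())
    for (writer_name, write), (reader_name, read) in pairs:
        try:
            result = read(write())
            # Fields shared by all versions must survive every combination.
            if result["id"] != 1 or result["name"] != "Ada":
                failures.append((writer_name, reader_name, "core fields changed"))
        except Exception as exc:
            failures.append((writer_name, reader_name, repr(exc)))
    return failures


failures = check_matrix()
```

An empty `failures` list means all four combinations decode; adding a third version grows the matrix to nine pairs without changing `check_matrix`.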