The Organization of Digital Information Systems

"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point." (Claude Shannon)

The Information Age has revolutionized the economy, politics, and culture.

Digital Information Systems (DIS) store, transmit, and transform data with incredible efficiency.

The challenge: make sense of a vast and rapidly growing amount of information.

DIS are digital, interconnected, and concerned with information.

We interact with DIS through apps, webpages, and programs.

(And for that, we need computers and phones)

The purpose of this treatise: Understand DIS systematically.

My hypothesis: the methods presented here yield a 10x speedup and a 2-4x improvement in quality and value.

My goal: to make DIS simpler, so that more people are empowered to bring their best.

What is information? Why do we care about DIS?

Information = Data + Context. Understanding transforms data into information.

(From now on, we use "data" and "information" interchangeably)

DIS only do three things:

  1. communicate data
  2. store data
  3. transform data

Speed, accuracy, and cost set digital systems apart from their predecessors.

Digital systems represent data with electrons, not carvings or ink.

The challenge: organize vast amounts of information. Organization is the key.

Most DIS challenges stem from understanding how parts interrelate.

Simple systems are easier to understand than complex ones.

The art of system design is making systems as simple as possible.

(The limiting factor is complexity, not energy)

The main thesis: to understand the system, focus on the data.

When designing or understanding, focus on the data, not The List.

(The painting, not the palette and brushes!)

The List

  • Programming languages
  • Programming paradigms
  • Type systems
  • Libraries
  • Communication protocols
  • Operating systems
  • System architectures
  • Performance
  • Availability
  • Cost
  • AI

Why not look at the data?

  1. There's too much data, we cannot look directly at it.
  2. Data is just a detail, it's not important.

(This entire treatise topples these two myths)

We can build on five pillars, each of them a practical concept that removes a major obstacle to looking at the data.

The five pillars

Pillar 1: Single representation of data

Overcomes the inability to look at data and describe it in unambiguous terms.

Pillar 2: Single dataspace

Overcomes having parts of the system floating around separately instead of forming one whole picture.

Pillar 3: Call and response

Overcomes the invisibility of how data is transformed inside a DIS.

Pillar 4: Logic is what happens between call and response

Overcomes doubts about the shape of the solution for a clearly specified problem.

Pillar 5: Interface is call and response

Overcomes separateness between system and user and between data and time.

Pillar 1: Single Representation of Data

Digital data is binary. We need to find a better way to represent it than zeroes and ones.

Introducing fourdata

A textual representation for all data.

(Why text? Because it is linear, compact and portable)

Fourdata represents four types of data:

  1. Number:

    1234
  2. Text:

    Hi
  3. List:

  4. Hash:

Data types can be combined and nested

And can represent data as diverse as an HTTP call

Or the state of a CPU

Or a row in a database

Or a simple web page

Fourdata can represent any conceivable data directly and without ambiguity, just using text and a few rules.
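As a minimal sketch of the "four types, combined and nested" idea, the types can be modeled with Python's own values: numbers as `int`/`float`, text as `str`, lists as `list`, and hashes as `dict`, assuming hash keys are texts. The function `is_fourdata` is hypothetical and not part of any fourdata tooling. It only checks that a nested value is built from those four types and says nothing about fourdata's textual syntax.

```python
def is_fourdata(value) -> bool:
    """Return True if `value` is built only from the four fourdata types.

    Assumed mapping (hypothetical): number -> int/float, text -> str,
    list -> list, hash -> dict whose keys are texts.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as outside the four types
        return False
    if isinstance(value, (int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_fourdata(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(key, str) and is_fourdata(item) for key, item in value.items())
    return False


# A hypothetical HTTP request, described with nested numbers, texts, lists and hashes
request = {
    "method": "GET",
    "path": "/breakfast",
    "headers": {"accept": "text/plain"},
    "retries": 3,
    "tags": ["morning", "food"],
}
print(is_fourdata(request))         # True
print(is_fourdata({"when": None}))  # False: None is not one of the four types
```

Anything that does not fit the four types, such as `None` in this sketch, has to be re-expressed in terms of them before it can enter the representation.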

Pillar 2: Single dataspace

Every piece of data in our system has a path to it.

A path is a sequence of texts and numbers.

The path to eggs is breakfast 2.
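A minimal sketch of following such a path through nested data follows. It assumes hashes are Python dicts and lists are Python lists, and it treats numbers in a path as 1-based list positions, as "breakfast 2" suggests. The 1-based indexing, the contents of `breakfast`, and the helper `get_path` are all hypothetical.

```python
from typing import Union

Step = Union[str, int]


def get_path(data, path: list[Step]):
    """Follow a path (a list of texts and numbers) into nested data.

    Assumptions: texts index hashes (dicts); numbers index lists, 1-based.
    Returns None if the path leads nowhere.
    """
    current = data
    for step in path:
        if isinstance(step, str) and isinstance(current, dict):
            if step not in current:
                return None
            current = current[step]
        elif isinstance(step, int) and isinstance(current, list):
            if not 1 <= step <= len(current):
                return None
            current = current[step - 1]  # convert 1-based to Python's 0-based
        else:
            return None
    return current


# Hypothetical dataspace
dataspace = {"breakfast": ["coffee", "eggs", "toast"]}
print(get_path(dataspace, ["breakfast", 2]))  # eggs
```

The path `["breakfast", 2]` is itself an ordinary list of a text and a number, so it can be stored in the dataspace like any other value.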

Paths are themselves data because they consist of numbers and texts.

Every data point in our system has a path to it.

Paths don't just point to data; they are the data!

Paths make places memorable, associative, even permanent.

To the left, there is context. To the right, detail.

Every DIS stores its data in two primary forms: files and databases.

Both can be placed in the dataspace.

For files, put the path to the file as a hash, followed by its content:

Files can also be represented in binary format.
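The following is a rough sketch of placing files into a dataspace. Each file path is split on `/` into a sequence of texts, and the file's content sits at the end of that path in nested hashes. To keep the sketch self-contained, file contents are passed in as a dict instead of being read from disk. Content that decodes as UTF-8 becomes text. Anything else is kept as Python `bytes`, which stands in for the binary format and is an assumption outside the four types. `files_to_dataspace` is a hypothetical helper.

```python
def files_to_dataspace(files: dict[str, bytes]) -> dict:
    """Nest each file under its path segments, e.g. "docs/a.txt" -> {"docs": {"a.txt": ...}}."""
    dataspace: dict = {}
    for file_path, raw in files.items():
        *dirs, name = file_path.strip("/").split("/")
        node = dataspace
        for directory in dirs:
            node = node.setdefault(directory, {})  # create intermediate hashes as needed
        try:
            node[name] = raw.decode("utf-8")  # text content
        except UnicodeDecodeError:
            node[name] = raw  # keep undecodable content as binary
    return dataspace


files = {
    "docs/readme.txt": b"Hello",
    "docs/img/logo.bin": b"\xff\xfe\x00",
    "notes.txt": b"eggs",
}
print(files_to_dataspace(files))
```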

For databases:

  • Use a hash to represent the database name and the table/collection name
  • Use a list of hashes, one for each row/document
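Here is a minimal sketch of the same layout for a database, using Python's built-in `sqlite3` module with an in-memory database. The database name `shop`, the `orders` table, and the helper `table_to_dataspace` are hypothetical. The helper reads every row of a table into a list of hashes (dicts), nested under the database name and the table name.

```python
import sqlite3


def table_to_dataspace(conn: sqlite3.Connection, db_name: str, table: str) -> dict:
    """Return {db_name: {table: [row_hash, ...]}} for every row in the table."""
    conn.row_factory = sqlite3.Row  # rows expose their column names
    # The table name is interpolated into the SQL, so only pass trusted names here
    rows = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid').fetchall()
    return {db_name: {table: [dict(row) for row in rows]}}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "eggs", 12), (2, "toast", 2)])
print(table_to_dataspace(conn, "shop", "orders"))
```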

The dataspace is not where the data is.

The dataspace is the data.

Pillar 3: Call and response

The combination of a call and a response can be used to express any data transformation.

The notation of a call:

  • @ denotes a call
  • = denotes its response
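To make the notation concrete in Python, a call can be modeled as a hash that holds the call under `@`. Once the call is evaluated, its response goes under `=`. This encoding is hypothetical and only for illustration. `DESTINATIONS` and `respond` are not part of fourdata, and plain Python functions stand in for the logic behind each destination.

```python
from typing import Any, Callable

# Hypothetical destinations a call can reach, keyed by name
DESTINATIONS: dict[str, Callable[..., Any]] = {
    "add": lambda *nums: sum(nums),
    "upper": lambda text: text.upper(),
}


def respond(call: dict) -> dict:
    """Given {"@": [destination, *message]}, return the call with its response under "="."""
    destination, *message = call["@"]
    response = DESTINATIONS[destination](*message)
    return {"@": call["@"], "=": response}


print(respond({"@": ["add", 1, 2, 3]}))  # {'@': ['add', 1, 2, 3], '=': 6}
print(respond({"@": ["upper", "hi"]}))   # {'@': ['upper', 'hi'], '=': 'HI'}
```

Because the call and its response are kept together in one value, the transformation stays inspectable as data.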

A reference to a variable:

A function call:

A database query:

An HTTP call:

An assembler instruction:

Call and response represent the dynamic nature of data while still being data.

Pillar 4: Logic is what happens between call and response

Logic is how a call creates a response.

(Logic is the intentional transformation of data)

The five elements of logic:

  1. Reference: destination of a call.
  2. Sequence: calls made by a call.
  3. Conditional: choice between sequences.
  4. Loop: conditional repetition of a sequence.
  5. Error: special type of conditional response.

The first three are essential, the last two are nice to have.

A reference is the destination of a call.

References are links between parts of the dataspace.

A reference can point to a mere value:

It can also point to a call:

Resolving a reference is finding what part of the dataspace it refers to.

Here is one way to do it:

  • Starting from the place where the call is made, go one level up (to the left) and look for the reference there. If it is not there, keep going up one level at a time.
  • If we reach the leftmost level without finding it, the reference resolves to an empty text.
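The lookup just described can be sketched in Python. The sketch assumes the dataspace is nested dicts and that the place where the call is made is given as a path of texts. `resolve` is a hypothetical helper. It checks the enclosing hashes one level at a time, from the innermost outward, and returns an empty text when no level defines the name. The `"@ eggs"` values are just placeholder texts marking where calls would sit.

```python
def resolve(dataspace: dict, call_path: list[str], name: str):
    """Resolve a reference by searching enclosing hashes from the call outward.

    call_path is where the call lives, e.g. ["recipes", "omelette", "cook"].
    Returns "" (empty text) if no enclosing level defines `name`.
    """
    # Try prefixes from longest to shortest: one level up, then further left
    for depth in range(len(call_path) - 1, -1, -1):
        scope = dataspace
        for step in call_path[:depth]:
            scope = scope.get(step, {}) if isinstance(scope, dict) else {}
        if isinstance(scope, dict) and name in scope:
            return scope[name]
    return ""


dataspace = {
    "eggs": 2,
    "recipes": {
        "eggs": 3,
        "omelette": {"cook": "@ eggs"},
        "pancake": {"eggs": 1, "cook": "@ eggs"},
    },
}
print(resolve(dataspace, ["recipes", "omelette", "cook"], "eggs"))        # 3
print(resolve(dataspace, ["recipes", "pancake", "cook"], "eggs"))         # 1
print(repr(resolve(dataspace, ["recipes", "omelette", "cook"], "milk")))  # ''
```

The nearest definition wins. `pancake` sees its own `eggs`, while `omelette` falls back to the one defined in `recipes`.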

A sequence is a list of calls.

The concept of sequence works at any level of abstraction.

Function, flow, procedure, operation, definition: all express the same thing, a list of calls.

A good analogy for a sequence is a recipe:

The colon (:) freezes the sequence so that it can be expanded only when it is called.

When we call a sequence, we can see its expansion:

Note that the colon (:) now contains the expansion of the calls.
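A rough Python analogue of a frozen sequence follows. The sequence is stored as a list of calls, here pairs of a function and its arguments, and nothing runs until it is called. Calling it returns its expansion, with each call paired with its response. The recipe steps and the helper `call_sequence` are hypothetical.

```python
from typing import Any, Callable

Call = tuple[Callable[..., Any], tuple]


def call_sequence(sequence: list[Call]) -> list[tuple[str, tuple, Any]]:
    """Expand a frozen sequence: run each call in order and record its response."""
    expansion = []
    for function, args in sequence:
        expansion.append((function.__name__, args, function(*args)))
    return expansion


def crack(n: int) -> str:
    return f"{n} eggs cracked"


def whisk(seconds: int) -> str:
    return f"whisked for {seconds}s"


# Defining the sequence runs nothing: it is just data (a list of calls)
omelette = [(crack, (2,)), (whisk, (30,))]

for name, args, response in call_sequence(omelette):
    print(name, args, "=", response)
```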

Conditionals let you choose between sequences based on a condition:

The simplest conditional only has one sequence, which will only be expanded if the condition is true.

An example with two sequences:
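In Python terms, a conditional picks which sequence to expand, and the sequence it does not choose never runs. Below is a minimal sketch in which the sequences are zero-argument functions that return lists of steps. The helper `conditional` and the steps are hypothetical.

```python
from typing import Callable, Optional


def conditional(
    condition: bool,
    if_true: Callable[[], list],
    if_false: Optional[Callable[[], list]] = None,
) -> list:
    """Expand only the chosen sequence; with one sequence and a false condition, expand nothing."""
    if condition:
        return if_true()
    if if_false is not None:
        return if_false()
    return []


eggs_left = 0
# One sequence: expanded only if the condition is true
print(conditional(eggs_left > 0, lambda: ["crack egg", "fry egg"]))  # []
# Two sequences: exactly one of them is expanded
print(conditional(eggs_left > 0, lambda: ["crack egg", "fry egg"], lambda: ["buy eggs"]))  # ['buy eggs']
```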

Loops are conditional repetition of sequences:

This simple type of loop will likely account for at least 50% of the loops in your logic.

Loops can be used as filters:

Or as accumulators:
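Both uses can be sketched with an ordinary Python loop over a list of hypothetical orders. The filter keeps only the items that satisfy a condition. The accumulator folds the items into one running value.

```python
orders = [
    {"item": "eggs", "qty": 12},
    {"item": "toast", "qty": 2},
    {"item": "eggs", "qty": 6},
]

# Loop as a filter: keep only the items that satisfy a condition
egg_orders = []
for order in orders:
    if order["item"] == "eggs":
        egg_orders.append(order)

# Loop as an accumulator: fold every item into a single running value
total_eggs = 0
for order in egg_orders:
    total_eggs += order["qty"]

print(len(egg_orders), total_eggs)  # 2 18
```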

Recursion can be understood as loops with depth:
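One way to read "loops with depth" in Python is a loop over the items of a value that repeats the same loop one level deeper whenever an item is itself a list or a hash. The sketch below counts the leaf values (numbers and texts) in nested data. The function `count_leaves` and the sample data are hypothetical.

```python
def count_leaves(value) -> int:
    """Count numbers and texts in nested lists and hashes."""
    if isinstance(value, dict):
        items = value.values()
    elif isinstance(value, list):
        items = value
    else:
        return 1  # a number or a text is a leaf
    total = 0
    for item in items:  # the loop...
        total += count_leaves(item)  # ...which goes one level deeper for nested items
    return total


data = {"breakfast": ["coffee", "eggs"], "lunch": {"main": "soup", "sides": ["bread", 2]}}
print(count_leaves(data))  # 5
```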

If calls can represent transformation at any level, we can go beyond the divide between declarative and imperative.

The "what" is the interface, the "how" is its logic. Every call is declarative on the outside, imperative on the inside.

Pillar 5: Interface is call and response

Everyday notion of interface: something made for humans, with graphics.

Formal notion of interface: a boundary between two parts of the system.

A more general way to look at it: an interface is the combination of a call and its response.

Implications:

  • No intrinsic boundary between user and system: both are one.
  • No intrinsic "internal" or "external" areas. A call has an external area (interface) and an internal area (logic).
  • User calls and system calls are the same.

Calls are reactive.

When something changes (destination, message, logic), the response is updated.

When a part of the system changes, the system updates itself to stay in sync. This is the true meaning of reactivity.
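Here is a minimal sketch of that idea. It uses a tiny hypothetical `Cell`/`Derived` pair rather than any particular library. Each derived cell remembers the logic that produces its response and the cells it reads, and it recomputes whenever one of those inputs changes. The change then propagates to anything that depends on it in turn.

```python
from typing import Any, Callable


class Cell:
    """A value that notifies its dependents when it changes (hypothetical, for illustration)."""

    def __init__(self, value: Any = None):
        self.value = value
        self._dependents: list["Derived"] = []

    def set(self, value: Any) -> None:
        self.value = value
        for dependent in self._dependents:
            dependent.update()


class Derived(Cell):
    """A response that is recomputed from its inputs whenever any input changes."""

    def __init__(self, logic: Callable[..., Any], *inputs: Cell):
        super().__init__()
        self.logic = logic
        self.inputs = inputs
        for cell in inputs:
            cell._dependents.append(self)
        self.update()

    def update(self) -> None:
        # Recompute the response, then propagate the change downstream
        self.set(self.logic(*(cell.value for cell in self.inputs)))


eggs = Cell(2)
guests = Cell(3)
needed = Derived(lambda per_guest, n: per_guest * n, eggs, guests)
print(needed.value)  # 6
guests.set(4)
print(needed.value)  # 8
```

This naive push-based design can recompute a cell more than once when dependencies form a diamond. It is enough to show the principle, but it is not a complete reactive engine.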