Notes on Data Modeling: From REST to GraphQL and the Logic of Datalog


Context: Abstractions

I’ve been spending a lot of time recently thinking about the layers of abstraction we work with daily. Getting things to work is the mundane part; I’d like to understand why they work the way they do.

Initial Note: Most of my work is in the browser, which is a massive abstraction itself—it feels more like a runtime or a game engine than a simple document viewer. UI frameworks like React add another layer. The deeper I look, the more I realize that the way we fetch and structure data is the piece we end up redesigning most often. If the data model is clunky, the whole application feels clunky.

Discovery 1: The GraphQL Convenience

My first real exposure to a different data model came from working with a new project that used GraphQL. Up until then, everything was standard REST: resources, endpoints, and the usual dance of over-fetching or under-fetching data.

Observation on REST:

  • Pros: Simple, uses standard HTTP verbs, easy to cache.
  • Cons: Rigid. Need to hit multiple endpoints for related data, or endpoints return too much data (e.g., fetching a list of users and getting every field when I only need the name and ID).

Observation on GraphQL:

  • The experience was immediately different. Instead of requesting a fixed resource, I was defining the shape of the data I wanted.
  • It felt like I was navigating a graph of interconnected data, not just pulling a record from a table.
  • Key Takeaway: GraphQL forces you to think about data not as discrete resources, but as a network of nodes and edges. This is a fundamental shift from the row-column model used in a lot of introductory material.
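The contrast between the two observations above can be sketched in plain Python. This is a toy illustration, not real GraphQL: the data, function names, and fields are all made up. It only shows the core idea that a REST endpoint fixes the response shape, while a GraphQL-style query lets the caller declare the shape it wants.

```python
# Toy contrast between a fixed REST payload and a GraphQL-style
# "shape" selection over the same data. All names here are illustrative.

USERS = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "bio": "..."},
    {"id": 2, "name": "Grace", "email": "grace@example.com", "bio": "..."},
]

def rest_list_users():
    # REST: the endpoint decides the shape; the caller gets every field.
    return USERS

def graphql_select(records, fields):
    # GraphQL-ish: the caller declares which fields it wants.
    return [{f: r[f] for f in fields} for r in records]

# Only name and ID, the case from the REST "Cons" bullet above:
print(graphql_select(rest_list_users(), ["id", "name"]))
# [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Grace'}]
```

Real GraphQL does far more (nested selections across relationships, a typed schema), but even this tiny sketch captures why over-fetching disappears when the query describes the shape rather than naming a resource.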

I specifically remember using the monday.com GraphQL API. It was a complex system, but the query structure made it feel intuitive. This led to a thought experiment: if we can query a complex project board as a graph, could the same model handle something like a Revit building model, a massive web of interconnected components that fits awkwardly into a traditional relational database? Maybe the graph model is a better fit for complex, highly related data structures.

Discovery 2: Logic Programming and Datalog

The idea of data as a graph led me to research other non-traditional data models. I stumbled across an article by Pete Vilter about treating a “Codebase as Database” [1], which introduced me to Datalog.

Initial Reaction: This is completely different. It’s a logic programming language and it’s all about deduction.

Notes on Datalog’s Structure (The Core Concepts):

  1. Facts: These are the base truths, the static data. They are written as predicates with named fields:
    mother{child: "Pete", mother: "Mary"}.
    father{child: "Mary", father: "Mark"}.
  2. Rules: These define how new truths can be derived from existing facts. They are declarative statements using logical operators (:- means “if”):
    # Rule: A parent is a mother OR a father
    parent{child: C, parent: P} :-
      mother{child: C, mother: P} |
      father{child: C, father: P}.
  3. Queries: You ask the system to solve for a variable:
    parent{child: "Pete", parent: P}. # Find P
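The three concepts above can be sketched in plain Python. This is a toy, not a real Datalog engine (the helper names are mine): facts become sets of tuples, the `parent` rule becomes a set union, and a query becomes solving for the unbound variable. A recursive `ancestor` rule shows where deduction actually earns its keep.

```python
# Facts: base truths as (child, parent-figure) tuples.
mother = {("Pete", "Mary")}           # mother{child: "Pete", mother: "Mary"}.
father = {("Mary", "Mark")}           # father{child: "Mary", father: "Mark"}.

# Rule: parent{child: C, parent: P} :- mother{...} | father{...}
parent = mother | father

def query_parent(child):
    """Query: parent{child: child, parent: P}. Solve for P."""
    return {p for (c, p) in parent if c == child}

def ancestors(child):
    """Recursive rule, as a fixpoint:
    ancestor(C, A) :- parent(C, A).
    ancestor(C, A) :- parent(C, P), ancestor(P, A)."""
    result, frontier = set(), {child}
    while frontier:
        for p in query_parent(frontier.pop()):
            if p not in result:
                result.add(p)
                frontier.add(p)
    return result

print(query_parent("Pete"))  # {'Mary'}
print(ancestors("Pete"))     # {'Mary', 'Mark'}
```

Real Datalog engines evaluate rules like `ancestor` to a fixpoint for you; here the `while` loop does that job by hand, which makes the deduction visible.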

The ‘Aha’ Moment: The most interesting part is the traceability. When Datalog answers a query, it can provide a trace tree showing the exact logical steps it took to infer the answer. This is a huge win for debugging and understanding data integrity. It’s not just “here’s the data,” it’s “here’s why this data is true.”
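To make the traceability point concrete, here is a toy sketch of a derivation record in Python. The trace format is my own invention, not what a real Datalog engine emits; it only shows the idea of keeping, next to each derived fact, the fact and rule that justified it.

```python
# Base facts as (relation, child, person) tuples.
facts = [
    ("mother", "Pete", "Mary"),
    ("father", "Mary", "Mark"),
]

trace = {}  # derived fact -> a note on how it was inferred
for rel, child, person in facts:
    # Rule: parent{child: C, parent: P} :- mother{...} | father{...}
    trace[("parent", child, person)] = (
        f"parent({child}, {person}) because {rel}({child}, {person})"
    )

print(trace[("parent", "Pete", "Mary")])
# parent(Pete, Mary) because mother(Pete, Mary)
```

A real engine would build this into a tree (each justification can itself have justifications), which is exactly the trace tree described above: not just “here’s the data,” but “here’s why this data is true.”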

Synthesis: Data Topology

Connecting GraphQL (a graph-based query language) and Datalog (a logic-based deduction system) started to form a larger idea in my head, which I’m calling Data Topology for now.

Working Definition: Data Topology is the idea that the structure and relationships within the data—the layout of the information—is what allows us to extract meaning, not just the values themselves.

  Data Model        Primary Focus   Structure Imposed    How Meaning Emerges
  Relational (SQL)  Records/Tables  Upfront (Schema)     Explicit joins and aggregations
  GraphQL           Nodes/Edges     Flexible (Schema)    Navigating relationships (the graph)
  Datalog           Facts/Rules     Declarative (Logic)  Deduction based on configuration (the logical arrangement)

In traditional databases, we impose the structure. In these newer models, the structure can emerge from the data and the rules we define. It’s a shift from a rigid, geometric view of data (like Euclidean geometry) to a more flexible topological one where relationships are expressed.

Closing Thoughts

One caveat I should add: while I have worked on projects that both produce and consume data from REST systems, with GraphQL I have mostly been on the consuming side. I do recall a little maintenance work on one project that involved writing some server-side GraphQL resolvers, but not much beyond that.

For now, I’m just trying to incorporate this graph-based and logic-based thinking into my daily work. It’s changing how I design my application state and how I think about data flow.

I’m excited to see what happens with these ideas.

References

[1] Pete Vilter’s piece on “Codebase as Database”