It’s all too common to see Statements of Work that include a single, loosely defined line item labeled “Middleware.” In some cases, it may be slightly more specific, refined as “product migration” or “order sync,” but often the scope remains undefined. The result is predictable: estimates are misaligned with effort, and complexity emerges only after implementation begins.

Inaccurate estimates often result from insufficient structural breakdown of the integration scope. Middleware is not a single deliverable, but a collection of discrete, directional data exchanges. Each exchange carries its own transformation logic, validation requirements, and operational characteristics. When those exchanges aren’t explicitly outlined, hidden assumptions build up. Dependencies emerge late, exceptions are handled reactively, and delivery timelines inevitably expand.

Scoping middleware accurately requires a more deliberate approach. Each automated exchange should be decomposed and defined in terms of the data being moved, the direction of movement, the triggering conditions, and the transformation logic applied. As the level of detail increases, so does the accuracy of the estimates.[1]


What is Middleware?

When integrating systems, data doesn’t move randomly — it moves with structure and intent. At a high level, the movement of data via middleware follows the same fundamental principles as a traditional data migration — with one important distinction:

Whereas data migration is a one-time, manually triggered event, middleware is (usually) automated.[2]

Middleware applies this same framework in an ongoing operational context:

  1. Extract data from a source system
  2. Transform it to match the destination system’s schema and business rules
  3. Load it into the destination in a reliable, controlled way
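The three steps above can be sketched as a single flow run. This is an illustrative skeleton, not any particular platform's API; the endpoint URLs and field names (`item_code`, `unit_price`, and so on) are hypothetical.

```python
import json
import urllib.request

def extract(source_url: str) -> list[dict]:
    """Step 1: pull raw records from the source system's API."""
    with urllib.request.urlopen(source_url) as resp:
        return json.load(resp)

def transform(record: dict) -> dict:
    """Step 2: reshape a source record to the destination schema."""
    return {
        "sku": record["item_code"],            # field names are hypothetical
        "name": record["description"].strip(),
        "price": round(float(record["unit_price"]), 2),
    }

def load(dest_url: str, payload: dict) -> None:
    """Step 3: submit the transformed payload to the destination endpoint."""
    req = urllib.request.Request(
        dest_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

def run_flow(source_url: str, dest_url: str) -> None:
    for record in extract(source_url):
        load(dest_url, transform(record))
```

In practice, the transform step is where most of the scoping effort lands, because it encodes the destination's schema and business rules.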

In modern architectures, APIs are the most common transport mechanism. Middleware retrieves data via API requests and submits transformed payloads to corresponding endpoints. But APIs are not the only option. Middleware may also read from or write to flat files (CSV, XML, JSON), SFTP locations, or other data stores depending on system capabilities and constraints.

Execution timing varies based on business needs:

  • Event-driven integrations (e.g., webhooks) support near real-time synchronization
  • Scheduled integrations (e.g., cron jobs) process data in batches
  • Many implementations combine both approaches to balance performance, reliability, and cost
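The two trigger styles differ only in their entry point; both ultimately invoke the same per-record sync. A minimal sketch, with `sync_record` standing in for the extract/transform/load work:

```python
def sync_record(record_id: str) -> str:
    # Placeholder for a single-record extract/transform/load.
    return f"synced {record_id}"

def handle_webhook(payload: dict) -> str:
    # Event-driven entry point: the source system calls this as changes happen,
    # giving near real-time synchronization.
    return sync_record(payload["record_id"])

def run_batch(pending_ids: list[str]) -> list[str]:
    # Scheduled entry point (e.g., a cron job): a periodic sweep that also
    # catches anything the webhooks may have missed.
    return [sync_record(record_id) for record_id in pending_ids]
```

Combining both, with webhooks for speed and a nightly batch as a safety net, is a common way to balance performance, reliability, and cost.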

Regardless of trigger or transport method, middleware serves one purpose:

Middleware is the intermediary layer responsible for reliably extracting data, applying business logic and transformations, and delivering it accurately to its destination.

But understanding how middleware works technically is not the same as understanding how to scope it properly.

The real challenge in integration projects is not enabling data transfer.
It is defining what moves, in which direction, under what conditions, and with what rules.

The most effective way to accomplish that is by defining data flows.


Understanding Data Flows

When scoping middleware (whether using platforms like Celigo, custom services, or EDI tooling), the integration should be broken down into discrete data flows.

Middleware is not one large synchronization engine.
It is a collection of clearly defined, independent exchanges.

A data flow is defined by four components:

  • Data Type (e.g., Products, Orders, Customers)
  • Operation Type (Create, Update, Delete)
  • Source System
  • Destination System

The combination of these four elements defines a single, specific automated movement of data.

Source and destination together establish the direction of the flow.
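The four components can be made concrete as a small value type. This is a sketch of one possible representation, not a prescribed schema:

```python
from dataclasses import dataclass
from enum import Enum

class Operation(Enum):
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"

@dataclass(frozen=True)
class DataFlow:
    """One discrete, directional exchange: the unit middleware is scoped in."""
    data_type: str        # e.g. "Products", "Orders", "Customers"
    operation: Operation  # Create, Update, or Delete
    source: str           # source and destination together give direction
    destination: str

    def describe(self) -> str:
        return (f"{self.data_type} - {self.operation.value.title()}: "
                f"{self.source} -> {self.destination}")

flow = DataFlow("Products", Operation.CREATE, "ERP", "Ecommerce Platform")
```

Enumerating flows as values like this makes it easy to see that "Products, Create, ERP to Ecommerce" and "Products, Create, Ecommerce to ERP" are two different deliverables.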

Why Direction Matters

Even when the data type remains the same, the direction of the flow significantly impacts implementation.

For example:

  • Sending product data from an ERP to an ecommerce platform requires logic aligned with the ecommerce platform’s product creation rules.
  • Sending product data from the ecommerce platform to the ERP may require entirely different field mappings, validation requirements, and transformation logic.

Reversing the direction fundamentally changes the implementation.

Direction matters because every system has:

  • Its own schema
  • Its own required fields
  • Its own validation rules
  • Its own business logic constraints

Why Operation Type Matters

Operation type is just as important as direction.

A Create operation behaves very differently from an Update or Delete.

For example:

  • Create (Product)
    May require a full dataset: name, description, pricing, images, attributes, metadata, and required fields.

  • Update (Product)
    May intentionally modify only select fields (e.g., price or inventory) to avoid overwriting enriched content in the destination system.

  • Delete (Product)
    May require soft-deletion, archiving, or status updates rather than permanent removal, depending on platform constraints.

When possible, combine create and update logic into a single generalized flow (an upsert). Maintaining separate mappings for create and update operations invites drift between them, which can lead to unintended overwrites.

Without explicitly defining direction and operation, integrations risk overwriting trusted data or introducing inconsistencies. Separating flows by both direction and operation type ensures precision and reduces risk.
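One way to generalize create and update into a single flow while still protecting enriched content is to restrict the update payload to the fields the source system owns. A minimal sketch; the field names are hypothetical:

```python
# Fields the destination owns after enrichment; updates must not touch them.
ENRICHED_FIELDS = {"description", "images", "seo_metadata"}

def build_payload(record: dict, exists: bool) -> dict:
    """Generalized create/update (upsert) mapping.

    On create, send the full dataset. On update, drop destination-owned
    fields so enriched content is never overwritten.
    """
    if exists:
        return {k: v for k, v in record.items() if k not in ENRICHED_FIELDS}
    return dict(record)

product = {"sku": "A1", "price": 19.99, "description": "Basic copy"}
create_payload = build_payload(product, exists=False)  # full dataset
update_payload = build_payload(product, exists=True)   # sku and price only
```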


Example Data Flows

  Data & Operation              Source → Destination
  Products – Create             ERP → Ecommerce Platform
  Product Inventory – Update    ERP → Ecommerce Platform
  Orders – Create               Ecommerce Platform → ERP
  Tracking Numbers – Update     ERP → Ecommerce Platform
  Customers – Create            Ecommerce Platform → ERP
  Customers – Create            ERP → Ecommerce Platform

Each row represents a distinct integration flow.

Each flow requires its own:

  • Field mappings
  • Transformation logic
  • Validation handling
  • Error handling strategy
  • Retry mechanism
  • Logging and monitoring configuration

Scoping middleware properly begins with identifying these flows explicitly.

Only after they are defined should you begin documenting field-level mappings and transformation logic.


Interdependencies Between Flows

Data flows rarely exist in isolation.

For example:

  • Orders reference product records.
  • Orders reference customer records.
  • Customers may be tied to B2B company accounts.
  • Inventory may exist across multiple fulfillment locations.

These dependencies are not always obvious during initial planning. They often surface during implementation, when a flow fails because a prerequisite record does not yet exist in the destination system.

This is a normal part of integration work. It is a natural characteristic of distributed systems, not an indication of poor planning.

Clear flow definitions provide the structure needed to handle these dependencies rationally. When new requirements surface, you can:

  • Introduce supporting flows
  • Sequence flows appropriately
  • Add validation safeguards
  • Define authoritative systems for specific data domains
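A common concrete case of sequencing is checking that a prerequisite record exists before loading a dependent one. A sketch, assuming a hypothetical supporting flow passed in as `create_customer`:

```python
from typing import Callable

def load_order(order: dict, existing_customers: set[str],
               create_customer: Callable[[str], None]) -> str:
    """Create an order only after its prerequisite customer record exists.

    If the customer is missing in the destination, run the supporting
    customer flow first, then proceed with the order.
    """
    customer_id = order["customer_id"]
    if customer_id not in existing_customers:
        create_customer(customer_id)       # supporting flow, sequenced first
        existing_customers.add(customer_id)
    return f"order {order['id']} created for {customer_id}"

created: list[str] = []
result = load_order(
    {"id": "ord-7", "customer_id": "cust-3"},
    existing_customers=set(),
    create_customer=created.append,
)
```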

Middleware as a Set of Contracts

A helpful mental model is to think of each data flow as a contract:

  • It defines exactly what data moves.
  • It defines exactly when it moves.
  • It defines exactly how it is transformed.
  • It defines exactly how errors are handled.
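The four clauses of the contract can be captured in a single structure. This is one illustrative shape; the names and the example flow are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FlowContract:
    """A data flow expressed as an explicit contract."""
    what: str                                    # data type, operation, direction
    when: str                                    # trigger: webhook, schedule, ...
    transform: Callable[[dict], dict]            # how records are reshaped
    on_error: Callable[[dict, Exception], None]  # how failures are handled

orders_create = FlowContract(
    what="Orders - Create (Ecommerce Platform -> ERP)",
    when="webhook: order.created",
    transform=lambda rec: {"erp_order_no": rec["order_id"]},
    on_error=lambda rec, exc: print(f"quarantine {rec.get('order_id')}: {exc}"),
)
```

Writing the contract down first forces the scoping questions (what, when, how, and on failure) to be answered before any endpoint is configured.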

Scoping middleware, therefore, is not about configuring endpoints first.
It is about defining contracts first.

Once those contracts are clear, implementation becomes a technical exercise.

Without that clarity, integration projects become reactive, fragile, and difficult to maintain.


Final Thought

Middleware complexity exists when boundaries are undefined. By organizing integrations into clearly scoped data flows — separated by data type, operation, direction, and responsibility — you create a system that is:

  • Easier to reason about
  • Easier to implement
  • Easier to monitor
  • Easier to extend

Scoping middleware correctly is not just a technical task. It is an architectural discipline.


  [1] Determining the proper level of detail for accurate estimates is part art, part process. A practical approach is to start with a high-level overview and then refine it methodically until the remaining uncertainty is small enough to support a reliable, defensible estimate.

  [2] Technically, any software that sits between two systems is middleware, and middleware can even facilitate manual migrations, but for the purposes of this article, middleware refers to automated Electronic Data Interchange (EDI) between systems.