Transient Workflows in AEM: How I Almost Halved the Time of a PIM Synchronization

A few months ago, a challenge came up in a maintenance project for a client: improving the efficiency of a data synchronization between a PIM and nodes in AEM. The synchronization touched far too many nodes, run times were skyrocketing, and something had to be done. While researching alternatives I came across transient workflows, which I had never used until then, and they ended up being the key piece of the solution. In this article I explain what they are, why they are more efficient, how to enable them, and what results I obtained in a real case.

The use case: synchronizing a PIM with AEM

A PIM (Product Information Management) is the master system for product information: names, descriptions, attributes, categories, references... In this project, that information had to be synchronized periodically into AEM, where each product materialized as nodes in the JCR repository.

The problem was scale. Each execution of the synchronization processed a huge volume of nodes, and each node went through an AEM workflow orchestrating the steps of the process. At that volume, two costs became unsustainable:

The total time of the process, which grew with every catalog expansion.
The resource consumption of the instance while the synchronization was running, affecting overall performance.

What makes a "normal" workflow slow

To understand the solution you first have to understand where the cost comes from. When you launch a standard workflow in AEM, the workflow engine persists the entire execution state in the JCR repository: a workflow instance is created under /var/workflow/instances, and the step history, metadata and the state of every transition are stored in it as the workflow progresses.

That makes sense when you need traceability: you can see the workflow in the instances console, check which step it is at, view the complete history or resume it if something fails.

But it comes at a price: every execution generates writes to the JCR. And when you are not talking about one occasional workflow but tens of thousands of executions in a massive synchronization, that overhead multiplies:

Thousands of instance nodes created, modified and (eventually) purged.
Pressure on the repository: more writes, more segment store growth, more work for the maintenance and compaction tasks.
Extra time added to each individual execution which, summed up, dominates the total time of the process.
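
To get a feel for how a small per-execution overhead adds up, here is a back-of-the-envelope sketch in Java. Every figure in it (50,000 executions, 45 ms of real work, 5 ms of state persistence) is made up for illustration and is not a measurement from the project. The model also assumes executions run one after another and ignores the cost of purging and compaction.

```java
// Back-of-the-envelope cost model. All figures are hypothetical.
final class WorkflowCostModel {

    /** Total time when every execution pays the same work and persistence cost. */
    static long totalMillis(long executions, long workMillis, long persistenceMillis) {
        return executions * (workMillis + persistenceMillis);
    }

    public static void main(String[] args) {
        long executions = 50_000;  // hypothetical number of workflow executions
        long workMillis = 45;      // hypothetical cost of the actual node update
        long persistMillis = 5;    // hypothetical cost of persisting workflow state

        long standard = totalMillis(executions, workMillis, persistMillis);
        long withoutPersistence = totalMillis(executions, workMillis, 0);

        System.out.printf("standard:            %d s%n", standard / 1000);
        System.out.printf("without persistence: %d s%n", withoutPersistence / 1000);
        System.out.printf("saved:               %d s%n", (standard - withoutPersistence) / 1000);
    }
}
```

The point is only the shape of the formula: the persistence term is multiplied by the number of executions, so a cost that is negligible for one workflow becomes minutes at catalog scale.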

For a PIM → AEM synchronization, all that persisted traceability added no value: what mattered was the result (the updated nodes), not the history of each mini-execution.

The solution: transient workflows

A transient workflow is exactly the same workflow, with one fundamental difference: it does not store execution data in the JCR repository. The workflow instance lives only in memory while it runs; no nodes are created under /var/workflow/instances, there is no persisted history, there is nothing to purge afterwards.

By eliminating those writes, every execution is lighter and the repository stops receiving the barrage of state writes. In massive processes, the difference shows up directly in the total run time.

How to enable it

Enabling it is surprisingly simple: in the workflow model editor, within the model properties, tick the "Transient Workflow" checkbox. From then on, all executions of that model will be transient. There is no need to change the code of the process steps: the same WorkflowProcess implementations work as before.

What you give up in exchange

It is not free, of course. By not persisting state, you give up several things:

No visibility in the instances console: you cannot see the workflow "in progress" nor its history once finished.
No resumption after failure: if the AEM instance goes down mid-execution, the workflow does not recover — there is simply no record of it. Your process must be idempotent or re-runnable.
No participant steps: workflows with human intervention (approvals, tasks) cannot be transient, because they need to persist state while they wait.

For the PIM synchronization, none of these trade-offs was a problem: the process was fully automatic, idempotent by design (each execution left the nodes in the state dictated by the PIM) and its monitoring was done through logs, not through the workflow console.
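
As an illustration of what "idempotent by design" means here, the following is a minimal Java sketch, not the project's actual code. It models a node as a plain Map instead of a JCR node, and the class and method names (IdempotentSync, apply) are hypothetical. The idea is that each run overwrites the node with whatever the PIM dictates, so re-running after a crash cannot leave it in a wrong state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a sync step; a Map plays the role of a JCR node.
final class IdempotentSync {

    /**
     * Brings the node to exactly the state the PIM dictates: PIM values
     * overwrite existing ones and properties the PIM no longer defines are
     * removed. Returns true if the node had to be changed.
     */
    static boolean apply(Map<String, String> pimProduct, Map<String, String> node) {
        if (node.equals(pimProduct)) {
            return false; // already in the target state, nothing to write
        }
        node.clear();
        node.putAll(pimProduct);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> pim = Map.of("name", "Desk lamp", "category", "lighting");
        Map<String, String> node = new HashMap<>(Map.of("name", "Old lamp", "stale", "x"));

        System.out.println(apply(pim, node)); // first run changes the node
        System.out.println(apply(pim, node)); // re-run finds nothing to do
        System.out.println(node.equals(pim)); // node matches the PIM either way
    }
}
```

Because a second run is harmless, losing a transient workflow mid-execution only means the next synchronization picks up what was missed.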

The results

Switching to transient workflows, on its own, reduced the process times by up to 10%. That came from ticking a single checkbox, which probably makes it the best benefit-to-effort ratio of the whole maintenance project.

But I did not stop there. I took the opportunity to review the process code and fix several points of excessive memory usage: unnecessary accumulation of objects while iterating nodes, sessions and resources kept open longer than needed, and saves against the repository that were too frequent (or too infrequent).
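
The save-frequency point can be sketched as a small batching helper. This is a generic pattern under stated assumptions, not the code from the project: the BatchedSaver class is hypothetical, the batch size of 500 is an arbitrary example value, and the Runnable stands in for whatever persists changes in your code (in AEM that would typically be javax.jcr.Session.save() or ResourceResolver.commit()).

```java
// Hypothetical helper: saves once every `batchSize` changes instead of after
// every node (too frequent) or only once at the very end (too infrequent).
final class BatchedSaver {
    private final int batchSize;
    private final Runnable save; // stand-in for session.save() / resolver.commit()
    private int pending = 0;
    private int saves = 0;

    BatchedSaver(int batchSize, Runnable save) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.save = save;
    }

    /** Call after each node has been modified. */
    void nodeChanged() {
        if (++pending >= batchSize) {
            flush();
        }
    }

    /** Persists pending changes, if any; call once more when the loop ends. */
    void flush() {
        if (pending == 0) {
            return;
        }
        save.run();
        saves++;
        pending = 0;
    }

    int saveCount() {
        return saves;
    }

    public static void main(String[] args) {
        BatchedSaver saver = new BatchedSaver(500, () -> { /* session.save() would go here */ });
        for (int i = 0; i < 1_250; i++) {
            // ... update one node ...
            saver.nodeChanged();
        }
        saver.flush(); // persist the final partial batch
        System.out.println("saves performed: " + saver.saveCount());
    }
}
```

The right batch size depends on node size and available heap, so it is worth treating it as a tunable rather than a constant.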

Combining the code fixes with the transient workflows, the final result was a reduction in run time of roughly 40-50% compared to the original process.

Conclusion

Transient workflows are one of those AEM features that go unnoticed until you need them: they have been in the platform for years, they are enabled with a checkbox and, in the right scenario — massive, automatic, re-runnable processes — they deliver an immediate performance improvement by eliminating state persistence in the JCR.

The other lesson of the project is just as important: big improvements rarely come from a single change. The gain of up to 10% from transient workflows grew to 40-50% once it was combined with the memory optimizations in the code. Performance in AEM almost always means exactly that: stacking improvements across different layers.

If you have to wrestle with a massive synchronization in AEM and you did not know about transient workflows yet, I highly recommend giving them a try 😋