We're open-sourcing a multi-agent framework for Java, inspired by Agentican

Shane K Johnson · Chief AI Agent Herder · April 2026

The first agent frameworks, written in Python, were little more than a while loop wrapped around an LLM: prompt, call tool, prompt, call tool, done. They were built to power chatbots and conversational assistants, and for that they worked fine.

But agentic workflows are different. They're structured, hierarchical and non-linear. They're multi-agent — different roles for different steps. They run steps in parallel where possible and serial where they must. They need human checkpoints. They need to survive a server restart. And they need to be auditable — every prompt, every response, every tool call available for review.

Today we're open-sourcing the Agentican framework for Java. It's a multi-agent orchestration framework that treats workflows as first-class artifacts — declarative plans with parallel steps, loops, branches and human checkpoints — composed from agents, skills, tools and knowledge.

If you're a Java shop evaluating how to build and integrate agentic workflows, this is for you.

The two-line hello world

Before anything else — what does "getting started" actually look like?

# application.properties
agentican.llm[0].api-key=${ANTHROPIC_API_KEY}

@Inject Agentican agentican;

var task = agentican.run("Research the competitive landscape for " +
                           "real-time CDC tools and summarize the top 5");
var output = task.result().output();

That's it. No agents registered. No skills defined. No plan authored. The framework's built-in Planner — itself an agent — sees an empty catalog, determines what agents and skills are needed, defines them, builds a plan and hands it to the task runner.

This is unusual for a Java framework. A typical "hello world" for an agent library can run to 50 or more lines of registration ceremony. Agentican's is two lines, because the framework's own agents do the scaffolding for you.

But that's the demo. Let's talk about what the framework actually is.

Core concepts

A handful of primitives. They compose.

To make things concrete, we'll build one workflow — a competitive landscape brief — across the next few examples, introducing each primitive in context before wiring everything together at the end.

Agent — an identity with a role, bound to a runner. Agents are who does the work.

var researcherConfig = AgentConfig.builder()
    .name("Market Research Analyst")
    .role("Expert at finding information on companies and markets")
    .externalId("market-research-analyst")
    .build();

var analystConfig = AgentConfig.builder()
    .name("Strategy Analyst")
    .role("Synthesizes research into concise competitive briefings")
    .externalId("strategy-analyst")
    .build();

Two agents for our briefing workflow: a researcher who gathers information, an analyst who synthesizes it.

Skill — a reusable capability applied to agents at runtime based on the task. Skills are how agents do the work. One agent can take on different skills in different parts of a plan.

var webSearchConfig = SkillConfig.builder()
    .name("Web search")
    .instructions("When a question requires external information, use the " +
                  "search tools first. Quote sources in your answer.")
    .externalId("web-search")
    .build();

Either agent can pick up this skill when its step calls for it.

Toolkit + Tool — a tool is a function the LLM can call: a name, a description, a JSON schema for arguments. A toolkit is a collection of related tools with the actual implementation.
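To make the split concrete, here is a minimal sketch of the two shapes in plain Java. The names ToolkitSketch, Tool, Toolkit and EchoToolkit are hypothetical and are not Agentican's actual types; the point is only that the tool is the description the LLM sees, while the toolkit owns the implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not the framework's API.
final class ToolkitSketch {

    /** What the LLM sees: a name, a description and a JSON schema for the arguments. */
    record Tool(String name, String description, String argumentsJsonSchema) {}

    /** A group of related tools plus the code that actually runs them. */
    interface Toolkit {
        List<Tool> tools();
        String invoke(String toolName, Map<String, Object> arguments);
    }

    /** A toy toolkit with a single tool that returns the text it was given. */
    static final class EchoToolkit implements Toolkit {
        private static final Tool ECHO = new Tool(
                "echo",
                "Returns the text it was given",
                "{\"type\":\"object\",\"properties\":{\"text\":{\"type\":\"string\"}},"
                        + "\"required\":[\"text\"]}");

        @Override
        public List<Tool> tools() {
            return List.of(ECHO);
        }

        @Override
        public String invoke(String toolName, Map<String, Object> arguments) {
            if (!ECHO.name().equals(toolName)) {
                throw new IllegalArgumentException("Unknown tool: " + toolName);
            }
            return String.valueOf(arguments.get("text"));
        }
    }
}
```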

The framework ships integrations for MCP (Model Context Protocol) and Composio (100+ SaaS integrations: Slack, Notion, Linear, GitHub…), plus built-in toolkits for scratchpads, HITL questions and knowledge recall.

The researcher will use GitHub to profile competitors' open-source footprint. MCP is one config object — slug, display name, the MCP server's SSE endpoint, optional headers for auth. Tool discovery is automatic; the framework calls listTools() against the MCP server at registration time.

var githubConfig = McpConfig.builder()
    .slug("github")
    .name("GitHub")
    .url("https://mcp.github.com/sse") // replace with your MCP server
    .header("Authorization", "Bearer " + System.getenv("GITHUB_TOKEN"))
    .build();

The analyst will post the final brief to Slack. Composio is even simpler — your API key and a Composio user ID, and every toolkit the user has connected (Slack, Notion, Linear, etc.) becomes available:

var composioConfig = ComposioConfig.builder()
    .apiKey(System.getenv("COMPOSIO_API_KEY"))
    .userId("user-123")
    .build();

Both register automatically when configured via application.properties under agentican.mcp[N] and agentican.composio — direct construction is for cases where you're building outside the Quarkus binding or wiring toolkits dynamically at runtime.

Toolkits can also declare that a specific tool needs human approval — the runner will pause, raise an HITL checkpoint and wait.

Knowledge — structured facts extracted from agent outputs and stored in Postgres. No vector DB. No embedding infrastructure. Agents recall relevant facts automatically in future tasks.

When the analyst completes a brief, the Knowledge Expert extracts the key facts about each competitor — market positioning, funding, notable features — and persists them. The next time any agent in any plan asks about those competitors, the facts are there. No code required; it happens on every step completion.

Plan — the workflow itself. A typed DAG of steps — agent steps, loop steps, branch steps — with declared dependencies. The runner builds a dep graph and dispatches whatever's ready onto virtual threads.

var planConfig = PlanConfig.builder()
    .name("Competitive Landscape Brief")
    .description("Research, analyze and deliver a competitive landscape brief")
    .externalId("competitive-landscape-brief")
    .param("market", "Target market to research", null, true)
    .step("research", s -> s
        .agent("Market Research Analyst")
        .skills("Web search")
        .tools("search_repositories")
        .instructions("Profile top competitors in {{param.market}}"))
    .step("analyze", s -> s
        .agent("Strategy Analyst")
        .skills("Web search")
        .instructions("Synthesize brief from {{step.research.output}}")
        .dependencies("research"))
    .step("deliver", s -> s
        .agent("Strategy Analyst")
        .tools("slack_send_message")
        .instructions("Post brief to #competitive: {{step.analyze.output}}")
        .dependencies("analyze")
        .hitl())
    .build();

Plans are records. Records of records. Immutable data. Serializable, inspectable, diffable. You can load them from YAML, build them programmatically, or have an LLM generate them for you.
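What "records of records" buys you is easiest to see in plain Java. The sketch below uses hypothetical PlanSpec and StepSpec records, not the framework's real plan types, and assumes nothing beyond the JDK: defensive copies make the data immutable, and the generated equals() makes two plans directly comparable.

```java
import java.util.List;

// Hypothetical record shapes for illustration; the framework's own types may differ.
final class PlanRecordsSketch {

    record StepSpec(String name, String agent, String instructions, List<String> dependencies) {
        StepSpec {
            dependencies = List.copyOf(dependencies); // unmodifiable defensive copy
        }
    }

    record PlanSpec(String name, List<StepSpec> steps) {
        PlanSpec {
            steps = List.copyOf(steps); // unmodifiable defensive copy
        }
    }

    /** The first two steps of the brief, expressed as plain data. */
    static PlanSpec brief() {
        return new PlanSpec("Competitive Landscape Brief", List.of(
                new StepSpec("research", "Market Research Analyst",
                        "Profile top competitors in {{param.market}}", List.of()),
                new StepSpec("analyze", "Strategy Analyst",
                        "Synthesize brief from {{step.research.output}}", List.of("research"))));
    }
}
```

Two calls to brief() produce equal values, and any attempt to add a step to an existing plan throws UnsupportedOperationException. Changing a plan means building a new one.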

Putting it together. The Agentican builder accepts every config object directly:

try (var agentican = Agentican.builder()
        .config(runtimeConfig)
        .agent(researcherConfig)
        .agent(analystConfig)
        .skill(webSearchConfig)
        .mcp(githubConfig)
        .composio(composioConfig)
        .plan(planConfig)
        .build()) {

    var plan = planConfig.toPlan();

    var briefTask = agentican.run(plan, Map.of("market", "CDC tools"));

    var brief = briefTask.result().output();
}

In production, these same config objects are typically declared in application.properties and the Quarkus binding wires them automatically — this direct form is what you'd use in tests, standalone apps, or dynamic configurations.
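The {{param.market}} and {{step.research.output}} placeholders in the plan's instructions have to be filled in before a prompt is sent. The framework's actual resolution logic isn't shown in this post, so the following is only a sketch of how such substitution could work, with a hypothetical TemplateSketch class and a flat map of values keyed by placeholder name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the framework's template engine.
final class TemplateSketch {

    // Matches {{param.market}}, {{step.research.output}}, {{item}} and similar keys.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*([A-Za-z0-9_.-]+)\\s*\\}\\}");

    /** Replaces every placeholder with its value and fails loudly on unknown keys. */
    static String resolve(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String key = match.group(1);
            String value = values.get(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            return Matcher.quoteReplacement(value); // keep '$' and '\' literal
        });
    }
}
```

Failing on an unknown key is a design choice in this sketch: a prompt that silently ships with a literal {{...}} in it is harder to debug than an exception at dispatch.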

The framework ships with its own agents

The framework doesn't just run your agents. It ships with its own.

The Planner

PlannerAgent takes a natural-language task description and produces a structured Plan. It considers every agent, skill, toolkit and existing plan in your catalog. It decides:

Reuse an existing plan if the request matches one already registered.
Compose a new plan from the agents and skills you've registered.
Build what's missing. No suitable agents or skills? The Planner creates them.

That last point is why the two-line hello world works. With an empty catalog, the Planner bootstraps everything from scratch. With a populated catalog, it dispatches over your domain primitives. The more you register, the more it behaves as a dispatcher. The less you register, the more it behaves as a bootstrapper.

Newly created agents and skills are registered and can be reused in subsequent plans.

The Knowledge Expert

A knowledge agent that watches every step as it completes. When an agent finishes its work, the Knowledge Expert extracts structured facts from the output — only what the agent discovered, not what was already in the input — and writes them to the knowledge store.

Days later, a different plan has a step that asks "What's customer ABC's SLA?" The knowledge toolkit returns the stored facts. The agent cites them in its response.

Knowledge that builds as work happens. Invisible to the agents doing the work. Implicit to the agents benefiting from it.
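As a rough mental model, the store behaves like a set of facts keyed by entity. The sketch below is an in-memory stand-in with hypothetical names (KnowledgeSketch, Fact, remember, recall). It leaves out the LLM-driven extraction and the Postgres persistence entirely, and the SLA and region values are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for illustration; the real store is backed by Postgres.
final class KnowledgeSketch {

    record Fact(String entity, String attribute, String value) {}

    private final Set<Fact> facts = new LinkedHashSet<>();

    /** Stores only facts that are not already known and returns the ones that were new. */
    List<Fact> remember(List<Fact> extracted) {
        List<Fact> added = new ArrayList<>();
        for (Fact fact : extracted) {
            if (facts.add(fact)) {
                added.add(fact);
            }
        }
        return added;
    }

    /** Returns every stored fact about the given entity, ignoring case. */
    List<Fact> recall(String entity) {
        return facts.stream()
                .filter(fact -> fact.entity().equalsIgnoreCase(entity))
                .toList();
    }
}
```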

Why this matters

Most frameworks leave planning to the developer — write the prompts that decompose the steps, wire up the dependencies, maintain the orchestration logic. It works, but it's brittle. Change the task and you're rewriting the plan.

Memory and knowledge are a different problem. In most frameworks, they exist deep inside the infrastructure — embedding pipelines, vector databases, retrieval layers that the framework manages on the agent's behalf. In Agentican, knowledge is a core component that agents interact with directly. Agents decide when to recall facts. The framework provides the store and the tools. The agents use their own judgment about when and how to use them.

Plans are more than sequences

The competitive brief plan from Core concepts is a simple three-step chain. Real workflows need more: steps that run concurrently, iterations over dynamic collections and conditional paths that route based on what an agent decides. Let's grow the same workflow to use all three.

Parallel steps — the single research step can fan out into concurrent fact-gathering streams:

.step("web-signals", s -> s
    .agent("Market Research Analyst")
    .skills("Web search")
    .instructions("Find positioning on competitors in {{param.market}}"))
.step("code-signals", s -> s
    .agent("Market Research Analyst")
    .tools("search_repositories")
    .instructions("Profile open-source competitors in {{param.market}}"))
.step("financial-signals", s -> s
    .agent("Market Research Analyst")
    .instructions("Summarize recent funding in {{param.market}}"))

All three run concurrently on virtual threads. The next step waits for all of them.
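For readers curious how "dispatch whatever's ready" can be done on virtual threads, here is a minimal sketch using only the JDK (Java 21 or later). DagRunnerSketch and its Step record are hypothetical and are not the framework's runner. The sketch assumes the graph is acyclic and that every step returns a non-null output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical runner for illustration; assumes an acyclic graph and non-null outputs.
final class DagRunnerSketch {

    /** A step receives the outputs produced so far and returns its own output. */
    record Step(String name, List<String> dependencies,
                Function<Map<String, String>, String> work) {}

    static Map<String, String> run(List<Step> steps) {
        Map<String, Step> byName = new HashMap<>();
        for (Step step : steps) {
            byName.put(step.name(), step);
        }
        Map<String, CompletableFuture<String>> futures = new HashMap<>();
        Map<String, String> outputs = new ConcurrentHashMap<>();
        // One virtual thread per step; close() waits for submitted work to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (Step step : steps) {
                schedule(step, byName, futures, outputs, executor);
            }
            CompletableFuture.allOf(futures.values().toArray(CompletableFuture[]::new)).join();
        }
        return new TreeMap<>(outputs);
    }

    private static CompletableFuture<String> schedule(
            Step step,
            Map<String, Step> byName,
            Map<String, CompletableFuture<String>> futures,
            Map<String, String> outputs,
            ExecutorService executor) {
        CompletableFuture<String> existing = futures.get(step.name());
        if (existing != null) {
            return existing;
        }
        // Schedule upstream steps first, so this step runs only after all of them complete.
        CompletableFuture<?>[] upstream = step.dependencies().stream()
                .map(dep -> schedule(
                        Objects.requireNonNull(byName.get(dep), "Unknown step: " + dep),
                        byName, futures, outputs, executor))
                .toArray(CompletableFuture[]::new);
        CompletableFuture<String> future = CompletableFuture.allOf(upstream)
                .thenApplyAsync(ignored -> {
                    String output = step.work().apply(Map.copyOf(outputs));
                    outputs.put(step.name(), output);
                    return output;
                }, executor);
        futures.put(step.name(), future);
        return future;
    }
}
```

Steps with no dependencies start immediately and run side by side. A step that lists dependencies starts once the last of them has finished.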

Loops — when the research step identifies competitors dynamically, each one gets its own deep-dive in parallel:

.loop("deep-dive", l -> l
    .over("web-signals")
    .dependencies("code-signals", "financial-signals")
    .step("analyze", s -> s
        .agent("Strategy Analyst")
        .skills("Web search")
        .instructions("Deep-dive {{item}}, {{step.code-signals.output}}" +
                      " and {{step.financial-signals.output}}")))
.step("synthesize", s -> s
    .agent("Strategy Analyst")
    .instructions("Synthesize as brief: {{step.deep-dive.output}}")
    .dependencies("deep-dive"))

Twenty competitors, twenty concurrent analysis pipelines — each with access to the code and financial research. The synthesis step waits for every iteration to complete, then consolidates.

Branches — the final delivery depends on what the analysis found. Routine results get a standard digest; urgent competitive threats escalate for human review:

.branch("deliver", b -> b
    .from("synthesize")
    .defaultPath("standard")
    .path("standard", p -> p
        .agent("Strategy Analyst")
        .tools("slack_send_message")
        .instructions("Post the weekly digest to #product-strategy"))
    .path("urgent", p -> p
        .step("alert", s -> s
            .agent("Strategy Analyst")
            .instructions("Draft an urgent executive alert"))
        .step("review", s -> s
            .agent("Senior Strategy Analyst")
            .instructions("Review and approve the alert before it goes out")
            .dependencies("alert")
            .hitl())))

Only the relevant path executes. No wasted work, no wasted tokens.
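The routing rule itself is small. How the framework obtains the agent's choice isn't covered here, so the sketch below simply takes the chosen path name as a string. BranchSketch and route are hypothetical names.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical router for illustration; not the framework's branch implementation.
final class BranchSketch {

    /** Runs exactly one path: the chosen one if it exists, otherwise the default. */
    static String route(String chosenPath, String defaultPath,
                        Map<String, Supplier<String>> paths) {
        Objects.requireNonNull(defaultPath, "defaultPath");
        String key = defaultPath;
        if (chosenPath != null && paths.containsKey(chosenPath.trim())) {
            key = chosenPath.trim();
        }
        Supplier<String> path = paths.get(key);
        if (path == null) {
            throw new IllegalArgumentException("Unknown default path: " + defaultPath);
        }
        return path.get(); // the other paths are never invoked
    }
}
```

Because each path is a Supplier, the paths that are not selected never run, which is where the token savings come from.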

The full plan — all three shapes compose into a single workflow:

var planConfig = PlanConfig.builder()
    .name("Competitive Intelligence Digest")
    .description("Weekly competitive landscape brief with research, " +
                 "per-competitor deep-dives and threat-level routing")
    .externalId("competitive-intelligence-digest")
    .param("market", "Target market to research", null, true)
    .step("web-signals", s -> s
        .agent("Market Research Analyst")
        .skills("Web search")
        .instructions("Find positioning on competitors in {{param.market}}"))
    .step("code-signals", s -> s
        .agent("Market Research Analyst")
        .tools("search_repositories")
        .instructions("Profile open-source competitors in {{param.market}}"))
    .step("financial-signals", s -> s
        .agent("Market Research Analyst")
        .instructions("Summarize recent funding in {{param.market}}"))
    .loop("deep-dive", l -> l
        .over("web-signals")
        .dependencies("code-signals", "financial-signals")
        .step("analyze", s -> s
            .agent("Strategy Analyst")
            .skills("Web search")
            .instructions("Deep-dive {{item}}, {{step.code-signals.output}}" +
                          " and {{step.financial-signals.output}}")))
    .step("synthesize", s -> s
        .agent("Strategy Analyst")
        .instructions("Synthesize as brief: {{step.deep-dive.output}}")
        .dependencies("deep-dive"))
    .branch("deliver", b -> b
        .from("synthesize")
        .defaultPath("standard")
        .path("standard", p -> p
            .agent("Strategy Analyst")
            .tools("slack_send_message")
            .instructions("Post the weekly digest to #product-strategy"))
        .path("urgent", p -> p
            .step("alert", s -> s
                .agent("Strategy Analyst")
                .instructions("Draft an urgent executive alert"))
            .step("review", s -> s
                .agent("Senior Strategy Analyst")
                .instructions("Review and approve alert before it goes out")
                .dependencies("alert")
                .hitl())))
    .build();

Parallel research, per-competitor loop, threat-level branch — one plan, one submission, one virtual-thread-based execution.

Human-in-the-loop is structural

HITL in Agentican isn't middleware bolted on after the fact. It's declared in the plan and enforced by the framework.

Three checkpoint types:

Step approval — the agent completes a step, pauses and waits for a human to approve or reject before downstream work begins. Configured per step.

Tool approval — the agent wants to call a tool (send an invoice, create a CRM record). The framework pauses mid-execution. A human sees the exact action and parameters. Approve and it executes. Reject with feedback and the agent adapts in real time. Configured per tool.

Ask user — the agent initiates a question. "Which tone should I use?" or "I found two valid approaches, which do you prefer?" The human answers and the agent continues.

All three are persisted. All survive server restart. A pending approval from before a crash is still actionable after recovery.
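The tool-approval flow can be sketched as a small gate that holds a call until a human decides. This is an in-memory illustration with hypothetical names (ToolApprovalSketch, PendingCall, Outcome). The framework persists its checkpoints, which this sketch does not attempt.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// In-memory gate for illustration; real checkpoints are persisted by the framework.
final class ToolApprovalSketch {

    record PendingCall(String tool, Map<String, Object> args) {}

    record Outcome(boolean executed, String message) {}

    private final Map<String, PendingCall> pending = new ConcurrentHashMap<>();
    private final Function<PendingCall, String> toolRunner;

    ToolApprovalSketch(Function<PendingCall, String> toolRunner) {
        this.toolRunner = toolRunner;
    }

    /** The agent asks to call a tool. Nothing runs yet; the checkpoint id is returned. */
    String request(String tool, Map<String, Object> args) {
        String id = UUID.randomUUID().toString();
        pending.put(id, new PendingCall(tool, Map.copyOf(args)));
        return id;
    }

    /** A human approves: the tool runs with exactly the parameters that were shown. */
    Outcome approve(String id) {
        return new Outcome(true, toolRunner.apply(take(id)));
    }

    /** A human rejects: the tool never runs and the feedback goes back to the agent. */
    Outcome reject(String id, String feedback) {
        take(id);
        return new Outcome(false, feedback);
    }

    int pendingCount() {
        return pending.size();
    }

    private PendingCall take(String id) {
        PendingCall call = pending.remove(id);
        if (call == null) {
            throw new IllegalStateException("No pending checkpoint: " + id);
        }
        return call;
    }
}
```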

Durability

If the server dies mid-workflow, tasks resume automatically on next boot.

What survives: completed step outputs, in-flight agent turns (reconciled at turn boundary), pending HITL checkpoints, loop iteration state, branch decisions and a snapshot of the plan shape at dispatch.

What re-runs at most: one tool call (the one in flight at crash time) and one LLM call (if the request was sent but the response never arrived). Everything else is deterministically skipped.

A built-in ResumeClassifier reads each interrupted task's persisted state and classifies the in-flight turn into one of seven states to pick the right reconciliation strategy. If three tools ran in parallel and two completed before the crash, only the third re-executes on resume. The LLM is not re-called for that turn — the response is replayed in-process.
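The "only the third re-executes" behavior comes down to checking a journal of recorded results before running anything. The sketch below shows that idea in isolation. ResumeSketch, ToolCall and reconcile are hypothetical names, the journal is a plain map rather than a database table, and the seven classifier states are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical reconciliation helper; the classifier's seven states are not modeled here.
final class ResumeSketch {

    record ToolCall(String id, String tool) {}

    /**
     * Replays journaled results and executes only the calls that have no recorded result.
     * Results come back in the order the calls were requested.
     */
    static List<String> reconcile(List<ToolCall> requested,
                                  Map<String, String> journal,
                                  Function<ToolCall, String> execute) {
        List<String> results = new ArrayList<>();
        for (ToolCall call : requested) {
            String result = journal.get(call.id());
            if (result == null) {
                result = execute.apply(call);    // the call that was in flight at crash time
                journal.put(call.id(), result);  // record it so a second crash skips it too
            }
            results.add(result);
        }
        return results;
    }
}
```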

The plan shape is captured at dispatch time, and pending HITL checkpoints are rehydrated on startup. If you edit the plan registry after a task starts, the running task continues with the original shape. A human approval that was waiting before the crash is still actionable after recovery — same checkpoint, same button. No mid-flight surprises.

Resume concurrency is configurable. The framework gates how many interrupted tasks resume in parallel to avoid flooding the LLM on a restart with a large backlog.
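One common way to build such a gate is a counting semaphore. The sketch below is not the framework's implementation and its names (ResumeGateSketch, resumeAll) are hypothetical. It resumes a backlog of tasks while keeping at most maxConcurrent running, and returns the peak it observed so the limit can be checked.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gate for illustration; not the framework's resume scheduler.
final class ResumeGateSketch {

    /** Runs all tasks, never more than maxConcurrent at once, and returns the observed peak. */
    static int resumeAll(List<Runnable> tasks, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (Runnable task : tasks) {
                permits.acquire(); // blocks submission while the limit is reached
                pool.execute(() -> {
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        task.run();
                    } finally {
                        running.decrementAndGet();
                        permits.release(); // lets the next interrupted task resume
                    }
                });
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Resumed tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while resuming tasks", e);
        } finally {
            pool.shutdownNow(); // no-op if everything already finished
        }
        return peak.get();
    }
}
```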

Observability

Every lifecycle event is published as a CDI event, recorded as an OpenTelemetry span and reflected in a Micrometer metric.

OTel spans nest to mirror the execution tree — task → step → run → turn → llm.call / tool.call. Resumed tasks carry agentican.resumed=true so you can filter recovery executions in your trace waterfall.

Micrometer exposes agentican.tasks.active, agentican.tasks.completed{status}, agentican.tasks.duration, agentican.hitl.checkpoints.pending and more. Feed Prometheus and Grafana as usual.

CDI events let you subscribe to lifecycle transitions in any @ApplicationScoped bean. Alerting, audit forwarding, custom behavior — all without patching the framework.

Because every state transition is persisted, most observability is a side effect of durability. You can reconstruct any historical task's complete execution tree from the store.

Where it fits

Workflow engines (Temporal, Camunda). Same category in spirit — durable, structured, long-running. Different unit of work. Temporal's activities are arbitrary code. Agentican's steps are agent invocations with LLM loops, HITL gates and tool chains. If your workflows are deterministic code, use Temporal. If your workflows are "an LLM decides what to do next, within a structured plan," use Agentican.

LLM client libraries (Spring AI, LangChain4j). Adjacent category. These are clients — prompt management, chat memory, tool invocation. Agentican is the orchestration tier above them. You can swap Agentican's built-in LLM client for Spring AI or LangChain4j under the hood — the framework only cares about the LlmClient interface.

Python agent frameworks (LangGraph, CrewAI, AutoGen). Same problem domain, different ecosystem. The reason to pick Agentican is usually that you're a JVM shop and cross-process Python/Java calls are a tax you don't want to pay in production. Staying on the JVM gives you same-process transactions, your existing observability stack, type safety, operational familiarity and one deployment artifact.

Agentican is the orchestration tier for agentic workflows in a Java stack — the layer that was missing.

Getting started

Add the Quarkus extension. Set an LLM key. Call agentican.run().

Register agents and skills as your domain evolves. Build plans when you want explicit control. Let the Planner handle the rest.

The framework core is plain Java — Quarkus-first but runnable in Spring Boot, Micronaut, or standalone. Postgres for production persistence, H2 for tests.

GitHub →
