Longer conversations in a single thread can exceed the model’s context window, leaving you unable to continue that thread. Compaction solves this by condensing older messages in a thread into a high-density summary. This immediately restores a portion of the available tokens, letting you continue the thread for longer. It also lets the Agent focus fully on your most recent queries while maintaining a clear understanding of the thread’s history.

When compaction is triggered

Kombai monitors your token usage and triggers compaction under the following conditions:
  • Context window saturation: When the thread consumes the majority (typically 90-95%) of the available context window.
  • Cache expiry: To optimize costs, Kombai auto-compacts when you return to a thread after 5 minutes of inactivity and 50–60% of the context is already used, because the cache has expired by then. The cache also expires when you toggle Agent Variant or Extended Thinking.
If you edit or restore a message, the previous compaction is reset, which may temporarily increase context usage. As you continue the conversation, Kombai will re-evaluate and trigger compaction again if necessary.
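The trigger conditions above can be modeled as a simple decision function. The following is an illustrative sketch only, not Kombai’s actual implementation; the threshold constants mirror the ranges quoted above, and all names are hypothetical.

```python
SATURATION_THRESHOLD = 0.90    # context window saturation: 90-95% used
CACHE_EXPIRY_SECONDS = 5 * 60  # cache expires after 5 minutes of inactivity
CACHE_USAGE_THRESHOLD = 0.50   # cache-expiry path applies at 50-60% usage

def should_compact(used_fraction: float,
                   idle_seconds: float,
                   settings_toggled: bool = False) -> bool:
    """Return True if either compaction trigger is met (hypothetical logic)."""
    # Condition 1: the thread has consumed most of the context window.
    if used_fraction >= SATURATION_THRESHOLD:
        return True
    # Condition 2: the cache expired (inactivity, or toggling Agent Variant /
    # Extended Thinking) while a significant share of context is already used.
    cache_expired = idle_seconds >= CACHE_EXPIRY_SECONDS or settings_toggled
    if cache_expired and used_fraction >= CACHE_USAGE_THRESHOLD:
        return True
    return False
```

For example, a thread at 55% usage is left alone while you are active, but is compacted if you return to it after more than five minutes away.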

How context is managed

The context window is shared across several elements beyond your message history:
  • Infrastructure overhead: Tools, system prompts, skills, and RAG attachments typically occupy 15–35% of the window, depending on your tech stack.
  • Persistent rules: Project and User Rules are never compacted. They remain in their full form to ensure the Agent strictly adheres to your constraints at all times.
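The split above can be made concrete with some back-of-the-envelope arithmetic. The numbers below are hypothetical examples chosen within the stated ranges, not measurements of any real thread:

```python
# Hypothetical budget for a 200k-token context window.
context_window = 200_000

# Infrastructure overhead (tools, system prompts, skills, RAG attachments)
# typically takes 15-35% of the window; assume 25% here.
overhead = int(context_window * 0.25)

# Project and User Rules stay in full form and are never compacted;
# assume they occupy 4,000 tokens.
rules = 4_000

# Whatever remains is available for message history.
history_budget = context_window - overhead - rules
print(history_budget)  # prints 146000
```

So even before any messages are exchanged, a meaningful fraction of the window is already spoken for, which is why long threads approach saturation sooner than the raw window size suggests.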

Toggle auto-compaction

Compaction is enabled by default. If you prefer to manage the context window manually, you can disable the Auto compact threads toggle under the settings icon.

Manually compact threads

You can trigger compaction at any time by clicking the icon at the end of your thread and selecting Compact thread.

Why compact threads

  1. Extended conversations: Restore available tokens to maintain a single, continuous thread for long-term projects.
  2. Optimized performance: Experience faster response times, as compaction reduces the overhead of processing a long, uncompressed history.
  3. Preserved context: Retain critical information and decision points through intelligent summarization while discarding redundant data.
  4. Cost efficiency: Reduce token consumption in API calls by compressing a large context into a compact summary.