Checkpoints¶
Save and resume workflow execution state.
Why Checkpoints?¶
Long-running workflows can fail midway due to:
- API rate limits
- Network issues
- System interruptions
- Temporary failures
Checkpoints let you resume from where you left off instead of starting over.
Enabling Checkpoints¶
Automatic Checkpointing¶
Save checkpoints at regular intervals:
from vibe_aigc import MetaPlanner
# Checkpoint every 5 completed nodes
planner = MetaPlanner(checkpoint_interval=5)
# Run inside an async function; `vibe` is the vibe object you want to execute
result = await planner.execute_with_resume(vibe)
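To make the interval behaviour concrete, here is a minimal, library-independent sketch of interval-based checkpointing. It is not how `MetaPlanner` is implemented; `run_with_checkpoints`, `run_node`, and `save_checkpoint` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_checkpoints(
    nodes: list[str],
    run_node: Callable[[str], Awaitable[str]],
    save_checkpoint: Callable[[dict], None],
    checkpoint_interval: int = 5,
) -> dict[str, str]:
    """Run nodes in order, saving state after every `checkpoint_interval` completions."""
    results: dict[str, str] = {}
    for index, node in enumerate(nodes, start=1):
        results[node] = await run_node(node)
        if index % checkpoint_interval == 0:
            # Snapshot copies so later updates do not alter the saved state.
            save_checkpoint({
                "completed_nodes": list(results),
                "node_results": dict(results),
                "pending_nodes": nodes[index:],
            })
    return results


async def _demo() -> list[dict]:
    saved: list[dict] = []

    async def run_node(name: str) -> str:
        return f"output of {name}"

    nodes = [f"node{i}" for i in range(1, 13)]
    await run_with_checkpoints(nodes, run_node, saved.append, checkpoint_interval=5)
    return saved


saved = asyncio.run(_demo())
# 12 nodes with an interval of 5 -> checkpoints after node 5 and node 10
print([len(cp["completed_nodes"]) for cp in saved])  # [5, 10]
```

With 12 nodes and an interval of 5, a failure at node 12 would lose at most the work done since node 10.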
Manual Checkpoints¶
Create checkpoints programmatically:
# Create a checkpoint
checkpoint_id = planner.create_checkpoint(
    plan_id="plan-001",
    execution_state=current_state
)
print(f"Saved checkpoint: {checkpoint_id}")
Managing Checkpoints¶
List Checkpoints¶
checkpoints = planner.list_checkpoints()
for cp in checkpoints:
    print(f"ID: {cp['checkpoint_id']}")
    print(f"Created: {cp['created_at']}")
    print(f"Plan: {cp['plan_id']}")
    print(f"Progress: {cp['completed_nodes']}/{cp['total_nodes']}")
    print()
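A listing like this can be filtered to find the newest checkpoint for one plan. The sketch below assumes each entry is a dict with the `plan_id` and `created_at` keys printed above, and that `created_at` is an ISO 8601 string like the one shown under Checkpoint Contents. `latest_checkpoint` is a hypothetical helper, not part of vibe_aigc.

```python
from datetime import datetime
from typing import Optional


def latest_checkpoint(checkpoints: list[dict], plan_id: str) -> Optional[dict]:
    """Return the most recently created checkpoint for `plan_id`, or None."""
    candidates = [cp for cp in checkpoints if cp["plan_id"] == plan_id]
    if not candidates:
        return None
    # Rewrite a trailing "Z" so fromisoformat also accepts it on older Pythons.
    return max(
        candidates,
        key=lambda cp: datetime.fromisoformat(cp["created_at"].replace("Z", "+00:00")),
    )


# Sample data shaped like the listing above (values are illustrative).
sample = [
    {"checkpoint_id": "cp-abc123", "plan_id": "plan-001", "created_at": "2026-02-05T10:30:00Z"},
    {"checkpoint_id": "cp-def456", "plan_id": "plan-001", "created_at": "2026-02-05T11:00:00Z"},
    {"checkpoint_id": "cp-ghi789", "plan_id": "plan-002", "created_at": "2026-02-05T12:00:00Z"},
]
print(latest_checkpoint(sample, "plan-001")["checkpoint_id"])  # cp-def456
```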
Get Checkpoint Details¶
checkpoint = planner.get_checkpoint(checkpoint_id)
print(f"Workflow state: {checkpoint['state']}")
print(f"Node results: {checkpoint['node_results']}")
Delete Checkpoint¶
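As a filesystem-level sketch, assuming checkpoints are plain JSON files laid out as `<checkpoint_dir>/<plan_id>/<checkpoint_id>.json` (the default layout described under Checkpoint Storage), a single checkpoint could be removed as follows. `delete_checkpoint_file` is a hypothetical helper and is not part of vibe_aigc.

```python
from pathlib import Path


def delete_checkpoint_file(checkpoint_dir: Path, plan_id: str, checkpoint_id: str) -> bool:
    """Delete one checkpoint file. Return True if a file was removed."""
    path = checkpoint_dir / plan_id / f"{checkpoint_id}.json"
    if not path.is_file():
        return False
    path.unlink()
    # Remove the plan directory as well once its last checkpoint is gone.
    plan_dir = path.parent
    if not any(plan_dir.iterdir()):
        plan_dir.rmdir()
    return True
```

For example, `delete_checkpoint_file(Path(".vibe_checkpoints"), "plan-001", "cp-abc123")` would remove that one file and leave the other checkpoints for the plan in place.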
Resuming Execution¶
Resume from Latest¶
Resume from Specific Checkpoint¶
Checkpoint Storage¶
By default, checkpoints are stored in .vibe_checkpoints/:
.vibe_checkpoints/
├── plan-001/
│   ├── cp-abc123.json
│   └── cp-def456.json
└── plan-002/
    └── cp-ghi789.json
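Given this layout, the stored checkpoints can be discovered directly from disk. The following is a sketch that assumes one subdirectory per plan and one `.json` file per checkpoint, as in the tree above. `checkpoints_by_plan` is a hypothetical helper, not a vibe_aigc function.

```python
from pathlib import Path


def checkpoints_by_plan(checkpoint_dir: Path) -> dict[str, list[str]]:
    """Map each plan directory name to the sorted checkpoint IDs found inside it."""
    found: dict[str, list[str]] = {}
    if not checkpoint_dir.is_dir():
        return found
    for plan_dir in sorted(p for p in checkpoint_dir.iterdir() if p.is_dir()):
        # The file stem (name without ".json") is taken as the checkpoint ID.
        ids = sorted(f.stem for f in plan_dir.glob("*.json"))
        if ids:
            found[plan_dir.name] = ids
    return found
```

Calling `checkpoints_by_plan(Path(".vibe_checkpoints"))` on the example tree would map `plan-001` to its two checkpoint IDs and `plan-002` to one.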
Custom Storage Location¶
from vibe_aigc import MetaPlanner
from vibe_aigc.persistence import WorkflowPersistenceManager
manager = WorkflowPersistenceManager(
    checkpoint_dir="/path/to/checkpoints"
)
planner = MetaPlanner(persistence_manager=manager)
Checkpoint Contents¶
Each checkpoint contains:
{
  "checkpoint_id": "cp-abc123",
  "plan_id": "plan-001",
  "created_at": "2026-02-05T10:30:00Z",
  "vibe": {
    "description": "...",
    "style": "...",
    "constraints": []
  },
  "workflow_plan": { ... },
  "completed_nodes": ["node1", "node2"],
  "node_results": {
    "node1": { "status": "completed", "output": "..." },
    "node2": { "status": "completed", "output": "..." }
  },
  "pending_nodes": ["node3", "node4"]
}
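A checkpoint with this shape can be inspected with the standard `json` module. The sketch below checks that the expected keys are present and summarizes progress. It assumes every node is either completed or pending, so the total is their sum. `summarize_checkpoint` and `REQUIRED_KEYS` are hypothetical names, not part of vibe_aigc.

```python
import json

# Keys taken from the example checkpoint above.
REQUIRED_KEYS = {
    "checkpoint_id", "plan_id", "created_at",
    "completed_nodes", "node_results", "pending_nodes",
}


def summarize_checkpoint(raw: str) -> dict:
    """Parse checkpoint JSON and report progress and the next pending node."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"checkpoint is missing keys: {sorted(missing)}")
    done = len(data["completed_nodes"])
    pending = data["pending_nodes"]
    return {
        "checkpoint_id": data["checkpoint_id"],
        "plan_id": data["plan_id"],
        "completed": done,
        "total": done + len(pending),  # assumption: no other node states exist
        "next_node": pending[0] if pending else None,
    }


example = json.dumps({
    "checkpoint_id": "cp-abc123",
    "plan_id": "plan-001",
    "created_at": "2026-02-05T10:30:00Z",
    "completed_nodes": ["node1", "node2"],
    "node_results": {"node1": {"status": "completed"}, "node2": {"status": "completed"}},
    "pending_nodes": ["node3", "node4"],
})
print(summarize_checkpoint(example))
```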
Best Practices¶
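- Set appropriate intervals: Checkpointing too often adds save overhead; checkpointing too rarely means more completed work is lost when a run fails
- Clean up old checkpoints: Delete them once the workflow completes successfully
- Use meaningful plan IDs: Descriptive IDs make the relevant checkpoints easier to find
- Handle resume failures: Fall back to a fresh start if a checkpoint cannot be resumed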
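The last practice, falling back to a fresh start when resuming fails, can be sketched generically. `resume_or_restart` is a hypothetical helper that takes two caller-supplied coroutine functions; it makes no assumptions about which exceptions vibe_aigc raises, so it catches `Exception` broadly.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def resume_or_restart(
    resume: Callable[[], Awaitable[Any]],
    fresh_start: Callable[[], Awaitable[Any]],
) -> tuple[str, Any]:
    """Try to resume; if that raises, run the workflow from scratch instead."""
    try:
        return "resumed", await resume()
    except Exception as exc:  # broad on purpose; narrow this in real code
        print(f"Resume failed ({exc!r}); starting over")
        return "restarted", await fresh_start()
```

With the planner from this page, `resume` could wrap `planner.execute_with_resume(vibe)`, and `fresh_start` could wrap whichever call runs the workflow without a checkpoint.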