This guide covers the testing workflows for Codex CLI, including unit tests, integration tests, and snapshot tests.
Rust Testing
The Rust implementation uses standard cargo test, plus nextest (through just test) for faster runs and insta for snapshot tests.
Running Tests
Run Tests for a Specific Crate
Always start by testing the specific crate you modified:
cargo test -p codex-tui
cargo test -p codex-core
cargo test -p codex-app-server-protocol
This is the fastest way to get feedback on your changes.
Run Full Test Suite (If Needed)
If you changed common, core, or protocol crates, run the complete test suite:
# Standard cargo test
cargo test
# Or with nextest (faster)
just test
Avoid --all-features for routine local runs. It expands the build matrix and significantly increases build time and disk usage. Only use it when you specifically need full feature coverage.
Check Results
Review test output for any failures or warnings.
Snapshot Tests
Codex uses snapshot tests via insta to validate rendered output, especially in codex-tui.
Requirement: Any change that affects user-visible UI must include corresponding insta snapshot coverage.
Run Tests to Generate Snapshots
cargo test -p codex-tui
Check Pending Snapshots
cargo insta pending-snapshots -p codex-tui
Review Changes
Review the generated *.snap.new files directly, or preview a specific file:
cargo insta show -p codex-tui path/to/file.snap.new
Accept Snapshots (If Correct)
Only accept if you’ve verified the changes are correct:
cargo insta accept -p codex-tui
If you don’t have the tool installed:
cargo install cargo-insta
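The pending/review/accept cycle above can be pictured with a small, dependency-free sketch. This is not how insta is implemented; `check_snapshot`, `write_pending`, and `accept_snapshot` are hypothetical helpers written only to illustrate the idea that a mismatch produces a `.new` file next to the stored snapshot, and that accepting promotes that file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Outcome of comparing rendered output with a stored snapshot.
#[derive(Debug, PartialEq)]
enum SnapshotStatus {
    /// The stored snapshot matches the rendered output.
    Matches,
    /// The output differs or no snapshot exists yet; a pending file was written.
    Pending(PathBuf),
}

/// Hypothetical helper: compares `rendered` with the file at `snapshot`.
/// On a mismatch (or a missing snapshot) it writes `<snapshot>.new` for review.
fn check_snapshot(snapshot: &Path, rendered: &str) -> io::Result<SnapshotStatus> {
    match fs::read_to_string(snapshot) {
        Ok(stored) if stored == rendered => Ok(SnapshotStatus::Matches),
        Ok(_) => write_pending(snapshot, rendered),
        Err(err) if err.kind() == io::ErrorKind::NotFound => write_pending(snapshot, rendered),
        Err(err) => Err(err),
    }
}

/// Writes the rendered output next to the snapshot with a `.new` suffix.
fn write_pending(snapshot: &Path, rendered: &str) -> io::Result<SnapshotStatus> {
    let mut name = snapshot.as_os_str().to_owned();
    name.push(".new");
    let pending = PathBuf::from(name);
    fs::write(&pending, rendered)?;
    Ok(SnapshotStatus::Pending(pending))
}

/// Hypothetical "accept" step: promotes the pending file to the stored snapshot.
fn accept_snapshot(snapshot: &Path, pending: &Path) -> io::Result<()> {
    fs::rename(pending, snapshot)
}
```

The point of the design is that nothing is overwritten until a person has looked at the pending file, which is why accepting is a separate, explicit step.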
Test Assertions Best Practices
Use pretty_assertions
Avoid field-by-field assertions
use pretty_assertions::assert_eq;

#[test]
fn test_example() {
    let result = calculate_something();
    let expected = ExpectedStruct { /* ... */ };
    // Prefer deep equals on entire objects
    assert_eq!(result, expected);
}
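A fuller, self-contained version of this pattern might look like the sketch below. `CommandSummary` and `summarize` are hypothetical names invented for the illustration. The standard `assert_eq!` is used so the block has no dependencies; importing `pretty_assertions::assert_eq` is a drop-in replacement that prints a diff when the comparison fails.

```rust
/// Hypothetical domain type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct CommandSummary {
    program: String,
    args: Vec<String>,
    exit_code: i32,
}

/// Hypothetical function under test: splits a command line on whitespace
/// and attaches the given exit code.
fn summarize(command_line: &str, exit_code: i32) -> CommandSummary {
    let mut parts = command_line.split_whitespace().map(str::to_string);
    CommandSummary {
        // The first token is the program; an empty input yields an empty name.
        program: parts.next().unwrap_or_default(),
        // Every remaining token is an argument.
        args: parts.collect(),
        exit_code,
    }
}

#[test]
fn summarize_builds_full_summary() {
    let expected = CommandSummary {
        program: "ls".to_string(),
        args: vec!["-l".to_string(), "/tmp".to_string()],
        exit_code: 0,
    };
    // One deep comparison covers every field, including fields added later.
    assert_eq!(summarize("ls -l /tmp", 0), expected);
}
```

Comparing whole values means a newly added field cannot silently go unchecked, which is the main risk with field-by-field assertions.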
Integration Tests
When writing end-to-end Codex tests, use the utilities in core_test_support::responses.
Typical test pattern
Response mock helpers
use core_test_support::responses;

// `server`, `codex`, `call_id`, `args`, and `expected_output` come from
// test setup that is not shown in this excerpt.
#[tokio::test]
async fn test_function_call() -> Result<()> {
    let mock = responses::mount_sse_once(&server, responses::sse(vec![
        responses::ev_response_created("resp-1"),
        responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
        responses::ev_completed("resp-1"),
    ])).await;

    codex.submit(Op::UserTurn { /* ... */ }).await?;

    // Assert request body
    let request = mock.single_request();
    assert_eq!(request.function_call_output(call_id)?, expected_output);
    Ok(())
}
Best practices for integration tests:
Prefer wait_for_event over wait_for_event_with_timeout
Prefer mount_sse_once over mount_sse_once_match or mount_sse_sequence
Avoid mutating process environment in tests
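One way to follow the last practice is to pass the environment into the code under test instead of calling `std::env::set_var` from a test. Tests in one binary share a single process and run in parallel by default, so a global mutation can leak into unrelated tests (and `set_var` is `unsafe` in the Rust 2024 edition). The sketch below is an assumption-laden illustration: `resolve_home`, the `EXAMPLE_HOME` variable, and the `.example` directory are hypothetical and not part of Codex.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical resolver: receives the environment as an explicit map
/// rather than reading the process environment itself.
fn resolve_home(env: &HashMap<String, String>) -> PathBuf {
    match env.get("EXAMPLE_HOME") {
        // An explicit, non-empty override wins.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Otherwise fall back to `$HOME/.example`, or `./.example` without HOME.
        _ => PathBuf::from(env.get("HOME").map(String::as_str).unwrap_or(".")).join(".example"),
    }
}

/// Production entry point: the only place that reads the real environment.
/// Variables that are not valid Unicode are skipped.
fn resolve_home_from_process_env() -> PathBuf {
    let env: HashMap<String, String> = std::env::vars_os()
        .filter_map(|(key, value)| Some((key.into_string().ok()?, value.into_string().ok()?)))
        .collect();
    resolve_home(&env)
}

#[test]
fn override_takes_precedence() {
    // The test builds its own map, so the process environment is never touched.
    let mut env = HashMap::new();
    env.insert("EXAMPLE_HOME".to_string(), "/tmp/custom".to_string());
    env.insert("HOME".to_string(), "/home/alice".to_string());
    assert_eq!(resolve_home(&env), PathBuf::from("/tmp/custom"));
}
```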
Spawning Workspace Binaries in Tests
Use codex_utils_cargo_bin::cargo_bin("...") instead of assert_cmd::Command::cargo_bin(...) when tests need to spawn first-party binaries.
use codex_utils_cargo_bin::cargo_bin;

#[test]
fn test_cli_binary() {
    let codex_bin = cargo_bin("codex");
    // Use codex_bin path...
}
This ensures paths resolve correctly under both Cargo and Bazel runfiles.
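Once a path has been resolved, the rest is ordinary `std::process::Command` usage. The following sketch assumes the resolved value can be viewed as a `&Path`; the exact return type of `cargo_bin` is not stated here, and `command_for`, the `target/debug/codex` path, and the `--help` argument are placeholders for illustration only.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds a command for an already-resolved binary path.
/// Nothing is spawned here; callers decide when to run it.
fn command_for(bin: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new(bin);
    cmd.args(args);
    cmd
}

#[test]
fn builds_command_from_resolved_path() {
    // Stand-in for a path resolved by a helper such as cargo_bin("codex").
    let bin = Path::new("target/debug/codex");
    let cmd = command_for(bin, &["--help"]);

    // Inspect the command without executing it.
    assert_eq!(cmd.get_program(), bin.as_os_str());
    let args: Vec<&OsStr> = cmd.get_args().collect();
    assert_eq!(args, [OsStr::new("--help")]);
}
```

Keeping path resolution separate from command construction means only the resolution step needs to know whether the test runs under Cargo or Bazel.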
TypeScript Testing
The TypeScript implementation is legacy. This section is for reference only.
The TypeScript CLI uses Vitest for unit tests.
Running TypeScript Tests
Watch mode (recommended)
Single run
With type checking
Full validation suite
Git Hooks
The TypeScript project uses Husky to enforce code quality:
Pre-commit hook: Runs lint-staged to format and lint files
Pre-push hook: Runs tests and type checking
These hooks help maintain code quality and prevent pushing code with failing tests.
App-Server Protocol Testing
After changing API shapes in app-server-protocol:
Regenerate Schema Fixtures
just write-app-server-schema
# If experimental API fixtures are affected:
just write-app-server-schema --experimental
Validate Changes
cargo test -p codex-app-server-protocol
Sandbox Testing
Test commands under the Codex sandbox using dedicated subcommands:
macOS Seatbelt
Linux Landlock
Windows
codex sandbox macos [--full-auto] [--log-denials] [COMMAND]...
# Legacy alias
codex debug seatbelt [--full-auto] [--log-denials] [COMMAND]...
Use --log-denials on macOS to see what file accesses are being blocked by Seatbelt.
Before Submitting a PR
Before marking your PR as ready for review, run all checks locally:
# Format code
just fmt
# Fix linter issues
just fix -p <crate-you-touched>
# Run tests
cargo test -p <crate-you-touched>
# If you changed core crates:
cargo test
# Run full validation suite
pnpm test && pnpm run lint && pnpm run typecheck
CI failures that could have been caught locally slow down the review process. Always run checks before pushing.
Next Steps
Guidelines: Review contribution guidelines
Building: Learn how to build the project