Project files¶
This chapter is the authoritative reference for the project definition file format used by mcprojsim. It documents every supported field, explains the motivation for each feature, and includes examples for every accepted syntax form.
For the formal EBNF grammar that precisely defines what is syntactically valid, see the Formal Grammar Specification. This reference uses human-readable prose and worked examples; the grammar is the machine-precise complement.
The parser layer is intentionally simple:
- .yaml and .yml files are loaded with the YAML parser,
- .toml files are loaded with the TOML parser,
- both are then validated against the same Project model.
Because of that design, the logical schema is the same regardless of whether you write the file in YAML or TOML.
Creating project files¶
There are three ways to create a project file:
| Method | Best for |
|---|---|
| Write YAML or TOML by hand | Full control over every field |
| mcprojsim generate | Quick creation from a natural language description |
| MCP server | AI-assisted generation through an MCP client |
The generate command converts a plain-text project description into a valid YAML file without requiring any AI service — it runs locally using a built-in pattern-based parser.
Using mcprojsim generate¶
Write a description in a text file (e.g., description.txt):
Project name: Website Redesign
Start date: 2026-04-15
Task 1:
- Gather requirements
- Size: S
Task 2:
- Create wireframes
- Depends on Task 1
- Size: M
Task 3:
- Build frontend
- Depends on Task 2
- Size: XL
Generate the YAML project file:
See Running Simulations — mcprojsim generate for the full command reference and all options. The MCP Server page covers the complete input format and more examples.
Note
The generate command creates a minimal but valid project file. After generating, you can manually add uncertainty factors, risks, resources, and other fields documented in this chapter.
Supported file formats¶
The validator and CLI currently recognize these project file extensions:
- .yaml
- .yml
- .toml
Any other extension is rejected as an unsupported file format.
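The extension check can be sketched as follows. This is a minimal illustration rather than mcprojsim's actual code: the function name parser_for and the returned labels are hypothetical, and the sketch assumes extensions are matched exactly as written.

```python
from pathlib import Path

# Hypothetical mapping from file extension to parser family.
SUPPORTED_EXTENSIONS = {".yaml": "yaml", ".yml": "yaml", ".toml": "toml"}


def parser_for(path: str) -> str:
    """Return which parser family would handle the file, based on its extension."""
    suffix = Path(path).suffix
    if suffix not in SUPPORTED_EXTENSIONS:
        # Mirrors the documented behaviour: anything else is unsupported.
        raise ValueError(f"Unsupported file format: {suffix or '(no extension)'}")
    return SUPPORTED_EXTENSIONS[suffix]
```

Both YAML extensions map to the same parser family, which is why the logical schema does not depend on the file format.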
Top-level structure¶
At the highest level, a project file may contain the following sections:
- project: required
- tasks: required
- project_risks: optional
- resources: optional. When present, constrained scheduling is activated.
- calendars: optional. Used by constrained scheduling when resources reference calendars.
- sprint_planning: optional. Activates sprint-based simulation mode.
If project.team_size is greater than zero, default resources are generated up to that size (after validating explicit resources), which also makes scheduling resource-constrained.
The smallest valid project file therefore looks like this:
project:
  name: "My Project"
  start_date: "2026-03-01"
tasks:
  - id: "task_001"
    name: "First task"
    estimate:
      low: 1
      expected: 2
      high: 3
Top-level YAML skeleton¶
Top-level TOML skeleton¶
Formal section order¶
The parser does not require a specific order for top-level sections, but this is the clearest and most conventional order:
1. project
2. project_risks
3. tasks
4. resources
5. calendars
6. sprint_planning
That is also the order used in most examples and in this reference.
The project section¶
The project section is required. It contains project-level metadata and reporting settings.
Every simulation starts here: the name and start_date are the two fields the engine cannot work without. Everything else in this section controls how results are interpreted and displayed — which confidence percentiles appear in the output, what colour thresholds mark a date as red or green, and what distribution model is used by default across all tasks. You set these once at project level and every task inherits them automatically, which keeps individual task definitions short.
Supported fields¶
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| name | Yes | string | (none) | Project display name |
| description | No | string | null | Optional descriptive text |
| start_date | Yes | ISO date string | (none) | Must parse as YYYY-MM-DD |
| currency | No | string | "USD" | Stored as metadata |
| confidence_levels | No | list of integers | [10, 25, 50, 75, 80, 85, 90, 95, 99] | Controls reported percentiles |
| hours_per_day | No | float | 8.0 | Hours in a working day; used for day/week conversion |
| distribution | No | "triangular" or "lognormal" | "triangular" | Default estimate distribution for tasks that do not specify one |
| team_size | No | integer | null | If > 0, target total resources after validation (may auto-create defaults) |
| probability_red_threshold | No | float | 0.50 | Must be between 0.0 and 1.0 |
| probability_green_threshold | No | float | 0.90 | Must be between 0.0 and 1.0 |
Required constraints¶
The implementation currently enforces these rules for project:
- start_date must be a valid ISO-format date string or a date object,
- probability_red_threshold must be less than probability_green_threshold,
- both thresholds must be in the range 0.0 to 1.0,
- distribution, if provided, must be either triangular or lognormal,
- if provided, team_size must be >= 0,
- if team_size > 0 and explicit resources are fewer, default resources are added up to team_size,
- if explicit resources exceed team_size, validation fails.
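The threshold and team_size rules can be sketched in a few lines. This is an illustration under stated assumptions, not the library's validator: the function name check_project_settings and its return value (the number of default resources that would be added) are hypothetical.

```python
from typing import Optional


def check_project_settings(red: float, green: float,
                           team_size: Optional[int],
                           explicit_resources: int) -> int:
    """Apply the documented project rules; return how many default resources to add."""
    for value in (red, green):
        if not 0.0 <= value <= 1.0:
            raise ValueError("thresholds must be in the range 0.0 to 1.0")
    if red >= green:
        raise ValueError("probability_red_threshold must be less than "
                         "probability_green_threshold")
    if team_size is None:
        return 0  # no team_size given: nothing is auto-created
    if team_size < 0:
        raise ValueError("team_size must be >= 0")
    if team_size > 0 and explicit_resources > team_size:
        raise ValueError("explicit resources exceed team_size")
    # Top up with defaults until the total reaches team_size.
    return max(0, team_size - explicit_resources)
```

For example, team_size 5 with two explicit resources yields three default resources, while six explicit resources would fail validation.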
YAML example¶
project:
  name: "Customer Portal Redesign"
  description: "Next-generation customer portal with enhanced features"
  start_date: "2025-11-01"
  currency: "USD"
  distribution: "triangular"
  confidence_levels: [10, 25, 50, 75, 80, 85, 90, 95, 99]
  probability_red_threshold: 0.50
  probability_green_threshold: 0.90
TOML example¶
[project]
name = "Customer Portal Redesign"
description = "Next-generation customer portal with enhanced features"
start_date = "2025-11-01"
currency = "USD"
distribution = "triangular"
confidence_levels = [10, 25, 50, 75, 80, 85, 90, 95, 99]
probability_red_threshold = 0.50
probability_green_threshold = 0.90
The tasks section¶
The tasks section is required and must contain at least one task.
A task is any unit of schedulable work. You give each task an estimate of its duration, list its dependencies (other tasks that must complete before it can start), and optionally add uncertainty factors, resource requirements, and task-level risks. The simulation samples every task's duration in every iteration and sequences them according to the dependency graph — or, when resources are present, according to resource availability too. The more accurately each task is described, the more representative the resulting uncertainty distribution will be.
Each task is validated as a Task object with the following fields.
Supported task fields¶
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| id | Yes | string | (none) | Must be unique across all tasks |
| name | Yes | string | (none) | Human-readable task name |
| description | No | string | null | Optional task description |
| estimate | Yes | object | (none) | One of the supported estimate syntaxes |
| dependencies | No | list of strings | [] | Each entry must match another task id |
| uncertainty_factors | No | object | defaults applied | Recognized factor fields described below |
| resources | No | list of strings | [] | Task-level resource names |
| max_resources | No | integer | 1 | Max number of resources that may be assigned concurrently |
| min_experience_level | No | integer | 1 | Minimum resource experience allowed (1, 2, 3) |
| planning_story_points | No | integer > 0 | null | Story point size used for sprint planning; overrides estimate.story_points when set |
| priority | No | integer | null | Scheduling priority hint used in some sprint-planning modes |
| spillover_probability_override | No | float 0.0–1.0 | null | Per-task override for the probability that incomplete work spills into the next sprint |
| risks | No | list of risk objects | [] | Task-level probabilistic risks |
Minimal task example¶
TOML task example¶
[[tasks]]
id = "task_001"
name = "Design database schema"
[tasks.estimate]
min = 3
expected = 5
max = 8
The estimate section¶
Every task must define an estimate object. The estimate is the engine's primary input: it describes the uncertain duration of a task (or, in sprint planning mode, its size in story points or tasks) so the simulator can draw a different sample in each Monte Carlo iteration.
Choosing the right estimate style matters for accuracy:
- Triangular is the simplest and most common choice. It matches the way team members typically think — "at best X, most likely Y, at worst Z" — and is well suited to tasks with a finite worst case.
- Log-normal is appropriate for tasks where extreme over-runs are more likely than a symmetric range suggests — for example, research-heavy work, integration tasks with unknown third-party behaviour, or anything where "two to three times longer than expected" is a realistic tail scenario.
- T-shirt sizes are useful when teams do not have enough information to produce numeric estimates, or when they want to use a consistent sizing vocabulary across many tasks without converting each size into hours by hand.
- Story points are the standard agile unit and are mapped to numeric ranges through the configuration file, with a default mapping included out of the box.
The implementation supports four estimate styles:
- triangular estimate,
- log-normal estimate,
- T-shirt-size estimate,
- Story Point estimate.
The distribution field defaults to triangular when omitted. Setting it at task level overrides the project-level default for that task only — useful when most tasks warrant a triangular estimate but a few specific tasks benefit from a log-normal tail model.
For the formal grammar of all estimate forms see Formal Grammar — <estimate_spec>.
Field name aliases¶
The model accepts both long-form and short-form names for the three range fields:
| Canonical name | Accepted alias |
|---|---|
| low | min |
| expected | most_likely |
| high | max |
Both forms are valid in YAML and TOML. Examples in this chapter use both; they are equivalent.
1. Triangular estimate¶
This is the default and most common form.
Supported fields¶
| Field | Required | Type | Default |
|---|---|---|---|
| distribution | No | "triangular" or "lognormal" | null (inherits project default) |
| min | Yes for triangular | number ≥ 0 | null |
| expected | Yes | number > 0 | null |
| max | Yes for triangular | number ≥ 0 | null |
| unit | No | "hours", "days", or "weeks" | "hours" |
Validation rules¶
For a triangular estimate:
- expected must be present,
- min must be present,
- max must be present,
- low <= expected <= high,
- unit must be one of "hours", "days", or "weeks" if specified.
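As a rough illustration of how a triangular estimate can be sampled and normalised to hours, consider the sketch below. The helper names to_hours and sample_triangular_hours are hypothetical, and the five-day working week used for "weeks" is an assumption rather than a documented constant.

```python
import random


def to_hours(value: float, unit: str = "hours",
             hours_per_day: float = 8.0, days_per_week: int = 5) -> float:
    """Convert an estimate value to hours (days_per_week = 5 is an assumption)."""
    factors = {"hours": 1.0, "days": hours_per_day,
               "weeks": hours_per_day * days_per_week}
    if unit not in factors:
        raise ValueError('unit must be "hours", "days", or "weeks"')
    return value * factors[unit]


def sample_triangular_hours(low, expected, high, unit="hours", rng=None):
    """Draw one duration from a triangular estimate and return it in hours."""
    if not low <= expected <= high:
        raise ValueError("low <= expected <= high is required")
    rng = rng or random.Random()
    # random.triangular takes (low, high, mode), in that order.
    return to_hours(rng.triangular(low, high, expected), unit)
```

Every draw falls between the low and high bounds, which is what makes the triangular form suitable for tasks with a finite worst case.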
YAML example¶
TOML example¶
2. Log-normal estimate¶
The implementation also supports shifted log-normal estimates. In this mode
you still provide low, expected, and high, but they are interpreted
differently:
- low is the hard shift / minimum,
- expected is the mode,
- high is interpreted as the configured percentile (P95 by default; see lognormal.high_percentile in the configuration file).
Supported fields¶
| Field | Required | Type | Notes |
|---|---|---|---|
| distribution | Yes | "lognormal" | Must be set explicitly |
| low | Yes | number ≥ 0 | Required |
| expected | Yes | number > 0 | Required |
| high | Yes | number ≥ 0 | Required |
| unit | No | "hours", "days", or "weeks" | Defaults to "hours" |
Validation rules¶
For a log-normal estimate:
- distribution must be lognormal,
- low must be present,
- expected must be present,
- high must be present,
- low < expected < high,
- unit must be one of "hours", "days", or "weeks" if specified.
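One way to turn the three inputs into log-normal parameters is sketched below. It assumes the model is low + LogNormal(mu, sigma), with expected as the mode and high as the chosen percentile; the function name shifted_lognormal_params is hypothetical and the engine's own fitting procedure may differ.

```python
import math
from statistics import NormalDist


def shifted_lognormal_params(low, expected, high, high_percentile=0.95):
    """Solve for (mu, sigma) of the log-normal part of low + LogNormal(mu, sigma)."""
    if not low < expected < high:
        raise ValueError("low < expected < high is required")
    z = NormalDist().inv_cdf(high_percentile)  # about 1.645 for P95
    mode = expected - low   # mode of the unshifted log-normal
    tail = high - low       # percentile value of the unshifted log-normal
    # mode = exp(mu - sigma^2) and tail = exp(mu + z*sigma) combine into
    # sigma^2 + z*sigma - ln(tail/mode) = 0; take the positive root.
    ratio = math.log(tail / mode)
    sigma = (-z + math.sqrt(z * z + 4.0 * ratio)) / 2.0
    mu = math.log(mode) + sigma * sigma
    return mu, sigma
```

A sample is then low + random.lognormvariate(mu, sigma). The strict inequality low < expected < high keeps both logarithms defined, which is consistent with the validation rule listed for this form.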
YAML example¶
TOML example¶
3. T-shirt-size estimate¶
This form lets the task refer to a symbolic size token.
Supported token forms:
- bare size: M
- qualified category/size: epic.M
- long-form size alias: Medium, Epic.Large
- full long-form aliases: EXTRA_SMALL, SMALL, MEDIUM, LARGE, EXTRA_LARGE, EXTRA_EXTRA_LARGE
For the complete token grammar see Formal Grammar — <tshirt_size>.
Supported fields¶
| Field | Required | Type | Default |
|---|---|---|---|
| t_shirt_size | Yes | string | (none) |
| distribution | No | enum | Defaults to triangular |
Validation behavior¶
When t_shirt_size is present:
- the TaskEstimate validator accepts the estimate immediately,
- explicit min, expected, and max are not required,
- unit must not be specified; the unit comes from the configuration file's t_shirt_size_unit setting (default: "hours"),
- the simulation engine resolves the size to actual min, expected, and max values from the active configuration,
- if distribution is omitted, the task inherits the project-level default distribution.
If the chosen size does not exist in the active configuration, simulation raises an error.
YAML example¶
TOML example¶
Important precedence note¶
As implemented today, if t_shirt_size is present together with explicit numeric estimate fields, the T-shirt-size path takes precedence during validation and resolution.
In other words, this is technically accepted by the model:
But it should be treated as ambiguous and avoided in real project files. If you use t_shirt_size, prefer to omit the explicit numeric range fields.
4. Story Point estimate¶
This form lets a task use agile-style relative sizing while still simulating an effort range.
Supported fields¶
| Field | Required | Type | Default |
|---|---|---|---|
| story_points | Yes | integer | (none) |
| distribution | No | enum | Defaults to triangular |
Validation behavior¶
When story_points is present:
- the value must currently be one of 1, 2, 3, 5, 8, 13, or 21,
- unit must not be specified; the unit comes from the configuration file's story_point_unit setting (default: "days"),
- the simulation engine resolves the Story Point value to actual min, expected, and max values from the active configuration.
If the chosen Story Point value does not exist in the active configuration, simulation raises an error.
YAML example¶
TOML example¶
Symbolic estimate mappings in configuration¶
Both t_shirt_size and story_points are symbolic estimate forms. They are converted to numeric ranges by the active configuration, using the unit specified by t_shirt_size_unit (default: "hours") and story_point_unit (default: "days") respectively. All values are then converted to hours internally.
Built-in defaults exist for both styles, and a custom configuration file may override all or only some of those mappings.
Example configuration¶
t_shirt_sizes:
  story:
    M:
      low: 45
      expected: 65
      high: 130
  epic:
    M:
      low: 240
      expected: 520
      high: 1400
t_shirt_size_default_category: epic
story_points:
  5:
    low: 4
    expected: 6
    high: 9
  8:
    low: 6
    expected: 9
    high: 16
If a custom configuration overrides only some T-shirt sizes or Story Point values, the remaining built-in defaults stay available.
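Token resolution for T-shirt sizes might look like the sketch below. It is an illustration only: the function resolve_t_shirt_size, the alias table, and the short size codes other than M are assumptions, and the numeric ranges are copied from the example configuration rather than from the built-in defaults.

```python
# Ranges copied from the example configuration; not the built-in defaults.
T_SHIRT_SIZES = {
    "story": {"M": {"low": 45, "expected": 65, "high": 130}},
    "epic": {"M": {"low": 240, "expected": 520, "high": 1400}},
}
# Long-form aliases mapped to short codes (the short codes are an assumption).
ALIASES = {"EXTRA_SMALL": "XS", "SMALL": "S", "MEDIUM": "M",
           "LARGE": "L", "EXTRA_LARGE": "XL", "EXTRA_EXTRA_LARGE": "XXL"}


def resolve_t_shirt_size(token, sizes=T_SHIRT_SIZES, default_category="epic"):
    """Resolve 'M', 'epic.M', or 'Medium' to a low/expected/high mapping."""
    category, _, size = token.rpartition(".")
    category = category.lower() or default_category  # bare size: default category
    size = ALIASES.get(size.upper(), size.upper())
    try:
        return sizes[category][size]
    except KeyError:
        raise ValueError(f"unknown T-shirt size: {token!r}") from None
```

With t_shirt_size_default_category set to epic, a bare M resolves to the epic range, while story.M selects the story range explicitly.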
The dependencies field¶
dependencies is a list of task IDs that must complete before the current task can start.
Dependencies drive the critical-path analysis and scheduling. When the dependency graph is shallow (few dependencies), many tasks can run in parallel, and the project duration is limited mainly by the longest single task. When the graph is deep (long chains of dependent tasks), the critical path grows and the overall schedule becomes more sensitive to individual task delays. Expressing dependencies accurately is therefore important: underspecifying them produces over-optimistic forecasts; overspecifying them produces unnecessarily conservative ones.
Supported syntax¶
Validation rules¶
The project validator enforces that:
- every dependency ID must match an existing task ID,
- task IDs must be unique,
- no circular dependency may exist,
- at least one task must exist overall.
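These four checks can be reproduced with the standard library. The sketch below is not the project validator itself; validate_dependencies is a hypothetical helper that takes tasks as plain dictionaries and returns one valid start order.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dependencies(tasks):
    """tasks: list of dicts with "id" and optional "dependencies"."""
    if not tasks:
        raise ValueError("at least one task must exist")
    ids = [task["id"] for task in tasks]
    if len(ids) != len(set(ids)):
        raise ValueError("task IDs must be unique")
    graph = {task["id"]: set(task.get("dependencies", [])) for task in tasks}
    for task_id, deps in graph.items():
        unknown = deps - set(graph)
        if unknown:
            raise ValueError(f"{task_id} depends on unknown task(s): {sorted(unknown)}")
    try:
        # static_order() lists predecessors before the tasks that need them.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from None
```

A topological order exists exactly when the graph has no cycle, so the same call both detects circular dependencies and yields a feasible sequencing.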
Example¶
tasks:
  - id: "task_001"
    name: "Backend design"
    estimate:
      low: 2
      expected: 4
      high: 6
  - id: "task_002"
    name: "API implementation"
    estimate:
      low: 5
      expected: 8
      high: 12
    dependencies: ["task_001"]
The uncertainty_factors field¶
At task level, uncertainty_factors is modeled as a structured object with a fixed set of recognized fields.
Uncertainty factors apply a multiplier to a task's sampled duration before it enters the schedule. They let you express qualitative risk signals — "the team is junior", "the requirements are immature", "this task integrates with multiple external systems" — without converting those signals into numeric estimates by hand. The multipliers for each factor level are defined in the configuration file; the project file simply assigns a level to each factor.
All five factors are optional and each defaults to its baseline level ("medium" for four of them, "colocated" for team_distribution), so you only need to specify factors where a task deviates from a typical baseline. Omitting uncertainty_factors entirely applies the full set of defaults.
Recognized fields¶
| Field | Default |
|---|---|
| team_experience | "medium" |
| requirements_maturity | "medium" |
| technical_complexity | "medium" |
| team_distribution | "colocated" |
| integration_complexity | "medium" |
YAML example¶
uncertainty_factors:
  team_experience: "high"
  requirements_maturity: "medium"
  technical_complexity: "high"
  team_distribution: "distributed"
  integration_complexity: "medium"
TOML example¶
[tasks.uncertainty_factors]
team_experience = "high"
requirements_maturity = "medium"
technical_complexity = "high"
team_distribution = "distributed"
integration_complexity = "medium"
Important implementation note¶
Only the five fields above are represented by the current project model and used by the simulation engine.
If you add other names under uncertainty_factors, they are not part of the supported project-file reference and should not be relied on as active inputs.
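The defaulting behaviour can be pictured as a simple merge. In this sketch the function resolve_uncertainty_factors is hypothetical, and silently dropping unrecognized names is a choice made for the illustration, not a statement about how the model treats them.

```python
# Defaults taken from the table of recognized fields.
DEFAULT_FACTORS = {
    "team_experience": "medium",
    "requirements_maturity": "medium",
    "technical_complexity": "medium",
    "team_distribution": "colocated",
    "integration_complexity": "medium",
}


def resolve_uncertainty_factors(specified=None):
    """Fill in defaults for any recognized factor the task does not set."""
    specified = specified or {}
    # Only the five recognized names are kept; other keys are ignored here.
    return {name: specified.get(name, default)
            for name, default in DEFAULT_FACTORS.items()}
```

A task that sets only team_experience therefore still ends up with all five factors resolved.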
The resources field inside a task¶
Each task may list resource names as strings:
Task-level resources is typed as a list of strings in the current model.
Resource assignment rule with max_resources¶
When resources lists multiple names, the scheduler may still assign fewer resources:
- assignment at task start is capped by max_resources (default 1),
- scheduler applies an automatic practical cap:
  - granularity_cap = max(1, floor(task_effort_hours / 16.0))
  - coordination_cap = 3
  - practical_cap = min(granularity_cap, coordination_cap)
- effective start-time assignment is: min(max_resources, practical_cap, eligible_available_resources_now).
This avoids unrealistic over-assignment on short tasks while still permitting parallelization on larger tasks.
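The cap arithmetic is small enough to check by hand. The sketch below transcribes the formulas from this section; only the function name resources_assigned_at_start and its parameter names are invented for the example.

```python
import math


def resources_assigned_at_start(task_effort_hours, max_resources=1,
                                eligible_available_now=1):
    """Number of resources assigned when the task starts."""
    granularity_cap = max(1, math.floor(task_effort_hours / 16.0))
    coordination_cap = 3
    practical_cap = min(granularity_cap, coordination_cap)
    return min(max_resources, practical_cap, eligible_available_now)
```

An 8-hour task gets one resource regardless of max_resources, a 40-hour task gets at most two, and anything of 48 hours or more is limited to three by the coordination cap.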
Important behavior for schema users:
- assignment happens at task start only,
- assigned resources remain fixed for the task execution,
- no mid-task reassignment/swapping is performed.
Also, if you explicitly list resource names and set min_experience_level, each named resource must meet that minimum or validation fails.
The risks field inside a task¶
Each task may have zero or more task-level risks.
A risk models a discrete event that may or may not occur during the task. In each Monte Carlo iteration, the engine draws a Bernoulli sample for each risk using its probability. When the risk fires, the impact is added to the task's sampled duration. This cleanly separates planned estimation uncertainty (captured by the estimate range) from identified discrete risks (captured here). It lets you quantify the schedule impact of a risk without baking it unconditionally into the estimate.
Task-level risks are appropriate for events that affect only one task — for example, "the third-party API may introduce a breaking change during this task" at probability 0.15. For events that could affect the whole project — for example, "key stakeholder may request a scope change" — use project_risks instead.
For the complete risk grammar see Formal Grammar — <risk_properties>.
Task-level risk object fields¶
| Field | Required | Type | Notes |
|---|---|---|---|
| id | Yes | string | Risk identifier |
| name | Yes | string | Display name |
| probability | Yes | float | Must be between 0.0 and 1.0 |
| impact | Yes | number or object | See below |
| description | No | string | Optional |
Risk impact syntax¶
The impact field supports two forms.
1. Simple numeric impact¶
This is interpreted as an absolute time penalty.
2. Structured impact object¶
This may be either absolute or percentage.
risks:
  - id: "risk_001"
    name: "Architecture rework"
    probability: 0.20
    impact:
      type: "absolute"
      value: 5
      unit: "days"

risks:
  - id: "risk_002"
    name: "Approval delay"
    probability: 0.10
    impact:
      type: "percentage"
      value: 15
Validation rules for risks¶
The model enforces:
- probability must be between 0.0 and 1.0,
- impact.value must be greater than 0 when object syntax is used,
- numeric impacts are converted to floats,
- structured impacts must use type: "absolute" or type: "percentage".
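The per-iteration treatment of risks can be sketched as follows. Several details here are assumptions made for the example: the helper name apply_risks, reading a plain numeric impact as hours, applying a percentage impact to the task's sampled duration, and the five-day week used for "weeks".

```python
import random


def apply_risks(sampled_hours, risks, rng=None, hours_per_day=8.0):
    """Add the impact of every risk that fires in this iteration."""
    rng = rng or random.Random()
    unit_hours = {"hours": 1.0, "days": hours_per_day,
                  "weeks": 5 * hours_per_day}  # 5-day week is an assumption
    total = sampled_hours
    for risk in risks:
        # Bernoulli draw: the risk fires with the given probability.
        if rng.random() >= risk["probability"]:
            continue
        impact = risk["impact"]
        if isinstance(impact, (int, float)):
            total += float(impact)  # plain number: absolute penalty (assumed hours)
        elif impact["type"] == "absolute":
            total += impact["value"] * unit_hours[impact.get("unit", "hours")]
        elif impact["type"] == "percentage":
            total += sampled_hours * impact["value"] / 100.0
        else:
            raise ValueError('impact type must be "absolute" or "percentage"')
    return total
```

Because the draw is repeated in every iteration, a risk with probability 0.20 contributes its impact in roughly one iteration out of five, which shifts the upper percentiles more than the median.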
The project_risks section¶
project_risks has exactly the same syntax as task-level risks, but it appears at top level and applies to the project as a whole.
When a project-level risk fires in a Monte Carlo iteration, its impact is added to the total elapsed project duration (on top of the scheduled task chain). Use project_risks for cross-cutting uncertainties — late design freezes, vendor delays, or regulatory response times — that cannot be cleanly attributed to a single task.
Example¶
project_risks:
  - id: "proj_risk_001"
    name: "Requirements change"
    probability: 0.30
    impact:
      type: "absolute"
      value: 10
      unit: "days"
    description: "Late business scope change"
The top-level resources section¶
Adding a resources section switches the scheduler from dependency-only mode to resource-constrained mode. In dependency-only mode the scheduler assumes unlimited workforce and sequences tasks purely by dependencies. In resource-constrained mode each task can only start when both its dependencies are satisfied and a suitable resource is available. This produces longer but more realistic schedules whenever the team is genuinely the bottleneck.
If you prefer not to enumerate individual resources, you can instead set project.team_size; that many default resources are then generated automatically.
Each resources entry supports the following fields:
Supported fields¶
| Field | Required | Type | Default | Constraints |
|---|---|---|---|---|
| name | No | string | auto-generated | Unique across all resolved resource names |
| id | No | string | null | Legacy fallback for name; still accepted |
| experience_level | No | integer | 2 | Must be 1, 2, or 3 |
| productivity_level | No | float | 1.0 | 0.1 to 2.0 |
| sickness_prob | No | float | 0.0 | 0.0 to 1.0 |
| planned_absence | No | list of dates | [] | ISO YYYY-MM-DD format |
| calendar | No | string | "default" | Must match a calendar id if calendars are defined |
| availability | No | float | 1.0 | Must be in (0, 1] |
Note
If name is omitted, the engine auto-generates a unique name (resource_001, resource_002, …). The legacy id field is still accepted and used as a fallback for name.
YAML example¶
resources:
  - name: "backend_dev"
    experience_level: 3
    productivity_level: 1.1
    sickness_prob: 0.05
    planned_absence:
      - "2026-07-01"
      - "2026-07-02"
  - experience_level: 2
    productivity_level: 0.9
    sickness_prob: 0.08
The top-level calendars section¶
Calendars control when resources are available. Without a calendar definition, the scheduler uses an 8-hour / 5-day working week. Defining a calendar lets you model public holidays, reduced-hour days, or non-standard working weeks. Each resource can reference a specific calendar by its id; resources that do not specify one fall back to the default calendar.
Each calendars entry supports the following fields:
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| id | No | string | "default" | Must be unique across all calendars |
| work_hours_per_day | No | float > 0 | 8.0 | Number of working hours in a day |
| work_days | No | list of integers | [1, 2, 3, 4, 5] | Days of the week; 1 = Monday, 7 = Sunday |
| holidays | No | list of ISO dates | [] | Dates in YYYY-MM-DD format |
YAML example¶
calendars:
  - id: "standard"
    work_hours_per_day: 8
    work_days: [1, 2, 3, 4, 5]
    holidays:
      - "2026-12-25"
      - "2026-12-26"
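To make the work_days numbering concrete, here is a minimal sketch of a working-day test against a calendar like the one in the example. The function is_working_day is hypothetical and ignores resource-level absences.

```python
from datetime import date


def is_working_day(day, work_days=(1, 2, 3, 4, 5), holidays=()):
    """True if the date is a listed work day and not a calendar holiday."""
    # date.isoweekday() uses the same numbering: 1 = Monday, 7 = Sunday.
    return day.isoweekday() in work_days and day.isoformat() not in holidays
```

With the holidays from the example, date(2026, 12, 24) counts as a working day, while date(2026, 12, 25) does not.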
The sprint_planning section¶
The optional sprint_planning section activates sprint-based simulation mode. When present and enabled: true, the engine models work as a sequence of fixed-length sprints rather than a single elapsed duration. Each sprint draws a velocity from the historical distribution, places as many backlog items as will fit, and carries the rest forward to the next sprint. The output is a distribution over the number of sprints (and hence calendar weeks) needed to complete the backlog, rather than a raw elapsed-hours distribution.
Use sprint planning when the team works in fixed-length iterations and tracks velocity, whether measured in story points or tasks completed per sprint. The historical sprint data you provide is the primary input: the simulator fits a velocity distribution to it and samples from that distribution in each Monte Carlo iteration.
Note
Sprint planning requires at least two usable historical sprint entries (entries with a positive delivery signal). The capacity_mode field controls whether the unit of delivery is story points or task counts, and must be consistent across all history entries and task backlogs.
For a comprehensive walkthrough of sprint planning features, configuration options, and interpretation of results, see Sprint Planning.
Top-level sprint planning fields¶
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| enabled | No | boolean | false | When false, the section is parsed but not simulated |
| sprint_length_weeks | Yes | integer > 0 | (none) | Duration of each sprint in calendar weeks |
| capacity_mode | Yes | story_points or tasks | (none) | Unit family for all velocity and backlog measurements |
| planning_confidence_level | No | float (0, 1) | 0.80 | Percentile of the sprint count distribution reported |
| velocity_model | No | empirical or neg_binomial | empirical | Distribution fitted to historical velocity observations |
| removed_work_treatment | No | churn_only or reduce_backlog | churn_only | How removed sprint items affect the net backlog |
| history | No | list or external descriptor | [] | Historical sprint data; see below |
| future_sprint_overrides | No | list | [] | Forward-looking capacity adjustments |
| volatility_overlay | No | object | disabled | Sprint-level disruption model |
| spillover | No | object | disabled | Task-level spillover model |
| sickness | No | object | disabled | Per-person sickness model |
velocity_model values¶
| Value | Meaning |
|---|---|
| empirical | Resample directly from observed historical velocities |
| neg_binomial | Fit a negative-binomial distribution to history and sample from that |
removed_work_treatment values¶
| Value | Meaning |
|---|---|
| churn_only | Removed items are treated as pure churn with no net effect on remaining backlog |
| reduce_backlog | Removed items reduce the total backlog to be completed |
Minimal example¶
sprint_planning:
  enabled: true
  sprint_length_weeks: 2
  capacity_mode: story_points
  history:
    - sprint_id: "SPR-001"
      completed_story_points: 10
      spillover_story_points: 1
    - sprint_id: "SPR-002"
      completed_story_points: 9
      spillover_story_points: 2
    - sprint_id: "SPR-003"
      completed_story_points: 11
For a tasks-based project, replace completed_story_points with completed_tasks and set capacity_mode: tasks.
Sprint history¶
Historical sprint data is the empirical foundation of the sprint planning simulation. Rather than asking you to specify a single velocity, mcprojsim fits a probability distribution to your team's observed sprint outcomes and samples from that distribution in each Monte Carlo iteration. The result is a forecast of "how many sprints will this backlog require?" expressed as a confidence interval, not a single number.
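The idea behind the empirical velocity model can be sketched in a few lines. This is a deliberately simplified illustration: it treats each entry's completed amount as the velocity and ignores spillover, added and removed work, overrides, and the other overlays. The function name sprints_needed and its parameters are hypothetical.

```python
import math
import random


def sprints_needed(backlog, velocities, iterations=2000,
                   confidence=0.80, seed=None):
    """Sprint count at the given confidence level, by resampling history."""
    usable = [v for v in velocities if v > 0]
    if len(usable) < 2:
        raise ValueError("at least two usable historical sprints are required")
    rng = random.Random(seed)
    counts = []
    for _ in range(iterations):
        remaining, sprints = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(usable)  # draw one historical velocity
            sprints += 1
        counts.append(sprints)
    counts.sort()
    # Smallest sprint count that covers the requested share of iterations.
    index = max(0, math.ceil(confidence * len(counts)) - 1)
    return counts[index]
```

With a perfectly stable history such as [10, 10, 10] and a backlog of 35 points, every iteration needs four sprints; a more variable history widens the distribution of sprint counts.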
Why history matters¶
Two sprints of history give the simulator just enough to estimate the spread of the velocity distribution. More history reduces sampling uncertainty and produces tighter, more reliable confidence intervals. A minimum of two usable entries — entries where the delivery signal (completed points or tasks) is greater than zero — is required when enabled: true.
The velocity the simulator works with is not simply completed_story_points. For each historical sprint it computes the effective velocity as:
This means spillover_story_points, added_story_points, and removed_story_points all influence the distribution the simulator samples from. The more accurately you record them, the more representative the velocity distribution will be.
Unit-family consistency¶
All history entries in a project file must belong to the same unit family as capacity_mode. You cannot mix completed_story_points entries with completed_tasks entries in a single history list.
Inline history format¶
Each history list entry represents one completed sprint. Fields marked Conditional are required only for the chosen capacity_mode.
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| sprint_id | Yes | string | (none) | Unique string identifier for the sprint; must be non-empty |
| sprint_length_weeks | No | integer > 0 | inherits top-level sprint_length_weeks | Override sprint length for this individual entry |
| completed_story_points | Conditional | float ≥ 0 | (none) | Completed capacity; required when capacity_mode: story_points |
| completed_tasks | Conditional | integer ≥ 0 | (none) | Completed capacity; required when capacity_mode: tasks |
| spillover_story_points | No | float ≥ 0 | 0 | Story points that carried over from the previous sprint into this one |
| spillover_tasks | No | integer ≥ 0 | 0 | Tasks that carried over from the previous sprint into this one |
| added_story_points | No | float ≥ 0 | 0 | Story points added to the sprint backlog mid-sprint |
| added_tasks | No | integer ≥ 0 | 0 | Tasks added to the sprint backlog mid-sprint |
| removed_story_points | No | float ≥ 0 | 0 | Story points removed from the sprint backlog mid-sprint |
| removed_tasks | No | integer ≥ 0 | 0 | Tasks removed from the sprint backlog mid-sprint |
| holiday_factor | No | float > 0 | 1.0 | Capacity reduction from public holidays; 0.8 means 20% reduction |
| end_date | No | ISO date string | null | Date the sprint ended; used for timeline charts and calendar alignment |
| team_size | No | integer ≥ 0 | null | Actual team headcount during this sprint |
| notes | No | string | null | Free-text annotation; not used by the simulator |
Note
Story-point and task-count fields are mutually exclusive within a single entry. Do not mix completed_story_points with spillover_tasks, added_tasks, or removed_tasks in the same entry, or vice versa. The validator will reject it.
Story-point mode example¶
sprint_planning:
  enabled: true
  sprint_length_weeks: 2
  capacity_mode: story_points
  history:
    - sprint_id: "SPR-001"
      completed_story_points: 34
      spillover_story_points: 2
    - sprint_id: "SPR-002"
      completed_story_points: 28
      spillover_story_points: 4
      added_story_points: 3
    - sprint_id: "SPR-003"
      completed_story_points: 31
      removed_story_points: 2
      holiday_factor: 0.9
      end_date: "2026-03-14"
      notes: "Public holiday reduced capacity"
Task-count mode example¶
sprint_planning:
  enabled: true
  sprint_length_weeks: 1
  capacity_mode: tasks
  history:
    - sprint_id: "WK-01"
      completed_tasks: 7
      spillover_tasks: 1
    - sprint_id: "WK-02"
      completed_tasks: 6
      spillover_tasks: 2
      added_tasks: 1
    - sprint_id: "WK-03"
      completed_tasks: 8
External history format¶
For teams that maintain sprint data in a separate file, mcprojsim can load history from an external JSON or CSV source instead of an inline list:
sprint_planning:
  enabled: true
  sprint_length_weeks: 2
  capacity_mode: story_points
  history:
    format: json
    path: "sprint_planning_history.json"
Supported formats:
| Value | Description |
|---|---|
| json | A JSON file containing either a top-level array of sprint objects, or an object with a sprints key whose value is that array |
| csv | A CSV file with a header row; column names must match the field names in the table above |
The external file must use the same field names as the inline entries. See Sprint Planning for complete JSON and CSV shape examples.
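The two accepted JSON shapes can be normalised with a few lines of standard-library code. The loader below is a sketch with a hypothetical name, load_history_json; it only unwraps the structure and does not validate individual sprint fields.

```python
import json


def load_history_json(text):
    """Return the list of sprint objects from either accepted JSON shape."""
    data = json.loads(text)
    if isinstance(data, dict):
        data = data.get("sprints")  # object form: {"sprints": [...]}
    if not isinstance(data, list):
        raise ValueError('expected a JSON array or an object with a "sprints" array')
    return data
```

Both '[{"sprint_id": "SPR-001"}]' and '{"sprints": [{"sprint_id": "SPR-001"}]}' produce the same list of entries.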
future_sprint_overrides¶
Forward-looking sprint overrides let you express known capacity variations in upcoming sprints, for example a holiday week that will reduce capacity by 20%, or a sprint where the team is larger.
Each override must identify its target sprint via at least one of sprint_number or start_date.
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| `sprint_number` | Conditional | integer > 0 | `null` | 1-based sprint number relative to the start of simulation |
| `start_date` | Conditional | date string | `null` | ISO date of the sprint start |
| `holiday_factor` | No | float > 0 | `1.0` | Capacity scaling due to holidays |
| `capacity_multiplier` | No | float > 0 | `1.0` | Additional scaling factor (e.g. 0.5 for a half-team sprint) |
| `notes` | No | string | `null` | Annotation |
The effective capacity for an overridden sprint is the baseline capacity multiplied by holiday_factor × capacity_multiplier.
future_sprint_overrides:
  - sprint_number: 3
    holiday_factor: 0.8
    notes: "Easter week"
  - start_date: "2026-06-15"
    capacity_multiplier: 0.5
    notes: "Half the team at off-site"
Task fields used by sprint planning¶
When sprint_planning is active, three additional task fields become relevant:
| Field | Type | `capacity_mode` | Notes |
|---|---|---|---|
| `planning_story_points` | integer > 0 | `story_points` | Story point size for sprint planning; overrides `estimate.story_points` when set |
| `priority` | integer | both | Scheduling priority hint; lower values are allocated to sprints first |
| `spillover_probability_override` | float 0.0–1.0 | both | Per-task spillover probability; overrides the model default |
Configuration interaction¶
The sprint_defaults section in the configuration file supplies default values for every sprint_planning parameter. Any value set directly in the project file's sprint_planning section overrides the corresponding config default. This means you can tune velocity models, sickness parameters, and spillover behaviour once in a shared config file and override only sprint-specific values in each project file.
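The layering described above can be sketched as two files. The `sprint_defaults` keys are taken from the configuration skeleton in this chapter. The project-level `planning_confidence_level` key is an assumption: this section implies that project keys mirror the config keys but does not list them, so check the Sprint Planning page before relying on it.

```yaml
# config.yaml (shared across projects)
sprint_defaults:
  planning_confidence_level: 0.80
  velocity_model: "empirical"
---
# project.yaml (overrides one value for this project only)
sprint_planning:
  enabled: true
  sprint_length_weeks: 2
  capacity_mode: story_points
  planning_confidence_level: 0.90   # assumed key name, mirroring the config key
```

Under that assumption, this project would plan at 0.90 confidence while still inheriting `velocity_model: "empirical"` from the shared config.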
Full YAML example¶
The following example demonstrates every currently recognized project-file section in one file.
project:
  name: "Reference Example"
  description: "Comprehensive project file example"
  start_date: "2026-04-01"
  currency: "USD"
  confidence_levels: [50, 75, 80, 90, 95]
  probability_red_threshold: 0.45
  probability_green_threshold: 0.90
project_risks:
  - id: "proj_risk_001"
    name: "Late stakeholder change"
    probability: 0.20
    impact:
      type: "absolute"
      value: 4
      unit: "days"
tasks:
  - id: "task_001"
    name: "Design"
    description: "Design the feature set"
    estimate:
      low: 2
      expected: 4
      high: 7
      unit: "days"
    dependencies: []
    uncertainty_factors:
      team_experience: "high"
      requirements_maturity: "medium"
    resources: ["designer"]
    risks:
      - id: "task_risk_001"
        name: "Clarification delay"
        probability: 0.15
        impact: 1.5
  - id: "task_002"
    name: "Implementation"
    estimate:
      distribution: "lognormal"
      low: 3
      expected: 8
      high: 20
      unit: "days"
    dependencies: ["task_001"]
    uncertainty_factors:
      team_experience: "medium"
      technical_complexity: "high"
      team_distribution: "distributed"
    resources: ["backend_dev", "frontend_dev"]
  - id: "task_003"
    name: "Deployment"
    estimate:
      t_shirt_size: "S"
    dependencies: ["task_002"]
resources:
  - name: "designer"
  - name: "backend_dev"
  - name: "frontend_dev"
calendars:
  - id: "standard"
    work_days: [1, 2, 3, 4, 5]
    holidays:
      - "2026-12-25"
Full TOML example¶
The same logical content can be written in TOML. The most important syntax difference is that repeated entries such as tasks and risks use array-of-table syntax.
[project]
name = "Reference Example"
description = "Comprehensive project file example"
start_date = "2026-04-01"
currency = "USD"
confidence_levels = [50, 75, 80, 90, 95]
probability_red_threshold = 0.45
probability_green_threshold = 0.90
[[project_risks]]
id = "proj_risk_001"
name = "Late stakeholder change"
probability = 0.20
[project_risks.impact]
type = "absolute"
value = 4
unit = "days"
[[tasks]]
id = "task_001"
name = "Design"
description = "Design the feature set"
dependencies = []
resources = ["designer"]
[tasks.estimate]
low = 2
expected = 4
high = 7
unit = "days"
[tasks.uncertainty_factors]
team_experience = "high"
requirements_maturity = "medium"
[[tasks.risks]]
id = "task_risk_001"
name = "Clarification delay"
probability = 0.15
impact = 1.5
[[tasks]]
id = "task_002"
name = "Implementation"
dependencies = ["task_001"]
resources = ["backend_dev", "frontend_dev"]
[tasks.estimate]
distribution = "lognormal"
low = 3
expected = 8
high = 20
unit = "days"
[tasks.uncertainty_factors]
team_experience = "medium"
technical_complexity = "high"
team_distribution = "distributed"
[[tasks]]
id = "task_003"
name = "Deployment"
dependencies = ["task_002"]
[tasks.estimate]
t_shirt_size = "S"
[[resources]]
name = "designer"
[[resources]]
name = "backend_dev"
[[resources]]
name = "frontend_dev"
[[calendars]]
id = "standard"
work_days = [1, 2, 3, 4, 5]
holidays = ["2026-12-25"]
Configuration file reference¶
The project file defines the work being simulated. The configuration file defines how mcprojsim interprets uncertainty, symbolic estimates, reporting defaults, and staffing analysis.
Use a configuration file when you want to:
- change uncertainty multipliers,
- override T-shirt size mappings,
- override Story Point mappings,
- set simulation and report defaults,
- tune staffing analysis behavior.
Unlike project files, the configuration file is currently loaded as YAML.
How configuration loading works¶
When you pass --config config.yaml, the loader:
- starts from the built-in default configuration,
- reads your YAML file,
- merges your values into the defaults recursively,
- validates the result against the `Config` model.
That means you can override only the values you care about. For example, if you provide only t_shirt_sizes.story.M, the built-in definitions for the other story sizes (XS, S, L, XL, and XXL) and for every other category remain available.
Top-level configuration structure¶
The current configuration schema supports these top-level sections:
- `uncertainty_factors`
- `t_shirt_sizes`
- `t_shirt_size_unit`
- `t_shirt_size_default_category`
- `story_points`
- `story_point_unit`
- `lognormal`
- `simulation`
- `output`
- `staffing`
- `constrained_scheduling`
- `sprint_defaults`
Minimal configuration example¶
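As a minimal sketch, a configuration file only needs the keys you want to change, because everything else is merged in from the built-in defaults. The specific values below are illustrative, not recommendations.

```yaml
# config.yaml: override a few defaults, inherit the rest
simulation:
  default_iterations: 5000   # fewer iterations for quicker exploratory runs
  random_seed: 12345         # fixed seed for reproducible results
output:
  formats: ["json"]          # export JSON only
```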
Full configuration skeleton¶
uncertainty_factors:
  team_experience:
    high: 0.90
    medium: 1.0
    low: 1.30
  requirements_maturity:
    high: 1.0
    medium: 1.15
    low: 1.40
  technical_complexity:
    low: 1.0
    medium: 1.20
    high: 1.50
  team_distribution:
    colocated: 1.0
    distributed: 1.25
  integration_complexity:
    low: 1.0
    medium: 1.15
    high: 1.35
t_shirt_sizes:
  story:
    XS:
      low: 3
      expected: 5
      high: 15
    M:
      low: 40
      expected: 60
      high: 120
  epic:
    M:
      low: 200
      expected: 480
      high: 1200
t_shirt_size_unit: "hours"
t_shirt_size_default_category: "epic"
story_points:
  1:
    low: 0.5
    expected: 1
    high: 3
  5:
    low: 3
    expected: 5
    high: 8
story_point_unit: "days"
lognormal:
  high_percentile: 95
simulation:
  default_iterations: 10000
  random_seed: null
  max_stored_critical_paths: 20
output:
  formats: ["json", "csv", "html"]
  include_histogram: true
  histogram_bins: 50
  critical_path_report_limit: 2
staffing:
  min_individual_productivity: 0.25
  experience_profiles:
    senior:
      productivity_factor: 1.0
      communication_overhead: 0.04
    mixed:
      productivity_factor: 0.85
      communication_overhead: 0.06
    junior:
      productivity_factor: 0.65
      communication_overhead: 0.08
constrained_scheduling:
  assignment_mode: "greedy_single_pass"
  pass1_iterations: 1000
  sickness_prob: 0.0
sprint_defaults:
  planning_confidence_level: 0.80
  velocity_model: "empirical"
  removed_work_treatment: "churn_only"
The uncertainty_factors section¶
This section maps uncertainty factor names to per-level multipliers.
YAML structure¶
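The section is a two-level mapping: factor names on the outside, level names with their multipliers on the inside. A short sketch using two of the built-in factors and their default values:

```yaml
uncertainty_factors:
  team_experience:      # factor name
    high: 0.90          # level name: multiplier
    medium: 1.0
    low: 1.30
  team_distribution:
    colocated: 1.0
    distributed: 1.25
```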
Supported fields¶
| Field | Required | Type | Default | Notes |
|---|---|---|---|---|
| factor name | No | mapping | built-in defaults | Outer keys are factor names |
| level name | No | float | none at the schema level | Inner keys depend on the factor, for example high, medium, low, colocated, or distributed |
Built-in level names by factor¶
| Factor | Built-in level names |
|---|---|
| `team_experience` | `high`, `medium`, `low` |
| `requirements_maturity` | `high`, `medium`, `low` |
| `technical_complexity` | `low`, `medium`, `high` |
| `team_distribution` | `colocated`, `distributed` |
| `integration_complexity` | `low`, `medium`, `high` |
Built-in factor names¶
The default configuration defines these factor names:
- `team_experience`
- `requirements_maturity`
- `technical_complexity`
- `team_distribution`
- `integration_complexity`
These are also the names used by the current project-file model under tasks[].uncertainty_factors.
Built-in defaults¶
| Factor | High / low-side values | Medium / baseline |
|---|---|---|
| `team_experience` | `high: 0.90`, `low: 1.30` | `medium: 1.0` |
| `requirements_maturity` | `high: 1.0`, `low: 1.40` | `medium: 1.15` |
| `technical_complexity` | `low: 1.0`, `high: 1.50` | `medium: 1.20` |
| `team_distribution` | `colocated: 1.0`, `distributed: 1.25` | not applicable |
| `integration_complexity` | `low: 1.0`, `high: 1.35` | `medium: 1.15` |
Note
The configuration model can parse arbitrary nested dictionaries here, but the current project-file schema only exposes the recognized uncertainty-factor names listed above. Extra factor names in the config file are not useful unless the source model and simulation logic also reference them.
The t_shirt_sizes section¶
This section maps symbolic T-shirt sizes to numeric effort ranges by category. The typical structure looks like this:
t_shirt_sizes:
  story:
    M:
      low: 40
      expected: 60
      high: 120
  epic:
    M:
      low: 200
      expected: 480
      high: 1200
t_shirt_size_default_category: epic
Supported fields for each category/size entry¶
| Field | Required | Type | Default | Constraints |
|---|---|---|---|---|
| `low` | Yes when that size is defined | float | n/a | > 0 |
| `expected` | Yes | float | n/a | > 0 |
| `high` | Yes | float | n/a | > 0 |
Built-in category keys¶
- `bug`
- `story`
- `epic`
- `business`
- `initiative`
Built-in size keys (per category)¶
- `XS`
- `S`
- `M`
- `L`
- `XL`
- `XXL`
Built-in story defaults¶
| Size | `low` | `expected` | `high` |
|---|---|---|---|
| `XS` | 3 | 5 | 15 |
| `S` | 5 | 16 | 40 |
| `M` | 40 | 60 | 120 |
| `L` | 160 | 240 | 500 |
| `XL` | 320 | 400 | 750 |
| `XXL` | 400 | 500 | 1200 |
Example override¶
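A minimal sketch of such an override, replacing only the `M` size in the `story` category. The numbers are illustrative.

```yaml
t_shirt_sizes:
  story:
    M:
      low: 30
      expected: 50
      high: 100
```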
With this override, only story.M changes. Other built-in categories and sizes remain available.
The t_shirt_size_unit field¶
This field controls the unit used for all values in t_shirt_sizes.
Supported values¶
- `"hours"`
- `"days"`
- `"weeks"`
Default¶
"hours"
Example¶
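A short sketch of switching the unit. Because the setting applies to every value in `t_shirt_sizes`, the size mappings would normally be overridden at the same time so the numbers make sense in the new unit. The figures below are illustrative.

```yaml
t_shirt_size_unit: "days"
t_shirt_sizes:
  story:
    M:
      low: 5        # now read as days, not hours
      expected: 8
      high: 15
```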
If a task uses estimate.t_shirt_size: "M", the simulator resolves it through t_shirt_size_default_category. A qualified value like estimate.t_shirt_size: "epic.M" resolves directly to that category.
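The two resolution paths can be sketched in a project file like this. The task IDs and names are hypothetical.

```yaml
tasks:
  - id: "task_010"
    name: "Checkout flow"
    estimate:
      t_shirt_size: "M"        # unqualified: resolved via t_shirt_size_default_category
  - id: "task_011"
    name: "Payments platform"
    estimate:
      t_shirt_size: "epic.M"   # qualified: resolved directly in the epic category
```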
The story_points section¶
This section maps Story Point values to numeric effort ranges.
Supported fields for each point value¶
| Field | Required | Type | Default | Constraints |
|---|---|---|---|---|
| `low` | Yes when that point value is defined | float | n/a | > 0 |
| `expected` | Yes | float | n/a | > 0 |
| `high` | Yes | float | n/a | > 0 |
Built-in point values¶
- `1`
- `2`
- `3`
- `5`
- `8`
- `13`
- `21`
Built-in defaults¶
| Points | `low` | `expected` | `high` |
|---|---|---|---|
| `1` | 0.5 | 1 | 3 |
| `2` | 1 | 2 | 4 |
| `3` | 1.5 | 3 | 5 |
| `5` | 3 | 5 | 8 |
| `8` | 5 | 8 | 15 |
| `13` | 8 | 13 | 21 |
| `21` | 13 | 21 | 34 |
Example override¶
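A minimal sketch of overriding a single point value, with illustrative numbers. Given the recursive merge described under "How configuration loading works", the other built-in point values should remain available.

```yaml
story_points:
  5:
    low: 2
    expected: 4
    high: 7
```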
The story_point_unit field¶
This field controls the unit used for all values in story_points.
Supported values¶
- `"hours"`
- `"days"`
- `"weeks"`
Default¶
"days"
The lognormal section¶
This section controls how the simulation interprets log-normal estimates.
Supported fields¶
| Field | Required | Type | Default | Constraints | Notes |
|---|---|---|---|---|---|
| `high_percentile` | No | integer | `95` | one of `70`, `75`, `80`, `85`, `90`, `95`, `99` | The percentile that `high` is treated as in a log-normal estimate |
Example¶
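A minimal sketch that treats `high` as the 90th percentile instead of the default 95th. The value must be one of the allowed percentiles listed in the table above.

```yaml
lognormal:
  high_percentile: 90
```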
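When a task uses distribution: lognormal, the high value is fitted as this percentile of the resulting distribution. With the same low, expected, and high values, a lower percentile leaves more probability mass above high and so lengthens the right tail; a higher percentile treats high as closer to a worst case and shortens it.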
The simulation section¶
This section controls default simulation behavior.
Supported fields¶
| Field | Required | Type | Default | Constraints | Notes |
|---|---|---|---|---|---|
| `default_iterations` | No | integer | `10000` | `> 0` | Used by commands that rely on config defaults |
| `random_seed` | No | integer or `null` | `null` | none | Set for reproducible runs |
| `max_stored_critical_paths` | No | integer | `20` | `> 0` | Number of full critical path sequences retained in results |
Example¶
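A short sketch using the three documented fields, with illustrative values:

```yaml
simulation:
  default_iterations: 20000        # more iterations for smoother percentiles
  random_seed: 42                  # any integer gives reproducible runs
  max_stored_critical_paths: 10
```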
The output section¶
This section controls reporting and export defaults.
Supported fields¶
| Field | Required | Type | Default | Constraints | Notes |
|---|---|---|---|---|---|
| `formats` | No | list of strings | `["json", "csv", "html"]` | each entry must be `json`, `csv`, or `html`; list must not be empty | Default export formats for config-driven workflows |
| `include_histogram` | No | boolean | `true` | none | Whether histogram data should be included where supported |
| `histogram_bins` | No | integer | `50` | `> 0` | Number of bins for histogram generation |
| `critical_path_report_limit` | No | integer | `2` | `> 0` | Number of stored full critical paths shown in reports by default |
Example¶
output:
  formats: ["json", "html"]
  include_histogram: true
  histogram_bins: 80
  critical_path_report_limit: 5
The staffing section¶
This section controls the staffing analysis added to CLI output and exports.
Supported fields¶
| Field | Required | Type | Default | Constraints | Notes |
|---|---|---|---|---|---|
| `effort_percentile` | No | integer | omitted | `1..99` when set | Uses that effort percentile instead of the mean for staffing calculations |
| `min_individual_productivity` | No | float | `0.25` | `> 0`, `<= 1` | Lower bound on each person's productivity after communication overhead is applied |
| `experience_profiles` | No | mapping | built-in defaults | profile values validated individually | Defines named team profiles |
When effort_percentile is omitted, staffing uses the mean total effort and mean elapsed time. When it is set, staffing uses the matching percentile for both effort and elapsed time, for example P80 effort with P80 elapsed time.
How min_individual_productivity affects team-size efficiency¶
The staffing model assumes that each additional person creates some communication overhead. For a team of size $n$, the model first calculates a raw per-person productivity:
$$ P_{raw}(n) = 1 - c(n - 1) $$
where $c$ is the communication_overhead for the selected experience profile.
That raw value is then floored by min_individual_productivity:
$$ P(n) = \max(P_{min}, P_{raw}(n)) $$
where $P_{min}$ is min_individual_productivity.
This means min_individual_productivity is not a bonus. It is a safety floor. It prevents the model from predicting that people become almost useless, or literally zero-productivity, as team size increases.
The model then converts per-person productivity into total effective capacity:
$$ E(n) = n \cdot P(n) \cdot f $$
where $f$ is the profile's productivity_factor.
Calendar duration for that team size is then:
$$ T(n) = \max\left(T_{cp}, \frac{W}{E(n)}\right) $$
where:
- $T_{cp}$ is the critical-path elapsed time,
- $W$ is the total effort in person-hours,
- $E(n)$ is the effective capacity of the team.
Finally, the Efficiency shown in the staffing table is calculated relative to the fastest team size found for that profile:
$$ \text{Efficiency}(n) = \frac{T_{min}}{T(n)} $$
So min_individual_productivity affects efficiency indirectly:
- if the floor is lower, very large teams lose more effective capacity as communication overhead grows,
- if the floor is higher, large teams retain more capacity and the efficiency drop-off on the right side of the staffing table is less severe,
- if the project is already near the critical-path floor, changing this value may have little visible effect, because no team can compress the schedule below $T_{cp}$ anyway.
Practical interpretation¶
- Small team sizes: `min_individual_productivity` usually does nothing, because raw productivity is still above the floor.
- Medium team sizes: the value may begin to matter if communication overhead becomes significant.
- Large team sizes: this setting determines how harshly the model penalises oversized teams.
For example, with communication_overhead: 0.06, raw individual productivity is:
- 1 person: $1.00$
- 3 people: $1 - 0.06 \cdot 2 = 0.88$
- 8 people: $1 - 0.06 \cdot 7 = 0.58$
- 15 people: $1 - 0.06 \cdot 14 = 0.16$
If min_individual_productivity is 0.25, the 15-person team is floored to $0.25$ instead of dropping to $0.16$. That keeps the team from looking unrealistically ineffective, while still showing diminishing returns.
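To see how the floor carries through to capacity and duration, the same numbers can be pushed through the formulas above using the built-in mixed profile ($c = 0.06$, $f = 0.85$). The workload $W = 1000$ person-hours and critical path $T_{cp} = 100$ hours are assumed values chosen only for illustration.

$$ E(8) = 8 \cdot 0.58 \cdot 0.85 \approx 3.94 $$

$$ E(15) = 15 \cdot 0.25 \cdot 0.85 \approx 3.19 $$

$$ T(8) = \max\left(100, \frac{1000}{3.94}\right) \approx 253.5 $$

$$ T(15) = \max\left(100, \frac{1000}{3.19}\right) \approx 313.7 $$

Under these assumptions the 15-person team has less effective capacity than the 8-person team, so its projected duration is longer even with the floor applied.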
In practice:
- use a lower value when you want the model to penalise oversized teams more aggressively,
- use a higher value when you believe communication overhead is real but should not collapse individual output too sharply,
- keep in mind that this value mainly shapes the right-hand side of the efficiency curve, where teams are larger than the optimal size.
The experience_profiles subsection¶
Each profile name maps to an object with these fields:
| Field | Required | Type | Default | Constraints |
|---|---|---|---|---|
| `productivity_factor` | Yes when that profile is defined | float | n/a | > 0 |
| `communication_overhead` | Yes when that profile is defined | float | n/a | 0..1 |
Built-in profile defaults¶
| Profile | `productivity_factor` | `communication_overhead` |
|---|---|---|
| `senior` | 1.00 | 0.04 |
| `mixed` | 0.85 | 0.06 |
| `junior` | 0.65 | 0.08 |
Example¶
staffing:
  effort_percentile: 80
  min_individual_productivity: 0.30
  experience_profiles:
    senior:
      productivity_factor: 1.0
      communication_overhead: 0.03
    contractor:
      productivity_factor: 0.75
      communication_overhead: 0.05
Configuration validation summary¶
The current configuration model validates these rules directly:
- `t_shirt_size_unit` must be one of `hours`, `days`, or `weeks`,
- `story_point_unit` must be one of `hours`, `days`, or `weeks`,
- `lognormal.high_percentile` must be one of `70`, `75`, `80`, `85`, `90`, `95`, `99`,
- all configured estimate ranges require positive `low`, `expected`, and `high`,
- `output.formats` must be a non-empty list; each entry must be `json`, `csv`, or `html`,
- `simulation.default_iterations` must be greater than 0,
- `simulation.max_stored_critical_paths` must be greater than 0,
- `output.histogram_bins` must be greater than 0,
- `output.critical_path_report_limit` must be greater than 0,
- `staffing.effort_percentile`, when set, must be between 1 and 99,
- `staffing.min_individual_productivity` must be greater than 0 and at most 1,
- `experience_profiles[*].productivity_factor` must be greater than 0,
- `experience_profiles[*].communication_overhead` must be between 0 and 1.
Recommended authoring style for configuration files¶
- override only the values you need,
- keep symbolic estimate mappings consistent with your team's estimation conventions,
- set `random_seed` only when you want reproducible runs,
- use `effort_percentile` when staffing recommendations should be conservative,
- add custom experience profiles only when they correspond to real planning scenarios.
Validation summary¶
The current implementation validates the following rules directly:
- file extension must be `.yaml`, `.yml`, or `.toml`,
- `project` must be present,
- `tasks` must be present,
- there must be at least one task,
- task IDs must be unique,
- all dependencies must point to existing tasks,
- task dependencies must not be circular,
- `start_date` must parse as an ISO date,
- probability thresholds must be in range and ordered correctly,
- triangular estimates must satisfy `low <= expected <= high`,
- log-normal estimates must include `low`, `expected`, and `high`, and must satisfy `low < expected < high`,
- risks must have probabilities in `0.0..1.0`,
- structured risk impacts must use positive values.
Notes on undocumented keys¶
This reference documents the supported keys that are explicitly modeled by the current source code.
Keys outside these structures are not part of the formal project-file reference. In the current implementation, undeclared keys are not a reliable extension mechanism and should not be used to represent important semantics unless and until the source model explicitly supports them.
Recommended authoring style¶
Although multiple forms are accepted, the clearest project files usually follow these practices:
- keep the `project` section concise and metadata-focused,
- use triangular estimates unless log-normal or T-shirt sizing is clearly justified,
- use task IDs that are stable and machine-friendly,
- keep dependency lists explicit,
- use only the recognized uncertainty-factor names,
- use structured risk impacts when you need `percentage` or explicit units,
- treat `resources` and `calendars` as advanced sections whose detailed internal schema may evolve.
If you want to see the reference syntax used in practice, compare this chapter with examples/sample_project.yaml, examples/tshirt_walkthrough_project.yaml, examples/story_points_walkthrough_project.yaml, and examples/project_with_custom_thresholds.yaml.