Natural Language Parser

Overview

NLProjectParser converts semi-structured, informal plain text into valid mcprojsim YAML project files. It sits at the boundary between human-readable input and the structured project model, and is the backend for both the mcprojsim generate CLI command and the MCP server's generate_project_file tool. The parser processes input line by line using a section-based state machine: a header line (e.g. Task 1:, Resource 2: Alice) opens a new section, and subsequent bullet lines fill in that section's properties.

When to use this module: Use NLProjectParser to create project files from natural-language text, for example when prototyping rapidly, converting LLM-generated descriptions, or creating projects programmatically without writing YAML by hand.

Capability	Description
`parse(text)`	Runs the state machine over plain text and returns a `ParsedProject` dataclass
`to_yaml(project)`	Renders a `ParsedProject` to a valid YAML project file string
`parse_and_generate(text)`	Convenience wrapper combining `parse` + `to_yaml` in one call
Flexible separators	Keyword–value separators `:`, `.`, `=`, or space are all equivalent
T-shirt size normalisation	Aliases like `Extra Large`, `2XL`, or `Medium` resolve to canonical sizes
Resources & calendars	Parses `Resource N:` and `Calendar:` sections alongside tasks and sprint planning

Background: Section-based state machine — The parser reads input top to bottom. Any line matching a section-header pattern (Task N:, Resource N:, Calendar:, Sprint planning:, etc.) switches the current active section. Bullet lines (-, *, •) are dispatched to the active section's property handler. Project-level metadata (name, start date, hours per day) is matched anywhere outside a section.

Imports:

from mcprojsim.nl_parser import NLProjectParser

`NLProjectParser`

Converts semi-structured, plain-text project descriptions into valid mcprojsim YAML project files. Also available via the mcprojsim generate CLI command and the MCP server's generate_project_file tool.

from mcprojsim.nl_parser import NLProjectParser

parser = NLProjectParser()

Method	Signature	Description
`parse`	`(text: str) -> ParsedProject`	Extract project metadata and tasks from a text description. Returns a `ParsedProject`. Raises `ValueError` if no tasks are found.
`to_yaml`	`(project: ParsedProject) -> str`	Render a `ParsedProject` as a valid YAML project file string.
`parse_and_generate`	`(text: str) -> str`	Convenience wrapper: calls `parse` then `to_yaml` and returns the YAML string directly.
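
Because `parse` raises `ValueError` when no tasks are found, callers that accept free-form input may want to guard the call. The sketch below shows one way to do that. `generate_or_none` and `SupportsGenerate` are hypothetical names, not part of mcprojsim, and the helper assumes that the `ValueError` from `parse` propagates through `parse_and_generate`, which the table implies but does not state.

```python
from typing import Optional, Protocol


class SupportsGenerate(Protocol):
    """Any object exposing parse_and_generate(text) -> str."""

    def parse_and_generate(self, text: str) -> str: ...


def generate_or_none(parser: SupportsGenerate, text: str) -> Optional[str]:
    """Return the generated YAML, or None when the parser rejects the text.

    Hypothetical helper: assumes the ValueError raised by parse()
    (no tasks found) propagates through parse_and_generate().
    """
    try:
        return parser.parse_and_generate(text)
    except ValueError:
        return None
```

With the real parser this would be called as `generate_or_none(NLProjectParser(), text)`.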

Input format

The parser processes the description line by line using a section-based state machine. A section header line (e.g. Task 1:, Resource 2: Alice, Sprint planning:) opens a new section; subsequent bullet lines (prefixed with -, *, or •) are parsed as properties of that section. Blank lines are ignored. Project-level metadata (name, start date, etc.) can appear anywhere outside a section.
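
The section-based approach described above can be illustrated with a small, self-contained sketch. This is not the mcprojsim implementation: the `HEADER` and `BULLET` patterns and the `split_sections` function are hypothetical, cover only a few header types, and simply group bullet lines under the most recent header. Non-bullet lines that are not headers are collected as project-level metadata.

```python
import re
from collections import defaultdict

# Hypothetical header pattern: "Task 1: Name", "Resource 2: Alice",
# "Calendar: id", "Sprint planning:". The real parser recognises more.
HEADER = re.compile(
    r"^(?P<kind>task|resource|calendar|sprint planning)"
    r"\s*(?P<num>\d+)?\s*:\s*(?P<rest>.*)$",
    re.IGNORECASE,
)
BULLET = re.compile(r"^[-*•]\s+(?P<body>.+)$")
PROJECT_KEY = ("project", None, "")


def split_sections(text: str) -> dict:
    """Group bullet lines under the most recently opened section."""
    sections = defaultdict(list)
    current = PROJECT_KEY  # no section opened yet
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        header = HEADER.match(line)
        if header:
            # A header line switches the active section.
            current = (header["kind"].lower(), header["num"], header["rest"].strip())
            sections[current]  # register the section even if it has no bullets
            continue
        bullet = BULLET.match(line)
        if bullet:
            sections[current].append(bullet["body"].strip())
        else:
            # Anything else is treated as project-level metadata here.
            sections[PROJECT_KEY].append(line)
    return dict(sections)
```

Each key is a `(kind, number, rest-of-header)` tuple, so `Resource 1: Alice` becomes `("resource", "1", "Alice")`. A real parser would then hand each bullet to a property handler for that section type.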

Separators are flexible. In most patterns the separator between keyword and value can be :, ., =, or a space, and the keyword itself is case-insensitive. For example, all of the following are equivalent:

Size: M
Size. M
Size = M
size M
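
One way to accept all of these separator variants is a single regular expression with an alternation for the separator. The pattern below is an illustrative assumption, not the pattern mcprojsim uses; `SIZE_LINE` and `size_value` are hypothetical names.

```python
import re
from typing import Optional

# Keyword, then ":", ".", "=", or plain whitespace, then the value.
# re.IGNORECASE makes the keyword case-insensitive.
SIZE_LINE = re.compile(
    r"^\s*size\s*(?:[:.=]\s*|\s+)(?P<value>\S.*?)\s*$",
    re.IGNORECASE,
)


def size_value(line: str) -> Optional[str]:
    """Return the raw value of a size line, or None if the line is not one."""
    match = SIZE_LINE.match(line)
    return match.group("value") if match else None


for candidate in ("Size: M", "Size. M", "Size = M", "size M"):
    print(repr(candidate), "->", size_value(candidate))
```

All four inputs yield `M`. A keyword with no separator at all (such as `sizeM`) does not match.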

Project-level metadata

These lines can appear before the first task or between sections:

Keyword	Example	Notes
`Project name:` / `Project:`	`Project name: Website Redesign`
`Start date:`	`Start date: 2026-06-01`	ISO 8601 (`YYYY-MM-DD`)
`Description:`	`Description: Q3 infrastructure work`
`Hours per day:`	`Hours per day: 7.5`	Default `8.0`
`Confidence levels:`	`Confidence levels: 50, 80, 90, 95`	Comma-separated percentiles
`Rate:` / `Hourly rate:` / `Default rate:` / `Blended rate:`	`Rate: $150/hour`	Default hourly rate (cost estimation)
`Overhead:` / `Overhead rate:`	`Overhead: 15%`	Overhead multiplier applied to all costs
`Currency:`	`Currency: USD`	Three-letter ISO 4217 currency code
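
To show how values like these might be turned into the typed fields of `ParsedProject`, here is a minimal sketch for two of them. The function names are hypothetical, and the exact value formats the real parser tolerates are not documented here. The sketch only assumes a comma-separated integer list and a rate containing one number.

```python
import re
from typing import List


def parse_confidence_levels(value: str) -> List[int]:
    """Split '50, 80, 90, 95' into a list of integer percentiles."""
    return [int(part) for part in value.split(",") if part.strip()]


def parse_hourly_rate(value: str) -> float:
    """Pull the first number out of a rate such as '$150/hour'."""
    match = re.search(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no numeric rate in {value!r}")
    return float(match.group())
```

For the examples in the table, `parse_confidence_levels("50, 80, 90, 95")` gives `[50, 80, 90, 95]` and `parse_hourly_rate("$150/hour")` gives `150.0`.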

Tasks

A task section starts with Task N: optionally followed by the task name on the same line. Subsequent bullet lines define properties.

Alternatively, when no Task N: headers are present, the parser auto-detects tasks from plain lists:

List format	Example	Numbering
Dot numbered	`1. Design phase`	Preserved from source
Paren numbered	`2) Implementation`	Preserved from source
Bracket numbered	`[3] Testing`	Preserved from source
Bullet (`-`, `*`, `•`)	`- Backend API`	Auto-assigned (1, 2, 3, …)
Hash numbered	`# 1 First task`	Preserved from source

Auto-task mode activates only when no Task N: headers have been seen. The two modes cannot be mixed.

Indented continuation lines under auto-detected tasks are parsed as bullet properties, identical to the explicit-header format.
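
The list formats above could be recognised roughly as follows. This is a simplified sketch under stated assumptions: the patterns and the `detect_tasks` function are hypothetical, indented lines are skipped as continuation properties, and only top-level list items become tasks.

```python
import re
from typing import Iterable, List, Tuple

# Numbered formats keep the number from the source text.
NUMBERED = [
    re.compile(r"^(?P<num>\d+)\.\s+(?P<name>.+)$"),    # 1. Design phase
    re.compile(r"^(?P<num>\d+)\)\s+(?P<name>.+)$"),    # 2) Implementation
    re.compile(r"^\[(?P<num>\d+)\]\s+(?P<name>.+)$"),  # [3] Testing
    re.compile(r"^#\s*(?P<num>\d+)\s+(?P<name>.+)$"),  # # 1 First task
]
# Plain bullets get numbers assigned in order of appearance.
BULLET_ITEM = re.compile(r"^[-*•]\s+(?P<name>.+)$")


def detect_tasks(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (number, name) pairs for top-level list items."""
    tasks = []
    auto_number = 0
    for raw in lines:
        if not raw.strip() or raw[0].isspace():
            continue  # blank line, or indented continuation (a property)
        line = raw.strip()
        for pattern in NUMBERED:
            match = pattern.match(line)
            if match:
                tasks.append((int(match["num"]), match["name"].strip()))
                break
        else:
            match = BULLET_ITEM.match(line)
            if match:
                auto_number += 1
                tasks.append((auto_number, match["name"].strip()))
    return tasks
```

The sketch does not enforce the rule that auto-task mode is disabled once a `Task N:` header has been seen. A fuller version would check for such headers first.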

Inline properties on task lines

When using auto-detected lists, properties can appear inline on the task name line:

Inline pattern	Example	Effect
Bracketed size	`Backend API [XL]`	Sets `t_shirt_size = "XL"`
Parenthesized size	`Frontend (M)`	Sets `t_shirt_size = "M"`
Fuzzy size hint	`probably an M`, `likely L`, `assume S`	Sets `t_shirt_size`
Inline range	`3–5 days`, `2-4 hours`	Sets `low`, `expected`, `high`, `unit`
Inline dependency	`depends on Task 1`	Sets `dependency_refs`

The parser extracts inline properties and strips them from the task name.
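
The extract-and-strip behaviour can be sketched for two of the inline patterns, bracketed or parenthesized sizes and `depends on Task N`. The regular expressions and the `extract_inline` function are illustrative assumptions, not the patterns mcprojsim uses. Fuzzy size hints and inline ranges are left out for brevity.

```python
import re
from typing import Dict, Tuple

SIZE_TOKEN = r"XXL|XL|XS|S|M|L"
BRACKETED_SIZE = re.compile(rf"\s*[\[(]\s*({SIZE_TOKEN})\s*[\])]", re.IGNORECASE)
INLINE_DEP = re.compile(r"\s*,?\s*depends on task\s+(\d+)", re.IGNORECASE)


def extract_inline(task_line: str) -> Tuple[str, Dict[str, object]]:
    """Return (clean task name, extracted properties) for one task line."""
    props: Dict[str, object] = {}
    name = task_line

    size = BRACKETED_SIZE.search(name)
    if size:
        props["t_shirt_size"] = size.group(1).upper()
        name = name[: size.start()] + name[size.end():]  # strip from the name

    dep = INLINE_DEP.search(name)
    if dep:
        props["dependency_refs"] = [f"Task {dep.group(1)}"]
        name = name[: dep.start()] + name[dep.end():]

    return name.strip(), props
```

For example, `extract_inline("Backend API [XL]")` returns `("Backend API", {"t_shirt_size": "XL"})`.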

Task bullet properties

Bullet keyword	Example	Notes
`Name:`	`- Name: Backend API`	The first bullet that matches no other keyword is also accepted as the name
`Size:`	`- Size: M`	See size aliases below
`Story points:` / `Points:`	`- Story points: 5`
`Estimate:`	`- Estimate: 3/5/10 days`	`low/expected/high`, separator `/` `-` or `,`
`Depends on:` / `Depends:` / `Depend on:`	`- Depends on Task 1, Task 3`	References by task number
`Resources:`	`- Resources: Alice, Bob`	Names must match a `Resource N:` header
`Max resources:`	`- Max resources: 2`	Concurrent resource cap
`Min experience:` / `Min experience level:`	`- Min experience: 2`	1–3
`Fixed cost:` / `One-time cost:`	`- Fixed cost: 5000`	One-time cost added regardless of duration
`Risk:` / `Risks:`	`- Risk: Integration failure`	Opens a risk sub-section under this task (see below)

Risk bullet properties (nested under a Risk: line within a task section):

Bullet keyword	Example	Notes
`Probability:`	`- Probability: 20%`	0–100
`Impact:`	`- Impact: 5 days`	Schedule impact; optional unit
`Cost impact:`	`- Cost impact: $10000`	Monetary impact if the risk occurs

Risks can also be described in prose form inside a task: "there is a 15% chance of a 3-day delay" or "a 10% risk of $5000 penalty".
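
Prose risks of this shape lend themselves to pattern matching. The sketch below handles only the two quoted phrasings. The patterns and `parse_risk_sentence` are hypothetical, and the real parser may accept other wordings. The returned keys follow the `ParsedRisk` field names, with percentages divided by 100.

```python
import re
from typing import Dict, Optional

NUMBER = r"\d+(?:\.\d+)?"
PREFIX = rf"(?P<prob>{NUMBER})\s*%\s+(?:chance|risk)\s+of\s+(?:an?\s+)?"

# "15% chance of a 3-day delay"
SCHEDULE_RISK = re.compile(
    PREFIX + rf"(?P<value>{NUMBER})[\s-]*(?P<unit>hour|day|week)s?",
    re.IGNORECASE,
)
# "10% risk of $5000 penalty"
COST_RISK = re.compile(PREFIX + rf"\$\s*(?P<cost>{NUMBER})", re.IGNORECASE)


def parse_risk_sentence(sentence: str) -> Optional[Dict[str, object]]:
    """Extract probability and impact from a prose risk, or return None."""
    cost = COST_RISK.search(sentence)
    if cost:
        return {
            "probability": float(cost["prob"]) / 100,
            "cost_impact": float(cost["cost"]),
        }
    schedule = SCHEDULE_RISK.search(sentence)
    if schedule:
        return {
            "probability": float(schedule["prob"]) / 100,
            "impact_value": float(schedule["value"]),
            "impact_unit": schedule["unit"].lower() + "s",
        }
    return None
```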

Estimate units for explicit estimates: hours / hour / h, days / day / d, weeks / week / w.
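
A three-point estimate such as `3/5/10 days` can be split into its parts with one pattern. The sketch below is a hypothetical illustration. It assumes `days` when no unit is given (matching the `estimate_unit` default), rejects values that are not in ascending order, and keeps `weeks` as its own unit, although the source does not say how the real parser stores week-based estimates.

```python
import re
from typing import Tuple

UNIT_ALIASES = {
    "h": "hours", "hour": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
    "w": "weeks", "week": "weeks", "weeks": "weeks",
}
NUMBER = r"(\d+(?:\.\d+)?)"
SEP = r"\s*[/,-]\s*"  # "/", "," or "-" between the three values
ESTIMATE = re.compile(rf"^\s*{NUMBER}{SEP}{NUMBER}{SEP}{NUMBER}\s*([a-z]+)?\s*$", re.IGNORECASE)


def parse_estimate(value: str) -> Tuple[float, float, float, str]:
    """Return (low, expected, high, unit) for a low/expected/high estimate."""
    match = ESTIMATE.match(value)
    if match is None:
        raise ValueError(f"not a three-point estimate: {value!r}")
    low, expected, high = (float(group) for group in match.groups()[:3])
    unit_word = (match.group(4) or "days").lower()  # assumed default unit
    if unit_word not in UNIT_ALIASES:
        raise ValueError(f"unknown unit: {unit_word!r}")
    if not low <= expected <= high:  # ordering check is an assumption
        raise ValueError("expected low <= expected <= high")
    return low, expected, high, UNIT_ALIASES[unit_word]
```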

T-shirt size aliases — all of the following map to a canonical size:

Canonical	Accepted aliases
`XS`	`XS`, `Extra Small`, `Extrasmall`
`S`	`S`, `Small`
`M`	`M`, `Medium`, `Med`
`L`	`L`, `Large`
`XL`	`XL`, `Extra Large`, `Extralarge`
`XXL`	`XXL`, `Extra Extra Large`, `2XL`
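
The alias table maps naturally onto a dictionary lookup. The sketch below mirrors the table. `SIZE_ALIASES` and `normalise_size` are hypothetical names, and treating values as case-insensitive with collapsed whitespace is an assumption.

```python
from typing import Optional

SIZE_ALIASES = {
    "xs": "XS", "extra small": "XS", "extrasmall": "XS",
    "s": "S", "small": "S",
    "m": "M", "medium": "M", "med": "M",
    "l": "L", "large": "L",
    "xl": "XL", "extra large": "XL", "extralarge": "XL",
    "xxl": "XXL", "extra extra large": "XXL", "2xl": "XXL",
}


def normalise_size(raw: str) -> Optional[str]:
    """Map an alias to its canonical size, or None if it is not recognised."""
    key = " ".join(raw.lower().split())  # lower-case and collapse whitespace
    return SIZE_ALIASES.get(key)
```

For instance, `normalise_size("Extra Large")` returns `"XL"` and `normalise_size("2XL")` returns `"XXL"`.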

Resources

A resource section starts with Resource N: Name. Bullet properties:

Bullet keyword	Example	Notes
`Experience level:` / `Experience:`	`- Experience: 3`	1–3
`Productivity level:` / `Productivity:`	`- Productivity: 1.1`	Multiplier, default `1.0`
`Availability:`	`- Availability: 0.8`	Fraction of full-time, 0–1
`Calendar:`	`- Calendar: part_time`	References a `Calendar:` section ID
`Sickness prob:` / `Sickness:`	`- Sickness: 0.02`	Per-day probability
`Rate:` / `Hourly rate:` / `Cost:`	`- Rate: $120/hour`	Per-resource hourly rate (cost estimation)
`Absence:` / `Planned absence:`	`- Absence: 2026-05-15, 2026-06-01`	Comma-separated ISO dates or date ranges (`2026-05-20 to 2026-05-22`)
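
The `Absence:` value mixes single dates and ranges. The sketch below expands such a value into individual ISO date strings. `expand_absence` is a hypothetical helper, and whether the real parser expands ranges or stores them as written in `planned_absence` is not stated in this document.

```python
from datetime import date, timedelta
from typing import List


def expand_absence(value: str) -> List[str]:
    """Expand '2026-05-15, 2026-05-20 to 2026-05-22' into single ISO dates."""
    days: List[str] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if " to " in part:
            start_text, end_text = (piece.strip() for piece in part.split(" to ", 1))
            start = date.fromisoformat(start_text)
            end = date.fromisoformat(end_text)
            if end < start:
                raise ValueError(f"range ends before it starts: {part!r}")
            current = start
            while current <= end:  # inclusive range
                days.append(current.isoformat())
                current += timedelta(days=1)
        else:
            # fromisoformat validates the date; isoformat normalises it
            days.append(date.fromisoformat(part).isoformat())
    return days
```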

Calendars

A calendar section starts with Calendar: id. Bullet properties:

Bullet keyword	Example	Notes
`Work hours per day:` / `Work hours:`	`- Work hours: 7`
`Work days:`	`- Work days: 1, 2, 3, 4`	Integers, 1=Mon … 7=Sun
`Holidays:`	`- Holidays: 2026-12-25, 2026-12-26`	ISO 8601 dates
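
The `Work days:` numbering (1 = Monday to 7 = Sunday) matches Python's `date.isoweekday()`, which makes it easy to check how a calendar definition would classify a given date. `is_working_day` is a hypothetical helper for illustration and is not part of mcprojsim.

```python
from datetime import date
from typing import Iterable


def is_working_day(day: date, work_days: Iterable[int], holidays: Iterable[str]) -> bool:
    """True if `day` falls on a listed weekday and is not a listed holiday."""
    if day.isoformat() in set(holidays):
        return False
    # isoweekday(): Monday is 1 and Sunday is 7, as in the Work days list
    return day.isoweekday() in set(work_days)
```

With `work_days=[1, 2, 3, 4, 5]` and `holidays=["2026-05-25"]`, Tuesday 2026-05-26 is a working day, while Monday 2026-05-25 (holiday) and Saturday 2026-05-23 are not.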

Sprint planning

A Sprint planning: header opens the sprint planning section. Bullet properties:

Bullet keyword	Example	Notes
`Sprint length:`	`- Sprint length: 2`	Weeks; also `2-week sprints`
`Capacity mode:`	`- Capacity mode: story points`	`story points` or `tasks`
`Planning confidence level:`	`- Planning confidence level: 80%`
`Velocity model:`	`- Velocity model: empirical`	`empirical` or `neg_binomial`
`Removed work treatment:`	`- Removed work treatment: churn_only`	`churn_only` or `reduce_backlog`
`Sickness:`	`- Sickness: enabled`	`enabled`/`disabled`/`on`/`off`/`yes`/`no`/`true`/`false`; also `No sickness`
`Sickness team size:`	`- Sickness team size: 6`
`Sickness probability per person per week:`	`- Sickness probability: 5%`
`Sickness duration log mu:`	`- Sickness duration log mu: 1.1`
`Sickness duration log sigma:`	`- Sickness duration log sigma: 0.4`

Sprint history entries use Sprint history <id>: as the header. If the ID is omitted, one is generated automatically (SPR-001, SPR-002, …). Bullet keywords:

Bullet keyword	Aliases	Notes
`Completed:`	`Done:`, `Finished:`, `Delivered:`	`10 points` or `10 tasks`
`Spillover:`	`Carryover:`, `Rolled over:`
`Added:`	`Scope added:`
`Removed:`	`Scope removed:`
`Holiday factor:`		Capacity reduction factor
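
Two details of sprint history lend themselves to a short sketch: the auto-generated IDs and the `10 points` versus `10 tasks` distinction, which decides whether a value lands in a `*_story_points` or a `*_tasks` field. The function names and the accepted unit spellings below are assumptions for illustration.

```python
import re
from typing import Tuple, Union

QUANTITY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(story points?|points?|tasks?)\s*$",
    re.IGNORECASE,
)


def auto_sprint_id(index: int) -> str:
    """Format a 1-based index as SPR-001, SPR-002, ..."""
    return f"SPR-{index:03d}"


def parse_quantity(value: str) -> Tuple[str, Union[int, float]]:
    """Classify '10 points' or '10 tasks' and convert the number."""
    match = QUANTITY.match(value)
    if match is None:
        raise ValueError(f"expected '<n> points' or '<n> tasks': {value!r}")
    number, unit = match.group(1), match.group(2).lower()
    if unit.startswith("task"):
        return "tasks", int(number)  # task counts are whole numbers
    return "story_points", float(number)
```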

Future sprint overrides use Future sprint override <N>: or Future sprint override <YYYY-MM-DD>: as the header. Bullet properties: Sprint number:, Start date:, Holiday factor:, Capacity multiplier:, Notes:.

Complete example

Project name: Platform Migration
Start date: 2026-05-01

Resource 1: Alice
- Experience: 3
- Productivity: 1.1
- Sickness: 0.02
- Absence: 2026-05-15

Resource 2: Bob
- Experience: 2
- Availability: 0.8

Calendar: default
- Work hours: 8
- Work days: 1, 2, 3, 4, 5
- Holidays: 2026-05-25

Task 1: Architecture design
- Estimate: 16/24/40 hours
- Min experience: 2

Task 2: Core implementation
- Estimate: 80/120/180 hours
- Depends on Task 1
- Resources: Alice, Bob
- Max resources: 2
- Min experience: 2

from mcprojsim.nl_parser import NLProjectParser

# `description` is a str holding the plain-text project description shown above.
yaml_output = NLProjectParser().parse_and_generate(description)

Data Classes

`ParsedProject`

Top-level container for all data extracted from a natural language description.

Field	Type	Default	Description
`name`	`str`	`"Untitled Project"`	Project name.
`start_date`	`str | None`	`None`	ISO 8601 start date string.
`description`	`str | None`	`None`	Optional project description.
`hours_per_day`	`float`	`8.0`	Working hours per day.
`tasks`	`list[ParsedTask]`	`[]`	Extracted tasks.
`confidence_levels`	`list[int]`	`[50, 80, 90, 95]`	Percentile confidence levels to report.
`resources`	`list[ParsedResource]`	`[]`	Extracted team members/resources.
`calendars`	`list[ParsedCalendar]`	`[]`	Extracted working calendars.
`sprint_planning`	`ParsedSprintPlanning | None`	`None`	Sprint planning configuration, if present.
`default_hourly_rate`	`float | None`	`None`	Default hourly rate for cost estimation.
`overhead_rate`	`float | None`	`None`	Overhead multiplier applied to all costs.
`currency`	`str | None`	`None`	Three-letter ISO 4217 currency code.

`ParsedTask`

A single task extracted from the description.

Field	Type	Default	Description
`number`	`int`	required	Task number as written in the input (e.g., `1` for `Task 1:`).
`name`	`str`	`""`	Task name.
`t_shirt_size`	`str | None`	`None`	Normalised T-shirt size label (e.g., `"M"`, `"XL"`).
`story_points`	`int | None`	`None`	Story point estimate.
`low_estimate`	`float | None`	`None`	Optimistic explicit estimate.
`expected_estimate`	`float | None`	`None`	Expected explicit estimate.
`high_estimate`	`float | None`	`None`	Pessimistic explicit estimate.
`estimate_unit`	`str`	`"days"`	Unit for explicit estimates (`"days"` or `"hours"`).
`dependency_refs`	`list[str]`	`[]`	Raw dependency references as written (e.g., `["Task 1"]`).
`description`	`str | None`	`None`	Optional task description text.
`resources`	`list[str]`	`[]`	Resource names assigned to the task.
`max_resources`	`int`	`1`	Maximum number of resources that can work the task concurrently.
`min_experience_level`	`int`	`1`	Minimum experience level required for an assigned resource.
`fixed_cost`	`float | None`	`None`	One-time fixed cost added to this task regardless of duration.
`risks`	`list[ParsedRisk]`	`[]`	Risks associated with this task.

`ParsedResource`

A team member or resource extracted from the description.

Field	Type	Default	Description
`number`	`int`	required	Resource number as written in the input (e.g., `1` for `Resource 1:`).
`name`	`str`	`""`	Resource name.
`availability`	`float`	`1.0`	Fraction of working time available (0.0–1.0).
`experience_level`	`int`	`2`	Experience level (1–3; 1 = junior, 2 = mid-level, 3 = senior).
`productivity_level`	`float`	`1.0`	Productivity multiplier.
`calendar`	`str`	`"default"`	Calendar ID used by this resource.
`sickness_prob`	`float`	`0.0`	Per-day sickness probability.
`planned_absence`	`list[str]`	`[]`	List of planned absence date strings.
`hourly_rate`	`float | None`	`None`	Per-resource hourly rate for cost estimation.

`ParsedRisk`

A risk associated with a task, extracted from the description.

Field	Type	Default	Description
`name`	`str`	`""`	Risk name.
`probability`	`float`	`0.0`	Probability of occurrence as a fraction (0.0–1.0); percentage inputs are divided by 100.
`impact_value`	`float | None`	`None`	Schedule impact amount.
`impact_unit`	`str | None`	`None`	Unit for the schedule impact (`"days"`, `"hours"`, etc.).
`cost_impact`	`float | None`	`None`	Monetary impact if the risk occurs.
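
The field tables for `ParsedTask` and `ParsedRisk` describe ordinary dataclasses with list-valued defaults. The trimmed sketch below shows how such defaults are typically declared with `field(default_factory=list)` so that instances do not share one list. `TaskSketch` and `RiskSketch` are stand-ins written for illustration, with only a subset of the documented fields. They are not the classes exported by mcprojsim.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RiskSketch:
    """Stand-in mirroring the documented ParsedRisk fields."""
    name: str = ""
    probability: float = 0.0
    impact_value: Optional[float] = None
    impact_unit: Optional[str] = None
    cost_impact: Optional[float] = None


@dataclass
class TaskSketch:
    """Stand-in with a subset of the documented ParsedTask fields."""
    number: int  # required, no default
    name: str = ""
    t_shirt_size: Optional[str] = None
    estimate_unit: str = "days"
    # default_factory gives every instance its own fresh list
    dependency_refs: List[str] = field(default_factory=list)
    risks: List[RiskSketch] = field(default_factory=list)


task = TaskSketch(number=2, name="Core implementation", dependency_refs=["Task 1"])
task.risks.append(RiskSketch(name="Integration failure", probability=0.2))
```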

`ParsedCalendar`

A working calendar extracted from the description.

Field	Type	Default	Description
`id`	`str`	`"default"`	Calendar identifier.
`work_hours_per_day`	`float`	`8.0`	Working hours per day.
`work_days`	`list[int]`	`[1, 2, 3, 4, 5]`	Working days of the week (1 = Monday … 7 = Sunday).
`holidays`	`list[str]`	`[]`	ISO 8601 holiday date strings.

`ParsedSprintPlanning`

Sprint planning configuration extracted from the description.

Field	Type	Default	Description
`enabled`	`bool`	`True`	Whether sprint planning is enabled.
`sprint_length_weeks`	`int`	`2`	Sprint length in weeks.
`capacity_mode`	`str`	`"story_points"`	Capacity tracking mode (`"story_points"` or `"tasks"`).
`planning_confidence_level`	`float | None`	`None`	Confidence level override for commitment guidance.
`removed_work_treatment`	`str | None`	`None`	How removed work is handled (`"churn_only"` or `"reduce_backlog"`).
`velocity_model`	`str | None`	`None`	Velocity model override (`"empirical"` or `"neg_binomial"`).
`sickness_enabled`	`bool | None`	`None`	Override for sickness simulation.
`sickness_team_size`	`int | None`	`None`	Team size for sickness modelling.
`sickness_probability_per_person_per_week`	`float | None`	`None`	Per-person-per-week sickness probability override.
`sickness_duration_log_mu`	`float | None`	`None`	Log-mean for sickness duration distribution override.
`sickness_duration_log_sigma`	`float | None`	`None`	Log-sigma for sickness duration distribution override.
`future_sprint_overrides`	`list[ParsedFutureSprintOverride]`	`[]`	Capacity overrides for future sprints.
`history`	`list[ParsedSprintHistoryEntry]`	`[]`	Historical sprint data.

`ParsedFutureSprintOverride`

A capacity override for a future sprint.

Field	Type	Default	Description
`sprint_number`	`int | None`	`None`	Sprint number this override applies to.
`start_date`	`str | None`	`None`	ISO 8601 start date for the overridden sprint.
`holiday_factor`	`float | None`	`None`	Capacity reduction factor due to holidays (0.0–1.0).
`capacity_multiplier`	`float | None`	`None`	Overall capacity multiplier for the sprint.
`notes`	`str | None`	`None`	Free-text notes for this override.

`ParsedSprintHistoryEntry`

A historical sprint outcome.

Field	Type	Default	Description
`sprint_id`	`str`	required	Unique sprint identifier.
`completed_story_points`	`float | None`	`None`	Story points completed in the sprint.
`completed_tasks`	`int | None`	`None`	Tasks completed (task-capacity mode).
`spillover_story_points`	`float | None`	`None`	Story points that spilled over to the next sprint.
`spillover_tasks`	`int | None`	`None`	Tasks that spilled over (task-capacity mode).
`added_story_points`	`float | None`	`None`	Story points added mid-sprint.
`added_tasks`	`int | None`	`None`	Tasks added mid-sprint (task-capacity mode).
`removed_story_points`	`float | None`	`None`	Story points removed mid-sprint.
`removed_tasks`	`int | None`	`None`	Tasks removed mid-sprint (task-capacity mode).
`holiday_factor`	`float | None`	`None`	Capacity reduction factor applied to this sprint.