Natural Language Parser¶
Overview¶
NLProjectParser converts semi-structured, informal plain text into valid mcprojsim YAML project files. It sits at the boundary between human-readable input and the structured project model, and is the backend for both the mcprojsim generate CLI command and the MCP server's generate_project_file tool. The parser processes input line by line using a section-based state machine: a header line (e.g. Task 1:, Resource 2: Alice) opens a new section, and subsequent bullet lines fill in that section's properties.
When to use this module: Use NLProjectParser when you want to create project files from natural language — for rapid prototyping, LLM-generated descriptions, or programmatic project creation without writing YAML by hand.
| Capability | Description |
|---|---|
parse(text) |
Runs the state machine over plain text and returns a ParsedProject dataclass |
to_yaml(project) |
Renders a ParsedProject to a valid YAML project file string |
parse_and_generate(text) |
Convenience wrapper combining parse + to_yaml in one call |
| Flexible separators | Keyword–value separators :, ., =, or space are all equivalent |
| T-shirt size normalisation | Aliases like Extra Large, 2XL, or Medium resolve to canonical sizes |
| Resources & calendars | Parses Resource N: and Calendar: sections alongside tasks and sprint planning |
Background: Section-based state machine — The parser reads input top to bottom. Any line matching a section-header pattern (Task N:, Resource N:, Calendar:, Sprint planning:, etc.) switches the current active section. Bullet lines (-, *, •) are dispatched to the active section's property handler. Project-level metadata (name, start date, hours per day) is matched anywhere outside a section.
Imports:
NLProjectParser¶
Converts semi-structured, plain-text project descriptions into valid mcprojsim YAML project files. Also available via the mcprojsim generate CLI command and the MCP server's generate_project_file tool.
| Method | Signature | Description |
|---|---|---|
parse |
(text: str) -> ParsedProject |
Extract project metadata and tasks from a text description. Returns a ParsedProject. Raises ValueError if no tasks are found. |
to_yaml |
(project: ParsedProject) -> str |
Render a ParsedProject as a valid YAML project file string. |
parse_and_generate |
(text: str) -> str |
Convenience wrapper: calls parse then to_yaml and returns the YAML string directly. |
Input format¶
The parser processes the description line by line using a section-based state machine. A section header line (e.g. Task 1:, Resource 2: Alice, Sprint planning:) opens a new section; subsequent bullet lines (prefixed with -, *, or •) are parsed as properties of that section. Blank lines are ignored. Project-level metadata (name, start date, etc.) can appear anywhere outside a section.
Separators are flexible. In most patterns the separator between keyword and value can be :, ., =, or a space, and the keyword itself is case-insensitive. For example, all of the following are equivalent:
Project-level metadata¶
These lines can appear before the first task or between sections:
| Keyword | Example | Notes |
|---|---|---|
Project name: / Project: |
Project name: Website Redesign |
|
Start date: |
Start date: 2026-06-01 |
ISO 8601 (YYYY-MM-DD) |
Description: |
Description: Q3 infrastructure work |
|
Hours per day: |
Hours per day: 7.5 |
Default 8.0 |
Confidence levels: |
Confidence levels: 50, 80, 90, 95 |
Comma-separated percentiles |
Rate: / Hourly rate: / Default rate: / Blended rate: |
Rate: $150/hour |
Default hourly rate (cost estimation) |
Overhead: / Overhead rate: |
Overhead: 15% |
Overhead multiplier applied to all costs |
Currency: |
Currency: USD |
Three-letter ISO 4217 currency code |
Tasks¶
A task section starts with Task N: optionally followed by the task name on the same line. Subsequent bullet lines define properties.
Alternatively, when no Task N: headers are present, the parser auto-detects tasks from plain lists:
| List format | Example | Numbering |
|---|---|---|
| Dot numbered | 1. Design phase |
Preserved from source |
| Paren numbered | 2) Implementation |
Preserved from source |
| Bracket numbered | [3] Testing |
Preserved from source |
Bullet (-, *, •) |
- Backend API |
Auto-assigned (1, 2, 3, …) |
| Hash numbered | # 1 First task |
Preserved from source |
Auto-task mode activates only when no Task N: headers have been seen. The two modes cannot be mixed.
Indented continuation lines under auto-detected tasks are parsed as bullet properties, identical to the explicit-header format.
Inline properties on task lines¶
When using auto-detected lists, properties can appear inline on the task name line:
| Inline pattern | Example | Effect |
|---|---|---|
| Bracketed size | Backend API [XL] |
Sets t_shirt_size = "XL" |
| Parenthesized size | Frontend (M) |
Sets t_shirt_size = "M" |
| Fuzzy size hint | probably an M, likely L, assume S |
Sets t_shirt_size |
| Inline range | 3–5 days, 2-4 hours |
Sets low, expected, high, unit |
| Inline dependency | depends on Task 1 |
Sets dependency_refs |
The parser extracts inline properties and strips them from the task name.
Task bullet properties¶
| Bullet keyword | Example | Notes |
|---|---|---|
Name: |
- Name: Backend API |
Also accepted as first unmatched bullet |
Size: |
- Size: M |
See size aliases below |
Story points: / Points: |
- Story points: 5 |
|
Estimate: |
- Estimate: 3/5/10 days |
low/expected/high, separator / - or , |
Depends on: / Depends: / Depend on: |
- Depends on Task 1, Task 3 |
References by task number |
Resources: |
- Resources: Alice, Bob |
Names must match a Resource N: header |
Max resources: |
- Max resources: 2 |
Concurrent resource cap |
Min experience: / Min experience level: |
- Min experience: 2 |
1–3 |
Fixed cost: / One-time cost: |
- Fixed cost: 5000 |
One-time cost added regardless of duration |
Risk: / Risks: |
- Risk: Integration failure |
Opens a risk sub-section under this task (see below) |
Risk bullet properties (nested under a Risk: line within a task section):
| Bullet keyword | Example | Notes |
|---|---|---|
Probability: |
- Probability: 20% |
0–100 |
Impact: |
- Impact: 5 days |
Schedule impact; optional unit |
Cost impact: |
- Cost impact: $10000 |
Monetary impact if the risk occurs |
Risks can also be described in prose form inside a task: "there is a 15% chance of a 3-day delay" or "a 10% risk of $5000 penalty".
Estimate units for explicit estimates: hours / hour / h, days / day / d, weeks / week / w.
T-shirt size aliases — all of the following map to a canonical size:
| Canonical | Accepted aliases |
|---|---|
XS |
XS, Extra Small, Extrasmall |
S |
S, Small |
M |
M, Medium, Med |
L |
L, Large |
XL |
XL, Extra Large, Extralarge |
XXL |
XXL, Extra Extra Large, 2XL |
Resources¶
A resource section starts with Resource N: Name. Bullet properties:
| Bullet keyword | Example | Notes |
|---|---|---|
Experience level: / Experience: |
- Experience: 3 |
1–3 |
Productivity level: / Productivity: |
- Productivity: 1.1 |
Multiplier, default 1.0 |
Availability: |
- Availability: 0.8 |
Fraction of full-time, 0–1 |
Calendar: |
- Calendar: part_time |
References a Calendar: section ID |
Sickness prob: / Sickness: |
- Sickness: 0.02 |
Per-day probability |
Rate: / Hourly rate: / Cost: |
- Rate: $120/hour |
Per-resource hourly rate (cost estimation) |
Absence: / Planned absence: |
- Absence: 2026-05-15, 2026-06-01 |
Comma-separated ISO dates or date ranges (2026-05-20 to 2026-05-22) |
Calendars¶
A calendar section starts with Calendar: id. Bullet properties:
| Bullet keyword | Example | Notes |
|---|---|---|
Work hours per day: / Work hours: |
- Work hours: 7 |
|
Work days: |
- Work days: 1, 2, 3, 4 |
Integers, 1=Mon … 7=Sun |
Holidays: |
- Holidays: 2026-12-25, 2026-12-26 |
ISO 8601 dates |
Sprint planning¶
A Sprint planning: header opens the sprint planning section. Bullet properties:
| Bullet keyword | Example | Notes |
|---|---|---|
Sprint length: |
- Sprint length: 2 |
Weeks; also 2-week sprints |
Capacity mode: |
- Capacity mode: story points |
story points or tasks |
Planning confidence level: |
- Planning confidence level: 80% |
|
Velocity model: |
- Velocity model: empirical |
empirical or neg_binomial |
Removed work treatment: |
- Removed work treatment: churn_only |
churn_only or reduce_backlog |
Sickness: |
- Sickness: enabled |
enabled/disabled/on/off/yes/no/true/false; also No sickness |
Sickness team size: |
- Sickness team size: 6 |
|
Sickness probability per person per week: |
- Sickness probability: 5% |
|
Sickness duration log mu: |
- Sickness duration log mu: 1.1 |
|
Sickness duration log sigma: |
- Sickness duration log sigma: 0.4 |
Sprint history entries use Sprint history <id>: as the header (auto-ID generated as SPR-001, SPR-002, … if omitted). Bullet keywords:
| Bullet keyword | Aliases | Notes |
|---|---|---|
Completed: |
Done:, Finished:, Delivered: |
10 points or 10 tasks |
Spillover: |
Carryover:, Rolled over: |
|
Added: |
Scope added: |
|
Removed: |
Scope removed: |
|
Holiday factor: |
Capacity reduction factor |
Future sprint overrides use Future sprint override <N>: or Future sprint override <YYYY-MM-DD>: as the header. Bullet properties: Sprint number:, Start date:, Holiday factor:, Capacity multiplier:, Notes:.
Complete example¶
Project name: Platform Migration
Start date: 2026-05-01
Resource 1: Alice
- Experience: 3
- Productivity: 1.1
- Sickness: 0.02
- Absence: 2026-05-15
Resource 2: Bob
- Experience: 2
- Availability: 0.8
Calendar: default
- Work hours: 8
- Work days: 1, 2, 3, 4, 5
- Holidays: 2026-05-25
Task 1: Architecture design
- Estimate: 16/24/40 hours
- Min experience: 2
Task 2: Core implementation
- Estimate: 80/120/180 hours
- Depends on Task 1
- Resources: Alice, Bob
- Max resources: 2
- Min experience: 2
from mcprojsim.nl_parser import NLProjectParser
yaml_output = NLProjectParser().parse_and_generate(description)
Data Classes¶
ParsedProject¶
Top-level container for all data extracted from a natural language description.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
"Untitled Project" |
Project name. |
start_date |
str \| None |
None |
ISO 8601 start date string. |
description |
str \| None |
None |
Optional project description. |
hours_per_day |
float |
8.0 |
Working hours per day. |
tasks |
list[ParsedTask] |
[] |
Extracted tasks. |
confidence_levels |
list[int] |
[50, 80, 90, 95] |
Percentile confidence levels to report. |
resources |
list[ParsedResource] |
[] |
Extracted team members/resources. |
calendars |
list[ParsedCalendar] |
[] |
Extracted working calendars. |
sprint_planning |
ParsedSprintPlanning \| None |
None |
Sprint planning configuration, if present. |
default_hourly_rate |
float \| None |
None |
Default hourly rate for cost estimation. |
overhead_rate |
float \| None |
None |
Overhead multiplier applied to all costs. |
currency |
str \| None |
None |
Three-letter ISO 4217 currency code. |
ParsedTask¶
A single task extracted from the description.
| Field | Type | Default | Description |
|---|---|---|---|
number |
int |
required | Task number as written in the input (e.g., 1 for Task 1:). |
name |
str |
"" |
Task name. |
t_shirt_size |
str \| None |
None |
Normalised T-shirt size label (e.g., "M", "XL"). |
story_points |
int \| None |
None |
Story point estimate. |
low_estimate |
float \| None |
None |
Optimistic explicit estimate. |
expected_estimate |
float \| None |
None |
Expected explicit estimate. |
high_estimate |
float \| None |
None |
Pessimistic explicit estimate. |
estimate_unit |
str |
"days" |
Unit for explicit estimates ("days" or "hours"). |
dependency_refs |
list[str] |
[] |
Raw dependency references as written (e.g., ["Task 1"]). |
description |
str \| None |
None |
Optional task description text. |
resources |
list[str] |
[] |
Resource names assigned to the task. |
max_resources |
int |
1 |
Maximum number of resources that can work the task concurrently. |
min_experience_level |
int |
1 |
Minimum experience level required for an assigned resource. |
fixed_cost |
float \| None |
None |
One-time fixed cost added to this task regardless of duration. |
risks |
list[ParsedRisk] |
[] |
Risks associated with this task. |
ParsedResource¶
A team member or resource extracted from the description.
| Field | Type | Default | Description |
|---|---|---|---|
number |
int |
required | Resource number as written in the input (e.g., 1 for Resource 1:). |
name |
str |
"" |
Resource name. |
availability |
float |
1.0 |
Fraction of working time available (0.0–1.0). |
experience_level |
int |
2 |
Experience level (1–3; 1 = junior, 2 = mid-level, 3 = senior). |
productivity_level |
float |
1.0 |
Productivity multiplier. |
calendar |
str |
"default" |
Calendar ID used by this resource. |
sickness_prob |
float |
0.0 |
Per-day sickness probability. |
planned_absence |
list[str] |
[] |
List of planned absence date strings. |
hourly_rate |
float \| None |
None |
Per-resource hourly rate for cost estimation. |
ParsedRisk¶
A risk associated with a task, extracted from the description.
| Field | Type | Default | Description |
|---|---|---|---|
name |
str |
"" |
Risk name. |
probability |
float |
0.0 |
Probability of occurrence (0.0–1.0 or percentage divided by 100). |
impact_value |
float \| None |
None |
Schedule impact amount. |
impact_unit |
str \| None |
None |
Unit for the schedule impact ("days", "hours", etc.). |
cost_impact |
float \| None |
None |
Monetary impact if the risk occurs. |
ParsedCalendar¶
A working calendar extracted from the description.
| Field | Type | Default | Description |
|---|---|---|---|
id |
str |
"default" |
Calendar identifier. |
work_hours_per_day |
float |
8.0 |
Working hours per day. |
work_days |
list[int] |
[1, 2, 3, 4, 5] |
Working days of the week (1 = Monday … 7 = Sunday). |
holidays |
list[str] |
[] |
ISO 8601 holiday date strings. |
ParsedSprintPlanning¶
Sprint planning configuration extracted from the description.
| Field | Type | Default | Description |
|---|---|---|---|
enabled |
bool |
True |
Whether sprint planning is enabled. |
sprint_length_weeks |
int |
2 |
Sprint length in weeks. |
capacity_mode |
str |
"story_points" |
Capacity tracking mode ("story_points" or "tasks"). |
planning_confidence_level |
float \| None |
None |
Confidence level override for commitment guidance. |
removed_work_treatment |
str \| None |
None |
How removed work is handled ("churn_only" or "reduce_backlog"). |
velocity_model |
str \| None |
None |
Velocity model override ("empirical" or "neg_binomial"). |
sickness_enabled |
bool \| None |
None |
Override for sickness simulation. |
sickness_team_size |
int \| None |
None |
Team size for sickness modelling. |
sickness_probability_per_person_per_week |
float \| None |
None |
Per-person-per-week sickness probability override. |
sickness_duration_log_mu |
float \| None |
None |
Log-mean for sickness duration distribution override. |
sickness_duration_log_sigma |
float \| None |
None |
Log-sigma for sickness duration distribution override. |
future_sprint_overrides |
list[ParsedFutureSprintOverride] |
[] |
Capacity overrides for future sprints. |
history |
list[ParsedSprintHistoryEntry] |
[] |
Historical sprint data. |
ParsedFutureSprintOverride¶
A capacity override for a future sprint.
| Field | Type | Default | Description |
|---|---|---|---|
sprint_number |
int \| None |
None |
Sprint number this override applies to. |
start_date |
str \| None |
None |
ISO 8601 start date for the overridden sprint. |
holiday_factor |
float \| None |
None |
Capacity reduction factor due to holidays (0.0–1.0). |
capacity_multiplier |
float \| None |
None |
Overall capacity multiplier for the sprint. |
notes |
str \| None |
None |
Free-text notes for this override. |
ParsedSprintHistoryEntry¶
A historical sprint outcome.
| Field | Type | Default | Description |
|---|---|---|---|
sprint_id |
str |
required | Unique sprint identifier. |
completed_story_points |
float \| None |
None |
Story points completed in the sprint. |
completed_tasks |
int \| None |
None |
Tasks completed (task-capacity mode). |
spillover_story_points |
float \| None |
None |
Story points that spilled over to the next sprint. |
spillover_tasks |
int \| None |
None |
Tasks that spilled over (task-capacity mode). |
added_story_points |
float \| None |
None |
Story points added mid-sprint. |
added_tasks |
int \| None |
None |
Tasks added mid-sprint (task-capacity mode). |
removed_story_points |
float \| None |
None |
Story points removed mid-sprint. |
removed_tasks |
int \| None |
None |
Tasks removed mid-sprint (task-capacity mode). |
holiday_factor |
float \| None |
None |
Capacity reduction factor applied to this sprint. |