The Challenge of Personalized Cognitive Training
Cognitive training games face a fundamental challenge: users have vastly different ability levels, and a fixed difficulty that works for one person may be too easy or too hard for another. Too easy, and the brain isn't challenged enough to improve. Too hard, and users become frustrated and quit.
Traditional approaches use simple rule-based systems (e.g., “increase difficulty after 3 correct answers”), but these fail to account for the complex dynamics of human learning and engagement. Our solution leverages deep reinforcement learning to create a truly adaptive system that learns optimal difficulty adjustment strategies.
Theoretical Foundation
Flow Theory & Optimal Challenge
Our system is grounded in Csikszentmihalyi's Flow Theory, which identifies an optimal zone where challenge level matches skill level. In this state, learners experience:
- Deep concentration and engagement
- Intrinsic motivation to continue
- Accelerated skill acquisition
- Positive emotional experience
The Flow Zone in Cognitive Training
Accuracy < 60% → Frustration Zone → High dropout risk
Accuracy 65-85% → FLOW ZONE → Optimal learning
Accuracy > 90% → Boredom Zone → Disengagement risk
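These zone boundaries can be expressed as a small classifier. This is a sketch; the function name is ours, and since the 60-65% and 85-90% bands are unlabeled in the table, the "transition" label is our addition:

```python
def classify_zone(accuracy: float) -> str:
    """Map session accuracy to the engagement zone used above."""
    if accuracy < 0.60:
        return "frustration"   # high dropout risk
    if accuracy > 0.90:
        return "boredom"       # disengagement risk
    if 0.65 <= accuracy <= 0.85:
        return "flow"          # optimal learning
    return "transition"        # between zones (our label)

print(classify_zone(0.75))  # flow
```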
Zone of Proximal Development
Vygotsky's Zone of Proximal Development (ZPD) suggests that learning is most effective when tasks are just beyond a learner's current independent capability. Our adaptive system continuously estimates each user's ZPD and adjusts difficulty to stay within it.
Item Response Theory (IRT)
We use the 2-Parameter Logistic (2PL) IRT model to model the relationship between user ability and task difficulty:
P(correct) = 1 / (1 + e^(-a(θ - b)))

where θ is user ability, b is item difficulty, and a is item discrimination.
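The 2PL curve can be evaluated directly; the values below are illustrative:

```python
from math import exp

def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """2PL IRT: probability of a correct response given ability theta,
    item difficulty b, and discrimination a."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

# When ability matches difficulty (theta == b), P(correct) = 0.5
print(p_correct(theta=0.0, b=0.0))               # 0.5
# An easier item (b = -1) for the same user is more likely correct
print(round(p_correct(theta=0.0, b=-1.0), 3))    # 0.731
```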
Technical Architecture
Reinforcement Learning Framework
We formulate adaptive difficulty as a Markov Decision Process (MDP) and train a Deep Q-Network (DQN) agent to learn optimal difficulty adjustment policies.
State Space (9 features)
- ability_score: Estimated cognitive ability
- uncertainty: Confidence in estimate
- session_count: User experience level
- recent_accuracy: 5-session average
- rt_trend: Response time trajectory
- dprime_trend: Discriminability trend
- current_difficulty: Active level
- trials_completed: Session progress
- session_accuracy: Current performance
Action Space (4 actions)
- DECREASE: Reduce difficulty by one level
- MAINTAIN: Keep current difficulty
- INCREASE: Raise difficulty by one level
- MICRO_ADJUST: Fine-tune within level
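Translating these discrete actions into a numeric difficulty might look like the sketch below. The step sizes, clamping to [0, 1], and the micro-adjust heuristic are our assumptions, not values from the trained system:

```python
DECREASE, MAINTAIN, INCREASE, MICRO_ADJUST = range(4)

def apply_action(difficulty: float, action: int, session_accuracy: float,
                 level_step: float = 0.1, micro_step: float = 0.02) -> float:
    """Translate a discrete action into a new difficulty in [0, 1]."""
    if action == DECREASE:
        difficulty -= level_step
    elif action == INCREASE:
        difficulty += level_step
    elif action == MICRO_ADJUST:
        # Nudge toward the flow zone: harder if accuracy is high, easier if low
        if session_accuracy > 0.85:
            difficulty += micro_step
        elif session_accuracy < 0.65:
            difficulty -= micro_step
    return min(1.0, max(0.0, difficulty))
```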
Reward Function
The reward function balances multiple objectives to optimize both learning effectiveness and user engagement:
| Component | Weight | Description |
|---|---|---|
| Flow Zone | 40% | Gaussian reward centered at 75% accuracy |
| Engagement | 20% | Session completion bonus |
| Dropout Penalty | 20% | Strong penalty for user abandonment |
| Improvement | 10% | Reward for ability gains |
| Response Time | 10% | Optimal pacing incentive |
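The weighted combination can be sketched as follows. The weights come from the table; the Gaussian width (sigma), the dropout penalty magnitude, and the component signatures are our assumptions about the general shape:

```python
from math import exp

# Weights from the table above; sigma and the penalty scale are our assumptions.
WEIGHTS = {"flow": 0.4, "engagement": 0.2, "dropout": 0.2,
           "improvement": 0.1, "pacing": 0.1}

def flow_reward(accuracy: float, target: float = 0.75, sigma: float = 0.1) -> float:
    """Gaussian reward peaking at 75% accuracy."""
    return exp(-((accuracy - target) ** 2) / (2 * sigma ** 2))

def step_reward(accuracy: float, completed: bool, dropped_out: bool,
                ability_gain: float, pacing_score: float) -> float:
    """Weighted sum of the five reward components (illustrative shapes)."""
    return (WEIGHTS["flow"] * flow_reward(accuracy)
            + WEIGHTS["engagement"] * (1.0 if completed else 0.0)
            - WEIGHTS["dropout"] * (5.0 if dropped_out else 0.0)  # strong penalty
            + WEIGHTS["improvement"] * ability_gain
            + WEIGHTS["pacing"] * pacing_score)
```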
Training Methodology
Simulated Student Population
Real user data is limited and expensive to collect, so we pre-train the model on a diverse population of IRT-based simulated students. These synthetic learners exhibit realistic behaviors including:
- Variable ability levels (struggling to advanced)
- Learning over time (ability improves with practice)
- Fatigue effects (performance degrades in long sessions)
- Dropout behavior (frustration/boredom leads to quitting)
Student Archetypes
| Archetype | Ability | Persistence | Population % |
|---|---|---|---|
| Struggling | 30 ± 8 | 70% | 15% |
| Developing | 45 ± 10 | 75% | 25% |
| Average | 50 ± 12 | 80% | 30% |
| Proficient | 65 ± 10 | 85% | 20% |
| Advanced | 80 ± 8 | 90% | 10% |
Curriculum Learning
To improve training stability and ensure the model learns to handle all user types, we employ curriculum learning:
- Phase 1 (0-30%): Focus on average students, include minimum 10% struggling
- Phase 2 (30-70%): Increase struggling student proportion to 20% for focused learning
- Phase 3 (70-100%): Full population distribution with all archetypes
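The three phases can be expressed as a sampling schedule. The base mix comes from the archetype table; the exact Phase 1 proportions (beyond the stated 10% struggling floor) and the Phase 2 rebalancing are our assumptions chosen so each mix sums to 1:

```python
import random

# Base population mix from the archetype table
BASE_MIX = {"struggling": 0.15, "developing": 0.25, "average": 0.30,
            "proficient": 0.20, "advanced": 0.10}

def phase_mix(progress: float) -> dict:
    """Archetype sampling weights as a function of training progress in [0, 1]."""
    if progress < 0.30:
        # Phase 1: emphasize average students, keep at least 10% struggling
        return {"struggling": 0.10, "developing": 0.15, "average": 0.55,
                "proficient": 0.15, "advanced": 0.05}
    if progress < 0.70:
        # Phase 2: raise struggling to 20%, rebalancing from average
        return dict(BASE_MIX, struggling=0.20, average=0.25)
    # Phase 3: full population distribution
    return dict(BASE_MIX)

def sample_archetype(progress: float, rng: random.Random) -> str:
    mix = phase_mix(progress)
    return rng.choices(list(mix), weights=list(mix.values()))[0]
```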
DQN Hyperparameters
{
  "algorithm": "DQN (Deep Q-Network)",
  "framework": "Stable Baselines3",
  "learning_rate": 1e-4,
  "buffer_size": 50000,
  "batch_size": 64,
  "gamma": 0.95,
  "exploration_fraction": 0.2,
  "exploration_final_eps": 0.05,
  "network_architecture": [128, 64, 32],
  "total_training_steps": 500000
}
Training Results
After extensive experimentation with reward function tuning, dropout model improvements, and curriculum learning, our final model achieves exceptional performance:
Per-Archetype Performance
A key achievement is the model's ability to serve all user types effectively, including the challenging “struggling” archetype:
| User Type | Baseline Model | Final Model | Improvement |
|---|---|---|---|
| Struggling | 5% | 75% | +70% (15× better) |
| Developing | 40% | 90% | +50% |
| Average | 65% | 95% | +30% |
| Proficient | 65% | 95% | +30% |
| Advanced | 70% | 90% | +20% |
Learned Policy
The trained agent learned an effective difficulty adjustment strategy:
- MICRO_ADJUST (63%): Primarily uses fine-grained adjustments within difficulty levels
- DECREASE (28%): Frequently lowers difficulty when users struggle
- INCREASE (7%): Conservatively raises difficulty only when confident
- MAINTAIN (1%): Rarely holds steady, preferring active optimization
Significance & Impact
For Cognitive Science
Our adaptive system addresses a longstanding challenge in cognitive training research: maintaining optimal challenge levels across diverse populations. By keeping 89% of sessions in the flow zone, we maximize the conditions known to promote neuroplasticity and skill transfer.
For Struggling Learners
Perhaps the most significant impact is on users who need help most. Traditional fixed-difficulty systems leave struggling learners behind (only 5% flow zone rate). Our system achieves 75% flow zone rate for this group, making effective cognitive training accessible to:
- Individuals with attention deficits (ADHD)
- Those with learning disabilities
- Older adults experiencing cognitive decline
- Patients in cognitive rehabilitation
For User Engagement
Reducing dropout from 80% to 7% has profound implications for training effectiveness. Cognitive training requires sustained practice over weeks to produce measurable benefits. With our system:
| Metric | Before | After | Impact |
|---|---|---|---|
| Complete 20+ sessions | ~1% | ~23% | 23× more users get full benefit |
| Avg. session length | ~15 trials | ~55 trials | 3.7× more training per session |
| User satisfaction | Low (frustration) | High (flow state) | Better retention & referrals |
For Cognitive Assessment
Adaptive difficulty also enables more accurate cognitive profiling. Fixed difficulty tests suffer from ceiling and floor effects—high-ability users max out while low-ability users bottom out, providing limited discrimination. Our adaptive system:
- Finds each user's true ability level through calibrated difficulty
- Provides more precise construct score estimates
- Enables tracking of ability changes over time
- Reduces measurement error in cognitive profiles
Neural Network Architecture
Our adaptive difficulty agent uses a Deep Q-Network (DQN), a neural network that learns to estimate the expected cumulative reward (Q-value) for each possible action given the current state.
Network Structure
┌─────────────────────────────────────────────────────────────┐
│ DQN ARCHITECTURE │
├─────────────────────────────────────────────────────────────┤
│ │
│ INPUT LAYER (9 neurons) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ability_score │ uncertainty │ session_count │ ... │ │
│ └─────────────────────────────────────────────────────┘ │
│ ↓ │
│ HIDDEN LAYER 1 (128 neurons) + ReLU activation │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ████████████████████████████████████████████████████│ │
│ └─────────────────────────────────────────────────────┘ │
│ ↓ │
│ HIDDEN LAYER 2 (64 neurons) + ReLU activation │
│ ┌───────────────────────────────────┐ │
│ │ ██████████████████████████████████│ │
│ └───────────────────────────────────┘ │
│ ↓ │
│ HIDDEN LAYER 3 (32 neurons) + ReLU activation │
│ ┌───────────────────┐ │
│ │ ██████████████████│ │
│ └───────────────────┘ │
│ ↓ │
│ OUTPUT LAYER (4 neurons) - Q-values for each action │
│ ┌────────┬────────┬────────┬────────┐ │
│ │DECREASE│MAINTAIN│INCREASE│ MICRO │ │
│ │ Q=-0.2 │ Q=0.1 │ Q=-0.5 │ Q=0.8 │ ← Select max │
│ └────────┴────────┴────────┴────────┘ │
│ │
│ Total Parameters: ~15,000 (lightweight for edge inference) │
└─────────────────────────────────────────────────────────────┘
Input Feature Engineering
Each input feature is carefully normalized to ensure stable training:
| Feature | Raw Range | Normalized | Source |
|---|---|---|---|
| ability_score | 0-100 | [0, 1] | Bayesian ability estimate from user_construct_scores |
| uncertainty | 0-50 | [0, 1] | Standard deviation of ability estimate |
| session_count | 0-100+ | [0, 1] | User's total completed sessions |
| recent_accuracy | 0-1 | [0, 1] | Rolling average of last 5 sessions |
| rt_trend | slope | [-1, 1] | Response time trajectory (faster/slower) |
| dprime_trend | slope | [-1, 1] | Signal detection improvement rate |
| current_difficulty | 0-1 | [0, 1] | Active difficulty level (0=easiest, 1=hardest) |
| trials_completed | 0-100 | [0, 1] | Progress within current session |
| session_accuracy | 0-1 | [0, 1] | Current session's accuracy so far |
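The normalization in the table can be sketched as a single function. The ranges and clipping follow the table; the function name and dict keys are illustrative:

```python
def _clip(x: float, lo: float, hi: float) -> float:
    return min(hi, max(lo, x))

def normalize_state(raw: dict) -> list:
    """Build the 9-dimensional normalized state vector from raw features."""
    return [
        _clip(raw["ability_score"] / 100, 0, 1),    # 0-100  -> [0, 1]
        _clip(raw["uncertainty"] / 50, 0, 1),       # 0-50   -> [0, 1]
        _clip(raw["session_count"] / 100, 0, 1),    # 0-100+ -> [0, 1], clipped
        _clip(raw["recent_accuracy"], 0, 1),        # already in [0, 1]
        _clip(raw["rt_trend"], -1, 1),              # slope  -> [-1, 1]
        _clip(raw["dprime_trend"], -1, 1),          # slope  -> [-1, 1]
        _clip(raw["current_difficulty"], 0, 1),     # already in [0, 1]
        _clip(raw["trials_completed"] / 100, 0, 1), # 0-100  -> [0, 1]
        _clip(raw["session_accuracy"], 0, 1),       # already in [0, 1]
    ]
```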
Why DQN Over Other Algorithms?
DQN Advantages
- ✓ Sample efficient - Experience replay reuses past data
- ✓ Off-policy - Can learn from historical sessions
- ✓ Fast inference - Single forward pass (<5ms)
- ✓ Discrete actions - Natural fit for difficulty levels
- ✓ Stable training - Target network prevents oscillation
PPO Trade-offs
- ✗ On-policy only - cannot use historical data
- ✗ Requires more samples for convergence
- ✗ Policy sampling adds inference latency
- ○ Better for continuous action spaces
- ○ More stable with complex reward landscapes
Training Pipeline
Our training pipeline uses a two-phase approach: pre-training on simulated students followed by fine-tuning on real user data.
Phase 1: Simulated Pre-Training
┌─────────────────────────────────────────────────────────────────────┐
│ TRAINING DATA FLOW │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Student │ │ Gymnasium │ │ DQN │ │
│ │ Population │────▶│ Environment │────▶│ Agent │ │
│ │ (IRT-based) │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Sample │ │ state, reward│ │ Experience │ │
│ │ student with │ │ action, done │ │ Replay │ │
│ │ archetype │ │ │ │ Buffer │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────┐ │
│ │ Mini-batch Gradient │ │
│ │ Descent (Adam optimizer) │ │
│ │ Loss = (Q - target_Q)² │ │
│ └────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
IRT-Based Student Simulation
Each simulated student uses a 2-Parameter Logistic IRT model to generate realistic responses:
from math import exp
from random import random

class IRTStudent:
    def __init__(self, ability: float, persistence: float):
        self.ability = ability            # 0-100 scale (see archetype table)
        self.persistence = persistence    # 0-1, resistance to dropout
        self.trials_completed = 0
        self.accuracy = 0.5               # running session accuracy

    def probability_correct(self, difficulty: float) -> float:
        """2PL IRT model for response probability."""
        theta = (self.ability - 50) / 15      # Convert to IRT scale
        b = difficulty * 6 - 3                # Map [0, 1] to [-3, 3]
        a = 1.0 + difficulty * 0.5            # Discrimination parameter
        # Apply fatigue effect (performance drops over time)
        fatigue = self.trials_completed * 0.001
        effective_theta = theta - fatigue
        # 2PL IRT formula
        logit = a * (effective_theta - b)
        return 1 / (1 + exp(-logit))

    def check_dropout(self) -> bool:
        """Simulate user dropout based on frustration/boredom."""
        if self.accuracy < 0.3:               # Too hard
            dropout_prob = 0.015 * (1.2 - self.persistence)
        elif self.accuracy > 0.95:            # Too easy
            dropout_prob = 0.010 * (1.2 - self.persistence)
        elif 0.65 <= self.accuracy <= 0.85:   # Flow zone
            dropout_prob = 0.002              # Very low dropout
        else:
            dropout_prob = 0.008 * (1.2 - self.persistence)
        return random() < dropout_prob
Phase 2: Real Data Fine-Tuning
After pre-training, we fine-tune on real session data extracted from our database:
-- Extract training data from Neon PostgreSQL
SELECT
  s.id AS session_id,
  s.user_id,
  ucs.score AS ability_score,
  ucs.uncertainty,
  COUNT(*) OVER (PARTITION BY s.user_id) AS session_count,
  sm.value AS metric_value,
  s.difficulty_level,
  s.status
FROM sessions s
JOIN session_metrics sm ON s.id = sm.session_id
JOIN user_construct_scores ucs ON s.user_id = ucs.user_id
WHERE s.ended_at IS NOT NULL
  AND s.status IN ('completed', 'abandoned')
ORDER BY s.started_at;
Platform Integration
The adaptive difficulty system integrates seamlessly with the Cog-Ace platform through our Cloudflare Workers API.
Request Flow
┌──────────────────────────────────────────────────────────────────────────┐
│ ADAPTIVE DIFFICULTY REQUEST FLOW │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Game │─────▶│ POST /api/v1/ │─────▶│ AdaptiveDifficulty │ │
│ │ Client │ │ sessions/start │ │ Service │ │
│ └─────────┘ └─────────────────┘ └─────────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ 1. Build State │ │
│ │ │ from user data │ │
│ │ └─────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ 2. Query Database │ │
│ │ │ - user_construct_ │ │
│ │ │ scores │ │
│ │ │ - recent sessions │ │
│ │ └─────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ 3. ONNX Inference │ │
│ │ │ (<50ms) │ │
│ │ └─────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ 4. Select Action │ │
│ │ │ DECREASE/MAINTAIN/ │ │
│ │ │ INCREASE/MICRO │ │
│ │ └─────────────────────┘ │
│ │ │ │
│ │ ┌────────────────────────────────────┘ │
│ │ ▼ │
│ │ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ Response: { sessionId, difficulty: 0.65, action: ... } │ │
│ │ └─────────────────────────────────────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────┐ │
│ │ Game runs at │ │
│ │ recommended │ │
│ │ difficulty │ │
│ └─────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
API Implementation
// api/src/services/adaptive-difficulty.ts
export class AdaptiveDifficultyService {
  async selectDifficulty(
    userId: string,
    gameId: string,
    currentDifficulty: number
  ): Promise<DifficultyRecommendation> {
    // 1. Build state vector from user history
    const state = await this.buildStateVector(userId, gameId);

    // 2. Run ONNX inference (or use heuristic fallback)
    const action = await this.runInference(state);

    // 3. Apply action to get new difficulty
    const newDifficulty = this.applyAction(
      currentDifficulty,
      action,
      state.sessionAccuracy
    );

    // 4. Apply cold-start exploration for new users
    if (state.sessionCount < 5) {
      const exploreRate = this.getExplorationRate(state.sessionCount);
      if (Math.random() < exploreRate) {
        return this.sampleExploratory(newDifficulty);
      }
    }

    return {
      difficulty: newDifficulty,
      action: ACTION_NAMES[action],
      confidence: this.calculateConfidence(state),
    };
  }

  private async buildStateVector(
    userId: string,
    gameId: string
  ): Promise<StateVector> {
    // Query user's construct scores
    const scores = await this.db
      .select()
      .from(userConstructScores)
      .where(eq(userConstructScores.userId, userId));

    // Query recent sessions
    const recentSessions = await this.sessionsRepo
      .getRecentSessions(userId, gameId, 5);

    // Calculate trends and build 9-dimensional state
    return {
      abilityScore: scores[0]?.score ?? 50,
      uncertainty: scores[0]?.uncertainty ?? 25,
      sessionCount: recentSessions.length,
      recentAccuracy: this.calculateRecentAccuracy(recentSessions),
      rtTrend: this.calculateRtTrend(recentSessions),
      dprimeTrend: this.calculateDprimeTrend(recentSessions),
      currentDifficulty: currentDifficulty,
      trialsCompleted: 0,
      sessionAccuracy: 0.5, // Prior for new session
    };
  }
}
Cold Start Strategy
For new users with limited data, we use an exploration schedule that balances learning about the user with providing a good experience:
| Sessions Completed | Exploration Rate | Strategy |
|---|---|---|
| 0 (first session) | 50% | High exploration, start at medium difficulty |
| 1 | 40% | Still exploring, use initial performance data |
| 2 | 30% | Beginning to personalize |
| 3 | 20% | Model predictions becoming reliable |
| 4 | 10% | Mostly exploitation with occasional exploration |
| 5+ | 5% | Standard operation, rare exploration |
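The schedule above can be captured in a small lookup. This is a Python sketch (the production service implements it in TypeScript as getExplorationRate); the names here are illustrative:

```python
# Exploration rates from the cold-start schedule above
SCHEDULE = {0: 0.50, 1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10}
STEADY_STATE = 0.05

def exploration_rate(sessions_completed: int) -> float:
    """Probability of exploring instead of following the model's pick."""
    return SCHEDULE.get(sessions_completed, STEADY_STATE)

print(exploration_rate(0))   # 0.5  (first session: high exploration)
print(exploration_rate(12))  # 0.05 (steady state)
```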
Database Schema Integration
The adaptive system reads from and writes to these key tables:
┌─────────────────────────┐ ┌─────────────────────────┐
│ user_construct_ │ │ sessions │
│ scores │ │ │
├─────────────────────────┤ ├─────────────────────────┤
│ user_id (FK) │◀────▶│ user_id (FK) │
│ construct_id (FK) │ │ game_id (FK) │
│ score (0-100) │ │ difficulty_level │
│ uncertainty │ │ status │
│ updated_at │ │ started_at / ended_at │
└─────────────────────────┘ └─────────────────────────┘
│
▼
┌─────────────────────────┐
│ session_metrics │
├─────────────────────────┤
│ session_id (FK) │
│ metric_id (FK) │
│ value │
│ (accuracy, rt_ms, etc.) │
└─────────────────────────┘
Model Deployment
ONNX Export Pipeline
The trained PyTorch model is exported to ONNX format for cross-platform inference:
# models/cogace_rl/export/onnx_export.py
import time

import numpy as np
import onnx
import onnxruntime as ort
import torch
from stable_baselines3 import DQN


def export_to_onnx(model_path: str, output_path: str):
    # Load trained DQN model and extract its Q-network
    model = DQN.load(model_path)
    q_net = model.q_net

    # Create dummy input matching state dimensions
    dummy_input = torch.randn(1, 9)  # [batch, state_dim]

    # Export to ONNX with dynamic batch axes
    torch.onnx.export(
        q_net,
        dummy_input,
        output_path,
        input_names=["state"],
        output_names=["q_values"],
        dynamic_axes={"state": {0: "batch"}, "q_values": {0: "batch"}},
        opset_version=17,
    )

    # Verify exported model
    onnx_model = onnx.load(output_path)
    onnx.checker.check_model(onnx_model)

    # Benchmark inference time
    session = ort.InferenceSession(output_path)
    times = []
    for _ in range(100):
        start = time.perf_counter()
        session.run(None, {"state": dummy_input.numpy()})
        times.append((time.perf_counter() - start) * 1000)
    print(f"Inference: {np.mean(times):.2f}ms ± {np.std(times):.2f}ms")
    # Typical: 0.5ms ± 0.1ms (well under 50ms target)
Edge Deployment Architecture
┌───────────────────────────────────────────────────────────────────┐
│ DEPLOYMENT ARCHITECTURE │
├───────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ │
│ │ Training │ │
│ │ (Python) │ │
│ │ │ │
│ │ Stable-Baselines│ │
│ │ PyTorch │ │
│ │ Gymnasium │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ Export │
│ ┌─────────────────┐ │
│ │ ONNX Model │ │
│ │ (~500KB) │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ Upload │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Cloudflare R2 │────▶│ Cloudflare │ │
│ │ (Model Store) │ │ Workers │ │
│ └─────────────────┘ │ │ │
│ │ ONNX Runtime │◀──── API Requests │
│ │ Web Assembly │ │
│ │ │────▶ Difficulty │
│ │ <50ms inference │ Recommendation │
│ └─────────────────┘ │
│ │
│ Benefits: │
│ • Global edge deployment (300+ locations) │
│ • No cold start (Workers always warm) │
│ • Sub-50ms total latency │
│ • Scales automatically │
│ • No GPU required for inference │
│ │
└───────────────────────────────────────────────────────────────────┘
Primary Research Foundation
Zini et al. (2022)
“Adaptive Cognitive Training with Reinforcement Learning”
ACM Transactions on Interactive Intelligent Systems (TiiS)
This foundational paper establishes the framework for using Deep Q-Networks (DQN) to learn optimal difficulty adjustment policies in cognitive training applications. The key contributions include:
- Formulation of adaptive difficulty as a Markov Decision Process (MDP)
- Multi-objective reward function balancing flow zone and engagement
- Simulation-based pre-training with IRT student models
- Demonstrated superiority over rule-based and random baselines
Supporting Research
RL-Tutor (2025)
PPO for Personalized Tutoring
Explores Proximal Policy Optimization with Dynamic Knowledge Tracing for vocabulary learning, achieving 30% improvement in retention.
RL-DKT (2025)
RL + Dynamic Knowledge Tracing
Combines reinforcement learning with knowledge tracing models to optimize exercise sequencing in educational systems.
Game-Specific Difficulty Mapping
Our adaptive system uses a game-agnostic architecture that separates the RL policy (which outputs optimal difficulty levels) from game-specific parameter mapping. This allows any cognitive game to integrate with our adaptive system without requiring game-specific model training.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ DIFFICULTY MAPPING PIPELINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ DQN Model │ │ Platform API │ │ Game Runtime │ │
│ │ │ │ │ │ │ │
│ │ State → Action │───▶│ Action → Level │───▶│ Level → Params │ │
│ │ (9 features) │ │ (0-1 numeric) │ │ (game-specific) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ MICRO_ADJUST │ │ numericValue: │ │ N-Back: │ │
│ │ (action=3) │ │ 0.52 │ │ nBackLevel: 2 │ │
│ │ │ │ │ │ gridSize: 3×3 │ │
│ │ │ │ difficulty: │ │ │ │
│ │ │ │ "medium" │ │ Stroop: │ │
│ │ │ │ │ │ displayTime: 480ms│ │
│ │ │ │ │ │ congruencyRatio: │ │
│ │ │ │ │ │ 0.3 │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Game Task Specification
Each game defines its supported difficulty levels and session parameters in a task-spec.json file:
// task-spec.json
{
  "constructs": [
    { "id": "working_memory", "weight": 0.7 },
    { "id": "processing_speed", "weight": 0.3 }
  ],
  "metrics": [
    { "key": "accuracy", "required": true },
    { "key": "rt_ms_p50", "required": true }
  ],
  "modes": ["training", "assessment"],
  "controls": {
    "difficulty": ["easy", "medium", "hard"],
    "sessionMinutes": [2, 3, 5]
  }
}
Session Start API Response
When a game session starts, the API returns both the discrete level and precise numeric value for games that support fine-grained difficulty:
// POST /api/v1/sessions/start response
{
  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
  "sessionToken": "eyJhbGciOiJIUzI1NiIs...",
  "config": {
    "mode": "training",
    "difficulty": "medium",
    "durationMinutes": 5,
    "adaptive": {
      "numericDifficulty": 0.52,  // Precise 0-1 value
      "action": 3,                // MICRO_ADJUST
      "confidence": 0.85,         // Model confidence
      "coldStart": false,         // New user flag
      "reason": "model: MICRO_ADJUST (accuracy: 78%, ability: 52)"
    }
  }
}
Example: Game-Specific Parameter Mapping
N-Back Game
| Level | N-Back | Grid |
|---|---|---|
| easy (0.0-0.3) | 1-back | 3×3 |
| medium (0.3-0.7) | 2-back | 3×3 |
| hard (0.7-1.0) | 3-back | 4×4 |
Stroop Game
| Level | Display | Incongruent |
|---|---|---|
| easy | 1000ms | 20% |
| medium | 500ms | 40% |
| hard | 250ms | 60% |
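These mappings can be sketched as threshold functions over the numeric difficulty. This is a discrete sketch with our boundary handling; games may instead interpolate continuously within bands (as the 480 ms Stroop example in the pipeline diagram suggests), and the parameter names are illustrative:

```python
def nback_params(difficulty: float) -> dict:
    """Map numeric difficulty [0, 1] to N-Back parameters per the table above."""
    if difficulty < 0.3:
        return {"nBackLevel": 1, "gridSize": (3, 3)}
    if difficulty < 0.7:
        return {"nBackLevel": 2, "gridSize": (3, 3)}
    return {"nBackLevel": 3, "gridSize": (4, 4)}

def stroop_params(difficulty: float) -> dict:
    """Map numeric difficulty [0, 1] to Stroop parameters per the table above."""
    if difficulty < 0.3:
        return {"displayTimeMs": 1000, "incongruentRatio": 0.2}
    if difficulty < 0.7:
        return {"displayTimeMs": 500, "incongruentRatio": 0.4}
    return {"displayTimeMs": 250, "incongruentRatio": 0.6}

print(nback_params(0.52))  # the 0.52 example above lands in 2-back, 3x3
```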
Benefits of This Architecture
- ✓ Game-Agnostic Model: One trained DQN works for all games; no per-game model training required
- ✓ Easy Integration: Games only need to define their difficulty mapping in task-spec.json
- ✓ Cross-Game Learning: User performance in one game informs difficulty in others via shared construct scores
- ✓ Fine-Grained Control: Games can use the precise numeric value (0-1) for continuous difficulty scaling
References & Further Reading
- Flow Theory: Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience.
- Zone of Proximal Development: Vygotsky, L. S. (1978). Mind in Society.
- Item Response Theory: Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists.
- RL for Education: Zini, F., et al. (2022). Adaptive Cognitive Training with Reinforcement Learning. ACM TiiS.
- Deep Q-Networks: Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature.
Technical Implementation
Our adaptive difficulty system is implemented using:
- Training: Python with Stable Baselines3, PyTorch, Gymnasium
- Model Export: ONNX format for cross-platform inference
- Production Inference: ONNX Runtime in Cloudflare Workers (<50ms latency)
- Cold Start: Bayesian priors with exploration schedule for new users
The model is designed for real-time inference, with decisions made in under 50ms to ensure a seamless gameplay experience.
Academic Research Comparison
How does our DQN-based approach compare to other adaptive learning methods like ZPDES (Zone of Proximal Development and Empirical Success)? We provide a detailed comparison with the latest academic research including mathematical formulas and algorithmic differences.