
Adaptive Learning Technology

How we use reinforcement learning to keep every user in the optimal learning zone

The Challenge of Personalized Cognitive Training

Cognitive training games face a fundamental challenge: users have vastly different ability levels, and a fixed difficulty that works for one person may be too easy or too hard for another. Too easy, and the brain isn't challenged enough to improve. Too hard, and users become frustrated and quit.

Traditional approaches use simple rule-based systems (e.g., “increase difficulty after 3 correct answers”), but these fail to account for the complex dynamics of human learning and engagement. Our solution leverages deep reinforcement learning to create a truly adaptive system that learns optimal difficulty adjustment strategies.

Theoretical Foundation

Flow Theory & Optimal Challenge

Our system is grounded in Csikszentmihalyi's Flow Theory, which identifies an optimal zone where challenge level matches skill level. In this state, learners experience:

  • Deep concentration and engagement
  • Intrinsic motivation to continue
  • Accelerated skill acquisition
  • Positive emotional experience

The Flow Zone in Cognitive Training


Accuracy < 60%  →  Frustration Zone  →  High dropout risk
Accuracy 65-85% →  FLOW ZONE         →  Optimal learning
Accuracy > 90%  →  Boredom Zone      →  Disengagement risk
                

Zone of Proximal Development

Vygotsky's Zone of Proximal Development (ZPD) suggests that learning is most effective when tasks are just beyond a learner's current independent capability. Our adaptive system continuously estimates each user's ZPD and adjusts difficulty to stay within it.

Item Response Theory (IRT)

We use the 2-Parameter Logistic (2PL) IRT model to model the relationship between user ability and task difficulty:

P(correct) = 1 / (1 + e^(-a(θ - b)))

Where θ is user ability, b is item difficulty, and a is item discrimination.
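In code, the 2PL model is a one-liner. As a quick sanity check (function name is ours, not from the codebase): when ability exactly matches difficulty (θ = b), the predicted probability is 50% regardless of discrimination.

```python
from math import exp

def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """2PL IRT: probability of a correct response given ability theta,
    item difficulty b, and discrimination a."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

# When theta == b, p_correct is exactly 0.5 for any a;
# larger a makes the curve steeper around that point.
```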

Technical Architecture

Reinforcement Learning Framework

We formulate adaptive difficulty as a Markov Decision Process (MDP) and train a Deep Q-Network (DQN) agent to learn optimal difficulty adjustment policies.

State Space (9 features)

  • ability_score - Estimated cognitive ability
  • uncertainty - Confidence in estimate
  • session_count - User experience level
  • recent_accuracy - 5-session average
  • rt_trend - Response time trajectory
  • dprime_trend - Discriminability trend
  • current_difficulty - Active level
  • trials_completed - Session progress
  • session_accuracy - Current performance

Action Space (4 actions)

  • DECREASE - Reduce difficulty by one level
  • MAINTAIN - Keep current difficulty
  • INCREASE - Raise difficulty by one level
  • MICRO_ADJUST - Fine-tune within level
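Together, the state and action spaces define the MDP. A minimal Gymnasium-style environment sketch of that formulation (class name, step sizes, and the stand-in reward are illustrative assumptions, not the production environment):

```python
import random

class AdaptiveDifficultyEnv:
    """Toy MDP following the Gymnasium reset/step conventions."""
    N_FEATURES = 9  # state dimension listed above
    ACTIONS = ("DECREASE", "MAINTAIN", "INCREASE", "MICRO_ADJUST")

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.difficulty = 0.5   # start at medium
        self.trials = 0
        return self._state(), {}

    def _state(self):
        # 9-dim state; features not simulated here are zero-filled
        s = [0.0] * self.N_FEATURES
        s[6] = self.difficulty      # current_difficulty
        s[7] = self.trials / 100    # trials_completed (normalized)
        return s

    def step(self, action):
        # DECREASE / MAINTAIN / INCREASE move a full level; MICRO_ADJUST
        # nudges within a level (0.02 is an assumed step size)
        delta = (-0.1, 0.0, 0.1, 0.02)[action]
        self.difficulty = min(1.0, max(0.0, self.difficulty + delta))
        self.trials += 1
        accuracy = self.rng.random()       # stand-in for a simulated student
        reward = -abs(accuracy - 0.75)     # crude flow-zone proxy
        terminated = self.trials >= 100
        return self._state(), reward, terminated, False, {}
```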

Reward Function

The reward function balances multiple objectives to optimize both learning effectiveness and user engagement:

Component         Weight   Description
Flow Zone         40%      Gaussian reward centered at 75% accuracy
Engagement        20%      Session completion bonus
Dropout Penalty   20%      Strong penalty for user abandonment
Improvement       10%      Reward for ability gains
Response Time     10%      Optimal pacing incentive
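A sketch of how these components might combine (weights match the table; the Gaussian width, pacing shape, and penalty magnitude are illustrative assumptions):

```python
from math import exp

def reward(accuracy: float, completed: bool, dropped_out: bool,
           ability_gain: float, rt_norm: float) -> float:
    """Weighted multi-objective reward (sketch, not the production function)."""
    flow = exp(-((accuracy - 0.75) ** 2) / (2 * 0.10 ** 2))  # peak at 75%
    engagement = 1.0 if completed else 0.0
    dropout = -1.0 if dropped_out else 0.0
    pacing = 1.0 - abs(rt_norm - 0.5) * 2  # best at mid-range response times
    return (0.40 * flow + 0.20 * engagement + 0.20 * dropout
            + 0.10 * ability_gain + 0.10 * pacing)
```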

Training Methodology

Simulated Student Population

Real user data is limited and expensive to collect, so we pre-train the model on a diverse population of IRT-based simulated students. These synthetic learners exhibit realistic behaviors including:

  • Variable ability levels (struggling to advanced)
  • Learning over time (ability improves with practice)
  • Fatigue effects (performance degrades in long sessions)
  • Dropout behavior (frustration/boredom leads to quitting)

Student Archetypes

Archetype    Ability   Persistence   Population %
Struggling   30 ± 8    70%           15%
Developing   45 ± 10   75%           25%
Average      50 ± 12   80%           30%
Proficient   65 ± 10   85%           20%
Advanced     80 ± 8    90%           10%
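Sampling a synthetic student from this population can be sketched as follows (function and field names are hypothetical; ability is clipped to the 0-100 scale):

```python
import random

# (mean ability, ability sd, persistence, population weight) per the table
ARCHETYPES = {
    "struggling": (30, 8, 0.70, 0.15),
    "developing": (45, 10, 0.75, 0.25),
    "average":    (50, 12, 0.80, 0.30),
    "proficient": (65, 10, 0.85, 0.20),
    "advanced":   (80, 8, 0.90, 0.10),
}

def sample_student(rng: random.Random) -> dict:
    """Draw one simulated student according to the population weights."""
    names = list(ARCHETYPES)
    weights = [ARCHETYPES[n][3] for n in names]
    name = rng.choices(names, weights=weights)[0]
    mean, sd, persistence, _ = ARCHETYPES[name]
    ability = min(100.0, max(0.0, rng.gauss(mean, sd)))
    return {"archetype": name, "ability": ability, "persistence": persistence}
```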

Curriculum Learning

To improve training stability and ensure the model learns to handle all user types, we employ curriculum learning:

  • Phase 1 (0-30%): Focus on average students, include minimum 10% struggling
  • Phase 2 (30-70%): Increase struggling student proportion to 20% for focused learning
  • Phase 3 (70-100%): Full population distribution with all archetypes
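These phases can be expressed as a progress-dependent population mix. In this sketch, the non-struggling archetypes are rescaled proportionally in Phases 1 and 2, which is an illustrative assumption:

```python
def curriculum_weights(progress: float) -> dict:
    """Archetype sampling weights as a function of training progress (0-1)."""
    full = {"struggling": 0.15, "developing": 0.25, "average": 0.30,
            "proficient": 0.20, "advanced": 0.10}
    if progress < 0.3:        # Phase 1: minimum 10% struggling
        struggling = 0.10
    elif progress < 0.7:      # Phase 2: struggling raised to 20%
        struggling = 0.20
    else:                     # Phase 3: full population distribution
        return full
    others = {k: v for k, v in full.items() if k != "struggling"}
    scale = (1.0 - struggling) / sum(others.values())
    return {"struggling": struggling,
            **{k: v * scale for k, v in others.items()}}
```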

DQN Hyperparameters

{
  "algorithm": "DQN (Deep Q-Network)",
  "framework": "Stable Baselines3",
  "learning_rate": 1e-4,
  "buffer_size": 50000,
  "batch_size": 64,
  "gamma": 0.95,
  "exploration_fraction": 0.2,
  "exploration_final_eps": 0.05,
  "network_architecture": [128, 64, 32],
  "total_training_steps": 500000
}

Training Results

After extensive experimentation with reward function tuning, dropout model improvements, and curriculum learning, our final model achieves exceptional performance:

  • Flow Zone Rate: 89% (target: ≥65%)
  • Dropout Rate: 7% (target: ≤20%)
  • Mean Accuracy: 75.6% (target: 70-85%)

Per-Archetype Performance

A key achievement is the model's ability to serve all user types effectively, including the challenging “struggling” archetype:

User Type    Baseline Model   Final Model   Improvement
Struggling   5%               75%           +70% (15× better)
Developing   40%              90%           +50%
Average      65%              95%           +30%
Proficient   65%              95%           +30%
Advanced     70%              90%           +20%

Learned Policy

The trained agent learned an effective difficulty adjustment strategy:

  • MICRO_ADJUST (63%): Primarily uses fine-grained adjustments within difficulty levels
  • DECREASE (28%): Frequently lowers difficulty when users struggle
  • INCREASE (7%): Conservatively raises difficulty only when confident
  • MAINTAIN (1%): Rarely holds steady, preferring active optimization

Significance & Impact

For Cognitive Science

Our adaptive system addresses a longstanding challenge in cognitive training research: maintaining optimal challenge levels across diverse populations. By keeping 89% of sessions in the flow zone, we maximize the conditions known to promote neuroplasticity and skill transfer.

For Struggling Learners

Perhaps the most significant impact is on users who need help most. Traditional fixed-difficulty systems leave struggling learners behind (only 5% flow zone rate). Our system achieves 75% flow zone rate for this group, making effective cognitive training accessible to:

  • Individuals with attention deficits (ADHD)
  • Those with learning disabilities
  • Older adults experiencing cognitive decline
  • Patients in cognitive rehabilitation

For User Engagement

Reducing dropout from 80% to 7% has profound implications for training effectiveness. Cognitive training requires sustained practice over weeks to produce measurable benefits. With our system:

Metric                  Before              After              Impact
Complete 20+ sessions   ~1%                 ~23%               23× more users get the full benefit
Avg. session length     ~15 trials          ~55 trials         3.7× more training per session
User satisfaction       Low (frustration)   High (flow state)  Better retention & referrals

For Cognitive Assessment

Adaptive difficulty also enables more accurate cognitive profiling. Fixed difficulty tests suffer from ceiling and floor effects—high-ability users max out while low-ability users bottom out, providing limited discrimination. Our adaptive system:

  • Finds each user's true ability level through calibrated difficulty
  • Provides more precise construct score estimates
  • Enables tracking of ability changes over time
  • Reduces measurement error in cognitive profiles

Neural Network Architecture

Our adaptive difficulty agent uses a Deep Q-Network (DQN), a neural network that learns to estimate the expected cumulative reward (Q-value) for each possible action given the current state.

Network Structure


┌─────────────────────────────────────────────────────────────┐
│                    DQN ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   INPUT LAYER (9 neurons)                                    │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ ability_score │ uncertainty │ session_count │ ...   │   │
│   └─────────────────────────────────────────────────────┘   │
│                          ↓                                   │
│   HIDDEN LAYER 1 (128 neurons) + ReLU activation            │
│   ┌─────────────────────────────────────────────────────┐   │
│   │ ████████████████████████████████████████████████████│   │
│   └─────────────────────────────────────────────────────┘   │
│                          ↓                                   │
│   HIDDEN LAYER 2 (64 neurons) + ReLU activation             │
│   ┌───────────────────────────────────┐                     │
│   │ ██████████████████████████████████│                     │
│   └───────────────────────────────────┘                     │
│                          ↓                                   │
│   HIDDEN LAYER 3 (32 neurons) + ReLU activation             │
│   ┌───────────────────┐                                     │
│   │ ██████████████████│                                     │
│   └───────────────────┘                                     │
│                          ↓                                   │
│   OUTPUT LAYER (4 neurons) - Q-values for each action       │
│   ┌────────┬────────┬────────┬────────┐                     │
│   │DECREASE│MAINTAIN│INCREASE│ MICRO  │                     │
│   │ Q=-0.2 │ Q=0.1  │ Q=-0.5 │ Q=0.8  │  ← Select max       │
│   └────────┴────────┴────────┴────────┘                     │
│                                                              │
│   Total Parameters: ~15,000 (lightweight for edge inference) │
└─────────────────────────────────────────────────────────────┘
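A quick back-of-envelope check of the parameter count for the [9, 128, 64, 32, 4] stack, counting dense weights plus biases only:

```python
def mlp_param_count(layers: list) -> int:
    """Weights + biases of a fully connected network with the given layer sizes."""
    return sum(i * o + o for i, o in zip(layers, layers[1:]))

ARCH = [9, 128, 64, 32, 4]  # input, three hidden layers, Q-value output
# mlp_param_count(ARCH) gives 11,748 for the online Q-network alone --
# the same order of magnitude as the ~15,000 figure above.
```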
                

Input Feature Engineering

Each input feature is carefully normalized to ensure stable training:

Feature              Raw Range   Normalized   Source
ability_score        0-100       [0, 1]       Bayesian ability estimate from user_construct_scores
uncertainty          0-50        [0, 1]       Standard deviation of ability estimate
session_count        0-100+      [0, 1]       User's total completed sessions
recent_accuracy      0-1         [0, 1]       Rolling average of last 5 sessions
rt_trend             slope       [-1, 1]      Response time trajectory (faster/slower)
dprime_trend         slope       [-1, 1]      Signal detection improvement rate
current_difficulty   0-1         [0, 1]       Active difficulty level (0 = easiest, 1 = hardest)
trials_completed     0-100       [0, 1]       Progress within current session
session_accuracy     0-1         [0, 1]       Current session's accuracy so far
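Applying those normalizations yields the 9-dimensional state vector. A sketch (dict keys mirror the feature names; clipping to the normalized range is an assumption):

```python
def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return min(hi, max(lo, x))

def build_state(raw: dict) -> list:
    """Normalize raw features into the 9-dim state vector, in table order."""
    return [
        clip(raw["ability_score"] / 100),
        clip(raw["uncertainty"] / 50),
        clip(raw["session_count"] / 100),
        clip(raw["recent_accuracy"]),
        clip(raw["rt_trend"], -1.0, 1.0),
        clip(raw["dprime_trend"], -1.0, 1.0),
        clip(raw["current_difficulty"]),
        clip(raw["trials_completed"] / 100),
        clip(raw["session_accuracy"]),
    ]
```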

Why DQN Over Other Algorithms?

DQN Advantages

  • Sample efficient - Experience replay reuses past data
  • Off-policy - Can learn from historical sessions
  • Fast inference - Single forward pass (<5ms)
  • Discrete actions - Natural fit for difficulty levels
  • Stable training - Target network prevents oscillation

PPO Trade-offs

  • ✗ On-policy only - cannot use historical data
  • ✗ Requires more samples for convergence
  • ✗ Policy sampling adds inference latency
  • ○ Better for continuous action spaces
  • ○ More stable with complex reward landscapes

Training Pipeline

Our training pipeline uses a two-phase approach: pre-training on simulated students followed by fine-tuning on real user data.

Phase 1: Simulated Pre-Training


┌─────────────────────────────────────────────────────────────────────┐
│                    TRAINING DATA FLOW                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │  Student     │     │  Gymnasium   │     │    DQN       │        │
│  │  Population  │────▶│  Environment │────▶│    Agent     │        │
│  │  (IRT-based) │     │              │     │              │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│         │                    │                    │                  │
│         │                    │                    │                  │
│         ▼                    ▼                    ▼                  │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │ Sample       │     │ state, reward│     │ Experience   │        │
│  │ student with │     │ action, done │     │ Replay       │        │
│  │ archetype    │     │              │     │ Buffer       │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│                                                   │                  │
│                                                   ▼                  │
│                              ┌────────────────────────────┐         │
│                              │  Mini-batch Gradient       │         │
│                              │  Descent (Adam optimizer)  │         │
│                              │  Loss = (Q - target_Q)²    │         │
│                              └────────────────────────────┘         │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
                

IRT-Based Student Simulation

Each simulated student uses a 2-Parameter Logistic IRT model to generate realistic responses:


from math import exp
from random import random


class IRTStudent:
    def __init__(self, ability: float, persistence: float):
        self.ability = ability            # 0-100 scale
        self.persistence = persistence    # 0-1; higher = less likely to quit
        self.trials_completed = 0
        self.accuracy = 0.5

    def probability_correct(self, difficulty: float) -> float:
        """2PL IRT model for response probability."""
        theta = (self.ability - 50) / 15  # Convert to IRT scale
        b = difficulty * 6 - 3            # Map [0, 1] to [-3, 3]
        a = 1.0 + difficulty * 0.5        # Discrimination parameter

        # Apply fatigue effect (performance drops over time)
        fatigue = self.trials_completed * 0.001
        effective_theta = theta - fatigue

        # 2PL IRT formula
        logit = a * (effective_theta - b)
        return 1 / (1 + exp(-logit))

    def check_dropout(self) -> bool:
        """Simulate user dropout based on frustration/boredom."""
        if self.accuracy < 0.3:      # Too hard
            dropout_prob = 0.015 * (1.2 - self.persistence)
        elif self.accuracy > 0.95:   # Too easy
            dropout_prob = 0.010 * (1.2 - self.persistence)
        elif 0.65 <= self.accuracy <= 0.85:  # Flow zone
            dropout_prob = 0.002     # Very low dropout
        else:
            dropout_prob = 0.008 * (1.2 - self.persistence)

        return random() < dropout_prob

Curriculum Learning Strategy

We use curriculum learning to improve training stability and ensure the model learns to handle all user types:

  • 0-30%: Focus on average students, with a minimum 10% struggling
  • 30-70%: Emphasis on struggling students (20% of population)
  • 70-100%: Full distribution (Struggling, Developing, Average, Proficient, Advanced)

Phase 2: Real Data Fine-Tuning

After pre-training, we fine-tune on real session data extracted from our database:


-- Extract training data from Neon PostgreSQL
SELECT
  s.id as session_id,
  s.user_id,
  ucs.score as ability_score,
  ucs.uncertainty,
  COUNT(*) OVER (PARTITION BY s.user_id) as session_count,
  sm.value as metric_value,
  s.difficulty_level,
  s.status
FROM sessions s
JOIN session_metrics sm ON s.id = sm.session_id
JOIN user_construct_scores ucs ON s.user_id = ucs.user_id
WHERE s.ended_at IS NOT NULL
  AND s.status IN ('completed', 'abandoned')
ORDER BY s.started_at;
              

Platform Integration

The adaptive difficulty system integrates seamlessly with the Cog-Ace platform through our Cloudflare Workers API.

Request Flow


┌──────────────────────────────────────────────────────────────────────────┐
│                     ADAPTIVE DIFFICULTY REQUEST FLOW                      │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌─────────┐      ┌─────────────────┐      ┌─────────────────────────┐  │
│  │  Game   │─────▶│  POST /api/v1/  │─────▶│  AdaptiveDifficulty     │  │
│  │ Client  │      │  sessions/start │      │  Service                │  │
│  └─────────┘      └─────────────────┘      └─────────────────────────┘  │
│       │                                              │                    │
│       │                                              ▼                    │
│       │                                    ┌─────────────────────┐       │
│       │                                    │  1. Build State     │       │
│       │                                    │     from user data  │       │
│       │                                    └─────────────────────┘       │
│       │                                              │                    │
│       │                                              ▼                    │
│       │                                    ┌─────────────────────┐       │
│       │                                    │  2. Query Database  │       │
│       │                                    │  - user_construct_  │       │
│       │                                    │    scores           │       │
│       │                                    │  - recent sessions  │       │
│       │                                    └─────────────────────┘       │
│       │                                              │                    │
│       │                                              ▼                    │
│       │                                    ┌─────────────────────┐       │
│       │                                    │  3. ONNX Inference  │       │
│       │                                    │     (<50ms)         │       │
│       │                                    └─────────────────────┘       │
│       │                                              │                    │
│       │                                              ▼                    │
│       │                                    ┌─────────────────────┐       │
│       │                                    │  4. Select Action   │       │
│       │                                    │  DECREASE/MAINTAIN/ │       │
│       │                                    │  INCREASE/MICRO     │       │
│       │                                    └─────────────────────┘       │
│       │                                              │                    │
│       │         ┌────────────────────────────────────┘                   │
│       │         ▼                                                         │
│       │  ┌─────────────────────────────────────────────────────────┐    │
│       │  │  Response: { sessionId, difficulty: 0.65, action: ... } │    │
│       │  └─────────────────────────────────────────────────────────┘    │
│       │         │                                                         │
│       ▼         ▼                                                         │
│  ┌─────────────────────┐                                                 │
│  │  Game runs at       │                                                 │
│  │  recommended        │                                                 │
│  │  difficulty         │                                                 │
│  └─────────────────────┘                                                 │
│                                                                           │
└──────────────────────────────────────────────────────────────────────────┘
                

API Implementation


// api/src/services/adaptive-difficulty.ts

export class AdaptiveDifficultyService {
  async selectDifficulty(
    userId: string,
    gameId: string,
    currentDifficulty: number
  ): Promise<DifficultyRecommendation> {
    // 1. Build state vector from user history
    const state = await this.buildStateVector(userId, gameId);

    // 2. Run ONNX inference (or use heuristic fallback)
    const action = await this.runInference(state);

    // 3. Apply action to get new difficulty
    const newDifficulty = this.applyAction(
      currentDifficulty,
      action,
      state.sessionAccuracy
    );

    // 4. Apply cold-start exploration for new users
    if (state.sessionCount < 5) {
      const exploreRate = this.getExplorationRate(state.sessionCount);
      if (Math.random() < exploreRate) {
        return this.sampleExploratory(newDifficulty);
      }
    }

    return {
      difficulty: newDifficulty,
      action: ACTION_NAMES[action],
      confidence: this.calculateConfidence(state),
    };
  }

  private async buildStateVector(
    userId: string,
    gameId: string
  ): Promise<StateVector> {
    // Query user's construct scores
    const scores = await this.db
      .select()
      .from(userConstructScores)
      .where(eq(userConstructScores.userId, userId));

    // Query recent sessions
    const recentSessions = await this.sessionsRepo
      .getRecentSessions(userId, gameId, 5);

    // Calculate trends and build 9-dimensional state
    return {
      abilityScore: scores[0]?.score ?? 50,
      uncertainty: scores[0]?.uncertainty ?? 25,
      sessionCount: recentSessions.length,
      recentAccuracy: this.calculateRecentAccuracy(recentSessions),
      rtTrend: this.calculateRtTrend(recentSessions),
      dprimeTrend: this.calculateDprimeTrend(recentSessions),
      currentDifficulty: currentDifficulty,
      trialsCompleted: 0,
      sessionAccuracy: 0.5,  // Prior for new session
    };
  }
}
              

Cold Start Strategy

For new users with limited data, we use an exploration schedule that balances learning about the user with providing a good experience:

Sessions Completed   Exploration Rate   Strategy
0 (first session)    50%                High exploration, start at medium difficulty
1                    40%                Still exploring, use initial performance data
2                    30%                Beginning to personalize
3                    20%                Model predictions becoming reliable
4                    10%                Mostly exploitation with occasional exploration
5+                   5%                 Standard operation, rare exploration
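The schedule reduces to a lookup with a floor rate. In this sketch, the ±0.15 exploratory jitter around the recommended difficulty is an illustrative assumption:

```python
import random

# Exploration rate by completed sessions, per the schedule above
RATES = {0: 0.50, 1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10}

def exploration_rate(sessions_completed: int) -> float:
    return RATES.get(sessions_completed, 0.05)  # 5% floor from session 5 on

def maybe_explore(difficulty: float, sessions_completed: int,
                  rng: random.Random) -> float:
    """With scheduled probability, perturb the recommended difficulty."""
    if rng.random() < exploration_rate(sessions_completed):
        return min(1.0, max(0.0, difficulty + rng.uniform(-0.15, 0.15)))
    return difficulty
```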

Database Schema Integration

The adaptive system reads from and writes to these key tables:


┌─────────────────────────┐      ┌─────────────────────────┐
│    user_construct_      │      │       sessions          │
│    scores               │      │                         │
├─────────────────────────┤      ├─────────────────────────┤
│ user_id (FK)            │◀────▶│ user_id (FK)            │
│ construct_id (FK)       │      │ game_id (FK)            │
│ score (0-100)           │      │ difficulty_level        │
│ uncertainty             │      │ status                  │
│ updated_at              │      │ started_at / ended_at   │
└─────────────────────────┘      └─────────────────────────┘
                                           │
                                           ▼
                                 ┌─────────────────────────┐
                                 │    session_metrics      │
                                 ├─────────────────────────┤
                                 │ session_id (FK)         │
                                 │ metric_id (FK)          │
                                 │ value                   │
                                 │ (accuracy, rt_ms, etc.) │
                                 └─────────────────────────┘
              

Model Deployment

ONNX Export Pipeline

The trained PyTorch model is exported to ONNX format for cross-platform inference:


# models/cogace_rl/export/onnx_export.py

import time

import numpy as np
import onnx
import onnxruntime as ort
import torch
from stable_baselines3 import DQN


def export_to_onnx(model_path: str, output_path: str):
    # Load trained DQN model
    model = DQN.load(model_path)

    # Extract Q-network
    q_net = model.q_net

    # Create dummy input matching state dimensions
    dummy_input = torch.randn(1, 9)  # [batch, state_dim]

    # Export to ONNX with optimization
    torch.onnx.export(
        q_net,
        dummy_input,
        output_path,
        input_names=["state"],
        output_names=["q_values"],
        dynamic_axes={"state": {0: "batch"}, "q_values": {0: "batch"}},
        opset_version=17,
    )

    # Verify exported model
    onnx_model = onnx.load(output_path)
    onnx.checker.check_model(onnx_model)

    # Benchmark inference time
    session = ort.InferenceSession(output_path)
    times = []
    for _ in range(100):
        start = time.perf_counter()
        session.run(None, {"state": dummy_input.numpy()})
        times.append((time.perf_counter() - start) * 1000)

    print(f"Inference: {np.mean(times):.2f}ms ± {np.std(times):.2f}ms")
    # Typical: 0.5ms ± 0.1ms (well under 50ms target)
              

Edge Deployment Architecture


┌───────────────────────────────────────────────────────────────────┐
│                    DEPLOYMENT ARCHITECTURE                         │
├───────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌─────────────────┐                                              │
│  │   Training      │                                              │
│  │   (Python)      │                                              │
│  │                 │                                              │
│  │ Stable-Baselines│                                              │
│  │ PyTorch         │                                              │
│  │ Gymnasium       │                                              │
│  └────────┬────────┘                                              │
│           │                                                        │
│           ▼  Export                                                │
│  ┌─────────────────┐                                              │
│  │   ONNX Model    │                                              │
│  │   (~500KB)      │                                              │
│  └────────┬────────┘                                              │
│           │                                                        │
│           ▼  Upload                                                │
│  ┌─────────────────┐     ┌─────────────────┐                     │
│  │  Cloudflare R2  │────▶│ Cloudflare      │                     │
│  │  (Model Store)  │     │ Workers         │                     │
│  └─────────────────┘     │                 │                     │
│                          │ ONNX Runtime    │◀──── API Requests   │
│                          │ Web Assembly    │                     │
│                          │                 │────▶ Difficulty     │
│                          │ <50ms inference │      Recommendation │
│                          └─────────────────┘                     │
│                                                                    │
│  Benefits:                                                         │
│  • Global edge deployment (300+ locations)                        │
│  • No cold start (Workers always warm)                            │
│  • Sub-50ms total latency                                         │
│  • Scales automatically                                           │
│  • No GPU required for inference                                  │
│                                                                    │
└───────────────────────────────────────────────────────────────────┘
                

Primary Research Foundation

Zini et al. (2022)

“Adaptive Cognitive Training with Reinforcement Learning”

ACM Transactions on Interactive Intelligent Systems (TiiS)

This foundational paper establishes the framework for using Deep Q-Networks (DQN) to learn optimal difficulty adjustment policies in cognitive training applications. The key contributions include:

  • Formulation of adaptive difficulty as a Markov Decision Process (MDP)
  • Multi-objective reward function balancing flow zone and engagement
  • Simulation-based pre-training with IRT student models
  • Demonstrated superiority over rule-based and random baselines

Supporting Research

RL-Tutor (2025)

PPO for Personalized Tutoring

Explores Proximal Policy Optimization with Dynamic Knowledge Tracing for vocabulary learning, achieving 30% improvement in retention.

RL-DKT (2025)

RL + Dynamic Knowledge Tracing

Combines reinforcement learning with knowledge tracing models to optimize exercise sequencing in educational systems.

Game-Specific Difficulty Mapping

Our adaptive system uses a game-agnostic architecture that separates the RL policy (which outputs optimal difficulty levels) from game-specific parameter mapping. This allows any cognitive game to integrate with our adaptive system without requiring game-specific model training.

Architecture Overview


┌─────────────────────────────────────────────────────────────────────────┐
│                    DIFFICULTY MAPPING PIPELINE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────────┐ │
│  │   DQN Model     │    │   Platform API  │    │   Game Runtime      │ │
│  │                 │    │                 │    │                     │ │
│  │  State → Action │───▶│  Action → Level │───▶│  Level → Params     │ │
│  │  (9 features)   │    │  (0-1 numeric)  │    │  (game-specific)    │ │
│  └─────────────────┘    └─────────────────┘    └─────────────────────┘ │
│         │                       │                        │              │
│         ▼                       ▼                        ▼              │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────────┐ │
│  │ MICRO_ADJUST    │    │ numericValue:   │    │ N-Back:             │ │
│  │ (action=3)      │    │ 0.52            │    │   nBackLevel: 2     │ │
│  │                 │    │                 │    │   gridSize: 3×3     │ │
│  │                 │    │ difficulty:     │    │                     │ │
│  │                 │    │ "medium"        │    │ Stroop:             │ │
│  │                 │    │                 │    │   displayTime: 480ms│ │
│  │                 │    │                 │    │   congruencyRatio:  │ │
│  │                 │    │                 │    │   0.3               │ │
│  └─────────────────┘    └─────────────────┘    └─────────────────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
                

Game Task Specification

Each game defines its supported difficulty levels and session parameters in a task-spec.json file:

// task-spec.json
{
  "constructs": [
    { "id": "working_memory", "weight": 0.7 },
    { "id": "processing_speed", "weight": 0.3 }
  ],
  "metrics": [
    { "key": "accuracy", "required": true },
    { "key": "rt_ms_p50", "required": true }
  ],
  "modes": ["training", "assessment"],
  "controls": {
    "difficulty": ["easy", "medium", "hard"],
    "sessionMinutes": [2, 3, 5]
  }
}

Session Start API Response

When a game session starts, the API returns both the discrete level and precise numeric value for games that support fine-grained difficulty:

// POST /api/v1/sessions/start response
{
  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
  "sessionToken": "eyJhbGciOiJIUzI1NiIs...",
  "config": {
    "mode": "training",
    "difficulty": "medium",
    "durationMinutes": 5,
    "adaptive": {
      "numericDifficulty": 0.52,    // Precise 0-1 value
      "action": 3,                   // MICRO_ADJUST
      "confidence": 0.85,            // Model confidence
      "coldStart": false,            // New user flag
      "reason": "model: MICRO_ADJUST (accuracy: 78%, ability: 52)"
    }
  }
}

Example: Game-Specific Parameter Mapping

N-Back Game

Level              N-Back   Grid
easy (0.0-0.3)     1-back   3×3
medium (0.3-0.7)   2-back   3×3
hard (0.7-1.0)     3-back   4×4

Stroop Game

Level    Display   Incongruent
easy     1000ms    20%
medium   500ms     40%
hard     250ms     60%
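The mapping tables above reduce to threshold functions on the numeric difficulty. In this sketch, the parameter names follow the pipeline diagram earlier; the 0.3/0.7 cut points come from the N-Back table and are assumed to apply to Stroop as well:

```python
def nback_params(d: float) -> dict:
    """Map numeric difficulty (0-1) to N-Back parameters (gridSize is the side length)."""
    if d < 0.3:
        return {"nBackLevel": 1, "gridSize": 3}
    if d < 0.7:
        return {"nBackLevel": 2, "gridSize": 3}
    return {"nBackLevel": 3, "gridSize": 4}

def stroop_params(d: float) -> dict:
    """Map numeric difficulty (0-1) to Stroop parameters per the table."""
    if d < 0.3:
        return {"displayTimeMs": 1000, "incongruentRatio": 0.2}
    if d < 0.7:
        return {"displayTimeMs": 500, "incongruentRatio": 0.4}
    return {"displayTimeMs": 250, "incongruentRatio": 0.6}
```

A game that supports continuous scaling could instead interpolate between the anchor rows (e.g. the 480ms display time at difficulty 0.52 shown in the pipeline diagram), using the precise numeric value rather than the discrete level.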

Benefits of This Architecture

  • Game-Agnostic Model: One trained DQN serves all games; no per-game model training is required
  • Easy Integration: Games only need to define their difficulty mapping in task-spec.json
  • Cross-Game Learning: User performance in one game informs difficulty in others via shared construct scores
  • Fine-Grained Control: Games can use the precise numeric value (0-1) for continuous difficulty scaling

References & Further Reading

  • Flow Theory: Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience.
  • Zone of Proximal Development: Vygotsky, L. S. (1978). Mind in Society.
  • Item Response Theory: Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists.
  • RL for Education: Zini, F., et al. (2022). Adaptive Cognitive Training with Reinforcement Learning. ACM TiiS.
  • Deep Q-Networks: Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature.

Technical Implementation

Our adaptive difficulty system is implemented using:

  • Training: Python with Stable Baselines3, PyTorch, Gymnasium
  • Model Export: ONNX format for cross-platform inference
  • Production Inference: ONNX Runtime in Cloudflare Workers (<50ms latency)
  • Cold Start: Bayesian priors with exploration schedule for new users

The model is designed for real-time inference, with decisions made in under 50ms to ensure seamless gameplay experience.

Academic Research Comparison

How does our DQN-based approach compare to other adaptive learning methods such as ZPDES (Zone of Proximal Development and Empirical Success)? We provide a detailed comparison with the latest academic research, including mathematical formulas and algorithmic differences.

View Research Comparison →