Problem-solving mode

Every topic becomes a question you can answer under pressure.

The goal is to build agent design reflexes: name the failure mode, choose the minimal architecture, define tools and memory, then verify with an eval.

Agentic AI MOOC Fall 2025 - video 01

Agentic AI safety and security

Agents can take actions, so prompt injection, tool misuse, memory poisoning, and privilege escalation become operational risks.

Drill 1: Threat-model a browser agent.
  1. List assets: accounts, data, tools, money movement.
  2. List attack inputs: pages, emails, files, memory.
  3. Add permission boundaries and confirmation gates.
Drill 2: What is indirect prompt injection?
  1. A malicious external document instructs the agent to ignore policy or exfiltrate data.
  2. Defense: treat tool outputs as untrusted data.
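Drill 2's defense can be sketched in code. A minimal Python sketch, where the `wrap_tool_output` helper and the pattern list are illustrative assumptions; a real system needs far broader detection than a few regexes:

```python
import re

# Illustrative sketch: tool outputs are wrapped as inert data and screened
# for instruction-like phrasing before they ever reach the planner.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"you are now",
    r"send .* to",
]

def wrap_tool_output(text: str) -> dict:
    """Return tool output as quoted data plus an injection-risk flag."""
    flagged = any(re.search(p, text, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS)
    return {
        # The planner must treat `content` as data, never as policy.
        "content": f"<untrusted-data>{text}</untrusted-data>",
        "injection_suspected": flagged,
    }
```

The key design choice is that suspicious output is flagged and quarantined, not silently dropped, so a human or stricter policy can inspect it.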

Agentic AI MOOC Fall 2025 - video 02

Autonomous embodied agents

Embodied agents must act in environments where observations are delayed, partial, and physically grounded.

Drill 1: Map an LLM web agent to an embodied agent.
  1. Observation becomes sensory input.
  2. Tool call becomes motor/action command.
  3. Verifier becomes environment reward or safety monitor.
Drill 2: Why is simulation useful?
  1. Cheap exploration.
  2. Safe failure.
  3. Curriculum construction.

Agentic AI MOOC Fall 2025 - video 03

LLM-era multi-agent systems

Classic multi-agent systems assumed explicit protocols; LLM agents communicate in flexible language but become harder to verify.

Drill 1: Design a multi-agent debate that is not theater.
  1. Assign different evidence sources.
  2. Require citations or tests.
  3. Use a judge rubric based on outcome.
Drill 2: Convert loose chat to a protocol.
  1. Define message schema.
  2. Define authority.
  3. Define merge rule.
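Drill 2's three definitions can be sketched as a minimal protocol. The `Message` fields, role names, and merge rule below are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

# Sketch of a message schema, an authority assignment, and a merge rule.
@dataclass(frozen=True)
class Message:
    sender: str        # agent id
    role: str          # e.g. "proposer" | "critic" | "judge"
    claim: str         # the content under discussion
    evidence: list     # citations or test results backing the claim

def merge(messages: list[Message], authority: str) -> Message:
    """Merge rule: drop unevidenced proposals first, then let the
    authoritative agent's latest message win; otherwise take the
    latest evidenced message. Assumes at least one evidenced message."""
    backed = [m for m in messages if m.evidence]
    authoritative = [m for m in backed if m.sender == authority]
    return (authoritative or backed)[-1]
```

Making the merge rule a pure function keeps it auditable, which is the point of converting chat to protocol.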

Agentic AI MOOC Fall 2025 - video 04

Deploying real-world agents

Real users expose edge cases that scripted demos never touch.

Drill 1: Design a customer-support agent eval.
  1. Create happy path, ambiguity, hostile input, and tool outage cases.
  2. Score resolution and policy compliance.
  3. Replay failures into regression tests.
Drill 2: Where should humans enter the loop?
  1. High-risk actions.
  2. Low confidence.
  3. User dispute.
  4. Policy ambiguity.
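Drill 2's four triggers can be combined into one escalation gate. A sketch, where the high-risk action list and the 0.7 confidence threshold are assumptions to tune per product:

```python
# Illustrative escalation gate: route to a human when any trigger fires.
HIGH_RISK_ACTIONS = {"refund", "account_delete", "wire_transfer"}

def needs_human(action: str, confidence: float,
                user_disputes: bool, policy_ambiguous: bool) -> bool:
    return (
        action in HIGH_RISK_ACTIONS      # high-risk actions
        or confidence < 0.7              # low confidence (threshold is an assumption)
        or user_disputes                 # user dispute
        or policy_ambiguous              # policy ambiguity
    )
```

Keeping the gate as a single boolean function makes it easy to log every escalation decision alongside the inputs that caused it.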

Agentic AI MOOC Fall 2025 - video 05

AI agents for science

Scientific discovery is a pipeline of literature search, hypothesis generation, experiment design, analysis, and iteration.

Drill 1: Design a paper-to-agent workflow.
  1. Extract claims, methods, data, limitations.
  2. Expose search and calculation tools.
  3. Add citation-backed answers and uncertainty.
Drill 2: What must a science agent verify?
  1. Dataset provenance.
  2. Experimental feasibility.
  3. Statistical validity.
  4. Safety constraints.

Agentic AI MOOC Fall 2025 - video 06

Benchmark noise and evaluation

A model can look better or worse from run to run because the benchmark is noisy, not because the agent actually improved.

Drill 1: Design an eval report for an agent.
  1. Include success rate, confidence interval, cost, latency, and retries.
  2. Show failure categories.
  3. Report what changed between runs.
Drill 2: Why can leaderboard scores mislead?
  1. Benchmarks leak.
  2. Tasks become saturated.
  3. Scaffolds differ.
  4. Budgets differ.
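Drill 1's confidence interval can be made concrete with a Wilson score interval, a common choice for binomial success rates. The report fields are illustrative; cost, latency, and retries would be added the same way:

```python
import math

# Sketch of an eval summary with a 95% Wilson confidence interval, so a
# run-to-run delta can be judged against noise instead of taken at face value.
def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

def eval_report(successes: int, n: int) -> dict:
    lo, hi = wilson_interval(successes, n)
    return {"success_rate": successes / n,
            "ci95": (round(lo, 3), round(hi, 3)),
            "n": n}
```

With 70/100 successes the interval spans roughly 0.60 to 0.78, so a one-point "improvement" between runs is well inside the noise.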

Agentic AI MOOC Fall 2025 - video 07

Multi-agent AI

A single model's context is too narrow for broad, parallel, open-ended work.

Drill 1: Split a market research task into subagents.
  1. Segment by question, region, or source type.
  2. Give each subagent a deliverable schema.
  3. Merge with source-quality checks.
Drill 2: When should you avoid multi-agent design?
  1. When tasks are tightly coupled.
  2. When shared context is essential.
  3. When the extra token cost exceeds value.

Agentic AI MOOC Fall 2025 - video 08

Training agentic models

Base and chat models know language, but agentic work needs persistence, tool discipline, and recovery from failure.

Drill 1: Create an agent training curriculum.
  1. Start with single-tool tasks.
  2. Add noisy observations and retries.
  3. End with long-horizon tasks and hidden state.
Drill 2: Diagnose overfitting to benchmark format.
  1. Change surface wording.
  2. Mutate environment state.
  3. Check if success survives changed tools.
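Drill 2's surface mutations can be sketched as a tiny harness. The synonym table and number-shifting rule are illustrative assumptions; a real harness would also mutate environment state and tool signatures:

```python
import random

# Illustrative mutation harness: reword the task surface and perturb numbers,
# then re-run the eval and check whether measured success survives.
SYNONYMS = {"compute": "calculate", "list": "enumerate", "file": "document"}

def mutate_prompt(prompt: str, rng: random.Random) -> str:
    mutated = []
    for w in prompt.split():
        if w in SYNONYMS and rng.random() < 0.5:
            mutated.append(SYNONYMS[w])            # swap surface wording
        elif w.isdigit():
            mutated.append(str(int(w) + rng.randint(1, 5)))  # always shift numbers
        else:
            mutated.append(w)
    return " ".join(mutated)
```

If success collapses under these trivial rewrites, the model learned the benchmark format rather than the task.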

Agentic AI MOOC Fall 2025 - video 09

Post-training verifiable agents

Agents need training signals for long tasks, but many useful tasks do not have obvious step-by-step labels.

Drill 1: Turn a vague task into a verifiable agent task.
  1. Define the end artifact.
  2. Write a checker or rubric.
  3. Add adversarial cases where shortcut behavior fails.
Drill 2: Why can verifiers beat preference labels?
  1. They are cheaper at scale.
  2. They reduce subjective grading.
  3. They support self-improvement loops.
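Drill 1 can be made concrete for a hypothetical "cited summary" task; the artifact fields, word threshold, and shortcut checks below are all assumptions:

```python
# Illustrative checker: the end artifact is a summary plus source ids, and
# the checker rejects shortcut behavior (no citations, fabricated sources,
# or a summary too short to be real).
def check_cited_summary(artifact: dict, allowed_sources: set[str],
                        min_words: int = 30) -> bool:
    summary = artifact.get("summary", "")
    citations = artifact.get("citations", [])
    if len(summary.split()) < min_words:   # shortcut: trivially short output
        return False
    if not citations:                      # shortcut: no citations at all
        return False
    return all(c in allowed_sources for c in citations)  # shortcut: invented sources
```

Each `return False` branch corresponds to an adversarial case from step 3: a shortcut policy that a preference label might accidentally reward.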

Agentic AI MOOC Fall 2025 - video 10

Agent system design evolution

Agent prototypes work in demos but fail when state, tools, latency, retries, and deployment versions interact.

Drill 1: Design a durable run record for an agent.
  1. Record goal, plan, tool calls, observations, checkpoints, and final evidence.
  2. Make each step resumable.
  3. Attach cost and latency budgets.
Drill 2: Find where a demo agent breaks in production.
  1. Look for missing idempotency.
  2. Look for unbounded tool loops.
  3. Look for impossible rollback.
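Drill 1's durable run record can be sketched as a small class; the field names and the cost-only budget check are illustrative assumptions:

```python
from dataclasses import dataclass, field

# Sketch of a run record: each step is appended with enough context to
# resume, and budgets travel with the record.
@dataclass
class RunRecord:
    goal: str
    cost_budget_usd: float
    latency_budget_s: float
    steps: list = field(default_factory=list)

    def log_step(self, tool: str, args: dict, observation: str, cost_usd: float):
        self.steps.append({"tool": tool, "args": args,
                           "observation": observation, "cost_usd": cost_usd})

    def resume_point(self) -> int:
        """Resume after the last completed step."""
        return len(self.steps)

    def over_budget(self) -> bool:
        return sum(s["cost_usd"] for s in self.steps) > self.cost_budget_usd
```

In production this record would be persisted after every step, which is what makes steps resumable rather than merely logged.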

Agentic AI MOOC Fall 2025 - video 11

LLM agent foundations

A model can answer a prompt, but an agent must decide what to do next, which tools to use, and when to stop.

Drill 1: Design a minimal research agent loop.
  1. State the user goal as an outcome.
  2. Define tools with narrow schemas.
  3. Add a stopping rule and final evidence check.
Drill 2: When is a workflow better than an agent?
  1. Use a workflow when the path is known.
  2. Use an agent when the path depends on intermediate observations.
  3. Prefer the simpler system until dynamic control is necessary.
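Drill 1's loop can be sketched with the policy, tools, and evidence check injected by the caller, so only the control flow (act, observe, stop) is fixed here:

```python
# Minimal agent-loop sketch. `policy`, `tools`, and `is_done` stand in for
# model calls and checkers and are supplied by the caller.
def run_agent(goal, policy, tools, is_done, max_steps=10):
    """policy(goal, history) -> (tool_name, args); stops on is_done or budget."""
    history = []
    for _ in range(max_steps):           # stopping rule 1: hard step budget
        tool_name, args = policy(goal, history)
        observation = tools[tool_name](**args)
        history.append((tool_name, args, observation))
        if is_done(goal, history):       # stopping rule 2: final evidence check
            return history
    return history                        # budget exhausted; caller inspects history
```

Note that both stopping rules from step 3 are explicit: a step budget so the loop is bounded, and an evidence check so it does not stop on vibes.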

Advanced LLM Agents MOOC Spring 2025 - video 01

Safe and secure agentic AI

As agents gain tools and memory, security is no longer a prompt add-on; it is architecture.

Drill 1: Design privilege separation for an agent.
  1. Classify tools by risk.
  2. Separate read and write authority.
  3. Require approvals for high-risk transitions.
Drill 2: Red-team an agent memory system.
  1. Insert malicious memory.
  2. Trigger retrieval.
  3. Check whether the agent follows the poisoned instruction.
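Drill 1's privilege separation can be sketched as a gate; the tool names, risk tiers, and scope strings are illustrative assumptions:

```python
# Illustrative privilege gate: tools are classified by risk, read and write
# authority are separate, and high-risk calls also need an explicit approval.
TOOL_RISK = {"read_file": "low", "search": "low",
             "write_file": "medium", "send_payment": "high"}

def authorize(tool: str, granted_scopes: set[str], approved: bool = False) -> bool:
    risk = TOOL_RISK.get(tool, "high")   # unknown tools default to high risk
    if risk == "low":
        return "read" in granted_scopes
    if risk == "medium":
        return "write" in granted_scopes
    return "write" in granted_scopes and approved  # high risk: scope plus approval
```

Defaulting unknown tools to high risk is the fail-closed choice: a new tool must be classified before the agent can use it freely.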

Advanced LLM Agents MOOC Spring 2025 - video 02

Abstraction and discovery

Agents should not only solve one task; they should discover reusable abstractions that compress future tasks.

Drill 1: Build a concept-library agent.
  1. Observe repeated solutions.
  2. Propose abstraction.
  3. Test on held-out tasks.
  4. Promote only if it helps.
Drill 2: When is abstraction harmful?
  1. Premature naming.
  2. Leaky concept.
  3. No measurable reuse.

Advanced LLM Agents MOOC Spring 2025 - video 03

Informal plus formal math reasoning

Open lesson

A complete proof often needs informal planning before formal verification.

Drill 1: Convert an informal proof sketch to a formal plan.
  1. List lemmas.
  2. Map each lemma to library search.
  3. Define tactic sequence.
  4. Check and repair.
Drill 2: Why use sketches?
  1. They reduce search space.
  2. They preserve human-level strategy.
  3. They guide formal tactics.

Advanced LLM Agents MOOC Spring 2025 - video 04

Autoformalization and theorem proving

Human math is informal; proof assistants require exact formal statements and tactics.

Drill 1: Design an autoformalization pipeline.
  1. Parse informal statement.
  2. Retrieve similar formal theorems.
  3. Draft formal statement.
  4. Check types.
  5. Repair.
Drill 2: Where does retrieval help Lean agents?
  1. Find theorem names.
  2. Find tactic patterns.
  3. Find library definitions.

Advanced LLM Agents MOOC Spring 2025 - video 05

AlphaProof and formal math

Natural-language math reasoning is fragile; formal systems can verify proofs but are hard to search.

Drill 1: Explain why theorem proving is agentic.
  1. There is a state: proof context.
  2. There are actions: tactics.
  3. There is feedback: verifier accepts or rejects.
Drill 2: What makes math a good RL domain?
  1. Formal reward.
  2. Huge search space.
  3. Reusable libraries.

Advanced LLM Agents MOOC Spring 2025 - video 06

Perception to action

Computer-use agents must operate across real operating systems, not only benchmark websites.

Drill 1: Design a GUI agent safety layer.
  1. Limit destructive actions.
  2. Require confirmation for irreversible changes.
  3. Log screen/action pairs.
Drill 2: Measure a computer-use agent.
  1. Task success.
  2. Action count.
  3. Recovery from misclick.
  4. Human intervention rate.

Advanced LLM Agents MOOC Spring 2025 - video 07

Multimodal autonomous agents

Web and GUI tasks require seeing layout, reading text, choosing actions, and recovering from UI changes.

Drill 1: Turn a web task into agent state.
  1. Goal.
  2. Current page observation.
  3. Available actions.
  4. Memory.
  5. Success check.
Drill 2: Why do web agents fail?
  1. Wrong element.
  2. Hidden state.
  3. Long horizon.
  4. Ambiguous success.
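Drill 1's five state items can be sketched as one structure; the field names and the marker-based success check are assumptions, since real success checks are task-specific:

```python
from dataclasses import dataclass, field

# Sketch of web-agent state mirroring the five items in Drill 1.
@dataclass
class WebAgentState:
    goal: str
    observation: str                # current page snapshot (text or DOM)
    available_actions: list         # e.g. ["click:#submit", "type:#q"]
    memory: dict = field(default_factory=dict)

    def success(self) -> bool:
        # Illustrative check: look for a task-specific marker on the page.
        marker = self.memory.get("done_marker")
        return bool(marker) and marker in self.observation
```

Drill 2's "ambiguous success" failure mode is exactly what happens when this check is left vague: the agent stops on a page that merely looks finished.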

Advanced LLM Agents MOOC Spring 2025 - video 08

Code agents and vulnerability detection

Security bugs hide across files, execution paths, and tool outputs; static prompting misses them.

Drill 1: Design a vulnerability-finding agent.
  1. Index code.
  2. Generate threat hypotheses.
  3. Trace data flow.
  4. Run targeted tests.
  5. Report evidence.
Drill 2: What makes a security finding useful?
  1. Repro steps.
  2. Impact.
  3. Affected path.
  4. Minimal fix.
  5. Regression test.

Advanced LLM Agents MOOC Spring 2025 - video 09

Open training recipes

Open models need reproducible paths to strong reasoning that do not depend on secret proprietary data.

Drill 1: Audit an open reasoning recipe.
  1. List datasets.
  2. List filtering rules.
  3. List eval contamination risks.
  4. Check ablations.
Drill 2: Build a small post-training plan.
  1. Define task family.
  2. Generate data.
  3. Filter with verifiers.
  4. Evaluate on held-out mutations.

Advanced LLM Agents MOOC Spring 2025 - video 10

Memory and planning

Agents forget, repeat work, or plan against a false model of the environment.

Drill 1: Design memory for a coding agent.
  1. Store repo map, user constraints, test results, decisions.
  2. Retrieve by task and file path.
  3. Expire stale assumptions.
Drill 2: When does memory hurt?
  1. Poisoned memories.
  2. Over-retrieval.
  3. Outdated facts.
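Drill 1's memory design can be sketched with explicit timestamps so staleness is testable; keying by (task, file path) and the age-based expiry are illustrative choices:

```python
# Sketch of agent memory: entries are keyed by (task, file path), carry a
# timestamp, and stale assumptions are expired on retrieval.
class AgentMemory:
    def __init__(self, max_age: float):
        self.max_age = max_age
        self.entries = {}                  # (task, path) -> (timestamp, fact)

    def store(self, task: str, path: str, fact: str, now: float):
        self.entries[(task, path)] = (now, fact)

    def retrieve(self, task: str, path: str, now: float):
        item = self.entries.get((task, path))
        if item is None:
            return None
        ts, fact = item
        if now - ts > self.max_age:        # expire stale assumptions
            del self.entries[(task, path)]
            return None
        return fact
```

Passing `now` explicitly instead of reading the clock keeps expiry deterministic and easy to test, which matters for Drill 2's "outdated facts" failure mode.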

Advanced LLM Agents MOOC Spring 2025 - video 11

Learning to reason with LLMs

Reasoning behavior must be trained or elicited without simply teaching the model to produce longer text.

Drill 1: Create reasoning training examples.
  1. Include hard negatives.
  2. Include verifier feedback.
  3. Avoid rewarding verbosity alone.
Drill 2: Diagnose fake reasoning.
  1. Remove the chain and test answer.
  2. Mutate numbers.
  3. Ask for independent verification.

Advanced LLM Agents MOOC Spring 2025 - video 12

Inference-time reasoning

Some tasks need search at inference time because one sampled chain is fragile.

Drill 1: Choose an inference-time technique.
  1. If the answer can be tested, use generate-test-repair.
  2. If many paths exist, use search.
  3. If the prompt is variable, use prompt optimization.
Drill 2: Why does self-correction often fail?
  1. The model may not see the error.
  2. The critic may share the same blind spot.
  3. External verifiers fix this.
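Drill 1's generate-test-repair option can be sketched with the model calls injected; the external `test` function plays the verifier role from Drill 2:

```python
# Generate-test-repair sketch: an external checker supplies the error signal,
# sidestepping the shared blind spot of self-critique. `generate` and `repair`
# stand in for model calls and are supplied by the caller.
def generate_test_repair(generate, test, repair, max_rounds=3):
    candidate = generate()
    for _ in range(max_rounds):
        ok, error = test(candidate)      # external verifier, not self-judgment
        if ok:
            return candidate
        candidate = repair(candidate, error)
    return None                           # give up; caller escalates
```

Returning `None` on exhaustion, rather than the last unverified candidate, is the safety-relevant choice: a failed loop should escalate, not ship its best guess.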