Problem-solving mode
Every topic becomes a question you can answer under pressure.
The goal is to build agent design reflexes: name the failure mode, choose the minimal architecture, define tools and memory, then verify with an eval.
Agentic AI MOOC Fall 2025 - video 01
Agentic AI safety and security
Agents can take actions, so prompt injection, tool misuse, memory poisoning, and privilege escalation become operational risks.
Drill 1Threat-model a browser agent.
- List assets: accounts, data, tools, money movement.
- List attack inputs: pages, emails, files, memory.
- Add permission boundaries and confirmation gates.
Drill 2What is indirect prompt injection?
- A malicious external document instructs the agent to ignore policy or exfiltrate data.
- Defense: treat tool outputs as untrusted data.
Agentic AI MOOC Fall 2025 - video 02
Autonomous embodied agents
Embodied agents must act in environments where observations are delayed, partial, and physically grounded.
Drill 1Map an LLM web agent to an embodied agent.
- Observation becomes sensory input.
- Tool call becomes motor/action command.
- Verifier becomes environment reward or safety monitor.
Drill 2Why is simulation useful?
- Cheap exploration.
- Safe failure.
- Curriculum construction.
Agentic AI MOOC Fall 2025 - video 03
LLM-era multi-agent systems
Classic multi-agent systems assumed explicit protocols; LLM agents communicate in flexible language but become harder to verify.
Drill 1Design a multi-agent debate that is not theater.
- Assign different evidence sources.
- Require citations or tests.
- Use a judge rubric based on outcome.
Drill 2Convert loose chat to protocol.
- Define message schema.
- Define authority.
- Define merge rule.
Agentic AI MOOC Fall 2025 - video 04
Deploying real-world agents
Real users expose edge cases that scripted demos never touch.
Drill 1Design a customer-support agent eval.
- Create happy path, ambiguity, hostile input, and tool outage cases.
- Score resolution and policy compliance.
- Replay failures into regression tests.
Drill 2Where should humans enter the loop?
- High-risk actions.
- Low confidence.
- User dispute.
- Policy ambiguity.
Agentic AI MOOC Fall 2025 - video 05
AI agents for science
Scientific discovery is a pipeline of literature search, hypothesis generation, experiment design, analysis, and iteration.
Drill 1Design a paper-to-agent workflow.
- Extract claims, methods, data, limitations.
- Expose search and calculation tools.
- Add citation-backed answers and uncertainty.
Drill 2What must a science agent verify?
- Dataset provenance.
- Experimental feasibility.
- Statistical validity.
- Safety constraints.
Agentic AI MOOC Fall 2025 - video 06
Benchmark noise and evaluation
A model looks better or worse because the benchmark is noisy, not because the agent improved.
Drill 1Design an eval report for an agent.
- Include success rate, confidence interval, cost, latency, and retries.
- Show failure categories.
- Report what changed between runs.
Drill 2Why can leaderboard scores mislead?
- Benchmarks leak.
- Tasks become saturated.
- Scaffolds differ.
- Budgets differ.
Agentic AI MOOC Fall 2025 - video 07
Multi-agent AI
One model context is too narrow for broad, parallel, open-ended work.
Drill 1Split a market research task into subagents.
- Segment by question, region, or source type.
- Give each subagent a deliverable schema.
- Merge with source-quality checks.
Drill 2When should you avoid multi-agent design?
- When tasks are tightly coupled.
- When shared context is essential.
- When the extra token cost exceeds value.
Agentic AI MOOC Fall 2025 - video 08
Training agentic models
Base and chat models know language, but agentic work needs persistence, tool discipline, and recovery from failure.
Drill 1Create an agent training curriculum.
- Start with single-tool tasks.
- Add noisy observations and retries.
- End with long-horizon tasks and hidden state.
Drill 2Diagnose overfitting to benchmark format.
- Change surface wording.
- Mutate environment state.
- Check if success survives changed tools.
Agentic AI MOOC Fall 2025 - video 09
Post-training verifiable agents
Agents need training signals for long tasks, but many useful tasks do not have obvious step-by-step labels.
Drill 1Turn a vague task into a verifiable agent task.
- Define the end artifact.
- Write a checker or rubric.
- Add adversarial cases where shortcut behavior fails.
Drill 2Why can verifiers beat preference labels?
- They are cheaper at scale.
- They reduce subjective grading.
- They support self-improvement loops.
Agentic AI MOOC Fall 2025 - video 10
Agent system design evolution
Agent prototypes work in demos but fail when state, tools, latency, retries, and deployment versions interact.
Drill 1Design a durable run record for an agent.
- Record goal, plan, tool calls, observations, checkpoints, and final evidence.
- Make each step resumable.
- Attach cost and latency budgets.
Drill 2Find where a demo agent breaks in production.
- Look for missing idempotency.
- Look for unbounded tool loops.
- Look for impossible rollback.
Agentic AI MOOC Fall 2025 - video 11
LLM agent foundations
A model can answer a prompt, but an agent must decide what to do next, which tools to use, and when to stop.
Drill 1Design a minimal research agent loop.
- State the user goal as an outcome.
- Define tools with narrow schemas.
- Add a stopping rule and final evidence check.
Drill 2When is a workflow better than an agent?
- Use a workflow when the path is known.
- Use an agent when the path depends on intermediate observations.
- Prefer the simpler system until dynamic control is necessary.
Advanced LLM Agents MOOC Spring 2025 - video 01
Safe and secure agentic AI
As agents gain tools and memory, security is no longer a prompt add-on; it is architecture.
Drill 1Design privilege separation for an agent.
- Classify tools by risk.
- Separate read and write authority.
- Require approvals for high-risk transitions.
Drill 2Red-team an agent memory system.
- Insert malicious memory.
- Trigger retrieval.
- Check whether the agent follows poisoned instruction.
Advanced LLM Agents MOOC Spring 2025 - video 02
Abstraction and discovery
Agents should not only solve one task; they should discover reusable abstractions that compress future tasks.
Drill 1Build a concept-library agent.
- Observe repeated solutions.
- Propose abstraction.
- Test on held-out tasks.
- Promote only if it helps.
Drill 2When is abstraction harmful?
- Premature naming.
- Leaky concept.
- No measurable reuse.
Advanced LLM Agents MOOC Spring 2025 - video 03
Informal plus formal math reasoning
A complete proof often needs informal planning before formal verification.
Drill 1Convert an informal proof sketch to a formal plan.
- List lemmas.
- Map each lemma to library search.
- Define tactic sequence.
- Check and repair.
Drill 2Why use sketches?
- They reduce search space.
- They preserve human-level strategy.
- They guide formal tactics.
Advanced LLM Agents MOOC Spring 2025 - video 04
Autoformalization and theorem proving
Human math is informal; proof assistants require exact formal statements and tactics.
Drill 1Design an autoformalization pipeline.
- Parse informal statement.
- Retrieve similar formal theorems.
- Draft formal statement.
- Check types.
- Repair.
Drill 2Where does retrieval help Lean agents?
- Find theorem names.
- Find tactic patterns.
- Find library definitions.
Advanced LLM Agents MOOC Spring 2025 - video 05
AlphaProof and formal math
Natural-language math reasoning is fragile; formal systems can verify proofs but are hard to search.
Drill 1Explain why theorem proving is agentic.
- There is a state: proof context.
- There are actions: tactics.
- There is feedback: verifier accepts or rejects.
Drill 2What makes math a good RL domain?
- Formal reward.
- Huge search space.
- Reusable libraries.
Advanced LLM Agents MOOC Spring 2025 - video 06
Perception to action
Computer-use agents must operate across real operating systems, not only benchmark websites.
Drill 1Design a GUI agent safety layer.
- Limit destructive actions.
- Require confirmation for irreversible changes.
- Log screen/action pairs.
Drill 2Measure a computer-use agent.
- Task success.
- Action count.
- Recovery from misclick.
- Human intervention rate.
Advanced LLM Agents MOOC Spring 2025 - video 07
Multimodal autonomous agents
Web and GUI tasks require seeing layout, reading text, choosing actions, and recovering from UI changes.
Drill 1Turn a web task into agent state.
- Goal.
- Current page observation.
- Available actions.
- Memory.
- Success check.
Drill 2Why do web agents fail?
- Wrong element.
- Hidden state.
- Long horizon.
- Ambiguous success.
Advanced LLM Agents MOOC Spring 2025 - video 08
Code agents and vulnerability detection
Security bugs hide across files, execution paths, and tool outputs; static prompting misses them.
Drill 1Design a vulnerability-finding agent.
- Index code.
- Generate threat hypotheses.
- Trace data flow.
- Run targeted tests.
- Report evidence.
Drill 2What makes a security finding useful?
- Repro steps.
- Impact.
- Affected path.
- Minimal fix.
- Regression test.
Advanced LLM Agents MOOC Spring 2025 - video 09
Open training recipes
Open models need reproducible paths to reasoning without secret proprietary data.
Drill 1Audit an open reasoning recipe.
- List datasets.
- List filtering rules.
- List eval contamination risks.
- Check ablations.
Drill 2Build a small post-training plan.
- Define task family.
- Generate data.
- Filter with verifiers.
- Evaluate on held-out mutations.
Advanced LLM Agents MOOC Spring 2025 - video 10
Memory and planning
Agents forget, repeat work, or plan against a false model of the environment.
Drill 1Design memory for a coding agent.
- Store repo map, user constraints, test results, decisions.
- Retrieve by task and file path.
- Expire stale assumptions.
Drill 2When does memory hurt?
- Poisoned memories.
- Over-retrieval.
- Outdated facts.
Advanced LLM Agents MOOC Spring 2025 - video 11
Learning to reason with LLMs
Reasoning behavior must be trained or elicited without simply teaching the model to produce longer text.
Drill 1Create reasoning training examples.
- Include hard negatives.
- Include verifier feedback.
- Avoid rewarding verbosity alone.
Drill 2Diagnose fake reasoning.
- Remove the chain and test answer.
- Mutate numbers.
- Ask for independent verification.
Advanced LLM Agents MOOC Spring 2025 - video 12
Inference-time reasoning
Some tasks need search at inference time because one sampled chain is fragile.
Drill 1Choose an inference-time technique.
- If answer can be tested, use generate-test-repair.
- If many paths exist, use search.
- If prompt is variable, use optimization.
Drill 2Why self-correction often fails?
- The model may not see the error.
- The critic may share the same blind spot.
- External verifiers fix this.