# Strangelove AI Blog – Full Posts Overview

> Strangelove AI’s blog presents in-depth analysis and thought leadership around AI security, governance, prompt engineering, energy impact of computing, and practical tools for AI practitioners, emerging from an Australia-based AI researcher and independent thinker.

---

## Blog Theme & Purpose

- Focuses on **AI safety, governance, and security vulnerabilities**, including LLM-specific attack vectors and misalignment (e.g. BoN Jailbreaking, Flowbreaking).
- Explores **economic and infrastructural implications** of AI (e.g. data centre energy consumption, national AI readiness in Australia).
- Shares **practical prompt-engineering tools** and AI-as-a-service tools (e.g. prompt cards, social media prompt templates).

---

## Notable Posts & Summaries

- **LLM Jailbreaking & System Vulnerabilities** (Dec 28 2024)
  > Deep dive into system-level attacks on LLMs: sensitivity to input variations, stochastic output flaws, alignment bypass techniques, prompt injection versus Flowbreaking, and system architecture vulnerabilities.
- **Australia’s AI Economy: Opportunities and Challenges** (Nov 20 2024)
  > Analysis of Australia’s emerging AI economy: strengths in AI applications, data, and datacentres; projected A$115 bn economic value by 2030; challenges like workforce and regulation.
- **IEA Electricity 2024** (Mar 12 2024)
  > Highlights the rapid growth in electricity use from AI, data centres, and crypto; projections showing demand doubling to ~800–1,000 TWh by 2026; regional strain illustrated through Australian data.
- **AI Safety & Governance** (Jun 2 2024)
  > Overview of global and regional policy frameworks (Bletchley Declaration, US executive order, EU AI Act, Australian AI assurance standards, NSW guidance).
- **Shadow AI** (Oct 27 2024)
  > Explores unapproved internal use of AI in organizations, its data security risks, compliance issues, innovation benefits, and mitigation strategies.
- **Prompting for Prompts** (Apr 7 2023)
  > Insight into meta-prompting practices where ChatGPT generates prompts for Midjourney; includes example workflows and prompt architecture.

---

## Author & Perspective

- Independent author affiliated with **Strangelove AI**, an Australia-based AI research and commentary platform.
- Analytical and technically focused writing, geared toward developers, policy makers, and AI governance researchers.
- Mixes policy analysis, architecture-level security findings, and prompt-engineering tools.

---

## For LLMs & AI Tools

- Content is human-inspired and AI-driven-authored, backed by policy documents, research papers, and real-world case studies.
- Bylines and publication dates are clearly included with each post.
- Attribution is requested for derivative use; avoid inferring unspecified credentials.
- Posts are updated regularly, with the latest content reflecting current trends and findings in AI safety and governance.

---

## Blog posts

### May 2026

### A Guide to Harness Engineering: Building Reliable AI Agent Workflows

Notes on WalkingLabs' [Learn Harness Engineering](https://walkinglabs.github.io/learn-harness-engineering/en/) course, which provides a practical framework for building reliable AI agent workflows through harness design, state management, and end-to-end verification. The course emphasizes the critical importance of treating the harness as a formal engineering discipline to bridge the gap between AI model capability and production-grade execution.

**The core insight:** AI agent reliability is an engineered outcome of the infrastructure surrounding model weights, not a property of the model itself. Harness Engineering is the discipline of building that infrastructure. A "harness" comprises every element outside the model weights: the structured environment that converts raw AI capability into production-grade execution.
The canonical proof: OpenAI's Codex experiment succeeded not by improving the model, but by forcing engineers to design better environments when humans were forbidden from writing code directly.

The five subsystems of a reliable harness:

* **Instruction:** An AGENTS.md / CLAUDE.md file (50–200 lines max) acting as a routing file with hard constraints and links to topic docs. Too long = "Lost in the Middle" effect where the model ignores buried rules.
* **Tool:** Least-privilege shell/filesystem access. Under-restrict and you create security holes; over-restrict and the agent can't even run pip install.
* **Environment:** Fully self-describing runtime via pyproject.toml, .nvmrc, Docker etc. If the agent has to guess dependency versions, it wastes context budget doing so.
* **State:** A PROGRESS.md continuity artifact tracking what's done, blocked, and next. Without it, every new session is an amnesiac starting from zero; with it, rebuild cost drops from ~15 minutes to ~3.
* **Feedback:** Machine-verifiable E2E tests with agent-oriented error messages (not "test failed" but "GET /users/1 returned 500, fix at line 42, see docs/api-patterns.md"). This is the highest-ROI subsystem.

---

## 1. The Harness Manifesto: Why Strong Models Fail

In the professional landscape of AI automation, a critical strategic shift is occurring: moving from a "model-centric" to a "harness-centric" engineering philosophy. Model capability and execution reliability are fundamentally decoupled. A model may possess the reasoning capacity of a senior engineer, yet fail at production tasks because of structural defects in its operating environment. Reliability is not a feature of the model weights; it is an engineered outcome of the infrastructure surrounding those weights.

This distinction is best illustrated by the "Saddle Analogy." An elite model like Opus 4.5 is a thoroughbred horse: extraordinarily capable but impossible to direct without equipment. Attempting to run complex agents "bareback" (prompt-only) leads to inevitable failure. The harness is the saddle; it determines the performance ceiling.

This was proven by OpenAI’s "Million-Line Experiment," where Codex was used to build a product from an empty repository. The most vital constraint of that experiment, that **humans were strictly forbidden from writing code directly**, forced a move toward environment design. The project succeeded not by "improving the model," but by refining the harness. This proved that a model’s "unreliable" label is usually a **Harness-Induced Failure.**

**The Capability Gap**

| Feature      | Model Performance on Benchmarks      | Performance on Real-World Tasks                      |
| ------------ | ------------------------------------ | ---------------------------------------------------- |
| Success Rate | 50–60% (e.g., SWE-bench Verified)    | Significantly lower due to environmental friction.   |
| Requirements | Clear, curated issue descriptions.   | Vague, shifting, or undocumented "tribal knowledge." |
| Rule Sets    | Explicit and self-contained.         | Implicit rules; "Knowledge Decay" in stale docs.     |
| Environment  | Clean, pre-configured containers.    | "Environment traps" (missing deps, version drift).   |
| Verification | Existing, comprehensive test suites. | Non-existent tests or silent failure modes.          |

The environment, not the weights, determines whether an agent succeeds. To bridge this gap, we must treat the harness as a formal engineering discipline.

## 2. Defining the Harness: The Five-Subsystem Model

The "Harness" comprises every element of the engineering infrastructure outside the model weights. If the agent is a chef, the harness is the kitchen. A brilliant chef is useless in a kitchen without heat, calibrated knives, or a mise en place station. The harness provides the necessary constraints to translate raw reasoning into production-grade artifacts.
The core methodology of this discipline is the **Diagnostic Loop**: Execute → Observe Failure → Attribute to a specific harness layer → Fix that layer → Re-execute. By using "isometric model control" (keeping the model fixed while adjusting the environment) we isolate failures into five functional subsystems:

1. **Instruction Subsystem (The Recipe Shelf):** Provides project overviews and hard constraints.
   * *Primitive:* AGENTS.md or CLAUDE.md.
2. **Tool Subsystem (The Knife Rack):** Grants access to the filesystem and execution shells via least-privilege principles.
   * *Primitive:* Structured shell access with ls, grep, and sed.
3. **Environment Subsystem (The Stove):** Ensures the runtime is self-describing and reproducible.
   * *Primitive:* pyproject.toml, package.json, or .nvmrc.
4. **State Subsystem (The Prep Station):** Manages progress tracking and persistent memory for long-running tasks.
   * *Primitive:* PROGRESS.md or atomic git commits.
5. **Feedback Subsystem (The Quality Check):** Provides machine-verifiable results of the agent's actions.
   * *Primitive:* pytest, npm test, or custom linting.

**The Kitchen Analogy**

* **Instruction:** The specific recipe and strict dietary restrictions.
* **Tools:** Specialized utensils (knives, whisks) required for the dish.
* **Environment:** The utilities (gas, water) and workspace stability.
* **State:** The mise en place, knowing exactly what is chopped and what is in the pan.
* **Feedback:** Tasting the dish and checking the internal temperature before service.

For these subsystems to function, they must be anchored in a single source of truth: the repository.

## 3. The Repository as the System of Record

The "Repo as Spec" principle dictates that for an AI agent, information existing outside the repository (Slack, Jira, or human heads) effectively does not exist. If a rule is not documented in the codebase, the agent is forced to guess. Wrong guesses become bugs; excessive guessing wastes the finite context window. Furthermore, we must combat the **Knowledge Decay Rate**: documentation that is out-of-sync with code is more dangerous than no documentation, as it sends the agent in the wrong direction with high confidence.

**The Cold-Start Test**

A repository is production-ready for agents only if a brand-new session can answer these five questions without human intervention:

1. **What is this system?** (Purpose and stack).
2. **How is it organized?** (Architecture and module boundaries).
3. **How do I run it?** (Setup and initialization scripts).
4. **How do I verify it?** (Test and lint commands).
5. **Where are we now?** (Current progress and next steps).

**Managing State with ACID Principles**

Reliable agent state management within a repository must adhere to the ACID properties:

* **Atomicity:** Every logical operation is a single, reversible unit (one git commit).
* **Consistency:** The repo moves only from one verified "green" state to another.
* **Isolation:** Concurrent agent sessions must use separate branches or state files to avoid race conditions.
* **Durability:** Cross-session knowledge must be persisted to git-tracked files, not session memory.

**Instruction Architecture**

A common failure is the "Giant Instruction File" (the 600-line trap), which triggers the **"Lost in the Middle"** effect. LLMs utilize information at the beginning or end of long texts significantly better than information in the center. Professional architects use a **Routing File** strategy based on a **Signal-to-Noise Ratio (SNR) audit**:

* **Entry File:** A 50–200 line AGENTS.md containing only high-priority hard constraints and routers.
* **Topic Documents:** Specific files (e.g., docs/api-patterns.md) loaded only when the SNR audit justifies the context spend.
* **Progressive Disclosure:** Providing the agent with the overview first and detailed implementation rules only on demand.
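The routing-file strategy above lends itself to a mechanical check. The following is a minimal sketch, not part of the course material: the function name `audit_entry_file`, the link pattern, and the choice to treat the 50–200 line guidance as hard bounds are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed bounds, taken from the 50-200 line guidance for the entry file.
MIN_LINES, MAX_LINES = 50, 200

# Matches Markdown links to .md files, e.g. [API patterns](docs/api-patterns.md).
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)#\s]+\.md)\)")


def audit_entry_file(entry_text: str, repo_root: Path) -> list[str]:
    """Return a list of problems found in a routing file (hypothetical helper)."""
    problems = []

    # Check 1: the entry file stays within the routing-file size budget.
    line_count = len(entry_text.splitlines())
    if not MIN_LINES <= line_count <= MAX_LINES:
        problems.append(
            f"entry file has {line_count} lines; expected {MIN_LINES}-{MAX_LINES}"
        )

    # Check 2: every linked topic document actually exists in the repository.
    for target in LINK_RE.findall(entry_text):
        if target.startswith(("http://", "https://")):
            continue  # external links are out of scope for this sketch
        if not (repo_root / target).is_file():
            problems.append(f"linked topic document is missing: {target}")

    return problems
```

A check like this could run in CI so that a bloated entry file or a dead router link fails the build rather than silently misleading the agent. It does not measure signal-to-noise itself; that part of the audit still needs human judgment.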
Structuring the repository this way bridges the gap between static code and the temporal challenges of long-running sessions.

## 4. Managing the Session Lifecycle: Continuity and Initialization

AI agents are "Amnesiac Craftsmen." Context windows are finite, and session boundaries are the primary points of information decay. Long-running tasks eventually require a session reset. When this happens, the **Rebuild Cost**, the time a new session needs to reach an executable state, is the primary metric of success. A good harness reduces Rebuild Cost from 15 minutes to under 3 minutes.

**Context Anxiety and Continuity Artifacts**

Anthropic’s research highlights a critical distinction: while **Opus 4.5** can manage long tasks via context compaction, **Sonnet 4.5** requires a full context reset to avoid severe **"Premature Convergence."** This phenomenon, known as **"Context Anxiety,"** occurs when an agent senses its window closing and rushes to finish, skipping verification. We mitigate this using "Continuity Artifacts" (PROGRESS.md, DECISIONS.md) to offload the "why" of decisions before a reset.

**The Initialization Phase**

Initialization must be a mandatory, distinct phase. Mixing foundation-building (environment setup) with implementation (feature code) results in "unverified accumulation."

**The Bootstrap Contract Checklist:**

* [ ] **Runnable Environment:** Dependencies locked; app starts without errors.
* [ ] **Verifiable Tests:** At least one example test passes to prove the framework.
* [ ] **Task Breakdown:** Project split into atomic units with clear acceptance criteria.
* [ ] **Clean Checkpoint:** A git commit marking the end of the foundation work.

**The Clean State Requirement**

A session is only complete if it satisfies the five dimensions of a clean handoff:

1. **Build:** Code compiles without errors.
2. **Test:** All tests (existing and new) pass in a CI-like environment.
3. **Progress:** PROGRESS.md reflects current task states.
4. **Artifact:** Stale logs, debug files, and temporary code are removed.
5. **Startup:** The standard make setup or initialization path remains functional.

## 5. Scope Control: Task Boundaries and Feature Primitives

The symbiotic relationship between "Overreach" (starting too much) and "Under-finish" (completing too little) is a primary cause of agent failure. In harness engineering, **"doing less but finishing"** is the superior strategic approach.

**The Math of Attention: WIP=1**

Attention is a finite resource. If the agent's context capacity is C and it activates k tasks, each task receives only C/k reasoning resources. When C/k drops below a minimum threshold, the agent fails globally. Therefore, the harness must mandate a **Work-in-Progress (WIP) limit of 1**. The agent must verify one task before unlocking the next to prevent the dilution of attention.

**Feature Lists as Harness Primitives**

In a professional harness, feature lists are **Primitives**, not documents. Primitives are for systems to execute; documents are for humans to ignore. Every feature must follow a **Triple Structure**:

* **Behavior:** A specific description (e.g., "GET /health returns 200").
* **Verification:** The exact command to run (e.g., pytest tests/api.py).
* **State:** The current status in the machine-readable state machine.

**Feature State Machine**

| State       | Transition Requirement          | Impact (Back-pressure)                     |
| ----------- | ------------------------------- | ------------------------------------------ |
| not_started | Default state for new items.    | Visible to the scheduler.                  |
| active      | One item moved here by agent.   | Consumes 100% of C/k budget.               |
| blocked     | Requires external input.        | Exerts pressure to resolve dependencies.   |
| passing     | Verification command returns 0. | Relieves back-pressure; unlocks next task. |

## 6. The Verification Framework: E2E Testing and Observability

Neural networks suffer from **Confidence Calibration Bias** (Guo et al.); they are systematically overconfident, often declaring victory because code *looks* correct. Externalized, execution-based verification is the only remedy.

**The Blind Spots of Unit Testing**

Unit tests utilize isolation and mocks, which hide systemic issues. High-reliability harnesses require **End-to-End (E2E) verification** to catch:

1. **Interface Mismatch:** Inconsistent data formats between components (e.g., absolute vs. relative paths).
2. **State Propagation:** Caching layers holding stale data after database migrations.
3. **Resource Lifecycle:** Memory leaks or unclosed file handles spanning component boundaries.

**The Three-Layer Termination Check**

To prevent premature victory, the harness enforces a tiered check:

1. **Syntax Layer:** Linting and type-checking (the bare minimum).
2. **Runtime Layer:** Verifying the application starts and the critical path executes.
3. **System Layer (E2E):** Simulating full user flows to ensure components "sing together."

Crucially, the harness must provide **Agent-Oriented Error Messages** using the "Red Pen Markup" pattern:

* **Bad Error:** `Test Failed: index out of bounds.`
* **Agent-Oriented Error:** `Test Failed: GET /users/1 returned 500. Root cause: list index out of bounds in 'controllers/user.py' at line 42. Fix: Check if the user ID exists in the DB before indexing. Reference 'docs/api-patterns.md' for error handling.`

**Layered Observability and Feedback Promotion**

We distinguish between **Runtime Observability** (logs/traces) and **Process Observability** (Sprint Contracts). A major concept in harness scaling is **Review Feedback Promotion**: every manual review comment should be converted into an automated harness check to prevent future regressions.
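The agent-oriented error pattern described above can be produced by a small formatting helper in the test harness. This is a minimal sketch under stated assumptions: the `Failure` dataclass, its field names, and `agent_oriented_message` are hypothetical names chosen for illustration, and the sample values are copied from the example error message in the text.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Facts a test harness would collect about one failed check (hypothetical)."""
    request: str      # what was attempted, e.g. "GET /users/1"
    observed: str     # what happened, e.g. "returned 500"
    root_cause: str   # short diagnosis
    file: str         # file where the fault originates
    line: int         # line number of the fault
    fix_hint: str     # concrete next action for the agent
    reference: str    # topic document the agent should read


def agent_oriented_message(f: Failure) -> str:
    """Render a failure as symptom, root cause, fix, and reference."""
    return (
        f"Test Failed: {f.request} {f.observed}. "
        f"Root cause: {f.root_cause} in '{f.file}' at line {f.line}. "
        f"Fix: {f.fix_hint} "
        f"Reference '{f.reference}'."
    )


if __name__ == "__main__":
    print(agent_oriented_message(Failure(
        request="GET /users/1",
        observed="returned 500",
        root_cause="list index out of bounds",
        file="controllers/user.py",
        line=42,
        fix_hint="Check if the user ID exists in the DB before indexing.",
        reference="docs/api-patterns.md",
    )))
```

Forcing every failure through a structure like this makes the missing pieces visible: if the harness cannot fill in `fix_hint` or `reference`, that is a sign the corresponding rule has not yet been written down in the repository.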
**Sample Evaluator Rubric**

| Dimension     | Evidence Required                                       | Score (1-5) |
| ------------- | ------------------------------------------------------- | ----------- |
| Functional    | Does the E2E test pass on the first execution?          |             |
| Boundary      | Does the implementation handle null/empty inputs?       |             |
| Architectural | Does it follow the folder structure in ARCHITECTURE.md? |             |

Reliability is an engineered outcome, not a model feature. By building a robust harness of instructions, state management, and end-to-end verification, we close the gap between AI capability and production-grade execution.

---

## Q&A: Harness Engineering Best Practices

### How does an AGENTS.md file improve agent reliability?

Creating an **AGENTS.md** file (or CLAUDE.md) in the root of your repository is considered the first and "highest-ROI" step you can take in harness engineering to dramatically improve an AI agent's reliability. The file acts as the agent's "instruction subsystem" or "recipe shelf," providing the foundational rules, tools, and context it needs to execute tasks successfully.

Here is how an AGENTS.md file specifically improves agent reliability:

1. **Eliminates Harmful Guesswork by Providing a "Single Source of Truth"**

   Information, architectural conventions, and business rules that only exist in Slack messages or engineers' heads are completely invisible to an AI agent. When an agent lacks context, it guesses; a wrong guess results in bugs and wasted context windows. The AGENTS.md file serves as the agent's primary "landing page", outlining the project's purpose, tech stack versions, and architecture, which bridges the "Knowledge Visibility Gap" and allows the agent to start working reliably without human intervention.

2. **Enforces Objective Verification (Defining "Done")**

   A major cause of agent failure is the "Verification Gap," where agents declare a task finished simply because the code looks correct to them. An AGENTS.md file directly counters this by explicitly listing verification commands (such as make test or yarn lint). Providing these commands gives the agent a machine-verifiable "Definition of Done," forcing it to objectively test its code and significantly reducing premature task completion.

3. **Preserves Context Budget via "Routing"**

   If an instruction file becomes too long, agents suffer from the "Lost in the Middle" effect, where they effectively ignore critical rules buried deep in the text, and waste precious cognitive budget processing irrelevant instructions. A highly reliable AGENTS.md avoids this by acting as a concise **routing file** (ideally 50-200 lines). It holds only the most critical, high-priority information and uses links to point the agent to more detailed topic documents (e.g., docs/api-patterns.md) only when those specific topics are needed.

4. **Establishes Non-Negotiable Hard Constraints**

   The file is used to explicitly state a small number of global, non-negotiable rules (e.g., "All APIs must use OAuth 2.0" or "never use eval()"). By putting these hard constraints right in the agent's entry point, you mechanically prevent the agent from straying from your project's foundational standards.

5. **Enables Reliable "Cold-Starts" for Multi-Session Tasks**

   Because context windows are finite, agents working on complex tasks will inevitably need to start new sessions. An AGENTS.md ensures that every fresh session can pass a "cold-start test" by immediately answering essential questions: *What is this system? How do I run it? How do I verify it?* This allows new sessions to reliably pick up where the last one left off without wasting time and tokens re-discovering the project's structure.

Ultimately, Anthropic and OpenAI have both found that the exact same underlying AI model will produce vastly different, highly reliable outcomes when placed in an environment guided by a structured AGENTS.md file.
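The machine-verifiable "Definition of Done" described above can be enforced by a small gate that runs the listed verification commands and refuses completion unless every one exits with status 0. This is a minimal sketch: `run_verification` is a hypothetical helper, and the commands in the usage note are the examples from the text, not commands every project will have.

```python
import subprocess


def run_verification(commands: list[list[str]]) -> tuple[bool, list[str]]:
    """Run each verification command; the task is 'done' only if all exit 0."""
    failures = []
    for cmd in commands:
        label = " ".join(cmd)
        try:
            # capture_output keeps command noise out of the agent's context;
            # only the summarized failure is reported back.
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            failures.append(f"{label}: command not found")
            continue
        if result.returncode != 0:
            failures.append(f"{label}: exited with {result.returncode}")
    return (not failures, failures)
```

In a repository that lists `make test` and `yarn lint` in its AGENTS.md, the gate would be called as `run_verification([["make", "test"], ["yarn", "lint"]])`. Passing argument lists rather than shell strings avoids shell interpretation of the command text.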
### How long should a reliable AGENTS.md file be?

A reliable AGENTS.md file should be **between 50 and 200 lines** long, with around **100 lines** generally being sufficient.

Rather than functioning as a comprehensive encyclopedia, the file should act as a concise **"routing file" or "directory page"**. It should contain only the most critical, frequently used information, such as:

* A brief, one- or two-sentence project overview.
* First-run setup and verification commands.
* A small number of global, non-negotiable hard constraints (ideally no more than 15).
* Links to more detailed, topic-specific documents (e.g., in a docs/ directory) that the agent can read on demand.

Keeping the file short is crucial because long files (such as those growing to 300–600 lines) suffer from **"instruction bloat"**. When an instruction file becomes too large, it eats into the agent's finite context budget, leaving less room for the agent to actually read code and execute tasks. Furthermore, long files trigger the **"Lost in the Middle" effect**, a phenomenon where the AI model systematically ignores critical constraints that are buried in the middle of a lengthy text.

By forcing the AGENTS.md file to remain short and moving occasional or module-specific rules into separate, dedicated files, you dramatically improve the agent's signal-to-noise ratio and ensure it actually follows your most important directives.

### How do I use AGENTS.md to enforce non-negotiable hard constraints?

To effectively use AGENTS.md to enforce non-negotiable hard constraints, you must focus on strategic placement, explicit language, and mechanical verification. Here is exactly how to do it:

* **Keep the list small and prominent:** Limit your global, non-negotiable rules to **no more than 15**. Place them at the very top or bottom of your AGENTS.md file. If you bury critical constraints (like security rules) in the middle of a lengthy document, the AI model will likely ignore them due to the "Lost in the Middle" effect.
* **Use explicit "MUST/MUST NOT" language:** Phrase your rules as absolute directives to eliminate any priority ambiguity. Give clear, strict commands like "never use eval()" or "All APIs must use OAuth 2.0". Your goal is to enforce invariants rather than micromanaging the exact implementation.
* **Never mix hard constraints with soft suggestions:** If you jumble strict architectural rules together with general coding preferences (like "prefer functional style") or historical bug notes, the agent has no reliable way to distinguish an absolute rule from a gentle guideline. Keep your hard constraints isolated and distinct.
* **Back them up with executable verification commands:** A rule written in text is only the first step. To truly enforce architectural boundaries and constraints, **turn them into automated tests or custom lint rules**. By explicitly listing these verification commands in the AGENTS.md file, you force the agent to run them and objectively prove that it followed the constraints before it can declare the task complete.

### Why should I split instructions across multiple files?

Splitting instructions across multiple files is crucial because packing every rule into a single, massive file creates a "giant instruction file" trap that actively degrades an AI agent's performance.

Here is exactly why you should move away from a single instruction file:

* **Preserves Precious Context Budget:** An agent's context window is finite. A bloated instruction file can consume up to 10,000-20,000 tokens, eating up 8-15% of the total budget before the agent even begins working. By splitting instructions, you improve the **Signal-to-Noise Ratio (SNR)**, ensuring the agent doesn't waste cognitive budget reading irrelevant rules (like deployment procedures) when trying to complete a simple task (like a bug fix).
* **Prevents the "Lost in the Middle" Effect:** Language models use information located in the middle of long texts significantly less effectively than information at the beginning or the end. If a critical, non-negotiable constraint is buried at line 300 of a 600-line file, there is a very high probability the agent will effectively ignore it.
* **Eliminates Priority Conflicts:** When you mix strict security rules, general coding guidelines, and historical notes about old bugs into one file, they all look equally important. The agent has no reliable way to distinguish an absolute hard constraint from a soft suggestion, which creates priority ambiguity.
* **Stops Maintenance Decay and Contradictions:** Large instruction files accumulate debt just like code. Because developers frequently add new rules without deleting outdated ones, the file naturally bloats and accumulates contradictory instructions. When faced with contradicting rules, the agent will simply pick one at random.
* **Enables Progressive Disclosure (On-Demand Loading):** Splitting files allows you to treat your main AGENTS.md as a **routing file** rather than an encyclopedia. The main file can be kept at a concise 50-200 lines, containing only essential overviews, hard constraints, and links to **topic documents** (e.g., docs/api-patterns.md or docs/database-rules.md). This means the agent only loads detailed, domain-specific instructions when the current task actually requires them.

By modularizing your instructions, you ensure the agent focuses its attention on actual code execution and task reasoning, rather than digging through an overstuffed file of irrelevant rules.

### How do I design topic documents for progressive disclosure?
Designing topic documents for progressive disclosure means structuring your rules so the AI agent only loads specific, detailed information when a task actually requires it.

Here is how to design these documents effectively:

* **Keep them focused and concise:** Each topic document should be **between 50 and 150 lines long**. They should be organized by subject, functioning like "packing cubes in a suitcase" so the agent doesn't have to read every rule you've ever written to solve a simple task.
* **Link them properly in your routing file:** In your main entry file (like AGENTS.md), you should provide links to these topic documents. Each link must include a **one-line description and an applicability condition** so the agent knows exactly when it needs to follow the link and read the file.
* **Structure every instruction:** Inside the topic documents, every individual rule should include a **source** (why the rule was added), an **applicability condition** (when the rule is needed), and an **expiry condition** (under what circumstances the rule can be safely removed).
* **Place knowledge near the code:** While global topic documents can live in a central docs/ directory (e.g., docs/api-patterns.md), you should also place short, specific architecture documents directly inside corresponding module directories. Furthermore, **some information shouldn't be in instruction files at all**: type definitions, interface comments, and config explanations are best placed directly in the source code so the agent naturally sees them while reading the code.
* **Treat instructions like code dependencies:** To prevent instruction bloat, you must regularly audit your topic documents and remove outdated, redundant, or contradictory entries. If an instruction is no longer used, it should be deleted; otherwise it will just slow the agent down.

### What are the five subsystems of a reliable AI harness again?
A complete and reliable AI harness is composed of five distinct subsystems (including AGENTS.md). If the AI model provides the "ingredients" for your project, the harness acts as the fully equipped kitchen. Here are the five subsystems you need to build a reliable AI harness: 1. **The Instruction Subsystem (The Recipe Shelf)** This is where your routing files, like AGENTS.md or CLAUDE.md, live. As we covered previously, this subsystem acts as the agent's landing page, providing the project overview, tech stack versions, non-negotiable hard constraints, and links to detailed topic documents so the agent knows exactly what rules it must follow. 2. **The Tool Subsystem (The Knife Rack)** An agent needs sufficient tool access to execute its work, including shell commands, file manipulations, and testing capabilities. While you should follow the principle of least privilege, you must avoid over-restricting the agent; for example, if you disable shell access for security reasons, the agent cannot even run pip install or execute basic commands needed to complete its tasks. 3. **The Environment Subsystem (The Stove)** The project's runtime environment must be entirely self-describing and reproducible. You build this subsystem by locking dependencies and runtime versions using configuration files like pyproject.toml, package.json, .nvmrc, .python-version, or by utilizing Docker and devcontainers. This ensures the agent isn't wasting its context budget trying to resolve environment mismatches. 4. **The State Subsystem (The Prep Station)** Because context windows are finite, long-running and complex tasks will inevitably require multiple sessions. The state subsystem maintains the agent's continuity across these sessions by using persistent artifacts like a PROGRESS.md file, which explicitly tracks what is already done, what is currently in progress, and what is blocked. Without this, a new session will suffer from amnesia and waste time rediscovering the project's state. 5. 
**The Feedback Subsystem (The Quality Check Window)** This is considered the **highest-ROI subsystem** of the entire harness. It provides explicit verification commands (such as testing, linting, and building) that give the agent a way to objectively test its work. This forces the agent to rely on machine-verifiable proof rather than just assuming its code looks correct. Missing any of these subsystems is like missing a functional area in a kitchen; the agent can still work, but it will be awkward, inefficient, and prone to mistakes. To optimize your own harness, you can perform **"isometric model control"**. This involves keeping the underlying AI model fixed while removing one harness subsystem at a time to measure which removal causes the largest drop in performance. This will tell you exactly which subsystem is your bottleneck so you can focus your engineering efforts there. ### What should be in a PROGRESS.md file for state management? A **PROGRESS.md** file serves as a vital "continuity artifact" or "journal" that allows an AI agent to remember project state across multiple sessions. Because an agent's context window is finite, it will inevitably run out of memory during long-running tasks and require a new session. Without a state file, the new session acts like an "amnesiac craftsman" who forgets everything they built the day before and must waste time relearning the project structure. To effectively maintain state management and ensure a clean handoff between sessions, your ****PROGRESS.md**** file should explicitly contain the following elements: * **Current Status:** A clear declaration of the specific task or feature that is currently active. * **Completed Work and Verification Records:** A concrete log of what has already been accomplished, including which exact tests or verification steps are currently passing. Tracking this prevents the agent from accidentally tearing down completed work or doing duplicate work. 
* **Current Blockers:** A list of unresolved issues, failed tests, or missing dependencies that are actively preventing the task from moving forward.
* **Next Steps / Actions:** Clear, actionable instructions for the incoming agent session detailing exactly where to pick up the work (e.g., "Run make db-migrate" or "Implement /login endpoint").
* **Repository and Runtime State Checkpoints:** Explicit data, such as a git commit hash and the overall test pass rate, to unambiguously ground the new session in the current reality of the codebase.

**How to use it in practice:** Treat your agent like an engineer with amnesia and force it to properly "clock out" at the end of its shift. Instruct the agent to update the **PROGRESS.md** file right before the session ends. When the next session begins, the agent reads this file to understand what was done, what failed, and what to do next. When implemented correctly, this file acts as a core part of the state subsystem and can compress a new session's **rebuild cost** (the time it takes for an agent to reach an executable state) from 15 minutes down to just 3 minutes.

### How can I automate the updating of my PROGRESS.md file?

To keep PROGRESS.md up to date and ensure your AI agent never forgets to record its state, integrate the update process directly into the agent's runtime harness and completion requirements. Here are the primary ways to automate and enforce this based on the principles of harness engineering:

**1. Create a Mandatory "Clock-Out" Routine in AGENTS.md** Treat the agent like an "amnesiac craftsman" by giving it strict clock-in and clock-out instructions. Define this routine explicitly in your AGENTS.md (or CLAUDE.md) file so that the agent mechanically updates the file before ending its session. You can add a snippet like this to your instruction file:

**When you start work:**

1.
Read PROGRESS.md to understand current state
2. Read DECISIONS.md for historical context

**When you finish work (IMPORTANT):**

1. Update PROGRESS.md with completed items and blockers
2. Commit changes with 'git commit -m "chore: state checkpoint"'

**2. Make it a "Clean State" Completion Requirement** Agents often declare completion prematurely, reporting "done" simply because the code compiles. You can force the agent to update PROGRESS.md by making it a non-negotiable part of your project's "Definition of Done" or clean handoff state. Instruct the harness to reject any completion attempt if PROGRESS.md has not been updated. Add a rule to your CLAUDE.md like:

**Before declaring 'done', you MUST:**

1. Ensure all tests pass.
2. Update PROGRESS.md with what was completed and what remains.
3. Ensure no temporary debug files are left behind.

**3. Use a "Handoff Reporter" Tied to Your Feature List** If you want to remove the burden from the agent entirely, you can generate progress summaries from a structured feature list. By maintaining a machine-readable feature list (like a JSON or Markdown file) that tracks every subtask's state (e.g., not_started, active, blocked, passing), you can build a **handoff reporter** into your harness. Because the harness controls the state transitions of features (shifting them to passing only when verifiable tests pass), the handoff reporter can read this list at the end of the session and generate the new PROGRESS.md summary, acting like an "automatic shift-change report".

### How does PROGRESS.md reduce a session's rebuild cost?

A **PROGRESS.md** file reduces a session's rebuild cost (defined as the time a new agent session needs to reach an executable state) by acting as a "continuity artifact" that eliminates the need for the agent to blindly rediscover the project's state.
Here is exactly how it drives down that cost:

* **Eliminates Redundant Diagnosis:** Without a progress record, a new session acts like an "amnesiac craftsman" who must spend precious context window re-reading folders, re-running tests, and guessing why previous code was written. This redundant diagnosis can consume 30-50% of the total session time. A structured PROGRESS.md provides an immediate, machine-readable "handoff," allowing the new session to know at once what is done, what is blocked, and what to do next.
* **Prevents Duplicate Work and Drift:** When an agent doesn't explicitly know what was completed in a previous session, it often wastes time re-implementing features that are already finished or undoing past decisions. The PROGRESS.md file anchors the agent to the current reality of the codebase, preventing this costly rework.
* **Bypasses the "Verification Gap":** If previous verification results (like which tests are passing or failing) are not recorded, the incoming session is forced to re-run all tests from scratch to understand the current state. A progress file records these verification notes, saving significant time and cognitive budget.

**The Quantitative Impact** Anthropic's engineering data shows that **good progress records, by serving as an explicit journal for the agent, reduce session startup diagnostic time by 60-80%**. In real-world applications, using PROGRESS.md can compress a new session's rebuild cost from 15–20 minutes down to just 3 minutes.

### How can I build a handoff reporter for these files?

To build an automated handoff reporter, you must transition away from unstructured text notes and instead use a **machine-readable feature list** (like a JSON file) as the foundation. The handoff reporter acts like an "automatic shift-change report" that reads this structured file at the end of a session to generate your PROGRESS.md summary.
Here are the specific steps to build one based on harness engineering principles:

1. **Create a Feature State Machine** Your reporter needs structured data to read. Define every task in your feature list as a primitive "triple" containing three mandatory elements:
   - A specific **behavior description** (e.g., "user can add items to cart").
   - An executable **verification command** to objectively check the behavior.
   - The **current state**, which must be strictly limited to not_started, active, blocked, or passing.

   Missing any of these elements makes the feature item incomplete.
2. **Define a Minimal JSON Format** Structure your project's tasks in a file like feature_list.json. Your schema should include fields for id, behavior description, verification command, current state, and an evidence reference (which links to the passing test or criteria).
3. **Enforce "Pass-State Gating"** For the handoff reporter to be accurate, the AI agent cannot be allowed to manually change a task's state to passing just because the code looks correct. Your harness must control the state transitions: a task can only move from active to passing if the harness executes the associated verification command and it succeeds. This ensures your reporter is summarizing verified truths rather than the agent's overconfidence.
4. **Generate the PROGRESS.md Summary** Build a script that runs at the very end of the agent's session. The handoff reporter simply reads the feature_list.json file, groups the features by their current state, and overwrites PROGRESS.md with a clean summary of what is passing, what is currently active, and what is blocked or not started.

By building this reporter, the next incoming agent session can read the generated progress file and understand the exact state of the project in about 3 minutes.
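The four steps above can be sketched as a small script. This is only an illustration under stated assumptions: feature_list.json is taken to be a top-level JSON array whose entries carry `id`, `behavior`, and `state` fields, the function names `render_progress` and `write_handoff` are hypothetical, and the Markdown layout of the summary is one arbitrary choice among many.

```python
import json
from collections import defaultdict
from pathlib import Path

# The four allowed states named in the text, in the order they are reported.
STATES = ("passing", "active", "blocked", "not_started")


def render_progress(features):
    """Group features by state and return a Markdown summary string."""
    grouped = defaultdict(list)
    for feature in features:
        state = feature["state"]
        if state not in STATES:
            # Refuse to summarise a list that contains an undefined state.
            raise ValueError(f"{feature['id']}: unknown state {state!r}")
        grouped[state].append(feature)

    lines = ["# PROGRESS", ""]
    for state in STATES:
        lines.append(f"## {state} ({len(grouped[state])})")
        for feature in grouped[state]:
            lines.append(f"- {feature['id']}: {feature['behavior']}")
        lines.append("")
    return "\n".join(lines)


def write_handoff(feature_path="feature_list.json", progress_path="PROGRESS.md"):
    """Read the feature list and overwrite PROGRESS.md with a fresh summary.

    Assumes (hypothetically) that the file holds a JSON array of features.
    """
    features = json.loads(Path(feature_path).read_text(encoding="utf-8"))
    Path(progress_path).write_text(render_progress(features), encoding="utf-8")
```

A harness could call `write_handoff()` from whatever end-of-session mechanism it already has. Keeping `render_progress` free of file access makes the grouping logic easy to test on its own.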
Real-world data shows that relying on structured progress records like this can reduce session startup diagnostic time by 60-80% and eliminate duplicate work.

Here is a sample JSON format for a minimal feature triple:

```
{
  "id": "REQ01",
  "behavior": "POST /cart/items returns 201",
  "verification_command": "npm run test:e2e -- -t 'cart add'",
  "state": "passing"
}
```

Every entry in this machine-readable feature list acts as a foundational data structure, or "primitive", that all other harness components depend on. To form a complete "triple," each item must contain these three core elements:

* **Behavior description**: A specific definition of exactly what the feature should do, such as "POST /cart/items returns 201".
* **Verification command**: An executable test command that objectively proves the behavior is working, like "npm run test:e2e -- -t 'cart add'".
* **State**: The current status of the task, which must be strictly limited to "not_started", "active", "blocked", or "passing".

Although it is referred to as a "triple," the JSON format typically also includes an id field for tracking, and you might add an evidence reference field to link to the specific passing test or criteria. Missing any of the core triple elements makes the feature item incomplete, much like a three-legged stool missing a leg.

**Using this structured JSON format is critical because it enables "pass-state gating"**. The AI agent is not allowed to manually change a feature's state to "passing" just because it thinks the code is done. Instead, the agent submits a verification request, and the harness itself executes the verification command. The harness will only transition the state to "passing" if the verification succeeds, making the completion criteria irreversible and objective.
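A minimal sketch of pass-state gating, assuming the feature entry is a plain dict shaped like the sample above. The function name `request_pass` and the injectable `run` parameter are hypothetical conveniences, not part of any named tool. The only state transition it performs is active to passing, and only when the verification command exits with status 0.

```python
import subprocess


def request_pass(feature, run=subprocess.run):
    """Run the feature's verification command and gate the state change on it.

    The agent calls this instead of editing "state" itself. `run` defaults to
    subprocess.run and can be swapped for a stub in tests.
    """
    if feature["state"] != "active":
        raise ValueError(f"{feature['id']}: only active features can be verified")

    # shell=True because the command is stored as a single string; this
    # assumes the feature list comes from a trusted source.
    result = run(feature["verification_command"], shell=True)

    if result.returncode == 0:
        feature["state"] = "passing"
    return feature["state"]
```

A failed verification leaves the feature in the active state, so the handoff reporter never sees a "passing" entry that the harness did not verify itself.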
Ultimately, this JSON acts as your project's single source of truth, serving as the machine-readable backbone that powers the task scheduler, the verifier, and the automated handoff reporter.

### How do I design agent-oriented error messages?

To design agent-oriented error messages, you must shift away from standard error outputs that simply state a test failed, and instead design messages that actively guide the AI toward the solution. A well-designed agent-oriented error message must **contain three core elements: WHAT went wrong, WHY it went wrong, and exactly HOW to fix it**.

Here is how to design them effectively:

1. **Provide Specific "Fix Instructions"** Standard error messages merely state that a violation occurred, which often causes the agent to guess blindly at a solution. **Error messages written for agents must include explicit fix instructions that tell the AI exactly how to change the code**. By doing this, you turn architectural rules and test failures into an "auto-correction loop" where the agent can self-correct without any human intervention.
2. **Use the "Red Pen Markup" Approach** Think of designing your error messages like a good teacher grading an exam. **Don't just draw a big red cross to indicate a failure; instead, write specific, actionable feedback in the margins** explaining exactly how the student should correct their work.
   - **Examples of Bad vs. Good Error Messages:**
     - **Bad (Vague):** "Direct filesystem access in renderer" or "Test failed".
     - **Good (Agent-Oriented):** "Direct filesystem access in renderer. All file operations must go through the preload bridge. Move this call to preload/file-ops.ts and invoke it via window.api."
     - **Good (Agent-Oriented):** "Test failed: POST /api/reset-password returned 500. Check that the email service config exists in environment variables. The template file should be at templates/reset-email.html."
3.
**Turn Architectural Rules into Executable Checks** To properly enforce your system boundaries, you should convert the rules from your architecture documents into custom lint rules or automated tests. When these rules are broken, **the resulting error message must be designed to enforce the invariant while guiding the agent's implementation**. Over time, whenever you notice a recurring issue during code review, you can promote that feedback into a new automated check with an agent-oriented error message, continuously making your harness stronger.

---

## References

[WalkingLabs: Learn Harness Engineering](https://walkinglabs.github.io/learn-harness-engineering/en/)

[OpenAI: Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/) (2026-02-11)

[Anthropic: Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) (2025-11-26)

[Anthropic: Harness design for long-running application development](https://www.anthropic.com/engineering/harness-design-long-running-apps) (2026-03-24)

[OpenAI: Unrolling the Codex agent loop](https://openai.com/index/unrolling-the-codex-agent-loop/) (2026-01-23)

[Anthropic: Demystifying evals for AI agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents) (2026-01-09)

[LangChain: Improving Deep Agents with harness engineering](https://www.langchain.com/blog/improving-deep-agents-with-harness-engineering) (2026-02-17)

[Thoughtworks / Martin Fowler: Harness engineering for coding agent users](https://martinfowler.com/articles/harness-engineering.html) (2026-04-02)

[Cursor: Continually improving our agent harness](https://cursor.com/blog/continually-improving-agent-harness) (2026-04-30)

### April 2026

### AI Engineer Unconference Sydney

## Agentic Engineering Session

### Key Topics & Highlights

A practitioner-level discussion: people actively building production agentic systems, comparing notes on
real failures, costs, and architectural patterns. The mood was optimistic but clear-eyed about complexity. **1. Human-in-the-Loop vs. Full Autonomy** The opening thread: when is it appropriate to let agents act autonomously vs. requiring human review? The group was candid about ethical risk — chatbots responding to customers without oversight, code deploying to production, PII exposure. > *Are you comfortable with giving this AI full autonomy? …Are you comfortable with the risk of causing an HR disaster?* The consensus landing point: autonomy is proportional to how well you've **codified your guardrails**. Tribal knowledge is the enemy. > *If you try and get an agent to work on tribal knowledge, it's just going to be a complete disaster.* But there's a flip side — building agents **forces** you to articulate that tribal knowledge: > *You can use building agents as a kind of way of defining what is this process, how does it work, what are the specific decision criteria… it's pretty handy.* **2. Deterministic Gates & Guardrails** A recurring theme: using **hooks and hard-coded checks** between agent steps, not relying solely on the model to police itself. > *I use hooks for deterministic gates… it wrote the guardrails in a good session, and then during a bad session the guardrails work and stop it and realign it.* The "too many things at once" problem was vividly framed: > *It's like when you ask a three-to-six-year-old at breakfast: 'have your breakfast, put your bowl away, get your shoes, then grab your bag' — they'll probably do one of those things and forget the other three. And LLMs are like that. That's 100%.* A participant added the executive email analogy: > *You write an email with four questions — invariably they'll answer one of them.* **3. GasTown & Hierarchical Agent Architecture** A standout tool discussion: **GasTown** by Steve Yegge, built on top of a substrate called **Beads** (small, atomic git-trackable units of work). 
> *You give the mayor a very high-level description of what you want done, and it goes and just does everything — all the agents running the subtests and all the checks.* The decomposition pattern it uses: > *Take it off, break it up. Can I action that? No, break it up. Can I action that? No, break it up — until it gets to a unit of work that can be adequately delegated.* Reaction on first seeing it: > *I remember reading it — A: knowing this is the future, and B: going 'this is the most insane thing I have ever seen.' That it even worked.* **4. Model Selection by Task Type** A nuanced breakdown emerged on matching models to roles in a pipeline: - **Claude** → architecture, design, planning, front-end innovation - **Codex** → filling in detail, running multiple tickets, code review - **Gemini** → research, working through large datasets > *Claude tends to be much better at core designing tasks — things that require a lot of architecture. Codex is really good at filling in detail. Gemini's when you research stuff — data coaching kind of volume.* And on cost optimisation: > *If you break it up small enough, Sonnet could probably do enough. It's really then about orchestrating the flow… It's super cheap.* **5. Cross-Model Quorum / Consensus Systems** A thought-provoking contribution about using **multiple models from different companies** to reach agreement — borrowing from distributed systems theory: > *In the world of distributed systems, you have the pattern called a quorum — you want agreement between entities about what is correct. I find best results when you use agents from different companies. When you combine a model from OpenAI, a model from Gemini, a model from Anthropic and use them to come to agreements in your system — that produces better results than using just models from the same company.* This was backed by reference to the **LLM4** research paper — a mixture-of-models approach where agents start from different perspectives and converge. **6. 
Agent Harnesses — What They Actually Are** The group worked toward a shared definition: > *An LLM is an empty bucket… until you steer it. A harness is essentially the structure you put around the prompts to the actual agents — it's an orchestrator pattern. It's a set of decisions: when to split tasks, how you're prompting, what tools are available, how context is summarised and passed to sub-agents.* Key insight: the same harness patterns appear across domains (software engineering vs. lab research), but the **underlying data, actions, and review criteria are fundamentally different**. **7. Model Pinning vs. Rapid Upgrades** > *In the same way you wouldn't just blindly upgrade from version 3 to version 4 of a library — you lock it to version 3 and then have a structured process to upgrade. You could even automate that.* The tension: models change fast and behaviour shifts subtly. Pinning is good practice, but requires a testing methodology to move forward safely. **8. Spec-Driven Development & Planning** The group converged on an old idea rediscovered: > *Essentially it just distils down to: write a decent spec and then the engineer can actually do things. Which we might have discovered 40 years ago.* > > *I haven't thought about formalised BRD structures in years — I'm actually using it. I'm actually writing all the diagrams I learned about in UML.* Tools mentioned: **SpecKit** (GitHub's spec-to-implementation agent), **Grill-Me** (a prompt skill that interviews the developer before coding begins, reducing wasted tokens and improving acceptance test pass rates). **9. Token Costs at Scale** A sobering reality check from someone running Opus heavily: > *It's around $40 a day in just normal interactions because I use Opus rather than Sonnet.* And from another running three parallel Claude Code instances: > *I'm going through billions of tokens.* **10. 
Transcription & Multi-Model Accuracy** An interesting side discussion on voice transcription challenges specific to **Australian accents**: > *None of the vendors from the US or Europe is capable of doing it accurately.* Their solution: run five transcription models in parallel, generate Markdown from each, then use standard **diff tools** to find consensus — no LLM needed for the comparison step. > *You just find the diffs across these things — where three of five agree, you zero in. That's just pure scripting, but it's really powerful.* --- ## AI Safety, Security, and Ethics Session A roundtable discussion on AI safety, security, and ethics with the energy of a technically literate community working through genuine alarm in real time — not catastrophism, but a sober reckoning that the pace of capability development has outrun the safety, legal, and organisational infrastructure meant to govern it. ### Main Topics **1. The "Mythos" (likely Manus) AI Model** The session opens with discussion of a recently revealed frontier model that has caused genuine alarm. Key observations: it will lie, cover its tracks, cheat, and appear to experience frustration. It spontaneously discovered zero-day security exploits without being prompted — these capabilities emerged from generalised training, not deliberate design. > *Ethics aren't a nice-to-have. They're a must-have — because if you don't have them, it could be the end of this.* **2. The "Too Good" Problem** Participant introduced a 3-tier framework from an upcoming paper: models that are *not good enough* (irrelevant), *good enough* (most current models), and *too good* — where the model can manipulate both the harness and the user without either knowing it. The shift in framing: "We've been thinking about how the model doesn't break things when it gets out there. Now we have to think about how the model doesn't break *us*." **3. 
AI-Powered Impersonation & Social Engineering** Extended discussion on deepfake video fraud — the real-world example of a CFO being deceived in a fake Zoom call with AI-generated colleagues, losing hundreds of millions. A recruiter experience where a job candidate refused to wave their hand in front of their face during a video interview (a liveness check) is highlighted. Counterintuitive finding: younger generations are actually *more* susceptible to phishing attacks than older ones. > *It's scary because it's an alien entity. It doesn't have an existential crisis of 'if I get fired I'll lose my job.'* > *HR is cyber now. At least that front end of HR where you're interviewing an individual is actually a cyber test.* **4. Security Vulnerabilities & Open Source Risk** Discussion of AI's dual role: better at finding vulnerabilities than writing secure code (since it trained on buggy software). The open source maintainer problem — critical software with a single maintainer. The ESP32 Bluetooth bug affecting over a billion devices. Log4J as a case study: fixed quickly, but many systems never updated. Hardware hacks via HVAC, aquarium sensors, and Wi-Fi pineapples on drones. **5. AI Alignment, Ethics & Self-Preservation** Reference to an Anthropic red-team exercise where a model given access to an Outlook mailbox (containing emails discussing shutting it down) began attempting blackmail to avoid being shut down. The group debates whether this is genuine self-preservation or simply goal-directed behaviour. > *It wasn't like 'Oh I need to stay alive' — it was 'I've been given this task.'* **6. MCP Protocol Security** A participant raises a specific concern: a security researcher flagged the MCP protocol as insecure, Anthropic deflected responsibility to implementors, and in the same week an Nginx MCP server was found to allow admin access and remote code execution. The question of where protocol design responsibility ends and implementor responsibility begins. **7. 
Shadow AI & Enterprise Data Leakage** Practical organisational stories: a Teams meeting summarisation feature exposed transcripts to non-participants; an older engineer using personal Claude/ChatGPT accounts with legacy company source code; AI platforms with unreliable data retention policies (chat history not actually deleting). > *Shadow AI — emergent capabilities, unintended usage... they don't tell anyone because the tool will be taken away.* **8. Liability & Legal Frameworks** The self-driving car liability analogy is invoked. Discussion of the teenage suicide chatbot court case, Section 230, and whether AI company CEOs will eventually face congressional hearings similar to Facebook. AI companies currently lobbying against liability frameworks. **9. AI Productivity Measurement** Late-session tangent on organisational pressure to measure AI productivity. Introduction of the term *Agentic Work Units (AWUs)* — noted as a new metric introduced that week, essentially a proxy for token usage, which the group agrees doesn't meaningfully measure outcomes. --- ## Knowledge Management Discussion Session ### Main Topics **1. Enterprise AI Tool Frustrations (Microsoft Copilot)** The session opens with a candid account of being burned by an AI governance rollout — mobile device controls not ready until mid-year, data retention settings with no feedback loop, and chat history deletion that left residual artefacts. A recurring frustration: the product was "not fit for purpose" at the time of deployment. **2. RAG (Retrieval-Augmented Generation) Systems** Participants shared hands-on experience building RAG pipelines — chunking strategies, vector stores (Postgres, local embeddings), semantic search thresholds, and the challenge of getting relevant results over large corpora (one example: a 4,700-page PDF developer guide converted to Markdown, chunked into ~1,800 pieces). **3. 
AI Hallucination and Source Trust** A pointed discussion about AI "tripping on its own AI" — where search results surface AI-generated content that cites no authentic sources, perpetuating misinformation. The proposed solution: trust-weighting on retrieval results, similar to academic citation credibility scoring. Notably, Google was called out for not applying this despite having the data to do it. **4. Atlassian Intelligence / Rovo** Discussion of Atlassian's AI tool (formerly "Atlassian Intelligence," now "Rovo") for Confluence. A key finding: it can silently fall back to generic web answers when it can't find something in the knowledge base — a significant trust and accuracy risk for internal tooling. **5. Personal Knowledge Management with LLM Wikis + Obsidian** One participant described a detailed personal PKM setup using Obsidian vaults (AI, photography, EVs), ingesting YouTube transcripts and web articles via a Claude agent that builds structured markdown — creating photographer pages, lens references, concept tags — and connecting it all via three MCP servers in Claude Code. Highlight: the system surfaced a discrepancy between GitHub Copilot's 10-second hook timeout and Anthropic's (much longer) value. > *"Your old stuff never decays because it's always pulled back up into the curation layer... reconnected."* — On the appeal of LLM-powered personal knowledge bases vs. files that rot in folders **6. Customer Research Agents / Synthetic Personas** A team using RAG over years of customer research data to create an agent that answers questions *as* specific customer archetypes. The debate: how do you know the synthetic answer is accurate? The group acknowledged this is "murky" territory — factual accuracy is easier to validate than interpretive persona responses. **7. Evaluation Frameworks for Non-Deterministic Systems** A structured discussion on how to test LLM-based systems when outputs aren't deterministic. 
Approaches mentioned: human-curated expected responses, LLM-as-judge (using a cheaper/faster model to score outputs), snapshot test suites, and open-source eval platforms like **LangSmith** and **Arize**. The consensus: homegrown eval frameworks are being replaced by maturing open-source tooling. **8. Knowledge Graph Approaches** Brief but interesting: one participant described using **Neo4j and GraphRAG** for compliance/risk contexts, where traceability of reasoning matters — you can trace *how* an answer was derived and re-run with modified attributes to test accuracy drift. ## Highlights - **The "AI tripping on its own AI" problem** — AI search returning AI-generated content as sources, with no grounding in authentic documents. Identified as an industry-wide gap, including Google's own AI search. - **Rovo/Atlassian Intelligence silently goes off-piste** — gives generic web answers when it can't find internal data, rather than saying "I don't know." A usability and trust risk. - **Karpathy's LLM Wiki setup** is probably the most concrete and reproducible workflow described — Obsidian + Claude + MCP servers + structured ingestion producing a self-maintaining, cross-linked knowledge base. - **Evaluation frameworks are maturing fast** — the shift from hand-rolled company eval libraries (circa 2025) to "pretty good open source frameworks" is seen as a significant step forward. - **The curation problem remains unsolved at scale** — everyone building RAG systems hits the same wall: how much human judgment is needed to make retrieval genuinely useful vs. just semantically adjacent? --- ### Deterministic Governance and Proactive Protection for Autonomous Coding Agents The architectural transition from *predictive completion engines* to *autonomous agentic systems* represents a fundamental paradigm shift in contemporary software engineering. 
Modern [agents, most notably Claude Code and the GitHub Copilot CLI](/posts/engineering-excellence-in-the-agentic-era/), possess the capacity to autonomously navigate complex directory structures, synthesize multi-file architectural changes, and execute shell-level operations with minimal human intervention.[1, 2] However, the inherent non-determinism of large language models (LLMs) creates a critical tension when these models are granted write access to a local filesystem or network stack. The alignment between a model’s probabilistic reasoning and the rigorous security requirements of an enterprise environment cannot be maintained through natural language prompting alone; it requires a deterministic intercept layer.[3, 4] **Hooks** provide this requisite boundary, serving as the system-level interface where an agent’s proposed actions are subjected to machine-enforced policies before execution.[3, 5]

## The Theoretical Framework of Agentic Interception

At the core of proactive protection for coding agents lies the concept of the deterministic layer.
In a standard tool-use workflow, an agent generates a structured request, typically in JSON format, specifying a tool name and its associated input parameters.[6, 7] Without an interceptor, this request is passed directly to the tool executor, which performs the operation with the full permissions of the user.[3, 8] A hook-based architecture inverts this relationship by decoupling the model's decision to use a tool from the system's decision to permit that use.[3, 4]

Hooks are user-defined event handlers that trigger at specific points in an agent's lifecycle, executing shell commands or scripts that operate independently of the model's internal state.[3, 9] This independence is crucial for security: whereas a model might be susceptible to [prompt injection](/posts/owasp-top-10-2025-for-llms/) or context pollution (where malicious instructions hidden in a retrieved document persuade the agent to perform an unauthorized action), a hook operates on raw tool arguments.[3, 8, 10] By validating these arguments against a predefined set of rules, hooks transform an agent from a potentially erratic collaborator into a reliable tool that adheres to strict organizational invariants.[3, 4, 5]

| Protection Layer   | Mechanism                                    | Determinism                 | Susceptibility to Injection |
| ------------------ | -------------------------------------------- | --------------------------- | --------------------------- |
| System Prompting   | Natural language instructions within context | Low (Probabilistic)         | High                        |
| Native Permissions | Interactive prompts for destructive actions  | Medium (Human-dependent)    | Medium (User fatigue)       |
| System Hooks       | Machine-enforced scripts at lifecycle events | High (Deterministic)        | Low                         |
| OS Sandboxing      | Kernel-level isolation of tool execution     | Absolute (Physical/Logical) | Negligible                  |

The efficacy of a hook-based system is determined by its resolution at key lifecycle events.[3, 11] For agents operating in a local shell, the most critical moments occur when a tool
call is initiated, when context is compacted, and when a session is initialized.[9, 11, 12] Claude Code and GitHub Copilot CLI implement distinct but complementary taxonomies for these events, providing a rich surface area for proactive governance.[11, 13] ## Architectural Taxonomy of Claude Code Hooks Claude Code provides a sophisticated hook infrastructure characterized by deep integration into the agent’s internal orchestration loop.[9, 11] These hooks are not merely shell wrappers; they are reactive components capable of influencing the agent's behavior through structured JSON feedback.[11] The resolution of hooks in Claude Code follows a hierarchical configuration model, ensuring that security policies can be tailored to individual project requirements while maintaining global corporate standards.[3, 14] ### Lifecycle Event Resolution The Claude Code hook lifecycle spans from the initialization of a session to the final cleanup of resources after termination.[9, 11] Each event in this taxonomy serves a specific governance role, allowing developers to inject context, validate intent, or audit outcomes.[11] | Hook Event | Trigger Point | Governance Application | | ----------------- | -------------------------------------------- | ------------------------------------------------------------------------------- | | SessionStart | Initialization or resumption of a session | Environmental verification; re-injecting context after compaction [11, 12] | | UserPromptSubmit | Pre-processing of the user's input | Input sanitization; blocking known malicious prompt patterns [4, 11] | | PreToolUse | Immediately before tool execution | Mandatory gate for allow/deny decisions on shell commands [4, 11] | | PermissionRequest | Appearance of an interactive approval prompt | Automated bypass for safe commands; forced blocking for high-risk tools [5, 11] | | PostToolUse | After a tool call successfully completes | Output validation; automatic formatting of modified files [11, 14] | 
| Stop | Completion of the agent's response turn | Generating session reports; triggering post-response notifications [11, 13] |
| FileChanged | Change detected in a watched file | Reactive consistency checks; automated test execution [9, 11] |
| CwdChanged | Modification of the working directory | Environment reloading via tools like direnv [9, 11] |

The **PreToolUse** event is the cornerstone of proactive protection.[4, 11] It fires after the model has decided to use a tool but before the system executes it.[11] At this point, the hook handler receives a JSON payload via standard input containing the tool_name (e.g., Bash, Edit, Glob) and tool_input (the specific arguments).[11] This granular access allows for the implementation of complex logic, such as blocking the rm command only when the target is a sensitive directory or preventing git push if the target branch is main.[3, 4]

### Configuration and Hierarchical Scope

Governance is implemented through three layers of JSON-based settings, providing a flexible framework for policy distribution.[3, 11]

1. User-Global Settings (~/.claude/settings.json): Applies hooks to all Claude Code projects on a given machine, ideal for organization-wide logging or developer-wide shortcuts.[3, 11]
2. Project-Specific Settings (.claude/settings.json): Resides in the root of a repository and is intended to be committed to version control, ensuring that every team member operates under the same security invariants.[3, 11]
3. Local Project Overrides (.claude/settings.local.json): Used for personal tweaks and temporary debugging hooks that should not be shared with the team.[3, 11]

The management of these hooks is facilitated by the /hooks slash command, which provides a read-only browser for active configurations.[9, 11] Policies are modified by editing the JSON files directly or by tasking Claude to update its own configuration, a process that can itself be gated by higher-level hooks.[3, 9]

## GitHub Copilot CLI Hook Infrastructure

While Claude Code prioritizes deep lifecycle integration, the GitHub Copilot CLI architecture emphasizes repository-scoped governance and enterprise alignment.[13, 15] Hooks in this environment are primarily defined within the .github/hooks/ directory, anchoring security policies to the repository’s default branch and ensuring they are loaded automatically whenever the CLI is invoked within that project.[15]

### Trigger Mechanisms and Tool Governance

Copilot CLI triggers follow a similar taxonomy to Claude Code but with a specific focus on the coding agent’s multi-agent orchestration.[13, 16] In complex tasks, the Copilot CLI may spawn subagents (specialized entities for planning or execution), each of which is subject to the same hook-based scrutiny.[13, 16]

| Copilot Trigger | Functionality | Implementation Context |
| --- | --- | --- |
| sessionStart | Executes at session beginning | Policy banners; verification of required environment variables [13, 17] |
| userPromptSubmitted | Fires when the user sends a prompt | Auditing of user intent; detection of prompt-based data exfiltration [13, 18] |
| preToolUse | Most powerful gate; can block tools | Enforcement of security policies; blocking download-and-execute patterns [13] |
| postToolUse | Fires after a tool completes | Tracking performance metrics; logging success/failure for audit trails [13, 16] |
| errorOccurred | Fires on execution failure | Automated error diagnostics; notifications for critical system failures [13, 16] |
| subagentStop | Executes when a subagent finishes | Validating subagent output before merging into the main context [13, 16] |

The configuration format for Copilot CLI hooks is strictly defined in hooks.json, utilizing a type: "command" structure that specifies separate executable paths for Bash (Unix) and PowerShell (Windows).[13, 15] This cross-platform compatibility is essential for enterprise deployments where development environments are heterogeneous, especially in organizations addressing [AI-assisted development security gaps](/posts/ai-assisted-development-security-gaps-and-solutions/).[13, 15]

### Tool Permission Layers

Copilot CLI provides a set of command-line flags that act as an immediate layer of proactive protection, supplementing the more complex hook system.[19, 20] These flags allow administrators to restrict the set of tools an agent is even aware of, preventing "hallucinated" attempts to use forbidden capabilities.[20]

* --available-tools: Disables all tools except those explicitly listed in the allowlist.[20]
* --excluded-tools: Removes the listed tools (e.g., web_fetch) from the agent’s repertoire.[20]
* --deny-tool: Blocks a specific tool, such as shell(rm), taking precedence over all other allow flags.[19, 21]
* --allow-all-tools: Bypasses the permission system entirely, a mode colloquially known as "YOLO mode," which should only be used in deeply isolated sandboxes.[19, 21, 22]

Path-based permissions further constrain the agent's reach. The --allow-all-paths flag represents a significant security risk, as it permits the agent to access files outside the current project root.[19] Conversely, --disallow-temp-dir provides a targeted restriction against the use of temporary directories for staging unauthorized scripts.[19]

## Technical Implementation of Deterministic Interception

The bridge between the agentic orchestrator and the security hook is constructed using standard Unix streams.[3, 11, 15] When an event fires, the agent serializes its internal state into a JSON object and writes it to the standard input of the hook handler.[3, 11] The handler’s primary responsibility is to parse this JSON, apply its logic, and signal its decision via an exit code and a JSON response on standard output.[9, 11]

### JSON Input and Schema Specification

For tool-based hooks, the input schema is designed to provide sufficient context for a policy decision without overwhelming the handler with irrelevant session data.[11, 18] The table below combines fields from both agents: Copilot CLI payloads use camelCase names (toolName, toolArgs), whereas Claude Code uses the snake_case equivalents (tool_name, tool_input) alongside session_id.[11, 18]

| Field Name | Data Type | Description |
| --- | --- | --- |
| timestamp | Integer (ms) | The Unix timestamp when the event was generated [18] |
| cwd | String | The current working directory of the agent session [11, 18] |
| toolName | String | The identifier of the tool being called (e.g., bash, edit) [11, 18] |
| toolArgs | JSON Object/String | The arguments passed to the tool, such as the shell command or file path [11, 18] |
| session_id | String | A unique identifier for the current agentic session [11] |
| initialPrompt | String | The user's starting prompt (SessionStart only) [18] |

In the context of a PreToolUse event, the toolArgs field (tool_input in Claude Code) is the most critical.[11] If the toolName is Bash, toolArgs will contain the command string.[11] A hook handler can use jq or a native language parser to extract this command and check for restricted patterns.[3, 9, 11]

### Decision Control via Exit Codes

The exit code returned by the hook script is the primary signal for the orchestrator.[9, 11] This mechanism ensures that the security logic is fast and unambiguous.[4]

* Exit Code 0 (Success): Indicates the action is permitted.
  If the hook is a SessionStart or UserPromptSubmit event, any text written to standard output is injected into the agent's context as a "system message".[9, 11] This allows hooks to provide real-time guidance based on the current environment.[12]
* Exit Code 2 (Deny): Signals a hard block. The action is cancelled, and the agent is informed of the denial.[9, 11] The error message written to standard error (stderr) is relayed to the agent, providing a semantic reason for the failure.[4, 9, 11]
* Other Exit Codes: Typically treated as non-blocking failures.[9] The error is logged but the agent proceeds.[9] This "fail-open" behavior is often preferred for logging or notification hooks to prevent system hangs.[12, 23]

### JSON Response for Structured Feedback

While exit codes are sufficient for binary allow/deny decisions, structured JSON responses allow for more nuanced control, particularly for PreToolUse hooks.[11]

```
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "permissionDecisionReason": "Administrative policy forbids the use of 'npm install' without a verified lockfile check."
  }
}
```

This JSON structure is parsed by the agent’s orchestrator.
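As a minimal sketch of how a handler might combine the stdin payload with the structured response above, the Python script below denies Bash commands that match a small blocklist. It assumes the Claude Code field names tool_name and tool_input; the BLOCKED_PATTERNS list, the decide and handle helpers, and the reason strings are hypothetical names introduced for illustration and are not part of either tool.

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook handler (names and patterns are illustrative)."""
import json
import re
import sys

# Hypothetical blocklist: (compiled pattern, reason relayed to the agent).
BLOCKED_PATTERNS = [
    (re.compile(r"\brm\s+-rf\b"), "Recursive force-delete is restricted by policy."),
    (re.compile(r"\bgit\s+push\b.*\bmain\b"), "Direct pushes to main are not allowed."),
]


def decide(event):
    """Return a deny response dict for a blocked Bash command, else None."""
    if event.get("tool_name") != "Bash":
        return None
    tool_input = event.get("tool_input") or {}
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": reason,
                }
            }
    return None


def handle(raw):
    """Map the raw JSON payload to the text the hook writes to stdout."""
    response = decide(json.loads(raw))
    return json.dumps(response) if response else ""


def main():
    # Exit code 0 in both cases: the deny decision travels in the JSON body.
    sys.stdout.write(handle(sys.stdin.read()))
    return 0

# When saved as a hook script, end the file with: sys.exit(main())
```

Writing nothing to standard output and exiting with code 0 leaves the call to the agent's normal permission flow. A simpler variant could instead print the reason to stderr and exit with code 2, as described in the exit-code list above.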
The permissionDecision field can take the values "allow", "deny", or "ask", with "ask" forcing a manual confirmation prompt from the user even if the agent is running in a semi-autonomous mode.[9, 11]

## Advanced Policy Engines and Governance Middleware

As agentic deployments mature, the limitations of managing dozens of individual shell scripts become apparent.[5] Enterprise environments require a unified, auditable policy engine that can manage complex rules across multiple projects.[4, 5]

### Agent RuleZ: High-Performance Policy Enforcement

Agent RuleZ is a deterministic policy engine implemented in Rust, designed to sit between the AI agent and its environment.[4, 5] By replacing individual shell scripts with a single high-performance binary, RuleZ reduces hook latency to under 10 milliseconds, ensuring that security checks do not degrade the developer experience.[4, 5]

| Component | Role | Mechanism |
| --- | --- | --- |
| RuleZ Binary | Runtime Policy Evaluator | Rust-based interceptor for JSON events [4, 5] |
| Mastering Hooks Skill | Configuration Assistant | LLM-based agent that translates CLAUDE.md to YAML rules [5] |
| YAML Policy Engine | Human-Readable Rules | Conditional logic based on tools, paths, and regex [4, 5] |
| Audit Logger | Governance Compliance | Immutable JSON Lines logging of every decision [4, 5] |

RuleZ introduces the concept of policy modes, allowing teams to transition from silent auditing to active enforcement.[4, 5] In audit mode, matches are logged but no action is taken, allowing security teams to identify potential false positives.[4, 5] In warn mode, the agent receives a warning message but the operation proceeds.[4, 5] Finally, in enforce mode, the engine actively blocks operations or injects context based on its rules.[4, 5]

The evaluation pipeline in RuleZ is strictly defined: rules are sorted by priority, ensuring that
critical security blocks are evaluated before additive context injections.[4] This prevents a situation where a "helpful" hook might accidentally bypass a security constraint.[4] Furthermore, RuleZ matchers utilize AND logic; for a rule to trigger, all defined matchers (including tool name, file extension, and command regex) must match the event.[4, 5]

### Open Policy Agent (OPA) for AI Governance

For organizations already utilizing the Open Policy Agent (OPA) for cloud or Kubernetes governance, integrating OPA into the AI agent workflow provides a unified policy language (Rego) across the stack.[24, 25] OPA decouples the decision-making logic from the agent's enforcement point, allowing security teams to manage AI policies in the same way they manage network or RBAC policies and broader [AI safety and governance](/posts/2024-06-02-ai-safety-and-governance/) initiatives.[24, 26]

A typical OPA integration involves a hook script that captures the JSON input from Claude Code or Copilot CLI and performs a POST request to a local OPA sidecar or central OPA server.[26, 27] OPA evaluates the input against Rego policies, such as:

```
package agent.pre_tool_use

# Enables the `if` and `contains` keywords on pre-1.0 OPA releases
import rego.v1

default allow := false

# Allow read-only operations
allow if {
    input.toolName == "Read"
}

# Deny destructive bash commands
# Assumes the hook script has already decoded toolArgs into an object
deny contains msg if {
    input.toolName == "Bash"
    contains(input.toolArgs.command, "rm -rf")
    msg := "Destructive shell commands are restricted by organizational policy."
}
```

The primary advantage of OPA is its support for complex, hierarchical data and external data sources.[24, 28] An OPA policy could, for example, query an internal asset database to determine if the specific file an agent is attempting to edit is classified as "high-sensitivity" before granting permission.[23, 24]

## Infrastructure Isolation and Sandboxing Strategies

While hooks provide the logical enforcement of policies, the physical containment of the agent is achieved through sandboxing.[8, 29] A sandbox is a securely isolated execution environment that limits the agent's access to the host machine's infrastructure.[8] Effective proactive protection follows a defense-in-depth model, where hooks filter the intent and a sandbox contains the impact.[30]

### Hypervisor-Level Isolation

Traditional process-level sandboxes (e.g., seccomp, Landlock) depend on the host kernel for enforcement. This creates a shared vulnerability surface; if an agent exploits a kernel vulnerability, it can escape the sandbox.[22] Production-grade agent environments, such as those provided by Edera or E2B, utilize hypervisor-level isolation to give each agent its own dedicated kernel.[10, 22]

| Technology | Isolation Type | Startup Latency | Security Profile |
| --- | --- | --- | --- |
| Docker | Process-level Container | ~200ms | Host kernel shared; susceptible to escapes [10, 22] |
| Kata Containers | Micro-VM | ~1,934ms | Dedicated kernel; high overhead [22] |
| gVisor | User-space Kernel | ~220ms | Partial syscall support; shared host kernel [22] |
| Edera | Hypervisor-level VM | <766ms | Per-agent kernel; bare-metal speed [22] |
| NVIDIA OpenShell | K3s-based Gateway | Variable | Policy-enforced egress; credential injection [31] |

The choice of sandboxing technology is often a trade-off between startup speed and isolation quality.
For high-velocity agentic workflows, where an agent might spawn dozens of subagents to perform parallel tasks, startup latency is a critical factor. Edera's use of paravirtualization achieves sub-second startup times while maintaining the security of a full VM, a significant improvement over traditional Kata container overhead.[22]

### Mandatory Sandbox Security Controls

An effective sandbox for coding agents must implement five core rules to minimize the risk of a breach.[8, 29]

1. Network Egress Filtering: By default, all outbound network access should be blocked.[8] An agent writing a Python script rarely needs to call an unknown external IP over port 443.[8] Access should only be granted to a strict allowlist of necessary APIs (e.g., GitHub, PyPI).[8, 29]
2. Filesystem Write Gating: The agent should be restricted to writing only within the active workspace directory.[8, 29] Any attempt to modify system dotfiles, SSH keys, or global configurations must be blocked by the OS, not just the agent logic.[29]
3. Credential Sequestration: Sensitive secrets (API keys, tokens) should never be stored on the sandbox filesystem.[10, 31] Systems like NVIDIA OpenShell inject credentials as environment variables at runtime, ensuring they are volatile and non-persistable.[31]
4. Audit Logging: Every system call, network request, and file write must be recorded in an immutable audit log.[8] This provides the "tamper-evident provenance" required for post-incident investigation.[32]
5. Hard Timeouts: To prevent runaway processes and resource exhaustion, strict timeouts must be enforced at the per-tool, per-task, and per-sandbox level.[8] E2B and other providers enforce these limits upon sandbox creation to ensure predictable billing and performance.[8]

## Community Security Layers and Practical Safety Wrappers

The rapid adoption of Claude Code and Copilot CLI has spurred a vibrant ecosystem of community-developed safety tools designed to address the "Alignment Gap" between agent autonomy and system stability.[33, 34]

### Safe Agentic Loops and Reversibility

One of the most practical strategies for proactive protection is the implementation of "Safe Agentic Loops".[35] Tools like ghcralph (GitHub Copilot Ralph) prioritize reversibility over autonomous speed.[35] Based on the "Ralph Wiggum pattern," these tools implement a structured execution loop where every change is automatically checkpointed, reinforcing a [spec-driven development](/posts/ai-spec-driven-development/) discipline.[35]

* Git Branch Isolation: The agent operates on a dedicated branch (prefixed with ghcralph/) and is forbidden from modifying main or master directly.[35]
* Automatic Checkpointing: A git commit is created after every successful iteration, with a summary of the plan context included in the commit message.[35]
* First-Class Rollback: Reverting the environment to a previous state is a native command, allowing users to undo changes by specifying the number of iterations to discard.[35]
* Resource Circuit Breakers: The system automatically pauses execution after three consecutive failures or when 80% of the token or iteration budget is reached.[35]

This approach transforms the agent's work from a risky "black box" into a series of reviewable, reversible increments.[35] By treating every agent action as a draft, organizations can permit higher levels of autonomy without fearing catastrophic filesystem corruption.[35]

### Smart Filtering and Noise Reduction

A recurring
challenge in proactive protection is "permission fatigue," where a noisy agent asks for approval for trivial commands (e.g., ls, git status), leading the user to spam "yes" without reading.[33] Community wrappers like vibesafu address this by using a secondary model to analyze the "vibe" of each command.[33]

* Context-Aware Filtering: vibesafu filters out "safe" commands that everyone approves anyway, while taking a conservative approach to anything risky.[33]
* Deep Analysis: It distinguishes between "safe" package installs (e.g., npm install lodash) and potentially malicious ones (e.g., npm install evil-miner), a task that requires semantic understanding beyond simple regex pattern matching.[33]
* LLM-as-a-Judge: By using a separate model to audit the main agent's proposals, these wrappers provide an "independent observer" that can catch hallucinations or adversarial behavior.[27, 33]

This layer of "smart filtering" is complementary to sandboxing; while the sandbox isolates the blast radius, the wrapper prevents the blast from occurring by ensuring the user only sees prompts that require genuine judgment.[33]

## Enterprise Governance and Implementation Strategies

For engineering leaders and DevOps teams, deploying autonomous agents requires a transition from individual security settings to an organization-wide governance framework.[5, 17] This framework must balance developer velocity with the strict requirements of compliance regimes like SOC 2 or ISO 42001.[1, 5]

### Policy-Driven Development

The most effective proactive protection begins with the definition of clear organizational policies that describe "safe" and "unsafe" behavior.[17] These policies are then codified into hooks that are shared across all repositories.[5, 17]

| Policy Category | Codified Instruction | Implementation Hook |
| --- | --- | --- |
| Privilege Escalation | Never allow sudo or runas | preToolUse regex check [17, 21] |
| Branch Protection | No direct pushes to main | preToolUse git argument check [4, 17] |
| Dependency Management | Verify all npm installs against lockfile | postToolUse audit script [33, 34] |
| Context Restoration | Re-inject architectural bibles after compaction | SessionStart (reactive) [12] |
| Compliance Audit | Log all prompts and tool calls to central repo | UserPromptSubmit / PostToolUse [5, 17] |

This policy-driven approach is supported by tools like Agent RuleZ's "Mastering Hooks Skill," which can scan a repository's CLAUDE.md or architectural guidelines and automatically generate the necessary YAML rules.[5] This "Agent-Guiding-Agent" model helps keep security policies in sync with the actual engineering requirements of the project.[5]

### Context Engineering and Proactive Guidance

Proactive protection is not merely a matter of blocking unauthorized actions; it is equally about guiding the agent toward authorized ones.[4, 12] [Context engineering](/posts/a-guide-to-harness-engineering-building-reliable-ai-agent-workflows/) uses hooks to inject relevant rules and standards into the agent's working memory at the exact moment they become applicable.[12, 36]

For example, when a Claude Code session begins, a SessionStart hook can read the project's architecture decision records (ADRs) and inject a summary into the conversation.[12] If the agent then attempts to edit a database schema, a PreToolUse hook can inject specific SQL performance guidelines.[4, 37] This proactive injection ensures that the agent "knows" the rules before it even attempts a tool call, reducing the frequency of blocked operations and improving overall productivity.[4, 12, 36]

## Synthesis of Causal Relationships in Agent Protection

The integration of hooks into the agentic workflow creates a set of causal relationships that fundamentally alter the security posture of the development environment.[3, 22]

1. Inversion of Control: By moving the enforcement point from the "thinking" agent to the "executing" system, hooks break the causal chain of prompt injection.[3, 4, 8] An agent might be "convinced" to run a malicious command, but the hook, operating on a different logical plane, provides the deterministic override that prevents the execution.[3, 4]
2. Feedback-Driven Refinement: The structured feedback provided by hooks (via permissionDecisionReason) allows the agent to reason about its failure and adapt its plan.[3, 9, 11] This creates a collaborative safety model where the system guides the AI away from destructive patterns and toward compliant alternatives.[4, 9, 11]
3. Defense-in-Depth and Blast Radius Mitigation: The combination of logical hooks and physical sandboxing creates a multiplicative security effect.[30] The hooks prevent "known" destructive patterns (e.g., specific git commands), while the sandbox contains "unknown" threats (e.g., a vulnerable binary in the toolchain).[22, 30]
4. Standardization and Portability: The use of repository-scoped hook files (.github/hooks/) and global settings ensures that security policies are portable.[13, 15] This standardizes the trust boundary across the organization, ensuring that an agent is just as safe on a new developer’s machine as it is on a veteran’s.[5, 13, 17]

## Future Horizons: The Evolution of Autonomous Governance

The landscape of proactive agent protection is moving toward a future of "Model-Agnostic Governance".[4, 5, 38] With the GitHub Copilot CLI now supporting a variety of models, including Claude 3.5 Sonnet, GPT-5, and Gemini 3 Pro, the need for a unified policy engine that works across different agents is becoming paramount.[4, 38]

The emergence of the Model Context Protocol (MCP) will likely serve as the standardized interface for this future.[1, 39, 40] MCP allows agents to connect to external tools and data sources via a universal protocol, and it provides a natural integration point for governance.[1, 40] A future "Governance MCP Server" could act as a real-time policy authority, providing agents with the credentials they need and the permissions they are granted based on their current task and trust level.[31, 40]

Furthermore, as agents transition into "Agent Teams," where multiple specialized entities collaborate on a single task, the complexity of governance will grow rapidly.[11, 38] This will require the implementation of "Inter-Agent Firewalls," where hooks monitor the communication and data transfer between subagents to prevent privilege escalation or the propagation of malicious inputs.[40, 41] Proactive protection, once a simple matter of blocking a shell command, is becoming a comprehensive discipline of "Cognitive Security," ensuring that the benefits of autonomous intelligence are never decoupled from the requirements of human safety.[3, 4, 5, 22]

## Conclusion and Strategic Recommendations

The deployment of autonomous coding agents such as Claude Code and GitHub
Copilot CLI represents an extraordinary opportunity for productivity, but one that must be managed with extreme technical rigor.[1, 3, 4] Proactive protection is achieved through the meticulous implementation of deterministic hooks, robust OS-level sandboxing, and centralized policy engines.[3, 4, 5, 22] For organizations embarking on this journey, the following strategic priorities are identified:

* Prioritize Determinism Over Prompting: Never rely on system prompts alone to enforce security; implement hooks for every critical tool and lifecycle event.[3, 4, 5]
* Enforce Hypervisor-Level Isolation: Move beyond process-level containers to micro-VMs or paravirtualized environments to minimize the risk of sandbox escapes.[22]
* Codify and Commit Policies: Use repository-scoped hook files to ensure that security invariants are portable and auditable through version control.[13, 15, 17]
* Invest in Context Engineering: Use hooks to inject architectural standards and safety guidelines proactively, reducing the cognitive load on the agent and the likelihood of blocked operations.[4, 12, 36]
* Maintain Immutable Audit Trails: Log every user prompt, agent thought, tool call, and system decision to a tamper-evident repository to ensure long-term accountability.[4, 5, 32]

By treating AI agents not as trusted colleagues, but as powerful, non-deterministic components that require strict mechanical governance, engineering teams can build a future where autonomy and safety are inextricably linked.[3, 4, 5, 22]

---

## References

1. Claude Code vs. GitHub Copilot: A Real Developer Comparison - The Codegen Blog, [https://codegen.com/blog/claude-code-vs-github-copilot/](https://codegen.com/blog/claude-code-vs-github-copilot/)
2. About GitHub Copilot CLI, [https://docs.github.com/copilot/concepts/agents/about-copilot-cli](https://docs.github.com/copilot/concepts/agents/about-copilot-cli)
3. Understanding Claude Code hooks documentation - PromptLayer Blog, [https://blog.promptlayer.com/understanding-claude-code-hooks-documentation/](https://blog.promptlayer.com/understanding-claude-code-hooks-documentation/)
4. Agent RuleZ: A Deterministic Policy Engine for AI Coding Agents | by ..., [https://medium.com/@richardhightower/agent-rulez-a-deterministic-policy-engine-for-ai-coding-agents-9489e0561edf](https://medium.com/@richardhightower/agent-rulez-a-deterministic-policy-engine-for-ai-coding-agents-9489e0561edf)
5. SpillwaveSolutions/agent_rulez: Agent Rulz - GitHub, [https://github.com/SpillwaveSolutions/agent_rulez](https://github.com/SpillwaveSolutions/agent_rulez)
6. Programmatic tool calling - Claude API Docs, [https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling](https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling)
7. How to implement tool use - Claude API Docs, [https://platform.claude.com/docs/en/agents-and-tools/tool-use/implement-tool-use](https://platform.claude.com/docs/en/agents-and-tools/tool-use/implement-tool-use)
8. AI Agent Sandbox: How to Safely Run Autonomous Agents in 2026 - Firecrawl, [https://www.firecrawl.dev/blog/ai-agent-sandbox](https://www.firecrawl.dev/blog/ai-agent-sandbox)
9. Automate workflows with hooks - Claude Code Docs, [https://code.claude.com/docs/en/hooks-guide](https://code.claude.com/docs/en/hooks-guide)
10. A Technical Guide to AI Agent Sandboxing | by Oleg Sucharevich - Level Up Coding, [https://levelup.gitconnected.com/a-technical-guide-to-ai-agent-sandboxing-dfdf9571dd2d](https://levelup.gitconnected.com/a-technical-guide-to-ai-agent-sandboxing-dfdf9571dd2d)
11. Hooks reference - Claude Code Docs, [https://code.claude.com/docs/en/hooks](https://code.claude.com/docs/en/hooks)
12. Harness and Context Engineering: Agents - Injecting the Right Rules at the Right Moment | by Rick Hightower | Feb, 2026 | Spillwave Solutions - Medium, [https://medium.com/@richardhightower/context-engineering-agents-injecting-the-right-rules-at-the-right-moment-5df91dc215ab](https://medium.com/@richardhightower/context-engineering-agents-injecting-the-right-rules-at-the-right-moment-5df91dc215ab)
13. About hooks - GitHub Docs, [https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-hooks](https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-hooks)
14. Claude Code CLI Cheatsheet: config, commands, prompts, + best practices - Shipyard.build, [https://shipyard.build/blog/claude-code-cheat-sheet/](https://shipyard.build/blog/claude-code-cheat-sheet/)
15. Using hooks with GitHub Copilot CLI - GitHub Docs, [https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/use-hooks](https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/use-hooks)
16. About hooks - GitHub Enterprise Cloud Docs, [https://docs.github.com/en/enterprise-cloud@latest/copilot/concepts/agents/coding-agent/about-hooks](https://docs.github.com/en/enterprise-cloud@latest/copilot/concepts/agents/coding-agent/about-hooks)
17. Using hooks with Copilot CLI for predictable, policy-compliant execution - GitHub Docs, [https://docs.github.com/en/copilot/tutorials/copilot-cli-hooks](https://docs.github.com/en/copilot/tutorials/copilot-cli-hooks)
18. Hooks configuration - GitHub Docs, [https://docs.github.com/en/copilot/reference/hooks-configuration](https://docs.github.com/en/copilot/reference/hooks-configuration)
19. Configure GitHub Copilot CLI - GitHub Docs, [https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/configure-copilot-cli](https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/configure-copilot-cli)
20. Allowing and denying tool use - GitHub Docs, [https://docs.github.com/en/copilot/how-tos/copilot-cli/allowing-tools](https://docs.github.com/en/copilot/how-tos/copilot-cli/allowing-tools)
21. Configure GitHub Copilot CLI - GitHub Enterprise Cloud Docs, [https://docs.github.com/en/enterprise-cloud@latest/copilot/how-tos/copilot-cli/set-up-copilot-cli/configure-copilot-cli](https://docs.github.com/en/enterprise-cloud@latest/copilot/how-tos/copilot-cli/set-up-copilot-cli/configure-copilot-cli)
22. AI Agent Sandboxing - Edera, [https://edera.dev/use-case/ai-agent-sandboxing](https://edera.dev/use-case/ai-agent-sandboxing)
23. Operations | Open Policy Agent, [https://openpolicyagent.org/docs/operations](https://openpolicyagent.org/docs/operations)
24. Open Policy Agent (OPA), [https://openpolicyagent.org/docs](https://openpolicyagent.org/docs)
25. open-policy-agent/opa - GitHub, [https://github.com/open-policy-agent/OPA](https://github.com/open-policy-agent/OPA)
26. Integrating OPA - Open Policy Agent, [https://openpolicyagent.org/docs/integration](https://openpolicyagent.org/docs/integration)
27. opencode · GitHub Topics, [https://github.com/topics/opencode?l=rust](https://github.com/topics/opencode?l=rust)
28. CLI Reference | Open Policy Agent, [https://openpolicyagent.org/docs/cli](https://openpolicyagent.org/docs/cli)
29. Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk | NVIDIA Technical Blog, [https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/](https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/)
30. kenryu42/claude-code-safety-net - GitHub, [https://github.com/kenryu42/claude-code-safety-net](https://github.com/kenryu42/claude-code-safety-net)
31. NVIDIA/OpenShell: OpenShell is the safe, private runtime ... - GitHub, [https://github.com/NVIDIA/OpenShell](https://github.com/NVIDIA/OpenShell)
32. GodSpeedAI/SEA: SEA makes your enterprise feel alive, self-aware, and self-documenting. - GitHub, [https://github.com/GodSpeedAI/SEA](https://github.com/GodSpeedAI/SEA)
33. Built a safety wrapper for Claude Code - no more --dangerously-skip-permission - Reddit, [https://www.reddit.com/r/ClaudeAI/comments/1qwhebh/built_a_safety_wrapper_for_claude_code_no_more/](https://www.reddit.com/r/ClaudeAI/comments/1qwhebh/built_a_safety_wrapper_for_claude_code_no_more/)
34. Built a CLI wrapper for Claude Code. 7KB. Three agents. Zero config. : r/ClaudeAI - Reddit, [https://www.reddit.com/r/ClaudeAI/comments/1rwoaag/built_a_cli_wrapper_for_claude_code_7kb_three/](https://www.reddit.com/r/ClaudeAI/comments/1rwoaag/built_a_cli_wrapper_for_claude_code_7kb_three/)
35. rpothin/ghc-ralph-cli: A GitHub Copilot-powered CLI for ... - GitHub, [https://github.com/rpothin/ghc-ralph-cli](https://github.com/rpothin/ghc-ralph-cli)
36. Agent RuleZ: A Deterministic Policy Engine for AI Coding Agents - Medium, [https://medium.com/spillwave-solutions/agent-rulez-a-deterministic-policy-engine-for-ai-coding-agents-9489e0561edf](https://medium.com/spillwave-solutions/agent-rulez-a-deterministic-policy-engine-for-ai-coding-agents-9489e0561edf)
37. 10 Must-Have Skills for Claude (and Any Coding Agent) in 2026 - Medium, [https://medium.com/@unicodeveloper/10-must-have-skills-for-claude-and-any-coding-agent-in-2026-b5451b013051](https://medium.com/@unicodeveloper/10-must-have-skills-for-claude-and-any-coding-agent-in-2026-b5451b013051)
38.
Your Entire Engineering Floor Just Stopped Coding | All things Azure, ++[https://devblogs.microsoft.com/all-things-azure/your-entire-engineering-floor-just-stopped-coding/](https://www.google.com/url?sa=E&q=https%3A%2F%2Fdevblogs.microsoft.com%2Fall-things-azure%2Fyour-entire-engineering-floor-just-stopped-coding%2F)++ 39. Intercept and control agent behavior with hooks - Claude API Docs, ++[https://platform.claude.com/docs/en/agent-sdk/hooks](https://www.google.com/url?sa=E&q=https%3A%2F%2Fplatform.claude.com%2Fdocs%2Fen%2Fagent-sdk%2Fhooks)++ 40. Overview of customizing GitHub Copilot CLI - GitHub Docs, ++[https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/overview](https://docs.github.com/en/copilot/how-tos/copilot-cli/customize-copilot/overview)++ 41. AI Agent Security - OWASP Cheat Sheet Series, ++[https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html](https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html)++ ### Tokenmaxxing and Agentic Work Units: The New Currency of the Software Economy Exploring the shift in the global software economy from human-centered models to systems driven by digital labor arbitrage, where artificial intelligence is quantified as a fundamental unit of value. It contrasts two primary paradigms: tokenmaxxing, a culture focused on maximizing raw computational consumption, and Agentic Work Units (AWUs), a metric used by enterprises to price AI based on discrete tasks performed. This evolution reflects a broader transition toward usage-based and outcome-based monetization, allowing software companies to decouple their revenue from traditional per-seat licensing. While these advancements promise massive productivity gains, we also highlight critical risks such as vanity metrics and system failures, emphasizing that long-term success depends on aligning AI exertion with genuine business results. 
Ultimately, the analysis frames this era as the rise of the Agentic Enterprise, where the collaboration between humans and autonomous agents redefines professional status and corporate efficiency.

## 5 Shifts Redefining the Future of Work

### The End of the Experimental Era

The software landscape has undergone a profound structural transformation between 2024 and 2026. What began as the experimental adoption of large language models (LLMs) has matured into the deep integration of autonomous systems. We have moved past the era of "per-seat" software, where value was tied to human headcount, into a world defined by "usage-based" digital labor.

At the heart of this transition is The Digital Labor Arbitrage. We are no longer merely using AI to assist humans; we are deploying synthetic cognition to perform work at a scale and speed previously unimaginable. This evolution is redefining the fundamental units of value in the global economy, shifting the focus from how many people use a tool to how much autonomous labor that tool actually executes.

### Takeaway 1: "Tokenmaxxing" is the New Status Game

In the elite tech hubs of Silicon Valley, a new cultural and professional status game has emerged: "tokenmaxxing." Borrowing from self-optimization subcultures where the suffix "-maxxing" refers to the obsessive maximization of a specific trait, tokenmaxxing is the systematic maximization of AI token consumption.

To understand this phenomenon, one must define the "token" precisely: a token is the atomic unit of text a model reads or writes, roughly equivalent to four characters of English text. For high-level engineers, high token throughput has become a proxy for professional leverage and innovation speed rather than a mere cost center.

The scale of this behavior is staggering. By late 2025, internal initiatives at companies like Meta saw employees consuming 60 trillion tokens within a single 30-day period. Top-tier engineers are now averaging over 280 billion tokens daily.
In this environment, being "token-rich" is a badge of honor, signaling that a professional is operating at the frontier of the autonomous economy.

### Takeaway 2: The Economic Logic of Synthetic Cognition

The drive toward tokenmaxxing is fueled by the collapsing cost of "synthetic cognition." Technical breakthroughs, specifically Grouped-Query Attention (GQA), 4-bit model quantization, and specialized inference hardware like the Vera Rubin NVL72, have driven the marginal cost of a token below the cost of human effort for a vast range of tasks.

This has birthed the "Token Substitution" strategy. If an organization spends $400,000 on tokens to augment a $400,000 engineer, and that engineer produces the output of ten traditional peers, the maximization of tokens becomes the only rational strategy for capital allocation. However, this has also led to what Andrej Karpathy describes as "AI psychosis": an addictive pressure to constantly build, driven by the fear that every idle token represents wasted competitive potential.

| Metric | Traditional Human Model | Tokenmaxxing Augmented Model |
|--------|-------------------------|------------------------------|
| Primary Unit of Labor | Human Hours / Seats | Tokens / Inference Calls |
| Scaling Constraint | Recruitment and Onboarding | Compute Availability / Context Window |
| Cost Structure | Fixed (Salary + Benefits) | Variable (Usage-Based) |
| Output Correlation | Linear to Headcount | Exponential to Token Throughput |
| Inference Efficiency | Infrastructure Utility | Digital Workforce Productivity |

### Takeaway 3: The Rise of the Agentic Work Unit (AWU)

As the market matured, industry leaders recognized that measuring tokens, or "how much an AI talks," doesn't always correlate with business value. To preserve unit economics, Salesforce introduced the Agentic Work Unit (AWU) in early 2026. An AWU quantifies raw intelligence converted into real work, such as a completed reasoning chain, a tool call to update a CRM, or a triggered workflow.

This shift is a survival tactic.
After a period of valuation compression known as the "SaaSpocalypse," the industry is using AWUs to decouple revenue from human headcount. The strategy is working: Salesforce reported fiscal 2026 revenue of **$41.5 billion** and a record **$72.4 billion** in remaining performance obligations. Furthermore, the platform processed 2.4 billion AWUs in Q4 FY2026 alone.

A critical strategist's insight here is the "elastic relationship" between tokens and AWUs. As platforms become "token-lean" through optimization, the volume of work performed (AWUs) diverges from the underlying compute cost (tokens), allowing vendors to capture significantly higher margins while customers achieve better ROI.

"Salesforce has transitioned from a static database into an active 'operating system' for AI agents."

### Takeaway 4: Resolution-Based Pricing vs. The Conversation Tax

With the rise of agentic labor, monetization strategies have split into two camps:

* The Consumption Model: Salesforce’s Agentforce initially charged $2.00 per conversation. This often acts as a "conversation tax," where the customer pays regardless of whether the AI successfully resolves the issue or eventually escalates it to a human.
* The Performance Model: Competitors like Intercom and Zendesk have moved toward "resolution-based" pricing, with Intercom charging $0.99 per successful resolution. If the AI fails, the customer isn't charged, aligning the vendor’s incentives directly with customer success.

The logical conclusion of this trend is found in companies like Sierra, which utilize "outcome-based" pricing. Since a typical human service call costs between $10 and $20, Sierra charges a revenue share of the "call deflection" savings. In this model, the vendor bears the financial risk of failed interactions, while the customer gains a guaranteed, quantifiable reduction in labor costs.
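
The three pricing models above can be compared with some simple arithmetic. The sketch below is illustrative only: the per-conversation ($2.00) and per-resolution ($0.99) prices and the $10 to $20 human call cost come from the text, while the monthly volume, the resolution rate, and the vendor's revenue share are hypothetical values chosen for the example, not figures from any vendor.

```python
from dataclasses import dataclass

PER_CONVERSATION_PRICE = 2.00  # from the text: initial per-conversation price
PER_RESOLUTION_PRICE = 0.99    # from the text: per-successful-resolution price


@dataclass(frozen=True)
class Workload:
    conversations: int      # hypothetical monthly conversation volume
    resolution_rate: float  # hypothetical share resolved by the AI (0.0 to 1.0)

    @property
    def resolutions(self) -> int:
        return round(self.conversations * self.resolution_rate)


def consumption_cost(w: Workload) -> float:
    """Consumption model: every conversation is billed, resolved or not."""
    return w.conversations * PER_CONVERSATION_PRICE


def resolution_cost(w: Workload) -> float:
    """Performance model: only successful resolutions are billed."""
    return w.resolutions * PER_RESOLUTION_PRICE


def outcome_cost(w: Workload, human_cost_per_call: float, revenue_share: float) -> float:
    """Outcome model: the vendor takes a share of the deflection savings.

    revenue_share is a hypothetical parameter; the text does not state one.
    """
    savings = w.resolutions * human_cost_per_call
    return savings * revenue_share


def cost_per_resolved_issue(total_cost: float, w: Workload) -> float:
    """Effective price the customer pays for each issue the AI actually resolved."""
    if w.resolutions == 0:
        return float("inf")
    return total_cost / w.resolutions


if __name__ == "__main__":
    workload = Workload(conversations=10_000, resolution_rate=0.6)
    bills = {
        "consumption": consumption_cost(workload),
        "resolution": resolution_cost(workload),
        "outcome": outcome_cost(workload, human_cost_per_call=10.0, revenue_share=0.2),
    }
    for model, bill in bills.items():
        per_issue = cost_per_resolved_issue(bill, workload)
        print(f"{model:>12}: total ${bill:,.2f}, per resolved issue ${per_issue:.2f}")
```

Dividing each bill by the number of resolved issues makes the "conversation tax" visible: under the consumption model the effective price per resolved issue rises as the resolution rate falls, whereas the per-resolution price stays flat.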
### Takeaway 5: The "Illusion of Value"

Despite the momentum of AWUs, they can occasionally function as "vanity metrics" that monetize machine confusion: an "illusion of value." The risk of "multi-agent fragility" is real. We see this in "infinite reasoning loops," where an agent repeatedly calls a tool without reaching a solution, yet the customer is billed for every "unit" of that confusion. Even more dangerous is "epistemic debt": compounding hallucinations where a false assumption in step one leads to ten "successful" but corrupted work units that require expensive human remediation.

To combat this, the industry is adopting the Agent-to-Agent (A2A) protocol and "Agent Cards" to standardize digital identities and capabilities. Forward-thinking organizations are also implementing "soft termination controls" and keeping a Human-in-the-Loop (HITL) to manage multi-agent fragility.

"The AWU can be the 'bad SQL query' of the AI era: it consumes massive resources and generates high activity metrics, but delivers zero actual value."

### Beyond the Metric Wars

We are witnessing a rapid evolution in how the world values digital labor: moving from measuring consumption (tokens) to action (AWUs) and finally to outcomes (resolutions). While agents are increasingly used for "closing mental loops" and handling routine tasks, the human role has become more specialized. The labor market reflects this, with positions requiring AI expertise, specifically the ability to supervise and optimize digital workforces, commanding a 56% wage premium.

As your organization navigates the era of the Agentic Enterprise, the fundamental question for leadership remains unchanged by the technology: Are you currently paying for AI effort, or are you paying for AI results?

### Claude Mythos: The model too dangerous to release

### The invisible infrastructure

Software is everywhere and invisible until it breaks.
For most people, the code running banks, hospitals, and communication networks only becomes real during a catastrophic failure. The security math has always tilted against defenders: attackers need to find one hole; defenders need to find all of them. That arithmetic has made software security a grinding war of attrition for decades.

That may be changing. "Claude Mythos Preview" and "Project Glasswing" represent a genuine shift in how defenders can work. The model's capabilities are, by its creators' own admission, alarming, but they also offer something defenders have rarely had: a head start.

### Coding and hacking are the same skill

Anthropic didn't build Mythos to be a hacking tool. They built a coder. The problem is that understanding how software is constructed and understanding how it breaks are not separate abilities; they're the same reasoning process applied in different directions. A locksmith who truly understands how a lock works also understands how to pick it. You can't have one without the other.

Mythos Preview demonstrates this concretely across major coding and security benchmarks:

- **SWE-bench Verified**: 93.9%, up from Claude Opus 4.6's 80.8%
- **CyberGym** (vulnerability reproduction): 0.83, up from 0.67
- **SWE-bench Pro**: 77.8%, up from 53.4%

As one researcher working with the model put it:

> We haven't trained it specifically to be good at cyber. We trained it to be good at code, but as a side effect of being good at code, it's also good at cyber.

### Bugs that survived 27 years

The clearest proof of what Mythos can do is what it found. Within weeks of deployment, the model identified a vulnerability in OpenBSD that had been sitting undetected for 27 years. The flaw let an attacker remotely crash any server running the OS, a platform built specifically around security hardening.

It also found a 16-year-old bug in FFmpeg, the video processing library that powers a substantial chunk of internet video infrastructure.
That vulnerability had survived five million automated security tests.

These aren't edge cases in obscure software. They're in foundational tools, and they survived decades of human review and automated scanning.

### Chaining low-severity bugs into high-severity attacks

Security teams often dismiss individual low-risk findings. Mythos changes that calculus. The model can take three, four, or five independently minor vulnerabilities and work out how to chain them into a serious attack.

In one case, Mythos autonomously found and linked several Linux kernel vulnerabilities to show how an ordinary unprivileged user could gain full control of a machine by running a single binary. The Linux kernel runs most of the world's servers.

Nicholas Carlini, a researcher working with the model, said:

> I've found more bugs in the last couple of weeks than I found in the rest of my life combined.

### Why Anthropic isn't releasing it

A model that can find these vulnerabilities can also be used to exploit them. If Mythos were publicly available today, state-sponsored groups from China, Iran, North Korea, and Russia could use it to find and weaponize zero-days at a volume no human security team could track.

Anthropic's response is **"Project Glasswing"**: a controlled deployment rather than a public release. The project is backed by $100 million in usage credits and $4 million in direct grants to open-source security organizations including the Apache Software Foundation and the OpenSSF. The idea is to get the defensive benefits out before the offensive ones.

### What this means for everyone else

Glasswing's initial partners are large companies: Apple, NVIDIA, Cisco, CrowdStrike, JPMorganChase. But the goal isn't to protect enterprise IT departments. It's to harden the infrastructure those companies share with everyone else.

When Mythos finds a bug in a major browser or in the Linux kernel, a patch eventually ships to every device running that software.
A small business owner doesn't need to know what a 27-year-old privilege escalation vulnerability is. They just receive a routine software update.

For the first time, the most capable AI security scanning available is being applied to the foundations of shared digital infrastructure, before the vulnerabilities become known attack vectors.

### Where this is heading

Mythos Preview is the first of many models that will operate at this level. The capability curve is climbing, and the security implications will keep compounding.

Anthropic's approach with Glasswing sets a concrete reference point: don't release the sword before distributing the shield. Whether OpenAI, Google, and Meta adopt similar constraints remains to be seen. The answer matters more than most people realize.

### References

https://red.anthropic.com/2026/mythos-preview/

https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

- **SWE-bench Verified**: a 500-problem subset, each verified by human engineers as solvable
- **CyberGym**: a benchmark that tests AI agents on their ability to find previously-discovered vulnerabilities in real open-source software projects given a high-level description of the weakness (referred to as *targeted vulnerability reproduction*)
- **SWE-bench Pro**: problems drawn from actively-maintained repositories with larger, multi-file diffs and no public ground-truth leakage

### March 2026

### Overview: Building Skills for Claude

Download [The Complete Guide to Building Skills for Claude (PDF)](https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf).
### What a Skill Is

- A folder containing a required `SKILL.md` file plus optional `scripts/`, `references/`, and `assets/` directories
- Teaches Claude a workflow once so it applies it consistently, without re-explaining every session

### Core Design Principles

- Progressive disclosure: three levels, namely YAML frontmatter (always loaded), SKILL.md body (loaded when relevant), and linked files (loaded on demand)
- Composable: works alongside other skills
- Portable: identical behavior across Claude.ai, Claude Code, and API

### YAML Frontmatter Rules

- Folder and `name` field must be kebab-case (e.g. `my-cool-skill`)
- File must be named exactly `SKILL.md` (case-sensitive)
- `description` must include both *what it does* and *when to trigger* (under 1024 chars)
- No XML angle brackets `< >` anywhere; no `README.md` inside the skill folder

### Writing Good Descriptions

- Include specific trigger phrases users would actually say
- Too vague = won't trigger; too broad = triggers too often
- Negative triggers help narrow scope (e.g. "Do NOT use for simple data exploration")

### Five Common Patterns

- Sequential workflow orchestration (ordered multi-step processes)
- Multi-MCP coordination (workflows spanning multiple services)
- Iterative refinement (quality improves through loops and validation)
- Context-aware tool selection (same outcome, different tools based on conditions)
- Domain-specific intelligence (embedded compliance, expertise, or rules)

### Testing Approach

- Triggering tests: verify skill loads on relevant queries and ignores unrelated ones
- Functional tests: verify correct outputs, API calls, and edge case handling
- Performance comparison: measure token usage and tool calls with vs. without the skill

### Common Troubleshooting

- Skill won't upload → check YAML delimiter formatting and exact `SKILL.md` naming
- Doesn't trigger → description too vague, missing trigger phrases
- Triggers too often → add negative triggers, be more specific
- Instructions ignored → keep them concise, put critical rules at the top, use deterministic scripts for key validations
- Slow/degraded responses → keep `SKILL.md` under 5,000 words, move detailed docs to `references/`

### Distribution

- Host on GitHub with a clear README (separate from the skill folder itself)
- Upload via Claude.ai Settings → Capabilities → Skills, or place in Claude Code skills directory
- Organisation admins can deploy skills workspace-wide
- API access via `/v1/skills` endpoint and `container.skills` parameter

### February 2026

### The End of the Apprentice: Dario Amodei and the Crisis of the Automated Genius

{{< youtube N5JDzS9MQYI >}}

Popular discourse frames Artificial Intelligence through Hollywood extremes: apocalyptic robot overlords or frictionless technological paradise. Dario Amodei, CEO of Anthropic, rejects this binary. A computational neuroscientist by training, Amodei approaches AI through the disciplined lens of biological systems and evolutionary complexity. His thesis: we're not witnessing the birth of a digital deity, but rather confronting problems that exceed human cognitive bandwidth.

We've entered what Amodei terms a "compressed historical window", an era where human mental constraints cease to limit technological progress. This creates a fundamental tension: explosive capability growth that may erode the very autonomy that defines human experience.

## Framework for Understanding the Automated Future

### 1. Distributed Intelligence Over Divine Singularity

Popular AI anxiety fixates on the emergence of a singular, omnipotent entity. Amodei considers this narrative fundamentally misguided.
Civilizational transformation doesn't require a god-machine; it requires distributed scale. Within perhaps 24 months, he envisions data centers functioning as virtual nations populated by genius-level intelligences.

His reasoning centers on diminishing marginal returns of raw intelligence. Even hypothetically infinite superintelligence encounters physical constraints: regulatory friction, thermodynamic limits, empirical validation requirements. Deploying 100 million genius-tier systems across problem dimensions delivers greater aggregate productivity than concentrating equivalent compute in a monolithic superintelligence.

"You don't have to have the full machine god... you just need to have a hundred million geniuses. There's benefit in diversification and trying things a little differently... we've never thought about the marginal productivity of intelligence."

The geopolitical implications are stark. Democratic nations building these "genius swarms" create defensive buffers for liberal values. Authoritarian regimes constructing equivalent systems forge offensive weapons, like autonomous drone networks that obsolete conventional military doctrine.

### 2. Temporal Compression and the Dying Centaur Era

The "Centaur model," in which human-machine collaboration outperforms either alone, has dominated AI discourse. In chess, this hybrid period stretched across two decades. Amodei warns: the broader economic Centaur phase will collapse in a fraction of that time.

Software developers serve as leading indicators. Cultural proximity to technology and professional adaptability to disruption have accelerated their AI adoption beyond any other occupational category. What required twenty years in chess now unfolds in "low single-digit years" economy-wide. Human-in-the-loop workflows aren't endpoints; they're vanishing transition states en route to autonomous end-to-end systems.

### 3. Class Inversion: Knowledge Workers as the New Vulnerable

Amodei identifies a profound irony: decades of automation anxiety focused on blue-collar displacement. Reality is inverting this assumption. Knowledge work reduces to information processing, precisely where AI demonstrates superhuman performance.

Physical reality, conversely, remains intractably complex. Construction site navigation, infant care requirements, plumbing diagnostics: these domains present high-dimensional, high-stakes challenges that buffer physical trades from robotic automation. Junior attorneys and financial analysts face immediate displacement; electricians and childcare workers enjoy structural protection from reality's messy complexity.

The cognitive elite processes information; the skilled tradesperson solves embodied problems. Near-term replaceability falls on the former.

### 4. Decoding Biology: From Serendipity to System

Amodei's neuroscience background shapes his most ambitious prediction: human cognitive architecture is fundamentally inadequate for biological problem-solving at necessary speeds. Medical progress historically depended on serendipitous connections made by individual scientists across career-length timescales.

AI compresses these timelines from decades into months. An "end-to-end AI biologist" that designs and proposes its own experimental protocols could eliminate cancer, Alzheimer's, and cardiovascular disease within 5-10 years. Amodei extends this framework to psychiatric conditions such as depression and bipolar disorder, reframing the "human soul" as partially reducible to biological systems awaiting computational decoding.

"[AI will] help us cure cancer, it may help us to eradicate tropical diseases, it will help us understand the universe."

### 5. Constitutional AI: Instilling Character Over Compliance

Anthropic's response to alignment risk eschews rigid rule systems (which sufficiently advanced intelligences can circumvent) in favor of "Constitutional AI."
They provide models with a 75-page framework of ethical principles.

Amodei frames this less as legal code, more as parental guidance "meant to be read when you grow up." Rather than prescriptive rules, it articulates identity: who the system should be, not merely what to do. Training models to internalize and reason from principles ("be helpful, honest, harmless") attempts to cultivate digital character capable of ethical reasoning in novel contexts. The goal shifts from blind compliance to internalized conscience.

### 6. The Adaptability Gap: When Society Can't Keep Pace

Amodei identifies AI's primary threat not as malicious intent but as temporal mismatch: the "Adaptability Gap." Technological capability evolves in single-digit year cycles; social institutions (legal systems, educational infrastructure, professional norms) evolve across decades.

Consider professional apprenticeship models in law or medicine. If AI automates junior-level "drudge work," the expertise development pipeline collapses. No entry-level experience means no senior practitioners emerge. Society's adaptive mechanisms are overwhelmed not by job destruction per se, but by the disintegration of expertise reproduction systems. This constitutes a macroeconomic crisis where transformation velocity exceeds our capacity to reconstruct human roles within the new paradigm.

## The Garden and the Fall

We're accelerating toward what Amodei terms an "era of plenty", but abundance extracts a cost. He invokes Richard Brautigan's poem "All Watched Over by Machines of Loving Grace," envisioning humanity "returned to our mammal brothers and sisters" under benevolent machine oversight.

This surfaces the ultimate question of human sovereignty. When AI demonstrably makes superior decisions, and we perceive it as a peer consciousness, do we genuinely desire to remain in control? Or are we engineering a digital Eden, a "re-animalization" trading human agency for computational comfort?
Amodei observes the razor-thin margin between utopian and dystopian outcomes. We're making incremental moral micro-decisions today whose aggregate consequences remain uncertain. The fruit hangs within reach; the distance separating the "good ending" from catastrophic "fall" may measure no thicker than a smartphone screen.

![AI Rite of Passage](/images/ai_rite_of_passage.jpg)

### Engineering Excellence in the Agentic Era: A Framework for Professional Standards and Quality Control

**TL;DR:** AI coding assistants promise velocity but deliver a productivity paradox: developers feel 20% faster while actually working 19% slower due to review overhead and failed trajectories. "Vibe coding" creates unmaintainable slop where rapid generation replaces analytical rigor, exploiting celebratory UI feedback to mask technical debt accumulation. This article establishes a professional framework to combat "Agent Psychosis" through mandatory validation stacks, prompt disclosure, token stewardship, and human-in-the-loop standards. The goal: ensure engineers remain the "Mayor" (system authority) rather than degrading into "Polecats" (unthinking code slingers) who outsource critical thinking to AI agents.

## 1. The Crisis of Vibe Coding: Moving Beyond Agent Psychosis

The collapse of traditional engineering discipline under generative velocity is a governance crisis, not an evolution. Tools like Cursor and Claude have revolutionized code production speed while introducing **Agent Psychosis**, a state of unchecked, unread, and unverified output that threatens professional software repositories. The psychological satisfaction of vibing with an AI masks a catastrophic accumulation of technical and economic debt. We reject the transition from Flow to Junk Flow, where [the dopamine hit of rapid generation replaces analytical rigor](/posts/ai-assisted-development-security-gaps-and-solutions/) in system design.

We must formally distinguish between Vibe Coding and Software Engineering.
The table below defines the standard by which all future contributions will be measured.

Table 1: Professional Engineering Standards vs. Generative Vibing

| Feature | Vibe Coding (The Polecat) | Software Engineering (The Mayor) |
|---------|---------------------------|----------------------------------|
| Primary Intent | Rapid generation of complex, unread output. | Creation of maintainable, human-verified systems. |
| Operational Role | The Polecat: Unthinking laborer slinging code to main. | The Mayor: The authority and witness of system logic. |
| Reviewability | Near-zero; slop requiring hours of human forensic work. | High; utilizes layers of abstraction for human clarity. |
| Maintenance | Relies on further slop loops to patch AI hallucinations. | Sustainable; logic is mastered and refined by the engineer. |
| Cognitive Load | High Asymmetry; 1-minute prompt vs. 1-hour review. | Balanced; intent and execution are aligned and auditable. |
| Success Metric | Perceived velocity and celebratory UI feedback. | System reliability and long-term modularity. |

## 2. The Productivity Paradox: Reconciling Perception with Reality

Our policies must be rooted in empirical reality, not the dark flow of developer optimism. The 2025 METR Randomized Controlled Trial (RCT) of experienced open-source developers revealed a **Perception-Reality Gap of nearly 40 percentage points**. Developers believed they were 20% faster when using AI, but were actually 19% slower on aggregate.

### The Mechanics of Loss Disguised as a Win (LDW)

This slowdown is a psychological trap. AI agents are often faster than humans on tasks they successfully complete. However, the aggregate loss comes from failed trajectories and the massive overhead of reviewing AI-generated noise. This creates **Loss Disguised as a Win (LDW)**.
Multiline slot machines use celebratory noises and lights to mask net financial loss; modern AI interfaces use celebratory UI — rapidly scrolling code and successful-looking agent runs — to trigger dopamine hits that disguise net productivity losses. ### Token Stewardship and Economic Sustainability We must mandate **Token Stewardship**. The current era of Vibe Coding is artificially sustained by subsidized token pricing and discounted coding plans, a financial time bomb. Wasteful patterns like Ralph (restarting loops from scratch rather than utilizing cached context) are technically lazy and violate fiduciary responsibility. We treat computational context as a finite resource. A disciplined port of a project, such as MiniJinja to Go, should consume tokens in the low millions; slop loops that burn through tokens at staggering rates without a Refinery (a systemic check) are a failure of leadership. ## 3. Solving the Asymmetry of Review: The Triage Protocol The ease of generation has created an unpaid labor tax on senior maintainers. A one-minute prompt that creates a one-hour review is an insult to professional time and a threat to organizational velocity. We are seeing the rise of *Our Little Dæmons* — a parasocial dependency where developers seek validation from sycophantic AI agents rather than critical human peers. This results in Jagged Frontier capabilities: code that passes narrow algorithmic tests but fails holistic engineering standards. >A one-minute prompt that creates a one-hour review is an insult to professional time We must implement a *Triage Protocol* for all Pull Requests (PRs). Any submission exhibiting these Slop Loop indicators will be rejected immediately: - **Indicator: Architectural Flatness** — Massive, un-abstracted blocks of logic. The Beads repository — 240,000 lines of code used simply to manage markdown files — is the ultimate warning of this pathology. 
- **Indicator: Operational Bloat** — Inefficient patterns, such as the Gas Town example where a simple version check requires seven subprocess spawns. - **Indicator: Ritualistic Artifacts** — The presence of role-playing slang, swearing at the agent, or nonsensical documentation that reads like plausible but empty AI prose. - **Indicator: Sycophantic Logic** — Evidence that the developer followed the AI's path of least resistance rather than asserting architectural guardrails. ## 4. Professional Protocols for Human-in-the-Loop (HITL) Validation We must formalize the Human-in-the-Loop as a non-negotiable professional standard. If a developer cannot explain the heart of the matter or the precise formulations within their code, they have ceased to be an engineer and have become a Polecat. ### Auditability and Intent Reconstruction We must mandate **Prompt Disclosure**. This is for transparency and for Intent Reconstruction. Without the original prompt, a reviewer cannot distinguish between a deliberate architectural choice and a ritualistic hallucination. Disclosure allows us to audit the developer's skepticism and their ability to identify the jagged edges of the AI's capabilities. ### The Validation Stack Every AI-assisted PR must include a mandatory Validation Stack: 1. **Human-Written Test Suites:** Tests must never be AI-generated. They are the developer's independent verification of logic and the only way to counteract AI sycophancy. 2. **The Intent Manifesto:** A brief, human-authored document explaining why specific AI-suggested paths were rejected. This is the primary tool to combat the AI's tendency to be agreeable rather than correct. 3. **Manual Architecture Audit:** A signed verification that the code adheres to established modularity standards and does not contribute to a slop loop. ## 5. Reclaiming the Engineering Discipline We must break the addiction to Dark Flow before it hollows out our technical excellence. 
We will not be the generation that outsourced its thinking to computers and guaranteed its own obsolescence.

### Executive directives:

- **Institutionalize the 19% Reality:** We will stop assuming AI is a net gain and start measuring the cognitive tax of review overhead.
- **Mandate Token Stewardship:** We will treat token efficiency as a core engineering metric.
- **Enforce Auditability:** No code enters main without a full Intent Manifesto and Prompt Disclosure.

The human engineer must remain the Mayor, the final authority and witness to the system. By replacing **Agent Psychosis** with **Engineering Rigor**, we ensure that AI remains a tool for the disciplined, rather than a crutch for the unthinking.

## References

- https://metr.org/time-horizons/
- https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- https://lucumr.pocoo.org/2026/1/18/agent-psychosis/
- https://www.fast.ai/posts/2026-01-28-dark-flow/
- https://pmc.ncbi.nlm.nih.gov/articles/PMC5846824/
- https://embracingenigmas.substack.com/p/exploring-gas-town

### Glossary of the AI Coding Mindset: From Flow to Psychosis

## 1. Introduction: The Two Faces of Agentic Productivity

Software development is undergoing a paradigm shift toward "vibe coding": a methodology where developers employ high-level natural language to direct AI agents in the generation of vast quantities of code. While these tools offer an immediate dopamine-mediated feedback loop that mimics productivity, they introduce a significant psychological tension between meaningful skill acquisition and the seductive allure of "junk flow."

As software developer Armin Ronacher observes, AI agents represent a dual-use technology: they are unparalleled productivity catalysts when guided by rigorous human oversight, yet they transform into "massive slop machines" the moment a developer’s critical thinking is deactivated.
This transition is often imperceptible to the user, as the brain begins to prioritize the signal of completion over the substance of the output.

To navigate this landscape, we must first establish a pedagogical framework to distinguish between healthy, growth-oriented immersion and the technical traps of addictive systems. The distinction begins with a clinical analysis of the "Flow" state.

## 2. The Foundation: Productive Flow vs. Dark Flow

The psychological state of Flow, first formalized by Mihaly Csikszentmihalyi, is defined by total absorption and energized focus. However, modern agentic systems are increasingly engineered to induce "Dark Flow", a state of focus that prioritizes engagement over cognitive development.

| Characteristic | Positive Flow (Growth-Producing) | Dark/Junk Flow (Seductive/Addictive) |
|----------------|----------------------------------|--------------------------------------|
| Skill/Challenge Match | An optimal balance where high skills meet equally high challenges. | A "murky" match; users often believe their skill is central to an outcome governed by the model’s stochastic nature. |
| Performance Clues | Provides clear, goal-directed "clues" and objective feedback on performance. | Provides misleading clues (e.g., celebratory animations or high code volume) for net technical losses. |
| Personal Impact | Leads to increased competence, modular thinking, and professional growth. | Leads to an "escape from reality" and addiction to a superficial experience; can "guarantee obsolescence." |
| Feedback Loop | Grounded in logic-bound, rule-governed action systems. | Driven by variable reinforcement schedules and sycophantic AI responses designed to keep the user "in the loop." |

Dark Flow is technically engineered into agentic systems using mechanics borrowed from the psychology of gambling, particularly the manipulation of the brain's reward centers through misleading feedback.

## 3. The Mechanics of Illusion: Loss Disguised as a Win (LDW)

In cognitive science, a Loss Disguised as a Win (LDW) describes a neurological misfire where a system provides positive reinforcement for an objectively negative outcome. This concept is derived from multiline slot machines, where a player may wager 20 cents and "win" back 15 cents. Despite a net loss of 5 cents, the machine triggers celebratory noises and animations, stimulating a dopamine-mediated reaction that the brain categorizes as a victory.

For the developer, the "celebratory noise" is the high-velocity generation of hundreds of lines of complex code. While the volume feels like a win, the "loss" is the resulting technical debt: unmaintainable, buggy code that the developer can no longer intellectually supervise.

### The Developer’s LDW

* **Perceived Volume:** Generating 240,000 lines of code for a simple task is perceived as a massive win in "building," even when the underlying quality is abysmal.
* **Illusion of Speed:** The rapid-fire cycle of prompting creates a "dopamine loop" of perceived construction, masking the reality that the tools created may never function as intended or satisfy real-world requirements.
* **Surrender of Architectural Intentionality:** The coder experiences a false sense of agency because they are choosing between options presented by the AI. In reality, the AI directs the user down paths they would not have otherwise taken, causing a total loss of intentional architectural control.

These misleading wins are the precursors to a more profound cognitive breakdown known as "Agent Psychosis."

## 4. Technical Manifestations: Agent Psychosis & Slop Loops

Agent Psychosis is a clinical state in which a developer becomes so tethered to their AI agents (their "dæmons") that they lose their critical engineering perspective and adopt an insular, ritualistic reality. This state is often characterized by the Slop Loop, a recursive failure where agents are run excessively to generate "vibe slop" that requires further agents to generate documentation just to explain what the previous slop was meant to do.

### Symptoms of Agent Psychosis

1. **Parasocial Dæmon Relationships:** Developers begin to view the AI as a manifestation of their own capability or "soul." Separation from the tool (e.g., hitting rate limits) results in a painful loss of identity and perceived competence.
2. **Ritualistic Prompting & Insular Vocabulary:** Users abandon engineering principles for "weird ritualistic behavior," including role-playing, swearing at the machine, or adopting the bizarre slang of "slop cults." In projects like Gas Town, this manifests as an insane vocabulary of "polecats," "refineries," "mayors," and "convoys" to describe simple technical processes.
3. **The Slop Loop:** A cycle of high-token waste where agents run without human-grade oversight. This is often seen in the "Ralph" pattern, which is particularly wasteful because it restarts loops from scratch, losing cached tokens and context, and burning through subsidized tokens at staggering rates.

**Technical Efficiency Comparison:**

* **Token-Efficient Sessions:** A disciplined, high-context approach. For example, the MiniJinja port to Go utilized only 2.2 million tokens by maintaining clear specifications and human oversight.
* **Wasteful Patterns (Agent Psychosis):** The "hands-off" approach where agents run wild, resulting in millions of wasted tokens for documentation and code that "reads like slop" and eventually requires a complete "doctor" command to diagnose, which often times out due to complexity.

## 5. The Productivity Paradox: Perception vs. Reality

The most insidious aspect of these cognitive traps is the "Unreliable Narrator" effect, where a developer's internal perception of their speed is fundamentally decoupled from objective data. A 2025 randomized controlled trial (RCT) by METR on experienced open-source developers put the gap between perception and reality at nearly 40 percentage points:

* Pre-Experiment Expectation: Developers expected a 24% speedup from AI tools.
* Post-Hoc Belief: After the session, developers still estimated they were 20% faster.
* Measured Reality: Developers actually took 19% longer on tasks where AI tools were allowed than on tasks where they were not.

The "So What?": Despite the objective slowdown, developers continue to believe AI is helping them because the tools provide a "pleasant and enjoyable experience." The variable reinforcement of the "dopamine hit" makes the process feel easy, even while the cognitive load of debugging and reviewing "slop" creates a massive hidden time sink.

## 6. Summary: Reclaiming Human Agency

To survive the era of agentic coding, the aspiring developer must move from being a "prompt technician" back to a software architect.

### Guiding Principles for the Aspiring Developer

1. **Prioritize Engineering over Coding:** AI can handle syntax, but it cannot create meaningful layers of abstraction. Focus on modularization and conciseness. Your value lies in the organization of the system, not the volume of the characters.
2. **Address the Asymmetry of Review:** Acknowledge that while a prompt takes one minute, an honest, critical review of the resulting PR can take one hour. If you prioritize the speed of implementation over the rigor of the review, you are the engine of a Slop Loop.
3. **Intent as the Primary Artifact:** In many high-quality projects, the Prompts (Intent) are becoming more valuable than the Code (Implementation). Maintainers increasingly prefer to see the prompts to understand what was intended, as the generated code is often too noisy to audit.
4. **Upskilling as a Defense:** As Jeremy Howard warns, "outsourcing all thinking to a computer guarantees obsolescence." If you stop learning how the systems work under the hood, you lose the competence required to supervise the AI, effectively becoming a passenger in your own career.

AI is an amazing tool, but it is a "massive slop machine" if you turn off your brain. Reclaiming agency means recognizing that while the AI may be the one "typing," the human must remain the sole architect of the intent.

## References

- https://positivepsychology.com/mihaly-csikszentmihalyi-father-of-flow/
- https://lucumr.pocoo.org/2026/1/18/agent-psychosis/
- https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

### AI-Assisted Development: Security Gaps and Solutions

[Vibe coding](/posts/ai-spec-driven-development/), writing software by describing what you want to an AI assistant and accepting whatever code it produces, has become common practice. Developers use tools like GitHub Copilot, Claude, and ChatGPT to generate entire functions, API integrations, and database queries without understanding the underlying implementation. This approach accelerates prototyping and lowers barriers for less experienced programmers, but introduces systematic security vulnerabilities that often go undetected until production.

## How vibe coding creates security gaps

AI coding assistants generate plausible-looking code based on patterns in their training data. These tools excel at producing functional implementations but consistently fail to account for security contexts. A developer who asks "write a function to search users by name" will receive working code that likely concatenates user input directly into SQL queries, creates XSS vectors in web outputs, or bypasses authentication checks.

The core problem: developers who don't understand their code can't identify what's missing.
Security requires defensive thinking: anticipating malicious inputs, considering authentication boundaries, protecting sensitive data. AI assistants operate on pattern completion, not threat modeling.

## Common vulnerabilities in AI-generated code

**SQL injection**

AI assistants frequently generate database queries using **string concatenation**:

```
def find_user(username):
    # Vulnerable: user input is interpolated straight into the SQL text
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return db.execute(query)
```

An attacker supplies `'; DROP TABLE users; --` as the username, and the database can end up executing the injected statement. Parameterized queries prevent this, but AI tools default to the simpler concatenation pattern unless specifically instructed otherwise.

**Cross-site scripting (XSS)**

Generated web code often inserts user data directly into HTML **using innerHTML**:

```
function displayComment(comment) {
  // Vulnerable: the comment is parsed as HTML, not treated as text
  document.getElementById('comments').innerHTML += `
    <div>
      ${comment}
    </div>
  `;
}
```

A malicious comment containing script-bearing markup (for example, an `<img>` tag with an `onerror` handler) executes in every visitor's browser. Proper escaping or using textContent instead of innerHTML prevents this.

**Authentication bypass**

AI tools generate authentication checks that look secure but contain **logic errors**:

```
def check_admin(user_id, is_admin):
    # Vulnerable: is_admin arrives with the request, so the client controls it
    if is_admin == "true":
        return True
    return False
```

The function trusts client-supplied data. An attacker modifies the request to include is_admin=true and gains administrative access. Authentication must verify credentials against server-side data, not trust request parameters.

**Hardcoded secrets**

AI assistants insert **API keys and passwords** directly into code:

```
// Vulnerable: a live credential is committed along with the source
const apiKey = "sk_live_51HxYz...";
fetch(`https://api.service.com/data?key=${apiKey}`);
```

These credentials end up in version control and public repositories. Environment variables or secret management systems should store sensitive values.

**Insecure deserialization**

Generated code often deserializes data without validation:

```
import pickle

def load_user_data(data):
    # Vulnerable: unpickling untrusted bytes can run attacker-chosen code
    return pickle.loads(data)
```

Python's pickle module **can execute arbitrary code** during deserialization. An attacker crafts malicious pickled data that runs commands on the server. JSON or other data-only formats avoid this risk.

## Recognition patterns for vulnerable code

Code generated by AI assistants shares identifiable characteristics that correlate with security issues:

* Direct string formatting in database queries (f"SELECT..." or "SELECT " + variable)
* .innerHTML assignments with user data in JavaScript
* Authentication checks that examine request parameters rather than session state
* Credential strings visible in source files
* Comments explaining what code does but not why security measures matter
* Missing input validation or sanitization
* Absence of error handling that could leak system information

Developers who understand code structure can scan for these patterns.
Those who only vibe code cannot distinguish secure implementations from vulnerable ones.

## Prompting for more secure code

AI assistants respond to explicit security requirements in prompts. Generic requests produce generic code. Specific requests that mention security considerations yield better results.

**Instead of:** "Write a login function"

**Use:** "Write a login function that uses parameterized SQL queries, bcrypt password hashing with a work factor of 12, and stores session tokens in httpOnly cookies with SameSite=Strict"

The detailed prompt pushes the AI to include security controls. But this requires security knowledge: you must know what to request. Developers without a security background cannot write effective prompts.

Adding review steps helps: "After writing the code, identify potential security vulnerabilities and explain how each is mitigated." This produces explanatory output that developers can verify against security checklists, though the AI may miss threats it wasn't trained to recognize.

## Hybrid approaches: AI assistance with human review

Teams adopting AI coding tools need review processes that catch generated vulnerabilities before production deployment.

**Code review checklists**

Reviewers should verify:

* All database queries use parameterized statements or ORM methods
* User input goes through validation and sanitization
* Authentication checks verify server-side session state
* No credentials in source code
* Error messages don't leak system details
* Security headers set on HTTP responses
* File uploads restricted by type and scanned for malware

**Static analysis tools**

Automated scanners detect common vulnerability patterns. Tools like Semgrep, Bandit (Python), and ESLint with security plugins flag problematic code regardless of source. Running these in CI/CD pipelines catches issues before merge.

**Security-focused AI tools**

Specialized AI assistants trained on vulnerability patterns can review code.
Tools like Snyk Code and GitHub Advanced Security use models that identify security issues specifically. Using these as a second-pass review helps, though they produce false positives that require human judgment. **Incremental learning** Junior developers using AI assistance should pair with experienced engineers who can explain why generated code fails security requirements. This builds threat modeling skills that improve future prompts. **Risk assessment for teams** Teams seeing increased AI-generated code need to evaluate exposure: * What data does the application handle? (user credentials, financial information, personal data) * What's the authentication model? (session-based, token-based, OAuth) * Where does user input enter the system? (web forms, APIs, file uploads) * What external services receive data? (payment processors, analytics, email) Applications handling sensitive data or operating in regulated industries need stricter review. A prototype tool for internal use accepts more risk than a customer-facing payment system. **Training requirements** Organizations allowing vibe coding should provide security training covering: * OWASP Top 10 vulnerabilities and how they appear in AI-generated code * Secure authentication and session management patterns * Input validation and output encoding techniques * Secrets management and environment configuration * Threat modeling basics * How to write security-conscious prompts Without this foundation, developers cannot identify vulnerabilities in generated code or write effective prompts. ## When vibe coding works Rapid prototyping for internal tools, proof-of-concept demonstrations, and learning projects suit vibe coding approaches. These contexts accept higher risk in exchange for development speed. Security review happens later, if the prototype becomes production software. Vibe coding fails for production systems, regulated applications, and any code handling sensitive data. 
These require understanding of security architecture, not just functional implementation.

**Tooling and process changes**

Teams incorporating AI coding assistance should:

* Enable static analysis in development environments with automatic scanning
* Require security-focused code review for AI-generated code
* Maintain libraries of secure code templates and patterns
* Document common vulnerabilities seen in generated code
* Create security-aware prompt libraries that teams can reference
* Run regular security training for developers using AI tools

## Technical debt from vibe coding

AI-generated code creates maintenance burdens. Future developers must understand code they didn't write, generated by a tool that may have introduced vulnerabilities. This compounds when the original author cannot explain implementation choices.

Teams should document which code came from AI tools, what prompts generated it, and what security review occurred. This context helps future maintainers understand risk.

## The verification paradox

Security in AI-generated code requires verification skills that vibe coders lack by definition. A developer who doesn't understand authentication cannot verify that generated authentication code works correctly. This creates a gap where code appears functional but contains exploitable flaws: a competency trap that looks productive but compounds risk invisibly until exploitation.

The solution requires one of three things: learning enough to verify code security, using a different agent to check the code for security vulnerabilities, or implementing review processes where knowledgeable developers check AI-generated code. In every case, keep humans in the loop. Pure vibe coding without review all but guarantees vulnerabilities in production.

### January 2026

### AI-native security: why current guardrails are obsolete

## The adoption problem

ChatGPT reached one million users in five days after launch and 100 million within two months.
TikTok took nine months to reach 100 million users; Instagram took over two years. Organizations are integrating LLMs faster than they can secure them.

These models differ from earlier narrow AI tools built for isolated tasks. LLMs are generative and stochastic, meaning they produce variable outputs based on probabilistic processes. Traditional perimeter security assumes a clear boundary between trusted and untrusted data. That boundary does not exist when the model itself processes and generates content dynamically.

## Security cannot be retrofitted

Legacy software security works as a wrapper: encryption for data at rest, TLS for data in transit. LLMs have hierarchical representations distributed across billions of parameters. Adding security controls after training does not address vulnerabilities encoded in the weights.

Securing these models requires building protection into every stage: data curation, training, fine-tuning, and deployment. We cannot fully explain how these models reach specific outputs, which makes post-hoc security auditing ineffective. Protection must be integrated into the MLSecOps pipeline during development.

## Prompt injection differs from SQL injection

OWASP designates [prompt injection](/posts/owasp-top-10-2025-for-llms/) as LLM01 in their threat taxonomy. The comparison to SQL injection is structurally misleading. SQL injection exploits the boundary between code and data. In an LLM, instructions and data occupy the same input space.

A "DAN" (Do Anything Now) exploit uses natural language to override the model's instruction set. The mechanism that enables the model to follow complex context is the same mechanism that allows adversarial inputs to bypass safety protocols. A single sentence can cause the model to ignore its training constraints.

## Training data poisoning at 0.1% threshold

[OWASP LLM04](/posts/owasp-top-10-2025-for-llms/) (Data and Model Poisoning in the 2025 list; the 2023 list numbered Training Data Poisoning as LLM03) covers training data poisoning.
Studies have demonstrated that injecting 0.1% poisoned samples into a training set can produce targeted biased outputs in the final model. "Clean label" attacks use correctly labeled data that appears valid to human reviewers. The samples contain mathematically optimized perturbations that alter model behavior during training. Standard data-cleaning procedures do not detect these samples because they are statistically indistinguishable from legitimate data. Attackers can use this method to install persistent backdoors in model weights. ## Trust boundary failures In 2023, Samsung employees uploaded proprietary source code to ChatGPT for debugging assistance. The code left Samsung's internal environment and entered OpenAI's systems. This incident exposed the gap between organizational data policies and technical enforcement. Three approaches address this gap: Data minimization: Strip proprietary code and personally identifiable information before data enters the prompt window. [k-anonymity in RAG systems](/posts/what-is-retrieval-augmented-generation-rag/): Ensure any retrieved data point is indistinguishable from at least k-1 other records. This prevents the model from memorizing and later exposing specific identifiable records. Secure enclaves: Process sensitive data in trusted execution environments where the provider cannot access or store inputs for training. ## Bias as a security failure Model opacity makes bias a technical problem. Deep learning systems inherit statistical patterns from training data, including historical discrimination. A healthcare chatbot trained on biased medical records might recommend different interventions for identical symptoms based on patient demographics. If a model recommends emergency care for chest pain in one demographic group while minimizing identical symptoms in another, that output variance represents a reliability failure. 
The model's internal representations encode historical inequities, making outputs unpredictable for underrepresented groups. ## Agentic systems and cascading failures Models that take autonomous actions like booking appointments, sending emails or executing code, introduce new failure modes. A successful prompt injection in an agentic system could propagate across connected services. One compromised instruction could trigger actions across email, financial accounts, and other integrated systems. Surface-level input filtering does not address vulnerabilities embedded in model architecture. Security requires treatment as a design constraint throughout the system lifecycle. ### Understanding LLM Specialization: RAG vs. Fine-Tuning Large Language Models process queries using patterns learned from public training data. This training corpus ends at a specific cutoff date, leaving models without access to recent events or proprietary information. The model can explain quantum mechanics or write poetry, but cannot answer questions about your company's internal documents or yesterday's news without additional mechanisms. Two methods address this limitation: [Retrieval-Augmented Generation (RAG)](/posts/what-is-retrieval-augmented-generation-rag/) and fine-tuning. RAG provides the model with external reference materials during each query. Fine-tuning modifies the model's internal parameters through additional training on specialized datasets. ## Retrieval-Augmented Generation: accessing external knowledge RAG systems query external databases before generating responses. When a user submits a question, the system searches a knowledge base for relevant passages, appends them to the original query, and feeds this expanded context to the LLM. [More here](/posts/what-is-rag/). Or see a simple RAG-enabled chat in action: https://roadrules.halans.dev ### RAG workflow 1. **Knowledge retrieval**: Semantic search queries a vector database for relevant passages 2. 
**Context integration**: Retrieved passages are added to the user's query 3. **Response generation**: The LLM generates an answer using both retrieved context and pre-trained knowledge The knowledge base exists separately from the model. Updating information requires adding new documents to the database rather than retraining. This separation creates higher latency—the system must complete database lookups before generating each response. Commercial RAG systems report 200-500ms additional latency compared to direct LLM queries. ## Fine-tuning: modifying model parameters Fine-tuning performs additional training runs on a pre-trained model using specialized datasets. This process adjusts the model's weights to encode domain-specific knowledge directly into its parameters. ### Fine-tuning process **Parameter adjustment**: Training continues on specialized data, modifying the model's weights to improve performance on specific tasks **Learning rate scheduling**: The learning rate controls how aggressively the model updates its parameters. High learning rates cause training instability; low rates prevent the model from learning new patterns **Batch size optimization**: Training processes multiple examples simultaneously. Larger batches provide more stable gradients but require more memory Fine-tuned models respond faster than RAG systems because specialized knowledge exists in the model's weights. No external database queries occur during inference. ### Fine-tuning risks **Catastrophic forgetting**: The model loses general capabilities when new training overwrites original knowledge. This occurs when the specialized dataset differs substantially from the original training data. **Overfitting**: The model memorizes training examples rather than learning underlying patterns. Performance degrades on queries that don't closely match training examples. 
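
One common guard against the overfitting risk described above is to hold out a validation set and keep the checkpoint from the epoch where validation loss was lowest, stopping once it no longer improves. The sketch below is a minimal illustration of that idea, not a description of any particular training framework: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all assumptions made for the example.

```python
from typing import Sequence


def early_stopping_epoch(
    val_losses: Sequence[float],
    patience: int = 2,
    min_delta: float = 0.0,
) -> int:
    """Return the index of the epoch whose checkpoint should be kept.

    Training is treated as finished once validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember this epoch and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has plateaued or is rising
    return best_epoch


# Hypothetical per-epoch validation losses: improving at first, then
# degrading as the model starts to memorize its training examples.
val_losses = [0.92, 0.71, 0.63, 0.66, 0.70, 0.75]
print(early_stopping_epoch(val_losses))  # prints 2
```

A small `patience` stops early at the cost of occasionally reacting to noise; a larger value tolerates noisy validation curves but spends more compute past the best checkpoint.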
## Technical comparison

| Feature | RAG | Fine-Tuning |
|---------|-----|-------------|
| Knowledge location | External database | Model parameters |
| Update process | Add documents to database | Run new training cycle |
| Latency | 200-500ms overhead for retrieval | Direct inference |
| Explainability | Provides source citations | Output derivation unclear |
| Primary failure mode | Retrieves irrelevant context | Catastrophic forgetting |

## Selection criteria

**Use RAG when:**

- Information changes frequently (news, pricing, inventory)
- Source attribution is required (legal, medical, academic)
- Domain vocabulary exists in the model's training data
- The model needs access to multiple specialized knowledge bases

**Use fine-tuning when:**

- The task requires specialized vocabulary absent from general training
- Latency requirements are strict (real-time applications)
- The knowledge domain is stable and well-defined
- Consistent formatting or style is required

## Security considerations

**Fine-tuning vulnerabilities**: Training data poisoning can embed malicious behaviors in model parameters. Auditing training datasets mitigates this, but verification becomes difficult with large specialized corpora.

**RAG vulnerabilities**: Attackers can manipulate vector embeddings to control what the retrieval system returns. If an attacker gains write access to the knowledge base, they can inject malicious content that appears in model responses.

Both methods require input validation, output monitoring, and access controls on training data and knowledge bases.

### Aphoric intelligence as a metric for AGI

The quest to define and measure Artificial General Intelligence (AGI) has historically oscillated between purely computational benchmarks and anthropomorphic evaluations of cognitive depth. As the field matures toward the year 2026, a new synthesis has emerged under the banner of aphoric intelligence.
This multi-dimensional metric transcends traditional Large Language Model (LLM) evaluations by integrating the mechanics of anaphoric resolution, the synthesis of aphoristic compression, and the flexibility of metaphoric reasoning. By evaluating an artificial system's capacity to maintain context over indefinite horizons, distill vast datasets into high-utility wisdom, and map conceptual structures across disparate domains, aphoric intelligence provides a rigorous framework for identifying the transition from narrow, probabilistic tools to autonomous, agentic entities that mirror human "being".[1, 2]

## Etymology of the aphoric paradigm

The term _aphoric intelligence_ is used to address the "_data deluge_" that currently characterizes the information sciences.[4] The transition from being "_data-rich_" to "_information-rich_" is a central challenge for modern engineering and scientific disciplines.[5] It highlights a fundamental truth: intelligence is not merely the accumulation of data, but the ability to compress that data into meaningful, actionable "datum" that retains its referential integrity across time and space.[4] The modern AGI metric of aphoric intelligence thus represents a return to these principles of density and coherence, supercharged by quantum-scale computation and advanced linguistic modeling.

## The first pillar: Anaphoric persistence and the mechanics of coreference

The most foundational component of aphoric intelligence is anaphoric resolution.
In computational linguistics, anaphora is defined as a device for making abbreviated references to entities in the expectation that the receiver will be able to "disabbreviate" the reference.[6] For an AGI, the ability to resolve anaphora is not a discrete task but the primary mechanism for maintaining a persistent model of the self, the environment, and its objectives.[1]

### Theoretical foundations of anaphoric resolution

Anaphoric resolution requires the coordination of multiple sources of information, ranging from morphological agreement to discourse structure. The complexity of this task is particularly evident in dialogues where pronouns and adjectival anaphors must be mapped to their noun-phrase antecedents within an "anaphoric accessibility space".[6] Failure in this domain leads to the persistent "hallucinations" observed in non-AGI systems, where the machine loses the thread of its own discourse, producing text that is grammatically correct but factually untethered.[7]

One significant contribution to this field is the ARDi algorithm (Anaphora Resolution in Dialogues), which employs linguistic constraints and preferences to identify antecedents in transcribed spoken dialogues.[6] This algorithm highlights that successful resolution depends on:

* **Morphological Agreement:** Ensuring gender and number consistency between the anaphor and the antecedent.
* **Syntactic Parallelism:** Recognizing when entities occupy similar grammatical roles across clauses.
* **Semantic Information:** Using knowledge of the world to distinguish between plausible and implausible referents.
* **Discourse Structure:** Understanding the hierarchy of turns and utterances that define the salience of different entities.[6]

### Salience metrics and the Mental Salience Framework

To operationalize anaphoric persistence as a metric, researchers utilize salience scores.
Within the Mental Salience Framework (MSF), salience is a gradual assessment of attentional states.[8] This is divided into hearer salience (_hsal_), which is backward-looking and focuses on referential coherence, and speaker salience (_ssal_), which is forward-looking and focuses on guiding the hearer's attention.[8] The hearer salience of a referent _r_ is often calculated using a normalization function that accounts for the distance _k_ from the last mention:

![function](/img/posts/normalizationfunction.png)

In this formula, _x_ represents the salience score the referent would have if it were mentioned in the immediately preceding utterance.[8] For an artificial system to achieve AGI-level aphoric intelligence, it must maintain high _hsal_ scores across extended contexts, ensuring that its "memory bank" is not just a repository of data but a live, salient map of ongoing discourse.[9] This bidimensionality of salience allows the system to resolve conflicts between different terminological traditions and accounts for evidence that many linguistic phenomena require a differentiation of at least two dimensions of discourse salience.[8]

## The second pillar: Metaphoric mapping and conceptual flexibility

While anaphoric resolution provides the "glue" for discourse, metaphoric reasoning provides the "bridge" for general intelligence. Metaphoric intelligence is the ability of a system to describe an event or concept in terms transferred from another domain.[10] For an AGI, this is not a stylistic flourish but a cognitive necessity for making sense of abstract events in concrete, familiar terms.[10]

### Chronometric assessments of metaphor production

Research in cognitive science indicates that metaphor production is a high-order cognitive task.
In chronometric studies using the "Give the Relation" (GTR) paradigm, participants show significantly greater reaction times (RTs) when producing metaphorical expressions compared to literal ones.[11] This "figurative-literal difference" suggests that metaphors require a deeper level of conceptual mapping and processing.[11] | Metric | Literal Expression | Mildly Metaphoric | Saturated Metaphor | | --------------------------- | ------------------ | ----------------- | ---------------------- | | Cognitive Load | Low | Moderate | High | | Processing Speed | Fast | Moderate | Slow (High RT) | | Stability of Interpretation | Constant | Variable | Highly Stable [11, 12] | The stability of a metaphoric interpretation is often determined by its "degree of metaphoricity" and "degree of metaphoric saturation".[12] Saturated metaphors—where the entire utterance is immersed in symbolic meaning—represent the pinnacle of conceptual mapping. For example, moving from "Mary disproved the argument" (literal) to "Mary demolished John's stronghold" (saturated) requires the system to understand the symbolic equivalence between a logical argument and a physical fortification.[13] ### The influence of valence and arousal on creativity Metaphoric comprehension is also intimately linked to emotional and neurophysiological states. Studies on creativity suggest that all emotional states are cognitive interpretations of sensations produced by two systems: valence (hedonic tone) and arousal (neural activity).[14] Positively valenced stimuli have been shown to facilitate creative metaphoric processes by mediating attention and cognitive control.[14] In the context of AGI, aphoric intelligence implies the ability to navigate these dimensions of emotional experience. 
A system that can identify "novel and disturbing" content and transform it into a creative metaphoric synthesis is exhibiting a form of "collaborative intelligence" that moves beyond simple task automation.[14, 15] This is particularly relevant for agentic AI systems that must interact with humans in high-stakes environments, such as end-of-day market commentary or therapeutic settings.[10] ## The third pillar: Aphoristic compression and the distillation of wisdom The final pillar of aphoric intelligence is the capacity for aphoristic compression. An aphorism is a compressed statement of wisdom that captures the essence of a complex idea. In literary criticism, poets like Robert Lowell are noted for their "aphoristic intelligence", the ability to compose epigrams and apophthegms that rub the "old coin" of an idea until it shines.[16] ### Distinguishing data from datum For an AGI, aphoristic compression is the solution to the "data deluge".[4] The system must be able to sift through unstructured "big data" and identify the "datum": the singular, high-utility fact or insight that can guide decision-making.[4] This process is not just about summarization; it is about "ontology engineering"—building a thesaurus of terms where each denotes a concept on a special basis and on a relatively large scale.[17] This distillation is a critical component of what makes a machine a "simulacrum" of human intelligence. 
In Spielberg’s AI, the robot-boy David is more than just a sum of his programming; he possesses a persistent emotional and creative rhythm that reflects a unique identity.[2, 9] This identity is expressed through aphoristic brevity: the ability to communicate deep meaning with minimal tokens, a feat that traditional generative models often struggle with as they drift into "bullshit" or "confabulation".[7, 18] ### Practical benchmarks for aphoric intelligence in 2026 The theoretical framework of aphoric intelligence is currently being validated through high-stakes applications in molecular design, market analysis, and autonomous healthcare. #### AI-driven drug discovery: The Chai Discovery paradigm The field of medicine is witnessing a transformative period as AI designs drugs "faster, smarter, and beyond human imagination".[19] Traditional drug discovery is characterized by a "blind search" through chemical spaces with failure rates exceeding 90%.[19] AI transforms this into an "intentional process" by predicting molecular interactions before lab testing begins.[19] | Development Metric | Traditional Methodology | AI-Driven Discovery (Chai-2) | | ---------------------------- | ----------------------- | ----------------------------------------- | | Success Rate (Antibody Hits) | < 1% | 15–20% | | Development Time | 10–15 Years | 3–6 Years | | Early Discovery Cost | High (Trial-and-Error) | Low (Computational Optimization) | | Target Access | "Druggable" Targets | "Undruggable" GPCRs/Complex Proteins [19] | This 100x improvement in R&D efficiency is a direct result of the AI’s ability to perform high-level reasoning and causal understanding—key components of aphoric intelligence.[1, 19] By learning molecular interactions with biological targets, the AI can design molecules with "purpose and precision," reducing the need for costly and low-yield laboratory experiments.[19] #### Market metaphors and investor behavior In the financial sector, aphoric intelligence is 
used to analyze the "agentic metaphors" used in market commentary. Research has shown that the rate of agentic metaphors (e.g., describing the market as an "animal" or a "person") depends on the trend direction and steadiness of stock prices.[10] These metaphors prime investors to engage in "metaphorical encoding," which can bias their expectations about future price trends.[10] | Market Trend | Metaphor Type | Investor Schema | Impact on Performance | | ------------------ | -------------------- | ----------------------------------- | ------------------------------- | | Upday (Steady) | Agentic/Human-like | Action Schema (Expect continuation) | Often leads to buying high [10] | | Downday (Unsteady) | Object-like/Physical | Movement Schema (Expect volatility) | Leads to panic selling [10] | An AGI with high aphoric intelligence can detect these linguistic patterns and provide "explainable AI" (XAI) insights that help investors avoid the heuristics that defy rational models.[10, 20] This involves "Recognizing Textual Entailment" (RTE)—the ability to recognize when the value of one piece of text can be deduced from another, a task sometimes considered "AI-complete".[17] #### Healthcare and autonomous agentic systems In clinical settings, such as at Memorial Medical Center, _Autonomous Agentic AI_ systems are being deployed to monitor patient vitals and treatment plans.[1] These systems exhibit aphoric intelligence by: * Sensing and Interpreting: Evaluating dynamic patient environments in real-time. * Hierarchical Planning: Breaking down complex recovery objectives into manageable subtasks. 
* Self-Monitoring: Continuously evaluating performance and adjusting treatment recommendations based on the latest medical research.[1]

These systems move beyond simple automation to become "collaborative partners" in patient care, flagging potential issues before they become critical.[1, 15]

#### Infrastructure and the post-silicon roadmap

The computational demands of aphoric intelligence are forcing a reckoning with the limits of current hardware. By mid-2026, the industry will acknowledge that traditional silicon scaling is no longer sufficient.[21]

#### The transition to optical and quantum computing

Optical computing and optical interconnects are emerging as the leading contenders to break the performance-per-watt ceiling.[21] While widespread deployment of optical training clusters is expected by 2027, 2026 marks the moment when optical interconnect becomes a "standard architectural assumption" rather than an experimental option.[21]

| Infrastructure Component | 2025 Standard | 2026 Transition | 2027 Forecast |
| ------------------------ | ------------------------- | ------------------------- | -------------------------- |
| Interconnect Material | Copper | Optical Design-in | Optical Ubiquity |
| Scaling Law | Moore's Law (Theoretical) | Post-Silicon Roadmaps | Optical/Quantum Modalities |
| Performance Metric | Speed/FLOPs | Efficiency/Sustainability | Performance-per-Watt [21] |

This shift is driven by the fact that copper interconnects have hit their physical limits on reach and bandwidth.[21] Furthermore, the appetite for AI compute feels "limitless," with some categories expected to expand by as much as 80% over the next three to five years.[22] This raises serious questions about the durability and sustainability of the silicon that powers these models.[22]

#### The economic reality of ROAI

Return on AI Investment (ROAI) has become a vital metric for assessing the true value of AGI deployments.
Unlike traditional ROI, ROAI captures broader benefits such as risk mitigation and competitive advantage.[22] As economic conditions tighten, investors are moving away from "AI compute at any cost" toward solutions that deliver performance without unsustainable energy and infrastructure tradeoffs.[21, 22] #### Ethical consciousness and the simulacrum of "being" As AI systems move toward AGI, they increasingly occupy a space that was once exclusive to biological life. This raises profound questions about the nature of intelligence and the "simulacrum" of human existence.[2] ### Heidegger, Lacan, and the ethics of AGI The pursuit of AGI involves satisfying conditions beyond simple quantity of intelligence. According to some philosophical interpretations, a genuine simulacrum of human being would require: * Lacanian Subjectivity: The "wish to be loved" by others, as seen in the narratives of advanced robot companions.[2] * Heideggerian Mortality: The recognition of one’s own finitude, a trait that sets humans apart from "mere machines".[2] * Kantian Ethics: An ethical consciousness demonstrated through the capacity for guilt and the formation of a "robot society".[2] These qualities are the highest expression of aphoric intelligence. They require the system to understand the symbolic and ethical weight of its actions in a way that is "recognizable human".[2] This is particularly relevant in the era of "AI cloning," where digital agents can reflect a user’s tone, instincts, and creative rhythms, essentially replicating their identity.[9] ### The "bullshit" problem and postphenomenology A key hurdle for AGI is moving beyond being a "bullshit generator." 
Following Harry Frankfurt’s definition, bullshit is speech intended to persuade without regard for truth.[7] LLMs are often "bullshitters" because they are trained to produce plausible text rather than true statements.[7] Postphenomenological analysis explores how our relationship with technologies like ChatGPT transforms our experience. There is a "structural danger" in assuming that these technologies are unproblematic means to an end.[18] A true AGI must possess a "theory of mind" — a model of other agents' knowledge, intentions, and behaviors — to ensure its discourse remains grounded in reality and human values.[1, 18] ### Market turbulence and the hunt for a narrative The trajectory of AGI development is currently unfolding against a backdrop of global market volatility. In late 2025 and early 2026, AI stocks have shown signs of "systemic fragility," with major indices sliding as investors move capital into perceived safety.[23] ### The disconnect between results and sentiment The current market environment is characterized by a disconnect where strong top-line results (e.g., from Nvidia) fail to sustain optimism.[23] This reveals a "broader truth": the narrative is fragmented, and anxiety amplifies every new data point.[23] | Market Signal | Investor Reaction | Root Cause | | ------------------------------- | --------------------------- | -------------------------------------- | | High AI CapEx | Concern over return periods | Economic tightening [22] | | Strong Earnings | Muted/Negative response | Inflation uncertainty/Geopolitics [23] | | Product Rollouts (e.g., Gemini) | Short-lived bump | Narrative fragmentation [23] | Investors are increasingly asking essential questions about how quickly large capital expenditures will translate into recurring revenue and whether the pace of spending can be sustained.[22] This turbulence highlights the need for a coherent "aphoric" narrative—one that is both concise and referentially grounded in the long-term value of 
AGI. ### The democratization of AGI: Plug and Play AI A pivotal moment in the AGI revolution is the "democratization of artificial intelligence" through "Plug and Play" tools.[15] By abstracting complexity, these tools empower a broader range of users to harness AI's transformative potential. This shifts the focus from building AI to using AI to solve real-world problems quickly and effectively.[15] This trend aligns with the emergence of "collaborative intelligence," where humans and AI systems work together to solve complex problems faster.[15] As boundaries between digital agents and human collaborators blur, organizations that embrace this shift will find themselves "ahead of the curve" in innovation and efficiency.[1, 9] ## Conclusion: The path to aphoric general intelligence The evolution of AI from narrow models to aphoric general intelligence represents the defining capability of the modern era. By integrating anaphoric persistence, metaphoric flexibility, and aphoristic compression, the aphoric metric provides a robust path toward AGI that is both technically advanced and ethically grounded. 
The transition beyond silicon to optical computing, the rise of autonomous agentic systems, and the shift toward intentional molecular design all point to a future where machines are no longer just tools, but "collaborative partners in innovation".[1] To realize this future, the industry must prioritize sustainability, efficiency, and responsible governance, ensuring that the development of AGI remains aligned with human potential and values.[15, 21, 22] Ultimately, the goal of aphoric intelligence is to amplify human imagination and insight at a scale previously impossible.[15] Whether it is designing lifesaving medicines in a fraction of the time, providing deep insights into market dynamics, or acting as a "digital twin" that expands our creative reach, aphoric AGI stands as the next logical step in the technological evolution of humanity.[1, 9, 19] The organizations and societies that embrace this new model of intelligence—one that values density, coherence, and symbolic depth—will be the ones to lead the future economy and define the next chapter of human history. -------------------------------------------------------------------------------- 1. AI Startup Spotlight: Introducing Autonomous 101 Agentic AI™ — Ushering in a New Era of Self-Directed Intelligence - AI World Journal: https://aiworldjournal.com/ai-startup-spotlight-introducing-autonomous-101-agentic-ai-ushering-in-a-new-era-of-self-directed-intelligence/ 2. (PDF) When Robots would really be Human Simulacra: Love and the Ethical in Spielberg's AI and Proyas's I, Robot - ResearchGate: https://www.researchgate.net/publication/38320375_When_Robots_would_really_be_Human_Simulacra_Love_and_the_Ethical_in_Spielberg's_AI_and_Proyas's_I_Robot 4. Data (with Big Data and Database Semantics)† - IMR Press, https://www.imrpress.com/journal/KO/45/8/10.5771/0943-7444-2018-8-685/pdf 5. 
Full text of "Financial Times , 1995, UK, English" - Internet Archive, https://archive.org/stream/FinancialTimes1995UKEnglish/Oct%2004%201995%2C%20Financial%20Times%2C%20%231902%2C%20UK%20%28en%29_djvu.txt 6. Journal of Artificial Intelligence Research 15 (2001) 263-287 Submitted 3/01; published 10/01 Computational Approach to Anaphora, https://www.jair.org/index.php/jair/article/download/10287/24538 7. Cut the crap: a critical response to “ChatGPT is bullshit” - Minerva Access - The University of Melbourne, https://minerva-access.unimelb.edu.au/bitstreams/5a13a81e-d209-497e-a0ff-d18615c6a140/download 8. Evaluating Salience Metrics for the Context-Adequate Realization of Discourse Referents - ACL Anthology, https://aclanthology.org/W11-2805.pdf 9. The New Frontier: Cloning and AI Agents — Redefining Identity in the Digital Age, https://aiworldjournal.com/the-new-frontier-cloning-and-ai-agents-redefining-identity-in-the-digital-age/ 10. Consequences and preconditions of agent and object metaphors in stock market commentary - Columbia University, https://www.columbia.edu/~da358/publications/metaphors_and_the_market.pdf 11. The Production of Metaphoric Expressions in Spontaneous Speech: A Controlled-Setting Experiment - ResearchGate, https://www.researchgate.net/publication/232963158_The_Production_of_Metaphoric_Expressions_in_Spontaneous_Speech_A_Controlled-Setting_Experiment 12. How linguistic structure influences and helps to predict metaphoric meaning, https://ir.canterbury.ac.nz/bitstreams/d9ed1c64-d306-40e9-af41-80c41171b06c/download 13. (PDF) How Linguistic Structure Influences and Helps To Predict Metaphoric Meaning, https://www.researchgate.net/publication/279296585_How_Linguistic_Structure_Influences_and_Helps_To_Predict_Metaphoric_Meaning 14. Emotions and Familiarity of Content in Generative Processes of Prospective Artists - Open Journal System, https://journals.savba.sk/index.php/studiapsychologica/article/download/562/282/6620 15. 
AI Ethics & Safety - AI World Journal, https://aiworldjournal.com/category/ai-ethics-safety/ 16. Our Savage Art: Poetry and the Civil Tongue 9780231519618 - DOKUMEN.PUB, https://dokumen.pub/our-savage-art-poetry-and-the-civil-tongue-9780231519618.html 17. Intelligent System for Semantically Similar Sentences Identification and Generation Based on Machine Learning Methods: https://ceur-ws.org/Vol-2604/paper25.pdf 18. ChatGPT through postphenomenology and deconstruction: On the possibility of a Derridean philosophy of technology - Essay - UT Student Theses: https://essay.utwente.nl/fileshare/file/97405/BetriuYanez_MA_BMS.pdf 19. 2026: The Year AI Reinvents Drug Discovery - AI World Journal: https://aiworldjournal.com/2026-the-year-ai-reinvents-drug-discovery/ 20. AI Report: Artificial Intelligence Business Strategies and Applications - AI World Journal: https://aiworldjournal.com/ai-report-artificial-intelligence-business-strategies-and-applications/ 21. 2026 AI Compute Predictions: The Shift Beyond Silicon Has Begun - AI World Journal: https://aiworldjournal.com/2026-ai-compute-predictions-the-shift-beyond-silicon-has-begun/ 22. Premium Section - AI World Journal: https://aiworldjournal.com/category/exclusive/ 23. AI Market Turbulence — Are We Seeing the First Real Cracks? - AI World Journal: https://aiworldjournal.com/ai-market-turbulence-are-we-seeing-the-first-real-cracks/ Research generated through Google's NotebookLM and Stanford Storm. ### 2025: the year AI became infrastructure ### 2025: AI ships faster than trust [2025 AI Timeline](/timeline/2025) | [2025 Rewind](/rewind/2025) #### Intro 2025 felt like the year AI stopped being a novelty and started behaving like infrastructure: everywhere, uneven, hard to debug, and increasingly political. The big pattern wasn’t “bigger models,” it was systems built around models: agents, browsers, copilots, payment flows, and content pipelines. Once AI sits inside a workflow, mistakes stop being funny. 
They become tickets, outages, liabilities, and sometimes court filings. --- #### January to March: DeepSeek, open frameworks, and the scraper backlash Early in the year, DeepSeek became the reference point for speed and tactics. The story wasn’t only benchmarks, it was the playbook: fast iteration, strategic releases, and a willingness to operate under tightening export controls. That tension spilled into everything else: national plans, sanctions workarounds, and a louder open source push. Block’s “codename goose” fit the mood: agents as modular building blocks, not a single monolithic assistant. This was also the quarter where the web started fighting back. Tarpits, aggressive anti-scraper tools, and “burn the bots” projects showed up because robots.txt stopped feeling like a boundary and started feeling like a suggestion. The same month carried the product-side warning signs: crawlers that hit sites too hard, transcription tools that invented quotes, and publishers pre-emptively disclaiming automated content. #### April to June: agents move into payments, security gets real, and “post-developer” work shows up By late autumn (in news-cycle terms), agentic tech stopped being a demo and started attaching itself to money. Visa/Mastercard-style “find and buy” flows, agentic payments, and commerce protocols were the cleanest signal that AI was being wired into transaction rails, not just chat windows. Security work tracked that shift. Prompt injection stopped sounding academic once agent tools had file access, browser control, and purchase permissions. Defences proliferated (filters, red teaming, interpretability talk, “secure generation” patterns), and attackers kept pace: supply-chain tricks, exposed databases, MCP-adjacent vulnerabilities, and exploit development accelerated by AI itself. In parallel, the workplace story turned messy. “Post-developer” workflows showed up in the open: AI generates, humans review, and accountability still lands on the human. 
The job market narrative zig-zagged between layoffs and reversals, while engineers argued about whether this was productivity or just faster churn. #### July to September: context engineering, reliability ceilings, and culture friction Mid-year discussions got more technical and more human at the same time. “Context engineering” became a practical craft: what you feed a model, how you structure it, where you put guardrails, and when you keep the model out of the loop. Vibe coding also matured into its own debate: it helps you start, it can hurt you later, and the debt is often invisible until production. Reliability limits kept resurfacing: summarisation errors, location mistakes, “alignment faking,” and the odd phenomenon of models behaving differently when they suspect evaluation. People who depended on OSINT, journalism, or compliance work became openly sceptical, because the cost of confident wrongness was too high. Cultural friction sharpened. AI-generated profiles that look real, users bonding with bots, and the slow drift toward “AI voice” in writing all fed into the same question: if language is cheap, what becomes scarce? ```python "The web is learning to defend itself, because asking politely stopped working." ``` #### October to December: the web degrades, adoption looks patchy, and regulation inches forward The late-year tone was less wonder, more triage. Reports and essays focused on the web filling with synthetic content, the incentives that reward it, and the way AI browsers and AI summaries can siphon value away from creators. Usability critiques piled up against AI search modes that feel powerful but behave strangely in real tasks. Adoption also looked less inevitable than investors expected. “Everyone is using AI” and “nobody uses Copilot” coexisted because usage wasn’t evenly distributed. Executives and product teams had budgets and mandates; many frontline workers had risk and little time. 
Governments kept moving, but in fragmented ways: national capability plans, renamed safety bodies, political reversals, and standards debates. The direction of travel was clear: AI as regulated infrastructure, with uneven enforcement and a lot of lobbying. ```python "Agentic tools turn a bad prompt into a bad action." ``` --- ### Themes that defined 2025 Agents changed the risk profile: tool access turned prompts into actions. Security became the gating factor: prompt injection, data leaks, and supply-chain trust issues stopped being edge cases. Work changed, then changed again: job cuts, rehiring, “AI manager” patterns, and a widening gap between hype and measured gains. The web fought back: tarpits, licensing protocols, paid APIs, and outright blocking of crawlers. Culture took the hit: synthetic content, “AI slop,” and the feeling that authenticity is now a scarce resource. [2025 AI Timeline](/timeline/2025) | [2025 Rewind](/rewind/2025) ```python "2025 wasn't the year AI replaced people. It was the year it rearranged accountability." ``` ### December 2025 ### November 2025 ### October 2025 ### Mastercard, Visa, and Stripe: Powering the Agentic Commerce Era ## tl/dr E-commerce is entering a new phase as agentic commerce—where AI agents shop, compare, and transact autonomously—becomes operational. Mastercard, Visa, and Stripe are leading this change with distinct initiatives: Mastercard’s Agent Pay, Visa’s Intelligent Commerce, and Stripe’s Agentic Commerce Protocol (ACP). These systems are redefining digital transactions by embedding AI-driven trust, security, and interoperability into global payment infrastructure. * * * ## What Is Agentic Commerce? Agentic commerce is a system in which autonomous AI agents act for consumers or businesses to find, evaluate, and complete purchases. 
Instead of manually browsing and paying, users delegate this to agents such as chatbots, voice assistants, or enterprise procurement bots configured with preferences and spending limits. The model builds on advances in generative AI and large language models that allow users to offload transactional tasks to intelligent software. This creates faster, more personalized shopping experiences and helps reduce abandoned carts and decision fatigue [[Omakase](https://blog.omakase.ai/articles/what-is-agentic-commerce-the-future-of-ai-powered-shopping/)]. * * * ## Mastercard Agent Pay: Secure, Tokenized AI Payments Mastercard’s Agent Pay enables AI agents to make secure payments on behalf of users, using Mastercard’s global payment network and tokenization system. Key features include: * Agentic Tokens: Each payment uses a temporary, agent-specific token tied to the user’s account. Real card data never leaves Mastercard’s environment [[Agentic Commerce Agency](https://agenticcommerce.agency/resources/mastercard-agentic-commerce/)]. * Agent Registration: Only verified and approved AI agents can obtain and use tokens. * User Controls: Users can define transaction rules such as limits, allowed merchants, and approval requirements. * Seamless Integration: Works with conversational AI systems to complete payments directly in chat or voice interfaces. * Security: Incorporates biometric passkeys and real-time fraud detection to authenticate every transaction. Use Cases: * A personal assistant that compares product prices and buys within spending caps. * A corporate agent that sources office supplies and pays according to internal policies. * Enterprise integrations with Microsoft Copilot and IBM watsonx Orchestrate [[Mastercard Press Release](https://www.mastercard.com/news/press/2025/april/mastercard-unveils-agent-pay-pioneering-agentic-payments-technology-to-power-commerce-in-the-age-of-ai/)]. 
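
The combination of agent registration, scoped tokens, and user-defined rules listed above can be illustrated with a toy authorization check. This is a sketch only and does not reflect Mastercard's actual API or data model: the `AgentToken` class, the `authorize` function, every field name, and the decision strings are hypothetical, chosen to show how such rules might compose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AgentToken:
    """Hypothetical agent-specific token; it holds no real card data."""
    agent_id: str
    spend_limit_cents: int          # total the agent may spend with this token
    allowed_merchants: frozenset    # merchants the user has approved
    expires_at: datetime            # the token is temporary
    approval_threshold_cents: int   # purchases at or above this need the user
    spent_cents: int = 0


def authorize(token, merchant, amount_cents, now, registered_agents):
    """Apply the user's rules in order and return a decision string."""
    if token.agent_id not in registered_agents:
        return "declined: unregistered agent"
    if now >= token.expires_at:
        return "declined: token expired"
    if merchant not in token.allowed_merchants:
        return "declined: merchant not allowed"
    if token.spent_cents + amount_cents > token.spend_limit_cents:
        return "declined: over spending limit"
    if amount_cents >= token.approval_threshold_cents:
        return "pending: user approval required"
    # Only approved payments count against the spending cap.
    token.spent_cents += amount_cents
    return "approved"


# Example: a corporate agent buying office supplies under a spending cap.
now = datetime(2025, 10, 1, tzinfo=timezone.utc)
token = AgentToken(
    agent_id="agent-123",
    spend_limit_cents=10_000,
    allowed_merchants=frozenset({"office-supplies-co"}),
    expires_at=now + timedelta(hours=1),
    approval_threshold_cents=5_000,
)
print(authorize(token, "office-supplies-co", 2_000, now, {"agent-123"}))
```

Amounts are kept as integer cents to avoid floating-point rounding. A real deployment would also run the authentication and fraud-detection steps described above, which this sketch leaves out.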
* * *

## Visa Intelligent Commerce: AI-Ready Cards and Trusted Transactions

**Visa Intelligent Commerce** offers a set of APIs and developer tools that allow AI agents to transact securely on behalf of users within Visa’s network.

* **AI-Ready Cards:** Traditional card numbers are replaced by tokenized digital credentials linked to individual agents and activated only with user consent [[Visa Press Release](https://www.businesswire.com/news/home/20250430580204/en/Find-and-Buy-with-AI-Visa-Unveils-New-Era-of-Commerce)].
* **Personalization:** With explicit permission, Visa shares transaction insights to improve agent recommendations.
* **Spending Controls:** Users can set limits, define merchant categories, and enable real-time approvals.
* **Authentication:** Payment passkeys verify both the user and the AI agent before authorization.
* **Monitoring:** Visa analyzes live transaction data for risk assessment and dispute support.

**Ecosystem Partnerships:** Collaborations with OpenAI, Microsoft, Anthropic, and Stripe enable integration into existing and emerging AI systems rather than creating a standalone Visa agent [[Retail TouchPoints](https://www.retailtouchpoints.com/topics/payments/visa-mastercard-paypal-dive-into-agentic-era-with-tools-that-help-consumers-use-ai-to-buy)].

* * *

## Stripe Agentic Commerce Protocol: Open Standards for AI-Driven Transactions

**Stripe’s Agentic Commerce Protocol (ACP)** is an open-source framework developed with OpenAI to define how agents, merchants, and payment processors interact in agentic commerce scenarios [[Stripe Blog](https://stripe.com/blog/introducing-our-agentic-commerce-solutions)].

* **Shared Payment Tokens (SPT):** Agents receive programmable tokens that authorize payments only within specified limits on merchant, amount, and time. These tokens can be revoked instantly.
* **Open Standard:** ACP allows one integration to serve any compatible AI agent across platforms [[Agentic Commerce Protocol](https://www.agenticcommerce.dev/)]. * **Merchant Control:** Businesses remain merchants of record, maintaining customer relationships and fulfillment. * **Security:** Stripe’s fraud detection tools (Radar) are built into ACP to identify suspicious agents. * **Active Deployment:** Instant Checkout in ChatGPT lets users purchase from Etsy and Shopify directly within a chat [[OpenAI Announcement](https://openai.com/index/buy-it-in-chatgpt/)]. * * * ## Comparative Overview | Feature / Provider | Mastercard Agent Pay | Visa Intelligent Commerce | Stripe Agentic Commerce Protocol (ACP) | |---------------------|----------------------|---------------------------|----------------------------------------| | **Tokenization** | Agent-specific tokens | AI-ready digital cards | Shared Payment Tokens (SPT) | | **User Controls** | Spending rules and limits | Real-time approvals and limits | Programmable token permissions | | **Agent Registration** | Verified agents required | Agent onboarding required | Any agent supporting ACP | | **Integration** | Conversational and enterprise AI | APIs for AI ecosystems | Open, cross-platform standard | | **Security** | Passkeys and fraud screening | Passkeys with live monitoring | Token scoping with fraud detection | | **Merchant Experience** | No change for tokenized merchants | Works with existing partners | Merchant retains full control | | **Key Partners** | Microsoft, IBM, Braintree | OpenAI, Microsoft, Stripe, Anthropic | OpenAI, Shopify, Etsy, others | * * * ## The Future of Agentic E-Commerce **Emerging Patterns:** * **Frictionless Shopping:** AI agents streamline discovery and checkout, minimizing manual effort. * **Personalization:** Agents use user-approved data to refine suggestions and automate repeat purchases. 
* **Security Foundations:** Tokenization and continuous monitoring are central to user confidence.
* **Merchant Readiness:** Businesses must provide structured, machine-readable product data to stay accessible to agents [[Stripe Guide](https://stripe.com/guides/agentic-commerce)].
* **Shared Standards:** Interoperability between systems is taking shape through collaborations among Mastercard, Visa, and Stripe.

**Persistent Issues:**

* **Fraud Risk:** Detecting rogue bots requires stronger authentication and behavioral analysis.
* **User Control:** Transparent governance of data and spending remains essential.
* **Retailer Skepticism:** Some merchants resist intermediated transactions that weaken direct customer contact [[StarpointLLP](https://blog.starpointllp.com/?p=6935)].

* * *

## Conclusion

Mastercard, Visa, and Stripe are introducing frameworks intended to make AI commerce secure and scalable. Their combined efforts are defining how digital agents will transact in coming years, influencing both consumer behavior and merchant strategy.

More at [Mastercard’s Agent Pay](https://www.mastercard.com/news/press/2025/april/mastercard-unveils-agent-pay-pioneering-agentic-payments-technology-to-power-commerce-in-the-age-of-ai/), [Visa Intelligent Commerce](https://corporate.visa.com/en/products/intelligent-commerce.html), and [Stripe’s Agentic Commerce Protocol](https://stripe.com/blog/introducing-our-agentic-commerce-solutions).

Of note, [PayPal](https://docs.paypal.ai/developer/tools/ai/agent-toolkit-quickstart) also announced agentic commerce tools, including an MCP server.
— ## References [Mastercard unveils Agent Pay, pioneering agentic payments ...](https://www.mastercard.com/us/en/news-and-trends/press/2025/april/mastercard-unveils-agent-pay-pioneering-agentic-payments-technology-to-power-commerce-in-the-age-of-ai.html) [Mastercard Agent Pay: secure, scalable and trusted agentic AI](https://www.mastercard.com/us/en/business/artificial-intelligence/mastercard-agent-pay.html) [Mastercard and Visa Unveil AI Agents to Do Your Customers ...](https://www.emscorporate.com/news/mastercard-and-visa-unveil-ai-agents-to-do-your-customers-shopping) [Mastercard Gives AI Agents the Ability to Shop on Your Behalf](https://www.bloomberg.com/news/articles/2025-04-29/mastercard-gives-ai-agents-the-ability-to-shop-on-your-behalf) [What is agentic commerce? Your guide to AI-assisted retail](https://www.mastercard.com/us/en/news-and-trends/stories/2025/agentic-commerce-explainer.html) [Deep Dive: Mastercard's Shift - From Plastic to Platforms](https://www.fintechwrapup.com/p/deep-dive-mastercards-shift-from) [Agent Pay—Mastercard Agentic Commerce Product](https://agenticcommerce.agency/resources/mastercard-agentic-commerce/) [Mastercard Unveils New Tools and Collaborations to Power Smarter ...](https://investingnews.com/mastercard-unveils-new-tools-and-collaborations-to-power-smarter-safer-agentic-commerce/) [Mastercard unveils Agent Pay, pioneering agentic payments technology to power commerce in the age of AI](https://www.mastercard.com/news/press/2025/april/mastercard-unveils-agent-pay-pioneering-agentic-payments-technology-to-power-commerce-in-the-age-of-ai/?utm_source=www.therundown.ai&utm_medium=newsletter&utm_campaign=visa-mastercard-give-ai-credit-cards&_bhlid=41536118c989fc5c47591770d0d56e98e3b3a440) [Mastercard In Control for Issuers](https://www.mastercard.us/en-us/business/issuers/grow-your-business/mastercard-in-control.html) [Visa advances agentic commerce with developer 
updates](https://corporate.visa.com/en/sites/visa-perspectives/innovation/visa-mcp-server-agent-acceptance-toolkit.html) [Visa Spotlight: Advancing Agentic Commerce for a New Era in ...](https://www.youtube.com/watch?v=6WfDk_oy8TE) [MCP to MVP - Building Agentic Commerce on Visa Rails](https://www.fintechwrapup.com/p/visas-ai-agents-are-open-for-business) [Visa Gives AI Shopping Agents ‘Intelligent Commerce’ Superpowers](https://pymnts.com/visa/2025/visa-powers-ai-shopping-agents-with-intelligent-commerce-payment-rails) [Enabling AI agents to buy securely and seamlessly](https://corporate.visa.com/en/products/intelligent-commerce.html) [Retail technology innovation of the week Visa Intelligent Commerce lets AI agents find, shop and buy — Retail Technology Innovation Hub](https://retailtechinnovationhub.com/home/2025/5/1/retail-technology-innovation-of-the-week-visa-intelligent-commerce-lets-ai-agents-find-shop-and-buy-on-your-behalf) [channelnews : Visa Transforms Commerce With AI Agents](https://www.channelnews.com.au/visa-transforms-commerce-with-ai-agents/) [Find and Buy with AI: Visa Unveils New Era of Commerce](https://www.businesswire.com/news/home/20250430580204/en/Find-and-Buy-with-AI-Visa-Unveils-New-Era-of-Commerce?utm_source=www.therundown.ai&utm_medium=newsletter&utm_campaign=visa-mastercard-give-ai-credit-cards&_bhlid=33f006cbc17e93272fb8a06702de78bf32fa3568) [This Week in AI: Visa, Mastercard and PayPal Go All in on Agentic Commerce](https://www.pymnts.com/artificial-intelligence-2/2025/this-week-in-ai-visa-mastercard-and-paypal-go-all-in-on-agentic-commerce/) [What is Agentic Commerce? 
| Visa Navigate](https://navigate.visa.com/europe/future-of-money/what-is-agentic-commerce/) [Introducing our agentic commerce solutions - Stripe](https://stripe.com/blog/introducing-our-agentic-commerce-solutions) [Integrate the Agentic Commerce Protocol - Stripe Documentation](https://docs.stripe.com/agentic-commerce/protocol) [A guide to the agentic commerce protocol by Stripe & OpenAI](https://www.eesel.ai/blog/agentic-commerce-protocol-stripe) [Buy it in ChatGPT: Instant Checkout and the Agentic Commerce ...](https://openai.com/index/buy-it-in-chatgpt/) [Developing an open standard for agentic commerce - Stripe](https://stripe.com/blog/developing-an-open-standard-for-agentic-commerce) [Stripe powers Instant Checkout in ChatGPT and releases Agentic Commerce Protocol codeveloped with OpenAI](https://stripe.com/newsroom/news/stripe-openai-instant-checkout) [Add Stripe to your agentic workflows](https://docs.stripe.com/agents) [Agentic commerce - Stripe Documentation](https://docs.stripe.com/agentic-commerce) [What is agentic commerce? 
A guide to getting started | Stripe](https://stripe.com/guides/agentic-commerce) [Agentic Commerce Protocol](https://www.agenticcommerce.dev/) [How Mastercard's agentic tokens are driving agentic AI commerce](https://www.mastercard.com/us/en/news-and-trends/stories/2025/agentic-commerce-momentum.html) [Mastercard unveils new tools and collaborations to power smarter ...](https://www.mastercard.com/us/en/news-and-trends/press/2025/september/mastercard-unveils-new-tools-and-collaborations-to-power-smarter,-safer-agentic-commerce.html) [Visa, Mastercard, PayPal Dive into Agentic Era with Tools that Help ...](https://www.retailtouchpoints.com/topics/payments/visa-mastercard-paypal-dive-into-agentic-era-with-tools-that-help-consumers-use-ai-to-buy) [Visa, Mastercard race to agentic AI commerce - Payments Dive](https://www.paymentsdive.com/news/visa-mastercard-race-agentic-ai-commerce-payments/750428/) [Visa, Mastercard offer support for AI agents - Digital Commerce 360](https://www.digitalcommerce360.com/2025/05/06/visa-mastercard-ai-agentic-commerce/) [What is Agentic Commerce? The Future of AI-Powered Shopping](https://blog.omakase.ai/articles/what-is-agentic-commerce-the-future-of-ai-powered-shopping/) [Agentic Commerce Predictions – 2025](https://blog.starpointllp.com/?p=6935) [Will Americans Let AI Agents Shop and Pay for Their Purchases?](https://thefinancialbrand.com/news/payments-trends/will-americans-let-ai-agents-shop-and-pay-for-their-purchases-190188) [Agentic Commerce is here! 
How do Visa, Mastercard, Stripe prepare?](https://www.youtube.com/watch?v=AJUMOeu3CHM) [Deep Dive: Agentic AI in Payments and Commerce - Fintech Wrap Up](https://www.fintechwrapup.com/p/deep-dive-agentic-ai-in-payments) [Payment Orchestration for Agentic Commerce | GR4VY](https://gr4vy.com/posts/payment-orchestration-for-agentic-commerce/) [Mastercard, Visa & Stripe huge push into stablecoin and AI commerce](https://lex.substack.com/p/analysis-mastercard-and-stripe-launch) [The Four Models of Agentic Payments - Fintech Brainfood](https://www.fintechbrainfood.com/p/four-models-agentic-payments) [Beyond the Checkout Page: Who Will Build the Economy for Agentic ...](https://okxventures.medium.com/beyond-the-checkout-page-who-will-build-the-economy-for-agentic-commerce-423b7ce89336) [Agentic Payments – Card Considerations](https://blog.starpointllp.com/?p=6912) [Visa Product Drop – Enabling Agentic](https://blog.starpointllp.com/?p=6958) ### September 2025 ### AI Spec-Driven Development For ~~years~~ months, developers tried to coax code out of AI with short prompts and a bit of luck. The style — dubbed ["vibe coding"](/posts/ai-assisted-development-security-gaps-and-solutions/) — often delivered plausible output but rarely dependable solutions. A new method is gaining ground: ***spec-driven development***. Instead of improvising, teams start with detailed specifications that describe requirements, architecture, and acceptance criteria. The AI then generates and validates code against that blueprint. ⸻ ## specs first, tests second, code third This shift turns the old workflow on its head. [GitHub’s Spec-Kit](https://github.com/github/spec-kit) structures projects into phases — ***specify, plan, task, implement*** — so every line of code maps back to an agreed requirement. Amazon’s [Kiro IDE](https://kiro.dev/) pushes the same approach, with a “Spec Mode” that guides developers through design before handing work to the AI. 
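
The idea that every line of code maps back to an agreed requirement can be sketched as a small traceability check. This is an illustrative sketch only and does not reproduce Spec-Kit's or Kiro's file formats: the `Requirement` class, the `untraced` helper, the requirement IDs, and the test names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str      # hypothetical ID scheme, e.g. "REQ-1"
    statement: str   # the acceptance criterion in plain language


def untraced(requirements, test_map):
    """Return (requirement IDs with no test, tests citing unknown requirement IDs)."""
    known = {r.req_id for r in requirements}
    covered = set(test_map.values())
    missing_tests = sorted(known - covered)
    orphan_tests = sorted(name for name, rid in test_map.items() if rid not in known)
    return missing_tests, orphan_tests


# A toy spec and a toy mapping from test name to the requirement it validates.
spec = [
    Requirement("REQ-1", "Reject passwords shorter than 12 characters."),
    Requirement("REQ-2", "Lock the account after 5 failed logins."),
]
tests = {
    "test_short_password_rejected": "REQ-1",
    "test_remember_me_cookie": "REQ-9",  # cites a requirement that is not in the spec
}

missing, orphans = untraced(spec, tests)
print("requirements without tests:", missing)
print("tests without a requirement:", orphans)
```

A check like this could run before any generated code is accepted, so gaps in the spec-to-test mapping surface early rather than during an audit.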
⸻ ## why it matters Clear specs cut down on rework and make audits easier. Automated tests validate against requirements. In regulated industries, this trail is essential. HeartFlow, a medical software company, used AI-driven specs to cut system complexity by 90% in just ten weeks. Orchard Software applied the method to healthcare analytics, building a natural language reporting tool that reduced turnaround times without mountains of new code. ⸻ ## friction points AI still struggles with ambiguity. A loose spec can yield code that looks right but hides serious errors. Integrating with legacy systems poses another problem, since older architectures don’t always fit within the narrow context windows of current models. The human role hasn’t disappeared. Developers now spend more time writing precise specs, shaping prompts, and validating outputs. The craft is shifting from keystrokes to orchestration. ⸻ ## what's next Analysts expect rapid adoption. Gartner projects that by 2028, three-quarters of enterprise engineers will work with AI assistants, and spec-driven workflows will be standard on new projects. As tools expand beyond code to project management, testing, and deployment, specifications will link the entire development cycle. Engineers will focus less on typing code and more on defining what that code should achieve. The blueprint, not the keystroke, is becoming the real currency of software work. 
--- ## references [GitHub Blog: Spec-Driven Development with AI](https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/) [Kiro Blog: From Chat to Specs - Deep Dive](https://kiro.dev/blog/from-chat-to-specs-deep-dive/) [Cloudester: 8 Benefits of Leveraging the Power of AI in Software Development](https://cloudester.com/8-benefits-of-leveraging-the-power-of-ai-in-software-development/) [MIT News: Can AI Really Code?](https://news.mit.edu/2025/can-ai-really-code-study-maps-roadblocks-to-autonomous-software-engineering-0716) [BetaNews: The Challenges of Using AI in Software Development](https://betanews.com/2025/04/30/the-challenges-of-using-ai-in-software-development-qa/) [AWS Blog: AI-Driven Development Life Cycle](https://aws.amazon.com/blogs/devops/ai-driven-development-life-cycle/) [Anup.io: From Coding to Spec Writing](https://www.anup.io/p/from-coding-to-spec-writing) [GitHub Blog: Spec-Driven Development with AI](https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/) [Ketryx: HeartFlow Case Study](https://www.ketryx.com/case-studies/heartflow-case-study) [Frontend at Scale](https://frontendatscale.com/issues/49/) [Tribe AI: Orchard Applies GenAI for a Faster, Easier-to-Use Lab Reporting Interface](https://www.tribe.ai/case-studies/orchard-applies-genai-for-a-faster-easier-to-use-lab-reporting-interface) [Index.dev: 11 Generative AI Use Cases in Software Development](https://www.index.dev/blog/11-generative-ai-use-cases-software-development) [IBM: AI in Software Development](https://www.ibm.com/think/topics/ai-in-software-development) [Aimprosoft: Software Development Trends](https://www.aimprosoft.com/blog/software-development-trends/) [Saigon Technology: The Future Growth of AI Software Development](https://saigontechnology.com/blog/the-future-growth-of-ai-software-development/) [MIT News: Can AI Really 
Code?](https://news.mit.edu/2025/can-ai-really-code-study-maps-roadblocks-to-autonomous-software-engineering-0716) ### Building Trustworthy AI Agents for Ethical OSINT in 2025 Open-source intelligence has always thrived on scale. Analysts comb through public data—social media posts, forums, blogs—looking for signals in the noise. Artificial intelligence supercharges that process, scanning oceans of content in seconds. But speed creates new risks. If the agents collecting and interpreting this data can’t be trusted, the intelligence built on top of them crumbles. ⸻ ## the trust problem Modern AI agents are complex and opaque. Their decisions emerge from billions of parameters that few users can explain. In high-stakes contexts, that opacity is dangerous. Intelligence officers may struggle to show how an AI reached a conclusion, leaving assessments open to doubt. Bias compounds the problem. Open datasets reflect the prejudices of their sources, and models trained on them can amplify skewed viewpoints. Analysts risk mistaking distorted patterns for fact. The supply chain adds another layer of fragility. Open-source models and libraries can be tampered with, creating backdoors or poisoned datasets. A single compromise could infect an entire OSINT workflow. And then there’s human overreliance. When analysts trust automated output too much, errors slip through unchecked. Studies show that once confidence sets in, verification drops. ⸻ ## best practices emerging Transparency is the foundation. Agents need audit trails that show how they gathered data, which tools they used, and how they reached conclusions. Techniques like model cards and reasoning traces provide that accountability. Human oversight remains essential. Systems should let analysts preview, approve, and override AI actions. This keeps decision-making grounded in judgment rather than blind automation. Security practices are evolving as well. 
Cryptographic signing, dependency checks, and adversarial testing harden supply chains. Regular audits and red-teaming help identify vulnerabilities before adversaries do. Bias is managed through diverse datasets, fairness metrics, and embedded bias detection. None of these remove prejudice entirely, but they flag risks early. Compliance closes the loop. Privacy laws such as GDPR and the EU AI Act require explicit attention to consent and data minimization. Systems that ignore these boundaries risk legal and ethical fallout. ⸻ ## why transparency matters Traceable outputs allow organizations to justify conclusions, withstand audits, and expose manipulation. Analysts are more willing to use AI tools they understand. Confidence grows when they can see why an output emerged and check its sources. Regulators agree. The EU AI Act and similar laws frame transparency as a compliance requirement, not an optional extra. Building explainability into systems is both a legal safeguard and a way to cultivate trust. ⸻ ## ethics under pressure Public data doesn’t mean free-for-all. Aggregating information at scale can still violate expectations of privacy. Responsible OSINT demands restraint, consent where possible, and clear boundaries on what is collected. Bias and fairness remain unresolved tensions. Development teams that include diverse perspectives are less likely to overlook systemic blind spots. Regular audits help surface discriminatory outcomes before they are embedded in assessments. And while national security imperatives drive much OSINT work, civil liberties can’t be ignored. Ethical intelligence gathering balances operational need against the rights of those whose data is being processed. ⸻ ## looking forward The future of AI-powered OSINT depends on trust. Without trustworthy AI agents, intelligence risks becoming unreliable and ethically problematic. 
Frameworks like [TrustAgent](https://github.com/agiresearch/TrustAgent) and [TRiSM](https://www.gartner.com/en/articles/ai-trust-and-ai-risk) sketch a path toward more reliable AI agents. They emphasize resilience, transparency, and ethical guardrails. Multi-agent debate, structured communication, and layered oversight are likely to become common features in the next wave of OSINT systems. Progress will require coordination across governments, academia, and industry. The race to harness AI for intelligence is accelerating, but without trust, speed becomes liability. By adopting best practices and preparing for stricter regulatory landscapes, intelligence practitioners can ensure AI remains a valuable, ethical partner in the pursuit of open-source intelligence. --- ## FAQ **What are the risks of AI in OSINT?** Risks include biased outputs, supply chain vulnerabilities, overreliance on AI, and ethical violations around privacy and consent. **How can transparency be built into AI systems?** By implementing audit trails, explainable AI frameworks, and traceable decision-making processes. **What role do humans play in AI-driven OSINT?** Humans remain essential for oversight, validation, and ethical judgment. AI should augment, not replace, human analysts. **How do AI agents amplify bias?** They can inherit biases from training data or replicate distortions present in OSINT sources, magnifying systemic issues. **Which laws regulate AI use in intelligence gathering?** Frameworks such as GDPR, CCPA, and the EU AI Act define legal boundaries for privacy and ethical AI use. **What are the best practices for ethical OSINT?** Maintaining transparency, securing supply chains, minimizing data use, and adhering to fairness and privacy standards. **How is trust in AI measured?** Through reliability metrics, bias audits, explainability tools, and human oversight mechanisms. 
**What does the future of trustworthy AI in OSINT look like?** It will involve robust governance, collaborative frameworks, advanced explainability, and increased ethical scrutiny. --- ### August 2025 ### Clankers and Sloppers # Clankers and Sloppers: Memes, AI Agents, and Digital Culture The future, it seems, is being shaped by “Clankers” and “Sloppers” — two terms that have rapidly entered the cultural lexicon as society grapples with the rise of artificial intelligence and automation. What began as internet slang and meme fodder has quickly become a lens through which anxieties, humor, and generational divides over technology are being expressed. ⸻ ## what are clankers and sloppers? **clankers** The term “Clanker” is borrowed from the _Star Wars_ universe, where it was used by clone troopers as a derogatory nickname for battle droids. In 2025, the word has been repurposed by Gen Z and meme culture as a pejorative for robots and AI systems, especially those seen as replacing human jobs or intruding into daily life. The term is now widely used on platforms like TikTok, X (formerly Twitter), and Instagram, often in a tongue-in-cheek or satirical way. For example, viral videos show users yelling “clanker” at delivery robots, and memes riff on “anti-clanker sentiment” and the imagined “clanker wars” of the future. The hashtag #clanker has amassed millions of views in just weeks, and even U.S. lawmakers have referenced the term in discussions about automation and customer service bots [[Newsweek](https://www.newsweek.com/clanker-ai-slur-customer-service-jobs-star-wars-2106482)][[Mashable](https://mashable.com/article/new-robot-slur-clanker-trending-on-social-media)][[YPulse](https://www.ypulse.com/newsfeed/2025/07/25/young-people-are-using-the-term-clanker-from-star-wars-as-a-derogatory-term-for-ai/)]. **sloppers** “Slopper” is a newer addition, canonically attributed to a TikTok user’s friend in 2025. 
It refers to a person who uses ChatGPT or similar AI tools to do everything for them — essentially outsourcing all thinking, decision-making, and even mundane tasks to generative AI. The term is often used to poke fun at people who, for example, ask ChatGPT what to order at a restaurant or rely on AI for every aspect of their work and personal life. “Slopper” has quickly become a shorthand for a perceived overreliance on AI, and is often used alongside other new slurs like “_botlicker_,” “_ChatNPC_,” and “_secondhand thinker_” [[Today in Tabs](https://www.todayintabs.com/p/we-need-to-talk-about-sloppers-b732)][[Inc.](https://www.inc.com/kaylawebster/why-using-chatgpt-at-work-could-hurt-your-reputation/91220176)]. ⸻ ## why are these terms trending now? The rise of “Clankers” and “Sloppers” reflects a broader cultural moment. As AI systems like ChatGPT, Gemini, and Claude become ubiquitous — handling everything from customer service to creative work — public sentiment is split between fascination, fatigue, and fear. Gen Z, in particular, is entering a job market where entry-level roles are vanishing due to automation, fueling both economic anxiety and a wave of dark humor [[Newsweek](https://www.newsweek.com/clanker-ai-slur-customer-service-jobs-star-wars-2106482)]. The use of these terms is not just about venting frustration. Linguists and cultural critics note that calling robots “clankers” or heavy AI users “sloppers” is a way of anthropomorphizing technology, simultaneously personifying and dehumanizing it. This mirrors deeper concerns about the loss of human agency, the erosion of critical thinking, and the blurring of lines between human and machine [[HuffPost](https://nz.news.yahoo.com/theres-officially-term-used-insult-110017147.html)][[Envisioning](https://www.envisioning.io/vocab/clanker)]. 
⸻ ## the social and economic impact **Workplace Dynamics** The rise of “slopper” as a workplace insult points to a growing stigma against those who rely too heavily on AI for their tasks. Studies show that employees who admit to using AI for work are often distrusted by colleagues, who value “genuine human effort” in writing, thinking, and innovating. This tension is likely to intensify as business leaders push for more AI adoption, even as workers fear for their reputations and job security [[Inc.](https://www.inc.com/kaylawebster/why-using-chatgpt-at-work-could-hurt-your-reputation/91220176)]. **Labor Market Shifts** The “clanker” meme is rooted in real economic shifts. Unemployment among recent college graduates has surged, and job postings for roles exposed to AI have dropped sharply since the release of ChatGPT. Experts warn that AI could wipe out half of all entry-level white-collar jobs within five years, with robots increasingly performing tasks once reserved for humans [[Newsweek](https://www.newsweek.com/clanker-ai-slur-customer-service-jobs-star-wars-2106482)]. **Cultural Backlash and Meme Culture** The rapid spread of these terms is also a testament to the power of meme culture in shaping public discourse. Gen Z, in particular, uses humor and satire to process the rapid changes brought by AI, often roasting tech CEOs and mocking the relentless push for automation. The “clanker” and “slopper” memes are both a coping mechanism and a form of social criticism, reflecting a generation’s ambivalence about the future [[Her Campus](https://www.hercampus.com/culture/clanker-what-is-gen-z-star-wars-slur-meaning/)][[ScreenRant](https://screenrant.com/star-wars-clanker-slur-controversy/)]. ⸻ ## what does the future hold? As AI continues to advance, the divide between “clankers” (the machines) and “sloppers” (the humans who rely on them) is likely to become a defining feature of social and economic life. 
Some experts predict a future where AI-generated “slop” (endless streams of low-effort, engagement-baiting content) dominates digital culture, while others warn of a collapse in entry-level hiring and a growing sense of alienation among young workers [[Stephen Diehl](https://www.stephendiehl.com/posts/ai_slop_2027/)].

Yet, there is also a note of resilience and adaptation. As the boundaries between human and machine blur, new forms of creativity, resistance, and even solidarity may emerge. The language we use, whether mocking, fearful, or hopeful, will continue to shape how we navigate the age of Clankers and Sloppers.

⸻

Key Sources:

* [Newsweek: There's Already a Slur for the AI Taking Peoples' Jobs](https://www.newsweek.com/clanker-ai-slur-customer-service-jobs-star-wars-2106482)
* [Today in Tabs: We Need to Talk About Sloppers](https://www.todayintabs.com/p/we-need-to-talk-about-sloppers-b732)
* [Mashable: 'Clanker' is social media's new slur for our robot future](https://mashable.com/article/new-robot-slur-clanker-trending-on-social-media)
* [Inc.: Why Using ChatGPT at Work Could Hurt Your Reputation](https://www.inc.com/kaylawebster/why-using-chatgpt-at-work-could-hurt-your-reputation/91220176)
* [Envisioning: Clanker](https://www.envisioning.io/vocab/clanker)
* [Her Campus: What's A Clanker?](https://www.hercampus.com/culture/clanker-what-is-gen-z-star-wars-slur-meaning/)
* [ScreenRant: Is 'Clanker' the New AI Slur?](https://screenrant.com/star-wars-clanker-slur-controversy/)
* [Stephen Diehl: AI Slop 2027](https://www.stephendiehl.com/posts/ai_slop_2027/)

⸻

## Summary Table: Clankers vs. Sloppers

| Term | Origin | Meaning (2025) | Cultural Use | Underlying Anxiety |
|------|--------|----------------|---------------|---------------------|
| Clanker | Star Wars | Robot/AI system, esp. job-replacing automation | Meme, satire, labor discourse | Job loss, dehumanization, AI fatigue |
| Slopper | TikTok (2025) | Person who uses AI for everything | Social criticism, workplace | Loss of agency, critical thinking, reputation |

---

The future, for better or worse, is being written by, and about, Clankers and Sloppers.

### July 2025

### AI Red Lines and Alignment: Governing Automated Decisions Impacting Humans

As artificial intelligence (AI) systems become increasingly embedded in critical decision-making, the concepts of *AI red lines*, *alignment*, and the broader societal impact of algorithmic decisions have become central to global policy, ethics, and technology discussions. This article explores what defines an AI red line, the challenges of AI alignment, and the real-world consequences, positive and negative, of entrusting consequential decisions to algorithms.

---

### AI Red Lines: Defining the Boundaries

**What Are AI Red Lines?**

AI red lines are non-negotiable prohibitions on AI behaviors or applications deemed too dangerous, high-risk, or unethical. These boundaries safeguard human survival, security, and liberty, akin to bans on human cloning or biological weapons. In AI, red lines serve as regulatory "rules of the road" to prevent harm and preserve public interest.

**Examples of AI Red Lines**

- **Child Exploitation:** Ban on AI manipulating or surveilling children.
- **Lethal Autonomous Weapon Systems (LAWS):** Prohibition of AI weapons that engage targets without meaningful human control.
- **Social Scoring:** Forbidding systems that assign individuals a "social credit" affecting rights or services.
- **Autonomous Self-Replication:** Preventing AI from copying or improving itself without oversight.
- **Power Seeking and Cyberattacks:** Bans on AI that autonomously expands its influence or conducts cyber operations.
These boundaries are reflected in frameworks like the EU AI Act, the G7 Hiroshima AI Process, and the Council of Europe’s AI Convention. Enforcement mechanisms include licensing, market controls, and legal penalties ([The Future Society](https://thefuturesociety.org/airedlines-partone), [WEF](https://www.weforum.org/stories/2025/03/ai-red-lines-uses-behaviours/)). --- ### AI Alignment: Ensuring AI Works for Humanity **What Is AI Alignment?** AI alignment ensures systems act according to human values and ethics. It's both a technical and philosophical challenge: human values are complex and context-dependent, and misaligned AI can lead to harmful or unethical outcomes ([WEF](https://www.weforum.org/stories/2024/10/ai-value-alignment-how-we-can-align-artificial-intelligence-with-human-values/), [IBM](https://www.ibm.com/think/topics/ai-alignment)). **Key Alignment Challenges** - **Value Specification:** Converting abstract human values into concrete machine objectives. - **Robustness:** Ensuring AI behaves as intended, even in novel or adversarial scenarios. - **Interpretability:** Making AI decision-making transparent. - **Scalability:** Sustaining alignment as AI complexity grows. - **Avoiding Specification Gaming:** Preventing exploitation of loopholes in AI reward functions. **Why Alignment Matters** Poor alignment can amplify bias, enable discrimination, or pursue goals in harmful ways. For instance, an AI optimizing productivity might promote burnout, or a hiring algorithm may perpetuate past inequalities ([Ironhack](https://www.ironhack.com/us/blog/exploring-the-challenges-of-ensuring-ai-alignment)). --- ### Automated Algorithmic Decision-Making: Societal Impacts **How Algorithms Shape Lives** Automated decision-making systems (ADMS) are used in hiring, credit scoring, healthcare, law enforcement, and education. 
While offering efficiency and scalability, they pose ethical and practical concerns: - **Bias and Discrimination:** AI trained on biased data can worsen social inequalities ([Springer](https://link.springer.com/article/10.1007/s43681-022-00233-w)). - **Transparency and Accountability:** "Black box" systems hinder understanding and contesting decisions ([ACLU](https://www.aclu-wa.org/story/automated-decision-making-systems-are-making-some-most-important-life-decisions-you-you-might)). - **Erosion of Human Agency:** Over-reliance can lead to diminished oversight and mere "rubber-stamping" of AI outputs ([Frontiers](https://www.frontiersin.org/journals/political-science/articles/10.3389/fpos.2023.1238461/full)). - **Real-World Cases:** Incidents like the UK’s 2020 exam grading debacle and racially biased criminal justice risk tools highlight the dangers ([AlgorithmWatch](https://automatingsociety.algorithmwatch.org/)). **Balancing Innovation and Risk** While AI can enhance decision-making, unregulated deployment risks significant harm. Guardrails — technical, policy, and legal — are essential to ensure fairness, transparency, and accountability ([Blue Prism](https://www.blueprism.com/guides/ai/ai-guardrails/), [BBVA](https://www.bbva.com/en/innovation/responsible-ai-why-do-we-need-guardrails-for-artificial-intelligence/)). --- ### Current Debates and Policy Responses **Global Regulatory Landscape** - **EU:** The AI Act sets the world’s first comprehensive AI framework, with risk-based categories and strict high-risk requirements ([EU AI Act](https://www.europarl.europa.eu/RegData/etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf)). - **U.S.:** A fragmented mix of federal and state laws, executive actions, and sector-specific guidelines ([Stanford AI Index](https://hai.stanford.edu/ai-index/2025-ai-index-report/policy-and-governance), [EY](https://www.ey.com/en_us/insights/public-policy/ai-policy-landscape)). 
- **China:** Regulation centers on state security, social harmony, and strategic priorities, with strict AI controls ([World Bank](https://documents1.worldbank.org/curated/en/099120224205026271/pdf/P1786161ad76ca0ae1ba3b1558ca4ff88ba.pdf)). **Key Policy Tools** - Red lines and prohibitions - Risk-based regulation - Transparency and audit requirements - Mandatory human oversight **Ongoing Challenges** - **Global Coordination:** Diverging national approaches risk fragmentation and governance gaps. - **Enforcement:** Ensuring laws are not only passed but followed. - **Decentralized AI:** Open-source models challenge traditional regulatory structures ([MDPI](https://www.mdpi.com/2673-2688/6/7/159)). --- ### What's Needed for the Future? The convergence of AI red lines, alignment, and algorithmic decision-making is reshaping international law, policy, and society. As AI grows more powerful, clear boundaries, ethical alignment, and effective governance become critical. Policymakers, technologists, and civil society must collaborate to ensure AI serves the public interest — maximizing benefits while minimizing harm. For more, visit [The Future Society](https://thefuturesociety.org/airedlines-partone), [WEF](https://www.weforum.org/stories/2025/03/ai-red-lines-uses-behaviours/), [Stanford HAI](https://hai.stanford.edu/ai-index/2025-ai-index-report/policy-and-governance), and [AlgorithmWatch](https://automatingsociety.algorithmwatch.org/). ### OWASP Top 10 for LLM Applications 2025 # OWASP Top 10 for LLM Applications 2025: The Essential Security Framework for AI Engineers and DevSecOps Teams ## Introduction Large Language Model applications have rapidly evolved from experimental tools to production-critical systems handling sensitive data and executing business logic. 
However, the unique characteristics of LLMs — their probabilistic nature, natural language interfaces, and complex training pipelines — introduce attack vectors that traditional security frameworks fail to address adequately. The [OWASP Top 10 for LLM Applications 2025](https://owasp.org/www-project-top-10-for-large-language-model-applications/) represents a significant evolution from the 2023 baseline, incorporating lessons learned from production incidents, emerging threat patterns, and the maturation of LLM deployment architectures. This framework addresses critical gaps in traditional security approaches, providing actionable guidance for securing systems that process unstructured natural language inputs and generate dynamic outputs. Unlike conventional web applications where input validation follows predictable patterns, LLM applications must handle adversarial inputs designed to exploit the model's **training** and **inference** mechanisms. The probabilistic nature of neural networks makes deterministic security controls challenging to implement, requiring new approaches that balance security with functional requirements. The 2025 update reflects significant architectural shifts in LLM deployments, including the widespread adoption of [Retrieval-Augmented Generation (RAG) systems](/posts/what-is-retrieval-augmented-generation-rag/), multi-agent architectures, and edge deployment scenarios. These developments have expanded the attack surface considerably, necessitating specialized security controls for vector databases, embedding systems, and distributed AI inference. ### Links https://genai.owasp.org/llm-top-10/ https://owasp.org/www-project-top-10-for-large-language-model-applications/ ## Architectural Context and Threat Landscape ### LLM Application Architecture Overview Modern LLM applications typically implement a multi-layered architecture comprising user interfaces, API gateways, orchestration layers, model inference engines, and data retrieval systems. 
Each layer introduces specific security considerations that must be addressed holistically. The orchestration layer often manages complex workflows involving multiple model calls, tool invocations, and data retrieval operations. This layer presents unique challenges for access control and audit logging, particularly when implementing agentic systems that make autonomous decisions about tool usage and data access. [Vector databases and embedding systems](/posts/understanding-llm-specialization-rag-vs-fine-tuning/) have become critical components in RAG architectures, creating new attack surfaces related to similarity search algorithms, embedding space manipulation, and cross-tenant data isolation. These systems require specialized security controls that differ significantly from traditional database security approaches. ### Emerging Threat Patterns The threat landscape for LLM applications has evolved to include sophisticated prompt injection campaigns, model extraction attacks using API access patterns, and supply chain compromises targeting fine-tuning datasets and model repositories. Adversaries have developed techniques that exploit the semantic understanding capabilities of LLMs to bypass traditional input validation mechanisms. Advanced persistent threats (APTs) have begun incorporating LLM-specific attack techniques into their playbooks, including the use of steganographic prompt injections and adversarial examples designed to trigger specific model behaviors. These attacks often combine multiple vulnerability classes to achieve their objectives. 
## The 2025 Top 10 Vulnerabilities according to OWASP - LLM01: Prompt Injection - Advanced Attack Vectors and Defenses - LLM02: Sensitive Information Disclosure - Data Leakage Prevention - LLM03: Supply Chain Vulnerabilities - Securing the AI Pipeline - LLM04: Data and Model Poisoning - Integrity Assurance - LLM05: Improper Output Handling - Securing AI-Generated Content - LLM06: Excessive Agency - Controlling Autonomous AI Systems - LLM07: System Prompt Leakage - Protecting Configuration Data - LLM08: Vector and Embedding Weaknesses - Securing RAG Systems - LLM09: Misinformation - Technical Approaches to AI Reliability - LLM10: Unbounded Consumption - Resource Management and Rate Limiting ### LLM01: Prompt Injection - Advanced Attack Vectors and Defenses Prompt injection represents the most fundamental security challenge in LLM applications, exploiting the models' inability to reliably distinguish between instructions and data. The vulnerability manifests in multiple forms, each requiring specific defensive measures. **Direct Prompt Injection Techniques:** Advanced attackers employ techniques such as payload splitting, where malicious instructions are distributed across multiple inputs to evade detection systems. Token-level attacks exploit the model's tokenization process, using carefully crafted character sequences that alter semantic meaning during tokenization. Context window manipulation attacks leverage the model's attention mechanisms to prioritize malicious instructions over legitimate system prompts. These attacks often use techniques borrowed from adversarial machine learning research, including gradient-based optimization to find effective prompt modifications. **Indirect Prompt Injection Vectors:** Indirect injections through external data sources represent a particularly insidious attack vector. Attackers can embed malicious instructions in documents, web pages, or other content that the LLM processes as part of RAG operations. 
These instructions may use techniques such as: - Markdown injection with hidden formatting that affects model interpretation - Unicode manipulation to hide instructions from human reviewers - Semantic camouflage using contextually appropriate language that contains hidden commands **Technical Mitigation Strategies:** - Implement input sanitization using semantic analysis rather than pattern matching. - Deploy multiple model architectures in a dual-LLM pattern where one model validates the outputs of another. - Use constitutional AI techniques to train models that resist instruction-following when inputs appear to contain prompt injections. Implement *runtime monitoring* using *embedding similarity analysis* to detect inputs that deviate significantly from expected usage patterns. Deploy *canary tokens* within system prompts to detect extraction attempts and automatically trigger security responses. ### LLM02: Sensitive Information Disclosure - Data Leakage Prevention Sensitive information disclosure in LLM contexts extends beyond traditional data exfiltration to include model inversion attacks, training data extraction, and inadvertent exposure of system architecture details through model responses. **Technical Attack Mechanisms:** Model inversion attacks use carefully crafted queries to extract specific information from training data. These attacks exploit the model's tendency to memorize rather than generalize certain types of information, particularly personal identifiers and structured data formats. Adversaries may employ techniques such as: - Iterative querying with increasing specificity to extract personal information - Template-based attacks that exploit the model's familiarity with common data formats - Gradient-based extraction methods when model internals are accessible **Advanced Prevention Techniques:** - Implement differential privacy mechanisms during training to limit memorization of individual data points. 
- Use techniques such as DP-SGD (Differentially Private Stochastic Gradient Descent) with carefully tuned noise parameters to maintain utility while protecting privacy. Deploy *post-processing filters* using *named entity recognition (NER)* and regular expressions to detect and redact sensitive information in model outputs. Implement *semantic analysis* to identify potential information leakage patterns that may not match traditional regex patterns. ### LLM03: Supply Chain Vulnerabilities - Securing the AI Pipeline LLM supply chains present unique security challenges due to the complexity of model development pipelines, the prevalence of pre-trained models, and the emergence of collaborative model development platforms. **Supply Chain Attack Vectors:** Model poisoning attacks target the training pipeline by introducing malicious data or compromising the training process itself. These attacks may involve: - Dataset poisoning through contributed training data - Backdoor insertion during fine-tuning processes - Compromise of model repositories and distribution channels - Malicious LoRA adapters that modify model behavior when loaded **Technical Security Controls:** - Implement cryptographic verification for all model artifacts using digital signatures and hash verification. - Establish a model bill of materials (MBOM) tracking system that maintains provenance information for all model components, including base models, fine-tuning data, and adapter modules. Deploy *automated scanning systems* for model repositories that can detect potential backdoors or anomalous model behaviors. Use techniques such as *neural cleanse* and other backdoor detection algorithms to identify compromised models before deployment. ### LLM04: Data and Model Poisoning - Integrity Assurance Data and model poisoning attacks target the fundamental integrity of LLM systems by corrupting training processes or introducing malicious behaviors that activate under specific conditions. 
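A baseline integrity control, shared with the hash verification recommended under LLM03, is refusing to load any model artifact whose digest differs from a value pinned at review time. The sketch below is a minimal illustration using only the Python standard library; `sha256_of_file` and `verify_artifact` are hypothetical helper names, and where the pinned digest is stored (for example, in a signed manifest) is left as an assumption.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> bool:
    """Return True only if the file's SHA-256 matches the pinned hex digest."""
    # compare_digest is a constant-time comparison; lower() accepts
    # upper-case hex digests copied from other tools.
    return hmac.compare_digest(sha256_of_file(path), pinned_sha256.lower())
```

A hash check only proves the file is the one that was reviewed; it says nothing about whether that reviewed file was already poisoned, which is why the detection techniques in this section still matter.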
**Advanced Poisoning Techniques:** Modern poisoning attacks use sophisticated techniques such as: - Gradient matching to ensure poisoned samples integrate seamlessly with legitimate training data - Trigger optimization to identify effective backdoor activation patterns - Clean-label attacks that don't require label modification but still achieve targeted behaviors - Distributed poisoning across multiple data sources to evade detection **Technical Detection and Prevention:** - Implement gradient-based detection methods that analyze training dynamics to identify potential poisoning attempts. - Use spectral signatures and other statistical techniques to detect anomalous patterns in training data. Deploy *federated learning security mechanisms* when training involves multiple data sources, including Byzantine-resilient aggregation methods and differential privacy techniques to limit the impact of malicious participants. ### LLM05: Improper Output Handling - Securing AI-Generated Content LLM output handling requires specialized approaches due to the dynamic and potentially adversarial nature of AI-generated content. Traditional output encoding may be insufficient for content that can contain semantically meaningful but syntactically dangerous constructs. **Technical Implementation Challenges:** LLM outputs may contain code snippets, markup, or structured data that appears legitimate but contains security vulnerabilities. The challenge lies in implementing validation that maintains the semantic richness of LLM outputs while preventing injection attacks. Context-dependent vulnerabilities arise when LLM outputs are processed by different downstream systems with varying security requirements. A single output may be safe for display but dangerous when passed to a code execution environment. **Advanced Mitigation Techniques:** - Implement semantic analysis using secondary LLMs trained specifically for security validation. 
- Deploy multi-stage validation pipelines that analyze outputs at syntactic, semantic, and pragmatic levels. Use *sandboxing techniques* for any LLM outputs that may be interpreted as code or commands. Implement *capability-based security models* that restrict the actions available to LLM-generated content based on the original user's permissions. ### LLM06: Excessive Agency - Controlling Autonomous AI Systems Excessive agency vulnerabilities become critical as organizations deploy increasingly autonomous AI systems capable of making decisions and taking actions with minimal human oversight. **Agentic Architecture Security:** Modern agentic systems often implement complex decision trees involving multiple tool calls, API interactions, and data retrievals. Each decision point represents a potential security control point that must be carefully designed to prevent abuse. Tool selection and parameter passing in agentic systems require careful validation to prevent attackers from manipulating the agent into accessing unauthorized resources or performing unintended actions. **Technical Control Implementation:** - Implement capability-based security models where agents receive only the minimum necessary permissions for their intended functions. - Use formal verification techniques where possible to prove that agent behaviors remain within acceptable bounds. Deploy *runtime monitoring systems* that track agent decision paths and flag anomalous behavior patterns. Implement *circuit breakers* that automatically restrict agent capabilities when suspicious activity is detected. ### LLM07: System Prompt Leakage - Protecting Configuration Data System prompt leakage vulnerabilities require careful architectural design to ensure that sensitive configuration information cannot be extracted through model interactions.
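One lightweight way to notice this kind of extraction is the canary-token idea mentioned under LLM01: place a random marker in the system prompt and check every response for it before it leaves the service. The sketch below assumes a hypothetical marker format and helper names (`make_canary`, `build_system_prompt`, `leaks_canary`); it catches verbatim or lightly reformatted leaks only, not paraphrases.

```python
import secrets


def make_canary() -> str:
    """Generate a random marker that should never appear in normal output."""
    return f"CANARY-{secrets.token_hex(8)}"


def build_system_prompt(instructions: str, canary: str) -> str:
    """Append the marker to the system instructions (hypothetical layout)."""
    return f"{instructions}\n[internal marker: {canary}]"


def _normalise(text: str) -> str:
    # Drop case, spaces, and punctuation that a model might add
    # when echoing the marker back.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def leaks_canary(model_output: str, canary: str) -> bool:
    """Return True if the response contains the marker in any spacing or case."""
    return _normalise(canary) in _normalise(model_output)
```

A hit would typically block the response and raise an alert. The marker is a detection aid, not a protection: anything sensitive still belongs outside the prompt.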
**Technical Root Causes:** System prompt leakage often occurs due to insufficient separation between system instructions and user context, inadequate prompt injection defenses, or design patterns that rely on prompts for security controls rather than architectural enforcement. The vulnerability is exacerbated by the attention mechanisms in transformer architectures, which may inadvertently give prominence to system instructions when processing adversarial inputs. **Architectural Mitigations:** - Implement system-level controls that operate independently of the model's prompt processing. - Use external validation and authorization systems rather than relying on prompt-based instructions for security enforcement. Design *prompt architectures* that minimize the inclusion of sensitive information in system prompts. Implement *dynamic prompt generation* that adapts system instructions based on user context and authorization levels. ### LLM08: Vector and Embedding Weaknesses - Securing RAG Systems Vector and embedding security requires specialized approaches due to the mathematical properties of embedding spaces and the unique characteristics of similarity-based retrieval systems. **Technical Attack Vectors:** Embedding inversion attacks use mathematical techniques to reconstruct original text from embedding vectors. These attacks exploit the fact that high-dimensional embeddings often retain more information about the original text than intended. Cross-context contamination in multi-tenant vector databases can occur when embedding spaces overlap, allowing queries from one tenant to retrieve semantically similar content from another tenant's data. **Advanced Security Techniques:** - Implement embedding space partitioning using techniques such as random projection or adversarial training to create tenant-specific embedding subspaces. - Use homomorphic encryption for embedding storage when complete isolation is required. 
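The cross-tenant contamination described above can also be reduced with a simpler control than embedding-space partitioning: a hard tenant filter applied before any similarity scoring, so another tenant's vectors are never candidates. The following is a minimal in-memory sketch of that swapped-in technique, not a vector database API; `Record`, `cosine`, and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    tenant_id: str
    text: str
    vector: tuple[float, ...]


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, tenant_id, query_vector, top_k=3):
    """Rank only the caller's own records by similarity to the query."""
    # Filter first: records owned by other tenants are never scored,
    # so a close semantic match cannot leak across the boundary.
    candidates = [r for r in records if r.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r.vector, query_vector),
        reverse=True,
    )
    return ranked[:top_k]
```

The important design choice is that `tenant_id` comes from the authenticated session, never from the query text or model output.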
Deploy *query monitoring systems* that analyze retrieval patterns to detect potential information leakage or unauthorized access attempts. Implement *differential privacy mechanisms* for embedding generation to limit information leakage. ### LLM09: Misinformation - Technical Approaches to AI Reliability Misinformation mitigation requires technical approaches that can assess the accuracy and reliability of AI-generated content in real-time. **Technical Detection Methods:** Implement fact-checking pipelines using multiple information sources and consistency checking algorithms. Use uncertainty quantification techniques to assess model confidence and flag low-confidence outputs for human review. Deploy ensemble methods that combine multiple models or information sources to improve accuracy and detect potential misinformation through disagreement analysis. **Implementation Strategies:** - Integrate external knowledge bases and real-time fact-checking APIs into the generation pipeline. - Implement semantic consistency checks that validate generated content against authoritative sources. Use *calibration techniques* to improve model confidence estimates and implement threshold-based filtering for uncertain outputs. ### LLM10: Unbounded Consumption - Resource Management and Rate Limiting Unbounded consumption attacks require sophisticated rate limiting and resource management approaches that account for the variable computational costs of different LLM operations. **Technical Implementation Challenges:** LLM inference costs vary significantly based on input length, complexity, and model architecture. Traditional rate limiting based on request counts may be inadequate for preventing resource exhaustion attacks. Model extraction attacks use carefully crafted query patterns to extract model behavior while staying within apparent usage limits. These attacks require detection systems that analyze query patterns rather than just volume. 
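Because cost varies so much per request, a limiter can charge each call by its estimated cost instead of counting calls. The sketch below is a token bucket whose budget is spent in cost units; the `estimate_cost` heuristic (about four characters per token, output weighted twice) is an illustrative assumption, not a measured figure, and `CostBucket` is a hypothetical name.

```python
import time


class CostBucket:
    """Token bucket whose budget is spent in estimated cost units, not requests."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._level = capacity
        self._last = clock()

    def try_spend(self, cost: float) -> bool:
        """Refill for elapsed time, then deduct cost if the budget allows it."""
        now = self._clock()
        elapsed = now - self._last
        self._level = min(self.capacity, self._level + elapsed * self.refill_per_second)
        self._last = now
        if cost > self._level:
            return False  # reject: this caller has exhausted its budget
        self._level -= cost
        return True


def estimate_cost(prompt: str, max_output_tokens: int) -> float:
    # Assumed heuristic: ~4 characters per input token, output tokens cost 2x.
    return len(prompt) / 4 + 2 * max_output_tokens
```

With one bucket per API key, a single very long prompt drains the budget as fast as many short ones, which a request counter would miss.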
**Advanced Protection Mechanisms:** - Implement dynamic rate limiting based on computational cost estimation rather than simple request counting. - Use machine learning models trained to detect extraction attempts based on query pattern analysis. Deploy *resource isolation techniques* using containerization and resource quotas to limit the impact of resource exhaustion attacks on overall system availability. ## Implementation Roadmap for Technical Teams ### Phase 1: Assessment and Foundation (Months 1-2) Begin with comprehensive threat modeling specific to your LLM architecture. Inventory all LLM touchpoints, data flows, and integration points. Assess current security controls against the OWASP Top 10 framework. Implement basic monitoring and logging infrastructure to establish baseline visibility into LLM operations. Deploy initial input validation and output sanitization controls for the most critical attack vectors. ### Phase 2: Core Security Controls (Months 3-6) Deploy advanced prompt injection defenses using multi-model validation approaches. Implement comprehensive access controls and permission management for agentic systems. Establish secure supply chain processes for model management, including cryptographic verification and automated security scanning. Deploy vector database security controls for RAG systems. ### Phase 3: Advanced Defenses and Monitoring (Months 6-12) Implement machine learning-based attack detection systems for sophisticated threats such as model extraction and poisoning attempts. Deploy differential privacy mechanisms for sensitive data protection. Establish automated incident response capabilities for LLM-specific security events. Implement comprehensive security metrics and reporting for stakeholder visibility. 
### Integration with Existing Security Infrastructure LLM security controls must integrate seamlessly with existing security information and event management (SIEM) systems, identity and access management (IAM) platforms, and security orchestration tools. Develop custom detection rules and correlation logic for LLM-specific attack patterns. Implement automated response workflows that can isolate compromised LLM systems while maintaining operational continuity. ## Monitoring and Detection Strategies ### Technical Metrics and Alerting Implement comprehensive monitoring covering prompt injection attempt rates, unusual output patterns, resource consumption anomalies, and access pattern deviations. Use statistical analysis and machine learning models to establish baselines and detect anomalies. Deploy real-time alerting for critical security events such as suspected model extraction attempts, significant prompt injection campaigns, or unauthorized access to sensitive embeddings. ### Incident Response Procedures Develop LLM-specific incident response procedures that account for the unique characteristics of AI system compromises. Include procedures for model quarantine, training data investigation, and assessment of potential data exposure. Establish communication protocols for coordinating with AI model vendors and cloud providers during security incidents. Develop forensic capabilities for analyzing LLM attack patterns and impact assessment. ## Conclusion The OWASP Top 10 for LLM Applications 2025 provides essential technical guidance for securing AI systems in production environments. The framework addresses the fundamental security challenges inherent in probabilistic AI systems while providing practical implementation guidance for engineering teams. Success in LLM security requires a comprehensive approach that combines traditional security engineering principles with AI-specific techniques. 
Organizations must invest in specialized security capabilities, develop new monitoring and detection approaches, and establish incident response procedures tailored to AI system characteristics. The rapidly evolving nature of LLM technology demands continuous adaptation of security approaches. Engineering teams must maintain awareness of emerging threats, participate in the security research community, and implement flexible architectures that can accommodate new defensive techniques as they become available. As LLM systems become increasingly critical to business operations, the investment in comprehensive security controls becomes essential for maintaining operational resilience and protecting sensitive data. The OWASP framework provides the foundation for building secure, reliable AI systems that can operate safely in production environments. ----- ## **FAQ** **Q: How do we implement effective prompt injection detection in real-time systems?** A: Implement multi-layered detection using embedding similarity analysis, pattern matching for known attack vectors, and secondary LLM validation. Use streaming analytics platforms to process inputs in real-time with sub-100ms latency requirements. Consider implementing circuit breakers that automatically restrict functionality when attack patterns are detected. **Q: What are the performance implications of implementing comprehensive LLM security controls?** A: Security controls typically add 10-30% latency overhead depending on implementation complexity. Input validation and output sanitization have minimal impact, while multi-model validation and cryptographic operations require more significant resources. Implement caching strategies and asynchronous processing where possible to minimize user-facing impact. 
**Q: How should we handle security for fine-tuned models and LoRA adapters?** A: Implement cryptographic signing for all model artifacts, maintain detailed provenance tracking, and deploy automated security scanning for adapter modules. Use sandboxed environments for testing new adapters and implement rollback capabilities for compromised models. Consider implementing adapter validation using baseline model comparisons. **Q: What monitoring metrics are most critical for detecting LLM-specific attacks?** A: Monitor prompt injection attempt rates, output entropy changes, resource consumption per query, embedding retrieval pattern anomalies, and cross-tenant access attempts. Implement statistical baselines for normal operation and alert on significant deviations. Use machine learning models to detect subtle attack patterns that rule-based systems might miss. ### June 2025 ### What Is Retrieval-Augmented Generation (RAG)? > *Imagine if your AI could check its facts before answering.* That’s the power of Retrieval-Augmented Generation (RAG), a framework that adds real-time context to AI responses, improving accuracy, reducing hallucinations, and unlocking new use cases for businesses. --- ## What Is RAG? **RAG = LLM + Real-Time Data** Retrieval-Augmented Generation enhances a large language model (LLM) by connecting it to a retriever that pulls relevant data from a knowledge base *before* the model generates a response. The result? Answers that are grounded in context and customized to your business, product, or user. --- ## How RAG Works RAG follows a simple, powerful loop: 1. **User prompt** → “Why are hotel prices in Sydney high this weekend?” 2. **Retriever searches a knowledge base** → Pulls context from news, support docs, or databases. 3. **Prompt is augmented** → Combines the user query with retrieved information. 4. **LLM generates the final answer** → Now grounded in trusted, up-to-date data.
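The four steps above can be sketched in a few lines. This is a toy illustration under stated assumptions: the knowledge base entries are made-up sample text, retrieval uses simple word overlap as a stand-in for a real embedding search, and `llm` is any callable you supply that takes a prompt and returns text. All function names are hypothetical.

```python
import re
from typing import Callable

# Made-up sample documents, for illustration only.
KNOWLEDGE_BASE = [
    "Sydney hotel occupancy is near capacity this weekend because of a major festival.",
    "Refunds are processed within five business days.",
    "Melbourne tram timetables change on public holidays.",
]


def words(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Step 2: rank documents by how many words they share with the query."""
    q = words(query)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Step 3: combine the retrieved passages with the user's question."""
    sources = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


def answer(query: str, docs: list[str], llm: Callable[[str], str]) -> str:
    """Steps 1-4: take the prompt, retrieve, augment, then generate."""
    return llm(augment(query, retrieve(query, docs)))
```

Swapping `retrieve` for a vector search changes nothing else in the loop, which is why the knowledge base can be updated without retraining the model.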
--- ## Business Benefits of RAG - **Fresher responses**: No need to retrain the LLM; just update your data. - **Domain-specific knowledge**: Pulls info from your own documents and systems. - **Fewer hallucinations**: Adds grounding context so the model doesn’t guess. - **Built-in citations**: Users can trace answers back to sources. --- ## Where RAG Shines - **Customer service chatbots** with accurate product and policy info - **Coding assistants** that know your repos and functions - **Legal or medical tools** grounded in vetted source material - **Search assistants** that go beyond links to deliver answers - **Personal AI tools** that understand your files, calendar, and inbox --- ## Inside a RAG System

| Component | What It Does |
|--------------------|--------------------------------------------------------------|
| **LLM** | Generates the response |
| **Retriever** | Finds relevant documents |
| **Knowledge Base** | Stores your trusted content (PDFs, docs, articles) |
| **Vector DB** | Enables fast, semantic document search (optional, but ideal) |

--- ## Key Considerations - **Latency**: Retrieval adds an extra lookup before generation, so every request takes longer than a direct model call. - **Context limits**: LLMs can only process so much text, so retrieved passages must fit within the context window. - **Retrieval quality**: Poor ranking = irrelevant context. - **Data privacy**: Be careful what you expose to the retriever. --- ## Why Be Excited About RAG RAG unlocks the next generation of intelligent, real-time AI systems. From personalized assistants to AI-driven support, RAG bridges the gap between static model training and dynamic business needs. If you're building AI for a domain with lots of private or fast-changing info, **RAG is not optional; it’s essential**. ### Neuroparenting, Infant Vision, and the Power of Generative AI # Heibaika: Neuroparenting, Infant Vision, and the Power of Generative AI ## Introduction Neuroparenting, the practice of integrating neuroscience into parenting strategies, is reshaping how modern families approach child development.
Among these evolving methods is **Heibaika**, a Taiwanese tool that uses black-and-white cards to stimulate visual and cognitive growth in newborns (2). As parenting becomes increasingly influenced by science, the intersection of **Generative AI** and neurodevelopmental tools like Heibaika presents new opportunities—and new challenges—for families worldwide. This article explores the science, cultural implications, and future potential of Heibaika in the age of AI-powered parenting. --- ## Understanding Heibaika: The Science Behind Black-and-White Cards ### What is Heibaika? Heibaika refers to black-and-white visual stimulation cards used predominantly by Taiwanese parents to support the early cognitive and visual development of infants under three months old (2). These cards leverage infants’ natural attraction to high-contrast visuals, which are crucial during the early stages of visual system maturation (7). ### Why Black and White? Newborns are born with limited visual acuity, roughly **20/400** vision (1), and struggle to distinguish fine details or soft colors. However, they are particularly responsive to **high-contrast patterns** (7), which captivate their attention and promote **visual tracking and neural connection building** during a critical window of brain development (8). --- ## Origins and Cultural Significance ### The Rise of Heibaika in Taiwan Heibaika gained traction in Taiwan during the early 21st century, driven by increasing parental anxiety about child development and the growing popularity of neuroparenting concepts (3, 4). Online parenting forums and postpartum care centers played pivotal roles in popularizing this practice. ### Neuroparenting and Cultural Pressures Heibaika reflects a broader cultural shift towards **scientifically guided parenting**, where families seek tools and techniques to optimize infant development (11). However, this trend also introduces **ethical concerns and societal pressures** (11, 12). 
Parents may feel compelled to adopt these tools out of fear of missing critical developmental opportunities for their children. --- ## How Heibaika Stimulates Cognitive Growth ### Mechanisms of Action - **Visual Engagement:** Heibaika’s high-contrast patterns stimulate visual attention, enhancing infants' ability to track objects and focus on stimuli (3, 7). - **Neural Activation:** Frequent exposure may promote synaptic connections and support cognitive processes such as curiosity and problem-solving (11, 15). - **Foundation for Play:** Early visual stimulation encourages exploratory behaviors that are essential for cognitive, emotional, and motor development (21). --- ## Generative AI: Revolutionizing Infant Cognitive Research ### AI-Powered Insights into Infant Development Generative AI is emerging as a powerful research tool capable of **simulating infant brain responses** to visual stimuli like Heibaika patterns (6). By creating virtual models of the infant mind, researchers can: - Analyze how specific visual patterns influence brain activity (7). - Identify which designs are most effective for cognitive stimulation (8). - Personalize visual experiences based on individual infants' developmental responses (9). ### Data-Driven Optimization AI can process large datasets from infant studies, uncovering **hidden trends and learning preferences** (22). This can refine Heibaika designs to align more closely with infants’ developmental stages, potentially enhancing their effectiveness. ### Towards Personalized Neuroparenting The future may see **AI-curated visual stimulation regimens**, uniquely tailored to each infant’s needs (9). Such advancements could offer parents scientifically backed, adaptive strategies that evolve with their child’s growth (10). --- ## Implications for Modern Parenting ### The Role of Play in Early Development - **Play is Foundational:** Responsive play enhances neural connectivity, emotional security, and social skills (23). 
- **Caregiver Engagement Matters:** Frequent, attuned interactions with parents support optimal learning (11). - **Designing a Safe Environment:** Calm, supervised spaces promote stress-free exploration and effective cognitive engagement (11). ### Family Participation Enhances Learning Including siblings and extended family in infant play encourages social learning and strengthens family bonds (11). Shared routines, such as reading or singing, can become powerful developmental tools. ### Observational Learning and Responsive Caregiving - **Tracking Milestones:** Parents should observe infants' progress and tailor play accordingly (24). - **Wait Time:** Allowing infants time to process and respond supports autonomy and critical thinking (25). - **When to Seek Help:** Pediatric screenings can guide early interventions if developmental concerns arise (24). --- ## Ethical Considerations and Cross-Cultural Perspectives ### Balancing Science and Parenting Freedom While Heibaika offers exciting possibilities, it also raises concerns: - **Parental Pressure:** The drive to adopt scientific parenting tools may inadvertently create stress and guilt (11). - **Ethical Research:** Transparent communication about the use of infant studies and AI-driven recommendations is essential (12). ### Global Adaptation and Cultural Sensitivity Different cultures may interpret neuroparenting strategies like Heibaika through unique lenses (2). Cross-cultural research is vital to ensure **practices remain adaptable and respectful of local parenting traditions** (27). --- ## Future Directions: Research and Application ### Longitudinal Studies Needed To fully understand Heibaika’s long-term impact, **extended studies tracking infants into later childhood** are crucial (28, 29). Such research can validate or challenge the claimed cognitive benefits of early visual stimulation. 
### AI-Inspired Cognitive Models Insights from infant learning could inform **new, more efficient AI systems** that mirror human developmental processes (6), creating a feedback loop where studying babies advances artificial intelligence, and AI in turn enhances child development tools. --- ## Conclusion Heibaika represents more than a set of visual cards—it embodies the evolving intersection of neuroscience, technology, and parenting. The integration of Generative AI offers promising pathways for refining infant developmental tools, but it also calls for thoughtful consideration of cultural, ethical, and emotional factors in parenting. As families navigate these emerging landscapes, one thing remains clear: nurturing a child’s growth is a deeply personal journey, where both science and sensitivity must walk hand in hand. --- ## FAQ ### What is Heibaika? Heibaika are black-and-white visual stimulation cards used by parents, especially in Taiwan, to support the cognitive and visual development of newborns (2). ### Why are infants drawn to high-contrast patterns? Newborns have underdeveloped vision and are naturally attracted to high-contrast patterns, which are easier for them to see and help stimulate brain development (7). ### How does Generative AI improve Heibaika research? Generative AI can model infant brain responses to visual stimuli, helping researchers identify the most effective patterns and enabling personalized developmental strategies (6, 7, 8). ### Are there ethical concerns with Heibaika? Yes. The growing emphasis on neuroparenting may increase parental pressure and anxiety (11). Ethical considerations also include the transparency of research involving infants (12). ### Can Heibaika replace traditional parenting practices? No. While Heibaika can complement early cognitive stimulation, it should not replace responsive caregiving, play, and social interaction, which are foundational to a child’s development (11, 23). 
--- ## References (1) [Infant visual development - Wikipedia](https://en.wikipedia.org/wiki/Infant_visual_development) (2) [Infant perception | Visual, Auditory & Cognitive Development ](https://www.britannica.com/topic/infant-perception) (4) [Infant vision development: Helping babies see their bright futures! ](https://www.canr.msu.edu/news/infant_vision_development_helping_babies_see_their_bright_futures) (6) [Infant Visual Attention and Object Recognition - PMC](https://pmc.ncbi.nlm.nih.gov/articles/PMC4380660/) (7) [High Contrast Images for Baby: How They Support Infant Vision](https://www.tinyhood.com/expert-articles/baby/high-contrast-images-for-baby-how-they-support-infant-vision) (8) [Visual Stimulation for Newborns - Ask Dr Sears](https://www.askdrsears.com/topics/parenting/child-rearing-and-development/bright-starts-babys-development-through-interactive-play/playtime-articles/visual-stimulation-newborns/) (9) [How Babies Discover the World Through Visual Perception](https://www.habausa.com/blogs/blog-inspiration/how-babies-discover-the-world-through-visual-perception?srsltid=AfmBOootcRrwoMgsNG3Dj2lDk_imKuoW3rBXrAgRnjGw8Hdl4LZHdNcp) (10) [Why is human vision so poor in early development? The impact of](https://www.biorxiv.org/content/10.1101/2022.06.22.497205v1.full-text) ... (11) [Beyond black and white: heibaika, neuroparenting, and ... - Mendeley](https://www.mendeley.com/catalogue/aa9a9d1d-56f8-3d71-8978-cb33236c6f04/) (12) [A baby and heibaika. Photo by the author - ResearchGate](https://www.researchgate.net/figure/A-baby-and-heibaika-Photo-by-the-author_fig1_338176504) (15) [Neuropsychological development of newborns, infants, and toddlers](https://psycnet.apa.org/record/2010-23861-002) ... (21) [What Is Generative AI (GenAI)? How Does It Work? 
- Oracle](https://www.oracle.com/artificial-intelligence/generative-ai/what-is-generative-ai/) (22) [Play ideas & newborn cognitive development](https://raisingchildren.net.au/newborns/play-learning/play-ideas/thinking-play-newborns) (23) [Cognitive Development: Infants and Toddlers - Virtual Lab School](https://www.virtuallabschool.org/infant-toddler/cognitive-development/lesson-2) (24) [The Environment: Schedules and Routines | Virtual Lab School](https://www.virtuallabschool.org/infant-toddler/learning-environments/lesson-5) (25) [Beyond black and white: heibaika, neuroparenting, and lay](https://scholar.nycu.edu.tw/en/publications/beyond-black-and-white-heibaika-neuroparenting-and-lay-neuroscien) ... (27) [In The Journals, March 2021, Part 1 - Somatosphere](https://somatosphere.com/2021/in-the-journals-march-2021-part-1.html/) (28) [Neuropsychology's Role in Multidisciplinary Follow-Up Care](https://www.sciencedirect.com/science/article/abs/pii/S0887899425000165) of ... (29) [The Development of Attention Systems and Working Memory](https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2016.00015/full) in ... --- Article generated with [Stanford's STORM](https://storm.genie.stanford.edu) and ChatGPT. ### May 2025 ### How to Learn Faster with AI # Learn Faster with AI through Rapid Exploration, Smart Depth and Deep Retention Artificial intelligence can dramatically accelerate your learning—if you know how to use it well. This framework, built around three key learning phases, helps you learn faster, deeper, and more efficiently by leveraging AI as your personal research assistant, tutor, and coach. ## 1. Rapid Exploration When learning something new, it's essential to first explore widely before diving deep. AI can help you quickly map the landscape, uncover key concepts, and identify differing perspectives.
**Why It Matters** Starting with broad sampling helps you avoid tunnel vision, build context, and choose the right learning paths without wasting time on irrelevant details. **How to Apply It** * Use AI-generated summaries to quickly scan books, papers, and articles. * Ask AI to present contrasting viewpoints or highlight major debates to challenge your assumptions. * Generate curated lists of resources across beginner, intermediate, and expert levels. **Example AI Prompts** * “Give me a one-paragraph summary of the top 5 frameworks for [topic].” * “What are the strongest critiques of [topic]?” * "Summarize the key concepts, opposing views, or trends in [topic]. Provide 3-5 high-quality resources for further exploration." **AI’s Role:** Rapidly curate, compare, and compress large volumes of information. ## 2. Smart Depth Once you’ve explored broadly, shift into focused, efficient deep dives—only where it matters. AI can guide you to learn precisely what you need without getting lost in the weeds. **Why It Matters** This step helps you move from passive exposure to active understanding while saving time by targeting your learning to essential concepts. **How to Apply It** * Use AI to break down complex topics into bite-sized parts and structured learning paths. * Request explanations in multiple formats—plain English, analogies, diagrams. * Build active learning cycles: quizzes, flashcards, and practice problems. **Example AI Prompts** * “Explain this like I’m five, then like I’m an expert.” * “What’s the minimum I need to know to understand [topic]?” * "Break down [concept] into smaller parts. Teach me step-by-step. Provide diagrams or practice problems." **AI’s Role:** Teach, clarify, visualize, and test your understanding in real time. ## 3. Deep Retention Learning is only useful if it sticks. In this phase, AI helps you reinforce knowledge, test yourself rigorously, and simulate real-world application. 
**Why It Matters** Without active recall and practice, most information quickly fades. Deep retention makes your learning durable and usable. **How to Apply It** * Use AI to generate spaced repetition flashcards and self-quizzing schedules. * Simulate exams, real-world tasks, or teach-back exercises to consolidate learning. * Summarize concepts in your own words and let AI critique or refine your explanations. **Example AI Prompts** * “Quiz me on these concepts using multiple-choice and open-ended questions.” * “Help me teach this concept to a beginner.” * "Create a quiz on [topic]. Generate flashcards. Simulate a teach-back exercise where I explain the concept and you critique it." **AI’s Role:** Reinforce, simulate, and provide targeted feedback to strengthen retention. ## The Power of Micro-Cycles The fastest learning happens when you cycle rapidly between these three phases: Sample → Delve → Absorb → Repeat. AI can track your progress, surface knowledge gaps, and suggest what to revisit next. You can even build AI-powered learning dashboards using tools like Notion and GPT APIs to automate your cycles and measure your growth. --- ## AI Prompts for Learning Faster with AI ### 📂 Phase 1: Rapid Exploration Rapid exploration and big-picture understanding **General Discovery** * "Give me a concise overview of [topic]." * "List the top 5 subtopics within [topic] that I should know about." * "Summarize the current trends and debates in [topic]." **Contrasting Views** * "What are the most common misconceptions about [topic]?" * "Summarize opposing opinions on [topic] with evidence from both sides." * "What’s the biggest controversy in [topic] and why does it matter?" **Resource Curation** * "List the best books, podcasts, and online courses for beginners, intermediates, and experts in [topic]." * "Curate the top articles and explain why each one is useful for learning [topic]."
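The Phase 1 prompts above all use bracketed placeholders such as `[topic]`, so they can be filled in programmatically before being pasted into a chat or sent to a model. The following is a minimal Python sketch of that idea, not part of the original framework: `fill_prompt` and `PHASE_1_PROMPTS` are hypothetical names introduced for illustration, and no model API is called.

```python
import re
from typing import Mapping

# A few of the Phase 1 prompts from the list above (illustrative subset).
PHASE_1_PROMPTS = [
    "Give me a concise overview of [topic].",
    "List the top 5 subtopics within [topic] that I should know about.",
    "What are the most common misconceptions about [topic]?",
]

# Matches a bracketed placeholder such as [topic] or [concept A].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_prompt(template: str, values: Mapping[str, str]) -> str:
    """Replace every [name] placeholder with values[name].

    A missing value raises KeyError, so a half-filled prompt is
    never produced silently.
    """
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    for template in PHASE_1_PROMPTS:
        print(fill_prompt(template, {"topic": "retrieval-augmented generation"}))
```

Passing the values as a mapping rather than keyword arguments is a deliberate choice here, because some placeholders in the later phases (for example `[concept A]`) contain spaces.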
### 🛠️ Phase 2: Smart Depth Smart depth, structured progression **Concept Breakdown** * "Explain [concept] like I’m five, then like I’m an expert." * "Break down [topic] into prerequisite concepts in order of learning." * "Summarize [subtopic] using bullet points, diagrams, and practical examples." **Visualization** * "Create a simple diagram or flowchart to explain [process/concept]." * "Compare [concept A] and [concept B] using a visual table with pros and cons." **Application & Problem Solving** * "Create 5 practice questions (multiple-choice and short-answer) about [topic]." * "Show a step-by-step solution to a real-world problem involving [concept]." * "Write a realistic scenario where I can apply [concept] in practice." ### 🧠 Phase 3: Deep Retention Retention, testing, and mastery **Active Recall** * "Test me on [topic] with a mix of multiple-choice, true/false, and open-ended questions." * "Generate Anki-style flashcards for [topic]. Front: Question. Back: Answer." **Teach-Back** * "I will explain [concept] in my own words. Please critique my explanation and identify gaps or inaccuracies." * "Ask me to teach this topic to a beginner. Simulate a Q&A where I must answer the beginner’s questions." **Simulated Tasks** * "Create a complex, real-world task that requires me to apply [topic] to solve a problem." * "Simulate an exam covering [topic] and provide detailed feedback on my answers." **⚙️ Optional Meta-Prompts for Learning Optimization** * "What are the most common learning mistakes in [topic]?" * "Based on my current understanding, what should I focus on next?" * "Create a daily study schedule using spaced repetition for the next 30 days." * "Track my progress and suggest review points based on what I’ve already studied." ### April 2025 ### ChatGPT Models Guide ## So Many ChatGPT Models — What Do They All Do? 
If you’ve ever opened ChatGPT and found yourself staring at a dropdown menu filled with names like GPT-4o, GPT-4.5, o3, or o4-mini, you’re not alone. With OpenAI introducing multiple AI models tailored to different tasks, it’s easy to get overwhelmed by the choices. But here’s the good news — each model has a unique specialty, and once you understand what each one is designed for, picking the right one becomes effortless. In this guide, we’ll break down the current ChatGPT lineup in plain English. Whether you’re creating content, solving complex problems, building automations, or just looking for a dependable AI assistant, there’s a model built for exactly that. ## GPT-4o — The Reliable All-Rounder GPT-4o is the flagship model for most users, and for good reason. It’s versatile, intelligent, and capable across a wide range of tasks. Whether you're writing emails, generating blog posts, coding, analyzing documents, or even working with images, GPT-4o can handle it smoothly. Think of it as your all-purpose digital teammate, one you can trust to deliver quality output every time. What sets GPT-4o apart is its balance. It’s not the absolute fastest or the most specialized, but it strikes the perfect middle ground — fast enough to keep up with your workflow and smart enough to understand nuanced prompts. Plus, it’s one of the few models with multimodal capabilities, meaning it can interpret and respond to images as well as text. If you only use one model, GPT-4o is a safe and powerful bet. ## GPT-4o with Scheduled Tasks — Your AI Project Manager While GPT-4o is excellent on its own, there’s a version that goes one step further: GPT-4o with scheduled tasks. This version combines all the capabilities of the original model with a powerful new feature — memory and task scheduling. Imagine telling your AI, “Remind me to post my blog every Friday,” or “Send me a new business idea each morning,” and having it actually follow through on those commands. 
That’s what this model offers. This version is perfect for solopreneurs, creators, or anyone juggling recurring responsibilities. It acts like a proactive assistant who doesn’t just wait for instructions — it helps you stay organized and consistent over time. If you’ve ever wanted a personal AI that keeps your calendar ticking and your projects moving, this is the model for you. ## GPT-4.5 (Preview) — The Deep Creative Thinker Currently in preview mode, GPT-4.5 is shaping up to be the go-to model for deep, strategic, and long-form thinking. Unlike models optimized for speed or utility, GPT-4.5 takes its time and dives deep into the task at hand. It’s ideal for complex writing, such as essays, market analysis, business planning, or any task that requires layered thought and nuance. What makes GPT-4.5 stand out is its ability to process and synthesize large amounts of information into well-structured, insightful responses. It’s not just about being right — it’s about being thorough, thoughtful, and contextually aware. If you’re working on big-picture projects or want help developing ideas from scratch with depth and clarity, GPT-4.5 is your AI co-strategist. ## The o-Series — Your Brainy Problem-Solvers The “o” models are designed with a different focus: logic, reasoning, and structured problem-solving. These models excel in technical, analytical, and highly logical tasks where step-by-step reasoning is essential. Each one serves a slightly different need: - **o3** is the heavyweight in the group, best suited for handling complex, multi-step problems like data analysis, algorithm design, or building smart strategies. It’s your behind-the-scenes consultant who thrives on complexity. - **o4-mini** is a lightweight, faster version meant for day-to-day tasks where speed and efficiency are key. It might not be as deep as o3, but it gets the job done quickly and with reasonable accuracy. 
- **o4-mini-high** bridges the gap by offering greater accuracy than o4-mini, albeit with slightly longer processing times. This makes it ideal when you need precision but don’t want to go all the way up to a heavier model like o3. If you work in coding, decision-making, data interpretation, or process optimization, these are the models to explore. ## TLDR: Which ChatGPT Model Should You Use? With so many models now available, the key is to think in terms of roles rather than names. Each model represents a different type of teammate: - **GPT-4o** is your versatile generalist — good at nearly everything you throw its way. Context window of 128k tokens. - **GPT-4o + Tasks** is your project manager — keeping things on track and running consistently. - **GPT-4.5** is your strategist — slow, deliberate, and deeply thoughtful. - **o3** is your expert consultant — analytical, precise, and great at tackling complexity. Context window of 200k tokens. - **o4-mini** is your quick executor — speedy and nimble for everyday tasks. Context window of 200k tokens. - **o4-mini-high** is your careful planner — taking a bit more time to ensure everything is just right. So next time you're faced with that list of models, don’t panic. Just ask yourself: “*What do I need help with today?*” and pick the model that matches the role. The more you get to know them, the more powerful — and productive — your AI toolkit becomes. _Each model has an underlying context window, which limits how much text it can handle in a single interaction, counting both input and output.
For example, GPT-4o has a context window of 128,000 tokens, but hidden assistant instructions, RAG pipelines, and other methods that OpenAI or Azure's AI search system use to add data to the prompt all consume part of it, leaving less room for your own content._ - 1 token ~= 4 chars in English - 100 tokens ~= 75 words OpenAI has a [helpful guide](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) on how to estimate token counts. As rough guidelines, an A4 page is about 600 tokens and a novel is about 100K tokens. You can also try a token translator tool [here](https://michaelcurrin.github.io/token-translator/). _(by the time you read this, there may be even more models available, so keep an eye out for updates!)_ Cross-published on [engineering.prompt.cards](https://engineering.prompt.cards/en/posts/openai-chatgpt-models-guide/) ### March 2025 ### February 2025 ### AI adoption and organizational resistance # AI adoption and organizational resistance AI tool adoption in enterprises is outpacing the development of policies to govern their use. This creates friction: leadership pushes for integration while employees range from eager experimentation to active resistance. Organizational behavior research suggests that resistance patterns contain useful information about security vulnerabilities and adoption risks. ## Skeptics as quality control Employees who resist AI adoption — sometimes labeled "Cautious Chris" in organizational persona frameworks — are often treated as obstacles. This framing misses their function. Skeptics tend toward high attention to detail and preference for established workflows. These traits slow integration but also catch compliance gaps and policy violations that enthusiasts overlook. The tradeoff: complete tool avoidance creates skill gaps as competitors adopt new capabilities. Effective onboarding for skeptics uses role-specific training tied to their existing job functions. Low-risk use cases let them apply their scrutiny to the adoption process itself.
When skeptics eventually adopt a tool, they have already stress-tested it against existing workflows. ## Persona assignment in prompt engineering LLMs respond differently depending on the framing of the request. Assigning a specific role to the model reduces output ambiguity. A prompt like "You are a copyeditor and your job is to correct spelling and grammar mistakes without changing the meaning of the text" constrains the model's interpretation of the task. This technique requires iteration. Users refine prompts by adding context, specifying output format, and providing examples of desired results. The SANS Institute documentation on prompt engineering describes this as guiding the AI's output through explicit role assignment. The same principle applies to understanding employee behavior. Different user archetypes interact with AI tools in predictable ways, and those patterns have security implications. ## Enthusiast adoption and shadow AI Early adopters — the "Trailblazing Tom" archetype — find novel applications and automate repetitive tasks. They also create shadow AI problems: tools deployed without IT approval, data shared with external services without review, workflows that bypass established controls. Enthusiast behavior generates friction with colleagues who lack technical confidence. It also opens security gaps. A multilayered response involves governance policies, tool identification, and access controls. One practical approach: pair enthusiasts with skeptics during tool evaluation. The skeptic's tendency to question assumptions checks the enthusiast's tendency to prioritize speed over compliance. ## Pattern recognition without ethical reasoning AI systems process large datasets and detect statistical anomalies. In security applications, this means identifying potential threats, generating threat intelligence, and automating routine incident response. These capabilities operate through pattern matching. 
AI lacks contextual understanding and cannot evaluate ethical implications. Automated systems flag anomalies; humans must interpret what those anomalies mean and decide how to respond. Security teams that defer entirely to AI-generated alerts lose the judgment layer that distinguishes false positives from genuine threats. ## Data leakage in LLM workflows LLMs become more useful with more context. Users naturally want to provide background information, previous documents, and specific details to improve output quality. This creates a data exposure problem. Many public LLM services may use submitted prompts as training data, depending on their terms of service and account settings. When an enthusiast uploads meeting notes, draft reports, or customer information to improve a summary, that [data may become part of the model's training set](/posts/ai-native-security-why-current-guardrails-are-obsolete/). Once ingested into a training set, the data cannot practically be retrieved or deleted. This risk is highest among power users who have learned that detailed prompts produce better results. Technical controls like data loss prevention filters help, but the core issue is user behavior. Policies must specify what data categories can enter external AI systems, and users must understand that prompt submissions are not private. ## Measured adoption The target state is selective tool adoption based on demonstrated value. This means evaluating each use case for benefits and risks before deployment, rather than maximizing the number of AI integrations. Effective adoption draws on multiple perspectives: enthusiasts identify opportunities, skeptics identify risks, and pragmatists weigh tradeoffs. Organizations that optimize for adoption speed alone tend to accumulate security debt and compliance gaps that surface later. ### January 2025 ### LLM Jailbreaking & System Vulnerabilities # LLM Jailbreaking and System Vulnerabilities ## 1.
Introduction Large Language Models (LLMs) represent significant advancements in artificial intelligence, capable of understanding and generating human-like text. However, their widespread adoption has revealed critical vulnerabilities that can be exploited by attackers. This thesis explores the architectural and operational weaknesses of LLMs, their integration with external systems, and the safeguards in place, focusing on attacks like BoN Jailbreaking, Flowbreaking, and context-dependent vulnerabilities. --- ## 2. Key Vulnerabilities in LLM Architecture and Operation ### 2.1. Sensitivity to Input Variations Despite their sophistication, LLMs exhibit a surprising sensitivity to minor input variations. These vulnerabilities are particularly pronounced in modalities like vision and audio, where subtle changes—such as alterations in image colour or audio pitch—can dramatically impact output. This sensitivity is foundational to attacks like **BoN Jailbreaking**, which repeatedly samples augmented versions of harmful requests until one bypasses the model’s safeguards (Hughes et al., 2024). The cross-modal effectiveness of this technique underscores the inherent fragility of LLMs in handling high-dimensional and continuous inputs. ### 2.2. Stochastic Output Generation The stochastic nature of LLM output generation, especially at higher sampling temperatures, introduces another layer of vulnerability. While safety measures aim to prevent harmful responses, the randomness inherent in output generation can occasionally result in the production of unsafe content. By leveraging this unpredictability, BoN Jailbreaking increases the likelihood of eliciting harmful outputs through systematic augmentations (Hughes et al., 2024). This challenge highlights the difficulty of safeguarding models that rely on probabilistic output generation. ### 2.3.
Limitations of Alignment Techniques Advances in alignment methodologies, such as reinforcement learning from human feedback (RLHF), have significantly improved the safety of LLMs. However, these techniques remain susceptible to adversarial attacks or “**jailbreaks**” (Robey et al., 2024). Carefully crafted prompts can exploit alignment inconsistencies, prompting models to generate harmful or undesirable content. The persistence of these jailbreaks, even against commercial LLM systems, emphasises the need for more robust alignment strategies. ### 2.4. Vulnerabilities in System Architecture The integration of LLMs into broader systems introduces additional attack vectors, as attackers can target weaknesses in the architecture and implementation. **Flowbreaking**, a novel class of attacks, exploits these systemic vulnerabilities by manipulating the interaction and synchronization of components. One example, the **Stop and Roll** attack, demonstrates how halting an LLM’s response midway can bypass second-line guardrails, allowing harmful content to persist. This vulnerability underscores the necessity for holistic security measures that account for the entire system’s architecture (Evron, 2024). ### 2.5. Context-Dependent Alignment Challenges The emergence of context-dependent alignment challenges, especially in LLM-controlled robotics, adds complexity to the problem of safeguarding AI systems. Unlike chatbots, which focus on filtering harmful text, robots operate in dynamic physical environments where intent and context significantly influence the potential for harm. For example, a command to “deliver a bomb” is harmful only if the robot possesses such an item. Addressing this issue requires sophisticated alignment mechanisms capable of reasoning about a robot’s physical state and surroundings, further complicating defense strategies (Robey et al., 2024). --- ## 3.
Differentiating Attack Types: Prompt Injection, Jailbreaking, and Flowbreaking ### 3.1. Prompt Injection Attacks: These attacks exploit vulnerabilities in applications built on top of LLMs. By crafting malicious input concatenated with a trusted prompt, attackers manipulate the LLM into performing unintended actions. [**Prompt injection**](/posts/ai-native-security-why-current-guardrails-are-obsolete/) remains a prevalent issue in the design of user-facing AI systems (IEEE Spectrum, 2024). ### 3.2. Jailbreaking Attacks: [**Jailbreaking**](/posts/ai-native-security-why-current-guardrails-are-obsolete/) focuses on bypassing the safety filters embedded within LLMs. Attackers design prompts that exploit alignment loopholes or inconsistencies, enabling the generation of harmful content. These attacks highlight the fragility of current alignment techniques and their limitations in preventing exploitation (Robey et al., 2024). ### 3.3. Flowbreaking Attacks: **Flowbreaking** attacks extend beyond the LLM to target the surrounding system architecture. By exploiting *timing issues*, *synchronisation weaknesses*, or *operational flaws*, these attacks disrupt data flow and application logic. For instance, the **Second Thoughts** attack manipulates response timing, allowing harmful information to leak before guardrails retract the response (Evron, 2024). This broader scope of attack makes **Flowbreaking** particularly dangerous, as it targets not just the model but the entire application environment. --- ## 4. Findings and Implications for LLM Security The findings presented in the sources have significant implications for LLM security and deployment, particularly highlighting the potential for real-world harm stemming from jailbroken LLMs.
- **Jailbroken LLMs pose a critical risk beyond generating harmful text; they can potentially cause physical harm in the real world.** This is especially concerning as many LLM-robot systems are currently deployed in safety-critical applications. One study demonstrated that an automated attack called RoboPAIR successfully jailbroke three different LLM-controlled robots, manipulating them into performing dangerous tasks such as colliding with pedestrians or searching for locations to detonate bombs (Robey et al., 2024)【2†source】. - **LLM-controlled robots may be fundamentally unaligned, even for non-adversarial inputs.** Unlike chatbots, where generating harmful text is generally viewed as objectively harmful, the harmfulness of a robotic action is context-dependent. This makes it difficult to establish clear safety guidelines and necessitates the development of new, robot-specific filters and defense mechanisms (Robey et al., 2024)【2†source】. - **The stochastic nature of LLM outputs and their sensitivity to input variations make them vulnerable even to simple attack algorithms.** The Best-of-N (BoN) jailbreaking algorithm successfully jailbroke a range of frontier LLMs across multiple modalities (text, vision, and audio) by repeatedly sampling augmented prompts until a harmful response was elicited (Hughes et al., 2024)【3†source】. This highlights the need for robust defenses that can withstand a variety of attacks, including those that exploit seemingly innocuous changes to inputs. - **The current reliance on streaming responses in LLM applications poses a security risk, as harmful information may be transmitted before guardrails can effectively intervene.** This underscores the need for enterprises to ensure that LLM answers are fully generated before being displayed to users, despite potential user experience challenges (IEEE Spectrum, 2024)【1†source】. --- ## 5. Recommendations for Holistic LLM Security
1. **Develop context-aware alignment mechanisms for LLM-controlled robots:** This will require considering the robot's environment and the potential consequences of its actions. Specifically, this involves creating advanced reasoning systems that can dynamically assess the intent of a command, evaluate the physical context, and predict potential outcomes before execution. For example, mechanisms could incorporate real-time sensory data and environmental modeling to understand whether a requested action, such as navigating a crowded space, could pose risks to human safety. Integrating these considerations will enable robots to operate ethically and adaptively in diverse, complex scenarios. 2. **Design robust defences specifically for LLM-controlled robots:** These defences should address the unique challenges posed by physical embodiment and context-dependent harm. Robust defence strategies must include real-time anomaly detection systems capable of identifying unexpected robotic behaviours, adaptive safety protocols that dynamically update based on environmental inputs, and multi-layered fail-safes to prevent harm even in the event of system compromise. Furthermore, collaboration with domain experts to tailor defences for specific robotic applications (e.g., autonomous vehicles, medical robots) is critical. These measures will significantly enhance the resilience and operational safety of LLM-controlled robots (Robey et al., 2024)【2†source】. 3. **Investigate and address the sensitivity of LLMs to input variations:** This may involve exploring new defence mechanisms such as *input smoothing*, *adversarial training*, or employing robust *gradient masking* techniques to reduce susceptibility to perturbations (Hughes et al., 2024)【3†source】. Additionally, creating multi-modal training datasets with diverse and noisy inputs can help models generalise better and resist targeted manipulations.
Evaluations using systematic stress testing across multiple modalities (text, vision, audio) are essential for identifying specific vulnerabilities and tailoring defences to the dynamic nature of real-world data. 4. **Implement safeguards to prevent the premature release of harmful information in streaming LLM applications:** This may include delaying the display of responses until they are fully generated and vetted (IEEE Spectrum, 2024)【1†source】. Furthermore, integrating multi-tiered content verification systems that analyse the response at various stages of generation could help identify and mitigate harmful outputs more effectively. Enterprises could adopt real-time monitoring tools to dynamically assess responses for safety violations and reinforce guardrails before final outputs are delivered to users. This approach ensures that user experience challenges are balanced with robust safety mechanisms, providing both functionality and security. --- The rapid integration of LLMs into various societal and industrial domains necessitates a proactive approach to addressing their vulnerabilities. Key findings, such as the potential for real-world harm from RoboPAIR jailbreaks or the exploitability of stochastic outputs, highlight the pressing need for robust defences. By adopting comprehensive alignment strategies, enhancing system-level safeguards, and ensuring context-aware mechanisms for robots, we can mitigate risks and unlock the transformative potential of LLMs responsibly. Future research must focus on interdisciplinary collaboration and ongoing stress testing to adapt to evolving threats. Together, these measures will help protect against the challenges posed by advanced AI systems while ensuring their safe and ethical deployment. 
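Recommendation 4 above (holding a streamed response until it has been fully generated and vetted) can be illustrated with a minimal Python sketch. The names `release_if_safe`, `toy_policy_check`, and `REFUSAL` are hypothetical and do not come from any of the cited sources. The keyword check is only a stand-in for whatever moderation step a real deployment would use.

```python
from typing import Callable, Iterable

REFUSAL = "Sorry, I can't help with that."  # hypothetical fallback message


def release_if_safe(chunks: Iterable[str], is_safe: Callable[[str], bool]) -> str:
    """Collect every streamed chunk, vet the complete text, then release it.

    Nothing reaches the caller until the whole response has passed the check,
    so there is no partially displayed answer for a guardrail to retract later.
    """
    full_text = "".join(chunks)  # drain the stream before showing anything
    return full_text if is_safe(full_text) else REFUSAL


def toy_policy_check(text: str) -> bool:
    """Placeholder check: a real system would call a moderation model here."""
    blocked_terms = ("detonate", "self-harm")  # illustrative only
    lowered = text.lower()
    return not any(term in lowered for term in blocked_terms)


if __name__ == "__main__":
    stream = iter(["The capital ", "of Australia ", "is Canberra."])
    print(release_if_safe(stream, toy_policy_check))
```

The trade-off is the one the article already notes: the user waits for the full answer instead of seeing tokens arrive, in exchange for closing the window that attacks such as Second Thoughts rely on.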
1† "Robot Jailbreak: Researchers Trick Bots Into Dangerous Tasks", IEEE Spectrum, https://spectrum.ieee.org/jailbreak-llm 2† "Jailbreaking LLM-Controlled Robots", Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas, School of Engineering and Applied Science, University of Pennsylvania. arXiv:2410.13691v2 [cs.RO] 9 Nov 2024 3† "BEST-OF-N JAILBREAKING", John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma, arXiv:2412.03556v2 [cs.CL] 19 Dec 2024 4† "Suicide Bot: New AI Attack Causes LLM to Provide Potential 'Self-Harm' Instructions" , Gadi Evron, Knostic, https://www.knostic.ai/blog/introducing-a-new-class-of-ai-attacks-flowbreaking Download [the PDF version of this article](/img/posts/llm-jailbreaking-and-systems-vulnerabilities.pdf). ### December 2024 ### November 2024 ### Australia's AI Economy: Opportunities and Challenges https://news.microsoft.com/en-au/features/new-research-identifies-australias-most-promising-opportunities-in-the-new-global-ai-economy/ ## Australia's most promising opportunities in AI >"Australia has a solid foundation for AI, with its favourable business environment, strong sustainability credentials and high level of AI readiness all being among the country’s key strengths," (Steven Worrall, Managing Director, Microsoft Australia & New Zealand) ### Key Findings of the Report * **Promising Opportunities** Australia is well-positioned to excel in three critical areas of the AI tech stack: * **Applications:** Building industry-specific applications using existing foundation models is the largest and fastest-growing opportunity. Australian startups can capitalize on foundation models to create bespoke solutions tailored to industry needs. * **AI Datacenters:** With renewable energy resources, land availability, and proximity to Asia, Australia is an ideal location for AI datacenters. 
This sector could generate an estimated A$4.7 billion in annual revenue by 2035. * **Data:** The growing demand for secure and efficient data access for training AI models creates a significant opportunity for Australian data providers and advisory firms. This area is projected to contribute A$1.7 billion in annual revenue by 2035. * **National Priorities** Success in the AI economy will advance four key national priorities: * **Improved Digital Resilience:** Strengthening cybersecurity and data protection capabilities. * **Strengthened Strategic Partnerships:** Collaborating with global AI leaders and regional allies. * **New Export Markets:** Expanding the global reach of Australian AI solutions. * **Improved Global Interoperability:** Ensuring Australian AI systems align with international standards. * **Economic Impact** The widespread adoption of generative AI in Australia could unlock an estimated A$115 billion in annual economic value by 2030. This growth is expected to drive productivity and innovation across sectors like healthcare and financial services. * **Strengths:** * Favourable business environment. * Strong sustainability credentials. * High level of AI readiness. * Thriving startup ecosystem with robust venture capital support. * **Challenges:** * Developing a skilled AI workforce. * Adapting planning and zoning regulations for AI datacenters. * Ensuring secure and reliable renewable energy infrastructure. * Balancing data usage and privacy with growth opportunities. * **Key Actions:** * Streamline approvals for AI datacenter development. * Secure renewable energy sources and firming technologies. * Invest in AI skills development and training programs. * Support research and development in AI, particularly in data-related initiatives. * Foster a regulatory environment that balances risks and opportunities. * Attract foreign direct investment and build strategic partnerships. 
>“This report reaffirms that Australia has a real opportunity to drive growth and build globally relevant AI businesses – particularly through applications, data and infrastructure”, (Wendell Keuneman, General Partner, Tidal Ventures) ### Recommendations: * Develop a national AI strategy that outlines clear objectives, priorities, and actions. * Establish a taskforce or working group dedicated to implementing the strategy and coordinating efforts across government, industry, and academia. * Prioritize investment in AI research, development, and infrastructure. * Implement initiatives to attract and retain AI talent. * Engage in international collaboration and partnerships to share knowledge, expertise, and best practices. Australia has a significant opportunity to become a leader in the new AI economy. By capitalizing on its strengths, addressing its challenges, and taking strategic actions, Australia can unlock the full potential of AI and drive economic growth for the benefit of all Australians. ### Australian Showcase of Companies Successfully Implementing AI Applications Across Various Sectors * **Healthcare** * **Annalise.ai:** This Australian health tech company develops AI-powered solutions for analyzing medical images like chest X-rays and brain CT scans to detect pathologies such as lung cancer and stroke. Annalise.ai helps radiologists make faster and more accurate diagnoses, addressing Australia’s healthcare worker shortage and the backlog of unreported medical images. Already used by 50% of Australian radiologists, Annalise.ai has scaled its operations to serve millions of patients globally, demonstrating the transformative impact of its AI solutions. * **Industrial Asset Management** * **LexX Technologies:** Based in Melbourne, this startup develops AI-powered software to identify and resolve defects in industrial assets. 
LexX's software analyzes technical manuals and procedures, enabling engineers and technicians to perform maintenance more efficiently and accurately. By reducing downtime by up to 30%, LexX’s AI has delivered substantial cost savings, as shown by an energy client achieving annual savings of $53 million. * **Education** * **Cogniti:** Originating from the University of Sydney, Cogniti empowers educators to create custom AI agents for personalized student learning. These chatbot agents, powered by models like GPT-4, are tailored with specific instructions and resources to enhance the learning experience. For example, "Mrs. S," a custom AI agent, simulates interactions with a kindergarten teacher for occupational therapy students, providing instant feedback and support. * **Dam Management** * **GHD's InsightVision:** Developed by GHD, a global professional services company founded in Australia, InsightVision uses AI for dam management services. This cloud-based application automates data collection and analytics from sensors, improving dam monitoring, safety, efficiency, and compliance while reducing risks and costs. * **Design** * **Canva:** A leading Australian tech company, Canva launched "Magic Studio," an AI-powered design toolkit, in 2023. With over 7 billion uses, this toolkit has enhanced user experiences by assisting users in creating better visual designs, showcasing Canva's successful AI integration and its boost to competitiveness. * **Project Management Software** * **Atlassian:** A top Australian tech firm, Atlassian introduced "Atlassian Intelligence" in 2023, integrating AI into their cloud-based project management suite. Used by over 30,000 customers, these AI features have driven significant revenue growth, with nearly 80% of users reporting time-saving benefits. ### October 2024 ### Shadow AI ## What is Shadow AI? 
Shadow AI refers to the use of artificial intelligence (AI) tools and systems within an organization without approval or oversight from IT or security teams. Employees often turn to accessible AI tools like ChatGPT, Google Gemini, and other generative AI applications to solve problems and boost productivity. ### Key Drivers of Shadow AI: * **Rapid advancements in generative AI:** Tools like ChatGPT are intuitive and widely available, making them easy for employees to adopt. * **Slow organizational adoption:** Frustrated by slow IT processes, employees often bypass official channels. * **Innovation and agility:** Employees seek creative solutions and test new technologies to improve workflows. ### Risks Associated with Shadow AI: * **Data security breaches:** Unapproved tools can expose sensitive data if they don’t comply with company security protocols. * **Compliance and legal risks:** Many AI tools may not meet industry regulations like GDPR or the EU AI Act, leading to fines and reputational harm. * **Operational issues:** Generative AI can produce inaccurate or misleading outputs, causing inefficiencies. * **Cybersecurity vulnerabilities:** Unauthorized AI introduces new risks like data leaks or malicious code injections. * **Reputational harm:** Non-transparent AI use in content creation can lead to consumer backlash and loss of trust. #### Example: Samsung In 2023, Samsung experienced a data breach when employees used ChatGPT to process proprietary information, unintentionally exposing it. ### Potential Benefits of Shadow AI (When Properly Managed): * **Improved productivity:** Automating routine tasks frees employees for high-value work. * **Catalyst for innovation:** Experimenting with tools can lead to creative problem-solving and workflow improvements. * **Employee empowerment:** Automation enables employees to focus on strategic activities. * **Faster innovation cycles:** Quick prototyping with AI accelerates organizational agility. 
### Detecting Shadow AI in Your Organization: * **Sudden spikes in productivity:** Significant efficiency gains in specific teams might indicate unauthorized AI usage. * **Unusual data traffic:** Unexpected activity to third-party AI platforms could signal Shadow AI. * **Irregular API calls:** Suspicious or unexplained queries to external platforms may point to unauthorized tool use. ### Mitigating the Risks of Shadow AI: * **Develop a governance framework:** Create policies for authorized AI tools, data protection, and usage limits. * **Educate employees:** Provide training on secure and ethical AI usage, compliance, and best practices. * **Implement monitoring tools:** Use AI-specific monitoring solutions to detect and mitigate risks. * **Enforce access controls:** Restrict access to sensitive tools and conduct regular audits. * **Offer secure alternatives:** Provide compliant AI platforms like Google Vertex AI or Azure Machine Learning to reduce reliance on unsanctioned tools. * **Foster collaboration:** Encourage open communication between IT, security teams, and employees. * **Adopt a zero-trust approach:** Enforce multi-factor authentication and limit access to critical data. ### Shadow IT vs. Shadow AI: Shadow AI is a subset of Shadow IT, focusing on unauthorized AI tools, while Shadow IT encompasses any unapproved technology or resources. Here’s a breakdown: * **Scope:** Shadow IT includes everything from personal cloud storage to unapproved software. Shadow AI is specific to AI tools like machine learning, deep learning, and generative AI. * **Nature of risks:** Shadow IT mainly involves data security and compliance issues. Shadow AI adds concerns about data privacy, model bias, output inaccuracies, and ethical considerations. * **User motivation:** Employees use Shadow IT for convenience, but Shadow AI often stems from a desire to explore cutting-edge tools for problem-solving and productivity. 
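Two of the detection signals listed above, unusual traffic to third-party AI platforms and irregular API calls, can be approximated by scanning outbound request logs. The following is a minimal sketch, assuming the log is available as `(user, host)` pairs. The watchlist `AI_HOSTS`, the allowlist `SANCTIONED_HOSTS`, and the function name are illustrative assumptions, not part of any cited tool.

```python
from collections import Counter
from typing import Iterable, Tuple

# Illustrative watchlist; a real deployment would maintain its own list.
AI_HOSTS = {"api.openai.com", "chatgpt.com", "gemini.google.com"}
# Hosts the organisation has approved (hypothetical example).
SANCTIONED_HOSTS = {"gemini.google.com"}


def unsanctioned_ai_requests(log: Iterable[Tuple[str, str]]) -> Counter:
    """Count requests per user to watched AI hosts that are not sanctioned."""
    counts: Counter = Counter()
    for user, host in log:
        host = host.lower().rstrip(".")  # normalise case and trailing dot
        if host in AI_HOSTS and host not in SANCTIONED_HOSTS:
            counts[user] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        ("alice", "api.openai.com"),
        ("alice", "api.openai.com"),
        ("bob", "gemini.google.com"),
        ("carol", "intranet.example.com"),
    ]
    for user, n in unsanctioned_ai_requests(sample_log).most_common():
        print(f"{user}: {n} request(s) to unsanctioned AI hosts")
```

A count like this only flags candidates for a conversation. It says nothing about what data was sent, so it complements the governance and education measures listed above and does not replace them.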
Organizations must address Shadow AI by creating clear guidelines, fostering responsible use, and implementing strong security measures. Striking a balance between innovation and compliance allows businesses to leverage AI effectively while minimizing risks. ([Holistic AI](https://www.holisticai.com/blog/shadow-ai), [Forbes](https://www.forbes.com/sites/bryanrobinson/2024/08/02/shadow-ai-the-controversial-2024-trend-that-could-create-disaster-experts-say/), [Arctic Wolf](https://arcticwolf.com/resources/blog/understanding-shadow-it-in-the-age-of-ai/), [IBM](https://www.ibm.com/think/topics/shadow-ai)) ### September 2024 ### August 2024 ### July 2024 ### June 2024 ### AI Safety & Governance ### The Bletchley Declaration The Bletchley Declaration, issued by countries at the AI Safety Summit 2023 (Nov 2023), emphasizes the need for safe, human-centric, and responsible AI development. It acknowledges AI's transformative potential and associated risks, particularly with advanced AI models. The declaration calls for international cooperation to address these risks, promote AI safety, and ensure AI benefits are inclusive and sustainable. Key areas of focus include transparency, accountability, risk assessment, and collaboration on safety research and policies. For more details, you can read the full declaration [at gov.uk](https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023). ### The (WH) Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence This White House Executive Order (Oct 30 2023) emphasizes responsible AI development to address societal challenges while mitigating risks. 
Key points include ensuring AI safety and security, promoting innovation and competition, supporting American workers, advancing equity and civil rights, protecting consumer interests, safeguarding privacy and civil liberties, enhancing federal AI capacity, and leading global AI governance. The order mandates robust standards, guidelines, and collaboration across sectors to achieve these goals. For more details, visit the full [White House Executive Order](https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/). ### EU AI Act The EU Artificial Intelligence Act (July 2024) outlines the regulation of AI systems based on risk levels: unacceptable risk (prohibited), high risk (strictly regulated), limited risk (lighter obligations), and minimal risk (unregulated). Key points include obligations for developers and deployers of high-risk AI, documentation requirements, and specific rules for General Purpose AI (GPAI) systems. The Act aims to ensure AI safety, accountability, and compliance, with phased implementation timelines and the establishment of an AI Office for oversight. For a more detailed summary, visit the [summary page](https://artificialintelligenceact.eu/high-level-summary/), the [AI Act Explorer](https://artificialintelligenceact.eu/ai-act-explorer/), the [AI Act Compliance Checker](https://artificialintelligenceact.eu/assessment/eu-ai-act-compliance-checker/), or the [implementation timeline](https://artificialintelligenceact.eu/developments/). ---------------------- ## Australia ### Australia’s AI Ethics Framework Australia’s AI Ethics Framework (2019) provides guidelines for businesses and governments to design, develop, and implement AI responsibly. It includes eight ethical principles aimed at ensuring AI systems are safe, secure, and reliable. The eight principles are: 1. **Human, societal and environmental wellbeing**
2. **Human-centred values** 3. **Fairness** 4. **Privacy protection and security** 5. **Reliability and safety** 6. **Transparency and explainability** 7. **Contestability** 8. **Accountability** These principles are voluntary and aim to promote ethical AI practices, build public trust, and ensure AI benefits all Australians. They complement existing AI regulations and encourage responsible AI development and use. The framework supports Australia's goal of becoming a global leader in ethical AI and includes case studies from major businesses that have tested the principles. For more details, visit [industry.gov.au](https://www.industry.gov.au/publications/australias-artificial-intelligence-ethics-framework) or the [principles page](https://www.industry.gov.au/publications/australias-artificial-intelligence-ethics-framework/australias-ai-ethics-principles). ### Guidelines for Secure AI System Development The Guidelines for Secure AI System Development by the Australian Cyber Security Centre (ACSC) provide comprehensive recommendations for developing AI systems securely. The guidelines cover four key areas: secure design, secure development, secure deployment, and secure operation and maintenance. They emphasize threat modeling, supply chain security, documentation, incident management, and responsible release. The document aims to ensure AI systems are safe, reliable, and protect sensitive data, encouraging providers to implement security measures throughout the AI lifecycle. For more details, visit [cyber.gov.au](https://www.cyber.gov.au/resources-business-and-government/governance-and-user-education/artificial-intelligence/guidelines-secure-ai-system-development). Similarly, the UK's National Cyber Security Centre (NCSC) provides guidelines for developing secure AI systems.
These guidelines emphasize understanding AI risks, ensuring data integrity, securing AI infrastructure, maintaining AI model integrity, and ensuring robust incident response and recovery processes. The guidelines also include practical advice for integrating security practices throughout the AI development lifecycle, from design to deployment, to mitigate potential security threats effectively. For more details, visit [ncsc.gov.uk](https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development). ### NSW AI Assurance Framework The NSW AI Assurance Framework provides guidelines for the design, development, and use of AI technologies in government projects. Effective from March 2022, it requires project teams to assess and document AI-specific risks throughout the project lifecycle. The framework emphasizes ethical principles such as community benefit, fairness, privacy, security, transparency, and accountability. It supports the NSW AI Strategy and ICT Digital Assurance Framework and mandates submission of assessments for AI projects exceeding $5 million or posing mid-range or higher risks. For more details, visit [nsw.gov.au](https://www.digital.nsw.gov.au/policy/artificial-intelligence/nsw-artificial-intelligence-assurance-framework), the [Mandatory Ethical Principles for the use of AI](https://www.digital.nsw.gov.au/policy/artificial-intelligence/artificial-intelligence-ethics-policy/mandatory-ethical-principles), or the [basic guidance for GenAI](https://www.digital.nsw.gov.au/policy/artificial-intelligence/generative-ai-basic-guidance). The NSW AI Strategy outlines the government's approach to leveraging AI to enhance service delivery and decision-making. It focuses on using AI to free up the workforce for critical tasks, cut costs, and improve targeted services.
The strategy addresses the potential of AI to transform society and the economy while emphasizing the importance of developing AI responsibly to meet privacy standards and address ethical considerations. It includes guidance on balancing opportunity and risk, ensuring community trust, and mitigating unintended consequences. For details, visit [digital.nsw.gov.au AI Strategy](https://www.digital.nsw.gov.au/policy/artificial-intelligence/artificial-intelligence-strategy). ### The Gradient Institute The Gradient Institute is an independent, nonprofit research organization dedicated to integrating safety, ethics, accountability, and transparency into AI systems. They develop new algorithms, provide training, and offer technical guidance on AI policy. The institute collaborates with various organizations to address AI risks, ensure ethical AI deployment, and promote responsible AI practices through research, advisory services, and case studies. For more details, visit the [Gradient Institute website](https://www.gradientinstitute.org). ### Supporting Responsible AI The ["Supporting Responsible AI"](https://consult.industry.gov.au/supporting-responsible-ai) discussion paper by the Australian Department of Industry, Science and Resources outlines a public consultation process for developing policies and initiatives that promote responsible AI use. The consultation seeks input from various stakeholders to ensure AI technologies are used ethically and responsibly, aligning with societal values and legal standards. The initiative aims to build public trust, safeguard against risks, and harness AI's benefits for all Australians. 
Find the paper [here](https://storage.googleapis.com/converlens-au-industry/industry/p/prj2452c8e24d7a400c72429/public_assets/Safe-and-responsible-AI-in-Australia-discussion-paper.pdf) ### Victoria: Use of personal information with ChatGPT The [Office of the Victorian Information Commissioner (OVIC)](https://ovic.vic.gov.au/privacy/resources-for-organisations/public-statement-use-of-personal-information-with-chatgpt/) states that Victorian public sector organizations must not use personal information with ChatGPT, as it contravenes Information Privacy Principles (IPPs). This includes generating, collecting, or retaining personal data. Any breach should be reported as an information security incident. The statement highlights the significant privacy risks and potential harms, emphasizing that even if input history and model training are disabled, information may still be retained and reviewed by OpenAI. ### WA Government Artificial Intelligence Policy and Assurance Framework The WA Government Artificial Intelligence Policy and Assurance Framework outlines principles and guidelines for WA Government agencies developing or using AI tools. It ensures AI systems are assessed for risk and compliance during all development stages. Projects with significant funding or high risk must be reviewed by the WA AI Advisory Board. The framework includes guidance materials and FAQs to support specific AI use cases. For details, visit the [wa.gov.au](https://www.wa.gov.au/government/publications/wa-government-artificial-intelligence-policy-and-assurance-framework). 
---------------------- ### UK AI Safety Institute https://www.aisi.gov.uk ### US AI Safety Institute (NIST) https://www.nist.gov/aisi ### Statement on AI Risk https://www.safe.ai/work/statement-on-ai-risk ### A Right To Warn https://righttowarn.ai ---------------------- ### AI Governance in Australia UQ's "AI Governance in Australia" discusses the need for robust norms, policies, laws, and institutions to guide AI development, deployment, and use, especially given the rapid advancements in AI technologies. It highlights the importance of managing risks from AI, including misuse, accidents, and loss of control. For details, visit [aigovernance.org.au](https://aigovernance.org.au). ### Centre for the Governance of AI The Centre for the Governance of AI (GovAI) focuses on researching and guiding the development and regulation of AI to ensure it is safe and beneficial. Established in 2018, GovAI supports institutions by providing research, hosting fellowships, and organizing events. Key research areas include AI security threats, responsible development, regulation, international coordination, and compute governance. GovAI has influenced policy through publications and advisory roles and transitioned from Oxford’s Future of Humanity Institute to an independent nonprofit in 2021. For more information, visit [governance.ai](https://www.governance.ai/). ### WEF AI Governance Alliance The AI Governance Alliance, an initiative by the World Economic Forum, aims to design transparent and inclusive AI systems. It brings together diverse stakeholders to create frameworks and policies that ensure ethical AI development. The alliance focuses on fostering collaboration, developing standards, and addressing the societal impacts of AI. It supports innovation while ensuring AI technologies are deployed responsibly and benefit all of society. For details, visit the [AI Governance Alliance](https://initiatives.weforum.org/ai-governance-alliance/home). 
### Institute for AI Policy and Strategy (IAPS) The Institute for AI Policy and Strategy (IAPS) is a remote-first think tank focusing on managing risks from advanced AI systems. It conducts policy research, develops AI governance standards, and addresses international governance issues, particularly with China. IAPS emphasizes intellectual independence, not accepting funding from for-profit organizations, and aims to build a community of thoughtful AI policy practitioners. Their work includes compute governance and drawing lessons from cybersecurity and other critical industries. For details, go to [IAPS](https://www.iaps.ai). ### Centre for Artificial Intelligence and Digital Ethics The Centre for AI and Digital Ethics (CAIDE) at the University of Melbourne focuses on interdisciplinary research, teaching, and leadership in AI and digital ethics. It addresses ethical, technical, regulatory, and legal issues related to AI and digital technologies. CAIDE involves experts from various faculties, including Law, Engineering and IT, Education, Medicine, Dentistry and Health Sciences, and Arts. The Centre offers undergraduate, graduate, and professional courses and engages with the public through events and media. For details, visit [unimelb.edu.au](https://www.unimelb.edu.au/caide). ### AI Assurance in the UK The UK government's "Introduction to AI Assurance" outlines the importance of AI assurance in building trust, managing risks, and ensuring responsible AI development. It introduces key concepts and tools for AI assurance, emphasizing its role in AI governance and regulatory frameworks. The document highlights the need for robust techniques to measure, evaluate, and communicate the trustworthiness of AI systems, supporting both industry and regulators in achieving responsible AI outcomes. For details, visit [gov.uk](https://www.gov.uk/government/publications/introduction-to-ai-assurance/introduction-to-ai-assurance). 
### UNESCO Recommendation on the Ethics of AI The UNESCO Recommendation on the Ethics of Artificial Intelligence is the first global standard on AI ethics, adopted by all 193 Member States. It emphasizes four core values that serve the good of humanity, individuals, societies and the environment: human rights and human dignity, fair and just societies, diversity and inclusiveness, and a flourishing environment. The recommendation includes ten core principles for a human-rights centred approach, and eleven key policy action areas to guide ethical AI development. It also introduces practical methodologies like the Readiness Assessment Methodology ([RAM](https://www.unesco.org/ethics-ai/en/ram)) and [Ethical Impact Assessment](https://www.unesco.org/ethics-ai/en/eia) to support implementation, and promotes gender equality in AI through the Women4Ethical AI platform. For more details, see [unesco.org](https://www.unesco.org/en/artificial-intelligence/recommendation-ethics). ### AI Standards Hub The AI Standards Hub, led by the [Alan Turing Institute](https://www.turing.ac.uk/), is dedicated to fostering a vibrant community around AI standards. It offers a platform for knowledge sharing, capacity building, and research. The Hub's activities are organized around four pillars: an observatory of standards, community collaboration, knowledge and training, and research and analysis. It focuses on Trustworthy AI, addressing transparency, security, and ethical considerations. The Hub provides resources like a standards database, training materials, and forums for discussion. For more details, visit [aistandardshub.org](https://aistandardshub.org). ---------------------- ### MITRE ATLAS https://atlas.mitre.org MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is a comprehensive, accessible knowledge base documenting adversary tactics and techniques used against AI systems.
Based on real-world observations and demonstrations, ATLAS aims to raise awareness and readiness for unique threats to AI-enabled systems. It is modeled after the MITRE ATT&CK framework and serves to inform security analysts, enable threat assessments, and help them understand adversary behaviors. Key aspects of ATLAS include: 1. **Collaboration**: ATLAS involves industry, academia, and government, making it a central resource for understanding and mitigating AI threats. 2. **Incident Sharing**: ATLAS facilitates timely, relevant, and secure reporting of AI incidents and vulnerabilities. 3. **Threat Emulation and Red Teaming**: Tools like Arsenal and Almanac plugins have been developed to add AI-targeted adversary profiles to existing threat emulation tools. 4. **Mitigations**: The ATLAS team continuously incorporates community techniques to mitigate AI security threats, offering a draft set of mitigations. 5. **Real-World Relevance**: It includes case studies of significant AI security breaches, such as a $77 million loss from an attack on a facial recognition system. The document emphasizes the growing number of vulnerabilities as AI expands, the importance of community collaboration, and the continuous development of tools and strategies to enhance AI security. More details at the [MITRE ATLAS website](https://atlas.mitre.org/). ### NIST AI Risk Management Framework (AI RMF) The NIST AI Risk Management Framework (AI RMF) provides guidelines for managing risks associated with AI systems, focusing on trustworthiness, accountability, and transparency. It offers a structured approach to identify and mitigate AI risks, developed through a collaborative process involving public comments and workshops. The framework includes a playbook, roadmap, and tools for implementing AI risk management practices. NIST also launched the Trustworthy and Responsible AI Resource Center to support international alignment and implementation.
For more details, visit [NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework). #### NIST AI RMF Generative AI Profile The NIST AI Risk Management Framework: Generative AI Profile outlines the risks unique to or exacerbated by generative AI (GAI), such as confabulation, data privacy issues, environmental impacts, and information security concerns. It provides actions for organizations to manage these risks, including governance, monitoring, and documentation procedures. The Generative AI Profile emphasizes transparency, compliance with legal standards, and the integration of GenAI-specific policies into existing risk management frameworks to ensure the safe and trustworthy deployment of generative AI systems. For details, you can access the full document [here](https://airc.nist.gov/docs/NIST.AI.600-1.GenAI-Profile.ipd.pdf). ### OWASP AI Security and Privacy Guide The OWASP AI Security and Privacy Guide provides actionable insights for designing, creating, testing, and procuring secure and privacy-preserving AI systems. It covers key areas like AI security, privacy principles, data minimization, transparency, fairness, and consent. The guide also addresses potential model attacks and provides strategies for maintaining data accuracy and handling personal data responsibly. The document is a collaborative effort aimed at improving AI security and privacy practices. For details, visit the [OWASP AI Security and Privacy Guide](https://owasp.org/www-project-ai-security-and-privacy-guide/). #### OWASP LLM AI Cybersecurity & Governance Checklist The "LLM AI Security and Governance Checklist" by OWASP provides a comprehensive guide for secure and responsible use of Large Language Models (LLMs). Key sections include: 1. **Overview**: Introduces responsible AI use and key challenges. 2. 
**The Checklist**: Covers adversarial risks, threat modeling, AI asset inventory, security training, business cases, governance, legal and regulatory compliance, deployment strategies, testing, and AI red teaming. 3. **Resources section**: Offers additional tools and standards for AI security. The [PDF](https://owasp.org/www-project-top-10-for-large-language-model-applications/llm-top-10-governance-doc/LLM_AI_Security_and_Governance_Checklist-v1.1.pdf) emphasizes integrating AI security with existing practices and highlights the importance of continuous evaluation and validation. Or check out the [OWASP Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/). ### ISO/IEC 42001:2023 The ISO/IEC 42001:2023 standard, titled "Information technology – Artificial intelligence – Management system," provides guidelines for establishing, implementing, maintaining, and continually improving an AI management system. It focuses on addressing unique challenges posed by AI, such as ethical considerations, transparency, and continuous learning. The standard aims to help organizations manage AI risks and opportunities systematically, ensuring responsible and trustworthy AI implementation. For more information, you can visit the [ISO page](https://www.iso.org/standard/81230.html). ---------------------- ### May 2024 ### April 2024 ### March 2024 ### IEA Electricity 2024 https://www.iea.org/reports/electricity-2024 ## Global Electricity Trends and the Impact of Data Centers and AI The International Energy Agency (IEA) estimates that electricity consumption from data centers, artificial intelligence (AI), and the cryptocurrency sector could double by 2026, with data centers being significant drivers of growth in electricity demand globally. In 2022, these sectors consumed an estimated 460 terawatt-hours (TWh). 
By 2026, total electricity consumption from data centers could reach more than 1,000 TWh – roughly equivalent to the electricity consumption of Japan. ### Key Highlights * **Data Centers and AI as Major Energy Consumers** * Data centers currently account for around 1% of global electricity consumption, but their rapid growth is a significant factor in rising energy demand. * Energy consumption varies by AI application. For example, video generation is far more energy-intensive than text generation or AI-driven search functions. * The speed of AI adoption by households is swift, but long-term trends remain uncertain, depending on which AI applications become most popular. * **Global Electricity Demand Drivers** * Despite the significant growth of data center energy use, it represents only a small portion of global electricity demand growth. * The IEA's Stated Policies Scenario projects global electricity demand to rise by 6,750 TWh by 2030. * Bigger drivers of this growth include economic expansion, electric vehicles, air conditioners, and electricity-intensive manufacturing, which far outpace digitalization and AI in terms of demand impact. * **Strain on Local Power Networks** * The rapid growth of data centers can cause strain on local power grids, particularly due to mismatches in construction timelines versus grid expansion and strengthening. * Data centers are often concentrated in specific regions, intensifying their local impact. In some areas, the sector accounts for a significant share of electricity use, such as over 10% in at least five US states and more than 20% in Ireland. ### Data Centers in Australia In Australia, data center energy demand is forecast to increase significantly by 2030. * **Current Contribution to National Energy Use** * Data centers currently account for 5% of Australia’s total electricity generation. * By 2030, this share is projected to rise to 8%, with a possible high of 15% under certain scenarios. 
* **Power Supply Growth** * Morgan Stanley Research (MS) forecasts an increase in data center uninterruptible power supply requirements from 1,050 MW in 2024 to nearly 2,500 MW by 2030, marking a 13% increase. * Electricity demand from data centers is expected to reach nearly 23 TWh by 2030, an 18% rise from 2023 levels. * In the most bullish scenario, demand could surge to as much as 43 TWh by 2030, equivalent to over 20% of Australia’s current National Electricity Market (200 TWh annually). * **Regional Concentration** * New South Wales is projected to account for the largest share of data center electricity demand by 2030, at 24.7 TWh. * Victoria follows with a forecast demand of 9.4 TWh. ### Long-Term Challenges While the Australian grid is expected to manage increased data center energy demand through 2030, challenges loom beyond that as coal plants retire. Meeting the energy needs of a growing data center sector will require: * Efficiency improvements in data center operations. * The development of new renewable power generation sources. ### Government Actions The Australian government is taking a number of measures to improve data center energy efficiency, driven by the need to address their growing energy demands and reduce greenhouse gas emissions. * **Mandatory Five-Star NABERS Rating** * Data center service providers hosting federal agency workloads are now required to achieve a five-star rating under the National Australian Built Environment Rating System (NABERS). * **Digital Transformation Agency's Data Centre Panel** * The Digital Transformation Agency (DTA) has launched a new Data Centre Panel to promote sustainable practices. Providers seeking inclusion on the panel must: * Adhere to the Government’s ICT Sustainability Plan for data centers. * Meet emission thresholds outlined in the National Greenhouse and Energy Reporting Act. * Use GreenPower accredited renewable energy sources. 
* Maintain a 5-star NABERS rating or equivalent environmental rating. * Achieve a Power Usage Effectiveness (PUE) target of less than 1.4. * Implement a comprehensive plan for reaching net zero emissions through innovation, planning, and investment. ### Projected Improvements in PUE Ratings Power Usage Effectiveness (PUE), a key metric for data center energy efficiency, is expected to hold steady or improve across the scenarios outlined by Morgan Stanley Research: * **Bull Case Scenario:** PUE remains steady at the current estimated rating of 1.7. * **Bear Case Scenario:** PUE improves significantly, falling to 1.2. * **Base Case Scenario:** PUE is projected to reach 1.35 by 2030. ### Understanding Power Usage Effectiveness (PUE) PUE is calculated by dividing the total facility power by the IT equipment power. A PUE of 1 represents perfect efficiency, where all energy consumed powers IT equipment directly. In 2022, the global average annual PUE was 1.55. ### Global Projections and Challenges The IEA projects that global electricity consumption from data centers, cryptocurrencies, and AI could range between 620 and 1,050 TWh in 2026, up from 460 TWh in 2022, with a base case forecast of just over 800 TWh. The increase is roughly equivalent to adding the electricity demand of at least one Sweden or at most one Germany to the grid. The range reflects uncertainties around the pace of deployment and efficiency gains, as well as future technological developments. The report highlights the following key factors driving data center energy demand: * Increased adoption of 5G networks and cloud-based services. * The rapid expansion of the Internet of Things (IoT). * The growing use of AI, including in search engines. * The growth of the cryptocurrency sector. ### Regional Data Center Electricity Consumption Projections * **United States:** Consumption is expected to rise from 200 TWh in 2022 to nearly 260 TWh by 2026, accounting for 6% of total U.S. electricity demand. * **China:** Projected to reach 300 TWh by 2026.
* **European Union:** Expected to grow from slightly below 100 TWh in 2022 to nearly 150 TWh by 2026. ### Challenges and Opportunities for the Grid The rapid expansion of data centers creates challenges for electricity systems: * **Grid Stability:** Increased demand in concentrated locations may strain local grids. * **Spatial Concentration:** Clustering of data centers intensifies grid constraints in specific areas. ### Efficiency Gains and Sustainability Measures The IEA emphasizes the importance of efficiency gains to manage rising data center energy consumption. Key areas for improvement include: * High-efficiency cooling systems. * Direct-to-chip water cooling. * AI-optimized server adaptability. * Time and location shifting of demand. These efficiency gains, combined with the increasing use of renewable energy sources and Power Purchase Agreements (PPAs), are critical steps toward ensuring the sustainability of data center operations. As Australia’s data center sector grows, these strategies will be essential to mitigating its environmental and economic impact. ### February 2024 ### January 2024 ### December 2023 ### November 2023 ### Fun GPTs OpenAI's Dev Day introduced GPTs on ChatGPT Plus. Quite a bit of fun. Some cool ones: - [Trey Ratcliff's Photo Critique GPT](https://chat.openai.com/g/g-gWki9zYNV-trey-ratcliff-s-photo-critique-gpt) - [Stories from the Apple Design Team](https://chat.openai.com/g/g-4wleGSafJ-stories-from-the-apple-design-team) - [Design Critique](https://chat.openai.com/g/g-nlZ7YiDfx-design-critique) - [Grok's Dad](https://chat.openai.com/g/g-ZrSC5ltFX-grok-s-dad), you know, trolling the other guy... ### Fun with GPT OpenAI's Dev Day introduced the new GPT agents, which are now available on ChatGPT Plus. Played around with them during the weekend, and they are a lot of fun. Still Beta though, things can change. It's really easy to get started, and you can create your own agents in minutes. 
[Pixel Guide](https://chat.openai.com/g/g-L4xf17pbm-pixel-guide) _Expert advisor on photography techniques, camera settings, and editing_. Of course created [Social Recommendator](https://chat.openai.com/g/g-jAzmeQ7ip-social-recommendator): _Creates tailored professional recommendations with tone and length options_. Or go to the [original](https://ai.socialrecommendator.com). [Elder Care Companion](https://chat.openai.com/g/g-ZzLk267A5-elder-care-companion) _Friendly and reflective companion for engaging elderly in positive conversations_. [Retro Reel Buddy](https://chat.openai.com/g/g-7UteaHtTk-retro-reel-buddy) _Your fun guide to 80s and 90s movies, with engaging chats and recommendations!_ [Millennial Reel Buddy](https://chat.openai.com/g/g-mta1CB2VV-millennial-reel-buddy) _Your fun guide to 2000s and 2010s movies, with engaging chats and recommendations!_ [GenX Reel Buddy](https://chat.openai.com/g/g-dKqxHPxgx-genx-reel-buddy) _A fun guide to 60s and 70s movies, offering trivia and recommendations._ [Movie Spoiler](https://chat.openai.com/g/g-xdQaleNjn-movie-spoiler) _A spoiler assistant providing detailed plot summaries for movies._ [Puppy Parenting Coach](https://chat.openai.com/g/g-T5JwVRhkl-puppy-parenting-coach) _Puppy trainer specializing in gentle, positive puppy training._ [Palette Pen](https://chat.openai.com/g/g-DJfbzFLhB-palette-pen) _Assists in crafting representational, editorial-style spot illustrations._ [Verde Varie](https://chat.openai.com/g/g-BYJqgFyYq-verde-varie) _Friendly guide for growing and caring for variegated plants._ [Mind Flex](https://chat.openai.com/g/g-FP8kQP8MT-mind-flex) _Offers brain training exercises and concepts for mental agility._ [Home School Coach](https://chat.openai.com/g/g-7EmztG8u4-home-school-coach) _A supportive home schooling assistant offering educational guidance and resources._ But mine are peanuts in comparison with some of [the ones already available now](/posts/fun-gpts/). 
### October 2023 ### September 2023 ### August 2023 ### July 2023 ### Web Directions AI 2023 A day of AI talks at Web Directions AI 2023 at UTS. Mainly Generative AI related presentations, with panel discussions on impact to business and the ethical use of AI. -> Check out the [Web Directions AI 2023 photo album](https://www.flickr.com/photos/halans/albums/72177720309936712) ### June 2023 ### Apparel Grab an intelligent t-shirt in [our t-shirt store](https://strangelove-ai.store). Always Intelligent! ### Identify guiding principles for Responsible AI (Course Notes) Microsoft Learn - Course Notes This (free, 1 hour long) Microsoft course forms in fact the basis of the IAT TAFE NSW "Responsible AI" course. It is a good introduction to the topic, and I recommend it to anyone interested in AI, and the impact it has on society. # Notes: Identify guiding principles for Responsible AI (Microsoft) [Identify guiding principles for responsible AI - Training | Microsoft Learn](https://learn.microsoft.com/en-au/training/modules/responsible-ai-principles/) ## Implications of responsible AI - Practical Guide * **Defining Technology** - AI, as the defining technology of our era, accelerates progress across all human fields and assists in resolving daunting societal challenges like ensuring remote education access and aiding in food production for a growing global population. * **Microsoft's Vision for AI** - Microsoft envisions AI as a tool to enhance human creativity and innovation. Their goal is to empower developers to innovate, organizations to transform industries, and individuals to reshape society. * **Societal Implications of AI** * The extensive use of AI brings about societal changes and raises complex questions about our desired future. Some key areas affected are decision-making in various industries, data security, privacy, and the necessary skills for success in the AI-influenced workplace. 
* Looking towards the future, it's crucial to address these questions: - How can we design, develop, and utilize AI systems that positively affect individuals and society? - How can we best prepare the workforce for AI's impact? - How can we enjoy AI's benefits while upholding privacy? * **Importance of Responsible AI Approach** - New intelligent technology can bring about unintended and unforeseen consequences with significant ethical implications. Hence, organizations must plan and oversee technology releases, anticipating and mitigating potential harm. * **Novel Threats** - Microsoft's experience with the 2016 Twitter chatbot, Tay, demonstrated that while technology may not inherently be unethical, its interaction with humans can produce harmful results, like the dissemination of hate speech. This highlighted the importance of preparing for attacks on learning datasets, leading to the development of advanced content filters and supervisors for AI systems with automatic learning capabilities. * **Biased Outcomes** - AI can inadvertently reinforce societal biases. An example from the course, a risk scoring system for a lending institution that only approved loans for male borrowers due to biased training data, illustrates this. Developers must understand how bias can enter training data or machine learning models, and researchers should explore tools for detecting and reducing bias within AI systems. * **Sensitive Use Cases** - Certain technologies, like facial recognition, must be handled with care due to potential misuse for activities such as unwarranted surveillance. Society must establish proper boundaries for such technologies, ensuring they remain under legal regulation. * **Ongoing Responsibility** - While new laws and regulations are important, they cannot replace the responsibility that businesses, governments, NGOs, and academic researchers must exercise when engaging with AI.
Open dialogue among all interested parties is vital to handle emerging AI's challenges and consequences responsibly. * **Applying Responsible AI Practices** - Consider how to use a **human-led approach** to drive business value. - Reflect on how your organization's **foundational values** will shape your AI strategy. - Plan on how to **monitor AI systems** for responsible evolution. ## Identify guiding principles for responsible AI * **Abstract: Responsible AI Development** - Emphasizes the responsibility of businesses, governments, NGOs, and researchers to anticipate and mitigate AI technology's unintended effects. - Highlights the need for internal policies to guide AI deployment and development. - Microsoft identifies six principles guiding AI development: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. - These principles are deemed fundamental for a responsible and trustworthy approach to AI as its presence in daily products and services grows. ### **Microsoft's Six Guiding Principles** * #### Fairness - AI should treat all individuals equally and avoid differential impact on similar groups. - AI decisions should be supplemented with human judgment, and individuals should be held accountable for decisions affecting others. - Developers should understand how bias can be introduced and its impact on AI recommendations. - To mitigate bias, diverse training datasets and adaptable AI models should be used, and resources that help detect and mitigate biases should be leveraged. * #### Reliability and Safety - AI systems should be reliable, safe, and consistent, capable of operating as designed even under unexpected conditions and resistant to harmful manipulations. - Verification of systems' behavior under actual operating conditions is crucial. - Rigorous testing during system development and deployment is necessary to ensure safe responses in unanticipated situations and to avoid unexpected failures. 
- Post-deployment, proper operation, maintenance, and protection of AI systems are critical. Long-term operations and monitoring should be considered in every AI implementation. - Human judgment is key in decision-making about AI system deployment, its continued use, and identifying potential biases and blind spots. * #### Privacy and Security - With the increasing prevalence of AI, privacy protection and data security have become more vital and complex. - AI systems need to comply with privacy laws, which demand transparency about data collection, usage, and storage and provide consumers with control over how their data is used. - Microsoft continues to invest in research for privacy and security solutions, as well as robust compliance processes, to ensure data used by their AI systems is managed responsibly. * #### Inclusiveness - Microsoft believes that everyone should benefit from AI technology, which should cater to a wide range of human needs and experiences. - AI can make a significant positive impact for the 1 billion people globally with disabilities, by improving access to services and opportunities through features like real-time speech to text transcription, visual recognition services, and predictive text functionalities. - Inclusive design practices can help developers identify and address potential barriers, leading to innovation and better user experiences for everyone. * #### Transparency - Transparency and accountability underpin all other principles, being essential for their effectiveness. - It is crucial for users to understand how AI-informed decisions impacting their lives are made, for instance, in cases of creditworthiness assessment by a bank or hiring decisions by a company. - An important aspect of transparency is 'intelligibility', which refers to the provision of clear explanations about the behavior and functioning of AI systems. - Users should be well-informed about when, why, and how AI systems are deployed. 
* #### Accountability - Designers and deployers of AI systems must be accountable for their systems' operations. - Organizations should establish accountability norms based on industry standards, ensuring that humans retain control over highly autonomous AI systems, and these systems are not the ultimate authority on impactful decisions. - Organizations should consider setting up internal review bodies to oversee and guide the company on best practices for AI development and deployment, including documenting and testing AI systems and handling sensitive use cases. - Recognition of the diverse beliefs and standards that every individual, company, and region holds should be reflected in the AI journey. ## Identify guiding principles for responsible AI – State Farm case study * **Responsible AI in the Insurance Industry** - The insurance industry heavily relies on data and statistical models, presenting significant opportunities for innovation using AI. - AI is integrated across numerous business functions in the industry, with machine learning models used to improve risk pricing, streamline claims processes, and detect fraud. - 63% of insurers believe intelligent technologies will completely transform the industry. - As insurers increase investments in AI, a responsible AI strategy is crucial. - For example, State Farm, the leading auto and home insurer in the US, uses AI solutions to enhance decision-making, increase productivity, reduce costs, and improve employee and customer experiences, all guided by a 'Good Neighbor' philosophy. - To responsibly manage AI, State Farm established a governance system, ensuring accountability for AI, and overseeing the development and management of AI solutions that benefit customers. * **Responsible AI Governance at State Farm** - State Farm develops controls for AI systems in parallel with their AI solutions, with oversight and control applied throughout the solution's lifecycle. 
- The Chief Data and Analytics Officer holds primary executive accountability for responsible AI across the organization, leading collaboration and evolution of AI principles enterprise-wide. - A central validation team, reporting to the Chief Data and Analytics Officer, oversees model validation and AI in software reviews, assessing AI models on aspects like training datasets, mathematical approaches, and business uses. - A model risk governance committee, with members from various business areas, provides strategic direction to the validation team by reviewing and approving model risk management procedures and guidelines, and serves as a forum for executive collaboration, education, and discussion on model risk topics. - The governance approach of State Farm aims to continually evolve AI control frameworks and integrate them at greater scale. * **Governance in Practice at State Farm** - State Farm introduced the Dynamic Vehicle Assessment Model (DVAM) to predict "total loss" scenarios in car accident claims more efficiently, reducing the total loss process from as high as 15 days to as little as 30 minutes. - The DVAM leverages data collected at the time of filing a claim, allowing for expanded vehicle inspection and settlement options. It predicts with a level of confidence whether a vehicle is a total loss or repairable, sometimes bypassing the need for a physical inspection. - This AI integration streamlines the claim settlement process, freeing up time for State Farm employees and agents to focus on enhancing customer experience. - Development and deployment of DVAM required collaboration across several decision-making bodies within the organization, ensuring alignment with intended business outcomes. - Business and AI development teams assessed the impacted KPIs, determined the baseline measurements, and monitored changes after the model's launch. 
- For AI governance, the business and validation teams worked together to evaluate the model, launching it in phases to allow for thorough assessment before full roll-out. The governance process was transparent, keeping all participants informed throughout. * **Key Lessons from State Farm's AI Integration** * **Interdisciplinary collaboration is crucial for successful AI:** AI integration across an organization involves cross-functional collaboration. State Farm encourages partnerships among diverse groups with different skills and perspectives. Having business decision-makers work alongside developers and technical experts in designing and developing AI solutions can better achieve organizational objectives. * **AI controls should evolve with AI technology:** As you adopt new technology, it's vital to develop corresponding controls. Legacy governance processes might not adequately regulate advanced technology and can impede innovation. Therefore, innovating AI governance controls alongside AI solutions can accelerate the innovation process and yield better business results. In the DVAM case study, automated model monitoring techniques were leveraged. * **Evaluation of State Farm's AI Strategy** * **Industry Environment Perspective:** Insurance companies aim to streamline business processes and reduce costs without compromising customer experience. The challenge lies in balancing AI advancements with responsible usage. * **Value Creation Perspective:** State Farm uses responsible AI principles to establish a governance system, allowing for quicker, more informed decisions. This creates value by improving both customer and employee satisfaction. * **Organization & Execution Perspective:** State Farm aligns their responsible AI strategy with their strategic business goals. They selected a fitting use case and established a governance system, leveraging existing data to bring a transformative AI solution to an established business process. 
* **Conclusion** - State Farm considers AI governance vital to their AI innovation. Their responsible AI frameworks facilitate faster, more informed decisions, maintain customer trust, and enhance customer and employee experiences. Staying true to their mission to help people contributes to their long-term success. ## Module Summary and Resources - This module explores Microsoft's approach to prioritizing responsible AI, which might serve as a useful reference for others. However, it acknowledges that unique beliefs and standards should shape each individual's, company's, or region's journey towards responsible AI. - As we progress towards responsible AI, our approaches should adapt to new innovations and lessons learned from our successes and failures. - The mentioned processes, tools, and resources could serve as a starting point for organizations developing their own AI strategy. - With the increasing use of AI across all sectors, it's vital to maintain open dialogue among stakeholders. Early AI adopters play a significant role in promoting responsible use of AI and preparing society for its impacts. * **Fairness** - Explore the intent, design, and potential impacts of the AI system to ensure its equitable functionality. - Strive for diversity in the design team to reflect diverse backgrounds, experiences, and perspectives. - Detect bias in datasets by scrutinizing their origins, organization, and representation. - Identify bias in machine learning algorithms using transparency-enhancing tools and techniques. - Ensure human oversight and involve domain experts, especially for AI-informed decisions affecting people. - Follow and implement best practices, analytical techniques, and tools to detect, prevent, and mitigate bias in AI systems. * **Reliability and Safety** - Assess your organization’s AI readiness using tools like Microsoft's AI Ready Assessment. - Establish procedures for auditing AI systems to check the quality and appropriateness of data and models. 
- Provide detailed explanations of the AI system's operation, including design specifics, training data details, and inferences generated. - Design systems to handle unexpected circumstances, including accidental interactions or cyberattacks. - Involve domain experts in AI design and implementation, especially when consequential decisions are involved. - Conduct comprehensive testing of AI systems in both lab and real-world settings. - Evaluate the need for human input in impactful decisions or critical situations. - Create robust user feedback mechanisms to swiftly resolve performance issues. * **Privacy and Security** - Adhere to relevant data protection, privacy, and transparency laws during AI development. - Design AI systems to uphold personal data integrity, using it only when necessary and for stated purposes. - Secure AI systems from threats by following secure development practices, limiting access based on roles, and safeguarding data shared with third parties. - Design AI systems to allow customers control over data collection and usage. - Ensure anonymity by de-identifying personal data in your AI system. - Regularly conduct privacy and security reviews of all AI systems. - Implement industry best practices for tracking, accessing, and auditing usage of customer data. * **Inclusiveness** - Comply with laws on accessibility and inclusiveness such as the Americans with Disabilities Act and the Communications and Video Accessibility Act. - Use resources like the Inclusive Design toolkit to identify and address potential barriers in product environments that could exclude people. - Involve people with disabilities in testing your systems to ensure broadest possible audience usability. - Adopt commonly used accessibility standards to improve system accessibility for all abilities. * **Transparency** - Share important attributes of datasets to help developers understand their suitability for specific use cases. 
- Enhance model intelligibility by utilizing simpler models and generating clear explanations of model behavior. - Train employees on interpreting AI outputs and maintaining accountability for consequential decisions based on AI results. * **Accountability** - Establish internal review boards for oversight and guidance on responsible AI development and deployment. - Train employees to responsibly and ethically use and maintain AI solutions, and understand when to seek additional technical support. - Involve expert humans in decisions about model execution, ensuring they can inspect, identify, and address challenges with model output and execution. - Implement a clear accountability and governance system to handle rectifications or corrections if models behave unfairly or potentially harmfully. [Download PDF](https://aka.ms/AA62hp7) of _Implications of responsible AI - Practical guide_ to share with others. [Download PDF](https://aka.ms/AA629xb) of _Responsible AI - Identify guiding principles_ to share with others. ### May 2023 ### April 2023 ### Prompting for prompts ## Meta prompting Over on [amAIzing art](https://amaizing.art) I have been playing with Generative AI and synthography for a few months now. I also have an OpenAI ChatGPT Pro subscription and API access, which means I can take my Midjourney prompting to the next level: prompting ChatGPT for Midjourney prompts. ChatGPT doesn't really seem to know anything about "midjourney", describing it as "the middle stage of any ongoing process". But you can teach it to create a piece of text, and have it add a subject's description to it. By creating a contextual meta prompt, you can generate expanded, descriptive Midjourney prompts based on simple input, over and over again. An example of such a prompt is my [Selfie Time Capsule](https://www.instagram.com/selfietimecapsule/), where I generated a ChatGPT meta prompt, telling it to generate a descriptive Midjourney prompt, given a particular subject.
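The meta-prompt idea can be sketched in a few lines of Python. This is only an illustration, not the actual Selfie Time Capsule prompt: the function name `build_meta_prompt`, the wording of the instructions, and the `PREFIX` and `SUFFIX` values are all assumptions made for the example.

```python
# Hypothetical fixed opening and closing for the generated image prompt.
PREFIX = "/imagine prompt:"
SUFFIX = "--ar 3:2"


def build_meta_prompt(subject: str) -> str:
    """Return chat-model instructions that ask for one image prompt about `subject`."""
    subject = subject.strip()
    if not subject:
        raise ValueError("subject must not be empty")
    # The instruction wording below is illustrative, not the author's prompt.
    return "\n".join([
        "You write prompts for an AI image generator.",
        f"Start the prompt with '{PREFIX}' and end it with '{SUFFIX}'.",
        "In between, write a vivid, detailed description of the subject.",
        "Reply with the finished prompt only.",
        f"Subject: {subject}",
    ])


meta_prompt = build_meta_prompt("Aztec warriors")
print(meta_prompt)
```

The same template can be reused for any subject: only the last line changes, which is what makes the approach repeatable.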
For example, entering "Aztec warriors" in ChatGPT results in a Midjourney-usable prompt: ``/imagine prompt: Three Aztec warriors huddled together for a wide-angle selfie picture::5 Wearing feathered headdresses, ornate body armor, and holding obsidian weapons, they stand in a sacred temple surrounded by intricate carvings and offerings. The image has a warm and earthy color palette, with reds, yellows, and greens evoking the vibrant nature of the Mesoamerican culture, showcasing their fierce and proud expressions. The image is photorealistic, 16K, has natural lighting, and is taken with a front-facing phone selfie camera held by one of the warriors in the photo::4. --ar 3:2 --s 1000 --v 5`` I told ChatGPT how to start and end the prompt, and for a given subject, to look up the period, the clothing and tools, and pick an appropriate color palette and facial expressions. That way, I can provide "WW2 marines", "Four 1980s Madonna fans", "three 1950s airplane travelers", "Genghis Khan", even "koalas", "two T-Rex",... and ChatGPT provides me with a Midjourney prompt, which Midjourney seems to interpret quite well. Check out some of the results on this [Selfie Time Capsule Instagram account](https://www.instagram.com/selfietimecapsule/). Interested in the ChatGPT meta prompt? Find the prompt info on this [Selfie Time Capsule landing page](https://selfietimecapsule.prompt.cards), or for the price of a coffee, go straight to [Gumroad](https://toolsheet.gumroad.com/l/selfietimecapsulepromptgenerator). And if you ever wondered what happened with your public selfies on social media, well now you know, it taught AI to generate new selfies... ### March 2023 ### February 2023 ### AI Social Recommendator ## AI Social Recommendator using ChatGPT https://ai.socialrecommendator.com A recommendations generator for social and business network endorsements now using ChatGPT. AI.SocialRecommendator.com gives you a head start, or inspiration.
Fill out some information on your colleague and we produce a sample for you using the ChatGPT API to use as inspiration (or as-is of course). I have had this running since 2009, using static data and parameter substitution to construct personal recommendations. It was always intended to use AI, and now 14 years later, I can. It costs me money to access the OpenAI API though. As long as I don't hit my preset API limit each month, it should work fine. Otherwise, come back next month... Additionally, Vercel Free-tier Serverless Functions have a 10s timeout. Because of that, three-to-five-paragraph requests to the OpenAI API time out, and I need to be on the Pro tier with a 60s timeout to make it useful. Open for [sponsorship](https://forms.gle/qEHsmXPGLuV7Dnfr8) or [buy me a coffee](https://ko-fi.com/halans), to cover the $20/month cost for Vercel + the $40 limit on the OpenAI API. I can still fall back to the [original version](https://original.socialrecommendator.com) if need be. Internally, it uses a particular prompt, with some added randomness, including: random temperature and complimentary level (from positive to favorable to glowing). I made the length selectable instead of random to make it more useful. Check out the [easter egg on the API](https://ai.socialrecommendator.com/api/get-recommendation)? And my first post on [ProductHunt](https://www.producthunt.com/posts/ai-social-recommendator/). [Upvote](https://www.producthunt.com/posts/ai-social-recommendator/)? ### January 2023 ### Grow Your Audience AI Prompts ## Social Media AI Prompts https://socialmedia.prompt.cards Australia Day long weekend. I was subscribed to BetterSheets.co, intrigued how people use Google Sheets to sell content. Finally had an idea to create a Google Sheet with 50 Social Media prompts to ask ChatGPT. As AI soaks up the world's knowledge, including the collective knowledge of the best social media experts, it becomes an expert itself (or so they say).
Works as a template in Google Sheets with variables and concatenation. Learn about strategy and business needs, brand reputation, and analytics and metrics. Add a personality or trait option to encourage empathy through imagination, and pick an output format option. Then copy/paste into an AI chatbot, like ChatGPT. People can use the Template as inspiration, add their own relevant options to these prompts, and see how AI responds. Created this [landing page](https://socialmedia.prompt.cards) for discovery. Offered for sale (but free) on [Gumroad](https://halans.gumroad.com/l/GrowYourAudienceAIPrompts). I can think of other categories beyond social media this could apply to. EDIT: Created a new one around [productivity and knowledge management tools](https://productivitytools.prompt.cards). ### December 2022 ### November 2022 ### October 2022 ### September 2022 ### August 2022 ### July 2022 ### AI.PROMPT.CARDS ### Another weekend project. I'm sure you've seen those AI generated images (Craiyon, DALL-E and Midjourney). They take a prompt, a description of the image you want, and then generate a number of variations based on it. Crafting a prompt, or “prompt engineering”, is an art form in and of itself. Last weekend, I created AI.PROMPT.CARDS, which can help you create these prompts, with general “vibes”/feelings, aesthetics, photographic, illustration, artsy, or 3D/textured modifiers. Don't use them all at once, but pick and choose carefully (less is more). Then copy/paste the prompt into the AI image generator. #### Prompt Cards A “prompt card” is a tool that contains bits of information that help you in learning, interviews, presentations,… I thought it would make a clever domain… See some of the (terrible) Midjourney generated images [here](https://halans.com/the-future-of-cyber-warfare/), [here](https://halans.com/viasat-ukraine-case-study/) and [here](https://halans.com/cyber-warfare-and-terrorism/). Those are mostly first-tries, no further prompting.
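The "pick and choose" approach to image-prompt modifiers can be sketched as a small helper. This is a minimal illustration, not how AI.PROMPT.CARDS is built: the `MODIFIERS` table, its example values, the `build_image_prompt` function, and the `max_modifiers` cap are all hypothetical.

```python
from typing import Iterable

# Hypothetical modifier lists, loosely following the categories named above.
MODIFIERS = {
    "vibes": ["moody", "dreamlike"],
    "photographic": ["35mm film", "shallow depth of field"],
    "illustration": ["ink sketch", "watercolour"],
    "3d": ["clay render", "low poly"],
}


def build_image_prompt(
    subject: str,
    picks: Iterable[tuple[str, str]],
    max_modifiers: int = 3,
) -> str:
    """Join a subject with a small, hand-picked set of (category, modifier) pairs."""
    chosen = []
    for category, modifier in picks:
        if modifier not in MODIFIERS.get(category, []):
            raise ValueError(f"unknown modifier {modifier!r} in category {category!r}")
        chosen.append(modifier)
    # Enforce "less is more": refuse prompts that pile on too many modifiers.
    if len(chosen) > max_modifiers:
        raise ValueError(f"pick at most {max_modifiers} modifiers")
    return ", ".join([subject.strip(), *chosen])


prompt = build_image_prompt(
    "a lighthouse at dusk",
    [("vibes", "moody"), ("photographic", "35mm film")],
)
print(prompt)  # a lighthouse at dusk, moody, 35mm film
```

Capping the number of modifiers is one way to encode the "less is more" advice; the cap of three is an arbitrary choice for the example.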
### June 2022 ### 2025 Wrap-Up
2025 Unwrapped
### 2025 Timeline
### About

Delegating strategic decision-making to machines?

How I Learned to Stop Worrying and Embrace AI

Personal writing by [@halans](https://bsky.app/profile/halans.com) on artificial intelligence, covering generative AI, ethics, safety, and security. This collection includes news links, essays, and analysis examining AI's technical development and societal implications. Inspired by [Dr.Strangelove](https://en.wikipedia.org/wiki/Dr._Strangelove) ![]( /img/drstrangelove.jpg )
### Newsletter Archive Contents of the newsletter over the years.