Assessment Tool Catalog
Select any tool to view full criteria, methodology, and usage guidance, or launch directly.
Agentic AI Governance Assessment Tool
Evaluates an agentic AI task against five governance criteria and returns a structured Governance Assessment Card with an intervention level recommendation and suggested oversight actions.
Human Oversight Quality Index
Assesses whether human oversight of an AI system is meaningful, scoring reviewer competence, decision authority, cognitive load, feedback loop integrity, and accountability across five dimensions.
AI Use Case Risk Tiering Wizard
Assigns a defensible Tier 1-5 risk designation to any AI use case across five dimensions: consequential impact, autonomy and oversight, data sensitivity, decision reversibility, and validation maturity.
AI RMF Implementation Fidelity Checker
Probes the gap between NIST AI RMF adoption and operationalization across all four framework functions using a three-point fidelity scale that distinguishes documented compliance from actual use.
Agentic AI Governance Assessment Tool
This tool operationalizes the five-characteristic human oversight decision framework developed in GIAG Working Paper Two. You describe an agentic AI task in plain language. The tool evaluates it against five governance criteria, synthesizes the scores, and returns a structured Governance Assessment Card with an intervention level recommendation and suggested oversight actions.
The Five Criteria
Intervention Levels
How to Use It
Describe your task
Enter a plain-language description of the agentic AI task you want to evaluate. Include what the agent does autonomously, what data it accesses, who is affected by its outputs, and what decisions it makes without human input. One to three sentences is sufficient. Three preloaded examples cover lower-risk, mixed-risk, and higher-risk scenarios.
Click Run Assessment
The tool evaluates the task in four stages: decomposition, five-criterion scoring, synthesis, and output formatting. Each criterion is evaluated independently. Allow 20 to 40 seconds for the assessment to complete. The stage-by-stage progress is visible during processing.
Review the Governance Assessment Card
The card presents criterion scores with rationale, an intervention level recommendation with synthesis reasoning, and suggested oversight actions specific to the task. You may optionally enter an email address to receive the card by email. Each run is independent; no data is stored or retained.
Try the Assessment Tool
No login required. Works on any device with a browser. Three example tasks are preloaded, or enter your own.
Methodology and Source
The five-characteristic framework and intervention level taxonomy are documented in GIAG Working Paper Two. The tool uses a four-stage chained prompt architecture built on the Anthropic API. Each assessment run is stateless; no task descriptions, results, or user information are stored. The tool is a research prototype, not a certified governance instrument, and should be used accordingly.
Human Oversight Quality Index (HOQI)
HOQI asks a more fundamental question than whether human oversight exists: it asks whether that oversight is meaningful. The tool scores an AI system's oversight arrangement across five dimensions that determine whether human review translates into genuine accountability or merely into documentation. It generates a scored profile and a Claude-powered advisory analysis tailored to your deployment context and sector.
The Five Dimensions
Quality Levels
How to Use It
Describe your oversight arrangement
Provide a plain-language description of the AI system you are assessing and the oversight structure currently in place: who reviews outputs, at what cadence, with what authority, and in what operational context. The more specific the description, the more precise the scoring and advisory output.
Answer the diagnostic questions
The tool walks through structured prompts for each of the five dimensions. Responses do not require technical expertise. They draw on what practitioners already know about how their oversight processes actually work in practice, not how they are documented on paper.
Review the HOQI score and advisory analysis
The output includes a dimension-by-dimension score with rationale, a composite HOQI score, and a practitioner advisory that identifies the highest-priority gaps and suggests specific strengthening actions. Scores are contextualized to your sector and deployment type.
Run a HOQI Assessment
No login required. Works on any device with a browser. Typical completion time is five to ten minutes.
Methodology and Source
The five oversight quality dimensions and scoring rubric are grounded in the human oversight decision framework developed in GIAG Working Paper Two. The HOQI tool extends that framework from task-level oversight decisions to system-level oversight quality evaluation. Scores are generated through a structured AI-assisted analysis using the Anthropic API. Each assessment run is stateless. The tool is a research prototype, not a certified governance instrument.
AI Use Case Risk Tiering Wizard
Risk tiering is a foundational governance activity, one that many agencies complete inconsistently or based on informal judgment rather than structured criteria. This wizard assigns a defensible Tier 1-5 risk designation to any AI use case by evaluating it across five dimensions. The result is a scored profile with documented rationale that can support governance documentation, procurement decisions, and oversight planning.
The Five Dimensions
Risk Tiers
How to Use It
Describe the AI use case
Enter a description of the AI use case you want to tier: what it does, who it affects, what data it uses, and how its outputs are used in decision-making. You do not need to complete a formal use case inventory in advance; a working description is sufficient for the tool to generate a scored assessment.
The wizard scores across five dimensions
Each dimension is scored independently against a structured rubric. The wizard evaluates your description against each criterion and generates a rationale for each score. The process takes 20 to 40 seconds. Dimension scores are combined into an overall tier designation using a weighted synthesis that reflects relative governance significance.
Review the tier designation and rationale
The output provides a Tier 1-5 designation with full scoring rationale, a summary of the factors driving the tier, and governance recommendations calibrated to that tier level. The rationale is designed to be documentable, suitable for use in governance records or to support oversight conversations with leadership.
Run the Risk Tiering Wizard
No login required. Works on any device with a browser. Preloaded examples span routine administrative tools to high-consequence decision support systems.
Methodology and Source
The five-dimension risk tiering framework draws on NIST AI RMF categorization guidance, OMB M-24-10 use case inventory requirements, and GIAG Stream One research on implementation fidelity patterns across federal and state agencies. The scoring rubric is documented in Working Paper One. The tool uses the Anthropic API for dimension scoring and synthesis. Each run is stateless.
AI RMF Implementation Fidelity Checker
Most agencies that claim NIST AI RMF adoption have documented the framework rather than operationalized it. This tool probes the difference. It evaluates implementation across all four AI RMF functions using a three-point fidelity scale that distinguishes documented compliance, partial operationalization, and genuine use. The output identifies which functions are performing at fidelity and which represent governance gaps that policy artifacts do not reveal.
The Four AI RMF Functions
Three-Point Fidelity Scale
How to Use It
Describe your current RMF implementation
Provide a description of how your organization currently implements or references the NIST AI RMF, including which functions have formal documentation, where active processes exist, who owns implementation, and how the framework connects to actual AI deployment decisions. Candid descriptions of partial or nominal compliance produce more useful diagnostic output.
The checker probes each function
The tool generates targeted diagnostic questions for each of the four RMF functions, then evaluates your responses against the three-point fidelity scale. The probing sequence is designed to surface the gap between policy documentation and operational practice, the gap that aggregate compliance metrics consistently obscure.
Review the fidelity profile and gap analysis
The output presents a function-by-function fidelity score, a composite implementation profile, and a prioritized gap analysis. The advisory identifies the most consequential fidelity gaps and suggests specific operationalization steps calibrated to your organization's current state. Results are suitable for use in maturity assessments or implementation planning.
Run the Fidelity Checker
No login required. Works on any device with a browser. Typical completion time is ten to fifteen minutes for a thorough description of current RMF implementation.
Methodology and Source
The fidelity framework and three-point scale are documented in GIAG Working Paper One: "Implementation Fidelity: Why AI RMF Adoption Metrics Are Measuring the Wrong Thing." The tool operationalizes the paper's core argument that adoption rates and framework citations measure presence, not use. The probing questions and scoring rubric are grounded in the diagnostic methodology developed through GIAG Stream One practitioner research. Each run is stateless. The tool is a research prototype, not a certified audit instrument.