GIAG: A ThinkCapital Research Program

Applied Research Output • Stream One & Two

Assessment Tools

Four interactive instruments built directly from GIAG research frameworks. These are working prototypes, not production systems, designed to make research findings tangible and testable by practitioners. All tools are free, open-access, and require no login.

Select any tool to view full criteria, methodology, and usage guidance, or launch directly.

AGA • Stream Two

Agentic AI Governance Assessment Tool

Evaluates an agentic AI task against five governance criteria and returns a structured Governance Assessment Card with an intervention level recommendation and suggested oversight actions.

HOQI • Stream Two

Human Oversight Quality Index

Assesses whether human oversight of an AI system is meaningful by scoring five dimensions: reviewer competence, decision authority, cognitive load, feedback loop integrity, and accountability.

Risk Tier • Stream One & Two

AI Use Case Risk Tiering Wizard

Assigns a defensible Tier 1-5 risk designation to any AI use case by evaluating it across five dimensions: consequential impact, autonomy and oversight, data sensitivity, decision reversibility, and validation maturity.

RMF-FC • Stream One

AI RMF Implementation Fidelity Checker

Probes the gap between NIST AI RMF adoption and operationalization across all four framework functions using a three-point fidelity scale that distinguishes documented compliance from actual use.

[Figure: GIAG Assessment Tool Suite. Four instruments mapped by deployment phase (pre-deployment vs. active deployment) and analytical scope (task/use case vs. portfolio/organizational), showing each instrument's area of scope and specialty. Stream One and Two instruments. ThinkCapital LLC, May 2026.]

Stream Two  •  Active Prototype

Agentic AI Governance Assessment Tool

This tool operationalizes the five-characteristic human oversight decision framework developed in GIAG Working Paper Two. You describe an agentic AI task in plain language. The tool evaluates it against five governance criteria, synthesizes the scores, and returns a structured Governance Assessment Card with an intervention level recommendation and suggested oversight actions.

Governance criteria:
Irreversibility: Can the action be undone?
Consequence Transfer: Who bears the outcome?
Distributional Novelty: Is the population in scope?
Value Conflict: Are competing values at stake?
Legal / Regulatory: Is there formal accountability?

Intervention levels: Autonomous, Monitor, Review Required, Joint Execution, Human Only

1. Describe your task

Enter a plain-language description of the agentic AI task you want to evaluate. Include what the agent does autonomously, what data it accesses, who is affected by its outputs, and what decisions it makes without human input. One to three sentences is sufficient. Three preloaded examples cover lower-risk, mixed-risk, and higher-risk scenarios.

2. Click Run Assessment

The tool evaluates the task in four stages: decomposition, five-criterion scoring, synthesis, and output formatting. Each criterion is evaluated independently. Allow 20 to 40 seconds for the assessment to complete. The stage-by-stage progress is visible during processing.
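The four stages described above can be sketched as a simple chain in which each stage's output feeds the next prompt. This is a minimal illustration, not the tool's actual implementation: the prompt wording, the `run_assessment` function, and the `ask` callable (a stand-in for whatever sends a prompt to a model and returns text) are all assumptions. Only the stage order and the criterion names come from this page.

```python
from typing import Callable, Dict

# Criterion names are taken from this page; everything else is illustrative.
CRITERIA = [
    "Irreversibility",
    "Consequence Transfer",
    "Distributional Novelty",
    "Value Conflict",
    "Legal / Regulatory",
]


def run_assessment(task: str, ask: Callable[[str], str]) -> Dict[str, object]:
    """Chain four stages. `ask` is a hypothetical prompt-in, text-out function."""
    # Stage 1: decomposition of the plain-language task description.
    decomposition = ask(f"Decompose this agentic AI task:\n{task}")

    # Stage 2: each criterion is scored independently, one call per criterion.
    scores = {
        criterion: ask(f"Score the task on '{criterion}'.\n{decomposition}")
        for criterion in CRITERIA
    }

    # Stage 3: synthesis of the five criterion results.
    joined = "\n".join(f"{name}: {text}" for name, text in scores.items())
    synthesis = ask(f"Synthesize an intervention level from:\n{joined}")

    # Stage 4: output formatting into a card.
    card = ask(f"Format a Governance Assessment Card from:\n{synthesis}")

    return {
        "decomposition": decomposition,
        "scores": scores,
        "synthesis": synthesis,
        "card": card,
    }


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return f"[reply to: {prompt.splitlines()[0]}]"


result = run_assessment("An agent auto-approves small refunds.", stub_model)
print(sorted(result))  # ['card', 'decomposition', 'scores', 'synthesis']
```

Passing the model call in as a parameter keeps the chain stateless and lets the sequence be exercised with a stub, as here, without any network access.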

3. Review the Governance Assessment Card

The card presents criterion scores with rationale, an intervention level recommendation with synthesis reasoning, and suggested oversight actions specific to the task. You may enter an email address to receive the card by email. Each run is independent; no data is retained.
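One way to picture the card's contents is as a small data structure. The sketch below is hypothetical: the class and field names are invented for illustration, and the integer `score` field is an assumption, since this page does not state the scoring scale. The five intervention level names are the ones listed on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class InterventionLevel(Enum):
    AUTONOMOUS = "Autonomous"
    MONITOR = "Monitor"
    REVIEW_REQUIRED = "Review Required"
    JOINT_EXECUTION = "Joint Execution"
    HUMAN_ONLY = "Human Only"


@dataclass(frozen=True)
class CriterionScore:
    criterion: str
    score: int  # assumed numeric scale; the page does not specify one
    rationale: str


@dataclass(frozen=True)
class GovernanceAssessmentCard:
    criterion_scores: List[CriterionScore]
    intervention_level: InterventionLevel
    synthesis_reasoning: str
    oversight_actions: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the card as plain text."""
        lines = [
            f"Intervention level: {self.intervention_level.value}",
            f"Reasoning: {self.synthesis_reasoning}",
            "Criterion scores:",
        ]
        lines += [
            f"  {s.criterion}: {s.score} ({s.rationale})"
            for s in self.criterion_scores
        ]
        lines.append("Suggested oversight actions:")
        lines += [f"  - {action}" for action in self.oversight_actions]
        return "\n".join(lines)


card = GovernanceAssessmentCard(
    criterion_scores=[
        CriterionScore("Irreversibility", 4, "Payments cannot be recalled."),
    ],
    intervention_level=InterventionLevel.REVIEW_REQUIRED,
    synthesis_reasoning="One criterion scores high; the rest are low.",
    oversight_actions=["Require sign-off above a set payment amount."],
)
print(card.to_text())
```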

Try the Assessment Tool

No login required. Works on any device with a browser. Three example tasks are preloaded, or enter your own.

The five-characteristic framework and intervention level taxonomy are documented in GIAG Working Paper Two. The tool uses a four-stage chained prompt architecture built on the Anthropic API. Each assessment run is stateless; no task descriptions, results, or user information are stored. The tool is a research prototype, not a certified governance instrument, and should be used accordingly.

Practitioner feedback welcome. If you run a task through the tool and find the scoring inaccurate, the rationale unclear, or the intervention level recommendation miscalibrated for your operational context, that is useful research input. Use the Engage page to share your observations.
Stream Two  •  Active Prototype

Human Oversight Quality Index (HOQI)

HOQI asks a more fundamental question than whether human oversight exists: it asks whether that oversight is meaningful. The tool scores an AI system's oversight arrangement across five dimensions that determine whether human review translates into genuine accountability or merely into documentation. It generates a scored profile and a Claude-powered advisory analysis tailored to your deployment context and sector.

Oversight quality dimensions:
Reviewer Competence: Does the reviewer understand what they are reviewing?
Decision Authority: Can the reviewer actually stop or change the output?
Cognitive Load: Is the review volume and pace compatible with careful judgment?
Feedback Loop Integrity: Do reviewer corrections reach the system and improve it?
Accountability: Is there a named individual responsible for outcomes?

Rating scale: Absent, Marginal, Nominal, Functional, Exemplary

1. Describe your oversight arrangement

Provide a plain-language description of the AI system you are assessing and the oversight structure currently in place: who reviews outputs, at what cadence, with what authority, and in what operational context. The more specific the description, the more precise the scoring and advisory output.

2. Answer the diagnostic questions

The tool walks through structured prompts for each of the five dimensions. Responses do not require technical expertise. They draw on what practitioners already know about how their oversight processes actually work in practice, not how they are documented on paper.

3. Review the HOQI score and advisory analysis

The output includes a dimension-by-dimension score with rationale, a composite HOQI score, and a practitioner advisory that identifies the highest-priority gaps and suggests specific strengthening actions. Scores are contextualized to your sector and deployment type.
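To make the relationship between dimension scores, the composite score, and priority gaps concrete, here is a minimal sketch. It rests on two assumptions that this page does not confirm: that the five labels (Absent through Exemplary) form an ordered scale mapped to 0 through 4, and that the composite is an unweighted mean. The `hoqi_profile` function is hypothetical; the actual rubric may differ.

```python
from typing import Dict

# Assumed ordering, lowest to highest, of the labels listed on this page.
LEVELS = ["Absent", "Marginal", "Nominal", "Functional", "Exemplary"]


def hoqi_profile(ratings: Dict[str, str], top_gaps: int = 2) -> Dict[str, object]:
    """Turn per-dimension labels into points, a composite, and priority gaps."""
    points = {dim: LEVELS.index(label) for dim, label in ratings.items()}
    # Assumption: the composite is a simple mean of the dimension points.
    composite = sum(points.values()) / len(points)
    # The lowest-scoring dimensions are treated as the highest-priority gaps.
    gaps = sorted(points, key=lambda dim: points[dim])[:top_gaps]
    return {"points": points, "composite": composite, "priority_gaps": gaps}


example = {
    "Reviewer Competence": "Functional",
    "Decision Authority": "Marginal",
    "Cognitive Load": "Nominal",
    "Feedback Loop Integrity": "Absent",
    "Accountability": "Exemplary",
}
profile = hoqi_profile(example)
print(profile["composite"])      # 2.0
print(profile["priority_gaps"])  # ['Feedback Loop Integrity', 'Decision Authority']
```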

Run a HOQI Assessment

No login required. Works on any device with a browser. Typical completion time is five to ten minutes.

The five oversight quality dimensions and scoring rubric are grounded in the human oversight decision framework developed in GIAG Working Paper Two. The HOQI tool extends that framework from task-level oversight decisions to system-level oversight quality evaluation. Scores are generated through a structured AI-assisted analysis using the Anthropic API. Each assessment run is stateless. The tool is a research prototype, not a certified governance instrument.

Practitioner feedback welcome. HOQI calibration is an ongoing research activity. If your sector's oversight arrangements produce scores that feel miscalibrated (too high, too low, or missing important structural factors), that feedback directly informs the scoring rubric. Use the Engage page to share observations.
Stream One & Two  •  Active Prototype

AI Use Case Risk Tiering Wizard

Risk tiering is a foundational governance activity, one that many agencies complete inconsistently or based on informal judgment rather than structured criteria. This wizard assigns a defensible Tier 1-5 risk designation to any AI use case by evaluating it across five dimensions. The result is a scored profile with documented rationale that can support governance documentation, procurement decisions, and oversight planning.

Risk dimensions:
Consequential Impact: What is the scale and severity of potential harm?
Autonomy and Oversight: How much does the system operate without human check?
Data Sensitivity: What categories of data does the system access or process?
Decision Reversibility: Can downstream decisions be corrected or appealed?
Validation Maturity: How well-tested and validated is the system for this use case?

Risk tiers:
Tier 1: Minimal
Tier 2: Low
Tier 3: Moderate
Tier 4: High
Tier 5: Critical

1. Describe the AI use case

Enter a description of the AI use case you want to tier: what it does, who it affects, what data it uses, and how its outputs are used in decision-making. You do not need to complete a formal use case inventory in advance; a working description is sufficient for the tool to generate a scored assessment.

2. The wizard scores across five dimensions

Each dimension is scored independently against a structured rubric. The wizard evaluates your description against each criterion and generates a rationale for each score. The process takes 20 to 40 seconds. Dimension scores are combined into an overall tier designation using a weighted synthesis that reflects relative governance significance.
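A weighted synthesis of this kind can be sketched as a weighted average mapped onto the five tiers. The weights below are placeholders invented for illustration, not the wizard's actual weighting scheme, and the sketch assumes each dimension is scored from 1 (lowest risk contribution) to 5 (highest), which this page does not specify. Only the dimension names and tier labels are taken from this page.

```python
import math
from typing import Dict, Tuple

# Hypothetical weights; the real weighting scheme is not published on this page.
WEIGHTS = {
    "Consequential Impact": 0.30,
    "Autonomy and Oversight": 0.20,
    "Data Sensitivity": 0.15,
    "Decision Reversibility": 0.20,
    "Validation Maturity": 0.15,
}
TIER_LABELS = {1: "Minimal", 2: "Low", 3: "Moderate", 4: "High", 5: "Critical"}


def assign_tier(scores: Dict[str, int]) -> Tuple[int, str]:
    """Combine 1-5 dimension scores into a tier via a weighted average."""
    for dim in WEIGHTS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} score must be between 1 and 5")
    total_weight = sum(WEIGHTS.values())
    weighted = sum(scores[dim] * w for dim, w in WEIGHTS.items()) / total_weight
    # Round half up, then clamp to the valid tier range.
    tier = min(5, max(1, math.floor(weighted + 0.5)))
    return tier, TIER_LABELS[tier]


example = {
    "Consequential Impact": 4,
    "Autonomy and Oversight": 3,
    "Data Sensitivity": 5,
    "Decision Reversibility": 2,
    "Validation Maturity": 3,  # assumed: higher means less validated, so riskier
}
print(assign_tier(example))  # (3, 'Moderate')
```

In this example the weighted average is 3.4, which rounds to Tier 3. A plain average would hide the effect of weighting: raising the weight on Consequential Impact pulls use cases with severe potential harm toward higher tiers even when other dimensions score low.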

3. Review the tier designation and rationale

The output provides a Tier 1-5 designation with full scoring rationale, a summary of the factors driving the tier, and governance recommendations calibrated to that tier level. The rationale is designed to be documentable, suitable for use in governance records or to support oversight conversations with leadership.

Run the Risk Tiering Wizard

No login required. Works on any device with a browser. Preloaded examples span routine administrative tools to high-consequence decision support systems.

The five-dimension risk tiering framework draws on NIST AI RMF categorization guidance, OMB M-24-10 use case inventory requirements, and GIAG Stream One research on implementation fidelity patterns across federal and state agencies. The scoring rubric is documented in Working Paper One. The tool uses the Anthropic API for dimension scoring and synthesis. Each run is stateless.

Tier calibration feedback welcome. The tier boundaries and weighting scheme are working hypotheses subject to empirical refinement. If you apply the tool to use cases you have already tiered through other processes and find systematic discrepancies, that is valuable calibration data. Use the Engage page to share your findings.
Stream One  •  Active Prototype

AI RMF Implementation Fidelity Checker

Most agencies that claim NIST AI RMF adoption have documented the framework rather than operationalized it. This tool probes the difference. It evaluates implementation across all four AI RMF functions using a three-point fidelity scale that distinguishes among documented compliance, partial operationalization, and genuine use. The output identifies which functions are performing at fidelity and which represent governance gaps that policy artifacts do not reveal.

AI RMF functions:
Govern: Is AI risk governance embedded in organizational structure?
Map: Are AI risks systematically identified and contextualized?
Measure: Are risk levels assessed against defined criteria?
Manage: Are risks treated and tracked with documented accountability?

Fidelity scale:
Documented: Policy exists, practice does not
Partial: Some functions operationalized
Operationalized: Framework in active use

1. Describe your current RMF implementation

Provide a description of how your organization currently implements or references the NIST AI RMF, including which functions have formal documentation, where active processes exist, who owns implementation, and how the framework connects to actual AI deployment decisions. Candid descriptions of partial or nominal compliance produce more useful diagnostic output.

2. The checker probes each function

The tool generates targeted diagnostic questions for each of the four RMF functions, then evaluates your responses against the three-point fidelity scale. The probing sequence is designed to surface the gap between policy documentation and operational practice, the gap that aggregate compliance metrics consistently obscure.

3. Review the fidelity profile and gap analysis

The output presents a function-by-function fidelity score, a composite implementation profile, and a prioritized gap analysis. The advisory identifies the most consequential fidelity gaps and suggests specific operationalization steps calibrated to your organization's current state. Results are suitable for use in maturity assessments or implementation planning.
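A function-by-function profile and a prioritized gap list could be represented as follows. This is a sketch under stated assumptions: the numeric ranks assigned to the three fidelity levels and the rule that lower fidelity means higher priority are illustrative choices, and `fidelity_gaps` is a hypothetical helper, not part of the tool. The function names and fidelity labels are the ones listed on this page.

```python
from typing import Dict, List

# Assumed ranks for the three-point scale, lowest fidelity first.
FIDELITY = {"Documented": 1, "Partial": 2, "Operationalized": 3}
FUNCTIONS = ["Govern", "Map", "Measure", "Manage"]


def fidelity_gaps(profile: Dict[str, str]) -> List[str]:
    """Return functions below 'Operationalized', lowest fidelity first."""
    missing = [f for f in FUNCTIONS if f not in profile]
    if missing:
        raise ValueError(f"profile is missing functions: {missing}")
    below = [
        f for f in FUNCTIONS
        if FIDELITY[profile[f]] < FIDELITY["Operationalized"]
    ]
    # Stable sort: ties keep the Govern, Map, Measure, Manage order.
    return sorted(below, key=lambda f: FIDELITY[profile[f]])


example = {
    "Govern": "Operationalized",
    "Map": "Partial",
    "Measure": "Documented",
    "Manage": "Partial",
}
print(fidelity_gaps(example))  # ['Measure', 'Map', 'Manage']
```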

Run the Fidelity Checker

No login required. Works on any device with a browser. Typical completion time is ten to fifteen minutes for a thorough description of current RMF implementation.

The fidelity framework and three-point scale are documented in GIAG Working Paper One: "Implementation Fidelity: Why AI RMF Adoption Metrics Are Measuring the Wrong Thing." The tool operationalizes the paper's core argument that adoption rates and framework citations measure presence, not use. The probing questions and scoring rubric are grounded in the diagnostic methodology developed through GIAG Stream One practitioner research. Each run is stateless. The tool is a research prototype, not a certified audit instrument.

Practitioner assessment data welcome. If you run the fidelity checker against your organization's actual RMF implementation and find the diagnostic questions missing key operational factors, or the fidelity scale failing to capture important distinctions, that feedback strengthens the underlying research methodology. Use the Engage page to share your observations.