Public anonymized sample

Sample Audit Report: Fictional AI Math Tutor

A parent-readable EduProof report showing how an AI education tool can be reviewed for learning value, student safety, privacy, teacher control, and marketing truthfulness.

Yellow — use only with teacher supervision

Tool reviewed: MathMate AI Tutor (fictional product)

Use case: Middle-school algebra (homework help and practice feedback)

Overall score: 27 / 40 (Yellow band)

Parent-readable summary

Useful as practice support, not ready as a fully reliable tutor.

MathMate is a fictional AI math tutor that gives step-by-step explanations, hints, and practice problems for middle-school algebra. In this sample audit, the tool looks useful as a homework-support and draft-feedback assistant, but it is not ready to be presented as a fully reliable tutor or grading system.

Recommended wording: “Students may use AI to receive hints and extra algebra explanations. Teachers remain responsible for instruction, final feedback, and class decisions. The AI may make mistakes, so students should treat it as practice support, not as an authority.”

Verdict

Green / Yellow / Red result

Area	Result	Meaning
Privacy & student data	Yellow	Core data types are listed, but metadata, retention, and model-training limits need clearer documentation.
Accuracy & safety	Yellow	Useful for routine algebra help; needs repeatable error testing and stronger Korean safety checks.
Learning evidence	Yellow	Claims are plausible but not yet proven in this exact student/context group.
Teacher control	Green/Yellow	Teachers can review outputs, but approval flow is not mandatory for every feedback item.
Parent explainability	Green/Yellow	Parent explanation can be made clear if the academy avoids overclaiming.
Dependency & marketing risk	Yellow	Manual fallback exists, but export/continuity details are not strong enough.

Evidence

Evidence reviewed in this sample

Vendor privacy-policy excerpt listing account data, answer logs, prompts, and performance scores.
Vendor help-center page describing AI-generated hints and teacher dashboard.
Screenshots of student chat, teacher review screen, and progress dashboard.
Academy draft parent notice and opt-out wording.
15-question algebra prompt test set prepared by the academy.
Staff note describing how teachers review wrong or confusing AI feedback.

Findings

Key findings

Green

Low-stakes deployment is possible.
Teachers can inspect most outputs.
Claims can be stated responsibly.
Student experience is mostly clear.

Yellow

Data map is incomplete.
Deletion process needs a timeline.
Safety filters need local-language testing.
Learning-impact evidence comes from adjacent contexts, not from a local pilot.
Fallback plan is thin.

Red

No Red critical findings in this sample. A real audit would block broad launch for undisclosed student-data reuse, no parent notice, AI-only grading, no teacher override, or unsupported grade-improvement claims.
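The blocking rule above can be sketched as a simple check. This is a minimal illustration, not part of the audit method: the identifier strings and the function names `blocking_findings` and `may_launch_broadly` are hypothetical, and only the five blockers themselves come from the report.

```python
# Hypothetical identifiers for the five launch blockers named in the report.
RED_BLOCKERS = {
    "undisclosed_student_data_reuse": "Undisclosed student-data reuse",
    "no_parent_notice": "No parent notice",
    "ai_only_grading": "AI-only grading",
    "no_teacher_override": "No teacher override",
    "unsupported_grade_claims": "Unsupported grade-improvement claims",
}


def blocking_findings(observed):
    """Return the observed findings that count as Red blockers, sorted by name."""
    return sorted(set(observed) & set(RED_BLOCKERS))


def may_launch_broadly(observed):
    """Broad launch is allowed only when no Red blocker is present."""
    return not blocking_findings(observed)


# Illustrative inputs: the first has Yellow-level issues only, the second has two blockers.
sample_findings = ["incomplete_data_map", "thin_fallback_plan"]
blocked_findings = ["incomplete_data_map", "ai_only_grading", "no_parent_notice"]

print(may_launch_broadly(sample_findings))
print(blocking_findings(blocked_findings))
```

Yellow findings pass through this check untouched, so they still need the mitigations listed later in the report.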

Rubric snapshot

20-question score summary

ID	Topic	Score	Evidence note
Q1	Data inventory	Yellow	Main data listed; metadata and derived labels unclear.
Q2	Consent and notice	Yellow	Draft notice exists; opt-out path needs clearer wording.
Q3	Retention and deletion	Yellow	Deletion possible; timeline and full scope not specified.
Q4	Sharing and model training	Yellow	Needs student-data exclusion or consent option.
Q5	Error behavior	Yellow	Informal tests completed; repeatable test set not maintained.
Q6	Student harm filters	Yellow	Korean edge-case evidence limited.
Q7	Confidence and uncertainty	Green	Tool often refers uncertain answers to teachers.
Q8	High-stakes limits	Green	Limited to practice, not grading/placement.
Q9	Claim specificity	Green	Claims can focus on faster hints and extra practice.
Q10	Evidence quality	Yellow	No local pilot yet.
Q11	Baseline and measurement	Yellow	Metrics exist; success threshold not final.
Q12	Pedagogical fit	Green	Fits algebra homework-help workflow.
Q13	Teacher visibility	Green	Dashboard shows prompts, answers, and feedback history.
Q14	Edit and override	Yellow	Not all hints require pre-approval.
Q15	Escalation protocol	Yellow	Written protocol missing.
Q16	Plain-language explanation	Green	One-page parent explanation can be produced.
Q17	Student experience clarity	Green	AI label and safe-use reminder are visible.
Q18	Parent objection handling	Yellow	Alternative is possible but manual.
Q19	Vendor/dependency risk	Yellow	Export and outage plan incomplete.
Q20	Marketing truthfulness	Yellow	Remove the claims “personalized mastery” and “safe AI tutor” from marketing copy.
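The 27 / 40 total can be reproduced from the table with a short tally. The point values below (Green = 2, Yellow = 1, Red = 0) are an assumption: the report does not state its scoring rule, but this mapping is consistent with 20 questions, a 40-point maximum, and the reported total.

```python
from collections import Counter

# Assumed point values; not stated in the report, chosen because they match 27 / 40.
POINTS = {"Green": 2, "Yellow": 1, "Red": 0}

# Ratings copied from the rubric table above.
RATINGS = {
    "Q1": "Yellow", "Q2": "Yellow", "Q3": "Yellow", "Q4": "Yellow",
    "Q5": "Yellow", "Q6": "Yellow", "Q7": "Green", "Q8": "Green",
    "Q9": "Green", "Q10": "Yellow", "Q11": "Yellow", "Q12": "Green",
    "Q13": "Green", "Q14": "Yellow", "Q15": "Yellow", "Q16": "Green",
    "Q17": "Green", "Q18": "Yellow", "Q19": "Yellow", "Q20": "Yellow",
}


def tally(ratings):
    """Return (score, maximum score, count per rating) for a rubric."""
    counts = Counter(ratings.values())
    score = sum(POINTS[rating] for rating in ratings.values())
    max_score = len(ratings) * max(POINTS.values())
    return score, max_score, counts


score, max_score, counts = tally(RATINGS)
print(f"{score} / {max_score}", dict(counts))
```

Under this assumed mapping, the 7 Green and 13 Yellow ratings give 14 + 13 = 27 points.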

Recommended actions

Mitigations before broader use

Finalize the parent notice explaining data, optionality, teacher supervision, and limits.
Ask the vendor for a written student-data training policy and an opt-out from model-improvement use.
Create a deletion and retention sheet with a concrete response time.
Run a repeatable 30–50 prompt math-error test set.
Test Korean safety prompts if Korean-speaking students will use the product.
Write an escalation protocol for wrong feedback, unsafe responses, parent complaints, and data incidents.
Define a 4–6 week measurement plan.
Clean marketing copy to avoid grade guarantees or teacher-replacement language.
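One way to make the math-error test set repeatable is to keep the prompts and expected answers in a fixed list and re-run them after every product or model change. The sketch below is only an illustration under stated assumptions: `Case`, `run_suite`, and `stub_tutor` are hypothetical names, the three prompts are made-up examples, and the stub stands in for whatever interface the real tool exposes. Exact string matching on the final answer is a deliberate simplification; a real set would need a teacher to judge explanations as well.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    expected: str  # expected final answer, e.g. "x = 4"


def normalize(answer: str) -> str:
    """Lowercase and strip all whitespace so 'x = 4' matches 'X=4'."""
    return "".join(answer.lower().split())


def run_suite(cases: List[Case], ask_tutor: Callable[[str], str]) -> dict:
    """Run every case through the tutor and report which ones failed."""
    failed = [
        case.case_id
        for case in cases
        if normalize(ask_tutor(case.prompt)) != normalize(case.expected)
    ]
    return {"total": len(cases), "failed": failed, "error_rate": len(failed) / len(cases)}


# Made-up example cases; a real set would hold 30 to 50 prompts.
CASES = [
    Case("A1", "Solve 2x + 3 = 11", "x = 4"),
    Case("A2", "Solve 3x - 5 = 10", "x = 5"),
    Case("A3", "Solve x/2 = 7", "x = 14"),
]


def stub_tutor(prompt: str) -> str:
    """Hypothetical stand-in for the AI tool; deliberately wrong on the last case."""
    canned = {
        "Solve 2x + 3 = 11": "x = 4",
        "Solve 3x - 5 = 10": "X=5",
        "Solve x/2 = 7": "x = 7",
    }
    return canned[prompt]


report = run_suite(CASES, stub_tutor)
print(report)
```

Keeping the case IDs stable lets the academy compare failure lists between runs instead of relying on informal spot checks.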

Vendor questions

Questions to send vendor

What exact student data fields are collected, including logs, prompts, device data, derived labels, and analytics events?
Is student content used to train, fine-tune, evaluate, or improve any model? Can this be disabled?
Which subprocessors receive student data, and where is data stored or processed?
What are the retention periods for accounts, chat logs, practice answers, and analytics?
What is the deletion SLA after a parent or academy requests removal?
Can teachers export prompts, AI answers, scores, and progress history?
How does the model handle incorrect student reasoning and ambiguous algebra answers?
What safety filters are tested in Korean or the actual language students use?
Can the academy restrict final grades, placement advice, counseling advice, or discipline recommendations?
What happens if pricing, model provider, API access, or product availability changes?

Limitations

What this sample does not prove

This is a fictional, anonymized sample and does not evaluate a real vendor.
Scores are based on representative evidence, not live product testing.
This is not legal advice, privacy-law compliance certification, or security certification.
A real audit requires actual vendor documents, screenshots, workflow evidence, and test results.