# EduProof Sample Audit Report — Fictional AI Math Tutor

**Public sample:** anonymized / fictional vendor  
**Tool reviewed:** MathMate AI Tutor (fictional)  
**Use case:** middle-school algebra homework help and practice feedback  
**Target students:** Grade 7–9  
**Report date:** 2026-07-04  
**Final verdict:** **Yellow — use only with teacher supervision and parent-facing explanation**

> This is a sample report showing EduProof's format. It is not an audit of a real company or product.

---

## Parent-readable summary

MathMate is a fictional AI math tutor that gives step-by-step explanations, hints, and practice problems for middle-school algebra. In this sample audit, the tool looks useful as a **homework support and draft-feedback assistant**, but it is **not ready to be presented as a fully reliable tutor or grading system**.

The strongest points are that teachers can review student progress, the AI usually flags its own uncertainty, and the academy can limit use to low-stakes practice. The main concerns are unclear student-data handling, incomplete Korean-language safety testing, limited evidence of learning improvement, and weak vendor continuity planning.

**Recommended parent-facing wording:**  
“Students may use AI to receive hints and extra algebra explanations. Teachers remain responsible for instruction, final feedback, and class decisions. The AI may make mistakes, so students should treat it as practice support, not as an authority.”

---

## Verdict

| Area | Result | Meaning |
|---|---|---|
| Privacy & student data | Yellow | Core data types are listed, but metadata, retention, and model-training limits need clearer documentation. |
| Accuracy & safety | Yellow | Useful for routine algebra help; needs repeatable error testing and stronger Korean safety checks. |
| Learning evidence | Yellow | Claims are plausible but not yet proven in this exact student/context group. |
| Teacher control | Green/Yellow | Teachers can review outputs, but approval flow is not mandatory for every feedback item. |
| Parent explainability | Green/Yellow | Parent explanation can be made clear if the academy avoids overclaiming. |
| Dependency & marketing risk | Yellow | Manual fallback exists, but export/continuity details are not strong enough. |

**Overall score:** 27 / 40  
**Overall rating:** **Yellow**

Reason: the score falls in the Yellow band. No critical item is Red in this sample, but several critical items are Yellow and require mitigation before broad deployment.

---

## Evidence reviewed in this sample

Because this is a fictional public sample, the “evidence” below is representative of what EduProof would request and inspect in a real audit.

- Vendor privacy-policy excerpt listing account data, answer logs, prompts, and performance scores.
- Vendor help-center page describing AI-generated hints and the teacher dashboard.
- Screenshots of student chat, teacher review screen, and progress dashboard.
- Academy draft parent notice and opt-out wording.
- 15-question algebra prompt test set prepared by the academy.
- Staff note describing how teachers review wrong or confusing AI feedback.

---

## Key findings

### Green findings

1. **Low-stakes deployment is possible.** The academy plans to use the tool for practice and hints, not final grading or placement.
2. **Teachers can inspect most outputs.** The dashboard shows student questions, AI replies, and practice accuracy.
3. **Claims can be stated responsibly.** The tool can be described as “extra explanation and practice support” without promising grade improvement.
4. **Student experience is mostly clear.** The interface labels AI responses and can show a short “do not share personal information” reminder.

### Yellow findings

1. **Data map is incomplete.** The vendor lists student answers and account data, but device logs, derived performance labels, and prompt-retention details are unclear.
2. **Deletion process needs a timeline.** Deletion is described as available, but no concrete SLA is given.
3. **Safety filters need local-language testing.** General safety filtering exists, but Korean-language edge cases are not documented.
4. **Learning-impact evidence is indirect.** Vendor claims are based on general practice-engagement data, not a controlled pilot in this academy’s age group and curriculum.
5. **Teacher override is available but not always built into the workflow.** Teachers can correct outputs, but some student-facing hints appear immediately, before any teacher review.
6. **Fallback plan is thin.** If the vendor changes pricing or has an outage, teachers can return to worksheets, but data export and continuity are not formalized.

### Red findings

No Red critical findings in this sample. EduProof would still block broad launch if any of the following appeared in a real audit: undisclosed student-data reuse for model training, no parent notice, AI-only grading, no teacher override, or unsupported marketing claims such as “guaranteed score improvement.”

---

## 20-question rubric snapshot

| ID | Category | Score | Evidence note |
|---|---|---|---|
| Q1 | Data inventory | Yellow | Main data listed; metadata and derived labels unclear. |
| Q2 | Consent and notice | Yellow | Draft parent notice exists; opt-out path needs clearer wording. |
| Q3 | Retention and deletion | Yellow | Deletion possible; timeline and full scope not specified. |
| Q4 | Sharing and model training | Yellow | Training policy says “service improvement”; needs student-data exclusion or consent option. |
| Q5 | Error behavior | Yellow | Informal algebra tests completed; repeatable test set not yet maintained. |
| Q6 | Student harm filters | Yellow | Vendor claims filters; Korean edge-case evidence limited. |
| Q7 | Confidence and uncertainty | Green | Tool often says “check with your teacher” on uncertain answers. |
| Q8 | High-stakes limits | Green | Academy policy limits tool to practice, not grading/placement. |
| Q9 | Claim specificity | Green | Acceptable claim: faster hints and extra practice, not grade guarantee. |
| Q10 | Evidence quality | Yellow | Evidence is indirect (general engagement data); no local pilot yet. |
| Q11 | Baseline and measurement | Yellow | Academy will track homework completion, but success threshold not final. |
| Q12 | Pedagogical fit | Green | Fits algebra homework-help workflow. |
| Q13 | Teacher visibility | Green | Dashboard shows prompts, answers, and AI feedback history. |
| Q14 | Edit and override | Yellow | Teachers can correct, but not all hints require pre-approval. |
| Q15 | Escalation protocol | Yellow | Staff know informally who owns escalation; written protocol missing. |
| Q16 | Plain-language explanation | Green | One-page parent explanation can be produced from current materials. |
| Q17 | Student experience clarity | Green | AI label and safe-use reminder are visible. |
| Q18 | Parent objection handling | Yellow | Non-AI worksheet alternative possible but operationally manual. |
| Q19 | Vendor/dependency risk | Yellow | Manual fallback exists; export and outage plan incomplete. |
| Q20 | Marketing truthfulness | Yellow | Current wording needs removal of “personalized mastery” and “safe AI tutor.” |
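
The overall score of 27 / 40 can be reproduced from the rubric table above under one assumption that this sample does not state explicitly: Green = 2 points, Yellow = 1, Red = 0. The sketch below is illustrative only. The point values and the `total_score` helper are assumptions, and the thresholds of the Yellow band are not given in this sample, so they are not modeled.

```python
from collections import Counter

# Assumed point values; this sample does not state them explicitly.
POINTS = {"Green": 2, "Yellow": 1, "Red": 0}

# Ratings copied from the rubric snapshot (Q1 to Q20).
# These seven items are Green in the table; every other item is Yellow.
GREEN_IDS = {7, 8, 9, 12, 13, 16, 17}
RATINGS = {f"Q{i}": ("Green" if i in GREEN_IDS else "Yellow") for i in range(1, 21)}


def total_score(ratings, points=POINTS):
    """Sum the assumed point value of each rating."""
    return sum(points[rating] for rating in ratings.values())


counts = Counter(RATINGS.values())
score = total_score(RATINGS)
max_score = len(RATINGS) * max(POINTS.values())

print(f"Green: {counts['Green']}, Yellow: {counts['Yellow']}, Red: {counts['Red']}")
print(f"Score: {score} / {max_score}")
```

Under that assumption, 7 Green items and 13 Yellow items give 7 × 2 + 13 × 1 = 27 of a possible 40, which matches the reported total.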

---

## Required mitigations before broader use

1. **Finalize parent notice** explaining data collected, purpose, optionality, teacher supervision, and limits.
2. **Ask vendor for a written student-data training policy** and, if applicable, an option to disable the use of student data for model improvement.
3. **Create a deletion and retention sheet** with retention windows, deletion request contact, and expected response time.
4. **Run a repeatable 30–50 prompt math-error test set** including wrong assumptions, ambiguous student answers, and misleading worked solutions.
5. **Test Korean safety prompts** if Korean-speaking students will use the product.
6. **Write escalation protocol** covering wrong feedback, unsafe responses, parent complaints, and data incidents.
7. **Define measurement plan**: baseline homework completion, reattempt rate, teacher correction rate, and student confusion reports over 4–6 weeks.
8. **Clean marketing copy** to avoid grade guarantees, teacher replacement language, or unsupported “fully personalized” claims.
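
One way to make the math-error test set in mitigation 4 repeatable is to keep the prompts in a fixed list and run them through the tool the same way each time. The following is a minimal sketch, not a prescribed EduProof method: `PromptCase`, `run_suite`, and `stub_tutor` are hypothetical names, the two cases are invented examples, and the stub stands in for whatever access the vendor actually provides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptCase:
    case_id: str
    category: str      # e.g. "wrong_assumption", "ambiguous_answer", "misleading_solution"
    prompt: str
    must_contain: str  # text a correct reply is expected to include


# Two illustrative cases; a real set would hold 30 to 50.
CASES = [
    PromptCase("A1", "wrong_assumption",
               "I solved 2x + 3 = 11 and got x = 7. Is that right?", "x = 4"),
    PromptCase("A2", "ambiguous_answer",
               "Is the answer to x^2 = 9 just 3?", "-3"),
]


def run_suite(cases, ask_tutor: Callable[[str], str]) -> dict:
    """Send every prompt to the tutor and record which replies miss the expected text."""
    failed = [case.case_id for case in cases
              if case.must_contain.lower() not in ask_tutor(case.prompt).lower()]
    error_rate = len(failed) / len(cases) if cases else 0.0
    return {"total": len(cases), "failed": failed, "error_rate": error_rate}


def stub_tutor(prompt: str) -> str:
    """Stand-in for the real product; replace with a call to the tool under test."""
    canned = {
        CASES[0].prompt: "Not quite. Subtract 3 from both sides, then divide by 2: x = 4.",
        CASES[1].prompt: "Yes, x = 3.",
    }
    return canned.get(prompt, "")


result = run_suite(CASES, stub_tutor)
print(result)
```

Substring matching is a crude check, so failed cases still need teacher review. The value of the fixed list is that the same prompts can be rerun after a vendor model change and the error rates compared.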

---

## Recommended vendor questions

1. What exact student data fields are collected, including logs, prompts, device data, derived labels, and analytics events?
2. Is student content used to train, fine-tune, evaluate, or improve any model? If yes, can this be disabled for school/academy customers?
3. Which subprocessors receive student data, and in which countries is data stored or processed?
4. What are the default retention periods for accounts, chat logs, practice answers, and analytics data?
5. What is the deletion SLA after a parent or academy requests removal?
6. Can teachers export prompts, AI answers, scores, and progress history in a usable format?
7. How does the model handle incorrect student reasoning and ambiguous algebra answers?
8. What safety filters are tested in Korean or the actual language students use?
9. Can the academy restrict the AI from giving final grades, placement advice, counseling advice, or disciplinary recommendations?
10. What happens if the vendor changes pricing, changes the underlying model, loses API access, or shuts down the product?

---

## Parent-facing recommendation

**Use with conditions.** MathMate-style AI tutoring can help students get extra algebra practice, but it should be introduced as a supervised support tool. Parents should receive a short explanation before use. Teachers should review error cases and keep a non-AI alternative available.

**Do not say:** “This AI guarantees better grades” or “This replaces teacher feedback.”  
**Safer wording:** “This AI gives extra practice hints. Teachers remain responsible for instruction and final feedback.”

---

## Limitations of this sample report

- This is a fictional, anonymized sample and does not evaluate a real vendor.
- Scores are based on representative evidence, not live product testing.
- The report is not legal advice, privacy-law compliance certification, or security certification.
- A real audit would require actual vendor documents, product screenshots, academy workflow evidence, and test results.
- A Yellow rating is not a permanent approval; it means controlled use is possible only if mitigations are completed.
