By Kim Xi Harris, Founder & Platform Architect, Lex Arca™ Legal Vault | Calculate your firm’s billing leakage | legalvault@lex-arca.com
According to Clio’s 2026 Legal Trends Report for Solo and Small Law Firms (May 2026, https://www.clio.com/about/press/2026-solo-small-firm-report/), 71% of solo practitioners and 75% of small firms now use AI to complete legal work, yet fewer than 33% have seen any revenue increase from it, compared with nearly 60% of enterprise firms. Adoption is outrunning results, and it is also outrunning compliance. That second gap is not a policy problem. It is an architecture problem.
Anthropic’s Claude for Legal toolkit, released May 12, 2026, can produce a litigation chronology that covers up to 86% of a professional rubric’s criteria. That is genuinely useful. It is also not a compliance record. An AI-generated timeline is analysis. A Verification Attestation backed by an append-only, tamper-evident activity trail is documentation. In 2026, attorneys need both, because courts and bar associations increasingly demand the second, yet most AI legal tools are built only for the first.
What Did Anthropic’s Claude for Legal Actually Produce?
An independent benchmark, published by Overfitting Dicta on May 22, 2026, tested Claude for Legal’s chronology skill: the litigation-plugin component that extracts events from a case file, tags each event’s significance, flags evidentiary gaps, and builds a timeline. The skill was run against Harvey Labs’ build-litigation-case-timeline task (a synthetic breach-of-contract case, Harborview v. Greenleaf) across eleven model and effort combinations, and each run was scored against a 66-criterion rubric by two independent AI judges (Opus 4.7 and GPT-5.5). The results were meaningful.
Sonnet and Opus models at medium effort and above covered 76% to 86% of the rubric criteria. Runs consumed 85,000 to 249,000 tokens and took 2 to 16 minutes. Top performance was tied between Opus at Medium effort and Opus at Max effort — both at 86.4% coverage, 57 of 66 criteria.
That is a real capability. Any solo attorney or paralegal who has spent three hours manually building a case chronology from discovery documents will understand immediately what a structured, AI-extracted timeline means for case preparation speed.
Performance scaled with effort level — not just model size. Low-effort runs across all models landed in a 40%–62% coverage band. The jump to medium effort pushed capable models into the 76%–86% range.
The benchmark also identified a consistent weak spot: coverage of issue spotting and damages assessment was sparse across all model and effort combinations. The more a task required legal judgment rather than event extraction, the thinner the coverage. The two judge models disagreed most often on those criteria, which the author flagged as a reliability signal.
What Does the Benchmark Not Measure — and Why Does That Matter for Your License?
The Overfitting Dicta author is explicit about this and lists the questions the study leaves unanswered: Does rubric coverage approximate analysis quality? Which categories of legal analysis is AI well suited to perform? Is the output good legal work product?
Those are empirical questions the benchmark deliberately does not answer. But there is a fourth question the benchmark does not ask at all — and it is the one that connects AI output to attorney ethics obligations:
What happens to the markdown chronology file after Claude for Legal produces it? Who documented that the attorney read it? When? What did the attorney verify? What is the compliance record that proves the output was reviewed before it touched a filing?
Claude for Legal is an open-source toolkit that runs inside Claude Code or Claude Cowork. It produces a markdown file. That file has no audit trail, no attorney certification lock, no jurisdictional compliance gate, and no signed attestation certificate. The output is the work product. The documentation that satisfies ABA Formal Opinion 512 is a separate problem — one the toolkit was not designed to solve.
ABA Formal Opinion 512 maps AI use onto several Model Rules, and three bear directly on this workflow: competence (Model Rule 1.1: personally verify AI output), communication (Model Rule 1.4: disclose material AI use to clients), and fees (Model Rule 1.5: bill AI-assisted time accurately). None of those obligations is satisfied by receiving a well-structured markdown file. To understand why the distinction is architectural, not procedural, see Lex Arca’s article on why AI compliance in litigation practice is an architecture problem rather than a checklist problem.
What Are Court Orders Requiring in 2026 That AI Output Alone Cannot Satisfy?
The compliance gap the benchmark leaves open is not abstract. As of 2026, more than 300 standing court orders govern AI use in filings, and over 200 of those were added in the second half of 2025 alone. Specific jurisdictions have moved to enforcement.
- Florida Administrative Order SC2026-0673 / AOSC26-12 (effective June 15, 2026): Attorneys must personally certify AI-assisted filings — not just disclose, but certify review.
- New York 22 NYCRR Part 161 (effective June 1, 2026): Affirmative disclosure requirements for AI-assisted court submissions.
- Colorado SB 26-189 (signed May 14, 2026, effective January 1, 2027): Comprehensive AI governance obligations for legal practitioners.
- Federal judges in Texas and California: Attorneys must personally certify that they reviewed every AI-assisted statement before submission.
The DOJ terminated an attorney in March 2026 after AI-fabricated citations appeared in a federal brief; a pro se plaintiff identified them and reported them to the court. A Florida attorney faced $86,000 in sanctions for AI hallucinations he submitted across multiple federal cases, including in his response to the court’s own show-cause order. Three attorneys at a 350-person firm were disqualified from their case and referred to their state bars.
None of those attorneys lacked AI capability. They lacked AI documentation: a verifiable, attorney-signed record that they personally reviewed the output before it was filed.
What Is the Difference Between an AI-Generated Chronology and a Compliant AI Activity Trail?
This is the question the benchmark implicitly raises by stopping where it does. The distinction is structural.
An AI-generated chronology is analysis. It extracts, organizes, and tags case events. It is useful for case strategy, witness preparation, and discovery mapping. Claude for Legal’s chronology skill produces this. At medium effort with Sonnet or Opus, it produces it well.
A compliant AI activity trail is documentation. It records, with cryptographic timestamps, what documents an attorney accessed, when, and in what sequence. It generates a Verification Attestation — a signed certificate that can accompany a filing or sit in a matter file as proof that attorney review occurred. It satisfies ABA Formal Opinion 512 not by describing a workflow, but by recording one.
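The difference between describing a review and recording one can be made concrete. Below is a minimal Python sketch of the general technique behind an append-only, tamper-evident trail: a hash chain, where each entry commits to the hash of the one before it, plus an attestation over the final hash. It is an illustration under stated assumptions, not Lex Arca™ Legal Vault’s implementation. The names `ActivityTrail`, `verify`, `attest`, and `check_attestation`, and the record fields, are hypothetical. Two simplifications are assumed: timestamps come from the local clock rather than a trusted timestamping authority (RFC 3161 is one standard for that), and an HMAC with a shared secret stands in for the public-key digital signature a real attestation certificate would carry.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry in the chain


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class ActivityTrail:
    """Hypothetical append-only log: every entry commits to the hash of the entry before it."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, attorney: str, action: str, document: str, when: str | None = None) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "seq": len(self._entries),
            # Local UTC clock (assumption); a real system would want a trusted timestamp.
            "timestamp": when or datetime.now(timezone.utc).isoformat(),
            "attorney": attorney,
            "action": action,
            "document": document,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return dict(entry)

    def entries(self) -> list[dict]:
        # Copies, so editing the returned list cannot alter the stored log.
        return [dict(e) for e in self._entries]


def verify(entries: list[dict]) -> bool:
    """Recompute each hash and link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for seq, entry in enumerate(entries):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("seq") != seq or body.get("prev_hash") != prev_hash:
            return False
        if _digest(body) != entry.get("hash"):
            return False
        prev_hash = entry["hash"]
    return True


def _message(matter: str, head: str, count: int) -> bytes:
    return f"{matter}|{head}|{count}".encode("utf-8")


def attest(entries: list[dict], matter: str, key: bytes) -> dict:
    """Hypothetical attestation: an HMAC binding the matter to the chain head and length."""
    if not entries or not verify(entries):
        raise ValueError("cannot attest to an empty or broken trail")
    head = entries[-1]["hash"]
    mac = hmac.new(key, _message(matter, head, len(entries)), hashlib.sha256).hexdigest()
    return {"matter": matter, "chain_head": head, "entry_count": len(entries), "mac": mac}


def check_attestation(att: dict, entries: list[dict], key: bytes) -> bool:
    """True only if the trail verifies and ends at exactly the head and length that were attested."""
    if not entries or not verify(entries):
        return False
    expected = hmac.new(
        key, _message(att["matter"], entries[-1]["hash"], len(entries)), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, att["mac"])


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # illustrative only
    trail = ActivityTrail()
    trail.append("attorney@example.com", "opened", "chronology.md")
    trail.append("attorney@example.com", "verified", "exhibit-12.pdf")
    log = trail.entries()
    cert = attest(log, "Harborview v. Greenleaf", key)
    print(verify(log), check_attestation(cert, log, key))  # True True

    log[1]["document"] = "exhibit-13.pdf"  # simulate an after-the-fact edit
    print(verify(log), check_attestation(cert, log, key))  # False False
```

Chaining is what makes the trail tamper-evident: altering any entry changes its hash and breaks the link to every entry after it. The chain by itself cannot reveal entries removed from the end, which is why the attestation in this sketch binds the final hash and the entry count under a key.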
Lex Arca™ Legal Vault provides a Verification Attestation certificate and an append-only, tamper-evident activity trail, generating a documented record of the attorney review that ABA Formal Opinion 512 calls for before the filing goes out. The platform is architecturally excluded from your data: not by policy, not by contract, but by design. See how the ABA Opinion 512 compliance workflow applies to AI billing in small-firm practice.
Claude for Legal tells you what happened in the case file. Lex Arca™ Legal Vault documents that you verified it and when, and certifies that you reviewed it before it touched a filing.
From Kim’s Chair: The Questions I Would Have Asked
I did not build Lex Arca™ from studying benchmark reports. I built it from a chair — the client’s chair — where I watched the gap between AI capability and AI accountability show up in real time. When I read a study showing that an AI tool can produce an 86% coverage chronology from a case file, I do not see a legal tech milestone. I see the client who trusts that the attorney reviewed it — and who has no mechanism to verify that they did.
If I were in that room as the client, here is what I would ask the room:
- If AI can build a near-complete case chronology in 16 minutes, why is there no industry standard for documenting that the attorney actually read it?
- At what point does ‘I used a compliant tool’ stop being a defense and start being malpractice, when the tool produced the output but nothing recorded the review?
- Who in this room has a compliance record that would survive a sanctions motion — not just a subscription to an AI product, but a documented, verifiable trail of attorney review?
- The benchmark authors explicitly stopped short of evaluating legal quality. Who in the legal tech industry is building the infrastructure to answer that question at scale?
And if I were your client — sitting across from you before you walked into that courtroom — here is what I would have asked you:
- When the AI built your timeline of my case, did you read every entry — and is there a record that proves it?
- If opposing counsel challenges the accuracy of a document or timestamp in that chronology, can you show a court exactly when you accessed it and what you verified?
- Do I have any way to know whether the tools you used on my case have been approved for use in this jurisdiction — before we find out in a sanctions motion?
- If you receive a request for your AI use records during discovery, what exactly would you produce?
These are not hostile questions. They are the questions documentation answers, and where no documentation exists, the only answer is silence.
Key Takeaways
- Anthropic’s Claude for Legal toolkit, benchmarked independently in May 2026, produced litigation chronologies covering up to 86.4% of a 66-criterion professional rubric — a genuine capability for case analysis and trial preparation.
- AI-generated output and AI compliance documentation are two separate obligations: analysis produces a work product; compliance requires a documented, attorney-certified activity trail that satisfies ABA Formal Opinion 512 under Model Rules 1.1, 1.4, and 1.5.
- More than 300 standing court orders now govern AI use in filings, and Florida, New York, Colorado, Texas, and California have each imposed explicit attorney disclosure, certification, or governance requirements, none of which a markdown output file alone satisfies.
- Attorneys evaluating Claude for Legal should ask a second question after testing the output: where is the compliance record?
- Lex Arca™ Legal Vault provides a documented, verifiable AI activity trail — including a Verification Attestation certificate and append-only, tamper-evident audit record — designed to support attorney compliance workflows.
- Calculate your firm’s billing leakage and get early access at https://calculator.lex-arca.com.
About the Author: Kim Xi Harris is the Founder and Platform Architect of Lex Arca™ Legal Vault, an AI-native litigation intelligence and compliance platform for solo and small-firm attorneys. She is a Cornell Women’s Entrepreneur Program graduate, SBA Women in Business Champion Award recipient, WOSB certified, and holds five Google AI certifications. Calculate your firm’s billing leakage at https://calculator.lex-arca.com — or reach us at legalvault@lex-arca.com.