AI Idea: Virus Scanning, For Legal Docs

The What

You’ve got a legal document to sign. You think to yourself, “Should I get a lawyer to take a look at this?”

You’re not quite sure how to do that. Also, won’t that be expensive? Maybe you’ll just examine the document yourself. You start leafing through it. Seems long. Some of the words are big. “I should really get a lawyer to look at this”, you think to yourself.

A few days later, you’re done procrastinating. You sign the document without getting that lawyer. How bad could it be?

How about this instead: you get the document in your inbox. If it’s kosher for you to sign, you see a big green checkmark next to it. If there are any weird clauses, you see red. Think attachment virus scanning, but for legal documents.

The How

Crowdsource all the legal documents in the world. Create a five-bullet summary for each of the common archetypes you detect (standard lease, YC SAFE, etc.).

Diff every document you get against the most similar archetype. Didn’t find any changes? Mark the document green and show the user the summary (“standard SF apartment lease”). If you find changes, flag the document red. Present the user with the changes and an “Explain” button. If they click it, charge the user $10 and send the diffs to a paralegal. Have the paralegal quickly summarize the changes as additional bullet points (“you can’t sub-lease this apartment” or “the company is a Florida LLC, which is uncommon”) and relay those to the user.
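To make the green/red flow concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the ARCHETYPES dictionary, the sample clause text, and the closest_archetype and scan helpers are all made up, and plain character similarity from the standard difflib module stands in for whatever matching a real product would use.

```python
import difflib

# Hypothetical archetype library: archetype name -> template text.
# A real system would hold thousands of crowdsourced templates.
ARCHETYPES = {
    "standard SF apartment lease": (
        "Tenant pays rent monthly.\n"
        "Tenant may sublease with consent.\n"
        "Lease term is 12 months."
    ),
    "YC SAFE": (
        "Investor pays the purchase amount.\n"
        "SAFE converts at the next equity financing."
    ),
}


def closest_archetype(document, archetypes):
    """Return the name of the template most similar to the document."""
    def score(name):
        return difflib.SequenceMatcher(None, archetypes[name], document).ratio()
    return max(archetypes, key=score)


def scan(document, archetypes=ARCHETYPES):
    """Return (status, archetype_name, changes) for an incoming document."""
    name = closest_archetype(document, archetypes)
    template_lines = archetypes[name].splitlines()
    doc_lines = document.splitlines()
    matcher = difflib.SequenceMatcher(None, template_lines, doc_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Lines only in the template are removals; lines only in the document are additions.
        changes += ["- " + line for line in template_lines[i1:i2]]
        changes += ["+ " + line for line in doc_lines[j1:j2]]
    status = "red" if changes else "green"
    return status, name, changes
```

A document identical to the lease template comes back green with the archetype name attached. Change one clause and it comes back red, with the removed and added lines ready to show next to the “Explain” button.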

Over time, you should be able to create more archetypes (you’ll have a template for an apartment-lease-with-a-sublease-clause), so you’ll need to rely on your paralegals less and less. One day, you’ll be able to use AI(TM) to automate the process entirely.

The Market

The current market size for online legal outsourcing is a modest $1B, growing at 30% a year. But I think this company’s biggest market is non-consumption: people who today sign without any legal review at all. People use Uber more than they ever used taxis because it’s easier to use, and the same could happen here.

The Gotchas

  1. Bootstrapping the dataset might be tricky.
  2. Ideally you want to diff based on substance, not characters. One approach might be to “shingle” the document into logical chunks and diff those blocks against each other.
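
One way the second gotcha might be approached, sketched in Python. This is an assumption-heavy illustration, not a worked-out design: clauses are taken to be paragraphs separated by blank lines, similarity is Jaccard overlap of three-word shingles, and the 0.8 threshold and all function names are invented for the example. It flags clauses in the document that have no close match in the template, ignoring case, punctuation, and clause order; it does not catch clauses that were removed.

```python
import re


def split_clauses(text):
    """Treat blank-line-separated paragraphs as clauses (an assumption)."""
    return [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]


def normalize(clause):
    """Lowercase and strip punctuation so formatting changes do not count."""
    return re.sub(r"[^a-z0-9 ]+", " ", clause.lower()).split()


def shingles(words, k=3):
    """Return the set of overlapping k-word windows in a clause."""
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unusual_clauses(document, template, threshold=0.8):
    """Return document clauses with no sufficiently similar template clause."""
    template_sets = [shingles(normalize(c)) for c in split_clauses(template)]
    flagged = []
    for clause in split_clauses(document):
        clause_set = shingles(normalize(clause))
        best = max((jaccard(clause_set, t) for t in template_sets), default=0.0)
        if best < threshold:
            flagged.append(clause)
    return flagged
```

With this approach, a lease whose clauses were reordered or re-capitalized would produce no flags, while a new “Tenant shall not sublease” clause would stand out because it shares almost no shingles with anything in the template.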

Why is this a bad idea? I’d love to get some feedback from lawyers. If it isn’t, build it and let me know!