What is Entity Extraction and Why Should You Care?

What is Entity Extraction?

Entity extraction, also known as “entity recognition,” allows computer systems to pull specific pieces of data out of documents and enter them into separate fields in a database. For instance, such systems can sort through emails, webpages, or even massive WikiLeaks document dumps, find names, places, addresses, phone numbers, or key phrases, and organize them into useful databases.

Entity extraction uses Natural Language Processing (NLP) to find mentions of specific data and turn them into structured data. How this data is used can be either “good” or “bad,” depending on your frame of reference. Government agencies can use it to sort through massive quantities of data to track down terrorists. Advertisers can use it to target you with ads that you might be interested in and not show you advertisements that are just a waste of your time. And thieves and scammers can use it to find potential targets.

How it Works

Entity extraction systems must solve several language difficulties to identify and classify data effectively. Although distinguishing between different types of names (person, location, organization, product, and so on) is simple for humans, the ambiguities of language make this a challenging task for artificial intelligence (A.I.) systems.

A keyword-based system can’t distinguish between a word’s possible meanings or the ways it is used. For example, “table” could refer to a piece of furniture or to a chart of data, and a keyword search cannot tell the two apart.
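To make the limitation concrete, here is a minimal sketch of a plain keyword search. The function name keyword_hits and the sample sentences are hypothetical, chosen only for illustration. The search returns both the furniture sentence and the chart sentence because it has no notion of word sense.

```python
import re

def keyword_hits(keyword, sentences):
    """Return every sentence that contains the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample sentences: two different senses of "table".
sentences = [
    "We bought a new oak table for the dining room.",
    "Table 3 lists the quarterly revenue figures.",
    "The chairs arrived yesterday.",
]

hits = keyword_hits("table", sentences)
for sentence in hits:
    print(sentence)
```

Both matches look identical to the keyword search. Telling them apart requires looking at the surrounding words, which is the job of the contextual rules described next.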

Entity extraction software uses a combination of rules based on pattern matching, linguistics, syntax, and semantics. It decodes the meaning and comprehends context, allowing for a variety of uses in various business tasks across multiple industries. Entity extraction creates connections between knowledge repositories. An entity extraction tool may identify all places, people, or items referenced across various documents and link them to a map location or cross-reference entities with other data sources.
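As a rough illustration of the pattern-matching part only, the sketch below pulls emails and phone numbers out of free text and links a place name to map coordinates. Everything in it is an assumption made for the example: the RULES patterns, the tiny GAZETTEER lookup table, and the extract_entities function are hypothetical, and real tools layer linguistic and semantic analysis on top of rules like these.

```python
import re

# Hypothetical rule set: each label maps to one hand-written pattern.
RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

# Hypothetical gazetteer linking place names to (latitude, longitude).
GAZETTEER = {
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.52, 13.405),
}

def extract_entities(text):
    """Return a list of (label, matched_text, extra) tuples found in text."""
    found = []
    # Pattern rules: record every match of every labeled pattern.
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), None))
    # Gazetteer lookup: attach coordinates to each known place name.
    for place, coords in GAZETTEER.items():
        if re.search(rf"\b{re.escape(place)}\b", text):
            found.append(("place", place, coords))
    return found

text = "Email jane.doe@example.com or call 555-123-4567 when you land in Paris."
entities = extract_entities(text)
for label, value, extra in entities:
    print(label, value, extra)
```

Each tuple is already a structured record, so it could be written to a database row or cross-referenced against another data source. A lookup this simple would still confuse Paris, France with Paris, Texas, which is where context-aware analysis comes in.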


The Dark Side of Entity Extraction

Privacy:

The Fourth Amendment to the U.S. Constitution states: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”

But increasingly, the government is using entity extraction techniques to mine text messages, phone calls, and other data sources for evidence of a crime without a warrant. This was at the heart of the 2013 Edward Snowden leak. Snowden revealed numerous global surveillance programs run by the NSA and other intelligence agencies, including Australia’s ASD, the U.K.’s GCHQ, and Canada’s CSEC, with the cooperation of telecommunication companies and European governments. Snowden claimed that: “I, sitting at my desk [could] wiretap anyone, from you or your accountant to a federal judge or even the president, if I had a personal email.” He claimed that 90% of those placed under surveillance in the U.S. are ordinary Americans and are not the intended targets. He also claimed that some of the surveillance was for industrial espionage rather than national security, and that in 2013 alone, U.S. spy agencies paid private tech companies $52 billion for clandestine access to their communications networks.

Healthcare:

According to Lexalytics: Experts have raised concerns about the ethical implications of healthcare data storage and data security practices for years, and A.I. is taking up a larger share of that conversation every day. Current laws aren’t enough to protect an individual’s health data. In fact, a shocking study from the University of California Berkeley says that advances in artificial intelligence have rendered the Health Insurance Portability and Accountability Act of 1996 (HIPAA) obsolete, and this was before the COVID-19 pandemic…

HIPAA also fails to regulate genetics testing companies like Ancestry and 23andMe. These companies, which analyze your DNA to give you information about your health, traits and ancestry, don’t legally count as a healthcare service. And so, HIPAA rules don’t apply… These companies can and do legally sell their customers’ genetic data to pharmaceutical and biotechnology firms.

Lexalytics further states that insurance companies could use genetic data to discriminate against individuals. Unfortunately, during the pandemic, HIPAA protections have been increasingly eroded. On the positive side, the European Union has instituted its General Data Protection Regulation (GDPR), which mandates that organizations obtain informed user consent before they collect sensitive information. Unlike HIPAA, GDPR doesn’t distinguish between health providers and tech companies, so at the moment Europeans have better privacy protection than Americans.

Conclusion

Entity extraction is an efficient method for retrieving specific information from massive documents and data sources. Although the process is complex and computationally intensive, this technology is a game-changer. It can be used for good or for evil, and its use needs to be regulated in order to protect our Constitutional rights.
