Imagine you are sitting on large volumes of data that must be processed before it can be understood, or on Big Data that needs processing every day. This data comes from a variety of sources. Much of it is documents, but there are also emails, voice recordings in MP3 format, text-based news clips, radio interviews, websites, third-party documentation in PDF format, scanned images and text, and videos. Or you may receive financial information from a variety of banks, funds and financial institutions, from which you just need to compile key details such as the names of people and institutions, or exchange rates between a couple of currencies.
Pangea’s Knowledge Discovery tool is here to do precisely that for you: whatever the source, we turn it into text so it can be processed. Through a series of NLP techniques, we structure the data so that key information can be extracted in a user-friendly format. This can be a list of actions or of people doing things, keywords, amounts extracted from tables of different shapes, key sentences, or a full tagging of the material so that any kind of actionable insight can be drawn from it in the future.
Knowledge Discovery must not be confused with e-Discovery: it does more than extract information from text based on a given list of keywords while leaving the source intact. K-Discovery gives the source text a structure so that machine learning and data mining methods can be applied for any type of structured information retrieval at a later stage. It also differs from Summarization because it does not aim to create an abstract representation of the meaning for quick processing by humans (although that can be one final use of Knowledge Discovery). Instead, it applies knowledge to the source so that many types of actions can be undertaken and several kinds of usage can be derived from it.
Typical Knowledge Discovery Use Cases
- Create CSV files or spreadsheets from financial data on currency predictions published by central banks and financial institutions
- Provide a summarized report on a subject, extracting data from several sources
- Tag Big Data with typical NLP tags for machine learning
- Extract names, places and actions from a series of TV programs, internet videos or radio interviews, using speech-to-text first
- Detect sentiment in social media inputs and tag each tweet or comment as positive or negative
- Automatically classify documents according to a pre-defined domain and create an abstract
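
To make the first use case more concrete, here is a minimal sketch in Python of turning currency figures found in free text into CSV rows. It is an illustration only and not Pangea's implementation: the function names `extract_rates` and `rates_to_csv`, the regular expression, and the sample sentence are all assumptions, and a real pipeline would rely on proper NLP components rather than a single pattern.

```python
import csv
import io
import re

# Hypothetical pattern (an assumption for this sketch): a currency pair such as
# "EUR/USD", optionally followed by "at", then a decimal number.
RATE_PATTERN = re.compile(r"\b([A-Z]{3})/([A-Z]{3})\s+(?:at\s+)?(\d+(?:\.\d+)?)")


def extract_rates(text):
    """Return (base, quote, rate) tuples for every currency pair found in text."""
    return [
        (match.group(1), match.group(2), float(match.group(3)))
        for match in RATE_PATTERN.finditer(text)
    ]


def rates_to_csv(rows):
    """Serialize the extracted tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["base", "quote", "rate"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample sentence, not real financial data.
sample = (
    "The central bank expects EUR/USD at 1.0850 next quarter, "
    "while GBP/USD 1.27 is forecast by two funds."
)
rows = extract_rates(sample)
print(rates_to_csv(rows))
```

The same shape (convert the source to text, pull out structured fields, write them to a tabular format) would apply to the other use cases, with the regular expression replaced by whichever tagging or classification step fits the task.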