Siemens Big Data Analysis
Every day, more unstructured data is published on the internet, where most of it is left untouched. Much of it is raw text, unlabeled and unclassified. Analyzing these enormous volumes of data can reveal trends and lead to new discoveries. Once extracted, this information can be used to find unknown relationships between the entities the data contains, such as people, businesses, or other groups. Starting from these unstructured documents, we use natural language processing and named entity recognition to identify people, organizations, and locations, and to recognize the connections that link them together. We then apply Latent Dirichlet Allocation (LDA) to measure the similarity between documents. Together, the similarity between documents and the entities they share can shed light on previously undiscovered connections.
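
The final step described above, combining shared entities with document similarity, can be illustrated with a minimal sketch. It is not the project's implementation. It assumes the named entity recognition and LDA steps have already run, so the entity sets and per-document topic distributions below are made-up inputs. The function names, the use of cosine similarity as the similarity measure, and the 0.9 threshold are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical inputs: entities as an NER step might extract them, and
# per-document topic distributions as an LDA model might produce them.
doc_entities = {
    "doc1": {"Alice", "Acme Corp", "Berlin"},
    "doc2": {"Alice", "Acme Corp"},
    "doc3": {"Bob", "Berlin"},
}
doc_topics = {
    "doc1": [0.8, 0.1, 0.1],
    "doc2": [0.7, 0.2, 0.1],
    "doc3": [0.1, 0.1, 0.8],
}


def entity_cooccurrence(doc_entities):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for entities in doc_entities.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two topic vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_documents(doc_entities, doc_topics, threshold=0.9):
    """Return document pairs that share an entity and have similar topics."""
    links = []
    for d1, d2 in combinations(sorted(doc_entities), 2):
        shared = doc_entities[d1] & doc_entities[d2]
        sim = cosine_similarity(doc_topics[d1], doc_topics[d2])
        if shared and sim >= threshold:
            links.append((d1, d2, sorted(shared), sim))
    return links


if __name__ == "__main__":
    print(entity_cooccurrence(doc_entities).most_common(3))
    for link in related_documents(doc_entities, doc_topics):
        print(link)
```

With this sample data, only doc1 and doc2 are linked: they share two entities and have nearly identical topic mixtures, whereas doc1 and doc3 share an entity but differ strongly in topic.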