Text Analysis



Vitrana is a thought leader in text analytics, developing next-generation tools and products that change the way organizations make decisions. We develop and implement techniques to extract useful information from unstructured text wherever it is embedded, from journal abstracts and articles to free-text fields in structured documents such as electronic health records. Vitrana has a strong R&D focus on NLP, and many Vitrana employees are key contributors to open source projects in this domain.

Unstructured Data

Unstructured data is data with no predefined schema or data model dictating how the information it contains is laid out. Such data is typically text-heavy and often free-form or open-ended, for example text files and resumes. Most of the data within an organization typically exists in unstructured form. In addition, with the rise of social media, blogging, and electronic document storage, the volume of unstructured data being generated is enormous.


Analyzing such text sources is a challenging problem that many companies now face. Fortunately, there are many tools and approaches for this type of analysis, which draw on techniques from Natural Language Processing (NLP).

What We Provide

Our services include:

  • Information Extraction – mining for important named entities, the phrases they co-occur with, and the relationships between entities; high-throughput lexicon-based extraction of key document features.
  • Semantic Search – search your documents using the semantic relationships between the search terms and the document text. For example, if a document describes a drug and its adverse events, a user can ask which other drugs in a different therapy area have similar events, and the question can be answered from those relationships.
  • Topic Modeling – Given a collection of documents, automatically categorize them into a number of different topics.
  • Text Summarization – modern algorithms that summarize large documents using hidden Markov model (HMM) techniques.
  • Data analysis and text classifier development.
  • Ontology-based document indexing and knowledge extraction.
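
To make the information extraction idea above more concrete, here is a minimal sketch of lexicon-based entity extraction with sentence-level co-occurrence counting. It is an illustration only, not Vitrana's implementation: the `LEXICON` contents, the function names `extract_entities` and `cooccurrences`, and the sample notes are all hypothetical, and only the Python standard library is used.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons, for illustration only; a real system would load
# curated vocabularies or an ontology instead of hard-coded sets.
LEXICON = {
    "DRUG": {"aspirin", "ibuprofen"},
    "EVENT": {"nausea", "headache", "rash"},
}

def extract_entities(sentence):
    """Return {label: set of lexicon terms found in the sentence}."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {label: terms & tokens for label, terms in LEXICON.items()}

def cooccurrences(text):
    """Count (drug, event) pairs that appear in the same sentence."""
    counts = Counter()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        found = extract_entities(sentence)
        for pair in product(sorted(found["DRUG"]), sorted(found["EVENT"])):
            counts[pair] += 1
    return counts

# Made-up free-text notes, not real clinical data.
notes = (
    "Patient took aspirin and reported nausea. "
    "Ibuprofen was started; no rash observed. "
    "Aspirin again caused nausea and headache."
)
pairs = cooccurrences(notes)
for (drug, event), n in sorted(pairs.items()):
    print(drug, event, n)
```

Plain co-occurrence of this kind does not handle negation ("no rash observed" still counts as a pair) or multi-word terms, which is one reason practical pipelines add further NLP steps on top of dictionary matching.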