Thursday, January 16, 2014

What is sentiment analysis?


How to use text mining for predictive analytics?


Wednesday, January 1, 2014

What's the difference between text analytics and natural language processing?

Phrases like "text analytics/text mining" and "natural language processing" are often used interchangeably. While the subjects covered by these terms overlap considerably, they are not identical.

Text Analytics/Mining

  • Getting structure and patterns out of textual data (a type of data mining; traditional data mining deals with structured data)
  • Heavily uses natural language processing concepts & tools to extract structure
  • Uses rules or statistics to impose or derive structure
  • Tasks like natural language generation are usually not part of this analysis, being more of a synthesis task
  • May use information retrieval concepts such as tf-idf, bag of words, etc.; thus text mining may rely on shallow language processing rather than deep linguistic analysis.
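
To make the "shallow processing" point concrete, here is a minimal sketch of bag of words and tf-idf in plain Python. The function names (tokenize, bag_of_words, tf_idf), the regex tokenizer, and the particular weighting (term frequency normalised by document length, idf as log(N/df) with no smoothing) are my own assumptions for illustration; real toolkits use several variants of the formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately shallow: lowercase and keep alphanumeric runs only.
    # No parsing, tagging, or other linguistic analysis happens here.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    # A bag of words records how often each token occurs and ignores order.
    return Counter(tokenize(text))


def tf_idf(docs):
    # Assumed weighting: tf = count / document length, idf = log(N / df).
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
for doc, scores in zip(docs, tf_idf(docs)):
    print(doc, scores)
```

In this toy corpus, "the" appears in every document and so gets a weight of zero, while "dog" appears in only one document and gets the highest weight there. Structure is derived from counts alone, without any understanding of the sentences.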


Natural Language Processing

  • A mixture of linguistics, computer science, and maths
  • Usually considered a subfield of artificial intelligence
  • Both analysis and generation tasks are part of this
  • Often described as getting computers to understand natural language

Introduction

Hello! Text Analytics and Natural Language Processing are rapidly growing fields in computer science and information technology. Because these fields are so vibrant, they can be studied through diverse lenses: research, teaching, software development, business, tools and resources, and apps and applications. It's a lot of fun!

In this blog, I will cover introductions to different aspects of the field and practical know-how of all kinds (ranging from how to study and apply for jobs to writing NLP papers), and point you to interesting papers, applications, and discussions.

Additionally, the blog will have an India focus. India is one of the leading centres of Text Analytics in many ways, given the sheer number of researchers & developers working in what is still a niche area. There seem to be few resources that give a clear picture of what's happening in India in Text Analytics and related fields.

About the blogger
J. Ramanand studied computer science and engineering at COEP (Pune) and IIT Bombay (Mumbai). His first exposure to NLP came in 2004 during a course that he took while still working. From 2006 to 2007, as a post-graduate student, he worked with Prof. Pushpak Bhattacharyya on research problems involving the world of wordnets.

From 2007 to 2011, he worked in an innovation group at Cognizant Technology Solutions as an applied researcher in areas such as sentiment analysis and information extraction. From 2011 to 2013, he worked at IBM's CIO Lab on an internal application that used IBM Watson's DeepQA system.

Since 2014, he has co-run Choose To Thinq, which applies the power of questions and quizzing to corporate learning, engagement, and innovation, and also works with schools and colleges in building and celebrating a "THINQ" mindset.

His current research interests are in natural language generation.