    The Beginner’s Guide to Text Vectorization

    By Divya Susarla on September 27, 2017

    This blog post was originally published on MonkeyLearn by Rodrigo Stecanella.

    Since the early days of Natural Language Processing (NLP), there has been a need to transform text into something a machine can understand, that is, into a meaningful vector (or array) of numbers. In the pre-deep-learning era, the de facto standard way of doing this was the bag-of-words approach.

    Bag of words

    The idea behind this method is simple but powerful. First, we define a fixed-length vector in which each entry corresponds to a word in a predefined dictionary of words, so the size of the vector equals the size of the dictionary. Then, to represent a text with this vector, we count how many times each dictionary word appears in the text and put that count in the corresponding vector entry.
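
    The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the original author's code: the `bag_of_words` function, the regex tokenizer, and the five-word dictionary are all assumptions made up for the example.

```python
import re
from collections import Counter


def bag_of_words(text, dictionary):
    """Return a count vector with one entry per dictionary word."""
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # The vector has the same length and order as the dictionary.
    # Words that are not in the dictionary (here, "on") are simply ignored.
    return [counts[word] for word in dictionary]


# Hypothetical predefined dictionary, chosen only for illustration.
dictionary = ["the", "cat", "dog", "sat", "mat"]

vector = bag_of_words("The cat sat on the mat.", dictionary)
print(vector)  # [2, 1, 0, 1, 1]
```

    Note that the vector records only how often each word occurs, not where: any reordering of the same words produces the same vector.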
