Day 1 of #100DaysOfMLCode – Text Classification

This is day 1 of #100DaysOfMLCode.

I am happy to try out a technique in machine learning, namely text classification.

I went through various articles and stumbled across one that clearly explained and categorized the steps to achieve text classification using TensorFlow.

So, I quickly read through the article and found the techniques we need, namely: stemming, tokenization, the bag-of-words model, etc. These techniques transform the input text into a format that the machine learning model can easily process. Finally, the machine learning model was built using TFLearn, a higher-level library on top of TensorFlow.
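The three preprocessing steps can be sketched in Kotlin (the language of the Hello World post on this blog; the original article itself uses Python). This is a minimal sketch, not the article's code: the regex tokenizer is an assumption, and stem is a hypothetical suffix-stripper standing in for a real stemming algorithm.

```kotlin
// Tokenization: lowercase the sentence and pull out word tokens.
fun tokenize(sentence: String): List<String> =
    Regex("[a-z']+").findAll(sentence.lowercase()).map { it.value }.toList()

// Stemming (hypothetical, deliberately crude): strip a few common suffixes
// so that e.g. "started" and "start" map to the same stem.
fun stem(word: String): String {
    for (suffix in listOf("ing", "ed", "es", "s")) {
        if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
            return word.dropLast(suffix.length)
        }
    }
    return word
}

// Vocabulary: sorted list of distinct stems across all training sentences.
fun buildVocabulary(sentences: List<String>): List<String> =
    sentences.flatMap { tokenize(it).map(::stem) }.distinct().sorted()

// Bag of words: 1 if the vocabulary stem occurs in the sentence, else 0.
fun bagOfWords(sentence: String, vocabulary: List<String>): List<Int> {
    val stems = tokenize(sentence).map(::stem).toSet()
    return vocabulary.map { if (it in stems) 1 else 0 }
}

fun main() {
    val vocabulary = buildVocabulary(listOf("what time is it?", "how old are you?"))
    println(vocabulary)                             // [are, how, is, it, old, time, what, you]
    println(bagOfWords("is it time?", vocabulary))  // [0, 0, 1, 1, 0, 1, 0, 0]
}
```

Each position in the vector only records whether that vocabulary word appears, so word order is discarded; that fixed-length vector is what a model can consume.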

My goal was to run it quickly and see the output. Fortunately, we have Google Colaboratory, an online Jupyter notebook environment that lets developers try out code on free GPUs hosted by Google.

The original article, where I learnt about text classification in TensorFlow, is here –> https://sourcedexter.com/tensorflow-text-classification-python/

As for the code, I have copied it from Google Colaboratory to my GitHub repository, under Day1. The GitHub link is here –> https://github.com/inspire99/100DaysOfMLCode/blob/master/Day1/TF_Text_Classification.ipynb

The Google Colaboratory link is here –> https://colab.research.google.com/drive/1qMCbx7pS8QzKDzkwUmEymx5FNncqX4OG

Requirement to run the above code: the data.json file should be uploaded to Google Drive.
The file data.json should contain the following input data:
{
"time": ["what time is it?", "how long has it been since we started?", "that's a long time ago", " I spoke to you last week", " I saw you yesterday"],
"sorry": ["I'm extremely sorry", "did he apologize to you?", "I shouldn't have been rude"],
"greeting": ["Hello there!", "Hey man! How are you?", "hi"],
"farewell": ["It was a pleasure meeting you", "Good Bye.", "see you soon", "I gotta go now."],
"age": ["what's your age?", "How old are you?", "I'm a couple of years older than her", "You look aged!"]
}
To get the drive ID of your uploaded file, use "Get shareable link" in Google Drive, then copy the file's ID from that link into the above code.
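Before training, each sentence in data.json is paired with its category. A common encoding, sketched below as an assumption (the article's exact encoding may differ), is a one-hot vector with one position per category. Parsing the JSON file is left out: the map in main is a hand-copied subset of data.json, and the names Example, oneHot and buildExamples are hypothetical.

```kotlin
// One training example: the raw sentence plus its one-hot category label.
data class Example(val sentence: String, val label: List<Int>)

// One-hot encoding: a vector with a single 1 at the category's index.
fun oneHot(category: String, categories: List<String>): List<Int> =
    categories.map { if (it == category) 1 else 0 }

// Sort the categories so label positions are stable, then label every sentence.
fun buildExamples(data: Map<String, List<String>>): Pair<List<String>, List<Example>> {
    val categories = data.keys.sorted()
    val examples = data.flatMap { (category, sentences) ->
        sentences.map { Example(it, oneHot(category, categories)) }
    }
    return categories to examples
}

fun main() {
    val data = mapOf(
        "greeting" to listOf("Hello there!", "hi"),
        "farewell" to listOf("see you soon"),
    )
    val (categories, examples) = buildExamples(data)
    println(categories)  // [farewell, greeting]
    examples.forEach { println("${it.label} ${it.sentence}") }
}
```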
I ran the code and got the following output for these inputs:
Input:
sent_1 = "what time is it?"
sent_2 = "I gotta go now"
sent_3 = "do you know the time now?"
sent_4 = "you must be a couple of years older then her!"
Output:
Training Step: 2999 | total loss: 0.10340 | time: 0.009s
| Adam | epoch: 1000 | loss: 0.10340 - acc: 0.9910 -- iter: 16/19
Training Step: 3000 | total loss: 0.53846 | time: 0.016s
| Adam | epoch: 1000 | loss: 0.53846 - acc: 0.8919 -- iter: 19/19
--
INFO:tensorflow:/content/model.tflearn is not in all_model_checkpoint_paths. Manually adding it.
time
farewell
time
age
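Each printed label is, presumably, the category whose predicted probability is highest (an argmax over the network's output). A minimal sketch of that final step; the probabilities in main are made up for illustration, not real model output, and predictLabel is a hypothetical name.

```kotlin
// Return the category with the highest predicted probability (argmax).
fun predictLabel(probabilities: List<Double>, categories: List<String>): String {
    require(probabilities.size == categories.size) { "need one probability per category" }
    val best = probabilities.indices.maxByOrNull { probabilities[it] }
        ?: throw IllegalArgumentException("no categories given")
    return categories[best]
}

fun main() {
    // Categories in sorted order, matching the keys of data.json.
    val categories = listOf("age", "farewell", "greeting", "sorry", "time")
    // Made-up probabilities for a sentence like "what time is it?".
    val probabilities = listOf(0.01, 0.02, 0.01, 0.01, 0.95)
    println(predictLabel(probabilities, categories))  // time
}
```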
For knowledge sharing, I have prepared a small PPT summarizing the original article on text classification.
Thanks to Akshay from SourceDexter for the article on text classification using TensorFlow.
Thanks to the Google Colaboratory project for providing the platform to run the code and see the results.
You can follow me on Twitter –> https://twitter.com/gansai9
You can follow my WordPress blog to get more updates on my progress in #100DaysOfMLCode.
Thanks for reading folks. Please leave any comments to share your views.

Distributed System Learning Notes – Cassandra & Related

Cassandra

  • Scalable NoSQL Database
  • Open Source
  • Can handle large amounts of structured, semi-structured and unstructured data across multiple data centres and clouds
  • Highly available
  • Provides linear scalability and operational simplicity
  • Data Model offers column indexes
  • Denormalization support
  • Materialized views
  • Log-structured updates (writes are appended rather than updated in place), giving fast write performance
  • Powerful built-in caching
  • Classified as an AP system (Availability and Partition tolerance) in terms of the CAP theorem
  • Masterless architecture
  • Nodes participate in a Cassandra ring – data gets distributed or partitioned across nodes transparently.
  • It can be configured to replicate data across multiple data centers or multiple cloud availability zones. If one node goes down, other nodes still hold copies of that node's data. So, replication is supported and is configurable.
  • Linearly scalable – if 2 nodes can handle 1 lakh (100,000) transactions per second, adding 2 more nodes lets the system handle 2 lakh transactions per second, and growing to 8 nodes lets it handle 4 lakh transactions per second.

Figure 1
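The ring partitioning and replication described in the list above can be illustrated with a toy token ring. This is a simplified sketch under stated assumptions: each node owns one token, String.hashCode() stands in for Cassandra's real partitioner (Murmur3 by default), and replicas are simply the next nodes clockwise, roughly like Cassandra's SimpleStrategy. TokenRing is a hypothetical class, not a Cassandra API.

```kotlin
import java.util.TreeMap

// Toy token ring: token -> node name, sorted so key order is ring order.
class TokenRing(nodeTokens: Map<String, Int>) {
    private val ring = TreeMap<Int, String>()

    init {
        nodeTokens.forEach { (node, token) -> ring[token] = node }
    }

    // The first node whose token is >= the key's hash owns the key (wrapping
    // around to the lowest token); the next nodes clockwise hold the replicas.
    fun replicasFor(key: String, replicationFactor: Int): List<String> {
        require(replicationFactor in 1..ring.size) { "replication factor must be between 1 and the node count" }
        val hash = key.hashCode() // stand-in for a real partitioner's hash
        val start = ring.ceilingKey(hash) ?: ring.firstKey()
        val clockwise = ring.tailMap(start, true).keys + ring.headMap(start, false).keys
        return clockwise.take(replicationFactor).map { ring.getValue(it) }
    }
}

fun main() {
    val ring = TokenRing(mapOf("A" to -1_000_000_000, "B" to 0, "C" to 1_000_000_000))
    for (key in listOf("a", "user:42", "")) {
        println("'$key' -> ${ring.replicasFor(key, replicationFactor = 2)}")
    }
}
```

With replication factor 2, the key "a" (hash 97) lands on node C with a replica on A, so if C goes down the data is still available from A.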

What is Linear Scalability?

  • An application is said to be linearly scalable if its capacity grows in proportion to the number of nodes added, without any change to the application code.

What are materialized views?

  • A normal view stores a query and presents its results as a virtual table (the table does not physically exist, so the query runs each time the view is read). A materialized view, as the name suggests, is materialized: the underlying database system stores the query's results in an actual table. Materialized views are mainly used to improve read performance, since the results do not have to be recomputed on every read.
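The difference can be illustrated with a toy in-memory model; this is a hypothetical sketch, not how any particular database implements views. NormalView re-runs its query on every read, while MaterializedView stores the result and refreshes it whenever the base table changes, so reads are cheap but writes do extra work.

```kotlin
data class Order(val customer: String, val amount: Int)

// Base table that notifies listeners after every insert.
class OrdersTable {
    private val rows = mutableListOf<Order>()
    private val listeners = mutableListOf<() -> Unit>()

    fun insert(order: Order) {
        rows += order
        listeners.forEach { it() }
    }

    fun scan(): List<Order> = rows.toList()
    fun onChange(listener: () -> Unit) { listeners += listener }
}

// The view's query: total order amount per customer.
fun totalsByCustomer(rows: List<Order>): Map<String, Int> =
    rows.groupBy { it.customer }.mapValues { (_, orders) -> orders.sumOf { it.amount } }

// Normal view: keeps only the query and re-runs it on every read.
class NormalView(private val table: OrdersTable) {
    var queryRuns = 0
        private set

    fun read(): Map<String, Int> {
        queryRuns++
        return totalsByCustomer(table.scan())
    }
}

// Materialized view: keeps the query's result and refreshes it on writes.
class MaterializedView(private val table: OrdersTable) {
    var queryRuns = 0
        private set
    private var stored: Map<String, Int> = emptyMap()

    init {
        refresh()
        table.onChange { refresh() }
    }

    private fun refresh() {
        queryRuns++
        stored = totalsByCustomer(table.scan())
    }

    fun read(): Map<String, Int> = stored
}

fun main() {
    val table = OrdersTable()
    val normal = NormalView(table)
    val materialized = MaterializedView(table)
    table.insert(Order("ann", 10))
    table.insert(Order("bob", 5))
    table.insert(Order("ann", 7))
    repeat(100) { normal.read(); materialized.read() }
    println(materialized.read())   // {ann=17, bob=5}
    println(normal.queryRuns)      // 100: one query per read
    println(materialized.queryRuns) // 4: initial build + one refresh per insert
}
```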

What is CAP Theorem?

  • Also called Brewer’s Theorem.
  • It is impossible for a distributed system to simultaneously provide all three of the following guarantees:
    • Consistency – every node sees the same data at the same time
    • Availability – every request receives a response about whether it succeeded or failed
    • Partition tolerance – the system continues to operate despite arbitrary message loss or the failure of part of the system
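A toy model of the trade-off, for a value replicated on two nodes. The class and mode names are hypothetical: while the nodes are partitioned, a CP-style register rejects the write to keep the replicas identical (giving up availability), while an AP-style register accepts it on the reachable replica and lets the replicas diverge (giving up consistency), which is the side Cassandra leans towards as an AP system.

```kotlin
enum class Mode { CP, AP }

// Hypothetical two-replica register used only to illustrate CAP.
class TwoReplicaRegister(private val mode: Mode) {
    private val replicas = intArrayOf(0, 0)
    var partitioned = false

    // Write through replica `via` (0 or 1); returns true if the write was accepted.
    fun write(via: Int, value: Int): Boolean {
        if (!partitioned) {
            replicas.fill(value) // both replicas reachable: copy the value to both
            return true
        }
        return when (mode) {
            Mode.CP -> false // reject, so the replicas never disagree
            Mode.AP -> {
                replicas[via] = value // accept on the reachable replica only
                true
            }
        }
    }

    fun read(via: Int): Int = replicas[via]
    fun consistent(): Boolean = replicas[0] == replicas[1]
}

fun main() {
    for (mode in Mode.values()) {
        val register = TwoReplicaRegister(mode)
        register.write(via = 0, value = 1)
        register.partitioned = true
        val accepted = register.write(via = 0, value = 2)
        println("$mode: accepted=$accepted, replica 1 reads ${register.read(1)}, consistent=${register.consistent()}")
    }
}
```

In the AP case, replica 1 keeps returning the stale value 1 until the partition heals and the replicas are reconciled.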

Useful Resources

Udemy Course Log 1 – Complete Guide to Tensorflow for Deep Learning with Python

Below is the course log for the 1st video – Introduction – Lecture 1


Udemy Course Link

  1. What is the objective of this course?
    1. For learners to become proficient in deep learning techniques with the TensorFlow deep learning framework.
  2. What will the learners be exposed to?
    1. A basic crash course in Python and essential data science libraries such as NumPy, Matplotlib, scikit-learn and pandas
    2. A few machine learning models such as densely connected networks, Word2Vec, and recurrent neural networks, which deal with sequential data such as time series.
    3. Neural network fundamentals – perceptrons, activation functions, back-propagation, etc.
  3. Is there a Q&A forum?
    1. This course has a Q&A forum and a Gitter chatroom as well.

At the outset, this course gives some elementary knowledge of data science libraries in Python and helps learners understand a few techniques in TensorFlow.

Following are my questions about learning TensorFlow:

  • Can a learner with an average laptop setup run programs written using TensorFlow?
  • What is the ideal configuration for practicing TensorFlow?
  • Is a GPU essential to run TensorFlow code?
  • Are there free cloud resources that can run TensorFlow code?
  • Does this course provide free infrastructure for practicing TensorFlow code?

Kotlin – Hello World


// This program just demonstrates usage of the println function
// Nice usage of the keyword 'fun' for declaring functions
fun main(args: Array<String>) {
    // main is a fun(ction) that takes one argument named args, of type Array<String>
    // Similar to Java, we have a main function here in Kotlin, where execution of our application starts.
    // It is very simple compared to the equivalent Java code below:
    // simple in the sense that, with less code, we are able to get things done.
    /*
    class Main {
        public static void main(String args[]) {
            System.out.println("Hello World");
        }
    } */
    println("Hello World") // In Kotlin, we don't need a semicolon at the end of a statement, similar to Python.
}

Garbage collection 1

This is an attempt to understand garbage collection.

When an object becomes garbage (no longer reachable by the application), it needs to be removed from memory. It is like how we throw items that are no longer required into the dustbin.

Can we garbage collect Singleton objects from the system?

Technically speaking, when does an object become garbage? Can a Singleton object ever be garbage collected? Even if some Singletons are no longer referenced by the application, should they be cleaned up?

Suppose we do clean up the Singleton object. What are the consequences? Probably, while the Singleton was being referenced, it captured some state about the system, and this state could be of use later on. If this state is lost, the system would lose a crucial piece of information for its processing. One could argue that the state can be serialized so that, on the next instantiation of the Singleton object, it is deserialized and the Singleton object is restored with that state.
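The serialize-and-restore idea can be sketched as follows. Everything here is hypothetical and illustrative: SessionRegistry, the key=value text format and the discard/restore methods are not from any library, and the sketch is not thread-safe. Dropping the reference in discard only makes the old instance eligible for garbage collection; the JVM decides when, or whether, it is actually collected.

```kotlin
// Hypothetical singleton whose state can be saved and reloaded.
class SessionRegistry private constructor() {
    private val state = linkedMapOf<String, String>()

    fun put(key: String, value: String) { state[key] = value }
    fun get(key: String): String? = state[key]

    // Serialize the captured state as "key=value" lines.
    fun snapshot(): String = state.entries.joinToString("\n") { "${it.key}=${it.value}" }

    companion object {
        private var instance: SessionRegistry? = null

        fun getInstance(): SessionRegistry =
            instance ?: SessionRegistry().also { instance = it }

        // Drop the singleton reference, handing back its serialized state.
        fun discard(): String? {
            val saved = instance?.snapshot()
            instance = null
            return saved
        }

        // Create a fresh singleton and reload previously saved state, if any.
        fun restore(saved: String?): SessionRegistry {
            val fresh = SessionRegistry()
            saved?.lineSequence()?.filter { it.isNotEmpty() }?.forEach { line ->
                val (key, value) = line.split("=", limit = 2)
                fresh.put(key, value)
            }
            instance = fresh
            return fresh
        }
    }
}

fun main() {
    SessionRegistry.getInstance().put("user", "ann")
    val saved = SessionRegistry.discard()          // "user=ann"
    val restored = SessionRegistry.restore(saved)
    println(restored.get("user"))                  // ann
}
```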

I need to find out what is currently done with a Singleton when it is no longer referenced: basically, which object graph it would have to be detached from in order to be cleaned up.

UPDATE:

We need to remember that a Singleton object is held in a static field.

I just found out that static variables are GC roots. Since GC roots are always reachable, the objects they reference will not be garbage collected. But there is a catch: if the classloader that loaded the Singleton class becomes eligible for garbage collection, the class can be unloaded, and then the Singleton object becomes eligible for garbage collection too.
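In Kotlin, the object declaration is the built-in way to write a Singleton, and on the JVM it compiles to a class that keeps its single instance in a static field named INSTANCE. A small sketch (AppConfig is a hypothetical name) that checks this with Java reflection:

```kotlin
import java.lang.reflect.Modifier

// A Kotlin `object` is a Singleton: exactly one instance, created when the
// class is first accessed.
object AppConfig {
    private val settings = mutableMapOf<String, String>()

    fun set(key: String, value: String) { settings[key] = value }
    fun get(key: String): String? = settings[key]
}

fun main() {
    AppConfig.set("mode", "dev")

    // The instance lives in a static field, which keeps it reachable while the class is loaded.
    val field = AppConfig::class.java.getDeclaredField("INSTANCE")
    println(Modifier.isStatic(field.modifiers)) // true
    println(field.get(null) === AppConfig)      // true: the field holds the one instance
    println(AppConfig.get("mode"))              // dev
}
```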

Some useful links which are related to this topic:

Some of the best articles for understanding concepts

I would like to share some of the best articles on the internet that helped me understand some concepts. In no particular order, please find them below:

Concept – Brief summary – Link
  • Kubernetes – Container orchestration – Deis
  • Coming soon

Fundamentals which we need to care about

Warren Buffett on Investing (excerpt from his Foreword to The Intelligent Investor):

To invest successfully over a lifetime does not require a stratospheric IQ, unusual business insights, or inside information. What’s needed is a sound intellectual framework for making decisions and the ability to keep emotions from corroding that framework. This book precisely and clearly prescribes the proper framework. You must supply the emotional discipline.

Fundamentals of JavaScript:

JavaScript code is executed by the browser. For example, Google Chrome can interpret and run JavaScript code.

Fundamentals of HTML:

HTML is a markup language: it describes the structure and content of a webpage (headings, paragraphs, links, images), which the browser then renders.

Fundamentals of CSS:

CSS provides the styling (fonts, colours, layout) for the parts of a webpage.

Fundamentals of machine learning

The goal is to enable a machine to learn to recognize a cat or a dog in an image. This is done by providing the machine with a training dataset of labelled images: each contains an image of a dog or a cat, along with a label saying whether it is a dog or a cat. The machine learns through algorithms that mathematically model an image (as matrices/vectors) and understand its properties (pixel values). Just as we make mistakes while learning, the machine also makes mistakes; these are its errors. The algorithms are tuned so that these error rates are reduced over time.
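The learn-from-mistakes loop described above can be sketched with logistic regression trained by gradient descent, one of the simplest learning algorithms (real image classifiers use much larger models, such as convolutional neural networks). Assumptions: each "image" is a made-up 2x2 grid of pixel values flattened into a vector of 4 numbers, labelled 1 for cat and 0 for dog; LogisticModel is a hypothetical name.

```kotlin
import kotlin.math.exp
import kotlin.math.ln

fun sigmoid(z: Double) = 1.0 / (1.0 + exp(-z))

class LogisticModel(size: Int) {
    val weights = DoubleArray(size)
    var bias = 0.0

    // Probability that the image is a cat (label 1).
    fun predict(x: DoubleArray): Double =
        sigmoid(bias + x.indices.sumOf { weights[it] * x[it] })

    // Average cross-entropy loss: large when predictions disagree with labels.
    fun loss(xs: List<DoubleArray>, ys: List<Int>): Double =
        xs.indices.sumOf { i ->
            val p = predict(xs[i])
            -(ys[i] * ln(p) + (1 - ys[i]) * ln(1 - p))
        } / xs.size

    // One gradient-descent step: nudge parameters to shrink the errors.
    fun trainStep(xs: List<DoubleArray>, ys: List<Int>, learningRate: Double) {
        val gradW = DoubleArray(weights.size)
        var gradB = 0.0
        for (i in xs.indices) {
            val error = predict(xs[i]) - ys[i] // the "mistake" on this example
            for (j in weights.indices) gradW[j] += error * xs[i][j]
            gradB += error
        }
        for (j in weights.indices) weights[j] -= learningRate * gradW[j] / xs.size
        bias -= learningRate * gradB / xs.size
    }
}

fun main() {
    // Made-up pixels: "cats" are bright on top, "dogs" bright at the bottom.
    val images = listOf(
        doubleArrayOf(0.9, 0.8, 0.1, 0.2),
        doubleArrayOf(0.8, 0.9, 0.2, 0.1),
        doubleArrayOf(0.1, 0.2, 0.9, 0.8),
        doubleArrayOf(0.2, 0.1, 0.8, 0.9),
    )
    val labels = listOf(1, 1, 0, 0)
    val model = LogisticModel(4)
    println("loss before: %.4f".format(model.loss(images, labels)))
    repeat(500) { model.trainStep(images, labels, learningRate = 0.5) }
    println("loss after:  %.4f".format(model.loss(images, labels)))
}
```

The loss starts at ln 2 (about 0.6931) because the untrained model predicts 0.5 for every image, and it falls as the weights are tuned.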

A machine can learn concepts from data, and data can be in the form of files, such as images, videos or sounds.

(Know) What is ImageNet? It is a huge database of labelled images. (Understand) Why is it useful? Researchers and programmers can use this free database of images to train their image-recognition algorithms.