Distributed System Learning Notes – Cassandra & Related

Cassandra

  • Scalable NoSQL Database
  • Open Source
  • Can handle large amounts of structured, semi-structured, and unstructured data across multiple data centers and the cloud
  • Highly available
  • Provides linear scalability and operational simplicity
  • Data Model offers column indexes
  • Denormalization support
  • Materialized views
  • Write performance of log-structured updates
  • Powerful built-in caching
  • Classified as an AP system (Availability and Partition tolerance) under the CAP theorem
  • Masterless architecture
  • Nodes participate in a Cassandra ring – data gets distributed or partitioned across nodes transparently.
  • It can be configured to replicate data across multiple data centers or multiple cloud environments. If one node goes down, other nodes hold copies of the data belonging to that node. Replication is therefore built in and configurable.
  • Linearly scalable – if 2 nodes can handle 1 lakh (100,000) transactions per second, adding 2 more nodes lets the system handle 2 lakh transactions per second, and growing to 8 nodes lets it handle 4 lakh transactions per second.

Figure 1

 

What is Linear Scalability?

  • A system is said to be linearly scalable if its capacity grows in proportion to the number of nodes added, without any change to application code.
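
The arithmetic behind linear scalability can be sketched in a few lines of Java. This is only an idealized illustration: the class and method names are made up for this note, the figures are the ones from the Cassandra example above (2 nodes handling 1 lakh transactions per second), and a real cluster will not scale perfectly because of replication and coordination overhead.

```java
class LinearScaling {
    // Idealized linear scaling: throughput grows in proportion to node count.
    // baseNodes and baseTps describe a measured starting point (assumed values here).
    static long expectedThroughput(int baseNodes, long baseTps, int nodes) {
        return baseTps * nodes / baseNodes;
    }

    public static void main(String[] args) {
        // Starting point taken from the notes: 2 nodes handle 100,000 transactions/sec.
        for (int nodes : new int[] {2, 4, 8}) {
            long tps = expectedThroughput(2, 100_000L, nodes);
            System.out.println(nodes + " nodes -> " + tps + " transactions/sec");
        }
    }
}
```

Doubling the node count doubles the expected throughput, and nothing in the calling code has to change.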

What are materialized views?

  • A normal view stores a query as a virtual table (it does not physically exist), and the query runs again every time the view is read. A materialized view is one whose result we want to be materialized, so the underlying database system stores the result of the query in an actual table underneath. Materialized views are used to improve performance and stability.

What is CAP Theorem?

  • Also called Brewer’s Theorem.
  • It is impossible for a distributed system to provide all 3 of the following guarantees simultaneously:
    • Consistency – guarantee that all nodes see the same data at the same time
    • Availability – guarantee that every request receives a response indicating whether it succeeded or failed
    • Partition tolerance – guarantee that the system as a whole continues to operate despite partial failure of the system or arbitrary message loss

 

 


Udemy Course Log 1 – Complete Guide to Tensorflow for Deep Learning with Python

Below is the course log for the first video: Introduction (Lecture 1)


Udemy Course Link

  1. What is the objective of this course?
    1. For learners to become proficient in deep learning techniques with the TensorFlow deep learning framework.
  2. What will the learners be exposed to?
    1. A basic crash course in Python and essential data science libraries such as numpy, matplotlib, scikit-learn, and pandas
    2. A few machine learning models such as densely connected networks, Word2Vec, and Recurrent Neural Networks, which deal with continuous streams of data such as time series.
    3. Neural network basics: perceptron, activation functions, back-propagation, etc.
  3. Is there a Q&A forum?
    1. This course has a Q&A forum and a Gitter chatroom as well.

At the outset, this course gives some elementary knowledge of data science libraries in Python and helps learners understand a few techniques in TensorFlow.

Following are my questions in the context of learning about TensorFlow:

  • Can a learner with an average laptop setup run programs written using TensorFlow?
  • What is the ideal configuration for practicing TensorFlow?
  • Is a GPU essential to run TensorFlow code?
  • Are there free cloud resources which can run TensorFlow code?
  • Does this course provide free infrastructure for practicing TensorFlow code?

Kotlin – Hello World


// This program just demonstrates usage of the println function
// nice usage of the word 'fun'
fun main(args: Array<String>) {
    // So, main is a fun which takes an argument named args, of type Array of String.
    // Similar to Java, Kotlin has a main function where execution of our application starts.
    // It is very simple compared to Java code:
    // simple in the sense that we get things done with less code.
    /*
    class Main {
        public static void main(String args[]) {
            System.out.println("Hello World");
        }
    }
    */
    println("Hello World") // in Kotlin, we don't need a semicolon at the end of a statement, similar to Python
}

Garbage collection 1

This is an attempt to understand garbage collection.

When an object becomes garbage, it needs to be removed from the system. It is like how we have some items which are no longer required and we throw them in the dustbin.

Can we garbage collect Singleton objects from the system?

Technically speaking, when does an object become garbage? Can a Singleton object ever be garbage collected? Even if a Singleton is no longer referenced by the application, should it be cleaned up?

Suppose we do clean up the Singleton object. What are the consequences? Probably, while the Singleton was being referenced, it captured some state about the system, and this state could be of use later on. If this state is lost, the system would lose a crucial piece of information for its processing. Someone could argue that the state can be serialized, so that on the next instantiation of the Singleton object it is deserialized and the Singleton is restored to that state.

I need to find out what currently happens to a Singleton when it is no longer referenced. Basically, which object graph it would have to be detached from in order to be cleaned up.

UPDATE:

We need to remember that a Singleton instance is usually held in a static field.

I just found out that static variables act as GC roots: they stay reachable for as long as their class is loaded, so the objects they reference will not be garbage collected. But there is a catch. If the classloader which loaded the Singleton class becomes eligible for garbage collection, then the Singleton object becomes eligible for garbage collection too.
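
Here is a small Java sketch of that observation. The class names (SingletonGcDemo, ConfigRegistry) are hypothetical and exist only for this note. The sketch holds the Singleton in a static field, drops every other strong reference, requests a GC, and checks through a WeakReference that the instance is still there. It does not try to show the classloader case, which would need a custom classloader.

```java
import java.lang.ref.WeakReference;

class SingletonGcDemo {

    // A classic Singleton: the only instance is held in a static field.
    static final class ConfigRegistry {
        private static final ConfigRegistry INSTANCE = new ConfigRegistry();

        private ConfigRegistry() {}

        static ConfigRegistry getInstance() {
            return INSTANCE;
        }
    }

    // Returns true if the Singleton is still alive after a GC request.
    static boolean survivesGc() {
        // A WeakReference does not keep its target alive on its own.
        WeakReference<ConfigRegistry> weak =
                new WeakReference<>(ConfigRegistry.getInstance());
        System.gc(); // only a hint to the JVM; the result below does not depend on it running
        // The static field still references the instance strongly,
        // so the weak reference cannot have been cleared.
        return weak.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("Singleton still reachable after GC request: " + survivesGc());
    }
}
```

Because the static field INSTANCE keeps a strong reference for as long as ConfigRegistry is loaded, the weak reference is never cleared here, whether or not the collector actually runs.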


Some of the best articles for understanding concepts

I would like to share some of the best articles on the internet which helped me understand some concepts. In no particular order, please find them below:

Concept | Brief Summary | Link
Kubernetes | Container Orchestration | Deis
Coming soon | Coming soon | Coming soon

 

Fundamentals which we need to care about

Warren Buffett on Investing (excerpt from the Foreword to The Intelligent Investor):

To invest successfully over a lifetime does not require a stratospheric IQ, unusual business insights, or inside information. What’s needed is a sound intellectual framework for making decisions and the ability to keep emotions from corroding that framework. This book precisely and clearly prescribes the proper framework. You must supply the emotional discipline.

 

Fundamentals of JavaScript:

JavaScript code is executed by the browser. For example, Google Chrome can interpret JavaScript code.

Fundamentals of HTML:

HTML is a markup language. It describes the structure of a webpage, that is, what the different parts of the page are and how they are laid out.

Fundamentals of CSS:

CSS provides styling to parts of a webpage.

Fundamentals of machine learning

The goal is to enable a machine to learn to recognize a cat or a dog in an image. This is done by providing the machine with a training dataset of labelled images: each entry contains an image of a dog or a cat together with a label saying which one it is. The learning itself is done by algorithms which model an image mathematically (as matrices/vectors) and work with the properties of the image (pixel values). Just as we make mistakes while learning, the machine also makes mistakes, and these are its errors. The algorithms are tuned in such a way that the error rate is reduced over time.

A machine can learn concepts from data, and data can be in the form of files. Files can be images, videos, or sounds.

(Know) What is ImageNet? It is a huge database of images. (Understand) Why is it useful? Researchers and programmers can use this freely available database of images to train their image learning algorithms.

 

Logistic Regression – draft

  • What is Logistic Regression?

Logistic Regression is a machine learning model.

Why a machine learning model? We need a model in the sense that the machine will use it to learn something from data. The model helps the machine make sense of the data. And the best part is that once it has made sense of the data, it can do useful tasks such as predicting and classifying on its own.

  • Can you give examples of problems where Logistic Regression can be put to use?

For example, we want to know if a person will likely have diabetes. We need some parameters to judge whether this person will get diabetes. To simplify things, let us consider some features/parameters like: person’s height, person’s weight, person’s blood pressure.

So, as software engineers, we think of a method which takes some input and generates some output. Something like the following:

public boolean doesPersonHaveDiabetes(Height height, Weight weight, BloodPressure bloodPressure) {
    throw new UnsupportedOperationException("the trained model goes here");
}

 

  • What is the difference between Logistic Regression and Linear Regression?

Linear Regression is great for predicting continuous values (for example, what the stock price will be after 5 minutes, given some basic set of conditions).

Logistic Regression is great for predicting a discrete outcome (for example, whether a person has diabetes or not).

So, we are going to use a Logistic Regression model to predict an outcome (yes or no).

Why “Logistic” ? ( Enter Mathematics … )

The core part of Logistic Regression is the ‘Logistic function’ or ‘Sigmoid function’, which looks like:

f(x) = 1 / (1 + e^-x)

From the above function, we see that the numerator is 1 and the denominator is ( 1 + e^-x ), which is always greater than 1. So the function only takes values between 0 and 1: as x grows large, f(x) approaches 1, and as x becomes very negative, f(x) approaches 0. This fits our scenario, where we want to know whether a person has diabetes or not. We can read the output as a probability and say: if it is more than 70%, then yes; otherwise no.
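
The idea can be sketched in Java as follows. This is only a minimal illustration: the class name, the weights, the bias, and the feature values are all made up for this note (a real model would learn the weights from training data), and the 0.7 threshold is simply the 70% figure used above.

```java
class LogisticSketch {

    // The logistic (sigmoid) function: f(x) = 1 / (1 + e^-x)
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Weighted sum of the features plus a bias term (the linear part of the model).
    static double score(double[] weights, double bias, double[] features) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return z;
    }

    // Answer "yes" only when the predicted probability exceeds the threshold.
    static boolean predict(double[] weights, double bias, double[] features, double threshold) {
        return sigmoid(score(weights, bias, features)) > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical, hand-picked numbers; a real model would learn them from data.
        double[] weights = {0.5, 1.5, 1.0};
        double bias = -1.0;
        // Made-up, already-scaled values for height, weight, and blood pressure.
        double[] person = {0.2, 1.1, 0.9};

        double probability = sigmoid(score(weights, bias, person));
        System.out.println("probability = " + probability);
        System.out.println("diabetes?   = " + predict(weights, bias, person, 0.7));
    }
}
```

Note that sigmoid(0) is exactly 0.5, the midpoint of the curve. Picking a threshold such as 0.7 instead of 0.5 just means we ask for more confidence before answering yes.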

  • Can we use logistic regression model for all kinds of input data ?

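No. Logistic regression draws a linear decision boundary, so it works well only when the classes are (at least approximately) linearly separable in the input space.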