TLDR: “Logistic regression is a classification algorithm. It classifies data into different classes.”

Suppose you have a lot of emails in your inbox. However, your personal mail server does not have any spam filtering system. So you have to build a system that can filter your emails and classify each one as spam or not spam.

You can classify an email as spam by using some parameters, or variables. These variables are called independent variables. For example, you may say that if the email comes from a particular email address, then it is spam. Or if the email comes from a particular IP address or server, then it’s spam. Another variable can be a “list of words”: if these words are present in the email’s subject or body, you can identify the email as spam.
By using these independent variables, we can predict the dependent variable: “spam” or “not spam”. The algorithm that finds the dependent variable from the independent variables is called logistic regression.
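The post itself contains no code. The following is a minimal Python sketch of logistic regression applied to the spam example. The three binary features (sender on a blocklist, IP on a blocklist, subject contains a flagged word) and the six labeled emails are hypothetical. Batch gradient descent on the log-loss is only one common way to fit theta.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """P(spam | x) = sigmoid(theta0 + theta1*x1 + ... + thetan*xn)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

def train(X, y, lr=0.5, epochs=2000):
    """Fit theta by batch gradient descent on the average log-loss."""
    theta = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, label in zip(X, y):
            error = predict_proba(theta, x) - label
            grad[0] += error  # gradient for the intercept theta0
            for j, xj in enumerate(x, start=1):
                grad[j] += error * xj
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta

# Hypothetical features: [sender_blocklisted, ip_blocklisted, has_flagged_word]
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

theta = train(X, y)
new_email = [1, 0, 0]
print(f"P(spam) = {predict_proba(theta, new_email):.3f}")
```

An email is labeled spam when the predicted probability is at least 0.5. That cut-off is a common default, not a rule.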

When to use Logistic regression:

If the answer to your question is YES or NO, then you can apply logistic regression.

For example:

Is the email spam? Yes / No;

Do the symptoms (independent variables) indicate a risk of a heart attack? Yes / No;

Is the transaction fraud? Yes / No;

In this way, you can decide when to use logistic regression.

When you start to learn linear regression and logistic regression, you will come across an equation. The equation will contain theta. You might be wondering what this theta means in machine learning.
A few days ago I was in my EMBA class “Managing Operations”. Our class topic was forecasting. In that class I learned something about weights. Later I was able to relate weights to theta. Today I will explain what theta is in terms of weights.

Suppose you have an online store where you sell expensive, unique stylus pens. Last year you advertised on Google AdWords, Facebook, Twitter and a local newspaper. From last year’s data you have learned that the Google advertisement was more effective than all the others. This year you have to plan your advertising across different media. To get a better result, what will you do?
Simple answer: you will spend more money on Google advertising. From your previous year’s data you have found that Facebook is in second position, so your second priority will be Facebook advertising. You also advertised in your local newspaper. However, you couldn’t figure out how much traffic you got from that advertisement, so this year you are not going to advertise in the local newspaper at all.
Let’s assume your total advertising budget for this year is US$6000. We can write down this year’s advertising plan as follows:

$3000 Google advertisement + $2000 Facebook advertisement + $1000 Twitter advertisement + $0 local newspaper advertisement = $6000.

From the above calculation, we can see that we are spending most of the money on Google advertising. In other words, the most weight goes to Google advertising. So by using weights, you give different priorities to different media.
Now you understand what weight means. If so, you also understand the meaning of theta in machine learning. In machine learning algorithms, we assign a theta to each feature to give it a different priority, or weight, based on its importance.
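The post does not write the equation out. In the usual textbook notation, assumed here, a linear regression model with features x_1, …, x_n weights each feature by its own theta. Logistic regression passes the same weighted sum through the sigmoid function:

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^{\top} x \qquad (\text{with } x_0 = 1)

h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}} \qquad \text{(logistic regression)}
```

Each θ_j plays the role of a budget share. When the features are on comparable scales, a larger |θ_j| means feature x_j has more influence on the prediction.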

House price example

When we do linear regression and logistic regression, we use some features. For example, suppose we want to predict house prices in a specific area. For that we take historical data, such as previous selling prices. From that data we take different features of each house, e.g. the number of rooms, the number of bathrooms, the number of kitchens, and whether the house is beside the main road or far from it.
To calculate the house price, we give more priority to some features and less priority to others. For example, the number of rooms has more weight, or priority, than the number of kitchens. If there are 5 kitchens in a 2000 square foot home, that may not add much value to the house, because one or two kitchens are enough. That’s why the number of kitchens is not a major factor in predicting house price. So we give less weight to kitchens and more weight to rooms, as well as to the total area of the house.
So now if you see a machine learning algorithm with theta, you will be able to figure out what it means.
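To make the idea concrete, here is a small Python sketch of a linear price model. The theta values are hypothetical numbers chosen only to mirror the reasoning above (rooms weighted heavily, kitchens lightly). They are not fitted to any real housing data.

```python
# Hypothetical theta values for illustration only (not learned from real data).
THETA = {
    "bias": 20_000.0,           # theta_0: base price
    "area_sqft": 100.0,         # price per square foot
    "rooms": 15_000.0,          # rooms get a large weight
    "bathrooms": 8_000.0,
    "kitchens": 1_000.0,        # kitchens get a small weight
    "near_main_road": 5_000.0,  # 1 if beside the main road, else 0
}

def predict_price(house):
    """Linear model: price = theta_0 + sum(theta_j * x_j)."""
    return THETA["bias"] + sum(THETA[name] * value for name, value in house.items())

house = {"area_sqft": 2000, "rooms": 4, "bathrooms": 2, "kitchens": 1, "near_main_road": 1}
print(predict_price(house))  # 302000.0

# Going from 1 to 5 kitchens barely moves the prediction because its theta is small.
print(predict_price({**house, "kitchens": 5}) - predict_price(house))  # 4000.0
```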

Reinforcement learning is one type of machine learning. In a single sentence: in this learning process, a machine learns by trial and error. Basically, we give the machine 2 instructions.

1. Try all possible ways.

2. From your experience, avoid errors and increase the success rate.

Suppose we have a robot. There is a fire in front of it. The robot can do 2 things: it can either jump directly into the fire or run away from it.

At first it will try both ways. It jumps into the fire and fails. Then it runs away and survives. The robot remembers this. Next time it sees a fire, it will run away. This is the basic concept of reinforcement learning.

Reinforcement learning Algorithms:

Q-Learning

SARSA (State–action–reward–state–action)

Relative value learning (R-Learning)
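The robot-and-fire story can be written as a tiny Q-learning sketch in Python. The reward values (-10 for jumping in, +1 for running away) and the learning parameters are assumptions for illustration. Because each episode ends after one action, this is a degenerate one-step case of Q-learning rather than a full environment.

```python
import random

ACTIONS = ["jump_into_fire", "run_away"]
# Hypothetical rewards: burning is punished, escaping is rewarded.
REWARDS = {"jump_into_fire": -10.0, "run_away": 1.0}

def q_learning(episodes=200, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning for the single state 'fire ahead'.

    Every episode ends after one action, so the usual target
    r + gamma * max_a' Q(s', a') reduces to r (a terminal state has value 0).
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}  # Q("fire ahead", action)
    for _ in range(episodes):
        # Epsilon-greedy: sometimes explore ("try all possible ways"),
        # otherwise exploit what has been learned so far.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[a])
        target = REWARDS[action]  # no future value after a terminal step
        q[action] += alpha * (target - q[action])
    return q

q = q_learning()
print(q)
print("Learned action:", max(ACTIONS, key=lambda a: q[a]))
```

The exploration rate `epsilon` corresponds to instruction 1 (try all possible ways). The greedy choice corresponds to instruction 2 (use experience to avoid errors).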

Where to apply:

There are many fields where we can apply it. Some examples are as follows:

Playing a game: Reinforcement learning can learn to play different games and master them. One great example is the “AlphaGo” system. Using this kind of machine learning, the system beat a high-ranked Go player.

Natural language processing: Processing human language is a very difficult task. Reinforcement learning is helping to overcome this issue.

Self-driving car system: In the near future, we’ll see a lot of self-driving cars on the road. Reinforcement learning is contributing a lot to making this come true. ML algorithms (e.g. the Deep Q-Learning algorithm) are used in self-driving car systems to improve driving.

Robot’s movement: A robot’s different movements improve over time through reinforcement learning. For example, a robot can learn to grab an object more accurately by using this algorithm.

This is the 2nd part of the “Machine Learning Interview Questions” series.

What are the pros and cons (advantages and disadvantages) of the Naive Bayes classifier, which is based on Bayes’ theorem?

Pros: 1. It is relatively simple to understand and build. 2. It can be trained easily, even with a small dataset. 3. It’s fast! 4. It’s not sensitive to irrelevant features. Cons: 1. It assumes every feature is independent, which isn’t always the case.

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
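As one illustration of clustering, here is a pure-Python sketch of k-means (Lloyd's algorithm). It is only one of many clustering methods. The 2D points are hypothetical and form two obvious groups.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```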

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the “cocktail party problem” of listening in on one person’s speech in a noisy room.

Deep learning, a subset of machine learning, uses a hierarchy of artificial neural network layers to carry out the process of machine learning. Artificial neural networks are loosely inspired by the human brain, with neuron nodes connected together like a web. While traditional programs analyze data in a linear way, the hierarchical function of deep learning systems enables machines to process data with a non-linear approach. A traditional approach to detecting fraud or money laundering might rely on the transaction amount alone, while a non-linear deep learning technique for weeding out fraudulent transactions could include time, geographic location, IP address, type of retailer, and any other feature that is likely to point to fraudulent activity.
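A classic way to see why the non-linear layers matter is XOR. No single linear function of two inputs can separate XOR's outputs, but one hidden layer with a non-linear activation (ReLU here) can. The weights below are hand-picked for illustration rather than learned. Real deep learning frameworks learn such weights from data.

```python
def relu(z):
    """Non-linear activation: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def tiny_network(x1, x2):
    """2 inputs -> 2 hidden ReLU units -> 1 output, with hand-picked weights."""
    h1 = relu(1.0 * x1 + 1.0 * x2)        # counts how many inputs are on
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)  # fires only when both inputs are on
    return 1.0 * h1 - 2.0 * h2            # output layer

for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```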

1. Automatic speech recognition, 2. Image recognition, 3. Visual art processing, 4. Natural language processing, 5. Drug discovery and toxicology, 6. Customer relationship management, 7. Recommendation systems, 8. Bioinformatics, 9. Mobile advertising

What is random forest?

Random forests or random decision forests[1][2] are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
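The sketch below is a deliberately simplified random forest in pure Python. Its trees are depth-1 "stumps" rather than full decision trees. It keeps the two random ingredients, a bootstrap sample of rows and a random subset of features for each tree, and it takes the mode of the trees' votes. The data and parameters are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Depth-1 tree on one feature: pick threshold/labels with the fewest training errors."""
    best = None
    labels = sorted(set(y))
    for threshold in sorted({row[feature] for row in X}):
        for left in labels:
            for right in labels:
                errors = sum(
                    (left if row[feature] <= threshold else right) != target
                    for row, target in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, left, right)
    return best

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Each tree sees a bootstrap sample of the rows and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # rows drawn with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        features = rng.sample(range(len(X[0])), max_features)  # random feature subset
        forest.append(min((fit_stump(Xb, yb, f) for f in features), key=lambda s: s[0]))
    return forest

def predict_forest(forest, row):
    """Classification output: the mode (majority vote) of the individual trees."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for _, feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]

X = [[1, 2], [2, 1], [2, 2], [1, 1], [8, 9], [9, 8], [9, 9], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [9, 9]))
```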

What do you know about Dimensionality Reduction Algorithms?

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration,[1] via obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
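As a minimal example of the feature-selection side, the sketch below drops columns whose variance is at or below a threshold. A constant column carries no information. The three columns (rooms, floors, area) are hypothetical. Feature extraction methods such as PCA are not shown here.

```python
from statistics import pvariance

def drop_low_variance_features(rows, threshold=0.0):
    """Feature selection: keep only columns whose variance exceeds `threshold`.

    Returns the indices of the kept columns and the reduced rows.
    """
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced

# Hypothetical columns: rooms, floors (always 1), area in square meters
rows = [
    [3, 1, 250],
    [4, 1, 300],
    [2, 1, 180],
]
keep, reduced = drop_low_variance_features(rows)
print(keep)     # [0, 2]
print(reduced)  # [[3, 250], [4, 300], [2, 180]]
```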

TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and it is also used for machine learning applications such as neural networks.[3] It is used for both research and production at Google.

What is data mining?

Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.

To avoid overfitting, use test data for evaluation or do cross-validation, and add regularization terms (such as L1, L2, AIC, BIC, MDL or a probabilistic prior) to the objective function.
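The cross-validation idea can be sketched in a few lines of Python. The function below splits sample indices into k folds. Each fold serves once as test data while the rest is used for training. It does not shuffle, and in practice you would usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds; yield (train, test) lists.

    Every sample lands in exactly one test fold, so each data point is used
    for evaluation once and for training k - 1 times.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

A model is trained on each `train` split and scored on the matching `test` split. The k scores are then averaged.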

In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. Kernel methods require only a user-specified kernel, i.e. a similarity function over pairs of data points in raw representation.
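As an example of such a similarity function, here is the Gaussian (RBF) kernel in Python, together with the pairwise kernel ("Gram") matrix a kernel method works from. The value of `gamma` and the sample points are arbitrary choices for illustration.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2).

    Gives 1.0 for identical points and approaches 0 as points move apart.
    """
    return math.exp(-gamma * math.dist(x, z) ** 2)

def gram_matrix(points, kernel):
    """Pairwise similarity matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
K = gram_matrix(points, rbf_kernel)
for row in K:
    print([round(v, 3) for v in row])
```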

Will machines ever be able to have consciousness? What do you think?

Answer based on your own opinion.

What is apache spark?

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.

What is the difference between hash table and array?

1) A hash table stores data as name/value (key/value) pairs, while an array stores only values. 2) To access a value in a hash table, you pass its name (key), while in an array you pass its index number. 3) In many statically typed languages, an array holds elements of only one type, while a hash table’s keys and values can be of different types from each other (e.g. string keys with int values).
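In Python, a list plays the role of an array and a dict is a hash table, so the first two differences look like this. Python lists, unlike arrays in many statically typed languages, can also hold mixed types.

```python
# Array-like list: values only, accessed by integer index.
prices = [120, 95, 300]
print(prices[1])  # 95

# Hash table (Python dict): name/value pairs, accessed by key.
price_by_item = {"pen": 120, "notebook": 95, "bag": 300}
print(price_by_item["notebook"])  # 95
```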

What are some of the major tasks in data pre-processing?

Data cleaning: Fill in missing values, and detect and remove noisy data and outliers.

Data transformation: Normalize or scale data so that features are on comparable ranges.

Data reduction: Sample data records or attributes for easier data handling.

Data discretization: Convert continuous attributes to categorical attributes for ease of use with certain machine learning methods.

Text cleaning: Remove embedded characters that may cause data misalignment, e.g. embedded tabs in a tab-separated data file, embedded new lines that may break records, etc.
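One common form of normalization is min-max scaling, which rescales a numeric column to the range [0, 1]. Here is a minimal Python sketch. The handling of a constant column is an arbitrary choice made for this example.

```python
def min_max_normalize(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([1000, 1500, 2000, 3000]))  # [0.0, 0.25, 0.5, 1.0]
```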

To deal with missing values, it is best to first identify why the values are missing, so you can handle the problem better. Typical missing value handling methods are:

Deletion: Remove records with missing values.

Dummy substitution: Replace missing values with a dummy value, e.g. “unknown” for categorical or 0 for numerical values.

Mean substitution: If the missing data is numerical, replace the missing values with the mean.

Frequent substitution: If the missing data is categorical, replace the missing values with the most frequent item.

Regression substitution: Use a regression method to replace missing values with regressed values.
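Mean substitution and frequent substitution can be sketched in Python as follows. Missing entries are represented as `None`, and the sample columns are hypothetical.

```python
from collections import Counter
from statistics import mean

def mean_substitution(values):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def frequent_substitution(values):
    """Replace None in a categorical column with its most frequent value."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

ages = [25, None, 35, 30]
colors = ["red", "blue", None, "red"]
print(mean_substitution(ages))
print(frequent_substitution(colors))  # ['red', 'blue', 'red', 'red']
```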

Nearly all big tech companies have an artificial intelligence project, and they are willing to pay experts millions of dollars to help get it done. – By CADE METZ

Machine learning is a part of artificial intelligence. According to IBM’s forecast, job openings for artificial intelligence, machine learning and data science will increase by 28% by 2020 (Forbes).

So if you are looking for a machine learning job or need to prepare for a machine learning interview, take a look at the following questions.

What is machine learning?

Machine learning is a branch of Artificial Intelligence. It allows systems to automatically learn and improve from experience without being explicitly programmed.

What is artificial intelligence?

Artificial intelligence is a branch of computer science that studies and develops machines with human-like intelligence. Most importantly, such machines can learn from experience and deal with new situations smartly.

What is the difference between artificial intelligence and machine learning?

Artificial intelligence (AI) has many branches, and machine learning (ML) is one of them. AI deals with the broader goal of developing machines that can act intelligently, like humans. In machine learning, on the other hand, we provide data to machines and they learn from that data by themselves.

What are the types of machine learning?

There are 3 types of machine learning: 1. Supervised learning, 2. Unsupervised learning and 3. Reinforcement learning.

What is Supervised machine learning?

In supervised machine learning, you provide a set of data with problems and their answers (labels). The machine learns from that data and applies what it has learned to new data.

What is Unsupervised machine learning?

In unsupervised learning, we don’t provide any answers (solution data) to the machine. We only provide a set of data, and the machine learns from it by itself.

What is Reinforcement machine learning?

Reinforcement learning is training by rewards and punishments. Here we train a computer the way we train a dog. If the dog obeys and acts according to our instructions, we encourage it by giving it biscuits; otherwise we punish it (by withholding the biscuit or by some other means). Similarly, if the system works well, the teacher gives it a positive value (i.e. a reward); otherwise the teacher gives it a negative value (i.e. a punishment). The learning system that gets punished has to improve itself. Thus it is a trial and error process.

Linear regression is a statistical method that attempts to model the relationship between scalar variables. There can be two or more variables. One of them is the dependent variable; the others are independent variables.

What do you know about logistic regression?

Like all regression analyses, logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

What is the difference between linear regression and correlation?

From correlation we can only get an index describing the linear relationship between two variables. With regression, we can predict the relationship between more than two variables and identify which variables x can predict the outcome variable y. … Regression literally means going back towards the average.
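A small Python sketch shows one practical difference. The correlation coefficient is symmetric: r(x, y) equals r(y, x). A regression slope has a direction: predicting y from x gives a different line than predicting x from y. The five data points are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Correlation: a single symmetric index of linear association in [-1, 1]."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Regression: slope of the least-squares line that predicts y from x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y), pearson_r(y, x))  # identical values
print(slope(x, y), slope(y, x))          # 0.6 1.0
```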

A logistic regression model is searching for a single linear decision boundary in your feature space, whereas a decision tree is essentially partitioning your feature space into half-spaces using axis-aligned linear decision boundaries. The net effect is that you have a non-linear decision boundary, possibly more than one.

This is nice when your data points aren’t easily separated by a single hyperplane. On the other hand, decision trees are so flexible that they can be prone to overfitting. To combat this, you can try pruning. Logistic regression tends to be less susceptible (but not immune!) to overfitting. Which one works better depends on your specific problem and the data you have. Both decision trees (depending on the implementation, e.g. C4.5) and logistic regression should be able to handle continuous and categorical data just fine.

Lastly, decision trees can automatically take into account interactions between variables, for example the product xy if you have two independent features x and y. With logistic regression, you’ll have to add those interaction terms manually.
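Adding an interaction term manually usually means appending a new feature column. Here is a minimal Python sketch for two features x and y. The helper name is hypothetical.

```python
def add_interaction(rows):
    """Append the product x*y to each [x, y] feature row.

    A logistic regression on [x, y] alone can only combine x and y additively;
    with the extra column it can also give a weight (theta) to their interaction.
    """
    return [[x, y, x * y] for x, y in rows]

print(add_interaction([[1, 0], [1, 1], [2, 3]]))  # [[1, 0, 0], [1, 1, 1], [2, 3, 6]]
```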

Which algorithms do we use for supervised machine learning?

K-nearest neighbors (KNN) is a classification algorithm used in supervised learning. Don’t confuse it with K-means, which is a clustering algorithm used in unsupervised learning. … In sum, they are two different algorithms with two very different end results.
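A minimal KNN classifier in Python makes the supervised part visible: it needs the labels `train_y`, whereas k-means only ever sees unlabeled points. The 2D "fruit" features are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label the query with the majority label among its k closest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["apple", "apple", "apple", "orange", "orange", "orange"]
print(knn_predict(train_X, train_y, (2, 2)))  # apple
```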

In statistics, a receiver operating characteristic curve, or ROC curve, is a graphical plot. It illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
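The points of an ROC curve can be computed by sweeping the threshold over the classifier's scores. At each threshold, you record the false positive rate and the true positive rate. Here is a minimal Python sketch. The scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Each distinct score is used as a threshold: predict positive when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```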

In statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model. Its goal is to minimize the sum of the squares of the differences between the observed responses (values of the variable being predicted) in the given dataset and those predicted by a linear function of a set of explanatory variables.
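Here is a pure-Python sketch of OLS that solves the normal equations (XᵀX)θ = Xᵀy with Gaussian elimination. It is fine for tiny examples. Numerical libraries use more stable methods (e.g. QR decomposition), so treat this as an illustration only. The toy data are generated exactly from y = 1 + 2·x1 + 3·x2, so OLS should recover those coefficients.

```python
def ols_fit(X, y):
    """OLS via the normal equations (X^T X) theta = X^T y.

    A column of 1s is prepended so theta[0] is the intercept.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    XtX = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix [XtX | Xty].
    M = [XtX[i] + [Xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    theta = [0.0] * n
    for i in reversed(range(n)):
        theta[i] = (M[i][n] - sum(M[i][j] * theta[j] for j in range(i + 1, n))) / M[i][i]
    return theta

def sum_squared_residuals(theta, X, y):
    """The quantity OLS minimizes."""
    return sum(
        (t - (theta[0] + sum(a * b for a, b in zip(theta[1:], row)))) ** 2
        for row, t in zip(X, y)
    )

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a + 3 * b for a, b in X]
theta = ols_fit(X, y)
print([round(t, 6) for t in theta])  # [1.0, 2.0, 3.0]
```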

Naive Bayes is a collection of classification algorithms based on Bayes’ theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every feature being classified is independent of the value of any other feature. So, for example, a fruit may be considered to be an apple if it is red, round, and about 3″ in diameter. A Naive Bayes classifier considers each of these “features” (red, round, 3″ in diameter) to contribute independently to the probability that the fruit is an apple, regardless of any correlations between features. Features, however, aren’t always independent, which is often seen as a shortcoming of the Naive Bayes algorithm, and this is why it’s labeled “naive”.
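Here is a minimal categorical Naive Bayes classifier in pure Python for the fruit example. The training rows are hypothetical. Add-one (Laplace) smoothing is a common choice, assumed here, that avoids zero probabilities for unseen feature values.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [label][feature][value]
    feature_values = defaultdict(set)  # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row):
    """Pick the class maximizing log P(class) + sum_i log P(feature_i | class).

    Treating each feature independently given the class is the "naive" assumption.
    """
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing over the values seen for feature i.
            p = (value_counts[label][i][value] + 1) / (count + len(feature_values[i]))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [
    ("red", "round", "small"),
    ("red", "round", "medium"),
    ("green", "round", "medium"),
    ("yellow", "long", "medium"),
    ("yellow", "long", "large"),
    ("green", "long", "large"),
]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("red", "round", "medium")))  # apple
```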

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best separates the two classes, i.e. the one with the largest margin between them.
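To illustrate what "the hyperplane with the largest margin" means, the Python sketch below searches a grid of line directions in 2D and keeps the one with the widest gap between the classes. This brute-force search is not how SVM libraries train. They solve an optimization problem instead. The sketch also assumes the two classes are linearly separable. The four points are hypothetical.

```python
import math

def max_margin_line(points, labels, steps=3600):
    """Brute-force search for the maximum-margin separating line in 2D.

    For each candidate unit normal w on an angle grid, the gap between the classes
    along w is min(w.x over +1 points) - max(w.x over -1 points). The widest gap
    gives (approximately) the hard-margin SVM line w.x + b = 0, and b puts the
    line halfway between the two classes.
    """
    best = None
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        w = (math.cos(angle), math.sin(angle))
        pos = [w[0] * px + w[1] * py for (px, py), lab in zip(points, labels) if lab == 1]
        neg = [w[0] * px + w[1] * py for (px, py), lab in zip(points, labels) if lab == -1]
        gap = min(pos) - max(neg)
        if best is None or gap > best[0]:
            best = (gap, w, -(min(pos) + max(neg)) / 2)
    gap, w, b = best
    return w, b, gap / 2  # margin: distance from the line to the closest points

def classify(w, b, point):
    """Which side of the line the point falls on: +1 or -1."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

points = [(-1, 0), (-2, 1), (1, 0), (2, -1)]
labels = [-1, -1, 1, 1]
w, b, margin = max_margin_line(points, labels)
print(w, b, margin)
print(classify(w, b, (3, 3)), classify(w, b, (-3, 2)))
```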

When it’s time to learn “Machine Learning”, the first thing you will hear about is the “Types of Machine Learning”, because this is where you begin to learn.

Based on the learning algorithm, machines mainly learn in two ways: the supervised way and the unsupervised way. (Reinforcement learning is often counted as a third type.) Let’s discuss these two with examples.

1. Supervised machine learning

Here, at the very beginning, you teach your machine. Then the machine gives you results based on your lessons. Let me give you a real-world example:

Suppose you want to teach the machine to recognize images of fruits. In the supervised learning process, you show it an image of an apple and tell the machine that this is an apple. Then you take an image of an orange and let it know that the image contains an orange.

In this way, you teach your machine with a lot of images and their labels.

After that, if you show a new image to the machine, most likely it will recognize the fruit’s name.

2. Unsupervised machine learning

Here you don’t teach the machine. It learns by itself. Let’s jump into an example. In this scenario, you show the machine many images of apples and oranges, but you don’t say which one is an apple and which one is an orange. The machine will be able to detect that these two things are different. It will put the apples in one category and the oranges in another category.

That is, it will cluster different things in different groups.