This is the second part of “Machine Learning Interview Questions”. To read the first part of this series, click here.

###### What are the pros and cons (advantages and disadvantages) of Naive Bayes?

These points apply to the Naive Bayes classifier, which is built on Bayes’ theorem.

Pros:

1. It is relatively simple to understand and build.
2. It trains easily, even on a small dataset.
3. It is fast.
4. It is not sensitive to irrelevant features.

Cons:

1. It assumes that every feature is independent of the others, which isn’t always the case.

###### What is the difference between L1 and L2 regularization?

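Both add a penalty on the model weights to the loss function. L2 regularization penalizes the sum of the squares of the weights, while L1 regularization penalizes the sum of the absolute values of the weights. In practice, L1 tends to drive some weights to exactly zero (a sparse model), while L2 shrinks all weights towards zero without eliminating them.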
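As a minimal sketch, the two penalty terms can be computed as follows. The function names and the `lam` strength parameter are illustrative assumptions, not part of any particular library.

```python
def l1_penalty(weights, lam=1.0):
    """L1 (lasso) penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam=1.0):
    """L2 (ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0, 3.0]
print(l1_penalty(weights))  # 5.5
print(l2_penalty(weights))  # 13.25
```

Either penalty would be added to the training loss. Note how the large weight (3.0) dominates the L2 term much more than the L1 term.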

###### Briefly describe Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

###### What is ICA or Independent Component Analysis?

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the “cocktail party problem” of listening in on one person’s speech in a noisy room.

###### What is deep learning?

Deep learning, a subset of machine learning, uses a hierarchy of artificial neural network layers to carry out the process of machine learning. The artificial neural networks are loosely modeled on the human brain, with neuron nodes connected together like a web. While traditional programs build analysis with data in a linear way, the hierarchical structure of deep learning systems enables machines to process data with a non-linear approach. A traditional approach to detecting fraud or money laundering might rely on the transaction amount alone, while a deep learning non-linear technique for weeding out fraudulent transactions would also include time, geographic location, IP address, type of retailer, and any other feature that is likely to indicate fraudulent activity.

###### Where can we use deep learning?

1. Automatic speech recognition, 2. Image recognition, 3. Visual Art Processing, 4. Natural language processing, 5. Drug discovery and toxicology, 6. Customer relationship management, 7. Recommendation systems, 8. Bioinformatics, 9. Mobile Advertising

###### What is random forest?

Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
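The aggregation step can be sketched in a few lines. This covers only how the individual trees' outputs are combined, not how the trees are trained, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the mode: the class label predicted by the most trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the mean of the individual trees' numeric predictions."""
    return mean(tree_predictions)


print(aggregate_classification(["spam", "ham", "spam"]))  # spam
print(aggregate_regression([2.0, 3.0, 4.0]))  # 3.0
```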

###### What do you know about Dimensionality Reduction Algorithms?

In machine learning and statistics, dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

###### What are the popular programming languages used in machine learning?

Python, Java, R, Scala, etc.

###### What are the popular frameworks in machine learning?

1. Apache Spark MLlib, 2. TensorFlow, 3. Amazon Machine Learning (AML), 4. Apache Singa, 5. Torch, 6. Azure ML Studio, etc.

###### What is TensorFlow?

TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google.

###### What is data mining?

Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.

###### How do you avoid overfitting with a model?

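Use held-out test data for evaluation, or do cross-validation, so that overfitting is detected rather than hidden by the training score. Add regularization terms (such as L1 or L2 penalties, or a probabilistic prior) to the objective function, or select between models with a complexity-penalizing criterion such as AIC, BIC or MDL.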
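As an illustration of cross-validation, here is a minimal sketch of a k-fold splitter. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle the data, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model would be trained on each `train` split and scored on the matching `validation` split. A large gap between training and validation scores is a sign of overfitting.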

###### What is the kernel trick?

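In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. Kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in their raw representation. The kernel trick is what makes this possible: the kernel returns the inner product of two points in a higher-dimensional feature space directly, so the algorithm never has to compute the coordinates of the data in that space.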
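A small worked example may help. It assumes a degree-2 polynomial kernel on 2-D points, chosen here purely for illustration. The kernel value computed in the original space matches the inner product of an explicit 3-D feature map, which the kernel lets us skip.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel (x . y)^2, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, y)                      # no feature map needed
explicit_value = dot(feature_map(x), feature_map(y))  # same value, up to rounding
print(kernel_value, explicit_value)
```

Both values agree up to floating-point rounding. For higher degrees or for kernels such as the Gaussian (RBF) kernel, the explicit feature space becomes very large or infinite, which is why computing only the kernel is attractive.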

###### When will machines eat us?

Answer based on your own preference

###### Will machines ever be able to feel consciousness? What do you think?

Answer based on your own preference

###### What is Apache Spark?

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.

###### What is the difference between a hash table and an array?

1. A hash table stores data as name-value (key-value) pairs, while an array stores only values.
2. To access a value in a hash table, you pass its name (key); to access a value in an array, you pass its index number.
3. In many languages, a hash table can hold values of different types (say, int and string), while an array holds values of a single type.
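These differences can be sketched in Python, using the standard library's typed `array` for the array and a `dict` for the hash table. The variable names and sample data are made up for illustration.

```python
from array import array

# Array: values only, accessed by integer index, all of one type.
scores = array("i", [90, 75, 60])
second_score = scores[1]

# Hash table (dict in Python): name-value pairs, accessed by name.
ages = {"alice": 30, "bob": 25}
bob_age = ages["bob"]

# A dict can hold values of different types side by side.
profile = {"name": "alice", "age": 30}

# The typed array rejects a value of the wrong type.
try:
    scores.append("high")
    rejected = False
except TypeError:
    rejected = True

print(second_score, bob_age, rejected)  # 75 25 True
```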

###### What are some of the major tasks in data pre-processing?

- **Data cleaning:** Fill in missing values, detect and remove noisy data and outliers.
- **Data transformation:** Normalize data to reduce dimensions and noise.
- **Data reduction:** Sample data records or attributes for easier data handling.
- **Data discretization:** Convert continuous attributes to categorical attributes for ease of use with certain machine learning methods.
- **Text cleaning:** Remove embedded characters which may cause data misalignment, e.g., embedded tabs in a tab-separated data file, embedded new lines which may break records, etc.
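Two of these tasks, normalization and discretization, can be sketched as follows. Min-max scaling is only one of several normalization schemes, and the thresholds and labels are made-up examples.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, thresholds, labels):
    """Map a continuous value to a label using ascending thresholds.

    Expects len(labels) == len(thresholds) + 1.
    """
    for threshold, label in zip(thresholds, labels):
        if value < threshold:
            return label
    return labels[-1]


ages = [18, 30, 42, 66]
print(min_max_normalize(ages))
# [0.0, 0.25, 0.5, 1.0]
print([discretize(a, [25, 60], ["young", "adult", "senior"]) for a in ages])
# ['young', 'adult', 'adult', 'senior']
```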

###### How to deal with missing values?

To deal with missing values, it is best to first identify why the values are missing, so that the problem can be handled appropriately. Typical missing value handling methods are:

- Deletion: Remove records with missing values.
- Dummy substitution: Replace missing values with a dummy value, e.g., unknown for categorical or 0 for numerical values.
- Mean substitution: If the missing data is numerical, replace the missing values with the mean.
- Frequent substitution: If the missing data is categorical, replace the missing values with the most frequent item.
- Regression substitution: Use a regression method to replace missing values with regressed values.
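Three of these methods can be sketched with the standard library alone. The sketch assumes missing values are represented as `None`, and the function names are hypothetical.

```python
from statistics import mean, mode


def delete_missing(records):
    """Deletion: keep only records (dicts) that have no missing value."""
    return [r for r in records if None not in r.values()]


def mean_substitution(values):
    """Mean substitution: replace missing numeric values with the observed mean."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def frequent_substitution(values):
    """Frequent substitution: replace missing categories with the most common one."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


print(mean_substitution([1.0, None, 3.0]))
# [1.0, 2.0, 3.0]
print(frequent_substitution(["red", None, "red", "blue"]))
# ['red', 'red', 'red', 'blue']
print(delete_missing([{"a": 1, "b": 2}, {"a": None, "b": 3}]))
# [{'a': 1, 'b': 2}]
```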