What is Dimensionality Reduction?

In machine learning problems, the final classification often depends on too many factors. These factors are variables called features. The higher the number of features, the harder it becomes to visualize the training set and work with it. The process of reducing the number of features, for example by selecting a subset of them for use in model construction, is called Dimensionality Reduction.
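As a minimal sketch of the subset-selection idea described above, the snippet below drops features whose values never change across the training set, since such features carry no information for a classifier. The helper names `select_features` and `reduce_rows`, the variance threshold, and the toy data are assumptions made for illustration; they are not taken from the article.

```python
from statistics import pvariance


def select_features(rows, threshold=0.0):
    """Return the indices of columns whose population variance exceeds threshold.

    Hypothetical helper: a variance filter is only one of many ways
    to pick a subset of features.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def reduce_rows(rows, keep):
    """Keep only the selected feature columns in every row."""
    return [[row[i] for i in keep] for row in rows]


# Toy training set with three features; the second one is constant.
data = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.3],
]

keep = select_features(data)
reduced = reduce_rows(data, keep)
print(keep)     # indices of the features that were kept
print(reduced)  # the same samples with fewer dimensions
```

Here the constant second feature is removed, so each sample goes from three dimensions to two without losing any information that could separate the samples.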

Before learning the techniques of Dimensionality Reduction, let's understand why it is important to apply it to our dataset.

Reasons:
1) Datasets often contain an abundance of redundant and irrelevant features.
2) With a fixed number of training samples, predictive power decreases beyond a certain point as the dimensionality increases (the Hughes phenomenon).
3) Other things being equal, simpler explanations are generally better than complex ones.
4) It can improve the accuracy of a model if the right subset is chosen.
5) It reduces overfitting.
6) It reduces computation time.


Definition of ML given by Arthur Samuel in 1959

“The field of study that gives computers the ability to learn without being explicitly programmed.”

Definition of ML given by Tom Mitchell in 1997

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers
E = the experience of playing many games of checkers
T = the task of playing checkers
P = the probability that the program will win the next game

In general, any ML problem can be assigned to one of three broad categories:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

Supervised Learning: In supervised learning, we are given a dataset and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output. …
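To make the idea of learning from known correct outputs concrete, here is a minimal sketch that fits a straight line to labeled examples using ordinary least squares. The function name `fit_line` and the toy data (generated from y = 2x + 1) are assumptions chosen for illustration, not something the article specifies.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to labeled pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Inputs together with their known correct outputs (the labels).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Use the learned relationship to predict the output for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The labels play the role of the "correct output" in the description above: the fitted line is simply the input-output relationship recovered from them.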


People sometimes find it hard to distinguish between Data Science and Data Mining. This article aims to clear up the difference between the two.

Let's begin with their formal definitions and some of their history.

Data Mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with the overall goal of extracting information (with intelligent methods) from a data set and transforming that information into a comprehensible structure for further use.

1989: The term “Knowledge Discovery in Databases” (KDD) is coined by Gregory Piatetsky-Shapiro. It is also at this time that he co-founds the first workshop, also named KDD. …


Astitva Srivastava

Data Analysis | Machine Learning | Passionate about solving business problems by data-driven approaches. 📊📈
