In machine learning problems, the final classification often depends on a large number of factors. These factors are variables called features. The more features there are, the harder it becomes to visualize the training set and work with it. Dimensionality Reduction is the process of reducing the number of features under consideration; selecting a subset of the original features for use in model construction (feature selection) is one common way to do it.
Before learning the techniques of Dimensionality Reduction, let's understand why it is important to apply it to a dataset.
1) Datasets often contain an abundance of redundant and irrelevant features.
2) With a fixed number of training samples, predictive power first increases and then decreases as the dimensionality grows (the Hughes phenomenon).
3) Other things being equal, simpler explanations are generally better than complex ones.
4) It can improve the accuracy of a model if the right subset is chosen.
5) It reduces overfitting.
6) It reduces computation time. …
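As a rough illustration of the first point, the sketch below drops near-constant (irrelevant) features and features that are almost perfectly correlated with one already kept (redundant). The function names, the thresholds, and the toy data are assumptions made for this example only; it is one simple filter-style approach, not the specific technique the article goes on to describe.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def select_features(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of the features to keep.

    columns maps a feature name to its list of values.
    Both thresholds are illustrative defaults, not recommended settings.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant feature: carries no information
        if any(abs(pearson(values, columns[k])) >= max_abs_corr for k in kept):
            continue  # redundant: almost a copy of a feature already kept
        kept.append(name)
    return kept


# Hypothetical toy dataset: height appears twice in different units.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "weight_kg": [55.0, 72.0, 60.0, 90.0, 68.0],
}
selected = select_features(data)
print(selected)  # ['height_cm', 'weight_kg']
```

Here four features are reduced to two: the constant column is discarded as uninformative and the inch measurement is discarded because it duplicates the centimetre one.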
“the field of study that gives computers the ability to learn without being explicitly programmed”.
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”.
Example: playing checkers
E = the experience of playing many games of checkers
T = the task of playing checkers
P = the probability that the program will win the next game
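The same E, T, P framing can be sketched in a few lines of code. Checkers is too large for a short example, so the toy task below is a stand-in: T is deciding whether a number lies above a hidden cut-off, E is a list of labelled examples, and P is accuracy on the integers 0 to 99. The hidden rule, the learner, and the example sequence are all assumptions chosen for illustration.

```python
def true_label(x):
    # Hidden rule the learner tries to recover (assumed for this demo).
    return x >= 60


def learn_threshold(examples):
    """Guess the cut-off as the midpoint between the largest negative
    example and the smallest positive example seen so far."""
    largest_negative = max(x for x, label in examples if not label)
    smallest_positive = min(x for x, label in examples if label)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_points):
    """P: fraction of test points classified the same way as the hidden rule."""
    correct = sum((x >= threshold) == true_label(x) for x in test_points)
    return correct / len(test_points)


# E: labelled examples, revealed to the learner two at a time.
experience = [(0, False), (99, True), (50, False), (80, True), (58, False), (62, True)]
test_points = range(100)

scores = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_points))

print(scores)  # [0.9, 0.95, 1.0]
```

With this hand-picked sequence, P rises as E grows, which is what the definition calls learning. A different ordering of examples would not necessarily improve at every step.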
Supervised Learning: In supervised learning, we are given a dataset and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output. …
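A minimal sketch of this idea, assuming a made-up dataset of house sizes (inputs) and prices (known correct outputs): the program fits a straight line to the labelled pairs by ordinary least squares and then predicts the output for an unseen input. The variable names and numbers are hypothetical, and a linear model is only one of many possible choices.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labelled training data (hypothetical): each input comes with its correct output.
sizes = [50, 70, 90, 110]       # input: size in square metres
prices = [150, 210, 270, 330]   # output: price, here exactly 3 * size

slope, intercept = fit_line(sizes, prices)

# Use the learned relationship to predict the output for a new input.
predicted = slope * 100 + intercept
print(slope, intercept, predicted)
```

Because the correct outputs are supplied with the inputs, the program can measure how far its line is from the truth and adjust it; that feedback is what makes the setting supervised.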
Data Science and Data Mining are sometimes confused with one another. This article aims to clarify the difference between the two.
Data Mining is the process of discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information (with intelligent methods) from a data set and transform it into a comprehensible structure for further use.
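To make "discovering patterns" concrete, here is a small sketch that counts which pairs of items appear together in a handful of shopping baskets and keeps the pairs that occur at least a chosen number of times. The baskets and the `min_support` value are invented for this example, and real data mining systems use far more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count how often each unordered pair of items is bought together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only pairs seen in at least min_support baskets (illustrative cut-off).
min_support = 3
frequent_pairs = {pair: count for pair, count in pair_counts.items() if count >= min_support}
print(frequent_pairs)  # {('bread', 'milk'): 3}
```

The raw baskets are turned into a compact, readable structure (a table of frequently co-occurring items), which is the kind of transformation the definition above describes.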
1989: The term “Knowledge Discovery in Databases” (KDD) is coined by Gregory Piatetsky-Shapiro. At this time he also co-founds the first workshop, likewise named KDD. …