BRIEF NOTE ON PRINCIPAL COMPONENT ANALYSIS
Principal component analysis, commonly known as PCA, serves as a powerful tool for reducing the dimensionality of large datasets. Its primary function is to transform a multitude of variables into a smaller, more manageable set while preserving the essential information inherent in the original dataset.
Though diminishing the number of variables inevitably introduces a degree of loss in accuracy, the essence of dimensionality reduction lies in trading a fraction of precision for simplicity. By condensing the data into a more compact form, PCA facilitates easier exploration and visualization, streamlining the analysis process for machine learning algorithms. This streamlined approach not only enhances efficiency but also accelerates the comprehension and processing of data points.
In essence, PCA aims to distil the essence of a dataset by identifying its principal components — those directions that encapsulate the most significant variance in the data. Through this process, PCA enables the extraction of key insights while minimizing complexity, thereby empowering more effective data analysis and interpretation.
Note:
- Principal components are new variables constructed as linear combinations or mixtures of the initial variables.
- The components are uncorrelated.
Steps involved in PCA:
1. Standardize the range of continuous initial variables.
2. Compute the covariance matrix to identify correlations between variables.
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal component axes.
1. Standardization:
The reason behind this step is that PCA is sensitive to the variances of the initial variables: variables with large ranges will dominate over those with small ranges. For example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1, leading to biased results. Standardization rescales each variable to have zero mean and unit variance, so that every variable contributes on a comparable scale.
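As an illustration of this step, here is a minimal NumPy sketch of standardization. The dataset X and its values are made up for the example; in practice you would substitute your own data.

```python
import numpy as np

# Toy dataset: 5 samples, 3 variables with very different ranges (values are illustrative).
X = np.array([
    [10.0, 0.2, 1500.0],
    [12.0, 0.5, 1320.0],
    [ 9.0, 0.1, 1480.0],
    [15.0, 0.9, 1250.0],
    [11.0, 0.4, 1400.0],
])

# Standardize: subtract each variable's mean and divide by its standard deviation,
# so every column ends up with mean 0 and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ~0 for every variable
print(X_std.std(axis=0))   # 1 for every variable
```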
2. Covariance Matrix Computation:
This step aims to discern the relationships between variables within the input dataset by examining how they deviate from the mean in relation to each other. Essentially, it seeks to uncover any correlations among variables, as some may be highly interrelated, containing overlapping or redundant information. To uncover these correlations, we calculate the covariance matrix. This matrix provides valuable insights into the patterns of variability among the variables, laying the groundwork for further analysis and dimensionality reduction.
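A short sketch of this step, again with NumPy; the data here is a random toy matrix so the snippet runs on its own, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # toy data: 100 samples, 3 variables
X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # step 1: standardization

# Step 2: covariance matrix of the standardized data (variables in columns).
cov_matrix = np.cov(X_std, rowvar=False)          # shape: (3, 3)

# Off-diagonal entries show how pairs of variables vary together;
# on standardized data the diagonal entries are approximately 1.
print(cov_matrix)
```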
3. Eigenvalue and Eigenvector Computation:
Eigenvectors and eigenvalues are concepts from linear algebra and are computed from the covariance matrix. The eigenvectors of the covariance matrix give the directions of greatest variance, while the eigenvalues are the coefficients attached to those eigenvectors and indicate the amount of variance carried by each principal component. Eigenvectors and eigenvalues always come in pairs, and the number of pairs equals the number of dimensions of the data.
The percentage of variance for each component can be computed by dividing its eigenvalue by the sum of all the eigenvalues.
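Continuing the same toy sketch, the eigendecomposition and the explained-variance percentages might look like this. The choice of np.linalg.eigh is an assumption made here because the covariance matrix is symmetric; it returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # toy data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # step 1
cov_matrix = np.cov(X_std, rowvar=False)          # step 2

# Step 3: eigenvalues and eigenvectors of the covariance matrix.
# eigh() is suited to symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Percentage of variance carried by each component:
# its eigenvalue divided by the sum of all eigenvalues.
explained_variance = eigenvalues / eigenvalues.sum()
print(explained_variance * 100)                   # percentages (ascending order here)
```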
4. Create feature vector:
Computing the eigenvectors and ordering them by their eigenvalues, from highest to lowest, identifies the components of high significance so that those of low significance can be discarded. The eigenvectors that remain form a matrix known as the feature vector, which constitutes the first step towards dimensionality reduction.
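A sketch of building the feature vector under the same assumptions as above; the cut-off k = 2 is an arbitrary choice made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)   # steps 1-3

# Step 4: order components by decreasing eigenvalue and keep the top k.
order = np.argsort(eigenvalues)[::-1]
k = 2                                             # components to retain (illustrative choice)
feature_vector = eigenvectors[:, order[:k]]       # shape: (n_variables, k)
print(feature_vector)
```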
5. Recast data along Principal Component Axes:
The aim here is to use the feature vector formed in the previous step to reorient the data from the original axes to the axes represented by the principal components. This is achieved by multiplying the transpose of the feature vector by the transpose of the standardized original dataset.
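Finally, the projection itself, sketched with the same toy data as above. For row-oriented arrays, multiplying the transpose of the feature vector by the transpose of the standardized data is equivalent to X_std @ feature_vector.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:2]]       # steps 1-4, keeping 2 components

# Step 5: recast the data along the principal component axes.
# (feature_vector.T @ X_std.T).T is the same as X_std @ feature_vector.
X_pca = X_std @ feature_vector                    # shape: (100, 2)
print(X_pca.shape)
```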

For a more detailed treatment of PCA, see the reference below.
Reference:
- [skymind.ai]: Eigenvectors, Eigenvalues, PCA, Covariance and Entropy