

news 2025/7/5 16:08:28


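In data analysis and machine-learning applications, reducing the dimensionality of the data is often unavoidable: many vertical-industry business datasets have few samples but a very large number of dimensions. Dimensionality reduction aims to remove irrelevant or redundant features so that the data are easier to work with. It discards useless information, makes visualization possible, can improve model accuracy, lowers computational cost, and reduces the number of features so that attention falls on the essential ones. Depending on the method, the resulting features are also decorrelated (PCA) or made as statistically independent as possible (ICA).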

1. Main dimensionality-reduction methods

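Dimensionality-reduction methods fall into linear and nonlinear families. Linear methods include PCA (principal component analysis), ICA (independent component analysis), LDA (linear discriminant analysis), LFA, and LPP (locality preserving projections, a linear counterpart of LE, Laplacian eigenmaps). Nonlinear methods include kernel-based methods (KPCA, KICA, KDA, the kernel versions of PCA, ICA, and discriminant analysis) and eigenvalue-based manifold learning (ISOMAP, LLE, LE, LPP, LTSA, MVU).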
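The difference between linear and kernel-based methods shows up on data whose structure is not linear. The sketch below assumes scikit-learn and uses its `make_circles` toy dataset (two concentric rings); the RBF kernel width `gamma=3` and the `separation` helper are illustrative choices, not part of any standard API. A linear projection (PCA) can only rotate the data, so one component still mixes the rings, while the first kernel principal component (KPCA) tends to pull them apart.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric rings (label 0 = outer, 1 = inner): not linearly separable in 2-D
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA: keep one component (a rotation followed by dropping an axis)
X_pca = PCA(n_components=1).fit_transform(X)

# Kernel PCA: PCA in the implicit feature space of an RBF kernel
X_kpca = KernelPCA(n_components=1, kernel="rbf", gamma=3).fit_transform(X)


def separation(z, labels):
    """Hypothetical helper: gap between class means over pooled std (larger = better separated)."""
    a, b = z[labels == 0, 0], z[labels == 1, 0]
    return abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)


print(f"PCA separation:        {separation(X_pca, y):.2f}")
print(f"Kernel PCA separation: {separation(X_kpca, y):.2f}")
```

The kernel width matters: with a poorly chosen `gamma`, the leading kernel component can describe the shape of one ring instead of the contrast between the rings.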
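This article focuses on PCA, ICA, and NMF (non-negative matrix factorization). NMF is not PCA itself but a related matrix factorization that works only with non-negative data. PCA represents the data with a new set of orthogonal features, called principal components: it projects the data onto the directions of greatest variance or, equivalently, onto the directions that minimize the reconstruction error, producing fewer features that are mutually orthogonal.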
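The two views of PCA (maximum variance, minimum reconstruction error) can be checked numerically. The sketch below assumes plain NumPy and a synthetic correlated 2-D dataset chosen only for illustration. It computes PCA by eigen-decomposition of the sample covariance matrix and confirms that the variance along the first principal direction equals the largest eigenvalue, and that the squared reconstruction error from keeping `k` components equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data (mean and covariance are illustrative)
X = rng.multivariate_normal([20, 20], [[20, 16], [16, 20]], size=2000)

# 1. Center the data and form the sample covariance matrix
mu = X.mean(axis=0)
Xc = X - mu
C = Xc.T @ Xc / (len(X) - 1)

# 2. Eigen-decomposition; eigh returns ascending eigenvalues, so sort them in descending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]  # columns are the principal directions

# 3. Keep k components: project onto them, then map back to the original space
k = 1
W = eigvecs[:, :k]
Z = Xc @ W             # reduced representation, shape (2000, k)
X_rec = Z @ W.T + mu   # reconstruction from k components

# Variance along the first principal direction equals the largest eigenvalue
print(Z[:, 0].var(ddof=1), eigvals[0])
# Squared reconstruction error (scaled like the covariance) equals the discarded eigenvalues
err = ((X - X_rec) ** 2).sum() / (len(X) - 1)
print(err, eigvals[k:].sum())
```

Because the total variance is the sum of all eigenvalues, keeping the directions with the largest eigenvalues both maximizes the retained variance and minimizes what is lost in reconstruction.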

2. Application scenarios

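Dimensionality reduction is mainly used in text processing, face recognition, image recognition, natural language processing, and the processing of high-dimensional business data.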

3. Dimensionality-reduction examples

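The example below generates a correlated two-dimensional Gaussian dataset and reduces it with PCA (OpenCV), ICA, and NMF (scikit-learn).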

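import numpy as np
import matplotlib.pyplot as plt
import cv2

# Principal component analysis (PCA)
mean = [20, 20]    # mean of each dimension; its length N (here 2) sets the dimensionality
# Covariance matrix of shape (N, N); it must be symmetric and positive semidefinite.
# Its eigenvalues are 36 and 4, so the point cloud is stretched along the diagonal.
cov = [[20, 16], [16, 20]]
np.random.seed(42)    # fix the random seed so every run generates the same data
# Draw 2000 samples from the multivariate normal distribution; .T splits them into x and y
x, y = np.random.multivariate_normal(mean, cov, 2000).T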
plt.figure(figsize=(10, 6))  
plt.plot(x, y, 'o', zorder=1)  
plt.axis([0, 40, 0, 40])  
plt.xlabel('source feature 1')  
plt.ylabel('source feature 2')  
plt.show()  
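X = np.vstack((x, y)).T    # stack into a (2000, 2) feature matrix, one sample per row
# Passing an empty array as the mean tells OpenCV to compute the mean itself.
# Returns the mean mu (shape (1, 2)) and the eigenvectors of the covariance matrix,
# one per row of eig, sorted by decreasing eigenvalue.
mu, eig = cv2.PCACompute(X, np.array([]))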
plt.figure(figsize=(10, 6))  
plt.plot(x, y, 'o', zorder=1)  
# Draw both eigenvectors (the rows of eig) as arrows starting at the data mean
plt.quiver([mean[0]] * 2, [mean[1]] * 2, eig[:, 0], eig[:, 1], zorder=3, scale=0.2, units='xy')
# Label the arrows v1 and v2 near their tips
plt.text(mean[0] + 5 * eig[0, 0], mean[1] + 5 * eig[0, 1], 'v1', zorder=5,
         fontsize=16, bbox=dict(facecolor='white', alpha=0.6))
plt.text(mean[0] + 7 * eig[1, 0], mean[1] + 4 * eig[1, 1], 'v2', zorder=5,
         fontsize=16, bbox=dict(facecolor='white', alpha=0.6))
plt.axis([0, 40, 0, 40])  
plt.xlabel('feature 1')  
plt.ylabel('feature 2')  
plt.show()

# 1. OpenCV: dimensionality reduction with PCA
# PCAProject centers the data on mu and rotates the x-y axes so that v1 and v2
# (the rows of eig) become the new coordinate axes
X2 = cv2.PCAProject(X, mu, eig)
plt.figure(figsize=(10, 6))  
plt.plot(X2[:, 0], X2[:, 1], '^')  
plt.xlabel('first principal component')  
plt.ylabel('second principal component')  
plt.axis([-20, 20, -10, 10])  
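plt.show()

# 2. scikit-learn provides ICA, a dimensionality-reduction technique closely related to PCA
from sklearn import decomposition
# Like PCA, but the decomposition looks for components that are as mutually independent as possible.
# whiten='unit-variance' scales each component to unit variance; random_state makes runs reproducible.
# Note: this sample is Gaussian, for which independent components are not uniquely defined, so the
# result is essentially a whitened (decorrelated, unit-variance), arbitrarily rotated copy of the data.
ica = decomposition.FastICA(whiten='unit-variance', random_state=42)
X2 = ica.fit_transform(X)
plt.figure(figsize=(10, 6))
plt.plot(X2[:, 0], X2[:, 1], '^')
plt.xlabel('first independent component')
plt.ylabel('second independent component')
plt.axis([-5, 5, -5, 5])    # unit-variance components fall almost entirely within ±5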
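plt.show()

# 3. scikit-learn also provides non-negative matrix factorization (NMF), a technique related to PCA.
# NMF only handles non-negative data: the feature matrix must not contain negative values
# (the sample is centered at 20, so negative entries are very unlikely).
# n_components=2 keeps both dimensions; a larger max_iter gives the solver room to converge.
nmf = decomposition.NMF(n_components=2, max_iter=1000)
X2 = nmf.fit_transform(X)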
plt.figure(figsize=(10, 6))  
plt.plot(X2[:, 0], X2[:, 1], 'o')  
plt.xlabel('first non-negative component')  
plt.ylabel('second non-negative component')  
plt.axis([0, 1.5, -0.5, 1.5])  
plt.show()
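Because the independent components of Gaussian data are not uniquely defined, ICA is more informative when the hidden sources are non-Gaussian. As a minimal sketch (the two source signals and the mixing matrix `A` are assumptions made up for illustration), FastICA can recover two linearly mixed signals, up to their order, sign, and scale:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical non-Gaussian source signals
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)             # sinusoid
s2 = np.sign(np.sin(3 * t))    # square wave
S = np.c_[s1, s2]              # shape (2000, 2)

# Assumed mixing matrix: each observed feature is a blend of both sources
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])
X = S @ A.T

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X)   # estimated sources, up to order, sign and scale

# Absolute correlation between each true source (rows) and each estimate (columns)
corr = np.abs(np.corrcoef(S.T, S_est.T)[:2, 2:])
print(corr.round(3))
```

Each row of `corr` should contain one value close to 1, which shows that every original signal has been matched by one estimated component even though only the mixtures were observed.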

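NMF factorizes a non-negative matrix X into two non-negative factors, X ≈ WH, where the rows of W serve as the reduced representation. A minimal sketch, assuming scikit-learn and a synthetic dataset built from random non-negative parts (the sizes and `max_iter` are illustrative), checks the factorization and shows that data with negative entries is rejected:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical non-negative data: 100 samples, each a non-negative mix of 3 non-negative "parts"
W_true = rng.random((100, 3))
H_true = rng.random((3, 6))
X = W_true @ H_true            # shape (100, 6), every entry >= 0

nmf = NMF(n_components=3, init="nndsvda", max_iter=2000, random_state=0)
W = nmf.fit_transform(X)       # reduced representation: non-negative weights per sample
H = nmf.components_            # non-negative basis vectors, one per row

rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, W.min() >= 0, H.min() >= 0, f"relative error {rel_err:.4f}")

# Data with negative entries (here: after mean-centering) is rejected with a ValueError
try:
    NMF(n_components=3).fit(X - X.mean(axis=0))
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

Unlike PCA, the factors cannot cancel each other out with negative weights, which is why NMF components are often read as additive "parts" of the data.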