
news 2025/7/2 5:42:49

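1. Overview

SVM (support vector machine) is a simple and elegant supervised machine learning method for classification and regression. The algorithm tries to find a hyperplane that separates the data into classes with the largest possible margin. This article also covers what to do when no such maximal margin exists, and how to create one.

2. Understanding SVM step by step

Suppose we have one-dimensional humidity data, where the red points are days without rain and the blue points are days with rain.

From these one-dimensional observations we can choose a threshold, and that threshold acts as the classifier. Because the data is one-dimensional, the classifier is a single threshold value. If the data were two-dimensional, we would use a line.

The shortest distance between the observations (the nearest data points) and the classifier threshold is called the margin. The threshold that gives the largest margin is called the maximal margin classifier (hyperplane). In our example it lies at the midpoint between the two closest points of the opposite classes.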
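To make the midpoint rule concrete, here is a minimal sketch in Python. The humidity values are invented for illustration and do not come from the article's figure; the variable and function names are assumptions as well.

```python
import numpy as np

# Hypothetical humidity readings (percent); the values are made up for illustration.
no_rain = np.array([20.0, 25.0, 31.0, 38.0])   # "red" points
rain = np.array([52.0, 60.0, 67.0, 75.0])      # "blue" points

# The closest points of the two classes define the gap between them.
closest_no_rain = no_rain.max()
closest_rain = rain.min()

# The maximal margin threshold sits at the midpoint of that gap.
threshold = (closest_no_rain + closest_rain) / 2
margin = (closest_rain - closest_no_rain) / 2


def predict(humidity):
    """Return 1 (rain) if the reading is above the threshold, else 0."""
    return (np.asarray(humidity) > threshold).astype(int)


print(threshold, margin)        # 45.0 7.0
print(predict([30.0, 50.0]))    # [0 1]
```

Moving the threshold in either direction would shrink the distance to one of the two closest points, which is why the midpoint gives the largest margin.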

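The maximal margin classifier is of limited use in practice because it is not robust to outliers. Imagine an outlying red point whose value lies among the blue values. The classifier would then sit very close to the blue points and far from the other red points.

To improve on this, we have to allow outliers and misclassifications. Doing so introduces bias into the system (and reduces variance). The margin is now called a soft margin, and a classifier that uses a soft margin is called a support vector classifier or soft margin classifier. The data points on the edge of the soft margin and inside it are called support vectors.

We use cross-validation to decide where the soft margin should be.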
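As a rough sketch of how cross-validation can place the soft margin: in scikit-learn's SVC the softness of the margin is governed by the regularization parameter C, so one option is to let GridSearchCV compare a few candidate values. The dataset, the candidate grid and the scaling step are assumptions made for this example, not choices taken from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scale the features, then fit a linear soft margin classifier.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Candidate values of C: an arbitrary grid chosen for illustration.
param_grid = {"svc__C": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation picks the C with the best mean accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("Best C:", search.best_params_["svc__C"])
print("Mean CV accuracy: {:.3f}".format(search.best_score_))
```

A small C gives a wide, forgiving margin and a large C gives a narrow, strict one; cross-validation simply picks whichever generalizes best on held-out folds.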

In 2D data the support vector classifier is a line. In 3D it is a plane. In four or more dimensions it is a hyperplane. Technically every support vector classifier is a hyperplane, but in low-dimensional cases it is easier to call it a line or a plane.

Sources: //www.analyticsvidhya.com/blog/2021/05/support-vector-machines/ and https://www.sciencedirect.com/topics/computer-science/support-vector-machine

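As we saw above, the support vector classifier can handle outliers and allows misclassifications. But how do we handle heavily overlapping data, where no threshold separates the classes well?

This is where the support vector machine comes in. Let's add another dimension to the problem. We have a feature X, and as the new dimension we take the square of X and plot it on the y-axis.

Since the data is now two-dimensional, we can draw a support vector classifier line.

A support vector machine takes low-dimensional data, moves it into a higher dimension, and finds a support vector classifier there.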
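The idea of adding X squared as a second dimension can be tried out directly. The sketch below uses a small invented dataset in which one class sits in the middle of the range and the other on both sides; the data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1D feature: class 1 sits in the middle, class 0 on both sides,
# so no single threshold on x separates the classes.
x = np.array([-4.0, -3.5, -3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0])

# A linear classifier on the raw 1D feature cannot get every point right.
flat = SVC(kernel="linear").fit(x.reshape(-1, 1), y)
print(flat.score(x.reshape(-1, 1), y))

# Add x squared as a second dimension.
X_2d = np.column_stack([x, x ** 2])

# In the (x, x^2) plane a straight line separates the two classes.
clf = SVC(kernel="linear").fit(X_2d, y)
print(clf.score(X_2d, y))   # 1.0
```

In the new plane the middle class has small values of x squared and the outer class has large ones, so a horizontal line between them is enough.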

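Much like what we did above, support vector machines use kernel functions to find support vector classifiers in higher dimensions. A kernel function takes two data points from the original input space and computes the inner product of their corresponding feature vectors in the transformed (higher-dimensional) feature space.

Kernel functions let the SVM operate in the transformed feature space without explicitly computing the transformed feature vectors, which can be computationally expensive for large datasets or complex transformations. Instead, the kernel function computes the inner product of the feature vectors directly from the points in the original input space. This is called the kernel trick.

3. Polynomial kernel

The polynomial kernel transforms the input data from a lower-dimensional space into a higher-dimensional one, where the classes are easier to separate with a linear decision boundary.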

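The polynomial kernel:

$$K(a, b) = (a \cdot b + r)^d$$

Here a and b are two different observations, r is the coefficient of the polynomial, and d is the degree of the polynomial. Suppose d is 2 and r is 1/2.

The math:

$$\left(ab + \tfrac{1}{2}\right)^2 = ab + a^2 b^2 + \tfrac{1}{4} = \left(a,\ a^2,\ \tfrac{1}{2}\right) \cdot \left(b,\ b^2,\ \tfrac{1}{2}\right)$$

We end up with a dot product. The first terms (a and b) are the x-axis coordinates and the second terms (a² and b²) are the y-axis coordinates; the third term, 1/2, is the same for every point and can be ignored. So all we need to do is compute the dot product between each pair of points. For example, the high-dimensional relationship between the points a = 9 and b = 14 is (9 × 14 + 1/2)² = 126.5² = 16002.25.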
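The worked example can be checked numerically. The sketch below computes the kernel value for a = 9 and b = 14 in three ways: directly from the formula, as a dot product of explicit features, and with scikit-learn's polynomial_kernel helper. The helper function phi is a name introduced here for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

r, d = 0.5, 2
a, b = 9.0, 14.0

# Kernel value computed directly in the original 1D space.
k_direct = (a * b + r) ** d


def phi(x):
    """Explicit feature map for d = 2, r = 1/2: (x, x^2, 1/2)."""
    return np.array([x, x ** 2, r])


# The same value as a dot product in the higher-dimensional space.
k_explicit = phi(a) @ phi(b)

# scikit-learn's polynomial kernel is (gamma * <a, b> + coef0) ** degree.
k_sklearn = polynomial_kernel([[a]], [[b]], degree=d, gamma=1.0, coef0=r)[0, 0]

print(k_direct, k_explicit, k_sklearn)   # 16002.25 16002.25 16002.25
```

The first computation never builds the higher-dimensional vectors, which is the point of the kernel trick: the result matches the explicit dot product without the transformation.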

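4. Radial kernel (RBF)

The radial kernel finds support vector classifiers in infinite dimensions.

It gives more weight to points that are close to the test point and less weight to points that are farther away, much like nearest neighbors. Observations that are far away have relatively little influence on how a data point is classified.

The kernel function:

$$K(a, b) = e^{-\gamma \lVert a - b \rVert^2}$$

It computes the squared distance between two data points. Gamma, which is determined by cross-validation, scales the squared distance and therefore scales how much influence two points have on each other. In this formula the value approaches zero as the distance between the two points grows.

The radial kernel is especially useful when the decision boundary between the classes is non-linear and complex, because it can capture complex relationships between the input features.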
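A short sketch shows how the radial kernel value falls off with distance. The function rbf below is a hand-written helper introduced for illustration, compared against scikit-learn's rbf_kernel; the gamma value and the sample points are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


gamma = 0.5          # arbitrary value; in practice tuned by cross-validation
reference = [0.0]    # the point we measure influence against

for other in ([0.0], [1.0], [2.0], [5.0]):
    manual = rbf(reference, other, gamma)
    library = rbf_kernel([reference], [other], gamma=gamma)[0, 0]
    # The two values should agree, and shrink as the points move apart.
    print(other[0], round(manual, 6), round(library, 6))
```

A point at distance zero gets the maximum value of 1, and the value decays toward zero as the distance grows. A larger gamma makes that decay faster, so each point influences a smaller neighborhood.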

5. Python implementation

We can use support vector machines through sklearn.

from sklearn.svm import SVC

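Figure: SVC with different kernels.

SVC accepts a number of parameters:

C is the regularization parameter. The strength of the regularization is inversely proportional to C: smaller values let the model make more mistakes (misclassifications) on the training data in exchange for better generalization, while larger values penalize training errors more heavily. The default is 1.0.
kernel sets the kernel function. The default is rbf. The other options are linear, poly, sigmoid and precomputed. You can also pass your own kernel function.
degree specifies the degree of the polynomial kernel. It is used only when the kernel is poly and is ignored otherwise. The default is 3.
gamma controls the shape of the kernel function. It applies to the rbf, poly and sigmoid kernels. Smaller gamma values give a smoother decision boundary and larger values a more complex one. The default is scale, which equals 1 / (n_features * X.var()); auto is 1 / n_features. You can also pass a float.
coef0 is used only by the poly and sigmoid kernels. It is the independent term of the kernel function (r in the polynomial kernel above) and controls the influence of the higher-order terms relative to the lower-order ones. The default is 0.0.
shrinking controls whether the shrinking heuristic, a technique for speeding up the optimization, is used. The default is True.
tol is the tolerance for the stopping criterion: the optimizer stops once its convergence measure falls below tol. The default is 1e-3.
class_weight sets the weights of the classes in a classification problem. It can be set to balanced to adjust the weights automatically in inverse proportion to the class frequencies. The default is None.
max_iter is the limit on the number of iterations. -1 means no limit (the default).
probability specifies whether to enable probability estimates. When it is set to True, the estimator estimates class probabilities rather than only returning predicted class labels, and the predict_proba method of the SVC class can be used to obtain the estimated class probabilities for new data points. The default is False.
cache_size sets the size of the kernel cache used by the SVM algorithm, in MB. The default is 200. The kernel cache is useful when the number of training samples is very large or the kernel is expensive to compute: by keeping kernel evaluations in the cache, the solver can reuse them instead of recomputing them.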
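To illustrate what the two named gamma settings stand for, the sketch below computes the scale and auto values by hand and checks that passing the hand-computed scale value behaves the same as passing the string. The dataset and variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# What gamma='scale' stands for: 1 / (n_features * X.var()).
gamma_scale = 1.0 / (X.shape[1] * X.var())
# What gamma='auto' stands for: 1 / n_features.
gamma_auto = 1.0 / X.shape[1]

by_name = SVC(kernel="rbf", gamma="scale").fit(X, y)
by_value = SVC(kernel="rbf", gamma=gamma_scale).fit(X, y)

# Both models should produce the same decision values on the training data.
same = np.allclose(by_name.decision_function(X), by_value.decision_function(X))
print(gamma_scale, gamma_auto, same)
```

Because scale depends on the variance of the data, it adapts to the units of the features, whereas auto depends only on how many features there are.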

Figure: SVC with an RBF kernel under different parameters.

A simple implementation:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# cancer data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42)

# parameters
params = {'C': 1.0, 'kernel': 'rbf', 'gamma': 'scale',
          'probability': False, 'cache_size': 200}

# training
svc = SVC(**params)
svc.fit(X_train, y_train)

# we can use svc's own score function
score = svc.score(X_test, y_test)
print("Accuracy on test set: {:.2f}".format(score))
# Accuracy on test set: 0.95

6. Regression

We can also use support vector machines for regression problems.

from sklearn.svm import SVR

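epsilon is the parameter that sets the size of the tolerance band around the regression line. The SVR algorithm fits the regression line so that the training data lies within a certain error range of it, and that range is defined by epsilon.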
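One way to see the effect of epsilon is to count how many training points end up as support vectors for different tube widths. This is a sketch on synthetic noisy sine data; the data, the seed and the epsilon values are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic noisy sine data, invented for this example.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

counts = {}
for epsilon in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=epsilon).fit(X, y)
    # Points strictly inside the epsilon tube do not become support vectors.
    counts[epsilon] = len(svr.support_)
    print(epsilon, counts[epsilon])
```

A wider tube tolerates more of the noise, so fewer points lie on or outside it and the fitted model depends on fewer support vectors.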

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# the California Housing dataset
california = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    california.data, california.target, random_state=42)

# training
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)

# Evaluate the model on the testing data
y_pred = svr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE on test set: {:.2f}".format(mse))
# MSE on test set: 1.35

We can also plot the decision boundary with matplotlib.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Load the Iris dataset
iris = load_iris()

# Extract the first two features (sepal length and sepal width)
X = iris.data[:, :2]
y = iris.target

# Create an SVM classifier
svm = SVC(kernel='linear', C=1.0)
svm.fit(X, y)

# Create a mesh of points to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Plot the decision boundary
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title('SVM decision boundary for Iris dataset')
plt.show()

Decision boundary. Image by the author.

SVM is a relatively slow method.

import time
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a logistic regression model and time it
start_time = time.time()
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
end_time = time.time()
lr_runtime = end_time - start_time

# Fit an SVM model and time it
start_time = time.time()
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)
end_time = time.time()
svm_runtime = end_time - start_time

# Print the runtimes
print("Logistic regression runtime: {:.3f} seconds".format(lr_runtime))
print("SVM runtime: {:.3f} seconds".format(svm_runtime))

"""
Logistic regression runtime: 0.112 seconds
SVM runtime: 0.547 seconds
"""

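Support vector machines (SVMs) can be slow for the following reasons:

SVMs are computationally intensive: training an SVM means solving a convex optimization problem, which can be expensive for large datasets with many features. The time complexity of SVM training is typically at least O(n²), where n is the number of data points, and it can be considerably higher for non-linear kernels.
Cross-validation for hyperparameter tuning: SVMs need their hyperparameters tuned, such as the regularization parameter C and the kernel hyperparameters, which means evaluating different settings with cross-validation. This can be very time-consuming, especially for large datasets or complex models.
A large number of support vectors: for non-linear SVMs the number of support vectors can grow quickly with the size of the dataset or the complexity of the model. This slows down prediction, and the training cost adds up when the model has to be retrained frequently.

We can speed up SVMs by trying some of the following:

Use a linear kernel: a linear SVM trains faster than a non-linear one because the optimization problem is simpler. If your data is linearly separable or does not need a highly complex model, consider a linear kernel.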

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC
import time

# Load MNIST digits dataset
mnist = fetch_openml('mnist_784', version=1)
data, target = mnist['data'], mnist['target']
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=42)

# Train linear SVM
start_time = time.time()
linear_svc = LinearSVC()
linear_svc.fit(X_train, y_train)
linear_train_time = time.time() - start_time

# Train non-linear SVM with RBF kernel
start_time = time.time()
rbf_svc = SVC(kernel='rbf')
rbf_svc.fit(X_train, y_train)
rbf_train_time = time.time() - start_time

print('Linear SVM training time:', linear_train_time)
print('Non-linear SVM training time:', rbf_train_time)

"""
Linear SVM training time: 109.03955698013306
Non-linear SVM training time: 165.98812198638916
"""

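Use a smaller dataset: if your dataset is very large, consider training on a smaller subset of the data. You can use techniques such as random sampling or stratified sampling to make sure the subset represents the full dataset.
Use feature selection: if your dataset has many features, consider using a feature selection technique to reduce their number. This lowers the dimensionality of the problem and speeds up training.
Use a smaller value of C: the regularization parameter C controls the trade-off between maximizing the margin and minimizing the classification error. A smaller C regularizes more strongly and gives a simpler, smoother model whose optimization problem is usually easier to solve, which can speed up training. It tends to increase rather than decrease the number of support vectors, so prediction does not necessarily get faster.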
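The first suggestion in the list above, training on a smaller subset, could be sketched as follows with stratified sampling. The 30% fraction, the dataset and the variable names are assumptions picked for illustration; how much accuracy is lost depends on the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Keep a stratified 30% of the training data; the fraction is an arbitrary choice.
X_small, _, y_small, _ = train_test_split(
    X_train, y_train, train_size=0.3, stratify=y_train, random_state=42)

# Train once on everything and once on the subset, then compare on the same test set.
full = SVC(kernel="rbf").fit(X_train, y_train)
small = SVC(kernel="rbf").fit(X_small, y_small)

print("Full training set:", len(X_train), "accuracy:", full.score(X_test, y_test))
print("Subset:", len(X_small), "accuracy:", small.score(X_test, y_test))
```

Stratifying keeps the class proportions of the subset close to those of the full training set, which matters most when one class is much rarer than the other.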

import time
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Time the training for several values of C
for C in [0.1, 1, 10]:
    start_time = time.time()
    svm = SVC(kernel='linear', C=C, random_state=42)
    svm.fit(X_train, y_train)
    train_time = time.time() - start_time
    print('Training time with C={}: {:.2f}s'.format(C, train_time))
"""
Training time with C=0.1: 0.08s
Training time with C=1: 0.55s
Training time with C=10: 0.90s
"""

Use the cache: SVMs compute inner products between pairs of data points, which can be computationally expensive. Scikit-learn's SVM implementation includes a cache that stores kernel values for frequently used data points, which can speed up training. You can adjust the size of the cache with the cache_size parameter.

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
import time

# Load the dataset
X, y = load_breast_cancer(return_X_y=True)

# Train the model with a minimal 1 MB cache (effectively no cache)
start_time = time.time()
clf = SVC(kernel='linear', cache_size=1).fit(X, y)
end_time = time.time()
print(f"Training time without cache: {end_time - start_time:.3f} seconds")

# Train the model with a cache of 200 MB
# Note: this run also sets max_iter=10000, which caps the solver, so the two
# runs are not a like-for-like comparison of the cache alone.
start_time = time.time()
clf_cache = SVC(kernel='linear', cache_size=200, max_iter=10000).fit(X, y)
end_time = time.time()
print(f"Training time with cache: {end_time - start_time:.3f} seconds")

"""
Training time without cache: 0.535 seconds
Training time with cache: 0.014 seconds
"""

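7. Conclusion

In general, SVMs are a good fit for classification tasks where the number of features is relatively small compared with the number of samples and there is a clear margin of separation between the classes. SVMs can also handle high-dimensional data and non-linear relationships between the features and the target variable. However, SVMs may not be suitable for very large datasets, because they can be computationally intensive and need a lot of memory.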
