一、The attention mechanism
In recent years, attention models have been used widely across deep learning; they appear in all kinds of tasks in image processing, speech recognition, and natural language processing. As the name suggests, the idea borrows from the human attention mechanism. Consider the image below.
The figure illustrates how a person efficiently allocates limited attention when viewing an image: the red regions mark the targets the visual system attends to most. In a scene like Figure 1, people devote more attention to faces, to the title of a text, and to the opening sentence of an article.
Visual attention is a signal-processing mechanism particular to the human visual system. Human vision quickly scans the whole image to find the region worth focusing on, generally called the focus of attention, then devotes more attention resources to that region to extract finer detail about the target while suppressing irrelevant information. The core of attention in deep learning is the same idea: let the network focus on the places that matter most. An attention mechanism is one way to make a network adaptive in this sense.
In essence, an attention mechanism locates the information of interest and suppresses what is useless; the result is usually presented as a probability map or a probability-weighted feature vector. By principle of operation, attention falls into three classes: spatial attention, channel attention, and mixed spatial-channel attention. This post focuses on channel attention.
1、Channel attention
The classic example of channel attention is SENet (Squeeze-and-Excitation Network). It models the importance of each feature channel and then, depending on the task, strengthens or suppresses individual channels. The schematic is shown below.
After a normal convolution, a side branch splits off. First comes the Squeeze operation (Fsq(·) in the figure), which compresses the features along the spatial dimensions: each 2-D feature map is reduced to a single scalar, which is equivalent to a pooling operation with a global receptive field, and the number of channels stays unchanged. Next is the Excitation operation (Fex(·) in the figure), which generates a weight for each channel through parameters W that are learned to explicitly model the correlations between channels. The paper implements this as a two-layer bottleneck of fully connected layers (dimensionality reduction followed by expansion) plus a sigmoid. Once the per-channel weights are obtained, they are applied back to the original channels, so the network learns the importance of each channel for the task at hand. As a generic design idea, the SE block can be dropped into any existing network, which makes it very practical.
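To make the Squeeze / Excitation / Scale steps concrete, here is a minimal numeric sketch of one SE-block forward pass in MATLAB. The feature-map size, the reduction ratio r, and the random weights are stand-ins for illustration, not values taken from the paper:

U = rand(7,7,32);                      % feature map U, H x W x C (sizes assumed)
C = size(U,3);
r = 4;                                 % bottleneck reduction ratio (assumed)
% Squeeze: global average pooling over the spatial dims -> one scalar per channel
z = reshape(mean(U,[1 2]),C,1);
% Excitation: bottleneck FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid
W1 = randn(C/r,C);  W2 = randn(C,C/r); % random weights stand in for learned ones
s = 1./(1 + exp(-(W2*max(W1*z,0))));   % per-channel weights in (0,1)
% Scale: reweight every channel of U by its learned importance
Utilde = U .* reshape(s,1,1,C);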
Putting it together, the channel attention computation from the SE paper can be summarized as:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j), \qquad s = F_{ex}(z, W) = \sigma\big(W_2\,\delta(W_1 z)\big), \qquad \tilde{x}_c = F_{scale}(u_c, s_c) = s_c\, u_c$$

where $\delta$ is the ReLU, $\sigma$ the sigmoid, $W_1 \in \mathbb{R}^{(C/r)\times C}$, and $W_2 \in \mathbb{R}^{C\times (C/r)}$.
That is all for the principle of channel attention; for the full details, see the paper: Squeeze-and-Excitation Networks.
二、Code walkthrough
clc
clear
close all

%% Load data and build features
load Train.mat
% load Test.mat
Train.weekend = dummyvar(Train.weekend);
Train.month   = dummyvar(Train.month);
Train = movevars(Train,{'weekend','month'},'After','demandLag');
Train.ts = [];

Train(1,:) = [];
y = Train.demand;
x = Train{:,2:5};
[xnorm,xopt] = mapminmax(x',0,1);
[ynorm,yopt] = mapminmax(y',0,1);

xnorm = xnorm(:,1:1000);
ynorm = ynorm(1:1000);

k = 24; % lag length

% Slide a window of k lags over the series to form 2-D "images"
for i = 1:length(ynorm)-k
    Train_xNorm{1,i} = xnorm(:,i:i+k-1);
    Train_yNorm(i)   = ynorm(i+k-1);
    Train_y{i}       = y(i+k-1);
end
Train_x = Train_xNorm';

ytest = Train.demand(1001:1170);
xtest = Train{1001:1170,2:5};
[xtestnorm] = mapminmax('apply',xtest',xopt);
[ytestnorm] = mapminmax('apply',ytest',yopt);
% xtestnorm = [xtestnorm; Train.weekend(1001:1170,:)'; Train.month(1001:1170,:)'];
xtest = xtest';
for i = 1:length(ytestnorm)-k
    Test_xNorm{1,i} = xtestnorm(:,i:i+k-1);
    Test_yNorm(i)   = ytestnorm(i+k-1);
    Test_y(i)       = ytest(i+k-1);
end
Test_x = Test_xNorm';

% trainNetwork expects the responses as a numeric column of the table
x_train = table(Train_x,cell2mat(Train_y'));
x_test  = table(Test_x);

%% Training/validation split (left commented out, as in the original)
% TrainSampleLength = length(Train_yNorm);
% validatasize = floor(TrainSampleLength * 0.1);
% Validata_xNorm = Train_xNorm(:,end-validatasize:end,:);
% Validata_yNorm = Train_yNorm(:,TrainSampleLength-validatasize:end);
% Validata_y = Train_y(TrainSampleLength-validatasize:end);
%
% Train_xNorm = Train_xNorm(:,1:end-validatasize,:);
% Train_yNorm = Train_yNorm(:,1:end-validatasize);
% Train_y = Train_y(1:end-validatasize);

%% Build the residual network with a channel-attention branch
lgraph = layerGraph();

tempLayers = [
    imageInputLayer([4 24 1],"Name","imageinput")
    convolution2dLayer([3 3],32,"Name","conv","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm")
    reluLayer("Name","relu")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition")
    convolution2dLayer([3 3],32,"Name","conv_1","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_1")
    reluLayer("Name","relu_1")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_1")
    convolution2dLayer([3 3],32,"Name","conv_2","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_2")
    reluLayer("Name","relu_2")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_2")
    convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_3")
    reluLayer("Name","relu_3")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_4")
    sigmoidLayer("Name","sigmoid")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = multiplicationLayer(2,"Name","multiplication");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(3,"Name","addition_3")
    fullyConnectedLayer(32,"Name","fc1")
    fullyConnectedLayer(16,"Name","fc2")
    fullyConnectedLayer(1,"Name","fc3")
    regressionLayer("Name","regressionoutput")];
lgraph = addLayers(lgraph,tempLayers);

% Clean up the helper variable
clear tempLayers;

% NOTE: this excerpt does not show the connectLayers calls that wire the
% branches together; the graph must be connected before training
% (a hedged sketch follows below).

plot(lgraph);
analyzeNetwork(lgraph);

%% Training options
maxEpochs = 100;
miniBatchSize = 32;
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',0.005, ...
    'GradientThreshold',1, ...
    'Shuffle','never', ...
    'Plots','training-progress', ...
    'Verbose',0);
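As noted in the code, the excerpt never connects the layers it adds, so analyzeNetwork and trainNetwork would fail on the graph as-is. The exact topology is not recoverable from the post; the sketch below shows, for the first stage only, one plausible residual-style wiring using the layer names defined above. Every connection here is an assumption, not the original author's design.

% Hypothetical wiring (NOT from the original post): first residual stage only
lgraph = connectLayers(lgraph,"conv","batchnorm");     % main path
lgraph = connectLayers(lgraph,"conv","addition/in2");  % identity skip
lgraph = connectLayers(lgraph,"relu","addition/in1");  % merge the two paths
% The remaining stages, the sigmoid gate feeding "multiplication/in1" and
% "multiplication/in2", and the three inputs of "addition_3" must be
% connected in the same way before calling trainNetwork.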
%% Train and predict
net = trainNetwork(x_train,lgraph,options);

Predict_yNorm = predict(net,x_test);
Predict_y = double(Predict_yNorm);
plot(Test_y)
hold on
plot(Predict_y)
legend('Actual','Predicted')
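To put a number on the fit instead of only eyeballing the two curves, a quick error summary can be appended. This is a minimal sketch, not part of the original post; Test_y is a row vector and Predict_y a column vector in the code above, hence the transpose:

% Simple test-set error metrics (sketch)
rmse = sqrt(mean((Predict_y - Test_y').^2));
mae  = mean(abs(Predict_y - Test_y'));
fprintf('RMSE = %.3f, MAE = %.3f\n',rmse,mae);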
Training progress plot:
Test-set prediction curve: