
Preface:

The basic workflow of Monte Carlo learning:

Policy Evaluation: generate state-action trajectories and use them to estimate the value function.

Policy Improvement: use the value-function estimates to improve the policy.

On-policy: the policy \pi^{'} that generates the sampled trajectories is the same as the policy \pi being improved.

Policy Evaluation: generate (state, action, reward) trajectories with the \epsilon-greedy policy \pi^{'}.

Policy Improvement: the policy being improved is the same \epsilon-greedy policy \pi^{'}; the value-function estimates are used to improve it.

Off-policy: the policy \pi^{'} that generates the sampled trajectories differs from the policy \pi being improved.

Policy Evaluation: generate sample trajectories (state, action, reward) with the \epsilon-greedy policy \pi^{'}.

Policy Improvement: improve the original (target) policy \pi.

Two advantages of the off-policy setting:

1: The original policy may be hard to sample from directly.

2: With a well-chosen sampling distribution, the variance of the estimate can be reduced.

The usual technique for the off-policy setting is importance sampling (IS).

Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978,[1] but its precursors can be found in statistical physics as early as 1949.[2][3] Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.


1 Importance Sampling

1.1 Principle

Original problem: estimate the expectation

u_f=\int p(x)f(x)dx

If we draw N samples x_1,x_2,...,x_N from p(x), then

u_f \approx \frac{1}{N}\sum_{x_i \sim p(x)}f(x_i)

Problem: p(x) is hard to sample from (the sample space is large, and often only a small part of it can be reached).

Introduce an importance distribution q(x) (also a proper distribution, but one that is easy to sample from).

w(x)=\frac{p(x)}{q(x)} is called the importance weight.

u_f =\int q(x)\frac{p(x)}{q(x)}f(x)dx

\approx \frac{1}{N}\sum_i w(x_i)f(x_i), \quad x_i \sim q(x)   (by the law of large numbers)
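As a minimal, self-contained sketch of this estimator (the target p, the proposal q, and the integrand f below are hypothetical choices for illustration, not from the original post), we can estimate E_p[f] using only samples drawn from q:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000
f = lambda x: x ** 2                     # integrand f(x); E_p[f] = 1 for p = N(0, 1)
x = rng.uniform(-10, 10, N)              # x_i ~ q(x), uniform on [-10, 10]
w = norm(0, 1).pdf(x) / (1.0 / 20.0)     # importance weights w(x_i) = p(x_i) / q(x_i)
u_f = np.mean(w * f(x))                  # ≈ 1.0
print(u_f)

Here the weights are used in their raw (unnormalized) form; the code below instead works with log-weights and normalizes them.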

In the example below we normalize the weights w(x_i) so that each sample's relative contribution is easier to see.

The code below performs this normalization in log space, as follows:

w(x_i)=\log p(x_i)-\log q(x_i)

w^1(x_i)=\frac{e^{w(x_i)}}{\sum_j e^{w(x_j)}}

w^2(x_i)=w(x_i)-\log\sum_j e^{w(x_j)}

# -*- coding: utf-8 -*-
"""
Created on Wed Nov  8 16:38:34 2023

@author: chengxf2
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logsumexp


class pdf:
    def __call__(self, x):
        pass

    def sample(self, n):
        pass


class Norm(pdf):
    # log-density of a normal distribution (constant term dropped)
    def __init__(self, mu=0, sigma=1):
        self.mu = mu
        self.sigma = sigma

    def __call__(self, x):
        # log p(x), up to an additive constant
        logp = (x - self.mu) ** 2 / (2 * self.sigma ** 2)
        return -logp

    def sample(self, N):
        # draw N points from the normal distribution
        x = np.random.normal(self.mu, self.sigma, N)
        return x


class Uniform(pdf):
    # log-density of a uniform distribution
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __call__(self, x):
        # log q(x) = -log(high - low), the same constant for every sample
        N = len(x)
        return np.repeat(-np.log(self.high - self.low), N)

    def sample(self, N):
        # draw N points from the uniform distribution
        x = np.random.uniform(self.low, self.high, N)
        return x


class ImportanceSampler:
    def __init__(self, p_dist, q_dist):
        self.p_dist = p_dist
        self.q_dist = q_dist

    def sample(self, N):
        # sample from q and compute normalized log-weights
        samples = self.q_dist.sample(N)
        weights = self.calc_weights(samples)
        normal_weights = weights - logsumexp(weights)
        return samples, normal_weights

    def calc_weights(self, samples):
        # log(p/q) = log(p) - log(q)
        return self.p_dist(samples) - self.q_dist(samples)


if __name__ == "__main__":
    N = 10000
    p = Norm()
    q = Uniform(-10, 10)
    sampler = ImportanceSampler(p, q)
    # samples are drawn from q(x); weight_sample holds the normalized log-weights
    samples, weight_sample = sampler.sample(N)
    # resample N points from samples with probability exp(weight_sample)
    samples = np.random.choice(samples, N, p=np.exp(weight_sample))
    plt.hist(samples, bins=100)
    plt.show()


2 Off-policy principle

target policy \pi: the original policy to be improved.

x: a trajectory generated under the original policy,

\begin{bmatrix} s_0,a_0,r_1,\dots,s_{T-1},a_{T-1},r_T,s_T \end{bmatrix}

p(x): the probability of that trajectory.

f(x): the cumulative reward of that trajectory.

Expected cumulative reward:

u_f=\int_{x} f(x)p(x)dx \approx \frac{1}{N}\sum f(x_i)

behavior policy \pi^{'}: the policy used to generate the samples.

q(x): the probability of each trajectory under the behavior policy.

The expectation of the cumulative reward f under p can then be written equivalently as:

u_f=\int_{x}q(x)\frac{p(x)}{q(x)}f(x)dx

E[f] \approx \frac{1}{m}\sum_{i=1}^{m}\frac{p(x_i)}{q(x_i)}f(x_i), \quad x_i \sim q(x)

Let P^{\pi} and P^{\pi^{'}} denote the probabilities that the two policies generate a given trajectory

\begin{bmatrix} s_0,a_0,r_1,\dots,s_{T-1},a_{T-1},r_T,s_T \end{bmatrix}

The probability that the original policy \pi generates this trajectory is:

P^{\pi}=\prod_{i=0}^{T-1} \pi(s_i,a_i)P_{s_i\rightarrow s_{i+1}}^{a_i}

P^{\pi^{'}}=\prod_{i=0}^{T-1} \pi^{'}(s_i,a_i)P_{s_i\rightarrow s_{i+1}}^{a_i}

Therefore

w(x)=\frac{P^{\pi}}{P^{\pi^{'}}}=\prod_{i=0}^{T-1}\frac{\pi(s_i,a_i)}{\pi^{'}(s_i,a_i)}

(the transition probabilities P_{s_i\rightarrow s_{i+1}}^{a_i} cancel, so the weight depends only on the two policies).
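A minimal sketch of this product for one trajectory (the per-step action probabilities below are hypothetical values, used only to illustrate that the transition terms cancel):

import numpy as np

# pi(a_i | s_i) and pi'(a_i | s_i) along one 3-step trajectory (hypothetical values)
pi_target = np.array([0.9, 0.8, 0.9])
pi_behavior = np.array([0.7, 0.6, 0.7])
w = np.prod(pi_target / pi_behavior)  # trajectory importance weight
print(w)  # ≈ 2.20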

If \pi is a deterministic policy and \pi^{'} is the \epsilon-greedy version of \pi:

Original policy:   p_i=\pi(s_i,a_i)=\left\{\begin{matrix} 1, & if \; a_i=\pi(s_i) \\ 0, & if \; a_i \neq \pi(s_i) \end{matrix}\right.

Behavior policy:   q_i=\pi^{'}(s_i,a_i)=\left\{\begin{matrix} 1-\epsilon+\frac{\epsilon }{|A|} , & if \; a_i=\pi(s_i) \\ \frac{\epsilon }{|A|}, & if \; a_i \neq \pi(s_i) \end{matrix}\right.

We now want the weight w of a trajectory generated by the behavior policy.

In theory w should be the product of the ratios p_i/q_i, but p_i=0 whenever a_i \neq \pi(s_i), which would zero out the whole weight.

Since only the relative size of the ratio matters here, the code below substitutes

w(x)=\frac{P^{\pi}}{P^{\pi^{'}}} \approx \prod_i\frac{e^{p_i}}{e^{q_i}}=\prod_i e^{p_i-q_i}

where w_i=\frac{e^{p_i}}{e^{q_i}}=e^{p_i-q_i} (a looser, more flexible use of importance sampling).

The core task is still computing the ratio of two probabilities; the example in Section 1 takes logs and then normalizes.
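As a quick numeric check of this substitution (assuming \epsilon = 0.2 and |A| = 2, the values used in the code of Section 4), the two possible per-step weights are:

import numpy as np

epsilon, num_actions = 0.2, 2
q_match = 1 - epsilon + epsilon / num_actions  # pi'(a|s) when a is the target action (= 0.9)
q_other = epsilon / num_actions                # pi'(a|s) otherwise (= 0.1)
w_match = np.exp(1 - q_match)                  # p = 1: e^0.1 ≈ 1.105
w_other = np.exp(0 - q_other)                  # p = 0: e^-0.1 ≈ 0.905
print(w_match, w_other)

So steps that follow the target policy still receive a larger weight than steps that deviate from it, just less sharply than the exact ratio would give.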


3 Variance impact


4 Code

In the code, R is computed differently from the derivation above:

R=\frac{1}{T-t}\left(\sum_{i=t}^{T-1}r_i\right)\left(\prod_{j=t}^{T-1}w_j\right)

w_j=e^{p_j-q_j}
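For example, for a hypothetical three-step tail (T - t = 3) with rewards [1, -1, 1] and per-step weights [1.105, 0.905, 1.105]:

import numpy as np

rewards = [1, -1, 1]              # r_t ... r_{T-1} (hypothetical values)
weights = [1.105, 0.905, 1.105]   # w_t ... w_{T-1}
R = np.sum(rewards) * np.prod(weights) / len(rewards)
print(R)  # ≈ 0.37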

# -*- coding: utf-8 -*-
"""
Created on Wed Nov  8 11:56:26 2023

@author: chengxf2
"""
import numpy as np
import random
from enum import Enum


class State(Enum):
    # state space
    shortWater = 1   # short of water
    health = 2       # healthy
    overflow = 3     # overwatered
    apoptosis = 4    # dead


class Action(Enum):
    # action space A
    water = 1    # water the plant
    noWater = 2  # do not water


class Env():
    def reward(self, state):
        # reward for transitioning into the new state
        r = -100
        if state is State.shortWater:
            r = -1
        elif state is State.health:
            r = 1
        elif state is State.overflow:
            r = -1
        else:  # State.apoptosis
            r = -100
        return r

    def action(self, state, action):
        if state is State.shortWater:
            if action is Action.water:
                newState = [State.shortWater, State.health]
                p = [0.4, 0.6]
            else:
                newState = [State.shortWater, State.apoptosis]
                p = [0.4, 0.6]
        elif state is State.health:
            if action is Action.water:
                newState = [State.health, State.overflow]
                p = [0.6, 0.4]
            else:
                newState = [State.shortWater, State.health]
                p = [0.6, 0.4]
        elif state is State.overflow:
            if action is Action.water:
                newState = [State.overflow, State.apoptosis]
                p = [0.6, 0.4]
            else:
                newState = [State.health, State.overflow]
                p = [0.6, 0.4]
        else:  # apoptosis is absorbing
            newState = [State.apoptosis]
            p = [1.0]
        nextState = random.choices(newState, p)[0]
        r = self.reward(nextState)
        return nextState, r

    def __init__(self):
        self.name = "environment"


class Agent():
    def initPolicy(self):
        # initialize the cumulative-reward table
        self.Q = {}      # cumulative reward of each (state, action)
        self.count = {}  # number of updates of each (state, action)
        for state in self.S:
            for action in self.A:
                self.Q[state, action] = 0.0
                self.count[state, action] = 0
            self.policy[state] = Action.noWater  # initial target policy: never water

    def randomAction(self):
        # uniform random policy over the two actions
        action = random.choices(self.A, [0.5, 0.5])[0]
        return action

    def behaviorPolicy(self):
        # epsilon-greedy behavior policy
        state = State.shortWater  # start from the short-of-water state
        env = Env()
        trajectory = {}  # t -> [s_t, a_t, r_t]
        for t in range(self.T):
            rnd = np.random.rand()
            if rnd < self.epsilon:
                action = self.randomAction()
            else:
                # follow the original (target) policy
                action = self.policy[state]
            newState, reward = env.action(state, action)
            trajectory[t] = [state, action, reward]
            state = newState
        return trajectory

    def calcW(self, trajectory):
        # per-step weights w_t = exp(p_t - q_t)
        q1 = 1.0 - self.epsilon + self.epsilon / 2.0  # a == target action
        q2 = self.epsilon / 2.0                       # a != target action
        w = {}
        for t, value in trajectory.items():
            # value = [state, action, reward]
            state = value[0]
            action = value[1]
            if action == self.policy[state]:
                p = 1
                q = q1
            else:
                p = 0
                q = q2
            w[t] = round(np.exp(p - q), 3)
        return w

    def getReward(self, t, wDict, trajectory):
        # R = (1/(T-t)) * (sum of r_i) * (product of w_j), i, j = t .. T-1
        p = 1.0
        r = 0
        for i in range(t, self.T):
            r += trajectory[i][-1]
            p = p * wDict[i]
        R = p * r
        m = self.T - t
        return R / m

    def improve(self):
        # greedy policy improvement from the current Q estimates
        a = Action.noWater
        for state in self.S:
            maxR = self.Q[state, a]
            for action in self.A:
                R = self.Q[state, action]
                if R >= maxR:
                    maxR = R
                    self.policy[state] = action

    def learn(self):
        self.initPolicy()
        for s in range(1, self.maxIter):
            # sample the s-th trajectory with the behavior (epsilon-greedy) policy
            trajectory = self.behaviorPolicy()
            w = self.calcW(trajectory)
            print("\n iteration %d" % s,
                  "\t shortWater:", self.policy[State.shortWater].name,
                  "\t health:", self.policy[State.health].name,
                  "\t overflow:", self.policy[State.overflow].name,
                  "\t apoptosis:", self.policy[State.apoptosis].name)
            # policy evaluation
            for t in range(self.T):
                R = self.getReward(t, w, trajectory)
                state = trajectory[t][0]
                action = trajectory[t][1]
                Q = self.Q[state, action]
                count = self.count[state, action]
                self.Q[state, action] = (Q * count + R) / (count + 1)
                self.count[state, action] = count + 1
            # policy improvement
            self.improve()

    def __init__(self):
        self.S = [State.shortWater, State.health, State.overflow, State.apoptosis]
        self.A = [Action.water, Action.noWater]
        self.Q = {}       # cumulative rewards
        self.count = {}
        self.policy = {}  # target policy
        self.maxIter = 500
        self.epsilon = 0.2
        self.T = 10


if __name__ == "__main__":
    agent = Agent()
    agent.learn()

(Figure: https://img2020.cnblogs.com/blog/1027447/202110/1027447-20211013112906490-1926128536.png)
