當前位置：首頁 > news >正文

蕪湖做網(wǎng)站的鄧健照片企業(yè)培訓

news 2025/7/8 6:50:29

蕪湖做網(wǎng)站的鄧健照片,企業(yè)培訓,網(wǎng)站寶建站,建設銀行內(nèi)部學習網(wǎng)站論文：ICWS 2021 移動邊緣計算中的用戶分配：一種深度強化學習方法代碼地址：使用強化學習在移動邊緣計算環(huán)境中進行用戶分配目錄 Ⅰ.Introduction II. MOTIVATION-A.驗證假設的觀察結果 II. MOTIVATION-A Motivating Example 數(shù)據(jù)驅(qū)動…

論文：ICWS 2021 移動邊緣計算中的用戶分配：一種深度強化學習方法

代碼地址：使用強化學習在移動邊緣計算環(huán)境中進行用戶分配

Ⅰ.Introduction

II. MOTIVATION-A.驗證假設的觀察結果?

II. MOTIVATION-A Motivating Example

數(shù)據(jù)驅(qū)動方法的基本思想

III.強化學習分配

RL框架

?RL Allocation Algorithm代碼

IV.確定性方法作為BASELINE

?ILP Algorithm整數(shù)線性規(guī)劃（ILP）算法代碼

Nearest Neighbourhood Greedy Algorithm代碼?

整體框架代碼：

框架中用到的函數(shù)：

對比實驗：

V.實驗與結果分析

Ⅰ.Introduction

隨著對低延遲需求的增加，邊緣計算或霧計算逐漸成為主流。當前最先進的技術假設邊緣服務器上的總資源利用率等于從邊緣服務器提供的所有服務的資源利用率的總和。然而，邊緣服務器的資源利用率與服務請求數(shù)量之間通常存在高度非線性的關系，尤其CPU-GPU協(xié)同執(zhí)行，使資源利用率的數(shù)學建模異常復雜。

Motivation：目前用于解決邊緣用戶分配（EUA）問題的算法普遍假設服務的資源利用率與邊緣服務器上提供的服務數(shù)呈線性關系，假設服務器的總資源利用率是每個服務請求的資源占用量的累積總和。然而，實際服務過程中資源使用量是高度動態(tài)的，難以通過數(shù)學建模精確描述。

Method：提出一種設備端深度強化學習（DRL）框架來解決邊緣用戶分配（EUA）問題，基于與 MEC 系統(tǒng)的經(jīng)驗和交互逐步學習適當?shù)馁Y源分配。DRL Agent在服務延遲閾值約束下學習在某邊緣服務器上服務的用戶數(shù)量。DRL Agent通過同時觀察系統(tǒng)參數(shù)，直接從邊緣服務器中學習非線性依賴關系。

Conclusion：實驗評估表明，DRL框架在用戶分配數(shù)量方面優(yōu)于傳統(tǒng)的確定性方法。

II. MOTIVATION-A.驗證假設的觀察結果?

分別使用 GPU 和不使用 GPU 觀察目標檢測應用程序 YOLO處理圖像的服務執(zhí)行時間:

非線性關系：YOLO的執(zhí)行時間與CPU和GPU參數(shù)之間的關系是非線性的。表明執(zhí)行時間不僅僅取決于單一的參數(shù)變化，還受到許多隱含因素的影響，例如CPU/GPU的可用性。

復雜性：由于存在多個隱藏參數(shù)，精確建模YOLO執(zhí)行時間與系統(tǒng)資源之間的關系是相對困難的。

建模服務執(zhí)行時間困難:

?可用處理器資源和執(zhí)行時間之間的非線性關系:可用處理器資源和執(zhí)行時間之間存在非線性關系，內(nèi)核數(shù)量少時減少內(nèi)核數(shù)量對執(zhí)行時間的影響更顯著，在高工作負載情況下增加后臺工作負載會顯著減慢執(zhí)行速度。

? 跨時間的變化：相同配置的同一臺機器上執(zhí)行時間也存在顯著差異，受服務調(diào)用模式和溫度等多個隱藏參數(shù)影響。

? 跨服務的變化：不同服務的執(zhí)行時間模式各異，如 Yolo 執(zhí)行時間隨用戶增加近似線性增長，而 MobileNet 表現(xiàn)為非線性，使建模任務復雜化。

II. MOTIVATION-A Motivating Example

七個移動用戶 U1、U2 .. U7和兩個邊緣服務器 e1 和 e2，每個用戶請求邊緣服務器上可用的兩個服務 s1 和 s2 之一，用戶U1、U2、U4和U6請求服務s1，其余用戶請求服務s2，每個邊緣服務器由一個資源向量4 元組（Available RAM、Core 的數(shù)量、CPU 后臺工作負載%、GPU Utilized%）。

解釋傳統(tǒng)的確定性方法如何導致用戶到服務器的分配效率低下：

如對于邊緣服務器 e1，服務s1的單個用戶請求的執(zhí)行時間為 3.12s，四個用戶的預期執(zhí)行時間用線性插值得到12.48 s。然而實際 3.468 s。

假設延遲閾值6.58 秒，用僅考慮服務單個請求的執(zhí)行時間的確定性方法將U1、U2和U3分配給e1，只會給s1分配2個用戶（每個用戶3.12s），給s2分配1個用戶（6.32s）

使用確定性方法分配的用戶總數(shù)為 3

數(shù)據(jù)驅(qū)動方法的基本思想

數(shù)據(jù)驅(qū)動方法基于實際的執(zhí)行時間數(shù)據(jù)和更精確的資源利用模型進行用戶到服務器的分配，克服確定性方法的缺點，提供更有效的資源分配。

運行 YOLO 的四個用戶的執(zhí)行時間低于 6.55 的延遲閾值。實際上可以在 e1 上分配用戶 U1、U2、U4(只需 3.35)。e2 可以容納 U5 和 U7（只需 6.12）。

使用數(shù)據(jù)驅(qū)動的方法分配總共 5 個用戶（比確定性方法多 2 個用戶），更準確地建模資源利用率。

III.強化學習分配

MEC環(huán)境 $E=\{e_1,e_2...e_j\}$ 中每個邊緣服務器 $e_j$ 的覆蓋率半徑為 $r_j$ 。邊緣服務器覆蓋半徑內(nèi)的移動users $U=\{U_1,U_2...U_i\}$ 可以請求托管在該服務器上的服務 $S=\{s_1,s_2...s_k\}$ 。每個邊緣服務器上可用資源（RAM、Cores、CPU 背景工作負載%、GPU Utilized%）

分配策略根據(jù)動態(tài)因素確定用戶與特定邊緣服務器的綁定：

a)新用戶加入邊緣服務器的覆蓋區(qū)域? (b)用戶遠離邊緣服務器的覆蓋區(qū)域?? (c)用戶服務請求更改? (d)邊緣服務器或移動用戶離線

分配策略目標是在遵守服務執(zhí)行的延遲閾值𝛤Γ的同時盡可能多滿足服務請求。傳統(tǒng)的確定性方法依賴歷史數(shù)據(jù)預測執(zhí)行時間，但由于執(zhí)行時間的動態(tài)特性，可能導致資源分配過度或不足。提出的RL學習框架通過從邊緣服務器的實際經(jīng)驗中學習服務執(zhí)行模式，實時優(yōu)化用戶-服務器綁定決策，更有效地應對動態(tài)環(huán)境。

RL框架

RL 框架中的Agent通過探索環(huán)境并從動作接收反饋來學習環(huán)境以選擇更好的動作選擇，RL可以在不需要大量標記數(shù)據(jù)的情況下學習底層環(huán)境。

在這個 RL 框架中，Agent不斷地與邊緣服務器交互以采取行動，執(zhí)行多個服務請求并根據(jù)執(zhí)行占用空間獲得相應的獎勵。

狀態(tài)：邊緣服務器的資源向量以及服務請求的數(shù)量

動作：等待在邊緣服務器上執(zhí)行的服務請求集。

Reward的計算：?

#compute latencydef get_reward(self, state, action):#將動作 action 轉(zhuǎn)換為兩個用戶數(shù)量 u1 和 u2u1 = action//5 + 1u2 = (action+1) - (u1-1)*5#sample time from dataframegram = state[0]gcores = state[1]gwl_c = state[2]gwl_g = state[3]gs1 = u1*100gs2 = u2*100#查找與當前狀態(tài)和動作匹配的記錄fetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) & (self.df['workload_gpu']==gwl_g) & (self.df['users_yolo']==gs1) & (self.df['users_mnet']==gs2)]#計算獎勵：        if fetch_state.empty:#找不到匹配的狀態(tài)信息，則返回較大的負獎勵，表示這是一個不利的動作選擇return -20 # 計算網(wǎng)絡延遲：time1 = fetch_state.sample().iloc[0]['time_yolo'] #從匹配的狀態(tài)信息中隨機選擇一個樣本time2 = fetch_state.sample().iloc[0]['time_mnet']#獲取 time_yolo 和 time_mnet 的延遲時間tm = max(time1, time2)#兩者中較大的延遲時間用作網(wǎng)絡延遲的閾值#add total latencies due to network based on number of u1 and u2if (tm <= latency_threshold): #用戶數(shù)量的變化對應的獎勵，以及動作本身的基礎獎勵（u1 + u2）return  0.01*(gs1 - state[4]) +  0.01*(gs2 - state[5]) + u1 + u2 else:return -5 - u1 - u2

?訓練RL Agent

RL環(huán)境：

import numpy as np
import gym
from gym import spaces
from gym.utils import seedingclass yolosystem(gym.Env):metadata = {'render.modes': ['human']}def __init__(self, n_actions, filename):super(yolosystem, self).__init__()self.n_actions = n_actions #total number of action space after ranging [10, 20, 30 ...]self.action_space = spaces.Discrete(self.n_actions) #total number of users in the action space; starts with zeroself.observation_space = spaces.Box(low=np.array([0,0,0,0,0,0]), high=np.array([11000]*6), shape=(6, ), dtype=np.int32) #<RAM, Core, Workload>self.seed()self.current_obs = np.array( [3000, 2, 40, 2, 100, 100] ) #current observation = <ram, cores, workload%>#Load datasetself.df = pd.read_csv(filename)# computer percentage of GPU usage from actual useself.df['workload_gpu'] = self.df['workload_gpu'].multiply(1/80).round(0).astype(int) #round gpu workload#get unique data in setself.ram = self.df.ram.unique()self.cores = self.df.cores.unique()self.workload_cpu = self.df.workload_cpu.unique()print(self.df) #print datasetdef seed(self, seed=1010):self.np_random, seed = seeding.np_random(seed)return [seed]def step(self, action):assert self.action_space.contains(action) #action should be in action spacestate = self.current_obsdone = True #Episodes ends after each action#compute latecy from the number of usersreward = self.get_reward(state, action) #linear latency           
#         print(action, reward)self.current_obs = self.get_random_state() #go to a random state#         print(self.current_obs)return self.current_obs, reward, done, {} #no-states, reward, episode-done, no-infodef reset(self):self.current_obs = self.get_random_state()return self.current_obs #current state of the system with no loaddef render(self, mode='human', close=False):print(f"Current State:<{self.current_obs}>")#compute latencydef get_reward(self, state, action):#將動作 action 轉(zhuǎn)換為兩個用戶數(shù)量 u1 和 u2u1 = action//5 + 1u2 = (action+1) - (u1-1)*5#sample time from dataframegram = state[0]gcores = state[1]gwl_c = state[2]gwl_g = state[3]gs1 = u1*100gs2 = u2*100#查找與當前狀態(tài)和動作匹配的記錄fetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) & (self.df['workload_gpu']==gwl_g) & (self.df['users_yolo']==gs1) & (self.df['users_mnet']==gs2)]#計算獎勵：        if fetch_state.empty:#找不到匹配的狀態(tài)信息，則返回較大的負獎勵，表示這是一個不利的動作選擇return -20 # 計算網(wǎng)絡延遲：time1 = fetch_state.sample().iloc[0]['time_yolo'] #從匹配的狀態(tài)信息中隨機選擇一個樣本time2 = fetch_state.sample().iloc[0]['time_mnet']#獲取 time_yolo 和 time_mnet 的延遲時間tm = max(time1, time2)#兩者中較大的延遲時間用作網(wǎng)絡延遲的閾值#add total latencies due to network based on number of u1 and u2if (tm <= latency_threshold): #用戶數(shù)量的變化對應的獎勵，以及動作本身的基礎獎勵（u1 + u2）return  0.01*(gs1 - state[4]) +  0.01*(gs2 - state[5]) + u1 + u2 else:return -5 - u1 - u2     #get to some random state after taking an actiondef get_random_state(self):#generate state randomlygram = np.random.choice(self.ram, 1)[0]gcores = np.random.choice(self.cores, 1)[0]gwl_c = np.random.choice(self.workload_cpu, 1)[0]#fetch gamma for the statefetch_state = self.df.loc[ (self.df['ram'] == gram) & (self.df['cores']== gcores) & (self.df['workload_cpu']==gwl_c) ]gwl_g = fetch_state.sample().iloc[0]['workload_gpu'] #fetch workload randmolygs1 = random.randrange(50, 550, 50)gs2 = random.randrange(50, 550, 50)return np.array( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )

from stable_baselines3.common.monitor import Monitor
import os
# Create log dir
log_dir = './agent_tensorboard/'
os.makedirs(log_dir, exist_ok=True)env = Monitor(env, log_dir)from stable_baselines3 import DQN
from stable_baselines3.dqn import MlpPolicy
from stable_baselines3.common.vec_env import DummyVecEnv# wrap it非向量化的環(huán)境 env 轉(zhuǎn)換為一個向量化的環(huán)境 env
env = DummyVecEnv([lambda: env])

訓練?

model = DQN(MlpPolicy, env, verbose=0, tensorboard_log = log_dir, exploration_fraction=0.4, learning_starts=150000,  train_freq=30, target_update_interval=30000, exploration_final_eps=0.07)begin = time.time()
model.learn(total_timesteps=500000) #reset_num_timesteps=False
end = time.time()
training_time = end-begin

?RL Allocation Algorithm代碼

#Load model
def rl_algo():#對于每臺服務器，使用RL預測每個服務器的容量server_capacity = np.zeros((N, S))for server_id in range(N):state = server_state[server_id]
#         if model_type == 'lin':action = model_rl.predict(np.array(state), deterministic=True)
#         if model_type == 'exp':
#             action = model_exp.predict(np.array(state), deterministic=True)
# 根據(jù)action計算兩種服務的預測容量 (u1 和 u2)u1 = action[0]//5 + 1u2 = (action[0]+1) - (u1-1)*5server_capacity[server_id][0] = u1*100 #model outputserver_capacity[server_id][1] = u2*100 #model output#根據(jù)每個用戶的服務器數(shù)進行排序col1 = np.array([np.sum(ngb,axis=1)])col2 = np.array([np.arange(U)])sorted_ngb = np.concatenate((ngb, col1.T, col2.T), axis=1) #add rowsum and index column添加行和索引列sorted_ngb = sorted_ngb[np.argsort(sorted_ngb[:, N])] #sort the rows based on rowsum column根據(jù)行和列對行進行排序#分配用戶到服務器#run allocation algorithmrl_allocation = []
# 遍歷用戶，根據(jù)用戶連接的服務器列表和服務請求，選擇最大預測容量的服務器分配。服務器有足夠容量則更新服務器容量并記錄分配結果for i in range(U):server_list = np.where(sorted_ngb[i, :N] == 1)[0] #獲取用戶連接到的服務器列表if len(server_list) == 0: #跳過沒有服務器的用戶continueser = int(service[i]) #用戶正在請求哪個服務choosen_server = server_list[np.argmax(server_capacity[server_list, ser])] #找到所選服務器的 IDif  server_capacity[choosen_server][ser] > 0: #將用戶分配給choosen_serverserver_capacity[choosen_server][ser] -= 1 #減少服務器容量rl_allocation.append( (int(sorted_ngb[i, N+1]), choosen_server) ) #(user, server) alloc pairprint('RL Num of allocation: {}'.format(len(rl_allocation)))return rl_allocation

IV.確定性方法作為BASELINE

使用歷史服務執(zhí)行數(shù)據(jù)的平均值確定邊緣服務器 $e_j$ 上服務 $S_k$ 的執(zhí)行時間 $\gamma _{kj}$ ，進而確定可以分配到邊緣服務器的用戶數(shù)量的相應代碼：allocation.ipynb? def generate_server_state(num_server)

#獲取與選擇的 GPU 工作負載值匹配的行 計算YOLO 和 MNet 的平均時間time_yolo = fetch_time['time_yolo'].mean() #average of time for particular statetime_mnet = fetch_time['time_mnet'].mean()# 根據(jù)每個服務器的服務請求分配狀態(tài)gs1 = server_service[s_id][0]gs2 = server_service[s_id][1] server_state.append( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )# 追加每個服務器的 gamma 值gamma.append((time_yolo, time_mnet)) #append the gamma value of each server

?ILP Algorithm整數(shù)線性規(guī)劃（ILP）算法代碼

解決用戶分配到服務器的問題，目標是最大化覆蓋的用戶數(shù)量

def ilp_algo():## ===================================ILP with python mip# >> solver_name=GRB# >> Currently using CBCI = range(U) #user 用戶的范圍J = range(N) #server服務器的范圍alloc = Model(sense=MAXIMIZE, name="alloc", solver_name=CBC)alloc.verbose = 0def coverage(user_ix, server_ix):if ngb[user_ix][server_ix]==1:return 1else:return 0#U: num of users, N: num of servers# 創(chuàng)建二進制變量矩陣 x，其中 x[i][j] 表示用戶 i 是否被分配到服務器 jx = [[ alloc.add_var(f"x{i}{j}", var_type=BINARY) for j in J] for i in I]#Objective Equation# 目標函數(shù)：最大化分配的用戶數(shù)量alloc.objective = xsum( x[i][j]  for i in I for j in J )#1. 覆蓋約束for i in I:for j in J:        if not coverage(i,j):alloc += x[i][j] == 0# 2. 每個用戶只能被分配到一個服務器for i in I:alloc += xsum( x[i][j] for j in J ) <=1# 3. 延遲約束for j in J:alloc += xsum( gamma[j][int(service[i])]*x[i][j] for i in I ) <=latency_threshold-network_latency[j] #alloc.write("test-model.lp")#===========Start Optimization=========alloc.optimize(max_seconds=25)# 優(yōu)化模型#==========ILP Ends here#print(f"Number of Solutions:{qoe.num_solutions}")ilp_allocation = [ (i,j) for i in I for j in J if x[i][j].x >= 0.99] # 獲取分配結果#print(f"Number of Solutions:{qoe.num_solutions}")#print(f"Objective Value:{qoe.objective_value}")allocated_num_users = len(ilp_allocation)print("ILP Allocated Num of Users: {}".format(allocated_num_users))# 輸出分配的用戶數(shù)量# selected.sort()return ilp_allocation

Nearest Neighbourhood Greedy Algorithm代碼?

def greedy_algo():server_capacity = np.zeros(N)# 初始化服務器容量數(shù)組rl_allocation = []for user_id in range(U):#獲取與用戶連接的服務器列表server_ngb_list = np.where(ngb[user_id, :N] == 1)[0] #get the list of server to which user is connectedif len(server_ngb_list) == 0: #ignore the users which are not under any serverscontinue # 計算每個用戶到各個服務器的距離并排序#find the distance to each users in the server_ngb_listdist_list = np.array([ server_ngb_list, [server.iloc[i]['geometry'].centroid.distance(user.iloc[user_id]['geometry']) for i in server_ngb_list] ])# sorted list of servers based on the distance from userssorted_distance_list = dist_list[ :, dist_list[1].argsort()]#get the list of servers arranged in least to max distanceserver_list = sorted_distance_list[0].astype(int)# 分配算法lat = 0for server_id in server_list:lat = gamma[server_id][int(service[user_id])]#根據(jù)用戶請求的服務類型和服務器，獲取相應的服務延遲if server_capacity[server_id]+lat <= latency_threshold-network_latency[server_id]:server_capacity[server_id] += lat #increment the server_capacity of serverrl_allocation.append( (user_id, server_id) ) #(user, server) alloc pairbreakprint('Greedy-Ngb Num of allocation: {}'.format(len(rl_allocation)))return rl_allocation

整體框架代碼：

固定服務器數(shù)量更改用戶數(shù)量的情況：

對于不同用戶數(shù)量，先拿到用戶和服務器之間的鄰居矩陣ngb，并計算網(wǎng)絡延遲network_latency

之后生成服務器狀態(tài)server_state 計算每個服務器的gamma 值?generate_server_state(num_server)

之后分別使用三種算法計算成功分配的數(shù)量

if alloc_type == 'server': #服務器固定，變化用戶數(shù)量"for U in range(100, 600, 100):#用戶數(shù)量100-500for epoch in range(50):print("User:", U, 'Server:', N, 'Epoch:', epoch)ngb, user, server, service, server_service, network_latency = ngb_matrix(U, N, S) #從EUA數(shù)據(jù)生成服務器和用戶 # 確定鄰域矩陣server_state, gamma = generate_server_state(N) #為每個用戶分配狀態(tài)和γ值#=======ILP startsstart = 0stop = 0execution_time_ilp = 0start = timeit.default_timer()ilp_aloc = ilp_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_ilp = stop - start#========ILP ends#=======Greedy startsstart = 0stop = 0execution_time_greedy = 0start = timeit.default_timer()greedy_aloc = greedy_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_greedy = stop - start#========Greedy ends#=======RL_linear startsstart = 0stop = 0execution_time_rl = 0start = timeit.default_timer()rl_aloc = rl_algo() #call ILP algorithmstop = timeit.default_timer()execution_time_rl = stop - start#========RL_linear ends#========Store results to fileto_append = [U, N,len(ilp_aloc), execution_time_ilp,len(greedy_aloc), execution_time_greedy,len(rl_aloc), execution_time_rl,] dseries = pd.Series(to_append, index = result_user.columns)result_user = result_user.append(dseries, ignore_index=True)print("epoch:", epoch)result_user.to_csv(result_file, index=False)

框架中用到的函數(shù)：

一、生成服務器狀態(tài)計算每個服務器的 gamma 值

def generate_server_state(num_server):#生成服務器狀態(tài)計算每個服務器的 gamma 值df = pd.read_csv(filename_base)# 將 GPU 工作負載的數(shù)值進行縮放和四舍五入   
#     df['ram'] = df['ram'].div(1000).round(0).astype(int)
#     df['workload_cpu'] = df['workload_cpu'].div(10).round(0).astype(int)df['workload_gpu'] = df['workload_gpu'].multiply(1/80).round(0).astype(int) #round gpu workload
#     df['users_yolo'] = df['users_yolo'].div(100).round(0).astype(int)
#     df['users_mnet'] = df['users_mnet'].div(100).round(0).astype(int)#get unique data in set獲取數(shù)據(jù)集中唯一的 RAM、核心數(shù)和 CPU 工作負載值ram = df.ram.unique()cores = df.cores.unique()workload_cpu = df.workload_cpu.unique()server_state = []#服務器狀態(tài)gamma = []for s_id in range(num_server):#對于每一個服務器，隨機選擇一個 RAM、核心數(shù)和 CPU 工作負載值gram = np.random.choice(ram, 1)[0]gcores = np.random.choice(cores, 1)[0]gwl_c = np.random.choice(workload_cpu, 1)[0]#fetch gamma for the state獲取對應狀態(tài)的行fetch_state = df.loc[ (df['ram'] == gram) & (df['cores']== gcores) & (df['workload_cpu']==gwl_c) ]#    從匹配的狀態(tài)中隨機選擇一個 GPU 工作負載值gwl_g = fetch_state.sample().iloc[0]['workload_gpu'] #fetch workload randmolyfetch_time = fetch_state.loc[ (df['workload_gpu'] == gwl_g) ]#獲取與選擇的 GPU 工作負載值匹配的行 計算YOLO 和 MNet 的平均時間time_yolo = fetch_time['time_yolo'].mean() #average of time for particular statetime_mnet = fetch_time['time_mnet'].mean()# 根據(jù)每個服務器的服務請求分配狀態(tài)gs1 = server_service[s_id][0]gs2 = server_service[s_id][1] server_state.append( [gram, gcores, gwl_c, gwl_g, gs1, gs2] )# 追加每個服務器的 gamma 值gamma.append((time_yolo, time_mnet)) #append the gamma value of each serverreturn server_state, gamma

二、計算鄰接矩陣

#================neighbourhood Computing
def ngb_matrix(U, N, S):
# 生成用戶和服務器之間的鄰居矩陣，并計算網(wǎng)絡延遲 #U: number of users#N: number of servers#S: number of services# U X N matrixuser = load_users(U)server = load_servers(N)neighbourhood = np.zeros([U, N]) #用戶和服務器之間的鄰居矩陣network_latency = np.zeros(N) #每個服務器的網(wǎng)絡延遲latency_data = load_planetlab() #加載 PlanetLab 數(shù)據(jù)，返回一個距離矩陣（bin size 150）# 檢查每個用戶是否在服務器的緩沖區(qū)內(nèi)，并計算網(wǎng)絡延遲for u in range(0, U):for n in range(0, N):#檢查用戶是否在服務器的緩沖區(qū)內(nèi)（使用幾何空間的 contains 方法）if server.iloc[n].geometry.contains(user.iloc[u].geometry):neighbourhood[u,n]=1#鄰居矩陣中相應位置設為 1# 計算距離并分配延遲distance = server.iloc[n].geometry.centroid.distance(user.iloc[u].geometry)rep_lat = fetch_network_lat(int(distance), latency_data) #根據(jù)距離從 latency_data 中獲取網(wǎng)絡延遲if network_latency[n] < rep_lat:#最大可能延遲network_latency[n] = rep_latelse:neighbourhood[u,n]=0service = np.zeros(U)#為用戶分配服務請求for u in range(0, U):#為每個用戶隨機分配一個從 0 到 S-1 的服務請求service[u] = random.randrange(0, S, 1) server_service = np.zeros((N, S))#統(tǒng)計每個服務器的服務請求數(shù)量for n in range(0, N):for u in range(0, U):if neighbourhood[u][n] == 1:server_service[n][int(service[u])] += 1return neighbourhood, user, server, service, server_service, network_latency

三、計算鄰接矩陣中用到的函數(shù)

#================Load Planet Lab data
# 加載 PlanetLab 數(shù)據(jù)并轉(zhuǎn)換為一個矩陣格式
def load_planetlab():#Convert to triangleldata = np.loadtxt('eua/PlanetLabData_1')[np.tril_indices(490)]ldata = ldata[ ldata != 0]#提取下三角矩陣的非零值ldata =np.unique(ldata)#去重并重置數(shù)據(jù)大小，使其符合150行的矩陣格式np.set_printoptions(suppress=True)length = ldata.shape[0]latency_row = 150latency_col = (length//latency_row) #Global Data usedldata = np.resize(ldata, latency_col*latency_row)latency = ldata.reshape(latency_row,-1)return latency#=================Fetch Network latency
# 根據(jù)距離從延遲數(shù)據(jù)中獲取網(wǎng)絡延遲
def fetch_network_lat(distance, latency_data):rep_lat = np.random.choice(latency_data[distance], size=1, replace=True)#根據(jù)距離從延遲數(shù)據(jù)中隨機選擇一個延遲值return rep_lat/1000 #將延遲值轉(zhuǎn)換為秒#===============User Data
# 加載用戶數(shù)據(jù)并轉(zhuǎn)換為地理數(shù)據(jù)格式
def load_users(num_of_users):user_raw = pd.read_csv("eua/users.csv")user_raw = user_raw.rename_axis("UID")#將數(shù)據(jù)框的索引軸重命名為 "UID"，即用戶的唯一標識符df = user_raw.sample(num_of_users)#隨機抽樣指定數(shù)量的用戶數(shù)據(jù)
# 創(chuàng)建地理數(shù)據(jù)框，使用Longitude和Latitude列創(chuàng)建點幾何對象，并轉(zhuǎn)換坐標參考系統(tǒng)（CRS）gdf = geopandas.GeoDataFrame(df, geometry = geopandas.points_from_xy(df.Longitude, df.Latitude), crs = 'epsg:4326')#創(chuàng)建地理數(shù)據(jù)框user = gdf [['geometry']] #保留geometry列user = user.to_crs(epsg=28355) #指定數(shù)據(jù)的坐標參考系統(tǒng)（WGS84投影）#Insert additional data to dataframe#user = user.apply(add_data, axis=1)return user#================Server Data
def load_servers(num_of_servers):# 加載服務器數(shù)據(jù)，并將其轉(zhuǎn)換為地理數(shù)據(jù)格式server_raw = pd.read_csv("eua/servers.csv")server_raw = server_raw.rename_axis("SID")#將數(shù)據(jù)框的索引軸重命名為 "SID"，即服務器的唯一標識符df = server_raw.sample(num_of_servers) #隨機抽樣指定數(shù)量的服務器gdf = geopandas.GeoDataFrame(df, geometry = geopandas.points_from_xy(df.LONGITUDE, df.LATITUDE), crs = 'epsg:4326')#創(chuàng)建地理數(shù)據(jù)框server = gdf [['geometry']] #Keep Geometry columnserver = server.to_crs(epsg=28355) #Cover to crs in Australian EPSGdef add_radius(series):
#         radius = random.randrange(150, 250, 10)
# 為每個服務器添加一個固定半徑的緩沖區(qū)radius = 150 #每個服務器的緩沖區(qū)半徑設為固定值 150series.geometry = series.geometry.buffer(radius)series['radius'] = radius
#         series['resource'] = tcompreturn seriesserver = server.apply(add_radius, axis = 1)return server#繪制用戶和服務器數(shù)據(jù)在地圖上的分布情況
def plot_data(user, server):%config InlineBackend.figure_format='retina'%matplotlib inlinecbd = geopandas.read_file('eua/maps', crs = {'init': 'epsg=28355'} ) #read cbd-australia location datafig, ax = plt.subplots(1, 1, figsize=(15,10))ax.set_aspect('equal')ax.set_xlim(319400, 322100)ax.set_ylim(5811900, 5813700)user.plot(ax=ax, marker='o', color='red', markersize=20, zorder=3, label="users")server.plot(ax =ax, linestyle='dashed', edgecolor='green', linewidth=1, facecolor="none", zorder=1)server.centroid.plot(ax=ax, marker='s', color='blue', markersize=50, zorder=2, label="server")cbd.plot(ax=ax, color='grey', zorder=0, alpha = 0.3);ax.set_title("MEC Environment(EUA): CBD Melbourne(Australia)")ax.legend(bbox_to_anchor=(1, 0), loc='lower left')plt.show()

運行結果：

對比實驗：

一、對于RL算法使用不同訓練回合數(shù)：

rl_algo_prop()和rl_algo_und()只是用了兩個不同程度訓練的agent模型，其余部分完全一致，這里只展示rl_algo_prop()。

分別是訓練30,000回合的RL Agent生成的分配數(shù)量和訓練1,50,000回合

動作空間的量化大小 $\lambda$ = 2

model_und = DQN.load("trained_agents/edge_agent_under_train")
model_prop = DQN.load("trained_agents/edge_agent_proper_train")

#Load model
def rl_algo_prop():#...同rl_algo()#換個模型就OK action = model_prop.predict(np.array(state), deterministic=True)print('Actionprop: {}'.format(action))u1 = (action[0]//10)*2 + 1u2 = (action[0]%10)*2 + 1server_capacity[server_id][0] = u1 #model outputserver_capacity[server_id][1] = u2 #model output#...同rl_algo()

二、對于RL算法使用不同量化因子：

其余兩種只在模型和和動作空間映射改變

rl_algo_act() 代碼中說是 $\lambda$ =5：

action = model_act.predict(np.array(state), deterministic=True)print('Actionact: {}'.format(action))u1 = (action[0]//5)*4 + 1 #25 action spaceu2 = (action[0]%5)*4 + 1server_capacity[server_id][0] = u1 #model outputserver_capacity[server_id][1] = u2 #model output

?rl_algo_thres10()代碼中說是 $\lambda$ =100：

 action = model_thres10.predict(np.array(state), deterministic=True)print('Actionthres10: {}'.format(action))u1 = action[0]//5 + 1u2 = (action[0]+1) - (u1-1)*5server_capacity[server_id][0] = u1*100 #model outputserver_capacity[server_id][1] = u2*100 #model output

對于訓練不同回合數(shù)rl_algo_prop中動作空間的量化大小 $\lambda$ = 2的個人理解，不一定對：

每次agent預測出action之后，從中還原出兩個服務s1、s2上的服務請求數(shù)（動作）使用的方法不同， $\lambda$ = 2時的映射方法如下，輸出action

for action in range(25):#rl_algo_prop()u1 = (action//10)*2 + 1u2 = (action%10)*2 + 1print(f"Action: {action}, u1: {u1}, u2: {u2}")

Action: 0, u1: 1, u2: 1
Action: 1, u1: 1, u2: 3
Action: 2, u1: 1, u2: 5
Action: 3, u1: 1, u2: 7
Action: 4, u1: 1, u2: 9
Action: 5, u1: 1, u2: 11
Action: 6, u1: 1, u2: 13
Action: 7, u1: 1, u2: 15
Action: 8, u1: 1, u2: 17
Action: 9, u1: 1, u2: 19
Action: 10, u1: 3, u2: 1
Action: 11, u1: 3, u2: 3
Action: 12, u1: 3, u2: 5
Action: 13, u1: 3, u2: 7
Action: 14, u1: 3, u2: 9
Action: 15, u1: 3, u2: 11
Action: 16, u1: 3, u2: 13
Action: 17, u1: 3, u2: 15
Action: 18, u1: 3, u2: 17
Action: 19, u1: 3, u2: 19
Action: 20, u1: 5, u2: 1
Action: 21, u1: 5, u2: 3
Action: 22, u1: 5, u2: 5
Action: 23, u1: 5, u2: 7
Action: 24, u1: 5, u2: 9

文章提到：使用大小為 $\lambda$ 的量化減少動作空間的基數(shù)， $\lambda$ =10時新的動作元組 (2,2) 表示舊動作空間中范圍 (11 - 20,11 - 20) 中的所有動作

于是我暫且認為 $\lambda$ =2指的是，量化后的動作空間中的一個動作，代表原來動作空間中的兩個動作，也就是，第一個動作中的 u1: 1是我們選擇來代表原來動作空間中u1: 1、u1: 2；同理u2:1代表u2:1、u2:2。下一個動作就從3開始。

但無法解釋后邊 $\lambda$ =5， $\lambda$ =100情況：

$\lambda$ =5：

for action in range(25):#rl_algo_act()u1 = (action//5)*4 + 1u2 = (action%5)*4 + 1print(f"Action: {action+1}, u1: {u1}, u2: {u2}")

Action: 0, u1: 1, u2: 1
Action: 1, u1: 1, u2: 5
Action: 2, u1: 1, u2: 9
Action: 3, u1: 1, u2: 13
Action: 4, u1: 1, u2: 17
Action: 5, u1: 5, u2: 1
Action: 6, u1: 5, u2: 5
Action: 7, u1: 5, u2: 9
Action: 8, u1: 5, u2: 13
Action: 9, u1: 5, u2: 17
Action: 10, u1: 9, u2: 1
Action: 11, u1: 9, u2: 5
Action: 12, u1: 9, u2: 9
Action: 13, u1: 9, u2: 13
Action: 14, u1: 9, u2: 17
Action: 15, u1: 13, u2: 1
Action: 16, u1: 13, u2: 5
Action: 17, u1: 13, u2: 9
Action: 18, u1: 13, u2: 13
Action: 19, u1: 13, u2: 17
Action: 20, u1: 17, u2: 1
Action: 21, u1: 17, u2: 5
Action: 22, u1: 17, u2: 9
Action: 23, u1: 17, u2: 13
Action: 24, u1: 17, u2: 17
?

$\lambda$ =100：

for action in range(25):#rl_algo_thres10u1 = action//5 + 1u2 = (action+1) - (u1-1)*5print(f"Action: {action+1}, u1: {u1}, u2: {u2}")

?Action: 1, u1: 1, u2: 1
Action: 2, u1: 1, u2: 2
Action: 3, u1: 1, u2: 3
Action: 4, u1: 1, u2: 4
Action: 5, u1: 1, u2: 5
Action: 6, u1: 2, u2: 1
Action: 7, u1: 2, u2: 2
Action: 8, u1: 2, u2: 3
Action: 9, u1: 2, u2: 4
Action: 10, u1: 2, u2: 5
Action: 11, u1: 3, u2: 1
Action: 12, u1: 3, u2: 2
Action: 13, u1: 3, u2: 3
Action: 14, u1: 3, u2: 4
Action: 15, u1: 3, u2: 5
Action: 16, u1: 4, u2: 1
Action: 17, u1: 4, u2: 2
Action: 18, u1: 4, u2: 3
Action: 19, u1: 4, u2: 4
Action: 20, u1: 4, u2: 5
Action: 21, u1: 5, u2: 1
Action: 22, u1: 5, u2: 2
Action: 23, u1: 5, u2: 3
Action: 24, u1: 5, u2: 4
Action: 25, u1: 5, u2: 5