Summit Training
Rule-based vs. ML
Rule-based approach: Decision rules are clearly defined by humans
Machine Learning: Trained from examples. Rules are not defined by humans but learned by the machine from data.
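The contrast can be sketched with a toy example. Nothing here comes from the training material: the sample data, the hand-picked 0.5 threshold, and the function names `rule_based_is_safe` and `learn_threshold` are assumptions made up for illustration.

```python
# Toy contrast between a hand-written rule and a rule learned from examples.
# All data and names are hypothetical, for illustration only.

def rule_based_is_safe(distance_from_center):
    """Rule-based: a human chose the 0.5 threshold."""
    return distance_from_center <= 0.5


def learn_threshold(examples):
    """Machine learning (heavily simplified): pick the threshold that
    classifies the most labeled examples correctly."""
    candidates = sorted(d for d, _ in examples)
    best_threshold, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((d <= t) == label for d, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# (distance from center, was the car safe?) pairs
examples = [(0.1, True), (0.2, True), (0.3, True), (0.4, False), (0.6, False)]
learned = learn_threshold(examples)
print("learned threshold:", learned)
print("rule says 0.4 is safe:", rule_based_is_safe(0.4))
print("learned rule says 0.4 is safe:", 0.4 <= learned)
```

The hand-written rule and the learned rule disagree at 0.4 because the learned threshold comes from the labeled data rather than from a human guess.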
Artificial Intelligence: The concept of machines that mimic human intelligence, able to handle different kinds of work, from managing schedules to analyzing marketing trends.
Machine Learning: A method for computers to learn from data; a skill in making predictions based on past experience. For example, a model uses past sales data to learn to predict future sales trends based on the patterns it observes.
- Supervised Learning: learn the relationship of given inputs to a given output
  - linear regression
  - logistic regression (output is binary)
  - naive Bayes
  - support vector machine (non-linear problems)
  - AdaBoost
  - decision tree
  - random forest
  - simple neural network
- Unsupervised Learning: learn without an explicit output
  - K-Means Clustering: puts data into groups
  - Hierarchical Clustering
  - Gaussian Mixture
  - Recommender System
  - Apriori algorithm
- Reinforcement Learning: learning through interaction with an environment and rewards/punishments
Deep Learning: A specialized technique that uses neural networks to process data in depth; a specialty in analyzing data deeply enough to understand even the tiniest influences on sales. It not only learns which products sell well but also recognizes many patterns, such as how weather changes might influence sales.
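Hyperparameters
- gradient descent batch size: 64 (number of samples per gradient descent update)
- number of epochs: 10 (number of iterations)
- learning rate: 0.0003 (for the first training run, use 0.01 to get near the optimum quickly; in later runs, lower the learning rate to fine-tune)
- entropy: 0.01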
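- discount factor: 0.999 (discount rate: how strongly future rewards count toward the expected return; the closer to 1, the further ahead the agent looks. It weights rewards and does not decay the learning rate.)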
- loss type: huber
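To make the discount factor and Huber loss entries concrete, here is a minimal sketch using the standard textbook definitions. The function names, the 100-step reward sequence, and `delta=1.0` are assumptions chosen for illustration; the notes do not say which delta the training service uses.

```python
def discounted_return(rewards, discount_factor):
    """Sum of rewards where the reward k steps ahead is weighted by discount_factor**k."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (discount_factor ** k) * r
    return total


def huber_loss(error, delta=1.0):
    """Quadratic for small errors, linear for large ones (delta is an assumed value)."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# 100 future steps, each paying a reward of 1.
rewards = [1.0] * 100
far_sighted = discounted_return(rewards, 0.999)
short_sighted = discounted_return(rewards, 0.9)
print("discount 0.999:", far_sighted)
print("discount 0.9:  ", short_sighted)
print("huber(0.5):", huber_loss(0.5), "huber(3.0):", huber_loss(3.0))
```

With a discount factor of 0.999, almost all 100 future rewards count; with 0.9, only roughly the next 10 steps matter.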
Action space
- continuous action space (chosen by the algorithm)
  - steering angle range: -30 to 30 degrees (depends on the track)
  - speed range: 0.1 to 4 m/s
- discrete action space (entered manually: the larger the steering angle, the lower the speed)
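The "larger angle, lower speed" rule for a discrete action space can be sketched as a small generator. The helper `build_discrete_action_space`, the five steering angles, and the 1.0 m/s minimum speed are assumptions for illustration; only the 30-degree and 4 m/s limits come from the notes above.

```python
def build_discrete_action_space(max_angle=30.0, n_angles=5,
                                max_speed=4.0, min_speed=1.0):
    """Build steering/speed pairs where sharper steering gets a lower speed."""
    actions = []
    for i in range(n_angles):
        # Spread the angles evenly from -max_angle to +max_angle.
        angle = -max_angle + i * (2 * max_angle) / (n_angles - 1)
        # 0 when driving straight, 1 at the maximum steering angle.
        sharpness = abs(angle) / max_angle
        # Reduce speed linearly as the steering angle grows.
        speed = max_speed - sharpness * (max_speed - min_speed)
        actions.append({"steering_angle": angle, "speed": round(speed, 2)})
    return actions


actions = build_discrete_action_space()
for action in actions:
    print(action)
```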
Reward function
- Follow the center line in time trials
    if distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track
    return reward
- Stay inside the two borders in time trials
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0
- Prevent zig-zag in time trials
    # Steering penalty threshold, change the number based on your action space setting
    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8
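The three snippets above can be combined into one runnable function. This is a sketch under stated assumptions, not a tuned reward: the marker fractions (0.1, 0.25, 0.5 of the track width), the threshold `ABS_STEERING_THRESHOLD = 15`, and the choice to drop the reward to 1e-3 near the border are illustrative values that the notes do not specify.

```python
ABS_STEERING_THRESHOLD = 15  # degrees; assumed, adjust to your action space


def reward_function(params):
    """Combine center-line, stay-on-track, and anti-zig-zag rewards (illustrative)."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]
    abs_steering = abs(params["steering_angle"])

    # Assumed marker distances, as fractions of the track width.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # 1. Follow the center line: the closer to the center, the higher the reward.
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    # 2. Stay inside the two borders: minimal reward if a wheel is off the
    #    track or the car is within 0.05 m of the border (assumed combination).
    if not (all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05):
        reward = 1e-3

    # 3. Prevent zig-zag: penalize steering that is too sharp.
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)


centered = {"track_width": 1.0, "distance_from_center": 0.0,
            "all_wheels_on_track": True, "steering_angle": 0.0}
print(reward_function(centered))
```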
Params dictionary
"all_wheels_on_track": Boolean, # flag to indicate if the agent is on the track
"x": float, # agent's x-coordinate in meters
"y": float, # agent's y-coordinate in meters
"distance_from_center": float, # distance in meters from the track center
"is_left_of_center": Boolean, # Flag to indicate if the agent is on the left side to the track center or not.
"is_offtrack": Boolean, # Boolean flag to indicate whether the agent has gone off track.
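"is_reversed": Boolean, # flag to indicate whether the agent is driving in the reversed direction
"heading": float, # direction the agent is facing (yaw), in degrees
"progress": float, # percentage of track completed; aim for progress as large as possible with as few steps as possible
"steps": int, # number of steps completed; 15 steps per second
"speed": float, # agent's speed in meters per second (m/s)
"steering_angle": float, # wheel (steering) angle in degrees
"track_length": float, # track length in meters
"track_width": float, # track width in meters
"closest_waypoints": [int, int], # indices of the two nearest waypoints
"waypoints": [(float, float), ] # map information: list of (x, y) waypoints along the track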
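As a sketch of how these fields work together, the helper below compares the car's heading with the direction of the track segment between the two closest waypoints. The function name `track_direction_diff` and the sample values are hypothetical, and the code assumes `closest_waypoints` holds the index of the waypoint behind the car followed by the one ahead.

```python
import math


def track_direction_diff(params):
    """Angle in degrees (0 to 180) between the track direction and the car's heading."""
    waypoints = params["waypoints"]
    prev_idx, next_idx = params["closest_waypoints"]
    prev_point = waypoints[prev_idx]
    next_point = waypoints[next_idx]

    # Direction of the segment from the previous to the next waypoint, in degrees.
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0]))

    # Absolute difference, folded into the range 0 to 180 degrees.
    diff = abs(track_direction - params["heading"])
    if diff > 180:
        diff = 360 - diff
    return diff


# Hypothetical values for local testing.
sample_params = {
    "waypoints": [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
    "closest_waypoints": [1, 2],
    "heading": 30.0,
}
print(track_direction_diff(sample_params))
```

A dictionary like `sample_params` is also a convenient way to exercise a reward function locally before submitting it for training.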
## Define the default reward ##
reward = 1
progress = params['progress']
steps = params['steps']
if (steps % 15) == 0 and progress/100 > (steps/TOTAL_NUM_STEPS):
    reward += 2.22  # for each second faster than 45s projected
    # reward += progress - (steps/TOTAL_NUM_STEPS)*100
## Reward if car goes close to optimal racing line ##
dist = dist_to_racing_line(optimals[0:2], optimals_second[0:2], [x, y])
distance_reward = max(1e-3, 1 - (dist/(track_width*0.5)))
reward += distance_reward * DISTANCE_MULTIPLE
## Reward if speed is close to optimal speed ##
speed_diff = abs(optimals[2]-speed)
if speed_diff <= SPEED_DIFF_NO_REWARD:
    # we use quadratic punishment (not linear) because we're not as confident with the optimal speed
    # so, we do not punish small deviations from optimal speed
    speed_reward = (1 - (speed_diff/(SPEED_DIFF_NO_REWARD))**2)**2
else:
    speed_reward = 0
reward += speed_reward * SPEED_MULTIPLE
# Zero reward if obviously wrong direction (e.g. spin)
direction_diff = racing_direction_diff(
    optimals[0:2], optimals_second[0:2], [x, y], heading)
if direction_diff > 30:
    reward = 1e-3
# Zero reward if obviously too slow
speed_diff_zero = optimals[2]-speed
if speed_diff_zero > 0.5:
    reward = 1e-3
## Incentive for finishing the lap in fewer steps ##
REWARD_FOR_FASTEST_TIME = 500  # should be adapted to track length and other rewards
STANDARD_TIME = 12  # seconds (time that is easily done by model)
FASTEST_TIME = 9  # seconds (best time of 1st place on the track)
if progress == 100:
    finish_reward = max(1e-3, (-REWARD_FOR_FASTEST_TIME /
else:
    finish_reward = 0
reward += finish_reward
## Zero reward if off track ##
if not all_wheels_on_track:
    reward = 1e-3
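The speed and finish-line terms can be isolated into small functions to see how they behave. This is a sketch under assumptions: `speed_diff_no_reward=1.0` is an illustrative default, and because the finish-reward expression is cut off in the notes, the linear formula below (full bonus at `FASTEST_TIME`, falling to the 1e-3 floor at `STANDARD_TIME`) is one plausible completion rather than the original code.

```python
STEPS_PER_SECOND = 15  # from the params notes: 15 steps per second


def speed_reward(optimal_speed, speed, speed_diff_no_reward=1.0):
    """Reward that stays near 1 for small deviations from the optimal speed
    and drops to 0 once the deviation reaches speed_diff_no_reward."""
    speed_diff = abs(optimal_speed - speed)
    if speed_diff <= speed_diff_no_reward:
        return (1 - (speed_diff / speed_diff_no_reward) ** 2) ** 2
    return 0.0


def finish_reward(steps, reward_for_fastest_time=500,
                  standard_time=12, fastest_time=9):
    """Assumed linear bonus: the full reward at fastest_time, shrinking to the
    1e-3 floor at standard_time or slower."""
    standard_steps = standard_time * STEPS_PER_SECOND
    fastest_steps = fastest_time * STEPS_PER_SECOND
    bonus = (reward_for_fastest_time * (standard_steps - steps)
             / (standard_steps - fastest_steps))
    return max(1e-3, bonus)


print(speed_reward(2.0, 2.5))   # half of the allowed deviation
print(finish_reward(135))       # lap finished in 9 s
print(finish_reward(180))       # lap finished in 12 s
```

Because the speed term is squared twice, a deviation of half the allowed amount still keeps more than half of the reward, which matches the comment about not punishing small deviations.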
Best-route strategy:
AWS DeepRacer Community: