RAP opportunity

RAP opportunity at Air Force Science and Technology Fellowship Program (AF STFP)

Safety Monitoring for Autonomous Systems

Location

Sensors Directorate, RY/Sensors Division

opportunity	location

13.35.01.C0674	Wright-Patterson AFB, OH 45433-7542

Advisers

name	email	phone

Kerianne Lanett Hobbs	kerianne.hobbs@afrl.af.mil	937.623.3981

Description

Reinforcement learning (RL) has recently surpassed human performance both in high-dimensional decision spaces such as Go and in complex real-time strategy games such as StarCraft. In aerospace, RL could open new solution spaces for complex control challenges in areas ripe for autonomy, such as urban air taxis, package delivery drones, satellite constellation management, in-space assembly, and on-orbit satellite servicing. Unlike video games, and therefore unlike the subjects of much of the existing RL research, aircraft and spacecraft are safety- and mission-critical systems. In these applications, a poor decision by an RL agent could result in loss of life in the air domain or loss of a highly valuable space-based service in the space domain.

When an RL agent controls a physical system, such as a robot, aircraft, or spacecraft, ensuring the safety of that agent and of the humans who interact with it becomes critically important. Safe RL approaches learn a policy that maximizes reward while adhering to safety constraints. These approaches generally fall into one of two categories: reward shaping, which incorporates safety into the reward function, or shielding (i.e., run time assurance), which monitors the RL agent's outputs and intervenes by modifying the action to ensure safety. Further research is needed to develop novel approaches to bound, train, and verify reinforcement learning neural network control systems.
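To make the distinction between the two categories concrete, the following is a minimal illustrative sketch, not a method taken from this opportunity or from the cited works. It assumes a toy one-dimensional single-integrator system with a safe set |x| <= x_max. All names (Limits, step, shaped_reward, shield) and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Limits:
    # All values are hypothetical and chosen only for illustration.
    x_max: float = 1.0   # safe set: |x| <= x_max
    a_max: float = 0.5   # actuator limit: |a| <= a_max
    dt: float = 0.1      # time step


def step(x: float, a: float, lim: Limits) -> float:
    """Toy single-integrator dynamics: x_next = x + a * dt."""
    return x + a * lim.dt


def shaped_reward(task_reward: float, x_next: float, lim: Limits,
                  penalty: float = 10.0) -> float:
    """Reward shaping: subtract a penalty when the next state is unsafe."""
    violated = abs(x_next) > lim.x_max
    return task_reward - penalty if violated else task_reward


def shield(x: float, a_desired: float, lim: Limits) -> Tuple[float, bool]:
    """Shielding (run time assurance): return (a_safe, intervened).

    The agent's desired action is passed through unchanged when it keeps
    the next state inside the safe set; otherwise it is modified.
    Assumes the current state x is already inside the safe set.
    """
    # Respect the actuator limit first.
    a = max(-lim.a_max, min(lim.a_max, a_desired))
    # Largest and smallest actions that keep x_next within [-x_max, x_max].
    a_hi = (lim.x_max - x) / lim.dt
    a_lo = (-lim.x_max - x) / lim.dt
    a_safe = max(a_lo, min(a_hi, a))
    return a_safe, a_safe != a_desired


if __name__ == "__main__":
    lim = Limits()
    x = 0.98                      # close to the upper boundary
    a_rl = 0.5                    # hypothetical action proposed by an RL agent
    a_safe, intervened = shield(x, a_rl, lim)
    x_next = step(x, a_safe, lim)
    print(f"intervened={intervened}, a_safe={a_safe:.3f}, x_next={x_next:.3f}")
    print("shaped reward if unshielded:",
          shaped_reward(1.0, step(x, a_rl, lim), lim))
```

In this sketch, reward shaping only discourages unsafe behavior during training, whereas the shield modifies the action at run time so the constraint holds under the assumed dynamics. Practical systems with richer dynamics would need a more careful safe-set and backup-controller design.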

Citations:

[1] Ames, A. D., Xu, X., Grizzle, J. W., and Tabuada, P., “Control Barrier Function Based Quadratic Programs for Safety Critical Systems,” IEEE Transactions on Automatic Control, Vol. 62, No. 8, 2017, pp. 3861–3876.

[2] Chow, Y., Nachum, O., Duenez-Guzman, E., and Ghavamzadeh, M., “A Lyapunov-based Approach to Safe Reinforcement Learning,” Advances in Neural Information Processing Systems, 2018, pp. 8092–8101.

[3] Gross, K. H., et al., “Run-Time Assurance and Formal Methods Analysis Applied to Nonlinear System Control,” Journal of Aerospace Information Systems, Vol. 14, No. 4, 2017, pp. 232–246.

Keywords

Controls, Run Time Assurance, Safe Reinforcement Learning, Neural Network Verification

Eligibility

Citizenship: Open to U.S. citizens

Level: Open to Postdoctoral and Senior applicants

Stipend

Base Stipend	Travel Allotment	Supplementation

$95,000.00	$5,000.00	Experience Supplement: Postdoctoral and Senior awardees will receive an appropriately higher stipend based on the number of years of experience past their PhD.

Additional Benefits

Relocation

Awardees who reside more than 50 miles from their host laboratory and remain on tenure for at least six months are eligible for paid relocation to within the vicinity of their host laboratory.

Health insurance

A group health insurance program is available to awardees and their qualifying dependents in the United States.

Participating Agencies