Environment¶
The main Environment class for NASim: NASimEnv.
The NASimEnv class is the main interface for agents interacting with NASim.
- class nasim.envs.environment.NASimEnv(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)¶
A simulated computer network environment for pen-testing.
Implements the gymnasium interface.
…
- name¶ The environment scenario name.
Type: str
- scenario¶ Scenario object defining the properties of the environment.
Type: Scenario
- action_space¶ Action space for the environment. If flat_actions=True this is a discrete action space (which subclasses gymnasium.spaces.Discrete), so each action is represented by an integer. If flat_actions=False this is a parameterised action space (which subclasses gymnasium.spaces.MultiDiscrete), so each action is represented by a list of parameters.
Type: FlatActionSpace or ParameterisedActionSpace
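The relationship between the two spaces can be illustrated with a hypothetical decomposition (the exact parameter layout is defined by ParameterisedActionSpace, not by this sketch): a flat action is a single integer index, while a parameterised action factors the same choice into components such as (target host, action type).

```python
# Hypothetical illustration: with 3 hosts and 4 action types per host,
# a flat Discrete space has 3 * 4 = 12 actions, while a parameterised
# MultiDiscrete space would use a (host, action_type) pair.
N_HOSTS, N_ACTION_TYPES = 3, 4

def flat_to_params(flat_action):
    """Map a flat action index to a (host, action_type) pair."""
    return divmod(flat_action, N_ACTION_TYPES)

def params_to_flat(host, action_type):
    """Map a (host, action_type) pair back to a flat index."""
    return host * N_ACTION_TYPES + action_type

pair = flat_to_params(7)  # -> (1, 3): host 1, action type 3
```
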
- observation_space¶ Observation space for the environment. If flat_obs=True observations are represented as a 1D vector, otherwise as a 2D matrix.
Type: gymnasium.spaces.Box
- last_obs¶ The last observation generated by the environment.
Type: Observation
- steps¶ The number of steps performed since the last reset (this does not include generative steps).
Type: int
- __init__(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)¶
Parameters:
- scenario (Scenario) – Scenario object defining the properties of the environment
- fully_obs (bool, optional) – The observability mode of the environment; if True uses fully observable mode, otherwise partially observable (default=False)
- flat_actions (bool, optional) – If True uses a flat action space, otherwise uses a parameterised action space (default=True)
- flat_obs (bool, optional) – If True uses a 1D observation space, otherwise uses a 2D observation space (default=True)
- render_mode (str, optional) – The render mode to use for the environment
- close()¶ After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections.
- generate_initial_state()¶ Generate the initial state for the environment.
Returns: The initial state
Return type: State
Notes: This does not reset the current state of the environment (use reset() for that).
- generate_random_initial_state()¶ Generate a random initial state for the environment.
This only randomizes the host configurations (OS, services) using a uniform distribution, so it may produce networks where the goal cannot be reached.
Returns: A random initial state
Return type: State
- generative_step(state, action)¶ Run one step of the environment using action in the given state.
Parameters:
- state (State) – the state to perform the action in
- action (Action or int or list or NumpyArray) – the action to perform
Returns:
- State – the next state after the action was performed
- Observation – observation from performing the action
- float – reward from performing the action
- bool – whether a terminal state has been reached or not
- dict – auxiliary information regarding the step (see nasim.env.action.ActionResult.info())
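Because generative_step takes the state as an argument and does not mutate the environment's current state, it can be used for model-based rollouts or planning. A minimal sketch of the pattern, using a toy stub in place of a real NASimEnv (the stub, its dynamics, and rollout_value are illustrative, not part of NASim):

```python
class StubEnv:
    """Toy stand-in for NASimEnv.generative_step: the state is an int,
    action 1 advances it, and reaching state 3 is terminal."""

    def generative_step(self, state, action):
        next_state = state + (1 if action == 1 else 0)
        done = next_state >= 3
        reward = 10.0 if done else -1.0
        obs = next_state  # stand-in for an Observation
        return next_state, obs, reward, done, {}

def rollout_value(env, state, policy, max_steps=10):
    """Estimate the return of following `policy` from `state` using
    generative steps only; the environment's own state is untouched."""
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, _obs, reward, done, _info = env.generative_step(state, action)
        total += reward
        if done:
            break
    return total

value = rollout_value(StubEnv(), 0, policy=lambda s: 1)  # -1 - 1 + 10 = 8.0
```
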
- get_action_mask()¶ Get a vector mask for valid actions.
Returns: numpy vector of 1s and 0s, one for each action; an index is 1 if the action is valid given the current state, or 0 if the action is invalid
Return type: ndarray
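A common use of such a mask is restricting a random (or greedy) policy to valid actions. A sketch using numpy, with a hardcoded mask standing in for a real get_action_mask() call:

```python
import numpy as np

def sample_valid_action(mask, rng):
    """Uniformly sample an action index where the mask is 1."""
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("no valid actions in mask")
    return int(rng.choice(valid))

# Hypothetical mask, shaped like the ndarray get_action_mask() returns:
mask = np.array([1, 0, 0, 1, 1, 0])
rng = np.random.default_rng(0)
action = sample_valid_action(mask, rng)  # one of 0, 3 or 4
```
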
- get_minimum_hops()¶ Get the minimum number of network hops required to reach the targets.
That is, the minimum number of hosts that must be traversed in the network in order to reach all sensitive hosts on the network, starting from the initial state.
Returns: minimum possible number of network hops to reach target hosts
Return type: int
- get_score_upper_bound()¶ Get the theoretical upper bound on total reward for the scenario.
The theoretical upper bound is the score where the agent exploits only a single host in each subnet required to reach the sensitive hosts along the shortest path in the network graph, and exploits all sensitive hosts (i.e. the minimum number of network hops). It assumes an action cost of 1 and that each sensitive host is exploitable from any other connected subnet (which may not be true, hence being an upper bound).
Returns: theoretical max score
Return type: float
- goal_reached(state=None)¶ Check if the state is the goal state.
The goal state is reached when all sensitive hosts have been compromised.
Parameters: state (State, optional) – a state; if None, uses the current state of the environment (default=None)
Returns: True if the state is the goal state, otherwise False
Return type: bool
- render()¶ Render the environment.
Implements gymnasium.Env.render().
See the render module for more details on modes and symbols.
- render_action(action)¶ Render a human-readable version of an action.
This is mainly useful for getting a text description of the action that corresponds to a given integer.
Parameters: action (Action or int or list or NumpyArray) – Action to render. If not an Action object, then when using flat actions this should be an int, and when using non-flat actions this should be an indexable array.
- render_episode(episode, width=7, height=7)¶ Render an episode as a sequence of network graphs, where an episode is a sequence of (state, action, reward, done) tuples generated from interactions with the environment.
Parameters:
- episode (list) – list of (State, Action, reward, done) tuples
- width (int) – width of the GUI window
- height (int) – height of the GUI window
- render_network_graph(ax=None, show=False)¶ Render a plot of the network as a graph, with hosts as nodes arranged into subnets and connections between subnets shown. Renders the current state of the network.
Parameters:
- ax (Axes) – matplotlib axis to plot the graph on, or None to plot on a new axis
- show (bool) – whether to display the plot; if False, the plot is only set up and displaying it can be handled elsewhere by the user
- render_obs(mode='human', obs=None)¶ Render an observation.
See the render module for more details on modes and symbols.
Parameters:
- mode (str) – rendering mode
- obs (Observation or numpy.ndarray, optional) – the observation to render; if None, renders the last observation. If a numpy.ndarray, it must be in a format that matches Observation (i.e. the ndarray returned by the step method) (default=None)
- render_state(mode='human', state=None)¶ Render a state.
See the render module for more details on modes and symbols.
If mode is ASCII, machines are displayed in rows, with one row for each subnet and hosts displayed in order of id within the subnet.
Parameters:
- mode (str) – rendering mode
- state (State or numpy.ndarray, optional) – the State to render; if None, renders the current state. If a numpy.ndarray, it must be in a format that matches State (i.e. the ndarray returned by the generative_step method) (default=None)
- reset(*, seed=None, options=None)¶ Reset the state of the environment and return the initial observation.
Implements gymnasium.Env.reset().
Parameters:
- seed (int, optional) – optional seed for the environment's RNG
- options (dict, optional) – optional environment options (does nothing in NASim at the moment)
Returns:
- numpy.Array – the initial observation of the environment
- dict – auxiliary information regarding the reset
- step(action)¶ Run one step of the environment using action.
Implements gymnasium.Env.step().
Parameters: action (Action or int or list or NumpyArray) – Action to perform. If not an Action object, then when using flat actions this should be an int, and when using non-flat actions this should be an indexable array.
Returns:
- numpy.Array – observation from performing the action
- float – reward from performing the action
- bool – whether the episode reached a terminal state (i.e. all target machines have been successfully compromised)
- bool – whether the episode has reached the step limit (if one exists)
- dict – auxiliary information regarding the step (see nasim.env.action.ActionResult.info())