Environment

The main Environment class for NASim: NASimEnv.

The NASimEnv class is the main interface for agents interacting with NASim.

class nasim.envs.environment.NASimEnv(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)

A simulated computer network environment for pen-testing.

Implements the gymnasium interface.

name

the environment scenario name

Type:str
scenario

Scenario object, defining the properties of the environment

Type:Scenario
action_space

Action space for the environment. If flat_actions=True then this is a discrete action space (which subclasses gymnasium.spaces.Discrete), so each action is represented by an integer. If flat_actions=False then this is a parameterised action space (which subclasses gymnasium.spaces.MultiDiscrete), so each action is represented by a list of parameters.

Type:FlatActionSpace or ParameterisedActionSpace
observation_space

Observation space for the environment. If flat_obs=True then observations are represented as a 1D vector, otherwise as a 2D matrix.

Type:gymnasium.spaces.Box
current_state

the current state of the environment

Type:State
last_obs

the last observation generated by the environment

Type:Observation
steps

the number of steps performed since last reset (this does not include generative steps)

Type:int
__init__(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)
Parameters:
  • scenario (Scenario) – Scenario object, defining the properties of the environment
  • fully_obs (bool, optional) – The observability mode of the environment; if True the environment is fully observable, otherwise it is partially observable (default=False)
  • flat_actions (bool, optional) – If True, uses a flat action space, otherwise uses a parameterised action space (default=True)
  • flat_obs (bool, optional) – If True, uses a 1D observation space, otherwise uses a 2D observation space (default=True)
  • render_mode (str, optional) – The render mode to use for the environment.
close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections.

generate_initial_state()

Generate the initial state for the environment.

Returns:The initial state
Return type:State

Notes

This does not reset the current state of the environment (use reset() for that).

generate_random_initial_state()

Generate a random initial state for the environment.

This randomizes only the host configurations (OS, services), sampled uniformly, so it may produce networks in which the goal is unreachable.

Returns:A random initial state
Return type:State
generative_step(state, action)

Run one step of the environment by performing the given action in the given state.

Parameters:
  • state (State) – The state to perform the action in
  • action (Action, int, list, NumpyArray) – Action to perform. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
Returns:

  • State – the next state after action was performed
  • Observation – observation from performing action
  • float – reward from performing action
  • bool – whether a terminal state has been reached or not
  • dict – auxiliary information regarding step (see nasim.env.action.ActionResult.info())

get_action_mask()

Get a vector mask for valid actions.

Returns:a numpy vector with one entry per action: 1 if the action is valid in the current state, 0 if it is invalid.
Return type:ndarray
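The mask can be used to restrict sampling to valid actions. A stdlib-only sketch with a hand-written mask standing in for the vector returned by get_action_mask() in a flat action space:

```python
import random

# Illustrative only: in NASim this vector would come from
# env.get_action_mask(), with one entry per flat action.
mask = [1, 0, 1, 1, 0]

# Indices of actions that are valid in the current state.
valid_actions = [i for i, m in enumerate(mask) if m == 1]

# Sample uniformly among valid actions only.
action = random.choice(valid_actions)
```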
get_minimum_hops()

Get the minimum number of network hops required to reach targets.

That is, the minimum number of hosts that must be traversed in the network in order to reach all sensitive hosts, starting from the initial state.

Returns:minimum possible number of network hops to reach target hosts
Return type:int
get_score_upper_bound()

Get the theoretical upper bound for total reward for scenario.

The theoretical upper bound is the score achieved when the agent exploits only a single host in each subnet required to reach the sensitive hosts along the shortest path in the network graph, and exploits all sensitive hosts (i.e. the minimum number of network hops). This assumes an action cost of 1 and that each sensitive host is exploitable from any other connected subnet (which may not be true, hence this being an upper bound).

Returns:theoretical max score
Return type:float
goal_reached(state=None)

Check if the state is the goal state.

The goal state is when all sensitive hosts have been compromised.

Parameters:state (State, optional) – a state, if None will use current_state of environment (default=None)
Returns:True if state is goal state, otherwise False.
Return type:bool
render()

Render environment.

Implements gymnasium.Env.render().

See render module for more details on modes and symbols.

render_action(action)

Render a human-readable version of the action.

This is mainly useful for getting a text description of the action that corresponds to a given integer.

Parameters:action (Action or int or list or NumpyArray) – Action to render. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
render_episode(episode, width=7, height=7)

Render an episode as a sequence of network graphs, where an episode is a sequence of (state, action, reward, done) tuples generated from interactions with the environment.

Parameters:
  • episode (list) – list of (State, Action, reward, done) tuples
  • width (int) – width of GUI window
  • height (int) – height of GUI window
render_network_graph(ax=None, show=False)

Render a plot of the network as a graph, with hosts as nodes arranged into subnets and connections shown between subnets. Renders the current state of the network.

Parameters:
  • ax (Axes) – matplotlib axis to plot graph on, or None to plot on new axis
  • show (bool) – whether to display the plot immediately; if False the plot is only set up, and displaying it can be handled elsewhere by the user
render_obs(mode='human', obs=None)

Render observation.

See render module for more details on modes and symbols.

Parameters:
  • mode (str) – rendering mode
  • obs (Observation or numpy.ndarray, optional) – the observation to render; if None, renders the last observation. If a numpy.ndarray, it must be in a format that matches Observation (i.e. the ndarray returned by the step method) (default=None)
render_state(mode='human', state=None)

Render state.

See render module for more details on modes and symbols.

If mode = ASCII:
Machines are displayed in rows, one row per subnet, with hosts shown in order of id within each subnet
Parameters:
  • mode (str) – rendering mode
  • state (State or numpy.ndarray, optional) – the State to render; if None, renders the current state. If a numpy.ndarray, it must be in a format that matches State (i.e. the ndarray returned by the generative_step method) (default=None)
reset(*, seed=None, options=None)

Reset the state of the environment and return the initial observation.

Implements gymnasium.Env.reset().

Parameters:
  • seed (int, optional) – an optional seed for the environment's RNG
  • options (dict, optional) – optional environment options (does nothing in NASim at the moment)
Returns:

  • numpy.ndarray – the initial observation of the environment
  • dict – auxiliary information regarding reset

step(action)

Run one step of the environment using action.

Implements gymnasium.Env.step().

Parameters:action (Action or int or list or NumpyArray) – Action to perform. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
Returns:
  • numpy.ndarray – observation from performing action
  • float – reward from performing action
  • bool – whether the episode reached a terminal state or not (i.e. all target machines have been successfully compromised)
  • bool – whether the episode has reached the step limit (if one exists)
  • dict – auxiliary information regarding step (see nasim.env.action.ActionResult.info())