Environment

The main Environment class for NASim: NASimEnv.

The NASimEnv class is the main interface for agents interacting with NASim.

class nasim.envs.environment.NASimEnv(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)

A simulated computer network environment for pen-testing.

Implements the gymnasium interface.

name

the environment scenario name

Type:str
scenario

Scenario object, defining the properties of the environment

Type:Scenario
action_space

Action space for the environment. If flat_actions=True then this is a discrete action space (which subclasses gymnasium.spaces.Discrete), so each action is represented by an integer. If flat_actions=False then this is a parameterised action space (which subclasses gymnasium.spaces.MultiDiscrete), so each action is represented by a list of parameters.

Type:FlatActionSpace or ParameterisedActionSpace
observation_space

Observation space for the environment. If flat_obs=True then observations are represented as a 1D vector, otherwise as a 2D matrix.

Type:gymnasium.spaces.Box
current_state

the current state of the environment

Type:State
last_obs

the last observation generated by the environment

Type:Observation
steps

the number of steps performed since last reset (this does not include generative steps)

Type:int
__init__(scenario, fully_obs=False, flat_actions=True, flat_obs=True, render_mode=None)
Parameters:
  • scenario (Scenario) – Scenario object, defining the properties of the environment
  • fully_obs (bool, optional) – The observability mode of the environment; if True the environment is fully observable, otherwise it is partially observable (default=False)
  • flat_actions (bool, optional) – If True, uses a flat action space, otherwise uses a parameterised action space (default=True)
  • flat_obs (bool, optional) – If True, uses a 1D observation space, otherwise uses a 2D observation space (default=True)
  • render_mode (str, optional) – The render mode to use for the environment.
close()

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections.

generate_initial_state()

Generate the initial state for the environment.

Returns:The initial state
Return type:State

Notes

This does not reset the current state of the environment (use reset() for that).

generate_random_initial_state()

Generate a random initial state for the environment.

This randomizes only the host configurations (OS, services), sampled uniformly, so it may produce networks in which the goal is unreachable.

Returns:A random initial state
Return type:State
generative_step(state, action)

Run one step of the environment by performing the given action in the given state.

Parameters:
  • state (State) – The state to perform the action in
  • action (Action, int, list, NumpyArray) – Action to perform. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
Returns:

  • State – the next state after action was performed
  • Observation – observation from performing action
  • float – reward from performing action
  • bool – whether a terminal state has been reached or not
  • dict – auxiliary information regarding step (see nasim.env.action.ActionResult.info())

get_action_mask()

Get a vector mask for valid actions.

Returns:a numpy vector with one entry per action: 1 if the action is valid in the current state, 0 if it is invalid.
Return type:ndarray
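The mask can be used to restrict sampling to valid actions. A stdlib-only sketch with a hand-written mask standing in for the vector returned by get_action_mask() in a flat action space:

```python
import random

# Illustrative only: in NASim this vector would come from
# env.get_action_mask(), with one entry per flat action.
mask = [1, 0, 1, 1, 0]

# Indices of actions that are valid in the current state.
valid_actions = [i for i, m in enumerate(mask) if m == 1]

# Sample uniformly among valid actions only.
action = random.choice(valid_actions)
```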
get_minimum_hops()

Get the minimum number of network hops required to reach targets.

That is, the minimum number of hosts that must be traversed in the network in order to reach all sensitive hosts, starting from the initial state.

Returns:minimum possible number of network hops to reach target hosts
Return type:int
get_score_upper_bound()

Get the theoretical upper bound for total reward for scenario.

The theoretical upper bound is the score achieved when the agent exploits only a single host in each subnet required to reach the sensitive hosts along the shortest path in the network graph, and exploits all sensitive hosts (i.e. the minimum number of network hops). This assumes an action cost of 1 and that each sensitive host is exploitable from any other connected subnet (which may not be true, hence this being an upper bound).

Returns:theoretical max score
Return type:float
goal_reached(state=None)

Check if the state is the goal state.

The goal state is when all sensitive hosts have been compromised.

Parameters:state (State, optional) – a state, if None will use current_state of environment (default=None)
Returns:True if state is goal state, otherwise False.
Return type:bool
render()

Render environment.

Implements gymnasium.Env.render().

See render module for more details on modes and symbols.

render_action(action)

Render a human-readable version of the action.

This is mainly useful for getting a text description of the action that corresponds to a given integer.

Parameters:action (Action or int or list or NumpyArray) – Action to render. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
render_episode(episode, width=7, height=7)

Render an episode as a sequence of network graphs, where an episode is a sequence of (state, action, reward, done) tuples generated from interactions with the environment.

Parameters:
  • episode (list) – list of (State, Action, reward, done) tuples
  • width (int) – width of GUI window
  • height (int) – height of GUI window
render_network_graph(ax=None, show=False)

Render a plot of the network as a graph, with hosts as nodes arranged into subnets and connections shown between subnets. Renders the current state of the network.

Parameters:
  • ax (Axes) – matplotlib axis to plot graph on, or None to plot on new axis
  • show (bool) – whether to display the plot immediately; if False the plot is only set up, and displaying it can be handled elsewhere by the user
render_obs(mode='human', obs=None)

Render observation.

See render module for more details on modes and symbols.

Parameters:
  • mode (str) – rendering mode
  • obs (Observation or numpy.ndarray, optional) – the observation to render; if None, renders the last observation. If a numpy.ndarray, it must be in a format that matches Observation (i.e. the ndarray returned by the step method) (default=None)
render_state(mode='human', state=None)

Render state.

See render module for more details on modes and symbols.

If mode = ASCII:
Machines are displayed in rows, one row per subnet, with hosts shown in order of id within each subnet
Parameters:
  • mode (str) – rendering mode
  • state (State or numpy.ndarray, optional) – the State to render; if None, renders the current state. If a numpy.ndarray, it must be in a format that matches State (i.e. the ndarray returned by the generative_step method) (default=None)
reset(*, seed=None, options=None)

Reset the state of the environment and return the initial observation.

Implements gymnasium.Env.reset().

Parameters:
  • seed (int, optional) – an optional seed for the environment's RNG
  • options (dict, optional) – optional environment options (does nothing in NASim at the moment)
Returns:

  • numpy.ndarray – the initial observation of the environment
  • dict – auxiliary information regarding reset

step(action)

Run one step of the environment using action.

Implements gymnasium.Env.step().

Parameters:action (Action or int or list or NumpyArray) – Action to perform. If not Action object, then if using flat actions this should be an int and if using non-flat actions this should be an indexable array.
Returns:
  • numpy.ndarray – observation from performing action
  • float – reward from performing action
  • bool – whether the episode reached a terminal state or not (i.e. all target machines have been successfully compromised)
  • bool – whether the episode has reached the step limit (if one exists)
  • dict – auxiliary information regarding step (see nasim.env.action.ActionResult.info())