Interacting with NASim Environment

Assuming you are comfortable loading an environment from a scenario (see Starting a NASim Environment or Starting NASim using OpenAI gym), interacting with a NASim environment is straightforward and follows the same interface as gymnasium.

Starting the environment

The first step is simply loading the environment:

import nasim
# load my environment in the desired way (make_benchmark, load, generate)
env = nasim.make_benchmark("tiny")

# or using gym
import gymnasium as gym
env = gym.make("nasim:Tiny-PO-v0")

Here we are using the default environment parameters: fully_obs=False, flat_actions=True, and flat_obs=True.
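
These defaults can be changed when loading the environment, for example to get a fully observable environment with parameterised (non-flat) actions and a 2D observation:

env = nasim.make_benchmark(
    "tiny",
    fully_obs=True,      # agent observes the full state
    flat_actions=False,  # parameterised (MultiDiscrete) actions
    flat_obs=False,      # 2D observation matrix instead of a flat vector
)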

The number of actions can be retrieved from the environment action_space attribute as follows:

# When flat_actions=True
num_actions = env.action_space.n

# When flat_actions=False
nvec_actions = env.action_space.nvec
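
In either case, if you just want a valid action for testing, one can be sampled directly from the space (standard gymnasium spaces API):

# works for both flat (Discrete) and parameterised (MultiDiscrete) action spaces
random_action = env.action_space.sample()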

The shape of the observations can be retrieved from the environment observation_space attribute as follows:

obs_shape = env.observation_space.shape
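
With the default flat_obs=True the observation is a single 1D vector, so this shape can be used directly to size the input of a network-based agent, for example:

# with flat_obs=True the observation is a 1D vector,
# so its length gives the input dimension for a network-based agent
input_dim = env.observation_space.shape[0]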

Getting the initial observation and resetting the environment

To reset the environment and get the initial observation, use the reset() function:

o, info = env.reset()

The info return value contains optional auxiliary information.
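
Since NASim follows the gymnasium interface, reset() should also accept the standard seed keyword for reproducible episodes (a sketch, assuming the usual gymnasium reset signature):

# seed the environment for a reproducible initial state
o, info = env.reset(seed=42)
# the returned observation matches the observation space shape
assert o.shape == env.observation_space.shape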

Performing a single step

A step in the environment can be taken using the step(action) function. The form of action depends on whether the environment uses flat_actions=True or flat_actions=False. For our example (flat_actions=True) we can simply pass an integer with 0 <= action < N, where N = env.action_space.n, specifying the index of the action in the action space. The step function then returns a (Observation, float, bool, bool, dict) tuple corresponding to observation, reward, done, step limit reached, and auxiliary info, respectively (these match gymnasium's observation, reward, terminated, truncated, and info):

action = 0  # any integer in the range [0, env.action_space.n)
o, r, done, step_limit_reached, info = env.step(action)

If done=True, the goal has been reached and the episode is over. Alternatively, if the current scenario has a step limit and step_limit_reached=True, then the step limit has been reached. In either case it is recommended to stop or reset the environment, since there is no guarantee of what will happen otherwise (especially in the first case).
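
Putting the pieces together, here is a minimal sketch of an episode loop that takes random actions until either condition triggers:

o, info = env.reset()
done = False
step_limit_reached = False
while not done and not step_limit_reached:
    # sample a random action index and apply it
    o, r, done, step_limit_reached, info = env.step(env.action_space.sample())
# reset before reusing the environment
o, info = env.reset()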

Visualizing the environment

You can use the render() function to get a human-readable visualization of the current state of the environment. For render to work correctly, make sure to pass render_mode="human" when initializing the environment:

import nasim
# load my environment in the desired way (make_benchmark, load, generate)
env = nasim.make_benchmark("tiny", render_mode="human")

# or using gym
import gymnasium as gym
env = gym.make("nasim:Tiny-PO-v0", render_mode="human")

env.reset()
# render the environment
# (if render_mode="human" is not passed during initialization this will do nothing)
env.render()
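
render() can also be called after each step to watch an episode unfold, for example with a random action:

o, r, done, step_limit_reached, info = env.step(env.action_space.sample())
# re-render to see how the state has changed
env.render()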

An example agent

Some example agents are provided in the nasim/agents directory. Here is a quick example of a hypothetical agent interacting with the environment:

import nasim

env = nasim.make_benchmark("tiny")

agent = AnAgent(...)  # placeholder for your agent implementation

o, info = env.reset()
total_reward = 0
done = False
step_limit_reached = False
while not done and not step_limit_reached:
    a = agent.choose_action(o)
    o, r, done, step_limit_reached, info = env.step(a)
    total_reward += r

print("Done")
print("Total reward =", total_reward)
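
Since AnAgent is just a placeholder, substituting a trivial random agent makes the example runnable end to end (the RandomAgent class here is hypothetical, shown purely as a sketch):

class RandomAgent:
    """Chooses a uniformly random action index each step."""

    def __init__(self, action_space):
        self.action_space = action_space

    def choose_action(self, obs):
        # ignores the observation and samples a random action
        return self.action_space.sample()

agent = RandomAgent(env.action_space)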

It’s as simple as that.