Benchmark Scenarios

NASim comes with a number of existing scenarios. They cover a range of complexities and sizes and are intended to help with benchmarking algorithms. The existing scenarios come in two flavours: static and generated.

Note

For the full list of benchmark scenarios, see All benchmark scenarios.

Static scenarios are predefined and will be exactly the same every time they are loaded. They are defined in .yaml files in the nasim/scenarios/benchmark/ directory.

Generated scenarios are created by the Scenario Generator from a set of parameters. While certain features of each scenario remain constant between generations (e.g. number of hosts, services, exploits), other features may change (e.g. specific host configurations, firewall settings, exploit probabilities) depending on the random seed.
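As a minimal sketch of loading one scenario of each flavour, the helper below wraps `nasim.make_benchmark`. The wrapper name `load_benchmark` is hypothetical and only there for illustration; it assumes the `nasim` package is installed and that the scenario names match those listed in the benchmark table.

```python
def load_benchmark(name, seed=None):
    """Load a NASim benchmark scenario by name.

    Hypothetical convenience wrapper; the nasim import is done lazily so
    that this snippet can be imported even where nasim is not installed.
    """
    import nasim  # assumption: the nasim package is available at call time

    # For static scenarios the network is the same on every load; for
    # generated scenarios the seed influences the details that may vary
    # (host configurations, firewall settings, exploit probabilities).
    return nasim.make_benchmark(name, seed=seed)


# Example usage (requires nasim):
#   static_env = load_benchmark("tiny")
#   generated_env = load_benchmark("tiny-gen", seed=0)
```

Fixing the seed when loading a generated scenario is a reasonable way to keep runs comparable across experiments.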

All benchmark scenarios

The following table provides details of each benchmark scenario currently available in NASim.

NASim Benchmark scenarios
Name	Type	Subnets	Hosts	OS	Services	Processes	Exploits	PrivEscs	Actions	Observation Dims	States	Step Limit
tiny	static	4	3	1	1	1	1	1	18	4X14	576	1000
tiny-hard	static	4	3	2	3	2	3	2	27	4X18	9216	1000
tiny-small	static	5	5	2	3	2	3	2	45	6X20	15360	1000
small	static	5	8	2	3	2	3	2	72	9X23	24576	1000
small-honeypot	static	5	8	2	3	2	3	2	72	9X23	24576	1000
small-linear	static	7	8	2	3	2	3	2	72	9X22	24576	1000
medium	static	6	16	2	5	3	5	3	192	17X27	393216	2000
medium-single-site	static	2	16	2	5	3	5	3	192	17X34	393216	2000
medium-multi-site	static	7	16	2	5	3	5	3	192	17X29	393216	2000
tiny-gen	generated	4	3	1	1	1	1	1	18	4X14	576	1000
tiny-gen-rgoal	generated	4	3	1	1	1	1	1	18	4X14	576	1000
small-gen	generated	5	8	2	3	2	3	2	72	9X23	24576	1000
small-gen-rgoal	generated	5	8	2	3	2	3	2	72	9X23	24576	1000
medium-gen	generated	6	16	2	5	2	5	2	176	17X26	196608	2000
large-gen	generated	8	23	3	7	3	7	3	322	24X32	4521984	5000
huge-gen	generated	11	38	4	10	4	10	4	684	39X40	2.39E+08	10000
pocp-1-gen	generated	10	35	2	50	2	60	2	2310	36X75	1.51E+19	30000
pocp-2-gen	generated	21	95	3	10	3	30	3	3515	96X48	1.49E+08	30000

The number of actions is calculated as Hosts X (Exploits + PrivEscs + 4). The +4 is for the 4 scans available for each host (OSScan, ServiceScan, ProcessScan, and SubnetScan).
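The action-count formula can be checked against the table with a few lines of Python. This is only an illustrative sketch; the function name `num_actions` is hypothetical and not part of NASim.

```python
def num_actions(hosts, exploits, priv_escs, scans_per_host=4):
    """Hosts X (Exploits + PrivEscs + 4), where 4 is the number of scans."""
    return hosts * (exploits + priv_escs + scans_per_host)


# Values taken from the benchmark table: (hosts, exploits, priv_escs)
print(num_actions(3, 1, 1))    # tiny: 18
print(num_actions(16, 5, 3))   # medium: 192
print(num_actions(35, 60, 2))  # pocp-1-gen: 2310
```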

The number of states is calculated as Hosts X 2^(3 + OS + Services + Processes) X 3. Here the first 3 comes from the compromised, reachable and discovered features of the state, and the base of 2 is due to all state features being boolean (present/absent). The second 3 comes from the number of access levels possible on a host.
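The state counts in the table are reproduced when the exponent counts the Processes column alongside OS and Services, so the sketch below assumes that reading. The function name `num_states` is hypothetical and not part of NASim.

```python
def num_states(hosts, os_count, services, processes, access_levels=3):
    """Hosts X 2^(3 + OS + Services + Processes) X 3.

    The constant 3 in the exponent covers the compromised, reachable and
    discovered flags; every feature in the exponent is boolean.
    """
    boolean_features = 3 + os_count + services + processes
    return hosts * 2 ** boolean_features * access_levels


# Values taken from the benchmark table: (hosts, OS, services, processes)
print(num_states(3, 1, 1, 1))    # tiny: 576
print(num_states(16, 2, 5, 3))   # medium: 393216
print(num_states(23, 3, 7, 3))   # large-gen: 4521984
```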

The table below provides the mean number of steps to reach the goal and the mean total reward (+/- stdev) for a uniform random agent, with scores averaged over 100 runs.

NASim Benchmark scenarios Agent scores
Scenario Name	Steps	Total Reward
tiny	108.02 +/- 43.82	91.98 +/- 43.82
tiny-hard	135.31 +/- 65.56	21.05 +/- 85.45
tiny-small	319.56 +/- 124.26	-225.86 +/- 167.14
small	501.94 +/- 181.40	-469.80 +/- 241.99
small-honeypot	448.72 +/- 151.62	-476.08 +/- 222.41
small-linear	566.00 +/- 177.08	-555.08 +/- 241.06
medium	1371.45 +/- 420.41	-1875.29 +/- 660.62
medium-single-site	654.89 +/- 385.76	-782.17 +/- 581.14
medium-multi-site	1060.94 +/- 389.86	-1394.71 +/- 590.89
tiny-gen	86.56 +/- 40.16	116.43 +/- 40.15
tiny-gen-rgoal	98.94 +/- 47.83	104.02 +/- 47.80
small-gen	435.73 +/- 205.61	-228.53 +/- 214.34
small-gen-rgoal	423.52 +/- 226.68	-218.62 +/- 240.20
medium-gen	1002.94 +/- 468.10	-788.64 +/- 481.86
large-gen	2548.62 +/- 1224.08	-2327.34 +/- 1241.92
huge-gen	6303.86 +/- 2403.40	-6075.69 +/- 2434.77
pocp-1-gen	15189.46 +/- 6879.75	-14947.80 +/- 6887.43
pocp-2-gen	17211.38 +/- 5855.83	-16871.05 +/- 5864.58
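Summary figures in the "mean +/- stdev" form used in the table above can be produced from per-run results along the following lines. This is a sketch only: the function name `summarise` and the sample numbers are hypothetical, and whether the original scores used the population or the sample standard deviation is not stated, so the population form is an assumption here.

```python
import statistics


def summarise(values):
    """Format a list of per-run results as 'mean +/- stdev'."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample standard deviation.
    stdev = statistics.pstdev(values)
    return f"{mean:.2f} +/- {stdev:.2f}"


# Hypothetical step counts from four runs (not real benchmark data)
print(summarise([100, 120, 80, 100]))  # 100.00 +/- 14.14
```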

Notes on the scenarios

The tiny, small, medium, large, and huge scenarios (and their generated versions) are all based on the network scenarios first used by:

The pocp-1-gen and pocp-2-gen scenarios are based on the work by:

Shmaryahu, D., Shani, G., Hoffmann, J., & Steinmetz, M. (2018, June). Simulated penetration testing as contingent planning. In Twenty-Eighth International Conference on Automated Planning and Scheduling.

The other scenarios were made up by the author after looking at some random Google images of network layouts and playing around with different interesting network topologies.