Wednesday, August 15, 2007

AUTOMATED AGENTS THAT LEARN AND EXPLAIN THEIR OWN ACTIONS: A Progress Report

S. Kocabas*, E. Oztemel**, M. Uludag and Nazim Koc

Abstract
Computer generated agents need to be able to learn meaningful actions in various tactical situations and to explain the reasons behind those actions. A few research groups have tried different inductive methods for teaching actions to such agents in tactical air simulations. There have also been attempts to enable intelligent agents to explain the reasons behind their own actions in the form of debriefing records. However, previous research has left the integration of learning and real-time explanation as an open issue. The use of inductive methods in teaching tactically meaningful actions makes it rather difficult to integrate learning and explanation. In our research, we have used deductive methods to teach meaningful actions, and their real-time explanations, to an intelligent air target in 1-v-1 air combat. Our research aims at integrating artificial intelligence techniques in an international EUCLID project for building a distributed simulation system.

Keywords: machine learning, tactical air simulation, real-time explanation.
-------------------------
* Also affiliated with: Department of Space Sciences and Technology, ITU, Maslak, 80626 Istanbul, Turkey.
** Also affiliated with: Department of Industrial Engineering, SAU, Esentepe Kampusu, Adapazari, Turkey.

1. Introduction
Recent research on computer generated agents focuses on using artificial intelligence (AI) techniques to control such agents. Several research groups have studied the application of AI techniques to various aspects of air-to-air combat. These efforts include the application of neural networks to acquiring air combat decision-making skills (Crowe, 1990); automated agents for beyond visual range (BVR) tactical air simulation (Rosenbloom et al., 1994); knowledge-based decision aiding for BVR combat with multiple targets (Halski et al., 1991); generating agent goals in an interactive environment (Jones et al., 1994); and agents that explain their own actions (Johnson, 1994).

A large part of current research relies on static knowledge-based methods rather than machine learning techniques that enable the dynamic acquisition of the knowledge and skills underlying human behavior in tactical situations such as air combat.

In our research we attempt to implement explanation-based learning (EBL), a deductive machine learning technique, to teach computer generated agents to perform intelligent behavior in BVR and close combat. This study is carried out as part of the joint EUCLID project RTP 11.3, which aims at building a distributed simulation system capable of integrating C3I functions and AI techniques. The project uses ITEMS as the simulation environment.
Explanation-based learning has been one of the most extensively investigated machine learning methods in artificial intelligence (see, e.g., Mitchell et al., 1986). Different versions of EBL have been applied to a variety of tasks, such as learning concepts, control rules, and planning and scheduling, but the majority of these applications have been in small domains.

2. The Task Domain
The aim of our research is to develop techniques to create AI targets (AIT) capable of performing intelligent behavior in tactical air combat. The tactical behavior includes BVR and close combat, in a BARCAP (barrier combat air patrol) scenario for an F16 plane. The task is the intelligent control of the AIT from an AI station connected to the main simulation system via Ethernet (see Figure 1).
+--------------+    Ethernet    +--------------------+
|  AI Station  |<-------------->| Simulation System  |
|              |    network     |      (ITEMS)       |
+--------------+                +--------------------+

Figure 1. The hardware structure for the intelligent
control of scenario elements.
The ITEMS simulation system can run a large number of independent agents, called scenario elements or "targets", in a real-time 3-D environment representing geographical, atmospheric and terrain data. In a scenario, the scenario elements can be controlled by human operators or by control programs. ITEMS itself has rule-based facilities for developing control systems that create automated agents.
The acquisition of knowledge and skills for complex real-time behavior, as in tactical air combat, is a difficult task. Hand-coding rules for such behavior is rather tedious, as it is difficult to foresee all possible interactions. Therefore, machine learning methods need to be used for the acquisition of such knowledge and skills. Some inductive methods have been used in acquiring the rules of intelligent behavior, e.g. from flight data obtained from exercises (see, e.g., Crowe, 1990; Sammut et al., 1992). However, inductive methods require a large number of training examples in order to support reasonably acceptable behavior. Additionally, it is difficult, if not impossible, with inductive methods to integrate capabilities for the intelligent agent to explain its own behavior in every tactical situation. Behavioral explanations for intelligent agents have been studied by Johnson (1994) using SOAR, but the explanations provided by Johnson's Debrief system are post-flight explanations, rather than real-time explanations.
We have been developing an integrated system called RSIM, capable of controlling an F16 in the ITEMS simulation environment in an intelligent and human-like way. The RSIM system learns tactical behavior in training sessions, and produces and explains its agent's behavior in real time during the execution of a mission. The program consists of two subsystems: the Cognition-Action subsystem, and the Learning and Explanation subsystem (see Figure 2).
RSIM has been tested on a 2-dimensional simulation system for BARCAP missions in 1-v-1 tactical situations, with successful results. The program learns to patrol a region around a waypoint in the Forward Battle Area (FBA), and to engage a hostile target as soon as the situational conditions are satisfied. RSIM also learns the explanation of its target's behavior in each tactical situation during training exercises, and produces the same explanations in similar situations during scenario executions. The program is now being tested on the SG Flight Simulator, and will be adapted to ITEMS as soon as the latter is installed.

[Figure 2 is reproduced here only in outline: after the initial conditions are set, RSIM runs two subsystems side by side. The Learning and Explanation subsystem generalizes situations, calls the expert for actions and explanations, and formulates rules. The Cognition-Action subsystem reads the situation (x-y coordinates, headings, angle, distance, time, missile range, missile count, fuel) from the Simulation System through Situation Assessment, and sends missile control/fire, maneuver selection and action explanation messages back through Action Management.]

Figure 2. Control structure of RSIM.


3. RSIM's Control Structure
In order to explain RSIM's operation, we describe the program in terms of its problem space, its subsystems, and its inputs and outputs. Each subsystem and its operators are described below.
3.1 Cognition-Action Subsystem
The Cognition-Action subsystem of RSIM consists of two modules: Situation Assessment and Action Management. An intelligent agent operating in a real-time environment must be able to assess the situation effectively in real time. RSIM's Cognition-Action subsystem performs this task through its Situation Assessment operator.
3.1.1. Situation Assessment
The problem space of RSIM consists of two targets moving in a two-dimensional space. There are 12 state variables for these targets. The names of these variables and their types are as follows (one possible encoding of this state vector is sketched after the list):
x, y coordinates (AIT/MCT)     (integer)
Headings (AIT/MCT)             (8 directions)
Distance between targets       (real)
Positional angle (AIT -> MCT)  (real)
Time                           (real-time)
Missile range                  (integer)
Missiles fired (AIT/MCT)       (integer)
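
The paper leaves the concrete encoding of this state vector open; the following is a minimal sketch in Python of one possible representation (all field names are our own, hypothetical choices):

from dataclasses import dataclass

# Hypothetical encoding of RSIM's 12 state variables. AIT is the AI
# target, MCT the manually controlled target.
@dataclass
class CombatState:
    ait_x: int                 # x coordinate (AIT)
    ait_y: int                 # y coordinate (AIT)
    mct_x: int                 # x coordinate (MCT)
    mct_y: int                 # y coordinate (MCT)
    ait_heading: str           # one of 8 compass directions, e.g. "E"
    mct_heading: str
    distance: float            # distance between targets (derived)
    angle: float               # positional angle AIT -> MCT (derived)
    time: float                # real time
    missile_range: int
    ait_missiles_fired: int
    mct_missiles_fired: int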

The values of the state variables determine the problem situation at every instant. As the targets change their positions every 1/2 second, the problem situation changes accordingly. At every cycle, RSIM has to assess the situation and decide which action to take. Only some of the state values are provided by the Simulation System: the x-y coordinates of both targets, their headings, the real time, the missile range, and the fired missile count. The Situation Assessment operator reads the x-y coordinates and calculates the real distance and the positional angle between the two targets.
Once the values for real distance and angle are calculated, these are classified into fuzzy values. The state variables and their values are sent to a message list by the Situation Assessment operator. This message list is read by the Action-Management operator.
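
As a concrete illustration, the distance and angle computation and the subsequent fuzzy classification might look as follows in Python. The range boundaries are purely illustrative; the paper does not give numeric values for the D/A ranges used by RSIM.

import math

# Illustrative fuzzy ranges; RSIM's actual boundaries are not published.
DISTANCE_BINS = [(0, 10, "D1"), (10, 25, "D2"), (25, 50, "D3"),
                 (50, 100, "D4"), (100, 200, "D5"), (200, 1e9, "D6")]
ANGLE_BINS = [(i * 45.0, (i + 1) * 45.0, f"A{i + 1}") for i in range(8)]

def classify(value, bins):
    """Map a real value onto its fuzzy range label."""
    for lo, hi, label in bins:
        if lo <= value < hi:
            return label
    return bins[-1][2]

def assess_situation(ait_x, ait_y, mct_x, mct_y):
    """Compute distance and positional angle AIT -> MCT, then
    classify both into fuzzy values for the message list."""
    dx, dy = mct_x - ait_x, mct_y - ait_y
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return {"distance": classify(distance, DISTANCE_BINS),
            "angle": classify(angle, ANGLE_BINS)}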

3.1.2. Action Management
The Action-Management operator has three functions: Select-Maneuver, Missile-Control, and Explain-Behavior. The Select-Maneuver function decides which action to take for the AIT by reading the message list and matching the operational variables in it against the action rule set. The rule that matches the current situation is selected as the action rule in effect.
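
A minimal sketch of this matching step in Python might look like the following. The rule format is assumed from the example in Figure 4 below; the dictionary keys are hypothetical.

def select_maneuver(situation, rules):
    """Return (maneuver, explanation) from the first rule whose
    conditions all hold in the current message-list situation,
    or None if no rule matches."""
    for rule in rules:
        if all(situation.get(var) == val
               for var, val in rule["conditions"].items()):
            return rule["action"], rule["explanation"]
    return None  # during training this triggers the learning subsystem

# Example rule in the assumed format:
rules = [{"conditions": {"distance": "D6", "angle": "A5",
                         "heading_ait": "E", "heading_mct": "W"},
          "action": "ss",
          "explanation": "Target detected. Approach target."}]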

Here, each action rule points to a simple maneuver, where each maneuver consists of a four-pixel motion. There are five such simple maneuvers: go straight (ss), soft turn right (sr), hard turn right (hr), soft turn left (sl), and hard turn left (hl) (see Figure 3). In this way, each maneuver lasts two seconds (4 pixels at 1/2 second each).
[Figure 3 sketches the five simple maneuvers as a fan of headings: go straight (ss) dead ahead, soft turns (sl, sr) at a small angle to either side, and hard turns (hl, hr) at a wider angle.]

Figure 3. Five simple maneuvers for RSIM targets.
Although the selected maneuvers last two seconds, situation assessments continue to be carried out at every 1/2-second cycle, and the message list is read by Action-Management at every cycle. In this way, when the AIT enters the missile fire zone during a simple maneuver, the Missile-Control function fires a missile, provided a missile is available.
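
This interleaving of 2-second maneuvers with 1/2-second assessment cycles can be sketched as a simple control loop. This is our reading of the mechanism, not RSIM's actual scheduler; all callback names are hypothetical.

def control_loop(assess, select_maneuver, step, fire, in_fire_zone,
                 missiles=2, cycles=100):
    """One maneuver spans four 1/2-second cycles, but situation
    assessment and Missile-Control run on every cycle, so a missile
    can be fired in the middle of a maneuver."""
    maneuver, steps_left = None, 0
    for _ in range(cycles):
        situation = assess()                  # every 1/2-second cycle
        if steps_left == 0:                   # previous maneuver done
            maneuver, steps_left = select_maneuver(situation), 4
        step(maneuver)                        # one pixel of the maneuver
        steps_left -= 1
        if in_fire_zone(situation) and missiles > 0:
            fire()                            # Missile-Control function
            missiles -= 1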

The Action-Management operator can explain the reason for selecting a particular maneuver by sending a message, which appears in a screen window during the execution of that maneuver. In this way, the behavior of the AIT is explained for every simple maneuver in a continuous sequence of maneuvers.

All of the messages of the Action-Management operator, including the explanations, are sent to the Simulation System, which translates the maneuver messages into single-step actions. For example, a message that says apply the go-straight (ss) maneuver is performed by moving the target four pixels along the target heading, keeping the heading constant.
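
Such a translation could be sketched as follows. The turn angles (45 degrees for a soft turn, 90 degrees for a hard turn on the 8-direction compass) are assumptions on our part; the paper only names the maneuvers.

COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
STEP = {"N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
        "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1)}
TURN = {"ss": 0, "sr": 1, "hr": 2, "sl": -1, "hl": -2}  # compass steps

def execute_maneuver(x, y, heading, maneuver):
    """Apply the (assumed) heading change once, then move four pixels
    along the new heading, one pixel per 1/2-second step."""
    heading = COMPASS[(COMPASS.index(heading) + TURN[maneuver]) % 8]
    for _ in range(4):
        dx, dy = STEP[heading]
        x, y = x + dx, y + dy
    return x, y, heading

# e.g. execute_maneuver(0, 0, "E", "ss") yields (4, 0, "E")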

3.2. The Learning and Explanation Subsystem
RSIM has a learning subsystem which learns action rules for the AIT by an explanation-based generalization (EBG) mechanism. Action rules and explanations are learned incrementally during training sessions. Action rules are if-then rules that match situations with simple maneuvers. At each problem state, the operational variables in the message list, periodically updated by the Situation Assessment function, are taken as the current situation. If no rule exists that matches the current situation, the Learning subsystem asks the trainer which maneuver to select. The Learning subsystem then generalizes the current situation and records it as the conjunctive condition part of a rule whose conclusion, or action part, proposes to apply the selected maneuver. The generalization maps the values of the operational situation variables from real values onto predetermined ranges; in this way, the distance and angle between the two targets are mapped into a particular distance range and angle range.
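
This learning step can be sketched as follows; the bucket widths used in generalize() are illustrative stand-ins for RSIM's predetermined ranges, which the paper does not specify.

def generalize(raw):
    """Map the real-valued distance and angle of the current
    situation onto predetermined ranges (illustrative widths)."""
    return {"distance": f"D{min(int(raw['distance'] // 50) + 1, 8)}",
            "angle": f"A{int(raw['angle'] // 45) + 1}",
            "heading_ait": raw["heading_ait"],
            "heading_mct": raw["heading_mct"]}

def learn_rule(raw_situation, rules, ask_trainer):
    """If no rule covers the generalized situation, ask the trainer
    for a maneuver and its explanation, and record a new rule."""
    situation = generalize(raw_situation)
    if any(r["conditions"] == situation for r in rules):
        return
    maneuver, explanation = ask_trainer(situation)
    rules.append({"conditions": situation,
                  "action": maneuver,
                  "explanation": explanation})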

The trainer also gives an explanation of why that particular maneuver was selected. This explanation is associated with the rule generated for the current situation as the reason for selecting the rule. An example rule is shown in Figure 4.
-------------------------------------------------

Conditions:   Distance is D6, and
              Angle is A5, and
              Heading(AIT) is E, and
              Heading(MCT) is W.

Action:       Apply maneuver SS.

Explanation:  Target detected. Approach target.

-------------------------------------------------
Figure 4. Example of a rule generated by RSIM.

The rule in Figure 4 says that when the distance between the AIT and the MCT is within range D6, the angle is within range A5, the heading of the AIT is east, and the heading of the MCT is west, then the AIT should continue to go straight. The reason for this maneuver in the current situation is that the target MCT has been detected, and the intention is to approach it.
RSIM can apply the rules that it has generated as soon as a matching situation arises. In other words, the program generates and uses its rules dynamically, rather than first storing them in a rule database. Once the scenario ends (e.g. when a target is shot), the learned rules can be transferred from dynamic memory to a rule file for future use.
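
This transfer is straightforward; a sketch follows, where the JSON file format is our assumption (the paper does not describe the rule file).

import json

def save_rules(rules, path="rsim_rules.json"):
    """Dump the dynamically learned rules to a rule file when the
    scenario ends, for reuse in later sessions."""
    with open(path, "w") as f:
        json.dump(rules, f, indent=2)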

References
Crowe, M.X. (1990). The application of artificial neural systems to the training of air combat decision-making skills. In Proceedings of the 12th ITSC, pp. 302-312.
Halski, D.J., Landy, J.R. & Kocher, J.A. (1991). Integrated control and avionics for air superiority: A knowledge-based decision-aiding system. AGARD CP-424, Madrid, 1991, pp. 53-1 to 53-10.
Johnson, W.L. (1994). Agents that explain their own actions. In Proceedings of the 4th Conference on Computer Generated Forces, May 1994, Orlando, Florida.
Jones, R.M., Laird, J.E., Tambe, M. & Rosenbloom, P.S. (1994). Generating goals in response to interacting goals. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation.
Mitchell, T., Keller, R.M., and Kedar-Cabelli, S.T. (1986). Explanation-based generalization: A unifying view. Machine Learning 1 (1), 47-80.
Rosenbloom, P.S., Johnson, W.L., Jones, R.M., Koss, F., Laird, J.E., Lehman, J.F., Rubinoff, R., Schwamb, K.B. & Tambe, M. (1994). Intelligent automated agents for tactical air simulation: A progress report. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation, pp. 69-78.
Sammut, C., Hurst, S., Kedzier, D., and Michie, D. (1992). Learning to fly. In Machine Learning Workshop Proceedings, pp. 385-393. Morgan Kaufmann.
