% This is LLNCS.DEM the demonstration file of
% the LaTeX macro package from Springer-Verlag
% for Lecture Notes in Computer Science, version 1.1
\documentclass{llncs}
%
\begin{document}
\title{Little Green BATS Humanoid 3D Simulation Team Research Proposal}
\author{Sander van Dijk\inst{1}, Martin Klomp\inst{1}, Herman Kloosterman\inst{1}, Bram Neijt\inst{1}, Matthijs Platje\inst{1}, Mart van de Sanden\inst{1} and Erwin Scholtens\inst{1} \\ Supervisor: Tijn van der Zant\inst{1,2} }
\institute{University of Groningen, Artificial Intelligence Department, The Netherlands \and Executive Committee RoboCup@Home}
\maketitle
\begin{abstract}
As required for the RoboCup 3D league of 2007, this proposal describes the research and implementation planned for the Little Green BATS team. The research will focus on two hierarchical behavior systems that are used to construct an agent's intelligence. The systems can largely be constructed by hand, but also have the capability to be improved by learning mechanisms. One of the hierarchical systems, which will be part of one team member's master's thesis, will even be able to learn its hierarchical structure from the ground up.
\end{abstract}
%
\section{Introduction}
Agents that live in a complex environment usually face a difficult problem: they need to show high-level intelligence to survive, but the only way to directly interact with the world is through very basic senses and actions. They need a form of abstraction over these senses and actions to be able to act and respond to difficult situations in real time. This abstraction has to take place at different levels, because there is no clear distinction between low-level and high-level behavior. It is even hard to tell which kinds of behavior are of a higher level than others, so abstraction should be possible to any degree, at any level, and on top of any abstraction already made at lower levels.

To account for this, our research will focus on hierarchical behavior models, where the agent's intelligence is constructed as a tree of subbehaviors, each controlling behaviors at lower levels to supply a new abstraction of the agent's senses and actions. We will use two different hierarchical models. One focuses on constructing the behavior tree by hand using high-level human knowledge, either top down or bottom up, with the ability to improve the behavior through learning. The other, part of one team member's master's thesis, focuses on learning the tree from the ground up with as little human interaction as possible. We will research the differences in performance and usability of the two models, and ways to integrate them so that the strong points of each can be combined.

In the next section we lay out the first proposed model; Section~3 summarizes the proposal for the master's thesis on the second model. Ways to combine the two models are a topic for future research.
\section{BATS Hierarchical Behavior Model}
The first hierarchical model will initially form the base of the BATS Humanoid Simulation soccer agents. It is based on the following key ideas:
\begin{enumerate}
\item A type of behavior is defined by the type of goals it tries to achieve.
\item A behavior can set subgoals to be achieved by other behaviors.
\item An agent has one highest-level goal (e.g.\ `win the game' for a soccer agent).
\item Behavior selection is done based on a behavior's capability to reach a goal.
\end{enumerate}
A behavior therefore consists of a sequence of steps, where each step has a subgoal and a set of subbehaviors that can be picked to achieve that subgoal. This results in a tree of behaviors, with the most abstract, top-level behavior at the root and the most primitive, lowest-level behaviors at the leaves. These leaf behaviors do not perform any behavior selection; they only perform real-world actions. A code sketch of this structure is given at the end of this section.

Several questions arise from this model, which we attempt to answer with our research:
\begin{enumerate}
\item What is the best strategy to run through the sequence inside a behavior? Should it only be possible to traverse it from start to end, step by step? Or does performance increase when it is possible to skip steps, so that the end goal can be reached faster, but at the risk of skipping crucial subgoals?
\item Initially, the capabilities of behaviors to achieve a goal are defined by the human designers. Which learning method is best suited to improve these values, or to learn them from scratch, given real-world experience?
\end{enumerate}
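The sketch below illustrates how such a goal-driven behavior tree could be organized. It is a minimal illustration, not the actual BATS implementation: all class and attribute names are hypothetical, and the constant capability value stands in for the hand-set or learned estimates discussed in the second research question.
\begin{verbatim}
# Minimal sketch of the goal-driven behavior tree; all names are
# hypothetical, not the actual BATS implementation.

class Goal:
    def __init__(self, name, achieved):
        self.name = name
        self.achieved = achieved  # predicate: state -> bool

class Behavior:
    # A sequence of steps; each step pairs a subgoal with the
    # subbehaviors that may be selected to achieve it.
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # list of (Goal, [Behavior, ...])

    def capability(self, goal, state):
        # Hand-set initially; to be improved or learned from
        # experience (research question 2).
        return 1.0

    def act(self, state):
        for subgoal, candidates in self.steps:
            if subgoal.achieved(state):
                continue  # step-skipping policy: research question 1
            # Select the subbehavior judged most capable of
            # achieving this subgoal.
            best = max(candidates,
                       key=lambda b: b.capability(subgoal, state))
            return best.act(state)
        return None  # all subgoals achieved

class PrimitiveBehavior(Behavior):
    # Leaf of the tree: performs a real-world action instead of
    # selecting among subbehaviors.
    def __init__(self, name, action):
        super().__init__(name, [])
        self.action = action

    def act(self, state):
        return self.action(state)
\end{verbatim}
The root behavior would hold the single highest-level goal (`win the game'); each simulation cycle, a call to \texttt{act} on the root recurses down the tree until a primitive action is produced.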
\section{Hierarchical Multi-Agent Reinforcement Learning}
The second hierarchical model extends existing hierarchical Reinforcement Learning techniques. The main focus is on letting agents determine their own behavior from the ground up, sped up by abstracting the behavior in a hierarchical way. This way the agents can perfect their behavior and cooperation at every level and still react swiftly to unexpected situations.

Until recently, learning, planning and representing knowledge at higher levels of abstraction were key problems in the field of Reinforcement Learning. Several solutions have been suggested that decompose the learning of complex behaviors into hierarchies, as opposed to using so-called \emph{flat behaviors}; most notably MAXQ learning, Hierarchical Abstract Machines (HAMs) and the Options model \cite{dietterich00,parr97,sutton99,barto03}.

As in the model described in the previous section, the components of these hierarchies, their place in the hierarchy and the abstractions used were initially decided upon by the human designer, using his higher-level knowledge of the domain. However, even though solutions found this way can be good, the restrictions set by the designer will usually cause them to be sub-optimal. This is why some researchers have looked into automatic subgoal discovery and the learning of task hierarchies \cite{mcgovern01,bakker04}.

In a multi-agent environment like robotic soccer, complex behavior such as communication and coordination can increase the performance of an individual agent and of the system as a whole. Multi-agent learning is very complex because of the rise in dimensionality and the partial observability: the behavior of the other agents is not always known or predictable. However, learning together can benefit from dividing the main task into subtasks, possibly decreasing the complexity of the behavior of the separate agents. Hierarchical Multi-Agent Reinforcement Learning (HMARL) can help to address these complexity problems and accelerate learning \cite{ghavamzadeh06}.

The research concerning the second model will use these techniques and build upon them. It will be the first time that the theories on hierarchical multi-agent reinforcement learning and automatic option discovery are combined. This removes the requirement to prematurely fix agent structures in multi-agent environments, which restricts their final abilities, while the hierarchical model can still be used for abstraction, making learning faster and coordination between agents easier.

The research will also be a practical test case for the relatively new field of Hierarchical Reinforcement Learning. Until now, most environments used to demonstrate and test these theories have been simple, discrete and created specifically for them. By using the competitive RoboCup environment as a base, the theory can be tested in a `real' world against other methods that have proven successful.

The following questions will be answered (a sketch of the central Options model follows this list):
\begin{enumerate}
\item Is a hierarchical behavior-based system, notably the Options model, applicable to a (semi-)continuous domain like Robosoccer?
\item Is it possible for Robosoccer agents to automatically discover usable Options?
\item Can heterogeneous Robosoccer agents learn to play well together by constructing their hierarchies simultaneously?
\item Is the abstraction of the task by the hierarchical models sufficient to effectively switch and learn a strategy against a competing team within a match?
\end{enumerate}
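To make the Options model concrete, the following minimal sketch shows the standard option triple (initiation set, internal policy, termination condition) of \cite{sutton99}, together with the SMDP Q-learning update that learns values over options. It is illustrative only and assumes discrete, hashable states; how to represent the (semi-)continuous Robosoccer state, and where the options themselves come from, is exactly what the first two research questions must settle.
\begin{verbatim}
# Sketch of the Options model; illustrative names, and discrete
# hashable states are assumed.
from collections import defaultdict

class Option:
    def __init__(self, initiation, policy, termination):
        self.initiation = initiation    # state -> bool: may start here?
        self.policy = policy            # state -> primitive action
        self.termination = termination  # state -> termination prob.

# Q-values over (state, option) pairs, defaulting to zero.
Q = defaultdict(float)

def smdp_q_update(Q, s, o, r, k, s_next, options,
                  alpha=0.1, gamma=0.99):
    # One SMDP Q-learning step after option o ran for k time steps
    # from state s, accumulating discounted reward r and ending in
    # state s_next.
    available = [o2 for o2 in options if o2.initiation(s_next)]
    best_next = max((Q[(s_next, o2)] for o2 in available),
                    default=0.0)
    Q[(s, o)] += alpha * (r + gamma**k * best_next - Q[(s, o)])
\end{verbatim}
Automatic option discovery (research question 2) would then amount to constructing new such triples at run time, e.g.\ around automatically found subgoal states \cite{mcgovern01}.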
%
% ---- Bibliography ----
\bibliographystyle{plain}
\bibliography{voorstel}
%
\end{document}