% HelliBASH 3D Team Description Paper
\documentclass{llncs}
\usepackage{graphicx}
%\textwidth=450pt

\begin{document}

\title{Simulated Humanoid Robot Controller\\Team Description Paper\\HelliBASH3D}
\author{Maziar Amini Zanjani (maziar.aminizanjani@gmail.com) \\
	Sina Saharkhiz (sina.saharkhiz@gmail.com)\\
	Team leaders:\\
	Eugene Ghanizadeh (ghanizadeh.eugene@gmail.com),\\
	Pooria Kaviani (pooria.kaviani@gmail.com)}
\institute{Department of Computer Science and Information Technology \\ Allame Helli High School (NODET), Iran}
\maketitle

\begin{abstract}
This paper describes the research and current status of the HelliBASH 3D
soccer simulation team. We explain some facts about the simulation base, the
basic skills that a humanoid robot should have, and some factors that are
necessary to maintain the stability of a biped robot.
\end{abstract}

\section{Introduction}
As a starting point for describing 3D soccer simulation, note that a soccer
agent reasons about and decides which action (walking, turning, shooting,
standing up, etc.) should be performed next. To determine the next action,
the agent must consider many parameters (the agent's position, the ball's
position, the gyro rate, etc.). Also, in order to play well, there should be
some kind of cooperation between the agents.

\section{Behaviors}
\begin{figure}[htp]
\centering
\includegraphics[scale=0.5]{Pics/chart.eps}
\caption{The agent's decision cycle.}
\label{fig:chart}
\end{figure}
The soccer simulation server sends data to the agent (Fig.~\ref{fig:chart}).
The agent must parse the data and decide on the best action to perform. Then
the agent sends the corresponding commands back to the server. \\
The agents must contain procedures that manage their behaviors. A human
soccer player obviously has these abilities. In order to reach our goal, the
agent must have some basic (but necessary) skills. \\
We've provided our agents with two kinds of skills: BasicSkills and
AdvancedSkills. Each AdvancedSkill uses some BasicSkills, and each BasicSkill
sets the properties of the joints directly to make them move. The agent has
several plans, named GoalKeeper Plan, Defender Plan, MidFielder Plan, and
Striker Plan. It runs a plan according to the player type, and the plan
chooses which AdvancedSkill to use and where to use it (a minimal sketch of
this layering follows the list below). \\
Some of the agent's advanced skills are listed below:
\begin{enumerate}
\item (GoalKeeper): saving the ball.
\item (All Types): walking while keeping stability.
\item (All Types): standing up after falling on the ground.
\item (Striker): finding the ball, turning toward it, walking toward it,
stopping near the ball, aligning the body with the ball, and shooting.
\item Having a plan for the play: note that some events can affect our plan,
so it is better if the plan is dynamic.
\end{enumerate}
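As an illustration of this layering, the minimal Python sketch below shows
how a plan might select an AdvancedSkill composed of BasicSkills. All class
and method names here are hypothetical illustrations, not identifiers from
our actual code base.
\begin{verbatim}
# Hypothetical sketch of the plan/skill layering described above.

class BasicSkill:
    """Sets joint properties directly (lowest layer)."""
    def execute(self, joints):
        raise NotImplementedError

class SwingLeg(BasicSkill):
    def execute(self, joints):
        joints["left_hip"] = 0.3    # example target angles (radians)
        joints["left_knee"] = -0.6

class AdvancedSkill:
    """Composes several BasicSkills into one behavior."""
    def __init__(self, steps):
        self.steps = steps          # ordered BasicSkills
    def run(self, joints):
        for step in self.steps:
            step.execute(joints)

walk_to_ball = AdvancedSkill([SwingLeg()])  # plus more BasicSkills
stand_up = AdvancedSkill([])                # omitted for brevity

class StrikerPlan:
    """Chooses which AdvancedSkill to run from the world state."""
    def decide(self, world):
        return stand_up if world["fallen"] else walk_to_ball
\end{verbatim}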
\section{Walking}
Walking is one of the most important (and perhaps the hardest) of the
critical skills. There are two main factors in a walking cycle:
\begin{enumerate}
\item Stability:\\
The main parameter of the walking skill is stability; higher stability allows
faster walking. There are several important parameters for improving the
robot's stability~\cite{2}. There are simple, static algorithms that let the
robot walk statically, but they are slow; one example is ``step by step
walking'' (making the robot stable after every footstep). So it is better to
look for a dynamic algorithm.
\item Speed:\\
As mentioned above, this factor could be considered part of stability, but it
is important enough to list as a separate item.
\end{enumerate}

\section{Humanoid stability}
Among all of the agent's abilities, stability is a key problem. The robot
must be stable in all situations (walking, standing up, etc.), so it is
necessary to use optimized algorithms to keep the biped robot stable.\\
Since keeping a biped stable is also an issue for real humanoids, we can
reuse research in this field:\\
``Controlling fast biped robots requires a deeper knowledge of biped walking
than for slow robots. While slow robots walk statically, a fast biped has to
be dynamically balanced. A very interesting point is that, in spite of all
the research on walking techniques, robots have not reached the stability and
speed of normal human walking.''~\cite{6}\\
Considering the physical aspects of the robot, there are some useful
parameters that can be used to reach stability:
\begin{enumerate}
\item Center Of Mass (COM)\\
Assuming that each part of the robot's body (modeled as a box) has a specific
mass and dimensions, we can calculate the COM of each box, and from these the
COM of the whole body.\\
If the shadow (ground projection) of the COM falls inside the support
polygon, the situation is stable. (Note that the support polygon is the
convex hull of the stance foot.)\\
The most stable situation is reached when the shadow of the COM is located at
the center of the support polygon. So our goal is to move the joints in such
a way that the shadow of the COM moves to the center of the support polygon.\\
The COM can be computed from the formula below:
\begin{center}
$ P_{COM} = \frac{\sum_{i=1}^n P_i \, m_i}{\sum_{i=1}^n m_i} $
\end{center}
where $m_i$ is the mass of the $i$th box and $P_i$ is the location of the COM
of the $i$th box. \\
By comparing the position of the COM with the center of the support polygon,
the robot can reach stability by moving the shadow of the COM toward that
center. We've designed a middleware controller system that determines the
required joint movements so that the COM reaches the center of the convex
hull and the robot gains static physical stability (a sketch of the COM
computation follows this list).\\
\begin{figure}[htp]
\centering
\includegraphics[scale=.6]{Pics/COM.eps}
\caption{Stable and unstable positions of the COM shadow relative to the
support polygon.}
\label{fig:com}
\end{figure}
In Fig.~\ref{fig:com}, crosses 1 and 2 are stable points (cross 1 is at full
stability), while crosses 3 and 4 are clearly not stable.
\item Zero Moment Point (ZMP)\\
The information that the COM provides is not sufficient for maintaining
dynamic physical stability (as opposed to static skills like static walking),
since in a dynamic physical action the overall torques on the robot body
should be zero. A dynamic physical action contrasts with static physical
actions, in which the whole system reaches equilibrium after a short time. So
we need another technique, one that includes moment effects as well. \\
A useful parameter for providing dynamic stability is the Zero Moment Point
(ZMP). The ZMP is the point on the ground where the total moment generated by
gravity and inertia equals zero. This concept was introduced in January 1968
by Miomir Vukobratovi\'{c} at the Third All-Union Congress of Theoretical and
Applied Mechanics in Moscow. It specifies the point with respect to which the
dynamic reaction force at the contact of the foot with the ground produces no
moment~\cite{7}. If the ZMP lies inside the support polygon, the robot has
less overall moment and more stability~\cite{1} (a sketch of this computation
also follows the list).
\begin{center}
$ X_{ZMP} = \frac{\sum_{i=1}^n m_i(\ddot{z}_i-g_z)x_i-\sum_{i=1}^n m_i(\ddot{x}_i-g_x)z_i-\sum_{i=1}^n (\dot{T}_y)_i}{\sum_{i=1}^n m_i(\ddot{z}_i-g_z)} $
\end{center}
\begin{center}
$ Y_{ZMP} = \frac{\sum_{i=1}^n m_i(\ddot{z}_i-g_z)y_i-\sum_{i=1}^n m_i(\ddot{y}_i-g_y)z_i-\sum_{i=1}^n (\dot{T}_x)_i}{\sum_{i=1}^n m_i(\ddot{z}_i-g_z)} $
\end{center}
$(\dot{T}_x)_i$ and $(\dot{T}_y)_i$ denote the $x$ and $y$ components of the
time derivative of the $i$th box's angular momentum.
\item Motion Generation Algorithm\\
One of the best-known algorithms for walking is motion generation. In this
algorithm, we first generate the body's path, and then, in each cycle, a
middleware action controller sets the joint angles so that each joint reaches
the correct place (Fig.~\ref{fig:pathgeneration})~\cite{4}.\\
\begin{figure}[htp]
\centering
\includegraphics[scale=0.3]{Pics/pathgeneration.eps}
\caption{Path generation for the walking motion.}
\label{fig:pathgeneration}
\end{figure}
\item Tracking Control\\
In this algorithm, we divide the process of controlling the robot into two
parts (a combined sketch follows this list):\\
\begin{enumerate}
\item Offline Trajectory Generation\\
This part's duty is to perform the necessary calculations before the agent
runs and to generate the best reference trajectory.\\
\item Real-Time Motion Control\\
This part is the real-time control. Here another algorithm runs in parallel
with the main trajectory (generated by the offline trajectory generation) and
corrects it by analyzing the factors that affect walking~\cite{3,5}. Some
dynamic factors affect the robot in ways that prevent us from using a static
algorithm for a skill, so we have to use some kind of dynamic algorithm to
keep the robot stable. Dynamic stability is the most important problem in
controlling a humanoid robot. This algorithm must check the robot's stability
with the robot's sensors (such as the gyro rate sensor or the foot force
resistance sensors, or by calculating the COM or ZMP). If it detects any
instability, it tries to correct the main trajectory. There should also be a
kind of self-learning so that the agent learns the best behavior; this is
explained later.
\end{enumerate}
\end{enumerate}
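The COM formula above translates directly into code. The following minimal
Python sketch computes the mass-weighted average of per-box COM positions and
tests whether its ground projection falls inside the support polygon. The box
data and the point-in-polygon helper are illustrative assumptions, not our
actual body model.
\begin{verbatim}
# Minimal COM sketch: each body part is a box with a known mass
# and COM position; the values below are placeholders.

def center_of_mass(boxes):
    """P_COM = sum(P_i * m_i) / sum(m_i) over all body boxes."""
    total = sum(m for _, m in boxes)
    return tuple(sum(p[k] * m for p, m in boxes) / total
                 for k in range(3))

def shadow_in_polygon(com, poly):
    """Ray casting: is the COM's (x, y) projection inside poly?"""
    x, y, inside = com[0], com[1], False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and \
           x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

boxes = [((0.0, 0.0, 0.5), 2.0), ((0.05, 0.0, 0.3), 1.5)]  # (P_i, m_i)
foot = [(-0.05, -0.03), (0.05, -0.03), (0.05, 0.03), (-0.05, 0.03)]
com = center_of_mass(boxes)
print(com, shadow_in_polygon(com, foot))
\end{verbatim}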
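The ZMP expressions can be evaluated in the same fashion. The sketch below
implements the $X_{ZMP}$ component from per-box masses, positions,
accelerations, and angular-momentum rates; all numerical inputs are
placeholders, and the horizontal gravity component $g_x$ is assumed to be
zero.
\begin{verbatim}
# Minimal X_ZMP sketch following the formula above; inputs are
# assumed per-box quantities, not values from our simulator.

G_Z = -9.81  # z component of gravity (m/s^2)

def x_zmp(parts):
    """Each part: mass m, position (x, z), acceleration (ax, az),
    and dTy, the rate of angular momentum about the y axis."""
    num = sum(p["m"] * (p["az"] - G_Z) * p["x"]
              - p["m"] * p["ax"] * p["z"]   # g_x assumed 0
              - p["dTy"]
              for p in parts)
    den = sum(p["m"] * (p["az"] - G_Z) for p in parts)
    return num / den

parts = [dict(m=2.0, x=0.01, z=0.5, ax=0.2, az=0.0, dTy=0.0),
         dict(m=1.5, x=0.03, z=0.3, ax=0.1, az=0.0, dTy=0.0)]
print(x_zmp(parts))  # compare with the support-polygon bounds
\end{verbatim}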
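Finally, a compressed sketch of the two-part tracking scheme: an offline pass
precomputes a reference trajectory, and an online loop applies a simple
gyro-based correction each cycle. The sinusoidal pattern, the sensor
interface, and the proportional gain are assumptions for illustration, not
our implementation.
\begin{verbatim}
# Sketch of offline trajectory generation plus real-time
# correction; the gain and gyro interface are assumed.
import math

def generate_trajectory(cycles):
    """Offline part: reference hip-pitch angles for one walking
    cycle (a toy sinusoidal pattern)."""
    return [0.3 * math.sin(2 * math.pi * t / cycles)
            for t in range(cycles)]

def control_step(reference, gyro_pitch_rate, k=0.05):
    """Online part: proportional correction from gyro feedback
    to keep the trunk upright."""
    return reference - k * gyro_pitch_rate

trajectory = generate_trajectory(50)   # computed before the match
for ref in trajectory:                 # once per simulation cycle
    gyro = 0.0                         # would be read from the sensor
    command = control_step(ref, gyro)
    # send `command` to the hip pitch joint here
\end{verbatim}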
\section{Future Plans\cite{8,9}}
\textbf{Reinforcement Learning}\\
To use the tracking control technique and the ZMP, we need a learning system.
Learning systems are mostly used for hard problems whose state space has too
many cases to be examined exhaustively. These systems can be useful for our
program, because we cannot examine all of the cases of walking.\\
Through our research it gradually became clear that Reinforcement Learning
(RL) suits our needs. First, let us explain what it is and how it works.\\
\begin{enumerate}
\item How it works:\\
Imagine a function that performs an action in each situation. There is also a
second function that tells the first one the result of that action, and the
first function, depending on that result, changes its action, and so on. The
return value of the second function is called the reward.
\item How to describe the problem as a model (Fig.~\ref{fig:rl}):
\begin{figure}[htp]
\centering
\includegraphics[scale=0.45]{Pics/Diagram1.eps}
\caption{Model of the reinforcement learning problem.}
\label{fig:rl}
\end{figure}
\begin{enumerate}
\item Choose an action to perform
\item Depending on the value of the reward, select an action and return it
\item Perform this action
\item The result was ...
\item Decide how much reward to give for the result and the movements
\item Return the requested value
\item Assign the reward values
\end{enumerate}
\item Why Reinforcement Learning:\\
Note that for each kind of problem there may be other suitable algorithms.
The advantages of RL that convinced us to use it are listed below:
\begin{enumerate}
\item It is not necessary to know what the environment is like:\\
As explained before, we need only minimal information from the environment,
namely the result of our action. It is a great advantage that no extra
connection is needed between the (possibly very complicated) environment and
the agent. We also do not have to worry about the environment's complexity
affecting the agent.
\item It is unsupervised:\\
A supervised learning program requires direct teaching; for example, a human
must tell the program whether each decision was good or not. This does not
fit our target, because we cannot decide whether the robot's walking was
optimal or not.
\end{enumerate}
\item General model for the agent\\
We assume $Q$ is a function that represents the expected reward if action
$A_j$ is chosen in state $S_i$. After the learning program has finished, the
agent examines and returns the action with the maximum $Q$ at each state. The
learning can also continue online, which is a very good advantage.\\
The remaining questions are:
\begin{enumerate}
\item How to modify $Q$ based on rewards:\\
A good choice for this part is the Q-learning algorithm. In brief, $Q$ is
initialized to some value and then, after each reward, $Q$ is updated
according to some policy.\\
\item What action to choose:\\
There are many algorithms for this part; discussing them is beyond the scope
of this document.
\end{enumerate}
\item RL's usage in our program\\
In soccer simulation, RL can be very useful. We can pass the joint states and
joint angles as the state and action parameters. For the reward calculation,
we can use the ZMP: if the ZMP is inside the support polygon, the robot is
stable, so we can use the distance between the ZMP and the center of the
support polygon as the reward (a minimal sketch follows this list).
\end{enumerate}
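To make the Q-learning idea concrete, the sketch below shows a tabular
Q-update with a reward derived from the distance between the ZMP and the
center of the support polygon, as proposed above. The learning rate, discount
factor, exploration rate, and state/action discretization are illustrative
assumptions only.
\begin{verbatim}
# Tabular Q-learning sketch with a ZMP-based reward; the
# hyperparameters and discretization are assumptions.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)]

def reward(zmp, center):
    """Closer ZMP to the support-polygon center => higher reward."""
    return -((zmp[0] - center[0]) ** 2
             + (zmp[1] - center[1]) ** 2) ** 0.5

def choose_action(state, actions):
    """Epsilon-greedy: mostly exploit, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, actions):
    """Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    best = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (r + GAMMA * best - Q[(state, action)])

actions = ["lean_left", "lean_right", "hold"]
a = choose_action("upright", actions)
r = reward((0.01, 0.0), (0.0, 0.0))   # ZMP vs. polygon center
update("upright", a, r, "upright", actions)
\end{verbatim}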
\bibliographystyle{plain}
\bibliography{voorstel}

\end{document}
