Open-ended strategy evolution

Category: free English papers. Editor: Prof. Wang. Updated: 2017-04-25
In a standard Genetic Algorithm (GA) following Holland [9], the genome is a fixed-length string composed of symbols taken from a finite alphabet. Such a genome can encode only a finite number of strategies. This finiteness imposes a ceiling upon the possible elaboration of strategy. This can be important where individuals are involved in the sort of modelling “arms-race” that can occur in situations of social competition, where the whole panoply of social manoeuvres is possible: alliances, bluff, double-crossing, lies, flattery etc. The presence of a complexity ceiling in such a situation (as would happen with a GA) can change the outcomes in a qualitatively significant way, for example by allowing the existence of a unique optimal strategy that can be discovered. This sort of ceiling can be avoided by using an open-ended genome structure, as happens in Genetic Programming (GP) or messy genetic algorithms. Within these frameworks, strategies can be indefinitely elaborated, so that it is possible that any particular strategy can be bettered with sufficient ingenuity.
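The contrast between a fixed ceiling and an open-ended space can be made concrete with a little counting. The sketch below is only illustrative: the function names, the restriction to binary operators, and the use of a depth bound as a stand-in for "computational resources" are all my assumptions, not part of the GA or GP definitions cited above.

```python
def fixed_genome_count(alphabet_size: int, length: int) -> int:
    """Number of distinct fixed-length genomes: a hard, finite ceiling."""
    return alphabet_size ** length


def tree_count(depth: int, n_leaves: int, n_binary_ops: int) -> int:
    """Number of distinct expression trees of depth at most `depth`.

    Assumes every internal node is a binary operator (an illustrative
    simplification). A tree of depth <= d is either a leaf, or an operator
    applied to two subtrees of depth <= d - 1.
    """
    count = n_leaves  # depth 0: a single leaf
    for _ in range(depth):
        count = n_leaves + n_binary_ops * count * count
    return count
```

For a fixed genome the count never changes, whereas for trees it grows with every extra level of depth that resources allow, so no finite list of strategies exhausts the space.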

Here I use the GP paradigm, since it provides a sufficiently flexible framework for the purpose in hand. It is based upon a tree structure which is expressive enough to encode almost any structure, including neural networks, Turing-complete automata, and computer programs [14]. Using the GP paradigm means that the space of possible strategies is limited only by computational resources. It also has other properties which make it suitable for my purposes:

1. The process is a path-dependent one, since the development of new strategies depends upon the resource of present strategies, providing a continuity of development. This means that not only can completely different styles of strategy be developed, but also different ways of approaching (expressing) strategies with similar outcomes.
2. The population provides an implicit sensitivity to the context of action - different strategies will ‘surface’ at different times as their internal fitnesses change with the entity's circumstances. They will probably remain in the population for a while even when they are not the fittest, so that they can ‘re-emerge’ when they become appropriate again. Thus agents using a GP-based decision-making algorithm can appear to ‘flip’ rapidly between strategies as circumstances make this appropriate.
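A minimal sketch of a tree-structured strategy and of tree-crossover may help fix ideas. Everything named here is hypothetical: the `Node` class, the three operators (`add`, `sub`, `max`), and the variable names are illustrative choices of mine, not the strategy language of any particular GP system.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple, Union


@dataclass(frozen=True)
class Node:
    """Internal node: a binary operator applied to two subtrees."""
    op: str
    left: "Tree"
    right: "Tree"


# A tree is an operator node, a variable name, or a numeric constant.
Tree = Union[Node, str, float]
Path = Tuple[str, ...]


def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    """Interpret a strategy tree against the agent's current circumstances."""
    if isinstance(tree, Node):
        a, b = evaluate(tree.left, env), evaluate(tree.right, env)
        if tree.op == "add":
            return a + b
        if tree.op == "sub":
            return a - b
        if tree.op == "max":
            return max(a, b)
        raise ValueError(f"unknown operator: {tree.op}")
    if isinstance(tree, str):
        return env[tree]  # variable lookup
    return float(tree)  # constant


def subtrees(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, Tree]]:
    """Yield every subtree together with the path that reaches it."""
    yield path, tree
    if isinstance(tree, Node):
        yield from subtrees(tree.left, path + ("left",))
        yield from subtrees(tree.right, path + ("right",))


def replace_at(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    assert isinstance(tree, Node)
    if path[0] == "left":
        return Node(tree.op, replace_at(tree.left, path[1:], new), tree.right)
    return Node(tree.op, tree.left, replace_at(tree.right, path[1:], new))


def crossover(a: Tree, b: Tree, rng: random.Random) -> Tree:
    """Tree-crossover: graft a random subtree of `b` onto a random point of `a`."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)
```

Because a child is assembled from pieces of existing strategies, its form depends on what the population currently contains, which is the path-dependence noted in point 1. Nothing in the representation bounds the size of a tree, so elaboration is limited only by the resources spent on it.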

Meta-evolution 
Such a set-up means that the strategy selected by an agent is very unpredictable: the currently selected strategy can depend upon the history of the whole population of strategies, both because crossover shuffles sections of the strategies around and because the evaluation of strategies is contingent upon the past circumstances of the agent. However, the method by which new strategies are produced is not dependent upon the past populations of strategies, so there is no backward recursion of the choice property whereby the presence of free choice at one stage can be ‘amplified’ in the next. Thus the next stage is to include the operators of variation in the evolutionary process.

In Koza's original GP algorithm there are only two operators: propagation and tree-crossover. Instead of these two operators I suggest that there is a population of operators which are themselves specified as trees, following [4]. These operators are computationally interpreted so that they act upon strategies in the base population to produce new variations. The operators are allocated fitness indirectly from the fitnesses of the strategies they produce, using the “bucket-brigade” algorithm of Holland [9] or similar (such as that of Baum [1], which is better motivated). To complete the architecture we set the population of operators to also operate on themselves in order to drive the production of new operators. Now the decision-making processes (including the processes to produce the processes etc.) are generated internally, in response to the twin evolutionary pressures of deciding what to do to further the agent's goals (in this case profit) and avoiding being predictable to other agents.
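The indirect allocation of fitness to operators can be sketched as follows. This is a deliberately simplified stand-in: it is neither Holland's bucket-brigade algorithm nor Baum's scheme, but a plain moving average of the fitness of each operator's offspring. The function name, the `rate` parameter and the operator labels are assumptions introduced for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def update_operator_credit(
    credit: Dict[str, float],
    offspring: Iterable[Tuple[str, float]],
    rate: float = 0.5,
) -> Dict[str, float]:
    """Move each operator's credit towards the mean fitness of its offspring.

    `offspring` holds (operator_name, child_fitness) pairs for one generation.
    Operators that produced nothing this generation keep their old credit.
    """
    produced: Dict[str, List[float]] = defaultdict(list)
    for op_name, child_fitness in offspring:
        produced[op_name].append(child_fitness)

    updated = dict(credit)
    for op_name, fitnesses in produced.items():
        mean_fitness = sum(fitnesses) / len(fitnesses)
        old = updated.get(op_name, 0.0)
        updated[op_name] = (1 - rate) * old + rate * mean_fitness
    return updated
```

Under this kind of rule an operator is never evaluated directly; it survives only to the extent that what it produces does well. The same update could be applied when the offspring are themselves operators, which is the self-application step described above.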

Anticipatory rationality 
If an agent is to reflectively choose its action rather than merely react to events, then this agent needs to be able to anticipate the result of its actions. This, in turn, requires some model of the world, i.e. some representation of the consequences of actions that has been learnt through past interaction with that world (for example via evolution of the entity). The models of the consequences of action are necessarily separate from the strategies (or plans) for action. It is possible to conflate these in simple cases of decision making, but if an entity is to choose between plans of action with respect to the expected outcome then this is not possible.

[Figure: old population of operators; old population of strategies; new population of operators; new population of strategies]

There is something about rationality which excludes the meta-strategy of altering one's model of the world to suit one's chosen strategy - the models are chosen according to their accuracy and relevance, and the strategies are then chosen according to which would produce the best anticipated outcome according to the previously selected world model.

A reactive agent may merely work on the presumption that the strategies that have worked best in the past are the ones to use again. This excludes the possibility of anticipating change or of attempting to deliberately ‘break out’ of current trends and patterns of behaviour.

We have a process which models the consequences of action and one which models strategies for action. To decide upon an action, the best relevant model of action consequence is chosen, and the various strategies for action are considered with respect to what their anticipated consequences would be if the consequence model is correct. The strategy that would seem to lead to the consequence that best fits the goals is then chosen.
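The two-stage decision just described can be sketched in a few lines. The representation is an assumption made for illustration: a model is a function from an action to a predicted outcome, a strategy is reduced to the single action it proposes, accuracy is measured by mean squared error over past (action, outcome) pairs, and the names `model_error` and `choose_action` are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]  # action -> predicted outcome
History = Sequence[Tuple[float, float]]  # past (action, observed outcome) pairs


def model_error(model: Model, history: History) -> float:
    """Mean squared prediction error of a model over past experience."""
    return sum((model(a) - o) ** 2 for a, o in history) / len(history)


def choose_action(
    models: Dict[str, Model],
    strategies: Dict[str, float],
    history: History,
    goal: Callable[[float], float],
) -> Tuple[str, str]:
    """Pick the most accurate model first, then the strategy it favours.

    The model is selected on accuracy alone, without reference to any
    strategy; only afterwards are strategies ranked by the outcome that
    the selected model anticipates for them.
    """
    best_model = min(models, key=lambda name: model_error(models[name], history))
    predict = models[best_model]
    best_strategy = max(strategies, key=lambda name: goal(predict(strategies[name])))
    return best_model, best_strategy
```

Keeping the two selections in separate steps is what rules out the meta-strategy mentioned above: a strategy cannot raise its own ranking by favouring a flattering but inaccurate model.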

Co-evolution 
The next important step is to situate the above elaborated model of strategy development in a society of competitive peers. The development of free-will only makes sense in such a setting, for if there were no other active entities who might be predicting your action there would be no need for anything other than a reactive cognition. This observation fits in with the hypothesis that our cognitive faculties evolved in our species due to a selective pressure of social origin [2]. Thus we have a situation where many agents are each evolving their models of their world (including of each other) as well as their strategies.

The language that these strategies are limited to must be sufficiently expressive so that it includes strategies such as: attempting to predict another's action and doing the opposite; evaluating the success of other agents and copying the actions of the one that did best; and detecting when another agent is copying one's own actions and using this fact to one's own advantage. Thus the language has to have ‘hooks’ that refer to one's own actions as well as to others' past actions and their results.

In circumstances such as these it has been observed that agents can spontaneously differentiate themselves by specialising in different styles of strategies [5]. It is also not the case that, just because these agents are competing, they ignore each other. Such a co-evolution of strategy (when open-ended and resource limited) can result in the intensive use of the actions of others as inputs to their own deliberation, but in a way that is unpredictable to the others [6]. Thus the suggested structure for agent free-will can include a high level of social embedding.
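To illustrate what such ‘hooks’ might look like, the sketch below writes the three example strategies directly against a shared record of past actions and payoffs. The setting is an assumption of mine: actions are binary (0 or 1), the prediction used is the naive one that an agent repeats its last action, and the function names are hypothetical rather than taken from any of the cited systems.

```python
from typing import Dict, List

Actions = Dict[str, List[int]]  # agent name -> past actions (0 or 1)
Payoffs = Dict[str, List[float]]  # agent name -> past results


def do_opposite(other: str, actions: Actions) -> int:
    """Predict that `other` repeats its last action, then do the opposite."""
    predicted = actions[other][-1]
    return 1 - predicted


def copy_best(me: str, actions: Actions, payoffs: Payoffs) -> int:
    """Copy the last action of the other agent with the highest total payoff."""
    others = [name for name in actions if name != me]
    best = max(others, key=lambda name: sum(payoffs[name]))
    return actions[best][-1]


def is_copied_by(me: str, other: str, actions: Actions) -> bool:
    """True if `other` has always played what `me` played one round earlier."""
    pairs = list(zip(actions[me][:-1], actions[other][1:]))
    return bool(pairs) and all(mine == theirs for mine, theirs in pairs)
```

Each function reads only the shared record, so in a GP setting these would be primitives (or short subtrees) that evolved strategies could combine, for instance by testing `is_copied_by` and then choosing the action that benefits most from being imitated.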

Structuring the development of free-will within a society of peers 
The final difficulty is to find how to structure this mental evolution so that, in addition to maintaining the internal coherence of the deliberations and their effectiveness at pursuing goals and being unpredictable to others, the actions of the agent can be presented to others as rational and verified as such by those agents. This is in order to fulfil criterion (E) above. This last criterion can be achieved if there is a normative process which specifies a framework of rationality which is not restrictive, so that different deliberative processes for the same action can be simultaneously acceptable. The framework must be loose enough that the openness of the strategy development process is maintained, allowing creativity in the development of strategies, etc. On the other hand, it must be restrictive enough that others can understand and empathise with the deliberative processes (or at least a credible reconstruction of the processes) that lead to action.

There are a number of ways in which this framework could be implemented. I favour the possibility that it is the language of the strategies which is developed normatively, in parallel with the development of an independent free-will. Thus the bias of the strategies can be co-evolved with the biases of others and the strategies developed within this bias.

Putting it all together 
Collecting all these elements together we have the following parts:

1. A framework for the expression of strategies which is (at least partially) normatively specified by the society of the entity.
2. An internal open-ended evolutionary process for the development of strategies under the twin selective pressures of favouring those that further the goals of the entity and against those that result in actions predictable by its peers.
3. Operators of the evolutionary process which are co-evolved along with the population of strategies, so that indeterminism in the choice of the entity is amplified in succeeding choices.
4. Models of the consequences of action which are learned in parallel, so that the consequences of candidate strategies can be evaluated for their anticipated effect with respect to the agent's goals.

Each of these elements has been implemented in separate systems; all that is required is that these be put together. No doubt doing this will reveal further issues to be resolved and problems to be solved; however, doing so will represent, I suggest, real progress towards the goal of implementing free-will.

Conclusion 
Although it is probably not possible to implement the facility for free-will directly in an agent (i.e. by designing the detail of the decision-making process), I have argued that it is possible to implement a cognitive framework within which free-will can evolve. This seems to require certain machinery: an open-ended evolutionary process; selection against predictability; separate learning of the consequences of action; anticipation of the results of action; and the evolution of the evolutionary process itself. Each of these has been implemented in different systems but not, as far as I know, together. The free-will that results is a practical free-will. I contend that if the architecture described were implemented, the resulting facility would have the essential properties of our free-will from the point of view of an external observer. Such a facility seems more real to me than many of the versions of free-will discussed in the philosophical literature, because it is driven more by practical concerns and observations of choice and less by an unobtainable wish for universal coherency.

There are basically three possibilities: free-will is a sort of ‘magic’; it is an illusion; or it is implementable. I hope to have made the third a little more real.
