ISLAM and SCIENCE
Dr Sakir Kocabas
1. Introduction
The contribution of Islamic civilization to scientific development has been a subject of debate among both Muslim and non-Muslim writers since the 19th century. The controversial position of Muslims in such debates is due to the fact that they lost their effectiveness in science and in other fields of life after a certain stage in history.
When we look at the "Islam and Science" debates of the last two centuries, we see several general approaches. The first one, which was developed by European historians of philosophy and science in the 19th century and lasted until World War II, claims that Muslims have contributed nothing significant to civilization in the fields of science and philosophy beyond being the "commentators" of Ancient Greek philosophy. We see such eminent philosophers as Bertrand Russell1 among the proponents of this view.
The second approach is the one developed by some Muslim writers in need of an intellectual defense of Islamic civilization against the views of European historians of science and philosophy. According to this approach, science and philosophy are not only unnecessary for human happiness and well-being, but are even harmful activities. Consequently, there is no reason to worry about the fact that Muslims are lagging behind in science and technology.
The third approach, which was also developed by some Muslim scholars, claims that Muslims conducted pioneering studies in science and philosophy in the Classical Era of Islamic civilization, but were held back in scientific activity by devastating external influences such as the Crusades and the Mongol invasions.
The fourth approach, which was adopted by some western historians of science such as Sarton2, and more recently by Huff3, recognizes that Muslims made major contributions to science in the early period of the Islamic civilization, and attempts to explain the decline of scientific activity in terms of social, cultural, economic and legal issues.
The fifth approach, which is being advocated by some Muslim writers4, claims that the Muslims' earlier contributions to science have nothing to do with Islam, as these studies were conducted in a non-Islamic, secular spirit, and that Muslims will be successful, if at all, only in the same way. The proponents of this approach also want to extend the separation of "science" and "religion" which took effect after the Renaissance and Reformation in Europe. They carry the related arguments about Christianity over to Islam without a detailed conceptual analysis, and try to establish an "Islam and science" division in the Muslim mind.
Our approach will be a different one. It brings a detailed analysis and a new synthesis to the subject, although it carries elements from some of the approaches stated above. We will try to describe our approach in some detail below, but before that, let us take a closer look at the other five approaches.
The first approach has lost its credibility today among people who are acquainted with the history of science, but it is nevertheless still argued by some short-sighted writers. We will show below with examples why this approach is based on mistaken views. The second approach is defended in good faith by naive Muslims who are unaware that what they advocate seriously contradicts the concept of knowledge (=‘ilm) and a series of other related concepts in the Qur'an. The third approach, on the other hand, errs by exaggerating the relationships between scientific motivation and social, political and economic conditions. In this study, we will start from the point reached by the fourth approach. We will attempt to unearth the basis of the problem, namely, what happened after a period of intensive learning and investigation that led to the development of a scientific research tradition in the Classical Era of Islamic civilization, i.e. the 8th to 11th centuries. But before we do that, let us try to explain why the fifth approach starts from mistaken premises.
In order to decide how much of the historical reality is reflected by the claim that Islam had nothing to do with the Muslims' contribution to science during the Classical Era (roughly the 8th to 11th centuries), we have to bear in mind a few historical facts. It is well known that the Europe of the Middle Ages turned to scientific investigation as it turned away from the views of the Church about the world and nature. By contrast, we observe that the Arabic and Turkic populations, who did not have any noticeable scientific activity before Islam, began to produce their scientific and philosophical works after they accepted Islam. Therefore, the Muslims of that era conducted their scientific and philosophical studies with the motivation and within the intellectual framework that they acquired largely from Islam.
To see that it was Islam that gave the Muslims of the Classical Era such great motivation in science and philosophy compared to their contemporaries, we have to understand what sort of conceptual transformation Islam had made in their minds. What were their concepts of the world and reality before their acceptance of Islam? And how were their concepts of knowledge and reality transformed by their acceptance of Islam? We can ask similar questions for the Christian peoples of Europe in the Middle Ages: What sort of conceptual transformation took place in Europe after the 12th century? What sort of conceptual changes lay behind the scientific development that took place after the 16th century?
Some controversial European historians, before answering such questions, kept claiming that scientific thought in Europe developed from the ancient Greeks, in an attempt to blur the effects of Islam on the Renaissance, the Reformation and the subsequent scientific endeavour. This opinion has long lost its historical credibility, but there may still be some who would like to defend it. A simple question should be sufficient to bring this claim into perspective: Why were the Europeans not able to start the Renaissance and develop their scientific thinking much earlier, as they had the works of ancient Greek philosophy and science in their hands for over a thousand years? Why did Europe wait another 1200 years for the development of scientific thought? The answers to these questions can be: either the Europeans were unable to understand the works of the ancient Greeks during that period, and began to understand them only after reading the works of Muslim thinkers such as Ibn Rushd, or ancient Greek thought by itself did not provide concepts and motivation sufficient to initiate such a scientific enterprise, or both. (Despite their controversial nature, we can say that all these possibilities are worthy of a detailed investigation.)
When we study the history of Islamic thought, we see that the Muslims were engaged in large-scale and multi-dimensional research activity in science and philosophy until the 12th century, and that this continued as sporadic and isolated activity until the 15th century. During this period, on the one hand they studied the works of ancient Greek and Indian scholars5, and on the other they developed completely novel approaches and methods for the systematic investigation of nature. We will explain this with examples below.
The development of these different approaches and methods requires a certain creative intellectual capacity. Scientific creativity and its relation to motivation and conceptual structures is a subject which we have been studying recently6. In order to gain an insight into the nature and extent of the conceptual transformation that Islam brought to the Muslims in the Classical Era, let us now take a look at certain concepts and their interrelations in Islam, particularly in the Qur'an.
2. The Concept of Knowledge in The Qur’an
Throughout the history of Islam, a great many things have been said about the concept of knowledge (= ilm) by Muslim and non-Muslim writers7. These are outside the scope of our present concern. In 1986 we conducted a six-month study of the concept of knowledge (= ilm) in the Qur'an. Our remarks on the grammar of the concept "ilm" in the verses8 of the Qur'an in which the word occurs in its root form and derivatives can be summarized as follows:
1) First of all, the concept of knowledge (= ilm) in the Qur'an is an indivisible, holistic concept. (In this respect, there is a certain important difference between the concept of "ilm" in the Qur'an and that of today's Muslims. We shall return to this point below.)
2) The Qur'anic concept of knowledge covers all knowledge, without making any distinction between sciences as such.
3) The concept of knowledge in the Qur'an does not allow the qualification of knowledge as true or false, because the word is always used in a definite relation to reality. On the other hand, it can be used to indicate that someone has or does not have the knowledge of something, as in "... has the knowledge of" or "... has not the knowledge". (See verses: 006.108, 006.119, 030.029)
4) The Qur'anic concept of knowledge indicates that reality is grasped with knowledge, and the limits of knowledge determine the extent of the awareness of reality and consciousness. (027.084)
5) In the Qur'an, it is stated that Allah has encircled (= wasia) everything with knowledge. This might mean that knowledge extends through space. (006.080)
6) Also, the expression "a knowledge from Allah’s presence" (= min ladunna ‘ilma) occurs in verses as a distinctly different kind of knowledge, in contrast to the knowledge that extends through space. (018.065)
7) The fact that Allah has power over everything can be known by man. (065.012)
8) The verb "to know" (= alima) is applicable both to individual human beings and to a nation (= qawm), as in the expression "a nation that knows". (002.230, 006.097, 007.032)
9) "Those who know and those who do not, cannot be the same". (039.009)
The conceptual grammar of "knowledge" which we have summarized here constitutes a part of a wider conceptual network in the Qur'an. The importance and necessity of knowledge in Islam emerges as a prerequisite for being human. Only through knowledge can man know how far his abilities and responsibilities extend.
3. The Importance and Necessity of Knowledge
What is it that makes man important, while he occupies such a small space-time region on a small planet compared with the astronomical dimensions of the heavens that encompass hundreds of billions of galaxies? An answer that can be given to this question in Islamic thought is: What makes us important is the fact that Allah has given us some superior qualities among His creation, notably our superior ability for learning and thinking, by which we can increase our contact with reality and consciousness.
For a man believing in Allah, the most important question of science is this: "How does Allah administer or rule the heavens and the earth?" Another question that follows from this is: "Can the human mind comprehend how Allah rules the heavens and the earth?" This may come as a surprise to some readers, but both questions have answers, and the answer to the second question, within the Qur'anic framework, is a definite "yes". The answers are given in the Qur'an as follows:
"Have they not studied the Administration of the heavens and the earth, and what things that Allah has created? (= awa lam yanzuru fi malakut as samawati wal ardi wa ma khalaqallahu min shay)" (007.185)
"Allah is He, who created the seven heavens and of the earth the like of them. The instruction (= amr) is sent down through them, so that you know that Allah has power over everything, and that Allah has encompassed everything with a knowledge." (065.012)
As can be seen, in the first verse above man is motivated to a systematic study of the Administration of the heavens and the earth, or in other words, of the “laws” and “principles” of the interactions which take place in them. The word "study" (= yanzuru fi) in this verse refers to both observation and thinking, and the term “nazariya” (= theory) was derived from this word in the classical era of Islamic thought. Accordingly, “nazara fi” can be understood as “systematic observational and theoretical study”.
In the second verse, we see how the concept of instruction (= amr), which emerges as a key concept in the "Administration of the heavens and the earth", is closely related to the concept of knowledge (= ‘ilm). In this verse, the expression "so that you know" clearly indicates that man can know how Allah keeps the events happening in the heavens and the earth under His knowledge and His control. The knowledge of this should indeed be the most important gain for mankind.
Notice that the question: "How does Allah direct the heavens and the earth?" is, in a narrower sense, also the fundamental question of physical science: How do various phenomena take place in nature in an ordered way? How did the order that we see in the universe come about? We shall return to this below.
It was with these motivations that the Muslims of the classical era turned towards understanding themselves and the space that they lived in. This was the motivation behind their occupation with such sciences as mathematics, logic, physics, chemistry, botany and astronomy. The prominence given to knowledge, learning and thinking in Islam is a pointer to the importance that is given to man. The Prophet (s.a.) taught Muslims, through the Qur'an, the importance of knowledge, learning and thinking. And the Muslims, through their books written between the 8th and 12th centuries, taught Christians and Jews in al-Andalus how to reason correctly about nature. Regretfully, due to serious changes in their concept system, which started to take place in the 11th century, Muslims began to lose their motivation for science, and as a result, their influence in science gradually dwindled and died out in the subsequent centuries. The present state of the Muslims regarding scientific activity stands in complete contrast with that of the Islamic civilization of the classical era. The concept of knowledge has now lost its significance, while some secondary or obscure concepts have acquired prominence in the Muslim mind today.
4. The Concept of Knowledge and Motivation in Islamic Civilization
The concept of knowledge (= ‘ilm) had such a central role in the Islamic civilization during the Classical Era that it led the famous orientalist Franz Rosenthal9 to observe that there had been no other civilization in history, including Western civilization, in which the concept of knowledge had played such a central role as in the Islamic civilization of the classical era.
It can be said that correct thinking develops from correct premises, a correct concept system, and correct methods of observation and inference. Correct thinking affects the motivation for research positively, as it leads to a better understanding of the world, and to useful discoveries and inventions. Systematic knowledge rests on a correct concept system. For this reason, in the process of improving and increasing systematic knowledge, the concept system on which it stands plays an important role. Our own study of scientific creativity has convinced us that the concept system has a critical role in scientific motivation and scientific creativity.
With the birth of Islam, the development of tendencies towards learning quickly turned into a campaign, and within a short span in historical terms, it brought about scientific motivation among Muslims. The campaign for learning had already started in the Medina period, when the Prophet (s.a.) introduced a policy of releasing prisoners of war (who were then treated as slaves) on condition that they teach Muslim children reading and writing.10 It is also known that Muslims opened a school in Medina in the same years. Such an educational policy could not even be dreamt of in those ages. In this era, Muslims considered learning and the acquisition of knowledge a paramount duty of being human.
The conceptual change, and the learning campaign that developed in parallel, began to yield results in scientific development within less than a century, during the time of the Umayyads. This continued in an even stronger form under the Abbasids (750-1258), particularly with the establishment of Bayt al Hikma (= the House of Wisdom) by the Abbasid ruler Harun al Rashid. This royal institute, unique of its kind in the history of science, started as a center for the translation of all the works of ancient cultures, from poetry to medicine, astronomy and philosophy, but soon turned into a center for original scientific work.
With their pioneering work in chemistry, the Umayyad prince Khalid bin Yazid (665-704), Ja’far al Sadiq (700-765), Jabir bin Hayyan (approx. 721-805), Zunnun al-Misri (d. 860), Al Razi (= Alrhazes, 860-925), Ibn Sina (= Avicenna, 980-1037) and Al Matruji (? - 1007) laid the foundations of modern chemistry as an experimental science. Among the other prominent Muslim pioneers of science and philosophy we see, in physics, Al Kindi (= Alkindus, c. 796-872) and, with his research in optics, Ibn Haytham (c. 965-1040); in mathematics, Al Khawarizmi (c. 780-850) and Thabit bin Qurra (c. 834-901); in zoology, Jahiz (c. 776-869); in astronomy, Bayruni (c. 973-1051), al Zarqali (1029-1087) and Ibn Shatir (d. 1375); in medicine, Al Razi and Ibn Sina; and in medicine, physics and philosophy, Ibn Rushd (= Averroes, 1126-1198).11 We can now take a brief look at the important contributions of Muslims to chemistry, physics and mathematics in this period.
The Muslims' contribution to the development of chemistry as an experimental science was crucial in several respects. First, as can be seen in Jabir bin Hayyan's collection, contrary to the ancient Greek tradition, Muslims thought that chemical substances are composed of a set of basic properties combined in certain proportions, and that these properties could be separated and recombined to yield new substances12. Notice that this idea introduces the concepts of analysis and synthesis, and an accompanying methodology for research, into chemistry for the first time in its history. Secondly, their claim that these basic properties are held together in balance (= mizan) in chemical substances introduces the notions of stability and equilibrium13. Third, and not least important, they described their chemical experiments in such a way that it is possible to see all the parameters of a modern chemical experiment. For example, in the description of an oxidation experiment on mercury conducted and described by Al Matruji14, we see everything that the description of a modern chemical experiment should include: a) reaction materials, b) their quantities, c) reaction equipment used, d) reaction conditions, e) reaction products, and f) their quantities. The same approach can be seen in the experiments of other early Muslim alchemists. It can hardly be denied that these are very important contributions to the development of chemistry as an experimental and theoretical science. Using this methodology, Muslim scientists isolated inorganic substances such as alkalis, ammonia, and hydrochloric, nitric and sulphuric acids. They conducted distillation experiments on organic substances and separated several basic organic substances.
In the field of physics, Ibn Haytham's work on parabolic and spherical mirrors and glass magnifiers, and his work on the refraction of light in general, formed the basis of optics. Ibn Haytham studied the behavior of light when passing from a less dense medium to a denser medium, and led the way to the discovery of the famous sine law of refraction of light15. This was the second law discovered in physics after Archimedes' well-known discovery.
Perhaps the most important contribution of Muslims to the development of classical physics was an indirect one: the invention of algebra. Al Khawarizmi's (780-850) introduction of the concept of equation, and the use of variables in place of numbers in mathematical problem solving, is regarded as one of the most important abstractions in the history of mathematics. The first abstraction in mathematics was the invention of decimal numbers and the introduction of zero. The second important abstraction was the transformation of geometry into an axiom system by Thales (640-546 B.C.) and Euclid (4th century B.C.). After Khawarizmi’s invention of algebra, the next important abstraction in mathematics took place nine centuries later, with the introduction of the concept of function by Newton and Leibniz in the 17th century. With their invention of algebra, the Muslims showed how complex arithmetical problems could be easily solved by symbolic equations. This was a serious departure from the geometrical techniques used by the ancient Greeks. The development of algebra, coupled with geometry, led to the development of analytical geometry and trigonometry, whose foundations were also laid by Muslim mathematicians. Without algebra, there could be no analytic geometry, calculus or classical physics, and consequently no industrial revolution. In astronomy, we know that the Muslims of the Abbasid period had not only recognized that the earth was spherical, but had also measured the distance between the longitudes. They had also considered the heliocentric system of planets several centuries before Copernicus and Galileo. In fact, Copernicus’ book contains astronomical drawings identical to those drawn by Tusi several centuries earlier.16
When we compare the studies that the Muslims conducted in science between the 8th and 11th centuries with the scientific developments in Europe since the 16th century, they might of course look minimal. However, for a correct evaluation, the Muslims' success in several fields of science in these centuries must be viewed in comparison and contrast with what was happening in the other parts of the world during the same centuries. Then the Muslims would stand out as unrivalled in scientific activity and motivation among their contemporaries. The methodological error that is frequently committed in the so-called "Islam and Science" debates is the result of an illicit comparison across different eras.
The important point that must be noted is this: the Muslims' studies in this period led to the development of a "research tradition" based on experimentation and observation for the first time in world history as we know it today.17 Right here we are faced with the central questions of our analysis: Why did the Muslims not continue their activities in science after such a successful start? Why did they not develop the systematic and multi-dimensional research that formed the basis of sciences such as physics, chemistry and astronomy? What happened in the history of Islam that made scientific and philosophical studies gradually slip out of the field of interest of Muslim scholars? What kinds of psychological, social, economic and political factors caused Muslims to abandon research in the fields of science? These are the questions that must be asked in the "Islam and Science" debates until the correct answers and explanations are found.
The decline of Muslim scientific activity after a brilliant and successful start has attracted the attention of a number of Muslim writers and historians, from the Ottoman scholar Katip Chelebi in the 17th century down to today. Katip Chelebi's insightful observations related the decline to a conceptual problem, but his warnings did not start a serious movement in the Ottoman administration. Until recently, most of the proposed explanations attributed the decline to political causes, such as the Crusades in the 12th century and the destruction (especially of libraries and men of knowledge) wrought by the Mongol invasions in the 13th century, and to economic causes, such as the decline in the importance of the Silk Road and the accompanying shift of economic power from Islamic countries towards Europe as a result of the geographic discoveries of the 15th and 16th centuries. All these explanations may have some truth in them, but they do not explain the loss of motivation for science in Muslim societies as a whole.
Huff,18 a leading figure in the field of comparative historical study of science, states that Muslims made a brilliant start in the Middle Ages, and quickly gained and established a clear superiority over China, India and Europe in almost all fields of scientific activity, but that their activity started to decline after the 12th century. He examines the causes of the decline through several interrelated issues which can be summarized in three categories: 1) the role of the scientist in society, 2) the basic beliefs of the scientist about nature, and 3) the existence of social and legal institutions that support the development of science.
When he examines the role of the scientist, Huff classifies the intellectuals in the Islamic society of the Middle Ages into three classes: fuqaha (= jurists), mutakallimun (= theologians), and philosophers. He counts al-Kindi, al-Farabi, al-Razi (Alrazes), Ibn Sina (Avicenna), al-Biruni, and Ibn Rushd among the Muslim philosophers who contributed to the development of early modern science. He states that the philosophers could not maintain their influence on their societies after some of their ideas became the target of the theologians, which caused the philosophers to lose the support of the Muslim population for their activities. In this period, the theologians used the opinions of al Ghazali and Ibn Taymiya to attack philosophers. Huff also states that the fuqaha (= jurists) from time to time severely criticized the theologians themselves. According to Huff, the Muslim philosophers did not have clearly defined and valid social roles in their society. Those who studied philosophy and science mostly had an additional duty that was accepted by their societies. For example, Ibn Rushd was also a jurist, and Ibn Shatir was a muwaqqit who prepared timetables for the daily prayers of Muslims.
From the viewpoint of the basic beliefs of the scientist about nature, Huff relates several principles of scientific inquiry, some of which can be listed as follows: 1) a rational and objective investigation of nature in order to understand its operations is possible and desirable, 2) such an investigation makes use of empirical methods, 3) it might make use of mathematics and deductive reasoning, 4) the scientist should eschew all voices of authority, tradition and popular opinion on the question of how nature functions, except to the extent that the information is rationally verifiable, and 5) the scientist must practice systematic doubt, and sometimes endure prolonged uncertainty in his disciplined search for an understanding of natural phenomena. Most of these principles were practiced by the Muslim scientists, as they developed and applied empirical methods, particularly in chemistry and optics, and they were the first to use mathematics in a field of science, namely astronomy.
Huff also notes the negative role of the doctrines developed by the theologians, which rendered the systematic study of nature a meaningless activity. Lastly, he states that in the medieval Muslim world, legal and social institutions were not developed to support the activities of Muslim scientists. The colleges (= madrasa) that were established by Muslim foundations focused on teaching classical Arabic, Qur’anic interpretation (= tafsir), prophetic tradition (= hadith), logic (= mantiq) and theology (= kalam), with limited teaching of mathematics and medicine. Later, college education became confined to “religious sciences”. Astronomy and mathematics were studied in observatories, while medical studies were carried out in hospitals. The educational system of the madrasa was based on mastering the subject of an individual teacher, rather than being organized in faculties. The curricula of these colleges were determined by the foundations (= waqf) that provided their finances, and as these did not give much respect to sciences other than the “religious sciences”, studies in experimental sciences and philosophy were left to individual efforts. In contrast, the European universities, founded several centuries after their Muslim counterparts, obtained their legal and educational independence soon after their foundation.
In summary, Huff tries to explain the decline of science in Islamic civilization primarily by the failure of Muslims to develop the necessary institutions and to provide legal autonomy for scientific activities. These are important insights into the nature of this historical phenomenon, and we agree with Huff on these points to a certain extent, but we have to go deeper, below the social and legal causes. The social and legal order of a society is continuously shaped and reshaped by the beliefs, the motivations and the order of concepts of that society. For this reason we claim that the deeper causes of the decline have to be sought in the conceptual changes that began to take place in the language and minds of the Muslims around the 11th century.
For a historical change of such proportions, we have never found satisfactory the explanations that rest only on political and economic causes. Since the early 1970s we had thought that there had to be more convincing explanations for such a grand-scale decline, but were not able to identify the real causes. Years later, after a debate while doing a PhD in London in 1986, a question began to emerge as to whether the Muslims used certain Qur'anic concepts such as "aql" (= using intellect) and "ilm" (= knowledge) correctly, i.e. with the same grammar as in the Book. Then, following a three-month study of the concept of "aql", we realized that this word was being used quite differently from its native grammar in the Qur'an.
Our later studies of several concepts, which we called "The AMR Constellation of Words", revealed an important conceptual network which is directly related to the cosmic order. These concepts are completely ignored by today's Muslims, while between the 8th and 12th centuries, from Al Kindi to Ibn Rushd, they at least partially constituted the basis of the ideas of many Muslim scholars about reality in one form or another.
This study indicated that there had been a serious conceptual break in the history of Islamic thought. It began to look clear that this conceptual break gradually diminished the motivation for scientific research among Muslims, and that as a result they failed to develop the necessary social institutions and legal support, and not the other way round. In the next section we discuss this conceptual degeneration in some detail.
5. The Conceptual Disintegration in the 11th Century: The Division of the Concept of Knowledge
We can say that the most important conceptual change in the history of Islamic thought took place in the concept of "ilm" (= knowledge). This concept is used by today's Muslims as divided into two broad and disjoint categories, "ilm ad-din" (= religious knowledge) and "ilm ad-dunya" (= worldly knowledge). In a study of this concept in both the Qur'an and the six hadith books which compile the sayings and actions of the Prophet (s.a.), we noticed that there was no trace of such a division in these sources. On the contrary, the concept appears in these sources as an indivisible whole. This indicated that the conceptual division of knowledge into "religious knowledge - worldly knowledge" was introduced in the latter half of the 10th century, about a century after the hadith were compiled. We can now say that the conceptual division of "knowledge" began to take serious effect in Muslim thought in the 11th century, and that before the end of the 12th century, with the exile of Ibn Rushd, it had become an accepted norm in the Muslim world from Cordova to Baghdad. In this way, the integral concept of knowledge (= ‘ilm), with its close relation to the concept of reality (= haqq), gave way to a divided concept of knowledge and reality.
At this point some might argue that the qualification of knowledge as "religious knowledge - worldly knowledge" resulted from a necessity due to the increase in the amount and variety of knowledge. We maintain that this conceptual division was introduced by the 10th century Muslim theologians (particularly by the Ash’arites) for certain other purposes. These purposes primarily included the "protection of Muslims from certain heretical beliefs and ideas". Whatever their aims were, the conceptual changes introduced by the theologians can hardly be taken as a mere manifestation of goodwill, considering the drastic results that they entailed. They introduced a set of concepts that was both simplistic and contradictory in place of the rich, complex but consistent conceptual structure of the Qur'an. Simplistic, because it overlooked the fact that the Qur’anic concept of knowledge includes all expressions which reflect reality; and contradictory, because it ended up denying reality itself.
The division of knowledge as "religious knowledge - worldly knowledge" in the 11th century resulted in questioning the status of physics, chemistry, astronomy, mathematics and logic, which were now regarded as "worldly knowledge". At first it was stated that these sciences "had, positively or negatively, nothing to do with the religion"19; then the idea that knowledge of these sciences is superfluous began to creep into the minds of Muslims. Quite often, these sciences were considered "useless knowledge", as opposed to the "religious knowledge" which was regarded as "useful" by definition. Muslims who were engaged in such sciences, while previously supported by the public and rulers alike, began to lose support, or even became isolated from society.
As an example, we can cite the case of Ibn Rushd (= Averroes), who was the chief justice in Cordova and the doctor of the Caliph under the Almohads in the 12th century. Ibn Rushd was persecuted by the theologians for his ideas about science and philosophy, once the latter had established their political power in the Almohad administration. The theologians publicly burned Ibn Rushd's books and wanted him to be sentenced to death, but he was narrowly saved by the Caliph owing to his earlier services to the state, and was exiled to North Africa instead. Reactions against men of science on the scale of the Inquisition are in general not observed in the history of Islam. But, as we can now see more clearly, the dismissal of philosophy as heresy, and of science as a useless enterprise, has deeply turned the motivation of Muslims away from such activities, conceptually, psychologically and politically.
On the other hand, many early Muslim philosophers had also based some of their ideas about reality on several concepts adopted from ancient philosophers like Plato and Aristotle. Such controversial concepts as existence (= wujud), infinite past (= qidam), and intellect (= ‘aql), and logical concepts such as universal (= kulli) and particular (= juz’i), were being used in philosophical and theological discussions instead of the rich network of Qur’anic concepts about the whole of reality. Unending debates were taking place between the philosophers and the theologians, in which the latter were also using the same concepts as their opponents, in addition to evasive dialectic methods. This led to the gradual isolation of the philosophers from their society, as they could not defend their position effectively in a concept system detached from the Qur’an. It is no surprise that the most original contributions of Muslims have been in the fields of chemistry and algebra, where they relied on the Qur’anic concepts of “balance” (= mizan), and “shay”.
The most interesting outcome of this conceptual break was yet to come. The division of the concept of knowledge, which appears as a unified concept in the Qur'an, into "religious knowledge - worldly knowledge", and the almost unanimous acceptance of this division by Muslims after the 11th-12th centuries, meant the acceptance of secularism by Muslims in thought as a doctrine. In other words, by accepting such a conceptual division, Muslims had effectively accepted the separation of their "religion" from their "world". Despite this, when they were faced nine centuries later with being forced to accept secularism as a legal principle, they reacted strongly. The rationality of this reaction needs to be evaluated in the same framework as the quiet acceptance of the division of the concept of knowledge in the 11th century.
The conceptual change that took place during the 11th century was not confined to the concept of knowledge, but also spilled over into a series of other concepts related to "creation". Abandoning the understanding of a cosmic order based on the concepts of “haqq” (= reality) and "amr" (= instruction) and a set of related concepts in the Qur'an, the theologians adopted a view of physical space based on the hypothesis of continuous creation-annihilation. This hypothesis was developed from a simplistic concept of "creation" which resulted from the reduction of a dozen concepts in the Qur'an related to "creation". In this process, the theologians reduced such concepts in the Qur'an as khalaqa, jaala, baththa, nabata, fatara, banaa, sawwara, sawwa, etc., which mean designing, making, evolving, giving form, bringing into existence, constructing, growing, etc., to the single concept of "creating" (= khalaqa), and in this way turned a rich, complex and consistent concept system into a single concept which swallowed all the details of the original set of tightly related concepts.
As a result of this conceptual reductionism, many Muslims soon found themselves in a position to deny that substances had any essential properties, and by denying the principle of causality in any form, they had locked their minds in a concept system that made scientific explanations quite impossible. (Imagine the development of empirical sciences such as physics, chemistry and astronomy, without accepting the principle of causality or that the substances have characteristic properties.) Yet it is clear from the verses related with the word "amr" (= instruction) and a set of other related concepts in the Qur'an, that the properties of substances are manifestations of a set of instructions that make the substances themselves.
There are about 250 verses (= ayah) in the Qur'an in which the word "amr" (= instruction) and its derivatives occur. Some of these verses state that Allah rules the heavens and the earth with His "amr" (= instructions). Indeed, it is clearly stated in the verses that mention the movements of celestial objects that they move in accordance with the instructions that had been revealed to (or loaded in) the heavens during their formation:
"... Then He decreed it (the heaven) as the seven heavens, and revealed in (or loaded in) each heaven its instruction." (041.012)
"The sun, the moon and the stars are subjected [to remain in their courses] (= musakharat) by His instruction." (007.054, 014.033, 016.012, 022.065)
From the grammar of the word "amr" that occurs in these verses, a conceptual framework emerges, which indicates a set of instructions that are distributed in space, where they can be joined or dispersed. In this framework, it is understood that physical events take place in an order determined by the complex interactions, unifications and distributions of the instructions in physical space. However, the cosmic order is not totally unchangeable and free from divine intervention. As can be seen from the verses related to the word "izn", Allah may intervene in any space-time region with new instructions and can alter its physical properties according to His will. In this way He can bring about changes that would otherwise be impossible with the existing instructions in that space-time region. Similarly, He can also prevent events that would otherwise have resulted from the existing instructions. Indeed, from the verses related to "amr" and "izn", it is clear that Allah intervenes in some events by means of new instructions which He "sends down" periodically. What we call "miracles" are also partly explainable in this framework. We say "partly", because we do not know exactly what can and cannot be materialized within the interactions of the existing instructions in a space-time region.
On the other hand, in verses related to the word "sakhara" (= make subject to / give under use or control) in the Qur'an, Allah states that He has "made subject to mankind whatever is in the heavens and the earth":
"Do you not see that Allah has made subject to you whatever is in the heavens and the earth (= sakhara la kum)?" (031.020)
“He has made subject to you whatever is in the heavens and the earth, all from Him (= wa sakhara la kum ma fi-s samawati wa-l ‘ardi jami’an minhu); in this there are indeed signs for a nation who reflect (= qawmin yatafakkarun).” (045.013)
The words "amr", "izn", "sakhara", "sultan", "qadr" and "qada" constitute an extremely remarkable conceptual network in the Qur’an. We have made a detailed study on this subject, and we intend to publish this work soon in a book titled "Foundations of Scientific Thought in Islam".
The denial of the essential properties of substances and causal relationships between physical events by the theologians and Al Ghazali (1058-1111) in the 11th century, was criticized in detail by Ibn Rushd (1126-1198) in the 12th century 20. Ibn Rushd also took seriously the erroneous division of the concept of knowledge by the theologians, and wrote a book titled Fasl al Maqal. 21 In this book he tried to demonstrate the indivisibility of science and religion, both in logico-philosophical and legal terms (as he was both a philosopher and the chief justice of Cordova.) Despite his serious warnings on this matter in both of his books Fasl al Maqal and Tahafut al Tahafut, his work did not receive sufficient attention and understanding by Muslims of his times and of later centuries.
The continuation of conceptual disintegration in subsequent centuries resulted in the abandonment of the research tradition developed until the 12th century. (It would be impossible to do research in experimental sciences such as physics and chemistry, in an intellectual framework where causality and the essential properties of substances were denied.) The philosophical disagreement between Al Ghazali and Ibn Rushd was debated until the 15th century when the Ottoman sultan Mehmed II decided to resolve it by an academic debate. The sultan asked a committee to be set up from scholars to discuss the issue in a free spirit. This is related by Osman Turan, a 20th century Turkish historian as:
"Sultan Mehmed had gathered the scholars of the age around him. He wanted to resolve the disagreement between Al Ghazali and Ibn Rushd. For this reason, he formed a commission under the chair of Hodja-Zadeh who had publications in philosophy. However, because of the complexity of the matter for the scholars of that age, the problem remained unresolved, and the controversy between philosophy and religion continued." 22
However, after Sultan Selim I (1512-1520), the official educational policy in the Ottoman Madrasas gradually shifted towards the views of the Ash’arites and away from those of the Ma’turidi theologians, who give more prominence to reason and rationality. As a result, interest in experimental sciences declined further in the Ottoman institutions.
The reasons for the failure of the Ottoman universities (= the Madrasa) to compete with the European universities in the field of mathematical and physical sciences can be found in their mistaken views about philosophy and science. But at the root of this failure was the gradual abandonment of the research tradition developed by the early Muslim scientists. However, in contrast to their failure in the physical sciences, sporadic technological achievements by the Ottomans continued until the 17th century, particularly in military technology. The real Ottoman success was in political science and administration, which can partly be explained by the effect of the Enderun (= the Royal College), which was independent of the Madrasa system. Additionally, the Ottomans followed the original political concepts of Islam to a certain extent, rather than those of the 14th century theologian Ibn Taymiyyah. This is another topic that needs a careful and detailed study in itself.
Despite their earlier technological successes, the poor performance of the Ottomans in science and philosophy in comparison to the developments in Europe, was noticed early enough by some Ottoman scholars. For example, Katip Celebi (1609-1657) in his Mizan ul-Hak, had alerted the Ottoman administration about the complete failure of the Madrasa in the study of physical sciences and mathematics by its dismissal of such sciences as "of philosophy", despite the clear advancements made in European universities in these sciences. But such warnings were bound to fail to produce real concern within the distorted concept system that had started to settle in the Muslims' minds a few centuries earlier.
6. Ottoman Initiatives for Renewal and Muslims in the 20th Century
By the turn of the century, the Ottomans started to feel deeply the decline in the structure of the state, from the economy to defence, as a result of their failure in the fields of philosophy and science, and consequently in technology. However, the Ottoman administrators and intellectuals (with a few exceptions like Katip Celebi, Koci Bey and Ahmed Cevdet Pasha) tried to reverse the decline only by a series of social and political measures, such as the reforms of Islahat (= recovery), Tanzimat (= reorganisation) and Meshrutiyet (= constitutional reform). They could not see that their problems would not be resolved by mere political and social measures, as they were rooted much deeper in the conceptual plane, and were affecting the Muslims' motivation for science in an extremely negative way. The failure in science, in turn, was deeply affecting, in an indirect way, the social and political structure and its institutions.
Finally, the educational campaign started by Sultan Abdulhamid II towards the end of the 19th century, which at times met serious opposition from Muslims themselves, proved to be insufficient. In the end, the Ottoman state left the stage of history following a decisive defeat by European states who commanded scientific and technological superiority in World War I.
The attempts to solve conceptual problems by political and social measures continued in the Turkish Republican Era. The political measures such as the cultural reforms like changing the alphabet and enforcing new outfits; the social reforms like the adoption of constitution and legal codes from the French, Swiss and Italian laws in the early period of the Republic (1923-1933); the industrialization campaign in the second stage (1950-1960); the heavy-industrialization phase in the third stage (1970-1975); and finally, the policies of liberalization and market economy (1980-1990), can be regarded as the continuation of the efforts in the same direction as started in the last century.
As the conceptual structure that was inherited from the Ottomans could not be changed by forceful social, legal and pseudo-cultural measures, the Republican policies of development, which themselves were infected with the same conceptual diseases, albeit at the opposite end, have failed to yield real success. The positivist policies that were introduced in the Republican era were aimed at suppressing Muslims in effect, if not in form, and extinguished the social motivation still further by deepening the illicit "religion - science" division instead of trying to resolve it.
The Republican governments failed to develop an effective "science policy" despite the illustrious slogans they produced about science. Until recent years, there was even no academy of sciences in Turkey, and the one that exists now looks at cultural affiliations rather than academic achievements as criteria for nominations. The reasons why there is still no proper academy of sciences, no ministry of science and technology, and no policy of science and technology in this country must be seriously investigated by the thinking people in this country.
7. Other Recent Attempts
About two decades ago some Muslim writers in the UK and Pakistan initiated a programme which they called “Islamization of Knowledge”. These writers proposed that the scientific theories developed by "Western" scientists be examined carefully and modified in accordance with the “Islamic viewpoint” (inevitably this will be their own viewpoint based on a concept system adopted from some relativist "Western" philosophers), and thus be "Islamized". Whatever underlying goodwill their proponents may have had in mind, we considered such attempts spurious, as they did not reflect any serious consideration of the conceptual problems involved. As could be expected, this project ended in failure within a few years of its inception.
Another attempt came from the distinguished Muslim historian of science Nasr, who proposed mystical foundations for motivation in scientific research for the Muslim scientists of the future 23. Nasr claimed that the early Muslim scientists, particularly the alchemists were motivated by the prospect of gaining the knowledge of the hidden.
Both programs were bound to fail because of the errors they contained. The proponents did not see that the decline of the Islamic civilization in the field of science, and consequently in many other fields of life, was the result of the Muslims' abandonment of their own research tradition, and deeper below, their loss of motivation for learning and research. No program can succeed before Muslims regain their motivation for science, which in turn can happen only when Muslims abandon the disintegrated concept system and reclaim the rich and consistent concept system in the Qur'an. Only then will they begin to see reality as it is, and free themselves from the need for any slogans and from the evasiveness of the shallow policies of change.
As to Pervez Hoodbhoy's24 comments: We have to take seriously his criticisms of the ill-formed notion of science held by today's Muslims. However, we also have to consider the serious errors in his analyses, and we will show how his analyses are based on a series of mistaken premises.
One of the serious errors of Hoodbhoy is his claim that science is a secular activity25. His claim is based on his mistaken views about the scientific motivation of Muslims during the 8th-11th centuries. He seems to forget that the Muslims in that era conducted their studies and research within a certain concept system and the related understanding of "being human" that they learned from Islam. Early Muslims had great respect for truth (= haqq) and knowledge (= ‘ilm), and would endure all kinds of hardships for truth and knowledge.
If Hoodbhoy were correct in his claims that the Muslim scientists and philosophers in that era owed their success to a secular view of the world, why had other more secular societies not been able to demonstrate such remarkable scientific activity around the same centuries? The fact is that Muslims in those centuries had an unrivalled position in terms of scientific activity, even though they had never obstructed the activities of other cultures that lived with them.
Hoodbhoy correctly identifies that the "Islamic" countries are in a "crisis of science" down from popular to the administrative levels, and that this crisis has been causing them a complete destruction in many fields of life. However, he continues to state that, when examined carefully, it would be seen that this crisis is of political nature in its essence26.
We have only to remember that this diagnosis had already been given by the Ottoman and the Turkish Republican intellectuals and administrators much earlier, and that the political measures taken to that effect have until today produced nothing but failure. We can see that deep in the heart of this crisis, there lie problems of a conceptual nature, rather than political or economic problems.
Finally, after having described the tragic situation of today's Muslims as regards science, Hoodbhoy finds that the only way for Muslims to initiate a scientific revival is a secular approach, with the acceptance of a "science - religion" duality. We have just described how a similar approach has failed in Turkey since the early days of the Republic. The reasons for the failure were, first of all, that this approach was obstructing the Muslims' motivation for learning and for research in general, let alone their scientific motivation. Secondly and more importantly, the conceptual division entailed by such a secular approach contradicts the unifying concept of knowledge and a host of related concepts in the Qur'an.
It would be naive to expect Muslims to have any sustainable motivation for scientific activity within such a contradictory and secular conceptual framework. Besides, any other motivation would be indistinguishable from anything within the existing framework of modern scientific culture which itself cannot offer any new solutions to the existing problems of modern science. Fame or the feeling of superiority, are the main motivations for science in the contemporary secular culture, but not any love for truth and reality. More scientists with the same motivations would not earn mankind any better world than this unjust world that we live in.
It should be noted that the conditions of contemporary world do not support isolated scientific activities, except in some extreme cases. This means that scientists are in much closer contact with each other than they were in the earlier centuries. Therefore Muslim scientists would do better if they focused on more basic problems of modern science, rather than on the particular problems of modern science, but only through a concept system which has perfect contacts with reality. This is because, at the roots of the illnesses of the contemporary culture, there lies a complex conceptual network inherited from the distant past, which has already begun to hinder further progress in science.
8. Back to the Future
We insist that the Muslims' scientific revival in the future can be realized neither by Hoodbhoy's programme, nor by that of the other Muslim writers that he criticizes, because none of them seem to be aware of the conceptual disintegration behind the problems, and how it obstructs the scientific motivation of the Muslims of today. These writers find the solution only in changes of external factors, with new additions to a disintegrated concept system. What drives men/women to learning is their motivation, which can only develop in a concept system that feeds it. We have seen a live historical example that it is possible to extinguish cognitive motivation in a concept system and the related cultural environment by introducing changes to that concept system, as has happened to the Islamic civilization. The question that now lingers is: How can we revive a disintegrated concept system which was in perfect order when it started? Is there any easy way of reconstructing a degenerated concept system?
We are not going to answer these questions in this study. This subject will be dealt with in detail in a book that we intend to publish soon. What we can say here is that, before Muslims realize the necessity for such a conceptual restructuring, they cannot offer much hope for the future of Islamic civilization. Unless and until they regain the rich concept structure that they abandoned around the 11th century, Muslims will not possess the cognitive motivation by which they can make real progress in science and other fields of civilization. Finally, we can also say that, if Muslims, in an effort which we can call going "back to the future", succeed in regaining the concept system that they began to abandon around the 11th century, then the history of mankind shall once again witness surprising scientific developments from the hands of Muslims, surprising even by the standards of the fast changing scientific and technological conditions of our time.
References
1. Russell, B. (1969). History of Western Philosophy. George Allen & Unwin, London, p. 420.
2. Sarton, G. (1927-48). Introduction to History of Science. Williams & Wilkins, Baltimore, Vol. 3, Chapter 5.
3. Huff, T.E. (1993). The Rise of Early Modern Science: Islam, China and the West. Cambridge U.P.
4. Hoodbhoy, P. (1992). Islam ve Bilim. (Turkish Tr.), Cep Kitaplari, Istanbul.
5. Scientific activity was institutionalized in the famous Bayt al-Hikma (= House of Wisdom) established by Abbasid caliphs in Baghdad in the 9th century, from where it quickly spread throughout the then Islamic world from Spain to Central Asia.
6. Kocabas, S. (1993). Elements of Scientific Creativity. Working Notes: AAAI Spring Symposium Series, 23-25 March 1993, Stanford, USA, pp. 39-46.
7. See: Rosenthal, F. (1970). Knowledge Triumphant. E.J. Brill, Leiden.
8. The figures in the parentheses indicate the numbers of the chapters and verses of the Qur'an. E.g. (006.108) means the 6th chapter, 108th verse.
9. Rosenthal, F. (1970). Knowledge Triumphant. E.J. Brill, Leiden.
10. See, Hamidullah, M. (1966). The Prophet of Islam. Tr. into Turkish by M.Said Mutlu. Irfan Yayinevi, Istanbul, p. 14. (The author relates this information from classical Islamic sources, Ibn Sa’d, Suhaili, and Ibn Hanbal.)
11. The reader is referred to the following sources for the Muslim contribution to early modern science:
- Sarton, G. (1927-48). Introduction to History of Science. Williams and Wilkins, Baltimore.
- Nasr, S.H. (1989). Islamic Science. Insan Yayinlari, Istanbul.
- Huff, T.E. (1993). The Rise of Early Modern Science. Cambridge U.P., Cambridge.
- Demirci, M. (1996). Beyt-ul Hikme. (In Turkish). Insan Yayinlari, Istanbul.
- Akin, O. and Desay, M. (1993). Five Great Scholars of Algebra. (In Turkish). MEB Yayinlari, Ankara.
- Leicester, H.M. (1971). The Historical Background of Chemistry. Dover, N.Y.
12. See: Leicester, H.M. (1971). The Historical Background of Chemistry. Dover, New York. p. 66.
13. Ibid, p. 66.
14. Ibid, p. 71.
15. Topdemir, H.G. (1991). Ibnul Heysem'in Optik Arastirmalari (= The Optical Studies of Ibn Haytham). Bilim, Felsefe, Tarih. No. 1, pp 187-190.
16. See, Huff, T.E. (1993). The Rise of Early Modern Science. Cambridge U.P, Cambridge. p. 56 and 58.
17. This view is also supported by Huff (1993).
18. See, Huff, T.E. (1993).
19. Al Ghazali in his book Al Munkiz min ad-Dalal states that logic and mathematics which are counted as philosophical sciences, have nothing to do, positively or negatively, with the religion. We do not argue about his intentions about this qualification, but observe that it is quite open to misunderstandings.
20. See, Averroes (1978, pp. 316-321). Tahafut al-Tahafut. Tr. by Simon van Den Bergh. Luzac, London.
21. Averroes (1976). Kitab Fasl al-Maqal (On the Harmony Between Religion and Philosophy). Tr. by G. F. Hourani. Luzac, London.
22. Turan, O. Turk Cihan Hakimiyeti Mefkuresi Tarihi. Vol I-II, p. 542.
23. Nasr, S.H. (1991). Islamic Science. Insan Yayinlari, Istanbul.
24. Hoodbhoy, P. (1992). Islam ve Bilim. (Turkish Tr.) Cep Kitaplari, Istanbul.
25. Ibid, p. 17.
26. Ibid, p. 21.
Thursday, August 30, 2007
Tuesday, August 21, 2007
How Allah Directs Natural Phenomena
Sakir Kocabas
Summary
In this work we look into how Allah directs what we call “natural phenomena”. Let us note from the outset that our study is based on the ayahs (= verses) of the Qur’an. For a more detailed study, the sayings of Prophet Muhammad (s.a.w.) on the subject also need to be taken into account. Yet, relying on the general principle that there can be no contradiction between the ayahs of the Qur’an and the sayings of the Prophet, we believe that the conclusions that can be derived from the Qur’an about this subject will be sufficient to draw a correct frame to start with.
Before we go on to explore the main subject of this study, we need to recall some of the ayahs in the Qur’an about how Allah, the Creator of the heavens and the earth, has established the order in the heavens and the earth and how He maintains it. For this reason we will first see the ayahs of the Qur’an, which state that Allah is the Real Ruler (= Malik al-Haqq). Secondly, we examine in some detail, the ayahs that express how Allah has established the order in the heavens and how He maintains it. Next, we attempt to bring clarity to the concepts of “physical phenomenon” and “natural phenomenon”. After these definitions we attempt to explore our main subject: How Allah controls and directs physical phenomena and natural phenomena. Finally, we end our survey with a summary of the conclusions.
1. Allah is the Real Ruler (= Malik al-Haqq)
In the Qur’an, there are more than 30 ayahs (= Qur’anic verses) that state that Allah is the Creator of the heavens and the earth.1 Some other ayahs state that Allah’s is the dominion of the heavens and the earth (= lahu mulk as-samawati wa-l ard). Moreover, Allah also states in the Qur’an that He is the Real Ruler (= Malik al-Haqq):
“Exalted is Allah, the Real Ruler; be not in haste with the Qur’an before its revelation to you is completed, but say: My Sustainer, increase me in knowledge.” (Ta-Ha 20/114)
We learn from this ayah an important name of Allah: Malik al-Haqq (= the Real Ruler). We need to dwell on the meaning of this name. Briefly, the above ayah expresses in a clear and succinct way that whatever happens in the heavens and the earth happens under the direction and control of Allah. The following ayahs, on the other hand, state clearly that there is nothing in the heavens and the earth that escapes His knowledge:
“Allah is He who created the seven heavens, and of the earth the like of them [in number]; the instruction (= amr) descends through the midst of them [all]; that you may know that Allah has power over all things and that Allah has encircled all things with knowledge (= wa annallaha qad ahata bi kulli shay’in ‘ilma).” (Talaq 65/12)
“But the god of you all is Allah; there is no god but He; He surrounds all things with knowledge (= wasia kulli shay’in ‘ilma).” (Ta-Ha 20/98)
These ayahs state that Allah has encircled and surrounded everything with knowledge. Moreover, as stated in many other ayahs in the Qur’an, “He is well informed of the actions of His servants” (= wallahu khabeerun bi ma ya’malun); “He sees their actions” (= wallahu baseerun bi ma ya’malun); “He hears and sees” (= innahu huwas sami’ul baseer); and as stated in Mulk/67, “He sees everything” (= innahu bi kulli shay’in baseer); in Fatir/38, “Allah knows the secrets of the heavens and the earth” (= innallaha ‘alimul ghaybis samawati wal ard); and in the same ayah, “indeed He knows the secrets of the hearts” (= innahu ‘aleemun bi zatissudur); and in Yunus/10, “… nothing in the heavens and the earth that weighs as much as an atom (= misqala zarratin), or [anything] smaller or greater than that, escapes His attention (= wa ma ya’zubu ‘an rabbika); all [of this] is in an open book.” It is clearly understood from these ayahs that all that happens in the heavens and the earth is like an open book to Him, and that nothing happens in the heavens and the earth outside His knowledge.
2. Allah’s is the Administration of the Heavens and the Earth
In the previous section we saw the ayahs which state that Allah is the Real Ruler, and that there is nothing that escapes His knowledge. Yet, His power is not limited to this, for Allah holds in His hand, the Administration (= malakut) of all things:
“Exalted is He in whose hand is the administration of all things (= fa subhan allazi bi yadihi malakutu kulli shay’); you will return to Him.” (Ya-Sin 36/83)
From this ayah it is clear that Allah holds the Administration of the heavens and the earth, and that He is the Real Ruler. The limited power that He gives to some people in this world for a determined period is only by His will, and He takes it back when He wills.
Having seen the ayahs that state that Allah is the Real Ruler of the heavens and the earth, we are faced with two questions: How does Allah rule the heavens and the earth? Can the human mind comprehend how He rules the heavens and the earth? At first sight these questions may seem impossible to answer, yet the answers to both questions can be derived easily from the Qur’an. The next ayah invites mankind to conduct observational and theoretical study of the Administration of the heavens and the earth:
“Have they not studied the administration of the heavens and the earth and what things that Allah has created? (= awa lam yanzuru fi malakut as-samawati wal ardi wa ma khalaqallahu min shay)” (A’raf 7/185)
Another ayah below expresses how Allah actually realizes the Administration, and that this is to be known by mankind:
“Allah is He who has created the seven heavens, and of the earth the like of them; the amr (= instruction) descends through the midst of them so that you may know (= li ya’lamu) that Allah has power over all things and that Allah has encircled everything with knowledge (= wa annallaha qad ahata bi kulli shay’in ‘ilma).” (Talaq 65/12)
In this ayah the word “amr” (= instruction/command) is a particularly important keyword for understanding the administration of the heavens and the earth. As can be understood from the ayah, Allah rules the heavens and the earth by His amr. But what is amr, and what is its function in the administration? In order to understand this, we have to look into the ayahs in the Qur’an in which this word and its derivatives occur. In what follows, we attempt to explain briefly what this word refers to in these ayahs.3
3. The Word Amr in the Qur’an and the Order in the Heavens
In the Qur’an, in relation to the creation and the administration of the heavens, the word amr appears in the ayahs in three principal frames:4
1) The amr that has been revealed into the heavens during their creation, by which the primary order has been established. (We call this the “primary amr”.)
2) The amr that is sent by Allah to influence and control the events in the world. (We call this the “secondary amr”.) With this amr, which He sends down by His angels and which is directly subject to His permission (= izn), Allah can change the current order in any region of space, and can create new and unseen events by it.5
3) The amr that will terminate the current order in the heavens and the earth, which in the Qur’an is called the amr of the Hour (= amr as-Saah).
We can now take a closer look at how the word amr appears in these three contexts.
The order in the Heavens: The primary amr
The use of the word “amr” together with the words “sakhara” (= make dependent) and “qadr” (= measure) in the ayahs in the first sense above is closely related to how the order has been established and maintained in the heavens. This is made clear by the ayahs which state that the seven heavens had their instructions revealed in (or loaded into) them at their creation, and that the states of the heavens and of the objects in them are maintained by this amr:
“And in two days He decreed (= qada) them [the heaven and the earth] as the seven heavens, and revealed in each heaven its instruction (= wa awha fi kulli samain amraha) ...” (Fussilat 41/12)
“The sun, the moon and the stars are all subjected [to remain in their courses] by His instruction (= musakharatun bi amrihi).” (A’raf 7/54, Ibrahim 14/33)
“And one of His signs is that the heaven and the earth stand with His amr (= an taqum as-samau wal ardu bi amrihi) ...” (Rum 30/25)
“Did you not see that Allah has made subject to you whatever is on the earth? Ships flow by His amr; He holds the heaven from falling on earth so that it would not fall, except by His permission (= izn); Allah is Most Kind and Most Merciful to mankind.” (Haj 22/65)
As can be seen, the order in the heavens is established and maintained by the (primary) amr that has been revealed in them by Allah. In this case, whatever happens in the heavens must happen in accordance with this amr, so long as there is no other intervention by Allah.6
This understanding leads us to an interesting concept of science: in this conceptual framework, the aim of scientific investigation becomes understanding and explicating the structure and distribution of the instruction that has been revealed in the heavens by the Creator. Such an understanding of science would not only explain many things about the order and harmony in known space, but would also bring clarity to the issue of the creation and formation of the objects in space. No cosmology developed to date has the basic concepts by which the extremely complicated, and yet excellent, order that we observe from the micro-world to the macro-world can be explained in a consistent way. How did the wondrously rich physical, chemical, biological and psychological interactions in the world evolve from a small number of basic physical forces? Is there a cosmic plan behind all this? These are the questions which occupy the minds of many scientists working in the fields of physics and cosmology.7
From the ayahs related to the primary amr in the Qur’an, we can infer that the order in the heavens emerges as a result of the interactions of the instructions (= amr) dispersed in all regions of space (= makan). In this case, the word “amr” (= instruction) emerges as a fundamental concept directly related to “being”. In information physics, the concept of information is used as a basic concept in explaining the degree of order (or the negative entropy) of physical systems.8 But there are categorical differences between the concepts of amr and information: it seems that the primary amr is a set of instructions which not only determines the order in a region of space, but also brings about what we call “matter” itself. From this, we can say that the concept of amr, unlike the concept of information, is a concept related to “being”, or in philosophical terms, is an ontological concept. (Indeed, in a number of ayahs the word amr occurs in close relation with the word kun (= be); see e.g. the ayahs in Baqarah 2/117, Al-i Imran 3/47, Maryam 19/35, Mu’min 40/68, Ya-Sin 36/82.)
At this point, we are faced with an important question: is the primary amr sufficient in itself to maintain the order in the heavens? Before we make a judgement on this issue, we need to consider the ayah:
“It is Allah who holds the heavens and the earth from collapse (= yumsiku-s samawati wal arda an tazula); if they should collapse, there is none, not one, who can hold them thereafter; verily He is most Forbearing, oft Forgiving.” (Fatir 35/41)
This ayah brings several possibilities to mind. The first is that the verb “holds” (= yumsiku) can be understood as “holds with His amr”, so that when the effects of the primary amr are obliterated by Allah, there would be no one other than Him to bring back the order. The ayah in Rum 30/25 that we saw earlier strengthens this possibility. The second is that the heavens are protected from reduction or collapse; the word “tazula”, which is a derivative of “zawal” (= reduction, collapse), may be pointing to such a possibility.9
Another possibility is that the cosmic order which has been established by the primary amr cannot go on indefinitely by itself, and that Allah maintains the order by His secondary amr. It is also imaginable that more than one of these possibilities holds. Possibilities other than those we have mentioned here, within a theoretical and speculative framework, also need to be considered and investigated.
Allah’s intervention in the events in this world: The secondary amr
Let us continue with the relationships between the word amr and the order in the heavens. Since the order in the heavens has been established by the primary amr that has been revealed in them, one might think: If we had a complete understanding of the primary amr which has been revealed in the heavens, and consequently in all systems in them, we could understand all that happens in them. In this way we could possess complete knowledge about these happenings, and see the future. (This could be the final vision of the contemporary understanding of science.) Yet, as explained in some detail below, the problem is not as simple as this. The main reason is that the amr is not something that cannot be changed, nor something whose effects cannot be overridden once it has been revealed in these systems. An ayah which we saw earlier explicitly states that other instructions are being (continuously or periodically) sent down by Allah:
“Allah is He who created the seven heavens, and of the earth the like of them [in number]; the instruction (= amr) descends through the midst of them [all]; that you may know that Allah has power over all things and that Allah has encircled all things with knowledge.” (Talaq 65/12)
Indeed, as will be seen when the conceptual frames of the word amr and all other related words (haqq, qadr, qada, izn, sakhara, sultan, ‘aql, and ruh) are examined, the amr does not consist only of the primary amr. We understand from the ayahs in which the words amr, haqq, izn, qadr, and qada occur that the effects of the primary amr can be cancelled or overridden, or entirely new conditions can be created, by a new amr (the secondary amr) sent down by Allah.
The secondary amr sent by the Real Ruler is what is (in terms of the verbs used in the Qur’an in association with it) determined (mubrim), decided on (mustaqir), measured to a measure (qadaran maqdura), differentiated (yufraqu), directed or administered (yudabbir), sent (mursil), sent down (munzil), distributed (muqassimat), decreed (qada), and infused in (yulqi). The angels are sent down with this amr, and they descend through the heavens with it, and after the amr is obeyed (ata), applied/done (maf’ul) and completed (balagha), the amr ascends (ya’ruj) to Allah, and returns (yurji) to Him. The effects of this amr are sometimes made visible (zahara) to mankind. These verses indicate that the completion of the cycle of amr can be regarded both as periodical and continuous.
Some of the other ayahs directly related with the secondary amr are:
“… an amr from Allah’s presence …” (Maida 5/52, Duhan 44/5)
“Or do they determine the amr? We indeed are the determiner (= mubrimun).” (Zukhruf 43/79)
“…all amr have been decided on (= wa kulli amrin mustaqir).” (Qamar 54/3)
“[Allah] directs the amr from the heaven to the earth (= yudabbir al-amri min as-samai ilal ard); then it ascends (= ya’ruj) to Him in [part of] a day the measure of which is a thousand years in your count.” (Sajda 32/5)
“[Allah] sends / sends down the amr (= mursil/munzil).” (Duhan 44/5, Talaq 65/5)
“Allah’s amr is a measure measured (= qadaran maqdura).” (Ahzab 33/38)
“[A night] in which all wise [or mighty] amr are differentiated (= fiha yufraqu kulli amrin hakeem).” (Duhan 44/4)
“The angels and the Spirit (= Ruh) descend in that [night] by the leave (= izn) of their Sustainer from [or with] all amr.” (Qadr 97/4)
“When He decrees an amr, He says: ‘Be!’, and it is (= iza qada amran yaqulu lahu kun fa yakun).” (Baqara 2/117, Al-i Imran 3/47, Maryam 19/35, Mu’min 40/68, Ya-Sin 36/82)
“The amr of Allah has come (= ata amrullah) …” (Nahl 16/1)
“… the amr of Allah is done [or applied] (= wa kana amrullahi maf’ula).” (Nisa 4/47)
“… the amr of Allah has become visible (= zahara amrullah) …” (Tawba 9/48)
“Our amr is but a single [act] like the twinkling of an eye.” (Qamar 54/50)
“… Allah has power over His amr (= wallahu ghalibun ‘ala amrihi), but most among mankind know it not.” (Yusuf 12/21)
There are ayahs in the Qur’an indicating that the angels are given the task of applying the secondary amr. Some of these are:
“We [angels] descend only by the leave of your Sustainer (= wa ma natanazzalu illa bi amri rabbika)…” (Maryam 19/64)
“And to Allah bow all that is in the heavens and in the earth whether moving [living] creatures or the angels; for none are arrogant [before their Sustainer].” (Nahl 16/49-50)
“And they [the angels] speak not before He [speaks], and they act by His amr.” (Anbiya 21/27)
Two other ayahs related to the secondary amr, which particularly attract our attention, are:
“Nor can a soul die except by Allah’s leave (= izn), the term being fixed as by writing…” (Al-i Imran 3/145)
“For each [person] there are [angels] before and behind him; they protect him from the amr of Allah (= yahfazuna min amrillah). Verily, never will Allah change the condition of a nation until they change what is in their soul; but when Allah wishes the punishment of a nation, there can be no turning it back, nor will they find beside Him any to protect.” (Ra’d 13/11)
As can be seen, the first ayah states that no person/soul (= nafs) dies except by Allah’s leave (= bi iznillah), which is associated with an amr. This ayah also shows that all the physical conditions determined by the primary amr would not be sufficient to cause the death of a person, however deadly they may seem. This issue is made clear by the statement in the last ayah, “they protect him from the amr of Allah”, so that these protectors [angels] protect that person from the unbearable and deadly effects of the primary amr.10
All these ayahs clearly indicate that the primary amr that has been revealed in the heavens may not explain every event that happens in the heavens and the earth, even though the order in the heavens has primarily been established and is maintained by it. Still, we must not overlook the fact that the primary amr has a basic function in maintaining the cosmic order.
We also see from these ayahs that the decree and application of an amr that overtakes or overcomes the primary amr in a region of space is totally dependent on Allah’s leave (= izn). This is very important, because without Allah’s leave, the primary amr continues its function, and all makan (= spaces) and all the objects in them continue to carry the properties determined by it. The following ayahs clearly state this:
“Did you not see that Allah has made subject to you whatever is on the earth? Ships flow (= tajree) by His amr; He holds the heaven from falling on the earth so that it would not fall, except by His permission (= izn); Allah is Most Kind and Most Merciful to mankind.” (Haj 22/65)
“… the sun, the moon and the stars are in subjection by His amr; verily in this are signs for a nation who use intellect.” (Nahl 16/12)
As can be seen from these ayahs, the properties and motions of the objects in the heavens are formed by the primary amr, and as long as Allah does not send another amr on them by His izn, their properties and motions will continue. The effective use of objects in the heavens and the forces in them by mankind requires studying these properties and the physical forces that determine these properties. We can say that the basic physical forces emerge as the result of the order (= mizan) that has been placed in the heavens.
By all the activity that we call “scientific research”, we can acquire only the knowledge of the effects of the primary amr and the established order (= mizan). In the future, even if we should have an excellent knowledge of physics and computation, we can only have the knowledge needed to predict the effects of the primary amr in a certain region of space.11 As we will explain shortly, when Allah interferes with His amr in any “natural phenomenon”, it would be impossible to make any reliable prediction about the processes of that phenomenon by physical methods, because it is impossible to know by any scientific method when and where the secondary amr will take effect.
The ayahs related to this issue clearly show that, even if we had a complete knowledge of the primary amr that has been revealed in the heavens, we would still not have a complete and absolute knowledge about the world. Even if we used all our scientific research methods, we could not obtain even a trace of knowledge of the secondary amr that Allah may send by His leave (= izn) to cancel out, or partially or completely change, the effects of the primary amr in a region of space. This tells us that even when we possess an excellent science and technology, we should put our reliance and trust only in Allah.
***
We see in some of the ayahs of the Qur’an that a close relationship is made between using intellect and understanding cosmological events. Since the order in the heavens and the earth is the work of Allah, the study and research for understanding this great work should be a paramount duty for mankind, because in this way, the true might of Allah can be better understood and better appreciated.
Besides, these ayahs clearly motivate mankind to reason about the creation of the heavens, to understand the amr which lies beneath the cosmological events, and to use them effectively for the benefit of mankind. These ayahs also ask mankind to take lessons from such events, and guide them to realize that the life of the Hereafter which Allah promises is far superior to the life of this world. Some of the related ayahs are:
“Indeed, in the creation of the heavens and the earth; in the alternation of the day and night; in the ships which flow (= tajree) in the ocean; in the rain which Allah sends down from the sky and revives the earth with it after its death; in the beasts of all kinds that He scatters throughout the earth; in the redirection (= tasreef) of the winds, and the clouds which they trail like their slaves between the earth and the sky, are signs for a nation who use intellect.” (Baqara 2/164)
“In the alternation of the day and night; in the sustenance which Allah sends down from the sky and revives the earth with it after its death; and in the redirection (= tasreef) of the winds, are signs for a nation who use intellect.” (Jasiya 45/5)
“And such are the parables We set for mankind, but none use intellect on them except those who have knowledge (= wa ma ya’qiluha illal ‘alimun).” (Ankabut 29/43)
In the first two verses above, the word “ya’qilun” (= those who use intellect) refers to those who can see the connection between these ayahs and reality. The same term appears in the following verses which state that the life of the Hereafter is superior to the life of this world:
“… But the home [or life] of the Hereafter (= dar al akhira) is better for the righteous; will you not use intellect?” (A’raf 7/169)
“The things you have been given are but the provision and the glitter of the life of this world; better is Allah’s reward and more lasting. Will you not use intellect?” (Qasas 28/60)
On the other hand, the following ayah states that those who do not use intellect are like cattle, or even lower in guidance:
“Do you think that most of them listen or use intellect? They are like cattle, and even more misguided (= bal hum adall).” (Furqan 25/44)
In the ayah below, a surprising statement appears: “Whatever is in the heavens” has been given to the use and benefit of mankind:
“[Allah] has subjected (= sakhara) to you whatever is in the heavens and the earth, all from Him; verily in this there are signs (= ayat) for a nation who reflect (= qawmin yatafakkarun).” (Jasiya 45/13)
In this ayah the expression “all from Him” indicates that the use of all the objects and events, or all the physical forces that form and determine their properties and motions are given, without exception, potentially to all men who strive to understand them. In this ayah we also see that the word “sakhara” (= subjected to) is linked with the phrase “a nation who reflect”. This means that the effective use of the objects and the physical forces in the heavens and the earth will be accomplished by collaborative study and the use of intellect by people as nations, rather than as isolated individuals. This would require of course, a public orientation and participation.
The end of the order in the heavens: The amr of the Hour
The amr which is termed in the Qur’an the amr of the Hour (= amr-us saah) is the instruction which will terminate the order established with the primary amr. As understood from the ayahs related to the Day of Standing (= yawm al qiyama) in the Qur’an, this amr will take effect on a day which will encompass Resurrection and the Day of Reckoning (= yawm al hisab). This subject is dealt with in detail in another study titled “The Day of Standing in the Qur’an and Traditions”, which we hope to have translated into English soon. After these explanations, we can now go on to the definitions of the terms “physical event” and “natural phenomenon” within this conceptual framework.
4. The Definition of “Physical Event”
After this brief inquiry into the ayahs related to the word amr, we can now attempt to provide a definition of “physical event” within this framework: A physical event, or physical phenomenon, is an event which happens only within the context of the primary amr which has been revealed in the heavens with their creation. The characteristic feature of such events is that they are repeatable (or repeatedly observable) by humans under laboratory and observational conditions. We can say that causality in physical events arises as a result of the order or symmetry (= mizan) laid down with the primary amr. Causality can be said to be relevant only in the space of large (or macro) scale interactions. Symmetry is also in effect in most interactions in micro space, but here causality gives way to uncertainty due to some fundamental properties of light (= photons?) invariably used in the measurements.
The uncertainty in physical events arises in two categorically different forms: uncertainty in the microworld, and uncertainty in the macroworld. We stated that the first arises from the basic properties of light used in observations and measurements. The second type of uncertainty arises from the difficulties of determining the initial conditions of certain complex physical events. This can be called statistical uncertainty. Physicists believe that many physical phenomena can be modeled by differential equations. In such models, the main problem is to determine the initial conditions, so that starting with these conditions at a time t0, the equations would give the status of the event at a later time t1. In many physical events, the initial conditions are too complicated to know, but in many others, they are known within statistical limits.
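The role of initial conditions in such models can be illustrated with a minimal sketch. The example below is only an illustration and is not taken from the text: the choice of equation (simple exponential decay, dx/dt = -k·x), the function name integrate_decay, and the parameter values are all assumptions. It shows how, once the state at time t0 is known, the equation gives the state at time t1, and how a different initial condition leads to a different state at t1.

```python
import math

def integrate_decay(x0, k, t0, t1, steps=10_000):
    """Advance dx/dt = -k*x from time t0 to t1 with Euler's method.

    x0 is the initial condition (the state at t0). The equation and all
    parameter names are illustrative assumptions, not from the source.
    """
    dt = (t1 - t0) / steps
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)  # one small step along the differential equation
    return x

# Known initial condition at t0 = 0 gives the state at t1 = 2.
x_t1 = integrate_decay(x0=1.0, k=0.5, t0=0.0, t1=2.0)
x_exact = 1.0 * math.exp(-0.5 * 2.0)  # closed-form solution for comparison

# A different initial condition gives a different state at t1.
x_t1_other = integrate_decay(x0=2.0, k=0.5, t0=0.0, t1=2.0)

print(x_t1, x_exact, x_t1_other)
```

For this simple equation, an error in the initial condition carries over proportionally to the result; the difficulty described above arises when the initial conditions of a complex event cannot be determined at all, or only within statistical limits.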
5. The Definition of “Natural Phenomenon”
The term “natural phenomenon” is mostly used for macro scale events observed in the world and in space. Solar and lunar eclipses, various meteorological events, and earthquakes are regarded as phenomena in this framework. In today’s understanding of science, all natural phenomena are believed to be a composition of mere physical events. However, unlike accurately predictable events such as solar and lunar eclipses, the unpredictability of meteorological events and earthquakes has led scientists to consider them in a separate class as “chaotic events”.
Based on this classification, causality in natural phenomena needs to be considered in two different frames. Space events such as solar and lunar eclipses can be explained by the effects of the physical forces, such as gravity, which determine the orbits of the objects in space. But as the explanations of meteorological and tectonic events require taking into account a number of different effects at the same time, a complete explanation of such an event becomes impossible. The term “butterfly effect” has been coined by some scientists to describe this for meteorological phenomena. In meteorological and geological phenomena, many effects such as the particular spatial configuration of the planets in their orbits relative to the earth and the sun, solar explosions (or “solar spots”), the impact of large meteors on the earth, and other space events can be at work together. All these effects contribute to the uncertainties in the predictions and explanations of such events.
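The sensitivity that the term “butterfly effect” refers to can be sketched with a toy example. The code below is an assumption made purely for illustration: it uses the logistic map, a standard textbook model of chaotic behavior, and not a meteorological or geological model; the function name logistic_orbit and the starting values are hypothetical. Two sequences that start almost identically are followed step by step, and the gap between them is recorded.

```python
def logistic_orbit(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r*x*(1-x), a toy chaotic system.

    This is an illustrative stand-in, not a weather model; r = 4.0 is
    the usual parameter value at which the map behaves chaotically.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two initial conditions that differ by one part in ten billion.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Gap between the two orbits at every step.
gaps = [abs(p - q) for p, q in zip(a, b)]

print("initial gap:", gaps[0])
print("largest gap within 100 steps:", max(gaps))
```

In this sketch the two orbits soon bear no resemblance to each other, even though the rule that generates them is fully deterministic. This is the sense in which an imperfect knowledge of initial conditions limits prediction in chaotic events.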
6. Allah’s Intervention in Physical Events
Earlier, we saw from the ayahs in the Qur’an that Allah intervenes in physical events in any region of space and time by His secondary amr whenever He wishes (= yuridu/arada) to do so. When Allah’s amr comes to a space-time region, it may result in three different effects: 1) The obstruction or cancellation of the effects of the primary amr in the same region, 2) The strengthening of the effects of the primary amr, 3) The emergence of an entirely new set of effects in the same region.
In the first case, Allah’s new amr (the secondary amr) interacts with the primary amr that occupies the same place so as to cancel or weaken its effects.
In the second case, Allah’s amr interacts with the primary amr so as to increase its effects, and in this way, it strengthens and/or focuses the current effects.
In the third case, Allah’s new amr brings about entirely new and previously unseen effects either by opening new space for itself, or by interacting with the primary amr in the same space.
7. Allah’s Intervention in Natural Phenomena
A number of ayahs in the Qur’an clearly describe examples of how Allah intervenes in and directs what we call meteorological and geological events. We shall see some of these ayahs shortly. But first, let us consider what may happen when Allah intervenes in natural phenomena. In such cases we can think of four different effects: 1) Delaying or initiating the occurrence of the natural event in order to disperse the destructive effects which it would otherwise cause under the effects of the primary amr, 2) Focusing and directing the effects of the natural event, 3) Increasing the strength of the effects of the event, 4) Bringing about natural events with perceived effects of a kind previously unseen. Let us now see some of the ayahs which exemplify these four different effects:
“Did you not see that Allah makes the clouds move gently, then joins them together and then makes them into a heap? Then you see the rain emerge from their midst; and He sends down from the sky mountain masses (of clouds) in which is hail; He strikes with it whom He will, and He turns it away from whom He will. The vivid flash of His lightning almost blinds the eyes.” (Nur 24/43)
In this ayah the expression “He strikes with it whom He will, and He turns it away from whom He will” exemplifies the effects of the first and the second type that we listed above. Some ayahs which could be taken as examples of the other two cases are:
“[The nation of] Aad, behaved arrogantly in the land with no just reason, and they said: ‘Who is mightier than us?’ Could they not see that Allah, who created them, was mightier than they? Yet they denied Our signs.” (Fussilat 41/15)
“So We sent against them a furious wind through the days of disaster that We might give them a taste of punishment of humiliation in this life; but more humiliating still, will be the punishment of the life to come. And they will not be helped.” (Fussilat 41/16)
“And the [nation of] Aad; they were destroyed by a furious wind, exceedingly violent.” (Haaqqa 69/6)
“[The nation of Samood] rebelled against the amr of their Sustainer; so the stunning noise [of a thunderbolt] seized them even while they were looking on.” (Zariyat 51/44)
The expression “stunning noise” can be regarded as an example of a previously unseen “phenomenon”. (How “natural” could it be regarded is another matter.) In another unusual “phenomenon”, the nation of Lot were destroyed by the collapse of the land swallowing the whole city with all its population, together with a hail of baked stones [lava stones or meteorites?] from above:
“[The angels] said: O Lot! We are messengers from your Sustainer; they [your nation] shall not touch you; depart with your kinfolk in the dead of the night, and let none of you look back; as for your wife, she shall suffer the fate of the others. In the morning their hour will come. Is not the morning near?” (Houd 11/81)
“And when our amr came, we turned it [the city] upside down, and let loose upon it a shower of baked-stones spread layer on layer.” (Houd 11/82)
These ayahs describe, clearly with no need for more comments, how these nations were destroyed by what we may still call “natural events”.
At this point, regarding our current subject, we are faced with two extremely important questions: If Allah sometimes severely punishes entire nations, what could be the reasons for such punishments? Could it be known or predicted how such punishments will happen? The answers to these questions can be found in the Qur’an in the ayahs where the word “sunnatullah” occurs. In the Qur’an, the word “sunnatullah” occurs in reference to certain forms of conduct for people and nations. These laws are not changeable, not even for the Messengers of Allah:
“… no change will you find in the laws of Allah (= fa lan tajida li sunnatillahi tabdeela); and no turning off will you find in the laws of Allah.” (Faatir 35/43)
In the Qur’an these laws of conduct are also referred to as “the law of the ancients [past nations]” (= sunnatul awwaleen) in some ayahs. The codes of these laws, or in other words the conditions for them to take effect, can be found in the ayahs where the words “haqq” and “sunna” appear. We can summarize some of the conditions of these laws as follows:
- To behave arrogantly in the land with no just reason.
- To secretly devise evil plots.
- To kill the messengers of Allah, and those who instruct with equity.
- To make friends with the subjects of Satan (= shayateen) against Allah.
Let us now see the related ayahs. The first concerns the arrogant:
“[The nation of] Aad, behaved arrogantly in the land with no just reason (=fastakbaru fil ardi bi ghayri-l haqq), and they said: ‘Who is mightier than us?’ Could they not see that Allah, who created them, was mightier than they? Yet they denied Our signs.” (Fussilat 41/15)
The Aad paid in this world the due of their evil conduct by being destroyed by a furious storm:
“So We sent against them a furious storm through the days of disaster that We might give them a taste of punishment of humiliation in this life; but more humiliating still, will be the punishment of the life to come. And they will not be helped.” (Fussilat 41/16)
Another great sin which deserves punishment in this world is to secretly devise evil plots (= makr-us sayyia) against people:
“[Their] arrogance in the land and [their] plotting evil (= istakbaran fil ardi wa makr-us sayyia); evil plots will harm only their authors. Are they looking for other than the law of the ancients? No change will you find in the laws of Allah (= fa lan tajida li sunnatillahi tabdeela); and no turning off will you find in the laws of Allah.” (Faatir 35/43)
In relation to those who devise evil plots, the following ayahs also need to be considered:
“Do those who secretly devise evil plots (= allazina makaru-s sayyiat) feel secure that Allah will not cause the earth to swallow them up, or that the wrath will not seize them from directions they little perceive?” (Nahl 16/45)
“Or that He may not seize them in the midst of their going to and fro, when they cannot escape?” (Nahl 16/46)
“Or that He will not give them over to slow destruction? Yet your Sustainer is Compassionate and Merciful.” (Nahl 16/47)
The expression “your Sustainer is Compassionate and Merciful” can be understood to mean that Allah will save those who were wronged at the hands of the plotters of evil. It can also mean that Allah gives long respite to those who devise evil plots, besides openly warning them by His words about the due results of their deeds, so that they may take heed and give up their evil deeds.
From these ayahs we understand that such plotters of evil are to be sternly punished in this world in four different ways:
1) Allah will bury them in the ground by a terrible disaster.
2) They will be seized by the wrath from a direction they hardly perceive.
3) They will be seized when they go about doing their business.
4) They will be subjected to a slow destruction.
The past nations which murdered Allah’s messengers, or forcefully drove them away from their homes have been destroyed according to these laws (see, e.g. Isra 17/76-77). Those people who unjustly murdered the individuals who instructed equity among them, have also been destroyed in accordance with these laws:
“As to those who deny Allah’s revelations, and slay the Prophets and slay with no just reason those who instruct with equity among mankind; announce to them a grievous penalty.” (Al-i Imran 3/21)
“And We have destined for them intimate companions [shayateen]; who make past and future seem fair to them; well was the word justified against them [or: they deserved the fate] which overtook the parties (= umam) of the jinn and men who have gone before them. They shall assuredly be lost.” (Fussilat 41/25)
The expression “well was the word justified against them” (= haqqat ‘alayhim-ul qawl) in the last ayah can be taken to refer to the laws that have been applied to the ancients.
What we have discussed to this point are the laws by which mankind are punished in this world when they exceed the limits. We can now return to our second question above: Could it be known beforehand how the punishment will come into effect?
It is not possible to predict by which “natural event”, and in fact how sunnatullah will take effect. But since the limits of the conditions of these laws are given in the Qur’an, the pending disaster can be estimated by closely observing the behavior of the society, particularly the behavior of those who command and exercise power in it (e.g., its leaders, elites and the wealthy), as to whether the conditions are fulfilled or not.
There are many ayahs in the Qur’an about how Allah has punished the wrongdoing nations in the past, some of which seem to be relevant to the conditions of our time:
“Many cities have insolently opposed the command of their Sustainer and His Messengers, and We called them to a severe account (= hasabnaha hisaban shadeeda); We punished them with exemplary punishment.” (Talaq 65/8)
“When We decide to destroy a settlement, We first send Our command to those of them who live in comfort; if they transgress, so that the word is proved true against them, then We destroy them utterly.” (Isra 17/16)
“We did not wrong them, but they wronged their own souls; when the amr of your Sustainer came (= lamma ja’a amru rabbuka), the deities they invoked other than Allah availed them nothing; they only hastened their ruin.” (Houd 11/101)
Lastly, apart from these ayahs, the following ayah is of great interest regarding what may be expected to happen in the future in this world:
“There is no city (= qarya) but shall be destroyed or sternly punished before the Last Day; that is decreed in the Book (= kana zalika fi-l kitabi mastura).” (Isra 17/58)
It is the duty of all men and women who think, keep contact with reality and take heed, to prepare as is needed before the truth of these ayahs becomes visible.
Conclusion
Many ayahs in the Qur’an declare that Allah is the Creator of the heavens and the earth, and their Real Ruler. He has encompassed and encircled all things with knowledge. Nothing in the heavens and the earth can ever escape His knowledge. Allah intervenes in and directs, with His secondary amr and as He wishes, what we call “natural phenomena”, which normally happen within the framework of the primary order which He has established in the heavens by His primary amr. He punishes with them whom He decides among mankind, and turns the destruction away from whom He wishes. Allah has made clear by His laws, termed “sunnatullah” in the Qur’an, under what conditions He sends His punishment on people and nations. These laws are also somewhat related to the order (= mizan) which has been established by their creation. If the limits of these laws are well known, it can be sensed when they will take effect, but except as Allah wills, it cannot be known exactly when and how they will take effect. We must strive to understand, by using all the means available to us, the happenings in the heavens and the earth and try to understand reality as a whole. When we do this study in a perfect manner and keep equity about ourselves and our position in this world at the same time, we will better appreciate the true might of Allah (= haqqa qadrihi). We will also realize, as has been written in the Qur’an, that the life of the Hereafter is much superior to the life of this world and will give direction to our lives accordingly. Allah knows the best of all things. Was-salaam.
Notes
1 We refer to what is termed in the Qur’an “the heavens and the earth” as “the universe”. The validity of the term “universe” is being questioned by some physicists such as David Deutsch, who would prefer the term “multiverse” instead. The repeated use of the expression “the heavens and the earth” in the Qur’an can be viewed as stressing the particular importance of the earth in the heavens, with its being the home of millions of different living species including humans, and its surprisingly suitable conditions for the sustenance of life. This particular place of the planet earth in the universe has been of great interest to many physicists and cosmologists in recent decades. (See Ref. 7-b)
2 The expression “yanzuru fi” in this verse refers to both observation and systematic (theoretical) thinking, and Muslim scientists in the Classical Era derived the term “nazariya” (= theory) from the root of this verb “nazara fi”.
3 For more detailed explanations please see,
- Kocabas, S. “Islam’da Bilginin Temelleri”. Iz Yayincilik, Istanbul, 1997.
(An extended English version of this book is being prepared, and we hope to publish it in the future.)
4 The word “amr” occurs in the Qur’an in other frames than the two we have given here. See, Ref. 3 for details.
5 We must emphasize that the word “amr” is not classified in the Qur’an as “primary amr” and “secondary amr”. We have introduced this distinction from the differences of frames of the use of this word. But the appropriateness of this classification can even be seen from the personal pronouns that are used with the word “amr” in the Qur’an. What we call “the primary amr” corresponds to the uses in the ayahs where the word appears almost exclusively in the form “His amr” with the third person singular pronoun, and our term “the secondary amr” corresponds to the uses of the word as “the amr of Allah”, “Our amr”, and simply “the amr”.
6 Whether this amr can be understood as the “operating system” of the heavens, or as a kind of “software” loaded in the heavens, is a subject that deserves to be seriously considered.
7 See;
a - Davies, P. (1992). “The Mind of God”. Touchstone Books. New York.
b - Barrow, J.D. & Tipler, F. (1996). “The Anthropic Cosmological Principle.” Oxford: Oxford University Press.
8 For information physics, see: Stonier, T. (1990). “Information and the Internal Structure of the Universe”. London: Springer-Verlag.
9 The current theoretical framework about elementary particles involves the protection of the basic building blocks of material existence (e.g. protons and electrons) from decay. As an example, consider protons, which are accepted to be one of the basic constituents of hydrogen atoms: unlike free neutrons, protons can stay for a long period (at least 10^30 s) without decaying into lighter particles (mesons and leptons). Had there not been in effect a particular form of baryonic symmetry between elementary particles, there would be no atoms, and no living and inanimate objects as we know them in the world. The word “tazula” may be taken to refer to preventing this kind of collapse.
10 For detailed information on this subject, see Ref. 3.
11 Despite this, we believe that all scientific research in this direction needs to be continued by all means, because only those who have knowledge can see the limits of current scientific knowledge, and can better appreciate the true might of Allah. Also, we need to remember the ayah: “... and say: Could those who know be like those who know not? ...” (Zumar 39/9)
Sakir Kocabas
Summary
In this work we look into how Allah directs what we call “natural phenomena”. Let us note at the outset that our study is based on the ayahs (= verses) of the Qur’an. For a more detailed study, the sayings of Prophet Muhammad (s.a.w.) on the subject also need to be taken into account. Yet, relying on the general principle that there can be no contradiction between the ayahs of the Qur’an and the sayings of the Prophet, we believe that the conclusions that can be derived from the Qur’an about this subject will be sufficient to draw a correct frame to start with.
Before we go on to explore the main subject of this study, we need to recall some of the ayahs in the Qur’an about how Allah, the Creator of the heavens and the earth, has established the order in the heavens and the earth and how He maintains it. For this reason we will first see the ayahs of the Qur’an, which state that Allah is the Real Ruler (= Malik al-Haqq). Secondly, we examine in some detail, the ayahs that express how Allah has established the order in the heavens and how He maintains it. Next, we attempt to bring clarity to the concepts of “physical phenomenon” and “natural phenomenon”. After these definitions we attempt to explore our main subject: How Allah controls and directs physical phenomena and natural phenomena. Finally, we end our survey with a summary of the conclusions.
1. Allah is the Real Ruler (= Malik al-Haqq)
In the Qur’an, there are more than 30 ayahs (= Qur’anic verses) that state that Allah is the Creator of the heavens and the earth.1 Some other ayahs state that Allah’s is the dominion of the heavens and the earth (= lahu mulk as-samawati wa-l ard). Moreover, Allah also states in the Qur’an that He is the Real Ruler (= Malik al-Haqq):
“Exalted is Allah, the Real Ruler; be not in haste with the Qur’an before its revelation to you is completed, but say: My Sustainer, increase me in knowledge.” (Ta-Ha 20/114)
We learn from this ayah an important name of Allah: Malik al-Haqq (= the Real Ruler). We need to dwell on the meaning of this name. Briefly, the above ayah expresses in a clear and succinct way that whatever happens in the heavens and the earth happens under the direction and control of Allah. The following ayahs, on the other hand, state clearly that there is nothing in the heavens and the earth that escapes His knowledge:
“Allah is He who created the seven heavens, and of the earth the like of them [in number]; the instruction (= amr) descends through the midst of them [all]; that you may know that Allah has power over all things and that Allah has encircled all things with knowledge (= wa annallaha qad ahata bi kulli shay’in ‘ilma).” (Talaq 65/12)
“But the god of you all is Allah; there is no god but He; He surrounds all things with knowledge (= wasia kulli shay’in ‘ilma).” (Ta-Ha 20/98)
These ayahs state that Allah has encircled and surrounded everything with knowledge. Moreover, as stated in many other ayahs in the Qur’an, “He is well informed of the actions of His servants” (= wallahu khabeerun bi ma ya’malun); “He sees their actions” (= wallahu baseerun bi ma ya’malun); “He hears and sees” (= innahu huwas sami’ul baseer); and as stated in Mulk/67, “He sees everything” (= innahu bi kulli shay’in baseer); in Fatir/38, “Allah knows the secrets of the heavens and the earth” (= innallaha ‘alimul ghaybis samawati wal ard); and in the same ayah, “indeed He knows the secrets of the hearts” (= innahu ‘aleemun bi zatissudur); and in Yunus/10, “… nothing in the heavens and the earth that weighs as an atom (= misqala zarratin), or [anything] smaller or greater than that escapes from His attention (= wa ma ya’zubu ‘an rabbika), all [of this] are in an open book.” It is clearly understood from these ayahs that all that happens in the heavens and the earth is like an open book to Him, and that nothing happens in the heavens and the earth outside His knowledge.
2. Allah’s is the Administration of the Heavens and the Earth
In the previous section we saw the ayahs which state that Allah is the Real Ruler, and that there is nothing that escapes His knowledge. Yet, His power is not limited to this, for Allah holds in His hand, the Administration (= malakut) of all things:
“Exalted is He in whose hand is the administration of all things (= fa subhan allazi bi yadihi malakutu kulli shay’); you will return to Him.” (Ya-Sin 36/83)
From these verses it is clear that Allah holds the Administration of the heavens and the earth, and He is the Real Ruler. The limited power that He gives to some people in this world for a determined period, is only by His will, and He takes it back when He will.
Having seen the ayahs that state that Allah is the Real Ruler of the heavens and the earth, we are faced with two questions: How does Allah rule the heavens and the earth? Can the human mind comprehend how He rules the heavens and the earth? At first sight these questions may seem to be impossible to answer, yet the answers to both questions can be derived easily from the Qur’an. The next ayah invites mankind to conduct observational and theoretical study on the Administration of the heavens and the earth:
“Have they not studied the administration of the heavens and the earth and whatever things Allah has created? (= awa lam yanzuru fi malakut as-samawati wal ardi wa ma khalaqallahu min shay)” (A’raf 7/185)
Another ayah below expresses how Allah actually realizes the Administration, and that this is to be known by mankind:
“Allah is He who has created the seven heavens, and of the earth the like of them; the amr (= instruction) descends through the midst of them so that you may know (= li ya’lamu) that Allah has power over all things and that Allah has encircled everything with knowledge (= wa annallaha qad ahata bi kulli shay’in ‘ilma).” (Talaq 65/12)
In this ayah the word “amr” (= instruction/command) is a particularly important keyword for understanding the administration of the heavens and the earth. As can be understood from the ayah, Allah rules the heavens and the earth by His amr. But what is amr and what is its function in the administration? In order to understand this, we have to look into the ayahs in the Qur’an in which this word and its derivatives occur. We attempt to explain briefly next what this word refers to in the ayahs.3
3. The Word Amr in the Qur’an and the Order in the Heavens
In the Qur’an, in relation to the creation and the administration of the heavens, the word amr appears in the ayahs in three principal frames:4
1) The amr that has been revealed into the heavens during their creation, by which the primary order has been established. (We call this the “primary amr”.)
2) The amr that is sent by Allah to influence and control the events in the world. (We call this the “secondary amr”.) With this amr, which He sends down by His angels and which is directly subject to His permission (= izn), Allah can change the current order in any region of space, and can create new and unseen events by it.5
3) The amr that will terminate the current order in the heavens and the earth, which in the Qur’an is called the amr of the Hour (= amr as-Saah).
We can now take a closer look at how the word amr is used in these three contexts.
The order in the Heavens: The primary amr
The use of the word “amr” together with the words “sakhara” (= make dependent) and “qadr” (= measure) in the ayahs in the first sense above, is closely related with how the order has been established and maintained in the heavens. This is made clear by the ayahs which state that the seven heavens have been revealed in (or loaded with) their instructions with their creation, and that the states of the heavens and of the objects in them are maintained by this amr:
“And in two days He decreed (= qada) them [the heaven and the earth] as the seven heavens, and revealed in each heaven its instruction (= wa awha fi kulli samain amraha) ...” (Fussilat 41/12)
“The sun, the moon and the stars are all subjected [to remain in their courses] by His instruction (= musakharatun bi amrihi).” (A’raf 7/54, Ibrahim 14/33)
“And one of His signs is that the heaven and the earth stand with His amr (= an taqum as-samau wal ardu bi amrihi) ...” (Rum 30/25)
“Did you not see that Allah has made subject to you whatever is on the earth? Ships flow by His amr; He holds the heaven from falling on earth so that it would not fall, except by His permission (= izn); Allah is Most Kind and Most Merciful to mankind.” (Haj 22/65)
As can be seen, the order in the heavens is established and maintained by the (primary) amr that has been revealed in them by Allah. In this case, whatever happens in the heavens must happen in accordance with this amr, so long as there is no other intervention by Allah.6
This understanding leads us to an interesting concept of science, such that in this conceptual framework the aim of scientific investigation becomes understanding and explicating the structure and distribution of the instruction that has been revealed in the heavens by the Creator. Such an understanding of science would not only explain many things about the order and harmony in the known space, but would also bring clarity to the issue of the creation and formation of the objects in the space. No cosmology developed to date has the basic concepts by which the extremely complicated, and yet excellent, order that we observe from the micro-world to the macro-world can be explained in a consistent way. How did the wondrously rich physical, chemical, biological and psychological interactions in the world evolve from a small number of basic physical forces? Is there a cosmic plan behind all this? These are the questions which occupy the minds of many scientists working in the fields of physics and cosmology.7
From the ayahs related with the primary amr in the Qur’an, we can infer that the order in the heavens emerges as a result of the interactions of the instructions (= amr) dispersed in all regions of space (= makan). In this case, the word “amr” (= instruction) emerges as a fundamental concept directly related with “being”. In information physics, the concept of information is used as a basic concept in explaining the degree of order (or the negative entropy) of physical systems.8 But there are categorical differences between the concepts of amr and information: It seems that the primary amr is a set of instructions which not only determines the order in a region of space, but also brings about what we call “matter” itself. From this, we can say that the concept of amr, unlike the concept of information, is a concept related with “being”, or in philosophical terms, is an ontological concept. (Indeed, in a number of ayahs the word amr occurs in close relation with the word kun (= be); see e.g. ayahs in Baqarah 2/117, Al-i Imran 3/47, Maryam 19/35, Mu’min 40/68, Ya-Sin 36/82).
At this point, we are faced with an important question: whether or not the primary amr is sufficient in itself to maintain the order in the heavens. Before we make a judgement on this issue, we need to consider the ayah:
“It is Allah who holds the heavens and the earth from collapse (= yumsiku-s samawati wal arda an tazula); if they should collapse, there is none, not one, who can hold them thereafter; verily He is most Forbearing, oft Forgiving.” (Fatir 35/41)
This ayah brings several possibilities to mind. The first one is that the verb “holds” (= yumsiku) can be understood as “holds with His amr”, so that when the effects of the primary amr are obliterated by Allah, there would be no one other than Him to bring back the order. The ayah in Rum 30/25 that we saw earlier strengthens this possibility. The second one is that the heavens are protected from reduction or collapse, such that the word “tazula”, which is a derivative of “zawal” (= reduction, collapse), may be pointing to such a possibility.9
Another possibility is that the cosmic order which has been established by the primary amr cannot go on indefinitely by itself, and that Allah maintains the order by His secondary amr. It is also conceivable that both possibilities are the case. Possibilities other than those mentioned here, within a theoretical and speculative framework, also need to be considered and investigated.
Allah’s intervention in the events in this world: The secondary amr
Let us continue with the relationships between the word amr and the order in the heavens. Since the order in the heavens has been established by the primary amr that has been revealed in them, one might think: If we have a complete understanding of the primary amr which has been revealed in the heavens, and consequently in all systems in them, we can understand all that happens in them. In this way we can possess complete knowledge about these happenings, and see the future. (This could be the final vision of the contemporary understanding of science.) Yet, as explained in some detail below, the problem is not as simple as this. The main reason is that the amr is not something that cannot be changed, or whose effects cannot be overridden once it has been revealed in these systems. An ayah which we saw earlier explicitly states that other instructions are being (continuously or periodically) sent down by Allah:
“Allah is He who created the seven heavens, and of the earth the like of them [in number]; the instruction (= amr) descends through the midst of them [all]; that you may know that Allah has power over all things and that Allah has encircled all things with knowledge.” (Talaq 65/12)
Indeed, as will be seen when the conceptual frames of the word amr and all other related words (haqq, qadr, qada, izn, sakhara, sultan, ‘aql, and ruh) are examined, the amr does not consist only of the primary amr. We understand from the ayahs in which the words amr, haqq, izn, qadr, and qada occur, that the effects of the primary amr can be cancelled or overridden, or entirely new conditions can be created, by a new amr (the secondary amr) sent down by Allah.
The secondary amr sent by the Real Ruler is what is (in terms of the verbs used in the Qur’an in association with it) determined (mubrim), decided on (mustaqir), measured to a measure (qadaran maqdura), differentiated (yufraqu), directed or administered (yudabbir), sent (mursil), sent down (munzil), distributed (muqassimat), decreed (qada), and infused in (yulqi). The angels are sent down with this amr, and they descend through the heavens with it, and after the amr is obeyed (ata), applied/done (maf’ul) and completed (balagha), the amr ascends (ya’ruj) to Allah, and returns (yurji) to Him. The effects of this amr are sometimes made visible (zahara) to mankind. These verses indicate that the completion of the cycle of amr can be regarded both as periodical and continuous.
Some of the other ayahs directly related with the secondary amr are:
“… an amr from Allah’s presence …” (Maida 5/52, Duhan 44/5)
“Or do they determine the amr? We indeed are the determiner (= mubrimun).” (Zukhruf 43/79)
“…all amr have been decided on (= wa kulli amrin mustaqir).” (Qamar 54/3)
“[Allah] directs the amr from the heaven to the earth (= yudabbir al-amri min as-samai ilal ard); then it ascends (= ya’ruj) to Him in [part of] a day the measure of which is a thousand years in your count.” (Sajda 32/5)
“[Allah] sends / sends down the amr (= mursil/munzil).” (Duhan 44/5, Talaq 65/5)
“Allah’s amr is a measure measured (= qadaran maqdura).” (Ahzab 33/38)
“[A night] in which all wise [or mighty] amr are differentiated (= fiha yufraqu kulli amrin hakeem).” (Duhan 44/4)
“The angels and the Spirit (= Ruh) descend in that [night] by the leave (= izn) of their Sustainer from [or with] all amr.” (Qadr 97/4)
“When He decrees an amr, He says: ‘Be!’, and it is (= iza qada amran yaqulu lahu kun fa yakun).” (Baqara 2/117, Al-i Imran 3/47, Maryam 19/35, Mu’min 40/68, Ya-Sin 36/82)
“The amr of Allah has come (= ata amrullah) …” (Nahl 16/1)
“… the amr of Allah is done [or applied] (= wa kana amrullahi maf’ula).” (Nisa 4/47)
“… the amr of Allah has become visible (= zahara amrullah) …” (Tawba 9/48)
“Our amr is but a single [act] like the twinkling of an eye.” (Qamar 54/50)
“… Allah has power over His amr (= wallahu ghalibun ‘ala amrihi), but most among mankind know it not.” (Yusuf 12/21)
There are ayahs in the Qur’an indicating that the angels are given the task of applying the secondary amr. Some of these are,
“We [angels] descend only by the leave of your Sustainer (= wa ma natanazzalu illa bi amri rabbika)…” (Maryam 19/64)
“And to Allah bow all that is in the heavens and in the earth whether moving [living] creatures or the angels; for none are arrogant [before their Sustainer].” (Nahl 16/49-50)
“And they [the angels] speak not before He [speaks], and they act by His amr.” (Anbiya 21/27)
Another two ayahs related with the secondary amr, which particularly attract our attention are,
“Nor can a soul die except by Allah’s leave (= izn), the term being fixed as by writing…” (Al-i Imran 3/145)
“For each [person] there are [angels] before and behind him; they protect him from the amr of Allah (= yahfazuna min amrillah). Verily, never will Allah change the condition of a nation until they change what is in their soul; but when Allah wishes the punishment of a nation, there can be no turning it back, nor will they find beside Him any to protect.” (Ra’d 13/11)
As can be seen, the first ayah states that no person/soul (= nafs) dies except by Allah’s leave (= bi iznillah), which is associated with an amr. This ayah also shows that all the physical conditions determined by the primary amr would not be sufficient to cause the death of a person, however deadly they seem to be. This issue is made clear by the statement in the last ayah, “they protect him from the amr of Allah”, so that these protectors [angels] protect that person from the unbearable and deadly effects of the primary amr.10
All these ayahs clearly indicate that the primary amr that has been revealed in the heavens, may not explain every event that happens in the heavens and the earth, despite that the order in the heavens has primarily been established and is maintained by it. Still, we must not overlook the fact that the primary amr has a basic function in maintaining the cosmic order.
We also see from these ayahs that the decree and application of the amr that overtakes or overcomes the primary amr in a region of space is totally dependent on Allah’s leave (= izn). This is very important, because without Allah’s leave, the primary amr continues its function, and all makan (= spaces) and all the objects in them continue to carry the properties determined by it. The following ayahs clearly state this:
“Did you not see that Allah has made subject to you whatever is on the earth? Ships flow (= tajree) by His amr; He holds the heaven from falling on the earth so that it would not fall, except by His permission (= izn); Allah is Most Kind and Most Merciful to mankind.” (Haj 22/65)
“… the sun, the moon and the stars are in subjection by His amr; verily in this are signs for a nation who use intellect.” (Nahl 16/12)
As can be seen from these ayahs, the properties and motions of the objects in the heavens are formed by the primary amr, and as long as Allah does not send another amr on them by His izn, their properties and motions will continue. The effective use of objects in the heavens and the forces in them by mankind requires studying these properties and the physical forces that determine these properties. We can say that the basic physical forces emerge as the result of the order (= mizan) that has been placed in the heavens.
We can acquire, by all the activity that we call “scientific research”, only the knowledge of the effects of the primary amr and the established order (= mizan). In the future, even if we should have an excellent knowledge of physics and computation, we can only have the knowledge of predicting the effects of the primary amr in a certain region of space.11 As we will explain shortly, when Allah interferes with His amr in any “natural phenomenon”, it would be impossible to make any reliable prediction about the processes of that phenomenon by physical methods, because it is impossible to know by any scientific method when and where the secondary amr will take effect.
The ayahs related with this issue clearly show that, even if we have a complete knowledge of the primary amr that has been revealed in the heavens, we would still not have a complete and absolute knowledge about the world. Even if we used all our scientific research methods, we could not obtain even a trace of knowledge of the secondary amr that Allah may send by His leave (= izn), to cancel out or partially or completely change the effects of the primary amr in a region of space. This tells us that even when we possess an excellent science and technology, we should put our reliance and trust only in Allah.
***
We see in some of the ayahs of the Qur’an that a close relationship is made between using intellect and understanding cosmological events. Since the order in the heavens and the earth is the work of Allah, the study and research for understanding this great work should be a paramount duty for mankind, because in this way, the true might of Allah can be better understood and better appreciated.
Besides, these ayahs clearly motivate mankind to reason about the creation of the heavens, to understand the amr which lies beneath the cosmological events, and to use them effectively for the benefit of mankind. These ayahs also ask mankind to take lessons from such events, and guide them to realize that the life of the Hereafter which Allah promises is far superior to the life of this world. Some of the related ayahs are:
“Indeed, in the creation of the heavens and the earth; in the alternation of the day and night; in the ships which flow (= tajree) in the ocean; in the rain which Allah sends down from the sky and revives the earth with it after its death; in the beasts of all kinds that He scatters throughout the earth; in the redirection (= tasreef) of the winds, and the clouds which they trail like their slaves between the earth and the sky, are signs for a nation who use intellect.” (Baqara 2/164)
“In the alternation of the day and night; in the sustenance which Allah sends down from the sky and revives the earth with it after its death; and in the redirection (= tasreef) of the winds, are signs for a nation who use intellect.” (Jasiya 45/5)
“And such are the parables We set for mankind, but none use intellect on them except those who have knowledge (= wa ma ya’qiluha illal ‘alimun).” (Ankabut 29/43)
In the first two verses above, the word “ya’qilun” (= those who use intellect) refers to those who can see the connection between these ayahs and reality. The same term appears in the following verses which state that the life of the Hereafter is superior to the life of this world:
“… But the home [or life] of the Hereafter (= dar al akhira) is better for the righteous; will you not use intellect?” (A’raf 7/169)
“The things you have been given are but the provision and the glitter of the life of this world; better is Allah’s reward and more lasting. Will you not use intellect?” (Qasas 28/60)
On the other hand, the following ayah states that those who do not use intellect are like cattle, or even lower in guidance:
“Do you think that most of them listen or use intellect? They are like cattle, and even more misguided (= bal hum adall).” (Furqan 25/44)
In the ayah below, a surprising statement appears: “Whatever is in the heavens” has been given to the use and benefit of mankind:
“[Allah] has subjected (= sakhara) to you whatever is in the heavens and the earth, all from Him; verily in this there are signs (= ayat) for a nation who reflect (= qawmin yatafakkarun).” (Jasiya 45/13)
In this ayah the expression “all from Him” indicates that the use of all the objects and events, or all the physical forces that form and determine their properties and motions are given, without exception, potentially to all men who strive to understand them. In this ayah we also see that the word “sakhara” (= subjected to) is linked with the phrase “a nation who reflect”. This means that the effective use of the objects and the physical forces in the heavens and the earth will be accomplished by collaborative study and the use of intellect by people as nations, rather than as isolated individuals. This would require of course, a public orientation and participation.
The end of the order in the heavens: The amr of the Hour
The amr which is termed in the Qur’an as the amr of the Hour (= amr-us saah), is the instruction which will terminate the order established with the primary amr. As understood from the ayahs related with the Day of Standing (= yawm al qiyama) in the Qur’an, this amr will take effect in a day which will encompass Resurrection and the Day of Reckoning (= yawm al hisab). This subject is dealt with in detail in another study titled “The Day of Standing in the Qur’an and Traditions”, which we hope to have translated into English soon. After these explanations, we can now go on to the definitions of the terms “physical event” and “natural phenomenon” within this conceptual framework.
4. The Definition of “Physical Event”
After this brief inquiry into the ayahs related to the word amr, we can now attempt to provide a definition of “physical event” within this framework: A physical event, or physical phenomenon, is an event which happens only within the context of the primary amr which has been revealed in the heavens with their creation. The characteristic feature of such events is that they are repeatable (or repeatedly observable) by humans under laboratory and observational conditions. We can say that causality in physical events arises as a result of the order or symmetry (= mizan) laid with the primary amr. Causality can be said to be relevant only in the space of large (or macro) scale interactions. Symmetry is also in effect in most interactions in micro space, but here causality gives way to uncertainty due to some fundamental properties of the light (= photons?) invariably used in the measurements.
The uncertainty in physical events arises in two categorically different forms: uncertainty in the microworld, and uncertainty in the macroworld. We stated that the first arises from the basic properties of light used in observations and measurements. The second type of uncertainty arises from the difficulties of determining the initial conditions of certain complex physical events. This can be called statistical uncertainty. Physicists believe that many physical phenomena can be modeled by differential equations. In such models, the main problem is to determine the initial conditions, so that starting with these conditions at a time t0, the equations give the state of the event at a time t1. In many physical events, the initial conditions are too complicated to know, but in many others, these are known within statistical limits.
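The initial-value problem described above can be illustrated with a minimal Python sketch. The equation dx/dt = -k x, the constant k, the step count and the function names are all assumptions chosen for illustration; they do not come from any physical model discussed in this text. The sketch advances a state from t0 to t1 and shows how an initial condition known only within limits yields a range of final states.

```python
import math

def euler_decay(x0, k, t0, t1, steps=10000):
    """Integrate dx/dt = -k*x from t0 to t1 with the explicit Euler method."""
    dt = (t1 - t0) / steps
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)
    return x

def exact_decay(x0, k, t0, t1):
    """Closed-form solution x(t1) = x0 * exp(-k * (t1 - t0)), for comparison."""
    return x0 * math.exp(-k * (t1 - t0))

# Hypothetical initial condition known only within statistical limits.
x0_nominal, x0_error = 1.0, 0.01
low = euler_decay(x0_nominal - x0_error, k=0.5, t0=0.0, t1=2.0)
high = euler_decay(x0_nominal + x0_error, k=0.5, t0=0.0, t1=2.0)
print("state at t1 lies between", low, "and", high)
```

For this simple equation the spread in the final state stays proportional to the spread in the initial condition, which is the well-behaved case of statistical uncertainty.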
5. The Definition of “Natural Phenomenon”
The term “natural phenomenon” is mostly used for macro scale events observed in the world and in space. Solar and lunar eclipses, various meteorological events, and earthquakes are regarded as phenomena in this framework. In today’s understanding of science, all natural phenomena are believed to be composed of mere physical events. However, unlike accurately predictable events such as solar and lunar eclipses, the unpredictability of meteorological events and earthquakes has led scientists to consider them in a separate class as “chaotic events”.
Based on this classification, causality in natural phenomena needs to be considered in two different frames. Space events such as solar and lunar eclipses can be explained by the effects of the physical forces, such as gravity, which determine the orbits of the objects in space. But as the explanation of meteorological and tectonic events requires taking into account a number of different effects at the same time, a complete explanation of such an event becomes impossible. Some scientists have coined the term “butterfly effect” to describe this difficulty in meteorological phenomena. In meteorological and geological phenomena, many effects such as the particular spatial configuration of the planets in their orbits relative to the earth and the sun, solar explosions (or “solar spots”), the impact of large meteors on the earth, and other space events can be at work together. All these effects contribute to the uncertainties in the predictions and explanations of such events.
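The sensitivity behind the term “butterfly effect” can be shown with a small Python sketch. It uses the logistic map, a standard toy example of chaotic behaviour, and not a meteorological model; the parameter r = 4, the starting value and the size of the perturbation are assumptions made only for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), returning all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Gap between the two trajectories at every step.
gaps = [abs(u - v) for u, v in zip(a, b)]
print("initial gap:", gaps[0])
print("largest gap over the run:", max(gaps))
```

The two runs start almost identically, yet within a few dozen steps they bear no resemblance to each other, so a tiny error in the initial conditions makes long-range prediction impractical.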
6. Allah’s Intervention in Physical Events
Earlier, we saw from the ayahs in the Qur’an that Allah intervenes in physical events in any region of space and time by His secondary amr as He wishes (= yuridu/arada) to do so. When Allah’s amr comes to a space-time region, it may result in three different effects: 1) The obstruction or cancellation of the effects of the primary amr in the same region, 2) The strengthening of the effects of the primary amr, 3) The emergence of an entirely new set of effects in the same region.
In the first case, Allah’s new amr (the secondary amr) interacts with the primary amr that occupies the same place so as to cancel or weaken its effects.
In the second case, Allah’s amr interacts with the primary amr so as to increase its effects, and in this way, it strengthens and/or focuses the current effects.
In the third case, Allah’s new amr brings about entirely new and previously unseen effects either by opening new space for itself, or by interacting with the primary amr in the same space.
7. Allah’s Intervention in Natural Phenomena
A number of ayahs in the Qur’an clearly describe examples of how Allah intervenes in and directs what we call meteorological and geological events. We shall see some of these ayahs shortly. But first, let us consider what may happen when Allah intervenes in natural phenomena. In such cases we can think of four different effects: 1) Delaying or initiating the occurrence of the natural event in order to disperse the destructive effects which it would otherwise cause under the primary amr, 2) Focusing and directing the effects of the natural event, 3) Increasing the strength of the effects of the event, 4) Bringing about natural events whose perceived effects are of a kind previously unseen. Let us now see some of the ayahs which exemplify these four different effects:
“Did you not see that Allah makes the clouds move gently, then joins them together and then makes them into a heap? Then you see the rain emerge from their midst; and He sends down from the sky mountain masses (of clouds) in which is hail; He strikes with it whom He will, and He turns it away from whom He will. The vivid flash of His lightning almost blinds the eyes.” (Nur 24/43)
In this ayah the expression “He strikes with it whom He will, and He turns it away from whom He will” exemplifies the effects of the first and the second type that we listed above. Some ayahs which could be taken as examples for the other two cases are,
“[The nation of] Aad, behaved arrogantly in the land with no just reason, and they said: ‘Who is mightier than us?’ Could they not see that Allah, who created them, was mightier than they? Yet they denied Our signs.” (Fussilat 41/15)
“So We sent against them a furious wind through the days of disaster that We might give them a taste of punishment of humiliation in this life; but more humiliating still, will be the punishment of the life to come. And they will not be helped.” (Fussilat 41/16)
“And the [nation of] Aad; they were destroyed by a furious wind, exceedingly violent.” (Haaqqa 69/6)
“[The nation of Samood] rebelled against the amr of their Sustainer; so the stunning noise [of a thunderbolt] seized them even while they were looking on.” (Zariyat 51/44)
The expression “stunning noise” can be regarded as an example of a previously unseen “phenomenon”. (How “natural” it could be regarded is another matter.) In another unusual “phenomenon”, the nation of Lot was destroyed by the collapse of the land, which swallowed the whole city with all its population, together with a hail of baked stones [lava stones or meteorites?] from above:
“[The angels] said: O, Lot!, we are messengers from your Sustainer; they [your nation] shall not touch you; depart with your kinfolk with the dead of the night, and none of you look back; as for your wife, she shall suffer the fate of the others. In the morning their hour will come. Is not the morning near?” (Houd 11/81)
“And when our amr came, we turned it [the city] upside down, and let loose upon it a shower of baked-stones spread layer on layer.” (Houd 11/82)
These ayahs describe, clearly with no need for more comments, how these nations were destroyed by what we may still call “natural events”.
At this point, regarding our current subject, we are faced with two extremely important questions: If Allah sometimes severely punishes entire nations, what could be the reasons for such punishments? Could it be known or predicted how such punishments will happen? The answers to these questions can be found in the Qur’an in ayahs where the word “sunnatullah” occurs. In the Qur’an, the word “sunnatullah” occurs in reference to certain forms of conduct for people and nations. These laws are not changeable, not even for the Messengers of Allah:
“… no change will you find in the laws of Allah (= fa lan tajida li sunnatillahi tabdeela); and no turning off will you find in the laws of Allah.” (Faatir 35/43)
In the Qur’an these laws of conduct are also referred to as “the law of the ancients [past nations]” (= sunnatul awwaleen) in some ayahs. The codes of these laws, or in other words the conditions for them to take effect, can be found in the ayahs where the words “haqq” and “sunna” occur. We can summarize some of the conditions of these laws as follows:
- To behave arrogantly in the land with no just reason.
- To secretly devise evil plots.
- To kill the messengers of Allah, and those who instruct with equity.
- To make friends with the subjects of Satan (= shayateen) against Allah.
Let us now see the related ayahs. The first is related to the arrogant:
“[The nation of] Aad, behaved arrogantly in the land with no just reason (=fastakbaru fil ardi bi ghayri-l haqq), and they said: ‘Who is mightier than us?’ Could they not see that Allah, who created them, was mightier than they? Yet they denied Our signs.” (Fussilat 41/15)
The Aad paid in this world the due of their evil conduct by being destroyed by a furious storm:
“So We sent against them a furious storm through the days of disaster that We might give them a taste of punishment of humiliation in this life; but more humiliating still, will be the punishment of the life to come. And they will not be helped.” (Fussilat 41/16)
Another great sin which deserves punishment in this world is to secretly devise evil plots (= makr-us sayyia) against people:
“[Their] arrogance in the land and [their] plotting evil (= istakbaran fil ardi wa makr-us sayyia); evil plots will harm only their authors. Are they looking for other than the law of the ancients? … no change will you find in the laws of Allah (= fa lan tajida li sunnatillahi tabdeela); and no turning off will you find in the laws of Allah.” (Faatir 35/43)
Regarding those who devise evil plots, the following ayahs also need to be considered:
“Do those who secretly devise evil plots (= allazina makaru-s sayyiat) feel secure that Allah will not cause the earth to swallow them up, or that the wrath will not seize them from directions they little perceive?” (Nahl 16/45)
“Or that He may not seize them in the midst of their going to and fro, when they cannot escape?” (Nahl 16/46)
“Or that He will not give them over to slow destruction? Yet your Sustainer is Compassionate and Merciful.” (Nahl 16/47)
The expression “your Sustainer is Compassionate and Merciful” can be understood to mean that Allah will save those who were wronged by the hands of the plotters of evil. It can also mean that Allah gives long respite to those who devise evil plots, besides openly warning them by His words about the due results of their deeds, so that they may take heed and give up their evil deeds.
From these ayahs we understand that such plotters of evil are to be sternly punished in this world in four different ways:
1) Allah will bury them in the ground by a terrible disaster.
2) They will be seized by the wrath from a direction they hardly perceive.
3) They will be seized when they go about doing their business.
4) They will be subjected to a slow destruction.
The past nations which murdered Allah’s messengers, or forcefully drove them away from their homes have been destroyed according to these laws (see, e.g. Isra 17/76-77). Those people who unjustly murdered the individuals who instructed equity among them, have also been destroyed in accordance with these laws:
“As to those who deny Allah’s revelations, and slay the Prophets and slay with no just reason those who instruct with equity among mankind; announce to them a grievous penalty.” (Al-i Imran 3/21)
“And We have destined for them intimate companions [shayateen]; who make past and future seem fair to them; well was the word justified against them [or: they deserved the fate] which overtook the parties (= umam) of the jinn and men who have gone before them. They shall assuredly be lost.” (Fussilat 41/25)
The expression “well was the word justified against them” (= haqqat ‘alayhim-ul qawl) in the last ayah can be taken to refer to the laws that have been applied to the ancients.
What we have discussed to this point are the laws by which mankind are punished in this world when they exceed the limits. We can now return to our second question above: Could it be known beforehand how the punishment will come into effect?
It is not possible to predict by which “natural event”, and in fact how sunnatullah will take effect. But since the limits of the conditions of these laws are given in the Qur’an, the pending disaster can be estimated by closely observing the behavior of the society, particularly the behavior of those who command and exercise power in it (e.g., its leaders, elites and the wealthy), as to whether the conditions are fulfilled or not.
There are many ayahs in the Qur’an about how Allah has punished the wrongdoing nations in the past, some of which seem to be relevant to the conditions of our time:
“Many cities have insolently opposed the command of their Sustainer and His Messengers, and We called them to a severe account (= hasabnaha hisaban shadeeda); We punished them with exemplary punishment.” (Talaq 65/8)
“When We decide to destroy a settlement, We first send Our command to those of them who live in comfort; if they transgress, so that the word is proved true against them, then We destroy them utterly.” (Isra 17/16)
“We did not wrong them, but they wronged their own souls; when the amr of your Sustainer comes (= lamma ja’a amru rabbuka), the deities they invoked other than Allah availed them nothing; they only hastened their ruin.” (Houd 11/101)
Lastly, apart from these ayahs, the following ayah is of great interest regarding what may be expected to happen in the future in this world:
“There is no city (= qarya) but shall be destroyed or sternly punished before the Last Day; that is decreed in the Book (= kana zalika fi-l kitabi mastura).” (Isra 17/58)
It would be the duty of all men and women who think, keep contact with reality and take heed, to prepare as needed before the truth of these ayahs becomes visible.
Conclusion
Many ayahs in the Qur’an declare that Allah is the Creator of the heavens and the earth, and their Real Ruler. He has encompassed and encircled all things with knowledge. Nothing in the heavens and the earth can ever escape His knowledge. Allah intervenes in and directs, with His secondary amr as He wishes, what we call “natural phenomena”, which normally happen within the framework of the primary order which He has established in the heavens by His primary amr. He punishes with them whom He decides among mankind, and turns the destruction away from whom He wishes. Allah has made clear by His laws, termed “sunnatullah” in the Qur’an, under what conditions He sends His punishment on people and nations. These laws are also somewhat related to the order (= mizan) which has been established by their creation. If the limits of these laws are well known, it can be sensed when they will take effect, but except as Allah wills, it cannot be known exactly when and how they will take effect. We must strive to understand, by using all the means available to us, the happenings in the heavens and the earth and try to understand reality as a whole. When we do this study in a perfect manner and keep equity about ourselves and our position in this world at the same time, we will better appreciate the true might of Allah (= haqqa qadrihi). We will also realize, as has been written in the Qur’an, that the life of the Hereafter is much superior to the life of this world and will give direction to our lives accordingly. Allah knows the best of all things. Was-salaam.
Notes
1 We call what is termed in the Qur’an “the heavens and the earth” as “the universe”. The validity of the term “universe” is being questioned by some physicists such as David Deutsch, who would prefer the term “multiverse” instead. The repeated use of the expression “the heavens and the earth” in the Qur’an can be viewed as to stress the particular importance of the earth in the heavens, with its being the home of millions of different living species including the humans, and its surprisingly suitable conditions for the sustenance of life. This particular place of the planet earth in the universe has been of great interest to many physicists and cosmologists in recent decades. (See, Ref. 7-b)
2 The expression “yanzuru fi” in this verse refers to both observation and systematic (theoretical) thinking, and Muslim scientists in the Classical Era have derived the term “nazariya” (= theory) from the root of this verb “nazara fi”.
3 For more detailed explanations please see,
- Kocabas, S. “Islam’da Bilginin Temelleri”. Iz Yayincilik, Istanbul, 1997.
(An extended English version of this book is being prepared, and we hope to publish it in the future.)
4 The word “amr” occurs in the Qur’an in other frames than the two we have given here. See, Ref. 3 for details.
5 We must emphasize that the word “amr” is not classified in the Qur’an as “primary amr” and “secondary amr”. We have introduced this distinction from the differences of frames of the use of this word. But the appropriateness of this classification can even be seen from the personal pronouns that are used with the word “amr” in the Qur’an. What we call “the primary amr” corresponds to the uses in the ayahs where the word appears almost exclusively in the form “His amr” with the third person singular pronoun, and our term “the secondary amr” corresponds to the uses of the word as “the amr of Allah”, “Our amr”, and simply “the amr”.
6 Whether this amr can be understood as the “operating system” of the heavens, or as a kind of “software” loaded in the heavens, is a subject that deserves to be seriously considered.
7 See;
a - Davies, P. (1992). “The Mind of God”. Touchstone Books. New York.
b - Barrow, J.D. & Tipler, F. (1996). “The Anthropic Cosmological Principle.” Oxford: Oxford University Press.
8 For information physics, see: Stonier, T. (1990). “Information and the Internal Structure of the Universe”. London: Springer-Verlag.
9 The current theoretical framework about elementary particles involves the protection of the basic building blocks of material existence (e.g. protons and electrons) from decay. As an example, consider protons, which are accepted to be one of the basic constituents of hydrogen atoms: Unlike free neutrons, protons can stay for a long period (at least 10^30 s) without decaying into lighter particles (mesons and leptons). Had there not been in effect a particular form of baryonic symmetry between elementary particles, there would be no atoms, and no living and inanimate objects as we know them in the world. The word “tazula” may be taken to refer to preventing such a collapse.
10 For detailed information on this subject, see Ref. 3.
11 Despite this, we believe that all scientific research in this direction needs to be continued by all means, because only those who have knowledge can see the limits of current scientific knowledge, and can better appreciate the true might of Allah. Also, we need to remember the ayah: “... and say: Could those who know be like those who know not? ...” (Zumar 39/9)
Thursday, August 16, 2007
Integration of Research Tasks in Modeling Discoveries in Particle Physics
Sakir Kocabas
Pat Langley
(langley @ cs.stanford.edu)
Robotics Laboratory, Computer Science Dept.,
Stanford University, Stanford, CA 94305 USA
Abstract:
This paper describes a discovery system, BR-4, which integrates several research tasks in modeling the discovery of certain quantum properties and conservation laws by physicists in this century. The program is directed by consistency and completeness constraints, and has the capabilities of theory formation and theory revision in its domain, and of explaining its knowledge state by these constraints. BR-4 is capable of formulating new elementary particles and particle reactions, and proposing observations to test their existence. The program revises its domain theory when it detects formal and theoretical contradictions, and when its domain theory conflicts with observational data.
-----------
* Also affiliated with ITU, Faculty of Space Sciences and Technology, Istanbul, Turkey.
** Also affiliated with the Institute for the Study of Learning and Expertise, 2451 High St., Palo Alto, CA 94301 USA.
1. Introduction
Computational modeling of discovery has been the focus of attention by several research groups in the last ten years, and a number of models with different capabilities have been developed. These capabilities include goal selection, experiment design, data collection, expectation setting, quantitative reasoning, concept formation, hypothesis formation, theory formation, theory revision, explanation, and paradigm shifts by qualitative models. In current models only a few of these discovery tasks have been integrated in one system. The integration of discovery tasks continues to be a difficult problem in this research area of artificial intelligence.
The subject of this paper is an integrated discovery model BR-4, with the capabilities of theory formation, event prediction, data acquisition, explanation, and theory revision. Before we describe the system and its behavior, it is appropriate to present some background information about its task domain, particle physics.
1.1. The Domain of Particle Physics
Particle physics studies the nature of elementary particles - the building blocks of matter - and interactions among these entities. The basic phenomena in this field take the form of reactions, similar in many ways to those found in chemistry. For instance, two such observed reactions* are
p + p --> p + n + pi
pio --> g + g
where the symbols p, n, pi, pio and g represent the proton, neutron, pion, pion-zero and gamma particles, respectively.
As in chemistry, physics requires that reactions among elementary particles obey certain conservation laws. For instance, one of the most basic laws states that any such reaction conserves the electric charge of the particles involved. Electric charge is an example of a quantum property, and one of the main tasks in particle physics concerns the assignment of values for quantum properties such that observed reactions conserve those properties. Thus, both of the above reactions conserve electric charge provided we assign the commonly accepted charges 1 to p, 0 to n, 1 to pi, 0 to pio, and 0 to g. Other assignments are also possible for this pair of reactions, but they would not be consistent with other observed particles.
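The charge bookkeeping just described can be checked mechanically. The following Python sketch is only an illustration: the dictionary and function names are hypothetical and are not part of BR-4, while the charge values are the ones quoted in the text.

```python
# Charges quoted in the text for the five particles involved.
CHARGE = {"p": 1, "n": 0, "pi": 1, "pio": 0, "g": 0}

def total(particles, values):
    """Sum a quantum property over a list of particle names."""
    return sum(values[name] for name in particles)

def conserves(reactants, products, values):
    """True when the property has the same total on both sides."""
    return total(reactants, values) == total(products, values)

# The two observed reactions, written as (reactants, products).
reactions = {
    "p + p --> p + n + pi": (["p", "p"], ["p", "n", "pi"]),
    "pio --> g + g": (["pio"], ["g", "g"]),
}

for label, (lhs, rhs) in reactions.items():
    print(label, "conserves charge:", conserves(lhs, rhs, CHARGE))
```

Changing a single value, for example setting the charge of pi to 0, makes the first reaction fail the check, which is the kind of inconsistency an assignment of quantum values has to avoid.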
-----------
* Typically, physicists infer the occurrence of such reactions from tracks in cloud chambers and similar evidence. We will not attempt to model this inference process here, and instead will simply treat reactions as though they are directly observed.
The concern with conservation also explains why some particle reactions are never observed. For example, the process of beta decay,
n --> p + e + /nu,
in which a neutron decays into a proton p, an electron e, and an antineutrino /nu, has been widely detected. In contrast, the decay of protons, as in the reactions
p --> pi + pio
p --> /e + g
has never been seen despite its inherent plausibility. All three reactions satisfy conservation of energy and electric charge, yet only the first occurs in nature. However, one can explain the absence of the other reactions by the existence of another quantum property, the baryon number, that must also be conserved and that these two reactions would violate. Thus, another central task in particle physics involves the explanation of unobserved reactions through the postulation of new quantum numbers.
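The argument can be made concrete with a short Python sketch that lists which properties each reaction violates. The names used are hypothetical and the code is not BR-4's. The charges of e, /nu and /e and the baryon numbers are the standard textbook values, supplied here as assumptions since the text does not list them.

```python
# Standard values, assumed here: electron -1, positron (/e) +1, antineutrino 0.
CHARGE = {"n": 0, "p": 1, "e": -1, "/nu": 0, "pi": 1, "pio": 0, "/e": 1, "g": 0}
# Baryon number: 1 for the proton and neutron, 0 for the other particles.
BARYON = {"n": 1, "p": 1, "e": 0, "/nu": 0, "pi": 0, "pio": 0, "/e": 0, "g": 0}

def violated(reactants, products, properties):
    """Return the names of the properties whose totals differ across a reaction."""
    return [
        name
        for name, values in properties.items()
        if sum(values[x] for x in reactants) != sum(values[x] for x in products)
    ]

both = {"charge": CHARGE, "baryon": BARYON}
beta_decay = (["n"], ["p", "e", "/nu"])
proton_decays = [(["p"], ["pi", "pio"]), (["p"], ["/e", "g"])]

print("beta decay violates:", violated(*beta_decay, both))
for lhs, rhs in proton_decays:
    print(lhs, "-->", rhs, "violates:", violated(lhs, rhs, both))
```

With charge alone, none of the three reactions is ruled out; once the baryon number is added, only the two proton decays fail, which matches what is observed.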
Other activities include the postulation of new particles, either on theoretical or empirical grounds, and the prediction of reactions that satisfy known conservation laws. Testing such predictions leads into the realm of experimental particle physics, which we will not address here. But the above pursuits cover a wide range of behaviors that occur in this scientific field.
The above analysis of the discovery tasks suggests that six basic operations play a central role in particle physics. First, one must have a representation to receive and evaluate data about domain objects and events. Second, for a given set of particles, quantum numbers and observed reactions, one must be able to determine a set of quantum values that satisfy conservation for those reactions. Third, one must have a mechanism to explain the currently observed and unobservable reactions in terms of the constraints of the model. Fourth, one must be able to posit new quantum properties that account for the absence of unobserved reactions. Fifth, one requires an operator that posits new particles and determines their role in known reactions. Finally, one must have some mechanism for predicting reactions that have not yet been observed, but which follow from the current theoretical model. We have incorporated these operators into BR-4, where they play a central role in the process of theory formation and revision. (We will refer to them as Read-Data, Determine-Values, Explain-Event, Posit-Property, Posit-Particle, and Predict-Reaction, respectively.)
Operators of this sort must alter some internal representation that contains hypotheses about the particles, properties, and reactions that exist. This representation can take many forms, but following Valdes-Perez et al. (1993), one can view it as two matrices. One matrix lists particles against quantum properties, with each matrix entry specifying the value for a specific particle on a specific property. The other matrix lists particles against reactions, with an entry containing the total number of times the particle occurs in the reaction. In this light, the operator for determining quantum values alters entries in the first matrix, whereas each of the other three operators (Posit-Property, Posit-Particle, and Predict-Reaction) extends one or both matrices along one of their dimensions.
In the next section we describe the knowledge representation and the discovery operators of BR-4 together with its control structure in modeling several different discovery tasks with illustrative examples from particle physics. This will be followed by a discussion on the system's methods and projections for future work. The paper ends with a summary of the conclusions drawn from this research.
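A minimal Python sketch of the two-matrix view follows. The layout is an assumption made for illustration: reactions are stored as rows, and each entry is a signed count (products minus reactants), so that conservation becomes a zero matrix product. The text itself only says that the entry records how many times a particle occurs, so the sign convention and all names here are hypothetical.

```python
particles = ["p", "n", "pi", "pio", "g"]
properties = ["charge"]

# Particle-by-property matrix: one row per particle, one column per property.
Q = [[1], [0], [1], [0], [0]]

# Reaction-by-particle matrix with signed counts (assumed convention):
# negative for reactants, positive for products, summed per particle.
R = [
    [-1, 1, 1, 0, 0],   # p + p --> p + n + pi  (p: -2 + 1 = -1)
    [0, 0, 0, -1, 2],   # pio --> g + g
]

def net_change(R, Q):
    """Matrix product R x Q: net change of each property in each reaction."""
    return [
        [sum(row[i] * Q[i][j] for i in range(len(Q))) for j in range(len(Q[0]))]
        for row in R
    ]

print(net_change(R, Q))  # every entry is 0 when all reactions conserve
```

In this picture, determining values edits entries of Q, positing a property appends a column to Q, positing a particle appends a row to Q and a column to R, and predicting a reaction appends a row to R.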
2. The System's Knowledge Representation and Behavior
In this section we describe the program's knowledge representation methods and its behavior in modeling certain discoveries in particle physics. The program uses a structured knowledge representation similar to qualitative schemas as in AbE (O'Rorke et al., 1990) and other recent discovery models.
2.1. Knowledge Representation
BR-4's knowledge organization distinguishes descriptive and prescriptive knowledge. The former type of knowledge is represented as frames, and the latter as a series of operators and functions. The program has six operators which are named as follows: Read-Data, Determine-Values, Explain-Event, Posit-Property, Posit-Particle and Predict-Reaction.
The main data items of BR-4 are elementary particles and their reactions. Both are represented as frames in the system's knowledge base. Particle frames include the name of the particle, the quantum properties and their values. The general form of a particle frame is as follows:
frame: P (frame name)
class : particle
q1 : v1
q2 : v2
.......
qn : vn.
where P is the name of the particle, q1,...,qn the quantum properties, and v1,...,vn the corresponding quantum values, which can be -1, 0, or 1.
Particle reactions are represented in a similar way, this time containing information about the reactions, such as the particles involved, the reaction conditions, the physical status of the reaction, and its validity under the current theory. The general form of a particle reaction frame is as follows:
frame: reaction
class : physical event
actual status : A
logical status : L, logical_status(N,L)
reactants : R
products : P
active properties : Q, active_properties(N,Q)
reactants properties : Rp, reactants_properties(Q,Rp)
products properties : Pp, products_properties(Q,Pp)
conditions : (Rp = Pp) or (Rp =/= Pp).
where A indicates whether the reaction has been physically observed or unobserved, and L indicates whether the reaction is valid or invalid under the current theoretical knowledge of the system. R and P are the lists of the particles involved in the reaction as the reactants and the products respectively. Q indicates the vector of quantum properties that play an active role in the reaction, while Rp and Pp are the quantum value vectors of the reactants and the products. Normally, particle reactions are added to the program's knowledge base (e.g. for the reaction n --> p + e + /nu) as follows:
frame: r1
class = reaction
actual status = observed
reactants = [n]
products = [p,e,/nu].
Such input reaction frames are then transformed into the form below by the Read-Data operator acting on the parent frame:
frame: r1
class = reaction
actual status = observed
logical status = valid
reactants = [n]
products = [p,e,/nu]
active properties = [q0, q1]
reactants properties = [1, 0]
products properties = [1, 0]
conditions = {[1,0] = [1,0]}.
The amended slots are added after their values are calculated by the Read-Data operator. In this way, the system's domain theory is built, on which BR-4's other operators act as described below in a control structure summarized in Figure 1.
 ___________
| Read Data | <-- new data
|___________|
   |    |
 __|____|___      ___________
|           |--->| Explain   |
|           |    |_Event_____|
|           |     _____|_____
|           |--->| Determine |<------
|  Domain   |<---|_Value_____|       |
|  Theory   |     _____|_____        |
|           |--->| Posit     |_______|
|           |<---|_Property__|       |
|           |     _____|_____        |
|           |--->| Posit     |_______|
|           |<---|_Particle__|
|           |     _____|_____
|           |--->| Predict   |
|___________|<---|_Reactions_|
Figure 1. BR-4's general control structure in the
discovery of quantum properties
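The transformation of an input reaction frame by Read-Data can be sketched as follows. This is only a minimal illustration, not BR-4's actual implementation: the class ReactionFrame, the function read_data, and the property values assigned to q0 and q1 are assumptions, chosen so that the example reproduces the slot values shown for frame r1.

```python
from dataclasses import dataclass, field

# Hypothetical property values, chosen only so that the example reproduces
# the slot values shown for frame r1; q0 and q1 are placeholder names.
VALUES = {
    "n":   {"q0": 1, "q1": 0},
    "p":   {"q0": 1, "q1": 1},
    "e":   {"q0": 0, "q1": -1},
    "/nu": {"q0": 0, "q1": 0},
}


@dataclass
class ReactionFrame:
    name: str
    actual_status: str            # "observed" or "unobserved"
    reactants: list
    products: list
    logical_status: str = ""      # filled in by read_data
    active_properties: list = field(default_factory=list)
    reactants_properties: list = field(default_factory=list)
    products_properties: list = field(default_factory=list)


def read_data(frame, values, properties):
    """Compute the derived slots of an input reaction frame."""
    def totals(particles):
        return [sum(values[x][q] for x in particles) for q in properties]

    frame.active_properties = list(properties)
    frame.reactants_properties = totals(frame.reactants)
    frame.products_properties = totals(frame.products)
    balanced = frame.reactants_properties == frame.products_properties
    frame.logical_status = "valid" if balanced else "invalid"
    return frame


r1 = read_data(
    ReactionFrame("r1", "observed", ["n"], ["p", "e", "/nu"]),
    VALUES,
    ["q0", "q1"],
)
print(r1.logical_status, r1.reactants_properties, r1.products_properties)
# valid [1, 0] [1, 0]
```

The logical status is derived purely from the equality of the two value vectors, which corresponds to the conditions slot of the frame.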
2.2. Theory Formation and Revision
The program starts with a simple domain theory about several particles and a small number of observable reactions. BR-4's theory formation activities are driven by its Explain-Event operator, which acts on particle reaction frames, looking for reactions that cannot be explained under the system's consistency and completeness constraints. The consistency condition states that any observed particle reaction must be valid by the system's domain theory, where validity is defined as compliance with the quantum conservation laws. A reaction that is inconsistent in this sense is unexplainable by the Explain-Event operator.
There are two heuristics for eliminating such contradictions. One is to revise the quantum values of particles in a depth-first search with backtracking through the space of values, until a consistent value set is found. The second heuristic is to introduce a hidden particle to balance the reaction, in either the input or the output, positing that it actually takes part in the reaction but for some reason is not directly observable. The system then computes the property values for this particle, identifying it with an already known particle, or creating an entirely new particle. The first heuristic is applied by the Determine-Values operator and the second one by Posit-Particle.
The completeness condition is defined over unobserved reactions. Any unobserved particle reaction must be violating some quantum conservation law. If the domain theory of BR-4 contains an unobservable reaction that does not seem to violate a quantum conservation law, then this is also an unexplainable event for the Explain-Event operator. This means that the system's domain theory is incomplete regarding the unobserved reaction. In such cases, the system's Posit-Property operator takes control, which posits a new quantum property also to be conserved in observed particle reactions, but not by the unobserved reactions. Determining the values of this property requires search, first for the particles in the missing reaction, and an embedded search for the values of particles in other reactions. This search is carried out by the Determine-Values operator, and as before, if the system arrives at a partial combination of values that rules out an observed reaction or fails to eliminate the unobserved one, it backtracks and considers alternative paths until it finds an acceptable set.
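The two conditions can be summarized in a short sketch. The function names and the three return labels are assumptions introduced for illustration; the charge values are those of Table 1, and only additive properties are handled.

```python
# Electric charges from Table 1.
CHARGE = {"g": 0, "e": -1, "/e": 1, "p": 1, "n": 0}


def conserves(reactants, products, values):
    """True if the summed property value is the same on both sides."""
    return sum(values[x] for x in reactants) == sum(values[x] for x in products)


def explain_event(status, reactants, products, property_tables):
    """Classify a reaction against the consistency and completeness conditions.

    Returns "inconsistent" (observed but invalid: a case for Determine-Values
    or Posit-Particle), "incomplete" (unobserved but valid: a case for
    Posit-Property), or "explained".
    """
    valid = all(conserves(reactants, products, t) for t in property_tables)
    if status == "observed" and not valid:
        return "inconsistent"
    if status == "unobserved" and valid:
        return "incomplete"
    return "explained"


print(explain_event("observed", ["e", "/e"], ["g"], [CHARGE]))      # explained
print(explain_event("unobserved", ["p"], ["/e", "g"], [CHARGE]))    # incomplete
```

The second call shows why a theory containing only charge is incomplete: the unobserved proton decay conserves charge, so some further property is needed to rule it out.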
We can extend the notion of incompleteness to include theories that do not explicitly specify all reactions that follow from them, as occurs when BR-4's Posit-Particle postulates a new particle. In this situation, the system's Predict-Reactions operator systematically generates all possible reactions (decays and collisions) of the new particle involving one, two or three other known particles. For each such tentative reaction R, the program predicts that R will occur if it conserves all known properties.
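A generate-and-test reading of Predict-Reactions might look like the sketch below. It is an illustration under stated assumptions, not the operator itself: the function predict_reactions and its two size limits are hypothetical, only additive properties are checked, and the values are the charge, baryon and lepton numbers of Table 3.

```python
from itertools import combinations_with_replacement

# (charge, baryon number, lepton number), copied from Table 3.
PROPS = {
    "g":   (0, 0, 0),
    "e":   (-1, 0, 1),
    "/e":  (1, 0, -1),
    "p":   (1, 1, 0),
    "n":   (0, 1, 0),
    "nu":  (0, 0, 1),
    "/nu": (0, 0, -1),
}


def totals(particles):
    """Sum each property over a collection of particles."""
    return tuple(sum(PROPS[x][i] for x in particles) for i in range(3))


def predict_reactions(new, max_decay_products=3, max_collision_products=2):
    """List candidate decays and two-body collisions of `new` that conserve
    every property in PROPS. Products are drawn from the other particles."""
    others = [x for x in PROPS if x != new]
    inputs = [((new,), max_decay_products)]
    inputs += [((new, target), max_collision_products) for target in others]
    found = []
    for lhs, limit in inputs:
        for k in range(1, limit + 1):
            for rhs in combinations_with_replacement(others, k):
                if totals(lhs) == totals(rhs):
                    found.append((lhs, rhs))
    return found


for lhs, rhs in predict_reactions("nu"):
    print(" + ".join(lhs), "-->", " + ".join(rhs))
```

Because the sketch tests only three additive properties and ignores mass and spin, it should be expected to admit more candidates than a system that checks every known property.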
3. Illustrative Examples From Particle Physics
In this section we describe the behavior of BR-4 on three examples of discovery from the history of particle physics, involving the neutrino, baryon and lepton numbers, and electron and muon numbers.
Table 1. The quantum values of particles known prior
to the discovery of the neutrino.
-----------------------------------------------------
Particle      mass     charge    spin
g              0.0        0      1
e              0.51      -1      1/2
p            938.26       1      1/2
n            939.55       0      1/2
/e             0.51       1      1/2
nu             0.0        0      1/2
/nu            0.0        0      1/2
-----------------------------------------------------
3.1. Discovery of the Neutrino
Until the early 1930's, scientists knew only a few elementary particles, shown in Table 1 along with their mass and their values on the three known quantum properties, energy, charge and spin. The known reactions were also limited to a small number:
p + p --> p + p
e + /e --> g
g --> e + /e
This situation changed after the discovery of the neutron in 1932, when experiments on beta decay revealed the reaction
n --> p + e
in which a neutron decays into a proton and an electron. However, this reaction was problematic in that it violated the conservation of energy and spin, with the total energy and spin counts unbalanced in the reaction. Rather than abandon the conservation laws, physicists postulated the presence of a new particle,* also generated during beta decay, that would balance out the missing energy and spin. Although this particle was not visible in the reaction, they inferred its property values from the values for the other particles in the decay process. They concluded that this neutrino has zero rest mass, no electrical charge, and a spin of one half.
Given the reactions above and the quantum numbers in Table 1, BR-4 responds in a similar manner. The system's Explain-Event operator cannot explain the fourth reaction, so it passes control to Determine-Values. This operator considers assigning alternative spin values in an attempt to find a consistent set of values that would balance the reaction. But in this case, BR-4 is not allowed to modify the spin values, as these are assumed to be correctly established by observation. This leaves revision of the unbalanced reaction as the only solution, so control passes to the Posit-Particle operator, which adds an extra particle to the output side of the reaction, giving
n --> p + e + nu.
------------
* In the early 1930's there were serious debates among physicists as to the validity of the conservation laws in the subatomic world.
Table 2. Particle reactions that were (a) observed and (b) not observed
in experiments after the introduction of the particles in Table 1.
-----------------------------------------------------------------------
a) Observed reactions             b) Unobserved reactions
p + p --> p + p                   p --> /e + g
n --> p + e + /nu                 p --> /e + e + /e
/e + e --> g                      p --> /e + g + g
g + p --> e + /e + p
/nu + p --> n + /e
nu + n --> p + e
-----------------------------------------------------------------------
Using the conservation laws, Determine-Values computes the charge and spin of the new particle, nu, as 0 and 1/2 respectively. Another possible revision would have added a new particle, with properties opposite to those of /nu, to the input side of the reaction, but physicists favored the former solution as they were thinking in terms of a decay process.
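For an additive property such as charge, the value of the posited particle follows directly from the imbalance of the reaction. The sketch below illustrates this with the charges of Table 1; the function names are assumptions, and spin is left out because the sketch handles only properties that combine by simple addition.

```python
# Electric charges from Table 1.
CHARGE = {"g": 0, "e": -1, "p": 1, "n": 0, "/e": 1}


def hidden_value(reactants, products, values):
    """Value a hidden product must carry for the property to be conserved."""
    return sum(values[x] for x in reactants) - sum(values[x] for x in products)


def posit_particle(reactants, products, values, name="nu"):
    """Add a hidden product whose value balances the reaction.

    Returns the extended product list and an extended copy of the value table.
    """
    extended = dict(values)
    extended[name] = hidden_value(reactants, products, values)
    return products + [name], extended


products, values = posit_particle(["n"], ["p", "e"], CHARGE)
print(products, values["nu"])   # ['p', 'e', 'nu'] 0
```

The computed charge of 0 agrees with the value reported above for the new particle.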
However, the inclusion of the neutrino and its antiparticle leaves the theory incomplete, in that they imply reactions with other known particles. BR-4's Predict-Reactions operator finds no decays for the neutrino, but it does find three collision reactions that are consistent with the theory:
/nu + p --> n + /e
nu + n --> p + e
nu + /nu --> g
which are predicted to be observed in experiments. The first two of these were later detected by physicists. The third reaction has a very low probability and is rather difficult to detect.
3.2. Proposing Baryon and Lepton Numbers
The discovery of the neutrino left physicists with seven elementary particles,* having the properties and values shown in Table 1. Physicists realized that the existence of these particles, combined with known quantum conservation laws, implied a variety of reactions. Subsequent observations revealed evidence for the predicted reactions in Table 2 (a) but not for those shown in Table 2 (b). For some reason, the three predicted decays of the proton did not occur in nature. To explain this, physicists proposed a new quantum property, known as the baryon number.**
--------------------
* The neutrino-antineutrino distinction was experimentally verified in the late 1950's.
** Stuckelberg proposed this new quantum property in 1938 as the protonic charge which was later to be called the baryon number.
Table 3. The quantum values for elementary particles known in 1953
after the discovery of baryon and lepton numbers.
--------------------------------------------------------------------
Particle      mass     charge    spin    baryon    lepton
g              0.00       0      1          0         0
e              0.51      -1      1/2        0         1
p            938.26       1      1/2        1         0
n            939.55       0      1/2        1         0
/e             0.51       1      1/2        0        -1
nu             0.00       0      1/2        0         1
/nu            0.00       0      1/2        0        -1
mu           105.60      -1      1/2        0         1
/mu          105.60       1      1/2        0        -1
pi           139.60       1      -          0         0
/pi          139.60      -1      -          0         0
pio          135.00       0      -          0         0
--------------------------------------------------------------------
BR-4's Predict-Reactions operator proposes the same reactions, but the Explain-Event operator cannot explain the absence of the reactions in Table 2 (b). The program selects the first reaction, p --> /e + g, and turns it into a set of inequalities, each based on a different combination of values for the particles involved. In this case, it would generate the four inequalities
1 =/= 0 + 0
1 =/= 1 + 1
0 =/= 1 + 0
0 =/= 0 + 1
The Determine-Values operator then selects one of these value sets, say the first (p = 1, /e = 0, g = 0), and tests it against the observed reactions, say n --> p + e + /nu, this time treating the reaction as an equality, and obtains
n = 1 + 0 + /nu
which leaves the property values for n and /nu unspecified. Two value sets are possible for this pair, n = 1, /nu = 0 and n = 0, /nu = -1. The first value set is consistent with all the then known reactions, while the second set is inconsistent with the reaction nu + n --> p + e. At any point, detection of an unbalanced reaction that violates conservation of the new property causes backtracking to one of the alternative value sets. If the search exhausts all such sets produced from observed reactions, the system backtracks further and considers alternative value sets generated from the unobserved reactions.
Given the experimental results in Table 2, BR-4 arrives at the value zero for all particles except the proton and neutron, to which it assigns the value one. These settings correspond to those obtained by physicists for the baryon number, which successfully explain the absence of the reactions in Table 2 (b).
Alternatively, by using the value set in the third inequality above, BR-4 would propose another quantum property by assigning the following values to particles: p = 0, n = 0, /e = -1, g = 0, and e = 1. These values correspond to the lepton numbers of elementary particles (see Table 3).
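The search for the values of a new property can be illustrated with a small depth-first search with backtracking over the reactions of Table 2. This is a simplified sketch, not BR-4's Determine-Values operator: the value domain (0, 1, -1), the ordering of particles, and all function names are assumptions, and the trivial reaction p + p --> p + p is omitted because it constrains nothing.

```python
OBSERVED = [
    (["n"], ["p", "e", "/nu"]),
    (["/e", "e"], ["g"]),
    (["g", "p"], ["e", "/e", "p"]),
    (["/nu", "p"], ["n", "/e"]),
    (["nu", "n"], ["p", "e"]),
]
UNOBSERVED = [
    (["p"], ["/e", "g"]),
    (["p"], ["/e", "e", "/e"]),
    (["p"], ["/e", "g", "g"]),
]
PARTICLES = ("p", "n", "e", "/e", "nu", "/nu", "g")
DOMAIN = (0, 1, -1)   # assumed candidate values for the new property


def balance(reaction, values):
    """Left total minus right total, or None if a particle is unassigned."""
    lhs, rhs = reaction
    if any(x not in values for x in lhs + rhs):
        return None
    return sum(values[x] for x in lhs) - sum(values[x] for x in rhs)


def acceptable(values):
    """Observed reactions must balance; unobserved ones must not."""
    if any(balance(r, values) not in (None, 0) for r in OBSERVED):
        return False
    return all(balance(r, values) != 0 for r in UNOBSERVED)


def search(values=None, remaining=PARTICLES):
    """Depth-first search with backtracking over value assignments."""
    values = values or {}
    if not remaining:
        return values
    first, rest = remaining[0], remaining[1:]
    for v in DOMAIN:
        candidate = {**values, first: v}
        if acceptable(candidate):
            result = search(candidate, rest)
            if result is not None:
                return result
    return None   # every value failed: the caller backtracks


BARYON = {"p": 1, "n": 1, "e": 0, "/e": 0, "nu": 0, "/nu": 0, "g": 0}
LEPTON = {"p": 0, "n": 0, "e": 1, "/e": -1, "nu": 1, "/nu": -1, "g": 0}
print(acceptable(BARYON), acceptable(LEPTON))   # True True
print(search())
```

Both the baryon and the lepton assignments satisfy the constraints, so which one a search of this kind returns first depends on the ordering of particles and candidate values.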
Table 4. Some particle reactions that were (a) observed and (b) not
observed in experiments after the discovery of baryon and lepton numbers.
----------------------------------------------------------------------
a) Observed reactions             b) Unobserved reactions
pi --> /nu + mu                   mu --> e + g
/pi --> mu + /nu                  pi --> /mu + g
mu --> e + nu + /nu               pi --> /e + g
/mu --> /e + /nu + nu
pio --> g + /e + e
pio --> g + g
pio --> e + e + /e + /e
----------------------------------------------------------------------
In 1935, Yukawa had proposed the existence of additional particles with a mass of about 100 MeV in the nucleus. The reasoning behind Yukawa's proposal, which we have not attempted to model, involved energy calculations on atomic nuclei. Later, in the 1940s, observations on cosmic rays revealed five such particles: the muon (mu) and anti-muon (/mu), the pion (pi) and anti-pion (/pi), and the pion-zero (pio), with the property values given in Table 3. In the 1950s, baryon and lepton numbers could explain both the occurrence and the absence of reactions of these particles. Some of these reactions are given in Table 4 (a) and (b).
3.3. Electron and Muon Numbers
With the discovery of the baryon and lepton numbers, physicists had produced a theory, involving 12 elementary particles and four quantum properties plus the relativistic masses of the particles, that was apparently consistent and complete. Table 3 reflects this state of physical knowledge. Some skepticism remained, for instance about the neutrino, which seemed very difficult to observe for theoretical reasons. However, in 1953, experiments revealed indirect evidence for the reaction
/nu + p --> n + /e.
Unfortunately, this reaction occurred when the anti-neutrino /nu had been generated through beta decay (n --> p + e + /nu), but not when it had been produced through muon decay (mu --> e + nu + /nu).
To resolve this dilemma, scientists postulated that the two reactions actually generated two distinct types of neutrinos, calling the former an electron neutrino (nu_e) and the latter a muon neutrino (nu_mu). This distinction (and the analogous one for anti-neutrinos) introduced two additional rows in the table of particles. However, it also produced the unobserved reactions shown in Table 5(b), which physicists again sought to explain by introducing yet another property, which they named the electron number.
Our model cannot directly explain the historical distinction into two classes of neutrinos, but we believe it constitutes a variation on the heuristic for postulating new particles that originally led to inference of the neutrino. Once this distinction has been made, BR-4 realizes that its current theory is incomplete, in that it cannot explain the unobserved reactions involving the muon neutrino and its antiparticle. Postulating a new property, it searches the space of values using the same process as it used for the baryon and lepton numbers. The resulting values agree with those proposed by physicists for the electron number, but are not sufficient to rule out the unobserved reaction (pi --> /mu + g). Explanation of this omission requires introduction of yet another quantum property, this one corresponding to the muon number, which physicists postulated in 1962.
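The claim that the electron number alone leaves one unobserved reaction unexplained can be checked with a short sketch over the reactions of Table 5. The value tables below are the standard textbook electron and muon numbers, which are an assumption here since the tables in this paper do not list them; the function name is likewise hypothetical.

```python
# Standard electron and muon numbers (assumed; particles not listed carry 0).
ELECTRON = {"e": 1, "nu_e": 1, "/e": -1, "/nu_e": -1}
MUON = {"mu": 1, "nu_mu": 1, "/mu": -1, "/nu_mu": -1}

OBSERVED = [
    (["pi"], ["/mu", "nu_mu"]),
    (["/pi"], ["mu", "/nu_mu"]),
    (["mu"], ["e", "/nu_e", "nu_mu"]),
    (["/mu"], ["/e", "nu_e", "/nu_mu"]),
    (["pio"], ["g", "e", "/e"]),
    (["pio"], ["g", "g"]),
    (["pio"], ["e", "/e", "e", "/e"]),
]
UNOBSERVED = {
    "mu --> e + g": (["mu"], ["e", "g"]),
    "/nu_mu + p --> n + /e": (["/nu_mu", "p"], ["n", "/e"]),
    "nu_mu + n --> p + e": (["nu_mu", "n"], ["p", "e"]),
    "pi --> /mu + g": (["pi"], ["/mu", "g"]),
    "pi --> /e + g": (["pi"], ["/e", "g"]),
}


def conserved(reaction, number):
    """True if the given quantum number balances across the reaction."""
    lhs, rhs = reaction
    return sum(number.get(x, 0) for x in lhs) == sum(number.get(x, 0) for x in rhs)


# Unobserved reactions that the electron number fails to rule out.
survivors = [name for name, r in UNOBSERVED.items() if conserved(r, ELECTRON)]
print(survivors)                                       # ['pi --> /mu + g']
print(conserved(UNOBSERVED["pi --> /mu + g"], MUON))   # False
```

Under these values every observed reaction in Table 5(a) conserves both numbers, and the single reaction that survives the electron number is excluded once the muon number is added.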
Table 5. Some particle reactions that were (a) observed and (b) not
observed in experiments after introducing distinction between electron
neutrinos (nu_e) and muon neutrinos (nu_mu).
----------------------------------------------------------------------
a) Observed reactions             b) Unobserved reactions
pi --> /mu + nu_mu                mu --> e + g
/pi --> mu + /nu_mu               /nu_mu + p --> n + /e
mu --> e + /nu_e + nu_mu          nu_mu + n --> p + e
/mu --> /e + nu_e + /nu_mu        pi --> /mu + g
pio --> g + e + /e                pi --> /e + g
pio --> g + g
pio --> e + /e + e + /e
----------------------------------------------------------------------
4. Discussion of the Framework
Now that we have seen some examples of BR-4's operation, we can consider the implications of the model for research on scientific creativity, related work on scientific discovery, and some directions for future research on this topic.
4.1. Implications of the Model
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. It can also involve more than a dozen different search spaces ranging from scientific problem formulation through data collection and evaluation, to hypothesis formation, theory formation and theory revision (see Kocabas, 1993). Within research activities, different types of discovery and creativity can be distinguished: logico-mathematical, formal, theoretical and empirical. Current computational models have shortcomings in capturing the details of historical discoveries for reasons described by Tweney (1990). However, this should not diminish their usefulness, as they can provide an overall view of the structure of the development of theories, both in their formation and in their revision. They can also be useful in analyzing the historical progress of scientific ideas and the possibility of alternative ideas together with their implications.
In this study, our aim has not been to model the historical details of particle physics, but to show that certain computational mechanisms can account for theory formation and revision in this domain. The basic mechanisms in BR-4 -- search guided by heuristic knowledge -- bear close resemblance to those implicated in normal human problem solving, as studied by Newell and Simon (1972), as well as many others.
If correct, this view suggests that some of the creative activities in particle physics have much in common with everyday reasoning. However, modern scientific reasoning relies much more heavily than everyday reasoning on logico-mathematical, theoretical and methodological knowledge, in addition to empirical and commonsense knowledge. Unlike the simple search spaces dealt with in everyday reasoning, it also has to deal with a number of different search spaces at the same time if it is to result in discoveries, or even to make progress at all (see, e.g., Klahr, 1994; Kocabas, 1993). Our model operates only in the spaces of empirical hypothesis and theory formation, event prediction, problem formulation and theory revision.
Previous models of scientific discovery, such as those described by Langley, Simon, Bradshaw, and Zytkow (1987), have taken a similar stance on the creative process. However, most such work has focused on limited aspects of scientific reasoning, such as the discovery of laws or the formation of structural theories.
With BR-4, we have attempted to cover a broader range of the discovery process within a unified framework. We described how the system formulates new problems whenever new data reveals its current theory to be either inconsistent or incomplete. In handling problems of inconsistency, BR-4 relies on depth-first search guided by algebraic and domain heuristics to explore the space of values for quantum properties, resorting to the postulation of new particles only if its search fails.
In dealing with incompleteness, the model predicts new reactions that follow from the introduction of new particles and posits new quantum properties to explain why some of these reactions never occur. The introduction of new particles and new properties constitute important examples of theory formation.
Our system does not provide a detailed account of the historical record, but it does explain several impressive discoveries at a more abstract level, using simple mechanisms of a familiar kind. This limited success provides further evidence that at least some types of scientific creativity do not require any special processes, but can be explained as a straightforward extension of existing theories of human cognition.
4.2. Related Work on Scientific Discovery
Our computational model of discovery draws many of its ideas from earlier work in this area. BR-4 is a direct descendant of Zytkow and Simon's (1986) STAHL, which modeled a variety of qualitative discoveries in the history of chemistry. The detection of inconsistencies in reactions played a central role in this system, with one of its responses being the introduction of new elements like phlogiston, which served much the same role in early chemistry as the neutrino did in particle physics.
Rose and Langley (1986) described STAHLp, a rational reconstruction of the earlier system that showed all of its discoveries could be explained in terms of inconsistencies and their resolution. In addition, they used the system to model a number of other reaction-oriented discoveries from the history of science. Moreover, their approach showed that dependency-directed reasoning simplified the theory revision process, letting their STAHLp handle problems with a search-control scheme that relied on simple hill climbing.
The BR-3 system, presented by Kocabas (1991), extended this framework to include the detection of incomplete theories, and the postulation of new properties to explain the absence of reactions. Kocabas applied this idea to the history of particle physics, using it to explain both the origin of several quantum numbers and the particular values assigned to them by scientists. In related work (Kocabas, 1994), he described another system TREV which formulates new particles and new reactions, but this system does not integrate these functions in its discovery process. BR-3 was the immediate precursor of BR-4, differing mainly in that the former lacked the ability to postulate new particles and to predict new reactions.
Valdes-Perez (in press) has described an alternative approach to discovery in particle physics, which he has implemented in the PAULI system. This scheme uses a variation on linear programming to search the space of property values, subject to constraints that reflect observed and unobserved reactions. Also, Fischer and Zytkow (1992) have reported on GELL-MANN, a system designed to explain the formation of the quark theory, which also carries out a search through a space of parameter values subject to constraints.
A more general framework, proposed by Valdes-Perez, Simon, and Zytkow (1993), views the process of formulating structural models in terms of matrix operations. They show how many existing systems, including those described above, can be viewed in this light, with the basic operations involving the extension of a matrix along one or more dimensions and the revision of entries in the cells of the matrix. Our own BR-4 system also fits well into this framework, as suggested by our use of Valdes-Perez et al.'s terminology in Section 2.
Other research on theory revision seems less closely related. Rajamoney's (1990) COAST system designs experiments to distinguish between alternative structural models in physics, and Karp's (1990) HypGene uses a similar idea for biological theories. Kulkarni and Simon (1990) describe KEKADA, a computational model that integrates theory revision, experiment design, and problem formulation to model Krebs' discovery of the urea cycle. Shrager and Langley (1990) consider the relations among these systems in more detail.
4.3. Directions for Future Work
Although BR-4 provides an abstract account of some important developments in the history of particle physics, there remains considerable room for extensions to the model. One direction for improvement involves the notion of explanation. In some sense, the current system formulates explanations when it finds that a newly observed reaction is consistent with the existing theory or when it proposes a new property that rules out an unobserved reaction. However, BR-4 does not generate an explicit proof or other structure that connects assumptions and observations. In future work, we plan to model the explanatory process in more detail, with the system deducing the presence or absence of specific reactions from declarative statements of quantum properties and conservation laws. In turn, this may let us recast BR-4's operators in terms of an abduction process (Ng & Mooney, 1990; O'Rorke et al., 1990) that modifies assumptions to explain known phenomena.
We also hope to extend the system to handle the introduction of componential models, which describe particles at one level as combinations of more primitive particles. Langley et al.'s (1987) DALTON took some initial steps along these lines to explain the relations between chemical molecules and elements, but we believe that we can adapt BR-4 to explain the origins of the quark theory and its alternatives. The basic task here involves explaining why elementary particles with some quantum properties exist and others do not. The constraints of consistency and completeness, which play such a central role in BR-4, seem well suited for this problem, which involves postulating new component particles (quarks), then searching the space of quantum values and their compositions that satisfy certain constraints (e.g., symmetry) for known particles and violate these constraints for nonexistent ones.
Finally, like most other models of scientific discovery, BR-4 ignores the interactions that occur among different researchers. Scientists cooperate along some dimensions, with theorists passing on predictions to experimentalists, who in turn report their observations to theorists. They also compete in developing theories to explain new findings, in discovering evidence for predicted events, and by noting errors in others' reasoning. The history of particle physics is rich in examples of such interactions, and we believe that some revisions to BR-4 will let us model some of them. In particular, we plan to assign different facets of the system's domain knowledge to different agents, which would communicate through a common representation; we will also let different agents explore different branches when search suggests alternative solutions.
5. Concluding Remarks
In this paper we presented BR-4, an abstract computational model of scientific discovery. We examined the system's behavior on three problems from particle physics, showing that it can replicate, though in a schematic way, important steps in the historical development of this field, some of which were considered major discoveries when first introduced. In particular, BR-4 proposes the existence of the neutrino to avoid violating conservation of spin, it invents baryon and lepton numbers to explain the absence of reactions involving proton decay, and it postulates electron and muon numbers to rule out unobserved neutrino reactions. In addition, the system can determine appropriate quantum values for each particle, and it can predict the reactions implied by a set of particles and quantum properties.
The BR-4 model accomplishes these feats using simple processes that play a central role in many aspects of human cognition. The system employs four basic operators for determining property values, creating new properties, positing new particles, and predicting reactions. Moreover, it uses consistency and completeness constraints to selectively apply these operators, and it incorporates a depth-first control scheme to carry out search when necessary. The simplicity of these mechanisms, and their similarity to other processes observed in human behavior, suggest that one can explain some aspects of scientific creativity in similar terms.
References
Fischer, P., & Zytkow, J. M. (1992). Incremental generation and exploration of hidden structure. Proceedings of the ML92 Workshop on Machine Discovery.
Karp, P. (1990). Hypothesis formation as design. In Shrager, J., and Langley, P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Klahr, D. (1994). Extended abstract: Children, adults and machines as discovery systems. Machine Learning, 14, 313-320.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S. (1992). Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML 92 Workshop on Machine Discovery. Aberdeen, Scotland. (pp. 63-70).
Kocabas, S. (1993). Elements of Scientific Creativity. In Technical Report: Artificial Intelligence and Creativity. AAAI Press, pp. 39-45.
Kocabas, S. (1994). Goal directed discovery and explanation in particle physics. In Working Notes: Goal Driven Learning, AAAI Spring Symposium Series. (pp. 54-61).
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.
Ng, ?. and Mooney, ? (199?). ...
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abduction. In Shrager, J., and Langley, P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley, P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.
Shen, W. M., & Simon, H. A. (1989). Rule creation and rule learning through environmental exploration. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. ***--***). Detroit, MI: Morgan Kaufmann.
Shrager, J. and Langley, P. (1990). Computational approaches to scientific discovery. In Shrager, J., and Langley, P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Tweney, R.D. (1990). In Shrager, J. and Langley, P. (eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. E. (in press). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning.
Valdes-Perez, R. E., Zytkow, J. M., & Simon, H. A. (1993). Scientific model building as search in matrix spaces. Proceedings of the Eleventh National Conference on Artificial Intelligence (pp. 472-478). Washington, DC: AAAI Press.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
MODELING DISCOVERIES IN PARTICLE PHYSICS
Sakir Kocabas
Pat Langley
(langley @ cs.stanford.edu)
Robotics Laboratory, Computer Science Dept.,
Stanford University, Stanford, CA 94305 USA
Abstract:
This paper describes a discovery system, BR-4, which integrates several research tasks in modeling the discovery of certain quantum properties and conservation laws by physicists in this century. The program is directed by consistency and completeness constraints, and has the capabilities of theory formation and theory revision in its domain, and of explaining its knowledge state by these constraints. BR-4 is capable of formulating new elementary particles and particle reactions, and proposing observations to test their existence. The program revises its domain theory when it detects formal and theoretical contradictions, and when its domain theory conflicts with observational data.
-----------
* Also affiliated with ITU, Faculty of Space Sciences and Technology, Istanbul, Turkey.
** Also affiliated with the Institute for the Study of Learning and Expertise, 2451 High St., Palo Alto, CA 94301 USA.
1. Introduction
Computational modeling of discovery has been the focus of attention by several research groups in the last ten years, and a number of models with different capabilities have been developed. These capabilities include goal selection, experiment design, data collection, expectation setting, quantitative reasoning, concept formation, hypothesis formation, theory formation, theory revision, explanation, and paradigm shifts by qualitative models. In current models only a few of these discovery tasks have been integrated in one system. The integration of discovery tasks continues to be a difficult problem in this research area of artificial intelligence.
The subject of this paper is an integrated discovery model BR-4, with the capabilities of theory formation, event prediction, data acquisition, explanation, and theory revision. Before we describe the system and its behavior, it is appropriate to present some background information about its task domain, particle physics.
1.1. The Domain of Particle Physics
Particle physics studies the nature of elementary particles - the building blocks of matter - and interactions among these entities. The basic phenomena in this field take the form of reactions, similar in many ways to those found in chemistry. For instance, two such observed reactions* are
p + p --> p + n + pi
pio --> g + g
where the symbols p, n, pi, pio and g represent the proton, neutron, pion, pion-zero and gamma particles, respectively.
As in chemistry, physics requires that reactions among elementary particles obey certain conservation laws. For instance, one of the most basic laws states that any such reaction conserves the electric charge of the particles involved. Electric charge is an example of a quantum property, and one of the main tasks in particle physics concerns the assignment of values for quantum properties such that observed reactions conserve those properties. Thus, both of the above reactions conserve electric charge provided we assign the commonly accepted charges 1 to p, 0 to n, 1 to pi, 0 to pio, and 0 to g. Other assignments are also possible for this pair of reactions, but they would not be consistent with other observed particles.
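The remark that other assignments are possible for this pair of reactions can be made concrete by enumeration. The sketch below lists every assignment of values to the five particles that conserves the property in both reactions; restricting the candidate values to 0 and 1 is an assumption made only to keep the example small.

```python
from itertools import product

PARTICLES = ("p", "n", "pi", "pio", "g")
REACTIONS = [
    (["p", "p"], ["p", "n", "pi"]),
    (["pio"], ["g", "g"]),
]


def conserves(reaction, values):
    """True if the summed value is equal on both sides of the reaction."""
    lhs, rhs = reaction
    return sum(values[x] for x in lhs) == sum(values[x] for x in rhs)


def consistent_assignments(domain=(0, 1)):
    """All assignments over `domain` that conserve the property everywhere."""
    found = []
    for combo in product(domain, repeat=len(PARTICLES)):
        values = dict(zip(PARTICLES, combo))
        if all(conserves(r, values) for r in REACTIONS):
            found.append(values)
    return found


ACCEPTED = {"p": 1, "n": 0, "pi": 1, "pio": 0, "g": 0}
solutions = consistent_assignments()
print(len(solutions), ACCEPTED in solutions)   # 3 True
for s in solutions:
    print(s)
```

With only two reactions the constraints are weak, so several assignments survive; further observed reactions are what narrow the choice down.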
-----------
* Typically, physicists infer the occurrence of such reactions from tracks in cloud chambers and similar evidence. We will not attempt to model this inference process here, and instead will simply treat reactions as though they are directly observed.
The concern with conservation also explains why some particle reactions are never observed. For example, the process of beta decay,
n --> p + e + /nu,
in which a neutron decays into a proton p, an electron e, and an antineutrino /nu, has been widely detected. In contrast, the decay of protons, as in the reactions
p --> pi + pio
p --> /e + g
has never been seen despite its inherent plausibility. All three reactions satisfy conservation of energy and electric charge, yet only the first occurs in nature. However, one can explain the absence of the other reactions by the existence of another quantum property, the baryon number, that must also be conserved and that these two reactions would violate. Thus, another central task in particle physics involves the explanation of unobserved reactions through the postulation of new quantum numbers.
Other activities include the postulation of new particles, either on theoretical or empirical grounds, and the prediction of reactions that satisfy known conservation laws. Testing such predictions leads into the realm of experimental particle physics, which we will not address here. But the above pursuits cover a wide range of behaviors that occur in this scientific field.
The above analysis of the discovery tasks suggests that six basic operations play a central role in particle physics. First, one must have a representation to receive and evaluate data about domain objects and events. Second, for a given set of particles, quantum numbers and observed reactions, one must be able to determine a set of quantum values that satisfy conservation for those reactions. Third, one must have a mechanism to explain the currently observed and unobserved reactions in terms of the constraints of the model. Fourth, one must be able to posit new quantum properties that account for the absence of unobserved reactions. Fifth, one requires an operator that posits new particles and determines their role in known reactions. Finally, one must have some mechanism for predicting reactions that have not yet been observed, but which follow from the current theoretical model. We have incorporated these operators into BR-4, where they play a central role in the process of theory formation and revision. (We will refer to them as Read-Data, Determine-Values, Explain-Event, Posit-Property, Posit-Particle, and Predict-Reaction, respectively.)
Operators of this sort must alter some internal representation that contains hypotheses about the particles, properties, and reactions that exist. This representation can take many forms, but following Valdes-Perez et al. (1993), one can view it as two matrices. One matrix lists particles against quantum properties, with each matrix entry specifying the value for a specific particle on a specific property. The other matrix lists particles against reactions, with an entry containing the total number of times the particle occurs in the reaction. In this light, the operator for determining quantum values alters entries in the first matrix, whereas each of the other three operators (Posit-Property, Posit-Particle, and Predict-Reaction) extends one or both matrices along one of their dimensions.
In the next section we describe the knowledge representation and the discovery operators of BR-4 together with its control structure in modeling several different discovery tasks, with illustrative examples from particle physics. This will be followed by a discussion of the system's methods and projections for future work. The paper ends with a summary of the conclusions drawn from this research.
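The two-matrix view can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the representation actually used in BR-4: the occurrence counts are stored with a sign (negative for reactants, positive for products), which is a choice made here so that a conserved property yields a zero sum, and the names `Q`, `occurrence_column` and `imbalance` are hypothetical. The charge and baryon values are those listed in Table 3.

```python
PARTICLES = ["g", "e", "p", "n", "/e", "/nu"]
PROPERTIES = ["charge", "baryon"]

# First matrix: particles (rows) against quantum properties (columns).
Q = [
    [0, 0],   # g
    [-1, 0],  # e
    [1, 1],   # p
    [0, 1],   # n
    [1, 0],   # /e
    [0, 0],   # /nu
]

REACTIONS = {
    "r1": (["n"], ["p", "e", "/nu"]),  # observed beta decay
    "r2": (["p"], ["/e", "g"]),        # unobserved proton decay
}


def occurrence_column(reactants, products):
    """One column of the second matrix: signed occurrence counts per particle."""
    column = [0] * len(PARTICLES)
    for name in reactants:
        column[PARTICLES.index(name)] -= 1  # assumption: reactants count negative
    for name in products:
        column[PARTICLES.index(name)] += 1
    return column


def imbalance(column):
    """Net change of each property over a reaction; all zeros means conserved."""
    return [
        sum(column[i] * Q[i][j] for i in range(len(PARTICLES)))
        for j in range(len(PROPERTIES))
    ]


imbalances = {
    name: imbalance(occurrence_column(*sides)) for name, sides in REACTIONS.items()
}
print(imbalances)
```

In this picture, revising a quantum value changes one cell of `Q`, while positing a property adds a column to `Q` and positing a particle adds a row to both matrices.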
2. The System's Knowledge Representation and Behavior
In this section we describe the program's knowledge representation methods and its behavior in modeling certain discoveries in particle physics. The program uses a structured knowledge representation similar to qualitative schemas as in AbE (O'Rorke et al., 1990) and other recent discovery models.
2.1. Knowledge Representation
BR-4's knowledge organization distinguishes descriptive and prescriptive knowledge. The former type of knowledge is represented as frames, and the latter as a series of operators and functions. The program has six operators which are named as follows: Read-Data, Determine-Values, Explain-Event, Posit-Property, Posit-Particle and Predict-Reaction.
The main data items of BR-4 are elementary particles and their reactions. Both are represented as frames in the system's knowledge base. Particle frames include the name of the particle, the quantum properties and their values. The general form of a particle frame is as follows:
frame: P (frame name)
class : particle
q1 : v1
q2 : v2
.......
qn : vn.
where P is the name of the particle, q1,...,qn the quantum properties, and v1,...,vn the corresponding quantum values, which can be -1, 0, or 1.
Particle reactions are represented in a similar way, this time containing information about the reactions, such as the particles involved, the reaction conditions, the physical status of the reaction, and its validity under the current theory. The general form of a particle reaction frame is as follows:
frame: reaction
class : physical event
actual status : A
logical status : L, logical_status(N,L)
reactants : R
products : P
active properties : Q, active_properties(N,Q)
reactants properties : Rp, reactants_properties(Q,Rp)
products properties : Pp, products_properties(Q,Pp)
conditions : (Rp = Pp) or (Rp =/= Pp).
where A indicates whether the reaction has been physically observed or unobserved, and L indicates whether the reaction is valid or invalid under the current theoretical knowledge of the system. R and P are the lists of the particles involved in the reaction as the reactants and the products respectively. Q indicates the vector of quantum properties that play an active role in the reaction, while Rp and Pp are the quantum value vectors of the reactants and the products. Normally, particle reactions are added to the program's knowledge base (e.g., for the reaction n --> p + e + /nu) as follows:
frame: r1
class = reaction
actual status = observed
reactants = [n]
products = [p,e,/nu].
Such input reaction frames are then transformed into the form below by the Read-Data operator acting on the parent frame:
frame: r1
class = reaction
actual status = observed
logical status = valid
reactants = [n]
products = [p,e,/nu]
active properties = [q0, q1]
reactants properties = [1, 0]
products properties = [1, 0]
conditions = {[1,0] = [1,0]}.
The amended slots are added after their values are calculated by the Read-Data operator. In this way, the system's domain theory is built, on which BR-4's other operators act as described below, within the control structure summarized in Figure 1.
___________
| Read Data | <-- new data
|___________|
| |
__|____|___ ___________
| |--->| Explain |
| | |_Event_____|
| | _____|_____
| |--->| Determine |<------
| Domain |<---|_Value_____| |
| Theory | _____|_____ |
| |--->| Posit |_______|
| |<---|_Property__| |
| | _____|_____ |
| |--->| Posit |_______|
| |<---|_Particle__|
| | _____|_____
| |--->| Predict |
|___________|<---|_Reactions_|
Figure 1. BR-4's general control structure in the
discovery of quantum properties
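The transformation that Read-Data applies to an input reaction frame can be sketched as follows. This is a hedged illustration, not BR-4's implementation: frames are modeled as Python dictionaries, the function name `read_data` is hypothetical, and the q0 and q1 values of the particles are assumptions chosen only so that the totals agree with the slot values shown for frame r1 above.

```python
# Hypothetical particle frames. The values of q0 and q1 are assumptions,
# chosen only so that the totals match the slot values shown in the text.
PARTICLE_FRAMES = {
    "n":   {"class": "particle", "q0": 1, "q1": 0},
    "p":   {"class": "particle", "q0": 1, "q1": 1},
    "e":   {"class": "particle", "q0": 0, "q1": -1},
    "/nu": {"class": "particle", "q0": 0, "q1": 0},
}


def read_data(frame, particles, active=("q0", "q1")):
    """Return a copy of an input reaction frame with the derived slots added."""
    def totals(names):
        return [sum(particles[name][q] for name in names) for q in active]

    rp = totals(frame["reactants"])
    pp = totals(frame["products"])
    balanced = rp == pp
    amended = dict(frame)
    amended["logical status"] = "valid" if balanced else "invalid"
    amended["active properties"] = list(active)
    amended["reactants properties"] = rp
    amended["products properties"] = pp
    amended["conditions"] = (rp, "=" if balanced else "=/=", pp)
    return amended


r1_input = {
    "class": "reaction",
    "actual status": "observed",
    "reactants": ["n"],
    "products": ["p", "e", "/nu"],
}
r1 = read_data(r1_input, PARTICLE_FRAMES)
for slot, value in r1.items():
    print(slot, "=", value)
```

The input frame is left unchanged and a new frame is returned, which keeps the raw observation separate from the slots derived under the current theory.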
2.2. Theory Formation and Revision
The program starts with a simple domain theory about several particles and a small number of observable reactions. BR-4's theory formation activities are driven by its Explain-Event operator, which acts on particle reaction frames, looking for reactions that cannot be explained under the system's consistency and completeness constraints. The consistency condition states that any observed particle reaction must be valid by the system's domain theory, where validity is defined as compliance with the quantum conservation laws. An inconsistent reaction in this sense is unexplainable by the Explain-Event operator.
There are two heuristics for eliminating such contradictions. One is to revise the quantum values of particles in a depth-first search with backtracking through the space of values, until a consistent value set is found. The second heuristic is to introduce a hidden particle to balance the reaction, in either the input or the output, positing that it actually takes part in the reaction but for some reason is not directly observable. The system then computes the property values for this particle, identifying it with an already known particle, or creating an entirely new particle. The first heuristic is applied by the Determine-Values operator and the second one by Posit-Particle.
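The second heuristic can be sketched as a small balancing routine. The sketch below is an assumption-laden illustration, not BR-4's Posit-Particle: it treats every property as additive, adds the hidden particle to the output side only, and uses the charge, baryon and lepton values of Table 3 purely for convenience, even though the lepton number was not available when the neutrino was first posited. The names `posit_particle` and `"x"` are hypothetical.

```python
# (charge, baryon, lepton) values taken from Table 3.
KNOWN = {
    "g": (0, 0, 0), "e": (-1, 0, 1), "p": (1, 1, 0), "n": (0, 1, 0),
    "/e": (1, 0, -1), "nu": (0, 0, 1), "/nu": (0, 0, -1),
}


def total(names, table):
    """Sum each property over a list of particle names."""
    width = len(next(iter(table.values())))
    return tuple(sum(table[name][i] for name in names) for i in range(width))


def posit_particle(reactants, products, table, new_name="x"):
    """Balance a reaction by adding one hidden particle to the output side.

    Returns None if the reaction already balances; otherwise returns
    (name, values), reusing a known particle when its values match.
    """
    missing = tuple(
        r - p for r, p in zip(total(reactants, table), total(products, table))
    )
    if not any(missing):
        return None
    for name, values in table.items():
        if values == missing:
            return name, missing
    return new_name, missing


# The unbalanced decay n --> p + e, with and without a known antineutrino.
with_antineutrino = posit_particle(["n"], ["p", "e"], KNOWN)
reduced = {k: v for k, v in KNOWN.items() if k not in ("nu", "/nu")}
without_antineutrino = posit_particle(["n"], ["p", "e"], reduced)
print(with_antineutrino, without_antineutrino)
```

When a particle with the required values is already known, the routine identifies the hidden particle with it; otherwise it creates a new entry, mirroring the two outcomes described above.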
The completeness condition is defined over unobserved reactions. Any unobserved particle reaction must violate some quantum conservation law. If the domain theory of BR-4 contains an unobserved reaction that does not seem to violate a quantum conservation law, then this is also an unexplainable event for the Explain-Event operator. This means that the system's domain theory is incomplete regarding the unobserved reaction. In such cases, the system's Posit-Property operator takes control and posits a new quantum property that is conserved in the observed particle reactions but not in the unobserved ones. Determining the values of this property requires search, first for the particles in the missing reaction, and then an embedded search for the values of particles in other reactions. This search is carried out by the Determine-Values operator, and as before, if the system arrives at a partial combination of values that rules out an observed reaction or fails to eliminate the unobserved one, it backtracks and considers alternative paths until it finds an acceptable set.
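The consistency and completeness conditions can be summarized as a small classification routine. This is a minimal sketch, assuming additive properties and the Table 3 values for charge, baryon and lepton number; the function names and the three returned labels are hypothetical and are not taken from BR-4.

```python
# (charge, baryon, lepton) values taken from Table 3.
TABLE3 = {
    "g": (0, 0, 0), "e": (-1, 0, 1), "p": (1, 1, 0), "n": (0, 1, 0),
    "/e": (1, 0, -1), "nu": (0, 0, 1), "/nu": (0, 0, -1),
}
# A weaker theory that knows about electric charge only.
CHARGE_ONLY = {name: values[:1] for name, values in TABLE3.items()}


def balanced(reaction, table):
    """True if every property in the table is conserved by the reaction."""
    reactants, products = reaction
    width = len(next(iter(table.values())))
    return all(
        sum(table[x][i] for x in reactants) == sum(table[x][i] for x in products)
        for i in range(width)
    )


def explain_event(reaction, observed, table):
    """Classify a reaction against the current theory.

    "inconsistent": observed but not valid (revise values or posit a particle)
    "incomplete":   unobserved but valid (posit a new property)
    "explained":    otherwise
    """
    valid = balanced(reaction, table)
    if observed and not valid:
        return "inconsistent"
    if not observed and valid:
        return "incomplete"
    return "explained"


beta_decay = (["n"], ["p", "e", "/nu"])
bare_decay = (["n"], ["p", "e"])
proton_decay = (["p"], ["/e", "g"])

print(explain_event(beta_decay, True, TABLE3))
print(explain_event(bare_decay, True, TABLE3))
print(explain_event(proton_decay, False, CHARGE_ONLY))
print(explain_event(proton_decay, False, TABLE3))
```

The same unobserved proton decay is an incompleteness under the charge-only theory and is explained once the baryon number is part of the theory.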
We can extend the notion of incompleteness to include theories that do not explicitly specify all reactions that follow from them, as occurs when BR-4's Posit-Particle postulates a new particle. In this situation, the system's Predict-Reactions operator systematically generates all possible reactions (decays and collisions) of the new particle involving one, two or three other known particles. For each such tentative reaction R, the program predicts that R will occur if it conserves all known properties.
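A generate-and-test reading of this step can be sketched as follows. The sketch is deliberately narrower than the operator described above: it enumerates only two-body collisions of the form new + a --> b + c, ignores decays, and checks only the additive properties supplied in the table (here the charge, baryon and lepton values of Table 3). It performs no energy or mass check, so it will generally propose more reactions than the text reports. The function name `predict_collisions` is hypothetical.

```python
from itertools import combinations_with_replacement

# (charge, baryon, lepton) values taken from Table 3.
TABLE3 = {
    "g": (0, 0, 0), "e": (-1, 0, 1), "p": (1, 1, 0), "n": (0, 1, 0),
    "/e": (1, 0, -1), "nu": (0, 0, 1), "/nu": (0, 0, -1),
}


def predict_collisions(new, table):
    """List reactions new + a --> b + c that conserve every tabulated property."""
    names = sorted(table)
    width = len(table[new])

    def total(particles):
        return tuple(sum(table[x][i] for x in particles) for i in range(width))

    predicted = []
    for a in names:
        lhs = (new, a)
        for rhs in combinations_with_replacement(names, 2):
            if sorted(lhs) == sorted(rhs):
                continue  # skip reactions that leave the particles unchanged
            if total(lhs) == total(rhs):
                predicted.append((lhs, rhs))
    return predicted


predictions = predict_collisions("/nu", TABLE3)
for lhs, rhs in predictions:
    print(" + ".join(lhs), "-->", " + ".join(rhs))
```

Each proposed reaction is a candidate for experimental test; any that is never observed would in turn count as an incompleteness of the theory.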
3. Illustrative Examples From Particle Physics
In this section we describe the behavior of BR-4 on three examples of discovery from the history of particle physics, involving the neutrino, baryon and lepton numbers, and electron and muon numbers.
Table 1. The quantum values of particles known prior
to the discovery of the neutrino.
-----------------------------------------------------
Particle     mass     charge    spin
g              0.0       0      1
e              0.51     -1      1/2
p            938.26      1      1/2
n            939.55      0      1/2
/e             0.51      1      1/2
nu             0.0       0      1/2
/nu            0.0       0      1/2
-----------------------------------------------------
3.1. Discovery of the Neutrino
Until the early 1930's, scientists knew only a few elementary particles, shown in Table 1 along with their mass and their values on the three known quantum properties, energy, charge and spin. The known reactions were also limited to a small number:
p + p --> p + p
e + /e --> g
g --> e + /e
This situation changed after the discovery of the neutron in 1932, when experiments on beta decay revealed the reaction
n --> p + e
in which a neutron decays into a proton and an electron. However, this reaction was problematic in that it violated the conservation of energy and spin, with the total energy and spin counts unbalanced in the reaction. Rather than abandon the conservation law, physicists postulated the presence of a new particle,* also generated during beta decay, that would balance out the missing energy and spin. Although the particle was not visible in the reaction, they inferred its property values from the values for the other particles in the decay process. They concluded that this neutrino has zero rest mass, no electrical charge, and a spin of one half.
------------
* In the early 1930's there were serious debates among physicists as to the validity of the conservation laws in the subatomic world.
Given the reactions above and the quantum numbers in Table 1, BR-4 responds in a similar manner. The system's Explain-Event operator cannot explain the fourth reaction; on detecting the imbalance, it passes control to Posit-Property. This operator considers assigning alternative spin values in an attempt to find a consistent set of values that would balance the reaction. But in this case, BR-4 is not allowed to modify the spin values, as these are assumed to be correctly established by observation. This leaves revision of the unbalanced reaction as the only solution, and control passes to the Posit-Particle operator, which adds an extra particle to the output side of the reaction, giving
n --> p + e + nu.
Using the conservation laws, Determine-Values computes the charge and spin of the new particle, nu, as 0 and 1/2 respectively. Another possible revision would have added a new particle with opposite properties to /nu, to the input side of the reaction, but physicists favored the former solution as they were thinking in terms of a decay process.
Table 2. Particle reactions that were (a) observed and (b) not observed
in experiments after the introduction of the particles in Table 1.
-----------------------------------------------------------------------
a) Observed reactions            b) Unobserved reactions
p + p --> p + p                  p --> /e + g
n --> p + e + /nu                p --> /e + e + /e
/e + e --> g                     p --> /e + g + g
g + p --> e + /e + p
/nu + p --> n + /e
nu + n --> p + e
-----------------------------------------------------------------------
However, the inclusion of the neutrino and its antiparticle leaves the theory incomplete, in that they imply reactions with other known particles. BR-4's Predict-Reactions operator finds no decays for the neutrino, but it does find three collision reactions that are consistent with the theory:
/nu + p --> n + /e
nu + n --> p + e
nu + /nu --> g
which are predicted to be observed in experiments. The first two of these were later detected by physicists. The third reaction has a very low probability and is rather difficult to detect.
3.2. Proposing Baryon and Lepton Numbers
The discovery of the neutrino left physicists with seven elementary particles,* having the properties and values shown in Table 1. Physicists realized that the existence of these particles, combined with known quantum conservation laws, implied a variety of reactions. Subsequent observations revealed evidence for the predicted reactions in Table 2 (a) but not for those shown in Table 2 (b). For some reason, the three predicted decays of the proton did not occur in nature. To explain this, physicists proposed a new quantum property, known as the baryon number.**
--------------------
* The neutrino-antineutrino distinction was experimentally verified in the late 1950's.
** Stuckelberg proposed this new quantum property in 1938 as the protonic charge which was later to be called the baryon number.
Table 3. The quantum values for elementary particles known in 1953
after the discovery of baryon and lepton numbers.
--------------------------------------------------------------------
Particle     mass     charge    spin    baryon    lepton
g              0.00      0      1         0         0
e              0.51     -1      1/2       0         1
p            938.26      1      1/2       1         0
n            939.55      0      1/2       1         0
/e             0.51      1      1/2       0        -1
nu             0.00      0      1/2       0         1
/nu            0.00      0      1/2       0        -1
mu           105.60     -1      1/2       0         1
/mu          105.60      1      1/2       0        -1
pi           139.60      1      -         0         0
/pi          139.60     -1      -         0         0
pio          135.00      0      -         0         0
--------------------------------------------------------------------
BR-4's Predict-Reactions operator proposes the same reactions, but the Explain-Event operator cannot explain the absence of the reactions in Table 2 (b). The program selects the first reaction, p --> /e + g, and turns it into a set of inequalities, each based on a different combination of values for the particles involved. In this case, it would generate the four inequalities
1 =/= 0 + 0
1 =/= 1 + 1
0 =/= 1 + 0
0 =/= 0 + 1
The Determine-Values operator then selects one of these value sets, say the first (p = 1, /e = 0, g = 0), and tests it in the observed reactions, say n --> p + e + /nu, this time treating the reaction as an equality, and obtains
n = 1 + 0 + /nu
which leaves the property values for n and /nu unspecified. Two value sets are possible for this pair, n = 1, /nu = 0 and n = 0, /nu = -1. The first value set is consistent with all the then known reactions, while the second set is inconsistent with the reaction nu + n --> p + e. At any point, detection of an unbalanced reaction that violates conservation of the new property causes backtracking to one of the alternative value sets. If the search exhausts all such sets produced from observed reactions, the system backtracks further and considers alternative value sets generated from the unobserved reactions.
Given the experimental results in Table 2, BR-4 arrives at the value zero for all particles except the proton and neutron, to which it assigns the value one. These settings correspond to those obtained by physicists for the baryon number, which successfully explain the absence of the reactions in Table 2 (b).
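The search for the values of a newly posited property can be sketched as a depth-first search with backtracking over the reactions of Table 2. This is an illustration under assumptions, not BR-4's Determine-Values: the particle ordering, the value ordering (0, 1, -1), and the function names are choices made here, and the sketch does not use the inequality-driven ordering or the domain heuristics described above. Which acceptable assignment is found first depends on those orderings, so the sketch only checks that the result is acceptable and that the baryon and lepton columns of Table 3 both pass the same test.

```python
PARTICLES = ["p", "n", "e", "/e", "nu", "/nu", "g"]

OBSERVED = [  # Table 2 (a)
    (["p", "p"], ["p", "p"]),
    (["n"], ["p", "e", "/nu"]),
    (["/e", "e"], ["g"]),
    (["g", "p"], ["e", "/e", "p"]),
    (["/nu", "p"], ["n", "/e"]),
    (["nu", "n"], ["p", "e"]),
]
UNOBSERVED = [  # Table 2 (b)
    (["p"], ["/e", "g"]),
    (["p"], ["/e", "e", "/e"]),
    (["p"], ["/e", "g", "g"]),
]


def imbalance(reaction, values):
    reactants, products = reaction
    return sum(values[x] for x in reactants) - sum(values[x] for x in products)


def fully_assigned(reaction, values):
    return all(x in values for side in reaction for x in side)


def acceptable(values):
    """Check every reaction whose particles all have values so far."""
    for r in OBSERVED:
        if fully_assigned(r, values) and imbalance(r, values) != 0:
            return False  # an observed reaction would be ruled out
    for r in UNOBSERVED:
        if fully_assigned(r, values) and imbalance(r, values) == 0:
            return False  # an unobserved reaction would not be ruled out
    return True


def search(values=None, domain=(0, 1, -1)):
    """Depth-first search with backtracking through the space of values."""
    values = {} if values is None else values
    unassigned = [x for x in PARTICLES if x not in values]
    if not unassigned:
        return values
    for v in domain:
        trial = {**values, unassigned[0]: v}
        if acceptable(trial):
            found = search(trial, domain)
            if found is not None:
                return found
    return None  # dead end: the caller tries its next value


BARYON = {"p": 1, "n": 1, "e": 0, "/e": 0, "nu": 0, "/nu": 0, "g": 0}
LEPTON = {"p": 0, "n": 0, "e": 1, "/e": -1, "nu": 1, "/nu": -1, "g": 0}
found = search()
print(found, acceptable(BARYON), acceptable(LEPTON))
```

Pruning on partially assigned reactions is what keeps the search small: a branch is abandoned as soon as any fully instantiated reaction contradicts its observed or unobserved status.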
Alternatively, by using the value set in the third inequality above, BR-4 would propose another quantum property by assigning the following values to particles: p = 0, n = 0, /e = -1, g = 0, and e = 1. These values correspond to the lepton numbers of elementary particles (see Table 3).
Table 4. Some particle reactions that were (a) observed and (b) not
observed in experiments after the discovery of baryon and lepton numbers.
----------------------------------------------------------------------
a) Observed reactions            b) Unobserved reactions
pi --> /nu + mu                  mu --> e + g
/pi --> mu + /nu                 pi --> /mu + g
mu --> e + nu + /nu              pi --> /e + g
/mu --> /e + /nu + nu
pio --> g + /e + e
pio --> g + g
pio --> e + e + /e + /e
----------------------------------------------------------------------
In 1935, Yukawa had proposed the existence of additional particles with a mass of about 100 MeV in the nucleus. The reasoning behind Yukawa's proposal, which we have not attempted to model, involved energy calculations on atomic nuclei. Later, in the 1940s, observations on cosmic rays revealed five such particles: the muon (mu) and anti-muon (/mu), the pion (pi) and anti-pion (/pi), and the pion-zero (pio), along with the property values in Table 3. Baryon and lepton numbers could explain the possibility and absence of the reactions of these particles in the 1950s. Some of these reactions are given in Table 4(a) and 4(b).
3.3. Electron and Muon Numbers
With the discovery of the baryon and lepton numbers, physicists had produced a theory, involving 12 elementary particles and four quantum properties plus the relativistic masses of the particles, that was apparently consistent and complete. Table 3 reflects this state of physical knowledge. Some skepticism remained, for instance about the neutrino, which seemed very difficult to observe for theoretical reasons. However, in 1953, experiments revealed indirect evidence for the reaction
/nu + p --> n + /e.
Unfortunately, this reaction occurred when the anti-neutrino /nu had been generated through beta decay (n --> p + e + /nu), but not when produced through muon decay (mu --> e + nu + /nu).
To resolve this dilemma, scientists postulated that the two reactions actually generated two distinct types of neutrinos, calling the former an electron neutrino (nu_e) and the latter a muon neutrino (nu_mu). This distinction (and the analogous one for anti-neutrinos) introduced two additional rows in the table of particles. However, it also produced the unobserved reactions shown in Table 5(b), which physicists again sought to explain by introducing yet another property, which they named the electron number.
Our model cannot directly explain the historical distinction into two classes of neutrinos, but we believe it constitutes a variation on the heuristic for postulating new particles that originally led to inference of the neutrino. Once this distinction has been made, BR-4 realizes that its current theory is incomplete, in that it cannot explain the unobserved reactions involving the muon neutrino and its antiparticle. Postulating a new property, it searches the space of values using the same process as it used for the baryon and lepton numbers. The resulting values agree with those proposed by physicists for the electron number, but are not sufficient to rule out the unobserved reaction (pi --> /mu + g). Explanation of this omission requires introduction of yet another quantum property, this one corresponding to the muon number, which physicists postulated in 1962.
Table 5. Some particle reactions that were (a) observed and (b) not
observed in experiments after introducing distinction between electron
neutrinos (nu_e) and muon neutrinos (nu_mu).
----------------------------------------------------------------------
a) Observed reactions            b) Unobserved reactions
pi --> /mu + nu_mu               mu --> e + g
/pi --> mu + /nu_mu              /nu_mu + p --> n + /e
mu --> e + /nu_e + nu_mu         nu_mu + n --> p + e
/mu --> /e + nu_e + /nu_mu       pi --> /mu + g
pio --> g + e + /e               pi --> /e + g
pio --> g + g
pio --> e + /e + e + /e
----------------------------------------------------------------------
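The claim that the electron number alone does not rule out pi --> /mu + g, while the muon number does, can be checked with a short sketch over the reactions of Table 5. The electron and muon number values used below are the standard textbook assignments and are an assumption here, since the text does not list them; particles not named in a dictionary are taken to have the value 0. The function and variable names are hypothetical.

```python
# Standard textbook assignments (assumed; not listed in the text).
ELECTRON = {"e": 1, "/e": -1, "nu_e": 1, "/nu_e": -1}
MUON = {"mu": 1, "/mu": -1, "nu_mu": 1, "/nu_mu": -1}

OBSERVED = [  # Table 5 (a)
    (["pi"], ["/mu", "nu_mu"]),
    (["/pi"], ["mu", "/nu_mu"]),
    (["mu"], ["e", "/nu_e", "nu_mu"]),
    (["/mu"], ["/e", "nu_e", "/nu_mu"]),
    (["pio"], ["g", "e", "/e"]),
    (["pio"], ["g", "g"]),
    (["pio"], ["e", "/e", "e", "/e"]),
]
UNOBSERVED = [  # Table 5 (b)
    (["mu"], ["e", "g"]),
    (["/nu_mu", "p"], ["n", "/e"]),
    (["nu_mu", "n"], ["p", "e"]),
    (["pi"], ["/mu", "g"]),
    (["pi"], ["/e", "g"]),
]


def conserved(reaction, values):
    """True if the property is conserved; unlisted particles count as 0."""
    reactants, products = reaction
    return sum(values.get(x, 0) for x in reactants) == sum(
        values.get(x, 0) for x in products
    )


observed_ok = all(
    conserved(r, ELECTRON) and conserved(r, MUON) for r in OBSERVED
)
# Unobserved reactions that the electron number fails to rule out.
left_by_electron = [r for r in UNOBSERVED if conserved(r, ELECTRON)]
# Of those, the ones that the muon number also fails to rule out.
left_by_both = [r for r in left_by_electron if conserved(r, MUON)]
print(observed_ok, left_by_electron, left_by_both)
```

Under these assumed values the electron number leaves exactly one unobserved reaction unexplained, and adding the muon number removes it.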
4. Discussion of the Framework
Now that we have seen some examples of BR-4's operation, we can consider the implications of the model for research on scientific creativity, related work on scientific discovery, and some directions for future research on this topic.
4.1. Implications of the Model
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. It can also involve more than a dozen different search spaces ranging from scientific problem formulation through data collection and evaluation, to hypothesis formation, theory formation and theory revision (see Kocabas, 1993). Within the research activities, different types of discovery and creativity can be distinguished as logico-mathematical, formal, theoretical and empirical discovery. Current computational models have shortcomings in capturing the details of historical discoveries for reasons described by Tweney (1990). However, this should not diminish their usefulness, as they can provide an overall look into the structure of the development of theories both in their formation and revision processes. They can also be useful in analyzing the historical progress of scientific ideas and the possibility of alternative ideas together with their implications.
In this study, our aim has not been to model the historical details of particle physics, but to show that certain computational mechanisms can account for theory formation and revision in this domain. The basic mechanisms in BR-4 -- search guided by heuristic knowledge -- bear close resemblance to those implicated in normal human problem solving, as studied by Newell and Simon (1972), as well as many others.
If correct, this view suggests that some of the creative activities in particle physics have much in common with everyday reasoning. However, modern scientific reasoning is much more reliant on logico-mathematical, theoretical and methodological knowledge than everyday reasoning, in addition to empirical and commonsense knowledge. Unlike the simple search spaces dealt with in everyday reasoning, it also has to deal with a number of different search spaces at the same time if it is to result in discoveries, or even to make progress at all (see, e.g., Klahr, 1994; Kocabas, 1993). Our model operates only in the spaces of empirical hypothesis and theory formation, event prediction, problem formulation and theory revision.
Previous models of scientific discovery, such as those described by Langley, Simon, Bradshaw, and Zytkow (1987), have taken a similar stance on the creative process. However, most such work has focused on limited aspects of scientific reasoning, such as the discovery of laws or the formation of structural theories.
With BR-4, we have attempted to cover a broader range of the discovery process within a unified framework. We described how the system formulates new problems whenever new data reveals its current theory to be either inconsistent or incomplete. In handling problems of inconsistency, BR-4 relies on depth-first search guided by algebraic and domain heuristics to explore the space of values for quantum properties, resorting to the postulation of new particles only if its search fails.
In dealing with incompleteness, the model predicts new reactions that follow from the introduction of new particles and posits new quantum properties to explain why some of these reactions never occur. The introduction of new particles and new properties constitute important examples of theory formation.
Our system does not provide a detailed account of the historical record, but it does explain several impressive discoveries at a more abstract level, using simple mechanisms of a familiar kind. This limited success provides further evidence that at least some types of scientific creativity do not require any special processes, but can be explained as a straightforward extension of existing theories of human cognition.
4.2. Related Work on Scientific Discovery
Our computational model of discovery draws many of its ideas from earlier work in this area. BR-4 is a direct descendant of Zytkow and Simon's (1986) STAHL, which modeled a variety of qualitative discoveries in the history of chemistry. The detection of inconsistencies in reactions played a central role in this system, with one of its responses being the introduction of new elements like phlogiston, which served much the same role in early chemistry as the neutrino did in particle physics.
Rose and Langley (1986) described STAHLp, a rational reconstruction of the earlier system that showed all of its discoveries could be explained in terms of inconsistencies and their resolution. In addition, they used the system to model a number of other reaction-oriented discoveries from the history of science. Moreover, their approach showed that dependency-directed reasoning simplified the theory revision process, letting their STAHLp handle problems with a search-control scheme that relied on simple hill climbing.
The BR-3 system, presented by Kocabas (1991), extended this framework to include the detection of incomplete theories, and the postulation of new properties to explain the absence of reactions. Kocabas applied this idea to the history of particle physics, using it to explain both the origin of several quantum numbers and the particular values assigned to them by scientists. In related work (Kocabas, 1994), he described another system TREV which formulates new particles and new reactions, but this system does not integrate these functions in its discovery process. BR-3 was the immediate precursor of BR-4, differing mainly in that the former lacked the ability to postulate new particles and to predict new reactions.
Valdes-Perez (in press) has described an alternative approach to discovery in particle physics, which he has implemented in the PAULI system. This scheme uses a variation on linear programming to search the space of property values, subject to constraints that reflect observed and unobserved reactions. Also, Fischer and Zytkow (1992) have reported on GELL-MANN, a system designed to explain the formation of the quark theory, which also carries out a search through a space of parameter values subject to constraints.
A more general framework, proposed by Valdes-Perez, Simon, and Zytkow (1993), views the process of formulating structural models in terms of matrix operations. They show how many existing systems, including those described above, can be viewed in this light, with the basic operations involving the extension of a matrix along one or more dimensions and the revision of entries in the cells of the matrix. Our own BR-4 system also fits well into this framework, as suggested by our use of Valdes-Perez et al.'s terminology in Section 2.
Other research on theory revision seems less closely related. Rajamoney's (1990) COAST system designs experiments to distinguish between alternative structural models in physics, and Karp's (1990) HypGene uses a similar idea for biological theories. Kulkarni and Simon (1990) describe KEKADA, a computational model that integrates theory revision, experiment design, and problem formulation to model Krebs' discovery of the urea cycle. Shrager and Langley (1990) consider the relations among these systems in more detail.
4.3. Directions for Future Work
Although BR-4 provides an abstract account of some important developments in the history of particle physics, there remains considerable room for extensions to the model. One direction for improvement involves the notion of explanation. In some sense, the current system formulates explanations when it finds that a newly observed reaction is consistent with the existing theory or when it proposes a new property that rules out an unobserved reaction. However, BR-4 does not generate an explicit proof or other structure that connects assumptions and observations. In future work, we plan to model the explanatory process in more detail, with the system deducing the presence or absence of specific reactions from declarative statements of quantum properties and conservation laws. In turn, this may let us recast BR-4's operators in terms of an abduction process (Ng & Mooney, 1990; O'Rorke et al., 1990) that modifies assumptions to explain known phenomena.
We also hope to extend the system to handle the introduction of componential models, which describe particles at one level as combinations of more primitive particles. Langley et al.'s (1987) DALTON took some initial steps along these lines to explain the relations between chemical molecules and elements, but we believe that we can adapt BR-4 to explain the origins of the quark theory and its alternatives. The basic task here involves explaining why elementary particles with some quantum properties exist and others do not. The constraints of consistency and completeness, which play such a central role in BR-4, seem well suited for this problem, which involves postulating new component particles (quarks), then searching the space of quantum values and their compositions that satisfy certain constraints (e.g., symmetry) for known particles and violate these constraints for nonexistent ones.
Finally, like most other models of scientific discovery, BR-4 ignores the interactions that occur among different researchers. Scientists cooperate along some dimensions, with theorists passing on predictions to experimentalists, who in turn report their observations to theorists. They also compete in developing theories to explain new findings, in discovering evidence for predicted events, and by noting errors in others' reasoning. The history of particle physics is rich in examples of such interactions, and we believe that some revisions to BR-4 will let us model some of them. In particular, we plan to assign different facets of the system's domain knowledge to different agents, which would communicate through a common representation; we will also let different agents explore different branches when search suggests alternative solutions.
5. Concluding Remarks
In this paper we presented BR-4, an abstract computational model of scientific discovery. We examined the system's behavior on three problems from particle physics, showing that it can replicate, though in a schematic way, important steps in the historical development of this field, some of which were considered major discoveries when first introduced. In particular, BR-4 proposes the existence of the neutrino to avoid violating conservation of spin, it invents baryon and lepton numbers to explain the absence of reactions involving proton decay, and it postulates electron and muon numbers to rule out unobserved neutrino reactions. In addition, the system can determine appropriate quantum values for each particle, and it can predict the reactions implied by a set of particles and quantum properties.
The BR-4 model accomplishes these feats using simple processes that play a central role in many aspects of human cognition. The system employs four basic operators for determining property values, creating new properties, positing new particles, and predicting reactions. Moreover, it uses consistency and completeness constraints to selectively apply these operators, and it incorporates a depth-first control scheme to carry out search when necessary. The simplicity of these mechanisms, and their similarity to other processes observed in human behavior, suggest that one can explain some aspects of scientific creativity in similar terms.
References
Fischer, P., & Zytkow, J. M. (1992). Incremental generation and exploration of hidden structure. Proceedings of the ML92 Workshop on Machine Discovery.
Karp, P. (1990). Hypothesis formation as design. In Shrager, J., and Langley P., eds.,
Klahr, D. (1994). Extended abstract: Children, adults and machines as discovery systems. Machine Learning, 14, 313-320.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S. (1992). Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML92 Workshop on Machine Discovery. Aberdeen, Scotland. (pp. 63-70).
Kocabas, S. (1993). Elements of Scientific Creativity. In Technical Report: Artificial Intelligence and Creativity. AAAI Press, pp. 39-45.
Kocabas, S. (1994). Goal directed discovery and explanation in particle physics. In Working Notes: Goal Driven Learning, AAAI Spring Symposium Series. (pp. 54-61).
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science,
Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.
Ng, ?. and Mooney, ? (199?). ...
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abstraction. In Shrager, J., and Langley P. eds. Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.
Shen, W. M., & Simon, H. A. (1989). Rule creation and rule learning through environmental exploration. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. ***--***). Detroit, MI: Morgan Kaufmann.
Shrager, J. and Langley, P. Computational approaches to scientific discovery. In Shrager, J., and Langley P., eds.,
Tweney, R.D. (1990). In Shrager, J. and Langley, P. (eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. E. (in press). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning.
Valdes-Perez, R. E., Zytkow, J. M., & Simon, H. A. (1993). Scientific model building as search in matrix spaces. Proceedings of the Eleventh National Conference on Artificial Intelligence (pp. 472-478). Washington, DC: AAAI Press.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Goal Directed Discovery and Explanation in Particle Physics
Sakir Kocabas
Department of Artificial Intelligence
Tubitak - MRC, PK 21 Gebze, Turkey
Abstract:
This paper describes a goal directed discovery system, TREV, which models the discovery of certain quantum properties and conservation laws by physicists between 1920 and 1960. The program is directed by completeness and consistency constraints, and is capable of explaining its knowledge state in terms of these constraints. TREV can formulate new elementary particles and particle reactions, and propose observations to test their existence. According to the results of such observations, the program can revise its knowledge base (e.g., its hypotheses about the particles) until it achieves a consistent and complete theory of its domain.
1. Introduction
Goal directed discovery has been the focus of attention of several researchers in the last ten years, and a number of computational models with different capabilities have been developed. Among these systems, BACON (Langley, Simon, Bradshaw & Zytkow, 1987) has the capabilities of data collection, quantitative reasoning and hypothesis formation; IDS (Nordhausen & Langley, 1993) and FAHRENHEIT (Zytkow, 1987) have the features of data collection, qualitative and quantitative reasoning, and hypothesis formation; GLAUBER (Langley et al., 1987), concept formation and the discovery of qualitative laws; STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986) and REVOLVER (Rose & Langley, 1986), concept formation (i.e., the componential models of chemical substances or quark compositions of elementary particles) and theory revision; MECHEM (Valdes-Perez, 1992), discovery of reaction pathways; AbE (O'Rorke, Morris & Schulenburg, 1990), theory formation, explanation and theory revision by using qualitative schemas; GALILEO (Zytkow, 1990), theory formation; KEKADA (Kulkarni & Simon, 1988), goal selection, hypothesis formation, experiment design, and expectation setting; COAST (Rajamoney, 1990) and ECHO (Thagard & Nowak, 1990), theory formation, theory revision and paradigm shifts by qualitative models; and BR-3 (Kocabas, 1991), theory formation and theory revision.
The subject of this paper is a goal directed discovery model, TREV, with the capabilities of theory formation, experiment design, data acquisition, explanation, and theory revision. Before we describe the system and its behavior, it is appropriate to present some background information about its task domain, particle physics.
1.1 The Domain of Particle Physics
Until the last decade of the 19th century, material substances were thought to consist of indivisible atoms. Towards the end of that century, experiments with cathode ray tubes revealed the first elementary particle (the electron), which was to be identified as one of the basic components of an atom. Early in the 20th century, other elementary particles, the proton and the neutron, were discovered. Later, observations on cosmic rays revealed a number of other particles such as the muon, pion, kaon, the neutrinos and the lambda particles. There are now well over a hundred elementary particles known, some of which are listed with their quantum properties in Table 1. Most of these particles are unstable, and quickly decay into a series of lighter and more stable particles such as the electron and neutrino, and into gamma rays. For example, a neutron decays to produce a proton, an electron and an antineutrino; and a pion decays into an antimuon and a neutrino:
n --> p + e + /nu
pi --> /mu + nu.
Particles also interact with one another under natural and experimental conditions, producing other elementary particles or gamma radiation. These reactions are called "particle transmutations". An example of such interactions is the high-energy electron-proton collision, which produces a neutron and a neutrino:
e + p --> n + nu.
The theoretical possibility of such particle reactions depends on a series of quantum conservation laws. According to these laws, quantum properties such as spin, lepton number, electrical charge, baryon number, strangeness, energy, and momentum are conserved in particle decays and collisions. However, some quantum properties may not be conserved in certain particle reactions (e.g., the strangeness property is not conserved in weak interactions).
Table 1. Some elementary particles and their quantum properties. With the exception of gamma, each particle has an antiparticle with opposite quantum values. The antiparticles are indicated with a '/' in the text (e.g. as in /n for anti-neutron).
----------------------------------------------------------------
          electrical   lepton   baryon   spin   strangeness
          charge       number   number
----------------------------------------------------------------
gamma      0            0        0       1       0
nu         0            1        0       1/2     0
mu        -1            1        0       1/2     0
tau       -1            1        0       1/2     0
e         -1            1        0       1/2     0
pi         1            0        0       0       0
pi0        0            0        0       0       0
k          1            0        0       0       1
k0         0            0        0       0       1
p          1            0        1       1/2     0
n          0            0        1       1/2     0
----------------------------------------------------------------
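The conservation check described above can be sketched in a few lines of Python. This is an illustration only, not part of TREV: the dictionary `PARTICLES` and the helper names are assumptions made here, and the sketch is restricted to the three additive properties electrical charge, lepton number and baryon number. The values come from Table 1, and a particle written with a leading '/' is taken to have the opposite values.

```python
# Illustrative sketch; all names are hypothetical, values are from Table 1.
# Each vector is (electrical charge, lepton number, baryon number).
PARTICLES = {
    "gamma": (0, 0, 0),
    "nu": (0, 1, 0),
    "mu": (-1, 1, 0),
    "e": (-1, 1, 0),
    "pi": (1, 0, 0),
    "pi0": (0, 0, 0),
    "p": (1, 0, 1),
    "n": (0, 0, 1),
}

def values(name):
    """Return the quantum value vector; '/x' denotes the antiparticle of x."""
    if name.startswith("/"):
        return tuple(-v for v in PARTICLES[name[1:]])
    return PARTICLES[name]

def total(names):
    """Add the quantum value vectors of a list of particles, slot by slot."""
    return tuple(sum(column) for column in zip(*(values(x) for x in names)))

def conserves(reactants, products):
    """True if every property has the same total on both sides."""
    return total(reactants) == total(products)

print(conserves(["n"], ["p", "e", "/nu"]))   # True
print(conserves(["pi"], ["/mu", "nu"]))      # True
print(conserves(["e", "p"], ["n", "nu"]))    # True
print(conserves(["n"], ["p", "e"]))          # False (lepton number differs)
```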
1.2 Theory Development in Particle Physics
The earliest known laws about elementary particle reactions were the energy and charge conservation laws. The law of the conservation of charge can be stated as follows: The sum of the charges of the initial particles entering a reaction is equal to the sum of the charges of the final particles. The following reactions conserve electrical charge and have been "observed" by physicists:
p + p --> p + n + pi
pi0 --> gamma + gamma
where p, n, pi, pi0, and gamma designate the proton, neutron, pion, pion-zero and gamma particles respectively. It has been known since early this century that the proton and electron have opposite and unit electrical charges. The neutron has been known to be unstable, decaying into a proton, an electron, and an antineutrino in what is called "beta decay", or
n --> p + e + /nu
but a proton decay has never been observed, and the stability of this particle puzzled physicists. Why does it not decay into lighter particles? Reactions such as
p --> pi + pi0
p --> /e + gamma
never happen despite the fact that they apparently obey the charge conservation law. A theoretical framework based only on the charge conservation law would not be capable of explaining the absence of these reactions. In other words, such a theory would be incomplete concerning particle reactions.
The discrepancy between the theoretically valid and physically observable reactions was a conflict that had to be resolved. Physicists resolved these conflicts by postulating new quantum properties and conservation laws, so that theoretically valid but physically unobservable reactions were rendered theoretically invalid by these laws (see Omnes, 1970; Griffiths, 1987). In this way the absence of these reactions was explained by their violation of the conservation of the new quantum property. The next problem was to find the quantum value distribution of the new property over the elementary particles.
To illustrate how such conflicts were resolved, let us consider a reaction which conserves electrical charge but has not been observed
p --> pi + pi0.
Let us assume that this reaction violates the conservation of a new property (e.g., the "protonic charge"). Now, if we arbitrarily assign the new charge value to the proton as one and assume that the other particles, pi and pi0, do not have this charge (i.e., they both have zero protonic charge), then the reaction would be unbalanced by the new charge (i.e., 1 =/= 0 + 0). This would explain why the reaction had never been observed. Nevertheless, the value set [1,0,0] is not the only one that makes the reaction unbalanced, as the values [0,1,1], [0,1,0], [0,0,1] and [1,1,1] produce the same effect.
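The value sets mentioned above can be reproduced by a small enumeration. The sketch below is only an illustration: restricting the candidate "protonic charge" values to 0 and 1 is an assumption that mirrors the value sets listed in the text, and the variable names are hypothetical.

```python
from itertools import product

# Try every (p, pi, pi0) assignment with values 0 or 1 and keep those for
# which the reaction p --> pi + pi0 is NOT balanced by the new property.
unbalanced = [
    (p, pi, pi0)
    for p, pi, pi0 in product((0, 1), repeat=3)
    if p != pi + pi0
]

print(unbalanced)
# [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
```

The five surviving triples are the set [1,0,0] together with the four alternatives named in the text.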
On the other hand, the new quantum values make some observed reactions unbalanced, as in
p + p --> p + n + pi
p + /pi --> n + pi0
These reactions conserve electrical charge, but not the "known" values of the new charge. This can be seen by substituting the protonic charge values:
1 + 1 = 1 + n + 0
1 + /pi = n + 0
This suggests that some of the other particles in these reactions must have nonzero protonic charge. Here, if we assign the protonic charge value of one to the neutron and zero to /pi, the reactions would be balanced. However, other valid and observed reactions may conflict with the assigned values, and we may have to revise some of the assumptions about the protonic charge values of particles accordingly.
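As a worked check of this step, the two balance equations can be solved by enumeration. The sketch assumes that the unknown values for n and /pi are drawn from -1, 0 and 1, and it keeps the values already assigned to p, pi and pi0; the names are illustrative only.

```python
from itertools import product

P = 1      # protonic charge already assigned to the proton
PI = 0     # ... to pi
PI0 = 0    # ... to pi0

solutions = [
    (n, anti_pi)
    for n, anti_pi in product((-1, 0, 1), repeat=2)
    if P + P == P + n + PI           # p + p   --> p + n + pi
    and P + anti_pi == n + PI0       # p + /pi --> n + pi0
]

print(solutions)  # [(1, 0)]
```

Under these assumptions the only assignment that balances both observed reactions is n = 1 and /pi = 0, as stated above.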
TREV, like its predecessor BR-3 (Kocabas, 1991), rediscovers the quantum properties in the same way as explained above. As the program's goal is to achieve a consistent and complete knowledge state, it postulates new hypotheses, and revises its domain knowledge until it achieves its goal state. In this way TREV models the discoveries of the lepton, baryon, electron, and muon number properties in particle physics. Apart from its theory formation and theory revision capabilities, the program also has the ability to propose experiments and to provide explanations for its assumptions about its domain objects.
In the remaining part of this paper we first present an overview of the system, and describe its behaviour in modeling the discoveries of the quantum properties, in proposing experiments, and in providing explanations. This is followed by a comparative discussion on the system's research goals, knowledge representation, theory revision and search methods, and its generality. The paper concludes with a summary of the results.
2. The System's Knowledge Representation and Behavior
The program uses a structured knowledge representation similar to the qualitative schemas of AbE (O'Rorke et al., 1990) and other recent discovery models. This structured representation facilitates the system's identification of problem states such as incompleteness and inconsistency. Therefore we begin by describing the knowledge representation methods of TREV in some detail.
2.1 Knowledge Representation
TREV's knowledge organization distinguishes descriptive and prescriptive knowledge. The former type of knowledge is represented as frames, and the latter as a series of operators and functions. The program has nine operators which are named as follows: 'evaluate', 'check-consistency', 'check-completeness', 'postulate-properties', 'revise-hypotheses', 'find-quantum-values', 'formulate-new-particles', 'formulate-virtual-particles', and 'formulate-reactions'. The program also has a similarity based learning (SBL) module.
The main data items of TREV are elementary particles and their reactions. Both are represented as frames in the system's knowledge base. Particle frames include the name of the particle, the quantum properties and their values. The general form of a particle frame is as follows:
frame: P
class = particle
q1 = v1
q2 = v2
...
qn = vn.
where P is the name of the particle, q1,...,qn the quantum properties, and v1,...,vn the corresponding quantum values, which can be -1, 0, or 1.
Particle reactions are represented in a similar way, this time containing information about the reactions, such as the particles involved, the reaction conditions, the physical status of the reaction, and its validity under the current theory. The general form of a particle reaction frame is as follows:
frame: reaction
class = physical event
actual status = A
logical status = L, logical-status(N,L)
reactants = R
products = P
active properties = Q, active-properties(N,Q)
reactants properties = Rp, reactants-properties(Q,Rp)
products properties = Pp, products-properties(Q,Pp)
conditions = (Rp = Pp) or (Rp =/= Pp)
where A indicates whether the reaction has been physically observed or unobserved, and L indicates whether the reaction is valid or invalid under the current theoretical knowledge of the system. R and P are the lists of the particles involved in the reaction as the reactants and the products respectively. Q indicates the vector of quantum properties that play an active role in the reaction, while Rp and Pp are the quantum value vectors of the reactants and the products. Normally, particle reactions are added to the program's knowledge base (e.g., for the reaction n --> p + e + /nu) as follows:
frame: r1
class = reaction
actual status = observed
reactants = [n]
products = [p, e, /nu].
Such input reaction frames are then transformed into the form below by inheritance from the parent frame:
frame: r1
class = reaction
actual status = observed
logical status = valid
reactants = [n]
products = [p, e, /nu]
active properties = [q0, q1]
reactants properties = [1, 0]
products properties = [1, 0]
conditions = {[1,0] = [1,0]}.
The amended slots are added after their values are calculated by the 'evaluate' operator.
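A minimal sketch of this transformation is given below. It is not TREV's implementation: the class `ReactionFrame`, the function `evaluate` and the slot names are hypothetical Python counterparts of the frames above, and the particle values are the charge, lepton number and baryon number values of Table 1, so the computed vectors differ from the two-property vectors shown for r1.

```python
from dataclasses import dataclass, field

# Hypothetical particle frames: q1 = charge, q2 = lepton, q3 = baryon number.
PARTICLES = {
    "n": {"q1": 0, "q2": 0, "q3": 1},
    "p": {"q1": 1, "q2": 0, "q3": 1},
    "e": {"q1": -1, "q2": 1, "q3": 0},
    "nu": {"q1": 0, "q2": 1, "q3": 0},
}

@dataclass
class ReactionFrame:
    name: str
    actual_status: str                  # "observed" or "unobserved"
    reactants: list
    products: list
    logical_status: str = ""            # filled in by evaluate()
    active_properties: list = field(default_factory=list)
    reactants_properties: list = field(default_factory=list)
    products_properties: list = field(default_factory=list)

def value(name, prop):
    """Value of one property; '/x' is read as the antiparticle of x."""
    sign = -1 if name.startswith("/") else 1
    return sign * PARTICLES[name.lstrip("/")][prop]

def evaluate(frame, properties=("q1", "q2", "q3")):
    """Fill the derived slots of a reaction frame from the particle frames."""
    frame.active_properties = list(properties)
    frame.reactants_properties = [
        sum(value(x, q) for x in frame.reactants) for q in properties]
    frame.products_properties = [
        sum(value(x, q) for x in frame.products) for q in properties]
    balanced = frame.reactants_properties == frame.products_properties
    frame.logical_status = "valid" if balanced else "invalid"
    return frame

r1 = evaluate(ReactionFrame("r1", "observed", ["n"], ["p", "e", "/nu"]))
print(r1.logical_status, r1.reactants_properties, r1.products_properties)
# valid [0, 0, 1] [0, 0, 1]
```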
TREV has two operators, 'check-consistency' and 'check-completeness', which can identify the problem states (inconsistency and incompleteness) about reactions.
The 'check-consistency' operator can decide whether the information in a reaction frame is consistent or inconsistent with the system's knowledge, by the following rules:
If R is a reaction,
and its actual status is observed,
and its logical status is valid,
then R is consistent with the system's knowledge base.
If R is a reaction,
and its actual status is observed,
and its logical status is invalid,
then R is inconsistent with the system's knowledge base.
The 'check-completeness' operator, on the other hand, decides whether a reaction is explainable within the system's current knowledge, i.e., why the reaction is physically observable or unobservable. In other words, the program can decide whether its knowledge concerning a particle reaction is complete or incomplete. The completeness rules are as follows:
If R is a reaction,
and its actual status is unobserved,
and its logical status is invalid,
then the system's knowledge base is complete regarding R.
If R is a reaction,
and its actual status is unobserved,
and its logical status is valid,
then the system's knowledge base is incomplete regarding R.
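The four rules above translate directly into two small functions. The sketch below is a hypothetical Python rendering, not the operators' actual code; returning `None` when no rule applies is an assumption made here.

```python
def check_consistency(actual_status, logical_status):
    """Apply the two consistency rules; they only cover observed reactions."""
    if actual_status == "observed":
        return "consistent" if logical_status == "valid" else "inconsistent"
    return None  # assumption: the rules say nothing about unobserved reactions

def check_completeness(actual_status, logical_status):
    """Apply the two completeness rules; they only cover unobserved reactions."""
    if actual_status == "unobserved":
        return "complete" if logical_status == "invalid" else "incomplete"
    return None  # assumption: the rules say nothing about observed reactions

print(check_consistency("observed", "invalid"))    # inconsistent
print(check_completeness("unobserved", "valid"))   # incomplete
```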
The program checks its knowledge about reactions for consistency and completeness every time it is presented with a new set of data, and tries to achieve a consistent and complete knowledge state. For this, TREV uses the control structure employed by its predecessor, BR-3 (Kocabas, 1991). Figure 1 summarizes the system's control structure. Accordingly, TREV first checks for consistency by using the above rules over its reaction frames, and reports inconsistent reactions to a message list. Inconsistent reactions are observed reactions that do not conserve a certain quantum property in the program's knowledge base.
An inconsistency report in the message list activates the 'revise-hypotheses' operator. This operator modifies the system's knowledge about the particles' quantum property values by first turning the inconsistent reactions into algebraic equations and finding sets of alternative quantum values for the particles appearing in these reactions. Since there are only three possible quantum values, namely -1, 0 and 1, modifications alternate between these values. Each value set is tried until the consistency constraints are satisfied.
On the other hand, if consistency has been achieved but TREV cannot explain why a certain unobserved particle reaction is impossible, the program posts an incompleteness message to the message list. This, in turn, activates the 'postulate-properties' operator, which postulates a new quantum property. The program adds the new quantum property to a new slot in the particle frames with the default value of zero.
The 'find-quantum-values' operator turns the unobserved reaction formula into an algebraic inequality, and finds a set of quantum values for the particles in the formula. For example, for the unobserved reaction p --> /e + gamma, the inequalities
0 =/= 0 + 1
0 =/= -1 + 0
1 =/= 0 + 0
1 =/= -1 + 0
1 =/= -1 + 1
are generated by the program. Each of these inequalities represents a set of quantum values for the new property that enables TREV to explain the absence of the reaction. The first quantum value set (p=0, /e=0, gamma=1) is assigned to the particles first. However, the new values must be consistent with the system's knowledge of elementary particles and their observed reactions. To secure this, the quantum values for the new property are assigned to other particles, such that its conservation is satisfied in the observed reactions. The 'check-consistency' operator checks whether the new values are consistent, and the 'revise-hypotheses' operator revises them as necessary. This cycle continues until the system achieves a consistent and complete knowledge state.
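The five inequalities listed above can be regenerated by enumeration. In the sketch below, the candidate values (0 or 1 for p and gamma, 0 or -1 for /e) are an assumption chosen because they happen to reproduce the listed inequalities in the same order; the paper does not state how the operator restricts its candidates.

```python
from itertools import product

# Assumed candidate values for the new property:
#   p in (0, 1), /e in (0, -1), gamma in (0, 1).
# Keep the value sets for which p --> /e + gamma is NOT balanced.
value_sets = [
    (p, anti_e, gamma)
    for p, anti_e, gamma in product((0, 1), (0, -1), (0, 1))
    if p != anti_e + gamma
]

for p, anti_e, gamma in value_sets:
    print(f"{p} =/= {anti_e} + {gamma}")
# 0 =/= 0 + 1
# 0 =/= -1 + 0
# 1 =/= 0 + 0
# 1 =/= -1 + 0
# 1 =/= -1 + 1
```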
inconsistent        revise              check
knowledge     --->  hypotheses    --->  consistency
state                                   and completeness

incomplete          postulate           find           check
knowledge     --->  new           --->  quantum  --->  consistency
state               properties          values

consistent and
complete      --->  stop.
knowledge state

Figure 1. TREV's general control structure in the discovery of quantum properties.
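The goal state of this control structure can be illustrated with a brute-force search. The sketch below does not reproduce TREV's operators: it replaces the revise/postulate/find cycle with plain exhaustive enumeration over the values -1, 0 and 1, and simply looks for an assignment of a new property that balances every observed reaction (consistency) and unbalances every unobserved one (completeness). The reaction lists are taken from the examples in this paper; all names are hypothetical.

```python
from itertools import product

BASE = ["p", "n", "pi", "pi0", "gamma", "e", "nu"]
OBSERVED = [
    (["p", "p"], ["p", "n", "pi"]),
    (["p", "/pi"], ["n", "pi0"]),
    (["pi0"], ["gamma", "gamma"]),
    (["n"], ["p", "e", "/nu"]),
]
UNOBSERVED = [
    (["p"], ["pi", "pi0"]),
    (["p"], ["/e", "gamma"]),
]

def side_total(names, assignment):
    """Sum the new property over one side; '/x' takes the opposite value of x."""
    return sum(-assignment[x[1:]] if x.startswith("/") else assignment[x]
               for x in names)

def balanced(reaction, assignment):
    reactants, products = reaction
    return side_total(reactants, assignment) == side_total(products, assignment)

def find_assignment():
    """Return the first consistent and complete assignment, or None."""
    for vals in product((0, 1, -1), repeat=len(BASE)):
        assignment = dict(zip(BASE, vals))
        consistent = all(balanced(r, assignment) for r in OBSERVED)
        complete = not any(balanced(r, assignment) for r in UNOBSERVED)
        if consistent and complete:
            return assignment
    return None

new_property = find_assignment()
print(new_property)
```

Several assignments satisfy both constraints for such a small reaction set, so the first one found need not coincide with the baryon number; adding further observed and unobserved reactions narrows the alternatives.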
2.2 Formulation of New Particles
The program can define new particles by modifying the values of the quantum property slots of existing particle frames. For example, from the neutron's frame
frame: n (neutron)
class = particle
q1 = 0 (electrical charge)
q2 = 0 (lepton number)
q3 = 1 (baryon number)
a new particle can be defined by changing the q1 value to -1 to obtain the particle
frame: p1 (proposed particle)
class = proposed particle
q1 = -1 (electrical charge)
q2 = 0 (lepton number)
q3 = 1 (baryon number)
which, incidentally, corresponds to the anti-proton. The program proposes observations to check whether such postulated particles exist in nature. The important point about this exercise is that certain quantum property combinations never occur (e.g., particles having nonzero baryon and lepton numbers at the same time). In fact, this observation led to the development of the quark theory in particle physics in the 1960s.
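One simple way to generate such proposals is to vary a single slot of an existing frame. The sketch below is an assumption about how this could be done, not a description of the 'formulate-new-particles' operator; the function name and the one-slot-at-a-time restriction are illustrative.

```python
def single_slot_variants(frame, values=(-1, 0, 1)):
    """Return copies of a particle frame that differ from it in exactly one slot."""
    variants = []
    for slot, current in frame.items():
        for v in values:
            if v != current:
                variant = dict(frame)   # copy, so the original frame is untouched
                variant[slot] = v
                variants.append(variant)
    return variants

neutron = {"q1": 0, "q2": 0, "q3": 1}   # charge, lepton number, baryon number
proposed = single_slot_variants(neutron)

print(len(proposed))                               # 6
print({"q1": -1, "q2": 0, "q3": 1} in proposed)    # True
```

Each proposed frame would then be checked against observation, as described in the text.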
After observations, if it is decided that the proposed particle does not exist in nature, it is recorded as a nonexistent particle, e.g., as
frame: np1
class = nonexistent particle
q1 = v1
q2 = v2
q3 = v3
From its accumulated knowledge about existing elementary particles, TREV can construct hypotheses about the nonexistence of certain quantum value combinations, by an inductive method called exclusion based learning (Kocabas, 1989). These hypotheses state that particles with certain quantum property value combinations cannot exist. TREV can modify its exclusion hypotheses in view of new knowledge about elementary particles. As soon as a new particle frame is created, the program checks its exclusion hypotheses to decide whether the quantum values of the particle contradict a hypothesis. If they do, that hypothesis is removed.
The exclusion hypotheses are added to the system's knowledge base as frames:
frame: ep1
class = excluded q-composition
q1 = v1
q2 = v2
q3 = #
which means that the quantum values v1 and v2 for the properties q1 and q2, respectively, cannot be possessed together by an elementary particle.
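The check of a new particle against the exclusion hypotheses can be sketched as follows. The code reads '#' as "any value", which is an interpretation of the frame above, and the example hypotheses and particle are hypothetical values chosen only to show the mechanism.

```python
WILDCARD = "#"   # assumed meaning: the slot may hold any value

def matches(hypothesis, particle):
    """True if the particle has the value combination the hypothesis excludes."""
    return all(v == WILDCARD or particle[q] == v
               for q, v in hypothesis.items())

def prune(hypotheses, new_particle):
    """Drop every exclusion hypothesis contradicted by a newly accepted particle."""
    return [h for h in hypotheses if not matches(h, new_particle)]

hypotheses = [
    {"q1": 1, "q2": 1, "q3": "#"},
    {"q1": "#", "q2": 1, "q3": 1},
]
new_particle = {"q1": 1, "q2": 1, "q3": 0}   # hypothetical observation

print(prune(hypotheses, new_particle))
# [{'q1': '#', 'q2': 1, 'q3': 1}]
```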
2.3 Formulation of Virtual Particles and New Reactions
The program formulates particle decays and collisions by first defining a set of 'virtual' particles. These are formulated simply by adding the vectors of quantum property values of two or three particles. An example of such virtual particles is the one that is formulated by adding the quantum values of the proton [1,0,1] and electron [-1,1,0], resulting in a proton-electron virtual particle with the quantum values of [0,1,1].
proton    electron   (proton-electron)
[1,0,1] + [-1,1,0] = [0,1,1]
In this way, a virtual particle with zero electrical charge, and with lepton and baryon numbers of 1 is defined. Such virtual particles are used in constructing particle decay and collision reactions. One such possible construction can be a neutron decay:
n --> p + e
which, incidentally, is not a valid reaction, because it does not conserve the lepton number: the quantum value vectors of the reactant and the products are not equal, i.e., [0,0,1] =/= [0,1,1]. On the other hand, the reaction obtained by using the neutron and the virtual particle proton-electron-antineutrino (p,e,/nu),
n --> p + e + /nu
is a valid and observed reaction, as it conserves all three quantum properties (electrical charge, lepton number and baryon number), with the quantum value vectors of both sides being equal, i.e., [0,0,1] = [0,0,1].
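The construction of virtual particles and candidate decays can be sketched as follows. The particle set, the use of combinations of distinct particles, and the function names are assumptions made for this illustration; the vectors are (electrical charge, lepton number, baryon number), with antiparticles entered explicitly.

```python
from itertools import combinations

PARTICLES = {
    "p": (1, 0, 1),
    "n": (0, 0, 1),
    "e": (-1, 1, 0),
    "nu": (0, 1, 0),
    "/nu": (0, -1, 0),
    "pi": (1, 0, 0),
    "/pi": (-1, 0, 0),
}

def virtual(names):
    """Quantum value vector of the virtual particle formed from the given particles."""
    return tuple(sum(column) for column in zip(*(PARTICLES[x] for x in names)))

def candidate_decays(parent):
    """Two- and three-particle combinations whose vector equals the parent's."""
    others = [x for x in PARTICLES if x != parent]
    return [
        combo
        for size in (2, 3)
        for combo in combinations(others, size)
        if virtual(combo) == PARTICLES[parent]
    ]

print(virtual(("p", "e")))      # (0, 1, 1)
print(candidate_decays("n"))    # [('p', '/pi'), ('p', 'e', '/nu')]
```

A candidate such as n --> p + /pi balances these three properties; whether it is actually observed is a separate question, which is the kind of test the program proposes.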
Testing the reactions proposed by TREV may lead to the discovery of new quantum properties. If a proposed reaction is valid by the program's knowledge of quantum values, but cannot be observed, then this creates an incompleteness problem for the program. As has been described above, in such cases TREV postulates a new quantum property and tries to find a consistent and complete set of values for particles regarding the new property.
2.4 TREV's Methods of Explanation
The program uses its structured knowledge representation for producing explanations about the objects and events of its domain. Explanations are provided when the system is in a consistent and complete knowledge state.
The program can explain why a certain proposed particle reaction is consistent or inconsistent with the system's knowledge about particle physics. In this type of explanation, TREV applies the definition of consistency to the reaction in question.
The consistency (or validity) of a certain proposed reaction is explained by proving that the reaction conserves the quantum values that the program knows. If the reaction does not conserve these quantum values, then it is inconsistent (or invalid). Consistency (or validity) of a reaction can easily be decided by checking its 'logical status' slot, or by calculating the quantum value vectors of the reactant and the resultant particles and comparing them. For example, the reaction n --> p + e + /nu is consistent because the 'actual status' slot of the reaction's frame says that the reaction has been observed, and the 'logical status' slot says it is valid. If the reaction frame does not have such a slot, then the 'check-validity' operator fires, which in turn finds whether the reaction conserves the known quantum properties.
TREV can explain why a certain reaction is not observable by proving that it violates the conservation of a quantum property that it knows. Also, by using its completeness constraints, the program can explain why the impossibility of a certain unobserved reaction is or is not explainable within the program's domain theory. When the program cannot explain the absence of such a reaction by its domain theory, then it concludes that its knowledge about elementary particles is incomplete concerning the unobserved reaction. As has been described above, TREV resolves such problem states by postulating a new quantum property.
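An explanation of this kind can be generated by naming the properties whose totals differ on the two sides of a reaction. The sketch below is illustrative only: the function `explain`, the wording of its output, and the small particle table are assumptions, not TREV's explanation functions.

```python
PROPERTIES = ("electrical charge", "lepton number", "baryon number")
PARTICLES = {
    "p": (1, 0, 1), "n": (0, 0, 1), "e": (-1, 1, 0),
    "pi": (1, 0, 0), "pi0": (0, 0, 0),
}

def totals(names):
    """Total of each property over a list of particles."""
    return [sum(PARTICLES[x][i] for x in names) for i in range(len(PROPERTIES))]

def explain(reactants, products):
    """Say whether a reaction is valid and, if not, which properties it violates."""
    formula = " + ".join(reactants) + " --> " + " + ".join(products)
    broken = [q for q, lhs, rhs
              in zip(PROPERTIES, totals(reactants), totals(products))
              if lhs != rhs]
    if broken:
        return f"{formula} is invalid: it does not conserve {', '.join(broken)}"
    return f"{formula} is valid: it conserves all known quantum properties"

print(explain(["p"], ["pi", "pi0"]))
# p --> pi + pi0 is invalid: it does not conserve baryon number
print(explain(["n"], ["p", "e"]))
# n --> p + e is invalid: it does not conserve lepton number
```

If no property is violated and the reaction is nevertheless unobserved, the absence cannot be explained, which corresponds to the incomplete knowledge state described above.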
On the other hand, the program can also explain why there can be no particles with a certain set of quantum properties, by using its exclusion hypotheses for such explanations. For example, the exclusion hypothesis
frame: ep1
class = excluded q-composition
q1 = 1
q2 = 1
q3 = #
explains why there cannot be a particle with the quantum values of q1=1, q2=1, and q3=0.
The system's explanatory power increases as it discovers new quantum properties, and as the particle descriptions become more detailed by including new quantum property slots and values.
TREV can learn to explain consistency and completeness by its similarity based learning (SBL) module. In learning a concept (e.g. 'consistent'), the SBL module compares the positive instances of the concept (i.e. valid and observed reactions), and creates the definition of the concept. The system's consistency and completeness rules are created in this way.
3. Discussion on the System's Methods
TREV is a system that combines several features of a discovery model. Every discovery system, by definition, must have the ability to learn. The program has three distinct types of learning ability, namely inductive learning, learning by observation, and learning by discovery. As described above, TREV learns its consistency and completeness constraints by similarity based learning, and its exclusion hypotheses by exclusion based learning. The program also constructs its domain theory with its ability to learn by observation and by discovery. The former involves the formulation of new particles and reactions, and their subsequent comparison with the physical world. The latter takes place by postulating new quantum properties and assigning a set of corresponding quantum values to the particles.
An important feature of a discovery model is theory development, which itself can be divided into two tasks, theory formation and theory revision. TREV extends its domain theory by using its learning and discovery abilities, by adding exclusion hypotheses, by formulating its consistency and completeness constraints, and by postulating new quantum properties when faced with an incomplete knowledge state. When it is faced with an inconsistent knowledge state, the program revises its domain knowledge (i.e., knowledge about particles and their reactions) by using its consistency constraints together with general algebraic constraints.
In its theory development and theory revision activities based on the consistency and completeness constraints, the program works in a coordinated way. However, the system's other task operators work independently and in an uncoordinated way. For example, the 'evaluate', 'formulate-new-particles', 'formulate-virtual-particles', and 'formulate-reactions' operators are fired by an external agent (e.g., a user) independently. Similarly, the explanation generating functions of the system are called on user demand and for specific purposes, such as explaining why a particular reaction is unobservable.
Also, the operators which formulate new particles and reactions are not constrained by domain dependent or general constraints. Hence, they operate in a relatively large search space. As a result, these operators can formulate uninteresting domain objects as well as interesting ones.
TREV's explanation functions take advantage of the system's structured knowledge representation. The explanations provided are simple, and do not go deeper into the system's domain theory. However, the program can be improved in this direction.
The program's ability to formulate new objects means that it can propose observations to decide whether the formulated objects (i.e., elementary particles and reactions) exist in nature. Observation results are entered by the 'user'. There are a few discovery models, such as IDS (Nordhausen & Langley, 1993) and FAHRENHEIT (Zytkow, 1987), that can directly receive data from their physical environment. However, the experimental setup is rather complex for any direct data acquisition in the domain of TREV.
The program has two types of theory revision capability. One is based on using the consistency constraints, and the other is theory revision by observational evidence.
Another shortcoming of the program is that the theory formation and revision operators are fired by a rule set whose conditions are determined by the message list. In other words, the control rules are hardwired, though an explanation based learning method could be used to learn such rules. We will address this problem in future versions of the program.
4. Conclusions
One important problem in artificial intelligence is building models that integrate different methods of representation and learning. We have described a discovery system, directed by completeness and consistency constraints, with the capabilities of theory formation and theory revision, and with the ability of explaining its knowledge state by its domain constraints. The system is capable of formulating new elementary particles and particle reactions, and proposing observations to test their existence. The program has a certain degree of integration in its representation, learning and discovery methods, which can be further improved.
References
Griffiths, D. (1987). Introduction to Elementary Particles. John Wiley and Sons, N.Y.
Kocabas, S. (1989). Scientific Explanation by Exclusion. In Proceedings of the 12th Congress on Cybernetics, Namur, Belgium.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6(3), 277-309.
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. (1987). Scientific discovery: Exploration of the creative processes. MIT Press.
Nordhausen, B. and Langley, P. (1993). An integrated framework for empirical discovery. Machine Learning, 12, 17-47.
Omnes, R. (1970). Introduction to Particle Physics. Tr. by G. Barton. Wiley Interscience, London.
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abstraction. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Thagard, P. and Nowak, G. (1990). The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. (1992). Theory driven discovery of reaction pathways in the MECHEM system. In Proceedings of the National Conference on Artificial Intelligence.
Zytkow, J.M. (1987). Combining many searches in the FAHRENHEIT discovery system. Proceedings of the Fourth International Workshop on Machine Learning, Morgan Kaufmann, 281-287, Los Altos, CA.
Zytkow, J.M. (1990). Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
IN PARTICLE PHYSICS
Sakir Kocabas
Department of Artificial Intelligence
Tubitak - MRC, PK 21 Gebze, Turkey
Abstract:
This paper describes a goal directed discovery system, TREV, which models the discovery of certain quantum properties and conservation laws by physicists between 1920 and 1960. The program is directed by completeness and consistency constraints, and has the capability of explaining its knowledge state by these constraints. TREV is capable of formulating new elementary particles and particle reactions, and proposing observations to test their existence. According to the results of such observations, the program can revise its knowledge base (e.g. its hypotheses about the particles), until it achieves a consistent and complete theory of its domain.
1. Introduction
Goal directed discovery has been the focus of attention by several researchers in the last ten years, and a number of computational models with different capabilities have been developed. Among these systems, BACON (Langley, Simon, Bradshaw & Zytkow, 1987) has the capabilities of data collection, quantitative reasoning and hypothesis formation; IDS (Nordhausen & Langley, 1993) and FAHRENHEIT (Zytkow, 1987) have the features of data collection, qualitative and quantitative reasoning, and hypothesis formation; GLAUBER (Langley et al., 1987) performs concept formation and the discovery of qualitative laws; STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986) and REVOLVER (Rose & Langley, 1986) perform concept formation (i.e., the componential models of chemical substances or quark compositions of elementary particles) and theory revision; MECHEM (Valdes-Perez, 1992) discovers reaction pathways; AbE (O'Rorke, Morris & Schulenburg, 1990) performs theory formation, explanation and theory revision by using qualitative schemas; GALILEO (Zytkow, 1990) performs theory formation; KEKADA (Kulkarni & Simon, 1988) performs goal selection, hypothesis formation, experiment design, and expectation setting; COAST (Rajamoney, 1990) and ECHO (Thagard & Nowak, 1990) perform theory formation, theory revision and paradigm shifts by qualitative models; and BR-3 (Kocabas, 1991) performs theory formation and theory revision.
The subject of this paper is a goal directed discovery model TREV, with the capabilities of theory formation, experiment design, data acquisition, explanation, and theory revision. Before we describe the system and its behavior, it is appropriate to present some background information about its task domain, particle physics.
1.1 The Domain of Particle Physics
Until the last decade of the 19th century, material substances were thought to consist of indivisible atoms. Towards the end of that century, experiments with cathode ray tubes revealed the first elementary particle (the electron), which was to be identified as one of the basic components of an atom. Early in the 20th century, other elementary particles, the proton and the neutron, were discovered. Later, observations on cosmic rays revealed a number of other particles such as the muon, pion, kaon, the neutrinos and the lambda particles. There are now well over a hundred elementary particles known, some of which are listed with their quantum properties in Table 1. Most of these particles are unstable, and quickly decay into a series of lighter and more stable particles such as the electron and neutrino, and into gamma rays. For example, a neutron decays to produce a proton, an electron and an antineutrino; and a pion decays into an antimuon and a neutrino:
n --> p + e + /nu
pi --> /mu + nu.
Particles also interact with one another under natural and experimental conditions, producing other elementary particles or gamma radiation. These reactions are called "particle transmutations". An example of such interactions is the high-energy electron-proton collision, which produces a neutron and a neutrino:
e + p --> n + nu.
The theoretical possibility of such particle reactions depends on a series of quantum conservation laws. According to these laws, quantum properties such as spin, lepton number, electrical charge, baryon number, strangeness, energy, and momentum are conserved in particle decays and collisions. However, some quantum properties may not be conserved in certain particle reactions (e.g., the strangeness property is not conserved in weak interactions).
Table 1. Some elementary particles and their quantum properties. With the exception of gamma, each particle has an antiparticle with opposite quantum values. The antiparticles are indicated with a '/' in the text (e.g. as in /n for anti-neutron).
----------------------------------------------------------------
particle  electrical  lepton  baryon  spin  strangeness
          charge      number  number
----------------------------------------------------------------
gamma          0        0       0     1         0
nu             0        1       0     1/2       0
mu            -1        1       0     1/2       0
tau           -1        1       0     1/2       0
e             -1        1       0     1/2       0
pi             1        0       0     0         0
pi0            0        0       0     0         0
k              1        0       0     0         1
k0             0        0       0     0         1
p              1        0       1     1/2       0
n              0        0       1     1/2       0
----------------------------------------------------------------
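The conservation laws can be checked mechanically against Table 1. The following is a minimal sketch, not part of TREV: the names TABLE_1, quantum_values and violated_properties are hypothetical, antiparticles are assumed to carry the negated values of their particles (as the table caption states), and spin is left out because it does not combine by simple addition.

```python
# Additive quantum numbers from Table 1, in the order
# (electrical charge, lepton number, baryon number, strangeness).
# Spin is omitted on purpose: it is not conserved by simple addition.
TABLE_1 = {
    "gamma": (0, 0, 0, 0),
    "nu":    (0, 1, 0, 0),
    "mu":    (-1, 1, 0, 0),
    "tau":   (-1, 1, 0, 0),
    "e":     (-1, 1, 0, 0),
    "pi":    (1, 0, 0, 0),
    "pi0":   (0, 0, 0, 0),
    "k":     (1, 0, 0, 1),
    "k0":    (0, 0, 0, 1),
    "p":     (1, 0, 1, 0),
    "n":     (0, 0, 1, 0),
}
PROPERTIES = ("charge", "lepton", "baryon", "strangeness")


def quantum_values(name):
    """Return a particle's value vector; a leading '/' marks an antiparticle."""
    if name.startswith("/"):
        return tuple(-v for v in TABLE_1[name[1:]])
    return TABLE_1[name]


def total(names):
    """Add the value vectors of all particles on one side of a reaction."""
    vectors = [quantum_values(n) for n in names]
    return tuple(sum(column) for column in zip(*vectors))


def violated_properties(reactants, products):
    """List the properties whose totals differ between the two sides."""
    lhs, rhs = total(reactants), total(products)
    return [prop for prop, a, b in zip(PROPERTIES, lhs, rhs) if a != b]


# The three reactions quoted in Section 1.1 conserve all four properties.
print(violated_properties(["n"], ["p", "e", "/nu"]))   # neutron decay
print(violated_properties(["pi"], ["/mu", "nu"]))      # pion decay
print(violated_properties(["e", "p"], ["n", "nu"]))    # electron-proton collision
```

An empty list means that the reaction is balanced for every property in the table.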
1.2 Theory Development in Particle Physics
The earliest known laws about elementary particle reactions were the energy and charge conservation laws. The law of the conservation of charge can be stated as follows: The sum of the charges of the initial particles entering a reaction is equal to the sum of the charges of the final particles. The following reactions conserve electrical charge and have been "observed" by physicists:
p + p --> p + n + pi
pi0 --> gamma + gamma
where p, n, pi, pi0, and gamma designate the proton, neutron, pion, pion-zero and gamma particles respectively. It has been known since early this century that the proton and electron have opposite and unit electrical charges. The neutron has been known to be unstable, decaying into a proton, an electron, and an antineutrino in what is called "beta decay", or
n --> p + e + /nu
but a proton decay has never been observed, and the stability of this particle puzzled physicists. Why does it not decay into lighter particles? Reactions such as
p --> pi + pi0
p --> /e + gamma
never happen despite the fact that they apparently obey the charge conservation law. A theoretical framework based only on the charge conservation law would not be capable of explaining the absence of these reactions. In other words, such a theory would be incomplete concerning particle reactions.
The discrepancy between the theoretically valid and physically observable reactions was a conflict that had to be resolved. Physicists resolved these conflicts by postulating new quantum properties and conservation laws, so that theoretically valid but physically unobservable reactions were rendered theoretically invalid by these laws (see Omnes, 1970; Griffiths, 1987). In this way the absence of these reactions was explained by their violation of the conservation of the new quantum property. The next problem was to find the quantum value distribution of the new property over the elementary particles.
To illustrate how such conflicts were resolved, let us consider a reaction which conserves electrical charge but has not been observed:
p --> pi + pi0.
Let us assume that this reaction violates the conservation of a new property (e.g., the "protonic charge"). Now, if we arbitrarily assign the new charge value to the proton as one and assume that the other particles, pi and pi0, do not have this charge (i.e., they both have zero protonic charge), then the reaction would be unbalanced by the new charge (i.e., 1 =/= 0 + 0). This would explain why the reaction had never been observed. Nevertheless, the value set [1,0,0] is not the only one that makes the reaction unbalanced, as the values [0,1,1], [0,1,0], [0,0,1] and [1,1,1] produce the same effect.
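The five value sets named above can be reproduced by enumeration. This is a minimal sketch that assumes, as the paragraph does, that the new charge takes only the values 0 and 1 at this stage; the function name is hypothetical.

```python
from itertools import product


def unbalancing_value_sets(values=(0, 1)):
    """Value sets [p, pi, pi0] for which p --> pi + pi0 is unbalanced."""
    return [
        [p, pi, pi0]
        for p, pi, pi0 in product(values, repeat=3)
        if p != pi + pi0
    ]


print(unbalancing_value_sets())
# prints [[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1]]
```

The three remaining combinations, [0,0,0], [1,0,1] and [1,1,0], balance the reaction and so cannot explain its absence.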
On the other hand, the new quantum values make some observed reactions unbalanced, as in
p + p --> p + n + pi
p + /pi --> n + pi0
These reactions conserve electrical charge, but not the "known" values of the new charge. This can be seen by substituting the protonic charge values:
1 + 1 = 1 + n + 0
1 + /pi = n + 0
This suggests that some of the other particles in these reactions must have nonzero protonic charge. Here, if we assign the protonic charge value of one to the neutron and zero to /pi, the reactions would be balanced. However, other valid and observed reactions may conflict with the assigned values, and we may have to revise some of the assumptions about the protonic charge values of particles accordingly.
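The assignment suggested here can be found by a small search. The sketch below assumes the protonic charge values already fixed in the text (p = 1, pi = 0, pi0 = 0) and tries the values -1, 0 and 1 for the neutron and /pi; the names KNOWN and balancing_assignments are hypothetical.

```python
from itertools import product

# Protonic charge values assumed so far in the text.
KNOWN = {"p": 1, "pi": 0, "pi0": 0}


def balancing_assignments(values=(-1, 0, 1)):
    """Find values for n and /pi that balance both observed reactions."""
    solutions = []
    for n, anti_pi in product(values, repeat=2):
        # p + p --> p + n + pi
        first = KNOWN["p"] + KNOWN["p"] == KNOWN["p"] + n + KNOWN["pi"]
        # p + /pi --> n + pi0
        second = KNOWN["p"] + anti_pi == n + KNOWN["pi0"]
        if first and second:
            solutions.append({"n": n, "/pi": anti_pi})
    return solutions


print(balancing_assignments())
# prints [{'n': 1, '/pi': 0}]
```

Under these assumptions the assignment is unique, and the value of /pi remains the negation of the value of pi.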
TREV, like its predecessor BR-3 (Kocabas, 1991), rediscovers the quantum properties in the same way as explained above. As the program's goal is to achieve a consistent and complete knowledge state, it postulates new hypotheses, and revises its domain knowledge until it achieves its goal state. In this way TREV models the discoveries of the lepton, baryon, electron, and muon number properties in particle physics. Apart from its theory formation and theory revision capabilities, the program also has the ability to propose experiments and to provide explanations for its assumptions about its domain objects.
In the remaining part of this paper we first present an overview of the system, and describe its behaviour in modeling the discoveries of the quantum properties, in proposing experiments, and in providing explanations. This is followed by a comparative discussion on the system's research goals, knowledge representation, theory revision and search methods, and its generality. The paper concludes with a summary of the results.
2. The System's Knowledge Representation and Behavior
The program uses a structured knowledge representation similar to qualitative schemas as in AbE (O'Rorke et al., 1990) and the other recent discovery models. This structured representation facilitates the system's identification of problem states such as incompleteness and inconsistency. We therefore begin by describing the knowledge representation methods of TREV in some detail.
2.1 Knowledge Representation
TREV's knowledge organization distinguishes descriptive and prescriptive knowledge. The former type of knowledge is represented as frames, and the latter as a series of operators and functions. The program has nine operators which are named as follows: 'evaluate', 'check-consistency', 'check-completeness', 'postulate-properties', 'revise-hypotheses', 'find-quantum-values', 'formulate-new-particles', 'formulate-virtual-particles', and 'formulate-reactions'. The program also has a similarity based learning (SBL) module.
The main data items of TREV are elementary particles and their reactions. Both are represented as frames in the system's knowledge base. Particle frames include the name of the particle, the quantum properties and their values. The general form of a particle frame is as follows:
frame: P
class = particle
q1 = v1
q2 = v2
...
qn = vn.
where P is the name of the particle, q1,...,qn the quantum properties, and v1,...,vn the corresponding quantum values, which can be -1, 0, or 1.
Particle reactions are represented in a similar way, this time containing information about the reactions, such as the particles involved, the reaction conditions, the physical status of the reaction, and its validity under the current theory. The general form of a particle reaction frame is as follows:
frame: reaction
class = physical event
actual status = A
logical status = L, logical-status(N,L)
reactants = R
products = P
active properties = Q, active-properties(N,Q)
reactants properties = Rp, reactants-properties(Q,Rp)
products properties = Pp, products-properties(Q,Pp)
conditions = (Rp = Pp) or (Rp =/= Pp)
where A indicates whether the reaction has been physically observed or unobserved, and L indicates whether the reaction is valid or invalid under the current theoretical knowledge of the system. R and P are the lists of the particles involved in the reaction as the reactants and the products respectively. Q indicates the vector of quantum properties that play an active role in the reaction, while Rp and Pp are the quantum value vectors of the reactants and the products. Normally, particle reactions are added to the program's knowledge base (e.g. for the reaction n --> p + e + /nu) as follows:
frame: r1
class = reaction
actual status = observed
reactants = [n]
products = [p, e, /nu].
Such input reaction frames are then transformed into the form below by inheritance from the parent frame:
frame: r1,
class = reaction
actual status = observed
logical status = valid
reactants = [n]
products = [p, e, /nu]
active properties = [q0, q1]
reactants properties = [1, 0]
products properties = [1, 0]
conditions = {[1,0] = [1,0]}.
The amended slots are added after their values are calculated by the 'evaluate' operator.
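The frames and the slot-filling step can be sketched as follows. This is an illustration under stated assumptions, not TREV's implementation: the class names, the slot names, and the signature of evaluate are hypothetical, every listed property is treated as active, and the properties q1, q2, q3 are taken to be electrical charge, lepton number and baryon number.

```python
from dataclasses import dataclass, field


@dataclass
class ParticleFrame:
    name: str
    values: dict                      # quantum property -> value in {-1, 0, 1}
    frame_class: str = "particle"


@dataclass
class ReactionFrame:
    name: str
    actual_status: str                # "observed" or "unobserved"
    reactants: list
    products: list
    frame_class: str = "reaction"
    logical_status: str = ""          # filled in by evaluate()
    active_properties: list = field(default_factory=list)
    reactants_properties: list = field(default_factory=list)
    products_properties: list = field(default_factory=list)


def evaluate(reaction, particles, properties):
    """Fill in the derived slots of a reaction frame from the particle frames."""
    def side_total(names):
        return [sum(particles[n].values[q] for n in names) for q in properties]

    reaction.active_properties = list(properties)
    reaction.reactants_properties = side_total(reaction.reactants)
    reaction.products_properties = side_total(reaction.products)
    balanced = reaction.reactants_properties == reaction.products_properties
    reaction.logical_status = "valid" if balanced else "invalid"
    return reaction


particles = {
    "n":   ParticleFrame("n",   {"q1": 0,  "q2": 0,  "q3": 1}),
    "p":   ParticleFrame("p",   {"q1": 1,  "q2": 0,  "q3": 1}),
    "e":   ParticleFrame("e",   {"q1": -1, "q2": 1,  "q3": 0}),
    "/nu": ParticleFrame("/nu", {"q1": 0,  "q2": -1, "q3": 0}),
}
r1 = evaluate(
    ReactionFrame("r1", "observed", ["n"], ["p", "e", "/nu"]),
    particles,
    ["q1", "q2", "q3"],
)
print(r1.logical_status, r1.reactants_properties, r1.products_properties)
```

Keeping the derived slots on the frame itself mirrors the description above, where the input frame is extended rather than replaced.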
TREV has two operators, 'check-consistency' and 'check-completeness', which can identify the problem states (inconsistency and incompleteness) about reactions.
The 'check-consistency' operator can decide whether the information in a reaction frame is consistent or inconsistent with the system's knowledge, by the following rules:
If R is a reaction,
and its actual status is observed,
and its logical status is valid,
then R is consistent with the system's knowledge base.
If R is a reaction,
and its actual status is observed,
and its logical status is invalid,
then R is inconsistent with the system's knowledge base.
The 'check-completeness' operator, on the other hand, can decide whether a reaction is explainable within the system's current knowledge, i.e., why the reaction is physically observable or unobservable. In other words, the program can decide whether its knowledge concerning a particle reaction is complete or incomplete. The completeness rules are as follows:
If R is a reaction,
and its actual status is unobserved,
and its logical status is invalid,
then the system's knowledge base is complete regarding R.
If R is a reaction,
and its actual status is unobserved,
and its logical status is valid,
then the system's knowledge base is incomplete regarding R.
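The four rules above amount to a lookup on the two status slots. A minimal sketch, with hypothetical function names and with the assumption that a rule simply does not apply (returning None) when the actual status does not match:

```python
def check_consistency(actual_status, logical_status):
    """Apply the two consistency rules; they only cover observed reactions."""
    if actual_status != "observed":
        return None
    return "consistent" if logical_status == "valid" else "inconsistent"


def check_completeness(actual_status, logical_status):
    """Apply the two completeness rules; they only cover unobserved reactions."""
    if actual_status != "unobserved":
        return None
    return "complete" if logical_status == "invalid" else "incomplete"


print(check_consistency("observed", "invalid"))     # prints inconsistent
print(check_completeness("unobserved", "valid"))    # prints incomplete
```

The two problem states the program acts on are exactly the "inconsistent" and "incomplete" results.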
The program checks its knowledge about reactions for consistency and completeness every time it is presented with a new set of data, and tries to achieve a consistent and complete knowledge state. In this, TREV uses the control structure employed by its predecessor, BR-3 (Kocabas, 1991). Figure 1 summarizes the system's control structure. Accordingly, TREV first checks for consistency by using the above rules over its reaction frames, and reports inconsistent reactions to a message list. Inconsistent reactions are observed reactions that do not conserve a certain quantum property in the program's knowledge base.
An inconsistency report in the message list activates the 'revise-hypotheses' operator. This operator modifies the system's knowledge about the particles' quantum property values by first turning the inconsistent reactions into algebraic equations and finding sets of alternative quantum values for the particles appearing in these reactions. Since there are only three possible quantum values, namely -1, 0 and 1, modifications alternate between these values. Each value set is tried until the consistency constraints are satisfied.
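The revision step can be pictured as a search over the three possible values. The sketch below is an assumption about how such a search might look, not the operator's actual code: the function name, its parameters, and the choice of which particles are revisable are all hypothetical.

```python
from itertools import product


def revise_hypotheses(values, observed_reactions, revisable, domain=(-1, 0, 1)):
    """Try value sets for the revisable particles until all observed reactions balance."""
    for candidate in product(domain, repeat=len(revisable)):
        trial = {**values, **dict(zip(revisable, candidate))}
        if all(
            sum(trial[x] for x in lhs) == sum(trial[x] for x in rhs)
            for lhs, rhs in observed_reactions
        ):
            return trial
    return None  # no value set in the domain restores consistency


# Values of one quantum property; the observed reaction p + p --> p + n + pi
# is inconsistent with them (2 on the left, 1 on the right).
values = {"p": 1, "n": 0, "pi": 0}
observed = [(["p", "p"], ["p", "n", "pi"])]
print(revise_hypotheses(values, observed, revisable=["n"]))
# prints {'p': 1, 'n': 1, 'pi': 0}
```

With only three values per particle the search space stays small, which is what makes trying each value set in turn practical.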
On the other hand, if consistency has been achieved but TREV cannot explain why a certain unobserved particle reaction is impossible, the program posts an incompleteness message to the message list. This, in turn, activates the 'postulate-properties' operator, which postulates a new quantum property. The program adds the new quantum property to a new slot in the particle frames with the default values of zero.
The 'find-quantum-values' operator turns the unobserved reaction formula into an algebraic inequality, and finds a set of quantum values for the particles in the formula. For example, for the unobserved reaction p --> /e + gamma, the inequalities
0 =/= 0 + 1
0 =/= -1 + 0
1 =/= 0 + 0
1 =/= -1 + 0
1 =/= -1 + 1
are generated by the program. Each of these inequalities represents a set of quantum values for the new property, which enables TREV to explain the absence of the reaction. The first quantum value set (p=0, /e=0, gamma=1) is assigned to the particles first. However, the new values must be consistent with the system's knowledge of elementary particles and their observed reactions. To secure this, the quantum values for the new property are assigned to other particles, such that its conservation is satisfied in the observed reactions. The 'check-consistency' operator checks if the new values are consistent, and the 'revise-hypotheses' operator revises them as necessary. This cycle continues until the system achieves a consistent and complete knowledge state.
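Generating such inequalities can be sketched as an enumeration over the values -1, 0 and 1. The text lists five value sets and does not say how TREV selects or orders them, so the exhaustive enumeration below is an assumption, and the function name is hypothetical.

```python
from itertools import product


def find_quantum_values(reactants, products, domain=(-1, 0, 1)):
    """Enumerate value sets that make an unobserved reaction unbalanced."""
    particles = list(dict.fromkeys(reactants + products))  # keep order, drop repeats
    value_sets = []
    for candidate in product(domain, repeat=len(particles)):
        assignment = dict(zip(particles, candidate))
        lhs = sum(assignment[x] for x in reactants)
        rhs = sum(assignment[x] for x in products)
        if lhs != rhs:
            value_sets.append(assignment)
    return value_sets


candidates = find_quantum_values(["p"], ["/e", "gamma"])
print(len(candidates))
print({"p": 0, "/e": 0, "gamma": 1} in candidates)
```

For p --> /e + gamma this enumeration yields more value sets than the five shown above, and it contains all five of them.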
inconsistent        revise             check
knowledge    --->   hypotheses  --->   consistency
state                                  and completeness

incomplete          postulate          find            check
knowledge    --->   new         --->   quantum  --->   consistency
state               properties         values

consistent and
complete     --->   stop.
knowledge state

Figure 1. TREV's general control structure in the discovery of quantum properties.
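The control structure can be approximated by a short loop. The sketch below is a simplification under stated assumptions and should not be read as TREV's procedure: it starts from a consistent theory, folds the 'revise-hypotheses' and 'find-quantum-values' steps into one brute-force search, names new properties q2, q3, and so on, and treats '/x' as the antiparticle of 'x' with the negated value. All function names are hypothetical.

```python
from itertools import product

DOMAIN = (-1, 0, 1)


def value(assignment, name):
    """Value of a particle; '/x' is the antiparticle of 'x'."""
    return -assignment[name[1:]] if name.startswith("/") else assignment[name]


def balanced(assignment, reaction):
    lhs, rhs = reaction
    return sum(value(assignment, x) for x in lhs) == sum(value(assignment, x) for x in rhs)


def valid(theory, reaction):
    """A reaction is valid if it conserves every property in the theory."""
    return all(balanced(a, reaction) for a in theory.values())


def postulate_property(particles, observed, target):
    """Find values that balance all observed reactions but not the target."""
    for candidate in product(DOMAIN, repeat=len(particles)):
        assignment = dict(zip(particles, candidate))
        if all(balanced(assignment, r) for r in observed) and not balanced(assignment, target):
            return assignment
    return None


def discover(theory, observed, unobserved):
    """Add properties until every unobserved reaction is invalid."""
    particles = sorted(next(iter(theory.values())))
    for reaction in unobserved:
        if valid(theory, reaction):                 # incomplete knowledge state
            new_property = postulate_property(particles, observed, reaction)
            if new_property is None:
                raise ValueError("no consistent value set found")
            theory[f"q{len(theory) + 1}"] = new_property
    return theory                                   # consistent and complete: stop


theory = {"charge": {"p": 1, "n": 0, "pi": 1, "pi0": 0, "e": -1, "nu": 0, "gamma": 0}}
observed = [
    (["p", "p"], ["p", "n", "pi"]),
    (["n"], ["p", "e", "/nu"]),
    (["pi0"], ["gamma", "gamma"]),
]
unobserved = [(["p"], ["pi", "pi0"]), (["p"], ["/e", "gamma"])]
result = discover(theory, observed, unobserved)
print(sorted(result))
```

The values found by this search are one arbitrary solution; they need not match the baryon or lepton numbers that physicists settled on.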
2.2 Formulation of New Particles
The program can define new particles by making modifications on the values of quantum property slots of existing particle frames. For example, from the neutron's frame
frame: n (neutron)
class = particle
q1 = 0 (electrical charge)
q2 = 0 (lepton number)
q3 = 1 (baryon number)
a new particle can be defined by changing the q1 value to -1 to obtain the particle
frame: p1 (proposed particle)
class = proposed particle
q1 = -1 (electrical charge)
q2 = 0 (lepton number)
q3 = 1 (baryon number)
which, incidentally, differs from the antiproton only in the sign of its baryon number. The program proposes observations to check whether such postulated particles exist in nature. The important point about this exercise is that certain quantum property combinations never exist (e.g., particles having nonzero baryon and lepton values at the same time). In fact, this observation led to the development of the quark theory in particle physics in the 1960s.
After observations, if it has been decided that the proposed particle does not exist in nature, then it is recorded as a nonexistent particle, e.g. as
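Slot modification of this kind can be sketched as follows. The text does not say how many slots are changed at a time, so the sketch assumes single-slot changes over the values -1, 0 and 1; the function name and the list of known frames are hypothetical.

```python
def formulate_new_particles(frame, known, domain=(-1, 0, 1)):
    """Propose frames that differ from `frame` in exactly one quantum value."""
    proposals = []
    for prop, current in frame.items():
        for new_value in domain:
            if new_value == current:
                continue
            candidate = dict(frame)
            candidate[prop] = new_value
            if candidate not in known and candidate not in proposals:
                proposals.append(candidate)
    return proposals


neutron = {"q1": 0, "q2": 0, "q3": 1}   # charge, lepton number, baryon number
proton = {"q1": 1, "q2": 0, "q3": 1}
for proposal in formulate_new_particles(neutron, known=[neutron, proton]):
    print(proposal)
```

The first proposal printed is the frame shown above, with q1 = -1; the variant with q1 = 1 is skipped because it coincides with the proton.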
frame: np1
class = nonexistent particle
q1 = v1
q2 = v2
q3 = v3
From its accumulated knowledge about existing elementary particles, TREV can construct hypotheses about the nonexistence of certain quantum value combinations, by an inductive method called exclusion based learning (Kocabas, 1989). These hypotheses state that particles with certain quantum property value combinations cannot exist. TREV can modify its exclusion hypotheses in view of new knowledge about elementary particles. As soon as a new particle frame is created, the program checks its exclusion hypotheses to decide whether the quantum values of the particle contradict a hypothesis. If they do, that hypothesis is removed.
The exclusion hypotheses are added to the system's knowledge base as frames:
frame: ep1,
class = excluded q-composition
q1 = v1
q2 = v2
q3 = #
which means that the quantum values v1 and v2 for the properties q1 and q2 respectively, cannot be possessed by an elementary particle.
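Checking a new particle against the exclusion hypotheses can be sketched as pattern matching. The sketch assumes that '#' stands for "any value", which is how the frame above reads; the function names are hypothetical.

```python
WILDCARD = "#"


def matches(hypothesis, particle):
    """True if the particle has every value the hypothesis excludes."""
    return all(
        excluded == WILDCARD or particle.get(prop) == excluded
        for prop, excluded in hypothesis.items()
    )


def prune_hypotheses(hypotheses, new_particle):
    """Drop the exclusion hypotheses contradicted by a newly accepted particle."""
    return [h for h in hypotheses if not matches(h, new_particle)]


ep1 = {"q1": 1, "q2": 1, "q3": WILDCARD}
proton = {"q1": 1, "q2": 0, "q3": 1}
print(prune_hypotheses([ep1], proton))   # ep1 survives: the proton has q2 = 0
```

A particle with q1 = 1 and q2 = 1, whatever its q3 value, would contradict ep1 and cause its removal.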
2.3 Formulation of Virtual Particles and New Reactions
The program formulates particle decays and collisions by first defining a set of 'virtual' particles. These are formulated simply by adding the vectors of quantum property values of two or three particles. An example of such virtual particles is the one that is formulated by adding the quantum values of the proton [1,0,1] and electron [-1,1,0], resulting in a proton-electron virtual particle with the quantum values of [0,1,1].
proton electron (proton-electron)
[1,0,1] + [-1,1,0] = [0,1,1]
In this way, a virtual particle with zero electrical charge, and with lepton and baryon numbers of 1 is defined. Such virtual particles are used in constructing particle decay and collision reactions. One such possible construction can be a neutron decay:
n --> p + e
which, incidentally, is not a valid reaction, because it does not conserve the lepton number: the quantum value vectors of the reactant and the products are not equal, i.e., [0,0,1] =/= [0,1,1]. On the other hand, the reaction obtained by using the neutron and the virtual particle proton-electron-antineutrino (p,e,/nu),
n --> p + e + /nu
is a valid and observed reaction, as it conserves all three quantum properties (electrical charge, lepton number and baryon number), with the quantum value vectors of both sides being equal, i.e., [0,0,1] = [0,0,1].
Testing the reactions proposed by TREV may lead to the discovery of new quantum properties. If a proposed reaction is valid by the program's knowledge of quantum values, but cannot be observed, then this creates an incompleteness problem for the program. As has been described above, in such cases TREV postulates a new quantum property and tries to find a consistent and complete set of values for particles regarding the new property.
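The construction of virtual particles and decays can be sketched with vector addition. The sketch assumes that a virtual particle is built from two or three distinct particles and uses the vectors [charge, lepton number, baryon number] quoted above; the dictionary and function names are hypothetical.

```python
from itertools import combinations

# [electrical charge, lepton number, baryon number]
PARTICLES = {
    "n": (0, 0, 1),
    "p": (1, 0, 1),
    "e": (-1, 1, 0),
    "/nu": (0, -1, 0),
}


def add_vectors(names):
    """Add the quantum value vectors of the named particles."""
    return tuple(sum(column) for column in zip(*(PARTICLES[n] for n in names)))


def virtual_particles(max_size=3):
    """All sums of two or three distinct particles, keyed by their constituents."""
    return {
        combo: add_vectors(combo)
        for size in range(2, max_size + 1)
        for combo in combinations(sorted(PARTICLES), size)
    }


def formulate_decays(parent):
    """Decays parent --> products whose summed values equal the parent's values."""
    return [
        combo
        for combo, vector in virtual_particles().items()
        if parent not in combo and vector == PARTICLES[parent]
    ]


print(virtual_particles()[("e", "p")])   # prints (0, 1, 1)
print(formulate_decays("n"))             # prints [('/nu', 'e', 'p')]
```

With these four particles the only valid neutron decay is the one into a proton, an electron and an antineutrino; n --> p + e is rejected because (0, 1, 1) differs from (0, 0, 1).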
2.4 TREV's Methods of Explanation
The program uses its structured knowledge representation for producing explanations about the objects and events of its domain. Explanations are provided when the system is in a consistent and complete knowledge state.
The program can explain why a certain proposed particle reaction is consistent or inconsistent with the system's knowledge about particle physics. In this type of explanation, TREV uses the definition of consistency over the reaction in question.
The consistency (or validity) of a certain proposed reaction is explained by proving that the reaction conserves the quantum values that the program knows. If the reaction does not conserve these quantum values, then it is inconsistent (or invalid). Consistency (or validity) of a reaction can easily be decided by checking its 'logical status' slot, or by calculating the quantum value vectors of the reactant and the resultant particles and comparing them. For example, the reaction n --> p + e + /nu is consistent because the 'actual status' slot of the reaction's frame says that the reaction has been observed, and the 'logical status' slot says it is valid. If the reaction frame does not have such a slot, then the 'check-validity' operator fires, which in turn finds whether the reaction conserves the known quantum properties.
TREV can explain why a certain reaction is not observable by proving that it violates the conservation of a quantum property that it knows. Also, by using its completeness constraints, the program can explain why the impossibility of a certain unobserved reaction is or is not explainable within the program's domain theory. When the program cannot explain the absence of such a reaction by its domain theory, then it concludes that its knowledge about elementary particles is incomplete concerning the unobserved reaction. As has been described above, TREV resolves such problem states by postulating a new quantum property.
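An explanation of this kind can be sketched as a search for violated properties. The function name, the theory layout (property name mapped to particle values) and the wording of the messages are assumptions made for illustration.

```python
def explain_absence(reaction, theory):
    """Explain an unobserved reaction, or report that the theory cannot."""
    reactants, products = reaction
    violated = [
        prop
        for prop, values in theory.items()
        if sum(values[x] for x in reactants) != sum(values[x] for x in products)
    ]
    if violated:
        return "not observable: violates conservation of " + ", ".join(violated)
    return "unexplained: conserves all known properties, so knowledge is incomplete"


decay = (["p"], ["pi", "pi0"])
theory = {"electrical charge": {"p": 1, "pi": 1, "pi0": 0}}
print(explain_absence(decay, theory))

theory["baryon number"] = {"p": 1, "pi": 0, "pi0": 0}
print(explain_absence(decay, theory))
```

The first call reports an incomplete knowledge state; after the baryon number is added, the same reaction is explained by its violation of that property.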
On the other hand, the program can also explain why there can be no particles with a certain set of quantum properties, by using its exclusion hypotheses for such explanations. For example, the exclusion hypothesis
frame: ep1,
class = excluded q-composition
q1 = 1
q2 = 1
q3 = #
explains why there cannot be a particle with the quantum values of q1=1, q2=1, and q3=0.
The system's explanatory power increases as it discovers new quantum properties, and as the particle descriptions become more detailed by including new quantum property slots and values.
TREV can learn to explain consistency and completeness by its similarity based learning (SBL) module. In learning a concept (e.g. 'consistent'), the SBL module compares the positive instances of the concept (i.e. valid and observed reactions), and creates the definition of the concept. The system's consistency and completeness rules are created in this way.
3. Discussion on the System's Methods
TREV is a system that combines several features of a discovery model. Every discovery system, by definition, must have the ability to learn. The program has three distinct types of learning ability, namely inductive learning, learning by observation, and learning by discovery. As described above, TREV learns its consistency and completeness constraints by similarity based learning, and its exclusion hypotheses by exclusion based learning. The program also constructs its domain theory with its ability to learn by observation and by discovery. The former involves the formulation of new particles and reactions, and their subsequent comparison with the physical world. The latter takes place by postulating new quantum properties and assigning a set of corresponding quantum values to the particles.
An important feature of a discovery model is theory development, which can be divided into two tasks: theory formation and theory revision. TREV extends its domain theory by using its learning and discovery abilities, by adding exclusion hypotheses, by formulating its consistency and completeness constraints, and by postulating new quantum properties when faced with an incomplete knowledge state. When it is faced with an inconsistent knowledge state, the program revises its domain knowledge (i.e. knowledge about particles and their reactions) by using its consistency constraints together with general algebraic constraints.
In its theory development and theory revision activities based on the consistency and completeness constraints, the program works in a coordinated way. However, the system's other task operators work independently and in an uncoordinated way. For example, the 'evaluate', 'formulate-new-particles', 'formulate-virtual-particles', and 'formulate-reactions' operators are fired by an external agent (e.g. a user) independently. Similarly, explanation generating functions of the system are called on user demand and for specific purposes, such as explaining why a particular reaction is unobservable.
Also, the operators which formulate new particles and reactions are not constrained by domain dependent or general constraints. Hence, they operate in a relatively large search space. As a result, these operators can formulate uninteresting domain objects as well as interesting ones.
TREV's explanation functions take advantage of the system's structured knowledge representation. The explanations provided are simple, and do not go deeper into the system's domain theory. However, the program can be improved in this direction.
The program's ability to formulate new objects means that it can propose observations to decide whether the formulated objects (i.e. elementary particles and reactions) exist in nature. Observation results are entered by the 'user'. There are a few discovery models, such as IDS (Nordhausen & Langley, 1993) and FAHRENHEIT (Zytkow, 1987), that can directly receive data from their physical environment. However, the experimental setup is rather complex for any direct data acquisition in the domain of TREV.
The program has two types of theory revision capability. One is based on using the consistency constraints, and the other is theory revision by observational evidence.
Another shortcoming of the program is that the theory formation and revision operators are fired by a rule set whose conditions are determined by the message list. In other words, the control rules are hardwired, though an explanation based learning method could be used to learn such rules. We will address this problem in future versions of the program.
4. Conclusions
One important problem in artificial intelligence is building models that integrate different methods of representation and learning. We have described a discovery system, directed by completeness and consistency constraints, with the capabilities of theory formation and theory revision, and with the ability of explaining its knowledge state by its domain constraints. The system is capable of formulating new elementary particles and particle reactions, and proposing observations to test their existence. The program has a certain degree of integration in its representation, learning and discovery methods, which can be further improved.
References
Griffiths, D. (1987). Introduction to Elementary Particles. John Wiley and Sons, N.Y.
Kocabas, S. (1989). Scientific Explanation by Exclusion. In Proceedings of the 12th Congress on Cybernetics, Namur, Belgium.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6(3), 277-309.
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. (1987). Scientific discovery: Exploration of the creative processes. MIT Press.
Nordhausen, B. and Langley, P. (1993). An integrated framework for empirical discovery. Machine Learning, 12, 17-47.
Omnes, R. (1970). Introduction to Particle Physics. Tr. by G. Barton. Wiley Interscience, London.
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abstraction. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Thagard, P. and Nowak, G. (1990). The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. (1992). Theory driven discovery of reaction pathways in the MECHEM system. In Proceedings of the National Conference on Artificial Intelligence.
Zytkow, J.M. (1987). Combining many searches in the FAHRENHEIT discovery system. Proceedings of the Fourth International Workshop on Machine Learning, Morgan Kaufmann, 281-287, Los Altos, CA.
Zytkow, J.M. (1990). Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Automated Formulation of Reactions and Reaction Chains in Nuclear Astrophysics
Sakir Kocabas
Department of Space Sciences and Technology
ITU, 80626 Maslak, Istanbul, TURKEY
Pat Langley
(LANGLEY @ NEWATLANTIS.ISLE.ORG)
Institute for the Study of Learning and Expertise
2164 Staunton Court, Palo Alto, CA 94306 USA
Abstract
In this paper we describe ASTRA, a computational research aid for the formulation and analysis of process explanations in nuclear astrophysics. The system operates in two independent modules. The first module generates fusion and decay reactions for the light elements from hydrogen to oxygen by using knowledge of quantum theory, and from these reactions, the second module constructs all theoretically possible reaction chains as process explanations for the nucleosynthesis of helium, carbon and oxygen. ASTRA has found apparently novel reactions that involve proton, electron and neutron capture. Currently, there is a small number of reactions and pathways proposed by astrophysicists to explain the synthesis of these elements and their relative abundance in stellar systems. ASTRA also produces many alternative reaction pathways, some of which are of interest to scientists working in this domain.
1. Introduction
Computational modeling of scientific discovery has been a primary concern of a small number of research groups in artificial intelligence, and has made considerable advances in its short history. A number of models have been developed in the last two decades to simulate discoveries in fields such as mathematics, physics, chemistry, and biology. These models addressed different aspects of discovery in formal and experimental sciences, such as mathematical theory formation (Lenat, 1979), searching for quantitative relationships and hypothesis formation (Langley, Simon, Bradshaw & Zytkow, 1987), theory development through the discovery of componential models (Zytkow & Simon, 1986; Rose & Langley, 1986), scientific problem formulation and experiment design (Kulkarni & Simon, 1990), theory formation and theory revision (Kocabas, 1991), and theory formation (Valdes-Perez, 1994).
In recent years, however, interest has shifted toward the computational discovery of new scientific knowledge by means of new models. One of the earliest computational tools used in producing new scientific knowledge was DENDRAL (Feigenbaum, Buchanan & Lederberg, 1971), which helped analytical chemists build correct 2-d models of some complex chemical substances. Two recent examples are Hendrickson's (1995) SYNGEN, which designs the synthesis of some organic compounds from initial and intermediate compounds, and Valdes-Perez's (1995; 1997) MECHEM, which has found new reaction pathways in physical chemistry.
This paper focuses on the results of ASTRA, an astrophysical research aid designed to support scientists in explaining the nucleosynthesis of elements and their relative abundance in stars. The program is a successor of BR-4 (Kocabas & Langley, 1995), which was developed as an integrated model for studying the role of predictions in particle physics. The behavior and the results of ASTRA are described with an emphasis on the system's abilities as a research tool in astrophysics.
2. Research Problems in Astrophysics
Astrophysics is a curious field of study concerned with both the tiniest and the largest objects in the universe: the elementary particles, and stars and galaxies. One of its subfields, nuclear astrophysics, is mainly concerned with the formation of chemical elements from hydrogen (H) and helium (4He), thought to have emerged in the early history of the universe, through a series of fusion and decay reactions in stars. Another important problem concerns the relative abundances of elements, in particular the abundance of carbon (12C), nitrogen (14N) and oxygen (16O) relative to lighter elements like lithium (7Li), beryllium (9Be) and boron (11B).
According to current astrophysical theories, stars go through several stages in their lifetimes. The first stage, which follows the star's initial formation by the condensation of cosmic clouds and hydrogen gas, involves "hydrogen burning". During this stage, stars radiate energy emitted by a series of exothermic fusion reactions in which hydrogen is transformed into helium. Astrophysicists propose three different pathways (Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) to account for hydrogen burning in stars the size of the sun and smaller. Later stages consist of more complex reactions, typically involving heavier elements.
In their attempt to explain nucleosynthesis, theorists first select a stellar model in thermal equilibrium, which makes certain assumptions about the mass, temperature, density, and element distribution of the stellar plasma. They then identify, by calculation, the particle and nuclear reactions consistent with quantum physics. Finally, they calculate the rates of these reactions, using experimental and theoretical knowledge about nuclear cross-sections and reactant abundances. In this way they obtain a set of valid reactions with their rate coefficients. Scientists use the reactions with high rates to construct the pathways, either by working forward from lighter elements to the final element, or backward from the final element until reaching the existing lighter elements.
Naturally, there are many possible reactions, and a great number of reaction pathways even for a small number of starting reactions. Astrophysicists deal with this problem by focusing their attention on only a small set of reactions, relying on heuristics to constrain the generation of explanatory hypotheses. In this process, there seem to be at least two places where automation could be used: in the formulation of all possible reactions within any selected energy band, and in the construction of pathways from any selected set of reactions.
In developing ASTRA, our main objective was to examine its results on several important research topics in nuclear astrophysics. These were: 1) hydrogen-burning processes, 2) helium-burning processes, 3) formation of the heavier elements carbon, nitrogen and oxygen through hydrogen and helium burning and other fusion chains, 4) the role of neutrons in such processes, and 5) the anomaly in the relative abundance of the light elements.
We have examined a number of books and journal papers on nuclear astrophysics, notably the following works: Audouze & Vauclair (1980); Clayton (1983); Fowler (1986); Fowler et al. (1967); Fowler et al. (1975); Harris et al. (1983); Cujec & Fowler (1980); Kippenhahn & Weigert (1994); Lang (1974); and Williams (1991).
In the next section, we describe ASTRA in terms of its inputs, outputs and operations. Section 4 describes the experimental results of ASTRA, and Section 5 discusses them. The paper ends with a summary of the conclusions.
3. System Description of ASTRA
Before we describe our application of ASTRA to nuclear astrophysics, we should first describe its inputs, outputs and procedures, which include two main stages. The first generates all theoretically valid reactions, and the second produces reaction chains as process explanations for the nucleosynthesis of elements.
3.1 Generating Reactions
The first stage of ASTRA takes as input descriptions for a set of elements and isotopes. Each entity is characterized in terms of five quantum properties: rest mass (in MeV/c^2), electric charge, spin counts, lepton counts, and baryon counts. We also give ASTRA theoretical knowledge about conservation relations over these quantum properties that hold in reactions among the elements and isotopes. Finally, we constrain the system to consider only the exothermic reactions, assuming that endothermic reactions play a relatively minor role in stellar nucleosynthesis.
Based on this information, ASTRA systematically generates all fusion and decay reactions among these elements that obey the conservation laws, together with their energy emissions, or Q-values, in mega electron volts (MeV). The reactions generated by the program are in the form Rm → Pn, m = 1, 2; n = 1, 2, 3, where Rm and Pn are the sets of the reacting and resulting elements respectively, and m and n are the number of elements in the sets. (For m = 1, the formula represents decay reactions.) An example of the output of this module for the H + 6Li reactions is as follows:*
reaction( [h,li6], [be7], 5.68 ),
reaction( [h,li6], [li7, nu], 6.48 ),
reaction( [h,li6], [he4, he3], 4.08 ).
where the first list shows the reacting elements, the second the resulting elements, and the figure before the right parenthesis the total Q-value.
-------
* The reaction formulations of ASTRA are based on neutral atoms. For this reason, there appear minor differences with textbook notations, such as in the second reaction above whose textbook version is H + 6Li → 7Li + ν, instead of H + 6Li → 7Li + /e + ν.
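The conservation test and Q-value computation described in this subsection can be illustrated with a minimal Python sketch. It is not ASTRA's implementation: the `Entity` class, the `q_value` function, the approximate atomic masses, and the convention that the electrons of a neutral atom count toward its lepton number are all assumptions made for illustration, and the spin property is left out for brevity.

```python
from dataclasses import dataclass

U_TO_MEV = 931.494  # energy equivalent of one atomic mass unit, in MeV


@dataclass(frozen=True)
class Entity:
    name: str
    mass_u: float  # neutral-atom rest mass in atomic mass units (approximate)
    charge: int
    leptons: int   # assumed: electrons of the neutral atom count as leptons
    baryons: int


# Hypothetical entity table; masses are approximate textbook values.
ENTITIES = {e.name: e for e in [
    Entity("h",   1.007825, 0, 1, 1),
    Entity("he3", 3.016029, 0, 2, 3),
    Entity("he4", 4.002603, 0, 2, 4),
    Entity("li6", 6.015123, 0, 3, 6),
    Entity("li7", 7.016003, 0, 3, 7),
    Entity("be7", 7.016929, 0, 4, 7),
    Entity("nu",  0.0,      0, 1, 0),
]}


def q_value(reactants, products):
    """Return Q in MeV if charge, lepton and baryon counts balance, else None."""
    lhs = [ENTITIES[name] for name in reactants]
    rhs = [ENTITIES[name] for name in products]
    for prop in ("charge", "leptons", "baryons"):
        if sum(getattr(e, prop) for e in lhs) != sum(getattr(e, prop) for e in rhs):
            return None  # a conserved quantity does not balance
    return (sum(e.mass_u for e in lhs) - sum(e.mass_u for e in rhs)) * U_TO_MEV


for products in (["be7"], ["li7", "nu"], ["he4", "he3"]):
    print(["h", "li6"], products, q_value(["h", "li6"], products))
```

With these mass values the three H + 6Li reactions come out exothermic, with Q-values close to, though not identical with, those in the listing; a reaction such as H + 6Li → 7Li alone is rejected because the lepton counts do not balance.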
For the runs described in this paper, we provided ASTRA with the elements from hydrogen to oxygen, their isotopes and a few elementary particles like the electron, proton, neutron and the neutrino with their antiparticles, giving a total of 36 distinct entities. From these, the system generated some 400 different reactions, but some were minor variations on one another such as 3He + 9Be → 12C + e + /e and 3He + 9Be → 12C + ν + /ν. We eliminated such near repetitions manually, leaving 276 reactions that included 262 fusion reactions and 14 decays.
3.2 Generating Reaction Chains
ASTRA's second stage takes as input these primitive reactions, along with an element E whose synthesis we want explained and the basic elements/isotopes (E) that we assume as given (typically hydrogen and deuterium). In response, the system generates all reaction chains that lead from the starting elements to the final element through the various reactions identified in the first stage. The system uses a depth-first, backward chaining search to construct the reaction chains. On the first step, ASTRA finds those reactions that give as an output the final element E. Upon selecting one of these reactions, R, it recursively finds those reactions that give as an output one or more of R's input elements. The algorithm continues this process, halting its recursion when it finds a reaction chain for which all the reacting elements are in (E), or when it cannot find a reaction off which to chain. ASTRA generates all possible reaction chains in this systematic manner.
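The depth-first, backward-chaining search described above can be sketched as follows. This is an illustrative reading of the description rather than ASTRA's code: the function name `chains`, the tuple encoding of reactions, the `pending` set used to block circular explanations, and the tiny reaction table are all assumptions.

```python
from itertools import product

# Hypothetical reaction table: (reactants, products), neutral-atom notation.
REACTIONS = [
    (("h", "h"), ("d", "nu")),
    (("d", "h"), ("he3",)),
    (("he3", "he3"), ("he4", "h", "h")),
    (("he3", "he4"), ("be7",)),
]


def chains(target, given, reactions, pending=frozenset()):
    """Enumerate reaction chains that produce `target` from the `given` elements."""
    if target in given:
        return [[]]   # already available, nothing to explain
    if target in pending:
        return []     # target is already being explained higher up: no circular chains
    found = []
    for reaction in reactions:
        reactants, products = reaction
        if target not in products:
            continue
        # Recursively explain each distinct reactant, then combine alternatives.
        options = [chains(r, given, reactions, pending | {target})
                   for r in dict.fromkeys(reactants)]
        for combo in product(*options):
            chain = []
            for sub in combo:
                chain += [step for step in sub if step not in chain]
            found.append(chain + [reaction])
    return found


for chain in chains("he4", {"h"}, REACTIONS):
    for reactants, products in chain:
        print(" + ".join(reactants), "->", " + ".join(products))
```

With hydrogen as the only given element, this table yields a single chain, the three reactions of the proton-proton chain in order. A reaction whose reactants cannot all be explained contributes no chain, which corresponds to the case where the search cannot find a reaction off which to chain.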
ASTRA produces all possible exothermic fusion reactions and decays including the ones given in the astrophysics literature that we have looked at. The program constructs a large number of reaction chains, most of which would be ruled out by physicists on grounds of low reaction rates. However, some of the pathways produced by the program seem to be viable alternatives to the currently proposed mechanisms, both on the account of the energy emissions and the existence of the elements in stars.
4. The Results of ASTRA
In this section we report the results of our tests with ASTRA, within the conceptual framework of research topics in nuclear astrophysics. We first address two broad classes of reactions that are believed to play an important role in stellar nucleosyntheses, then turn to reaction chains that explain the synthesis of heavier elements.
4.1 Proton, Electron and Neutron Captures
The synthesis of chemical elements from hydrogen and helium in successive steps in stellar systems is explained by astrophysicists through a series of fusion and decay reactions. Two main processes among these reactions are proton and neutron captures, in which a nucleus reacts with a proton or a neutron, giving a heavier element or isotope. Electron captures, in which an orbital electron is absorbed by the atomic nucleus with the emission of a neutrino, play a relatively minor role in nucleosynthesis.
Proton captures are an important class of exothermic reactions that take part in hydrogen burning processes. We have found 33 examples of proton captures given in the astrophysics literature (e.g., Fowler et al., 1967, 1975; Harris et al., 1983) for elements from hydrogen to oxygen (16O).
ASTRA's first stage predicts that all elements from hydrogen to nitrogen (15N), with the exception of 4He, participate in proton capture. The program produces 46 such reactions, including all 33 examples we have found in texts, as well as 13 others that we have not seen in the astrophysics texts we examined. Some of these reactions are:
H + 6Li → 7Be
H + 9Be → 4He + 4He + D
H + 9Be → 10B
H + 10B → 7Be + 4He
H + 11B → 12C.
Electron capture reactions are weak interactions in which an electron is absorbed by the atomic nucleus, transforming it into one with a smaller atomic number. An important example, which also takes place in what is called the pp2 chain given below, is e + 7Be → 7Li + ν. ASTRA's first stage produces 6 electron capture reactions, of which only the one just given appears in astrophysics texts.
In fusion reactions that involve neutron capture, an element combines with a neutron to form a heavier isotope of the same element. We found 17 neutron captures for light elements in the literature, while ASTRA predicts 59 such reactions that are theoretically possible for the same elements. These include the following reactions that we did not see in the texts:
n + 6Li → 7Be + ν
n + 7Be → 4He + 4He
n + 8Be → 9Be
n + 10B → 11B
The third reaction may play an important role in stellar reaction pathways, which we will consider shortly.
4.2 Neutron and Deuteron Production
Neutron capture requires a continuous supply of neutrons in the stellar plasma, and so relies on some neutron-producing reaction. Audouze & Vauclair (1980, p. 86) suggest that
D + D → 3He + n,
is the only reaction that releases neutrons in the hydrogen burning stage of main-sequence stars. Yet ASTRA also predicts six additional reactions that produce neutrons, four of which are
D + T → 4He + n
3He + 7Li → 9B + n
D + 9Be → 10B + n
4He + 9Be → 12C + n.
The first reaction appears likely in main-sequence stars, as D and T exist in them. However, astrophysicists would ignore most of these reactions because of the low abundances of their reactants in stellar plasma. Most of the neutron-producing reactions rely on a deuteron as one of their inputs. The best known deuteron source is the reaction
H + H → D + /e + ν,
and in astrophysics texts we have found two more reactions that produce deuterium (T + 3He → 4He + D and H + 9Be → 8Be + D). However, the first stage of ASTRA predicts 15 other reactions of this sort. These include:
3He + 6Li → 7Be + D
3He + 7Li → 8Be + D
4He + 10B → 12C + D
3He + 11B → 12C + D
3He + 13C → 14N + D.
The first two of these reactions should take place in main-sequence stars, as 6Li and 7Li are known to exist there, yet we have not found either reaction in the literature that we examined. Again, astrophysicists would presumably ignore most of these reactions for their low reactant abundances.
4.3 Helium Synthesis by Hydrogen Burning
The transformation of hydrogen into helium through a series of nuclear processes is the principal source of energy in main-sequence stars. The standard reaction chains given in astrophysics texts (e.g., Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) for helium synthesis in such stars are the hydrogen-burning processes called "proton-proton" or pp chains. The first of these chains is given as:
a. H + H → D + /e + ν
b. D + H → 3He
c. 3He + 3He → 4He + H + H.
The net effect of this reaction chain, when reaction (a) occurs twice, is 4 H → 4He + 2ν + 26.72 MeV. Another hypothesized pathway, called the "alpha-catalyzed chain", is
d. 3He + 4He → 7Be
e. 7Be + e → 7Li + ν
f. H + 7Li → 8Be
g. 8Be → 4He + 4He
in which reactions b and c provide both the 3He and the 4He needed by reaction d. An alternative pathway, which also appears in texts, replaces reaction e with H + 7Be → 8B and f with 8B → 8Be + /e + ν, which produce the 8Be needed by the final reaction through a different mechanism. Astrophysicists refer to these three pathways as the pp1, pp2 and pp3 chains, respectively.
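The net effect quoted for the pp1 chain can be checked by adding up what the chain consumes and produces. The sketch below is only an illustration: it assumes the neutral-atom form of reaction a (without the /e term), takes reactions a and b to occur twice and reaction c once, and uses approximate atomic masses; the names `PP1`, `net_effect` and `MASS_U` are hypothetical.

```python
from collections import Counter

U_TO_MEV = 931.494  # MeV per atomic mass unit
MASS_U = {"h": 1.007825, "he4": 4.002603, "nu": 0.0}  # approximate atomic masses

# (reactants, products, number of times the reaction occurs)
PP1 = [
    (["h", "h"], ["d", "nu"], 2),            # a, neutral-atom form (assumed)
    (["d", "h"], ["he3"], 2),                # b
    (["he3", "he3"], ["he4", "h", "h"], 1),  # c
]


def net_effect(chain):
    """Return (net reactants, net products) of a chain as Counters."""
    consumed, produced = Counter(), Counter()
    for reactants, products, times in chain:
        for name in reactants:
            consumed[name] += times
        for name in products:
            produced[name] += times
    # Counter subtraction keeps only positive counts, so intermediates cancel.
    return consumed - produced, produced - consumed


net_in, net_out = net_effect(PP1)
q = (sum(MASS_U[n] * k for n, k in net_in.items())
     - sum(MASS_U[n] * k for n, k in net_out.items())) * U_TO_MEV
print(dict(net_in), "->", dict(net_out), round(q, 2), "MeV")
```

The intermediates D and 3He cancel, leaving four hydrogen atoms on the input side and one 4He plus two neutrinos on the output side, with an energy release close to the figure quoted above.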
When asked to generate reaction chains from hydrogen to helium, the ASTRA system finds all of these reaction chains including the CNO cycle. Yet, ASTRA formulates another reaction chain which involves an electron capture, as in pp2:
H + H → D + ν
D + H → 3He
3He + 6Li → 9B
e + 9B → 9Be + ν
H + 9Be → 4He + 6Li,
which has the same net effect. We did not see any record of this chain in the literature we examined. The program also finds 44 other processes of helium synthesis that differ in the last link of their chains. Many of these would be disregarded by astrophysicists for their small cross sections, including the chain:
H + H → D + ν
D + 4He → 6Li
H + 6Li → 7Li + ν
H + 7Li → 4He + 4He,
as D is believed to be quickly destroyed by the reaction D + H → 3He. Both Cujec & Fowler (1980) and Harris, Fowler, Caughlan, and Zimmerman (1983) argue that the reactions involving D are unlikely due to its low abundance. However, Clayton (1983, pp. 371-2) notes that the density of deuterium in the interstellar medium and the sun remains unknown, and suggests that the substance might be more common than usually believed.
4.4 Generation of Carbon and Oxygen
The origin and the relative abundance of carbon and oxygen have been among the main concerns of astrophysics. The standard account (e.g., Fowler, 1986, pp. 5-6) relies on the process of helium burning, in which helium nuclei react to form carbon and oxygen in the following steps:
4He + 4He → 8Be
4He + 8Be → 12C
4He + 12C → 16O.
However, there were theoretical problems with this account: the first reaction is endothermic and the lifetime of 8Be is very short (2 × 10^-16 s). Later calculations showed that 8Be resonances were sufficiently stable to allow the reaction with an alpha particle to produce carbon as in the second reaction. ASTRA does not formulate the reaction 4He + 4He → 8Be because it is slightly endothermic, but the system finds 20 other reactions that produce 8Be, such as
D + 6Li → 8Be
3He + 7Li → 8Be + D
n + 7Be → 8Be.
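The energy argument can be illustrated with a short calculation. The sketch below assumes approximate neutral-atom masses and simply applies an exothermic filter of the kind described in Section 3.1; the names `MASS_U`, `q_mev` and `candidates` are hypothetical and the conservation checks are left out.

```python
U_TO_MEV = 931.494  # MeV per atomic mass unit

# Approximate neutral-atom masses in atomic mass units (assumed inputs).
MASS_U = {"n": 1.008665, "d": 2.014102, "he4": 4.002603,
          "li6": 6.015123, "be7": 7.016929, "be8": 8.005305}


def q_mev(reactants, products):
    """Mass difference between reactants and products, in MeV."""
    return (sum(MASS_U[r] for r in reactants)
            - sum(MASS_U[p] for p in products)) * U_TO_MEV


candidates = [
    (["he4", "he4"], ["be8"]),
    (["d", "li6"], ["be8"]),
    (["n", "be7"], ["be8"]),
]

# Keep only the exothermic candidates (Q > 0).
exothermic = [(r, p) for r, p in candidates if q_mev(r, p) > 0]

for r, p in candidates:
    print(r, "->", p, round(q_mev(r, p), 3), "MeV")
```

With these masses, 4He + 4He → 8Be has a small negative Q-value and is filtered out, while the two alternative routes to 8Be are strongly exothermic.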
Once 8Be is available, 4He + 8Be → 12C can take place exothermically, so ASTRA formulates this reaction. The system produces 24 additional chains that differ in their final steps to 12C. These include:
n + 8Be → 9Be
4He + 9Be → 12C + n,
which relies on one of the neutron capture reactions that we discussed earlier. An alternative and even more plausible pathway produced by ASTRA involves a proton capture:
H + 8Be → 9Be + ν
4He + 9Be → 12C + n.
Briefly, if 8Be captures a neutron or proton before it decays, then it transforms into its stable isotope 9Be. This in turn produces carbon by reacting with 4He, where the emitted neutron from the latter reaction can combine with another 8Be. Once 12C is formed, in whatever manner, it can react with 4He exothermically to produce oxygen:
4He + 12C → 16O.
In summary, ASTRA finds a number of reaction chains to carbon and oxygen that do not appear in astrophysics literature, all of which are theoretically possible, but the final judgement about their scientific value requires further evaluation, as we discuss next.
5. Discussion of Results
We have carefully compared ASTRA's outputs, at both the reaction and pathway level, to those available in astrophysics texts (Clayton, 1983; Audouze & Vauclair, 1980; Kippenhahn & Weigert, 1994; Fowler et al., 1967, 1975; Harris et al., 1983; Cujec & Fowler, 1980). We have examined the results of the system only on exothermic reactions, but ASTRA can formulate reactions in any energy band.
Although ASTRA calculates the Q-values of all reactions that it formulates, the current version does not take into account the reaction rates that astrophysicists use to determine the more likely reactions and reaction chains. Because of this limitation, the current version cannot decide which reactions must be dominant in a given burning phase of the star. We are considering implementing this capability in future versions of the program, so that, given a stellar model (e.g., a model of the sun) and the reaction rates, it eliminates some of the low-rate reactions before constructing the reaction chains.
It would be impossible for astrophysicists even to formulate all theoretically possible reactions for exhaustive research without a computational aid like ASTRA. For example, Fowler et al. (1967, 1975) and Harris et al. (1983) cite 88 reactions for elements from H to 16O in their research, while ASTRA uses 276 such reactions. Among the 88 reactions, the same authors cite 33 H-capture, 17 n-capture and 8 D-fusion reactions, while ASTRA formulates 46 H-captures, 59 n-captures and 75 D-fusions for the same range of elements.
The ASTRA program can handle a very large volume of data for constructing reaction chains, and although the hydrogen and helium burning processes have been dealt with extensively in the current literature, there may still be room for research. We understand that there is even more room for research on the synthesis of the heavier elements. A complete analysis of the reactions and pathways can therefore only be carried out with the aid of a computational tool such as our program.
Despite its current limitations, the program still has the potential to be useful in several subfields of nuclear astrophysics. Although we tested ASTRA on the exothermic reactions of the light elements from H to 16O, the system can be used to explore the reactions of heavier elements from oxygen to iron and beyond, which take place in stellar and interstellar processes.
Our research is continuing in three strands: first, we are evaluating the current results of the system; second, we plan to add the ability to use reaction rates to distinguish the more likely mechanisms; and finally, we are improving its interface to make the system a more useful research aid for astrophysicists.
6. Conclusions
In this paper we described ASTRA, a computational tool that formulates nuclear reactions and pathways for researchers in astrophysics. Although we have been generally satisfied with ASTRA's performance to date, there clearly exist a number of directions in which we can extend our work. We are planning to present the program's predictions to domain experts to further evaluate the behavior of the current system in terms of the novelty and plausibility of its results. Our work to improve ASTRA, and to make it a more useful research tool for astrophysicists, is continuing.
References
Audouze, J., & Vauclair, S. (1980). An introduction to nuclear astrophysics. Holland: D. Reidel.
Clayton, D.D. (1983). Principles of Stellar Evolution and Nucleosynthesis. Chicago: The University of Chicago Press.
Cujec, B., & Fowler, W.A. (1980). Neglect of D, T, and 3He in advanced stellar evolution. The Astrophysical Journal, 236, 658-660.
Feigenbaum, E.A., Buchanan, B.G., & Lederberg, J. (1971). On generality and problem solving: A case study using the DENDRAL program. In Machine Intelligence (vol. 6). Edinburgh: Edinburgh University Press.
Fowler, W.A. (1986). The synthesis of the chemical elements carbon and oxygen. In S.L. Shapiro & S.A. Teukolsky (Eds.), Highlights of modern astrophysics. New York: John Wiley & Sons.
Fowler, W.A., Caughlan, G.R., & Zimmermann, B.A. (1967). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 5, 525-570.
Fowler, W.A., Caughlan, G.R., & Zimmermann, B.A. (1975). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 13, 69-112.
Harris, M.J., Fowler, W.A., Caughlan, G.R., & Zimmermann, B. (1983). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 21, 165-176.
Hendrickson, J.B. (1995). Systematic synthesis design: The SYNGEN program. Working Notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 13-17). Stanford, CA: AAAI Press.
Kippenhahn, R., & Weigert, A. (1994). Stellar Structure and Evolution. London: Springer-Verlag.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S., & Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. Working notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 87-92). Stanford, CA: AAAI Press.
Kulkarni, D., & Simon, H.A. (1990). Experimentation in machine discovery. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation. San Mateo, CA: Morgan Kaufmann.
Lang, K.R. (1974). Astrophysical formulae: A compendium for physicists and astrophysicists. New York: Springer-Verlag.
Langley, P., Simon, H.A., Bradshaw, G.L., & Zytkow, J.M. (1987). Scientific Discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.
Lenat, D. (1979). On automated scientific theory formation: a case study using the AM program. In: Hayes, J., Michie, D., and Mikulich, L.I., eds. Machine Intelligence 9, 251-283, Halstead: New York.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.
Valdes-Perez, R.E. (1994). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning, 17, 47-67.
Valdes-Perez, R.E. (1995). Machine discovery in chemistry: New results. Artificial Intelligence, 74, 191-201.
Valdes-Perez, R.E. (1997). Computer-aided mechanism elucidation of acetylene hydrocarboxylation to acrylic acid based on a novel union of empirical and formal methods. Organometallics, 16(14): 3114-3127.
Williams, W.S.C. (1991). Nuclear and Particle Physics. Oxford: Clarendon Press.
Zytkow, J.M., & Simon, H.A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Sakir Kocabas
Department of Space Sciences and Technology
ITU, 80626 Maslak, Istanbul, TURKEY
Pat Langley
(LANGLEY @ NEWATLANTIS.ISLE.ORG)
Institute for the Study of Learning and Expertise
2164 Staunton Court, Palo Alto, CA 94306 USA
Abstract
In this paper we describe ASTRA, a computational research aid for the formulation and analysis of process explanations in nuclear astrophysics. The system operates in two independent modules. The first module generates fusion and decay reactions for the light elements from hydrogen to oxygen by using knowledge of quantum theory, and from these reactions, the second module constructs all theoretically possible reaction chains as process explanations for the nucleosynthesis of helium, carbon and oxygen. ASTRA has found apparently novel reactions that involve proton, electron and neutron capture. Currently, there is a small number of reactions and pathways proposed by astrophysicists to explain the synthesis of these elements and their relative abundance in stellar systems. ASTRA also produces many alternative reaction pathways, some of which are of interest to scientists working in this domain.
1 Introduction
Computational modeling of scientific discovery has been a primary concern of a small number of research groups in artificial intelligence, and has made considerable advances in its short history. A number of models have been developed in the last two decades to simulate discoveries in fields such as mathematics, physics, chemistry, and biology. These models addressed different aspects of discovery in formal and experimental sciences, such as mathematical theory formation (Lenat, 1979), searching for quantitative relationships and hypothesis formation (Langley, Simon, Bradshaw & Zytkow, 1987), theory development through the discovery of componential models (Zytkow & Simon, 1986; Rose & Langley, 1986), scientific problem formulation and experiment design (Kulkarni & Simon, 1990), theory formation and theory revision (Kocabas, 1991), and theory formation (Valdes-Perez, 1994).
In recent years however, interest increased towards the computational discovery of new scientific knowledge by means of new models. One of the earliest computational tools used in producing new scientific knowledge was DENDRAL (Feigenbaum, Buchanan & Lederberg, 1971), which helped analytical chemists to build correct 2-d models of some complex chemical substances. Two recent examples are Hendrickson's (1995) SYNGEN which designs the synthesis of some organic compounds from initial and intermediate compounds, and Valdes-Perez's (1995; 1997) MECHEM which has found new reaction pathways in physical chemistry.
This paper focuses on the results of ASTRA, an astrophysical research aid designed to support scientists in explaining the nucleosynthesis of elements and their relative abundance in stars. The program is a successor of BR-4 (Kocabas & Langley, 1995) which was developed as an integrated model for studying the role of predictions in particle physics. The behavior and the results of ASTRA is described with an emphasis on the system's abilities as a research tool in astrophysics.
2. Research Problems in Astrophysics
Astrophysics is a curious field of study related with the tiniest and the largest objects in the universe, the elementary particles, and stars and galaxies. One of its subfields, nuclear astrophysics, mainly concerns with the formation of chemical elements from hydrogen (H) and helium (4He), thought to have emerged in the early history of the universe, through a series of fusion and decay reactions in stars. Another important problem concerns the relative abundances of elements, in particular the abundance carbon (12C), nitrogen (14N) and oxygen (16O) relative to lighter elements lile lithium (7Li), beryllium (9Be) and boron (11B).
According to the current astrophysical theories, stars go through several stages in their lifetimes. The first stage, which follows the star. s initial formation by the condensation of cosmic clouds and hydrogen gas, involves . hydrogen burning. . During this stage, stars radiate energy emitted by a series of exothermic fusion reactions in which hydrogen is transformed into helium. Astrophysicists propose three different pathways (Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) to account for hydrogen burning in stars the size of the sun and smaller. Later stages consist of more complex reactions, typically involving heavier elements.
In their attempt to explain nucleosyntheses, the theorists first select a stellar model in thermal equilibrium which makes certain assumptions about the mass, temperature, density, and the element distribution in the stellar plasma. They then identify the particle and nuclear reactions consistent with quantum physics by calculation. Finally, they calculate the rates of these reactions, by using experimental and theoretical knowledge about nuclear cross-sections and reactant abundances. In this way they obtain a set of valid reactions with their rate coefficients. Scientists use the reactions with high rates to construct the pathways, either by working forward from lighter elements to the final element, or backward from the final element, until reaching to the existing lighter elements.
Naturally, there are many possible reactions, and a great number of reaction pathways even for a small number of reactions to start with. Astrophysicists deal with this problem by focusing their attention on only a small set of reactions, relying on heuristics to constrain the generation of explanatory hypotheses. In this process, there seems to be at least two places where automation could be used: In the formulation of all possible reactions within any selected energy band, and in the construction of pathways from any selected set of reactions.
In developing ASTRA our main objective was to see its results on several important research topics in nuclear astrophysics. These were: 1) hydrogen-burning processes, 2) helium burning processes, 3) formation of heavier elements carbon, nitrogen and oxygen through hydrogen and helium burning, and other fusion chains, 4) the role of neutrons in such processes, and 5) the anomaly in the relative abundance of the light elements.
We have examined a number of books and journal papers on nuclear astrophysics, notably the following works: Audouze & Vauclair (1980); Clayton (1983); Fowler (1986); Fowler et al. (1967); Fowler et al. (1975); Harris et al. (1983); Cujec & Fowler (1980); Kippenhahn & Weigert (1994); Lang (1974); and Williams (1991).
In the next section, we describe ASTRA in terms of its inputs, outputs and operations. Section 4 describes the experimental results of ASTRA, and Section 5 discusses these results. The paper ends with a summary of the conclusions.
3. System Description of ASTRA
Before we describe our application of ASTRA to nuclear astrophysics, we should first describe its inputs, outputs and procedures, which include two main stages. The first generates all theoretically valid reactions, and the second produces reaction chains as process explanations for the nucleosynthesis of elements.
3.1 Generating Reactions
The first stage of ASTRA takes as input descriptions for a set of elements and isotopes. Each entity is characterized in terms of five quantum properties: rest mass (in MeV/c2), electric charge, spin counts, lepton counts, and baryon counts. We also give ASTRA theoretical knowledge about conservation relations over these quantum properties that hold in reactions among the elements and isotopes. Finally, we constrain the system to consider only the exothermic reactions, assuming that endothermic reactions play a relatively minor role in stellar nucleosynthesis.
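The conservation test and the energy calculation described above can be illustrated with a minimal sketch. It is not ASTRA's code: the `Entity` class, the `ENTITIES` table, the function names and the rounded atomic masses are assumptions made for illustration, and the spin property is left out for brevity. Entities are treated as neutral atoms, so each atom carries its electrons as lepton number.

```python
from dataclasses import dataclass

U_TO_MEV = 931.494  # MeV per atomic mass unit


@dataclass(frozen=True)
class Entity:
    name: str
    mass_u: float  # atomic mass in u (rounded, illustrative values)
    charge: int    # net electric charge (0 for a neutral atom)
    baryon: int    # baryon count
    lepton: int    # lepton count (electrons of the neutral atom)


# Hypothetical mini knowledge base, not ASTRA's own data.
ENTITIES = {
    "h":   Entity("h",   1.007825, 0, 1, 1),
    "he3": Entity("he3", 3.016029, 0, 3, 2),
    "he4": Entity("he4", 4.002603, 0, 4, 2),
    "li6": Entity("li6", 6.015123, 0, 6, 3),
    "li7": Entity("li7", 7.016003, 0, 7, 3),
    "be7": Entity("be7", 7.016929, 0, 7, 4),
    "nu":  Entity("nu",  0.0,      0, 0, 1),
}


def conserved(reactants, products):
    """True if charge, baryon and lepton counts balance on both sides."""
    def total(names, prop):
        return sum(getattr(ENTITIES[n], prop) for n in names)
    return all(total(reactants, p) == total(products, p)
               for p in ("charge", "baryon", "lepton"))


def q_value(reactants, products):
    """Energy release in MeV; a positive value means exothermic."""
    dm = (sum(ENTITIES[n].mass_u for n in reactants)
          - sum(ENTITIES[n].mass_u for n in products))
    return dm * U_TO_MEV


# A candidate is kept only if it conserves the counts and is exothermic.
candidate = (["h", "li6"], ["be7"])
keep = conserved(*candidate) and q_value(*candidate) > 0
```

Because the masses here are rounded tabulated values rather than the contents of ASTRA's knowledge base, the Q-values this sketch yields may differ slightly from the figures the program reports.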
Based on this information, ASTRA systematically generates all fusion and decay reactions among these elements that obey the conservation laws, together with their energy emissions, or Q-values, in terms of mega electron volts (MeV). The reactions generated by the program are in the form: Rm -> Pn , m = 1,2; n = 1,2,3 where Rm and Pn are the sets of the reacting and resulting elements respectively, and m and n are the number of elements in the sets. (For m = 1, the formula represents decay reactions). An example of the output of this module for the H + 6Li reactions is as follows:*
reaction( [h,li6], [be7], 5.68 ),
reaction( [h,li6], [li7, nu], 6.48 ),
reaction( [h,li6], [he4, he3], 4.08 ).
where the first list shows the reacting elements, the second the resulting elements, and the figure before the right parenthesis the total Q-value.
-------
* The reaction formulations of ASTRA are based on neutral atoms. For this reason, there appear minor differences from textbook notations, such as in the second reaction above whose textbook version is H + 6Li -> 7Li + nu, instead of H + 6Li -> 7Li + /e + nu.
For the runs described in this paper, we provided ASTRA with the elements from hydrogen to oxygen, their isotopes and a few elementary particles like the electron, proton, neutron and the neutrino with their antiparticles, giving a total of 36 distinct entities. From these, the system generated some 400 different reactions, but some were minor variations on one another such as 3He + 9Be -> 12C + e + /e and 3He + 9Be -> 12C + n + /n. We eliminated such near repetitions manually, leaving 276 reactions that included 262 fusion reactions and 14 decays.
3.2 Generating Reaction Chains
ASTRA's second stage takes as input these primitive reactions, along with an element E whose syntheses we want explained and the basic elements/isotopes (E) that we assume as given (typically hydrogen and deuterium). In response, the system generates all reaction chains that lead from the starting elements to the final element through the various reactions identified in the first stage. The system uses a depth-first, backward chaining search to construct the reaction chains. On the first step, ASTRA finds those reactions that give as an output the final element E. Upon selecting one of these reactions, R, it recursively finds those reactions that give as an output one or more of R's input elements. The algorithm continues this process, halting its recursion when it finds a reaction chain for which all the reacting elements are in (E), or when it cannot find a reaction off which to chain. ASTRA generates all possible reaction chains in this systematic manner.
ASTRA produces all possible exothermic fusion reactions and decays, including the ones given in the astrophysics literature that we have looked at. The program constructs a large number of reaction chains, most of which would be ruled out by physicists on grounds of low reaction rates. However, some of the pathways produced by the program seem to be viable alternatives to the currently proposed mechanisms, on account of both their energy emissions and the existence of the elements in stars.
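The depth-first, backward-chaining search described above can be sketched as follows. This is an illustrative reconstruction, not ASTRA's implementation: the function name `chains_for`, the tuple encoding of reactions, the cycle guard (`pending`) and the depth limit are all assumptions.

```python
from itertools import product


def chains_for(target, reactions, given, pending=frozenset(), max_depth=6):
    """Return every chain (list of reactions, in forward order) that
    builds `target` from the species in `given`."""
    if target in given:
        return [[]]  # already available: the empty chain suffices
    if target in pending or max_depth == 0:
        return []    # cycle or depth limit: nothing to chain off
    results = []
    for rxn in reactions:
        reactants, products = rxn
        if target not in products:
            continue
        # Recursively explain each distinct reactant of this reaction.
        sub = [chains_for(r, reactions, given,
                          pending | {target}, max_depth - 1)
               for r in dict.fromkeys(reactants)]
        # Combine one sub-chain per reactant; an empty list in `sub`
        # means some reactant cannot be explained, so no chain results.
        for combo in product(*sub):
            chain = []
            for part in combo:
                for step in part:
                    if step not in chain:
                        chain.append(step)
            chain.append(rxn)
            results.append(chain)
    return results


# Three reactions in neutral-atom form, encoded as (reactants, products).
PP1 = [
    (("h", "h"), ("d", "nu")),
    (("d", "h"), ("he3",)),
    (("he3", "he3"), ("he4", "h", "h")),
]
chains = chains_for("he4", PP1, given={"h"})
```

With only these three reactions supplied, the search returns a single chain that lists them in order. A larger reaction set yields one chain for every combination of alternative producers, which is why the number of chains grows so quickly.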
4. The Results of ASTRA
In this section we report the results of our tests with ASTRA, within the conceptual framework of research topics in nuclear astrophysics. We first address two broad classes of reactions that are believed to play an important role in stellar nucleosyntheses, then turn to reaction chains that explain the synthesis of heavier elements.
4.1 Proton, Electron and Neutron Captures
The synthesis of chemical elements from hydrogen and helium in successive steps in stellar systems is explained by astrophysicists through a series of fusion and decay reactions. Two main processes among these reactions are proton and neutron captures, in which a nucleus reacts with a proton or a neutron, giving a heavier element or isotope. Electron captures, in which an orbital electron is absorbed by the atomic nucleus with the emission of a neutrino, play a relatively minor role in the nucleosyntheses.
Proton captures are an important class of exothermic reactions that take part in hydrogen burning processes. We have found 33 examples of proton captures given in astrophysics literature (e.g., Fowler, et al., 1967, 1975, 1983) for elements from hydrogen to oxygen (16O).
ASTRA's first stage predicts that all elements from hydrogen to nitrogen (15N), with the exception of 4He, participate in proton capture. The program produces 46 such reactions, including all 33 examples we have found in texts, but also 13 others which we have not seen in the astrophysics texts that we examined. Some of these reactions are:
H + 6Li -> 7Be
H + 9Be -> 4He + 4He + D
H + 9Be -> 10B
H + 10B -> 7Be + 4He
H + 11B -> 12C.
Electron capture reactions are weak interactions in which an electron is absorbed by the atomic nucleus, which is thereby transformed into one with a smaller atomic number. An important example, which also takes place in what is called the pp2 chain given below, is (e + 7Be -> 7Li + nu). ASTRA's first stage produces 6 electron capture reactions, of which only the one just given appears in astrophysics texts.
In fusion reactions that involve neutron capture, an element combines with a neutron to form a heavier isotope of the same element. We found 17 neutron captures for light elements in the literature, while ASTRA predicts 59 such reactions that are theoretically possible for the same elements. These include the following reactions that we did not see in the texts:
n + 6Li -> 7Be + nu
n + 7Be -> 4He + 4He
n + 8Be -> 9Be
n + 10B -> 11B
The third reaction may play an important role in stellar reaction pathways, which we will consider shortly.
4.2 Neutron and Deuteron Production
Neutron capture requires a continuous supply of neutrons in the stellar plasma, so that it relies on some neutron producing reaction. Audouze & Vauclair (1980, p. 86) suggest that
D + D -> 3He + n,
is the only reaction that releases neutrons in the hydrogen burning stage of main-sequence stars. Yet, ASTRA also predicts six additional reactions that produce neutrons, four of which are
D + T -> 4He + n
3He + 7Li -> 9B + n
D + 9Be -> 10B + n
4He + 9Be -> 12C + n.
The first reaction appears likely in main-sequence stars, as D and T exist in them. However, astrophysicists would ignore most of these reactions for their low reactant abundances in stellar plasma. Most of the neutron-producing reactions rely on a deuteron as one of their inputs. The best known deuteron source is the reaction:
H + H -> D + /e + nu,
and in astrophysics texts we have found two more reactions that produce deuterium (T + 3He -> 4He + D and H + 9Be -> 8Be + D). However, the first stage of ASTRA predicts 15 other reactions of this sort. These include:
3He + 6Li -> 7Be + D
3He + 7Li -> 8Be + D
4He + 10B -> 12C + D
3He + 11B -> 12C + D
3He + 13C -> 14N + D.
The first two of these reactions should take place in main-sequence stars, as 6Li and 7Li are known to exist there, yet we have not found either reaction in the literature that we examined. Again, astrophysicists would presumably ignore most of these reactions for their low reactant abundances.
4.3 Helium Synthesis by Hydrogen Burning
The transformation of hydrogen into helium, through a series of nuclear processes that take place in main-sequence stars, is the principal source of energy of such stars. The standard reaction chains given in astrophysics texts (e.g. Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) for helium synthesis in such stars are the hydrogen-burning processes called "proton-proton" or pp chains. The first of these chains is given as:
a. H + H -> D + /e + nu
b. D + H -> 3He
c. 3He + 3He -> 4He + H + H.
The net effect of this reaction chain, when reactions (a) and (b) each occur twice, is 4 H -> 4He + 2 nu + 26.72 MeV. Another hypothesized pathway, called the "alpha-catalyzed chain", is
d. 3He + 4He -> 7Be
e. 7Be + e -> 7Li + nu
f. H + 7Li -> 8Be
g. 8Be -> 4He + 4He
in which reactions b and c provide both the 3He and the 4He needed by reaction d. An alternative pathway, which also appears in texts, replaces reaction e with H + 7Be -> 8B and f with 8B -> 8Be + /e + nu, which produce the 8Be needed by the final reaction through a different mechanism. Astrophysicists refer to these three pathways as the pp1, pp2 and pp3 chains, respectively.
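The net effect quoted for the pp1 chain (reactions a to c) can be checked with a short bookkeeping sketch. The function name `net_effect`, the `(multiplicity, reactants, products)` encoding and the rounded atomic masses are assumptions made for illustration. Reaction (a) is written in ASTRA's neutral-atom form, H + H -> D + nu, so the positron does not appear in the balance.

```python
from collections import Counter

U_TO_MEV = 931.494  # MeV per atomic mass unit
# Rounded atomic masses in u (illustrative values).
MASS_U = {"h": 1.007825, "he4": 4.002603, "nu": 0.0}


def net_effect(steps):
    """Sum a chain of (multiplicity, reactants, products) steps and
    return the species consumed and produced overall."""
    balance = Counter()
    for times, reactants, products in steps:
        for r in reactants:
            balance[r] -= times
        for p in products:
            balance[p] += times
    consumed = {s: -n for s, n in balance.items() if n < 0}
    produced = {s: n for s, n in balance.items() if n > 0}
    return consumed, produced


PP1 = [
    (2, ("h", "h"), ("d", "nu")),            # reaction a, twice
    (2, ("d", "h"), ("he3",)),               # reaction b, twice
    (1, ("he3", "he3"), ("he4", "h", "h")),  # reaction c, once
]
consumed, produced = net_effect(PP1)

# Energy released: mass consumed minus mass produced, in MeV.
energy = U_TO_MEV * (sum(MASS_U[s] * n for s, n in consumed.items())
                     - sum(MASS_U[s] * n for s, n in produced.items()))
```

The intermediates D and 3He cancel, leaving four hydrogen atoms consumed and one 4He plus two neutrinos produced. With these rounded masses the energy comes out close to the 26.72 MeV quoted above.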
When asked to generate reaction chains from hydrogen to helium, the ASTRA system finds all of these reaction chains including the CNO cycle. Yet, ASTRA formulates another reaction chain which involves an electron capture, as in pp2:
H + H -> D + nu
D + H -> 3He
3He + 6Li -> 9B
e + 9B -> 9Be + nu
H + 9Be -> 4He + 6Li,
which has the same net effect. We did not see any record of this chain in the literature we examined. The program also finds 44 other processes of helium synthesis that differ in the last link of their chains. Many of these would be disregarded by astrophysicists for their small cross sections, including the chain:
H + H -> D + nu
D + 4He -> 6Li
H + 6Li -> 7Li + nu
H + 7Li -> 4He + 4He,
as D is believed to be quickly destroyed by the reaction D + H -> 3He. Both Cujec & Fowler (1980) and Harris, Fowler, Caughlan, and Zimmerman (1983) argue that the reactions involving D are unlikely due to its low abundance. However, Clayton (1983, pp. 371-2) notes that the density of deuterium in the interstellar medium and the sun remains unknown, and suggests that the substance might be more common than is usually believed.
4.4 Generation of Carbon and Oxygen
The origin and the relative abundance of carbon and oxygen have been among the main concerns of astrophysics. The standard account (e.g., Fowler, 1986, pp. 5-6) relies on the process of helium-burning, in which helium nuclei react to form carbon and oxygen in the following steps:
4He + 4He -> 8Be
4He + 8Be -> 12C
4He + 12C -> 16O.
However, there were theoretical problems with this account: the first reaction is endothermic and the lifetime of 8Be is very short (2 x 10^-16 s). Later calculations showed that 8Be resonances were sufficiently stable to allow the reaction with an alpha particle to produce carbon as in the second reaction. ASTRA does not formulate the reaction 4He + 4He -> 8Be because it is slightly endothermic, but the system finds 20 other reactions that produce 8Be, such as
D + 6Li -> 8Be
3He + 7Li -> 8Be + D
n + 7Be -> 8Be.
Once 8Be is available, 4He + 8Be -> 12C can take place exothermically, so ASTRA formulates this reaction. The system produces 24 additional chains that differ in their final steps to 12C. These include:
n + 8Be -> 9Be
4He + 9Be -> 12C + n,
which relies on one of the neutron capture reactions that we discussed earlier. An alternative and even more plausible pathway produced by ASTRA involves a proton capture:
H + 8Be -> 9Be + nu
4He + 9Be -> 12C + n.
Briefly, if 8Be captures a neutron or proton before it decays, then it transforms into its stable isotope 9Be. This in turn produces carbon by reacting with 4He, where the emitted neutron from the latter reaction can combine with another 8Be. Once 12C is formed, in whatever manner, it can react with 4He exothermically to produce oxygen:
4He + 12C -> 16O.
In summary, ASTRA finds a number of reaction chains to carbon and oxygen that do not appear in astrophysics literature, all of which are theoretically possible, but the final judgement about their scientific value requires further evaluation, as we discuss next.
5. Discussion of Results
We have carefully compared ASTRA's outputs, at both the reaction and pathway level, to those available in astrophysics texts (Clayton, 1983; Audouze & Vauclair, 1980; Kippenhahn & Weigert, 1994; Fowler et al., 1967, 1975, 1983; Cujec & Fowler, 1980). We have examined the results of the system only on exothermic reactions, but ASTRA can formulate reactions in any energy band.
Although ASTRA calculates the Q-values of all reactions that it formulates, the current version does not take into account the reaction rates which are used by astrophysicists in determining the more likely reactions and reaction chains. Due to this limitation, the current version cannot decide which reactions must be dominant in a given burning phase in the star. We are considering implementing this capability in future versions of the program, so that, given a stellar model (e.g., a model of the sun) and the reaction rates, it eliminates the low-rate reactions before constructing the reaction chains.
It would be impossible for astrophysicists even to formulate all theoretically possible reactions for an exhaustive study without a computational aid like ASTRA. For example, Fowler et al. (1967, 1975, 1983) cite 88 reactions for elements from H to 16O in their research, while ASTRA uses 276 such reactions. Among the 88 reactions, the same authors cite 33 H-capture, 17 n-capture and 8 D-fusion reactions, while ASTRA formulates 46 H-captures, 59 n-captures and 75 D-fusions for the same range of elements.
The ASTRA program can handle a very large volume of data for constructing reaction chains, and although the hydrogen and helium burning processes have been dealt with extensively in the current literature, there may still be room for research. We understand that there is even more room for research on the synthesis of the heavier elements. Therefore, a complete analysis of the reactions and pathways can only be carried out with the aid of a computational tool such as our program.
Despite its current limitations, the program still has the potential of being useful in several subfields of nuclear astrophysics. Although we tested ASTRA on the exothermic reactions of the light elements from H to 16O, the system can be used in exploring the reactions of heavier elements from oxygen to iron and further, which take place in stellar and interstellar processes.
Our research is continuing in three strands: first, we are evaluating the current results of the system; second, we plan to add the ability to use reaction rates to distinguish the more likely mechanisms; and third, we are improving its interface to make the system a more useful research aid for astrophysicists.
6. Conclusions
In this paper we described ASTRA, a computational tool which formulates nuclear reactions and pathways for researchers in astrophysics. Although we have been generally satisfied with ASTRA's performance to date, there clearly exist a number of directions in which we can extend our work. We are planning to present the program's predictions to domain experts to further evaluate the behavior of the current system in terms of the novelty and plausibility of its results. Our work is continuing to improve ASTRA, and to make it a more useful research tool for astrophysicists.
References
Audouze, J., & Vauclair, S. (1980). An introduction to nuclear astrophysics. Holland: D. Reidel.
Clayton, D.D. (1983). Principles of Stellar Evolution and Nucleosynthesis. Chicago: The University of Chicago Press.
Cujec, B. & Fowler, W.A. (1980). Neglect of D, T, and 3He in advanced stellar evolution. The Astrophysical Journal, 236: 658-660.
Kippenhahn, R. and Weigert, A. (1994). Stellar Structure and Evolution. London: Springer-Verlag.
Feigenbaum, E. A., Buchanan, B.G., Lederberg, J. (1971). On generality and problem solving: A case study using the DENDRAL program. In Machine Intelligence (vol. 6). Edinburgh: Edinburgh University Press.
Fowler, W.A. (1986). The synthesis of the chemical elements carbon and oxygen. In S.L. Shapiro & S.A. Teukolsky (Eds.), Highlights of modern astrophysics. New York: John Wiley & Sons.
Fowler, W.A., Caughlan, G.R., and Zimmerman, B.A. (1967). Thermonuclear Reaction Rates. Ann. Rev. Astron. Astrophysics, 5, 525-570.
Fowler, W.A., Caughlan, G.R., and Zimmerman, B.A. (1975). Thermonuclear Reaction Rates. Ann. Rev. Astron. Astrophysics, 13, 69-112.
Harris, M.J., Fowler, W.A., Caughlan, G.R., and Zimmerman, B.A. (1983). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 21: 165-176.
Hendrickson, J.B. (1995). Systematic synthesis design: The SYNGEN program. Working Notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 13-17). Stanford, CA: AAAI Press.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S., & Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. Working notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 87-92). Stanford, CA: AAAI Press.
Kulkarni, D., & Simon, H.A. (1990). Experimentation in machine discovery. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation. San Mateo, CA: Morgan Kaufmann.
Lang, K.R. (1974). Astrophysical formulae: A compendium for physicists and astrophysicists. New York: Springer-Verlag.
Langley, P., Simon, H.A., Bradshaw, G.L., & Zytkow, J.M. (1987). Scientific Discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.
Lenat, D. (1979). On automated scientific theory formation: a case study using the AM program. In: Hayes, J., Michie, D., and Mikulich, L.I., eds. Machine Intelligence 9, 251-283, Halstead: New York.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.
Valdes-Perez, R.E. (1994). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning, 17, 47-67.
Valdes-Perez, R.E. (1995). Machine discovery in chemistry: New results. Artificial Intelligence, 74, 191-201.
Valdes-Perez, R.E. (1997). Computer-aided mechanism elucidation of acetylene hydrocarboxylation to acrylic acid based on a novel union of empirical and formal methods. Organometallics, 16(14): 3114-3127.
Williams, W.S.C. (1991). Nuclear and Particle Physics. Oxford: Clarendon Press.
Zytkow, J.M., & Simon, H.A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Automated Formulation of Reactions and Pathways in Nuclear Astrophysics
AUTOMATED FORMULATION OF REACTIONS AND PATHWAYS IN NUCLEAR ASTROPHYSICS: NEW RESULTS
Sakir Kocabas
Department of Space Engineering
Abstract
This paper describes some of the new results from ASTRA, a knowledge based research aid for the formulation and analysis of process explanations in nuclear astrophysics. The program formulates valid fusion and decay reactions for the elements by using its knowledge of quantum theory, and from these reactions constructs all theoretically possible reaction chains as process explanations for the nucleosynthesis of elements. Earlier applications of ASTRA generated reactions of the elements and isotopes from hydrogen to oxygen, and found novel reactions that involve proton, electron and neutron capture for these elements, and a series of new reaction chains for hydrogen burning processes. We have recently extended the system’s knowledge base for elements from oxygen to sulphur. The new applications of ASTRA generated a series of reactions and pathways involving the heavier elements fluorine, neon, sodium, magnesium, aluminium, silicon and sulphur, some of which we did not see in the texts. The program also generated a complete series of carbon, nitrogen and oxygen burning reactions some of which may be of interest to astrophysicists.
Key words: Automated reasoning, scientific discovery.
1 Introduction
Computational design and construction of chemical and nuclear reaction processes have recently become an active area of research in computer aided scientific discovery. Three examples of such efforts are Hendrickson's (1995) SYNGEN, which designs the synthesis of some organic compounds from initial and intermediate compounds; Valdes-Perez's (1995) MECHEM, which has found new reaction pathways in physical chemistry (see also Zeigarnik, et al., 1997); and Kocabas and Langley’s (1998) ASTRA system, which has found new reactions and pathways in nuclear astrophysics. ASTRA was designed to support scientists in explaining various fusion processes, the nucleosynthesis of elements and their relative abundance in stars.
ASTRA differs from the earlier systems mainly in its focus on astrophysics, and in its ability to generate the basic reactions of the elements by using the principles of quantum physics. In this respect, the program is a successor of BR-4 (Kocabas & Langley, 1995), which carries out theory revision in particle physics much like its predecessor BR-3 (Kocabas, 1991). The BR-3 system in turn uses techniques and ideas from STAHL (Zytkow & Simon, 1986) and STAHLp (Rose & Langley, 1986), which modeled qualitative discovery in chemistry.
The behavior of ASTRA has been described elsewhere (see Kocabas & Langley, 1998; 1999), so the focus here will be on some of the new results of this program, with an emphasis on the system's abilities as a research tool in astrophysics.
2 Application Area: Nuclear Astrophysics
Nuclear astrophysics is a branch of astrophysics that is mainly concerned with the formation of heavier elements such as carbon (12C), nitrogen (14N) and oxygen (16O) from hydrogen (H) and helium (4He), through a series of fusion and decay processes in stars. Exploration of the processes in which the heavier elements from oxygen (16O) to iron (56Fe) participate is another main topic in this field.
Current astrophysical theories suggest that stars go through several stages in their lifetimes after their initial formation by the condensation of cosmic clouds and hydrogen gas. The first is called the “hydrogen burning stage”, during which stars radiate energy emitted by a series of exothermic fusion reactions in which hydrogen is transformed into helium. Astrophysicists propose several different pathways (Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) to account for hydrogen burning in stars. Later stages, depending on the size of the star, involve processes such as helium burning and carbon, nitrogen and oxygen burning.
Astrophysicists explain nucleosyntheses by first adopting a stellar model in thermal equilibrium which makes certain assumptions about the mass, temperature, density, and the element distribution in the star. They then formulate the nuclear reactions using the constraints of quantum physics. They also calculate the rates of these reactions, by using experimental and theoretical knowledge about nuclear cross-sections and reactant abundances. Theoretically, there are many possible reactions, and a great number of reaction pathways even for a small number of elements. Astrophysicists deal with this problem by deleting the less likely reactions and focusing their attention on the reactions with high rates.
In our previous work with ASTRA (Kocabas & Langley, 1998; 1999; Kocabas & Langley, in press), we examined the results of the program on several research topics in nuclear astrophysics. These were: 1) hydrogen-burning processes, 2) helium burning processes, 3) formation of heavier elements carbon, nitrogen and oxygen through hydrogen and helium burning, and other fusion chains, 4) the role of neutrons in such processes, and 5) the anomaly in the relative abundance of the light elements.
In evaluating the results of ASTRA, we have examined a number of books and journal papers on nuclear astrophysics, notably the following work: Audouze & Vauclair (1980); Clayton (1983); Fowler (1986); Fowler, et al., 1967; Fowler et al., 1975; Harris & Fowler, et al., 1983; Cujec & Fowler, 1980; Kippenhahn & Weigert (1994); Lang (1974); Williams (1991); and Adelberger, E.G., et al. (1998). We have also discussed the results of the system with experts in astrophysics.
In the next section, we describe ASTRA in terms of its inputs, outputs and operations. Section 4 describes the experimental results of ASTRA, Section 5 discusses these results, and Section 6 discusses related research. The paper ends with a summary of the conclusions.
3 The ASTRA System
We first briefly describe ASTRA’s inputs, outputs and operations, before turning to our application of the system to nuclear astrophysics with some of the earlier and the new results. A more detailed description can be found in Kocabas and Langley (1998). The program operates in two stages: the first generates all theoretically valid reactions, and the second produces reaction chains as process explanations for the nucleosynthesis of elements.
3.1 Formulating Reactions
The knowledge base of ASTRA includes descriptions for a set of elements and isotopes. The current version has information about 68 such entities. Each entity is characterized in terms of five quantum properties: rest mass (in MeV/c2), electric charge, spin, lepton, and baryon counts. ASTRA also has theoretical knowledge about conservation rules concerning the quantum values, which hold in reactions among the elements and isotopes. Typically, the exothermic reactions play the major role in stellar nucleosynthesis, but the program allows the selection of the energy band to assist more detailed study.
Based on this information, ASTRA systematically generates all collision and decay reactions among these elements that obey the conservation laws, together with their energy emissions, or Q-values, in terms of mega electron volts (MeV). The reactions generated by the program are in the form: Rm -> Pn , m = 1,2,3; n = 1,2,3 where Rm and Pn are the sets of the reacting and resulting elements respectively, and m and n are the number of elements in the sets. (For m = 1, m=2 and m=3 the formula represents decays, and double and triple collision reactions respectively). Examples of the runs of this module based on information about elements from hydrogen to oxygen can be found in Kocabas & Langley (1998).
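To give a feel for the size of the search space implied by the form Rm -> Pn, the following sketch enumerates every candidate pair of reactant and product multisets for a small set of entities. It illustrates the combinatorics only; the function name `candidate_reactions` and its parameters are assumptions, and ASTRA's own generation procedure may differ. Each candidate would still have to pass the conservation rules and the chosen energy band before being accepted as a reaction.

```python
from itertools import combinations_with_replacement


def candidate_reactions(entities, max_reactants=3, max_products=3):
    """Yield every (reactants, products) pair with 1..max_reactants
    reacting entities and 1..max_products resulting entities."""
    def sides(k_max):
        # All multisets of size 1..k_max drawn from the entities.
        return [combo
                for k in range(1, k_max + 1)
                for combo in combinations_with_replacement(entities, k)]

    for reactants in sides(max_reactants):
        for products in sides(max_products):
            if reactants != products:  # skip the trivial identity
                yield reactants, products


candidates = list(candidate_reactions(["h", "d", "he3"]))
```

Even three entities give 342 candidates under these limits, so filtering by conservation rules and energy band is what keeps the generated reaction set manageable.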
Here we describe the new results of ASTRA with information about the elements from hydrogen to sulphur, their isotopes and a few elementary particles like the electron, proton, neutron and the neutrino with their antiparticles, giving a total of 68 distinct entities. From these, the system generated more than 600 different reactions, but some were minor variations on one another. We eliminated such variants manually, leaving 472 reactions that included 344 fusion reactions and 28 decays.
3.2 Constructing Reaction Chains
In order to construct reaction chains, ASTRA’s second stage takes as input the reactions generated by its first stage, an element E whose syntheses we want explained, and a starting element (typically hydrogen). The system generates all reaction chains that lead from the starting element to the final element through the various reactions identified in the first stage. ASTRA’s mechanisms for constructing reaction chains have been described elsewhere (see Kocabas & Langley, 1998).
The program constructs a large number of reaction chains, most of which would be ruled out by physicists on grounds of low reaction rates. However, as a research aid, ASTRA provides a full range of possible reaction mechanisms to astrophysicists for a more complete analysis of the nuclear processes in their field of research.
4 The New Results of ASTRA
In this section we report the new results of our tests with ASTRA concerning hydrogen burning with heavier elements such as oxygen, fluorine, neon, sodium, magnesium, aluminium, silicon and phosphorus. We begin with three classes of reactions that are believed to play an important role in stellar nucleosyntheses: proton, electron and neutron captures. We then turn to processes of helium, carbon and oxygen burning which explain the synthesis of heavier elements.
4.1 Proton, Electron and Neutron Captures
Three main processes for the nucleosynthesis of elements in stellar systems are proton, electron and neutron captures in which a nucleus reacts with a proton, electron or a neutron, giving a heavier element or isotope.
Proton captures are an important class of exothermic reactions that take part in hydrogen burning processes. In astrophysics literature we have found 33 examples of proton captures (e.g., Fowler, et al., 1967, 1975, 1983) for elements from hydrogen to oxygen (16O), and 20 more for elements from oxygen to sulphur (32S).
ASTRA’s first stage predicts that all elements from hydrogen to sulphur (32S), with the exception of 4He, participate in exothermic proton capture. The program produces 46 such reactions for elements from hydrogen to oxygen, including all 33 examples we have found in texts, but also 13 others which we have not seen in the astrophysics texts that we examined. The program also finds 72 proton captures for elements from oxygen (16O) to sulphur (32S), including the 20 such reactions cited in the same literature.
In electron capture reactions, an electron is absorbed by the atomic nucleus to be transformed into one with a smaller atomic number. ASTRA’s first stage produces 6 electron capture reactions for elements from hydrogen to oxygen of which only one appears in astrophysics texts. The program also found 8 electron capture reactions for elements from oxygen to sulphur, none of which we have seen in the texts.
In neutron capture, an element combines with a neutron to form a heavier isotope of the same element. We found 17 neutron captures for elements from hydrogen to oxygen in the literature, while ASTRA predicts 59 such reactions that are theoretically possible for the same elements. Some examples of these reactions can be found in Kocabas and Langley (1998). Recent runs of the system generated 76 reactions for elements from oxygen to sulphur, none of which we have seen in the texts we have examined.
4.2 Hydrogen Burning Processes
In main-sequence stars, hydrogen is transformed into helium in a series of nuclear reaction chains called hydrogen burning processes. These processes are the main source of energy for such stars. The standard processes given in astrophysics texts (e.g. Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) for helium synthesis in such stars are called “proton-proton” or pp chains. Other hydrogen burning reactions that appear in texts involve the heavier elements carbon, nitrogen and oxygen, and the pathway is called the CNO-chain.
When asked to generate reaction chains from hydrogen to helium, the ASTRA system finds all of these reaction chains including the CNO cycle. Yet, ASTRA also produces a viable variant of the CNO cycle using the electron capture of 13N (see, Kocabas & Langley, 1998).
Recently, we have run ASTRA on hydrogen burning reactions involving the elements heavier than oxygen. Such reactions are hypothesized to occur in stars larger than the sun. Some of the hydrogen burning chains that the program found, involving the elements fluorine, neon, sodium, magnesium, silicon, phosphorus and sulphur are:
H + 16O -> 17O + nu
H + 17O -> 18F
H + 18F -> 19Ne
19Ne + e -> 19F + e + nu (e-capture)
H + 19F -> 16O + 4He
------------------------------
Cumulative account: 4 H -> 4He + 2 nu
H + 23Na -> 24Mg
H + 24Mg -> 25Mg + nu
H + 25Mg -> 26Al
H + 26Al -> 27Si
27Si + e -> 27Al + e + nu
H + 27Al -> 24Mg + 4He
---------------------------------
H + 28Si -> 29Si + nu
H + 29Si -> 30P
H + 30P -> 31S
31S -> 31P + nu
H + 31P -> 28Si + 4He
----------------------------
ASTRA produces many more alternatives to these reactions, providing a complete framework to be examined by researchers in this field.
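The cumulative account of a chain can be checked mechanically by summing all reactants and all products over the steps and cancelling the species that appear on both sides. The following Python fragment is only an illustrative sketch of that bookkeeping, not part of ASTRA; the function names are hypothetical, and the reactions are copied from the first and third chains listed above.

```python
from collections import Counter


def parse_side(side):
    """Turn 'H + 16O' into a multiset of species names."""
    return Counter(term.strip() for term in side.split("+"))


def net_reaction(chain):
    """Sum a chain of reactions and cancel species found on both sides."""
    consumed, produced = Counter(), Counter()
    for step in chain:
        left, right = step.split("->")
        consumed += parse_side(left)
        produced += parse_side(right)
    # Counter subtraction keeps only the positive remainders.
    return consumed - produced, produced - consumed


# First chain from the text (oxygen-fluorine-neon cycle).
oxygen_chain = [
    "H + 16O -> 17O + nu",
    "H + 17O -> 18F",
    "H + 18F -> 19Ne",
    "19Ne + e -> 19F + e + nu",
    "H + 19F -> 16O + 4He",
]

# Third chain from the text (silicon-phosphorus-sulphur cycle).
silicon_chain = [
    "H + 28Si -> 29Si + nu",
    "H + 29Si -> 30P",
    "H + 30P -> 31S",
    "31S -> 31P + nu",
    "H + 31P -> 28Si + 4He",
]

for chain in (oxygen_chain, silicon_chain):
    net_in, net_out = net_reaction(chain)
    print(dict(net_in), "->", dict(net_out))
```

Both chains reduce to four hydrogen atoms in and one 4He plus two neutrinos out, with the heavier nuclei acting only as catalysts.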
4.3 Helium Burning Processes
One of the main concerns of astrophysics has been the origin and the relative abundance of carbon and oxygen. The standard account (e.g., Fowler, 1986, pp. 5-6) assumes the reaction of helium nuclei to form carbon and oxygen. Earlier runs of ASTRA produced 24 additional chains that differ in their final steps to 12C. These include:
n + 8Be -> 9Be
4He + 9Be -> 12C + n,
which relies on a neutron capture reaction. Astrophysicists described this process as one that can compete with the standard account in explosive stars that produce many neutrons. We have discussed ASTRA’s results on the nucleosynthesis of carbon and oxygen, including the related helium burning processes, in more detail elsewhere (Kocabas & Langley, 1998; Kocabas & Langley, in press). We therefore turn to the new results of the program on this issue.
ASTRA finds 24 helium burning reactions involving the range of elements from oxygen to silicon, including the 16 such reactions cited in the texts. Some of these are:
4He + 16O -> 20Ne + 5.16
4He + 20Ne -> 24Mg + 9.3
4He + 23Na -> 27Al + 10.2
4He + 24Mg -> 28Si + 10.1
4He + 28Si -> 32S + 6.9
where the figures on the right represent the energy emissions in MeV.
A comparison of the helium burning reactions produced by ASTRA with the natural abundances of the elements from oxygen to sulphur in the CRC Handbook (Weast & Astle, 1981) reveals an interesting result: the elements fluorine, neon, sodium, magnesium, silicon, phosphorus and sulphur in the solar system must have been formed by helium burning processes rather than by neutron captures. This is because the stable isotope abundances of these elements run parallel to the stepwise alpha capture (helium burning) of the stable lighter isotopes of the elements in the series. This matter deserves further analysis.
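The stepwise alpha capture mentioned above follows a simple rule: each capture of a 4He nucleus raises the mass number by 4 and the atomic number by 2. As an illustration only, the short Python sketch below generates the ladder from 16O to 32S from that rule. The function name and the small symbol table are assumptions made for this example, and the sketch says nothing about energies or abundances.

```python
# Element symbols for the even atomic numbers from oxygen to sulphur.
SYMBOLS = {8: "O", 10: "Ne", 12: "Mg", 14: "Si", 16: "S"}


def alpha_ladder(mass_number, atomic_number, last_atomic_number):
    """List successive alpha captures: A increases by 4, Z increases by 2."""
    steps = []
    a, z = mass_number, atomic_number
    while z + 2 <= last_atomic_number:
        steps.append(f"4He + {a}{SYMBOLS[z]} -> {a + 4}{SYMBOLS[z + 2]}")
        a, z = a + 4, z + 2
    return steps


ladder = alpha_ladder(16, 8, 16)
for step in ladder:
    print(step)
```

The four reactions printed are the even-numbered members of the list given above; the 23Na to 27Al step obeys the same rule starting from an odd atomic number.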
4.4 Carbon, Nitrogen and Oxygen Burning
Carbon burning takes place after the helium burning stage in a star. ASTRA finds four carbon burning reactions, which produce the elements neon, sodium and magnesium. Three of these are:
12C + 12C -> 24Mg + 14.4
12C + 12C -> H + 23Na + 2.72
12C + 12C -> 4He + 20Ne + 5.1
In nitrogen burning, two nitrogen nuclei fuse to form elements ranging from oxygen to silicon. ASTRA finds 10 such reactions, two of which are:
14N + 14N -> 28Si + 27.82
14N + 14N -> 12C + 16O + 10.46
Finally, ASTRA formulates six oxygen burning reactions in which the elements magnesium, silicon, phosphorus and sulphur are generated. Three of these reactions are:
16O + 16O -> 32S + 17.12
16O + 16O -> n + 31S + 2.05
16O + 16O -> 8Be + 24Mg + 0.02
Carbon, nitrogen and oxygen burning reactions happen only in massive stars as they require higher energies to initiate. The astrophysics texts that we examined mention only a few of these reactions, such as 12C + 12C -> 24Mg, 14N + 14N -> 28Si, and 16O + 16O -> 32S, while ASTRA provides a full account of such reactions.
5 Discussion of Results
We have discussed some of the results of ASTRA with astrophysicists and carefully compared its outputs to those available in astrophysics texts (Clayton, 1983; Audouze & Vauclair, 1980; Kippenhahn & Weigert, 1994; Fowler et al., 1967, 1975; Harris et al., 1983; Cujec & Fowler, 1980; Adelberger et al., 1998). We received some encouraging comments from domain experts about the results of the program, but more detailed analysis is needed before we can make claims of originality.
Earlier we examined the results of ASTRA only on exothermic reactions. Following discussions with domain experts, we have improved the system to formulate reactions in any selected energy band. Under certain stellar conditions, some endothermic reactions can help speed up certain nuclear processes.
The current version of ASTRA does not calculate reaction rates, which astrophysicists use to determine the more likely reactions and reaction chains. Astrophysicists suggested that this feature would be very useful in a research tool like ASTRA. However, the current version can receive as input the rate values for each reaction it has formulated and, by deleting those with low rates, can effectively eliminate a large number of slow reaction chains. We are considering implementing this capability fully in future versions of the program with the help of domain experts.
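The pruning step described above can be pictured as a simple filter. The Python sketch below is a hypothetical illustration, not ASTRA's code: the reaction labels r1 to r3, the rate numbers and the threshold are placeholders invented for the example and carry no physical meaning.

```python
def prune_by_rate(chains, rates, min_rate):
    """Keep only chains in which every reaction meets the rate threshold."""
    return [
        chain
        for chain in chains
        if all(rates[reaction] >= min_rate for reaction in chain)
    ]


# Placeholder rates in arbitrary units (not measured or computed values).
rates = {"r1": 1.0, "r2": 0.5, "r3": 1e-9}

# Two candidate chains that share their first reaction.
candidate_chains = [["r1", "r2"], ["r1", "r3"]]

kept = prune_by_rate(candidate_chains, rates, min_rate=1e-6)
print(kept)
```

Because a chain is only as fast as its slowest step, removing a single slow reaction discards every chain that depends on it.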
On the other hand, without a computational aid like ASTRA, it would be impossible for astrophysicists even to formulate all the theoretically possible reactions for exhaustive research. The program can handle a very large volume of data for constructing reaction chains, and although the hydrogen and helium burning processes have been dealt with extensively for lighter elements in the current literature, there is still much scope for research on the processes of the heavier elements. A complete analysis of the reactions and pathways can only be carried out with the aid of a computational tool such as our program.
Although we tested ASTRA on the reactions of the elements from hydrogen (H) to sulphur (32S), the system can also be used to explore the reactions of heavier elements from sulphur to iron (56Fe) and beyond, which take place in stellar and interstellar processes.
While we are in the process of evaluating the new results of the system, we are also planning to add the ability to use the rates of the reactions to distinguish more likely mechanisms, and finally, we are improving its interface to make the system a more useful research aid for astrophysicists.
6 Conclusions
In this paper we described the new results of ASTRA, a computational tool which formulates reactions and pathways for researchers in nuclear astrophysics. We received encouraging comments from astrophysicists about the earlier results of the program, and suggestions on how to further improve its features. We continue to collaborate with domain experts to evaluate the behavior of the current system in terms of the novelty and plausibility of its latest results, and to improve ASTRA’s functionality to make it a more useful research tool for astrophysicists.
References
Adelberger, E.G., et al. (1998). Solar fusion cross sections. Reviews of Modern Physics, 70(4), 1266-1291.
Audouze, J., & Vauclair, S. (1980). An introduction to nuclear astrophysics. Holland: D. Reidel.
Clayton, D.D. (1983). Principles of Stellar Evolution and Nucleosynthesis. Chicago: The University of Chicago Press.
Cujec, B., & Fowler, W.A. (1980). Neglect of D, T, and 3He in advanced stellar evolution. The Astrophysical Journal, 236, 658-660.
Fowler, W.A. (1986). The synthesis of the chemical elements carbon and oxygen. In S.L. Shapiro & S.A. Teukolsky (Eds.), Highlights of modern astrophysics. New York: John Wiley & Sons.
Fowler, W.A., Caughlan, G.R., & Zimmermann, B.A. (1967). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 5, 525-570.
Fowler, W.A., Caughlan, G.R., & Zimmermann, B.A. (1975). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 13, 69-112.
Harris, M.J., Fowler, W.A., Caughlan, G.R., & Zimmermann, B. (1983). Thermonuclear reaction rates. Ann. Rev. Astron. Astrophysics, 21, 165-176.
Hendrickson, J.B. (1995). Systematic synthesis design: The SYNGEN program. Working Notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 13-17). Stanford, CA: AAAI Press.
Kippenhahn, R., & Weigert, A. (1994). Stellar Structure and Evolution. London: Springer-Verlag.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S., & Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. Working notes of the AAAI Spring Symposium on Systematic Methods of Scientific Discovery (pp. 87-92). Stanford, CA: AAAI Press.
Kocabas, S., & Langley, P. (1998). Generating process explanations in nuclear astrophysics. Proceedings of the ECAI-98 Workshop on Machine Discovery (pp. 4-9), Brighton, UK.
Kocabas, S., & Langley, P. (1999). Automated formulation of reactions and reaction chains in nuclear astrophysics. Proceedings of the 8th Turkish Symposium of Artificial Intelligence and Neural Networks (pp. 247-256). Boğaziçi University.
Kocabas, S. & Langley, P. (in press). Computer generation of process explanations in nuclear astrophysics. International Journal of Human-Computer Studies.
Lang, K.R. (1974). Astrophysical formulae: A compendium for physicists and astrophysicists. New York: Springer-Verlag.
Rose, D., & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-451.
Valdes-Perez, R.E. (1995). Machine discovery in chemistry: New results. Artificial Intelligence, 74, 191-201.
Weast, R.C., & Astle, M.J. (Eds.). (1981). CRC handbook of chemistry and physics (62nd ed.). Florida: CRC Press.
Williams, W.S.C. (1991). Nuclear and Particle Physics. Oxford: Clarendon Press.
Zeigarnik, A.V., Valdes-Perez, R.E., Temkin, O.N., Bruk, L.G., & Shalgunov, S.I. (1997). Computer-aided mechanism elucidation of acetylene hydrocarboxylation to acrylic acid based on a novel union of empirical and formal methods. Organometallics, 16(14), 3114-3127.
Zytkow, J.M., & Simon, H.A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Sakir Kocabas
Department of Space Engineering
Abstract
This paper describes some of the new results from ASTRA, a knowledge-based research aid for the formulation and analysis of process explanations in nuclear astrophysics. The program formulates valid fusion and decay reactions for the elements by using its knowledge of quantum theory, and from these reactions constructs all theoretically possible reaction chains as process explanations for the nucleosynthesis of elements. Earlier applications of ASTRA generated reactions of the elements and isotopes from hydrogen to oxygen, and found novel reactions that involve proton, electron and neutron capture for these elements, as well as a series of new reaction chains for hydrogen burning processes. We have recently extended the system’s knowledge base to cover elements from oxygen to sulphur. The new applications of ASTRA generated a series of reactions and pathways involving the heavier elements fluorine, neon, sodium, magnesium, aluminium, silicon and sulphur, some of which we did not see in the texts. The program also generated a complete series of carbon, nitrogen and oxygen burning reactions, some of which may be of interest to astrophysicists.
Key words: Automated reasoning, scientific discovery.
1 Introduction
Computational design and construction of chemical and nuclear reaction processes have recently become an active area of research in computer-aided scientific discovery. Three examples of such efforts are Hendrickson's (1995) SYNGEN, which designs the synthesis of some organic compounds from initial and intermediate compounds; Valdes-Perez's (1995) MECHEM, which has found new reaction pathways in physical chemistry (see also Zeigarnik et al., 1997); and Kocabas and Langley’s (1998) ASTRA system, which has found new reactions and pathways in nuclear astrophysics. ASTRA was designed to support scientists in explaining various fusion processes, the nucleosynthesis of elements and their relative abundance in stars.
ASTRA differs from the earlier systems mainly in its focus on astrophysics, and in its ability to generate the basic reactions of the elements by using the principles of quantum physics. In this respect, the program is a successor of BR-4 (Kocabas & Langley, 1995), which carries out theory revision in particle physics much like its predecessor BR-3 (Kocabas, 1991). The BR-3 system in turn uses techniques and ideas from STAHL (Zytkow & Simon, 1986) and STAHLp (Rose & Langley, 1986), which modeled qualitative discovery in chemistry.
The behavior of ASTRA has been described elsewhere (see Kocabas & Langley, 1998; 1999), so the focus here is on some of the new results of the program, with an emphasis on the system's abilities as a research tool in astrophysics.
2 Application Area: Nuclear Astrophysics
Nuclear astrophysics is a branch of astrophysics that is mainly concerned with the formation of heavier elements such as carbon (12C), nitrogen (14N) and oxygen (16O) from hydrogen (H) and helium (4He), through a series of fusion and decay processes in stars. Exploration of the processes in which the heavier elements from oxygen (16O) to iron (56Fe) participate is another main topic in this field.
Current astrophysical theories suggest that stars go through several stages in their lifetimes after their initial formation by the condensation of cosmic clouds and hydrogen gas. The first is called the “hydrogen burning stage”, during which stars radiate energy emitted by a series of exothermic fusion reactions in which hydrogen is transformed into helium. Astrophysicists propose several different pathways (Audouze & Vauclair, 1980, p. 52; Williams, 1991, p. 351) to account for hydrogen burning in stars. Later stages, depending on the size of the star, involve processes such as helium burning and carbon, nitrogen and oxygen burning.
Astrophysicists explain nucleosyntheses by first adopting a stellar model in thermal equilibrium which makes certain assumptions about the mass, temperature, density, and the element distribution in the star. They then formulate the nuclear reactions using the constraints of quantum physics. They also calculate the rates of these reactions, by using experimental and theoretical knowledge about nuclear cross-sections and reactant abundances. Theoretically, there are many possible reactions, and a great number of reaction pathways even for a small number of elements. Astrophysicists deal with this problem by deleting the less likely reactions and focusing their attention on the reactions with high rates.
In our previous work with ASTRA (Kocabas & Langley, 1998; 1999; Kocabas & Langley, in press), we examined the results of the program on several research topics in nuclear astrophysics. These were: 1) hydrogen burning processes, 2) helium burning processes, 3) formation of the heavier elements carbon, nitrogen and oxygen through hydrogen and helium burning and other fusion chains, 4) the role of neutrons in such processes, and 5) the anomaly in the relative abundance of the light elements.
In evaluating the results of ASTRA, we have examined a number of books and journal papers on nuclear astrophysics, notably the following: Audouze & Vauclair (1980); Clayton (1983); Fowler (1986); Fowler et al. (1967, 1975); Harris et al. (1983); Cujec & Fowler (1980); Kippenhahn & Weigert (1994); Lang (1974); Williams (1991); and Adelberger et al. (1998). We have also discussed the results of the system with experts in astrophysics.
In the next section, we describe ASTRA in terms of its inputs, outputs and operations. Section 4 describes the experimental results of ASTRA, Section 5 discusses these results, and Section 6 summarizes our conclusions.
3 The ASTRA System
We first briefly describe ASTRA’s inputs, outputs and operations, and then turn to our application of the system to nuclear astrophysics, with some of the earlier and the new results. A more detailed description can be found in Kocabas and Langley (1998). The program operates in two stages: the first generates all theoretically valid reactions, and the second produces reaction chains as process explanations for the nucleosynthesis of elements.
3.1 Formulating Reactions
The knowledge base of ASTRA includes descriptions for a set of elements and isotopes. The current version has information about 68 such entities. Each entity is characterized in terms of five quantum properties: rest mass (in MeV/c^2), electric charge, spin, lepton count and baryon count. ASTRA also has theoretical knowledge about the conservation rules for these quantum values, which hold in reactions among the elements and isotopes. Typically, exothermic reactions play the major role in stellar nucleosynthesis, but the program allows the energy band to be selected to assist more detailed study.
Based on this information, ASTRA systematically generates all collision and decay reactions among these elements that obey the conservation laws, together with their energy emissions, or Q-values, in mega electron volts (MeV). The reactions generated by the program are of the form Rm -> Pn, with m = 1, 2, 3 and n = 1, 2, 3, where Rm and Pn are the sets of reacting and resulting elements respectively, and m and n are the numbers of elements in those sets. (For m = 1, 2 and 3, the formula represents decays, double collisions and triple collisions respectively.) Examples of the runs of this module based on information about elements from hydrogen to oxygen can be found in Kocabas & Langley (1998).
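The kind of test applied to each candidate reaction can be sketched in a few lines of Python. This is an illustration under simplifying assumptions and not ASTRA's implementation: the class and function names are hypothetical, spin is left out because angular momentum conservation needs more than a simple sum, and the masses are standard rounded values for four elementary particles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    mass: float   # rest mass in MeV/c^2
    charge: int   # electric charge in units of e
    lepton: int   # lepton count
    baryon: int   # baryon count


# Rounded textbook values; the neutrino mass is taken as zero.
NEUTRON = Entity("n", 939.565, 0, 0, 1)
PROTON = Entity("p", 938.272, 1, 0, 1)
ELECTRON = Entity("e-", 0.511, -1, 1, 0)
ANTINEUTRINO = Entity("anti-nu", 0.0, 0, -1, 0)


def conserves(reactants, products):
    """Check that charge, lepton count and baryon count balance."""
    for prop in ("charge", "lepton", "baryon"):
        total_in = sum(getattr(x, prop) for x in reactants)
        total_out = sum(getattr(x, prop) for x in products)
        if total_in != total_out:
            return False
    return True


def q_value(reactants, products):
    """Energy released in MeV: mass of reactants minus mass of products."""
    return sum(x.mass for x in reactants) - sum(x.mass for x in products)


beta_decay = ([NEUTRON], [PROTON, ELECTRON, ANTINEUTRINO])
no_neutrino = ([NEUTRON], [PROTON, ELECTRON])

print(conserves(*beta_decay), round(q_value(*beta_decay), 3))
print(conserves(*no_neutrino))
```

Neutron decay with an antineutrino passes all three checks and is exothermic (positive Q-value), whereas the variant without the antineutrino is rejected because the lepton count does not balance.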
Here we describe the new results of ASTRA with information about the elements from hydrogen to sulphur, their isotopes and a few elementary particles like the electron, proton, neutron and the neutrino with their antiparticles, giving a total of 68 distinct entities. From these, the system generated more than 600 different reactions, but some were minor variations on one another. We eliminated such variants manually, leaving 472 reactions that included 344 fusion reactions and 28 decays.
3.2 Constructing Reaction Chains
To construct reaction chains, ASTRA’s second stage takes as input the reactions generated by its first stage, an element E whose synthesis we want explained, and a starting element (typically hydrogen). The system generates all reaction chains that lead from the starting element to the final element through the various reactions identified in the first stage. ASTRA’s mechanisms for constructing reaction chains have been described elsewhere (see Kocabas & Langley, 1998).
The program constructs a large number of reaction chains, most of which would be ruled out by physicists on grounds of low reaction rates. However, as a research aid, ASTRA provides astrophysicists with a full range of possible reaction mechanisms for a more complete analysis of the nuclear processes in their field of research.
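A much simplified picture of chain construction is a depth-first search over the reaction list. The Python sketch below is an assumption-laden illustration and not ASTRA's actual mechanism, which is described in the cited papers: the function name, the linking rule (each step must consume a product of the previous step) and the five-reaction toy input are all choices made for this example.

```python
def find_chains(reactions, start, target, max_steps=6):
    """Enumerate linked reaction chains leading from `start` to `target`."""
    chains = []

    def extend(chain, pool, last_products):
        if len(chain) >= max_steps:
            return
        for reaction in reactions:
            reactants, products = reaction
            if reaction in chain:
                continue  # use each reaction at most once per chain
            if not set(reactants) <= pool:
                continue  # some reactant has not been produced yet
            if chain and not set(reactants) & last_products:
                continue  # step must consume a product of the previous step
            if target in products:
                chains.append(chain + [reaction])
            else:
                extend(chain + [reaction], pool | set(products), set(products))

    extend([], {start}, set())
    return chains


# Toy input: (reactants, products) pairs; D denotes deuterium.
REACTIONS = [
    (("H", "H"), ("D", "e+", "nu")),
    (("H", "D"), ("3He",)),
    (("3He", "3He"), ("4He", "H", "H")),
    (("3He", "4He"), ("7Be",)),
    (("D", "D"), ("4He",)),
]

chains = find_chains(REACTIONS, "H", "4He")
for chain in chains:
    print(" ; ".join(" + ".join(r) + " -> " + " + ".join(p) for r, p in chain))
```

Even with five reactions the search returns two distinct routes from hydrogen to 4He, which hints at how quickly the number of chains grows with several hundred reactions.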
4 The New Results of ASTRA
In this section we report the new results of our tests with ASTRA concerning hydrogen burning with heavier elements such as oxygen, fluorine, neon, sodium, magnesium, aluminium, silicon and phosphorus. We begin with three classes of reactions that are believed to play an important role in stellar nucleosyntheses: proton, electron and neutron captures. We then turn to the processes of helium, carbon, nitrogen and oxygen burning, which explain the synthesis of heavier elements.
4.1 Proton, Electron and Neutron Captures
Three main processes for the nucleosynthesis of elements in stellar systems are proton, electron and neutron captures, in which a nucleus reacts with a proton, an electron or a neutron, giving a new element or isotope.
Proton captures are an important class of exothermic reactions that take part in hydrogen burning processes. In the astrophysics literature we have found 33 examples of proton captures (e.g., Fowler et al., 1967, 1975; Harris et al., 1983) for elements from hydrogen to oxygen (16O), and 20 more for elements from oxygen to sulphur (32S).
ASTRA’s first stage predicts that all elements from hydrogen to sulphur (32S), with the exception of 4He, participate in exothermic proton capture. The program produces 46 such reactions for elements from hydrogen to oxygen, including all 33 examples we have found in the texts and 13 others that we have not seen in the astrophysics texts we examined. The program also finds 72 proton captures for elements from oxygen (16O) to sulphur (32S), including the 20 such reactions cited in the same literature.
AI and Philosophy of Science
Sakir Kocabas*
Istanbul Technical University
Department of Space Sciences and Technology
Abstract
Recent research in the computational study of scientific discovery has revealed a number of critical aspects of science overlooked by the conventional philosophical study. A series of computational models developed by AI scientists to study the different aspects of historical discovery indicates that hypothesis formation, testing and verification are only a small part of scientific research. This paper investigates, from an AI perspective, scientific creativity, the processes of scientific research, the dimensions of scientific research, and the role of knowledge in scientific research.
------------
* Also affiliated with: Department of AI, Marmara Research Center, PK 21, Gebze, Turkey.
1. Introduction
Scientific discovery and creativity have, in the last fifteen years, become one of the special concerns of artificial intelligence (AI). Within this period, a number of research papers and two important books have appeared on scientific discovery (see Langley, Simon, Bradshaw, & Zytkow, 1987; Shrager & Langley, 1990). Several other closely related publications have also appeared. These include one on the computational philosophy of science (Thagard, 1988), one on theory revision in science (Darden, 1991), and another on creativity (Boden, 1990).
Langley et al.'s (1987) work posed the first serious challenge to the conventional study of science by proposing that, far from being mysterious and unexplainable, scientific discovery (and by implication, scientific creativity) can be explained in terms of a series of processes. Their work also described several computational models in support of the authors' view. Shrager and Langley's (1990) later study introduced new methods for the study of scientific development, and explained how the methods of the computational study of science were superior to those of conventional philosophical studies. Boden's (1990) work, on the other hand, extended some of these views and discussed, from a cognitive scientist's perspective, how creativity in arts and literature, as well as in science, could be studied more systematically within a computational context.
However, previous work leaves some important issues in discovery untouched, such as the elements of scientific creativity, the types of scientific discovery and creativity, and the dimensions of scientific research. In this study, we examine the basic cognitive concepts of creativity, and describe how these concepts are interrelated, and then discuss the role of background knowledge and the kinds of knowledge necessary for scientific research. Finally, we discuss the types of scientific discovery and the elements of scientific research, and conclude with a summary.
2. Intelligence and Creativity in Science
Creativity and intelligence are closely related concepts, so that any attempt that brings clarity to one concept will help to define the other. AI scientists rely on computational terms in their definitions. Lenat and Feigenbaum (1987) define intelligence in terms of "search", as the power to find a solution to a problem in a large search space. Later, Feigenbaum defined intelligence in terms of "knowledge assembly" rather than "search" (see Engelmore & Morgan, 1988, vii). According to this definition, an intelligent system has the ability to assemble the necessary body of knowledge to conduct a complex task.
However, these definitions do not capture the complexity of the concept of intelligence. A more detailed definition has been given by Hayes-Roth (1993) within the context of intelligent agents, where the author discusses the agent characteristics in three system components: perception, cognition and action. Accordingly, each component must be capable of operating independently in a coherent way. Additionally, each component must meet a series of criteria in order to be called intelligent. These criteria require that an intelligent system be capable of perceiving, thinking and acting in real time, asynchronously, selectively, coherently, flexibly, responsively, robustly and in a timely manner. It must also be capable of developing its abilities by adaptation and learning. Many AI scientists discuss learning at two levels, symbolic and subsymbolic, and classify symbolic learning into several types: rote learning, learning by instruction, inductive learning, deductive learning, and learning by analogy.
Creativity can also be classified into different types. Accordingly, a distinction can be made between scientific creativity and other types of creativity such as artistic, architectural, musical and literary creativity. The former may involve the discovery of a new substance, the invention of a new mechanism or method, or the construction of a new model of reality (a hypothesis or a theory). The latter, however, mostly manifests itself as a work of art or a new style, and the term "creativity" is usually associated with this type.
Scientific creativity can be distinguished from other forms of creativity, such as those in the arts, music and literature, by its extensive reliance on background knowledge and experience in history. This may explain why we do not see child prodigies in creative science as we do in music and the arts. Therefore, when we talk about scientific creativity, it is to be understood within this perspective.
Scientific creativity can be investigated through five basic cognitive and computational concepts:
1) Motivation for scientific research.
2) Ability to correctly formulate research problems within a body of knowledge.
3) Ability to create a comprehensive search space for the solution of a scientific problem.
4) Ability to assemble (or induce) and implement a set of heuristics to reduce the search space.
5) Patience and stamina for the exhaustive search for solving the selected scientific problem within the constrained search space.
Fig. 1 summarizes the links between these concepts. Any missing link between them can hinder scientific creativity.
Motivation for Scientific Research --> Formulate Research Problems --> Generate Search Space --> Reduce Search Space --> Conduct Exhaustive Search
Fig. 1. Problem formulation and search in scientific discovery.
As indicated in the above list, research motivation tops the requirements for scientific creativity. Motivation itself can be dependent on basic psychological needs. Various types of human motivation have been studied by psychologists in the last five decades (see, e.g., Maslow, 1966). Metaphysical commitments and ontological assumptions about the world may also affect motivation (see, e.g. Kuhn, 1970, p.41). This is an important issue, but is outside the scope of this study.
Problem formulation is the second major issue in scientific research. In modern scientific research, access to a large and systematic body of knowledge is necessary for correctly formulating scientific problems. The accurate formulation of research problems requires a mastery of the conceptual structure of the field of science involved. The creative scientist can change this structure to reformulate a research problem in his/her search for a solution. In some cases, changes in the conceptual structure involve the most fundamental concepts and principles, such as time and measurability in physics. Changing representations, on the other hand, provides alternative views of the problem space, and is considered one of the most influential parameters of creativity in science (see, e.g., Simon, 1992; Karmiloff-Smith, 1990).
Extensive knowledge may also be used in creating a comprehensive search space for the selected research problem. The search space is then reduced to a manageable size by selecting and applying appropriate search strategies, methods and heuristics. This is necessary to reach a solution within acceptable limits of time and resources. Once the problem is defined and constrained, exhaustive search needs to be carried out within the search space, until a conclusion is reached about the solution of the scientific problem. Scientific creativity exhibits itself during the completion of a series of research tasks. Different types of knowledge may be used for each task, as will be explained next.
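The generate, reduce and search sequence described above can be sketched in a few lines of Python. This is only an illustrative toy and not one of the computational models discussed in this paper: the digit-triple problem, the `heuristics` predicates and the function names are all hypothetical, chosen to show how heuristics shrink a space before exhaustive search.

```python
from itertools import product

def generate_search_space(elements, size):
    """Enumerate every ordered combination of `size` elements (the full space)."""
    return list(product(elements, repeat=size))

def reduce_search_space(space, heuristics):
    """Keep only the candidates accepted by every heuristic predicate."""
    return [c for c in space if all(h(c) for h in heuristics)]

def exhaustive_search(space, is_solution):
    """Test every remaining candidate and return all that solve the problem."""
    return [c for c in space if is_solution(c)]

# Hypothetical toy problem: find triples of digits that sum to 15.
space = generate_search_space(range(10), 3)   # 10 * 10 * 10 candidates
heuristics = [
    lambda c: c[0] < c[1] < c[2],             # ignore permutations and repeated digits
    lambda c: c[2] >= 5,                      # the largest digit must be at least 5
]
reduced = reduce_search_space(space, heuristics)
solutions = exhaustive_search(reduced, lambda c: sum(c) == 15)
print(len(space), len(reduced), len(solutions))
```

The heuristics here only discard candidates that cannot be new solutions, so the exhaustive search over the reduced space loses nothing. Heuristics used in actual research need not be this safe.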
3. Types of Knowledge Used in Research, Types of Scientific Discovery
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. The knowledge necessary for modern scientific research can be divided into four types: a) Commonsense Knowledge, b) Technical Knowledge, c) Theoretical Knowledge, and d) Methodological Knowledge (*).
Commonsense knowledge is simple, general and relatively unstructured knowledge about the world. Technical knowledge can be defined as the knowledge about instruments, methods and processes. Theoretical background is helpful, but not always essential, in acquiring this kind of knowledge. Technical knowledge can be descriptive as well as prescriptive.
-----------------------
* Knowledge used in the processes of scientific discovery is by no means limited to the four types listed here. Other types of knowledge, including religious symbolisms, can play a role in scientific research, as can be seen in the recent history of quark theory in particle physics.
Theoretical knowledge is structured, descriptive knowledge about the world, embodying classifications and numerous interrelated hypotheses. Typical examples of theoretical knowledge are classical mechanics and electromagnetism.
Methodological knowledge, on the other hand, is exclusively prescriptive; it can be represented as condition-action rules. Methodological knowledge includes knowledge about how to distinguish between scientifically interesting and uninteresting phenomena, how to choose between alternative goals, strategies and methods in scientific research, how to design experiments, how to propose new hypotheses, and how to generalize, test and evaluate them. It is mostly the extent of this type of knowledge that makes the difference between a research scientist and a nonscientist.
Unlike the inference rules in theoretical knowledge, many of the methodological rules rely on extralogical methods such as inductive generalizations, abduction, abstraction and analogy. Such rules are frequently used in formulating problem states, in constraining large search spaces, and in hypothesis formation during the activity of scientific research.
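As a minimal sketch of what representing methodological knowledge as condition-action rules might look like, consider the following Python fragment. The rule names, the state fields and the simple "fire the first applicable rule" control loop are assumptions made here for illustration; they are not taken from any of the systems cited in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]

@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]   # when may the rule fire?
    action: Callable[[State], None]      # what does it change in the state?

def run_rules(rules: List[Rule], state: State, max_cycles: int = 10) -> List[str]:
    """On each cycle fire the first applicable rule; stop when none applies."""
    fired = []
    for _ in range(max_cycles):
        for rule in rules:
            if rule.condition(state):
                rule.action(state)
                fired.append(rule.name)
                break
        else:
            break  # no rule was applicable in this cycle
    return fired

# Hypothetical research state and rules.
state = {"explained": False, "experiment_designed": False,
         "data": None, "hypothesis": None}
rules = [
    Rule("design-experiment",
         lambda s: not s["explained"] and not s["experiment_designed"],
         lambda s: s.update(experiment_designed=True)),
    Rule("collect-data",
         lambda s: s["experiment_designed"] and s["data"] is None,
         lambda s: s.update(data=[1.0, 2.0, 3.0])),   # made-up measurements
    Rule("form-hypothesis",
         lambda s: s["data"] is not None and s["hypothesis"] is None,
         lambda s: s.update(hypothesis="tentative generalization")),
]
fired = run_rules(rules, state)
print(fired)
```

Each rule is purely prescriptive: it says what to do in a given research state and makes no descriptive claim about the world, which is the distinction the text draws between methodological and theoretical knowledge.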
Scientific creativity can be examined in relation to the scope of the research in which a discovery takes place. Kocabas (1992c) introduces a classification of scientific discovery as follows: 1) Logico-Mathematical Discovery, 2) Formal Discovery, 3) Theoretical Discovery, and 4) Empirical Discovery. This classification is based on the categorization of descriptive knowledge by Kocabas (1992a), and reflects the types of knowledge used in scientific research, and the type of knowledge discovered. All these four types of discovery have been studied in AI by a series of computational models.
According to this classification, logico-mathematical discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it does not require experimentation or observation. Nor does it need the knowledge of a physical domain per se, except for analogical transference in some cases.
Formal discovery takes place in a formal domain involving abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge, for deductive inference on formal knowledge.
Theoretical discovery requires logico-mathematical, formal and theoretical knowledge, and in general, results from theoretical analysis and synthesis. In the history of science there are rather important theoretical discoveries or inventions such as Maxwell's equations and the Einstein-Lorentz transformations.
Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge was not a prerequisite in the early empirical discoveries in the history of science (e.g. in 17th and 18th century chemistry), but in modern empirical research such as in oxide superconductivity and fusion experiments, extensive theoretical domain knowledge is necessary.
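The knowledge prerequisites of the four discovery types can be summarized as a small lookup structure. The sketch below is an assumption-laden reading of the preceding paragraphs: the entry for logico-mathematical discovery is inferred rather than stated in the text, the label "empirical data" stands for experimental and observational data, and the function name is hypothetical.

```python
# Knowledge each discovery type is taken to require (a reading of the text above).
REQUIRED = {
    "logico-mathematical": {"logico-mathematical"},   # assumption: not stated explicitly
    "formal": {"logico-mathematical", "formal"},
    "theoretical": {"logico-mathematical", "formal", "theoretical"},
    # Early empirical discovery; modern empirical research would add "theoretical".
    "empirical": {"logico-mathematical", "formal", "empirical data"},
}

def feasible_discoveries(available):
    """Return the discovery types whose prerequisites are all available."""
    available = set(available)
    return [kind for kind, needs in REQUIRED.items() if needs <= available]

print(feasible_discoveries({"logico-mathematical", "formal"}))
print(feasible_discoveries({"logico-mathematical", "formal", "empirical data"}))
```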
4. Computational Models of Discovery
In parallel with the types of discovery described above, computational models developed by AI scientists can be classified into the same types: Logico-mathematical Models, Formal Models, Empirical Models, and Theoretical Models.
Some of the earliest AI systems such as Logic Theorist were logico-mathematical discovery models designed to prove theorems in logic. Among the more recent computational models, AM (Lenat, 1979) constitutes an outstanding example for mathematical discovery.
Lenat's (1983) EURISKO, in its applications to Naval Fleet Design, Evolution, and 3-D circuit design, can be cited as a typical example of formal discovery systems.
Some computational models of theoretical discovery are PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), GALILEO (Zytkow, 1990), and PAULI (Valdes-Perez, 1994). The first two could better be characterized as concept discovery systems, and as such, are closer to formal discovery models. GALILEO on the other hand, is an interesting example of discovery by theoretical analysis in that it discovers more expressive forms of scientific laws. The PAULI system is another interesting model which has led to the discovery of a general theorem about the quantum values of elementary particles in physics.
Empirical discovery is an extensively studied area in AI, and a number of computational models have been designed to investigate its various aspects. Empirical discovery systems can be divided into two main classes, qualitative and quantitative models, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley et al., 1987) models the discovery of the acid-base theory in 17th century chemistry. STAHL (Zytkow & Simon, 1986) and STAHLp (Rose & Langley, 1986) simulate the discovery of the componential models in 18th century chemistry, the latter with the additional capability of partially modeling the paradigm shift from the phlogiston theory to the oxygen theory. AbE (O'Rorke, Morris & Schulenburg, 1990) provides a more detailed simulation of the transition from the phlogiston theory to the oxygen theory, demonstrating the role of abductive inference in the process. KEKADA (Kulkarni & Simon, 1988) simulates the discovery of the urea cycle in biochemistry by Krebs in the 1930s, by treating the process as search in several search spaces. COAST (Rajamoney, 1990), on the other hand, treats physical systems as "scenarios", and considers theory revision as incremental changes in qualitative schemas (Forbus, 1984).
Some of the other systems are BR-3 (Kocabas, 1991) and BR-4 (Kocabas & Langley, 1995), which model the discovery of several conservation laws about the elementary particles, the latter with the ability to simulate the discovery of the neutrino in particle physics. When faced with inconsistent solution states or new evidence, both systems can revise their domain theories incrementally. PAULI (Valdes-Perez, 1994) considers certain discovery problems as matrix operations in two search spaces, and reproduces BR-3's results, together with a set of alternatives, and additionally leads to a general theorem in particle physics. MECHEM (Valdes-Perez, 1995) discovers new pathways for a set of catalytic chemical reactions, alternative to the ones known by chemists today.
Among the quantitative discovery models, BACON (Langley et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful model of quantitative discovery, and has also attracted the interest of philosophers of science(*). The IDS system, on the other hand, integrates qualitative and quantitative methods.
5. Aspects of Scientific Research
Research in the computational study of science indicates that the conventional philosophical study has in its history overlooked a number of critical aspects of science. Basic differences between the computational and the conventional philosophical approaches have been described by Shrager and Langley (1990). According to these authors, the conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific thought, and emphasizes scientific discovery including the activities of data evaluation, theory formation and experimentation.
The distinction can be extended even further. The computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, but also with a series of other related issues. Kocabas (1992b) names more than a dozen different major tasks involved in scientific research. These are: formulating research goals, selecting research goals, defining research framework, gathering knowledge, organising knowledge, selecting research strategies, methods, tools and techniques, proposing experiments, designing experiments, selecting experiment materials, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, theory formation, theory revision, goal satisfaction control, and producing explanations.
Each of these research tasks may involve activities dealing with a variety of planning, classification and evaluation problems. Kocabas (1992b) provides examples from the research in oxide superconductivity for the diversity of the activities involved in these research tasks. Consider, for example the formulation of scientific research goals, choosing between formulated goals, proposing strategies, proposing experiments, and hypothesis formation.
Heuristics about formulating research goals have been studied by Kulkarni and Simon (1988), Lenat (1983), and Darden (1987). Kocabas (1992b) divides research goals into two general forms that may overlap: those that aim to explain a phenomenon, and those that aim to study a phenomenon. Creative scientists seem to utilize several general rules for formulating their research goals. One such rule is to focus attention on problems and phenomena that have not been explained or are unexplainable within the current scientific framework. However, such problems must have some general and important implications to be worthy of investigation.
--------------------------------------------
* See, e.g. the special issue (Vol 19, No 4) of Social Studies of Science.
Some scientific research problems may be strongly related to important technological needs. Energy conversion, storage, and transfer are still major technological problems that motivate scientific research into such areas as "cold fusion", oxide superconductivity, and electrochemistry. However, interestingness in itself is not a sufficient criterion for a phenomenon to attract the attention of the creative scientist. The research goals that are formulated must also be achievable.
It is not unusual that a scientist formulates alternative research goals in relation to a certain phenomenon. In such cases, the selection of a research goal among alternatives is another research task. Scientists use several selection criteria in deciding which problem to primarily focus on. Some of these constraints conflict with one another, and resolving such conflicts may not be a trivial task for the scientist.
Selecting research strategies is another important task for accomplishing a research goal. Strategy selection depends on the type of the research goal, such as explaining or examining a phenomenon. If the research strategy involves experimentation, then the type of experiments needs to be decided.
Once the experimentation strategy is selected, the scientist has to decide about the relevant processes and techniques for the current strategy. S/he also has to decide about the experiment materials, and has to classify these materials against a set of parameters such as availability, likeliness to yield success, cost and relative hazards (e.g., radioactivity, flammability and corrosiveness), and select the best materials for the experiments.
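The classification of experiment materials against such parameters can be pictured as a simple filter-and-rank procedure. The following Python sketch is hypothetical throughout: the material names, the numeric ratings, the weights and the linear scoring scheme are invented for illustration and are not drawn from the oxide superconductivity study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    name: str
    availability: float   # 0 (unobtainable) .. 1 (readily available)
    success: float        # estimated likelihood of yielding success, 0 .. 1
    cost: float           # relative cost, 0 (cheap) .. 1 (expensive)
    hazard: float         # relative hazard, 0 (benign) .. 1 (dangerous)

def score(m, weights):
    """Weighted sum: availability and success count for, cost and hazard against."""
    return (weights["availability"] * m.availability
            + weights["success"] * m.success
            - weights["cost"] * m.cost
            - weights["hazard"] * m.hazard)

def rank_materials(materials, weights, max_hazard=1.0):
    """Drop materials above the hazard limit, then sort the rest by score."""
    admissible = [m for m in materials if m.hazard <= max_hazard]
    return sorted(admissible, key=lambda m: score(m, weights), reverse=True)

# Made-up candidates and weights.
candidates = [
    Material("material_A", availability=0.9, success=0.4, cost=0.2, hazard=0.1),
    Material("material_B", availability=0.5, success=0.8, cost=0.6, hazard=0.3),
    Material("material_C", availability=0.7, success=0.9, cost=0.3, hazard=0.9),
]
weights = {"availability": 1.0, "success": 2.0, "cost": 1.0, "hazard": 1.0}

ranked_all = [m.name for m in rank_materials(candidates, weights)]
ranked_safe = [m.name for m in rank_materials(candidates, weights, max_hazard=0.5)]
print(ranked_all, ranked_safe)
```

Treating hazard both as a hard limit and as a weighted penalty is one possible design choice; a scientist may equally well rank on a single overriding criterion.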
Scientific experiments need to be designed and conducted according to certain theoretical frameworks, observation and measurement standards and procedures. Experimental variables must be defined beforehand, for tests are carried out to measure the variations between the values of these variables. The experimental data are evaluated to check whether they reflect any violation of the experimental conditions. Hypotheses are formed or revised only after data evaluation.
Hypothesis formation is one of the most important tasks of scientific research. Despite the fact that it has been a primary concern of the conventional philosophy of science for a long time, it still needs a detailed investigation. In our study on oxide superconductivity research (see, Kocabas, 1992b), we have identified over 40 hypothesis formation heuristics that were utilized by scientists working in this field. The majority of these heuristics are general, while some are domain specific.
The diversity of interrelated research tasks is by itself sufficient to show that scientific discovery is not a logical procedure or a process in itself, but the product of a series of complex processes called scientific research. Scientific creativity may be required in any of the research activities in these processes. The history of modern physics has numerous examples of these processes. Although an extreme example, consider the design, construction and operation of the CERN particle accelerator, where research involves proposing and designing experiments, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, verification, and theory revision.
Computational models continue to be developed for modeling different aspects of scientific research. One of the hopes of research in this direction is to be able to develop complete models for research, or artificial research assistants capable of directing research in different fields of science.
The increasing use of AI techniques in computational modeling may culminate in diminishing the role of mathematical reasoning in simulation. It does not seem unreasonable to expect complex physical systems (including physical theories themselves) to be represented as computational models.
Computational modeling may provide other advantages in theoretical analysis and theory revision, for the use of such models can make intractably complex theories easier to grasp. Similarly, the use of computational models can make scientific explanations more systematic, more accurate and more to the point.
6. Conclusion
The conventional philosophical approach ignores the multiplicity of the tasks and activities involved in scientific inquiry. We believe that a much more detailed and careful examination and analysis of science is needed than that envisaged by the conventional study of science. The computational approach provides both the necessary concepts and methods for such a study.
References
Boden, M. (1990). The creative mind. Sphere Books, London.
Darden, L. (1987). Viewing the history of science as compiled hindsight. The AI Magazine, 8, No. 2, 33-42.
Darden, L. (1991). Theory change in science: Strategies from Mendelian genetics. Oxford University Press, N.Y.
Engelmore, R. and Morgan, T. (1988). Blackboard systems. Addison Wesley.
Forbus, K.D. (1984). Qualitative process theory. Artificial Intelligence, 24, 85-168.
Hayes-Roth, B. (1993). Architectural foundations for real-time performance in intelligent systems. In David, J-M., Krivine, J-P., and Simmons, R. eds, Second Generation Expert Systems. Springer-Verlag, New York.
Karmiloff-Smith, A. (1990). Constraints of representational change: Evidence from children's drawing. Cognition, 34.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, 277-309.
Kocabas, S. (1992a). Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. (1992b). Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML92 Workshop on Machine Discovery, 63-70.
Kocabas, S. (1992c). Evaluation of discovery systems. Proceedings of the ML92 Workshop on Machine Discovery, 168-171.
Kocabas, S. and Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. In Working Notes of 1995 Spring Symposium Series, AAAI Press, CA.
Kuhn, T.S. (1970). The structure of scientific revolutions. The University of Chicago Press, Chicago.
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. (1987). Scientific discovery: Exploration of the creative processes. MIT Press.
Lenat, D.B. (1979). On automated scientific theory formation: A case study using the AM program. In Hayes, J., Michie., D., and Mikulich, D.I. eds., Machine Intelligence, 9, 251-283, Halstead, New York.
Lenat, D.B. (1983). EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, 61-98.
Lenat, D.B. and Feigenbaum, E. (1987). On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 1173-1182.
Maslow, A.H. (1966). The psychology of science: A reconnaissance. Harper and Row Publishers, N.Y.
Nordhausen, B. and Langley, P. (1987). Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abduction. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Shrager, J., and Langley, P. Eds. (1990). Computational approaches to scientific discovery. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Simon, H.A. (1992). Scientific discovery as problem solving: Reply to critics. International Studies in the Philosophy of Science 6(1): 69-88.
Thagard, P. (1988). Computational philosophy of science. The MIT Press, Cambridge, MA.
Thagard, P. and Holyoak, K. (1985). Discovering the wave theory of sound: inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. and Nowak, G. (1990). The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R.E. (1994). Algebraic reasoning about reactions: Discovery of conserved properties in particle physics. Machine Learning 17 (1), 47-68.
Valdes-Perez, R.E. (1995). Machine discovery in chemistry: New results. Artificial Intelligence 74 (1), 191-201.
Zytkow, J.M. (1987). Combining many searches in the FAHRENHEIT discovery system. Proc. 4th International Workshop on Machine Learning, Morgan Kaufmann, CA. 281-287.
Zytkow, J.M. (1990). Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
1) Motivation for scientific research.
2) Ability to correctly formulate research problems within a body of knowledge.
3) Ability to create a comprehensive search space for the solution of a scientific problem.
4) Ability to assemble (or induce) and implement a set of heuristics to reduce the search space.
5) Patience and stamina for the exhaustive search for solving the selected scientific problem within the constrained search space.
Fig. 1 summarizes the links between these concepts. Any missing link between them can hinder scientific creativity.
Motivation for Scientific Research --> Formulate Research Problems --> Generate Search Space --> Reduce Search Space --> Conduct Exhaustive Search
Fig. 1. Problem formulation and search in scientific discovery.
As indicated in the above list, research motivation tops the requirements for scientific creativity. Motivation itself can be dependent on basic psychological needs. Various types of human motivation have been studied by psychologists in the last five decades (see, e.g., Maslow, 1966). Metaphysical commitments and ontological assumptions about the world may also affect motivation (see, e.g. Kuhn, 1970, p.41). This is an important issue, but is outside the scope of this study.
Problem formulation is the second major issue in scientific research. In modern scientific research, access to a large and systematic body of knowledge is necessary for correctly formulating scientific problems. The accurate formulation of research problems requires a mastery of the conceptual structure of the field of science involved. The creative scientist can change this structure to reformulate a research problem in his/her search for a solution. In some cases, changes in the conceptual structure involve the most fundamental concepts and principles, such as time and measurability in physics. Changing representations, on the other hand, provides alternative views of the problem space, and is considered one of the most influential parameters of creativity in science (see, e.g., Simon, 1992; Karmiloff-Smith, 1990).
Extensive knowledge may also be used in creating a comprehensive search space for the selected research problem. The search space is then reduced to a manageable size, by selecting and applying appropriate search strategies, methods and heuristics. This is necessary to reach for a solution within acceptable limits of time and resources. Once the problem is defined and constrained, exhaustive search needs to be carried out within the search space, until a conclusion is reached about the solution of the scientific problem. Scientific creativity exhibits itself during the completion of a series of research tasks. Different types of knowledge may be used for each task, as will be explained next.
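The three search steps described above (generating a search space, reducing it with heuristics, and searching the remainder exhaustively) can be illustrated with a minimal Python sketch. The toy problem, the function names and the two pruning heuristics are assumptions introduced only for illustration; they are not taken from any of the discovery systems discussed in this paper.

```python
from itertools import product

# Hypothetical toy problem: which formula over H and O has a formula mass of 18?
ATOMIC_MASS = {"H": 1, "O": 16}

def generate_search_space(elements, max_count):
    """Enumerate every candidate formula as a dict: element -> atom count."""
    return [dict(zip(elements, counts))
            for counts in product(range(max_count + 1), repeat=len(elements))]

def reduce_search_space(candidates, heuristics):
    """Keep only the candidates that satisfy every heuristic."""
    return [c for c in candidates if all(h(c) for h in heuristics)]

def exhaustive_search(candidates, is_solution):
    """Test every remaining candidate and return all that solve the problem."""
    return [c for c in candidates if is_solution(c)]

# Illustrative heuristics (assumptions, not rules from the paper).
heuristics = [
    lambda c: all(n > 0 for n in c.values()),  # a compound contains every element
    lambda c: sum(c.values()) <= 4,            # consider small molecules first
]

space = generate_search_space(["H", "O"], max_count=4)
reduced = reduce_search_space(space, heuristics)
solutions = exhaustive_search(
    reduced,
    lambda c: sum(ATOMIC_MASS[e] * n for e, n in c.items()) == 18,
)
print(len(space), len(reduced), solutions)
```

In this toy case the heuristics cut the 25 generated candidates down to 6 before the exhaustive test is applied, which is the role the text assigns to search-space reduction.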
3. Types of Knowledge Used in Research, Types of Scientific Discovery
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. The knowledge necessary for modern scientific research can be divided into four types as a) Commonsense Knowledge, b) Technical Knowledge, c) Theoretical Knowledge, and d) Methodological Knowledge (*).
Commonsense knowledge is simple, general and relatively unstructured knowledge about the world. Technical knowledge can be defined as the knowledge about instruments, methods and processes. Theoretical background is helpful, but not always essential, in acquiring this kind of knowledge. Technical knowledge can be descriptive as well as prescriptive.
-----------------------
* Knowledge used in the processes of scientific discovery is by no means limited to the four types listed here. Other types of knowledge, including religious symbolism, can play a role in scientific research, as can be seen in the recent history of quark theory in particle physics.
Theoretical knowledge is structured, descriptive knowledge about the world, embodying classifications and numerous interrelated hypotheses. Typical examples of theoretical knowledge are classical mechanics and electromagnetism.
Methodological knowledge, on the other hand, is exclusively prescriptive; it can be represented as condition-action rules. Methodological knowledge includes knowledge about how to distinguish between scientifically interesting and uninteresting phenomena, how to choose between alternative goals, strategies and methods in scientific research, how to design experiments, how to propose new hypotheses, and how to generalize, test and evaluate them. It is mostly the extent of this type of knowledge that makes the difference between a research scientist and a nonscientist.
Unlike the inference rules in theoretical knowledge, many of the methodological rules rely on extralogical methods such as inductive generalizations, abduction, abstraction and analogy. Such rules are frequently used in formulating problem states, in constraining large search spaces, and in hypothesis formation during the activity of scientific research.
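The representation of methodological knowledge as condition-action rules can be sketched in a few lines of Python. The rule names, the contents of the research state and the simple control loop below are hypothetical and serve only to show the form such rules might take; they do not reproduce the rule base of any particular system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # tests the current research state
    action: Callable[[dict], None]      # modifies the research state

def run_rules(rules, state, max_cycles=10):
    """Fire the first applicable rule in each cycle until no rule applies."""
    fired = []
    for _ in range(max_cycles):
        for rule in rules:
            if rule.condition(state):
                rule.action(state)
                fired.append(rule.name)
                break
        else:
            break  # no rule was applicable in this cycle
    return fired

# Hypothetical rules; the strings stand in for real experiments and hypotheses.
rules = [
    Rule("design-experiment",
         lambda s: s["unexplained"] and s["data"] is None,
         lambda s: s.update(data="measurements")),
    Rule("form-hypothesis",
         lambda s: s["data"] is not None and s["hypothesis"] is None,
         lambda s: s.update(hypothesis="tentative generalization")),
    Rule("mark-explained",
         lambda s: s["hypothesis"] is not None and s["unexplained"],
         lambda s: s.update(unexplained=False)),
]

state = {"unexplained": True, "data": None, "hypothesis": None}
fired = run_rules(rules, state)
print(fired)
```

Each rule is purely prescriptive: it says what to do when a condition holds, and makes no descriptive claim about the domain itself.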
Scientific creativity can be examined in relation to the scope of the research in which a discovery takes place. Kocabas (1992c) introduces a classification of scientific discovery as follows: 1) Logico-Mathematical Discovery, 2) Formal Discovery, 3) Theoretical Discovery, and 4) Empirical Discovery. This classification is based on the categorization of descriptive knowledge by Kocabas (1992a), and reflects the types of knowledge used in scientific research, and the type of knowledge discovered. All these four types of discovery have been studied in AI by a series of computational models.
According to this classification, logico-mathematical discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it does not require experimentation or observation. Nor does it need knowledge of a physical domain per se, except for analogical transference in some cases.
Formal discovery takes place in a formal domain involving abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge, for deductive inference on formal knowledge.
Theoretical discovery requires logico-mathematical, formal and theoretical knowledge, and in general, results from theoretical analysis and synthesis. In the history of science there are rather important theoretical discoveries or inventions such as Maxwell's equations and the Einstein-Lorentz transformations.
Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge has not been a prerequisite in the early empirical discoveries in the history of science (e.g. in the 17th and 18th century chemistry), but in modern empirical research such as in oxide superconductivity and fusion experiments, extensive theoretical domain knowledge is necessary.
4. Computational Models of Discovery
In parallel with the types of discovery described above, computational models developed by AI scientists can be classified in the same types as Logico-mathematical Models, Formal Models, Empirical Models, and Theoretical Models.
Some of the earliest AI systems such as Logic Theorist were logico-mathematical discovery models designed to prove theorems in logic. Among the more recent computational models, AM (Lenat, 1979) constitutes an outstanding example for mathematical discovery.
Lenat's (1983) EURISKO, in its applications to Naval Fleet Design, Evolution, and 3-D circuit design, can be cited as a typical example of formal discovery systems.
Some computational models of theoretical discovery are PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), GALILEO (Zytkow, 1990), and PAULI (Valdes-Perez, 1994). The first two could better be characterized as concept discovery systems, and as such, are closer to formal discovery models. GALILEO on the other hand, is an interesting example of discovery by theoretical analysis in that it discovers more expressive forms of scientific laws. The PAULI system is another interesting model which has led to the discovery of a general theorem about the quantum values of elementary particles in physics.
Empirical discovery is an extensively studied area in AI, and a number of computational models have been designed to investigate its various aspects. Empirical discovery systems can be divided into two main classes as qualitative and quantitative models, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley, et al., 1987) models the discovery of the acid-base theory in the 17th century chemistry. STAHL (Zytkow & Simon, 1986) and STAHLp (Rose & Langley, 1986) simulate the discovery of the componential models in the 18th century chemistry, the latter with the additional capability of partially modeling the paradigm shift from the phlogiston theory to the oxygen theory. AbE (O'Rorke, Morris & Schulenburg, 1990) provides a more detailed simulation of the transition from the phlogiston theory to the oxygen theory, demonstrating the role of abductive inference in the process. KEKADA (Kulkarni & Simon, 1988) simulates the discovery of the urea cycle in biochemistry by Krebs in the 1930s, by treating the process as search in several search spaces. COAST (Rajamoney, 1990) on the other hand, treats physical systems as "scenarios", and considers theory revision as incremental changes in qualitative schemas (Forbus, 1984).
Some of the other systems are BR-3 (Kocabas, 1991) and BR-4 (Kocabas & Langley, 1995) which model the discovery of several conservation laws about the elementary particles, the latter with the ability to simulate the discovery of the neutrino in particle physics. When faced with inconsistent solution states or new evidence, both systems can revise their domain theories incrementally. PAULI (Valdes-Perez, 1994) considers certain discovery problems as matrix operations in two search spaces, and reproduces BR-3's results, together with a set of alternatives, and additionally leads to a general theorem in particle physics. MECHEM (Valdes-Perez, 1995) discovers new pathways for a set of catalytic chemical reactions, alternative to the ones known by chemists today.
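The kind of problem addressed by conservation-law discovery can be illustrated with a small Python sketch: assign a numerical property to each particle so that observed reactions conserve it and unobserved reactions violate it. The brute-force enumeration below is an assumption made for illustration and is not the method used by BR-3, BR-4 or PAULI; the two reactions (neutron beta decay as observed, proton decay to a positron and a neutral pion as unobserved) and the value domain {-1, 0, 1} are likewise chosen only as a toy example.

```python
from itertools import product

PARTICLES = ["p", "n", "e-", "e+", "nu_bar", "pi0"]

# Each reaction is (reactants, products).
OBSERVED = [(["n"], ["p", "e-", "nu_bar"])]   # neutron beta decay
UNOBSERVED = [(["p"], ["e+", "pi0"])]         # proton decay, never observed

def net_change(reaction, values):
    """Property total of the products minus that of the reactants."""
    reactants, products = reaction
    return sum(values[x] for x in products) - sum(values[x] for x in reactants)

def find_conserved_properties(observed, unobserved, domain=(-1, 0, 1)):
    """Enumerate value assignments conserved by every observed reaction
    and violated by every unobserved one."""
    found = []
    for combo in product(domain, repeat=len(PARTICLES)):
        values = dict(zip(PARTICLES, combo))
        if (all(net_change(r, values) == 0 for r in observed)
                and all(net_change(r, values) != 0 for r in unobserved)):
            found.append(values)
    return found

candidates = find_conserved_properties(OBSERVED, UNOBSERVED)

# A baryon-number-like assignment should be among the candidates.
baryon_like = {"p": 1, "n": 1, "e-": 0, "e+": 0, "nu_bar": 0, "pi0": 0}
print(baryon_like in candidates, len(candidates) > 1)
```

With so little data many assignments qualify, which illustrates why such systems must consider sets of alternative solutions and revise them as new evidence arrives.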
Among the quantitative discovery models BACON (Langley, et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful model of quantitative discovery, which also has attracted the interest of philosophers of science(*). The IDS system on the other hand, integrates qualitative and quantitative methods.
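Quantitative discovery of the kind attributed to BACON can be sketched as a search for a combination of measured variables that stays constant across observations. The following Python fragment is a deliberate simplification: it enumerates small integer exponents rather than applying BACON's own heuristics, and the function names, exponent range and tolerance are assumptions. The data are rounded textbook values for four planets.

```python
from itertools import product

# Mean distance from the Sun (AU) and orbital period (years), rounded.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
}

def relative_spread(values):
    """Range of the values divided by their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

def find_invariants(data, tolerance=0.01):
    """Return exponent pairs (a, b) for which d**a * p**b is nearly constant."""
    invariants = []
    for a, b in product(range(1, 4), range(-3, 4)):
        if b == 0:
            continue  # the term must involve both variables
        terms = [d ** a * p ** b for d, p in data.values()]
        if relative_spread(terms) < tolerance:
            invariants.append((a, b))
    return invariants

invariants = find_invariants(PLANETS)
print(invariants)  # [(3, -2)]: distance cubed over period squared is constant
```

The single surviving pair corresponds to Kepler's third law, one of the simple physical laws usually cited as a test case for quantitative discovery models.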
5. Aspects of Scientific Research
Research in the computational study of science indicates that the conventional philosophical study has in its history overlooked a number of critical aspects of science. Basic differences between the computational and the conventional philosophical approaches have been described by Shrager and Langley (1990). According to these authors, the conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific thought, and emphasizes scientific discovery including the activities of data evaluation, theory formation and experimentation.
The distinction can be extended even further. The computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, but also with a series of other related issues. Kocabas (1992b) names more than a dozen different major tasks involved in scientific research. These are: Formulating research goals, selecting research goals, defining research framework, gathering knowledge, organising knowledge, selecting research strategies, methods, tools and techniques, proposing experiments, designing experiments, selecting experiment materials, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, theory formation, theory revision, goal satisfaction control, and producing explanations.
Each of these research tasks may involve activities dealing with a variety of planning, classification and evaluation problems. Kocabas (1992b) provides examples from the research in oxide superconductivity for the diversity of the activities involved in these research tasks. Consider, for example the formulation of scientific research goals, choosing between formulated goals, proposing strategies, proposing experiments, and hypothesis formation.
Heuristics about formulating research goals have been studied by Kulkarni and Simon (1988), Lenat (1983), and Darden (1987). Kocabas (1992b) divides research goals into two general forms that may overlap: Those that aim at explaining a phenomenon, and those that aim to study a phenomenon. Creative scientists seem to utilize several general rules for formulating their research goals. One such rule is to focus attention on problems and phenomena that have not been explained or are unexplainable within the current scientific framework. However, such problems must have some general and important implications to be worthy of investigation.
--------------------------------------------
* See, e.g. the special issue (Vol 19, No 4) of Social Studies of Science.
Some scientific research problems may be strongly related to important technological needs. Energy conversion, storage, and transfer are still major technological problems that motivate scientific research into such areas as "cold fusion", oxide superconductivity, and electrochemistry. However, interestingness in itself is not a sufficient criterion for a phenomenon to attract the attention of the creative scientist. The research goals that are formulated must also be achievable.
It is not unusual that a scientist formulates alternative research goals in relation to a certain phenomenon. In such cases, the selection of a research goal among alternatives is another research task. Scientists use several selection criteria in deciding which problem to primarily focus on. Some of these constraints conflict with one another, and resolving such conflicts may not be a trivial task for the scientist.
Selecting research strategies is another important task for accomplishing a research goal. Strategy selection depends on the type of the research goal, such as explaining or examining a phenomenon. If the research strategy involves experimentation, then the type of experiments needs to be decided.
Once the experimentation strategy is selected, the scientist has to decide about the relevant processes and techniques for the current strategy. S/he also has to decide about the experiment materials, and has to classify these materials against a set of parameters such as availability, likeliness to yield success, cost and relative hazards (e.g., radioactivity, flammability and corrosiveness), and select the best materials for the experiments.
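The classification and selection of experiment materials described above can be illustrated with a simple weighted scoring scheme. The Python sketch below is only one possible reading: the material names, the parameter values (normalized to the range 0 to 1), the weights and the hazard cut-off are all hypothetical.

```python
def score(material, weights):
    """Reward availability and likelihood of success; penalize cost and hazard."""
    return (weights["availability"] * material["availability"]
            + weights["success"] * material["success"]
            - weights["cost"] * material["cost"]
            - weights["hazard"] * material["hazard"])

def select_best(materials, weights, max_hazard=0.8):
    """Discard materials that are too hazardous, then pick the highest score."""
    acceptable = [m for m in materials if m["hazard"] <= max_hazard]
    return max(acceptable, key=lambda m: score(m, weights))

# Hypothetical candidate materials with normalized parameter values.
materials = [
    {"name": "A", "availability": 0.9, "success": 0.5, "cost": 0.3, "hazard": 0.1},
    {"name": "B", "availability": 0.6, "success": 0.9, "cost": 0.7, "hazard": 0.9},
    {"name": "C", "availability": 0.7, "success": 0.8, "cost": 0.4, "hazard": 0.2},
]
weights = {"availability": 1.0, "success": 1.0, "cost": 1.0, "hazard": 1.0}

best = select_best(materials, weights)
print(best["name"])
```

Here material B is excluded by the hazard cut-off despite its high likelihood of success, showing how selection parameters can conflict with one another.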
Scientific experiments need to be designed and conducted according to certain theoretical frameworks, observation and measurement standards and procedures. Experimental variables must be defined beforehand, for tests are carried out to measure the variations between the values of these variables. The experimental data are evaluated to check whether they reflect any violation of the experimental conditions. Hypotheses are formed or revised only after data evaluation.
Hypothesis formation is one of the most important tasks of scientific research. Despite the fact that it has been a primary concern of the conventional philosophy of science for a long time, it still needs a detailed investigation. In our study on oxide superconductivity research (see, Kocabas, 1992b), we have identified over 40 hypothesis formation heuristics that were utilized by scientists working in this field. The majority of these heuristics are general, while some are domain specific.
The diversity of interrelated research tasks is by itself sufficient to show that scientific discovery is not a logical procedure or a process in itself, but the product of a series of complex processes called scientific research. Scientific creativity may be required in any of the research activities in these processes. The history of modern physics has numerous examples of these processes. Although an extreme example, consider the design, construction and operation of the CERN particle accelerator, where research involves proposing and designing experiments, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, verification, and theory revision.
Computational models continue to be developed for modeling different aspects of scientific research. One of the hopes of research in this direction is to be able to develop complete models for research, or artificial research assistants capable of directing research in different fields of science.
The increasing use of AI techniques in computational modeling may culminate in diminishing the role of mathematical reasoning in simulation. It does not seem unreasonable to expect complex physical systems (including physical theories themselves) to be represented as computational models.
Computational modeling may provide other advantages in theoretical analysis and theory revision, for the use of such models can make intractably complex theories easier to grasp. Similarly, the use of computational models can make scientific explanations more systematic, more accurate and correct to the point.
6. Conclusion
The conventional philosophical approach ignores the multiplicity of the tasks and activities involved in scientific inquiry. We believe that a much more detailed and careful examination and analysis of science is needed than that envisaged by the conventional study of science. The computational approach provides both the necessary concepts and methods for such a study.
References
Boden, M. (1990). The creative mind. Sphere Books, London.
Darden, L. (1987). Viewing the history of science as compiled hindsight. The AI Magazine, 8, No. 2, 33-42.
Darden, L. (1991). Theory change in science: Strategies from Mendelian genetics. Oxford University Press, N.Y.
Engelmore, R. and Morgan, T. (1988). Blackboard systems. Addison Wesley.
Forbus, K.D. (1984). Qualitative process theory. Artificial Intelligence, 24, 85-168.
Hayes-Roth, B. (1993). Architectural foundations for real-time performance in intelligent systems. In David, J-M., Krivine, J-P., and Simmons, R. eds, Second Generation Expert Systems. Springer-Verlag, New York.
Karmiloff-Smith, A. (1990). Constraints of representational change: Evidence from children's drawing. Cognition, 34.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, 277-309.
Kocabas, S. (1992a). Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. (1992b). Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML92 Workshop on Machine Discovery, 63-70.
Kocabas, S. (1992c). Evaluation of discovery systems. Proceedings of the ML92 Workshop on Machine Discovery, 168-171.
Kocabas, S. and Langley, P. (1995). Integration of research tasks for modeling discoveries in particle physics. In Working Notes of 1995 Spring Symposium Series, AAAI Press, CA.
Kuhn, T.S. (1970). The structure of scientific revolutions. The University of Chicago Press, Chicago.
Kulkarni, D. and Simon, H. (1988). The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. (1987). Scientific discovery: Exploration of the creative processes. MIT Press.
Lenat, D.B. (1979). On automated scientific theory formation: A case study using the AM program. In Hayes, J., Michie, D., and Mikulich, D.I. eds., Machine Intelligence, 9, 251-283, Halstead, New York.
Lenat, D.B. (1983). EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, 61-98.
Lenat, D.B. and Feigenbaum, E. (1987). On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 1173-1182.
Maslow, A.H. (1966). The psychology of science: A reconnaissance. Harper and Row Publishers, N.Y.
Nordhausen, B. and Langley, P. (1987). Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. and Schulenburg, D. (1990). Theory formation by abduction. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Shrager, J., and Langley, P. Eds. (1990). Computational approaches to scientific discovery. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Simon, H.A. (1992). Scientific discovery as problem solving: Reply to critics. International Studies in the Philosophy of Science 6(1): 69-88.
Thagard, P. (1988). Computational philosophy of science. The MIT Press, Cambridge, MA.
Thagard, P. and Holyoak, K. (1985). Discovering the wave theory of sound: inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. and Nowak, G. (1990). The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R.E. (1994). Algebraic reasoning about reactions: Discovery of conserved properties in particle physics. Machine Learning 17 (1), 47-68.
Valdes-Perez, R.E. (1995). Machine discovery in chemistry: New results. Artificial Intelligence 74 (1), 191-201.
Zytkow, J.M. (1987). Combining many searches in the FAHRENHEIT discovery system. Proc. 4th International Workshop on Machine Learning, Morgan Kaufmann, CA. 281-287.
Zytkow, J.M. (1990). Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
A METHODOLOGY FOR MODELING SCIENTIFIC DISCOVERY
Sakir Kocabas *
uckoca @ tritu.bitnet
Department of Artificial Intelligence
Marmara Research Center, PK 21, Gebze, Turkey
Abstract
Computational modeling of scientific discovery has been emerging as an important research field in artificial intelligence. Building theoretical models for scientific development has until recently been the exclusive domain for philosophers of science. With the advances in artificial intelligence and especially in machine learning, opportunities have arisen for researchers in this field to test the learning methods developed in modeling scientific discovery. In the last fifteen years, a number of systems have been developed modeling various discoveries ranging from 17th to 20th century physics and chemistry. However, a methodology for building and evaluating such models has still not been developed. This paper focuses on the elements of historical discovery models, and the methods for their systematic construction and evaluation.
* Also affiliated with the Department of Space Sciences and Technology, ITU, Maslak, Istanbul, Turkey.
1. Introduction
Recent research in the computational study of science has revealed a number of important aspects of science that were overlooked by the conventional study of science. Shrager and Langley (1990) describe the basic differences between the computational and the conventional philosophical approaches as follows: The conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific discovery including the activities of experimentation, data evaluation, and theory formation.
The distinction can be extended even further: The computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, but also with a series of other issues related to scientific research. Kocabas (1992b) names more than a dozen major research tasks involved in physical sciences. These range from formulating and selecting research goals, defining research framework, gathering and organising related knowledge, and through selecting research strategies, methods, tools and techniques, to designing experiments, data collection, hypothesis and theory formation, theory revision and producing scientific explanations. Any of these research tasks may involve a variety of planning, classification and evaluation problems.
Computational study of science is more concerned with the methodological issues in science rather than the logico-philosophical issues which are the main concern of conventional studies. The main purpose of the former is to investigate the processes that lead to discovery in science, and eventually to build a model (or models) of scientific research which would be used as artificial research assistants.
Another discipline, social study of science, deals with the social dimension of science, e.g., with how scientific communities form and interact, how research projects are developed into research programmes, how these programmes evolve or terminate, and how research traditions develop in human societies. History of science, on the other hand, investigates scientific developments through the historical records, and provides a historical perspective to science.
The computational study of science draws ideas, perspectives, methods and data from conventional philosophical, social and historical studies, but it differs from these disciplines in some essential ways: 1) It has a medium, a computational model, for the reconstruction and analysis of a historical discovery, 2) Using its models, it can investigate the possible alternative routes to the discovery, 3) It aims at assembling heuristics for developing models of scientific research for currently active research projects in science.
2. Types of Discovery
A methodology for the systematic evaluation of discovery models should first of all be capable of distinguishing between different types of discovery. In other words, it should provide a classification of discovery, so that one can identify a certain type in the history of science in relation to certain other discoveries.
Kocabas (1991c) introduces an implicit classification, which can be reformulated as follows: 1) Logico-Mathematical/Formal Discovery, 2) Theoretical Discovery, and 3) Empirical Discovery. This classification is somewhat in parallel with the categorization of knowledge by Kocabas (1992a), and reflects an order of diminishing degree of abstraction.
Logico-Mathematical/Formal Discovery: This type of discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. Formal Discovery takes place in a formal domain which involves abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge for inductive and/or deductive inference on domain knowledge. Examples of this type of discovery are the mathematical techniques and formal theories starting from the invention of decimal system and algebra to modern mathematics, and various axiom systems.
Theoretical Discovery: This type of discovery requires logico-mathematical, formal and theoretical knowledge, and in general results from theoretical analysis and synthesis. Some examples of theoretical discovery from the history of science are: a) The emergence of the special theory of relativity based on the Einstein-Lorentz transformations, b) Maxwell's theory of electromagnetism based on his equations, c) Yukawa's theory of nuclear forces and mesons, and d) Dirac's theory of charge symmetry and antiparticles.
Empirical Discovery: Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge has not been a prerequisite in the early empirical discoveries in the history of science, but in modern empirical research such as in oxide superconductivity and "cold fusion" experiments, theoretical domain knowledge is necessary. Empirical discovery can be further divided as heuristic and experimental/observational discovery.
Heuristic discoveries take place in attempts to find qualitative and/or quantitative relationships in experimental data. Some examples of such discoveries are: a) GLAUBER's (Langley, et al., 1987) formulation of acid-alkali theory in the 17th century chemistry, b) STAHL's (Zytkow & Simon, 1986) discovery of componential models of compounds in the 18th century chemistry, c) Quantitative discoveries of simple physical laws in classical physics (e.g. Kepler's laws, Boyle's law, Ohm's law), d) Discovery of new quantum properties and their value distribution to elementary particles in particle physics.
Experimental/Observational Discovery. This type of discovery is usually initiated by technological inventions or innovations. Two examples are: The discovery of superconductivity by Onnes following his invention of a method to liquefy helium, and the discovery of new particle interactions after the invention of the cloud chamber.
A number of computational systems have been developed in the last 15 years for modeling these different types of discoveries.
Some of the earliest AI systems, such as Logic Theorist, were designed to prove theorems in logic. Among the more recent systems, AM (Lenat, 1979) stands out as a successful example of modeling mathematical discovery. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it requires neither experimentation nor observation, nor does it require knowledge of a physical domain. Lenat's (1983) EURISKO, in its applications to naval fleet design, evolution, and three-dimensional circuit design, is a good example of a formal discovery system.
Examples of theoretical discovery models are PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), and GALILEO (Zytkow, 1990). The first two systems are better characterized as conceptual discovery systems, and as such are closer to formal discovery systems. GALILEO, on the other hand, is an interesting example of discovery by theoretical analysis. The history of science contains some rather interesting theoretical discoveries, such as Maxwell's equations and the Einstein-Lorentz transformations, yet research on modeling theoretical discovery in AI remains strikingly scarce.
Empirical discovery is an extensively studied area in AI, and a number of computational models have been designed to investigate its various aspects. Empirical discovery systems can be divided into two main classes, qualitative and quantitative systems, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley, et al., 1987), STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986), BR-3 (Kocabas, 1991a), KEKADA (Kulkarni & Simon, 1988), AbE (O'Rorke, Morris & Schulenburg, 1990), COAST (Rajamoney, 1990), MECHEM (Valdes-Perez, ???), and PAULI (Valdes-Perez, ???) can be cited.
Among the quantitative discovery systems, BACON (Langley, et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful example of quantitative discovery, and it has also attracted the interest of philosophers of science. The IDS system, on the other hand, integrates quantitative and qualitative methods.
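As an illustration of what quantitative discovery involves, the following is a minimal sketch in the spirit of BACON, not a reproduction of it. It assumes a brute-force search over small integer exponents, whereas BACON builds such terms step by step from observed trends. The function name find_invariants and its parameters are hypothetical. The data are rounded textbook values of the semi-major axis (in astronomical units) and the orbital period (in years) for Mercury to Jupiter.

```python
from itertools import product
from math import gcd


def find_invariants(xs, ys, max_exp=3, tol=0.02):
    """Return exponent pairs (p, q) for which x**p * y**q is nearly constant.

    Brute-force stand-in (an assumption of this sketch) for BACON's
    heuristics, which construct such terms incrementally.
    """
    found = []
    for p, q in product(range(1, max_exp + 1), range(-max_exp, max_exp + 1)):
        if q == 0 or gcd(p, abs(q)) != 1:
            continue  # skip one-variable terms and multiples of simpler pairs
        values = [x ** p * y ** q for x, y in zip(xs, ys)]
        mean = sum(values) / len(values)
        spread = (max(values) - min(values)) / abs(mean)
        if spread < tol:
            found.append((p, q))
    return found


# Semi-major axis (AU) and orbital period (years), Mercury to Jupiter.
axis = [0.387, 0.723, 1.000, 1.524, 5.203]
period = [0.241, 0.615, 1.000, 1.881, 11.862]

print(find_invariants(axis, period))
```

A pair (p, q) in the result means that axis**p * period**q is constant to within the tolerance; for these data the search recovers the form of Kepler's third law.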
So, for a systematic evaluation, computational models can be looked at within the framework of this classification. In this way, we would know why a logico-mathematical or formal discovery system does not need experimental data, and why a theoretical discovery system needs logico-mathematical, formal and theoretical knowledge for its operations.
3. The Methodology
It should be stated at this stage that no discovery model can reflect every detail of a discovery process, except perhaps when the model itself is used in a real-life discovery. In this perspective, historical discovery models can at best be rational reconstructions of the discovery process. In building such models, it is essential to find out and assemble the knowledge that has played a significant role in the discovery.
3.1 Collecting Historical Records
Collecting information about historical discoveries is not an easy task. One can identify three main sources of historical record for scientific discovery: history of science books, scientific research reports, and the log books used by scientists during the experiments leading to the discovery. Most of the current discovery systems rely on publications on the history of physics and chemistry dedicated to a certain period. Scientific research papers and reports can be used for reconstructing more recent discoveries (e.g. those made in this century). Kocabas (1992a) uses such research reports and articles in science journals for reconstructing the discoveries in oxide superconductivity. Log books are not easy to obtain, since they are personal property not meant for publication. It is no surprise that, among the discovery models, only Kulkarni & Simon's (1988) KEKADA is based on a log book (i.e., Hans Krebs' log book) for its reconstruction of the discovery of the urea cycle.
3.2 Assembling the Historical Records in Standard Formats
Building a complex discovery model may require a good deal of time and effort. The main problem in this task is to assemble the necessary knowledge which may have been used in the discovery. It seems best to develop a standard format to assemble this knowledge in a structured way. This format may include the following slots: Discovery (name, date and responsible scientist(s)), Historical Background, Available Technology, Empirical Knowledge, Theoretical Knowledge, Inputs, Algorithms, Heuristics, Results, Possible Alternative Results, and Effects of the Discovery. Figures 1 and 2 illustrate two examples of this structured representation.
This format provides a knowledge-level view of the discovery, and allows the construction of the model in a systematic way. It also helps to analyze and revise the model as necessary. Additionally, it shows the level of detail at which the model can be built for the reconstruction of the discovery.
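The slot list above maps naturally onto a record type. The following is a minimal sketch, assuming a Python dataclass as the container; the class name DiscoveryRecord, the field names and the helper empty_slots are hypothetical and are not taken from any of the cited systems. The example values are copied from the Y-Ba-Cu-O figure.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DiscoveryRecord:
    """One historical discovery, with one field per slot of the format."""
    name: str
    date: str
    scientists: List[str]
    historical_background: str = ""
    available_technology: List[str] = field(default_factory=list)
    empirical_knowledge: List[str] = field(default_factory=list)
    theoretical_knowledge: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    algorithms: List[str] = field(default_factory=list)
    heuristics: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)
    alternative_results: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)


def empty_slots(record):
    """Names of slots still unfilled, i.e. where the record lacks detail."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


ybco = DiscoveryRecord(
    name="Y-Ba-Cu-O oxide superconductor",
    date="16th February, 1987",
    scientists=["Paul Chu et al."],
    inputs=["La-Ba-Cu-O superconducting compound", "chemical elements",
            "knowledge about the synthesis of double and triple oxides"],
    algorithms=["Element substitutions in La-Ba-Cu-O compound"],
    results=["Substitution of Y for La in La-Ba-Cu-O"],
)

print(empty_slots(ybco))
```

Listing the empty slots gives a quick view of how much of the discovery has been documented before a model is built from the record.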
Figure 1. Example of formatted data for the discovery of Y-Ba-Cu-O superconductor in oxide superconductivity.
-----------------------------------------------------------------------
Discovery Event
Discovery: Neutrino
Date of Discovery: 1931, W Pauli; 1934, E Fermi; 1953, C Cowan & F Reines
Source: Ne'eman & Kirsh (1986), p 67-69.
Background :
Historical Background/Problems: After the discovery of the neutron in 1932, the picture of the atomic world seemed complete. Four elementary particles were known: photon, electron, proton and neutron. The nucleus was composed of protons and neutrons, and the behavior of the electrons around the nucleus was well explained by quantum mechanics. But there were unsolved problems such as the process of beta-decay and the nature of the forces that hold the components of the nucleus together. Beta-decay appeared to contradict the basic conservation laws of physics (conservation of energy and angular momentum):
^3_1H --> ^3_2He + e
n --> p + e
Calculations showed that in beta-decay, the mass difference between the original and the produced nuclei is equal to the maximal energy value that the electron can have, yet only a small minority of beta-particles (electrons) actually possess this energy. Most beta particles were emitted with less energy than indicated by the actual mass difference, and were not accompanied by a photon which could compensate for the energy difference.
Theoretical Background: The theories which attempted to explain the structure of the atomic nucleus predicted the existence of additional particles. Experimental physicists searched for these particles. Moreover, Dirac's 1928 relativistic wave equation had implied the existence of a particle with quantum properties opposite to those of the electron. (According to Dirac's equation, for every charged particle, there is an antiparticle.) In 1931, Pauli proposed that during beta-decay, an additional particle is emitted. This particle carried part of the energy liberated in the process. Its mass would be zero or very small, and it would be electrically neutral.
Types of Empirical Knowledge and Technology: Radioactivity, particle detectors, cloud chambers and particle reactions initiated by cosmic rays. (Early in the century, physicists who studied radioactivity observed that electroscopes discharged slowly even when there was no radiation in the vicinity. At first this was attributed to natural radioactivity, but in 1910 V. Hess showed that radiation grew stronger in the upper atmosphere. This radiation was later called cosmic rays. Later, it was realized that these were fast particles (mostly protons).) Unlike beta-decay, alpha decay was found to conserve energy, momentum and angular momentum (spin).
Discovery Process
Discovery Goals: Investigate beta-decay, and explain why this type of reaction violates the conservation of mass/energy and spin.
Inputs: Conservation laws concerning charge, energy, momentum and spin. A set of valid and observed particle reactions involving the electron, proton, neutron, positron, and gamma rays:
^2_1H + gamma --> ^1_1H + n (1934, Chadwick & Goldhaber)
p + p --> p + p
n --> p + e
e + /e --> gamma
gamma --> e + /e
The following quantum properties about the particles were known:
Particle   mass (MeV)   charge   spin
-------------------------------------------
gamma        0.00          0     1
e            0.51         -1     1/2
p          938.26          1     1/2
n          939.55          0     1/2
/e           0.51          1     1/2
-------------------------------------------
Algorithms:
The observed reaction
n --> p + e
violates the mass/energy and spin conservation laws. If we consider
the balance of charge and spin values:
0 = 1 + (-1) electrical charge
1/2 =/= 1/2 + 1/2 spin values
the inequality in the case of spin conservation can be turned into
an equality by introducing a hypothetical particle x, with zero
electrical charge and 1/2 spin:
n --> p + e + x
whose charge and spin balance would be
0 = 1 + (-1) + 0
1/2 = 1/2 + 1/2 - 1/2
(spins combine as vectors, so one of the three spin-1/2 products
can be oriented opposite to the other two).
Outputs: Postulation of a new particle (later identified as the
anti-neutrino /nu), with zero rest mass, zero electrical charge
and spin 1/2.
Secondary Results: The new particle was named the neutrino (nu) by
Fermi. Later, the neutrino emitted in beta-decay was accepted as
an anti-particle. Hence the subsequent formulation of beta-decay
was
n --> p + e + /nu.
By using the principle of symmetry, the possibility of a
reverse reaction was also considered:
/nu + p --> n + /e
Alternative Outputs: -
Theory Development: The discovery of the neutrino completed the knowledge
about beta-decay. The validity of the basic conservation laws of physics
was once again supported by the reactions of elementary particles.
In 1953 Konopinski & Mahmoud postulated the lepton number, and
proposed the conservation of lepton number analogous to the
conservation of charge.
Explanations
Types of Explanations: Theoretical, deductive, abductive.
New Research Problems: What was the rest mass of the neutrino? Did this particle have an antiparticle? How could this be proved? Would the neutrino react with other particles? What would be the results of reactions?
New Research Directions: The postulation of the neutrino led to the search for this particle, and it took about twenty years to prove its existence in the laboratory in an indirect way:
n --> p + e + /nu
/nu + p --> n + /e
/e + e --> 2 gamma
2 gamma + Cd --> Cd + n gamma
The gamma emissions from the last two reactions were detected, one after the other within milliseconds, by a series of photomultipliers.
-----------------------------------------------------------------------
Figure 2. Example of formatted historical data for the discovery of the neutrino in particle physics.
-----------------------------------------------------------------------
Discovery Event
Discovery: Y-Ba-Cu-O oxide superconductor
Date of Discovery: 16th February, 1987. Paul Chu et al.
Source: Physics Today?
Background :
Historical Background/Problems:
Theoretical Background: Several theories on superconductivity had been developed. One of these theories was the BCS theory which explains the phenomenon in terms of the conservation of angular and translational momentum.
Current theoretical knowledge implied the impossibility of oxide superconductors with higher Tcs than metal or alloy superconductors. (The theories were based on the accumulated experimental knowledge.)
Knowledge about the relationships between heat conductivity and electrical conductivity.
Types of Empirical Knowledge and Technology: Oxide superconductors had been known since 1973 when D. Johnston discovered superconductivity in LiTi2O4 at temperatures up to 13.7K. In 1975, A. Sleight discovered superconductivity in BaPb(1-x)Bi(x)O3 with a Tc up to 13K.
In 1986, La-Ba-Cu-O superconductor with Tc around 35K was discovered by Bednorz and Mueller.
Knowledge about elements in the Periodic Table. Processes for synthesis of double and triple oxide compounds. Element substitutions in such compounds.
Discovery Process
Discovery Goals: Search for oxides with higher Tcs than La-Ba-Cu-O compound.
Inputs: La-Ba-Cu-O superconducting compound, chemical elements, knowledge about the synthesis of double and triple oxides.
Algorithms: Element substitutions in La-Ba-Cu-O compound. Select an element from the Periodic Table with electronic properties similar to those of La, and substitute it for La in La-Ba-Cu-O under the relevant experimental conditions.
Outputs: Substitution of Y for La in La-Ba-Cu-O, and the discovery of Y-Ba-Cu-O superconductor.
Secondary Results: Other substitutions may be possible to yield better oxide superconductors.
Alternative Outputs: < to be filled >
Theory Development: The hypothesis that the substances with the highest Tcs are metal alloys was falsified once more.
Explanations
Types of Explanations: The role of crystal structure in oxide superconductivity was discussed. Explanations were based on electron-phonon interactions.
New Research Problems: Could there be other oxide compounds with higher Tcs?
New Research Directions: Search for oxides with higher Tcs. Explain oxide superconductivity.
-----------------------------------------------------------------------
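The Algorithms slot of the neutrino example can be sketched as a small conservation check. This is a toy illustration only, not the method of PAULI, BR-3 or any other cited system. It assumes a simplified spin test: since orbital angular momentum is an integer, the two sides of a reaction must agree on whether their total spin is an integer or a half-integer. The function names are hypothetical; the charge and spin values are those of the particle table in the figure.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# (charge, spin) values taken from the particle table in the neutrino figure.
PARTICLES = {
    "gamma": (0, Fraction(1)),
    "e": (-1, HALF),
    "p": (1, HALF),
    "n": (0, HALF),
    "/e": (1, HALF),
}


def total_charge(side, table):
    return sum(table[name][0] for name in side)


def spin_parity(side, table):
    """0 if the total spin of the side is an integer, 1 if half-integer."""
    return int(sum(2 * table[name][1] for name in side)) % 2


def is_balanced(lhs, rhs, table):
    """Check charge conservation and the simplified spin condition."""
    return (total_charge(lhs, table) == total_charge(rhs, table)
            and spin_parity(lhs, table) == spin_parity(rhs, table))


def postulate_particle(lhs, rhs, table=PARTICLES):
    """Return (charge, spin) candidates for x that balance lhs --> rhs + x."""
    candidates = []
    for charge, spin in product((-1, 0, 1), (Fraction(0), HALF, Fraction(1))):
        trial = dict(table, x=(charge, spin))  # table extended with x
        if is_balanced(lhs, rhs + ["x"], trial):
            candidates.append((charge, spin))
    return candidates


print(is_balanced(["n"], ["p", "e"], PARTICLES))
print(postulate_particle(["n"], ["p", "e"]))
```

Under these assumptions the observed reaction n --> p + e fails the check, and the only candidate that repairs it is a neutral particle with spin 1/2, which matches the Outputs slot of the figure.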
4. The Discovery Model
Computational models of discovery need to be evaluated in accordance with their type, i.e. for being formal, theoretical or empirical models. However, there are some common points of evaluation. These can be listed as follows: research goals; methods of knowledge representation; the size, order and role of initial knowledge; theory revision and search methods; methods of learning and discovery; generality of the system's methods; and the system's predictive abilities. We can now look at these one by one.
4.1. Research Goals
The research goals of a discovery system vary with its domain of interest and the methods that it employs. Some systems such as AM (Lenat, 1979), EURISKO (Lenat, 1983), and GLAUBER (Langley, et al., 1987) aim at discovering new concepts, relationships, heuristics or general hypotheses. Some other systems such as BR-3 (Kocabas, 1991a) and AbE (O'Rorke, Morris & Schulenburg, 1990) start with an impasse, and aim at consistency and/or completeness as their main goal, while discovery is a by-product of their activities. Yet others such as COAST (Rajamoney, 1990) and GENSIM/HYPGENE (Karp, 1990) search for consistent explanations, and GALILEO (Zytkow, 1990) for more expressive laws.
The research methods of a system must be adequate for its research goals. For example, a consistency-oriented system must have theory revision capabilities, and a completeness-oriented system must have the ability to generate and test new concepts and hypotheses. A few systems such as KEKADA (Kulkarni & Simon, 1988) and CER (Kocabas, 1989; 1992b) are capable of generating their own research goals by detecting problem states (inconsistencies, incompletenesses and anomalies) in their knowledge base. The system description of a computational model must clearly state its goals, or how they are generated.
4.2. Knowledge Representation Methods
Knowledge representation remains an important issue in computational models, as it affects the efficiency of a system's methods of search, learning and discovery. Early models (e.g. GLAUBER, STAHL and BR-3) employ relatively simple representation methods such as list structures and predicate expressions. Recent discovery systems (e.g. AbE, COAST, IDS) employ more structured knowledge representation schemes such as frames and qualitative process schemas, often in combination with predicate logic representation. Each representation scheme has its own advantages and disadvantages in terms of implementation (see, e.g., Kocabas, 1991b for details). Therefore, the choice of knowledge representation schemes, or their integration, is an important issue in the design of computational models. Consequently, the system description of a model must explicitly state the knowledge representation methods that it employs, and how they are integrated.
4.3. The Order, Size, and the Role of Initial Knowledge
Initially, discovery systems were divided into two broad groups, data-driven and theory-driven systems. Later on, this distinction began to appear superficial, for some systems (e.g. STAHLp and BR-3) start as data-driven models and acquire theory-driven characteristics during their operations. The size of the initial knowledge, and how much of it is actually utilized by a discovery system, is an important feature in the correct evaluation of that system. Some systems process data incrementally (e.g., STAHLp, BR-3, AbE, COAST), and the order in which data is given to the system affects its behavior (see, e.g., Kocabas, 1991a). If the discovery model is an incremental system, its description must evaluate the effects of data order. Data size is important for the evaluation of any discovery model, as a test of the effectiveness of its search methods.
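One way to evaluate the effects of data order is to run the incremental system on different orderings of the same data and compare the final results. The following is a minimal sketch of such a test harness. The toy learner greedy_grouping is a hypothetical stand-in chosen only because it is order dependent; it does not model STAHLp, BR-3, AbE or COAST.

```python
from itertools import permutations


def greedy_grouping(observations, threshold=1):
    """Toy incremental learner: put each value into the first group whose
    seed (first member) lies within the threshold, else start a new group."""
    groups = []
    for x in observations:
        for group in groups:
            if abs(x - group[0]) <= threshold:
                group.append(x)
                break
        else:
            groups.append([x])
    return frozenset(frozenset(group) for group in groups)


def order_sensitivity(learner, data, max_orders=24):
    """Map each distinct final result to the data orders that produced it."""
    outcomes = {}
    for i, order in enumerate(permutations(data)):
        if i >= max_orders:
            break  # cap the number of orderings tried
        outcomes.setdefault(learner(order), []).append(order)
    return outcomes


outcomes = order_sensitivity(greedy_grouping, (1, 2, 3))
for result, orders in outcomes.items():
    print(sorted(sorted(g) for g in result), "from", len(orders), "orders")
```

A system whose results do not depend on data order yields a single entry in the returned dictionary. Several entries, as for the toy learner here, show that the order needs to be reported and analyzed in the system description.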
4.4. Theory Revision and Search Methods
One of the prominent problems that haunt discovery systems with large search spaces is the control of search. Whatever search methods are used, the size of the effectively used initial knowledge base is a significant indicator of the system's dimensions. Models with large search spaces utilize a number of search control methods. These vary widely, from logical constraints (as in STAHLp) and algebraic constraints (as in BR-3 and GALILEO) to general rules (as in EURISKO, BACON, KEKADA and IDS) and domain constraints (as in BR-3, KEKADA, AbE, COAST and GENSIM/HYPGENE). The description of a computational model must include its search methods, and explain why those particular methods are used rather than others.
Theory revision is becoming an indispensable feature of discovery models. This is in line with the understanding that most scientific discoveries are the results of generating and testing hypotheses. If the discovery system has theory revision capabilities, these must first be described in general terms, and then illustrated with a particular example. Where, how and why the system's search and theory revision methods fail also needs to be explained. Artificial data can be used in testing the effectiveness of a system's theory revision and search methods.
4.5. Learning and Discovery Methods
Discovery systems utilize deductive and inductive methods, but so far there is no discovery model that uses analogical reasoning in a non-trivial sense. Logico-mathematical, formal and theoretical discovery systems such as AM, EURISKO, PI, GALILEO and ECHO rely extensively on deductive methods, while BACON employs inductive methods. Systems like STAHL, STAHLp, BR-3, KEKADA and IDS employ both inductive and deductive methods. A system's discovery methods cannot be separated from its search and theory revision methods.
4.6. Generality of Methods
Another important metric in the evaluation of a discovery model has been the generality of its discovery and search methods. Some discovery models such as EURISKO and BACON rely on rather general heuristics for their discoveries. Similarly, BR-3 employs algebraic rules to reduce its search space. However, there seems to be a limit for the uses of such general heuristics, as systems with more and structured domain knowledge must inevitably use domain heuristics for constraining search. Therefore, the size and the type of the discovery model must be considered in evaluating the generality of a system's methods.
4.7. Predictive Abilities
Predictive ability can be defined as a system's ability to generate a set of propositions which were undecidable prior to the discovery but are decidable afterwards. Predictive ability is an important feature of theoretical and empirical systems. Does the system's predictive ability improve as it discovers new concepts, hypotheses or relationships? The answer to this question is also an indication of how effectively the system integrates and uses the knowledge that it has discovered. The discoveries of systems like BACON and GALILEO are validly applicable to an indefinite number of physical states. However, by themselves, these systems do not apply their knowledge to physical states. IDS, FAHRENHEIT and BR-3, on the other hand, effectively utilize the knowledge they have discovered in new problem states.
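The definition above amounts to a set difference: the queries that can be decided after the discovery minus those that could be decided before it. The following is a minimal sketch under simplifying assumptions. A "law" is represented as a Python function keyed by the quantity it predicts, and a query is "decidable" when such a law is available. The names decide and predictive_gain and the query format are hypothetical.

```python
def decide(query, laws):
    """Return the predicted value for a query, or None if it is undecidable.

    A query is (quantity, givens), where givens is a tuple of
    (name, value) pairs; laws maps a quantity to a function of the givens.
    """
    quantity, givens = query
    law = laws.get(quantity)
    return None if law is None else law(**dict(givens))


def predictive_gain(queries, laws_before, laws_after):
    """Queries undecidable before the discovery but decidable afterwards."""
    before = {q for q in queries if decide(q, laws_before) is not None}
    after = {q for q in queries if decide(q, laws_after) is not None}
    return after - before


# Toy knowledge states: Ohm's law (V = I * R) is discovered between them.
laws_before = {}
laws_after = {"V": lambda I, R: I * R}

queries = [
    ("V", (("I", 2.0), ("R", 3.0))),   # voltage for a given current and resistance
    ("P", (("T", 300.0),)),            # pressure: no law known in either state
]

print(predictive_gain(queries, laws_before, laws_after))
print(decide(queries[0], laws_after))
```

Tracking the size of this set as a system makes successive discoveries is one possible way to answer the question of whether its predictive ability improves.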
5. Conclusion
Computational modeling of scientific discovery has been emerging as an important research area in artificial intelligence, and the number of computational models is steadily increasing. A methodology for systematic evaluation of these systems is necessary, not only for researchers in this field, but also for the interested philosophers and historians of science. First of all, a methodology needs to be developed for building historical discovery models. Secondly, a method of classification for such models for a systematic evaluation is needed. Then a set of evaluation criteria needs to be identified, which can include the research goals, knowledge representation methods, the role of initial knowledge, theory revision and search methods, learning and discovery methods, generality and the system's predictive abilities. In this paper we have discussed these issues and provided examples for the methods to be used.
References
Darden, L. (1987). Viewing the history of science as compiled hindsight. The AI Magazine 8, No. 2, (33-42).
Karp, P.D. (1990). Hypothesis formation as design. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Kocabas, S. (1989). Functional Categorization of knowledge: Applications in modeling scientific discovery. PhD Thesis, Department of Electronic and Electrical Engineering, King's College London, University of London.
Kocabas, S. (1991a). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S. (1991b). A review of learning. The Knowledge Engineering Review, 6, 3.
Kocabas, S. (1991c). Computational models of scientific discovery. The Knowledge Engineering Review, 6, 259-305.
Kocabas, S. (1992a). Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. (1992b). Four levels of learning and representation in modeling scientific discovery. First Turkish Symposium on AI and Neural Networks, 25-26 June, Bilkent, Ankara.
Kulkarni, D. & Simon, H.A. (1988). The processes of scientific discovery. Cognitive Science, 12, 277-309.
Langley, P., Simon, H.A., Bradshaw, G.L., Zytkow, J.M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: The MIT Press.
Lenat, D.B. (1979). On automated scientific theory formation: A case study using the AM program. In J. Hayes, D. Michie and L.I. Mikulich (Eds.) Machine Intelligence 9, (251-283). New York: Halstead.
Lenat, D.B. (1983). EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, Nos. 1-2, (61-98).
Nordhausen, B. & Langley, P. (1987). Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. & Schulenburg, D. (1990). Theory formation by abstraction. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Thagard, P. & Holyoak, K. (1985). Discovering the wave theory of sound: Inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. & Nowak, G. (1990). The conceptual structure of the geological revolution. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. (199?). MECHEM ......
Valdes-Perez, R. (in press). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning.
Zytkow, J. (1987). Combining many searches in the FAHRENHEIT discovery system. Proceedings of the Fourth International Workshop on Machine Learning, Los Altos, CA: Morgan Kaufmann, 281-287.
Zytkow, J. (1990). Deriving laws through analysis of process and equations. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J. & Simon, H.A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
References on Superconductivity
Khurana, A. (1987a). Search and discovery: Superconductivity seen above the boiling point of nitrogen. Physics Today, April, 1987, 17-23.
Khurana, A. (1987b). Search and discovery: Bednorz and Mueller win Nobel Prize for new superconducting materials. Physics Today, December, 1987, 17-19.
References on Physics
Griffiths, D. (1987). Introduction to Elementary Particles. N.Y., John Wiley & Sons.
Ne'eman, Y. and Kirsh, Y. (1986). The Particle Hunters. Cambridge University Press.
Pais, A. (1986). Inward Bound.
Sakir Kocabas *
uckoca @ tritu.bitnet
Department of Artificial Intelligence
Marmara Research Center, PK 21, Gebze, Turkey
Abstract
Computational modeling of scientific discovery has been emerging as an important research field in artificial intelligence. Building theoretical models for scientific development has until recently been the exclusive domain for philosophers of science. With the advances in artificial intelligence and especially in machine learning, opportunities have arisen for researchers in this field to test the learning methods developed in modeling scientific discovery. In the last fifteen years, a number of systems have been developed modeling various discoveries ranging from 17th to 20th century physics and chemistry. However, a methodology for building and evaluating such models has still not been developed. This paper focuses on the elements of historical discovery models, and the methods for their systematic construction and evaluation.
* Also affiliated with the Department of Space Sciences and Technology, ITU, Maslak, Istanbul, Turkey.
1. Introduction
Recent research in the computational study of science has revealed a number of important aspects of science that were overlooked by the conventional study of science. Shrager and Langley (1990) describe the basic differences between the computational and the conventional philosophical approaches as follows: the conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific discovery, including the activities of experimentation, data evaluation, and theory formation.
The distinction can be extended even further: the computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, but also with a series of other issues related to scientific research. Kocabas (1992b) names more than a dozen major research tasks involved in the physical sciences. These range from formulating and selecting research goals, defining the research framework, and gathering and organising related knowledge, through selecting research strategies, methods, tools and techniques, to designing experiments, data collection, hypothesis and theory formation, theory revision and producing scientific explanations. Any of these research tasks may involve a variety of planning, classification and evaluation problems.
Computational study of science is more concerned with the methodological issues in science rather than the logico-philosophical issues which are the main concern of conventional studies. The main purpose of the former is to investigate the processes that lead to discovery in science, and eventually to build a model (or models) of scientific research which would be used as artificial research assistants.
Another discipline, social study of science, deals with the social dimension of science, e.g., with how scientific communities form and interact, how research projects are developed into research programmes, how these programmes evolve or terminate, and how research traditions develop in human societies. History of science, on the other hand, investigates scientific developments through the historical records, and provides a historical perspective to science.
Computational study of science draws ideas, perspectives, methods and data from conventional philosophical, social and historical studies, but it differs from these disciplines in some essential ways: 1) It has a medium, a computational model, for the reconstruction and analysis of an historical discovery, 2) Using its models, it can investigate the possible alternative routes to the discovery, 3) It aims at assembling heuristics for developing models of scientific research for currently active research projects in science.
2. Types of Discovery
A methodology for the systematic evaluation of discovery models should first of all be capable of distinguishing between different types of discovery. In other words, it should provide a classification of discovery, so that one can identify a certain type in the history of science in relation to certain other discoveries.
Kocabas (1991c) introduces an implicit classification, which can be reformulated as follows: 1) Logico-Mathematical/Formal Discovery, 2) Theoretical Discovery, and 3) Empirical Discovery. This classification is somewhat in parallel with the categorization of knowledge by Kocabas (1992a), and reflects an order of diminishing degree of abstraction.
Logico-Mathematical/Formal Discovery: This type of discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. Formal Discovery takes place in a formal domain which involves abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge for inductive and/or deductive inference on domain knowledge. Examples of this type of discovery are the mathematical techniques and formal theories starting from the invention of decimal system and algebra to modern mathematics, and various axiom systems.
Theoretical Discovery: This type of discovery requires logico-mathematical, formal and theoretical knowledge, and in general results from theoretical analysis and synthesis. Some examples to theoretical discovery from the history of science are: a) The emergence of the pecial theory of relativity based on Einstein-Lorenz transformations, b) Maxwell's theory of electromagnetism based on his equations, c) Yukawa's theory of nuclear forces and mesons, and d) Dirac's theory of charge symmetry and antiparticles.
Empirical Discovery: Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge has not been a prerequisite in the early empirical discoveries in the history of science, but in modern empirical research such as in oxide superconductivity and "cold fusion" experiments, theoretical domain knowledge is necessary. Empirical discovery can be further divided as heuristic and experimental/observational discovery.
Heuristic discoveries take place in attempts to finding qualitative and/or quantitative relationships in experimental data. Some examples to such discoveries are: a) GLAUBER's (Langley, et al., 1987) formulation of acid-alkali theory in the 17th century chemistry, b) STAHL's (Zytkow & Simon, 1986) discovery of componential models of compounds in the 18th century chemistry, c) Quantitative discoveries of simple pysical laws in classical physics (e.g. Kepler's laws, Boyle's law, Ohm's law), d) Discovery of new quantum properties and their value distribution to elementary particles in particle physics.
Experimental/Observational Discovery: This type of discovery is usually initiated by a technological invention or innovation. Two examples are the discovery of superconductivity by Onnes following his invention of a method to liquefy helium, and the discovery of new particle interactions after the invention of the cloud chamber.
A number of computational systems have been developed in the last 15 years for modeling these different types of discoveries.
Some of the earliest AI systems such as Logic Theorist were designed to prove theorems in logic. Among the more recent systems, AM (Lenat, 1979) stands out as a successful example in modeling mathematical discovery. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it requires neither experimentation nor observation, nor knowledge of a physical domain. Lenat's (1983) EURISKO, in its applications to naval fleet design, evolution, and three-dimensional circuit design, is a good example of a formal discovery system.
Examples of theoretical discovery models are PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), and GALILEO (Zytkow, 1990). The first two systems can be better characterized as conceptual discovery systems, and as such, are closer to formal discovery systems. GALILEO on the other hand is an interesting example of discovery by theoretical analysis. In the history of science, there are some rather interesting theoretical discoveries such as Maxwell's equations and the Einstein-Lorentz transformations. The scarcity of AI research on modeling theoretical discovery remains striking.
Empirical discovery is an extensively studied area in AI, and a number of computational models have been designed to investigate its various aspects. Empirical discovery systems can be divided into two main classes, qualitative and quantitative systems, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley, et al., 1987), STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986), BR-3 (Kocabas, 1991a), KEKADA (Kulkarni & Simon, 1988), AbE (O'Rorke, Morris & Schulenburg, 1990), COAST (Rajamoney, 1990), MECHEM (Valdes-Perez, ???), and PAULI (Valdes-Perez, ???) can be cited.
Among the quantitative discovery systems, BACON (Langley, et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful example of quantitative discovery, and has also attracted the interest of philosophers of science. The IDS system, on the other hand, integrates quantitative and qualitative methods.
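As a rough illustration of what quantitative empirical discovery involves, the following Python sketch searches for small integer exponents that make a product of two observed quantities nearly constant. It is only loosely inspired by the kind of invariant finding associated with BACON and is not a reconstruction of that system. The function name find_invariant, the tolerance, and the rounded planetary values are assumptions made for this example.

```python
from itertools import product

def find_invariant(xs, ys, max_power=3, tolerance=0.01):
    """Look for integer exponents (p, q) such that x**p * y**q is nearly
    constant over all observations. Returns (p, q, mean, spread) for the
    tightest candidate, or None if nothing is within the tolerance."""
    best = None
    for p, q in product(range(1, max_power + 1),
                        range(-max_power, max_power + 1)):
        terms = [x ** p * y ** q for x, y in zip(xs, ys)]
        mean = sum(terms) / len(terms)
        # Relative spread of the candidate term across the observations.
        spread = (max(terms) - min(terms)) / abs(mean)
        if spread < tolerance and (best is None or spread < best[3]):
            best = (p, q, mean, spread)
    return best

# Rounded illustrative values: mean distance from the Sun (astronomical
# units) and orbital period (years) for Mercury, Venus, Earth and Mars.
distances = [0.387, 0.723, 1.000, 1.524]
periods = [0.241, 0.615, 1.000, 1.881]

law = find_invariant(distances, periods)
if law is not None:
    p, q, mean, _ = law
    print(f"distance^{p} * period^{q} is roughly constant (mean {mean:.3f})")
```

On these values the search settles on the exponents 3 and -2, which corresponds to Kepler's third law. A real discovery system would also have to cope with noise, more than two variables, and the choice of which terms to combine.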
So, for a systematic evaluation, computational models can be looked at within the framework of this classification. In this way, we would know why a logico-mathematical or formal discovery system does not need experimental data, and why a theoretical discovery system needs logico-mathematical, formal and theoretical knowledge for its operations.
3. The Methodology
It should be stated at this stage that no discovery model can reflect every detail of a discovery process, except perhaps when the model itself is used in a real-life discovery. In this perspective, historical discovery models can at best be rational reconstructions of the discovery process. In building such models, it is essential to find out and assemble the knowledge that has played a significant role in the discovery.
3.1 Collecting Historical Records
Collecting information about historical discoveries is not an easy task. One can identify three main sources of historical record for scientific discovery: history of science books, scientific research reports, and the log books used by scientists during the experiments leading to the discovery. Most of the current discovery systems rely on publications on the history of physics and chemistry dedicated to a certain period. Scientific research papers and reports can be used for reconstructing more recent discoveries (e.g. the ones made in this century). Kocabas (1992a) uses such research reports and articles in science journals for reconstructing the discoveries in oxide superconductivity. Log books are not easy to obtain, being personal property not meant for publication. It is no surprise that, among the discovery models, only Kulkarni & Simon's (1988) KEKADA is based on a log book (i.e., Hans Krebs' log book) for its reconstruction of the discovery of the urea cycle.
3.2 Assembling the Historical Records in Standard Formats
Building a complex discovery model may require a good deal of time and effort. The main problem in this task is to assemble the necessary knowledge which may have been used in the discovery. It seems best to develop a standard format to assemble this knowledge in a structured way. This format may include the following slots: Discovery (name, date and responsible scientist(s)), Historical Background, Available Technology, Empirical Knowledge, Theoretical Knowledge, Inputs, Algorithms, Heuristics, Results, Possible Alternative Results, and Effects of the Discovery. Figures 1 and 2 illustrate two examples of this structured representation.
This format provides a knowledge level view of the discovery, and allows the construction of the model in a systematic way. It also helps to analyze and revise the model as necessary. Additionally, it shows the level of detail at which the model can be built for the reconstruction of the discovery.
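One way to make such a format machine-readable is to treat each slot as a field of a record. The Python sketch below is a minimal illustration under that assumption; the class name DiscoveryRecord, the field names, and the empty_slots helper are hypothetical and are not part of any system described here. The example values are abbreviated from the Y-Ba-Cu-O record.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiscoveryRecord:
    """One historical discovery, with one field per slot of the format."""
    name: str
    date: str
    scientists: List[str]
    historical_background: str = ""
    available_technology: List[str] = field(default_factory=list)
    empirical_knowledge: List[str] = field(default_factory=list)
    theoretical_knowledge: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    algorithms: List[str] = field(default_factory=list)
    heuristics: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)
    alternative_results: List[str] = field(default_factory=list)
    effects: List[str] = field(default_factory=list)

    def empty_slots(self) -> List[str]:
        """Names of slots still to be filled in from the historical record."""
        return [slot for slot, value in vars(self).items() if not value]

record = DiscoveryRecord(
    name="Y-Ba-Cu-O oxide superconductor",
    date="16th February, 1987",
    scientists=["Paul Chu et al."],
    inputs=["La-Ba-Cu-O superconducting compound", "chemical elements"],
    algorithms=["Element substitutions in La-Ba-Cu-O compound"],
    results=["Substitution of Y for La in La-Ba-Cu-O"],
)
print(record.empty_slots())
```

Listing the empty slots gives a quick view of how much of the historical record has been assembled, and therefore of the level of detail at which the model can be built.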
Figure 1. Example of formatted data for the discovery of Y-Ba-Cu-O superconductor in oxide superconductivity.
-----------------------------------------------------------------------
Discovery Event
Discovery: Neutrino
Date of Discovery: 1931, W Pauli; 1934, E Fermi; 1953, C Cowan & F Reines
Source: Ne'eman & Kirsh (1986), p 67-69.
Background :
Historical Background/Problems: After the discovery of the neutron in 1932, the picture of the atomic world was complete. Four elementary particles were known: photon, electron, proton and neutron. The nucleus was composed of protons and neutrons, and the behavior of the electrons around the nucleus was well explained by quantum mechanics. But there were unsolved problems such as the process of beta-decay and the nature of the forces that hold the components of the nucleus together. Beta-decay appeared to contradict the basic conservation laws of physics (conservation of energy and angular momentum).
^3_1H --> ^3_2He + e
n --> p + e
Calculations showed that in beta-decay, the mass difference between the original and the produced nuclei is equal to the maximal energy value that the electron can have, yet only a small minority of beta-particles (electrons) actually possess this energy. Most beta particles were emitted with less energy than indicated by the actual mass difference, and were not accompanied by a photon which could compensate for the energy difference.
Theoretical Background: The theories which attempted to explain the structure of the atomic nucleus predicted the existence of additional particles. Experimental physicists searched for these particles. Moreover, Dirac's 1928 relativistic wave equation had implied the existence of a particle with opposite quantum properties of the electron. (According to Dirac's equation, for every charged particle, there is an anti particle.) In 1931, Pauli proposed that during beta-decay, an additional particle is emitted. This particle carried part of the energy liberated in the process. Its mass would be zero or very small, and it would be electrically neutral.
Types of Empirical Knowledge and Technology: Radioactivity, particle detectors, cloud chambers and particle reactions originated by cosmic rays. (Early in the century, physicists who studied radioactivity observed that electroscopes discharged slowly even when there was no radiation in the vicinity. At first this was attributed to natural radioactivity, but in 1910 V. Hess showed that radiation grew stronger in the upper atmosphere. This radiation was later called cosmic rays. Later, it was realized that these were fast particles, mostly protons.) Unlike beta-decay, alpha decay was found to conserve energy, momentum and angular momentum (spin).
Discovery Process
Discovery Goals: Investigate beta-decay, and explain why this type of reaction violates the conservation of mass/energy and spin.
Inputs: Conservation laws concerning charge, energy, momentum and spin. A set of valid and observed particle reactions involving the electron, proton, neutron, positron, and gamma rays:
^2_1H + gamma --> ^1_1H + n (1934, Chadwick & Goldhaber)
p + p --> p + p
n --> p + e
e + /e --> gamma
gamma --> e + /e
The following quantum properties about the particles were known:
Particle    mass (MeV)    charge    spin
-------------------------------------------
gamma         0             0       1
e             0.51         -1       1/2
p           938.26          1       1/2
n           939.55          0       1/2
/e            0.51          1       1/2
-------------------------------------------
Algorithms:
The observed reaction
n --> p + e
violates the mass/energy and spin conservation laws. If we consider
the balance of charge and spin values:
0 = 1 + (-1) electrical charge
1/2 =/= 1/2 + 1/2 spin values
the inequality in the case of spin conservation can be turned into
an equality by introducing a hypothetical particle x, with zero
electrical charge and 1/2 spin:
n --> p + e + x
whose charge and spin balance would be
0 = 1 + (-1) + 0
1/2 == 1/2 + 1/2 + 1/2
(spins combine vectorially, so three spins of 1/2 can give a total of 1/2).
Outputs: Postulation of a new particle (which is called the
anti-neutrino /nu), with zero rest mass, zero electrical charge
and 1/2 spin values.
Secondary Results: The new particle was named the neutrino (nu) by
Fermi. Later, the neutrino emitted in beta-decay was accepted as
an anti-particle. Hence the subsequent formulation of beta-decay
was
n --> p + e + /nu.
By using the principle of symmetry the possibility of a
reverse reaction was also considered:
/nu + p --> n + /e
Alternative Outputs: -
Theory Development: The discovery of the neutrino completed the knowledge
about beta-decay. The validity of the basic conservation laws of physics
was once again supported by the reactions of elementary particles.
In 1953 Konopinski & Mahmoud postulated the lepton number, and
proposed the conservation of lepton number analogous to the
conservation of charge.
Explanations
Types of Explanations: Theoretical, deductive, abductive.
New Research Problems: What was the rest mass of the neutrino? Did this particle have an antiparticle? How could this be proved? Would the neutrino react with other particles? What would be the results of reactions?
New Research Directions
The postulation of the neutrino led to the search for this particle, and it took about twenty years to prove its existence in the laboratory in an indirect way:
n --> p + e + /nu
/nu + p --> n + /e
/e + e --> 2 gamma
2 gamma + Cd --> Cd + n gamma
The gamma emissions from the last two reactions were detected by a series of photomultipliers one after the other in milliseconds.
-----------------------------------------------------------------------
Figure 2. Example of formatted historical data for the discovery of the neutrino in particle physics.
-----------------------------------------------------------------------
Discovery Event
Discovery: Y-Ba-Cu-O oxide superconductor
Date of Discovery: 16th February, 1987. Paul Chu et al.
Source: Physics Today?
Background :
Historical Background/Problems:
Theoretical Background: Several theories on superconductivity had been developed. One of these theories was the BCS theory which explains the phenomenon in terms of the conservation of angular and translational momentum.
Current theoretical knowledge implied the impossibility of oxide superconductors with higher Tcs than metal or alloy superconductors. (The theories were based on the accumulated experimental knowledge.)
Knowledge about the relationships between heat conductivity and electrical conductivity.
Types of Empirical Knowledge and Technology: Oxide superconductors had been known since 1973 when D. Johnston discovered superconductivity in LiTi2O4 at temperatures up to 13.7K. In 1975, A. Sleight discovered superconductivity in BaPb(1-x)Bi(x)O3 with a Tc up to 13K.
In 1986, La-Ba-Cu-O superconductor with Tc around 35K was discovered by Bednorz and Mueller.
Knowledge about elements in the Periodic Table. Processes for synthesis of double and triple oxide compounds. Element substitutions in such compounds.
Discovery Process
Discovery Goals: Search for oxides with higher Tcs than La-Ba-Cu-O compound.
Inputs: La-Ba-Cu-O superconducting compound, chemical elements, knowledge about the synthesis of double and triple oxides.
Algorithms: Element substitutions in La-Ba-Cu-O compound. Select an element from the Periodic Table with electronic properties similar to La, and substitute this element for La in La-Ba-Cu-O under the relevant experimental conditions.
Outputs: Substitution of Y for La in La-Ba-Cu-O, and the discovery of Y-Ba-Cu-O superconductor.
Secondary Results: Other substitutions may be possible to yield better oxide superconductors.
Alternative Outputs: < to be filled >
Theory Development: The hypothesis that substances with highest Tcs are metal alloys was falsified once more.
Explanations
Types of Explanations: The role of crystal structure in oxide superconductivity was discussed. Explanations were based on electron-phonon interactions.
New Research Problems: Could there be other oxide compounds with higher Tcs?
New Research Directions: Search for oxides with higher Tcs. Explain oxide superconductivity.
-----------------------------------------------------------------------
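The charge and spin bookkeeping in the Algorithms slot of the neutrino record can be sketched in Python as follows. This is a simplified illustration, not the method of any of the systems cited. The function names are hypothetical, the charge and spin values are those of the particle table in the record, and the spin test is only a necessary condition: because orbital angular momentum is an integer, both sides of a reaction must agree on whether their total spin is an integer or a half-integer.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# (charge, spin) pairs as listed in the particle table of the neutrino record.
PARTICLES = {
    "gamma": (0, Fraction(1)),
    "e": (-1, HALF),
    "p": (1, HALF),
    "n": (0, HALF),
    "/e": (1, HALF),
}

def charge_balanced(lhs, rhs, table):
    """True if total electrical charge is the same on both sides."""
    return sum(table[x][0] for x in lhs) == sum(table[x][0] for x in rhs)

def spin_compatible(lhs, rhs, table):
    """Necessary condition only: both sides must be integer-spin or both
    half-integer-spin in total."""
    def is_half_integer(side):
        return sum(table[x][1] for x in side).denominator == 2
    return is_half_integer(lhs) == is_half_integer(rhs)

def postulate_particle(lhs, rhs, table):
    """Try a small set of (charge, spin) candidates for an extra product
    particle x that restores both balances; return the first that works."""
    for charge in (0, 1, -1):
        for spin in (Fraction(0), HALF, Fraction(1)):
            trial = dict(table, x=(charge, spin))
            products = rhs + ["x"]
            if (charge_balanced(lhs, products, trial)
                    and spin_compatible(lhs, products, trial)):
                return charge, spin
    return None

beta_decay = (["n"], ["p", "e"])
print(charge_balanced(*beta_decay, PARTICLES))   # charge already balances
print(spin_compatible(*beta_decay, PARTICLES))   # spin does not
print(postulate_particle(*beta_decay, PARTICLES))
```

For the observed reaction n --> p + e, the search proposes a neutral particle with spin 1/2, in line with the postulated particle in the record. Energy and momentum conservation are left out of this sketch.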
4. The Discovery Model
Computational models of discovery need to be evaluated in accordance with their type, i.e. for being formal, theoretical or empirical models. However, there are some common points of evaluation. These can be listed as follows: research goals; methods of knowledge representation; the size, order and role of initial knowledge; theory revision and search methods; methods of learning and discovery; generality of the system's methods; and the system's predictive abilities. We can now look at these one by one.
4.1. Research Goals
The research goals of a discovery system vary with its domain of interest, and the methods that it employs. Some systems such as AM (Lenat, 1979), EURISKO (Lenat, 1983), and GLAUBER (Langley, et al., 1987) aim at discovering new concepts, relationships, heuristics or general hypotheses. Some other systems such as BR-3 (Kocabas, 1991a) and AbE (O'Rorke, Morris & Schulenburg, 1990) start with an impasse, and aim at consistency and/or completeness as their main goal, while discovery is a by-product of their activities. Yet others such as COAST (Rajamoney, 1990) and GENSIM/HYPGENE (Karp, 1990) search for consistent explanations, and GALILEO (Zytkow, 1990) for more expressive laws.
The research methods of a system must be adequate for its research goals. For example, a consistency oriented system must inevitably have theory revision capabilities, and a completeness oriented system must have the ability to generate and test new concepts and hypotheses. A few systems such as KEKADA (Kulkarni & Simon, 1988) and CER (Kocabas, 1989; 1992b) are capable of generating their own research goals by detecting problem states (inconsistencies, incompletenesses and anomalies) in their knowledge base. The system description of a computational model must clearly state its goals, or how they are generated.
4.2. Knowledge Representation Methods
Knowledge representation remains an important issue in computational models, as it affects the efficiency of a system's methods of search, learning and discovery. Early models (e.g. GLAUBER, STAHL and BR-3) employ relatively simple representation methods such as list structures and predicate expressions. Recent discovery systems (e.g. AbE, COAST, IDS) employ more structured knowledge representation schemes such as frames and qualitative process schemas, often in combination with predicate logic representation. Each representation scheme has its own advantages and disadvantages in terms of implementation (see, e.g., Kocabas, 1991b for details). Therefore, the choice of knowledge representation schemes or their integration is an important issue in the design of computational models. Consequently, the system description of a model must explicitly state the knowledge representation methods that it employs, and how they are integrated.
4.3. The Order, Size, and the Role of Initial Knowledge
Initially, the discovery systems were divided into two broad groups as data- and theory-driven systems. Later on, the distinction began to appear as superficial, for some systems (e.g. STAHLp and BR-3) start as data driven models and acquire theory-driven system characteristics during their operations. The size of initial knowledge and how much of it is utilized by a discovery system is an important feature in the correct evaluation of that system. Some systems process data incrementally (e.g., STAHLp, BR-3, AbE, COAST), and the order of data given to the system affects its behavior (see, e.g., Kocabas, 1991a). If the discovery model is an incremental system, its description must evaluate the effects of data order. Data size is important for the evaluation of any discovery model to test the effectiveness of its search methods.
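The effect of data order on an incremental system can be probed by presenting the same observations in every possible order and counting the distinct outcomes. The Python sketch below does this for a deliberately trivial learner in which the first value seen for a key is kept. Both functions and the abstract data are hypothetical and stand in for a real discovery system and its inputs.

```python
from itertools import permutations

def incremental_learner(observations):
    """Toy incremental learner: the first value seen for a key is kept,
    and later conflicting values are ignored."""
    beliefs = {}
    for key, value in observations:
        beliefs.setdefault(key, value)
    return frozenset(beliefs.items())

def order_sensitivity(learner, observations):
    """Number of distinct final states reached over all input orders.
    A result of 1 means the learner is insensitive to data order."""
    outcomes = {learner(order) for order in permutations(observations)}
    return len(outcomes)

consistent = [("A", 1), ("B", 2), ("C", 3)]
conflicting = [("A", 1), ("A", 2), ("B", 1)]

print(order_sensitivity(incremental_learner, consistent))
print(order_sensitivity(incremental_learner, conflicting))
```

Exhaustive enumeration is only practical for small data sets, since the number of orders grows factorially. For larger inputs a random sample of orders would be the obvious substitute.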
4.4. Theory Revision and Search Methods
One of the prominent problems that haunt discovery systems with large search spaces is the control of search. Whatever search methods are used, the size of the effectively used initial knowledge base is a significant indicator of the system's dimensions. Models with large search spaces utilize a number of search control methods. These can be as widely varied as logical constraints (as in STAHLp), algebraic constraints (as in BR-3 and GALILEO), general rules (as in EURISKO, BACON, KEKADA and IDS), and domain constraints (as in BR-3, KEKADA, AbE, COAST and GENSIM/HYPGENE). The description of a computational model must include its search methods, and explain why those particular methods are used rather than others.
Theory revision is becoming an indispensable feature of discovery models. This is in line with the understanding that most scientific discoveries are the results of generating and testing hypotheses. If the discovery system has theory revision capabilities, these must first be described in detail in general terms, and then explained with a particular example. It must also be explained where, how and why the system's search and theory revision methods fail. Artificial data can be used in testing the effectiveness of a system's theory revision and search methods.
4.5. Learning and Discovery Methods
Discovery systems utilize deductive and inductive methods, but until now, there has been no discovery model that uses analogical reasoning in a non-trivial sense. Logico-mathematical, formal and theoretical discovery systems such as AM, EURISKO, PI, GALILEO and ECHO rely extensively on deductive methods, while BACON employs inductive methods. Systems like STAHL, STAHLp, BR-3, KEKADA and IDS employ both inductive and deductive methods. A system's discovery methods cannot be separated from its search and theory revision methods.
4.6. Generality of Methods
Another important metric in the evaluation of a discovery model has been the generality of its discovery and search methods. Some discovery models such as EURISKO and BACON rely on rather general heuristics for their discoveries. Similarly, BR-3 employs algebraic rules to reduce its search space. However, there seems to be a limit to the use of such general heuristics, as systems with more, and more structured, domain knowledge must inevitably use domain heuristics for constraining search. Therefore, the size and the type of the discovery model must be considered in evaluating the generality of a system's methods.
4.7. Predictive Abilities
Predictive ability can be defined as a system's ability to generate a set of propositions which were undecidable prior to the discovery but are decidable afterwards. Predictive ability is an important feature of theoretical and empirical systems. Does the system's predictive ability improve as it discovers new concepts, hypotheses or relationships? The answer to this question is also an indication of how effectively the system integrates and uses the knowledge that it has discovered. Discoveries of systems like BACON and GALILEO are validly applicable to an indefinite number of physical states. However, by themselves, these systems do not apply their knowledge to physical states. IDS, FAHRENHEIT and BR-3, on the other hand, effectively utilize the knowledge they have discovered in new problem states.
5. Conclusion
Computational modeling of scientific discovery has been emerging as an important research area in artificial intelligence, and the number of computational models is steadily increasing. A methodology for systematic evaluation of these systems is necessary, not only for researchers in this field, but also for the interested philosophers and historians of science. First of all, a methodology needs to be developed for building historical discovery models. Secondly, a method of classification for such models for a systematic evaluation is needed. Then a set of evaluation criteria needs to be identified, which can include the research goals, knowledge representation methods, the role of initial knowledge, theory revision and search methods, learning and discovery methods, generality and the system's predictive abilities. In this paper we have discussed these issues and provided examples for the methods to be used.
References
Darden, L. (1987). Viewing the history of science as compiled hindsight. The AI Magazine 8, No. 2, (33-42).
Karp, P.D. (1990). Hypothesis formation as design. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Kocabas, S. (1989). Functional Categorization of knowledge: Applications in modeling scientific discovery. PhD Thesis, Department of Electronic and Electrical Engineering, King's College London, University of London.
Kocabas, S. (1991a). Conflict resolution as discovery in particle physics. Machine Learning, 6, 277-309.
Kocabas, S. (1991b). A review of learning. The Knowledge Engineering Review, 6, 3.
Kocabas, S. (1991c). Computational models of scientific discovery. The Knowledge Engineering Review, 6, 259-305.
Kocabas, S. (1992a). Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. (1992b). Four levels of learning and representation in modeling scientific discovery. First Turkish Symposium on AI and Neural Networks, 25-26 June, Bilkent, Ankara.
Kulkarni, D. & Simon, H.A. (1988). The processes of scientific discovery. Cognitive Science, 12, 277-309.
Langley, P., Simon, H.A., Bradshaw, G.L., Zytkow, J.M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: The MIT Press.
Lenat, D.B. (1979). On automated scientific theory formation: A case study using the AM program. In J. Hayes, D. Michie and L.I. Mikulich (Eds.) Machine Intelligence 9, (251-283). New York: Halstead.
Lenat, D.B. (1983). EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, Nos. 1-2, (61-98).
Nordhausen, B. & Langley, P. (1987). Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. & Schulenburg, D. (1990). Theory formation by abstraction. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. (1990). A computational approach to theory revision. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Thagard, P. & Holyoak, K. (1985). Discovering the wave theory of sound: Inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. & Nowak, G. (1990). The conceptual structure of the geological revolution. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Valdes-Perez, R. (199?). MECHEM ......
Valdes-Perez, R. (in press). Discovery of conserved properties in particle physics: A comparison of two models. Machine Learning.
Zytkow, J. (1987). Combining many searches in the FAHRENHEIT discovery system. Proceedings of the Fourth International Workshop on Machine Learning, Los Altos, CA: Morgan Kaufmann, 281-287.
Zytkow, J. (1990). Deriving laws through analysis of process and equations. In: J. Shrager and P. Langley (Eds.) Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J. & Simon, H.A. (1986). A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
References on Superconductivity
Khurana, A. (1987a). Search and discovery: Superconductivity seen above the boiling point of nitrogen. Physics Today, April, 1987, 17-23.
Khurana, A. (1987b). Search and discovery: Bednorz and Mueller win Nobel Prize for new superconducting materials. Physics Today, December, 1987, 17-19.
References on Physics
Griffiths, D. (1987). Introduction to Elementary Particles. N.Y., John Wiley & Sons.
Ne'eman, Y. and Kirsh, Y. (1986). The Particle Hunters. Cambridge University Press.
Pais, A. (1986). Inward Bound.
Functional Categorization of Knowledge
(Presented at AAAI - SSS 92 Symposium:
Propositional Knowledge Representation)
FUNCTIONAL CATEGORIZATION OF KNOWLEDGE
Dr Sakir Kocabas
ITU, Maslak, Turkey
Abstract
The continuous increase of human knowledge rendered the classification of knowledge an important task, from very early ages, for philosophers like Aristotle down to the modern age, to Wittgenstein. These classifications were necessitated by the difficulties in understanding, memorization and transmission of knowledge. An analogous task now is faced in knowledge based artificial intelligence systems as the needs arise to build larger and more versatile systems. In this paper we introduce a method for organising knowledge into several linguistic categories. We describe how this categorization introduces clarity in representing different types of knowledge, how it facilitates the analysis of complex propositions into their simple constituents, and how these in turn can be assembled into complex constructs such as frames and schemata.
1 Introduction
Large knowledge systems of the future, especially those that will represent scientific theories in physics, chemistry, biology, astronomy, etc., cannot be confined to narrow domains of expertise. However, as the amount of knowledge given to or acquired by a system increases, two main problems arise: the _knowledge acquisition bottleneck_ (Lenat, Prakash, & Shepherd, 1986) and _brittleness_ (McCarthy, 1983; Holland, 1986). The knowledge acquisition bottleneck is related to the acquisition of new knowledge by instruction or various other methods of learning. Minsky (1984) and Lenat et al. (1986) point out that, in acquiring new knowledge, the human mind overcomes this problem by recalling similar concepts it already knows about and by recording the exceptions to the case in consideration. Brittleness, on the other hand, is related to the difficulties in having a knowledge system expand beyond the scope originally contemplated by its designers.
One solution to the brittleness problem, proposed by McCarthy (1983), is to provide a system with the ability to acquire commonsense knowledge and reasoning. Another solution, which emphasizes inductive learning and is based on general purpose learning algorithms, has been proposed by Holland (1986). His general purpose "classifier systems", are essentially reactive rule based systems relying on genetic algorithms.
Holland (1986) proposes induction as the basic - and perhaps the only - way of making large advances in overcoming brittleness. In considering the specific problems induction faces in this context, Holland identifies the creation of useful ways of categorizing input as the primary task. He suggests that categories must be incorporated into rules that "point" both to actions and to an aura of associated categories. That is, as they are induced, they must be arranged in a "tangled hierarchy", enabling the system to model its environment appropriately.
Holland seems to have an Aristotelian notion of concept-based categorization(1) in mind here as opposed to functional categorization of knowledge. By "category", he understands common abstract features between objects or frames, so that when these are captured, one set of features can be used in developing a similar frame. (Frames can be regarded as complex propositions linked together by inheritance features.)
Lenat (1983) proposes to carry along multiple representations simultaneously and to shift from one representation to another to enable the knowledge system to carry out the most frequent operations more quickly. He says that this has not been much studied or attempted in artificial intelligence, except in very small worlds. Lenat's CYC system (see, Lenat & Guha, 1990) makes use of this idea.
Porter and Kibler (1986, p. 283) state that in machine learning research, most systems have been built on a small number of rules (heuristics) without having to address the problem of organizing learned knowledge into a coherent, efficiently accessible whole. Also, Cheng and Fu (1984) emphasize that, compared with knowledge representation or the formalization of concepts, little work has been done in the area of knowledge organisation.
Woods (1986) offers a simple classification of knowledge. He states that the "knowledge of the world" consists of two kinds of things - facts about what is or has been true (the known world state) and rules for predicting changes over time. He mentions the need for taxonomic organisation. Woods also argues that the standard notations and semantics of the predicate calculus are insufficient by themselves - they need to be supplemented with additional mechanisms e.g., for non-monotonic reasoning, and metalogical reasoning.
A "functional" knowledge system described by Brachman and Levesque (1983) distinguishes between "definitional" and "factual" information. Their system contains two "boxes" of knowledge: one for maintaining analytical knowledge, and the other for building descriptive domain theories. They also use two languages for their representation, a frame language for analytical knowledge and an "assertional language" for the descriptive domain theories.
Langley (1986) states that although concept learning has been a basic mainstay of the machine learning community, most research in this area has ignored a number of well-established psychological phenomena. He says that basic-level categories appear to have a special status in human memory, being retrieved more quickly and being acquired earlier than other concepts, and suggests that more work is needed on concept formation, for such research would yield a better understanding of human concepts and their acquisition, and it should also lead to improved methods of nonhuman concept learning (also see Gennari, Langley & Fisher, 1989).
His notion of category referred to here is also concept based: clusters of concepts are regarded as categories, which in turn are organised in hierarchies. A concept based approach addresses the issue of conceptual organisation of knowledge. However, it can be argued that a considerable proportion of human cognitive activity is propositional. Therefore a functional organisation of knowledge needs to be developed with at least equal priority and depth.
The categorization introduced in this paper is based on fundamental methodological and linguistic criteria: methods of verification, meaning (use), and the function of expressions. It is not aimed at the classification of concepts, but of simple propositions. In philosophical terms, our concept of category is based on the deep grammar of propositions and therefore is quite different from the concept-based notions.
2. Theoretical and Philosophical Background: Piaget and Wittgenstein
The future of artificial intelligence depends, to a certain extent, on studies of cognitive development. Because it is more tractable by means of natural language, human cognitive development is still the best source for developing knowledge-based models in artificial intelligence. The foundations of cognitive science were laid by Piaget's work on human cognitive development in the early 1920s.
Piaget (1971) made extensive studies of human cognitive development and the development of language. There are several reasons why his work is of interest. It is related to 1) the linguistic methods of knowledge acquisition (questions and their classification), 2) the order of knowledge acquisition according to the types of knowledge acquired, 3) the methods of relating the acquired knowledge, and 4) the theoretical foundations for the organisation of knowledge.
Piaget (1971, p. 30) draws our attention to the importance of questions in cognitive development. From the standpoint of cognitive development, a question is a spontaneous search for information. He studies questions asked by the child between the ages of six and seven, and classifies them as questions 1) for causal explanation, 2) about reality, 3) on actions and intentions, 4) about rules, 5) about classification, and 6) about arithmetic.
It may be worthwhile to consider the child's questions from the standpoint of the organisation of acquired knowledge. Explanations seem to play a critical role in these activities. It is conceivable that the child's mind is actively involved in organising its knowledge during the knowledge acquisition processes. Analogously, intelligent knowledge systems may have to be given the ability to ask as many meaningful questions as possible during knowledge acquisition and learning, particularly in the development stage of such systems. The difficulties encountered in knowledge acquisition can be avoided by the use of such strategies of learning frequently used by the child. Lenat et al. (1986) use a similar strategy in developing the knowledge base of their CYC system, but their knowledge acquisition methods are not automated.
Piaget's observations on the "why" questions of the child can be viewed as the manifestation of the operations of an effective organizational capability for acquired knowledge. The child's lack of interest in logical justification of explanations can also be explained within this perspective: Confronted with a vast amount of knowledge to be learned, the primary cognitive problem must naturally be how to organise the acquired knowledge rather than learning how any proposition can be theoretically, logically or mathematically justified. Piaget's classification of questions and explanations, albeit for psychological purposes, constitutes an important step in understanding how expressions are used in language and how they might be classified according to their function and methods of verification.
Having seen some of Piaget's views on human cognitive development, we can now take a brief look at Wittgenstein's philosophical work on the grammatical distinctions between various types of expressions. Wittgenstein considered the functional classification of expressions as one of the most important tasks in philosophy. In one of his earlier works (Wittgenstein, 1965, pp. 44-45), he provides an illustrative analogy between the task of organising a heap of books into a library and the classification of knowledge. Wittgenstein pronounces that some of the greatest achievements in philosophy could only be compared with taking up some books which seemed to belong together and putting them on different shelves; nothing more being final about their positions than that they no longer lie side by side.
Wittgenstein's methods of distinguishing between various types of expressions are at times implicit and scattered across his books. At Cambridge in 1939 (see Wittgenstein, 1974), he devoted a whole series of lectures to how to distinguish mathematical sentences from non-mathematical statements. For example, in (Wittgenstein, 1974, p. 34) he says: "'Professor Hardy believes that x1 > x0' is not a mathematical statement. It is no more a mathematical statement than 'William said that 7x8=54' is a mathematical statement." Here, Wittgenstein is trying to draw a distinction between a factual statement and a mathematical sentence. Indeed, later (Wittgenstein, 1974, p. 111), he clearly states that in his discussions he aims to show the essential difference between the uses of mathematical propositions and non-mathematical propositions which seem to be "exactly analogous" to the former. One of the criteria that he proposes for distinguishing between mathematical and non-mathematical statements is the prefix "by definition ...", which is applicable to logico-mathematical and formal statements, but not to factual statements. In his various studies, Wittgenstein distinguishes between philosophical statements and theoretical statements; first-person psychological statements and theoretical statements; and basic belief statements and theoretical statements (see, e.g., Wittgenstein, 1953; Wittgenstein, 1970; Wittgenstein, 1978).
3. The Method of Categorization
The development of large and reliable knowledge systems requires an effective knowledge organisation. It seems that one way of achieving this is to categorize simple propositions into functionally different classes and then assemble them into frames as necessary. Here there are two questions to be asked: 1) How many grammatically different categories can be identified in language? 2) What criteria can be used in distinguishing between these categories? Before we attempt to answer these questions, the problems of consistency and completeness in knowledge systems need to be reconsidered.
3.1 Consistency, Completeness, Validity and Truth
Since Goedel's (1962) famous theorem, it is widely accepted that overall completeness and consistency may not be achievable in large bodies of knowledge. However, a domain-dependent consistency and completeness can and needs to be maintained. It was Wittgenstein (1981, p. 118) who pointed out that a contradiction is not the end of the world, but something that sets a limit to a particular "language game" or "grammar". His remarks opened the way to a context-dependent concept of consistency in the philosophy of language. More recently, from the viewpoint of artificial intelligence, de Kleer (1986) argues that there is no necessity that the overall database be consistent in a knowledge system. He suggests that a context-dependent concept of consistency provides a better way of achieving control in problem solving. Similar arguments can be found in recent artificial intelligence literature (see, e.g., Lenat & Feigenbaum, 1987; 1991).
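A context-dependent notion of consistency can be illustrated with a small Python sketch. It is not de Kleer's assumption-based TMS; the context names, the `Literal` encoding and the function `inconsistent_contexts` are hypothetical, and the sketch only shows a contradiction being checked within a context rather than across the whole database.

```python
from typing import Dict, Set, Tuple

# A literal is a proposition name paired with its asserted truth value.
Literal = Tuple[str, bool]

def inconsistent_contexts(kb: Dict[str, Set[Literal]]) -> Set[str]:
    """Return the names of contexts that assert both P and not-P."""
    bad = set()
    for context, literals in kb.items():
        if any((name, not value) in literals for name, value in literals):
            bad.add(context)
    return bad

# Two contexts that disagree with each other, yet each is consistent internally.
kb = {
    "euclidean": {("triangle_angle_sum_is_180", True)},
    "spherical": {("triangle_angle_sum_is_180", False)},
}
per_context = inconsistent_contexts(kb)

# Pooling everything into one global context exposes the contradiction.
pooled = {"global": kb["euclidean"] | kb["spherical"]}
globally = inconsistent_contexts(pooled)
```

Checked context by context, nothing is flagged; the contradiction appears only when the two "grammars" are forced into a single database.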
It may even be meaningless to try to achieve an overall consistency and completeness in complex bodies of knowledge. The simple reason is that "truth" and "verification" have different meanings in different "grammars". In other words, quite different methods and criteria are employed in establishing the "truth" of propositions in different categories. To illustrate, let us consider the expressions:
(1) 1^3 + ... + n^3 = (n(n+1)/2)^2
(2) Jupiter is a planet.
(3) In a nuclear reaction, the amount of energy released is equal to the rest-mass difference multiplied by the square of the velocity of light (E=mc^2).
(4) Calculus was invented by Newton and Leibniz independently.
The criteria and methods that are used in establishing the "validity" of each of these expressions are quite different. The first expression is a mathematical statement which can be proved from the axioms of arithmetic using mathematical induction, and no factual investigation is needed to prove it. The second is a formal statement, the "truth" of which is implied by the order of the concepts "planet" and "Jupiter" in language. Unlike theoretical, hypothetical and empirical statements, formal statements are not compared with facts for their validity. The third one is a theoretical statement which is used as part of a model to understand and explain certain physical phenomena. Finally, the last sentence is a historical statement, the "validity" of which is established by the methods used in historical study.
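To make the contrast concrete, the inductive step for expression (1) can be written out as follows. This is a standard textbook derivation, added here only as an illustration that the proof proceeds by algebra alone, with no appeal to observation.

```latex
% Base case n = 1: 1^3 = (1 \cdot 2 / 2)^2 = 1.
% Inductive step: assume the identity holds for n, then add (n+1)^3.
\begin{align*}
\left(\frac{n(n+1)}{2}\right)^2 + (n+1)^3
  &= (n+1)^2\left(\frac{n^2}{4} + (n+1)\right) \\
  &= (n+1)^2 \cdot \frac{n^2 + 4n + 4}{4} \\
  &= \left(\frac{(n+1)(n+2)}{2}\right)^2 .
\end{align*}
```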
These four statements can be considered as belonging to four different categories, namely the categories of logico-mathematical, formal, theoretical, and historical statements. Propositions belonging to these categories can be found in the body of almost any scientific work. Therefore, knowledge systems should take into account such categorical distinctions. A system of criteria has been developed for functional classification of propositions, based on the works of Piaget (1971) and Wittgenstein (1974) outlined above. The principles for the functional categorization of propositions will now be described.
3.2 The Categorization Criteria
Propositions can be categorized according to their functions (uses) in language. The criteria that can be used in such categorizations can initially be divided into three groups:
1) Methodological (epistemic) criteria. The differences between the methods of verification (or falsification) of propositions may help to determine their categories. Verification methods of theoretical statements are different from those of logico-mathematical statements. In addition, verification may not apply to some propositions at all (e.g., allegorical and fictional sentences). A theoretical, hypothetical or empirical statement must be testable, or else must be derivable from testable theoretical statements, while logico-mathematical and formal statements are not verified by testing (i.e. by observation and/or experimentation).
Verification of formal statements is embedded in the logical structure of their constituent concepts in language. In this way, formal statements reflect the concept structure in language. For example, the sentences "A cat is an animal" and "Calcium is an element" are formal statements, and no one would ask for their verification by the verification methods of theoretical, hypothetical or empirical statements. Definitions also belong to the category of formal statements. On the other hand, logico-mathematical statements (e.g., theorems, non-theorems, lemmas and corollaries) are subject to the criterion of provability in a formal or informal calculus.
2) Grammatical labels. Prefixes and postfixes used in language play a role in the categorization of expressions. Some examples are: "By definition, ...", "According to the theory, ...", "According to the hypothesis, ...", "According to our experiences, ...", "I believe that, ...", "I do not believe that, ...", "I feel that, ...", "Probably, ...", "Possibly, ...", etc. It seems that at times we use these prefixes like flags or labels for classes of expressions, to signify their places in our knowledge.
Each of these prefixes is meaningful with the expressions of a certain category, while with others it is not. For example, it is meaningful to say: "By definition, 2+2 is 4," while it is not meaningful to say: "By definition, the gravitational force between two bodies is proportional to their masses and inversely proportional to the square of the distance between them." Similarly, it is not meaningful to say: "According to the hypothesis, 2+2 is 4," while it is meaningful to use the prefix with a theoretical or hypothetical statement.
3) Comparisons and contrasts of propositions with the ones already identified as belonging to a category. Comparisons can be very useful in cases where other criteria may fail to identify the category of a statement. Consider Fermat's "last theorem", for example. One might think that it is a hypothesis like those which physicists use to postulate certain subatomic particles (e.g. quarks). However, Fermat's last theorem is easily comparable with other familiar arithmetical theorems, whereas physical hypotheses are more readily comparable with other physical hypotheses.
By using these criteria, it is possible to build a system of categorization which can be used in distinguishing the following categories: i) logical propositions, ii) mathematical statements, iii) formal statements, iv) grammatical (or meta) statements, v) theoretical-hypothetical-empirical statements, vi) historical sentences, vii) factual statements. In addition to these, there are other identifiable categories which can be listed as: a) basic belief statements, b) sensory and intentional statements, c) metaphorical and allegorical statements, d) fictional sentences. However, these will not be considered here, because the seven categories above seem to be sufficient to represent scientific knowledge to a large extent.
A list of questions that can be used in determining the category of a given statement S, is as follows:
- Does the statement S describe an event currently in effect?
- Does the statement S describe a past event or state of affairs?
- Does the statement S define a concept?
- Is the statement S testable against facts?
- Is the statement S verifiable by observations/experiments?
- Is the statement S provable/deducible?
- Is the statement S about another statement?
- Is the statement S a comment on a property/relation?
- Is the statement S a comment on a state of affairs?
- Is the statement, "Possibly, S" meaningful?
- Is the statement, "Probably, S" meaningful?
- Is the statement, "By definition, S" meaningful?
- Is the statement, "According to the hypothesis, S" meaningful?
- Is the statement, "According to the theory, S" meaningful?
- Is the statement, "According to the experiences, S" meaningful?
- Is the statement, "According to the rules, S" meaningful?
- Is the statement, "According to (so and so), S" meaningful?
The determination of the category of a given statement is a classification problem, in which the parameters are the questions listed.
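The question list can be read as a feature vector for a simple classifier. The Python sketch below is one hypothetical way to do this: the field names, the subset of questions used, and the order of the rules are illustrative assumptions, not the criteria used in the implementations cited in this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Answers:
    """Yes/no answers to a subset of the questions above (field names are hypothetical)."""
    current_event: bool = False           # describes an event currently in effect?
    past_event: bool = False              # describes a past event or state of affairs?
    defines_concept: bool = False         # defines a concept?
    testable: bool = False                # testable against facts / by experiment?
    provable: bool = False                # provable or deducible?
    about_statement: bool = False         # is it about another statement?
    by_definition_ok: bool = False        # is "By definition, S" meaningful?
    according_to_theory_ok: bool = False  # is "According to the theory, S" meaningful?

def categorize(a: Answers) -> str:
    """Map answers to a category name; the rule order is an assumption of this sketch."""
    if a.about_statement:
        return "grammatical"
    if a.provable and a.by_definition_ok and not a.testable:
        return "logico-mathematical"
    if (a.defines_concept or a.by_definition_ok) and not a.testable:
        return "formal"
    if a.past_event:
        return "historical"
    if a.current_event:
        return "factual"
    if a.testable or a.according_to_theory_ok:
        return "theoretical"
    return "undecided"

# Answers one might give for "Jupiter is a planet." and for the sum-of-cubes identity.
jupiter = categorize(Answers(by_definition_ok=True))
cubes = categorize(Answers(provable=True, by_definition_ok=True))
```

A statement that fits none of the rules is returned as "undecided" rather than forced into a category, which leaves room for the comparison criterion described above.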
3.3 Examples of Categorized Propositions
Some examples of statements belonging to the seven categories named above are as follows:
Logical Statements:
- A proposition and its negation imply all other propositions.
- A proposition P is derivable from propositions P and Q.
Mathematical Statements:
- The sum of the inner angles of a triangle is 180 degrees.
- There are natural numbers x, y, z, such that x^3 + y^3 = z^3.
Formal Statements:
- Jupiter is a planet.
- Cu is the chemical symbol of copper.
Grammatical Statements:
- Superconductivity is an important property.
- The statement, "Aluminium substitution reduces superconductivity," is consistent (with the knowledge base).
- Why BaPbBiO3 is a superconductor is not explainable (by the existing knowledge).
Theoretical-hypothetical-empirical Statements:
- The decay products of the neutron are a proton, an electron and an antineutrino.
- Electronegativity of nickel is 1.9.
- In metallic superconductors, superconductivity and electron density are positively related.
Historical Propositions:
- Superconductivity was discovered by H.K. Onnes in 1911.
- Partial substitution of sulfur for oxygen in LaNiO3 has been tried in superconductivity experiments.
Factual Statements:
- The price of Aluminium is 1.75 US dollars.
- The price of Scandium is over 50,000 US dollars.
Simple and complex propositional knowledge can be further organised within each category into levels of expressions. The levels can be based on Russell's logical theory of types (see, Whitehead & Russell, 1970). The categorization has been applied to varying extents in four different computational models of scientific reasoning and discovery in particle physics (see, Kocabas, 1989a; 1989b; and 1991), and is currently implemented in the development of a computational model of discovery in oxide superconductivity. These will not be described here, but a discussion based on our observations about the implementations will be given instead.
4. Discussion
The importance of the categorization of knowledge lies in the following possible advantages it may provide in the acquisition, representation, refinement and effective use of knowledge: 1) simple and complex propositions of different categories can be represented, accessed and used separately, 2) automatic transformation of knowledge from one form of representation into another, especially from predicate statements into frames, can be made easier, which can be very useful in knowledge acquisition during the development of large knowledge systems like CYC (Lenat et al., 1986), 3) some useful general formal and informal rules applicable to each category can be found, 4) logical mistakes in designing knowledge systems can be minimized, 5) search procedures can be made more effective, for categorization eliminates unnecessary search activities in the system (e.g., formal questions can be answered by conducting search only in formal knowledge, theoretical questions in theoretical knowledge and so on), 6) more detailed and effective identification and resolution of conflicts and theory revision (or truth maintenance) is possible, and 7) dynamic reorganisation of knowledge can be made easier.
Categorization of knowledge facilitates truth maintenance, as in such systems the "truths" of propositions of different categories are established differently. (A more detailed description of conflict resolution based on the categorization is given in Kocabas, 1989c.) Furthermore, some priorities of validity can be given to knowledge belonging to certain categories. For example, if a factual statement is in contradiction with a hypothesis, the problem is resolved by simply giving priority to factual knowledge over the theoretical, rather than removing the hypothesis.
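Advantage 5 can be sketched in a few lines of Python. The class `CategorizedKB` and its methods `tell` and `ask` are hypothetical names, and keyword matching stands in for whatever retrieval mechanism a real system would use; the point is only that a query touches one category and never scans the others.

```python
from collections import defaultdict
from typing import Dict, List

class CategorizedKB:
    """A toy knowledge base that stores statements under their category."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = defaultdict(list)

    def tell(self, category: str, statement: str) -> None:
        self._store[category].append(statement)

    def ask(self, category: str, keyword: str) -> List[str]:
        """Search only the named category; other categories are never scanned."""
        candidates = self._store.get(category, [])
        return [s for s in candidates if keyword.lower() in s.lower()]

kb = CategorizedKB()
kb.tell("formal", "Jupiter is a planet.")
kb.tell("formal", "Cu is the chemical symbol of copper.")
kb.tell("historical", "Superconductivity was discovered by H.K. Onnes in 1911.")

formal_hits = kb.ask("formal", "jupiter")
historical_hits = kb.ask("historical", "superconductivity")
```

The example statements are taken from Section 3.3; a formal question about superconductivity returns nothing here because no formal statement mentions it.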
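The priority rule can be sketched as follows. Only the ordering "factual over theoretical" comes from the text; the numeric priorities, the `Statement` type, the function `resolve` and the sample factual statement are assumptions made for illustration, and this is not the conflict resolution procedure of Kocabas (1989c).

```python
from typing import NamedTuple, Tuple

class Statement(NamedTuple):
    category: str
    text: str

# Higher number wins. Only "factual over theoretical" is taken from the text.
PRIORITY = {"theoretical": 1, "factual": 2}

def resolve(a: Statement, b: Statement) -> Tuple[Statement, Statement]:
    """Given two contradictory statements, return (preferred, outranked).

    The outranked statement is kept, not deleted, so that it remains
    available for later revision.
    """
    if PRIORITY.get(a.category, 0) >= PRIORITY.get(b.category, 0):
        return a, b
    return b, a

hypothesis = Statement("theoretical", "Aluminium substitution reduces superconductivity.")
# Invented observation, used only to create a conflict for the example.
observation = Statement("factual", "Sample A with aluminium substitution remained superconducting.")

preferred, outranked = resolve(hypothesis, observation)
```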
In a categorized knowledge system, hypothesis generation is more systematic. New hypotheses are generated from factual and theoretical knowledge by induction and other forms of generalization. Similarly, new hypotheses can be generated from theoretical knowledge by specialization, abstraction and abduction. Certain forms of reasoning do not apply to certain categories of knowledge (e.g., specialization is not applicable to factual knowledge).
The combined use of frame and logic representations provides a more structured representation of knowledge. Frame representation by itself is less efficient in making full use of logical and extralogical inference such as deduction, abstraction and abduction (see Lenat & Guha, 1990). On the other hand, logic representation alone provides a less efficient basis for certain kinds of reasoning (e.g. taxonomic and analogical reasoning). The integration of frame and logic representations allows the copying and transformation of knowledge from frames into predicate statements. Brewka (1987) describes a method for translating knowledge from frames into predicate statements. However, he makes no mention of any theoretical work on translating predicate statements into frames in a general and systematic way. Categorization provides this opportunity, as it allows the transfer or transformation of predicate statements into frames.
In a categorized system, knowledge acquisition does not have to be in the form of frames. Knowledge can be entered as simple predicate statements, which are categorized by means of a small set of transformation functions. They can then be automatically structured into frames by means of a set of knowledge assembly functions. Naturally, the frames must reflect the categorical distinctions in their structure. For example, every frame can have slots corresponding to the category names such as "logical", "formal", "the" and "factual".
Categorization facilitates the analysis of descriptive knowledge. Other methods of expressing complex propositions in terms of their atomic constituents have been developed. For example, Kowalski (1979, p. 33) describes the method of expressing n-ary relationships as a conjunction of n+1 binary relationships. Complex descriptions can also be represented in this way. This method is directly applicable in frame systems, but it decomposes propositions into descriptions instead of atomic sentences, which may not always be desirable because it can hinder the effective use of logical and extralogical inference.
Complex propositions can be analyzed into, and represented as, simple categorized predicate statements. Consider, for example, the proposition: "The pulsar, which was discovered by radio astronomers, is a rapidly spinning neutron star whose radio signal regularly turns off and on." This sentence can be split into simpler propositions:
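The n-ary to binary decomposition mentioned above can be sketched in Python. The function `to_binary`, the role names and the event identifier `e1` are hypothetical; the sketch follows the general idea of introducing one name for the relationship instance and attaching each argument to it by a binary relation, which yields n+1 binary relations for n arguments.

```python
from typing import List, Sequence, Tuple

# (relation, subject, object)
Binary = Tuple[str, str, str]

def to_binary(kind: str, roles: Sequence[str], args: Sequence[str],
              instance_id: str) -> List[Binary]:
    """Rewrite kind(args...) as one 'isa' relation plus one relation per argument."""
    if len(roles) != len(args):
        raise ValueError("one role name per argument is required")
    triples: List[Binary] = [("isa", instance_id, kind)]
    triples += [(role, instance_id, arg) for role, arg in zip(roles, args)]
    return triples

# discovery(onnes, superconductivity, 1911) has n = 3 arguments,
# so the result contains n + 1 = 4 binary relations.
triples = to_binary(
    "discovery",
    roles=["agent", "object", "year"],
    args=["onnes", "superconductivity", "1911"],
    instance_id="e1",
)
```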
formal: A pulsar is a neutron star.
the: A pulsar spins rapidly.
the: The radio signal of a pulsar regularly turns on and off.
historical: The pulsar was discovered by radio astronomers.
As is seen, the constituent propositions are not only representationally, but also categorically distinguishable. Feigenbaum has recently redefined problem solving in terms of "knowledge assembly" rather than search (see Engelmore & Morgan, 1988, vii). In this new definition the emphasis is on "finding the right piece of knowledge to build into the right place in the emerging solution structure". In a categorized knowledge system, simple propositions can be assembled together to build more complex constructs such as complex propositions, frames and schemata.
Inevitably, the categorization has its own drawbacks, although it raises hopes of resolving some important knowledge-level problems in the development of large knowledge systems. First of all, it imposes a structure on descriptive knowledge, which is based on a set of criteria. The validity of these criteria can be debated, but if humans utilize such criteria in organising their knowledge, it is worth considering their use in computational models.
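The assembly step can be sketched for the pulsar example. The flat `(category, subject, sentence)` encoding and the function `assemble_frames` are assumptions of this sketch, not the knowledge assembly functions of the implemented systems; the slot name "the" is kept exactly as it appears in the decomposition above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (category, subject, sentence): a hypothetical flat encoding of a categorized proposition.
Proposition = Tuple[str, str, str]

def assemble_frames(props: List[Proposition]) -> Dict[str, Dict[str, List[str]]]:
    """Group simple propositions into one frame per subject, one slot per category."""
    frames: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for category, subject, sentence in props:
        frames[subject][category].append(sentence)
    # Convert to plain dicts so that lookups do not silently create empty slots.
    return {subject: dict(slots) for subject, slots in frames.items()}

pulsar_props = [
    ("formal", "pulsar", "A pulsar is a neutron star."),
    ("the", "pulsar", "A pulsar spins rapidly."),
    ("the", "pulsar", "The radio signal of a pulsar regularly turns on and off."),
    ("historical", "pulsar", "The pulsar was discovered by radio astronomers."),
]
frames = assemble_frames(pulsar_props)
```

Each slot of the resulting "pulsar" frame holds only propositions of one category, so the categorical distinctions survive the move from predicate statements to frames.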
Another problem can be with certain propositions whose category may not be easy to decide. In such cases, the proposition can be maintained in more than one category. However, such cases are rare, and the duplications do not pose a serious difficulty. On the other hand, categorization errors can be identified and resolved by a set of "knowledge administration functions" which can be built to supervise the "distribution", "maintenance" and the "assembly" of knowledge in the categories of the system.
Holland's (1986) suggestion of using general purpose learning algorithms and placing emphasis on inductive learning as a solution to the brittleness problem has appeal. However, systems based on such methods use complex input and output units, and their complexity grows as the system is given knowledge of different kinds and levels. In the end, the brittleness problem transforms into two problems: input management and output management. To a certain extent this can be avoided in a multi-layered general purpose system, but the theoretical basis for such systems has not been sufficiently developed. Additionally, Holland's proposal underestimates the role of deductive methods in learning.
The categorization scheme introduced here is much more detailed than the classification suggested by Woods (1986), in which he divides knowledge into "facts" and "rules". His "facts" include what we call factual statements, as well as simple logical, formal, mathematical, theoretical, and historical propositions. His "rules" include complex logical, formal, mathematical, and theoretical rules, which can be called rules of inference, as well as action rules (or production rules). The categorization meets some of the requirements proposed by Aiello et al. (1986), as it allows knowledge and metaknowledge to be represented in the same form. In this way, it allows the inference mechanisms to be accessible at both levels.
5. Conclusions
In this paper we described a methodology for organising descriptive knowledge into several functional categories, and discussed the ways in which it can integrate different methods of representation such as predicate logic and frame representations. The categorization helps to resolve some of the major problems of developing and maintaining large knowledge systems. These problems are brittleness, the knowledge acquisition bottleneck, and the identification and resolution of conflicts. Clarity and simplicity are essential in building complex knowledge systems, because as the system grows, it becomes more and more difficult to keep track of the relationships between domain concepts. (Lenat et al. (1986; 1990) provide dramatic examples of the complexities of adding more knowledge to a large knowledge system.) The categorization scheme introduced here provides clarity and simplicity, as it allows different kinds of knowledge to be represented in an organised way. Commonsense as well as scientific knowledge can be represented in the categories described. The categorization has been implemented in several computational systems which model reasoning and discovery in astronomy, particle physics, and high-temperature superconductivity, incorporating various methods of learning and conflict resolution. More detailed implementations are being carried out.
References
Brachman, R.J. & Levesque, H.J. (1983). Krypton, a functional approach to knowledge representation. IEEE Computer. October 1983. (67-73).
Brewka, G. (1987). The logic of inheritance in frame systems. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, (483-488). Morgan Kaufmann.
Cheng, Y.S. and Fu, K.S. (1984). Conceptual clustering in knowledge organization. The First Conference on Artificial Intelligence Applications, (274-279). IEEE Computer Society Press.
de Kleer, J.S. (1986). An assumption based TMS. Artificial Intelligence, 28, 127-162.
Edwards, P. (1967). The Encyclopedia of Philosophy. Collier MacMillan Ltd.
Engelmore, R., & Morgan, T. (Eds.) (1988). Blackboard systems. Addison Wesley.
Gödel, K. (1962). On formally undecidable propositions of Principia Mathematica and related systems. New York: Basic Books.
Holland, J.H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R.S. Michalski, J.G. Carbonell and T.M. Mitchell (eds.), Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.
Kocabas, S. (1989a). Qualitative and quantitative reasoning in particle physics. Proceedings of the 12th International Congress on Cybernetics, 250-271, Namur, Belgium.
Kocabas, S. (1989b). Scientific explanation by exclusion. Proceedings of the 12th International Congress on Cybernetics, 840-851, Namur, Belgium.
Kocabas, S. (1989c). Functional categorization of knowledge: Applications in modeling scientific research and discovery. PhD Thesis, Department of Electronic and Electrical Engineering, King's College London, University of London.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, (277-309).
Kowalski, R. (1979). Logic for problem solving. New York: Elsevier North Holland.
Langley, P. (1986). Editorial: Human and Machine Learning. Machine Learning, Vol. 1, No.3, (246-247).
Lenat, D.B. (1983). The role of heuristics in learning by discovery: Three case studies. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine Learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.
Lenat, D.B. Prakash, M., Shepherd, M. (1986). CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. The AI Magazine 7, No. 4, (65-85).
Lenat, D.B. & Feigenbaum, E.A. (1987). On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, (1173-1182).
Lenat, D.B. & Feigenbaum, E.A. (1991). On the thresholds of knowledge. Artificial Intelligence, 47, (185-250).
Lenat, D.B. & Guha, R.V. (1990). Building Large knowledge based systems: Representation and inference in the CYC project. New York, NY: Addison Wesley.
McCarthy, J. (1983). Some expert systems need common sense. Annals of New York Academy of Sciences, 426, 129-137.
Minsky, M. (1984). Afterword to: Vinge, V., True Names. (130-153). New York: Bluejay Books.
Piaget, J. (1971). The language and thought of the child. Routledge & Kegan Paul Ltd. London.
Porter, B.W. and Kibler, D.F. (1986). Experimental goal regression: A method for learning problem-solving heuristics, Machine Learning, Vol 1, No. 3, (249-286).
Rose, D. & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, (423-452).
Whitehead, A.N. & Russell, B. (1970). Principia Mathematica to *56. Cambridge: Cambridge University Press.
Wittgenstein, L. (1953). Philosophical Investigations. G.H. von Wright and G.E.M. Anscombe (Eds.). Oxford: Basil Blackwell.
Wittgenstein, L. (1965). The Blue and Brown Books. New York: Harper Torchbooks.
Wittgenstein, L. (1970). Lectures and conversations on aesthetics, psychology and religious belief. Ed. by C. Barrett. Oxford: Basil Blackwell.
Wittgenstein, L. (1974). Wittgenstein's lectures on the foundations of mathematics, Cambridge, 1939. C. Diamond (Ed.) Sussex: The Harvester Press.
Wittgenstein, L. (1978). Remarks on colour. G.E.M. Anscombe (Ed.) Oxford: Basil Blackwell.
Wittgenstein, L. (1981). Zettel. G.E.M. Anscombe and G.H. von Wright (Eds.) Oxford: Basil Blackwell.
Woods, W.A. (1986). Important issues in knowledge representation. Proceedings of the IEEE. Vol. 74, No. 10. (1322-1334).
FUNCTIONAL CATEGORIZATION OF KNOWLEDGE
Dr Sakir Kocabas
ITU, Maslak, Turkey
Abstract
The continuous increase of human knowledge has made the classification of knowledge an important task for philosophers since very early ages, from Aristotle down to Wittgenstein in the modern age. These classifications were necessitated by the difficulties in the understanding, memorization and transmission of knowledge. An analogous task is now faced in knowledge-based artificial intelligence systems as the need arises to build larger and more versatile systems. In this paper we introduce a method for organising knowledge into several linguistic categories. We describe how this categorization introduces clarity in representing different types of knowledge, how it facilitates the analysis of complex propositions into their simple constituents, and how these in turn can be assembled into complex constructs such as frames and schemata.
1 Introduction
Large knowledge systems of the future, especially those that will represent scientific theories in physics, chemistry, biology, astronomy, etc., cannot be confined to narrow domains of expertise. However, as the amount of knowledge given to or acquired by a system increases, two main problems arise: the _knowledge acquisition bottleneck_ (Lenat, Prakash, & Shepherd, 1986) and _brittleness_ (McCarthy, 1983; Holland, 1986). The knowledge acquisition bottleneck is related to the acquisition of new knowledge by instruction or various other methods of learning. Minsky (1984) and Lenat et al. (1986) point out that, in acquiring new knowledge, the human mind overcomes this problem by recalling similar concepts it already knows about and by recording the exceptions to the case in consideration. Brittleness, on the other hand, is related to the difficulties in having a knowledge system expand beyond the scope originally contemplated by its designers.
One solution to the brittleness problem, proposed by McCarthy (1983), is to provide a system with the ability to acquire commonsense knowledge and reasoning. Another solution, which emphasizes inductive learning and is based on general purpose learning algorithms, has been proposed by Holland (1986). His general purpose "classifier systems" are essentially reactive rule-based systems relying on genetic algorithms.
Holland (1986) proposes induction as the basic - and perhaps the only - way of making large advances in overcoming brittleness. In considering the specific problems induction faces in this context, Holland identifies the creation of useful ways of categorizing input as the primary task. He suggests that categories must be incorporated into rules that "point" both to actions and to an aura of associated categories. That is, as they are induced, they must be arranged in a "tangled hierarchy", enabling the system to model its environment appropriately.
Holland seems to have an Aristotelian notion of concept-based categorization(1) in mind here as opposed to functional categorization of knowledge. By "category", he understands common abstract features between objects or frames, so that when these are captured, one set of features can be used in developing a similar frame. (Frames can be regarded as complex propositions linked together by inheritance features.)
Lenat (1983) proposes to carry along multiple representations simultaneously and to shift from one representation to another to enable the knowledge system to carry out the most frequent operations more quickly. He says that this has not been much studied or attempted in artificial intelligence, except in very small worlds. Lenat's CYC system (see, Lenat & Guha, 1990) makes use of this idea.
Porter and Kibler (1986, p. 283) state that in machine learning research, most systems have been built on a small number of rules (heuristics) without having to address the problem of organizing learned knowledge into a coherent, efficiently accessible whole. Also, Cheng and Fu (1984) emphasize that, compared with knowledge representation or the formalization of concepts, little work has been done in the area of knowledge organisation.
Woods (1986) offers a simple classification of knowledge. He states that the "knowledge of the world" consists of two kinds of things - facts about what is or has been true (the known world state) and rules for predicting changes over time. He mentions the need for taxonomic organisation. Woods also argues that the standard notations and semantics of the predicate calculus are insufficient by themselves - they need to be supplemented with additional mechanisms e.g., for non-monotonic reasoning, and metalogical reasoning.
A "functional" knowledge system described by Brachman and Levesque (1983) distinguishes between "definitional" and "factual" information. Their system contains two "boxes" of knowledge: one for maintaining analytical knowledge, and the other for building descriptive domain theories. They also use two languages for their representation, a frame language for analytical knowledge and an "assertional language" for the descriptive domain theories.
Langley (1986) states that although concept learning has been a basic mainstay of the machine learning community, most research in this area has ignored a number of well-established psychological phenomena. He says that basic-level categories appear to have a special status in human memory, being retrieved more quickly and being acquired earlier than other concepts, and suggests that more work is needed on concept formation, for such research would yield a better understanding of human concepts and their acquisition, and it should also lead to improved methods of nonhuman concept learning (also see Gennari, Langley & Fisher, 1989).
The notion of category he refers to here is also concept based: clusters of concepts are regarded as categories, which in turn are organised in hierarchies. A concept based approach addresses the issue of conceptual organisation of knowledge. However, it can be argued that a considerable proportion of human cognitive activity is propositional. Therefore a functional organisation of knowledge needs to be developed with at least equal priority and depth.
The categorization introduced in this paper is based on fundamental methodological and linguistic criteria: methods of verification, meaning (use), and the function of expressions. It is not aimed at the classification of concepts, but of simple propositions. In philosophical terms, our concept of category is based on the deep grammar of propositions and is therefore quite different from the concept-based notions.
2. Theoretical and Philosophical Background: Piaget and Wittgenstein
The future of artificial intelligence, to a certain extent, depends on the studies on cognitive development. Because of its better tractability by means of natural language, human cognitive development is still the best source for developing knowledge based models in artificial intelligence. The foundations of cognitive science were laid by Piaget's work on human cognitive development in the early 1920s.
Piaget (1971) made extensive studies on human cognitive development and the development of language. There are several reasons why his work is of interest: it is related to 1) the linguistic methods of knowledge acquisition (questions and their classification), 2) the order of knowledge acquisition according to the types of knowledge acquired, 3) the methods of relating the acquired knowledge, and 4) the theoretical foundations for the organisation of knowledge.
Piaget (1971, p. 30) draws our attention to the importance of questions in cognitive development. From the standpoint of cognitive development, a question is a spontaneous search for information. He studies questions asked by the child between the ages of six and seven, and classifies them as questions 1) for causal explanation, 2) about reality, 3) on actions and intentions, 4) on rules, 5) about classification, and 6) arithmetical questions.
It may be worthwhile to consider the child's questions from the standpoint of the organisation of acquired knowledge. Explanations seem to play a critical role in these activities. It is conceivable that the child's mind is actively involved in organising its knowledge during the knowledge acquisition processes. Analogously, intelligent knowledge systems may have to be given the ability to ask as many meaningful questions as possible during knowledge acquisition and learning, particularly in the development stage of such systems. The difficulties encountered in knowledge acquisition can be avoided by the use of such strategies of learning frequently used by the child. Lenat et al. (1986) use a similar strategy in developing the knowledge base of their CYC system, but their knowledge acquisition methods are not automated.
Piaget's observations on the "why" questions of the child can be viewed as the manifestation of the operations of an effective organizational capability for acquired knowledge. The child's lack of interest in logical justification of explanations can also be explained within this perspective: Confronted with a vast amount of knowledge to be learned, the primary cognitive problem must naturally be how to organise the acquired knowledge rather than learning how any proposition can be theoretically, logically or mathematically justified. Piaget's classification of questions and explanations, albeit for psychological purposes, constitutes an important step in understanding how expressions are used in language and how they might be classified according to their function and methods of verification.
Having seen some of Piaget's views on human cognitive development, we can now take a brief look at Wittgenstein's philosophical work on the grammatical distinctions between various types of expressions. Wittgenstein considered the functional classification of expressions as one of the most important tasks in philosophy. In one of his earlier works (Wittgenstein, 1965, pp. 44-45), he provides an illustrative analogy between the task of organising a heap of books into a library and the classification of knowledge. Wittgenstein pronounces that some of the greatest achievements in philosophy could only be compared with taking up some books which seemed to belong together and putting them on different shelves; nothing more being final about their positions than that they no longer lie side by side.
Wittgenstein's use of the methods of distinction between various types of expressions is at times implicit and scattered in his books. At Cambridge in 1939 (see, Wittgenstein, 1974), he devoted a whole series of lectures to how to distinguish mathematical sentences from non-mathematical statements. For example, in (Wittgenstein, 1974, p. 34) he says: "'Professor Hardy believes that x1 > x0' is not a mathematical statement. It is no more a mathematical statement than 'William said that 7x8=54' is a mathematical statement." Here, Wittgenstein is trying to draw a distinction between a factual statement and a mathematical sentence. Indeed, later (Wittgenstein, 1974, p. 111), he clearly states that in his discussions he aims to show the essential difference between the uses of mathematical propositions and non-mathematical propositions which seem to be "exactly analogous" to the former. One of the criteria that he proposes for distinguishing between mathematical and non-mathematical statements is the prefix "by definition ...", which is applicable to logico-mathematical and formal statements, but not to factual statements. In his various studies, Wittgenstein distinguishes between philosophical statements and theoretical statements; first-person psychological statements and theoretical statements; and basic belief statements and theoretical statements (e.g., see, Wittgenstein, 1953; Wittgenstein, 1970; Wittgenstein, 1978).
3. The Method of Categorization
The development of large and reliable knowledge systems requires an effective knowledge organisation. It seems that one way of achieving this is to categorize simple propositions into functionally different classes and then assemble them into frames as necessary. Here there are two questions to be asked: 1) How many grammatically different categories can be identified in language? 2) What can be the criteria to be used in distinguishing between these categories? Before we attempt to answer these questions, the problems of consistency and completeness in knowledge systems need to be reconsidered.
3.1 Consistency, Completeness, Validity and Truth
Since Gödel's (1962) famous theorem, it is widely accepted that an overall completeness and consistency may not be achievable in large bodies of knowledge. However, a domain dependent consistency and completeness can and needs to be maintained. It was Wittgenstein (1981, p. 118) who pointed out that a contradiction is not the end of the world, but something that sets a limit to a particular "language game" or "grammar". His remarks opened the way to a context dependent concept of consistency in the philosophy of language. More recently, from the viewpoint of artificial intelligence, de Kleer (1986) argues that there is no necessity that the overall database be consistent in a knowledge system. He suggests that a context dependent concept of consistency provides a better way of achieving control in problem solving. Similar arguments can be found in recent artificial intelligence literature (see, e.g., Lenat & Feigenbaum, 1987; 1991).
It may even be meaningless to try to achieve an overall consistency and completeness in complex bodies of knowledge. The simple reason is that "truth" and "verification" have different meanings in different "grammars". In other words, quite different methods and criteria are employed in establishing the "truth" of propositions in different categories. To illustrate, let us consider the expressions:
(1) 1^3 + ... + n^3 = (n(n+1)/2)^2
(2) Jupiter is a planet.
(3) In a nuclear reaction, the amount of energy released is equal to the rest-mass difference multiplied by the square of the velocity of light (E=mc^2).
(4) Calculus was invented by Newton and Leibniz independently.
The criteria and methods that are used in establishing the "validity" of each of these expressions are quite different. The first expression is a mathematical statement which can be proved from the axioms of arithmetic using mathematical induction, and no factual investigation is needed to prove it. The second is a formal statement, the "truth" of which is implied by the order of the concepts "planet" and "Jupiter" in language. Unlike theoretical, hypothetical and empirical statements, formal statements are not compared with facts for their validity. The third one is a theoretical statement which is used as part of a model to understand and explain certain physical phenomena. Finally, the last sentence is a historical statement, the "validity" of which is established by the methods used in historical study.
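The induction proof mentioned for expression (1) can be written out briefly. The name S(n) for the left-hand side is introduced here only for illustration and does not appear in the original text.

```latex
\begin{align*}
S(n) &:= 1^3 + \dots + n^3, \qquad
  \text{claim: } S(n) = \left(\frac{n(n+1)}{2}\right)^2 .\\
\text{Base case: } S(1) &= 1 = \left(\frac{1 \cdot 2}{2}\right)^2 .\\
\text{Inductive step: } S(n+1) &= S(n) + (n+1)^3
  = \frac{n^2 (n+1)^2}{4} + (n+1)^3 \\
 &= \frac{(n+1)^2 \left(n^2 + 4n + 4\right)}{4}
  = \left(\frac{(n+1)(n+2)}{2}\right)^2 .
\end{align*}
```

No observation or experiment enters at any step, which is the sense in which the statement is verified differently from expressions (2) to (4).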
These four statements can be considered as belonging to four different categories, namely the categories of logico-mathematical, formal, theoretical, and historical statements. Propositions belonging to these categories can be found in the body of almost any scientific work. Therefore, knowledge systems should take into account such categorical distinctions. A system of criteria has been developed for functional classification of propositions, based on the works of Piaget (1971) and Wittgenstein (1974) outlined above. The principles for the functional categorization of propositions will now be described.
3.2 The Categorization Criteria
Propositions can be categorized according to their functions (uses) in language. The criteria that can be used in such categorizations can initially be divided into three groups:
1) Methodological (epistemic) criteria. The differences between the methods of verification (or falsification) of propositions may help to determine their categories. Verification methods of theoretical statements are different from those of logico-mathematical statements. In addition, verification may not apply to some propositions at all (e.g., allegorical and fictional sentences). A theoretical, hypothetical or empirical statement must be testable, or else must be derivable from testable theoretical statements, while logico-mathematical and formal statements are not verified by testing (i.e. by observation and/or experimentation).
Verification of formal statements is embedded in the logical structure of their constituent concepts in language. In this way, formal statements reflect the concept structure in language. For example, the sentences "A cat is an animal" and "Calcium is an element" are formal statements, and no one would ask for their verification by the verification methods of theoretical, hypothetical or empirical statements. Definitions also belong to the category of formal statements. On the other hand, logico-mathematical statements (e.g., theorems, non-theorems, lemmas and corollaries) are subject to the criterion of provability in a formal or informal calculus.
2) Grammatical labels used in language in the form of prefixes and postfixes, which play a role in the categorization of expressions. Some examples are: "By definition, ...", "According to the theory, ...", "According to the hypothesis, ...", "According to our experiences, ...", "I believe that, ...", "I do not believe that, ...", "I feel that, ...", "Probably, ...", "Possibly, ...", etc. It seems that at times we use these prefixes like flags or labels for classes of expressions, to signify their places in our knowledge.
Each of these prefixes is meaningful with the expressions of a certain category, while with others it is not. For example, it is meaningful to say: "By definition, 2+2 is 4," while it is not meaningful to say: "By definition, the gravitational force between two bodies is proportional to their masses and inversely proportional to the square of the distance between them." Similarly, it is not meaningful to say: "According to the hypothesis, 2+2 is 4," while it is meaningful to use the prefix with a theoretical or hypothetical statement.
3) Comparisons and contrasts of propositions with the ones already identified as belonging to a category. Comparisons can be very useful in cases where other criteria may fail to identify the category of a statement. Consider Fermat's "last theorem" for example. One might think that it is a hypothesis like those which physicists use to hypothesize certain subatomic particles (e.g. quarks). However, Fermat's last theorem is easily comparable with some other familiar arithmetical theorems, whereas physical hypotheses are more readily comparable with other similar physical hypotheses. The comparison therefore places it among mathematical statements rather than among hypotheses.
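The prefix test can be sketched as a small lookup table. This is only an illustration: the table encodes just the pairings stated in the text, the category names are shortened labels, and the names MEANINGFUL_PREFIXES and candidate_categories are hypothetical.

```python
# Hypothetical table of prefix/category compatibility. Only the pairings
# stated in the text are encoded; a fuller table would be an assumption.
MEANINGFUL_PREFIXES = {
    "By definition": {"logico-mathematical", "formal"},
    "According to the hypothesis": {"theoretical-hypothetical-empirical"},
}

# Categories considered in this sketch (a subset of the full scheme).
ALL_CATEGORIES = {
    "logico-mathematical",
    "formal",
    "theoretical-hypothetical-empirical",
}


def candidate_categories(accepted=(), rejected=()):
    """Narrow down the categories of a statement from prefix judgements.

    accepted: prefixes that read as meaningful with the statement.
    rejected: prefixes that do not read as meaningful with it.
    """
    candidates = set(ALL_CATEGORIES)
    for prefix in accepted:
        candidates &= MEANINGFUL_PREFIXES[prefix]
    for prefix in rejected:
        candidates -= MEANINGFUL_PREFIXES[prefix]
    return candidates


# "2+2 is 4": "By definition" is accepted, "According to the hypothesis" is not.
arithmetic = candidate_categories(
    accepted=["By definition"], rejected=["According to the hypothesis"]
)

# The law of gravitation: "By definition" is rejected.
gravitation = candidate_categories(rejected=["By definition"])
```

Prefix judgements alone narrow the candidates without always fixing a single category, which is why the text combines them with the methodological criteria and with comparisons.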
By using these criteria, it is possible to build a system of categorization which can be used in distinguishing the following categories: i) logical propositions, ii) mathematical statements, iii) formal statements, iv) grammatical (or meta) statements, v) theoretical-hypothetical-empirical statements, vi) historical sentences, vii) factual statements. In addition to these, there are other identifiable categories which can be listed as: a) basic belief statements, b) sensory and intentional statements, c) metaphorical and allegorical statements, d) fictional sentences. However, these will not be considered here, because the seven categories above seem to be sufficient to represent scientific knowledge to a large extent.
A list of questions that can be used in determining the category of a given statement S, is as follows:
- Does the statement S describe an event currently in effect?
- Does the statement S describe a past event or state of affairs?
- Does the statement S define a concept?
- Is the statement S testable against facts?
- Is the statement S verifiable by observations/experiments?
- Is the statement S provable/deducible?
- Is the statement S about another statement?
- Is the statement S a comment on a property/relation?
- Is the statement S a comment on a state of affairs?
- Is the statement, "Possibly, S" meaningful?
- Is the statement, "Probably, S" meaningful?
- Is the statement, "By definition, S" meaningful?
- Is the statement, "According to the hypothesis, S" meaningful?
- Is the statement, "According to the theory, S" meaningful?
- Is the statement, "According to the experiences, S" meaningful?
- Is the statement, "According to the rules, S" meaningful?
- Is the statement, "According to (so and so), S" meaningful?
The determination of the category of a given statement is a classification problem, in which the parameters are the questions listed.
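One way to picture this classification problem is as a decision list over yes/no answers. The sketch below is an assumption throughout: the text does not say how answers map to categories, so the rules, their order, and the names Answers, RULES and classify are hypothetical, and only a subset of the questions is used.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Yes/no answers to a subset of the questions listed above."""
    about_another_statement: bool = False
    provable: bool = False
    by_definition_meaningful: bool = False
    describes_past_event: bool = False
    describes_current_event: bool = False
    testable_against_facts: bool = False


# Hypothetical decision list: the first rule that fires decides the category.
# Both the rules and their ordering are illustrative assumptions.
RULES = [
    ("grammatical", lambda a: a.about_another_statement),
    ("logico-mathematical", lambda a: a.provable and not a.testable_against_facts),
    ("formal", lambda a: a.by_definition_meaningful and not a.testable_against_facts),
    ("historical", lambda a: a.describes_past_event),
    ("factual", lambda a: a.describes_current_event),
    ("theoretical-hypothetical-empirical", lambda a: a.testable_against_facts),
]


def classify(answers):
    """Return the first matching category, or "undecided" if no rule fires."""
    for category, rule in RULES:
        if rule(answers):
            return category
    return "undecided"


# Answers one might give for the four expressions of Section 3.1.
sum_of_cubes = classify(Answers(provable=True, by_definition_meaningful=True))
jupiter = classify(Answers(by_definition_meaningful=True))
mass_energy = classify(Answers(testable_against_facts=True))
calculus = classify(Answers(describes_past_event=True))
```

A statement for which no rule fires is left "undecided", which corresponds to the cases where comparison with already categorized propositions would be needed.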
3.3 Examples of Categorized Propositions
Some examples of statements belonging to the seven categories named above are as follows:
Logical Statements:
- A proposition and its negation imply all other propositions.
- A proposition P is derivable from propositions P and Q.
Mathematical Statements:
- The sum of the inner angles of a triangle is 180 degrees.
- There are natural numbers x, y, z, such that x^3 + y^3 = z^3.
Formal Statements:
- Jupiter is a planet.
- Cu is the chemical symbol of copper.
Grammatical Statements:
- Superconductivity is an important property.
- The statement, "Aluminium substitution reduces superconductivity," is consistent (with the knowledge base).
- Why BaPbBiO3 is a superconductor is not explainable (by the existing knowledge).
Theoretical-hypothetical-empirical Statements:
- The decay products of the neutron are proton, electron and antineutrino.
- Electronegativity of nickel is 1.9.
- In metallic superconductors, superconductivity and electron density are positively related.
Historical Propositions:
- Superconductivity was discovered by H.K. Onnes in 1911.
- Partial substitution of sulfur for oxygen in LaNiO3 has been tried in superconductivity experiments.
Factual Statements:
- The price of Aluminium is 1.75 US dollars.
- The price of Scandium is over 50,000 US dollars.
Simple and complex propositional knowledge can be further organised within each category into levels of expressions. The levels can be based on Russell's logical theory of types (see, Whitehead & Russell, 1970). The categorization has been applied to varying extents in four different computational models of scientific reasoning and discovery in particle physics (see, Kocabas, 1989a; 1989b; and 1991), and is currently implemented in the development of a computational model of discovery in oxide superconductivity. These will not be described here, but a discussion based on our observations about the implementations will be given instead.
4. Discussion
The importance of the categorization of knowledge lies in the following possible advantages it may provide in the acquisition, representation, refinement and effective use of knowledge: 1) simple and complex propositions of different categories can be represented, accessed and used separately, 2) automatic transformation of knowledge from one form of representation into another one, especially from predicate statements into frames, can be made easier, which can be very useful in knowledge acquisition during the development of large knowledge systems like CYC (Lenat et al., 1986), 3) some useful general formal and informal rules applicable to each category can be found, 4) logical mistakes in designing knowledge systems can be minimized, 5) search procedures can be made more effective, for categorization eliminates unnecessary search activities in the system (e.g., formal questions can be answered by conducting search only in formal knowledge, theoretical questions in theoretical knowledge and so on), 6) more detailed and effective identification and resolution of conflicts and theory revision (or truth maintenance) is possible, and 7) dynamic reorganisation of knowledge can be made easier.
Categorization of knowledge facilitates truth maintenance, as in such systems "truths" of propositions of different categories are established differently. (A more detailed description of conflict resolution based on the categorization is given in Kocabas, 1989c.) Furthermore, some priorities of validity can be given to knowledge belonging to certain categories. For example, if a factual statement is in contradiction with a hypothesis, the problem is resolved by simply giving priority to factual knowledge over the theoretical, rather than removing the hypothesis.
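Advantage 5 can be illustrated with a toy store that keeps one partition per category, so that a query scans only the relevant partition. The class CategorizedKB and its methods are hypothetical names chosen for this sketch; the stored sentences are examples from Section 3.3.

```python
from collections import defaultdict


class CategorizedKB:
    """Toy knowledge base holding statements in per-category partitions."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def add(self, category, statement):
        self._partitions[category].append(statement)

    def search(self, category, keyword):
        """Scan only the partition of `category`; other partitions are not visited."""
        partition = self._partitions.get(category, [])
        return [s for s in partition if keyword.lower() in s.lower()]


kb = CategorizedKB()
kb.add("formal", "Jupiter is a planet.")
kb.add("formal", "Cu is the chemical symbol of copper.")
kb.add("historical", "Superconductivity was discovered by H.K. Onnes in 1911.")
kb.add("factual", "The price of Aluminium is 1.75 US dollars.")

# A formal question is answered by searching formal knowledge only.
formal_hits = kb.search("formal", "planet")
```

A real system would index within each partition as well; plain substring matching is used here only to keep the sketch short.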
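The priority rule in this example can be sketched as follows. Only the ordering "factual over theoretical" comes from the text; the numeric ranks, the names Statement, PRIORITY and resolve_conflict, and the placeholder statements are assumptions made for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    category: str
    text: str


# Hypothetical ranks. The text states only that factual knowledge takes
# priority over theoretical knowledge; the numbers themselves are arbitrary.
PRIORITY = {"factual": 2, "theoretical": 1}


def resolve_conflict(a, b):
    """Return (preferred, overridden) for two contradictory statements.

    The overridden statement is returned rather than discarded, mirroring
    the remark that the hypothesis need not be removed.
    """
    rank_a, rank_b = PRIORITY[a.category], PRIORITY[b.category]
    if rank_a == rank_b:
        raise ValueError("equal priority: another resolution method is needed")
    return (a, b) if rank_a > rank_b else (b, a)


# Placeholder statements, not taken from any knowledge base.
observation = Statement("factual", "observation O holds")
hypothesis = Statement("theoretical", "hypothesis H predicts that O does not hold")

preferred, overridden = resolve_conflict(hypothesis, observation)
```

Keeping the overridden hypothesis available leaves room for later theory revision instead of silently losing it.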
In a categorized knowledge system, hypothesis generation is more systematic. New hypotheses are generated from factual and theoretical knowledge by induction and other forms of generalization. Similarly, new hypotheses can be generated from theoretical knowledge by specialization, abstraction and abduction. Certain forms of reasoning do not apply to certain categories of knowledge (e.g., specialization is not applicable to factual knowledge).
The combined use of frame and logic representations provides a more structured representation of knowledge. Frame representation by itself is less efficient in making full use of logical and extralogical inference such as deduction, abstraction and abduction (see, Lenat & Guha, 1990). On the other hand, logic representation alone provides a less efficient basis for certain kinds of reasoning (e.g. taxonomic and analogical reasoning). The integration of frame and logic representations allows the copying and transformation of knowledge from frames into predicate statements. Brewka (1987) describes a method for translating knowledge from frames into predicate statements. However, he makes no mention of any theoretical work on translating predicate statements into frames in a general and systematic way. Categorization provides this opportunity, as it allows the transfer or transformation of predicate statements into frames.
In a categorized system, knowledge acquisition does not have to be in the form of frames. Knowledge can be entered in simple predicate statements to be categorized by means of a small set of transformation functions. They can then be automatically structured into frames by means of a set of knowledge assembly functions. Naturally, the frames must reflect the categorical distinctions in their structure. For example, every frame can have slots corresponding to the category names such as "logical", "formal", "the" (theoretical) and "factual".
Categorization facilitates the analysis of descriptive knowledge. Other methods of expressing complex propositions in terms of their atomic constituents have been developed. For example, Kowalski (1979, p. 33) describes the method of expressing n-ary relationships as a conjunction of n+1 binary relationships. Complex descriptions can also be represented in this way. This method is directly applicable in frame systems, but it decomposes propositions into descriptions instead of atomic sentences, which may not always be desirable because it hinders the effective use of logical and extralogical inference.
Complex propositions can be analyzed into and represented as simple, categorized predicate statements. Consider, for example, the proposition: "The pulsar, which was discovered by radio astronomers, is a rapidly spinning neutron star whose radio signal regularly turns off and on." This sentence can be split into simpler propositions:
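The n-ary to binary rewriting can be sketched as below. The idea of naming the relationship instance and attaching each argument to it is the usual reading of the method; the function to_binary, the "isa" predicate, the role names and the generated identifiers are assumptions of this sketch, not Kowalski's notation.

```python
from itertools import count

# Generator of fresh identifiers for relationship instances (hypothetical scheme).
_instance_ids = count(1)


def to_binary(relation, roles, args):
    """Rewrite relation(args...) with n arguments as n+1 binary triples.

    One triple names the instance as a member of the relation; each of the
    remaining n triples links the instance to one argument through a role.
    """
    if len(roles) != len(args):
        raise ValueError("one role name is needed per argument")
    instance = f"e{next(_instance_ids)}"
    triples = [("isa", instance, relation)]
    triples += [(role, instance, arg) for role, arg in zip(roles, args)]
    return triples


# A ternary relationship based on one of the historical examples in Section 3.3.
triples = to_binary(
    "discovery",
    roles=["agent", "object", "year"],
    args=["Onnes", "superconductivity", 1911],
)
```

Each resulting triple describes the instance rather than stating an atomic sentence about the world, which is the limitation the text points out.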
formal: A pulsar is a neutron star.
the: A pulsar spins rapidly.
the: The radio signal of a pulsar regularly turns on and off.
historical: The pulsar was discovered by radio astronomers.
As is seen, the constituent propositions are not only representationally, but also categorically distinguishable. Feigenbaum has recently redefined problem solving in terms of "knowledge assembly" rather than search (see, Engelmore & Morgan, 1988, vii). In this new definition the emphasis is on "finding the right piece of knowledge to build into the right place in the emerging solution structure". In a categorized knowledge system simple propositions can be assembled together to build more complex constructs such as complex propositions, frames and schemata.
Inevitably, the categorization has its own drawbacks, besides raising hopes of resolving some important knowledge level problems in the development of large knowledge systems. First of all, it imposes a structure on descriptive knowledge, which is based on a set of criteria. The validity of these criteria can be argued, but if humans utilize such criteria in organising their knowledge, it is worth considering their use in computational models.
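The assembly of categorized propositions into a frame can be sketched with the pulsar example. The tuple layout (category, subject, text), the function assemble_frames and the use of nested dictionaries as frames are assumptions of this sketch; the slot names follow the category labels used in the example.

```python
from collections import defaultdict

# Categorized simple propositions from the pulsar example, stored as
# (category, subject, text). The tuple layout is a hypothetical choice.
STATEMENTS = [
    ("formal", "pulsar", "A pulsar is a neutron star."),
    ("the", "pulsar", "A pulsar spins rapidly."),
    ("the", "pulsar", "The radio signal of a pulsar regularly turns on and off."),
    ("historical", "pulsar", "The pulsar was discovered by radio astronomers."),
]


def assemble_frames(statements):
    """Group statements by subject, with one slot per category."""
    frames = defaultdict(lambda: defaultdict(list))
    for category, subject, text in statements:
        frames[subject][category].append(text)
    # Convert to plain dicts so that a missing slot raises KeyError
    # instead of being created silently on lookup.
    return {subject: dict(slots) for subject, slots in frames.items()}


frames = assemble_frames(STATEMENTS)
pulsar_frame = frames["pulsar"]
```

Because each slot keeps the original sentences, the frame can be flattened back into categorized statements without loss.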
Another problem can be with certain propositions whose category may not be easy to decide. In such cases, the proposition can be maintained in more than one category. However, such cases are rare, and the duplications do not pose a serious difficulty. On the other hand, categorization errors can be identified and resolved by a set of "knowledge administration functions" which can be built to supervise the "distribution", "maintenance" and the "assembly" of knowledge in the categories of the system.
Holland's (1986) suggestion of using general purpose learning algorithms and giving emphasis to inductive learning as a solution to the brittleness problem has appeal. However, systems based on such methods use complex input and output units and their complexity grows as the system is given knowledge of different kinds and levels. In the end, the brittleness problem transforms into two problems: input and output management. To a certain extent this can be avoided in a multi layered general purpose system, but the theoretical basis for such systems has not been sufficiently developed. Additionally, Holland's proposal underestimates the role of the deductive methods in learning.
The categorization scheme introduced is much more detailed than the classification suggested by Woods (1986), in which he divides knowledge into "facts" and "rules". His "facts" include what we call factual statements, simple logical, formal, mathematical, theoretical, and historical propositions. His "rules" include complex logical, formal, mathematical, and theoretical rules, which can be called rules of inference, as well as action rules (or production rules). The categorization meets some of the requirements proposed by Aiello et al. (1986), as it allows knowledge and metaknowledge to be represented in the same form. In this way, it allows the inference mechanisms to be accessible at both levels.
5. Conclusions
In this paper we described a methodology for organising descriptive knowledge into several functional categories, and discussed the ways in which it can integrate different methods of representation such as predicate logic and frame representations. The categorization helps to resolve some of the major problems of developing and maintaining large knowledge systems. These problems are brittleness, the knowledge acquisition bottleneck, and the identification and resolution of conflicts. Clarity and simplicity are essential in building complex knowledge systems, because as the system grows, it becomes more and more difficult to keep track of the relationships between domain concepts. (Lenat et al. (1986) and Lenat and Guha (1990) provide dramatic examples of the complexities of adding more knowledge to a large knowledge system.) The categorization scheme introduced provides clarity and simplicity, as it allows different kinds of knowledge to be represented in an organised way. Commonsense, as well as scientific knowledge, can be represented in the categories described. The categorization has been implemented in several computational systems which model reasoning and discovery in astronomy, particle physics, and high-temperature superconductivity, incorporating various methods of learning and conflict resolution. More detailed implementations are being carried out.
References
Brachman, R.J. & Levesque, H.J. (1983). Krypton, a functional approach to knowledge representation. IEEE Computer. October 1983. (67-73).
Brewka, G. (1987). The logic of inheritance in frame systems. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, (483-488). Morgan Kaufmann.
Cheng, Y.S. and Fu, K.S. (1984). Conceptual clustering in knowledge organization. The First Conference on Artificial Intelligence Applications, (274-279). IEEE Computer Society Press.
de Kleer, J.S. (1986). An assumption based TMS. Artificial Intelligence, 28, 127-162.
Edwards, P. (1967). The Encyclopedia of Philosophy. Collier MacMillan Ltd.
Engelmore, R., & Morgan, T. (Eds.) (1988). Blackboard systems. Addison Wesley.
Gödel, K. (1962). On formally undecidable propositions of Principia Mathematica and related systems. New York: Basic Books.
Holland, J.H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R.S. Michalski, J.G. Carbonell and T.M. Mitchell (eds.), Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.
Kocabas, S. (1989a). Qualitative and quantitative reasoning in particle physics. Proceedings of the
Kocabas, S. (1989b). Scientific explanation by exclusion. Proceedings of the 12th International Congress on Cybernetics, 840-851, Namur, Belgium.
Kocabas, S. (1989c). Functional categorization of knowledge: Applications in modeling scientific research and discovery. PhD Thesis, Department of Electronic and Electrical Engineering, King's College London, University of London.
Kocabas, S. (1991). Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, (277-309).
Kowalski, R. (1979). Logic for problem solving. New York: Elsevier North Holland.
Langley, P. (1986). Editorial: Human and Machine Learning. Machine Learning, Vol. 1, No.3, (246-247).
Lenat, D.B. (1983). The role of heuristics in learning by discovery: Three case studies. In R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Machine Learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.
Lenat, D.B. Prakash, M., Shepherd, M. (1986). CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. The AI Magazine 7, No. 4, (65-85).
Lenat, D.B. & Feigenbaum, E.A. (1987). On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, (1173-1182).
Lenat, D.B. & Feigenbaum, E.A. (1991). On the thresholds of knowledge. Artificial Intelligence, 47, (185-250).
Lenat, D.B. & Guha, R.V. (1990). Building Large knowledge based systems: Representation and inference in the CYC project. New York, NY: Addison Wesley.
McCarthy, J. (1983). Some expert systems need common sense. Annals of New York Academy of Sciences, 426, 129-137.
Minsky, M. (1984). Afterword to: Vinge, V., True Names. (130-153). New York: Bluejay Books.
Piaget, J. (1971). The language and thought of the child. Routledge & Kegan Paul Ltd. London.
Porter, B.W. and Kibler, D.F. (1986). Experimental goal regression: A method for learning problem-solving heuristics, Machine Learning, Vol 1, No. 3, (249-286).
Rose, D. & Langley, P. (1986). Chemical discovery as belief revision. Machine Learning, 1, (423-452).
Whitehead, A.N. & Russell, B. (1970). Principia Mathematica to *56. Cambridge: Cambridge University Press.
Wittgenstein, L. (1953). Philosophical Investigations. G.H. von Wright and G.E.M. Anscombe (Eds.). Oxford: Basil Blackwell.
Wittgenstein, L. (1965). The Blue and Brown Books. New York: Harper Torchbooks.
Wittgenstein, L. (1970). Lectures and conversations on aesthetics, psychology and religious belief. Ed. by C. Barrett. Oxford: Basil Blackwell.
Wittgenstein, L. (1974). Wittgenstein's lectures on the foundations of mathematics, Cambridge, 1939. C. Diamond (Ed.) Sussex: The Harvester Press.
Wittgenstein, L. (1978). Remarks on colour. G.E.M. Anscombe (Ed.) Oxford: Basil Blackwell.
Wittgenstein, L. (1981). Zettel. G.E.M. Anscombe and G.H. von Wright (Eds.) Oxford: Basil Blackwell.
Woods, W.A. (1986). Important issues in knowledge representation. Proceedings of the IEEE. Vol. 74, No. 10. (1322-1334).
Wednesday, August 15, 2007
List of Conference, Symposia etc.
List of all Conference, Symposia, Workshop and Journal Papers.
Kocabaş, Ş. (2001). Automated Formulation of Reactions and Pathways in Nuclear astrophysics: New Results. In Proceedings of the 4th International Conference on Discovery Science (DS-2001). Springer, pp. 172-181.
Kocabaş, Ş. & Langley, P. (2001). An Integrated Framework for Extended Discovery in Particle Physics. In Proceedings of the 4th International Conference on Discovery Science (DS-2001). Springer, pp. 122-195.
Kocabaş, Ş. (2000). Automated formulation of reactions and pathways in nuclear astrophysics: New results. In Proceedings of the 9th Turkish Symposium on AI & NN. 21-23 June 2000, Izmir, Turkey. pp. 343-352.
Kocabaş, Ş., Langley, P. (2000). Computer generation of process explanations in nuclear astrophysics. International Journal of Human Computer Studies. 53, 1149-1164.
Kocabaş, Ş., Langley, P. (1998). Automated Formulation of Reactions and Reaction Pathways in Nuclear Astrophysics. ECAI-98 Machine Discovery Workshop. 24 August 1998, University of Sussex, Brighton, England. (Proceedings, pp. 4-9).
Kocabaş, Ş., Öztemel, E. (1998). AISim: An Intelligent Agent for Distributed Interactive Simulation. Computer Generated Forces and Simulation of Behavior (CGF-98). 12-14 May 1998, Orlando, Florida. (pp. 257-261).
Öztemel, E., Kocabaş, Ş. (1998). Simulation for in-flight training. Computer Generated Forces and Simulation of Behavior (CGF-98). 12-14 May 1998, Orlando, Florida. (pp. 149-155).
Kocabaş, Ş., Öztemel, E. (1998). Intelligent Agents for Distributed Interactive Battlefield Simulation. International Training and Education Conference (ITEC-98). 28-30 April 1998, Lausanne, Switzerland.
Öztemel, E., Kocabaş, Ş., Çetiner, B.G. (1998). Simulation Management System for WASiF. International Training and Education Conference (ITEC-98). 28-30 April 1998, Lausanne, Switzerland.
Kocabaş, Ş., Öztemel, E. (1998). Harp Oyunlarında Yapay Zeka Uygulamaları [Artificial Intelligence Applications in War Games]. TSK-MODSİM, Modeling and Simulation Conference, 1-3 April 1998. Kara Harp Okulu, Ankara.
Kocabaş, Ş., Öztemel, E. (1998). AISim: Dağıtık Ortamlarda kompleks hava muharebesi için geliştirilmiş bir zeki tehdit sistemi [AISim: An intelligent threat system developed for complex air combat in distributed environments]. TSK-MODSİM, Modeling and Simulation Conference, 1-3 April 1998. Kara Harp Okulu, Ankara.
Kocabaş, Ş., Öztemel, E., Uludağ, M., Koç, N. (1996). Design of a DIS Agent, the AISim System: A Progress Report. Computer Generated Forces and Simulation of Behavior (CGF-96). 23-25 July 1996, Orlando, Florida. (pp. 119-126).
Öztemel, E., Kocabaş, Ş. (1996). Design Principles for Intelligent Agents in Distributed Interactive Simulation. SimTect '96. 25-26 March 1996, Melbourne, Australia.
Kocabaş, Ş. (1995). A Methodology for Modeling Scientific Discovery. Fourth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN-95), 26-28 June 1995, Tübitak-MAM, Gebze. (pp. 87-93).
Kocabaş, Ş., Öztemel, E., Uludağ, M., Koç, N. (1995). Agents That Learn and Explain Their Actions. Fifth Conference on Computer Generated Forces and Behavioral Representation (CGF-95). 9-11 May 1995, Orlando, Florida, USA. (pp. 63-68).
Kocabaş, Ş., Langley, P. (1995). Integration of Research Tasks for Modeling Discoveries in Particle Physics. AAAI Spring Symposium Series, 27-29 March 1995, Stanford University, California, USA. (pp. 87-92). [These symposium notes were also published by AAAI as a technical report.]
Kocabaş, Ş. (1995). A Methodology for Modeling Scientific Discovery. AAAI Spring Symposium Series, 27-29 March 1995, Stanford University, California, USA. (pp. 139-144). [These symposium notes were also published by AAAI as a technical report.]
Öztemel, E., Kocabaş, Ş. (1995). Artificial Intelligence in Military Simulation. 1st International Aviation Symposium. 8-10 March 1995, İTÜ, İstanbul.
Kocabaş, Ş. (1994). Goal Directed Discovery and Explanation in Particle Physics. AAAI Spring Symposium Series, 21-23 March 1994, Stanford, California, USA. (pp. 54-61).
Kocabaş, Ş. (1994). Goal Directed Discovery and Explanation in Particle Physics. 3rd Turkish Symposium on Artificial Intelligence and Neural Networks. 22-24 June 1994, ODTÜ, Ankara. (pp. 245-256).
Kocabaş, Ş. (1993). Elements of Scientific Creativity. AAAI Spring Symposium Series, 23-25 March 1993, Stanford, California, USA. (pp. 39-45). [These symposium notes were also published by AAAI as a technical report.]
Kocabaş, Ş. (1993). Elements of Scientific Creativity. 2nd Turkish Symposium on Artificial Intelligence and Neural Networks. 24-25 June 1993, Boğaziçi University, Istanbul. (pp. 38-45).
Kocabaş, Ş. (1993). Representation of Descriptive and Prescriptive Knowledge in Intelligent Systems. 2nd Turkish Symposium on Artificial Intelligence and Neural Networks. 24-25 June 1993, Boğaziçi University, Istanbul. (pp. 54-63).
Kocabaş, Ş. (1992). Elements of Scientific Research: Modeling discoveries in oxide superconductivity. Machine Discovery Workshop, 4 July 1992, University of Aberdeen, Scotland. (pp. 63-70).
Kocabaş, Ş. (1992). Evaluation of Discovery Systems. (Panel paper). Machine Discovery Workshop, 4 July 1992, University of Aberdeen, Scotland. (pp. 168-171).
Kocabaş, Ş. (1992). Four Levels of Learning and Representation. First Turkish Symposium on Artificial Intelligence and Neural Networks, 26-27 June 1992, Bilkent University, Ankara. (pp. 123-138).
Kocabaş, Ş. (1992). Learning Control Rules and Rule Parallelism in Hierarchic Systems. ESDA-92, June 1992, Istanbul.
Kocabaş, Ş. (1992). Functional Categorization of Knowledge: Applications in modeling Scientific Discovery. AI Magazine, Spring 1992, vol. 13, 1, pp. 11-12.
Kocabaş, Ş. (1992). Functional Categorization of Knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, California, USA. (pp. 83-92).
Kocabaş, Ş., Alkan, A. (1992). An Interface to SQL with Learning Abilities. DECSYM 92, 24-27 March 1992, Antalya, Turkey. Proceedings, pp. 101-106.
Kocabaş, Ş. (1991). Conflict Resolution as Discovery in Particle Physics. Machine Learning, vol. 6, pp. 277-309, Kluwer Academic Press.
Kocabaş, Ş. (1991). A Review of Learning. The Knowledge Engineering Review, vol 6, No 3. Cambridge University Press. (pp. 195-222).
Kocabaş, Ş. (1991). Computational Models of Scientific Discovery. The Knowledge Engineering Review, vol 6, No 4. Cambridge University Press. (pp. 259-305).
Kocabaş, Ş. (1991). Homuncular Learning and Rule Parallelism. IEE Proceedings of Control-91, IEE Publications. (pp. 951-954).
Kocabaş, Ş. (1990). Functional Categorization of Knowledge: Applications in Modeling Scientific Discovery. PhD Thesis, University of London.
Kocabaş, Ş. (1989a). Qualitative and quantitative reasoning in particle physics. 12th International Congress on Cybernetics, 21-25 August, 1989, Namur, Belgium.
Kocabaş, Ş. (1989b). Scientific explanation by exclusion. 12th International Congress on Cybernetics, 21-25 August, 1989, Namur, Belgium.
Design of A DIS Agent
Design of a DIS Agent, the AISim System: A Progress Report
Sakir Kocabas*, Ercan Oztemel**,
1. Abstract
An intelligent system, AISim, is being developed by our group at MRC within the framework of the EUCLID RTP 11.3 battlefield simulation project. AISim is designed to enable a simulated air target (an F-16 aircraft) to behave intelligently in cooperation with other computer-generated and man-controlled air targets, in the tasks and activities of CAP and Escort missions in defensive and offensive scenarios. The system's tasks include Navigation, Patrol, Escort, BVR and WVR Engagement, Air-to-Air Refuelling, Disengage and Return-to-Base.
2. Introduction
The study of intelligent agents in real-time simulation systems has been one of the most challenging research topics in artificial intelligence (see, e.g., Jones et al., 1994). The primary purpose of such studies is to examine agent behavior in real-time environments and scenarios, and to prepare more realistic systems for training human operators in certain skills. Recently, extensive research has been carried out on intelligent agents operating in distributed interactive simulation (DIS) environments. DIS environments make it possible to use a number of agents with different goals and behavior patterns in real-time scenarios (see, e.g., Oztemel & Kocabas, 1996; Laird et al., 1995; Tambe et al., 1995). DIS is mainly concerned with the time- and space-coherent synthetic representation of real-world environments and of the interactions of operational entities in them.
The synthetic environment is created through real-time exchange of data units between distributed and computationally autonomous simulation applications, in the form of simulations, simulators and instrumented equipment interconnected through standard computer communication services. The computational entities can be in one location or distributed geographically. A DIS system has the following characteristics:
- No central computer is used for event scheduling or conflict resolution.
- Autonomous simulation stations are responsible for maintaining the state of one or more simulation elements.
- There is a standard protocol for communicating ground-truth data.
- Receiving stations are responsible for determining what is to be perceived.
- Simulation stations communicate only changes in their state.
- "Dead-reckoning" algorithms are used to reduce overloads in processing communication data.
An intelligent agent consists mainly of three components: perception, cognition and action. Memory, reasoning, learning, understanding, planning, scheduling, and control are some of the basic characteristics of intelligent behavior. An agent equipped with these capabilities can receive information from its environment, organize its knowledge about the environment, evaluate situations, deduce conclusions, solve problems, and generate actions.
The cooperation of DIS agents depends on the kind of tasks and activities they are expected to do, and the environment in which they operate. There may be three different types of tasks: 1) Agents may perform problem solving in a common domain, 2) agents may be working together to improve their individual performance, and 3) agents may be working together to improve the performance of the overall system they are designed for.
In DIS systems the third type of cooperation is important, as it concerns the question of dependency between agents. If an agent needs to communicate with other agents, it has to know the underlying model of these agents. Additionally, there has to be a standard form of data communication accessible by every entity within the overall system. Some data communication problems are solved by "dead reckoning" algorithms. Such algorithms estimate future situations in the temporary absence of situational data, ensuring that the system is somewhat fault tolerant with respect to temporary communication failures.
In a complex environment, the knowledge used by an agent can be incomplete, and the goals of the agents might be conflicting (Jones et al., 1994). If an agent has conflicting goals, a set of heuristics or a classifier can be used to deal with the conflict. However, if different agents have conflicting goals, then there is a need for a negotiator to deal with this problem. The negotiator is an agent which defines the authority of information.
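The idea behind dead reckoning can be illustrated with a short sketch. This is only a minimal constant-velocity extrapolation under our own assumptions; the `EntityState` fields and the `dead_reckon` function are hypothetical names and do not reproduce the algorithm used by any particular DIS implementation.

```python
from dataclasses import dataclass


@dataclass
class EntityState:
    """Last state report received for a remote entity (hypothetical fields)."""
    t: float     # time of the report, in seconds
    pos: tuple   # (x, y, z) position, in metres
    vel: tuple   # (vx, vy, vz) velocity, in metres per second


def dead_reckon(last: EntityState, now: float) -> tuple:
    """Estimate the position at time `now`, assuming constant velocity
    since the last report (an assumed, simplest-possible motion model)."""
    dt = now - last.t
    return tuple(p + v * dt for p, v in zip(last.pos, last.vel))


# No new report has arrived for 2.5 s, so the receiver extrapolates.
last = EntityState(t=10.0, pos=(0.0, 0.0, 5000.0), vel=(200.0, 0.0, -10.0))
estimate = dead_reckon(last, now=12.5)
```

A receiving station could keep using such estimates until the next state report arrives, which is what makes the system tolerant of short communication gaps.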
This paper describes the design of an intelligent agent, AISim, operating in a DIS environment. Our study focuses on the following problems in designing such agents:
- Rationality of agent behavior
- Agent cooperation and coordination
- Resolution of conflicts in agent goals and tasks
- Agent situation and behavior explanations
- Agent reusability.
The design history of AISim goes back to the design of its prototype RSIM (Kocabas et al., 1995). RSIM was a simple model operating in a 2-d space, with capabilities of learning its rules of behavior and explaining its behavior. AISim is a much more developed version with the capabilities of detailed situation assessment, action management and behavior explanation. In the following sections, first a summary of the design history of AISim is provided. Then the system is described in terms of its hardware and software structure. Next, AISim's methods and capabilities are discussed in comparison with other related systems. Finally, the paper concludes with a summary of the results.
3. System Development
The following procedure is employed in the development of AISim:
- Domain analysis, to define the activities to be simulated in the application.
- Requirements analysis, to define the system's goals and functions.
- Global design analysis, to ensure that each specified goal is achieved by a set of functions.
- Detailed design, to guide the software engineers to code the system in accordance with the specified requirements.
- Software development, which is the actual code generation process.
- Testing, verification and integration to the DIS system.
Currently, our work has passed the prototype and design stages, and is now in the software development stage, in which AISim has been integrated with the underlying simulation system.
4. System Description
The basic hardware structure of the DIS system on which AISim runs is shown in Figure 1. The system is networked to the simulation system in a DIS environment, where AISim runs on the AI station and controls its agent(s) on a simulation station connected to the same DIS system. The simulation station runs the ITEMS* simulation system. The communication between the workstations is carried out by exchanging standard data units over the network, under InterSIM**, a DIS network software package.
AI Station            Simulation station
------------          ------------------
AISim                 simulation system
------------          ------------------
interface             interface
------------          ------------------
      |     <- pdus ->     |
----------------------------------------- ... DIS Network
Figure 1. Hardware structure of the DIS system on which AISim runs.
ITEMS* is the product of CAE Electronics.
InterSIM** is the product of TTS.
As to the software architecture of the system, we have selected a hierarchical approach for the design of AISim, in which the system has four levels of goals:
1) Mission goals
2) Task goals
3) Subtask goals
4) Activity and action goals.
DIS scenarios require the definition of mission goals such as air interception and tactical air support. AISim has been designed for two different mission goals: Combat Air Patrol (CAP), and Escort to bombers. When the system's mission goal is defined as CAP, it is divided into a set of task goals such as navigation, patrol, and BVR combat. These task goals are further divided into a set of subtask goals such as trajectory guidance, weapons management, and evasion. The subtask goals in turn, are divided into activities such as firing and guiding a missile, performing an evasion maneuver, and turning towards a target. Activities are also divided into a set of simple actions such as changing heading, speed and altitude.
AISim's control structure supports the goal hierarchy described above. The system has two modules: Situation Assessment (SA) and Action Management (AM). The SA module monitors the situation 10 times a second on average: it first selects a set of situational parameters, computes the situation from them, and then sends a reduced set of situational indicators, in the form of signals, to the AM module.
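The reduction performed by the SA module can be pictured with the following sketch. It is an illustration only, not AISim's code: the parameter names, the range thresholds and the four indicators are all assumptions chosen for the example.

```python
import math


def assess_situation(own, threat, bvr_range_m=60000.0, wvr_range_m=10000.0):
    """Reduce raw simulation parameters to a few symbolic indicators.

    The input dictionaries and both range thresholds are hypothetical;
    the paper does not list the actual parameters or their values.
    """
    distance = math.hypot(threat["x"] - own["x"], threat["y"] - own["y"])
    return {
        "threat_in_bvr_range": distance <= bvr_range_m,
        "threat_in_wvr_range": distance <= wvr_range_m,
        "radar_locked_by_threat": threat["radar_lock_on_own"],
        "fuel_low": own["fuel_kg"] < own["fuel_reserve_kg"],
    }


own = {"x": 0.0, "y": 0.0, "fuel_kg": 2500.0, "fuel_reserve_kg": 1000.0}
threat = {"x": 30000.0, "y": 40000.0, "radar_lock_on_own": False}

# The threat is 50 km away: inside the assumed BVR range, outside WVR range.
signals = assess_situation(own, threat)
```

Passing only such a small set of indicators to the action management side keeps the rule conditions there short and independent of the raw simulation data.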
The AM module itself consists of a set of operators in a hierarchy. On the top of this hierarchy is the Task Control Operator (TCO), which controls a set of task operators by deciding which task operator is to be activated under the current situation. Once a task operator is activated, this in turn, fires subtask and activity operators and rules. In this way, AISim directs its agent in the scenario in accordance with its assessment of the current situation.
AISim's TCO has the following operators which can become active in a CAP mission: Takeoff, Navigate, Patrol, BVR Engage, WVR Engage, Disengage, Air-to-Air Refuelling, Return to Base, and Land. Each of these operators has a set of subtask operators, which in turn have a set of activity operators, and finally each activity operator has a set of action rules. The task control operator of AISim currently has 23 rules for selecting task operators for CAP missions. The total number of sub-operators in these task operators is 52, each of which in turn has a small set of action rules and procedures. Figure 2 shows a section of AISim's mission, task, subtask and activity hierarchy. With this hierarchic control structure AISim supports the following intelligent agent characteristics: situation assessment, action management and explanation.
---------------------------------------------------------
Mission
    CAP
    Escort
Task (of CAP)
    Navigate
    Patrol
    BVR Engage
    WVR Engage
    Disengage
    ...
Subtask (of BVR Engage)
    BVR Approach
    BVR Attack
    BVR Evade
    BVR Escape
Activity (of BVR Attack)
    Maintain Angle of Attack
    Check Missile Envelope
    Missile Launch
    ...
Action (of Missile Launch)
    Launch Missile
    Perform f-pole
    Guide Missile
    ...
---------------------------------------------------------
Figure 2. AISim's hierarchy of operators for mission tasks, subtasks, activities and actions.
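The fragment of the hierarchy in Figure 2 can be written down as a nested data structure. The sketch below is our own illustration (the dictionary layout and the `find_path` helper are assumptions, not part of AISim); it contains only the operators named in the figure and omits the entries shown there as "...".

```python
# Operators named in Figure 2, nested by level:
# mission > task > subtask > activity > action.
HIERARCHY = {
    "CAP": {
        "Navigate": {},
        "Patrol": {},
        "BVR Engage": {
            "BVR Approach": {},
            "BVR Attack": {
                "Maintain Angle of Attack": {},
                "Check Missile Envelope": {},
                "Missile Launch": {
                    "Launch Missile": {},
                    "Perform f-pole": {},
                    "Guide Missile": {},
                },
            },
            "BVR Evade": {},
            "BVR Escape": {},
        },
        "WVR Engage": {},
        "Disengage": {},
    },
    "Escort": {},
}

LEVELS = ["mission", "task", "subtask", "activity", "action"]


def find_path(tree, name, prefix=()):
    """Depth-first search for `name`; returns the operators from the
    mission level down to `name`, or None if it is not in the tree."""
    for key, children in tree.items():
        path = prefix + (key,)
        if key == name:
            return path
        found = find_path(children, name, path)
        if found:
            return found
    return None


path = find_path(HIERARCHY, "Guide Missile")
labelled = list(zip(LEVELS, path))
```

Here `path` runs from the mission CAP down to the action Guide Missile, one operator per level, which is the kind of chain a hierarchic controller walks when it passes control downwards.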
5. Discussion
In this section we discuss AISim and compare it with other related systems in terms of:
- Domain tasks
- System architecture (knowledge organization)
- Intelligent agent features
  . Situation assessment (perception, cognition)
  . Action management (cognition, action)
  . Robustness
  . Timeliness
  . Flexibility (e.g., reusability)
  . Learning
  . Explanation
- Performance in mission scenarios.
AISim has been tested in controlling an F-16 against ITEMS-controlled and man-controlled MiG-29s and F-16s in various CAP scenarios. Tests of the system in escort scenarios are continuing. In CAP scenarios, the AISim agent (AIT) takes off, navigates to a patrol waypoint in a predefined desired engagement zone (DEZ), and performs patrol in an elliptical orbit oriented towards a given threat direction. When a threat approaches within a certain distance, AISim's TCO passes control to the BVR Engagement operator, and this in turn to the BVR Approach sub-operator, so that AIT leaves patrol and approaches its target at a certain angle. Within a certain range, the BVR Attack sub-operator takes control of AIT, guiding it into its own missile envelope while securing and maintaining radar lock until a certain range. This sub-operator is also responsible for launching and guiding BVR missiles. Meanwhile, if a radar lock comes from the opponent within a certain range, control passes to the BVR Evade sub-operator, which in turn guides AIT into evasive maneuvers. Chaff throws and radar jams can be taken care of automatically by the simulation system ITEMS. During BVR Attack or BVR Evade, if AIT has entered WVR engagement range, then TCO passes control to the WVR Engage operator, which directs AIT in WVR attack, evade and escape maneuvers.
At all times, TCO checks the fuel and missile stocks of AIT. When AIT runs out of BVR and/or WVR missiles, control passes to Disengage and RTB operators depending on the tactical situation. When the fuel level of AIT is below a predefined level, and the mission is still on, TCO passes control to Escape and/or AAR operators, and AIT is directed towards an AAR point where it refuels.
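The task selection behavior described above can be approximated by a small rule function. This is a deliberately simplified sketch and not a reproduction of AISim's 23 task control rules: the signal names, the order in which the rules are tried, and the reduction of the Disengage/RTB and Escape/AAR choices to a single operator each are all assumptions.

```python
def select_task(signals):
    """Return the name of the task operator to activate.

    Rules are tried top-down; this ordering is an assumed priority,
    not one stated in the paper.
    """
    if signals["missiles_left"] == 0:
        return "Disengage"
    if signals["fuel_low"] and signals["mission_active"]:
        return "Air-to-Air Refuelling"
    if signals["threat_in_wvr_range"]:
        return "WVR Engage"
    if signals["threat_in_bvr_range"]:
        return "BVR Engage"
    return "Patrol"


# Hypothetical baseline situation: armed, fuelled, no threat nearby.
base = {
    "missiles_left": 2,
    "fuel_low": False,
    "mission_active": True,
    "threat_in_wvr_range": False,
    "threat_in_bvr_range": False,
}

on_patrol = select_task(base)
bvr = select_task({**base, "threat_in_bvr_range": True})
wvr = select_task({**base, "threat_in_bvr_range": True, "threat_in_wvr_range": True})
refuel = select_task({**base, "fuel_low": True})
no_missiles = select_task({**base, "missiles_left": 0})
```

Because the function is re-evaluated on every new set of signals, control can move from any task to any other as soon as the situation changes.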
The above is a brief description of AIT's behavior, in which a good deal of detail is omitted because of the space limitations of this paper.
The knowledge organization and control structure of AISim is based on the hierarchic homuncular control (HH) architecture (Kocabas, 1991). Unlike the sequences of operators of Soar-IFOR, in this architecture AISim's operators are systematically divided into mission, task, subtask and activity operators, as shown in Figure 2. This architecture provides effective search control in real-time behavior. Accordingly, at any moment in its activity, the AISim agent can pass from one task (such as BVR Engage) to another task (such as Disengage).
The number of operators and rules of AISim are small, compared to the variety of tasks and activities performed by its agents in a scenario. There are two reasons for this:
1) AISim's HH control architecture has proved to be effective in partitioning the control of agent activities.
2) Many of the low level activities such as navigation to a waypoint and radar lock are carried out by the ITEMS simulation system.
Like Air-IFOR agents (Laird et al., 1995), AISim agents are isolated from the details of the underlying simulation environment, such as missile and plane dynamics, and sensor simulation. However, unlike Air-IFOR agents, AISim controls its agents, which are created in a simulation station in the DIS environment, from a separate workstation connected to the same environment, using the data protocols of the DIS network software InterSIM. In other words, as opposed to Air-Soar systems, which run in direct communication with their simulation system ModSAF on the same workstation, AISim runs independently on a separate workstation. Therefore its configuration is more general in terms of data communication and control than that of Air-Soar.
As to the intelligent agent features of the system, AISim's SA module reads the set of data on the dynamic and static simulation elements, computes the parameters of the tactical situation from some of these data, and sends the relevant attribute-values to a message list to be read by the system's TCO operator. AISim reads about 60 different types of data (which are grouped in themselves), and sends about 15 types of data to the DIS network. The whole system's clock cycle is 20. AISim's action management operators, as described above, are capable of guiding its agent in different tasks and activities. The current version performs well in 1-v-1 engagements, and has a simple set of prime opponent selection rules to deal with more than one opponent at a time. However, unlike Air-IFOR agents, the system has not yet been developed for 1-v-2 and 2-v-2 air combat scenarios.
AISim tests show that the system is robust, in the sense that it shows reasonable performance in different scenarios in 1-v-1 and 1-v-2 engagements. The system has also passed the timeliness criterion in its current form.
As to the flexibility criterion, the AISim architecture has proved to be flexible enough to adapt to other missions (e.g., from CAP to Escort missions) simply by adding new task operators and a small set of task control rules in TCO. Unlike Air/Soar, which uses a decision procedure with a rule set to select operators according to the current situation, AISim performs task selection through its task control operator. One advantage of this architecture is that it makes it easier to change the doctrines of the AIT.
We tested learning methods on our earlier model RSIM (Kocabas et al., 1995), which learns action rules to perform meaningful maneuvers in 1-v-1 engagements. Learning methods have been applied in limited activities such as learning pure pursuit (Hommertzheim et al., 1991) and certain close combat maneuvers (Crowe, 1990). AISim's architecture allows it to learn task control and activity rules, but the system's search space is too large for effective learning of control and action rules. For this reason, we have not yet implemented learning methods in AISim. On the other hand, many military missions and tasks are taught by instruction. Air combat maneuvers are also well defined both in tactics and in geometrical paths and trajectories. However, this does not mean that learning is not feasible in such systems, particularly because of the use of new technologies in missiles and planes.
Behavior explanation is an important feature for computer generated agents, as it is useful, both for development and for training purposes, to know what the agent has been doing at a particular moment during its activities. Behavior explanations can be in the form of post-mission explanations (Johnson, 1994) or in real-time (Kocabas et al., 1995). Like its predecessor RSIM, AISim explains its agent's behavior in real-time. The system's knowledge organization, particularly its task based hierarchy of operators into tasks, subtasks, activities and actions, facilitates the detailed explanation of its agent's behavior in real-time. Air-Soar agents also have explanation capability, but in the form of post-flight explanations (Johnson, 1994).
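One way to see how an operator hierarchy lends itself to real-time explanation is the sketch below, which emits a line of text whenever the chain of active operators changes. The `Explainer` class, its message format and the example times are our own assumptions for illustration; they are not taken from AISim or RSIM.

```python
LEVELS = ("mission", "task", "subtask", "activity", "action")


class Explainer:
    """Logs a one-line explanation each time the active operator path changes."""

    def __init__(self):
        self.last_path = None
        self.log = []

    def update(self, t, path):
        """Record the path of active operators at time `t` (seconds).

        Returns the new explanation line, or None if nothing changed.
        """
        if path == self.last_path:
            return None
        self.last_path = path
        described = ", ".join(
            f"{level} '{name}'" for level, name in zip(LEVELS, path)
        )
        line = f"t={t:.1f}s: now executing {described}"
        self.log.append(line)
        return line


explainer = Explainer()
explainer.update(120.0, ("CAP", "Patrol"))
explainer.update(120.1, ("CAP", "Patrol"))  # unchanged, so nothing is logged
explainer.update(185.4, ("CAP", "BVR Engage", "BVR Approach"))
```

Since each explanation names every level from the mission down to the currently active operator, the same record serves both a developer debugging control flow and a trainee following the agent's behavior.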
The same knowledge organization also makes it easier to include descriptions of agent goals and intentions alongside simple behavior explanations. Goal-directed explanations can be useful in monitoring the agent's behavior more closely, particularly its situation assessment capabilities. We are currently in the process of implementing this feature in AISim. Under these considerations, we believe that AISim has a more flexible knowledge organization scheme and control architecture than Soar, which provides the basic knowledge organization scheme for Air-Soar systems.
As opposed to Tac-Air-Soar, AISim can in principle deal with multiple independent goals simultaneously. We are in the process of implementing this feature in the system. AISim can control more than one AI target in a scenario from one station, although we have tested only one so far.
Like the Air-IFOR agents of Air/Soar, AISim provides the following capabilities to the AIT: situation assessment, following flight plans, performing patrol in reference to a certain waypoint and opponent direction, prime opponent selection, attack and missile management, evasion and escape, escort behavior and tactics, fuel management, disengagement, and coordinating with other agents in escort tasks. To these capabilities, explanation of its own behavior and interpretation of target behavior must be added.
On the other hand, compared with Air-IFOR agents, AISim agents cover a limited range of missions, being confined to CAP and Escort. Additionally, the current version of AISim has limited capabilities for 1-v-2 air combat.
6. Summary
In this paper we have described the design and architecture of an intelligent system, AISim, capable of performing tasks and activities in CAP and Escort missions. We have also discussed the system's knowledge organization and control architecture in comparison with other related systems. AISim's architecture supports intelligent agent requirements such as situation assessment, action management, timeliness, flexibility and behavior explanation.
7. References
Crowe, M.X. (1990). The application of artificial neural systems to the training of air combat decision-making skills. In Proceedings of the 12th ITSC, pp. 302-312.
Hommertzheim, D., Huffman, J., and Sabuncuoglu, I. (1991). Training an artificial neural network the pure pursuit maneuver. Computers and Operations Research, 18, No. 4, pp. 343-353.
Johnson, W.L. (1994). Agents that explain their own actions. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation, May 1994, Orlando, Florida.
Jones, R.M., Laird, J.E., Tambe, M., and Rosenbloom, P.S. (1994). Generating goals in response to interacting goals. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation.
Kocabas, S. (1991). Homuncular learning and rule parallelism: An application to BACON. In Proceedings of International Conference on Control - 91, pp. 950-954.
Kocabas, S., Oztemel, E., Uludag, M., and Koc, N. (1995). Automated agents that learn and explain their own actions: A progress report. In Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, pp. 63-68.
Laird, J.E., Johnson, W.L., Jones, R.M., Koss, F., Lehman, J.F., Nielsen, P.E., Rosenbloom, P.S., Rubinoff, R., Schwamb, K.B., Tambe, M., Van Dyke, J., van Lent, E., and Wray, R.E. (1995). Simulated intelligent forces for air: The Soar/IFOR project 1995. In Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, pp. 27-36.
Oztemel, E. and Kocabas, S. (1996). Design principles for intelligent agents in distributed interactive simulation. In Proceedings of SimTect-96, 25-26 March 1996, pp. 103-106.
Tambe, M., Johnson, W.L., Jones, R.M., Koss, F., Laird, J.E., Rosenbloom, P.S., and Schwamb, K.B. (1995). Intelligent agents for interactive simulation environments. AI Magazine, Spring 1995, pp. 15-39.
8. Authors' Biographies
Sakir Kocabas is the head of the AI Department at MRC and the project manager for EUCLID RTP 11.3 WP2. Dr. Kocabas has a PhD degree in Information Engineering. His research interests are in the areas of Machine Learning and Discovery.
Ercan Oztemel is a researcher at the AI Department of MRC. Dr. Oztemel has a PhD degree in Artificial Intelligence. His research interests are Real-Time Knowledge Based Systems, Inductive Learning and Neural Networks.
Mahmut Uludag is a researcher at the AI Department of MRC. Mr. Uludag has a Master of Science degree in Mechanical Engineering, and is a PhD student at ITU. His research interests are AI Applications in Real-Time Simulation.
Nazim Koc is a researcher at the AI Department of MRC. He has a Master of Science degree in Symbolic Computation, and is a PhD student at ITU. His research interests are Symbolic Computation, Parallel Logic Programming and Machine Learning.
A progress report
Sakir Kocabas*, Ercan Oztemel**,
1. Abstract
An intelligent system, AISim, is being developed by our group at MRC within the framework of the EUCLID RTP 11.3 battlefield simulation project. AISim is designed to enable a simulated air target (an F16 plane) to behave intelligently, in cooperation with other computer generated and man-controlled air targets, in the tasks and activities of CAP and Escort missions in defensive and offensive scenarios. The system's tasks include Navigation, Patrol, Escort, BVR and WVR Engagement, Air-to-Air Refuelling, Disengage and Return-to-Base.
2. Introduction
The study of intelligent agents in real-time simulation systems has been one of the most challenging research topics in artificial intelligence (see, e.g., Jones, et al., 1994). The primary purpose of such studies is to examine agent behavior in real-time environments and scenarios, and to prepare more realistic systems for training human operators in certain skills. Recently, extensive research has been carried out on intelligent agents operating in distributed interactive simulation (DIS) environments. DIS environments make it possible to use a number of agents with different goals and behavior patterns in real-time scenarios (see, e.g., Oztemel & Kocabas, 1996; Laird, et al., 1995; Tambe, et al., 1995). DIS is mainly concerned with the time- and space-coherent synthetic representation of real-world environments and of the interactions of operational entities in them.
The synthetic environment is created through the real-time exchange of data units between distributed and computationally autonomous simulation applications, in the form of simulations, simulators and instrumented equipment interconnected through standard computer communication services. The computational entities can be in one location or distributed geographically. A DIS system has the following characteristics:
- No central computer is used for event scheduling or conflict resolution.
- Autonomous simulation stations are responsible for maintaining the state of one or more simulation elements.
- There is a standard protocol for communicating ground-truth data.
- Receiving stations are responsible for determining what is to be perceived.
- Simulation stations communicate only changes in their state.
- "Dead-reckoning" algorithms are used to reduce overloads in processing communication data.
An intelligent agent consists mainly of three components: perception, cognition and action. Memory, reasoning, learning, understanding, planning, scheduling, and control are some of the basic characteristics of intelligent behavior. An agent equipped with these capabilities can receive information from its environment, organize its knowledge about the environment, evaluate situations, deduce conclusions, solve problems, and generate actions.
The cooperation of DIS agents depends on the kind of tasks and activities they are expected to do, and the environment in which they operate. There may be three different types of tasks: 1) Agents may perform problem solving in a common domain, 2) agents may be working together to improve their individual performance, and 3) agents may be working together to improve the performance of the overall system they are designed for.
In DIS systems the third type of cooperation is important, as it concerns the question of dependency between agents. If an agent needs to communicate with other agents, it has to know the underlying model of these agents. Additionally, there has to be a standard means of data communication accessible to every entity within the overall system. Some data communication problems are solved by "dead reckoning" algorithms. Such algorithms estimate future situations in the temporary absence of situational data, ensuring that the system is somewhat fault tolerant with respect to temporary communication failures.
In a complex environment, the knowledge used by an agent can be incomplete, and the goals of the agents might conflict (Jones, et al., 1994). If a single agent has conflicting goals, a set of heuristics or a classifier can be used to deal with the conflict. However, if different agents have conflicting goals, then a negotiator is needed to deal with the problem. The negotiator is an agent which defines the authority of information.
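The dead-reckoning idea described above can be illustrated with a minimal sketch. It assumes simple linear extrapolation in two dimensions; the `EntityState` fields, the function names and the 50 m threshold are hypothetical choices for illustration and are not taken from AISim, ITEMS or InterSIM.

```python
from dataclasses import dataclass


@dataclass
class EntityState:
    """Last state reported for a remote entity (hypothetical fields)."""
    t: float   # time of the report, seconds
    x: float   # position, metres
    y: float
    vx: float  # velocity, metres per second
    vy: float


def dead_reckon(last: EntityState, now: float) -> tuple[float, float]:
    """Receiver side: extrapolate the position linearly from the last report."""
    dt = now - last.t
    return (last.x + last.vx * dt, last.y + last.vy * dt)


def needs_update(true_pos, estimate, threshold_m=50.0) -> bool:
    """Sender side: report a new state only when the shared estimate has drifted too far."""
    dx = true_pos[0] - estimate[0]
    dy = true_pos[1] - estimate[1]
    return (dx * dx + dy * dy) ** 0.5 > threshold_m


last = EntityState(t=0.0, x=0.0, y=0.0, vx=100.0, vy=0.0)
estimate = dead_reckon(last, now=2.0)
print(estimate, needs_update((200.0, 60.0), estimate))
```

Because every station runs the same extrapolation, a sender can stay silent while its true state remains close to the estimate, which is how such algorithms reduce communication load and tolerate short gaps in the data.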
This paper describes the design of an intelligent agent, AISim, operating in a DIS environment. Our study focuses on the following problems in designing such agents:
- Rationality of agent behavior
- Agent cooperation and coordination
- Resolution of conflicts in agent goals and tasks
- Agent situation and behavior explanations
- Agent reusability.
The design history of AISim goes back to the design of its prototype RSIM (Kocabas, et al., 1995). RSIM was a simple model operating in a 2-d space, with capabilities of learning its rules of behavior and explaining its behavior. AISim is a much more developed version with the capabilities of detailed situation assessment, action management and behavior explanation. In the following sections, first a summary of the design history of AISim is provided. Then the system is described in terms of its hardware and software structure. Next, AISim's methods and capabilities are discussed in comparison with other related systems. Finally, the paper concludes with a summary of the results.
3. System Development
The following procedure is employed in the development of AISim:
- Domain analysis, to define the activities to be simulated in the application.
- Requirements analysis, to define the system's goals and functions.
- Global design analysis, to ensure that each specified goal is achieved by a set of functions.
- Detailed design, to guide the software engineers to code the system in accordance with the specified requirements.
- Software development, which is the actual code generation process.
- Testing, verification and integration into the DIS system.
Currently, our work has passed the prototype and design stages, and is now in the software development stage, in which AISim has been integrated with the underlying simulation system.
4. System Description
The basic hardware structure of the DIS system on which AISim runs is shown in Figure 1. AISim is networked to the simulation system in a DIS environment: AISim runs on the AI station and controls its agent(s) on a simulation station connected to the same DIS system. The simulation station runs the ITEMS* simulation system. Communication between the workstations is carried out by exchanging standard data units over the network, under InterSIM**, a DIS network software package.

   AI Station              Simulation station
  ------------               ------------
     AISim                    simulation
                                system
  ------------               ------------
   interface                  interface
  ------------               ------------
                <- pdus ->
  ----------------------------------------- ... DIS Network

Figure 1. Hardware structure of the DIS system on which AISim runs.
* ITEMS is the product of CAE Electronics.
** InterSIM is the product of TTS.
As to the software architecture of the system, we have selected a hierarchical approach for the design of AISim, in which the system has four levels of goals:
1) Mission goals
2) Task goals
3) Subtask goals
4) Activity and action goals.
DIS scenarios require the definition of mission goals such as air interception and tactical air support. AISim has been designed for two different mission goals: Combat Air Patrol (CAP), and Escort to bombers. When the system's mission goal is defined as CAP, it is divided into a set of task goals such as navigation, patrol, and BVR combat. These task goals are further divided into a set of subtask goals such as trajectory guidance, weapons management, and evasion. The subtask goals in turn, are divided into activities such as firing and guiding a missile, performing an evasion maneuver, and turning towards a target. Activities are also divided into a set of simple actions such as changing heading, speed and altitude.
AISim's control structure supports the goal hierarchy described above. The system has two modules: Situation Assessment (SA) and Action Management (AM). The SA module monitors the situational parameters 10 times a second on average: it first selects a set of situational parameters, then calculates the situation, and finally sends a reduced set of situational indicators in the form of signals to the AM module.
The AM module itself consists of a set of operators in a hierarchy. On the top of this hierarchy is the Task Control Operator (TCO), which controls a set of task operators by deciding which task operator is to be activated under the current situation. Once a task operator is activated, this in turn, fires subtask and activity operators and rules. In this way, AISim directs its agent in the scenario in accordance with its assessment of the current situation.
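As a rough illustration of how a situation assessment step might reduce raw parameters to a handful of indicators for action management, consider the sketch below. The parameter names, the range thresholds, the 60-degree cone and the heading convention (degrees counter-clockwise from the x axis) are all assumptions made for the example; the paper does not specify AISim's actual parameters or signals.

```python
import math


def assess_situation(own, target, bvr_range=40.0, wvr_range=10.0):
    """Reduce raw own-ship and target data to a small set of symbolic indicators."""
    dx = target["x"] - own["x"]
    dy = target["y"] - own["y"]
    distance = math.hypot(dx, dy)

    # Bearing of the target relative to own heading, wrapped into [-180, 180).
    bearing = math.degrees(math.atan2(dy, dx))
    off_nose = (bearing - own["heading"] + 180.0) % 360.0 - 180.0

    if distance <= wvr_range:
        band = "wvr"
    elif distance <= bvr_range:
        band = "bvr"
    else:
        band = "far"

    return {
        "range_band": band,
        "target_ahead": abs(off_nose) <= 60.0,
        "low_fuel": own["fuel"] < own["fuel_reserve"],
    }


own = {"x": 0.0, "y": 0.0, "heading": 0.0, "fuel": 500.0, "fuel_reserve": 1000.0}
print(assess_situation(own, {"x": 30.0, "y": 0.0}))
```

Passing only a reduced set of indicators, rather than the full simulation state, keeps the rules of the action management side small and easy to read.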
AISim's TCO has the following operators which can become active in a CAP mission: Takeoff, Navigate, Patrol, BVR Engage, WVR Engage, Disengage, Air-to-Air Refuelling, Return to Base, and Land. Each of these operators has a set of subtask operators, which in turn have a set of activity operators, and finally each activity operator has a set of action rules. The task control operator of AISim currently has 23 rules for selecting task operators for CAP missions. The total number of sub-operators in these task operators is 52, and these in turn have a small set of action rules and procedures. Figure 2 shows a section of AISim's mission, task, subtask and activity hierarchy. With this hierarchic control structure AISim supports the following intelligent agent characteristics: situation assessment, action management and explanation.
---------------------------------------------------------
Mission     CAP
            Escort

Task        CAP
              Navigate
              Patrol
              BVR Engage
              WVR Engage
              Disengage
              ...

Subtask     BVR Engage
              BVR Approach
              BVR Attack
              BVR Evade
              BVR Escape

Activity    BVR Attack
              Maintain Angle of Attack
              Check Missile Envelope
              Missile Launch
              ...

Action      Missile Launch
              Launch Missile
              Perform f-pole
              Guide Missile
              ...
---------------------------------------------------------
Figure 2. AISim's hierarchy of operators for mission tasks, subtasks, activities and actions.
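The portion of the hierarchy shown in Figure 2 can be written down as a nested data structure, which also suggests one way a real-time explanation could be derived from the chain of active operators. This is only a sketch: the nested-dictionary representation and the `path_to` helper are illustrative assumptions, not AISim's internal format, and empty dictionaries mark entries whose children the figure does not list.

```python
HIERARCHY = {
    "CAP": {                                  # mission
        "Navigate": {},
        "Patrol": {},
        "BVR Engage": {                       # task
            "BVR Approach": {},
            "BVR Attack": {                   # subtask
                "Maintain Angle of Attack": {},
                "Check Missile Envelope": {},
                "Missile Launch": {           # activity
                    "Launch Missile": {},     # actions
                    "Perform f-pole": {},
                    "Guide Missile": {},
                },
            },
            "BVR Evade": {},
            "BVR Escape": {},
        },
        "WVR Engage": {},
        "Disengage": {},
    },
}


def path_to(name, tree=HIERARCHY, trail=()):
    """Return the chain of operators from the mission down to `name`, or None."""
    for node, children in tree.items():
        here = trail + (node,)
        if node == name:
            return here
        found = path_to(name, children, here)
        if found:
            return found
    return None


print(" > ".join(path_to("Guide Missile")))
```

The printed chain reads from mission to action, so a behavior explanation at any moment can name the task, subtask and activity that the current action serves.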
5. Discussion
In this section we discuss AISim and compare it with other related systems in terms of:
- Domain tasks
- System architecture (knowledge organization)
- Intelligent agent features
  . Situation assessment (perception, cognition)
  . Action management (cognition, action)
  . Robustness
  . Timeliness
  . Flexibility (e.g., reusability)
  . Learning
  . Explanation
- Performance in mission scenarios.
AISim has been tested in controlling an F16 against ITEMS-controlled and man-controlled Mig 29s and F16s in various CAP scenarios. Tests on the system in escort scenarios are continuing. In CAP scenarios, the AISim agent (AIT) takes off, navigates to a patrol waypoint in a predefined desired engagement zone (DEZ), and performs patrol in an elliptical orbit oriented towards a given threat direction. When a threat approaches within a certain distance, AISim's TCO passes control to the BVR Engagement operator, and this in turn to the BVR Approach sub-operator, so that AIT leaves patrol and approaches its target at a certain angle. Within a certain range, the BVR Attack sub-operator takes control of AIT, guiding it into its own missile envelope while securing and maintaining radar lock until a certain range. This sub-operator is also responsible for launching and guiding BVR missiles. Meanwhile, if a radar lock comes from the opponent within a certain range, control passes to the BVR Evade sub-operator, which in turn guides AIT into evasive maneuvers. Chaff throws and radar jams can be taken care of automatically by the simulation system ITEMS. During BVR Attack or BVR Evade, if AIT has entered WVR engagement range, the TCO passes control to the WVR Engage operator, which directs AIT in WVR attack, evade and escape maneuvers.
At all times, TCO checks the fuel and missile stocks of AIT. When AIT runs out of BVR and/or WVR missiles, control passes to Disengage and RTB operators depending on the tactical situation. When the fuel level of AIT is below a predefined level, and the mission is still on, TCO passes control to Escape and/or AAR operators, and AIT is directed towards an AAR point where it refuels.
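The task switching described in the last two paragraphs can be pictured as a priority-ordered rule list in which the first matching rule wins. The sketch below is a loose illustration only: the five rules, their ordering and the indicator names are hypothetical stand-ins and do not reproduce the 23 task control rules of AISim's TCO.

```python
def select_task(s):
    """Pick a task operator from situation indicators; the first matching rule wins."""
    rules = [
        (lambda s: s["missiles"] == 0, "Disengage"),
        (lambda s: s["low_fuel"] and s["mission_on"], "Air-to-Air Refuelling"),
        (lambda s: s["low_fuel"], "Return to Base"),
        (lambda s: s["range_band"] == "wvr", "WVR Engage"),
        (lambda s: s["range_band"] == "bvr", "BVR Engage"),
    ]
    for condition, task in rules:
        if condition(s):
            return task
    return "Patrol"  # default when no threat or resource condition applies


situation = {"missiles": 4, "low_fuel": False, "mission_on": True, "range_band": "bvr"}
print(select_task(situation))
```

Keeping the rules in one ordered list means that changing the agent's doctrine, for example giving refuelling priority over disengagement, amounts to reordering or replacing entries rather than rewriting the operators themselves.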
The above is a brief description of AIT's behavior, in which many details are omitted because of the space limitations of this paper.
The knowledge organization and control structure of AISim is based on the hierarchic homuncular control (HH) architecture (Kocabas, 1991). Unlike the sequences of operators of Soar-IFOR, in this architecture AISim's operators are systematically divided into mission, task, subtask and activity operators, as shown in Figure 2. This architecture provides effective search control in real-time behavior. Accordingly, at any moment in its activity, the AISim agent can pass from one task (such as BVR Engage) to another task (such as Disengage).
The number of operators and rules in AISim is small compared to the variety of tasks and activities performed by its agents in a scenario. There are two reasons for this:
1) AISim's HH control architecture has proved to be effective in partitioning the control of agent activities.
2) Many of the low-level activities, such as navigation to a waypoint and radar lock, are carried out by the ITEMS simulation system.
Like Air-IFOR agents (Laird, et al., 1995), AISim agents are isolated from the details of the underlying simulation environment, such as missile and plane dynamics, and sensor simulation. However, unlike Air-IFOR agents, AISim controls its agents, which are created in a simulation station in the DIS environment, from a separate workstation connected to the same environment, using the data protocols of the DIS network software InterSIM. In other words, as opposed to Air-Soar systems, which run in direct communication with their simulation system ModSAF on the same workstation, AISim runs independently on a separate workstation. Its configuration is therefore more general in terms of data communication and control than that of Air-Soar.
As to the intelligent agent features of the system, AISim's SA module reads the set of data on the dynamic and static simulation elements, computes the parameters of the tactical situation from some of these data, and sends the relevant attribute-values to a message list to be read by the system's TCO. AISim reads about 60 different types of data (which are grouped in themselves), and sends about 15 types of data to the DIS network. The whole system's clock cycle is 20. AISim's action management operators, as described above, are capable of guiding its agent in different tasks and activities. The current version performs well in 1-v-1 engagements, and has a simple set of prime opponent selection rules to deal with more than one opponent at a time. However, unlike Air-IFOR agents, the system has not yet been developed for 1-v-2 and 2-v-2 air combat scenarios.
AISim tests show that the system is robust, in the sense that it shows reasonable performance in different scenarios in 1-v-1 and 1-v-2 engagements. The system has also passed the timeliness criterion in its current form.
AUTOMATED AGENTS THAT LEARN AND EXPLAIN THEIR OWN ACTIONS: A Progress Report
S. Kocabas*, E. Oztemel**, M. Uludag and Nazim Koc
Abstract
Computer generated agents need to be able to learn meaningful actions in various tactical situations and explain the reasons behind such actions. Different inductive methods have been tried by a few research groups in teaching actions to such agents in tactical air simulations. There have also been some attempts to enable intelligent agents to explain the reasons behind their own actions in the form of debriefing records. However, previous research has left the integration of learning and real-time explanation as an open issue. The use of inductive methods in teaching tactically meaningful actions makes it rather difficult to integrate learning and explanation. In our research, we have used deductive methods in teaching meaningful actions and their real-time explanations to an intelligent air target in 1-v-1 air combat. Our research aims at integrating artificial intelligence techniques in an international EUCLID project for building a distributed simulation system.
Keywords: machine learning, driving simulation, real-time explanation.
-------------------------
* Also affiliated with: Department of Space Sciences and Technology, ITU, Maslak, 80626 Istanbul, Turkey.
** Also affiliated with: Department of Industrial Engineering, SAU, Esentepe Kampusu, Adapazari, Turkey.
1. Introduction
Recent research on computer generated agents focuses on using artificial intelligence (AI) techniques in controlling such agents. Several research groups have studied the application of AI techniques in various aspects of air-to-air combat. These efforts include the application of neural networks for acquiring air combat decision-making skills (Crowe, M.X., 1990); automated agents for beyond visual range (BVR) tactical air simulation (Rosenbloom, et al., 1994); knowledge based decision aiding for BVR combat with multiple targets (Halski, et al., 1991); generating agent goals in an interactive environment (Jones, R.M., et al., 1994); and agents that explain their own actions (Johnson, W.L., 1994).
A large part of the current research relies on static knowledge based methods rather than machine learning techniques which enable the dynamic acquisition of the knowledge and skills of human behavior in tactical situations such as in air combat.
In our research we attempt to implement explanation based learning (EBL), a deductive machine learning technique, in teaching computer generated agents to perform intelligent behavior in BVR and close combat. This study is carried out as part of a joint EUCLID project RTP 11.3 which aims at building a distributed simulation system capable of integrating C3I functions and AI techniques. The project uses ITEMS as the simulation environment.
Explanation Based Learning has been one of the extensively investigated machine learning methods in artificial intelligence (see, e.g., Mitchell, et al., 1986). Different versions of EBL have been applied to a variety of tasks, such as learning concepts, control rules, and planning and scheduling, but the majority of these applications are in small domains.
2. The Task Domain
The aim of our research is to develop techniques to create AI targets (AIT) capable of performing intelligent behavior in tactical air combat. The tactical behavior includes BVR and close combat, in a BARCAP (barrier combat air patrol) scenario for an F16 plane. The task is the intelligent control of the AIT from an AI station connected to the main simulation system via Ethernet (see Figure 1).

  ------------------                ------------------
                       Ethernet         Simulation
      AI Station      ++++++++++          System
                                         (ITEMS)
  ------------------                ------------------
                       Network

Figure 1. The hardware structure for the intelligent control of scenario elements.
The ITEMS simulation system is capable of supporting a large number of independent agents, called scenario elements or "targets", in a real-time 3-D environment representing geographical, atmospheric and terrain data. In a scenario, the scenario elements can be controlled by human operators or control programs. The ITEMS system itself has rule based facilities for developing control systems for creating automated agents.
The acquisition of knowledge and skills for complex real-time behavior, as in tactical air combat, is a difficult task. Handcoding of rules for such behavior is rather tedious, as it is difficult to foresee all possible interactions. Therefore, machine learning methods need to be used for the acquisition of such knowledge and skills. Some inductive methods have been used in acquiring the rules of intelligent behavior, e.g., from flight data obtained from exercises (see, e.g., Crowe, 1990; Sammut, et al., 1992). However, inductive methods require a large number of training examples in order to support reasonably acceptable behavior. Additionally, it is difficult, if not impossible, to use inductive methods to integrate capabilities for the intelligent agent to explain its own behavior in every tactical situation. Behavioral explanations for intelligent agents have been studied by Johnson (1994) using SOAR, but the explanations provided by Johnson's Debrief system are post-flight explanations, rather than real-time explanations.
We have been developing an integrated system called RSIM, capable of controlling an F16 in the ITEMS simulation environment in an intelligent and human-like way. The RSIM system is capable of learning tactical behavior at training sessions, and of producing and explaining its agent's behavior in real time during the execution of a mission. The program consists of two subsystems: the Cognition-Action subsystem, and the Learning and Explanation subsystem (see Figure 2).
RSIM has been tested on a 2-dimensional simulation system for a BARCAP mission in 1-v-1 tactical situations with successful results. The program learns to patrol a region around a waypoint in the Forward Battle Area (FBA), and to engage a hostile target as soon as the situational conditions are satisfied. RSIM also learns the explanation of its target's behavior in each tactical situation during training exercises, and produces the same explanations in similar situations during scenario executions. The program is now being tested on SG Flight Simulator, and will be adapted to ITEMS as soon as the latter is installed.
[Figure 2: block diagram. RSIM begins by setting the initial conditions in the Simulation System and exchanges data with it until the end of the scenario. The Cognition-Action subsystem contains Situation Assessment (x,y coordinates, headings, angle, distance, time, missile range, missile count, fuel) and Action Management (missile control/fire, select maneuver, explain action). The Learning and Explanation subsystem contains the steps: read situation, generalize situation, call expert for action, call expert for explanation, formulate rule.]
Figure 2. Control structure of RSIM.
<>
References
Crowe, M.X. (1990). The application of artificial neural systems to the training of air combat decision-making skills. In Proceedings of the 12th ITSC., pp. 302-312.
Halski, D.J., Landy, J.R. & Kocher, J.A. (1991). Integrated control and avionics for air superiority: A knowledge-based decision-aiding system. AGARD CP-424, Madrid 1991, pp 53-1 to 53-10.
Johnson, W.L. (1994). Agents that explain their own actions. In Proceedings of the 4th Conference on Computer Generated Forces. May 1994, Orlando, Florida.
Jones, R.M., Laird, J.E., Tambe, M. & Rosenbloom, P.S. (1994). Generating goals in response to interacting goals.In proceedings of the 4th conference on Computer Generated Forces and Behavioral Representation.
Mitchell, T., Keller, R. M., and Kedar-Cabelli, S.T. (1986). Explanation- based generalization: A unifying view. Machine Learning 1 (1) 47-80.
Rosenbloom, P.S., Johnson, W.L., Jones, R.M., Koss, F., Laird, J.E., Lehman, J.F., Rubinoff, R., Schwamb, K.B. & Tambe, M. (1994).Intelligent automated agents for tactical air simulation: A progress report. In proceedings of the 4th conference on Computer Generated Forces and Behavioral Representation. pp. 69-78.
Sammut, C., Hurst, S., Kedzier, D., and Michie, D. (1992). Learning to fly. Machine Learning Workshop Proceedings, pp. 385-393, Morgan Kaufmann.
3. RSIM's Control Structure
In order to explain RSIM's operation, we describe the program in terms of its problem space, its subsystems, and its inputs and outputs. The Cognition-Action subsystem of RSIM divides into two operators, Situation Assessment and Action Management. Each subsystem and its operators are described below.
3.1. Cognition-Action Subsystem
The Cognition-Action subsystem of RSIM consists of two modules: Situation Assessment and Action Management. An intelligent agent operating in a real-time environment must be able to assess situations effectively in real time. RSIM's Cognition-Action subsystem performs situation assessment through its Situation Assessment operator.
3.1.1. Situation Assessment
The problem space of RSIM consists of two targets moving in a two-dimensional space. There are 12 state variables for these targets. The names of these variables and their types are as follows:
x,y coordinates (AIT/MCT) (integer)
Headings (AIT/MCT) (8 directions)
Distance between targets (real)
Positional angle (AIT -> MCT) (real)
Time (real-time)
Missile range (integer)
Missile fired (AIT/MCT) (integer)
The values of the state variables determine the problem situation at every instant. As the targets change their positions every 1/2 second, the problem situation changes accordingly. At every cycle, RSIM has to assess the situation and decide which action to take. Only some of these values are provided by the Simulation System: the x-y coordinates of both targets, their headings, real time, missile range and fired missile count. The program's Situation Assessment operator reads the values for the x-y coordinates, and calculates the real distance and the positional angle between the two targets.
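The calculation performed by the Situation Assessment operator can be sketched as follows. This is a minimal illustration, not RSIM's actual code: the function name `assess` is hypothetical, and the positional angle is assumed here to be the direction of the line from AIT to MCT, measured counter-clockwise from the positive x-axis. The paper does not define the angle convention.

```python
import math

def assess(ait_xy, mct_xy):
    """Return (distance, angle_deg) from AIT to MCT.

    Assumption: the positional angle is the direction of the line
    AIT -> MCT, measured counter-clockwise from the positive x-axis
    and normalised to the interval [0, 360).
    """
    dx = mct_xy[0] - ait_xy[0]
    dy = mct_xy[1] - ait_xy[1]
    distance = math.hypot(dx, dy)                      # Euclidean distance
    angle = math.degrees(math.atan2(dy, dx)) % 360.0   # bearing in degrees
    return distance, angle

d, a = assess((0, 0), (30, 40))   # integer pixel coordinates in, reals out
```

The two derived values are real numbers even though the coordinates supplied by the simulation are integers, which matches the variable types listed above.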
Once the values for real distance and angle are calculated, these are classified into fuzzy values. The state variables and their values are sent to a message list by the Situation Assessment operator. This message list is read by the Action-Management operator.
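A possible shape for this classification step is sketched below. The bin widths, the number of classes, the label format and the dictionary used as a message list are all assumptions for illustration. Crisp equal-width bins stand in for whatever fuzzy classification RSIM actually applies, and the labels merely imitate the D6 and A5 style used in the example rule later in this section.

```python
def classify(value, width, prefix, n_classes):
    """Map a non-negative real value to a labelled range such as 'D6'.

    Assumption: ranges are equal-width bins numbered from 1; values
    beyond the last bin are clamped into it.
    """
    index = min(int(value // width) + 1, n_classes)
    return f"{prefix}{index}"

def build_message_list(distance, angle, heading_ait, heading_mct):
    # Hypothetical bin sizes: 50-pixel distance bins, 45-degree angle bins.
    return {
        "distance": classify(distance, 50.0, "D", 8),
        "angle": classify(angle % 360.0, 45.0, "A", 8),
        "heading_ait": heading_ait,
        "heading_mct": heading_mct,
    }

messages = build_message_list(275.0, 200.0, "E", "W")
```

With these assumed bins, a distance of 275 pixels and an angle of 200 degrees are reported as D6 and A5.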
3.1.2. Action Management
The Action-Management operator has three functions: 1) Select-Maneuver, 2) Missile-Control, and 3) Explain-Behavior. The Select-Maneuver function decides which action is to be taken for the AIT, by reading the message list and matching the operational variables in the message list against the action rule set. The rule that matches the current situation is selected as the action rule to be in effect.
Here, each action rule points to a simple maneuver, where each maneuver consists of a four-pixel motion. There are five such simple maneuvers: go straight (ss), soft turn right (sr), hard turn right (hr), soft turn left (sl), and hard turn left (hl) (see Figure 3). Each maneuver thus lasts two seconds (4 pixels at 1/2 second each).
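The matching performed by Select-Maneuver might look like the sketch below. The `ActionRule` class, its field names, and the first-match policy are assumptions. The paper states only that the rule matching the current situation is selected.

```python
from dataclasses import dataclass

@dataclass
class ActionRule:
    conditions: dict   # variable name -> required (classified) value
    maneuver: str      # "ss", "sr", "hr", "sl" or "hl"
    explanation: str   # text shown while the maneuver executes

def select_maneuver(messages, rules):
    """Return the first rule whose every condition matches, else None."""
    for rule in rules:
        if all(messages.get(name) == value
               for name, value in rule.conditions.items()):
            return rule
    return None

rules = [
    ActionRule({"distance": "D6", "angle": "A5",
                "heading_ait": "E", "heading_mct": "W"},
               "ss", "Target detected. Approach target."),
]
messages = {"distance": "D6", "angle": "A5",
            "heading_ait": "E", "heading_mct": "W", "time": 12.5}
chosen = select_maneuver(messages, rules)
```

Variables that a rule does not mention, such as the time in this example, are ignored by the match. A return value of `None` corresponds to the case where no rule covers the situation.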
[Figure 3: diagram of the five maneuver paths fanning out from the current position of the target, labelled hl, sl, ss, sr and hr.]
Figure 3. Five simple maneuvers for RSIM targets.
Although the selected maneuvers last two seconds, situation assessments continue to be carried out at every 1/2-second cycle, and the message list is read by the Action-Management operator at every cycle. In this way, when the AIT enters the missile fire zone during a simple maneuver, the Missile-Control function fires a missile, provided a missile is available.
The Action-Management operator can explain the reasons for the selection of a particular maneuver by sending a message, and this message appears in a screen window during the execution of that maneuver. In this way, the behavior of the AIT is explained for every simple maneuver in a continuous sequence of maneuvers.
All of the messages of the Action-Management operator, including the explanations, are sent to the Simulation System. The maneuver messages are translated into single-step actions by the Simulation System. For example, a message that says to apply the go-straight (ss) maneuver is performed by moving the target four pixels in the target heading, keeping the heading constant.
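The interleaving of a two-second maneuver with half-second checks can be illustrated as follows. This sketch assumes that the missile fire zone is defined by distance alone and that one missile is fired in each cycle spent inside the zone. Neither detail is specified in the paper, and the function name is hypothetical.

```python
def run_maneuver(distances, missile_range, missiles_left):
    """Walk through the half-second cycles of one simple maneuver.

    distances: AIT-to-MCT distance observed at each cycle.
    Returns (events, missiles_left), where events is a list of
    (cycle, "fire" or "continue") pairs.
    """
    events = []
    for cycle, distance in enumerate(distances):
        # Situation assessment happens at every cycle, not per maneuver.
        if distance <= missile_range and missiles_left > 0:
            missiles_left -= 1
            events.append((cycle, "fire"))
        else:
            events.append((cycle, "continue"))
    return events, missiles_left

events, remaining = run_maneuver([120.0, 105.0, 95.0, 90.0],
                                 missile_range=100.0, missiles_left=1)
```

In this run the target comes within range in the third cycle of the maneuver, so the single available missile is fired there rather than at the end of the maneuver.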
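The translation of a go-straight message into single-step actions could be sketched as below. The direction table and the convention that y increases northward are assumptions. Only the ss maneuver is shown, because the paper does not give the geometry of the soft and hard turns.

```python
# Unit steps for the eight headings (assumed: x grows east, y grows north).
STEP = {"N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
        "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1)}

def go_straight(position, heading, pixels=4):
    """Return the successive positions, one per half-second cycle."""
    x, y = position
    dx, dy = STEP[heading]
    path = []
    for _ in range(pixels):
        x, y = x + dx, y + dy      # heading stays constant throughout
        path.append((x, y))
    return path

path = go_straight((10, 10), "E")
```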
3.2. The Learning and Explanation Subsystem
RSIM has a learning subsystem which learns action rules for the AIT by an explanation-based generalization (EBG) mechanism. Action rules and explanations are learned incrementally during training sessions. Action rules are if-then rules that match situations with simple maneuvers. At each problem state, the operational variables in the message list, which is periodically updated by the Situation Assessment operator, are taken as the current situation. If no rule exists to match the current situation, the Learning subsystem asks the trainer which maneuver to select. The Learning subsystem then generalizes the current situation, and records it as the conjunctive condition part of a rule whose action part proposes the selected maneuver. The generalization consists of mapping the values of the operational situation variables from real values to predetermined ranges. In this way, the distance and angle between the two targets are mapped into a particular distance range and angle range.
The trainer also gives an explanation as to why that particular maneuver was selected. This explanation is associated with the rule generated for the current situation as the reason for the selection of the rule. An example rule is shown in Figure 4.
-----------------------------------------------
Conditions:  Distance is D6, and
             Angle is A5, and
             Heading(AIT) is E, and
             Heading(MCT) is W.
Action:      Apply maneuver SS.
Explanation: Target detected. Approach target.
-----------------------------------------------
Figure 4. Example of a rule generated by RSIM.
The rule in this figure says that when the distance between AIT and MCT is within the range D6, the angle is within the range A5, the heading of AIT is east, and the heading of MCT is west, then AIT continues to go straight. The reason for this particular maneuver in the current situation is that the target MCT has been detected, and the intention is to approach the target.
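The training cycle described in this section can be sketched as follows. The sketch shows only the bookkeeping (generalize the situation, reuse a matching rule, otherwise ask the trainer and record a new rule with its explanation). It is not a full implementation of explanation-based generalization. The function names, the dictionary layout and the bin widths are assumptions, and the `trainer` function stands in for the human trainer.

```python
def generalize(situation):
    """Replace real-valued distance and angle by range labels.

    Assumption: 50-pixel distance bins and 45-degree angle bins,
    numbered from 1; RSIM's real ranges are not given in the text.
    """
    return {
        "distance": "D%d" % (int(situation["distance"] // 50) + 1),
        "angle": "A%d" % (int(situation["angle"] // 45) + 1),
        "heading_ait": situation["heading_ait"],
        "heading_mct": situation["heading_mct"],
    }

def act_or_learn(situation, rules, ask_trainer):
    """Reuse a matching rule, or ask the trainer and record a new one."""
    conditions = generalize(situation)
    for rule in rules:
        if rule["conditions"] == conditions:
            return rule
    maneuver, explanation = ask_trainer(situation)
    rule = {"conditions": conditions, "action": maneuver,
            "explanation": explanation}
    rules.append(rule)          # usable immediately, held in memory
    return rule

calls = []
def trainer(situation):         # stands in for the human trainer
    calls.append(situation)
    return "ss", "Target detected. Approach target."

rules = []
first = act_or_learn({"distance": 275.0, "angle": 200.0,
                      "heading_ait": "E", "heading_mct": "W"}, rules, trainer)
second = act_or_learn({"distance": 290.0, "angle": 210.0,
                       "heading_ait": "E", "heading_mct": "W"}, rules, trainer)
```

Both situations fall into the same assumed ranges (D6, A5), so the trainer is consulted only for the first one and the second reuses the recorded rule together with its explanation.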
RSIM can apply the rules that it has generated as soon as a matching situation arises. In other words, the program generates and uses its rules dynamically, rather than first storing them in a rule database. Once the scenario ends (e.g. when a target is shot), the learned rules can be transferred from dynamic memory to a rule file for future use.
References
Crowe, M.X. (1990). The application of artificial neural systems to the training of air combat decision-making skills. In Proceedings of the 12th ITSC, pp. 302-312.
Halski, D.J., Landy, J.R. & Kocher, J.A. (1991). Integrated control and avionics for air superiority: A knowledge-based decision-aiding system. AGARD CP-424, Madrid 1991, pp. 53-1 to 53-10.
Johnson, W.L. (1994). Agents that explain their own actions. In Proceedings of the 4th Conference on Computer Generated Forces. May 1994, Orlando, Florida.
Jones, R.M., Laird, J.E., Tambe, M. & Rosenbloom, P.S. (1994). Generating goals in response to interacting goals. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation.
Mitchell, T., Keller, R.M. & Kedar-Cabelli, S.T. (1986). Explanation-based generalization: A unifying view. Machine Learning 1 (1) 47-80.
Rosenbloom, P.S., Johnson, W.L., Jones, R.M., Koss, F., Laird, J.E., Lehman, J.F., Rubinoff, R., Schwamb, K.B. & Tambe, M. (1994). Intelligent automated agents for tactical air simulation: A progress report. In Proceedings of the 4th Conference on Computer Generated Forces and Behavioral Representation, pp. 69-78.
Sammut, C., Hurst, S., Kedzier, D. & Michie, D. (1992). Learning to fly. Machine Learning Workshop Proceedings, pp. 385-393, Morgan Kaufmann.
S. Kocabas*, E. Oztemel**, M. Uludag and Nazim Koc
Abstract
Computer generated agents need to be able to learn meaningful actions in various tactical situations and to explain the reasons behind such actions. Different inductive methods have been tried by a few research groups in teaching actions to such agents in tactical air simulations. There have also been some attempts to enable intelligent agents to explain the reasons behind their own actions in the form of debriefing records. However, previous research has left the integration of learning and real-time explanation as an open issue. The use of inductive methods in teaching tactically meaningful actions makes it rather difficult to integrate learning and explanation. In our research, we have used deductive methods in teaching meaningful actions and their real-time explanations to an intelligent air target in 1-v-1 air combat. Our research aims at integrating artificial intelligence techniques in an international EUCLID project for building a distributed simulation system.
Keywords: machine learning, driving simulation, real-time explanation.
-------------------------
* Also affiliated with: Department of Space Sciences and Technology, ITU, Maslak, 80626 Istanbul, Turkey.
** Also affiliated with: Department of Industrial Engineering, SAU, Esentepe Kampusu, Adapazari, Turkey.
1. Introduction
Recent research on computer generated agents focuses on using artificial intelligence (AI) techniques in controlling such agents. Several research groups have studied the application of AI techniques in various aspects of air-to-air combat. These efforts include the application of neural networks for acquiring air combat decision-making skills (Crowe, 1990); automated agents for beyond visual range (BVR) tactical air simulation (Rosenbloom, et al., 1994); knowledge-based decision aiding for BVR combat with multiple targets (Halski, et al., 1991); generating agent goals in an interactive environment (Jones, et al., 1994); and agents that explain their own actions (Johnson, 1994).
A large part of the current research relies on static knowledge-based methods rather than on machine learning techniques, which enable the dynamic acquisition of the knowledge and skills of human behavior in tactical situations such as air combat.
In our research we attempt to implement explanation based learning (EBL), a deductive machine learning technique, in teaching computer generated agents to perform intelligent behavior in BVR and close combat. This study is carried out as part of a joint EUCLID project RTP 11.3 which aims at building a distributed simulation system capable of integrating C3I functions and AI techniques. The project uses ITEMS as the simulation environment.
Explanation based learning has been one of the most extensively investigated machine learning methods in artificial intelligence (see, e.g., Mitchell, et al., 1986). Different versions of EBL have been applied to a variety of tasks, such as learning concepts, control rules, and planning and scheduling, but the majority of these applications are in small domains.
AI and Scientific Creativity
Sakir Kocabas
Abstract: This article examines the elements of scientific creativity through a series of basic cognitive and computational concepts. Scientific creativity requires motivation, access to a body of systematic knowledge, and an ability to correctly formulate research problems and to define a comprehensive problem space. It also requires an ability to reduce the corresponding search space by using methodological knowledge, and the rigour to conduct search in the constrained search space. The article discusses the types and the role of knowledge involved in scientific research, types of scientific creativity, and the dimensions of scientific research.
Introduction
Scientific discovery and creativity have recently become a special concern of artificial intelligence. Within the last five years, a number of research papers and two important books have appeared on scientific discovery (see Langley, Simon, Bradshaw, & Zytkow, 1987; Shrager & Langley, 1990). Two other books closely related to the subject have also appeared: one on the computational philosophy of science (Thagard, 1988), and another on creativity (Boden, 1990).
Langley et al.'s (1987) work posed the first serious challenge to the conventional study of science by proposing that, far from being mysterious and unexplainable, scientific discovery (and by implication scientific creativity) can be explained as a series of processes. Their work also described several computational models in support of the authors' view. Shrager and Langley's (1990) later study introduced new methods for the study of scientific development, and explained how the methods of the computational study of science were superior to those of conventional philosophy of science. Boden's (1990) work, on the other hand, extended some of these views and discussed, from a cognitive scientist's perspective, how creativity in arts and literature, as well as in science, could be studied more systematically within a computational context.
However, previous work leaves some of the important issues in discovery untouched, such as the elements of scientific creativity, the types of scientific discovery and creativity, and the dimensions of scientific research. In this article, we examine the basic cognitive concepts of creativity, and describe how these concepts are connected, and then discuss the role of background knowledge and the kinds of knowledge necessary for scientific research. Finally, we discuss the types of scientific discovery and the elements of scientific research.
Creativity in Science
Creativity and intelligence are closely linked concepts, so much so that the existence of one is the measure of the other. Therefore, any attempt that brings clarity to one concept will help to define the other. Lenat and Feigenbaum (1987) define intelligence in terms of "search", as the power to find a solution to a problem in a large search space. Later, Feigenbaum defined intelligence in terms of "knowledge assembly" rather than "search" (see Engelmore & Morgan, 1988, vii). According to this new definition, an intelligent system has the ability to assemble the necessary body of knowledge to conduct a complex task.
A distinction can be made between scientific creativity and other types of creativity such as artistic, architectural, musical and literary creativity. The former may involve the discovery of a new substance, the invention of a new mechanism or method, or the construction of a new model of reality (a hypothesis or a theory). The latter usually manifests itself as a work of art or a new style, and the term "creativity" is usually associated with this type. Therefore, when we talk about scientific creativity, this is to be understood within this perspective.
Even in historical terms, scientific creativity is distinguished from other forms of creativity such as in arts, music and literature, in its extensive reliance on background knowledge and experience. This may explain why we do not see child prodigies in creative science as we see in music and arts.
Scientific creativity can be investigated through five basic cognitive and computational concepts. These are:
1) Motivation for scientific research.
2) Ability to correctly formulate research problems within a body of knowledge.
3) Ability to create a comprehensive search space for the solution of a scientific problem.
4) Ability to assemble (or induce) and implement a set of heuristics to reduce the search space.
5) Patience and stamina for the exhaustive search for solving the scientific problem within the constrained search space.
Fig. 1 summarizes the links between these concepts. Any missing link between them can hinder scientific creativity.
Motivation for      Formulate       Generate      Reduce       Conduct
Scientific     -->  Research   -->  Search   -->  Search  -->  Exhaustive
Research            Problems        Space         Space        Search
Fig. 1. Problem formulation and search in scientific discovery
As indicated in the above list, research motivation tops the requirements for scientific creativity. There seems to be a significant relationship between concept structures (or ontology) and scientific motivation. The three millennia of European history of thought indicates this, in its ups and downs in scientific activity. The social aspects of science has been extensively studied both by sociologists and some philosophers of science. Various types of human motivation have also been studied by psychologists in the last five decades, but there seems to be little work on the relationships between ontology and scientific motivation.
In modern scientific research, access to a large and systematic body of knowledge is necessary for correctly formulating scientific problems, both in creating a comprehensive search space and in reducing that search space to reach a solution within acceptable limits of time and resources. The correct formulation of research problems requires a mastery of the conceptual structure of the field of science involved. The creative scientist can change this structure to reformulate a research problem, and in some cases, these changes may involve the most fundamental concepts and principles, such as time and measurability in physics.
Scientific creativity exhibits itself during the completion of a series of research tasks. Different types of knowledge are used for such tasks, as will be explained next.
The Role of Background Knowledge in Scientific Creativity
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. Knowledge necessary for modern scientific research can be divided into four types as
a) Commonsense Knowledge,
b) Technical Knowledge,
c) Theoretical Knowledge, and
d) Methodological Knowledge.
Commonsense knowledge is simple, general and relatively unstructured knowledge about the world. Statements such as 'Water extinguishes fire,' 'Fire burns paper' are examples of commonsense knowledge. Technical knowledge can be defined as the knowledge about instruments, methods and processes. Knowledge about how to repair a TV set, how to control a chemical reactor, and how to fly an aeroplane can be considered as technical knowledge. Theoretical background is helpful, but not always essential, in acquiring this kind of knowledge. Technical knowledge can be descriptive as well as prescriptive.
Theoretical knowledge is structured, descriptive knowledge about the world, embodying classifications and numerous interrelated hypotheses. Typical examples of theoretical knowledge are classical mechanics and electromagnetism.
Methodological knowledge, on the other hand, is exclusively prescriptive; it can be represented as condition-action rules. Methodological knowledge includes knowledge about how to distinguish between scientifically interesting and uninteresting phenomena, how to choose between alternative goals, strategies and methods during scientific research, how to design experiments, how to propose new hypotheses, and how to generalize, test and evaluate them. It is mostly the extent of this type of knowledge that makes the difference between a research scientist and a nonscientist.
Unlike the inference rules in theoretical knowledge, many of the methodological rules rely on extralogical methods such as inductive generalizations, abduction, abstraction and analogy. Such rules are frequently used in formulating problem states, in constraining large search spaces, and in hypothesis formation during the activity of scientific research.
Theoretical knowledge helps in formulating constraints for domain-dependent reduction of the search space in scientific research. The story of Edison's discovery of a durable light bulb filament illustrates the relationships between theoretical knowledge, motivation and search. Against a background of insufficient theoretical knowledge about materials at the time, Edison is said to have tried thousands of filaments made from different elements, metals and alloys. In this way, he compensated for the deficiency of theoretical knowledge with his endless motivation and meticulous search. Theoretical and empirical knowledge played a much more important role in the discovery of the high-temperature oxide superconductors in 1986 and 1987.
In some cases, one discovery facilitates other discoveries. This has been the case in the discovery of certain quantum properties in particle physics, and in the discovery of new oxide superconductors in high-temperature superconductivity research. In particle physics, the discovery of lepton, spin, and strangeness properties, after the discovery of baryon number, would require less cognitive effort in terms of the abstractions and abductions applied in the process. This is because, by the discovery of the baryon property, the abstraction from electrical charge to a class of such quantum properties, and the abduction from observed and unobservable particle reactions, had already been successfully made (see Kocabas, 1991).
Similarly, in oxide superconductivity, after the discovery of the La-Ba-Cu-O superconductor by Bednorz and Mueller in 1986, physicists extended the application of the ideas and methods that were developed, and discovered other oxides with higher transition temperatures.
Some discoveries rely more heavily on analogical reasoning than on abstraction and abduction. For example, the so far unsuccessful attempts by physicists at 'cold fusion' rely on an analogy between the extreme pressures obtainable in a plasma and those within the crystal structure of a metal electrode. Another interesting analogy for research in this field could be a 'nuclear catalyst', induced from the well-known concept of a 'chemical catalyst'. In chemical kinetics, chemical catalysts can initiate certain chemical reactions otherwise unrealizable under the same temperature and pressure conditions, due to constraints explained by activation energy levels. Could one find a 'nuclear catalyst' with a similar effect for nuclear fusion?
Types of Scientific Discovery and Creativity
Scientific creativity can be examined in relation to the scope of the research in which a discovery takes place. Kocabas (1992c) introduces a classification of scientific discovery as follows: 1) Logico-Mathematical Discovery, 2) Formal Discovery, 3) Theoretical Discovery, and 4) Empirical Discovery. This classification is based on the categorization of descriptive knowledge by Kocabas (1992a), and reflects the types of knowledge used in scientific research, and the type of knowledge discovered. All four types of discovery have been studied by computational models in AI.
According to this classification, logico-mathematical discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. Some of the earliest AI systems, such as Logic Theorist, were logico-mathematical discovery models designed to prove theorems in logic. Among the more recent computational models, AM (Lenat, 1979) appears as a successful example of mathematical discovery. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it does not require experimentation or observation. Nor does it need the knowledge of a physical domain per se, except for analogical transference in some cases.
Formal discovery takes place in a formal domain involving abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge, for deductive inference on formal knowledge. Lenat's (1983) EURISKO, in its applications to Naval Fleet Design, Evolution, and 3-D circuit design, is a good example of a formal discovery system.
Theoretical discovery requires logico-mathematical, formal and theoretical knowledge, and in general, results from theoretical analysis and synthesis. Some computational models of theoretical discovery are systems such as PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), and GALILEO (Zytkow, 1990). The first two could better be characterized as concept discovery systems, and as such, are closer to formal discovery models. GALILEO, on the other hand, is an interesting example of discovery by theoretical analysis. In the history of science there are rather important theoretical discoveries or inventions such as Maxwell's equations and the Einstein-Lorentz transformations.
Empirical discovery is an extensively studied area, and a number of computational models have been designed to investigate its various aspects. Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge has not been a prerequisite in the early empirical discoveries in the history of science (e.g. in the 17th and 18th century chemistry), but in modern empirical research such as in oxide superconductivity and 'cold fusion' experiments, extensive theoretical domain knowledge is necessary.
Empirical discovery systems can be divided into two main classes, qualitative and quantitative models, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley, et al., 1987), STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986), BR-3 (Kocabas, 1991), KEKADA (Kulkarni & Simon, 1988), AbE (O'Rorke, Morris & Schulenburg, 1990), and COAST (Rajamoney, 1990) can be cited. Some of the rediscoveries of these systems can be identified as formal discovery, such as GLAUBER's classification of substances as 'acid', 'alkali' and 'salt'.
Among the quantitative discovery models, BACON (Langley, et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful model of quantitative discovery, and it has also attracted the interest of philosophers of science (*). The IDS system, on the other hand, integrates qualitative and quantitative methods.
--------------------------------------------
* See, e.g., the special issue (Vol 19, No 4) of Social Studies of Science.
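To give a flavour of quantitative discovery, the following sketch searches for a combination of two measured variables that stays nearly constant across observations. It is a brute-force simplification in the spirit of systems such as BACON and does not reproduce the actual heuristics of any of the systems cited above. The function name, the exponent range and the tolerance are assumptions made for illustration. The data are approximate orbital distances (in astronomical units) and periods (in years) of five planets.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Approximate mean distances from the Sun (AU) and orbital periods (years)
# for Mercury, Venus, Earth, Mars and Jupiter.
DISTANCES_AU = [0.387, 0.723, 1.000, 1.524, 5.203]
PERIODS_YEARS = [0.241, 0.615, 1.000, 1.881, 11.862]


def find_invariant(xs: Sequence[float],
                   ys: Sequence[float],
                   max_exponent: int = 3,
                   tolerance: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return the simplest exponents (p, q) for which x**p * y**q is nearly constant.

    Candidates are tried in order of increasing |p| + |q|. A term counts as
    nearly constant when (max - min) / |mean| is below `tolerance`.
    """
    candidates = [(p, q)
                  for p, q in product(range(1, max_exponent + 1),
                                      range(-max_exponent, max_exponent + 1))
                  if q != 0]
    candidates.sort(key=lambda pq: abs(pq[0]) + abs(pq[1]))
    for p, q in candidates:
        values = [x ** p * y ** q for x, y in zip(xs, ys)]
        mean = sum(values) / len(values)
        spread = (max(values) - min(values)) / abs(mean)
        if spread < tolerance:
            return p, q
    return None
```

On these data, `find_invariant(DISTANCES_AU, PERIODS_YEARS)` returns `(3, -2)`, meaning that the cube of the distance divided by the square of the period is nearly constant, which is Kepler's third law. The search succeeds here only because the space of candidate terms was kept very small.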
Dimensions of Scientific Research
Research in the computational study of science has revealed a number of important aspects of science that were overlooked by the conventional philosophical study. Shrager and Langley (1990) describe the basic differences between the computational and the conventional philosophical approaches to science as follows: The conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific thought, and emphasizes scientific discovery, including the activities of data evaluation, theory formation and experimentation.
The distinction can be extended even further. The computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, which have been the main concern of the conventional study of science, but also with a series of other related issues. Kocabas (1992b) names more than a dozen different major tasks involved in scientific research. These are: Formulating research goals, selecting research goals, defining research framework, gathering knowledge, organising knowledge, selecting research strategies, methods, tools and techniques, proposing experiments, designing experiments and selecting experiment materials, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, theory formation, theory revision, goal satisfaction control, and producing explanations.
Any of these research tasks may concern activities dealing with a variety of planning, classification and evaluation problems. To provide an idea about the diversity of the activities involved in these research tasks, we will give some of the results of our study on the research in oxide superconductivity (Kocabas, 1992b) in terms of five of the research tasks listed above. These are: Formulation of scientific research goals, choosing between formulated goals, proposing strategies, proposing experiments, and hypothesis formation.
Research goals can be divided into two general forms that may overlap: those that aim at explaining a phenomenon, and those that aim to study a phenomenon. Creative scientists seem to utilize several general rules for formulating their research goals: they focus their attention on problems and phenomena that have not been explained or are unexplainable within the current scientific framework. However, such problems must have some general and important implications to be worthy of investigation. For example, why the moon has more craters in one particular area than in others may not be regarded as an interesting problem. On the other hand, research into why some elementary particle reactions have never been observed would be important, because the results would interest not only quantum physics but also cosmology. Some scientific research problems may be strongly related to important technological needs. Energy conversion, storage, and transfer are still major technological problems that motivate scientific research into such areas as 'cold fusion', oxide superconductivity, and electrochemistry. However, interestingness in itself is not a sufficient criterion for a phenomenon to attract the attention of the creative scientist. The research goals that are formulated must be achievable.
It is not unusual for a scientist to formulate alternative research goals in relation to a certain phenomenon. In such cases, the selection of a research goal among alternatives is another research task. Scientists use several selection criteria in deciding which problem to focus on first. Interestingness, importance, the materials and technological tools needed, economic constraints, and achievability within a timescale are some of the metrics that affect the decision. As can be seen, some of these constraints conflict with one another, so that the scientist may have to do some classification before deciding which goal to select.
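One simple way to picture such a selection among conflicting criteria is a weighted ranking. The sketch below is only an illustration: the weighted-sum scheme, the criterion names, the scores and the weights are all assumptions and are not drawn from the study of oxide superconductivity research reported here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical criteria; each score lies in [0, 1] and higher is better.
CRITERIA = ("interestingness", "importance", "resources",
            "affordability", "achievability")


@dataclass
class ResearchGoal:
    name: str
    scores: Dict[str, float]


def rank_goals(goals: Sequence[ResearchGoal],
               weights: Dict[str, float]) -> List[ResearchGoal]:
    """Order goals by the weighted sum of their criterion scores, best first."""
    def total(goal: ResearchGoal) -> float:
        return sum(weights[c] * goal.scores[c] for c in CRITERIA)
    return sorted(goals, key=total, reverse=True)


# Two made-up goals: one exciting but costly, one modest but feasible.
AMBITIOUS = ResearchGoal("explain the phenomenon", {
    "interestingness": 0.9, "importance": 0.9, "resources": 0.2,
    "affordability": 0.2, "achievability": 0.3})
MODEST = ResearchGoal("study the phenomenon", {
    "interestingness": 0.6, "importance": 0.6, "resources": 0.8,
    "affordability": 0.8, "achievability": 0.9})

EQUAL_WEIGHTS = {c: 1.0 for c in CRITERIA}
CURIOSITY_WEIGHTS = {"interestingness": 5.0, "importance": 5.0,
                     "resources": 1.0, "affordability": 1.0,
                     "achievability": 1.0}
```

With `EQUAL_WEIGHTS` the modest goal is ranked first, while `CURIOSITY_WEIGHTS` puts the ambitious goal first. The same alternatives can thus lead to different choices depending on how the scientist trades interestingness against cost and achievability.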
Selecting research strategies is another important task for achieving a research goal. Strategy selection depends on the type of the research goal. If the goal is to explain a certain phenomenon, gathering knowledge by detailed literature survey and theoretical analysis may take precedence. On the other hand, if the goal is to study a phenomenon, then experimentation and observation need to be considered. If experimentation is selected, then the types of experiments need to be decided. For example, if the research goal is to study the possibility of improving a certain important physical property (e.g., electrical conductivity) of a substance, there may be a number of alternative strategies. The following are only a few of the strategy heuristics extracted from the research reports on oxide superconductivity in 1987.
If the goal is to improve a property P and a process S improves P, then propose experiments applying S.
If the goal is to improve a property P and another property Q is positively related with P, and a process S improves Q, then propose experiments to apply S.
If the goal is to improve a property P, and another property Q has a negative effect E on P, then propose experiments to reduce or eliminate E.
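The three strategy heuristics above can be written as condition-action rules over a small knowledge base. The following is a minimal sketch; the class, field and function names are hypothetical, and the symbols S1, S2, Q, R and E in the usage note are placeholders rather than facts about superconductors.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Knowledge:
    # (process S, property P): process S improves property P
    improves: Set[Tuple[str, str]] = field(default_factory=set)
    # (property Q, property P): Q is positively related with P
    positively_related: Set[Tuple[str, str]] = field(default_factory=set)
    # (property Q, effect E, property P): Q has a negative effect E on P
    negative_effects: Set[Tuple[str, str, str]] = field(default_factory=set)


def propose_experiments(goal_property: str, kb: Knowledge) -> List[str]:
    """Apply the three strategy heuristics to the goal 'improve goal_property'."""
    proposals: List[str] = []
    # Heuristic 1: a process S improves P directly.
    for process, prop in sorted(kb.improves):
        if prop == goal_property:
            proposals.append(f"apply {process}")
    # Heuristic 2: Q is positively related with P, and a process S improves Q.
    for related, prop in sorted(kb.positively_related):
        if prop == goal_property:
            for process, improved in sorted(kb.improves):
                if improved == related:
                    proposals.append(f"apply {process}")
    # Heuristic 3: Q has a negative effect E on P.
    for _hindering, effect, prop in sorted(kb.negative_effects):
        if prop == goal_property:
            proposals.append(f"reduce or eliminate {effect}")
    return proposals
```

For a knowledge base in which S1 improves P, S2 improves Q, Q is positively related with P, and R has a negative effect E on P, the call `propose_experiments("P", kb)` yields the proposals "apply S1", "apply S2" and "reduce or eliminate E", one from each heuristic.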
Once the experimentation strategy is selected, the scientist has to decide on the relevant processes and techniques for the current strategy. S/he also has to decide on the experiment materials, classify these materials against a set of parameters such as availability, likelihood of success, cost and relative hazards (e.g., radioactivity, flammability and corrosiveness), and select the best materials for the experiments.
Scientific experiments need to be designed and conducted according to certain procedures. Experimental parameters are defined, tests are made and parameter values are measured, and in this way relevant data are collected. The data are evaluated to check whether they reflect any violation of the experimental conditions. After data evaluation, hypotheses are formed.
Hypothesis formation is one of the most important tasks of scientific research. Although it has long been a primary concern of the conventional philosophy of science, it remains an aspect of scientific discovery that needs detailed investigation. In our study on oxide superconductivity research, we have identified over 40 hypothesis formation heuristics that were utilized by scientists working in this field. Some of these heuristics are as follows:
If the value of a property P changes with the value of another property Q, then hypothesize that P and Q are related.
If a process does not change a set of experimental parameters P1,...,Pm, but changes other such parameters Q1,...,Qn, then hypothesize that P1,...,Pm and Q1,...,Qn are independent.
If a process is expected to enhance a property P of a substance M, but the expected increase does not take place, then hypothesize that there is another property Q hindering the effect.
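These three heuristics can also be sketched as small functions that turn measurements into candidate hypotheses. The representation below (a `Hypothesis` record, numeric measurements and a change tolerance) is an assumption made for illustration and is not the form used in the study itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    kind: str               # "related", "independent" or "hindered_by_unknown"
    left: Tuple[str, ...]
    right: Tuple[str, ...]


def related_if_covarying(p_name: str, p_values: Sequence[float],
                         q_name: str, q_values: Sequence[float],
                         tol: float = 1e-9) -> List[Hypothesis]:
    """Heuristic 1: if P changes whenever Q changes, hypothesize P and Q are related."""
    q_changed_steps = [i for i in range(1, len(q_values))
                       if abs(q_values[i] - q_values[i - 1]) > tol]
    if q_changed_steps and all(abs(p_values[i] - p_values[i - 1]) > tol
                               for i in q_changed_steps):
        return [Hypothesis("related", (p_name,), (q_name,))]
    return []


def independent_if_unaffected(before: Dict[str, float], after: Dict[str, float],
                              tol: float = 1e-9) -> List[Hypothesis]:
    """Heuristic 2: parameters a process leaves unchanged are hypothesized to be
    independent of the parameters it changes."""
    unchanged = tuple(sorted(k for k in before if abs(after[k] - before[k]) <= tol))
    changed = tuple(sorted(k for k in before if abs(after[k] - before[k]) > tol))
    if unchanged and changed:
        return [Hypothesis("independent", unchanged, changed)]
    return []


def hindrance_if_no_gain(p_name: str, before: float, after: float,
                         tol: float = 1e-9) -> List[Hypothesis]:
    """Heuristic 3: if an expected increase in P does not occur, hypothesize an
    unknown hindering property Q."""
    if after - before <= tol:
        return [Hypothesis("hindered_by_unknown", (p_name,), ("Q",))]
    return []
```

Each function returns a list so that the outputs of several heuristics can be pooled and then passed on to testing and evaluation. Heuristics of this kind only propose hypotheses; they do not verify them.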
The majority of these heuristics are general, while some are domain specific. Two examples of the latter are as follows:
If a change in the crystal structure S1 sharply diminishes a property P (e.g. electrical or thermal conductivity), and the change is also accompanied by the disappearance of some substructure S2, then hypothesize that S1 plays an important role for P.
If two compounds M1 and M2 have very similar bonding and electronic structure over a wide range of temperature, then hypothesize that M1 and M2 have very similar conduction properties within the same range.
These are only some examples of hypothesis formation rules used in a rather specialized domain of physical science. We will not discuss the methods and rules used in hypothesis verification and theory revision here for reasons of space. However, by considering the rules and methods used in various fields of science, from physical to human sciences, and over a dozen research tasks in each of them, we can realize the dimensions of scientific research.
The diversity of interrelated research tasks is by itself sufficient to show that scientific discovery is not a logical procedure or a process in itself, but the product of a series of complex processes called scientific research. Scientific creativity may be required in any of the research activities in these processes. The history of physics has many examples of this. Although an extreme example, consider the design, construction and operation of the CERN particle accelerator, where research involves proposing and designing experiments, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation and verification, and theory revision.
The conventional philosophical approach ignores the multiplicity of the tasks and activities involved in scientific inquiry. We believe that a much more detailed and careful examination and analysis of science is needed than is envisaged by the conventional study of science. The computational approach provides both the necessary concepts and methods for such a study.
Conclusion
Scientific creativity needs to be investigated within its natural environment, namely within the processes of scientific research and discovery. Conventional philosophy of science, probably due to the limitations of its scope, has ignored a number of issues about science. Scientific creativity displays itself in the form of scientific discovery, which in turn is the product of a series of complex tasks called scientific research. Therefore, a comprehensive study of science and scientific discovery requires a sufficiently rich set of concepts for a detailed and systematic investigation. Recent developments in the computational study of science provide some of these concepts. Based on these concepts, we have introduced a more detailed definition of scientific creativity, classified scientific discovery and creativity, and examined the role of background knowledge in discovery within the wider dimensions of scientific research. A systematic investigation of scientific creativity cannot be conducted without considering the multiplicity of research tasks that have to be carried out by scientists during their activities.
References
Boden, M. 1990. The creative mind. Sphere Books, London.
Engelmore, R. and Morgan, T. 1988. Blackboard systems. Addison Wesley.
Kocabas, S. 1991. Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, 277-309.
Kocabas, S. 1992a. Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. 1992b. Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML92 Workshop on Machine Discovery, 63-70.
Kocabas, S. 1992c. Evaluation of discovery systems. Proceedings of the ML92 Workshop on Machine Discovery, 168-171.
Kulkarni, D. and Simon, H. 1988. The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. 1987. Scientific discovery: Computational explorations of the creative processes. MIT Press.
Lenat, D.B. 1979. On automated scientific theory formation: A case study using the AM program. In Hayes, J., Michie, D., and Mikulich, D.I., eds., Machine Intelligence, 9, 251-283, Halstead, New York.
Lenat, D.B. 1983. EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, 61-98.
Lenat, D.B. and Feigenbaum, E. 1987. On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 1173-1182.
Nordhausen, B. and Langley, P. 1987. Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. and Schulenburg, D. 1990. Theory formation by abstraction. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. 1990. A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. 1986. Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Shrager, J., and Langley, P. Eds. 1990. Computational approaches to scientific discovery. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Thagard, P. 1988. Computational philosophy of science. The MIT Press, Cambridge, MA.
Thagard, P. and Holyoak, K. 1985. Discovering the wave theory of sound: inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. and Nowak, G. 1990. The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. 1990. Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. 1986. A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
Sakir Kocabas
Abstract: This article examines the elements of scientific creativity through a series of basic cognitive and computational concepts. Scientific creativity requires motivation, access to a body of systematic knowledge, and an ability to correctly formulate research problems and to define a comprehensive problem space. It also requires an ability to reduce the corresponding search space by using methodological knowledge, and rigour to conduct search in the constrained search space. The article discusses the types and the role of knowledge involved in scientific research, types of scientific creativity, and the dimensions of scientific research.
Introduction
Scientific discovery and creativity have recently become one of the special concerns of artificial intelligence. Within the last five years, a number of research papers and two important books have appeared on scientific discovery (see Langley, Simon, Bradshaw, & Zytkow, 1987; Shrager & Langley, 1990). Closely related to the subject, two other books have appeared: one on the computational philosophy of science (Thagard, 1988), and another on creativity (Boden, 1990).
Langley et al.'s (1987) work posed the first serious challenge to the conventional study of science by proposing that, far from being mysterious and unexplainable, scientific discovery (and by implication scientific creativity) can be explained as a series of processes. Their work also described several computational models in support of the authors' view. Shrager and Langley's (1990) later study introduced new methods for the study of scientific development, and explained how the methods of the computational study of science were superior to those of conventional philosophy of science. Boden's (1990) work, on the other hand, extended some of these views and discussed, from a cognitive scientist's perspective, how creativity in arts and literature, as well as in science, could be studied within a computational context in a more systematic way.
However, previous work leaves some of the important issues in discovery untouched, such as the elements of scientific creativity, the types of scientific discovery and creativity, and the dimensions of scientific research. In this article, we examine the basic cognitive concepts of creativity, and describe how these concepts are connected, and then discuss the role of background knowledge and the kinds of knowledge necessary for scientific research. Finally, we discuss the types of scientific discovery and the elements of scientific research.
Creativity in Science
Creativity and intelligence are closely linked concepts, so much so that the existence of one is the measure of the other. Therefore, any attempt that brings clarity to one concept will help to define the other. Lenat and Feigenbaum (1987) define intelligence in terms of 'search', as the power to find a solution to a problem in a large search space. Later, Feigenbaum defined intelligence in terms of 'knowledge assembly' rather than 'search' (see Engelmore & Morgan, 1988, vii). According to this new definition, an intelligent system has the ability to assemble the necessary body of knowledge to conduct a complex task.
A distinction can be made between scientific creativity and other types of creativity such as artistic, architectural, musical and literary creativity. The former may involve the discovery of a new substance, the invention of a new mechanism or method, or the construction of a new model of reality (a hypothesis or a theory). The latter usually manifests itself as a work of art or a new style, and the term 'creativity" is usually associated with this type. Therefore, when we talk about scientific creativity, this is to be understood within this perspective.
Even in historical terms, scientific creativity is distinguished from other forms of creativity such as in arts, music and literature, in its extensive reliance on background knowledge and experience. This may explain why we do not see child prodigies in creative science as we see in music and arts.
Scientific creativity can be investigated through five basic cognitive and computational concepts. These are:
1) Motivation for scientific research.2) Ability to correctly formulate research problems within a body of knowledge.3) Ability to create a comprehensive search space for the solution of a scientific problem.4) Ability to assemble (or induce) and implement a set of heuristics to reduce the search space.5) Patience and stamina for the exhaustive search for solving the scientific problem within the constrained search space.
Fig. 1 summarizes the links between these concepts. Any missing link between them, can hinder scientific creativity.
Motivation for Formulate Generate Reduce Conduct
Scientific --> Research --> Search --> Search --> Exhaustive
Research Problems Space Space Search
Fig. 1. Problem formulation and search in scientific discovery
As indicated in the above list, research motivation tops the requirements for scientific creativity. There seems to be a significant relationship between concept structures (or ontology) and scientific motivation. The three millennia of European history of thought indicates this, in its ups and downs in scientific activity. The social aspects of science has been extensively studied both by sociologists and some philosophers of science. Various types of human motivation have also been studied by psychologists in the last five decades, but there seems to be little work on the relationships between ontology and scientific motivation.
In modern scientific research, an access to a large and systematic body of knowledge is necessary for correctly formulating scientific problems, both in creating a comprehensive search space, and in reducing the search space to reach for a solution within acceptable limits of time and resources. The correct formulation of research problems requires a mastery of the conceptual structure of the field of science involved. The creative scientist can change this structure for reformulating a research problem, and in some cases, these changes may involve the most fundamental concepts and principles, such as time and measurability in physics.
Scientific creativity exhibits itself during the completion of a series of research tasks. Different types of knowledge is used for such tasks, as will be explained next.
The Role of Background Knowledge in Scientific Creativivity
Modern scientific research is one of the most complex human activities, requiring the use of different types of general and specific knowledge. Knowledge necessary for modern scientific research can be divided into four types asa) Commonsense Knowledge,b) Technical Knowledge,c) Theoretical Knowledge, andd) Methodological Knowledge.
Commonsense knowledge is simple, general and relatively unstructured knowledge about the world. Statements such as 'Water extinguishes fire,' 'Fire burns paper' are examples of commonsense knowledge. Technical knowledge can be defined as the knowledge about instruments, methods and processes. Knowledge about how to repair a TV set, how to control a chemical reactor, and how to fly an aeroplane can be considered as technical knowledge. Theoretical background is helpful, but not always essential, in acquiring this kind of knowledge. Technical knowledge can be descriptive as well as prescriptive.
Theoretical knowledge is structured, descriptive knowledge about the world, embodying classifications and numerous interrelated hypotheses. Typical examples of theoretical knowledge are the classical mechanics and electro-magnetism.
Methodological knowledge, on the other hand, is exclusively prescriptive; it can be represented as condition-action rules. Methodological knowledge includes knowledge about how to distinguish between scientifically interesting and uninteresting phenomena, how to choose between alternative goals, strategies and methods during scientific research, how to design experiments, how to propose new hypotheses, and how to generalize, test and evaluate them. It is mostly the extent of this type of knowledge that makes the difference between a research scientist and a nonscientist.
Unlike the inference rules in theoretical knowledge, many of the methodological rules rely on extralogical methods such as inductive generalizations, abduction, abstraction and analogy. Such rules are frequently used in formulating problem states, in constraining large search spaces, and in hypothesis formation during the activity of scientific research.
Theoretical knowledge helps in formulating constraints for domain-dependent reduction of search space in scientific research. The story of Edison's discovery of a durable ligth bulb filament illustrates the relationships between theoretical knowledge, motivation and search. In the background of an insufficient theoretical knowledge available at the time about materials, Edison is said to have tried thousands of filaments made from different elements, metals and alloys. In this way, he compensated the deficiency of theoretical knowledge with his endless motivation and meticulous search. Theoretical and empirical knowledge played a much more important role in the discovery of the high-temperature oxide superconductors in 1986 and 1987.
In some cases, one discovery facilitates other discoveries. This has been the case in the discovery of certain quantum properties in particle physics, and in the discovery of new oxide superconductors in high-temperature superconductivity research. In particle physics, the discovery of the lepton, spin, and strangeness properties, after the discovery of baryon number, would require less cognitive effort in terms of the abstractions and abductions applied in the process. This is because, by the discovery of the baryon property, the abstraction from electrical charge to a class of such quantum properties, and the abduction from observed and unobservable particle reactions, had already been successfully made (see Kocabas, 1991).
Similarly, in oxide superconductivity, after the discovery of the La-Ba-Cu-O superconductor by Bednorz and Mueller in 1986, physicists extended the application of the ideas and methods that had been developed, and discovered other oxides with higher transition temperatures.
Some discoveries rely more heavily on analogical reasoning than on abstraction and abduction. For example, the so far unsuccessful attempts by physicists at 'cold fusion' rely on an analogy between the extreme pressures obtainable in a plasma and those within the crystal structure of a metal electrode. Another interesting analogy for research in this field could be a 'nuclear catalyst', induced from the well-known concept of a 'chemical catalyst'. In chemical kinetics, chemical catalysts can initiate certain chemical reactions otherwise unrealizable under the same temperature and pressure conditions, due to constraints explained by activation energy levels. Could one find a 'nuclear catalyst' with a similar effect for nuclear fusion?
Types of Scientific Discovery and Creativity
Scientific creativity can be examined in relation to the scope of the research in which a discovery takes place. Kocabas (1992c) introduces a classification of scientific discovery as follows: 1) Logico-Mathematical Discovery, 2) Formal Discovery, 3) Theoretical Discovery, and 4) Empirical Discovery. This classification is based on the categorization of descriptive knowledge by Kocabas (1992a), and reflects the types of knowledge used in scientific research, and the type of knowledge discovered. All four types of discovery have been studied by computational models in AI.
According to this classification, logico-mathematical discovery takes place, as the name suggests, in the abstract domain of logic and mathematics. Some of the earliest AI systems, such as Logic Theorist, were logico-mathematical discovery models designed to prove theorems in logic. Among the more recent computational models, AM (Lenat, 1979) appears as a successful example of mathematical discovery. The distinguishing characteristic of logico-mathematical discovery is that, in principle, it does not require experimentation or observation. Nor does it need the knowledge of a physical domain per se, except for analogical transference in some cases.
Formal discovery takes place in a formal domain involving abstract entities, their classes and properties. Formal discovery requires logico-mathematical knowledge as background knowledge, for deductive inference on formal knowledge. Lenat's (1983) EURISKO, in its applications to Naval Fleet Design, Evolution, and 3-D circuit design, is a good example of a formal discovery system.
Theoretical discovery requires logico-mathematical, formal and theoretical knowledge, and in general, results from theoretical analysis and synthesis. Some computational models of theoretical discovery are systems such as PI (Thagard & Holyoak, 1985), ECHO (Thagard & Nowak, 1990), and GALILEO (Zytkow, 1990). The first two could better be characterized as concept discovery systems, and as such, are closer to formal discovery models. GALILEO, on the other hand, is an interesting example of discovery by theoretical analysis. In the history of science there are rather important theoretical discoveries or inventions such as Maxwell's equations and the Einstein-Lorentz transformations.
Empirical discovery is an extensively studied area, and a number of computational models have been designed to investigate its various aspects. Empirical discovery requires experimental and observational data, as well as logico-mathematical and formal knowledge. Theoretical knowledge was not a prerequisite in the early empirical discoveries in the history of science (e.g. in 17th and 18th century chemistry), but in modern empirical research, such as in oxide superconductivity and 'cold fusion' experiments, extensive theoretical domain knowledge is necessary.
Empirical discovery systems can be divided into two main classes, qualitative and quantitative models, although this distinction is sometimes irrelevant. Among the qualitative discovery systems, GLAUBER (Langley, et al., 1987), STAHL (Zytkow & Simon, 1986), STAHLp (Rose & Langley, 1986), BR-3 (Kocabas, 1991a), KEKADA (Kulkarni & Simon, 1988), AbE (O'Rorke, Morris & Schulenburg, 1990), and COAST (Rajamoney, 1990) can be cited. Some of the rediscoveries made by these systems can be identified as formal discovery, such as GLAUBER's classification of substances as 'acid', 'alkali' and 'salt'.
Among the quantitative discovery models, BACON (Langley, et al., 1987), FAHRENHEIT (Zytkow, 1987) and IDS (Nordhausen & Langley, 1987) can be cited as prominent examples. BACON was the first successful model of quantitative discovery, and has also attracted the interest of philosophers of science(*). The IDS system, on the other hand, integrates qualitative and quantitative methods.
* See, e.g., the special issue (Vol 19, No 4) of Social Studies of Science.
Dimensions of Scientific Research
Research in the computational study of science has revealed a number of important aspects of science that were overlooked by the conventional philosophical study. Shrager and Langley (1990) describe the basic differences between the computational and the conventional philosophical approaches to science as follows: Conventional philosophical tradition focuses on the structure of scientific knowledge and emphasizes the evaluation of laws and theories, while the computational approach focuses on the processes of scientific thought, and emphasizes scientific discovery including the activities of data evaluation, theory formation and experimentation.
The distinction can be extended even further. The computational study of science is concerned not only with the issues of hypothesis formation, testing and verification, which have been the main concern of the conventional study of science, but also with a series of other related issues. Kocabas (1992b) names more than a dozen different major tasks involved in scientific research. These are: Formulating research goals, selecting research goals, defining research framework, gathering knowledge, organising knowledge, selecting research strategies, methods, tools and techniques, proposing experiments, designing experiments and selecting experiment materials, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation, theory formation, theory revision, goal satisfaction control, and producing explanations.
Any of these research tasks may concern activities dealing with a variety of planning, classification and evaluation problems. To provide an idea about the diversity of the activities involved in these research tasks, we will give some of the results of our study on the research in oxide superconductivity (Kocabas, 1992b) in terms of five of the research tasks listed above. These are: Formulation of scientific research goals, choosing between formulated goals, proposing strategies, proposing experiments, and hypothesis formation.
Research goals can be divided into two general forms that may overlap: Those that aim at explaining a phenomenon, and those that aim to study a phenomenon. Creative scientists seem to utilize several general rules for formulating their research goals: They focus their attention on problems and phenomena that have not been explained or are unexplainable within the current scientific framework. However, such problems must have some general and important implications to be worthy of investigation. For example, why the moon has more craters in one particular area than in others may not be regarded as an interesting problem. On the other hand, research into why some elementary particle reactions have never been observed would be important, because the results would interest not only quantum physics but also cosmology. Some scientific research problems may be strongly related to important technological needs. Energy conversion, storage, and transfer are still major technological problems that motivate scientific research into such areas as 'cold fusion', oxide superconductivity, and electrochemistry. However, interestingness in itself is not a sufficient criterion for a phenomenon to attract the attention of the creative scientist. The research goals that are formulated must be achievable.
It is not unusual for a scientist to formulate alternative research goals in relation to a certain phenomenon. In such cases, the selection of a research goal among alternatives is another research task. Scientists use several selection criteria in deciding which problem to primarily focus on. Interestingness, importance, the materials and technological tools needed, economic constraints, and achievability within a timescale are some of the metrics that affect the decision. As can be seen, some of these constraints conflict with one another, so that the scientist may have to do some classification before deciding which goal to select.
Selecting research strategies is another important task for achieving a research goal. Strategy selection depends on the type of the research goal. If the goal is to explain a certain phenomenon, gathering knowledge by detailed literature survey and theoretical analysis may take precedence. On the other hand, if the goal is to study a phenomenon, then experimentation and observation need to be considered. If experimentation is selected, then the types of experiments need to be decided. For example, if the research goal is to study the possibility of improving a certain important physical property (e.g., electrical conductivity) of a substance, there may be a number of alternative strategies. The following are only a few of the strategy heuristics extracted from the research reports on oxide superconductivity in 1987.
If the goal is to improve a property P and a process S improves P, then propose experiments applying S.
If the goal is to improve a property P and another property Q is positively related with P, and a process S improves Q, then propose experiments to apply S.
If the goal is to improve a property P, and another property Q has a negative effect E on P, then propose experiments to reduce or eliminate E.
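Since methodological knowledge can be represented as condition-action rules, the three strategy heuristics above may be sketched in code. The following Python fragment is only an illustration under stated assumptions: the Knowledge container, its field names, and the function propose_experiments are hypothetical and are not taken from any of the discovery systems cited in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Hypothetical store of what the researcher currently knows."""
    improves: set = field(default_factory=set)            # (process S, property) pairs
    positively_related: set = field(default_factory=set)  # (Q, P): Q is positively related with P
    negative_effects: set = field(default_factory=set)    # (Q, E, P): Q has negative effect E on P


def propose_experiments(goal_property, kb):
    """Apply the three strategy heuristics to the goal 'improve goal_property'."""
    proposals = []
    # Heuristic 1: a process S improves P, so propose applying S.
    for process, prop in sorted(kb.improves):
        if prop == goal_property:
            proposals.append(f"apply {process}")
    # Heuristic 2: Q is positively related with P and S improves Q, so propose applying S.
    for q, p in sorted(kb.positively_related):
        if p == goal_property:
            for process, prop in sorted(kb.improves):
                if prop == q:
                    proposals.append(f"apply {process}")
    # Heuristic 3: Q has a negative effect E on P, so propose reducing or eliminating E.
    for q, effect, p in sorted(kb.negative_effects):
        if p == goal_property:
            proposals.append(f"reduce {effect}")
    return proposals
```

With a knowledge base in which S1 improves P, S2 improves Q1, Q1 is positively related with P, and Q2 has a negative effect E on P, the call propose_experiments("P", kb) proposes applying S1, applying S2, and reducing E, in that order.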
Once the experimentation strategy is selected, the scientist has to decide about the relevant processes and techniques for the current strategy. S/he also has to decide about the experiment materials, and has to classify these materials against a set of parameters such as availability, likeliness to yield success, cost and relative hazards (e.g., radioactivity, flammability and corrosiveness), and select the best materials for the experiments.
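The classification of candidate materials against such parameters can be illustrated with a simple weighted score. This is only one possible sketch, and a weighted sum is an assumption of ours here, not a method reported in the superconductivity literature; the function name rank_materials, the attribute names, and the weights are all hypothetical.

```python
def rank_materials(materials, weights):
    """Rank materials from best to worst by a weighted sum of their ratings.

    materials: dict mapping material name -> dict of attribute ratings in [0, 1]
    weights:   dict mapping attribute name -> weight (negative for cost and hazard)
    """
    def score(attrs):
        return sum(weights[name] * attrs[name] for name in weights)

    return sorted(materials, key=lambda name: score(materials[name]), reverse=True)


# Hypothetical ratings for two unnamed candidate materials.
candidates = {
    "material_A": {"availability": 0.9, "success": 0.5, "cost": 0.2, "hazard": 0.1},
    "material_B": {"availability": 0.4, "success": 0.9, "cost": 0.8, "hazard": 0.6},
}
# Hypothetical weights: likeliness of success counts double, cost and hazard count against.
preferences = {"availability": 1.0, "success": 2.0, "cost": -1.0, "hazard": -1.0}
ranking = rank_materials(candidates, preferences)
```

Changing the weights changes the ranking, which reflects the point made above that the selection depends on how the scientist trades off conflicting parameters.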
Scientific experiments need to be designed and conducted according to certain procedures. Experimental parameters are defined, tests are made and parameter values are measured, and in this way relevant data are collected. The data are evaluated to check whether they reflect any violation of the experimental conditions. After data evaluation, hypotheses are formed.
Hypothesis formation is one of the most important tasks of scientific research. Despite the fact that it has been a primary concern of the conventional philosophy of science for a long time, it remains an aspect of scientific discovery that needs a detailed investigation. In our study on oxide superconductivity research, we have identified over 40 hypothesis formation heuristics that were utilized by scientists working in this field. Some of these heuristics are as follows:
If the value of a property P changes with the value of another property Q, then hypothesize that P and Q are related.
If a process does not change a set of experimental parameters P1,...,Pm, but changes other such parameters Q1,...,Qn, then hypothesize that P1,...,Pm and Q1,...,Qn are independent.
If a process is expected to enhance a property P of a substance M, but the expected increase does not take place, then hypothesize that there is another property Q hindering the effect.
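The three hypothesis formation heuristics above can likewise be sketched as simple tests on measured data. The Python fragment below is a minimal illustration only: the function names covaries and form_hypotheses and the data layout are hypothetical, and values are compared by exact equality, whereas a realistic system would have to allow for measurement error.

```python
def covaries(p_values, q_values):
    """Heuristic 1: P changes whenever Q changes across successive observations."""
    pairs = list(zip(p_values, q_values))
    steps = list(zip(pairs, pairs[1:]))
    # Keep the P values of every step in which Q changed.
    q_steps = [(p0, p1) for (p0, q0), (p1, q1) in steps if q0 != q1]
    return bool(q_steps) and all(p0 != p1 for p0, p1 in q_steps)


def form_hypotheses(before, after, expected_to_increase=()):
    """Apply heuristics 2 and 3 to parameter values measured before and after a process.

    before, after: dicts mapping parameter name -> measured value
    """
    changed = tuple(sorted(k for k in before if after[k] != before[k]))
    unchanged = tuple(sorted(k for k in before if after[k] == before[k]))
    hypotheses = []
    # Heuristic 2: parameters left unchanged are independent of those that changed.
    if changed and unchanged:
        hypotheses.append(("independent", unchanged, changed))
    # Heuristic 3: an expected increase did not occur, so some other property hinders it.
    for p in expected_to_increase:
        if after[p] <= before[p]:
            hypotheses.append(("hindered", p))
    return hypotheses
```

For example, if a process leaves P1 and P2 unchanged but changes Q1, and P1 was expected to increase, form_hypotheses returns one hypothesis that (P1, P2) and (Q1) are independent, and another that some property is hindering P1.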
The majority of these heuristics are general, while some are domain specific. Two examples are as follows:
If a change in the crystal structure S1 sharply diminishes a property P (e.g. electrical or thermal conductivity), and the change is also accompanied by the disappearance of some substructure S2, then hypothesize that S1 plays an important role for P.
If two compounds M1 and M2 have very similar bonding and electronic structure over a wide range of temperature, then hypothesize that M1 and M2 have very similar conduction properties within the same range.
These are only some examples of hypothesis formation rules used in a rather specialized domain of physical science. We will not discuss the methods and rules used in hypothesis verification and theory revision here for reasons of space. However, by considering the rules and methods used in various fields of science, from physical to human sciences, and over a dozen research tasks in each of them, we can realize the dimensions of scientific research.
The diversity of interrelated research tasks is by itself sufficient to show that scientific discovery is not a logical procedure or a process in itself, but the product of a series of complex processes called scientific research. Scientific creativity may be required in any of the research activities in these processes. The history of physics has many examples of this. Although an extreme example, consider the design, construction and operation of the CERN particle accelerator, where research involves proposing and designing experiments, setting expectations, conducting experiments, data collection, data evaluation, hypothesis formation and verification, and theory revision.
The conventional philosophical approach ignores the multiplicity of the tasks and activities involved in scientific inquiry. We believe that a much more detailed and careful examination and analysis of science is needed than is envisaged by the conventional study of science. The computational approach provides both the necessary concepts and methods for such a study.
Conclusion
Scientific creativity needs to be investigated within its natural environment, namely within the processes of scientific research and discovery. Conventional philosophy of science, probably due to the limitations of its scope, has ignored a number of issues about science. Scientific creativity displays itself in the form of scientific discovery, which in turn is the product of a series of complex tasks called scientific research. Therefore, a comprehensive study of science and scientific discovery requires a sufficiently rich set of concepts for a detailed and systematic investigation. Recent developments in the computational study of science provide some of these concepts. Based on these concepts, we have introduced a more detailed definition of scientific creativity, classified scientific discovery and creativity, and examined the role of background knowledge in discovery within the wider dimensions of scientific research. A systematic investigation of scientific creativity cannot be conducted without considering the multiplicity of research tasks that have to be carried out by scientists during their activities.
References
Boden, M. 1990. The creative mind. Sphere Books, London.
Engelmore, R. and Morgan, T. 1988. Blackboard systems. Addison Wesley.
Kocabas, S. 1991. Conflict resolution as discovery in particle physics. Machine Learning, Vol 6, No 3, 277-309.
Kocabas, S. 1992a. Functional categorization of knowledge. AAAI Spring Symposium Series, 25-27 March 1992, Stanford, CA.
Kocabas, S. 1992b. Elements of scientific research: Modeling discoveries in oxide superconductivity. Proceedings of the ML92 Workshop on Machine Discovery, 63-70.
Kocabas, S. 1992c. Evaluation of discovery systems. Proceedings of the ML92 Workshop on Machine Discovery, 168-171.
Kulkarni, D. and Simon, H. 1988. The processes of scientific discovery. Cognitive Science, 12, 139-175.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. 1987. Scientific discovery: Exploration of the creative processes. MIT Press.
Lenat, D.B. 1979. On automated scientific theory formation: A case study using the AM program. In Hayes, J., Michie, D., and Mikulich, D.I. eds., Machine Intelligence, 9, 251-283, Halstead, New York.
Lenat, D.B. 1983. EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21, 61-98.
Lenat, D.B. and Feigenbaum, E. 1987. On the thresholds of knowledge. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 1173-1182.
Nordhausen, B. and Langley, P. 1987. Towards an integrated discovery system. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, 198-200.
O'Rorke, P., Morris, S. and Schulenburg, D. 1990. Theory formation by abstraction. In Shrager, J., and Langley P. eds. Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rajamoney, S.A. 1990. A computational approach to theory revision. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Rose, D. and Langley, P. 1986. Chemical discovery as belief revision. Machine Learning, 1, 423-452.
Shrager, J., and Langley, P. Eds. 1990. Computational approaches to scientific discovery. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Thagard, P. 1988. Computational philosophy of science. The MIT Press, Cambridge, MA.
Thagard, P. and Holyoak, K. 1985. Discovering the wave theory of sound: inductive inference in the context of problem solving. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 610-612.
Thagard, P. and Nowak, G. 1990. The conceptual structure of the geological revolution. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. 1990. Deriving laws through analysis of processes and equations. In Shrager, J., and Langley P., eds., Computational models of scientific discovery and theory formation. Morgan Kaufmann, San Mateo, CA.
Zytkow, J.M. and Simon, H. 1986. A theory of historical discovery: The construction of componential models. Machine Learning, 1, 107-137.
CV
RESUME
Şakir Kocabaş:
Date & Place of Birth : 08 September 1945, Istanbul
Education (Degree / Years / University):
Assoc. Prof., 1994, Istanbul Technical University, Dept. of Space Sciences and Tech.
Ph.D. (Artificial Intelligence), 1985-90, King's College London, Dept. of Information Engineering
M.Sc. (Chem. Engineering), 1967-70, Istanbul Technical University, Faculty of Chem. Engineering
B.Sc. (Chem. Engineering), 1963-67, Istanbul Technical University
Academic Experience:
- Research experience in Artificial Intelligence (Knowledge Organisation and Representation, Machine Learning and Scientific Discovery)
- Project experience in Real-Time Simulation in Artificial Intelligence.
- Programming experience in Prolog.
Career:
91- Lecturer at Istanbul Technical University, teaching AI-1 and AI-2 (graduate courses), Logic Programming, and Physics (undergraduate courses).
Research Project: Automated formulation of reactions and reaction pathways in nuclear astrophysics.
91-98 Head of the Artificial Intelligence Department, Marmara Research Center (MRC), 41470 Gebze, Turkey. (Took active part in founding the Artificial Intelligence Department at MRC in 1991.)
Projects led as Project Manager at MRC:
1) EUCLID RTP 11.3 WP 2 (1993-97). This project involved the use of AI techniques in real-time simulation, and received the 1997 successful project award by MRC (June 6, 1998).
2) EUCLID RTP 11.7 (1996-97). Completed successfully.
3) Research on the Computational Modeling of Scientific Discovery. (Conference papers published.)
4) Cyrillic Optical Character Recognition. (Prototype developed.)
90-91 Private research on discoveries in Particle Physics and Oxide Superconductivity.
85-90 Ph.D. study at King's College, London University. Research title: "Functional Categorization of Knowledge: Applications in Modeling Scientific Research and Discovery."
71-85 Various R&D and administrative work in chemical industry in Turkey and UK.
Organizational Memberships:
- American Association for Artificial Intelligence (AAAI). (Since 1991)
- British Computer Society, Specialist Group on Expert Systems (BCS-SGES). (1989-1996)
- The Society for Artificial Intelligence and Simulation of Behaviour (AISB). (Since 1989)
Community Activities:
1) Organising and participating in various philosophical symposia and seminars in London, 1985-1990; and in Istanbul, 1992 to date.
2) Participating in the annual Turkish AI and NN (TAINN) symposia as a member of the organising committee.
3) Speaking, by invitation, on the national radio and TV programmes on philosophical, scientific and cultural matters.
Academic Interests: - Artificial Intelligence, Scientific Discovery, and Philosophy and History of Science.
Main Philosophical Publications:
1. Kocabas, S. (1984). "İfadelerin Gramatik Ayirimi" (in Turkish, "Grammatical Classification of Propositions".) Ekin Yayinlari: Istanbul, 222 pp. [This work received the "Thought and Philosophy" award of the Turkish Writers Association in 1985].
2. Kocabas, S. (1997). "Islam'da Bilginin Temelleri" (in Turkish, "Foundations of Knowledge in Islam".) İz Yayincilik: Istanbul, 140 pp. [This book received the "Research" award of the Turkish Writers Association in 1997].