CMU Psychology Department

TOPIC: Posts for Modeling Development


Posts for Modeling Development


As usual.

__________________


Some questions for discussion


Feel free to post about any aspect of Tuesday's lecture or the readings. Some questions to get you started:

1) How convincing is Marr's attempt to specify three levels of analysis?

2) Most psychologists focus on the level of representation and algorithm
and/or the implementation level. Should psychologists care about the level
of computational theory?

3) Should models of development incorporate structured representations (such
as rules, causal networks, and logical theories)?

4) In what sense are production systems, connectionist models and Bayesian
approaches competitors? In what sense do they complement each other?

5) Elman writes that "models in which learning is the only source of change
run the risk of seriously missing the point." How can we develop models
which allow for other sources of change?

--Charles



__________________


RE: Posts for Modeling Development


Greetings, Charles.

First, in a developing Bayesian network, are all learning experiences weighted equally, or do early learning experiences contribute more to the ultimate state of the network?
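To make the first question concrete, here is the toy case I have in mind: a conjugate Beta-Bernoulli learner (my own assumption, not any specific network from the readings). In this simplest case the posterior depends only on the accumulated counts, so the order of experiences cannot matter; I wonder what has to change for early experience to dominate.

```python
# Minimal sketch: a conjugate Beta-Bernoulli learner (my assumption,
# not any specific network from the readings). The posterior Beta(a, b)
# depends only on the accumulated counts of successes and failures,
# so the ORDER of the learning experiences cannot matter here.

def update(a, b, observations):
    """Update a Beta(a, b) prior with a sequence of 0/1 observations."""
    for x in observations:
        a += x        # success count
        b += 1 - x    # failure count
    return a, b

early_successes = [1, 1, 1, 0, 0, 0]
late_successes = [0, 0, 0, 1, 1, 1]

print(update(1, 1, early_successes))  # (4, 4)
print(update(1, 1, late_successes))   # (4, 4) -- identical posterior
```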

Second, physical changes in the developing child can prompt cognitive changes. An autonomous robot does not develop physically. Has anyone ever trained a robot with poor sensors, and then equipped it with more sophisticated ones? That would allow the robot to start small and then gain access to a richer stream of input.


__________________



Hello,

I have a question about Bayesian modeling and Bayesian networks. As far as I understand the function of these models, they require some initial information about the likelihood of the events in question in order to produce output. That likelihood, however, is basically a set of probabilities that describe some aspect of the world.

If, for example, a model is designed to perceive the motion of a moving object in two dimensions, then the initial assumption of the model could be a two-dimensional probability plot, with the axes indicating speed from 0 to some maximum in the directions up, down, left, and right. Each point would then be one object that has some speed in some particular direction. If a lot of objects are observed, the model might form a circular plot centered at the origin, with points densely packed close to zero and becoming fewer and fewer as we move away from zero toward the maximum speed.

My question, then, is: what happens if we induce damage to such a model so that some part of that plot disappears? More importantly, if we induce damage early in learning, how would the plot develop, and how closely could it correlate with data from real lesion patients?
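Here is a rough sketch of the kind of model and "lesion" I have in mind (the Gaussian prior, the grid, and the damaged region are all my own illustrative assumptions, not a model from the readings):

```python
import numpy as np

# Rough sketch of the two-dimensional velocity prior described above.
# The isotropic Gaussian shape, the grid, and the rectangular "lesion"
# are all illustrative assumptions.
v = np.linspace(-10, 10, 101)                 # speed axis, -max..+max
vx, vy = np.meshgrid(v, v)                    # columns = vx, rows = vy
prior = np.exp(-(vx**2 + vy**2) / (2 * 3.0**2))
prior /= prior.sum()                          # normalize to probabilities

# "Lesion": wipe out the region representing fast rightward motion.
lesioned = prior.copy()
lesioned[:, v > 5] = 0.0
lesioned /= lesioned.sum()                    # surviving mass renormalized

# One measurable consequence: the lesioned model's expected horizontal
# velocity is biased away from the damaged region.
print("E[vx], intact:  ", (vx * prior).sum())     # ~0
print("E[vx], lesioned:", (vx * lesioned).sum())  # < 0
```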

Valentinos Zachariou



__________________



Re:

Physical changes in the developing child can prompt cognitive changes. An autonomous robot does not develop physically. Has anyone ever trained a robot with poor sensors, and then equipped it with more sophisticated ones? That would allow the robot to start small and then gain access to a richer stream of input.


I don't know about robots, but there has been some documented success in starting to train a connectionist model on simple stimuli and then gradually making the inputs more complex (Elman, 1993, I think).
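Roughly, the training regime I remember looks like this (a hedged sketch only; the `model.update` method and `example.complexity` attribute are hypothetical placeholders, not Elman's actual simulation):

```python
# Hedged sketch of "starting small" (after Elman, 1993): train on the
# simplest inputs first, then progressively admit richer ones. The
# `model.update` method and `example.complexity` attribute are
# hypothetical placeholders.

def train_incrementally(model, data, stages):
    """Train in stages; each stage admits more complex examples
    (e.g., stages=[3, 6, float("inf")] for maximum sentence length)."""
    for max_complexity in stages:
        for example in data:
            if example.complexity <= max_complexity:
                model.update(example)   # one learning step (hypothetical)
    return model
```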

Blair

-- Edited by 710core at 00:19, 2008-02-13

__________________



 

Elman (2005):

Elman (see also Gopnik & Tenenbaum, 2007) notes that one criticism of connectionist models is that they only generalize to their immediate experience, which would not explain many phenomena such as reasoning by analogy. Given that brain function can be roughly characterized by connectionist models, what does this imply for analogy? Do we simply learn from the richness of the environment that similar solutions apply across domains, and via some learning procedure ensure that learning connects these domains (via attractor basins, perhaps)? Do recent advances in deep network architectures (e.g., Hinton, 2007), in which the original stimuli are abstracted away to an extreme degree before a solution is encoded, explain how we learn to generalize beyond immediate experience? Or do current connectionist algorithms/architectures miss the point somehow?


Understanding how models overcome the poverty of the input stimulus to learn the correct rules, but not other rules that also fit the data (e.g., in learning grammar), is a problem for the classic connectionist algorithms. However, the article mentions that Bayesian principles are able to solve this problem, and I believe that other connectionist algorithms (e.g., contrastive Hebbian learning, restricted Boltzmann machines) can overcome similar problems as well. If this is the case, can all of these learning algorithms be considered functionally equivalent, but operating at different levels of analysis? Aren't contrastive Hebbian learning and Bayesian learning very similar in how they compute outcomes? (Let me know if I need to describe contrastive Hebbian learning in detail.)
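In case it helps the discussion, here is a minimal sketch of the contrastive Hebbian update I mean: settle the network twice, once with the outputs clamped to the target ("plus" phase) and once running freely ("minus" phase), then move the weights toward plus-phase co-activity. The settling dynamics are omitted and the details are simplified assumptions on my part.

```python
import numpy as np

# Minimal sketch of a contrastive Hebbian weight update: Hebbian
# learning on the clamped ("plus") phase, anti-Hebbian learning on
# the free ("minus") phase. Settling dynamics are omitted; details
# are simplified assumptions.

def chl_step(W, act_plus, act_minus, lr=0.1):
    """One contrastive Hebbian weight update.

    act_plus, act_minus: 1-D activation vectors from the clamped
    and free settling phases, respectively.
    """
    return W + lr * (np.outer(act_plus, act_plus)       # Hebbian term
                     - np.outer(act_minus, act_minus))  # anti-Hebbian term
```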

 

Gopnik & Tenenbaum (2007):

Based on the offered description of causal Bayesian networks, I don't see how they differ conceptually from structural equation modeling (in that variables can be specified to be causal or correlational, or to have no explicit relation). Are these techniques just subtly different flavors of the same general idea (as Principal Component Analysis and Factor Analysis could be thought of), or have I missed something deeper?
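One candidate difference I can see: a causal Bayesian network licenses interventions, not just conditioning. A toy sketch with invented probabilities:

```python
# Toy causal network with a hidden common cause (all probabilities
# invented for illustration):
#     Gene -> Smoking, Gene -> Cancer, Smoking -> Cancer
# Observing Smoking tells you something about Gene; intervening on
# Smoking does not. This observe/intervene distinction is the kind
# of content a causal reading adds beyond pure correlation.

p_gene = 0.2                                    # P(gene = 1)
p_smoke = {0: 0.2, 1: 0.8}                      # P(smoke = 1 | gene)
p_cancer = {(0, 0): 0.01, (0, 1): 0.05,         # P(cancer = 1 | gene, smoke)
            (1, 0): 0.10, (1, 1): 0.30}

def p_cancer_given_smoke(s):
    """Observational: conditioning on Smoking = s updates belief in Gene."""
    joint = [(p_gene if g else 1 - p_gene) *
             (p_smoke[g] if s else 1 - p_smoke[g]) for g in (0, 1)]
    post_gene = [j / sum(joint) for j in joint]
    return sum(post_gene[g] * p_cancer[(g, s)] for g in (0, 1))

def p_cancer_do_smoke(s):
    """Interventional: do(Smoking = s) leaves P(Gene) at its prior."""
    return sum((p_gene if g else 1 - p_gene) * p_cancer[(g, s)]
               for g in (0, 1))

print(p_cancer_given_smoke(1))  # 0.175: smokers are likelier to carry the gene
print(p_cancer_do_smoke(1))     # 0.100: forcing smoking leaves the gene alone
```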

 

It is mentioned that Kemp et al. (2004) show that a system of categories and their properties is best organized according to a taxonomic tree structure. Both for this particular model and in general, how does one decide what relational structure best fits the data? In terms of variance accounted for? Am I correct in assuming that, rather than considering all of the possible ways in which variables could be related, only a few key ones are ever examined (e.g., circular chaining, causal chaining, tree structure)? When a structure is imposed on data, is that to say that it was innate, or that it was somehow discovered? If it is innate, are these structures essentially an algorithmic (as opposed to a computational) description of Spelke's core knowledge theory?
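My working guess at how the scoring might go, as a hedged sketch (BIC here is a crude stand-in for the Bayesian marginal likelihood presumably used in the actual work; `fit` and the candidate set are hypothetical placeholders):

```python
import math

# Hedged sketch: compare candidate relational structures by a
# penalized likelihood. `fit` and the candidates are hypothetical.

def bic_score(log_likelihood, n_params, n_data):
    """Fit minus a complexity penalty; extra parameters must earn their keep."""
    return log_likelihood - 0.5 * n_params * math.log(n_data)

# best = max(candidates,
#            key=lambda s: bic_score(fit(s, data), s.n_params, len(data)))
```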

 

Meeden & Blank (2006) and Kuipers et al.:

I don't understand why these researchers are so against supervised learning. It clearly happens in human development, and may be one critical way humans give base structure to their representations. Why not combine the approaches rather than try to learn everything from scratch?

The robot in Kuipers et al. learns to simplify a very large number of perceptual states into a few distinctive states. How do these models decide when a distinctive state is necessary and when many perceptual states can be collapsed into a single one? Do the modelers need to program in a constant for when states should be collapsed (e.g., if the variance accounted for by two states is not significantly different from that accounted for by one, keep only one state)? If not, how do these models learn to dynamically adjust the number of states so as to represent an optimal amount of information, both maximally informative and minimally processing-intensive?
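To make my variance-accounted-for suggestion concrete, here is a hedged sketch, with plain k-means standing in for whatever abstraction Kuipers et al. actually use (the 5% gain threshold is an arbitrary constant of the kind I am asking about):

```python
import numpy as np

# Hedged sketch: cluster raw perceptual states (an (n, d) float array)
# and stop adding "distinctive states" when an extra one buys too
# little explained variance.

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def explained_variance(X, centers, labels):
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1 - ((X - centers[labels]) ** 2).sum() / total

def choose_n_states(X, max_k=10, min_gain=0.05):
    """Grow the state count until the marginal gain drops below min_gain."""
    best_k = 1
    prev = explained_variance(X, *kmeans(X, 1))
    for k in range(2, max_k + 1):
        ev = explained_variance(X, *kmeans(X, k))
        if ev - prev < min_gain:          # extra state not worth keeping
            break
        best_k, prev = k, ev
    return best_k
```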

 

Discussion questions:

1) Marr's three levels of analysis seem reasonable to me. By and large, they remind me of how one might describe writing an essay: first one might mentally brainstorm, then decide how to organize the ideas coherently, and then finally decide on a method by which to produce the essay. But maybe I've been working with computers too long...

2) Given that computational theories (e.g., Bayesian models) have been able to account for data that algorithmic theories (e.g., connectionist models) have not yet been able to, the answer is obvious. Rather, I think the question is when this level of analysis is appropriate. It seems to me that the best time to use it is either at the very start or the very end of researching a domain (or perhaps when you get stuck in the middle). Computational models do a good job of keeping things simple, relatively speaking, so they are very good for providing an initial characterization of a question, or for giving the gist of a more complex solution outlined at a finer-grained level of analysis. Short of being stuck, however, I don't think they are particularly useful mid-way through research, because they lack the detail needed to build and validate a more precise account.

3) Again, it's hard to say no... Obviously, structured representations (e.g., rules) are useful shortcuts for modeling a problem, and they help keep things simple so that our human brains can hold a version of the problem in working memory and push the research forward without recourse to a computer. However, I worry that by using rules we run the risk of becoming complacent or naïve and assuming that the brain must instantiate these rules as-is. Rules may be useful for keeping problems simple, but ultimately we need to figure out what the brain is doing if we want to be sure we understand human cognitive processes.

4) I don't know that the different modeling approaches compete, other than for our attention. Rather, I see each as offering different advantages and disadvantages which, when fully explored, can help all modeling frameworks (or perhaps refute some, so maybe they could compete after all).

5) A tricky question... Other than learning, I suppose that we need to take into account changes to the architecture of the system (e.g., more neurons, different connectivity patterns, body maturation), changes to the environment (e.g., starting to go to school, learning a new sport), changes to learning strategies (e.g., learning new representational techniques), as well as the effects of general life-long learning mechanisms. The problem is that, both conceptually and in terms of computer resources and raw data (e.g., data on what it's like to go to school, or on how quickly neuronal structure changes in humans), it isn't feasible to implement models on this level of grandeur. So, to an extent, we can't really look at much more than we already have. Thus, I would argue that the best we can do is keep doing what we're doing: model the one or two big changes that we think are maximally important, and as we gain an understanding of those processes, iteratively make the model more complex. However, I fear that at some point the whole may be more than the sum of its components, and we may not be able to make progress without looking at all of these variables simultaneously...

 

Blair



__________________



The readings obviously advocate unsupervised models of learning. A clear problem with this approach, however, is that it can quickly become too unconstrained and lead to combinatorial explosions. How well do some of the bootstrapping techniques discussed in the readings handle this? Also, Blair mentioned changes in the architecture of the system. I know that in the nervous system, the general trend is that there is a decrease in the number of local connections, as a large number of neurons die off in the first few years of life. However, there is also an increase in the number of synapses between different areas. Have there been any attempts in robotics to incorporate this insight?
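(To sketch what "incorporating this insight" might even mean, here is a caricature of the trend I describe: prune weak local connections, sparsely grow long-range ones. All thresholds and probabilities are invented.)

```python
import numpy as np

# Caricature of the developmental trend described above, for a weight
# matrix W between units with known pairwise distances: weak local
# connections are pruned, while new long-range connections are grown
# sparsely. All constants are invented.

def develop(W, distance, prune_below=0.05, local_radius=2.0,
            p_grow=0.01, seed=0):
    rng = np.random.default_rng(seed)
    W = W.copy()
    # 1. Prune: weak, short-range connections die off.
    W[(np.abs(W) < prune_below) & (distance <= local_radius)] = 0.0
    # 2. Grow: occasionally create a new long-range connection.
    grow = ((W == 0.0) & (distance > local_radius)
            & (rng.random(W.shape) < p_grow))
    W[grow] = rng.normal(0.0, 0.1, W.shape)[grow]
    return W
```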
-Chris

__________________
Christopher Paynter




2) Most psychologists focus on the level of representation and algorithm
and/or the implementation level. Should psychologists care about the level
of computational theory?
- Of course! Psychologists have the unique task of objectively understanding the very device that they themselves possess and are using to do the understanding. As such, we are necessarily obscured by our own perception, perhaps most influentially in how we define the psychological processes we choose to study. Having a more objective means of characterizing what the psychological device is doing is supremely beneficial, because it may reveal truths about the device that psychologists simply cannot see from the perspective they are taking.

-------

In addition to considering other sources of change in our models, we should consider the gating of change by other development. When we model the development of a single process, we ignore the influence of the development of other processes. In reality, some developments may not occur until after others. For example, understanding the semantic meaning of a word requires many systems to work in synchrony (auditory, attention, visual, social...), but we often do not take these all into account at once. Surely, modeling at the system level will lead to the most accurate results. Why don't more people do this? Or maybe I am ignorant of the research. Do people somehow incorporate other systems into their models (via assumed static parameters, or in some dynamic way)? Maybe we just aren't there yet?
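To illustrate the gating I mean, a toy sketch (all constants invented):

```python
# Toy illustration of gated development: process B cannot begin
# developing until process A has matured past a threshold, so B's
# trajectory depends on A's timetable. All constants are invented.

def simulate(steps=100, threshold=0.5, lr_a=0.05, lr_b=0.10):
    a = b = 0.0
    for _ in range(steps):
        a += lr_a * (1.0 - a)            # A matures on its own schedule
        if a >= threshold:               # the gate opens...
            b += lr_b * (1.0 - b)        # ...and only then can B develop
    return a, b

print(simulate())   # A near ceiling; B delayed by the gate
```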

--------

It seems that if you consider Bayesian models and connectionist modeling as addressing separate levels of the question, as Marr says, then they are not in conflict but rather are complementary. Is it the case that researchers use Bayesian modeling functionally, to fully define the process in question, following which they use connectionist modeling or production systems to understand the algorithm that governs that process? Practically speaking, can Marr's levels be considered a flow, and if so, do researchers treat it as such? Do Bayesian modelers also complete connectionist models? Do Bayesian and connectionist modelers communicate well with each other?

What kind of modeling is used to address Marr's last level, hardware implementation?

-Erika

__________________



I would like to know in which domains PDP works better than Bayesian approaches, and in which the opposite holds. There is a difference in what modeling means between the two, so I guess the difference in application arises from there. However, would it be possible to model the development of language using Bayesian methods (something like modeling constrained statistical learning)?
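For instance, here is a hedged sketch of statistical learning over a syllable stream (in the spirit of transitional-probability learning; a fuller Bayesian treatment would put a prior over the counts, so this is only an illustration):

```python
from collections import Counter

# Hedged sketch: estimate syllable transitional probabilities from a
# stream and posit word boundaries where the probability dips. The
# nonsense words and the 0.8 threshold are invented for illustration.

def transitional_probs(syl):
    pairs = Counter(zip(syl, syl[1:]))
    firsts = Counter(syl[:-1])
    return {pair: n / firsts[pair[0]] for pair, n in pairs.items()}

def segment(syl, thresh=0.8):
    tp = transitional_probs(syl)
    words, word = [], [syl[0]]
    for a, b in zip(syl, syl[1:]):
        if tp[(a, b)] < thresh:       # low TP -> likely word boundary
            words.append(word)
            word = []
        word.append(b)
    words.append(word)
    return words

# Three nonsense "words" in varied order: within-word transitions are
# perfectly predictable, between-word transitions are not.
stream = ("tupiro bidaku golabu bidaku tupiro golabu "
          "tupiro bidaku golabu").split()
stream = [w[i:i + 2] for w in stream for i in (0, 2, 4)]
print(segment(stream))   # recovers the tu-pi-ro / bi-da-ku / go-la-bu chunks
```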

Sung-joo



__________________



To me, Bayesian modeling seems more like an algorithm than an architecture in the way that PDP or production systems are. Could you give some idea of why you treat it as being at the same level as the other two?

__________________