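Unsupervised Learning and Cortical Development - Deep Learning: The Core Driving Force of the Age of Intelligence (深度学习: 智能时代的核心驱动力量)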

Unsupervised Learning and Cortical Development

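A Boltzmann machine can be used for supervised learning, in which both the inputs and the outputs are clamped, or for unsupervised learning, in which only the inputs are clamped. Geoffrey Hinton used the unsupervised version to build a deep Boltzmann machine one layer at a time.^[27] Starting with a single layer of hidden units connected to the input units, known as a restricted Boltzmann machine, Geoffrey trained it on unlabeled data, which is far easier to obtain than labeled data (there are billions of unlabeled images and audio recordings on the internet) and which lets learning proceed faster. The first step in unsupervised learning is to extract the statistical regularities common to all the data, but the first layer of hidden units can extract only simple features, the kind that a perceptron could also represent.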

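The next step is to freeze the weights into the first layer and add a second layer of units on top. Further unsupervised Boltzmann learning then yields a more complex set of features, and the process can be repeated to create a network many layers deep.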
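The layer-by-layer procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Hinton's implementation: the text does not say which learning rule was used, so the sketch assumes one-step contrastive divergence (CD-1) as the unsupervised update, and it passes hidden-unit probabilities (rather than sampled states) up to the next layer. The names `RBM`, `cd1_update`, and `train_stack`, the toy data, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Tiny restricted Boltzmann machine (hypothetical helper class)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        # Positive phase: visible units are clamped to the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step away from the data (assumed CD-1).
        v1 = self.visible_probs(self.sample(h0))
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])

    def train(self, data, epochs, lr=0.1):
        for _ in range(epochs):
            for v in data:
                self.cd1_update(v, lr)


def train_stack(data, layer_sizes, epochs=50, seed=0):
    """Greedy layer-wise training: train a layer, freeze it, stack the next."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        rbm.train(inputs, epochs)
        layers.append(rbm)  # this layer's weights are not touched again
        # The frozen layer's hidden activities become the next layer's "data".
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers


# Unlabeled toy data: two repeated binary patterns (made up for illustration).
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
stack = train_stack(data, layer_sizes=[4, 2])

# Propagate one input up through the frozen layers.
top = data[0]
for rbm in stack:
    top = rbm.hidden_probs(top)
```

Note that no labels appear anywhere in the sketch: each layer sees only the activities of the layer below it. A classifier trained on `top` would be a separate, supervised step.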

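Because the units in the upper layers contain more nonlinear combinations of lower-level features, as a population they can abstract more general features from the specific ones. Classification therefore becomes much easier in the upper layers, which need far fewer training examples to converge at a higher level of performance. Although how best to describe the mathematics behind this solution remains an open question, new geometric tools are beginning to make their mark on these deep networks.^[28]

The cortex also appears to grow layer by layer. In the early stages of visual system development, neurons in the primary visual cortex, the first layer to receive input signals from the eyes, are highly plastic and are easily rewired by the stream of visual input, a capacity that ends once the critical period is over (as discussed in chapter 5). The visual areas at the back of the brain and the hierarchies of the other sensory streams mature first; cortical areas closer to the front of the brain take longer. The prefrontal cortex, the part nearest the front of the brain, may not fully mature until early adulthood. During its critical period, the connections in a cortical area are most strongly shaped by neural activity, and these overlapping, staggered critical periods produce the progressive development of the cortex. Jeffrey Elman and Elizabeth Bates, cognitive scientists at the University of California, San Diego, together with other colleagues, gave a connectionist network account of how the progressive development of the cortex helps children acquire new abilities as they learn about the world.^[29] This work opened a new line of research into how our long childhoods make humans the species most adept at learning, and it offered a fresh perspective on earlier claims that certain behaviors are innate.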

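Steven Quartz was once a postdoctoral fellow in my lab and is now on the faculty at the California Institute of Technology. In our book Liars, Lovers and Heroes,^[30] we wrote that during brain development in childhood and adolescence, experience can profoundly influence gene expression in neurons and thereby change the neural circuits responsible for behavior. The interplay between genetic differences and environmental influences lets us see the complexity of brain development from a new angle. This active area of research goes beyond the nature versus nurture debate and reframes it in terms of cultural biology: our biology not only gave rise to human culture but is in turn shaped by it.^[31] A recent discovery has added a new chapter to this story: while synapse formation between neurons is increasing rapidly during early development, the DNA inside neurons is epigenetically modified after birth by a form of methylation that regulates gene expression and is unique to the brain.^[32] This epigenetic modification could be the link between experience and genes that Steven and I had envisioned.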

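By the 1990s, the neural network revolution was well under way. Cognitive neuroscience was expanding, and computers were getting faster, but not fast enough. The Boltzmann machine was technically impressive but unbearably slow to simulate. What really helped us make progress was a faster learning algorithm, which came our way just when we needed it most.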

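[1] The fastest that most neurons can make a decision is about 10 milliseconds, so a decision made within 1 second can take no more than 100 time steps.
[2] When it comes to electromagnetism, the physics of Michael Faraday was of the scruffy kind, whereas that of James Clerk Maxwell was of the neat kind.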
[3] Theodore Holmes Bullock and G. Adrian Horridge, Structure and Function in the Nervous Systems of Invertebrates (San Francisco: W. H. Freeman, 1965).
[4] E. Chen, K. M. Stiefel, T. J. Sejnowski, and T. H. Bullock, “Model of Traveling Waves in a Coral Nerve Network,” Journal of Comparative Physiology A 194, no. 2 (2008): 195–200.
[5] D. S. Levine and S. Grossberg, “Visual Illusions in Neural Networks: Line Neutralization, Tilt after Effect, and Angle Expansion,” Journal of Theoretical Biology 61, no. 2 (1976): 477–504.
[6] G. B. Ermentrout and J. D. Cowan, “A Mathematical Theory of Visual Hallucination Patterns,” Biological Cybernetics 34, no. 3 (1979): 137–150.
[7] J. J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proceedings of the National Academy of Sciences of the United States of America 79, no. 8 (1982): 2554–2558.
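[8] Although the neural network in the 1976 Marr–Poggio stereo vision model (mentioned in chapter 4) was symmetric, Marr and Poggio updated all of the units synchronously, so the dynamics of their network were far more complex than those of the Hopfield network, which uses asynchronous updates. D. Marr, G. Palm, and T. Poggio, “Analysis of a Cooperative Stereo Algorithm,” Biological Cybernetics 28, no. 4 (1978): 223–239.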
[9] L. L. Colgin, S. Leutgeb, K. Jezek, J. K. Leutgeb, E. I. Moser, B. L. McNaughton, and M. B. Moser, “Attractor-Map versus Autoassociation Based Attractor Dynamics in the Hippocampal Network,” Journal of Neurophysiology 104, no. 1 (2010): 35–50.
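[10] The paper uses an analog computational network model that produces its output through a sigmoid activation function. (Translator's note)
[11] J. J. Hopfield and D. W. Tank, “‘Neural’ Computation of Decisions in Optimization Problems,” Biological Cybernetics 52, no. 3 (1985): 141–152. The traveling salesman problem is famous in computer science as a classic example of a problem whose solution time grows rapidly as the size of the problem increases.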
[12] Dana H. Ballard and Christopher M. Brown, Computer Vision (Englewood Cliffs, NJ: Prentice Hall, 1982).
[13] D. H. Ballard, G. E. Hinton, and T. J. Sejnowski, “Parallel Visual Computation,” Nature 306, no. 5938 (1983): 21–26; R. A. Hummel and S. W. Zucker, “On the Foundations of Relaxation Labeling Processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence 5, no. 3 (1983): 267–287.
[14] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, “Optimization by Simulated Annealing,” Science 220, no. 4598 (1983): 671–680.
[15] P. K. Kienker, T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher, “Separating Figure from Ground with a Parallel Network,” Perception 15 (1986): 197–216.
[16] H. Zhou, H. S. Friedman, and R. von der Heydt, “Coding of Border Ownership in Monkey Visual Cortex,” Journal of Neuroscience 20, no. 17 (2000): 6594–6611.
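[17] Figure units that the edge units point away from are treated as background. (Translator's note)
[18] The temperature variable in the annealing algorithm. (Translator's note)
[19] Clamped to the desired input and output values. (Translator's note)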
[20] Donald O. Hebb, The Organization of Behavior: A Neuropsychological Theory (New York: Wiley & Sons, 1949), 62.
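[21] Describes a person with psychotic symptoms passing a delusional belief on to another person, who then develops an induced delusion. (Translator's note)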
[22] T. J. Sejnowski, P. K. Kienker, and G. E. Hinton, “Learning Symmetry Groups with Hidden Units: Beyond the Perceptron,” Physica 22D (1986): 260–275.
[23] N. J. Cohen, I. Abrams, W. S. Harley, L. Tabor, and T. J. Sejnowski, “Skill Learning and Repetition Priming in Symmetry Detection: Parallel Studies of Human Subjects and Connectionist Models,” in Proceedings of the 8th Annual Conference of the Cognitive Science Society (Hillsdale, NJ: Erlbaum, 1986), 23–44.
[24] B. P. Yuhas, M. H. Goldstein Jr., T. J. Sejnowski, and R. E. Jenkins, “Neural Network Models of Sensory Integration for Improved Vowel Recognition,” Proceedings of the IEEE 78, no. 10 (1990): 1658–1668.
[25] G. E. Hinton, S. Osindero, and Y. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation 18, no. 7 (2006): 1527–1554.
[26] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts, “What the Frog’s Eye Tells the Frog’s Brain,” Proceedings of the Institute of Radio Engineers 47, no. 11 (1959): 1940–1951. http://hearingbrain.org/docs/letvin_ieee_1959.pdf.
[27] R. R. Salakhutdinov and G. E. Hinton, “Deep Boltzmann Machines,” in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Journal of Machine Learning Research 5 (2009): 448–455. Paul Smolensky introduced a special case of the Boltzmann machine that he called the Harmonium: P. Smolensky, “Information Processing in Dynamical Systems: Foundations of Harmony Theory,” in David E. Rumelhart and James L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (Cambridge, MA: MIT Press, 1986), 194–281.
[28] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, “Exponential Expressivity in Deep Neural Networks through Transient Chaos,” in Advances in Neural Information Processing Systems 29 (2016): 3360–3368.
[29] Jeffrey L. Elman, Elizabeth A. Bates, Mark H. Johnson, Annette Karmiloff-Smith, Domenico Parisi, and Kim Plunkett, Rethinking Innateness: A Connectionist Perspective on Development (Cambridge, MA: MIT Press, 1996).
[30] Steven R. Quartz and Terrence J. Sejnowski, Liars, Lovers and Heroes: What the New Brain Science Has Revealed about How We Become Who We Are (New York: Harper-Collins, 2002).
[31] S. Quartz and T. J. Sejnowski, “The Neural Basis of Cognitive Development: A Constructivist Manifesto,” Behavioral and Brain Sciences 20, no. 4 (1997): 537–596.
[32] This is known as “non-CG methylation.” See R. Lister, E. A. Mukamel, J. R. Nery, M. Urich, C. A. Puddifoot, N. D. Johnson, J. Lucero, Y. Huang, A. J. Dwork, M. D. Schultz, M. Yu, J. Tonti-Filippini, H. Heyn, S. Hu, J. C. Wu, A. Rao, M. Esteller, C. He, F. G. Haghighi, T. J. Sejnowski, M. M. Behrens, and J. R. Ecker, “Global Epigenomic Reconfiguration during Mammalian Brain Development,” Science 341, no. 6146 (2013): 629.