Coping with the Complexity of the Real World
Most of today's learning algorithms were developed more than twenty-five years ago, so why did it take so long for them to have an impact on the real world? With the computers and labeled data available to researchers in the 1980s, it was only possible to demonstrate proof of principle on toy problems. Despite some promising results, we did not know how network learning and performance would scale as the number of units and connections grew to match the complexity of real-world problems. Most algorithms in artificial intelligence scale poorly and never got beyond solving toy problems. We now know that neural network learning scales well, and that performance keeps improving as networks grow in size and in number of layers. Backpropagation, in particular, scales extremely well.
Should this surprise us? The cerebral cortex is a mammalian invention that is highly developed in primates, and especially in humans. As it expanded, more capabilities gradually emerged, and more layers were added in the association areas to support higher-order representations.
Few complex systems scale this well. The Internet is one of the few engineered systems that has been scaled up by a factor of a million. Once the packet communication protocols were established, the Internet could evolve, just as the genetic code in DNA made the evolution of cells possible.
Training many deep learning networks on the same set of data produces a large number of different networks, all with roughly the same average level of performance. What we would like to know is what all of these equally good networks have in common, and analyzing any single network cannot reveal that. Another way to understand the principles of deep learning is to explore the space of learning algorithms further; so far we have sampled only a few locations in the space of all learning algorithms. A computational theory of learning may emerge from a broader exploration, a theory as profound as those in other areas of science,[28] and it may offer further explanations for the learning algorithms found in nature.
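The point about many equally good networks can be made concrete with a small experiment. The sketch below is not from the book: it assumes PyTorch is installed and uses an invented synthetic dataset, architecture, and hyperparameters. It trains several small networks on the same data from different random initializations; their weights differ, yet their test accuracies come out roughly the same.

```python
# Illustrative sketch: same data, different random seeds -> different networks,
# similar performance. Dataset and architecture are invented for this example.
import torch
import torch.nn as nn

def make_data(n=3000, d=20, seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, d, generator=g)
    w = torch.randn(d, generator=g)
    # Noisy, roughly linearly separable labels.
    y = (x @ w + 0.3 * torch.randn(n, generator=g) > 0).long()
    return x, y

def train_one(seed, x, y, epochs=200):
    torch.manual_seed(seed)  # a different seed gives a different initialization
    model = nn.Sequential(nn.Linear(x.shape[1], 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

x, y = make_data()
x_train, y_train, x_test, y_test = x[:2000], y[:2000], x[2000:], y[2000:]

for seed in range(5):
    model = train_one(seed, x_train, y_train)
    acc = (model(x_test).argmax(dim=1) == y_test).float().mean().item()
    print(f"seed {seed}: test accuracy {acc:.3f}")  # weights differ, accuracy is similar
```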
Yoshua Bengio of the University of Montreal[29] (see figure 9-8), together with Yann LeCun, succeeded Geoffrey Hinton as director of CIFAR's Neural Computation and Adaptive Perception (NCAP) program, which was renamed Learning in Machines and Brains after passing its ten-year review. Yoshua leads a group at the University of Montreal that applies deep learning to natural language, which would become a new research focus for the Learning in Machines and Brains program. Over more than a decade of meetings, this group of more than twenty faculty members and fellows launched the study of deep learning. The substantial progress that deep learning has made over the past five years on many previously intractable problems is owed to the efforts of the group's members, who are of course only a small part of a much larger community (explored in chapter 11).
Image credit: Yoshua Bengio.
Although the power of deep learning networks has been demonstrated in many applications, on their own they could never survive in the real world.[30] They are pampered by researchers, who feed them data, tune hyperparameters such as the learning rate, the number of layers, and the number of units per layer to improve convergence, and supply them with vast computing resources. For that matter, the cerebral cortex could not survive in the real world either without the support and autonomy provided by the rest of the brain and body. Providing that support and autonomy in an uncertain world is a much harder problem to solve than pattern recognition. Chapter 10 introduces an ancient learning algorithm that helps us survive in nature by motivating us to seek out experiences that benefit us.
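To make the hyperparameters mentioned above concrete, here is a minimal sketch, again assuming PyTorch, with grid values chosen arbitrarily for illustration. It shows how the learning rate, the number of layers, and the number of units per layer parameterize a network and define the kind of search space that researchers sweep by hand or by grid search.

```python
# Illustrative sketch of the three hyperparameters named in the text.
# The grid values and network sizes are arbitrary.
import itertools
import torch.nn as nn
import torch.optim as optim

def build_mlp(n_inputs, n_layers, units, n_outputs=2):
    """Stack n_layers hidden layers of `units` ReLU units each."""
    layers, width = [], n_inputs
    for _ in range(n_layers):
        layers += [nn.Linear(width, units), nn.ReLU()]
        width = units
    layers.append(nn.Linear(width, n_outputs))
    return nn.Sequential(*layers)

# "Pampering" a network amounts, in practice, to searching over settings like these
# (the training loop itself is omitted here).
grid = itertools.product([0.01, 0.1], [1, 2, 3], [16, 64])  # learning rate, layers, units
for lr, n_layers, units in grid:
    model = build_mlp(n_inputs=20, n_layers=n_layers, units=units)
    opt = optim.SGD(model.parameters(), lr=lr)  # learning rate controls step size
    n_params = sum(p.numel() for p in model.parameters())
    print(f"lr={lr}, layers={n_layers}, units={units}: {n_params} parameters")
```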
- Thomas S. Kuhn, The Structure of Scientific Revolutions, 2nd ed. (Chicago: University of Chicago Press, 1970), 23.
- M. Riesenhuber and T. Poggio, “Hierarchical Models of Object Recognition in Cortex,” Nature Neuroscience 2 (1999): 1019–1025; T. Serre, A. Oliva, and T. Poggio, “A Feedforward Architecture Accounts for Rapid Categorization,” Proceedings of the National Academy of Sciences of the United States of America 104, no. 15 (2007): 6424–6429.
- Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan Kaufmann, 1988).
- Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle, “Greedy Layer-Wise Training of Deep Networks,” in Bernhard Schölkopf, John Platt, and Thomas Hoffman, eds., Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference (Cambridge, MA: MIT Press), 153–160.
- Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber, “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies,” in John F. Kolen and Stefan C. Kremer, eds., A Field Guide to Dynamical Recurrent Neural Networks (New York: IEEE Press, 2001), 237–243.
- D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep Big Simple Neural Nets for Handwritten Digit Recognition,” Neural Computation 22, no. 12 (2010): 3207–3220.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems 25 (NIPS 2012), https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2015, https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf.
- Yann LeCun, “Modèles connexionistes de l’apprentissage (Connectionist learning models)” (Ph.D. diss., Université Pierre et Marie Curie, Paris, 1987).
- Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks.”
- M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” 2013, https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf.
- Patricia Smith Churchland, Neurophilosophy: Toward a Unified Science of the Mind-Brain (Cambridge, MA: MIT Press, 1989).
- Patricia Smith Churchland and Terrence J. Sejnowski, The Computational Brain, 2nd ed. (Cambridge, MA: MIT Press, 2016).
- D. L. Yamins and J. J. DiCarlo, “Using Goal-Driven Deep Learning Models to Understand Sensory Cortex,” Nature Neuroscience 19, no. 3 (2016): 356–365.
- S. Funahashi, C. J. Bruce, and P. S. Goldman-Rakic, “Visuospatial Coding in Primate Prefrontal Neurons Revealed by Oculomotor Paradigms,” Journal of Neurophysiology 63, no. 4 (1990): 814–831.
- J. L. Elman, “Finding Structure in Time,” Cognitive Science 14 (1990): 179–211; M. I. Jordan, “Serial Order: A Parallel Distributed Processing Approach,” Advances in Psychology 121 (1997): 471–495; G. Hinton, L. Deng, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition,” IEEE Signal Processing Magazine 29, no. 6 (2012): 82–97.
- S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation 9, no. 8 (1997): 1735–1780.
- John Markoff, “When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad,’” New York Times, November 27, 2016, https://www.nytimes.com/2016/11/27/technology/artificial-intelligence-pioneer-jurgen-schmidhuber-overlooked.html.
- An American comedian, famous for the catchphrase “I don’t get no respect.” (Translator’s note)
- K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention,” 2015, rev. 2016, https://arxiv.org/pdf/1502.03044.pdf.
- I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, 2014, https://arxiv.org/pdf/1406.2661.pdf.
- See A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” 2016, https://arxiv.org/pdf/1511.06434.pdf; Cade Metz and Keith Collins, “How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos,” New York Times, January 2, 2018, https://www.nytimes.com/interactive/2018/01/02/technology/ai-generated-photos.html.
- K. Schawinski, C. Zhang, H. Zhang, L. Fowler, and G. K. Santhanam, “Generative Adversarial Networks Recover Features in Astrophysical Images of Galaxies beyond the Deconvolution Limit,” 2017. https://arxiv.org/pdf/1702.00403.pdf.
- J. Chang and S. Scherer, “Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks,” 2017. https://arxiv.org/pdf/1705.02394.pdf.
- A. Nguyen, J. Yosinski, Y. Bengio, A. Dosovitskiy, and J. Clune, “Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space,” 2016, https://arxiv.org/pdf/1612.00005.pdf; Radford, Metz, and Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” 2016, https://arxiv.org/pdf/1511.06434.pdf.
- Guy Trebay, “Miuccia Prada and Sylvia Fendi Grapple with the New World,” New York Times, June 19, 2017, https://www.nytimes.com/2017/06/19/fashion/mensstyle/prada-fendi-milan-mens-fashion.html.
- T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi, “General Conditions for Predictivity in Learning Theory,” Nature 428, no. 6981 (2004): 419–422.
- Bengio is also an advisor to several companies, including Microsoft, and he cofounded Element AI, but his focus remains on academia and on advancing science and the public good.
- See the preface: Churchland and Sejnowski, The Computational Brain, 2nd ed., ix–xv.
Reviews of This Book