The Agnostic Structure of Data Science Methods

  • Domenico Napoletani
  • Marco Panza (CNRS)
  • Daniele Struppa
Keywords: Philosophy of science, Big Data, Philosophy of mathematics

Abstract

In this paper we argue that data science is a coherent and novel approach to empirical problems that, in its most general form, does not build understanding about phenomena. Within the new type of mathematization at work in data science, mathematical methods are not selected because of any relevance to the problem at hand; they are applied to a specific problem only by "forcing", i.e. on the basis of their ability to reorganize the data for further analysis and of the intrinsic richness of their mathematical structure. In particular, we argue that deep learning neural networks are best understood within the context of forcing optimization methods. Finally, we explore the broader question of the appropriateness of data science methods in solving problems. We argue that this question should not be interpreted as a search for a correspondence between phenomena and the specific solutions found by data science methods; rather, it is the internal structure of data science methods that is open to precise forms of understanding.

Published
2021-04-06
How to cite
Napoletani, Domenico, Marco Panza, and Daniele Struppa. 2021. "The Agnostic Structure of Data Science Methods". Lato Sensu: Revue de la Société de Philosophie des Sciences 8 (2), 44-57. https://doi.org/10.20416/LSRSPS.V8I2.5.