References

[1] D. Bertsimas and J. Tsitsiklis, Introduction to linear optimization, 1st ed. Athena Scientific, 1997.

[2] D. P. Bertsekas, Nonlinear programming. Athena Scientific, 1999.

[3] G. Cornuéjols, “Valid inequalities for mixed integer linear programs,” Math. Program., vol. 112, no. 1, pp. 3–44, Mar. 2008. Available: http://dblp.uni-trier.de/db/journals/mp/mp112.html#Cornuejols08

[4] H. P. Williams, “Model building in linear and integer programming,” in Computational mathematical programming, 1985, pp. 25–53.

[5] E. Ghashim and P. Boily, “A Soft Introduction to Bayesian Data Analysis,” Data Science Report Series, 2020.

[6] E. T. Jaynes, Probability theory: The logic of science. Cambridge University Press, 2003.

[7] A. Kolmogorov, Foundations of the theory of probability. Chelsea Publishing Company, 1933.

[8] Mathematical Association, UK, “An Aeroplane’s Guide to A Level Maths.”

[9] Wikipedia, “List of probability distributions,” 2021. Available: https://en.wikipedia.org/wiki/List_of_probability_distributions

[10] R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability & statistics for engineers and scientists, 8th ed. Upper Saddle River: Pearson Education, 2007.

[11] R. V. Hogg and E. A. Tanis, Probability and statistical inference, 7th ed. Pearson/Prentice Hall, 2006.

[12] T. H. Davenport and D. J. Patil, “Data scientist: The sexiest job of the 21st century,” Harvard Business Review, Oct. 2012. Available: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

[13] L. Donnelly, “Robots are better than doctors at diagnosing some cancers, major study finds,” The Telegraph, May 2018.

[14] N. Bien, P. Rajpurkar, et al., “Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet,” PLOS Medicine, vol. 15, no. 11, pp. 1–19, 2018, doi: 10.1371/journal.pmed.1002699.

[15] BeauHD, “Google AI claims 99 percent accuracy in metastatic breast cancer detection,” Slashdot.com, Oct. 2018.

[16] Columbia University Irving Medical Center, “Data scientists find connections between birth month and health,” Newswire.com, Jun. 2015.

[17] “Scientists using GPS tracking on endangered dhole wild dogs.” Live View GPS, Oct. 2018.

[18] S. Reichman, “These AI-invented paint color names are so bad, they’re good,” Curbed, May 2017.

[19] K. Hao, “We tried teaching an AI to write Christmas movie plots. Hilarity ensued. Eventually,” MIT Technology Review, Dec. 2018.

[20] E. Betuel, “Math model determines who wrote Beatles’ ‘In My Life’: Lennon or McCartney?” Inverse, Jul. 2018.

[21] Indiana University, “Scientists use Instagram data to forecast top models at New York Fashion Week,” Science Daily, Sep. 2015.

[22] J. Hiner, “How big data will solve your email problem,” ZDNet, Oct. 2013.

[23] B. Smith, “Artificial intelligence better than physicists at designing quantum science experiments,” ABC Science, Oct. 2018.

[24] A. Van Dam, “This researcher studied 400,000 knitters and discovered what turns a hobby into a business,” Washington Post, Nov. 2018.

[25] E. Yong, “Wait, have we really wiped out 60% of animals?” The Atlantic, Oct. 2018.

[26] J. Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women,” Reuters, Oct. 2018.

[27] “Facebook documents seized by MPs investigating privacy breach.” BBC News, Nov. 2018.

[28] D. Wakabayashi, “Firm led by Google veterans uses A.I. to ‘nudge’ workers toward happiness,” New York Times, Dec. 2018.

[29] S. Ramachandran and J. Flint, “At Netflix, who wins when it’s Hollywood vs. the algorithm?” Wall Street Journal, Nov. 2018.

[30] M. Jing, “AlphaGo vanquishes world’s top Go player, marking A.I.’s superiority over human mind,” South China Morning Post, May 2017.

[31] D. Lewis, “An AI-written novella almost won a literary prize,” Smithsonian Magazine, Mar. 2016.

[32] E. Mack, “Elon Musk: Artificial intelligence may spark World War III,” CNET, Sep. 2017.

[33] T. Rikert, “A.I. hype has peaked so what’s next?” TechCrunch, Sep. 2017.

[34] J. C. Scott, Against the grain: A deep history of the earliest states. New Haven: Yale University Press, 2017.

[35] R. Mérou, “Conceptual map of free software.” Wikimedia, 2010.

[36] Henning (WMDE), “UML diagram of the Wikibase data model.” Wikimedia.

[37] Wooptoo, “Entity - relationship model.” Wikimedia.

[38] S. L. Lee and D. Baer, “20 cognitive biases that screw up your decisions,” Business Insider, Dec. 2015.

[39] “Cognitive biases.” The Decision Lab.

[40] R. Schutt and C. O’Neil, Doing data science: Straight talk from the front line. O’Reilly, 2013.

[41] “Research integrity & ethics.” Memorial University of Newfoundland.

[42] J. Paquette and P. Boily, “A conversation with Julie Paquette: Ethics in quantitative contexts.”

[43] “Code of ethics/conducts.” Certified Analytics Professional.

[44] “Development of national statistical systems.” United Nations, Statistics Division.

[45] “ACM code of ethics and professional conduct.” Association for Computing Machinery.

[46] K. Fung, “The ethics conversation we’re not having about data,” Harvard Business Review, Nov. 2015.

[47] C. O’Neil, Weapons of math destruction: How big data increases inequality and threatens democracy. Crown, 2016.

[48] M. Chen, “Is ‘big data’ actually reinforcing social inequalities?” The Nation, Sep. 2013.

[49] R. W. Paul and L. Elder, Understanding the foundations of ethical reasoning, 2nd ed. Foundation for Critical Thinking, 2006.

[50] “Centre for big data ethics, law, and policy.” Data Science Institute, University of Virginia.

[51] “Open data.” Wikipedia.

[52] D. Brin, The transparent society: Will technology force us to choose between privacy and freedom? Perseus, 1998.

[53] “Open up guide: Using open data to combat corruption.” Open Data Charter, 2017.

[54] J. S. A. Corey, The Expanse. Orbit Books.

[55] N. Cohn, “How one 19-year-old Illinois man is distorting national polling averages,” The Upshot, 2016.

[56] A. Gumbus and F. Grodzinsky, “Era of big data: Danger of descrimination,” ACM SIGCAS Computers and Society, vol. 45, no. 3, pp. 118–125, 2015.

[57] I. Johnston, “AI robots learning racism, sexism and other prejudices from humans, study finds,” The Independent, Apr. 2017.

[58] M. Judge, “Facial-recognition technology affects African Americans more often,” The Root, 2016.

[59] I. Asimov, Foundation series. Gnome Press, Spectra, Doubleday.

[60] I. Stewart, “The fourth law of humanics,” Nature, vol. 535, 2016.

[61] J. Cranshaw, R. Schwartz, J. I. Hong, and N. M. Sadeh, “The livehoods project: Utilizing social media to understand the dynamics of a city,” in ICWSM, 2012. Available: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2012.html#CranshawSHS12

[62] A. Jensen et al., “Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients,” Nature Communications, vol. 5, 2014, doi: 10.1038/ncomms5022.

[63] K.-W. Hsu, N. Pathak, J. Srivastava, G. Tschida, and E. Bjorklund, “Data mining based tax audit selection: A case study of a pilot project at the Minnesota Department of Revenue,” in Real world data mining applications, Cham: Springer International Publishing, 2015, pp. 221–245. doi: 10.1007/978-3-319-07812-0_12.

[64] F. R. Bach and M. I. Jordan, “Learning spectral clustering, with application to speech separation,” J. Mach. Learn. Res., vol. 7, pp. 1963–2001, Dec. 2006.

[65] H. T. Kung and D. Vlah, “A spectral clustering approach to validating sensors via their peers in distributed sensor networks,” Int. J. Sen. Netw., vol. 8, no. 3/4, pp. 202–208, Oct. 2010, doi: 10.1504/IJSNET.2010.036195.

[66] V. U. Panchami and N. Radhika, “A novel approach for predicting the length of hospital stay with DBSCAN and supervised classification algorithms,” in ICADIWT, 2014, pp. 207–212. Available: http://dblp.uni-trier.de/db/conf/icadiwt/icadiwt2014.html#PanchamiR14

[67] C. Plant et al., “Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer’s disease,” NeuroImage, vol. 50, no. 1, pp. 162–174, 2010. Available: http://dblp.uni-trier.de/db/journals/neuroimage/neuroimage50.html#PlantTOBMMBHE10

[68] S. E. Brossette, A. P. Sprague, J. M. Hardin, K. B. Waites, W. T. Jones, and S. A. Moser, “Association Rules and Data Mining in Hospital Infection Control and Public Health Surveillance,” Journal of the American Medical Informatics Association, vol. 5, no. 4, pp. 373–381, Jul. 1998, doi: 10.1136/jamia.1998.0050373.

[69] M. Kosinski and Y. Wang, “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images,” Journal of Personality and Social Psychology, vol. 114, no. 2, pp. 246–257, Feb. 2018.

[70] J. Taylor, “Four problems in using CRISP-DM and how to fix them,” KDnuggets.com, 2017.

[71] P. Boily, “Non-technical aspects of consulting,” Introduction to Quantitative Consulting, 2021.

[72] P. Boily, Introduction to Quantitative Consulting, 2021.

[73] A. De Mauro, M. Greco, and M. Grimaldi, “A formal definition of big data based on its essential features,” Library Review, vol. 65, no. 3, pp. 122–135, 2016.

[74] D. Robinson, “What’s the difference between data science, machine learning, and artificial intelligence?” Variance Explained, Jan. 2018. Available: http://varianceexplained.org/r/ds-ml-ai/

[75] D. Woods, “Bitly’s Hilary Mason on ‘What is a data scientist?’,” Forbes, Mar. 2012. Available: https://www.forbes.com/sites/danwoods/2012/03/08/hilary-mason-what-is-a-data-scientist/#1189ca465502

[76] C. C. Aggarwal and C. K. Reddy, Eds., Data clustering: Algorithms and applications. CRC Press, 2014. Available: http://www.charuaggarwal.net/clusterbook.pdf

[77] F. Provost and T. Fawcett, Data science for business. O’Reilly, 2015.

[78] A. M. Masci et al., “An improved ontological representation of dendritic cells as a paradigm for all cell types,” BMC Bioinformatics, 2009.

[79] boot4life, “What json structure to use for key-value pairs.” StackOverflow, Jun. 2016.

[80] V. M. Chawla, ERD "Crow’s Foot" Relationship Symbols Cheat Sheet, 2013.

[81] N. Feldman, Data Lake or Data Swamp?, 2015.

[82] P. Hapala et al., “Mapping the electrostatic force field of single molecules from high-resolution scanning probe images,” Nature Communications, vol. 7, no. 11560, 2016.

[83] P. Boily, S. Davies, and J. Schellinck, Practical data visualization. Data Action Lab/Quadrangle, 2021.

[84] P. Boily, MAT2377 - Probability and Statistics for Engineers.

[85] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: Data mining, inference, and prediction, 2nd ed. Springer, 2008.

[86] C. C. Aggarwal, Ed., Data classification: Algorithms and applications. CRC Press, 2015.

[87] C. C. Aggarwal, Data mining: The textbook. Cham: Springer, 2015. doi: 10.1007/978-3-319-14142-8.

[88] Wikipedia, “Cluster analysis algorithms.”

[89] R. Sutton and G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.

[90] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. Cambridge, MA: MIT Press, 2016.

[91] Y. Cissokho, S. Fadel, R. Millson, R. Pourhasan, and P. Boily, “Anomaly Detection and Outlier Analysis,” Data Science Report Series, 2020.

[92] T. Orchard and M. Woodbury, A missing information principle: Theory and applications. University of California Press, 1972.

[93] S. Hagiwara, “Nonresponse error in survey sampling: Comparison of different imputation methods.” Honours thesis, School of Mathematics and Statistics, Carleton University, 2012.

[94] T. Raghunathan, J. Lepkowski, J. Van Hoewyk, and P. Solenberger, “A multivariate technique for multiply imputing missing values using a sequence of regression models,” Survey Methodology, vol. 27, no. 1, pp. 85–95, 2001.

[95] S. van Buuren, Flexible imputation of missing data. CRC Press, 2012.

[96] D. B. Rubin, Multiple imputation for nonresponse in surveys. Wiley, 1987.

[97] P. Boily, “Principles of data collection,” Data Science Report Series, 2020. Available: https://www.data-action-lab.com/wp-content/uploads/2021/08/IQC_ch_1.pdf

[98] P. Boily, “An imputation algorithm of blood alcohol content levels for drivers and pedestrians in fatal collisions,” Data Science Report Series, 2007. Available: https://www.data-action-lab.com/wp-content/uploads/2021/08/IQC_ch_2_CS.pdf

[99] “Height percentile calculator, by age and country.” Tall Life.

[100] O. Leduc, A. Macfie, A. Maheshwari, M. Pelletier, and P. Boily, “Feature selection and dimension reduction,” Data Science Report Series, 2020.

[101] “Interactive visualization to teach about the curse of dimensionality.”

[102] D. Dua and C. Graff, “Liver disorders dataset at the UCI machine learning repository.” University of California, Irvine, School of Information and Computer Sciences, 2017.

[103] @DamianMingle. Available: https://twitter.com/DamianMingle/status/655534652833288192

[104] E. Tufte, Beautiful evidence. Graphics Press, 2008.

[105] T. Elms, Lexical distance of European languages. Etymologikon, 2008. Available: https://elms.wordpress.com/2008/03/04/lexical-distance-among-languages-of-europe/

[106] A. Cairo, The functional art. New Riders, 2013.

[107] A. Cairo, The truthful art. New Riders, 2016.

[108] N. Yau, FlowingData. Available: http://flowingdata.com

[109] I. Meirelles, Design for information. Rockport, 2013.

[110] P. Dragicevic and Y. Jansen, List of physical visualizations and related artifacts. Available: http://dataphys.org/list/

[111] Data Action Lab Podcast, Episode 3 - Minard’s March to Moscow, 2020.

[112] Data Action Lab, Data Analysis Short Course, 2020.

[113] R. A. Dahl, “Cause and effect in the study of politics,” in Cause and effect, D. Lerner, Ed. New York: Free Press, 1965, pp. 75–98.

[114] A. B. Hill, “The environment and disease: Association or causation?” Proc R Soc Med, vol. 58, no. 5, pp. 295–300, 1965.

[115] Z. Gemignani and C. Gemignani, Data fluency: Empowering your organization with effective data communication. Wiley, 2014.

[116] Z. Gemignani and C. Gemignani, A guide to creating dashboards people love to use. (ebook). Available: https://www.juiceanalytics.com

[117] S. Wexler, J. Shaffer, and A. Cotgreave, The big book of dashboards. Wiley, 2017.

[118] M. Pelletier and P. Boily, “Dashboard and data visualization, with examples,” Data Science Report Series, 2019.

[119] E. Tufte, The visual display of quantitative information. Graphics Press, 2001.

[120] C. Nussbaumer Knaflic, Storytelling with data. Wiley, 2015.

[121] Matillion.com, “Poor use of dashboard software.” Available: https://www.matillion.com/wp-content/uploads/2014/11/qlikview-poor-use-of-dashboard-software.png

[122] Geckoboard.com, “Two terrible dashboard examples.” Available: https://www.geckoboard.com/assets/2-terrible-dashboard-example-min.png

[123] H. Wickham, ggplot2: Elegant graphics for data analysis. Springer, 2021.

[124] H. Wickham, “A layered grammar of graphics,” Journal of Computational and Graphical Statistics, no. 19, pp. 3–28, 2009.

[125] K. Healy, Data Visualization: A Practical Introduction, 2018.

[126] H. Wickham, “Tidy data,” Journal of Statistical Software, vol. 59, no. 10, 2014.

[127] W. Chang, R graphics cookbook. O’Reilly, 2013.

[128] D. Barber, Bayesian reasoning and machine learning. Cambridge University Press, 2012.

[129] G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning: With applications in R. Springer, 2014.

[130] D. Dua and E. Karra Taniskidou, “UCI machine learning repository.” Irvine, CA: University of California, School of Information and Computer Science, 2017. Available: http://archive.ics.uci.edu/ml

[131] S. Canada, “Athlete rebate.”

[132] E. Siegel, Predictive analytics: The power to predict who will click, buy, lie or die. Predictive Analytics World, 2016.

[133] E. Garcia, C. Romero, S. Ventura, and T. Calders, “Drawbacks and solutions of applying association rule mining in learning management systems,” 2007.

[134] Wikipedia, “Association rule learning.” 2020. Available: https://en.wikipedia.org/wiki/Association_rule_learning

[135] E. R. Omiecinski, “Alternative interest measures for mining associations in databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 1, pp. 57–69, 2003, doi: 10.1109/TKDE.2003.1161582.

[136] G. Piatetsky-Shapiro, “Discovery, analysis, and presentation of strong rules,” 1991.

[137] C. C. Aggarwal and P. S. Yu, “A new framework for itemset generation,” in Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, 1998, pp. 18–24. doi: 10.1145/275487.275490.

[138] P.-N. Tan, V. Kumar, and J. Srivastava, “Selecting the right objective measure for association analysis,” Inf. Syst., vol. 29, no. 4, pp. 293–313, Jun. 2004, doi: 10.1016/S0306-4379(03)00072-3.

[139] M. Hahsler and K. Hornik, “New probabilistic interest measures for association rules,” CoRR, vol. abs/0803.0966, 2008. Available: http://arxiv.org/abs/0803.0966

[140] T. Chou, “Apriori: Association rule mining in-depth explanation and Python implementation,” Towards Data Science, Oct. 2020. Available: https://towardsdatascience.com/apriori-association-rule-mining-explanation-and-python-implementation-290b42afdfc6

[141] J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of massive datasets. Cambridge University Press, 2014.

[142] M. Risdal, “Exploring survival on the Titanic,” Kaggle.com, 2016.

[143] B. Kitts et al., “Click fraud detection: Adversarial pattern recognition over 5 years at Microsoft,” in Annals of information systems (special issue on data mining in real-world applications), Springer, 2015, pp. 181–201. doi: 10.1007/978-3-319-07812-0.

[144] B. Kitts, “The making of a large-scale ad server,” 2013.

[145] S. Fefilatyev et al., “Detection of anomalous particles from Deepwater Horizon oil spill using SIPPER3 underwater imaging platform,” in Data mining case studies IV, proceedings of the 11th IEEE international conference on data mining, Vancouver, BC: IEEE, 2011.

[146] B. Kitts, “Product targeting from rare events: Five years of one-to-one marketing at CPI,” Marketing Science Conference, 2005.

[147] L. Torgo, Data mining with R, 2nd ed. CRC Press, 2016.

[148] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: The lasso and generalizations. CRC Press, 2015.

[149] O. Leduc and P. Boily, “Boosting with AdaBoost and gradient boosting,” Data Action Lab Blog, 2019. Available: https://www.data-action-lab.com/2019/07/31/boosting-with-adaboost-and-gradient-boosting/

[150] C. P. Robert, Le choix bayésien - principes et pratique. Springer-Verlag France, 2006.

[151] B. Efron, Large scale inference: Empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, 2010.

[152] A. Ng and K. Soo, Eds., Surviving a disaster, in Numsense! Algobeans, 2016.

[153] D. H. Wolpert, “The lack of a priori distinctions between learning algorithms,” Neural Computation, vol. 8, no. 7, pp. 1341–1390, 1996, doi: 10.1162/neco.1996.8.7.1341.

[154] D. H. Wolpert and W. G. Macready, “Coevolutionary free lunches,” IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 721–735, 2005, doi: 10.1109/TEVC.2005.856205.

[155] J. Chambers and T. Hastie, Statistical models in S. Wadsworth & Brooks/Cole, 1992.

[156] E. Schubert, J. Sander, M. Ester, H. P. Kriegel, and X. Xu, “DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN,” ACM Trans. Database Syst., vol. 42, no. 3, Jul. 2017, doi: 10.1145/3068335.

[157] J. d’Huy, “Scientists trace society’s myths to primordial origins,” Scientific American (Online), Sep. 2016.

[158] U. Habib, K. Hayat, and G. Zucker, “Complex building’s energy system operation patterns analysis using bag of words representation with hierarchical clustering,” Complex Adapt. Syst. Model., vol. 4, p. 8, 2016, doi: 10.1186/s40294-016-0020-0.

[159] M. Orlowska et al., “A comparison of antioxidant, antibacterial, and anticancer activity of the selected thyme species by means of hierarchical clustering and principal component analysis,” Acta Chromatographica, vol. 28, no. 2, pp. 207–221, 2016, doi: 10.1556/achrom.28.2016.2.7.

[160] V. U. Panchami and N. Radhika, “A novel approach for predicting the length of hospital stay with DBSCAN and supervised classification algorithms,” in ICADIWT, 2014, pp. 207–212. Available: http://dblp.uni-trier.de/db/conf/icadiwt/icadiwt2014.html#PanchamiR14

[161] A. Jawad, K. Kersting, and N. Andrienko, “Where traffic meets DNA: Mobility mining using biological sequence analysis revisited,” in Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems, 2011, pp. 357–360. doi: 10.1145/2093973.2094022.

[162] G. Schoier and G. Borruso, “Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes,” in Computational science and its applications - ICCSA 2011, 2011, pp. 454–465.

[163] N. Harris, “Visualizing DBSCAN clustering.”

[164] B. Desgraupes, ClusterCrit: Clustering indices. 2018. Available: https://CRAN.R-project.org/package=clusterCrit

[165] Z. Cheng, J. Caverlee, K. Lee, and D. Z. Sui, “Exploring millions of footprints in location sharing services,” in ICWSM, 2011. Available: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2011.html#ChengCLS11

[166] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.

[167] A. M. Raja, “Penguins dataset overview - iris alternative,” Towards Data Science, Jun. 2020.

[168] Q. E. McCallum, Bad data handbook. O’Reilly, 2013.

[169] A. K. Maheshwari, Business intelligence and data mining. Business Expert Press, 2015.

[170] I. Stewart, J. Cohen, and T. Pratchett, The science of Discworld II: The globe. Ebury Publishing, 2011. Available: https://books.google.ca/books?id=MyozhndBMZkC

[171] M. Iqbal, “Spotify Revenue and Usage Statistics.” Business of Apps, 2021. Available: https://www.businessofapps.com/data/spotify-statistics/

[172] B. Plantinga, “What do Spotify’s audio features tell us about this year’s Eurovision Song Contest?” medium.com, 2018. Available: https://medium.com/@boplantinga/what-do-spotifys-audio-features-tell-us-about-this-year-s-eurovision-song-contest-66ad188e112a