Fall: September-December

Block 1 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| DSCI 511 | Programming for Data Science | Program design and data manipulation with Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries. | Quan Nguyen | Quan Nguyen |
| DSCI 521 | Computing Platforms for Data Science | How to install, maintain, and use the data science software stack. The Unix shell, version control, and problem-solving strategies. Literate programming documents. | Daniel Chen | Daniel Chen |
| DSCI 523 | Programming for Data Manipulation | Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis. | Tiffany Timbers | Tiffany Timbers |
| DSCI 551 | Descriptive Statistics and Probability for Data Science | Fundamental concepts in probability, including conditional, joint, and marginal distributions. Statistical view of data coming from a probability distribution. | Alexi Rodríguez-Arelis, Katie Burak | Alexi Rodríguez-Arelis, Katie Burak |

Block 2 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| DSCI 512 | Algorithms & Data Structures | How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability). | Jian Zhu | Jungyeul Park |
| DSCI 531 | Data Visualization I | Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python. | Joel Östblom | Joel Östblom |
| DSCI 552 | Statistical Inference and Computation I | The statistical and probabilistic foundations of inference. Large-sample results. The frequentist paradigm. | Tiffany Timbers, Katie Burak | Tiffany Timbers, Katie Burak |
| DSCI 571 | Supervised Learning I | Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Approaches such as k-NN, decision trees, and linear classifiers. | Varada Kolhatkar | Varada Kolhatkar |

Block 3 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| COLX 521 | Corpus Linguistics | Basic processing of text corpora using Python. Includes string manipulation, corpus readers, linguistic comparison of corpora, structured text formats, and text preprocessing tools. | Garrett Nicolai | Jungyeul Park |
| DSCI 513 | Databases & Data Retrieval | How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data. | Gittu George | Gittu George |
| DSCI 561 | Regression I | Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction. | Katie Burak | Katie Burak |
| DSCI 573 | Feature and Model Selection | How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization. | Joel Östblom | Joel Östblom |

Winter: January-April

Block 4 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| COLX 535 | Parsing for Computational Linguistics | The identification of syntactic structure in natural language. Parsing algorithms for popular grammar formalisms, application of statistical information to parsing, parser evaluation, and extraction of parse features. | Miikka Silfverberg | Jungyeul Park |
| COLX 561 | Computational Semantics | How meaning is represented by computers. An overview of popular semantic resources and techniques for building new resources from unstructured text data. | Garrett Nicolai | Jungyeul Park |
| DSCI 563 | Unsupervised Learning | How to find groups and other structure in unlabeled, possibly high-dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, and model fitting via the EM algorithm. | Garrett Nicolai | Jungyeul Park |
| DSCI 572 | Supervised Learning II | Introduction to optimization. Gradient descent and stochastic gradient descent. Roundoff error and finite differences. Neural networks and deep learning. | Jian Zhu | Jungyeul Park |

Block 5 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| COLX 523 | Advanced Corpus Linguistics | Collection and curation of text corpora. How to pull representative datasets from internet sources. Techniques for efficient and reliable annotation. | Garrett Nicolai | Jungyeul Park |
| COLX 525 | Computational Morphology | Approaches to sub-word phenomena in language processing. Automatic morphological analysis of diverse languages, part-of-speech tagging, word segmentation, and character-level neural network models. | Miikka Silfverberg | Jungyeul Park |
| COLX 531 | Machine Translation | Key methodologies for automatic translation between languages, with a focus on statistical and neural machine translation (MT) approaches. Applying MT architectures to analogous monolingual tasks. MT evaluation. | Jian Zhu | Jungyeul Park |
| COLX 565 | Sentiment Analysis | Identification and analysis of opinion, especially in social media. Text polarity and emotion classification, fine-grained (e.g., aspect-based) opinion mining, argumentation mining, and sentiment in social networks. | Garrett Nicolai | Jungyeul Park |

Block 6 (4 weeks, 4 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor | 2023-24 Lab Instructor |
|---|---|---|---|---|
| COLX 563 | Advanced Computational Semantics | Application of machine learning to various semantic tasks. Likely topics include: information extraction, semantic role labelling, semantic parsing, discourse parsing, question answering, summarization, and natural language inference. | Miikka Silfverberg | Jungyeul Park |
| COLX 581 | Natural Language Processing for Low-Resource Languages | Building automatic language tools when data is scarce. Rule-based and hybrid systems, semi-supervised learning, active learning. Knowledge transfer from other (related) languages. | Miikka Silfverberg | Jungyeul Park |
| COLX 585 | Trends in Computational Linguistics | Cutting-edge techniques in natural language processing. For this iteration, the latest innovations in neural network architectures. | Jian Zhu | Jungyeul Park |
| DSCI 541 | Privacy, Ethics & Security | The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies. | Garrett Nicolai | Jungyeul Park |

Spring: May-June

Capstone Project (8-10 weeks, 6 credits)

| Course Number | Course Title | Short Description | 2023-24 Lecture Instructor |
|---|---|---|---|
| COLX 595 | Capstone Project | A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a project report, presentation, and possibly other products, such as a web application. | MDS-CL Staff |