Fall: September-December
Block 1 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
DSCI 511 | Programming for Data Science | Program design and data manipulation with Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries. | Scott Mackie | Jungyeul Park |
DSCI 521 | Computing Platforms for Data Science | How to install, maintain, and use the data science software stack. The Unix shell, version control, and problem-solving strategies. Literate programming documents. | TBA | TBA |
DSCI 523 | Programming for Data Manipulation | Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis. | TBA | TBA |
DSCI 551 | Descriptive Statistics and Probability for Data Science | Fundamental concepts in probability including conditional, joint, and marginal distributions. Statistical view of data coming from a probability distribution. | TBA | TBA |
Block 2 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
DSCI 512 | Algorithms & Data Structures | How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability). | Jian Zhu | Jungyeul Park |
DSCI 531 | Data Visualization I | Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python. | TBA | TBA |
DSCI 552 | Statistical Inference and Computation I | The statistical and probabilistic foundations of inference. Large sample results. The frequentist paradigm. | TBA | TBA |
DSCI 571 | Supervised Learning I | Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Various approaches such as K-NN, decision trees, linear classifiers. | TBA | TBA |
Block 3 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
COLX 521 | Corpus Linguistics | Basic processing of text corpora using Python. Includes string manipulation, corpus readers, linguistic comparison of corpora, structured text formats, and text preprocessing tools. | Garrett Nicolai | Jungyeul Park |
DSCI 513 | Databases & Data Retrieval | How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data. | TBA | TBA |
DSCI 561 | Regression I | Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction. | TBA | TBA |
DSCI 573 | Feature and Model Selection | How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization. | TBA | TBA |
Winter: January-April
Block 4 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
COLX 535 | Parsing for Computational Linguistics | The identification of syntactic structure in natural language. Parsing algorithms for popular grammar formalisms, application of statistical information to parsing, parser evaluation, and extraction of parse features. | Garrett Nicolai | Jungyeul Park |
COLX 561 | Computational Semantics | How meaning is represented by computers. An overview of popular semantic resources, and techniques for building new resources from unstructured text data. | Scott Mackie | Jungyeul Park |
DSCI 563 | Unsupervised Learning | How to find groups and other structure in unlabeled, possibly high-dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, model fitting via the EM algorithm. | Garrett Nicolai | Jungyeul Park |
DSCI 572 | Supervised Learning II | Introduction to optimization. Gradient descent and stochastic gradient descent. Roundoff error and finite differences. Neural networks and deep learning. | Jian Zhu | Jungyeul Park |
Block 5 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
COLX 523 | Advanced Corpus Linguistics | Text corpora collection and curation. How to pull representative datasets from internet sources. Techniques for efficient and reliable annotation. | Garrett Nicolai | Jungyeul Park |
COLX 525 | Computational Morphology | Approaches to sub-word phenomena in language processing. Automatic morphological analysis of diverse languages, part-of-speech tagging, word segmentation, and character-level neural network models. | Garrett Nicolai | Jungyeul Park |
COLX 531 | Machine Translation | Key methodologies for automatic translation between languages, with a focus on statistical and neural machine translation approaches. Applying Machine Translation (MT) architectures to analogous monolingual tasks. MT evaluation. | Jian Zhu | Jungyeul Park |
COLX 565 | Sentiment Analysis | Identification and analysis of opinion, especially in social media. Text polarity and emotion classification, fine-grained (e.g. aspectual) opinion mining, argumentation mining, sentiment in social networks. | Muhammad Abdul-Mageed | Jungyeul Park |
Block 6 (4 weeks, 4 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor | 2024-25 Lab Instructor |
---|---|---|---|---|
COLX 563 | Advanced Computational Semantics | Application of machine learning to various semantic tasks. Likely topics include: information extraction, semantic role labelling, semantic parsing, discourse parsing, question answering, summarization, and natural language inference. | Scott Mackie | Jungyeul Park |
COLX 581 | Natural Language Processing for Low-Resource Languages | Building automatic language tools when data is scarce. Rule-based and hybrid systems, semi-supervised learning, active learning. Knowledge transfer from other (related) languages. | Garrett Nicolai | Jungyeul Park |
COLX 585 | Trends in Computational Linguistics | Cutting-edge techniques in natural language processing. For this iteration, the latest innovations in neural network architectures. | Jian Zhu | Jungyeul Park |
DSCI 541 | Privacy, Ethics & Security | The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies. | Garrett Nicolai | Jungyeul Park |
Spring: May-June
Capstone Project (8-10 weeks, 6 credits)
Course Number | Course Title | Short Description | 2024-25 Lecture Instructor |
---|---|---|---|
COLX 595 | Capstone Project | A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a project report, presentation, and possibly other products, such as a web application. | MDS-CL Faculty |