
Fall 2015 Undergraduate & Graduate Special Topic Courses

CSC495-001   Introduction to Data Science – Dr. Chirkova

Prerequisites: Knowledge of linear algebra at the undergraduate level (MA305 or MA405); knowledge of statistics at the undergraduate level (STAT 305, 312, 370, 372); programming in any language like Python, C++, FORTRAN, or Java (CSC 112, 114, 116); or permission of instructor. Programming in R, SAS, or Matlab is desirable but not required.

Description: Data Science has become increasingly important in nearly every industry sector and academic field. It has gained significant national attention and interest by combining techniques from several fields like Computer Science, Statistics, and Mathematics to extract knowledge from data. This course provides an overview of several foundational topics in Data Science and will expose students to the theory and algorithms underlying these techniques, as well as the use of commercial software packages. The class will include a mix of lectures and programming projects.

Course Structure: The course will be taught jointly by the Departments of Computer Science and Statistics. Experts on each topic will give lectures, and the course will include several types of assignments:

  • Tutorials: R is a popular statistical computing and graphics package used by many data scientists. In addition, many data scientists use analytics packages like those in SAS Enterprise Miner. Class lab work will cover R and SAS basics, and students will complete tutorial exercises.
  • Programming Projects: Team-based programming projects that use the tools and techniques learned during class to solve specific real-world problems. Teams will be formed that balance student skills.
  • Homework Assignments: Weekly homework assignments designed to reinforce and practice the topics learned that week in class.

Course Outline: The course will be divided into four segments:

  • Introduction to Analysis: This segment introduces the topic of Data Science and presents several qualitative and quantitative methods analysts use to solve problems. Students will learn structured analytic techniques, including the Analysis of Competing Hypotheses, contrarian techniques, and imaginative thinking techniques.
  • Data Infrastructure: This segment provides a broad overview of the data life cycle, cloud technology, MapReduce architectures, SQL and NoSQL, data wrangling, and Semantic Web technologies.
  • Analytics: This segment reviews fundamental statistical inference, metrics, prediction using simple and multiple regression, and correlation. Supervised classification techniques will be covered, including k-nearest neighbor, logistic regression, naive Bayes, support vector machines and others. Unsupervised clustering techniques will also be addressed.
  • Visualization: This segment provides an overview of graphs and maps, with an introduction to geospatial analytics.

CSC591-002 Data Driven Decision Making – Dr. Streck

This course will provide students with an understanding of the criteria required in decision making, including quantifying stakeholder value, dealing with uncertainty and risk, and critical problem-solving methodologies. Understanding and qualifying data sources, use of structured and unstructured data, and unstructured text analytics will be used to provide input into the decision-making process. Students will learn how data can be transformed into business intelligence (BI) while participating in an action learning setting. The course focuses on exploring the decision processes of the data science field.

CSC495/591-003 Experimental Algorithms – Dr. Stallmann

Many “real world” problems are NP-hard and are therefore unlikely to admit efficient algorithms that obtain optimal solutions. Thus it is important to be able to assess the behavior of heuristics (no guarantee of optimality) and exact algorithms (guaranteed optimality but no guarantees on runtime) with respect to solution quality and execution time.

There are also problems that do have polynomial time algorithms but the best choice among several such algorithms is unclear, either because all have the same asymptotic runtime or because the asymptotic runtime, due to large constants, may not be indicative of actual runtime. The goal of this course is to explore these issues in a hands-on fashion, via a project that addresses a particular problem.
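As a small illustration of the heuristic versus exact-algorithm trade-off described above, the following Python sketch compares a greedy heuristic with exhaustive search on the 0/1 knapsack problem. The problem choice, the function names, and the item data are hypothetical and are not taken from the course materials.

```python
from itertools import combinations

def greedy_knapsack(items, capacity):
    """Heuristic: take items by decreasing value/weight ratio.
    Runs in O(n log n) time but does not guarantee an optimal answer."""
    total_value = 0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total_value += value
            remaining -= weight
    return total_value

def exact_knapsack(items, capacity):
    """Exact: try every subset of items.
    Always optimal, but the running time grows exponentially with len(items)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(weight for _, weight in subset) <= capacity:
                best = max(best, sum(value for value, _ in subset))
    return best

# Hypothetical instance: (value, weight) pairs and a weight limit.
items = [(60, 10), (100, 20), (120, 30)]
capacity = 50

print("greedy:", greedy_knapsack(items, capacity))
print("exact: ", exact_knapsack(items, capacity))
```

On this instance the greedy heuristic returns a lower total value than the exhaustive search, which is the kind of gap in solution quality (traded against execution time) that an experimental study would measure across many instances.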

Prerequisites: CSC 316 (for undergraduates), CSC 505 (for graduate students), or a similar data structures and algorithms course; proficiency in at least one high-level programming language such as Java or C.

CSC591-005 Foundations of Data Science – Dr. Vatsavai

Students will learn core statistical data analysis principles. This course introduces the principal ideas in statistical learning and helps students prepare for advanced courses in data mining and machine learning. Attention will also be given to applying these principles to a variety of data analysis tasks using R.

Prerequisite: None. An undergraduate course in statistics is helpful but not required.

Topics: Random variables and probability distributions, exploratory data analysis, variable selection, sampling methods, histograms and probability distributions, density estimation, missing data and imputation, mixture models, latent variables and expectation maximization, regression analysis, discriminant analysis, bagging and boosting, principal component analysis, information theory (entropy, mutual information), the Bayesian information criterion, conditional independence, rescaling and low-dimensional summaries, factor analysis, graphical causal models and causal inference, and evaluating predictive models.
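To give a flavor of the information theory topics listed above, here is a minimal Python sketch of empirical entropy and mutual information for discrete samples. The course itself uses R; the function names and the sample data are illustrative assumptions, not course material.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy, in bits, of a non-empty sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical samples: two balanced binary variables.
xs = [0, 0, 1, 1]
ys = [0, 1, 0, 1]

print(entropy(xs))                 # entropy of a balanced binary variable
print(mutual_information(xs, ys))  # xs and ys are independent in this sample
print(mutual_information(xs, xs))  # a variable shares all its information with itself
```

These are plug-in estimates computed from observed frequencies, so they are only as reliable as the sample is large.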

CSC591-006 Data Intensive Computing – Dr. Freeh

This project-oriented course will survey many distributed computing frameworks, such as Hadoop, BOINC, and HPCC. Each student will work in a medium-size group on a semester-long project using the above frameworks and supporting systems, such as HDFS, NoSQL (e.g., MongoDB), and Hive. This course will explore processing massive amounts of data. Students will study current software frameworks and tools. They will understand the design principles underlying large clusters that support data intensive computing.

Prerequisite: Because the project is a significant portion of the grade, students are expected to have a strong systems background, including completion of CSC 501.

CSC591-008 Privacy – Dr. Staddon

Anonymization/Reidentification:

  • Reidentification attacks (2 weeks)
  • Database protection mechanisms, k-anonymity, l-diversity, t-closeness (2 weeks)
  • Differential privacy (1 week)
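As a rough illustration of one of the database protection notions listed above, the Python sketch below computes the k for which a small table is k-anonymous: every combination of quasi-identifier values must be shared by at least k records. The function name, field names, and records are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values. The table is k-anonymous for this k (and any smaller k).
    Assumes records is non-empty."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already generalized table: ZIP codes truncated, ages bucketed.
records = [
    {"zip": "276**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "276**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "275**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "275**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "275**", "age": "30-39", "diagnosis": "diabetes"},
]

print(k_anonymity(records, ["zip", "age"]))
```

k-anonymity alone says nothing about how varied the sensitive values are within each group, which is the gap that l-diversity and t-closeness are meant to address.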

PETs and Usability:

  • Private information retrieval
  • Search over encrypted data
  • Targeting/Personalization and privacy
  • Anonymous communication protocols (e.g. onion routing)
  • Enhancing the privacy of text (e.g. stylometry analysis, inference detection)

Privacy Measurement:

  • Machine learning approaches to measuring privacy concerns
  • Qualitative approaches to measuring privacy concerns
  • Data mining techniques for inference detection
  • Privacy survey design techniques

CSC591/791-001 Automated Software Engineering – Dr. Menzies (menzies.us)

Synopsis: What is the next "big thing" after "big data"? Well, after "data collection" comes "model construction" so the next big thing after big data will be "big modeling". In this subject, students will learn how to represent, execute, and reason about models. Our case studies will come from software engineering but the principles of this subject could be applied to models in general.

Topics: AI and software engineering; principles of model-based reasoning with a heavy focus on models about software engineering; search-based and evolutionary inference; representing and reasoning about models; handling uncertainty; decision making and model-based reasoning.

Project: Students will implement and reason about a large model of their own choosing (ideally, some model relating to software engineering).

Prerequisite: A programming background in a contemporary language such as Java, C/C++, or Python is required. Students in this class will work in Python, but no prior knowledge of that language will be assumed.

CSC591/791-007 DevOps – Dr. Parnin

Modern software development organizations require entire DevOps teams to automate and maintain the software engineering processes and infrastructure vital to the organization. In this course, you will gain practical exposure to the skills, tools, and knowledge needed to automate software engineering processes and infrastructure. Students will have the chance to build new software engineering tools or extend existing ones, and to design a DevOps pipeline.

Prerequisite: CSC 510 - Software Engineering

CSC591/791-009 Social Computing – Dr. Singh

This course surveys the field of social computing. It provides an introduction to the key concepts, paradigms, and techniques of social computing and social analytics. Specific topics will be selected from the following list: social media, social network analysis, typology of social relationships, crowdsourcing, prediction markets, organizational modeling, contracts, norms, mobility and social context, sociotechnical systems, social interpretation of information, computational models, software engineering for social computing.