CS738 (Winter 2026)

Data Engineering for Data Science

  • Instructor: M. Tamer Özsu (Office: DC3350)

  • Lecture Room: MC 2034

  • Lecture Time: Monday & Wednesday 2:30-3:50pm

  • Office Hour: Monday 1:00-2:00pm

Calendar Description

Introduction to data engineering issues in data science. Data management technology objectives. Structured data management: relational database technology, database workloads (OLTP vs OLAP). Big data issues: dealing with volume (geo-distributed, cluster parallel, and cloud-native data management), dealing with variety (data type-native systems, NoSQL database systems), dealing with velocity (streaming data management), and big data processing platforms (MapReduce, Spark). Data preparation pipeline: data acquisition, data integration (data warehouses, data lakes, lakehouses), dataset selection, data quality and cleaning, data provenance management. Introduction to several current topics in database research, such as large language models and vector databases.

Open to Master of Data Science and Artificial Intelligence students and others without an undergraduate course on database systems (instructor approval required).

Course Logistics

  • This is a course that is specially designed for the data science program. It is an in-person course and no accommodations are made for remote attendance. Please make arrangements to attend lectures.

  • The course will use LEARN for dissemination of notes and for discussions. I have set up discussion topics for different components of the course. Please post your questions in the appropriate forum rather than sending them to me by email.

  • I will be posting lecture slides on LEARN (look under Content/Course Slides). However, they may be posted shortly before lectures or sometimes even after a lecture. Some of them will be detailed, others just a skeleton. So, it is important to attend lectures to get the most from these.

  • There is no textbook for the course. I have started to write my notes and I'll be posting them on LEARN (look under Content/Course Notes). I make no promises about the availability of these notes for every topic -- I'll write as much as I can. I may assign reading from other textbooks and papers as appropriate.

  • I intend to have guest lecturers for some topics and will update the schedule as I get them confirmed. The guest lecture material is an important and integral component of the course.

  • Some logistics:

    • The lecture times are MW 2:30-3:50pm

    • My office hour is M 1:00-2:00pm

    • The TA for the course is Sepideh Abedini (email).

  • The final exam schedule will be announced by the Registrar's Office in due course, and I cannot change it. There will be no makeup for the final; if you miss it, you will need to take it the next time the course is offered (Fall 2026). You have to pass the final exam to pass the course.

  • There will be five quizzes in the course. These will be 20-30 minute quizzes and will likely be taken online within LEARN (I have not yet worked out the logistics). There will be no makeups for the quizzes. If you miss some, their weight will be redistributed to the other quizzes. You have to take at least three of the quizzes to pass the course.

  • There will be two paper reviews. The logistics of these will be revealed later.

  • There will be homework assignments to help you work through the material, but these won't be marked; they are for your own review. I will provide solutions after the deadline for working on them has passed.

  • Each student has two 48-hour extensions. They may be used on the paper reviews (at most one per paper review). Email me and the TA at least 24 hours before the deadline to let us know that you're using one, and why. We will adjust the deadline on LEARN.

Use of Generative AI Tools

  • Students can use generative AI tools as an aid, but have to write their own text in paper reviews. Submissions will be checked using appropriate tools. Note that generative AI tools are known to hallucinate: they may fabricate facts and express ideas inaccurately. They also commonly falsify references to other work.

  • In addition, you should be aware that the legal/copyright status of generative AI inputs and outputs is unclear. Exercise caution when using large portions of content from AI sources.

  • Bottom line: you are accountable for the content and accuracy of all work you submit in this class, including any work supported by generative AI. You should be able to readily demonstrate your knowledge of your submissions. It is your responsibility to check the output of these tools and to use them responsibly.

Syllabus & Schedule (Subject to adjustments)

Marking Scheme (Tentative)

  • Paper critiques (2): 40% (Guidelines)

  • Quizzes (5): 20%

  • Final: 40%
