Big Data Analytics (EECS E6893) and Advanced Big Data Analytics (EECS E6895)





E6893 Student List and Group Assignments
Big Data Public Dataset Information


COURSE BENEFITS:

  • Students will gain knowledge of how to analyze Big Data. This is an introductory course for graduate students who expect to face Big Data storage, processing, analysis, visualization, and application issues in both workplaces and research environments.

  • Gain knowledge of this fast-changing technological direction. Big Data Analytics is probably the fastest-evolving area in the IT world now. New tools and algorithms are being created and adopted swiftly. Get insight into which tools, algorithms, and platforms to use for which types of real-world use cases.

  • Get hands-on experience with Analytics, Mobile, Social, and Security issues in Big Data through homework assignments and a final project.

  • Final Project Reports will be published as Proceedings and Final Project Software will become Open Source. (Sapphirine Big Data Analytics Open Source Applications first release: Dec 22, 2014)


PROFESSOR CHING-YUNG LIN:


    Dr. Ching-Yung Lin is the IBM Chief Scientist, Graph Computing Research and an IBM Distinguished Researcher. He is also an IEEE Fellow and IEEE Distinguished Lecturer. He has also been an Adjunct Professor at Columbia University since 2005 and at New York University since 2014. His interests are mainly in fundamental research on large-scale multimodality signal understanding, network graph computing, and computational social & cognitive sciences, and in applied research on security, commerce, and collaboration. Since 2011, he has been leading a team of more than 40 Ph.D. researchers in worldwide IBM Research Labs and more than 20 professors and researchers in 9 universities (Northeastern, Northwestern, Columbia, Minnesota, Rutgers, CMU, New Mexico, USC, and UC Berkeley). He is currently the Principal Investigator of three major Big Data projects: DARPA Anomaly Detection at Multiple Scales (ADAMS), DARPA Social Media in Strategic Communications (SMISC), and ARL Social and Cognitive Network Academic Research Center (SCNARC). He leads a major IBM R&D initiative on Linked Big Data called IBM System G. Dr. Lin was the first IEEE Fellow elected for contributions to Network Science. His team recently earned Best Paper Awards at ACM CIKM 2012 and IEEE BigData 2013.

 


APPLICABLE DEGREE PROGRAMS:

  • Recommended for MS or Ph.D. students in Electrical Engineering, Computer Science, or any discipline that requires big data analytics.
  • Most courses 4000-level and above can be credited to all degree programs.  All courses are subject to advisor approval.

COURSE FEES:

  • None


EECS E6893: Big Data Analytics

Lecturer/Manager:

Ching-Yung Lin

 

Office Hours:

Thursday 9:30 - 10:00pm or by appointment

Office Location/Phone:

SIPA 417

Email Address:

c {dot} lin {at} columbia {dot} edu

 

Day & Time Class Meets on Campus:

Thursday 7:00pm - 9:30pm

Location:

International Affairs Building (SIPA building) 417

Credits for course:

3

Class Type:

Lecture


Prerequisites:

This will be a hands-on course. Students need to know at least one programming language (C, C++, Java, Perl, Python, and/or JavaScript) to complete the homework assignments and final project.


Description:

With advances in IT storage, processing, computation, and sensing technologies, Big Data has become a new norm of life. Only recently have computers become able to capture and analyze all sorts of large-scale data from all kinds of fields: people, behavior, information, devices, sensors, biological signals, finance, vehicles, astronomy, neurology, etc. Almost all industries are bracing for the challenge of Big Data and want to dig out valuable information and gain insight to solve their problems.


This course provides the fundamental knowledge students need to handle those challenges. The discipline inherently involves many fields. Because of its importance and broad impact, new software and hardware tools and algorithms are quickly emerging. A data scientist needs to keep up with these ever-changing trends to be able to create state-of-the-art solutions for real-world challenges.


This Big Data Analytics course first gives an overview of applications, market trends, and the things to learn. Then, I will introduce the fundamental platforms, such as Hadoop and Spark, and other tools, such as IBM System G for Linked Big Data. Afterwards, the course will introduce several data storage methods and how to upload, distribute, and process data in them. This includes HDFS, HBase, KV stores, document databases, and graph databases. The course will go on to introduce different ways of handling analytics algorithms on different platforms. Then, I will introduce visualization issues and mobile issues in Big Data Analytics. Students will then have the fundamental knowledge of Big Data Analytics needed to handle various real-world challenges.


Afterwards, the course will zoom in to discuss large-scale machine learning methods that are foundations for artificial intelligence and cognitive networks. The course will discuss several methods to optimize the analytics based on different hardware platforms, such as Intel & Power chips, GPU, FPGA, etc. The lectures will conclude with an introduction to the future challenges of Big Data, especially the ongoing Linked Big Data issues, which involve graphs, graphical models, spatio-temporal analysis, cognitive analytics, etc.


Students will choose their own topics for a final project. The application domain can be based on the students' own interests. This will be a good opportunity for students to apply what they have learned in the class to their own needs, whether future work requirements or research problems at hand.


TAs (Graders):

Ghazal Fazelnia (gf2293), Rishina Tah (rt2545), Junkai Yan (jy2654), Siyuan Zhang (sz2476), Rama Kompella (rk2797), Liqun Chen (lc3041), Yongchen Jiang (yj2338), and Tian Han (th2569)

Office Hours:

Monday 9-11am: Ghazal Fazelnia
Monday 3:30-5:30pm: Rama Kompella
Tuesday 9:30-11:30am: Siyuan Zhang
Tuesday 4-6pm: Tian Han
Wednesday 4-6pm: Yongchen Jiang
Thursday 9:30-11:30am: Liqun Chen
Friday 9:30-11:30am: Rishina Tah
Friday 4-6pm: Junkai Yan

Office Location/Phone:

CS TA room

 

Required Textbook(s):

None

Reference Textbook(s):

Class notes and reference books or papers

Homework(s):

Three assignments (HW#1 - HW#3) including programming and written reports.

Project(s):

Final project in which students conduct research and hands-on implementation on a self-selected topic in Big Data Analytics. Team collaboration of up to 3 students is encouraged.

Paper(s):

A report for each homework, the final project proposal, and the final project result. An oral presentation of the final project results is required. Remote students will use a web conferencing tool to present the final project.

Midterm Exam:

None

Final Exam:

None

Grading:

Three homework assignments: 50%, Final Project (proposal, presentation, and report): 50%

Hardware requirements:

PC with Internet access.

Software requirements:

Students may use their preferred programming language (C, C++, Java, Python, Perl, and/or JavaScript) on their own computers to complete homework assignments.

Homework submission:

Through Columbia CourseWorks

Course Outline

Class Date | Class Number | Topics Covered | Assignment | Due
09/10/15 | 1 | Introduction to Big Data Analytics | - | -
09/17/15 | 2 | Big Data Platforms | HW #0 (Download Hadoop, no submission) | -
09/24/15 | 3 | Big Data Storage and Processing | HW #1 (Data Store & Processing -- Pig, HBase, Hive, and Oozie) | -
10/01/15 | 4 | Big Data Analytics Algorithms -- I (recommender) | - | HW #1
10/08/15 | 5 | Big Data Analytics Algorithms -- II (clustering) | HW #2 (Recommendation, Clustering, and Classification) | -
10/15/15 | 6 | Big Data Analytics Algorithms -- III (classification) | - | -
10/22/15 | 7 | Spark and Data Analytics | - | HW #2
10/29/15 | 8 | Linked Big Data -- I (Graph DB) | HW #3 (In-Memory and Graph Computing -- Spark and System G) | -
11/05/15 | 9 | Linked Big Data -- II (Graph Analytics) | - | -
11/12/15 | 10 | Big Data Applications (TBA) | - | HW #3
11/19/15 | 11 | Final Project Proposal Presentations | - | Proposal Slides
11/26/15 | - | NO CLASS -- Thanksgiving Holiday | - | -
12/03/15 | 12 | Big Data Visualization | - | -
12/10/15 | 13 | Big Data Applications (TBA) | - | -
12/17/15 & 12/18/15 | 14 | Big Data Analytics Workshop | - | Final Project Slides


EECS E6895: Advanced Big Data Analytics

Lecturer/Manager:

Ching-Yung Lin

 

Office Hours:

Thursday 9:30 - 10:00pm or by appointment

Office Location/Phone:

Mudd 535

Email Address:

c {dot} lin {at} columbia {dot} edu

 

Day & Time Class Meets on Campus:

Thursday 7:00pm - 9:30pm

Location:

Hamilton 517

Credits for course:

3

Class Type:

Lecture


Prerequisites:

This will be a hands-on course. Students need to know at least one programming language (C, C++, Java, Perl, Python, and/or JavaScript) to complete the homework assignments and final project.


TAs (Graders):

Eric Johnson and David Naveen Dh Arthur

Office Hours:

Eric Johnson: Monday 8-10pm; David Arthur: Friday 2-4pm

Office Location/Phone:

CS TA room, Mudd building

 

Required Textbook(s):

None

Reference Textbook(s):

Class notes and reference books or papers

Homework(s):

Three assignments (HW#1 - HW#3) including programming and written reports.

Project(s):

Final project in which students conduct research and hands-on implementation on a self-selected topic in Big Data Analytics. Team collaboration of up to 2 students is encouraged.

Paper(s):

A report for each homework, the final project proposal, and the final project result. An oral presentation of the final project results is required. Remote students will use a web conferencing tool to present the final project.

Midterm Exam:

None

Final Exam:

None

Grading:

Three homework assignments: 50%, Final Project (proposal, presentation, and report): 50%

Hardware requirements:

PC with Internet access.

Software requirements:

Students may use their preferred programming language (C, C++, Java, Python, Perl, and/or JavaScript) on their own computers to complete homework assignments.

Homework submission:

Through Columbia CourseWorks

Class Date | Class Number | Topics Covered | Assignment | Due
01/21/16 | 1 | Introduction to Advanced Big Data Analytics | - | -
01/28/16 | 2 | Big Data Analytics Case Study | - | -
02/04/16 | 3 | Spark and Data Analytics | HW #1 | -
02/11/16 | 4 | Data Store | - | -
02/18/16 | 5 | Social and Cognitive Analytics | HW #2 | HW #1
02/25/16 | 6 | Social and Cognitive Analytics II | - | -
03/03/16 | 7 | GPU and CUDA | HW #3 | -
03/10/16 | 8 | GPU Programming on Mac, iOS, and AWS | - | HW #2 (March 11, 9am)
03/24/16 | 9 | Advanced GPU Programming | - | -
03/31/16 | 10 | Hardware Acceleration for Machine Learning and Big Data Analytics | - | HW #3
04/07/16 | 11 | Final Project Proposal Presentations | - | -
04/14/16 | 12 | Large-Scale Multimedia Analysis | - | -
04/21/16 | 13 | Encrypted Domain Data Mining | - | -
04/28/16 | 14 | Big Data Visualization | - | -
05/12/16 | 15 | Final Project Presentations | - | Final Project Slides