Big Data Team

UST's Big Data Team

The Opportunity

Anyone involved with technology or software is likely to have heard the term “Big Data” used with increasing regularity. It is a fast-growing part of information technology that is widely acknowledged to be rich ground for research, entrepreneurism and engineering. A McKinsey Global Institute report, titled “Big Data: The Next Frontier for Innovation, Competition, and Productivity,”¹ indicates that healthcare, the European public sector, retail, manufacturing and personal-location data could each contribute billions of dollars to the global economy in the next six years. It goes on to point out that “140,000-190,000 more deep analytical talent positions, and 1.5 million more data-savvy managers [will be] needed to take full advantage of Big Data in the United States” alone before 2018. Given the demand for Big Data expertise, and given that the Graduate Programs in Software (GPS) faculty collectively cover all facets of Big Data, UST launched the Center of Excellence for Big Data in the spring of 2012.

What is “Big Data?” As with any emerging technology concept, definitions vary, but it is generally understood to mean data where the:

  • volume is beyond the capacity of traditional databases (yes, multiple databases);
  • velocity of the data is high (as in data streaming in from Twitter feeds or blogs);
  • variety is large (perhaps from unstructured data sets like customer emails or YouTube videos); and, some add,
  • veracity² is a challenge, since data come from unverified sources external to the organization.

The UST Center of Excellence for Big Data (CoE4BD) therefore allows GPS to focus and leverage its skills to meet market demands. When the center was founded, it was the first university center of excellence in Big Data, and, among corporations, only HP had any efforts in this area.

Faculty Experience

Much of the momentum in starting this center came from the combined expertise of GPS faculty. Over the past decade, GPS faculty had quietly been amassing the skills and expertise critical for forming this center. Some of the expertise was commensurate with being the oldest and most successful graduate program focused on software in the United States. GPS was founded in 1985. At that time computer science and computer engineering departments abounded, but only about four U.S. schools were focusing their graduate programs specifically on Software Engineering. Software Engineering as a discipline was just beginning to emerge as a specialty recognized by international standards organizations like IEEE (the Institute of Electrical and Electronics Engineers, the world’s largest professional association for the advancement of technology), and Twin Cities employers demanded credentialed software engineers as their employees. Since then, GPS has graduated more than 3,000 alumni trained in high-quality, cutting-edge technologies. That need to stay on the cutting edge of emerging technologies is what prepared GPS faculty to stand up the CoE4BD.

Varied Activities

The plan for the center is that it will support students and the technology community by providing:

  • Course Support
  • Internal GPS Research
  • Cross-Disciplinary UST Research
  • Local Company Joint Research
  • External Funding
  • Faculty Consulting Opportunities
  • GPS Marketing Opportunities

The support for courses already has been deep and broad. Six courses have been revamped to include Big Data technologies, and two new courses have been developed. A Big Data certificate (four courses) has been approved and is enrolling students as well as alumni eager to gain this certification.

In addition, the CoE4BD hosts a cluster of computers running a Hadoop software suite, shown in Figure 1. The cluster includes a Master node and 22 Worker nodes. The Master node software includes:

  • HDFS (Hadoop Distributed File System)
  • HBase (non-relational database)
  • Hive (Data Warehousing)
  • ZooKeeper (Coordination)
  • MySQL (database)
  • Cloudera (System Manager)
  • Pig
  • Impala

Each Worker node runs the Hadoop Map/Reduce software, which performs the distributed processing sent to that node.
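For readers unfamiliar with the Map/Reduce model, the following is a minimal sketch of the idea in plain Python. It is not the center's code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel on the Worker nodes and the framework would handle the grouping step in between; here everything runs in a single process, assuming a simple word-count task.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key (done by the framework on a cluster)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data streams in"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, the work can be split across many machines without the pieces needing to communicate, which is the property a cluster of Worker nodes exploits.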

Figure 1: The Center of Excellence for Big Data’s Hadoop Cluster comprises hardware (Internet connections, Network Switch, “Worker” and “Master” computer nodes) as well as software (Hadoop Map/Reduce).

Joint Research

GPS’s position as a technology leader has led to many new opportunities for students and professors. UST hosts the Twin Cities Hadoop Users Group (Hadoop is a leading Big Data tool). Big Data class projects have been completed for Supervalu, for St. Thomas (analyzing logs of Web connection data) and for the startup companies LiquidSpace, CogCubed and SaSolabra. Existing projects continue with Medtronic and with the UST Biology Department.

Long-Term Vision

The center’s intention is to become a leader in research by testing Big Data ideas on UST’s Big Data computing clusters; to foster education by providing tools and techniques to students and companies; and to deliver best practices with publications comparing performance and results of Big Data technologies.

Faculty’s vision for the center is that it will become:

  • nationally recognized as a leader in Big Data technologies;
  • a producer of strong candidates for internships, new hires and retooled employees;
  • a vehicle to more closely integrate GPS faculty research;
  • a trusted provider of consulting, best practices and training for a wide variety of Twin Cities businesses;
  • a UST-wide resource, increasing campus visibility and influence;
  • a Twin Cities resource for businesses, with mutual benefit;
  • an attractor for GPS recruits, increasing enrollment; and
  • a vehicle for increasing seminar, publication and grant activity.

Bonnie H. Holub, Ph.D.

Holub was the Honeywell Chair in Global Technology Management in Graduate Programs in Software at the School of Engineering.

Holub is a founder and former CEO of Adventium Labs/Adventium Enterprises, a nonprofit research and development lab. She returned to St. Thomas in 2010. Holub was a tenured faculty member in GPS from 1987 to 2004, during which she founded and directed the Artificial Intelligence/High Performance and Parallel Computing Lab.

Holub holds a Ph.D. in computer science/artificial intelligence from the University of Minnesota. In 2009 she was named the Distinguished Alumnus of the Computer and Engineering Department at the University of Minnesota, Minneapolis, Minn.

Saeed Rahimi, Ph.D.

Rahimi is associate professor in Graduate Programs in Software at the School of Engineering.

He has taught courses in database management systems, management information systems, CASE technologies, data modeling, database modeling, database administration, distributed databases, system simulation and operating systems at St. Thomas and the University of Minnesota for more than 28 years.

Rahimi has held numerous technical and managerial positions for the past 33 years and conducted research and development of database and communications technologies for integrating distributed databases and computer systems. He is known as an authority in databases, distributed database management systems, data modeling, database design and implementation.

Bhabani Misra, Ph.D.

Misra is associate dean, graduate programs, School of Engineering.

Misra earned his doctoral and master’s degrees in computer engineering and computer science, respectively, from North Dakota State University, Fargo, N.D. He joined St. Thomas in fall 1988. Misra has taught both graduate and undergraduate courses, including software engineering, real-time systems, operating systems design, computer architecture, microprocessors, data and file structures and software architecture. Misra’s current interests are in the areas of software architecture, software product lines and service-oriented architecture.

Frank Haug, MSSE

Haug is instructor in Graduate Programs in Software at the School of Engineering.

His recent research interests include software development, database management systems (DBMS), distributed database management systems, data warehousing, metadata modeling and management, software quality and software lifecycle management.

Haug has worked as both an employee and as a consultant, spanning the areas of software development, database design and implementation, repositories, data dictionaries, metadata management, and system and network administration.

In addition to performing design and implementation, he has provided on-site instruction. Clients include Microsoft, American Family Insurance, Prairie Island Power Plant, and others.

Chih Lai, Ph.D.

Lai is associate professor in Graduate Programs in Software at the School of Engineering.

Lai has taught courses in data mining, multimedia information retrieval, real-time systems and software engineering. In 2010, Lai also worked as a visiting professor at the Informatics Department of Trier University of Applied Sciences in Germany. Lai is the 2004 University MAXI Grant recipient.

Previously, Lai was a principal software engineer working on a next-generation aircraft collision avoidance system (ADS-B), which the FAA requires all aircraft to be equipped with by 2020. Lai received three U.S. patents and three European patents, all related to aircraft collision avoidance algorithms. He also works with Medtronic on patient monitoring and movement analysis.

Brad Rubin, Ph.D.

Rubin is associate professor in Graduate Programs in Software at the School of Engineering.

Rubin teaches software analysis and design, computer security, advanced computer security, TCP/IP protocols, and information retrieval. He is pursuing a research agenda in quantum computation.

Rubin spent 14 years with IBM in Rochester, Minn., working on all facets of the AS/400 hardware and software development. He was a key player in IBM’s move to embrace the Java platform and was lead architect of IBM’s largest Java application, a business application framework product called San Francisco (now part of WebSphere). He was also chief technology officer for the Data Storage and Information Management division of Imation Corp., and leader of its R&D organization.

_____________________________________
¹ McKinsey Global Institute, “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” June 2011. https://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation.

² Eaton, C., Deroos, D., Deutsch, T., Lapis, G., Zikopoulos, P., Understanding Big Data, McGraw-Hill, 2012.

 

From Exemplars, a publication of the Grants and Research Office.