
METU SIAM Workshop: Mathematics, Statistics and Data Deluge

Massive amounts of data are collected every day, often from services we use regularly, but never think about. Scientific data comes in massive amounts from sensor networks, astronomical instruments, biometric devices, etc., and needs to be sorted out and understood. Personal data from our Google searches, our Facebook or Twitter activities, our credit card purchases, our travel habits, and so on, are being mined to provide information and insight. These data sets provide great opportunities, and pose dangers as well.

Four international mathematics, applied mathematics and statistics organizations (SIAM, AMS, MAA and ASA) have chosen “Mathematics, Statistics, and the Data Deluge” as the theme of this year's Math Awareness Month. Our student chapter is proud to present a four-day workshop, spread over two weeks, on the subject. Six scholars from various departments of our university will present lectures on computational and mathematical tools for modeling and working effectively with large amounts of data.

The program of our workshop is as follows:

  1. May 16, 2012, Wednesday, 11:40-12:30, Estimation of Dynamics under Uncertainty, Gerhard Wilhelm Weber *,
  2. May 17, 2012, Thursday, 11:40-12:30, Fractals in Cognitive Science, Annette Hohenberger and Halil Düzcü *,
  3. May 17, 2012, Thursday, 14:00-17:30, Introduction to R and GGobi, Özlem İlk **,
  4. May 23, 2012, Wednesday, 14:00-16:30, MapReduce and Hadoop: Mining Big Data in the Cloud, Aybar Acar *,
  5. May 23, 2012, Wednesday, 11:40-12:30, Introduction to Clustering, Cem Iyigun *,
  6. May 24, 2012, Thursday, 14:00-15:30, Modeling with MARS and CMARS, Fatma Yerlikaya-Ozkurt * (Cancelled)

** Department of Mathematics, Computer Lab, M202,

* Institute of Applied Math, S209

The R workshop led by Özlem İlk will be fully interactive: participants will be able to try out everything introduced in the workshop in real time on the computers available in the lab where it takes place. We cordially thank the Department of Mathematics for hosting this event.

All members of our university are invited. If you have any questions, please contact us at metusiam@gmail.com.

Modeling with MARS and CMARS

Fatma Yerlikaya-Ozkurt

Institute of Applied Math, Scientific Computing Program

May 24, 2012, Thursday, 14:00-15:30
Institute of Applied Math, S209

MARS: Multivariate Adaptive Regression Splines
CMARS: Conic (Convex, Continuous) Multivariate Adaptive Regression Splines

(Cancelled)

Conic (Convex, Continuous) Multivariate Adaptive Regression Splines (CMARS), developed at the Institute of Applied Mathematics (IAM), Middle East Technical University (METU), is an alternative approach to Multivariate Adaptive Regression Splines (MARS). MARS is a well-known data mining tool capable of modeling high-dimensional data with nonlinear structure. The flexible nature of MARS modeling has led to successful implementations of the method in various application areas. CMARS, in turn, is based on a penalized residual sum of squares (PRSS) for MARS, treated as a Tikhonov regularization (TR) problem. CMARS solves this problem with a continuous optimization technique, namely within the framework of Conic Quadratic Programming (CQP). These convex optimization problems are very well structured; they resemble linear programs and hence permit the use of interior point methods. MARS and CMARS are preferred by modelers and data analysts in the fields of nonparametric regression/classification and data mining. In this talk, MARS and CMARS will be applied to test data. For CMARS, a user-friendly computing environment written in MATLAB will be used. For MARS, R, a well-known and popular statistical software environment, will be used.
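
As a rough illustration of the kind of model MARS builds, the sketch below evaluates a one-variable model made of truncated linear ("hinge") basis functions. It is not taken from the talk: the knot position and coefficients are hypothetical values chosen for illustration, and a real MARS fit would select knots and coefficients from data and may also use products of such functions.

```python
def hinge(x, knot, sign=1):
    """Truncated linear basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coefficient * hinge(x, knot, sign).

    `terms` is a list of (coefficient, knot, sign) tuples.
    """
    return intercept + sum(c * hinge(x, t, s) for c, t, s in terms)


# Hypothetical model: f(x) = 1 + 2*max(0, x - 3) - 0.5*max(0, 3 - x)
example_terms = [(2.0, 3.0, 1), (-0.5, 3.0, -1)]
values = [mars_predict(x, 1.0, example_terms) for x in (1.0, 3.0, 5.0)]
```

The pair of mirrored hinges around a single knot lets the fitted curve change slope at that knot, which is how such models capture nonlinear structure piece by piece.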

Fractals in Cognitive Science

Annette Hohenberger and Halil Düzcü

METU, Informatics Institute, Cognitive Science Program

May 17, 2012, Thursday, 11:40-12:30
Institute of Applied Math, S209

Fractals are self-similar structures with a precise definition in mathematics. They are ubiquitous in nature: coastlines, clouds, flowers, bodily organs, and many more, exhibit self-similarity. Their existence in the cognitive sciences, however, is just beginning to be revealed and their significance is not yet fully appreciated.

In our talk, we will distinguish two kinds of self-similarity: geometrical and statistical. An example of the former is the Koch curve; an example of the latter is the power-law scaling of probability density functions observed in the temporal patterns of cognitive processes. We will present examples of such power laws in eye-gaze patterns and response latencies (reaction times) and explain what they may reveal about the functioning of the mind.
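
Geometrical self-similarity can be made concrete with a small sketch, which is an illustration and not part of the talk. It uses the standard construction of the Koch curve, in which every segment is replaced by four segments, each one third as long, and computes the resulting segment counts, total length and similarity dimension. The function name `koch_stats` is ours.

```python
import math


def koch_stats(n):
    """Return (segment count, segment length, total length) after n steps.

    Starts from a single segment of length 1; each step replaces every
    segment with 4 segments that are 1/3 as long.
    """
    segments = 4 ** n
    segment_length = 3.0 ** -n
    return segments, segment_length, segments * segment_length


# Similarity dimension: 4 copies at scale 1/3 gives log(4) / log(3).
dimension = math.log(4) / math.log(3)

for step in range(5):
    print(step, koch_stats(step))
print("similarity dimension:", dimension)
```

The total length grows by a factor of 4/3 at every step, so the curve has no finite length in the limit, while its dimension lies strictly between 1 and 2.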

We will discuss with the audience how the concept of fractals may raise the mutual awareness of cognitive and natural scientists of the similarity of the processes they are studying in their respective areas.

Estimation of Dynamics Under Uncertainty

Gerhard Wilhelm Weber

METU, Institute of Applied Mathematics

May 16, 2012, Wednesday, 11:40-12:30
Institute of Applied Math, S209

This presentation introduces recent research efforts and results in identifying and predicting time-dependent processes, based on given data and with various degrees of model discontinuity and uncertainty. This generalization is unfolded gradually, always motivated by real-world challenges and applications.

We aim to convey the joy of and interest in state-of-the-art applied mathematics, and to invite our scientific community to use it in education, in research and in the development of our countries. In fact, we hope to whet your appetite for this!

Introduction to R and GGobi

Özlem İlk

Middle East Technical University, Department of Statistics

May 17, 2012, Thursday, 14:00-17:30

Department of Mathematics, Computer Lab

Introduction to GGobi: Free Data Visualization Software

Duration: 1 hour

GGobi is free software for visualizing high-dimensional data. In this short course on GGobi, we will start by downloading it from the internet. Next, basic features such as brushing, identifying and jittering will be illustrated. We will also demonstrate the following tools of the software: variable manipulation, handling missing data, case subsetting and sampling. Interactive graphics will be illustrated through rotation and projection of high-dimensional data. The methods will be demonstrated on some of the demo datasets available in GGobi.

Introduction to R: A Free Computer Language and Computing Environment for Everyone

R is one of the most popular software environments for statistical computing and graphics, and its users are no longer limited to statisticians. In this short course on R, we will start by demonstrating how to download this free software from the internet. Next, loading packages and libraries and using the help menus will be illustrated. One of the biggest challenges for new R users is reading data into the environment, and several solutions to this problem will be proposed. We will also cover how to save your results to an external file. Basic applications, such as matrix operations, random number generation, creating graphics and writing your own small functions, will be covered as well.

Introduction to Clustering

Cem Iyigun

METU, Department of Industrial Engineering

May 23, 2012, Wednesday, 11:40-12:30
Institute of Applied Math, S209

Clustering is the process of partitioning (classifying) data points (observations, patterns) into disjoint groups (clusters) of similar objects. The search for clusters is a method of unsupervised learning used in many fields, including statistics, machine learning, data mining, operations research, bioinformatics and facility location, and across application areas including genetics, taxonomy, medicine, marketing, finance and e-commerce. This talk gives an overview of clustering methods, reviews different clustering algorithms and presents mathematical approaches to clustering problems.
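
To make the idea of partitioning points into disjoint groups concrete, here is a minimal sketch of one widely used method, k-means (Lloyd's algorithm). The abstract does not say which algorithms the talk covers, so this is only one example; the sample points, the initial centers and the fixed iteration count are assumptions made for illustration.

```python
import math


def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each point ends up in exactly one cluster, so the result is a partition of the data; in practice the outcome depends on the initial centers and on the chosen number of clusters.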

MapReduce and Hadoop: Mining Big Data in the Cloud

Aybar Acar

METU Graduate School of Informatics

May 23, 2012, Wednesday, 14:00-16:30
Institute of Applied Mathematics, S209

This talk will be an introduction to the MapReduce [1] paradigm for distributed computing, originally developed by Google for processing extremely large amounts of data. MapReduce scales the functional programming operators ‘map’ and ‘fold’ up to large, heterogeneous and loosely coupled computing clusters in order to perform arbitrarily complex processing in a parallel and distributed manner. The talk will elaborate on the details of computation and data flow in MapReduce, using examples from well-known algorithms in information retrieval and data mining.
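
The map and fold operators mentioned above can be sketched on a single machine with a word count, a common introductory example. This is an illustration only and not the talk's material: it does not use Hadoop, it runs in one process, and the function names and sample documents are ours.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values for one key into a single count."""
    return key, reduce(lambda total, v: total + v, values, 0)


def word_count(documents):
    # In a real cluster, map and reduce calls would run on different machines.
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


counts = word_count(["big data in the cloud", "mining big data"])
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread over many machines without the calls needing to communicate with one another.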

Hadoop [2] is a software framework developed by the Apache Software Foundation that provides open-source and accessible derivative implementations of MapReduce and the closely related Google (distributed) File System [3]. Hadoop allows researchers and developers to utilize MapReduce for their own projects and, thus, has played a big role in the popularity of MapReduce in both enterprise and academic settings. The talk will give a broad overview of the Hadoop framework in parallel with the discussion on MapReduce, and hopefully give interested listeners enough information to start utilizing Hadoop/MapReduce for their own research.

The latter part of the talk will include a demo illustrating the application of MapReduce to data clustering (and market basket analysis, if time permits) using several Hadoop instances running on Amazon EC2 [4].

[1] Dean J. and Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters; Proc. of OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, Dec. 2004

[2] The Apache Hadoop Project, http://hadoop.apache.org/

[3] Ghemawat S., Gobioff H. and Leung S-T. The Google File System; Proc. of SOSP’03: 19th ACM Symposium on Operating Systems Principles, Lake George, NY, Oct. 2003

[4] Amazon Elastic Compute Cloud, http://aws.amazon.com/ec2/