
Journal Publication!

Our paper entitled “LoRA-NIR: Low-Rank Adaptation of Vision Transformers for Remote Sensing with Near-Infrared Imagery” has been accepted for publication in IEEE Geoscience and Remote Sensing Letters. This is a collaborative study with Dr. Ulku and Dr. Tanriover from Ankara University.

Abstract

Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared (NIR) reflectance. Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. Typically, large networks pre-trained on the RGB domain are fine-tuned on infrared images. This practice introduces a domain shift issue because of the differing visual traits of RGB and NIR images. As an alternative to full fine-tuning, a method called low-rank adaptation (LoRA) enables more efficient training by optimizing rank-decomposition matrices while keeping the original network weights frozen. However, existing parameter-efficient adaptation strategies for remote sensing images focus on RGB images and overlook the domain shift issue in the NIR domain. Therefore, this study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained on the RGB domain, together with low-rank adaptation, for downstream tasks in the NIR domain. Extensive experiments demonstrate that employing LoRA with pre-trained ViT backbones yields the best performance for downstream tasks applied to NIR images.
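
For readers unfamiliar with LoRA, here is a minimal PyTorch sketch of the idea. The class name, the rank r, and the scaling alpha are illustrative choices, not the paper's settings:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original pre-trained weights stay frozen
        # Rank-decomposition matrices: A maps down to rank r, B maps back up.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

In a ViT backbone, such wrappers typically replace the attention projections, so only the small A and B matrices, a tiny fraction of the network's parameters, are updated when adapting to the NIR domain.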

The official link can be found here: https://ieeexplore.ieee…
and the arXiv preprint is here: https://arxiv.org/abs/2405.17901

WCEE 2024 Publication!

Our paper entitled “Deep Learning-based Average Shear Wave Velocity Prediction Using Accelerometer Records” has been accepted for publication in the 18th World Conference on Earthquake Engineering (WCEE2024). Anybody interested in the title should definitely take a look!

Abstract

Assessing seismic hazards and thereby designing earthquake-resilient structures or evaluating structural damage incurred after an earthquake are important objectives in earthquake engineering. Both tasks require critical evaluation of strong ground motion records, and knowledge of the site conditions at the earthquake stations plays a major role in achieving these objectives. Site conditions are generally represented by the time-averaged shear wave velocity in the upper 30 meters of the geological materials (Vs30). Several strong motion stations lack Vs30 measurements, resulting in potentially inaccurate assessment of seismic hazards and evaluation of ground motion records. In this study, we present a deep learning-based approach for predicting Vs30 at strong motion station locations using three-channel earthquake records. For this purpose, Convolutional Neural Networks (CNNs) with dilated and causal convolutional layers are used to extract deep features from accelerometer records collected from over 700 stations located in Turkey. To overcome the limited availability of labeled data, we propose a two-phase training approach. In the first phase, a CNN is trained to estimate the epicenters, for which ground truth is available for all records. After the CNN is trained, the pre-trained encoder is fine-tuned based on the Vs30 ground truth. The performance of the proposed method is compared with machine learning models that utilize hand-crafted features. The results demonstrate that the deep convolutional encoder-based Vs30 prediction model outperforms the machine learning models that rely on hand-crafted features. This suggests that our computational model can extract meaningful and informative features from the accelerometer records, enabling more accurate Vs30 predictions. The findings of this study highlight the potential of deep learning-based approaches in seismology and earthquake engineering.
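
To make the two-phase idea concrete, here is a minimal PyTorch sketch. The widths, depths, and head definitions are illustrative assumptions; the paper's actual architecture and training details are in the full text:

```python
import torch.nn as nn

class CausalDilatedEncoder(nn.Module):
    """Stack of dilated, causal 1-D convolutions over 3-channel accelerometer records."""

    def __init__(self, in_channels: int = 3, width: int = 64, n_layers: int = 6):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(n_layers):
            d = 2 ** i
            # Left-pad by the receptive span so no filter tap sees future samples.
            layers += [nn.ConstantPad1d((2 * d, 0), 0.0),
                       nn.Conv1d(ch, width, kernel_size=3, dilation=d),
                       nn.ReLU()]
            ch = width
        self.net = nn.Sequential(*layers)

    def forward(self, x):                # x: (batch, 3, time)
        return self.net(x).mean(dim=-1)  # pool over time -> (batch, width)

encoder = CausalDilatedEncoder()

# Phase 1: pre-train on epicenter estimation, where labels exist for every record.
epicenter_head = nn.Linear(64, 2)  # e.g. latitude/longitude targets (illustrative)
# ... train encoder + epicenter_head on all records ...

# Phase 2: reuse the pre-trained encoder and fine-tune with a Vs30 regression head
# on the smaller subset of stations that have measured Vs30.
vs30_head = nn.Linear(64, 1)
# ... fine-tune encoder + vs30_head on Vs30-labelled records ...
```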

CVPR 2024 Publication!

Our paper entitled “Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts” has been accepted for publication in the CVPR2024 Prompting in Vision (PV) Workshop. Anybody interested in the title should definitely take a look!

Abstract

Visual question answering (VQA) is known as an AI-complete task as it requires understanding, reasoning, and inferring about the vision and the language content. Over the past few years, numerous neural architectures have been suggested for the VQA problem. However, achieving success in zero-shot VQA remains a challenge due to its requirement for advanced generalization and reasoning skills. This study explores the impact of incorporating image captioning as an intermediary process within the VQA pipeline. Specifically, we explore the efficacy of utilizing image captions instead of images and leveraging large language models (LLMs) to establish a zero-shot setting. Since image captioning is the most crucial step in this process, we compare the impact of state-of-the-art image captioning models on VQA performance across various question types in terms of structure and semantics. We propose a straightforward and efficient question-driven image captioning approach within this pipeline to transfer contextual information into the question-answering (QA) model. This method involves extracting keywords from the question, generating a caption for each image-question pair using the keywords, and incorporating the question-driven caption into the LLM prompt. We evaluate the efficacy of using general-purpose and question-driven image captions in the VQA pipeline. Our study highlights the potential of employing image captions and harnessing the capabilities of LLMs to achieve competitive performance on GQA under the zero-shot setting. Our code is available here.
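
The pipeline described above reduces to three steps. A minimal sketch follows, in which extract_keywords, caption_model, and llm are placeholders for whichever keyword extractor, captioning model, and language model one plugs in; none of these names come from the paper:

```python
def answer_with_question_driven_caption(image, question,
                                        extract_keywords, caption_model, llm):
    """One pass of the pipeline: keywords -> guided caption -> zero-shot LLM prompt."""
    # 1. Pull the content words out of the question (e.g. "color", "bus").
    keywords = extract_keywords(question)

    # 2. Caption the image conditioned on the keywords, so the caption
    #    describes what the question actually asks about.
    caption = caption_model.generate(image, guidance=keywords)

    # 3. Replace the image with its caption in a zero-shot LLM prompt.
    prompt = (f"Context: {caption}\n"
              f"Question: {question}\n"
              f"Answer with a single word or short phrase:")
    return llm.complete(prompt)
```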

The official link can be found here: https://openaccess.thecvf…
and the arXiv preprint is here: https://arxiv.org/abs/2404.08589

This year, I will be attending the conference in person, so see you in Seattle!

AIRLab is founded!

The Applied Intelligence Research Laboratory (AIRLab), based at the METU Informatics Institute, has been founded!

As the principal investigator behind its inception, I envision a lasting legacy for AIRLab: educating many students and successfully completing numerous projects in the years ahead!

Please visit our official website:
airlab-ii.metu.edu.tr
or contact us via email:
airlab@metu.edu.tr

Long live the AIRLab!

ICMV 2023 Publications!

We have two papers in ICMV2023:

Sequence Models for Drone vs Bird Classification
Fatih Çağatay Akyon, Erdem Akagündüz, Sinan Onur Altınuç, Alptekin Temizel

EANet: Enhanced Attribute-Based RGBT Tracker Network
Abbas Türkoğlu and Erdem Akagündüz

The official links will be available soon!

New Course: “MMI714”

Starting with the 2023-2024 Fall Semester,
I will be teaching a new course:
“MMI714 – Generative Models for Multimedia”

This advanced deep learning course offers a comprehensive introduction to the principles and practice of generative modelling. Beginning with a review of the mathematical foundations required for the course, students will gain an understanding of the conventional autoregressive methods used in generative modelling, as well as more contemporary techniques such as deep generative neural models and diffusion models. The course covers all fundamental concepts related to generating media, including latent spaces, latent codes, and encoding. Throughout the course, students will have access to a wide range of resources, including lectures, readings, and hands-on projects. In addition, a thorough review of recent state-of-the-art studies in the field will be provided each year to ensure students are up to date with the latest advances. By the end of the course, students will have gained the skills and knowledge necessary to tackle real-world generative modelling challenges and become proficient practitioners in this field.

During registration, MMI714 will only be available to MMI and DI students of the Informatics Institute. At this point, we do not know whether there will be a sufficient quota for students from other departments. Only during the add-drop period will we be able to tell you whether you are officially registered for this course. The course will be open to visitors during its first two weeks.

Journal Publication!

Our paper entitled “A Survey on Infrared Image and Video Sets” has been accepted for publication in the journal Multimedia Tools and Applications. This work was part of my student Kevser İrem Danacı’s MSc thesis. Just like the last sentence of the abstract says, we believe that this survey will be a guideline for computer vision and artificial intelligence researchers that are interested in working with the spectra beyond the visible domain.

Abstract

In this survey, we compile a list of publicly available infrared image and video sets for artificial intelligence and computer vision researchers. We mainly focus on IR image and video sets which are collected and labelled for computer vision applications such as object detection, object segmentation, classification, and motion detection. We categorize 92 different publicly available or private sets according to their sensor types, image resolution, and scale. We describe each and every set in detail regarding their collection purpose, operation environment, optical system properties, and area of application. We also cover a general overview of fundamental concepts that relate to IR imagery, such as IR radiation, IR detectors, IR optics and application fields. We analyse the statistical significance of the entire corpus from different perspectives. We believe that this survey will be a guideline for computer vision and artificial intelligence researchers that are interested in working with the spectra beyond the visible domain.

The official link can be found here: https://link.springer.com/article/…,
and the arXiv preprint is here: https://arxiv.org/abs/2203.08581

Journal Publication!

Our paper entitled “Deep Semantic Segmentation of Trees Using Multi-Spectral Images” has been accepted for publication in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. This was a collaborative study with Dr. Ulku from Ankara University and Dr. Ghamisi from IARAI Austria.

Abstract

Forests can be efficiently monitored by automatic semantic segmentation of trees using satellite and/or aerial images. Still, several challenges can make the problem difficult, including the varying spectral signatures of different trees, the lack of sufficient labelled data, and geometrical occlusions. In this paper, we address the tree segmentation problem using multispectral imagery. While we carry out large-scale experiments on several deep learning architectures using various spectral input combinations, we also attempt to explore whether hand-crafted spectral vegetation indices can improve the performance of deep learning models in the segmentation of trees. Our experiments include benchmarking a variety of multispectral remote sensing image sets, deep semantic segmentation architectures, and various spectral bands as inputs, including a number of hand-crafted spectral vegetation indices. From our large-scale experiments, we draw several useful conclusions. One particularly important conclusion is that, with no additional computational burden, combining different categories of multispectral vegetation indices, such as NDVI, ARVI, and SAVI, within a single three-channel input to a state-of-the-art semantic segmentation architecture can improve tree segmentation accuracy under certain conditions, compared to using high-resolution visible and/or near-infrared input.
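
The three indices named above are simple band ratios, so a three-channel index input can be assembled directly from the reflectance bands. A minimal NumPy sketch follows; the function name, the eps guard, and the defaults L = 0.5 (SAVI soil adjustment) and gamma = 1.0 (ARVI atmospheric correction) are conventional choices, not necessarily the paper's:

```python
import numpy as np

def index_stack(red, nir, blue, L=0.5, gamma=1.0, eps=1e-6):
    """Stack NDVI, ARVI, and SAVI into one three-channel image from reflectance bands."""
    ndvi = (nir - red) / (nir + red + eps)
    rb = red - gamma * (blue - red)                   # atmosphere-corrected red band
    arvi = (nir - rb) / (nir + rb + eps)
    savi = (1.0 + L) * (nir - red) / (nir + red + L + eps)
    return np.stack([ndvi, arvi, savi], axis=-1)      # (H, W, 3) segmentation-network input
```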

The official link can be found here: https://ieeexplore.ieee.org/document/9872072

ECCV 2022 Publication!

Our paper entitled “Detecting Driver Drowsiness as an Anomaly Using LSTM Autoencoders” has been accepted for publication at the First In-Vehicle Sensing and Monitorization (ISM) Workshop at ECCV 2022!

Abstract

In this paper, an LSTM autoencoder-based architecture is utilized for drowsiness detection, with ResNet-34 as the feature extractor. The problem is treated as anomaly detection for a single subject; therefore, only normal driving representations are learned, and drowsiness representations are expected to yield higher reconstruction losses, making them distinguishable by the network. In our study, the confidence levels of normal and anomaly clips are investigated through the methodology of label assignment, such that the training performance of the LSTM autoencoder and the interpretation of anomalies encountered during testing are analyzed under varying confidence rates. Our method is evaluated on NTHU-DDD and benchmarked against a state-of-the-art anomaly detection method for driver drowsiness. Results show that the proposed model achieves a detection rate of 0.8740 area under the curve (AUC) and is able to provide significant improvements in certain scenarios.
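
A minimal PyTorch sketch of the reconstruction-based idea follows. The class name, feature dimension (512, matching ResNet-34's final embedding), and hidden size are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ClipAutoencoder(nn.Module):
    """LSTM autoencoder over per-frame feature vectors (e.g. ResNet-34 embeddings)."""

    def __init__(self, feat_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, feat_dim, batch_first=True)

    def forward(self, x):             # x: (batch, time, feat_dim)
        z, _ = self.encoder(x)        # compressed sequence representation
        recon, _ = self.decoder(z)    # reconstruction of the input features
        return recon

def anomaly_score(model: nn.Module, clip: torch.Tensor) -> float:
    """Mean reconstruction error: trained only on normal driving, the model
    should reconstruct drowsy clips poorly, so they score higher."""
    with torch.no_grad():
        return torch.mean((model(clip) - clip) ** 2).item()
```

Thresholding this score is what separates normal driving from drowsiness in the anomaly detection setting.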

The official link can be found here: https://link.springer.com…
and the arXiv preprint is here: https://arxiv.org/abs/2209.05269