Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Posts

publications

Monocular Depth Estimation with Self-supervised Instance Adaptation

ArXiv, 2020

Recent advances in self-supervised learning have demonstrated that it is possible to learn accurate monocular depth reconstruction from raw video data, without using any 3D ground truth for supervision. However, in robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot, switching between monocular and multi-view reconstruction. To address this mixed setting, we propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time. Our method builds on a standard prior learned to perform monocular reconstruction, but uses self-supervision at test time to further improve the reconstruction accuracy when multiple images are available. When used to update the correct components of the model, this approach is highly effective. On the standard KITTI benchmark, our self-supervised method consistently outperforms all previous methods with an average 25% reduction in absolute error for the three common setups (monocular, stereo and monocular+stereo), and comes very close in accuracy to the fully-supervised state-of-the-art methods.

Download here

Real Time Monocular Vehicle Velocity Estimation using Synthetic Data

IEEE Intelligent Vehicles, 2021

Vision is one of the primary sensing modalities in autonomous driving. In this paper we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Contrary to prior methods that train end-to-end deep networks to estimate the vehicles’ velocity from the video pixels, we propose a two-step approach where first an off-the-shelf tracker is used to extract vehicle bounding boxes and then a small neural network is used to regress the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this still achieves state-of-the-art estimation performance, with the significant benefit of separating perception from dynamics estimation via a clean, interpretable and verifiable interface, which allows us to distill the statistics that are crucial for velocity estimation. We show that the latter can be used to easily generate synthetic training data in the space of bounding boxes, and we use this to further improve the performance of our method.

Download here

Calibrating Self-supervised Monocular Depth Estimation

NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2021

In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal. Whilst the networks achieve good performance, an often overlooked detail is that, due to the inherent ambiguity of monocular vision, they predict depth up to an unknown scaling factor. The scaling factor is then typically obtained from the LiDAR ground truth at test time, which severely limits practical applications of these methods. In this paper, we show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
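As a rough illustration of the scale ambiguity described above, the sketch below recovers a scaling factor from LiDAR ground truth at test time. It assumes per-image median scaling, which is one common convention and is not taken from the paper itself; the function name `median_scale` and the use of 0 to mark pixels without a LiDAR return are hypothetical choices made for this example.

```python
from statistics import median


def median_scale(pred_depth, gt_depth):
    """Return the factor that aligns predicted depths with ground truth.

    Assumption: both inputs are flat lists of depths for the same pixels,
    and a ground-truth value of 0 marks a pixel with no LiDAR return.
    """
    # Keep only pixels where both the prediction and the ground truth are valid.
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to compare")
    preds, gts = zip(*pairs)
    # Ratio of medians: robust to a few outlying depth values.
    return median(gts) / median(preds)


# Hypothetical data: predictions are correct up to a factor of 2.5,
# and the third pixel has no LiDAR return.
pred = [1.0, 2.0, 4.0, 3.0]
gt = [2.5, 5.0, 0.0, 7.5]

scale = median_scale(pred, gt)
scaled_pred = [scale * p for p in pred]
```

Because the factor is computed from ground truth, a method evaluated this way still depends on LiDAR at test time, which is the limitation the abstract points to.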

Download here

Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views

ICRA, 2022

We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects. Because the LiDAR point clouds are partial, directly fitting bounding boxes to the point clouds is meaningless. Instead, we suggest that obtaining good results requires sharing information between all objects in the dataset jointly, over multiple frames. We then make three improvements to the baseline. First, we address ambiguities in predicting the object rotations via direct optimization in this space while still backpropagating rotation prediction through the model. Second, we explicitly model outliers and task the network with learning their typical patterns, thus better discounting them. Third, we enforce temporal consistency when video data is available. With these contributions, our method significantly outperforms previous work despite the fact that those methods use significantly more complex pipelines, 3D models and additional human-annotated external sources of prior information.

Download here

Direct LiDAR-based object detector training from automated 2D detections

NeurIPS Workshop on Machine Learning for Autonomous Driving (ML4AD), 2022

3D object detection (3DOD) is an important component of many applications. However, existing methods rely heavily on datasets of depth and image data that require expensive annotation in 3D, which limits the collection of diverse datasets that truly represent the long tail of potential scenes in the wild. In this work we propose to utilise a readily available, robust 2D object detector and to transfer information about objects from 2D to 3D, allowing us to train a 3D object detector without the need for any human annotation in 3D. We demonstrate that our method significantly outperforms previous 3DOD methods supervised by only 2D annotations, and that it narrows the accuracy gap between methods that use 3D supervision and those that do not.

Download here

Exploring The landscape of Large Language Models In Medical Question Answering: Observations and Open Questions

Submitted to NEJM AI, 2024

Large Language Models (LLMs) have shown promise in medical question answering by achieving passing scores in standardised exams and have been suggested as tools for supporting healthcare workers. Deploying LLMs into such a high-risk context requires a clear understanding of the limitations of these models. With the rapid development and release of new LLMs, it is especially valuable to identify patterns which exist across models and may, therefore, continue to appear in newer versions. In this paper, we evaluate a wide range of popular LLMs on their knowledge of medical questions in order to better understand their properties as a group. From this comparison, we provide preliminary observations and raise open questions for further research.

Download here

Instruction tuning for large language models: the impact of human-inspired learning strategies

To be submitted to COLM, 2024

teaching

Artificial Intelligence

Teaching Assistant, University of Oxford, Department of Computer Science, 2019

This course is offered to undergraduates and MSc students in computer science. It covered the following topics:

Introduction to AI
Search
Games
Constraint Satisfaction Problems
Machine Learning
Neural Networks

Data Structures and Algorithms

Teaching Assistant, University of Oxford, Department of Computer Science, 2019

This course is offered to undergraduates in computer science. It covered the following topics:

Introduction to Data Structures
Arrays and Linked Lists
Stacks and Queues
Trees
Graphs
Sorting and Searching
Algorithm Analysis
Recursion
Dynamic Programming
Greedy Algorithms
Divide and Conquer
Backtracking
Branch and Bound

Functional Programming

Teaching Assistant, University of Oxford, Department of Computer Science, 2019

This course is offered to undergraduates in computer science. It covered the following topics:

Introduction to Functional Programming
Haskell
Scheme

Computer Vision and Machine Learning

Teaching Assistant, University of Oxford, Department of Engineering Science and Department of Computer Science, 2019

This course is offered to DPhil (PhD) students in computer science and engineering science as part of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems. It covered the following topics:

Introduction to Computer Vision
Image Formation and Camera Models
Image Processing
Feature Detection and Matching
Image Segmentation
Object Recognition
Object Detection
Object Tracking
3D Reconstruction
Deep Learning for Computer Vision
Visual SLAM

Advanced Language Modelling Methods

Teaching Assistant, University of Oxford, Oxford Internet Institute, 2024

This course is offered to MSc and DPhil students in computer science and social sciences. It covers the following topics:

Introduction to Language Modelling
N-gram Language Models
Neural Language Models
Transformer Models
Attention Mechanism
Self-attention Mechanism
Positional Encoding
Multi-head Attention
Masked Self-attention
Encoder-Decoder Architecture
BERT and GPT
Training Language Models
Fine-tuning Language Models
Language Model Evaluation
Applications of Language Models

Robert McCraith

Pages

Page Not Found

Robert McCraith

CV

Publications

Sitemap

Teaching

Blog posts
