CS231n: Convolutional Neural Networks for Visual Recognition | | Comprehensive online course notes covering convolutional neural networks for visual recognition. | Fundamental resource for learning about CNNs and their use in computer vision. |
Understanding LSTM Networks | | A detailed, step-by-step explanation of how Long Short-Term Memory networks and their gating mechanisms work. | Essential reading for understanding LSTMs, a key component in early sequence modeling (a minimal sketch of one LSTM step appears below). |
The Annotated Transformer | | An implementation and explanation of the Transformer model from 'Attention Is All You Need' using PyTorch. | Crucial resource for understanding the Transformer architecture, which is foundational for modern NLP. |
The Unreasonable Effectiveness of RNNs | | An exploration of the capabilities and surprising effectiveness of simple Recurrent Neural Networks on various sequence tasks. | Highlights the power of simple RNNs and the importance of data and computational resources. |
Neural Machine Translation | | Introduces an end-to-end neural network for machine translation. | A foundational paper in the development of neural machine translation. |
Attention Is All You Need | | Proposes the Transformer model, which relies solely on attention mechanisms and achieves state-of-the-art results in machine translation. | Revolutionized sequence modeling by demonstrating the effectiveness of attention mechanisms, leading to models like BERT and GPT (see the scaled dot-product attention sketch below). |
ImageNet Classification with Deep CNNs (AlexNet) | | Pioneering work using deep convolutional neural networks for image classification on the ImageNet dataset. | Demonstrated the power of deep learning for image recognition and spurred significant research in the field. |
Recurrent Neural Network Regularization | | Shows how to apply dropout to LSTMs effectively by restricting it to non-recurrent connections. | Established a simple, widely adopted way to regularize RNNs and combat overfitting (see the dropout-placement sketch below). |
Pointer Networks | | Proposes a sequence-to-sequence model that uses attention to select elements of the input sequence as its outputs. | Useful when the output vocabulary is the set of positions in the input, as in combinatorial problems such as convex hulls and the travelling salesman problem (see the pointer-attention sketch below). |
Relation Networks | | Introduces a module that learns to reason about relationships between entities. | Applicable to tasks requiring relational reasoning, such as question answering and visual reasoning. |
Identity Mappings in Deep Residual Networks | | Investigates the role of identity mappings in deep residual networks and proposes a pre-activation residual unit that further improves training. | Contributed to the understanding and improvement of very deep residual networks. |
Deep Residual Learning for Image Recognition | | Introduces Residual Networks (ResNets), which use skip connections to enable training of very deep neural networks. | A landmark paper that enabled much deeper networks and significantly improved performance across many tasks (see the residual-block sketch below). |
Multi-Scale Context Aggregation by Dilated Convolutions | | Proposes dilated convolutions, which enlarge the receptive field without losing resolution. | Important for dense prediction tasks such as semantic segmentation that require a wide receptive field (see the dilated-convolution sketch below). |
Neural Message Passing for Quantum Chemistry | | Applies neural message passing to predict molecular properties. | Shows how neural networks can be applied to problems in computational chemistry. |
Neural Turing Machines | | Introduces an architecture that couples a neural network controller with an external memory accessed through differentiable read and write operations. | Explores the potential for neural networks to learn algorithmic procedures. |
Variational Lossy Autoencoder | | Combines variational autoencoders with autoregressive models to control which information the latent code captures, yielding lossy codes that retain global structure. | Contributes to generative modeling and to understanding what representations VAEs learn. |
Relational RNNs | | Introduces a recurrent architecture whose memory slots interact through multi-head self-attention, allowing the model to reason about relations across time. | Offers a way to explicitly model relationships within sequences. |
Deep Speech 2 | | An end-to-end deep learning approach for speech recognition. | Demonstrates the effectiveness of end-to-end deep learning for speech recognition. |
GPipe | | Describes a system for training very large neural networks using pipeline parallelism, partitioning the model across multiple accelerators and splitting each mini-batch into micro-batches. | Enables training of much larger models than previously possible, pushing the boundaries of model size. |
Scaling Laws for Neural Language Models | | Shows that language-model loss scales as a power law in model size, dataset size, and training compute. | Provides practical guidance on how to allocate data and compute when scaling language models (see the power-law fit sketch below). |
MDL for Neural Weights | | Explores using the Minimum Description Length (MDL) principle to understand and regularize neural network weights. | Offers a compression-based theoretical framework for understanding regularization in neural networks (a toy MDL example appears below). |
MDL Tutorial | | A tutorial on the Minimum Description Length principle. | Provides a deeper understanding of the MDL principle and its applications. |
The First Law of Complexodynamics | | A blog post asking why the complexity or "interestingness" of physical systems appears to rise and then fall over time, even though entropy increases monotonically. | A theoretical exploration of how to formalize complexity using ideas from Kolmogorov complexity. |
Coffee Automaton | | A cellular-automaton model of cream mixing into coffee, used to quantify how complexity rises and then falls as a closed system approaches equilibrium. | A concrete model of how complex structure can emerge and dissipate under simple rules. |
Kolmogorov Complexity and Algorithmic Randomness | | Introduces the concepts of Kolmogorov complexity and algorithmic randomness. | Provides the theoretical foundation for measuring the complexity and randomness of individual objects. |
Machine Super Intelligence | | Discusses the potential and implications of achieving machine super intelligence. | A thought-provoking discussion on the future potential of AI. |
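To make "Understanding LSTM Networks" concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard gate equations the post walks through. Names and dimensions are chosen purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters for the
    input, forget, and output gates and the candidate cell state."""
    z = W @ x + U @ h_prev + b                    # stacked pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell state
    c = f * c_prev + i * g                        # keep some old memory, add some new
    h = o * np.tanh(c)                            # expose a gated view of the cell
    return h, c

# Toy usage: 8-dimensional input, 16-dimensional hidden state.
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
W = rng.standard_normal((4 * d_h, d_in)) * 0.1
U = rng.standard_normal((4 * d_h, d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```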
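For "Attention Is All You Need", the core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The following is a minimal single-head NumPy sketch; the shapes and random data are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax over keys
    return weights @ V                                 # weighted sum of values

# Toy usage: 5 queries attending over 7 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4))
K = rng.standard_normal((7, 4))
V = rng.standard_normal((7, 4))
out = scaled_dot_product_attention(Q, K, V)            # shape (5, 4)
```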
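For "Recurrent Neural Network Regularization", the key point is where dropout goes: between stacked layers, not on the recurrent connections. A small PyTorch sketch of that placement follows; the `dropout` argument of `nn.LSTM` applies dropout to the outputs of each layer except the last, leaving the timestep-to-timestep state untouched. The hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

# dropout=0.5 acts on the outputs of each LSTM layer except the last
# (a non-recurrent connection); the recurrent state passed between
# time steps is not dropped.
lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, dropout=0.5)
x = torch.randn(20, 8, 64)            # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)          # dropout is only active in training mode
```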
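For "Pointer Networks", the sketch below shows pointer-style attention: content-based scores over the encoder states, whose softmax over input positions is itself the output distribution. The weights and dimensions are random placeholders, not the paper's setup.

```python
import numpy as np

def pointer_distribution(decoder_state, encoder_states, W1, W2, v):
    """Pointer attention: u_j = v^T tanh(W1 e_j + W2 d) for each encoder
    state e_j; the output is softmax(u), a distribution over input positions."""
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ decoder_state)
                       for e in encoder_states])
    scores -= scores.max()                      # numerical stability
    p = np.exp(scores)
    return p / p.sum()                          # probability of pointing at each input position

# Toy usage: 6 encoder states and one decoder state, all 8-dimensional.
rng = np.random.default_rng(0)
d = 8
enc = rng.standard_normal((6, d))
dec = rng.standard_normal(d)
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
v = rng.standard_normal(d)
print(pointer_distribution(dec, enc, W1, W2, v))   # sums to 1 over the 6 inputs
```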
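For "Deep Residual Learning for Image Recognition", here is a basic identity-shortcut residual block in PyTorch (same channel count, stride 1). It is a minimal sketch; the full architecture also uses downsampling and bottleneck variants.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x, so the block only has to
    learn a residual correction to the identity mapping."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)        # skip connection: add the input back

# Toy usage: a batch of 2 feature maps with 16 channels.
y = ResidualBlock(16)(torch.randn(2, 16, 32, 32))
```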
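For "Multi-Scale Context Aggregation by Dilated Convolutions", the sketch below stacks 3x3 convolutions with dilations 1, 2, and 4: the receptive field grows to 15x15 while the spatial resolution stays fixed, which is the property dilated convolutions exploit for dense prediction. Channel counts are arbitrary.

```python
import torch
import torch.nn as nn

# Three 3x3 convolutions with dilations 1, 2, 4; padding equal to the dilation
# keeps the feature-map size unchanged, so context grows without any pooling.
context = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
)
x = torch.randn(1, 8, 64, 64)
print(context(x).shape)   # torch.Size([1, 8, 64, 64]) -- resolution preserved
```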
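For "Scaling Laws for Neural Language Models", the sketch below fits a power law L(N) = a * N^(-alpha) to loss-versus-parameter-count measurements by linear regression in log-log space, which is the basic procedure behind such fits. The data and constants here are synthetic, not values from the paper.

```python
import numpy as np

# Synthetic loss measurements at several (made-up) model sizes.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8, 1e9])
noise = 1 + 0.01 * np.random.default_rng(0).standard_normal(N.size)
L = 12.0 * N ** -0.07 * noise

# A power law is a straight line in log-log space: log L = log a - alpha * log N.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted exponent alpha ~= {alpha:.3f}, prefactor a ~= {a:.2f}")
```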
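For the two MDL entries, here is a toy two-part code for a binary sequence: the total description length is the bits needed to state the model parameter plus the bits needed to encode the data under that model. It illustrates the general MDL principle, not the specific variational scheme from "MDL for Neural Weights".

```python
import numpy as np

def two_part_code_length(bits, precision=8):
    """Two-part MDL code for a binary sequence under a Bernoulli model:
    total bits = bits to transmit the parameter p (at fixed precision)
               + bits to encode the data given p (its negative log2-likelihood)."""
    n, k = len(bits), int(np.sum(bits))
    p = np.clip(k / n, 1e-6, 1 - 1e-6)      # fitted parameter
    model_bits = precision                   # cost of transmitting p itself
    data_bits = -(k * np.log2(p) + (n - k) * np.log2(1 - p))
    return model_bits + data_bits

rng = np.random.default_rng(0)
biased = (rng.random(1000) < 0.9).astype(int)   # highly compressible
fair = (rng.random(1000) < 0.5).astype(int)     # essentially incompressible
print(two_part_code_length(biased), two_part_code_length(fair))
```

The biased sequence gets a much shorter total code than the fair one, which is the sense in which regular data is "simpler" under MDL.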