
Convolutional Architectures

This package lists contributed convolutional architectures.


GPT-2

class pl_bolts.models.vision.GPT2(embed_dim, heads, layers, num_positions, vocab_size, num_classes)[source]

Bases: pytorch_lightning.LightningModule

GPT-2 from Language Models are Unsupervised Multitask Learners

Paper by: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever

Implementation contributed by: Teddy Koker

Example:

import torch
from pl_bolts.models.vision import GPT2

seq_len = 17
batch_size = 32
vocab_size = 16
x = torch.randint(0, vocab_size, (seq_len, batch_size))
model = GPT2(embed_dim=32, heads=2, layers=2, num_positions=seq_len, vocab_size=vocab_size, num_classes=4)
results = model(x)
forward(x, classify=False)[source]

Expects input of shape [sequence len, batch]. If classify is True, returns classification logits.
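A minimal sketch of both forward modes, reusing the example model above (the classification output is assumed from the docstring, not verified against the implementation):

import torch
from pl_bolts.models.vision import GPT2

seq_len, batch_size, vocab_size = 17, 32, 16
model = GPT2(embed_dim=32, heads=2, layers=2,
             num_positions=seq_len, vocab_size=vocab_size, num_classes=4)
x = torch.randint(0, vocab_size, (seq_len, batch_size))  # [sequence len, batch]

lm_logits = model(x)                    # language-modeling logits over the vocab
class_logits = model(x, classify=True)  # classification logits (num_classes=4)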


Image GPT

class pl_bolts.models.vision.ImageGPT(embed_dim=16, heads=2, layers=2, pixels=28, vocab_size=16, num_classes=10, classify=False, batch_size=64, learning_rate=0.01, steps=25000, data_dir='.', num_workers=8, **kwargs)[source]

Bases: pytorch_lightning.LightningModule

Paper: Generative Pretraining from Pixels [original paper code].

Paper by: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

Implementation contributed by: Teddy Koker

Original repo with results and more implementation details: https://github.com/teddykoker/image-gpt

Example Results (Photo credits: Teddy Koker):


Default arguments:

Argument          Default   iGPT-S (Chen et al.)
--embed_dim       16        512
--heads           2         8
--layers          8         24
--pixels          28        32
--vocab_size      16        512
--num_classes     10        10
--batch_size      64        128
--learning_rate   0.01      0.01
--steps           25000     1000000
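
To reproduce the iGPT-S column, the table values can be passed explicitly to the constructor; a sketch, assuming only the keyword arguments in the signature above:

from pl_bolts.models.vision import ImageGPT

# the iGPT-S column of the table above (Chen et al.), passed explicitly
igpt_s = ImageGPT(
    embed_dim=512,
    heads=8,
    layers=24,
    pixels=32,
    vocab_size=512,
    num_classes=10,
    batch_size=128,
    learning_rate=0.01,
    steps=1_000_000,
)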

Example:

import pytorch_lightning as pl
from pl_bolts.datamodules import MNISTDataModule
from pl_bolts.models.vision import ImageGPT

dm = MNISTDataModule('.')
model = ImageGPT()

pl.Trainer(gpus=4).fit(model, datamodule=dm)

As script:

cd pl_bolts/models/vision/image_gpt
python igpt_module.py --learning_rate 1e-2 --batch_size 32 --gpus 4
Parameters
  • embed_dim (int) – the embedding dim

  • heads (int) – number of attention heads

  • layers (int) – number of layers

  • pixels (int) – number of input pixels

  • vocab_size (int) – vocab size

  • num_classes (int) – number of classes in the input

  • classify (bool) – whether to train with the classification head

  • batch_size (int) – the batch size

  • learning_rate (float) – learning rate

  • steps (int) – number of steps for cosine annealing

  • data_dir (str) – where to store data

  • num_workers (int) – number of data workers
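
The classify flag switches the model from generative pretraining to the classification objective. A hypothetical fine-tuning run might look like this (the arguments are from the signature above; max_steps=1000 is an arbitrary choice for illustration):

import pytorch_lightning as pl
from pl_bolts.datamodules import MNISTDataModule
from pl_bolts.models.vision import ImageGPT

# fine-tune with the classification head enabled
dm = MNISTDataModule('.')
model = ImageGPT(classify=True, num_classes=10)
pl.Trainer(max_steps=1000).fit(model, datamodule=dm)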


Pixel CNN

class pl_bolts.models.vision.PixelCNN(input_channels, hidden_channels=256, num_blocks=5)[source]

Bases: torch.nn.Module

Implementation of Pixel CNN.

Paper: Conditional Image Generation with PixelCNN Decoders

Paper authors: Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

Implemented by:

  • William Falcon

Example:

>>> from pl_bolts.models.vision import PixelCNN
>>> import torch
>>> model = PixelCNN(input_channels=3)
>>> x = torch.rand(5, 3, 64, 64)
>>> out = model(x)
>>> out.shape
torch.Size([5, 3, 64, 64])
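
Since PixelCNN here is a plain torch.nn.Module rather than a LightningModule, you wire it into your own training loop. A minimal sketch of one step, with a placeholder reconstruction loss (a real PixelCNN is trained with a per-pixel likelihood objective, not MSE):

import torch
import torch.nn.functional as F
from pl_bolts.models.vision import PixelCNN

model = PixelCNN(input_channels=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(5, 3, 64, 64)  # stand-in batch of images
out = model(x)                # same spatial shape as the input

# placeholder objective for illustration only
loss = F.mse_loss(out, x)
loss.backward()
optimizer.step()
optimizer.zero_grad()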

UNet

class pl_bolts.models.vision.UNet(num_classes, input_channels=3, num_layers=5, features_start=64, bilinear=False)[source]

Bases: torch.nn.Module

Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper authors: Olaf Ronneberger, Philipp Fischer, Thomas Brox

Implemented by: Annika Brundyn

Parameters
  • num_classes (int) – Number of output classes required

  • input_channels (int) – Number of channels in input images (default 3)

  • num_layers (int) – Number of layers in each side of U-net (default 5)

  • features_start (int) – Number of features in first layer (default 64)

  • bilinear (bool) – Whether to use bilinear interpolation or transposed convolutions (default) for upsampling.
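
A minimal usage sketch based on the parameters above; the output shape is the expected per-pixel class logits, assuming the input spatial size is divisible by 2^(num_layers - 1):

import torch
from pl_bolts.models.vision import UNet

model = UNet(num_classes=19)       # e.g. the 19 KITTI classes
x = torch.rand(1, 3, 256, 256)     # [batch, channels, height, width]
out = model(x)                     # per-pixel class logits, [1, 19, 256, 256]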


Semantic Segmentation

Model template to use for semantic segmentation tasks. The model uses a UNet architecture by default. Override any part of this model to build your own variation.

from pl_bolts.models.vision import SemSegment
from pl_bolts.datamodules import KittiDataModule
import pytorch_lightning as pl

dm = KittiDataModule('path/to/kitti/dataset/', batch_size=4)
model = SemSegment()
trainer = pl.Trainer()
trainer.fit(model, datamodule=dm)
class pl_bolts.models.vision.SemSegment(lr=0.01, num_classes=19, num_layers=5, features_start=64, bilinear=False)[source]

Bases: pytorch_lightning.LightningModule

Basic model for semantic segmentation. Uses UNet architecture by default.

The default parameters in this model are for the KITTI dataset. Note, if you’d like to use this model as is, you will first need to download the KITTI dataset yourself from the official KITTI website.

Implemented by:

Parameters
  • num_classes (int) – number of output classes (default 19)

  • num_layers (int) – number of layers in each side of U-net (default 5)

  • features_start (int) – number of features in first layer (default 64)

  • bilinear (bool) – whether to use bilinear interpolation (True) or transposed convolutions (default) for upsampling.

  • lr (float) – learning rate (default 0.01)
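
As noted above, any part of this template can be overridden to build a variation. One hypothetical example, using only the constructor arguments listed above:

from pl_bolts.models.vision import SemSegment

# hypothetical variation: a shallower, lighter U-Net with bilinear upsampling
class SmallSemSegment(SemSegment):
    def __init__(self):
        super().__init__(num_layers=3, features_start=32, bilinear=True)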
