PixelMon: Deep Learning Image Generation Framework

PixelMon is a modular PyTorch framework for image generation using Variational Autoencoders and Generative Adversarial Networks. Designed for rapid experimentation and extensibility, it enables researchers and developers to explore generative models across diverse visual domains.

GAN-generated anime faces: example of high-quality synthesis

Model Architectures & Technical Approach

PixelMon implements two complementary generative approaches: Variational Autoencoders (VAEs) for learning compressed latent representations and Deep Convolutional GANs (DCGANs) for high-quality image synthesis. Both models are built from modular PyTorch components, so individual parts can be reconfigured or extended independently.

VAE Architecture

VAE model: convolutional encoder, latent space, and decoder

DCGAN Architecture

DCGAN: generator and discriminator networks

VAE Technical Details

Architecture Components

• Encoder: 3 convolutional blocks with BatchNorm and LeakyReLU
• Latent space: μ and σ outputs for the reparameterization trick
• Decoder: ConvTranspose2d layers for image reconstruction
• Loss: Reconstruction + KL divergence
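The latent sampling and the two-part loss listed above can be sketched in a few lines of PyTorch. This is a minimal illustration, not PixelMon's code: it assumes the encoder predicts the log-variance log σ² instead of σ directly (a common parameterization that keeps σ positive) and uses mean-squared error as the reconstruction term. The names `reparameterize`, `vae_loss`, and `kld_weight` are hypothetical.

```python
import torch
import torch.nn.functional as F


def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I); gradients flow to mu and logvar."""
    std = torch.exp(0.5 * logvar)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)    # the randomness is isolated in eps
    return mu + std * eps


def vae_loss(recon: torch.Tensor, target: torch.Tensor,
             mu: torch.Tensor, logvar: torch.Tensor,
             kld_weight: float = 1.0) -> torch.Tensor:
    """Reconstruction (MSE) plus weighted KL divergence, both averaged over the batch."""
    batch = target.size(0)
    recon_loss = F.mse_loss(recon, target, reduction="sum") / batch
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    return recon_loss + kld_weight * kld
```

The KL term pulls each posterior toward the standard normal prior, which is what makes the latent space Gaussian and suitable for interpolation; `kld_weight` is an optional knob for trading reconstruction sharpness against latent regularity.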

Technical Features

• Configurable hidden dimensions: [32, 64, 128]
• Latent dimension: 16 for efficient compression
• Input/Output: 3×16×16 RGB images
• Gaussian latent space for smooth interpolation
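Taken together, these settings mean three stride-2 convolutional blocks shrink a 3×16×16 image to a 128×2×2 feature map (512 values) before it is projected to the 16-dimensional latent space. The class below is a hypothetical sketch under those assumptions (kernel size 3, stride 2, LeakyReLU slope 0.2, and a Sigmoid output that expects inputs scaled to [0, 1]); it is not the project's `VanillaVAE` implementation.

```python
import torch
import torch.nn as nn


class SketchVAE(nn.Module):
    """Hypothetical convolutional VAE for square RGB images (default 3x16x16)."""

    def __init__(self, in_channels=3, latent_dim=16, hidden_dims=(32, 64, 128), image_size=16):
        super().__init__()
        # Encoder: each Conv2d block halves the spatial size (16 -> 8 -> 4 -> 2).
        layers, c = [], in_channels
        for h in hidden_dims:
            layers += [nn.Conv2d(c, h, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(h), nn.LeakyReLU(0.2)]
            c = h
        self.encoder = nn.Sequential(*layers)
        self.last_channels = hidden_dims[-1]
        self.spatial = image_size // 2 ** len(hidden_dims)  # 2 for a 16x16 input and 3 blocks
        flat = self.last_channels * self.spatial ** 2       # 128 * 2 * 2 = 512
        self.fc_mu = nn.Linear(flat, latent_dim)
        self.fc_logvar = nn.Linear(flat, latent_dim)
        self.fc_decode = nn.Linear(latent_dim, flat)
        # Decoder: ConvTranspose2d blocks double the spatial size (2 -> 4 -> 8 -> 16).
        rev = list(hidden_dims)[::-1]                       # [128, 64, 32]
        dec = []
        for c_in, c_out in zip(rev[:-1], rev[1:]):
            dec += [nn.ConvTranspose2d(c_in, c_out, 3, stride=2, padding=1, output_padding=1),
                    nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2)]
        dec += [nn.ConvTranspose2d(rev[-1], in_channels, 3, stride=2, padding=1, output_padding=1),
                nn.Sigmoid()]                               # assumes inputs scaled to [0, 1]
        self.decoder = nn.Sequential(*dec)

    def encode(self, x):
        h = torch.flatten(self.encoder(x), start_dim=1)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        h = self.fc_decode(z).view(-1, self.last_channels, self.spatial, self.spatial)
        return self.decoder(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decode(z), mu, logvar


model = SketchVAE()
x = torch.rand(8, 3, 16, 16)
recon, mu, logvar = model(x)
print(recon.shape, mu.shape)  # torch.Size([8, 3, 16, 16]) torch.Size([8, 16])
```

Compressing 768 input values into 16 latent values is what forces the model to keep only the most salient structure of each sprite.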

DCGAN Technical Details

Generator Network

• Input: Random noise vector (latent_dim=100)
• Architecture: ConvTranspose2d upsampling layers
• Normalization: BatchNorm2d for stable training
• Activation: ReLU → Tanh output
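A generator along these lines can be sketched as follows. The sketch assumes a 3×16×16 output to match the pixel art images, 4×4 kernels, and a base width of 64 feature maps; those choices and the class name `SketchGenerator` are illustrative rather than taken from PixelMon. Because the last layer is Tanh, training images would need to be scaled to [-1, 1].

```python
import torch
import torch.nn as nn


class SketchGenerator(nn.Module):
    """Hypothetical DCGAN generator: (N, 100, 1, 1) noise -> (N, 3, 16, 16) image in [-1, 1]."""

    def __init__(self, latent_dim=100, feature_maps=64, out_channels=3):
        super().__init__()
        f = feature_maps
        self.net = nn.Sequential(
            # Project the noise vector to a 4x4 feature map.
            nn.ConvTranspose2d(latent_dim, f * 4, 4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(f * 4), nn.ReLU(True),
            # Upsample 4x4 -> 8x8.
            nn.ConvTranspose2d(f * 4, f * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(f * 2), nn.ReLU(True),
            # Upsample 8x8 -> 16x16; no BatchNorm on the output layer.
            nn.ConvTranspose2d(f * 2, out_channels, 4, stride=2, padding=1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)


gen = SketchGenerator()
samples = gen(torch.randn(16, 100, 1, 1))  # 16 random latent vectors
print(samples.shape)  # torch.Size([16, 3, 16, 16])
```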

Discriminator Network

• Input: Real or generated images
• Architecture: Conv2d feature extraction
• Activation: LeakyReLU for gradient flow
• Output: Binary classification (real/fake)
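A matching discriminator for 16×16 inputs might look like the hypothetical sketch below. It returns one raw logit per image and pairs it with `nn.BCEWithLogitsLoss`, which folds the sigmoid into the loss for better numerical stability. The layer widths, the 0.2 LeakyReLU slope, and skipping BatchNorm on the first layer follow common DCGAN practice, not PixelMon's source.

```python
import torch
import torch.nn as nn


class SketchDiscriminator(nn.Module):
    """Hypothetical DCGAN discriminator: (N, 3, 16, 16) image -> (N,) real/fake logits."""

    def __init__(self, in_channels=3, feature_maps=64):
        super().__init__()
        f = feature_maps
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, f, 4, stride=2, padding=1, bias=False),  # 16 -> 8
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(f, f * 2, 4, stride=2, padding=1, bias=False),        # 8 -> 4
            nn.BatchNorm2d(f * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(f * 2, 1, 4, stride=1, padding=0, bias=False),        # 4 -> 1
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one raw logit per image


disc = SketchDiscriminator()
criterion = nn.BCEWithLogitsLoss()
real = torch.rand(8, 3, 16, 16) * 2 - 1                 # placeholder images in [-1, 1]
logits = disc(real)
loss_real = criterion(logits, torch.ones_like(logits))  # label 1 = real
```

LeakyReLU keeps a small gradient for negative activations, so the generator still receives a useful training signal when the discriminator is confident.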

Implementation Philosophy

Modularity

Each component, from dataset loaders to model architectures to training scripts, is designed as an independent, reusable module. Data loaders and models inherit from PyTorch base classes, so they integrate directly with standard PyTorch tooling.

Experimentation

The framework enables rapid switching between models and datasets with minimal code changes, supporting iterative research and development workflows.
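One common way to get this kind of switching is a name-to-builder registry, so an experiment is selected through configuration rather than by editing imports. The snippet is a hypothetical sketch of that pattern, not PixelMon's mechanism; the builders return placeholder strings where real code would construct an `nn.Module` or a `Dataset`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ExperimentConfig:
    model: str
    dataset: str
    epochs: int = 10


MODEL_REGISTRY: Dict[str, Callable[[], str]] = {}
DATASET_REGISTRY: Dict[str, Callable[[], str]] = {}


def register(registry: Dict[str, Callable[[], str]], name: str):
    """Decorator that stores a builder function under `name`."""
    def wrap(builder):
        registry[name] = builder
        return builder
    return wrap


@register(MODEL_REGISTRY, "vae")
def build_vae():
    return "VAE model"            # stand-in for constructing an nn.Module


@register(MODEL_REGISTRY, "dcgan")
def build_dcgan():
    return "DCGAN model"


@register(DATASET_REGISTRY, "pixelart")
def build_pixelart():
    return "pixel art dataset"    # stand-in for constructing a Dataset


@register(DATASET_REGISTRY, "anime")
def build_anime():
    return "anime faces dataset"


def build_experiment(cfg: ExperimentConfig):
    """Look up both components by name; swapping one is a one-field config change."""
    return MODEL_REGISTRY[cfg.model](), DATASET_REGISTRY[cfg.dataset]()


print(build_experiment(ExperimentConfig(model="dcgan", dataset="anime")))
# ('DCGAN model', 'anime faces dataset')
```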

Reproducibility

Consistent random seeds, parameter tracking, and standardized training loops support reliable comparisons between experiments and validation of results.
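Seeding every random number generator a run touches is the usual first step toward reproducibility. The helper below is a sketch built on standard Python, NumPy, and PyTorch calls; the name `set_seed` and the default seed value are assumptions. The cuDNN flags trade some speed for deterministic convolution algorithm selection.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs and prefer deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU generator (and CUDA in recent versions)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(0)
first = torch.rand(3)
set_seed(0)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```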

Experiments & Datasets

PixelMon was evaluated on two primary datasets, each chosen to test different aspects of generative modeling: low-resolution pixel art for rapid prototyping and anime faces for complex, high-variance generation. These datasets highlight the framework's versatility and robustness across diverse visual domains.

Pixel Art Dataset

Sample of the 89,000-image pixel art dataset (16×16 RGB)

The dataset contains 89,000 images at 16×16 RGB resolution, drawn from diverse pixel art styles and games. It was chosen for its clear visual patterns, modest computational requirements, and suitability for rapid experimentation and architecture validation.

Dataset Characteristics

• Diversity: Multiple art styles from classic arcade games to modern indie titles
• Consistency: Uniform 16×16 resolution with clear, distinct visual patterns
• Complexity: Balanced between simplicity for rapid training and richness for meaningful generation
• Training Speed: Small image size enables fast iteration cycles
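A quick calculation shows why this dataset is cheap to iterate on. The numbers count raw pixel values only (ignoring file-format overhead) and reuse the batch size of 50 from the Python API example on this page.

```python
num_images = 89_000
channels, height, width = 3, 16, 16

values_per_image = channels * height * width   # 768 values per image
total_values = num_images * values_per_image   # 68,352,000 values in the dataset
uint8_mb = total_values * 1 / 1e6              # stored as 8-bit pixels
float32_mb = total_values * 4 / 1e6            # after conversion to float32 tensors
batches_per_epoch = -(-num_images // 50)       # ceiling division with batch_size=50

print(values_per_image, total_values, round(uint8_mb, 1), round(float32_mb, 1), batches_per_epoch)
# 768 68352000 68.4 273.4 1780
```

Even as float32 tensors the whole dataset is roughly 273 MB, small enough to hold in memory, and one epoch is only 1,780 batches.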

Anime Faces Dataset

Sample of the 63,632-image anime faces dataset

The dataset contains 63,632 images of anime character faces, with complex facial features, diverse art styles, and challenging generation targets. It was used to validate model performance on higher-complexity visual patterns and scalability toward more realistic domains.

Dataset Characteristics

• Complexity: Rich facial features, expressions, and artistic variations
• Variance: Wide range of character designs, hair colors, and facial structures
• Challenge: Tests the model's ability to capture fine details and avoid mode collapse
• Realism: Bridges the gap between synthetic pixel art and natural image domains

Dataset Infrastructure & Extensibility

Extensible Framework: Additional datasets including Pokemon (900 images), Landscapes (12,000 images), and MNIST are supported for future experiments and comparative studies.

PyTorch Integration: All data loaders inherit from PyTorch's Dataset class and include automatic preprocessing, normalization, and tensor conversion.

Modular Design: Each dataset handler can be imported independently, enabling custom experimentation workflows and easy integration with external datasets.

Scalability Testing: Framework architecture validated across resolutions from 16×16 to 224×224, demonstrating adaptability to diverse image sizes and domains.
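A loader in this style subclasses `torch.utils.data.Dataset` and does normalization, tensor conversion, and resizing inside `__getitem__`. The sketch assumes the images are already in memory as a uint8 tensor of shape (N, H, W, 3); the class name `InMemoryImageDataset` and the [-1, 1] target range are assumptions for illustration, and the configurable `image_size` reflects the 16×16 to 224×224 range mentioned above.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset


class InMemoryImageDataset(Dataset):
    """Hypothetical loader: uint8 images (N, H, W, 3) -> float tensors (3, S, S) in [-1, 1]."""

    def __init__(self, images: torch.Tensor, image_size: int = 16):
        self.images = images
        self.image_size = image_size

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # HWC uint8 -> CHW float in [0, 1]
        img = self.images[idx].permute(2, 0, 1).float() / 255.0
        if img.shape[-2:] != (self.image_size, self.image_size):
            img = F.interpolate(img.unsqueeze(0), size=(self.image_size, self.image_size),
                                mode="bilinear", align_corners=False).squeeze(0)
        return img * 2.0 - 1.0  # [0, 1] -> [-1, 1], matching a Tanh generator output


fake = torch.randint(0, 256, (10, 16, 16, 3), dtype=torch.uint8)  # placeholder images
ds = InMemoryImageDataset(fake, image_size=16)
loader = DataLoader(ds, batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([4, 3, 16, 16])
```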

Results

PixelMon produced coherent samples on both datasets with both model types, indicating that the same modular components carry over between visual domains. Below are representative samples and key outcomes for each experiment.

VAE: Pixel Art

VAE-generated pixel art after 150 epochs

• Clear latent space organization and smooth interpolation
• Fast convergence and stable training
• Meaningful reconstruction of diverse art styles
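Smooth interpolation can be checked by decoding points along a straight line between two latent codes. In this sketch the decoder is a stand-in linear layer so the code runs on its own; with a trained VAE one would typically use the encoder means μ of two real images as `z_a` and `z_b`. The function names are hypothetical.

```python
import torch
import torch.nn as nn


def interpolate_latents(z_start: torch.Tensor, z_end: torch.Tensor, steps: int = 8) -> torch.Tensor:
    """Linearly blend two latent vectors: z_t = (1 - t) * z_start + t * z_end for t in [0, 1]."""
    t = torch.linspace(0.0, 1.0, steps).unsqueeze(1)  # shape (steps, 1)
    return (1 - t) * z_start.unsqueeze(0) + t * z_end.unsqueeze(0)


@torch.no_grad()
def decode_path(decoder: nn.Module, z_start, z_end, steps: int = 8) -> torch.Tensor:
    """Decode every point on the interpolation path into an image."""
    return decoder(interpolate_latents(z_start, z_end, steps))


# Stand-in decoder for illustration only: maps a 16-d latent to a 3x16x16 image.
decoder = nn.Sequential(nn.Linear(16, 3 * 16 * 16), nn.Sigmoid(), nn.Unflatten(1, (3, 16, 16)))
z_a, z_b = torch.randn(16), torch.randn(16)
frames = decode_path(decoder, z_a, z_b, steps=8)
print(frames.shape)  # torch.Size([8, 3, 16, 16])
```

With a well-organized latent space, consecutive frames change gradually instead of jumping between unrelated sprites.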

GAN: Pixel Art

GAN-generated pixel art after 10 epochs

• Coherent pixel art styles and character features
• High-quality generation with minimal overfitting
• Rapid training convergence in just 10 epochs

VAE: Anime Faces

VAE-generated anime faces

• Successful encoding and reconstruction of facial features
• Demonstrated scalability to complex domains
• Preserved artistic style consistency

GAN: Anime Faces

GAN-generated anime faces

• High-quality face generation with style consistency
• Robust performance on high-variance data
• Sharp, detailed facial features and expressions

Key Achievements

Modular Framework: Enabled rapid cross-dataset and cross-model experimentation
Stable Training: Consistent convergence and reproducibility across runs
Quality Generation: Maintained visual coherence and domain-specific characteristics

Extensibility: Easy to add new datasets, models, and training strategies
Performance: Achieved strong results with minimal hyperparameter tuning
Documentation: Comprehensive code documentation and usage examples

Technologies & Tools

• Python
• PyTorch
• Variational Autoencoders
• Deep Convolutional GANs
• Computer Vision
• Kaggle Datasets
• Image Generation
• Neural Networks

Installation & Usage

```bash
# Clone and install the PixelMon framework
git clone https://github.com/smiley-maker/PixelMon.git
cd PixelMon
pip install -e .

# Run pre-configured experiments
python -m src.pipelines.pixelart_pipeline
python -m src.pipelines.pixelartgan_pipeline
```

Python API Example

```python
# Import PixelMon components
from src.model_architectures.VAE.model.vae_model import VanillaVAE
from src.data_loaders.pixelart_handler import PixelArtDataset
from src.training_scripts.train_vae import TrainVAE

# Configure and train model: RGB input, 16-d latent space, three encoder blocks
model = VanillaVAE(in_channels=3, latent_dim=16, hidden_dims=[32, 64, 128])
dataset = PixelArtDataset()
trainer = TrainVAE(model=model, epochs=150, batch_size=50)
trainer.train_model()
```

Technical Documentation & Blog

Medium Blog Post: "Image Generation: VAEs and GANs"

I documented the complete development process, technical challenges, and experimental results in a comprehensive blog post. The article covers the mathematical foundations of both VAEs and GANs, implementation details, and lessons learned from training on diverse datasets.

Topics Covered: VAE architecture, GAN training dynamics, dataset preprocessing, loss function optimization, and practical tips for stable training across different image domains.

Key Technical Insights

• VAE Latent Space: Lower-dimensional representations effectively capture essential image features across different domains
• GAN Training: Carefully balancing the generator and discriminator learning rates helps prevent mode collapse
• Dataset Scaling: Framework validated across resolutions from 16×16 up to 224×224 and across increasingly complex domains
• Framework Design: Modular architecture enables rapid experimentation across different model types and datasets
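The point about learning rates can be made concrete with a single training step that gives each network its own Adam optimizer. This is a generic sketch, not PixelMon's training script: tiny fully connected networks stand in for the DCGAN modules, the betas (0.5, 0.999) follow common DCGAN practice, and the learning rates (1e-4 for the generator, 4e-4 for the discriminator) are placeholders showing where the balance is set, not tuned values from this project.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 3 * 16 * 16
# Tiny stand-in networks so the step runs on its own; swap in the DCGAN modules in practice.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

# Separate optimizers let the two players learn at different speeds (placeholder values).
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()


def train_step(real: torch.Tensor):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Discriminator: push real -> 1 and fake -> 0; detach so G receives no gradient here.
    fake = G(torch.randn(n, latent_dim))
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator: non-saturating loss, i.e. try to make D label the fakes as real.
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()


real_batch = torch.rand(50, img_dim) * 2 - 1  # placeholder data scaled to [-1, 1]
print(train_step(real_batch))
```

If the discriminator wins too easily, its loss collapses toward zero and the generator's gradients become uninformative; lowering the discriminator's rate (or raising the generator's) is one way to rebalance.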

Development Learnings

• Code Organization: Proper separation of concerns between data loading, model architecture, and training logic
• Reproducibility: Consistent random seeds and parameter tracking for reliable experimental comparisons
• GPU Optimization: Efficient batch processing and memory management for large-scale training
• Library Design: PyTorch-compatible interfaces enable easy integration with existing ML workflows
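For GPU training, much of the batch-processing efficiency comes from `DataLoader` settings and how batches reach the device. The sketch below uses random tensors in place of a real dataset and a single convolution in place of a model; `pin_memory` combined with `non_blocking=True` lets host-to-GPU copies overlap with computation, and the code falls back to the CPU when no GPU is present. The specific values are assumptions, not PixelMon's configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = TensorDataset(torch.rand(1000, 3, 16, 16))  # placeholder images
loader = DataLoader(
    data,
    batch_size=50,
    shuffle=True,
    num_workers=0,                          # raise (e.g. 2-4) when decoding images from disk
    pin_memory=torch.cuda.is_available(),   # page-locked host memory speeds host->GPU copies
)

model = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)  # stand-in model

n_batches = 0
for (batch,) in loader:
    batch = batch.to(device, non_blocking=True)  # asynchronous copy when memory is pinned
    with torch.no_grad():
        out = model(batch)
    n_batches += 1

print(n_batches, tuple(out.shape))  # 20 (50, 8, 16, 16)
```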
