Tutorials
Learn DataJoint by building real pipelines.
These tutorials guide you through building data pipelines step by step. Each tutorial is a Jupyter notebook that you can run interactively. Start with the basics and progress to domain-specific and advanced topics.
Quick Start
Install DataJoint:
pip install datajoint
Configure database credentials in your project (see Configuration):
# Create datajoint.json for non-sensitive settings
echo '{"database": {"host": "localhost", "port": 3306}}' > datajoint.json
# Create secrets directory for credentials
mkdir -p .secrets
echo "root" > .secrets/database.user
echo "password" > .secrets/database.password
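The same configuration files can be generated from Python using only the standard library. This is a sketch that mirrors the shell commands above; the file names and the one-file-per-credential layout of `.secrets` are taken from those commands:

```python
import json
from pathlib import Path

# Non-sensitive settings go in datajoint.json
settings = {"database": {"host": "localhost", "port": 3306}}
Path("datajoint.json").write_text(json.dumps(settings, indent=2))

# Credentials go in a .secrets directory, one file per value
secrets = Path(".secrets")
secrets.mkdir(exist_ok=True)
(secrets / "database.user").write_text("root")
(secrets / "database.password").write_text("password")
```

Keeping credentials out of `datajoint.json` means the settings file can be committed to version control while `.secrets/` is ignored.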
Define and populate a simple pipeline:
import datajoint as dj

schema = dj.Schema('my_pipeline')

@schema
class Subject(dj.Manual):
    definition = """
    subject_id : int32
    ---
    name : varchar(100)
    date_of_birth : date
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int16
    ---
    session_date : date
    """

@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    result : float64
    """

    def make(self, key):
        # Compute the result for this session
        self.insert1({**key, 'result': 42.0})

# Insert data
Subject.insert1({'subject_id': 1, 'name': 'M001', 'date_of_birth': '2025-01-15'})
Session.insert1({'subject_id': 1, 'session_idx': 1, 'session_date': '2026-01-06'})

# Run computations
SessionAnalysis.populate()
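The `{**key, 'result': 42.0}` idiom inside `make` merges the primary-key attributes that `populate` passes in with the newly computed values. This is plain Python dictionary merging, shown here outside DataJoint for clarity (the key values match the rows inserted above):

```python
# populate() calls make() once per missing entry; the key is a dict of
# primary-key attributes inherited from the upstream Session table.
key = {'subject_id': 1, 'session_idx': 1}

# Merging the key with the computed values produces the full row to insert.
row = {**key, 'result': 42.0}
print(row)  # {'subject_id': 1, 'session_idx': 1, 'result': 42.0}
```

Because the key already identifies the parent `Session` row, the computed row is guaranteed to reference exactly one upstream entry.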
Continue learning with the structured tutorials below.
Learning Paths
Choose your learning path based on your goals:
New to DataJoint
Goal: Understand core concepts and build your first pipeline
Path:
- First Pipeline – 30 min – Tables, queries, four core operations
- Schema Design – 45 min – Primary keys, relationships, table tiers
- Data Entry – 30 min – Inserting and managing data
- Queries – 45 min – Operators, restrictions, projections
- Try an example: University Database – Complete pipeline with realistic data
Next: Read Relational Workflow Model to understand the conceptual foundation.
Building Production Pipelines
Goal: Create automated, scalable data processing workflows
Prerequisites: Complete basics above or have equivalent experience
Path:
- Computation – Automated processing with Imported/Computed tables
- Object Storage – Handle large data (arrays, files, images)
- Distributed Computing – Multi-worker parallel execution
- Practice: Fractal Pipeline or Blob Detection
Next:
- Run Computations – populate() usage patterns
- Distributed Computing – Cluster deployment
- Handle Errors – Job management and recovery
Domain-Specific Applications
Goal: Build scientific data pipelines for your field
Prerequisites: Complete basics, understand computation model
Production Software: DataJoint Elements
Standard pipelines for neurophysiology experiments, actively used in many labs worldwide. These are not tutorials; they are production-ready modular pipelines for calcium imaging, electrophysiology, array ephys, optogenetics, and more.
Learning tutorials (neuroscience):
- Calcium Imaging – Import movies, segment cells, extract traces
- Electrophysiology – Import recordings, spike detection, waveforms
- Allen CCF – Hierarchical brain atlas ontology
General patterns:
- Hotel Reservations – Booking systems with resource management
- Languages & Proficiency – Many-to-many relationships
Extending DataJoint
Goal: Customize DataJoint for specialized needs
Prerequisites: Proficient with basics and production pipelines
Path:
- Custom Codecs – Create domain-specific data types
- JSON Data Type – Semi-structured data patterns
- SQL Comparison – Understand DataJoint's query algebra
Next:
- Codec API – Complete codec specification
- Create Custom Codec – Step-by-step codec development
Basics
Core concepts for getting started with DataJoint:
- First Pipeline – Tables, queries, and the four core operations
- Schema Design – Primary keys, relationships, and table tiers
- Data Entry – Inserting and managing data
- Queries – Operators and fetching results
- Computation – Imported and Computed tables
- Object Storage – Blobs, attachments, and object stores
Examples
Complete pipelines demonstrating DataJoint patterns:
- University Database – Academic records with students, courses, and grades
- Hotel Reservations – Booking system with rooms, guests, and reservations
- Languages & Proficiency – Language skills tracking with many-to-many relationships
- Fractal Pipeline – Iterative computation and parameter sweeps
- Blob Detection – Image processing with automated computation
Domain Tutorials
Real-world scientific pipelines:
- Calcium Imaging – Import TIFF movies, segment cells, extract fluorescence traces
- Electrophysiology – Import recordings, detect spikes, extract waveforms
- Electrophysiology with Object Storage – Neural data with <npy@> lazy loading
- Allen CCF – Brain atlas with hierarchical region ontology
Advanced Topics
Extending DataJoint for specialized use cases:
- SQL Comparison – DataJoint for SQL users
- JSON Data Type – Semi-structured data in tables
- Distributed Computing – Multi-process and cluster workflows
- Custom Codecs – Extending the type system
Running the Tutorials
# Clone the repository
git clone https://github.com/datajoint/datajoint-docs.git
cd datajoint-docs
# Start the tutorial environment
docker compose up -d
# Launch Jupyter
jupyter lab src/tutorials/
All tutorials use a local MySQL database that resets between sessions.