Tutorials
Learn DataJoint by building real pipelines.
These tutorials guide you through building data pipelines step by step. Each tutorial is a Jupyter notebook that you can run interactively. Start with the basics and progress to domain-specific and advanced topics.
Quick Start
Install DataJoint:
pip install datajoint
Configure database credentials in your project (see Configuration):
# Create datajoint.json for non-sensitive settings
echo '{"database": {"host": "localhost", "port": 3306}}' > datajoint.json
# Create secrets directory for credentials
mkdir -p .secrets
echo "root" > .secrets/database.user
echo "password" > .secrets/database.password
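The same configuration files can be generated from Python using only the standard library. This is a sketch that mirrors the shell commands above; the file names and the one-file-per-credential layout of `.secrets` are taken from those commands:

```python
import json
from pathlib import Path

# Non-sensitive settings go in datajoint.json
settings = {"database": {"host": "localhost", "port": 3306}}
Path("datajoint.json").write_text(json.dumps(settings, indent=2))

# Credentials go in a .secrets directory, one file per value
secrets = Path(".secrets")
secrets.mkdir(exist_ok=True)
(secrets / "database.user").write_text("root")
(secrets / "database.password").write_text("password")
```

Keeping credentials out of `datajoint.json` means the settings file can be committed to version control while `.secrets/` is ignored.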
Define and populate a simple pipeline:
import datajoint as dj

schema = dj.Schema('my_pipeline')

@schema
class Subject(dj.Manual):
    definition = """
    subject_id : int32
    ---
    name : varchar(100)
    date_of_birth : date
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int16
    ---
    session_date : date
    """

@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    result : float64
    """

    def make(self, key):
        # Compute the result for this session
        self.insert1({**key, 'result': 42.0})

# Insert data
Subject.insert1({'subject_id': 1, 'name': 'M001', 'date_of_birth': '2025-01-15'})
Session.insert1({'subject_id': 1, 'session_idx': 1, 'session_date': '2026-01-06'})

# Run computations
SessionAnalysis.populate()
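The `{**key, 'result': 42.0}` idiom inside `make` merges the primary-key attributes that `populate` passes in with the newly computed values. This is plain Python dictionary merging, shown here outside DataJoint for clarity (the key values match the rows inserted above):

```python
# populate() calls make() once per missing entry; the key is a dict of
# primary-key attributes inherited from the upstream Session table.
key = {'subject_id': 1, 'session_idx': 1}

# Merging the key with the computed values produces the full row to insert.
row = {**key, 'result': 42.0}
print(row)  # {'subject_id': 1, 'session_idx': 1, 'result': 42.0}
```

Because the key already identifies the parent `Session` row, the computed row is guaranteed to reference exactly one upstream entry.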
Continue learning with the structured tutorials below.
Learning Paths
Choose your learning path based on your goals:
New to DataJoint
Goal: Understand core concepts and build your first pipeline
Path:
- First Pipeline – 30 min – Tables, queries, four core operations
- Schema Design – 45 min – Primary keys, relationships, table tiers
- Data Entry – 30 min – Inserting and managing data
- Queries – 45 min – Operators, restrictions, projections
- Try an example: University Database – Complete pipeline with realistic data
Next: Read Relational Workflow Model to understand the conceptual foundation.
Building Production Pipelines
Goal: Create automated, scalable data processing workflows
Prerequisites: Complete basics above or have equivalent experience
Path:
- Computation – Automated processing with Imported/Computed tables
- Object Storage – Handle large data (arrays, files, images)
- Distributed Computing – Multi-worker parallel execution
- Practice: Fractal Pipeline or Blob Detection
Next:
- Run Computations – populate() usage patterns
- Distributed Computing – Cluster deployment
- Handle Errors – Job management and recovery
Domain-Specific Applications
Goal: Build scientific data pipelines for your field
Prerequisites: Complete basics, understand computation model
Production Software: DataJoint Elements
Standard pipelines for neurophysiology experiments, actively used in many labs worldwide. These are not tutorials; they are production-ready modular pipelines for calcium imaging, electrophysiology, array ephys, optogenetics, and more.
Learning tutorials (neuroscience):
- Calcium Imaging – Import movies, segment cells, extract traces
- Electrophysiology – Import recordings, spike detection, waveforms
- Allen CCF – Hierarchical brain atlas ontology
General patterns:
- Hotel Reservations – Booking systems with resource management
- Languages & Proficiency – Many-to-many relationships
Extending DataJoint
Goal: Customize DataJoint for specialized needs
Prerequisites: Proficient with basics and production pipelines
Path:
- Custom Codecs – Create domain-specific data types
- JSON Data Type – Semi-structured data patterns
- SQL Comparison – Understand DataJoint's query algebra
Next:
- Codec API – Complete codec specification
- Create Custom Codec – Step-by-step codec development
Basics
Core concepts for getting started with DataJoint:
- First Pipeline – Tables, queries, and the four core operations
- Schema Design – Primary keys, relationships, and table tiers
- Data Entry – Inserting and managing data
- Queries – Operators and fetching results
- Computation – Imported and Computed tables
- Object Storage – Blobs, attachments, and object stores
Examples
Complete pipelines demonstrating DataJoint patterns:
- University Database – Academic records with students, courses, and grades
- Hotel Reservations – Booking system with rooms, guests, and reservations
- Languages & Proficiency – Language skills tracking with many-to-many relationships
- Fractal Pipeline – Iterative computation and parameter sweeps
- Blob Detection – Image processing with automated computation
Domain Tutorials
Real-world scientific pipelines:
- Calcium Imaging – Import TIFF movies, segment cells, extract fluorescence traces
- Electrophysiology – Import recordings, detect spikes, extract waveforms
- Electrophysiology with Object Storage – Neural data with <npy@> lazy loading
- Allen CCF – Brain atlas with hierarchical region ontology
Advanced Topics
Extending DataJoint for specialized use cases:
- SQL Comparison – DataJoint for SQL users
- JSON Data Type – Semi-structured data in tables
- Distributed Computing – Multi-process and cluster workflows
- Custom Codecs – Extending the type system
Running the Tutorials
# Clone the repository
git clone https://github.com/datajoint/datajoint-docs.git
cd datajoint-docs
# Start the tutorial environment
docker compose up -d
# Launch Jupyter
jupyter lab src/tutorials/
All tutorials use a local MySQL database that resets between sessions.