Tutorials

Learn DataJoint by building real pipelines.

These tutorials guide you through building data pipelines step by step. Each tutorial is a Jupyter notebook that you can run interactively. Start with the basics and progress to domain-specific and advanced topics.

Quick Start

Install DataJoint:

pip install datajoint

Configure database credentials in your project (see Configuration):

# Create datajoint.json for non-sensitive settings
echo '{"database": {"host": "localhost", "port": 3306}}' > datajoint.json

# Create secrets directory for credentials
mkdir -p .secrets
echo "root" > .secrets/database.user
echo "password" > .secrets/database.password

Define and populate a simple pipeline:

import datajoint as dj

schema = dj.Schema('my_pipeline')

@schema
class Subject(dj.Manual):
    definition = """
    subject_id : int32
    ---
    name : varchar(100)
    date_of_birth : date
    """

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session_idx : int16
    ---
    session_date : date
    """

@schema
class SessionAnalysis(dj.Computed):
    definition = """
    -> Session
    ---
    result : float64
    """

    def make(self, key):
        # Compute result for this session
        self.insert1({**key, 'result': 42.0})

# Insert data
Subject.insert1({'subject_id': 1, 'name': 'M001', 'date_of_birth': '2026-01-15'})
Session.insert1({'subject_id': 1, 'session_idx': 1, 'session_date': '2026-02-06'})

# Run computations
SessionAnalysis.populate()
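Conceptually, `populate()` finds the `Session` keys that have no `SessionAnalysis` entry yet and calls `make()` once per missing key. The loop below is a plain-Python sketch of that idea using dict-based stand-ins for the tables; it is not the DataJoint implementation:

```python
# Stand-ins for table contents: each table is a list of row dicts
sessions = [
    {"subject_id": 1, "session_idx": 1},
    {"subject_id": 1, "session_idx": 2},
]
session_analysis = [{"subject_id": 1, "session_idx": 1, "result": 42.0}]

def make(key):
    """Compute and insert the result for one session key."""
    session_analysis.append({**key, "result": 42.0})

def populate():
    # Key source: primary keys of Session minus keys already in SessionAnalysis
    done = {(row["subject_id"], row["session_idx"]) for row in session_analysis}
    for key in sessions:
        if (key["subject_id"], key["session_idx"]) not in done:
            make(key)

populate()
# session_analysis now holds one row per session
```

Because only missing keys are processed, calling `populate()` again is a no-op, which is what makes the computation safe to re-run.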

Continue learning with the structured tutorials below.

Learning Paths

Choose your learning path based on your goals:

🌱 New to DataJoint

Goal: Understand core concepts and build your first pipeline

Path:

  1. First Pipeline – 30 min – Tables, queries, four core operations
  2. Schema Design – 45 min – Primary keys, relationships, table tiers
  3. Data Entry – 30 min – Inserting and managing data
  4. Queries – 45 min – Operators, restrictions, projections
  5. Try an example: University Database – Complete pipeline with realistic data

Next: Read Relational Workflow Model to understand the conceptual foundation.


🚀 Building Production Pipelines

Goal: Create automated, scalable data processing workflows

Prerequisites: Complete basics above or have equivalent experience

Path:

  1. Computation – Automated processing with Imported/Computed tables
  2. Object Storage – Handle large data (arrays, files, images)
  3. Distributed Computing – Multi-worker parallel execution
  4. Practice: Fractal Pipeline or Blob Detection


🧪 Domain-Specific Applications

Goal: Build scientific data pipelines for your field

Prerequisites: Complete basics, understand computation model

Production Software: DataJoint Elements

Standard pipelines for neurophysiology experiments, actively used in many labs worldwide. These are not tutorials; they are production-ready modular pipelines for calcium imaging, electrophysiology, array ephys, optogenetics, and more.

Learning tutorials (neuroscience):

General patterns:


🔧 Extending DataJoint

Goal: Customize DataJoint for specialized needs

Prerequisites: Proficient with basics and production pipelines

Path:

  1. Custom Codecs – Create domain-specific data types
  2. JSON Data Type – Semi-structured data patterns
  3. SQL Comparison – Understand DataJoint's query algebra


Basics

Core concepts for getting started with DataJoint:

  1. First Pipeline – Tables, queries, and the four core operations
  2. Schema Design – Primary keys, relationships, and table tiers
  3. Data Entry – Inserting and managing data
  4. Queries – Operators and fetching results
  5. Computation – Imported and Computed tables
  6. Object Storage – Blobs, attachments, and object stores

Examples

Complete pipelines demonstrating DataJoint patterns:

Domain Tutorials

Real-world scientific pipelines:

Advanced Topics

Extending DataJoint for specialized use cases:

Running the Tutorials

# Clone the repository
git clone https://github.com/datajoint/datajoint-docs.git
cd datajoint-docs

# Start the tutorial environment
docker compose up -d

# Launch Jupyter
jupyter lab src/tutorials/

All tutorials use a local MySQL database that resets between sessions.