Documentation Best Practices

This guide provides practical advice for creating high-quality documentation for scientific Python projects, with a focus on beginner-friendly approaches and integration with lab workflows.

Why Documentation Matters

Good documentation:

  • πŸ“– Helps future you: Remember what you did 6 months from now

  • 🀝 Enables collaboration: Team members can understand and use your code

  • πŸ”¬ Ensures reproducibility: Others can replicate your analysis

  • πŸ“Š Supports publication: Methods section writes itself

  • βš–οΈ Meets compliance: Required for regulated industries

  • πŸŽ“ Facilitates learning: Onboard new lab members faster

Reality check: Writing documentation takes time, but saves much more time later.

Documentation Types

Different documentation serves different purposes:

.. list-table::
   :widths: 25 35 40
   :header-rows: 1

   * - Type
     - Purpose
     - Examples
   * - Code Documentation
     - Explain how functions work
     - Docstrings, inline comments
   * - User Guide
     - Show how to use the code
     - Tutorials, examples, workflows
   * - API Reference
     - List all functions/classes
     - Auto-generated from docstrings
   * - Theory/Methods
     - Explain the science
     - Equations, algorithms, references
   * - Contributing Guide
     - Help others contribute
     - Setup, style guide, PR process

For beginners: Start with docstrings and one example. Expand from there.

Writing Docstrings

Docstrings are documentation that lives inside your code. Scientific Python projects commonly use NumPy-style docstrings:

Basic Structure

def function_name(parameter1, parameter2):
    """
    One-line summary of what the function does.

    Longer description with more details about the function,
    its purpose, and how it works. Can span multiple lines.

    Parameters
    ----------
    parameter1 : type
        Description of the first parameter.
    parameter2 : type
        Description of the second parameter.

    Returns
    -------
    return_type
        Description of what the function returns.

    Examples
    --------
    >>> result = function_name(1, 2)
    >>> print(result)
    3

    Notes
    -----
    Any additional notes, warnings, or important information.

    See Also
    --------
    related_function : Brief description of relation
    """
    # Function implementation
    pass
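Python stores the docstring as the function's ``__doc__`` attribute, which is exactly what ``help()`` and Sphinx read. A minimal illustration:

```python
def add(a, b):
    """Return the sum of a and b."""
    return a + b

# The docstring is available at runtime:
print(add.__doc__)
help(add)  # pretty-prints the same text
```

Because the docstring travels with the function object, every tool in the chain (IDEs, ``pydoc``, autodoc) sees the same text.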

Docstring Sections

Required sections:

  • Summary line: One line explaining what it does

  • Parameters: All input parameters with types

  • Returns: What the function returns

Recommended sections:

  • Examples: Actual code showing usage

  • Notes: Important details, limitations, or warnings

  • See Also: Links to related functions

Optional sections:

  • References: Citations to papers or books

  • Raises: Exceptions that might be raised
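As an illustration of a ``Raises`` section, here is a hypothetical loader (the function name and its behaviour are assumptions made up for this example):

```python
import io
import numpy as np

def load_chromatogram(source):
    """
    Load time and absorbance columns from a two-column text file.

    Parameters
    ----------
    source : str or file-like
        Path to, or open handle of, a whitespace-separated file.

    Returns
    -------
    time, absorbance : np.ndarray
        The two columns as 1D arrays.

    Raises
    ------
    OSError
        If ``source`` is a path that cannot be opened.
    ValueError
        If the data does not contain exactly two columns.
    """
    data = np.loadtxt(source)
    if data.ndim != 2 or data.shape[1] != 2:
        raise ValueError("expected exactly two columns (time, absorbance)")
    return data[:, 0], data[:, 1]

# Works on an in-memory file as well as a path
time, absorbance = load_chromatogram(io.StringIO("0.0 2.0\n0.1 5.0\n0.2 8.0"))
```

Listing the exceptions a caller should expect is far cheaper than having them discover each one by crashing.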

Example: Good Docstring

import numpy as np

def calculate_peak_area(time, absorbance, start_idx, end_idx):
    """
    Calculate the area under a chromatographic peak.

    Uses trapezoidal integration to compute the peak area between
    specified indices. This is a standard method for quantitative
    chromatography analysis.

    Parameters
    ----------
    time : np.ndarray
        Time values in minutes (1D array).
    absorbance : np.ndarray
        Absorbance values in mAU (1D array).
    start_idx : int
        Index of peak start in the data arrays.
    end_idx : int
        Index of peak end in the data arrays.

    Returns
    -------
    area : float
        Peak area in mAUΒ·min.

    Examples
    --------
    >>> time = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
    >>> absorbance = np.array([2.0, 5.0, 8.0, 5.0, 2.0])
    >>> area = calculate_peak_area(time, absorbance, 0, 4)
    >>> print(f"Peak area: {area:.2f} mAUΒ·min")
    Peak area: 2.00 mAUΒ·min

    Notes
    -----
    For eLabFTW integration, document integration parameters
    (start/end indices, baseline correction) in the experiment
    record for full traceability.

    References
    ----------
    .. [1] Snyder, L. R., et al. (2010). Introduction to Modern
           Liquid Chromatography. Wiley.
    """
    # np.trapezoid requires NumPy >= 2.0 (called np.trapz in older versions)
    return np.trapezoid(absorbance[start_idx:end_idx + 1],
                        time[start_idx:end_idx + 1])

Type Hints

Use type hints for clarity:

from typing import List
import numpy as np

def find_peaks(time: np.ndarray,
               absorbance: np.ndarray,
               threshold: float = 10.0) -> List[dict]:
    """
    Find peaks in chromatogram data.

    Parameters
    ----------
    time : np.ndarray
        Time values.
    absorbance : np.ndarray
        Absorbance values.
    threshold : float, optional
        Minimum peak height (default: 10.0).

    Returns
    -------
    list of dict
        Each dict contains 'retention_time', 'height', 'area'.
    """
    pass

Type hints help:

  • Users understand what inputs are expected

  • IDEs provide better autocomplete

  • Tools can catch type errors before runtime
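The ``find_peaks`` above is only a stub; a naive local-maximum implementation might look like the sketch below. This is not a validated peak detector, and the per-peak ``area`` integrates only the immediate neighbours as a placeholder:

```python
from typing import List
import numpy as np

# np.trapezoid is the NumPy >= 2.0 name; fall back to np.trapz on older versions
_trapezoid = getattr(np, "trapezoid", None) or getattr(np, "trapz")

def find_peaks(time: np.ndarray,
               absorbance: np.ndarray,
               threshold: float = 10.0) -> List[dict]:
    """Find local maxima at or above ``threshold`` (naive sketch)."""
    peaks = []
    for i in range(1, len(absorbance) - 1):
        y = absorbance[i]
        if y >= threshold and y > absorbance[i - 1] and y >= absorbance[i + 1]:
            peaks.append({
                "retention_time": float(time[i]),
                "height": float(y),
                # Placeholder: area over the two neighbouring samples only
                "area": float(_trapezoid(absorbance[i - 1:i + 2],
                                         time[i - 1:i + 2])),
            })
    return peaks

peaks = find_peaks(np.array([0.0, 0.1, 0.2, 0.3, 0.4]),
                   np.array([2.0, 5.0, 8.0, 5.0, 2.0]),
                   threshold=5.0)
```

Because the signature carries the types, a checker such as mypy can flag a call like ``find_peaks(time, absorbance, threshold="high")`` without running the code.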

Writing User Guides

User guides show how to use your code. Key principles:

Start with Installation

Installation
============

Requirements
------------

* Python 3.7 or higher
* NumPy 1.19.0 or higher
* eLabFTW instance (optional, for full workflow)

Install
-------

Clone the repository:

.. code-block:: bash

   git clone https://github.com/username/project.git
   cd project

Install dependencies:

.. code-block:: bash

   pip install -r requirements.txt

Provide Quick Start

Quick Start
===========

Analyze an HPLC chromatogram in 3 steps:

1. **Load data**:

.. code-block:: python

   from hplc_analysis import load_chromatogram
   time, absorbance = load_chromatogram('data/sample.txt')

2. **Find peaks**:

.. code-block:: python

   from hplc_analysis import find_peaks
   peaks = find_peaks(time, absorbance, threshold=50.0)

3. **View results**:

.. code-block:: python

   for i, peak in enumerate(peaks, 1):
       print(f"Peak {i}: {peak['retention_time']:.2f} min")

Include Complete Examples

Show realistic, complete workflows:

Complete Workflow Example
=========================

This example shows the full analysis pipeline from data loading
to results reporting, including eLabFTW integration.

Step 1: Reference eLabFTW Experiment
-------------------------------------

Before starting, note your eLabFTW experiment ID::

    Experiment: #67890
    URL: https://your-instance.org/experiments.php?mode=view&id=67890

Step 2: Load and Analyze Data
------------------------------

.. code-block:: python

   import numpy as np
   from hplc_analysis import analyze_chromatogram

   # Reference eLabFTW in code
   # Experiment: https://your-instance.org/experiments.php?mode=view&id=67890

   results = analyze_chromatogram('data/sample.txt', threshold=50.0)

   print(f"Found {results['n_peaks']} peaks")
   for peak in results['peaks']:
       print(f"  RT: {peak['retention_time']:.2f} min")
       print(f"  Area: {peak['area']:.1f} mAUΒ·min")

Step 3: Document in eLabFTW
----------------------------

Upload results to your eLabFTW experiment and include:

* Peak table (CSV export)
* Analysis parameters used
* GitHub commit hash for reproducibility

Using Sphinx

Sphinx builds reStructuredText documentation into polished, searchable HTML. Key concepts:

reStructuredText Basics

reStructuredText (.rst files) is a markup language:

Section Heading
===============

Subsection
----------

**Bold text** and *italic text*

Bullet lists:

* First item
* Second item
* Third item

Numbered lists:

1. Step one
2. Step two
3. Step three

Code blocks:

.. code-block:: python

   import numpy as np
   x = np.array([1, 2, 3])

External links:

`Python <https://python.org/>`_

Cross-references:

See :doc:`other_page` for more details.
See :func:`module.function` for API docs.

Mathematical Equations

Use LaTeX syntax for equations:

The peak area :math:`A` is calculated as:

.. math::

   A = \int_{t_1}^{t_2} h(t) \, dt

where :math:`h(t)` is the absorbance at time :math:`t`.


Cross-References

Link between documentation pages:

.. Link to another page

See :doc:`elabftw_integration` for details.

.. Link to a section

See :ref:`writing-docstrings` for examples.

.. Link to a function

Use :func:`hplc_analysis.find_peaks` to detect peaks.

.. Link to a module

See :mod:`hplc_analysis` for all analysis functions.

Auto-Generated API Docs

Sphinx can automatically generate API documentation from docstrings:

API Reference
=============

HPLC Analysis Module
--------------------

.. automodule:: hplc_analysis
   :members:
   :undoc-members:
   :show-inheritance:

This creates a complete API reference from your docstrings.
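``automodule`` only works once the autodoc extension (plus napoleon for NumPy-style docstrings) is enabled in ``conf.py``. A typical snippet, where the path adjustment is a project-specific assumption:

```python
# docs/source/conf.py
import os
import sys

# Make the package importable for autodoc (adjust to your layout)
sys.path.insert(0, os.path.abspath("../.."))

extensions = [
    "sphinx.ext.autodoc",    # pull in docstrings with .. automodule::
    "sphinx.ext.napoleon",   # parse NumPy-style sections
    "sphinx.ext.viewcode",   # link to highlighted source
]
```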

Organizing Documentation

Structure for Small Projects

docs/source/
β”œβ”€β”€ index.rst              # Homepage with quick start
β”œβ”€β”€ installation.rst       # How to install
β”œβ”€β”€ tutorial.rst           # Step-by-step guide
β”œβ”€β”€ api.rst                # Auto-generated API docs
└── elabftw.rst           # eLabFTW integration guide

Structure for Larger Projects

docs/source/
β”œβ”€β”€ index.rst
β”œβ”€β”€ getting_started/
β”‚   β”œβ”€β”€ installation.rst
β”‚   β”œβ”€β”€ quickstart.rst
β”‚   └── configuration.rst
β”œβ”€β”€ user_guide/
β”‚   β”œβ”€β”€ basic_usage.rst
β”‚   β”œβ”€β”€ advanced_topics.rst
β”‚   └── elabftw_integration.rst
β”œβ”€β”€ api_reference/
β”‚   β”œβ”€β”€ analysis.rst
β”‚   β”œβ”€β”€ visualization.rst
β”‚   └── utilities.rst
└── developer_guide/
    β”œβ”€β”€ contributing.rst
    β”œβ”€β”€ testing.rst
    └── release_process.rst

For beginners: Start with the simple structure. Reorganize as you grow.

eLabFTW Integration in Docs

Show Complete Workflows

Document the full cycle:

Laboratory Workflow
===================

Before Running Analysis
-----------------------

1. Create eLabFTW experiment record
2. Document method and parameters
3. Upload raw data files
4. Note experiment ID for reference

During Analysis
---------------

1. Reference eLabFTW ID in code header
2. Load data with eLabFTW references
3. Run analysis with documented parameters
4. Save intermediate results

After Analysis
--------------

1. Upload results to eLabFTW
2. Link GitHub commit for traceability
3. Document analysis parameters used
4. Review and validate results

Document Where Things Go

Data Organization
=================

.. list-table::
   :widths: 30 35 35
   :header-rows: 1

   * - Data Type
     - Store In
     - Notes
   * - Raw instrument data
     - eLabFTW (attached to experiment)
     - Original files, unmodified
   * - Analysis scripts
     - GitHub repository
     - Version controlled
   * - Processed results
     - eLabFTW (attached to experiment)
     - With GitHub commit reference
   * - Protocols/methods
     - eLabFTW (database)
     - Shared across experiments
   * - Small test data
     - GitHub (data/ folder)
     - For examples and testing

Version Control for Documentation

Documentation needs version control too:

Commit Documentation Changes

# Good commit messages for docs
git commit -m "Add HPLC analysis tutorial with examples"
git commit -m "Fix broken links in API reference"
git commit -m "Update installation instructions for Windows"

Keep Docs Synchronized

When changing code:

  1. Update docstrings if function signatures change

  2. Update examples if usage patterns change

  3. Update tutorials if workflows change

  4. Check cross-references still work

  5. Rebuild docs to catch issues

Version Documentation

For releases, document the version:

# In docs/source/conf.py
version = '1.0'  # Short X.Y version
release = '1.0.0'  # Full version

Tag releases in Git:

git tag -a v1.0.0 -m "Version 1.0.0 release"
git push origin v1.0.0

Common Mistakes to Avoid

❌ No Documentation

β€œThe code is self-documenting” - No, it’s not.

βœ… Fix: Start with minimal docstrings. Expand gradually.

❌ Outdated Documentation

Documentation describes old version of code.

βœ… Fix: Update docs when changing code. Test examples.

❌ Assuming Too Much Knowledge

β€œObviously you need to preprocess the data first…”

βœ… Fix: Write for beginners. Explain each step.

❌ No Examples

Just parameter lists without showing usage.

βœ… Fix: Add at least one working example per function.

❌ Broken Links

Cross-references point to nonexistent pages.

βœ… Fix: Build docs locally and check for warnings.

❌ Inconsistent Style

Random mix of formatting and organization.

βœ… Fix: Follow this template. Be consistent.

Tools and Resources

Documentation Tools

  • Sphinx: Main documentation generator

  • sphinx-rtd-theme: Professional Read the Docs theme

  • napoleon: NumPy- and Google-style docstring support

  • autodoc: Auto-generate API docs

  • doctest: Test code examples in docstrings

Checking Documentation

# Build and check for warnings
cd docs
make clean
make html

# Check for broken links
make linkcheck

# Spell check (if aspell installed)
find source -name "*.rst" -exec aspell check {} \;

Learning Resources

Example Documentation Checklist

Use this checklist for your documentation:

Code Documentation
==================

[ ] All functions have docstrings
[ ] Docstrings include Parameters and Returns
[ ] At least one example per main function
[ ] Type hints on function signatures
[ ] Complex logic has inline comments
[ ] eLabFTW references where applicable

User Guide
==========

[ ] Installation instructions included
[ ] Quick start example provided
[ ] Complete workflow example shown
[ ] eLabFTW integration explained
[ ] Troubleshooting section exists
[ ] Links to external resources

API Reference
=============

[ ] Auto-generated from docstrings
[ ] All modules documented
[ ] Cross-references work
[ ] Examples build without errors

Build and Deploy
================

[ ] Builds locally without errors
[ ] No broken links (linkcheck passes)
[ ] Equations render correctly
[ ] GitHub Actions workflow configured
[ ] Deploys to GitHub Pages

Getting Help

If you’re stuck on documentation:

  1. Check existing examples in this repository

  2. Search Sphinx documentation for specific features

  3. Look at similar projects (NumPy, SciPy, pandas)

  4. Ask in discussions on GitHub

  5. Start simple and improve iteratively

Remember: Perfect documentation doesn’t exist. Good enough documentation that actually exists is infinitely better than perfect documentation that never gets written.

See Also