4.3 KiB
4.3 KiB
Contributing to TCGA Downloader
Thank you for your interest in contributing! This document provides guidelines and instructions for contributing to the project.
Getting Started
Prerequisites
- Python 3.11 or higher
- pip or uv
- gdc-client (for testing downloads)
- Git
Setting Up Development Environment
- Fork the repository
- Clone your fork:
git clone <your-fork-url>
cd tcga-downloader
- Install the package in development mode:
pip install -e ".[dev]"
- Install and setup pre-commit hooks:
pip install pre-commit
pre-commit install
Development Workflow
Branch Strategy
main: Production-ready code- Create feature branches from
main:feature/your-feature-name - Use descriptive branch names
Making Changes
- Create a new branch for your feature:
git checkout -b feature/your-feature-name
-
Make your changes following the coding standards (see below)
-
Run tests and linting:
pytest
black .
ruff check --fix .
mypy tcga_downloader
- Commit your changes with descriptive messages:
git add .
git commit -m "feat: add new filtering option for sample type"
- Push to your fork:
git push origin feature/your-feature-name
- Create a Pull Request
Coding Standards
Code Style
This project uses:
- Black: Code formatting (line length: 100)
- Ruff: Linting
- mypy: Type checking
Before committing, run:
black .
ruff check --fix .
mypy tcga_downloader
Or use pre-commit hooks which run automatically:
pre-commit run --all-files
Type Hints
All public functions should include type hints. Use standard library types:
from typing import List, Optional
def query_files(
project: str,
data_type: str,
fields: Optional[List[str]] = None,
) -> List[dict]:
...
Logging
Use the logging module instead of print statements:
from tcga_downloader.logger import get_logger
logger = get_logger("module_name")
logger.info("Processing file: %s", filename)
logger.debug("Detailed debug information")
Error Handling
Use descriptive error messages and appropriate exception types:
if not manifest_path.exists():
raise FileNotFoundError(f"Manifest file not found: {manifest_path}")
Testing
Running Tests
pytest
With coverage:
pytest --cov=tcga_downloader --cov-report=html --cov-report=term
Writing Tests
- Place tests in the
tests/directory - Name test files:
test_<module>.py - Name test functions:
test_<function>_<scenario>
Example:
def test_manifest_roundtrip_tsv(tmp_path):
records = [ManifestRecord(...)]
path = tmp_path / "m.tsv"
write_manifest(records, path, fmt="tsv")
loaded = load_manifest(path)
assert loaded == records
Test Coverage
Aim for high test coverage. New features should include tests.
Documentation
Code Documentation
Use docstrings for public APIs:
def build_filters(project: str, data_type: str) -> dict:
"""Build GDC API filters for querying files.
Args:
project: TCGA project ID (e.g., "TCGA-BRCA")
data_type: Data type to filter (e.g., "Gene Expression")
Returns:
Filter dictionary for GDC API query
"""
...
Documentation Updates
When adding new features:
- Update README.md with usage examples
- Add type hints
- Update relevant sections in documentation
Pull Request Process
PR Checklist
Before submitting a PR, ensure:
- Code follows project style guidelines
- All tests pass
- New features include tests
- Documentation is updated
- Commit messages are clear and descriptive
- Pre-commit hooks pass
PR Template
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
Describe tests added/updated
## Checklist
- [ ] Tests pass
- [ ] Code style checks pass
- [ ] Documentation updated
Code Review
- Be constructive and respectful
- Focus on code quality and correctness
- Address review comments promptly
- Request changes when necessary
Questions or Issues?
Feel free to open an issue or ask in discussions for any questions!