# Contributing to TCGA Downloader

Thank you for your interest in contributing! This document provides guidelines and instructions for contributing to the project.

## Getting Started

### Prerequisites

- Python 3.11 or higher
- pip or uv
- gdc-client (for testing downloads)
- Git

### Setting Up Development Environment

1. Fork the repository
2. Clone your fork:
   ```bash
   git clone <your-fork-url>
   cd tcga-downloader
   ```
3. Install the package in development mode:
   ```bash
   pip install -e ".[dev]"
   ```
4. Install and set up pre-commit hooks:
   ```bash
   pip install pre-commit
   pre-commit install
   ```

## Development Workflow

### Branch Strategy

- `main`: Production-ready code
- Create feature branches from `main`: `feature/your-feature-name`
- Use descriptive branch names

### Making Changes

1. Create a new branch for your feature:
   ```bash
   git checkout -b feature/your-feature-name
   ```
2. Make your changes following the coding standards (see below)
3. Run tests and linting:
   ```bash
   pytest
   black .
   ruff check --fix .
   mypy tcga_downloader
   ```
4. Commit your changes with descriptive messages:
   ```bash
   git add .
   git commit -m "feat: add new filtering option for sample type"
   ```
5. Push to your fork:
   ```bash
   git push origin feature/your-feature-name
   ```
6. Create a Pull Request

## Coding Standards

### Code Style

This project uses:

- **Black**: Code formatting (line length: 100)
- **Ruff**: Linting
- **mypy**: Type checking

Before committing, run:

```bash
black .
ruff check --fix .
mypy tcga_downloader
```

Or run the pre-commit hooks manually (they also run automatically on every commit):

```bash
pre-commit run --all-files
```

### Type Hints

All public functions should include type hints. Use standard library types:

```python
from typing import List, Optional


def query_files(
    project: str,
    data_type: str,
    fields: Optional[List[str]] = None,
) -> List[dict]:
    ...
```

### Logging

Use the logging module instead of print statements:

```python
from tcga_downloader.logger import get_logger

logger = get_logger("module_name")

logger.info("Processing file: %s", filename)
logger.debug("Detailed debug information")
```

### Error Handling

Use descriptive error messages and appropriate exception types:

```python
if not manifest_path.exists():
    raise FileNotFoundError(f"Manifest file not found: {manifest_path}")
```

## Testing

### Running Tests

```bash
pytest
```

With coverage:

```bash
pytest --cov=tcga_downloader --cov-report=html --cov-report=term
```

### Writing Tests

- Place tests in the `tests/` directory
- Name test files: `test_<module>.py`
- Name test functions: `test_<function>_<scenario>`

Example:

```python
def test_manifest_roundtrip_tsv(tmp_path):
    records = [ManifestRecord(...)]
    path = tmp_path / "m.tsv"

    # Write the records, read them back, and expect an identical list
    write_manifest(records, path, fmt="tsv")
    loaded = load_manifest(path)

    assert loaded == records
```

### Test Coverage

Aim for high test coverage. New features should include tests.

## Documentation

### Code Documentation

Use docstrings for public APIs:

```python
def build_filters(project: str, data_type: str) -> dict:
    """Build GDC API filters for querying files.

    Args:
        project: TCGA project ID (e.g., "TCGA-BRCA")
        data_type: Data type to filter (e.g., "Gene Expression")

    Returns:
        Filter dictionary for GDC API query
    """
    ...
```

### Documentation Updates

When adding new features:

1. Update README.md with usage examples
2. Add type hints
3. Update relevant sections in documentation

## Pull Request Process

### PR Checklist

Before submitting a PR, ensure:

- [ ] Code follows project style guidelines
- [ ] All tests pass
- [ ] New features include tests
- [ ] Documentation is updated
- [ ] Commit messages are clear and descriptive
- [ ] Pre-commit hooks pass

### PR Template

```markdown
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
Describe tests added/updated

## Checklist
- [ ] Tests pass
- [ ] Code style checks pass
- [ ] Documentation updated
```

## Code Review

- Be constructive and respectful
- Focus on code quality and correctness
- Address review comments promptly
- Request changes when necessary

## Questions or Issues?

If you have any questions, feel free to open an issue or ask in Discussions.
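
## Appendix: Example Following These Guidelines

The coding standards above (type hints, docstrings, descriptive errors, and `test_`-prefixed test functions) can be seen together in one small sketch. It is illustrative only and is not the project's actual `build_filters` implementation. The nested `op`/`content` filter shape, the field names `cases.project.project_id` and `data_type`, the `_in` helper, and the `ValueError` checks are all assumptions made for the example.

```python
from typing import Any, Dict, List


def build_filters(project: str, data_type: str) -> Dict[str, Any]:
    """Build a GDC-style filter matching one project and one data type.

    Args:
        project: TCGA project ID (e.g., "TCGA-BRCA")
        data_type: Data type to filter (e.g., "Gene Expression")

    Returns:
        Filter dictionary combining both conditions with "and"

    Raises:
        ValueError: If either argument is empty or whitespace only
    """
    # Descriptive error messages, as recommended under "Error Handling"
    if not project.strip():
        raise ValueError("project must be a non-empty string")
    if not data_type.strip():
        raise ValueError("data_type must be a non-empty string")

    def _in(field: str, values: List[str]) -> Dict[str, Any]:
        # Hypothetical helper: one "field is in values" clause
        return {"op": "in", "content": {"field": field, "value": values}}

    # Field names below are assumed for illustration
    return {
        "op": "and",
        "content": [
            _in("cases.project.project_id", [project]),
            _in("data_type", [data_type]),
        ],
    }


def test_build_filters_combines_conditions() -> None:
    filters = build_filters("TCGA-BRCA", "Gene Expression")
    assert filters["op"] == "and"
    fields = [clause["content"]["field"] for clause in filters["content"]]
    assert fields == ["cases.project.project_id", "data_type"]


def test_build_filters_rejects_empty_project() -> None:
    # Plain try/except keeps the sketch free of third-party imports
    try:
        build_filters("", "Gene Expression")
    except ValueError as exc:
        assert "project" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

In a real contribution the two test functions would live in a file under `tests/`, and `pytest.raises(ValueError)` would be the more idiomatic way to write the second one.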