2  Example folder structure

There are many ways to set up a project folder, and the exact structure will vary between projects depending on which components and file types are relevant. However, it is good practice to establish a standard structure. Below are guidelines for setting up a data project folder and its subfolders.

2.1 Main Directory

Create a main directory with a name that reflects the project. The next section has some tips on creating informative project names.

2.2 Subdirectories

Within your main directory, create the following subdirectories to organize your files:

  • data/: Store raw and processed data files here.
    • raw/: For raw, unaltered datasets.
    • processed/: For cleaned and transformed datasets.
  • notebooks/: Store R Markdown or Quarto notebooks.
    • exploratory/: Notebooks for initial data exploration and analysis.
    • final/: Notebooks with finalized analyses and results.
  • scripts/: Store all your scripts for data processing, analysis, and modeling. If your project has several distinct steps, you may want to include subfolders:
    • data_preprocessing/: Scripts for cleaning and preparing data.
    • analysis/: Scripts for data analysis.
    • models/: Scripts for building and evaluating models.
  • outputs/: Store results such as model outputs, plots, and reports.
    • figures/: For all plots and figures.
    • tables/: For tables and other summary results.
    • reports/: For final reports and presentations.
  • docs/: Documentation for the project.
    • references/: Any reference materials or literature.
    • manuals/: Any user manuals or guides.
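
The subdirectories listed above can also be created with a short script instead of by hand. The following is a minimal sketch in Python using only the standard library; the function name create_project_dirs and the SUBDIRS list are illustrative assumptions, not part of any existing tool, and the list should be adjusted to whatever folders your project actually needs.

```python
from pathlib import Path

# Subdirectories from the list above, relative to the project root.
# This list is an assumption for illustration; trim or extend it as needed.
SUBDIRS = [
    "data/raw",
    "data/processed",
    "notebooks/exploratory",
    "notebooks/final",
    "scripts/data_preprocessing",
    "scripts/analysis",
    "scripts/models",
    "outputs/figures",
    "outputs/tables",
    "outputs/reports",
    "docs/references",
    "docs/manuals",
]


def create_project_dirs(root):
    """Create the standard subdirectories under `root` and return their paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True creates intermediate folders such as data/;
        # exist_ok=True makes the function safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_project_dirs("my_data_science_project") from the folder where the project should live would create the main directory and everything beneath it. Existing folders and their contents are left untouched, so the script can be rerun after adding a new entry to the list.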

2.3 Important Files

Include the following files at the root of your project:

  • README.md: A markdown file explaining the project, its structure, and how to get started.
  • requirements.txt or environment.yml: List of dependencies needed for the project.
  • *.Rproj: The RStudio project file. RStudio creates it when you set up the folder as a project.
  • .gitignore: To specify which files and directories should be ignored by Git.
  • LICENSE: License information for the project.
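
A quick way to confirm that a project has the root files listed above is to check for them programmatically. The sketch below is one possible approach in Python; the function name missing_root_files is hypothetical, and the check treats requirements.txt and environment.yml as alternatives, as the list does.

```python
from pathlib import Path


def missing_root_files(root):
    """Return the names of expected root-level files that are absent from `root`."""
    root = Path(root)
    missing = []

    # Files that are expected under exactly these names.
    for name in ("README.md", ".gitignore", "LICENSE"):
        if not (root / name).is_file():
            missing.append(name)

    # Either dependency file satisfies the requirement.
    if not any((root / n).is_file() for n in ("requirements.txt", "environment.yml")):
        missing.append("requirements.txt or environment.yml")

    # The RStudio project file can have any name ending in .Rproj.
    if not any(root.glob("*.Rproj")):
        missing.append("*.Rproj")

    return missing
```

An empty list means every expected file is present. The function only checks that the files exist; it does not inspect their contents.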

2.4 Example Directory Structure

my_data_science_project/
│
├── data/
│   ├── raw/
│   └── processed/
│
├── notebooks/
│   ├── exploratory/
│   └── final/
│
├── scripts/
│   ├── data_preprocessing/
│   ├── analysis/
│   └── models/
│
├── outputs/
│   ├── figures/
│   ├── tables/
│   └── reports/
│
├── docs/
│   ├── references/
│   └── manuals/
│
├── README.md
├── requirements.txt or environment.yml
├── my_data_science_project.Rproj
├── .gitignore
└── LICENSE