A Practical Guide
Agentic AI assistance for R, Stata, Python, LaTeX, and research workflows
VS Code is one of several places Codex can run. If you have not picked a surface yet, see where to run agentic AI for a short tour of the alternatives (terminal, desktop app, web, mobile) before working through this guide. For the Anthropic counterpart, see the companion Claude Code guide and the side-by-side comparison.
Setup
Installing Codex in VS Code is straightforward. The important thing is to install the official OpenAI extension and open a full project folder before asking it to do research work.
On macOS, replace Ctrl with Cmd throughout, for example Cmd+Shift+X for Extensions and Cmd+Shift+P for the Command Palette.
Open the Extensions panel with Ctrl+Shift+X.
Search for openai.chatgpt and install it. You can also press Ctrl+P and run ext install openai.chatgpt.
Next
Codex works out of the box, but a few one-time setup steps make it much more useful for empirical research. Everything below is covered in detail later; this is the short roadmap.
Create an AGENTS.md file for each project. Tell Codex once about your data, sample definitions, file layout, and coding conventions. Codex reads these instructions before it starts work.
Use Chat when you only want advice, Agent when Codex may edit files and run commands, and Agent (Full Access) only when you deliberately want broader autonomy.
Invoke skills by name with $skill-name.
Recommended Extensions
Install these alongside Codex to cover the full research stack. Search for each by extension ID in the Extensions panel (Ctrl+Shift+X).
.do files and you want to execute them without leaving VS Code. It also exposes an MCP server, so if you configure that server for Codex, Codex can interact with Stata more directly. Built by Thomas Monk, London School of Economics.
.py files. Required for most Python work in VS Code.
.docx or PDFs to markdown, reshaping a messy CSV, scraping a table, pulling a series from an API, cleaning a bibliography file - is solved most quickly by a short Python script. Ask Codex in plain English, and it can write, run, and debug that script for you.
RBQL) for filtering rows without loading them into R or Python.
latex-workshop leaves auxiliary files (.aux, .bbl, .log, .fls, .fdb_latexmk, and so on) next to your .tex source. To have VS Code delete them automatically after each successful build, add three settings to .vscode/settings.json:
"latex-workshop.latex.autoClean.run": "onBuilt",
"latex-workshop.latex.clean.method": "glob",
"latex-workshop.latex.clean.fileTypes": [
"*.aux", "*.bbl", "*.blg", "*.log",
"*.out", "*.toc", "*.fls", "*.fdb_latexmk",
"*.nav", "*.snm", "*.vrb"
]
The first setting triggers cleanup after each successful build; the second tells the extension to match files by glob pattern; the third lists the patterns to delete.
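If you also compile from a terminal, where the VS Code setting does not apply, the same cleanup can be sketched in a few lines of Python. This is an illustration that reuses the glob patterns from the settings above; the function name and the dry-run default are choices made for this example, not part of latex-workshop.

```python
from pathlib import Path

# Same patterns as the latex-workshop.latex.clean.fileTypes setting above.
AUX_PATTERNS = [
    "*.aux", "*.bbl", "*.blg", "*.log",
    "*.out", "*.toc", "*.fls", "*.fdb_latexmk",
    "*.nav", "*.snm", "*.vrb",
]


def clean_aux_files(folder, patterns=AUX_PATTERNS, dry_run=True):
    """Return auxiliary files directly inside `folder`.

    With dry_run=True (the default) nothing is deleted, so you can
    inspect the list first. With dry_run=False the files are removed.
    """
    folder = Path(folder)
    matches = sorted(
        {
            path
            for pattern in patterns
            for path in folder.glob(pattern)
            if path.is_file()
        }
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling clean_aux_files("paper") lists what would be removed from a hypothetical paper folder; pass dry_run=False once the list looks right. Subfolders are not searched.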
File Formats
Codex works best with plain text. The more a file looks like characters on a page, as opposed to a rendered binary, the better Codex can inspect, edit, and reason about it. For economists this matters: a referee report in .docx, a codebook as a scanned PDF, or a Stata dataset in .dta can all be useful inputs, but usually only after conversion to a text-friendly form.
A markdown file (.md) is the simplest useful form: a plain-text document with lightweight formatting conventions - # for headings, *italic*, **bold**, bullet lists, and links written as [text](url). It renders nicely but remains fully readable as raw text.
A rough guide to the formats you are most likely to encounter:
pandoc file.docx -o file.md. Codex can do this for you if pandoc is installed.
.qsf file is technically JSON, but deeply nested. Ask Codex to flatten it into a markdown outline of blocks, questions, and answer choices before reviewing it.
PDFs are where researchers most often get stuck. They look like documents, but they are layout files. Text may be stored as page-positioned glyphs with no clean paragraph, column, or table structure.
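For the .qsf case mentioned above, the flattening step might look roughly like the sketch below. The key names (SurveyElements, Element, Payload, BlockElements, QuestionText, Choices, Display) reflect a layout commonly seen in Qualtrics exports, but treat them as assumptions and check them against your own file. The sample dictionary is invented for illustration.

```python
import json


def load_qsf(path):
    """Read a .qsf file, which is plain JSON, into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def qsf_to_outline(survey):
    """Flatten a parsed .qsf dictionary into a markdown outline.

    Assumed layout: a top-level "SurveyElements" list whose entries have
    "Element" == "BL" for the block list and "SQ" for single questions.
    """
    elements = survey.get("SurveyElements", [])
    questions = {
        el["PrimaryAttribute"]: el["Payload"]
        for el in elements
        if el.get("Element") == "SQ"
    }
    lines = []
    for el in elements:
        if el.get("Element") != "BL":
            continue
        payload = el.get("Payload") or {}
        # The block payload may be a list or a dict keyed by position.
        blocks = payload.values() if isinstance(payload, dict) else payload
        for block in blocks:
            lines.append(f"## {block.get('Description', 'Untitled block')}")
            for item in block.get("BlockElements", []):
                question = questions.get(item.get("QuestionID"))
                if question is None:
                    continue  # page breaks and other non-question items
                text = question.get("QuestionText", "")
                lines.append(f"- {item['QuestionID']}: {text}")
                choices = question.get("Choices") or {}
                if isinstance(choices, dict):
                    choices = choices.values()
                for choice in choices:
                    lines.append(f"  - {choice.get('Display', '')}")
    return "\n".join(lines)


# Invented sample, shaped like the assumed layout.
sample = {
    "SurveyElements": [
        {"Element": "BL", "Payload": [
            {"Description": "Demographics",
             "BlockElements": [{"Type": "Question", "QuestionID": "QID1"}]},
        ]},
        {"Element": "SQ", "PrimaryAttribute": "QID1", "Payload": {
            "QuestionText": "Are you currently employed?",
            "Choices": {"1": {"Display": "Yes"}, "2": {"Display": "No"}},
        }},
    ]
}
print(qsf_to_outline(sample))
```

Question text in real exports often contains HTML, which this sketch leaves untouched. For a real survey, pass the result of load_qsf to qsf_to_outline and save the string as a .md file.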
A practical rule of thumb:
ocrmypdf, then convert the OCR'd PDF to markdown or text.
.tex, use that instead. It will almost always be cleaner than re-parsing the compiled PDF.
For converting PDFs to markdown, a few options work well:
Customisation
An AGENTS.md file gives Codex persistent instructions and context. It is the place to state things that are always true about your project: folder layout, data rules, code conventions, model notation, and what not to touch.
There are two especially useful levels:
~ is your home folder; on Windows it is typically C:\Users\yourname\.
Codex can also read nested AGENTS.md and AGENTS.override.md files as it walks from the project root to your current working directory. More specific instructions appear later and take precedence.
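To make the ordering concrete, here is a small sketch that lists which instruction files sit between a project root and a working directory, least specific first. It is an illustration of the walk described above, not Codex's own code. The rule that an AGENTS.override.md stands in for the AGENTS.md in the same folder is an assumption of this sketch.

```python
from pathlib import Path

# Checked in this order within each folder.
CANDIDATES = ("AGENTS.override.md", "AGENTS.md")


def instruction_files(project_root, cwd):
    """List instruction files from project_root down to cwd.

    Files nearer to cwd come later in the list, mirroring the idea that
    more specific instructions appear later and take precedence.
    """
    root = Path(project_root).resolve()
    current = Path(cwd).resolve()
    relative = current.relative_to(root)  # ValueError if cwd is outside root
    found = []
    directory = root
    for part in ("",) + relative.parts:
        directory = directory / part
        for name in CANDIDATES:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
                break  # assumption: at most one file per folder
    return found
```

Running instruction_files(".", "code/tables") from a project root shows the files in the order this sketch would stack them, which is a quick way to spot a forgotten nested file that contradicts the project-level one.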
Unlike Claude Code, Codex does not use CLAUDE.md. For a new empirical project, create AGENTS.md manually or ask Codex to draft one after it has inspected the folder. A useful starting point:
# Project: [Paper title]
## Data
- Unit of analysis: firm-year panel, 2010-2022
- Raw data lives in data/raw/ - never edit these files
- Main dataset after cleaning: data/clean/panel.dta
- When data are large, create summaries or samples instead of loading full files into context
## Code conventions
- R scripts are numbered: 01_clean.R, 02_analysis.R, 03_tables.R
- All regressions cluster SEs at the firm level unless instructed otherwise
- Use fixest for panel regressions
- Save generated tables to paper/tables/
## LaTeX
- Main file: paper/main.tex
- Use \widehat{} rather than \hat{} for estimators
- Keep notation consistent with paper/notation.md
## Safety
- Do not overwrite raw data
- Ask before installing new packages
- Run the relevant script after editing analysis code
AGENTS.md is also the right place for instructions about how you want Codex to write: tone, voice, and phrases to avoid. Paul Goldsmith-Pinkham has a useful post on using these instructions to get AI assistance that improves rather than flattens your prose: Writing and thinking with AI assistance.
Codex skills are folders containing a SKILL.md file, plus optional scripts, references, and assets. They package task-specific instructions so Codex can follow a workflow reliably. Skills are available in the Codex CLI, IDE extension, and Codex app.
Codex skills are not slash commands in the Claude sense. In Codex, slash commands such as /status, /review, /cloud, and /local control the session. Skills are invoked by mentioning them directly, for example $robustness, or by typing $ and selecting from the list. Codex may also invoke a skill automatically when your request matches its description.
Skills can live in several places:
To create a $robustness skill manually:
mkdir -p .agents/skills/robustness
# then create .agents/skills/robustness/SKILL.md
A minimal SKILL.md for an economist:
---
name: robustness
description: Use when asked to run standard robustness checks on the main regression.
---
For the main specification in this project:
1. Identify the baseline regression script and output table
2. Re-run with alternative clustering: industry-year instead of firm
3. Re-run dropping the top and bottom 1% of the outcome variable
4. Re-run on the pre-2020 subsample only
5. Produce a summary table comparing coefficients across all specifications
6. Report which files changed and which commands were run
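Before relying on a new skill, it can help to confirm that its SKILL.md has the two front-matter fields used in the example above. The checker below is a minimal sketch: it only understands single-line key: value pairs between the two --- lines, and treating name and description as required is an assumption based on that example, not a full statement of the skill format.

```python
from pathlib import Path


def read_front_matter(text):
    """Parse simple `key: value` lines between the opening and closing `---`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md should start with a '---' line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("front matter is missing its closing '---' line")


def check_skill(skill_dir):
    """Return a list of problems found in skill_dir/SKILL.md (empty if none)."""
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return [f"{skill_md} does not exist"]
    try:
        fields = read_front_matter(skill_md.read_text(encoding="utf-8"))
    except ValueError as err:
        return [str(err)]
    return [
        f"missing '{key}' in front matter"
        for key in ("name", "description")
        if not fields.get(key)
    ]
```

For the skill created above, check_skill(".agents/skills/robustness") should return an empty list once SKILL.md is in place.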
Codex includes a $skill-installer skill for installing curated or external skills. For local experimentation you can also copy a skill folder into ~/.agents/skills/ or into a project's .agents/skills/ directory. If a new skill does not appear immediately, restart Codex.
For economists, the existing Claude-skill ecosystem is still useful as a source of ideas, but paths and invocation syntax need to be adapted for Codex. A few starting points:
$skills.
Control
The most important Codex habit is choosing the right level of autonomy for the task. The mode switcher sits under the chat input in the Codex panel.
Use /cloud or the cloud controls when you want Codex to run a larger task remotely, then review the changes locally before merging them.
Version Control
Codex is git-aware. When you open a project that is a git repository, Codex can inspect diffs, staged changes, branches, and commit history. This makes it useful for version control tasks alongside coding ones.
Write a commit message for my staged changes.
Summarise what changed between the last two commits
in code/02_analysis.R.
/review
Review my uncommitted changes and flag anything that
could change the sample or regression specification.
I have a merge conflict in code/03_tables.R. Read both
versions and resolve it, keeping the newer variable names.
Performance
Codex can only hold a limited amount of project information in its working memory at once. Large empirical projects with many scripts, logs, data extracts, and output tables can fill that space quickly. A few habits help:
Point Codex at the specific files you want it to use instead of making it infer the whole project.
@code/02_analysis.R fix the clustering in the main spec.
Select the exact equation, table, or error message in VS Code and use the Codex command to add the selection to the current thread.
Explain this selected LaTeX error and patch the table.
A fresh thread avoids dragging old assumptions into a new problem. In VS Code, Cmd+N or Ctrl+N creates a new Codex thread by default.
Use /status to see thread information, context usage, and rate-limit information.
/auto-context can include recent files and IDE context automatically. It is convenient for small projects, but explicit file mentions are often cleaner for large research folders.
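To decide which files are worth mentioning explicitly and which are better summarised first, it can help to see where the bulk of a project's text lives. The sketch below ranks text files by a rough size estimate. The four-characters-per-token figure is only a common rule of thumb, not Codex's tokenizer, and the list of suffixes is an assumption to adapt to your project.

```python
from pathlib import Path

# Suffixes treated as text for this sketch; adjust for your project.
TEXT_SUFFIXES = {".r", ".do", ".py", ".tex", ".md", ".log", ".csv", ".txt"}
CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer


def largest_text_files(project_root, top=10):
    """Return (estimated_tokens, relative_path) pairs, largest first."""
    root = Path(project_root)
    sizes = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip repository internals
        # File size in bytes stands in for the character count.
        estimate = path.stat().st_size // CHARS_PER_TOKEN
        sizes.append((estimate, relative.as_posix()))
    return sorted(sizes, reverse=True)[:top]
```

Logs and CSV extracts usually top the list; those are the files to sample or summarise rather than hand to Codex whole.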