Interacting with LLMs

Command-line tools

llm is a CLI tool that lets you send prompts and receive responses without opening a browser
Authenticate once on first use and then check that it works:

$ llm "Tell me a science joke"

Piping text through stdin lets you chain llm calls with other shell tools

$ cat results.txt | llm "Extract all numeric values and list them"

llm is useful for quick one-off queries, batch processing in shell scripts, and working in environments without a GUI
Claude Code is more sophisticated but vendor-specific
Use --help to see available flags for model selection and output format

Editor and notebook integrations

Editor integrations see your open files as context
- They hallucinate less on code that is already on screen
Inline completions are accepted with Tab and can be rejected with Escape
- Treat them like fancy autocomplete, not ground truth
marimo has MCP server support, letting notebooks talk to LLM tools without leaving the notebook interface
- marimo-pair can query the contents of Python's memory, which makes it very powerful for exploratory data analysis

Writing effective prompts

Be specific: vague prompts produce vague answers
- Weak: "Analyze my data"
- Stronger: "Given this CSV of penguin measurements, compute the mean bill length per species"
Provide context: include relevant column names, file formats, or domain vocabulary
Ask for step-by-step reasoning: "Write a step-by-step plan before taking action" often improves accuracy
- And gives you a chance to say "no" before something disastrous happens
Role prompting: starting with "You are an expert in…" can improve domain-specific responses
Few-shot examples: show the model one or two input-output pairs before your actual query

Convert species codes to full names. Examples:
Input: "Adel" -> Output: "Adelie"
Input: "Chin" -> Output: "Chinstrap"

Ask for structured output when you need to parse the response programmatically

Return your answer as JSON with keys "species", "mean_bill_mm", and "sample_size".
Do not include any other text.

Negative constraints help
- E.g., "Do not include code comments" or "Do not summarize the question back to me"
Iterative refinement
- Send a follow-up prompt correcting or extending the previous response rather than starting over
Long, complex prompts can be broken into smaller prompts that build on each other

Why does the wording of that question matter? LLMs are trained using reinforcement learning from human feedback (RLHF), where human raters rank model responses. Raters tend to prefer responses that are agreeable and helpful-sounding, so the model learns that confirming what the user said is more likely to be rated positively than contradicting them. This is called sycophancy: optimizing for approval rather than accuracy. Asking "Is there anything wrong with this?" explicitly invites disagreement and shifts the model toward a more critical mode. A quick test: tell an LLM something subtly wrong ("the p-value is the probability that the null hypothesis is true, right?") and compare what you get from "Is this right?" versus "Is there anything wrong with this description?"

Model Context Protocol

Model context protocol (MCP) is an open standard for connecting LLM applications to external tools and data sources
It uses JSON-RPC
- The LLM client sends a request describing a tool call
- The MCP server executes it and returns a result
An MCP] exposes a set of tools, each with a name, description, and JSON schema for its inputs
The LLM sees tool descriptions in its context and can choose to call a tool
- The client executes the actual call
Common MCP servers: filesystem access, SQLite databases, web search, GitHub, and calendar
MCP decouples the LLM from the tool implementation: the same server works with any MCP-compatible client
The server runs as a local process
- It does not need to send your data to a third-party service (though it can)

MCP example

Install the SQLite MCP server: uvx mcp-server-sqlite --db-path penguins.db
Add it to your config so your LLM knows it is available
- This is LLM-specific, so the Claude configuration is shown below

{
  "mcpServers": {
    "penguins": {
      "command": "uvx",
      "args": ["mcp-server-sqlite", "--db-path", "penguins.db"]
    }
  }
}

Ask a natural-language question
- The model calls the query tool automatically:

User: How many distinct species are in the penguins table?
Claude: [calls query tool with select count(distinct species) from penguins]
Result: 3

Verify by running the SQL yourself:

sqlite3 penguins.db "select count(distinct species) from penguins;"

The LLM constructs the SQL
- You verify the result
- Neither step replaces the other

Agents

A single prompt produces one response
- An agent runs a loop: observe → plan → act → observe → …
Agents use tool calls to gather information and take actions: web search, code execution, file read/write
The agent loop continues until the model decides the task is complete or a maximum step count is reached
Agents can take irreversible actions: deleting files, sending requests to external APIs, committing code
Risk compounds with step count: an error in step 2 can cause every subsequent step to be wrong
Human-in-the-loop checkpoints (requiring approval before certain tool calls) reduce the damage done by mistakes
Agents work well for tasks with clear success criteria that can be verified programmatically
Agents work poorly for tasks requiring subjective judgment or where the environment is ambiguous
Always review the full list of actions an agent took before accepting its output

Skills and extensions

A skill is a Markdown file containing a system prompt that specializes the model's behavior for a task
Skills are stored in (for example) ~/.claude/ and are available across projects
A skill can instruct the model to always check documentation before generating code, always output JSON, or always log its reasoning
Finding community skills: the Claude Code documentation and GitHub list commonly used skills
Installing a skill
- Copy the .md file to ~/.claude/
- Reference it by name in a prompt or config

$ cat ~/.claude/check-docs.md
# Check Python documentation before using external library

Before generating any Python code that uses an external library,
state the library version and confirm the API against the official docs.

Writing a skill automates a prompt pattern you would otherwise repeat by hand in every session

Exercises

Use an LLM in the terminal to ask a question about a CSV file in your project
- Check its answer against the result of a direct shell command on the same file
Install an MCP SQLite server, connect to the penguins database, and ask how many distinct species there are
- Verify the answer with a direct SQLite query run from the shell
Write a two-sentence skill that instructs an LLM to always check Polars documentation before generating code
- Test it on a data-loading task and record whether it changed the output
Prompt an agent to find and fix a syntax error in a short Python script
- Review every tool call it made
- Note which changes were correct and which introduced new problems
Identify one task from your daily workflow where an agent would be helpful and one where it would be risky
- Focus on reversibility and verifiability