Consulting

Data science consulting for complex legal, policy, compliance, risk, and program data.

I help clients turn messy records, qualitative assessments, text, geospatial information, and small or imbalanced datasets into auditable workflows and decision-ready evidence.

My work is strongest where the data are difficult to structure, the categories require domain judgment, and the results need to be credible to both technical and nontechnical stakeholders. I bring more than a decade of research and applied analysis experience across law, policy, defense, international security, and nonprofit work.

Core areas

LLM Validation

Quality assurance, rubric design, structured evaluation, and review workflows for LLM-supported analysis.

Legal and Policy NLP

Text classification, coding workflows, training-data reconstruction, and model evaluation for legal or policy corpora.

Program Evaluation

Analysis workflows that link administrative, survey, social media, and geospatial data to produce stakeholder-facing evidence.

Risk Analytics

Risk-score validation, review consistency, rare-outcome prediction, model diagnostics, and reproducible reporting.

Types of solutions

Practical analytical systems, not just one-off analyses.

LLM and AI Quality Review

Build review systems that check AI-assisted or analyst-written outputs for omissions, internal inconsistencies, misalignment between scores and text, unsupported claims, and departures from category-specific quality standards.

Rubric design and structured evaluation criteria.
Schema-constrained LLM review outputs.
Comparison examples, context injection, and audit trails.

Text Classification and Coding Workflows

Turn legal, policy, compliance, survey, or research text into structured data that can be reviewed, measured, and modeled.

LLM-assisted training-data reconstruction and labeling support.
Taxonomy development, coding rules, and validation workflows.
Model comparison across LLM, classical NLP, and fine-tuned approaches.

Program Evaluation and Measurement

Design evaluation workflows for programs where outcomes are indirect, hard to measure, or scattered across administrative, survey, social media, and contextual data.

Treatment and comparison-group construction.
Sentiment, topic, or custom dictionary measures.
Regression, bootstrapping, diagnostics, and stakeholder reporting.

Risk, Compliance, and Review Consistency

Evaluate whether qualitative assessments, written justifications, and structured scores line up in ways that are consistent and defensible.

Embedding, clustering, and score-text alignment analysis.
Review consistency checks across categories, analysts, or time.
Feature engineering and interpretable predictive modeling.

Geospatial and Contextual Data Engineering

Build spatial and contextual features when records need to be linked to districts, counties, precincts, conflict zones, or other geographic units.

Spatial overlays, aggregation, and district-level covariates.
Entity matching across inconsistent names and identifiers.
Validation checks for joins, shapefiles, and derived features.

Reproducible Analysis Pipelines

Move analysis out of fragile spreadsheets and one-off scripts into documented workflows that can be rerun, inspected, and extended.

Python or R pipelines for cleaning, joining, modeling, and reporting.
Versioned artifacts, logs, and data-quality checks.
Clear handoff documentation for technical and nontechnical teams.

Ways I can help

Scoping and Feasibility

Clarify the decision problem, audit available data, identify the right analytical path, and separate what can be measured now from what would require better data or labels.

Prototype to Working Pipeline

Build a first version of the workflow, test model options, document assumptions, and produce outputs that stakeholders can inspect before a larger implementation.

Model and Workflow Review

Review an existing model, LLM workflow, evaluation design, or analytical report for methodological weaknesses, data leakage, unclear assumptions, or missing diagnostics.

Research-to-Decision Translation

Turn complex analysis into memos, visuals, presentations, and recommendations for legal, policy, executive, research, or program teams.

Selected clients

Ropes & Gray LLP

Built LLM-based validation and quality-control workflows for compliance assessment outputs, including category-specific rubrics, structured context, comparison examples, and schema-constrained review outputs. Earlier work included a validation framework for private-equity risk scoring that used embeddings, clustering, statistical testing, NLP features, and predictive modeling.

The Carter Center

Evaluated a high-profile political engagement program by linking candidate records, vendor social media data, and geospatial covariates, then modeling program sign-up and post-signing sentiment across roughly 1.15 million social media posts. Also built a legal NLP workflow to automate the coding of self-expression laws.

Center for Strategic and Budgetary Assessments

Audited and replicated regression and machine-learning models for defense cost estimation, including support vector regression, neural networks, and decision trees.

For complex problems that are not yet model-ready.

I can scope the problem, build the data workflow, evaluate modeling options, communicate uncertainty, and translate results into recommendations that stakeholders can use.

Start a conversation