A methodologically principled and scientifically responsible process of translating data into meaning is the craft of GoodScience.

Do Good Science.

FRAMEWORK

The GoodScience framework is designed to generate meaningful and reliable evidence from data and support high-quality decisions.

Nature
The complex nexus of causes that produce the phenomena of our world that are available for empirical study. The underlying causal structure of nature is often abstruse or inscrutable.
  • Underlying causal reality
  • Beginning of the data generating process
Population
All of the objects (existing and/or possible) in the category of interest for study. The population is the realization of the causal processes in nature and the expression of the 'long run' probabilistic tendencies of nature's causes. The population is also the primary object of study and inference.
  • Importance of study / experimental design
  • Risk of selection bias
  • Confounding by indication
Sample
The subset of the population available for study and observed.
  • Omitted variables
  • Missing data
  • Measurement issues
  • Information bias
Data
The actual observations made and recorded on the sample. Not all possible variables from the sample are collected; measurements are made imperfectly and recorded with errors. The particular instance of the data (out of many possible instances) is the source of the statistical likelihood on which the analysis is predicated.
Uncertainties
  • Model specification
  • Model selection
  • Assumptions re: distributions
Structural Causal Models and Directed Acyclic Graphs are crucial for specifying the set of covariate adjustments necessary for unbiased estimation of target effects; and for drawing causal inference.
See Perspectives below for further explanation.
Analysis
The mathematical procedures that account for both the structure and randomness of the data. Typically a model is used or is at least implicit. All analyses require assumptions (both strong and weak).
Analytic bias
  • Model selection
  • Model misspecification
  • Over-fitting
  • Residual confounding
  • Arbitrary categorization
  • Collider bias
Inference & Belief
The conclusions drawn from the analysis of the data (and in combination with any external information), including whether any associations observed are causal in nature and likely reproducible effects in independent data. Belief depends on the strength of the findings and the research process, coherence with existing knowledge, and numerous cognitive and psychological factors including biases, intentions and motivations.
  • Association vs. Causation
  • Cognition / psychology
  • Intentions & motivations
Decisions & Actions
The consequences, if any, of the research activities. The impact of the research will depend in part on the strength of the belief resulting from the inference and its relevance to problems faced by others. Consequences include clinical behavior and medical decision making, and scientific behavior such as confirmatory reproduction of the research and motivation of additional research.

ETHOS

Any science needs to faithfully connect its observations and measures with the science's systematic implications.

Analysis methods are a filter and a lens in the process leading from observations to knowledge and value.

In the communication between the data generating process and knowledge, analysis methods can be low- or high-fidelity. The details of how this connection is performed matter.

All but the most trivial of scientific questions will require modeling. The numerous decisions and assumptions made in the process of regression modeling and estimation determine the accuracy of any signal or pattern that is the target of scientific interest.

Modeling is not simply the plumbing between data and conclusions. Most observational data in health science are generated by largely nonrandom and poorly understood mechanisms of causal processes, subject selection, exposure assignment, measurement error and missing data; and are analyzed to evaluate effects that are latent under these mechanisms.

Modeling well is a craft: much more than just facility with a toolkit of regression techniques. Rigorous understanding of the specific purposes of the research, the processes that generated the data, the specific analytic strategy and modeling tools employed, and a thorough understanding of the raw material (data and assumptions) that are fed to the model, are all part of a successful program of scientific modeling. 

Modeling well begins with asking good questions.

A principled, well-integrated, coherent, consistent, and reliable process of translating data into meaning is the craft of good science.

PERSPECTIVES

Structural Causal Models (SCMs) and Directed Acyclic Graphs (DAGs) are foundational tools for moving beyond pattern-finding into genuine causal understanding. Here's why they matter:

Distinguishing Causation from Correlation

Traditional statistics excels at finding associations, but associations can be misleading. SCMs and DAGs provide a formal language for encoding why variables relate — not just that they do. This is essential for any analysis where you want to intervene, predict under new conditions, or explain mechanisms.

Controlling for the Right Variables

DAGs reveal which variables to include (or exclude) in a model — a non-obvious problem. Three key structures illustrate this:

  • Confounders (common causes of X and Y) — must be controlled to avoid spurious associations
  • Colliders (common effects of X and Y) — must not be controlled, or you introduce bias
  • Mediators (variables on the causal path from X to Y) — controlling for them blocks the effect you're trying to measure

Getting this wrong with purely statistical intuition is easy; DAGs make the logic explicit and checkable.
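The logic is also checkable numerically. The sketch below is a minimal pure-Python simulation with invented variables and parameters: linearly adjusting for a confounder removes a spurious association, while adjusting for a collider manufactures one.

```python
import math, random

random.seed(0)
N = 20_000

def corr(a, b):
    ma, mb = sum(a) / N, sum(b) / N
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    # correlation of a and b after linearly adjusting both for c
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac**2) * (1 - rbc**2))

# Confounder: Z causes both X and Y; X has NO effect on Y
Z = [random.gauss(0, 1) for _ in range(N)]
X = [z + random.gauss(0, 1) for z in Z]
Y = [z + random.gauss(0, 1) for z in Z]
print(round(corr(X, Y), 2))             # spurious association, around 0.5
print(round(partial_corr(X, Y, Z), 2))  # near 0 once the confounder is controlled

# Collider: X2 and Y2 are independent, but both cause C
X2 = [random.gauss(0, 1) for _ in range(N)]
Y2 = [random.gauss(0, 1) for _ in range(N)]
C = [x + y + random.gauss(0, 1) for x, y in zip(X2, Y2)]
print(round(corr(X2, Y2), 2))             # near 0: truly independent
print(round(partial_corr(X2, Y2, C), 2))  # around -0.5: bias from controlling a collider
```

The collider result is the striking one: two truly independent variables acquire a strong negative partial association purely because a common effect was conditioned on.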

The Do-Calculus and Interventional Reasoning

Judea Pearl's do-calculus, built on SCMs, distinguishes between observing that a variable takes a value versus setting it (an intervention). This lets analysts answer questions like "what would happen if we forced X to change?" using observational data — under identifiable conditions — which is enormously valuable when experiments are costly or unethical.
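As an illustration, the backdoor adjustment formula P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z) can be applied to simulated observational data. The SCM and every probability below are invented for the example; the point is only that the adjusted contrast recovers the built-in causal effect while the naive contrast does not.

```python
import random

random.seed(1)
N = 100_000

# Hypothetical SCM: Z -> X, Z -> Y, and X -> Y with a true
# causal risk difference of +0.3
rows = []
for _ in range(N):
    z = random.random() < 0.5
    x = random.random() < (0.2 + 0.6 * z)
    y = random.random() < (0.1 + 0.3 * x + 0.4 * z)
    rows.append((z, x, y))

def p_y(x_val, z_val=None):
    # empirical P(Y=1 | X=x_val [, Z=z_val])
    sel = [y for z, x, y in rows if x == x_val and (z_val is None or z == z_val)]
    return sum(sel) / len(sel)

def p_z(z_val):
    return sum(1 for z, _, _ in rows if z == z_val) / N

# Naive contrast E[Y|X=1] - E[Y|X=0] is confounded by Z
naive = p_y(True) - p_y(False)

# Backdoor adjustment: average the within-stratum contrasts over P(Z)
adjusted = sum((p_y(True, z) - p_y(False, z)) * p_z(z) for z in (False, True))

print(round(naive, 2), round(adjusted, 2))  # naive is inflated; adjusted is near 0.3
```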

Counterfactual Analysis

SCMs support counterfactual reasoning: "What would the outcome have been had the treatment been different, for this specific individual?" This underpins fairness analysis, policy evaluation, and personalized decision-making in ways that purely predictive models cannot.
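The standard three-step counterfactual procedure (abduction, action, prediction) can be sketched with a toy linear SCM; the structural equations and numbers below are hypothetical.

```python
# Hypothetical linear SCM:  X := U_X,  Y := 2*X + U_Y

def abduct_u_y(x_obs, y_obs):
    # Step 1 (abduction): recover this unit's own exogenous noise
    # from what was actually observed
    return y_obs - 2 * x_obs

def counterfactual_y(x_obs, y_obs, x_new):
    # Step 2 (action): replace the equation for X with X := x_new
    # Step 3 (prediction): recompute Y holding the unit's noise fixed
    return 2 * x_new + abduct_u_y(x_obs, y_obs)

# Individual observed at X=1, Y=3.5: what would Y have been under X=0?
print(counterfactual_y(1.0, 3.5, 0.0))  # 1.5
```

Holding the abducted noise fixed is what makes this an individual-level counterfactual rather than a population-level prediction: the answer is specific to the unit that was observed.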

Communicating Assumptions Transparently

Every causal analysis rests on assumptions. DAGs make those assumptions explicit and visible, allowing collaborators, reviewers, and domain experts to critique the model structure rather than hidden statistical choices. This improves scientific rigor and reproducibility.

Practical Applications

The value shows up across many fields: epidemiology (estimating treatment effects), economics (policy analysis), machine learning (building models robust to distribution shift), and AI fairness (detecting and correcting discriminatory pathways).

In short, SCMs and DAGs are important because they provide the scaffolding needed to answer causal questions rigorously — the questions that most actually matter for decision-making — rather than just describing data as it happens to be.

Can AI Reliably Build and Use Causal DAGs? Short Answer: Not Reliably, But Usefully, With the Right Framing

What AI Can Do Reasonably Well

Encoding domain knowledge into DAG structure

If a human expert provides the causal assumptions (e.g., "X affects Y, Z is a confounder"), AI can translate these into a valid DAG and correctly apply algorithms like the backdoor criterion, frontdoor criterion, or do-calculus to identify valid adjustment sets. This is essentially symbolic reasoning over a graph, and LLMs can handle it competently for simple-to-moderate DAGs.
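A minimal sketch of that symbolic reasoning follows, assuming a small expert-specified DAG (the variable names and edges are hypothetical). It implements d-separation via the ancestral moral graph and uses it to test the backdoor criterion.

```python
from itertools import combinations

# Hypothetical expert-specified DAG: Z confounds X and Y; M mediates X -> Y
edges = [("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")]

def parents(g, v):
    return {a for a, b in g if b == v}

def ancestors(g, vs):
    out = set(vs)
    while True:
        new = {a for a, b in g if b in out} - out
        if not new:
            return out
        out |= new

def descendants(g, v):
    out = {v}
    while True:
        new = {b for a, b in g if a in out} - out
        if not new:
            return out
        out |= new

def d_separated(g, x, y, z):
    # Ancestral moral graph method: restrict to ancestors of {x, y} and z,
    # marry co-parents of each retained node, drop z, test reachability.
    keep = ancestors(g, {x, y} | z)
    und = {frozenset(e) for e in g if e[0] in keep and e[1] in keep}
    for v in keep:
        for a, b in combinations(sorted(parents(g, v) & keep), 2):
            und.add(frozenset((a, b)))
    und = {e for e in und if not (e & z)}
    frontier, seen = {x}, {x}
    while frontier:
        frontier = {w for e in und if e & frontier for w in e} - seen
        seen |= frontier
    return y not in seen

def valid_backdoor_set(g, x, y, z):
    # Backdoor criterion: no member of z descends from x, and z blocks
    # every backdoor path (checked with x's outgoing edges removed)
    if z & (descendants(g, x) - {x}):
        return False
    g_back = [e for e in g if e[0] != x]
    return d_separated(g_back, x, y, z)

print(valid_backdoor_set(edges, "X", "Y", {"Z"}))   # True: {Z} is admissible
print(valid_backdoor_set(edges, "X", "Y", set()))   # False: Z left open
print(valid_backdoor_set(edges, "X", "Y", {"M"}))   # False: M descends from X
```

This is the sense in which the task is deterministic once the assumptions are supplied: everything above is graph algorithms, with no statistics involved.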

Algorithmic identification given a DAG

Given a well-specified DAG, tools like dagitty, bnlearn, or Python's causalgraphicalmodels are deterministically correct — and AI can reliably generate and interpret their outputs.

Where It Breaks Down

DAG construction from data alone is deeply unreliable

Causal structure cannot be identified from observational data without strong assumptions. Algorithms like PC, FCI, or GES recover Markov equivalence classes — sets of DAGs with the same conditional independencies — not unique causal graphs. AI has no principled way to choose among them without domain knowledge.
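The equivalence-class problem can be made concrete: for bivariate Gaussian data, the maximized likelihood of the model X → Y exactly equals that of Y → X, so no score computed from the data alone can prefer one direction. A pure-Python sketch with simulated data and invented parameters:

```python
import math, random

random.seed(2)
N = 5000

# Data truly generated from X -> Y
X = [random.gauss(0, 1) for _ in range(N)]
Y = [1.5 * x + random.gauss(0, 1) for x in X]

def gauss_ll(resid, var):
    # log-likelihood of residuals under N(0, var)
    return sum(-0.5 * math.log(2 * math.pi * var) - r * r / (2 * var) for r in resid)

def model_ll(cause, effect):
    # Maximized Gaussian likelihood of the model cause -> effect:
    # a marginal for the cause plus a linear regression for the effect
    mc, me = sum(cause) / N, sum(effect) / N
    vc = sum((c - mc) ** 2 for c in cause) / N
    beta = sum((c - mc) * (e - me) for c, e in zip(cause, effect)) / (N * vc)
    resid = [e - me - beta * (c - mc) for c, e in zip(cause, effect)]
    vr = sum(r * r for r in resid) / N
    return gauss_ll([c - mc for c in cause], vc) + gauss_ll(resid, vr)

ll_xy = model_ll(X, Y)  # score of X -> Y (the true direction)
ll_yx = model_ll(Y, X)  # score of Y -> X (the reversed direction)
print(abs(ll_xy - ll_yx) < 1e-4)  # True: the data cannot tell the two apart
```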

LLMs hallucinate causal structure

When asked to generate a DAG "from scratch" for a domain, LLMs tend to produce plausible-looking but unfounded graphs. They may:

  • Confuse correlation with causation in their training data
  • Omit latent confounders they have no way to know about
  • Confidently assert edges that reflect statistical associations, not mechanisms

The identification problem is hard even with a correct DAG

Adjustment set identification requires checking graphical conditions (d-separation, admissibility) that become computationally and conceptually complex with:

  • Hidden/latent variables
  • Selection bias nodes
  • Cycles or feedback loops
  • Interference between units

No ground truth for validation

Unlike math or code, there's often no way to verify a generated DAG is correct without running an experiment — which defeats the purpose.

The Deeper Epistemological Issue

DAGs are representations of assumptions, not discoveries. A DAG is only as valid as the domain knowledge baked into it. The question "did AI get the DAG right?" is often unanswerable without a randomized experiment or near-experimental variation. This makes AI-generated DAGs particularly dangerous when used naively — they provide false confidence in causal conclusions drawn from observational data.

Practical Verdict

Use case reliability:
  • Encode expert-specified assumptions into a DAG: High
  • Find adjustment sets given a known DAG: High (especially with formal tools)
  • Generate DAG structure from a domain description: Moderate; requires expert review
  • Discover causal structure from data alone: Low; not reliably possible in general
  • Handle latent confounders without domain input: Very low

The most defensible workflow is: human provides causal assumptions → AI encodes them as a DAG → formal algorithm identifies adjustment sets → human audits the assumptions. AI as a reasoning assistant over explicit assumptions is sound; AI as a causal discoverer is not.

For all but the most trivial of scientific questions, rigorous analysis will require conditioning, such as stratification or adjustment. Practically all observational data analyses, and even nearly all RCTs, require analyses involving conditioning and adjustment. Statistical modeling offers several advantages for conditioning and adjustment: efficiency with limited information, use of continuous covariates with minimal information loss, management of complexity (bias, effect modification, missingness, heteroskedasticity, etc.) in an integrated and coherent framework; and now, with modern tools, visualization of information and patterns in the data.
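The advantage of continuous covariates can be demonstrated. In the hypothetical simulation below, dichotomizing a continuous confounder leaves residual confounding, while adjusting for it as a continuous covariate (via Frisch-Waugh-Lovell residualization) recovers the true null effect.

```python
import random

random.seed(3)
N = 50_000

# Hypothetical setup: continuous confounder Z; X has NO true effect on Y
Z = [random.gauss(0, 1) for _ in range(N)]
X = [z + random.gauss(0, 1) for z in Z]
Y = [2 * z + random.gauss(0, 1) for z in Z]

def slope(x, y):
    mx, my = sum(x) / N, sum(y) / N
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def resid_linear(v):
    # residualize v on Z treated as a continuous covariate
    b = slope(Z, v)
    mz, mv = sum(Z) / N, sum(v) / N
    return [vi - mv - b * (zi - mz) for zi, vi in zip(Z, v)]

def resid_binary(v):
    # residualize v on a dichotomized Z (subtract stratum means)
    hi = [vi for zi, vi in zip(Z, v) if zi > 0]
    lo = [vi for zi, vi in zip(Z, v) if zi <= 0]
    mh, ml = sum(hi) / len(hi), sum(lo) / len(lo)
    return [vi - (mh if zi > 0 else ml) for zi, vi in zip(Z, v)]

# Frisch-Waugh-Lovell: the adjusted slope is the slope of the residuals
crude = slope(X, Y)                               # ~1.0: badly confounded
coarse = slope(resid_binary(X), resid_binary(Y))  # ~0.5: residual confounding remains
fine = slope(resid_linear(X), resid_linear(Y))    # ~0.0: near the true null
print(round(crude, 2), round(coarse, 2), round(fine, 2))
```

Within each half of the dichotomized confounder, Z still varies and still drives both X and Y, which is exactly the residual confounding and arbitrary categorization flagged in the Framework above.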

The regression modeling framework for statistical analysis accommodates all the fundamental statistical interests: (i) estimation, (ii) hypothesis testing, (iii) description, and (iv) prediction. Even simple univariate and bivariate analyses are special cases of more general regression problems that make strong assumptions about the ignorability of covariates and arbitrarily set the coefficients for those covariates to zero. There are special advantages to approaching regression problems from a prediction perspective: if you get the left-hand side (Ŷ) correct, you get better performance for estimation, hypothesis testing, and even description. Methods for optimizing prediction therefore confer benefits for the other statistical and research objectives. But doing prediction well is as much craft as it is technique, and acquiring the experience and insight necessary for cultivating this craft is an important factor in effectiveness and success in scientific work. The wisdom cataloged in the Regression Modeling Strategies (RMS) text, ebook, and short course, often non-obvious or even counter-intuitive, is a unique and rare opportunity to rapidly develop statistical acumen and accelerate scientific sophistication. RMS is also a gateway and stepping stone to other modern statistical applications, such as Bayesian methods and hierarchical models, robust statistical techniques, biomarker identification, adaptive trials, comparative effectiveness, meta-analysis and evidence synthesis, and the judicious use of machine learning/AI.
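The "special case" claim can be shown directly: a two-sample comparison of means is exactly a regression of the outcome on a 0/1 group indicator. A pure-Python sketch with simulated data (group means and spreads invented for the example):

```python
import random

random.seed(4)

# Hypothetical two-group data (means 10 and 12, common SD 2)
group0 = [random.gauss(10, 2) for _ in range(200)]
group1 = [random.gauss(12, 2) for _ in range(300)]

# Recast as a regression problem: outcome y, 0/1 group indicator x
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

diff = sum(group1) / len(group1) - sum(group0) / len(group0)
beta = ols_slope(x, y)
print(abs(diff - beta) < 1e-9)  # True: the mean difference IS the regression slope
```

The identity is algebraic, not approximate, which is why the regression framework can absorb the simple analyses without loss while opening the door to covariates, interactions, and continuous predictors.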

RMS helps us better understand and manage sources of uncertainty in scientific work; thus, RMS makes us all better scientists, and better—more sophisticated—consumers and administrators of research.

Drew Levy is the founder of GoodScience, where he focuses on statistical modeling, prediction, causal inference, and evidence quality in healthcare. His work examines how evidence is generated, how signal is distinguished from noise, and how analytic methods support reliable decision-making under uncertainty. He practices the craft of analysis with a principled framework and modern applied methods.

ABOUT

CONTACT