Appendix A — Did Authors Respond to Unjournal Evaluations?

Author: The Unjournal

Published: November 7, 2025

Overview

This chapter analyzes whether research papers were substantively revised in response to Unjournal evaluations.

Research Questions

  1. Did authors update their papers after receiving Unjournal evaluations?
  2. What were the major changes between the evaluated version and later versions?
  3. Do the changes reflect evaluator suggestions?

Methodology

We use a multi-step approach:

  1. Identify paper pairs: Find papers that exist in both “before” (at time of evaluation) and “after” (latest version) states
  2. Match evaluations: Extract paper titles from evaluation markdown files and match against metadata to identify which papers received evaluations
  3. Extract and compare: Use PDF text extraction to identify line-level changes between versions (a minimal sketch of this step follows the list)
  4. LLM analysis (optional): Use GPT-4 to:
    • Identify major substantive changes
    • Extract key suggestions from evaluations
    • Assess whether changes align with evaluator feedback
  5. Present evidence: Show specific examples of likely evaluation-driven changes

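As a rough illustration of step 3, extract-and-compare can be done with a PDF text extractor and a line-level diff. The sketch below assumes the pypdf package and a unified diff; the actual pipeline scripts may use a different extractor or diff granularity.

import difflib
from pathlib import Path

from pypdf import PdfReader  # assumed extractor; the pipeline may use another

def extract_lines(pdf_path: Path) -> list[str]:
    """Extract text from a PDF and split it into stripped, non-empty lines."""
    reader = PdfReader(str(pdf_path))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return [line.strip() for line in text.splitlines() if line.strip()]

def count_line_changes(before_pdf: Path, after_pdf: Path) -> dict:
    """Count added and deleted lines between two versions of a paper."""
    before, after = extract_lines(before_pdf), extract_lines(after_pdf)
    additions = deletions = 0
    for line in difflib.unified_diff(before, after, lineterm=""):
        if line.startswith("+") and not line.startswith("+++"):
            additions += 1
        elif line.startswith("-") and not line.startswith("---"):
            deletions += 1
    return {"additions_count": additions,
            "deletions_count": deletions,
            "total_changes": additions + deletions}

Counting the "+" and "-" lines of the unified diff yields the additions, deletions, and total-change figures reported in the tables below.
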
Data Sources

  • Before versions: Papers in papers/ and more papers/ folders
  • After versions: Papers in latest_papers_post_UJ/ folder
  • Evaluations: Markdown files in unjournal_evaluations/
  • Metadata: Coda table with publication dates and DOI deposit dates
Limitation

The Coda API doesn’t expose DOI deposit dates from the public table. For this pilot analysis, we focus on papers we can confidently match between folders based on author names and titles.
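
For reference, a matching heuristic along these lines can be sketched with only the standard library. The folder names come from the Data Sources list above; the filename normalization and similarity cutoff are illustrative assumptions, not the pipeline's actual rules.

import difflib
import re
from pathlib import Path

def normalize(name: str) -> str:
    """Lowercase a filename stem and collapse non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def match_paper_pairs(before_dir: str, after_dir: str, cutoff: float = 0.6) -> dict:
    """Pair each "before" PDF with the closest-named "after" PDF, if any."""
    before = {normalize(p.stem): p for p in Path(before_dir).glob("*.pdf")}
    after = {normalize(p.stem): p for p in Path(after_dir).glob("*.pdf")}
    pairs = {}
    for key, before_path in before.items():
        hit = difflib.get_close_matches(key, list(after), n=1, cutoff=cutoff)
        if hit:
            pairs[before_path.name] = after[hit[0]].name
    return pairs

# Example call (a real run would also scan the "more papers/" folder):
# match_paper_pairs("papers", "latest_papers_post_UJ")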

Phase 1 Results: Paper Matching and Change Detection

import pandas as pd
import json
from pathlib import Path

# Load results from Phase 1 analysis
with open('paper_change_analysis/change_analysis_results.json', 'r') as f:
    results = json.load(f)

df_all = pd.DataFrame(results)

# Summary statistics
print(f"Total paper pairs analyzed: {len(df_all)}")
print(f"Papers with evaluations: {df_all['has_evaluation'].sum()}")
print(f"Papers with changes (>50 lines): {(df_all['total_changes'] > 50).sum()}")
print(f"Papers with BOTH evaluations AND changes: {((df_all['has_evaluation']) & (df_all['total_changes'] > 50)).sum()}")
Total paper pairs analyzed: 34
Papers with evaluations: 20
Papers with changes (>50 lines): 12
Papers with BOTH evaluations AND changes: 6
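
The columns used in this chapter suggest that each record in change_analysis_results.json carries at least the fields below. This shape is inferred from the code, not a documented schema, and the values shown are placeholders rather than real output.

# Illustrative record shape (field names inferred from the columns used above
# and below; values are placeholders).
example_record = {
    "paper_id": "AuthorAAuthorB_Short Title...",   # concatenated authors plus title
    "total_changes": 0,             # lines added plus lines deleted between versions
    "additions_count": 0,
    "deletions_count": 0,
    "text_length_change_pct": 0.0,  # percent change in extracted text length
    "has_evaluation": False,        # True if a matching evaluation file was found
    "evaluation_files": [],         # matched files in unjournal_evaluations/
}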

Papers with Evaluations and Substantial Changes

These are the most interesting cases for LLM analysis:

# Filter to papers with both evaluations and changes
candidates = df_all[
    (df_all['has_evaluation'] == True) &
    (df_all['total_changes'] > 50)
].copy()

# Display key metrics
candidates_display = candidates[[
    'paper_id', 'total_changes', 'additions_count', 'deletions_count',
    'text_length_change_pct', 'evaluation_files'
]].copy()

candidates_display['evaluation_count'] = candidates_display['evaluation_files'].apply(len)
candidates_display = candidates_display.drop('evaluation_files', axis=1)

candidates_display.columns = [
    'Paper', 'Total Line Changes', 'Lines Added', 'Lines Deleted',
    'Text Change %', 'Evaluation Count'
]

candidates_display
Paper Total Line Changes Lines Added Lines Deleted Text Change % Evaluation Count
7 Mark BuntaineMichael GreenstoneGuojun HeMengdi... 110 39 71 -0.829916 2
8 Vivi AlatasArun G ChandrasekharMarkus MobiusBe... 2403 931 1472 -36.556987 2
16 Verónica Salazar RestrepoGabriel Leite Mariant... 223 106 117 -0.479753 1
19 Augustin BergeronJean-Paul CarvalhoJoseph Henr... 3207 1678 1529 -1.438094 2
25 Robert W HahnNathaniel HendrenRobert D Metcalf... 1291 649 642 0.266932 2
32 Daron AcemogluCevat Giray AksoyCeren BaysanCar... 1660 849 811 3.863107 2

All Papers with Substantial Changes (With or Without Evaluations)

Some papers show substantial changes even without matched evaluations; the table below lists all pairs with more than 200 line changes:

# Papers with significant changes
changed_papers = df_all[df_all['total_changes'] > 200].copy()

changed_display = changed_papers[[
    'paper_id', 'total_changes', 'text_length_change_pct', 'has_evaluation'
]].sort_values('total_changes', ascending=False).copy()

changed_display.columns = ['Paper', 'Total Line Changes', 'Text Change %', 'Has Evaluation']

print(f"\nPapers with >200 line changes: {len(changed_display)}")
changed_display

Papers with >200 line changes: 11
Paper Total Line Changes Text Change % Has Evaluation
11 Abhijit BanerjeeMichael FayeAlan KruegerPaul N... 4489 -13.005971 False
19 Augustin BergeronJean-Paul CarvalhoJoseph Henr... 3207 -1.438094 True
21 Adrien BilalDiego R Känzig_The Macroeconomic I... 2614 2.021894 False
8 Vivi AlatasArun G ChandrasekharMarkus MobiusBe... 2403 -36.556987 True
9 B Kelsey JackSeema JayachandranNamrata KalaRoh... 2382 -41.188777 False
12 Bhargav BhatJonathan de QuidtJohannes Haushofe... 1730 2.253546 False
32 Daron AcemogluCevat Giray AksoyCeren BaysanCar... 1660 3.863107 True
26 Richard T CarsonJoshua S Graff ZivinJeffrey G ... 1401 30.152721 False
25 Robert W HahnNathaniel HendrenRobert D Metcalf... 1291 0.266932 True
2 Michael KremerJonathan D LevinChristopher M Sn... 1163 128.405166 False
16 Verónica Salazar RestrepoGabriel Leite Mariant... 223 -0.479753 True

Summary

From 34 confirmed paper pairs:

  • 20 papers had matched Unjournal evaluations (using title-based matching from evaluation markdown files)
  • 6 papers showed both evaluations AND substantial changes (>50 line changes)
  • 12 papers showed substantial revisions (>50 line changes) overall, some with matched evaluations and some without

The six papers with both evaluations and substantial changes represent promising cases for further analysis to assess whether changes reflected evaluator feedback.

Next Steps

  1. Extract evaluation suggestions: Parse evaluation markdown files for specific suggestions (a rough keyword-based sketch follows this list)
  2. LLM analysis: Run GPT-4 analysis on the 6 candidate papers to assess attribution
  3. Generate case studies: Present detailed examples with evidence

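For step 1, a first pass could be as simple as pulling sentences that contain suggestion-like cue words. The sketch below is a naive keyword heuristic, not the planned extraction logic; the cue list and sentence splitting are illustrative.

import re
from pathlib import Path

# Words that often signal an evaluator suggestion; purely a heuristic.
SUGGESTION_CUES = re.compile(
    r"\b(suggest|recommend|should|could improve|would benefit)\b", re.IGNORECASE
)

def extract_suggestions(eval_path: Path) -> list[str]:
    """Return sentences from an evaluation file that contain suggestion cues."""
    text = eval_path.read_text(encoding="utf-8")
    # Crude sentence split; good enough for a first pass over markdown prose.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if SUGGESTION_CUES.search(s)]

# Example over all evaluation files (folder name from the Data Sources list):
# for f in Path("unjournal_evaluations").glob("*.md"):
#     print(f.name, len(extract_suggestions(f)))
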
Placeholder for Results

Results from the LLM analysis will appear here once the analysis pipeline is complete.

Example Structure

For each paper pair, we’ll show:

  1. Paper metadata (authors, title, dates)
  2. Major changes detected (sections added/removed, methods changed, etc.)
  3. Evaluator suggestions (extracted from evaluation files)
  4. Attribution assessment (likely/unlikely influenced by evaluation)
  5. Evidence and quotes (specific textual evidence)
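
One way to hold these five elements for each paper pair is a small record type. The field names below are ours, chosen to mirror the list above; the pipeline may store case studies differently.

from dataclasses import dataclass, field

@dataclass
class CaseStudy:
    metadata: dict                    # authors, title, dates
    major_changes: list[str]          # sections added/removed, methods changed, etc.
    evaluator_suggestions: list[str]  # extracted from the evaluation files
    attribution: str                  # e.g. "likely" or "unlikely" influenced by evaluation
    evidence: list[str] = field(default_factory=list)  # supporting quotes and textual evidence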

Phase 2: LLM Analysis (To Be Run)

The next step is to run GPT-4 analysis on the six papers with both evaluations and substantial changes (a sketch of such a call follows the list below). This will:

  1. Identify major substantive changes in each paper
  2. Extract specific suggestions from evaluations
  3. Assess whether changes likely reflect evaluator feedback
  4. Provide evidence and quotes supporting the assessment

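As a sketch of what such a call might look like, the snippet below uses the openai Python client with one prompt per paper. The prompt wording, model name, and function are illustrative assumptions; llm_change_attribution.py may structure its calls differently.

from openai import OpenAI  # assumes an OpenAI API key is configured in the environment

client = OpenAI()

PROMPT = """You are assessing whether revisions to a paper respond to peer evaluations.

Evaluator suggestions:
{suggestions}

Lines added or removed between versions (excerpt):
{diff_excerpt}

For each suggestion, state whether the changes likely address it, and quote the
added text that supports your assessment."""

def assess_attribution(suggestions: str, diff_excerpt: str, model: str = "gpt-4") -> str:
    """Run one illustrative attribution query; the real script's prompt will differ."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT.format(suggestions=suggestions,
                                            diff_excerpt=diff_excerpt)}],
    )
    return response.choices[0].message.content
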
Estimated cost: $1.50-6.00 (depending on paper/evaluation length)
Estimated time: 10-30 minutes

To run the LLM analysis:

conda activate qpy311_arm
python paper_change_analysis/scripts/llm_change_attribution.py

Results will be saved to paper_change_analysis/llm_analysis/ and can be incorporated into this document.

Running the Full Pipeline

To reproduce this analysis:

# Phase 1: Identify paper pairs and extract text (completed)
conda activate qpy311_arm
python paper_change_analysis/scripts/analyze_paper_changes.py

# Phase 2: LLM analysis of changes and attribution (optional, requires OpenAI API key)
python paper_change_analysis/scripts/llm_change_attribution.py

# Phase 3: Render this document with results
quarto render paper_response_analysis.qmd