Sil Hamilton

Sil Hamilton · September 13, 2023

Happy to say I'll be speaking at CommHIT23 on AI & healthcare, hosted at the Kennedy Space Center in Florida. I'm attending as an advisor for Health Tech Without Borders (https://www.htwb.org/).

Ithaca, New York, United States
899 followers · 500+ connections

Epiq

Cornell University

About

I’m a PhD student working with David Mimno and Matthew Wilkens at Cornell University in…

Activity

Sil Hamilton · 1d
In the news: 404 Media, VICE Media, Gizmodo.com, and Yahoo News wrote about my new preprint with David Mimno on tell-tale signs of AI-generated stories! We found that popular chatbots like Claude and ChatGPT are especially obsessed with telling stories about lighthouses 🤔 We investigated why... and found that the original ChatGPT is probably to blame, but the problem is so deeply rooted in the training data that it might be hard to fix.
See our paper here: https://lnkd.in/er9juGx7
404 Media article: https://lnkd.in/ey6BgzPh

Chatbots Keep Telling Stories About Lighthouse Keeper 'Elias Thorne'. We Might Know Why

1 Comment
Sil Hamilton · 2y
AI needs journalism more than journalism needs AI! The Computer History Museum has published my article (co-written with Jennifer 8. Lee) on open-source #AI with other great pieces by Google News' Richard Gingras, Big Local News' Cheryl Phillips, The Globe and Mail's David Walmsley, Outlier Media's Candice Fortman (she/her), The Pivot Fund's Tracie Powell, and more! It was a pleasure to both attend the workshop and engage with so many. Read it here: https://lnkd.in/ehXvEuU7

5 Comments
Sil Hamilton · 2y
My Knight Center class on building useful #genAI newsroom workflows with LangChain starts today—over 210 students! It'll be fun to see what people come up with. Registration is still open for another two weeks: https://lnkd.in/guUUK69f

Generative AI for journalists: Discovering what data can do

2 Comments
Sil Hamilton · 2y
Ecstatic to announce I'll be doing a second course with the Knight Center on #GenAI for journalists! Learn how to build and run useful agents on your laptop with Jupyter notebooks and LangChain—it starts in just a week! https://lnkd.in/guUUK69f

1 Comment
Sil Hamilton · 2y
I was proud to sit on the jury watching folks’ creative #journalism & #AI projects at #MediaParty 2023 in Buenos Aires last week. LLMs are malleable! Great for hackathons and businesses alike—so many untapped use cases.

3 Comments
Sil Hamilton · 2y
Such a great two days here at the Kennedy Space Center. Thank you CommHIT for hosting the conference—and a big thank you to Jarone Lee, MD, MPH & Health Tech Without Borders for inviting me to talk about language models.

1 Comment
Sil Hamilton · 2y
I'm quoted in this month's WIRED, in Vauhini Vara's great piece on AI co-writing! On GPT-3.5, she notes it has "the witless efficiency of a stapler." True! To an LLM, words are concepts, and concepts are words. RLHF is linguistic relativity in action.

Confessions of a Viral AI Writer

1 Comment
Sil Hamilton · 2y
Hello, Argentina! I'll be in Buenos Aires this October for #MediaParty 2023, where I'll be giving a keynote on privacy & #LLMs. Large language models like ChatGPT & Claude are great, but there are better solutions for journalists handling sensitive information. I'll explore why local models are best for local news. See you there!

Media Party: Building the Future of Journalism and Media - Media Party

Experience

Epiq

New York, NY

New York, New York, United States

Montreal, Quebec, Canada

Montreal, Quebec, Canada

Hamilton, Ontario, Canada

Hamilton, Ontario, Canada

Hamilton, Ontario, Canada

Education

Cornell University

2024 - 2030

I'm studying machine learning and cultural analytics under David Mimno and Matthew Wilkens. My work investigates computational narrative understanding with large language models.

2021 - 2024

2017 - 2021

Volunteer Experience

Technology Advisor

Health Tech Without Borders

Jul 2023 - Aug 2024 · 1 year 2 months

Health

Advising the board on matters of artificial intelligence.

Technical Advisor

The Associated Press

Feb 2023 - Apr 2023 · 3 months

Science and Technology

I advised on AI while the Associated Press was formulating its AI policies.

Research Affiliate

Brown Institute for Media Innovation

Aug 2024 - Present · 1 year 11 months

Education

I gave workshops and presentations at the Brown Institute.

Publications

NarraBench: A Comprehensive Framework for Narrative Benchmarking

EACL 2026 March 20, 2026
We present NarraBench, a theory-informed taxonomy of narrative-understanding tasks, as well as an associated survey of 78 existing benchmarks in the area. We find a significant need for new evaluations covering aspects of narrative understanding that are either overlooked in current work or poorly aligned with existing metrics. Specifically, we estimate that only 27% of narrative tasks are well captured by existing benchmarks, and we note that some areas, including narrative events, style, perspective, and revelation, are nearly absent from current evaluations. We also note the need for increased development of benchmarks capable of assessing constitutively subjective and perspectival aspects of narrative, that is, aspects for which there is generally no single correct answer. Our taxonomy, survey, and methodology are of value to NLP researchers seeking to test LLM narrative understanding.

Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels

SIGHUM 2026 May 20, 2025
Although the context length of large language models (LLMs) has increased to millions of tokens, evaluating their effectiveness beyond needle-in-a-haystack approaches has proven difficult. We argue that novels provide a case study of subtle, complicated structure and long-range semantic dependencies often over 128k tokens in length. Inspired by work on computational novel analysis, we release the Too Long, Didn't Model (TLDM) benchmark, which tests a model's ability to report plot summary, storyworld configuration, and elapsed narrative time. We find that none of seven tested frontier LLMs retain stable understanding beyond 64k tokens. Our results suggest language model developers must look beyond "lost in the middle" benchmarks when evaluating model performance in complex long-context scenarios. To aid in further development we release the TLDM benchmark together with reference code and data.

The Zero Body Problem: Probing LLM Use of Sensory Language

COLM 2025 April 8, 2025

Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. This language is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. We extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular models. We find that all models generate stories that differ significantly from human usage of sensory language, but the direction of these differences varies considerably between model families. Namely, Gemini models use significantly more sensory language than humans along most axes whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language. However, we find preliminary evidence suggesting that instruction tuning may discourage usage of sensory language. Finally, to support further work, we release our expanded story dataset.

Million Eyes on the “Robot Umps”: The Case for Studying Sports in HRI Through Baseball

2025 20th ACM/IEEE International Conference on Human-Robot Interaction (HRI) March 4, 2025
In this position paper, we argue that baseball, and sports more broadly, provide a unique and under-explored opportunity for researchers to study human-robot interaction (HRI) in real-world settings. Using the rise of robot umpires in baseball as a primary example, we examine emerging themes such as power dynamics among players and umpires, labor implications, and technical challenges. We emphasize the affordances and benefits of studying sports within HRI, including the integration of interdisciplinary perspectives, the large-scale deployment of robots, and the examination of their role in deeply rooted cultural practices.

A City of Millions: Mapping Literary Social Networks At Scale

Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities February 26, 2025
We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482 pairwise relationships annotated for affinity and relationship type. We achieve this scale by automating previously manual methods of extracting social networks; specifically, we adapt an existing annotation task as a language model prompt, ensuring consistency at scale with the use of structured output. This dataset serves as a unique resource for humanities and social science research by providing data on cognitive models of social realities.

Detecting Mode Collapse in Language Models via Narration

Workshop on the Scaling Behavior of Large Language Models February 6, 2024

No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author: what literary theorists label the implied or virtual author, distinct from the real author or narrator of a text. Early large language models trained on unfiltered training sets drawn from a variety of discordant sources yielded incoherent personalities, problematic for conversational tasks but useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show that successive versions of GPT-3 suffer from increasing degrees of "mode collapse," whereby overfitting the model during alignment prevents it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.

Mrs. Dalloway Said She Would Segment the Chapters Herself

Workshop for Narrative Understanding July 14, 2023
This paper proposes a sentiment-centric pipeline to perform unsupervised plot extraction on non-linear novels like Virginia Woolf’s Mrs. Dalloway, a novel widely considered to be “plotless.” Combining transformer-based sentiment analysis models with statistical testing, we model sentiment’s rate of change and correspondingly segment the novel into emotionally self-contained units, which we qualitatively evaluate to be meaningful surrogate pseudo-chapters. We validate our findings by evaluating our pipeline as a fully unsupervised text segmentation model, achieving an F1 score of 0.643 (regional) and 0.214 (exact) in chapter-break prediction on a validation set of linear novels with existing chapter structures. In addition, we observe notable differences between the distributions of predicted chapter lengths in linear and non-linear fictional narratives, with the latter exhibiting significantly greater variability. Our results hold significance for narrative researchers appraising methods for extracting plots from non-linear novels.

Blind Judgement: Agent-Based Supreme Court Modelling with GPT

Creative AI Across Modalities, AAAI 2023 February 14, 2023

We present a novel Transformer-based multi-agent system for simulating the judicial rulings of the 2010-2016 Supreme Court of the United States. We train nine separate models, each on the authored opinions of one Supreme Court justice active ca. 2015, and test the resulting system on 96 real-world cases. We find our system predicts the decisions of the real-world Supreme Court with better-than-random accuracy. We further find a correlation between the model's accuracy for individual justices and where those justices fall between legal conservatism and liberalism. Our methods and results hold significance for researchers interested in using language models to simulate politically charged discourse between multiple agents.

MultiHATHI: A Complete Collection of Multilingual Prose Fiction in the HathiTrust Digital Library

Journal of Open Humanities Data February 8, 2023
This dataset provides detailed metadata on ca. 10.2 million works of fiction and non-fiction written after 1799 in 521 different languages available in the HathiTrust Digital Library. The dataset bolsters the May 2022 Hathifile by supplying missing predicted fiction tags with a bespoke BERT-based multilingual classifier. Our classifier completes the catalogue with an additional 400,000 non-English volumes predicted to be works of fiction, capturing 95% of all works presently provided by HathiTrust. We provide each work with metadata including the work’s genre at the level of fiction or non-fiction, length in pages, original language, and the year the work was published. With a total page count of ca. 1.4 billion pages, our dataset provides researchers with a substantial source of non-English modern literature. We also present insight into how multilingual classifiers can be trained with monolingual data, itself a discovery with implications for the study of lower resource languages. We hope our provisions will accelerate empirical research into non-English prose and literature.

The COVID That Wasn’t: Counterfactual Journalism using GPT

SIGHUM, COLING 2022 December 10, 2022
In this paper, we explore the use of large language models to assess human interpretations of real-world events. To do so, we use a language model trained prior to 2020 to artificially generate news articles concerning COVID-19 given the headlines of actual articles written during the pandemic. We then compare stylistic qualities of our artificially generated corpus with a news corpus, in this case 5,082 articles produced by CBC News between January 23 and May 5, 2020. We find our artificially generated articles exhibit a considerably more negative attitude towards COVID and a significantly lower reliance on geopolitical framing. Our methods and results hold importance for researchers seeking to simulate large-scale cultural processes via recent breakthroughs in text generation.


Honors & Awards

NSERC Postgraduate Scholarships-Doctoral (PGS-D)

Natural Sciences and Engineering Research Council of Canada

Apr 2025

Three-year fellowship to pursue research on extracting cultural concepts from neural networks.

Steamship Fellowship for Language AI at Writing Atlas

Steamship, Inc.

Feb 2023

Joseph-Armand Bombardier Canada Graduate Scholarship

Social Sciences and Humanities Research Council of Canada

Aug 2021

Languages

English: Native or bilingual proficiency

French: Professional working proficiency

Dutch: Limited working proficiency
