Language-Enhanced Generative Modeling for Amyloid
PET Synthesis from MRI and Blood Biomarkers
Zhengjie Zhanga,1, Xiaoxie Maob,c,1, Qihao Guod, Shaoting Zhanga, Qi
Huangc,∗, Mu Zhoue,∗, Fang Xiec,∗, Mianxin Liua,f,∗
aShanghai Artificial Intelligence Laboratory, Shanghai, 200082, China
bSchool of Medicine, Xiamen University, Xiamen, Fujian, China
cDepartment of Nuclear Medicine & PET Center, Huashan Hospital, Fudan
University, Shanghai, China
dDepartment of Gerontology, Shanghai Jiao Tong University Affiliated Sixth People’s
Hospital, Shanghai, China
eDepartment of Computer Science, Rutgers University, New Brunswick, New
Jersey, United States
fShenzhen Institutes of Advanced Technology, Chinese Academy of
Sciences, Shenzhen, China
∗Corresponding authors: liumianxin@pjlab.org.cn (M. Liu); fangxie@fudan.edu.cn (F. Xie);
muzhou1@gmail.com (M. Zhou); hq_1124@163.com (Q. Huang)
1These authors contributed equally to this work.
Abstract
Alzheimer’s disease (AD) diagnosis heavily relies on amyloid-beta positron
emission tomography (Aβ-PET), which is constrained by high cost and limited
accessibility. This study explores whether Aβ-PET spatial patterns can
be predicted from blood-based biomarkers (BBMs) and magnetic resonance
imaging (MRI) scans. We collected Aβ-PET images, T1-weighted MRI scans,
and BBMs from 566 participants. A language-enhanced generative model,
driven by a large language model (LLM) and multimodal information fusion,
was developed to synthesize PET images. Synthesized images were evaluated
for image quality, diagnostic consistency, and clinical applicability within a
fully automated diagnostic pipeline. The synthetic PET images closely re-
semble real PET scans in both structural details (SSIM = 0.920±0.003)
and regional patterns (Pearson’s R = 0.955±0.007). In physician evalu-
ations, the diagnostic outcomes using synthetic PET show high agreement
with real PET–based diagnoses (accuracy = 0.80). We developed a fully automatic AD diagnostic pipeline integrating PET synthesis and classification.
The synthetic PET–based model (AUC = 0.78) outperforms T1-based (AUC
= 0.68) and BBM-based (AUC = 0.73) models, while combining synthetic
PET and BBMs further improved performance (AUC = 0.79). Our method
significantly outperforms conventional MRI- and BBM-based synthesis and
diagnosis methods. Ablation analysis supports the advantages of LLM inte-
gration and prompt engineering. Our language-enhanced generative model
demonstrates strong capability in synthesizing realistic PET images, enhanc-
ing the utility of MRI and BBMs for Aβ spatial pattern assessment. The de-
veloped fully automatic and cost-effective AD diagnostic pipeline improves
the diagnostic workflow for Alzheimer’s disease.
Keywords: Blood-based Biomarkers, PET Synthesis, Large Language
Model, Multimodal, Generative Model
1. Introduction
Alzheimer's disease (AD) is a neurodegenerative disorder causing progres-
sive cognitive and functional decline in patients [1, 2]. Although treatments
such as lecanemab and donanemab have shown promise, AD progression re-
mains irreversible [3, 4, 5]. Therefore, timely and accurate diagnosis is essen-
tial for effective AD screening and management. To date, amyloid-beta (Aβ)
and tau pathology detection primarily rely on positron emission tomography
(PET) imaging and cerebrospinal fluid sampling [6, 7, 8]. In particular, PET
imaging remains the only in vivo, non-invasive method capable of provid-
ing refined spatial distributions of Aβ and tau depositions, which are crucial
for AD subtyping, staging, and prognosis [9, 10, 11]. However, PET imag-
ing is limited by its high cost, low accessibility, and the potential radiation
exposure to patients [12, 13].
To overcome these limitations, generative models have emerged as a
promising approach to synthesize realistic PET images from more cost-effective,
accessible, and radiation-free data modalities [14, 15]. Previous studies have
explored Aβ-PET synthesis using various data sources, including T1-weighted
MRI (T1 images) and functional MRI (fMRI) [16, 17]. T1 images provide
high-resolution anatomical details, while fMRI captures dynamic brain activ-
ity and functional connectivity. Although both modalities offer clinically rel-
evant information, they are not directly coupled with Aβ deposition [18, 19].
Consequently, analytical methods based on these modalities often struggle
to achieve high-quality, reliable Aβ-PET synthesis.
Recent advances in blood-based biomarkers (BBMs) offer a sensitive,
cost-effective assessment of Aβ pathology at the global level [20, 21, 22, 23,
24]. BBM-based methods have successfully predicted Aβ-PET positivity and
global standardised uptake value ratios (SUVRs) [25, 26, 27, 28]. Impor-
tantly, BBMs are minimally invasive, easy to collect, and inexpensive to
apply in clinical settings [29, 27]. However, while BBMs reflect the macro-
scopic state of Aβ pathology, they lack spatial resolution and are insufficient
to represent detailed deposition patterns captured by PET imaging. Given
the complementary nature of BBMs and MRI, integrating these modalities
into a multimodal framework could enable refined, high-quality, and cost-
effective Aβ-PET synthesis.
While T1 images offer detailed spatial information, BBMs are numerical
data without inherent spatial context. This modality disparity presents a
major challenge for multimodal fusion. Leveraging the capabilities of large
medical large language models (LLMs) and prompt engineering [30, 31, 32, 33, 34],
we introduce a language-enhanced encoder that aligns BBMs with deep clin-
ical knowledge beyond raw numerical representation. Trained on large-scale
medical corpora, the LLM captures clinically meaningful semantics. By
translating non-imaging data into context-rich representations, we explore
prompt-guided LLMs for extracting knowledge-enhanced BBM features. In-
tegrating these with T1-derived spatial features, we enable more effective
multimodal generative modelling for AD assessment.
In this study, we propose a language-enhanced generative model based on
clinical T1 images and BBMs to synthesize realistic PET images for improved
Aβ spatial pattern assessment. We comprehensively evaluate the synthetic
PET images in terms of image quality, diagnostic feature consistency (in-
cluding physician-based assessment), and clinical applicability using a large
cohort (N = 566). The proposed model outperforms conventional methods
across all evaluation dimensions, primarily due to its LLM-driven feature en-
hancement and prompt-based learning strategy. Our results highlight that
this generative AI framework provides a cost-effective solution to support
large-scale AD screening, diagnosis, and management.
2. Methods
2.1. Participants and materials
2.1.1. Participants
Participants for this study were recruited from Huashan Hospital and the
Sixth People’s Hospital affiliated with Shanghai Jiao Tong University. The
research was approved by the institutional review boards of both hospitals
and all participants provided written informed consent. Data collection was
registered under the identifier "ChiCTR2000036842" with the title "Con-
struction of a Pre-clinical AD Neuroimaging Cohort Using the ATN System"
(http://www.chictr.org.cn/showproj.aspx?proj=59802).
Participants were selected according to the following exclusion criteria:
1) fewer than 5 years of education, 2) a history of neurological or psychiatric
disorders, 3) other neurological conditions beyond AD spectrum disorders,
and 4) significant alcohol or drug abuse. In total, 566 participants were
enrolled and data collection commenced in December 2018 and concluded in
July 2022. The dataset includes demographic details (age, gender, years of
education), MR and PET imaging data, 7 types of BBMs, and results from
8 neuropsychological tests.
2.1.2. MR and PET imaging
T1 images were acquired using a 3T Prisma MRI scanner (Siemens, Er-
langen, Germany) at the Shanghai Jiao Tong University Affiliated Sixth
People’s Hospital. The imaging was performed with a 3D magnetization-
prepared rapid gradient-echo sequence, employing the following parameters:
repetition time = 3000 ms, echo time = 2.56 ms, flip angle = 7°, acquisition
matrix = 320×320, in-plane resolution = 0.8×0.8 mm², slice thickness =
0.8 mm, and 208 sagittal slices.
Aβ-PET imaging was performed using a Biograph mCT Flow PET/CT
scanner (Siemens Healthineers, Erlangen, Germany) at the Department of
Nuclear Medicine & PET Center, Huashan Hospital, Fudan University, within
two weeks following the MRI scans [35]. Participants received an intravenous
injection of 18F-AV45 at approximately 7.4 MBq/kg. After a 50-minute
rest period, they underwent low-dose CT scanning and PET scanning. The
PET scans were subsequently processed using a filtered back-projection re-
construction algorithm, which included corrections for decay, normalization,
dead time, photon attenuation (based on CT), scatter, and random coinci-
dences. The final PET images had a matrix size of 168×168×148 voxels
and a voxel resolution of 2.04×2.04×1.5 mm³.
2.1.3. Blood-based biomarkers for AD
We collected plasma samples before PET scans, stored them at -80°C,
and subsequently analysed these samples using the Quanterix Simoa HD-
1 platform [23, 25]. The measurement of plasma AD biomarkers included
Aβ42, Aβ40, T-tau (Neurology 3-Plex A assay kit, lot 502838), p-tau181
(Assay Kit V2, lot 502923), and neurofilament light chain (NFL, NF-light
assay kit, lot 202700) [36, 37]. The Aβ42/40 ratio was calculated as a nor-
malized measure of Aβ pathology. The p-tau181/Aβ42 ratio was computed
to assess the relationship between tau phosphorylation and amyloid pathol-
ogy. In addition, NFL concentrations were measured to indicate the level of
neurodegeneration. Technicians blinded to clinical imaging data performed
the measurements to avoid potential bias. Plasma biomarker concentrations
were reported in pg/mL.
2.1.4. Neuropsychological assessment
Systematic neuropsychological assessments (NAs) were conducted by trained
neuropsychologists at Shanghai Jiao Tong University Affiliated Sixth People's
Hospital. This evaluation comprised 8 assessments: Mini-Mental State Ex-
amination (MMSE), Montreal Cognitive Assessment-Basic (MoCA-B), au-
ditory verbal learning test 30-minute long-delayed free recall (AVLT-LDR),
AVLT-recognition (AVLT-R), animal fluency test (AFT), 30-item Boston
naming test (BNT), shape trails test A and B (STT-A and STT-B) [38, 39,
40, 41, 42]. We obtained 8 neuropsychological scores from these assessments:
MMSE and MoCA-B for global cognitive functions; AVLT-LDR and AVLT-
R for memory function; AFT and BNT (total scores) for language function;
and STT-A and STT-B (time to completion) for executive function. Higher
scores indicate better cognitive ability for all measures except STT-A and
STT-B, where shorter completion times reflect better executive function. It
should be noted that we included all available scores for each patient. In
particular, MMSE and MoCA-B scores were completed by all patients.
The criteria for diagnosing AD dementia (ADD) and mild cognitive im-
pairment (MCI) were derived from the 2011 National Institute on Aging-
Alzheimer's Association (NIA-AA) guidelines for ADD, along with a modified
version of Jak and Bondi's criteria [43, 44]. MCI was identified in participants
who showed impairment in one or more cognitive domains or had scores more
than one standard deviation below the norm in each of the three cognitive
domains [45]. Participants exhibiting no signs of cognitive impairment were
classified as cognitively unimpaired (CU).
2.1.5. Data preprocessing
T1 and PET images were preprocessed using Advanced Normalization
Tools (ANTs, https://antspy.readthedocs.io/en/latest/index.html). For T1
images, N4 bias correction was applied to mitigate bias field effects, followed
by skull stripping with a pretrained deep learning method from ANTsPyNet
(https://antspy.readthedocs.io/en/latest/segmentation.html). PET images
were registered to the corresponding T1 images using an affine transforma-
tion computed by ANTs after acquisition and reconstruction. The same brain
mask from the T1 images was used to perform skull stripping on the PET im-
ages. To convert PET intensity values to SUVR, we normalized them against
the average regional value of the bilateral cerebellar crus. These regions
were identified using an affine registration from the automated anatomical
labeling-116 (AAL-116) atlas, aligning the template T1 brain images to the
individual T1 space. All images were then resized to 128×128×128 voxels
and scaled to intensity values between 0 and 1 before being fed into the AI
models.
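For illustration, a minimal sketch of this preprocessing pipeline using ANTsPy and ANTsPyNet is given below; the file names, the atlas file, and the cerebellar-crus label IDs are hypothetical placeholders rather than the exact configuration used in this study.

```python
import ants
import numpy as np
from antspynet.utilities import brain_extraction

# Load raw images (hypothetical file names)
t1 = ants.image_read("sub-001_T1w.nii.gz")
pet = ants.image_read("sub-001_av45_pet.nii.gz")

# N4 bias-field correction and deep-learning-based skull stripping for T1
t1 = ants.n4_bias_field_correction(t1)
brain_mask = ants.threshold_image(brain_extraction(t1, modality="t1"), 0.5, 1.0)
t1_brain = t1 * brain_mask

# Affine registration of PET to individual T1 space; reuse the T1 brain mask
reg = ants.registration(fixed=t1_brain, moving=pet, type_of_transform="Affine")
pet_in_t1 = reg["warpedmovout"] * brain_mask

# SUVR: normalize by mean uptake in the bilateral cerebellar crus, taken from an
# AAL-116 atlas already affinely registered to this subject (assumed label IDs)
aal = ants.image_read("aal116_in_subject_space.nii.gz")
crus = np.isin(aal.numpy(), [91, 92, 93, 94])  # Cerebelum_Crus1/2 L/R (assumed)
suvr = pet_in_t1.numpy() / pet_in_t1.numpy()[crus].mean()

# Resize to 128^3 voxels and min-max scale to [0, 1] before model input
img = ants.from_numpy(suvr, origin=pet_in_t1.origin, spacing=pet_in_t1.spacing,
                      direction=pet_in_t1.direction)
img = ants.resample_image(img, (128, 128, 128), use_voxels=True)
arr = img.numpy()
arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)
```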
2.2. Language-enhanced generative framework
Figure 1 illustrates our language-enhanced generative framework for syn-
thesizing PET images from T1 images and non-imaging data composed of
patient demographics, BBMs, and NAs. The framework comprises two main
components: a language-enhanced encoder and a generative adversarial net-
work (GAN). Specifically, the language-enhanced encoder first transforms
all non-imaging data into a Global-context Prompt, which summarizes key
clinical information, and then encodes this prompt into semantic features
using an LLM. The generative adversarial network consists of a generator
and a discriminator. The generator takes T1 images and the encoded text
features as inputs to produce the corresponding PET images. Meanwhile,
the discriminator distinguishes between real and synthetic PET images for
adversarial training. To enable effective multimodal integration, two sepa-
rate text-image fusion modules are respectively designed for the generator
and the discriminator.
[Figure 1 near here: (A) Language-enhanced encoder; (B) Generative adversarial network; (C) Architecture comparison (T1-only, T1+BBMs-Num, T1+BBMs-LLM). Example global-context prompt: "[Diagnosis / Prediction] The Aβ-PET is positive. [Demographics] The patient is a 58 years old female with 12 years of education. [Blood-based biomarkers] Blood biomarker information is as follows: the concentration of Aβ40 is 227.1 pg/mL, Aβ42 is 3.18 pg/mL ... (other BBMs) ... [Neuropsychological assessments] Cognitive function assessment scale scores are as follows: MMSE is 14, MoCA-B is 9, AVLT-N5 is none, ... (other NAs) ..."]

Figure 1: Overview of our proposed language-enhanced generative framework for PET synthesis. (A) Language-enhanced encoder: Clinical variables including demographics, BBMs, and NAs are formatted into a global-context prompt, which begins with a global PET characteristics statement (ground-truth diagnosis for training and clinical-variable-based prediction for inference, see Methods), followed by structured clinical details. The prompt is encoded into text features using a medical LLM. (B) Generative adversarial network: The generator uses T1 images along with text features to synthesize Aβ-PET images. The discriminator evaluates the authenticity of PET images by comparing them with paired T1 images and text features, and is used only during the training stage. (C) Architecture comparison: This component illustrates the key differences between our method and two representative baselines. Our approach ("T1+BBMs-LLM") synthesizes PET images using T1-weighted images and BBMs encoded as text features via an LLM. In contrast, the "T1-only" baseline uses only T1 images, while the "T1+BBMs-Num" baseline incorporates BBMs as normalized numerical features.
2.2.1. Language-enhanced encoder
The language-enhanced encoder is designed to extract semantic features
from a range of clinical variables (BBMs, demographics, and NAs). This
process consists of two main stages: (1) transforming these variables into
a unified textual format, referred to as the global-context prompt, and (2)
encoding this prompt into semantic features using a medical LLM.
Prompt engineering involves mapping numerical clinical variables into
structured text, which is crucial for guiding the LLM to produce accurate and
refined outputs [46, 47]. To enable this, we design a global-context prompt
that combines a summary-level descriptor of Aβ pathology with a structured
representation of key clinical variables (see "Global-context prompt" in Figure
1). This prompt provides a semantic lead-in to orient the LLM’s interpre-
tation while integrating all relevant inputs into a coherent, language-based
format for effective context modeling.
The global-context prompt consists of two components: (i) a summary
diagnostic sentence reflecting the overall Aβ pathology status (e.g., "The Aβ-
PET is positive/negative.”), and (ii) a structured list of 18 clinical variables,
including age, gender, years of education, Aβ40, Aβ42, T-Tau, P-Tau 181,
NFL, Aβ42/40, P-Tau 181/Aβ42, MMSE, MoCA-B, AVLT-N5, AVLT-N7,
AFT, BNT, STT-A, and STT-B. To ensure structural consistency, missing
values are explicitly denoted as “None”.
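A minimal sketch of the prompt construction is shown below, assuming the clinical variables arrive as a Python dictionary; the field names and sentence templates are illustrative and only follow the structure described above, not the exact wording of our template.

```python
# Build a global-context prompt from clinical variables (illustrative sketch)
def build_prompt(record: dict, abeta_positive: bool) -> str:
    status = "positive" if abeta_positive else "negative"
    fmt = lambda v: "None" if v is None else str(v)  # missing values become "None"
    return (
        f"[Diagnosis / Prediction] The Aβ-PET is {status}. "
        f"[Demographics] The patient is a {fmt(record.get('age'))} years old "
        f"{fmt(record.get('gender'))} with {fmt(record.get('education'))} years of education. "
        f"[Blood-based biomarkers] Blood biomarker information is as follows: "
        f"the concentration of Aβ40 is {fmt(record.get('abeta40'))} pg/mL, "
        f"Aβ42 is {fmt(record.get('abeta42'))} pg/mL, "
        f"p-tau181 is {fmt(record.get('ptau181'))} pg/mL, "
        f"NFL is {fmt(record.get('nfl'))} pg/mL. "
        f"[Neuropsychological assessments] Cognitive function assessment scale "
        f"scores are as follows: MMSE is {fmt(record.get('mmse'))}, "
        f"MoCA-B is {fmt(record.get('moca_b'))}."
    )

# Example usage with hypothetical values
prompt = build_prompt({"age": 58, "gender": "female", "education": 12,
                       "abeta40": 227.1, "abeta42": 3.18, "ptau181": None,
                       "nfl": 14.2, "mmse": 14, "moca_b": 9}, abeta_positive=True)
```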
During training, the summary sentence is derived from the gold-standard
Aβ-PET diagnosis obtained through expert assessment. In the inference
stage, where real PET images and expert evaluations are unavailable, we
instead derive a coarse prediction for constructing this summary using a
prediction model based on clinical variables. Specifically, a two-layer mul-
tilayer perceptron (MLP) is trained to predict Aβ-PET positivity based on
11 clinical variables, excluding gender and 6 NAs that exhibit substantial
missingness. The model’s binary prediction is then used to construct the
summary sentence. Notably, this predicted summary serves only as guidance
for the LLM and does not explicitly determine the final characteristics of the
synthetic PET image. As demonstrated in Section 3.4 and Figure S1, the
generated PET images remain capable of capturing realistic Aβ pathology
patterns even when the predicted summaries contain mistakes.
For the feature extraction, we utilize a pretrained BioMedBERT model
[48] that specializes in biomedical natural language processing. Trained on a
diverse range of biomedical texts (e.g., PubMed), BioMedBERT has a broad
knowledge base to encode clinical text information. The model processes
text data by first tokenizing it into subword units. These tokens
are then analysed through multiple transformer layers, which capture the
context and semantic meaning of the text inputs. This critical design results
in the generation of a knowledge-enhanced feature representation of the input
clinical texts.
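The following sketch illustrates how such a prompt can be encoded with a pretrained biomedical BERT via Hugging Face Transformers; the checkpoint name and the use of the [CLS] embedding as the pooled text feature are assumptions for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed PubMedBERT/BioMedBERT checkpoint name on the Hugging Face hub
ckpt = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
encoder = AutoModel.from_pretrained(ckpt).eval()

prompt = "The Aβ-PET is positive. The patient is a 58 years old female ..."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
text_feature = hidden[:, 0]  # [CLS] token embedding used as the text feature
```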
[Figure 2 near here: (A) Generator; (B) Discriminator; (C) Fusion Module.]

Figure 2: Detailed architecture of the proposed framework. (A) Generator: The generator employs a three-layer U-Net architecture. The encoder includes three downsampling blocks with double convolutional layers and max pooling. The encoded T1 features are then fused with text features. The decoder features three upsampling blocks with trilinear interpolation and double convolutional layers, incorporating skip connections to retain feature information. Finally, the decoder output is processed through a convolutional layer to synthesize the PET image. (B) Discriminator: The discriminator begins by concatenating the T1 and PET images, followed by feature extraction using four convolutional layers. These image features are integrated with text features. The combined features are subsequently processed through a convolutional layer and a fully connected layer to produce the authenticity judgment for each PET image. (C) Fusion Module: In the generator fusion module, text features generate scale and shift parameters through non-linear multilayer perceptrons (MLPs), which adjust image features on a channel-wise basis. In the discriminator fusion module, text features are first reduced in dimensionality through a linear layer, then expanded by repeating the channels to fill the remaining dimensions, and finally concatenated with the image features.
2.2.2. Generator and discriminator
In Figure 2, the generator synthesizes PET images from T1 images and
encoded text features, while the discriminator learns to distinguish between
real and synthetic PET images. Through iterative optimization of both com-
ponents, we find that the model can progressively improve the quality of the
synthetic PET images.
The generator is built using a U-Net architecture, a type of convolutional
neural network (CNN) [49]. U-Net features a contracting path (encoder) that
captures context through downsampling and feature compression, and a sym-
metric expanding path (decoder) that enables precise localization through
upsampling and feature decompression. Both the encoder and decoder use
convolutional layers, with skip connections established between correspond-
ing spatial resolutions. These skip connections transmit features directly to
the decoder, preserving finer details in the synthetic PET images. Addition-
ally, the bottleneck layer of the U-Net includes a generator fusion module to
integrate T1 features with text features.
The discriminator is a CNN model designed to classify PET images as
either synthetic or real. We extend the concept of paired adversarial
loss [50] to an implementation with three modalities. Along with using synthetic
or real PET images as inputs, the discriminator is provided with correspond-
ing T1 images and text features as additional reference information. This
design enables the discriminator to reference all available modalities when
determining whether a PET image is real or synthetic. Specifically, the dis-
criminator employs four convolutional layers to extract paired features from
T1 and PET images, followed by merging these image features with text
features in the discriminator fusion module.
The loss function comprises three primary components: the generator
loss, the discriminator loss, and the Mean Squared Error (MSE) loss. The
generator loss ($L_G$) is used during the training of the generator and is defined as
the negative expected value of the log-probability that the discriminator assigns
to synthetic PET images (where a probability closer to 1 indicates a more
realistic image):
$$L_G(G, D) = -\mathbb{E}[\log(D(G(x_1, x_2), x_1, x_2))]. \tag{1}$$
Here, $\mathbb{E}[\cdot]$ denotes the expectation (averaging) operator, $G(x_1, x_2)$ denotes the PET image synthesized by the generator $G$ from the T1 image $x_1$ and the text feature $x_2$, and $D(G(x_1, x_2), x_1, x_2)$ represents the probability assigned by the discriminator $D$ to the synthetic PET being real. Minimizing $L_G$ therefore means maximizing this probability for all cases. After training, the generator is able to synthesize PET images that can deceive the discriminator.
During the training of the generator, the MSE loss ($L_{MSE}$) is also used. The MSE loss focuses on the voxel-level similarity between the synthetic and real PET images, complementing the generator loss by addressing details that may be overlooked. $L_{MSE}$ calculates the expected value of the squared difference between corresponding voxels in the synthetic and real PET images:
$$L_{MSE}(G) = \mathbb{E}[(y - G(x_1, x_2))^2], \tag{2}$$
where $y$ denotes the real PET image.
The total generator loss combines $L_G$ and $L_{MSE}$, weighted by a parameter $\lambda$:
$$L_{G_{total}}(G) = L_G(G, D) + \lambda L_{MSE}(G). \tag{3}$$
During the training of the discriminator, the discriminator loss ($L_D$) is utilized. The discriminator's role is to distinguish between real and synthetic PET images. To achieve this, the discriminator loss is defined to maximize the probability that the discriminator assigns to real PET images (maximizing $D(y, x_1, x_2)$) and to minimize the probability assigned to synthetic PET images (maximizing $1 - D(G(x_1, x_2), x_1, x_2)$). Consequently, the discriminator loss is expressed as
$$L_D(G, D) = -\mathbb{E}[\log D(y, x_1, x_2)] - \mathbb{E}[\log(1 - D(G(x_1, x_2), x_1, x_2))]. \tag{4}$$
After training, the discriminator is able to distinguish between real and
synthetic PET images, engaging in adversarial training with the generator.
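A minimal PyTorch sketch of Equations (1)-(4) is given below, assuming the discriminator outputs a probability in (0, 1) and that the generator and discriminator follow the signatures used in the text.

```python
import torch

def generator_loss(D, fake_pet, t1, text, real_pet, lam=10.0):
    eps = 1e-8
    l_g = -torch.log(D(fake_pet, t1, text) + eps).mean()   # Eq. (1)
    l_mse = ((real_pet - fake_pet) ** 2).mean()            # Eq. (2)
    return l_g + lam * l_mse                               # Eq. (3)

def discriminator_loss(D, real_pet, fake_pet, t1, text):
    eps = 1e-8
    l_real = -torch.log(D(real_pet, t1, text) + eps).mean()
    l_fake = -torch.log(1.0 - D(fake_pet.detach(), t1, text) + eps).mean()
    return l_real + l_fake                                 # Eq. (4)
```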
2.2.3. Fusion modules
In Figure 2, we design two fusion modules to integrate text and image
features for the generator and discriminator respectively. The discriminator
fusion module employs concatenation, as in previous generative models [51, 52].
This module applies a linear layer to adjust the dimensions of the text fea-
tures from 756 to 32. These adjusted text features are then repeated to match
the dimensions of the image features. Finally, the text and image features
are concatenated along the channel dimension. Incorporating text features
into the discriminator allows it to utilize these features when assessing the
authenticity of PET images, thereby directing the generator to produce PET
images that more accurately reflect the text information.
In the generator, we aim to use T1 features as the primary component,
with text features serving to modulate them. Consequently, we designed the
generator fusion module based on the idea of feature-wise linear modulation
[53], rather than employing concatenation. This module calculates a linear
modulation based on the text feature to manipulate the entire image feature,
resulting in a fused feature representation. Specifically, we employ two layers
of MLPs to estimate scaling parameters (γ) and shifting parameters (β),
which match the channel dimensions of the image features. These parameters
are used to scale and shift the image features in each channel, facilitating the
modulation of image features by the text features.
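A sketch of the generator fusion module under this feature-wise linear modulation design is shown below; the layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GeneratorFusion(nn.Module):
    def __init__(self, text_dim: int, img_channels: int):
        super().__init__()
        # Two-layer MLPs predict channel-wise scale (gamma) and shift (beta)
        self.to_gamma = nn.Sequential(nn.Linear(text_dim, img_channels),
                                      nn.ReLU(),
                                      nn.Linear(img_channels, img_channels))
        self.to_beta = nn.Sequential(nn.Linear(text_dim, img_channels),
                                     nn.ReLU(),
                                     nn.Linear(img_channels, img_channels))

    def forward(self, img_feat: torch.Tensor, text_feat: torch.Tensor):
        # img_feat: (B, C, D, H, W); text_feat: (B, text_dim)
        gamma = self.to_gamma(text_feat)[:, :, None, None, None]
        beta = self.to_beta(text_feat)[:, :, None, None, None]
        return gamma * img_feat + beta  # channel-wise modulation of T1 features
```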
2.3. Evaluation of synthetic PET images
The PET images synthesized by our model are systematically evaluated
from three key perspectives: image quality, diagnostic feature consistency,
and clinical applicability. First, image quality is assessed based on the voxel-
level and region-level visual resemblance of synthetic PET images to real
PET images. Next, diagnostic feature consistency is examined to determine
whether the diagnostic information in the synthetic PET images aligns with
that in real PET images. Finally, clinical applicability is assessed to evaluate
how synthetic PET images can enhance AD assessment in terms of diagnostic
performance.
2.3.1. Evaluation of image quality
The quality of synthetic PET images is evaluated from two complemen-
tary perspectives. On the one hand, we consider global image quality metrics
that are widely used in computer vision. These voxel-level measures focus
on the overall similarity between synthetic and real images. Specifically, we
calculate the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio
(PSNR), and Mean Squared Error (MSE) [54]. MSE quantifies the aver-
age voxel-wise discrepancy between synthetic and real PET images, PSNR
represents the ratio between the meaningful signal and the error, and SSIM
jointly assesses similarity in terms of luminance, contrast, and structure.
While these global metrics are important for benchmarking generative mod-
els, they do not fully capture the clinical utility of PET synthesis.
On the other hand, because PET interpretation is usually performed at
the brain region level, we further design region-based evaluation metrics tai-
lored to this application scenario. For each subject, SUVRs are first averaged
within anatomically defined regions, including 116 grey-matter regions from
the AAL-116 atlas and four additional white-matter regions segmented using
the SynthSeg toolbox [55]. Based on these regional SUVRs, we adopt two
complementary measures. First, to assess consistency across regions within
each subject, we compute the Pearson correlation coefficient (Pearson’s R)
between synthetic and real PET regional SUVRs. This relationship is also
visualized using scatter plots with fitted regression lines, providing an intu-
itive view of inter-region agreement. Second, to examine the distribution of
synthesis errors across the brain, we calculate the absolute difference between
synthetic and real PET SUVRs for each region and then average these er-
rors across all subjects. This region-level error profile highlights anatomical
areas where synthesis is more or less accurate and enables direct comparison
between different methods.
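The region-based metrics can be sketched as follows, assuming SUVR volumes and an integer-labeled atlas on the same voxel grid; the label list is dataset-specific.

```python
import numpy as np
from scipy.stats import pearsonr

def regional_means(suvr: np.ndarray, atlas: np.ndarray, labels) -> np.ndarray:
    # Average SUVR within each anatomically defined region
    return np.array([suvr[atlas == lab].mean() for lab in labels])

def region_metrics(real: np.ndarray, synth: np.ndarray, atlas: np.ndarray, labels):
    r_real = regional_means(real, atlas, labels)
    r_synth = regional_means(synth, atlas, labels)
    r, _ = pearsonr(r_real, r_synth)        # inter-region agreement (Pearson's R)
    abs_err = np.abs(r_real - r_synth)      # per-region absolute error profile
    return r, abs_err
```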
In the results, our method is labeled as "T1+BBMs-LLM", which synthe-
sizes PET images using T1-weighted images and BBMs encoded by LLM. We
compare the proposed method to two representative baselines: (i) a classic
image-to-image generation model using only T1 as input, labeled as "T1-
only", and (ii) a model that incorporates BBMs as numerical values, rather
than textual data, labeled "T1+BBMs-Num". In the "T1+BBMs-Num"
method, BBMs are encoded into normalized numerical values, which are
then directly used as features to assist in PET image synthesis.
2.3.2. Evaluation of diagnostic consistency
To determine whether synthetic PET images preserve diagnostically mean-
ingful features, we designed a two-part evaluation framework combining ex-
pert physician review and a scalable model-based assessment. This frame-
work aims to assess whether Aβ positivity can be reliably evaluated from
synthetic PET images using the same clinical criteria applied to real PET
images. A schematic summary of the complete diagnostic consistency eval-
uation workflow is shown in Figure 3. The evaluation includes (1) a physician-
based assessment focusing on a representative subset and (2) a model-based
assessment applied to the full dataset. Details of each component are de-
scribed below.
Expert interpretation of PET imaging requires substantial clinical ex-
perience and time; therefore, this evaluation was performed on a carefully
selected subset of 50 cases. To ensure representativeness, the subset was
chosen to match the MSE distribution of the full dataset (see Supplemen-
tary Table S1), avoiding bias toward atypical cases. For each selected case,
two senior physicians independently assessed Aβ positivity based solely on
the synthetic PET images generated by our method, blinded to all clinical
information and the reference label. Any disagreement was adjudicated by
a third senior physician. The adjudicated diagnosis from synthetic PET was
then compared against a fixed reference label established previously from the
real PET together with relevant clinical information under routine clinical
procedures. This double-reading with arbitration mirrors standard clinical
practice and provides an unbiased human benchmark for diagnostic consistency.
To scale the evaluation beyond the physician-reviewed subset and to en-
able direct comparison across different synthesis methods, we complemented
the expert assessment with a model-based evaluation. A 3D-ResNet [56]
classifier was trained on real PET images and their clinical labels to learn di-
agnostic patterns consistent with physician decision-making. After validation
on real PET data, the trained model was applied to synthetic PET images
generated by all three synthesis pipelines, including our proposed method,
the “T1-only” method, and the “T1+BBMs-Num” method. By comparing
the model's predictions on real PET ($P_{real}$) with its predictions on synthetic
PET produced by each method ($P_{syn}$), we assess not only whether diagnostic
features are preserved within our method but also how its diagnostic consis-
tency compares with alternative synthesis strategies. Although the results
of this model-based evaluation are influenced by the classifier’s own perfor-
mance, the model demonstrated sufficiently stable accuracy on real PET and
therefore provides a reliable auxiliary tool for large-scale consistency assess-
ment and cross-method comparison, without substituting clinical judgment.
Diagnostic consistency between real and synthetic PET–based assess-
ments was quantified using Cohen’s Kappa [57]. Kappa values between 0
and 0.2 indicate poor consistency, 0.2 to 0.6 indicate moderate consistency,
and values ≥ 0.6 indicate good consistency. Because Aβ positivity prediction
is a binary classification task, we additionally report accuracy, sensitivity,
specificity, and F1-score.
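A sketch of these consistency metrics using scikit-learn (assumed available, as in our statistical analysis environment) is given below.

```python
from sklearn.metrics import (cohen_kappa_score, accuracy_score,
                             recall_score, f1_score)

def consistency_report(labels_real, labels_synth):
    # Binary Aβ-positivity calls from real vs synthetic PET (1 = positive)
    return {
        "kappa": cohen_kappa_score(labels_real, labels_synth),
        "accuracy": accuracy_score(labels_real, labels_synth),
        "sensitivity": recall_score(labels_real, labels_synth, pos_label=1),
        "specificity": recall_score(labels_real, labels_synth, pos_label=0),
        "f1": f1_score(labels_real, labels_synth),
    }
```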
[Figure 3 near here: physician-based evaluation (subset of 50 cases, physicians A/B with adjudication by physician C) and model-based evaluation (3D-ResNet trained on real PET, comparing P_real versus P_syn over the full dataset for all three synthesis methods).]

Figure 3: Overview of the diagnostic consistency evaluation framework. The physician-based evaluation uses a representative subset and a multi-reader arbitration workflow, while the model-based evaluation scales the assessment to the full dataset using a diagnostic model trained on real PET images.
2.3.3. Evaluation of clinical applicability of the fully AI-driven pipeline
To further explore and evaluate the potential of synthetic PET images,
we develop an AI diagnostic model based on synthetic images and assess the
model's performance in terms of clinical applicability.
In the absence of real PET images, we propose two models to evaluate the
utility and scalability of synthetic PET images. The first model relies solely
on synthetic PET images, while the second model integrates synthetic PET
images with all available clinical assessments to enhance diagnostic perfor-
mance. The first model is trained based on the 3D-ResNet architecture. The
second model employs the well-trained first model for feature extraction from
synthetic PET images and the established MLP model (same as described
in the LLM-based text encoder section) for feature extraction from clinical
assessments. These features are then fused at a logits layer.
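A minimal sketch of this logits-level fusion is shown below; the equal averaging weight is an assumption, as the exact fusion rule is not detailed here.

```python
import torch

def fused_prediction(resnet3d, mlp, synthetic_pet, clinical_vars, alpha=0.5):
    # Both branches act as frozen score extractors at this stage
    with torch.no_grad():
        pet_logits = resnet3d(synthetic_pet)  # (B, 2) from the trained 3D-ResNet
        bbm_logits = mlp(clinical_vars)       # (B, 2) from the clinical MLP
    fused = alpha * pet_logits + (1.0 - alpha) * bbm_logits
    return torch.softmax(fused, dim=1)[:, 1]  # probability of Aβ positivity
```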
We compare these two proposed models with models based on (i) T1,
(ii) BBMs, and (iii) a combination of T1 and BBMs. This comparison aims
to determine whether synthetic PET images offer advantages in predicting
Aβ positivity and whether they can be further optimized by incorporating
additional relevant variables. The evaluation metrics include AUC, accuracy,
sensitivity, specificity, and F1 score.
2.4. Implementation
For the generative model, the loss function weightλis set to 10 to balance
the losses. The Adam optimizer is used with an initial learning rate of 5.0e-4
for the generator and 1.0e-4 for the discriminator. The weight decay is set to
1.0e-4, and the parameters for the first and second moment estimates of the
gradients, $\beta_1$ and $\beta_2$, are set to 0.5 and 0.999, respectively. The learning rate
decay strategy employs cosine annealing with a period of 100 epochs and a
minimum learning rate of 1.0e-7. Training is conducted for 100 epochs with
a batch size of 8.
For the classification model, the initial learning rate is set to 1.0e-4,
with the same optimization strategy and number of epochs as the generative
model, but with a batch size of 16. All models are initialized using Gaussian
initialization (mean = 0, standard deviation = 0.02). Training is imple-
mented using the PyTorch framework and performed on a single NVIDIA
GeForce RTX 3090 Ti GPU.
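The optimization setup described above can be sketched in PyTorch as follows; the network modules are placeholders standing in for the U-Net generator and CNN discriminator of Figure 2.

```python
import torch
import torch.nn as nn

generator = nn.Conv3d(1, 1, 3, padding=1)      # placeholder for the U-Net generator
discriminator = nn.Conv3d(2, 1, 3, padding=1)  # placeholder for the discriminator

opt_g = torch.optim.Adam(generator.parameters(), lr=5.0e-4,
                         betas=(0.5, 0.999), weight_decay=1.0e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1.0e-4,
                         betas=(0.5, 0.999), weight_decay=1.0e-4)
# Cosine annealing over 100 epochs down to a minimum learning rate of 1e-7
sched_g = torch.optim.lr_scheduler.CosineAnnealingLR(opt_g, T_max=100, eta_min=1.0e-7)
sched_d = torch.optim.lr_scheduler.CosineAnnealingLR(opt_d, T_max=100, eta_min=1.0e-7)
```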
2.5. Statistical analysis
Due to the non-normal distribution of the data, differences in gender are
tested using Fisher’s exact test, while differences in age, education years,
BBMs, and NAs are tested using the two-sided Mann–Whitney U test. For
model validation, the mean and standard deviation (STD) of performance
metrics from cross-validation are computed and reported. Performance met-
ric comparisons between different methods are conducted using one-sided
paired t-tests. All statistical analyses are performed using Python 3.10,
scikit-learn 1.5.0, and SciPy 1.13.1.
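For reference, a sketch of these tests with SciPy is given below; the 2x2 gender table follows Table 1 (CU vs. MCI), while the continuous samples and cross-validation scores are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact, mannwhitneyu, ttest_rel

# Fisher's exact test on gender counts: rows = [female, male], cols = [CU, MCI]
_, p_gender = fisher_exact([[218, 83], [124, 51]])

# Two-sided Mann-Whitney U test for a continuous variable (hypothetical samples)
g1, g2 = np.random.rand(50), np.random.rand(50)
_, p_cont = mannwhitneyu(g1, g2, alternative="two-sided")

# One-sided paired t-test: does method A score higher than method B across folds?
a, b = np.array([0.79, 0.80, 0.78]), np.array([0.74, 0.73, 0.75])
_, p_paired = ttest_rel(a, b, alternative="greater")
```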
3. Results
3.1. Participant characteristics
This study includes 566 participants in three groups: cognitively unim-
paired (CU, n = 342), mild cognitive impairment (MCI, n = 134), and AD
dementia (ADD, n = 90). The characteristics of these participants are de-
picted in Table 1. The age and sex distributions are similar across the groups.
The median education level is 12.0 years in the CU group, significantly
higher than that in the MCI (11.0 years) and ADD (9.0 years) groups. Further-
more, the education level in the MCI group is significantly higher than that
in the ADD group. Regarding plasma biomarkers, the ADD group exhibited
significantly higher levels of p-tau181, NFL, and P-Tau181/Aβ42 ratio, as
well as markedly lower levels of Aβ42 and Aβ42/40 ratio compared to the
CU and MCI groups. The MCI group also shows significant alterations in p-
tau181 and NFL biomarker levels compared to the CU group, although to a
lesser degree than the ADD group. Cognitive and functional assessments, in-
cluding MMSE and MoCA-B for global cognition and the sub-domain NAs
(AVLT-LDR, AVLT-R, AFT, and BNT), demonstrate a clear decline from
CU to MCI and ADD. Conversely, STT-A and STT-B exhibit a significant
increasing trend across all three groups.
3.2. Evaluation of image quality
We evaluate the quality of the synthetic Aβ-PET images by assessing
both the overall image fidelity and accuracy in different brain regions. Our
method (T1+BBMs-LLM) demonstrates superior performance compared to
other methods in these aspects.
Table 1: Participant characteristics
CU MCI ADD
Sample Size 342 134 90
Age (median (IQR)) 65.0 (10.0) 66.0 (10.0) 66.0 (11.0)
Females (No. (%)) 218 (63.7) 83 (61.9) 52 (57.8)
Education (median (IQR)) 12.0 (5.0) b*** 11.0 (4.0) a*** 9.0 (6.0) c***
Aβ40 (pg/mL) (median (IQR)) 196.43 (65.3) 193.05 (81.9) 197.40 (59.6)
Aβ42 (pg/mL) (median (IQR)) 10.2 (4.13) b** 10.3 (4.00) 9.02 (4.26) c**
Aβ42/40 ratio (median (IQR)) 0.05 (0.02) b*** 0.05 (0.01) 0.04 (0.01) c***
T-Tau (pg/mL) (median (IQR)) 2.20 (1.19) 2.41 (1.28) 2.18 (1.55)
P-Tau 181 (pg/mL) (median (IQR)) 1.75 (1.06) b*** 1.97 (1.56) a* 3.36 (2.74) c***
NFL (pg/mL) (median (IQR)) 12.6 (8.25) b*** 14.8 (9.16) a*** 19.6 (9.18) c***
P-Tau 181/Aβ42 ratio (median (IQR)) 0.18 (0.12) b*** 0.19 (0.17) 0.4 (0.37) c***
MMSE (median (IQR)) 28.0 (2.0) b*** 27.0 (2.0) a*** 18.0 (6.0) c***
MoCA-B (median (IQR)) 26.0 (4.0) b*** 22.0 (4.0) a*** 14.0 (6.0) c***
AVLT-LDR (median (IQR)) 5.0 (4.0) b*** 2.0 (3.0) a*** 0.0 (1.0) c***
AVLT-R (median (IQR)) 22.0 (2.0) b*** 18.0 (3.0) a*** 17.0 (4.5) c**
AFT (median (IQR)) 16.0 (5.0) b*** 13.0 (4.0) a*** 9.0 (5.0) c***
BNT (median (IQR)) 25.0 (5.0) b*** 21.0 (5.0) a*** 19.0 (7.0) c***
STT-A (median (IQR)) 44.0 (16.0) b*** 51.0 (19.0) a*** 75.0 (37.8) c***
STT-B (median (IQR)) 117.0 (45.0) b*** 147.0 (53.2) a*** 190.0 (84.0) c***
Aβ-PET positivity (No. (%)) 110 (32.2) b*** 54 (40.3) 72 (80.0) c***
Note:Data are presented as median (M) and interquartile range (IQR) or par-
ticipant number (n) and percentage (%). Data were compared using a two-tailed
Mann–Whitney U test or Fisher’s exact test.
Abbreviations:Aβ, amyloid-β; NFL, neurofilament light chain; IQR, interquartile
range; MMSE, Mini-Mental State Examination; MoCA-B, Montreal Cognitive Assess-
ment–Basic Version; AVLT, Auditory Verbal Learning Test; AVLT-LDR, delayed free
recall of the Auditory Verbal Learning Test; BNT, Boston Naming Test; AFT, Animal
Fluency Test; STT, Shape Trails Test.
Significance:(a) Significant difference between CU and MCI; (b) between CU and
ADD; (c) between MCI and ADD. *, **, and *** indicate significant differences be-
tween two groups after Bonferroni correction, with p < 0.05, p < 0.01, and p < 0.001,
respectively.
As shown in Figure 4A, our method achieves the lowest MSE of 0.00235±0.00021
and the highest PSNR of 27.24±0.26 dB, indicating that the synthesized PET
images have minimal voxel-level discrepancies from real PET images. In ad-
dition, an SSIM of 0.9202±0.0033 further confirms that our method preserves
overall structural similarity. The violin plots also show that the distribution
of evaluation results is more favorable for our method. Statistical tests sup-
portthesefindings, withourmethodshowingsignificantimprovementsacross
all metrics (MSE: P = 1.85e-14 for T1-only and P = 4.47e-6 for T1+BBMs-
Num; PSNR: P = 2.46e-37 for T1-only and P = 2.90e-11 for T1+BBMs-
Num; SSIM: P = 7.92e-67 for T1-only and P = 0.0067 for T1+BBMs-Num).
Notably, the T1+BBMs-Num method shows improvement over the T1-only
method, underscoring the usefulness of incorporating BBMs in synthesiz-
ing PET images. However, this approach is limited by numerical encoding.
Our method overcomes this limitation by introducing LLM-based encoding,
thereby better leveraging BBMs.
Qualitative comparisons are presented in Figure 4B. Each case includes
T1 images, real PET images, synthetic PET images generated by three meth-
ods, and corresponding absolute error maps. These cases feature coronal,
sagittal, and transverse slices for a comprehensive illustration. PET images
synthesized from T1-only inputs mainly preserve structural information but
fail to capture pathological uptake patterns, leading to larger errors. The
T1+BBMs-Num method improves the representation of Aβ deposition but
still shows noticeable deficiencies in regions such as the frontal, occipital, and
temporal lobes (highlighted with boxes and circles). In contrast, our method
produces synthetic PET images with spatial patterns closely resembling real
PET, reducing both local discrepancies and global error.
To further evaluate region-specific accuracy, we compared synthetic and
real PET SUVRs at the level of anatomically defined regions. Our method
achieves the strongest correlation, with a Pearson's R of 0.9545±0.0065, demon-
strating robust agreement in regional uptake patterns (Figure 5A). Compar-
ison of individual R values among the three methods suggests a significant improvement
of our method (Pearson’s R: P = 2.42e-42 for T1-only and P = 5.20e-18
for T1+BBMs-Num). Figure 5B shows the distribution of regional absolute
errors across 116 grey-matter and 4 white-matter regions, averaged over all
subjects. Our method consistently yields lower errors than the baselines,
particularly in regions such as the frontal cortex, temporal lobe, and pre-
cuneus, areas known to be clinically relevant for Aβ deposition. Notably, the
bilateral middle cingulum (Cingulum_Mid_L and Cingulum_Mid_R), cerebellar
vermis (Vermis_3 and Vermis_4_5), and bilateral thalamus exhibit markedly
reduced errors, while consistent improvements are also observed in subcortical
structures such as the caudate, putamen, pallidum, and insula bilaterally.
These results confirm that our method not only improves global similarity but
also more faithfully preserves the spatial distribution of pathological features
across the brain.
[Figure 4 near here: panels A and B.]

Figure 4: Evaluation of image quality of the synthetic PET images. (A) Violin plots comparing the distributions of image quality metrics for individual predictions across methods. **: significance at P < 0.01; ***: P < 0.001. (B) Visual comparison investigating methodological differences through case studies. From left to right: T1, synthetic PET images from the three methods, and real PET images. PET images are displayed using range-normalized SUVRs with pseudocolors for clarity. Areas with obvious differences are highlighted with annotations (boxes or circles). Below each synthetic PET image, corresponding error maps indicate discrepancies measured by absolute differences.
[Figure 5 near here: panels A and B.]

Figure 5: Region-based evaluation metrics for different methods. (A) Scatter plots between ground-truth and predicted regional SUVRs for all subjects, with inter-region correlations, and (B) absolute errors in different brain regions. The brain is divided into 116 grey-matter regions based on the AAL-116 atlas and 4 white-matter regions. The correlation and absolute error between the mean SUVRs of the synthetic and real PET images are shown for each region. In B, darker colors represent larger errors; each row corresponds to a method, with the last row representing our method.
3.3. Evaluation of diagnostic consistency
To examine whether synthetic PET images retain diagnostically mean-
ful features beyond visual similarity, we conducted a physician-based evalua-
tion. As shown in Figure 6A–B, the synthetic PET images achieved an accu-
racy of 0.80, an F1 score of 0.75, and a Cohen’s Kappa of 0.60. These values
indicate good agreement with diagnoses based on real PET images, suggest-
ing that the synthetic images preserve critical pathological features relevant
for clinical interpretation. Notably, diagnostic consistency was higher for Aβ-
negative cases, with relatively lower performance observed for Aβ-positive
cases. This discrepancy likely stems from the greater challenge faced by
generative models in accurately capturing and reproducing the subtle patho-
logical signals characteristic of Aβ-positive subjects.
For the model-based evaluation, the judge model trained on real PET
images demonstrates strong performance (accuracy of 0.8357 and an AUC
of 0.8768). We see that the judge model captures key diagnostic features in
real PET images and is thus well suited for evaluating the diagnostic consis-
tency. Although this result may be influenced by the imperfect accuracy of
the judge model, it allows for a comparative analysis, with our approach evidently
outperforming the other two baselines.
Figures 6C-E further illustrate the diagnostic results inferred by the judge
model. We recognize that the T1-only method performs poorly across all
metrics, with an F1 score and sensitivity below 0.7, and T1+BBMs-Num
performs slightly better (merely above 0.7). These results indicate that the
PET images synthesized by these methods differ markedly from real PET
images in terms of the diagnostic features. In contrast, our method achieves
the leading values in all metrics, reflecting its strong diagnostic
consistency with real PET. For instance, our method outperforms the two
compared methods in accuracy (P = 0.0015 for T1-only and P = 0.0060
for T1+BBMs-Num) and AUC (P = 0.0021 for T1-only and P = 0.0236
for T1+BBMs-Num). Considering the class imbalance, the F1 score is an-
other important evaluation metric. Our method also shows an improvement
over the competing methods (P = 0.0135 for T1-only and P = 0.0532 for
T1+BBMs-Num). The agreement matrices for these results are shown in
Figure 6E. Our method achieves the highest Cohen's Kappa among all com-
peting methods. Note that the Cohen's Kappa in this analysis is slightly
lower than that from the physician-based evaluation, which is attributable
to the imperfect diagnostic ability of the judge model. Overall, though partially
biased, the model-based evaluation offers reasonable evidence supporting the
advantage of our method in terms of consistent diagnostic feature generation,
which is in line with the image quality metrics analysis.
[Figure 6 near here. Agreement matrices with Cohen's Kappa values: physician-based evaluation of our method, 0.600; model-based evaluation, 0.2618 for T1-only, 0.4126 for T1+BBMs-Num, and 0.4952 for T1+BBMs-LLM.]

Figure 6: Evaluation of diagnostic consistency of the synthetic PET images. (A-B) Results of the physician-based evaluation; only our method is included in this analysis. (A) Inter-rater agreement matrix between real PET and the corresponding synthetic PET from our method, evaluated by physicians. (B) Bar charts of performance metrics of our synthetic PET in the physician-based evaluation. (C-E) The performance of PET synthesized by the three methods, evaluated by a model trained on real PET. (C) Inter-rater agreement matrices comparing diagnoses from real and synthetic PET, with Cohen's Kappa in the central box. (D) Bar charts showing the F1 score, accuracy, and sensitivity of the three methods, represented by their means with error bars indicating standard deviations. (E) Receiver operating characteristic (ROC) curves and the corresponding AUCs. The performance of real PET is shown with a grey border as a reference. *: significance at P < 0.05; **: P < 0.01; ◦: marginal significance (P = 0.0532).
3.4. Evaluation of clinical applicability of the fully AI-driven pipeline
We utilize the synthetic PET images to train Aβ pathology classification
models, establishing a complete AI pipeline to further facilitate the clinical
applicability of PET generation. We use T1 images and BBMs to diagnose
Aβ positivity as comparison baselines. As shown in Figure 7, T1 achieves a
classification AUC of only 0.64, with other metrics also performing poorly.
BBMs yields slightly better results but with a higher rate of false negatives.
Combining T1 images and BBMs using a multimodal diagnostic model im-
proves all metrics, yet this approach still falls short compared to our syn-
thetic PET images. This finding indicates that integrating T1 and BBMs
through a PET generation model offers a more cohesive fusion for AD diag-
nosis compared to traditional multimodal diagnostic models. Furthermore,
fusing synthetic PET with BBMs yields superior results that outperform
single-modal approaches and their simple fusion (T1, BBMs, and T1+BBMs).
Also, the model integrating synthetic PET and BBMs demonstrates a bal-
anced sensitivity and specificity. These findings underscore the potential of
synthetic PET from our proposed method in AD diagnostics, particularly in
enhancing pathological diagnosis accuracy by integrating diverse sources of
clinical information.
[Figure 7 near here: panels A and B.]

Figure 7: Evaluation results on clinical applicability of the synthetic PET images. (A) Bar charts of F1 score, accuracy, and sensitivity, with error bars indicating standard deviations. (B) ROC curves with the corresponding AUCs. For both plots, significant differences are annotated for the fusion of synthetic PET and BBMs compared to other methods. *: significance at P < 0.05; **: P < 0.01; ***: P < 0.001.
3.5. The effectiveness of prompt engineering
Prompt engineering for utilizing multimodal clinical data (e.g.,
demographics, BBMs, and NAs) is a core contribution of our study. We
demonstrate the effectiveness of the proposed prompt design using a com-
parison with three prompt variants. In Figure 8A and B, we observe a
remarkable performance decrease when placing the summary sentence (the
diagnosis/prediction prompt in Figure 1A) at the end of the prompt. Also,
excluding the diagnostic sentence degrades all metrics. However, using the
summary sentence alone does not yield leading performance either, which sug-
gests that the detailed descriptions of the clinical variables are also informative. Over-
all, we recognize that starting the prompt with the summary sentence can
efficiently guide the LLM to encode clinical variables and enhance the quality
of synthetic PET images.
[Figure 8 near here: panels A and B comparing Summary-first (proposed), Summary-last, Summary-only, and Summary-excluded prompts.]

Figure 8: Ablation study on the effectiveness of prompt engineering. Evaluation of (A) PET image quality and (B) diagnostic feature consistency using our proposed prompt and three variants for comparison. The results in B are obtained using the model-based feature consistency evaluation. "Summary-first" indicates our proposed design; "Summary-last" places the summary sentence at the end; "Summary-only" uses only the summary sentence; "Summary-excluded" omits the summary sentence and preserves all other prompt content about the clinical variables.
4. Discussion
In this study, we propose a multimodal generative AI approach for synthe-
sizing PET images from structural MRI and BBMs. We develop a language-
enhanced framework and prompt engineering to optimize PET synthesis.
Our systematic validation shows that the proposed method can (i) produce
synthesized PET semantically similar to real-world PET at the individual level;
(ii) offer consistent Aβ pathological features to support clinical assessment;
and (iii) enable the development of a fully AI-driven pipeline to provide Aβ diagnosis
without requiring real-world PET scans.
Achieving a cost-effective and accurate AD diagnosis remains a signifi-
cant challenge. Routine diagnosis using Aβ-PET scans is heavily constrained
by the cost, availability, and risk of radiation exposure [12, 13]. Meanwhile,
BBM-based methods provide a more accessible alternative for predicting AD
positivity without requiring PET scans, but the lack of spatial information
in BBMs limits their ability to fully replace PET imaging [58, 59]. Our
multimodal framework integrates pathological insights from BBM features
and T1-based anatomical details to synthesize high-quality Aβ-PET images.
Our validation demonstrates that these images not only provide the spatial
information necessary for accurate AD assessments, but also enable the devel-
opment of AD diagnostic models that outperform the BBM-based methods.
Additionally, integrating synthetic Aβ-PET images with BBMs further en-
hances diagnostic accuracy, highlighting the potential of generative models
to improve AD diagnosis. Overall, our study opens avenues to synthesize
PET images from MRI data and BBMs towards predictive AD diagnosis.
This data-driven approach is extensible to large-scale early AD diagnosis by
offering a cost-effective alternative to real PET imaging.
Our approach is pioneering in integrating BBMs into a multimodal gener-
ative model for PET synthesis. We show that BBM-detected pathological
insights can enhance the synthesis of Aβ-PET spatial patterns, complementing
traditional MRI-based generative approaches. We advance the representa-
tion of BBMs by employing a language-enhanced encoder, thereby improving
the integration of non-imaging data with T1 images and enhancing PET syn-
thesis. Conventionally, BBMs and other clinical variables were encoded as
numerical formats (e.g., concentration levels of BBMs) [60, 61]. A simple
integration of high-dimensional image features (more than ten thousand dimensions)
and low-dimensional non-imaging BBM features (represented as a few num-
bers) would likely result in performance biased towards the image features
[62], which limits the contribution of non-imaging features. In contrast, our
approach employs prompt engineering to map non-imaging data into mean-
ingful contexts, guiding a medical LLM to extract knowledge-enhanced fea-
tures. To improve the context understanding, we design a "summary-first"
prompt that prioritizes the critical diagnostic information. Experimental re-
sults demonstrate that this design enhances the quality of synthetic PET
images, highlighting that a specialized prompt design can greatly contribute
to the overall performance.
A key contribution of our study is to enable a systematic process of PET
image synthesis and clinical application evaluation. Our comprehensive eval-
uation is a progressive process, encompassing image quality, diagnostic con-
sistency, and clinical applicability. By contrast, prior PET image synthesis
studies have primarily focused on metrics of visual quality (e.g., MSE, PSNR,
and SSIM) [14, 15, 16]. It becomes increasingly evident that the detailed de-
piction of anatomical structures and image-based pathological characteristics
are crucial in clinical decision making, yet beyond the scope of traditional
metrics. In our study, recognizing that PET analysis is frequently performed
at the brain-regional level, we introduce key evaluation metrics based on
regional SUVRs, including an absolute error and inter-region correlation.
These metrics ensure a clinically-meaningful structural similarity between
synthetic PET images and real PET images. Additionally, physician-based
evaluation of image-based pathological characteristics can help assess the
effectiveness of synthetic PET images in a clinical setting. However, a
physician-based evaluation over large-scale data is time-consuming and labor-
intensive. In response, we performed a physician-based evaluation on a subset
of the data and conducted a model-based evaluation on the entire dataset. For the
model evaluation, we developed an AI model that mimics human diagnostic
criteria and offers quality insights into the PET image generation. Finally,
we established a fully AI-driven pipeline, consisting of PET generation and AI
diagnosis, to further confirm the clinical applicability of the proposed
method.
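Reduced to its two stages, such a pipeline can be sketched as below; `generator` and `classifier` are assumed stand-ins for the trained networks rather than the released implementation.

```python
import torch

@torch.no_grad()
def diagnose(t1: torch.Tensor, text_feat: torch.Tensor,
             generator, classifier) -> torch.Tensor:
    """Fully automatic pipeline: synthesize Abeta-PET, then score AD probability.

    `generator` and `classifier` are assumed callables standing in for the
    study's trained networks.
    """
    synthetic_pet = generator(t1, text_feat)   # stage 1: PET synthesis
    logit = classifier(synthetic_pet)          # stage 2: AI diagnosis
    return torch.sigmoid(logit)

# Toy stand-ins so the sketch runs end to end.
generator = lambda t1, txt: t1 + txt.mean() * torch.ones_like(t1)
classifier = lambda pet: pet.mean().view(1)

t1 = torch.randn(1, 1, 64, 64, 64)             # one T1 volume (assumed shape)
text_feat = torch.randn(1, 768)                # LLM-encoded BBM feature (assumed)
print(diagnose(t1, text_feat, generator, classifier))
```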
Our study has several limitations. First, due to the difficulty in multi-
modal data acquisition, our study has not undergone a multi-center system-
atic evaluation. In addition, the range of BBM values can vary across assays
and reference protocols [63], and a quality assessment is required
to reveal potential protocol-induced BBM variances. Moreover, integrative
analysis with other promising biomarkers would be valuable for extending the gener-
alization power of our approach. For instance, P-Tau 217 has been suggested to
predict continuous brain Aβ levels in early AD subjects [58] and is recom-
mended as the only BBM for diagnosing Aβ pathology in the NIA-AA 2024
Diagnostic Guidelines [6]. Including P-Tau 217 in our framework could
further improve performance. Finally, considering the associ-
ation of Tau-PET with T1 and BBMs [64, 65], the extensible structure of our
framework could be adapted for Tau-PET synthesis to facilitate rapid AD
staging and prognosis.
In conclusion, our research presents a language-enhanced, multimodal
framework that effectively synthesizes Aβ-PET images from T1 images and
BBMs. This advancement addresses critical limitations of traditional PET
imaging, offering a cost-effective and accurate alternative that enhances
early AD assessment, diagnosis, and decision making.
Contributors
ZZ and XM are major contributors who drafted and revised the manuscript
for content and analysed the data. XM, QG, QH, and FX played a major role
in the acquisition of data. ZZ, XM, QG, SZ, QH, MZ, and ML substantially
revised the manuscript. ZZ, QH, MZ, ML, and FX conceptualized and designed
the study. ZZ, XM, QH, MZ, and ML interpreted the data.
Declaration of interests
All authors declare no potential conflicts of interest.
Acknowledgments
This study is supported in part by the National Natural Science Foun-
dation of China (82402394, 82201583, 8217052097, 82071962), Shanghai Pu-
jiang Program (23PJ1430200), STI2030-Major Projects (022ZD0213800),
and Shanghai Artificial Intelligence Laboratory.
Data Sharing Statement
Due to patient privacy protection, institutional regulations, and restric-
tions imposed by the ethical approvals governing this study, the raw PET,
MRI, and clinical data cannot be made publicly available. De-identified
derived data supporting the findings of this study can be provided upon rea-
sonable request to the corresponding author and will require approval from
the institutional review boards of Huashan Hospital and the Sixth People’s
Hospital affiliated with Shanghai Jiao Tong University.
References
[1] P. Scheltens, B. De Strooper, M. Kivipelto, H. Holstege, G. Chételat,
C. E. Teunissen, J. Cummings, W. M. van der Flier, Alzheimer's disease,
The Lancet 397 (10284) (2021) 1577–1590.
[2] C. L. Masters, R. Bateman, K. Blennow, C. C. Rowe, R. A. Sperling,
J. L. Cummings, Alzheimer’s disease, Nature reviews disease primers
1 (1) (2015) 1–18.
[3] D. P. Veitch, M. W. Weiner, P. S. Aisen, L. A. Beckett, N. J. Cairns,
R. C. Green, D. Harvey, C. R. Jack Jr, W. Jagust, J. C. Morris, et al.,
Understanding disease progression and improving alzheimer’s disease
clinical trials: Recent highlights from the alzheimer’s disease neuroimag-
ing initiative, Alzheimer’s & Dementia 15 (1) (2019) 106–152.
[4] C. H. Van Dyck, C. J. Swanson, P. Aisen, R. J. Bateman, C. Chen,
M. Gee, M. Kanekiyo, D. Li, L. Reyderman, S. Cohen, et al., Lecanemab
in early alzheimer’s disease, New England Journal of Medicine 388 (1)
(2023) 9–21.
[5] S. Srivastava, R. Ahmad, S. K. Khare, Alzheimer’s disease and its treat-
ment by different approaches: A review, European Journal of Medicinal
Chemistry 216 (2021) 113320.
[6] C. R. Jack Jr, S. J. Andrews, T. G. Beach, T. Buracchio, B. Dunn,
A. Graf, O. Hansson, C. Ho, W. Jagust, E. McDade, et al., Revised crite-
ria for the diagnosis and staging of alzheimer’s disease, Nature medicine
(2024) 1–4.
[7] W. M. Van Der Flier, P. Scheltens, The atn framework—moving pre-
clinical alzheimer disease to clinical relevance, JAMA neurology 79 (10)
(2022) 968–970.
C. R. Jack Jr, D. A. Bennett, K. Blennow, M. C. Carrillo, B. Dunn, S. B.
Haeberlein, D. M. Holtzman, W. Jagust, F. Jessen, J. Karlawish, et al.,
Nia-aa research framework: toward a biological definition of alzheimer’s
disease, Alzheimer’s & Dementia 14 (4) (2018) 535–562.
D. Biel, M. Brendel, A. Rubinski, K. Buerger, D. Janowitz, M. Dichgans,
N. Franzmeier, A. D. N. I. (ADNI), Tau-pet and in vivo braak-staging
as prognostic markers of future cognitive decline in cognitively normal
to demented individuals, Alzheimer’s research & therapy 13 (1) (2021)
137.
N. Mattsson, S. Palmqvist, E. Stomrud, J. Vogel, O. Hansson, Staging β-
amyloid pathology with amyloid positron emission tomography, JAMA
neurology 76 (11) (2019) 1319–1329.
[11] R. Ossenkoppele, R. Smith, N. Mattsson-Carlgren, C. Groot, A. Leuzy,
O. Strandberg, S. Palmqvist, T. Olsson, J. Jögi, E. Stomrud, et al.,
Accuracy of tau positron emission tomography as a prognostic marker in
preclinical and prodromal alzheimer disease: a head-to-head comparison
against amyloid positron emission tomography and magnetic resonance
imaging, JAMA neurology 78 (8) (2021) 961–971.
[12] L. Rice, S. Bisdas, The diagnostic value of fdg and amyloid pet in
alzheimer’s disease—a systematic review, European journal of radiol-
ogy 94 (2017) 16–24.
[13] A. Nordberg, J. O. Rinne, A. Kadir, B. Långström, The use of pet in
alzheimer disease, Nature Reviews Neurology 6 (2) (2010) 78–87.
[14] S. Hu, B. Lei, S. Wang, Y. Wang, Z. Feng, Y. Shen, Bidirectional map-
ping generative adversarial networks for brain mr to pet synthesis, IEEE
Transactions on Medical Imaging 41 (1) (2021) 145–157.
[15] J. Zhang, X. He, L. Qing, F. Gao, B. Wang, Bpgan: Brain pet syn-
thesis from mri using generative adversarial network for multi-modal
alzheimer’s disease diagnosis, Computer Methods and Programs in
Biomedicine 217 (2022) 106676.
[16] F. Vega, A. Addeh, A. Ganesh, E. E. Smith, M. E. MacDonald, Image
translation for estimating two-dimensional axial amyloid-beta pet from
structural mri, Journal of Magnetic Resonance Imaging 59 (3) (2024)
1021–1031.
[17] C. Li, M. Liu, J. Xia, L. Mei, Q. Yang, F. Shi, H. Zhang, D. Shen,
Individualized assessment of brain aβ deposition with fmri using deep
learning, IEEE Journal of Biomedical and Health Informatics (2023).
[18] C. R. Jack, D. S. Knopman, W. J. Jagust, L. M. Shaw, P. S. Aisen,
M. W. Weiner, R. C. Petersen, J. Q. Trojanowski, Hypothetical model of
dynamic biomarkers of the alzheimer’s pathological cascade, The Lancet
Neurology 9 (1) (2010) 119–128.
[19] S. A. Hasani, M. Mayeli, M. A. Salehi, R. Barzegar Parizi, A system-
atic review of the association between amyloid-β and τ pathology with
functional connectivity alterations in the alzheimer dementia spectrum
utilizing pet scan and rsfmri, Dementia and Geriatric Cognitive Disor-
ders Extra 11 (2) (2021) 78–90.
[20] M. M. Mielke, N. R. Fowler, Alzheimer disease blood biomarkers: con-
siderations for population-level use, Nature Reviews Neurology (2024)
1–10.
O. Hansson, R. M. Edelmayer, A. L. Boxer, M. C. Carrillo, M. M. Mielke,
G. D. Rabinovici, S. Salloway, R. Sperling, H. Zetterberg, C. E. Teunis-
sen, The alzheimer’s association appropriate use recommendations for
blood biomarkers in alzheimer's disease, Alzheimer's & Dementia 18 (12)
(2022) 2669–2686.
[22] O. Hansson, K. Blennow, H. Zetterberg, J. Dage, Blood biomarkers for
alzheimer’s disease in clinical practice and trials, Nature Aging 3 (5)
(2023) 506–519.
[23] B. Olsson, R. Lautner, U. Andreasson, A. Öhrfelt, E. Portelius,
M. Bjerke, M. Hölttä, C. Rosén, C. Olsson, G. Strobel, et al., Csf and
blood biomarkers for the diagnosis of alzheimer’s disease: a systematic
review and meta-analysis, The Lancet Neurology 15 (7) (2016) 673–684.
[24] C. E. Teunissen, I. M. Verberk, E. H. Thijssen, L. Vermunt, O. Hans-
son, H. Zetterberg, W. M. van der Flier, M. M. Mielke, M. Del Campo,
Blood-based biomarkers for alzheimer’s disease: towards clinical imple-
mentation, The Lancet Neurology 21 (1) (2022) 66–77.
[25] S. Janelidze, N. Mattsson, S. Palmqvist, R. Smith, T. G. Beach, G. E.
Serrano, X. Chai, N. K. Proctor, U. Eichenlaub, H. Zetterberg, et al.,
Plasma p-tau181 in alzheimer’s disease: relationship to other biomark-
ers, differential diagnosis, neuropathology and longitudinal progression
to alzheimer’s dementia, Nature medicine 26 (3) (2020) 379–386.
[26] N. R. Barthélemy, G. Salvadó, S. E. Schindler, Y. He, S. Janelidze,
L. E. Collij, B. Saef, R. L. Henson, C. D. Chen, B. A. Gordon, et al.,
Highly accurate blood test for alzheimer’s disease is similar or superior
to clinical cerebrospinal fluid tests, Nature medicine 30 (4) (2024) 1085–
1095.
[27] M. R. Meyer, K. M. Kirmess, S. Eastwood, T. L. Wente-Roth,
F. Irvin, M. S. Holubasch, V. Venkatesh, I. Fogelman, M. Monane,
L. Hanna, et al., Clinical validation of the precivityad2 blood test: A
mass spectrometry-based test with algorithm combining %p-tau217 and
aβ42/40 ratio to identify presence of brain amyloid, Alzheimer’s & De-
mentia (2024).
[28] J. K. Wisch, B. A. Gordon, A. H. Boerwinkle, P. H. Luckett, J. G.
Bollinger, V. Ovod, Y. Li, R. L. Henson, T. West, M. R. Meyer,
et al., Predicting continuous amyloid pet values with csf and plasma
aβ42/aβ40, Alzheimer’s & Dementia: Diagnosis, Assessment & Disease
Monitoring 15 (1) (2023) e12405.
[29] S. Palmqvist, P. Tideman, N. Mattsson-Carlgren, S. E. Schindler,
R. Smith, R. Ossenkoppele, S. Calling, T. West, M. Monane, P. B.
Verghese, et al., Blood biomarkers to detect alzheimer disease in pri-
mary care and secondary care, JAMA (2024).
[30] Y. Zhang, J. Gao, Z. Tan, L. Zhou, K. Ding, M. Zhou, S. Zhang,
D. Wang, Data-centric foundation models in computational healthcare:
A survey, arXiv preprint arXiv:2401.02458 (2024).
[31] K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung,
N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl, et al., Large language
models encode clinical knowledge, Nature 620 (7972) (2023) 172–180.
[32] A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F.
Tan, D. S. W. Ting, Large language models in medicine, Nature medicine
29 (8) (2023) 1930–1940.
[33] J. Wang, E. Shi, S. Yu, Z. Wu, C. Ma, H. Dai, Q. Yang, Y. Kang,
J. Wu, H. Hu, et al., Prompt engineering for healthcare: Methodologies
and applications, arXiv preprint arXiv:2304.14670 (2023).
[34] A. Ahmed, X. Zeng, R. Xi, M. Hou, S. A. Shah, Med-prompt: A novel
prompt engineering framework for medicine prediction on free-text clin-
ical notes, Journal of King Saud University-Computer and Information
Sciences 36 (2) (2024) 101933.
[35] S. Ren, J. Li, L. Huang, Q. Huang, K. Chen, J. Hu, F. Jessen, X. Hu,
D. Jiang, L. Zhu, et al., Brain functional alterations and association
with cognition in people with preclinical subjective cognitive decline and
objective subtle cognitive difficulties, Neuroscience 513 (2023) 137–144.
[36] F.-F. Pan, Q. Huang, Y. Wang, Y.-F. Wang, Y.-H. Guan, F. Xie, Q.-H.
Guo, Non-linear character of plasma amyloid beta over the course of cog-
nitive decline in alzheimer’s continuum, Frontiers in Aging Neuroscience
14 (2022) 832700.
[37] H. Chu, C. Huang, Y. Guan, F. Xie, M. Chen, Q. Guo, The associations
between nutritional status and physical frailty and alzheimer’s disease
plasma biomarkers in older cognitively unimpaired adults with positive
of amyloid-β pet, Clinical Nutrition 43 (7) (2024) 1647–1656.
[38] Q. Guo, Q. Zhao, M. Chen, D. Ding, Z. Hong, A comparison study of
mild cognitive impairment with 3 memory tests among chinese individ-
uals, Alzheimer Disease & Associated Disorders 23 (3) (2009) 253–259.
[39] Q. Zhao, Q. Guo, Z. Hong, Clustering and switching during a seman-
tic verbal fluency test contribute to differential diagnosis of cognitive
impairment, Neuroscience bulletin 29 (2013) 75–82.
[40] Q. Zhao, Q. Guo, F. Li, Y. Zhou, B. Wang, Z. Hong, The shape trail
test: application of a new variant of the trail making test, PloS one 8 (2)
(2013) e57333.
[41] Q. Zhao, Q. Guo, X. Liang, M. Chen, Y. Zhou, D. Ding, Z. Hong,
Auditory verbal learning test is superior to rey-osterrieth complex figure
memory for predicting mild cognitive impairment to alzheimer’s disease,
Current Alzheimer Research 12 (6) (2015) 520–526.
[42] D. Ding, Q. Zhao, Q. Guo, X. Liang, J. Luo, L. Yu, L. Zheng, Z. Hong,
Shanghai Aging Study (SAS), Progression and predictors of mild cognitive impairment
in chinese elderly: a prospective follow-up in the shanghai aging study,
Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring
4 (2016) 28–36.
[43] G. M. McKhann, D. S. Knopman, H. Chertkow, B. T. Hyman, C. R.
Jack Jr, C. H. Kawas, W. E. Klunk, W. J. Koroshetz, J. J. Manly,
R. Mayeux, et al., The diagnosis of dementia due to alzheimer’s disease:
Recommendations from the national institute on aging-alzheimer’s as-
sociation workgroups on diagnostic guidelines for alzheimer’s disease,
Alzheimer’s & dementia 7 (3) (2011) 263–269.
[44] T. Chen, F. Pan, Q. Huang, G. Xie, X. Chao, L. Wu, J. Wang, L. Cui,
T. Sun, M. Li, et al., Metabolic phenotyping reveals an emerging role of
ammonia abnormality in alzheimer’s disease, Nature Communications
15 (1) (2024) 3796.
L. Huang, K. Chen, Z. Liu, Q. Guo, A conceptual framework for research
on cognitive impairment with no dementia in memory clinic, Current
Alzheimer Research 17 (6) (2020) 517–525.
[46] P. Sahoo, A. K. Singh, S. Saha, V. Jain, S. Mondal, A. Chadha, A sys-
tematic survey of prompt engineering in large language models: Tech-
niques and applications, arXiv preprint arXiv:2402.07927 (2024).
[47] V. Liu, L. B. Chilton, Design guidelines for prompt engineering text-to-
image generative models, in: Proceedings of the 2022 CHI conference
on human factors in computing systems, 2022, pp. 1–23.
[48] S. Chakraborty, E. Bisong, S. Bhatt, T. Wagner, R. Elliott, F. Mosconi,
Biomedbert: A pre-trained biomedical language model for qa and ir,
in: Proceedings of the 28th international conference on computational
linguistics, 2020, pp. 669–679.
[49] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3d
u-net: learning dense volumetric segmentation from sparse annotation,
in: Medical Image Computing and Computer-Assisted Intervention–
MICCAI 2016: 19th International Conference, Athens, Greece, October
17-21, 2016, Proceedings, Part II 19, 2016, pp. 424–432.
[50] M. Mirza, S. Osindero, Conditional generative adversarial nets, arXiv
preprint arXiv:1411.1784 (2014).
[51] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Genera-
tive adversarial text to image synthesis, in: International conference on
machine learning, 2016, pp. 1060–1069.
[52] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D. N. Metaxas,
Stackgan++: Realistic image synthesis with stacked generative adver-
sarial networks, IEEE transactions on pattern analysis and machine in-
telligence 41 (8) (2018) 1947–1962.
E. Perez, F. Strub, H. De Vries, V. Dumoulin, A. Courville, Film: Visual
reasoning with a general conditioning layer, in: Proceedings of the AAAI
conference on artificial intelligence, Vol. 32, 2018.
[54] A. Hore, D. Ziou, Image quality metrics: Psnr vs. ssim, in: 2010 20th
international conference on pattern recognition, 2010, pp. 2366–2369.
[55] B. Billot, D. N. Greve, O. Puonti, A. Thielscher, K. Van Leemput,
B. Fischl, A. V. Dalca, J. E. Iglesias, et al., Synthseg: Segmentation
of brain mri scans of any contrast and resolution without retraining,
Medical image analysis 86 (2023) 102789.
[56] S. Chen, K. Ma, Y. Zheng, Med3d: Transfer learning for 3d medical
image analysis, arXiv preprint arXiv:1904.00625 (2019).
[57] J. Cohen, A coefficient of agreement for nominal scales, Educational and
psychological measurement 20 (1) (1960) 37–46.
[58] V. Devanarayan, T. Doherty, A. Charil, P. Sachdev, Y. Ye, L. K. Mu-
rali, D. A. Llano, J. Zhou, L. Reyderman, H. Hampel, et al., Plasma
ptau217 predicts continuous brain amyloid levels in preclinical and early
alzheimer’s disease, Alzheimer’s & Dementia (2024).
[59] W. S. Brum, N. C. Cullen, J. Therriault, S. Janelidze, N. Rahmouni,
J. Stevenson, S. Servaes, A. L. Benedet, E. R. Zimmer, E. Stomrud,
et al., A blood-based biomarker workflow for optimal tau-pet referral in
memory clinic settings, Nature Communications 15 (1) (2024) 2311.
[60] T. Xia, A. Chartsias, C. Wang, S. A. Tsaftaris, A. D. N. Initiative,
et al., Learning to synthesise the ageing brain without longitudinal data,
Medical Image Analysis 73 (2021) 102169.
[61] T. Xia, A. Chartsias, S. A. Tsaftaris, A. D. N. Initiative, Consistent
brain ageing synthesis, in: Medical Image Computing and Computer
Assisted Intervention–MICCAI 2019: 22nd International Conference,
Shenzhen, China, October 13–17, 2019, Proceedings, Part IV 22, 2019,
pp. 750–758.
[62] C. Cui, H. Yang, Y. Wang, S. Zhao, Z. Asad, L. A. Coburn, K. T.
Wilson, B. A. Landman, Y. Huo, Deep multimodal fusion of image and
non-image data in disease diagnosis and prognosis: a review, Progress
in Biomedical Engineering 5 (2) (2023) 022001.
[63] C. Giangrande, V. Delatour, U. Andreasson, K. Blennow, J. Gobom,
H. Zetterberg, Harmonization and standardization of biofluid-based
biomarker measurements for at(n) classification in alzheimer's disease,
Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring
15 (3) (2023) e12465.
[64] J. Lee, B. J. Burkett, H.-K. Min, M. L. Senjem, E. Dicks, N. Corriveau-
Lecavalier, C. T. Mester, H. J. Wiste, E. S. Lundt, M. E. Murray, et al.,
Synthesizing images of tau pathology from cross-modal neuroimaging
using deep learning, Brain 147 (3) (2024) 980–995.
[65] D. C. Matthews, J. W. Kinney, A. Ritter, R. D. Andrews, E. N.
Toledano Strom, A. S. Lukic, L. N. Koenig, C. Revta, H. M. Fillit,
K. Zhong, et al., Relationships between plasma biomarkers, tau pet,
fdg pet, and volumetric mri in mild to moderate alzheimer’s disease
patients, Alzheimer’s & Dementia: Translational Research & Clinical
Interventions 10 (3) (2024) e12490.