Started adding related works
smsharma committed Mar 11, 2024
1 parent 61f1dc1 commit 42f3ecc
Showing 5 changed files with 15 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -7,5 +7,5 @@

## Paper draft

[Link to paper draft](https://github.com/smsharma/HubbleCLIP/blob/main-pdf/paper/main.pdf). The PDF is compiled automatically from the `main` branch into the `main-pdf` branch on push.
[Link to paper draft](https://github.com/smsharma/HubbleCLIP/blob/main-pdf/paper/hubble_paperclip.pdf). The PDF is compiled automatically from the `main` branch into the `main-pdf` branch on push.

Binary file modified paper/hubble_paperclip.pdf
Binary file not shown.
20 changes: 14 additions & 6 deletions paper/hubble_paperclip.tex
@@ -189,9 +189,17 @@ \section{Introduction}
Our method opens up the possibility of interacting with astronomical survey data using free-form natural language as an interface, which is a cornerstone of the success of the modern foundation model paradigm.
%

The CLIP family of foundation models, which in their original form embed images and associated captions into a common representation space via contrastive learning, have shown strong performance and generalization capabilities on a variety of downstream tasks including zero-shot classification and image retrieval.
\paragraph*{Related work}

% The CLIP family of foundation models, which in their original form embed images and associated captions into a common representation space via contrastive learning, have shown strong performance and generalization capabilities on a variety of downstream tasks including zero-shot classification and image retrieval~\citep{radford2021learning}.
%
The concept of learning task-agnostic representations via self-supervised learning has been applied within astrophysics \citep{slijepcevic2024radio,stein2021self,hayat2021self,slijepcevic2022learning} and used for downstream tasks like object similarity search \citep{stein2021self}, gravitational lens finding \citep{stein2022mining}, estimation of Galactic distances \citep{hayat2021estimating}, and identification of rare galaxies \citep{walmsley2023rare}.
%
Associating different modalities via contrastive training has been employed in many other scientific domains~\citep[e.g.,][]{liu2023text,Sanchez-Fernandez2022.11.17.516915,lanusse2023astroclip,cepeda2023geoclip}, and has been shown to be effective in learning semantically meaningful joint representations. Here, we present for the first time an application associating diverse astronomical data with the text modality.
%
For a recent review of contrastive learning in astrophysics, see \citet{huertas2023brief}.
% \citet{bowles2023radio,bowles2022new}

The rest of this paper is organized as follows.
%
@@ -206,7 +214,7 @@ \section{Introduction}
\begin{figure*}[!t]
\centering
\includegraphics[width=0.99\textwidth]{plots/figure.pdf}
\caption{\changes{Overview of the PAPERCLIP method. (Left) A pre-trained CLIP model is fine-tuned using a dataset of \hubble observations and corresponding proposal abstracts. (Right) The fine-tuned model can then be used for downstream tasks such as observation retrieval (i.e., finding the observations most relevant to a given text query). The proposal abstract snippet here corresponds to proposal ID \href{https://archive.stsci.edu/proposal_search.php?id=16914&mission=hst}{16914}}}
\caption{\changes{Overview of the PAPERCLIP method. (Left) A pre-trained CLIP model is fine-tuned using a dataset of \hubble observations and corresponding proposal abstracts. (Right) The fine-tuned model can then be used for downstream tasks such as observation retrieval (i.e., finding the observations most relevant to a given text query). The proposal abstract snippet here corresponds to proposal ID \href{https://archive.stsci.edu/proposal_search.php?id=16914&mission=hst}{16914}}.}
\label{fig:overview}
\end{figure*}

@@ -349,9 +357,9 @@ \section{Methodology}
%
With PAPERCLIP, we leverage the strong generalization capabilities demonstrated by pre-trained CLIP models and adapt them to domain-specific \hubble data via fine-tuning.

\subsection{Contrastive Language-Image Pre-training (CLIP)}
\subsection{Contrastive Language-Image Pre-training}

CLIP \citep[Contrastive Language-Image Pre-training;][]{radford2021learning} is a multi-modal neural network model pre-trained on a large corpus of image-text pairs via weak supervision using a contrastive loss.
Contrastive Language-Image Pre-training \citep[CLIP;][]{radford2021learning} is a multi-modal neural network model pre-trained on a large corpus of image-text pairs via weak supervision using a contrastive loss.
%
Given a minibatch $\mathcal{B}$ of $|\mathcal{B}|$ image-text pairs $\{(I_i, T_i)\}$, the goal is to align the learned representations of corresponding (positive) pairs $(I_i, T_i)$ while repelling the representations of unaligned (negative) pairs $(I_i, T_{j\neq i})$.
%
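To make the contrastive objective concrete, the snippet below is a minimal NumPy sketch of a symmetric CLIP-style loss over a minibatch of embedding pairs; the function name, the temperature value, and the use of NumPy are illustrative assumptions and not the paper's actual implementation.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over a minibatch of
    image-text embedding pairs; row i of each array corresponds to pair i."""
    # L2-normalise so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)

    # (B, B) similarity matrix; diagonal entries are the positive pairs.
    logits = image_emb @ text_emb.T / temperature
    labels = np.arange(logits.shape[0])

    def cross_entropy(l, y):
        l = l - l.max(axis=-1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimising this loss pulls the representations of matched image-text pairs together while pushing mismatched pairs apart in the shared embedding space.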
@@ -422,7 +430,7 @@ \subsection{Evaluation Metrics}

We also qualitatively evaluate the learned embeddings through image retrieval (i.e., retrieving the most relevant images from the validation set using natural language queries) and description retrieval (i.e., querying the astrophysical object classes and science use cases most relevant to a given observation, akin to zero-shot classification) experiments.
%
For the description/text retrieval evaluation, we define a list of possible text associations (i.e., classes), which we show in App.~\ref{app:categories}, by querying the \textsc{Claude 2}\footnote{\url{https://claude.ai/}} large language model along with manual curation.
For the description/text retrieval evaluation, we define a list of possible text associations (i.e., classes), which we show in App.~\ref{app:categories}, by querying the \textsc{Claude 2}\footnote{\url{https://claude.ai/}} large language model, followed by manual curation.
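Both retrieval directions reduce to a nearest-neighbour lookup by cosine similarity in the shared embedding space; the sketch below illustrates the idea, with hypothetical names and a top-k interface that are not taken from the paper's code.

```python
import numpy as np

def retrieve_top_k(query_emb, candidate_embs, k=5):
    """Return indices and cosine similarities of the k candidates (images or
    text descriptions) closest to the query in the joint embedding space."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=-1, keepdims=True)
    scores = c @ q                    # cosine similarity of each candidate with the query
    order = np.argsort(-scores)[:k]   # highest-similarity candidates first
    return order, scores[order]
```

The same function covers image retrieval (a text query embedding scored against image embeddings) and description retrieval (an observation embedding scored against class-description embeddings).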

\section{Results and Discussion}
\label{sec:results}
Binary file modified paper/plots/figure.pdf
Binary file not shown.
Binary file modified paper/plots/pro.afdesign
Binary file not shown.
