Abstract: Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of ...
Abstract: We study the task of image retrieval with text feedback, where a reference image and modification text are composed to retrieve the desired target image. To accomplish this goal, existing ...