Stable Diffusion is a deep learning, text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. It was developed by researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway, with a compute donation by Stability AI and training data from non-profit organizations.

Its code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU with at least 8 GB of VRAM (a minimal generation sketch appears at the end of this post). This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services. The development of Stable Diffusion was funded and shaped by the start-up company Stability AI, while the technical license for the model was released by the CompVis group at Ludwig Maximilian University of Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project.

Nvidia's research and development team have come up with a new GPU-accelerated deep-learning system capable of restoring images seemingly damaged beyond repair, building on existing 'image inpainting' technologies - functionally similar to Adobe Photoshop's Content Aware Fill tool - and, it claims, surpassing them.

'Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value),' the researchers explain in the abstract for their paper, Image Inpainting for Irregular Holes using Partial Convolutions. 'This often leads to artefacts such as colour discrepancy and blurriness. Post-processing is usually used to reduce such artefacts, but are expensive and may fail.'

The solution, as the paper's title makes clear, is the use of partial convolutions, 'where the convolution is masked and renormalised to be conditioned on only valid pixels.' Coupled with a mechanism which generates an updated mask for the next layer as part of the forward pass, the result is a system which can automatically restore missing parts of images - even when there is nothing left in the original image from which a sample can be taken, as in one of the example uses of replacing a celebrity's eyes. (A sketch of such a layer appears below.)

The team's demonstration video, running on the company's Tesla V100 accelerator boards, showcases how the system operates, and it's undeniably impressive in use. While the in-filled content is, naturally, not an exact match for what was originally there - especially obvious in the demonstration where entire objects, like rocks and bridges, are masked from the image and disappear entirely post-processing - it's surprisingly convincing and, unlike techniques which require post-processing for colour matching and other tweaks, entirely automatic in operation.

Nvidia has published a blog post on the process, while the research paper is available for open access on arXiv, with code on GitHub.
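To make the partial-convolution idea concrete, here is a minimal, illustrative sketch of such a layer in PyTorch. This is a reimplementation from the description quoted above, not Nvidia's actual code; the class name and layer details are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Illustrative partial convolution: the convolution is masked and
    renormalised to be conditioned on only valid pixels, and an updated
    mask is produced for the next layer as part of the forward pass."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=True)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer("mask_kernel", torch.ones(1, 1, kernel_size, kernel_size))
        self.stride = stride
        self.padding = padding

    def forward(self, x, mask):
        # mask: (N, 1, H, W), 1.0 for valid pixels, 0.0 for holes.
        # Convolve only valid pixels by zeroing out the holes first.
        out = self.conv(x * mask)
        with torch.no_grad():
            # Number of valid pixels under each sliding window.
            valid = F.conv2d(mask, self.mask_kernel,
                             stride=self.stride, padding=self.padding)
        # Renormalise: scale responses by (window size / valid count).
        scale = self.mask_kernel.numel() / valid.clamp(min=1.0)
        bias = self.conv.bias.view(1, -1, 1, 1)
        out = (out - bias) * scale + bias
        # Zero outputs where the window contained no valid pixels at all.
        out = out * (valid > 0).float()
        # Updated mask: a location becomes valid once its window saw
        # at least one valid pixel, so holes shrink layer by layer.
        new_mask = (valid > 0).float()
        return out, new_mask
```

Stacking these layers is what lets the network fill holes with no surviving source pixels nearby: each layer's mask update shrinks the hole, so deeper layers eventually see every location as valid.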
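And for the Stable Diffusion model described at the top of this post, here is a minimal text-to-image sketch using the Hugging Face diffusers library, which wraps the publicly released weights. The library, the model ID, and the prompt are assumptions for illustration; the article itself names no specific tooling.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the v1.5 weights in half precision so the pipeline fits within
# the roughly 8 GB of VRAM mentioned above (model ID is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate one image conditioned on a text description.
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```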