Implementing NLP search on fediverse

ofcourse@kbin.social · 1 year ago

kjr@kbin.social · 1 year ago

@ofcourse there are instances that defederated from another instance because it implemented free-text search. There is no agreement on that.
Steps 1 and 2 are problematic, since the Fediverse is heterogeneous: not every instance federates with every other instance, and not all content is shared between different software (e.g. only part of the content on kbin can be accessed from Mastodon).
Anyway, the approach is sound, maybe not for the whole fediverse, but for groups of instances that agree on a shared search engine.
I’m not sure about the GPU requirements, but on CPU the updates could be very slow for the actual traffic.
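
To make the "groups of instances that agree on a shared search engine" idea concrete, here is a minimal sketch of an opt-in filter. The allowlist, the host names in it, and the `is_indexable` helper are all hypothetical, made up for illustration; how instances would actually signal agreement is not specified anywhere in this thread.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts of instances that agreed to be indexed.
OPTED_IN = {"kbin.social", "example.social"}


def is_indexable(post_url: str, opted_in: set[str] = OPTED_IN) -> bool:
    """Return True only if the post's host is on the opt-in list."""
    host = urlparse(post_url).hostname or ""
    return host in opted_in


posts = [
    "https://kbin.social/m/fediverse/t/1",  # opted in
    "https://other.example/post/2",         # never agreed, so skipped
]
indexable = [url for url in posts if is_indexable(url)]
```

Filtering before anything is fetched or embedded means content from instances that never agreed does not enter the index at all.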

ofcourse@kbin.social · 1 year ago

Thanks for sharing your insights.

I’m curious why instances offering free-text search get defederated. I would have guessed everyone wants better search. Is it because of privacy concerns, or because instances don’t want to be indexed or have traffic directed elsewhere?

I was hoping that if I index only for the purpose of embeddings (which would prevent recreating the original content) and only share URLs to the content, it would eliminate the privacy and traffic concerns.
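
A rough sketch of what "embeddings plus URLs only" could look like. Everything here is an assumption for illustration: `embed` is a toy hashed bag-of-words stand-in for a real sentence-embedding model, and `UrlOnlyIndex` is a hypothetical in-memory store, not a real vector database. It only shows that the index never keeps the post text; it does not show that the text cannot be recovered from the vectors.

```python
import hashlib
import math

DIM = 64  # toy vector size; a real embedding model would use far more


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class UrlOnlyIndex:
    """Stores (url, vector) pairs; the original text is discarded after embedding."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, list[float]]] = []

    def add(self, url: str, text: str) -> None:
        self._entries.append((url, embed(text)))  # text is not kept

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), url) for url, v in self._entries]
        scored.sort(reverse=True)
        return [url for _, url in scored[:k]]


index = UrlOnlyIndex()
index.add("https://kbin.social/m/fediverse/t/1", "vector search for the fediverse")
index.add("https://kbin.social/m/cooking/t/2", "recipe for sourdough bread")
results = index.search("vector search for the fediverse", k=1)
```

Search results are URLs only, so anyone who wants to read a post still has to visit the instance that hosts it.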

I’m still in the process of understanding whether and how this would work. It’s only a personal project at this stage, but you are right: CPU/GPU requirements and vector stores are things I’d need to consider.

noodlejetski@kbin.social · 1 year ago

prepare for a ton of instances defederating from yours on day 1.