PARALLEL_TRAINING
From our embeddings-in-search-systems paper, we saw DistBelief mentioned in the 2013 Word2Vec paper.
DistBelief
- Sounds like many worker servers each train on a portion of the data (and can each hold a portion of the model), compute gradient updates locally, and send those updates back to a centralized parameter server, which applies them and pushes the refreshed weights back out to the training servers (see the sketch after this list)
- Mini-batch gradient descent with adaptive per-parameter learning rates via AdaGrad (see the AdaGrad sketch below)
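
A minimal single-machine sketch of that parameter-server loop, assuming a toy linear-regression problem in Python/NumPy. Worker threads stand in for separate training machines: each one pulls the current global weights, computes a mini-batch gradient, and pushes the update back asynchronously (so workers may compute on slightly stale weights, as in Downpour SGD). All names here (`ParameterServer`, `worker`, the hyperparameters) are made up for illustration; the real DistBelief also partitioned the model itself across machines, which this sketch skips.

```python
import threading
import numpy as np

class ParameterServer:
    """Holds the global weights; workers push gradients and pull weights."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        # Apply a worker's gradient to the global weights (plain SGD here).
        with self.lock:
            self.w -= self.lr * grad

def worker(ps, X, y, steps, batch_size=32):
    rng = np.random.default_rng()
    for _ in range(steps):
        w = ps.pull()  # fetch current global weights (possibly already stale)
        idx = rng.choice(len(X), size=batch_size, replace=False)
        xb, yb = X[idx], y[idx]
        # Gradient of mean squared error for a linear model y ~ X @ w.
        grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size
        ps.push(grad)  # send the update back asynchronously

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([3.0, -2.0])
    X = rng.normal(size=(10_000, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=10_000)

    ps = ParameterServer(dim=2)
    threads = [threading.Thread(target=worker, args=(ps, X, y, 200))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("learned weights:", ps.pull())  # should land near [3, -2]
```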
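
And a sketch of the AdaGrad rule itself: each parameter accumulates its own squared gradients, and its effective learning rate shrinks as that accumulator grows, so frequently-updated parameters take smaller steps. The function name and the toy quadratic objective are just for illustration.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=1.0, eps=1e-8):
    # Accumulate squared gradients, per parameter.
    accum += grad ** 2
    # Each parameter's effective step size is lr / sqrt(its accumulator),
    # so it shrinks independently for each parameter over time.
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([5.0, -3.0])
accum = np.zeros_like(w)
for _ in range(500):
    w, accum = adagrad_step(w, 2 * w, accum)
print(w)  # shrinks toward the minimum at [0, 0]
```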