PARALLEL_TRAINING

From our embeddings in search systems paper we saw DistBelief mentioned in the 2013 Word2Vec paper

DistBelief

  • Sounds like they partition the model so that many different servers each update a small portion of it; workers send their gradient updates back to a centralized parameter server, which then pushes the refreshed weights back out to the training servers (see the sketch after this list)
  • Mini-batch gradient descent with adaptive per-parameter learning rates via AdaGrad (update rule sketched after the parameter-server example below)
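
A minimal sketch of that parameter-server loop, assuming a toy single-process setup: the ParameterServer class, its push/pull methods, and worker_step are hypothetical names for illustration, not the paper's actual API, and workers run sequentially here as a stand-in for asynchronous replicas.

```python
import numpy as np

class ParameterServer:
    """Toy centralized server: holds the global weights and applies
    gradient updates pushed by worker replicas."""

    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def push(self, gradient):
        # Apply a worker's gradient to the global weights.
        self.weights -= self.lr * gradient

    def pull(self):
        # Workers fetch the latest weights before their next mini-batch.
        return self.weights.copy()

def worker_step(server, x_batch, y_batch):
    """One worker step: pull current weights, compute a local gradient
    on this worker's data shard (linear regression for simplicity),
    then push the gradient back to the server."""
    w = server.pull()
    preds = x_batch @ w
    grad = x_batch.T @ (preds - y_batch) / len(y_batch)
    server.push(grad)

# Simulate several workers, each training on its own data shard.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
server = ParameterServer(dim=2)
for step in range(200):
    for _ in range(4):  # 4 "workers" taking turns
        x = rng.normal(size=(32, 2))
        y = x @ true_w
        worker_step(server, x, y)
print(server.weights)  # should approach [2.0, -1.0]
```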
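
And a minimal sketch of the AdaGrad rule from the second bullet: each parameter accumulates its own squared gradients, and its effective learning rate shrinks as that sum grows (function name and defaults here are illustrative, not from the paper).

```python
import numpy as np

def adagrad_update(weights, grad, grad_sq_sum, lr=0.1, eps=1e-8):
    """One AdaGrad step: scale each parameter's step by the inverse
    square root of its accumulated squared gradients, so parameters
    with large gradient histories get smaller effective learning rates."""
    grad_sq_sum += grad ** 2                             # running sum of squared gradients
    weights -= lr * grad / (np.sqrt(grad_sq_sum) + eps)  # per-parameter scaled step
    return weights, grad_sq_sum

# Example: the per-parameter scaling means both coordinates take
# comparable step sizes despite a 10x gap in raw gradient magnitude.
w = np.zeros(2)
g_sq = np.zeros(2)
for _ in range(3):
    w, g_sq = adagrad_update(w, np.array([1.0, 0.1]), g_sq)
print(w)
```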