The Intelligent Systems Ecosystem

Research Community