Deep Learning Functions Reference¶
LinAlgKit provides comprehensive mathematical functions for building neural networks and deep learning applications.
Table of Contents¶
- Activation Functions
- Loss Functions
- Normalization
- Convolution Operations
- Weight Initialization
- Utility Functions
- Advanced Math
- Examples
Activation Functions¶
sigmoid(x)¶
Sigmoid activation: σ(x) = 1 / (1 + exp(-x))
```python
import LinAlgKit as lk
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
output = lk.sigmoid(x)
# [0.119, 0.269, 0.5, 0.731, 0.881]
```
Properties:
- Output range: (0, 1)
- Used for: binary classification, gates in LSTMs
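A naive `1 / (1 + np.exp(-x))` overflows for large negative inputs. Implementations typically branch on the sign of `x` to keep the exponent non-positive; a plain-NumPy sketch of that trick (`stable_sigmoid` is an illustrative name, not a LinAlgKit function):

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid computed so that exp() never receives a large positive argument."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the textbook formula is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, rewrite as exp(x) / (1 + exp(x)); exp(x) <= 1 here.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # saturates cleanly to 0, 0.5, 1
```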
relu(x)¶
Rectified Linear Unit: ReLU(x) = max(0, x)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.relu(x)
# [0, 0, 0, 1, 2]
```
Properties:
- Output range: [0, ∞)
- Fast to compute
- Can cause the "dying ReLU" problem (neurons stuck at zero output)
leaky_relu(x, alpha=0.01)¶
Leaky ReLU: f(x) = x if x > 0, else α*x
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.leaky_relu(x, alpha=0.1)
# [-0.2, -0.1, 0, 1, 2]
```
Properties:
- Prevents dying ReLU
- α is typically 0.01 or 0.1
elu(x, alpha=1.0)¶
Exponential Linear Unit: f(x) = x if x > 0, else α*(exp(x) - 1)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.elu(x)
# [-0.865, -0.632, 0, 1, 2]
```
Properties:
- Smooth for negative values
- Mean activations closer to zero
gelu(x)¶
Gaussian Error Linear Unit (used in BERT, GPT):
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.gelu(x)
# [-0.045, -0.158, 0, 0.841, 1.955]
```
Properties:
- Smooth, differentiable everywhere
- The default activation in most modern transformer models
swish(x, beta=1.0)¶
Self-gated activation: f(x) = x * sigmoid(β*x)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.swish(x, beta=1.0)
```
Properties:
- Smooth, non-monotonic
- Reported to outperform ReLU in some deep networks
softmax(x, axis=-1)¶
Converts logits to probabilities:
```python
logits = np.array([[2.0, 1.0, 0.1]])
probs = lk.softmax(logits)
# [[0.659, 0.242, 0.099]] (sums to 1)
```
Properties:
- Output sums to 1
- Used for multi-class classification
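Softmax implementations conventionally subtract the row maximum before exponentiating, since the result is mathematically unchanged but `exp()` can no longer overflow. A minimal NumPy sketch of that trick (`stable_softmax` is an illustrative name, not a LinAlgKit function):

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax with the max subtracted first so exp() cannot overflow."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])  # a naive exp() would overflow here
print(stable_softmax(logits))  # rows still sum to 1, no overflow warnings
```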
log_softmax(x, axis=-1)¶
Numerically stable log of softmax:
```python
log_probs = lk.log_softmax(logits)
```
Use case: Computing cross-entropy loss efficiently
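Working in log space avoids taking the log of near-zero probabilities. One standard way to combine log-softmax with the negative log-likelihood, sketched in plain NumPy (`cross_entropy_from_logits` is a hypothetical helper, not a LinAlgKit function):

```python
import numpy as np

def log_softmax(x, axis=-1):
    """log(softmax(x)) via the log-sum-exp trick: x - max - log(sum(exp(x - max)))."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

def cross_entropy_from_logits(logits, targets):
    """Mean negative log-likelihood of the target classes, straight from logits."""
    log_probs = log_softmax(logits)
    return -np.mean(log_probs[np.arange(len(targets)), targets])

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
targets = np.array([0, 1])
print(cross_entropy_from_logits(logits, targets))
```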
softplus(x)¶
Smooth approximation of ReLU: f(x) = log(1 + exp(x))
```python
output = lk.softplus(x)
```
tanh(x)¶
Hyperbolic tangent:
```python
output = lk.tanh(x)
# Range: (-1, 1)
```
Loss Functions¶
mse_loss(predictions, targets, reduction='mean')¶
Mean Squared Error for regression:
```python
pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.1, 2.2, 2.8])
loss = lk.mse_loss(pred, target)
# 0.03
```
mae_loss(predictions, targets, reduction='mean')¶
Mean Absolute Error (L1 loss):
```python
loss = lk.mae_loss(pred, target)
```
huber_loss(predictions, targets, delta=1.0, reduction='mean')¶
Robust loss combining MSE and MAE:
```python
loss = lk.huber_loss(pred, target, delta=1.0)
```
Properties:
- Quadratic for small errors
- Linear for large errors (robust to outliers)
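The two regimes can be seen directly in the standard Huber definition, sketched here in NumPy (`huber` is illustrative; LinAlgKit's exact reduction behavior may differ):

```python
import numpy as np

def huber(pred, target, delta=1.0):
    """0.5 * err**2 within |err| <= delta, delta * (|err| - 0.5 * delta) beyond it."""
    err = pred - target
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.mean(np.where(abs_err <= delta, quadratic, linear))

pred = np.array([1.0, 2.0, 10.0])    # last prediction is an outlier
target = np.array([1.1, 2.2, 3.0])
# The 7.0 error contributes linearly (6.5), not quadratically (24.5)
print(huber(pred, target, delta=1.0))
```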
cross_entropy_loss(predictions, targets, epsilon=1e-12)¶
Cross-entropy for multi-class classification:
```python
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
targets = np.array([0, 1])  # Class indices
loss = lk.cross_entropy_loss(probs, targets)
```
binary_cross_entropy(predictions, targets, epsilon=1e-12)¶
Binary cross-entropy for binary classification:
```python
probs = np.array([0.9, 0.1, 0.8])
targets = np.array([1, 0, 1])
loss = lk.binary_cross_entropy(probs, targets)
```
Normalization Functions¶
batch_norm(x, gamma=None, beta=None, epsilon=1e-5, axis=0)¶
Batch normalization:
```python
# x shape: (batch_size, features)
x_norm = lk.batch_norm(x, gamma=scale, beta=shift)
```
Properties:
- Normalizes across the batch dimension
- Stabilizes training (originally motivated by reducing internal covariate shift)
layer_norm(x, gamma=None, beta=None, epsilon=1e-5)¶
Layer normalization (used in transformers):
```python
# Normalizes across the feature dimension
x_norm = lk.layer_norm(x)
```
Properties:
- Normalizes across features, not the batch
- Works with any batch size
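The operation is simple enough to state in a few lines. A reference NumPy sketch matching the signature above (the library's internals may differ):

```python
import numpy as np

def layer_norm(x, gamma=None, beta=None, epsilon=1e-5):
    """Normalize each sample over its last (feature) axis, then optionally scale/shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + epsilon)
    if gamma is not None:
        x_norm = x_norm * gamma
    if beta is not None:
        x_norm = x_norm + beta
    return x_norm

x = np.random.default_rng(0).standard_normal((4, 8))
out = layer_norm(x)
print(out.mean(axis=-1))  # ~0 for every row
print(out.std(axis=-1))   # ~1 for every row
```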
instance_norm(x, epsilon=1e-5)¶
Instance normalization (for style transfer):
```python
# x shape: (batch, channels, height, width)
x_norm = lk.instance_norm(x)
```
Convolution Operations¶
conv2d(x, kernel, stride=1, padding=0)¶
2D convolution:
```python
# Input: (batch, channels, H, W) or (H, W)
# Kernel: (out_channels, in_channels, kH, kW) or (kH, kW)
image = np.random.randn(1, 1, 28, 28)
kernel = np.random.randn(32, 1, 3, 3)
output = lk.conv2d(image, kernel, stride=1, padding=1)
# Output shape: (1, 32, 28, 28)
```
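The spatial output size above follows the standard convolution formula floor((size + 2·padding - kernel) / stride) + 1. A quick helper to check shapes before running a layer (`conv2d_output_size` is illustrative, not part of LinAlgKit):

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a 2D convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv2d_output_size(28, 3, stride=1, padding=1))  # 28 ("same" padding)
print(conv2d_output_size(28, 3, stride=2, padding=0))  # 13
```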
max_pool2d(x, kernel_size=2, stride=None)¶
Max pooling:
```python
output = lk.max_pool2d(x, kernel_size=2)
# Reduces spatial dimensions by half
```
avg_pool2d(x, kernel_size=2, stride=None)¶
Average pooling:
```python
output = lk.avg_pool2d(x, kernel_size=2)
```
global_avg_pool2d(x)¶
Global average pooling:
```python
# Input: (batch, channels, H, W)
# Output: (batch, channels)
output = lk.global_avg_pool2d(x)
```
Weight Initialization¶
xavier_uniform(shape, gain=1.0)¶
Xavier/Glorot uniform initialization (for tanh/sigmoid):
```python
weights = lk.xavier_uniform((784, 256))
```
xavier_normal(shape, gain=1.0)¶
Xavier/Glorot normal initialization:
```python
weights = lk.xavier_normal((784, 256))
```
he_uniform(shape)¶
He/Kaiming uniform initialization (for ReLU):
```python
weights = lk.he_uniform((784, 256))
```
he_normal(shape)¶
He/Kaiming normal initialization:
```python
weights = lk.he_normal((784, 256))
```
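He initialization keeps the variance of ReLU activations roughly constant across layers by drawing weights with standard deviation sqrt(2 / fan_in). A NumPy sketch of the normal variant, with an empirical check of that standard deviation (the library may differ in details such as gain handling):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 784
# He-normal: draw from N(0, sqrt(2 / fan_in)).
w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, 256))
print(w.std())                 # empirical std over ~200k samples
print(np.sqrt(2.0 / fan_in))   # target std, ~0.0505
```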
Utility Functions¶
dropout(x, p=0.5, training=True)¶
Dropout regularization:
```python
# During training (randomly zeros elements)
x_dropped = lk.dropout(x, p=0.5, training=True)

# During inference (returns input unchanged)
x_out = lk.dropout(x, p=0.5, training=False)
```
one_hot(indices, num_classes)¶
One-hot encoding:
```python
labels = np.array([0, 2, 1])
encoded = lk.one_hot(labels, num_classes=3)
# [[1, 0, 0],
#  [0, 0, 1],
#  [0, 1, 0]]
```
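Under the hood, one-hot encoding is equivalent to indexing rows of an identity matrix, which is a handy one-liner when working in plain NumPy:

```python
import numpy as np

labels = np.array([0, 2, 1])
# Row i of eye(n) is the one-hot vector for class i.
encoded = np.eye(3, dtype=int)[labels]
print(encoded)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```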
clip(x, min_val, max_val)¶
Clip values to a range:
```python
x_clipped = lk.clip(x, -1.0, 1.0)
```
flatten(x, start_dim=0)¶
Flatten tensor:
```python
# Input: (batch, C, H, W)
# Output: (batch, C*H*W) if start_dim=1
x_flat = lk.flatten(x, start_dim=1)
```
reshape(x, shape)¶
Reshape array:
```python
x_reshaped = lk.reshape(x, (batch_size, -1))
```
Advanced Math Functions¶
normalize(x, axis=-1, epsilon=1e-12)¶
L2 normalize along axis:
```python
x_normalized = lk.normalize(x)
# ||x|| = 1 along the specified axis
```
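L2 normalization simply divides by the vector's Euclidean norm, with epsilon guarding against zero vectors. A plain-NumPy sketch matching the signature above (`l2_normalize` is an illustrative name):

```python
import numpy as np

def l2_normalize(x, axis=-1, epsilon=1e-12):
    """Divide by the L2 norm along axis; epsilon avoids division by zero."""
    norm = np.sqrt(np.sum(x ** 2, axis=axis, keepdims=True))
    return x / np.maximum(norm, epsilon)

v = np.array([[3.0, 4.0]])  # norm 5
print(l2_normalize(v))  # [[0.6 0.8]]
```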
cosine_similarity(a, b, axis=-1)¶
Cosine similarity:
```python
similarity = lk.cosine_similarity(a, b)
# Range: [-1, 1]
```
euclidean_distance(a, b, axis=-1)¶
Euclidean distance:
```python
distance = lk.euclidean_distance(a, b)
```
pairwise_distances(X, Y=None)¶
Compute all pairwise distances:
```python
# X: (n, features), Y: (m, features)
# Output: (n, m) distance matrix
distances = lk.pairwise_distances(X, Y)
```
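A common vectorized way to compute this uses the expansion ||x - y||² = ||x||² - 2x·y + ||y||², avoiding any Python-level loops. A NumPy sketch (`pairwise_euclidean` is illustrative; the library's implementation may differ):

```python
import numpy as np

def pairwise_euclidean(X, Y=None):
    """(n, m) Euclidean distance matrix via ||x||^2 - 2 x.y + ||y||^2."""
    if Y is None:
        Y = X
    sq = (X ** 2).sum(axis=1)[:, None] - 2 * X @ Y.T + (Y ** 2).sum(axis=1)[None, :]
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives from rounding

X = np.array([[0.0, 0.0], [3.0, 4.0]])
print(pairwise_euclidean(X))
# [[0. 5.]
#  [5. 0.]]
```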
numerical_gradient(f, x, epsilon=1e-7)¶
Compute numerical gradient:
```python
def loss_fn(w):
    return np.sum(w ** 2)

grad = lk.numerical_gradient(loss_fn, weights)
```
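Numerical gradients are usually computed with central differences, perturbing one coordinate at a time. A self-contained NumPy sketch (illustrative; the library's variant may use forward differences instead):

```python
import numpy as np

def numerical_gradient(f, x, epsilon=1e-7):
    """Central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps) per coordinate."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        orig = x[i]
        x[i] = orig + epsilon
        f_plus = f(x)
        x[i] = orig - epsilon
        f_minus = f(x)
        x[i] = orig  # restore before moving to the next coordinate
        grad[i] = (f_plus - f_minus) / (2 * epsilon)
    return grad

w = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v ** 2), w))  # ≈ [2, -4, 6], i.e. 2w
```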
outer(a, b)¶
Outer product:
```python
result = lk.outer(a, b)  # equivalent to a[:, None] * b[None, :]
```
inner(a, b)¶
Inner product:
```python
result = lk.inner(a, b)
```
dot(a, b)¶
Dot product:
```python
result = lk.dot(a, b)
```
cross(a, b)¶
Cross product (3D vectors):
```python
result = lk.cross(a, b)
```
Examples¶
Example 1: Simple Neural Network Forward Pass¶
```python
import LinAlgKit as lk
import numpy as np

# Initialize weights
W1 = lk.he_normal((784, 128))
W2 = lk.he_normal((128, 10))

# Forward pass
def forward(x):
    # Layer 1
    h1 = lk.relu(x @ W1)
    h1 = lk.dropout(h1, p=0.2, training=True)
    # Layer 2
    logits = h1 @ W2
    probs = lk.softmax(logits)
    return probs

# Example input
x = np.random.randn(32, 784)
output = forward(x)
print(f"Output shape: {output.shape}")  # (32, 10)
```
Example 2: Convolutional Layer¶
```python
import LinAlgKit as lk
import numpy as np

# Input image batch
images = np.random.randn(16, 3, 32, 32)  # (batch, channels, H, W)

# Convolution kernel
kernel = lk.he_normal((64, 3, 3, 3))  # (out_ch, in_ch, kH, kW)

# Forward pass
conv_out = lk.conv2d(images, kernel, stride=1, padding=1)
conv_out = lk.batch_norm(conv_out)
conv_out = lk.relu(conv_out)
pooled = lk.max_pool2d(conv_out, kernel_size=2)

print(f"After conv: {conv_out.shape}")  # (16, 64, 32, 32)
print(f"After pool: {pooled.shape}")    # (16, 64, 16, 16)
```
Example 3: Training Step with Loss¶
```python
import LinAlgKit as lk
import numpy as np

# Predictions and targets
logits = np.random.randn(32, 10)
targets = np.random.randint(0, 10, size=32)

# Compute loss
probs = lk.softmax(logits)
loss = lk.cross_entropy_loss(probs, targets)
print(f"Cross-entropy loss: {loss:.4f}")

# For regression
predictions = np.random.randn(32, 1)
regression_targets = np.random.randn(32, 1)
mse = lk.mse_loss(predictions, regression_targets)
print(f"MSE loss: {mse:.4f}")
```
Function Reference Table¶
| Category | Functions |
|---|---|
| Activations | sigmoid, relu, leaky_relu, elu, gelu, swish, softplus, tanh, softmax, log_softmax |
| Derivatives | sigmoid_derivative, relu_derivative, leaky_relu_derivative, elu_derivative, tanh_derivative |
| Losses | mse_loss, mae_loss, huber_loss, cross_entropy_loss, binary_cross_entropy |
| Normalization | batch_norm, layer_norm, instance_norm |
| Convolution | conv2d, max_pool2d, avg_pool2d, global_avg_pool2d |
| Initialization | xavier_uniform, xavier_normal, he_uniform, he_normal |
| Utilities | dropout, one_hot, clip, flatten, reshape |
| Math | normalize, cosine_similarity, euclidean_distance, pairwise_distances, numerical_gradient, outer, inner, dot, cross, norm |
For matrix operations, see API Reference.