Deep Learning Functions Reference¶
LinAlgKit provides comprehensive mathematical functions for building neural networks and deep learning applications.
Table of Contents¶
- Activation Functions
- Loss Functions
- Normalization
- Convolution Operations
- Weight Initialization
- Utility Functions
- Advanced Math
- Examples
Activation Functions¶
sigmoid(x)¶
Sigmoid activation: σ(x) = 1 / (1 + exp(-x))
```python
import LinAlgKit as lk
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
output = lk.sigmoid(x)
# [0.119, 0.269, 0.5, 0.731, 0.881]
```
Properties:
- Output range: (0, 1)
- Used for: binary classification, gates in LSTMs
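A naive `1 / (1 + np.exp(-x))` overflows for large negative inputs. Implementations typically branch on the sign of `x` to keep the exponent non-positive; a plain-NumPy sketch of that trick (`stable_sigmoid` is an illustrative name, not a LinAlgKit function):

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid computed so that exp() never receives a large positive argument."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the textbook formula is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, rewrite as exp(x) / (1 + exp(x)); exp(x) <= 1 here.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # saturates cleanly to 0, 0.5, 1
```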
relu(x)¶
Rectified Linear Unit: ReLU(x) = max(0, x)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.relu(x)
# [0, 0, 0, 1, 2]
```
Properties:
- Output range: [0, ∞)
- Fast to compute
- Can cause the "dying ReLU" problem (neurons stuck at zero output)
leaky_relu(x, alpha=0.01)¶
Leaky ReLU: f(x) = x if x > 0, else α*x
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.leaky_relu(x, alpha=0.1)
# [-0.2, -0.1, 0, 1, 2]
```
Properties:
- Prevents dying ReLU
- α is typically 0.01 or 0.1
elu(x, alpha=1.0)¶
Exponential Linear Unit: f(x) = x if x > 0, else α*(exp(x) - 1)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.elu(x)
# [-0.865, -0.632, 0, 1, 2]
```
Properties:
- Smooth for negative values
- Mean activations closer to zero
gelu(x)¶
Gaussian Error Linear Unit (used in BERT, GPT):
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.gelu(x)
# [-0.045, -0.158, 0, 0.841, 1.955]
```
Properties:
- Smooth, differentiable everywhere
- The default activation in most modern transformer models
swish(x, beta=1.0)¶
Self-gated activation: f(x) = x * sigmoid(β*x)
```python
x = np.array([-2, -1, 0, 1, 2])
output = lk.swish(x, beta=1.0)
```
Properties:
- Smooth, non-monotonic
- Reported to outperform ReLU in some deep networks
softmax(x, axis=-1)¶
Converts logits to probabilities:
```python
logits = np.array([[2.0, 1.0, 0.1]])
probs = lk.softmax(logits)
# [[0.659, 0.242, 0.099]] (sums to 1)
```
Properties:
- Output sums to 1
- Used for multi-class classification
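Softmax implementations conventionally subtract the row maximum before exponentiating, since the result is mathematically unchanged but `exp()` can no longer overflow. A minimal NumPy sketch of that trick (`stable_softmax` is an illustrative name, not a LinAlgKit function):

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax with the max subtracted first so exp() cannot overflow."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])  # a naive exp() would overflow here
print(stable_softmax(logits))  # rows still sum to 1, no overflow warnings
```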
log_softmax(x, axis=-1)¶
Numerically stable log of softmax:
```python
log_probs = lk.log_softmax(logits)
```
Use case: Computing cross-entropy loss efficiently
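Working in log space avoids taking the log of near-zero probabilities. One standard way to combine log-softmax with the negative log-likelihood, sketched in plain NumPy (`cross_entropy_from_logits` is a hypothetical helper, not a LinAlgKit function):

```python
import numpy as np

def log_softmax(x, axis=-1):
    """log(softmax(x)) via the log-sum-exp trick: x - max - log(sum(exp(x - max)))."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

def cross_entropy_from_logits(logits, targets):
    """Mean negative log-likelihood of the target classes, straight from logits."""
    log_probs = log_softmax(logits)
    return -np.mean(log_probs[np.arange(len(targets)), targets])

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
targets = np.array([0, 1])
print(cross_entropy_from_logits(logits, targets))
```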
softplus(x)¶
Smooth approximation of ReLU: f(x) = log(1 + exp(x))
```python
output = lk.softplus(x)
```
tanh(x)¶
Hyperbolic tangent:
```python
output = lk.tanh(x)
# Range: (-1, 1)
```
Loss Functions¶
mse_loss(predictions, targets, reduction='mean')¶
Mean Squared Error for regression:
```python
pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.1, 2.2, 2.8])
loss = lk.mse_loss(pred, target)
# 0.03
```
mae_loss(predictions, targets, reduction='mean')¶
Mean Absolute Error (L1 loss):
```python
loss = lk.mae_loss(pred, target)
```
huber_loss(predictions, targets, delta=1.0, reduction='mean')¶
Robust loss combining MSE and MAE:
```python
loss = lk.huber_loss(pred, target, delta=1.0)
```
Properties:
- Quadratic for small errors
- Linear for large errors (robust to outliers)
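The two regimes can be seen directly in the standard Huber definition, sketched here in NumPy (`huber` is illustrative; LinAlgKit's exact reduction behavior may differ):

```python
import numpy as np

def huber(pred, target, delta=1.0):
    """0.5 * err**2 within |err| <= delta, delta * (|err| - 0.5 * delta) beyond it."""
    err = pred - target
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.mean(np.where(abs_err <= delta, quadratic, linear))

pred = np.array([1.0, 2.0, 10.0])    # last prediction is an outlier
target = np.array([1.1, 2.2, 3.0])
# The 7.0 error contributes linearly (6.5), not quadratically (24.5)
print(huber(pred, target, delta=1.0))
```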
cross_entropy_loss(predictions, targets, epsilon=1e-12)¶
Cross-entropy for multi-class classification:
```python
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
targets = np.array([0, 1])  # Class indices
loss = lk.cross_entropy_loss(probs, targets)
```
binary_cross_entropy(predictions, targets, epsilon=1e-12)¶
Binary cross-entropy for binary classification:
```python
probs = np.array([0.9, 0.1, 0.8])
targets = np.array([1, 0, 1])
loss = lk.binary_cross_entropy(probs, targets)
```
Normalization Functions¶
batch_norm(x, gamma=None, beta=None, epsilon=1e-5, axis=0)¶
Batch normalization:
```python
# x shape: (batch_size, features)
x_norm = lk.batch_norm(x, gamma=scale, beta=shift)
```
Properties:
- Normalizes across the batch dimension
- Stabilizes training (originally motivated by reducing internal covariate shift)
layer_norm(x, gamma=None, beta=None, epsilon=1e-5)¶
Layer normalization (used in transformers):
```python
# Normalizes across the feature dimension
x_norm = lk.layer_norm(x)
```
Properties:
- Normalizes across features, not the batch
- Works with any batch size
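The operation is simple enough to state in a few lines. A reference NumPy sketch matching the signature above (the library's internals may differ):

```python
import numpy as np

def layer_norm(x, gamma=None, beta=None, epsilon=1e-5):
    """Normalize each sample over its last (feature) axis, then optionally scale/shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_norm = (x - mean) / np.sqrt(var + epsilon)
    if gamma is not None:
        x_norm = x_norm * gamma
    if beta is not None:
        x_norm = x_norm + beta
    return x_norm

x = np.random.default_rng(0).standard_normal((4, 8))
out = layer_norm(x)
print(out.mean(axis=-1))  # ~0 for every row
print(out.std(axis=-1))   # ~1 for every row
```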
instance_norm(x, epsilon=1e-5)¶
Instance normalization (for style transfer):
```python
# x shape: (batch, channels, height, width)
x_norm = lk.instance_norm(x)
```
Convolution Operations¶
conv2d(x, kernel, stride=1, padding=0)¶
2D convolution:
```python
# Input: (batch, channels, H, W) or (H, W)
# Kernel: (out_channels, in_channels, kH, kW) or (kH, kW)
image = np.random.randn(1, 1, 28, 28)
kernel = np.random.randn(32, 1, 3, 3)
output = lk.conv2d(image, kernel, stride=1, padding=1)
# Output shape: (1, 32, 28, 28)
```
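The spatial output size above follows the standard convolution formula floor((size + 2·padding - kernel) / stride) + 1. A quick helper to check shapes before running a layer (`conv2d_output_size` is illustrative, not part of LinAlgKit):

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a 2D convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv2d_output_size(28, 3, stride=1, padding=1))  # 28 ("same" padding)
print(conv2d_output_size(28, 3, stride=2, padding=0))  # 13
```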
max_pool2d(x, kernel_size=2, stride=None)¶
Max pooling:
```python
output = lk.max_pool2d(x, kernel_size=2)
# Reduces spatial dimensions by half
```
avg_pool2d(x, kernel_size=2, stride=None)¶
Average pooling:
```python
output = lk.avg_pool2d(x, kernel_size=2)
```
global_avg_pool2d(x)¶
Global average pooling:
```python
# Input: (batch, channels, H, W)
# Output: (batch, channels)
output = lk.global_avg_pool2d(x)
```
Weight Initialization¶
xavier_uniform(shape, gain=1.0)¶
Xavier/Glorot uniform initialization (for tanh/sigmoid):
```python
weights = lk.xavier_uniform((784, 256))
```
xavier_normal(shape, gain=1.0)¶
Xavier/Glorot normal initialization:
```python
weights = lk.xavier_normal((784, 256))
```
he_uniform(shape)¶
He/Kaiming uniform initialization (for ReLU):
```python
weights = lk.he_uniform((784, 256))
```
he_normal(shape)¶
He/Kaiming normal initialization:
```python
weights = lk.he_normal((784, 256))
```
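He initialization keeps the variance of ReLU activations roughly constant across layers by drawing weights with standard deviation sqrt(2 / fan_in). A NumPy sketch of the normal variant, with an empirical check of that standard deviation (the library may differ in details such as gain handling):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 784
# He-normal: draw from N(0, sqrt(2 / fan_in)).
w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, 256))
print(w.std())                 # empirical std over ~200k samples
print(np.sqrt(2.0 / fan_in))   # target std, ~0.0505
```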
Utility Functions¶
dropout(x, p=0.5, training=True)¶
Dropout regularization:
```python
# During training (randomly zeros elements)
x_dropped = lk.dropout(x, p=0.5, training=True)

# During inference (returns input unchanged)
x_out = lk.dropout(x, p=0.5, training=False)
```
one_hot(indices, num_classes)¶
One-hot encoding:
```python
labels = np.array([0, 2, 1])
encoded = lk.one_hot(labels, num_classes=3)
# [[1, 0, 0],
#  [0, 0, 1],
#  [0, 1, 0]]
```
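Under the hood, one-hot encoding is equivalent to indexing rows of an identity matrix, which is a handy one-liner when working in plain NumPy:

```python
import numpy as np

labels = np.array([0, 2, 1])
# Row i of eye(n) is the one-hot vector for class i.
encoded = np.eye(3, dtype=int)[labels]
print(encoded)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```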
clip(x, min_val, max_val)¶
Clip values to a range:
```python
x_clipped = lk.clip(x, -1.0, 1.0)
```
flatten(x, start_dim=0)¶
Flatten tensor:
```python
# Input: (batch, C, H, W)
# Output: (batch, C*H*W) if start_dim=1
x_flat = lk.flatten(x, start_dim=1)
```
reshape(x, shape)¶
Reshape array:
```python
x_reshaped = lk.reshape(x, (batch_size, -1))
```
Advanced Math Functions¶
normalize(x, axis=-1, epsilon=1e-12)¶
L2 normalize along axis:
```python
x_normalized = lk.normalize(x)
# ||x|| = 1 along the specified axis
```
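L2 normalization simply divides by the vector's Euclidean norm, with epsilon guarding against zero vectors. A plain-NumPy sketch matching the signature above (`l2_normalize` is an illustrative name):

```python
import numpy as np

def l2_normalize(x, axis=-1, epsilon=1e-12):
    """Divide by the L2 norm along axis; epsilon avoids division by zero."""
    norm = np.sqrt(np.sum(x ** 2, axis=axis, keepdims=True))
    return x / np.maximum(norm, epsilon)

v = np.array([[3.0, 4.0]])  # norm 5
print(l2_normalize(v))  # [[0.6 0.8]]
```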
cosine_similarity(a, b, axis=-1)¶
Cosine similarity:
```python
similarity = lk.cosine_similarity(a, b)
# Range: [-1, 1]
```
euclidean_distance(a, b, axis=-1)¶
Euclidean distance:
```python
distance = lk.euclidean_distance(a, b)
```
pairwise_distances(X, Y=None)¶
Compute all pairwise distances:
```python
# X: (n, features), Y: (m, features)
# Output: (n, m) distance matrix
distances = lk.pairwise_distances(X, Y)
```
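A common vectorized way to compute this uses the expansion ||x - y||² = ||x||² - 2x·y + ||y||², avoiding any Python-level loops. A NumPy sketch (`pairwise_euclidean` is illustrative; the library's implementation may differ):

```python
import numpy as np

def pairwise_euclidean(X, Y=None):
    """(n, m) Euclidean distance matrix via ||x||^2 - 2 x.y + ||y||^2."""
    if Y is None:
        Y = X
    sq = (X ** 2).sum(axis=1)[:, None] - 2 * X @ Y.T + (Y ** 2).sum(axis=1)[None, :]
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives from rounding

X = np.array([[0.0, 0.0], [3.0, 4.0]])
print(pairwise_euclidean(X))
# [[0. 5.]
#  [5. 0.]]
```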
numerical_gradient(f, x, epsilon=1e-7)¶
Compute numerical gradient:
```python
def loss_fn(w):
    return np.sum(w ** 2)

grad = lk.numerical_gradient(loss_fn, weights)
```
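Numerical gradients are usually computed with central differences, perturbing one coordinate at a time. A self-contained NumPy sketch (illustrative; the library's variant may use forward differences instead):

```python
import numpy as np

def numerical_gradient(f, x, epsilon=1e-7):
    """Central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps) per coordinate."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        orig = x[i]
        x[i] = orig + epsilon
        f_plus = f(x)
        x[i] = orig - epsilon
        f_minus = f(x)
        x[i] = orig  # restore before moving to the next coordinate
        grad[i] = (f_plus - f_minus) / (2 * epsilon)
    return grad

w = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v ** 2), w))  # ≈ [2, -4, 6], i.e. 2w
```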
outer(a, b)¶
Outer product:
```python
result = lk.outer(a, b)  # equivalent to a[:, None] * b[None, :]
```
inner(a, b)¶
Inner product:
```python
result = lk.inner(a, b)
```
dot(a, b)¶
Dot product:
```python
result = lk.dot(a, b)
```
cross(a, b)¶
Cross product (3D vectors):
```python
result = lk.cross(a, b)
```
Examples¶
Example 1: Simple Neural Network Forward Pass¶
```python
import LinAlgKit as lk
import numpy as np

# Initialize weights
W1 = lk.he_normal((784, 128))
W2 = lk.he_normal((128, 10))

# Forward pass
def forward(x):
    # Layer 1
    h1 = lk.relu(x @ W1)
    h1 = lk.dropout(h1, p=0.2, training=True)
    # Layer 2
    logits = h1 @ W2
    probs = lk.softmax(logits)
    return probs

# Example input
x = np.random.randn(32, 784)
output = forward(x)
print(f"Output shape: {output.shape}")  # (32, 10)
```
Example 2: Convolutional Layer¶
```python
import LinAlgKit as lk
import numpy as np

# Input image batch
images = np.random.randn(16, 3, 32, 32)  # (batch, channels, H, W)

# Convolution kernel
kernel = lk.he_normal((64, 3, 3, 3))  # (out_ch, in_ch, kH, kW)

# Forward pass
conv_out = lk.conv2d(images, kernel, stride=1, padding=1)
conv_out = lk.batch_norm(conv_out)
conv_out = lk.relu(conv_out)
pooled = lk.max_pool2d(conv_out, kernel_size=2)

print(f"After conv: {conv_out.shape}")  # (16, 64, 32, 32)
print(f"After pool: {pooled.shape}")    # (16, 64, 16, 16)
```
Example 3: Training Step with Loss¶
```python
import LinAlgKit as lk
import numpy as np

# Predictions and targets
logits = np.random.randn(32, 10)
targets = np.random.randint(0, 10, size=32)

# Compute loss
probs = lk.softmax(logits)
loss = lk.cross_entropy_loss(probs, targets)
print(f"Cross-entropy loss: {loss:.4f}")

# For regression
predictions = np.random.randn(32, 1)
regression_targets = np.random.randn(32, 1)
mse = lk.mse_loss(predictions, regression_targets)
print(f"MSE loss: {mse:.4f}")
```
Function Reference Table¶
| Category | Functions |
|---|---|
| Activations | sigmoid, relu, leaky_relu, elu, gelu, swish, softplus, tanh, softmax, log_softmax |
| Derivatives | sigmoid_derivative, relu_derivative, leaky_relu_derivative, elu_derivative, tanh_derivative |
| Losses | mse_loss, mae_loss, huber_loss, cross_entropy_loss, binary_cross_entropy |
| Normalization | batch_norm, layer_norm, instance_norm |
| Convolution | conv2d, max_pool2d, avg_pool2d, global_avg_pool2d |
| Initialization | xavier_uniform, xavier_normal, he_uniform, he_normal |
| Utilities | dropout, one_hot, clip, flatten, reshape |
| Math | normalize, cosine_similarity, euclidean_distance, pairwise_distances, numerical_gradient, outer, inner, dot, cross, norm |
For matrix operations, see API Reference.