Integer-Only Softmax Experiment for LLM Inferencing

Test Configuration

Transformer Model: Select the transformer model to use for testing

Test Mode:

Single Prompt | Batch Testing

Use a single prompt for quick tests, or a batch for comprehensive evaluation.

Prompt: (Tip: the default country in the prompt changes randomly on each page load.)

Select Batch:

Prompts (one per line):

Number of tokens to generate: Higher values give more comprehensive error statistics but take longer to run.

Advanced Configuration

Configure exp(x) range and parameters for different methods

Configure the exp(x) range for different method types. Single-table methods are limited by int16 precision.

xmax for Single-Table Methods (LUTi16, LUTi8):

Tradeoff: Higher xmax covers more extreme values but spreads quantization precision thinner, causing larger errors on typical small values. Lower xmax gives better precision but clips extreme outliers.
Recommendation: Use 6-10 for balanced coverage. Single tables struggle with wide ranges; use DIGmax for xmax > 10.
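
The xmax tradeoff for a single table can be illustrated with a minimal sketch. It assumes details the page does not state: a 256-entry table of exp(-x) over [0, xmax] (softmax inputs after subtracting the row maximum), values scaled to the int16 range, and nearest-entry lookup. The names build_exp_lut, lut_exp and max_error are hypothetical, and floating point is used here only to build the table and measure the error.

```python
import math

def build_exp_lut(xmax, entries=256, scale=32767):
    """Table of exp(-x) for x in [0, xmax], stored as int16-range integers.

    entries=256 and scale=32767 are assumptions made for this sketch.
    """
    step = xmax / (entries - 1)
    table = [round(scale * math.exp(-i * step)) for i in range(entries)]
    return table, step

def lut_exp(x, table, step, scale=32767):
    """Approximate exp(-x) by nearest-entry lookup; x beyond xmax is clipped."""
    idx = min(int(x / step + 0.5), len(table) - 1)
    return table[idx] / scale

def max_error(xmax, probe_max=2.0, samples=2000):
    """Worst absolute error over typical small inputs in [0, probe_max]."""
    table, step = build_exp_lut(xmax)
    worst = 0.0
    for k in range(samples + 1):
        x = probe_max * k / samples
        worst = max(worst, abs(lut_exp(x, table, step) - math.exp(-x)))
    return worst

# Same table size, different coverage: the wider range has a coarser step,
# so the error on small inputs grows.
err_narrow = max_error(xmax=8.0)
err_wide = max_error(xmax=32.0)
```

Under these assumptions err_wide comes out larger than err_narrow, which is the "spreads precision thinner" effect described above; inputs past xmax simply return the last table entry.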

xmax for Multi-Table Methods (DIGmax):

Tradeoff: Higher xmax accommodates extreme outliers with multiple tables adapting precision. Lower xmax reduces memory and improves precision for typical ranges.
Recommendation: Use 20-40 for robust handling of diverse attention patterns. DIGmax excels at wide ranges, so a higher xmax adds safety without major accuracy loss.

Polynomial Order (for Polynomial method):

Tradeoff: Higher orders (6-7) improve accuracy for large x but require more multiply-accumulate operations and risk numerical overflow. Lower orders (2-3) are faster but only accurate for small x values.
Recommendation: Order 5 balances accuracy and computational cost. Use Order 3-4 for ultra-low-power, Order 6-7 for research comparisons.
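
How the order affects accuracy can be sketched with a fixed-point Horner evaluation. The page does not say which polynomial the tool uses, so this example assumes a plain Taylor expansion of exp(x) with Q16 coefficients; FRAC_BITS, taylor_coeffs, poly_exp_fixed and abs_error are hypothetical names.

```python
import math

FRAC_BITS = 16          # assumed Q16 fixed-point format
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def taylor_coeffs(order):
    """Taylor coefficients 1/k! of exp(x), rounded to Q16 integers."""
    return [round(ONE / math.factorial(k)) for k in range(order + 1)]

def poly_exp_fixed(x_q, order):
    """Evaluate the polynomial with Horner's rule using integers only.

    Each step is one multiply, one shift and one add. The product
    acc * x_q needs a wider accumulator on real hardware; Python
    integers are unbounded, so overflow cannot be observed here.
    """
    coeffs = taylor_coeffs(order)
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = ((acc * x_q) >> FRAC_BITS) + c
    return acc

def abs_error(x, order):
    """Absolute error of the fixed-point result against math.exp."""
    x_q = round(x * ONE)
    return abs(poly_exp_fixed(x_q, order) / ONE - math.exp(x))

# Error at x = -1.0 for a low, a middle and the recommended order.
errors = {order: abs_error(-1.0, order) for order in (2, 3, 5)}
```

In this sketch the error at x = -1.0 shrinks as the order goes from 2 to 3 to 5, while each extra order adds one more multiply-accumulate step to the loop.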

Number of Tables for DIGmax (Exponential/Log):

Tradeoff: More tables provide finer-grained range adaptation, improving accuracy across diverse attention patterns but increasing memory footprint. Fewer tables reduce memory but force precision compromises.
Recommendation: 6 tables (~3KB) is optimal for embedded systems. Use 8-12 for best accuracy, 3-4 for extreme memory constraints.

Number of Tables for DIGmax (Linear):

Tradeoff: Linear distribution lacks exponential adaptation, requiring many more tables (256+) to match accuracy of 6 logarithmic tables. Fewer tables severely degrade precision. More tables approach logarithmic accuracy but waste memory.
Note: Exponential/log distribution is generally superior. Use linear only for controlled benchmarking or specific hardware constraints.
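
The difference between the two table distributions can be sketched as follows. The page does not describe how DIGmax actually partitions its range, so this is only one plausible reading: each table covers a sub-range of [0, xmax], with boundaries either evenly spaced (linear) or doubling in width (exponential). The function names, the 256 entries per table and the 2 bytes per entry are all assumptions; under them, 6 tables come to 3072 bytes, which is consistent with the "~3KB" figure quoted above.

```python
def linear_edges(xmax, n_tables):
    """Evenly spaced sub-range boundaries over [0, xmax]."""
    return [xmax * i / n_tables for i in range(n_tables + 1)]

def exponential_edges(xmax, n_tables):
    """Boundaries that double in width: 0, xmax/2^(n-1), ..., xmax/2, xmax."""
    return [0.0] + [xmax / 2 ** (n_tables - 1 - i) for i in range(n_tables)]

def table_bytes(n_tables, entries_per_table=256, bytes_per_entry=2):
    """Memory footprint, assuming 256 int16 entries per table."""
    return n_tables * entries_per_table * bytes_per_entry

lin = linear_edges(32.0, 6)
exp_ = exponential_edges(32.0, 6)

# Width of the first sub-range, where exp(-x) changes fastest.
first_width_linear = lin[1] - lin[0]
first_width_exponential = exp_[1] - exp_[0]

footprint = table_bytes(6)
```

With xmax = 32 and 6 tables, the exponential layout gives the first table a sub-range of width 1, while the linear layout gives it a width of about 5.3, so the linear layout needs many more tables to resolve small inputs equally well.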

Select Implementations to Test

Click cards to select/deselect implementations. Each will run separately and results will be compared.
