Int
Float
Decimal
Byte
Literals
Compiler
Type System
JBallerina Runtime
Blog Post
Designing Numeric Literals for Ballerina
Today, I’d like to share our experience and lessons learned while designing the numeric type system for Ballerina Lang.
History
In the early days of Ballerina, we adopted int
, long
, byte
, float
, and double
as dedicated types for each common case. At this stage, we were primarily drawing inspiration from other languages and didn’t fully consider the implications of having numerous numerical data types. Our approach was greatly influenced by Java because at that time Ballerina was initially a JVM-based interpreted language.
However, we soon realized that maintaining separate data types was burdensome for our users. Ballerina’s primary target audience were not hardcore developers, but individuals who primarily used to work with DSL (Domain Specific Languages) and low-code/no-code editors. At this level of development, the complexity of having multiple types was a major drawback and which led to a lot of confusion among developers. (when to use int
vs long
vs byte
vs float
vs double
etc.)
Concurrently, we identified a requirement to introduce decimal
as a primary type, given Ballerina’s usage for network integration and the handling of financial data - one of its primary use cases. Consequently, the inclusion of a more precise type like decimal
became a must. Another requirement was to support decimal is to support pure JSON numbers.
Considering these new requirements, we revisited our approach and looked into modern language practices. Simultaneously, we adopted a structure-based type checking system which greatly simplified our problem.
Solution
We ended up with three built-in basic types: int
, float
, and decimal
.
int
- Represents 64-bit signed integer valuesfloat
- Represents 64-bit IEEE 754-2008 binary floating-point numbersdecimal
- Represents 128-bit IEEE 754-2008 decimal floating-point values
It is simple to explain and supports common use cases (I will cover to Performance aspects later. :) ). With help of union types, we defined other types such as byte
, signed32
, unsigned32
, signed16
, unsigned16
, signed8
, and unsigned8
as subtypes of int
. This was done to reduce complexity and to still provide a range of types for different advanced use cases. For example, the byte
type is defined as a union of integers between 0 and 255, inclusive. The same principle applies to the other integer subtypes.
This design facilitated better support for JSON. Ballerina defines the json type as a union of ()|int|float|decimal|boolean|string|json[]|map<json>
, preferring decimal
as the default numeric type when constructing a json value from a string.
Another design decision we made was to not to support implicit conversion among numerical types. I will discuss this in detail next.
The basic idea is to make the static typing simple and easy to understand. But at runtime, with enough information, we can represent a value in best optimized way, which is an implementation detail and mostly developers don’t want to know.
Working with Literals
In Ballerina, a value written as a numeric literal always represents a specific type, which is determined by the literal itself. The type of a literal can be one of the basic types, such as int
, float
, or decimal
.
For example, the literal 10 represents the integer value
10, and its basic type is int
. However, in some contexts, the same literal 10 can also represent a floating-point value 10.0 or a decimal value 10. Depending on the context, the compiler determines the appropriate type of the literal to use.
To determine the type of numeric literals, we have defined a 3-step algorithm. To help explain this, I’ve included a link to a playground that visualizes the process.
Auto-Casting and Conversion
Ballerina does not support implicit conversion among numerical types. This safeguard helps prevent unintended loss of precision and unexpected program behavior.
For example, the following code would result in a compile-time error:
Compiling source
numeric-error.bal
ERROR [numeric-error.bal:(2:13,2:19)] operator '+' not defined for 'decimal' and 'float'
ERROR [numeric-error.bal:(3:9,3:12)] incompatible types: expected 'int', found 'float'
error: compilation contains errors
bal version
Ballerina 2201.6.0 (Swan Lake Update 6)
However, we do allow safe type inference from literals, as seen in float x = 1. This rationale is clarified in the earlier mentioned algorithm.
So, while we can’t entirely prevent misuse through adding a cast and the like, Ballerina’s strict typing and explicit conversion requirements at least alert the user to potential issues during compile time. This is particularly important for our main focus - network integration - where identifying such issues at compile time rather than runtime is crucial.
Performance Implications
It’s clear that using the new model to represent an int32 list requires the allocation of an int64 list, which isn’t optimal. For a byte list, this could even be considered overkill. However, in order to maintain performance, byte[] is specially handled in the runtime.
While there are future plans to allocate memory based on static type for other integer types, currently they’re all modeled as int64. Given that Ballerina’s target applications are not system applications (such as OS development, low-level apps), this is a known compromise we’ve had to make to strike a balance between ease of use and performance.
This blog originally published on Reddit /r/ProgrammingLanguages/
Table of Content
Key Words
📃 This article was originally published on Reddit /r/ProgrammingLanguages/.