
8 posts tagged with "TypR"



Conceptual limitations of LLMs

· 6 min read
Hategekimana Fabrice
IT trainer, Programming language designer

Artificial intelligence, and Large Language Models (LLMs) in particular, are making spectacular progress. Yet, behind the prevailing enthusiasm, several fundamental conceptual limitations persist. These limits are not simple bugs to fix: they are structural and deserve careful consideration. This explains why programming languages will still exist in the future.

1. The Data Limitation: AI's Achilles' Heel

The first limitation, and arguably the most well-known, concerns training data.

The majority of AI systems are trained on vast datasets, and the quality of that data directly determines the model's capabilities. You could take the best AI of the era: if you train it with erroneous, biased, or low-quality data, it won't produce anything impressive.

The Bias Problem

Data must not only be of good quality but also free from bias. There have already been cases where an AI used in a judicial context produced biased and discriminatory judgments. ProPublica's investigation into the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) software revealed that for the same offense, the sentence could vary depending on the defendant's skin color. This type of bias, inherited from training data, can have devastating consequences.

Pollution from AI-Generated Content

The problem is worsening with a recent phenomenon: most LLMs are trained on content sourced from the Internet. However, we are witnessing an explosion of AI-generated content on the web. This content, often mass-produced with no concern for quality or added value, ends up in the training data of future AI systems. This is a vicious cycle that researchers call model collapse (Shumailov et al., Nature, 2024), a degenerative process that risks progressively degrading model quality over successive generations.

2. The Context Limitation: A Failing Short-Term Memory

If training data represents an AI's long-term memory, context constitutes its short-term memory.

Context has a limited size and can only retain a certain number of elements from a project. Worse still, it resets with each new session. Granted, new techniques allow the creation of summary files to maintain some form of continuity, but these summaries themselves consume part of the available context.

The Context Window Paradox

Companies have developed AI systems with increasingly large context windows. But as demonstrated by the study Lost in the Middle: How Language Models Use Long Contexts (Liu et al., 2023), response quality degrades significantly when relevant information is located in the middle of the context. Quantity does not replace quality of processing.

The Human Advantage Over Time

This is precisely where a human eventually outperforms an AI. On a given task, the AI may perform better at first. But as the human develops and accumulates experience, their "context" improves and remains consistent. The AI's context, on the other hand, resets or becomes unreliable over time.

3. The Language Limitation: Ambiguity as a Fundamental Obstacle

This limitation is particularly relevant for code-focused AI. We often hear that AI will replace developers, or even programming languages themselves, with natural language becoming the new interface.

There is some truth to this vision, but it obscures a fundamental problem: AI systems constitute a new layer of abstraction that inherits the flaws of human languages.

The reason we created formal languages (programming languages) in the first place is precisely because they eliminate all ambiguity to produce correct machine code. Human language, by contrast, is inherently ambiguous. The same sentence can be interpreted in different ways, and the fact that AI systems operate on a probabilistic generation basis makes the endeavor even riskier.

However, some avenues exist: certain research shows that natural language can be reduced to a controlled language (Controlled Natural Language), trading expressiveness for precision. A comprehensive classification of these languages was proposed by Kuhn (2014). Other work explores hybrid approaches combining formal and natural languages.

4. The Environment Limitation: The Real World Isn't Ready

The last conceptual limitation is perhaps the most underestimated: that of the environment.

The AI systems of the future risk resembling the flying cars we were once promised. One of the main reasons flying cars never materialized isn't just technology: it's that the world wasn't ready to accommodate them. Airways already had their rules, designed for aircraft. Roads were designed for cars. Integrating a hybrid vehicle would have been a regulatory and systemic nightmare.

The Gap with Workplace Reality

The reality of a human work environment is fundamentally different. A workplace cannot be reduced to algorithmic metrics. It involves human beings interacting in an environment far more complex than any laboratory, even one equipped with AI agents. We are trying to integrate AI, robots, and machines into a universe designed by and for humans. And these machines do not yet have the capacity to fully adapt to it.

Impressive Performance, but Only in the Lab

The same applies to AI. Benchmarks from the research group METR, notably their study Measuring AI Ability to Complete Long Tasks (March 2025), show impressive results: AI systems capable of working autonomously for several hours on a given task, with a time horizon that roughly doubles every 7 months. But these tasks remain confined to specific domains (programming, security, infrastructure) and take place in research and experimental environments.


In Summary

Current LLMs, despite considerable progress in creating AI systems that reason better and can take more initiative, remain systems that:

  • are dependent on training data quality,
  • have a limited memory (context),
  • suffer from the ambiguity of natural language,
  • have a very restricted interaction with the real world.

The true challenge of creating AI capable of fully replacing human beings therefore remains wide open.

Let's Not Forget the Progress

That said, it would be unfair not to acknowledge the phenomenal progress that has been made. AI systems are now capable of performing reasoning tasks on existing knowledge, sparing us from "reinventing the wheel." A large portion of repetitive work is now automated, freeing us to focus on more important tasks: research tasks, solving open and complex problems, and challenges deeply tied to the human experience.

AI doesn't replace humans. It repositions them to do what they do best. AI doesn't replace programming languages. It covers the domains where the current level of abstraction blocks development speed, so programming languages can thrive in their own domain.

What if S3 code could write itself?

· 2 min read
Hategekimana Fabrice
IT trainer, Programming language designer

If you've ever maintained an R package with S3 classes, you know the drill: manually writing constructors, hoping field names stay consistent, debugging method dispatch when someone passes the wrong structure. S3 is powerful, but it puts all the burden on the developer.

I've been working on a tool called typr that takes a different approach. You declare your types in a concise .ty file, typr checks them statically, and then generates clean, idiomatic R/S3 code. The output is plain R — no runtime dependency, no magic.

A concrete example

Suppose you're building a package that models survey respondents:

type Respondent <- list {
    id: int,
    name: char,
    score: num
};

create_respondent <- fn(id: int, name: char, score: num): Respondent {
    list(id = id, name = name, score = score)
};

TypR checks at compile time that every function using a Respondent respects the structure. If you mistype a field or pass a char where an int is expected, you get an error before anything runs. The generated R code uses standard S3 classes that any other R package can consume.
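To make this concrete, here is a hypothetical sketch of the kind of S3 code such a tool could emit for Respondent. The function body and check names are illustrative assumptions, not typr's actual output:

```r
# Hypothetical sketch of generated S3 code for Respondent;
# the actual code typr emits may differ.
create_respondent <- function(id, name, score) {
  # runtime checks mirroring the declared field types
  stopifnot(is.numeric(id), is.character(name), is.numeric(score))
  structure(list(id = id, name = name, score = score),
            class = "Respondent")
}

r <- create_respondent(1L, "Alice", 87.5)
class(r)   # "Respondent"
```

Because the result is a plain S3 object, any other R package can consume it without knowing typr exists.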

Why not just use R7 or S4?

R7 and S4 solve the class definition problem, but they don't offer static analysis before runtime. TypR operates at a different level: it catches type errors at write time, then generates code that works with whichever class system you prefer. Think of it as a layer above the class system, not a replacement.
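For comparison, here is roughly what the same structure looks like in S4: validated, but only once the code actually runs. This is a sketch, with slot types chosen to mirror the .ty declaration above:

```r
# S4 version of the Respondent structure (illustrative name):
# the class system validates slots, but only at runtime.
setClass("Respondent4", slots = c(id    = "integer",
                                  name  = "character",
                                  score = "numeric"))

r <- new("Respondent4", id = 1L, name = "Alice", score = 87.5)
r@name   # "Alice"

# Passing a wrong type, e.g. id = "oops", also fails,
# but only when this line executes, never before.
```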

What's next

I'm preparing a proposal for the R Consortium ISC Grant Program to turn typr into a proper R package — installable, documented, with vignettes showing real package development workflows.

Before I submit, I want to hear from the community:

  • Does this match a pain point you've actually experienced?
  • What features would matter most? (data.frame column typing? editor integration? CI/CD checks?)
  • What would stop you from trying it?

You can try it now:

I'd genuinely appreciate any feedback — especially the skeptical kind. It'll make the project and the proposal better.

Fabrice — Geneva, Switzerland

Better tests with TypR

· 4 min read
Hategekimana Fabrice
IT trainer, Programming language designer

Test-Driven Development (TDD) is a well-recognized practice for improving code quality. Yet its adoption often remains limited due to the friction it imposes: juggling between code and test files, maintaining synchronization between logic and tests, and managing sometimes complex test architecture.

TypR offers an elegant solution to this problem: inline test blocks.

The Problem with Classical TDD in R

In a traditional R project using testthat, the typical structure looks like this:

my-package/
├── R/
│   └── person.R
└── tests/
    └── testthat/
        └── test-person.R

This separation between source code and tests creates several points of friction:

  • Cognitive distance: you need to navigate between two files to understand the expected behavior of a function
  • Manual synchronization: when modifying a function, you need to remember to open the corresponding test file
  • Mental overhead: the structure requires maintaining two parallel directory trees

These frictions, while seemingly minor, accumulate and can discourage TDD adoption, especially for less experienced developers.

The TypR Solution: Inline Test Blocks

TypR introduces inline test blocks, allowing you to write tests directly in the same file as the logic they test. Here's a concrete example:

# person.ty

# Type definition
type Person <- list {
    name: char,
    age: int
};

# Constructor
let new_person <- fn(name: char, age: int): Person {
    list(name = name, age = age)
};

# first method
let greet <- fn(self: Person): char {
    paste0("Hello, my name is ", self$name,
           " and I am ", self$age, " years old.")
};

# second method
let is_adult <- fn(self: Person): bool {
    self$age >= 18
};

# Test block
Test {

    # first test suite
    test_that("Person initialization works correctly", {
        let person <- new_person("Alice", 25)
        expect_equal(person$name, "Alice")
        expect_equal(person$age, 25)
    })

    # second test suite
    test_that("greet returns correct message", {
        let person <- new_person("Bob", 30)
        expect_equal(
            person.greet(),
            "Hello, my name is Bob and I am 30 years old."
        )
    })

    # third test suite
    test_that("is_adult correctly identifies adults", {
        let adult <- new_person("Charlie", 25)
        let minor <- new_person("David", 15)

        expect_true(adult.is_adult())
        expect_false(minor.is_adult())
    })
}

How Does It Work?

During transpilation, TypR takes an intelligent approach:

  1. Automatic separation: TypR code is transpiled to native R in the R/ folder
  2. Test extraction: Test { } blocks are extracted and transpiled to testthat
  3. Conventional organization: tests are automatically placed in tests/testthat/test-<filename>.R

For our person.ty example, TypR generates:

my-package/
├── R/
│   └── person.R             # Transpiled native R code
└── tests/
    └── testthat/
        └── test-person.R    # Tests extracted from the Test block

The final result is a standard, compatible R package that respects all R ecosystem conventions, while benefiting from the advantages of inline test blocks during development.

Cognitive Proximity and a Shorter Development Cycle

Logic and its tests are side by side. No more navigating between files to understand the expected behavior of a function. This proximity facilitates:

  • Code understanding for new contributors
  • Maintenance: you immediately see which tests are affected by a change
  • Living documentation: tests serve as specification directly visible

Classical TDD involves a "Red-Green-Refactor" cycle:

  • Write a failing test
  • Write minimal code to make it pass
  • Refactor

With TypR, this cycle becomes more fluid because everything happens in the same editing context. No file switching, no loss of focus.

Use Case: Test-Driven Development

Imagine we want to add a method to calculate birth year. With TypR, the TDD workflow becomes:

# ... existing code ...

let birth_year <- fn(self: Person): int {
    # TODO: implement
};

Test {
    # ... existing tests ...

    test_that("birth_year calculates correctly", {
        let person <- new_person("Eve", 30)
        let current_year <- as__integer(format(Sys.Date(), "%Y"))
        let expected_year <- current_year - 30

        expect_equal(person.birth_year(), expected_year)
    })
}
  1. I write the test first (Red)
  2. I implement the function just above (Green)
  3. I refactor if needed, with tests there to protect me

All in one file, one context, one continuous workflow.

Conclusion

Of course there are some limitations. Inline test blocks are not a silver bullet. Here are some points to consider:

File size: For functions with many tests, the file can become long. In this case, it's still possible to create traditional separate test files as a complement.

Team habits: Teams accustomed to strict code/test separation may need some adjustment time.

CI/CD integration: No impact since TypR transpiles to standard R. Your existing pipelines work without modification.

TypR's inline test blocks don't revolutionize TDD, but they make it more accessible and more enjoyable. By reducing friction between code and tests, they naturally encourage TDD practice adoption.

TypR's philosophy is simple: add typing and ergonomics to R without sacrificing ecosystem compatibility. Inline test blocks fit perfectly into this vision.

If you develop R packages and have always found TDD too constraining, TypR might just change your perspective. And if you're already convinced by TDD, inline test blocks will make your workflow even smoother.

Solving the OOP's "Chaos" for R

· 5 min read
Hategekimana Fabrice
IT trainer, Programming language designer

Introduction: The Object-Oriented Dilemma in R

For R package and application developers, the Object-Oriented Programming (OOP) landscape is, at best, a minefield. Between S3, which is informal and flexible but lacks validation; S4, which is formal but heavy; and R6, which introduces reference semantics but deviates from R's functional style, choosing an OOP system is often a painful compromise. This fragmentation slows down development, complicates maintenance, and introduces security flaws into the code.

TypR, a new typed version of the R language, offers a radical solution: the complete unification of the OOP system, reinforced by powerful static typing. Designed specifically for package developers, TypR aims to restore simplicity, speed, and security to the heart of the R ecosystem.

Unified OOP: Ending the OOP System War

TypR directly addresses the confusion by replacing existing systems (currently S3, and soon R6) with a single, coherent approach.

| R OOP System | Main Characteristic | Developer Problem | TypR Solution |
|---|---|---|---|
| S3 | Single dispatch, informal | Lack of validation, runtime errors | Replaced by a typed and validated system |
| S4 | Multiple dispatch, formal | Heavy, complex syntax | Replaced by a simple, uniform syntax |
| R6 | Reference semantics | Deviates from R idiom, fragmentation | Replaced by a unified approach with UFCS |

TypR offers an abstraction that increases development speed and simplifies maintenance: there is only one way to design robust objects.

Simplified OOP: Development Speed

In TypR, the developer focuses on defining types and functions, letting the language manage the underlying OOP implementation.

If I want to create a Person class, I just have to define a type and its constructor:

type Person <- list {
    name: char,
    age: int
};

let new_person <- fn(name: char, age: int): Person {
    list(name = name, age = age)
};

new_person("Alice", 32);
# $name
# Alice
# $age
# 32
# attr(,"class")
# [1] "Person" "any"

If you want to create a method for this type, you just have to create a function with this type as the first parameter:

let print <- fn(p: Person): Empty {
    cat("Person<", p$name, ",", p$age, ">", sep="")
};

let person <- new_person("Alice", 32);

print(person)
# Person<Alice,32>

Under the hood, TypR automatically created a Person class and implemented the print generic function for it.
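In plain S3 terms, the equivalent hand-written code would look roughly like this (a sketch; the exact code TypR generates may differ):

```r
# Hand-written S3 equivalent of what the TypR snippet describes.
new_person <- function(name, age) {
  structure(list(name = name, age = age), class = c("Person", "any"))
}

# S3 method for the print generic, dispatched on the Person class.
print.Person <- function(p, ...) {
  cat("Person<", p$name, ",", p$age, ">", sep = "")
  invisible(p)
}

print(new_person("Alice", 32))   # Person<Alice,32>
```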

This is the general shift from classic OOP:

| Classic OOP | TypR's OOP |
|---|---|
| Classes | Types |
| Methods | Functions |
| Inheritance | Subtyping |

I chose to use types instead of classes because they are more versatile. While modern OOP offers classes and interfaces, TypR offers product types, union types, intersection types, interface types, and exponential types.

Inheritance is no longer a manual task. TypR's type inference is smart enough to know when a type is a subtype of another. When it is, the subtype directly inherits the methods of its parent.

For instance, the record type list { age: int } is a supertype of Person (since Person is just an alias for the type list { name: char, age: int }). It's counterintuitive at first, but in type theory, if all the fields of one record are present in a second record, the second record is a subtype of the first. To make sense of this, read list { age: int } as "the set of lists that have at least the field age: int", which is the case for Person.

With that in mind, if I create a method is_minor() for list { age: int }, Person will automatically inherit this method by the subtyping principle.

let is_minor <- fn(a: list { age: int }): bool {
    a$age < 18
};

let person <- new_person("Bob", 15);

person |> is_minor()
# TRUE

Types also bring other benefits:

  • Development Speed: Typing opens the door to advanced development tools. Auto-completion becomes more precise, and refactoring tools can operate with complete safety, significantly accelerating the iteration cycle.
  • Clarity and Readability: Explicit type definition acts as built-in documentation, making the code easier for collaborators to read and understand.

UFCS: Function Chaining Reimagined

TypR integrates the Uniform Function Call Syntax (UFCS), a powerful feature popularized by languages like Nim. UFCS is the cornerstone of OOP unification in TypR, as it reconciles R's functional style with object-oriented syntax.

Traditionally, in R, we write:

f(x, y)

Or, with the pipe operator:

x |> f(y)

With TypR's UFCS, these two syntaxes become equivalent to the object-oriented syntax:

x.f(y)

UFCS allows the developer to use dot notation (x.f()) to call any function whose first argument is x, without that function having to be formally a "method" of the object x.

Taking our previous example, we can use a dot-style function call for our person:

let person <- new_person("Bob", 15);

# regular function call
is_minor(person)

# pipe function call
person |> is_minor()

# dot function call
person.is_minor()

Benefits for R Developers:

  • Natural Piping: Function chaining becomes free and unambiguous. The dot operator can be used interchangeably with the pipe operator.
  • Method Discovery: IDEs can list all applicable functions for an object simply by typing object. (dot) or object |> (pipe), improving the ergonomics of writing code.
  • Uniformity: Whether you call a generic function or a type-specific method, the syntax remains the same, eliminating confusion between different calling styles.

Conclusion: The Future of R is Typed

TypR is not just a new version of R; it is a re-foundation of how robust packages and applications are built. By targeting the major pain points of R's fragmented OOP and adding the security of static typing and the fluidity of UFCS, TypR offers developers:

| Benefit | Impact |
|---|---|
| Simplicity | A single, unified OOP system replaces the complexity of S3/S4/R6. |
| Speed | Static typing for safe refactoring and precise auto-completion. |
| Security | Compile-time error detection, reducing production bugs. |

If you are an R package developer tired of compromising between flexibility and robustness, TypR is the revolution you have been waiting for. It is time to switch to a simpler, faster, and safer R. What do you think?

TypR for the frontend?

· 3 min read
Hategekimana Fabrice
IT trainer, Programming language designer

When R meets JavaScript

Imagine being able to generate JavaScript with the syntax of R and all the rigor of static typing. This is exactly what JS Blocks enable.

The Problem with Generated Code

Programmatically generating code is a common practice: building queries, creating front-end scripts, templating... But this approach suffers from a major flaw: the absence of verification at write-time. You concatenate strings hoping the result will be syntactically correct, and you only discover your errors at runtime.

js_code <- paste0("let a = ", value, ";")  # What if 'value' isn't defined?

R suffers from this problem when working with web applications. There are many tools that let users integrate JavaScript with R, but they sometimes require a fair amount of knowledge about JavaScript and its ecosystem.

The Elegance of JS Blocks

JS Blocks reverse this logic. You write code that looks like TypR, with JavaScript functions and variables, but benefits from the full power of static typing:

let js_code <- JS {
    let a: int <- 5;
    let b: char <- "hello";

    let greet <- fn(name: char): char {
        "Hello, " + name
    };
};

This code will generate:

js_code <- "let a = 5;
let b = 'hello';

let greet = (name) => {
    'Hello, ' + name
};
"

This code transpiles to an R string containing valid JavaScript, but the magic happens before: the compiler checks types, detects errors, and guarantees that the generated code will be syntactically correct.

Since this code produces a string, it's possible to write it to a separate file or inject it into tools that require JavaScript code.

Concrete Advantages

  • Compile-time safety: Type errors are detected immediately, not when your JavaScript executes in the browser.

  • Autocompletion and IntelliSense: Your editor understands your code's structure, even though it will ultimately be converted to a string.

  • Risk-free refactoring: Rename a variable, and all its occurrences in your JS Blocks are automatically updated.

  • Living documentation: Type annotations serve as built-in documentation about what your generated code is supposed to do.

Compelling Use Cases

This approach particularly shines in several scenarios:

  • Generating interactive visualizations: Create complex configurations for D3.js or Plotly with the certainty that your parameters are properly typed.

  • Building R webapps: In Shiny or other frameworks, generate JavaScript for the front-end with all the rigor of the backend.

  • Advanced templating: Produce configurable scripts where parameters are statically validated before generation even occurs.

  • Testing and mocking: Generate JavaScript test code with guarantees of consistency.

  • Building new Web frameworks.

Innovation Through Constraint

What makes JS Blocks particularly interesting is that they transform a constraint (having to generate JavaScript) into an opportunity. Rather than working around the problem with fragile solutions, they tackle it head-on by creating a bridge between two worlds: the rigor of static typing and the flexibility of code generation.

It's a perfect example of how good language design can solve real problems without adding unnecessary complexity. The concept is simple to understand, but its implications are profound.

Conclusion

JS Blocks represent an elegant approach to a problem every developer has encountered. By bringing static typing to generated code, they eliminate an entire class of errors while preserving the flexibility needed for programmatic generation.

In my opinion, it's the kind of innovation that, once adopted, makes it impossible to imagine working any other way.

R and TypR

· 6 min read
Hategekimana Fabrice
IT trainer, Programming language designer

When people first hear about TypR, a very common question comes up:

“Is TypR meant to replace R?”

The short answer is no. The more interesting answer is: TypR is designed to live next to R, not instead of it.

TypR is not a new platform, a new runtime, or a new ecosystem that forces you to rewrite everything. It is a typed language that transpiles to R, and integrates directly into the existing R ecosystem.

I will use an example to illustrate this concept. We will assume we already have an existing package named along, with a file along/R/hello.R:

# along/R/hello.R
#' @export
hello <- function() {
    print("Hello world")
}

A TypR package is just an R package

Someone might ask:

How do you create a TypR package from an R package?

Simple answer: Just add one folder!

Indeed, structurally, nothing exotic is happening. A TypR package is simply a normal R package with one additional folder:

along/
    DESCRIPTION
    NAMESPACE
    R/
    TypR/          <- A new folder for TypR's code
    along.Rproj
    man/

And voila!

The TypR/ folder contains files written in TypR, while the R/ folder contains the R code that will actually be executed.

Let's create a file named main.ty (this name is mandatory). Inside that file, we will create a type Person with a name and an age.

type Person <- list {
    name: char,
    age: int
};

let new_person <- fn(name: char, age: int): Person {
    list(name = name, age = age)
};

Let's also create a get_info function that will return a string with the person's information.

let get_info <- fn(p: Person): char {
    paste(p$name, " is ", p$age, " years old")
    |> as__character() # for compatibility
};

Then we can transpile our code with the terminal command typr build at the root of the project.

typr build

When you transpile TypR, you are not producing a special artifact. You are simply generating R code into the package’s R/ directory. In this case, it will produce the content of the main file main.ty and other helper files in alphabetical order:

along/
    R/
        a_std.R                  <-- generated helper file
        b_generic_functions.R    <-- generated helper file
        c_types.R                <-- generated helper file
        d_main.R                 <-- generated entry point to TypR's code
        hello.R                  <-- default file

To R, CRAN, devtools, pkgdown, testthat, and everything else, this is just a regular R package.

It's now possible to call the functions from hello.R.

#' @export
hello <- function() {
    person <- new_person("John", 27)
    get_info(person)
}

If we try to load the function with the R console we get the expected result:

# In the R console
> devtools::install()
> library(along)
> hello()
[1] "John is 27 years old"
attr(,"class")
[1] "Character" "character" "any"

TypR introduces no new runtime

TypR does not require:

  • a VM
  • a native compiler
  • a special version of R
  • system-level dependencies

Just the typr binary. Once transpiled, the result is plain R. It is parsed, interpreted, and executed by R exactly like any other R code.

TypR only exists at development time. At runtime, it disappears.

You can freely mix TypR and R

Inside the same package, you can:

  • write some functions in TypR
  • keep others in regular R
  • call TypR-generated code from R
  • call R code from TypR

We have already shown all except the last one, so let's try it. We will start by creating a along/R/utils.R file with an example function:

# along/R/utils.R

example <- function() {
    print("This is an example function!")
}

Now we can use it from TypR within our get_info() function:

# along/TypR/person.ty

# example is a function that takes nothing and returns nothing
@example: () -> Empty

let get_info <- fn(p: Person): char {
    example(); # <--- using the R function here
    paste(p$name, " is ", p$age, " years old")
    |> as__character() # for compatibility
};

Now get_info() also calls example() and is integrated into our package:

> devtools::install()
> library(along)
> hello()
[1] "This is an example function!"
[1] "John is 27 years old"
attr(,"class")
[1] "Character" "character" "any"

Best practices

It's better to use custom R functions from TypR only when TypR's untyped functions aren't enough. In all cases, interoperability exists.

TypR is not an “all or nothing” mode. It is just an alternative source language that produces R. You can migrate gradually, file by file, function by function.

The only possible point of friction

There is only one situation where TypR can collide with R:

When TypR transpiles into a file that already exists in R/.

For instance, if you create a hello.ty in the TypR/ folder, it will overwrite the hello.R in the R/ folder: there is a collision. But that is just two tools writing to the same file.

This is not a conceptual conflict between R and TypR. It is simply a file-level conflict, easy to avoid through conventions or configuration.

Other than that, TypR and R do not step on each other.

Best practice

A good practice is to use main.ty as an aggregation module and create one file per type.

In our case, we should put the content of main.ty into the person.ty file, then import it within the main file with the mod keyword:

mod person;

TypR will understand that it should find a file named "person.ty", then parse, type check, and transpile it into a person.R file within the R/ folder. This practice brings the same result but with cleaner code.

The right mental model

TypR is not a new language that replaces R. It is much closer to things like:

  • TypeScript for JavaScript
  • Cython for Python
  • Scala.js for JavaScript

You write in a more structured, safer, more expressive language… and you get standard R code at the end.

Why this matters

This changes the real question.

You don’t have to ask:

“Should I leave R for TypR?”

But rather:

“Would this part of my code benefit from types?”

You can keep:

  • tidyverse
  • data.table
  • CRAN
  • your existing code
  • your team

…and add where it makes sense:

  • static typing
  • richer data structures
  • compile-time guarantees
  • safer refactoring

TypR does not compete with R. It amplifies R with software engineering good practices.

Vectorization by design

· 8 min read
Hategekimana Fabrice
IT trainer, Programming language designer

Introduction

Vectorization is one of the greatest tools for data manipulation I know, and I am happy that R offers this system out of the box. It makes computation simple and eases the translation from formula to code.

Unfortunately, I have encountered one limitation: R's vectors are not very compatible with functional programming or object-oriented programming, two paradigms I like when building libraries or applications. This is what TypR improves with its mechanism of lifting-based vectorization.

Throughout this post, we will attempt to construct a concept of a 2D geometric point. A point is a mathematical object that has x and y coordinates. We will use the S3 system from R for illustration and then use TypR's type system.

Vectorization with R

As I said, R makes it pretty easy to vectorize primitive types like integers or characters.

# building a vector
v <- c(1, 2, 3, 4)

# Now we can multiply with a number or add a number
3*v + 4
#[1] 7 10 13 16

Point construction

But what happens if we define a Point type?

# Point type creation with S3 (without validation for simplicity)
Point <- function(x, y) {
    structure(
        list(x = x, y = y),
        class = "Point"
    )
}

We will also define a print method to make a point easier to visualize.

print.Point <- function(p, ...) {
  cat("Point<", p$x, ",", p$y, ">\n", sep="")
  invisible(p)
}

So we have our Point type!

Point(3, 4)
#Point<3,4>

And we will define a scale method to scale a point by a number: it multiplies each coordinate by that number.

scale.Point <- function(p, n) {
  Point(p$x*n, p$y*n)
}

It's now functional!

# we can use it this way
scale(Point(3, 4), 2)
#Point<6,8>

# or this way
Point(3, 4) |> scale(2)
#Point<6,8>

A little bonus: we will also overload the * operator for the Point class, so a point can be scaled by multiplying it on the right by a number.

# Definition of the multiplication operator for Point 
`*.Point` <- function(p, n) {
  scale(p, n)
}

# Now this works
Point(3, 4) * 2
#Point<6,8>

# But the other way around doesn't work
2 * Point(3, 4)
#Error
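The error comes from how R dispatches arithmetic operators: `*` dispatches on either operand, but the arguments arrive in call order, so our method receives the number as its first parameter. A small sketch (not in the original post) that makes both orders work is to detect which operand is the point inside the method:

```r
# Point constructor and scale method, repeated so the sketch is self-contained
Point <- function(x, y) structure(list(x = x, y = y), class = "Point")
scale.Point <- function(p, n) Point(p$x * n, p$y * n)

# `*` dispatches on either operand, but arguments arrive in call order,
# so we check which side holds the Point before scaling.
`*.Point` <- function(e1, e2) {
  if (inherits(e1, "Point")) scale.Point(e1, e2) else scale.Point(e2, e1)
}

Point(3, 4) * 2  # x = 6, y = 8
2 * Point(3, 4)  # now works too
```

This mirrors how the Ops group generics are usually written in R, at the cost of an explicit class check.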

Point vectorization

But what if we want a vector of Point?

points <- c(Point(1, 2),
            Point(3, 4),
            Point(5, 6))

points

#$x
#[1] 1
#
#$y
#[1] 2
#
#$x
#[1] 3
#
#$y
#[1] 4
#
#$x
#[1] 5
#
#$y
#[1] 6

The vector of points loses its structure! We can't access it as expected:

points$x
#[1] 1

points$y
#[1] 2

points[1]
#$x
#[1] 1

To make a vector of points, we have to store them in a list. In R, a list is a generic vector (internally a vector of pointers), so it can hold values of any structure.

points <- list(Point(1, 2),
               Point(3, 4),
               Point(5, 6))

points
#[[1]]
#Point<1,2>
#
#[[2]]
#Point<3,4>
#
#[[3]]
#Point<5,6>

But we lose the ability to use native vectorized operations:

# works well
points[[1]]
#Point<1,2>

scale(points, 2)
#Error

points$x
#NULL

points$y
#NULL

A better approach is to store native vectors directly inside an S3 object, but that requires some gymnastics. Because of that, developers have to keep vectorization in mind while designing functions for their objects, which is a mental load in itself.
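To make the "vectors inside an S3 object" idea concrete, here is a hypothetical "struct of arrays" sketch: a Points class (the plural name is my own) that stores all x coordinates and all y coordinates as native vectors, so vectorized math still applies.

```r
# A "struct of arrays" variant: one S3 object holding the x and y
# coordinates as native vectors, so vectorized arithmetic still works.
Points <- function(x, y) {
  stopifnot(length(x) == length(y))
  structure(list(x = x, y = y), class = "Points")
}

scale.Points <- function(p, n) Points(p$x * n, p$y * n)

pts <- Points(c(1, 3, 5), c(2, 4, 6))
pts$x            # native vector: 1 3 5
scale(pts, 2)$y  # native vector: 4 8 12
```

This recovers fast field access and vectorized operations, but every method now has to be written against whole columns rather than a single point, which is exactly the gymnastics mentioned above.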

What about TypR?

I'm glad you asked!

TypR uses its own vectorization system named lifting-based vectorization. This concept exploits the type system of TypR to infer when to apply vectorization.

The best way to use vectorization is not to think about vectorization.

The developer just writes functions for scalar values, and TypR decides when to lift the parameters and the function into a vectorized computation based on how the function is used. This also works with more complex types such as named lists and functions.
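TypR code isn't runnable here, but the idea of lifting can be sketched in plain R with a hypothetical lift helper (the name and shape are my own, not TypR's implementation): it wraps a scalar function into one that maps element-wise over lists.

```r
# Hypothetical sketch of lifting in plain R: turn a scalar function into
# one that maps element-wise over lists (Map recycles shorter arguments).
lift <- function(f) function(...) Map(f, ...)

add <- function(a, b) a + b
lifted_add <- lift(add)

unlist(lifted_add(list(1, 2, 3), 10))  # 11 12 13
```

The difference in TypR is that the compiler performs this wrapping automatically, guided by the types at the call site, so the scalar and lifted versions share one definition.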

Point construction

Let's build our Point type and a constructor with TypR:

# Type definition
type Point <- {
  x: int,
  y: int
};

# Constructor for the Point type
let new_point <- fn(x: int, y: int): Point {
  list(x = x, y = y)
};

We will also define a print function for points:

# print function
let print <- fn(p: Point): Empty {
  cat("Point<", p$x, ",", p$y, ">", sep="");
  invisible(p);
};

Now we can build a point like before:

new_point(3, 4)
#Point<3,4>

We won't forget to implement the scale function.

let scale <- fn(p: Point, n: int): Point {
  new_point(p$x * n, p$y * n)
};

new_point(3, 4)
|> scale(2)
#Point<6,8>

Of course, we also have the capability of implementing the * operator:

let `*` <- fn(p: Point, n: int): Point {
  scale(p, n)
};

new_point(3, 4) * 2
#Point<6,8>

Point vectorization

Now what about vectors? TypR has its own way to deal with them. For better understanding, let's make a vector of points:

# creating a vector of points in TypR
points <- [new_point(1, 2),
           new_point(3, 4),
           new_point(5, 6)]

points
#typed_vec [3]
#[1] Point<1,2>
#[2] Point<3,4>
#[3] Point<5,6>

We have an array notation like in other programming languages. This kind of array automatically uses the print function defined for its members.

What's the best part? TypR's arrays are vectorized by default! So these operations work:

# scaling a group of point with one number
scale(points, 2)
#typed_vec [3]
#[1] Point<2,4>
#[2] Point<6,8>
#[3] Point<10,12>

# same but with a pipe and a group of numbers
points
|> scale([1, 2, 3])
#typed_vec [3]
#[1] Point<1,2>
#[2] Point<6,8>
#[3] Point<15,18>

# same but with the "*" operator
points * 3
#typed_vec [3]
#[1] Point<3,6>
#[2] Point<9,12>
#[3] Point<15,18>

We can also define operators between two values of the same type. Let's define the + operator, which adds two points by adding their respective fields.

# Definition of the "+" operator
let `+` <- fn(p1: Point, p2: Point): Point {
  new_point(p1$x + p2$x, p1$y + p2$y)
};

# adding a group of points and a single point
points + new_point(1, 1)
#typed_vec [3]
#[1] Point<2,3>
#[2] Point<4,5>
#[3] Point<6,7>

# adding two groups of points of the same type
points + points
#typed_vec [3]
#[1] Point<2,4>
#[2] Point<6,8>
#[3] Point<10,12>

And what about reduction functions? One can use the reduce function to reduce the elements of the array.

# will add all the points
reduce(points, `+`<Point>)
#Point<9,12>

We specify the type <Point> for the + operator because TypR's type system doesn't do this kind of inference yet. But as you can see, it summed all the elements of points. One can also use the sum function as a shortcut:

points
|> sum()
#Point<9,12>

It automatically uses the + operator underneath: if your type implements it, sum will work on a vector of that type.
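In R terms, what sum does here is a plain fold: once + is defined for points, reducing a list of them with Reduce gives the same result. A self-contained sketch:

```r
# Summing points is a fold over `+`: Reduce repeatedly applies the
# operator pairwise, dispatching to our Point method each time.
Point <- function(x, y) structure(list(x = x, y = y), class = "Point")
`+.Point` <- function(p1, p2) Point(p1$x + p2$x, p1$y + p2$y)

points <- list(Point(1, 2), Point(3, 4), Point(5, 6))
total <- Reduce(`+`, points)
total$x  # 9
total$y  # 12
```

TypR's reduce and sum follow the same recipe; the type annotation just tells the compiler which `+` to pick.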

And for functions? We can also do function composition powered by vectors. I won't present examples here; they will come in a later publication.

Future works

Type specific applications

I would like to create vectorized field accessors to make it easier to work with a vector of named lists:

# will give all the values contained in the x field of each point
points$x
# will give all the values contained in the y field of each point
points$y

It would also be cool to be able to call similar functions with the same parameters.

# vector of functions
let functions <- [`+`, `*`];

functions(3, 4)
# could return:
#typed_vec [2]
#[1] 7
#[2] 12

It could help with applying a set of statistical models to a specific set of data.
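Plain R can already approximate this idea: store the functions in a list and apply each one to the same arguments.

```r
# Applying a list of functions to the same arguments in plain R.
fns <- list(`+`, `*`)
results <- vapply(fns, function(f) f(3, 4), numeric(1))
results  # 7 12
```

The TypR version would fold this pattern into the ordinary call syntax, with the type system checking that every function in the vector accepts the given arguments.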

Compatibility with other systems

Under the hood, TypR's arrays use a custom S3 object for data storage and vectorization. This doesn't invalidate R's native vectors or data.frames, which remain faster and more efficient. I want to create bridges that help convert TypR arrays into these native types.

# In the future, Array -> vector for performance
let arr <- [1, 2, 3, 4];
let vec <- arr |> to_vec();

# In the future, Array -> data.frame for performance
let df <- points |> to_df();
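As a sketch of what such a bridge could produce, here is a hypothetical to_df written in plain R (the helper name matches the planned TypR one, but the implementation is my own guess), flattening a list of points into a data.frame:

```r
# Hypothetical to_df bridge in plain R: list of points -> data.frame.
points <- list(list(x = 1, y = 2), list(x = 3, y = 4), list(x = 5, y = 6))

to_df <- function(ps) data.frame(
  x = vapply(ps, function(p) p$x, numeric(1)),
  y = vapply(ps, function(p) p$y, numeric(1))
)

to_df(points)  # a 3-row data.frame with columns x and y
```

Once the data is in a data.frame, all of R's native columnar tooling applies at full speed.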

Conclusion

Even though lifting-based vectorization may look like reinventing the wheel, I truly believe it is a conceptual heir of the classic way of doing vectorization: its logical continuation, had R been a typed language.

The responsibility of vectorizing functions no longer rests with the developer, who can now focus on solving the problem. TypR offers a flexible interface for working with vectors.

TypR's new official documentation

· One min read
Hategekimana Fabrice
IT trainer, Programming language designer

After some adventures, the official documentation will soon be available.

Documentation is sometimes hard to maintain because it is an entity independent of the code base. It's even kept in a separate GitHub repository.