If you are going to be super-strict with type-checking, wouldn’t it be best to switch to a statically typed language and get the performance gains as well?
Hallelujah, that's always been my position. To the static typing folks: leave my dynamically typed languages alone and go coding with something that really suit your needs. If the answer is that Python, Ruby, JS, whatever are really much more pleasant to code with, my reply is that they are so precisely because we don't have to type type definitions. Tradeoffs.
I think types are particularly valuable for libraries. A library author using copious types really helps the downstream user to know "Ok, this function returns a dict(Foo, Bar)". But after that, it's a matter of preference if you want to add those types to your own code or not.
Having the types in the libraries makes it a lot easier for your tools/IDEs to give good suggestions and catch bugs that you might otherwise miss.
Personally I like having my TypeScript cake and eating it.
I also truly believe those who design type systems would benefit from taking a look what kind of code people programming in dynamically-typed languages produce.
I do too, but I feel like TypeScript stands alone as an unusually effective and pleasant to use bolted-on type system. I've not seen any other approach come close. (My sample size is Python, Ruby and Elixir)
I started using types with Python in 2018-ish, and I never looked back.
I am not that good a programmer, so maybe I am wrong, but I just like being able to tell what the data is that's moving through the system. Typed function signatures, a little shift+k here and there, a warning that I am trying to add int and a string. I don't see what's the harm in having that?
At the end of the day, if you don't want to use Python with types -- do not. Unless somebody at work is forcing you, and it feels like putting lipstick on a pig (especially with something like numpy that doesn't easily support types)? Then condolences.
> To the static typing folks: leave my dynamically typed languages alone
Surely you understand that the push to add types to dynamically-typed languages comes from dynamic-typing folks, not from static-typing folks. People who are deeply into static typing have little incentive to consider e.g. Python, whose support for types is relatively weak, loosely-defined, and rarely-enforced compared to the statically-typed languages that exist today.
I am totally with you and I am glad I am not the only one who is totally against those type-addictions leaking into languages that did not need them in the first place.
Types in ruby are even worse than in python, because the type systems in use really make ruby turn very ugly. In python it is not as much as a huge problem with regards to syntax, as python has a stricter syntax (e. g. mandating foo.bar() whereas in ruby you can typically omit the (), among other syntax sugar examples).
We need to keep the type people out of those languages.
Many years ago, on IRC, on #haskell, they said they don't want everyone to use Haskell. Back then I did not understand it. After the type-addicted people emerged out of nowhere, I now begin to understand why Haskell is so snobbish. If you let every idea float, you end up ruining languages - and then those who wanted this, will retire and move away too. Ultimate damage factor caused as outcome here.
The only reason I gave up resisting and started writing any significant code in Python at all was that it got some kind of type system, and thus became less unpleasant to code with.
"Pleasant to code with" does not describe getting "AttributeError: 'NoneType' object has no attribute 'foo'" 25 levels deep in a stack trace already obfuscated by dynamic object-oriented nonsense. In production, because it's an unusual case and testing missed it. Not that test cases aren't way more work than types anyway.
Running more type checkers isn't really about strictness. The main benefit to library maintainers is to make sure that their APIs are compatible with whatever tools their users run.
This wouldn't really be an issue for most other languages, but Python's typing ecosystem is uniquely fragmented, with only partial standardization between several popular tools.
GP's point is obvious: performance is immaterial to the discussion. Static code analysis is about preventing bugs. Therefore OP fails to make any sort of point, as it's a straw man argument.
What statically typed language would you suggest for machine learning and large data pipelines? I don't love Python, but it has by far the largest ecosystem.
Well, that's the curse of machine learning: since everyone uses Python you have to deal with Python. Even though Python isn't very nice when things start to get serious and you don't want to spend your time fiddling with noise just to make something work at scale.
I'd wish the ML/AI/LLM crowd would see that it is in their interest to get better developer ergonomics at scale. (I don't want to have to turn to C++)
Cython is a niche language for writing perf-critical bits inside your Python codebase. It's like C for people who don't want to learn C. At least that's how I treated it, when I had to write some stuff to make some numpy ops faster.
Cython is not in any real sense a replacement for a modern data/ml stack.
> What statically typed language would you suggest for machine learning and large data pipelines? I don't love Python, but it has by far the largest ecosystem.
Pay no attention to OP. It's nonsensical to even suggest you should migrate away from a whole tech stack just because you want to run static code analysis, specially when the argument is based on having too many static analysis tools to chose from. Utter nonsense.
> If you are going to be super-strict with type-checking, wouldn’t it be best to switch to a statically typed language and get the performance gains as well?
strict type checking is an incredibly useful tool for cases when you really want to make sure your code is correct and behaving as expected (one of many tools).
There are lots of people who like python and want to use it for things that where incorrect code has serious consequences. Type checking is helpful in these contexts.
Type checking remains optional for the masses and is not practical in many cases. Still, pushing away people who want to use all available tools for writing correct python only hurts the community.
Python has other, bigger problems that make it a constant headache. One of them being the dismissive attitude towards any and all of problems that come from versioning, dependencies and quirks that make it challenging to have robustness.
Criticisms are typically dismissed by suggesting heaping yet another "solution" onto the growing pile of "solutions" that you have to drag around with you. That people have to learn. That you have to install tooling for. That has to be vetted. That has to become part of the toolbox to get even seemingly simple things done. This attitude is a big part of the reason that I strongly advise people against using Python in production. On top of all the problems presented in a real-world setting.
Almost all of the time, people who are fond of Python are more interested in defending python, disparaging me, downvoting me etc that listen to why I make that recommendation.
(I get it. People like Python. What I think of Python as a language is irrelevant. In fact I don't have that much against it. But I do have a lot against it in a setting where you need reliability and repeatability)
I have spent the last month of my life building a system that can run Python tooling reliably in a business critical application. I knew this was going to be a pretty big job when I started, but for every problem I solve, a bunch of new problems arise. I am starting to see light at the end of the tunnel but it hasn't exactly been smooth sailing. I'm almost there for a first version, but there are a bunch of problems still to solve. Mostly because I care about developer ergonomics and that things should "just work". One important goal is that my solution shouldn't impose any significant cognitive burden on people who use it. That's really hard.
(I don't think the solution will be open source since my contract wouldn't allow for it. But I'll make the case at some point for why it should be open sourced)
And yes. There are statically typed languages available today that have decent tooling that provides superior developer ergonomics. I can understand that people don't want to learn new languages, but if you have the capacity to do so I would recommend trying to move on if the code you write has to run outside your own workstation. If an old fart like me can learn and adopt new languages, so can you.
Yeah, I can't say I really get the appeal of gradual typing. It's commented/documented code at best and outright lies at worst. Sure, you can build tooling around it and improve your DX a bit but isn't it always a house of cards?
> If you are going to be super-strict with type-checking, wouldn’t it be best to switch to a statically typed language and get the performance gains as well?
I don't understand your question. The whole point of static code analysis is preventing bugs. Don't you like Python code to not have bugs that are easily caught with static code analysis, or is preventing code a foreign idea that is better left to other languages?
Yes, but unfortunately Python has invaded everything, and one must adapt.
Python is going to be preinstalled on almost any machine I use, with a reasonable assortment of libraries. And even if they're not preinstalled, the libraries I want are likely to exist. They'll have unstable APIs and weird quirks, and I'll have to take my choice of bad packaging systems to install them, and everything will just generally be a pain, but they'll exist and largely work. That's not true for any language I actually want to code in. I mean, I'm not going to deny that Python is better than shell scripts or (usually) C.
It's not like it's a pleasant language to code in, especially if you actually want to use the type support, which is weird and irregular and keeps changing and has to work around fundamental design problems at the core of the language.
> Prioritise running as many type-checkers as possible on your test suite. Run at least one on your source code.
There are two types of tests: those that test against the public API, and those that test internal codes with various mocks and fakes. I think the vast majority of unit tests is the latter one, in which case the suggestion does not really make sense.
From my experience with Python, both personal and professional, I find it immature and not well-suited for large codebases. Typing should have become part of the language a long time ago; it is clear that users want it.
Take, for example, PHP… look at the features released in the last 6 or so years, starting with PHP 7, and how mature the language has become.
With the advance of AI-assisted programming, I feel like Python is always a bad choice.
> "In Python, any method __eq__ is expected to return bool, and if it doesn't, then we need to explicitly tell type-checkers to ignore the type error. This function in Polars can also return different types depending on the inputs, thus requiring overloads."
Why would you ever want a == b to not return a bool??
EDIT: Yes, I understand that you can do element-wise equality checks on numpy arrays now
There are examples like ORM query builders (something like `User.id == user_id` should not return a boolean, but rather some inspectable query part), multi-value comparisons (e.g. numpy arrays and views which could also be used as masks for indexing)
In general, when you get your hands on operator overloading you get a bunch of various quirky applications for each. Some dunder methods have strict runtime-level rules (e.g. __hash__ or __len__), some don't
Elementwise equality! Given two dataframe columns or ndarrays, users often expect `==` to give out a column or ndarrays of bools (like `+`, ``, `*, `&`, and just about every other binary operator).
It could return a vector or a deferred expression? In polars, for example, operations on `pl.col` return `Expr` objects that are used to build queries, not immediately evaluated:
df.filter(pl.col("status") == "active")
In numpy, `x == y` return a boolean vector of the same shape as x and y, comparing them element-wise.
I can see the first one making sense, but why would you need a representation of equality other than "yes, these are equal" and "no, these are not equal"?
The first use case that comes to mind is if you want a DSL to build expressions that are evaluated later in some different context e.g. when using `polars`:
Well personally I’m not a fan of turning everything into an object, but if you have properties or methods that exist upon the concept of Equality you might want to encode directly onto a class. Maybe in a domain where “Equality” is an important concept, like mathematics or even something like accounting.
Could enable a different interface into approximate equality for floating point numbers: Equality.approximate(iota: float) -> bool
Never understood this complaint about operator overloading.
In any language, a function called `isEqual` could wipe your hard drive and replace your wallpaper with a photo of a penguin. Therefore, letting programmers pick the names of their functions is bad? No, obviously naming things for least surprise is the programmer's responsibility.
But when it's the symbols `==` instead of an ASCII name, it's a problem in language design?
(FWIW in Javascript, being unable to override == is actually a problem when you want to use objects as Map keys)
Python never met a footgun it didn’t need to adopt. In this case, however, it’s not equality checks, but operator overloading. I was a Python developer for a decade before switching to Go and life on this side is so much better.
Operator overloading has never been an issue for me, but terminating a line with a comma creating a tuple, or white space (including new lines) between strings to concatenate have cost me days of work over the years.
I understand why those exist, but they’re pure evil.
IIRC, SQLAlchemy overloads this to return an object that represents an equality check in SQL. Because it was returning an object, it was always evaluating to True, because of another of Python’s footguns: truthiness/falsiness. This was a decade ago, and these particular footguns were not even remotely the biggest culprits in our bug backlogs (another honorable mention includes accidentally calling a sync function in an async context, causing timeouts in unrelated endpoints and leading to cascading system failure).
The fact that this article seems to honestly recommend people run 5 different type checkers on library test suits really reflects the tacked on feeling of Python typing.
I am not sure it is recommending more than it is commenting on the current state of developing public-facing APIs in Python.
The downstream users that import the package either have to ignore checking its exported types altogether, manually stub it, or have a subpar development experience to varying degrees.
This is something I saw the other day with some package that provided comprehensive stubs for an untyped library. The .pyi file was littered with comments about quirks from the numerous type checkers (five now).
Why anyone would still use mypy besides legacy infrastructure is beyond me. It is dog slow as well as being the laziest of all, not catching many mistakes.
Unfortunately for Django apps switching to any alternative leads to the dreaded “wall of errors” issue. If anyone got to work this out in the past, I’d gladly take advices.
what are ppls' impression of pyrefly? i've become completely captive to uv's tooling. it has allowed me to think only about coding versus tooling. dont feel like giving another typechecker a chance unless it offer's something i'm not getting from ty.
Why would users care if you're using the same type checker as them? Surely they're not expecting all their imports to be instrumented for running redundant types checks?
Users do not care about that, but they want to not see type errors or warnings when they integrate your API in their code.
That’s why you want to run their type checker on your API. you cannot know what “their type checker” is, so you want to run all popular type checkers on your API.
Sounds like a them-problem. Their type checker can accept my declared typings for my public API, or they can override it with their own custom type stubs if they have objections.
The blog entry fits into ruby too, to some extent; while the situation is
nowhear near as bad as in python, you have the same question-marks why
types suddenly emerge out of nowhere. Almost ... almost as if some people
have a specific agenda, and try to pull through with it.
Well, there you have it - the type-addicted people are ruining python.
With agents it no longer makes sense to tie yourself to Python's archaic
development experience. How many type checkers are there? Package managers? Don't even get me started on cross-platform deployment.
Strongly typed, compiled languages have never been easier to use, and agents reap huge benefits from the tight feedback loop that the compiler provides. Moreover the benefits of the Python ecosystem are less significant today than anytime in the past 20 years. Need something that's only available in Python? Just point some agents at it and you can port it.
> Just point some agents at it and you can port it.
Don’t think we’re there yet, otherwise we would see a bunch of forks of major libraries to alternative languages - and not just Python. There’s still too much risk of insidious errors and bugs.
I've done thus a few times for stuff in the < 10,000 LOC space. It works great.
There's something particularly satisfying about shipping a 1-10MB static rust binary instead of a 2GiB docker python environment.
(I'm talking about just porting simple applications, or maybe a missing package/crate at a time. Not both at once, and not typical 100K-10M line internal legacy sprawl)
I've noticed this a lot in LLM generated Java. Since it doesn't know what can or can't be null it tends to wrap everything in Optional<T>. Super strong type systems are becoming even more important.
I think types are particularly valuable for libraries. A library author using copious types really helps the downstream user to know "Ok, this function returns a dict(Foo, Bar)". But after that, it's a matter of preference if you want to add those types to your own code or not.
Having the types in the libraries makes it a lot easier for your tools/IDEs to give good suggestions and catch bugs that you might otherwise miss.
I also truly believe those who design type systems would benefit from taking a look what kind of code people programming in dynamically-typed languages produce.
I am not that good a programmer, so maybe I am wrong, but I just like being able to tell what the data is that's moving through the system. Typed function signatures, a little shift+k here and there, a warning that I am trying to add int and a string. I don't see what's the harm in having that?
At the end of the day, if you don't want to use Python with types -- do not. Unless somebody at work is forcing you, and it feels like putting lipstick on a pig (especially with something like numpy that doesn't easily support types)? Then condolences.
Surely you understand that the push to add types to dynamically-typed languages comes from dynamic-typing folks, not from static-typing folks. People who are deeply into static typing have little incentive to consider e.g. Python, whose support for types is relatively weak, loosely-defined, and rarely-enforced compared to the statically-typed languages that exist today.
Types in ruby are even worse than in python, because the type systems in use really make ruby turn very ugly. In python it is not as much as a huge problem with regards to syntax, as python has a stricter syntax (e. g. mandating foo.bar() whereas in ruby you can typically omit the (), among other syntax sugar examples).
We need to keep the type people out of those languages.
Many years ago, on IRC, on #haskell, they said they don't want everyone to use Haskell. Back then I did not understand it. After the type-addicted people emerged out of nowhere, I now begin to understand why Haskell is so snobbish. If you let every idea float, you end up ruining languages - and then those who wanted this, will retire and move away too. Ultimate damage factor caused as outcome here.
"Pleasant to code with" does not describe getting "AttributeError: 'NoneType' object has no attribute 'foo'" 25 levels deep in a stack trace already obfuscated by dynamic object-oriented nonsense. In production, because it's an unusual case and testing missed it. Not that test cases aren't way more work than types anyway.
This wouldn't really be an issue for most other languages, but Python's typing ecosystem is uniquely fragmented, with only partial standardization between several popular tools.
GP's point is obvious: performance is immaterial to the discussion. Static code analysis is about preventing bugs. Therefore OP fails to make any sort of point, as it's a straw man argument.
I'd wish the ML/AI/LLM crowd would see that it is in their interest to get better developer ergonomics at scale. (I don't want to have to turn to C++)
Cython is not in any real sense a replacement for a modern data/ml stack.
Pay no attention to OP. It's nonsensical to even suggest you should migrate away from a whole tech stack just because you want to run static code analysis, specially when the argument is based on having too many static analysis tools to chose from. Utter nonsense.
You can use type-checking to get better performance already, without leaving Python. See https://blog.glyph.im/2022/04/you-should-compile-your-python...
There are lots of people who like python and want to use it for things that where incorrect code has serious consequences. Type checking is helpful in these contexts.
Type checking remains optional for the masses and is not practical in many cases. Still, pushing away people who want to use all available tools for writing correct python only hurts the community.
Criticisms are typically dismissed by suggesting heaping yet another "solution" onto the growing pile of "solutions" that you have to drag around with you. That people have to learn. That you have to install tooling for. That has to be vetted. That has to become part of the toolbox to get even seemingly simple things done. This attitude is a big part of the reason that I strongly advise people against using Python in production. On top of all the problems presented in a real-world setting.
Almost all of the time, people who are fond of Python are more interested in defending python, disparaging me, downvoting me etc that listen to why I make that recommendation.
(I get it. People like Python. What I think of Python as a language is irrelevant. In fact I don't have that much against it. But I do have a lot against it in a setting where you need reliability and repeatability)
I have spent the last month of my life building a system that can run Python tooling reliably in a business critical application. I knew this was going to be a pretty big job when I started, but for every problem I solve, a bunch of new problems arise. I am starting to see light at the end of the tunnel but it hasn't exactly been smooth sailing. I'm almost there for a first version, but there are a bunch of problems still to solve. Mostly because I care about developer ergonomics and that things should "just work". One important goal is that my solution shouldn't impose any significant cognitive burden on people who use it. That's really hard.
(I don't think the solution will be open source since my contract wouldn't allow for it. But I'll make the case at some point for why it should be open sourced)
And yes. There are statically typed languages available today that have decent tooling that provides superior developer ergonomics. I can understand that people don't want to learn new languages, but if you have the capacity to do so I would recommend trying to move on if the code you write has to run outside your own workstation. If an old fart like me can learn and adopt new languages, so can you.
Machine-checked documentation is always valuable, IMO
For people who don't have a choice, type checked Python is better than nothing.
I don't understand your question. The whole point of static code analysis is preventing bugs. Don't you like Python code to not have bugs that are easily caught with static code analysis, or is preventing code a foreign idea that is better left to other languages?
Python is going to be preinstalled on almost any machine I use, with a reasonable assortment of libraries. And even if they're not preinstalled, the libraries I want are likely to exist. They'll have unstable APIs and weird quirks, and I'll have to take my choice of bad packaging systems to install them, and everything will just generally be a pain, but they'll exist and largely work. That's not true for any language I actually want to code in. I mean, I'm not going to deny that Python is better than shell scripts or (usually) C.
It's not like it's a pleasant language to code in, especially if you actually want to use the type support, which is weird and irregular and keeps changing and has to work around fundamental design problems at the core of the language.
There are two types of tests: those that test against the public API, and those that test internal codes with various mocks and fakes. I think the vast majority of unit tests is the latter one, in which case the suggestion does not really make sense.
Take, for example, PHP… look at the features released in the last 6 or so years, starting with PHP 7, and how mature the language has become.
With the advance of AI-assisted programming, I feel like Python is always a bad choice.
Why would you ever want a == b to not return a bool??
EDIT: Yes, I understand that you can do element-wise equality checks on numpy arrays now
In general, when you get your hands on operator overloading you get a bunch of various quirky applications for each. Some dunder methods have strict runtime-level rules (e.g. __hash__ or __len__), some don't
Another example might be if you have a domain specific representation of equality (e.g. class Equality)
```python df.filter( pl.col("foo") == pl.col("bar"), ) ```
Sqlalchemy does something equivalent too, and I'm sure there are many others.
Could enable a different interface into approximate equality for floating point numbers: Equality.approximate(iota: float) -> bool
In any language, a function called `isEqual` could wipe your hard drive and replace your wallpaper with a photo of a penguin. Therefore, letting programmers pick the names of their functions is bad? No, obviously naming things for least surprise is the programmer's responsibility.
But when it's the symbols `==` instead of an ASCII name, it's a problem in language design?
(FWIW in Javascript, being unable to override == is actually a problem when you want to use objects as Map keys)
I understand why those exist, but they’re pure evil.
The downstream users that import the package either have to ignore checking its exported types altogether, manually stub it, or have a subpar development experience to varying degrees.
This is something I saw the other day with some package that provided comprehensive stubs for an untyped library. The .pyi file was littered with comments about quirks from the numerous type checkers (five now).
> The type checking that matters most (and why you've probably got it backwards)
Honestly, I don’t care if the author got some AI help. But that click-bait style is ubiquitous and obnoxious.
Unfortunately for Django apps switching to any alternative leads to the dreaded “wall of errors” issue. If anyone got to work this out in the past, I’d gladly take advices.
django==4.2.30
djangorestframework==3.16.1
---
django-types==0.15.0
djangorestframework-types==0.8.0
pyright==1.1.390
My dj version is pretty old, but I'd assume things have only gotten better since v 4?
That’s why you want to run their type checker on your API. you cannot know what “their type checker” is, so you want to run all popular type checkers on your API.
The blog entry fits into ruby too, to some extent; while the situation is nowhear near as bad as in python, you have the same question-marks why types suddenly emerge out of nowhere. Almost ... almost as if some people have a specific agenda, and try to pull through with it.
Well, there you have it - the type-addicted people are ruining python.
Strongly typed, compiled languages have never been easier to use, and agents reap huge benefits from the tight feedback loop that the compiler provides. Moreover the benefits of the Python ecosystem are less significant today than anytime in the past 20 years. Need something that's only available in Python? Just point some agents at it and you can port it.
Don’t think we’re there yet, otherwise we would see a bunch of forks of major libraries to alternative languages - and not just Python. There’s still too much risk of insidious errors and bugs.
There's something particularly satisfying about shipping a 1-10MB static rust binary instead of a 2GiB docker python environment.
(I'm talking about just porting simple applications, or maybe a missing package/crate at a time. Not both at once, and not typical 100K-10M line internal legacy sprawl)
When something is easier/requires less context, it tends to work well for both human and LLM.
I've noticed LLMs sometimes pick a documented anti-pattern (passing Optional around in Java is not recommended), then amplify it (like a human might).