Here is a summary of recent publicly available material on Abstract Syntax Trees (ASTs), with pointers to a few representative threads and papers.
Key takeaways
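- ASTs remain central to code analysis, transformation, and tooling (linters, formatters, compilers) because they provide a structured, language-aware representation of source code. They expose a program's syntactic constructs (declarations, statements, expressions) as a tree that tools can traverse and rewrite, which is what enables static analysis and automated transformations. Some toolchains also attach semantic information, such as types or name bindings, to the nodes.[2][4]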
- Recent research compares AST representations across parsing tools (e.g., JDT, Tree-sitter, srcML, ANTLR) and finds notable differences in size, depth, and abstraction level. JDT produces the smallest, shallowest trees with the highest abstraction level, ANTLR produces the largest trees with the lowest abstraction level, and Tree-sitter and srcML fall in between. Larger, lower-abstraction trees carry more detail but can impose a higher learning burden on models or analyzers.[2][4]
- The relevance of ASTs is expanding in AI-assisted code tasks. Discussions and tutorials emphasize how ASTs support code understanding, patching, and generation, and there’s ongoing interest in how ASTs will interact with large language models and program synthesis pipelines.[1][7]
- Educational and practical material ranges from introductory explanations of what ASTs are and how tooling uses them to deep dives on AST patching and code summarization. It includes videos and blog posts in which practitioners use ASTs for static analysis, code transformation, and code generation.[7]
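To make the static-analysis and transformation points above concrete, here is a minimal sketch using Python's standard-library `ast` module (Python 3.9+ is assumed, for `ast.unparse` and the `list[str]` annotation). It parses a small program, collects callee names with an `ast.NodeVisitor` (analysis), renames a function with an `ast.NodeTransformer` (transformation), and turns the modified tree back into source with `ast.unparse` (code generation). The sample program and the class names `CallCollector` and `RenameFunction` are hypothetical illustrations, not taken from any of the cited resources.

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = """
def area(r):
    return 3.14159 * r * r

def greet(name):
    print("hello", name)
    return name.upper()

greet(str(area(2)))
"""


class CallCollector(ast.NodeVisitor):
    """Static analysis: record the callee name of every call expression."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name):          # plain call: f(...)
            self.calls.append(node.func.id)
        elif isinstance(node.func, ast.Attribute):   # method call: obj.m(...)
            self.calls.append(node.func.attr)
        self.generic_visit(node)  # continue into arguments and nested calls


class RenameFunction(ast.NodeTransformer):
    """Transformation: rename a function definition and every Name spelled like it.

    Purely syntactic: there is no scope analysis, so shadowed names are renamed too.
    """

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # also rewrite references inside the body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


tree = ast.parse(SOURCE)

collector = CallCollector()
collector.visit(tree)
print(collector.calls)  # ['print', 'upper', 'greet', 'str', 'area']

new_tree = RenameFunction("greet", "say_hello").visit(tree)
ast.fix_missing_locations(new_tree)  # harmless here; required when new nodes are created
print(ast.unparse(new_tree))         # code generation: AST back to source text
```

The rename rewrites every `Name` node spelled `greet` regardless of scope, so a production refactoring tool would layer name resolution on top of the raw AST.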
Representative sources you can consult
- Practical explanations of how tokenizing, parsing, and ASTs fit into tooling such as ESLint, Prettier, and Babel, plus notes on generating code from ASTs: “Abstract syntax trees are going to be important in 2024” (video discussion).[1]
- Comparative analyses of AST parsers (JDT, Tree-sitter, srcML, ANTLR) and how their outputs differ in size, depth, and abstraction, with findings on which configurations tend to perform best for code-related tasks: arXiv paper and related PDF/HTML summaries.[2][4]
- General overviews of what ASTs are and how they underpin static analysis and transformations: introductory explainer videos and blog posts from practitioners.[7]
- Broader discussions on ASTs in programming language understanding and their role in model training and evaluation, including notes on how AST structure influences downstream tasks like code search or summarization.[3][4]
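The parser comparison cited above characterizes ASTs by tree size, tree depth, number of unique node types, and number of unique tokens. The sketch below computes analogous numbers with Python's built-in `ast` module. It is only an illustration under stated assumptions: the study compared other parsers (JDT, for instance, targets Java), and the hypothetical `leaf_token` definition and the depth convention (a lone root has depth 1) are choices made here, not necessarily the definitions used in [2][4].

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeStats:
    size: int           # total number of AST nodes
    depth: int          # number of nodes on the longest root-to-leaf path
    unique_types: int   # distinct node classes, e.g. Name, BinOp, Call
    unique_tokens: int  # distinct tokens, as defined by leaf_token() below


def leaf_token(node: ast.AST) -> Optional[str]:
    """Hypothetical token definition: identifiers, parameter names, attribute
    names, definition names, and constant values. An assumption for this
    sketch, not necessarily the metric used in [2][4]."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return node.name
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return None


def tree_depth(node: ast.AST) -> int:
    # A node with no AST children has depth 1; otherwise 1 + its deepest child.
    return 1 + max((tree_depth(c) for c in ast.iter_child_nodes(node)), default=0)


def tree_stats(source: str) -> TreeStats:
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))  # every node, including Load/Store context nodes
    tokens = {tok for tok in map(leaf_token, nodes) if tok is not None}
    return TreeStats(
        size=len(nodes),
        depth=tree_depth(tree),
        unique_types=len({type(n).__name__ for n in nodes}),
        unique_tokens=len(tokens),
    )


print(tree_stats("x = 1"))
# TreeStats(size=5, depth=4, unique_types=5, unique_tokens=2)
```

Python's AST attaches an expression-context node (`Load`, `Store`, or `Del`) to `Name` and several other expression nodes, so even `x = 1` yields five nodes (`Module`, `Assign`, `Name`, `Constant`, `Store`) and a depth of 4. Representation choices like this are one reason the same code can produce trees of quite different size and abstraction level under different parsers.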
If you’d like, I can:
- Pull a brief, up-to-date synthesis focused on a particular language (e.g., JavaScript/TypeScript or Python) and how its typical ASTs are structured in common tooling.
- Create a short annotated reading list with one-sentence takeaways for each resource.
- Generate a simple comparative table showing typical tree-size or abstraction differences among popular parsers (JDT, Tree-sitter, ANTLR, srcML) with caveats.
Would you like me to dive into one of these options or tailor the summary to a specific language or tooling stack?
Sources
- arxiv.org: Based on the extensive experimental results, we conclude the following findings: • The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On … pets require more high-level abstract summaries in code summarization, and code snippets semantically match but contain fewer query...
- arxiv.org: • The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On the contrary, ASTs generated by ANTLR exhibit the largest size and the lowest abstraction level. Tree-sitter and srcML are both intermediate in structure size and abstraction level between JDT and ANTLR. … • Among...
- alan.petitepomme.net: …interpreter, pyre-ast will be able to parse/reject it as well. Furthermore, abstract syntax trees obtained from pyre-ast is guaranteed to 100% match the results obtained by Python's own ast.parse API, down to every AST node and every line/column number.
- www.science.gov: We apply the approach to gradually migrate the schemas of the AUTOBAYES program synthesis system to concrete syntax. Fit experiences show that this can result in a considerable reduction of the code size and an improved readability of the code. In particular, abstracting out fresh-variable generation and second-order term construction allows the formulation of larger continuous fragments and improves the locality in the schemas. … We used the recent grammar of the Arden Syntax v.2.10, and both...
- news.ycombinator.com: ievans on June 7, 2021: It supports many more languages (~17 at various stages of development) and being able to do AST patching as in the original is one of the capabilities we're experimenting with: https://semgrep.dev/docs/experiments/overview/#autofix Would love your feedback!