Performance
On a benchmark where we need to validate and normalize thousands of URLs found on popular websites, we find that Ada can be several times faster than popular competitors (system: Apple MacBook 2022 with LLVM 14).
Ada has improved the performance of the popular JavaScript environment Node.js.
Available datasets
ada-url/url-various-datasets
These are collections of URLs for benchmarking purposes. Disclaimer: This repository is developed and released for research purposes only.
files/node_files.txt
- Contains all source files from a given Node.js snapshot as URLs (43415 URLs).
files/linux_files.txt
- Contains all files from a Linux system as URLs (169312 URLs).
wikipedia/wikipedia_100k.txt
- Contains 100k URLs from a snapshot of all Wikipedia articles as URLs (March 6th 2023).
others/kasztp.txt
- Contains test URLs from URL_Shortener (MIT License) (48009 URLs).
others/userbait.txt
- Contains test URLs from phishing_sites_detector (unknown copyright) (11430 URLs).
top100/top100.txt
- Contains unique URLs extracted from a crawl of the 100 most visited websites (98000 URLs).
ada-url/url-dataset
This repository crawls the 100 most visited websites and extracts unique URLs to generate a dataset of real-world URL examples. The script creates an out.txt file in which each line contains a different URL.
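Because the dataset is a plain text file with one URL per line, loading it takes only a few lines of code. The following is a minimal Python sketch, not part of the repository: the function name `load_unique_urls` is hypothetical, and the deduplication step is an assumption added as a safety net rather than something the crawler script is known to require.

```python
from typing import List


def load_unique_urls(path: str) -> List[str]:
    """Read a one-URL-per-line file and return the URLs in first-seen order.

    Blank lines are skipped and repeated URLs are dropped (an assumption:
    the file is described as already containing different URLs per line).
    """
    seen = set()
    urls: List[str] = []
    # errors="replace" keeps the loader from failing on stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

A call such as `load_unique_urls("out.txt")` would then return the list of URLs to feed into a benchmark.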
Resources
ada_analysis
Introduction
Repository for data analysis.
The files follow the CSV format. For each URL in a set, on a given system, we include the number of cycles and instructions needed to process the URL as well as many other attributes of the URL, including its protocol type, the length of its path and so forth. You can open the CSV files in a spreadsheet tool.
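The CSV files can also be explored programmatically. The sketch below uses only the Python standard library to average a numeric column per protocol type. The column names (`url`, `protocol_type`, `path_length`, `cycles`, `instructions`) and the sample rows are made up for illustration; check the header of the actual files before adapting it.

```python
import csv
import io
from statistics import mean
from typing import Dict

# Illustrative data only: hypothetical column names and invented numbers.
SAMPLE = """url,protocol_type,path_length,cycles,instructions
https://example.com/a,https,2,1200,3100
http://example.org/,http,1,800,2000
https://example.com/b/c,https,4,1500,3900
"""


def mean_by_protocol(csv_text: str, column: str) -> Dict[str, float]:
    """Average a numeric column (e.g. "cycles") grouped by protocol type."""
    groups: Dict[str, list] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["protocol_type"], []).append(float(row[column]))
    return {protocol: mean(values) for protocol, values in groups.items()}
```

For a real file, the text could be read with `open(path, newline="").read()` and passed to `mean_by_protocol` together with the actual column name.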
The big_url_set is our default (github//ada-url/url-dataset/out.txt).
We process each URL 30 times, but not in sequence. We record the time needed to generate the normalized URL (href).
The benchmark is done using model_bench. It only works under Linux because only under Linux can we get the fine-grained precision we need to benchmark individual URLs.
We do not report the timings (ns) for precision reasons. Only the number of cycles and the number of instructions are reported.
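One reading of "not in sequence" is that the whole set is traversed 30 times, rather than each URL being repeated 30 times back to back. The Python sketch below illustrates that ordering only; it is not model_bench. The names `interleaved_samples`, `process` and `counter` are hypothetical, `urllib.parse.urlsplit` stands in for the Ada parser, and `time.perf_counter_ns` stands in for the hardware cycle and instruction counters that the real benchmark reads under Linux.

```python
import time
from typing import Callable, Iterable, List
from urllib.parse import urlsplit

REPETITIONS = 30  # from the text: each URL is processed 30 times


def interleaved_samples(
    urls: Iterable[str],
    process: Callable[[str], object] = urlsplit,      # stand-in for the parser
    counter: Callable[[], int] = time.perf_counter_ns,  # stand-in for cycle counts
    repetitions: int = REPETITIONS,
) -> List[List[int]]:
    """Return one list of `repetitions` measurements per URL.

    The outer loop runs over repetitions and the inner loop over URLs, so
    consecutive measurements never target the same URL (assumed ordering).
    """
    url_list = list(urls)
    samples: List[List[int]] = [[] for _ in url_list]
    for _ in range(repetitions):
        for index, url in enumerate(url_list):
            start = counter()
            process(url)
            samples[index].append(counter() - start)
    return samples
```

Interleaving the URLs this way keeps a single URL from being measured only while its data is still warm from the previous repetition, which is one plausible motivation for the design.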
js_url_benchmark
Introduction
Runs the same benchmark in the latest Bun and Deno, as well as Node.js v16, v17, v18 and v20.
Benchmarks
JavaScript URLs
Benchmarks and results are available through the js_url_benchmark repository in the ada-url organization.