2026-06-25

Email about bridging RTL

thangellamudiBridgingRTLAssertion2026's dataset is not the same as the liuRTLCoderFullyOpenSource2025 dataset.
Furthermore, thangellamudiBridgingRTLAssertion2026 does not contain verbatim source code from luRTLLMOpenSourceBenchmark2023, though it does contain examples with similar module names.
liuRTLCoderFullyOpenSource2025 already demonstrated that training on their dataset helps small models rival the performance of frontier models on the luRTLLMOpenSourceBenchmark2023 benchmark.
However, they include a crucial step: before training, they use the Rouge-L metric to compute the similarity between the dataset examples and the benchmark, and they remove every example that exceeds a certain threshold. In their case, this removed 100 examples. Only then do they train their models.
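The filtering step above can be sketched roughly as follows. This is a minimal re-implementation of ROUGE-L (F1 over the longest common subsequence of tokens) written for illustration, not the liuRTLCoderFullyOpenSource2025 code. Whitespace tokenization, scoring each example by its best match against any benchmark item, and the 0.5 threshold in the usage lines are all assumptions, and the helper names (lcs_length, rouge_l_f1, decontaminate) are hypothetical.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (the tokenization is an assumption)."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def decontaminate(dataset: List[str], benchmark: List[str], threshold: float) -> List[str]:
    """Drop every example whose best ROUGE-L score against any benchmark item exceeds threshold."""
    kept = []
    for example in dataset:
        best = max((rouge_l_f1(example, ref) for ref in benchmark), default=0.0)
        if best <= threshold:
            kept.append(example)
    return kept


# Usage with toy data; the 0.5 threshold is hypothetical, not the value used by the paper.
benchmark = ["module adder(input a, input b, output s); assign s = a ^ b; endmodule"]
dataset = [
    "module adder(input a, input b, output s); assign s = a ^ b; endmodule",
    "module counter(input clk, output reg [3:0] q); always @(posedge clk) q <= q + 1; endmodule",
]
kept = decontaminate(dataset, benchmark, threshold=0.5)
print(len(kept))  # 1: the verbatim copy of the benchmark module is removed
```

The pairwise comparison is quadratic in the number of example/benchmark pairs, so for a large dataset an existing ROUGE library or some parallelism would be more practical.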

My conclusions

I think thangellamudiBridgingRTLAssertion2026 used the technique suggested in liuRTLCoderFullyOpenSource2025 to generate their own dataset. I looked through some examples, and they seem AI-generated; I am fairly certain no human would ever name their modules like this.
The SVADataset sometimes doesn't contain any assertions.
Sometimes the assertions are of the form assert (WIDTH>0) else fatal(), a simple check on a parameter that is closer to a C assert than to what we are looking for.
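To quantify these two observations on the next pass, a rough keyword heuristic like the one below could bucket each SVADataset example. It is a sketch under stated assumptions: it assumes that what we are looking for are concurrent SVA assertions written as assert property, treats any other assert as an immediate, C-style check, only strips comments rather than parsing SystemVerilog, and classify_assertions is a hypothetical helper, not part of any dataset tooling.

```python
import re

# Hypothetical heuristic, not a SystemVerilog parser: it only looks for keywords.
BLOCK_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT_RE = re.compile(r"//[^\n]*")
# Concurrent SVA assertions use "assert property" (an optional label such as "a1:" may precede it).
CONCURRENT_RE = re.compile(r"\bassert\s+property\b")
# Any "assert" keyword, which also catches immediate assertions like "assert (WIDTH > 0) else ...".
ANY_ASSERT_RE = re.compile(r"\bassert\b")


def classify_assertions(source: str) -> str:
    """Return "concurrent", "immediate_only", or "none" for one RTL example."""
    # Strip block comments first, then line comments, so commented-out asserts are ignored.
    code = LINE_COMMENT_RE.sub("", BLOCK_COMMENT_RE.sub("", source))
    if CONCURRENT_RE.search(code):
        return "concurrent"
    if ANY_ASSERT_RE.search(code):
        return "immediate_only"
    return "none"


# Usage on toy snippets (illustrative, not taken from the dataset).
examples = {
    "immediate": "module m #(parameter WIDTH = 8) (); initial assert (WIDTH > 0) else $fatal(1, \"bad width\"); endmodule",
    "concurrent": "module m (input clk, req, ack); a1: assert property (@(posedge clk) req |-> ##1 ack); endmodule",
    "commented": "module m (); // assert property (foo)\nendmodule",
}
for name, src in examples.items():
    print(name, classify_assertions(src))
```

Counting the "none" and "immediate_only" buckets over the whole dataset would turn "sometimes" into a concrete fraction, though any borderline cases would still need a manual look.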
I do think the performance is real: liuRTLCoderFullyOpenSource2025 gets good results with Mistral and DeepSeek. But I don't know why the authors used Llama 2 in 2026.
If even Llama 2 performs well, this may indicate that the benchmark is too weak.
I will make another pass to be sure about the details.