
Meeting with Tommy

Problem with RTL the LLM may have already seen

  • Even if the LLM hasn't seen any of the assertions, having seen the RTL itself may have unintended consequences.
  • Can we introduce a methodology to generate these RTL samples?
  • Something like Csmith? An "RTLSmith"?
  • Handcrafting the benchmarks isn't really scalable, and we would have to release the benchmarks anyway
  • We definitely don’t want any test contamination
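To make the "RTLSmith" idea concrete, here is a minimal sketch of a Csmith-style generator for RTL. Everything in it is hypothetical: the module name `rtlsmith_<seed>`, the operator set, and the parameters are illustrative assumptions, not an existing tool. It emits a small random combinational Verilog module from a seed, so each benchmark is fresh (not something the LLM could have seen) yet reproducible from its seed.

```python
import random

# Hypothetical "RTLSmith" sketch: Csmith's idea applied to RTL.
# Only combinational logic and a handful of operators are generated here.
BINARY_OPS = ["&", "|", "^", "+", "-"]

def random_expr(rng: random.Random, inputs: list[str], depth: int) -> str:
    """Build a random expression tree over the input signals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(inputs)  # leaf: a plain input signal
    op = rng.choice(BINARY_OPS)
    left = random_expr(rng, inputs, depth - 1)
    right = random_expr(rng, inputs, depth - 1)
    return f"({left} {op} {right})"

def generate_module(seed: int, n_inputs: int = 3, n_outputs: int = 2,
                    width: int = 8, max_depth: int = 3) -> str:
    """Return Verilog source for a random module; the same seed gives the same module."""
    rng = random.Random(seed)  # private RNG so output depends only on the seed
    inputs = [f"in{i}" for i in range(n_inputs)]
    outputs = [f"out{i}" for i in range(n_outputs)]
    lines = [f"module rtlsmith_{seed}({', '.join(inputs + outputs)});"]
    for name in inputs:
        lines.append(f"  input  [{width - 1}:0] {name};")
    for name in outputs:
        lines.append(f"  output [{width - 1}:0] {name};")
    for name in outputs:
        lines.append(f"  assign {name} = {random_expr(rng, inputs, max_depth)};")
    lines.append("endmodule")
    return "\n".join(lines)

if __name__ == "__main__":
    print(generate_module(seed=42))
```

A real generator would also need sequential logic, FSMs, and some guarantee that the designs are "interesting" enough to mine assertions from; seeding is what would let us release a generator plus seeds instead of a fixed, contaminable benchmark set.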

Metrics

  • We can follow the lead of the other papers and just use proven %, COI (cone-of-influence) coverage, etc.
  • But can we construct something new?
  • Are all kinds of coverage created equal?
    • Some coverage has more bug-finding/bug-preventing capability than others
  • How do we compare assertion sets that achieve equal coverage?
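As a baseline for the "new metrics" discussion, here is a sketch of the two existing metrics. It assumes a hypothetical `fanin` map (signal → signals it is computed from) extracted from the RTL, and a per-assertion proven/unproven verdict from the formal tool. COI coverage is taken here as the share of design signals that fall in the union of the proven assertions' cones of influence; exact definitions vary between papers and tools.

```python
from dataclasses import dataclass

@dataclass
class Assertion:
    name: str
    signals: set[str]  # design signals the assertion refers to
    proven: bool       # verdict reported by the formal tool (assumed input)

def cone_of_influence(signal: str, fanin: dict[str, set[str]]) -> set[str]:
    """Transitive fan-in of `signal`, including the signal itself."""
    seen = {signal}
    stack = [signal]
    while stack:
        for src in fanin.get(stack.pop(), set()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

def proven_percent(assertions: list[Assertion]) -> float:
    """Share of mined assertions that the formal tool proved."""
    if not assertions:
        return 0.0
    return 100.0 * sum(a.proven for a in assertions) / len(assertions)

def coi_coverage(assertions: list[Assertion], fanin: dict[str, set[str]],
                 design_signals: set[str]) -> float:
    """Share of design signals inside the union of the proven assertions' COIs."""
    covered: set[str] = set()
    for a in assertions:
        if a.proven:
            for s in a.signals:
                covered |= cone_of_influence(s, fanin)
    if not design_signals:
        return 0.0
    return 100.0 * len(covered & design_signals) / len(design_signals)
```

Both numbers count every covered signal the same, which is exactly the gap raised above: two assertion sets can reach identical COI coverage while differing a lot in how many bugs they would catch.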

ChatSVA

  • Read this paper
  • Figure out what their Function Coverage metric is all about

Templates?

  • So far we have only been working on unguided mining; what if we want to guide the mining somehow?
  • Look up Matt Dwyer’s LTL patterns
  • We have to be thinking about feedback loops
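One way guided mining could look, sketched under assumptions: instead of mining freely, instantiate a few property specification patterns from Dwyer et al. (global-scope LTL mappings) over candidate signals, then keep only the candidates that some checker accepts. The `holds` callback is a hypothetical stand-in for a formal tool or trace check, and it is also where a feedback loop could plug in (e.g. re-prompting or re-ranking based on which candidates fail).

```python
from itertools import permutations
from typing import Callable

# Global-scope LTL mappings of a few Dwyer et al. patterns ("W" = weak until).
PATTERNS = {
    "absence":      "G(!{p})",
    "existence":    "F({p})",
    "universality": "G({p})",
    "precedence":   "(!{p} W {s})",     # s precedes p
    "response":     "G({p} -> F {s})",  # s responds to p
}
UNARY = {"absence", "existence", "universality"}

def instantiate(signals: list[str], patterns: list[str]) -> list[tuple[str, str]]:
    """Expand the chosen templates over the candidate signals."""
    out: list[tuple[str, str]] = []
    for name in patterns:
        tmpl = PATTERNS[name]
        if name in UNARY:
            out += [(name, tmpl.format(p=p)) for p in signals]
        else:  # two-signal patterns: every ordered pair of distinct signals
            out += [(name, tmpl.format(p=p, s=s)) for p, s in permutations(signals, 2)]
    return out

def guided_mine(signals: list[str], patterns: list[str],
                holds: Callable[[str], bool]) -> list[tuple[str, str]]:
    """Keep only the candidates the (hypothetical) checker reports as holding."""
    return [(n, f) for n, f in instantiate(signals, patterns) if holds(f)]
```

For example, `instantiate(["req", "gnt"], ["response"])` yields `G(req -> F gnt)` and `G(gnt -> F req)`; choosing which patterns and signals to expand is the "guidance".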

Next Steps

  • Read FAVA
    • Attempt to connect our work with theirs
  • Read ChatSVA
  • Try to come up with new metrics
  • Read up on what to do after Periscope