PodParley

Ep. 259 - June 9, 2024

An episode of the TechcraftingAI NLP podcast, hosted by Brad Edwards, titled "Ep. 259 - June 9, 2024", was published on June 11, 2024, and runs 37 minutes.

June 11, 2024 · 37m · TechcraftingAI NLP


ArXiv NLP research for Sunday, June 09, 2024.


00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

11:20: QGEval: A Benchmark for Question Generation Evaluation

12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

18:14: Hidden Holes: topological aspects of language models

19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain

26:27: Are Large Language Models Actually Good at Text Style Transfer?

27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator

28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction

30:12: Why Don't Prompt-Based Fairness Metrics Correlate?

31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

33:12: Semisupervised Neural Proto-Language Reconstruction

34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization

35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training

36:07: ThaiCoref: Thai Coreference Resolution Dataset
