The “Lokal” project aimed to make Telugu news more accessible and engaging by generating summaries and titles directly in Telugu. Given a diverse dataset of Telugu news articles, our goal was to create concise summaries and attention-grabbing headlines that captured the essence of each article.
Technology Choices and Implementation
For this project, Claude 3.5 Sonnet on Amazon Bedrock was chosen due to its sophisticated language generation capabilities, well-suited to handle complex languages like Telugu. The process centred around prompt engineering, where we carefully crafted prompts to guide the model in generating high-quality outputs. Using prompt engineering, we refined Claude 3.5 Sonnet’s responses to meet specific guidelines (details below), which focused on producing accurate and engaging headlines and summaries.
Guidelines Followed
The following guidelines were crucial to the project’s success:
- Headline Guidelines: Headlines were to be crisp and creative, using 4-7 words to summarize the article without repeating phrases. Headlines needed appropriate punctuation and grammatical accuracy and had to appeal to the audience by beginning with key facts, whether they were presented in a straightforward, curiosity-driven, question-based, or emotional format. Additionally, misleading headlines and spelling errors were strictly avoided to ensure clarity and trustworthiness.
- Content Guidelines: Summaries were limited to 400 characters or 60 words, focusing on conciseness, accuracy, and structure. Summaries followed the 5Ws and H (Where, When, What, Who, Why, How) framework, emphasizing event details, timing, and key figures without redundancy or repetition. If content was sensitive, the source was included to ensure transparency.
These guidelines drove our prompt engineering, ensuring that each generated summary and headline aligned with the project’s vision for engaging, reliable, and culturally relevant news content.
Prompt Engineering and Key Design Decisions
A primary challenge in this project was tailoring Claude 3.5 Sonnet’s outputs to achieve the headline and content standards while maintaining readability and relevance. The entire project revolved around prompt engineering, requiring iterative refinement of prompts to strike a balance between brevity, clarity, and the complexity of Telugu language structure. This approach allowed us to achieve precise summaries and headlines while aligning with the customer expectations.
Technical Implementation
Here’s the flow of approach:
Stage 1: Summary Generation
Our summary generation prompt was carefully engineered to maintain journalistic standards using article as input. Here’s a key excerpt from summary generation prompt:
Summary Guidelines:
– Write a concise news summary in Telugu (50-70 words)
– If article is less than 400 characters, rewrite without further shrinking
– Capture the essence while including key facts: who, what, when, where, why, and how
– Retain critical numbers, statistics, and information accurately
– Use journalistic style: clear, concise, and objective
– Ensure proper Telugu grammar and spelling accuracy
– Double-check words with similar-sounding letters (బ/భ, ప/ఫ, శ/ష)
– Ensure proper use of vowel marks (గుణింతాలు) and consonant combinations (ఒత్తులు)
Stage 2: Headline Generation
The headline generation phase used the summary as input, following a structured approach based on content type. Here’s the core logic from headline generation prompt:
If summary contains notable quote/statement:
Format: “Direct Quote : Speaker’s Name”
Example: “రైతులు దేశానికి వెన్నెముక” : ప్రధాని మోదీ
For general news:
– Create concise headline in 5-9 words
– Avoid colons and comma-separated phrases
Example: “సెప్టెంబర్ 27న ‘దేవర’ చిత్రం విడుదల”
Stage 3: Quality Check
The final stage involved a comprehensive quality check by the model by passing generated summary and headline focusing on multiple aspects:
Here’s a key snippet of prompt used in quality check:
- Grammar and Spelling:
– Match verbs to subject’s gender, number, and tense
– Apply proper case endings for nouns and pronouns
– Ensure gender agreement between nouns and modifiers
– Follow sandhi rules for word joining
– Maintain consistent tenses
– Use postpositions correctly
– Apply appropriate respect levels in verbs
- Content Quality Checks:
– Completeness: Capture full essence of main story
– Word Flow: Natural, flowing Telugu language
– Main Point: Highlight most critical aspect
– Meaningfulness: Each word contributes to clear message
– Relevance: Direct relation to article content
– Conciseness: Single, impactful phrase for headline
– Accuracy: Avoid misleading information
Through this three-stage process, we ensured the quality and authenticity of Telugu news content. Following this methodology enabled us to deliver accurate, clear, and reliable summary and headlines.
Global Relevance and Scalability
While this project focused on Telugu, our approach to prompt engineering and language generation can extend to other languages, showcasing the potential of AI-driven content generation for local news worldwide. Claude 3.5 Sonnet’s versatility allows this solution to scale across languages, supporting accessibility to regional content on a global scale.
Key Takeaways and Lessons Learned
- AI for Language Accessibility: Claude 3.5 Sonnet’s capabilities demonstrated the power of AI in promoting language inclusivity, making local news more accessible and engaging for Telugu-speaking audiences.
- Focused Use of Prompt Engineering: This project underscored the effectiveness of prompt engineering as a powerful tool to align AI outputs with specific requirements, eliminating the need for additional deployment or support services.
- Cultural Relevance: Creating summaries and headlines in a regional language meant considering the local culture. This showed how careful prompt design helps ensure the content connects with the audience in a meaningful way.