Exploring the Challenges of Batch Processing in Large Models
In exploring applications of large models, batch processing marks a crucial turning point: many novel and valuable phenomena only surface once you move from one-off prompts to batch runs. An earlier post on batch processing drew enthusiastic attention, with readership unexpectedly exceeding 1,000. That encouragement prompts this deeper look at the problems encountered during batch processing, particularly in the field of publication translation.
To begin with, it's essential to distinguish two core concepts: system prompts and user prompts. Simply put, the system prompt carries the requirements we give the model, telling it how to fulfill our needs; the user prompt is the text content handed to the model for processing. In platforms like Dify/Coze, the prompt you fill in when adding a large model node is the system prompt, which governs how that node processes its input.
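The distinction maps directly onto message roles in a chat API. Here is a minimal sketch using an OpenAI-style client; the client setup and model name are assumptions for illustration only:

```python
# A minimal sketch of the two prompt types in an OpenAI-style chat API.
# The client setup and model name are placeholders, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # System prompt: tells the model HOW to process the input.
        {"role": "system",
         "content": "You are a translator. Translate the user's text "
                    "into English, preserving paragraph structure."},
        # User prompt: the actual text to be processed.
        {"role": "user", "content": "<paragraph to translate>"},
    ],
)
print(response.choices[0].message.content)
```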
Designing system prompts, however, presents a dilemma. To improve output quality we constrain the model with the prompt, but those same constraints narrow the scenarios the system prompt can handle, making it hard to accommodate complex, varied inputs. In batch processing, a fixed system prompt is typically paired with many different user prompts. Because the user prompts vary so widely, the system prompt may fit some specific "texts" poorly, leading to erroneous model output.
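The pairing looks roughly like the sketch below, where one fixed system prompt is reused across the whole batch; `call_model` is a hypothetical wrapper around whichever chat API is in use:

```python
# Sketch of the batch pattern: one fixed system prompt reused across
# many different user prompts. `call_model` is a hypothetical wrapper.
SYSTEM_PROMPT = (
    "Translate the following passage into English. "
    "Output only the translation, with no commentary."
)

def translate_batch(segments, call_model):
    results = []
    for segment in segments:
        # The same system prompt meets very different texts here; a
        # segment it was never designed for is where errors creep in.
        results.append(call_model(system=SYSTEM_PROMPT, user=segment))
    return results
```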
As for slicing text for batch processing: dividing by character count looks straightforward but fragments the article, and even adding context and reassembling afterwards gives unsatisfactory results. Dividing by sentence yields output with a strong machine-translation flavor, and in practice runs into odd edge cases, such as mistaking decimal points for sentence-ending periods. Dividing by natural paragraphs works comparatively well: paragraphs are usually complete contextual units with fixed delimiters, and although special cases still cause trouble, overall performance is acceptable. Processing an entire article as a single unit, on the other hand, invites content compression, rewriting, and structural chaos, and is inadvisable.
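Since paragraph-level slicing comes out best, here is a minimal sketch of it; the blank-line delimiter is an assumption that depends on the source format:

```python
# A sketch of paragraph-based slicing: split on blank lines, keeping
# each natural paragraph as one complete contextual unit.
import re

def split_paragraphs(text):
    # Treat one or more blank lines as the paragraph delimiter.
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]

# Character-count slicing, by contrast, can cut mid-sentence, and
# sentence slicing can misread "3.14" as a sentence boundary; the
# paragraph approach sidesteps both problems.
```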
In batch processing practice, two common modes emerge. The multi-level vertical mode chains calls to reach a final result: a single user prompt is passed through multiple system prompts in sequence, each stage consuming the previous stage's output. The intermediate outputs are throwaway; only the final result matters. Handling such a pipeline in a chatbot is tedious, requiring pre-written system prompts kept in a file and pasted in by hand. Worse, judgment errors and hallucinations at any stage accumulate down the chain, undermining the stability of the final result.
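In code, the vertical mode is a simple fold over a list of system prompts. The pipeline stages below are made-up examples, and `call_model` is again a hypothetical wrapper:

```python
# A sketch of the multi-level vertical mode: one user prompt flows
# through a chain of system prompts, each stage consuming the previous
# stage's output. Stage wording is illustrative only.
PIPELINE = [
    "Translate the following text into English.",
    "Proofread the following translation and correct any errors.",
    "Polish the following text for fluency without changing its meaning.",
]

def run_pipeline(text, call_model):
    current = text
    for system_prompt in PIPELINE:
        # A hallucination at any stage propagates to every later stage,
        # which is how errors accumulate in vertical chains.
        current = call_model(system=system_prompt, user=current)
    return current  # only this final result is kept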
Parallel processing of similar tasks is more common: the same treatment applied to many similar texts, such as keyword extraction or segment-by-segment translation (a sketch of this mode closes the post). Here, a single fixed system prompt struggles to handle every segment well, so designing it means balancing output quality against universality. In real business scenarios the two modes intertwine, problems accumulate and amplify, and large models end up "wise in single operations, poor in batch operations." Viewed from another angle, though, these challenges point toward future research: overcoming them step by step and optimizing large models' performance on batch processing tasks.
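To make the parallel mode concrete, here is a minimal sketch under the same assumptions as before; `call_model` remains a hypothetical wrapper, and the keyword-extraction prompt is illustrative:

```python
# A sketch of the parallel mode: one fixed system prompt applied
# concurrently to many similar segments.
from concurrent.futures import ThreadPoolExecutor

SYSTEM_PROMPT = "Extract 3 to 5 keywords from the following passage."

def process_parallel(segments, call_model, workers=4):
    def worker(segment):
        return call_model(system=SYSTEM_PROMPT, user=segment)
    # pool.map preserves input order, so results align with segments
    # and the processed pieces can be reassembled afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, segments))
```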