Current Issue: January-March, Volume: 2026, Issue Number: 1, Articles: 5
With the rapid advancement of large language models (LLMs), agents capable of autonomous perception, decision-making, and action have emerged as a frontier paradigm in artificial intelligence. These entities are transitioning from academic research to complex real-world applications. However, the rapid iteration of agent capabilities poses severe challenges to evaluation methodologies—particularly in assessing their core competencies in data processing and evaluation. As of 2025, the field of agent data evaluation exhibits a dynamic yet fragmented landscape. Traditional static dataset-based evaluations are no longer sufficient to measure agent performance in open, dynamic environments. The research community is actively shifting toward more interactive and realistic benchmarking paradigms. Despite the emergence of innovative benchmarks such as ToolBench and MLAgentBench, there remains a widespread lack of unified evaluation standards, widely accepted metric systems, and mature methodologies. This paper systematically reviews the state of agent data evaluation in 2025, tracing the evolution from traditional metrics to emerging process-oriented ones. Building upon this, we delve into the methodology of dataset and benchmark design, with particular attention to key elements in experimental design, such as controlled experiments, sample size determination, and statistical analysis. Furthermore, we analyze the core challenges facing the field, including the “realism gap” between evaluation and real-world tasks, the scalability dilemma of automated evaluation, and the increasingly prominent issues of data privacy and security. Our findings indicate that although potential technologies such as differential privacy and federated learning exist, dedicated privacy-preserving frameworks for agent evaluation remain in their infancy. 
Finally, this paper outlines future research directions, emphasizing the urgent need to establish unified evaluation frameworks, develop process-oriented evaluation metrics, and formulate standardized privacy and security auditing protocols, with the aim of providing a scientific foundation for building more robust, trustworthy, and responsible agent systems....
Smart contracts power many blockchain applications but are exposed to code-level defects. Existing detection methods do not scale to evolving code, do not represent complex control and data flows, and lack granular, calibrated evidence. To address these concerns, we present a cross-graph matching method for vulnerability detection built on contract-graphs: abstract syntax, control flow, and data flow are fused into a typed, directed contract-graph whose nodes are enriched with pre-trained code embeddings (GraphCodeBERT or CodeT5+). A Graph Matching Network (GMN) with cross-graph attention compares contract-graphs, aligns homologous sub-graphs associated with vulnerabilities, and supports statement-level interpretation, balancing broad structural coverage against discriminative pairwise alignment. The evaluation follows a deployment-oriented protocol with thresholds fixed on validation data, multi-seed averaging, and a conservative estimate of sensitivity under low-false-positive budgets. On SmartBugsWild, the method consistently and markedly exceeds strong rule-based and learning baselines and maintains higher sensitivity at matched false-positive rates; ablations trace the gains to multi-graph fusion, pre-trained encoders, and cross-graph matching, and the gains are stable across seeds....
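The evaluation protocol described in this abstract (a threshold fixed on validation data, then sensitivity reported under a false-positive budget) can be illustrated with a minimal sketch. This is not the authors' code: the function names `rates` and `pick_threshold`, the convention "predict vulnerable when score >= threshold", and the toy scores are all assumptions made for illustration.

```python
def rates(scores, labels, threshold):
    """Return (sensitivity, false-positive rate) when predicting 1 for score >= threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / pos, fp / neg


def pick_threshold(val_scores, val_labels, max_fpr):
    """Lowest observed validation score whose FPR stays within the budget.

    Returns +inf (predict nothing as positive) if no observed score qualifies.
    """
    best = float("inf")
    # FPR can only grow as the threshold drops, so stop at the first violation.
    for t in sorted(set(val_scores), reverse=True):
        _, fpr = rates(val_scores, val_labels, t)
        if fpr > max_fpr:
            break
        best = t
    return best


# Toy (hypothetical) scores: the threshold is chosen on validation data only,
# then applied unchanged to the test split.
val_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
val_labels = [1, 1, 0, 1, 0, 0, 0, 0]
test_scores = [0.95, 0.65, 0.5, 0.3, 0.62, 0.1]
test_labels = [1, 1, 1, 0, 0, 0]

threshold = pick_threshold(val_scores, val_labels, max_fpr=0.2)
test_tpr, test_fpr = rates(test_scores, test_labels, threshold)
print(threshold, test_tpr, test_fpr)
```

Under multi-seed averaging, one would repeat this per training seed and report the mean test sensitivity. Note that the test false-positive rate may exceed the validation budget, which is one reason a conservative estimate is useful.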
Background: Passive brain–computer interface (pBCI) systems use a combination of electroencephalography (EEG) and machine learning (ML) to evaluate a user’s cognitive and physiological state, with increasing applications in both clinical and non-clinical scenarios. pBCI systems have been limited by their traditional reliance on sensor technologies that cannot easily be integrated into non-laboratory settings where pBCIs are most needed. Advances in textile-electrode-based EEG show promise in overcoming the operational limitations; however, no study has demonstrated their use in pBCIs. This study presents the first application of fully textile-based EEG for pBCIs in differentiating cognitive states. Methods: Cognitive state comparisons between eyes-open (EO) and eyes-closed (EC) conditions were conducted using publicly available data for both novel textile and traditional dry-electrode EEG. EO vs. EC differences across both EEG sensor technologies were assessed in delta, theta, alpha, and beta EEG power bands, followed by the application of a Support Vector Machine (SVM) classifier. The SVM was applied to each EEG system separately and in a combined setting, where the classifier was trained on dry EEG data and tested on textile EEG data. Results: The textile EEG system accurately captured the characteristic increase in alpha power from EO to EC (p < 0.01), but power values were lower than those of dry EEG across all frequency bands. Classification accuracies for the standalone dry and textile systems were 96% and 92%, respectively. The cross-sensor generalizability assessment resulted in a 91% classification accuracy. Conclusions: This study presents the first use of textile-based EEG for pBCI applications. Our results indicate that textile-based EEG can reliably capture changes in EEG power bands between EO and EC, and that a pBCI system utilizing non-traditional textile electrodes is both accurate and generalizable....
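The band-power comparison described in this abstract can be sketched as follows. This is an illustrative sketch, not the study's pipeline: the band edges in `BANDS`, the simple periodogram estimator, the 250 Hz sampling rate, and the synthetic signals (a 10 Hz sinusoid in noise standing in for eyes-closed alpha activity) are all assumptions, since the abstract does not specify them.

```python
import numpy as np

# Assumed band edges in Hz; conventions vary between studies.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_powers(signal, fs):
    """Mean periodogram power per band for a 1-D signal sampled at fs Hz."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].mean())
        for name, (lo, hi) in BANDS.items()
    }


# Synthetic stand-ins for one EEG channel (not real data).
fs = 250
t = np.arange(4 * fs) / fs  # 4 seconds
rng = np.random.default_rng(0)
eyes_open = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
eyes_closed = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)

eo = band_powers(eyes_open, fs)
ec = band_powers(eyes_closed, fs)
print("alpha EO:", eo["alpha"], "alpha EC:", ec["alpha"])
```

Per-band features like these could then be passed to a classifier such as an SVM, trained on one sensor type and tested on the other to assess cross-sensor generalizability.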
This study investigates machine learning approaches for the real-time detection of phishing URLs, a critical cybersecurity concern. The dataset used in this work was obtained from the University of California, Irvine (UCI) Machine Learning Repository. It contains 235,795 instances with fifty-four distinct features and a binary class label. We evaluated a range of algorithms, including k-nearest neighbor, naive Bayes, decision trees, random forests, and random tree, to assess the discriminative characteristics extracted from URLs. The random forest classifier outperformed the other classifiers, reaching the highest accuracy of 99.99%. The study demonstrates that these models achieve superior accuracy in identifying phishing attempts, significantly outperforming traditional detection methodologies. The findings underscore the potential of machine learning to provide a scalable, efficient, and robust solution for real-time phishing detection. Integrating these models into existing security solutions will play a critical role in sustaining defenses against continuously evolving and persistent phishing schemes....
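To make "discriminative characteristics extracted from URLs" concrete, here is a minimal sketch of lexical URL feature extraction using only the Python standard library. The feature names below are hypothetical examples chosen for illustration; they are not claimed to match the fifty-four features of the dataset used in the study.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ipv4(host):
    """True if the host is a literal IPv4 address rather than a domain name."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Hypothetical lexical features of the kind a URL classifier might use."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ipv4": host_is_ipv4(host),
    }


for example in ("https://www.example.com/index.html",
                "http://192.0.2.1/login@secure"):
    print(example, url_features(example))
```

Feature dictionaries like these would be converted to numeric vectors and passed to a classifier such as a random forest. Because they need no network lookups, they are cheap enough for real-time use.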
Self-powered sensors that combine artificial intelligence (AI) with triboelectric nanogenerators (TENGs) can support real-time driving monitoring systems. Here, we present a TENG-based, self-powered intelligent steering wheel that detects hand gripping. The TENG serves as the steering wheel’s smart surface. The intelligent steering wheel monitors grip in real time and responds quickly. Used in conjunction with machine learning, the TENG sensor can detect hazardous conditions and reduce processing demands while retaining high identification accuracy. With AI integrated, the TENG sensor can offer an accurate and affordable monitoring solution for smart driving....