| Most existing work on profiling focuses on individual users, which are minimal, indivisible units. Such work mainly analyzes the characteristics of users’ behavior in order to tag them. Other work analyzes groups of users, e.g. community detection, whose central concern is the membership of communities formed within a network. However, communities tend to have blurrier boundaries than organizations, and different algorithms usually produce different communities. This motivates us to ask how a systematic organization profile can be generated and used in applications for organizations, and leads us to propose the concept of organization profiling. To the best of our knowledge, this paper is the first study of organization profiling. Organization profiling refers to the process of analyzing the intrinsic attributes, dynamic behaviors and changes of organizations, and extracting the differences between organizations as tags. Here, an organization together with all the members belonging to it is treated as a single entity. This paper provides a definition of organization profiling and its basic framework. It classifies the internal entities of an organization into two parts: members and attributes. The attributes are further divided into dynamic and static ones. Furthermore, it defines the three phases of organization profiling, namely data acquisition, information fusion and information excavation. In this paper we focus on three dynamic attributes of organization profiling, treating each separately. A feasibility study and a practical implementation are carried out for core member and community detection, relation extraction, and organization interest discovery. Some algorithms are adapted and some new ones are proposed in this part. In the first part, core discovery and community detection, we propose a core-member-based community detection algorithm, which adopts the double-layer BGLL algorithm and introduces core members into its second layer. We implemented the algorithm for community detection and analyzed organizations such as schools and universities. In the second part, relation extraction, we propose an open entity relation extraction method based on parse trees. We improve the original location-restricted open entity relation extraction method and enhance the accuracy of relation extraction. The method is suitable for entity relation extraction on massive data. We carried out experiments on news data crawled from the Internet, with the computation parallelized on Hadoop. We also applied the method in a relation extraction project for a university information portal. In the last part, interest discovery, we transform the task of interest discovery into a document clustering task. A variant of LDA, the author-topic model, is used to discover interests at the level of labs. This paper initiates the study of a methodology for organization profiling, and generating a perfect profile for an organization is non-trivial. We therefore concentrate, briefly, on feasibility. Further study is strongly needed, so we present our work as a starting point and invite more researchers to join the study of organization profiling. |
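
The framework described above (an organization treated as one entity made up of members plus static and dynamic attributes, with tags extracted as the differences between organizations) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `OrganizationProfile` class, its field names, the `distinguishing_tags` helper and the sample labs are hypothetical and do not come from the paper, and set difference is merely one simple way to express "differences as tags".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OrganizationProfile:
    """Hypothetical container: an organization and its members as one entity."""
    name: str
    members: Set[str] = field(default_factory=set)
    # Static attributes: intrinsic properties that rarely change.
    static_attributes: Dict[str, str] = field(default_factory=dict)
    # Dynamic attributes: e.g. core members, relations, interests.
    dynamic_attributes: Dict[str, Set[str]] = field(default_factory=dict)


def distinguishing_tags(
    target: OrganizationProfile, others: List[OrganizationProfile]
) -> Dict[str, Set[str]]:
    """Return the dynamic attribute values of `target` that no other
    organization shares (an assumed, simplistic notion of a tag)."""
    tags: Dict[str, Set[str]] = {}
    for key, values in target.dynamic_attributes.items():
        seen_elsewhere: Set[str] = set()
        for other in others:
            seen_elsewhere |= other.dynamic_attributes.get(key, set())
        unique = values - seen_elsewhere
        if unique:
            tags[key] = unique
    return tags


# Illustrative, made-up data for two labs.
lab_a = OrganizationProfile(
    name="Lab A",
    members={"alice", "bob"},
    static_attributes={"type": "university lab"},
    dynamic_attributes={
        "interests": {"topic models", "clustering"},
        "core_members": {"alice"},
    },
)
lab_b = OrganizationProfile(
    name="Lab B",
    members={"carol", "dave"},
    static_attributes={"type": "university lab"},
    dynamic_attributes={
        "interests": {"clustering", "parsing"},
        "core_members": {"carol"},
    },
)

tags_a = distinguishing_tags(lab_a, [lab_b])
```

In this sketch the dynamic attributes would be filled in by the three modules named in the abstract (core member and community detection, relation extraction, interest discovery); only the shape of the data is shown, not those algorithms.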