Method and system for virtual intelligence user interaction

Info

Publication number
WO2023137078A1
WO2023137078A1 (PCT/US2023/010624)
Authority
WO
WIPO (PCT)
Prior art keywords
user
virtual
request
personification
question
Prior art date
Application number
PCT/US2023/010624
Other languages
French (fr)
Inventor
James KAPLAN
Original Assignee
MeetKai, Inc.
Priority date
Filing date
Publication date
Application filed by MeetKai, Inc. filed Critical MeetKai, Inc.
Publication of WO2023137078A1 publication Critical patent/WO2023137078A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1815Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038Indexing scheme relating to G06F3/038
    • G06F2203/0381Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033Voice editing, e.g. manipulating the voice of the synthesiser
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise

Definitions

  • the present invention is directed to a method and system to provide user interaction with virtual personifications using artificial intelligence (“Al”).
  • a system and method to generate and update virtual personification using artificial intelligence receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files, and rendering a virtual personification of the person and outputting the virtual personification to a user. Then, receiving and interpreting a user input to generate a user request and updating the virtual personification in response to the user request.
  • the update may comprise one or more of the elements described below.
  • the virtual personification is of a person, either living or deceased. It is contemplated that the virtual personification may comprise an audio output and video output which are presented in a virtual environment of a type associated with the virtual personification. The virtual personification may comprise a representation of a non-living item.
  • the method is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
  • the remote artificial intelligence module may be a computing device with a processor and memory storing machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification.
  • Upon locating an answer to the question or request generating data that represents the virtual personification answering the question or request and transmitting the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
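  • As a purely illustrative sketch (not the claimed implementation), the Python below outlines how a remote Al module might receive a forwarded question, derive a rough meaning, search a store unrelated to the personification, and return data for the personification's answer; the toy knowledge table, intent heuristic, and data structures are assumptions made for this example.

```python
# Hypothetical sketch of the remote Al module's question handling: derive a meaning,
# look up an answer outside the personification's own recordings, and package it so
# the VR device can present it through the personification.
from dataclasses import dataclass

@dataclass
class PersonifiedAnswer:
    text: str      # answer text to be delivered in the personified voice
    persona: str   # which virtual personification should deliver it

# Stand-in for "databases unrelated to the virtual personification" -- invented data.
GENERAL_KNOWLEDGE = {
    ("substitute", "milk"): "Whipped cream can stand in for milk in most batters.",
    ("cook", "salmon"): "Grill salmon skin-side down for about four minutes per side.",
}

def derive_meaning(question: str) -> tuple:
    """Very rough intent extraction: pick an action keyword and a topic keyword."""
    words = question.lower().split()
    action = next((w for w in words if w in {"substitute", "cook", "grill"}), "cook")
    topic = words[-1].strip("?.")
    return action, topic

def handle_remote_request(question: str, persona: str) -> PersonifiedAnswer:
    answer = GENERAL_KNOWLEDGE.get(derive_meaning(question), "I would need to look that up.")
    # In the described system this text would then drive generated audio/video of the
    # personification; here it is simply packaged for return to the VR device.
    return PersonifiedAnswer(text=answer, persona=persona)

print(handle_remote_request("How should I cook salmon?", "virtual chef"))
```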
  • the method may further comprise tracking a user’s hand position using one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment.
  • the step of generating a response to the question or request may use artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which do not provide a direct answer to the question or request.
  • a system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising a virtual reality device configured to have at least a portion be worn by the user.
  • the virtual reality device includes a wearable screen configured for viewing by a user, one or more speakers configured to provide audio output to the user, a microphone configured to receive audio input from the user, and one or more external input devices configured to receive input from the user.
  • The virtual reality device also includes a communication module configured to communicate over a computer network or the Internet, and a processor with access to a memory. The processor executes machine readable code and the memory is configured to store the machine readable code.
  • the machine readable code is configured to present a virtual environment on the wearable screen and through the one or more speakers to the user and present, to the user on the wearable screen and through the one or more speakers, a virtual personification of a person currently living or deceased, in the virtual environment.
  • the code is also configured to receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification and then generate a response to the question or request from the user, which includes generating video content and audio content which did not previously exist.
  • the code then presents the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
  • the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
  • the remote artificial intelligence module may be a computing device with a memory and a processor, such that the memory stores machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Then, upon locating an answer to the question or request, generating data that represents the virtual personification answering the question or request, and transmitting the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
  • the system may further comprise one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment.
  • the input from the user comprises an audio input or an input to the one or more external input devices. It is contemplated that generating video content and audio content which did not previously exist is generated by processing existing video, audio, or both, of the person represented by the virtual personification, to form the video content and audio content which did not previously exist.
  • the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from a person represented by the virtual personification but which do not provide a direct answer to the question or request.
  • Also disclosed herein is a method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device.
  • the method comprises presenting a virtual environment on the screen and through the speakers of the user computing device to the user and presenting the virtual personification in the virtual environment.
  • receiving input from the user comprising a question, a user request, or subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment.
  • This method then sends a request for a response to the user input to an Al computing device that is remote from the user computing device and, with the Al computing device, creates a response based on pre-existing content stored in one or more databases, which is processed to create the generated response.
  • the Al computing device is a computing device with a memory and a processor such that the memory stores machine readable code configured to receive the input from the user computing device, process the input from the user to derive a meaning, and, based on the meaning, perform one or more searches for answers to the input from the user in databases unrelated to the virtual personification. Upon locating a response to the input from the user, generate data that represents the virtual personification answering the question or request, and transmit the data that represents the virtual personification responding to the input from the user to the user computing device.
  • This method may further include monitoring one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment and interpreting the pointing as the input from the user. It is contemplated that the input from the user comprises an audio input or an input from the user to the one or more external input devices.
  • the step of generating video content and audio content which did not previously exist occurs by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
  • Figure 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system.
  • Figure 1B illustrates a second exemplary embodiment of the virtual personification Al system which may use a local Al operating on a separate user device such as a smartphone, a tablet, a personal computer, etc.
  • Figure 2 illustrates an exemplary environment of use of the virtual personification Al system.
  • Figure 3 illustrates a block diagram of an example embodiment of a computing device, also referred to as a user device which may or may not be mobile.
  • FIG. 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
  • Al services are services provided as procedures and methods to a program to accomplish artificial intelligence goals. Examples may include, but are not limited to, image modeling, text modeling, forecasting, planning, recommendations, search, speech processing, audio processing, audio generation, text generation, image generation, and many more.
  • a device is any element running with a minimum of a CPU or a system which is used to interface with a device.
  • an accelerator can be attached in the form of a GPU or other specialized hardware accelerator. This accelerator can speed up the computation of Al services.
  • An application is any software running on any device such as mobile devices, laptop, desktop, server, smart watches, tablets, home speakers, wearable devices including smart rings, glasses, hearing aids, CarPlay devices, security cameras, webcams, televisions, projection screen monitors, sound bars, personal computers, headphones, earbuds, and laptop devices where a user can interact with touch, audio, visual, or passively.
  • a virtual personification system may analyze pre-recorded data to generate dynamic responses to user requests/questions through virtual personifications.
  • the virtual personification may be a virtual representation based on a real person, for example, the user, a family member or relative, a famous person, a historical figure, or any other type of person.
  • the virtual representation may also be a user or computer created person that does not represent a real person.
  • Pre-recorded data may include image, video, or audio footage of the real person (such as YouTube and other film footage). Dynamic responses are generated to user requests/questions related to that known person, even though the pre-recorded data may not include any adequate response or a response that matches the question.
  • a user may wish to be provided with a recipe from a famous chef, such as Gordon Ramsey, to make grilled salmon.
  • the virtual personification may analyze Gordon Ramsey’s footage on making grilled chicken and grilled potatoes to generate a virtual personification of Gordon Ramsey guiding the user through the process of making grilled salmon, as if Gordon Ramsey were in a cooking show and personally providing detailed instructions to the specific user request.
  • the system Al can pull details from prior recordings and manipulate the visual and audio files to create a new virtual representation that is directly and accurately responsive to the user’s request. Al may generate new information, such as how to adjust the response to be responsive to the specific user request.
  • Al can understand the user’s request, analyze the information already provided by the chef about how to cook chicken, realize that chicken is not salmon, search for a salmon recipe from the same chef or another source, and then process the new recipe and the virtual representation to present the new recipe to the user of the system using the virtual representation, as if the original chef were actually providing the recipe for salmon and not chicken.
  • This example may be applied to any other topic or environment of use.
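  • A minimal sketch of the adaptation logic in this example, under assumed data structures: if footage of the requested dish already exists it is reused, otherwise the correct steps are pulled from another source and re-presented in the persona's style. The footage index, recipe database, and step strings below are invented for illustration.

```python
# Toy stores standing in for indexed footage of the persona and for outside recipe sources.
FOOTAGE_INDEX = {
    "grilled chicken": ["season both sides", "grill 7 minutes per side", "rest 5 minutes"],
    "grilled potatoes": ["slice thickly", "oil and season", "grill until tender"],
}
RECIPE_DATABASE = {
    "grilled salmon": ["season both sides", "grill 4 minutes per side", "rest 2 minutes"],
}

def build_persona_response(requested_dish: str) -> list:
    # 1. If the persona already demonstrated the dish on film, reuse that footage directly.
    if requested_dish in FOOTAGE_INDEX:
        return FOOTAGE_INDEX[requested_dish]
    # 2. Otherwise fetch the correct steps from another source and present them in the
    #    persona's style (here just a prefix; the described system would instead drive
    #    generated audio/video in the persona's voice and mannerisms).
    steps = RECIPE_DATABASE.get(requested_dish, [])
    return [f"(in persona) {step}" for step in steps]

print(build_persona_response("grilled salmon"))
```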
  • the virtual personification of Gordon Ramsey may use a voice that sounds like Gordon Ramsey, may be dressed like Gordon Ramsey, as he typically appears on cooking shows, and may mimic Gordon Ramsey’s body language and speech pattern.
  • Al may be used to create the virtual personification even in situations when the actual person never actually provided a responsive answer in a video or audio recording.
  • the virtual representation may be created using built-in Al modules such as a virtual personification rendering module (discussed in more details below) or using third-party tools, which the virtual personification system may interface with.
  • the user may attempt Gordon Ramsey’s recipe for scrambled eggs, which may already be available on YouTube and which may involve the use of milk. However, upon determining he has no milk in the fridge, the user may wish to ask Gordon Ramsey whether whipped cream may be used as a substitute. While Gordon Ramsey may not have answered that question in the existing YouTube footage, the system may analyze Gordon Ramsey’s footage on substituting other items for milk to generate a virtual personification of Gordon Ramsey that answers this user question.
  • the virtual personification of Gordon Ramsey may include a prediction of Gordon Ramsey’s typical reaction in such situations.
  • the Al may determine, based on pre-recorded data, that Gordon Ramsey typically reacts impatiently to such questions. Thus, the virtual personification of Gordon Ramsey may display a frown or curt gestures when providing the predicted answer.
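  • One way such a reaction prediction could be sketched, assuming annotated pre-recorded clips and an invented label-to-expression mapping, is to pick the most frequent reaction observed for a given question type:

```python
# Hedged sketch: predict a persona's typical reaction by counting reactions seen in
# annotated clips. The labels and expression mapping are assumptions for illustration.
from collections import Counter

ANNOTATED_CLIPS = [
    {"question_type": "substitution", "reaction": "impatient"},
    {"question_type": "substitution", "reaction": "impatient"},
    {"question_type": "substitution", "reaction": "encouraging"},
    {"question_type": "technique", "reaction": "enthusiastic"},
]
EXPRESSIONS = {
    "impatient": {"face": "frown", "gesture": "curt wave"},
    "encouraging": {"face": "smile", "gesture": "nod"},
    "enthusiastic": {"face": "wide eyes", "gesture": "open arms"},
}

def predict_reaction(question_type: str) -> dict:
    observed = [c["reaction"] for c in ANNOTATED_CLIPS if c["question_type"] == question_type]
    if not observed:
        return EXPRESSIONS["encouraging"]              # neutral default when nothing is known
    most_common, _ = Counter(observed).most_common(1)[0]
    return EXPRESSIONS[most_common]

print(predict_reaction("substitution"))   # frown / curt wave for this toy data
```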
  • the virtual personification may be presented in a virtual reality space, which may be rendered using a virtual reality system.
  • the virtual reality space may be a kitchen.
  • for topics such as carpentry, the environment may be a woodworking shop; car repair would appear in an auto garage; education may appear as a classroom; and information about a topic may even appear inside the items themselves, such as inside a virtual computer or a virtual engine, to show how something works, in combination with Al that creates answers for the user using the virtual reality space and the virtual personification.
  • FIG. 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system.
  • the virtual reality space is rendered by a virtual reality system.
  • Exemplary virtual reality systems are described in U.S. Patent No. 9,898,091, U.S. Patent Publication 2014/0364212, and U.S. Patent Publication 2015/0234189, which are incorporated by reference herein in their entirety as teaching exemplary virtual reality systems and methods.
  • a user 100A may access the virtual reality space by the one or more components of a virtual reality system, such as a virtual reality device (“VR device”) 104A and external input devices 108A, which may be accessories to the VR device 104A.
  • the VR device 104A may be in direct communication with the external input devices 108A (such as by Bluetooth®) or via a network 112A providing internet or other signals (e.g., a personal area network, a local area network (“LAN”), a wireless LAN, a wide area network, etc.).
  • the VR device 104A may also communicate with a remote Al 116A via the network 112A.
  • the VR device 104A may be a wearable user device such as a virtual reality headset (“VR headset”), and the external input devices 108A may be hand-held controllers where a user may provide additional input such as arm motion, hand gestures, and various selection or control input through buttons or joysticks on such controllers.
  • the VR device may generally include input devices 120A through 128A, input processing modules 132A, VR applications 134A, output rendering modules 138A, output devices 156A, 160A, and a communication module 164A.
  • Input devices may include one or more audio input devices 120A (such as microphones), one or more position tracking input devices 124A (to detect a user’s position and motion), and one or more facial tracking input devices 128A (such as facial cameras to detect facial expressions, eye-tracking cameras to detect gaze and eye movement, etc.). Additional external input devices may provide user biometrics data or tracking of other user body parts.
  • the input processing modules 132A may include, but are not limited to, an external input processing module 142A (used to process external inputs such as input from external devices 108A or additional external input devices discussed above), an audio input processing module 144A (used to process audio inputs, such as user speech or sounds), a position input processing module 146A (to process position and motion tracking inputs such as hand motions, finger motions, arm motions, head position), and a facial input processing module 148A (to process facial inputs of the user).
  • the VR applications 134A are generally responsible for rendering virtual reality spaces associated with their respective VR applications 134A.
  • a VR museum application may render a virtual museum through which a user may traverse and which presents various artwork that the user may view or interact with. This is achieved through the VR application’s 134A integration with the output rendering modules 138A, which in turn present the rendered files on the output devices 156A, 160A.
  • the output rendering modules 138A may include, but are not limited to, an audio output processing module 150A responsible for processing audio files, and an image and/or video output processing module 152A, responsible for processing image and/or video files.
  • one or more audio output devices 156A such as built-in speakers on the VR headset may present the processed audio file
  • one or more image and/or video output devices 160A may display the processed image and/or video files.
  • Other types of output may include, but are not limited to, motion or temperature changes to the VR device 104A or the external input devices 108A (such as vibration on hand-held controllers).
  • User interaction may in turn modify the virtual reality space. For example, if a user inputs motion to indicate he picked up a vase, the rendered virtual reality space may display a vase moving in accordance with the user’s motion.
  • the transmission of information occurs in a bi-directional streaming fashion, from the user 100A to the VR device 104A and/or external input devices 108A, then from the VR device 104A and/or external input devices 108A back to the user 100A.
  • U.S. Application 17/218,021 provides a more detailed discussion on bi-directional streaming using Al services and examples of broader and specific uses.
  • the Al may be completely or partially built into the VR device 104A or specific VR applications 134A. Such built-in Al components may be referred to as a local Al 168A. Other Al components may be located in the remote Al 116A, which may be operating on remote devices or on cloud-based servers. The local and remote Al 168A, 116A may communicate via the network 112A. The Al may enhance the user’s 100A interaction with the virtual reality system using the embodiments and methods described above.
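  • A minimal sketch, assuming this local/remote split, of how a question might first be tried against the local Al and then forwarded to the remote Al over the network; the endpoint URL, payload format, and local answer table below are hypothetical.

```python
# Illustrative local-first, remote-fallback routing. Nothing here is the claimed method;
# the remote endpoint is a placeholder that a real deployment would replace.
import json
import urllib.request
from typing import Optional

LOCAL_ANSWERS = {"what is this": "That is the museum's oldest vase."}

def ask_local_ai(question: str) -> Optional[str]:
    # The local Al on the VR device only knows a small, pre-loaded answer table here.
    return LOCAL_ANSWERS.get(question.lower().strip(" ?"))

def ask_remote_ai(question: str, url: str = "http://remote-ai.example/answer") -> str:
    # Hypothetical endpoint standing in for the remote Al module reached via the network.
    payload = json.dumps({"question": question}).encode()
    request = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["answer"]

def answer(question: str) -> str:
    local = ask_local_ai(question)
    if local is not None:
        return local                    # resolved entirely on the VR device
    return ask_remote_ai(question)      # fall back to the remote Al over the network

print(answer("What is this?"))          # answered locally in this toy example
```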
  • the Al may include one or more of the following components to generally operate the Al and process data: one or more processors 172 and one or more memory storage devices where logic modules 176 and machine learning modules 178 may be stored to provide general Al services.
  • the memory storage devices may further include one or more modules to specifically enhance user-VR interaction, such as speech-to-text modules 180, non-verbal input processing modules 182, text augmentation modules 184, conversation management modules 186, response generation modules 188, audio rendering and updating modules 190, virtual personification rendering modules 192, virtual personification prediction modules 194, and integration modules 196.
  • the speech-to-text modules 180 may be used to perform voice detection and customized speech to text recognition, as well as to generally detect, recognize, process, and interpret user audio input. Recognition allows the speech-to-text modules 180 to distinguish between verbal input (such as a user question) and non-verbal input (such as the user’s sigh of relief).
  • a user may start an active conversation in the virtual reality space by simply speaking.
  • the speech-to-text modules 180 may use voice activity detection in order to differentiate that the user has started speaking, as opposed to ambient noise activity.
  • the speech-to-text modules 180 may process the input audio from the microphone to recognize the user’s spoken text. This processing can either happen as part of the viewing device (such as the VR device 104A), on a device connected to the viewing device, or on a remote server over the network (such as the remote Al 116A). This process may convert the stream of audio into the spoken language, such as text processable by a computer.
  • the speech-to-text modules 180 may be customized to the current scene that the user is experiencing inside the virtual space, or a virtual personification that the user wishes to interact with. This customization could allow for custom vocabulary to be recognized when it would make sense in the specific environment or specific virtual personification. For example, if a user were interacting with a virtual personification of a cooking chef, then the speech recognition system may be customized to enhance name recognition for words associated with food, whereas in a different environment a different vocabulary would be used. If the virtual personification of Gordon Ramsey were in a kitchen, then the speech recognition system may be customized to enhance name recognition for kitchen utensils.
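  • The following hedged sketch illustrates the two ideas above with simple stand-ins: an energy threshold for voice activity detection and a scene-specific vocabulary used to snap near-miss words (e.g., "wisk" to "whisk"); real systems would rely on trained acoustic and language models rather than these heuristics.

```python
# Toy voice activity detection and scene-biased vocabulary correction; thresholds,
# frame energies, and vocabulary lists are invented for the example.
import difflib

SCENE_VOCABULARY = {
    "kitchen": ["whisk", "spatula", "salmon", "saucepan"],
    "museum": ["fresco", "sarcophagus", "curator"],
}

def is_speech(frame_energies: list, threshold: float = 0.02) -> bool:
    """Treat the frame window as speech if its average energy clears a threshold."""
    return sum(frame_energies) / len(frame_energies) > threshold

def bias_transcript(raw_words: list, scene: str) -> list:
    """Snap near-miss words to the scene's custom vocabulary."""
    vocabulary = SCENE_VOCABULARY.get(scene, [])
    corrected = []
    for word in raw_words:
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        corrected.append(matches[0] if matches else word)
    return corrected

if is_speech([0.05, 0.04, 0.06]):
    print(bias_transcript(["hand", "me", "the", "wisk"], scene="kitchen"))
```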
  • the Al’s speech-to-text modules 180 are intended to integrate and enhance existing features in the virtual reality system.
  • the Al speech-to-text modules 180 may generate a large number of candidate interpretations from a single user input, automatically select the top interpretation based on user data, and hold multi-turn conversations with the user as a continuation of that single user input.
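  • As an assumed illustration of selecting a top interpretation based on user data, candidate interpretations could be re-scored against a simple record of the user's past topics; the scoring weights and history format below are invented for the example.

```python
# Toy re-ranking of recognizer candidates using the user's interaction history.
USER_HISTORY = {"cooking": 12, "art": 3}   # counts of past interactions per topic

def rank_interpretations(candidates: list) -> dict:
    """Each candidate carries the recognizer's confidence and a guessed topic."""
    def score(candidate: dict) -> float:
        familiarity = USER_HISTORY.get(candidate["topic"], 0)
        return candidate["confidence"] + 0.05 * familiarity
    return max(candidates, key=score)

candidates = [
    {"text": "how do I grill salmon", "topic": "cooking", "confidence": 0.71},
    {"text": "how do I drill a column", "topic": "carpentry", "confidence": 0.74},
]
print(rank_interpretations(candidates)["text"])   # user history tips the balance to cooking
```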
  • Appendix A includes a more detailed discussion of systems and methods for enhanced speech-to-text, its integration with other applications outside the virtual reality system, and the additional mechanisms to recognize usable user input (as discussed in step 2) and to process out-of-scope user input.
  • the Al’s non-verbal input processing modules 182 may be used to process non-verbal input.
  • audio input may be non-verbal (such as a user’s sigh of relief, or tone of voice).
  • external input devices 108A may include devices to track a user’s biometrics or body parts other than arm, hand, and finger movement. Examples of devices to track a user’s biometrics include but are not limited to smartwatches, Fitbits™, heart-rate monitors, blood pressure monitors, or any other devices which may be used to track a user’s heart-rate, oxygen level, blood pressure, or any other metrics that may track a user’s body condition.
  • Such input may all be processed using additional processing modules, which may be part of the virtual reality system (such as built into the VR device 104A), and/or may be part of the local or remote Al 168A, 116A.
  • the text augmentation modules 184 may be used to add further context to the interpreted user 100A input.
  • the speech-to-text modules 180 may supplement the spoken text with what the user is currently doing, or interacting with, to enhance its linguistic understanding of what the user has said. For example, this allows the Al to find co-references between what the user said and what they are looking at: if a user asks, “how old is this”, the term “this” can be inferred from what the user is currently looking at, touching, near, or pointing at in the virtual world.
  • This functionality can be carried out by fusing any one or more of the following inputs: the user's head position, eye detection, hand position (including placement, grip, and pointing), controller position, and general orientation.
  • the system may also fuse in non-controller-related signals, such as biometrics from heart rate, breathing patterns, and any other bio-sensory information. This information is fused over time to detect not just instantaneous values for fusion but trends as well.
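  • A minimal sketch, under invented scene and pose data structures, of how fused gaze and hand signals could resolve a deictic word such as "this" to the nearest object in the virtual space:

```python
# Toy co-reference resolution from fused gaze and hand targets; positions and scene
# objects are made up for the example, and a real system would weight and track the
# signals over time as described above.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y, z) in the virtual space

def nearest_object(pointer: tuple, objects: list) -> SceneObject:
    def distance(obj: SceneObject) -> float:
        return sum((a - b) ** 2 for a, b in zip(pointer, obj.position)) ** 0.5
    return min(objects, key=distance)

def resolve_reference(utterance: str, gaze: tuple, hand: tuple, objects: list) -> str:
    if "this" not in utterance.lower():
        return utterance
    fused = tuple((g + h) / 2 for g, h in zip(gaze, hand))  # naive fusion of two signals
    target = nearest_object(fused, objects)
    return utterance.replace("this", f"the {target.name}")

objects = [SceneObject("vase", (1.0, 1.2, 0.4)), SceneObject("painting", (4.0, 1.5, 2.0))]
print(resolve_reference("how old is this", gaze=(1.1, 1.2, 0.5), hand=(0.9, 1.1, 0.4),
                        objects=objects))   # -> "how old is the vase"
```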
  • the text augmentation modules 184 may also be integrated with the non-verbal input processing modules 182 to receive further context. For example, in a multi-turn conversation where a user requests information, the user may input the word “okay”. Conventional systems may, by default, cease communication because the response “okay” may be pre-coded as a command to terminate interaction.
  • the text augmentation modules 184 may analyze the user’s tone to detect (1) boredom, and interpret “okay” as a request to shorten the information provided, (2) hesitation or confusion, and interpret “okay” as a request for additional information, (3) impatience, and interpret “okay” as a request to end the interaction.
  • the text augmentation modules’ 184 integration with other devices and modules may not be linear. Rather, context from the virtual reality system may be used in one or more steps of speech interpretation. For example, in a multi-turn conversation (such as the conversation described above), at each turn of a user input, the speech-to-text modules may be used to generate the most accurate interpretation of the user’s input, and the non-verbal input processing module 182 may be used to inject more context. Further, the Al’s conversation management modules 186 may be integrated with the text augmentation modules 184 to generate the output used in single or multi-turn conversations.
  • the conversation management modules 186 may classify the spoken text into different categories to facilitate the open-ended conversation. First the conversation management modules 186 may determine if a statement is meant to initiate a new conversation or one that continues an existing conversation. If the user is detected to initiate a new conversation, then the conversation management modules 186 may classify the result among categories.
  • a first category may include user comments that may not necessarily require a strong response. For example, if a user states “this is really cool”, the conversation management modules 186 may cause the virtual personification to respond with a more descriptive or expressive response in relation to what was remarked as being cool. Alternatively, the virtual personification may not respond.
  • a second category may include user questions that may be in relation to the current scene.
  • a third category may be user questions that are in relation to the nonvirtualized world (i.e., reality).
  • the conversation management modules 186 may facilitate an answer to the question via the virtual personification.
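  • Purely as an illustration of this classification step, the keyword heuristics below stand in for trained classifiers that would decide whether an utterance starts or continues a conversation and which of the broad categories it falls into:

```python
# Toy classifier mirroring the categories above: comments, questions about the current
# scene, and questions about the non-virtualized world. The keyword rules are stand-ins.
def classify_utterance(utterance: str, in_conversation: bool) -> dict:
    text = utterance.lower()
    turn = "continuation" if in_conversation else "new"
    if "?" not in utterance and any(w in text for w in ("cool", "amazing", "wow")):
        category = "comment"              # remark that may not need a strong response
    elif any(w in text for w in ("this", "that", "here")):
        category = "scene question"       # likely about the current virtual scene
    else:
        category = "real-world question"  # about the non-virtualized world
    return {"turn": turn, "category": category}

print(classify_utterance("this is really cool", in_conversation=False))
print(classify_utterance("What year did he open his first restaurant?", in_conversation=True))
```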
  • the system may then proceed down to one of two or more paths.
  • the conversation management modules 186 may first attempt to use information in pre-recorded data to answer the question. For example, during a user interaction with a virtual Gordon Ramsey on making a grilled salmon, a user may ask about the use of an ingredient not in the current recipe.
  • the conversation management modules 186 may retrieve footage from another video where Gordon Ramsey uses that ingredient and may render the virtual Gordon Ramsey to modify the current recipe to include that ingredient.
  • the conversation management modules 186 request the response generation modules 188 to analyze additional data (such as data on Gordon Ramsey’s presentation of a similar alternative ingredient, or data based on other chefs or known cooking information) to generate new behavior, speech, actions, or responses for the virtual Gordon Ramsey (such as the output of an opinion that the ingredient may not be desirable, or a rendering of Gordon Ramsey adding the ingredient to the recipe using Gordon Ramsey’s voice and behavior). If the user is in an existing conversation, then the conversation management modules 186 may proceed with the same approach as in the previous section, but with the added impetus of considering the context. Using context and past conversation details in the Al system provides a more realistic user interaction and avoids the virtual representation repeating itself or providing the same response.
  • the conversation management modules 186 may account for the likelihood that the user will continue to ask questions or follow-ups to the previous response and may use this additional information to better carry out the next algorithm.
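  • The two response paths described above can be sketched, with invented stores and a naive matching rule, as a lookup into pre-recorded data followed by a fallback that synthesizes a new response from related material:

```python
# Toy two-path responder. PRE_RECORDED and RELATED_MATERIAL are made-up stand-ins for
# the footage archive and the broader data the response generation modules would analyze.
PRE_RECORDED = {
    "can i add garlic": "Footage clip 112: the chef adds garlic to the marinade.",
}
RELATED_MATERIAL = [
    "In other episodes the chef swaps dairy for cream when the texture allows it.",
]

def generate_new_response(question: str, context: list) -> str:
    # Stand-in for the response generation modules 188: combine related material into a
    # reply the personification never literally gave on film. `context` would carry earlier
    # turns so the persona does not repeat itself; it is unused in this toy example.
    return "Based on related material: " + RELATED_MATERIAL[0]

def respond(question: str, context: list) -> str:
    key = question.lower().strip(" ?")
    if key in PRE_RECORDED:                            # path 1: reuse pre-recorded data
        return PRE_RECORDED[key]
    return generate_new_response(question, context)    # path 2: synthesize new behavior

print(respond("Can I add garlic?", context=[]))
print(respond("Can I use whipped cream instead of milk?", context=[]))
```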
  • the audio rendering and updating modules 190 may be used, both independently and/or in conjunction with other modules 188, to output audio.
  • the audio rendering and updating modules 190 may ensure, for example, that the proper response to a user question to virtual Gordon Ramsey may indeed be in Gordon Ramsey’s voice, tone, and speech pattern.
  • the audio rendering and updating modules 190 may determine Gordon Ramsey frequently accompanies his speech with an exasperated sigh. As a result, the audio rendering and updating modules 190 may cause the virtual personification of Gordon Ramsey to output the same exasperated sigh in its user interactions. Additional analysis may be performed to customize speech of the virtual personification in the following areas: volume (such as a person who frequently speaks in raised voice), accent, tone (such as a person who likes to emphasize certain words), speech pattern, etc.
  • the audio rendering and updating modules 190 may also supplement the user experience with the appropriate background noise or music, which may be updated based on user action. For example, as virtual Gordon Ramsey guides the user through the experience of baking a cake, at one point virtual Gordon Ramsey may show the user how to turn on a cake mixer. In addition to the visual rendering of a cake mixer that is currently in use, the audio rendering and updating modules 190 may update the ambient noise to a background noise of an operating cake mixer. At a later point, when virtual Gordon Ramsey turns the cake mixer off, the audio rendering and updating modules 190 may update the background noise of an operating cake mixer back to ambient noise. This would be true for any and all sounds in a virtual environment.
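  • As a hedged sketch of this scene-driven audio updating, an event handler could swap the looping background track when a virtual appliance is switched on or off; the event names and track files below are placeholders, not part of the disclosure.

```python
# Toy ambient-audio layer: scene events switch which background loop is playing.
AMBIENT_TRACKS = {"idle": "kitchen_room_tone.ogg", "mixer_on": "cake_mixer_loop.ogg"}

class AmbientAudio:
    def __init__(self) -> None:
        self.current = AMBIENT_TRACKS["idle"]

    def on_scene_event(self, event: str) -> str:
        """Update the looping background track based on what just happened in the scene."""
        if event == "mixer_switched_on":
            self.current = AMBIENT_TRACKS["mixer_on"]
        elif event == "mixer_switched_off":
            self.current = AMBIENT_TRACKS["idle"]
        return self.current

audio = AmbientAudio()
print(audio.on_scene_event("mixer_switched_on"))    # cake_mixer_loop.ogg
print(audio.on_scene_event("mixer_switched_off"))   # kitchen_room_tone.ogg
```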
  • the virtual personification rendering module 192 may be used, in conjunction with other modules, to output the visual representation of the virtual personification.
  • the virtual personification rendering module 192 may be used to generate the virtual Gordon Ramsey, which may include a visual representation of one or more of the following: Gordon Ramsey’s face, body, and clothing.
  • the virtual personification rendering module 192 processes existing images or video of the person or item being personified, maps and creates new video from that existing video, and can also create new video or images from prior images and video, producing a virtual personification that is doing or saying things which the original person never did or said.
  • the virtual personification prediction modules 194 may be used, in conjunction with other modules, to predict behavior and responses by the virtual personification. Particularly in cases where no pre-recorded data exists, and the Al must generate new responses/behavior, the virtual personification rendering module 192 may be used to render predictions on the virtual personifications’ responses, which may include one or more of the following: facial expressions, body movements, gestures, etc.
  • the integration modules 196 may be used to integrate the one or more modules described above, so they may operate in sync.
  • the response generation modules 188 may be integrated with the audio rendering and updating module to accurately output a virtual personification’s speech using the proper voice and tone, which may, in turn, be integrated with the virtual personification rendering modules 192 and the virtual personification prediction modules 194 to ensure the coherence between the output of voice, facial expression, and body movements.
  • additional customization may be added to the virtual personification through additional modules.
  • additional modules may be added to analyze a person’s temperament or personality and incorporate them into the virtual personification of that person.
  • the Al presents two other main improvements on conventional systems - unlimited data and analysis, and dynamic and predictive rendering.
  • a conventional rendering of a virtual Gordon Ramsey may be limited to existing video footage of Gordon Ramsey (such as, for example, a deep fake using Gordon Ramsey’s face, or the output of a pre-recording of Gordon Ramsey).
  • a conventional software application used to render the virtual Gordon Ramsey may be limited to existing data in the local device.
  • the Al may query any database accessible over the network to retrieve additional pre-recorded data, or generate rendering of virtual Gordon Ramsey in never-before-seen situations based on analysis of existing footage.
  • additional information not currently existing on the internet may be predicted and generated.
  • Figure 1B illustrates a second exemplary embodiment of the virtual personification Al system which may use a local Al 168B operating on a separate user device 190 such as a smartphone, a tablet, a personal computer, etc.
  • Such separate user devices 190 may establish direct communication to the communication module 164 in the VR device 104B and/or a remote Al 116B housing additional Al modules or may communicate with the VR device 104B and remote Al 116B via the network 112B.
  • the various Al components 172-196 illustrated in Figure 1A and discussed above may be stored in the local Al 168B and/or the remote Al 116B.
  • the external input devices 108B and the various VR device components 120B-164B may interact with each other and with the local and remote AIs 168B, 116B in a similar fashion as described for Figure 1A.
  • any one or more of the Al modules 172-196 may be included in the local Al 168 and/or the remote Al 116.
  • all Al modules 172-196 may be located on a local Al 168 operating in the VR device 104 such that no remote Al 116 may be necessary.
  • all Al modules 172-196 may be located in a remote Al 116.
  • most or all Al modules 172-196 are in a remote Al 116 such that the Al may be integrated with any VR device 104, including VR devices 104 with no built-in local Al 168A. Such integration may be achieved using Al layers to power cross-platform Al services, which is discussed in more detail in U.S. Application 17/218,021.
  • VR devices may be the preferred device to implement the virtual personification Al system
  • the virtual personification Al system may be used on any computing devices capable of performing user interaction.
  • the virtual personification Al system may be implemented on a device capable of performing AR display, such that the virtual personification may be output via AR technology.
  • the virtual personification Al system may be implemented on smartphones, tablets, personal computers, laptop devices, etc., where the virtual personification may be a 2-dimensional output on a display screen.
  • the virtual personification may be in audio-only mode, for implementation on a peripheral device (such as a vehicle with CarPlay) or wearable devices (such as smartwatches, smart rings, glasses, hearing aids, headphones, earbuds, etc.), home devices (such as home speakers, security cameras, webcams, televisions, projection screen monitors, sound bars, etc.), or any other electronic devices.
  • FIG. 2 illustrates an exemplary environment of use of the virtual personification Al system.
  • a user 200 may interact with the following devices, which may, as discussed above, be capable of implementing the virtual personification Al system: a VR device 204 (as illustrated in Figure 1), an AR device 208, or any other computing device 212 (such as a computer, a smart TV, or a smartphone).
  • These devices 204-212 may then implement the Al modules 220 (which are separately illustrated as 172-196 in Figure 1A and discussed above).
  • Such implementation may be, as discussed above, locally, remotely, or both.
  • the Al modules 220 may be integrated with third-party tools such as virtual representation modules 216 and audio data modules 224.
  • Virtual representation modules 216 may be any additional tools used to generate virtual personifications and virtual environments.
  • Audio data modules 224 may be additional tools used to generate audio for virtual personifications and virtual environments.
  • the Al modules 220 and its integrated third-party tools 216, 224 may be in direct communication, or communicate via a network 228 to access programs, servers, and/or databases stored in a cloud 232 and/or cloud-based servers, as well as other devices 236, which may in turn be connected to their respective databases 240.
  • the Al modules 220 may access the third-party tools 216, 224 remotely, such as via the network 228.
  • the Al modules 220 may thus access resources from all connected programs, devices, servers, and/or databases.
  • FIG. 3 illustrates an example embodiment of a mobile device on which a solution generator may operate, also referred to as a user device which may or may not be mobile.
  • the mobile device 300 may comprise any type of mobile communication device capable of performing as described below.
  • the mobile device may comprise a Personal Digital Assistant (“PDA”), cellular telephone, smart phone, tablet PC, wireless electronic pad, an IoT device, a “wearable” electronic device or any other computing device.
  • the mobile device 300 is configured with an outer housing 304 designed to protect and contain the components described below.
  • a processor 308 communicates over the buses 312 with the other components of the mobile device 300.
  • the processor 308 may comprise any type processor or controller capable of performing as described herein.
  • the processor 308 may comprise a general purpose processor, ASIC, ARM, DSP, controller, or any other type processing device.
  • the processor 308 and other elements of the mobile device 300 receive power from a battery 320 or other power source.
  • An electrical interface 324 provides one or more electrical ports to electrically interface with the mobile device, such as with a second electronic device, computer, a medical device, or a power supply/charging device.
  • the interface 324 may comprise any type electrical interface or connector format.
  • One or more memories 310 are part of the mobile device 300 for storage of machine readable code for execution on the processor 308 and for storage of data, such as image, audio, user, location, accelerometer, or any other type of data.
  • the memory 310 may comprise RAM, ROM, flash memory, optical memory, or micro-drive memory.
  • the machine readable code (software modules and/or routines) as described herein is non-transitory.
  • the processor 308 connects to a user interface 316.
  • the user interface 316 may comprise any system or device configured to accept user input to control the mobile device.
  • the user interface 316 may comprise one or more of the following: microphone, keyboard, roller ball, buttons, wheels, pointer key, touch pad, and touch screen.
  • a touch screen controller 330 which interfaces through the bus 312 and connects to a display 328.
  • the display comprises any type display screen configured to display visual information to the user.
  • the screen may comprise an LED, LCD, thin film transistor screen, OEL, CSTN (color super twisted nematic), TFT (thin film transistor), TFD (thin film diode), OLED (organic light-emitting diode), AMOLED display (active-matrix organic light-emitting diode), capacitive touch screen, resistive touch screen or any combination of such technologies.
  • the display 328 receives signals from the processor 308, and these signals are translated by the display into text and images as is understood in the art.
  • the display 328 may further comprise a display processor (not shown) or controller that interfaces with the processor 308.
  • the touch screen controller 330 may comprise a module configured to receive signals from a touch screen which is overlaid on the display 328.
  • a speaker 334 and microphone 338 are also part of this exemplary mobile device.
  • the speaker 334 and microphone 338 may be controlled by the processor 308.
  • the microphone 338 is configured to receive and convert audio signals to electrical signals based on processor 308 control.
  • the processor 308 may activate the speaker 334 to generate audio signals.
  • A first wireless transceiver 340 and a second wireless transceiver 344 are connected to respective antennas 348, 352.
  • the first and second transceivers 340, 344 are configured to receive incoming signals from a remote transmitter and perform analog front-end processing on the signals to generate analog baseband signals. The incoming signal may be further processed by conversion to a digital format, such as by an analog to digital converter, for subsequent processing by the processor 308.
  • the first and second transceivers 340, 344 are configured to receive outgoing signals from the processor 308, or another component of the mobile device 300, and upconvert these signals from baseband to RF frequency for transmission over the respective antenna 348, 352.
  • the mobile device 300 may have only one or two such systems, or more transceivers.
  • some devices are tri-band or quad-band capable, or have Bluetooth®, NFC, or other communication capability.
  • the mobile device 300 may be configured to operate according to any presently existing or future developed wireless standard including, but not limited to, Bluetooth, WI-FI such as IEEE 802.11 a,b,g,n, wireless LAN, WMAN, broadband fixed access, WiMAX, any cellular technology including CDMA, GSM, EDGE, 3G, 4G, 5G, TDMA, AMPS, FRS, GMRS, citizen band radio, VHF, AM, FM, and wireless USB.
  • A global positioning system (“GPS”) module may also be included in the mobile device 300.
  • a gyroscope 364 connects to the bus 312B to generate and provide orientation data regarding the orientation of the mobile device 300.
  • a magnetometer 368 is provided to supply directional information to the mobile device 300.
  • An accelerometer 372 connects to the bus 312B to provide information or data regarding shocks or forces experienced by the mobile device. In one configuration, the accelerometer 372 and gyroscope 364 generate and provide data to the processor 308 to indicate a movement path and orientation of the mobile device 300.
  • One or more cameras (still, video, or both) 376 are provided to capture image data for storage in the memory 310 and/or for possible transmission over a wireless or wired link, or for viewing at a later time.
  • the one or more cameras 376 may be configured to detect an image using visible light and/or near-infrared light.
  • the cameras 376 may also be configured to utilize image intensification, active illumination, or thermal vision to obtain images in dark environments.
  • the processor 308 may process machine-readable code that is stored on the memory to perform the functions described herein.
  • a flasher and/or flashlight 380, such as an LED light, is provided and is processor controllable.
  • the flasher or flashlight 380 may serve as a strobe or traditional flashlight.
  • the flasher or flashlight 380 may also be configured to emit near-infrared light.
  • a power management module 384 interfaces with or monitors the battery 320 to manage power consumption, control battery charging, and provide supply voltages to the various devices, which may have different power requirements.
  • FIG. 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
  • Computing device 400 is intended to represent various forms of digital computers, such as smartphones, tablets, kiosks, laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • Computing device 450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices.
  • the components shown, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit the implementations described and/or claimed in this document.
  • Computing device 400 includes a processor 402, memory 404, a storage device 406, a high-speed interface or controller 408 connecting to memory 404 and high-speed expansion ports 410, and a low-speed interface or controller 412 connecting to low-speed bus 414 and storage device 406.
  • Each of the components 402, 404, 406, 408, 410, and 412, are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 402 can process instructions for execution within the computing device 400, including instructions stored in the memory 404 or on the storage device 406, to display graphical information for a GUI on an external input/output device, such as display 416 coupled to high-speed controller 408.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices 400 may be connected, with each device providing portions of the necessary operations (e.g., a server bank, a group of blade servers, or a multi-processor system).
  • the memory 404 stores information within the computing device 400.
  • the memory 404 is a volatile memory unit or units.
  • the memory 404 is a non-volatile memory unit or units.
  • the memory 404 may also be another form of computer-readable medium, such as a magnetic or optical disk.
  • the storage device 406 is capable of providing mass storage for the computing device 400.
  • the storage device 406 may be or contain a computer-readable medium, such as a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product can be tangibly embodied in an information carrier.
  • the computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 404, the storage device 406, or memory on processor 402.
  • the high-speed controller 408 manages bandwidth-intensive operations for the computing device 400, while the low-speed controller 412 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only.
  • the high-speed controller 408 is coupled to memory 404, display 416 (i.e., through a graphics processor or accelerator), and to high-speed expansion ports 410, which may accept various expansion cards (not shown).
  • low-speed controller 412 is coupled to storage device 406 and low-speed bus 414.
  • the low-speed bus 414, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router (i.e., through a network adapter).
  • the computing device 400 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 420, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 424. In addition, it may be implemented in a personal computer such as a laptop computer 422. Alternatively, components from computing device 400 may be combined with other components in a mobile device (not shown), such as device 450. Each of such devices may contain one or more computing devices 400, 450, and an entire system may be made up of multiple computing devices 400, 450 communicating with each other.
  • Computing device 450 includes a processor 452, memory 464, an input/output device such as a display 454, a communication interface 466, and a transceiver 468, among other components.
  • the computing device 450 may also be provided with a storage device, such as a micro-drive or other device(s), to provide additional storage.
  • Each of the components 452, 464, 454, 466, and 468 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 452 can execute instructions within the computing device 450, including instructions stored in the memory 464.
  • the processor 452 may be implemented as a chipset of chips that include separate and multiple analog and digital processors.
  • the processor may provide, for example, for coordination of the other components of the computing device 450, such as control of user interfaces, applications run by the computing device 450, and wireless communication by the computing device 450.
  • Processor 452 may communicate with a user through control interface 458 and display interface 456 coupled to a display 454.
  • the display 454 may be a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology.
  • the display interface 456 may comprise appropriate circuitry for driving the display 454 to present graphical and other information to a user.
  • the control interface 458 may receive commands from a user and convert them for submission to the processor 452.
  • an external interface 462 may be provided in communication with processor 452, to enable near area communication of computing device 450 with other devices.
  • external interface 462 may provide for wired communication, or in other implementations, for wireless communication, whilst multiple interfaces may also be used.
  • the memory 464 stores information within the computing device 450.
  • the memory 464 can be implemented as one or more of a computer-readable medium or media, a volatile or a non-volatile memory unit or units.
  • Expansion memory 474 may also be provided and connected to the computing device 450 through expansion interface 472, which may include, for example, a SIMM (Single In Line Memory Module) card interface.
  • expansion memory 474 may provide extra storage space and/or may also store applications or other information for the computing device 450.
  • expansion memory 474 may include instructions to carry out or supplement the processes described above and may also include secure information.
  • expansion memory 474 may be provided as a security module for computing device 450 and may be programmed with instructions that permit secure use of the same.
  • the memory may include for example, flash memory and/or NVRAM memory, as discussed below.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 464, expansion memory 474, or memory on processor 452, that may be received for example, over transceiver 468 or external interface 462.
  • the computing device 450 may communicate wirelessly through communication interface 466, which may include digital signal processing circuitry where necessary. Communication interface 466 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur for example, through a radio-frequency transceiver 468. In addition, short-range communication may occur, such as using a Bluetooth, Wi-Fi, or other such transceiver (not shown). In addition, GPS (Global Positioning system) receiver module 470 may provide additional navigation- and location-related wireless data to the computing device 450, which may be used as appropriate by applications running on the computing device 450.
  • the computing device 450 may also communicate audibly using audio codec 460, which may receive spoken information from a user and convert it to usable digital information. Audio codec 460 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the computing device 450. Such sound may include audio from voice telephone calls, recorded audio (e.g., voice messages, music files, etc.), and may also further include audio generated by applications operating on the computing device 450.
  • the computing device 450 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 460. It may also be implemented as part of a smartphone 482, personal digital assistant, a computer tablet, or other similar mobile device.
  • various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations can include applications in one or more computer programs that are executable and/or interpretable on a programmable system, including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • the systems and techniques described here can be implemented on a computer having a display device, such as an LCD (liquid crystal display) monitor, LED, or any other flat panel display, for displaying information to the user, a keyboard, and a pointing device (e.g., mouse, joystick, trackball, or similar device) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile), and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here may be implemented in a computing system (e.g., computing device 400 and/or 450) that includes a back end component, or that includes a middleware component (e.g., application server), or that includes a frontend component such as a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the systems and techniques described herein, or any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • the virtual personification Al system may be used to implement many possible applications.
  • a conventional weather application may display weather searchable by pre-determined parameters (such as location and date).
  • the Al system may output a virtual personification of a popular on-camera meteorologist (such as Jim Cantore) to not only provide weather based on pre-determined parameters, but also to further interact with the user.
  • a user may first request the current weather and a forecast for Hawaii, and then ask, “what clothes should I pack for my upcoming vacation”?
  • a conventional weather application will not understand the user question because it may not remember the context (weather in Hawaii).
  • the virtual personification Al system may not only understand the context of the user question but also accurately determine the user’s true request - a retrieval of the user’s calendar to determine the accurate date range and/or location for the upcoming vacation, a projection of the weather during that date range, an analysis of proper attire given the weather, personal data (such as user preferences on clothing items), and location (which may take into account additional factors such as humidity, altitude, and local culture). Further, the Al may be capable of presenting the proper response to the user request using the virtual personification of Jim Cantore in a conversational format, even though that person has not previously answered that question or provided that particular response in the past.
  • pre-recorded data may be analyzed, and virtual personifications may be generated, for any person(s), not just famous ones.
  • a user may submit family videos of a great-grandfather who has passed away, and the virtual personification system may render a virtual personification of the great-grandfather, who may interact with future generations. This concept may be applied to any person living or passed away.
  • the system may create virtual personifications which have an appearance different from anyone alive or previously alive.
  • the Al may supplement missing footage with its own.
  • new rendering may be generated using footage of Jillian Michaels performing other, similar actions, or footage of other people performing the requested exercise, rendered to appear as Jillian Michaels.
  • the new rendering may be generated using deepfake or other technology to combine Jillian Michaels’ footage with generic footage of another person performing a standing core exercise.
  • the Al may provide a set of default footage (such as a default virtual model performing standing core exercises) and superimpose whatever footage of Jillian Michaels may be available on the default model. It is contemplated that any other type of technology may be used to generate new rendering where no pre-recorded data exists.
  • Another possible expansion of the virtual personification Al system may be to generate and render entirely imaginary characters. For example, by combining a new character design for a two-headed animal with default models or existing footage of a dinosaur, a virtual personification of a new two-headed dinosaur may be generated, and its behavior may be based on analysis of a wolf, or it may interact with the user with the voice of an old man.
  • virtual personification may be customized based on both pre-recorded data and user preferences. It is contemplated that the personification is not limited to people but other things (real or not real), such as but not limited to animals, robots, cars, birds, fish, extinct species, alien creatures, created items or beings, or any other item.
  • the virtual personification Al system’s ability to generate any type of virtual personification and its versatility of customization enables broad application to all types of technology and environments.
  • One exemplary use may be education, where classes and/or tutorials may be provided to students with virtual personification of an instructor on any subject, which may automate some or all of a student’s classroom experience without compromising the student’s personal interaction (such as to ask questions).
  • the virtual instructor may draw information from existing knowledge or databases, thereby providing answers a live instructor may not have.
  • the class may be taught by a virtual representation of someone famous, such as Albert Einstein, Sandra Day O’Connor, or Alexander Graham Bell.
  • Another exemplary use may be training, which may include standardized training for professional purposes (such as customized professional training of airplane pilots using a virtual personification of a flight instructor and the cockpit), or training for hobbies or information learning (such as cooking with a virtual personification of Gordon Ramsey).
  • the virtual environment may react to different actions by the user, such as use of certain controls, to provide realistic training.
  • the virtual personification Al system may be used to generate realtime response instructions, such as medical or emergency training.
  • a 9- 1-1 dispatcher may assist a caller to perform CPR by transmitting footage of a virtual personification of medical personnel performing CPR on a patient while waiting for an ambulance to arrive.
  • the caller may interact with the medical personnel by asking questions such as “the patient is still not breathing, now what”? Answers may be pulled from a database and the personification may actually perform the answer to the question so the user can see how to perform the medical training.
  • This concept can be applied to teaching tools for medical procedures such that the virtual representations of the best doctors in the world can be created to show how to perform a procedure and dynamically respond to any question for personalized teaching.
  • Yet another exemplary use may be entertainment, such as allowing users to interact with famous people from the past or an imaginary character in a conversational setting. This allows people to have personal interactions with people from history and interact using the Al databases such that the virtual representation can answer any question or perform any action in real time during user interaction. These famous people from which the virtual representation is created could be any person, famous for any reason.
  • a database may be created for each person so that the Al system can accurately create visual, audio, and knowledge representations.
  • Yet another example may be a simulated room where two virtual personifications may interact with each other instead of (or in addition to) interaction with users.
  • a user may wish to simulate a philosophical debate between Socrates and Kant.
  • the virtual room may be expanded to include entire virtual worlds and large populations of virtual characters. The user can learn from seeing how two or more virtual representations interact, such as in professional environments, military situations, formal engagements, or social interactions.


Abstract

A method and apparatus to generate and update a virtual personification using artificial intelligence comprising a system configured to perform the following: receive data associated with a person, such as text files, audio files, image files, and video files; render a virtual personification of the person and output the virtual personification to a user, such as on a display screen; then receive and interpret a user input to generate a user request, and update the virtual personification. The update may include generating an audio output using the text files and the audio files of the person and/or generating a video output using the image files and the video files of the person. The audio output and the video output are presented to the user by the virtual personification and have not previously occurred by the person or thing represented by the virtual personification.

Description

METHOD AND SYSTEM FOR VIRTUAL INTELLIGENCE USER INTERACTION
1. Field of the Invention
[0001] The present invention is directed to a method and system to provide user interaction with virtual personifications using artificial intelligence (“Al”).
2. Description of the Related Art
[0002] Advancements in VR and AR technology now allow users to view real or simulated environments (referred to as virtual environments) using a screen-equipped headset or a traditional screen. Within these virtual environments, users have been able to view elements and move about to further explore the world. However, user interaction with virtual avatars in current technology is typically based on pre-recorded, pre-scripted image or audio files. In other words, the user can look about the environment and travel from place to place within the environment but, beyond that, interaction with the virtual environment is limited.
[0003] Other systems allow for some interaction with the virtual environment to obtain information about an item in the environment, such as by clicking on an item to obtain additional information. However, the interaction with elements in the virtual environment is limited to pre-created or pre-recorded information that typically is no more than a short pre-recorded message or text that is typically non-responsive and often no better than a frustrating voice script. These systems lack individualization to the particular user’s interests and specific questions and are sterile in that prior art systems are no better than simply reading an article or watching a video on a web site.
SUMMARY
[0004] To overcome the drawbacks of the prior art and provide additional benefits, disclosed is a system and method to generate and update a virtual personification using artificial intelligence by receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files, and rendering a virtual personification of the person and outputting the virtual personification to a user. Then, receiving and interpreting a user input to generate a user request and updating the virtual personification in response to the user request. The update comprises one or more of the following: responsive to the user request, generating an audio output using the text and audio files of the person, and responsive to the user request, generating a video output using the image files and the video files of the person, such that the audio output and the video output are presented to the user by the virtual personification. Furthermore, the audio output and the video output presented by the virtual personification have not previously occurred by the person or thing represented by the virtual personification.
[0005] In one embodiment, the virtual personification is of a person, either living or deceased. It is contemplated that the virtual personification may comprise an audio output and video output which are presented in a virtual environment of a type associated with the virtual personification. The virtual personification may comprise a representation of a non-living item.
[0006] In one embodiment, the method is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module. The remote artificial intelligence module may be a computing device with a processor and memory storing machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Upon locating an answer to the question or request, generating data that represents the virtual personification answering the question or request and transmitting the answer, or the data that represents the virtual personification answering the question or request, to the virtual reality device for presentation to the user. It is also contemplated that the method may further comprise tracking a user’s hand position using one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment. The step of generating a response to the question or request may use artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which do not provide a direct answer to the question or request.
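For illustration only, the request flow described in this paragraph could be sketched as follows. The class, database contents, and function names below are hypothetical and are not part of the disclosed system; the keyword matching merely stands in for whatever natural-language processing and search a real implementation would use.

```python
# Minimal sketch of the remote AI module's request flow; all names are illustrative.
from dataclasses import dataclass


@dataclass
class PersonificationResponse:
    """Data returned to the VR device: answer text plus rendering hints."""
    answer_text: str
    persona: str        # which virtual personification should deliver the answer
    emotion_hint: str   # e.g. "neutral", "enthusiastic"


# Stand-in for external databases unrelated to the personification itself.
FACT_DATABASES = [
    {"topic": "salmon", "fact": "Salmon is typically grilled 3-4 minutes per side."},
    {"topic": "cpr", "fact": "Chest compressions are given at 100-120 per minute."},
]


def derive_meaning(question: str) -> set[str]:
    """Very rough stand-in for NLU: reduce the question to lower-case keywords."""
    return {w.strip("?.,!").lower() for w in question.split()}


def search_databases(keywords: set[str]) -> str | None:
    """Look for any fact whose topic appears among the question keywords."""
    for entry in FACT_DATABASES:
        if entry["topic"] in keywords:
            return entry["fact"]
    return None


def handle_request(question: str, persona: str) -> PersonificationResponse:
    """Receive a question from the VR device, search, and package the answer."""
    keywords = derive_meaning(question)
    fact = search_databases(keywords)
    if fact is None:
        fact = "I do not have that information yet."
    return PersonificationResponse(answer_text=fact, persona=persona, emotion_hint="neutral")


if __name__ == "__main__":
    print(handle_request("How long should I grill salmon?", persona="chef"))
```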
[0007] Also disclosed is a system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising a virtual reality device configured to have at least a portion be worn by the user. The virtual reality device includes a wearable screen configured for viewing by a user, one or more speakers configured to provide audio output to the user, a microphone configured to receive audio input from the user, and one or more external input devices configured to receive input from the user. The virtual reality device also includes a communication module configured to communicate over a computer network or the Internet, and a processor with access to a memory. The processor executes machine readable code and the memory is configured to store the machine readable code. The machine readable code is configured to present a virtual environment on the wearable screen and through the one or more speakers to the user and present, to the user on the wearable screen and through the one or more speakers, a virtual personification of a person currently living or deceased, in the virtual environment. The code is also configured to receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification and then generate a response to the question or request from the user, which includes generating video content and audio content which did not previously exist. The code then presents the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
[0008] In one embodiment, the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module. It is further contemplated that the remote artificial intelligence module may be a computing device with memory and a processor such that the memory stores machine readable code configured to receive the question or request from the user via the virtual reality device, process the question or request to derive a meaning, and perform one or more searches for answers to the question or request in databases unrelated to the virtual personification. Then, upon locating an answer to the question or request, generating data that represents the virtual personification answering the question or request, and transmitting the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user. [0009] The system may further comprise one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment. In one embodiment, the input from the user comprises an audio input or an input to the one or more external input devices. It is contemplated that generating video content and audio content which did not previously exist is generated by processing existing video, audio, or both, of the person represented by the virtual personification, to form the video content and audio content which did not previously exist. In addition, the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from a person represented by the virtual personification but which do not provide a direct answer to the question or request.
[0010] Also disclosed herein is a method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device. In one embodiment, the method comprises presenting a virtual environment on the screen and through the one or more speakers to the user and presenting the virtual personification in the virtual environment. Then, receiving input from the user comprising a question, a user request, or a subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment. This method then sends a request for a response to the input from the user to an Al computing device that is remote from the user computing device, and, with the Al computing device, creates a response based on pre-existing content stored in one or more databases, which is processed to create the generated response. Then, transmitting the generated response to the user computing device and, at the user computing device, based on the generated response from the Al computing device, generating video content and audio content which did not previously exist. Finally, the method of operation presents the video content and audio content which did not previously exist to the user.
[0011] In one embodiment, the Al computing device is a computing device with memory and a processor such that the memory stores machine readable code configured to receive the input from the user computing device, process the input from the user to derive a meaning, and based on the meaning, perform one or more searches for answers to the input from the user in databases unrelated to the virtual personification. Upon locating a response to the input from the user, generate data that represents the virtual personification answering the question or request, and transmit the data that represents the virtual personification responding to the input from the user to the user computing device.
[0012] This method may further include monitoring one or more user hand position tracking devices configured to track a position of a user’s hand to determine what the user is pointing at in the virtual environment and interpreting the pointing as the input from the user. It is contemplated that the input from the user comprises an audio input or an input from the user to the one or more external input devices. The step of generating video content and audio content which did not previously exist occurs by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
[0013] Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
DESCRIPTION OF THE FIGURES
[0014] The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the figures, like reference numerals designate corresponding parts throughout the different views.
[0015] Figure 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system.
[0016] Figure IB illustrates a second exemplary embodiment of the virtual personification Al system which may use a local Al operating on a separate user device such as a smartphone, a tablet, a personal computer, etc.
[0017] Figure 2 illustrates an exemplary environment of use of the virtual personification Al system.
[0018] Figure 3 illustrates a block diagram of an example embodiment of a computing device, also referred to as a user device which may or may not be mobile.
[0019] Figure 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment.
DETAILED DESCRIPTION OF THE INVENTION
GLOSSARY OF TERMS
[0020] Al services: Services provided as procedures and methods to a program to accomplish artificial intelligence goals. Examples may include, but are not limited to, image modeling, text modeling, forecasting, planning, recommendations, search, speech processing, audio processing, audio generation, text generation, image generation, and many more.
[0021] Device: A device is any element running with a minimum of a CPU or a system which is used to interface with a device. Optionally, an accelerator can be attached in the form of a GPU or other specialized hardware accelerator. This accelerator can speed up the computation of Al services.
[0022] Application: An application is any software running on any device such as mobile devices, laptop, desktop, server, smart watches, tablets, home speakers, wearable devices including smart rings, glasses, hearing aids, CarPlay devices, security cameras, webcams, televisions, projection screen monitors, sound bars, personal computers, headphones, earbuds, and laptop devices where a user can interact with touch, audio, visual, or passively.
[0023] The following terms are used in this document and the following definitions are provided to aid in understanding but should not be interpreted as limiting in scope.
[0024] Al services: Services provided as procedures and methods to a program to accomplish artificial intelligence goals. Examples may include, but are not limited to, image modeling, text modeling, forecasting, planning, recommendations, search, speech processing, audio processing, audio generation, text generation, image generation, and many more.
[0025] Device: A device is any element running with a minimum of a CPU or a system which is used to interface with a device. Optionally, an accelerator can be attached in the form of a GPU or other specialized hardware accelerator. This accelerator can speed up the computation of Al services.
[0026] Application: An application is any software running on any device such as mobile devices, laptop, desktop, server, smart watches, tablets, home speakers, wearable devices including smart rings, glasses, hearing aids, CarPlay devices, security cameras, webcams, televisions, projection screen monitors, sound bars, personal computers, headphones, earbuds, and laptop devices where a user can interact with touch, audio, visual, or passively.
[0027] In this disclosure, a virtual personification system may analyze pre-recorded data to generate dynamic responses to user requests/questions through virtual personifications. In one embodiment, the virtual personification may be a virtual representation which may be based on a real person, for example, the user, a family member or relative, a famous person, a historical figure, or any other type of person. The virtual representation may also be a user- or computer-created person that does not represent a real person. Pre-recorded data may include image, video, or audio footage of the real person (such as YouTube and other film footage). Dynamic responses are generated to user requests/questions related to that known person, even though the pre-recorded data may not include any adequate responses or responses which match the question.
[0028] For example, a user may wish to be provided with a recipe from a famous chef, such as Gordon Ramsey, to make grilled salmon. Upon determination that pre-recorded data exists on Gordon Ramsey making a different dish, such as grilled chicken, the virtual personification may analyze Gordon Ramsey’s footage on making grilled chicken and grilled potatoes to generate a virtual personification of Gordon Ramsey guiding the user through the process of making grilled salmon, as if Gordon Ramsey were in a cooking show and personally providing detailed instructions to the specific user request. The system Al can pull details from prior recordings and manipulate the visual and audio files to create a new virtual representation that is directly and accurately responsive to the user’s request. Al may generate new information, such as how to adjust the response to be responsive to the specific user request. In the example of the cooking question, Al can understand the user’s request, analyze the information already provided by the chef about how to cook chicken, realize that chicken is not salmon, and then search for a recipe for salmon by the same chef or a comparable recipe, and then process the new recipe and the virtual representation to present the new recipe to the user of the system using the virtual representation, as if the original chef were actually providing the recipe for salmon and not chicken. Although applied to food, this example may be applied to any other topic or environment of use. [0029] The virtual personification of Gordon Ramsey may use a voice that sounds like Gordon Ramsey, may be dressed like Gordon Ramsey, as he typically appears on cooking shows, and may mimic Gordon Ramsey’s body language and speech pattern. Al may be used to create the virtual personification even in situations when the actual person never actually provided a responsive answer in a video or audio recording. The virtual representation may be created using built-in Al modules such as a virtual personification rendering module (discussed in more detail below) or using third-party tools, which the virtual personification system may interface with.
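As a rough, non-authoritative sketch of the selection logic implied by this example, the following snippet shows one way a system might decide between replaying an exact recording, adapting the nearest matching footage, or generating entirely new content. The segment library and field names are invented for illustration and are not taken from the patent.

```python
# Illustrative sketch (not the patented implementation) of choosing the closest
# pre-recorded segment and flagging what must be newly generated.

RECORDED_SEGMENTS = [
    {"person": "chef", "action": "grill", "subject": "chicken"},
    {"person": "chef", "action": "grill", "subject": "potatoes"},
]


def plan_response(person: str, action: str, subject: str) -> dict:
    """Reuse an exact match if one exists; otherwise adapt the nearest segment."""
    for seg in RECORDED_SEGMENTS:
        if (seg["person"], seg["action"], seg["subject"]) == (person, action, subject):
            return {"mode": "replay", "segment": seg}
    # No exact footage: reuse the same action by the same person and substitute the subject.
    for seg in RECORDED_SEGMENTS:
        if seg["person"] == person and seg["action"] == action:
            return {"mode": "adapt", "base_segment": seg, "substitute_subject": subject}
    return {"mode": "generate_from_scratch", "person": person,
            "action": action, "subject": subject}


if __name__ == "__main__":
    print(plan_response("chef", "grill", "salmon"))
```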
[0030] In another example, the user may attempt Gordon Ramsey’s recipe of scrambled eggs, which may already be available on YouTube, and which may involve the use of milk. However, upon determination he has no milk in the fridge, the user may wish to ask Gordon Ramsey whether whipped cream may be used as a substitute. While in the existing footage on YouTube, Gordon Ramsey may not have provided an answer to that question, the virtual personification may analyze Gordon Ramsey’s footage on substituting other items for milk to generate a virtual personification of Gordon Ramsey to answer this user question. The virtual personification of Gordon Ramsey may include a prediction of Gordon Ramsey’s typical reaction in such situations. For example, the Al may determine, based on pre-recorded data, that Gordon Ramsey typically acts impatiently to such questions. Thus, the virtual personification of Gordon Ramsey may display a frown or curt gestures when providing the predicted answer.
[0031] In one embodiment, the virtual personification may be presented in a virtual reality space, which may be rendered using a virtual reality system. For example, in a cooking environment, the virtual reality space may be a kitchen. For other topics, such as carpentry, the environment may be a woodworking shop, car repair would appear in an auto garage, education may appear as a classroom, and information about a topic may actually appear inside the items, such as inside a virtual computer or a virtual engine, to show how something works in combination with Al that creates answers for the user using the virtual reality space and the virtual personification.
[0032] Figure 1A illustrates a first exemplary embodiment of the present virtual personification Al system integrated into a virtual reality system. The virtual reality space is rendered by a virtual reality system. Exemplary virtual reality systems are described in U.S. Patent No. 9,898,091, U.S. Patent Publication 2014/0364212, and U.S. Patent Publication 2015/0234189, which are incorporated by reference herein in their entirety as teaching exemplary virtual reality systems and methods. A user 100A may access the virtual reality space by the one or more components of a virtual reality system, such as a virtual reality device (“VR device”) 104A and external input devices 108A, which may be accessories to the VR device 104A. The VR device 104A may be in direct communication with the external input devices 108A (such as by Bluetooth) or via network 112A providing internet or signals (e.g., a personal area network, a local area network (“LAN”), a wireless LAN, a wide area network, etc.). The VR device 104A may also communicate with a remote Al 116A via the network 112A.
[0033] In a preferred embodiment, the VR device 104A may be a wearable user device such as a virtual reality headset (“VR headset”), and the external input devices 108A may be hand-held controllers where a user may provide additional input such as arm motion, hand gestures, and various selection or control input through buttons or joysticks on such controllers. [0034] The VR device may generally include input devices 120A through 128A, input processing modules 132A, VR applications 134A, output rendering modules 138A, output devices 156A, 160A, and a communication module 164A. Input devices may include one or more audio input devices 120A (such as microphones), one or more position tracking input devices 124A (to detect a user’s position and motion), and one or more facial tracking input devices 128A (such as facial cameras to detect facial expressions, eye-tracking camera to detect gaze and eye movement, etc.). Additional external input devices may provide user biometrics data or tracking of other user body parts.
[0035] The input processing modules 132A may include, but are not limited to, an external input processing module 142A (used to process external inputs such as input from external devices 108A or additional external input devices discussed above), an audio input processing module 144A (used to process audio inputs, such as user speech or sounds), a position input processing module 146A (to process position and motion tracking inputs such as hand motions, finger motions, arm motions, head position), and a facial input processing module 148A (to process facial inputs of the user).
[0036] The VR applications 134A are generally responsible for rendering virtual reality spaces associated with their respective VR applications 134A. For example, a VR museum application may render a virtual museum through which a user may traverse and present various artwork which the user may view or interact with. This is achieved through the VR application’s 134A integration with output rendering modules 138A, which in turn presents the rendered files on output devices 156A, 160A. [0037] Specifically, the output rendering modules 138A may include, but are not limited to, an audio output processing module 150A responsible for processing audio files, and an image and/or video output processing module 152A, responsible for processing image and/or video files. In turn, one or more audio output devices 156A, such as built-in speakers on the VR headset may present the processed audio file, and one or more image and/or video output devices 160A (such as a built-in screen on the VR headset) may display the processed image and/or video files. Other types of output may include, but are not limited to, motion or temperature changes to the VR device 104A or the external input devices 108A (such as vibration on hand-held controllers).
[0038] User interaction may in turn modify the virtual reality space. For example, if a user inputs motion to indicate he picked up a vase, the rendered virtual reality space may display a vase moving in accordance with the user’s motion. Thus, the transmission of information occurs in a bi-directional streaming fashion, from the user 100A to the VR device 104A and/or external input devices 108A, then from the VR device 104A and/or external input devices 108A back to the user 100A. U.S. Application 17/218,021 provides a more detailed discussion on bi-directional streaming using Al services and examples of broader and specific uses.
[0039] The Al may be completely or partially built into the VR device 104A or specific VR applications 134A. Such built-in Al components may be referred to as a local Al 168A. Other Al components may be located in the remote Al 116A, which may be operating on remote devices or on cloud-based servers. The local and remote Al 168A, 116A may communicate via the network 112A. [0040] The Al may enhance the user’s 100A interaction with the virtual reality system using the embodiments and methods described above. The Al may include one or more of the following components to generally operate the Al and process data: one or more processors 172 and one or more memory storage devices where logic modules 176 and machine learning modules 178 may be stored to provide general Al services. The memory storage devices may further include one or more modules to specifically enhance user-VR interaction, such as speech-to-text modules 180, non-verbal input processing modules 182, text augmentation modules 184, conversation management modules 186, response generation modules 188, audio rendering and updating modules 190, virtual personification rendering modules 192, virtual personification prediction modules 194, and integration modules 196.
[0041] The speech-to-text modules 180 may be used to perform voice detection and customized speech to text recognition, as well as to generally detect, recognize, process, and interpret user audio input. Recognition allows the speech-to-text modules 180 to distinguish between verbal input (such as a user question) and non-verbal input (such as the user’s sigh of relief).
[0042] A user may start an active conversation in the virtual reality space by simply speaking. The speech-to-text modules 180 may use voice activity detection in order to differentiate that the user has started speaking, as opposed to ambient noise activity. When true speech is detected, the speech-to-text modules 180 may process the input audio from the microphone to recognize the user’s spoken text. This processing can either happen as part of the viewing device (such as the VR device 104A), on a device connected to the viewing device, or on a remote server over the network (such as the remote Al 116A). This process may convert the stream of audio into the spoken language, such as text processable by a computer.
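A minimal, illustrative sketch of this gating step is shown below, assuming 16-bit PCM audio frames and a fixed energy threshold; a production system would use a trained voice activity detector and a real speech-to-text engine rather than these stand-ins.

```python
# Toy voice-activity gating followed by a placeholder transcription step.
import math


def frame_rms(samples: list[int]) -> float:
    """Root-mean-square energy of one audio frame (signed PCM samples)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech(frames: list[list[int]], threshold: float = 500.0) -> list[list[int]]:
    """Keep only frames whose energy exceeds the ambient-noise threshold."""
    return [f for f in frames if frame_rms(f) > threshold]


def transcribe(frames: list[list[int]]) -> str:
    """Placeholder for the speech-to-text step, which may run on the headset,
    a paired device, or the remote AI server."""
    return "<recognized text>" if frames else ""


if __name__ == "__main__":
    quiet = [[10, -12, 8, -9]] * 3           # ambient noise
    loud = [[900, -1200, 1100, -950]] * 3    # user speaking
    print(transcribe(detect_speech(quiet + loud)))
```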
[0043] The speech-to-text modules 180 may be customized to the current scene that the user is experiencing inside the virtual space, or a virtual personification that the user wishes to interact with. This customization could allow for custom vocabulary to be recognized when it would make sense in the specific environment or specific virtual personification. For example, if a user were interacting with a virtual personification of a cooking chef, then the speech recognition system may be customized to enhance name recognition for words associated with food, whereas in a different environment a different vocabulary would be used. If the virtual personification of Gordon Ramsey were in a kitchen, then the speech recognition system may be customized to enhance name recognition for kitchen utensils.
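One hedged way to picture this customization is as a re-scoring of transcription hypotheses against a scene-specific vocabulary, as in the sketch below; the scene names, word lists, and bonus weight are purely illustrative.

```python
# Sketch of biasing recognition toward scene-specific vocabulary.
SCENE_VOCAB = {
    "kitchen": {"whisk", "saucepan", "salmon", "spatula"},
    "museum": {"impressionist", "sculpture", "curator"},
}


def rescore(candidates: list[tuple[str, float]], scene: str) -> str:
    """Boost transcription hypotheses that contain words expected in this scene."""
    vocab = SCENE_VOCAB.get(scene, set())

    def score(item: tuple[str, float]) -> float:
        text, base = item
        bonus = sum(0.1 for w in text.lower().split() if w in vocab)
        return base + bonus

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    hypotheses = [("pass me the whisk", 0.60), ("pass me the disk", 0.62)]
    print(rescore(hypotheses, scene="kitchen"))  # -> "pass me the whisk"
```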
[0044] While the virtual reality system may have its own modules to process audio inputs, the Al’s speech-to-text modules 180 are intended to integrate and enhance existing features in the virtual reality system. For example, the Al speech-to-text modules 180 may generate exponential amounts of interpretations from a single user input, automatically select the top interpretation based on user data, and hold multi-turn conversations with the user as a continuation of that single user input. Appendix A includes a more detailed discussion on systems and methods for enhanced speech-to-text, on the integration of the enhanced speech-to-text with other applications outside the virtual reality system, and on the additional mechanism to recognize usable user input (as discussed in step 2) and to process out-of-scope user input. [0045] The Al’s non-verbal input processing modules 182 may be used to process non-verbal input. As discussed above, audio input may be non-verbal (such as a user’s sigh of relief, or tone of voice). As well, external input devices 108A may include devices to track a user’s biometrics or body parts other than arm, hand, and finger movement. Examples of devices to track a user’s biometrics include but are not limited to smartwatches, Fitbits, heart-rate monitors, blood pressure monitors, or any other devices which may be used to track a user’s heart-rate, oxygen level, blood pressure, or any other metrics that may track a user’s body condition. Such input may all be processed using additional processing modules, which may be part of the virtual reality system (such as built into the VR device 104A), and/or may be part of the local or remote Al 168A, 116A.
[0046] The text augmentation modules 184 may be used to add further context to the interpreted user 100A input. When the speech-to-text modules 180 have successfully transcribed the user’s spoken text, the text augmentation modules 184 may supplement the spoken text with what the user is currently doing, or interacting with, to enhance its linguistic understanding of what the user has said. For example, this allows the Al to find co-references between what the user said and what they are looking at. For instance, if a user asks, “how old is this”, the term “this” can be inferred from what the user is currently looking at, touching, near, or pointing at in the virtual world. This functionality can be carried out by fusing any one or more of the following inputs: the user's head position, eye detection, hand position - including placement, grip, pointing, controller position, and general orientation. Furthermore, the system may also fuse in non-controller related signals, such as biometrics from heart rate, breathing patterns, and any other biosensory information. This information is fused over time to detect not just instantaneous values for fusion but trends as well.
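The deictic-reference ("this"/"that") resolution described here might, in simplified form, look like the following sketch, where the gaze and pointing targets are assumed to have already been derived from head, eye, and hand tracking; the function and argument names are hypothetical.

```python
# Hedged sketch of resolving "this"/"that" from fused gaze and pointing signals.
def resolve_reference(utterance: str, gaze_target: str | None,
                      pointing_target: str | None) -> str:
    """Replace a bare 'this'/'that' with the object the user is attending to,
    preferring the pointing signal over gaze when both are present."""
    target = pointing_target or gaze_target
    if target is None:
        return utterance
    words = [target if w.lower() in {"this", "that"} else w for w in utterance.split()]
    return " ".join(words)


if __name__ == "__main__":
    print(resolve_reference("how old is this", gaze_target="the vase",
                            pointing_target=None))
    # -> "how old is the vase"
```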
[0047] The text augmentation modules 184 may also be integrated with the non-verbal input processing modules 182 to receive further context. For example, in a multi-turn conversation where a user requests information, the user may input the word “okay”. Conventional systems may, by default, cease communication because the response “okay” may be pre-coded as a command to terminate interaction. The text augmentation modules 184, in contrast, may analyze the user’s tone to detect (1) boredom, and interpret “okay” as a request to shorten the information provided, (2) hesitation or confusion, and interpret “okay” as a request for additional information, or (3) impatience, and interpret “okay” as a request to end the interaction.
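A compact, illustrative mapping from the detected tone to the three interpretations described above could look like this; the tone labels and action names are invented for the example.

```python
# Illustrative mapping from a detected vocal tone to how an ambiguous "okay"
# should be interpreted, following the three cases described above.
def interpret_okay(tone: str) -> str:
    """Turn the bare reply 'okay' plus a tone label into a dialogue action."""
    actions = {
        "bored": "shorten_remaining_information",
        "hesitant": "offer_additional_detail",
        "impatient": "end_interaction",
    }
    return actions.get(tone, "continue_normally")


if __name__ == "__main__":
    for tone in ("bored", "hesitant", "impatient", "neutral"):
        print(tone, "->", interpret_okay(tone))
```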
[0048] The text augmentation modules’ 184 integration with other devices and modules may not be linear. Rather, context from the virtual reality system may be used in one or more steps of speech interpretation. For example, in a multi-turn conversation (such as the conversation described above), at each turn of a user input, the speech-to-text modules may be used to generate the most accurate interpretation of the user’s input, and the non-verbal input processing module 182 may be used to inject more context. Further, the Al’s conversation management modules 186 may be integrated with the text augmentation modules 184 to generate the output used in single or multi-turn conversations.
[0049] Once the Al has augmented the text by considering the current state of the virtual space in relation to the user, then a conversation may be carried out. The conversation management modules 186 may classify the spoken text into different categories to facilitate the open-ended conversation. First the conversation management modules 186 may determine if a statement is meant to initiate a new conversation or one that continues an existing conversation. If the user is detected to initiate a new conversation, then the conversation management modules 186 may classify the result among categories. A first category may include user comments that may not necessarily require a strong response. For example, if a user states “this is really cool”, the conversation management modules 186 may render the virtual personification to respond with a more descriptive or expressive response in relation to what was remarked as being cool. Alternatively, the virtual personification may not respond. A second category may include user questions that may be in relation to the current scene. A third category may be user questions that are in relation to the nonvirtualized world (i.e., reality).
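As an informal sketch of this classification step, the keyword heuristics below stand in for whatever trained classifier an actual implementation of the conversation management modules 186 would use; the categories mirror the three described above.

```python
# Rough sketch of classifying a user turn into the categories described above.
def classify_turn(text: str, in_conversation: bool) -> dict:
    """Classify one user utterance as a comment, a scene question, or a real-world question."""
    tokens = text.lower().strip("?!. ").split()
    is_question = text.strip().endswith("?") or (
        tokens and tokens[0] in {"how", "what", "why", "when", "where", "can", "is", "are"})
    if not is_question:
        category = "comment"                  # e.g. "this is really cool"
    elif any(t in {"this", "that", "here"} for t in tokens):
        category = "question_about_scene"     # refers to the virtual world
    else:
        category = "question_about_reality"   # refers to the non-virtual world
    return {"new_conversation": not in_conversation, "category": category}


if __name__ == "__main__":
    print(classify_turn("this is really cool", in_conversation=False))
    print(classify_turn("what is that painting?", in_conversation=True))
    print(classify_turn("who painted the original?", in_conversation=True))
```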
[0050] In the second and third categories, the conversation management modules 186 may facilitate an answer to the question via the virtual personification. In the second category of a question being detected in relation to the virtual world, the system may then proceed down one of two or more paths. The conversation management modules 186 may first attempt to use information in pre-recorded data to answer the question. For example, during a user interaction with a virtual Gordon Ramsey on making a grilled salmon, a user may ask about the use of an ingredient not in the current recipe. The conversation management modules 186 may retrieve footage from another video where Gordon Ramsey uses that ingredient and may render the virtual Gordon Ramsey to modify the current recipe to include that ingredient. [0051] If no pre-recorded data exists, then the conversation management modules 186 request the response generation modules 188 to analyze additional data (such as data on Gordon Ramsey’s presentation of a similar alternative ingredient or based on other chefs or known cooking information) to generate new behavior, speech, actions, or responses for the virtual Gordon Ramsey (such as the output of an opinion that the ingredient may not be desirable, or the rendering of Gordon Ramsey adding the ingredient to the recipe using Gordon Ramsey’s voice and behavior). If the user is in an existing conversation, then the conversation management modules 186 may proceed with the same approach as in the previous section, but with the added impetus of considering the context. Using context and past conversation details in the Al system provides a more realistic user interaction and prevents the virtual representation from repeating itself or providing the same response.
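The use of conversation context to avoid repetition, mentioned at the end of this paragraph, might be sketched as a small session memory like the following; the class and the sample responses are illustrative only.

```python
# Small sketch of using conversation history so the personification does not
# repeat an answer it already gave in this session.
class ConversationMemory:
    def __init__(self) -> None:
        self.given: list[str] = []

    def choose(self, candidates: list[str]) -> str:
        """Prefer a candidate response that has not been used yet this session."""
        for c in candidates:
            if c not in self.given:
                self.given.append(c)
                return c
        # Everything has been said before; acknowledge instead of repeating verbatim.
        return "As I mentioned, " + candidates[0]


if __name__ == "__main__":
    memory = ConversationMemory()
    options = ["Sear the salmon skin-side down first.", "Use a hot, oiled pan."]
    print(memory.choose(options))
    print(memory.choose(options))
    print(memory.choose(options))
```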
[0052] After the response is provided, the conversation management modules 186 may account for the likelihood that the user will continue to ask questions or follow-ups to the previous response. The conversation management modules 186 may use this additional information to better carry out the next algorithm.
[0053] The audio rendering and updating modules 190 may be used, both independently and/or in conjunction with other modules 188, to output audio. The audio rendering and updating modules 190 may ensure, for example, that the proper response to a user question to virtual Gordon Ramsey may indeed be in Gordon Ramsey’s voice, tone, and speech pattern. For example, based on analysis of Gordon Ramsey’s past audio files, the audio rendering and updating modules 190 may determine Gordon Ramsey frequently accompanies his speech with an exasperated sigh. As a result, the audio rendering and updating modules 190 may cause the virtual personification of Gordon Ramsey to output the same exasperated sigh in its user interactions. Additional analysis may be performed to customize speech of the virtual personification in the following areas: volume (such as a person who frequently speaks in raised voice), accent, tone (such as a person who likes to emphasize certain words), speech pattern, etc.
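One hedged way to represent this kind of voice customization is as a speaking-style profile derived from annotated past audio and attached to each generated line, as sketched below; the field names, labels, and the idea of a "signature sound" are assumptions made for illustration, not details from the patent.

```python
# Sketch: derive a simple speaking-style profile from annotated past audio and
# attach it to a generated line for the downstream voice-rendering stage.
from dataclasses import dataclass


@dataclass
class VoiceStyle:
    volume: str           # "soft", "normal", "raised"
    pace: str             # "slow", "normal", "fast"
    signature_sound: str  # e.g. an exasperated sigh the speaker often makes


def style_from_history(annotations: list[dict]) -> VoiceStyle:
    """Pick the most frequent volume/pace labels and any recurring mannerism."""
    def most_common(key: str, default: str) -> str:
        values = [a[key] for a in annotations if key in a]
        return max(set(values), key=values.count) if values else default

    return VoiceStyle(
        volume=most_common("volume", "normal"),
        pace=most_common("pace", "normal"),
        signature_sound=most_common("mannerism", ""),
    )


def render_line(text: str, style: VoiceStyle) -> dict:
    """Package the line with style hints for the voice synthesis step."""
    return {"text": text, "volume": style.volume, "pace": style.pace,
            "prefix_sound": style.signature_sound}


if __name__ == "__main__":
    history = [{"volume": "raised", "pace": "fast", "mannerism": "sigh"},
               {"volume": "raised", "pace": "fast"}]
    print(render_line("That is not how you season a salmon.", style_from_history(history)))
```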
[0054] The audio rendering and updating modules 190 may also supplement the user experience with appropriate background noise or music, which may be updated based on user action. For example, as virtual Gordon Ramsey guides the user through the experience of baking a cake, at one point virtual Gordon Ramsey may show the user how to turn on a cake mixer. In addition to the visual rendering of a cake mixer that is currently in use, the audio rendering and updating modules 190 may add the sound of an operating cake mixer to the ambient background noise. At a later point, when virtual Gordon Ramsey turns the cake mixer off, the audio rendering and updating modules 190 may return the background audio from the sound of an operating cake mixer to the ambient noise. This would be true for any and all sounds in a virtual environment.
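A simplified, illustrative sketch of how background audio layers might track the state of virtual objects (using hypothetical object and file names) is shown below:

```python
# Illustrative sketch only; hypothetical event names and audio layer manager.
class AmbientAudioManager:
    def __init__(self, base_ambience: str = "kitchen_room_tone.wav"):
        self.base_ambience = base_ambience
        self.active_layers = {}   # object id -> looping sound file

    def on_object_state_change(self, object_id: str, state: str) -> None:
        """Add or remove a looping sound layer when a virtual object turns on or off."""
        loops = {"cake_mixer": "mixer_loop.wav", "oven": "oven_hum.wav"}
        if state == "on" and object_id in loops:
            self.active_layers[object_id] = loops[object_id]
        elif state == "off":
            self.active_layers.pop(object_id, None)

    def current_mix(self) -> list:
        """The audio renderer would loop every file in this list each frame."""
        return [self.base_ambience, *self.active_layers.values()]

if __name__ == "__main__":
    audio = AmbientAudioManager()
    audio.on_object_state_change("cake_mixer", "on")
    print(audio.current_mix())   # ['kitchen_room_tone.wav', 'mixer_loop.wav']
    audio.on_object_state_change("cake_mixer", "off")
    print(audio.current_mix())   # ['kitchen_room_tone.wav']
```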
[0055] The virtual personification rendering module 192 may be used, in conjunction with other modules, to output the visual representation of the virtual personification. For example, the virtual personification rendering module 192 may be used to generate the virtual Gordon Ramsey, which may include a visual representation of one or more of the following: Gordon Ramsey's face, body, and clothing. The virtual personification rendering module 192 processes existing images or video of the person or item being personified, maps and creates new video from the existing video, and is also able to create new video or images from prior images and video, thereby creating a virtual personification that does or says things which the original person never did or said.
[0056] The virtual personification prediction modules 194 may be used, in conjunction with other modules, to predict behavior and responses of the virtual personification. Particularly in cases where no pre-recorded data exists and the Al must generate new responses or behavior, the virtual personification prediction modules 194 may be used to render predictions of the virtual personification's responses, which may include one or more of the following: facial expressions, body movements, gestures, etc.
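For illustration only, the behavior prediction described above may be sketched as a trivial rule-based mapping from the planned utterance to a predicted expression and gesture; a real implementation would use learned models trained on footage of the person being personified.

```python
# Illustrative sketch only; a toy rule-based stand-in for a learned behavior predictor.
def predict_behavior(response_text: str) -> dict:
    """Map the planned utterance to a predicted facial expression and gesture."""
    lowered = response_text.lower()
    if any(phrase in lowered for phrase in ("no,", "never", "not desirable")):
        return {"expression": "frown", "gesture": "head_shake"}
    if "!" in response_text:
        return {"expression": "wide_eyes", "gesture": "emphatic_point"}
    return {"expression": "neutral_smile", "gesture": "open_palms"}

if __name__ == "__main__":
    print(predict_behavior("No, that ingredient is not desirable here."))
    print(predict_behavior("Perfect! Now fold it in gently."))
```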
[0057] The integration modules 196 may be used to integrate the one or more modules described above so that they operate in sync. For example, the response generation modules 188 may be integrated with the audio rendering and updating modules 190 to accurately output a virtual personification's speech using the proper voice and tone, which may, in turn, be integrated with the virtual personification rendering modules 192 and the virtual personification prediction modules 194 to ensure coherence between the output of voice, facial expression, and body movements. In one embodiment, additional customization may be added to the virtual personification through additional modules. For example, additional modules may be added to analyze a person's temperament or personality and incorporate them into the virtual personification of that person.
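A minimal, illustrative sketch of such integration, in which face and body animation are fitted to the duration of the synthesized audio so the outputs stay coherent, is shown below; all module interfaces are hypothetical and are not the disclosed modules themselves.

```python
# Illustrative sketch only; hypothetical module interfaces showing synchronized output.
from dataclasses import dataclass

@dataclass
class TimedTrack:
    name: str
    duration_s: float
    payload: object

def synthesize_speech(text: str) -> TimedTrack:
    return TimedTrack("audio", duration_s=0.06 * len(text), payload=f"waveform({text})")

def render_face(text: str, duration_s: float) -> TimedTrack:
    return TimedTrack("face", duration_s, payload="visemes aligned to phonemes")

def render_body(duration_s: float) -> TimedTrack:
    return TimedTrack("body", duration_s, payload="gesture clip stretched to duration")

def integrated_response(text: str) -> dict:
    """Audio is generated first; face and body animation are fitted to its duration
    so that voice, lips, and gestures remain coherent."""
    audio = synthesize_speech(text)
    return {
        "audio": audio,
        "face": render_face(text, audio.duration_s),
        "body": render_body(audio.duration_s),
    }

if __name__ == "__main__":
    print(integrated_response("Add the dill now, gently."))
```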
[0058] Conventional virtual reality systems and conversational Al systems may not have the ability to hold multi-turn conversations. For example, following the user's request to add a first ingredient to a recipe, Gordon Ramsey may suggest the use of a second ingredient instead. The user may ask, "can you show me what that would look like". Conventional virtual reality systems may not understand what "that" is referring to, may incorrectly determine that the user is still looking for a display of the first ingredient, or may not have the requested information. In contrast, through the integration of the speech-to-text modules 180, the conversation management modules 186, and the logic modules 176 (such as natural language understanding modules and fuzzy logic modules), the Al would understand that the user 100A, in continuation of the previous exchange, is referring to the second ingredient and may thus provide output that is truly responsive to the user's second input. In addition, the Al system would generate an image or video showing what that added ingredient would look like, even if such video or image footage never previously existed.
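For illustration only, the multi-turn reference resolution described above may be sketched as a toy coreference step over a short list of recently mentioned entities; the actual natural language understanding modules would be considerably more sophisticated than this stand-in.

```python
# Illustrative sketch only; a toy coreference step, not the actual NLU or conversation modules.
class ConversationContext:
    def __init__(self):
        self.recent_entities = []   # most recent last, e.g. ingredients mentioned so far

    def mention(self, entity: str) -> None:
        self.recent_entities.append(entity)

    def resolve(self, utterance: str) -> str:
        """Replace a bare pronoun such as 'that' with the most recently mentioned entity."""
        if not self.recent_entities:
            return utterance
        return utterance.replace("that", self.recent_entities[-1])

if __name__ == "__main__":
    ctx = ConversationContext()
    ctx.mention("paprika")         # user asked to add the first ingredient
    ctx.mention("smoked chili")    # agent suggested a second ingredient instead
    print(ctx.resolve("can you show me what that would look like"))
    # -> "can you show me what smoked chili would look like"
```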
[0059] In addition to the robust speech interpretation and multi-turn conversation features, the Al presents two other main improvements over conventional systems: unlimited data and analysis, and dynamic and predictive rendering. Using the previous example, a conventional rendering of a virtual Gordon Ramsey may be limited to existing video footage of Gordon Ramsey (such as, for example, a deep fake using Gordon Ramsey's face, or the output of a pre-recording of Gordon Ramsey). Further, a conventional software application used to render the virtual Gordon Ramsey may be limited to existing data on the local device. The Al, in contrast, may query any database accessible over the network to retrieve additional pre-recorded data, or generate renderings of virtual Gordon Ramsey in never-before-seen situations based on analysis of existing footage. Thus, not only is the entire universe of information available on the internet accessible to the user through the Al, but additional information not currently existing on the internet may be predicted and generated.
[0060] Figure 1B illustrates a second exemplary embodiment of the virtual personification Al system, which may use a local Al 168B operating on a separate user device 190 such as a smartphone, a tablet, a personal computer, etc. Such separate user devices 190 may establish direct communication with the communication module 164 in the VR device 104B and/or a remote Al 116B housing additional Al modules, or may communicate with the VR device 104B and the remote Al 116B via the network 112B. The various Al components 172-196 illustrated in Figure 1A and discussed above may be stored in the local Al 168B and/or the remote Al 116B. The external input devices 108B and the various VR device components 120B-164B may interact with each other and with the local and remote AIs 168B, 116B in a similar fashion as described for Figure
1A.
[0061] It is contemplated that in any embodiment, including but not limited to those of Figures 1A and
1B, any one or more of the Al modules 172-196 may be included in the local Al 168 and/or the remote Al 116. In one embodiment, all Al modules 172-196 may be located on a local Al 168 operating in the VR device 104 such that no remote Al 116 may be necessary. Alternatively, all Al modules 172-196 may be located in a remote Al 116. In preferred embodiments, most or all Al modules 172-196 are in a remote Al 116 such that the Al may be integrated with any VR device 104, including VR devices 104 with no built-in local Al 168A (an illustrative sketch of this local/remote arrangement is provided below). Such integration may be achieved using Al layers to power cross-platform Al services, which is discussed in more detail in U.S. Application 17/218,021.

[0062] While VR devices may be the preferred devices to implement the virtual personification Al system, it is intended that the virtual personification Al system may be used on any computing device capable of performing user interaction. For example, the virtual personification Al system may be implemented on a device capable of performing AR display, such that the virtual personification may be output via AR technology. Similarly, the virtual personification Al system may be implemented on smartphones, tablets, personal computers, laptop devices, etc., where the virtual personification may be a 2-dimensional output on a display screen. In one embodiment, the virtual personification may be in an audio-only mode, for implementation on a peripheral device (such as a vehicle with CarPlay), wearable devices (such as smartwatches, smart rings, glasses, hearing aids, headphones, earbuds, etc.), home devices (such as home speakers, security cameras, webcams, televisions, projection screen monitors, sound bars, etc.), or any other electronic devices.
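The following sketch, offered only as an illustration and using a placeholder endpoint URL, shows one way a request might be handled by a local Al first and forwarded to a remote Al when the local Al cannot answer:

```python
# Illustrative sketch only; hypothetical local cache and placeholder remote endpoint.
import json
import urllib.request
from typing import Optional

def handle_locally(request: dict) -> Optional[dict]:
    """A small on-device cache or model; returns None when it cannot answer."""
    cached = {"greeting": {"text": "Welcome back to the kitchen."}}
    return cached.get(request.get("intent"))

def handle_remotely(request: dict,
                    endpoint: str = "http://remote-ai.example.invalid/respond") -> dict:
    """Forward the request to a remote Al service; the endpoint here is a placeholder."""
    data = json.dumps(request).encode("utf-8")
    req = urllib.request.Request(endpoint, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

def dispatch(request: dict) -> dict:
    """Local-first handling with remote fallback, per the placement options above."""
    local = handle_locally(request)
    if local is not None:
        return {"source": "local", **local}
    return {"source": "remote", **handle_remotely(request)}

if __name__ == "__main__":
    print(dispatch({"intent": "greeting"}))   # answered locally; no network call needed
```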
[0063] Figure 2 illustrates an exemplary environment of use of the virtual personification Al system. In Figure 2, a user 200 may interact with the following devices, which may, as discussed above, be capable of implementing the virtual personification Al system: a VR device 204 (as illustrated in Figure 1), an AR device 208, or any other computing device 212 (such as a computer, a smart TV, or a smartphone). These devices 204-212 may then implement the Al modules 220 (which are separately illustrated as 172-196 in Figure 1A and discussed above). Such implementation may be, as discussed above, locally, remotely, or both.
[0064] In one embodiment, the Al modules 220 may be integrated with third-party tools such as virtual representation modules 216 and audio data modules 224. Virtual representation modules 216 may be any additional tools used to generate virtual personifications and virtual environments. Audio data modules 224 may be additional tools used to generate audio for virtual personifications and virtual environments.
[0065] The Al modules 220 and their integrated third-party tools 216, 224 may be in direct communication, or may communicate via a network 228 to access programs, servers, and/or databases stored in a cloud 232 and/or on cloud-based servers, as well as other devices 236, which may in turn be connected to their respective databases 240. In one embodiment, the Al modules 220 may access the third-party tools 216, 224 remotely, such as via the network 228. The Al modules 220 may thus access resources from all connected programs, devices, servers, and/or databases.
[0066] Figure 3 illustrates an example embodiment of a mobile device on which a solution generator may operate, also referred to as a user device, which may or may not be mobile. This is but one possible mobile device configuration and, as such, it is contemplated that one of ordinary skill in the art may differently configure the mobile device. The mobile device 300 may comprise any type of mobile communication device capable of performing as described below. The mobile device may comprise a Personal Digital Assistant ("PDA"), cellular telephone, smart phone, tablet PC, wireless electronic pad, an IoT device, a "wearable" electronic device, or any other computing device.
[0067] In this example embodiment, the mobile device 300 is configured with an outer housing 304 designed to protect and contain the components described below. Within the housing 304 are a processor 308 and a first and second bus 312A, 312B (collectively 312). The processor 308 communicates over the buses 312 with the other components of the mobile device 300. The processor 308 may comprise any type of processor or controller capable of performing as described herein. The processor 308 may comprise a general-purpose processor, ASIC, ARM, DSP, controller, or any other type of processing device. The processor 308 and other elements of the mobile device 300 receive power from a battery 320 or other power source. An electrical interface 324 provides one or more electrical ports to electrically interface with the mobile device, such as with a second electronic device, computer, a medical device, or a power supply/charging device. The interface 324 may comprise any type of electrical interface or connector format.
[0068] One or more memories 310 are part of the mobile device 300 for storage of machine readable code for execution on the processor 308 and for storage of data, such as image, audio, user, location, accelerometer, or any other type of data. The memory 310 may comprise RAM, ROM, flash memory, optical memory, or micro-drive memory. The machine readable code (software modules and/or routines) as described herein is non-transitory.
[0069] As part of this embodiment, the processor 308 connects to a user interface 316. The user interface 316 may comprise any system or device configured to accept user input to control the mobile device. The user interface 316 may comprise one or more of the following: microphone, keyboard, roller ball, buttons, wheels, pointer key, touch pad, and touch screen. Also provided is a touch screen controller 330 which interfaces through the bus 312 and connects to a display 328.
[0070] The display comprises any type of display screen configured to display visual information to the user. The screen may comprise an LED, LCD, thin film transistor screen, OEL (organic electroluminescence), CSTN (color super twisted nematic), TFT (thin film transistor), TFD (thin film diode), OLED (organic light-emitting diode), AMOLED (active-matrix organic light-emitting diode) display, capacitive touch screen, resistive touch screen, or any combination of such technologies. The display 328 receives signals from the processor 308, and these signals are translated by the display into text and images as is understood in the art. The display 328 may further comprise a display processor (not shown) or controller that interfaces with the processor 308. The touch screen controller 330 may comprise a module configured to receive signals from a touch screen which is overlaid on the display 328.
[0071] Also part of this exemplary mobile device is a speaker 334 and microphone 338. The speaker 334 and microphone 338 may be controlled by the processor 308. The microphone 338 is configured to receive and convert audio signals to electrical signals based on processor 308 control. Likewise, the processor 308 may activate the speaker 334 to generate audio signals. These devices operate as is understood in the art and as such, are not described in detail herein.
[0072] Also connected to one or more of the buses 312 are a first wireless transceiver 340 and a second wireless transceiver 344, each of which connects to a respective antenna 348, 352. The first and second transceivers 340, 344 are configured to receive incoming signals from a remote transmitter and perform analog front-end processing on the signals to generate analog baseband signals. The incoming signal may be further processed by conversion to a digital format, such as by an analog-to-digital converter, for subsequent processing by the processor 308. Likewise, the first and second transceivers 340, 344 are configured to receive outgoing signals from the processor 308, or another component of the mobile device 300, and upconvert these signals from baseband to RF frequency for transmission over the respective antenna 348, 352. Although shown with a first wireless transceiver 340 and a second wireless transceiver 344, it is contemplated that the mobile device 300 may have only one such system, or more than two transceivers. For example, some devices are tri-band or quad-band capable, or have Bluetooth, NFC, or other communication capability.
[0073] It is contemplated that the mobile device 300, and hence the first wireless transceiver 340 and the second wireless transceiver 344, may be configured to operate according to any presently existing or future developed wireless standard including, but not limited to, Bluetooth, Wi-Fi such as IEEE 802.11 a, b, g, n, wireless LAN, WMAN, broadband fixed access, WiMAX, any cellular technology including CDMA, GSM, EDGE, 3G, 4G, 5G, TDMA, AMPS, FRS, GMRS, citizens band radio, VHF, AM, FM, and wireless USB.
[0074] Also part of the mobile device 300 is one or more systems connected to the second bus 312B which also interfaces with the processor 308. These devices include a global positioning system (GPS) module 360 with associated antenna 362. The GPS module 360 is capable of receiving and processing signals from satellites or other transponders to generate data regarding the location, direction of travel, and speed of the GPS module 360. GPS is generally understood in the art and hence not described in detail herein. A gyroscope 364 connects to the bus 312B to generate and provide orientation data regarding the orientation of the mobile device 300. A magnetometer 368 is provided to supply directional information to the mobile device 300. An accelerometer 372 connects to the bus 312B to provide information or data regarding shocks or forces experienced by the mobile device. In one configuration, the accelerometer 372 and gyroscope 364 generate and provide data to the processor 308 to indicate a movement path and orientation of the mobile device 300.
[0075] One or more cameras (still, video, or both) 376 are provided to capture image data for storage in the memory 310 and/or for possible transmission over a wireless or wired link, or for viewing at a later time. The one or more cameras 376 may be configured to detect an image using visible light and/or near-infrared light. The cameras 376 may also be configured to utilize image intensification, active illumination, or thermal vision to obtain images in dark environments. The processor 308 may process machine-readable code that is stored on the memory to perform the functions described herein.
[0076] A flasher and/or flashlight 380, such as an LED light, is provided and is processor controllable. The flasher or flashlight 380 may serve as a strobe or traditional flashlight. The flasher or flashlight 380 may also be configured to emit near-infrared light. A power management module 384 interfaces with or monitors the battery 320 to manage power consumption, control battery charging, and provide supply voltages to the various devices, which may have different power requirements.
[0077] Figure 4 is a block diagram of an exemplary computing device, mobile device, or server, such as one of the devices described above, according to one exemplary embodiment. Computing device 400 is intended to represent various forms of digital computers, such as smartphones, tablets, kiosks, laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Computing device 450 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices. The components shown, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit the implementations described and/or claimed in this document.
[0078] Computing device 400 includes a processor 402, memory 404, a storage device 406, a high-speed interface or controller 408 connecting to memory 404 and high-speed expansion ports 410, and a low-speed interface or controller 412 connecting to low-speed bus 414 and storage device 406. Each of the components 402, 404, 406, 408, 410, and 412 is interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 402 can process instructions for execution within the computing device 400, including instructions stored in the memory 404 or on the storage device 406, to display graphical information for a GUI on an external input/output device, such as display 416 coupled to high-speed controller 408. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 400 may be connected, with each device providing portions of the necessary operations (e.g., a server bank, a group of blade servers, or a multi-processor system).
[0079] The memory 404 stores information within the computing device 400. In one implementation, the memory 404 is a volatile memory unit or units. In another implementation, the memory 404 is a non-volatile memory unit or units. The memory 404 may also be another form of computer-readable medium, such as a magnetic or optical disk.

[0080] The storage device 406 is capable of providing mass storage for the computing device 400. In one implementation, the storage device 406 may be or contain a computer-readable medium, such as a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 404, the storage device 406, or memory on processor 402.
[0081] The high-speed controller 408 manages bandwidth-intensive operations for the computing device 400, while the low-speed controller 412 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 408 is coupled to memory 404, display 416 (i.e., through a graphics processor or accelerator), and to high-speed expansion ports 410, which may accept various expansion cards (not shown). In this representative implementation, low-speed controller 412 is coupled to storage device 406 and low-speed bus 414. The low-speed bus 414, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router (i.e., through a network adapter).
[0082] The computing device 400 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 420, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 424. In addition, it may be implemented in a personal computer such as a laptop computer 422. Alternatively, components from computing device 400 may be combined with other components in a mobile device (not shown), such as device 450. Each of such devices may contain one or more computing devices 400, 450, and an entire system may be made up of multiple computing devices 400, 450 communicating with each other.
[0083] Computing device 450 includes a processor 452, memory 464, an input/output device such as a display 454, a communication interface 466, and a transceiver 468, among other components. The computing device 450 may also be provided with a storage device, such as a micro-drive or other device(s), to provide additional storage. Each of the components 452, 464, 454, 466, and 468, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
[0084] The processor 452 can execute instructions within the computing device 450, including instructions stored in the memory 464. The processor 452 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor may provide, for example, for coordination of the other components of the computing device 450, such as control of user interfaces, applications run by the computing device 450, and wireless communication by the computing device 450.
[0085] Processor 452 may communicate with a user through control interface 458 and display interface 456 coupled to a display 454. For example, the display 454 may be a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 456 may comprise appropriate circuitry for driving the display 454 to present graphical and other information to a user. The control interface 458 may receive commands from a user and convert them for submission to the processor 452. In addition, an external interface 462 may be provided in communication with processor 452, to enable near area communication of computing device 450 with other devices. In some implementations, external interface 462 may provide for wired communication, or in other implementations, for wireless communication, whilst multiple interfaces may also be used.
[0086] The memory 464 stores information within the computing device 450. The memory 464 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 474 may also be provided and connected to the computing device 450 through expansion interface 472, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 474 may provide extra storage space and/or may also store applications or other information for the computing device 450. Specifically, expansion memory 474 may include instructions to carry out or supplement the processes described above and may also include secure information. Thus, for example, expansion memory 474 may be provided as a security module for computing device 450 and may be programmed with instructions that permit secure use of the same. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

[0087] The memory may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 464, expansion memory 474, or memory on processor 452, that may be received, for example, over transceiver 468 or external interface 462.
[0088] The computing device 450 may communicate wirelessly through communication interface 466, which may include digital signal processing circuitry where necessary. Communication interface 466 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through a radio-frequency transceiver 468. In addition, short-range communication may occur, such as using a Bluetooth, Wi-Fi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 470 may provide additional navigation- and location-related wireless data to the computing device 450, which may be used as appropriate by applications running on the computing device 450.
[0089] The computing device 450 may also communicate audibly using audio codec 460, which may receive spoken information from a user and convert it to usable digital information. Audio codec 460 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the computing device 450. Such sound may include audio from voice telephone calls, recorded audio (e.g., voice messages, music files, etc.), and audio generated by applications operating on the computing device 450.
[0090] The computing device 450 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 480. It may also be implemented as part of a smartphone 482, personal digital assistant, a computer tablet, or other similar mobile device.
[0091] Thus, various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system, including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
[0092] These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
[0093] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., an LCD (liquid crystal display) monitor, LED, or any other flat panel display) for displaying information to the user, a keyboard, and a pointing device (e.g., mouse, joystick, trackball, or similar device) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile), and input from the user can be received in any form, including acoustic, speech, or tactile input.
[0094] The systems and techniques described here may be implemented in a computing system (e.g., computing device 400 and/or 450) that includes a back-end component, or that includes a middleware component (e.g., an application server), or that includes a front-end component such as a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the systems and techniques described herein, or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
[0095] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0096] It will be appreciated that the virtual personification Al system may be used to implement many possible applications. For example, a conventional weather application may display weather searchable by pre-determined parameters (such as location and date). In contrast, the Al system may output a virtual personification of a popular on-camera meteorologist (such as Jim Cantore) to not only provide weather based on pre-determined parameters, but also to further interact with the user. For example, a user may first request the current weather and a forecast for Hawaii, and then ask, "what clothes should I pack for my upcoming vacation?" A conventional weather application will not understand the user question because it may not remember the context (weather in Hawaii). The virtual personification Al system, on the other hand, may understand the context of the user question and accurately determine the user's true request: a retrieval of the user's calendar to determine the correct date range and/or location for the upcoming vacation, a projection of the weather during that date range, and an analysis of proper attire given the weather, personal data (such as user preferences on clothing items), and location (which may take into account additional factors such as humidity, altitude, and local culture). Further, the Al may be capable of presenting the proper response to the user request using the virtual personification of Jim Cantore in a conversational format, even though that person has never answered that question or provided that particular response in the past (an illustrative sketch of this context carry-over is provided below).

[0097] It is contemplated that pre-recorded data may be analyzed, and virtual personifications may be generated, for any person(s), not just famous ones. For example, a user may submit family videos of a great-grandfather who has passed away, and the virtual personification system may render a virtual personification of the great-grandfather, who may interact with future generations. This concept may be applied to any person, living or passed away. In addition, the system may create virtual personifications which have an appearance different from anyone alive or previously alive.
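By way of illustration of the weather example in paragraph [0096] above, the following sketch (with stand-in calendar and weather lookups that are purely hypothetical) shows how remembered conversational context might be combined with new data retrievals to answer the follow-up question:

```python
# Illustrative sketch only; the calendar and weather lookups below are stand-ins.
from datetime import date

def get_calendar_trip(user_id: str) -> dict:
    # A real system would query the user's actual calendar for the upcoming vacation.
    return {"destination": "Hawaii", "start": date(2025, 3, 10), "end": date(2025, 3, 17)}

def get_forecast(location: str, start: date, end: date) -> dict:
    # A real system would call a weather service for the projected conditions.
    return {"high_c": 29, "low_c": 22, "rain_chance": 0.3}

def suggest_packing(user_id: str, context: dict) -> str:
    """Answer 'what clothes should I pack?' by combining remembered context with new lookups."""
    trip = get_calendar_trip(user_id)
    location = context.get("last_location", trip["destination"])
    forecast = get_forecast(location, trip["start"], trip["end"])
    items = ["light shirts", "shorts", "sandals"]
    if forecast["rain_chance"] > 0.2:
        items.append("a light rain jacket")
    return (f"For {location} from {trip['start']} to {trip['end']}, "
            "pack " + ", ".join(items) + ".")

if __name__ == "__main__":
    context = {"last_location": "Hawaii"}   # remembered from the earlier weather question
    print(suggest_packing("user-100A", context))
```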
[0098] In one embodiment, the Al may supplement missing footage with its own. In the example where a user asks the virtual personification of Jillian Michaels, a fitness coach, for a standing core exercise, but no recording of Jillian Michaels performing such an exercise exists, new rendering may be generated using footage of Jillian Michaels performing other, similar actions, or footage of other people performing the requested exercise, rendered to appear as Jillian Michaels. The new rendering may be generated using deepfake or other technology to combine Jillian Michaels' footage with generic footage of another person performing a standing core exercise. As yet another alternative, the Al may provide a set of default footage (such as a default virtual model performing standing core exercises) and superimpose whatever footage of Jillian Michaels may be available on the default model. It is contemplated that any other type of technology may be used to generate new rendering where no pre-recorded data exists.
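For illustration only, the fallback chain described above (exact footage, then similar footage, then superimposition onto a default model) might be selected as follows; the media-store functions and strategy names are hypothetical:

```python
# Illustrative sketch only; the media-store lookups and strategy names are hypothetical.
from typing import Optional

def find_exact_clip(person: str, action: str) -> Optional[str]:
    return None   # assume no recording of this person performing this exact action exists

def find_similar_clip(person: str, action: str) -> Optional[str]:
    return f"{person}_similar_action.mp4"   # footage of the person doing a related action

def choose_render_strategy(person: str, action: str) -> dict:
    """Pick how to produce footage of `person` performing `action`."""
    exact = find_exact_clip(person, action)
    if exact:
        return {"strategy": "replay_existing", "clip": exact}
    similar = find_similar_clip(person, action)
    if similar:
        return {"strategy": "adapt_similar_footage", "clip": similar}
    # Last resort: superimpose the person's likeness onto a generic default model.
    return {"strategy": "superimpose_on_default_model", "base": f"default_{action}.mp4"}

if __name__ == "__main__":
    print(choose_render_strategy("coach", "standing_core_exercise"))
```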
[0099] Another possible expansion of the virtual personification Al system may be to generate and render entirely imaginary characters. For example, by combining a new character design for a two-headed animal with default models or existing footage of a dinosaur, a virtual personification of a new two-headed dinosaur may be generated; its behavior may be based on analysis of a wolf, or it may interact with the user with the voice of an old man. Thus, the virtual personification may be customized based on both pre-recorded data and user preferences. It is contemplated that the personification is not limited to people but may include other things (real or not real), such as but not limited to animals, robots, cars, birds, fish, extinct species, alien creatures, created items or beings, or any other item.
[0100] The virtual personification Al system's ability to generate any type of virtual personification and its versatility of customization enable broad application to all types of technology and environments. One exemplary use may be education, where classes and/or tutorials may be provided to students with a virtual personification of an instructor on any subject, which may automate some or all of a student's classroom experience without compromising the student's personal interaction (such as the ability to ask questions). As an improvement to in-person experiences, the virtual instructor may draw information from existing knowledge or databases, thereby providing answers a live instructor may not have. The class may be taught by a virtual representation of someone famous, such as Albert Einstein, Sandra Day O'Connor, or Alexander Graham Bell.
[0101] Another exemplary use may be training, which may include standardized training for professional purposes (such as customized professional training of airplane pilots using a virtual personification of a flight instructor and the cockpit), or training for hobbies or informal learning (such as cooking with a virtual personification of Gordon Ramsey). In the case of an airplane, the virtual environment may react to different actions by the user, such as use of certain controls, to provide realistic training.

[0102] Similarly, the virtual personification Al system may be used to generate real-time response instructions, such as medical or emergency training. For example, a 9-1-1 dispatcher may assist a caller to perform CPR by transmitting footage of a virtual personification of medical personnel performing CPR on a patient while waiting for an ambulance to arrive. The caller may interact with the medical personnel by asking questions such as "the patient is still not breathing, now what?" Answers may be pulled from a database, and the personification may actually perform the answer to the question so the user can see how to perform the medical procedure. This concept can be applied to teaching tools for medical procedures such that virtual representations of the best doctors in the world can be created to show how to perform a procedure and dynamically respond to any question for personalized teaching.
[0103] Yet another exemplary use may be entertainment, such as allowing users to interact with famous people from the past or an imaginary character in a conversational setting. This allows people to have personal interactions with people from history and interact using the Al databases such that the virtual representation can answer any question or perform any action in real time during user interaction. These famous people from which the virtual representation is created could be any person, famous for any reason. A database may be created for each person so that the Al system can accurately create visual, audio, and knowledge representations.
[0105] In future expansions of the technology, it may also be possible to simulate virtual personification without user input. For example, an entire lecture series may be generated using virtual personification of an instructor and combining footage of lecture recordings of past real-life lectures. Anticipatory answers may be provided by analysis of recordings of past student questions, thereby eliminating the need for further student interaction (which may still be provided as an additional feature).
[0106] Yet another example may be a simulated room where two virtual personifications may interact with each other instead of (or in addition to) interacting with users. For example, a user may wish to simulate a philosophical debate between Socrates and Kant. In one example, the virtual room may be expanded to include entire virtual worlds and large populations of virtual characters. The user can learn from seeing how two or more virtual representations interact, such as in professional environments, military situations, formal engagements, or social interactions.
[0107] While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of this invention. In addition, the various features, elements, and embodiments described herein may be claimed or combined in any combination or arrangement.

CLAIMS

What is claimed is:
1. A method to generate and update virtual personification using artificial intelligence comprising the steps of: receiving data associated with a person, the data comprising one or more of the following: text files, audio files, image files, and video files; rendering a virtual personification of the person and outputting the virtual personification to a user; receiving and interpreting a user input to generate a user request; updating the virtual personification in response to the user request, the update comprising one or more of the following: responsive to the user request, generating an audio output using the text files and the audio files of the person; and responsive to the user request, generating a video output using the image files and the video files of the person, wherein the audio output and the video output are presented to the user by the virtual personification and the audio output and the video output presented by the virtual personification have not previously been performed by the person or thing represented by the virtual personification.
2. The method of claim 1 wherein the virtual personification is of a person, either living or deceased.
3. The method of claim 1 wherein the virtual personification comprises an audio output and video output presented in a virtual environment of a type associated with the virtual personification.
4. The method of claim 1 wherein the virtual personification comprises a representation of a non-living item.
5. The method of claim 1 wherein the method further comprises, responsive to being unable to create the generated response at the virtual reality device, transmitting the question or request from the user to a remote artificial intelligence module.
6. The method of claim 5, wherein the remote artificial intelligence module is a computing device with a processor and memory storing machine readable code configured to: receive the question or request from the user via the virtual reality device; process the question or request to derive a meaning; perform one or more searches for answers to the question or request in databases unrelated to the virtual personification; upon locating an answer to the question or request, generate data that represents the virtual personification answering the question or request; and transmit the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
7. The method of claim 1 further comprising tracking a hand position of a user with one or more user hand position tracking devices to determine what the user is pointing at in the virtual environment.
8. The method of claim 1 wherein the generated response to the question or request uses artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which does not provide a direct answer to the question or request.
9. A system for presenting an interactive, artificial intelligence assisted, virtual personification to a user comprising:
a virtual reality device configured to have at least a portion be worn by the user comprising: a wearable screen configured for viewing by a user; one or more speakers configured to provide audio output to the user; a microphone configured to receive audio input from the user; one or more external input devices configured to receive input from the user; a communication module configured to communicate over a computer network or the Internet; a processor configured to execute machine readable code; a memory configured to store the machine readable code, the machine readable code configured to: present a virtual environment on the wearable screen and through the one or more speakers to the user; present, to the user on the wearable screen and through the one or more
speakers, a virtual personification of a person currently living or deceased, in the virtual environment; receive a question or request from the user regarding one or more aspects of the virtual environment or the virtual personification; generate a generated response to the question or request from the user which includes generating video content and audio content which did not previously exist; and present the generated response to the user on the wearable screen and through the one or more speakers in response to the question or request from the user.
10. The system of claim 9 wherein the machine readable code is further configured to, responsive to being unable to create the generated response at the virtual reality device, transmit the question or request from the user to a remote artificial intelligence module.
11. The system of claim 10 wherein the remote artificial intelligence module is a computing device with memory and processor such that the memory stores machine readable code configured to: receive the question or request from the user via the virtual reality device; process the question or request to derive a meaning; perform one or more searches for answers to the question or request in databases unrelated to the virtual personification; upon locating an answer to the question or request, generate data that represents the virtual personification answering the question or request; and transmit the answer or the data that represents the virtual personification answering the question or request to the virtual reality device for presentation to the user.
12. The system of claim 9 further comprising one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment.
13. The system of claim 9 wherein the input from the user comprises an audio input or an input from the user to the one or more external input devices.
14. The system of claim 9 wherein the video content and audio content which did not previously exist are generated by processing existing video, audio, or both of the person represented by the virtual personification.
15. The system of claim 9 wherein the generated response to the question or request is generated using artificial intelligence to generate an answer by searching one or more databases that contain information from the person represented by the virtual personification but which does not provide a direct answer to the question or request.
16. A method for presenting an interactive experience with a virtual personification using a screen, speakers, and microphone of a user computing device, the method comprising: presenting a virtual environment on the screen and through the speakers to the user and presenting the virtual personification in the virtual environment; receiving input from the user comprising a question, a request, or a subject regarding one or more aspects of the virtual environment, the virtual personification, or the actions of the virtual personification in the virtual environment; sending a request for a response to the input from the user to an Al computing device that is remote from the user computing device; with the Al computing device, processing pre-existing content stored in one or more databases to create a generated response; transmitting the generated response to the user computing device; at the user computing device, based on the generated response from the Al computing device, generating video content and audio content which did not previously exist; and presenting the video content and audio content which did not previously exist to the user.
17. The method of claim 16 wherein the Al computing device is a computing device with memory and a processor such that the memory stores machine readable code configured to: receive the input from the user computing device; process the input from the user to derive a meaning; based on the meaning, perform one or more searches for answers to the input from the user in databases unrelated to the virtual personification; upon locating a response to the input from the user, generate data that represents the virtual personification responding to the input from the user; and transmit the data that represents the virtual personification responding to the input from the user to the user computing device.
18. The method of claim 16 further comprising monitoring one or more user hand position tracking devices configured to track a position of a user's hand to determine what the user is pointing at in the virtual environment and interpreting the pointing as the input from the user.
19. The method of claim 16 wherein the input from the user comprises an audio input or an input from the user to one or more external input devices.
20. The method of claim 16 wherein the video content and audio content which did not previously exist are generated by processing existing video, audio, or both of a person represented by the virtual personification to generate new content.
51
PCT/US2023/010624 2025-08-06 2025-08-06 Method and system for virtual intelligence user interaction WO2023137078A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263298582P 2025-08-06 2025-08-06
US63/298,582 2025-08-06

Publications (1)

Publication Number Publication Date
WO2023137078A1 true WO2023137078A1 (en) 2025-08-06

Family

ID=87162208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/010624 WO2023137078A1 (en) 2025-08-06 2025-08-06 Method and system for virtual intelligence user interaction

Country Status (2)

Country Link
US (1) US12346994B2 (en)
WO (1) WO2023137078A1 (en)

Families Citing this family (1)

* Cited by examiner, ? Cited by third party
Publication number Priority date Publication date Assignee Title
US20240112145A1 (en) * 2025-08-06 2025-08-06 EasyLlama Inc. Dynamic content personalization within a training platform

Citations (4)

* Cited by examiner, ? Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050686A1 (en) * 2025-08-06 2025-08-06 Intel Corporation Methods and apparatus to add common sense reasoning to artificial intelligence in the context of human machine interfaces
WO2020136615A1 (en) * 2025-08-06 2025-08-06 Pankaj Uday Raut A system and a method for generating a head mounted device based artificial intelligence (ai) bot
WO2020247590A1 (en) * 2025-08-06 2025-08-06 Artie, Inc. Multi-modal model for dynamically responsive virtual characters
US11107465B2 (en) * 2025-08-06 2025-08-06 Storyfile, Llc Natural conversation storytelling system

Family Cites Families (111)

* Cited by examiner, ? Cited by third party
Publication number Priority date Publication date Assignee Title
US7117032B2 (en) 2025-08-06 2025-08-06 Quantum Intech, Inc. Systems and methods for facilitating physiological coherence using respiration training
US20020032591A1 (en) 2025-08-06 2025-08-06 Agentai, Inc. Service request processing performed by artificial intelligence systems in conjunctiion with human intervention
KR20020030545A (en) 2025-08-06 2025-08-06 ? ???? ? Automatic answer and search method - based on artificial intelligence and natural languane process technology - for natural and sentencial questions.
US6996064B2 (en) 2025-08-06 2025-08-06 International Business Machines Corporation System and method for determining network throughput speed and streaming utilization
US7831564B1 (en) 2025-08-06 2025-08-06 Symantec Operating Corporation Method and system of generating a point-in-time image of at least a portion of a database
US20070043736A1 (en) 2025-08-06 2025-08-06 Microsoft Corporation Smart find
KR100657331B1 (en) 2025-08-06 2025-08-06 ???????? An image forming apparatus employing a multi processor and an image forming method using the same
US8600977B2 (en) 2025-08-06 2025-08-06 Oracle International Corporation Automatic recognition and capture of SQL execution plans
KR20100035391A (en) 2025-08-06 2025-08-06 ????????? Valve module for changing flow paths and soft water apparatu
KR101042515B1 (en) 2025-08-06 2025-08-06 ???? ???? Information retrieval method and information provision method based on user's intention
US20100205222A1 (en) 2025-08-06 2025-08-06 Tom Gajdos Music profiling
US8326637B2 (en) 2025-08-06 2025-08-06 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
TWI432347B (en) 2025-08-06 2025-08-06 Wistron Corp Holder device which could adjust positions automatically, and the combination of the holder device and the electronic device
US8954431B2 (en) 2025-08-06 2025-08-06 Xerox Corporation Smart collaborative brainstorming tool
US9009041B2 (en) 2025-08-06 2025-08-06 Nuance Communications, Inc. Systems and methods for improving the accuracy of a transcription using auxiliary data such as personal data
US8762156B2 (en) 2025-08-06 2025-08-06 Apple Inc. Speech recognition repair using contextual information
US9542956B1 (en) 2025-08-06 2025-08-06 Interactive Voice, Inc. Systems and methods for responding to human spoken audio
US9280610B2 (en) 2025-08-06 2025-08-06 Apple Inc. Crowd sourcing information to fulfill user requests
KR101399472B1 (en) 2025-08-06 2025-08-06 (?)????? Method and apparatus for rendering processing by using multiple processings
EP3865056A1 (en) 2025-08-06 2025-08-06 InteraXon Inc. Systems and methods for collecting, analyzing, and sharing bio-signal and non-bio-signal data
KR20140078169A (en) 2025-08-06 2025-08-06 ???????? Imaging apparatus, magnetic resonance imaging and method for controlling the imaging apparatus or the magnetic resonance imaging apparatus
US20150351655A1 (en) 2025-08-06 2025-08-06 Interaxon Inc. Adaptive brain training computer system and method
CN113470640B (en) 2025-08-06 2025-08-06 苹果公司 Voice triggers for digital assistants
US9172747B2 (en) 2025-08-06 2025-08-06 Artificial Solutions Iberia SL System and methods for virtual assistant networks
CN110096712B (en) 2025-08-06 2025-08-06 苹果公司 User training through intelligent digital assistant
US9058805B2 (en) 2025-08-06 2025-08-06 Google Inc. Multiple recognizer speech recognition
US10390732B2 (en) 2025-08-06 2025-08-06 Digital Ally, Inc. Breath analyzer, system, and computer program for authenticating, preserving, and presenting breath analysis data
CN105637445B (en) 2025-08-06 2025-08-06 奥誓公司 System and method for providing a context-based user interface
US9721570B1 (en) 2025-08-06 2025-08-06 Amazon Technologies, Inc. Outcome-oriented dialogs on a speech recognition platform
TWM483638U (en) 2025-08-06 2025-08-06 Taer Innovation Co Ltd Stand
US20150288857A1 (en) 2025-08-06 2025-08-06 Microsoft Corporation Mount that facilitates positioning and orienting a mobile computing device
US9830556B2 (en) 2025-08-06 2025-08-06 Excalibur Ip, Llc Synthetic question formulation
US9727798B2 (en) * 2025-08-06 2025-08-06 Acrovirt, LLC Generating and using a predictive virtual personification
US9607102B2 (en) 2025-08-06 2025-08-06 Nuance Communications, Inc. Task switching in dialogue processing
US10254928B1 (en) 2025-08-06 2025-08-06 Amazon Technologies, Inc. Contextual card generation and delivery
US9774682B2 (en) 2025-08-06 2025-08-06 International Business Machines Corporation Parallel data streaming between cloud-based applications and massively parallel systems
US10756963B2 (en) 2025-08-06 2025-08-06 Pulzze Systems, Inc. System and method for developing run time self-modifying interaction solution through configuration
US10395021B2 (en) 2025-08-06 2025-08-06 Mesh Candy, Inc. Security and identification system and method using data collection and messaging over a dynamic mesh network with multiple protocols
US10582011B2 (en) 2025-08-06 2025-08-06 Samsung Electronics Co., Ltd. Application cards based on contextual data
US10888270B2 (en) 2025-08-06 2025-08-06 Avishai Abrahami Cognitive state alteration system integrating multiple feedback technologies
US10709371B2 (en) 2025-08-06 2025-08-06 WellBrain, Inc. System and methods for serving a custom meditation program to a patient
US11587559B2 (en) 2025-08-06 2025-08-06 Apple Inc. Intelligent device identification
US10249207B2 (en) 2025-08-06 2025-08-06 TheBeamer, LLC Educational teaching system and method utilizing interactive avatars with learning manager and authoring manager functions
US10188345B2 (en) 2025-08-06 2025-08-06 Fitbit, Inc. Method and apparatus for providing biofeedback during meditation exercise
US10872306B2 (en) 2025-08-06 2025-08-06 Smiota, Inc. Facilitating retrieval of items from an electronic device
KR102656806B1 (en) 2025-08-06 2025-08-06 ???? ???? Watch type terminal and method of contolling the same
US10631743B2 (en) 2025-08-06 2025-08-06 The Staywell Company, Llc Virtual reality guided meditation with biofeedback
US10156775B2 (en) 2025-08-06 2025-08-06 Eric Zimmermann Extensible mobile recording device holder
DK179309B1 (en) 2025-08-06 2025-08-06 Apple Inc Intelligent automated assistant in a home environment
US20170357910A1 (en) 2025-08-06 2025-08-06 Apple Inc. System for iteratively training an artificial intelligence using cloud-based metrics
US11200891B2 (en) 2025-08-06 2025-08-06 Hewlett-Packard Development Company, L.P. Communications utilizing multiple virtual assistant services
US10346401B2 (en) 2025-08-06 2025-08-06 Accenture Global Solutions Limited Query rewriting in a relational data harmonization framework
US10244122B2 (en) 2025-08-06 2025-08-06 Vivint, Inc. Panel control over broadband
WO2018022085A1 (en) 2025-08-06 2025-08-06 Hewlett-Packard Development Company, L.P. Identification of preferred communication devices
US9654598B1 (en) 2025-08-06 2025-08-06 Le Technology, Inc. User customization of cards
US20180054228A1 (en) 2025-08-06 2025-08-06 I-Tan Lin Teleoperated electronic device holder
US10798548B2 (en) 2025-08-06 2025-08-06 Lg Electronics Inc. Method for controlling device by using Bluetooth technology, and apparatus
US10423685B2 (en) 2025-08-06 2025-08-06 Robert Bosch Gmbh System and method for automatic question generation from knowledge base
US9959861B2 (en) 2025-08-06 2025-08-06 Robert Bosch Gmbh System and method for speech recognition
US10855714B2 (en) 2025-08-06 2025-08-06 KnowBe4, Inc. Systems and methods for an artificial intelligence driven agent
US11429586B2 (en) 2025-08-06 2025-08-06 Sap Se Expression update validation
US10365932B2 (en) 2025-08-06 2025-08-06 Essential Products, Inc. Dynamic application customization for automated environments
US20180232920A1 (en) 2025-08-06 2025-08-06 Microsoft Technology Licensing, Llc Contextually aware location selections for teleconference monitor views
KR102384641B1 (en) 2025-08-06 2025-08-06 ???? ???? Method for controlling an intelligent system that performs multilingual processing
WO2018155920A1 (en) 2025-08-06 2025-08-06 Samsung Electronics Co., Ltd. Method and apparatus for authenticating users in internet of things environment
DK3628101T3 (en) 2025-08-06 2025-08-06 Better Therapeutics Inc Method and system for administration of lifestyle and health interventions
DK201770427A1 (en) 2025-08-06 2025-08-06 Apple Inc. Low-latency intelligent automated assistant
US10554595B2 (en) 2025-08-06 2025-08-06 Genesys Telecommunications Laboratories, Inc. Contact center system and method for advanced outbound communications to a contact group
CN107423364B (en) 2025-08-06 2025-08-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and storage medium for answering operation broadcasting based on artificial intelligence
EP3435642A1 (en) 2025-08-06 2025-08-06 Advanced Digital Broadcast S.A. A system and method for remote control of appliances by voice
US20190122121A1 (en) 2025-08-06 2025-08-06 AISA Innotech Inc. Method and system for generating individual microdata
US11227448B2 (en) 2025-08-06 2025-08-06 Nvidia Corporation Cloud-centric platform for collaboration and connectivity on 3D virtual environments
US11295735B1 (en) 2025-08-06 2025-08-06 Amazon Technologies, Inc. Customizing voice-control for developer devices
US11250336B2 (en) 2025-08-06 2025-08-06 Intel Corporation Distributed and contextualized artificial intelligence inference service
US10963499B2 (en) 2025-08-06 2025-08-06 Aiqudo, Inc. Generating command-specific language model discourses for digital assistant interpretation
US10729399B2 (en) 2025-08-06 2025-08-06 KUB Technologies, Inc. System and method for cabinet X-ray system with camera and X-ray images superimposition
EP4138074A1 (en) 2025-08-06 2025-08-06 Google LLC Facilitating end-to-end communications with automated assistants in multiple languages
KR102508677B1 (en) 2025-08-06 2025-08-06 ???????? System for processing user utterance and controlling method thereof
WO2019183062A1 (en) 2025-08-06 2025-08-06 Facet Labs, Llc Interactive dementia assistive devices and systems with artificial intelligence, and related methods
US20190354599A1 (en) 2025-08-06 2025-08-06 Microsoft Technology Licensing, Llc Ai model canvas
US20200001040A1 (en) 2025-08-06 2025-08-06 Levels Products, Inc. Method, apparatus, and system for meditation
CN110728363B (en) 2025-08-06 2025-08-06 Huawei Technologies Co., Ltd. Task processing method and device
US10769495B2 (en) 2025-08-06 2025-08-06 Adobe Inc. Collecting multimodal image editing requests
US20210398671A1 (en) 2025-08-06 2025-08-06 Healthpointe Solutions, Inc. System and method for recommending items in conversational streams
KR101994592B1 (en) 2025-08-06 2025-08-06 ????? ????? Automatic video content metadata creation method and system
US10402589B1 (en) 2025-08-06 2025-08-06 Vijay K. Madisetti Method and system for securing cloud storage and databases from insider threats and optimizing performance
US20200242146A1 (en) 2025-08-06 2025-08-06 Andrew R. Kalukin Artificial intelligence system for generating conjectures and comprehending text, audio, and visual data using natural language understanding
JP2020123131A (en) 2025-08-06 2025-08-06 Toshiba Corporation Dialog system, dialog method, program, and storage medium
US11544594B2 (en) 2025-08-06 2025-08-06 Sunghee Woo Electronic device comprising user interface for providing user-participating-type AI training service, and server and method for providing user-participating-type AI training service using the electronic device
WO2020214988A1 (en) 2025-08-06 2025-08-06 Tempus Labs Collaborative artificial intelligence method and system
US11328717B2 (en) 2025-08-06 2025-08-06 Lg Electronics Inc. Electronic device, operating method thereof, system having plural artificial intelligence devices
US20200342968A1 (en) 2025-08-06 2025-08-06 GE Precision Healthcare LLC Visualization of medical device event processing
US11393491B2 (en) 2025-08-06 2025-08-06 Lg Electronics Inc. Artificial intelligence device capable of controlling operation of another device and method of operating the same
KR20190080834A (en) 2025-08-06 2025-08-06 ???? ???? Dialect phoneme adaptive training system and method
US11501753B2 (en) 2025-08-06 2025-08-06 Samsung Electronics Co., Ltd. System and method for automating natural language understanding (NLU) in skill development
US11461376B2 (en) 2025-08-06 2025-08-06 International Business Machines Corporation Knowledge-based information retrieval system evaluation
US20210011887A1 (en) 2025-08-06 2025-08-06 Qualcomm Incorporated Activity query response system
KR20190095181A (en) 2025-08-06 2025-08-06 ???? ???? Video conference system using artificial intelligence
KR20190099167A (en) 2025-08-06 2025-08-06 ???? ???? An artificial intelligence apparatus for performing speech recognition and method for the same
US11222464B2 (en) 2025-08-06 2025-08-06 The Travelers Indemnity Company Intelligent imagery
US11636102B2 (en) 2025-08-06 2025-08-06 Verizon Patent And Licensing Inc. Natural language-based content system with corrective feedback and training
US10827028B1 (en) 2025-08-06 2025-08-06 Spotify Ab Systems and methods for playing media content on a target device
KR20210066328A (en) 2025-08-06 2025-08-06 ???? ???? An artificial intelligence apparatus for learning natural language understanding models
US11983640B2 (en) 2025-08-06 2025-08-06 International Business Machines Corporation Generating question templates in a knowledge-graph based question and answer system
US11042369B1 (en) 2025-08-06 2025-08-06 Architecture Technology Corporation Systems and methods for modernizing and optimizing legacy source code
WO2021188719A1 (en) 2025-08-06 2025-08-06 MeetKai, Inc. An intelligent layer to power cross platform, edge-cloud hybrid artificial intelligence services
US11995561B2 (en) 2025-08-06 2025-08-06 MeetKai, Inc. Universal client API for AI services
US11521597B2 (en) 2025-08-06 2025-08-06 Google Llc Correcting speech misrecognition of spoken utterances
US11984124B2 (en) 2025-08-06 2025-08-06 Apple Inc. Speculative task flow execution
US11676593B2 (en) 2025-08-06 2025-08-06 International Business Machines Corporation Training an artificial intelligence of a voice response system based on non-verbal feedback
US11550831B1 (en) * 2025-08-06 2025-08-06 TrueSelph, Inc. Systems and methods for generation and deployment of a human-personified virtual agent using pre-trained machine learning-based language models and a video response corpus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050686A1 (en) * 2025-08-06 2025-08-06 Intel Corporation Methods and apparatus to add common sense reasoning to artificial intelligence in the context of human machine interfaces
US11107465B2 (en) * 2025-08-06 2025-08-06 Storyfile, LLC Natural conversation storytelling system
WO2020136615A1 (en) * 2025-08-06 2025-08-06 Pankaj Uday Raut A system and a method for generating a head mounted device based artificial intelligence (AI) bot
WO2020247590A1 (en) * 2025-08-06 2025-08-06 Artie, Inc. Multi-modal model for dynamically responsive virtual characters

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BOGDANOVYCH ANTON, RICHARDS DEBORAH, SIMOFF SIMEON, PELACHAUD CATHERINE, HEYLEN DIRK, TRESCAK TOMAS, WU JASON, GHOSH SAYAN, CHOLLE: "NADiA: Neural Network Driven Virtual Human Conversation Agents", Proceedings of the 18th International Conference on Intelligent Virtual Agents, ACM, New York, NY, USA, 5 November 2018, pages 173-178, XP093079351, ISBN: 978-1-4503-6013-5, DOI: 10.1145/3267851.3267860 *

Also Published As

Publication number Publication date
US20230230293A1 (en) 2025-08-06
US12346994B2 (en) 2025-08-06

Similar Documents

Publication Publication Date Title
CN112074899B (en) System and method for intelligent initiation of human-computer dialogue based on multimodal sensor input
US20230018473A1 (en) System and method for conversational agent via adaptive caching of dialogue tree
JP6816925B2 (en) Data processing method and equipment for childcare robots
CN111801730B (en) Systems and methods for artificial intelligence driven auto-chaperones
US11468894B2 (en) System and method for personalizing dialogue based on user's appearances
CN112204564A (en) System and method for speech understanding via integrated audio and visual based speech recognition
US11003860B2 (en) System and method for learning preferences in dialogue personalization
CN112204565B (en) Systems and methods for inferring scenes based on visual context-free grammar models
US20190251701A1 (en) System and method for identifying a point of interest based on intersecting visual trajectories
US20190251957A1 (en) System and method for prediction based preemptive generation of dialogue content
US20190251716A1 (en) System and method for visual scene construction based on user communication
US20220215678A1 (en) System and method for reconstructing unoccupied 3d space
US10785489B2 (en) System and method for visual rendering based on sparse samples with predicted motion
US12346994B2 (en) Method and system for virtual intelligence user interaction
WO2021030449A1 (en) System and method for adaptive dialogue via scene modeling using combinational neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23740636

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23740636

Country of ref document: EP

Kind code of ref document: A1
